# Introduction

I spent some time researching interesting datasets for exploring time series modelling and analysis. In this notebook, I will walk you through them.

In [1]:
%load_ext autoreload
%autoreload 2

In [15]:
import pandas as pd

from ts import data
from ts.viz import *

# Global Temperatures Dataset

**Source**: [Github](https://github.com/datasets/global-temp)

**Description**: Global Temperature Time Series (monthly and annual)

**Ideas**: Simple dataset, useful to show seasonality (annual temperature fluctuations) vs. trend (global warming).

In [36]:
df = data.load_globaltemp(frmt='pandas', mode='monthly')

In [37]:
df.head()

Unnamed: 0,Source,Date,Mean
0,GCAG,2016-12,0.7895
1,GISTEMP,2016-12,0.81
2,GCAG,2016-11,0.7504
3,GISTEMP,2016-11,0.93
4,GCAG,2016-10,0.7292
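One way to look at trend vs. shorter-term fluctuations is to pivot the long-format frame so that each source becomes a column, then smooth with a 12-month rolling mean. The sketch below runs on a small synthetic frame that only mimics the `Source`, `Date` and `Mean` columns shown above; it does not use the real data, and the helper name `to_wide_monthly` is hypothetical rather than part of `ts.data`.

```python
import numpy as np
import pandas as pd


def to_wide_monthly(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format (Source, Date, Mean) rows into one column per source."""
    out = df.copy()
    # Dates arrive as 'YYYY-MM' strings, so parse them into timestamps first.
    out["Date"] = pd.to_datetime(out["Date"], format="%Y-%m")
    wide = out.pivot(index="Date", columns="Source", values="Mean")
    return wide.sort_index()


# Synthetic stand-in for the real frame (assumption: same three columns).
rng = np.random.default_rng(0)
dates = pd.date_range("2010-01-01", periods=48, freq="MS").strftime("%Y-%m")
rows = [
    {"Source": src, "Date": d, "Mean": 0.5 + 0.01 * i + rng.normal(0, 0.05)}
    for src in ("GCAG", "GISTEMP")
    for i, d in enumerate(dates)
]
demo = pd.DataFrame(rows)

wide = to_wide_monthly(demo)
# A centered 12-month rolling mean as a rough trend estimate.
trend = wide.rolling(window=12, center=True).mean()
# Whatever is left after subtracting the smoothed trend.
residual = wide - trend
```

With the real data, the same steps should apply to the frame returned by `data.load_globaltemp(frmt='pandas', mode='monthly')`, assuming its columns match the output above.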


# Trump Tweets

**Source**: [Kaggle](https://www.kaggle.com/austinreese/trump-tweets)

**Description**: Tweets by Donald Trump from May 2009 to June 2020

**Ideas**: Could be useful for exploring cyclical patterns (maybe using the DFT), or simply for visualizing how the tweeting frequency changed over time. My hypothesis is that this is a non-stationary time series, as Donald Trump's tweeting patterns were likely different before and after the election.

In [38]:
df = data.load_trumptweets()
print(df.date.min(), df.date.max())
df.info()

2009-05-04 13:54:25 2020-06-17 21:28:52
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43352 entries, 0 to 43351
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         43352 non-null  int64 
 1   link       43352 non-null  object
 2   content    43352 non-null  object
 3   date       43352 non-null  object
 4   retweets   43352 non-null  int64 
 5   favorites  43352 non-null  int64 
 6   mentions   20386 non-null  object
 7   hashtags   5583 non-null   object
dtypes: int64(3), object(5)
memory usage: 2.6+ MB
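Note that `date` is stored as `object`, so it needs parsing before any time-based analysis. The following is a minimal sketch of how tweets could be turned into a count series and checked for a dominant cycle with the DFT. It uses synthetic timestamps with a built-in weekly pattern instead of the real tweets, and the helpers `tweet_counts` and `dominant_period` are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd


def tweet_counts(df: pd.DataFrame, freq: str = "D") -> pd.Series:
    """Count tweets per time bin, parsing the string `date` column first."""
    dates = pd.DatetimeIndex(pd.to_datetime(df["date"]))
    return pd.Series(1, index=dates).sort_index().resample(freq).sum()


def dominant_period(counts: pd.Series) -> float:
    """Return the period (in bins) of the strongest non-zero DFT frequency."""
    x = counts.to_numpy(dtype=float)
    x = x - x.mean()  # remove the mean so the zero-frequency term does not dominate
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    k = int(np.argmax(spectrum[1:])) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[k]


# Synthetic stand-in: 8 weeks of tweets following a fixed weekly pattern.
weekly_pattern = [5, 8, 9, 7, 3, 1, 2]  # tweets per weekday (made up)
start = pd.Timestamp("2020-01-01")
stamps = [
    (start + pd.Timedelta(days=d, hours=j)).strftime("%Y-%m-%d %H:%M:%S")
    for d in range(56)
    for j in range(weekly_pattern[d % 7])
]
demo = pd.DataFrame({"date": stamps})

counts = tweet_counts(demo, freq="D")
period = dominant_period(counts)  # expected to be close to 7 days for this toy data
```

On the real tweets, a single DFT over the full 2009 to 2020 range would mix together very different regimes if the series is non-stationary, so applying the same steps to shorter windows may be more informative.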


# MotionSense Dataset: Sensor Based Human Activity and Attribute Recognition

**Source**: [Kaggle](https://www.kaggle.com/malekzadeh/motionsense-dataset)

**Description**: This dataset includes time-series data generated by accelerometer and gyroscope sensors (attitude, gravity, userAcceleration, and rotationRate). It is collected with an iPhone 6s kept in the participant's front pocket using SensingKit which collects information from Core Motion framework on iOS devices. A total of 24 participants in a range of gender, age, weight, and height performed 6 activities in 15 trials in the same environment and conditions: downstairs, upstairs, walking, jogging, sitting, and standing. With this dataset, we aim to look for personal attributes fingerprints in time-series of sensor data, i.e. attribute-specific patterns that can be used to infer gender or personality of the data subjects in addition to their activities.

**Ideas**: Could be useful for experimenting with machine learning models, e.g. recognizing the activity from the sensor signals.

In [39]:
help(data.load_motionsense)

Help on function load_motionsense in module ts.data:

load_motionsense(frmt: str = 'pandas', device: str = None, subject: int = None, subjects_info: bool = False)
    Loads subsets of the Motionsense dataset. You can either load the subject information datset, which
    provides metadata about the subjects, or the data for a particular device and subject pair. Here is a
    list of devices:
    
    - dws_1
    - dws_2
    - dws_11
    - jog_9
    - jog_16
    _ sit_5
    _ sit_13
    - std_6
    - std_14
    - ups_3
    - ups_4
    - ups_12
    _ wlk_7
    - wlk_8
    - wlk_15
    
    There are 24 subject.
    
    Arguments:
    - frmt: returned data structure (pandas or spark dataframes)
    - device: device name from the above list
    - subject: subject id (1 to 24) 
    - subjects_info: Boolean, if true, it will load subjects metadata and ignore the device and subject arguments
    
    Return:
    - Path, Pandas dataframe or Spark dataframe



Load data for a specific device and subject

In [40]:
df = data.load_motionsense(device='wlk_7', subject=2, frmt='pandas')
df.head(2)

Unnamed: 0.1,Unnamed: 0,attitude.roll,attitude.pitch,attitude.yaw,gravity.x,gravity.y,gravity.z,rotationRate.x,rotationRate.y,rotationRate.z,userAcceleration.x,userAcceleration.y,userAcceleration.z
0,0,1.30653,-1.118072,0.739332,0.422231,0.899259,-0.114253,-1.752874,2.553555,0.768259,0.660883,0.203051,-0.19257
1,1,1.423767,-1.11688,0.839693,0.433757,0.898737,-0.064239,-2.256292,0.72374,0.323775,0.130238,-0.259348,-0.106828
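As a possible starting point for machine learning experiments, the raw signal can be cut into fixed-length windows and summarized with simple statistics. The sketch below assumes only the three `userAcceleration.*` columns shown above and runs on random synthetic values; the window length of 50 samples and the helper name `window_features` are arbitrary choices for illustration, not something defined by the dataset or by `ts.data`.

```python
import numpy as np
import pandas as pd

ACC_COLS = ["userAcceleration.x", "userAcceleration.y", "userAcceleration.z"]


def window_features(df: pd.DataFrame, window: int = 50) -> pd.DataFrame:
    """Summarize acceleration magnitude over non-overlapping windows."""
    acc = df[ACC_COLS].to_numpy(dtype=float)
    mag = np.linalg.norm(acc, axis=1)  # per-sample magnitude of the 3D vector
    n = len(mag) // window  # drop the incomplete trailing window
    chunks = mag[: n * window].reshape(n, window)
    return pd.DataFrame(
        {
            "acc_mag_mean": chunks.mean(axis=1),
            "acc_mag_std": chunks.std(axis=1),
        }
    )


# Synthetic stand-in for one trial (assumption: same column names as above).
rng = np.random.default_rng(42)
demo = pd.DataFrame(rng.normal(0, 0.3, size=(230, 3)), columns=ACC_COLS)

features = window_features(demo, window=50)  # 230 samples -> 4 full windows
```

Each row of `features` could then be paired with the activity label implied by the `device` argument (for example walking for `wlk_7`) to build a small training set.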


Load metadata about the subjects

In [41]:
df = data.load_motionsense(subjects_info=True)
df.head(2)

Unnamed: 0,code,weight,height,age,gender
0,1,102,188,46,1
1,2,72,180,28,1
