# Time Series Analysis Mini Project: Let's explore my steps and, if we're feeling froggy, maybe use FB Prophet to model...

In [None]:
import pandas as pd
import numpy as np
import datetime

import matplotlib.pyplot as plt
import seaborn as sns

# Acquire!

### Imported steps from my personal iPhone pedometer.

#### Steps to repeat:
1. Export from the iPhone pedometer in settings
2. Save to file
3. Load using pandas!
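
A minimal sketch of step 3, assuming the export carries the four columns renamed later in this notebook (`Date`, `Steps`, `Distance`, `Floors Ascended`). The sample rows and the date format are made up for illustration; a real export may look different.

```python
import io
import pandas as pd

# Made-up sample standing in for Export.csv; the real file's date format may differ
sample_csv = io.StringIO(
    "Date,Steps,Distance,Floors Ascended\n"
    "2020-01-01,10250,4.8,0\n"
    "2020-01-02,8400,3.9,2\n"
)

# parse_dates converts the Date column to a datetime dtype while reading
sample = pd.read_csv(sample_csv, parse_dates=["Date"])
print(sample.dtypes)
```

Parsing dates at load time is optional; converting afterwards with `pd.to_datetime` gives the same result.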

In [None]:
df = pd.read_csv("Export.csv")

In [None]:
df.head(2)

In [None]:
df.info()

# Prepare!!

### Initial thoughts:
- This is a really simple data frame
- Only 3 columns to explore in addition to datetime

### Action Items:
- Let's rename some columns in a pythonic manner (h/t Ryan Orsinger)
- Let's set the index as datetime so we can do some time series analysis.

### Personal Notes on Time-Series Analysis
- For the past couple of projects I have been blowing off TSA by moving the target variable from the next time-series row into the current row. For this mini-project I wanted to do "pure" TSA.
- I have also been anxious to explore FB Prophet, so I will be importing and exploring it here.
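
For context, the shortcut described in the first note can be sketched as below. The data and the `next_day_steps` column name are made up for illustration; this is the approach the notebook is moving away from, not the one it uses.

```python
import pandas as pd

# Made-up daily step counts, for illustration only
toy = pd.DataFrame(
    {"steps": [9000, 10500, 7000, 12000]},
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# The shortcut: pull tomorrow's value into today's row as a target column
toy["next_day_steps"] = toy["steps"].shift(-1)

# The last row has no "tomorrow", so its target is NaN and gets dropped
supervised = toy.dropna()
print(supervised)
```

Each row can then be fed to an ordinary regression model, which treats the rows as independent and ignores trend and seasonality.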


In [None]:
# rename columns to snake_case
df = df.rename(columns={'Date': 'date',
                        'Steps': 'steps',
                        'Distance': 'distance',
                        'Floors Ascended': 'floors_ascended'})

In [None]:
# parse the dates, sort chronologically, and set the index to datetime
df['date'] = pd.to_datetime(df.date)
df = df.sort_values('date').set_index('date')

In [None]:
df.info()

# Explore!!!

### Questions to explore:
1. What is the relationship between `steps` and `distance`?
2. Is there any value in the `floors_ascended` column?
3. What day of the week do I walk the most?
4. Do I purposefully try to get over my step goal (10,000 steps)?
5. How often do I reach my step goal?

In [None]:
df['validate_steps'] = df.steps / df.distance

In [None]:
df.validate_steps.describe()

In [None]:
df.validate_steps.hist(color='red')
plt.title("Histogram of steps per mile by day")

**Takeaways**: The steps-to-distance ratio varies enough from day to day to suggest that steps and distance are not calculated from the same data, which means both columns are worth exploring.

(Unnecessary explanation: if the ratio had little to no variation, we could assume both columns were derived from the same measurement, either GPS location or gyroscopic movement within the phone. Since it does vary, we can examine the relationship between the two features, or use either as a target.)
    
**Next step**:  Prove it statistically
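
One rough way to put a number on "enough variation" is the coefficient of variation of the steps-to-distance ratio, read alongside the Pearson correlation. The sketch below uses made-up values; the cutoff for what counts as "little" variation is a judgment call, not something this notebook defines.

```python
import pandas as pd

# Made-up values, for illustration only; units follow whatever the export uses
toy = pd.DataFrame(
    {"steps": [10000, 8000, 12000, 6000],
     "distance": [5.0, 3.8, 6.3, 2.9]}
)

ratio = toy["steps"] / toy["distance"]

# Coefficient of variation: spread of the ratio relative to its mean.
# A value near 0 would suggest distance is just steps times a constant.
cv = ratio.std() / ratio.mean()

# Pearson correlation between the two columns (what DataFrame.corr reports)
r = toy["steps"].corr(toy["distance"])

print(f"CV of steps/distance: {cv:.3f}, Pearson r: {r:.3f}")
```

A high correlation together with a non-trivial coefficient of variation would be consistent with two related but separately measured quantities.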

In [None]:
print(df[['steps', 'distance']].corr())
sns.heatmap(df[['steps', 'distance']].corr())

In [None]:
sns.heatmap(df.corr())

In [None]:
# exploring floors_ascended:
df.floors_ascended.describe()

### Is there a use for floors ascended?

In [None]:
sns.boxplot(y='floors_ascended', data=df)
plt.title('How many floors do I walk up?')

**Takeaways**: Anecdotal information: my house is single story, I don't walk up stairs to go to work, and my neighborhood is flat. On more than 50% of the days, I do not walk up a flight of stairs. However, there might be some valuable information in which days are the non-zero days. Worth taking a look...

In [None]:
# multiply before rounding to avoid floating-point noise in the printed percentage
print(f'I do not ascend a full flight of stairs on {round((df.floors_ascended == 0).mean() * 100, 2)} percent of the days in this data set')

In [None]:
df.floors_ascended.hist(bins=23)
plt.title("Frequency of Floors Ascended")
plt.xlabel('Number of Floors')
plt.ylabel('Frequency')

### Is there a relationship between the distance I walked and whether I climbed a floor?

In [None]:
# create bool column based on whether I climbed at least one floor that day
df['has_ascended'] = df.floors_ascended > 0

In [None]:
sns.boxplot(x='has_ascended', y='steps', data=df)

In [None]:
df.groupby('has_ascended').steps.describe()

In [None]:
# monthly average steps on days with no floors ascended
df[~df.has_ascended].steps.resample('M').mean().plot.line()

### Am I more likely to hit my goal on days I ascend stairs?

In [None]:
# create new feature hit_goal: True on days with at least 10,000 steps
df['hit_goal'] = df.steps >= 10_000

In [None]:
# percentage of days on which the step goal was hit, split by stair climbing
stairs_goal_pct = round(df.loc[df.has_ascended, 'hit_goal'].mean() * 100, 1)
no_stairs_goal_pct = round(df.loc[~df.has_ascended, 'hit_goal'].mean() * 100, 1)

In [None]:
print(f'On days I ascend a flight of stairs I hit my steps goal {stairs_goal_pct}% of the time,')
print(f'On days where I do not ascend a flight of stairs I hit my steps goal {no_stairs_goal_pct}% of the time')
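
The same comparison can also be laid out as a single table with `pd.crosstab`. This is a sketch on made-up flags; the string labels are only there to make the table easier to read.

```python
import pandas as pd

# Made-up flags, for illustration only
toy = pd.DataFrame(
    {"has_ascended": [True, True, False, False, False, True],
     "hit_goal":     [True, False, False, True, False, True]}
)

# Readable labels instead of True/False
stairs = toy["has_ascended"].map({True: "stairs", False: "no stairs"})
goal = toy["hit_goal"].map({True: "hit", False: "missed"})

# normalize="index" turns each row into proportions that sum to 1
rates = pd.crosstab(stairs, goal, normalize="index")
print(rates)
```

Each row then reads as "of the days in this group, what share hit or missed the goal".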


### What days of the week do I hit my goal?

In [None]:
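# group on the index's day names directly; the weekday column is only created later in the notebook
df.groupby(df.index.day_name()).hit_goal.mean().plot.bar()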

In [None]:
df

In [None]:
# How much do I walk?
df.steps.describe()

In [None]:
df.steps.hist(bins=25, color='navy')
plt.title('Distribution of daily steps')

**Takeaways**: There is a spike right around 10,000 steps which is my daily goal. 
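
One rough way to check the spike numerically is to compare how many days land just below the goal with how many land just above it. The band width and the step counts below are made up for illustration.

```python
import pandas as pd

GOAL = 10_000
BAND = 500  # hypothetical window width, chosen only for illustration

# Made-up daily step counts
steps = pd.Series([9400, 9800, 10050, 10100, 10200, 10450, 11800, 7600])

# Series.between is inclusive on both ends by default
just_below = steps.between(GOAL - BAND, GOAL - 1).sum()
just_above = steps.between(GOAL, GOAL + BAND - 1).sum()

print(f"{just_below} days just below the goal, {just_above} days just above")
```

If far more days fall just above the goal than just below, that would be consistent with deliberately pushing past 10,000 steps, though it does not prove intent.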

In [None]:
df.distance.hist(bins=12)

### What day of the week do I walk the most steps?

In [None]:
df['weekday'] = df.index.day_name()

In [None]:
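# dayofweek is a property, not a method (Monday=0 ... Sunday=6)
df['day_of_the_week'] = df.index.dayofweek

In [None]:
# weekday is an alias for dayofweek, so this column holds the same values
df['day_total'] = df.index.weekday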

In [None]:
df

In [None]:
df.groupby('weekday').steps.mean().plot.bar()

In [None]:
plt.figure(figsize=(12,5))
df.steps.resample("M").mean().plot()
plt.title("Average Steps by Month")

In [None]:
plt.figure(figsize=(12,5))
df.steps.resample("W").mean().plot(color="Red")
plt.title("Average Steps by Week")

### What day of the week do I walk the most?

In [None]:
df.groupby('weekday').steps.describe()
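
Grouping by day name sorts the labels alphabetically, which makes the weekday tables and bar charts harder to read. A small sketch of restoring calendar order with `reindex`, using made-up step counts:

```python
import pandas as pd

# Made-up daily step counts covering two full weeks, for illustration only
idx = pd.date_range("2020-01-06", periods=14, freq="D")  # 2020-01-06 is a Monday
toy = pd.DataFrame({"steps": list(range(1000, 15000, 1000))}, index=idx)
toy["weekday"] = toy.index.day_name()

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# groupby sorts labels alphabetically; reindex restores calendar order
by_day = toy.groupby("weekday")["steps"].mean().reindex(day_order)
print(by_day)
```

Calling `.plot.bar()` on `by_day` would then draw the bars from Monday through Sunday.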