## Time Series and Forecasting
This section will walk through simple time series analysis and forecasting. It uses Pandas, NumPy, and a forecasting library developed by Facebook called Prophet.

* Forecasting is about predicting the future based on data from the past (time-series)
  - Traditional forecasting methods use classical statistical methods: linear regression analysis, logistic regression analysis, clustering, factor analysis, and time-series analysis.

* Forecasting is important in many contexts
  - Organizational operations: allows for efficient allocation of scarce resources and for setting goals based on evidence
  - Policy: understanding macro trends in environment and climate

* Difficulties in forecasting:
  - Completely automatic forecasting techniques are brittle and inflexible
  - Analysts who can produce high quality forecasts are rare because forecasting is a specialized skill, as much art as science

* Libraries such as Prophet help with many of the "artsy" components of forecasting
  - Helping to determine how much data to incorporate. Prophet works best with hourly, daily, or weekly observations with at least a few months (preferably more than a year) of history
  - It includes multiple "human-scale" seasonalities, such as day of week and time of year, to help with variations in trends across weekends and other complexities.
  - It can help track holidays that occur at irregular intervals that are known in advance (such as the Super Bowl)
  - It provides ways to gauge historical trend changes, such as due to a product change or a modification in operational data collection (such as how logs are accumulated)

* Sophisticated time series forecasting combines multiple types of analysis
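
The holiday handling mentioned above relies on a table of known dates supplied by the analyst. The sketch below builds such a table with pandas only; the column names (`holiday`, `ds`, `lower_window`, `upper_window`) follow the layout described in Prophet's documentation, while the variable name `superbowls` and the window values are illustrative assumptions.

```python
import pandas as pd

# Illustrative holiday table: one row per occurrence of the event.
superbowls = pd.DataFrame({
    'holiday': 'superbowl',
    'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
    'lower_window': 0,   # extra days before the date to include (assumed)
    'upper_window': 1,   # extra days after the date to include (assumed)
})

# With Prophet installed, the table would be passed at construction time:
#   m = Prophet(holidays=superbowls)
print(superbowls)
```

Future occurrences should be listed as well if the forecast horizon is meant to cover them.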

#### Methods for Producing Forecasts (via Prophet)
While there are many methods to create forecasts, we will focus on the mechanism used by Prophet: [additive regression models](https://en.wikipedia.org/wiki/Additive_model). 

* _An additive model is a class of nonparametric regression._
  - _In non-parametric regression, predictors are constructed according to information derived from the data._
  - _Nonparametric regression requires larger sample sizes than traditional regression methods, as the data must supply the model structure as well as the model estimates._

* Prophet forecasts are composed of:
  - Piecewise linear or logistic growth curve trend. Prophet automatically determines this by selecting changepoints from the data.
  - Yearly seasonal component modeled using Fourier series.
  - Weekly seasonal component which uses dummy encoding of variables. _Dummy encoded variables are true/false (binary) encodings of categorical information. Each one usually specifies whether an observation is a member of a specific category: for example, whether a patient has a given disease, or whether a patient was exposed to a particular type of drug. This type of encoding can be given to a model such as a regressor without implying an ordering among the categories._
  - Prophet can also take into account holidays
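
To make the two seasonal encodings above more concrete, here is a minimal sketch that builds Fourier terms for a yearly cycle and dummy columns for day of week. It illustrates the encodings only and is not Prophet's internal code; the helper name `fourier_features`, the order of 3, and the 14-day date range are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

def fourier_features(dates, period_days, order):
    """Return sin/cos columns for a seasonal cycle of the given period."""
    # Elapsed time in days since the first observation
    t = np.asarray((dates - dates[0]) / pd.Timedelta(days=1), dtype=float)
    cols = {}
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period_days
        cols[f'sin_{n}'] = np.sin(angle)
        cols[f'cos_{n}'] = np.cos(angle)
    return pd.DataFrame(cols, index=dates)

dates = pd.date_range('2017-01-01', periods=14, freq='D')

# Yearly seasonality: smooth periodic curves built from sine/cosine pairs
yearly = fourier_features(dates, period_days=365.25, order=3)

# Weekly seasonality: one true/false column per day of the week
weekly = pd.get_dummies(pd.Series(dates.day_name(), index=dates))

print(yearly.head())
print(weekly.head())
```

A regression fitted on these columns learns one coefficient per column, so the yearly pattern becomes a weighted sum of smooth waves and the weekly pattern becomes a separate offset for each weekday.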


### Import Dependencies

In [None]:
# Import Python Dependencies
try:
    from prophet import Prophet       # Prophet is a forecasting library developed by Facebook
except ImportError:
    from fbprophet import Prophet     # package name used before Prophet 1.0
import numpy as np
import pandas as pd

%matplotlib inline

### Resample and Prepare Data

In [None]:
%%time

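# Load the minute-interval data (it is resampled to daily further below)
# Note: on_bad_lines='skip' requires pandas >= 1.3; older versions used error_bad_lines=False
bit_df = pd.read_csv('../input/coinbase/coinbaseUSD_1-min_data_2014-12-01_to_2018-01-08.csv',
  low_memory=False, on_bad_lines='skip')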
bit_df['Timestamp'] = bit_df.Timestamp.astype('int', errors='ignore')


# Convert unix time to datetime
bit_df['date'] = pd.to_datetime(bit_df.Timestamp, unit='s', errors='coerce')

# Use the datetime column as the index
bit_df = bit_df.set_index('date')

# Coerce the price columns to numeric; values that cannot be parsed become NaN
bit_df['Open'] = pd.to_numeric(bit_df['Open'], errors='coerce')
bit_df['Close'] = pd.to_numeric(bit_df['Close'], errors='coerce')
bit_df['High'] = pd.to_numeric(bit_df['High'], errors='coerce')
bit_df['Low'] = pd.to_numeric(bit_df['Low'], errors='coerce')

# Rename columns to shorter names that are easier to type
bit_df = bit_df.rename(columns={'Open':'open', 'High': 'hi', 'Low': 'lo',
  'Close': 'close', 'Volume_(BTC)': 'vol_btc',
  'Volume_(Currency)': 'vol_cur',
  'Weighted_Price': 'wp', 'Timestamp': 'ts'})

# Resample to daily frequency and keep only the most recent 1000 days
bit_df = bit_df.resample('d').agg({'open': 'mean', 'hi': 'mean',
  'lo': 'mean', 'close': 'mean', 'vol_btc': 'sum',
  'vol_cur': 'sum', 'wp': 'mean', 'ts': 'min'}).iloc[-1000:]

# drop last row as it is not complete
bit_df = bit_df.iloc[:-1]
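
The resampling step above averages the price columns and sums the volume columns within each day. A minimal sketch on synthetic data shows the same pattern in isolation; the values and the name `toy` are made up for illustration and are not taken from the Coinbase file.

```python
import numpy as np
import pandas as pd

# Synthetic minute-level data covering three full days (hypothetical values)
idx = pd.date_range('2018-01-01', periods=3 * 24 * 60, freq='min')
toy = pd.DataFrame({'close': np.linspace(100.0, 200.0, len(idx)),
                    'vol_btc': 1.0}, index=idx)

# Prices are averaged per day; volumes are added up per day
daily = toy.resample('D').agg({'close': 'mean', 'vol_btc': 'sum'})
print(daily)
```

Each output row represents one calendar day, which is why a partially observed final day is dropped before modeling.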

In [None]:
# Prophet needs a `ds` (datestamp) column and a `y` (value) column
ts = (bit_df
    .reset_index()
    .rename(columns={'date': 'ds', 'close': 'y'})
    [['ds', 'y']]
)

View data types:

In [None]:
ts.dtypes

Set the index of the dataset to the date column `ds` (in order to visualize the relationship):

In [None]:
# ts.set_index('ds').plot(figsize=(14,10))
ts.set_index('ds')

In [None]:
m = Prophet(daily_seasonality=True)
m.fit(ts)

Use Prophet to create forecasts from the historical data:

In [None]:
# Make a future object and predict into it
future = m.make_future_dataframe(periods=24)
forecast = m.predict(future)
forecast

Transpose the DataFrame to look at the structure of the forecast:

In [None]:
forecast.T

Plot the prediction, including the uncertainty lines:

In [None]:
# plot the prediction, include the uncertainty lines
ax = m.plot(forecast, uncertainty=True)

Inspect the component trends:

* These plots show how the data trends when summarized over each period (overall, yearly, weekly, and daily)

In [None]:
# look at the trend, yearly, weekly and daily components
ax = m.plot_components(forecast)

#### Exercise: Snow Data
This exercise looks at forecasting using snow data.

* Use Prophet to predict Snow Depth (SNWD) 100 days into the future
* What month has the most snow?


#### Try Using Log of Data
Predictions may work better if we tweak the data. In this case, let's try taking the log of the Bitcoin price. _Most predictive and analytic techniques work best when they can find a meaningful separation. For some types of data, this can be difficult because of how spread out or compact the values are. Applying a transform, such as taking the `log`, gives the model a better chance of finding a meaningful pattern._

In [None]:
ts2 = ts.assign(y=lambda x: np.log(x.y))
# ts2.set_index('ds').plot()
ts2.set_index('ds')

In [None]:
m2 = Prophet()  # daily_seasonality=True is not needed here
m2.fit(ts2)
future2 = m2.make_future_dataframe(periods=24)
forecast2 = m2.predict(future2)

In [None]:
# plot the prediction, include the uncertainty lines
ax = m2.plot(forecast2, uncertainty=True)

In [None]:
ax = m2.plot_components(forecast2)
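
Because the model above was fitted on log prices, its predictions are also in log units. One way to read them as prices again is to apply the inverse transform, `np.exp`, to the prediction columns. The sketch below uses a small hand-made frame as a stand-in for `forecast2`; the dates and numbers are hypothetical, and only the column names (`yhat`, `yhat_lower`, `yhat_upper`) match what Prophet returns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of `forecast2` (values in log units)
log_forecast = pd.DataFrame({
    'ds': pd.to_datetime(['2018-01-08', '2018-01-09']),
    'yhat': np.log([15000.0, 15500.0]),
    'yhat_lower': np.log([13000.0, 13200.0]),
    'yhat_upper': np.log([17000.0, 17900.0]),
})

# Undo the log transform on the prediction and its uncertainty bounds
value_cols = ['yhat', 'yhat_lower', 'yhat_upper']
price_forecast = log_forecast.copy()
price_forecast[value_cols] = np.exp(log_forecast[value_cols])
print(price_forecast)
```

Note that an interval that is symmetric in log units becomes asymmetric around `yhat` once it is converted back to prices.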

#### Exercise: Log of Time Series
This exercise revisits the snow data forecast, this time using a log transform.

* Run the snow calculation using the log of the snow depth. Does it track better? _Hint: might need to add 1 before logging_
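
The hint about adding 1 matters because snow depth is often exactly zero, and `log(0)` is not finite. A minimal sketch of one way to handle this uses NumPy's `log1p` and `expm1` pair; the depth values below are hypothetical.

```python
import numpy as np

depth = np.array([0.0, 0.0, 2.5, 10.0])   # hypothetical snow depths

log_depth = np.log1p(depth)    # computes log(1 + depth), finite at zero
restored = np.expm1(log_depth) # inverse transform: exp(x) - 1

print(log_depth)
print(restored)
```

Applying `np.expm1` to the forecast columns would bring predictions back to the original depth units.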