# Time Series Analysis in the Real World:  Climate Change

You are a data scientist interested in climate change.  You obtain a dataset containing the average monthly temperatures over 10 years (1981-1990) in the city of Melbourne, Australia.  You would like to build a predictive model to forecast Melbourne's temperature based on the observed data.  This data is stored as a CSV file on the Math@Work server.

In [1]:
import pandas as pd
temperatures = pd.read_csv('https://www.mathatwork.org/DATA/melbourne-temps-monthly.csv')

**1]** Examine the characteristics of your dataset.

**2]** Examine the summary statistics of your dataset.

**3]** Are there any outliers in your dataset?  Is the data approximately normal or is it skewed?  Explain and support your explanation with analyses.
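
One possible starting point for this question is Tukey's 1.5 × IQR rule together with the sample skewness. The sketch below runs on a synthetic monthly series, not the real file: the generated values and the name `Temp` are assumptions made only so the example is self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 120 monthly values with a yearly cycle.
rng = np.random.default_rng(0)
months = pd.date_range("1981-01-01", periods=120, freq="MS")
temps = pd.Series(
    15 + 5 * np.cos(2 * np.pi * np.arange(120) / 12) + rng.normal(0, 1, 120),
    index=months,
    name="Temp",
)

# Tukey's rule: flag points more than 1.5 * IQR beyond the quartiles.
q1, q3 = temps.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = temps[(temps < lower) | (temps > upper)]

# Skewness near 0 suggests a roughly symmetric distribution.
skewness = temps.skew()

print(f"{len(outliers)} outliers outside [{lower:.2f}, {upper:.2f}]")
print(f"skewness: {skewness:.3f}")
```

A histogram or box plot of the same series would be a natural visual check to pair with these numbers.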

Assume that the data has been preprocessed to contain no missing values or bad data entries, so you may treat it as being of good quality for analysis.

**4]** Plot a line chart of the data.  Explain in the cell below your graph whether there is clear evidence of trend and/or seasonality.

**5]** Plot the rolling statistics (for example, the rolling mean and rolling standard deviation).  In the cell below your graph, explain whether the rolling statistics show clear evidence of trend and/or seasonality.
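
As a rough illustration of rolling statistics, the sketch below computes a 12-month rolling mean and standard deviation with pandas. The synthetic series, the name `Temp`, and the 12-month window are assumptions chosen for the example; other window lengths are equally valid.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: a yearly cycle plus noise.
rng = np.random.default_rng(1)
months = pd.date_range("1981-01-01", periods=120, freq="MS")
temps = pd.Series(
    15 + 5 * np.cos(2 * np.pi * np.arange(120) / 12) + rng.normal(0, 1, 120),
    index=months,
    name="Temp",
)

# A 12-month window averages over one full seasonal cycle.
window = 12
rolling_mean = temps.rolling(window=window).mean()
rolling_std = temps.rolling(window=window).std()

summary = pd.DataFrame(
    {"observed": temps, "rolling_mean": rolling_mean, "rolling_std": rolling_std}
)
print(summary.tail())
```

The first `window - 1` rows of each rolling column are NaN because a full window is not yet available. A rolling mean that drifts steadily up or down hints at trend, while one that stays flat when the window spans a full year suggests the variation is mostly seasonal.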

**6]** Use **seasonal_decompose( )** to decompose your time series data.  Plot the decomposition and explain in the cell below your graph whether there is clear evidence of trend and/or seasonality.

**7]** Use a Dickey-Fuller test to determine whether your *temperatures* dataset is stationary.  In the cell below your analysis, explain your conclusion at a 95% confidence level.

**8]** Apply first-order differencing to the *temperatures* data.
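
First-order differencing replaces each value with its change from the previous month. A minimal sketch with pandas follows, assuming a DataFrame named *temperatures* with a single hypothetical column `Temp`; the values are synthetic.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data.
rng = np.random.default_rng(3)
months = pd.date_range("1981-01-01", periods=120, freq="MS")
temperatures = pd.DataFrame(
    {"Temp": 15 + 5 * np.cos(2 * np.pi * np.arange(120) / 12) + rng.normal(0, 1, 120)},
    index=months,
)

# diff() computes x[t] - x[t-1]; the first row has no predecessor, so drop it.
temperatures_first_diff = temperatures.diff().dropna()

print(temperatures_first_diff.head())
```

The differenced series is one row shorter than the original, which matters later when aligning it with dates.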

**9]** Apply a Dickey-Fuller test to recheck stationarity.  In the cell below your analysis, interpret the p-value at a 95% confidence level.

**10]** Assuming that *temperatures_first_diff* was confirmed stationary, use Python's **auto_arima( )** (available, for example, in the *pmdarima* package) to find the best ARIMA model to make predictions regarding Melbourne's average monthly temperatures.  Pass in *seasonal=True* if there was evidence of seasonality present.

**11]** Split the data into testing and training subsets.  Use 1/1/89 to 12/1/90 as your test data and train on the rest.
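
With a datetime index, a date-based split can be written with label slicing. The sketch below assumes a DataFrame named *temperatures* indexed by month start dates with a hypothetical column `Temp`; the values are synthetic.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 120 months, 1981-01 to 1990-12.
months = pd.date_range("1981-01-01", periods=120, freq="MS")
temperatures = pd.DataFrame(
    {"Temp": 15 + 5 * np.cos(2 * np.pi * np.arange(120) / 12)},
    index=months,
)

# .loc slicing on a DatetimeIndex includes both endpoints.
train = temperatures.loc[:"1988-12-01"]
test = temperatures.loc["1989-01-01":"1990-12-01"]

print(len(train), "training months;", len(test), "testing months")
```

Splitting by date keeps the test set strictly after the training set, which a random shuffle would not guarantee for time series.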

**12]** Use the training data to train your model.

**13]** Use the testing data to evaluate your model.  In the cell below your analysis, explain how well your model predicts.
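
One common way to score predictions against held-out values is R Squared, defined as $1 - SS_{res}/SS_{tot}$. The sketch below computes it directly with NumPy; the function name `r_squared` and the four sample values are made up for illustration.

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)      # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

# Made-up values purely for illustration.
actual = [20.0, 18.0, 15.0, 12.0]
predicted = [19.0, 18.0, 16.0, 12.0]

score = r_squared(actual, predicted)
print(f"R squared: {score:.3f}")
```

Values close to 1 mean the predictions explain most of the variation in the test data. On out-of-sample data the score can be negative if the model does worse than simply predicting the mean.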

**14]** Convert your predictions to a Pandas DataFrame.  Then use Pandas **.sort_index( )** to reorder the DataFrame by Month.  Print your predictions to the screen.
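
As a small illustration of `.sort_index( )`, the sketch below builds a predictions DataFrame whose rows are out of calendar order and then sorts it by its date index. The column name `Predictions` and the three values are hypothetical.

```python
import pandas as pd

# Hypothetical predictions whose rows came back out of calendar order.
predictions = pd.DataFrame(
    {"Predictions": [14.2, 20.1, 17.5]},
    index=pd.to_datetime(["1989-03-01", "1989-01-01", "1989-02-01"]),
)
predictions.index.name = "Month"

# sort_index() returns a new DataFrame ordered by the index labels.
predictions = predictions.sort_index()
print(predictions)
```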

**15]** Plot the predictions vs the actual temperatures data values to help you see how good your predictions are.  Remember to use Pandas **.sort_index( )** to reorder the test DataFrame by Month before plotting.

**16]** Does the plot verify what R Squared indicated regarding how well future data is likely to be predicted by the model? Explain.

**17]** Assuming your R Squared is satisfactory and therefore indicates good model performance, use the ARIMA model to make predictions about future Melbourne average monthly temperatures.  Refit the model to the entire *temperatures* time series dataset.  Print the refitted model summary.

**18]** Make predictions 24 months into the future.

**19]** Run the following code to create the date months for which you are projecting, and then convert your projections into a DataFrame indexed by those future months.  The code assumes you named your time series DataFrame *temperatures* (with a datetime index) and your projections *projections*.

In [None]:
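from dateutil.relativedelta import relativedelta

# Start from the last observed month and step forward one month per projection.
new_date = temperatures.index[-1]
new_index = []

for _ in range(len(projections)):
    new_date = new_date + relativedelta(months=+1)
    new_index.append(new_date)

# Re-index the projections by the future months.
projections = pd.DataFrame(projections, index=new_index, columns=['Projections'])
print(projections)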

**20]** Plot your *temperatures* and *projections* data.

**21]** Do your projections indicate a trend in Melbourne's average monthly temperatures? Explain.