### CASE:
- Analyze Zillow's housing data and recommend the best zip codes in Ocala, Florida, for real estate investment today.

### Goal:
- To predict the two best zip codes to invest in for the strongest expected returns

### Assumptions:
- You are an investor with a minimum of $125,000 to deploy upfront.
- Your investment time horizon is a minimum of 3 years and a maximum of 10 years (this is not a liquid investment).
- You seek to maximize growth potential by tapping into home value appreciation in Ocala, the "Horse Capital of the World".
- You are aware that all returns are subject to future market conditions and that the investment is a calculated risk.

### Why OCALA:
- Ocala is one of only five cities (four in the US and one in France) permitted under Chamber of Commerce guidelines to use the title "Horse Capital of the World", based on the annual revenue produced by the horse industry.
- In the last decades of the twentieth century, the greater Ocala area had one of the highest growth rates in the country for a city its size. 
- There are 30 elementary, ten middle and ten public high schools in Marion County

### Load the Data/Filtering for Chosen Zipcodes

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline 
import datetime
import seaborn as sns
pd.set_option('display.max_columns',500)
pd.set_option('display.max_rows', 500)
pd.set_option('display.width', 1000)
import plotly.graph_objects as go

In [None]:
df=pd.read_csv("zillow_data.csv")

In [None]:
df.shape

In [None]:
df.head() 

In [None]:
df.info() 

In [None]:
df['City'].value_counts() 

#### Listing only the columns that contain null values

In [None]:
null_columns = df.columns[df.isnull().any()]
df[null_columns].isnull().sum()
#print(df[df.isnull().any(axis=1)][null_columns].head()) 

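#### Dropping rows that contain any null value (keeping only rows with all 272 columns populated)

In [None]:
df = df.dropna(thresh=272)  # keep rows with at least 272 non-null values, i.e. no nulls at all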
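`thresh` is the minimum number of non-null values a row needs in order to be kept, so when it equals the column count it behaves like a plain `dropna()`. A small toy frame (made-up values) shows the equivalence:

```python
import numpy as np
import pandas as pd

# Toy frame with 3 columns: row 0 is complete, row 1 has one NaN, row 2 has two
toy = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [4.0, np.nan, np.nan],
    "c": [7.0, 8.0, 9.0],
})

# thresh = minimum number of non-null values a row needs to survive
kept_thresh = toy.dropna(thresh=toy.shape[1])
# With thresh equal to the column count, this matches dropping any row with a NaN
kept_any = toy.dropna()

print(kept_thresh.index.tolist())
print(kept_thresh.equals(kept_any))
```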

In [None]:
df.shape

In [None]:
df.isnull().any()

##### Observation: City, State, Metro & CountyName are the four object (string) columns

* The data timeline is monthly, from April 1996 to April 2018
* The RegionName column appears to hold the zip code
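The column layout described above can be checked programmatically. This sketch assumes the month columns are labelled `YYYY-MM` and uses a hypothetical subset of the column names; on the real data, `df.columns` would be used instead. The full April 1996 to April 2018 range has 265 months, which together with the seven metadata columns accounts for the 272 columns mentioned earlier.

```python
import pandas as pd

# Hypothetical subset of the column names: 7 metadata columns, then one column per month
columns = ["RegionID", "RegionName", "City", "State", "Metro", "CountyName", "SizeRank",
           "1996-04", "1996-05", "2018-03", "2018-04"]

# Month columns parse as YYYY-MM dates; everything else becomes NaT
parsed = pd.to_datetime(pd.Series(columns), format="%Y-%m", errors="coerce")
date_cols = [c for c, d in zip(columns, parsed) if pd.notna(d)]
meta_cols = [c for c, d in zip(columns, parsed) if pd.isna(d)]

# Number of monthly columns expected from April 1996 through April 2018 (inclusive)
n_months = len(pd.period_range("1996-04", "2018-04", freq="M"))

print(meta_cols)
print(date_cols[0], date_cols[-1], n_months)
```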

#### Number of unique values in the first eight columns

In [None]:
df.iloc[:,0:8].nunique()

In [None]:
df.describe() 

##### Working on the City column & RegionName (zip code)

In [None]:
df['City'].value_counts() 

#### Top two cities by number of zip codes

In [None]:
df_NY = df[df['City'] == 'New York']
df_LA = df[df['City'] == 'Los Angeles']

In [None]:
print(f'{df_NY.RegionName.nunique()} unique Zip Code/s are there in New York.')

In [None]:
print(f'{df_LA.RegionName.nunique()} unique Zip Code/s are there in Los Angeles.')

In [None]:
df_NY.head() 

In [None]:
df_LA.head() 

### Ocala EDA

In [None]:
df_Ocala= df[df['City'] == 'Ocala']

In [None]:
print(f'{df_Ocala.RegionName.nunique()} unique Zip Code/s are there in Ocala.')

In [None]:
df_Ocala.head(10) 

In [None]:
df_Ocala.info() 

In [None]:
def show_distplot(dataframe, label, column_name):
    # sns.distplot is deprecated; histplot with a KDE overlay draws the equivalent chart
    sns.histplot(dataframe[column_name], kde=True)
    plt.title(f'{label} {column_name}')

In [None]:
df['1996-04'].min()

In [None]:
df_Ocala['1996-04'].min()

In [None]:
df['1996-04'].max()

In [None]:
df_Ocala['1996-04'].max()

In [None]:
show_distplot(df_Ocala, 'Ocala', '1996-04')

In [None]:
show_distplot(df_Ocala, 'Ocala', '2018-04')

In [None]:
fig = plt.figure(figsize = (12, 8))
fig.suptitle('Apr_1996')

ax1 = plt.subplot(1, 2, 1)  # left panel: New York zip codes
plt.scatter(df_NY['RegionName'], df_NY['1996-04'])
ax1.set_title('NY by Region')

ax2 = plt.subplot(1, 2, 2)  # right panel: Los Angeles zip codes
plt.scatter(df_LA['RegionName'], df_LA['1996-04'])
ax2.set_title('LA by Region')

In [None]:
list(df_Ocala.columns)

In [None]:
for x in list(df_Ocala.columns)[2:]:
    print(x)

#### Finding the min & max value in Ocala for each month

In [None]:
min_dict = {}
for x in list(df_Ocala.columns)[7:]:  # month columns only; the first 7 columns are metadata
    min_dict[x] = df_Ocala[x].min()

In [None]:
min_dict

In [None]:
max_dict = {}
for x in list(df_Ocala.columns)[7:]:  # month columns only; the first 7 columns are metadata
    max_dict[x] = df_Ocala[x].max()

In [None]:
max_dict
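The two loops above can also be written as vectorized reductions. A sketch on a toy frame (values made up), assuming string column labels and keeping only the month columns so that text columns such as `City` are not compared as strings:

```python
import pandas as pd

# Toy stand-in for df_Ocala before the metadata columns are dropped (values are made up)
toy = pd.DataFrame({
    "RegionName": [34471, 34474],
    "City": ["Ocala", "Ocala"],
    "1996-04": [70000, 80000],
    "1996-05": [71000, 79500],
})

# Keep only the month columns (labels starting with a 4-digit year)
month_cols = [c for c in toy.columns if c[:4].isdigit()]

# One reduction per column: min/max across zip codes for every month
toy_min = toy[month_cols].min()
toy_max = toy[month_cols].max()
toy_min_dict = toy_min.to_dict()

print(toy_min_dict)
print(toy_max.to_dict())
```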

#### Dropping the metadata columns from df_Ocala

In [None]:
df_Ocala=df_Ocala.drop(["RegionID", "City", "State", "Metro", "CountyName", "SizeRank"], axis=1)

In [None]:
df_Ocala.head() 

#### Renaming RegionName to Zipcode

In [None]:
df_Ocala.rename(columns={'RegionName': 'Zipcode'}, inplace=True)
df_Ocala.head()

#### Setting Zipcode as the index

In [None]:
df_Ocala.set_index('Zipcode', inplace=True)

In [None]:
df_Ocala.info() 

#### Transposing so that dates become rows

In [None]:
df_Ocala=df_Ocala.transpose()
df_Ocala.head() 
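A toy illustration of the reshape (values made up): after `set_index('Zipcode')` and `transpose()`, each column holds one zip code's monthly series. The sketch also parses the index and converts the column labels to strings, as the notebook does a few cells later, and additionally attaches an explicit month-start frequency with `asfreq('MS')`; that last step is an assumption, not part of the original pipeline, but it lets statsmodels infer the monthly period.

```python
import pandas as pd

# Toy wide table shaped like df_Ocala after dropping metadata (values are made up)
toy_wide = pd.DataFrame(
    {"Zipcode": [34471, 34474], "1996-04": [70000, 80000], "1996-05": [71000, 79500]}
).set_index("Zipcode")

# Transpose: one row per month, one column per zip code
toy_ts = toy_wide.transpose()
# Parse the month labels and attach an explicit month-start frequency
toy_ts.index = pd.to_datetime(toy_ts.index)
toy_ts = toy_ts.asfreq("MS")
# String column labels, so columns can be selected as toy_ts["34471"]
toy_ts.columns = toy_ts.columns.astype(str)

print(toy_ts)
```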

In [None]:
df_Ocala.describe() 

In [None]:
df_Ocala.plot(figsize=(17,8))

#### Observation: the df_Ocala time series are evidently not stationary

#### Zip codes with the top 5 mean values

In [None]:
Ocala_mean = df_Ocala.mean()

In [None]:
Ocala_mean.head(10) 

In [None]:
Ocala_mean.nlargest() 
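Ranking by mean value reflects price level rather than appreciation, which is what the stated goal is about. A minimal sketch of ranking by total return and compound annual growth rate (CAGR) instead, on three hypothetical zip codes with made-up values; it assumes exactly one row per month, as in `df_Ocala` after the datetime conversion.

```python
import numpy as np
import pandas as pd

# Three hypothetical zip codes over 3 years of monthly values (numbers are made up)
idx = pd.date_range("2015-04-01", periods=37, freq="MS")
toy_values = pd.DataFrame({
    "A": np.linspace(200000, 220000, 37),  # expensive, slow growth
    "B": np.linspace(100000, 130000, 37),  # cheaper, faster growth
    "C": np.linspace(150000, 160000, 37),
}, index=idx)

years = (len(toy_values) - 1) / 12  # assumes exactly one row per month
total_return = toy_values.iloc[-1] / toy_values.iloc[0] - 1
cagr = (1 + total_return) ** (1 / years) - 1

print(toy_values.mean().idxmax())          # highest mean value
print(cagr.sort_values(ascending=False))   # fastest annualized growth first
```

Here the highest-mean zip code and the fastest-growing one differ, which is the point of checking both.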

#### Converting index to datetime 

In [None]:
df_Ocala.index=pd.to_datetime(df_Ocala.index)
df_Ocala.info()

#### Converting column names to string

In [None]:
df_Ocala.columns = df_Ocala.columns.astype(str)

### Dickey-Fuller Test

In [None]:
from statsmodels.tsa.stattools import adfuller

# Helper to plot rolling statistics and run the augmented Dickey-Fuller test on one series
def test_stationarity(timeseries, window):
    
    # Determine rolling statistics
    rolmean = timeseries.rolling(window=window).mean()
    rolstd = timeseries.rolling(window=window).std()

    # Plot rolling statistics
    fig = plt.figure(figsize=(12, 8))
    plt.plot(timeseries.iloc[window:], color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show()
    
    # Perform the Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)
    return dfoutput

##### Testing stationarity

In [None]:
# # Test the stationarity of the untransformed dataset (disabled: adfuller needs a 1-D series, not a whole DataFrame)
# test_stationarity(df_Ocala, 18)

- As the rolling mean across all zip codes shows, the assumption of stationarity is not met: the rolling mean is not constant over time.
- Stationarity is therefore tested for individual zip codes below.

#### 34471

In [None]:
data71 = df_Ocala["34471"]
dftest = adfuller(data71)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

print ('Results of Dickey-Fuller Test:')

print(dfoutput)

if dftest[0] < dftest[4]['5%']:
    print('Reject the null hypothesis at 5%: the time series is stationary')
else:
    print('Fail to reject the null hypothesis at 5%: the time series is non-stationary')

##### Testing stationarity

In [None]:
output34471 = test_stationarity(df_Ocala['34471'], 18)

In [None]:
df_Ocala['34471'].plot(figsize=(17,8)) 

#### 34474

In [None]:
data74 = df_Ocala["34474"]
dftest = adfuller(data74)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

print ('Results of Dickey-Fuller Test:')

print(dfoutput)

if dftest[0] < dftest[4]['5%']:
    print('Reject the null hypothesis at 5%: the time series is stationary')
else:
    print('Fail to reject the null hypothesis at 5%: the time series is non-stationary')

##### Testing stationarity

In [None]:
output34474 = test_stationarity(df_Ocala['34474'], 18)

In [None]:
df_Ocala['34474'].plot(figsize=(12, 8)) 

#### 34476

In [None]:
data76 = df_Ocala["34476"]
dftest = adfuller(data76)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

print ('Results of Dickey-Fuller Test:')

print(dfoutput)

if dftest[0] < dftest[4]['5%']:
    print('Reject the null hypothesis at 5%: the time series is stationary')
else:
    print('Fail to reject the null hypothesis at 5%: the time series is non-stationary')

##### Testing stationarity

In [None]:
output34476 = test_stationarity(df_Ocala['34476'], 18)

#### 34480

In [None]:
data80 = df_Ocala["34480"]
dftest = adfuller(data80)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

print ('Results of Dickey-Fuller Test:')

print(dfoutput)

if dftest[0] < dftest[4]['5%']:
    print('Reject the null hypothesis at 5%: the time series is stationary')
else:
    print('Fail to reject the null hypothesis at 5%: the time series is non-stationary')

##### Testing stationarity

In [None]:
output34480 = test_stationarity(df_Ocala['34480'], 18)

#### 34482

In [None]:
data82 = df_Ocala["34482"]
dftest = adfuller(data82)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

print ('Results of Dickey-Fuller Test:')

print(dfoutput) 

if dftest[0] < dftest[4]['5%']:
    print('Reject the null hypothesis at 5%: the time series is stationary')
else:
    print('Fail to reject the null hypothesis at 5%: the time series is non-stationary')

##### Testing stationarity

In [None]:
output34482 = test_stationarity(df_Ocala['34482'], 18)

#### Preparing a DataFrame for Results of Dickey-Fuller Test

In [None]:
df_dft = pd.concat([output34471, output34474, output34476, output34480, output34482], axis=1,
                   keys=['34471', '34474', '34476', '34480', '34482'])  # label columns by zip code

In [None]:
df_dft

- 34474 & 34482 have the lowest p-values and use more lags
- If the test statistic is less than the critical value, reject the null hypothesis of a unit root (non-stationarity)
- If the test statistic is greater than the critical value, fail to reject the null hypothesis
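The decision rule above can be applied to the whole summary table at once. A sketch on an illustrative table (made-up numbers for hypothetical zip codes, not this notebook's results), laid out like `df_dft` with one column per zip code:

```python
import pandas as pd

# Illustrative ADF summary for three hypothetical zip codes (made-up numbers)
df_dft_demo = pd.DataFrame(
    {"zip_a": [-1.2, 0.66, -2.87],
     "zip_b": [-3.4, 0.01, -2.87],
     "zip_c": [-2.9, 0.04, -2.87]},
    index=["Test Statistic", "p-value", "Critical Value (5%)"],
)

# Reject the unit-root null when the statistic is below (more negative than) the 5% critical value
reject = df_dft_demo.loc["Test Statistic"] < df_dft_demo.loc["Critical Value (5%)"]
# Smaller p-values mean stronger evidence against the unit root
lowest_two = df_dft_demo.loc["p-value"].nsmallest(2)

print(reject.to_dict())
print(lowest_two.index.tolist())
```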

In [None]:
df_Ocala.plot(figsize = (20,15), subplots=True, legend=True)
plt.show()

In [None]:
df_Ocala.plot(figsize = (20,6), style = ".r")
plt.show()

In [None]:
df_Ocala.shape

### Log Transformation

In [None]:
def log_transformation(ts_data_frame):
    # Run the Dickey-Fuller test on the log of every zip code column of the frame passed in
    for col in ts_data_frame.columns:
        log_series = np.log(ts_data_frame[col])
        dftest = adfuller(log_series)
        dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
        for key, value in dftest[4].items():
            dfoutput['Critical Value (%s)' % key] = value
        print(f'Results of Dickey-Fuller Test (log of {col}):')
        print(dfoutput)

In [None]:
log_transformation(df_Ocala)
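Why the log helps: with compounded growth, the monthly increase in levels keeps getting larger, while in log space every step is the same size. A sketch on a hypothetical series growing 0.5% per month; note that the log alone still leaves a straight-line trend, which is why differencing follows.

```python
import numpy as np
import pandas as pd

# Hypothetical home value growing 0.5% per month, compounded
months = pd.date_range("2010-01-01", periods=24, freq="MS")
values = pd.Series(100000 * 1.005 ** np.arange(24), index=months)

level_steps = values.diff().dropna()          # month-to-month change in dollars
log_steps = np.log(values).diff().dropna()    # month-to-month change in log units

print(level_steps.iloc[0], level_steps.iloc[-1])   # the dollar step keeps growing
print(np.allclose(log_steps, np.log(1.005)))        # every log step is the same size
```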

#### 34471

In [None]:
log_71 = pd.Series(np.log(df_Ocala["34471"]))
fig = plt.figure(figsize=(12,6))
plt.plot(log_71, color="blue")
plt.xlabel("month", fontsize=14)
plt.ylabel("log(home value)", fontsize=14)
plt.show()

In [None]:
data71_log = log_71
dftest = adfuller(data71_log)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

print ('Results of Dickey-Fuller Test:')

print(dfoutput)

if dftest[0] < dftest[4]['5%']:
    print('Reject the null hypothesis at 5%: the time series is stationary')
else:
    print('Fail to reject the null hypothesis at 5%: the time series is non-stationary')

##### Testing stationarity after log transformation

In [None]:
log_output34471 = test_stationarity(data71_log, 18)

#### 34474

In [None]:
log_74 = pd.Series(np.log(df_Ocala["34474"]))
fig = plt.figure(figsize=(12,6))
plt.plot(log_74, color="blue")
plt.xlabel("month", fontsize=14)
plt.ylabel("log(home value)", fontsize=14)
plt.show()

In [None]:
data74_log = log_74
dftest = adfuller(data74_log)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

print ('Results of Dickey-Fuller Test:')

print(dfoutput)

if dftest[0] < dftest[4]['5%']:
    print('Reject the null hypothesis at 5%: the time series is stationary')
else:
    print('Fail to reject the null hypothesis at 5%: the time series is non-stationary')

##### Testing stationarity after log transformation

In [None]:
log_output34474 = test_stationarity(data74_log, 18)

#### 34476

In [None]:
log_76 = pd.Series(np.log(df_Ocala["34476"]))
fig = plt.figure(figsize=(12,6))
plt.plot(log_76, color="blue")
plt.xlabel("month", fontsize=14)
plt.ylabel("log(home value)", fontsize=14)
plt.show()

In [None]:
data76_log = log_76
dftest = adfuller(data76_log)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

print ('Results of Dickey-Fuller Test:')

print(dfoutput)

if dftest[0] < dftest[4]['5%']:
    print('Reject the null hypothesis at 5%: the time series is stationary')
else:
    print('Fail to reject the null hypothesis at 5%: the time series is non-stationary')

##### Testing stationarity after log transformation

In [None]:
log_output34476 = test_stationarity(data76_log, 18)

#### 34480

In [None]:
log_80 = pd.Series(np.log(df_Ocala["34480"]))
fig = plt.figure(figsize=(12,6))
plt.plot(log_80, color="blue")
plt.xlabel("month", fontsize=14)
plt.ylabel("log(home value)", fontsize=14)
plt.show()

In [None]:
data80_log = log_80
dftest = adfuller(data80_log)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

print ('Results of Dickey-Fuller Test:')

print(dfoutput)

if dftest[0] < dftest[4]['5%']:
    print('Reject the null hypothesis at 5%: the time series is stationary')
else:
    print('Fail to reject the null hypothesis at 5%: the time series is non-stationary')

##### Testing stationarity after log transformation

In [None]:
log_output34480 = test_stationarity(data80_log, 18)

#### 34482

In [None]:
log_82 = pd.Series(np.log(df_Ocala["34482"]))
fig = plt.figure(figsize=(12,6))
plt.plot(log_82, color="blue")
plt.xlabel("month", fontsize=14)
plt.ylabel("log(home value)", fontsize=14)
plt.show()

In [None]:
data82_log = log_82
dftest = adfuller(data82_log)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dftest)

print ('Results of Dickey-Fuller Test:')

print(dfoutput)

if dftest[0] < dftest[4]['5%']:
    print('Reject the null hypothesis at 5%: the time series is stationary')
else:
    print('Fail to reject the null hypothesis at 5%: the time series is non-stationary')

##### Testing stationarity after log transformation

In [None]:
log_output34482 = test_stationarity(data82_log, 18)

#### Preparing a DataFrame for Results of Dickey-Fuller Test post log transformation

In [None]:
df_dft_log = pd.concat([log_output34471, log_output34474, log_output34476, log_output34480, log_output34482], axis=1,
                       keys=['34471', '34474', '34476', '34480', '34482'])  # label columns by zip code

In [None]:
df_dft_log

In [None]:
df_dft

#### Differencing

In [None]:
## 12-month (year-over-year) differencing of the log values
df_Ocala_log = np.log(df_Ocala)  # log transformation first
Ocala_diff = df_Ocala_log.diff(periods=12)  # each month minus the same month one year earlier

fig = plt.figure(figsize=(11,7))
plt.plot(Ocala_diff)
plt.legend(Ocala_diff.columns, loc='best')
plt.title('12-month differenced log home values')
plt.show(block=False)

In [None]:
# The first 12 rows are NaN (there is no value 12 months earlier), so drop them
Ocala_diff = Ocala_diff[12:]
Ocala_diff.head(10)
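What `diff(periods=12)` on log values computes: each entry is log(y_t / y_{t-12}), the log of year-over-year growth, so any pattern that repeats every 12 months cancels out. A sketch on a hypothetical series with constant growth and a made-up seasonal factor:

```python
import numpy as np
import pandas as pd

# Hypothetical series: 0.4% monthly growth times a made-up seasonal factor repeating every 12 months
n_obs = 48
months = pd.date_range("2012-01-01", periods=n_obs, freq="MS")
season = np.tile([1.00, 1.01, 1.02, 1.01, 1.00, 0.99, 0.98, 0.99, 1.00, 1.01, 1.00, 0.99], n_obs // 12)
values = pd.Series(100000 * 1.004 ** np.arange(n_obs) * season, index=months)

log_diff12 = np.log(values).diff(periods=12)

# The first 12 entries have no value 12 months earlier, so they are NaN
print(log_diff12.iloc[:12].isna().all())
# Every remaining entry equals 12 * log(1.004): the seasonal factor cancels, leaving yearly growth
print(np.allclose(log_diff12.iloc[12:], 12 * np.log(1.004)))
```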

#### Dickey-Fuller test for each zip code

In [None]:
def test_dickey_fuller(ts_data_frame):
    # Run the Dickey-Fuller test on every column of the frame passed in
    for col in ts_data_frame.columns:
        dftest = adfuller(ts_data_frame[col])
        dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
        for key, value in dftest[4].items():
            dfoutput['Critical Value (%s)' % key] = value
        print(f'Results of Dickey-Fuller Test ({col}):')
        print(dfoutput)

In [None]:
# 4-month rolling mean of the differenced series; its first 3 rows are NaN and are sliced off
rolmean = Ocala_diff.rolling(window = 4).mean()
rolmean=rolmean[3:]
rolmean.head()
test_dickey_fuller(rolmean) 
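Why slicing off three rows is enough: a rolling window of 4 needs four observations, so exactly the first three results are NaN. A small check on hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical differenced values for one zip code
s = pd.Series([0.02, 0.04, 0.06, 0.08, 0.10])
smoothed = s.rolling(window=4).mean()

print(smoothed.isna().sum())   # number of leading NaNs
print(smoothed.dropna())       # the two complete 4-month averages
```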

In [None]:
df_Ocala.columns

In [None]:
test_dickey_fuller(df_Ocala)

In [None]:
rolmean.dropna(inplace=True)
rolmean.tail(5)

### Seasonal Decompose

In [None]:
rolmean.index

#### 34471

In [None]:
# import seasonal_decompose
from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(rolmean['34471'])

# Gather the trend, seasonality and noise of decomposed object
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# Plot gathered statistics
plt.figure(figsize=(12,24))
plt.subplot(411)
plt.plot(rolmean['34471'], label='Original', color="blue")
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend', color="blue")
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal,label='Seasonality', color="blue")
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals', color="blue")
plt.legend(loc='best')
plt.tight_layout()

In [None]:
# import seasonal_decompose
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(rolmean['34471'])

# Gather the trend, seasonality and noise of decomposed object
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# Plot gathered statistics
plt.figure(figsize=(12,8))
plt.subplot(221)
plt.plot(rolmean['34471'], label='Original', color="blue")
plt.legend(loc='best')
plt.subplot(222)
plt.plot(trend, label='Trend', color="blue")
plt.legend(loc='best')
plt.subplot(223)
plt.plot(seasonal,label='Seasonality', color="blue")
plt.legend(loc='best')
plt.subplot(224)
plt.plot(residual, label='Residuals', color="blue")
plt.legend(loc='best')
plt.tight_layout()

#### 34474

In [None]:
# import seasonal_decompose
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(rolmean['34474'])

# Gather the trend, seasonality and noise of decomposed object
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# Plot gathered statistics
plt.figure(figsize=(12,8))
plt.subplot(221)
plt.plot(rolmean['34474'], label='Original', color="blue")
plt.legend(loc='best')
plt.subplot(222)
plt.plot(trend, label='Trend', color="blue")
plt.legend(loc='best')
plt.subplot(223)
plt.plot(seasonal,label='Seasonality', color="blue")
plt.legend(loc='best')
plt.subplot(224)
plt.plot(residual, label='Residuals', color="blue")
plt.legend(loc='best')
plt.tight_layout()

#### 34476

In [None]:
# import seasonal_decompose
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(rolmean['34476'])

# Gather the trend, seasonality and noise of decomposed object
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# Plot gathered statistics
plt.figure(figsize=(12,8))
plt.subplot(221)
plt.plot(rolmean['34476'], label='Original', color="blue")
plt.legend(loc='best')
plt.subplot(222)
plt.plot(trend, label='Trend', color="blue")
plt.legend(loc='best')
plt.subplot(223)
plt.plot(seasonal,label='Seasonality', color="blue")
plt.legend(loc='best')
plt.subplot(224)
plt.plot(residual, label='Residuals', color="blue")
plt.legend(loc='best')
plt.tight_layout()

#### 34480

In [None]:
# import seasonal_decompose
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(rolmean['34480'])

# Gather the trend, seasonality and noise of decomposed object
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# Plot gathered statistics
plt.figure(figsize=(12,8))
plt.subplot(221)
plt.plot(rolmean['34480'], label='Original', color="blue")
plt.legend(loc='best')
plt.subplot(222)
plt.plot(trend, label='Trend', color="blue")
plt.legend(loc='best')
plt.subplot(223)
plt.plot(seasonal,label='Seasonality', color="blue")
plt.legend(loc='best')
plt.subplot(224)
plt.plot(residual, label='Residuals', color="blue")
plt.legend(loc='best')
plt.tight_layout()

#### 34482

In [None]:
# import seasonal_decompose
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(rolmean['34482'])

# Gather the trend, seasonality and noise of decomposed object
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# Plot gathered statistics
plt.figure(figsize=(12,8))
plt.subplot(221)
plt.plot(rolmean['34482'], label='Original', color="blue")
plt.legend(loc='best')
plt.subplot(222)
plt.plot(trend, label='Trend', color="blue")
plt.legend(loc='best')
plt.subplot(223)
plt.plot(seasonal,label='Seasonality', color="blue")
plt.legend(loc='best')
plt.subplot(224)
plt.plot(residual, label='Residuals', color="blue")
plt.legend(loc='best')
plt.tight_layout()

#### Dropping NaN values from the residuals

In [None]:
df_Ocala_dc=decomposition.resid.dropna()
df_Ocala_dc.tail()

### Auto Correlation

In [None]:
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf # for determining (p,q) orders

In [None]:
Ocala_diff.head(15) 

#### 34471 

In [None]:
plt.figure(figsize=(12,5))
pd.plotting.autocorrelation_plot(log_71);

##### Autocorrelation up to 40 lags
- The autocorrelation curve meets the dotted confidence line at around 40 lags

In [None]:
title = 'Autocorrelation: Ocala 34471'
lags = 40
plot_acf(df_Ocala['34471'],title=title,lags=lags);

#### 34474

In [None]:
plt.figure(figsize=(12,5))
pd.plotting.autocorrelation_plot(log_74);

##### Autocorrelation up to 40 lags
- The autocorrelation curve meets the dotted confidence line at around 40 lags

In [None]:
title = 'Autocorrelation: Ocala 34474'
lags = 40
plot_acf(df_Ocala['34474'],title=title,lags=lags);

#### 34476 

In [None]:
plt.figure(figsize=(12,5))
pd.plotting.autocorrelation_plot(log_76);

##### Autocorrelation up to 40 lags
- The autocorrelation curve meets the dotted confidence line at around 40 lags

In [None]:
title = 'Autocorrelation: Ocala 34476'
lags = 40
plot_acf(df_Ocala['34476'],title=title,lags=lags);

#### 34480

In [None]:
plt.figure(figsize=(12,5))
pd.plotting.autocorrelation_plot(log_80);

##### Autocorrelation up to 40 lags
- The autocorrelation curve meets the dotted confidence line at around 40 lags

In [None]:
title = 'Autocorrelation: Ocala 34480'
lags = 40
plot_acf(df_Ocala['34480'],title=title,lags=lags);

#### 34482

In [None]:
plt.figure(figsize=(12,5))
pd.plotting.autocorrelation_plot(log_82);

##### Autocorrelation up to 40 lags
- The autocorrelation curve meets the dotted confidence line at around 40 lags

In [None]:
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf # for determining (p,q) orders
title = 'Autocorrelation: Ocala 34482'
lags = 40
plot_acf(df_Ocala['34482'],title=title,lags=lags);