# Exploring Energy Balancing Authority Data for Portland

The following explores the EBA data for Portland.  I've made a joint data frame with the weather, and electricity demand.
I've explored some correlations between weather and demand.  I've also looked at the power spectra of the demand, which shows annual, daily and weekly oscillations.
I got a bit bogged down in trying to smooth out that noise (in the service of making a nice linear model).

I'm going to re-orient this to focus on just making a simpler rolling model, based on two weeks of data.

At the outset, what is our goal?
I want to develop a model to predict electricity demand a day ahead.  I will use existing historical data for both demand, and weather data. (Real forecasts are stuck relying on weather forecasts, which I have not found just yet.)

Then what are the existing methods for doing this?
There must be some reviews on this, as well as readily applicable techniques.  
* ARMA - need to remove obvious daily/weekly seasonality? I really need an ARMAX model, to include temperature.  
  ARMA models require stationarity, and I'm currently tying myself up in knots trying to remove the daily oscillations.
  One approach is to just consider one hour at a time? (Here we have a so-called unit root, where $x_{t+dt} = r x_t$, with $|r|=1$.)

* Gradient Boosted Regression Trees (Winning technique at GEFCOM 2014 forecasting competition)

* Recurrent Neural Network - train a neural network.  Feed it time-of-day, temperature, weekend/holiday.  (I have a small play network going.)

Metric for success: MSE.

Comparison: Average Day. Or Persistence model (same as yesterday).
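
The two comparison models and the metric are simple enough to pin down in a few lines. This is a minimal sketch on a synthetic hourly series, not the EBA data; the helper names (`mse`, `persistence_forecast`, `average_day_forecast`) are made up for illustration.

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((actual - predicted) ** 2)

def persistence_forecast(history, steps_per_day=24):
    """Predict tomorrow as a copy of the last observed day."""
    return np.asarray(history[-steps_per_day:], dtype=float)

def average_day_forecast(history, steps_per_day=24):
    """Predict tomorrow as the mean daily profile over all full days in the history."""
    history = np.asarray(history, dtype=float)
    n_days = len(history) // steps_per_day
    days = history[-n_days * steps_per_day:].reshape(n_days, steps_per_day)
    return days.mean(axis=0)

# Synthetic stand-in for hourly demand: a daily cycle plus noise (15 days).
rng = np.random.default_rng(0)
hours = np.arange(15 * 24)
demand = 3000 + 500 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 50, hours.size)

# Hold out the last day and score both baselines on it.
history, tomorrow = demand[:-24], demand[-24:]
mse_persistence = mse(tomorrow, persistence_forecast(history))
mse_average_day = mse(tomorrow, average_day_forecast(history))
print("persistence MSE:", mse_persistence)
print("average-day MSE:", mse_average_day)
```

Any model worth keeping should beat both of these numbers on held-out days.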




In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from statsmodels.tsa.stattools import adfuller, acf, pacf, arma_order_select_ic
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.seasonal import seasonal_decompose

%matplotlib inline
%load_ext autoreload
%autoreload 2

from get_weather_data import convert_isd_to_df, convert_state_isd

pi=np.pi



In [2]:
air_df = pd.read_csv('data/air_code_df.gz')

#Just get the weather station data for cities in Oregon.
df_weather=convert_state_isd(air_df,'OR')

#Read all of the weather data in.
#df_weather=pd.read_csv('data/airport_weather.gz',index_col=0,parse_dates=True)


done with Mahlon Sweet Field


done with Salem Municipal Airport/McNary Field


done with Portland International Airport


In [3]:
#load electricity data
df_eba=pd.read_csv('data/EBA_time.gz',index_col=0,parse_dates=True)
#df_region_eba=pd.read_csv('data/EBA_region_time.gz',index_col=0,parse_dates=True)

In [4]:
#Select temperature for Portland, OR
msk1=np.array(df_weather['city']=='Portland')
msk2=np.array(df_weather['state']=='OR')

df_pdx_weath=df_weather.loc[msk1&msk2]


In [5]:
#get electricity data for Portland General Electric
msk=df_eba.columns.str.contains('Portland')
df_pdx=df_eba.loc[:,msk]


### Anomaly Detection

A quick look at the Portland data suggests that there are both real outliers, and ones from errors in the data process (100x the surrounding values).  

Tests should be for total interchange = 0, and 
Demand = Net Generation - Net Interchange.

In [10]:
vnew=[735567.85,736564,0,10000]
fig=plt.figure(figsize=(15,6))
ax = fig.add_axes([0.1, 0.1, 0.6, 0.75])
ax.plot(df_pdx)
ax.legend(df_pdx.columns.values,loc='upper left',bbox_to_anchor=(1,1),prop={'size':9})

In [None]:
#Check that the energy is balanced for this small subset: Demand = Net Generation - Net Interchange.
#Seems to not be true.  

In [196]:
dem=df_pdx.iloc[:,0]
gen=df_pdx.iloc[:,2]
net=df_pdx.iloc[:,3]
plt.figure()
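#Residual of the balance equation, assuming columns 0, 2, 3 are demand, net generation and net interchange.
plt.plot(dem-(gen-net),'r')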


The data in late 2015 seem pretty crappy.  Looking at the EBA user notes, this seems to be a common complaint.
The other errors seem to involve some anomalous zero points in the temperature series.  For temperature series, where huge swings are unlikely,
it may be feasible to replace anomalous 0 values with the average of the neighbouring points.  In case of actual zero values, this shouldn't be a large problem?

In [6]:
#Make a combined Portland Dataframe for demand vs weather.
dem=df_pdx.iloc[:,0]
df_joint=pd.DataFrame(dem)
df_joint=df_joint.join(df_pdx_weath)
df_joint.head()
x=df_joint.iloc[:,0]
y=df_joint.iloc[:,1]
df_joint['TempShift']=150+abs(df_joint['Temp']-150)
df_joint=df_joint.rename(columns={df_joint.columns[0]:'Demand'})
#df_joint.head()

In [198]:
plt.figure()
plt.plot(df_joint['Temp'],df_joint.iloc[:,0],'rx')
plt.ylabel('Hourly Demand (kWh)')
plt.xlabel('Temperature (Celsius x10)')
plt.title('Energy Usage vs Temperature in Portland, OR')


In [8]:
plt.figure()
plt.plot(df_joint['WindSpeed'],df_joint.iloc[:,0],'rx')
plt.xlabel('Wind Speed (m/s x10)')
plt.ylabel('Hourly Demand (kWh)')
plt.title('Energy Usage vs Wind Speed in Portland, OR')

In [11]:
plt.figure()
plt.plot(df_joint['Precip-1hr'],df_joint.iloc[:,0],'rx')
plt.ylabel('Demand (kWh)')
plt.xlabel('Precipitation (mm x 10)')
plt.title('Energy Usage vs Precipitation in Portland, OR')

So the scatterplot for temperature versus demand shows a clear (expected) trend as the temperature becomes excessively hot or cold.
It looks like two blobs with similar slopes for deviations from 15 Celsius.  You can also see anomalous values at zero,
and extremely high values.  I'm skeptical of the 9000kWh value?

Let's also plot the correlation matrix across the whole time series.  Evidently a temperature deviation from 15 Celsius shows the largest correlation, with wind speed being the next most important.
I know the coldest temperatures in some places emerge in inversions (with absolutely no air movement).
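
As a sketch of how such a correlation matrix can be computed, here is a version on synthetic data. The column names mirror `df_joint`, but the frame `df_demo` and the coefficients used to generate it are invented for illustration, so the printed numbers say nothing about the real Portland data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_joint; temperatures are in tenths of a degree Celsius.
rng = np.random.default_rng(1)
n = 2000
temp = rng.uniform(-50, 350, n)
wind = rng.uniform(0, 100, n)
# Made-up demand: responds to the deviation from 15 C and weakly to wind.
demand = 2500 + 3.0 * np.abs(temp - 150) + 1.0 * wind + rng.normal(0, 100, n)

df_demo = pd.DataFrame({"Demand": demand, "Temp": temp, "WindSpeed": wind})
df_demo["TempShift"] = 150 + np.abs(df_demo["Temp"] - 150)

# Pearson correlation between every pair of columns.
corr = df_demo.corr()
print(corr["Demand"].sort_values(ascending=False))
```

Because the demand response is V-shaped in temperature, the raw `Temp` column correlates weakly with demand while the folded `TempShift` column correlates strongly, which is the reason for building `TempShift` in the first place.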

In [None]:
My naive model for how energy usage would vary is a factor for deviation from some ideal temperature, as well as daily and yearly oscillations.
\begin{equation}
    \text{Demand}= A_0+A_T|T-T_0|+A_\text{day}\sin\left( \frac{2\pi t}{24}+\phi_{\text{day}}\right)+A_\text{year}\sin\left(\frac{2\pi d}{365}+\phi_{\text{year}}\right)
\end{equation}
where $t$ is the hour of the day in 24 hour time, and $d$ is the number of days since the start of the year.
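
With $T_0$ held fixed, this model is linear in its coefficients, because $A\sin(x+\phi) = A\cos\phi\,\sin x + A\sin\phi\,\cos x$. A rough sketch of the fit by ordinary least squares follows. The data are synthetic with known coefficients, `design_matrix` is a hypothetical helper, and fixing $T_0$ at 15 Celsius (150 in the x10 units) is an assumption.

```python
import numpy as np

def design_matrix(temp, hour, day, t0=150.0):
    """Columns: constant, |T - T0|, daily sin/cos, yearly sin/cos."""
    return np.column_stack([
        np.ones_like(temp),
        np.abs(temp - t0),
        np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24),
        np.sin(2 * np.pi * day / 365), np.cos(2 * np.pi * day / 365),
    ])

# Synthetic year of hourly data with known coefficients, so the fit can be checked.
rng = np.random.default_rng(2)
n = 365 * 24
hour = (np.arange(n) % 24).astype(float)
day = (np.arange(n) // 24).astype(float)
temp = 150 + 100 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 30, n)
true_coef = np.array([2500.0, 3.0, 200.0, -100.0, 50.0, 80.0])
X = design_matrix(temp, hour, day)
demand = X @ true_coef + rng.normal(0, 20, n)

# Ordinary least squares for all six coefficients at once.
coef, *_ = np.linalg.lstsq(X, demand, rcond=None)

# Convert the daily sin/cos pair back to an amplitude and a phase.
a_day = np.hypot(coef[2], coef[3])
phi_day = np.arctan2(coef[3], coef[2])
print("A_0, A_T:", coef[0], coef[1])
print("A_day, phi_day:", a_day, phi_day)
```

Fitting $T_0$ as well would make the problem nonlinear; scanning a few candidate values of `t0` and keeping the one with the smallest residual is one simple way around that.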

To get a sense of those oscillations, let's look at the autocorrelation function for demand, as a function of time.  (Alternatively, the power spectrum?)

# Removing Extremes

Let's try to clean up some of this data.
My strategy is to find missing (or zero) values and excessive values, where excessive means more than 4 standard deviations above the mean.
Those extreme values are replaced with the mean of the points on either side.
This is also carried out for points with zero. Under the assumption that the data are otherwise continuous, the smoothing should not be a large distortion.


In [7]:
def avg_extremes(df,window=2):
    """avg_extremes(df)
    Replace extreme outliers, or zero values, with the average of the points
    `window` steps on either side.
    Suitable for occasional anomalous readings.
    """
    mu=df.mean()
    sd=df.std()
    msk1=(df-mu)>4*sd
    msk2 = df==0
    msk=msk1|msk2
    print( "Number of extreme values {}. Number of zero values {}".format(sum(msk1),sum(msk2)))
    ind= np.arange(len(df))[msk]
    n=len(df)
    for i in ind:
        #clip the neighbour indices so points near the ends stay in range.
        lo=max(i-window,0)
        hi=min(i+window,n-1)
        df.iloc[i]=(df.iloc[lo]+df.iloc[hi])/2

    return df

def remove_na(df,window=2):
    """remove_na(df)
    Replace NA values: first with the mean value of the series, then with the
    average of the points `window` steps on either side.
    """
    na_msk=np.isnan(df.values)
    #first pass:replace them all with the mean value - in case a whole day is missing.
    print( "Number of NA values {}".format(sum(na_msk)))
    df[na_msk]=df.mean()

    ind= np.arange(len(df))[na_msk]
    n=len(df)
    #second pass: for isolated values, replace by the average on either side.    
    for i in ind:
        #clip the neighbour indices so points near the ends stay in range.
        lo=max(i-window,0)
        hi=min(i+window,n-1)
        df.iloc[i]=(df.iloc[lo]+df.iloc[hi])/2
    return df


# Auto regressive modelling

A popular approach assumes that the current demand is probably the same as the previous demand, with some noise.
This is the auto-regressive, integrated, moving average (ARIMA) class of linear models, widely used in econometric forecasting.

In [34]:
def make_seasonal_plots(dem,temp,per,nlags):
    """Make seasonal decomposition of temperature, and demand curves.
    Plots those decompositions, and their correlation/autocorrelation plots.
    dem- input demand series
    temp-input temperature series
    per - input date to index on for plotting, e.g. '2016-03'
    nlags - number of lags for correlation plots.
    """
    #Carry out the "demand" and "temperature" seasonal decompositions.
    dem_decomposition = seasonal_decompose(dem,two_sided=False)
    dem_mu=dem.mean()
    dem_trend = dem_decomposition.trend/dem_mu  #Find rolling average over most important period.
    dem_seasonal = dem_decomposition.seasonal/dem_mu  #Find the dominant frequency components
    dem_residual = dem_decomposition.resid/dem_mu  #Whatever is left.

    temp_decomposition = seasonal_decompose(temp,two_sided=False)
    temp_mu=temp.mean()
    temp_trend = temp_decomposition.trend/temp_mu  #Find rolling average over most important period.
    temp_seasonal = temp_decomposition.seasonal/temp_mu  #Find the dominant frequency components
    temp_residual = temp_decomposition.resid/temp_mu  #Whatever is left.

    # numna= lambda x:np.sum(np.isnan(x))
    # print('NA:(trend,seasonal,residual,whole)',numna(temp_trend),numna(temp_seasonal),numna(temp_residual),numna(temp))

    #Plot out the decompositions
    plt.figure(figsize=(15,9))
    plt.title('Normalized Seasonal Decomposition')
    plt.subplot(411)
    plt.plot(dem_trend[per],'b',temp_trend[per],'k')
    plt.ylabel('Trend')
    plt.subplot(412)
    plt.plot(dem_seasonal[per],'b',temp_seasonal[per],'k')
    plt.ylabel('Seasonal Oscillation')
    plt.subplot(413)
    plt.plot(dem_residual[per],'b',temp_residual[per],'k')
    plt.ylabel('Residuals')
    plt.subplot(414)
    plt.plot(dem[per]/dem_mu,'b',temp[per]/temp_mu,'k')
    plt.ylabel('Data')
    plt.show()

    #Plot the auto-correlation plots.
    nlags=np.min([len(dem[per])-1,nlags,len(temp[per])-1])
    print('Nlags',nlags)
    #plt.figure(figsize=(10,6))
    fig, (ax1, ax2) = plt.subplots(1,2)
    plot_acf(temp_residual[per],'b-x','Temp Residual',ax1,ax2,nl=nlags)
    plot_acf(dem_residual[per],'r-+','Demand Residual',ax1,ax2,nl=nlags)
    plt.legend()
    plt.show()
    #plt.figure(figsize=(10,6))
    fig, (ax1, ax2) = plt.subplots(1,2)
    plot_acf(temp[per],'b-x','Temp',ax1,ax2,nl=nlags)
    plot_acf(dem[per],'r-+','Demand',ax1,ax2,nl=nlags)
    plt.legend()
    plt.show()

    return None

def plot_acf(ts,ls,line_label,ax1,ax2,nl=50):
    """plot_acf(ts,ls,line_label,ax1,ax2,nl)
    Plot the auto-correlation and partial auto-correlation of a timeseries (ts)
    up to a given number of lags (nl), with a specific linestyle (ls) and label.
    Inputs:
    ts - time series
    ls - line style to use when plotting
    line_label - label for this time series
    ax1, ax2 - axes for the ACF and PACF sub-plots
    nl - number of lags
    """
    #Compute the auto-correlation and partial auto-correlation, dropping non-finite values.
    ts2 = ts[np.isfinite(ts)]
    lag_acf = acf(ts2,nlags=nl)
    lag_pacf=pacf(ts2,nlags=nl,method='ols')
    #95% confidence bounds for white noise.
    sd = 1.96/np.sqrt(len(ts2))
    #Make some purty subplots, drawing on the axes passed in rather than the current axes.

    ax1.set_title('Auto Correlation')
    ax1.axhline(y=sd,color='gray')
    ax1.axhline(y=-sd,color='gray')
    ax1.set_xlabel('Lag')
    ax1.plot(lag_acf,ls,label=line_label)   
    ax2.set_title('Partial Auto Correlation')
    ax2.set_xlabel('Lag')
    ax2.axhline(y=sd,color='gray')
    ax2.axhline(y=-sd,color='gray')
    ax2.plot(lag_pacf,ls,label=line_label)    
    return None


In [36]:
#%pdb


In [35]:
per='2016-01'
dem=df_joint.loc[per,'Demand'].asfreq('H')
dem=avg_extremes(dem)
dem=remove_na(dem)

temp=df_joint.loc[per,'Temp'].asfreq('H')
temp=avg_extremes(temp)
temp=remove_na(temp)

make_seasonal_plots(dem,temp,per,50)

Nlags 50

Number of extreme values 0. Number of zero values 0
Number of NA values 0
Number of extreme values 0. Number of zero values 19
Number of NA values 0
(JBM) Freq is  24
(JBM) Freq is  24


Evidently, this finds the day timescale.  I'm a bit skeptical of these plots, and this approach (trying simple seasonality reduction on the whole data set at once).  I think the seasonal component has not been completely removed.
The "seasonal_decompose" method works by estimating the frequency (period) of the data.  The trend is a rolling average over one period, the seasonality is the detrended series averaged at each position within the period across all periods, and whatever remains once these are subtracted is the "noise" process.

Any additional year-long oscillations are still buried in the trend.  Of course, this data set covers only two years.
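
To make that description concrete, here is a simplified additive decomposition written with plain numpy. It is only a sketch of the idea, not the statsmodels implementation: it uses a trailing moving average (roughly what `two_sided=False` asks for), and the function name `decompose_additive` and the synthetic series are assumptions.

```python
import numpy as np

def decompose_additive(x, period):
    """Simplified additive decomposition: x = trend + seasonal + resid."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Trend: trailing moving average over one full period (NaN until a full window exists).
    trend = np.full(n, np.nan)
    window_sums = np.convolve(x, np.ones(period), mode="valid")
    trend[period - 1:] = window_sums / period
    # Seasonal: average the detrended values at each position within the period.
    detrended = x - trend
    phase = np.arange(n) % period
    profile = np.array([np.nanmean(detrended[phase == p]) for p in range(period)])
    profile -= profile.mean()   # centre the seasonal component on zero
    seasonal = profile[phase]
    # Residual: whatever is left.
    resid = x - trend - seasonal
    return trend, seasonal, resid

# Synthetic hourly series: slow linear drift plus a 24-hour cycle plus noise.
rng = np.random.default_rng(3)
t = np.arange(14 * 24)
x = 3000 + 0.5 * t + 400 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 20, t.size)

trend, seasonal, resid = decompose_additive(x, period=24)
print("seasonal amplitude:", seasonal.max())
print("residual std:", np.nanstd(resid))
```

The single averaged `profile` is the weak point: the same daily shape is assumed for every day, so any day whose shape differs leaks into the residual.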

In [11]:
#Do some tests for stationarity
ad_results=adfuller(dem['2016-10'],autolag='BIC')
names=["Test statistic","p-value","#Lags","Num observed","Critical Values"]

for i in range(0,5):
    print( names[i],ad_results[i])


Test statistic -2.17059796702
p-value 0.217089195899
#Lags 20
Num observed 723
Critical Values {'1%': -3.4394269973845657, '5%': -2.8655458544300387, '10%': -2.5689031745512492}


The above plot is the raw auto-correlation between the demand and temperature.  I think there is a substantive daily oscillation left by the naive seasonal approach.  This assumes a single oscillation, repeated for all cases.  In this data however, there is a clear daily signal, which it picks out.  However, this will vary over the course of the year.

Diebold's text "Elements of Forecasting" suggests putting in dummy variables for seasonality.  So hour of day, and day of year.  The resulting series.  
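
A rough sketch of the dummy-variable idea, using plain least squares on synthetic data. I use hour-of-day and day-of-week dummies here as an assumption; day-of-year dummies work the same way but need several years of data to estimate. The helper `one_hot` and all the numbers are invented for illustration.

```python
import numpy as np

def one_hot(labels, n_levels):
    """Return an (n, n_levels) 0/1 matrix with a single 1 per row."""
    return np.eye(n_levels)[np.asarray(labels)]

# Synthetic hourly demand over 8 weeks with a daily shape and a weekend dip.
rng = np.random.default_rng(4)
n = 8 * 7 * 24
hour = np.arange(n) % 24
weekday = (np.arange(n) // 24) % 7
daily_shape = 300 * np.sin(2 * np.pi * (hour - 6) / 24)
weekend_dip = np.where(weekday >= 5, -200.0, 0.0)
demand = 3000 + daily_shape + weekend_dip + rng.normal(0, 30, n)

# The 24 hour-of-day dummies act as the intercepts; drop the first weekday
# dummy so the columns are not perfectly collinear.
X = np.hstack([one_hot(hour, 24), one_hot(weekday, 7)[:, 1:]])
coef, *_ = np.linalg.lstsq(X, demand, rcond=None)

fitted = X @ coef
resid = demand - fitted          # deseasonalized series
print("weekday effects relative to day 0:", np.round(coef[24:], 1))
print("residual std:", resid.std())
```

The residual series is what would then be handed to an ARMA fit or tested for stationarity.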

In [None]:
So looking at just an hour of the day, the seasonal split manages to work fairly well at making the residual series a stationary one.
The "trend" is effectively picking out the anticipated annual shifts, and the "seasonality" is pulling out a small week long oscillation (the amplitude is much smaller than the trend).  The residuals also seem to be stationary now.  

The autocorrelation plots also show some oscillations (I think the seasonal reduction is pretty crap), but here they decay to within error after 6 days.  
The raw demand auto-correlations might be showing annual oscillations in temperature and electricity usage that would get stronger from 120-240 days.

Turns out the "seasonal" part 

If we look at the correlation plots for various hours there are a couple clear trends.  Looking at 6pm, shows a really clear weekly (7 day) signal.  This is not as obvious at other times of day (6am, 9am, 12pm).  Note that I have not selected out weekends, or holidays here.  Weekends might be strongly contributing to the weekly oscillation.  


In [37]:
#Compare series at noon
msk=df_joint.index.hour==12

dem=df_joint[msk]['Demand'].asfreq('D')
dem=avg_extremes(dem)
dem=remove_na(dem)

temp=df_joint[msk]['Temp'].asfreq('D')
temp=avg_extremes(temp)
temp=remove_na(temp)
make_seasonal_plots(dem,temp,'2016-03',40)

Nlags 30

Number of extreme values 7. Number of zero values 0
Number of NA values 6
Number of extreme values 0. Number of zero values 5
Number of NA values 2
(JBM) Freq is  7
(JBM) Freq is  7


## Fourier Plots

I'm curious about the power spectrum for this series.  I'm also unfamiliar with Python's FFT routine, so this is a good time to play around.
I'm going to look at the Fourier spectrum for the demand, over a single year.  I'll then try to filter the data by removing the peaks at the daily, weekly, and annual timescales. 

In [124]:
#clean up the data
dem_t=df_joint['Demand']['2015-07':'2016-06'].copy()
dem_t=avg_extremes(dem_t)
dem_t=remove_na(dem_t)
dem_tv=dem_t.values

#set up FFT time/frequency scales
Nt = len(dem_tv)
#scale time to days.
Tmax = Nt/24
dt = 1/24
t = np.arange(Nt)*dt
df = 1/Tmax
fmax=0.5/dt
#integer-based grid: a float-step arange can return one point too many or too few.
f = (np.arange(Nt)-Nt//2)*df

#carry out fft 
dem_f=np.fft.fftshift(dem_tv)
dem_f=np.fft.fft(dem_f)
dem_f=np.fft.ifftshift(dem_f)

Number of extreme values 0. Number of zero values 0


In [66]:
plt.figure(figsize=(15,10))
spec=abs(dem_f)**2
spec/=sum(spec)
plt.semilogy(f,spec)
fcut=1/7
plt.axis([-10*fcut,10*fcut,1E-10,1])
plt.xlabel('Frequency (1/day)')
plt.ylabel('Normalized Demand Power Spectrum')
plt.show()

This is a normalized power spectrum for the demand data.  You can clearly see the peaks arising from daily and weekly oscillations.
There is a small peak at very low frequencies, which corresponds to the annual oscillation.  However, given we only have 2 years of data, this
is almost exactly the lowest nonzero frequency that can be resolved, $1/T_\text{max}$ (the Nyquist frequency is the opposite limit: the highest resolvable frequency, $1/(2\,dt)$).  Let's examine both the high (intra-day) and low (year-long) frequency scales.
The top figure shows the low frequency (year-long) data.  The lower plot shows nearly the whole frequency spectrum.  Note the peaks at 1,2,3,etc.  These are the daily frequency oscillations.  They also share correlations with other frequencies fo
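
To check the frequency bookkeeping independently of the demand data, here is a small sketch on a synthetic series with known daily and weekly cycles. It uses `np.fft.rfft` and `np.fft.rfftfreq` (a one-sided spectrum) rather than the shifted two-sided FFT in the cells above; the amplitudes and the 364-day length (exactly 52 weeks, so both cycles land on FFT bins) are arbitrary choices.

```python
import numpy as np

# Synthetic hourly series, 364 days long, with daily and weekly cycles.
samples_per_day = 24
n_days = 364
n = n_days * samples_per_day
t = np.arange(n) / samples_per_day            # time in days
rng = np.random.default_rng(5)
x = (3000
     + 400 * np.sin(2 * np.pi * t)            # 1 cycle per day
     + 150 * np.sin(2 * np.pi * t / 7)        # 1 cycle per week
     + rng.normal(0, 50, n))

# One-sided spectrum of the mean-removed series; frequencies are in cycles/day.
freqs = np.fft.rfftfreq(n, d=1 / samples_per_day)
power = np.abs(np.fft.rfft(x - x.mean())) ** 2
power /= power.sum()

resolution = freqs[1]                 # lowest nonzero frequency, 1/(record length)
nyquist = freqs[-1]                   # highest frequency, half the sampling rate
top_two = np.sort(freqs[np.argsort(power)[-2:]])
print("resolution (1/day):", resolution)
print("Nyquist (1/day):", nyquist)
print("two strongest peaks (1/day):", top_two)
```

With one year of hourly data the annual cycle sits in the very first nonzero bin, which is why it is so hard to separate from the trend.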

In [38]:
from EBA_fft import remove_square_peak, remove_sinc_peak, invert_fft, fft_detrend, moving_avg

In [167]:
f_trend_tot,f_detrend = fft_detrend(dem_f,f,4/365,remove_square_peak)
#now take a rolling average of the remainder.
dem_f_s=moving_avg(f_detrend,50)
f_trend_tot+=dem_f_s
f_detrend-=dem_f_s

In [168]:
plt.figure(figsize=(12,9))
w=1
plt.axis([-0.2,5,1E3,1E8])
plt.semilogy(f,abs(f_trend_tot),f,abs(f_detrend),f,5E4/(1+(f/w)**2))
plt.show()

In [169]:
#check out what this detrending looks like.

t_trend=invert_fft(f_trend_tot)
t_detrend=invert_fft(f_detrend)

# t_trend=pd.Series(t_trend,index=dem_t.index)
# t_detrend=pd.Series(t_detrend,index=dem_t.index)


In [170]:
plt.figure(figsize=(12,9))
plt.plot(t,dem_t,'b',t,t_trend,'r',t,t_detrend,'g')
#plt.axis([550,560,min(t_detrend),max(dem_t)])

So that used just July/2015-June/2016 data to find the trend.  Let's now see how this does when applied to the next year's data.
The trend can be appended to itself as a "prediction".  

In [191]:
dem_t2=df_joint['Demand']['2015-07':'2017-06'].copy()
dem_t2=avg_extremes(dem_t2)
dem_t2=remove_na(dem_t2)

#need to ditch a day due to leap year in 2016 elongating the year.
#This might be screwing things up based on day of the week, and leading to a week-long offset
t_trend2 = np.append(t_trend,t_trend[24:])
t_trend2 = pd.Series(t_trend2,index=dem_t2.index)

Number of extreme values 1. Number of zero values 3


In [193]:
plt.figure(figsize=(12,9))
per='2016-10'
plt.plot(-t_trend2[per]+dem_t2[per],'b-')

## Goals

What is my goal here?  To develop a model for day-ahead electricity forecasts that minimizes the mean square error.  I have been playing with trying to capture an entire year's data.  (I wanted to explore the seasonal patterns, and try fitting a basic model.)

My goal here was to develop a simple linear model for comparison with neural network approaches.
However, trying to forecast a year's power (at daily resolution) is a fool's errand.  What is a smaller task I can play with?
I could try fitting day-ahead curves, using the last two weeks' data.  Each day is then its own problem, with much more manageable requirements.
To finish the ARMA stuff, I can estimate the expected ARMA parameters from a bunch of separate two-week periods. Once the model parameters
are set, I can fit the model for each period, and forecast the next day's behaviour. Those parameters can then be used in the future, perhaps
with feedback based on how they worked in the past.
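
As a sketch of what one step of that rolling scheme might look like, the block below fits a purely auto-regressive model to two weeks of synthetic hourly data by least squares and iterates it forward for 24 hours. This is a stand-in for a proper ARMA fit, not the statsmodels routine; `fit_ar`, `forecast_ar` and the lag set `(1, 2, 24)` are assumptions chosen for illustration.

```python
import numpy as np

def fit_ar(y, lags):
    """Least-squares fit of y[t] = c + sum_k a_k * y[t - lag_k]."""
    y = np.asarray(y, dtype=float)
    max_lag = max(lags)
    # One column of lagged values per lag, aligned with the targets y[max_lag:].
    rows = [y[max_lag - lag:len(y) - lag] for lag in lags]
    X = np.column_stack([np.ones(len(y) - max_lag)] + rows)
    coef, *_ = np.linalg.lstsq(X, y[max_lag:], rcond=None)
    return coef

def forecast_ar(y, coef, lags, steps):
    """Iterate the fitted recursion forward, feeding forecasts back in."""
    buf = list(np.asarray(y, dtype=float))
    for _ in range(steps):
        buf.append(coef[0] + sum(a * buf[-lag] for a, lag in zip(coef[1:], lags)))
    return np.array(buf[-steps:])

# Synthetic hourly demand: 15 days, the last one held out.
rng = np.random.default_rng(6)
hours = np.arange(15 * 24)
demand = 3000 + 500 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 40, hours.size)
train, test = demand[:-24], demand[-24:]

lags = (1, 2, 24)                      # assumed lag set, chosen for illustration
coef = fit_ar(train, lags)
day_ahead = forecast_ar(train, coef, lags, steps=24)
print("day-ahead MSE:", np.mean((test - day_ahead) ** 2))
```

Sliding the two-week window forward one day at a time and collecting the MSE of each forecast gives a score that can be compared directly against the persistence and average-day baselines.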

I also want to fit a Long Short-Term Memory neural network to this data.  This will be done in TensorFlow, where I will try to build the network using the lower-level instructions, rather than any built-in operations.  This problem seems a good match for this technique, since there are clear correlations, and some scope for nonlinearities.  In this case we must select parameters for the size and depth of the network.

In [65]:
#Make a small 2-week test set.
per='2015-12'
df_train=df_joint.loc[per,['Demand','Temp']].copy()

df_train_cln=df_train.asfreq('H')
for col in df_train.columns:
    df_train_cln[col]=avg_extremes(df_train[col])
    df_train_cln[col]=remove_na(df_train_cln[col])    


Number of extreme values 0. Number of zero values 0
Number of NA values 0
Number of extreme values 0. Number of zero values 5
Number of NA values 0


In [85]:
mu=df_train_cln['Demand'].mean()
y=df_train_cln['Demand']/mu
res = arma_order_select_ic(y, max_ar=24, ic=['aic', 'bic'], trend='nc')

In [79]:
res

{'aic':              0             1             2
 0          NaN  12818.530810  11872.021290
 1  9373.914993   8750.299673   8483.608489
 2  8583.201771   8425.473876   8405.075432
 3  8391.789924   8329.199133           NaN
 4  8393.642930   8390.672140   8331.518802,
 'aic_min_order': (3, 1),
 'bic':              0             1             2
 0          NaN  12827.754892  11885.857413
 1  9383.139075   8764.135796   8502.056653
 2  8597.037894   8443.922040   8428.135637
 3  8410.238089   8352.259338           NaN
 4  8416.703135   8418.344386   8363.803089,
 'bic_min_order': (3, 1)}

In [71]:
make_seasonal_plots(df_train_cln['Demand'],df_train_cln['Temp'],per,50)

LinAlgError: SVD did not converge


Nlags 50

(JBM) Freq is  24
(JBM) Freq is  24


In [56]:
fig, (ax1, ax2) = plt.subplots(1,2)
plot_acf(df_train_cln.loc[per,'Demand'],'b-x','Demand',ax1,ax2,nl=50)

# Appendices

I've accumulated things I was playing with here, such as the distinction between auto-correlation, and partial auto-correlation plots, and numpy's fft syntax.

## ACF vs PACF

The following example helped me understand the distinction between the ACF and PACF.  The PACF tries to remove the correlation due to the intermediate variables, to find how the innovation/noise at a step $k$ in the past affects the present.  The following model is a linear trend plus white noise, with a delayed copy of the noise (lag `s`) added on.  You can see the peaks in the PACF at lags corresponding to the enforced lag.  So the PACF tells us the order of the auto-regression (it cuts off after lag $p$ for an AR($p$) process), and the ACF tells us the order of the moving average (it cuts off after lag $q$ for an MA($q$) process).  

In [114]:
Nx=10000
s=2
x = np.arange(0,Nx)
z= np.random.randn(Nx)
z1=np.zeros(Nx)

z1[s:Nx] = z[0:Nx-s]
y = 2*x +2*z - .5*z1

tindex = pd.date_range('2015-01-01',periods=Nx)
ts = pd.Series(y,index=tindex)
#plot_acf needs explicit axes for the ACF and PACF panels.
fig, (ax1, ax2) = plt.subplots(1,2)
plot_acf(ts,'r-+','T0',ax1,ax2,nl=10)
plt.show()

<matplotlib.figure.Figure at 0x7efb99c70358>
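
The cut-off behaviour is easier to see on textbook processes. The sketch below simulates an AR(1) and an MA(1) series and computes the sample ACF and PACF with plain numpy. The helpers `sample_acf` and `sample_pacf` are hypothetical stand-ins for the statsmodels `acf` and `pacf` routines (the PACF here uses the regression definition, in the spirit of `method='ols'`), and the coefficient 0.7 is arbitrary.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

def sample_pacf(x, nlags):
    """Partial autocorrelation via OLS: the last coefficient of an AR(k) fit."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    out = [1.0]
    for k in range(1, nlags + 1):
        X = np.column_stack([x[k - j:len(x) - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
        out.append(coef[-1])
    return np.array(out)

rng = np.random.default_rng(7)
n = 20000
e = rng.normal(size=n)

# AR(1): x[t] = 0.7 x[t-1] + e[t]
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.7 * ar[t - 1] + e[t]

# MA(1): x[t] = e[t] + 0.7 e[t-1]
ma = e.copy()
ma[1:] += 0.7 * e[:-1]

print("AR(1) ACF :", np.round(sample_acf(ar, 4), 2))   # decays geometrically
print("AR(1) PACF:", np.round(sample_pacf(ar, 4), 2))  # cuts off after lag 1
print("MA(1) ACF :", np.round(sample_acf(ma, 4), 2))   # cuts off after lag 1
print("MA(1) PACF:", np.round(sample_pacf(ma, 4), 2))  # decays, alternating in sign
```

So a sharp cut-off in the PACF points to the AR order, and a sharp cut-off in the ACF points to the MA order.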