# Forecasting for July

Technical Challenge for Data Science Candidates

You want to forecast the “Outside Temperature” for the first 9 days of the next month.

Assume that:

  - The average temperature for each day of July is constant and equal to 22 degrees;

  - For the 1st of July, the pattern of temperatures across the day, relative to that day's average, is similar to the pattern found on the 1st of June; for the 2nd of July it is similar to the pattern on the 2nd of June; and so on.
  
Produce a “.txt” file with your forecast for July (from 1st July to 9th July) containing the sample values for each time, e.g. dd/mm/yyyy, Time, Outside Temperature.

## Implementation

This is easier than it looks. The other notebook discusses residuals; if I use ratio residuals instead of absolute ones, the forecast is functionally the same as multiplying each June reading by the ratio of the expected July average to that June day's average.
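
One way to write this down, assuming a ratio residual means each reading divided by its own day's mean. The symbols are introduced here for illustration only: $x_{d,t}$ is the June reading on day $d$ at time $t$, $\bar{x}_d$ is the June mean for day $d$, and $\bar{y}$ is the assumed constant July daily average.

$$
r_{d,t} = \frac{x_{d,t}}{\bar{x}_d},
\qquad
\hat{y}_{d,t} = \bar{y}\, r_{d,t} = x_{d,t}\,\frac{\bar{y}}{\bar{x}_d}
$$

Averaging $\hat{y}_{d,t}$ over the readings of day $d$ gives $\bar{y}$, so each forecast day has the required mean while keeping the June day's shape.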

So, for example, if the average for a day in June is 16 degrees, multiply each of that day's readings by 22/16 = 1.375 to get the predicted July values.
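
The arithmetic in that example can be checked with a minimal sketch. The four readings are made up for illustration (chosen so their mean is 16) and are not taken from the data set.

```python
import numpy as np

# Hypothetical readings for one June day; illustrative values only.
june_day = np.array([12.0, 14.0, 18.0, 20.0])
july_target = 22.0  # assumed constant July daily average

# Scale factor = target July mean / observed June mean for that day.
factor = july_target / june_day.mean()
july_day = june_day * factor

print(factor)           # 1.375
print(july_day.mean())  # 22.0

# The shape of the day relative to its own mean is unchanged by the scaling.
print(np.allclose(july_day / july_day.mean(), june_day / june_day.mean()))  # True
```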

In [265]:
import numpy as np
import pandas as pd

import matplotlib
from cycler import cycler
import matplotlib.pyplot as plt

pd.__version__

'0.24.2'

In [266]:
# If you turn this feature on, you can display each result as it happens.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [267]:
df1 = pd.read_pickle("200606.pkl")

In [268]:
df1.info()
df1.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4319 entries, 90 to 4408
Data columns (total 20 columns):
Date                      4319 non-null object
Time                      4319 non-null object
Temp Humidity Index       4319 non-null float64
Outside Temperature       4319 non-null float64
WindChill                 4319 non-null float64
Hi Temperature            4319 non-null float64
Low Temperature           4319 non-null float64
Outside Humidity          4319 non-null int64
DewPoint                  4319 non-null float64
WindSpeed                 4319 non-null int64
Hi                        4319 non-null int64
Wind Direction            4319 non-null object
Rain                      4319 non-null float64
Barometer                 4319 non-null float64
Inside  Temperature       4319 non-null float64
Inside  Humidity          4319 non-null int64
ArchivePeriod             4319 non-null int64
dttm0                     4319 non-null datetime64[ns]
m0                        4319 non

Unnamed: 0,Date,Time,Temp Humidity Index,Outside Temperature,WindChill,Hi Temperature,Low Temperature,Outside Humidity,DewPoint,WindSpeed,Hi,Wind Direction,Rain,Barometer,Inside Temperature,Inside Humidity,ArchivePeriod,dttm0,m0,dy0
90,01/06/2006,00:00,9.6,9.6,9.6,9.6,9.5,81,6.4,1,10,WNW,0.0,1014.4,21.1,40,10,2006-06-01 00:00:00,6,1
91,01/06/2006,00:10,9.5,9.5,9.5,9.6,9.5,81,6.4,1,5,SW,0.0,1014.4,21.1,40,10,2006-06-01 00:10:00,6,1
92,01/06/2006,00:20,9.5,9.5,9.5,9.6,9.5,81,6.4,1,7,WSW,0.0,1014.3,21.1,40,10,2006-06-01 00:20:00,6,1
93,01/06/2006,00:30,9.5,9.5,9.5,9.6,9.5,82,6.6,1,6,W,0.0,1014.2,21.1,40,10,2006-06-01 00:30:00,6,1
94,01/06/2006,00:40,9.5,9.5,9.5,9.6,9.5,83,6.8,1,6,SW,0.0,1014.1,21.0,40,10,2006-06-01 00:40:00,6,1


In [269]:
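# Filter down to the first 9 days of June
ndays = 9
tag = 'Outside Temperature'
# Midnight of the first day plus ndays gives an exclusive upper bound
cut0 = df1.dttm0.min().normalize() + pd.DateOffset(days=ndays)
df2 = df1[df1['dttm0'] < cut0]
# Keep only the timestamp, the day of month, and the value to forecast
df3 = df2[['dttm0', 'dy0', tag]]
df3.head()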

Unnamed: 0,dttm0,dy0,Outside Temperature
90,2006-06-01 00:00:00,1,9.6
91,2006-06-01 00:10:00,1,9.5
92,2006-06-01 00:20:00,1,9.5
93,2006-06-01 00:30:00,1,9.5
94,2006-06-01 00:40:00,1,9.5
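
The cutoff logic can be tried in isolation. This is a sketch on a synthetic 10-minute index standing in for the June data; `demo`, `cutoff` and `first_days` are names made up for the example.

```python
import pandas as pd

# Synthetic stand-in for the June frame: one timestamp every 10 minutes.
stamps = pd.date_range("2006-06-01 00:00", "2006-06-30 23:50", freq="10min")
demo = pd.DataFrame({"dttm0": stamps})

ndays = 9
# normalize() snaps the earliest timestamp to midnight; adding ndays
# yields an exclusive upper bound at midnight of day 10.
cutoff = demo["dttm0"].min().normalize() + pd.DateOffset(days=ndays)
first_days = demo[demo["dttm0"] < cutoff]

print(cutoff)                      # 2006-06-10 00:00:00
print(len(first_days))             # 1296 = 9 days x 144 readings per day
print(first_days["dttm0"].max())   # 2006-06-09 23:50:00
```

The strict `<` matters: with `<=` the midnight reading of the 10th would be included as well.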


## Day averages in June

The daily averages were calculated earlier. They are recomputed here from the filtered frame because the scaling factors below depend on them.

In [270]:
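july0 = 22.0  # assumed constant daily average for July
# Mean outside temperature for each of the first 9 days of June
m0 = df3.groupby('dy0')[tag].mean().to_frame()
m0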

dy0,Outside Temperature
1,12.520139
2,12.960417
3,12.840278
4,12.013889
5,12.186806
6,16.304167
7,14.219444
8,15.315278
9,12.748611


In [271]:
# Per-day scale factor: July target mean / June day mean
m1 = (july0 / m0[tag]).to_frame()
m1.rename(columns={tag: 'm1'}, inplace=True)
m1

dy0,m1
1,1.757169
2,1.697476
3,1.713359
4,1.831214
5,1.805231
6,1.349348
7,1.547177
8,1.436474
9,1.725678


In [272]:
# Attach each row's day factor and scale the June readings
df4 = df3.merge(m1, on='dy0')
df4[tag] = df4[tag] * df4['m1']
# Shift the timestamps from June to July, then split into date and time
df4['dttm0'] = df4['dttm0'] + pd.DateOffset(months=1)
df4['Date'] = df4['dttm0'].map(lambda x: x.date())
df4['Time'] = df4['dttm0'].map(lambda x: x.time())
df4.drop(columns=['dy0', 'm1', 'dttm0'], inplace=True)
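
As a cross-check of the merge-and-scale step, the same scaling can be done with `groupby(...).transform("mean")`, which broadcasts each day's mean back onto that day's rows. This is an alternative sketch on a made-up two-day frame, not the notebook's method; `demo`, `day_mean` and the `forecast` column are names invented for the example.

```python
import numpy as np
import pandas as pd

july_target = 22.0  # assumed constant July daily average
tag = "Outside Temperature"

# Synthetic two-day frame; illustrative values only.
demo = pd.DataFrame({
    "dy0": [1, 1, 1, 2, 2, 2],
    tag: [10.0, 12.0, 14.0, 15.0, 16.0, 17.0],
})

# Each row receives the mean of its own day, so no separate factor table is needed.
day_mean = demo.groupby("dy0")[tag].transform("mean")
demo["forecast"] = demo[tag] * (july_target / day_mean)

# Every day's forecast should average to the July target.
per_day = demo.groupby("dy0")["forecast"].mean()
print(np.allclose(per_day, july_target))  # True
```

Running the same per-day mean on the real forecast is a cheap way to confirm that every July day averages to the target.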

In [274]:
df4.head()
df4.tail()

Unnamed: 0,Outside Temperature,Date,Time
0,16.868822,2006-07-01,00:00:00
1,16.693106,2006-07-01,00:10:00
2,16.693106,2006-07-01,00:20:00
3,16.693106,2006-07-01,00:30:00
4,16.693106,2006-07-01,00:40:00


Unnamed: 0,Outside Temperature,Date,Time
1291,17.947053,2006-07-09,23:10:00
1292,17.774485,2006-07-09,23:20:00
1293,17.774485,2006-07-09,23:30:00
1294,17.601917,2006-07-09,23:40:00
1295,17.601917,2006-07-09,23:50:00
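
The task asks for a “.txt” file with Date, Time and Outside Temperature columns. A minimal sketch of that last step follows, under several assumptions: the shifted timestamp column is still available (the cell above drops `dttm0`, so this would run before the drop), values are written to one decimal place like the source readings, and `forecast_july.txt` is a hypothetical file name. The three rows are copied from the head of the forecast shown above.

```python
import pandas as pd

# Small stand-in for the forecast frame, with the timestamp column kept.
forecast = pd.DataFrame({
    "dttm": pd.to_datetime(["2006-07-01 00:00", "2006-07-01 00:10", "2006-07-01 00:20"]),
    "Outside Temperature": [16.868822, 16.693106, 16.693106],
})

# Build the columns in the requested order: dd/mm/yyyy, Time, Outside Temperature.
out = pd.DataFrame({
    "Date": forecast["dttm"].dt.strftime("%d/%m/%Y"),
    "Time": forecast["dttm"].dt.strftime("%H:%M"),
    "Outside Temperature": forecast["Outside Temperature"],
})

# One decimal place is an assumption; drop float_format to keep full precision.
out.to_csv("forecast_july.txt", index=False, float_format="%.1f")
```

The first data line of the resulting file reads `01/07/2006,00:00,16.9`.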
