# Forecasting for July

Technical Challenge for Data Science Candidates

You want to forecast the “Outside Temperature” for the first 9 days of the next month.

Assume that:

  - The average temperature for each day of July is constant and equal to 25 degrees;

  - The pattern of temperatures across each day of July, relative to that day's average, is similar to the pattern found on the same day of June: the 1st of July resembles the 1st of June, the 2nd of July resembles the 2nd of June, etc.
  
Produce a “.txt” file with your forecast for July (from 1st July to 9th July) containing the sample values for each time, e.g. dd/mm/yyyy, Time, Outside Temperature.

## Implementation

The approach is to calculate the average temperature for each day of June.

For each day in June, calculate the offset of every reading from that day's average.

Then apply those offsets to the July average temperature of 25 degrees.
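The three steps above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the June readings, the year 2019 and the names `june`, `daily_mean`, `offsets` and `july` are all made up here, since the real data file is not shown in this section.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two June days of hourly readings (values and year are made up).
times = pd.date_range("2019-06-01", periods=48, freq="60min")
june = pd.Series(15 + 3 * np.sin(np.arange(48) * 2 * np.pi / 24), index=times)

JULY_MEAN = 25.0  # average July temperature given in the brief

# Average temperature of each June day, broadcast back to every reading.
daily_mean = june.groupby(june.index.day).transform("mean")

# Offset of each reading from its own day's average.
offsets = june - daily_mean

# Apply the June offsets to the constant July average and shift the dates by a month.
july = JULY_MEAN + offsets
july.index = june.index + pd.DateOffset(months=1)
```

By construction every forecast day averages exactly 25 degrees, and the shape of each day is copied from the matching June day.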

Note that this isn't a particularly good predictor because it doesn't compare like for like: offsets measured around a cooler June average are applied unchanged to a warmer July average. A normalization method is probably better.

One option is to calculate the standardized residuals for each day in June and add those: (X - mu) / std, where mu is the mean for that day and std is the sample standard deviation. These are called the absolute residuals below.

Better still is to calculate the percentage residuals from the mean on each day, (X - mu) / mu.

But this is actually the same as scaling the June readings for any one day by the ratio of the expected July average to the average of that June day. The worked example below uses 22 degrees for July: if the average in June is 16, then multiply the June readings by 22/16 = 1.375.
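The equivalence can be written out in one line. The symbols are introduced here for illustration only: $x_t$ is a June reading, $\mu_{\text{June}}$ is the average of that June day, and $\mu_{\text{July}}$ is the expected July average.

$$
\hat{x}_t
= \mu_{\text{July}}\left(1 + \frac{x_t - \mu_{\text{June}}}{\mu_{\text{June}}}\right)
= \mu_{\text{July}} \cdot \frac{x_t}{\mu_{\text{June}}}
= x_t \cdot \frac{\mu_{\text{July}}}{\mu_{\text{June}}}
$$

With $\mu_{\text{June}} = 16$ and $\mu_{\text{July}} = 22$ the factor is $22/16 = 1.375$.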

In [2]:
import numpy as np
import pandas as pd

import matplotlib
from cycler import cycler
import matplotlib.pyplot as plt

pd.__version__

'0.24.2'

In [3]:
# Display the value of every expression in a cell, not just the last one.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Residuals

Absolute residuals

In [6]:
# Like a day in June.
mu, sigma = 16, 0.4 # mean and standard deviation
s0=np.random.normal(mu, sigma, 24)
s0=np.sort(s0)
s0

array([15.23950968, 15.25854225, 15.41795319, 15.49131178, 15.67496635,
       15.73867892, 15.86573684, 15.87682797, 15.92005993, 15.92392121,
       15.97383664, 15.98386534, 16.08186518, 16.0887025 , 16.12626189,
       16.14214232, 16.18745778, 16.20060806, 16.22515699, 16.23949539,
       16.31556415, 16.36149431, 16.44654032, 16.70269323])

In [7]:
# Sample statistics
mu0 = np.mean(s0)
std0 = np.std(s0, ddof=1)
(mu0, std0, mu0/std0)
r0 = (s0 - mu0)/std0
r0

(15.978466343345284, 0.3666751023025071, 43.57663294565073)

array([-2.01529   , -1.96338418, -1.52863707, -1.32857278, -0.82770821,
       -0.65395066, -0.30743703, -0.27718918, -0.15928654, -0.14875603,
       -0.01262617,  0.01472421,  0.28199033,  0.30063714,  0.4030695 ,
        0.44637875,  0.56996354,  0.60582711,  0.6727772 ,  0.71188103,
        0.91933652,  1.04459769,  1.27653603,  1.97511879])

In [8]:
# Like a day in July
mu1 = 22
s1=np.ones(s0.size) * mu1
s1

array([22., 22., 22., 22., 22., 22., 22., 22., 22., 22., 22., 22., 22.,
       22., 22., 22., 22., 22., 22., 22., 22., 22., 22., 22.])

In [9]:
# Add June's standardized residuals to the constant July mean.
s1x = s1 + r0
s1x

array([19.98471   , 20.03661582, 20.47136293, 20.67142722, 21.17229179,
       21.34604934, 21.69256297, 21.72281082, 21.84071346, 21.85124397,
       21.98737383, 22.01472421, 22.28199033, 22.30063714, 22.4030695 ,
       22.44637875, 22.56996354, 22.60582711, 22.6727772 , 22.71188103,
       22.91933652, 23.04459769, 23.27653603, 23.97511879])

In [10]:
mu1 = np.mean(s1x)
std1 = np.std(s1x, ddof=1)
(mu1, std1, mu1/std1)
r1x = (s1x - mu1)/std1
r1x

(22.0, 1.0, 22.0)

array([-2.01529   , -1.96338418, -1.52863707, -1.32857278, -0.82770821,
       -0.65395066, -0.30743703, -0.27718918, -0.15928654, -0.14875603,
       -0.01262617,  0.01472421,  0.28199033,  0.30063714,  0.4030695 ,
        0.44637875,  0.56996354,  0.60582711,  0.6727772 ,  0.71188103,
        0.91933652,  1.04459769,  1.27653603,  1.97511879])
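The output `(22.0, 1.0, 22.0)` above shows a side effect worth noting: because the residuals were divided by the standard deviation before being added, the July sample always ends up with a standard deviation of 1, whatever the spread was in June. The sketch below contrasts that with adding the raw offsets in degrees, which keeps June's spread. The seed and the variable names are arbitrary choices for this illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the sketch is repeatable
june = np.sort(np.random.normal(16, 0.4, 24))  # like a day in June
july_mean = 22

z = (june - june.mean()) / june.std(ddof=1)  # standardized residuals
offsets = june - june.mean()                 # raw offsets in degrees

july_from_z = july_mean + z
july_from_offsets = july_mean + offsets

# Adding z-scores forces a unit standard deviation (up to rounding).
print(july_from_z.std(ddof=1))
# Adding raw offsets reproduces June's standard deviation.
print(july_from_offsets.std(ddof=1), june.std(ddof=1))
```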

## Residuals

Ratio residuals

In [13]:
# Like a day in June.
s0

array([15.23950968, 15.25854225, 15.41795319, 15.49131178, 15.67496635,
       15.73867892, 15.86573684, 15.87682797, 15.92005993, 15.92392121,
       15.97383664, 15.98386534, 16.08186518, 16.0887025 , 16.12626189,
       16.14214232, 16.18745778, 16.20060806, 16.22515699, 16.23949539,
       16.31556415, 16.36149431, 16.44654032, 16.70269323])

In [14]:
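# Keep copies of the June sample statistics; mu0 and std0 are overwritten in the next cell.
mu00 = mu0
std00 = std0

# Fractional deviation of each reading from the June mean
s01 = (s0 - mu0)/mu0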
s01

array([-0.04624703, -0.04505589, -0.03507928, -0.03048819, -0.01899431,
       -0.01500691, -0.00705509, -0.00636096, -0.00365532, -0.00341367,
       -0.00028975,  0.00033789,  0.00647114,  0.00689904,  0.00924967,
        0.01024353,  0.01307957,  0.01390257,  0.01543894,  0.0163363 ,
        0.02109701,  0.02397151,  0.02929405,  0.04532518])

In [15]:
mu0 = np.mean(s01)
std0 = np.std(s01, ddof=1)
(mu0, std0, mu0/std0)
r0 = (s01 - mu0)/std0
r0

(6.071532165918825e-17, 0.022948078646810807, 2.6457692861195645e-15)

array([-2.01529   , -1.96338418, -1.52863707, -1.32857278, -0.82770821,
       -0.65395066, -0.30743703, -0.27718918, -0.15928654, -0.14875603,
       -0.01262617,  0.01472421,  0.28199033,  0.30063714,  0.4030695 ,
        0.44637875,  0.56996354,  0.60582711,  0.6727772 ,  0.71188103,
        0.91933652,  1.04459769,  1.27653603,  1.97511879])

In [19]:
# Undo the standardization (r0 * std0 + mu0 recovers s01), then rescale by the June mean.
s00 = (1 + (r0 * std0 + mu0)) * mu00

In [20]:
(np.mean(s00), np.std(s00, ddof=1), np.mean(s00)/np.std(s00, ddof=1))

(15.978466343345282, 0.3666751023025066, 43.576632945650786)

In [22]:
# Like a day in July
mu01 = 22

In [23]:
# Same reconstruction as for June, but scaled by the July mean.
s02 = (1 + (r0 * std0 + mu0)) * mu01
s02

array([20.98256527, 21.00877032, 21.22825576, 21.32925976, 21.58212511,
       21.66984795, 21.84478804, 21.8600589 , 21.91958296, 21.92489937,
       21.99362558, 22.00743363, 22.142365  , 22.15177898, 22.20349275,
       22.22535776, 22.2877505 , 22.3058565 , 22.33965677, 22.35939864,
       22.46413415, 22.52737322, 22.64446908, 22.99715399])

In [24]:
mu02 = np.mean(s02)
std02 = np.std(s02, ddof=1)
(mu02, std02, mu02/std02)

(22.0, 0.5048577302298372, 43.576632945650786)

In [26]:
(mu00, std00, mu00/std00)

(15.978466343345284, 0.3666751023025071, 43.57663294565073)

In [27]:
s02

array([20.98256527, 21.00877032, 21.22825576, 21.32925976, 21.58212511,
       21.66984795, 21.84478804, 21.8600589 , 21.91958296, 21.92489937,
       21.99362558, 22.00743363, 22.142365  , 22.15177898, 22.20349275,
       22.22535776, 22.2877505 , 22.3058565 , 22.33965677, 22.35939864,
       22.46413415, 22.52737322, 22.64446908, 22.99715399])

In [28]:
s00

array([15.23950968, 15.25854225, 15.41795319, 15.49131178, 15.67496635,
       15.73867892, 15.86573684, 15.87682797, 15.92005993, 15.92392121,
       15.97383664, 15.98386534, 16.08186518, 16.0887025 , 16.12626189,
       16.14214232, 16.18745778, 16.20060806, 16.22515699, 16.23949539,
       16.31556415, 16.36149431, 16.44654032, 16.70269323])

In [29]:
s02 / s00

array([1.37685304, 1.37685304, 1.37685304, 1.37685304, 1.37685304,
       1.37685304, 1.37685304, 1.37685304, 1.37685304, 1.37685304,
       1.37685304, 1.37685304, 1.37685304, 1.37685304, 1.37685304,
       1.37685304, 1.37685304, 1.37685304, 1.37685304, 1.37685304,
       1.37685304, 1.37685304, 1.37685304, 1.37685304])

In [31]:
22/16

1.375

In [34]:
s0 * 22/np.mean(s0)

array([20.98256527, 21.00877032, 21.22825576, 21.32925976, 21.58212511,
       21.66984795, 21.84478804, 21.8600589 , 21.91958296, 21.92489937,
       21.99362558, 22.00743363, 22.142365  , 22.15177898, 22.20349275,
       22.22535776, 22.2877505 , 22.3058565 , 22.33965677, 22.35939864,
       22.46413415, 22.52737322, 22.64446908, 22.99715399])
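Putting the scaling to work on the actual task might look like the sketch below. It is only a sketch under assumptions: the June readings, the year 2019 and the column layout are invented here, and the 25 degree July average is taken from the brief rather than the 22 used in the toy example above. The output is written to an in-memory buffer. Passing a file name such as `july_forecast.txt` (a hypothetical name) to `to_csv` would write the file to disk instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical hourly June readings for days 1 to 9 (values and year are made up).
times = pd.date_range("2019-06-01", periods=9 * 24, freq="60min")
june = pd.Series(16 + 2 * np.sin(np.arange(times.size) * 2 * np.pi / 24), index=times)

JULY_MEAN = 25.0  # average July temperature given in the brief

# Scale every reading by (July mean / mean of the same June day).
daily_mean = june.groupby(june.index.day).transform("mean")
july = june * JULY_MEAN / daily_mean
july.index = june.index + pd.DateOffset(months=1)

# One row per sample: date, time, forecast temperature.
forecast = pd.DataFrame({
    "dd/mm/yyyy": july.index.strftime("%d/%m/%Y"),
    "Time": july.index.strftime("%H:%M"),
    "Outside Temperature": july.round(2).values,
})

# In-memory target for the sketch; replace with a path to write a real .txt file.
buffer = io.StringIO()
forecast.to_csv(buffer, index=False)
```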