<h1><center>Data Collection and Cleaning</center></h1>

# Prediction Input Parameters
1. PoolReturnTemp (Done)
1. OATemp (Done)
1. OAHum **
1. UV Index (Done)
1. HourOfDay (Need to extract)
1. DayOfYear (Need to extract)
1. Flow (PumpPower) **
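
HourOfDay and DayOfYear can both be derived from the `last_changed` timestamp. A minimal sketch, assuming the timestamps are ISO-formatted strings like the ones in `hpdatabuckets.csv`; the helper name `time_features` is hypothetical:

```python
from datetime import datetime


def time_features(last_changed):
    """Return (hour_of_day, day_of_year) for an ISO-formatted timestamp string."""
    ts = datetime.fromisoformat(last_changed)
    # tm_yday counts from 1 (1 January) to 365 or 366.
    return ts.hour, ts.timetuple().tm_yday


hour_of_day, day_of_year = time_features("2020-06-22 16:30:11.285661-04:00")
print(hour_of_day, day_of_year)  # 16 174
```

Both values are taken in the timestamp's own UTC offset, so they reflect local time at the pool.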


# Other Inputs
1. "Efficiency" (Done)

# Captured Trainable Output Parameters
1. Predicted Temp Rise (poolheatedtemp, done)
1. Predicted PoolReturnTemp (t+1h)
1. Predicted PoolReturnTemp (t+3h)
1. Predicted HeatingPowerDemand (Done)

# Historical Data
## Data Rows
1. t
1. t+5min
1. t+10min

## Inputs
1. OATemp
1. OAHum
1. CloudCover/SolarLoad/UV Index
1. PoolReturnTemp
1. HourOfDay
1. DayOfYear

## Outputs
1. HeatingPowerDemand
1. TempRise
1. PoolReturnTemp(t+1h)
1. PoolReturnTemp(t+3h)

# Preprocessors
1. Extractor - accumulates rows of data. The source reports a value only when it changes, so the extractor carries the last known value of every field forward into each row.
```
python3 ./extractor.py > hpdata.csv
```  
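
The carry-forward idea behind the extractor can be sketched as follows. This is not the code in `extractor.py`; the `(timestamp, name, value)` event format and the function name `accumulate_rows` are assumptions made for illustration:

```python
def accumulate_rows(events):
    """Turn change-only events into full rows by remembering the last value of each field."""
    state = {}  # latest known value per sensor name
    rows = []
    for timestamp, name, value in events:
        state[name] = value
        # Emit a complete snapshot every time any single value changes.
        rows.append({"last_changed": timestamp, **state})
    return rows


events = [
    ("16:30:11", "oa_temp", 90.4),
    ("16:31:02", "pooltemp", 90.0),
    ("16:50:01", "oa_temp", 90.3),
]
for row in accumulate_rows(events):
    print(row)
```

Fields that have not reported yet are simply absent from the early rows, which is one way missing values can end up in the CSV.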
1. Runtime filter - removes data recorded while the system is not operational. This is important because the pool temperature can't be measured then. It also removes the first 5 minutes of each day, while the system is stabilizing. (Note: this still needs to be completed. For now it uses pump power levels, which lets some bad data through.)
```
python3 ./filterrunonly.py  hpdata.csv  > hpdatafiltered.csv
```
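
A rough sketch of the pump-power approach described above, not the contents of `filterrunonly.py`. The 100 W threshold and the `(timestamp, pumppower)` row format are assumptions:

```python
from datetime import datetime, timedelta

PUMP_RUNNING_WATTS = 100.0      # hypothetical threshold for "pump is running"
WARMUP = timedelta(minutes=5)   # stabilization period dropped at the start of each day


def filter_running(rows):
    """Keep rows where the pump runs, skipping the first 5 minutes of each day's run."""
    first_run_of_day = {}
    kept = []
    for ts, pumppower in rows:
        if pumppower < PUMP_RUNNING_WATTS:
            continue  # system not operational, pool temperature is not meaningful
        start = first_run_of_day.setdefault(ts.date(), ts)
        if ts - start < WARMUP:
            continue  # still stabilizing
        kept.append((ts, pumppower))
    return kept
```

Rows are expected in time order. The warm-up window is measured from the first running sample of each calendar day, not from midnight.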
1. Time slicer - reduces the data to one row per 5-minute bucket.
1. Future Capture - for each 5-minute bucket, looks forward 3 hours to determine the pool temperature rise. The end of the day needs special handling: for the last 3 hours, take temp rise * hours difference / 3. It might be possible to use that calculation for every row and always look forward 36 buckets (36 x 5 min = 3 h). Only report a row if there is enough future data to show a change (drop the tail).
1. Convert for linux (if needed)
```
dos2unix hpdata.csv
```
1. Remove any NaN
```
# Remove nan entries
sed -i '/nan/d' ./hpdata.csv
```
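
The time slicer and future capture steps can be sketched together. This is one possible reading of the description above, not the project's code. It assumes rows are `(timestamp, pooltemp)` pairs sorted by time; the names and the 60-minute minimum look-ahead are hypothetical. The per-hour rate is the temperature change divided by the look-ahead in hours, which matches how `fwd_delta_per_hr` relates to `fwd_delta_temp` and `fwd_min` in the data.

```python
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=5)
LOOKAHEAD_BUCKETS = 36  # 36 x 5 min = 3 hours


def bucket_start(ts):
    """Floor a timestamp to the start of its 5-minute bucket."""
    return ts - timedelta(minutes=ts.minute % 5, seconds=ts.second,
                          microseconds=ts.microsecond)


def slice_to_buckets(rows):
    """Keep the first row that falls in each 5-minute bucket."""
    seen, out = set(), []
    for ts, temp in rows:
        key = bucket_start(ts)
        if key not in seen:
            seen.add(key)
            out.append((ts, temp))
    return out


def forward_capture(buckets, min_minutes=60.0):
    """Return (ts, fwd_min, fwd_delta_temp, fwd_delta_per_hr) for each usable bucket."""
    results = []
    for i, (ts, temp) in enumerate(buckets):
        j = min(i + LOOKAHEAD_BUCKETS, len(buckets) - 1)
        while j > i and buckets[j][0].date() != ts.date():
            j -= 1  # never look past the end of the same day
        fwd_min = (buckets[j][0] - ts).total_seconds() / 60.0
        if fwd_min < min_minutes:
            continue  # not enough future data: drop the tail
        fwd_delta_temp = buckets[j][1] - temp
        results.append((ts, fwd_min, fwd_delta_temp, fwd_delta_temp / (fwd_min / 60.0)))
    return results
```

Because the look-ahead is counted in buckets, gaps left by the runtime filter stretch the real time span; recording `fwd_min` alongside the delta keeps the per-hour rate correct in that case.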


In [47]:
import pandas as pd
import numpy as np
from datetime import datetime


# Make numpy values easier to read.
np.set_printoptions(precision=3, suppress=True)

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

In [48]:

# `names` takes a sequence of column names; dtypes are inferred from the data.
column_names = [
    "last_changed", "hppower", "pumppower", "poolheatedtemp", "pooltemp",
    "oa_temp", "oa_hum", "uv_index", "efficiency",
    "fwd_min", "fwd_delta_temp", "fwd_delta_per_hr",
]

pool_train = pd.read_csv("hpdatabuckets.csv", names=column_names, low_memory=False)

pool_train.head()

Unnamed: 0,last_changed,hppower,pumppower,poolheatedtemp,pooltemp,oa_temp,oa_hum,uv_index,efficiency,fwd_min,fwd_delta_temp,fwd_delta_per_hr
0,2020-06-22 16:30:11.285661-04:00,4742.0,835.0,94.0,90.0,90.4,59.0,5.0,8.435259384226065,174.813129,-1.0,-0.343224
1,2020-06-22 16:35:01.054978-04:00,0.0,293.0,90.0,90.0,90.4,59.0,5.0,inf,174.983538,-1.0,-0.342889
2,2020-06-22 16:40:00.076547-04:00,0.0,296.0,90.0,90.0,90.4,59.0,5.0,inf,175.032815,-1.0,-0.342793
3,2020-06-22 16:45:00.084615-04:00,0.0,297.0,90.0,90.0,90.4,59.0,5.0,inf,175.034219,-1.0,-0.34279
4,2020-06-22 16:50:01.165865-04:00,0.0,297.0,90.0,90.0,90.3,59.0,4.0,inf,174.981548,-1.0,-0.342893
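
The `efficiency` column above is `inf` wherever `hppower` is 0, and the `sed` step only matches the text `nan`. A minimal pandas sketch for dropping rows with non-finite values in just the columns used for training; the sample frame and the `training_cols` list are illustrative, not taken from the real file:

```python
import numpy as np
import pandas as pd

# Illustrative rows: one clean, one with inf efficiency, one with a missing humidity.
df = pd.DataFrame({
    "hppower": [4742.0, 0.0, 0.0],
    "pooltemp": [90.0, 90.0, 90.0],
    "oa_temp": [90.4, 90.4, 90.3],
    "oa_hum": [59.0, 59.0, np.nan],
    "uv_index": [5.0, 5.0, 4.0],
    "efficiency": [8.435, np.inf, np.inf],
    "fwd_delta_per_hr": [-0.343, -0.343, -0.343],
})

training_cols = ["hppower", "pooltemp", "oa_temp", "oa_hum", "uv_index", "fwd_delta_per_hr"]

# Treat +/-inf like NaN, then drop rows that are incomplete in the training columns only.
clean = df.replace([np.inf, -np.inf], np.nan).dropna(subset=training_cols)
print(len(clean))  # 2
```

Restricting the check to `training_cols` keeps rows whose only problem is an infinite `efficiency`, since that column is removed before training anyway.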


In [67]:
pool_features = pool_train.copy()
pool_labels = pool_features.pop("fwd_delta_per_hr")

# Drop the columns that are not used as model inputs.
pool_extran1 = pool_features.pop("last_changed")
pool_pumppower = pool_features.pop("pumppower")
pool_heatedtemp = pool_features.pop("poolheatedtemp")
pool_efficiency = pool_features.pop("efficiency")
pool_fwd_min = pool_features.pop("fwd_min")
pool_fwd_delta_temp = pool_features.pop("fwd_delta_temp")

pool_features_ary = np.array(pool_features)
pool_features_ary

array([[4742. ,   90. ,   90.4,   59. ,    5. ],
       [   0. ,   90. ,   90.4,   59. ,    5. ],
       [   0. ,   90. ,   90.4,   59. ,    5. ],
       ...,
       [   0. ,   51. ,   39.6,   73. ,    0. ],
       [   0. ,   51. ,   39.6,   73. ,    0. ],
       [   0. ,   51. ,   39.3,   73. ,    0. ]])

In [69]:
pool_model = tf.keras.Sequential([
  layers.Dense(64),
  layers.Dense(1)
])

pool_model.compile(loss = tf.losses.MeanSquaredError(),
                      optimizer = tf.optimizers.Adam())
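
`layers.Dense` applies no activation unless one is passed, so the two layers above compose into a single linear map of the five inputs. The NumPy sketch below checks that identity with random weights of the same shapes. Passing something like `activation="relu"` to the first layer would be one way to let the model fit non-linear behavior; that change is a suggestion, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same shapes as Dense(64) followed by Dense(1) on 5 input features.
W1, b1 = rng.normal(size=(5, 64)), rng.normal(size=64)
W2, b2 = rng.normal(size=(64, 1)), rng.normal(size=1)
x = rng.normal(size=(10, 5))

# Two stacked layers without activations...
two_layer = (x @ W1 + b1) @ W2 + b2

# ...equal one linear layer with combined weights and bias.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layer, one_layer))  # True
```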

In [70]:
pool_model.fit(pool_features_ary, pool_labels, epochs=10)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f3d1868d5f8>

In [71]:
normalize = preprocessing.Normalization()


In [72]:
normalize.adapt(pool_features_ary)
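
During `adapt`, the `Normalization` layer learns a mean and variance for each feature column, and it then outputs `(x - mean) / sqrt(variance)`. A NumPy sketch of the same computation on a few rows copied from `pool_features_ary`; the small `eps` guard against constant columns is an addition for the sketch:

```python
import numpy as np

x = np.array([
    [4742.0, 90.0, 90.4, 59.0, 5.0],
    [0.0, 90.0, 90.4, 59.0, 5.0],
    [0.0, 51.0, 39.6, 73.0, 0.0],
    [0.0, 51.0, 39.3, 73.0, 0.0],
])

mean = x.mean(axis=0)  # one mean per feature column
std = x.std(axis=0)    # population standard deviation, i.e. sqrt(variance)
eps = 1e-7             # avoids division by zero for a constant column

x_norm = (x - mean) / np.maximum(std, eps)
print(x_norm.mean(axis=0).round(6), x_norm.std(axis=0).round(6))
```

After scaling, `hppower` (thousands of watts) and `uv_index` (single digits) sit on comparable ranges, which generally makes gradient-based training better behaved.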


In [73]:
norm_pool_model = tf.keras.Sequential([
  normalize,
  layers.Dense(64),
  layers.Dense(1)
])

norm_pool_model.compile(loss = tf.losses.MeanSquaredError(),
                           optimizer = tf.optimizers.Adam())

norm_pool_model.fit(pool_features_ary, pool_labels, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f3d186a46a0>

In [74]:
dark_sky_temperature = np.array(pool_features['oa_temp'])
dark_sky_temperature[:100]

array([90.4, 90.4, 90.4, 90.4, 90.3, 90.2, 90.2, 90.2, 90.2, 90. , 89.9,
       89.8, 89.8, 89.6, 89.1, 89.1, 88.9, 88.8, 88.7, 88.6, 88.5, 88.5,
       88.2, 88. , 87.9, 87.7, 87.7, 87.5, 87.3, 87.3, 86.9, 86.7, 86.7,
       86.3, 86. , 85.8, 85.6, 85.3, 85.2, 85.2, 85.2, 85.2, 84.7, 84.7,
       84.1, 83.9, 83.7, 83.4, 82.8, 82.6, 82.7, 82.7, 82.9, 82.7, 82.5,
       82.3, 82.1, 81.9, 81.7, 81.5, 81.2, 81. , 81. , 80.8, 80.6, 80.6,
       80.2, 80. , 79.9, 79.9, 79.6, 79.5, 79.3, 79.3, 78.9, 78.8, 78.8,
       78.6, 78.5, 80.4, 80.5, 80.7, 80.7, 81. , 81.1, 81.2, 81.1, 81.2,
       81.2, 81.4, 81.5, 81.6, 81.8, 81.9, 81.9, 81.9, 82.2, 82.3, 82.5,
       82.7])

In [75]:
pooltemp = np.array(pool_features['pooltemp'])
pooltemp[:10]

array([90., 90., 90., 90., 90., 90., 90., 90., 90., 90.])

In [96]:
hppower = 4500
pooltemp = 90
oa_temp = 50
oa_hum = 10
uv_index = 0

dp = np.array([[hppower, pooltemp,oa_temp,oa_hum,uv_index]])
print(dp)

fwd_delta_per_hr = pool_model.predict(dp)
print(fwd_delta_per_hr)


[[4500   90   50   10    0]]
[[0.933]]
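
The row passed to `predict` has to list the features in the same order as the training columns (`hppower`, `pooltemp`, `oa_temp`, `oa_hum`, `uv_index`). A small hypothetical helper, `make_feature_row`, makes that order explicit and fails loudly when a value is missing:

```python
import numpy as np

# Column order of pool_features_ary after the unused columns are popped.
FEATURE_ORDER = ["hppower", "pooltemp", "oa_temp", "oa_hum", "uv_index"]


def make_feature_row(**values):
    """Build a (1, 5) array in training-column order from keyword arguments."""
    missing = [name for name in FEATURE_ORDER if name not in values]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return np.array([[float(values[name]) for name in FEATURE_ORDER]])


dp = make_feature_row(hppower=4500, pooltemp=90, oa_temp=50, oa_hum=10, uv_index=0)
print(dp.shape)  # (1, 5)
```

The same row can be passed to either `pool_model.predict` or `norm_pool_model.predict`, since the normalized model scales its input internally.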


In [58]:
pool_model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_6 (Dense)              (None, 64)                320       
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 65        
=================================================================
Total params: 385
Trainable params: 385
Non-trainable params: 0
_________________________________________________________________
