# Inferencing

This notebook runs inference in parallel for many models, using the models saved by the training notebook.

At a high level, we use the last 48 hours of meter data (for the lag features) and forecasted weather data to predict the load of each
meter with the previously trained models.

The historical data required is in this schema (similar to the creation of the training data set):
```json
{
  "MeterNumber": "string",
  "ReadingTimestamp": "datetime",
  "KWHConsumption": "double",
  "Latitude": "double",
  "Longitude": "double",
  "Grouping": "string"
}
```
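As a quick pre-flight check, one historical reading can be validated against the schema above before it is loaded into Spark. This is a minimal sketch: the function name `validate_reading` and the mapping of schema type names to Python types (`string` to `str`, `datetime` to `datetime`, `double` to `float`) are assumptions for illustration, and the sample record is made up.

```python
from datetime import datetime

# Assumed mapping of the schema's type names to Python types
EXPECTED_TYPES = {
    "MeterNumber": str,
    "ReadingTimestamp": datetime,
    "KWHConsumption": float,
    "Latitude": float,
    "Longitude": float,
    "Grouping": str,
}


def validate_reading(record: dict) -> list:
    """Return a list of problems in one reading; an empty list means it matches the schema."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(
                f"{field} should be {expected.__name__}, got {type(record[field]).__name__}"
            )
    return problems


# Hypothetical record (the location matches the one in this notebook's outputs)
reading = {
    "MeterNumber": "meter-001",
    "ReadingTimestamp": datetime(2022, 10, 28, 23, 15),
    "KWHConsumption": 0.42,
    "Latitude": 51.0,
    "Longitude": -114.0,
    "Grouping": "0",
}
print(validate_reading(reading))  # []
```

Note that an integer such as `17` fails the `double` check here; whether to accept ints is a design choice the sketch leaves strict.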

We then generate, per meter, a dataset for the models that uses the historical values as lag inputs, together with the forecasted weather.
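For a single meter group, building the lag inputs comes down to indexing back into an hourly series. The pure-Python sketch below (the function name `latest_lag_features` is hypothetical) mirrors what Spark's `lag` over a window ordered by `ReadingTimestamp`, followed by keeping only the latest row, yields for one group, assuming exactly one value per hour with no gaps. It also shows why more than 48 hourly values are needed to populate `lag_48`.

```python
def latest_lag_features(hourly_kwh: list, max_lag: int) -> dict:
    """Build KWHConsumption plus lag_1..lag_<max_lag> for the most recent hour.

    hourly_kwh holds one value per hour, ordered oldest -> newest, with no gaps.
    """
    if len(hourly_kwh) <= max_lag:
        # lag_<max_lag> needs max_lag earlier hours in addition to the latest one
        raise ValueError(f"need more than {max_lag} hourly values, got {len(hourly_kwh)}")
    features = {"KWHConsumption": hourly_kwh[-1]}
    for k in range(1, max_lag + 1):
        features[f"lag_{k}"] = hourly_kwh[-1 - k]  # value k hours before the latest hour
    return features


series = [float(h) for h in range(50)]  # 50 consecutive hours of made-up readings
features = latest_lag_features(series, max_lag=48)
print(features["KWHConsumption"], features["lag_1"], features["lag_48"])  # 49.0 48.0 1.0
```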

In [None]:

# Data manipulation
# ==============================================================================
import numpy as np
import pandas as pd

# PySpark 
# ==============================================================================
# note: importing `sum` shadows Python's built-in sum in this notebook
from pyspark.sql.functions import col, avg, sum, lit, date_trunc, row_number, lag, udf
from pyspark.sql.window import Window

import os
from datetime import datetime
from datetime import timedelta
from datetime import timezone
import requests

import cloudpickle



In [None]:
#params
lags = [1,2,3,4,22,23,24,46,47,48]

## Enter your Azure Maps information ####
## Don't do this in production; keep the key in Azure Key Vault
########################################
azureMapsClientId = '<your-azure-maps-client-id>'
azureMapsKey = '<your-azure-maps-subscription-key>'
#########################################

In [None]:
# mount the data lake
###
###  It is up to you how to mount the data lake container; however, it must be mounted at /mnt/lf for this example
###

Mounted load-forecasting successfully
Out[6]: 'wasbs://load-forecasting@stgmadlssharedcc.blob.core.windows.net'

In [None]:
# load the historical meter data used for the lags (T - 48 hours at minimum)
# this file is used for the demo
latest_meter_reads_df = spark.read.parquet('/mnt/lf/inference/current_readings.parquet')

# read the latest timestamp, truncated to the hour; this is T
# (.first() avoids pulling every row to the driver just to read one value)
latest_timestamp = latest_meter_reads_df\
    .select(date_trunc('hour', col("ReadingTimestamp")).alias("ForecastTimestamp"))\
    .orderBy(col("ForecastTimestamp").desc())\
    .first()["ForecastTimestamp"]
print(latest_timestamp)

# look back further than the largest lag (48 hours) so that every lag column can be filled
lag_start_time = latest_timestamp + timedelta(hours = -128)
start_pred_time = latest_timestamp
end_pred_time = latest_timestamp + timedelta(hours = 24)

print("Lag start time for prediction: "+str(lag_start_time))
print("Start time for prediction: "+str(start_pred_time))
print("End time for prediction: "+str(end_pred_time))
lag_meter_reads_df = latest_meter_reads_df.where(col("ReadingTimestamp") > lag_start_time)

# roll up to hourly values per group: average location, total kWh
lag_meter_reads_df = lag_meter_reads_df.groupBy("Grouping",date_trunc('hour', lag_meter_reads_df.ReadingTimestamp).alias("ReadingTimestamp"))\
    .agg(\
         avg("Latitude").alias("Latitude"),\
         avg("Longitude").alias("Longitude"),\
         sum("KWHConsumption").alias("KWHConsumption"))

2022-10-28 23:00:00
Lag start time for prediction: 2022-10-23 15:00:00
Start time for prediction: 2022-10-28 23:00:00
End time for prediction: 2022-10-29 23:00:00
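The window arithmetic and hourly roll-up in the cell above can be checked without Spark. This stdlib sketch uses hypothetical helper names and naive datetimes; it reproduces the printed window for the demo data (T = 2022-10-28 23:00) and sums readings per group and hour with the same strict `>` filter.

```python
from collections import defaultdict
from datetime import datetime, timedelta

LOOKBACK_HOURS = 128  # the notebook's timedelta(hours=-128)
HORIZON_HOURS = 24    # forecast T through T+24


def truncate_to_hour(ts: datetime) -> datetime:
    # equivalent of date_trunc('hour', ts) for a naive datetime
    return ts.replace(minute=0, second=0, microsecond=0)


def prediction_window(latest_reading: datetime):
    t = truncate_to_hour(latest_reading)  # this is T
    return t - timedelta(hours=LOOKBACK_HOURS), t, t + timedelta(hours=HORIZON_HOURS)


def hourly_totals(readings, lag_start):
    """Sum kWh per (group, hour) for readings strictly after lag_start."""
    totals = defaultdict(float)
    for group, ts, kwh in readings:
        if ts > lag_start:  # same strict filter as ReadingTimestamp > lag_start_time
            totals[(group, truncate_to_hour(ts))] += kwh
    return dict(totals)


lag_start, start, end = prediction_window(datetime(2022, 10, 28, 23, 45))
print(lag_start, start, end)  # 2022-10-23 15:00:00 2022-10-28 23:00:00 2022-10-29 23:00:00
```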


## Generate the initial inferencing dataset

The inferencing dataset is each meter joined with the latest timestamp.

At a high level, we do the following:

1. Retrieve the latest time in the historical dataset. This serves as 'T'.
2. Create a dataset that covers the range T to T+24 (inclusive) for each model. We end up with something like `'MeterNumber','ForecastTimestamp'` (in this notebook the meter key column is `Grouping`).
3. Generate the lags for our models.
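Step 2 is a cross join of every group with every forecast hour. The notebook builds it with `generate_dates` and `crossJoin`; since `generate_dates` is not shown here, the stdlib sketch below (hypothetical `forecast_grid`) only illustrates the intended shape, assuming an hourly step with both endpoints included.

```python
from datetime import datetime, timedelta
from itertools import product


def forecast_grid(groups, start, end, step=timedelta(hours=1)):
    """Return (group, forecast_hour) pairs for every step from start to end, inclusive."""
    hours = []
    t = start
    while t <= end:
        hours.append(t)
        t += step
    # cross join: every group gets every forecast hour
    return list(product(groups, hours))


grid = forecast_grid(["0", "1"], datetime(2022, 10, 28, 23), datetime(2022, 10, 29, 23))
print(len(grid))  # 50: 25 hours (T..T+24 inclusive) x 2 groups
```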

In [None]:
#set up lags
lag_window = Window.partitionBy("Grouping").orderBy(col("ReadingTimestamp"))
select_window = Window.partitionBy("Grouping").orderBy(col("ReadingTimestamp").desc())

inference_start_df = lag_meter_reads_df

# generate every lag up to the largest configured lag (not only the ones in `lags`),
# because the prediction loop shifts values through all of them
for lag_num in range(1, np.amax(lags) + 1):
    lag_column_name = "lag_"+str(lag_num)
    inference_start_df = inference_start_df.withColumn(lag_column_name, lag("KWHConsumption", lag_num).over(lag_window))
    
inference_start_df = inference_start_df.withColumn("row", row_number().over(select_window))\
    .where(col("row") == 1)\
    .drop("row")
display(inference_start_df)

Grouping,ReadingTimestamp,Latitude,Longitude,KWHConsumption,lag_1,lag_2,lag_3,lag_4,lag_5,lag_6,lag_7,lag_8,lag_9,lag_10,lag_11,lag_12,lag_13,lag_14,lag_15,lag_16,lag_17,lag_18,lag_19,lag_20,lag_21,lag_22,lag_23,lag_24,lag_25,lag_26,lag_27,lag_28,lag_29,lag_30,lag_31,lag_32,lag_33,lag_34,lag_35,lag_36,lag_37,lag_38,lag_39,lag_40,lag_41,lag_42,lag_43,lag_44,lag_45,lag_46,lag_47,lag_48
0,2022-10-28T23:00:00.000+0000,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604
1,2022-10-28T23:00:00.000+0000,51.0,-114.0,17.039999844506383,14.39000004157424,15.139999940991402,12.040000038221478,11.700000006705524,12.410000011324884,10.239999998360872,8.649999903514981,8.050000032410026,10.219999942928553,8.94999997317791,13.240000078454614,6.509999999776483,6.109999975189567,7.530000006780028,6.579999998211861,8.369999943301082,7.489999990910292,8.680000018328428,12.09999997727573,14.919999873265624,17.849999947473407,21.2300001103431,15.459999926388264,10.779999924823642,13.260000057518482,6.949999963864684,7.659999990835786,13.750000104308128,10.04999995790422,7.949999962002039,9.24999998509884,9.220000011846423,9.210000041872265,13.089999999850988,10.129999900236726,7.749999994412064,6.46000000834465,7.820000026375055,6.6200000159442425,10.229999946430326,9.289999963715672,7.03000003285706,8.439999965950847,16.429999953135848,20.16000002995133,17.890000076964498,15.959999907761812,11.320000007748604
2,2022-10-28T23:00:00.000+0000,51.0,-114.0,15.529999973252416,17.680000016465783,12.599999949336052,12.869999995455146,18.229999849572778,22.18999987654388,19.630000058561563,19.769999923184518,19.97999998368323,11.960000043734908,12.54999998025596,15.009999997913836,8.709999995306134,8.790000015869737,9.46000001952052,8.959999987855554,10.35999995470047,11.620000002905726,13.780000044032931,17.3600000962615,16.46999995224178,18.84999994933605,20.109999937936664,16.02999997884035,15.969999939203262,18.21999998949468,14.190000044181945,13.999999966472387,18.230000089854,15.220000080764294,15.230000156909227,17.54000007547438,12.089999984949827,12.890000043436885,17.289999993517995,14.420000087469816,11.049999997019768,11.38999997265637,10.650000043213367,13.089999916031957,12.160000029951334,11.98999996110797,14.000000001862643,19.38999993726611,22.80000008642673,21.85999983549118,22.30999972112477,21.289999971166253,22.420000042766333
3,2022-10-28T23:00:00.000+0000,51.0,-114.0,8.970000023022294,9.470000067725778,8.610000051558018,9.4100000243634,8.09000002220273,10.500000029802322,11.029999980702996,9.430000025779009,9.269999958574772,10.970000006258488,10.509999949485064,10.450000016018748,9.620000014081596,7.63999998010695,8.28999998420477,8.929999899119139,10.38000001758337,11.579999988898637,14.030000073835254,10.77999996393919,17.33000000938773,19.97999992966652,17.49999985843897,15.850000075995922,11.049999922513962,9.770000064745544,11.05000001564622,9.199999997392297,12.150000041350722,10.77000005170703,14.789999973028898,11.120000012218952,10.199999978765844,13.910000002011657,12.699999945238233,10.410000020638108,9.040000040084124,9.50999997369945,9.030000003054738,10.71000000834465,11.15000002272427,12.480000039562585,13.159999918192623,13.10000006854534,16.009999986737967,18.40999997779727,17.70000008121133,15.64999994635582,14.830000046640636
4,2022-10-28T23:00:00.000+0000,51.0,-114.0,18.359999967738982,15.130000058561563,13.75000000745058,14.38999985717237,16.999999960884452,14.929999995976686,16.600000077858567,14.559999955818055,14.8199999127537,13.089999958872797,13.38999997638166,14.83000005595386,12.009999990463257,12.629999993368983,10.859999991953371,11.900000050663948,12.440000029280782,13.11999992467463,15.49999994598329,17.880000103265047,16.240000009536743,21.219999922439456,18.149999963119622,24.300000060349703,16.939999831840396,18.35999995842576,16.809999957680702,22.65000008791685,21.610000018030405,12.430000023916364,19.26999997533858,13.390000030398369,11.620000053197144,13.090000009164214,11.499999973922968,13.30000003054738,10.809999980032444,12.779999863356352,9.259999960660934,9.520000003278255,8.930000038817525,15.570000056177378,16.279999984428287,16.880000069737434,20.24000001139939,18.13999993912876,19.62999989464879,19.349999951198697,15.17000011727214
5,2022-10-28T23:00:00.000+0000,51.0,-114.0,14.830000022426248,12.580000046640636,13.51000008545816,18.34000010602176,17.71999993175268,13.840000092983246,16.0499998498708,14.180000009015204,13.739999949932098,15.170000031590462,15.21000001206994,13.11999998986721,11.639999948441982,10.499999929219484,12.77999996766448,10.279999932274222,12.10999995097518,15.50999990478158,14.35999995097518,13.969999993219972,16.849999979138374,18.02999989129603,23.679999943822622,15.90000006556511,11.55000009201467,12.240000056102872,15.909999949857593,16.140000112354755,12.74000003747642,12.229999959468842,11.059999980032444,12.930000001564622,15.010000046342611,15.970000056549909,15.070000000298023,12.050000045448542,11.899999951943755,11.510000059381127,11.780000044032931,14.390000080689788,14.00000008009374,13.860000092536213,13.760000115260482,19.320000080391765,22.07999991066754,20.299999829381704,22.50999996252358,16.670000098645687,12.559999937191606
6,2022-10-28T23:00:00.000+0000,51.0,-114.0,23.84999999217689,20.370000021532174,17.629999924451113,15.750000117346644,22.850000070407987,17.319999936968088,16.010000051930547,16.84000001102686,16.169999899342656,17.73000008612871,19.939999900758263,12.17999996803701,12.849999990314243,11.209999993443487,10.86999997124076,12.129999982193112,10.260000012815,11.729999950155616,12.100000012665989,16.46999995596707,18.89000003412366,16.170000018551946,19.199999935925007,16.93999994173646,18.80000010319054,16.08999990299344,15.349999878555536,19.909999990835782,18.680000115185976,16.369999999180436,17.549999970942736,15.479999991133807,14.209999961778522,18.440000101923943,18.870000042021275,18.010000003501773,9.869999883696437,10.759999994188547,10.079999977722764,11.59000007994473,11.37999996356666,11.730000076815486,12.759999990463257,14.4800001103431,15.140000090003014,17.50000006891787,16.44999996200204,17.219999887049198,16.76999987848103
7,2022-10-28T23:00:00.000+0000,51.0,-114.0,12.00000006519258,14.5399999152869,14.99999999254942,15.70999987050891,21.420000091195107,17.650000168010592,15.670000018551946,17.070000106468797,12.719999950379131,12.92000000923872,11.069999974220991,12.659999944269655,17.339999889954925,17.889999823644757,16.5799998678267,15.989999894052744,17.300000067800283,16.77999985590577,19.86000019311905,20.96000012010336,17.209999982267618,15.950000002980232,21.539999989792705,15.319999953731894,18.359999960288405,20.78999986872077,21.789999969303608,18.720000000670552,14.359999993816018,18.61000000499189,13.259999914094806,13.520000018179417,12.31000006198883,14.829999890178442,14.23999996110797,11.439999960362911,10.290000019595029,9.6500000115484,8.389999963343143,8.749999923631549,9.369999984279277,10.440000034868715,11.60999997332692,17.11000006273389,19.72000006213784,19.599999852478504,20.139999946579337,19.07000006735325,16.219999888911843
8,2022-10-28T23:00:00.000+0000,51.0,-114.0,21.090000078082085,19.829999905079603,18.45000007748604,15.490000039339066,18.66999995522201,16.270000010728836,19.239999890327454,17.870000019669533,18.750000171363357,13.28999998793006,15.699999989941716,11.599999982863665,10.819999974220991,11.179999969899654,8.390000026673079,9.289999974891543,8.740000003948808,10.549999983981252,13.310000039637089,16.429999934509397,18.460000129416585,17.039999913424253,18.440000023692846,19.17000006698072,20.550000032410026,15.60999989323318,10.809999968856571,16.01000007428229,12.75999996997416,11.94000000320375,16.36000001989305,18.25000002980232,14.970000008121133,20.869999984279275,15.540000006556513,13.790000027045608,10.409999992698433,10.54999988526106,10.17999998293817,7.580000020563602,8.659999992698431,8.730000028386712,15.329999981448054,15.419999916106462,22.46999991312623,20.84000001847744,25.829999973997477,16.99999988824129,24.57999999448657
9,2022-10-28T23:00:00.000+0000,51.0,-114.0,12.479999965056775,9.979999989271164,9.2699998896569,10.840000042691829,15.230000173673034,21.989999901503325,20.08000001870096,17.079999949783087,13.449999995529652,16.159999951720238,10.390000054612756,11.639999948441982,9.759999964386225,8.969999955967069,7.15000001527369,8.399999991059303,8.360000032931566,9.069999989122152,14.299999963492157,17.899999978020787,15.779999993741512,19.19999992102385,18.09999993816018,9.669999986886978,12.51999991759658,12.359999956563115,11.790000054985285,10.599999953061342,12.600000074133275,14.929999953135848,12.529999991878867,13.66999993659556,19.71999995969236,19.62999990582466,16.609999926760793,12.639999967068434,9.589999992400408,8.240000035613775,7.610000064596534,9.2800000179559,11.799999974668026,10.099999969825149,9.51000002399087,13.819999849423766,18.770000014454126,23.29999994672835,23.439999770373102,17.330000035464764,14.120000023394825


## Determine the Weather for Each Meter

We also determine the weather forecasts from T to T+24, at an hourly interval, for each meter location.

In [None]:
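# generate_dates is not defined in this notebook; it is assumed to come from a shared helper and to
# return a DataFrame with one "date_time_ref" row per interval between the two timestamps
forecast_times_df = generate_dates(str(start_pred_time), str(end_pred_time))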
forecast_times_df = forecast_times_df.withColumnRenamed("date_time_ref","ForecastTimeStamp")

weather_df = inference_start_df.select("Grouping","Latitude","Longitude").crossJoin(forecast_times_df)

#weather_df is used to store the forecast times at a given interval
display(weather_df)

Grouping,Latitude,Longitude,ForecastTimeStamp
0,51.0,-114.0,2022-10-29T01:00:00.000+0000
0,51.0,-114.0,2022-10-29T04:00:00.000+0000
0,51.0,-114.0,2022-10-29T07:00:00.000+0000
0,51.0,-114.0,2022-10-29T10:00:00.000+0000
0,51.0,-114.0,2022-10-29T13:00:00.000+0000
0,51.0,-114.0,2022-10-29T16:00:00.000+0000
0,51.0,-114.0,2022-10-29T23:00:00.000+0000
0,51.0,-114.0,2022-10-29T05:00:00.000+0000
0,51.0,-114.0,2022-10-29T00:00:00.000+0000
0,51.0,-114.0,2022-10-29T08:00:00.000+0000


In [None]:
import pygeohash as pgh

class ForecastData:
    def __init__(self):  # one hourly forecast row; attribute names become the parquet column names
        self.GeoHash = None
        self.ForecastTimeStamp = None 
        self.TemperatureC = None 
        self.DewPointC = None 
        self.RelativeHumidity = None 
        self.PrecipitationAmountmm = None 
        self.WindDirectionDegrees = None 
        self.WindSpeedKmh = None
        self.VisibilityKm = None 
        self.StationPressurekPa = None 
        self.Humidex = None 
        self.WindChillC = None 

def load_forecast(geohash, lat, lon):
    """Fetch a 24-hour hourly forecast from Azure Maps and save it as <geohash>.parquet."""
    forecasts = []

    url = 'https://atlas.microsoft.com/weather/forecast/hourly/json'
    params = {
        'api-version': '1.1',
        'query': f'{lat},{lon}',
        'duration': 24,
        'subscription-key': azureMapsKey,
    }
    # don't print the full URL: it would write the subscription key to the logs
    print(f'loading forecast for {geohash} ({lat},{lon})')
    maps_response = requests.get(url, params=params)
    if maps_response.status_code != 200:
        print(f'forecast load failed! Response status code returned: {maps_response.status_code}')
        return  # write nothing; an empty parquet file would break the later spark.read

    print('forecast loaded successfully!')
    for forecast in maps_response.json().get('forecasts', []):
        forecastDat = ForecastData()
        # the API returns local time with a UTC offset; convert it to UTC
        local_time = datetime.strptime(forecast.get('date'), '%Y-%m-%dT%H:%M:%S%z')
        forecastDat.ForecastTimeStamp = local_time.astimezone(timezone.utc)

        forecastDat.TemperatureC = forecast.get('temperature').get('value')
        forecastDat.DewPointC = forecast.get('dewPoint').get('value')
        forecastDat.RelativeHumidity = forecast.get('relativeHumidity')
        if forecast.get('totalLiquid') is not None:
            forecastDat.PrecipitationAmountmm = forecast.get('totalLiquid').get('value')
        else:
            forecastDat.PrecipitationAmountmm = 0

        if forecast.get('visibility') is not None:
            forecastDat.VisibilityKm = forecast.get('visibility').get('value')

        if forecast.get('wind') is not None:
            forecastDat.WindDirectionDegrees = forecast.get('wind').get('direction').get('degrees')
            forecastDat.WindSpeedKmh = forecast.get('wind').get('speed').get('value')
        else:
            forecastDat.WindDirectionDegrees = 0
            forecastDat.WindSpeedKmh = 0

        forecastDat.GeoHash = geohash
        forecasts.append(forecastDat)

    forecasts_pdf = pd.DataFrame([vars(s) for s in forecasts])
    out_dir = '/dbfs/mnt/lf/inference/_forecast_wx_data'
    os.makedirs(out_dir, exist_ok=True)
    with open(f'{out_dir}/{geohash}.parquet', mode='wb') as file:
        forecasts_pdf.to_parquet(file)

geohash_udf = udf(lambda lat,lon: pgh.encode(latitude = lat,longitude=lon,precision=7))
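Because `load_forecast` mixes the HTTP call, parsing, and file I/O, its parsing logic is hard to test on its own. The pure function below (hypothetical name `parse_hourly_forecasts`) isolates just the parsing, with the same defaults: missing precipitation and wind become 0, and missing visibility stays empty. The sample payload is hand-built and contains only the fields the notebook reads; it is not a complete Azure Maps response.

```python
from datetime import datetime, timezone


def parse_hourly_forecasts(payload: dict, geohash: str) -> list:
    """Flatten the 'forecasts' array into one dict per hour."""
    rows = []
    for item in payload.get("forecasts", []):
        # the date carries a UTC offset; convert it to UTC
        local_ts = datetime.strptime(item["date"], "%Y-%m-%dT%H:%M:%S%z")
        wind = item.get("wind")
        liquid = item.get("totalLiquid")
        visibility = item.get("visibility")
        rows.append({
            "GeoHash": geohash,
            "ForecastTimeStamp": local_ts.astimezone(timezone.utc),
            "TemperatureC": item["temperature"]["value"],
            "DewPointC": item["dewPoint"]["value"],
            "RelativeHumidity": item.get("relativeHumidity"),
            "PrecipitationAmountmm": liquid["value"] if liquid is not None else 0,
            "VisibilityKm": visibility["value"] if visibility is not None else None,
            "WindDirectionDegrees": wind["direction"]["degrees"] if wind is not None else 0,
            "WindSpeedKmh": wind["speed"]["value"] if wind is not None else 0,
        })
    return rows


sample = {
    "forecasts": [{
        "date": "2022-11-29T12:00:00-07:00",
        "temperature": {"value": -15.6},
        "dewPoint": {"value": -23.9},
        "relativeHumidity": 50,
        "wind": {"direction": {"degrees": 359.0}, "speed": {"value": 14.5}},
        "visibility": {"value": 16.1},
    }]
}
rows = parse_hourly_forecasts(sample, "c3nfju7")
print(rows[0]["ForecastTimeStamp"])  # 2022-11-29 19:00:00+00:00
```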

In [None]:
#load each forecast!

weather_geohash_pdf = weather_df\
    .withColumn('GeoHash', geohash_udf('Latitude','Longitude'))\
    .select("GeoHash", "Latitude", "Longitude")\
    .distinct()\
    .toPandas()

# one API call per distinct location
for row in weather_geohash_pdf.itertuples(index=False):
    load_forecast(row.GeoHash, row.Latitude, row.Longitude)
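`geohash_udf` delegates to `pygeohash.encode` with `precision=7`. To show what that location key is, here is a stdlib re-implementation of the standard geohash encoding: it alternately bisects the longitude and latitude ranges and packs every 5 bits into one base-32 character. It is an illustration (hypothetical `encode_geohash`), not pygeohash's code; for the location in this notebook it yields the same key that appears in the forecast output.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode_geohash(lat: float, lon: float, precision: int = 7) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # 5 bits -> one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(encode_geohash(51.0, -114.0))  # c3nfju7
```

Every extra character narrows the cell, and a shorter hash is always a prefix of a longer one for the same point, which is what makes the geohash a convenient file and join key.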

In [None]:
forecast_data_df = spark.read.parquet('/mnt/lf/inference/_forecast_wx_data/*.parquet')
display(forecast_data_df)

GeoHash,ForecastTimeStamp,TemperatureC,DewPointC,RelativeHumidity,PrecipitationAmountmm,WindDirectionDegrees,WindSpeedKmh,VisibilityKm,StationPressurekPa,Humidex,WindChillC
c3nfju7,2022-11-29T19:00:00.000+0000,-15.6,-23.9,50,0.0,359.0,14.5,16.1,,,
c3nfju7,2022-11-29T20:00:00.000+0000,-15.0,-25.6,42,0.0,358.0,12.9,16.1,,,
c3nfju7,2022-11-29T21:00:00.000+0000,-15.0,-26.1,39,0.0,355.0,9.7,16.1,,,
c3nfju7,2022-11-29T22:00:00.000+0000,-15.6,-27.2,36,0.0,355.0,9.7,16.1,,,
c3nfju7,2022-11-29T23:00:00.000+0000,-16.1,-27.2,39,0.0,24.0,9.7,16.1,,,
c3nfju7,2022-11-30T00:00:00.000+0000,-17.2,-27.2,42,0.0,78.0,9.7,16.1,,,
c3nfju7,2022-11-30T01:00:00.000+0000,-17.8,-27.2,45,0.0,107.0,8.0,16.1,,,
c3nfju7,2022-11-30T02:00:00.000+0000,-18.3,-26.7,47,0.0,114.0,8.0,16.1,,,
c3nfju7,2022-11-30T03:00:00.000+0000,-18.3,-26.7,50,0.0,119.0,8.0,16.1,,,
c3nfju7,2022-11-30T04:00:00.000+0000,-18.9,-26.1,54,0.0,119.0,8.0,16.1,,,


## Build the initial inferencing dataset with forecast data

In [None]:
# this is temporary, because we are assuming the time is *now*
inference_df = inference_start_df.drop("ReadingTimestamp").withColumn("GeoHash", geohash_udf('Latitude','Longitude'))

# normally we would join with weather_df (the T to T+24 time grid) at this point;
# for this demo we join directly with the forecast data
inference_df = inference_df.join(forecast_data_df, on="GeoHash", how="inner")\
    .drop("GeoHash")\
    .orderBy("Grouping","ForecastTimestamp")

# replace missing numeric values with the sentinel 999 (here StationPressurekPa,
# Humidex, and WindChillC, which the forecast does not populate)
inference_df = inference_df.na.fill(value=999)

display(inference_df)

Grouping,Latitude,Longitude,KWHConsumption,lag_1,lag_2,lag_3,lag_4,lag_5,lag_6,lag_7,lag_8,lag_9,lag_10,lag_11,lag_12,lag_13,lag_14,lag_15,lag_16,lag_17,lag_18,lag_19,lag_20,lag_21,lag_22,lag_23,lag_24,lag_25,lag_26,lag_27,lag_28,lag_29,lag_30,lag_31,lag_32,lag_33,lag_34,lag_35,lag_36,lag_37,lag_38,lag_39,lag_40,lag_41,lag_42,lag_43,lag_44,lag_45,lag_46,lag_47,lag_48,ForecastTimeStamp,TemperatureC,DewPointC,RelativeHumidity,PrecipitationAmountmm,WindDirectionDegrees,WindSpeedKmh,VisibilityKm,StationPressurekPa,Humidex,WindChillC
0,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604,2022-11-29T19:00:00.000+0000,-15.6,-23.9,50,0.0,359.0,14.5,16.1,999,999,999
0,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604,2022-11-29T20:00:00.000+0000,-15.0,-25.6,42,0.0,358.0,12.9,16.1,999,999,999
0,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604,2022-11-29T21:00:00.000+0000,-15.0,-26.1,39,0.0,355.0,9.7,16.1,999,999,999
0,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604,2022-11-29T22:00:00.000+0000,-15.6,-27.2,36,0.0,355.0,9.7,16.1,999,999,999
0,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604,2022-11-29T23:00:00.000+0000,-16.1,-27.2,39,0.0,24.0,9.7,16.1,999,999,999
0,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604,2022-11-30T00:00:00.000+0000,-17.2,-27.2,42,0.0,78.0,9.7,16.1,999,999,999
0,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604,2022-11-30T01:00:00.000+0000,-17.8,-27.2,45,0.0,107.0,8.0,16.1,999,999,999
0,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604,2022-11-30T02:00:00.000+0000,-18.3,-26.7,47,0.0,114.0,8.0,16.1,999,999,999
0,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604,2022-11-30T03:00:00.000+0000,-18.3,-26.7,50,0.0,119.0,8.0,16.1,999,999,999
0,51.0,-114.0,17.670000104233623,14.609999869018791,10.719999950379131,12.729999978095291,13.689999943599105,15.540000041946769,15.210000030696392,13.439999919384718,12.680000042542815,12.690000044181945,14.69000005722046,14.620000014081596,10.64999993145466,11.00000001490116,10.190000001341105,10.620000014081596,10.019999964162707,10.640000013634562,11.159999979659917,16.290000109001994,16.100000044330955,16.26999994367361,23.85000003129244,19.909999947994947,15.659999933093786,20.61999991722405,12.679999990388753,13.279999976977706,11.089999973773956,11.38999998755753,14.33999994583428,16.189999906346202,12.370000008493662,14.739999970421197,13.609999937936664,15.189999889582396,10.839999996125698,11.119999960064888,11.90000006183982,13.649999940767884,12.879999991506338,13.33999994955957,13.979999953880906,15.910000059753656,17.750000005587935,19.559999836608768,22.880000174045563,17.94999993033707,17.319999950006604,2022-11-30T04:00:00.000+0000,-18.9,-26.1,54,0.0,119.0,8.0,16.1,999,999,999
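The join-and-fill step can be pictured without Spark. The sketch below (hypothetical `join_lags_with_forecast`, with dict rows instead of DataFrames) performs an inner join on `GeoHash`, drops the key, fills missing values with the 999 sentinel, and orders by group and forecast hour. One difference to keep in mind: Spark's `na.fill(value=999)` only touches numeric columns, while this sketch replaces every `None`.

```python
SENTINEL = 999  # the fill value the notebook passes to na.fill


def join_lags_with_forecast(lag_rows, forecast_rows, sentinel=SENTINEL):
    """Inner-join per-group lag rows to hourly forecast rows on GeoHash, filling missing values."""
    by_geohash = {}
    for fc in forecast_rows:
        by_geohash.setdefault(fc["GeoHash"], []).append(fc)
    joined = []
    for row in lag_rows:
        # groups without a matching forecast are dropped (inner join)
        for fc in by_geohash.get(row["GeoHash"], []):
            merged = {**row, **fc}
            del merged["GeoHash"]
            joined.append({k: (sentinel if v is None else v) for k, v in merged.items()})
    return sorted(joined, key=lambda r: (r["Grouping"], r["ForecastTimeStamp"]))


lag_rows = [
    {"Grouping": "0", "GeoHash": "c3nfju7", "lag_1": 14.6},
    {"Grouping": "1", "GeoHash": "zzzzzzz", "lag_1": 9.0},  # no forecast for this location
]
forecast_rows = [
    {"GeoHash": "c3nfju7", "ForecastTimeStamp": "2022-11-29T20:00", "TemperatureC": -15.0, "Humidex": None},
    {"GeoHash": "c3nfju7", "ForecastTimeStamp": "2022-11-29T19:00", "TemperatureC": -15.6, "Humidex": None},
]
result = join_lags_with_forecast(lag_rows, forecast_rows)
print(len(result), result[0]["ForecastTimeStamp"], result[0]["Humidex"])  # 2 2022-11-29T19:00 999
```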


## Do Inferencing
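The inference function in the next cell derives four calendar features from each forecast timestamp. The stdlib sketch below (hypothetical `calendar_features`) reproduces that mapping for one `datetime`, which makes the `+ 1` offsets explicit. It assumes the training notebook used the same offsets, since that notebook is not shown here.

```python
from datetime import datetime


def calendar_features(ts: datetime) -> dict:
    """Calendar inputs computed the same way as in predict_load_for_meter."""
    return {
        "ReadingMonth": ts.month,
        # datetime.weekday() and pandas day_of_week are both Monday=0, so this gives Monday=1 ... Sunday=7
        "ReadingWeekDay": ts.weekday() + 1,
        "ReadingHour": ts.hour + 1,  # shifts 0..23 to 1..24
        "ReadingDay": ts.day,
    }


print(calendar_features(datetime(2022, 11, 29, 19)))
# {'ReadingMonth': 11, 'ReadingWeekDay': 2, 'ReadingHour': 20, 'ReadingDay': 29}
```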

In [None]:
def predict_load_for_meter(inference_pdf):
    steps = 24
    target_column = 'KWHConsumption'  # target variable of the trained models (not used below)
    
    start = datetime.now()
    #meter_num = inference_pdf["MeterNumber"][0]
    grouping = str(inference_pdf["Grouping"][0])
    
    model_name = 'MeterModel_'+grouping+'.pkl'
    model_dir = '/dbfs/mnt/lf/models/'+grouping
    model_save_path = model_dir + '/' + model_name
    
    print(f'loading model from {model_save_path}')
    
    if not os.path.exists(model_save_path):
        return pd.DataFrame({'group':grouping, 'valid': False, 'time':[None],'predicted_kwh':[None]})
    
    with open(model_save_path, 'rb') as f:
        model = cloudpickle.load(f)
    
    inference_pdf['ForecastTimeStamp'] = pd.to_datetime(inference_pdf['ForecastTimeStamp'])
    inference_pdf = inference_pdf.set_index('ForecastTimeStamp')
    inference_pdf = inference_pdf.sort_index()
    #inference_pdf = inference_pdf.drop(columns='MeterNumber')
    inference_pdf = inference_pdf.drop(columns='Grouping')
    #inference_pdf.drop(columns='Weather') # we don't care about this, future improvement maybe
    
    # generate the calendar features used as model inputs
    inference_pdf['ReadingMonth'] = inference_pdf.index.month
    inference_pdf['ReadingWeekDay'] = inference_pdf.index.day_of_week + 1  # Monday=1 ... Sunday=7
    inference_pdf['ReadingHour'] = inference_pdf.index.hour + 1  # 1..24
    inference_pdf['ReadingDay'] = inference_pdf.index.day
    
    columns = ["TemperatureC",
               "DewPointC",
               "RelativeHumidity",
               "PrecipitationAmountmm",
               "WindDirectionDegrees",
               "WindSpeedKmh",
               "VisibilityKm",
               "StationPressurekPa",
               "Humidex",
               "WindChillC",
               "ReadingMonth",
               "ReadingWeekDay",
               "ReadingHour",
               "ReadingDay"]
  

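    # the model inputs are the weather/calendar columns plus the configured lags
    for lag_num in lags:
        columns.append('lag_'+str(lag_num))
    
    prediction_vect = inference_pdf  # initialization vector: we predict from it and shift the lags down after each step
    
    times = []
    preds = []
    max_lag = np.amax(lags)
    
    print("predicting meter load: "+str(grouping))
    # recursive forecast: each prediction becomes lag_1 for the next hour
    # (row 0 is skipped, so predictions start at the second forecast hour)
    for step in range(1, steps):
        prediction_x = prediction_vect[columns].iloc[[step]]  # predict one time step at a time
        prediction_result = model.predict(prediction_x)

        # shift the lags down one hour (lag_48 <- lag_47, ..., lag_2 <- lag_1),
        # oldest first so each value is copied before it is overwritten,
        # then store the new prediction as lag_1
        for lag_num in range(max_lag, 1, -1):
            prediction_vect['lag_'+str(lag_num)] = prediction_vect['lag_'+str(lag_num - 1)]
        prediction_vect['lag_1'] = prediction_result[0]
        
        times.append(prediction_vect.index[step])
        preds.append(prediction_result[0])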
        
    print("completed prediction for meter model: "+str(grouping))
    
    end = datetime.now()
    
    delta = end - start

    # time difference in seconds
    print(f"Model prediction time {grouping}: {delta.total_seconds()} seconds")
        
    return pd.DataFrame({'group':grouping,'valid':True, 'time':times,'predicted_kwh':preds})
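The lag shuffle inside `predict_load_for_meter` is a recursive multi-step forecast: each prediction is fed back as `lag_1` for the next hour while older lags move down one slot. The stdlib sketch below isolates that logic with a plain function standing in for the model. `shift_lags`, `recursive_forecast`, and `toy_model` are hypothetical, and the notebook operates on a pandas DataFrame rather than dicts.

```python
def shift_lags(lag_values: dict, new_value: float, max_lag: int) -> dict:
    """Move every lag down one hour (lag_k <- lag_{k-1}) and put new_value in lag_1."""
    shifted = dict(lag_values)
    for k in range(max_lag, 1, -1):  # oldest first, so each value is copied before it is overwritten
        shifted[f"lag_{k}"] = shifted[f"lag_{k - 1}"]
    shifted["lag_1"] = new_value
    return shifted


def recursive_forecast(predict, exog_rows, initial_lags, max_lag):
    """Predict one hour at a time, feeding each prediction back in as lag_1."""
    lags = dict(initial_lags)
    predictions = []
    for exog in exog_rows:  # weather/calendar inputs for one forecast hour
        y_hat = predict({**exog, **lags})
        predictions.append(y_hat)
        lags = shift_lags(lags, y_hat, max_lag)
    return predictions


def toy_model(features: dict) -> float:
    # stand-in for model.predict: previous hour plus 10
    return features["lag_1"] + 10.0


print(recursive_forecast(toy_model, [{}, {}, {}], {"lag_1": 1.0, "lag_2": 2.0, "lag_3": 3.0}, max_lag=3))
# [11.0, 21.0, 31.0]
```

Because each step consumes earlier predictions, errors compound over the 24-hour horizon; that is an inherent property of the recursive strategy rather than of this sketch.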

In [None]:
#parallel
inference_df = inference_df.repartition(40)

In [None]:
inference_result = inference_df\
    .groupBy("Grouping")\
    .applyInPandas(predict_load_for_meter, schema = "group string, valid boolean, time timestamp, predicted_kwh float")

inference_result.write.mode("overwrite").parquet('/mnt/lf/inference/forecast_bygroup.parquet')
display(inference_result)

group,valid,time,predicted_kwh
0,True,2022-11-29T20:00:00.000+0000,15.841278
0,True,2022-11-29T21:00:00.000+0000,17.171062
0,True,2022-11-29T22:00:00.000+0000,17.994022
0,True,2022-11-29T23:00:00.000+0000,18.852146
0,True,2022-11-30T00:00:00.000+0000,19.428883
0,True,2022-11-30T01:00:00.000+0000,19.652073
0,True,2022-11-30T02:00:00.000+0000,19.628502
0,True,2022-11-30T03:00:00.000+0000,19.295357
0,True,2022-11-30T04:00:00.000+0000,19.01281
0,True,2022-11-30T05:00:00.000+0000,18.927084
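`applyInPandas` hands all rows of one `Grouping` value to a single `predict_load_for_meter` call as a pandas DataFrame and concatenates the frames it returns. The single-process analogue below (hypothetical `apply_per_group`, plain dicts, no Spark) makes that contract concrete: every row of a group reaches the same call, and each call may return any number of rows. Spark runs these calls in parallel on executors; this sketch runs them sequentially.

```python
from itertools import groupby
from operator import itemgetter


def apply_per_group(rows, key, fn):
    """Split rows by key, run fn on each group's rows, and concatenate the outputs."""
    out = []
    ordered = sorted(rows, key=itemgetter(key))  # itertools.groupby needs rows sorted by the key
    for _, group_rows in groupby(ordered, key=itemgetter(key)):
        out.extend(fn(list(group_rows)))
    return out


def count_rows(group_rows):
    # stand-in for predict_load_for_meter: one output row per group
    return [{"group": group_rows[0]["Grouping"], "rows": len(group_rows)}]


result = apply_per_group([{"Grouping": "1"}, {"Grouping": "0"}, {"Grouping": "1"}], "Grouping", count_rows)
print(result)  # [{'group': '0', 'rows': 1}, {'group': '1', 'rows': 2}]
```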


In [None]:
#predict_load_for_meter(inference_df.where(col("MeterNumber") == '1095433').toPandas())

## Output Weather
We write out the weather used for forecasting, averaged across locations for each forecast hour, for reporting purposes.

In [None]:
# average the forecast weather across all groups for each forecast hour;
# note that na.fill(999) above also replaced any missing VisibilityKm values, which would skew its average
forecast_data_df = inference_df.groupBy("ForecastTimeStamp").agg(
    avg("TemperatureC").alias("AvgTemperatureC"),
    avg("DewPointC").alias("AvgDewPointC"),
    avg("RelativeHumidity").alias("AvgRelativeHumidity"),
    avg("PrecipitationAmountmm").alias("AvgPrecipitationAmountmm"),
    avg("WindDirectionDegrees").alias("AvgWindDirectionDegrees"),
    avg("WindSpeedKmh").alias("AvgWindSpeedKmh"),
    avg("VisibilityKm").alias("AvgVisibilityKm"))

forecast_data_df.write.mode("overwrite").parquet('/mnt/lf/inference/weather_forecast_data.parquet')
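The reporting averages are computed after the 999 sentinel fill, so a missing value in an averaged column is counted as 999. The stdlib sketch below (hypothetical `hourly_weather_averages`) reproduces the per-hour average and, as an optional variation that is not what the notebook does, can skip a sentinel value. Like Spark's `avg`, it ignores `None` and returns `None` when nothing is left to average.

```python
from collections import defaultdict


def hourly_weather_averages(rows, fields, sentinel=None):
    """Average each field per ForecastTimeStamp, optionally skipping a sentinel value."""
    by_hour = defaultdict(list)
    for row in rows:
        by_hour[row["ForecastTimeStamp"]].append(row)
    result = {}
    for ts, hour_rows in by_hour.items():
        result[ts] = {}
        for field in fields:
            values = [
                r[field] for r in hour_rows
                if r[field] is not None and (sentinel is None or r[field] != sentinel)
            ]
            result[ts][f"Avg{field}"] = sum(values) / len(values) if values else None
    return result


rows = [
    {"ForecastTimeStamp": "2022-11-29T19:00", "TemperatureC": -15.0, "VisibilityKm": 16.0},
    {"ForecastTimeStamp": "2022-11-29T19:00", "TemperatureC": -16.0, "VisibilityKm": 999},
]
print(hourly_weather_averages(rows, ["TemperatureC", "VisibilityKm"]))
print(hourly_weather_averages(rows, ["TemperatureC", "VisibilityKm"], sentinel=999))
```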