# Mock API testing and test set exploration

This notebook explores the datapoints given as inputs to the model and the expected outputs (format and values).
You should find a graph describing how the different inputs are linked together, and notes about features that look useless.

## Imports and Mock API set up

In [2]:
from local_mock_api import make_env

In [3]:
env = make_env()
iter_test = env.iter_test()

## Demo API usage

The following cell is taken from the example notebook provided in the Kaggle description. It shows the format in which the API provides the input data we use to predict the target (production or consumption).

In [4]:
# counter = 0
# for (test, revealed_targets, client, historical_weather,
#         forecast_weather, electricity_prices, gas_prices, sample_prediction) in iter_test:
#     if counter == 0:
#         print(test.head(3))
#         print(revealed_targets.head(3))
#         print(client.head(3))
#         print(historical_weather.head(3))
#         print(forecast_weather.head(3))
#         print(electricity_prices.head(3))
#         print(gas_prices.head(3))
#         print(sample_prediction.head(3))
#     sample_prediction['target'] = 0
#     env.predict(sample_prediction)
#     counter += 1

## Test data exploration

The API yields data points used as inputs for our model. This section aims at understanding all the inputs and their relationships.

In [5]:
for (test, revealed_targets, client, historical_weather,
    forecast_weather, electricity_prices, gas_prices, sample_prediction) in iter_test:
    test0 = test
    revealed_targets0 = revealed_targets
    client0 = client
    historical_weather0 = historical_weather
    forecast_weather0 = forecast_weather
    electricity_prices0 = electricity_prices
    gas_prices0 = gas_prices
    sample_prediction0 = sample_prediction
    sample_prediction['target'] = 0
    env.predict(sample_prediction)
    break

### test/train dataframes

The train and test dataframes are the central tables of the inputs. They list every prediction to provide for a given datapoint, that is `all clients at the prediction date` × `production and consumption` × `24 hours of the prediction day`, which is around $66 \times 2 \times 24 = 3168$ rows.
Test dataframe explanations:
- The location (`county`) can be mapped to a name using `county_id_to_name_map.json` _(maybe locate the county to get its latitude and longitude and associate it with the weather forecast?)_
- `county` × `is_business` × `product_type` identifies a unique client in `client.csv` at prediction time (clients can be added or removed over time)
- For each client, we predict both production and consumption, which are represented by 2 rows per client: one with `is_consumption` set to 0, the other with `is_consumption` set to 1
- `prediction_datetime` (`datetime` in train) can be used to look up the related `gas_prices.forecast_date`, `electricity_prices.forecast_date` and `forecast_weather.forecast_datetime`. In the batch shown in this notebook, the price tables are dated 2023-05-27 while the predictions are for 2023-05-28, so the lookup may need a time offset
- `target` is the value to predict (either consumption or production of each client). It is available directly in the train dataframe, and in `revealed_targets` for the test dataframe
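
The row count and the client key described above can be checked with a minimal sketch. It uses a toy dataframe with two hypothetical clients instead of the real `test` dataframe; only the column names come from the output below, everything else is an assumption made for illustration.

```python
import pandas as pd

# Toy stand-in for `test`: 2 hypothetical clients, 24 hours, 2 rows per hour.
clients = [(0, 0, 1), (0, 0, 2)]  # (county, is_business, product_type)
hours = pd.Timestamp("2023-05-28") + pd.to_timedelta(list(range(24)), unit="h")

rows = [
    {"county": c, "is_business": b, "product_type": p,
     "is_consumption": k, "prediction_datetime": t}
    for t in hours
    for (c, b, p) in clients
    for k in (0, 1)  # 0 = production row, 1 = consumption row
]
test = pd.DataFrame(rows)

# A client is identified by these three columns.
key = ["county", "is_business", "product_type"]
n_clients = len(test[key].drop_duplicates())

# Expected size: clients x (production, consumption) x 24 hours.
expected_rows = n_clients * 2 * 24
rows_per_client = test.groupby(key).size()

print(n_clients, expected_rows, len(test))
```

Running the same three lines (`key`, `n_clients`, `expected_rows`) on the real `test0` should show how close the batch is to the estimate given above.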


In [45]:
test0

Unnamed: 0,county,is_business,product_type,is_consumption,prediction_datetime,row_id,prediction_unit_id,currently_scored
0,0,0,1,0,2023-05-28 00:00:00,2005872,0,False
1,0,0,1,1,2023-05-28 00:00:00,2005873,0,False
2,0,0,2,0,2023-05-28 00:00:00,2005874,1,False
3,0,0,2,1,2023-05-28 00:00:00,2005875,1,False
4,0,0,3,0,2023-05-28 00:00:00,2005876,2,False
...,...,...,...,...,...,...,...,...
3115,15,1,0,1,2023-05-28 23:00:00,2008987,64,False
3116,15,1,1,0,2023-05-28 23:00:00,2008988,59,False
3117,15,1,1,1,2023-05-28 23:00:00,2008989,59,False
3118,15,1,3,0,2023-05-28 23:00:00,2008990,60,False


### Client dataframe

The client dataframe lists all the clients to predict at a specific time. It is linked to the test/train dataframe by the `county` × `is_business` × `product_type` columns. Additional information:
- `eic_count` can be used to estimate the maximum consumption of the client (predict relative to this value, valid when `is_consumption` is 1)
- `installed_capacity` can be used to estimate the maximum production of the client (predict relative to this value, valid when `is_consumption` is 0)
- `date` might be useless because only one date is provided at a time
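
As a rough sketch of the link described above, the client table can be merged onto the train/test rows and used to scale the target. The two client rows are copied from the output below; the `target` values are made up for illustration, and scaling consumption by `eic_count` is only the idea suggested in the list above, not a validated choice.

```python
import pandas as pd

# Two client rows taken from the `client0` output.
client = pd.DataFrame({
    "product_type": [1, 2],
    "county": [0, 0],
    "eic_count": [507, 11],
    "installed_capacity": [4960.215, 34.0],
    "is_business": [0, 0],
})

# Hypothetical train-like rows: the target values are illustrative only.
train = pd.DataFrame({
    "county": [0, 0, 0, 0],
    "is_business": [0, 0, 0, 0],
    "product_type": [1, 1, 2, 2],
    "is_consumption": [0, 1, 0, 1],
    "target": [2480.1075, 253.5, 17.0, 5.5],
})

key = ["county", "is_business", "product_type"]

# Many train rows map to one client row; `validate` raises if the key is not unique in `client`.
merged = train.merge(client, on=key, how="left", validate="many_to_one")

# Production rows are scaled by installed_capacity, consumption rows by eic_count.
is_cons = merged["is_consumption"] == 1
merged["scale"] = merged["installed_capacity"].where(~is_cons, merged["eic_count"])
merged["relative_target"] = merged["target"] / merged["scale"]

print(merged[key + ["is_consumption", "scale", "relative_target"]])
```

A model trained on `relative_target` would have to multiply its predictions by `scale` again before submitting them.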

In [48]:
client0

Unnamed: 0,product_type,county,eic_count,installed_capacity,is_business,date
0,1,0,507,4960.215,0,2023-05-26
1,2,0,11,34.000,0,2023-05-26
2,3,0,1516,15977.560,0,2023-05-26
3,0,0,25,1273.200,1,2023-05-26
4,1,0,98,2885.600,1,2023-05-26
...,...,...,...,...,...,...
61,1,15,51,415.600,0,2023-05-26
62,3,15,160,1885.750,0,2023-05-26
63,0,15,15,620.000,1,2023-05-26
64,1,15,20,624.500,1,2023-05-26


### Historical weather


In [12]:
historical_weather0

Unnamed: 0,datetime,temperature,dewpoint,rain,snowfall,surface_pressure,cloudcover_total,cloudcover_low,cloudcover_mid,cloudcover_high,windspeed_10m,winddirection_10m,shortwave_radiation,direct_solar_radiation,diffuse_radiation,latitude,longitude
0,2023-05-26 11:00:00,13.5,9.0,0.0,0.0,1018.5,30,31,3,0,6.305556,272,592.0,420.0,172.0,57.6,21.7
1,2023-05-26 11:00:00,13.4,8.9,0.2,0.0,1013.2,47,31,32,0,6.111111,268,612.0,446.0,166.0,57.6,22.2
2,2023-05-26 11:00:00,16.4,7.8,0.2,0.0,1017.7,60,21,69,0,6.138889,263,655.0,512.0,143.0,57.6,22.7
3,2023-05-26 11:00:00,-0.1,-2.0,0.0,0.0,1024.1,100,100,100,100,7.166667,167,0.0,0.0,0.0,57.6,23.2
4,2023-05-26 11:00:00,14.4,6.4,0.0,0.0,1017.8,29,8,37,0,5.000000,233,730.0,614.0,116.0,57.6,23.7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2683,2023-05-27 10:00:00,10.8,2.1,0.0,0.0,1023.7,26,12,25,0,6.194444,308,599.0,476.0,123.0,59.7,26.2
2684,2023-05-27 10:00:00,11.1,1.2,0.0,0.0,1023.5,28,13,27,0,6.250000,309,580.0,470.0,110.0,59.7,26.7
2685,2023-05-27 10:00:00,9.6,4.4,0.0,0.0,1023.2,0,0,0,0,8.055556,300,603.0,508.0,95.0,59.7,27.2
2686,2023-05-27 10:00:00,10.8,2.5,0.0,0.0,1022.8,24,15,17,0,6.611111,308,561.0,454.0,107.0,59.7,27.7


### Forecast weather

In [9]:
forecast_weather0

Unnamed: 0,latitude,longitude,origin_datetime,hours_ahead,temperature,dewpoint,cloudcover_high,cloudcover_low,cloudcover_mid,cloudcover_total,10_metre_u_wind_component,10_metre_v_wind_component,forecast_datetime,direct_solar_radiation,surface_solar_radiation_downwards,snowfall,total_precipitation
0,57.6,21.7,2023-05-27 02:00:00,1,9.859155,5.508813,0.000000,0.000000,0.026901,0.026901,3.616620,-1.281012,2023-05-27 03:00:00,0.000000,0.0,0.0,0.0
1,57.6,22.2,2023-05-27 02:00:00,1,5.916284,4.613428,0.000000,0.000000,0.000000,0.000000,2.164227,-0.245367,2023-05-27 03:00:00,0.000000,0.0,0.0,0.0
2,57.6,22.7,2023-05-27 02:00:00,1,9.111963,6.878442,0.000000,0.000000,0.000000,0.000000,3.809247,-1.583502,2023-05-27 03:00:00,0.000000,0.0,0.0,0.0
3,57.6,23.2,2023-05-27 02:00:00,1,10.746606,5.006372,0.000000,0.000000,0.000000,0.000000,4.106854,-5.625006,2023-05-27 03:00:00,0.000000,0.0,0.0,0.0
4,57.6,23.7,2023-05-27 02:00:00,1,10.791895,4.701074,0.000000,0.000000,0.000000,0.000000,4.188153,-7.184332,2023-05-27 03:00:00,0.000000,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5371,59.7,26.2,2023-05-27 02:00:00,48,10.847314,8.898950,0.028778,0.001839,0.022369,0.050690,4.215645,2.895729,2023-05-29 02:00:00,0.146667,0.0,0.0,0.0
5372,59.7,26.7,2023-05-27 02:00:00,48,11.198389,9.006738,0.040070,0.000000,0.000183,0.040253,4.372139,3.321266,2023-05-29 02:00:00,0.573333,0.0,0.0,0.0
5373,59.7,27.2,2023-05-27 02:00:00,48,11.429346,8.950708,0.000000,0.000000,0.015320,0.015320,3.170479,3.839821,2023-05-29 02:00:00,0.146667,0.0,0.0,0.0
5374,59.7,27.7,2023-05-27 02:00:00,48,11.556787,9.706201,0.000000,0.004807,0.096924,0.098389,3.417305,3.224586,2023-05-29 02:00:00,0.004444,0.0,0.0,0.0
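
The forecast table holds one row per grid point (`latitude`, `longitude`) and `forecast_datetime`, so it has to be aggregated before it can be attached to the test rows. The sketch below averages over all grid points, which is a deliberate simplification (a per-county mapping of grid points is not available here), and the temperature values are made up for illustration.

```python
import pandas as pd

# Toy forecast: 2 grid points x 2 forecast hours, illustrative temperatures.
forecast_weather = pd.DataFrame({
    "latitude": [57.6, 57.6, 57.6, 57.6],
    "longitude": [21.7, 22.2, 21.7, 22.2],
    "forecast_datetime": pd.to_datetime([
        "2023-05-28 00:00:00", "2023-05-28 00:00:00",
        "2023-05-28 01:00:00", "2023-05-28 01:00:00",
    ]),
    "temperature": [10.0, 12.0, 9.0, 11.0],
})

# Toy test rows for the same two hours.
test = pd.DataFrame({
    "county": [0, 0],
    "prediction_datetime": pd.to_datetime(
        ["2023-05-28 00:00:00", "2023-05-28 01:00:00"]
    ),
})

# Average over grid points, then rename the time column to match `test`.
hourly = (
    forecast_weather
    .groupby("forecast_datetime", as_index=False)[["temperature"]]
    .mean()
    .rename(columns={"forecast_datetime": "prediction_datetime"})
)

features = test.merge(hourly, on="prediction_datetime", how="left")
print(features)
```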


### Electricity prices

In [10]:
electricity_prices0

Unnamed: 0,forecast_date,euros_per_mwh,origin_date
0,2023-05-27 00:00:00,87.54,2023-05-26 00:00:00
1,2023-05-27 01:00:00,82.69,2023-05-26 01:00:00
2,2023-05-27 02:00:00,82.7,2023-05-26 02:00:00
3,2023-05-27 03:00:00,84.26,2023-05-26 03:00:00
4,2023-05-27 04:00:00,87.67,2023-05-26 04:00:00
5,2023-05-27 05:00:00,85.77,2023-05-26 05:00:00
6,2023-05-27 06:00:00,82.7,2023-05-26 06:00:00
7,2023-05-27 07:00:00,82.99,2023-05-26 07:00:00
8,2023-05-27 08:00:00,72.61,2023-05-26 08:00:00
9,2023-05-27 09:00:00,53.74,2023-05-26 09:00:00


### Gas prices

In [11]:
gas_prices0

Unnamed: 0,forecast_date,lowest_price_per_mwh,highest_price_per_mwh,origin_date
0,2023-05-27,28.3,34.1,2023-05-26
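
In this batch the prices are dated 2023-05-27 while `test0` asks for predictions on 2023-05-28, so a direct join on the dates would find no match. The sketch below assumes a fixed one-day lag between `forecast_date` and `prediction_datetime`; that lag is read off this single batch and should be checked on the full data. The price values are copied from the outputs above.

```python
import pandas as pd

electricity_prices = pd.DataFrame({
    "forecast_date": pd.to_datetime(["2023-05-27 00:00:00", "2023-05-27 01:00:00"]),
    "euros_per_mwh": [87.54, 82.69],
})
gas_prices = pd.DataFrame({
    "forecast_date": pd.to_datetime(["2023-05-27"]),
    "lowest_price_per_mwh": [28.3],
    "highest_price_per_mwh": [34.1],
})
test = pd.DataFrame({
    "prediction_datetime": pd.to_datetime(
        ["2023-05-28 00:00:00", "2023-05-28 01:00:00"]
    ),
})

lag = pd.Timedelta(days=1)  # assumption: prices are one day behind the prediction day

# Hourly electricity prices: shift the timestamp so it lines up with prediction_datetime.
elec = electricity_prices.assign(
    prediction_datetime=electricity_prices["forecast_date"] + lag
)[["prediction_datetime", "euros_per_mwh"]]

# Daily gas prices: shift the date and join on the calendar day.
gas = gas_prices.assign(
    prediction_date=(gas_prices["forecast_date"] + lag).dt.normalize()
).drop(columns="forecast_date")

features = test.assign(prediction_date=test["prediction_datetime"].dt.normalize())
features = (
    features
    .merge(elec, on="prediction_datetime", how="left")
    .merge(gas, on="prediction_date", how="left")
)
print(features)
```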
