```yaml
titan: v1
service:
  image: scipy
  machine:
    cpu: 2
    memory: 1536MB
```

In [1]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split, cross_validate
import json

In [3]:
# Read the weather dataset from a public Google Cloud Storage bucket
url = "https://storage.googleapis.com/tutorial-datasets/weather_data_GER_2016.csv"
weather = pd.read_csv(url)

In [4]:
weather.head()

Unnamed: 0,timestamp,cumulated hours,lat,lon,v1,v2,v_50m,h1,h2,z0,SWTDN,SWGDN,T,rho,p
0,2016-01-01T00:00:00Z,0,47.5,5.625,0.81,1.88,3.36,2,10,0.052526,0.0,0.0,277.350159,1.236413,99282.710938
1,2016-01-01T01:00:00Z,1,47.5,5.625,0.77,1.61,2.63,2,10,0.05251,0.0,0.0,277.025665,1.23939,99300.164062
2,2016-01-01T02:00:00Z,2,47.5,5.625,0.66,1.22,1.89,2,10,0.052495,0.0,0.0,277.223755,1.243861,99310.992188
3,2016-01-01T03:00:00Z,3,47.5,5.625,0.96,1.35,1.62,2,10,0.05248,0.0,0.0,277.13324,1.24739,99314.773438
4,2016-01-01T04:00:00Z,4,47.5,5.625,1.14,1.56,1.83,2,10,0.05248,0.0,0.0,276.867767,1.248869,99324.796875


In [5]:
weather.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2248704 entries, 0 to 2248703
Data columns (total 15 columns):
 #   Column           Dtype  
---  ------           -----  
 0   timestamp        object 
 1   cumulated hours  int64  
 2   lat              float64
 3   lon              float64
 4   v1               float64
 5   v2               float64
 6   v_50m            float64
 7   h1               int64  
 8   h2               int64  
 9   z0               float64
 10  SWTDN            float64
 11  SWGDN            float64
 12  T                float64
 13  rho              float64
 14  p                float64
dtypes: float64(11), int64(3), object(1)
memory usage: 257.3+ MB


In [6]:
weather.keys()

Index(['timestamp', 'cumulated hours', 'lat', 'lon', 'v1', 'v2', 'v_50m', 'h1',
       'h2', 'z0', 'SWTDN', 'SWGDN', 'T', 'rho', 'p'],
      dtype='object')

The cells above load the 2016 weather data for Germany from the full CSV file.

The file contains the following columns:

* wind
  * v1: velocity [m/s] @ height h1 (2 meters above displacement height)
  * v2: velocity [m/s] @ height h2 (10 meters above displacement height)
  * v_50m: velocity [m/s] @ 50 meters above ground
  * h1: height above ground [m] (h1 = displacement height +2m)
  * h2: height above ground [m] (h2 = displacement height +10m)
  * z0: roughness length [m]
* solar parameters:
  * SWTDN: total top-of-the-atmosphere horizontal radiation [W/m²]
  * SWGDN: total ground horizontal radiation [W/m²]
* temperature data
  * T: Temperature [K] @ 2 meters above displacement height (see h1)
* air data
  * rho: air density [kg/m³] @ surface
  * p: air pressure [Pa] @ surface
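
As a quick sanity check on the units listed above, the sketch below converts `T` from Kelvin to Celsius and compares `rho` with the density that the ideal gas law for dry air would give from `p` and `T`. The three rows are copied from the `weather.head()` output; using the dry-air gas constant is our assumption for this check, not something the dataset description states.

```python
import pandas as pd

# First three rows of weather.head() shown above (subset of columns).
sample = pd.DataFrame({
    "T":   [277.350159, 277.025665, 277.223755],        # K
    "rho": [1.236413, 1.23939, 1.243861],               # kg/m³
    "p":   [99282.710938, 99300.164062, 99310.992188],  # Pa
})

R_DRY_AIR = 287.05  # J/(kg·K), specific gas constant of dry air (assumption: dry air)

# Temperature in degrees Celsius.
sample["T_celsius"] = sample["T"] - 273.15

# Density implied by the ideal gas law: rho = p / (R * T).
sample["rho_ideal"] = sample["p"] / (R_DRY_AIR * sample["T"])

# Relative difference between the implied density and the dataset's rho.
sample["rho_rel_diff"] = (sample["rho_ideal"] - sample["rho"]) / sample["rho"]

print(sample[["T_celsius", "rho", "rho_ideal", "rho_rel_diff"]])
```

On these rows the two densities agree to within roughly one percent, which is consistent with `T` being in Kelvin and `p` in Pascal.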

In [7]:
# Read the wind power production dataset from a public Google Cloud Storage bucket
url = "https://storage.googleapis.com/tutorial-datasets/time_series_60min_singleindex_filtered.csv"
production = pd.read_csv(url)


In [8]:
production.head()


Unnamed: 0,utc_timestamp,cet_cest_timestamp,DE_wind_generation_actual
0,2015-12-31T23:00:00Z,2016-01-01T00:00:00+0100,8638
1,2016-01-01T00:00:00Z,2016-01-01T01:00:00+0100,8579
2,2016-01-01T01:00:00Z,2016-01-01T02:00:00+0100,8542
3,2016-01-01T02:00:00Z,2016-01-01T03:00:00+0100,8443
4,2016-01-01T03:00:00Z,2016-01-01T04:00:00+0100,8295


In [9]:
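# Merge datasets on the row index. Note: this pairs rows by position, not by
# timestamp, and grouping by a unique index leaves the weather rows unaggregated.
# numeric_only=True drops the string timestamp column before averaging.
weather_by_day = weather.groupby(weather.index).mean(numeric_only=True)
combined = pd.merge(production, weather_by_day, how='left', left_index=True, right_index=True)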



In [10]:
combined.head()

Unnamed: 0,utc_timestamp,cet_cest_timestamp,DE_wind_generation_actual,cumulated hours,lat,lon,v1,v2,v_50m,h1,h2,z0,SWTDN,SWGDN,T,rho,p
0,2015-12-31T23:00:00Z,2016-01-01T00:00:00+0100,8638,0,47.5,5.625,0.81,1.88,3.36,2,10,0.052526,0.0,0.0,277.350159,1.236413,99282.710938
1,2016-01-01T00:00:00Z,2016-01-01T01:00:00+0100,8579,1,47.5,5.625,0.77,1.61,2.63,2,10,0.05251,0.0,0.0,277.025665,1.23939,99300.164062
2,2016-01-01T01:00:00Z,2016-01-01T02:00:00+0100,8542,2,47.5,5.625,0.66,1.22,1.89,2,10,0.052495,0.0,0.0,277.223755,1.243861,99310.992188
3,2016-01-01T02:00:00Z,2016-01-01T03:00:00+0100,8443,3,47.5,5.625,0.96,1.35,1.62,2,10,0.05248,0.0,0.0,277.13324,1.24739,99314.773438
4,2016-01-01T03:00:00Z,2016-01-01T04:00:00+0100,8295,4,47.5,5.625,1.14,1.56,1.83,2,10,0.05248,0.0,0.0,276.867767,1.248869,99324.796875
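
The merge in `In [9]` joins on the row index, so production row 0 (`2015-12-31T23:00:00Z`) is paired with weather row 0 (`2016-01-01T00:00:00Z`), and only one grid point per hour ends up in `combined`. A possible alternative, sketched here on toy data, is to average the weather over all grid points for each timestamp and then join on the timestamp. The second grid point and its values are made up for illustration, and the names `weather_hourly` and `combined_ts` are ours.

```python
import pandas as pd

# Toy weather data: two grid points per hour (the lat=48.0 rows are invented).
weather_toy = pd.DataFrame({
    "timestamp": ["2016-01-01T00:00:00Z", "2016-01-01T00:00:00Z",
                  "2016-01-01T01:00:00Z", "2016-01-01T01:00:00Z"],
    "lat": [47.5, 48.0, 47.5, 48.0],
    "v1": [0.81, 1.01, 0.77, 0.97],
})

# Toy production data with the same columns as production.head().
production_toy = pd.DataFrame({
    "utc_timestamp": ["2015-12-31T23:00:00Z", "2016-01-01T00:00:00Z",
                      "2016-01-01T01:00:00Z"],
    "DE_wind_generation_actual": [8638, 8579, 8542],
})

# Average the numeric weather columns over all grid points for each hour.
weather_hourly = (
    weather_toy.groupby("timestamp")
    .mean(numeric_only=True)
    .reset_index()
    .rename(columns={"timestamp": "utc_timestamp"})
)

# Join on the UTC timestamp instead of the row position.
combined_ts = production_toy.merge(weather_hourly, how="left", on="utc_timestamp")
print(combined_ts)
```

The first production row has no matching weather timestamp, so its weather columns come out as `NaN`; those rows would need to be dropped or filled before fitting a model.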


In [21]:
X_wind = combined[['v1', 'v2', 'v_50m', 'z0']]

y_wind = combined['DE_wind_generation_actual']

regr = RandomForestRegressor(max_depth=2, random_state=0)
a_model = regr.fit(X_wind, y_wind)

In [22]:
regr.predict(X_wind)

array([ 8120.25051961,  8120.25051961,  8120.25051961, ...,
       11539.41293338, 11539.41293338, 11539.41293338])
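
The predictions above are made on the same rows the model was trained on, so they say little about how well it generalizes. A minimal sketch of a held-out evaluation follows, using the `train_test_split` and `mean_absolute_error` helpers already imported in `In [1]`. The data is synthetic (random features with a made-up linear target), so the numbers it prints are not results for the wind dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X_wind / y_wind (assumption: 4 features, 500 rows).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(500, 4))
y_demo = 1000 * X_demo[:, 2] + rng.normal(0, 100, size=500)

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Same hyperparameters as the tutorial's regressor.
demo_regr = RandomForestRegressor(max_depth=2, random_state=0)
demo_regr.fit(X_train, y_train)

# Compare the error on seen and unseen rows.
mae_train = mean_absolute_error(y_train, demo_regr.predict(X_train))
mae_test = mean_absolute_error(y_test, demo_regr.predict(X_test))
print(f"MAE train = {mae_train:.1f}, MAE test = {mae_test:.1f}")
```

For hourly data like this, a chronological split (train on earlier months, test on later ones) may be more appropriate than a random one, because neighboring hours are strongly correlated.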

In [23]:
# A random forest has no regression coefficients. Instead we inspect the fitted
# trees ("alpha") and the number of input features ("betas").
print(f'alpha = {a_model.estimators_}')
print(f'betas = {a_model.n_features_in_}')

alpha = [DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=209652396), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=398764591), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=924231285), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=1478610112), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=441365315), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=1537364731), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=192771779), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=1491434855), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=1819583497), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=530702035), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=626610453), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=1650906866), DecisionTreeRe

In [24]:
# GET /alphas
print(f'alpha = {a_model.estimators_}')

alpha = [DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=209652396), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=398764591), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=924231285), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=1478610112), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=441365315), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=1537364731), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=192771779), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=1491434855), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=1819583497), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=530702035), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=626610453), DecisionTreeRegressor(max_depth=2, max_features='auto', random_state=1650906866), DecisionTreeRe

In [25]:
# GET /betas
# n_features_in_ replaces n_features_, which newer scikit-learn releases removed
print(f'betas = {a_model.n_features_in_}')

betas = 4
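
Since a random forest has no alpha or beta coefficients in the linear-regression sense, the closest analogue for inspecting the model is usually its feature importances. The sketch below shows the attributes scikit-learn exposes for this on a fitted `RandomForestRegressor`. It is trained on synthetic data in which only the third feature drives the target, so the importances printed here are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target depends only on the third column (assumption).
rng = np.random.default_rng(0)
feature_names = ["v1", "v2", "v_50m", "z0"]
X_demo = rng.uniform(0, 10, size=(300, 4))
y_demo = 1000 * X_demo[:, 2] + rng.normal(0, 100, size=300)

forest = RandomForestRegressor(max_depth=2, random_state=0).fit(X_demo, y_demo)

# Number of fitted trees and number of features seen during fit.
print("number of trees:", len(forest.estimators_))
print("number of input features:", forest.n_features_in_)

# Impurity-based importance of each feature; the values sum to 1.
for name, importance in zip(feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

An endpoint that returns `feature_importances_.tolist()` would give API clients something more useful than the repr of every tree.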


In [26]:
# Mock request object for local API testing
headers = {
    'content-type': 'application/json'
}
body = json.dumps({
  "data": [[1.44, 1.77, 2, 0.054]]
})
REQUEST = json.dumps({ 'headers': headers, 'body': body })

In [27]:
# POST /prediction
body = json.loads(REQUEST)['body']
# Predict wind generation for new samples. This cell is exposed through Titan.
input_params = json.loads(body)['data']
#input_params = [[0.44, 1.77, 2, 0.054]]
print(a_model.predict(input_params))

[8120.25051961]
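
The prediction cell trusts that the request body holds a well-formed `data` list. Below is a sketch of how the same parsing could be wrapped in a function that validates the input and returns JSON. The function name `handle_prediction`, the response keys, and the `StubModel` class are hypothetical; how Titan itself expects responses to be shaped is not covered here.

```python
import json


class StubModel:
    """Hypothetical stand-in for the fitted regressor: returns each row's sum."""

    def predict(self, rows):
        return [sum(row) for row in rows]


def handle_prediction(request_json, model, n_features=4):
    """Parse a mock request, validate its rows, and return a JSON string."""
    # The mock REQUEST wraps a JSON-encoded body inside a JSON object.
    body = json.loads(json.loads(request_json)["body"])
    rows = body.get("data") if isinstance(body, dict) else None

    # Reject missing or malformed input instead of failing inside predict().
    if not isinstance(rows, list) or not rows:
        return json.dumps({"error": "'data' must be a non-empty list of rows"})
    for row in rows:
        if not isinstance(row, list) or len(row) != n_features:
            return json.dumps({"error": f"each row needs {n_features} values"})

    # Convert to plain floats so the result is JSON-serializable.
    predictions = [float(p) for p in model.predict(rows)]
    return json.dumps({"prediction": predictions})


# Same mock request shape as in the cells above.
mock_request = json.dumps({
    "headers": {"content-type": "application/json"},
    "body": json.dumps({"data": [[1.44, 1.77, 2, 0.054]]}),
})
print(handle_prediction(mock_request, StubModel()))
```

In the notebook, `a_model` would be passed in place of `StubModel()`. The `float(...)` conversion matters there because NumPy scalars returned by `predict` are not accepted by `json.dumps`.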
