# The Inverse Problem 
## a problem set concerning modelling cable temperature with propagation time

### Introduction

The energy transition forces Alliander to maximize its use of the current power grid. To do that, insight into cable temperatures is critical, since cable temperature determines the carrying capacity of a cable. In this problem set we investigate a new method to measure cable temperatures in the field using the so-called propagation time.

**Factors contributing to cable heating**
Many factors play a role in determining the temperature of a cable in the field: the current (amperes) the cable is carrying, the soil background temperature (degrees Celsius), the cable type, the cable depth and the thermal properties of the soil surrounding the cable. In practice all these parameters are typically known except the thermal properties of the soil, which require analysis of field samples to determine. In this problem set the thermal properties of the soil will be treated as unknowns, though you should be aware that their influence on cable temperatures is significant.

**Thermal properties of soil**
The way heat dissipates through soil is controlled by two parameters: the thermal resistivity, denoted here with a G, and the thermal capacity, denoted with a c. The thermal resistivity, in units of K·m/W, typically ranges from 0.5 to 1.5; the higher the thermal resistivity, the harder it is for heat to dissipate. A higher G-value thus results in a hotter cable when all other variables are held fixed. The thermal capacity (ranging from 1.5e6 to 2.5e6) only influences the dynamic heating of cables, where a low c-value results in higher peaks in a dynamic temperature profile.

**Steady state conditions**
In steady state conditions, meaning a stable state of the cable after a very long period of heating up with constant current and soil conditions, the G-value is still important, but the c-value has no influence on **steady state temperatures**. Moreover, in steady state the relation between cable temperature and load is quadratic i.e.
$$ T_{cable} \quad \approx \quad \alpha * I^2$$
where the parameter $\alpha$ depends on the cable type and thermal properties of the soil.
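
As a rough check of the quadratic relation, the sketch below fits $\alpha$ by least squares to the five rows with T_ground = 0 and G = 0.5 that appear in the `steady_states` preview further down in this problem set. The variable names are illustrative. Because T_ground is zero in these rows, T_cable equals the temperature rise above the soil; the preview rows with T_ground = 20 suggest that in general the rise above T_ground is what scales with $I^2$, but that is an inference from the preview, not something the text states.

```python
import numpy as np

# Rows copied from the steady_states preview (T_ground = 0, G = 0.5).
current = np.array([0.0, 60.0, 120.0, 180.0, 240.0])                # I in amperes
t_cable = np.array([0.0, 0.874377, 3.536116, 8.105471, 14.798519])  # degrees Celsius

# Least-squares fit of T_cable = alpha * I^2 (a line through the origin in I^2).
i_squared = current ** 2
alpha = np.sum(i_squared * t_cable) / np.sum(i_squared ** 2)

# Relative error of the quadratic approximation per row (the I = 0 row is skipped).
predicted = alpha * i_squared
relative_error = np.abs(predicted[1:] - t_cable[1:]) / t_cable[1:]

print(f"alpha = {alpha:.3e}")
print("relative error per row:", np.round(relative_error, 3))
```

The ratio T_cable / I^2 drifts slightly upward with the load in these rows, which is consistent with the relation being approximate rather than exact.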


**Relation between cable temperature and propagation time**

The propagation time, denoted with a $p$, is a measured value from a sensor connected to the cable. The sensor measures how long it takes for a signal to travel from one end of the cable to the other end. The relation between cable temperature and this measured value propagation time is approximately linear. For the purposes of this problem set we will assume the relation to be exactly linear i.e. there exist parameters $a_1, a_0$ such that
$$ a_1 * p + a_0 = T_{cable}$$
for any simultaneous measurement of $p$ and $T_{cable}$.
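
To make the linear relation concrete, here is a minimal sketch that recovers $a_1$ and $a_0$ when simultaneous measurements of $p$ and $T_{cable}$ are available. All numbers, including the values of `a1_true` and `a0_true`, are made up for illustration and are not taken from the problem data.

```python
import numpy as np

# Hypothetical calibration values, chosen only for illustration.
a1_true, a0_true = -4.0, 660.0

# Synthetic simultaneous measurements of p and T_cable that obey the exact linear relation.
p = np.array([160.0, 161.0, 162.0, 163.0])
t_cable = a1_true * p + a0_true

# A degree-1 polynomial fit returns the slope first, then the intercept.
a1_fit, a0_fit = np.polyfit(p, t_cable, deg=1)


def temperature_from_p(p_measured):
    """Map a propagation time to a cable temperature with the fitted line."""
    return a1_fit * p_measured + a0_fit
```

With direct temperature measurements this fit is trivial. The difficulty in this problem set is that $T_{cable}$ is not measured, so $a_1$ and $a_0$ have to be inferred indirectly.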

### The Inverse Problem

The inverse problem can be summarised as follows: 

*"Given a sufficiently rich dataset of measurements, can the relation between temperature and propagation time be deduced despite certain unknown variables?"*

The answer will of course depend on the meanings of 'sufficiently rich dataset', 'deducing the relation' and 'certain unknown variables'. In the problem set below we will provide you with increasingly complex problems and ask you to find the relation between propagation time and temperature. We will state explicitly which error metric is used (this is what you should optimize for). The known parameters are always the current values, soil temperatures and propagation time measurements. The unknowns are the thermal properties of the soil as well as the cable temperatures.
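
The `validate_answer` functions in the problems below compute a root-mean-square error between the true and the estimated cable temperatures. A minimal standalone version of that metric, useful for checking a model against the example data, could look like this (the function name `rmse` and the numbers are illustrative):

```python
import numpy as np


def rmse(true_values, predicted_values):
    """Root-mean-square error between two equally long sequences."""
    true_values = np.asarray(true_values, dtype=float)
    predicted_values = np.asarray(predicted_values, dtype=float)
    return float(np.sqrt(np.mean((true_values - predicted_values) ** 2)))


# Made-up numbers for illustration only.
example_error = rmse([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])
```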


### Example calculation data you are free to use (and are also free to not use)

You can use the data below to understand the relation between soil properties, load and cable temperature.

The steady state temperature calculation requires specifying the current load (I), the thermal resistivity (G) and thermal capacity (c) of the soil and the background soil temperature (T_ground). The T_cable value is the cable conductor temperature in degrees Celsius. 

The dynamic calculation requires aligned timeseries for the current and the soil temperature. The column names specify the soil conditions for which the calculation was made. 

In [4]:
import os
import pandas as pd
import numpy as np
from pathlib import Path

In [7]:
datapath = Path(os.getcwd()).parent / "data" / "inverse_problem" # path to data directory
steady_states = pd.read_csv(datapath / "steady_states.csv")
steady_states

Unnamed: 0,T_ground,I,G,T_cable
0,0.0,0.0,0.5,0.000000
1,0.0,60.0,0.5,0.874377
2,0.0,120.0,0.5,3.536116
3,0.0,180.0,0.5,8.105471
4,0.0,240.0,0.5,14.798519
...,...,...,...,...
145,20.0,60.0,1.5,21.794267
146,20.0,120.0,1.5,27.328914
147,20.0,180.0,1.5,37.093218
148,20.0,240.0,1.5,52.027180


In [8]:
# this dataset contains the computed cable temperatures at different soil parameters G,c for the specified current data and ground temperatures
dynamic_profiles = pd.read_csv(datapath/"dynamic_profiles.csv",index_col=0,parse_dates=True)
dynamic_profiles

Unnamed: 0,"0.5,1500000.0","0.5,1750000.0","0.5,2000000.0","0.5,2250000.0","0.5,2500000.0","0.75,1500000.0","0.75,1750000.0","0.75,2000000.0","0.75,2250000.0","0.75,2500000.0",...,"1.25,2000000.0","1.25,2250000.0","1.25,2500000.0","1.5,1500000.0","1.5,1750000.0","1.5,2000000.0","1.5,2250000.0","1.5,2500000.0",I,T_ground
2020-01-10 00:00:00,7.270000,7.270000,7.270000,7.270000,7.270000,7.270000,7.270000,7.270000,7.270000,7.270000,...,7.270000,7.270000,7.270000,7.270000,7.270000,7.270000,7.270000,7.270000,105.0,7.27
2020-01-10 01:00:00,8.452906,8.448909,8.445497,8.442530,8.439912,8.481690,8.476812,8.472645,8.469020,8.465821,...,8.513013,8.508594,8.504691,8.540610,8.534351,8.528993,8.524326,8.520202,105.0,7.29
2020-01-10 02:00:00,8.939579,8.930642,8.923008,8.916365,8.910499,9.008195,8.996931,8.987308,8.978934,8.971540,...,9.085064,9.074404,9.064989,9.152901,9.137552,9.124418,9.112975,9.102863,105.0,7.31
2020-01-10 03:00:00,9.182343,9.169320,9.158188,9.148494,9.139928,9.288878,9.272006,9.257590,9.245041,9.233957,...,9.411807,9.395235,9.380598,9.519777,9.495565,9.474855,9.456816,9.440878,105.0,7.33
2020-01-10 04:00:00,9.324518,9.308553,9.294899,9.283000,9.272479,9.463054,9.441893,9.423807,9.408059,9.394146,...,9.627415,9.605967,9.587023,9.770666,9.738934,9.711805,9.688181,9.667313,105.0,7.35
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-02-12 19:00:00,10.062597,10.058580,10.053563,10.048165,10.042663,10.955840,10.942180,10.928883,10.916058,10.903701,...,12.661847,12.629010,12.597800,13.620668,13.568788,13.520433,13.474960,13.431915,105.0,6.76
2020-02-12 20:00:00,10.042984,10.039446,10.034790,10.029672,10.024399,10.926840,10.913751,10.900916,10.888490,10.876489,...,12.615066,12.583124,12.552726,13.561558,13.511226,13.464227,13.419963,13.378010,105.0,6.76
2020-02-12 21:00:00,10.014640,10.011563,10.007253,10.002404,9.997348,10.889770,10.877218,10.864814,10.852756,10.841082,...,12.561382,12.530260,12.500606,13.496401,13.447484,13.401726,13.358569,13.317616,105.0,6.75
2020-02-12 22:00:00,9.997380,9.994745,9.990769,9.986178,9.981330,10.864374,10.852332,10.840333,10.828616,10.817245,...,12.520435,12.490064,12.461093,13.444749,13.397130,13.352510,13.310371,13.270339,105.0,6.75
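
Since each scenario column is named by its soil parameters as the string "G,c", it can be convenient to turn those names into numeric labels. The sketch below does this on a tiny frame built from values in the preview above; `dynamic_preview` and the other variable names are illustrative, and the same steps should apply to the full `dynamic_profiles` frame if its columns follow the pattern shown.

```python
import pandas as pd

# Two rows and three scenario columns copied from the dynamic_profiles preview,
# plus the I and T_ground input columns.
dynamic_preview = pd.DataFrame(
    {
        "0.5,1500000.0": [7.270000, 8.452906],
        "0.5,1750000.0": [7.270000, 8.448909],
        "1.5,2500000.0": [7.270000, 8.520202],
        "I": [105.0, 105.0],
        "T_ground": [7.27, 7.29],
    },
    index=pd.to_datetime(["2020-01-10 00:00:00", "2020-01-10 01:00:00"]),
)

# Every column except the inputs encodes one soil scenario as "G,c".
scenario_columns = [name for name in dynamic_preview.columns if name not in ("I", "T_ground")]
profiles = dynamic_preview[scenario_columns].copy()
profiles.columns = pd.MultiIndex.from_tuples(
    [tuple(float(part) for part in name.split(",")) for name in scenario_columns],
    names=["G", "c"],
)

# Select the temperature profile for one soil scenario by its numeric parameters.
profile_g05_c15 = profiles[(0.5, 1500000.0)]
```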


## Problem 1: deducing a linear relation with steady state measurements
In this problem you are given measurements of the current, soil temperature and measured propagation time for various steady state situations. The thermal parameters of the soil are unknowns, as is the linear relation between propagation time and cable temperature. Create a model that estimates cable temperature for the training data set.

Note: you can use the example data set of steady state temperatures to deduce the relation between I, T_ground, the soil parameters and cable temperature. Please do not perform a grid search using the `validate_answer` function; that function should only be used once at the end to verify your model.
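
One possible line of attack, sketched here on synthetic data only: if the steady state temperature is assumed to take the form $T_{cable} = T_{ground} + \alpha I^2$ (an assumption, since the text gives only the quadratic dependence on $I$), then combining it with $T_{cable} = a_1 p + a_0$ makes $p$ linear in $T_{ground}$ and $I^2$. The coefficient of $T_{ground}$ is then $1/a_1$, which fixes the scale of the linear relation. All "true" values below are hypothetical and serve only to generate test data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth, used only to generate synthetic steady-state data.
alpha_true = 4.0e-4             # assumed constant in T_cable = T_ground + alpha * I^2
a1_true, a0_true = -4.0, 660.0  # assumed linear map T_cable = a1 * p + a0

current = rng.uniform(30.0, 240.0, size=30)
t_ground = rng.uniform(-1.0, 23.0, size=30)
t_cable_true = t_ground + alpha_true * current ** 2
p = (t_cable_true - a0_true) / a1_true

# p is linear in T_ground and I^2:  p = T_ground / a1 + (alpha / a1) * I^2 - a0 / a1
design = np.column_stack([t_ground, current ** 2, np.ones_like(current)])
coef_t, coef_i2, intercept = np.linalg.lstsq(design, p, rcond=None)[0]

# Invert the fitted coefficients to recover the unknowns.
a1_est = 1.0 / coef_t
a0_est = -intercept * a1_est
alpha_est = coef_i2 * a1_est

t_cable_est = a1_est * p + a0_est
rmse = np.sqrt(np.mean((t_cable_est - t_cable_true) ** 2))
```

If the unit coefficient on T_ground does not hold for the real cable model, the recovered $a_1$ will be off by the same factor, so it is worth checking that assumption against the example steady state data first.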

In [11]:
def get_train_data():
    return pd.read_csv(datapath / "problem1.csv")[['I','T_ground','p']]

def validate_answer(answer):
    true = pd.read_csv(datapath / "problem1.csv")['T_cable']
    return np.sqrt(np.mean((true-answer)**2))

df = get_train_data()
df

Unnamed: 0,I,T_ground,p
0,176,-1,163.64243
1,127,19,161.119746
2,226,11,160.542667
3,174,1,163.345679
4,168,23,159.803807
5,150,21,160.449228
6,132,13,162.03186
7,36,20,161.733466
8,119,19,161.224095
9,60,17,162.096793


In [None]:
# please try to beat the following model: this model always sets the cable temperature at 18 degrees regardless of propagation time
import numpy as np
validate_answer(np.repeat(18,30))

9.577973446078564

## Problem 2: deducing a linear relation with dynamic measurements 

In this problem you are given measurement timeseries of the current, soil temperature and propagation time for a given cable (you can assume constant thermal parameters for the soil). The thermal parameters of the soil are unknowns, as is the linear relation between propagation time and cable temperature. Create a model that estimates cable temperature for the training data set. 

Note: you can use the example data set of dynamic temperature profiles to deduce the relation between I, T_ground, the soil parameters and cable temperature. Please do not perform a grid search using the `validate_answer` function; that function should only be used once at the end to verify your model.
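
A possible approach, sketched on synthetic data: because the relation between $p$ and $T_{cable}$ is exactly linear, the profile computed for the correct soil scenario should be a perfect straight-line function of the measured $p$, while profiles for other scenarios generally are not. One can therefore fit a line from $p$ to each candidate profile and keep the candidate with the smallest residual. The `toy_profile` function below is a made-up heating curve, not the thermal model behind `dynamic_profiles.csv`, and all "true" values are hypothetical. The approach also assumes the candidate profiles were computed for the same I and T_ground series as the measurements, which the previews suggest but do not guarantee.

```python
import numpy as np

hours = np.arange(200.0)


def toy_profile(g, c):
    """Toy heating curve, NOT the thermal model behind dynamic_profiles.csv."""
    steady_rise = 8.0 * g                 # hotter steady state for higher G
    time_constant = 20.0 * g * c / 1.0e6  # slower response for higher G and c
    return 7.0 + steady_rise * (1.0 - np.exp(-hours / time_constant))


candidates = {
    (g, c): toy_profile(g, c)
    for g in (0.5, 1.0, 1.5)
    for c in (1.5e6, 2.0e6, 2.5e6)
}

# Hypothetical truth used to generate the "measured" propagation time.
true_key = (1.0, 2.0e6)
a1_true, a0_true = -4.0, 560.0
p = (candidates[true_key] - a0_true) / a1_true


def linear_fit_residual(profile, p_measured):
    """RMS residual and coefficients of the best line mapping p to the profile."""
    a1, a0 = np.polyfit(p_measured, profile, deg=1)
    residual = float(np.sqrt(np.mean((a1 * p_measured + a0 - profile) ** 2)))
    return residual, a1, a0


scores = {key: linear_fit_residual(profile, p) for key, profile in candidates.items()}
best_key = min(scores, key=lambda key: scores[key][0])
best_residual, a1_est, a0_est = scores[best_key]
```

The true soil parameters of the measured cable need not lie on the grid of example scenarios, in which case the best candidate is only an approximation and interpolating between neighbouring scenarios may help.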

In [15]:
def get_train_data():
    return pd.read_csv(datapath / "problem2.csv",index_col=0,parse_dates=True)[['I','T_ground','p']]

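def validate_answer(answer):
    # compare by position: a pandas answer indexed by timestamp would otherwise misalign with the integer index of this file
    true = pd.read_csv(datapath / "problem2.csv")['T_cable'].to_numpy()
    return np.sqrt(np.mean((true-np.asarray(answer))**2))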

df = get_train_data()
df

Unnamed: 0,I,T_ground,p
2020-01-10 00:00:00,105.0,7.27,133.327900
2020-01-10 01:00:00,105.0,7.29,133.042732
2020-01-10 02:00:00,105.0,7.31,132.912256
2020-01-10 03:00:00,105.0,7.33,132.838285
2020-01-10 04:00:00,105.0,7.35,132.789833
...,...,...,...
2020-02-12 19:00:00,105.0,6.76,132.161642
2020-02-12 20:00:00,105.0,6.76,132.171692
2020-02-12 21:00:00,105.0,6.75,132.183372
2020-02-12 22:00:00,105.0,6.75,132.192162


In [16]:
# please try to beat the following model: this model always sets the cable temperature at 13 degrees regardless of propagation time
validate_answer(np.repeat(13,816))

5.042045551478225

## Final challenge: deducing a linear relation with dynamic noisy measurements 
This challenge is almost the same as problem 2. However, the soil parameters have changed, the linear relation between propagation time and temperature has changed, and the measurements of propagation time are now noisy. Can you still train an accurate model?

Note: you can use the example data set of steady state temperatures to deduce the relation between I, T_ground, the soil parameters and cable temperature. Please do not perform a grid search using the `validate_answer` function; that function should only be used once at the end to verify your model.
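
One point worth keeping in mind with noisy $p$: an ordinary least-squares fit of temperature on a noisy regressor tends to underestimate the magnitude of the slope. Since the noise sits in $p$, it can be better to regress $p$ on a modelled temperature profile and then invert the fitted line. The sketch below illustrates the difference on synthetic data; the profile, noise level and "true" coefficients are all hypothetical, and in practice a candidate profile from a thermal model would take the place of `t_cable_true`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noise-free temperature profile and calibration, for illustration only.
hours = np.arange(816.0)
t_cable_true = 10.0 + 3.0 * np.sin(2.0 * np.pi * hours / 168.0)
a1_true, a0_true = -4.0, 600.0

# Noise enters the measured propagation time, not the modelled temperature.
p_noisy = (t_cable_true - a0_true) / a1_true + rng.normal(0.0, 0.5, size=hours.size)

# Naive direction: regress temperature on noisy p (slope is biased towards zero).
a1_naive, a0_naive = np.polyfit(p_noisy, t_cable_true, deg=1)

# Reverse direction: regress noisy p on temperature, then invert the fitted line.
slope, intercept = np.polyfit(t_cable_true, p_noisy, deg=1)
a1_reverse = 1.0 / slope
a0_reverse = -intercept / slope

print(f"naive a1 = {a1_naive:.2f}, reverse a1 = {a1_reverse:.2f}, true a1 = {a1_true:.2f}")
```

Even with well-estimated coefficients, the estimate $a_1 p + a_0$ inherits the measurement noise of $p$, so some smoothing of $p$ or use of the modelled profile itself may be needed to bring the final error down.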

In [17]:
def get_train_data():
    return pd.read_csv(datapath / "problem3.csv",index_col=0,parse_dates=True)[['I','T_ground','p']]

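def validate_answer(answer):
    # compare by position: a pandas answer indexed by timestamp would otherwise misalign with the integer index of this file
    true = pd.read_csv(datapath / "problem3.csv")['T_cable'].to_numpy()
    return np.sqrt(np.mean((true-np.asarray(answer))**2))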

df = get_train_data()
df

Unnamed: 0,I,T_ground,p
2020-01-10 00:00:00,105.0,7.27,143.303790
2020-01-10 01:00:00,105.0,7.29,143.236537
2020-01-10 02:00:00,105.0,7.31,143.149646
2020-01-10 03:00:00,105.0,7.33,143.120794
2020-01-10 04:00:00,105.0,7.35,143.102482
...,...,...,...
2020-02-12 19:00:00,105.0,6.76,142.332563
2020-02-12 20:00:00,105.0,6.76,142.302252
2020-02-12 21:00:00,105.0,6.75,142.203459
2020-02-12 22:00:00,105.0,6.75,142.033545


In [18]:
# please try to beat the following model: this model always sets the cable temperature at 12 degrees regardless of propagation time
validate_answer(np.repeat(12,816))

5.323246331230754