[My sister](https://shminke.wordpress.com) once asked me to do some data interpolation.
She had [a table](http://www.seeinglight.com/reciprocity.shtml#brucetmax) and wanted to know the Required Exposure for some Metered Exposure values that are missing from the table, namely 3, 4, 6, 7, 8, and 9 seconds.

In [1]:
# first I retyped the data into CSV format
!cat data.csv

2, 2.5
5, 7
10, 15
15, 24
20, 33
30, 50
60,	120
120,	270
240,	600
600,	1680
1200,	3900
1800,	6600

In [2]:
# then I used the standard Python tools to load and model the data
import pandas as pd

# the file has no header row; header=None is the supported way to say so
data = pd.read_csv("data.csv", header=None)
# t_m - time metered (metered exposure)
# T_e - time of exposure (required exposure)
data.columns = ["t_m", "T_e"]
data

Unnamed: 0,t_m,T_e
0,2,2.5
1,5,7.0
2,10,15.0
3,15,24.0
4,20,33.0
5,30,50.0
6,60,120.0
7,120,270.0
8,240,600.0
9,600,1680.0
10,1200,3900.0
11,1800,6600.0


The main thing, of course, was feature engineering. The functional form was taken from [here](https://www.flickr.com/photos/janokelly/6804638225/):

$$T_e=t_m\left(a\log^2t_m+b\log t_m+c\right)$$

Although the formula is nonlinear in $t_m$, dividing both sides by $t_m$ turns it into a simple linear regression:

$$y=ax_2+bx_1+c$$

where $y=\frac{T_e}{t_m}$, $x_1=\log t_m$, and $x_2=\log^2t_m$.

In [3]:
# here we just encode the transformations mentioned above
import numpy as np

data["x_1"] = np.log(data.t_m)
data["x_2"] = data["x_1"] ** 2
data["y"] = data.T_e / data.t_m

In [4]:
# we use Lasso regression to avoid overfitting,
# which is quite likely given that we have only twelve samples in our training set
from sklearn.linear_model import LassoCV

# we use a leave-one-out cross-validation strategy
# to reduce the overfitting risk even more
lr = LassoCV(cv=len(data))
lr.fit(data[["x_1", "x_2"]], data["y"])
# the first coefficient is zero:
# Lasso's L1 penalty can drive weak coefficients to exactly zero,
# which makes the formula simpler and more comprehensible
print(lr.coef_)
print(lr.intercept_)

[0.         0.04051689]
1.2723888002536607
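
As a rough cross-check (not part of the original notebook), one can compare the rounded Lasso formula with a plain, unregularized least-squares fit of the same quadratic in $\log t_m$. The sketch below assumes only NumPy, retypes the table inline, and uses `np.polyfit` as a stand-in for ordinary least squares; the names `sse`, `sse_ols`, and `sse_lasso` are illustrative.

```python
import numpy as np

# table values retyped from data.csv
t_m = np.array([2, 5, 10, 15, 20, 30, 60, 120, 240, 600, 1200, 1800], dtype=float)
T_e = np.array([2.5, 7, 15, 24, 33, 50, 120, 270, 600, 1680, 3900, 6600], dtype=float)

x = np.log(t_m)   # x_1 in the text; x_2 is x ** 2
y = T_e / t_m

# unregularized quadratic fit; polyfit returns the highest degree first,
# so the order is (log^2 coefficient, log coefficient, intercept)
a_ols, b_ols, c_ols = np.polyfit(x, y, deg=2)


def sse(a, b, c):
    """Sum of squared errors of y = a*x^2 + b*x + c on the table."""
    return float(np.sum((y - (a * x ** 2 + b * x + c)) ** 2))


sse_ols = sse(a_ols, b_ols, c_ols)
# Lasso coefficients printed above, rounded to two decimals
sse_lasso = sse(0.04, 0.0, 1.27)

print("OLS coefficients:", a_ols, b_ols, c_ols)
print("SSE (OLS):", sse_ols)
print("SSE (rounded Lasso):", sse_lasso)
```

The least-squares fit can never have a larger in-sample error than the two-term formula, so the interesting part is how small the gap is: a small gap suggests that dropping the $\log t_m$ term costs little.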


One can make the formula prettier by rounding its coefficients (little precision is lost):

$$T_e=t_m\left(0.04\log^2t_m+1.27\right)$$

In [5]:
def get_t_e(lr: LassoCV, t_m: float) -> float:
    """ get T_e from t_m
    
    param lr: trained Lasso Regression
    param t_m: value of time metered
    return: required time of exposure
    """
    l = np.log(t_m)
    return t_m * ((
        np.round(lr.coef_[0], 2) +
        np.round(lr.coef_[1], 2) * l
    ) * l + np.round(lr.intercept_, 2))

In [6]:
# now we can find required exposures for some missing values
# they were tested by my sister and found consistent with the actual film used
np.round([get_t_e(lr, x) for x in [3, 4, 6, 7, 8, 9]])

array([ 4.,  5.,  8., 10., 12., 13.])
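
A model-free sanity check for these numbers is to interpolate directly between neighbouring table rows. The sketch below is an assumption-laden alternative, not the method used above: it interpolates linearly in log-log space with `np.interp`, on the guess that the table is close to a power law locally. The names `queries` and `loglog_interp` are illustrative.

```python
import numpy as np

# table values retyped from data.csv
t_m = np.array([2, 5, 10, 15, 20, 30, 60, 120, 240, 600, 1200, 1800], dtype=float)
T_e = np.array([2.5, 7, 15, 24, 33, 50, 120, 270, 600, 1680, 3900, 6600], dtype=float)

queries = np.array([3, 4, 6, 7, 8, 9], dtype=float)

# piecewise-linear interpolation of log(T_e) against log(t_m),
# mapped back to seconds with exp
loglog_interp = np.exp(np.interp(np.log(queries), np.log(t_m), np.log(T_e)))

print(np.round(loglog_interp, 1))
```

If the values from this sketch land close to the regression output, the two approaches agree inside the table range; unlike the fitted formula, this one cannot extrapolate beyond the first and last rows.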

In [7]:
# here we can see that the relative error is small across the whole range,
# for short and long exposures alike
data.T_e / get_t_e(lr, data.t_m) - 1

0    -0.030420
1     0.019211
2     0.012094
3     0.023449
4     0.012906
5    -0.038124
6     0.030638
7     0.028899
8     0.011533
9    -0.036751
10   -0.009378
11    0.042459
dtype: float64
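
These errors are measured on the same points the model was fitted to. A held-out view can be sketched by refitting with each row left out and predicting that row. The sketch below assumes plain least squares via `np.polyfit` instead of `LassoCV`, so it tests the functional form rather than the exact model above; `loo_rel_err` is an illustrative name.

```python
import numpy as np

# table values retyped from data.csv
t_m = np.array([2, 5, 10, 15, 20, 30, 60, 120, 240, 600, 1200, 1800], dtype=float)
T_e = np.array([2.5, 7, 15, 24, 33, 50, 120, 270, 600, 1680, 3900, 6600], dtype=float)

x = np.log(t_m)
y = T_e / t_m

loo_rel_err = np.empty(len(t_m))
for i in range(len(t_m)):
    keep = np.arange(len(t_m)) != i                # drop row i
    coeffs = np.polyfit(x[keep], y[keep], deg=2)   # refit on the remaining rows
    pred_T_e = t_m[i] * np.polyval(coeffs, x[i])   # predict the held-out row
    loo_rel_err[i] = T_e[i] / pred_T_e - 1         # same error measure as above

print(np.round(loo_rel_err, 3))
```

The first and last rows are effectively extrapolated when held out, so their errors give a rough idea of how far the formula can be trusted outside the table.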

We can also look at an extreme point taken from [other tables](http://www.seeinglight.com/reciprocity.shtml#dontmax)
to see how well this formula extrapolates.

In [8]:
get_t_e(lr, 3600) / 3600

3.9521851830716295

For one hour metered we get about 4 hours required, whereas the table gives 3.5 hours :(

Well, at least the interpolated values worked :)