# Derivatives-Oriented Self-Supervision

The following notebook will accompany my research.  
In this research, I'd like to introduce a novel framework to recognize ODE from data.  
In further steps, I'll compare my method with the state-of-the-art deep learning model.

# 0. Imports 

In [1]:
import numpy as np
import scipy
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm_notebook as tqdm

sns.set(style='darkgrid')
%matplotlib ipympl

np.random.seed(42)

# 1. Data Generation 

In this section I'll generate a data tables for the experiments.  
I'll use the a simple *Harmonic Motion* ODE to get data:  
$$
\frac{d^2y}{dx^2} + \frac{k}{m}y = 0
$$
I'll solve this equation using the exact solution:  
For $$ 
y = c_1sin(c_2x)  
$$

We will hope to find the ODE from the data generated from the exact solution

In [2]:
def f(x, c1, c2):
    return c1*np.sin(c2*x)

In [3]:
x1 = sorted(np.random.random(1000) * 10)
c11 = 1
c21 = 1
y1 = [f(x, c11, c21) for x in x1]

Let's check that the ODE indeed solve the problem:

In [4]:
def derivative(x, y, i):
    """
    Assume x, y,  sorted by x
    return f'(x_i, y_i)
    """
    return 0.5 * (((y[i+1] - y[i]) / (x[i+1] - x[i])) + ((y[i] - y[i-1]) / (x[i] - x[i-1])))

In [5]:
def second_derivative(x, y, i):
    h1 = x[i] - x[i-1]
    h2 = x[i+1] - x[i]
    h3 = x[i+1] - x[i-1]
    
    e1 = y[i-1] / (h1*h3)
    e2 = y[i] / (h1*h2)
    e3 = y[i+1] / (h2*h3)
    
    return 2*e1 - 2*e2 + 2*e3

In [7]:
np.mean([x1[i] - x1[i-1] for i in range(1,len(x1))]) ** 2 

9.921788168510964e-05

In [8]:
y1_d2 = [second_derivative(x1, y1, i) for i in range(len(x1) - 2)]

In [9]:
[(y , d2y) for y, d2y in zip(y1[1:-1],y1_d2)]

[(0.05059422857879562, -0.21271887541708123),
 (0.05519311047958403, -0.05069725379325973),
 (0.06946531697719419, -0.05841901337191757),
 (0.09184091439303001, -0.07217277545305478),
 (0.10816448386298694, -0.08983200312638928),
 (0.10973683943827932, -0.103249918104666),
 (0.11329268077282306, -0.11039819488905778),
 (0.121245701908507, -0.11475942443530585),
 (0.1296532153951921, -0.12139926285499314),
 (0.13057068018634987, -0.1271573779922619),
 (0.14343841164985788, -0.13455616549526894),
 (0.14493438470414205, -0.13965001665928867),
 (0.15395145107921285, -0.14744264514911265),
 (0.16511862819330636, -0.1546722169430268),
 (0.18011350110310476, -0.16640193683474536),
 (0.18121154884593352, -0.1754848394448345),
 (0.1828718439737832, -0.18139905513089616),
 (0.19583157972319676, -0.1866414538542358),
 (0.19936706600580328, -0.19269390179306356),
 (0.2043943378244866, -0.19986529307971068),
 (0.2228071709415087, -0.20886461028783287),
 (0.23062442421214951, -0.21928565085545415),


In [10]:
np.asarray([y + d2y for y, d2y in zip(y1[1:-1],y1_d2)]).mean()

-0.0007374828793387645

And with different x:

In [89]:
x2 = sorted(np.random.random(1000) * 10)
c11 = 1
c21 = 1
y2 = [f(x, c11, c21) for x in x2]

In [12]:
y2_d2 = [second_derivative(x2, y2, i) for i in range(len(x1) - 2)]

In [13]:
np.asarray([y + d2y for y, d2y in zip(y2[:-2],y2_d2)]).mean()

-0.0001813606306867713

And with different $c_1$:  
Note that $c_2$ must be 1 because the coefficient of $y$ is 1.

In [76]:
x3 = sorted(np.random.random(1000) * 10)
c13 = 5
c23 = 1 # have to be the same as y's coeffitient
y3 = [f(x, c13, c23) for x in x3]
y3_d2 = [second_derivative(x3, y3, i) for i in range(len(x3) - 2)]
np.asarray([y + d2y for y, d2y in zip(y3[1:-1],y3_d2)]).mean()

-0.003643403208276407

And let's have another function for comparison:  
$$
y' + 5y = 0
$$  
The exact solution of this DE is:  
$$
y = e^{-5t+c_1}
$$

In [57]:
def g(x, c1):
    return np.exp(-5*x + c1)

In [84]:
x4 = sorted(np.random.random(1000) * 10)
c4 = 2
y4 = [g(x, c4) for x in x4]
y4_d2 = [second_derivative(x4, y4, i) for i in range(len(x4) - 2)]
np.asarray([y + d2y for y, d2y in zip(y4[1:-1],y4_d2)]).mean()

4.025551933913121

We can see the difference between the equation, although the simillarity within them.

In [96]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

subsut_size=1000

fig = plt.figure()
ax = fig.add_subplot(1,2,1)
ax.scatter(x1[:subsut_size],y1[:subsut_size])
ax = fig.add_subplot(1,2,2)
ax.scatter(x3[:subsut_size],y3[:subsut_size])
plt.show()

Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …

We've got 2 sets of points, each set represents different initial values.  
In order to determine if those sets follow the same D-th order ODE, we want to find linear combination such that:  
$$ \sum_{i=0}^D{\alpha_i y^{(i)}_j} : j \in [0,N] | y_j = y(x_j) $$  
Thus, we can use a regression model (_Linear-Regression_ for linear ODEs, or DNNs elsewhere) to find those $ \alpha $s.  
Then, we can use those weights to feed-forward the other set, and assume to get all the values to be 0 ($\pm \epsilon$).

In [98]:
def create_derivatives(x, y):
    d1 = [derivative(x, y, i) for i in range(1, len(x1) - 1)]
    d2 = [second_derivative(x, y, i) for i in range(len(x1) - 2)]
    return d1, d2

In [101]:
y1_d1, y1_d2 = create_derivatives(x1, y1)
y2_d1, y2_d2 = create_derivatives(x2, y2)
y3_d1, y3_d2 = create_derivatives(x3, y3)
y4_d1, y4_d2 = create_derivatives(x4, y4)

x_train1 = np.vstack([y1[1:-1], y1_d1, y1_d2]).T
x_train2 = np.vstack([y3[1:-1], y3_d1, y3_d2]).T
x_train = np.vstack([x_train1, x_train2])
y_train = np.zeros(x_train.shape[0])

x_val = np.vstack([y2[1:-1], y2_d1, y2_d2]).T
y_val = np.zeros(x_val.shape[0])

x_test = np.vstack([y4[1:-1], y4_d1, y4_d2]).T
y_test = np.zeros(x_test.shape[0])

# y_train1 = np.random.normal(scale=1e-5,size=x_train1.shape[0])

In [102]:
print(x_train.shape)
x_train1.shape, x_train2.shape

(1996, 3)


((998, 3), (998, 3))

While most of the self-supervised learning implementations over sequences problems assuming auto-regressive model:
$x_{t+1} =f(x_t,x_{t-1} ,…,x_{t-p} )$
to predict following parts of the data (words, pixels, graph edges etc. [5]), we'll use subsets of the data to form a differential equation that fits to the observation's behavior. 

In [103]:
from sklearn.linear_model import LinearRegression

def train_lr(x, y):
    lr = LinearRegression(fit_intercept=True).fit(x, y)
    y_pred = lr.predict(x)
    print(f'Score: {lr.score(x, y)}')
    print(f'coefficients: {lr.coef_}')
    return lr
    
train_lr(x_train, y_train)

Score: 1.0
coefficients: [0. 0. 0.]


LinearRegression()

Now, let's try DNN:

In [104]:
import torch
import torch.nn as nn

class RegressionNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RegressionNet, self).__init__()
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.activation = nn.ReLU()
        self.hidden = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.activation(x)
        x = self.hidden(x)
        return x

In [121]:
import torch
import torch.nn as nn

class RegressionNet2(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RegressionNet2, self).__init__()
        self.input_layer = nn.Linear(input_size, output_size)
        
    def forward(self, x):
        x = self.input_layer(x)
        return x

In [122]:
input_size = x_train.shape[1]
hidden_size = 20
output_size = 1
model = RegressionNet2(input_size, hidden_size, output_size)

In [106]:
def to_torch(x_train, y_train):
    X = torch.from_numpy(x_train.astype(np.float32))
    y = torch.from_numpy(y_train.astype(np.float32))
    y = y.view(y.shape[0], 1)
    return X, y

In [128]:
def train_net(model, x_train, y_train):
    # hyperparameters
    learning_rate = 0.01
    n_iters = 1000

    loss = nn.SmoothL1Loss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    
    X, y = to_torch(x_train, y_train)

    for epoch in range(n_iters):
        # predict = forward pass with our model
        y_predicted = model(X)

        # loss
        l = loss(y_predicted, y)

        # calculate gradients = backward pass
        l.backward()

        # update weights
        optimizer.step()

        # zero the gradients after updating
        optimizer.zero_grad()

        if (epoch+1) % 10 == 0:
    #         w, b = model.parameters()
            print('epoch ', epoch+1,' loss = ', l.item())
train_net(model, x_train, y_train)

epoch  10  loss =  0.002664015395566821
epoch  20  loss =  0.002214511390775442
epoch  30  loss =  0.0018413609359413385
epoch  40  loss =  0.0015313506592065096
epoch  50  loss =  0.001273742294870317
epoch  60  loss =  0.001059665228240192
epoch  70  loss =  0.0008817604393698275
epoch  80  loss =  0.0007339159492403269
epoch  90  loss =  0.000611051800660789
epoch  100  loss =  0.0005089472979307175
epoch  110  loss =  0.0004240949056111276
epoch  120  loss =  0.0003535794385243207
epoch  130  loss =  0.00029497858486138284
epoch  140  loss =  0.00024627914535813034
epoch  150  loss =  0.00020580811542458832
epoch  160  loss =  0.00017217520507983863
epoch  170  loss =  0.00014422494859900326
epoch  180  loss =  0.00012099725427106023
epoch  190  loss =  0.00010169410961680114
epoch  200  loss =  8.565240568714216e-05
epoch  210  loss =  7.232107600430027e-05
epoch  220  loss =  6.124216452008113e-05
epoch  230  loss =  5.2035116823390126e-05
epoch  240  loss =  4.438363976078108e-0

In [131]:
X, y = to_torch(x_test, y_test)
y_predicted = model(X)
y_predicted.mean()

tensor(0.4828, grad_fn=<MeanBackward0>)

In [147]:
X, y = to_torch(x_val, y_test)
y_predicted = model(X)
y_predicted.mean()

tensor(0.0001, grad_fn=<MeanBackward0>)

In [132]:
model.input_layer.weight

Parameter containing:
tensor([[ 0.1196, -0.0011,  0.1197]], requires_grad=True)

Great! now we got some non-trivial solution.  
Let's test with biology example:  
$$
\frac{dy}{dx} = a - by
$$
and the exact solution is:  
$$
y(x) = \frac{a}{b} - (\frac{a}{b} - y_0)e^{-bt}
$$  
We'll chose $a,b=1$

In [133]:
def h(x, y0, a, b):
    dif = a/b
    return dif - (dif-y0) * np.exp(-b*x)

In [143]:
x5 = sorted(np.random.random(1000) * 10)
a = b = 1
y0 = 0
y5 = [h(x, y0, a, b) for x in x4]
y5_d2 = [second_derivative(x4, y4, i) for i in range(len(x4) - 2)]
np.asarray([y + d2y for y, d2y in zip(y5[1:-1],y5_d2)]).mean()

4.770881732908231

In [144]:
y5_d1, y5_d2 = create_derivatives(x5, y5)
x_test2 = np.vstack([y5[1:-1], y5_d1, y5_d2]).T
y_test2 = np.zeros(x_test2.shape[0])

In [145]:
X, y = to_torch(x_test2, y_test2)
y_predicted = model(X)
y_predicted.mean()

tensor(-10.2253, grad_fn=<MeanBackward0>)