Correlations only make sense when both time series are stationary. This analysis shows two things:

1. Features are non-stationary and require differencing.
2. Correlations differ across asset ids.
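
To see why stationarity matters here, a minimal sketch on synthetic data (not the competition data) can help. It draws pairs of independent random walks and compares their correlation in levels with the correlation of their first differences. The random-walk setup and the names `n_pairs` and `n_steps` are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_steps = 500, 1000  # illustrative sizes, not taken from the dataset

level_corrs, diff_corrs = [], []
for _ in range(n_pairs):
    # Two independent random walks: cumulative sums of white noise.
    a = rng.normal(size=n_steps).cumsum()
    b = rng.normal(size=n_steps).cumsum()
    # Correlation of the raw (non-stationary) levels.
    level_corrs.append(np.corrcoef(a, b)[0, 1])
    # Correlation after first differencing, which recovers the white noise.
    diff_corrs.append(np.corrcoef(np.diff(a), np.diff(b))[0, 1])

mean_abs_level = np.mean(np.abs(level_corrs))
mean_abs_diff = np.mean(np.abs(diff_corrs))
print("mean |corr| in levels:      ", mean_abs_level)
print("mean |corr| after one diff: ", mean_abs_diff)
```

The two walks in each pair are independent by construction, so any sizeable correlation in levels is spurious. After differencing, the correlation should shrink towards zero.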

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [None]:
import kagglegym
env = kagglegym.make()
observation = env.reset()
train = observation.train

In [None]:
train.fillna(0, inplace=True)

In [None]:
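# Pivot to wide format: one row per timestamp, one column per asset id.
# Keyword arguments are used because recent pandas versions require them for pivot.
gf = train.pivot(index='timestamp', columns='id', values='technical_20')
y = train.pivot(index='timestamp', columns='id', values='y')
# Timestamps where an asset is absent become NaN after the pivot; fill them with 0.
gf.fillna(0, inplace=True)
y.fillna(0, inplace=True)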

Let us take technical_20 as an example and run a simple correlation against y for the first asset in train.

In [None]:
print(np.corrcoef(gf[train.id[0]].values, y[train.id[0]].values)[0, 1])

Now let's plot technical_20 for a single asset, say the first one in train (train.id[0]).

In [None]:
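# matplotlib.pyplot is already imported as plt in the first cell.
X = gf[train.id[0]].values  # technical_20 over time for this asset
Y = y[train.id[0]].values   # target over time for the same asset
plt.plot(X, color='r')
plt.show()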

That does not look stationary. Now let's plot the first difference of technical_20 for the same asset.

In [None]:
X = np.diff(X)  # first difference: X[t+1] - X[t], one element shorter
plt.plot(X, color='r')
plt.show()

It still does not look stationary. Let's difference again.

In [None]:
X = np.diff(X)
plt.plot(X)
plt.show()
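
Judging stationarity by eye is subjective. As a rough numeric complement, the sketch below computes the lag-1 autocorrelation of a synthetic random walk (an assumption; it is not technical_20) before and after differencing. The helper `lag1_autocorr` is hypothetical, and this is a heuristic rather than a formal stationarity test.

```python
import numpy as np

def lag1_autocorr(x):
    """Correlation between a series and itself shifted by one step."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(1)
walk = rng.normal(size=2000).cumsum()  # synthetic non-stationary series

ac_level = lag1_autocorr(walk)               # raw levels
ac_diff1 = lag1_autocorr(np.diff(walk))      # after one difference
ac_diff2 = lag1_autocorr(np.diff(walk, n=2)) # after two differences

print("lag-1 autocorrelation, levels:   ", ac_level)
print("lag-1 autocorrelation, one diff: ", ac_diff1)
print("lag-1 autocorrelation, two diffs:", ac_diff2)
```

A value close to 1 suggests a trending, non-stationary series. A value near 0 after one difference suggests that one difference was enough, while a clearly negative value after a further difference (about -0.5 for a pure random walk) can be a sign of over-differencing.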

Now let's compute the correlation between the twice-differenced X and Y. Each np.diff call drops one element, so Y is trimmed by two elements to match.

In [None]:
print(np.corrcoef(X, Y[2:])[0, 1])

**That is a whopping -17% correlation.**

But what about other assets? Let us try another one, picked arbitrarily: train.id[47].

In [None]:
# Difference twice with pandas; fillna(0) replaces the leading NaNs so the length still matches y.
print(np.corrcoef(gf[train.id[47]].diff().fillna(0).diff().fillna(0).values, y[train.id[47]].values)[0, 1])

This is better than running correlations on non-stationary data, but the correlation is still only +0.4%.

**CONCLUSION:**

Correlations on undifferenced data are spurious and make no sense. You have to difference until the series is stationary before looking for correlations. Fortunately, train.y (which appears to be asset returns) is already stationary.

Secondly, correlations are not the same across asset ids. Each asset has a different correlation to the features.
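
The per-asset comparison can be extended to every id at once. A minimal sketch, assuming a long-format frame with the same column names as train (timestamp, id, technical_20, y). The `toy` data, the asset ids, and `n_diff` are made up for illustration and are not taken from the competition data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_steps = 200
asset_ids = [10, 11, 12]  # hypothetical ids
n_diff = 2                # number of differences applied to the feature

# Synthetic long-format frame: random-walk feature, white-noise target.
frames = []
for asset in asset_ids:
    frames.append(pd.DataFrame({
        "timestamp": np.arange(n_steps),
        "id": asset,
        "technical_20": rng.normal(size=n_steps).cumsum(),
        "y": rng.normal(size=n_steps),
    }))
toy = pd.concat(frames, ignore_index=True)

# Wide format: one row per timestamp, one column per asset id.
feature = toy.pivot(index="timestamp", columns="id", values="technical_20")
target = toy.pivot(index="timestamp", columns="id", values="y")

# Difference every asset's feature column n_diff times.
diffed = feature
for _ in range(n_diff):
    diffed = diffed.diff()

# Drop the leading rows that differencing turned into NaN, then correlate
# each asset's differenced feature with that same asset's target.
per_asset_corr = diffed.iloc[n_diff:].corrwith(target.iloc[n_diff:])
print(per_asset_corr.sort_values())
```

On the real data, the spread of `per_asset_corr` across ids would show how much the feature-to-target relationship varies from asset to asset.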

Hope this helps. Now let's get cracking on this challenge.

*If you like this analysis, please upvote. Otherwise, let me know if I have misunderstood something.*