## Motivation
While reviewing the notebook [EDA / A Quant's Prespective](https://www.kaggle.com/hamzashabbirbhatti/eda-a-quant-s-prespective), I noticed that the first 5 tags share very similar correlation heatmaps. After searching the public notebooks and discussions, I could not find any analysis of inter-tag features. I therefore wrote this notebook in the hope of providing additional insight into this heavily masked problem.
## Methods
Each of the first 5 tags contains 17 features. In this notebook I therefore build 17 order lists that group features by their positional order within a tag: list 0 contains the 5 features that come first in each of the five tags, list 1 contains the 5 features that come second, and so on. A correlation heatmap is then plotted for each order list.
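
To make the grouping concrete, here is a minimal sketch on toy data. The frame `toy`, the tag names `tag_a` and `tag_b`, and the helper `build_order_lists` are hypothetical and only illustrate the idea; the notebook itself builds the lists from `features.csv` further down.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data: 2 tags with 3 features each (hypothetical names).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(100, 6)), columns=[f"f{i}" for i in range(6)])
toy_tags = {"tag_a": ["f0", "f1", "f2"], "tag_b": ["f3", "f4", "f5"]}


def build_order_lists(frame, tags):
    """Group the i-th feature of every tag into the i-th order list."""
    n_positions = min(len(cols) for cols in tags.values())
    return [
        frame[[cols[i] for cols in tags.values()]]
        for i in range(n_positions)
    ]


order_lists = build_order_lists(toy, toy_tags)
print([list(o.columns) for o in order_lists])
# -> [['f0', 'f3'], ['f1', 'f4'], ['f2', 'f5']]
```

Each element of `order_lists` is a small DataFrame whose columns come from different tags but share the same position, so calling `.corr()` on it gives the inter-tag correlation for that position.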
## Results
Each correlation heatmap shows strong correlations among all features in an order list (strong relative to the correlations within each tag). Moreover, within every order list the correlation decreases steadily as the tags move further apart. For example, in order list 0, the correlations of feature 9 with features 15, 13, 11, and 7 decrease steadily, as do those of feature 15 with features 13, 11, and 7. (Feature 9 is in tag0, 15 in tag1, 13 in tag2, 11 in tag3, and 7 in tag4.)
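
One way to check the "steady decrease" numerically rather than by eye is to average the correlations at each tag distance. The sketch below is an illustration only: `mean_corr_by_distance` is a hypothetical helper and `toy_corr` is a made-up matrix with the decaying shape described above, not the real result.

```python
import numpy as np
import pandas as pd


def mean_corr_by_distance(corr):
    """Average the off-diagonal correlations at each tag distance |i - j|."""
    values = corr.to_numpy()
    n = values.shape[0]
    return pd.Series(
        {d: float(np.mean(np.diagonal(values, offset=d))) for d in range(1, n)},
        name="mean_corr",
    )


# Hypothetical correlation matrix: correlation shrinks with tag distance.
labels = ["tag0", "tag1", "tag2", "tag3", "tag4"]
toy_corr = pd.DataFrame(
    [[0.9 ** abs(i - j) for j in range(5)] for i in range(5)],
    index=labels,
    columns=labels,
)

decay = mean_corr_by_distance(toy_corr)
print(decay)
```

Because the columns of each order list are concatenated in tag order, the same helper could be applied to `order_df_ls[i].corr()` once the order lists are built; a monotonically decreasing series would support the observation above.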
 
 
# ***Most surprisingly, this pattern is exactly the same as resp1-4.***
What does this correlation pattern mean? I do not have an answer yet.

My hypothesis is that the correlation reflects time-series autocorrelation, i.e. these features may be lag statistics.
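
As a rough illustration of how lag statistics could produce such a pattern, the sketch below computes rolling means of one synthetic noise series over several look-back windows and correlates them. Everything here is an assumption for illustration: the data is simulated, and the window lengths are hypothetical since the real features are anonymized. It shows that the hypothesis is plausible, not that it is true.

```python
import numpy as np
import pandas as pd

# Synthetic white-noise series standing in for an unknown underlying signal.
rng = np.random.default_rng(42)
noise = pd.Series(rng.normal(size=200_000))

# Hypothetical look-back windows, one per "tag".
windows = [5, 10, 20, 40, 80]
rolled = pd.DataFrame(
    {f"win_{w}": noise.rolling(w).mean() for w in windows}
).dropna()

# Correlation between the same statistic computed over different windows.
corr = rolled.corr()
print(corr.round(2))
```

For independent noise, the correlation between trailing means over windows w1 < w2 is sqrt(w1 / w2) in theory, so the matrix decays steadily away from the diagonal, much like the order-list heatmaps.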

Please feel free to contact me or leave a comment if you have better ideas or criticism of my methodology. Thanks!

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.gridspec as gridspec
from collections import defaultdict
import warnings
import datatable as dt
warnings.filterwarnings("ignore")

In [None]:
df = dt.fread('../input/jane-street-market-prediction/train.csv').to_pandas()
meta_data=dt.fread('../input/jane-street-market-prediction/features.csv').to_pandas()

In [None]:
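# Sort chronologically, then keep only the first day (date == 0) as a sample
df.sort_values(by=['date', 'ts_id'], inplace=True)
sample_df = df.query('date == 0')
# Fill missing values with each column's median over the sample
sample_df = sample_df.apply(lambda x: x.fillna(x.median()), axis=0)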

### resp1-4 correlation heatmap, which resembles the order-list heatmaps below

In [None]:
resp_df = sample_df.iloc[:,2:6]
#resp_df = pd.concat([sample_df.iloc[:,6],resp_df], axis=1)
corr = resp_df.corr()

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(15, 10))

# Draw an annotated heatmap with square cells (no mask is applied)
sns.heatmap(corr, cmap='BrBG',  center=0,vmin=-1, vmax=1, annot=True,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})

### Construct order lists

In [None]:
# Map each tag column in features.csv to the list of features that carry it
categories = {
    col: meta_data.loc[meta_data[col] == True, 'feature'].to_list()
    for col in meta_data.columns[1:]
}

# One DataFrame per tag, for the first five tags
tag_df_ls = [sample_df[categories[f'tag_{i}']] for i in range(5)]
tag_0_df = tag_df_ls[0]

# Print the number of features per tag (17 each, per the Methods section)
for t in tag_df_ls:
    print(len(t.columns))

In [None]:
# Order list i holds the i-th feature of each of the five tags
order_df_ls = []
for i in range(len(tag_0_df.columns)):
    order_df = pd.concat([x_df.iloc[:, i] for x_df in tag_df_ls], axis=1)
    order_df_ls.append(order_df)
# Number of order lists built
print(len(order_df_ls))

### 17 Heatmaps of 17 order lists

In [None]:
# Plot one correlation heatmap per order list (17 in total)
for i, order_df in enumerate(order_df_ls):
    corr = order_df.corr()

    # Set up the matplotlib figure
    f, ax = plt.subplots(figsize=(20, 10))

    # Draw an annotated heatmap with square cells
    sns.heatmap(corr, cmap='BrBG', center=0, vmin=-1, vmax=1, annot=True,
                square=True, linewidths=.5, cbar_kws={"shrink": .5}, ax=ax)
    ax.set_title(f'Order list {i}')
    plt.show()