# Spatial Dependency among Return Rate Series

## About this Notebook
In this kernel, I share some observations from exploring the correlations between the return rate series of different investments. The results suggest that there may be some **spatial dependency** among investments, which could be exploited to help a model learn better.

For more analysis, please refer to [Ubiquant Market Prediction - A Simple EDA](https://www.kaggle.com/abaojiang/ubiquant-market-prediction-a-simple-eda).
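
To make the idea concrete before turning to the real data, here is a minimal sketch on synthetic data (not the competition data). Two hypothetical investments share a common driver and a third is independent; correlating the per-investment series should pick out the related pair. All names and numbers below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_times = 500

# Hypothetical setup: investments 0 and 1 share a common factor, 2 is independent
common = rng.normal(size=n_times)
toy = pd.DataFrame({
    'time_id': np.tile(np.arange(n_times), 3),
    'investment_id': np.repeat([0, 1, 2], n_times),
    'target': np.concatenate([
        common + 0.3 * rng.normal(size=n_times),
        common + 0.3 * rng.normal(size=n_times),
        rng.normal(size=n_times),
    ]),
})

# Rows: investments, columns: time steps
toy_map = toy.pivot(index='investment_id', columns='time_id', values='target')

# Transpose so that each investment is a column, then correlate the columns
toy_corr = toy_map.T.corr()
print(toy_corr.round(2))
```

In this toy setting the pair (0, 1) should stand out with a high correlation, while investment 2 should be close to uncorrelated with both.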

<div class="alert alert-block alert-warning">
    <h4>If you find this kernel useful, please upvote it, thanks a lot!!</h4>
</div>

In [None]:
# Import packages 
import os
import warnings
from tqdm import tqdm

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 
import plotly.graph_objects as go 
import plotly.express as px 

# Configuration 
warnings.simplefilter('ignore')
pd.set_option('display.max_columns', 200)

In [None]:
# Variable definitions
DATA_PATH_RAW = "../input/ubiquant-market-prediction"
DATA_PATH_RAW_CUSTOM = "../input/ubiquant-raw"
BASE_COLS = ['row_id', 'time_id', 'investment_id', 'target']
FEAT_COLS = [f'f_{i}' for i in range(300)]

In [None]:
df = pd.read_parquet(os.path.join(DATA_PATH_RAW_CUSTOM, 'train_light.parquet'))
df.head()

In [None]:
# Build an (investment_id x time_id) matrix of targets; missing pairs become NaN
target_map = df.pivot(index='investment_id', columns='time_id', values='target')
n_samples_inv_id = df.groupby('investment_id').size()

# Pairwise correlation between the return rate series of different investments
corrs_inv = target_map.T.corr()
# Mask the diagonal (self-correlation) explicitly instead of comparing floats to 1,
# then take absolute values so strong negative correlations also count
corrs_inv = corrs_inv.mask(np.eye(len(corrs_inv), dtype=bool)).abs()
# Keep only investments with more than 600 samples
inv_ids_leg = list(n_samples_inv_id[n_samples_inv_id > 600].index)
corrs_inv = corrs_inv.loc[inv_ids_leg, inv_ids_leg]

# For each investment, record its most correlated partner
corrs_max = {'inv_id1': [], 'inv_id2': [], 'corr': []}
for inv_id, corr_vec in corrs_inv.iterrows():
    corrs_max['inv_id1'].append(inv_id)
    corrs_max['inv_id2'].append(corr_vec.index[corr_vec.argmax()])
    corrs_max['corr'].append(corr_vec.max())
corrs_max = pd.DataFrame.from_dict(corrs_max, orient='columns')
corrs_max.sort_values('corr', ascending=False, inplace=True)
corrs_max.head()
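
One thing to keep in mind when reading the ranking above: `DataFrame.corr` works on pairwise-complete observations, so a pair of investments that overlap on only a few `time_id`s can show an inflated correlation. The sketch below, on synthetic data with hypothetical column names, shows how the `min_periods` argument of `DataFrame.corr` can guard against that. The threshold of 100 is an arbitrary assumption, not a value used in this notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_times = 300

# Synthetic returns for three hypothetical investments (columns) over time (rows)
returns = pd.DataFrame(rng.normal(size=(n_times, 3)), columns=['a', 'b', 'c'])

# Leave 'c' observed at only 5 time steps, so any correlation with it rests on 5 points
returns.loc[5:, 'c'] = np.nan

corr_default = returns.corr()                 # pairwise-complete, default min_periods
corr_guarded = returns.corr(min_periods=100)  # NaN when fewer than 100 shared points

print(corr_default.round(2))
print(corr_guarded.round(2))
```

With the guard in place, pairs involving `c` come back as NaN rather than as a number computed from a handful of points.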

In [None]:
inv_ids_top = [194, 1144, 1121, 1929, 2406, 2669]
target_map_ = target_map.loc[inv_ids_top, :]
sns.pairplot(target_map_.T, corner=True)

In [None]:
fig = go.Figure()
for inv_id in tqdm(inv_ids_top):
    # Running sum of the target over time_id; NaNs (missing time steps) count as 0
    target_vec = target_map.loc[inv_id].values
    target_cumsum = np.nancumsum(target_vec)
    
    fig.add_trace(go.Scatter(x=target_map.columns, y=target_cumsum, 
                             mode='lines', name=f'inv_{inv_id}'))
    
fig.update_layout(
    title="Cumulative Return Rate of Highly Correlated Investments",
    xaxis_title="time_id",
    yaxis_title="Cumulative Return Rate",
    legend_title="investment_id",
)
fig.show()
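
A note on reading the curves: `np.nancumsum` treats missing values as zero, so an investment's curve stays flat over any `time_id` where it has no target. A tiny example with made-up numbers, contrasted with the plain `np.cumsum`:

```python
import numpy as np

# Made-up return series with two missing time steps
target_vec = np.array([0.5, np.nan, -0.2, np.nan, 0.1])

cum_nan_safe = np.nancumsum(target_vec)  # NaNs contribute 0 to the running sum
cum_plain = np.cumsum(target_vec)        # NaN propagates from the first gap onward

print(cum_nan_safe)
print(cum_plain)
```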