# Adjusted Target. Still doesn't match the Given one

This notebook is the continuation of the previous one:   
***Be careful! Target is not what it claims to be!***     
https://www.kaggle.com/code/vasiliisitdikov/be-careful-target-is-not-what-it-claims-to-be/notebook

## 1. Description

In the previous notebook we computed the Target column using the definition provided in the **JPX Competition Metric Definition** article. https://www.kaggle.com/code/smeitoma/jpx-competition-metric-definition     
That time we used the raw Close column. It turned out that more than 2% of the calculated values don't match the given ones.    

The discussion revealed two possible causes:     
1. Close Adjustment
2. Rounding

**In this notebook we introduce a way to calculate the Adjusted Target without recalculating the Close price, and compare the newly computed Target values with the given ones.**

## 2. Data loading and AdjustmentFactor description

In [None]:
import numpy as np 
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt

#### We will use the same 'train' stock price data as in the previous notebook

In [None]:
df = pd.read_csv("../input/jpx-tokyo-stock-exchange-prediction/train_files/stock_prices.csv")
df.head()

Let us check the main statistics of AdjustmentFactor

In [None]:
df.AdjustmentFactor.describe().to_frame('Value')

In [None]:
print(f'There are {df.AdjustmentFactor.isna().sum()} NaN values in the AdjustmentFactor feature')

In [None]:
print(f'There are {df[df.AdjustmentFactor != 1].shape[0]} values different from 1 in the AdjustmentFactor feature')
df[df.AdjustmentFactor != 1].head()

#### Let's also check the feature distribution:

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(16, 4))
sns.histplot(df.loc[df.AdjustmentFactor != 1, 'AdjustmentFactor'], ax=ax[0])
sns.histplot(df.loc[df.AdjustmentFactor < 1, 'AdjustmentFactor'], ax=ax[1])
plt.suptitle('AdjustmentFactor distribution')
ax[0].set_title('AF != 1')
ax[1].set_title('AF < 1')
plt.show()

## 3. How AdjustmentFactor affects Close and Target

#### To investigate what happens around the rows where AdjustmentFactor is not equal to 1, let's create a helper function:

In [None]:
def nearest_rows(data: pd.DataFrame, date: str, sec_code: int, nrows: int = 3, before: bool = True) -> pd.DataFrame:
    """
    Parameters:
        data (pd.DataFrame): The data (table) for the look up
        date (str): The date we are looking for from the Date column of data
        sec_code (int): The numerical code of security we are looking for from the SecuritiesCode of data
        nrows (int): The number of following rows we want to see in addition to the exact match (3 by default)
        before (bool): 'True' if we also want to see the rows before the exact match, in addition to the following ones (True by default)
    Returns:
        pd.DataFrame: The 'window' of the input dataframe containing the requested Date and SecuritiesCode and several rows around this exact match
    """
    td = data[data.SecuritiesCode == sec_code]
    td = td.sort_values('Date').reset_index(drop=True)
    indx = td[td.Date == date].index[0]
    min_index, max_index = max(indx - nrows*before, 0), min(indx + nrows, td.shape[0] - 1)
    return td.loc[min_index:max_index, :]

In [None]:
nearest_rows(df, date='2017-01-17', sec_code=6861)

In [None]:
nearest_rows(df, date='2017-01-17', sec_code=8057)

#### In the examples above we can see that AdjustmentFactor differs from 1 on the day **before** the day the Close price is redenominated

#### Let us manually check the difference between the Adjusted Close Target calculation, the Simple Close Target calculation and the given Target for the example above:

In [None]:
print("Date '2017-01-16', SecuritiesCode 8057")
print(f'The Simple Close Target Calculations Value: {(2402.0 - 490.0)/490:.6f}')
print(f'The Adjusted Close Target Calculations Value: {(2402.0 - 490.0*5)/(490*5):.6f}')
print(f"The Given Target Value: {df.loc[(df.Date=='2017-01-16') & (df.SecuritiesCode==8057), 'Target'].values[0]:.6f}")

#### We can see that the Adjusted Close Target value is equal to the given Target value, so we can extend this kind of calculation to the whole data.     
However, it is worth pointing out that when computing the Adjusted Target we use the day **before** the 'adjustment day'!
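
The manual check above can be wrapped in a small helper. This is only a sketch: the function name `target_with_adjustment` and its parameters are hypothetical, and it assumes, as the examples suggest, that the factor rescales the base-day Close.

```python
def target_with_adjustment(base_close: float, next_close: float,
                           adjustment_factor: float = 1.0) -> float:
    """Return from the base day to the next day, with the base Close rescaled."""
    adjusted_base = base_close * adjustment_factor
    return (next_close - adjusted_base) / adjusted_base


# Values from the SecuritiesCode 8057 example (Target of 2017-01-16)
simple = target_with_adjustment(490.0, 2402.0)
adjusted = target_with_adjustment(490.0, 2402.0, adjustment_factor=5.0)
print(f"simple: {simple:.6f}, adjusted: {adjusted:.6f}")
```

With a factor of 1 the helper reduces to the simple Close calculation, so the same formula covers both cases.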

## 4. Adjusted Target calculations

There are two ways to compute the Adjusted Target:
1. Reassign AdjustmentFactor using a cumulative product, recompute the Close price, then calculate the Target.
2. Come up with some way to avoid recalculating the Close price.

Since the first (**cumprod**) approach is quite straightforward, in this notebook **I introduce another one**.     
You are welcome to compare them and use the one you like more.
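
For readers who want to try the cumprod approach, here is a minimal sketch on toy data. It is not the implementation used in this notebook. The function name `cumprod_adjusted_close` and the column `CumAdjustedClose` are made up for illustration, and the sketch assumes the convention observed above: the factor sits on the last day quoted in the old price scale, so each Close is multiplied by its own factor and all later ones.

```python
import pandas as pd


def cumprod_adjusted_close(prices: pd.DataFrame) -> pd.DataFrame:
    """Add a Close column rescaled by the product of current and later factors."""
    out = prices.sort_values(['SecuritiesCode', 'Date']).reset_index(drop=True)
    # Walk each security from the latest date backwards so that every row
    # collects the product of its own factor and all later factors.
    reverse_cumprod = (out.iloc[::-1]
                          .groupby('SecuritiesCode')['AdjustmentFactor']
                          .cumprod())
    # The multiplication aligns on the index, so the reversed order does not matter.
    out['CumAdjustedClose'] = out['Close'] * reverse_cumprod
    return out


# Toy data (hypothetical security and dates); only the 490 -> 2402 jump
# with factor 5 mirrors the example shown earlier.
toy = pd.DataFrame({
    'Date': ['2020-01-06', '2020-01-07', '2020-01-08', '2020-01-09'],
    'SecuritiesCode': [1, 1, 1, 1],
    'Close': [480.0, 485.0, 490.0, 2402.0],
    'AdjustmentFactor': [1.0, 1.0, 5.0, 1.0],
})
toy_adjusted = cumprod_adjusted_close(toy)
print(toy_adjusted)
```

After this step every Close is expressed in the latest price scale, and the Target can be computed with a plain `pct_change`.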

#### The non-cumprod approach summary:

1. Clearly, for most days we do not need to recalculate Target, since there are only 730 'adjusted' cases.
2. We need to recalculate Target for the days before the adjustment days.
3. We cannot use the recalculated (adjusted) Close as the 'close' price of our deal, only as the open (base) one, i.e. as the price the return is measured against.     

Thus, let's use a mask to bring the new adjusted Targets into our 'simple close' calculations
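
In formula form, the rule described by these points can be sketched as follows. The notation is introduced here for illustration only: $C_t$ is Close and $f_t$ is AdjustmentFactor on day $t$, and the sketch assumes there is no second adjustment on day $t+2$.

$$
\text{Target}_t =
\begin{cases}
\dfrac{C_{t+2} - f_{t+1}\,C_{t+1}}{f_{t+1}\,C_{t+1}}, & f_{t+1} \neq 1 \\[2ex]
\dfrac{C_{t+2} - C_{t+1}}{C_{t+1}}, & f_{t+1} = 1
\end{cases}
$$

For the 8057 example, $C_{t+1} = 490$, $f_{t+1} = 5$ and $C_{t+2} = 2402$, which gives $(2402 - 2450)/2450$.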

#### In the previous notebook the pivot_table approach was introduced as an alternative to the grouping one. Let us use it again:

In [None]:
def pivot_pct_calculation(data: pd.DataFrame, periods: int = 1, shift: int = 0, dropna: bool = True, close_col='Close') -> pd.DataFrame:
    """
    Parameters:
        data (pd.DataFrame): The data (table) for the transformation
        periods (int): The parameter to use in pct_change function = Periods to shift for forming percent change (1 by default)
        shift (int): the post-calculations shift of Target. To make a non shifted table use 0 (by default). To compare with JPX Target column use -2
        dropna (bool): Do not include columns whose entries are all NaN for pivot_table creation (True by default)
        close_col (str): The column name used for calculations ('Close' by default)
    Returns:
        pd.DataFrame: The pivot table with Dates as index, SecuritiesCodes as columns, returns as values
    """
    td = pd.pivot_table(data, index='Date', columns='SecuritiesCode', values=close_col, dropna=dropna)
    print(td.shape)
    td = td.pct_change(periods)
    return td.shift(shift)
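
To see what `shift=-2` does in the function above, here is a small sketch on a toy price series (the values and the `day0`...`day3` labels are made up): after the shift, the row for day t holds the return from day t+1 to day t+2.

```python
import pandas as pd

# Toy Close prices for a single security
close = pd.Series([100.0, 110.0, 121.0, 108.9],
                  index=['day0', 'day1', 'day2', 'day3'], name='Close')

returns = close.pct_change(1)   # return realised on each day versus the previous day
target = returns.shift(-2)      # row t now holds the return from day t+1 to day t+2

print(pd.DataFrame({'Close': close, 'Return': returns, 'Target': target}))
```

The last two rows have no Target because the prices two days ahead are not available.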

#### For the Simple Close return in the previous notebook we had:

In [None]:
prc_df = pivot_pct_calculation(df, shift=-2, dropna=False)
prc_df.head()

#### For the new, Adjusted Close, return we have:

In [None]:
adj_close_df = df.loc[:, ['Date', 'SecuritiesCode', 'Close', 'AdjustmentFactor']].copy()
# A NaN Close stays NaN after the multiplication, so no explicit filter is needed
adj_close_df['AdjustedClose'] = adj_close_df['Close'] * adj_close_df['AdjustmentFactor']
prc_adj_df = pivot_pct_calculation(adj_close_df, shift=-2, dropna=False, close_col='AdjustedClose')
prc_adj_df.head()

#### It is clear that:
1. All values for the days before an adjustment are wrong in the prc_df table.
2. All values for the days with a lag of 2 from an adjustment are wrong in prc_adj_df.
3. All other values are the same.     
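
These three statements can be checked on a toy series. The sketch below uses made-up prices with a single adjustment on day `d2`; the `reference` column is built with a reverse cumulative product purely as a yardstick for this toy case.

```python
import pandas as pd

dates = ['d0', 'd1', 'd2', 'd3', 'd4']
close = pd.Series([480.0, 485.0, 490.0, 2402.0, 2410.0], index=dates)
factor = pd.Series([1.0, 1.0, 5.0, 1.0, 1.0], index=dates)  # adjustment day is d2

simple_target = close.pct_change().shift(-2)                # analogue of prc_df
adjusted_target = (close * factor).pct_change().shift(-2)   # analogue of prc_adj_df

# Yardstick: every Close rescaled by its own and all later factors
fully_adjusted = close * factor[::-1].cumprod()[::-1]
reference = fully_adjusted.pct_change().shift(-2)

check = pd.DataFrame({'simple': simple_target,
                      'adjusted': adjusted_target,
                      'reference': reference}).dropna()
check['simple_ok'] = (check['simple'] - check['reference']).abs() < 1e-12
check['adjusted_ok'] = (check['adjusted'] - check['reference']).abs() < 1e-12
print(check)
```

In this toy case the simple return fails only on `d1` (the day before the adjustment), the adjusted return fails only on `d0` (lag 2), and both agree on `d2`.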

#### Let us use the mask to combine these tables:

First, let's compute the AdjustmentFactor pivot table:

In [None]:
af_df = pd.pivot_table(df, index='Date', columns='SecuritiesCode', values='AdjustmentFactor')
af_df.head()

Then, let's create the mask from it:

In [None]:
mask = (af_df.shift(-1) != 1) & (~af_df.shift(-1).isna())
mask.head()

Finally, let's check that our dataframes are consistent and apply the mask:

In [None]:
assert (prc_adj_df.index == prc_df.index).all()
assert (prc_adj_df.columns == prc_df.columns).all()
assert (prc_adj_df.index == mask.index).all()
assert (prc_adj_df.columns == mask.columns).all()
prc_masked_df = prc_df*(~mask) + prc_adj_df*mask
prc_masked_df.head()
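
One detail of combining tables by multiplying with a boolean mask: a NaN in either table survives the multiplication, even where the mask would not select it. If the two tables share their NaN positions, the result is unaffected. The sketch below, on made-up numbers, shows the difference and an alternative spelling with `DataFrame.where`; it is offered as an option, not as a correction of the cell above.

```python
import numpy as np
import pandas as pd

# Toy tables: `adjusted` has a NaN in a cell that the mask does not select
simple = pd.DataFrame({'A': [0.01, 3.90, 0.02]})
adjusted = pd.DataFrame({'A': [np.nan, -0.02, 0.02]})
mask = pd.DataFrame({'A': [False, True, False]})

by_arithmetic = simple * (~mask) + adjusted * mask  # NaN * False is still NaN
by_where = simple.where(~mask, adjusted)            # take `adjusted` only where mask is True

print(pd.concat([by_arithmetic['A'].rename('arithmetic'),
                 by_where['A'].rename('where')], axis=1))
```

Only the first row differs: the arithmetic version turns it into NaN, while `where` keeps the value from `simple`.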

#### That is what we were looking for.    
#### It's time to check the results.

## 5. The results comparison

#### First, let's gather all the steps needed for the comparison into one function:

In [None]:
def calculated_vs_given_target(calculated_data: pd.DataFrame, given_data: pd.DataFrame) -> pd.DataFrame:
    """
    Parameters:
        calculated_data (pd.DataFrame): The pivot table with calculated return
        given_data (pd.DataFrame): The given data with return as Target feature
    Returns:
        pd.DataFrame: A table with Date and SecuritiesCode as a look up columns aligned with Calculated and Given columns for return
    """
    calculated_target = pd.melt(calculated_data, ignore_index=False, value_name='Target').reset_index().sort_values(['Date','SecuritiesCode']).dropna(subset=['Target']).reset_index(drop=True)
    given_target = given_data.loc[given_data.Date <= calculated_target.Date.max(), ['Date', 'SecuritiesCode', 'Target']].sort_values(['Date','SecuritiesCode']).dropna(subset=['Target']).reset_index(drop=True)
    min_gd = given_target.groupby('SecuritiesCode')['Date'].min()
    calculated_target['MinDate'] = calculated_target['SecuritiesCode'].map(min_gd)
    calculated_target = calculated_target[calculated_target.Date >= calculated_target.MinDate].drop(columns=['MinDate']).reset_index(drop=True)
    assert calculated_target.shape[0] == given_target.shape[0]
    assert (calculated_target.index == given_target.index).all()
    assert (calculated_target.Date == given_target.Date).all()
    assert (calculated_target.SecuritiesCode == given_target.SecuritiesCode).all()
    calculated_target = calculated_target.rename(columns= {'Target': 'Calculated'})
    calculated_target = calculated_target.merge(given_target, on=['Date', 'SecuritiesCode']).rename(columns= {'Target': 'Given'})
    return calculated_target

#### Then, we apply this function to the Not Adjusted and Adjusted calculations

In [None]:
diff_notadjusted_df = calculated_vs_given_target(prc_df, df)
diff_notadjusted_df.head()

In [None]:
diff_adjusted_df = calculated_vs_given_target(prc_masked_df, df)
diff_adjusted_df.head()

#### One more function to combine all results:

In [None]:
def calculated_vs_given_target_comparison(diff_data: pd.DataFrame, threshold: float = 10**(-10)) -> pd.DataFrame:
    """
    Parameters:
        diff_data (pd.DataFrame): The dataframe table with Date, SecuritiesCode, Calculated, Given columns
        threshold (float): The threshold to compare significance of the difference
    Returns:
        pd.DataFrame: The input dataframe with data filtered by threshold (only 'significantly' different rows)
    """
    # Build the filter as a separate Series so the caller's dataframe is not modified
    is_different = (diff_data['Calculated'] - diff_data['Given']).abs() > threshold
    return diff_data[is_different]

#### Now we can build a table to compare, at a glance, how many simple Close and Adjusted Close values differ from the given Target:

In [None]:
adj_n = list()
nadj_n = list()
for i in range(1, 11):
    adj_n.append(calculated_vs_given_target_comparison(diff_adjusted_df, 10**(-i)).shape[0])
    nadj_n.append(calculated_vs_given_target_comparison(diff_notadjusted_df, 10**(-i)).shape[0])

In [None]:
compare_df = pd.DataFrame({'Threshold':[f'10**({-x})' for x in range(1, 11)],
             'Not Adjusted': nadj_n,
             'Adjusted': adj_n})
compare_df

In [None]:
plt.figure(figsize=(16, 5))
plt.plot(compare_df.Threshold, compare_df['Not Adjusted'], label='Not Adjusted')
plt.plot(compare_df.Threshold, compare_df.Adjusted, label='Adjusted')
plt.yscale('log')
plt.title('Counts of different values for calculated and given Target (log)')
plt.xlabel('Threshold')
plt.ylabel('Number of differences from given Target (log)')
plt.legend()
plt.show()

## 6. Among Us

#### There is only one case with a huge difference in our adjusted calculations:

In [None]:
calculated_vs_given_target_comparison(diff_adjusted_df, 10**(-1))

In [None]:
nearest_rows(df, date='2018-09-21', sec_code=4628)

#### Let me skip it for now. You are welcome to share your thoughts on this case in the discussion

# Conclusion

#### In this notebook we continued the topic of calculating Target from the Close feature, applying the 'not cumprod' approach to the adjustment
See the previous notebook ***Be careful! Target is not what it claims to be!***     
https://www.kaggle.com/code/vasiliisitdikov/be-careful-target-is-not-what-it-claims-to-be/notebook

#### Using the Target adjustment we got a more consistent result than before. There is only 1 case with a huge difference and 19 with a threshold of 10**(-3)

#### We assume that the differences in the remaining cases are caused by price rounding (see my comment in the previous discussion). However, you are welcome to check this assumption.
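
As a starting point for checking the rounding assumption, here is a hedged sketch of how rounding an adjusted price could shift the Target slightly. Everything in it is hypothetical: the function name, the toy prices, the factor, and above all the idea that the adjusted base price is rounded to one decimal place. The actual rounding rule, if any, is not known.

```python
def target_with_rounded_base(base_close: float, next_close: float,
                             adjustment_factor: float, decimals: int) -> float:
    """Target when the adjusted base price is rounded before the return is taken."""
    adjusted_base = round(base_close * adjustment_factor, decimals)
    return (next_close - adjusted_base) / adjusted_base


# Toy numbers (not taken from the dataset)
base_close, next_close, factor = 1234.0, 1130.0, 0.909091

exact = (next_close - base_close * factor) / (base_close * factor)
rounded = target_with_rounded_base(base_close, next_close, factor, decimals=1)
print(f"difference caused by rounding: {abs(exact - rounded):.2e}")
```

Comparing such differences with the thresholds in the table above may show whether rounding alone is large enough to explain the remaining mismatches.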