### This notebook compares returnsOpenNextMktres10 with returnsOpenPrevMktres10.

In [None]:
from datetime import date
import numpy as np
import pandas as pd
from kaggle.competitions import twosigmanews
import gc
pd.options.display.max_rows = 999
(market_train_df, _) = twosigmanews.make_env().get_training_data()
market_train_df.time = market_train_df.time.dt.date
columns = ['time',
           'assetCode',
           'assetName',
           'returnsOpenPrevMktres10',
           'returnsOpenNextMktres10']
market_train_df = market_train_df[columns]
gc.collect();
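# Calendar of unique trading dates, sorted ascending.
dates = market_train_df[['time']].drop_duplicates().sort_values(by='time').reset_index(drop=True)
# timeNext is the trading date 11 rows after each date in the calendar.
dates['timeNext'] = dates.time.shift(-11)
# The left merge keeps the row order of market_train_df, so the result can be assigned
# back directly (this relies on market_train_df having a default 0..n-1 index).
market_train_df['timeNext'] = market_train_df.merge(dates,
                                                    on = 'time',
                                                    how = 'left')['timeNext']
# timePrev is the trading date 11 rows before each date in the calendar.
dates['timePrev'] = dates.time.shift(11)
market_train_df['timePrev'] = market_train_df.merge(dates,
                                                    on = 'time',
                                                    how = 'left')['timePrev']
# Self-merge: for each row, fetch returnsOpenPrevMktres10 of the same asset on timeNext.
market_train_df['returnsOpenPrevMktres10_shifted'] = market_train_df.merge(market_train_df,
                                                                           left_on = ['timeNext', 'assetCode'],
                                                                           right_on = ['time', 'assetCode'],
                                                                           how = 'left',
                                                                           suffixes = ('', '_1'))['returnsOpenPrevMktres10_1']

# Self-merge: for each row, fetch returnsOpenNextMktres10 of the same asset on timePrev.
market_train_df['returnsOpenNextMktres10_shifted'] = market_train_df.merge(market_train_df,
                                                                           left_on = ['timePrev', 'assetCode'],
                                                                           right_on = ['time', 'assetCode'],
                                                                           how = 'left',
                                                                           suffixes = ('', '_2'))['returnsOpenNextMktres10_2']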

**Two columns were added to the initial dataset:**

* *returnsOpenPrevMktres10_shifted* shows what *returnsOpenPrevMktres10* for this particular asset was 11 trading days after the current date, which is the shift used in the code above (so this column should be compared with *returnsOpenNextMktres10*);
* *returnsOpenNextMktres10_shifted* shows what *returnsOpenNextMktres10* for this particular asset was 11 trading days before the current date (this column should be compared with *returnsOpenPrevMktres10*).
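
To make the shift-and-merge idea easier to follow, here is a minimal sketch on toy data. Everything in it is hypothetical: the asset code `AAA.N`, the column names `prev`, `next` and `prev_shifted`, and the lag of 2 rows (the notebook itself uses 11). It is not the competition data, just an illustration of the same technique.

```python
from datetime import date
import pandas as pd

LAG = 2  # toy lag in rows; the notebook uses 11

# Hypothetical toy data: one asset on five consecutive trading dates.
toy = pd.DataFrame({
    "time": [date(2016, 1, d) for d in (4, 5, 6, 7, 8)],
    "assetCode": ["AAA.N"] * 5,
    "prev": [0.1, 0.2, 0.3, 0.4, 0.5],
})
# By construction, the forward-looking value at row i equals "prev" at row i + LAG.
toy["next"] = toy["prev"].shift(-LAG)

# Calendar of unique dates, plus the date LAG rows ahead of each one.
dates = toy[["time"]].drop_duplicates().sort_values("time").reset_index(drop=True)
dates["timeNext"] = dates["time"].shift(-LAG)
toy = toy.merge(dates, on="time", how="left")

# For each (timeNext, assetCode), pull "prev" from the row dated timeNext.
lookup = toy[["time", "assetCode", "prev"]].rename(
    columns={"time": "timeNext", "prev": "prev_shifted"})
toy = toy.merge(lookup, on=["timeNext", "assetCode"], how="left")

print(toy[["time", "timeNext", "next", "prev_shifted"]])
```

In this toy frame `prev_shifted` matches `next` wherever a later date exists, and the last `LAG` rows are missing in both columns because there is no date to look ahead to.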




The only mismatch in market-residualized returns that I see is for Cisco Systems Inc in summer 2016.
For instance, on June 21, 2016, the estimated residualized return for the following period was 0.056966. But on July 7, 2016 (11 trading days later, matching the shift in the code), the value we actually got was -0.006932. The same mismatch is observed until August 2, 2016.
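
As a quick sanity check on the two dates above, the number of trading days between them can be counted and compared with the 11-row shift used in the code. This is only a sketch: it assumes a Monday-to-Friday calendar with July 4, 2016 as the only market holiday in that window, rather than the actual list of dates in the dataset.

```python
import numpy as np

# Assumed calendar: weekdays, minus Independence Day (July 4, 2016).
holidays = ["2016-07-04"]

# busday_count includes the start date and excludes the end date, which is
# the same as the number of trading-day steps from June 21 to July 7.
lag = np.busday_count("2016-06-21", "2016-07-07", holidays=holidays)
print(lag)  # 11
```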


For all other assets, returnsOpenPrevMktres10 and returnsOpenNextMktres10 are equal (once the lag is taken into account, of course).
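
Since the returns are floats, "equal" here means equal within a small tolerance, and rows with missing values have to be left out first. A minimal sketch of that check, with made-up numbers and hypothetical column names `prev` and `next_shifted`:

```python
import numpy as np
import pandas as pd

# Hypothetical values; the last row has no counterpart and is dropped.
toy = pd.DataFrame({
    "prev": [0.0125, -0.0310, 0.0500, np.nan],
    "next_shifted": [0.0125, -0.0310, -0.0100, 0.0200],
})

# Keep only rows where both values are present.
complete = toy.dropna()

# True where the two columns agree within the given tolerances.
matches = np.isclose(complete["prev"], complete["next_shifted"],
                     rtol=1e-8, atol=1e-12)

# Rows that disagree are the ones worth inspecting.
mismatches = complete.loc[~matches]
print(mismatches)
```

Only the third row is reported here, because its two values differ by far more than the tolerance.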



In [None]:
# Drop incomplete rows once and reuse the result.
complete_rows = market_train_df.dropna()
mask1 = np.isclose(complete_rows['returnsOpenPrevMktres10'],
                   complete_rows['returnsOpenNextMktres10_shifted'],
                   rtol=1e-8, atol=1e-12)

complete_rows.loc[~mask1, ['time',
                           'assetCode',
                           'assetName',
                           'returnsOpenPrevMktres10',
                           'timePrev',
                           'returnsOpenNextMktres10_shifted']]

* On August 3, 2016, there were no errors in our estimates of the future residualized return.

In [None]:
market_train_df[(market_train_df.assetCode=='CSCO.O')&(market_train_df.time==date(2016,8,3))][['time',
                                                                                               'assetCode',
                                                                                               'assetName',
                                                                                               'returnsOpenNextMktres10',
                                                                                               'timeNext',
                                                                                               'returnsOpenPrevMktres10_shifted']]

In [None]:
# Drop incomplete rows once and reuse the result.
complete_rows = market_train_df.dropna()
mask2 = np.isclose(complete_rows['returnsOpenNextMktres10'],
                   complete_rows['returnsOpenPrevMktres10_shifted'],
                   rtol=1e-8, atol=1e-12)

complete_rows.loc[~mask2, ['time',
                           'assetCode',
                           'assetName',
                           'returnsOpenNextMktres10',
                           'timeNext',
                           'returnsOpenPrevMktres10_shifted']]

I'm not sure whether this matters for the competition, but I'd be glad to discuss it with fellow Kagglers.
### Does anyone know the reason for this mismatch?