# How to import the Xy module

In [1]:
from diff_cap_packages import Xy

# Usage of `Xy.get_market_Xy()`

## Default
In this case, I pass only the stock ID of our target (`'930060'`) to get the feature matrix $X$ and the target matrix $y$.

By default:
- the target is 1 day ahead (`target_ahead_by = 1`)
- values are returned as day-to-day percent changes rather than prices (`percent_change = True`)
- the path to the `stock_prices.csv` file is hard-coded as the function's default (`path = 'data/stock_prices.csv'`)

In [2]:
X, y = Xy.get_market_Xy('930060')
print("\nFeatures matrix X")
display(X.head())
print("\nTarget matrix y")
display(y.head())


Features matrix X


date,930060,699903,879841,314909,15362F,315452,884570,992762,879650,315449,...,9511Z8,95335N,96147L,9664FT,9664FU,9911WP,9930FR,99142R,2569A8,2579PR
2003-01-02,,,,,,,,,,,...,,,,,,,,,,
2003-01-03,0.007843,0.0,0.0,0.019327,-0.010101,0.0,-0.023091,0.025126,0.0,0.014815,...,,,,,,,,,,
2003-01-06,-0.003891,0.0,0.0,0.019878,0.0,0.008143,0.0,0.044118,0.0,0.0,...,,,,,,,,,,
2003-01-07,-0.003906,0.0,-0.008621,0.064468,0.0,0.0,0.0,0.032864,0.0,-0.014599,...,,,,,,,,,,
2003-01-08,0.0,0.0,0.0,-0.035493,0.0,0.0,0.0,-0.009091,0.0,-0.022222,...,,,,,,,,,,



Target matrix y


date,930060 +1 day
2003-01-02,
2003-01-03,0.007843
2003-01-06,-0.003891
2003-01-07,-0.003906
2003-01-08,0.0


## Prices instead of percent change
In this case, I have set `percent_change = False`, so the function returns raw prices instead of percent changes.

In [3]:
X, y = Xy.get_market_Xy('930060', percent_change=False)
print("\nFeatures matrix X")
display(X.head())
print("\nTarget matrix y")
display(y.head())


Features matrix X


date,930060,699903,879841,314909,15362F,315452,884570,992762,879650,315449,...,9511Z8,95335N,96147L,9664FT,9664FU,9911WP,9930FR,99142R,2569A8,2579PR
2003-01-02,25.5,1.47,5.8,32.08,9.9,6.14,5.63,1.99,1.7,13.5,...,,,,,,,,,,
2003-01-03,25.7,1.47,5.8,32.7,9.8,6.14,5.5,2.04,1.7,13.7,...,,,,,,,,,,
2003-01-06,25.6,1.47,5.8,33.35,9.8,6.19,5.5,2.13,1.7,13.7,...,,,,,,,,,,
2003-01-07,25.5,1.47,5.75,35.5,9.8,6.19,5.5,2.2,1.7,13.5,...,,,,,,,,,,
2003-01-08,25.5,1.47,5.75,34.24,9.8,6.19,5.5,2.18,1.7,13.2,...,,,,,,,,,,



Target matrix y


date,930060 +1 day
2003-01-02,25.5
2003-01-03,25.7
2003-01-06,25.6
2003-01-07,25.5
2003-01-08,25.5
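
The prices above and the percent changes in the default output are linked by a plain day-to-day percent change. Here is a minimal sketch using pandas' `pct_change` on the five prices shown for `930060`. I am assuming this is equivalent to what `percent_change = True` does; the function's actual internals may differ.

```python
import pandas as pd

# Closing prices for stock 930060, copied from the price output above.
prices = pd.Series(
    [25.5, 25.7, 25.6, 25.5, 25.5],
    index=pd.to_datetime(
        ["2003-01-02", "2003-01-03", "2003-01-06", "2003-01-07", "2003-01-08"]
    ),
    name="930060",
)

# Day-to-day percent change: price_t / price_{t-1} - 1.
# The first row has no previous day, so it comes out as NaN.
returns = prices.pct_change()
print(returns.round(6))
```

The rounded values (NaN, 0.007843, -0.003891, -0.003906, 0.0) match the first column of the default $X$ output.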


## Predicting further ahead than 1 day (default)
In this case, I have set `target_ahead_by = 5`. Instead of the target matrix $y$ being 1 day ahead of the feature matrix $X$, it is now 5 days ahead.

The values in the target matrix $y$ for 2003-01-02 to 2003-01-08 should correspond to the values 5 trading days ahead in the first column of the feature matrix $X$, that is, the rows for 2003-01-09 to 2003-01-15.

In [4]:
X, y = Xy.get_market_Xy('930060', target_ahead_by=5)
print("\nFeatures matrix X")
display(X.head(10))
print("\nTarget matrix y")
display(y.head(10))


Features matrix X


date,930060,699903,879841,314909,15362F,315452,884570,992762,879650,315449,...,9511Z8,95335N,96147L,9664FT,9664FU,9911WP,9930FR,99142R,2569A8,2579PR
2003-01-02,,,,,,,,,,,...,,,,,,,,,,
2003-01-03,0.007843,0.0,0.0,0.019327,-0.010101,0.0,-0.023091,0.025126,0.0,0.014815,...,,,,,,,,,,
2003-01-06,-0.003891,0.0,0.0,0.019878,0.0,0.008143,0.0,0.044118,0.0,0.0,...,,,,,,,,,,
2003-01-07,-0.003906,0.0,-0.008621,0.064468,0.0,0.0,0.0,0.032864,0.0,-0.014599,...,,,,,,,,,,
2003-01-08,0.0,0.0,0.0,-0.035493,0.0,0.0,0.0,-0.009091,0.0,-0.022222,...,,,,,,,,,,
2003-01-09,-0.003922,0.0,0.001739,-0.00993,0.0,0.0,0.0,0.009174,0.0,0.0,...,,,,,,,,,,
2003-01-10,0.003937,0.0,0.001736,-0.00295,0.0,-0.008078,0.0,0.0,0.0,0.003788,...,,,,,,,,,,
2003-01-13,0.0,0.013605,0.0,0.006213,0.0,0.016287,0.0,0.0,0.0,0.0,...,,,,,,,,,,
2003-01-14,0.003922,0.0,0.024263,0.020582,0.0,0.0,-0.018182,-0.009091,0.0,0.018868,...,,,,,,,,,,
2003-01-15,0.0,-0.013423,0.003384,-0.014693,0.0,0.0,0.037037,0.009174,0.0,0.0,...,,,,,,,,,,



Target matrix y


date,930060 +5 day
2003-01-02,
2003-01-03,0.007843
2003-01-06,-0.003891
2003-01-07,-0.003906
2003-01-08,0.0
2003-01-09,-0.003922
2003-01-10,0.003937
2003-01-13,0.0
2003-01-14,0.003922
2003-01-15,0.0
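
To make the idea of a target that is "ahead by n days" concrete, here is a small sketch with pandas' `shift`. The values are made up, and using `shift(-n)` is my assumption about how such a target could be built, not a description of what `get_market_Xy` does internally.

```python
import pandas as pd

# Made-up values 0..9 on the first ten trading days of 2003,
# so it is easy to see which row each target value came from.
dates = pd.bdate_range("2003-01-02", periods=10)
x = pd.Series(range(10), index=dates, name="930060", dtype=float)

target_ahead_by = 5

# shift(-n) moves every value n rows earlier, so the row dated t
# holds the value observed n rows (trading days) after t.
y = x.shift(-target_ahead_by).rename(f"930060 +{target_ahead_by} day")

aligned = pd.concat([x, y], axis=1)
print(aligned)
```

With this construction the row for 2003-01-02 in `y` holds the value from 2003-01-09 in `x`, and the last 5 rows of `y` are NaN because no data exists that far ahead.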


# Varun's documentation

In [5]:
Xy.get_market_Xy?

**The two functions in the block below are simply broken-down versions of the function above.**

For example, if we only want to extract the features matrix ($X$) and do not have a particular target matrix in mind, we can use `Xy.get_X`. We wouldn't want to use `Xy.get_Xy` because that would require us to include parameters about our target.

Next steps:
- I'm hoping to add functionality to `Xy.get_y` so that we can extract multiple target matrices $(y_0, \dots, y_n)$ from a single call and use them to quickly train lots of models.
- It would also be nice to incorporate Patryk's technical indicators and other engineered features into the returned feature matrix $X$.
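
As a rough illustration of the multiple-target idea, here is one way it could look. `make_targets` is a hypothetical helper that I made up for this sketch; it is not part of the `Xy` module, and the data is a toy series.

```python
import pandas as pd


def make_targets(series: pd.Series, horizons) -> pd.DataFrame:
    """Hypothetical helper: one target column per horizon, built with shift(-h)."""
    columns = {f"{series.name} +{h} day": series.shift(-h) for h in horizons}
    return pd.DataFrame(columns)


# Toy series on eight consecutive trading days.
dates = pd.bdate_range("2003-01-02", periods=8)
x = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], index=dates, name="930060")

# One call gives targets 1, 2 and 5 days ahead, side by side.
Y = make_targets(x, horizons=[1, 2, 5])
print(Y)
```

Each column of `Y` could then be paired with the same $X$ to fit one model per horizon.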



In [9]:
Xy.get_delayed_X(stock="930060", period_start=5, period_stop=51, period_step=5)

date,5 day delay,10 day delay,15 day delay,20 day delay,25 day delay,30 day delay,35 day delay,40 day delay,45 day delay,50 day delay
2003-01-02,,,,,,,,,,
2003-01-03,,,,,,,,,,
2003-01-06,,,,,,,,,,
2003-01-07,,,,,,,,,,
2003-01-08,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...
2021-06-24,-0.021946,-0.013138,-0.015774,-0.011140,0.018357,-0.001172,0.014405,-0.002421,0.017214,0.012083
2021-06-25,-0.033558,0.003915,-0.008061,0.008005,-0.027979,0.003423,0.010000,0.000874,0.002460,0.002347
2021-06-28,-0.006061,0.007118,0.000580,0.001176,0.001449,0.002534,-0.003465,-0.013483,0.004318,0.009569
2021-06-29,0.016021,-0.003001,0.003191,0.021250,0.001350,0.008459,0.007650,-0.017404,0.008991,-0.007865
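
The column names above follow `range(period_start, period_stop, period_step)`, which for `5, 51, 5` gives the ten delays 5, 10, ..., 50. Here is a sketch of how such delayed features could be built. `delayed_features` is a hypothetical stand-in, and I am assuming that "k day delay" means the value from k rows earlier; the real `get_delayed_X` may define the delay differently (for example as a k-day change).

```python
import pandas as pd


def delayed_features(returns, period_start, period_stop, period_step):
    """Hypothetical stand-in: column "k day delay" holds the value from k rows earlier."""
    delays = range(period_start, period_stop, period_step)
    return pd.DataFrame({f"{k} day delay": returns.shift(k) for k in delays})


# Made-up series 0..59 so each delayed value is easy to trace back.
dates = pd.bdate_range("2003-01-02", periods=60)
returns = pd.Series(range(60), index=dates, dtype=float)

delayed = delayed_features(returns, period_start=5, period_stop=51, period_step=5)
print(delayed.columns.tolist())
print(delayed.tail(3))
```

As in the output above, the first rows are NaN because there is no data k days before the start of the series.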


**The function in the block below filters the listed stocks.**

It takes the last n dates before the given date and returns a DataFrame of prices for what we define as the investable universe. A stock is kept only if at least 95% of its prices are available over those n dates. We then look at the volume and drop the stock if its volume is too low.
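
Here is a minimal sketch of that filtering logic on toy data. `filter_universe` is a hypothetical function written for illustration, not the code in `filters`. In particular, I am assuming that "before the given date" is strict and that "volume too low" means the mean daily volume over the window is below the threshold.

```python
import numpy as np
import pandas as pd


def filter_universe(prices, volumes, n_days, end_date, max_nan_pct, min_volume):
    """Hypothetical sketch of the investable-universe filter described above."""
    end = pd.Timestamp(end_date)
    # Last n_days rows dated strictly before end_date (assumption).
    window = prices.loc[prices.index < end].tail(n_days)
    vol_window = volumes.loc[window.index, window.columns]

    # Keep stocks whose percentage of missing prices is within the tolerance.
    nan_pct = window.isna().mean() * 100
    enough_prices = nan_pct <= max_nan_pct

    # Assumption: "too low" means mean daily volume below min_volume.
    enough_volume = vol_window.mean() >= min_volume

    return window.loc[:, enough_prices & enough_volume]


dates = pd.bdate_range("2021-06-01", periods=20)
prices = pd.DataFrame({"A": 10.0, "B": 20.0, "C": 30.0}, index=dates)
prices.loc[dates[:10], "B"] = np.nan  # B: half of its prices are missing
volumes = pd.DataFrame({"A": 500.0, "B": 500.0, "C": 10.0}, index=dates)  # C: thinly traded

universe = filter_universe(
    prices, volumes, n_days=20, end_date="2021-06-30", max_nan_pct=5, min_volume=150
)
print(universe.columns.tolist())  # ['A']
```

Stock `B` is dropped for missing prices and stock `C` for low volume, so only `A` remains.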



In [1]:
from diff_cap_packages import filters  # import the package
# filters.get_dataframe_prices(number of past days, reference date, tolerated percentage of NaN prices, volume threshold)
data_new = filters.get_dataframe_prices(1200, "2021-06-30", 5, 150)
data_new


# filters.names_stocks("01/01/2008","01/01/2020")

date,930060,314909,992762,933382,997026,930523,923904,930529,936978,936977,...,26535Y,2562XT,2736ZT,8914PL,2766GV,8747J8,9225R0,8718CJ,86544E,96147L
2016-09-23,102.00,153.59,0.15,16.94,2.97,383.50,167.40,227.08,94.49,82.81,...,,47.96,,14.25,,13.61,8.22,10.00,6.95,31.08
2016-09-26,101.13,152.00,0.15,17.02,2.97,374.34,163.19,221.75,93.40,79.93,...,,47.41,,14.30,,13.80,8.45,10.04,6.95,30.36
2016-09-27,101.60,151.90,0.15,16.93,2.80,370.84,159.35,219.40,93.27,78.35,...,,47.41,,14.50,,13.89,8.01,10.04,6.95,29.01
2016-09-28,102.00,151.50,0.15,16.94,2.76,383.57,163.64,216.30,91.75,77.40,...,,48.14,,14.50,,13.89,7.95,10.09,6.95,29.36
2016-09-29,101.87,155.59,0.15,17.03,2.77,391.00,172.34,222.55,94.28,81.07,...,,48.51,,14.50,,13.89,8.20,10.09,6.95,30.24
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-06-24,99.28,134.00,0.39,21.16,9.55,1607.17,574.03,270.27,70.67,250.00,...,11.30,7.75,31.95,18.55,4.60,8.30,21.05,14.30,3.21,60.47
2021-06-25,99.00,135.49,0.39,21.16,9.36,1643.54,585.43,264.72,70.62,252.23,...,10.79,7.69,32.00,18.48,4.60,8.15,21.47,14.20,3.21,59.50
2021-06-28,97.65,131.65,0.39,21.16,8.85,1639.70,573.64,265.17,70.25,254.27,...,11.10,7.56,32.00,18.55,4.65,8.60,21.90,14.15,3.90,59.54
2021-06-29,99.33,135.08,0.39,21.16,8.83,1633.94,572.86,261.97,71.39,255.10,...,11.18,7.42,32.13,18.80,4.80,8.29,21.78,13.92,3.50,59.52


In [3]:
filters.volumes.index

DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06', '2003-01-07',
               '2003-01-08', '2003-01-09', '2003-01-10', '2003-01-13',
               '2003-01-14', '2003-01-15',
               ...
               '2021-06-17', '2021-06-18', '2021-06-21', '2021-06-22',
               '2021-06-23', '2021-06-24', '2021-06-25', '2021-06-28',
               '2021-06-29', '2021-06-30'],
              dtype='datetime64[ns]', name='date', length=4656, freq=None)