# Preamble
**While building a notebook for this competition, I decided to create new features to enrich the dataset and the model. To avoid overcrowding my current [notebook](https://www.kaggle.com/code/okanzkaya/jpx-tokyo-ts-cv-implementation-lgbm-optuna-fe), I will do the feature engineering in depth here.**

**I will add the indicators that I think are significant factors in price. I would also appreciate any ideas or suggestions about my choice of indicators.**

**I am not going to submit this notebook or develop a model here. This notebook only creates TA (Technical Analysis) features to be used as a supplement in other notebooks.**
****

# Why Feature Engineering?

**The competition dataset has a limited number of features. To improve the accuracy of our model and help it generalize better, we will derive new ones.**

**I will choose indicators that are trusted and commonly used in finance, and I will compute them in Python.**
****

# What is TA-Lib?

**TA-Lib is one of the most popular libraries for Technical Analysis. Its Python package is a Cython wrapper around the underlying C library, so it runs fast and efficiently.**

# Let's Import Useful Libraries

# Install and import TA-Lib

**TA-Lib helps us create technical analysis tools with ease.**

In [1]:
!pip install ../input/talib0419/talib_binary-0.4.19-cp37-cp37m-manylinux1_x86_64.whl
import talib

Processing /kaggle/input/talib0419/talib_binary-0.4.19-cp37-cp37m-manylinux1_x86_64.whl
Installing collected packages: talib-binary
Successfully installed talib-binary-0.4.19

In [2]:
import pandas as pd
prices_df = pd.read_csv('../input/jpx-tokyo-stock-exchange-prediction/train_files/stock_prices.csv')
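prices_df = prices_df.drop(['ExpectedDividend'], axis=1) # not used here; unimportant in my opinion
# prices_df = prices_df.dropna() # do NOT drop rows that contain NaN
# Linear interpolation fills each NaN from the nearest valid values above and below it in row order.
# Rows are ordered by Date and then SecuritiesCode, so those neighbours belong to other securities.
prices_df.interpolate(method='linear', inplace=True)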
prices_df.isnull().sum()

RowId               0
Date                0
SecuritiesCode      0
Open                0
High                0
Low                 0
Close               0
Volume              0
AdjustmentFactor    0
SupervisionFlag     0
Target              0
dtype: int64
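
Because the frame is ordered by date and then by security, a frame-wide `interpolate` fills a missing value from the rows of neighbouring securities. The sketch below is not part of the original notebook: it uses a small hypothetical `toy` frame to compare that behaviour with interpolating inside each `SecuritiesCode` group, which may be closer to what is intended.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame in the same Date-then-SecuritiesCode order as stock_prices.csv.
toy = pd.DataFrame({
    "Date": ["2017-01-04"] * 2 + ["2017-01-05"] * 2 + ["2017-01-06"] * 2,
    "SecuritiesCode": [1301, 1332] * 3,
    "Close": [2742.0, 571.0, np.nan, 568.0, 2740.0, 569.0],
})

# Frame-wide interpolation: the gap for 1301 is filled from its row neighbours,
# which are both rows of security 1332.
global_fill = toy["Close"].interpolate(method="linear")

# Per-security interpolation: the gap is filled from 1301's own previous and next close.
per_security_fill = toy.groupby("SecuritiesCode")["Close"].transform(
    lambda s: s.interpolate(method="linear")
)

print(global_fill[2])        # midpoint of 571.0 and 568.0
print(per_security_fill[2])  # midpoint of 2742.0 and 2740.0
```

Note that per-security interpolation leaves a NaN in place when a security's first row is missing, so the null counts would need to be checked again after switching to it.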

In [3]:
prices_df

| | RowId | Date | SecuritiesCode | Open | High | Low | Close | Volume | AdjustmentFactor | SupervisionFlag | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20170104_1301 | 2017-01-04 | 1301 | 2734.0 | 2755.0 | 2730.0 | 2742.0 | 31400 | 1.0 | False | 0.000730 |
| 1 | 20170104_1332 | 2017-01-04 | 1332 | 568.0 | 576.0 | 563.0 | 571.0 | 2798500 | 1.0 | False | 0.012324 |
| 2 | 20170104_1333 | 2017-01-04 | 1333 | 3150.0 | 3210.0 | 3140.0 | 3210.0 | 270800 | 1.0 | False | 0.006154 |
| 3 | 20170104_1376 | 2017-01-04 | 1376 | 1510.0 | 1550.0 | 1510.0 | 1550.0 | 11300 | 1.0 | False | 0.011053 |
| 4 | 20170104_1377 | 2017-01-04 | 1377 | 3270.0 | 3350.0 | 3270.0 | 3330.0 | 150800 | 1.0 | False | 0.003026 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2332526 | 20211203_9990 | 2021-12-03 | 9990 | 514.0 | 528.0 | 513.0 | 528.0 | 44200 | 1.0 | False | 0.034816 |
| 2332527 | 20211203_9991 | 2021-12-03 | 9991 | 782.0 | 794.0 | 782.0 | 794.0 | 35900 | 1.0 | False | 0.025478 |
| 2332528 | 20211203_9993 | 2021-12-03 | 9993 | 1690.0 | 1690.0 | 1645.0 | 1645.0 | 7200 | 1.0 | False | -0.004302 |
| 2332529 | 20211203_9994 | 2021-12-03 | 9994 | 2388.0 | 2396.0 | 2380.0 | 2389.0 | 6500 | 1.0 | False | 0.009098 |


**I am going to use the following indicators, which I described in this [article](https://medium.com/@okanozkaya987/top-technical-indicators-and-formulas-for-traders-and-analysts-4929cf6081f6).**

* **Simple Moving Average (SMA)**
* **Relative Strength Index (RSI)**
* **Moving Average Convergence Divergence (MACD)**
* **Exponential Moving Average (EMA)**
* **Standard Deviation (STD)**
* **Bollinger Bands (BB)**

In [4]:
def features(df):
    """Add TA-Lib indicator columns computed from one security's Close prices."""
    close = df['Close']

    df['EMA'] = talib.EMA(close, timeperiod=30) # Exponential Moving Average
    df['SMA'] = talib.SMA(close, timeperiod=30) # Simple Moving Average
    df['RSI'] = talib.RSI(close, timeperiod=14) # Relative Strength Index
    df['STDDEV'] = talib.STDDEV(close, timeperiod=5, nbdev=1) # Standard Deviation
    df['macd'], df['macdsignal'], df['macdhist'] = talib.MACD(close, fastperiod=12, slowperiod=26, signalperiod=9) # Moving Average Convergence/Divergence
    df['upperband'], df['middleband'], df['lowerband'] = talib.BBANDS(close, timeperiod=5, nbdevup=2, nbdevdn=2, matype=0) # Bollinger Bands
    return df

# Compute the indicators separately for each security so that no window spans two stocks.
prices_df = prices_df.groupby('SecuritiesCode').apply(features)
# Drop the warm-up rows at the start of each security's history, where the indicators are still NaN.
prices_df = prices_df.dropna(axis=0).reset_index(drop=True)
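
The `dropna` call removes the warm-up rows at the start of every security's history, where a windowed indicator does not yet have enough data. The sketch below shows the effect on a hypothetical two-security `toy` frame with a 30-day rolling mean. In the notebook itself the longest warm-up should come from MACD(12, 26, 9), which as far as I understand TA-Lib's lookback needs (26 - 1) + (9 - 1) = 33 rows per security.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two securities with 40 daily closes each.
toy = pd.DataFrame({
    "SecuritiesCode": np.repeat([1301, 1332], 40),
    "Close": np.tile(np.arange(40, dtype=float), 2),
})

# A 30-day rolling mean needs 30 closes before it produces its first value,
# so the first 29 rows of each security are NaN.
toy["SMA"] = toy.groupby("SecuritiesCode")["Close"].transform(
    lambda s: s.rolling(30).mean()
)

rows_before = len(toy)
toy = toy.dropna(axis=0).reset_index(drop=True)
rows_dropped = rows_before - len(toy)
print(rows_dropped)  # 29 warm-up rows for each of the 2 securities
```

Every security loses the same number of leading rows, so the total loss scales with the number of securities and not with the length of the history.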

In [5]:
prices_df

| | RowId | Date | SecuritiesCode | Open | High | Low | Close | Volume | AdjustmentFactor | SupervisionFlag | ... | EMA | SMA | RSI | STDDEV | macd | macdsignal | macdhist | upperband | middleband | lowerband |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20170221_1301 | 2017-02-21 | 1301 | 2853.0 | 2883.0 | 2850.0 | 2882.0 | 35900 | 1.0 | False | ... | 2761.274226 | 2748.733333 | 82.210289 | 22.579637 | 33.163835 | 16.820463 | 16.343372 | 2886.759274 | 2841.6 | 2796.440726 |
| 1 | 20170221_1332 | 2017-02-21 | 1332 | 562.0 | 567.0 | 560.0 | 562.0 | 1987400 | 1.0 | False | ... | 559.953294 | 557.933333 | 47.010621 | 5.192302 | 4.873062 | 7.143227 | -2.270165 | 577.584604 | 567.2 | 556.815396 |
| 2 | 20170221_1333 | 2017-02-21 | 1333 | 3275.0 | 3285.0 | 3250.0 | 3280.0 | 131900 | 1.0 | False | ... | 3234.488270 | 3226.000000 | 53.192793 | 12.409674 | 20.517708 | 16.257713 | 4.259994 | 3303.819347 | 3279.0 | 3254.180653 |
| 3 | 20170221_1376 | 2017-02-21 | 1376 | 1429.0 | 1448.0 | 1429.0 | 1448.0 | 3800 | 1.0 | False | ... | 1481.271043 | 1479.733333 | 41.026569 | 8.280097 | -24.739427 | -26.285473 | 1.546046 | 1453.360193 | 1436.8 | 1420.239807 |
| 4 | 20170221_1377 | 2017-02-21 | 1377 | 3180.0 | 3180.0 | 3150.0 | 3165.0 | 92900 | 1.0 | False | ... | 3233.163178 | 3231.666667 | 42.883295 | 6.324555 | -34.742766 | -38.084250 | 3.341485 | 3187.649111 | 3175.0 | 3162.350889 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2266526 | 20211203_9990 | 2021-12-03 | 9990 | 514.0 | 528.0 | 513.0 | 528.0 | 44200 | 1.0 | False | ... | 556.056743 | 563.666667 | 39.908947 | 7.694154 | -13.925969 | -7.568447 | -6.357522 | 536.388307 | 521.0 | 505.611693 |
| 2266527 | 20211203_9991 | 2021-12-03 | 9991 | 782.0 | 794.0 | 782.0 | 794.0 | 35900 | 1.0 | False | ... | 818.175868 | 825.133333 | 42.065552 | 9.329523 | -20.337700 | -20.271023 | -0.066677 | 795.259046 | 776.6 | 757.940954 |
| 2266528 | 20211203_9993 | 2021-12-03 | 9993 | 1690.0 | 1690.0 | 1645.0 | 1645.0 | 7200 | 1.0 | False | ... | 1703.412006 | 1708.366667 | 26.799638 | 14.905033 | -15.484976 | -7.578197 | -7.906779 | 1695.010065 | 1665.2 | 1635.389935 |
| 2266529 | 20211203_9994 | 2021-12-03 | 9994 | 2388.0 | 2396.0 | 2380.0 | 2389.0 | 6500 | 1.0 | False | ... | 2399.384579 | 2403.666667 | 48.129008 | 20.819222 | -17.060623 | -14.205326 | -2.855298 | 2397.038444 | 2355.4 | 2313.761556 |


# Conclusion
**We derived 10 new feature columns from 6 indicators. The set is optional: you can reduce or increase the number of features.**

**These features should help our model generalize and perform better.**
*****

**Caution: Avoid redundant features. Having 100 features will not turn your model into a perfect one.**
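
One rough way to spot redundancy is to look at pairwise correlations between features. The sketch below is an illustration on synthetic data, not a result from the competition dataset: it builds Bollinger-style bands from a simulated price series with pandas and flags columns whose absolute correlation with an earlier column exceeds 0.95. The threshold, the simulated prices, and the `change` column are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic trending price series (an assumption for illustration only).
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0.2, 1.0, 500).cumsum())

# Bollinger-style bands: the 5-day mean plus and minus two rolling standard deviations.
feats = pd.DataFrame({"middleband": close.rolling(5).mean()})
std5 = close.rolling(5).std(ddof=0)
feats["upperband"] = feats["middleband"] + 2 * std5
feats["lowerband"] = feats["middleband"] - 2 * std5
feats["change"] = close.diff()  # day-over-day change, for contrast
feats = feats.dropna()

# Keep only the upper triangle so each pair of columns is inspected once.
corr = feats.corr().abs()
upper_tri = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flag every column that is highly correlated with a column listed before it.
to_drop = [col for col in upper_tri.columns if (upper_tri[col] > 0.95).any()]
print(to_drop)
```

A high correlation does not prove that a feature is useless, but it is a cheap first check before adding many more indicators.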