# Preamble
**While building a notebook for this competition, I decided to create new features to enrich our dataset and model. However, to avoid overloading my current [notebook](https://www.kaggle.com/code/okanzkaya/jpx-tokyo-ts-cv-implementation-lgbm-optuna-fe), I will do the feature engineering in depth in this separate notebook.**

**I will add the indicators that I think are significant factors for price. I would also appreciate any ideas or suggestions about my choice of indicators.**

**I am not going to submit this notebook or build a model here; this notebook only creates TA (Technical Analysis) features to supplement other notebooks.**
****

# Why Feature Engineering?

**The competition dataset has only a limited number of features. To improve our model's accuracy and help it generalize better, we will derive new ones.**

**I will choose indicators that are trusted and commonly used in finance, and compute them in Python.**
****

# What is TA-Lib?

**TA-Lib is one of the most widely used libraries for technical analysis. Its Python wrapper is built with Cython on top of the underlying C library, so it is fast and efficient.**

# Let's Import Useful Libraries

# Install and import TA-Lib

**TA-Lib makes it easy to compute technical indicators.**

In [None]:
!pip install ../input/talib0419/talib_binary-0.4.19-cp37-cp37m-manylinux1_x86_64.whl
import talib

In [None]:
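import pandas as pd
prices_df = pd.read_csv('../input/jpx-tokyo-stock-exchange-prediction/train_files/stock_prices.csv')
prices_df = prices_df.drop(['ExpectedDividend'], axis=1)  # mostly NaN, so I consider it trivial here
#prices_df = prices_df.dropna() # DO NOT! Deleting rows leaves gaps in each stock's time series.
# The file is sorted by Date, so sort by stock first; otherwise interpolation would mix different securities.
prices_df = prices_df.sort_values(['SecuritiesCode', 'Date']).reset_index(drop=True)
price_cols = ['Open', 'High', 'Low', 'Close', 'Volume']
# Linear interpolation per stock: each gap is filled on a straight line between its nearest valid neighbours.
prices_df[price_cols] = prices_df.groupby('SecuritiesCode')[price_cols].transform(lambda s: s.interpolate(method='linear'))
prices_df.isnull().sum()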
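**Because stock_prices.csv is ordered by Date and then SecuritiesCode, neighbouring rows of a column usually belong to different stocks, so interpolating a column as a whole can fill one stock's gap with another stock's prices. Also, `method='linear'` does not simply take a mean: it draws a straight line between the nearest valid values on either side, which equals their mean only when a single value is missing. The minimal sketch below uses hypothetical toy prices to compare whole-column interpolation with per-stock interpolation.**

```python
import numpy as np
import pandas as pd

# Hypothetical toy data laid out like stock_prices.csv: sorted by Date first,
# so consecutive rows alternate between two securities.
toy = pd.DataFrame({
    'Date': ['2021-01-04', '2021-01-04', '2021-01-05', '2021-01-05', '2021-01-06', '2021-01-06'],
    'SecuritiesCode': [1301, 1332, 1301, 1332, 1301, 1332],
    'Close': [100.0, 5000.0, np.nan, 5100.0, 110.0, 5200.0],
})

# Interpolating the whole column fills stock 1301's gap from stock 1332's neighbouring rows.
naive = toy['Close'].interpolate(method='linear')

# Sorting by stock and date, then interpolating inside each group, keeps securities apart.
per_stock = (
    toy.sort_values(['SecuritiesCode', 'Date'])
       .groupby('SecuritiesCode')['Close']
       .transform(lambda s: s.interpolate(method='linear'))
)

print(naive.loc[2])      # 5050.0, midway between two prices of stock 1332
print(per_stock.loc[2])  # 105.0, midway between 100.0 and 110.0 of stock 1301
```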

In [None]:
prices_df

**I am going to use the following indicators, which I described in this [article](https://medium.com/@okanozkaya987/top-technical-indicators-and-formulas-for-traders-and-analysts-4929cf6081f6).**

* **Simple Moving Average (SMA)**
* **Relative Strength Index (RSI)**
* **Moving Average Convergence Divergence (MACD)**
* **Exponential Moving Average (EMA)**
* **Standard Deviation (STD)**
* **Bollinger Bands (BB)**
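**For reference, these are the textbook definitions behind the indicators above, with $C_t$ the close on day $t$, $n$ the window length and $k$ the band width in standard deviations. This is a sketch of the standard formulas, not TA-Lib's exact code: the recursive averages (EMA and Wilder's smoothing in RSI) need a starting value, and TA-Lib's choice of seed affects the first rows. I believe TA-Lib's STDDEV uses the population form (dividing by $n$) shown here.**

$$
\begin{aligned}
\mathrm{SMA}_n(t) &= \frac{1}{n}\sum_{i=0}^{n-1} C_{t-i} \\
\mathrm{EMA}_n(t) &= \alpha\,C_t + (1-\alpha)\,\mathrm{EMA}_n(t-1), \qquad \alpha = \frac{2}{n+1} \\
\sigma_n(t) &= \sqrt{\frac{1}{n}\sum_{i=0}^{n-1}\bigl(C_{t-i} - \mathrm{SMA}_n(t)\bigr)^2} \\
\mathrm{BB}^{\pm}_{n,k}(t) &= \mathrm{SMA}_n(t) \pm k\,\sigma_n(t) \\
\mathrm{MACD}(t) &= \mathrm{EMA}_{12}(t) - \mathrm{EMA}_{26}(t), \qquad \mathrm{Signal}(t) = \mathrm{EMA}_{9}[\mathrm{MACD}](t), \qquad \mathrm{Hist}(t) = \mathrm{MACD}(t) - \mathrm{Signal}(t) \\
\mathrm{RSI}_n(t) &= 100 - \frac{100}{1 + \mathrm{RS}_n(t)}, \qquad \mathrm{RS}_n(t) = \frac{\overline{U}_n(t)}{\overline{D}_n(t)} \\
U_t &= \max(C_t - C_{t-1},\,0), \qquad D_t = \max(C_{t-1} - C_t,\,0) \\
\overline{U}_n(t) &= \frac{(n-1)\,\overline{U}_n(t-1) + U_t}{n}, \qquad \overline{D}_n(t) = \frac{(n-1)\,\overline{D}_n(t-1) + D_t}{n}
\end{aligned}
$$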

In [None]:
def features(df):
    """Add TA-Lib indicator columns to the rows of a single security (rows in date order)."""
    close = df['Close']  # all indicators below are computed from the close price only

    df['EMA'] = talib.EMA(close, timeperiod=30)  # Exponential Moving Average
    df['SMA'] = talib.SMA(close, timeperiod=30)  # Simple Moving Average
    df['RSI'] = talib.RSI(close, timeperiod=14)  # Relative Strength Index
    df['STDDEV'] = talib.STDDEV(close, timeperiod=5, nbdev=1)  # Standard Deviation
    # Moving Average Convergence/Divergence: MACD line, signal line and histogram
    df['macd'], df['macdsignal'], df['macdhist'] = talib.MACD(close, fastperiod=12, slowperiod=26, signalperiod=9)
    # Bollinger Bands (matype=0 -> simple moving average for the middle band)
    df['upperband'], df['middleband'], df['lowerband'] = talib.BBANDS(close, timeperiod=5, nbdevup=2, nbdevdn=2, matype=0)
    return df

# Compute the indicators separately for each stock so that no window spans two securities.
prices_df = prices_df.groupby('SecuritiesCode').apply(features)
# Drop the warm-up rows at the start of each stock's history, where the indicators are still NaN.
prices_df = prices_df.dropna(axis=0).reset_index(drop=True)
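**If TA-Lib cannot be installed (the wheel installed above targets Python 3.7 on Linux), the rolling-window indicators from `features()` can be approximated with plain pandas. This is a minimal sketch, not TA-Lib's implementation: the helper name `rolling_features` and the toy data are hypothetical, the standard deviation uses `ddof=0` (the population form, which I believe matches TA-Lib's STDDEV), and the EMA uses pandas' recursive form (`adjust=False`). That EMA starts from the first close and has no NaN warm-up, whereas TA-Lib seeds it with an SMA, so the first values differ.**

```python
import numpy as np
import pandas as pd

def rolling_features(close: pd.Series) -> pd.DataFrame:
    """Hypothetical pandas-only stand-in for the SMA, EMA, STDDEV and BBANDS columns of features()."""
    out = pd.DataFrame(index=close.index)
    out['SMA'] = close.rolling(window=30).mean()
    # Recursive EMA with alpha = 2 / (30 + 1); starts at the first close instead of an SMA seed.
    out['EMA'] = close.ewm(span=30, adjust=False).mean()
    # Population standard deviation (divide by n) over a 5-day window.
    out['STDDEV'] = close.rolling(window=5).std(ddof=0)
    # Bollinger Bands: 5-day SMA +/- 2 standard deviations over the same window.
    out['middleband'] = close.rolling(window=5).mean()
    out['upperband'] = out['middleband'] + 2 * out['STDDEV']
    out['lowerband'] = out['middleband'] - 2 * out['STDDEV']
    return out

# Toy data: two hypothetical securities with 40 days each, rows sorted by stock and date.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'SecuritiesCode': np.repeat([1301, 1332], 40),
    'Close': 100 + rng.normal(0, 1, 80).cumsum(),
})
# Compute per stock so that no window spans two securities, then attach the new columns.
feats = pd.concat([rolling_features(close) for _, close in toy.groupby('SecuritiesCode')['Close']])
toy_with_features = toy.join(feats)
```

**Computing each stock separately and concatenating mirrors what `groupby('SecuritiesCode').apply(features)` does, so the 30-day SMA of the second stock starts its own warm-up instead of reusing the first stock's closes.**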

In [None]:
prices_df
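**The recursive indicators in `features()`, MACD and RSI, can likewise be approximated without TA-Lib. Below is a minimal pandas sketch under stated assumptions: the helper names `macd` and `rsi` are hypothetical, both start their recursive averages from the first available value (TA-Lib seeds them differently, so early rows will not match exactly), and RSI uses Wilder's smoothing, i.e. an EMA with alpha = 1/14. The checks at the end use made-up series whose answers are known in advance: a steadily rising close gives an RSI of 100, a steadily falling one gives 0, and a flat one gives a MACD of 0.**

```python
import numpy as np
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """Hypothetical pandas-only MACD: line, signal line and histogram from recursive EMAs."""
    line = close.ewm(span=fast, adjust=False).mean() - close.ewm(span=slow, adjust=False).mean()
    sig = line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame({'macd': line, 'macdsignal': sig, 'macdhist': line - sig})

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Hypothetical pandas-only RSI using Wilder's smoothing (alpha = 1 / period)."""
    delta = close.diff()
    gain = delta.clip(lower=0)       # U_t: size of upward moves, 0 otherwise
    loss = (-delta).clip(lower=0)    # D_t: size of downward moves, 0 otherwise
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    # If avg_loss is 0, rs is inf and the RSI becomes 100.
    return 100 - 100 / (1 + rs)

# Sanity checks on hypothetical series.
rising = pd.Series(np.arange(1.0, 41.0))        # closes only go up
falling = rising[::-1].reset_index(drop=True)   # closes only go down
flat = pd.Series(np.full(40, 50.0))             # closes never move
print(rsi(rising).iloc[-1], rsi(falling).iloc[-1])  # 100.0 0.0
print(np.allclose(macd(flat)['macd'], 0.0))         # True: a flat series has no momentum
```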

# Conclusion
**We derived 6 indicators, which add 10 new columns (MACD and Bollinger Bands each produce three). This set is optional: you can add or remove indicators as needed.**

**These features summarize trend, momentum and volatility from the raw prices, which can help our model generalize and perform better.**
*****

**Caution: avoid redundant features. Having 100 features will not make your model perfect; redundant features can add noise and slow down training.**
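**To make this caution concrete for the features above: with `matype=0`, `middleband` is the 5-day SMA of Close, and `upperband`/`lowerband` are `middleband` plus or minus 2 times the 5-day standard deviation, which is what the `STDDEV` column already holds (`timeperiod=5`, `nbdev=1`). As far as I can tell from TA-Lib's definitions, the two bands therefore add no information beyond `middleband` and `STDDEV`. A quick, common screen for redundancy is pairwise correlation. The sketch below drops one column from each highly correlated pair; the helper name `drop_correlated`, the 0.95 threshold and the toy columns are hypothetical, and correlation only catches pairwise linear relationships, so dependencies involving three or more columns can slip through.**

```python
import numpy as np
import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Hypothetical helper: drop one column from every pair with |Pearson correlation| > threshold."""
    corr = features.corr().abs()
    # Keep the strict upper triangle so each pair is checked once and the diagonal (always 1) is ignored.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)

# Hypothetical toy features: feature_b is an exact linear function of feature_a.
toy = pd.DataFrame({
    'feature_a': np.arange(10, dtype=float),
    'feature_b': 2 * np.arange(10, dtype=float) + 1,
    'feature_c': np.tile([1.0, -1.0], 5),
})
kept = drop_correlated(toy)
print(list(kept.columns))  # ['feature_a', 'feature_c']
```

**On this notebook's output it would be applied to the ten indicator columns only. Which column of a correlated pair survives depends only on column order (the later one is dropped), so it is worth ordering the columns deliberately.**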