In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
import plotly.express as px
import plotly

In [None]:
# calculate factors functions
def returnMAE(df, n):
    avr = pd.Series(dtype='float64')
    for id in SecuritiesCode:
        avr = pd.concat([avr, df[df.SecuritiesCode == id].Close.rolling(window=n, min_periods=1).mean()], ignore_index = False)    
    df[f'average{n}'] = avr 
    return df


def returnEWM(df):
    avr = pd.Series(dtype='float64')
    for id in SecuritiesCode:
        avr = pd.concat([avr, df[df.SecuritiesCode == id].Close.ewm(com=0.5, adjust=True).mean()], ignore_index = False)

    df['EWM'] = avr
    return df
    
def returnBollDown(df,n=20):
    bl1 = pd.Series(dtype='float64')
    bl2 = pd.Series(dtype='float64')
    for id in SecuritiesCode:
        bl1 = pd.concat([bl1, df[df.SecuritiesCode == id].Close.rolling(window=n, min_periods=1).apply(lambda x: x.mean()-2*x.std(), raw=False)], ignore_index = False)
        bl2 = pd.concat([bl2, df[df.SecuritiesCode == id].Close.rolling(window=n, min_periods=1).apply(lambda x: x.mean()+2*x.std(), raw=False)], ignore_index = False)
    df['bollDown'] = bl1
    df['bollUp'] = bl2
    return df

def returnMAEVolume(df, n=12):
    avr = pd.Series(dtype='float64')
    for id in SecuritiesCode:
        avr = pd.concat([avr, df[df.SecuritiesCode == id].Volume.rolling(window=n, min_periods=1).mean()], ignore_index = False)    
    
    df[f'MAEVolume_{n}'] = avr 
    return df

def returnSTDVolume(df, n=10):
    avr = pd.Series(dtype='float64')
    for id in SecuritiesCode:
        avr = pd.concat([avr, df[df.SecuritiesCode == id].Volume.rolling(window=n, min_periods=1).std()], ignore_index = False)    
    
    df[f'STDVolume_{n}'] = avr 
    return df

def calUpNumber(x):
    # Count the days in the window whose value is higher than the previous day's.
    # A window of n values yields n-1 day-over-day comparisons.
    return (x.diff() > 0).sum()
        
def returnUpDate(df, n=13):
    number = pd.Series(dtype='float64')
    for id in SecuritiesCode:
        number = pd.concat([number, df[df.SecuritiesCode == id].Volume.rolling(window=n, min_periods=1).apply(calUpNumber, raw=False)], ignore_index = False)
    
    df[f'NumberUp_{n-1}'] = number 
    return df

def returnWillingness(df, n=26):
    will = pd.Series(dtype='float64')
    for id in SecuritiesCode:
        df1 = df[df.SecuritiesCode == id].copy()  # copy to avoid SettingWithCopyWarning
        df1['diff1'] = df1.High - df1.Close.shift(1)
        df1['diff2'] = df1.Close.shift(1) - df1.Low
        sum1 = df1.diff1.rolling(window=n, min_periods=1).sum()
        sum2 = df1.diff2.rolling(window=n, min_periods=1).sum()
        will = pd.concat([will, sum1/sum2], ignore_index=False)
    df['Willness'] = will
    return df

# Introduction
In this notebook, I calculate some factors that are often used to track market trends in the stock and options markets.
The target of this competition is a rate of change (according to Notebook1, Target is the rate of change of the closing price over a two-day period). These factors are mostly used to track trends, but the information they contain should still be of some help for the competition. I'm still looking through some really good notebooks with great insights to get more information about the competition and correct my factors.

> I may have made some mistakes; please suggest corrections in the comments.

TODO
- I will gradually add missing information, such as the calculation method of each factor
- Due to time constraints, I haven't finished this notebook yet, but I will keep updating it
- I will add some other factors to this notebook so that we can better track market trends
- Also due to time constraints, I will not test the performance of these factors in a model in the first few versions. If you test them, please share your feedback


Please vote if this notebook was helpful to you, and leave comments to help me improve my work.

In [None]:
# data input
df = pd.read_csv('../input/jpx-tokyo-stock-exchange-prediction/train_files/stock_prices.csv')

# Calculate New Features
Let's start with the stock prices and calculate some new factors to help us predict Target.
Each new factor is calculated separately for each stock.
- Most of the following metrics involve window operations, so the values for the first days of each stock are computed from incomplete windows (or are missing) and should not be relied on.

Most of these factors come from sources in my country, which means that most of them have proved reliable in the Chinese market, but we cannot guarantee that every factor is reliable in every market. We can still learn from the ideas behind their construction.

## Basic Factors

### Daily Money Flow (Estimated)

In [None]:
df['money_flow'] = (df.Close + df.High + df.Low)/3 * df.Volume
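
As a small illustration of the estimate above, the sketch below applies the same formula (typical price times volume) to a two-row toy table. The numbers are made up for the example and are not taken from the competition data.

```python
import pandas as pd

# Toy prices for two days (values are made up for illustration).
toy = pd.DataFrame({
    "High": [105.0, 110.0],
    "Low": [95.0, 100.0],
    "Close": [100.0, 108.0],
    "Volume": [1000, 2000],
})

# Typical price: the mean of Close, High and Low for each day.
typical_price = (toy.Close + toy.High + toy.Low) / 3

# Estimated money flow: typical price times the traded volume.
toy["money_flow"] = typical_price * toy.Volume
print(toy)
```

Because the true traded value is not available, this is only an estimate of the money that changed hands each day.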

### Last n-day Moving Average

Averaging over the past n days reduces the noise in the daily data.
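
The helper functions in this notebook loop over every security code. A possible alternative, shown here only as a sketch, is to let pandas do the grouping with `groupby(...).transform(...)`. The function name `add_moving_average` and the toy data are hypothetical and not part of the original notebook.

```python
import pandas as pd

def add_moving_average(frame, n):
    """Add an n-day moving average of Close, computed separately per security."""
    out = frame.sort_values(["SecuritiesCode", "Date"]).copy()
    out[f"average{n}"] = (
        out.groupby("SecuritiesCode")["Close"]
           .transform(lambda s: s.rolling(window=n, min_periods=1).mean())
    )
    return out

# Toy data with two securities (made-up values).
toy = pd.DataFrame({
    "Date": ["2021-01-04", "2021-01-05", "2021-01-06"] * 2,
    "SecuritiesCode": [1301] * 3 + [1332] * 3,
    "Close": [10.0, 20.0, 30.0, 100.0, 200.0, 300.0],
})
result = add_moving_average(toy, 2)
print(result)
```

With `min_periods=1`, the first row of each security is averaged over a single day, so the early values are not full n-day averages.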

In [None]:

SecuritiesCode = np.sort(df.SecuritiesCode.unique())
df = df.sort_values(by=['SecuritiesCode', 'Date'])


In [None]:
df = returnMAE(df, 5) 
df = returnMAE(df, 10) 
df = returnMAE(df, 20) 
df = returnMAE(df, 60) 
df = returnMAE(df, 120) 

In [None]:
fig = px.line(df[df.SecuritiesCode == 1301], x='Date', y=['Close', 'average60', 'average120'])
fig.update_layout(title='Moving Average')

### Exponential Moving Average
Sometimes we want the most recent dates to carry a larger weight; in that case we use an exponential moving average.
> I'm looking for a simple way to calculate the $n$-day exponential moving average. If you know a method, please leave a message in the comment section~
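
One possible way to get an $n$-day exponential moving average is the `span` argument of pandas' `ewm`, which sets the smoothing factor to $\alpha = 2/(n+1)$. For comparison, the `com=0.5` used in `returnEWM` corresponds to $\alpha = 1/(1+0.5) = 2/3$. The sketch below is an assumption about what "n-day" should mean here; the helper name `add_ema`, the choice of `adjust=False` and the toy data are illustrative only.

```python
import pandas as pd

def add_ema(frame, n):
    """Add an n-day exponential moving average of Close per security.

    span=n gives alpha = 2 / (n + 1). adjust=False uses the recursive form
    y_t = (1 - alpha) * y_{t-1} + alpha * x_t, starting from y_0 = x_0.
    """
    out = frame.sort_values(["SecuritiesCode", "Date"]).copy()
    out[f"EMA{n}"] = (
        out.groupby("SecuritiesCode")["Close"]
           .transform(lambda s: s.ewm(span=n, adjust=False).mean())
    )
    return out

# Toy data with two securities (made-up values).
toy = pd.DataFrame({
    "Date": ["2021-01-04", "2021-01-05", "2021-01-06"] * 2,
    "SecuritiesCode": [1301] * 3 + [1332] * 3,
    "Close": [10.0, 20.0, 30.0, 100.0, 100.0, 100.0],
})
ema = add_ema(toy, 3)
print(ema)
```

Calling `add_ema(df, 5)` or `add_ema(df, 20)` would then mirror the window lengths used for the simple moving averages.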

In [None]:
df = returnEWM(df)

In [None]:
fig = px.line(df[df.SecuritiesCode == 1301], x='Date', y=['Close', 'EWM', 'average120'])
fig


### Bollinger Bands

Bollinger Bands are used to judge the future trend of a stock. If a stock's price falls below the lower band, we take it as a sign that the stock is likely to show a downward trend in the future.
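
In `returnBollDown`, the bands are the rolling mean of Close minus and plus two rolling standard deviations over `n=20` days. The sketch below computes the same quantities with the built-in rolling `mean` and `std` instead of a per-window `apply`. The helper name `add_bollinger`, its parameter `k` and the toy data are hypothetical.

```python
import pandas as pd

def add_bollinger(frame, n=20, k=2.0):
    """Add lower and upper Bollinger Bands of Close per security."""
    out = frame.sort_values(["SecuritiesCode", "Date"]).copy()
    grouped = out.groupby("SecuritiesCode")["Close"]
    mean = grouped.transform(lambda s: s.rolling(n, min_periods=1).mean())
    std = grouped.transform(lambda s: s.rolling(n, min_periods=1).std())
    out["bollDown"] = mean - k * std  # lower band
    out["bollUp"] = mean + k * std    # upper band
    return out

# Toy data for one security (made-up values).
toy = pd.DataFrame({
    "Date": ["2021-01-04", "2021-01-05", "2021-01-06"],
    "SecuritiesCode": [1301, 1301, 1301],
    "Close": [10.0, 12.0, 14.0],
})
bands = add_bollinger(toy, n=3)
print(bands)
```

The first row of each security has no standard deviation (a single observation), so both bands are NaN there.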

In [None]:
df = returnBollDown(df,n=20)

In [None]:
fig = px.line(df[df.SecuritiesCode == 1301], x='Date', y=['Close', 'bollDown', 'bollUp'])
fig

## Emotional Factors
As we all know, most investors chase rising prices and sell into falling ones, so trading volume and the pattern of rises and falls can be used to build factors that track market sentiment.

### Rise Day Number


In [None]:
df = returnUpDate(df)
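
The idea of counting rising days can be sketched without a custom window function: mark every day whose value is higher than the previous day's, then take a rolling sum of those marks. The helper name `count_up_days` and the toy series are hypothetical; the helper can be applied to any per-security series.

```python
import pandas as pd

def count_up_days(series, n=12):
    """Number of day-over-day increases among the last n comparisons."""
    # 1.0 where the value rose versus the previous day, else 0.0
    # (the first day has no previous value and counts as 0).
    ups = (series.diff() > 0).astype(float)
    return ups.rolling(window=n, min_periods=1).sum()

# Toy series (made-up values).
values = pd.Series([5, 6, 4, 7, 8, 8, 9])
up_counts = count_up_days(values, n=3)
print(up_counts.tolist())
```

For the full table this would be applied per security, for example through `groupby('SecuritiesCode')`.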

### Volume Moving Average


In [None]:
df = returnMAEVolume(df)

The chart below shows that volume-related features can express market sentiment to some extent.

Each panel in the chart corresponds to the number of days that have risen in the past 12 days. There are a large number of transactions when the number of days is 4, 5, 6 or 7, which shows that investors are chasing the rally at this point, while trading volume gradually decreases once the number of days reaches 8 or 9. This shows that most investors become conservative in that situation.

In [None]:

fig1 = px.histogram(df[df.SecuritiesCode == 1301], x='Date', y=['MAEVolume_12', 'Volume'], facet_col='NumberUp_12',)
fig1.for_each_annotation(lambda x: x.update(text=x.text.split('=')[-1]))
fig1


### Volume Standard Deviation Moving Average

In [None]:
df = returnSTDVolume(df)
df = returnSTDVolume(df, 20)

Now let's look at the fluctuation of the volume. When the number of days is 4, 5, 6 or 7, the volume fluctuates greatly.

Note that when the number of days is 8, the volatility is also very large, which suggests a certain amount of panic in the market and a lot of selling.

In [None]:
fig = px.line(df[df.SecuritiesCode == 1301], x='Date', y=['STDVolume_10', 'STDVolume_20'], facet_col='NumberUp_12')
fig

### Investor Willingness

We measure investor willingness from the Close, High, and Low prices over the past N days.
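
Reading the `returnWillingness` helper defined at the top of this notebook, the factor can be written as follows. The symbols $H_i$, $L_i$ and $C_i$ (the High, Low and Close of day $i$) are notation introduced here for convenience, and $N = 26$ is the helper's default window:

$$
\text{Willness}_t = \frac{\sum_{i=t-N+1}^{t} \left(H_i - C_{i-1}\right)}{\sum_{i=t-N+1}^{t} \left(C_{i-1} - L_i\right)}
$$

The numerator sums how far each day's high rose above the previous close, and the denominator sums how far each day's low fell below it. A ratio above 1 would then suggest that upward moves dominated over the window.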



In [None]:
df = returnWillingness(df)

In [None]:
fig = px.line(df[df.SecuritiesCode == 1301], x='Date', y=['Willness'], facet_col='NumberUp_12')
fig

In [None]:
# df.to_csv('./data_preprocess.csv')