## stockAI
stockAI is an integrated library for stock price big data research. From data loading and preprocessing to machine learning model training and backtesting, everything is available in this library.

This notebook walks through the overall stockAI workflow.

In [1]:
import os
import sys
sys.path.append("/project/stockAI/github/main/stockAI")  # local stockAI checkout; adjust to your environment
import stockAI as sai

# 1. Data Loading
>1\) Get markets by country: `sai.get_markets`  
2) Get tickers by market: `sai.get_tickers`  
3) Load data: `sai.load_data` 

<br>

StockAI provides stock data for 70 markets. The available market codes are:

`'KOSPI', 'KOSDAQ', 'KONEX', 'PNK', 'NMS', 'NYQ', 'ASX', 'NCM', 'NGM', 'ASE', 'JKT', 'GER', 'PAR', 'TOR', 'BRU', 'OBB', 'BUE', 'LSE', 'KLS', 'PCX', 'VAN', 'AMS', 'EBS', 'TWO', 'OSL', 'STO', 'CPH', 'VIE', 'SAO', 'SES', 'NSI', 'MEX', 'HKG', 'NZE', 'MCE', 'BSE', 'ISE', 'FRA', 'MCX', 'MIL', 'MUN', 'ATH', 'BER', 'ENX', 'STU', 'LIS', 'TLV', 'DUS', 'IST', 'HAM', 'HAN', 'TAL', 'TLO', 'SET', 'TAI', 'HEL', 'ICE', 'IOB', 'CNQ', 'RIS', 'KSC', 'LIT', 'KOE', 'DOH', 'EUX', 'SHH', 'CCS', 'MDD', 'SHZ', 'MAD'`



 

### 1) Get markets by country: `sai.get_markets(country:list)`  

### 2) Get tickers by market: `sai.get_tickers(markets:list, date:Int)`  

In [None]:
lst_tickers = sai.get_tickers(markets=['KOSPI'], date=2016)
print(len(lst_tickers), lst_tickers[:5])

### 3) Load Data: `sai.load_data(date:list, tickers:list=None)`  

In [None]:
raw_data = sai.load_data(date=['2016-01-01', '2021-12-31'], tickers=lst_tickers)
print(raw_data.shape)
raw_data.head()

# 2. Data Preprocessing
>1\) Add auxiliary indicators: `sai.add_index`  
2) Scaling: `sai.scaling`  
3) Convert to time-series data: `sai.time_series`

### 1) Add auxiliary indicators: `sai.add_index(data:pd.DataFrame(), index_list:list)`  

- **Trading value (closing price * volume) and the next day's rate of change in the closing price (the dependent variable)**  
'trading_value','next_change'

- **Auxiliary indicators provided by the TA package**  
'MA5', 'MA20', 'MA60','MA120', 'MFI','ADI','OBV','CMF','FI','EOM_EMV','VPT','NVI','VMAP', 'BHB','BLB','KCH','KCL','KCM','DCH','DCL','DCM','UI', 'SMA','EMA','WMA','MACD','VIneg','VIpos','TRIX','MI','CCI', 'DPO','KST','Ichimoku','ParabolicSAR','STC', 'RSI','SRSI','TSI','UO','SR','WR','AO','KAMA','ROC','PPO','PVO'

`sai.add_index` can add up to 49 auxiliary indicators: the 2 listed first plus 47 from the TA package.  
(\* `next_change`, added here, is stockAI's default dependent variable.)
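To illustrate what the two stockAI-specific columns contain, here is a minimal pandas sketch that computes `trading_value`, `next_change`, and a 5-day moving average (`MA5`) per ticker. It is not stockAI's implementation: it assumes a long-format DataFrame with hypothetical columns `Code` (ticker), `Date`, `Close`, and `Volume`, treats `next_change` as the simple return from today's close to tomorrow's close, and assumes `MA5` is a simple moving average of the close.

```python
import pandas as pd


def add_basic_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch: trading_value, next_change and MA5 per ticker (not stockAI's code)."""
    # Assumed columns: 'Code' (ticker), 'Date', 'Close', 'Volume'
    out = df.sort_values(["Code", "Date"]).copy()
    # Trading value = closing price * volume
    out["trading_value"] = out["Close"] * out["Volume"]
    close_by_ticker = out.groupby("Code")["Close"]
    # Next day's closing-price rate of change (NaN on each ticker's last day)
    out["next_change"] = close_by_ticker.shift(-1) / out["Close"] - 1
    # 5-day simple moving average of the close, computed within each ticker
    out["MA5"] = close_by_ticker.transform(lambda s: s.rolling(5).mean())
    return out
```

Because `next_change` looks one day ahead, it can serve only as the label; using it as a model input would leak future information.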

In [None]:
check_index = ['MA5', 'MA20', 'MA60','MA120', 
             'trading_value','next_change',
             'MFI','ADI','OBV','CMF','FI','EOM_EMV','VPT','NVI','VMAP',
             'BHB','BLB','KCH','KCL','KCM','DCH','DCL','DCM','UI',
             'SMA','EMA','WMA','MACD','VIneg','VIpos','TRIX','MI','CCI','DPO','KST','Ichimoku','ParabolicSAR','STC',
             'RSI','SRSI','TSI','UO','SR','WR','AO','KAMA','ROC','PPO','PVO']

check_df = sai.add_index(data=raw_data, index_list=check_index)
check_df

In [None]:
check_df = check_df.drop(columns=['Market'])
check_df.shape

### 2) Scaling: `sai.scaling(data:pd.DataFrame(), scaler_type:String, window_size:Int=None)`  

`sai.scaling` offers four methods for standardizing stock price data: the well-known min-max, standard, and robust scalers, plus div-close, which divides prices by the previous day's closing price.
- minmax
- standard
- robust
- div-close
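The four methods can be sketched in a few lines of pandas. This is an illustration under stated assumptions, not `sai.scaling` itself: min-max, standard, and robust follow the usual scikit-learn definitions applied to a single column, and div-close is assumed to divide each price column by the same ticker's previous closing price. The `Code` column, the price column names, and both helper names are hypothetical, and the sketch ignores `window_size`.

```python
import pandas as pd


def scale_column(s: pd.Series, method: str) -> pd.Series:
    """Sketch of the three classic scalers applied to a single column."""
    if method == "minmax":    # map to [0, 1]
        return (s - s.min()) / (s.max() - s.min())
    if method == "standard":  # zero mean, unit (population) standard deviation
        return (s - s.mean()) / s.std(ddof=0)
    if method == "robust":    # center on the median, scale by the interquartile range
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        return (s - s.median()) / (q3 - q1)
    raise ValueError(f"unknown method: {method}")


def div_close(df: pd.DataFrame, price_cols=("Open", "High", "Low", "Close")) -> pd.DataFrame:
    """Sketch of div-close: divide price columns by the same ticker's previous close."""
    out = df.sort_values(["Code", "Date"]).copy()
    # Previous day's close per ticker, captured before 'Close' itself is rescaled
    prev_close = out.groupby("Code")["Close"].shift(1)
    for col in price_cols:
        out[col] = out[col] / prev_close
    return out
```

Dividing by the previous close puts every ticker on a common scale around 1.0 regardless of its price level, which the first three scalers only achieve per column.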

In [None]:
check_scaled_KR = sai.scaling(data=check_df, scaler_type="div-close", window_size=None)
check_scaled_KR

### 3) Convert to time series data: `sai.time_series(data:pd.DataFrame(), day:Int=10)`

For a machine learning model to make a prediction on day D0, each row is built from the n days of data ending at D0 (D-(n-1), ..., D-2, D-1, D0).

For example, the code below converts the data into 10-day series anchored at D0.

In [None]:
df_time_series = sai.time_series(check_df, day=10)
df_time_series_scaled = sai.time_series(check_scaled_KR, day=10)
df_time_series_scaled
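A minimal sketch of the D0-anchored conversion, assuming the lagged columns are named with `D0_`, `D-1_`, ..., `D-(n-1)_` prefixes (only the `D0_` prefix is confirmed by column names used later in this notebook, such as `D0_Change`). Each output row holds the last `day` observations of the selected features for one ticker, and rows without a full history are dropped. `to_time_series` and the `Code` column are hypothetical names, not stockAI's API.

```python
import pandas as pd


def to_time_series(df: pd.DataFrame, feature_cols: list, day: int = 10) -> pd.DataFrame:
    """Sketch: one row per (ticker, D0) holding the last `day` days of features."""
    out = df.sort_values(["Code", "Date"]).reset_index(drop=True)
    grouped = out.groupby("Code")
    frames = [out[["Code", "Date"]]]
    for lag in range(day):
        prefix = "D0" if lag == 0 else f"D-{lag}"
        # Shift within each ticker so lags never cross into another stock
        shifted = grouped[feature_cols].shift(lag) if lag else out[feature_cols].copy()
        shifted.columns = [f"{prefix}_{c}" for c in feature_cols]
        frames.append(shifted)
    wide = pd.concat(frames, axis=1)
    # Keep only rows that have a complete `day`-day history
    return wide.dropna().reset_index(drop=True)
```

Shifting inside `groupby("Code")` is the key design choice: a plain `shift` on the whole frame would copy the last days of one ticker into the first rows of the next.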

In [None]:
df_time_series.to_csv("time_series_0129.csv", index=False)
df_time_series_scaled.to_csv("time_series_scaled_0129.csv", index=False)

**\* Preprocessing large datasets takes a long time, so the preprocessed data is saved to CSV above and reloaded from those files below.**

In [None]:
import pandas as pd

df_time_series = pd.read_csv("time_series_0129.csv")
df_time_series_scaled = pd.read_csv("time_series_scaled_0129.csv")

In [None]:
data = df_time_series  # data before scaling
data_scaled = df_time_series_scaled  # data after scaling

# Split into train and test datasets
train_data = data[(data['Date'] >= '2017-01-01') & (data['Date'] <= '2020-12-31')]
test_data = data[(data['Date'] >= '2021-01-01') & (data['Date'] <= '2021-12-31')]

# Split into train and test datasets (scaled)
train_data_scaled = data_scaled[(data_scaled['Date'] >= '2017-01-01') & (data_scaled['Date'] <= '2020-12-31')]
test_data_scaled = data_scaled[(data_scaled['Date'] >= '2021-01-01') & (data_scaled['Date'] <= '2021-12-31')]

print(train_data.shape, test_data.shape)
print(train_data_scaled.shape, test_data_scaled.shape)
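Because the reloaded `Date` column is read from CSV as text, the split above compares strings; that works only because the dates are ISO-formatted (`YYYY-MM-DD`) and therefore sort lexicographically. A sketch of the same inclusive split on parsed dates, using a hypothetical helper `split_by_date`:

```python
import pandas as pd


def split_by_date(df: pd.DataFrame, train_range: tuple, test_range: tuple, date_col: str = "Date"):
    """Sketch: split rows into train/test by inclusive date ranges on parsed dates."""
    dates = pd.to_datetime(df[date_col])
    # Series.between is inclusive on both ends by default
    in_train = dates.between(pd.Timestamp(train_range[0]), pd.Timestamp(train_range[1]))
    in_test = dates.between(pd.Timestamp(test_range[0]), pd.Timestamp(test_range[1]))
    return df[in_train], df[in_test]


# Usage mirroring the cell above:
# train_data, test_data = split_by_date(data, ("2017-01-01", "2020-12-31"), ("2021-01-01", "2021-12-31"))
```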

# 3. Trader 
>1\) Trader Definitions  
2) Save Dataset to Traders: `sai.save_dataset`  
3) Model fitting: `sai.trader_train`   
4) Model evaluation and threshold settings: `sai.get_eval_by_threshold`, `sai.set_threshold`

### 1) Trader Definitions

In [None]:
# List that stores the Trader objects
lst_trader = [] 

In [None]:
from lightgbm import LGBMClassifier

# conditional_buyer: decides which stocks to buy by filtering the data with a condition
b1_pr = sai.conditional_buyer()

def sampling3(df):  # define the condition function
    condition1 = (-0.3 <= df.D0_Change) & (df.D0_Change <= 0.3)  # drop outliers beyond the ±30% daily price limits
    condition2 = df.D0_trading_value >= 1000000000  # volatility condition 1: trading value of at least 1 billion KRW
    condition3 = (-0.05 >= df.D0_Change) | (0.05 <= df.D0_Change)  # volatility condition 2: today's price change of at least 5% in either direction
    condition = condition1 & condition2 & condition3
    return condition

b1_pr.condition = sampling3  # store the user-defined condition function (sampling3) in the condition attribute


# machinelearning_buyer: decides which stocks to buy with a machine learning model
b2_pr = sai.machinelearning_buyer()

# Store a user-defined model in the algorithm attribute
scale_pos_weight = round(72/28, 2)
params = {'random_state': 42,
          'scale_pos_weight': scale_pos_weight,
          'learning_rate': 0.1,
          'num_iterations': 1000,
          'max_depth': 4,
          'n_jobs': 30,
          'boost_from_average': False,
          'objective': 'binary'}

b2_pr.algorithm = LGBMClassifier(**params)


# SubSeller: sells all holdings on the next day
sell_all = sai.SubSeller()


# Trader object
t3 = sai.Trader()
t3.name = 'PororoLightGBM'  # name of the Trader
t3.label = 'class&0.02'  # dependent variable (label) of the Trader
t3.buyer = sai.Buyer([b1_pr, b2_pr])  # [conditional buyer, machine learning buyer]
t3.seller = sai.Seller(sell_all)

lst_trader.append(t3)
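The label `'class&0.02'` and the constant `72/28` are not explained in this notebook. One plausible reading, sketched below purely as an assumption, is a binary target that is 1 when `next_change` is at least 2%, with `scale_pos_weight` set to the negative-to-positive ratio of that target (LightGBM's documentation suggests this ratio as a typical value). `make_label` and `pos_weight` are hypothetical helpers, not stockAI functions.

```python
import pandas as pd


def make_label(next_change: pd.Series, threshold: float = 0.02) -> pd.Series:
    """Assumed reading of 'class&0.02': 1 if the next day's change is at least 2%."""
    return (next_change >= threshold).astype(int)


def pos_weight(labels: pd.Series) -> float:
    """Negative-to-positive ratio, the usual starting value for scale_pos_weight."""
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return round(n_neg / n_pos, 2)
```

With a 72:28 split between the two classes, `pos_weight` returns 2.57, the same value as `round(72/28, 2)` in the cell above.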

### 2) Save Dataset to Traders: `sai.save_dataset`  

### 3) Model fitting: `sai.trader_train`   

### 4) Model evaluation and threshold settings: `sai.get_eval_by_threshold`, `sai.set_threshold`
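Choosing a probability threshold trades the number of buys against their hit rate. A minimal sketch of an evaluation table over thresholds follows; it does not reproduce the output of `sai.get_eval_by_threshold`. `eval_by_threshold` is a hypothetical helper, and "precision" here means the share of predicted buys whose label is 1.

```python
import numpy as np
import pandas as pd


def eval_by_threshold(y_true, y_prob, thresholds) -> pd.DataFrame:
    """Sketch: number of buys and their precision at each probability threshold."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    rows = []
    for t in thresholds:
        buy = y_prob >= t
        n_buy = int(buy.sum())
        # Share of predicted buys whose true label is 1 (undefined when nothing is bought)
        precision = float(y_true[buy].mean()) if n_buy else float("nan")
        rows.append({"threshold": t, "n_buy": n_buy, "precision": precision})
    return pd.DataFrame(rows)
```

Reading the table from low to high thresholds shows how many trades are given up for each gain in precision.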

# 4. Back-Testing
>1\) Generate a trading log: `sai.decision`  
2) Simulation: calculate returns: `sai.simulation`  
3) Leaderboard: `sai.leaderboard`   
4) Visualize Results: `sai.yield_plot`

### 1) Generate a trading log: `sai.decision`  
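Since `t3.buyer` combines a conditional buyer and a machine learning buyer, one way to picture the resulting log is that a row is bought only when the condition holds and the model's probability clears the threshold. This sketch assumes exactly that AND rule; `make_trading_log`, the `Code` column, and the `buy` flag are hypothetical and not the format `sai.decision` produces.

```python
import pandas as pd


def make_trading_log(df: pd.DataFrame, condition, proba, threshold: float) -> pd.DataFrame:
    """Sketch: buy only where the condition holds AND the model probability >= threshold."""
    log = df[["Date", "Code"]].copy()
    passes_condition = condition(df)                              # conditional buyer
    passes_model = pd.Series(proba, index=df.index) >= threshold  # machine learning buyer
    log["buy"] = (passes_condition & passes_model).astype(int)
    return log
```

A condition function such as `sampling3` above fits the `condition` argument, since it takes the DataFrame and returns a boolean mask.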

### 2) Simulation: calculate returns: `sai.simulation`  
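With `SubSeller` selling everything on the next day, each buy earns roughly that ticker's `next_change`. A minimal sketch of turning a trading log into daily and cumulative returns, assuming a buy at D0's close and a sale at D1's close, equal weight across the day's buys, compounding across days, and an optional flat per-trade `fee`. It is not `sai.simulation`'s actual logic.

```python
import pandas as pd


def simulate(log: pd.DataFrame, fee: float = 0.0) -> pd.DataFrame:
    """Sketch: equal-weight daily return of bought rows, compounded over days."""
    buys = log[log["buy"] == 1]
    # Assumes buy at D0's close, sell at D1's close: each trade earns next_change minus fee
    daily = buys.groupby("Date")["next_change"].mean() - fee
    result = daily.to_frame("daily_return")
    result["cumulative_return"] = (1 + result["daily_return"]).cumprod() - 1
    return result
```

Subtracting `fee` from the daily mean is equivalent to charging it on every trade, because all of a day's trades carry equal weight.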

### 3) Leaderboard: `sai.leaderboard`
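A leaderboard ranks traders by their backtest results. This sketch computes a few common metrics per trader from a hypothetical mapping of trader name to a DataFrame of trades with `Date` and `return` columns; the metric set is an assumption, not the columns `sai.leaderboard` reports.

```python
import pandas as pd


def leaderboard(results: dict) -> pd.DataFrame:
    """Sketch: rank traders by compounded return of their equal-weight daily returns."""
    rows = []
    for name, trades in results.items():
        daily = trades.groupby("Date")["return"].mean()
        rows.append({
            "trader": name,
            "n_trades": len(trades),
            "win_rate": float((trades["return"] > 0).mean()),
            "cumulative_return": float((1 + daily).prod() - 1),
        })
    board = pd.DataFrame(rows).sort_values("cumulative_return", ascending=False)
    return board.reset_index(drop=True)
```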

### 4) Visualize Results: `sai.yield_plot`
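A sketch of a cumulative-return plot with matplotlib, taking a hypothetical mapping of trader name to a cumulative-return series expressed as a fraction (0.05 for +5%). `yield_plot` here is an illustrative stand-in, not `sai.yield_plot`; the `Agg` backend line only lets the sketch run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def yield_plot(curves: dict):
    """Sketch: one cumulative-return line (in %) per trader."""
    fig, ax = plt.subplots(figsize=(8, 4))
    for name, cum_return in curves.items():
        cum_return = pd.Series(cum_return)  # accept a Series or any sequence
        ax.plot(cum_return.index, cum_return.values * 100, label=name)
    ax.set_xlabel("Date")
    ax.set_ylabel("Cumulative return (%)")
    ax.legend()
    return ax
```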