# Project: Algorithmic Trading

This is an experimental, hands-on project in **algorithmic trading**. We use stocks listed on the Shenzhen Stock Exchange (SZSE) as the research subject and walk through several **framework-based** practices for algorithmic trading. These attempts are preliminary, but they may serve as a useful starting point.

In [1]:
import pandas as pd
import numpy as np
import tushare as ts
import os 
from tqdm import tqdm
import glob
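# read the API token from an environment variable instead of hard-coding it
# (the variable name TUSHARE_TOKEN is an assumption; use whatever name you export)
ts.set_token(os.environ['TUSHARE_TOKEN'])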
debug = False
from datetime import datetime 
pro = ts.pro_api()

## 1. Data Downloading

In this section, we download data through the API provided by *tushare*, which offers comprehensive data on financial securities and economic conditions. For detailed documentation, refer to the [official website](https://tushare.pro/document/2).

Here is how to get **basic stock information**:

In [2]:
trade_date = '20230808'
stocklist = pro.stock_basic(exchange='', list_status='L', 
                            fields='ts_code,symbol,name,area,industry,list_date')
stocklist.head()

| | ts_code | symbol | name | area | industry | list_date |
|---|---|---|---|---|---|---|
| 0 | 000001.SZ | 1 | 平安银行 | 深圳 | 银行 | 19910403 |
| 1 | 000002.SZ | 2 | 万科A | 深圳 | 全国地产 | 19910129 |
| 2 | 000004.SZ | 4 | 国华网安 | 深圳 | 软件服务 | 19910114 |
| 3 | 000005.SZ | 5 | ST星源 | 深圳 | 环境保护 | 19901210 |
| 4 | 000006.SZ | 6 | 深振业A | 深圳 | 区域地产 | 19920427 |
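
The project focuses on Shenzhen-listed stocks, while the call above passes `exchange=''`, which as far as we understand does not restrict the exchange. The following is a minimal sketch of one way to narrow such a table to Shenzhen listings by the `.SZ` suffix of `ts_code` and to parse `list_date`. The `sample` frame and its rows are illustrative stand-ins, not real API output.

```python
import pandas as pd

# Illustrative rows shaped like the `stocklist` output above.
# The .SH row is hypothetical and only there to show the filter at work.
sample = pd.DataFrame({
    "ts_code": ["000001.SZ", "000002.SZ", "600000.SH"],
    "list_date": ["19910403", "19910129", "20000101"],
})

# Keep only Shenzhen listings, identified by the '.SZ' suffix of ts_code
sz_only = sample[sample["ts_code"].str.endswith(".SZ")].reset_index(drop=True)

# Parse the YYYYMMDD strings into real dates for later filtering
sz_only["list_date"] = pd.to_datetime(sz_only["list_date"], format="%Y%m%d")
print(sz_only)
```

The same two lines could be applied to `stocklist` directly, assuming its `ts_code` and `list_date` columns are strings as shown in the output.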


Here is how to look up **trading days** in a given range:

In [3]:
trade_cal = pro.trade_cal(exchange='', start_date='20230808', end_date='20230808')
trade_cal = trade_cal[trade_cal.is_open==1] 
trade_cal.head()

| | exchange | cal_date | is_open | pretrade_date |
|---|---|---|---|---|
| 0 | SSE | 20230808 | 1 | 20230807 |


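To account for dividends and splits, we download both **unadjusted** and **adjusted** prices. Cell `In [4]` fetches unadjusted prices (`adj=None`) and cell `In [5]` fetches pre-adjusted prices (`adj='qfq'`).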
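
To see why the two series differ, here is a small sketch with made-up numbers. It assumes the common forward-adjustment convention, adjusted close = raw close × adjustment factor / latest adjustment factor, which is our understanding of what `qfq` means. The column names `adj_factor`, `close_qfq`, `ret_raw` and `ret_qfq` are chosen for illustration.

```python
import pandas as pd

# Hypothetical data: a 2-for-1 split takes effect on the third day,
# so the raw close halves while the cumulative adjustment factor doubles.
raw = pd.DataFrame({
    "trade_date": ["20230801", "20230802", "20230803", "20230804"],
    "close": [20.0, 21.0, 10.5, 11.0],
    "adj_factor": [1.0, 1.0, 2.0, 2.0],
})

# Forward adjustment (assumed convention): rescale history to the latest factor
latest_factor = raw["adj_factor"].iloc[-1]
raw["close_qfq"] = raw["close"] * raw["adj_factor"] / latest_factor

# Daily returns: the raw series shows a spurious -50% move on the split day,
# while the adjusted series shows no change
raw["ret_raw"] = raw["close"].pct_change()
raw["ret_qfq"] = raw["close_qfq"].pct_change()
print(raw)
```

Factors built on returns would usually want the adjusted series, while anything that depends on the actual traded price level would need the unadjusted one.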

In [4]:
price_list = []
for trade_date in trade_cal.cal_date:
    daily_price = [] 
    for code in tqdm(stocklist.ts_code.head(30)):  # 'tqdm' creates a bar to show progress
        single_price = ts.pro_bar(ts_code=code, adj=None, # unadjusted price
                                  start_date=trade_date, 
                                  end_date=trade_date)
    
        daily_price.append(single_price)
    daily_price = pd.concat(daily_price)
    price_list.append(daily_price)
    
# merge the per-day frames into a single DataFrame
price_list = pd.concat(price_list).reset_index(drop=True)

# sort by code and date, then save locally
price_list = price_list.drop_duplicates().sort_values(['ts_code','trade_date']).reset_index(drop=True)
price_list.to_csv('price_unadjusted.csv', index=False)

100%|███████████████████████████████████████████| 30/30 [00:06<00:00,  4.52it/s]


In [5]:
price_list = []
for trade_date in trade_cal.cal_date:
    daily_price = [] 
    for code in tqdm(stocklist.ts_code.head(30)):  # 'tqdm' creates a bar to show progress
        single_price = ts.pro_bar(ts_code=code, adj='qfq', # pre-adjusted price
                                  start_date=trade_date, 
                                  end_date=trade_date)
        
        daily_price.append(single_price)
    daily_price = pd.concat(daily_price)
    price_list.append(daily_price)
    
# merge the per-day frames into a single DataFrame
price_list = pd.concat(price_list).reset_index(drop=True)

# sort by code and date, then save locally
price_list = price_list.drop_duplicates().sort_values(['ts_code','trade_date']).reset_index(drop=True)
price_list.to_csv('price_pre_adjusted.csv', index=False)

100%|███████████████████████████████████████████| 30/30 [00:19<00:00,  1.56it/s]
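
Cells `In [4]` and `In [5]` differ only in the `adj` argument and the output file name. One possible way to avoid the duplication is a small helper, sketched below. `download_prices` and `fake_fetch` are hypothetical names introduced here. The fetch function is passed in as a parameter so the sketch runs offline, and the handling of empty results is a defensive assumption rather than documented API behavior.

```python
import pandas as pd

def download_prices(fetch, codes, dates, adj=None):
    """Collect the rows for every (date, code) pair into one sorted DataFrame.

    `fetch` is any callable accepting the keyword arguments used in the cells
    above (ts_code, adj, start_date, end_date), for example ts.pro_bar.
    """
    frames = []
    for trade_date in dates:
        for code in codes:
            df = fetch(ts_code=code, adj=adj,
                       start_date=trade_date, end_date=trade_date)
            # Skip missing or empty results (assumed possible, e.g. for suspended stocks)
            if df is not None and len(df) > 0:
                frames.append(df)
    if not frames:
        return pd.DataFrame()
    return (pd.concat(frames)
              .drop_duplicates()
              .sort_values(["ts_code", "trade_date"])
              .reset_index(drop=True))

# Offline stand-in for the real fetch function, with hypothetical values
def fake_fetch(ts_code, adj, start_date, end_date):
    if ts_code == "000004.SZ":
        return None  # pretend no data came back for this code
    return pd.DataFrame({"ts_code": [ts_code],
                         "trade_date": [start_date],
                         "close": [10.0]})

prices = download_prices(fake_fetch,
                         ["000002.SZ", "000001.SZ", "000004.SZ"],
                         ["20230808"])
print(prices)
```

In the notebook, the same helper could be called with `ts.pro_bar`, `stocklist.ts_code.head(30)` and `trade_cal.cal_date`, once with `adj=None` and once with `adj='qfq'`, followed by `to_csv`.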


## 2. Basic Factor Implementations

In [6]:
import numpy as np
import pandas as pd
from numpy import abs
from numpy import log
from numpy import sign
from scipy.stats import rankdata

In [None]:
def ts_sum(df, window=10):
    '''return sum in a specific time period'''
    return df.rolling(window).sum()

def sma(df, window=10):
    '''return mean value in a specific time period'''
    return df.rolling(window).mean()

def stddev(df, window=10):
    '''return standard deviation in a specific time period'''
    return df.rolling(window).std()

def correlation(x, y, window=10):
    '''return correlation of two variables in a specific time period'''
    return x.rolling(window).corr(y)

def covariance(x, y, window=10):
    '''return covariance of two variables in a specific time period'''
    return x.rolling(window).cov(y)

def rolling_rank(na):
    '''return the rank of the last item put into the array'''
    return rankdata(na)[-1]

def ts_rank(df, window=10):
    '''return, for every column, the rank of the latest value within its rolling window'''
    return df.rolling(window).apply(rolling_rank)

def rolling_prod(na):
    '''return multiplication of all the elements in an array'''
    return np.prod(na)

def product(df, window=10):
    '''return the rolling product of every column in a DataFrame'''
    return df.rolling(window).apply(rolling_prod)

def ts_min(df, window=10):
    '''return the minimum value of a specific time period'''
    return df.rolling(window).min()

def ts_max(df, window=10):
    '''return the maximum value of a specific time period'''
    return df.rolling(window).max()

def ts_median(df, window=10):
    '''return the median value of a specific time period'''
    return df.rolling(window).median()


def delta(df, period=1):
    '''return the difference between each item and the item `period` rows earlier'''
    return df.diff(period)

def delay(df, period=1):
    '''return the value from `period` rows earlier'''
    return df.shift(period)

def rank(df):
    '''return the cross-sectional percentile rank of each item within its row (across columns)'''
    return df.rank(axis=1, pct=True)
#   return df.rank(pct=True)  # alternative: rank down each column instead

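def scale(df, k=1):
    '''rescale each column so that the sum of its absolute values equals k'''
    return df.mul(k).div(np.abs(df).sum())

def ts_argmax(df, window=10):
    '''return the 1-based position of the maximum within each rolling window'''
    return df.rolling(window).apply(np.argmax) + 1 

def ts_argmin(df, window=10):
    '''return the 1-based position of the minimum within each rolling window'''
    return df.rolling(window).apply(np.argmin) + 1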

def decay_linear(df, period=10):
    '''return the linearly weighted moving average (most recent row weighted highest)'''
    # fill gaps on a copy so the caller's DataFrame is not modified;
    # fillna(method=...) is deprecated in recent pandas, so use ffill/bfill
    df = df.ffill().bfill().fillna(0)
    na_series = df.values.astype(float)
    # the first period-1 rows lack a full window and keep their original values
    na_lwma = na_series.copy()
    # weights 1..period, normalized so that they sum to 1
    divisor = period * (period + 1) / 2
    y = (np.arange(period) + 1) * 1.0 / divisor
    for row in range(period - 1, df.shape[0]):
        x = na_series[row - period + 1: row + 1, :]
        na_lwma[row, :] = np.dot(x.T, y)
    return pd.DataFrame(na_lwma, index=df.index, columns=df.columns)
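
The operators above fall into two groups: time-series operators that work down each column (one stock over time) and cross-sectional operators that work across each row (all stocks on one date). The sketch below illustrates the difference on a toy price table with hypothetical values. It uses plain pandas calls equivalent to `delta`, `rank` and the weighting in `decay_linear`, and it assumes the layout those functions imply: dates as rows, stocks as columns.

```python
import numpy as np
import pandas as pd

# Toy close prices: rows are dates, columns are stocks (hypothetical values)
close = pd.DataFrame(
    {"A": [10.0, 11.0, 12.0, 11.0],
     "B": [20.0, 19.0, 18.0, 21.0],
     "C": [5.0, 5.0, 6.0, 7.0]},
    index=pd.to_datetime(["2023-08-01", "2023-08-02", "2023-08-03", "2023-08-04"]),
)

# Time-series operator: 2-day change, computed down each column (like delta(close, 2))
delta_2 = close.diff(2)

# Cross-sectional operator: percentile rank across stocks on each date (like rank(...))
cs_rank = delta_2.rank(axis=1, pct=True)

# Linear-decay weights for a 3-day window: oldest day 1/6, newest day 3/6
period = 3
weights = np.arange(1, period + 1) / (period * (period + 1) / 2)

# Weighted moving average per column; rows without a full window are NaN here,
# whereas decay_linear above keeps the original values for those rows
decayed = close.rolling(period).apply(lambda x: np.dot(x, weights), raw=True)

print(cs_rank)
print(decayed)
```

On 2023-08-03 stock A has the largest 2-day change, so it receives the top cross-sectional rank, while its decayed price weights the most recent close most heavily.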