# Can you make big money on Chinese stock IPOs?

**Author**: [Spark Tseung](https://sparktseung.com)

**Source Code**: [github](https://github.com/sparktseung/Chinese-IPO) 

**Last Modified**: Sept 10, 2020

## Introduction

About two years ago, I started to have some extra money to invest. After some trial and error, I settled on regularly buying a low-fee exchange-traded fund (ETF) that tracks the US market index: the cost of "beating the market" just seems too high, and an average return of 7-8% per year is already very decent.

I was perfectly content with that until a friend of mine mentioned the unbelievably high returns from subscribing to the initial public offerings (IPOs) of Chinese stocks. He claimed that, if you are lucky enough to be allotted some shares in an IPO, the return on the first day of public trading is substantial (e.g. 100%+). However, it is generally quite difficult to obtain IPO shares in the first place, so it is almost like buying a lottery ticket.

Being unfamiliar with, and intentionally staying away from, the Chinese stock market, I was quite suspicious at first, but a quick search on the Internet shook my confidence. Therefore, being a student in statistics, I naturally started to dig into a pool of data, trying to find out whether one can make big money from Chinese stock IPOs. More specifically, I would like to see whether one can profit by obtaining shares in an IPO and then selling them shortly after public trading starts.

## Getting the Data

There are not many data sources available for the issue prices of IPOs in the Chinese market. The best I could find is the `akshare` package in Python, which in turn pulls IPO data from [eastmoney](http://data.eastmoney.com/xg/xg/dxsyl.html), a Chinese stock broker. It provides three IPO datasets: `sh` for (mostly) blue-chip stocks traded on the Shanghai Stock Exchange, and `zxb`/`cyb` for smaller companies traded on the Shenzhen Stock Exchange. The first few rows are shown below.

In [2]:
# Load required packages
import akshare as ak
import numpy as np
import pandas as pd
import datetime as dt

# https://stackoverflow.com/questions/31517194/how-to-hide-one-specific-cell-input-or-output-in-ipython-notebook
# This cell is tagged as "hidden_cell", so no input or output will show in the html file
# python -m nbconvert --to html Chinese-IPO.ipynb
# --TagRemovePreprocessor.remove_cell_tags="{'hidden_cell'}" --TagRemovePreprocessor.remove_all_outputs_tags="{'no_output'}" --TagRemovePreprocessor.remove_input_tags="{'no_input'}"
# --no-prompt
# Cell tags: hidden_cell, no_output, no_input
# Start jupyterlab to edit the tags

In [3]:
# IPO data for stocks traded on Shanghai Stock Exchange
# Mostly blue-chip stocks
ipo_shzb_df = ak.stock_em_dxsyl(market="上海主板")
# IPO data for stocks traded on Shenzhen Stock Exchange
# Mostly smaller companies
ipo_szzx_df = ak.stock_em_dxsyl(market="中小板")
ipo_szcy_df = ak.stock_em_dxsyl(market="创业板")



In [4]:
ipo_shzb_df.head()

|   | 股票代码 | 股票简称 | 发行价 | 最新价 | 网上发行中签率 | 网上有效申购股数 | 网上有效申购户数 | 网上超额认购倍数 | 网下配售中签率 | 网下有效申购股数 | 网下有效申购户数 | 网下配售认购倍数 | 总发行数量 | 开盘溢价 | 首日涨幅 | 打新收益 | 上市日期 | 市场 |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
| 0 | 605358 | N立昂微 | 4.92 | 7.08 | 0.03197 | 114224888000 | 15990041 | 3127.56 | 0.00446855 | 90812500000 | 9112 | 22378.63 | 40580000 | 0.1992 | 0.439 |  | 2020-09-11 | sh |
| 1 | 605009 | N豪悦 | 62.26 | 89.65 | 0.02382 | 100758868000 | 15783007 | 4197.76 | 0.01456494 | 18311100000 | 9316 | 6865.8 | 26670000 | 0.2 | 0.4399 |  | 2020-09-11 | sh |
| 2 | 605003 | 众望布艺 | 25.75 | 31.38 | 0.02346 | 84382582000 | 15347203 | 4261.75 | 0.01675539 | 13130100000 | 8208 | 5968.23 | 22000000 | 0.44 | 0.44 | 0.01 | 2020-09-08 | sh |
| 3 | 601702 | 华峰铝业 | 3.69 | 7.77 | 0.08734 | 257234929000 | 15063983 | 1144.96 | 0.0166057 | 150327900000 | 9346 | 6022.03 | 249630000 | 0.2005 | 0.439 | 0.09 | 2020-09-07 | sh |
| 4 | 605006 | 山东玻纤 | 3.84 | 9.8 | 0.04813 | 186996036000 | 15894809 | 2077.73 | 0.00788046 | 126896100000 | 9114 | 12689.61 | 100000000 | 0.2005 | 0.4401 |  | 2020-09-03 | sh |


In [5]:
# Combine three datasets
ipo_df = pd.concat([ipo_shzb_df, ipo_szzx_df, ipo_szcy_df], ignore_index = True, sort = False)
# Rename columns in English
ipo_df.rename(columns={"股票代码": "ticker", "股票简称": "name",
                       "发行价": "price_issue", "最新价": "price_latest",
                       "网上发行中签率": "prob_online", "网上有效申购股数": "sub_size_online",
                       "网上有效申购户数": "subs_online","网上超额认购倍数": "over_online",
                       "网下配售中签率": "prob_offline", "网下有效申购股数": "sub_size_offline",
                       "网下有效申购户数": "subs_offline","网下配售认购倍数": "over_offline",
                       "总发行数量": "size_total", 
                       "开盘溢价": "list_premium", "首日涨幅": "return_firstday", 
                       "打新收益": "return_ipo", 
                       "上市日期": "list_date", "市场": "market"
                       }, inplace = True)
# Filter by date: 2010-2019
ipo_df['list_date'] = pd.to_datetime(ipo_df['list_date'])
drop_idx = ipo_df[ipo_df['list_date'] > dt.datetime(2019, 12, 31)].index
ipo_df.drop(drop_idx , inplace = True)

After a quick look at the website, it seems that they have IPO data starting from 2010. I combine the three datasets and translate the column names into English. Since the datasets are frequently updated, we will focus on the decade 2010-2019 only. The combined and cleaned dataset has 1967 records, and the variables are described as follows.

| Chinese          | English          | Description                                                                               |
|:------------------|:------------------|:-------------------------------------------------------------------------------------------|
| 股票代码         | ticker           | Ticker of the stock                                                                       |
| 股票简称         | name             | Name of the company                                                                       |
| 发行价           | price_issue      | Issue price (in CNY) of IPO, i.e. how much you pay per share before it goes public        |
| 最新价           | price_latest     | Latest trading price (in CNY)                                                             |
| 网上发行中签率   | prob_online      | Probability (in %) of successfully getting some IPO stocks, online application            |
| 网上有效申购股数 | sub_size_online  | Number of shares requested by potential IPO buyers, online application                    |
| 网上有效申购户数 | subs_online      | Number of potential IPO buyers, online application                                        |
| 网上超额认购倍数 | over_online      | How many people are competing for one successful online application, i.e. 1/prob_online   |
| 网下配售中签率   | prob_offline     | Probability (in %) of successfully getting some IPO stocks, offline application           |
| 网下有效申购股数 | sub_size_offline | Number of shares requested by potential IPO buyers, offline application                   |
| 网下有效申购户数 | subs_offline     | Number of potential IPO buyers, offline application                                       |
| 网下配售认购倍数 | over_offline     | How many people are competing for one successful offline application, i.e. 1/prob_offline |
| 总发行数量       | size_total       | Total number of shares issued in IPO                                                      |
| 开盘溢价         | list_premium     | Premium (in decimal) at the open, i.e. (first-day opening price / IPO price) - 1          |
| 首日涨幅         | return_firstday  | Price increase (in decimal) of first-day trading, i.e. (close/open) - 1 on the first day  |
| 打新收益         | return_ipo       | Return on IPO (according to some formula by eastmoney)                                    |
| 上市日期         | list_date        | Listing date, i.e. first day of public trading                                            |
| 市场             | market           | Market of IPO                                                                             |
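
As a quick sanity check on the ratio columns, the sketch below recomputes the offline oversubscription multiple from the offline win rate, using three pairs of values copied from the `ipo_shzb_df.head()` output above. It assumes that the win rate is quoted in percent, so the implied multiple is 100 divided by the rate rather than 1 divided by it. The helper name `implied_oversubscription` is hypothetical and not part of `akshare`.

```python
# Sanity-check sketch (assumption: win rates are quoted in percent,
# so the implied oversubscription multiple is 100 / rate).

def implied_oversubscription(win_rate_pct: float) -> float:
    """Return the oversubscription multiple implied by a win rate in percent."""
    if win_rate_pct <= 0:
        raise ValueError("win rate must be positive")
    return 100.0 / win_rate_pct

# (ticker, prob_offline in %, over_offline as reported),
# copied from the first rows of the Shanghai dataset shown earlier.
sample_rows = [
    ("605358", 0.00446855, 22378.63),
    ("605009", 0.01456494, 6865.80),
    ("605003", 0.01675539, 5968.23),
]

# Relative gap between the implied and the reported multiple, per ticker.
gaps = {}
for ticker, prob_pct, reported in sample_rows:
    implied = implied_oversubscription(prob_pct)
    gaps[ticker] = abs(implied - reported) / reported
    print(f"{ticker}: implied={implied:.2f}, reported={reported:.2f}, "
          f"relative gap={gaps[ticker]:.4%}")
```

For these three rows the implied and reported multiples agree closely, which supports reading `prob_offline` as a percentage. The online columns agree a little less precisely, presumably because the online win rate is reported with fewer digits.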