# Indian IPO Market Listing Gains Prediction
This is a guided project from [Dataquest](https://app.dataquest.io/c/143/m/798/guided-project%3A-predicting-listing-gains-in-the-indian-ipo-market-using-tensorflow/1/loading-the-data) that practises building a deep learning classification model with TensorFlow to predict whether an IPO will have a listing gain.

1. [Import Libraries & Dataset](#library)
2. [Exploratory Data Analysis](#eda)  
&emsp;2.1 [Date](#date)

# Import Libraries & Dataset<a id='library'></a>

In [68]:
# array and dataframe
import numpy as np 
import pandas as pd 
# visualization
import seaborn as sns
import matplotlib.pyplot as plt
# train/test split and deep learning
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [69]:
df = pd.read_csv("https://raw.githubusercontent.com/moscmh/indian_ipo_market/main/Indian_IPO_Market_Data.csv")
print("Number of rows & columns:", df.shape)
df.head()

Number of rows & columns: (319, 9)


| | Date | IPOName | Issue_Size | Subscription_QIB | Subscription_HNI | Subscription_RII | Subscription_Total | Issue_Price | Listing_Gains_Percent |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 03/02/10 | Infinite Comp | 189.8 | 48.44 | 106.02 | 11.08 | 43.22 | 165 | 11.82 |
| 1 | 08/02/10 | Jubilant Food | 328.7 | 59.39 | 51.95 | 3.79 | 31.11 | 145 | -84.21 |
| 2 | 15/02/10 | Syncom Health | 56.25 | 0.99 | 16.6 | 6.25 | 5.17 | 75 | 17.13 |
| 3 | 15/02/10 | Vascon Engineer | 199.8 | 1.12 | 3.65 | 0.62 | 1.22 | 165 | -11.28 |
| 4 | 19/02/10 | Thangamayil | 0.0 | 0.52 | 1.52 | 2.26 | 1.12 | 75 | -5.2 |


* `Date`: date when the IPO was listed
* `IPOName`: name of the IPO
* `Issue_Size`: size of the IPO issue, in INR Crores
* `Subscription_QIB`: number of times the IPO was subscribed by the QIB (Qualified Institutional Buyer) investor category
* `Subscription_HNI`: number of times the IPO was subscribed by the HNI (High Net Worth Individual) investor category
* `Subscription_RII`: number of times the IPO was subscribed by the RII (Retail Individual Investor) investor category
* `Subscription_Total`: total number of times the IPO was subscribed overall
* `Issue_Price`: the price in INR at which the IPO was issued
* `Listing_Gains_Percent`: the percentage gain in the listing price over the issue price
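
To make the last definition concrete, the sketch below computes a listing gain percentage from an issue price and a listing price. The function name `listing_gain_percent` and the prices used are hypothetical, chosen only for illustration; the dataset ships the percentage precomputed and does not include the listing price itself.

```python
def listing_gain_percent(issue_price: float, listing_price: float) -> float:
    """Percentage gain of the listing price over the issue price."""
    if issue_price <= 0:
        raise ValueError("issue_price must be positive")
    return (listing_price - issue_price) / issue_price * 100


# Hypothetical prices, not taken from the dataset
gain_up = listing_gain_percent(100, 112.5)   # listed above the issue price
gain_down = listing_gain_percent(100, 80)    # listed below the issue price
print(round(gain_up, 2), round(gain_down, 2))
```

A positive result means the stock listed above its issue price, and a negative result means it listed below.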

# Exploratory Data Analysis<a id='eda'></a>

In [70]:
df.describe(include='all')

| | Date | IPOName | Issue_Size | Subscription_QIB | Subscription_HNI | Subscription_RII | Subscription_Total | Issue_Price | Listing_Gains_Percent |
|---|---|---|---|---|---|---|---|---|---|
| count | 319 | 319 | 319.0 | 319.0 | 319.0 | 319.0 | 319.0 | 319.0 | 319.0 |
| unique | 287 | 319 | | | | | | | |
| top | 16/08/21 | Infinite Comp | | | | | | | |
| freq | 4 | 1 | | | | | | | |
| mean | | | 1192.859969 | 25.684138 | 70.091379 | 8.561599 | 27.447147 | 375.128527 | 4.742696 |
| std | | | 2384.643786 | 40.716782 | 142.454416 | 14.50867 | 48.772203 | 353.897614 | 47.650946 |
| min | | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -97.15 |
| 25% | | | 169.005 | 1.15 | 1.255 | 1.275 | 1.645 | 119.0 | -11.555 |
| 50% | | | 496.25 | 4.94 | 5.07 | 3.42 | 4.93 | 250.0 | 1.81 |
| 75% | | | 1100.0 | 34.635 | 62.095 | 8.605 | 33.395 | 536.0 | 25.31 |


In [71]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 319 entries, 0 to 318
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Date                   319 non-null    object 
 1   IPOName                319 non-null    object 
 2   Issue_Size             319 non-null    float64
 3   Subscription_QIB       319 non-null    float64
 4   Subscription_HNI       319 non-null    float64
 5   Subscription_RII       319 non-null    float64
 6   Subscription_Total     319 non-null    float64
 7   Issue_Price            319 non-null    int64  
 8   Listing_Gains_Percent  319 non-null    float64
dtypes: float64(6), int64(1), object(2)
memory usage: 22.6+ KB


&emsp;There are no missing values. `IPOName` is dropped because every value is unique, so it cannot help the model generalise. A binary target, `gain`, is derived from `Listing_Gains_Percent`: 1 when the listing gain is positive and 0 when it is zero or negative.

In [72]:
df.drop(columns='IPOName', inplace=True)

In [73]:
df['gain'] = df['Listing_Gains_Percent'].apply(lambda x: 0 if x <= 0 else 1)
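
The same labelling rule can be written as a vectorised comparison, which also makes it easy to check the class balance. The sketch below is an alternative formulation, not the notebook's own cell; it runs on a small `sample` frame built from the five `Listing_Gains_Percent` values shown in the first preview, and the names `sample` and `class_share` are illustrative.

```python
import pandas as pd

# Values copied from the first five rows of the dataset preview
sample = pd.DataFrame({"Listing_Gains_Percent": [11.82, -84.21, 17.13, -11.28, -5.2]})

# Same rule as the lambda: 1 for a strictly positive gain, 0 otherwise
sample["gain"] = (sample["Listing_Gains_Percent"] > 0).astype(int)

# Proportion of each class, ordered by label (0 first, then 1)
class_share = sample["gain"].value_counts(normalize=True).sort_index()

print(sample["gain"].tolist())  # [1, 0, 1, 0, 0]
print(class_share.to_dict())
```

Running the same `value_counts(normalize=True)` on the full `df['gain']` column would show how balanced the two classes are before any model is trained.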

## `Date`<a id='date'></a>

In [74]:
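# the column is named 'Date ' (trailing space); values are day/month/two-digit year
df['Date '] = pd.to_datetime(df['Date '], format='%d/%m/%y')
# pass the mapping via `columns=`; a bare mapping would target index labels instead
df.rename(columns={'Date ': 'Date'}, inplace=True)
df.head()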

| | Date | Issue_Size | Subscription_QIB | Subscription_HNI | Subscription_RII | Subscription_Total | Issue_Price | Listing_Gains_Percent | gain |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-02-03 | 189.8 | 48.44 | 106.02 | 11.08 | 43.22 | 165 | 11.82 | 1 |
| 1 | 2010-02-08 | 328.7 | 59.39 | 51.95 | 3.79 | 31.11 | 145 | -84.21 | 0 |
| 2 | 2010-02-15 | 56.25 | 0.99 | 16.6 | 6.25 | 5.17 | 75 | 17.13 | 1 |
| 3 | 2010-02-15 | 199.8 | 1.12 | 3.65 | 0.62 | 1.22 | 165 | -11.28 | 0 |
| 4 | 2010-02-19 | 0.0 | 0.52 | 1.52 | 2.26 | 1.12 | 75 | -5.2 | 0 |
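
Two details of this step are easy to get wrong: which axis `DataFrame.rename` targets, and the day-first date format. The minimal sketch below illustrates both on a toy frame; the names `toy`, `unchanged`, and `renamed` are illustrative and the two dates are taken from the preview above.

```python
import pandas as pd

# Toy frame mimicking the trailing-space column name used in this notebook
toy = pd.DataFrame({"Date ": ["03/02/10", "08/02/10"]})

# A bare mapping is applied to index labels, so the column name is unchanged
unchanged = toy.rename({"Date ": "Date"})

# `columns=` targets column labels and performs the rename
renamed = toy.rename(columns={"Date ": "Date"})

# An explicit day-first format avoids a month-first reading of 03/02/10
renamed["Date"] = pd.to_datetime(renamed["Date"], format="%d/%m/%y")

print(list(unchanged.columns))  # ['Date ']
print(list(renamed.columns))    # ['Date']
print(renamed["Date"].dt.strftime("%Y-%m-%d").tolist())  # ['2010-02-03', '2010-02-08']
```

Stripping whitespace from all column names right after loading, for example with `df.columns = df.columns.str.strip()`, is another way to avoid carrying the trailing space through the notebook.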
