
# Steps to be taken


1. Install the required package (pandasql)
2. Download the dataset from GitHub
3. Explore the data
4. Come up with questions
5. Try to answer them (keep it simple at the beginning)

In [1]:
#!pip install pandas
!pip install pandasql

Collecting pandasql
  Downloading pandasql-0.7.3.tar.gz (26 kB)
Building wheels for collected packages: pandasql
  Building wheel for pandasql (setup.py) ... done
  Created wheel for pandasql: filename=pandasql-0.7.3-py3-none-any.whl size=26784 sha256=151f6682009efb3dee2c2d4a8fe32c412674e07a312cb1d02d5cde6da948e531
  Stored in directory: /root/.cache/pip/wheels/5c/4b/ec/41f4e116c8053c3654e2c2a47c62b4fca34cc67ef7b55deb7f
Successfully built pandasql
Installing collected packages: pandasql
Successfully installed pandasql-0.7.3


In [2]:
import pandas as pd
from pandasql import sqldf

# Download Dataset

In [3]:
!git clone https://github.com/kundigagandeep/IPL-Auction.git

Cloning into 'IPL-Auction'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.


# Open and view dataset

In [5]:
df = pd.read_csv("/content/IPL-Auction/ipl_2022_dataset.csv", index_col=0)

df.head()

|   | Player | Base Price | TYPE | COST IN ₹ (CR.) | Cost IN $ (000) | 2021 Squad | Team |
|---|---|---|---|---|---|---|---|
| 0 | Rashid Khan | Draft Pick | BOWLER | 15.0 | 1950.0 | SRH | Gujarat Titans |
| 1 | Hardik Pandya | Draft Pick | ALL-ROUNDER | 15.0 | 1950.0 | MI | Gujarat Titans |
| 2 | Lockie Ferguson | 2 Cr | BOWLER | 10.0 | 1300.0 | KKR | Gujarat Titans |
| 3 | Rahul Tewatia | 40 Lakh | ALL-ROUNDER | 9.0 | 1170.0 | RR | Gujarat Titans |
| 4 | Shubman Gill | Draft Pick | BATTER | 8.0 | 1040.0 | KKR | Gujarat Titans |


# Data Exploration

In [8]:
#Total number of rows and columns
print('Total number of Rows: ', df.shape[0])
print('Total number of Columns: ', df.shape[1])

Total number of Rows:  633
Total number of Columns:  7


In [18]:
#Data Types

df.dtypes

Player              object
Base Price          object
TYPE                object
COST IN ₹ (CR.)    float64
Cost IN $ (000)    float64
2021 Squad          object
Team                object
dtype: object

In [12]:
#Unique Values in 'Base Price' Column

df['Base Price'].unique()

array(['Draft Pick', '2 Cr', '40 Lakh', '20 Lakh', '1 Cr', '75 Lakh',
       '50 Lakh', '30 Lakh', 'Retained', '1.5 Cr'], dtype=object)

In [13]:
#Distribution of values inside column 'Base Price'

df['Base Price'].value_counts()

20 Lakh       344
50 Lakh       104
2 Cr           48
1 Cr           33
Retained       27
75 Lakh        26
1.5 Cr         20
40 Lakh        16
30 Lakh         9
Draft Pick      6
Name: Base Price, dtype: int64

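The `Base Price` column mixes crore and lakh amounts with two non-price labels, which is why its dtype is `object`. Below is a minimal sketch of converting it to a number in crore so it is comparable with the cost column. `base_price_to_crore` is a hypothetical helper; it assumes every priced label has the form `<number> Cr` or `<number> Lakh` (1 crore = 100 lakh), and it treats `Draft Pick` and `Retained` as having no base price, which is an assumption. On the real DataFrame it could be applied with `df['Base Price'].map(base_price_to_crore)`.

```python
# Hypothetical helper (not part of the original notebook).
# Assumes priced labels look like "<number> Cr" or "<number> Lakh".
LAKH_PER_CRORE = 100  # 1 crore = 100 lakh


def base_price_to_crore(label):
    """Return the base price in crore, or None for labels without a price."""
    value, _, unit = label.strip().partition(" ")
    if unit == "Cr":
        return float(value)
    if unit == "Lakh":
        return float(value) / LAKH_PER_CRORE
    return None  # assumption: 'Draft Pick' and 'Retained' carry no base price


# The labels returned by df['Base Price'].unique() above.
labels = ['Draft Pick', '2 Cr', '40 Lakh', '20 Lakh', '1 Cr', '75 Lakh',
          '50 Lakh', '30 Lakh', 'Retained', '1.5 Cr']
for label in labels:
    print(f"{label:>10} -> {base_price_to_crore(label)}")
```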
In [14]:
#Unique Values in 'TYPE' Column

df['TYPE'].unique()

array(['BOWLER', 'ALL-ROUNDER', 'BATTER', 'WICKETKEEPER'], dtype=object)

In [16]:
# Proportion of each value in column 'TYPE'

df['TYPE'].value_counts(normalize=True)

ALL-ROUNDER     0.382306
BOWLER          0.339652
BATTER          0.176935
WICKETKEEPER    0.101106
Name: TYPE, dtype: float64

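With `normalize=True`, `value_counts` divides each count by the number of non-missing values, so the proportions can be turned back into counts by multiplying by the row count from `df.shape`. The sketch below does that arithmetic with the values printed above; `round` absorbs the 6-decimal rounding of the printed proportions. The recovered counts add back up to 633, which suggests `TYPE` has no missing values.

```python
# Proportions printed by df['TYPE'].value_counts(normalize=True) above,
# and the row count printed from df.shape.
proportions = {
    "ALL-ROUNDER": 0.382306,
    "BOWLER": 0.339652,
    "BATTER": 0.176935,
    "WICKETKEEPER": 0.101106,
}
total_rows = 633

# proportion = count / total, so count = proportion * total;
# round() absorbs the 6-decimal rounding of the printed proportions.
counts = {player_type: round(p * total_rows) for player_type, p in proportions.items()}
print(counts)
print("Sum:", sum(counts.values()))
```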
# Questions to Answer
1. Top 3 batsmen who got paid the most?
2. Top 5 bowlers who got paid the most?
3. Highest-paid all-rounders?
4. Average pay for batsmen, bowlers, all-rounders and wicket-keepers?
5. List of retained players with salary?

# Data Transformation

In [22]:
#Rename columns and save it in the variable df2

df2=df.rename(columns={'Player':'player',
                       'Base Price':'base_price',
                       'TYPE':'type',
                       'COST IN ₹ (CR.)':'cost_inr',
                       'Cost IN $ (000)': 'cost_usd',
                       '2021 Squad':'2021_team',
                       'Team':'2022_team'})

In [26]:
#Dropping USD Column

df3 = df2.drop(['cost_usd'],axis=1)

In [27]:
#Check updated Dataframe

df3.head()

|   | player | base_price | type | cost_inr | 2021_team | 2022_team |
|---|---|---|---|---|---|---|
| 0 | Rashid Khan | Draft Pick | BOWLER | 15.0 | SRH | Gujarat Titans |
| 1 | Hardik Pandya | Draft Pick | ALL-ROUNDER | 15.0 | MI | Gujarat Titans |
| 2 | Lockie Ferguson | 2 Cr | BOWLER | 10.0 | KKR | Gujarat Titans |
| 3 | Rahul Tewatia | 40 Lakh | ALL-ROUNDER | 9.0 | RR | Gujarat Titans |
| 4 | Shubman Gill | Draft Pick | BATTER | 8.0 | KKR | Gujarat Titans |


# Setting up a `mysql` helper function to run SQL queries

**Basics** <br>

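The main function in pandasql is `sqldf`. It accepts two parameters: a SQL query string, and a set of session/environment variables (**locals()** or **globals()**) in which it looks up the DataFrames named in the query. <br>

Passing **locals()** or **globals()** every time gets tedious, so you can define a short helper function instead. Note that pandasql runs queries with SQLite, so despite the helper's name, the SQL dialect is SQLite rather than MySQL.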

In [29]:
def mysql(q):
    """Run SQL query `q` against the DataFrames defined in the notebook's global scope."""
    return sqldf(q, globals())

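To make the role of the `globals()` argument concrete, here is a rough sketch of what a `sqldf`-style call does: copy the DataFrames it can see into a throwaway in-memory SQLite database, then run the query there. `run_sql` is a hypothetical name, this sketch simply copies every DataFrame it is given, and pandasql's actual implementation differs in its details.

```python
import sqlite3

import pandas as pd


def run_sql(query, tables):
    """Hypothetical sqldf-style helper: run `query` against the DataFrames in `tables`.

    `tables` maps names to objects, like the dict returned by locals() or globals();
    anything that is not a DataFrame is ignored.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory SQLite database
    try:
        for name, obj in tables.items():
            if isinstance(obj, pd.DataFrame):
                obj.to_sql(name, conn, index=False)  # one SQLite table per DataFrame
        return pd.read_sql_query(query, conn)
    finally:
        conn.close()


# A small sample built from the Question 1 output in this notebook.
batters = pd.DataFrame({
    "player": ["Rohit Sharma", "Virat Kohli", "Kane Williamson"],
    "cost_inr": [16.0, 15.0, 14.0],
})
result = run_sql(
    "SELECT player, cost_inr FROM batters WHERE cost_inr > 14.5 ORDER BY cost_inr DESC",
    {"batters": batters},
)
print(result)
```

In the notebook, `run_sql(q, globals())` would play the same role as `sqldf(q, globals())`: the table name in the query is resolved by looking it up in that dictionary.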
# Question 1 - Name the top 3 batsmen who got paid the most?

In [31]:
mysql("""SELECT player, cost_inr FROM df3 WHERE type = 'BATTER' ORDER BY 2 DESC LIMIT 3""")

|   | player | cost_inr |
|---|---|---|
| 0 | Rohit Sharma | 16.0 |
| 1 | Virat Kohli | 15.0 |
| 2 | Kane Williamson | 14.0 |

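`ORDER BY 2` sorts by the second column of the `SELECT` list (here `cost_inr`), not by a constant. The sketch below checks this with Python's built-in `sqlite3` module (SQLite is the engine pandasql uses by default) on a few rows copied from outputs in this notebook; the table is a stand-in for `df3`, not the real data.

```python
import sqlite3

# A handful of rows copied from outputs in this notebook; not the full dataset.
rows = [
    ("Rohit Sharma", "BATTER", 16.0),
    ("Virat Kohli", "BATTER", 15.0),
    ("Kane Williamson", "BATTER", 14.0),
    ("Shubman Gill", "BATTER", 8.0),
    ("Rashid Khan", "BOWLER", 15.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df3 (player TEXT, type TEXT, cost_inr REAL)")
conn.executemany("INSERT INTO df3 VALUES (?, ?, ?)", rows)

# ORDER BY 2 refers to the 2nd column in the SELECT list (cost_inr),
# so these two queries should return the same rows.
by_position = conn.execute(
    "SELECT player, cost_inr FROM df3 WHERE type = 'BATTER' ORDER BY 2 DESC LIMIT 3"
).fetchall()
by_name = conn.execute(
    "SELECT player, cost_inr FROM df3 WHERE type = 'BATTER' ORDER BY cost_inr DESC LIMIT 3"
).fetchall()
conn.close()

print(by_position)
```

Naming the column (`ORDER BY cost_inr`) keeps the query correct if the `SELECT` list is later reordered.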

# Question 2 - Name the top 5 bowlers who got paid the most?

In [None]:
# Top 5 bowlers by salary

mysql("""SELECT player, cost_inr FROM df3 WHERE type = 'BOWLER' ORDER BY 2 DESC LIMIT 5""")

|   | player | cost_inr |
|---|---|---|
| 0 | Rashid Khan | 15.0 |
| 1 | Deepak Chahar | 14.0 |
| 2 | Jasprit Bumrah | 12.0 |
| 3 | Shardul Thakur | 10.75 |
| 4 | Lockie Ferguson | 10.0 |

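For comparison, the same question can be answered directly in pandas with a boolean filter plus `DataFrame.nlargest`. This sketch uses a small stand-in for `df3` built from rows shown in this notebook; on the real data you would filter `df3` itself.

```python
import pandas as pd

# Rows copied from outputs in this notebook; a small sample, not the full dataset.
df3_sample = pd.DataFrame({
    "player": ["Rashid Khan", "Deepak Chahar", "Jasprit Bumrah", "Shardul Thakur",
               "Lockie Ferguson", "Hardik Pandya"],
    "type": ["BOWLER", "BOWLER", "BOWLER", "BOWLER", "BOWLER", "ALL-ROUNDER"],
    "cost_inr": [15.0, 14.0, 12.0, 10.75, 10.0, 15.0],
})

# Equivalent of: SELECT player, cost_inr FROM df3 WHERE type = 'BOWLER'
#                ORDER BY cost_inr DESC LIMIT 5
top_bowlers = (
    df3_sample.loc[df3_sample["type"] == "BOWLER", ["player", "cost_inr"]]
    .nlargest(5, "cost_inr")
)
print(top_bowlers)
```

`nlargest` returns the rows already sorted in descending order; with its default `keep="first"`, ties at the cut-off are resolved by original row order.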

# Practice Question 1 - Name the top 5 all-rounders who got paid the most?

In [None]:
# Practice: write your SQL query inside the triple quotes before running this cell
mysql("""    """)

# Question 3 - Name the 5 lowest-paid wicket-keepers?

In [35]:
mysql("""SELECT player, cost_inr FROM df3 WHERE type = 'WICKETKEEPER' AND cost_inr IS NOT NULL ORDER BY 2 LIMIT 5""")

|   | player | cost_inr |
|---|---|---|
| 0 | N. Jagadeesan | 0.2 |
| 1 | Baba Indrajith | 0.2 |
| 2 | Jitesh Sharma | 0.2 |
| 3 | Aryan Juyal | 0.2 |
| 4 | Luvnith Sisodia | 0.2 |

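Two details matter in this query. In SQLite, `NULL` sorts before every other value in ascending order, so without the `IS NOT NULL` filter the players with no recorded cost (presumably unsold) would come first. Also, many wicket-keepers share the 0.2 crore minimum, so which five appear is not guaranteed; adding a second sort key such as `player` makes the result repeatable. The sketch below demonstrates both with `sqlite3` on hypothetical placeholder rows.

```python
import sqlite3

# Hypothetical rows: three keepers tied at the minimum price and one
# player with no recorded cost (NULL). Names are placeholders.
rows = [
    ("Keeper C", "WICKETKEEPER", 0.2),
    ("Keeper A", "WICKETKEEPER", 0.2),
    ("Keeper D", "WICKETKEEPER", None),
    ("Keeper B", "WICKETKEEPER", 0.2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df3 (player TEXT, type TEXT, cost_inr REAL)")
conn.executemany("INSERT INTO df3 VALUES (?, ?, ?)", rows)

# SQLite treats NULL as smaller than any other value, so it sorts first here.
first_unfiltered = conn.execute(
    "SELECT player, cost_inr FROM df3 ORDER BY cost_inr LIMIT 1"
).fetchone()

# Filtering out NULLs and breaking ties on player gives a deterministic result.
lowest = conn.execute(
    """SELECT player, cost_inr FROM df3
       WHERE type = 'WICKETKEEPER' AND cost_inr IS NOT NULL
       ORDER BY cost_inr, player LIMIT 2"""
).fetchall()
conn.close()

print(first_unfiltered)
print(lowest)
```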

# Question 4 - What is the average pay for batsmen, bowlers, all-rounders and wicket-keepers?

In [36]:
mysql("""SELECT type, round(avg(cost_inr),2) average_price FROM df3 GROUP BY 1 ORDER BY 2 DESC""")

|   | type | average_price |
|---|---|---|
| 0 | WICKETKEEPER | 5.09 |
| 1 | BATTER | 4.11 |
| 2 | ALL-ROUNDER | 3.61 |
| 3 | BOWLER | 3.07 |

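`AVG` skips `NULL`s, so these averages cover only players with a recorded `cost_inr`; counting missing costs as 0 would pull every average down. pandas behaves the same way, since `groupby(...).mean()` skips `NaN` by default. A minimal `sqlite3` check on hypothetical values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (type TEXT, cost_inr REAL)")
# Hypothetical values: one player has no recorded cost (NULL).
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("BOWLER", 2.0), ("BOWLER", 4.0), ("BOWLER", None)],
)

# AVG ignores the NULL; COALESCE turns it into 0 so it is counted.
avg_skipping_nulls, avg_nulls_as_zero = conn.execute(
    "SELECT AVG(cost_inr), AVG(COALESCE(cost_inr, 0)) FROM t"
).fetchone()
conn.close()

print(avg_skipping_nulls, avg_nulls_as_zero)
```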

# Question 5 - Which retained players got paid the most (top 5)?

In [None]:
mysql("""SELECT player, cost_inr FROM df3 WHERE base_price = 'Retained' ORDER BY 2 DESC LIMIT 5""")

|   | player | cost_inr |
|---|---|---|
| 0 | Ravindra Jadeja | 16.0 |
| 1 | Rishabh Pant | 16.0 |
| 2 | Rohit Sharma | 16.0 |
| 3 | Virat Kohli | 15.0 |
| 4 | Sanju Samson | 14.0 |

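To also show each player's team, select the renamed `2022_team` column. Because that name starts with a digit, SQLite only accepts it as a quoted identifier (`"2022_team"`). String values such as `'Retained'` should use single quotes, since SQLite treats double-quoted text as an identifier first and only falls back to a string when no such column exists. The sketch below uses `sqlite3` with hypothetical placeholder rows; the same query string could be passed to the notebook's `mysql` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the renamed DataFrame; the rows are hypothetical placeholders.
conn.execute('CREATE TABLE df3 (player TEXT, base_price TEXT, cost_inr REAL, "2022_team" TEXT)')
conn.executemany(
    "INSERT INTO df3 VALUES (?, ?, ?, ?)",
    [
        ("Player A", "Retained", 16.0, "Team X"),
        ("Player B", "20 Lakh", 0.2, "Team Y"),
        ("Player C", "Retained", 12.0, "Team Y"),
    ],
)

# "2022_team" is a quoted identifier; 'Retained' is a single-quoted string literal.
query = """SELECT player, "2022_team", cost_inr FROM df3
           WHERE base_price = 'Retained' ORDER BY cost_inr DESC"""
retained = conn.execute(query).fetchall()

# Left unquoted, 2022_team is not a valid token, so SQLite rejects the query.
try:
    conn.execute("SELECT player, 2022_team FROM df3")
    unquoted_ok = True
except sqlite3.Error:
    unquoted_ok = False
conn.close()

print(retained)
print("Unquoted column name accepted:", unquoted_ok)
```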