# Introduction to Pandas in Python

## 1. What is Pandas?

Pandas is an open-source data analysis and data manipulation library for Python.
It provides powerful data structures such as Series and DataFrame, which allow 
for efficient handling and analysis of structured data.

Pandas is built on top of NumPy and is widely used for data preprocessing, 
cleaning, transformation, and analysis in data science and machine learning.

## Introduction to NumPy 

#### What is NumPy?

NumPy (Numerical Python) is a Python library for numerical computing. It provides multi-dimensional arrays and mathematical functions that operate on them. Because Pandas is built on top of NumPy, a basic familiarity with arrays is useful before working with Pandas.

#### Why NumPy?

1. Faster than Python lists for numerical operations
2. Efficient array manipulation
3. Provides mathematical and statistical functions
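As a small illustration of the first two points, the sketch below doubles every element once with a plain Python list and once with a NumPy array. The sample values are made up for this example.

```python
import numpy as np

values = [1, 2, 3, 4, 5]  # illustrative sample data

# Plain Python list: each element has to be visited explicitly
doubled_list = [v * 2 for v in values]

# NumPy array: the multiplication is applied to every element at once
doubled_array = np.array(values) * 2

print(doubled_list)   # [2, 4, 6, 8, 10]
print(doubled_array)  # [ 2  4  6  8 10]
```

On large arrays the vectorised form is usually much faster, although the exact speed-up depends on the data size and the operation.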

### Installing NumPy

If you haven’t installed NumPy yet, you can install it with:

In [830]:
!pip install numpy




### Importing NumPy

In [832]:
import numpy as np  # Import NumPy with an alias for easy usage

### Creating a NumPy Array

In [834]:
arr = np.array([1, 2, 3, 4, 5])
print("NumPy Array:", arr)

NumPy Array: [1 2 3 4 5]


### Basic Operations with NumPy

In [836]:
# Creating an array
arr = np.array([10, 20, 30, 40, 50])

In [837]:
# Performing basic arithmetic
print("Array + 5:", arr + 5)  # Adds 5 to each element
print("Array * 2:", arr * 2)  # Multiplies each element by 2
print("Mean of array:", np.mean(arr))  # Calculates mean
print("Standard Deviation:", np.std(arr))  # Calculates standard deviation

Array + 5: [15 25 35 45 55]
Array * 2: [ 20  40  60  80 100]
Mean of array: 30.0
Standard Deviation: 14.142135623730951
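By default `np.std` computes the population standard deviation (it divides by N), which is where the value 14.14 comes from. The sketch below, using the same array, shows how the `ddof` parameter switches to the sample standard deviation (dividing by N - 1).

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Default (ddof=0): population standard deviation, divides by N
population_std = np.std(arr)

# ddof=1: sample standard deviation, divides by N - 1
sample_std = np.std(arr, ddof=1)

print("Population std:", population_std)
print("Sample std:", sample_std)
```

This matters later because Pandas methods such as `Series.std()` and the `std` row of `describe()` use `ddof=1` by default, so their results can differ slightly from `np.std`.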


## 2. Installing and Importing Pandas
#### Installing Pandas (if not already installed)


In [839]:
# Run the following line in a Jupyter Notebook cell if you haven't installed Pandas yet.
!pip install pandas



### Importing the Pandas Library

In [841]:
# Importing Pandas
import pandas as pd

### Checking Pandas Version

In [843]:
# Displaying Pandas version
print("Pandas Version:", pd.__version__)

Pandas Version: 2.2.2


## 3. Loading Data into a Pandas DataFrame

#### Reading a CSV File into a DataFrame

In [846]:
df=pd.read_csv("C:/Users/kuruv/Downloads/deliveries.csv")  # Loading dataset

In [847]:
df        # Displaying the DataFrame

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,,,
1,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,2,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
2,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,3,BB McCullum,P Kumar,SC Ganguly,0,1,1,wides,0,,,
3,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,4,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
4,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,5,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,5,SS Iyer,AK Markram,VR Iyer,1,0,1,,0,,,
260916,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,6,VR Iyer,AK Markram,SS Iyer,1,0,1,,0,,,
260917,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,1,VR Iyer,Shahbaz Ahmed,SS Iyer,1,0,1,,0,,,
260918,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,2,SS Iyer,Shahbaz Ahmed,VR Iyer,1,0,1,,0,,,
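The file path above is specific to one machine. To try `pd.read_csv` without the dataset, the following sketch reads a small CSV from an in-memory string instead. The rows and player names are hypothetical stand-ins, and `usecols` and `nrows` are optional `read_csv` parameters that limit what is loaded.

```python
import io
import pandas as pd

# A tiny stand-in for deliveries.csv (hypothetical rows)
csv_text = """match_id,inning,batter,total_runs
1,1,Player A,4
1,1,Player B,0
1,2,Player C,6
"""

# usecols keeps only the listed columns; nrows limits how many rows are read
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["match_id", "batter", "total_runs"],
    nrows=2,
)

print(sample.shape)  # (2, 3)
```

Loading only the columns and rows you need can be helpful with large files such as the 260,920-row dataset used here.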


## 4. Exploring the DataFrame

#### Displaying Data

In [850]:
df.head(100)    # Display first 100 rows

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,,,
1,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,2,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
2,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,3,BB McCullum,P Kumar,SC Ganguly,0,1,1,wides,0,,,
3,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,4,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
4,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,5,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,15,2,DJ Hussey,AA Noffke,BB McCullum,1,0,1,,0,,,
96,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,15,3,BB McCullum,AA Noffke,DJ Hussey,2,0,2,,0,,,
97,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,15,4,BB McCullum,AA Noffke,DJ Hussey,0,0,0,,0,,,
98,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,15,5,BB McCullum,AA Noffke,DJ Hussey,1,0,1,,0,,,


In [851]:
df.tail(100)          # Display last 100 rows

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
260820,1426312,1,Sunrisers Hyderabad,Kolkata Knight Riders,13,1,PJ Cummins,CV Varun,H Klaasen,4,0,4,,0,,,
260821,1426312,1,Sunrisers Hyderabad,Kolkata Knight Riders,13,2,PJ Cummins,CV Varun,H Klaasen,0,1,1,byes,0,,,
260822,1426312,1,Sunrisers Hyderabad,Kolkata Knight Riders,13,3,H Klaasen,CV Varun,PJ Cummins,0,0,0,,0,,,
260823,1426312,1,Sunrisers Hyderabad,Kolkata Knight Riders,13,4,H Klaasen,CV Varun,PJ Cummins,2,0,2,,0,,,
260824,1426312,1,Sunrisers Hyderabad,Kolkata Knight Riders,13,5,H Klaasen,CV Varun,PJ Cummins,1,0,1,,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,5,SS Iyer,AK Markram,VR Iyer,1,0,1,,0,,,
260916,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,6,VR Iyer,AK Markram,SS Iyer,1,0,1,,0,,,
260917,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,1,VR Iyer,Shahbaz Ahmed,SS Iyer,1,0,1,,0,,,
260918,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,2,SS Iyer,Shahbaz Ahmed,VR Iyer,1,0,1,,0,,,


### Checking DataFrame Information

In [853]:
# Display column names, non-null counts, and data types
print("DataFrame Info:")
df.info()

DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 260920 entries, 0 to 260919
Data columns (total 17 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   match_id          260920 non-null  int64 
 1   inning            260920 non-null  int64 
 2   batting_team      260920 non-null  object
 3   bowling_team      260920 non-null  object
 4   over              260920 non-null  int64 
 5   ball              260920 non-null  int64 
 6   batter            260920 non-null  object
 7   bowler            260920 non-null  object
 8   non_striker       260920 non-null  object
 9   batsman_runs      260920 non-null  int64 
 10  extra_runs        260920 non-null  int64 
 11  total_runs        260920 non-null  int64 
 12  extras_type       14125 non-null   object
 13  is_wicket         260920 non-null  int64 
 14  player_dismissed  12950 non-null   object
 15  dismissal_kind    12950 non-null   object
 16  fielder           9354 non-null    object

### Summary Statistics

In [855]:
#Generate summary statistics for numerical columns
print("\nSummary Statistics:")
df.describe()


Summary Statistics:


Unnamed: 0,match_id,inning,over,ball,batsman_runs,extra_runs,total_runs,is_wicket
count,260920.0,260920.0,260920.0,260920.0,260920.0,260920.0,260920.0,260920.0
mean,907066.5,1.483531,9.197677,3.624486,1.265001,0.067806,1.332807,0.049632
std,367991.3,0.502643,5.683484,1.81492,1.639298,0.343265,1.626416,0.217184
min,335982.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0
25%,548334.0,1.0,4.0,2.0,0.0,0.0,0.0,0.0
50%,980967.0,1.0,9.0,4.0,1.0,0.0,1.0,0.0
75%,1254066.0,2.0,14.0,5.0,1.0,0.0,1.0,0.0
max,1426312.0,6.0,19.0,11.0,6.0,7.0,7.0,1.0
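`describe()` summarises only the numerical columns of a mixed DataFrame by default. For a text column, it reports a different set of statistics. The sketch below uses a hypothetical three-row Series standing in for a column such as `batter`.

```python
import pandas as pd

# Hypothetical mini column standing in for df['batter']
batters = pd.Series(["Player A", "Player B", "Player A"], name="batter")

# For text data, describe() reports count, unique, top, and freq
summary = batters.describe()

print(summary["count"])   # 3
print(summary["unique"])  # 2
print(summary["top"])     # Player A
print(summary["freq"])    # 2
```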


### Checking Data Types of Columns

In [857]:
df.dtypes               # Display data types of each column

match_id             int64
inning               int64
batting_team        object
bowling_team        object
over                 int64
ball                 int64
batter              object
bowler              object
non_striker         object
batsman_runs         int64
extra_runs           int64
total_runs           int64
extras_type         object
is_wicket            int64
player_dismissed    object
dismissal_kind      object
fielder             object
dtype: object
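Once the data types are known, columns can be picked by type rather than by name. The following is a minimal sketch with a hypothetical three-column DataFrame; `select_dtypes` is a standard DataFrame method.

```python
import pandas as pd

# Hypothetical mini DataFrame with numeric and text columns
demo = pd.DataFrame({
    "match_id": [1, 2],
    "batter": ["Player A", "Player B"],
    "total_runs": [4, 0],
})

numeric_part = demo.select_dtypes(include="number")  # numeric columns only
text_part = demo.select_dtypes(exclude="number")     # everything else

print(list(numeric_part.columns))  # ['match_id', 'total_runs']
print(list(text_part.columns))     # ['batter']
```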

## 5. Selecting Specific Columns in a DataFrame

### Selecting a Single Column

In [860]:
#Selecting Columns:
#Use df['column'] to select data.
df['inning']       # Selecting the 'inning' column

0         1
1         1
2         1
3         1
4         1
         ..
260915    2
260916    2
260917    2
260918    2
260919    2
Name: inning, Length: 260920, dtype: int64

### Selecting Multiple Columns

In [862]:
#Use df[['col1', 'col2']] to select multiple columns.
df[['total_runs', 'fielder']]

Unnamed: 0,total_runs,fielder
0,1,
1,0,
2,1,
3,0,
4,0,
...,...,...
260915,1,
260916,1,
260917,1,
260918,1,
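The number of brackets changes the type of the result: single brackets return a Series, while a list of column names returns a DataFrame, even when the list holds only one name. A short sketch on a hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical mini DataFrame used only for this illustration
demo = pd.DataFrame({"inning": [1, 1, 2], "total_runs": [1, 0, 4]})

single = demo["inning"]      # single brackets -> Series
as_frame = demo[["inning"]]  # list of names -> DataFrame with one column

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(as_frame.shape)           # (3, 1)
```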


### Selecting Multiple Columns for a Specific Row Range

In [864]:
# Selecting multiple columns for rows 260900 to 260918 (the slice end, 260919, is excluded)
df[['bowler', 'batter', 'batsman_runs']][260900:260919]

Unnamed: 0,bowler,batter,batsman_runs
260900,JD Unadkat,Rahmanullah Gurbaz,1
260901,JD Unadkat,VR Iyer,0
260902,JD Unadkat,VR Iyer,2
260903,JD Unadkat,VR Iyer,1
260904,JD Unadkat,Rahmanullah Gurbaz,4
260905,Shahbaz Ahmed,VR Iyer,2
260906,Shahbaz Ahmed,VR Iyer,0
260907,Shahbaz Ahmed,Rahmanullah Gurbaz,0
260908,Shahbaz Ahmed,Rahmanullah Gurbaz,6
260909,Shahbaz Ahmed,Rahmanullah Gurbaz,0


In [865]:
# Selecting 'bowler', 'batter', and 'batsman_runs' for every second row (step=2) in the positional range 260900 to 260919 (end excluded)
df[['bowler', 'batter', 'batsman_runs']][260900:260919:2]

Unnamed: 0,bowler,batter,batsman_runs
260900,JD Unadkat,Rahmanullah Gurbaz,1
260902,JD Unadkat,VR Iyer,2
260904,JD Unadkat,Rahmanullah Gurbaz,4
260906,Shahbaz Ahmed,VR Iyer,0
260908,Shahbaz Ahmed,Rahmanullah Gurbaz,6
260910,Shahbaz Ahmed,SS Iyer,4
260912,AK Markram,VR Iyer,0
260914,AK Markram,VR Iyer,1
260916,AK Markram,VR Iyer,1
260918,Shahbaz Ahmed,SS Iyer,1


In [866]:
df[['bowler', 'batter', 'batsman_runs']][0:260919:200000]     # Selecting with a step of 200000

Unnamed: 0,bowler,batter,batsman_runs
0,P Kumar,SC Ganguly,0
200000,KA Pollard,AT Rayudu,1
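The chained form `df[columns][start:stop]` can also be written as a single `iloc` or `loc` call. The sketch below compares the three forms on a hypothetical six-row DataFrame. Note that `loc` slices by label and includes the end label, whereas plain slicing and `iloc` exclude the end position.

```python
import pandas as pd

# Hypothetical mini DataFrame with a default 0..5 index
demo = pd.DataFrame({
    "bowler": ["A", "A", "B", "B", "C", "C"],
    "batter": ["X", "Y", "X", "Y", "X", "Y"],
    "batsman_runs": [1, 0, 4, 2, 6, 0],
})

chained = demo[["bowler", "batsman_runs"]][2:5]        # positions 2, 3, 4
with_iloc = demo.iloc[2:5, [0, 2]]                     # same rows, columns by position
with_loc = demo.loc[2:4, ["bowler", "batsman_runs"]]   # label slice: 4 is included

print(chained.equals(with_iloc))  # True
print(chained.equals(with_loc))   # True
```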


## 6. Filtering Rows in a DataFrame

### Filtering Rows Based on a Condition

In [869]:
df[df['batting_team'] == 'Sunrisers Hyderabad']    # Select only rows where batting_team is 'Sunrisers Hyderabad'

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
76859,598000,1,Sunrisers Hyderabad,Pune Warriors,0,1,PA Reddy,MN Samuels,PA Patel,1,0,1,,0,,,
76860,598000,1,Sunrisers Hyderabad,Pune Warriors,0,2,PA Patel,MN Samuels,PA Reddy,1,0,1,,0,,,
76861,598000,1,Sunrisers Hyderabad,Pune Warriors,0,3,PA Reddy,MN Samuels,PA Patel,0,1,1,wides,0,,,
76862,598000,1,Sunrisers Hyderabad,Pune Warriors,0,4,PA Reddy,MN Samuels,PA Patel,0,0,0,,0,,,
76863,598000,1,Sunrisers Hyderabad,Pune Warriors,0,5,PA Reddy,MN Samuels,PA Patel,1,0,1,,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260848,1426312,1,Sunrisers Hyderabad,Kolkata Knight Riders,17,5,JD Unadkat,SP Narine,PJ Cummins,0,0,0,,1,JD Unadkat,lbw,
260849,1426312,1,Sunrisers Hyderabad,Kolkata Knight Riders,17,6,B Kumar,SP Narine,PJ Cummins,0,0,0,,0,,,
260850,1426312,1,Sunrisers Hyderabad,Kolkata Knight Riders,18,1,PJ Cummins,AD Russell,B Kumar,0,0,0,,0,,,
260851,1426312,1,Sunrisers Hyderabad,Kolkata Knight Riders,18,2,PJ Cummins,AD Russell,B Kumar,0,0,0,,0,,,


### Filtering Rows to Exclude Specific Teams


In [871]:
#Filtering Rows:
df[~df["batting_team"].isin(["Sunrisers Hyderabad", "Kolkata Knight Riders", "Royal Challengers Bangalore"])]
#This will return a DataFrame excluding rows where the batting_team is Sunrisers Hyderabad or Kolkata Knight Riders or Royal Challengers Bangalore.

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
225,335983,1,Chennai Super Kings,Kings XI Punjab,0,1,PA Patel,B Lee,ML Hayden,0,0,0,,0,,,
226,335983,1,Chennai Super Kings,Kings XI Punjab,0,2,PA Patel,B Lee,ML Hayden,0,0,0,,0,,,
227,335983,1,Chennai Super Kings,Kings XI Punjab,0,3,PA Patel,B Lee,ML Hayden,1,0,1,,0,,,
228,335983,1,Chennai Super Kings,Kings XI Punjab,0,4,ML Hayden,B Lee,PA Patel,0,0,0,,0,,,
229,335983,1,Chennai Super Kings,Kings XI Punjab,0,5,ML Hayden,B Lee,PA Patel,4,0,4,,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260731,1426311,2,Rajasthan Royals,Sunrisers Hyderabad,19,2,Dhruv Jurel,T Natarajan,TA Boult,4,0,4,,0,,,
260732,1426311,2,Rajasthan Royals,Sunrisers Hyderabad,19,3,Dhruv Jurel,T Natarajan,TA Boult,0,0,0,,0,,,
260733,1426311,2,Rajasthan Royals,Sunrisers Hyderabad,19,4,Dhruv Jurel,T Natarajan,TA Boult,0,0,0,,0,,,
260734,1426311,2,Rajasthan Royals,Sunrisers Hyderabad,19,5,Dhruv Jurel,T Natarajan,TA Boult,0,0,0,,0,,,


### Filtering Rows Where 'match_id' is Greater Than a Certain Value



In [873]:
df[df['match_id'] > 450000]
# Filtering the DataFrame to include only rows where 'match_id' is greater than 450000

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
41593,501198,1,Chennai Super Kings,Kolkata Knight Riders,0,1,S Anirudha,Iqbal Abdulla,M Vijay,0,0,0,,0,,,
41594,501198,1,Chennai Super Kings,Kolkata Knight Riders,0,2,S Anirudha,Iqbal Abdulla,M Vijay,1,0,1,,0,,,
41595,501198,1,Chennai Super Kings,Kolkata Knight Riders,0,3,M Vijay,Iqbal Abdulla,S Anirudha,0,0,0,,0,,,
41596,501198,1,Chennai Super Kings,Kolkata Knight Riders,0,4,M Vijay,Iqbal Abdulla,S Anirudha,0,0,0,,0,,,
41597,501198,1,Chennai Super Kings,Kolkata Knight Riders,0,5,M Vijay,Iqbal Abdulla,S Anirudha,4,0,4,,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,5,SS Iyer,AK Markram,VR Iyer,1,0,1,,0,,,
260916,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,6,VR Iyer,AK Markram,SS Iyer,1,0,1,,0,,,
260917,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,1,VR Iyer,Shahbaz Ahmed,SS Iyer,1,0,1,,0,,,
260918,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,2,SS Iyer,Shahbaz Ahmed,VR Iyer,1,0,1,,0,,,
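Conditions like the ones in this section can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses. The following sketch uses a hypothetical four-row DataFrame with made-up team names.

```python
import pandas as pd

# Hypothetical mini DataFrame
demo = pd.DataFrame({
    "batting_team": ["Team A", "Team B", "Team A", "Team C"],
    "match_id": [100, 200, 300, 400],
    "total_runs": [4, 0, 6, 1],
})

# AND: both conditions must hold
both = demo[(demo["batting_team"] == "Team A") & (demo["match_id"] > 150)]

# OR: at least one condition must hold
either = demo[(demo["total_runs"] >= 6) | (demo["batting_team"] == "Team C")]

print(both.index.tolist())    # [2]
print(either.index.tolist())  # [2, 3]
```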


## 7. Counting Values in a DataFrame

### Counting Occurrences of a Specific Value in a Column

In [876]:
df[df['batter'] == "SS Iyer"].shape[0]
# Count the number of times "SS Iyer" appears in the 'batter' column

2545

In [877]:
df['batter'].value_counts().get("SS Iyer", 0)
# Count the number of times "SS Iyer" appears in the 'batter' column

2545
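A third way to count matches is to sum a boolean mask, since `True` counts as 1. The sketch below uses a hypothetical Series in place of `df['batter']` and also shows why `.get(name, 0)` is convenient when a name might not be present.

```python
import pandas as pd

# Hypothetical mini column standing in for df['batter']
batters = pd.Series(
    ["Player A", "Player B", "Player A", "Player C", "Player A"],
    name="batter",
)

count_from_mask = (batters == "Player A").sum()  # True values are counted as 1
counts = batters.value_counts()                  # counts for every distinct name
count_missing = counts.get("Player Z", 0)        # default 0 avoids a KeyError

print(count_from_mask)     # 3
print(counts["Player A"])  # 3
print(count_missing)       # 0
```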

## 8. Handling Missing Values

### Checking for Missing Values

In [880]:
# Check for missing values
print("\nMissing Values:")
df.isnull().sum()


Missing Values:


match_id                 0
inning                   0
batting_team             0
bowling_team             0
over                     0
ball                     0
batter                   0
bowler                   0
non_striker              0
batsman_runs             0
extra_runs               0
total_runs               0
extras_type         246795
is_wicket                0
player_dismissed    247970
dismissal_kind      247970
fielder             251566
dtype: int64

### Detecting Missing Values in the DataFrame

In [882]:
# Returns a boolean DataFrame indicating missing values
df.isnull()

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,True
1,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,True,True
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,True
3,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,True,True
4,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,True,True
260916,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,True,True
260917,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,True,True
260918,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,True,True


### Removing Rows with Missing Values

In [884]:
df.dropna()
# Returns a new DataFrame without the rows that contain any missing value; df itself is unchanged

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
6670,336010,1,Kolkata Knight Riders,Royal Challengers Bangalore,8,6,DJ Hussey,A Kumble,T Taibu,0,1,1,legbyes,1,DJ Hussey,run out,DW Steyn
9022,336020,1,Delhi Daredevils,Deccan Chargers,15,2,G Gambhir,PP Ojha,S Dhawan,0,1,1,wides,1,G Gambhir,stumped,AC Gilchrist
21707,392217,2,Chennai Super Kings,Rajasthan Royals,16,6,ML Hayden,SK Warne,S Badrinath,0,1,1,wides,1,ML Hayden,stumped,NV Ojha
21927,392218,2,Royal Challengers Bangalore,Mumbai Indians,13,2,RE van der Merwe,Harbhajan Singh,MV Boucher,0,1,1,wides,1,RE van der Merwe,stumped,YV Takawale
22808,392222,1,Kings XI Punjab,Mumbai Indians,17,6,K Goel,SL Malinga,PP Chawla,0,1,1,wides,1,PP Chawla,run out,YV Takawale
24387,392228,2,Deccan Chargers,Kolkata Knight Riders,17,1,A Symonds,Mashrafe Mortaza,RG Sharma,0,1,1,noballs,1,A Symonds,run out,Mashrafe Mortaza
29246,419114,2,Delhi Daredevils,Mumbai Indians,8,5,KD Karthik,Harbhajan Singh,MF Maharoof,0,1,1,wides,1,KD Karthik,stumped,AP Tare
38560,419153,1,Kolkata Knight Riders,Chennai Super Kings,4,1,DJ Hussey,R Ashwin,MK Tiwary,0,1,1,wides,1,DJ Hussey,stumped,MS Dhoni
40074,419159,2,Chennai Super Kings,Kings XI Punjab,4,1,M Vijay,RR Powar,SK Raina,0,1,1,wides,1,M Vijay,stumped,KC Sangakkara
40814,419162,2,Royal Challengers Bangalore,Mumbai Indians,6,2,KP Pietersen,Harbhajan Singh,R Dravid,0,1,1,wides,1,KP Pietersen,stumped,AT Rayudu


In [885]:
df           # Displaying the DataFrame

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,,,
1,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,2,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
2,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,3,BB McCullum,P Kumar,SC Ganguly,0,1,1,wides,0,,,
3,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,4,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
4,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,5,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,5,SS Iyer,AK Markram,VR Iyer,1,0,1,,0,,,
260916,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,6,VR Iyer,AK Markram,SS Iyer,1,0,1,,0,,,
260917,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,1,VR Iyer,Shahbaz Ahmed,SS Iyer,1,0,1,,0,,,
260918,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,2,SS Iyer,Shahbaz Ahmed,VR Iyer,1,0,1,,0,,,
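Because `dropna()` removes every row with at least one missing value, it can discard most of a dataset in which some columns are empty by design. The `subset` parameter restricts the check to chosen columns. A minimal sketch on a hypothetical DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical mini DataFrame with gaps in two columns
demo = pd.DataFrame({
    "batter": ["Player A", "Player B", "Player C"],
    "extras_type": ["wides", np.nan, np.nan],
    "player_dismissed": [np.nan, "Player B", np.nan],
})

all_complete = demo.dropna()                               # rows with no NaN at all
with_dismissal = demo.dropna(subset=["player_dismissed"])  # only this column is checked

# demo itself is not modified by either call
print(len(demo), len(all_complete), len(with_dismissal))   # 3 0 1
```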


### Filling Missing Values

In [887]:
df.fillna(9)
# Returns a copy with missing values replaced by 9; df itself is not modified

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,9,9,9
1,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,2,BB McCullum,P Kumar,SC Ganguly,0,0,0,9,0,9,9,9
2,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,3,BB McCullum,P Kumar,SC Ganguly,0,1,1,wides,0,9,9,9
3,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,4,BB McCullum,P Kumar,SC Ganguly,0,0,0,9,0,9,9,9
4,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,5,BB McCullum,P Kumar,SC Ganguly,0,0,0,9,0,9,9,9
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,5,SS Iyer,AK Markram,VR Iyer,1,0,1,9,0,9,9,9
260916,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,6,VR Iyer,AK Markram,SS Iyer,1,0,1,9,0,9,9,9
260917,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,1,VR Iyer,Shahbaz Ahmed,SS Iyer,1,0,1,9,0,9,9,9
260918,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,2,SS Iyer,Shahbaz Ahmed,VR Iyer,1,0,1,9,0,9,9,9


In [888]:
df['extras_type'] = df['extras_type'].fillna(0)
# Assigning the result back to the column makes the change permanent: missing values in 'extras_type' are now 0

In [889]:
df                  # Displaying the DataFrame

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,,,
1,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,2,BB McCullum,P Kumar,SC Ganguly,0,0,0,0,0,,,
2,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,3,BB McCullum,P Kumar,SC Ganguly,0,1,1,wides,0,,,
3,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,4,BB McCullum,P Kumar,SC Ganguly,0,0,0,0,0,,,
4,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,5,BB McCullum,P Kumar,SC Ganguly,0,0,0,0,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,5,SS Iyer,AK Markram,VR Iyer,1,0,1,0,0,,,
260916,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,6,VR Iyer,AK Markram,SS Iyer,1,0,1,0,0,,,
260917,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,1,VR Iyer,Shahbaz Ahmed,SS Iyer,1,0,1,0,0,,,
260918,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,2,SS Iyer,Shahbaz Ahmed,VR Iyer,1,0,1,0,0,,,
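`fillna` also accepts a dictionary that maps column names to fill values, so each column can get its own placeholder. The sketch below uses a hypothetical DataFrame, and the placeholder strings "none" and "unknown" are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mini DataFrame with missing text values
demo = pd.DataFrame({
    "extras_type": ["wides", np.nan, np.nan],
    "fielder": [np.nan, "Player F", np.nan],
})

# One fill value per column; the result is a new DataFrame
filled = demo.fillna({"extras_type": "none", "fielder": "unknown"})

print(filled["extras_type"].tolist())  # ['wides', 'none', 'none']
print(filled["fielder"].tolist())      # ['unknown', 'Player F', 'unknown']
```

Using a text placeholder in a text column keeps all values of that column the same type, which can make later filtering and grouping simpler than mixing strings with the number 0.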


## 9. Selecting Data using Indexing

### Integer-Based Indexing with iloc[]

In [892]:
# Select rows by integer position: rows 50000 to 54999 (the end, 55000, is excluded)
df.iloc[50000:55000]

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
50000,501234,1,Kings XI Punjab,Kolkata Knight Riders,14,4,KD Karthik,Iqbal Abdulla,DJ Hussey,1,0,1,0,0,,,
50001,501234,1,Kings XI Punjab,Kolkata Knight Riders,14,5,DJ Hussey,Iqbal Abdulla,KD Karthik,0,0,0,0,0,,,
50002,501234,1,Kings XI Punjab,Kolkata Knight Riders,14,6,DJ Hussey,Iqbal Abdulla,KD Karthik,0,0,0,0,1,DJ Hussey,lbw,
50003,501234,1,Kings XI Punjab,Kolkata Knight Riders,15,1,KD Karthik,R Bhatia,Bipul Sharma,0,0,0,0,0,,,
50004,501234,1,Kings XI Punjab,Kolkata Knight Riders,15,2,KD Karthik,R Bhatia,Bipul Sharma,0,0,0,0,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
54995,501255,2,Royal Challengers Bangalore,Kolkata Knight Riders,9,5,M Kaif,L Balaji,AB de Villiers,1,0,1,0,0,,,
54996,501255,2,Royal Challengers Bangalore,Kolkata Knight Riders,9,6,AB de Villiers,L Balaji,M Kaif,0,0,0,0,0,,,
54997,501255,2,Royal Challengers Bangalore,Kolkata Knight Riders,9,7,AB de Villiers,L Balaji,M Kaif,0,0,0,0,0,,,
54998,501255,2,Royal Challengers Bangalore,Kolkata Knight Riders,10,1,M Kaif,JD Unadkat,AB de Villiers,0,0,0,0,0,,,


In [893]:
# Selecting a specific value (Row index 10, Column index 2)
df.iloc[10, 2]

'Kolkata Knight Riders'

In [894]:
# Selecting a range of rows (first 2 rows)
df.iloc[0:2]

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,,,
1,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,2,BB McCullum,P Kumar,SC Ganguly,0,0,0,0,0,,,


In [895]:
# Selecting specific rows and columns
df.iloc[0:3, 1:3]

Unnamed: 0,inning,batting_team
0,1,Kolkata Knight Riders
1,1,Kolkata Knight Riders
2,1,Kolkata Knight Riders


### Label-Based Indexing with loc[]

In [897]:
# Selecting a row by label (index)
print(df.loc[60])

match_id                                 335982
inning                                        1
batting_team              Kolkata Knight Riders
bowling_team        Royal Challengers Bangalore
over                                          9
ball                                          5
batter                              BB McCullum
bowler                                 SB Joshi
non_striker                          RT Ponting
batsman_runs                                  6
extra_runs                                    0
total_runs                                    6
extras_type                                   0
is_wicket                                     0
player_dismissed                            NaN
dismissal_kind                              NaN
fielder                                     NaN
Name: 60, dtype: object


In [898]:
# Selecting multiple rows
df.loc[[0, 100, 200]]


Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,,,
100,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,16,1,BB McCullum,Z Khan,DJ Hussey,1,0,1,0,0,,,
200,335982,2,Royal Challengers Bangalore,Kolkata Knight Riders,11,4,Z Khan,SC Ganguly,P Kumar,1,0,1,0,0,,,


In [899]:
# Selecting a specific column
print(df.loc[:, "over"])  # All rows, only the 'over' column (the comma separates the row and column selectors)

In [900]:
# Selecting multiple columns
print(df.loc[:, ["non_striker", "dismissal_kind"]])   # All rows, only 'non_striker', 'dismissal_kind' columns


        non_striker dismissal_kind
0       BB McCullum            NaN
1        SC Ganguly            NaN
2        SC Ganguly            NaN
3        SC Ganguly            NaN
4        SC Ganguly            NaN
...             ...            ...
260915      VR Iyer            NaN
260916      SS Iyer            NaN
260917      SS Iyer            NaN
260918      VR Iyer            NaN
260919      SS Iyer            NaN

[260920 rows x 2 columns]


In [901]:
# Selecting a specific value (row label 2, column 'dismissal_kind')
print(df.loc[2, "dismissal_kind"])

nan
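With the default index 0, 1, 2, ... the row labels and row positions coincide, so `loc` and `iloc` can look interchangeable. The sketch below uses a hypothetical DataFrame with a custom index to show where they differ.

```python
import pandas as pd

# Hypothetical mini DataFrame whose index labels are 10, 20, 30
demo = pd.DataFrame(
    {"batter": ["Player A", "Player B", "Player C"], "runs": [4, 0, 6]},
    index=[10, 20, 30],
)

by_position = demo.iloc[0, 0]      # first row, first column (by position)
by_label = demo.loc[10, "batter"]  # row labelled 10, column 'batter'

loc_slice = demo.loc[10:20]   # label slice: both ends included -> 2 rows
iloc_slice = demo.iloc[0:1]   # position slice: end excluded -> 1 row

print(by_position, by_label)            # Player A Player A
print(len(loc_slice), len(iloc_slice))  # 2 1
```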


## 10. Filtering Data Using Conditions

In [903]:
# Filtering data using conditions
print(df.loc[df["batsman_runs"] == 9])

Empty DataFrame
Columns: [match_id, inning, batting_team, bowling_team, over, ball, batter, bowler, non_striker, batsman_runs, extra_runs, total_runs, extras_type, is_wicket, player_dismissed, dismissal_kind, fielder]
Index: []
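An empty result like the one above simply means that no row satisfied the condition. The sketch below, on a hypothetical DataFrame, shows how `loc` can filter rows and pick columns in one step, and how the `empty` attribute can be used to check for a result without rows.

```python
import pandas as pd

# Hypothetical mini DataFrame
demo = pd.DataFrame({
    "batter": ["Player A", "Player B", "Player C"],
    "batsman_runs": [6, 1, 6],
})

# Row condition and column list in a single loc call
sixes = demo.loc[demo["batsman_runs"] == 6, ["batter"]]

# No row matches, so the result has columns but no rows
nines = demo.loc[demo["batsman_runs"] == 9]

print(sixes["batter"].tolist())  # ['Player A', 'Player C']
print(nines.empty)               # True
```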


## 11. Handling Duplicates

### Checking for Duplicates

In [906]:
# Count duplicate rows
print("\nDuplicate Rows:", df.duplicated().sum())


Duplicate Rows: 0


### Removing Duplicate Rows

In [908]:
df.drop_duplicates(inplace=True) # Remove duplicate rows permanently
df

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,,,
1,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,2,BB McCullum,P Kumar,SC Ganguly,0,0,0,0,0,,,
2,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,3,BB McCullum,P Kumar,SC Ganguly,0,1,1,wides,0,,,
3,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,4,BB McCullum,P Kumar,SC Ganguly,0,0,0,0,0,,,
4,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,5,BB McCullum,P Kumar,SC Ganguly,0,0,0,0,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,5,SS Iyer,AK Markram,VR Iyer,1,0,1,0,0,,,
260916,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,9,6,VR Iyer,AK Markram,SS Iyer,1,0,1,0,0,,,
260917,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,1,VR Iyer,Shahbaz Ahmed,SS Iyer,1,0,1,0,0,,,
260918,1426312,2,Kolkata Knight Riders,Sunrisers Hyderabad,10,2,SS Iyer,Shahbaz Ahmed,VR Iyer,1,0,1,0,0,,,
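Since this dataset has no duplicate rows, the effect of `drop_duplicates` is not visible here. The following sketch uses a hypothetical DataFrame that does contain a repeated row, and also shows the `subset` and `keep` parameters for deduplicating on selected columns only.

```python
import pandas as pd

# Hypothetical mini DataFrame: row 1 is an exact copy of row 0
demo = pd.DataFrame({
    "match_id": [1, 1, 1, 2],
    "batter": ["Player A", "Player A", "Player B", "Player A"],
    "runs": [4, 4, 0, 4],
})

n_full_duplicates = demo.duplicated().sum()  # rows identical to an earlier row
deduped = demo.drop_duplicates()             # returns a new DataFrame

# Keep only the first row seen for each batter
one_per_batter = demo.drop_duplicates(subset=["batter"], keep="first")

print(n_full_duplicates)              # 1
print(len(deduped))                   # 3
print(one_per_batter.index.tolist())  # [0, 2]
```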


## 12. Adding and Removing Columns

### Adding a New Column

In [1002]:
df['new_column2']=df['batsman_runs']+df['extra_runs']
df

Unnamed: 0,match_id,bowler,inning,batting_team,bowling_team,over,ball,batsman,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder,new_column2
0,392190,MM Patel,2,Kolkata Knight Riders,Rajasthan Royals,15,6,SC Ganguly,Yashpal Singh,6,1,7,noballs,0,,,,7
1,1216496,L Ngidi,1,Rajasthan Royals,Chennai Super Kings,19,3,JC Archer,TK Curran,6,1,7,noballs,0,,,,7
2,548341,J Theron,2,Pune Warriors,Deccan Chargers,16,1,SPD Smith,M Manhas,6,1,7,noballs,0,,,,7
3,336028,VRV Singh,2,Mumbai Indians,Kings XI Punjab,19,1,SD Chitnis,CRD Fernando,6,1,7,noballs,0,,,,7
4,734007,Sandeep Sharma,1,Sunrisers Hyderabad,Kings XI Punjab,18,4,NV Ojha,MC Henriques,6,1,7,noballs,0,,,,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,501267,JEC Franklin,1,Kolkata Knight Riders,Mumbai Indians,15,5,YK Pathan,JH Kallis,0,0,0,0,0,,,,0
260916,501267,JEC Franklin,1,Kolkata Knight Riders,Mumbai Indians,15,6,YK Pathan,JH Kallis,0,0,0,0,1,YK Pathan,caught,AN Ahmed,0
260917,733977,VR Aaron,1,Sunrisers Hyderabad,Royal Challengers Bangalore,4,6,DA Warner,S Dhawan,0,0,0,0,0,,,,0
260918,1254060,Sandeep Sharma,1,Kolkata Knight Riders,Sunrisers Hyderabad,13,4,N Rana,RA Tripathi,0,0,0,0,0,,,,0


### Dropping a Column

In [1004]:
# Dropping the column named 'new_column2'
df = df.drop('new_column2', axis=1)
df

Unnamed: 0,match_id,bowler,inning,batting_team,bowling_team,over,ball,batsman,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,392190,MM Patel,2,Kolkata Knight Riders,Rajasthan Royals,15,6,SC Ganguly,Yashpal Singh,6,1,7,noballs,0,,,
1,1216496,L Ngidi,1,Rajasthan Royals,Chennai Super Kings,19,3,JC Archer,TK Curran,6,1,7,noballs,0,,,
2,548341,J Theron,2,Pune Warriors,Deccan Chargers,16,1,SPD Smith,M Manhas,6,1,7,noballs,0,,,
3,336028,VRV Singh,2,Mumbai Indians,Kings XI Punjab,19,1,SD Chitnis,CRD Fernando,6,1,7,noballs,0,,,
4,734007,Sandeep Sharma,1,Sunrisers Hyderabad,Kings XI Punjab,18,4,NV Ojha,MC Henriques,6,1,7,noballs,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,501267,JEC Franklin,1,Kolkata Knight Riders,Mumbai Indians,15,5,YK Pathan,JH Kallis,0,0,0,0,0,,,
260916,501267,JEC Franklin,1,Kolkata Knight Riders,Mumbai Indians,15,6,YK Pathan,JH Kallis,0,0,0,0,1,YK Pathan,caught,AN Ahmed
260917,733977,VR Aaron,1,Sunrisers Hyderabad,Royal Challengers Bangalore,4,6,DA Warner,S Dhawan,0,0,0,0,0,,,
260918,1254060,Sandeep Sharma,1,Kolkata Knight Riders,Sunrisers Hyderabad,13,4,N Rana,RA Tripathi,0,0,0,0,0,,,


In [1006]:
df   #Displaying DataFrame

Unnamed: 0,match_id,bowler,inning,batting_team,bowling_team,over,ball,batsman,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,392190,MM Patel,2,Kolkata Knight Riders,Rajasthan Royals,15,6,SC Ganguly,Yashpal Singh,6,1,7,noballs,0,,,
1,1216496,L Ngidi,1,Rajasthan Royals,Chennai Super Kings,19,3,JC Archer,TK Curran,6,1,7,noballs,0,,,
2,548341,J Theron,2,Pune Warriors,Deccan Chargers,16,1,SPD Smith,M Manhas,6,1,7,noballs,0,,,
3,336028,VRV Singh,2,Mumbai Indians,Kings XI Punjab,19,1,SD Chitnis,CRD Fernando,6,1,7,noballs,0,,,
4,734007,Sandeep Sharma,1,Sunrisers Hyderabad,Kings XI Punjab,18,4,NV Ojha,MC Henriques,6,1,7,noballs,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,501267,JEC Franklin,1,Kolkata Knight Riders,Mumbai Indians,15,5,YK Pathan,JH Kallis,0,0,0,0,0,,,
260916,501267,JEC Franklin,1,Kolkata Knight Riders,Mumbai Indians,15,6,YK Pathan,JH Kallis,0,0,0,0,1,YK Pathan,caught,AN Ahmed
260917,733977,VR Aaron,1,Sunrisers Hyderabad,Royal Challengers Bangalore,4,6,DA Warner,S Dhawan,0,0,0,0,0,,,
260918,1254060,Sandeep Sharma,1,Kolkata Knight Riders,Sunrisers Hyderabad,13,4,N Rana,RA Tripathi,0,0,0,0,0,,,
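
As a variation on `drop(..., axis=1)`, the sketch below uses the `columns=` keyword, which some readers find easier to read, and `errors="ignore"` for column names that may not exist. The toy DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# columns= is equivalent to passing the label with axis=1
dropped_one = toy.drop(columns="c")

# A list drops several columns at once
dropped_many = toy.drop(columns=["b", "c"])

# errors="ignore" skips labels that are not present instead of raising KeyError
safe = toy.drop(columns=["not_there"], errors="ignore")

print(list(dropped_one.columns), list(dropped_many.columns), list(safe.columns))
```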


## 13. Sorting Data

### Sorting by a Column

In [1008]:
# Sorting data by total_runs in descending order
df=df.sort_values(by='total_runs', ascending=False)
print("\nSorted Data by total_runs:")
df


Sorted Data by total_runs:


Unnamed: 0,match_id,bowler,inning,batting_team,bowling_team,over,ball,batsman,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,392190,MM Patel,2,Kolkata Knight Riders,Rajasthan Royals,15,6,SC Ganguly,Yashpal Singh,6,1,7,noballs,0,,,
45,1422136,Mukesh Choudhary,2,Sunrisers Hyderabad,Chennai Super Kings,1,5,Abhishek Sharma,TM Head,6,1,7,noballs,0,,,
65,392238,JDP Oram,2,Royal Challengers Bangalore,Chennai Super Kings,18,1,LRPL Taylor,V Kohli,6,1,7,noballs,0,,,
64,980999,KC Cariappa,1,Royal Challengers Bangalore,Kings XI Punjab,6,5,V Kohli,CH Gayle,6,1,7,noballs,0,,,
63,336033,MM Patel,2,Chennai Super Kings,Rajasthan Royals,18,5,JA Morkel,MS Gony,6,1,7,noballs,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
200625,829797,A Nehra,2,Rajasthan Royals,Chennai Super Kings,0,2,AM Rahane,SR Watson,0,0,0,0,0,,,
200624,335999,VRV Singh,1,Deccan Chargers,Kings XI Punjab,7,4,VVS Laxman,RG Sharma,0,0,0,0,0,,,
200623,1426290,Vijaykumar Vyshak,1,Gujarat Titans,Royal Challengers Bengaluru,12,6,Rashid Khan,R Tewatia,0,0,0,0,0,,,
200622,980901,R Bhatia,1,Mumbai Indians,Rising Pune Supergiants,13,1,AT Rayudu,Harbhajan Singh,0,0,0,0,0,,,
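
Sorting by a single column leaves the order of tied rows unspecified with the default algorithm. A minimal sketch, assuming hypothetical toy data, shows how to sort on several columns with a separate direction for each and how `ignore_index=True` renumbers the rows afterwards.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "over": [2, 1, 1, 2],
    "ball": [1, 2, 1, 2],
    "total_runs": [4, 6, 6, 0],
})

# Highest total_runs first; ties are broken by over, then ball, both ascending
ordered = toy.sort_values(
    by=["total_runs", "over", "ball"],
    ascending=[False, True, True],
    ignore_index=True,  # replace the shuffled index with 0..n-1
)

print(ordered)
```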


## 14. Grouping and Aggregation

### Grouping by 'ball' and Calculating Aggregations

In [1010]:
# Grouping by 'ball' and calculating average 'over'
grouped_df = df.groupby('ball')['over'].mean()
print("Average over by ball:")
print(grouped_df)


Average over by ball:
ball
1      9.236674
2      9.216952
3      9.197776
4      9.178129
5      9.153214
6      9.127202
7      9.448688
8      9.919663
9     10.048889
10    11.100000
11    10.500000
Name: over, dtype: float64


In [1012]:
# Grouping by 'ball' and calculating median 'over'
grouped_df = df.groupby('ball')['over'].median()
print("Median of over by ball:")
print(grouped_df)

Median of over by ball:
ball
1      9.0
2      9.0
3      9.0
4      9.0
5      9.0
6      9.0
7     10.0
8     10.0
9     11.0
10    13.0
11    10.5
Name: over, dtype: float64


In [1014]:
# Grouping by 'ball' and calculating the variance of 'over'
grouped_df = df.groupby('ball')['over'].var()
print("Variance of over by ball:")
print(grouped_df)

Variance of over by ball:
ball
1      32.327694
2      32.233301
3      32.136332
4      32.035254
5      31.895832
6      31.753486
7      37.792093
8      41.495072
9      45.948492
10     45.610345
11    112.500000
Name: over, dtype: float64


In [1016]:
# Grouping by 'ball' and calculating the standard deviation of 'over'
grouped_df = df.groupby('ball')['over'].std()
print("Standard Deviation of 'over' by ball:")
print(grouped_df)


Standard Deviation of 'over' by ball:
ball
1      5.685745
2      5.677438
3      5.668892
4      5.659969
5      5.647640
6      5.635023
7      6.147527
8      6.441667
9      6.778532
10     6.753543
11    10.606602
Name: over, dtype: float64
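
The four statistics above can also be computed in a single call with `agg`. The sketch below uses hypothetical toy values chosen so the results are easy to check by hand. Note that pandas `var()` and `std()` use the sample formula (`ddof=1`) by default, whereas `np.std` defaults to the population formula (`ddof=0`).

```python
import pandas as pd

# Hypothetical toy data: two deliveries with ball=1, three with ball=2
toy = pd.DataFrame({"ball": [1, 1, 2, 2, 2], "over": [0, 4, 1, 3, 5]})

# One row per group, one column per statistic
summary = toy.groupby("ball")["over"].agg(["mean", "median", "var", "std"])

print(summary)
```

For `ball == 2` the values 1, 3, 5 have mean 3 and sample variance (4 + 0 + 4) / 2 = 4, so the standard deviation is 2.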


## 15. Renaming a Column

In [1018]:
df = df.rename(columns={"batter": "batsman"})
# Renaming the column 'batter' to 'batsman' in the DataFrame
# The columns parameter maps old column names to new column names
# Keys that are not existing column names are ignored by default
# rename() returns a new DataFrame, so the result is assigned back to df to keep the change

In [926]:
df          # Displaying the DataFrame

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batsman,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder,new_column1
15389,392190,2,Kolkata Knight Riders,Rajasthan Royals,15,6,SC Ganguly,MM Patel,Yashpal Singh,6,1,7,noballs,0,,,,7
180111,1216496,1,Rajasthan Royals,Chennai Super Kings,19,3,JC Archer,L Ngidi,TK Curran,6,1,7,noballs,0,,,,7
66653,548341,2,Pune Warriors,Deccan Chargers,16,1,SPD Smith,J Theron,M Manhas,6,1,7,noballs,0,,,,7
10814,336028,2,Mumbai Indians,Kings XI Punjab,19,1,SD Chitnis,VRV Singh,CRD Fernando,6,1,7,noballs,0,,,,7
103698,734007,1,Sunrisers Hyderabad,Kings XI Punjab,18,4,NV Ojha,Sandeep Sharma,MC Henriques,6,1,7,noballs,0,,,,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57468,501267,1,Kolkata Knight Riders,Mumbai Indians,15,5,YK Pathan,JEC Franklin,JH Kallis,0,0,0,0,0,,,,0
57469,501267,1,Kolkata Knight Riders,Mumbai Indians,15,6,YK Pathan,JEC Franklin,JH Kallis,0,0,0,0,1,YK Pathan,caught,AN Ahmed,0
100026,733977,1,Sunrisers Hyderabad,Royal Challengers Bangalore,4,6,DA Warner,VR Aaron,S Dhawan,0,0,0,0,0,,,,0
194187,1254060,1,Kolkata Knight Riders,Sunrisers Hyderabad,13,4,N Rana,Sandeep Sharma,RA Tripathi,0,0,0,0,0,,,,0


## 16. Resetting and Setting Index

### Setting a Multi-Level Index

In [1020]:
df = df.set_index(["match_id", "bowler"])
# Setting 'match_id' and 'bowler' as the new multi-level index of the DataFrame
# This organizes the data hierarchically, which helps with grouping and label-based selection
# set_index() returns a new DataFrame, so the result is assigned back to df to keep the change

In [1022]:
df          # Displaying the DataFrame

match_id,bowler,inning,batting_team,bowling_team,over,ball,batsman,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
392190,MM Patel,2,Kolkata Knight Riders,Rajasthan Royals,15,6,SC Ganguly,Yashpal Singh,6,1,7,noballs,0,,,
1422136,Mukesh Choudhary,2,Sunrisers Hyderabad,Chennai Super Kings,1,5,Abhishek Sharma,TM Head,6,1,7,noballs,0,,,
392238,JDP Oram,2,Royal Challengers Bangalore,Chennai Super Kings,18,1,LRPL Taylor,V Kohli,6,1,7,noballs,0,,,
980999,KC Cariappa,1,Royal Challengers Bangalore,Kings XI Punjab,6,5,V Kohli,CH Gayle,6,1,7,noballs,0,,,
336033,MM Patel,2,Chennai Super Kings,Rajasthan Royals,18,5,JA Morkel,MS Gony,6,1,7,noballs,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
829797,A Nehra,2,Rajasthan Royals,Chennai Super Kings,0,2,AM Rahane,SR Watson,0,0,0,0,0,,,
335999,VRV Singh,1,Deccan Chargers,Kings XI Punjab,7,4,VVS Laxman,RG Sharma,0,0,0,0,0,,,
1426290,Vijaykumar Vyshak,1,Gujarat Titans,Royal Challengers Bengaluru,12,6,Rashid Khan,R Tewatia,0,0,0,0,0,,,
980901,R Bhatia,1,Mumbai Indians,Rising Pune Supergiants,13,1,AT Rayudu,Harbhajan Singh,0,0,0,0,0,,,
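
Once a multi-level index is set, rows can be selected by label. The following is a sketch on hypothetical toy data (match IDs and bowler names are made up) showing selection by the outer level, by a full `(match_id, bowler)` pair, and by the inner level using `xs`.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "match_id": [1, 1, 2],
    "bowler": ["A", "B", "A"],
    "total_runs": [4, 0, 6],
})
indexed = toy.set_index(["match_id", "bowler"])

# All rows for match 1 (outer index level)
match_one = indexed.loc[1]

# A single value for one (match_id, bowler) pair
pair_runs = indexed.loc[(1, "B"), "total_runs"]

# All rows for bowler "A" across matches (inner index level)
bowler_a = indexed.xs("A", level="bowler")

print(match_one, pair_runs, bowler_a, sep="\n\n")
```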


### Resetting the Index

In [1024]:
df = df.reset_index()
# Resetting the index of the DataFrame, converting the multi-level index back into columns
# This restores 'match_id' and 'bowler' as regular columns and gives the rows a default 0..n-1 index
# reset_index() returns a new DataFrame, so the result is assigned back to df to keep the change

In [1026]:
df         # Displaying the DataFrame

Unnamed: 0,match_id,bowler,inning,batting_team,bowling_team,over,ball,batsman,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,392190,MM Patel,2,Kolkata Knight Riders,Rajasthan Royals,15,6,SC Ganguly,Yashpal Singh,6,1,7,noballs,0,,,
1,1422136,Mukesh Choudhary,2,Sunrisers Hyderabad,Chennai Super Kings,1,5,Abhishek Sharma,TM Head,6,1,7,noballs,0,,,
2,392238,JDP Oram,2,Royal Challengers Bangalore,Chennai Super Kings,18,1,LRPL Taylor,V Kohli,6,1,7,noballs,0,,,
3,980999,KC Cariappa,1,Royal Challengers Bangalore,Kings XI Punjab,6,5,V Kohli,CH Gayle,6,1,7,noballs,0,,,
4,336033,MM Patel,2,Chennai Super Kings,Rajasthan Royals,18,5,JA Morkel,MS Gony,6,1,7,noballs,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260915,829797,A Nehra,2,Rajasthan Royals,Chennai Super Kings,0,2,AM Rahane,SR Watson,0,0,0,0,0,,,
260916,335999,VRV Singh,1,Deccan Chargers,Kings XI Punjab,7,4,VVS Laxman,RG Sharma,0,0,0,0,0,,,
260917,1426290,Vijaykumar Vyshak,1,Gujarat Titans,Royal Challengers Bengaluru,12,6,Rashid Khan,R Tewatia,0,0,0,0,0,,,
260918,980901,R Bhatia,1,Mumbai Indians,Rising Pune Supergiants,13,1,AT Rayudu,Harbhajan Singh,0,0,0,0,0,,,
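
`reset_index` is also useful after sorting, when the old row labels are out of order. A small sketch with hypothetical toy data contrasts the default behaviour, which keeps the old index as a column, with `drop=True`, which discards it.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"total_runs": [0, 7, 4]})

# After sorting, the index labels follow the rows: [1, 2, 0]
sorted_toy = toy.sort_values("total_runs", ascending=False)

# Default: the old index is kept as a new column named 'index'
kept = sorted_toy.reset_index()

# drop=True: the old index is discarded and rows are renumbered 0..n-1
dropped = sorted_toy.reset_index(drop=True)

print(kept)
print(dropped)
```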


## 17. Getting the Dimensions of the DataFrame

In [1028]:
df.shape

(260920, 17)
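
`shape` is a tuple of `(rows, columns)`, so it can be unpacked directly. The sketch below uses a hypothetical toy DataFrame and relates `shape` to `len()` and `size`.

```python
import pandas as pd

# Hypothetical toy data: 3 rows, 2 columns
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Unpack the (rows, columns) tuple
n_rows, n_cols = toy.shape

# len() gives the row count; size gives rows * columns
row_count = len(toy)
cell_count = toy.size

print(n_rows, n_cols, row_count, cell_count)
```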