# JN01 - Descriptive Statistics with Python

## Introduction

Welcome to this Jupyter Notebook project, where we apply descriptive statistics to an Indian Premier League (IPL) auction dataset. We examine the numerical characteristics of IPL player auctions and summarize the central tendency, variability, and distribution of the auction data. Measures such as the mean, median, mode, and standard deviation become our tools for understanding team strategies, player valuations, and the broader trends shaping one of cricket's most dynamic spectacles. Along the way, this exercise should deepen your appreciation of the statistics behind the IPL and sharpen your skills in analyzing sports data.

## Import packages and libraries

Before getting started, we need to import the required libraries. Throughout this notebook, we use pandas and NumPy for data manipulation and Matplotlib for plotting.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

## Opening the Dataset using Pandas

In this project, our first step is to open the IPL auction dataset file with the pandas library. The following code reads the dataset into a pandas DataFrame, setting the stage for our descriptive statistical analysis:

```python
# Specify the path to the dataset. "auction.csv" is in the current working
# directory; replace it with the actual path if your copy lives elsewhere,
# e.g. "path/to/your/IPL_auction_dataset.csv".
file_path = "auction.csv"

# Read the dataset into a pandas DataFrame
ipl_auction_data = pd.read_csv(file_path)
```
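Checking column types right after loading is a cheap way to catch parsing problems before any statistics are computed. The sketch below uses a small hypothetical CSV (the player names and the text value "unknown" are invented for illustration, not taken from the real file) to show that a single non-numeric entry is enough to keep a column from being parsed as numbers. On the real data you would run `ipl_auction_data.dtypes` instead.

```python
import io

import pandas as pd

# Hypothetical three-row CSV (not the real auction file). One winning bid is
# recorded as the text "unknown", which pandas cannot parse as a number.
sample_csv = io.StringIO(
    "Player,Base price,Winning bid,Year\n"
    "Player A,30.5,30.5,2013\n"
    "Player B,61.0,unknown,2013\n"
    "Player C,12.2,12.2,2014\n"
)
sample = pd.read_csv(sample_csv)

# Show the dtype pandas inferred for each column: "Base price" and "Year" are
# numeric, while "Winning bid" falls back to a non-numeric dtype such as object
print(sample.dtypes)
```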

## Explore the data

Let's start with the head() function to get a quick overview of the dataset. head(n) returns the first n rows of the DataFrame; called without an argument, it returns the first five.

```python
# Display the first few rows of the dataset to get an overview
ipl_auction_data.head()
```

| Unnamed: 0.1 | Unnamed: 0 | Country | Player | Team | Base price | Winning bid | Year |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Guyana | Christopher Barnwell | Royal Challengers Bangalore | 30.5 | 30.5 | 2013 |
| 1 | 1 | South Africa | Johan Botha | Delhi Daredevils | 183.0 | 274.5 | 2013 |
| 2 | 2 | Australia | Daniel Christian | Royal Challengers Bangalore | 61.0 | 61.0 | 2013 |
| 3 | 3 | Australia | Michael Clarke | Pune Warriors India | 244.0 | 244.0 | 2013 |
| 4 | 4 | Australia | Nathan Coulter-Nile | Mumbai Indians | 61.0 | 274.5 | 2013 |


```python
# Display the first 10 rows of the dataset to get an overview
ipl_auction_data.head(10)
```

| Unnamed: 0.1 | Unnamed: 0 | Country | Player | Team | Base price | Winning bid | Year |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Guyana | Christopher Barnwell | Royal Challengers Bangalore | 30.5 | 30.5 | 2013 |
| 1 | 1 | South Africa | Johan Botha | Delhi Daredevils | 183.0 | 274.5 | 2013 |
| 2 | 2 | Australia | Daniel Christian | Royal Challengers Bangalore | 61.0 | 61.0 | 2013 |
| 3 | 3 | Australia | Michael Clarke | Pune Warriors India | 244.0 | 244.0 | 2013 |
| 4 | 4 | Australia | Nathan Coulter-Nile | Mumbai Indians | 61.0 | 274.5 | 2013 |
| 5 | 5 | Sri Lanka | Akila Dananjaya | Chennai Super Kings | 12.2 | 12.2 | 2013 |
| 6 | 6 | South Africa | Quinton de Kock | Sunrisers Hyderabad | 12.2 | 12.2 | 2013 |
| 7 | 7 | Barbados | Fidel Edwards | Rajasthan Royals | 61.0 | 128.1 | 2013 |
| 8 | 8 | Australia | James Faulkner | Rajasthan Royals | 61.0 | 244.0 | 2013 |
| 9 | 9 | India | Manpreet Gony | Kings XI Punjab | 122.0 | 305.0 | 2013 |


The dataset lists each player's country, the team that bought them, their base price, the winning bid, and the year of the auction, covering IPL auctions from 2013 onwards. The two "Unnamed" columns simply count rows; they appear to be leftover index columns from earlier saves of the CSV file.
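The year coverage can be checked directly instead of being read off the first rows. Here is a minimal sketch on a small hypothetical frame (the years are illustrative); on the real data you would replace `sample` with `ipl_auction_data`.

```python
import pandas as pd

# Hypothetical stand-in for ipl_auction_data with only a Year column
sample = pd.DataFrame({"Year": [2013, 2013, 2014, 2014, 2014, 2015]})

# Earliest and latest auction years covered by the data
first_year, last_year = sample["Year"].min(), sample["Year"].max()

# Number of rows (auction records) per year, in chronological order
rows_per_year = sample["Year"].value_counts().sort_index()

print(f"Years covered: {first_year} to {last_year}")
print(rows_per_year)
```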

## Use describe() to compute descriptive stats
Now that we have a better understanding of the dataset, let's use Python to compute descriptive stats.

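When computing descriptive stats in Python, the most useful method to know is pandas' describe(). Data professionals use describe() as a convenient way to calculate many key stats at once. For a text (object) column, it reports the count, the number of distinct values (unique), the most frequent value (top), and that value's frequency (freq). For a numeric column, describe() gives you the following output: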

- count: Number of non-NA/null observations
- mean: The arithmetic average
- std: The standard deviation
- min: The smallest (minimum) value
- 25%: The first quartile (25th percentile)
- 50%: The median (50th percentile)
- 75%: The third quartile (75th percentile)
- max: The largest (maximum) value

- Reference: pandas.DataFrame.describe
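describe() adapts its output to the column's type. Here is a minimal sketch on a small hypothetical Series (the values are illustrative, not from the IPL data): summarized as numbers, the five values produce the statistics listed above; stored as text, the same values produce a categorical summary (count, unique, top, freq) instead.

```python
import pandas as pd

# Hypothetical numeric Series (not taken from the IPL data)
numeric = pd.Series([10, 20, 20, 30, 100])
numeric_summary = numeric.describe()
print(numeric_summary)

# The same values stored as text: describe() switches to a categorical summary
text = numeric.astype(str)
text_summary = text.describe()
print(text_summary)
```

Note that the "50%" entry of the numeric summary is the median, so `numeric.describe()["50%"]` and `numeric.median()` agree.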

## Interesting Stat in the Dataset - Winning Bid

One of the most interesting columns in this dataset is "Winning bid", the final amount at which each player was acquired by a team during the IPL auction. Analyzing this column can reveal how player values are distributed and how much teams were willing to pay. Descriptive statistics such as the mean, median, and standard deviation of the winning bids summarize the overall pricing landscape of the IPL auctions.

```python
# Assuming 'ipl_auction_data' is your DataFrame containing the IPL auction dataset
winning_bid_stats = ipl_auction_data['Winning bid'].describe()

# Display the descriptive statistics
print(winning_bid_stats)
```


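```
count     1052
unique     128
top         20
freq       199
Name: Winning bid, dtype: object
```

This is not the numeric summary we expected. The output reports count, unique, top, and freq instead of the mean and standard deviation because pandas stored "Winning bid" with dtype object: when at least one entry in a column cannot be parsed as a number, read_csv keeps the whole column as text. The summary still tells us something: the 1052 winning bids take 128 distinct values, and the most common one, 20, occurs 199 times.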
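The `dtype: object` line shows that "Winning bid" was read as text. To find which raw entries prevent it from being parsed as numbers, you can compare the text values with their parsed counterparts. A minimal sketch on a hypothetical column; the value "unknown" is invented for illustration and is not claimed to appear in the real data.

```python
import pandas as pd

# Hypothetical raw "Winning bid" values read as text; "unknown" is invented
# for illustration. On the real data, use ipl_auction_data["Winning bid"].
raw = pd.Series(["30.5", "274.5", "unknown", "61.0", None], name="Winning bid")

# Parse what can be parsed; anything else becomes NaN
parsed = pd.to_numeric(raw, errors="coerce")

# Offending entries: present in the raw data but NaN after parsing
# (raw.notna() excludes values that were missing to begin with)
bad_values = raw[raw.notna() & parsed.isna()]
print(bad_values.value_counts())
```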


## Another Interesting Stat in the Dataset - Base Price

Another interesting column is "Base price", the starting bid at which each player enters the auction. Analyzing it shows how players were valued before bidding began and how widely those starting points vary. We can apply describe() to "Base price" in the same way to see the range of initial valuations and any disparities in how players were priced before the auction process began.

```python
# Assuming 'ipl_auction_data' is your DataFrame containing the IPL auction dataset
base_price_stats = ipl_auction_data['Base price'].describe()

# Display the descriptive statistics
print(base_price_stats)
```

```
count    1052.000000
mean       65.252091
std        63.278684
min        10.000000
25%        20.000000
50%        30.000000
75%       100.000000
max       244.000000
Name: Base price, dtype: float64
```
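The mean base price (about 65.3) is more than double the median (30.0), a sign that base prices are right-skewed: most players start low, while a smaller group starts much higher. One quick way to quantify this from the summary alone is Pearson's second skewness coefficient, 3 * (mean - median) / std. This coefficient is our addition, not a method used elsewhere in this notebook; with the raw column, `Series.skew()` computes the sample skewness directly.

```python
# Values reported by describe() for the "Base price" column
mean_base_price = 65.252091
median_base_price = 30.0
std_base_price = 63.278684


def pearson_median_skewness(mean: float, median: float, std: float) -> float:
    """Pearson's second skewness coefficient: 3 * (mean - median) / std."""
    return 3 * (mean - median) / std


skew = pearson_median_skewness(mean_base_price, median_base_price, std_base_price)
print(f"Pearson median skewness: {skew:.2f}")
```

A positive value indicates a longer right tail; here it is roughly 1.67.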


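describe() also works on text columns. Applied to "Player", it summarizes how often player names occur:

```python
# Assuming 'ipl_auction_data' is your DataFrame containing the IPL auction dataset
player_stats = ipl_auction_data['Player'].describe()

# Display the descriptive statistics
print(player_stats)
```

```
count               1052
unique               593
top       Jaydev Unadkat
freq                  10
Name: Player, dtype: object
```

The 1052 rows contain only 593 distinct player names, so many players appear more than once; Jaydev Unadkat appears most often, 10 times.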
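describe() reports only the single most frequent player. To rank several players by how often they appear, value_counts() is a natural complement. Here is a minimal sketch on a hypothetical Player column (the names are placeholders); on the real data, `ipl_auction_data['Player'].value_counts().head(10)` would list the ten most frequent players.

```python
import pandas as pd

# Hypothetical Player column with repeated names (one row per auction entry)
players = pd.Series(["Player A", "Player B", "Player A", "Player C", "Player A", "Player B"])

# describe() reports only the single most frequent value ...
print(players.describe()[["top", "freq"]])

# ... while value_counts() ranks every name by frequency (descending)
top_players = players.value_counts().head(3)
print(top_players)
```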


## Functions for stats

The describe() method is useful because it reveals a variety of key stats all at once. pandas also provides separate methods for the mean, median, standard deviation, minimum, and maximum: mean(), median(), std(), min(), and max(). These individual methods are useful when you want to do further computations based on descriptive stats. For example, you can use min() and max() together to compute the range of your data.
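Here is a minimal sketch of these individual methods on a small hypothetical Series (the values are illustrative, not from the IPL data). Note that std() computes the sample standard deviation (ddof=1), the same quantity describe() reports.

```python
import pandas as pd

# Hypothetical numeric values (not from the IPL data)
prices = pd.Series([10.0, 20.0, 20.0, 30.0, 100.0])

stats = {
    "mean": prices.mean(),
    "median": prices.median(),
    "std": prices.std(),  # sample standard deviation (ddof=1), as in describe()
    "min": prices.min(),
    "max": prices.max(),
}

# Individual results can feed further computations, e.g. the range
stats["range"] = stats["max"] - stats["min"]
print(stats)
```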

## Use max() and min() to compute range


Using the "Base price" and "Winning bid" columns, we now calculate the range of each. The range is the difference between the largest and smallest values in a dataset (range = max - min). It is the simplest measure of how spread out the values are, although it depends only on the two extreme values. The following code calculates it:

```python
# Assuming 'ipl_auction_data' is your DataFrame containing the IPL auction dataset

# Calculate the range for the 'Base price' column
base_price_range = ipl_auction_data['Base price'].max() - ipl_auction_data['Base price'].min()

# Calculate the range for the 'Winning bid' column
winning_bid_range = ipl_auction_data['Winning bid'].max() - ipl_auction_data['Winning bid'].min()

# Display the calculated ranges
print(f"Range for Base Price: {base_price_range}")
print(f"Range for Winning Bid: {winning_bid_range}")
```

```
TypeError: unsupported operand type(s) for -: 'str' and 'str'
```

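### Oops! Fixing the Error

The error TypeError: unsupported operand type(s) for -: 'str' and 'str' means that Python tried to subtract one string from another, which is not supported. The describe() outputs above already hint at the cause: "Base price" was loaded as float64, so its range computes without trouble, but "Winning bid" has dtype object. Its max() and min() therefore return strings, and subtracting them raises the TypeError.

To fix this issue, we need to make sure these columns are treated as numeric types rather than strings. pd.to_numeric() with errors='coerce' converts each value to a number and replaces any value that cannot be parsed with NaN. Converting "Base price" as well is harmless, since it is already numeric.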

```python
# Assuming 'ipl_auction_data' is your DataFrame containing the IPL auction dataset

# Convert 'Base price' and 'Winning bid' columns to numeric;
# errors='coerce' turns values that cannot be parsed into NaN
ipl_auction_data['Base price'] = pd.to_numeric(ipl_auction_data['Base price'], errors='coerce')
ipl_auction_data['Winning bid'] = pd.to_numeric(ipl_auction_data['Winning bid'], errors='coerce')

# Calculate the range for the 'Base price' column (max() and min() skip NaN)
base_price_range = ipl_auction_data['Base price'].max() - ipl_auction_data['Base price'].min()

# Calculate the range for the 'Winning bid' column
winning_bid_range = ipl_auction_data['Winning bid'].max() - ipl_auction_data['Winning bid'].min()

# Display the calculated ranges
print(f"Range for Base Price: {base_price_range}")
print(f"Range for Winning Bid: {winning_bid_range}")
```


```
Range for Base Price: 234.0
Range for Winning Bid: 1615.0
```
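The conversion fixed the error, but errors='coerce' hides any unparseable entries: they become NaN, and max() and min() skip NaN by default (skipna=True). A minimal sketch on a hypothetical converted column (the values are illustrative) shows how to count those entries and what happens with skipna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical column after pd.to_numeric(..., errors="coerce"):
# the entry that could not be parsed has become NaN.
converted = pd.Series([30.5, 274.5, np.nan, 61.0])

# max() and min() skip NaN by default (skipna=True) ...
value_range = converted.max() - converted.min()

# ... so the range silently ignores coerced entries; count them explicitly
n_missing = converted.isna().sum()

# With skipna=False, a single NaN propagates to the result instead
strict_max = converted.max(skipna=False)

print(value_range, n_missing, strict_max)
```

Running `ipl_auction_data['Winning bid'].isna().sum()` after the conversion would tell you how many real winning bids were excluded from the range.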


The reported ranges, 234.0 for "Base price" and 1615.0 for "Winning bid", measure the spread between the lowest and highest values in each column. The base price range matches the describe() output (244.0 - 10.0 = 234.0). The winning bid range is almost seven times larger, showing that final auction prices vary far more than starting prices. Note that any winning bids coerced to NaN are excluded from this calculation.
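Because the range depends only on the two most extreme values, a single unusually large bid can dominate it. The interquartile range (IQR = Q3 - Q1) is a more robust measure of spread; for "Base price", the quartiles reported by describe() give 100.0 - 20.0 = 80.0. The sketch below contrasts the two on hypothetical values; the helper `range_and_iqr` is our own, not part of pandas.

```python
import pandas as pd


def range_and_iqr(values: pd.Series) -> tuple[float, float]:
    """Return (max - min, Q3 - Q1) for a numeric Series."""
    q1, q3 = values.quantile([0.25, 0.75])
    return values.max() - values.min(), q3 - q1


# Hypothetical prices, then the same prices with the largest one made extreme
typical = pd.Series([20.0, 20.0, 30.0, 50.0, 100.0])
with_outlier = pd.Series([20.0, 20.0, 30.0, 50.0, 1000.0])

print(range_and_iqr(typical))       # range 80.0, IQR 30.0
print(range_and_iqr(with_outlier))  # range 980.0, IQR still 30.0
```

The range grows more than twelvefold while the IQR does not move, which is why the IQR is often preferred for skewed price data.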

## Conclusion

In this Jupyter Notebook project, we explored the IPL auction dataset, covering player details, team associations, base prices, and winning bids from 2013 onwards. We focused on the "Winning bid" and "Base price" columns to examine player valuations and starting prices. Descriptive statistics such as the mean, median, and range summarized the pricing landscape, and we diagnosed and fixed a TypeError caused by "Winning bid" being stored as text so that our computations are accurate. Along the way, we gained a deeper appreciation for the statistical nuances of IPL auctions and practiced core skills for analyzing sports data.