## Reading a CSV file with pandas

In this notebook, we read a CSV file into a pandas DataFrame and perform some basic data analysis on it.

In [None]:
# Importing necessary libraries
import pandas as pd

# Reading the csv file into a pandas dataframe
matches_df = pd.read_csv('matches.csv')
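
Before analysing the data, it can help to confirm that the file loaded as expected. The sketch below is a minimal illustration that reads a tiny in-memory CSV instead of `matches.csv`, so it runs without the real file. The rows, team names, and stadium names are hypothetical; only the column names `city`, `venue`, and `winner` are taken from the cells in this notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for matches.csv so the example is self-contained.
# The second row leaves `winner` empty, which pandas reads as a missing value.
sample_csv = io.StringIO(
    "id,city,venue,winner\n"
    "1,Hyderabad,Stadium A,Team X\n"
    "2,Pune,Stadium B,\n"
)
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)             # (number of rows, number of columns)
print(sample_df.columns.tolist())  # column names as a plain list
print(sample_df.head())            # first rows, for a quick visual check
```

The same three calls can be applied to `matches_df` once the real file has been read.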

### Finding the list of unique cities where matches were played

The `city` column records where each match was played. `unique()` returns each distinct value once, in order of first appearance.

In [None]:
unique_cities = matches_df['city'].unique()
print('List of unique cities where matches were played:')
print(unique_cities)

List of unique cities where matches were played:
['Hyderabad' 'Pune' 'Rajkot' 'Indore' 'Bangalore' 'Mumbai' 'Kolkata'
 'Delhi' 'Chandigarh' 'Kanpur' 'Jaipur' 'Chennai' 'Cape Town'
 'Port Elizabeth' 'Durban' 'Centurion' 'East London' 'Johannesburg'
 'Kimberley' 'Bloemfontein' 'Ahmedabad' 'Cuttack' 'Nagpur' 'Dharamsala'
 'Kochi' 'Visakhapatnam' 'Raipur' 'Ranchi' 'Abu Dhabi' 'Sharjah'
 'Dubai' 'Rising Pune Supergiants' 'Kanpur Nagar']
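
If a sorted list without missing entries is easier to scan, one possible approach is to drop nulls before calling `unique()`. This is a sketch on hypothetical values, not on the real `city` column:

```python
import pandas as pd

# Hypothetical city values, including one missing entry and one duplicate.
cities = pd.Series(["Pune", "Delhi", None, "Pune", "Mumbai"], name="city")

# dropna() removes missing entries; sorted() returns a plain, ordered list.
sorted_cities = sorted(cities.dropna().unique())
print(sorted_cities)

# nunique() counts distinct values and ignores missing values by default.
print(cities.nunique())
```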


### Finding the columns that contain null values

Let's find which columns in the dataset contain null values, if any.

In [None]:
null_columns = matches_df.columns[matches_df.isnull().any()]
print('Columns containing null values:')
print(null_columns)

Columns containing null values:
Index(['winner', 'player_of_match', 'umpire1', 'umpire2', 'umpire3'], dtype='object')
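
Knowing which columns contain nulls does not say how many values are missing. A minimal sketch of counting them per column, using a hypothetical frame whose column names mirror a few of those above:

```python
import numpy as np
import pandas as pd

# Hypothetical data: `winner` has one null, `umpire3` is entirely null,
# and `venue` has none.
sample_df = pd.DataFrame({
    "winner": ["Team X", np.nan, "Team Y"],
    "umpire3": [np.nan, np.nan, np.nan],
    "venue": ["Stadium A", "Stadium B", "Stadium A"],
})

# isnull() marks each missing cell; sum() counts them column by column.
null_counts = sample_df.isnull().sum()

# Keep only columns with at least one null, largest count first.
null_counts = null_counts[null_counts > 0].sort_values(ascending=False)
print(null_counts)
```

A column that is almost entirely null may be a candidate for dropping, while a column with only a few nulls might be better handled row by row.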


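### Listing the top 5 venues by matches played

Let's list the five venues that hosted the most matches. `value_counts()` returns counts sorted in descending order, so `head()`, which returns five rows by default, gives the top five.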

In [None]:
top_venues = matches_df['venue'].value_counts().head()
print('Top 5 most played venues:')
print(top_venues)

Top 5 most played venues:
Eden Gardens                                  77
Wankhede Stadium                              73
M Chinnaswamy Stadium                         73
Feroz Shah Kotla                              67
Rajiv Gandhi International Stadium, Uppal     56
Name: venue, dtype: int64
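
Raw counts can also be expressed as a share of all matches by passing `normalize=True` to `value_counts()`. The sketch below uses hypothetical venue names rather than the real `venue` column:

```python
import pandas as pd

# Hypothetical venue values: A appears 3 times, B twice, C once.
venues = pd.Series(
    ["Stadium A", "Stadium B", "Stadium A", "Stadium C", "Stadium A", "Stadium B"],
    name="venue",
)

# Counts per venue, already sorted from most to least frequent.
top_two = venues.value_counts().head(2)
print(top_two)

# Same ranking, expressed as a fraction of all rows.
venue_share = venues.value_counts(normalize=True).head(2)
print(venue_share)
```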


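### Ranking players of the match by total win margin

Cricket has no goals, and the columns used below record match-level win margins rather than individual scores. The cell therefore ranks each `player_of_match` by the sum of `win_by_runs` and `win_by_wickets` over the matches in which that player received the award.

In [None]:
# Keep only the award winner and the two win-margin columns
player_scores = matches_df[['player_of_match', 'win_by_runs', 'win_by_wickets']]
# Sum the win margins over the matches in which each player won the award
player_scores = player_scores.groupby('player_of_match').sum()
# Sort by runs margin, then wickets margin, and keep the five largest totals
player_scores = player_scores.sort_values(by=['win_by_runs', 'win_by_wickets'], ascending=False).head()
print('Top 5 players of the match by total win margin:')
print(player_scores)

Top 5 players of the match by total win margin: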
                 win_by_runs  win_by_wickets
player_of_match                              
CH Gayle                  0               1
MEK Hussey                0               1
SR Watson                 0               1
SR Tendulkar              0               1
YK Pathan                 0               1
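
The `player_of_match` column also supports a simpler ranking: counting how many times each player received the award. This is a sketch of that alternative, not the computation used in the cell above, and the player names are hypothetical:

```python
import pandas as pd

# Hypothetical awards: Player A three times, Player B twice, Player C once,
# plus one match with no award recorded.
sample_df = pd.DataFrame({
    "player_of_match": [
        "Player A", "Player B", "Player A", None,
        "Player C", "Player A", "Player B",
    ],
})

# value_counts() skips missing values and sorts by count, largest first.
award_counts = sample_df["player_of_match"].value_counts().head(2)
print(award_counts)
```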
