# Inspecting a DataFrame Object

## About the Data
In this notebook, we will be working with FIFA player data for 2022, obtained from [Kaggle](https://www.kaggle.com/datasets/stefanoleone992/fifa-22-complete-player-dataset).

## Setup
We will be working with the `players_22.csv` file, so we need to handle our imports and read it in.

In [19]:
import pandas as pd

In [3]:
players = pd.read_csv('players_22.csv')
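
For a file this wide, it can help to preview a few columns and rows before loading everything. The following is a minimal sketch using the `usecols` and `nrows` parameters of `pd.read_csv`; the in-memory CSV text and its values are hypothetical stand-ins, not the contents of `players_22.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk; not the real players_22.csv.
csv_text = io.StringIO(
    "sofifa_id,short_name,overall,age\n"
    "1,A. Player,80,25\n"
    "2,B. Player,75,31\n"
    "3,C. Player,68,19\n"
)

# Read only two of the columns and only the first two data rows.
preview = pd.read_csv(csv_text, usecols=["short_name", "age"], nrows=2)
print(preview)
```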


## Examining DataFrames
### Is it empty?

In [4]:
players.empty

False
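
The `empty` attribute is `True` only when at least one axis has length zero; a DataFrame whose cells are all missing is not considered empty. A small sketch with hypothetical toy frames (not the FIFA data) illustrates the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames, not the FIFA data.
no_rows = pd.DataFrame(columns=["short_name", "age"])
all_missing = pd.DataFrame({"age": [np.nan, np.nan]})

print(no_rows.empty)      # zero rows, so this frame is empty
print(all_missing.empty)  # two rows of NaN still count as rows
```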

### What are the dimensions?

In [5]:
players.shape

(19239, 111)
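
Since `shape` is a plain `(rows, columns)` tuple, it can be unpacked directly. The sketch below uses a hypothetical three-row frame rather than `players`:

```python
import pandas as pd

# Hypothetical toy frame, not the FIFA data.
toy = pd.DataFrame({"short_name": ["A", "B", "C"], "age": [21, 34, 29]})

# Unpack the tuple into separate row and column counts.
n_rows, n_cols = toy.shape
print(f"{n_rows} rows x {n_cols} columns")

# len() on a DataFrame gives the number of rows only.
print(len(toy))
```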

### What columns do we have?
We know there are 111 columns, but what are they? Let's use the `columns` attribute to see:

In [6]:
players.columns

Index(['sofifa_id', 'player_url', 'short_name', 'long_name',
       'player_positions', 'overall', 'potential', 'value_eur', 'wage_eur',
       'age',
       ...
       'lcb', 'cb', 'rcb', 'rb', 'gk', 'player_face_url', 'club_logo_url',
       'club_flag_url', 'nation_logo_url', 'nation_flag_url'],
      dtype='object', length=111)
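
The `Index` display above is truncated in the middle. One way to see or filter every name is to convert the index to a regular list. This is a sketch on a hypothetical toy frame that borrows a few column names from the dataset:

```python
import pandas as pd

# Hypothetical toy frame reusing a few column names from the dataset.
toy = pd.DataFrame(
    {"short_name": ["A"], "value_eur": [1.0], "wage_eur": [2.0], "age": [21]}
)

# A plain Python list prints in full, without the "..." truncation.
all_columns = toy.columns.tolist()
print(all_columns)

# Column names are strings, so they can be filtered like any other list.
eur_columns = [name for name in toy.columns if name.endswith("_eur")]
print(eur_columns)
```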

### What does the data look like?
View rows from the top with `head()`, which returns the first five rows by default:

In [7]:
players.head()

Unnamed: 0,sofifa_id,player_url,short_name,long_name,player_positions,overall,potential,value_eur,wage_eur,age,...,lcb,cb,rcb,rb,gk,player_face_url,club_logo_url,club_flag_url,nation_logo_url,nation_flag_url
0,158023,https://sofifa.com/player/158023/lionel-messi/...,L. Messi,Lionel Andrés Messi Cuccittini,"RW, ST, CF",93,93,78000000.0,320000.0,34,...,50+3,50+3,50+3,61+3,19+3,https://cdn.sofifa.net/players/158/023/22_120.png,https://cdn.sofifa.net/teams/73/60.png,https://cdn.sofifa.net/flags/fr.png,https://cdn.sofifa.net/teams/1369/60.png,https://cdn.sofifa.net/flags/ar.png
1,188545,https://sofifa.com/player/188545/robert-lewand...,R. Lewandowski,Robert Lewandowski,ST,92,92,119500000.0,270000.0,32,...,60+3,60+3,60+3,61+3,19+3,https://cdn.sofifa.net/players/188/545/22_120.png,https://cdn.sofifa.net/teams/21/60.png,https://cdn.sofifa.net/flags/de.png,https://cdn.sofifa.net/teams/1353/60.png,https://cdn.sofifa.net/flags/pl.png
2,20801,https://sofifa.com/player/20801/c-ronaldo-dos-...,Cristiano Ronaldo,Cristiano Ronaldo dos Santos Aveiro,"ST, LW",91,91,45000000.0,270000.0,36,...,53+3,53+3,53+3,60+3,20+3,https://cdn.sofifa.net/players/020/801/22_120.png,https://cdn.sofifa.net/teams/11/60.png,https://cdn.sofifa.net/flags/gb-eng.png,https://cdn.sofifa.net/teams/1354/60.png,https://cdn.sofifa.net/flags/pt.png
3,190871,https://sofifa.com/player/190871/neymar-da-sil...,Neymar Jr,Neymar da Silva Santos Júnior,"LW, CAM",91,91,129000000.0,270000.0,29,...,50+3,50+3,50+3,62+3,20+3,https://cdn.sofifa.net/players/190/871/22_120.png,https://cdn.sofifa.net/teams/73/60.png,https://cdn.sofifa.net/flags/fr.png,,https://cdn.sofifa.net/flags/br.png
4,192985,https://sofifa.com/player/192985/kevin-de-bruy...,K. De Bruyne,Kevin De Bruyne,"CM, CAM",91,91,125500000.0,350000.0,30,...,69+3,69+3,69+3,75+3,21+3,https://cdn.sofifa.net/players/192/985/22_120.png,https://cdn.sofifa.net/teams/10/60.png,https://cdn.sofifa.net/flags/gb-eng.png,https://cdn.sofifa.net/teams/1325/60.png,https://cdn.sofifa.net/flags/be.png


View rows from the bottom with `tail()`. Let's view 2 rows:

In [8]:
players.tail(2)

Unnamed: 0,sofifa_id,player_url,short_name,long_name,player_positions,overall,potential,value_eur,wage_eur,age,...,lcb,cb,rcb,rb,gk,player_face_url,club_logo_url,club_flag_url,nation_logo_url,nation_flag_url
19237,262820,https://sofifa.com/player/262820/luke-rudden/2...,L. Rudden,Luke Rudden,ST,47,60,110000.0,500.0,19,...,26+2,26+2,26+2,32+2,15+2,https://cdn.sofifa.net/players/262/820/22_120.png,https://cdn.sofifa.net/teams/111131/60.png,https://cdn.sofifa.net/flags/ie.png,,https://cdn.sofifa.net/flags/ie.png
19238,264540,https://sofifa.com/player/264540/emanuel-lalch...,E. Lalchhanchhuaha,Emanuel Lalchhanchhuaha,CAM,47,60,110000.0,500.0,19,...,41+2,41+2,41+2,45+2,16+2,https://cdn.sofifa.net/players/264/540/22_120.png,https://cdn.sofifa.net/teams/113040/60.png,https://cdn.sofifa.net/flags/in.png,,https://cdn.sofifa.net/flags/in.png
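
Both `head()` and `tail()` accept a row count, and a negative count means "all rows except that many". A minimal sketch on a hypothetical six-row frame:

```python
import pandas as pd

# Hypothetical toy frame, not the FIFA data.
toy = pd.DataFrame({"overall": [93, 92, 91, 91, 91, 47]})

top3 = toy.head(3)           # first three rows
bottom2 = toy.tail(2)        # last two rows
all_but_last = toy.head(-1)  # every row except the final one

print(top3)
print(bottom2)
print(all_but_last)
```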


### What datatypes do we have?

In [9]:
players.dtypes

sofifa_id            int64
player_url          object
short_name          object
long_name           object
player_positions    object
                     ...  
player_face_url     object
club_logo_url       object
club_flag_url       object
nation_logo_url     object
nation_flag_url     object
Length: 111, dtype: object
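
With this many columns the `dtypes` output is truncated, so it can be easier to count how many columns share each datatype, or to keep only the columns of one kind. The sketch below assumes a hypothetical toy frame; `select_dtypes(include="number")` keeps the integer and float columns.

```python
import pandas as pd

# Hypothetical toy frame, not the FIFA data.
toy = pd.DataFrame(
    {
        "sofifa_id": [1, 2],
        "short_name": ["A", "B"],
        "value_eur": [1.5, 2.5],
        "age": [21, 34],
    }
)

# How many columns have each datatype?
dtype_counts = toy.dtypes.value_counts()
print(dtype_counts)

# Keep only the numeric (integer and float) columns.
numeric = toy.select_dtypes(include="number")
print(numeric.columns.tolist())
```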

### Getting extra info and finding nulls

In [10]:
players.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19239 entries, 0 to 19238
Columns: 111 entries, sofifa_id to nation_flag_url
dtypes: float64(16), int64(44), object(51)
memory usage: 16.3+ MB
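
Because the DataFrame has 111 columns, the `info()` output above is abbreviated and does not list per-column non-null counts. Two possible ways to find the nulls are sketched below on a hypothetical toy frame: counting missing values with `isna().sum()`, and asking `info()` for the full listing with `verbose=True` and `show_counts=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with some missing values, not the FIFA data.
toy = pd.DataFrame(
    {
        "short_name": ["A", "B", "C"],
        "value_eur": [1.0, np.nan, 3.0],
        "nation_logo_url": [None, None, "x"],
    }
)

# Count missing values per column and show only the columns that have any.
null_counts = toy.isna().sum()
print(null_counts[null_counts > 0].sort_values(ascending=False))

# Force the per-column listing with non-null counts.
toy.info(verbose=True, show_counts=True)
```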


## Describing and Summarizing
### Get summary statistics

Note that, for a DataFrame with mixed datatypes like this one, `describe()` summarizes only the numeric columns by default:

In [11]:
players.describe()

Unnamed: 0,sofifa_id,overall,potential,value_eur,wage_eur,age,height_cm,weight_kg,club_team_id,league_level,...,mentality_composure,defending_marking_awareness,defending_standing_tackle,defending_sliding_tackle,goalkeeping_diving,goalkeeping_handling,goalkeeping_kicking,goalkeeping_positioning,goalkeeping_reflexes,goalkeeping_speed
count,19239.0,19239.0,19239.0,19165.0,19178.0,19239.0,19239.0,19239.0,19178.0,19178.0,...,19239.0,19239.0,19239.0,19239.0,19239.0,19239.0,19239.0,19239.0,19239.0,2132.0
mean,231468.086959,65.772182,71.07937,2850452.0,9017.989363,25.210822,181.299704,74.943032,50580.498123,1.354364,...,57.92983,46.601746,48.045584,45.9067,16.406102,16.192474,16.055356,16.229274,16.491814,36.439962
std,27039.717497,6.880232,6.086213,7613700.0,19470.176724,4.748235,6.863179,7.069434,54401.868535,0.747865,...,12.159326,20.200807,21.232718,20.755683,17.574028,16.839528,16.564554,17.059779,17.884833,10.751563
min,41.0,47.0,49.0,9000.0,500.0,16.0,155.0,49.0,1.0,1.0,...,12.0,4.0,5.0,5.0,2.0,2.0,2.0,2.0,2.0,15.0
25%,214413.5,61.0,67.0,475000.0,1000.0,21.0,176.0,70.0,479.0,1.0,...,50.0,29.0,28.0,25.0,8.0,8.0,8.0,8.0,8.0,27.0
50%,236543.0,66.0,71.0,975000.0,3000.0,25.0,181.0,75.0,1938.0,1.0,...,59.0,52.0,56.0,53.0,11.0,11.0,11.0,11.0,11.0,36.0
75%,253532.5,70.0,75.0,2000000.0,8000.0,29.0,186.0,80.0,111139.0,1.0,...,66.0,63.0,65.0,63.0,14.0,14.0,14.0,14.0,14.0,45.0
max,264640.0,93.0,95.0,194000000.0,350000.0,54.0,206.0,110.0,115820.0,5.0,...,96.0,93.0,93.0,92.0,91.0,92.0,93.0,92.0,90.0,65.0
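
The `count` row excludes missing values, which is why `value_eur` shows 19165 against 19239 rows (74 missing). As a rough sketch on a hypothetical toy frame, the number of missing values per numeric column can be recovered from the summary:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the FIFA data.
toy = pd.DataFrame(
    {"value_eur": [1.0, np.nan, 3.0, 4.0], "age": [21, 34, 29, 19]}
)

summary = toy.describe()

# Rows in the frame minus non-null count gives missing values per column.
missing = len(toy) - summary.loc["count"]
print(missing)
```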


Specifying the 5<sup>th</sup> and 95<sup>th</sup> percentiles with the `percentiles` argument (the median, 50%, still appears in the output here):

In [12]:
players.describe(percentiles=[0.05, 0.95])

Unnamed: 0,sofifa_id,overall,potential,value_eur,wage_eur,age,height_cm,weight_kg,club_team_id,league_level,...,mentality_composure,defending_marking_awareness,defending_standing_tackle,defending_sliding_tackle,goalkeeping_diving,goalkeeping_handling,goalkeeping_kicking,goalkeeping_positioning,goalkeeping_reflexes,goalkeeping_speed
count,19239.0,19239.0,19239.0,19165.0,19178.0,19239.0,19239.0,19239.0,19178.0,19178.0,...,19239.0,19239.0,19239.0,19239.0,19239.0,19239.0,19239.0,19239.0,19239.0,2132.0
mean,231468.086959,65.772182,71.07937,2850452.0,9017.989363,25.210822,181.299704,74.943032,50580.498123,1.354364,...,57.92983,46.601746,48.045584,45.9067,16.406102,16.192474,16.055356,16.229274,16.491814,36.439962
std,27039.717497,6.880232,6.086213,7613700.0,19470.176724,4.748235,6.863179,7.069434,54401.868535,0.747865,...,12.159326,20.200807,21.232718,20.755683,17.574028,16.839528,16.564554,17.059779,17.884833,10.751563
min,41.0,47.0,49.0,9000.0,500.0,16.0,155.0,49.0,1.0,1.0,...,12.0,4.0,5.0,5.0,2.0,2.0,2.0,2.0,2.0,15.0
5%,184133.9,54.0,62.0,180000.0,500.0,18.0,170.0,64.0,44.0,1.0,...,36.0,11.0,12.0,12.0,6.0,6.0,6.0,6.0,6.0,20.0
50%,236543.0,66.0,71.0,975000.0,3000.0,25.0,181.0,75.0,1938.0,1.0,...,59.0,52.0,56.0,53.0,11.0,11.0,11.0,11.0,11.0,36.0
95%,263046.1,77.0,82.0,11500000.0,37150.0,34.0,193.0,87.0,113301.0,3.0,...,76.0,73.0,75.0,73.0,66.0,63.0,62.0,64.0,67.0,55.0
max,264640.0,93.0,95.0,194000000.0,350000.0,54.0,206.0,110.0,115820.0,5.0,...,96.0,93.0,93.0,92.0,91.0,92.0,93.0,92.0,90.0,65.0
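
The percentile rows are the same values that `quantile()` returns for a single column. A small sketch on hypothetical ages checks that the two agree:

```python
import pandas as pd

# Hypothetical toy frame, not the FIFA data.
toy = pd.DataFrame({"age": [16, 20, 25, 30, 54]})

# Percentile rows are labeled "5%" and "95%" in the summary.
summary = toy.describe(percentiles=[0.05, 0.95])
print(summary)

# quantile() takes the same fractions and is indexed by them.
q = toy["age"].quantile([0.05, 0.95])
print(q)
```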


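Pass `include='all'` to describe every column regardless of datatype. Statistics that do not apply to a column are left as missing values: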

In [13]:
players.describe(include='all')

Unnamed: 0,sofifa_id,player_url,short_name,long_name,player_positions,overall,potential,value_eur,wage_eur,age,...,lcb,cb,rcb,rb,gk,player_face_url,club_logo_url,club_flag_url,nation_logo_url,nation_flag_url
count,19239.0,19239,19239,19239,19239,19239.0,19239.0,19165.0,19178.0,19239.0,...,19239,19239,19239,19239,19239,19239,19178,19178,759,19239
unique,,19239,18145,19219,674,,,,,,...,259,259,259,193,115,19239,701,49,33,163
top,,https://sofifa.com/player/158023/lionel-messi/...,J. Rodríguez,Ladislav Krejčí,CB,,,,,,...,64+2,64+2,64+2,62+2,16+2,https://cdn.sofifa.net/players/158/023/22_120.png,https://cdn.sofifa.net/teams/73/60.png,https://cdn.sofifa.net/flags/gb-eng.png,https://cdn.sofifa.net/teams/1369/60.png,https://cdn.sofifa.net/flags/gb-eng.png
freq,,1,13,2,2423,,,,,,...,540,540,540,767,3851,1,33,2608,23,1719
mean,231468.086959,,,,,65.772182,71.07937,2850452.0,9017.989363,25.210822,...,,,,,,,,,,
std,27039.717497,,,,,6.880232,6.086213,7613700.0,19470.176724,4.748235,...,,,,,,,,,,
min,41.0,,,,,47.0,49.0,9000.0,500.0,16.0,...,,,,,,,,,,
25%,214413.5,,,,,61.0,67.0,475000.0,1000.0,21.0,...,,,,,,,,,,
50%,236543.0,,,,,66.0,71.0,975000.0,3000.0,25.0,...,,,,,,,,,,
75%,253532.5,,,,,70.0,75.0,2000000.0,8000.0,29.0,...,,,,,,,,,,
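
To summarize only the non-numeric columns, `describe()` also accepts an `exclude` argument. The sketch below, on a hypothetical toy frame, drops the numeric columns so that only `count`, `unique`, `top`, and `freq` remain:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the FIFA data.
toy = pd.DataFrame(
    {
        "short_name": ["A", "B", "B"],
        "preferred_foot": ["Left", "Right", "Right"],
        "age": [21, 34, 29],
    }
)

# Exclude numeric columns; the remaining text columns get categorical stats.
text_summary = toy.describe(exclude=[np.number])
print(text_summary)
```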


`describe()` also works on a single column:

In [14]:
players.age.describe()

count    19239.000000
mean        25.210822
std          4.748235
min         16.000000
25%         21.000000
50%         25.000000
75%         29.000000
max         54.000000
Name: age, dtype: float64
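
Attribute access such as `players.age` is shorthand for bracket selection, which also handles column names that are not valid Python identifiers and lets you pick several columns at once. A minimal sketch, assuming a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame, not the FIFA data.
toy = pd.DataFrame({"age": [21, 34, 29], "overall": [70, 85, 78]})

# Attribute access and bracket access return the same column.
by_attribute = toy.age.describe()
by_bracket = toy["age"].describe()

# A list of names selects several columns, giving a DataFrame summary.
subset = toy[["age", "overall"]].describe()
print(subset)
```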

We can also get the unique values in the `preferred_foot` column:

In [15]:
players.preferred_foot.unique()

array(['Left', 'Right'], dtype=object)
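
If only the number of distinct values is needed, `nunique()` returns it directly. Note that `unique()` keeps a missing value as one of its entries, while `nunique()` ignores missing values unless `dropna=False` is passed. A sketch on a hypothetical series:

```python
import pandas as pd

# Hypothetical toy series with one missing entry, not the FIFA data.
foot = pd.Series(["Left", "Right", "Right", None], name="preferred_foot")

values = foot.unique()                   # includes the missing value
n_present = foot.nunique()               # missing values are ignored
n_with_missing = foot.nunique(dropna=False)

print(values)
print(n_present, n_with_missing)
```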

We can then use `value_counts()` to see how many of each unique value we have:

In [18]:
players.preferred_foot.value_counts()

Right    14674
Left      4565
Name: preferred_foot, dtype: int64
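
The counts above work out to roughly 76% right-footed players (14674 of 19239). Proportions like this can be obtained directly with `normalize=True`, and `dropna=False` adds a row for missing values. A minimal sketch on a hypothetical series:

```python
import pandas as pd

# Hypothetical toy series with one missing entry, not the FIFA data.
foot = pd.Series(["Right", "Right", "Right", "Left", None])

# Share of each value among the non-missing entries.
shares = foot.value_counts(normalize=True)
print(shares)

# Include missing values as their own category.
with_missing = foot.value_counts(dropna=False)
print(with_missing)
```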