# Whirlwind Tour of `pandas`

This notebook uses the Pokemon dataset from Serebii.net to illustrate key concepts.

In [10]:
import pandas as pd
from matplotlib import pyplot as plt

In [2]:
pokemon = 'data/pokemon.csv'
df = pd.read_csv(pokemon)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 801 entries, 0 to 800
Data columns (total 41 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   abilities          801 non-null    object 
 1   against_bug        801 non-null    float64
 2   against_dark       801 non-null    float64
 3   against_dragon     801 non-null    float64
 4   against_electric   801 non-null    float64
 5   against_fairy      801 non-null    float64
 6   against_fight      801 non-null    float64
 7   against_fire       801 non-null    float64
 8   against_flying     801 non-null    float64
 9   against_ghost      801 non-null    float64
 10  against_grass      801 non-null    float64
 11  against_ground     801 non-null    float64
 12  against_ice        801 non-null    float64
 13  against_normal     801 non-null    float64
 14  against_poison     801 non-null    float64
 15  against_psychic    801 non-null    float64
 16  against_rock       801 non

# Indexing

To get a column as a `Series`, index the DataFrame with the column name. Passing a list of column names returns a `DataFrame` instead.

In [6]:
df['name']

0       Bulbasaur
1         Ivysaur
2        Venusaur
3      Charmander
4      Charmeleon
          ...    
796    Celesteela
797       Kartana
798      Guzzlord
799      Necrozma
800      Magearna
Name: name, Length: 801, dtype: object
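The difference between the two return types can be sketched on a tiny stand-in frame. The `toy` frame and its values below are illustrative assumptions, not read from `data/pokemon.csv`.

```python
import pandas as pd

# Tiny stand-in for the Pokemon frame; the values are illustrative only.
toy = pd.DataFrame({
    "name": ["Bulbasaur", "Ivysaur", "Venusaur"],
    "hp": [45, 60, 80],
    "type1": ["grass", "grass", "grass"],
})

single = toy["name"]           # one column label -> Series
subset = toy[["name", "hp"]]   # list of column labels -> DataFrame

print(type(single).__name__)   # Series
print(type(subset).__name__)   # DataFrame
print(subset.shape)            # (3, 2)
```

Note the double brackets in the second case: the outer pair is the indexing operation and the inner pair is an ordinary Python list.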

To get a specific row in the DataFrame, use `.loc[]` to select by index label or `.iloc[]` to select by integer position.

In [14]:
print(f"We are looking at index 10, {df.iloc[10]['name']}")
df.iloc[10]

We are looking at index 10, Metapod


abilities             ['Shed Skin']
against_bug                     1.0
against_dark                    1.0
against_dragon                  1.0
against_electric                1.0
against_fairy                   1.0
against_fight                   0.5
against_fire                    2.0
against_flying                  2.0
against_ghost                   1.0
against_grass                   0.5
against_ground                  0.5
against_ice                     1.0
against_normal                  1.0
against_poison                  1.0
against_psychic                 1.0
against_rock                    2.0
against_steel                   1.0
against_water                   1.0
attack                           20
base_egg_steps                 3840
base_happiness                   70
base_total                      205
capture_rate                    120
classfication        Cocoon Pokémon
defense                          55
experience_growth           1000000
height_m                    

In this case the index was not specified when creating the DataFrame, so it defaults to an integer-based index (`RangeIndex`). Labels and positions therefore coincide, and `.loc[10]` returns the same row as `.iloc[10]`.

In [9]:
df.loc[10]

abilities             ['Shed Skin']
against_bug                     1.0
against_dark                    1.0
against_dragon                  1.0
against_electric                1.0
against_fairy                   1.0
against_fight                   0.5
against_fire                    2.0
against_flying                  2.0
against_ghost                   1.0
against_grass                   0.5
against_ground                  0.5
against_ice                     1.0
against_normal                  1.0
against_poison                  1.0
against_psychic                 1.0
against_rock                    2.0
against_steel                   1.0
against_water                   1.0
attack                           20
base_egg_steps                 3840
base_happiness                   70
base_total                      205
capture_rate                    120
classfication        Cocoon Pokémon
defense                          55
experience_growth           1000000
height_m                    
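The two accessors only diverge once the index labels stop matching the row positions. A minimal sketch, assuming a hypothetical `toy` frame with made-up labels 100 to 102 (not the real dataset):

```python
import pandas as pd

# Illustrative stand-in frame; the index labels are deliberately not 0, 1, 2.
toy = pd.DataFrame(
    {"name": ["Caterpie", "Metapod", "Butterfree"], "attack": [30, 20, 45]},
    index=[100, 101, 102],
)

by_position = toy.iloc[1]   # second row, whatever its label is
by_label = toy.loc[101]     # row whose index label is 101

print(by_position["name"], by_label["name"])   # Metapod Metapod
print(1 in toy.index)                          # False: 1 is a position, not a label

# With a string index, .loc takes the string label instead.
named = toy.set_index("name")
print(named.loc["Metapod", "attack"])          # 20
print(named.iloc[1]["attack"])                 # 20
```

Here `toy.loc[1]` would raise a `KeyError`, because no row carries the label 1.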

You can access the index directly through the `.index` attribute. The same goes for columns, with `.columns`.

In [12]:
print(df.columns)
print(df.index)

Index(['abilities', 'against_bug', 'against_dark', 'against_dragon',
       'against_electric', 'against_fairy', 'against_fight', 'against_fire',
       'against_flying', 'against_ghost', 'against_grass', 'against_ground',
       'against_ice', 'against_normal', 'against_poison', 'against_psychic',
       'against_rock', 'against_steel', 'against_water', 'attack',
       'base_egg_steps', 'base_happiness', 'base_total', 'capture_rate',
       'classfication', 'defense', 'experience_growth', 'height_m', 'hp',
       'japanese_name', 'name', 'percentage_male', 'pokedex_number',
       'sp_attack', 'sp_defense', 'speed', 'type1', 'type2', 'weight_kg',
       'generation', 'is_legendary'],
      dtype='object')
RangeIndex(start=0, stop=801, step=1)
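Both attributes are `Index` objects, so they support membership tests and conversion to a list. The sketch below also shows one possible way to deal with the `classfication` column, which is spelled that way in the dataset. The one-row `toy` frame is an illustrative assumption, and renaming is a suggestion rather than something the notebook does.

```python
import pandas as pd

# One-row stand-in frame; the column spelling matches the dataset.
toy = pd.DataFrame({
    "name": ["Metapod"],
    "classfication": ["Cocoon Pokémon"],
})

print(toy.columns.tolist())               # ['name', 'classfication']
print("classification" in toy.columns)    # False: the dataset spelling differs

# rename returns a new frame; toy itself is left unchanged.
fixed = toy.rename(columns={"classfication": "classification"})
print(fixed.columns.tolist())             # ['name', 'classification']
print(len(toy.index))                     # 1
```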


# Querying

Sometimes you need to select rows by criteria other than the index. `DataFrame.query()` takes a boolean expression as a string and returns the rows for which it evaluates to true.

In [7]:
df.query("pokedex_number==150")

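| | abilities | against_bug | against_dark | against_dragon | against_electric | against_fairy | against_fight | against_fire | against_flying | against_ghost | ... | percentage_male | pokedex_number | sp_attack | sp_defense | speed | type1 | type2 | weight_kg | generation | is_legendary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 149 | ['Pressure', 'Unnerve'] | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.5 | 1.0 | 1.0 | 2.0 | ... | NaN | 150 | 194 | 120 | 140 | psychic | NaN | 122.0 | 1 | 1 |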
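A query string is equivalent to a boolean mask, and it can combine conditions or refer to Python variables with the `@` prefix. A minimal sketch, assuming a three-row stand-in frame (`toy` and `wanted_type` are hypothetical names, not part of the notebook):

```python
import pandas as pd

# Small stand-in frame for illustration.
toy = pd.DataFrame({
    "name": ["Mew", "Mewtwo", "Pikachu"],
    "pokedex_number": [151, 150, 25],
    "type1": ["psychic", "psychic", "electric"],
})

# The query string and the boolean mask select the same rows.
by_query = toy.query("pokedex_number == 150")
by_mask = toy[toy["pokedex_number"] == 150]
assert by_query.equals(by_mask)

# Combine conditions with and/or; @ pulls in a local Python variable.
wanted_type = "psychic"
late_psychic = toy.query("type1 == @wanted_type and pokedex_number > 150")
print(late_psychic["name"].tolist())   # ['Mew']
```

The query form tends to read better as the number of conditions grows, while the mask form is easier to build programmatically.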
