# Data Summarization

- Function/Method arguments and attributes<br><br>
    - `encoding="ISO-8859-1"` with the amazon data set (...a subtle introduction of "data" and "object" types)
    - **coercion** as in `df.isnull().sum(axis=1)`<br><br>


- "Data" types (as opposed to "Object" types which will be discussed formally next week)<br><br>
    - `.dtypes` and `.astype()`<br><br>
    
- Summarizing data with `df.describe()` and **statistics** (as opposed to **Statistics**)<br><br>

    - $\bar x$ the **sample mean** `df['col'].mean()` 

      $\displaystyle \bar x = \frac{1}{n}\sum_{i=1}^n x_i$ 

    - $s^2$ the **sample variance** `df['col'].var()` and $s$ the **sample standard deviation (std)**  `df['col'].std()`
      
      $\displaystyle s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)^2 \quad \text{ and } \quad s = \sqrt{s^2}$<br><br>     
      
    - **min** `df['col'].min()`, **max** `df['col'].max()` (and $Q1$, the **median**, and $Q3$ which will be discussed later)
    
    
- Sorting, (0-based) indexing, and subsetting<br><br>

    - `.sort_values()`
    - `df[]` versus `df.loc[]` versus `df.iloc[]` (and "index" versus "row")<br><br>
        - *boolean selection* with *logical conditionals* `>` and `==` (and `!=`) versus `=` (assignment) and `~` / `&` / `|` (not/and/or)


## Function/Method arguments and attributes

In [8]:
import pandas as pd

url = "https://raw.githubusercontent.com/pointOfive/STA130_F23/main/Data/amazonbooks.csv"
# fail https://github.com/pointOfive/STA130_F23/blob/main/Data/amazonbooks.csv

# 1. demonstrate local file
# 2. demo some ChatGPT

# a *function* with required and default *arguments*
#ab = pd.read_csv(url, encoding='UTF-8') # fails
#ab = pd.read_csv(url) # fails, because it defaults to UTF-8
ab = pd.read_csv(url, encoding="ISO-8859-1") # works!
ab

Unnamed: 0,Title,Author,List Price,Amazon Price,Hard_or_Paper,NumPages,Publisher,Pub year,ISBN-10,Height,Width,Thick,Weight_oz
0,"1,001 Facts that Will Scare the S#*t Out of Yo...",Cary McNeal,12.95,5.18,P,304.0,Adams Media,2010.0,1605506249,7.8,5.5,0.8,11.2
1,21: Bringing Down the House - Movie Tie-In: Th...,Ben Mezrich,15.00,10.20,P,273.0,Free Press,2008.0,1416564195,8.4,5.5,0.7,7.2
2,100 Best-Loved Poems (Dover Thrift Editions),Smith,1.50,1.50,P,96.0,Dover Publications,1995.0,486285537,8.3,5.2,0.3,4.0
3,1421: The Year China Discovered America,Gavin Menzies,15.99,10.87,P,672.0,Harper Perennial,2008.0,61564893,8.8,6.0,1.6,28.8
4,1493: Uncovering the New World Columbus Created,Charles C. Mann,30.50,16.77,P,720.0,Knopf,2011.0,307265722,8.0,5.2,1.4,22.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...
320,Where the Sidewalk Ends,Shel Silverstein,18.99,12.24,H,192.0,HarperCollins,2004.0,60572345,9.3,6.6,1.1,24.0
321,White Privilege,Paula S. Rothenberg,27.55,27.55,P,160.0,Worth Publishers,2011.0,1429233443,9.1,6.1,0.7,8.0
322,Why I wore lipstick,Geralyn Lucas,12.95,5.18,P,224.0,St Martin's Griffin,2005.0,031233446X,8.0,5.4,0.7,6.4
323,"Worlds Together, Worlds Apart: A History of th...",Robert Tignor,97.50,97.50,P,480.0,W. W. Norton & Company,2010.0,393934942,10.7,8.9,0.9,14.4
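
The encoding failure can be reproduced without the network. The sketch below is only an illustration: it builds a tiny made-up CSV in memory (not the Amazon data), encodes it as ISO-8859-1, and then tries to read it back with each `encoding` *argument*. The names `raw_bytes`, `utf8_worked`, and `demo` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical two-row CSV. In ISO-8859-1 the character "é" is the single
# byte 0xE9, and that lone byte is not a valid UTF-8 sequence.
raw_bytes = "Title,Price\nCafé Stories,9.99\nPlain Title,5.00\n".encode("ISO-8859-1")

# Reading with the wrong encoding is expected to raise a decoding error
try:
    pd.read_csv(io.BytesIO(raw_bytes), encoding="UTF-8")
    utf8_worked = True
except UnicodeDecodeError:
    utf8_worked = False

# Reading with the encoding the bytes were actually written in works
demo = pd.read_csv(io.BytesIO(raw_bytes), encoding="ISO-8859-1")

print(utf8_worked)
print(demo)
```

The same bytes are read both times; only the `encoding` *argument* changes, which is why the default (UTF-8) can fail on a file that opens fine once the right encoding is named.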


In [None]:
# *attribute* (not a *method*)
ab.shape

In [9]:
# *methods* (with no *arguments*)
ab.isnull().sum()  # missing per column

Title            0
Author           1
List Price       1
Amazon Price     0
Hard_or_Paper    0
NumPages         2
Publisher        1
Pub year         1
ISBN-10          0
Height           4
Width            5
Thick            1
Weight_oz        9
dtype: int64

In [11]:
# *methods* (the latter with an optional *argument*)
ab.isna().sum(axis=1)  # missing per row

0      0
1      0
2      0
3      0
4      0
      ..
320    0
321    0
322    0
323    0
324    0
Length: 325, dtype: int64

In [13]:
ab['# missing on row'] = ab.isna().sum(axis=1)
ab

Unnamed: 0,Title,Author,List Price,Amazon Price,Hard_or_Paper,NumPages,Publisher,Pub year,ISBN-10,Height,Width,Thick,Weight_oz,# missing on row
0,"1,001 Facts that Will Scare the S#*t Out of Yo...",Cary McNeal,12.95,5.18,P,304.0,Adams Media,2010.0,1605506249,7.8,5.5,0.8,11.2,0
1,21: Bringing Down the House - Movie Tie-In: Th...,Ben Mezrich,15.00,10.20,P,273.0,Free Press,2008.0,1416564195,8.4,5.5,0.7,7.2,0
2,100 Best-Loved Poems (Dover Thrift Editions),Smith,1.50,1.50,P,96.0,Dover Publications,1995.0,486285537,8.3,5.2,0.3,4.0,0
3,1421: The Year China Discovered America,Gavin Menzies,15.99,10.87,P,672.0,Harper Perennial,2008.0,61564893,8.8,6.0,1.6,28.8,0
4,1493: Uncovering the New World Columbus Created,Charles C. Mann,30.50,16.77,P,720.0,Knopf,2011.0,307265722,8.0,5.2,1.4,22.4,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
320,Where the Sidewalk Ends,Shel Silverstein,18.99,12.24,H,192.0,HarperCollins,2004.0,60572345,9.3,6.6,1.1,24.0,0
321,White Privilege,Paula S. Rothenberg,27.55,27.55,P,160.0,Worth Publishers,2011.0,1429233443,9.1,6.1,0.7,8.0,0
322,Why I wore lipstick,Geralyn Lucas,12.95,5.18,P,224.0,St Martin's Griffin,2005.0,031233446X,8.0,5.4,0.7,6.4,0
323,"Worlds Together, Worlds Apart: A History of th...",Robert Tignor,97.50,97.50,P,480.0,W. W. Norton & Company,2010.0,393934942,10.7,8.9,0.9,14.4,0
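
To keep the vocabulary straight, here is a minimal sketch on a made-up three-row data frame (the name `toy` and its values are hypothetical, not the Amazon data). It contrasts an *attribute*, a *method* called with its defaults, and the same *method* with an optional *argument*.

```python
import pandas as pd

# Hypothetical toy data with two missing values
toy = pd.DataFrame({"pages": [304.0, None, 96.0],
                    "publisher": ["Adams Media", "Free Press", None]})

n_rows, n_cols = toy.shape        # attribute: no parentheses, nothing is "called"
per_column = toy.isna().sum()     # methods: parentheses, default axis=0 (per column)
per_row = toy.isna().sum(axis=1)  # same method, optional argument supplied (per row)

print(n_rows, n_cols)
print(per_column.to_dict())
print(per_row.tolist())
```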


## "Data" types<br>(as opposed to "Object" types which will be discussed formally next week)


In [79]:
ab_isna = ab.isna()
print(ab_isna.dtypes)
ab_isna.head()  # now they're all boolean

Title               bool
Author              bool
List Price          bool
Amazon Price        bool
Hard_or_Paper       bool
NumPages            bool
Publisher           bool
Pub year            bool
ISBN-10             bool
Height              bool
Width               bool
Thick               bool
Weight_oz           bool
# missing on row    bool
dtype: object


Unnamed: 0,Title,Author,List Price,Amazon Price,Hard_or_Paper,NumPages,Publisher,Pub year,ISBN-10,Height,Width,Thick,Weight_oz,# missing on row
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [77]:
# Why, then, are these sums numbers when every entry above is boolean?
print(ab.isna().sum(), end='\n\n')
ab.isna().sum(axis=1)

Title               0
Author              1
List Price          1
Amazon Price        0
Hard_or_Paper       0
NumPages            2
Publisher           1
Pub year            1
ISBN-10             0
Height              4
Width               5
Thick               1
Weight_oz           9
# missing on row    0
dtype: int64



0      0
1      0
2      0
3      0
4      0
      ..
320    0
321    0
322    0
323    0
324    0
Length: 325, dtype: int64

In [78]:
# This is due to something called *coercion* 
# which implicitly changes the data types in an appropriate manner

# But we can explicitly change the types of data ourselves...

print(ab.dtypes)  # originally they were all... "float" and "object" ?
ab.head()  # and `ab['# missing on row'] = ab.isna().sum(axis=1)` became an "int" ?

Title                object
Author               object
List Price          float64
Amazon Price        float64
Hard_or_Paper        object
NumPages            float64
Publisher            object
Pub year            float64
ISBN-10              object
Height              float64
Width               float64
Thick               float64
Weight_oz           float64
# missing on row      int64
dtype: object


Unnamed: 0,Title,Author,List Price,Amazon Price,Hard_or_Paper,NumPages,Publisher,Pub year,ISBN-10,Height,Width,Thick,Weight_oz,# missing on row
0,"1,001 Facts that Will Scare the S#*t Out of Yo...",Cary McNeal,12.95,5.18,P,304.0,Adams Media,2010.0,1605506249,7.8,5.5,0.8,11.2,0
1,21: Bringing Down the House - Movie Tie-In: Th...,Ben Mezrich,15.0,10.2,P,273.0,Free Press,2008.0,1416564195,8.4,5.5,0.7,7.2,0
2,100 Best-Loved Poems (Dover Thrift Editions),Smith,1.5,1.5,P,96.0,Dover Publications,1995.0,486285537,8.3,5.2,0.3,4.0,0
3,1421: The Year China Discovered America,Gavin Menzies,15.99,10.87,P,672.0,Harper Perennial,2008.0,61564893,8.8,6.0,1.6,28.8,0
4,1493: Uncovering the New World Columbus Created,Charles C. Mann,30.5,16.77,P,720.0,Knopf,2011.0,307265722,8.0,5.2,1.4,22.4,0
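
A minimal sketch of *coercion* on its own, using a made-up boolean series (the name `flags` is hypothetical): `True` and `False` are treated as `1` and `0` as soon as arithmetic is requested, which is why summing a boolean column yields a count.

```python
import pandas as pd

flags = pd.Series([True, False, True, True])  # hypothetical boolean data

print(flags.dtype)                 # bool
print(True + True)                 # 2: plain Python coerces booleans too
print(flags.sum())                 # 3: number of True values
print(flags.mean())                # 0.75: proportion of True values
print(flags.astype(int).tolist())  # [1, 0, 1, 1]: the explicit version of the same change
```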


In [69]:
ab_dropna = ab.dropna()
new_data_types = {'Hard_or_Paper': "category", 
                  'NumPages': int,
                  'Pub year': int}
# rather than doing them separately like 
#ab_dropna['Hard_or_Paper'] = ab_dropna['Hard_or_Paper'].astype("category")

# Demo some ChatGPT?

#ab = ab.astype(new_data_types)  # fails: the missing values (NaN) cannot be converted to int
ab_dropna = ab_dropna.astype(new_data_types)
pd.DataFrame({"Original": ab.dtypes, "Adjusted": ab_dropna.dtypes})

Unnamed: 0,Original,Adjusted
Title,object,object
Author,object,object
List Price,float64,float64
Amazon Price,float64,float64
Hard_or_Paper,object,category
NumPages,float64,int64
Publisher,object,object
Pub year,float64,int64
ISBN-10,object,object
Height,float64,float64
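
Why convert `ab_dropna` rather than `ab`? A small sketch on made-up data (the `toy` frame is hypothetical) suggests the reason: a float column that still contains `NaN` cannot be cast to `int`, so the missing rows have to be dealt with first.

```python
import pandas as pd

# Hypothetical data: one missing page count
toy = pd.DataFrame({"NumPages": [304.0, None, 96.0],
                    "Hard_or_Paper": ["P", "H", "P"]})

# Casting with NaN still present is expected to raise a ValueError
try:
    toy.astype({"NumPages": int})
    cast_worked = True
except ValueError:
    cast_worked = False

# Dropping the incomplete row first lets both conversions go through
clean = toy.dropna().astype({"NumPages": int, "Hard_or_Paper": "category"})

print(cast_worked)
print(clean.dtypes)
```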


In [66]:
new_column_names = {k:k+" ("+v+")" for k,v in zip(ab.columns,ab_dropna.dtypes.values.astype(str))}
new_column_names

{'Title': 'Title (object)',
 'Author': 'Author (object)',
 'List Price': 'List Price (float64)',
 'Amazon Price': 'Amazon Price (float64)',
 'Hard_or_Paper': 'Hard_or_Paper (category)',
 'NumPages': 'NumPages (int64)',
 'Publisher': 'Publisher (object)',
 'Pub year': 'Pub year (int64)',
 'ISBN-10': 'ISBN-10 (object)',
 'Height': 'Height (float64)',
 'Width': 'Width (float64)',
 'Thick': 'Thick (float64)',
 'Weight_oz': 'Weight_oz (float64)',
 '# missing on row': '# missing on row (int64)'}

In [70]:
# Use inplace=True rather than ab_dropna = ab_dropna.rename(columns=new_column_names)
ab_dropna.rename(columns=new_column_names, inplace=True)  # if you like
ab_dropna.head()  # "objects" are still not really "categories"

Unnamed: 0,Title (object),Author (object),List Price (float64),Amazon Price (float64),Hard_or_Paper (category),NumPages (int64),Publisher (object),Pub year (int64),ISBN-10 (object),Height (float64),Width (float64),Thick (float64),Weight_oz (float64),# missing on row (int64)
0,"1,001 Facts that Will Scare the S#*t Out of Yo...",Cary McNeal,12.95,5.18,P,304,Adams Media,2010,1605506249,7.8,5.5,0.8,11.2,0
1,21: Bringing Down the House - Movie Tie-In: Th...,Ben Mezrich,15.0,10.2,P,273,Free Press,2008,1416564195,8.4,5.5,0.7,7.2,0
2,100 Best-Loved Poems (Dover Thrift Editions),Smith,1.5,1.5,P,96,Dover Publications,1995,486285537,8.3,5.2,0.3,4.0,0
3,1421: The Year China Discovered America,Gavin Menzies,15.99,10.87,P,672,Harper Perennial,2008,61564893,8.8,6.0,1.6,28.8,0
4,1493: Uncovering the New World Columbus Created,Charles C. Mann,30.5,16.77,P,720,Knopf,2011,307265722,8.0,5.2,1.4,22.4,0
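
The difference between reassigning and `inplace=True` can be seen on a made-up two-column frame (all names below are hypothetical): without `inplace`, `.rename()` returns a new data frame and leaves the original alone; with `inplace=True` it changes the original and returns `None`.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})  # hypothetical data

# Default behaviour: a renamed copy comes back, `toy` itself is untouched
renamed = toy.rename(columns={"a": "a (int64)"})
cols_before = list(toy.columns)

# inplace=True: `toy` itself is modified and nothing useful is returned
result = toy.rename(columns={"a": "a (int64)"}, inplace=True)
cols_after = list(toy.columns)

print(cols_before, cols_after, result)
```

A common slip is writing `toy = toy.rename(..., inplace=True)`, which would replace `toy` with `None`.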


## Summarizing data with `df.describe()` and *statistics* (as opposed to *Statistics*)

The sample mean, sample variance, and sample standard deviation are examples of **statistics** which are important in the discipline of **Statistics**.
$$\huge \displaystyle \bar x = \frac{1}{n}\sum_{i=1}^n x_i \quad\quad\quad \displaystyle s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)^2 \quad\quad\quad s=\sqrt{s^2}$$ 


In [117]:
url = "https://raw.githubusercontent.com/KeithGalli/pandas/master/pokemon_data.csv"
# fail https://github.com/KeithGalli/pandas/blob/master/pokemon_data.csv
pokeaman = pd.read_csv(url) 
colnames_wtype = {k:k+" ("+v+")" for k,v in zip(pokeaman.columns,pokeaman.dtypes.values.astype(str))}
pokeaman.rename(columns=colnames_wtype, inplace=True)
pokeaman

Unnamed: 0,# (int64),Name (object),Type 1 (object),Type 2 (object),HP (int64),Attack (int64),Defense (int64),Sp. Atk (int64),Sp. Def (int64),Speed (int64),Generation (int64),Legendary (bool)
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,80,160,60,170,130,80,6,True


In [87]:
# Why does this not have all the columns?
pokeaman.describe()

Unnamed: 0,# (int64),HP (int64),Attack (int64),Defense (int64),Sp. Atk (int64),Sp. Def (int64),Speed (int64),Generation (int64)
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,208.343798,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,1.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,184.75,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,364.5,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,539.25,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,721.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0


Because these are summaries for **numeric** data types...

- $\bar x$ the **sample mean** `df['col'].mean()` 

  $\displaystyle \bar x = \frac{1}{n}\sum_{i=1}^n x_i$ 

- $s$ the **sample standard deviation (std)** `df['col'].std()`

  $\displaystyle s = \sqrt{s^2}$

  > $s^2$ the **sample variance** `df['col'].var()`
  >  
  > $\displaystyle s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)^2$      
        
- and where **min** `df['col'].min()` and **max** `df['col'].max()` are (hopefully) obvious
- and **25%, 50%, and 75%** are the first, second, and third **quartiles** referred to as $Q1$, the **median**, and $Q3$ (but these will be discussed later)
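
As a check on the formulas above, the following sketch computes $\bar x$, $s^2$, and $s$ "by hand" for five made-up numbers and compares them with the `pandas` methods. The data and variable names are hypothetical.

```python
import math
import pandas as pd

x = pd.Series([2.0, 4.0, 4.0, 6.0, 9.0])  # hypothetical sample
n = len(x)

x_bar = sum(x) / n                                  # sample mean
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)   # sample variance (divides by n-1)
s = math.sqrt(s2)                                   # sample standard deviation

print(x_bar, x.mean())
print(s2, x.var())
print(s, x.std())
```

Here the deviations from $\bar x = 5$ are $-3, -1, -1, 1, 4$, so $s^2 = 28/4 = 7$, matching `x.var()`, which also divides by $n-1$ by default.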


In [95]:
# Another "explanation" as to why `.describe()` doesn't have all the columns is "because"
pokeaman['Type 1 (object)'].value_counts()
# where the most frequently occurring value is called the *mode*

Water       112
Normal       98
Grass        70
Bug          69
Psychic      57
Fire         52
Electric     44
Rock         44
Dragon       32
Ground       32
Ghost        32
Dark         31
Poison       28
Steel        27
Fighting     27
Ice          24
Fairy        17
Flying        4
Name: Type 1 (object), dtype: int64

In [99]:
# And where the `dropna=False` *argument* can be added to include a count of missing values
pokeaman['Type 2 (object)'].value_counts(dropna=False)  # 'Type 1 (object)' doesn't have NaNs

NaN         386
Flying       97
Ground       35
Poison       34
Psychic      33
Fighting     26
Grass        25
Fairy        23
Steel        22
Dark         20
Dragon       18
Ice          14
Rock         14
Water        14
Ghost        14
Fire         12
Electric      6
Normal        4
Bug           3
Name: Type 2 (object), dtype: int64
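
A short sketch of how non-numeric data gets summarized, on a made-up series with one missing value (the name `types` and its values are hypothetical). It shows `value_counts()` with and without `dropna=False`, the *mode*, and what `.describe()` reports for a non-numeric column.

```python
import pandas as pd

# Hypothetical categorical-style data with one missing value
types = pd.Series(["Water", "Fire", "Water", None, "Grass", "Water"])

counts = types.value_counts()                      # missing values excluded by default
counts_with_na = types.value_counts(dropna=False)  # missing values get their own count
mode_value = types.mode()[0]                       # most frequently occurring value
summary = types.describe()                         # count, unique, top, freq

print(counts)
print(counts_with_na)
print(mode_value)
print(summary)
```

So `.describe()` can summarize such a column, just not with a mean or standard deviation: it reports the number of non-missing values, the number of distinct values, and the mode with its frequency.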

## Sorting, (0-based) indexing, and subsetting

In [None]:
colnames_wotype = {col: col.split(" (")[0] for col in pokeaman.columns.astype(str)}
pokeaman.rename(columns=colnames_wotype, inplace=True)

In [126]:
# sorting
pokeaman.sort_values("Attack", ascending=False) 

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
163,150,MewtwoMega Mewtwo X,Psychic,Fighting,106,190,100,154,100,130,1,True
232,214,HeracrossMega Heracross,Bug,Fighting,80,185,115,40,105,75,2,False
424,383,GroudonPrimal Groudon,Ground,Fire,100,180,160,150,90,90,3,True
426,384,RayquazaMega Rayquaza,Dragon,Flying,105,180,100,180,100,115,3,True
429,386,DeoxysAttack Forme,Psychic,,50,180,20,180,20,150,3,True
...,...,...,...,...,...,...,...,...,...,...,...,...
139,129,Magikarp,Water,,20,10,55,15,20,80,1,False
261,242,Blissey,Normal,,255,10,10,75,135,55,2,False
230,213,Shuckle,Bug,Rock,20,10,230,10,230,5,2,False
121,113,Chansey,Normal,,250,5,5,35,105,50,1,False


In [139]:
pokeaman[:10][['Name','Type 1']]

Unnamed: 0,Name,Type 1
0,Bulbasaur,Grass
1,Ivysaur,Grass
2,Venusaur,Grass
3,VenusaurMega Venusaur,Grass
4,Charmander,Fire
5,Charmeleon,Fire
6,Charizard,Fire
7,CharizardMega Charizard X,Fire
8,CharizardMega Charizard Y,Fire
9,Squirtle,Water


In [133]:
# (0-based) indexing 

# indexing V1: .iloc and [ rows , cols] specifically [ rowStart : rowEndPlus1 , colStart : colEndPlus1]

pokeaman.iloc[ :10 , : ] 
pokeaman.iloc[ 0:10 , : ] 
pokeaman.iloc[ :10 , 1:3 ] 

Unnamed: 0,Name,Type 1
0,Bulbasaur,Grass
1,Ivysaur,Grass
2,Venusaur,Grass
3,VenusaurMega Venusaur,Grass
4,Charmander,Fire
5,Charmeleon,Fire
6,Charizard,Fire
7,CharizardMega Charizard X,Fire
8,CharizardMega Charizard Y,Fire
9,Squirtle,Water


In [134]:
# "rows" versus "index"
pokeaman.dropna().iloc[ :10 , 1:3 ]

Unnamed: 0,Name,Type 1
0,Bulbasaur,Grass
1,Ivysaur,Grass
2,Venusaur,Grass
3,VenusaurMega Venusaur,Grass
6,Charizard,Fire
7,CharizardMega Charizard X,Fire
8,CharizardMega Charizard Y,Fire
15,Butterfree,Bug
16,Weedle,Bug
17,Kakuna,Bug


In [135]:
# more "rows" versus "index"
pokeaman.sort_values(["Attack","Defense"], ascending=[False,True]).iloc[ :10, : ]

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
163,150,MewtwoMega Mewtwo X,Psychic,Fighting,106,190,100,154,100,130,1,True
232,214,HeracrossMega Heracross,Bug,Fighting,80,185,115,40,105,75,2,False
429,386,DeoxysAttack Forme,Psychic,,50,180,20,180,20,150,3,True
426,384,RayquazaMega Rayquaza,Dragon,Flying,105,180,100,180,100,115,3,True
424,383,GroudonPrimal Groudon,Ground,Fire,100,180,160,150,90,90,3,True
711,646,KyuremBlack Kyurem,Dragon,Ice,125,170,100,120,90,95,5,True
494,445,GarchompMega Garchomp,Dragon,Ground,108,170,115,120,95,92,4,False
454,409,Rampardos,Rock,,97,165,60,65,50,58,4,False
387,354,BanetteMega Banette,Ghost,,64,165,75,93,83,75,3,False
527,475,GalladeMega Gallade,Psychic,Fighting,68,165,95,65,115,110,4,False
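
The "row" versus "index" distinction is easier to see on four made-up rows (the `toy` frame below is hypothetical). After sorting, the index labels travel with their rows, so position 0 and label 0 no longer refer to the same row.

```python
import pandas as pd

toy = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                    "Attack": [49, 190, 10, 180]})  # hypothetical data

by_attack = toy.sort_values("Attack", ascending=False)

print(by_attack.index.tolist())   # [1, 3, 0, 2]: labels are shuffled
print(by_attack.iloc[0]["Name"])  # B: the first row by position
print(by_attack.loc[0]["Name"])   # A: the row whose index label is 0

# reset_index(drop=True) relabels the rows 0, 1, 2, ... in their new order
relabeled = by_attack.reset_index(drop=True)
print(relabeled.loc[0]["Name"])   # B: label and position agree again
```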


In [None]:
# (0-based) indexing 

# indexing V2: [row_sequence_subset] or [column_name_list] or [row_sequence_subset][column_name_list]

In [141]:
pokeaman[:10] # pokeaman.iloc[ :10 , : ]
pokeaman[0:10] # pokeaman.iloc[ 0:10 , : ] 
pokeaman[:10][['Name','Type 1']] # pokeaman.iloc[ :10 , 1:3 ] 
# or try
pokeaman[['Name','Type 1']] # but notice that `pokeaman['Name','Type 1']` won't work(!)

Unnamed: 0,Name,Type 1
0,Bulbasaur,Grass
1,Ivysaur,Grass
2,Venusaur,Grass
3,VenusaurMega Venusaur,Grass
4,Charmander,Fire
...,...,...
795,Diancie,Rock
796,DiancieMega Diancie,Rock
797,HoopaHoopa Confined,Psychic
798,HoopaHoopa Unbound,Psychic
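
A small sketch of what the different things inside `df[]` return, on a made-up frame (names and values are hypothetical): a single column name gives a series, a *list* of names gives a data frame, a slice selects rows, and two names without the inner list are read as one tuple-valued key that does not exist.

```python
import pandas as pd

toy = pd.DataFrame({"Name": ["A", "B", "C"],
                    "Type 1": ["Grass", "Fire", "Water"],
                    "HP": [45, 39, 44]})  # hypothetical data

one_col = toy["Name"]               # a Series
two_cols = toy[["Name", "Type 1"]]  # a DataFrame: note the inner list
first_two_rows = toy[:2]            # a slice selects rows, not columns

# Without the inner list, pandas looks for a single column named ("Name", "Type 1")
try:
    toy["Name", "Type 1"]
    tuple_key_worked = True
except KeyError:
    tuple_key_worked = False

print(type(one_col).__name__, type(two_cols).__name__)
print(first_two_rows.shape, tuple_key_worked)
```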


In [142]:
# subsetting

# indexing V3: .loc and [ logical_conditional , colname_list ] 

pokeaman.Legendary
pokeaman[pokeaman.Legendary]

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
156,144,Articuno,Ice,Flying,90,85,100,95,125,85,1,True
157,145,Zapdos,Electric,Flying,90,90,85,125,90,100,1,True
158,146,Moltres,Fire,Flying,90,100,90,125,85,90,1,True
162,150,Mewtwo,Psychic,,106,110,90,154,90,130,1,True
163,150,MewtwoMega Mewtwo X,Psychic,Fighting,106,190,100,154,100,130,1,True
...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,80,160,60,170,130,80,6,True


In [143]:
~pokeaman.Legendary
pokeaman[~pokeaman.Legendary]

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...
787,711,GourgeistSuper Size,Ghost,Grass,85,100,122,58,75,54,6,False
788,712,Bergmite,Ice,,55,69,85,32,35,28,6,False
789,713,Avalugg,Ice,,95,117,184,44,46,28,6,False
790,714,Noibat,Flying,Dragon,40,30,35,45,40,55,6,False


In [149]:
(pokeaman["HP"] > 80)  # what would `~(pokeaman["HP"] > 80)` be?
pokeaman[ pokeaman["HP"] > 80 ]

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
22,18,Pidgeot,Normal,Flying,83,80,75,70,70,101,1,False
23,18,PidgeotMega Pidgeot,Normal,Flying,83,80,80,135,80,121,1,False
36,31,Nidoqueen,Poison,Ground,90,92,87,75,85,76,1,False
39,34,Nidoking,Poison,Ground,81,102,77,85,75,85,1,False
41,36,Clefable,Fairy,,95,70,73,95,90,60,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...
789,713,Avalugg,Ice,,95,117,184,44,46,28,6,False
791,715,Noivern,Flying,Dragon,85,70,80,97,80,123,6,False
792,716,Xerneas,Fairy,,126,131,95,131,98,99,6,True
793,717,Yveltal,Dark,Flying,126,131,95,131,98,99,6,True


In [146]:
# (pokeaman["HP"] > 80) & (pokeaman["Type 2"] == "Fighting")
pokeaman[ (pokeaman["HP"] > 80) & (pokeaman["Type 2"] == "Fighting") ]

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
67,62,Poliwrath,Water,Fighting,90,95,95,70,90,70,1,False
163,150,MewtwoMega Mewtwo X,Psychic,Fighting,106,190,100,154,100,130,1,True
504,454,Toxicroak,Poison,Fighting,83,106,65,86,65,85,4,False
558,499,Pignite,Fire,Fighting,90,93,55,70,55,55,5,False
559,500,Emboar,Fire,Fighting,110,123,65,100,65,65,5,False
699,638,Cobalion,Steel,Fighting,91,90,129,90,72,108,5,True
700,639,Terrakion,Rock,Fighting,91,129,90,72,90,108,5,True
701,640,Virizion,Grass,Fighting,91,90,72,90,129,108,5,True
713,647,KeldeoOrdinary Forme,Water,Fighting,91,72,90,129,90,108,5,False
714,647,KeldeoResolute Forme,Water,Fighting,91,72,90,129,90,108,5,False


In [152]:
# something like `pokeaman.Type 2` wouldn't work... why?
pokeaman.loc[~(pokeaman.HP > 120) | (pokeaman.Defense > 180)]
# pokeaman.query("HP > 120 and Legendary == True")

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,80,160,60,170,130,80,6,True
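
The three logical operators can be tried on a made-up four-row frame (all names and values below are hypothetical). The sketch builds one boolean series and combines it with `&`, `|`, and `~`, then repeats the "and" case with `.query()`.

```python
import pandas as pd

toy = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                    "HP": [45, 130, 90, 125],
                    "Legendary": [False, True, False, False]})  # hypothetical data

high_hp = toy["HP"] > 120  # a boolean Series, one True/False per row

both = toy[high_hp & toy["Legendary"]]        # and: both conditions hold
either = toy[high_hp | toy["Legendary"]]      # or: at least one holds
neither = toy[~high_hp & ~toy["Legendary"]]   # not: each condition is flipped first

# The same "and" selection written as a query string
same_as_both = toy.query("HP > 120 and Legendary == True")

print(both["Name"].tolist(), either["Name"].tolist(), neither["Name"].tolist())
```

When the comparison is written inline, the parentheses matter: `&` binds more tightly than `>`, so `toy["HP"] > 120 & toy["Legendary"]` would be read as `toy["HP"] > (120 & toy["Legendary"])`.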


> There's probably not time, but if there is... we could review/demo the pokemon data set a little bit more
    - with more complex *chaining* `df.dropna().groupby('col1')...`

In [2]:
pokemon = pokeaman  # the cells below use the name `pokemon` for the same data frame
pokemon.describe()

Unnamed: 0,#,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,208.343798,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,1.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,184.75,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,364.5,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,539.25,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,721.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0


In [31]:
pokemon[["Type 1","Type 2"]].value_counts()

Type 1    Type 2
Normal    Flying    24
Grass     Poison    15
Bug       Flying    14
          Poison    12
Ghost     Grass     10
                    ..
Fire      Rock       1
Ice       Ghost      1
Fire      Dragon     1
Fighting  Flying     1
Water     Steel      1
Name: count, Length: 136, dtype: int64

In [3]:
pokemon.groupby('Type 1').describe()

Unnamed: 0_level_0,#,#,#,#,#,#,#,#,HP,HP,...,Speed,Speed,Generation,Generation,Generation,Generation,Generation,Generation,Generation,Generation
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
Type 1,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
Bug,69.0,334.492754,210.44516,10.0,168.0,291.0,543.0,666.0,69.0,56.884058,...,85.0,160.0,69.0,3.217391,1.598433,1.0,2.0,3.0,5.0,6.0
Dark,31.0,461.354839,176.022072,197.0,282.0,509.0,627.0,717.0,31.0,66.806452,...,98.5,125.0,31.0,4.032258,1.353609,2.0,3.0,5.0,5.0,6.0
Dragon,32.0,474.375,170.190169,147.0,373.0,443.5,643.25,718.0,32.0,83.3125,...,97.75,120.0,32.0,3.875,1.431219,1.0,3.0,4.0,5.0,6.0
Electric,44.0,363.5,202.731063,25.0,179.75,403.5,489.75,702.0,44.0,59.795455,...,101.5,140.0,44.0,3.272727,1.604697,1.0,2.0,4.0,4.25,6.0
Fairy,17.0,449.529412,271.983942,35.0,176.0,669.0,683.0,716.0,17.0,74.117647,...,60.0,99.0,17.0,4.117647,2.14716,1.0,2.0,6.0,6.0,6.0
Fighting,27.0,363.851852,218.5652,56.0,171.5,308.0,536.0,701.0,27.0,69.851852,...,86.0,118.0,27.0,3.37037,1.800601,1.0,1.5,3.0,5.0,6.0
Fire,52.0,327.403846,226.26284,4.0,143.5,289.5,513.25,721.0,52.0,69.903846,...,96.25,126.0,52.0,3.211538,1.850665,1.0,1.0,3.0,5.0,6.0
Flying,4.0,677.75,42.437209,641.0,641.0,677.5,714.25,715.0,4.0,70.75,...,121.5,123.0,4.0,5.5,0.57735,5.0,5.0,5.5,6.0,6.0
Ghost,32.0,486.5,209.189218,92.0,354.75,487.0,709.25,711.0,32.0,64.4375,...,84.25,130.0,32.0,4.1875,1.693203,1.0,3.0,4.0,6.0,6.0
Grass,70.0,344.871429,200.264385,1.0,187.25,372.0,496.75,673.0,70.0,67.271429,...,80.0,145.0,70.0,3.357143,1.579173,1.0,2.0,3.5,5.0,6.0


In [35]:
pokemon.groupby('Type 1').describe().columns

MultiIndex([(         '#', 'count'),
            (         '#',  'mean'),
            (         '#',   'std'),
            (         '#',   'min'),
            (         '#',   '25%'),
            (         '#',   '50%'),
            (         '#',   '75%'),
            (         '#',   'max'),
            (        'HP', 'count'),
            (        'HP',  'mean'),
            (        'HP',   'std'),
            (        'HP',   'min'),
            (        'HP',   '25%'),
            (        'HP',   '50%'),
            (        'HP',   '75%'),
            (        'HP',   'max'),
            (    'Attack', 'count'),
            (    'Attack',  'mean'),
            (    'Attack',   'std'),
            (    'Attack',   'min'),
            (    'Attack',   '25%'),
            (    'Attack',   '50%'),
            (    'Attack',   '75%'),
            (    'Attack',   'max'),
            (   'Defense', 'count'),
            (   'Defense',  'mean'),
            (   'Defense',   'std'),
 

In [19]:
pokemon.groupby('Type 1').describe().sort_values(('HP','mean'), ascending=False)

Unnamed: 0_level_0,#,#,#,#,#,#,#,#,HP,HP,...,Speed,Speed,Generation,Generation,Generation,Generation,Generation,Generation,Generation,Generation
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
Type 1,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
Dragon,32.0,474.375,170.190169,147.0,373.0,443.5,643.25,718.0,32.0,83.3125,...,97.75,120.0,32.0,3.875,1.431219,1.0,3.0,4.0,5.0,6.0
Normal,98.0,319.173469,193.85482,16.0,161.25,296.5,483.0,676.0,98.0,77.27551,...,90.75,135.0,98.0,3.05102,1.575407,1.0,2.0,3.0,4.0,6.0
Fairy,17.0,449.529412,271.983942,35.0,176.0,669.0,683.0,716.0,17.0,74.117647,...,60.0,99.0,17.0,4.117647,2.14716,1.0,2.0,6.0,6.0,6.0
Ground,32.0,356.28125,204.899855,27.0,183.25,363.5,535.25,645.0,32.0,73.78125,...,90.0,120.0,32.0,3.15625,1.588454,1.0,1.75,3.0,5.0,5.0
Water,112.0,303.089286,188.440807,7.0,130.0,275.0,456.25,693.0,112.0,72.0625,...,82.0,122.0,112.0,2.857143,1.5588,1.0,1.0,3.0,4.0,6.0
Ice,24.0,423.541667,175.465834,124.0,330.25,371.5,583.25,713.0,24.0,72.0,...,80.0,110.0,24.0,3.541667,1.473805,1.0,2.75,3.0,5.0,6.0
Flying,4.0,677.75,42.437209,641.0,641.0,677.5,714.25,715.0,4.0,70.75,...,121.5,123.0,4.0,5.5,0.57735,5.0,5.0,5.5,6.0,6.0
Psychic,57.0,380.807018,194.600455,63.0,201.0,386.0,528.0,720.0,57.0,70.631579,...,104.0,180.0,57.0,3.385965,1.644845,1.0,2.0,3.0,5.0,6.0
Fire,52.0,327.403846,226.26284,4.0,143.5,289.5,513.25,721.0,52.0,69.903846,...,96.25,126.0,52.0,3.211538,1.850665,1.0,1.0,3.0,5.0,6.0
Fighting,27.0,363.851852,218.5652,56.0,171.5,308.0,536.0,701.0,27.0,69.851852,...,86.0,118.0,27.0,3.37037,1.800601,1.0,1.5,3.0,5.0,6.0
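
The `('HP','mean')` in the sort above is a *tuple* naming one column of a two-level (`MultiIndex`) set of columns. A minimal sketch on made-up data (the `toy` frame is hypothetical) shows the same idea at a smaller scale.

```python
import pandas as pd

toy = pd.DataFrame({"Type 1": ["Fire", "Fire", "Water", "Water", "Water"],
                    "HP": [39, 78, 44, 59, 79]})  # hypothetical data

summary = toy.groupby("Type 1").describe()

# Columns are now pairs such as ("HP", "count"), ("HP", "mean"), ...
hp_mean = summary[("HP", "mean")]
ranked = summary.sort_values(("HP", "mean"), ascending=False)

print(summary.columns.tolist())
print(hp_mean)
print(ranked.index.tolist())
```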


In [5]:
pokemon.groupby('Type 1').mean(numeric_only=True).round(3)

Unnamed: 0_level_0,#,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Type 1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Bug,334.493,56.884,70.971,70.725,53.87,64.797,61.681,3.217,0.0
Dark,461.355,66.806,88.387,70.226,74.645,69.516,76.161,4.032,0.065
Dragon,474.375,83.312,112.125,86.375,96.844,88.844,83.031,3.875,0.375
Electric,363.5,59.795,69.091,66.295,90.023,73.705,84.5,3.273,0.091
Fairy,449.529,74.118,61.529,65.706,78.529,84.706,48.588,4.118,0.059
Fighting,363.852,69.852,96.778,65.926,53.111,64.704,66.074,3.37,0.0
Fire,327.404,69.904,84.769,67.769,88.981,72.212,74.442,3.212,0.096
Flying,677.75,70.75,78.75,66.25,94.25,72.5,102.5,5.5,0.5
Ghost,486.5,64.438,73.781,81.188,79.344,76.469,64.344,4.188,0.062
Grass,344.871,67.271,73.214,70.8,77.5,70.429,61.929,3.357,0.043
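
As one possible version of the more complex *chaining* mentioned above, the sketch below strings several *methods* together on a made-up frame (names and values are hypothetical). Each step hands its result to the next, and the wrapping parentheses allow one method per line.

```python
import pandas as pd

toy = pd.DataFrame({"Type 1": ["Fire", "Fire", "Water", "Water", "Grass"],
                    "Type 2": ["Flying", None, "Ice", "Ice", "Poison"],
                    "HP": [78, 39, 90, 130, 45]})  # hypothetical data

result = (
    toy.dropna()                      # drop the row with a missing "Type 2"
       .groupby("Type 1")["HP"]       # group the remaining rows, keep the HP column
       .mean()                        # one mean per group
       .sort_values(ascending=False)  # largest mean first
       .round(1)
)

print(result)
```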
