# pandas Dataframes - Slicing and Filtering

## lesson_1_2_2

## We will use the same dataframe as in the last lesson.
### Import packages

In [43]:
import pandas as pd

### Creating a Basic Dataframe From a List of Records

In [44]:
# define the data as a list of tuples, one tuple (record) per animal
data = [
    ("Dexter","Johnsons","dog","shiba inu","red sesame",1.5,35,"m",False,"both",True),
    ("Alfred","Johnsons","cat","mix","tuxedo",4,12,"m",True,"indoor",True),
    ("Petra","Smith","cat","ragdoll","calico",None,10,"f",False,"both",True),
    ("Ava","Smith","dog","mix","blk/wht",12,32,"f",True,"both",False),
    ("Schroder","Brown","cat","mix","orange",13,15,"m",False,"indoor",True),
    ("Blackbeard","Brown","bird","parrot","multi",5,3,"f",False,"indoor",None),  # vaccination status unknown
]

# define the labels
labels = ["name","owner","type","breed","color","age","weight","gender","health issues","indoor/outboor","vaccinated"]

# create dataframe
vet_records = pd.DataFrame.from_records(data, columns=labels)

In [45]:
vet_records

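|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | NaN | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | None |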


### A Note of Caution

Changes and updates to a dataframe are only permanent if they are saved to a variable. Most pandas operations, including filtering, return a new object rather than modifying the original. So, for example, we might write `vet_records = ...` to permanently change the dataframe `vet_records`. In many cases, keeping a reference dataframe is good practice: write `vet_records_dogs = vet_records[vet_records.type=="dog"]` instead of `vet_records = vet_records[vet_records.type=="dog"]`. This leaves you with a dataframe to reference that contains the unadulterated data.
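A minimal sketch of that pattern, assuming a cut-down copy of `vet_records` that keeps only the `name` and `type` columns so the snippet runs on its own:

```python
import pandas as pd

# Hypothetical cut-down copy of vet_records with only the columns this sketch needs
vet_records = pd.DataFrame({
    "name": ["Dexter", "Alfred", "Petra", "Ava", "Schroder", "Blackbeard"],
    "type": ["dog", "cat", "cat", "dog", "cat", "bird"],
})

# Filtering returns a NEW dataframe; store it under a new name
vet_records_dogs = vet_records[vet_records.type == "dog"]

print(len(vet_records_dogs))  # 2 (Dexter and Ava)
print(len(vet_records))       # 6, the reference dataframe is untouched
```

Because the filtered rows live under a new name, `vet_records` remains available as the reference copy.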

### Grouping and Counting Data

Using counting and grouping can help you get a better grasp of the data.

In [46]:
# How many pets have a recorded type? (count() counts non-missing values, not distinct ones)
vet_records.type.count()

6

In [47]:
vet_records.groupby('type').count()

| type | name | owner | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|
| bird | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| cat | 3 | 3 | 3 | 3 | 2 | 3 | 3 | 3 | 3 | 3 |
| dog | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |


In [49]:
vet_records.type.value_counts()

type
cat     3
dog     2
bird    1
Name: count, dtype: int64
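Note that `count()` counts non-missing values, not distinct ones: `vet_records.type.count()` returns 6 because all six pets have a type. That is also why the `groupby('type').count()` table shows 2 for cat `age` (Petra's age is missing) and 0 for bird `vaccinated`. A minimal sketch contrasting `count()`, `nunique()`, and `size()`, on a hypothetical cut-down copy of `vet_records`:

```python
import pandas as pd

# Hypothetical cut-down copy of vet_records: the columns relevant to counting
vet_records = pd.DataFrame({
    "name": ["Dexter", "Alfred", "Petra", "Ava", "Schroder", "Blackbeard"],
    "type": ["dog", "cat", "cat", "dog", "cat", "bird"],
    "age": [1.5, 4, None, 12, 13, 5],
})

print(vet_records.type.count())    # 6: number of non-missing values
print(vet_records.type.nunique())  # 3: number of distinct pet types

# count() per group skips missing values; size() counts every row in the group
age_counts = vet_records.groupby("type").age.count()
group_sizes = vet_records.groupby("type").size()
print(age_counts)   # cat -> 2, because Petra's age is missing
print(group_sizes)  # cat -> 3
```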

### Slicing (Filtering) Data

Slicing data, that is, picking the parts of the data you want to use for a specific purpose, is easy with pandas once you have the concepts down.


#### Here we slice the data to get the weight column, along with gender and name.

In [50]:
vet_records

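|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | NaN | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | None |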


In [57]:
# Select several columns; a list of column names returns a new DataFrame (not a Series)
weight = vet_records[["weight","gender","name"]]

In [59]:
weight

|   | weight | gender | name |
|---|---|---|---|
| 0 | 35 | m | Dexter |
| 1 | 12 | m | Alfred |
| 2 | 10 | f | Petra |
| 3 | 32 | f | Ava |
| 4 | 15 | m | Schroder |
| 5 | 3 | f | Blackbeard |
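Whether you get a Series or a DataFrame depends on the brackets: a single column name returns a Series, while a list of names (double brackets) returns a DataFrame, even when the list holds only one name. A minimal sketch on a hypothetical cut-down copy of `vet_records`, assumed so the snippet is self-contained:

```python
import pandas as pd

# Hypothetical cut-down copy of vet_records
vet_records = pd.DataFrame({
    "name": ["Dexter", "Alfred", "Petra", "Ava", "Schroder", "Blackbeard"],
    "weight": [35, 12, 10, 32, 15, 3],
    "gender": ["m", "m", "f", "f", "m", "f"],
})

weight_series = vet_records["weight"]    # one name -> Series
weight_frame = vet_records[["weight"]]   # list with one name -> DataFrame
subset = vet_records[["weight", "gender", "name"]]  # columns come back in list order

print(type(weight_series).__name__)  # Series
print(type(weight_frame).__name__)   # DataFrame
print(list(subset.columns))          # ['weight', 'gender', 'name']
```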


Notice that `vet_records` was not changed.

In [61]:
vet_records.head()

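|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | NaN | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |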


While `weight` does show the weight of every animal in the dataframe, the values are not separated by type of pet, so unless we need the raw weights for some calculation, they are not very useful. A column of numbers on its own rarely answers a useful question.

So, instead let's get all the dog weights.

In [64]:
# Collect the dog weights only using a boolean filter
dog_weight = vet_records.weight[vet_records.type=='dog']

In [65]:
dog_weight

0    35
3    32
Name: weight, dtype: int64

While this is still only a list of values, at least the variable name tells us these are the weights of all the dogs in the sample.

A better way might be to just slice all the dog data.

In [68]:
dogs = vet_records[vet_records.type=='dog']

In [69]:
dogs

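|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |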


#### Using `loc` and `iloc`

- `loc` selects by label: index labels for rows and column names for columns. Example: `.loc[row_label, column_name]`.
- `iloc` selects by integer position. Example: `.iloc[row, column]`. Remember: Python indexes start at 0.
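One difference worth knowing: slicing with `loc` includes the end label, while slicing with `iloc` excludes the end position, just like slicing a Python list. A minimal sketch, assuming a cut-down copy of `vet_records` with only `name` and `owner`:

```python
import pandas as pd

# Hypothetical cut-down copy of vet_records
vet_records = pd.DataFrame({
    "name": ["Dexter", "Alfred", "Petra", "Ava", "Schroder", "Blackbeard"],
    "owner": ["Johnsons", "Johnsons", "Smith", "Smith", "Brown", "Brown"],
})

# loc: label-based, the end label 2 IS included
by_label = vet_records.loc[0:2, "name"]
# iloc: position-based, the end position 2 is NOT included
by_position = vet_records.iloc[0:2, 0]

print(by_label.tolist())     # ['Dexter', 'Alfred', 'Petra']
print(by_position.tolist())  # ['Dexter', 'Alfred']
```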

In [70]:
vet_records

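|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | NaN | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | None |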


In [71]:
# get the pet name and owner for the row with index label 1 (the 2nd record)
vet_records.loc[1,["name", "owner"]]

name       Alfred
owner    Johnsons
Name: 1, dtype: object

In [72]:
# get the pet name and owner for all pets in the dataframe
vet_records.loc[:,["name", "owner"]]

|   | name | owner |
|---|---|---|
| 0 | Dexter | Johnsons |
| 1 | Alfred | Johnsons |
| 2 | Petra | Smith |
| 3 | Ava | Smith |
| 4 | Schroder | Brown |
| 5 | Blackbeard | Brown |
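`.loc` also accepts a boolean filter as the row selector, which gives a one-step alternative to the `vet_records.weight[vet_records.type=='dog']` pattern used earlier. A minimal sketch, again on a hypothetical cut-down copy of `vet_records`:

```python
import pandas as pd

# Hypothetical cut-down copy of vet_records
vet_records = pd.DataFrame({
    "name": ["Dexter", "Alfred", "Petra", "Ava", "Schroder", "Blackbeard"],
    "type": ["dog", "cat", "cat", "dog", "cat", "bird"],
    "weight": [35, 12, 10, 32, 15, 3],
})

# Rows: a boolean filter; columns: a single label -> Series
dog_weight = vet_records.loc[vet_records.type == "dog", "weight"]
print(dog_weight.tolist())  # [35, 32]

# Rows: the same filter; columns: a list of labels -> DataFrame
dog_names_weights = vet_records.loc[vet_records.type == "dog", ["name", "weight"]]
print(dog_names_weights)
```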


In [74]:
# get all the names of the pets using iloc
vet_records.iloc[:,0]

0        Dexter
1        Alfred
2         Petra
3           Ava
4      Schroder
5    Blackbeard
Name: name, dtype: object

In [75]:
# get the name Petra
vet_records.iloc[2,0]

'Petra'

In [76]:
# get the color and age of the 3rd and 4th pets (rows 2 and 3, columns 4 and 5); lists of positions let you pick rows and columns that need not be contiguous
vet_records.iloc[[2,3],[4,5]]

|   | color | age |
|---|---|---|
| 2 | calico | NaN |
| 3 | blk/wht | 12.0 |
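Rows 2 and 3 and columns 4 and 5 happen to be neighbors, but the lists passed to `iloc` can name any positions. A minimal sketch that rebuilds `vet_records` and picks the name and weight of the first and last pets:

```python
import pandas as pd

# Rebuild the lesson's vet_records so this sketch runs on its own
data = [
    ("Dexter", "Johnsons", "dog", "shiba inu", "red sesame", 1.5, 35, "m", False, "both", True),
    ("Alfred", "Johnsons", "cat", "mix", "tuxedo", 4, 12, "m", True, "indoor", True),
    ("Petra", "Smith", "cat", "ragdoll", "calico", None, 10, "f", False, "both", True),
    ("Ava", "Smith", "dog", "mix", "blk/wht", 12, 32, "f", True, "both", False),
    ("Schroder", "Brown", "cat", "mix", "orange", 13, 15, "m", False, "indoor", True),
    ("Blackbeard", "Brown", "bird", "parrot", "multi", 5, 3, "f", False, "indoor", None),
]
labels = ["name", "owner", "type", "breed", "color", "age", "weight",
          "gender", "health issues", "indoor/outboor", "vaccinated"]
vet_records = pd.DataFrame.from_records(data, columns=labels)

# Non-contiguous positions: rows 0 and 5 (Dexter, Blackbeard),
# columns 0 and 6 (name, weight)
picked = vet_records.iloc[[0, 5], [0, 6]]
print(picked)
```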


#### `.isin` can be used to filter for rows whose value matches any item in a list

Collect the data for Dexter and Blackbeard

In [77]:
vet_records

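|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | NaN | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | None |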


In [78]:

vet_records[vet_records.name.isin(['Dexter','Blackbeard'])]

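|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | None |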
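`isin` behaves like several `==` comparisons joined with `|` (element-wise *or*). A minimal sketch checking that both forms select the same rows, on a hypothetical cut-down copy of `vet_records`. Note that each comparison needs its own parentheses, because in Python `|` binds more tightly than `==`:

```python
import pandas as pd

# Hypothetical cut-down copy of vet_records
vet_records = pd.DataFrame({
    "name": ["Dexter", "Alfred", "Petra", "Ava", "Schroder", "Blackbeard"],
    "type": ["dog", "cat", "cat", "dog", "cat", "bird"],
})

by_isin = vet_records[vet_records.name.isin(["Dexter", "Blackbeard"])]

# Same filter written as two comparisons combined with | (element-wise or)
by_or = vet_records[(vet_records.name == "Dexter") | (vet_records.name == "Blackbeard")]

print(by_isin.equals(by_or))  # True
```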


#### `~` can be used as a *not* logical operator.

Here we ask for all pets **_not_** named Dexter or Blackbeard

In [79]:
vet_records[~vet_records.name.isin(['Dexter','Blackbeard'])]

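|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | NaN | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |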


#### Boolean Masks

There are times when a boolean mask will be useful to you. A mask is the same kind of boolean Series used for filtering above, but stored in a variable so that it can be inspected, reused, inverted with `~`, or combined with other conditions. I chose to call it `mask`, but it can be named anything you like.

Create a mask for male pets.

In [82]:
vet_records

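|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | NaN | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | None |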


In [80]:
mask = vet_records.gender=='m'

Notice this is a Series of `True` and `False` values: `True` where the gender column is "m", and `False` otherwise.

In [81]:
mask

0     True
1     True
2    False
3    False
4     True
5    False
Name: gender, dtype: bool

Applying this Series as a filter returns only the male pets. You can also use `~mask` to get the female pets.

In [85]:
df = vet_records[mask]
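The cell above stores the male pets in `df` without displaying them. A minimal sketch of applying the mask, inverting it with `~` for the female pets, and combining it with a second condition using `&` (element-wise *and*), on a hypothetical cut-down copy of `vet_records`:

```python
import pandas as pd

# Hypothetical cut-down copy of vet_records
vet_records = pd.DataFrame({
    "name": ["Dexter", "Alfred", "Petra", "Ava", "Schroder", "Blackbeard"],
    "type": ["dog", "cat", "cat", "dog", "cat", "bird"],
    "gender": ["m", "m", "f", "f", "m", "f"],
})

mask = vet_records.gender == "m"

males = vet_records[mask]      # rows where the mask is True
females = vet_records[~mask]   # ~ flips every True/False in the mask
male_cats = vet_records[mask & (vet_records.type == "cat")]  # both conditions must hold

print(males.name.tolist())      # ['Dexter', 'Alfred', 'Schroder']
print(females.name.tolist())    # ['Petra', 'Ava', 'Blackbeard']
print(male_cats.name.tolist())  # ['Alfred', 'Schroder']
```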

Finally, check that `vet_records` was not altered.

In [87]:
vet_records

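|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | NaN | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | None |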


### None and NaN
#### `.isna` returns a boolean dataframe that is `True` wherever the value is `NaN` or `None`.
**It is advisable to deal with NaN and None values before doing any calculations. By default, pandas aggregations such as `sum()` and `mean()` skip NaN and None cells, which can quietly change the result.**

In [88]:
vet_records.isna()

|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | True | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | False |
| 5 | False | False | False | False | False | False | False | False | False | False | True |
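To see why missing values matter, compare the average age with and without special handling of Petra's missing age. A minimal sketch on a hypothetical cut-down copy of `vet_records` (only `name` and `age`): `mean()` skips the `NaN` by default, whereas filling it with 0 first pulls the average down, because the 0 is then treated as a real age.

```python
import pandas as pd

# Hypothetical cut-down copy of vet_records: Petra's age is missing
vet_records = pd.DataFrame({
    "name": ["Dexter", "Alfred", "Petra", "Ava", "Schroder", "Blackbeard"],
    "age": [1.5, 4, None, 12, 13, 5],
})

# Count missing values per column (True counts as 1)
missing_per_column = vet_records.isna().sum()
print(missing_per_column)  # name 0, age 1

# mean() skips NaN by default: (1.5 + 4 + 12 + 13 + 5) / 5
age_skipna = vet_records.age.mean()
print(age_skipna)  # 7.1

# After fillna(0) the 0 counts as a real age: 35.5 / 6
age_zero_filled = vet_records.age.fillna(0).mean()
print(age_zero_filled)  # about 5.92
```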


In [89]:
vet_records_example = vet_records.fillna(0)

In [92]:
vet_records_example

|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | 0.0 | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | 0 |


#### Use `fillna` With a Values Dictionary

In [93]:
values = {"age": 12, "vaccinated": False}  # default fill values we declare for each column

In [94]:
type(values)

dict

In [95]:
vet_records.fillna(value=values)



|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | 12.0 | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | False |


Notice that `vet_records` was not changed. The result would need to be assigned to another variable, or back to `vet_records` itself, to save the changes.

In [96]:
vet_records

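|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | NaN | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | None |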


In [97]:
vet_records_na = vet_records.fillna(value=values)



In [98]:
vet_records_na

|   | name | owner | type | breed | color | age | weight | gender | health issues | indoor/outboor | vaccinated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dexter | Johnsons | dog | shiba inu | red sesame | 1.5 | 35 | m | False | both | True |
| 1 | Alfred | Johnsons | cat | mix | tuxedo | 4.0 | 12 | m | True | indoor | True |
| 2 | Petra | Smith | cat | ragdoll | calico | 12.0 | 10 | f | False | both | True |
| 3 | Ava | Smith | dog | mix | blk/wht | 12.0 | 32 | f | True | both | False |
| 4 | Schroder | Brown | cat | mix | orange | 13.0 | 15 | m | False | indoor | True |
| 5 | Blackbeard | Brown | bird | parrot | multi | 5.0 | 3 | f | False | indoor | False |
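A quick way to confirm the fill worked is to count the remaining missing values with `isna().sum().sum()` (the first `sum()` counts per column, the second adds those counts up). A minimal sketch on a hypothetical cut-down copy of `vet_records` (only `name`, `age`, and `vaccinated`):

```python
import pandas as pd

# Hypothetical cut-down copy of vet_records with the two missing values
vet_records = pd.DataFrame({
    "name": ["Dexter", "Alfred", "Petra", "Ava", "Schroder", "Blackbeard"],
    "age": [1.5, 4, None, 12, 13, 5],
    "vaccinated": [True, True, True, False, True, None],
})

values = {"age": 12, "vaccinated": False}  # per-column fill values
vet_records_na = vet_records.fillna(value=values)

print(vet_records_na.isna().sum().sum())  # 0, nothing left to fill
print(vet_records.isna().sum().sum())     # 2, the original still has its gaps
```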
