# Harry Potter Dataset
## Data Wrangling: Joining & Combining Data

<img src='hogwarts.JPG'>
<i>Image by inspiredbythemuse from <a href="https://pixabay.com/illustrations/hogwarts-owl-hedwig-harry-potter-2036645/">Pixabay</a></i>

Now we will look more closely at joining and combining data. You will often work with data from several sources and need to combine it before further analysis.

In this notebook, you will use a number of Harry Potter-related datasets to practice data manipulation. First, download the notebook and the Excel file titled `harry_potter.xlsx` and place them in the same folder.

Also note that there are often several ways to do the same thing when using Python and Pandas for data manipulation. Some are more efficient than others, especially with larger datasets, but because these datasets are so small, the suggested solutions may not be the most efficient. Instead, our goal is to show you different options for completing the tasks.

The complete set of Harry Potter datasets can be found here: [Kaggle](https://www.kaggle.com/gulsahdemiryurek/harry-potter-dataset)

### Initial Imports

In [1]:
# import common libraries
import pandas as pd
import numpy as np

In [2]:
# import data from Excel
fileName = "harry_potter.xlsx"  # ensure the file is in the same location as the notebook, or add the path

# create ExcelFile object
xls = pd.ExcelFile(fileName)

# load individual sheets / we'll explain the data as we go along
gryffindor = pd.read_excel(xls, 'gryffindor', index_col='ID')
slytherin = pd.read_excel(xls, 'slytherin', index_col='ID')
hufflepuff = pd.read_excel(xls, 'hufflepuff', index_col='ID')
ravenclaw = pd.read_excel(xls, 'ravenclaw', index_col='ID')
other1 = pd.read_excel(xls, 'other1', index_col='ID')
other2 = pd.read_excel(xls, 'other2', index_col='ID')
loyalty = pd.read_excel(xls, 'loyalty', index_col='ID')
dead_characters = pd.read_excel(xls, 'dead_characters', index_col='ID')
gryffindor_bloodstatus = pd.read_excel(xls, 'gryffindor_bloodstatus', index_col='ID')
gryffindor_species = pd.read_excel(xls, 'gryffindor_species', index_col='ID')
color = pd.read_excel(xls, 'color', index_col='ID')
wand = pd.read_excel(xls, 'wand', index_col='ID')

Let's take a look at the first dataset: `gryffindor`

In [3]:
gryffindor.head()

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood


Let's also take a closer look at the `Species` attribute.

In [4]:
gryffindor['Species'].value_counts()

Human                      36
Human                       3
Werewolf                    1
Human (Werewolf traits)     1
Ghost                       1
Name: Species, dtype: int64

Notice that there are two values labeled `Human`. Why is that?

This happens a lot when loading messy datasets (especially with Excel files) where there can be either leading or trailing spaces. There are a couple of ways to clean this up, and one of the easiest ways is to use the `strip()` method.
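Before cleaning, it helps to confirm that stray whitespace really is the cause. The sketch below uses a small made-up Series (the `species` values are illustrative, not taken from the Excel file) to show that the printed labels look identical while the underlying strings differ.

```python
import pandas as pd

# toy data: the second value carries a trailing space
species = pd.Series(["Human", "Human ", "Ghost", "Human"], name="Species")

# value_counts() treats "Human" and "Human " as two different labels
print(species.value_counts())

# listing the unique values shows the quotes, so the extra space becomes visible
unique_values = list(species.unique())
print(unique_values)

# flag the rows whose value would change if the padding were removed
has_padding = species != species.str.strip()
print(int(has_padding.sum()), "value(s) with leading or trailing spaces")
```

Comparing a column with its stripped version is a quick way to count affected rows without modifying the data.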

#### Student Practice
Try to perform the following tasks on the indicated dataset. Then check your answers as I walk through the solutions. 

**Exercise:** Use the `strip()` method to remove leading or trailing spaces from the `Species` attribute. Print the value counts for the `Species` column to double check your work.

In [5]:
gryffindor['Species'] = gryffindor['Species'].str.strip()

In [6]:
gryffindor['Species'].value_counts()

Human                      39
Werewolf                    1
Human (Werewolf traits)     1
Ghost                       1
Name: Species, dtype: int64

Good. That took care of the trailing spaces for this column. We can now write a function that removes leading and trailing spaces from every text column of each DataFrame.

**Bonus Exercise:** Create a function called `whitespace` that takes as input a list of the Harry Potter DataFrames. This function should strip the leading and trailing spaces from each column that has an `object` data type for each DataFrame.

In [7]:
slytherin['Species'].value_counts()

Human     25
Human      2
Ghost      1
Name: Species, dtype: int64

In [8]:
df_list = [slytherin, loyalty]

def whitespace(df_list):
    for df in df_list:
        for col in df.columns:
            if df[col].dtype == 'object':
                df[col] = df[col].str.strip()

In [9]:
whitespace(df_list)

In [10]:
slytherin['Species'].value_counts()

Human    27
Ghost     1
Name: Species, dtype: int64
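One possible variant of this idea is sketched here on made-up data. The helper name `strip_text_columns` and the `houses` DataFrame are hypothetical and only serve as an illustration. Unlike the in-place loop, it returns a cleaned copy and strips only values that are actual strings, so a column that mixes text with numbers or missing values keeps its non-text entries.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def strip_text_columns(df):
    """Return a copy of df with leading/trailing spaces removed from string values."""
    cleaned = df.copy()
    for col in cleaned.columns:
        if is_object_dtype(cleaned[col]) or is_string_dtype(cleaned[col]):
            # strip real strings only; leave numbers and missing values untouched
            cleaned[col] = cleaned[col].map(
                lambda v: v.strip() if isinstance(v, str) else v
            )
    return cleaned


# hypothetical toy data with stray spaces and a mixed-type column
houses = pd.DataFrame({"Species": ["Human", "Human ", " Ghost"],
                       "Mixed": ["a ", 1, None],
                       "Count": [1, 2, 3]})

cleaned = strip_text_columns(houses)
print(cleaned["Species"].value_counts())
```

Returning a copy keeps the original DataFrame available for comparison, at the cost of some extra memory.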

### Adding Rows

#### Append
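Let's now look at the [append()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html) method. The `append()` method appends rows from the other DataFrame to the end of the caller, returning a new object. Note that `DataFrame.append()` was deprecated in Pandas 1.4 and removed in Pandas 2.0, so the `append()` cells in this notebook run only on older versions. On current versions, use `pd.concat()` instead.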

Note the following from the `Python Data Science Handbook`:

> Keep in mind that unlike the `append()` and `extend()` methods of Python lists, the `append()` method in Pandas does not modify the original object–instead it creates a new object with the combined data. It also is not a very efficient method, because it involves creation of a new index and data buffer. Thus, if you plan to do multiple append operations, it is generally better to build a list of DataFrames and pass them all at once to the `concat()` function.
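The pattern recommended in the quote can be sketched as follows. The `chunks` list and its contents are made up for illustration. The pieces are collected in an ordinary Python list and combined with a single `pd.concat()` call at the end.

```python
import pandas as pd

# collect the pieces in a plain Python list first (toy data for illustration)
chunks = []
for start in range(0, 9, 3):
    chunks.append(pd.DataFrame({"a": range(start, start + 3),
                                "b": range(start + 10, start + 13)}))

# one concat call at the end instead of repeated appends
combined = pd.concat(chunks, ignore_index=True)
print(combined.shape)
```

Appending to a Python list is cheap, so the cost of building a new index and data buffer is paid only once.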

In [11]:
# create a simple dataset
df1 = pd.DataFrame({'a': [1,2,3,4],
                  'b': [5,6,7,8]})

df1

Unnamed: 0,a,b
0,1,5
1,2,6
2,3,7
3,4,8


In [12]:
# create another simple dataset
df2 = pd.DataFrame({'a': [9,10,11,12],
                  'b': [13,14,15,16]})
df2

Unnamed: 0,a,b
0,9,13
1,10,14
2,11,15
3,12,16


In [13]:
# use append() to add rows of df2 to the bottom of df1, notice the duplicate indices
df3 = df1.append(df2)
df3


Unnamed: 0,a,b
0,1,5
1,2,6
2,3,7
3,4,8
0,9,13
1,10,14
2,11,15
3,12,16


In [14]:
# setup alternate DataFrame
df2_alternate = pd.DataFrame({'a': [9,10,11,12],
                  'c': [13,14,15,16]})
df2_alternate

Unnamed: 0,a,c
0,9,13
1,10,14
2,11,15
3,12,16


In [15]:
# column in df2_alternate that is not in df1 is added as new column
df3_alternate = df1.append(df2_alternate)
df3_alternate


Unnamed: 0,a,b,c
0,1,5.0,
1,2,6.0,
2,3,7.0,
3,4,8.0,
0,9,,13.0
1,10,,14.0
2,11,,15.0
3,12,,16.0


In [16]:
# argument `verify_integrity=True` raises a ValueError if the result would have duplicate indices
# df3_alternate = df1.append(df2_alternate, verify_integrity=True)
# df3_alternate
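The same check is available in `pd.concat()`. A minimal sketch on two made-up frames (`left` and `right` are illustrative names) shows that `verify_integrity=True` raises a `ValueError` when the combined index would contain duplicates.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]})
right = pd.DataFrame({"a": [3, 4]})  # same default index (0, 1) as `left`

try:
    pd.concat([left, right], verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print("duplicate index detected:", err)

# building a fresh index avoids the duplicates altogether
ok = pd.concat([left, right], ignore_index=True)
print(ok.index.is_unique)
```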

In [17]:
# using `ignore_index`
df3_alternate = df1.append(df2_alternate, ignore_index=True)
df3_alternate


Unnamed: 0,a,b,c
0,1,5.0,
1,2,6.0,
2,3,7.0,
3,4,8.0,
4,9,,13.0
5,10,,14.0
6,11,,15.0
7,12,,16.0


#### Concat

The Pandas [concat()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) function can be used for simple concatenation of Series and DataFrames along a particular axis. One important thing to note is that Pandas concatenation preserves indices, even if the result will have duplicate indices (similar to the `append()` method above). We'll see this more in a moment.

In [18]:
# remind ourselves of the DataFrame
df1

Unnamed: 0,a,b
0,1,5
1,2,6
2,3,7
3,4,8


In [19]:
# remind ourselves of the DataFrame
df2

Unnamed: 0,a,b
0,9,13
1,10,14
2,11,15
3,12,16


In [20]:
# using concat
pd.concat([df1,df2])

Unnamed: 0,a,b
0,1,5
1,2,6
2,3,7
3,4,8
0,9,13
1,10,14
2,11,15
3,12,16


In [21]:
# using axis=1
pd.concat([df1,df2], axis=1)

Unnamed: 0,a,b,a.1,b.1
0,1,5,9,13
1,2,6,10,14
2,3,7,11,15
3,4,8,12,16


In [22]:
# can pass keys for multi-index
pd.concat([df1,df2], axis=1, keys=['key1', 'key2'])

Unnamed: 0_level_0,key1,key1,key2,key2
Unnamed: 0_level_1,a,b,a,b
0,1,5,9,13
1,2,6,10,14
2,3,7,11,15
3,4,8,12,16


What is the difference between `concat` and `append`? `concat` gives you the flexibility to combine along either axis (stacking rows or placing columns side by side). `append` is the special case of `concat` with `axis=0` and `join='outer'`.

Let's review basic outer vs inner joins.

In [23]:
# create another DataFrame 
df4 = pd.DataFrame({'a': [17,18,19,20],
                  'c': [21,22,23,24]})
df4

Unnamed: 0,a,c
0,17,21
1,18,22
2,19,23
3,20,24


In [24]:
# outer join (df1 union df4)
pd.concat([df1,df4], join='outer')

Unnamed: 0,a,b,c
0,1,5.0,
1,2,6.0,
2,3,7.0,
3,4,8.0,
0,17,,21.0
1,18,,22.0
2,19,,23.0
3,20,,24.0


In [25]:
# inner join (df1 intersect df4)
pd.concat([df1,df4], join='inner')

Unnamed: 0,a
0,1
1,2
2,3
3,4
0,17
1,18
2,19
3,20


You should also notice that the index values did not change in the new DataFrame. This is useful when the index holds unique identifiers. In other circumstances, you don't want to keep the original index (for example, when the result would contain multiple rows with the same index). In these cases, you can use the `ignore_index=True` argument to build a new index.

In [26]:
# using default concat method
pd.concat([df1,df2])

Unnamed: 0,a,b
0,1,5
1,2,6
2,3,7
3,4,8
0,9,13
1,10,14
2,11,15
3,12,16


In [27]:
# using ignore_index=True
pd.concat([df1,df2], ignore_index=True)

Unnamed: 0,a,b
0,1,5
1,2,6
2,3,7
3,4,8
4,9,13
5,10,14
6,11,15
7,12,16


#### Student Practice
Try to perform the following tasks on the indicated dataset. Then check your answers as I walk through the solutions. 

We now want to create a new DataFrame that contains all the characters from the Harry Potter series. 

*Note: There are easier ways to do this all at once, but we want to call your attention to some things so we will add to this DataFrame a little at a time.*

**Exercise:** Use the `copy()` method to copy the `gryffindor` dataset to a new DataFrame that you will call `hp_characters`.

In [28]:
hp_characters = gryffindor.copy()
hp_characters

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood
5,Fred Weasley,Male,Student,Gryffindor,?,Human,Pure-blood
6,George Weasley,Male,Student,Gryffindor,?,Human,Pure-blood
7,Ginevra (Ginny) Molly Weasley,Female,Student,Gryffindor,Horse,Human,Pure-blood
8,Dean Thomas,Male,Student,Gryffindor,?,Human,Muggle-born
9,Seamus Finnigan,Male,Student,Gryffindor,Fox,Human,Half-blood


**Exercise:** Add the `slytherin` dataset to the new `hp_characters` DataFrame using the `append()` method.

In [29]:
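# note: append() returns a new DataFrame; the result is displayed here but not assigned,
# so hp_characters itself is unchanged after this cell
hp_characters.append(slytherin)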


Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood
...,...,...,...,...,...,...,...
89,Augustus Rookwood,Male,Unspeakable,Slytherin,,Human,Pure-blood or half-blood
90,Antonin Dolohov,Male,Death Eater,Slytherin,,Human,Pure-blood or Half-blood
91,Corban Yaxley,Male,Head of the Department of Magical Law Enforcement,Slytherin,,Human,Pure-blood or Half-blood
123,Albus Severus Potter,Male,Student,Slytherin,?,Human,Half-blood


**Exercise**: Add the `ravenclaw` data to `hp_characters` using the `concat` method.

In [30]:
hp_characters = pd.concat([hp_characters, ravenclaw])
hp_characters

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood
5,Fred Weasley,Male,Student,Gryffindor,?,Human,Pure-blood
6,George Weasley,Male,Student,Gryffindor,?,Human,Pure-blood
7,Ginevra (Ginny) Molly Weasley,Female,Student,Gryffindor,Horse,Human,Pure-blood
8,Dean Thomas,Male,Student,Gryffindor,?,Human,Muggle-born
9,Seamus Finnigan,Male,Student,Gryffindor,Fox,Human,Half-blood


Let's now take a look at the last Hogwarts house, Hufflepuff.

In [31]:
# view hufflepuff data
hufflepuff

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
76,Helga Hufflepuff,Female,Founder of Hufflepuff,Hufflepuff,?,Human,Pure-blood or half-blood,Food-related Charms. Many traditional Hogwarts...
77,Fat Friar,Male,Hufflepuff House Ghost,Hufflepuff,?,Ghost,,Curing peasants of the pox
79,Nymphadora Tonks,Female,Auror,Hufflepuff,"Jack rabbit, Wolf",Human,Half-blood,"Talented Auror, Metamorphmagus"
80,Pomona Sprout,Female,Professor of Herbology | Head of Hufflepuff House,Hufflepuff,Non-corporeal,Human,Pure-blood or half-blood,Herbology
81,Newton Scamander,Male,Employee in the Beast Division of the Departme...,Hufflepuff,?,Human,Pure-blood or half-blood,"Magizoology, Order of Merlin, Second Class"
139,Dr. Greg Longo,Male,Professor,Hufflepuff,Eagle,Human,Muggle-born,Can charm others using Muggle statistics
140,Dr. Mike Morabito,Male,Professor,Hufflepuff,Eagle,Human,Muggle-born,Builds robots to fight Voldemort
82,Cedric Diggory,Male,Student,Hufflepuff,?,Human,Pure-blood,Skilled Seeker
83,Justin Finch-Fletchley,Male,Student,Hufflepuff,Non-corporeal,Human,Muggle-born,
84,Zacharias Smith,Male,Student,Hufflepuff,?,Human,Pure-blood or half-blood,Chaser


Notice that this DataFrame has an extra attribute, `Skills`. Also, if you look at `ID` 139 and 140, there appears to be some bad data in the file. We'll clean that up in a moment. Let's see what happens when we add this data to the new DataFrame.

**Exercise:** Add `hufflepuff` to the `hp_characters` DataFrame. What happens with the `Skills` column?

In [32]:
hp_characters = pd.concat([hp_characters, hufflepuff])
hp_characters

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,
...,...,...,...,...,...,...,...,...
85,Hannah Abbott,Female,Student,Hufflepuff,Non-corporeal,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army"
86,Ernest Macmillan,Male,Student,Hufflepuff,Boar,Human,Pure-blood,"Revising, being a Prefect, getting the wrong e..."
87,Susan Bones,Female,Student,Hufflepuff,?,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army"
112,Edgar Bones,Male,,Hufflepuff,?,Human,Pure-blood or half-blood,Said to be a great wizard


Notice that there is now an extra column titled `Skills`. Pandas automatically filled in missing values for the rows that were already in `hp_characters`, since that data had no `Skills` column. Remember that `concat()` uses an outer join by default when combining data.

What if, instead, we wanted to pull all four separate DataFrames into a new DataFrame at one time? Again, there are multiple ways to do this.

**Exercise:** Create a new DataFrame titled `hp_characters2` and add all four DataFrames (`gryffindor`, `slytherin`, `ravenclaw`, and `hufflepuff`) at one time. Try to combine these datasets in one line of code.

In [33]:
hp_characters2 = pd.concat([gryffindor, slytherin, ravenclaw, hufflepuff])
hp_characters2

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,
...,...,...,...,...,...,...,...,...
85,Hannah Abbott,Female,Student,Hufflepuff,Non-corporeal,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army"
86,Ernest Macmillan,Male,Student,Hufflepuff,Boar,Human,Pure-blood,"Revising, being a Prefect, getting the wrong e..."
87,Susan Bones,Female,Student,Hufflepuff,?,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army"
112,Edgar Bones,Male,,Hufflepuff,?,Human,Pure-blood or half-blood,Said to be a great wizard


**Exercise:** Combine all four datasets into a new dataframe called `hp_characters3` using a new index. 

In [34]:
hp_characters3 = pd.concat([gryffindor, slytherin, ravenclaw, hufflepuff], ignore_index=True)
hp_characters3

Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,
...,...,...,...,...,...,...,...,...
98,Hannah Abbott,Female,Student,Hufflepuff,Non-corporeal,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army"
99,Ernest Macmillan,Male,Student,Hufflepuff,Boar,Human,Pure-blood,"Revising, being a Prefect, getting the wrong e..."
100,Susan Bones,Female,Student,Hufflepuff,?,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army"
101,Edgar Bones,Male,,Hufflepuff,?,Human,Pure-blood or half-blood,Said to be a great wizard


### Dropping Instances

#### Student Practice
Try to perform the following tasks on the indicated dataset. Then check your answers as I walk through the solutions. 

Using the `hp_characters` dataframe we created, let's practice removing instances from the data.

First, we will remove the two incorrect names that were listed in the `Hufflepuff` house data. Let's view the incorrect names by using their index.

In [35]:
# view incorrect names in dataset
hp_characters.loc[[139,140]]

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
139,Dr. Greg Longo,Male,Professor,Hufflepuff,Eagle,Human,Muggle-born,Can charm others using Muggle statistics
140,Dr. Mike Morabito,Male,Professor,Hufflepuff,Eagle,Human,Muggle-born,Builds robots to fight Voldemort


**Exercise:** Remove Dr. Longo from the dataset using his `ID` number.

In [36]:
hp_characters = hp_characters.drop(139, axis=0)
hp_characters

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,
...,...,...,...,...,...,...,...,...
85,Hannah Abbott,Female,Student,Hufflepuff,Non-corporeal,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army"
86,Ernest Macmillan,Male,Student,Hufflepuff,Boar,Human,Pure-blood,"Revising, being a Prefect, getting the wrong e..."
87,Susan Bones,Female,Student,Hufflepuff,?,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army"
112,Edgar Bones,Male,,Hufflepuff,?,Human,Pure-blood or half-blood,Said to be a great wizard


In [37]:
# view other incorrect entry
hp_characters[hp_characters['Patronus'] == 'Eagle']

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
140,Dr. Mike Morabito,Male,Professor,Hufflepuff,Eagle,Human,Muggle-born,Builds robots to fight Voldemort


Dr. Longo is now removed from the dataset. Now, let's remove Dr. Morabito by using his name.

**Exercise:** Remove Dr. Morabito from the dataset, but this time using his name. Hint: One way to do this is to filter the data and assign the filtered result back to the DataFrame.

In [38]:
hp_characters = hp_characters[hp_characters['Name'] != 'Dr. Mike Morabito']
hp_characters

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,
...,...,...,...,...,...,...,...,...
85,Hannah Abbott,Female,Student,Hufflepuff,Non-corporeal,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army"
86,Ernest Macmillan,Male,Student,Hufflepuff,Boar,Human,Pure-blood,"Revising, being a Prefect, getting the wrong e..."
87,Susan Bones,Female,Student,Hufflepuff,?,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army"
112,Edgar Bones,Male,,Hufflepuff,?,Human,Pure-blood or half-blood,Said to be a great wizard


In [39]:
# confirm that entry was removed
hp_characters[hp_characters['Patronus'] == 'Eagle']

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1


Now that we have cleaned those instances, there are some other instances that we need to remove. Take a look at the following:

In [40]:
hp_characters[hp_characters['Patronus'] == 'Lion']

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
141,Tywin Lannister,Male,Game of Thrones Character,Gryffindor,Lion,Human,Muggle-born,
142,Cersei Lannister,Female,Game of Thrones Character,Gryffindor,Lion,Human,Muggle-born,
143,Jaime Lannister,Male,Game of Thrones Character,Gryffindor,Lion,Human,Muggle-born,
144,Tyrion Lannister,Male,Game of Thrones Character,Gryffindor,Lion,Human,Muggle-born,
145,Joffrey Baratheon,Male,Game of Thrones Character,Gryffindor,Lion,Human,Muggle-born,


It looks like someone accidentally included Game of Thrones names in the dataset. Let's remove them all at once (there are a number of different ways to do this).

**Exercise:** Remove all instances of Game of Thrones characters from the dataset. Use `inplace=True`.

In [41]:
hp_characters.drop([141,142,143,144,145], inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  hp_characters.drop([141,142,143,144,145], inplace=True)


In [42]:
# or this way...
#hp_characters = hp_characters[hp_characters['Job'] != 'Game of Thrones Character']
#hp_characters

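Remember that by using `inplace=True`, you can modify the DataFrame without doing an assignment. However, you should be careful when using it. The warning above appears because `hp_characters` was itself created by filtering another DataFrame in an earlier cell, so Pandas cannot tell whether you meant to modify that filtered result or the data it came from. See the links below.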

https://stackoverflow.com/questions/45570984/in-pandas-is-inplace-true-considered-harmful-or-not

https://github.com/pandas-dev/pandas/issues/16529
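To make the difference concrete, here is a small sketch on made-up data (the `people` DataFrame and its IDs are hypothetical). It contrasts `drop()` with and without `inplace=True`, and takes an explicit `.copy()` after filtering, which is one common way to make clear that the filtered result is an independent object.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                       "Job": ["Student", "Professor", "Student", "Auror"]},
                      index=[10, 11, 12, 13])

# filtering returns a new object; .copy() makes it explicitly independent
students = people[people["Job"] == "Student"].copy()

# without inplace, drop() returns a new DataFrame and leaves `students` untouched
fewer = students.drop(10)

# with inplace=True, drop() returns None and modifies `students` itself
result = students.drop(12, inplace=True)

print(result)
print(students.index.tolist())
```

Because the in-place call returns `None`, writing `students = students.drop(12, inplace=True)` would overwrite the DataFrame with `None`, which is a frequent source of confusion.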

### Adding Columns
So far, we have been adding or removing rows from our DataFrame. Now, let's practice adding columns. 

#### Join
The [join()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html) method joins columns of another DataFrame either on index or on a key column. Let's set up two new DataFrames to check this out.

In [43]:
# create sample DataFrame
df5 = pd.DataFrame({'Animal': ['cat','dog','horse','cow','fox'],
                   'Color': ['white','brown','grey','black','brown']})
df5

Unnamed: 0,Animal,Color
0,cat,white
1,dog,brown
2,horse,grey
3,cow,black
4,fox,brown


In [44]:
# second sample DataFrame
df6 = pd.DataFrame({'Sound': ['meow','bark','neigh','moo',np.nan]})
df6

Unnamed: 0,Sound
0,meow
1,bark
2,neigh
3,moo
4,


In [45]:
# join the two DataFrames based on index
df5.join(df6)

Unnamed: 0,Animal,Color,Sound
0,cat,white,meow
1,dog,brown,bark
2,horse,grey,neigh
3,cow,black,moo
4,fox,brown,


In [46]:
# compare with append(), which stacks rows instead of adding columns
df5.append(df6)


Unnamed: 0,Animal,Color,Sound
0,cat,white,
1,dog,brown,
2,horse,grey,
3,cow,black,
4,fox,brown,
0,,,meow
1,,,bark
2,,,neigh
3,,,moo
4,,,


In [47]:
# using concat, axis=1
pd.concat([df5,df6], axis=1)

Unnamed: 0,Animal,Color,Sound
0,cat,white,meow
1,dog,brown,bark
2,horse,grey,neigh
3,cow,black,moo
4,fox,brown,


In [48]:
# third sample DataFrame but with slightly different index
df7 = pd.DataFrame({'Size': ['small','medium','large','large','huge']}, index=[0,1,2,3,5])
df7

Unnamed: 0,Size
0,small
1,medium
2,large
3,large
5,huge


In [49]:
# notice that index 4 has no Size (missing value) and that index 5 from df7 is dropped
df5.join(df7)

Unnamed: 0,Animal,Color,Size
0,cat,white,small
1,dog,brown,medium
2,horse,grey,large
3,cow,black,large
4,fox,brown,


In [50]:
# using how='right'
df5.join(df7, how='right')

Unnamed: 0,Animal,Color,Size
0,cat,white,small
1,dog,brown,medium
2,horse,grey,large
3,cow,black,large
5,,,huge
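The examples above all join on the index. Joining on a key column, the other option `join()` supports, can be sketched as follows. The `animals` and `sounds` frames are made-up illustrations: the `on` argument names a column in the calling DataFrame, and its values are matched against the index of the other DataFrame.

```python
import pandas as pd

animals = pd.DataFrame({"Animal": ["cat", "dog", "horse"],
                        "Color": ["white", "brown", "grey"]})

# lookup table indexed by animal name (hypothetical values)
sounds = pd.DataFrame({"Sound": ["bark", "meow"]}, index=["dog", "cat"])

# `on` refers to a column of `animals`; it is matched against the index of `sounds`
joined = animals.join(sounds, on="Animal")
print(joined)
```

The result keeps the index of `animals`, and `horse` gets a missing value because it has no entry in `sounds`.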


#### Merge

See the following regarding the [merge()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) method from the `Python Data Science Handbook`:

> One essential feature offered by Pandas is its high-performance, in-memory join and merge operations. If you have ever worked with databases, you should be familiar with this type of data interaction. The main interface for this is the pd.merge function

> The behavior implemented in `pd.merge()` is a subset of what is known as relational algebra, which is a formal set of rules for manipulating relational data, and forms the conceptual foundation of operations available in most databases. The strength of the relational algebra approach is that it proposes several primitive operations, which become the building blocks of more complicated operations on any dataset. With this lexicon of fundamental operations implemented efficiently in a database or other program, a wide range of fairly complicated composite operations can be performed. Pandas implements several of these fundamental building-blocks in the `pd.merge()` function and the related `join()` method of Series and Dataframes.

In [51]:
# create sample DataFrame
df1 = pd.DataFrame({'lkey': ['fizz', 'buzz', 'fizzbuzz', 'fizz'],
                    'value': [1, 2, 3, 5]})

df1 

Unnamed: 0,lkey,value
0,fizz,1
1,buzz,2
2,fizzbuzz,3
3,fizz,5


In [52]:
# create other sample DataFrame
df2 = pd.DataFrame({'rkey': ['fizz', 'buzz', 'fizzbuzz', 'fizz'],
                    'value': [5, 6, 7, 8]})
df2

Unnamed: 0,rkey,value
0,fizz,5
1,buzz,6
2,fizzbuzz,7
3,fizz,8


In [53]:
# merge on 'lkey' and 'rkey' columns
df1.merge(df2, left_on='lkey', right_on='rkey')

Unnamed: 0,lkey,value_x,rkey,value_y
0,fizz,1,fizz,5
1,fizz,1,fizz,8
2,fizz,5,fizz,5
3,fizz,5,fizz,8
4,buzz,2,buzz,6
5,fizzbuzz,3,fizzbuzz,7


In [54]:
# specifying suffixes to overlapping columns
df1.merge(df2, left_on='lkey', right_on='rkey',
          suffixes=('_left', '_right'))

Unnamed: 0,lkey,value_left,rkey,value_right
0,fizz,1,fizz,5
1,fizz,1,fizz,8
2,fizz,5,fizz,5
3,fizz,5,fizz,8
4,buzz,2,buzz,6
5,fizzbuzz,3,fizzbuzz,7


In [55]:
# raise an exception if the DataFrames have any overlapping columns.
df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False))

ValueError: columns overlap but no suffix specified: Index(['value'], dtype='object')

In [56]:
# let's look at another example
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'],
                             'data2': range(3)})

In [57]:
df1

Unnamed: 0,key,data1
0,b,0
1,b,1
2,a,2
3,c,3
4,a,4
5,a,5
6,b,6


In [58]:
df2

Unnamed: 0,key,data2
0,a,0
1,b,1
2,d,2


In [59]:
# this is an example of a many-to-one join
# inner join (by default)
pd.merge(df1, df2)

Unnamed: 0,key,data1,data2
0,b,0,1
1,b,1,1
2,b,6,1
3,a,2,0
4,a,4,0
5,a,5,0


Notice that for every "b" there was a "1" added to the `data2` column and a "0" added for every "a". It also doesn't include "c" from df1 and "d" from df2.

I didn't specify which column to join on, so Pandas uses the overlapping column names as the keys. It is good practice, though, to specify the key explicitly (`on='key'`).
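A slightly more defensive version of this merge is sketched below, on the same small data but with the illustrative names `left` and `right`. It names the key explicitly, uses `validate` to assert the many-to-one relationship, and uses `indicator=True` to record where each row came from.

```python
import pandas as pd

left = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "a", "b"],
                     "data1": range(7)})
right = pd.DataFrame({"key": ["a", "b", "d"],
                      "data2": range(3)})

# validate="many_to_one" raises a MergeError if keys in `right` are not unique;
# indicator=True adds a `_merge` column: "both", "left_only" or "right_only"
merged = pd.merge(left, right, on="key", how="outer",
                  validate="many_to_one", indicator=True)

print(merged["_merge"].value_counts())
```

The `_merge` column makes it easy to see which keys failed to match, here `c` (only in `left`) and `d` (only in `right`).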

In [60]:
# left outer join
pd.merge(df1, df2, how='left')

Unnamed: 0,key,data1,data2
0,b,0,1.0
1,b,1,1.0
2,a,2,0.0
3,c,3,
4,a,4,0.0
5,a,5,0.0
6,b,6,1.0


In [61]:
# right outer join
pd.merge(df1, df2, how='right')

Unnamed: 0,key,data1,data2
0,a,2.0,0
1,a,4.0,0
2,a,5.0,0
3,b,0.0,1
4,b,1.0,1
5,b,6.0,1
6,d,,2


In [62]:
# full outer join
pd.merge(df1, df2, how='outer')

Unnamed: 0,key,data1,data2
0,b,0.0,1.0
1,b,1.0,1.0
2,b,6.0,1.0
3,a,2.0,0.0
4,a,4.0,0.0
5,a,5.0,0.0
6,c,3.0,
7,d,,2.0


In [63]:
# creates the cartesian product from both frames
# preserves the order of the left keys
pd.merge(df1, df2, how='cross', suffixes=('_left','_right'))

Unnamed: 0,key_left,data1,key_right,data2
0,b,0,a,0
1,b,0,b,1
2,b,0,d,2
3,b,1,a,0
4,b,1,b,1
5,b,1,d,2
6,a,2,a,0
7,a,2,b,1
8,a,2,d,2
9,c,3,a,0


In [64]:
# create two simple DataFrames for card suits and card numbers
suits = pd.DataFrame({'suit': ['clubs','hearts','diamonds','spades']
                       })

numbers = pd.DataFrame({'card_number': ['A','2','3','4','5','6','7','8','9','10','J','Q','K']
                       })

In [65]:
suits

Unnamed: 0,suit
0,clubs
1,hearts
2,diamonds
3,spades


In [66]:
numbers

Unnamed: 0,card_number
0,A
1,2
2,3
3,4
4,5
5,6
6,7
7,8
8,9
9,10


In [67]:
# what are all the different combinations of suits and numbers?
deck = pd.merge(suits, numbers, how='cross')
deck

Unnamed: 0,suit,card_number
0,clubs,A
1,clubs,2
2,clubs,3
3,clubs,4
4,clubs,5
5,clubs,6
6,clubs,7
7,clubs,8
8,clubs,9
9,clubs,10
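
The display above shows only the first rows of `deck`. A quick sanity check for any cross join is that the result has `len(left) * len(right)` rows and no repeated pairs. The sketch below rebuilds the two frames so it runs on its own; `n_cards` and `n_duplicates` are illustrative names.

```python
import pandas as pd

suits = pd.DataFrame({'suit': ['clubs', 'hearts', 'diamonds', 'spades']})
numbers = pd.DataFrame({'card_number': ['A', '2', '3', '4', '5', '6', '7',
                                        '8', '9', '10', 'J', 'Q', 'K']})

deck = pd.merge(suits, numbers, how='cross')

# a cross join has len(left) * len(right) rows
n_cards = len(deck)

# every (suit, card_number) pair should appear exactly once
n_duplicates = int(deck.duplicated().sum())
```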


#### Student Practice
<u>Finish the rest of the notebook yourself.</u> Then check your answers as I walk through the solutions. 

Let's now use `join` to add the correct wand to each Harry Potter character.

In [68]:
# first, view the wand data
wand.head()

Unnamed: 0_level_0,Wand
ID,Unnamed: 1_level_1
0,"11"" Holly phoenix feather"
1,"12"" Ash unicorn tail hair"
2,"10¾"" vine wood dragon heartstring"
3,"15"" Elder Thestral tail hair core"
4,"13"" Cherry unicorn hair"


In [69]:
wand.shape

(100, 1)

The wand data contains the specific wand descriptions for each character in the Harry Potter dataset listed in ID order.

**Exercise**: Add the `wand` data as a column in the `hp_characters` dataset using the `join()` method.

In [70]:
hp_characters.shape

(68, 8)

In [71]:
#left join is default
hp_characters = hp_characters.join(wand)
hp_characters

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills,Wand
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,,"11"" Holly phoenix feather"
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,,"12"" Ash unicorn tail hair"
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,,"10¾"" vine wood dragon heartstring"
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,,"15"" Elder Thestral tail hair core"
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,,"13"" Cherry unicorn hair"
...,...,...,...,...,...,...,...,...,...
85,Hannah Abbott,Female,Student,Hufflepuff,Non-corporeal,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army",Unknown
86,Ernest Macmillan,Male,Student,Hufflepuff,Boar,Human,Pure-blood,"Revising, being a Prefect, getting the wrong e...",Unknown
87,Susan Bones,Female,Student,Hufflepuff,?,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army",Unknown
112,Edgar Bones,Male,,Hufflepuff,?,Human,Pure-blood or half-blood,Said to be a great wizard,Unknown
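
`join()` lines rows up by index label, not by position. The sketch below uses two tiny, made-up frames (`people` and `wands`, hypothetical stand-ins for `hp_characters` and `wand`) to show that a left join keeps every row of the calling frame and fills in a missing value where the other frame has no matching `ID`.

```python
import pandas as pd

# hypothetical miniature frames, both indexed by ID
people = pd.DataFrame({'Name': ['Harry', 'Ron', 'Edgar']},
                      index=pd.Index([0, 1, 112], name='ID'))
wands = pd.DataFrame({'Wand': ['Holly', 'Ash', 'Vine']},
                     index=pd.Index([0, 1, 2], name='ID'))

# left join on the index is the default for DataFrame.join
joined = people.join(wands)

# ID 112 has no entry in `wands`, so its Wand is missing
missing_wand = joined['Wand'].isna()
```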


Now let's take a look at the `color` dataframe. It lists the hair and eye color for each Harry Potter character.

In [72]:
# view color
color.head()

Unnamed: 0_level_0,Name,Hair colour,Eye colour
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Harry James Potter,Black,Bright green
1,Ronald Bilius Weasley,Red,Blue
2,Hermione Jean Granger,Brown,Brown
3,Albus Percival Wulfric Brian Dumbledore,Silver| formerly auburn,Blue
4,Neville Longbottom,Blond,


In [73]:
color.shape

(139, 3)

**Exercise:** Merge the `color` data with the `hp_characters` DataFrame using the `merge()` method.

In [74]:
# inner join is the default; with no `on` given, the shared `Name` column is the key
hp_characters = pd.merge(hp_characters, color)     # equivalent: hp_characters.merge(color)
hp_characters

Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills,Wand,Hair colour,Eye colour
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,,"11"" Holly phoenix feather",Black,Bright green
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,,"12"" Ash unicorn tail hair",Red,Blue
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,,"10¾"" vine wood dragon heartstring",Brown,Brown
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,,"15"" Elder Thestral tail hair core",Silver| formerly auburn,Blue
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,,"13"" Cherry unicorn hair",Blond,
...,...,...,...,...,...,...,...,...,...,...,...
63,Hannah Abbott,Female,Student,Hufflepuff,Non-corporeal,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army",Unknown,Blonde,Brown
64,Ernest Macmillan,Male,Student,Hufflepuff,Boar,Human,Pure-blood,"Revising, being a Prefect, getting the wrong e...",Unknown,Blond,
65,Susan Bones,Female,Student,Hufflepuff,?,Human,Half-blood,"Defensive spells, learned with Dumbledore's Army",Unknown,,
66,Edgar Bones,Male,,Hufflepuff,?,Human,Pure-blood or half-blood,Said to be a great wizard,Unknown,Reddish-brown,Grey
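
Notice that the merged result is numbered from 0 rather than by the original `ID`: merging on a column discards both indexes. If the IDs are needed later, one option is to move the index into a column before merging and restore it afterwards. This is a sketch on made-up frames (`left` and `right` are hypothetical), not the approach used in this notebook.

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Harry', 'Ron'],
                     'House': ['Gryffindor', 'Gryffindor']},
                    index=pd.Index([5, 7], name='ID'))
right = pd.DataFrame({'Name': ['Harry', 'Ron', 'Luna'],
                      'Hair colour': ['Black', 'Red', 'Blonde']},
                     index=pd.Index([5, 7, 20], name='ID'))

# merging on a column returns a fresh 0..n-1 index
by_name = pd.merge(left, right, on='Name')

# keep the left frame's ID: make it a column, merge, then set it as the index again
keep_id = left.reset_index().merge(right, on='Name').set_index('ID')
```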


There is also data for characters outside the Hogwarts Houses (i.e., those not in Gryffindor, Slytherin, Ravenclaw or Hufflepuff), but it has been split across two datasets. Let's take a look at them now.

In [75]:
other1.head()

Unnamed: 0_level_0,Name,Gender,Job,House
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
70,Cuthbert Binns,Male,Professor of History of Magic,
71,Barty Crouch Jr.,Male,Professor of Defence Against the Dark Arts (as...,
72,Charity Burbage,Female,Professor of Muggle Studies,
73,Firenze,Male,Professor of Divination,
92,Igor Karkaroff,Male,Headmaster of Durmstrang Institute,


In [76]:
other2.head()

Unnamed: 0_level_0,Given Name,Patronus,Species,Blood status,Wand,Eye colour,Hair colour
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
70,Cuthbert Binns,Unknown,Ghost,,Unknown,Black,White (balding)
71,Barty Crouch Jr.,Unknown,Human,Pure-blood or half-blood,Unknown,"Pale, freckled",Straw blond
72,Charity Burbage,Non-corporeal,Human,Pure-blood or half-blood,Unknown,,Blonde
73,Firenze,,Centaur,,,Astonishingly blue,White-blond
92,Igor Karkaroff,,Human,Pure-blood or Half-blood,Unknown,,Silver| formerly black


**Exercise:** Using `merge()` combine the `other1` and `other2` data into a new dataframe called `other_characters` keeping the respective `ID` for each instance. (check the Pandas documentation if you need help)

In [77]:
other_characters = pd.merge(other1, other2, left_index=True, right_index=True)
other_characters

Unnamed: 0_level_0,Name,Gender,Job,House,Given Name,Patronus,Species,Blood status,Wand,Eye colour,Hair colour
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
70,Cuthbert Binns,Male,Professor of History of Magic,,Cuthbert Binns,Unknown,Ghost,,Unknown,Black,White (balding)
71,Barty Crouch Jr.,Male,Professor of Defence Against the Dark Arts (as...,,Barty Crouch Jr.,Unknown,Human,Pure-blood or half-blood,Unknown,"Pale, freckled",Straw blond
72,Charity Burbage,Female,Professor of Muggle Studies,,Charity Burbage,Non-corporeal,Human,Pure-blood or half-blood,Unknown,,Blonde
73,Firenze,Male,Professor of Divination,,Firenze,,Centaur,,,Astonishingly blue,White-blond
92,Igor Karkaroff,Male,Headmaster of Durmstrang Institute,,Igor Karkaroff,,Human,Pure-blood or Half-blood,Unknown,,Silver| formerly black
93,Kingsley Shacklebolt,Male,Auror | Minister for Magic,,Kingsley Shacklebolt,Lynx,Human,Pure-blood,Unknown,Black,Bald
94,Alastor Moody,Male,Auror,,Alastor Moody,Non-corporeal,Human,Pure-blood,Unknown,"One dark, one electric blue",Grey
95,Alice Longbottom,Female,Auror,,Alice Longbottom,Unknown,Human,Pure-blood,Unknown,,Blonde
97,Rufus Scrimgeour,Male,Head of Auror Office| Minister for Magic,,Rufus Scrimgeour,Non-corporeal,Human,,Unknown,Yellowish,Tawny
98,Cornelius Oswald Fudge,Male,Minister for Magic,,Cornelius Oswald Fudge,Non-corporeal,Human,Pure-blood or Half-blood,Unknown,,Grey


Notice that both `Name` and `Given Name` are in the new DataFrame. Since they hold duplicate information, you could now remove one of the columns.
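
When the two frames do not share an index, the same pair of columns can serve as the join keys even though their names differ. The sketch below uses `left_on` and `right_on` on two made-up frames (`a` and `b` are hypothetical stand-ins for `other1` and `other2`).

```python
import pandas as pd

# hypothetical two-row stand-ins with differently named key columns
a = pd.DataFrame({'Name': ['Firenze', 'Alastor Moody'],
                  'Job': ['Professor of Divination', 'Auror']})
b = pd.DataFrame({'Given Name': ['Alastor Moody', 'Firenze'],
                  'Species': ['Human', 'Centaur']})

# match a['Name'] against b['Given Name']; both key columns are kept
matched = pd.merge(a, b, left_on='Name', right_on='Given Name')

# the two key columns carry the same values in every matched row
names_agree = bool((matched['Name'] == matched['Given Name']).all())
```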

**Bonus Exercise:** Merge the `other1` and `other2` data the same way as above, but also remove the `Given Name` field. Try to do this in one line of code.

In [78]:
other_characters = pd.merge(other1, other2, left_index=True, right_index=True).drop('Given Name', axis=1)
other_characters

Unnamed: 0_level_0,Name,Gender,Job,House,Patronus,Species,Blood status,Wand,Eye colour,Hair colour
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
70,Cuthbert Binns,Male,Professor of History of Magic,,Unknown,Ghost,,Unknown,Black,White (balding)
71,Barty Crouch Jr.,Male,Professor of Defence Against the Dark Arts (as...,,Unknown,Human,Pure-blood or half-blood,Unknown,"Pale, freckled",Straw blond
72,Charity Burbage,Female,Professor of Muggle Studies,,Non-corporeal,Human,Pure-blood or half-blood,Unknown,,Blonde
73,Firenze,Male,Professor of Divination,,,Centaur,,,Astonishingly blue,White-blond
92,Igor Karkaroff,Male,Headmaster of Durmstrang Institute,,,Human,Pure-blood or Half-blood,Unknown,,Silver| formerly black
93,Kingsley Shacklebolt,Male,Auror | Minister for Magic,,Lynx,Human,Pure-blood,Unknown,Black,Bald
94,Alastor Moody,Male,Auror,,Non-corporeal,Human,Pure-blood,Unknown,"One dark, one electric blue",Grey
95,Alice Longbottom,Female,Auror,,Unknown,Human,Pure-blood,Unknown,,Blonde
97,Rufus Scrimgeour,Male,Head of Auror Office| Minister for Magic,,Non-corporeal,Human,,Unknown,Yellowish,Tawny
98,Cornelius Oswald Fudge,Male,Minister for Magic,,Non-corporeal,Human,Pure-blood or Half-blood,Unknown,,Grey


**Exercise:** Add `other_characters` to the `hp_characters` DataFrame.

In [79]:
# DataFrame.append() was removed in pandas 2.0; pd.concat() is the replacement
hp_characters = pd.concat([hp_characters, other_characters])
hp_characters


Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills,Wand,Hair colour,Eye colour
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,,"11"" Holly phoenix feather",Black,Bright green
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,,"12"" Ash unicorn tail hair",Red,Blue
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,,"10¾"" vine wood dragon heartstring",Brown,Brown
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,,"15"" Elder Thestral tail hair core",Silver| formerly auburn,Blue
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,,"13"" Cherry unicorn hair",Blond,
...,...,...,...,...,...,...,...,...,...,...,...
134,Wilhelmina Grubbly-Plank,Female,Substitute professor of Care of Magical Creatu...,,Non-corporeal,Human,,,Unknown,Grey,
135,Fenrir Greyback,Male,,,,Werewolf,,,Unknown,Grey,
136,Gellert Grindelwald,Male,Revolutionary leader(c. 1920s[6]  1945),,,Human,Pure-blood or half-blood,,"15"", Elder, Thestral tail hair core",Blond,Blue
137,Dobby,Male,"Malfoy family's house-elf (? - 1993),\nHogwart...",,,House elf,,,,Green,


**Bonus Exercise**:

After merging all the data, we noticed a **big mistake**! Hagrid, one of our favorite characters, is missing from the data. Here is his information:

- **Name**: Rubeus Hagrid
- **Gender**: Male
- **Job**: Keeper of Keys and Grounds | Professor of Care of Magical Creatures
- **House**: Gryffindor
- **Patronus**: None
- **Species**: Half-Human/Half-Giant
- **Blood status**: Part-Human (Half-giant)
- **Wand**: 16" Oak unknown core
- **Hair colour**: Black
- **Eye colour**: Black

Add Hagrid to the `hp_characters` dataset. (try to Google how to do this if needed, then watch the solutions video for one option)

In [86]:
hp_characters.tail()

Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills,Wand,Hair colour,Eye colour
103,Fenrir Greyback,Male,,,,Werewolf,,,Unknown,Grey,
104,Gellert Grindelwald,Male,Revolutionary leader(c. 1920s[6]  1945),,,Human,Pure-blood or half-blood,,"15"", Elder, Thestral tail hair core",Blond,Blue
105,Dobby,Male,"Malfoy family's house-elf (? - 1993),\nHogwart...",,,House elf,,,,Green,
106,Kreacher,Male,"\nBlack family's house-elf (?-1996), \nHarry P...",,,House elf,,,,White,
107,Rubeus Hagrid,Male,Keeper of Keys and Grounds | Professor of Care...,Gryffindor,,Half-Human/Half-Giant,Part-Human (Half-giant),,"16"" Oak unknown core",Black,Black


In [87]:
hp_characters.shape

(108, 11)

In [83]:
hagrid_row = { 
'Name': 'Rubeus Hagrid',
'Gender': 'Male',
'Job': 'Keeper of Keys and Grounds | Professor of Care of Magical Creatures',
'House': 'Gryffindor',
'Patronus': None,
'Species': 'Half-Human/Half-Giant',
'Blood status': 'Part-Human (Half-giant)',
'Wand': '16" Oak unknown core',
'Hair colour': 'Black',
'Eye colour': 'Black' }


# DataFrame.append() was removed in pandas 2.0; wrap the dict in a one-row DataFrame and use pd.concat()
hp_characters = pd.concat([hp_characters, pd.DataFrame([hagrid_row])], ignore_index=True)
hp_characters


Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills,Wand,Hair colour,Eye colour
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,,"11"" Holly phoenix feather",Black,Bright green
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,,"12"" Ash unicorn tail hair",Red,Blue
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,,"10¾"" vine wood dragon heartstring",Brown,Brown
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,,"15"" Elder Thestral tail hair core",Silver| formerly auburn,Blue
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,,"13"" Cherry unicorn hair",Blond,
...,...,...,...,...,...,...,...,...,...,...,...
103,Fenrir Greyback,Male,,,,Werewolf,,,Unknown,Grey,
104,Gellert Grindelwald,Male,Revolutionary leader(c. 1920s[6]  1945),,,Human,Pure-blood or half-blood,,"15"", Elder, Thestral tail hair core",Blond,Blue
105,Dobby,Male,"Malfoy family's house-elf (? - 1993),\nHogwart...",,,House elf,,,,Green,
106,Kreacher,Male,"\nBlack family's house-elf (?-1996), \nHarry P...",,,House elf,,,,White,
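
`DataFrame.append()` was deprecated in pandas 1.4 and removed in 2.0, so on a current installation rows are added with `pd.concat()`. The sketch below, on made-up frames (`main`, `extra` and `new_row` are hypothetical), shows the two cases from this section: stacking two frames whose columns differ, and adding a single row that starts life as a dict.

```python
import pandas as pd

# hypothetical stand-ins; the two frames do not share all columns
main = pd.DataFrame({'Name': ['Harry James Potter'], 'House': ['Gryffindor']})
extra = pd.DataFrame({'Name': ['Dobby'], 'Species': ['House elf']}, index=[137])

# stack the frames; a column missing from one frame is filled with NaN,
# and ignore_index=True renumbers the rows from 0
stacked = pd.concat([main, extra], ignore_index=True)

# a dict becomes a one-row frame when wrapped in a list
new_row = {'Name': 'Rubeus Hagrid', 'House': 'Gryffindor'}
stacked = pd.concat([stacked, pd.DataFrame([new_row])], ignore_index=True)
```

Without `ignore_index=True` the original labels are kept, which is why an appended frame can end up with index values such as 134 to 137 after a run that started at 0.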


Next, view the `dead_characters` DataFrame which includes the date of death for any deceased characters.

In [88]:
# view dataframe
dead_characters.head()

Unnamed: 0_level_0,Name,Death
ID,Unnamed: 1_level_1,Unnamed: 2_level_1
3,Albus Percival Wulfric Brian Dumbledore,"30 June, 1997"
5,Fred Weasley,"2 May, 1998"
10,Lily J. Potter,"31 October, 1981"
11,James Potter,"31 October, 1981"
12,Sirius Black,"18 June, 1996"


**Exercise:** Create a DataFrame called `hp_dead_characters` using the `hp_characters` and `dead_characters` DataFrames. This new DataFrame should only include characters that are deceased. Don't worry about preserving index numbers.

In [91]:
hp_dead_characters = pd.merge(hp_characters, dead_characters, how='right')
hp_dead_characters

Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills,Wand,Hair colour,Eye colour,Death
0,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,,"15"" Elder Thestral tail hair core",Silver| formerly auburn,Blue,"30 June, 1997"
1,Fred Weasley,Male,Student,Gryffindor,?,Human,Pure-blood,,Unknown,Red,Brown,"2 May, 1998"
2,Lily J. Potter,Female,,Gryffindor,Doe,Human,Muggle-born,,"10¼"" Willow unknown core",Auburn,Bright green,"31 October, 1981"
3,James Potter,Male,,Gryffindor,Stag,Human,Pure-blood,,"11"" Mahogany unknown core",Black,Hazel,"31 October, 1981"
4,Sirius Black,Male,,Gryffindor,?,Human,Pure-blood,,Unknown,Black,Grey,"18 June, 1996"
5,Remus John Lupin,Male,Professor of Defence Against the Dark Arts,Gryffindor,Wolf,Werewolf,Half-blood,,"10¼"" Cypress unicorn hair",Light brown flecked with grey,Green,"2 May, 1998"
6,Peter Pettigrew,Male,The Servant of Lord Voldemort,Gryffindor,,Human,Half-blood or pure-blood,,"9¼"" Chestnut dragon heartstring",Colourless and balding,Blue,Late March 1998
7,Lavender Brown,Female,Student,Gryffindor,?,Human,Pure-blood,,Unknown,Blond,Blue,"2 May, 1998"
8,Colin Creevey,Male,Student,Gryffindor,?,Human,Muggle-born,,Unknown,Mousy,,"2 May, 1998"
9,Quirinus Quirrell,Male,Defence Against the Dark Arts(1991-1992),Ravenclaw,Non-corporeal,Human,Half-blood,,"9"" Alder unicorn hair bendy",,,4 June 1992


Next, look at the `loyalty` DataFrame. This includes a list of all characters with stated allegiance to Voldemort.

In [92]:
# view loyalty
loyalty.head()

Unnamed: 0_level_0,Name,Loyalty
ID,Unnamed: 1_level_1,Unnamed: 2_level_1
14,Peter Pettigrew,Lord Voldemort| Death Eaters
31,Quirinus Quirrell,Lord Voldemort
48,Bellatrix Lestrange,Lord Voldemort | Death Eaters
51,Lucius Malfoy,Lord Voldemort | Death Eaters
52,Narcissa Malfoy,Lord Voldemort | Death Eaters


**Exercise:** Create a DataFrame called `voldemort_army` that includes only characters loyal to Voldemort. Again, you don't need to preserve index numbers.

In [97]:
voldemort_army = pd.merge(hp_characters, loyalty, how='right')
voldemort_army

Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills,Wand,Hair colour,Eye colour,Loyalty
0,Peter Pettigrew,Male,The Servant of Lord Voldemort,Gryffindor,,Human,Half-blood or pure-blood,,"9¼"" Chestnut dragon heartstring",Colourless and balding,Blue,Lord Voldemort| Death Eaters
1,Quirinus Quirrell,Male,Defence Against the Dark Arts(1991-1992),Ravenclaw,Non-corporeal,Human,Half-blood,,"9"" Alder unicorn hair bendy",,,Lord Voldemort
2,Bellatrix Lestrange,,,,,,,,,,,Lord Voldemort | Death Eaters
3,Lucius Malfoy,,,,,,,,,,,Lord Voldemort | Death Eaters
4,Narcissa Malfoy,,,,,,,,,,,Lord Voldemort | Death Eaters
5,Rodolphus Lestrange,,,,,,,,,,,Lord Voldemort | Death Eaters
6,Barty Crouch Jr.,Male,Professor of Defence Against the Dark Arts (as...,,Unknown,Human,Pure-blood or half-blood,,Unknown,Straw blond,"Pale, freckled",Lord Voldemort | Death Eaters
7,Alecto Carrow,,,,,,,,,,,Lord Voldemort | Death Eaters
8,Amycus Carrow,,,,,,,,,,,Lord Voldemort | Death Eaters
9,Walden Macnair,,,,,,,,,,,Lord Voldemort | Death Eaters
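
The rows above that are empty apart from `Name` and `Loyalty` are names found in `loyalty` with no match in `hp_characters`: a right join keeps them, while an inner join would drop them. Which behaviour is wanted depends on the question being asked. The sketch below compares the two on made-up frames (`characters` and `allegiance` are hypothetical stand-ins).

```python
import pandas as pd

# hypothetical stand-ins for hp_characters and loyalty
characters = pd.DataFrame({'Name': ['Peter Pettigrew', 'Harry James Potter'],
                           'House': ['Gryffindor', 'Gryffindor']})
allegiance = pd.DataFrame({'Name': ['Peter Pettigrew', 'Bellatrix Lestrange'],
                           'Loyalty': ['Lord Voldemort', 'Lord Voldemort']})

# right join: every row of `allegiance` survives, matched or not
right = pd.merge(characters, allegiance, on='Name', how='right')

# inner join: only names present in both frames
inner = pd.merge(characters, allegiance, on='Name', how='inner')
```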


Now take a look at the `gryffindor_bloodstatus` and `gryffindor_species` dataframes.

In [98]:
gryffindor_bloodstatus.head()

Unnamed: 0_level_0,Name,Type
ID,Unnamed: 1_level_1,Unnamed: 2_level_1
0,Harry James Potter,Half-blood
1,Ronald Bilius Weasley,Pure-blood
2,Hermione Jean Granger,Muggle-born
3,Albus Percival Wulfric Brian Dumbledore,Half-blood
4,Neville Longbottom,Pure-blood


In [99]:
gryffindor_species.head()

Unnamed: 0_level_0,Name,Gender,Job,House,Type
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,Harry James Potter,Male,Student,Gryffindor,Human
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Human
2,Hermione Jean Granger,Female,Student,Gryffindor,Human
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Human
4,Neville Longbottom,Male,Student,Gryffindor,Human


**Exercise:** Merge these two DataFrames together into a new DataFrame called `gryffindor2`. Notice that they both have a column called `Type` that represents different attributes. Add a suffix to the `Type` attributes so we better understand what each attribute represents.

In [100]:
#merge with 2 columns with same name w/o specifying suffix
pd.merge(gryffindor_bloodstatus, gryffindor_species, on='Name').head()

Unnamed: 0,Name,Type_x,Gender,Job,House,Type_y
0,Harry James Potter,Half-blood,Male,Student,Gryffindor,Human
1,Ronald Bilius Weasley,Pure-blood,Male,Student,Gryffindor,Human
2,Hermione Jean Granger,Muggle-born,Female,Student,Gryffindor,Human
3,Albus Percival Wulfric Brian Dumbledore,Half-blood,Male,Headmaster,Gryffindor,Human
4,Neville Longbottom,Pure-blood,Male,Student,Gryffindor,Human


In [102]:
#now merging while giving the two columns with the same name separate suffixes
gryffindor2 = pd.merge(gryffindor_bloodstatus, gryffindor_species, on='Name', suffixes=('_bloodstatus', '_species'))
gryffindor2.head()

Unnamed: 0,Name,Type_bloodstatus,Gender,Job,House,Type_species
0,Harry James Potter,Half-blood,Male,Student,Gryffindor,Human
1,Ronald Bilius Weasley,Pure-blood,Male,Student,Gryffindor,Human
2,Hermione Jean Granger,Muggle-born,Female,Student,Gryffindor,Human
3,Albus Percival Wulfric Brian Dumbledore,Half-blood,Male,Headmaster,Gryffindor,Human
4,Neville Longbottom,Pure-blood,Male,Student,Gryffindor,Human
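
Suffixes are applied only to overlapping columns that are not join keys. An alternative, sketched below on made-up one-row frames (`blood` and `species` are hypothetical), is to rename the clashing columns before merging so the result gets fully descriptive names.

```python
import pandas as pd

blood = pd.DataFrame({'Name': ['Harry James Potter'], 'Type': ['Half-blood']})
species = pd.DataFrame({'Name': ['Harry James Potter'], 'Type': ['Human']})

# option 1: let merge disambiguate with suffixes
with_suffixes = pd.merge(blood, species, on='Name',
                         suffixes=('_bloodstatus', '_species'))

# option 2: rename first, so no suffixes are needed
renamed = pd.merge(blood.rename(columns={'Type': 'Blood status'}),
                   species.rename(columns={'Type': 'Species'}),
                   on='Name')
```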


### Removing Columns

Now that we've had practice adding columns and merging DataFrames, let's practice removing columns.

In [103]:
# refresh DataFrame
hp_characters.head()

Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Skills,Wand,Hair colour,Eye colour
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,,"11"" Holly phoenix feather",Black,Bright green
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,,"12"" Ash unicorn tail hair",Red,Blue
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,,"10¾"" vine wood dragon heartstring",Brown,Brown
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,,"15"" Elder Thestral tail hair core",Silver| formerly auburn,Blue
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,,"13"" Cherry unicorn hair",Blond,


**Exercise:** Since the `Skills` column is mostly filled with null values, remove this column from the `hp_characters` DataFrame.

In [105]:
hp_characters = hp_characters.drop('Skills', axis=1)
hp_characters

Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Wand,Hair colour,Eye colour
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,"11"" Holly phoenix feather",Black,Bright green
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,"12"" Ash unicorn tail hair",Red,Blue
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,"10¾"" vine wood dragon heartstring",Brown,Brown
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,"15"" Elder Thestral tail hair core",Silver| formerly auburn,Blue
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,"13"" Cherry unicorn hair",Blond,
...,...,...,...,...,...,...,...,...,...,...
103,Fenrir Greyback,Male,,,,Werewolf,,Unknown,Grey,
104,Gellert Grindelwald,Male,Revolutionary leader(c. 1920s[6]  1945),,,Human,Pure-blood or half-blood,"15"", Elder, Thestral tail hair core",Blond,Blue
105,Dobby,Male,"Malfoy family's house-elf (? - 1993),\nHogwart...",,,House elf,,,Green,
106,Kreacher,Male,"\nBlack family's house-elf (?-1996), \nHarry P...",,,House elf,,,White,
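
Before dropping a column for being "mostly null", it can help to measure how empty it is. The sketch below, on a made-up frame (`df` is hypothetical), computes the share of missing values per column and then drops the column with the `columns=` keyword, which is equivalent to passing `axis=1`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Harry', 'Ron', 'Hannah'],
                   'Skills': [np.nan, np.nan, 'Defensive spells']})

# share of missing values per column, between 0.0 and 1.0
null_share = df.isna().mean()

# drop returns a new frame; `df` itself is left unchanged
trimmed = df.drop(columns='Skills')
```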


### Updating Value in DataFrame

Notice that some values are labeled with a `?`. These are values where the data is unknown; in other words, the books or movies never mention it.

In [106]:
# see Fred and George Weasley's Patronus column as an example
hp_characters.head(10)

Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Wand,Hair colour,Eye colour
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,"11"" Holly phoenix feather",Black,Bright green
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,"12"" Ash unicorn tail hair",Red,Blue
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,"10¾"" vine wood dragon heartstring",Brown,Brown
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,"15"" Elder Thestral tail hair core",Silver| formerly auburn,Blue
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,"13"" Cherry unicorn hair",Blond,
5,Fred Weasley,Male,Student,Gryffindor,?,Human,Pure-blood,Unknown,Red,Brown
6,George Weasley,Male,Student,Gryffindor,?,Human,Pure-blood,Unknown,Red,Brown
7,Ginevra (Ginny) Molly Weasley,Female,Student,Gryffindor,Horse,Human,Pure-blood,Unknown,Red,Bright brown
8,Dean Thomas,Male,Student,Gryffindor,?,Human,Muggle-born,Unknown,Black,Brown
9,Seamus Finnigan,Male,Student,Gryffindor,Fox,Human,Half-blood,Unknown,Sandy,


**Exercise:** Change all values of `?` to the string `'Unknown'` to better represent what this data means.

In [108]:
hp_characters = hp_characters.replace('?', 'Unknown')
hp_characters.head(10)

Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Wand,Hair colour,Eye colour
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,"11"" Holly phoenix feather",Black,Bright green
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,"12"" Ash unicorn tail hair",Red,Blue
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,"10¾"" vine wood dragon heartstring",Brown,Brown
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,"15"" Elder Thestral tail hair core",Silver| formerly auburn,Blue
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,"13"" Cherry unicorn hair",Blond,
5,Fred Weasley,Male,Student,Gryffindor,Unknown,Human,Pure-blood,Unknown,Red,Brown
6,George Weasley,Male,Student,Gryffindor,Unknown,Human,Pure-blood,Unknown,Red,Brown
7,Ginevra (Ginny) Molly Weasley,Female,Student,Gryffindor,Horse,Human,Pure-blood,Unknown,Red,Bright brown
8,Dean Thomas,Male,Student,Gryffindor,Unknown,Human,Muggle-born,Unknown,Black,Brown
9,Seamus Finnigan,Male,Student,Gryffindor,Fox,Human,Half-blood,Unknown,Sandy,
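
With a plain string, `replace()` swaps only cells whose entire value equals that string; a `?` inside a longer text (such as a job description with unknown dates) is left alone. The sketch below shows this on a made-up frame (`df` is hypothetical).

```python
import pandas as pd

df = pd.DataFrame({'Patronus': ['?', 'Stag'],
                   'Job': ["Malfoy family's house-elf (? - 1993)", 'Student']})

# whole-cell match only: the '?' inside the Job text is untouched
cleaned = df.replace('?', 'Unknown')
```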


### Renaming Columns

The data has the British spelling for colour. Since this data will be used in the US, we should rename these columns.

**Exercise:** Rename the word `colour` to `color` in the respective columns for the `hp_characters` dataframe.

In [110]:
hp_characters = hp_characters.rename({'Hair colour': 'Hair color', 'Eye colour': 'Eye color'}, axis=1)
hp_characters

Unnamed: 0,Name,Gender,Job,House,Patronus,Species,Blood status,Wand,Hair color,Eye color
0,Harry James Potter,Male,Student,Gryffindor,Stag,Human,Half-blood,"11"" Holly phoenix feather",Black,Bright green
1,Ronald Bilius Weasley,Male,Student,Gryffindor,Jack Russell terrier,Human,Pure-blood,"12"" Ash unicorn tail hair",Red,Blue
2,Hermione Jean Granger,Female,Student,Gryffindor,Otter,Human,Muggle-born,"10¾"" vine wood dragon heartstring",Brown,Brown
3,Albus Percival Wulfric Brian Dumbledore,Male,Headmaster,Gryffindor,Phoenix,Human,Half-blood,"15"" Elder Thestral tail hair core",Silver| formerly auburn,Blue
4,Neville Longbottom,Male,Student,Gryffindor,Non-corporeal,Human,Pure-blood,"13"" Cherry unicorn hair",Blond,
...,...,...,...,...,...,...,...,...,...,...
103,Fenrir Greyback,Male,,,,Werewolf,,Unknown,Grey,
104,Gellert Grindelwald,Male,Revolutionary leader(c. 1920s[6]  1945),,,Human,Pure-blood or half-blood,"15"", Elder, Thestral tail hair core",Blond,Blue
105,Dobby,Male,"Malfoy family's house-elf (? - 1993),\nHogwart...",,,House elf,,,Green,
106,Kreacher,Male,"\nBlack family's house-elf (?-1996), \nHarry P...",,,House elf,,,White,
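
When many columns need the same spelling change, `rename()` also accepts a function that is applied to every column name. This is a sketch on an empty, made-up frame (`df` is hypothetical); it is one alternative to listing each column in a dict.

```python
import pandas as pd

df = pd.DataFrame(columns=['Name', 'Hair colour', 'Eye colour'])

# apply the same string substitution to every column name
renamed = df.rename(columns=lambda c: c.replace('colour', 'color'))
```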


### Perform DataFrame Operation for New Column

Let's set up a new DataFrame called `hp_characters_new` that contains information about some of our favorite characters.

In [111]:
# data for new dataframe
data = {'Name': ['Harry Potter', 'Ron Weasley', 'Hermione Granger', 'Ginny Weasley', 'Rubeus Hagrid','Neville Longbottom'], 
            'Job': ['Student', 'Student','Student','Student','Keeper of Keys and Grounds','Student'],
            'House': ['Gryffindor','Gryffindor','Gryffindor','Gryffindor','Gryffindor','Gryffindor'],
            'Birth Year': [1980,1980,1979,1980,1928,1980]}

# create new dataframe with the above data
hp_characters_new = pd.DataFrame(data)
hp_characters_new

Unnamed: 0,Name,Job,House,Birth Year
0,Harry Potter,Student,Gryffindor,1980
1,Ron Weasley,Student,Gryffindor,1980
2,Hermione Granger,Student,Gryffindor,1979
3,Ginny Weasley,Student,Gryffindor,1980
4,Rubeus Hagrid,Keeper of Keys and Grounds,Gryffindor,1928
5,Neville Longbottom,Student,Gryffindor,1980


**Bonus Exercise:** Add a new column called `Age` that tells how old each character would be if they were 'alive' today.

In [123]:
import datetime
current_year = datetime.datetime.now().year

current_age = []

for i in hp_characters_new['Birth Year']:
    current_age.append(current_year - i)

In [124]:
current_age

[42, 42, 43, 42, 94, 42]

In [125]:
hp_characters_new['Age'] = current_age
hp_characters_new

Unnamed: 0,Name,Job,House,Birth Year,Age
0,Harry Potter,Student,Gryffindor,1980,42
1,Ron Weasley,Student,Gryffindor,1980,42
2,Hermione Granger,Student,Gryffindor,1979,43
3,Ginny Weasley,Student,Gryffindor,1980,42
4,Rubeus Hagrid,Keeper of Keys and Grounds,Gryffindor,1928,94
5,Neville Longbottom,Student,Gryffindor,1980,42
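
The loop works, but Pandas can do the same subtraction on the whole column at once. The sketch below uses a shortened, made-up version of the frame (`df` is hypothetical) and computes the age in a single vectorized step.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'Name': ['Harry Potter', 'Rubeus Hagrid'],
                   'Birth Year': [1980, 1928]})

current_year = datetime.datetime.now().year

# subtracting a Series from a scalar is applied element by element
df['Age'] = current_year - df['Birth Year']
```

Like the loop, this ignores birthdays within the year, so each age can be off by one.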


### Other Questions

For further practice with data analysis, here are some additional questions to try to answer on your own.

- How many of the characters in the `hp_characters` DataFrame are female? How many are male?
- How many characters are there in each house?
- What are the various species represented in the dataset?
- List the various value counts for blood status.
- What job is the most represented in the data?