# Lab 2: Pandas Overview

**If you are not attending lab, this assignment is due 09/05/2017 at 11:59pm (graded on accuracy)**

Pandas is one of the most widely used Python libraries in data science. In this lab, you will learn commonly used data wrangling operations/tools in Pandas. We aim to give you familiarity with:

* Creating dataframes
* Slicing data frames (i.e., selecting rows and columns)
* Filtering data (using boolean arrays)
* Data Aggregation/Grouping dataframes
* Joining tables
* Handling NA/Null values

## Setup

In [None]:
import pandas as pd
import numpy as np

# These lines load the tests.
!pip install -U okpy

from client.api.notebook import Notebook
ok = Notebook('lab02.ok')

import os
auth_refresh = os.path.join(os.path.expanduser('~'), '.config', 'ok', 'auth_refresh')
if os.path.exists(auth_refresh):
    os.remove(auth_refresh)

ok.auth()

## Creating DataFrames & Basic Manipulations

A dataframe is a two-dimensional labeled data structure with columns of potentially different types.

**Method 1:** You can create a data frame by specifying the columns and values as shown below.

In [2]:
fruit_info = pd.DataFrame(
    data={'fruit': ['apple', 'orange', 'banana', 'raspberry'],
          'color': ['red', 'orange', 'yellow', 'pink']
          })
fruit_info

Unnamed: 0,color,fruit
0,red,apple
1,orange,orange
2,yellow,banana
3,pink,raspberry


**Method 2:** You can also define a dataframe by specifying the rows, as shown below.

In [3]:
fruit_info2 = pd.DataFrame([("red", "apple"), ("orange", "orange"), ("yellow", "banana"),
                            ("pink", "raspberry")], columns = ["color", "fruit"])
fruit_info2

Unnamed: 0,color,fruit
0,red,apple
1,orange,orange
2,yellow,banana
3,pink,raspberry


### Question 1

You can add a column with `dataframe['new column name'] = [data]`. Add a column called `rank` to the `fruit_info` table that contains 1, 2, 3, or 4 for each fruit, based on your personal preference ordering.


In [4]:
...

Ellipsis

In [4]:
#SOLUTION CELL
fruit_info["rank"] = [1, 2, 3, 4]
fruit_info

Unnamed: 0,color,fruit,rank
0,red,apple,1
1,orange,orange,2
2,yellow,banana,3
3,pink,raspberry,4


In [5]:
_ = ok.grade('q01')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.


OAuthException: 

### Question 2

You can obtain the dimensions of a dataframe by using the shape attribute `dataframe.shape`. How many rows and columns are in the dataframe you modified above?
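
As a minimal sketch of how `shape` behaves, the cell below builds a small hypothetical table (`pets`, which is not part of the lab data) and unpacks its dimensions. The names `pets`, `n_rows`, and `n_cols` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy table (not part of the lab data): 3 rows, 2 columns.
pets = pd.DataFrame({"name": ["Rex", "Tom", "Bubbles"],
                     "species": ["dog", "cat", "fish"]})

# .shape is an attribute (no parentheses) holding (number of rows, number of columns).
n_rows, n_cols = pets.shape
print(n_rows, n_cols)
```

Note that the index is not counted as a column.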

In [7]:
num_rows = ...
num_columns = ...

In [8]:
#SOLUTION CELL
num_rows = fruit_info.shape[0]
num_columns = fruit_info.shape[1]

In [9]:
_ = ok.grade('q02')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/BByl8Q
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



### Question 3

Use the `.drop()` method to drop the `rank` column you created.

In [10]:
fruit_info_original = ...

In [11]:
#SOLUTION CELL
fruit_info_original = fruit_info.drop("rank", axis = 1)

In [12]:
_ = ok.grade('q03')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/ERBrZW
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



### Question 4 

Use the `.drop()` method to drop the last row of the `fruit_info_original` table. (Hint: pay attention to the `axis` argument!)
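
To illustrate what the `axis` argument changes, here is a short sketch on a hypothetical table (`scores` is invented for this example and is unrelated to the fruit data).

```python
import pandas as pd

# Hypothetical toy table, unrelated to the lab data.
scores = pd.DataFrame({"student": ["Ana", "Ben", "Cy"],
                       "quiz": [9, 7, 8],
                       "bonus": [1, 0, 2]})

# axis=1 treats the label as a column name; a new dataframe is returned.
without_bonus = scores.drop("bonus", axis=1)

# axis=0 (the default) treats the label as a row index label.
without_first_row = scores.drop(0, axis=0)

# Neither call changed `scores`, because inplace defaults to False.
print(without_bonus.columns.tolist())
print(without_first_row.index.tolist())
print(scores.shape)
```

Passing `inplace=True` would modify the dataframe directly and return `None` instead of a new dataframe.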

In [13]:
...

Ellipsis

In [14]:
#SOLUTION CELL
fruit_info_original.drop(3, axis = 0, inplace = True)

In [15]:
_ = ok.grade('q04')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Could not save your notebook. Make sure your notebook is saved before sending it to OK!
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/J6RGYK
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



### Question 5

Use the `.rename()` method to rename the columns of `fruit_info_original` so they begin with a capital letter.
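
The sketch below shows two ways `rename` can be called, using a hypothetical `trips` table that is not part of the lab. Both the table and the new column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy table, unrelated to the lab data.
trips = pd.DataFrame({"from": ["SFO", "OAK"], "to": ["JFK", "LAX"]})

# A dict maps old column names to new ones; unlisted columns are left alone.
renamed = trips.rename(columns={"from": "origin"})

# A function is applied to every column name instead.
shouted = trips.rename(columns=str.upper)

print(renamed.columns.tolist())
print(shouted.columns.tolist())
print(trips.columns.tolist())  # the original is unchanged
```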

In [16]:
...

Ellipsis

In [17]:
#SOLUTION CELL
fruit_info_original.rename(columns = {"color":"Color", "fruit":"Fruit"}, inplace = True)

In [18]:
_ = ok.grade('q05')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/KrRKYG
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



Now that we have learned the basics, we create two dataframes below. We will be cleaning and wrangling these data frames for the remainder of the lab.

In [19]:
popular_songs = pd.DataFrame(
    data={'song name': ['Thinking Out Loud', 'One Dance', 'Sorry', 
                    'Closer', 'Decpasito', 'Lean On'],
          'number of streams': [770, 1011, 828, 678, 500, 909],
         'artist': ["Ed Sheeran", "Drake", "Justin Bieber", "Chainsmokers", "Justin Bieber", "Major Lazer"]
         }
)

top_2017_albums = pd.DataFrame(
    data={'album name': ['Starboy', 'Divide', 'More Life',
                  '24k Magic', 'A Head Full of Dreams', 
                  'A Head Full of Dreams'],
          'artist': ['The Weeknd', 'Ed Sheeran', 'Drake', 'Bruno Mars',
                        'Coldplay', 'Coldplay']}
)

In [20]:
popular_songs

Unnamed: 0,artist,number of streams,song name
0,Ed Sheeran,770,Thinking Out Loud
1,Drake,1011,One Dance
2,Justin Bieber,828,Sorry
3,Chainsmokers,678,Closer
4,Justin Bieber,500,Decpasito
5,Major Lazer,909,Lean On


In [21]:
top_2017_albums

Unnamed: 0,album name,artist
0,Starboy,The Weeknd
1,Divide,Ed Sheeran
2,More Life,Drake
3,24k Magic,Bruno Mars
4,A Head Full of Dreams,Coldplay
5,A Head Full of Dreams,Coldplay


## Slicing Data Frames - selecting rows and columns


### Selection Using Label

**Column Selection** 
To select a column of a `DataFrame` by column label, the safest and fastest way is to use the `.loc` indexer. General usage looks like `frame.loc[rowname, colname]`. (Reminder: the colon `:` means "everything".) For example, if we want the `color` column of the `fruit_info` data frame, we would use `fruit_info.loc[:, 'color']`.

- You can also slice across columns. For example, `frame.loc[:, 'colname':]` selects the column `colname` and every column after it.

- *Alternative:* While `.loc` is invaluable when writing production code, it may be a little too verbose for interactive use. One recommended alternative is the `[]` operator, which takes the form `frame['colname']`.

**Row Selection**
Similarly, if we want to select a row by its label, we can use the same `.loc` indexer. In this case, the "label" of each row refers to the index (i.e., the primary key) of the dataframe.
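
The following sketch collects the label-based selections described above in one place. It uses a hypothetical `cities` table with string row labels, so none of the names here come from the lab data.

```python
import pandas as pd

# Hypothetical toy table with string row labels (not the lab data).
cities = pd.DataFrame({"country": ["France", "Japan", "Peru"],
                       "population_m": [2.1, 14.0, 10.0]},
                      index=["Paris", "Tokyo", "Lima"])

one_column = cities.loc[:, "country"]          # all rows, one column (a Series)
one_row = cities.loc["Tokyo", :]               # one row, all columns (a Series)
one_cell = cities.loc["Lima", "population_m"]  # a single value

# Label slices include BOTH endpoints, unlike ordinary Python slices.
row_range = cities.loc["Paris":"Tokyo", :]
print(row_range.index.tolist())
```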

In [22]:
#Example:
top_2017_albums.loc[:, 'album name']

0                  Starboy
1                   Divide
2                More Life
3                24k Magic
4    A Head Full of Dreams
5    A Head Full of Dreams
Name: album name, dtype: object

### Question 6a

Selecting multiple columns is easy.  You just need to supply a list of column names.  Select the `song name` and `number of streams` from the `popular_songs` table.

In [23]:
song_and_streams = ...

In [24]:
#SOLUTION CELL
song_and_streams = popular_songs.loc[:, ['song name', 'number of streams']]
song_and_streams

Unnamed: 0,song name,number of streams
0,Thinking Out Loud,770
1,One Dance,1011
2,Sorry,828
3,Closer,678
4,Decpasito,500
5,Lean On,909


In [25]:
_ = ok.grade('q6a')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Could not save your notebook. Make sure your notebook is saved before sending it to OK!
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/M86OEm
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



As you may have noticed above, passing a list of column names to `.loc` is also a way to reorder the columns within a dataframe.

### Question 6b

One of the important components of a dataframe is the **index**. The index labels each row of a dataframe, ideally uniquely. Notice that the index of the `popular_songs` table is numerical. Since the granularity of the `popular_songs` dataframe is one row per song, use the `set_index()` method to make `song name` the index of the dataframe. (This will be useful for row selection in the next problem.)

In [26]:
...

Ellipsis

In [27]:
#SOLUTION CELL
popular_songs.set_index("song name", inplace = True)

In [28]:
_ = ok.grade('q6b')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/NkRQE2
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



In [29]:
popular_songs

Unnamed: 0_level_0,artist,number of streams
song name,Unnamed: 1_level_1,Unnamed: 2_level_1
Thinking Out Loud,Ed Sheeran,770
One Dance,Drake,1011
Sorry,Justin Bieber,828
Closer,Chainsmokers,678
Decpasito,Justin Bieber,500
Lean On,Major Lazer,909


**Note:** Now try selecting the `song name` index from the table above. Although it looks like a column, it cannot be accessed in the same way as columns. If you would like to turn `song name` back into a column, you can call the `reset_index()` method.
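
A small sketch of the round trip between a column and the index, on a hypothetical `books` table (the table and its values are invented for illustration):

```python
import pandas as pd

# Hypothetical toy table, unrelated to the lab data.
books = pd.DataFrame({"title": ["Dune", "Emma"], "pages": [412, 474]})

# set_index moves "title" out of the columns and into the index.
indexed = books.set_index("title")
print("title" in indexed.columns)
print(indexed.index.tolist())

# reset_index moves it back and restores a default integer index.
restored = indexed.reset_index()
print(restored.columns.tolist())
```

While `title` is the index, it is reachable through `indexed.index` rather than `indexed['title']`.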

### Question 6c

Using `.loc` slicing, select the middle 4 rows (and all of the columns) of the `popular_songs` table using the index defined above.

In [30]:
popular_songs_small = ...

In [31]:
#SOLUTION CELL
popular_songs_small = popular_songs.loc["One Dance": "Decpasito", :]

In [32]:
_ = ok.grade('q6c')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/PN8XMA
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



### Selection using position/location

If you want to select rows and columns by position, the data frame has an analogous `.iloc` indexer for integer indexing. General usage looks like `frame.iloc[row position, column position]`. Remember that Python indexing starts at 0. Also remember that you can use `:` to slice across rows and columns, as in the previous question.
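
As a sketch of position-based selection, the cell below uses a hypothetical `grid` table whose row labels are letters, to make clear that `.iloc` ignores labels and counts positions only.

```python
import pandas as pd

# Hypothetical toy table; row labels are letters so positions and labels differ.
grid = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]},
                    index=["w", "x", "y", "z"])

# Positional slices exclude the end position, like ordinary Python slices.
top_left = grid.iloc[:2, :2]   # rows at positions 0-1, columns at positions 0-1

last_row = grid.iloc[-1, :]    # negative positions count from the end
cell = grid.iloc[1, 2]         # second row, third column

print(top_left)
print(cell)
```

This end-exclusive behaviour differs from `.loc`, where label slices include both endpoints.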

### Question 7

Select the first 4 rows and first 2 columns of the `popular_songs` table.

In [33]:
selected_popular_songs = ...

In [34]:
#SOLUTION CELL
selected_popular_songs = popular_songs.iloc[:4, :2]
selected_popular_songs

Unnamed: 0_level_0,artist,number of streams
song name,Unnamed: 1_level_1,Unnamed: 2_level_1
Thinking Out Loud,Ed Sheeran,770
One Dance,Drake,1011
Sorry,Justin Bieber,828
Closer,Chainsmokers,678


In [35]:
_ = ok.grade('q07')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Could not save your notebook. Make sure your notebook is saved before sending it to OK!
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/R672MV
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



## Filtering Data

### Filtering with boolean arrays

Filtering is the process of removing unwanted material.  In your quest for cleaner data, you will undoubtedly filter your data at some point: whether it be for clearing up cases with missing values, culling out fishy outliers, or analyzing subgroups of your data set.  Example usage looks like `df[df['column name'] < 5]`. Note that when you combine several conditions, each one has to be wrapped in parentheses.

For your reference, some commonly used comparison operators are given below.

Symbol | Usage      | Meaning 
------ | ---------- | -------------------------------------
==   | a == b   | Does a equal b?
<=   | a <= b   | Is a less than or equal to b?
>=   | a >= b   | Is a greater than or equal to b?
<    | a < b    | Is a less than b?
&#62;    | a &#62; b    | Is a greater than b?
~    | ~p       | Returns negation of p
&#124; | p &#124; q | p OR q
&    | p & q    | p AND q
^  | p ^ q | p XOR q (exclusive or)
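
The operators in the table can be combined as in the sketch below. The `orders` table is hypothetical and exists only to show how the boolean masks are built. Each comparison is wrapped in parentheses because `&` and `|` bind more tightly than `<` and `>`.

```python
import pandas as pd

# Hypothetical toy table, unrelated to the lab data.
orders = pd.DataFrame({"item": ["pen", "ink", "pad", "cap"],
                       "price": [2, 9, 4, 12],
                       "in_stock": [True, False, True, True]})

# A comparison on a column yields a boolean Series, one value per row.
mask = orders["price"] > 3

# OR combined with negation: cheap items, or items that are not in stock.
cheap_or_out = orders[(orders["price"] < 3) | ~orders["in_stock"]]

# AND: items that cost more than 3 and are in stock.
pricey_in_stock = orders[(orders["price"] > 3) & orders["in_stock"]]

print(cheap_or_out["item"].tolist())
print(pricey_in_stock["item"].tolist())
```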

### Question 8
Select the Justin Bieber songs that have over 600 streams. 

In [36]:
filtered_songs = ...

In [37]:
#SOLUTION CELL
filtered_songs = popular_songs[(popular_songs['number of streams']>600) & (popular_songs['artist']=='Justin Bieber')]
filtered_songs

Unnamed: 0_level_0,artist,number of streams
song name,Unnamed: 1_level_1,Unnamed: 2_level_1
Sorry,Justin Bieber,828


In [38]:
_ = ok.grade('q08')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/VO9g5z
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



### Question 9

An often-used operation missing from the above table is a test-of-membership.  The `Series.isin(values)` method returns a boolean array denoting whether each element of `Series` is in `values`.  We can then use the array to subset our data frame. For example, if we wanted to see which rows of `number of streams` had values in $\{500,1011\}$, we would use : 

`popular_songs[popular_songs['number of streams'].isin([500,1011])]`

Select only the rows in `popular_songs` where the artist also appears in the `top_2017_albums` dataframe.

In [39]:
top_2017_songs = ...

In [40]:
#SOLUTION CELL
top_2017_songs = popular_songs[popular_songs["artist"].isin(top_2017_albums["artist"])]
top_2017_songs

Unnamed: 0_level_0,artist,number of streams
song name,Unnamed: 1_level_1,Unnamed: 2_level_1
Thinking Out Loud,Ed Sheeran,770
One Dance,Drake,1011


In [41]:
_ = ok.grade('q09')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Could not save your notebook. Make sure your notebook is saved before sending it to OK!
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/W6RkwE
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



## Data Aggregation (Grouping Data Frames)

### Question 10
To count the number of instances of a value in a `Series`, we can use the `value_counts()` method. Count the number of instances of each artist in `popular_songs`.

In [42]:
song_counts = ...

In [43]:
#SOLUTION CELL
song_counts = popular_songs["artist"].value_counts()
song_counts

Justin Bieber    2
Major Lazer      1
Drake            1
Chainsmokers     1
Ed Sheeran       1
Name: artist, dtype: int64

In [44]:
_ = ok.grade('q10')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/XDRm7v
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



### Question 11

A more versatile way to aggregate data is to use the `.groupby()` method. Find the total number of streams for each artist in the `popular_songs` table.
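
To show why `groupby` is the more versatile tool, here is a sketch on a hypothetical `sales` table (not the lab data) that applies several different aggregations to the same groups.

```python
import pandas as pd

# Hypothetical toy table, unrelated to the lab data.
sales = pd.DataFrame({"store": ["north", "south", "north", "south", "north"],
                      "units": [3, 5, 2, 1, 4]})

# Split the rows by store, then reduce each group's "units" to one number.
totals = sales.groupby("store")["units"].sum()
means = sales.groupby("store")["units"].mean()

# size() counts rows per group, the same counts value_counts() would report.
sizes = sales.groupby("store").size()

print(totals)
print(sizes)
```

The grouping column becomes the index of the result, so individual groups can be looked up with, for example, `totals["north"]`.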

In [45]:
grouped_songs = ...

In [46]:
#SOLUTION CELL
grouped_songs = popular_songs.groupby("artist").sum()
grouped_songs

Unnamed: 0_level_0,number of streams
artist,Unnamed: 1_level_1
Chainsmokers,678
Drake,1011
Ed Sheeran,770
Justin Bieber,1328
Major Lazer,909


In [47]:
_ = ok.grade('q11')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Could not save your notebook. Make sure your notebook is saved before sending it to OK!
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/1wDxYZ
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



## Joining Tables


**Inner Join:** returns only the rows whose key (here, the artist) appears in both data frames.

**Outer Join:** returns all rows whose key appears in either the left or the right data frame. Any missing values are filled in with NaN.

**Left Join:** returns all records from the left table and the matched records from the right table.

**Right Join:** returns all records from the right table and the matched records from the left table.
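
The four join types can be compared side by side with a sketch like the one below. The `left` and `right` tables and the `key` column are hypothetical. They overlap on the keys `b` and `c` only.

```python
import pandas as pd

# Hypothetical toy tables sharing a "key" column; only "b" and "c" overlap.
left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Run the same merge with each join type and keep the results by name.
joined = {how: pd.merge(left, right, how=how, on="key")
          for how in ["inner", "outer", "left", "right"]}

for how, frame in joined.items():
    # Show which keys survive each kind of join.
    print(how, sorted(frame["key"]))
```

In the outer join, the row for `a` has NaN in `right_val` and the row for `d` has NaN in `left_val`, since neither key has a match in the other table.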

### Question 12
Create a new data frame that contains the artist, number of streams, and album name only if the artist is in both the `popular_songs` table and the `top_2017_albums` table.

In [None]:
merged_artists = ...

In [48]:
#SOLUTION CELL
merged_artists = pd.merge(popular_songs, top_2017_albums, how = "inner", on = "artist")
merged_artists

Unnamed: 0,artist,number of streams,album name
0,Ed Sheeran,770,Divide
1,Drake,1011,More Life


In [49]:
_ = ok.grade('q12')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/73kNnQ
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



### Question 13
Create a new data frame that contains the artist, number of streams, and album name. Include a row if the artist is in either the `popular_songs` table or the `top_2017_albums` table.

In [None]:
merged_artists_all = ...

In [50]:
#SOLUTION CELL
merged_artists_all = pd.merge(popular_songs, top_2017_albums, how = "outer", on = "artist")
merged_artists_all

Unnamed: 0,artist,number of streams,album name
0,Ed Sheeran,770.0,Divide
1,Drake,1011.0,More Life
2,Justin Bieber,828.0,
3,Justin Bieber,500.0,
4,Chainsmokers,678.0,
5,Major Lazer,909.0,
6,The Weeknd,,Starboy
7,Bruno Mars,,24k Magic
8,Coldplay,,A Head Full of Dreams
9,Coldplay,,A Head Full of Dreams


In [51]:
_ = ok.grade('q13')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/9rmRpJ
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



## Handling Null/NaN Values

To check if a value is null, we use the `isnull()` method for series and data frames.  Alternatively, there is a `pd.isnull()` function as well. To replace null values in a dataframe, we can use the `fillna()` method, which replaces the NaNs with a value of your choosing. Feel free to experiment with these below! (This concept will be important in the upcoming homework.)
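
As a starting point for experimenting, the sketch below uses a hypothetical `readings` table (invented for this example) to count missing values and fill them, either for one column or with a different value per column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with missing values in two columns.
readings = pd.DataFrame({"sensor": ["s1", "s2", "s3"],
                         "temp": [21.5, np.nan, 19.0],
                         "note": ["ok", None, None]})

# isnull() marks each missing cell; summing the marks counts them per column.
n_missing = readings.isnull().sum()

# fillna on a single column (a Series) fills only that column.
filled_one = readings["note"].fillna("unknown")

# A dict gives a separate fill value for each named column.
filled_by_col = readings.fillna({"temp": 0.0, "note": "unknown"})

print(n_missing)
print(filled_by_col)
```

Calling `fillna` with a single value on the whole dataframe would fill every column with that value, which may not be what you want when the columns have different types.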

### Question 14

In the table you created in the previous question, replace the NaN values in the `album name` column with "None".

In [None]:
merged_artists_cleaned = ...

In [52]:
#SOLUTION CELL
merged_artists_cleaned = merged_artists_all.fillna("None")
merged_artists_cleaned

Unnamed: 0,artist,number of streams,album name
0,Ed Sheeran,770.0,Divide
1,Drake,1011.0,More Life
2,Justin Bieber,828.0,
3,Justin Bieber,500.0,
4,Chainsmokers,678.0,
5,Major Lazer,909.0,
6,The Weeknd,,Starboy
7,Bruno Mars,,24k Magic
8,Coldplay,,A Head Full of Dreams
9,Coldplay,,A Head Full of Dreams


In [53]:
_ = ok.grade('q14')
_ = ok.backup()

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Saving notebook... Saved 'lab02_master.ipynb'.
Backup... 100% complete
Backup successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/backups/gJP1xr
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit



## Submission
Run the cell below to submit the lab.  You may resubmit as many times as you want.

In [54]:
_ = ok.submit()

Saving notebook... Saved 'lab02_master.ipynb'.
Submit... 100% complete
Submission successful for user: sona.jeswani@berkeley.edu
URL: https://okpy.org/cal/ds100/fa17/lab02/submissions/jRP7vy
NOTE: this is only a backup. To submit your assignment, use:
	python3 ok --submit

