We've seen a preview of how pandas handles missing values using the None type and NumPy's NaN value. Missing values are pretty common in data cleaning activities, and they can be there for any number of reasons. I just want to touch on a few of those here. For instance, if you're running a survey and a respondent didn't answer a question, the missing value is actually an omission. This kind of missing data is called missing at random if there are other variables that might be used to predict the variable that is missing. In my work, when I deliver surveys, I often find that missing data (say, interest in being involved in a follow-up study) has some correlation with other data like gender or ethnicity. If there's no relationship to other variables, then we call this data missing completely at random. These are just two examples of missing data, and there are many more. For instance, data might be missing because it wasn't collected, either because the process responsible for collecting it (such as the researcher) didn't collect it, or because it wouldn't make sense if it were collected. This last example is extremely common when you start joining DataFrames together from multiple sources, such as joining a list of people at a university with a list of offices in the university. Students don't generally have offices, but they're still people at the university.


In [175]:
import pandas as pd
df = pd.read_csv('resources/class_grades.csv')
df.head(10)

,Prefix,Assignment,Tutorial,Midterm,TakeHome,Final
0,5,57.14,34.09,64.38,51.48,52.5
1,8,95.05,105.49,67.5,99.07,68.33
2,8,83.7,83.17,,63.15,48.89
3,7,,,49.38,105.93,80.56
4,8,91.32,93.64,95.0,107.41,73.89
5,7,95.0,92.58,93.12,97.78,68.06
6,8,95.05,102.99,56.25,99.07,50.0
7,7,72.85,86.85,60.0,,56.11
8,8,84.26,93.1,47.5,18.52,50.83
9,7,90.1,97.55,51.25,88.89,63.61


In [176]:
# We can use the isnull() method to create a Boolean mask of the whole DataFrame.
# This effectively broadcasts isnull to every cell of the data.
msk = df.isnull()
msk.head(10)

,Prefix,Assignment,Tutorial,Midterm,TakeHome,Final
0,False,False,False,False,False,False
1,False,False,False,False,False,False
2,False,False,False,True,False,False
3,False,True,True,False,False,False
4,False,False,False,False,False,False
5,False,False,False,False,False,False
6,False,False,False,False,False,False
7,False,False,False,False,True,False
8,False,False,False,False,False,False
9,False,False,False,False,False,False
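A Boolean mask becomes most useful once you aggregate it. Because True counts as 1, summing the mask gives the number of missing values per column, and `any(axis=1)` flags the rows that have at least one gap. The small DataFrame below is a hypothetical stand-in built from rows 0, 2, 3 and 7 of the grades output above, not the full file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of the grades data
grades = pd.DataFrame({
    'Assignment': [57.14, 83.7, np.nan, 72.85],
    'Tutorial':   [34.09, 83.17, np.nan, 86.85],
    'Midterm':    [64.38, np.nan, 49.38, 60.0],
    'TakeHome':   [51.48, 63.15, 105.93, np.nan],
}, index=[0, 2, 3, 7])

mask = grades.isnull()
missing_per_column = mask.sum()  # True counts as 1, so this counts NaNs per column
rows_with_gaps = grades[mask.any(axis=1)].index.tolist()  # rows with at least one NaN
print(missing_per_column)
print(rows_with_gaps)
```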


In [177]:
df.dropna().head(10)
# rows 2, 3 and 7 each had at least one missing value, so they are gone now

,Prefix,Assignment,Tutorial,Midterm,TakeHome,Final
0,5,57.14,34.09,64.38,51.48,52.5
1,8,95.05,105.49,67.5,99.07,68.33
4,8,91.32,93.64,95.0,107.41,73.89
5,7,95.0,92.58,93.12,97.78,68.06
6,8,95.05,102.99,56.25,99.07,50.0
8,8,84.26,93.1,47.5,18.52,50.83
9,7,90.1,97.55,51.25,88.89,63.61
10,7,80.44,90.2,75.0,91.48,39.72
12,8,97.16,103.71,72.5,93.52,63.33
13,7,91.28,83.53,81.25,99.81,92.22


In [178]:
# One of the handy functions pandas has for working with missing values is fillna.
# It takes a number of parameters. You can pass in a single value, called a scalar value, to
# change all of the missing data to that one value. This isn't really appropriate for grades, but it's a pretty common use case.
df.fillna(0, inplace=True)
df.head(10)
# inplace=True means we edit the original DataFrame rather than getting back an edited copy

,Prefix,Assignment,Tutorial,Midterm,TakeHome,Final
0,5,57.14,34.09,64.38,51.48,52.5
1,8,95.05,105.49,67.5,99.07,68.33
2,8,83.7,83.17,0.0,63.15,48.89
3,7,0.0,0.0,49.38,105.93,80.56
4,8,91.32,93.64,95.0,107.41,73.89
5,7,95.0,92.58,93.12,97.78,68.06
6,8,95.05,102.99,56.25,99.07,50.0
7,7,72.85,86.85,60.0,0.0,56.11
8,8,84.26,93.1,47.5,18.52,50.83
9,7,90.1,97.55,51.25,88.89,63.61
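fillna also accepts a dictionary that maps column names to fill values, so each column can get its own rule. Filling the midterm gap with the column mean is purely an illustration here, not a recommendation for grades. Note too that without inplace=True, fillna returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one gap in each column
scores = pd.DataFrame({
    'Midterm':  [64.38, np.nan, 49.38],
    'TakeHome': [51.48, 63.15, np.nan],
})

# Column-specific fill values: mean for Midterm, 0 for TakeHome (illustrative choices)
filled = scores.fillna({'Midterm': scores['Midterm'].mean(), 'TakeHome': 0})
print(filled)
```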


When loading data, we can also use the na_filter parameter of read_csv to turn off missing-value detection, for example if empty fields are actual values of interest, but in practice this is pretty rare. In data without any NAs, passing na_filter=False can improve the performance of reading a large file. In addition to rules controlling how missing values are loaded, it's sometimes useful to consider missing values as actually carrying information.
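A small sketch of how read_csv's loading rules change what counts as missing. The CSV text is hypothetical and read from memory with io.StringIO; na_values adds extra markers (here a -1 sentinel) on top of pandas' defaults, while na_filter=False turns detection off so empty fields come back as empty strings.

```python
import io

import pandas as pd

# Hypothetical CSV: one empty field, one "NA" marker, one -1 sentinel
raw = "user,volume\ncheryl,10\nsue,\nbob,NA\nann,-1\n"

default = pd.read_csv(io.StringIO(raw))                     # '' and 'NA' become NaN
no_filter = pd.read_csv(io.StringIO(raw), na_filter=False)  # nothing is treated as missing
sentinel = pd.read_csv(io.StringIO(raw), na_values=['-1'])  # -1 is also treated as missing
print(default, no_filter, sentinel, sep='\n\n')
```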

I've looked at video use in lecture capture systems. In these systems, it's common for the player to have a heartbeat function, where playback statistics are sent to the server every so often, maybe every 30 seconds. These heartbeats can get big, as they carry the whole state of the playback system: where the video play head is, what the video size is, which video is being rendered to the screen, how loud the volume is, and so on. If we load the data file log.csv, we can see an example of this.

In [179]:
df = pd.read_csv('resources/log.csv')
df.head(20)

,time,user,video,playback position,paused,volume
0,1469974424,cheryl,intro.html,5,False,10.0
1,1469974454,cheryl,intro.html,6,,
2,1469974544,cheryl,intro.html,9,,
3,1469974574,cheryl,intro.html,10,,
4,1469977514,bob,intro.html,1,,
5,1469977544,bob,intro.html,1,,
6,1469977574,bob,intro.html,1,,
7,1469977604,bob,intro.html,1,,
8,1469974604,cheryl,intro.html,11,,
9,1469974694,cheryl,intro.html,14,,


Notice that most rows leave paused and volume empty, presumably because the system only sends those values when they change; in that case a missing value means "same as before", which is information we can use. That's where the method parameter of fillna comes in. The two common fill values are ffill and bfill. ffill is for forward filling: it replaces an NA value in a particular cell with the value from the previous row. bfill is for backward filling, which is the opposite: it fills missing values with the next valid value. It's important to note that your data needs to be sorted for this to have the effect you want. Data that comes from traditional database management systems usually has no order guarantee, just like this data, so you have to be careful. In pandas we can sort by index or by value. Here we'll just promote the timestamp to an index and then sort on the index.
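A minimal sketch of both fill directions, using the ffill() and bfill() shortcut methods, plus a hypothetical three-row log that arrives out of time order to show why sorting first matters.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])
forward = s.ffill()   # [1.0, 1.0, 1.0, 4.0]: copy the previous valid value down
backward = s.bfill()  # [1.0, 4.0, 4.0, 4.0]: copy the next valid value up

# Hypothetical heartbeats that arrived out of time order
log = pd.DataFrame({'time': [30, 10, 20], 'volume': [np.nan, 5.0, np.nan]})
unsorted_fill = log['volume'].ffill().tolist()  # first row has nothing before it
sorted_fill = log.set_index('time').sort_index()['volume'].ffill().tolist()
print(unsorted_fill, sorted_fill)
```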

In [180]:
df = df.set_index('time')
df = df.sort_index()
df.head(20)

time,user,video,playback position,paused,volume
1469974424,cheryl,intro.html,5,False,10.0
1469974424,sue,advanced.html,23,False,10.0
1469974454,cheryl,intro.html,6,,
1469974454,sue,advanced.html,24,,
1469974484,cheryl,intro.html,7,,
1469974514,cheryl,intro.html,8,,
1469974524,sue,advanced.html,25,,
1469974544,cheryl,intro.html,9,,
1469974554,sue,advanced.html,26,,
1469974574,cheryl,intro.html,10,,


If we look closely at the output, though, we'll notice that the index isn't really unique. Two users seem to be able to use the system at the same time, and this is actually a common case. So let's reset the index and build a multi-level index, promoting the username to a second level alongside time to deal with the issue.
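Rather than eyeballing the output, you can ask the index directly whether it is unique. The three rows below are copied from the sorted log output above; everything else about the file is left out.

```python
import pandas as pd

# Three heartbeats from the sorted log: cheryl and sue share a timestamp
log = pd.DataFrame({
    'time': [1469974424, 1469974424, 1469974454],
    'user': ['cheryl', 'sue', 'cheryl'],
    'playback position': [5, 23, 6],
})

by_time = log.set_index('time')
by_time_user = log.set_index(['time', 'user'])
print(by_time.index.is_unique)       # False: 1469974424 appears twice
print(by_time_user.index.is_unique)  # True: each (time, user) pair is distinct
```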

In [181]:
df = df.reset_index()
df = df.set_index(['time','user'])
df.head(20)

time,user,video,playback position,paused,volume
1469974424,cheryl,intro.html,5,False,10.0
1469974424,sue,advanced.html,23,False,10.0
1469974454,cheryl,intro.html,6,,
1469974454,sue,advanced.html,24,,
1469974484,cheryl,intro.html,7,,
1469974514,cheryl,intro.html,8,,
1469974524,sue,advanced.html,25,,
1469974544,cheryl,intro.html,9,,
1469974554,sue,advanced.html,26,,
1469974574,cheryl,intro.html,10,,


Now that we have the data indexed and sorted appropriately, we can fill in the missing data using ffill. When dealing with missing values, it's good to remember that you can handle individual columns or sets of columns by projecting them, so you don't have to fix all missing values in one command.

In [182]:
df = df.ffill()  # same as fillna(method='ffill'), which recent pandas versions deprecate
df.head()

time,user,video,playback position,paused,volume
1469974424,cheryl,intro.html,5,False,10.0
1469974424,sue,advanced.html,23,False,10.0
1469974454,cheryl,intro.html,6,False,10.0
1469974454,sue,advanced.html,24,False,10.0
1469974484,cheryl,intro.html,7,False,10.0


We can also customize how values are filled in by using the replace function. It supports several approaches: value-to-value, list, dictionary, and regex. Let's generate a simple example.

In [183]:
df = pd.DataFrame({'A':[1, 1, 2, 3, 4],
                  'B':[3, 6, 3, 8, 9],
                  'C':['a','b','c','d','e']})
df

,A,B,C
0,1,3,a
1,1,6,b
2,2,3,c
3,3,8,d
4,4,9,e


In [184]:
df.replace(1, 100) # replace 1 with 100

,A,B,C
0,100,3,a
1,100,6,b
2,2,3,c
3,3,8,d
4,4,9,e


In [185]:
df.replace([1, 3],[100, 300]) # to replace 1 with 100 and 3 with 300

,A,B,C
0,100,300,a
1,100,6,b
2,2,300,c
3,300,8,d
4,4,9,e
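The dictionary approach covers the same ground as the two lists and can also be scoped to particular columns. A flat dictionary maps old values to new ones everywhere; a nested dictionary of the form {column: {old: new}} only touches the named column.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 4],
                   'B': [3, 6, 3, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# Flat dict: same effect as replace([1, 3], [100, 300]), applied to every column
flat = df.replace({1: 100, 3: 300})
# Nested dict: restrict the replacement to column A only
nested = df.replace({'A': {1: 100, 3: 300}})
print(flat, nested, sep='\n\n')
```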


In [206]:
# let's get back to our log example
df = pd.read_csv('resources/log.csv')
df.head(20)

,time,user,video,playback position,paused,volume
0,1469974424,cheryl,intro.html,5,False,10.0
1,1469974454,cheryl,intro.html,6,,
2,1469974544,cheryl,intro.html,9,,
3,1469974574,cheryl,intro.html,10,,
4,1469977514,bob,intro.html,1,,
5,1469977544,bob,intro.html,1,,
6,1469977574,bob,intro.html,1,,
7,1469977604,bob,intro.html,1,,
8,1469974604,cheryl,intro.html,11,,
9,1469974694,cheryl,intro.html,14,,


To replace using a regex, we make the first parameter to replace the regex pattern we want to match, and the second parameter the value we want to emit upon a match. Then we pass a third parameter, regex=True. So take a moment and think about this problem. Imagine that we want to detect all HTML pages in the video column. Let's say this just means that they end with .html, and we want to overwrite them with the keyword WebPage. How could we accomplish that?

In [207]:
df.replace(to_replace=r".*\.html$", value='WebPage', regex=True) # matches any string ending in .html (the dot is escaped so it means a literal '.')

,time,user,video,playback position,paused,volume
0,1469974424,cheryl,WebPage,5,False,10.0
1,1469974454,cheryl,WebPage,6,,
2,1469974544,cheryl,WebPage,9,,
3,1469974574,cheryl,WebPage,10,,
4,1469977514,bob,WebPage,1,,
5,1469977544,bob,WebPage,1,,
6,1469977574,bob,WebPage,1,,
7,1469977604,bob,WebPage,1,,
8,1469974604,cheryl,WebPage,11,,
9,1469974694,cheryl,WebPage,14,,
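Calling replace with regex=True on the whole DataFrame checks every string column, including user. Working on the video column alone avoids accidental matches elsewhere, and Series.str.replace lets you keep part of the match through a capture group (\1). The two rows below are a hypothetical slice in the shape of log.csv.

```python
import pandas as pd

# A hypothetical two-row slice in the shape of log.csv
log = pd.DataFrame({'user': ['cheryl', 'sue'],
                    'video': ['intro.html', 'advanced.html']})

# Same idea as above, but only on the video column
log['kind'] = log['video'].replace(to_replace=r'.*\.html$', value='WebPage', regex=True)
# Keep the page name instead of discarding it, via a capture group
log['page'] = log['video'].str.replace(r'^(.*)\.html$', r'\1', regex=True)
print(log)
```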


When you use statistical functions on DataFrames, these functions typically ignore missing values. For instance, if you calculate the mean of a DataFrame, pandas skips missing values by default (skipna=True). This is usually what you want, but you should be aware of which values are being excluded. Why you have missing values really matters, depending on the problem you're trying to solve. For instance, it might be unreasonable to infer missing values if the data shouldn't exist in the first place.
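A quick way to see what is being excluded: compare the default mean with skipna=False, and with plain NumPy, which does not skip NaN unless you ask for its nan-aware variant. count() and isna().sum() tell you how many values were used and dropped.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
skipped = s.mean()                      # 2.0: pandas skips NaN by default (skipna=True)
propagated = s.mean(skipna=False)       # nan: the missing value propagates
raw_numpy = np.mean(s.to_numpy())       # nan: plain NumPy does not skip NaN
nan_aware = np.nanmean(s.to_numpy())    # 2.0: NumPy's NaN-ignoring variant
n_used, n_missing = s.count(), s.isna().sum()  # 2 values used, 1 excluded
print(skipped, propagated, raw_numpy, nan_aware, n_used, n_missing)
```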


# Manipulating dataframes


In [208]:
df = pd.read_csv('resources/presidents.csv')
df.head()

,#,President,Born,Age atstart of presidency,Age atend of presidency,Post-presidencytimespan,Died,Age
0,1,George Washington,"Feb 22, 1732[a]","57 years, 67 daysApr 30, 1789","65 years, 10 daysMar 4, 1797","2 years, 285 days","Dec 14, 1799","67 years, 295 days"
1,2,John Adams,"Oct 30, 1735[a]","61 years, 125 daysMar 4, 1797","65 years, 125 daysMar 4, 1801","25 years, 122 days","Jul 4, 1826","90 years, 247 days"
2,3,Thomas Jefferson,"Apr 13, 1743[a]","57 years, 325 daysMar 4, 1801","65 years, 325 daysMar 4, 1809","17 years, 122 days","Jul 4, 1826","83 years, 82 days"
3,4,James Madison,"Mar 16, 1751[a]","57 years, 353 daysMar 4, 1809","65 years, 353 daysMar 4, 1817","19 years, 116 days","Jun 28, 1836","85 years, 104 days"
4,5,James Monroe,"Apr 28, 1758","58 years, 310 daysMar 4, 1817","66 years, 310 daysMar 4, 1825","6 years, 122 days","Jul 4, 1831","73 years, 67 days"


Okay, so we have some presidents and some dates. I see a bunch of footnotes in the Born column, which may cause issues. Let's start by splitting each president's name into a first name and a last name. I'm going to tackle this with a regex: I want to create new columns and apply a regex to the projection of the President column. Here's one solution: we could make a copy of the President column and strip everything from the first space onward.

In [209]:
df['First']=df['President']
df['First']=df['First'].replace("[ ].*", "", regex=True)
df.head()

,#,President,Born,Age atstart of presidency,Age atend of presidency,Post-presidencytimespan,Died,Age,First
0,1,George Washington,"Feb 22, 1732[a]","57 years, 67 daysApr 30, 1789","65 years, 10 daysMar 4, 1797","2 years, 285 days","Dec 14, 1799","67 years, 295 days",George
1,2,John Adams,"Oct 30, 1735[a]","61 years, 125 daysMar 4, 1797","65 years, 125 daysMar 4, 1801","25 years, 122 days","Jul 4, 1826","90 years, 247 days",John
2,3,Thomas Jefferson,"Apr 13, 1743[a]","57 years, 325 daysMar 4, 1801","65 years, 325 daysMar 4, 1809","17 years, 122 days","Jul 4, 1826","83 years, 82 days",Thomas
3,4,James Madison,"Mar 16, 1751[a]","57 years, 353 daysMar 4, 1809","65 years, 353 daysMar 4, 1817","19 years, 116 days","Jun 28, 1836","85 years, 104 days",James
4,5,James Monroe,"Apr 28, 1758","58 years, 310 daysMar 4, 1817","66 years, 310 daysMar 4, 1825","6 years, 122 days","Jul 4, 1831","73 years, 67 days",James


So that works, but it's kind of gross, and it's slow since we had to make a full copy of the column and then go through and update strings. There are a few other ways we can deal with this. Let me show you the most general one first, and that's the apply function. So let's drop the column we just made.

In [210]:
del(df['First'])

The apply function takes some arbitrary function you have written and applies it either to a Series (a single column) or to a DataFrame across all rows or all columns. Let's write a function that splits the president's name into two pieces, given a single row of data.

In [211]:
def splitname(row):
    # row is a single row of the DataFrame (a Series) indexed by column name
    # extract the first name
    row['First']=row['President'].split(" ")[0]
    # extract the last name
    row['Last']=row['President'].split(" ")[-1]
    return row
# axis='columns' calls splitname once per row
df = df.apply(splitname, axis='columns')
df.head()

,#,President,Born,Age atstart of presidency,Age atend of presidency,Post-presidencytimespan,Died,Age,First,Last
0,1,George Washington,"Feb 22, 1732[a]","57 years, 67 daysApr 30, 1789","65 years, 10 daysMar 4, 1797","2 years, 285 days","Dec 14, 1799","67 years, 295 days",George,Washington
1,2,John Adams,"Oct 30, 1735[a]","61 years, 125 daysMar 4, 1797","65 years, 125 daysMar 4, 1801","25 years, 122 days","Jul 4, 1826","90 years, 247 days",John,Adams
2,3,Thomas Jefferson,"Apr 13, 1743[a]","57 years, 325 daysMar 4, 1801","65 years, 325 daysMar 4, 1809","17 years, 122 days","Jul 4, 1826","83 years, 82 days",Thomas,Jefferson
3,4,James Madison,"Mar 16, 1751[a]","57 years, 353 daysMar 4, 1809","65 years, 353 daysMar 4, 1817","19 years, 116 days","Jun 28, 1836","85 years, 104 days",James,Madison
4,5,James Monroe,"Apr 28, 1758","58 years, 310 daysMar 4, 1817","66 years, 310 daysMar 4, 1825","6 years, 122 days","Jul 4, 1831","73 years, 67 days",James,Monroe
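apply works on a single Series too, in which case your function receives one value at a time rather than a row. For simple splits, the vectorized str.split is often the shorter route. The names below are illustrative inputs; note how n=1 keeps a middle name attached to the second piece.

```python
import pandas as pd

presidents = pd.Series(['George Washington', 'John Adams', 'John Quincy Adams'])

# Series.apply: the lambda receives one string at a time
last = presidents.apply(lambda name: name.split(' ')[-1])

# Vectorized alternative: split once, on the first space, into two columns
parts = presidents.str.split(' ', n=1, expand=True)
print(last, parts, sep='\n\n')
```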


It's pretty questionable whether that's less gross, but it achieves the result, and I find that I use the apply function regularly in my work. The pandas Series has a couple of other nice convenience functions, though, and next I'd like to touch on one called extract.

extract takes a regular expression as input and specifically requires you to set capture groups that correspond to the output columns you're interested in.

In [212]:
del(df['First'])
del(df['Last'])


In [213]:
pattern = r"(?P<First>^[\w]*)(?:.* )(?P<Last>[\w]*$)"
# keep every row so the result lines up with df when we assign the columns
names = df['President'].str.extract(pattern)
names.head()

,First,Last
0,George,Washington
1,John,Adams
2,Thomas,Jefferson
3,James,Madison
4,James,Monroe


In [214]:
# so we have got 2 new columns that we can deal with
df['First']=names['First']
df['Last']=names['Last']
df.head()

,#,President,Born,Age atstart of presidency,Age atend of presidency,Post-presidencytimespan,Died,Age,First,Last
0,1,George Washington,"Feb 22, 1732[a]","57 years, 67 daysApr 30, 1789","65 years, 10 daysMar 4, 1797","2 years, 285 days","Dec 14, 1799","67 years, 295 days",George,Washington
1,2,John Adams,"Oct 30, 1735[a]","61 years, 125 daysMar 4, 1797","65 years, 125 daysMar 4, 1801","25 years, 122 days","Jul 4, 1826","90 years, 247 days",John,Adams
2,3,Thomas Jefferson,"Apr 13, 1743[a]","57 years, 325 daysMar 4, 1801","65 years, 325 daysMar 4, 1809","17 years, 122 days","Jul 4, 1826","83 years, 82 days",Thomas,Jefferson
3,4,James Madison,"Mar 16, 1751[a]","57 years, 353 daysMar 4, 1809","65 years, 353 daysMar 4, 1817","19 years, 116 days","Jun 28, 1836","85 years, 104 days",James,Madison
4,5,James Monroe,"Apr 28, 1758","58 years, 310 daysMar 4, 1817","66 years, 310 daysMar 4, 1825","6 years, 122 days","Jul 4, 1831","73 years, 67 days",James,Monroe
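The named groups (?P<First>...) and (?P<Last>...) become the column names of the DataFrame that extract returns, while the non-capturing group (?:.* ) swallows everything up to the last space, including any middle name. A quick check on two illustrative inputs:

```python
import pandas as pd

pattern = r"(?P<First>^[\w]*)(?:.* )(?P<Last>[\w]*$)"
names = pd.Series(['George Washington', 'John Quincy Adams'])

# Named groups become columns; the middle name is consumed by the non-capturing group
parts = names.str.extract(pattern)
print(parts)
```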


In [215]:
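# Now let's clean up the Born column. First, let's get rid of anything that doesn't fit the pattern month day, year (such as the [a] footnotes)
# expand=False returns a Series, since there is only one capture group
df['Born'] = df['Born'].str.extract(r"([\w]{3}\s[\w]{1,2},\s[\w]{4})", expand=False)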
df['Born'].head()

0    Feb 22, 1732
1    Oct 30, 1735
2    Apr 13, 1743
3    Mar 16, 1751
4    Apr 28, 1758
Name: Born, dtype: object

In [216]:
df.head()

,#,President,Born,Age atstart of presidency,Age atend of presidency,Post-presidencytimespan,Died,Age,First,Last
0,1,George Washington,"Feb 22, 1732","57 years, 67 daysApr 30, 1789","65 years, 10 daysMar 4, 1797","2 years, 285 days","Dec 14, 1799","67 years, 295 days",George,Washington
1,2,John Adams,"Oct 30, 1735","61 years, 125 daysMar 4, 1797","65 years, 125 daysMar 4, 1801","25 years, 122 days","Jul 4, 1826","90 years, 247 days",John,Adams
2,3,Thomas Jefferson,"Apr 13, 1743","57 years, 325 daysMar 4, 1801","65 years, 325 daysMar 4, 1809","17 years, 122 days","Jul 4, 1826","83 years, 82 days",Thomas,Jefferson
3,4,James Madison,"Mar 16, 1751","57 years, 353 daysMar 4, 1809","65 years, 353 daysMar 4, 1817","19 years, 116 days","Jun 28, 1836","85 years, 104 days",James,Madison
4,5,James Monroe,"Apr 28, 1758","58 years, 310 daysMar 4, 1817","66 years, 310 daysMar 4, 1825","6 years, 122 days","Jul 4, 1831","73 years, 67 days",James,Monroe


The dtype of this column is object, which is what pandas uses when it's dealing with strings, but pandas also has really interesting date-time features.

In [217]:
# If I were building this out further, I would also convert this column to the proper data type.
df['Born'] = pd.to_datetime(df['Born'])
df['Born'].head() # the dtype is now datetime64 instead of object (strings)

0   1732-02-22
1   1735-10-30
2   1743-04-13
3   1751-03-16
4   1758-04-28
Name: Born, dtype: datetime64[ns]
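Once a column is datetime64, the .dt accessor exposes its parts directly. This sketch uses the first two birth dates from above and passes an explicit format (month abbreviation, day, four-digit year) instead of relying on pandas to infer it.

```python
import pandas as pd

born = pd.Series(['Feb 22, 1732', 'Oct 30, 1735'])
# An explicit format avoids relying on pandas' format inference
born = pd.to_datetime(born, format='%b %d, %Y')

years = born.dt.year.tolist()           # [1732, 1735]
months = born.dt.month_name().tolist()  # ['February', 'October']
print(years, months)
```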

In [218]:
df = df.set_index(['#'])
df.head()

#,President,Born,Age atstart of presidency,Age atend of presidency,Post-presidencytimespan,Died,Age,First,Last
1,George Washington,1732-02-22,"57 years, 67 daysApr 30, 1789","65 years, 10 daysMar 4, 1797","2 years, 285 days","Dec 14, 1799","67 years, 295 days",George,Washington
2,John Adams,1735-10-30,"61 years, 125 daysMar 4, 1797","65 years, 125 daysMar 4, 1801","25 years, 122 days","Jul 4, 1826","90 years, 247 days",John,Adams
3,Thomas Jefferson,1743-04-13,"57 years, 325 daysMar 4, 1801","65 years, 325 daysMar 4, 1809","17 years, 122 days","Jul 4, 1826","83 years, 82 days",Thomas,Jefferson
4,James Madison,1751-03-16,"57 years, 353 daysMar 4, 1809","65 years, 353 daysMar 4, 1817","19 years, 116 days","Jun 28, 1836","85 years, 104 days",James,Madison
5,James Monroe,1758-04-28,"58 years, 310 daysMar 4, 1817","66 years, 310 daysMar 4, 1825","6 years, 122 days","Jul 4, 1831","73 years, 67 days",James,Monroe


In [219]:
df = df.drop('President', axis=1)
df.head()

#,Born,Age atstart of presidency,Age atend of presidency,Post-presidencytimespan,Died,Age,First,Last
1,1732-02-22,"57 years, 67 daysApr 30, 1789","65 years, 10 daysMar 4, 1797","2 years, 285 days","Dec 14, 1799","67 years, 295 days",George,Washington
2,1735-10-30,"61 years, 125 daysMar 4, 1797","65 years, 125 daysMar 4, 1801","25 years, 122 days","Jul 4, 1826","90 years, 247 days",John,Adams
3,1743-04-13,"57 years, 325 daysMar 4, 1801","65 years, 325 daysMar 4, 1809","17 years, 122 days","Jul 4, 1826","83 years, 82 days",Thomas,Jefferson
4,1751-03-16,"57 years, 353 daysMar 4, 1809","65 years, 353 daysMar 4, 1817","19 years, 116 days","Jun 28, 1836","85 years, 104 days",James,Madison
5,1758-04-28,"58 years, 310 daysMar 4, 1817","66 years, 310 daysMar 4, 1825","6 years, 122 days","Jul 4, 1831","73 years, 67 days",James,Monroe


In [230]:
order = ['First','Last', 'Born', 'Age atstart of presidency','Age atend of presidency','Post-presidencytimespan']
df = df[order]
df.head()

#,First,Last,Born,Age atstart of presidency,Age atend of presidency,Post-presidencytimespan
1,George,Washington,1732-02-22,"57 years, 67 daysApr 30, 1789","65 years, 10 daysMar 4, 1797","2 years, 285 days"
2,John,Adams,1735-10-30,"61 years, 125 daysMar 4, 1797","65 years, 125 daysMar 4, 1801","25 years, 122 days"
3,Thomas,Jefferson,1743-04-13,"57 years, 325 daysMar 4, 1801","65 years, 325 daysMar 4, 1809","17 years, 122 days"
4,James,Madison,1751-03-16,"57 years, 353 daysMar 4, 1809","65 years, 353 daysMar 4, 1817","19 years, 116 days"
5,James,Monroe,1758-04-28,"58 years, 310 daysMar 4, 1817","66 years, 310 daysMar 4, 1825","6 years, 122 days"
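The age columns still fuse two facts together: a duration and the date it was measured. Here is a sketch of pulling them apart with the same str.extract approach. The group names years, days and date are my own choice, and the pattern assumes every cell follows the "N years, N daysMon D, YYYY" shape seen in the first five rows; cells that don't match would come back as NaN.

```python
import pandas as pd

age_start = pd.Series(['57 years, 67 daysApr 30, 1789',
                       '61 years, 125 daysMar 4, 1797'])

# Hypothetical group names; the date is glued directly onto "days" in the source text
pattern = r"(?P<years>\d+) years, (?P<days>\d+) days(?P<date>\w{3} \d{1,2}, \d{4})"
parts = age_start.str.extract(pattern)
parts['years'] = parts['years'].astype(int)
parts['days'] = parts['days'].astype(int)
parts['date'] = pd.to_datetime(parts['date'], format='%b %d, %Y')
print(parts)
```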
