In [34]:
import pandas as pd

## Read the titles dataframe

In [35]:
snl = pd.read_csv('https://github.com/gdv/foundationsCS/raw/main/students/ex-data/snldb/snl_title.csv')

In [36]:
snl.head()

,sid,eid,tid,title,titleType
0,3,20,1978052013,"""Space is the Place"", ""Space-Loneliness""",Musical Performance
1,2,21,1977051416,,Goodnights
2,3,18,1978042215,,Goodnights
3,3,20,1978052014,,Goodnights
4,3,18,1978042214,Next Week In Review,Show


## 1. Remove all rows that do not have a title.

The simplest approach is `dropna`, but by default it removes every row that has *any* missing value, so it drops too much when columns other than `title` also contain missing values.

We build a boolean mask that is `True` for the rows whose title is not missing.

In [37]:
clean_idx = snl['title'].notnull()

Then we use the mask to select those rows (boolean indexing):

In [38]:
snl[clean_idx]

,sid,eid,tid,title,titleType
0,3,20,1978052013,"""Space is the Place"", ""Space-Loneliness""",Musical Performance
4,3,18,1978042214,Next Week In Review,Show
13,2,20,1977042314,Trans Eastern Airlines,Sketch
15,7,20,1982052214,"""Landslide""",Musical Performance
17,7,20,1982052215,The Clams,Commercial
...,...,...,...,...,...
11692,11,18,198605249,"""Let's Take It To The Stage""",Musical Performance
11693,11,18,198605246,National Council of Liquor and Spirits,Commercial
11694,11,18,198605247,Actors On Film,Show
11695,11,18,198605244,Moments Of Doubt,Sketch


Another way is to use the `subset` option of `dropna`, which restricts the check for missing values to the listed columns.

In [39]:
snl.dropna(subset=['title'])

,sid,eid,tid,title,titleType
0,3,20,1978052013,"""Space is the Place"", ""Space-Loneliness""",Musical Performance
4,3,18,1978042214,Next Week In Review,Show
13,2,20,1977042314,Trans Eastern Airlines,Sketch
15,7,20,1982052214,"""Landslide""",Musical Performance
17,7,20,1982052215,The Clams,Commercial
...,...,...,...,...,...
11692,11,18,198605249,"""Let's Take It To The Stage""",Musical Performance
11693,11,18,198605246,National Council of Liquor and Spirits,Commercial
11694,11,18,198605247,Actors On Film,Show
11695,11,18,198605244,Moments Of Doubt,Sketch
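
The difference between the three approaches is easier to see on a tiny frame. The following is a minimal sketch; the `toy` data is invented for illustration and is not taken from the SNL file.

```python
import pandas as pd

# Hypothetical toy data with missing values in two different columns.
toy = pd.DataFrame({
    'title': ['first', None, 'third', None],
    'titleType': ['Sketch', 'Goodnights', None, 'Monologue'],
})

# Plain dropna() drops a row as soon as ANY column is missing.
drop_any = toy.dropna()

# subset= restricts the check to the listed columns.
drop_title = toy.dropna(subset=['title'])

# The boolean mask selects the same rows as dropna(subset=['title']).
masked = toy[toy['title'].notnull()]

print(len(drop_any), len(drop_title), drop_title.equals(masked))
```

On this toy frame, plain `dropna()` also discards the row whose `titleType` is missing even though it has a title, while the other two approaches keep it.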


## 2. Build a DataFrame indexed by the pair (season id, episode id).

In [40]:
sk34 = snl.set_index(['sid', 'eid'])

In [41]:
sk34.head()

sid,eid,tid,title,titleType
3,20,1978052013,"""Space is the Place"", ""Space-Loneliness""",Musical Performance
2,21,1977051416,,Goodnights
3,18,1978042215,,Goodnights
3,20,1978052014,,Goodnights
3,18,1978042214,Next Week In Review,Show


Check that the index has been set correctly:

In [42]:
sk34.index.names

FrozenList(['sid', 'eid'])
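
A few more checks can be useful after `set_index`. The sketch below uses a small invented frame (the `toy` values are assumptions for illustration) to show that the two columns move into the index and that the resulting index is not necessarily unique, since several titles can belong to the same episode.

```python
import pandas as pd

# Hypothetical toy data: two rows share the same (sid, eid) pair.
toy = pd.DataFrame({
    'sid': [3, 2, 3],
    'eid': [20, 21, 20],
    'title': ['first', 'second', 'third'],
})

indexed = toy.set_index(['sid', 'eid'])

print(list(indexed.index.names))  # names of the index levels
print(indexed.index.nlevels)      # number of index levels
print(list(indexed.columns))      # sid and eid are no longer columns
print(indexed.index.is_unique)    # whether each (sid, eid) pair appears once
```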

## 3. Extract the sketches of seasons 3 and 4.

If we want to use slices, we first need to sort the dataframe by its index.

In [43]:
sk34.sort_index(inplace=True)

In [44]:
sk34.head()

sid,eid,tid,title,titleType
1,1,1975101126,,Goodnights
1,1,1975101125,"""In The Winter""",Musical Performance
1,1,1975101124,Triple-Trac,Commercial
1,1,1975101123,Home Securities,Sketch
1,1,1975101122,"""Fancy Lady""",Musical Performance
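
To illustrate why sorting matters, here is a minimal sketch on an invented frame (the `toy` data is an assumption, not the SNL data). On an unsorted `MultiIndex`, a label slice with `loc` fails with an `UnsortedIndexError`, which is a subclass of `KeyError`; after `sort_index` the same slice works.

```python
import pandas as pd

# Hypothetical toy data, deliberately not sorted by (sid, eid).
toy = pd.DataFrame({
    'sid': [3, 2, 4, 3],
    'eid': [20, 21, 1, 18],
    'title': ['a', 'b', 'c', 'd'],
}).set_index(['sid', 'eid'])

print(toy.index.is_monotonic_increasing)

try:
    toy.loc[3:4]
    slice_failed = False
except KeyError:
    # pandas raises UnsortedIndexError here, a subclass of KeyError.
    slice_failed = True

# Sorting returns a new frame whose index supports label slicing.
toy_sorted = toy.sort_index()
sliced = toy_sorted.loc[3:4]
print(slice_failed, list(sliced['title']))
```

The sketch uses the non-mutating form `toy.sort_index()`; the notebook's `inplace=True` variant sorts the same way but modifies `sk34` directly.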


We can use `loc` with either a list of values or a slice. We start with the list.

In [45]:
sk34.loc[[3, 4]]

sid,eid,tid,title,titleType
3,1,1977092413,,Goodnights
3,1,1977092412,"""The Pretender""",Musical Performance
3,1,1977092411,Royal Deluxe II,Commercial
3,1,1977092410,The Franken and Davis Show,Show
3,1,197709248,Keypunch Confession,Sketch
...,...,...,...,...
4,20,197905263,Ray's Disco Roller Fishing Park,Commercial
4,20,197905262,,Monologue
4,20,197905266,"""Married Men""",Musical Performance
4,20,197905269,The Franken and Davis Show,Show


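Now we can impose a search condition (like the `WHERE` part of a SQL query). Note that the cell below applies the condition to the whole of `sk34`, so its output contains sketches from every season; to restrict the result to seasons 3 and 4, apply the same condition to the result of the `loc` selection instead.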

In [46]:
sk34[sk34['titleType'] == 'Sketch']

sid,eid,tid,title,titleType
1,1,1975101123,Home Securities,Sketch
1,1,1975101114,The Land of Gorch,Sketch
1,1,197510115,Trial,Sketch
1,2,1975101812,The Land of Gorch,Sketch
1,3,1975102520,Bees,Sketch
...,...,...,...,...
42,11,201701147,Susan B. Anthony House,Sketch
42,11,201701145,Theatre Donor,Sketch
42,12,2017012111,Pizza Town,Sketch
42,12,201701219,Dirty Talk,Sketch
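
The two steps (select the seasons, then keep only the sketches) can be chained as in the following sketch. The `toy` frame and the names `seasons` and `sketches` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical toy data covering seasons 1, 3, 4 and 5.
toy = pd.DataFrame({
    'sid': [1, 3, 3, 4, 5],
    'eid': [1, 1, 2, 1, 1],
    'title': ['a', 'b', 'c', 'd', 'e'],
    'titleType': ['Sketch', 'Sketch', 'Show', 'Sketch', 'Sketch'],
}).set_index(['sid', 'eid']).sort_index()

# Step 1: keep only seasons 3 and 4 (first level of the index).
seasons = toy.loc[[3, 4]]

# Step 2: apply the search condition to that selection, not to toy.
sketches = seasons[seasons['titleType'] == 'Sketch']

print(sketches)
```

Building the mask from `seasons` rather than from `toy` keeps the mask and the frame aligned on the same rows.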



The same seasons can be selected with a slice, which includes both endpoints:

In [48]:
sk34.loc[3:4]

sid,eid,tid,title,titleType
3,1,1977092413,,Goodnights
3,1,1977092412,"""The Pretender""",Musical Performance
3,1,1977092411,Royal Deluxe II,Commercial
3,1,1977092410,The Franken and Davis Show,Show
3,1,197709248,Keypunch Confession,Sketch
...,...,...,...,...
4,20,197905263,Ray's Disco Roller Fishing Park,Commercial
4,20,197905262,,Monologue
4,20,197905266,"""Married Men""",Musical Performance
4,20,197905269,The Franken and Davis Show,Show
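
The list and the slice give the same rows here only because seasons 3 and 4 are adjacent. A minimal sketch on invented data (the `toy` frame is an assumption) shows the difference when the values are not adjacent.

```python
import pandas as pd

# Hypothetical toy data with one row per season, sorted by index.
toy = pd.DataFrame({
    'sid': [2, 3, 4, 5],
    'eid': [1, 1, 1, 1],
    'title': ['a', 'b', 'c', 'd'],
}).set_index(['sid', 'eid']).sort_index()

# A list selects exactly the listed seasons.
by_list = toy.loc[[3, 5]]

# A label slice selects the whole range, both endpoints included.
by_slice = toy.loc[3:5]

print(list(by_list['title']))
print(list(by_slice['title']))
```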


Another way is to reset the index, so that the season id becomes a column again, and then filter on the columns.

In [49]:
flat = sk34.reset_index()

In [50]:
flat[(flat['titleType'] == 'Sketch') & ((flat['sid'] == 3) | (flat['sid'] == 4))]

,sid,eid,tid,title,titleType
849,3,1,197709248,Keypunch Confession,Sketch
851,3,1,197709247,"Mike McMack, Defense Lawyer",Sketch
852,3,1,197709249,Great Moments In Rock & Roll,Sketch
858,3,2,1977100817,Phone Call,Sketch
861,3,2,1977100814,Hercules,Sketch
...,...,...,...,...,...
1355,4,18,197905128,Boulevard Of Proud Chicano Cars,Sketch
1362,4,19,1979051913,Candy Store,Sketch
1365,4,19,197905197,Mother & Daughter,Sketch
1370,4,19,197905194,Houseguest Idi Amin,Sketch


In this case, the first solution (selecting the seasons with `loc`, then filtering on `titleType`) is simpler, even though it requires two separate steps.
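
The column-based filter on the flattened frame can also be written more compactly. The sketch below, on an invented `flat` frame, compares the chained `==` conditions with `Series.isin` and with `DataFrame.query`; all three are expected to select the same rows.

```python
import pandas as pd

# Hypothetical flattened data, with sid as an ordinary column.
flat = pd.DataFrame({
    'sid': [1, 3, 3, 4, 5],
    'title': ['a', 'b', 'c', 'd', 'e'],
    'titleType': ['Sketch', 'Sketch', 'Show', 'Sketch', 'Sketch'],
})

is_sketch = flat['titleType'] == 'Sketch'

# Chained comparisons, as in the notebook cell.
by_or = flat[is_sketch & ((flat['sid'] == 3) | (flat['sid'] == 4))]

# isin avoids one comparison per season.
by_isin = flat[is_sketch & flat['sid'].isin([3, 4])]

# query expresses the same condition as a string.
by_query = flat.query("titleType == 'Sketch' and sid in [3, 4]")

print(list(by_or.index), list(by_isin.index), list(by_query.index))
```

`isin` scales better than chained `|` conditions when the list of seasons grows.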