## Merging the CSVs

In this notebook we merge the CSV exported from the Google Drive list with the CSV we created from the transcripts. The first step is to compare the columns of the two CSVs:

### Load the CSVs

In [1]:
# =-=-=-=-=-=-=-=-=-=-=
# LOAD the CSVs into dataframes to check our work
# =-=-=-=-=-=-=-=-=-=-= 

# Let Python build the list of column names from each file's header line:
with open('./transcripts.csv') as f:
    t_colnames = f.readline().strip().split(",")

with open('./Google_list.csv') as f:
    G_colnames = f.readline().strip().split(",")

print(t_colnames)
print(G_colnames)

['public_url', 'speaker', 'duration', 'uploaded', 'views', 'description', 'text']
['Talk ID', 'public_url', 'speaker_name', 'headline', 'description', 'event', 'duration', 'language', 'published', 'tags']
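
To make the comparison explicit, here is a minimal sketch that splits the two printed lists into shared and file-specific column names. The lists are copied from the output above; the variable names `shared`, `only_google`, and `only_transcripts` are just illustrative.

```python
# Column names copied from the printed output above.
t_colnames = ['public_url', 'speaker', 'duration', 'uploaded', 'views',
              'description', 'text']
G_colnames = ['Talk ID', 'public_url', 'speaker_name', 'headline', 'description',
              'event', 'duration', 'language', 'published', 'tags']

# List comprehensions (rather than sets) keep the original column order.
shared = [c for c in G_colnames if c in t_colnames]
only_google = [c for c in G_colnames if c not in t_colnames]
only_transcripts = [c for c in t_colnames if c not in G_colnames]

print("shared:", shared)
print("only in Google list:", only_google)
print("only in transcripts:", only_transcripts)
```

Only `public_url` identifies a talk in both files, which makes it the natural key for a merge. `description` and `duration` also appear in both, so a merge on `public_url` has to disambiguate them.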


Next, we will take a look at the first few entries of the two CSVs:

In [3]:
import pandas as pd

google_list = pd.read_csv('./Google_list_as_of_2018-05-29.csv')
transcripts = pd.read_csv('./transcripts.csv')

google_list.head()

Unnamed: 0,Talk ID,public_url,speaker_name,headline,description,event,duration,language,published,tags
0,1,https://www.ted.com/talks/al_gore_on_averting_...,Al Gore,Averting the climate crisis,With the same humor and humanity he exuded in ...,TED2006,0:16:17,en,6/27/06,"alternative energy,cars,global issues,climate ..."
1,7,https://www.ted.com/talks/david_pogue_says_sim...,David Pogue,Simplicity sells,New York Times columnist David Pogue takes aim...,TED2006,0:21:26,en,6/27/06,"simplicity,entertainment,interface design,soft..."
2,53,https://www.ted.com/talks/majora_carter_s_tale...,Majora Carter,Greening the ghetto,"In an emotionally charged talk, MacArthur-winn...",TED2006,0:18:36,en,6/27/06,"MacArthur grant,cities,green,activism,politics..."
3,66,https://www.ted.com/talks/ken_robinson_says_sc...,Ken Robinson,Do schools kill creativity?,Sir Ken Robinson makes an entertaining and pro...,TED2006,0:19:24,en,6/27/06,"children,teaching,creativity,parenting,culture..."
4,92,https://www.ted.com/talks/hans_rosling_shows_t...,Hans Rosling,The best stats you've ever seen,You've never seen data presented like this. Wi...,TED2006,0:19:50,en,6/27/06,"demo,Asia,global issues,visualizations,global ..."


In [4]:
transcripts.head()

Unnamed: 0,public_url,speaker,duration,uploaded,views,description,text
0,https://www.ted.com/talks/courtney_martin_the_...,Courtney E. Martin,PT15M32S,2016-09-07T14:52:02+00:00,1508852,"For the first time in history, the majority of...","I'm a journalist, so I like to look for the ..."
1,https://www.ted.com/talks/kiran_bedi_a_police_...,Kiran Bedi,PT8M47S,2010-12-13T16:09:51+00:00,957684,Kiran Bedi has a surprising resume. Before bec...,Now I'm going to give you a story. It's an I...
2,https://www.ted.com/talks/tom_chatfield_7_ways...,Tom Chatfield,PT16M28S,2010-11-01T09:17:00+00:00,1061370,We're bringing gameplay into more aspects of o...,I love video games. I'm also slightly in awe...
3,https://www.ted.com/talks/mundano_pimp_my_tras...,Mundano,PT5M22S,2014-12-19T16:18:48+00:00,986242,"In Brazil, ""catadores"" collect junk and recycl...",Our world has many superheroes. But they hav...
4,https://www.ted.com/talks/sunitha_krishnan_ted...,Sunitha Krishnan,PT12M42S,2009-12-07T01:00:00+00:00,2610947,Sunitha Krishnan has dedicated her life to res...,I'm talking to you about the worst form of h...


## Merge the CSVs

Now, let's see about merging these two into a new dataframe. *One CSV to rule them all!*

The [docs][] report this syntax:

    pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=True,
         suffixes=('_x', '_y'), copy=True, indicator=False,
         validate=None)



[docs]: https://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging

In [4]:
one_ring = pd.merge(google_list, transcripts, on='public_url')

In [5]:
one_ring.head()

Unnamed: 0,Talk ID,public_url,speaker_name,headline,description_x,event,duration_x,language,published,tags,speaker,duration_y,uploaded,views,description_y,text
0,Talk ID,public_url,speaker_name,headline,description,event,duration,language,published,tags,speaker,duration,uploaded,views,description,text
1,1,https://www.ted.com/talks/al_gore_on_averting_...,Al Gore,Averting the climate crisis,With the same humor and humanity he exuded in ...,TED2006,0:16:17,en,6/27/06,"alternative energy,cars,global issues,climate ...",Al Gore,PT16M17S,2006-06-27T00:11:00+00:00,3266733,With the same humor and humanity he exuded in ...,"Thank you so much, Chris. And it's truly a g..."
2,7,https://www.ted.com/talks/david_pogue_says_sim...,David Pogue,Simplicity sells,New York Times columnist David Pogue takes aim...,TED2006,0:21:26,en,6/27/06,"simplicity,entertainment,interface design,soft...",David Pogue,PT21M26S,2006-06-27T00:11:00+00:00,1702201,New York Times columnist David Pogue takes aim...,"(Music: ""The Sound of Silence,"" Simon & Garf..."
3,53,https://www.ted.com/talks/majora_carter_s_tale...,Majora Carter,Greening the ghetto,"In an emotionally charged talk, MacArthur-winn...",TED2006,0:18:36,en,6/27/06,"MacArthur grant,cities,green,activism,politics...",Majora Carter,PT18M36S,2006-06-27T00:11:00+00:00,2000421,"In an emotionally charged talk, MacArthur-winn...",If you're here today — and I'm very happy th...
4,66,https://www.ted.com/talks/ken_robinson_says_sc...,Ken Robinson,Do schools kill creativity?,Sir Ken Robinson makes an entertaining and pro...,TED2006,0:19:24,en,6/27/06,"children,teaching,creativity,parenting,culture...",Ken Robinson,PT19M24S,2006-06-27T00:11:00+00:00,51614087,Sir Ken Robinson makes an entertaining and pro...,Good morning. How are you? (Laughter) ...
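
The `_x` and `_y` endings on `description` and `duration` come from the `suffixes` default in the signature quoted earlier: columns that exist in both frames but are not the join key get renamed. The sketch below shows this on two tiny made-up frames. The URLs, the frame names `left` and `right`, and the suffix strings `_google` and `_transcript` are all hypothetical, not taken from the real data.

```python
import pandas as pd

# Toy stand-ins for the two CSVs (hypothetical URLs, not real talks).
left = pd.DataFrame({
    'public_url': ['https://example.org/a', 'https://example.org/b'],
    'duration': ['0:16:17', '0:21:26'],
})
right = pd.DataFrame({
    'public_url': ['https://example.org/a', 'https://example.org/c'],
    'duration': ['PT16M17S', 'PT5M22S'],
})

# Default behaviour: inner join, overlapping columns get _x / _y.
merged = pd.merge(left, right, on='public_url')
print(merged.columns.tolist())
print(len(merged))  # only URLs present in both frames survive an inner join

# Optional: more descriptive suffixes than the defaults.
renamed = pd.merge(left, right, on='public_url',
                   suffixes=('_google', '_transcript'))
print(renamed.columns.tolist())
```

Because the default `how='inner'` keeps only rows whose `public_url` occurs in both frames, any talk missing from either CSV silently drops out of the merged result.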


And now to save to a CSV:

In [6]:
one_ring.to_csv('./TEDtalks_all.csv', index=False)

After writing the CSV file, I deleted its second line by hand: it was a duplicate header row (visible as row 0 in the output above) that repeated the column names from the original CSVs.
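
As an alternative to hand-editing, the stray header row could probably be filtered out in pandas before saving. This is only a sketch on a made-up frame, not what was actually done here. It assumes the duplicate row can be recognised because its `public_url` cell contains the literal string `public_url`; `one_ring_demo` and its values are hypothetical.

```python
import pandas as pd

# Toy frame imitating the merged result, with a header row repeated as data.
one_ring_demo = pd.DataFrame({
    'Talk ID': ['Talk ID', '1', '7'],
    'public_url': ['public_url', 'https://example.org/a', 'https://example.org/b'],
    'views': ['views', '3266733', '1702201'],
})

# Flag rows whose key cell is just the column name, then keep the rest.
is_header_row = one_ring_demo['public_url'] == 'public_url'
cleaned = one_ring_demo[~is_header_row].reset_index(drop=True)

print(cleaned)
```

Filtering in code keeps the cleanup reproducible if the CSVs are regenerated later.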

## Compare the CSVs

It looks like the `transcripts` CSV contains 30 fewer talks than the Google list. Let's see if we can find out which ones those are.

The 30 missing lines turn out to be lines I edited out.
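
If the difference ever needs to be checked in code rather than from memory, one option is an outer merge with `indicator=True`, which adds a `_merge` column saying where each row came from. The frames below are tiny hypothetical stand-ins (`google_demo`, `transcripts_demo`) with made-up keys, so this is a sketch of the technique, not a result from the real files.

```python
import pandas as pd

# Hypothetical stand-ins: the Google list has one talk the transcripts lack.
google_demo = pd.DataFrame({
    'public_url': ['u1', 'u2', 'u3'],
    'headline': ['A', 'B', 'C'],
})
transcripts_demo = pd.DataFrame({
    'public_url': ['u1', 'u3'],
    'text': ['first transcript', 'third transcript'],
})

# how='outer' keeps unmatched rows; indicator=True labels each row as
# 'both', 'left_only', or 'right_only' in a new '_merge' column.
check = pd.merge(google_demo, transcripts_demo,
                 on='public_url', how='outer', indicator=True)

# Talks listed by Google but absent from the transcripts.
missing = check[check['_merge'] == 'left_only']
print(missing['public_url'].tolist())
```

Run against the real `google_list` and `transcripts` frames, the `left_only` rows would be the candidates for the 30 missing talks.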