# Ted Talks

This dataset has been adapted from the Kaggle [Ted Talks Dataset](https://www.kaggle.com/rounakbanik/ted-talks). A full dataset description is available on the Kaggle page. You will need to download the files `ted.csv` and `ted_tags.csv` from Brightspace.


## Context
These datasets contain information about all audio-video recordings of TED Talks uploaded to the official TED.com website until September 21st, 2017. The TED main dataset contains information about all talks including number of views, number of comments, descriptions, speakers and titles. The TED transcripts dataset contains the transcripts for all talks available on TED.com.

## Data Description

### ted.csv

*comments*
The number of first level comments made on the talk

*description*
A blurb of what the talk is about

*duration*
The duration of the talk in seconds

*event*
The TED/TEDx event where the talk took place

*film_date*
The Unix timestamp of the filming

*languages*
The number of languages in which the talk is available

*main_speaker*
The first named speaker of the talk

*name*
The official name of the TED Talk. Includes the title and the speaker.

*num_speaker*
The number of speakers in the talk

*published_date*
The Unix timestamp for the publication of the talk on TED.com

*speaker_occupation*
The occupation of the main speaker

*title*
The title of the talk

*url*
The URL of the talk

*views*
The number of views of the talk
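
The `film_date` and `published_date` columns hold Unix timestamps, so they usually need converting before they can be read as dates. The following is a minimal sketch using base R only; the two values are the `film_date` entries of the first and third rows of `ted.csv`, and the variable names `filmed_at` and `filmed_on` are illustrative.

```r
# Two film_date values copied from ted.csv (first and third talk)
film_date <- c(1140825600, 1140739200)

# Unix timestamps count seconds since 1970-01-01 00:00:00 UTC
filmed_at <- as.POSIXct(film_date, origin = "1970-01-01", tz = "UTC")
filmed_on <- as.Date(filmed_at)  # drop the time of day

format(filmed_on)  # "2006-02-25" "2006-02-24"
```

The same conversion should apply to `published_date`, assuming it is also stored in seconds.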

### ratings.csv

A lookup table for ratings

*id*
The ID of the rating

*rating*
Rating description in English

### talk_ratings.csv

*id*
Rating ID (corresponds to the ID in *ratings.csv*)

*title*
Title of the talk

*count*
The number of times this talk was given this rating

### ted_tags.csv

*title*
Title of the talk

*tags*
Descriptive tag applied to the talk
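
The descriptions above imply that *talk_ratings.csv* refers to *ratings.csv* through the shared `id` column. As a rough sketch of how the lookup could be applied, the example below uses small made-up data frames in place of the real files; the rating names, titles and counts are assumptions for illustration only.

```r
library(dplyr)

# Toy stand-ins for ratings.csv and talk_ratings.csv (all values are made up)
ratings <- data.frame(id = c(1, 2), rating = c("Funny", "Inspiring"))
talk_ratings <- data.frame(
    id    = c(1, 2, 2),
    title = c("Talk A", "Talk A", "Talk B"),
    count = c(10, 40, 25)
)

# Attach the rating description to each (talk, rating) row via the shared id
talk_ratings_named <- left_join(talk_ratings, ratings, by = "id")
talk_ratings_named
```

A `left_join` keeps every row of `talk_ratings` even if an ID is missing from the lookup table, which makes unmatched IDs easy to spot as `NA`.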

1. Load the Ted Talks dataset (ted.csv)

In [4]:
ted <- read.csv("ted.csv")
head(ted)

,X,comments,description,duration,event,film_date,languages,main_speaker,name,num_speaker,published_date,speaker_occupation,title,url,views
,<int>,<int>,<chr>,<int>,<chr>,<int>,<int>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<int>
1,1,4553,Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.,1164,TED2006,1140825600,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,1151367060,Author/educator,Do schools kill creativity?,https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity,47227110
2,2,265,"With the same humor and humanity he exuded in ""An Inconvenient Truth,"" Al Gore spells out 15 ways that individuals can address climate change immediately, from buying a hybrid to inventing a new, hotter brand name for global warming.",977,TED2006,1140825600,43,Al Gore,Al Gore: Averting the climate crisis,1,1151367060,Climate advocate,Averting the climate crisis,https://www.ted.com/talks/al_gore_on_averting_climate_crisis,3200520
3,3,124,"New York Times columnist David Pogue takes aim at technology’s worst interface-design offenders, and provides encouraging examples of products that get it right. To funny things up, he bursts into song.",1286,TED2006,1140739200,26,David Pogue,David Pogue: Simplicity sells,1,1151367060,Technology columnist,Simplicity sells,https://www.ted.com/talks/david_pogue_says_simplicity_sells,1636292
4,4,200,"In an emotionally charged talk, MacArthur-winning activist Majora Carter details her fight for environmental justice in the South Bronx -- and shows how minority neighborhoods suffer most from flawed urban policy.",1116,TED2006,1140912000,35,Majora Carter,Majora Carter: Greening the ghetto,1,1151367060,Activist for environmental justice,Greening the ghetto,https://www.ted.com/talks/majora_carter_s_tale_of_urban_renewal,1697550
5,5,593,"You've never seen data presented like this. With the drama and urgency of a sportscaster, statistics guru Hans Rosling debunks myths about the so-called ""developing world.""",1190,TED2006,1140566400,48,Hans Rosling,Hans Rosling: The best stats you've ever seen,1,1151440680,Global health expert; data visionary,The best stats you've ever seen,https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen,12005869
6,6,672,"Tony Robbins discusses the ""invisible forces"" that motivate everyone's actions -- and high-fives Al Gore in the front row.",1305,TED2006,1138838400,36,Tony Robbins,Tony Robbins: Why we do what we do,1,1151440680,Life coach; expert in leadership psychology,Why we do what we do,https://www.ted.com/talks/tony_robbins_asks_why_we_do_what_we_do,20685401


2. How many speakers from each occupation have appeared in a TED Talk? What are the top 5 most popular occupations?

In [21]:
library('dplyr')
ted %>%
    group_by(speaker_occupation) %>%
    summarise(n = n(), .groups = 'drop') %>%
    arrange(desc(n)) %>%
    top_n(5)

Selecting by n


speaker_occupation,n
<chr>,<int>
Writer,45
Artist,34
Designer,34
Journalist,33
Entrepreneur,31
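
The same kind of ranking can be written more compactly with `count()` and `slice_max()`, which also avoids the "Selecting by n" message. This is a sketch on a made-up data frame `toy` standing in for `ted`, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy stand-in for ted (occupations are made up for illustration)
toy <- data.frame(
    speaker_occupation = c("Writer", "Writer", "Artist", "Designer", "Writer", "Artist")
)

top_occupations <- toy %>%
    count(speaker_occupation, name = "n") %>%  # one row per occupation
    slice_max(n, n = 2)                        # the two largest counts, ties kept

top_occupations
```

Note that counting rows counts talks, so a speaker who gave several talks contributes once per talk.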


3. Split the data into two groups: popular (top 25% of views) and niche (everything below the top 25% of views)

In [42]:
# Each talk's share of the total views, in percent
# (as.numeric() avoids integer overflow when summing large view counts)
ted <- mutate(ted, percentViews = (views * 100) / sum(as.numeric(views)))

# Return the rows that together account for `percent` % of all views, starting
# from the most viewed talks (top = TRUE) or the least viewed ones (top = FALSE)
getSubsetBasedOnPercentViews <- function(data, top = TRUE, percent = 25) {
    arrangedData <- if (top) arrange(data, desc(percentViews)) else arrange(data, percentViews)
    # Keep every row up to and including the one where the running total reaches `percent`
    howManyRows <- which(cumsum(arrangedData$percentViews) >= percent)[1]
    if (is.na(howManyRows)) howManyRows <- nrow(arrangedData)
    arrangedData[seq_len(howManyRows), ]
}
top25Percent <- getSubsetBasedOnPercentViews(ted, top = TRUE)
bottom25Percent <- getSubsetBasedOnPercentViews(ted, top = FALSE)
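
The question can also be read as "the quarter of talks with the most views", in which case a quantile cutoff splits the data into two complementary groups. The sketch below illustrates that alternative reading on a made-up data frame `toy`; it is not the approach taken in the cell above, and the names `cutoff`, `popular` and `niche` are illustrative.

```r
# Toy stand-in for ted: eight talks with made-up view counts
toy <- data.frame(
    title = paste("Talk", 1:8),
    views = c(100, 200, 300, 400, 500, 600, 700, 800)
)

# 75th percentile of views: talks at or above it form the top quarter
cutoff <- quantile(toy$views, probs = 0.75)

popular <- toy[toy$views >= cutoff, ]  # top 25% of talks by views
niche   <- toy[toy$views <  cutoff, ]  # all remaining talks

nrow(popular)  # 2
nrow(niche)    # 6
```

With this split every talk lands in exactly one group, so the two row counts add up to the size of the original data.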

4. Load the ted tags data (ted_tags.csv)

In [50]:
tedTags <- read.csv("ted_tags.csv")
head(tedTags)

,X,title,tags
,<int>,<chr>,<chr>
1,1,Do schools kill creativity?,children
2,2,Do schools kill creativity?,creativity
3,3,Do schools kill creativity?,culture
4,4,Do schools kill creativity?,dance
5,5,Do schools kill creativity?,education
6,6,Do schools kill creativity?,parenting


5. Count the number of rows in your main dataset, create a copy of your dataset, and join it with the tags. How many rows do you have now? Why?

In [56]:
nrow(tedTags)
nrow(ted)
tedAndTedTags <- inner_join(ted, tedTags, by = "title")
head(tedAndTedTags)
nrow(tedAndTedTags)

,X.x,comments,description,duration,event,film_date,languages,main_speaker,name,num_speaker,published_date,speaker_occupation,title,url,views,percentViews,X.y,tags
,<int>,<int>,<chr>,<int>,<chr>,<int>,<int>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>,<int>,<dbl>,<int>,<chr>
1,1,4553,Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.,1164,TED2006,1140825600,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,1151367060,Author/educator,Do schools kill creativity?,https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity,47227110,1.09053,1,children
2,1,4553,Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.,1164,TED2006,1140825600,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,1151367060,Author/educator,Do schools kill creativity?,https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity,47227110,1.09053,2,creativity
3,1,4553,Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.,1164,TED2006,1140825600,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,1151367060,Author/educator,Do schools kill creativity?,https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity,47227110,1.09053,3,culture
4,1,4553,Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.,1164,TED2006,1140825600,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,1151367060,Author/educator,Do schools kill creativity?,https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity,47227110,1.09053,4,dance
5,1,4553,Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.,1164,TED2006,1140825600,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,1151367060,Author/educator,Do schools kill creativity?,https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity,47227110,1.09053,5,education
6,1,4553,Sir Ken Robinson makes an entertaining and profoundly moving case for creating an education system that nurtures (rather than undermines) creativity.,1164,TED2006,1140825600,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,1151367060,Author/educator,Do schools kill creativity?,https://www.ted.com/talks/ken_robinson_says_schools_kill_creativity,47227110,1.09053,6,parenting
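
The output above repeats the same talk once per tag, which suggests why the row count grows: the tags table holds several rows per title, and a join returns one row for every matching (talk, tag) pair. The sketch below reproduces the effect on made-up data frames `talks` and `tags`.

```r
library(dplyr)

# Toy data: two talks, one of which carries three tags
talks <- data.frame(title = c("Talk A", "Talk B"), views = c(10, 20))
tags  <- data.frame(
    title = c("Talk A", "Talk A", "Talk A", "Talk B"),
    tags  = c("culture", "dance", "education", "science")
)

joined <- inner_join(talks, tags, by = "title")

nrow(talks)   # 2
nrow(joined)  # 4: one row per (talk, tag) pair
```

If every tag row matches exactly one talk, the joined table has as many rows as the tags table.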


6. Are there any talks that don't have a corresponding entry in the ted tags data?

In [57]:
nrow(tedTags) == nrow(tedAndTedTags)
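
Comparing row counts shows whether every tag row found a talk, but it does not directly list talks that have no tags. One way to find those is `anti_join()`, which returns the rows of the first table with no match in the second. This is a sketch on made-up data frames; with the real data the call would presumably be `anti_join(ted, tedTags, by = "title")`.

```r
library(dplyr)

# Toy data: "Talk C" has no entry in the tags table
talks <- data.frame(title = c("Talk A", "Talk B", "Talk C"))
tags  <- data.frame(
    title = c("Talk A", "Talk A", "Talk B"),
    tags  = c("culture", "dance", "science")
)

# Rows of talks whose title never appears in tags
untagged <- anti_join(talks, tags, by = "title")

untagged$title  # "Talk C"
```

An empty result means every talk has at least one tag.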

7. Which is the most popular tag?

In [59]:
tedAndTedTags %>%
    group_by(tags) %>%
    summarise(n = n(), .groups = 'drop') %>%
    arrange(desc(n)) %>%
    top_n(5)

Selecting by n


tags,n
<chr>,<int>
technology,727
science,567
globalissues,501
culture,486
TEDx,450


8. Calculate each event's percentage contribution to the total number of talks

In [78]:
# tedAndTedTags has one row per (talk, tag) pair, so n() counts talk-tag rows per event
tedEventsCounted <- tedAndTedTags %>%
    group_by(event) %>%
    summarise(totalTalksInEvent = n(), .groups = 'drop') %>%
    arrange(desc(totalTalksInEvent))
tedAndTedTagsWithEventsCounted <- inner_join(tedAndTedTags, tedEventsCounted, by = "event")

# Each event's share of all rows, in percent
tedAndTedTagsWithEventsCounted <- mutate(
    tedAndTedTagsWithEventsCounted,
    eventPercentageContribution = (totalTalksInEvent * 100) / nrow(tedAndTedTagsWithEventsCounted))
unique(select(tedAndTedTagsWithEventsCounted, c(event, eventPercentageContribution))) %>%
    arrange(desc(eventPercentageContribution))

event,eventPercentageContribution
<chr>,<dbl>
TED2016,5.377467
TED2017,3.628485
TEDGlobal 2012,3.064634
TED2009,2.902788
TEDSummit,2.594758
TED2015,2.542550
TED2014,2.344158
TED2007,2.338937
TED2012,2.213637
TEDGlobal 2013,2.156208
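
The percentages above are computed on the talk-and-tag join, so a talk with many tags weighs more than a talk with few. To measure each event's share of talks, the same calculation could be run on a table with one row per talk. The sketch below uses a made-up data frame `toy` in place of `ted`, and the names `event_share`, `talks` and `percent` are illustrative.

```r
library(dplyr)

# Toy stand-in for ted: one row per talk (events assigned for illustration)
toy <- data.frame(
    title = paste("Talk", 1:4),
    event = c("TED2006", "TED2006", "TED2006", "TED2007")
)

event_share <- toy %>%
    count(event, name = "talks") %>%               # talks per event
    mutate(percent = talks * 100 / sum(talks)) %>% # share of all talks
    arrange(desc(percent))

event_share
```

Because each talk is counted once, the `percent` column sums to 100 across events.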
