ted-talks-analysis

Some exploratory data analysis of TED talk transcripts. I thought of this project so that I could work on:

Gameplan:

Scrape all (currently 2888) English transcripts of TED talks
Included data:
- Title (and talk ID)
- Speaker (and speaker ID)
- # of views on TED
- # of comments on TED
- Date published
- TED tags (topics)
- Original language (specific word usage will be skewed by translation)
- Video length (to calculate rough words per minute)
- Event (e.g. TEDx vs TED Global)
- Category (reader ratings, e.g. 'inspirational' or 'confusing')
Run some basic analysis (duration, length, common words, past/future tense, passive/active voice, etc.)
Break into categories (such as top-ranked 'inspirational') and try to use PCA to find the specific distinguishing words
Feed text into an LSTM and churn out a pseudo TED talk
Create a nice interactive visualization of results
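
As a rough illustration of the "basic analysis" step, here is a minimal sketch of the words-per-minute and common-words calculations. It assumes a transcript is available as a plain string and the video length in seconds. The function names (`tokenize`, `words_per_minute`, `common_words`) and the simple regex tokenizer are placeholders for illustration, not what the notebook actually uses.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: a basic regex is good enough for rough counts; it keeps
    contractions like "don't" together and drops punctuation and digits.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def words_per_minute(transcript, duration_seconds):
    """Rough speaking rate: word count divided by video length in minutes."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(tokenize(transcript)) * 60.0 / duration_seconds


def common_words(transcript, n=10, stopwords=frozenset()):
    """Return the n most frequent words, skipping any given stopwords."""
    counts = Counter(w for w in tokenize(transcript) if w not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical toy transcript, not real scraped data.
    sample = "Ideas worth spreading. Ideas change the world."
    print(words_per_minute(sample, duration_seconds=3))
    print(common_words(sample, n=1, stopwords={"the"}))
```

The wpm figure is only approximate, since video length includes applause, pauses, and intro clips that are not part of the spoken text.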
