Speech2Text

Using the Google Cloud Speech API to perform NLP and LDA on YouTube video transcriptions

The Challenge: So much of the language we process every day is heard, not written, and natural language processing should take that into account. This project serves as a proof-of-concept platform for accessing that other half of language.

The Toolkit:

scikit-learn
Google Cloud project
gensim
numpy
MongoDB
Google Cloud Speech API
Google Cloud Storage API
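
As a rough illustration of how the Speech and Storage pieces of the toolkit fit together, the sketch below builds the JSON body that the Speech API's REST `speech:longrunningrecognize` method accepts for audio already uploaded to Cloud Storage. It is not taken from the project's notebooks, which may use a client library instead. The bucket name, FLAC encoding, 16 kHz sample rate, and helper names are assumptions chosen for the example, and no network call is made.

```python
import json

# REST endpoint for asynchronous transcription of longer audio (Speech API v1).
SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:longrunningrecognize"


def build_recognize_request(gs_uri, language_code="en-US",
                            encoding="FLAC", sample_rate_hertz=16000):
    """Return the request body for one audio file stored in Cloud Storage.

    The default encoding and sample rate are assumptions; they must match
    the audio that was actually uploaded.
    """
    if not gs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI such as gs://bucket/file.flac")
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {"uri": gs_uri},
    }


def build_requests(gs_uris):
    """Build one request body per non-empty line in a list of gs:// URIs."""
    return [build_recognize_request(line.strip()) for line in gs_uris if line.strip()]


if __name__ == "__main__":
    # Hypothetical URI, for illustration only.
    example_uris = ["gs://example-bucket/talk_01.flac"]
    for body in build_requests(example_uris):
        print(SPEECH_ENDPOINT)
        print(json.dumps(body, indent=2))
```

Each body would be sent as an authenticated POST to the endpoint; the response names a long-running operation that can be polled until the transcript is ready.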

The Results: The topic extraction was accurate but not insightful. A larger data set, or more nuanced data, could yield better results. One possible idea is to transcribe stand-up comedy over time and group comedians by time or by topic.
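
The grouping idea can be sketched in a few lines, assuming each transcript has already been assigned a list of `(topic_id, probability)` pairs (the shape gensim's `get_document_topics` returns). The function names and the sample weights below are made up for illustration and are not results from this project.

```python
from collections import defaultdict


def dominant_topic(topic_weights):
    """Return the topic id with the highest probability, or None if empty."""
    if not topic_weights:
        return None
    return max(topic_weights, key=lambda pair: pair[1])[0]


def group_by_dominant_topic(doc_topics):
    """Map each dominant topic id to the sorted names of its documents.

    doc_topics: dict of document name -> list of (topic_id, probability).
    """
    groups = defaultdict(list)
    for name, weights in doc_topics.items():
        groups[dominant_topic(weights)].append(name)
    return {topic: sorted(names) for topic, names in groups.items()}


if __name__ == "__main__":
    # Hypothetical per-transcript topic weights, not real output.
    sample = {
        "comedian_a": [(0, 0.7), (1, 0.3)],
        "comedian_b": [(0, 0.2), (1, 0.8)],
        "comedian_c": [(0, 0.6), (1, 0.4)],
    }
    print(group_by_dominant_topic(sample))
```

Grouping by time instead would only require swapping the key: bucket each transcript by its recording year rather than by its dominant topic.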

Repository files:

.ipynb_checkpoints
presentation
Part 0 - Audio processing.ipynb
Part 1 - Preprocessing.ipynb
Part 2 - LDA.ipynb
gs_paths.txt
mongo.py
operation_1.pkl
readme.md

galenballew/speech2text
