Television Show Recommender System


Executive Summary


I analyzed the transcripts of 117,937 television episodes from 4,667 different television shows using Latent Dirichlet Allocation (LDA) to find clusters of common language across shows. I then used those similarities to build a content-based recommender for television shows.
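As a rough illustration of the recommendation step (not the code in this repository), the sketch below ranks shows by how similar their topic mixtures are. It assumes each show has already been reduced to a vector of LDA topic proportions. The show names, the three-topic vectors, the `recommend` helper, and the choice of cosine similarity are all hypothetical and chosen only for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def recommend(show, topic_vectors, top_n=3):
    """Return the top_n shows most similar to `show`, best match first."""
    target = topic_vectors[show]
    scores = [
        (other, cosine_similarity(target, vector))
        for other, vector in topic_vectors.items()
        if other != show  # never recommend the show itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical topic mixtures (assumption: 3 topics, made-up values).
topic_vectors = {
    "Show A": [0.80, 0.15, 0.05],
    "Show B": [0.75, 0.20, 0.05],
    "Show C": [0.05, 0.10, 0.85],
    "Show D": [0.10, 0.80, 0.10],
}

for name, score in recommend("Show A", topic_vectors, top_n=2):
    print(f"{name}: {score:.3f}")
```

Cosine similarity is only one reasonable choice here. Because topic mixtures are probability distributions, a distribution-aware measure such as Hellinger or Jensen-Shannon distance could be swapped in without changing the rest of the sketch.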

System Requirements


  • Python==3.7.3
  • gensim==3.8.1
  • Flask==1.1.1
  • nltk==3.4.5
  • pandas==0.25.2
  • matplotlib==3.1.1
  • numpy==1.17.2
  • spacy==2.2.1
  • spacy-langdetect==0.1.2
  • beautifulsoup4==4.8.0

For Google Cloud Virtual Instance:

  • A virtual machine with at least 104 GB of RAM
  • google-api-core==1.14.3
  • google-auth==1.7.1
  • google-auth-oauthlib==0.4.1
  • google-cloud==0.34.0
  • google-cloud-core==1.0.3
  • google-cloud-storage==1.23.0
  • google-pasta==0.1.8
  • google-resumable-media==0.5.0

How to Use this Repository


All final production code is in the final_code folder. The development_code folder contains other code written during the project that was not used to create the final result. The notebooks and Python scripts are listed in chronological order. None of my final data is posted because of its size (2.6 GB), but please contact me if you would like a copy!