rmelbardis/ObjectivelyFunny

Objectively Funny

This project was made by four Le Wagon Data Science students as our final project. We sourced and processed a wide range of stand-up scripts to analyse comedians and generate our own comedy.

Description

  • Constructed a scraper to source and clean 3.6 million words of stand-up comedy from a variety of online sources using BeautifulSoup, Requests, and Pandas.
  • Ran topic modelling with Latent Dirichlet Allocation (LDA) on the processed dataset and visualised the results with word clouds.
  • Fine-tuned a bot using GPT-2 (gpt-2-simple, credit: Max Woolf) on the subset of female comedian scripts and generated entirely original stand-up comedy material from only a few words of text input. Created an API for the bot using FastAPI, Docker and Google Cloud Run.
  • Integrated and published the completed project on a public site using Heroku and Streamlit.
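
As an illustration of the "clean" half of the first step above, here is a minimal sketch of how a scraped transcript might be normalised before analysis. It is not the project's actual code: it uses only the Python standard library instead of BeautifulSoup and Pandas, the function name `clean_transcript` is hypothetical, and the idea that audience cues appear in square brackets (e.g. `[laughter]`) is an assumption about the source pages.

```python
import html
import re

# Assumption: audience cues are written in square brackets, e.g. [laughter].
STAGE_DIRECTION = re.compile(r"\[[^\]]*\]")
TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def clean_transcript(raw: str) -> str:
    """Return plain transcript text with markup and bracketed cues removed."""
    text = TAG.sub(" ", raw)               # drop leftover HTML tags
    text = html.unescape(text)             # decode entities such as &quot;
    text = STAGE_DIRECTION.sub(" ", text)  # drop cues like [laughter]
    return WHITESPACE.sub(" ", text).strip()  # collapse runs of whitespace


if __name__ == "__main__":
    sample = "<p>So I said &quot;no&quot; [laughter]</p>\n<p>Thanks! [applause]</p>"
    print(clean_transcript(sample))
```

Keeping the cleaning step as a pure string-to-string function makes it easy to apply to a whole column of transcripts, whatever library is used to fetch and store them.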

Link to Application

Streamlit App

Data

Sources

Data At a Glance

  • 555 individual transcripts
  • 268 comedians
  • 19 million characters
  • 3.6 million words
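
Summary figures like these can be recomputed from the cleaned corpus with a few lines of Python. The sketch below is illustrative only: the function name `corpus_stats`, the `(comedian, text)` input shape, and the word pattern are assumptions, and a different tokeniser (for example NLTK's) would give slightly different word counts.

```python
import re

# Assumption: a "word" is a run of letters, digits, or apostrophes.
WORD = re.compile(r"[A-Za-z0-9']+")


def corpus_stats(transcripts):
    """Summarise an iterable of (comedian, text) pairs."""
    comedians = set()
    n_transcripts = n_chars = n_words = 0
    for comedian, text in transcripts:
        comedians.add(comedian)
        n_transcripts += 1
        n_chars += len(text)                  # characters, including spaces
        n_words += len(WORD.findall(text))    # tokens matching WORD
    return {
        "transcripts": n_transcripts,
        "comedians": len(comedians),
        "characters": n_chars,
        "words": n_words,
    }


if __name__ == "__main__":
    toy = [
        ("A", "Hello there, folks!"),
        ("A", "Don't stop."),
        ("B", "One two three"),
    ]
    print(corpus_stats(toy))
```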

Tech Stack

  • Python
  • Jupyter Notebook
  • Requests + BeautifulSoup
  • Pandas
  • NLTK
  • Gensim
  • GPT-2
  • Docker
  • Google Cloud AI Platform
  • Google Cloud Run
  • Heroku
  • Streamlit

The full list of Python packages can be found in the requirements.txt file.

Authors

Version History

  • 0.1
    • Initial Release

Acknowledgments