This repository is an archive of a data science project conducted for UCSD's COGS 108: Data Science in Practice course.
Specific aims of this project were to:
- Learn relevant data science Python packages and use them in an applied setting.
- Gain experience drafting research questions and gathering literature reviews.
- Gain experience working with a team with the goal of presenting an informed, well-drafted data science research project.
Personal contributions to this project were:
- Data curation
- Data wrangling
- Self-taught web-scraping and multithreading
- Drafting and proofreading the Background, Hypothesis, Datasets, and Conclusions
Profanity has long been both taboo and a subject of interest because of the strength and meaning its words carry. To examine whether the social perception of profanity has changed over time, we studied trends in profanity usage in movies over the past 20 years in relation to movie performance, as reflected in box office revenue, MPAA age ratings, and aggregated reviews from the public.
- cleaned_data: Contains all of the output .csv files from data wrangling, which were used throughout different phases of the project.
- kaggle/input: Contains one of the main datasets used as a foundation for this project.
- scripts: Contains a compressed zip file of all the scripts which were used for this project. It covers movies from 1997 to 2017, which are labeled by their IMDB ID with the leading 0 excluded.
- Loose files: These are the final outputs and presentation of the project.
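
The ID labeling used in the scripts archive can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `to_label` and `to_imdb_id` are hypothetical, and it assumes a label is the numeric part of an IMDb title ID (a `tt` prefix followed by digits zero-padded to seven places) with the leading zeros dropped.

```python
def to_label(imdb_id: str) -> str:
    """Convert a full IMDb title ID into the shortened label (assumed convention)."""
    # Drop the "tt" prefix if present, then strip leading zeros.
    digits = imdb_id[2:] if imdb_id.startswith("tt") else imdb_id
    return digits.lstrip("0")


def to_imdb_id(label: str, width: int = 7) -> str:
    """Rebuild a full IMDb title ID from a shortened label.

    The default width of 7 digits is an assumption about the ID format.
    """
    return "tt" + label.zfill(width)


if __name__ == "__main__":
    example = "tt0120338"  # an example ID chosen for illustration
    label = to_label(example)
    print(label)
    print(to_imdb_id(label))
```

Restoring the padded form is useful when joining the archive's labels against another dataset that stores full IMDb IDs.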