YouTube is the second most visited website on the internet, and in a Pew Research survey 73% of US adults reported using it, more than any other social media platform (source).
Sponsoring YouTube channels is one way for companies to tap into this large audience, which can be targeted quite precisely: individual tastes vary, and many channels have very specific audiences.
However, with thousands of channels that have 1 million or more subscribers, choosing the right one to sponsor can be difficult. Age, gender, and geographic demographics are available to channel owners but not to outside marketers. Furthermore, none of the available services offers insight into the thoughts, interests, and opinions of the most engaged members of these audiences: the commenters.
CommenTube offers channel recommendations based on what these commenters are saying:
Simply type in a keyword or even a whole sentence, and CommenTube will reveal the channels with the greatest proportion of comments related to your search term.
Switch tabs to see the most relevant comments.
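The search behaviour described above could work roughly as follows. This is a minimal sketch, not CommenTube's actual code: it assumes each comment already has an embedding vector (for example, built from word2vec vectors), treats a comment as "relevant" when its cosine similarity to the query vector reaches a hypothetical `threshold`, and uses the illustrative function name `rank_channels`.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_channels(query_vec, comments, threshold=0.5, top_n=5):
    """Rank channels by the share of their comments that are similar to the query.

    comments: iterable of (channel, comment_text, comment_vec) tuples.
    Returns (channel_ranking, top_comments):
      - channel_ranking: (channel, proportion) pairs, highest proportion first
      - top_comments: (similarity, channel, text) tuples, most similar first
    """
    totals = defaultdict(int)  # number of comments seen per channel
    hits = defaultdict(int)    # number of comments at or above the threshold
    scored = []
    for channel, text, vec in comments:
        sim = cosine(query_vec, vec)
        totals[channel] += 1
        if sim >= threshold:
            hits[channel] += 1
        scored.append((sim, channel, text))

    # Proportion of relevant comments, so large channels don't win by volume alone.
    proportions = {ch: hits[ch] / totals[ch] for ch in totals}
    ranking = sorted(proportions.items(), key=lambda item: item[1], reverse=True)
    top_comments = sorted(scored, key=lambda s: s[0], reverse=True)[:top_n]
    return ranking[:top_n], top_comments
```

Ranking by proportion rather than raw count is what lets a small channel with a highly focused audience outrank a much larger general-interest one; the two returned lists roughly correspond to the channel view and the comment view.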
Visit the website here
This repository contains the files on the AWS EC2 instance used to host the site, as well as the data exploration and production steps used to create the final product.
The Python files used on the AWS EC2 instance:
- comm_chan_result.py: the workhorse of the website, which queries the PostgreSQL database for relevant comment and channel data based on the user's input text.
- server.py: runs the Flask app and passes data between the HTML page and comm_chan_result.py when a user submits text.
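The kind of database call comm_chan_result.py makes can be illustrated with a small, assumption-heavy sketch. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a server, matches the search term with a plain `LIKE` rather than whatever relevance measure the site actually uses, and the table, columns, and `channel_shares` function are hypothetical. With psycopg2 against PostgreSQL, the same SQL would use `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical schema: one row per comment, tagged with the channel it came from.
SCHEMA = "CREATE TABLE comments (channel TEXT NOT NULL, comment TEXT NOT NULL)"

# Share of each channel's comments that mention the search term.
# AVG over a 1.0/0.0 flag gives the proportion directly.
CHANNEL_QUERY = """
    SELECT channel,
           AVG(CASE WHEN LOWER(comment) LIKE ? THEN 1.0 ELSE 0.0 END) AS share
    FROM comments
    GROUP BY channel
    ORDER BY share DESC, channel ASC
    LIMIT ?
"""


def channel_shares(conn, term, limit=10):
    """Return (channel, share) rows, highest share of comments mentioning `term` first."""
    # Parameterised query: the user's text is never spliced into the SQL string.
    pattern = f"%{term.lower()}%"
    return conn.execute(CHANNEL_QUERY, (pattern, limit)).fetchall()
```

Passing the user's text as a query parameter, rather than formatting it into the SQL, matters here because the input comes straight from a public web form.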
The data gathered for this project is stored on my local machine. Because the files total more than 2 GB, they have not been uploaded to GitHub.
A range of Jupyter notebooks that track my journey, including:
- gathering the data and formatting it into .csv files
- data exploration of comments
- processing/cleaning for NLP tasks
- searching for topics within the comments
- testing embeddings and search
- validation
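As a rough illustration of the "processing/cleaning for NLP tasks" step, here is a minimal sketch. The specific steps (stripping HTML tags and entities that YouTube comment text can contain, removing URLs, lowercasing, and dropping a tiny stop-word list) are assumptions about what such cleaning typically involves, not a record of what the notebooks do; `clean_comment` and `STOP_WORDS` are hypothetical names.

```python
import html
import re

# A deliberately tiny, illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "this", "to", "of", "i", "it"}

TAG_RE = re.compile(r"<[^>]+>")        # HTML tags such as <br>
URL_RE = re.compile(r"https?://\S+")   # links pasted into comments
TOKEN_RE = re.compile(r"[a-z0-9']+")   # lowercase words, digits, apostrophes


def clean_comment(text):
    """Return the lowercase, non-stop-word tokens of a raw comment."""
    text = TAG_RE.sub(" ", text)   # strip tags first so decoded "<" can't look like a tag
    text = html.unescape(text)     # e.g. "&#39;" -> "'"
    text = URL_RE.sub(" ", text.lower())
    tokens = [t.strip("'") for t in TOKEN_RE.findall(text)]
    return [t for t in tokens if t and t not in STOP_WORDS]
```

Tags are removed before entities are decoded so that an emoticon like `&lt;3` is not turned into `<3` and then mistaken for the start of a tag.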
A collection of scripts containing various Python functions that were used for:
- YouTube comment, video, and channel data collection
- Creating word2vec embeddings
- Data visualization
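One common way to turn word2vec word vectors into a single vector per comment is to average the vectors of the words it contains. The sketch below shows that technique under the assumption that the repository does something similar; the actual scripts may combine vectors differently, and `comment_vector` is a hypothetical name. `word_vectors` can be a plain dict; a trained gensim `KeyedVectors` object should also work, since it supports `word in kv` and `kv[word]`.

```python
def comment_vector(tokens, word_vectors, dim):
    """Average the vectors of in-vocabulary tokens.

    tokens: list of cleaned words from one comment.
    word_vectors: mapping from word to a length-`dim` vector.
    Returns a zero vector when none of the tokens are in the vocabulary.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the i-th component of every vector together.
    return [sum(component) / len(known) for component in zip(*known)]
```

Out-of-vocabulary words (slang, typos, rare names) are simply skipped, which is a known limitation of plain word2vec averaging; the zero-vector fallback keeps downstream similarity code from failing on comments with no known words.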