This project explores how text similarity techniques can identify relationships between animated television shows. It compares the Wikipedia summaries of the shows and scores how similar each pair of descriptions is.
Using TF-IDF vectorization and cosine similarity, the project identifies shows that share similar themes, vocabulary, and narrative elements. The same methods are common in content recommendation systems, such as those streaming platforms use to suggest new shows to viewers.
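As a rough illustration of the approach described above, here is a minimal sketch that vectorizes a few summaries with TF-IDF, computes pairwise cosine similarity, and ranks the closest shows. The show titles, the summary texts, the `most_similar` helper, and the `stop_words="english"` setting are all assumptions made for this example. They are not taken from the project's code or data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy summaries standing in for the Wikipedia text.
summaries = {
    "Show A": "A young explorer and her monkey friend solve puzzles on jungle adventures.",
    "Show B": "A curious girl explorer goes on adventures with her animal friend and a talking map.",
    "Show C": "A masked vigilante fights crime and villains in a dark city at night.",
}
titles = list(summaries)
texts = list(summaries.values())

# Turn each summary into a TF-IDF vector (assumption: drop English stop words).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(texts)

# Pairwise cosine similarity: entry [i, j] compares show i with show j.
similarity = cosine_similarity(tfidf_matrix)


def most_similar(title, top_n=2):
    """Return up to top_n (title, score) pairs ranked by similarity to `title`."""
    idx = titles.index(title)
    scores = similarity[idx]
    # Sort by descending score and skip the query show itself.
    order = [i for i in np.argsort(-scores) if i != idx]
    return [(titles[i], float(scores[i])) for i in order[:top_n]]


for title in titles:
    print(title, "->", most_similar(title, top_n=1))
```

In this toy data, "Show A" and "Show B" share several terms, so they score closer to each other than either does to "Show C", which shares no vocabulary with them and scores zero.
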
Read the full Medium post here: 👉 https://medium.com/@kousalyapotti/from-dora-to-batman-finding-similar-animated-tv-shows-with-nlp-a930ea117ae8

Python
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn

Techniques for Text Analysis
- TF-IDF vectorization
- cosine similarity

Data Sources
- Kaggle dataset of animated shows
- Wikipedia API
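
One possible way to pull a show's introductory summary from the Wikipedia API is sketched below. It uses only the Python standard library and the MediaWiki `action=query` endpoint with `prop=extracts`. The helper names (`build_summary_url`, `parse_summary`, `fetch_summary`) and the User-Agent string are hypothetical. The project may well have used a different client or endpoint.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_summary_url(title):
    """Build a query URL asking for the plain-text intro of one article."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # only the section before the first heading
        "explaintext": 1,   # plain text instead of HTML
        "redirects": 1,     # follow redirects to the canonical article
        "format": "json",
        "titles": title,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_summary(payload):
    """Extract the first non-empty summary from a decoded API response."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        extract = page.get("extract")
        if extract:
            return extract
    return None  # missing article or empty extract


def fetch_summary(title):
    """Download and parse the summary for `title` (performs a network call)."""
    request = urllib.request.Request(
        build_summary_url(title),
        # Hypothetical User-Agent for this sketch.
        headers={"User-Agent": "animated-shows-similarity-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_summary(json.load(response))
```

Calling `fetch_summary` with an article title performs the request. Titles from the Kaggle dataset that do not match a Wikipedia article return `None`, so they can be filtered out before vectorization.
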