This project is part of Lambda School Build Week (April 2019).
Project Status: [Completed]
Post Here helps you find the best place to share your post on Reddit. You enter the text of your post, and Post Here uses natural language processing and a predictive model to recommend the subreddit where it fits best.
Methods used:
- Natural Language Processing
- Predictive Modeling
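Technologies:
- AWS
- PRAW
- spaCy
- scikit-learn
- NLTK
- string (Python standard library)
- Flask
- pickle (Python standard library)
- re (Python standard library)
- pandas
- urllib (Python standard library)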
Project needs:
- Frontend developer to provide the layout and theme/color scheme, integrate content, and build the user input form.
- Backend developer to secure endpoints, implement authenticated login, and route user input to, and output from, the DS API.
- Access to AWS to host the Flask app, its API, and the Reddit post data.
Project steps:
- Data exploration/descriptive statistics
- Data processing/cleaning
- Word vectorization
- Statistical modeling
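The processing, vectorizing, and modeling steps above can be illustrated with a small, dependency-free sketch. It is not the project's model: the project lists spaCy, NLTK, and scikit-learn, whereas this stand-in uses only `re`, `string`, and `collections` to clean text, build TF-IDF vectors, and assign a post to the subreddit whose average (centroid) vector is most similar by cosine similarity. The stop-word list, function names, and sample posts are all hypothetical.

```python
import math
import re
import string
from collections import Counter, defaultdict

# Hypothetical stop-word list; the project's real preprocessing (spaCy/NLTK) is not shown.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "it", "to", "of", "in", "on", "for", "my", "i"}
URL_RE = re.compile(r"https?://\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean(text: str) -> list[str]:
    """Lowercase, strip URLs and punctuation, split on whitespace, drop stop words."""
    text = URL_RE.sub(" ", text.lower())
    text = text.translate(PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency for every token seen in training."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1 for tok, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; tokens never seen in training are ignored."""
    return {tok: c * idf[tok] for tok, c in Counter(tokens).items() if tok in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def fit_centroids(posts: list[str], labels: list[str]):
    """Train: average the TF-IDF vectors of each subreddit's posts into one centroid."""
    docs = [clean(p) for p in posts]
    idf = fit_idf(docs)
    sums = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        sums[label].update(tfidf(doc, idf))
    sizes = Counter(labels)
    centroids = {
        label: {tok: w / sizes[label] for tok, w in vec.items()} for label, vec in sums.items()
    }
    return idf, centroids


def predict(text: str, idf: dict[str, float], centroids: dict[str, dict[str, float]]) -> str:
    """Return the subreddit whose centroid is most cosine-similar to the post."""
    vec = tfidf(clean(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    # Hypothetical toy training data: two posts each from two subreddits.
    posts = [
        "My sourdough starter smells like vinegar, is that normal?",
        "Best flour for a chewy bagel crust",
        "How do I fix a memory leak in my Python script?",
        "Python list comprehension vs for loop performance",
    ]
    labels = ["Breadit", "Breadit", "learnpython", "learnpython"]
    idf, centroids = fit_centroids(posts, labels)
    print(predict("Why does my Python loop leak memory?", idf, centroids))  # → learnpython
```

The nearest-centroid rule is used here only because it fits in a few lines. With scikit-learn, the same steps would more typically be a `TfidfVectorizer` feeding a classifier such as `LogisticRegression` or `MultinomialNB`, and that fitted pipeline is the kind of object you would save to a .pkl file.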
Getting started:
- Clone this repo.
If you want more recent data, you will need to:
- Request access to the Reddit API: https://docs.google.com/forms/d/e/1FAIpQLSezNdDNK1-P8mspSbmtC2r86Ee9ZRbC66u929cG2GX0T9UMyw/viewform
- Pull posts from the Reddit API using PRAW and store them in a database or a CSV file.
- Retrain the model and save it as a new .pkl file.
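The data-refresh steps above might look roughly like the following sketch, assuming you already have Reddit API credentials. `fetch_posts`, `save_posts_csv`, and `retrain_and_pickle` are hypothetical helper names and the credentials are placeholders. Because the project's training code is not shown, `retrain_and_pickle` takes the training routine as a parameter (`train(texts, labels)`, also an assumption) instead of guessing at it. The PRAW calls (`praw.Reddit(...)`, `reddit.subreddit(name).hot(limit=...)`) are standard read-only usage.

```python
import csv
import pickle

FIELDS = ["subreddit", "title", "selftext"]


def fetch_posts(subreddits, limit=100):
    """Pull the current 'hot' posts from each subreddit with PRAW.

    Requires `pip install praw` and an approved Reddit API app; the
    credential values below are placeholders, not real keys.
    """
    import praw  # imported lazily so the CSV/pickle helpers work without PRAW installed

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="post-here-refresh (placeholder)",
    )
    rows = []
    for name in subreddits:
        for submission in reddit.subreddit(name).hot(limit=limit):
            rows.append({"subreddit": name, "title": submission.title, "selftext": submission.selftext})
    return rows


def save_posts_csv(rows, path):
    """Write one row per post so retraining does not need to hit the API again."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def retrain_and_pickle(csv_path, pkl_path, train):
    """Read the CSV, fit a model with the caller's `train(texts, labels)`, and pickle it."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    texts = [f"{row['title']} {row['selftext']}" for row in rows]
    labels = [row["subreddit"] for row in rows]
    model = train(texts, labels)
    with open(pkl_path, "wb") as fh:
        pickle.dump(model, fh)
    return model
```

Keeping the pull and the retraining as separate steps means you can experiment with preprocessing or models repeatedly against the same saved CSV.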
You can host the Flask app locally or on another platform such as AWS.
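For local hosting, a minimal Flask app might look like the sketch below. It assumes the .pkl file holds an object with a scikit-learn-style `predict` method that accepts a list of raw post texts (for example, a text-vectorizing pipeline). The file name `model.pkl`, the `/predict` route, and the `text` JSON field are hypothetical and not taken from the project's actual API.

```python
import pickle


def create_app(model_path="model.pkl"):
    """Build a Flask app that serves subreddit predictions from a pickled model.

    Assumes the unpickled object has a scikit-learn-style `predict(list_of_texts)`.
    """
    # Load first so a missing or unreadable model file fails fast.
    with open(model_path, "rb") as fh:
        model = pickle.load(fh)

    from flask import Flask, jsonify, request  # requires `pip install flask`

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Hypothetical request shape: {"text": "<the user's post>"}
        payload = request.get_json(silent=True) or {}
        text = payload.get("text", "")
        if not text:
            return jsonify({"error": "missing 'text'"}), 400
        subreddit = model.predict([text])[0]
        return jsonify({"subreddit": str(subreddit)})

    return app
```

To try it locally, call `create_app().run()` (Flask's development server, which listens on `http://127.0.0.1:5000` by default) and POST JSON such as `{"text": "..."}` to `/predict`. For AWS or other production hosting, serve the app object through a WSGI server rather than the development server.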