GitHub - nmcalow/Reddit-Webscraping-Analysis: Project for my Data Science Immersive

## Description:

This project explores Reddit through web scraping, natural language processing, and model building. I analyzed posts from r/tifu and r/NoStupidQuestions and built a classifier that separated the two subreddits perfectly. I then intentionally removed some key features to make the task harder, and reworked the model until it was mostly perfect again.

Data Dictionary:

| Name | Use |
| --- | --- |
| subreddit_tifu | 1 for tifu, 0 for NoStupidQuestions |
| Id | Post ID, used to make sure there were no duplicates |
| Author | Author of the post |
| selftext | Text inside the post, if any |
| title | Title of the post |
| text | Combination of selftext and title, fed into the model |
| line_count | Total number of line breaks in the post (times Enter was pressed), which appear in the text as `\n` |
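
As a rough illustration of how the derived columns above could be built, here is a minimal sketch in plain Python. The helper names `build_row` and `drop_duplicate_posts` are hypothetical and do not come from the notebooks. The order in which title and selftext are joined, and counting `\n` in the combined text, are assumptions.

```python
def build_row(post, subreddit):
    """Build one data-dictionary row from a scraped post (hypothetical helper)."""
    selftext = post.get("selftext") or ""  # a post may have no body text
    title = post.get("title") or ""
    # Assumption: title first, then selftext, separated by a single space.
    text = " ".join(part for part in (title, selftext) if part)
    return {
        "subreddit_tifu": 1 if subreddit.lower() == "tifu" else 0,
        "Id": post["id"],
        "Author": post.get("author"),
        "selftext": selftext,
        "title": title,
        "text": text,
        # Assumption: line breaks are counted in the combined text.
        "line_count": text.count("\n"),
    }


def drop_duplicate_posts(rows):
    """Keep the first row seen for each post Id."""
    seen = set()
    unique = []
    for row in rows:
        if row["Id"] not in seen:
            seen.add(row["Id"])
            unique.append(row)
    return unique


example_post = {
    "id": "abc123",
    "author": "example_user",
    "title": "TIFU by forgetting my keys",
    "selftext": "First line\nSecond line\n\nFourth line",
}
rows = drop_duplicate_posts([
    build_row(example_post, "tifu"),
    build_row(example_post, "tifu"),  # same Id, dropped as a duplicate
])
```

In this sketch the post ID is the only deduplication key, matching the stated purpose of the `Id` column.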

See attached Executive Summary

