Code and work for SP18 Intuit-ML@B Project on determining important personal finance terms and their relationships.
.gitignore
EmbedRank.py
GridSearch.py
README.md
embedding.py
freq_thresholding.py
gridsearch_results_8k_words.csv
keyword_extraction.py
reddit.py
word2vec.py


intuit-mlab-sp18

Code and work for the SP18 Intuit-ML@B project on determining important personal finance terms. The code implements the methods described in this presentation; each method name in the presentation corresponds to the file of the same name. The data is stored separately on Google Drive and linked below.

General Reddit Comments Dataset: http://files.pushshift.io/reddit/comments/

Personal Finance Subreddit: https://www.reddit.com/r/personalfinance/

Reddit Scraper: https://praw.readthedocs.io/en/latest/

Scraped Personal Finance Results: https://drive.google.com/drive/folders/1CPqoFiJulPSL8mobby8cSH4lqi_XAMkX?usp=sharing

Our scraped results are stored as follows:

[ ...
  { 'title': 'title text',
    'body': 'body text',
    'flair': 'flair text',
    'upvotes': num_upvotes,
    'date': 'date string',
    'comments': 'text of all comments', }
... ]
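
As a rough illustration of how records in the layout above might be consumed, here is a minimal sketch. It assumes the scraped results have already been loaded into a Python list of dictionaries with the keys shown; how the files on Google Drive are serialized is not stated here, so loading is left out. The sample record, the helper names `post_text` and `term_counts`, and the `min_upvotes` filter are hypothetical and are not taken from the repository's code.

```python
import re
from collections import Counter

# Hypothetical sample mirroring the record layout above; not real scraped data.
posts = [
    {
        "title": "How do I start a Roth IRA?",
        "body": "I just got my first job and want to open a Roth IRA.",
        "flair": "Retirement",
        "upvotes": 12,
        "date": "2018-03-01",
        "comments": "Open a Roth IRA with a low-fee index fund.",
    },
]


def post_text(post):
    """Join the free-text fields of one record into a single string."""
    return " ".join(post[key] for key in ("title", "body", "comments"))


def term_counts(posts, min_upvotes=0):
    """Count lowercase word tokens across posts with at least min_upvotes."""
    counts = Counter()
    for post in posts:
        if post["upvotes"] >= min_upvotes:
            tokens = re.findall(r"[a-z']+", post_text(post).lower())
            counts.update(tokens)
    return counts


counts = term_counts(posts)
print(counts.most_common(5))
```

Raw counts like these are only a starting point; the sketch is meant to show the record structure in use, not to reproduce any of the methods in the files listed above.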

NLP Intro: https://ocf.io/gkswamy/Intro_NLP_Intuit.pdf