Name: | Text Mining Project |
---|---|
Author: | Sparsh Bansal |
Version: | 3.0 |
Text Mining is a project for the Software Design course at Olin College of Engineering. It performs the following steps on a given text:
i: | Downloads the books from a given web link and pickles them |
---|---|
ii: | Analysis 1 - Word Frequency Analysis |
iii: | Analysis 2 - Markov Analysis |
iv: | Analysis 3 - Sentiment Analysis |
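As a rough illustration of steps i and ii, here is a minimal sketch rather than the project's actual code. It assumes the book has already been downloaded into a string, and the helper names `save_text`, `load_text` and `word_frequencies` are hypothetical. Following Think Python's approach, each word is stripped of surrounding `string.punctuation` and lowercased. Counting uses `collections.Counter` instead of a hand-built dictionary.

```python
import os
import pickle
import string
import tempfile
from collections import Counter


def save_text(text, path):
    """Pickle a downloaded book so it does not have to be fetched again."""
    with open(path, "wb") as f:
        pickle.dump(text, f)


def load_text(path):
    """Load a book that was pickled earlier."""
    with open(path, "rb") as f:
        return pickle.load(f)


def word_frequencies(text):
    """Count words after stripping surrounding punctuation and lowercasing."""
    counts = Counter()
    for token in text.split():  # split on any whitespace
        word = token.strip(string.punctuation).lower()
        if word:  # skip tokens that were pure punctuation
            counts[word] += 1
    return counts


if __name__ == "__main__":
    sample = "The cat sat. The cat ran! A dog sat."
    path = os.path.join(tempfile.gettempdir(), "sample_book.pickle")
    save_text(sample, path)
    print(word_frequencies(load_text(path)).most_common(3))
```

Pickling the raw text means later runs can skip the network request and load the book from disk.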
Text Mining Version 3.0 uses the following imports. Of these, `requests`, `bs4` (installed as `beautifulsoup4`), `numpy` and `nltk` are third-party packages. The rest ship with Python's standard library.

```python
import pickle
import requests
import string
from string import punctuation
from string import whitespace
from bs4 import BeautifulSoup
import re
import sys
import random
import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer
```
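Of the imports above, `random` is the one Analysis 2 relies on. The following is a minimal sketch of Markov analysis in the style of Think Python, assuming the text has already been split into a list of words. It maps every run of `prefix_len` consecutive words to the words that follow it, then walks that map, choosing a suffix at random at each step. The names `build_markov_map` and `generate` and the default prefix length of 2 are illustrative assumptions, not taken from the project.

```python
import random
from collections import defaultdict


def build_markov_map(words, prefix_len=2):
    """Map each tuple of `prefix_len` consecutive words to the words that follow it."""
    suffix_map = defaultdict(list)
    for i in range(len(words) - prefix_len):
        prefix = tuple(words[i:i + prefix_len])
        suffix_map[prefix].append(words[i + prefix_len])
    return dict(suffix_map)


def generate(suffix_map, n_words, rng=random):
    """Random-walk the map until n_words words are produced or a dead end is reached."""
    prefix = rng.choice(list(suffix_map))  # start from a random prefix
    output = list(prefix)
    while len(output) < n_words:
        suffixes = suffix_map.get(prefix)
        if not suffixes:  # this prefix only ever appeared at the end of the text
            break
        next_word = rng.choice(suffixes)
        output.append(next_word)
        prefix = prefix[1:] + (next_word,)  # slide the window one word forward
    return " ".join(output)


if __name__ == "__main__":
    words = "the cat sat on the mat and the cat ran off the mat".split()
    chain = build_markov_map(words, prefix_len=2)
    print(generate(chain, 10, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the output reproducible. A longer prefix produces text closer to the original at the cost of less variety.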
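The easiest and fastest way to check that the packages are up and running is to first confirm that `requests` can fetch a web page:

```python
import requests
print(requests.get('http://google.com').text)
```

Next, download the NLTK data, which includes the VADER lexicon needed for the sentiment analysis:

```shell
python -m nltk.downloader all
```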
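Once the NLTK data is downloaded, one way to confirm that VADER works is to score a sentence. This is a sketch, not the project's code. The helper names `sentiment` and `label` are hypothetical. The 0.05 cut-offs on the compound score follow the convention suggested in the VADER README, not anything stated in this project.

```python
def sentiment(text):
    """Score `text` with NLTK's VADER analyzer.

    Needs nltk plus the 'vader_lexicon' resource, which is installed by
    `python -m nltk.downloader all` or by nltk.download('vader_lexicon').
    """
    # Imported here so that label() below still works without nltk installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores(text)


def label(compound, threshold=0.05):
    """Map VADER's compound score (from -1 to 1) to a coarse label.

    Assumed convention: >= threshold is positive, <= -threshold is negative,
    anything in between is neutral.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    try:
        # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys.
        scores = sentiment("This book is wonderful and moving.")
    except (ImportError, LookupError) as err:  # nltk or the lexicon is missing
        print("VADER unavailable:", err)
    else:
        print(scores, label(scores["compound"]))
```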
I have added comments to every line of code where I felt an explanation would help someone understand the program.
Note: I have added comments especially to the imported packages and to code written by someone else, so that I fully understand it. I have cited sources wherever appropriate.
I used information from:
i: | Think Python - Allen Downey |
---|---|
ii: | VADER - NLTK Corpora |
Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.