## Summary: 
- 1 tutorial
- 2 case studies
- 2 libraries to play with: TextBlob & spaCy



## TextBlob Basics: Play with Texts

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.


In [155]:
from textblob import TextBlob
sentext="Tong is so awesome a Data Emperor. Right?"
wiki=TextBlob(sentext)
print("1.tags:\n",wiki.tags)
print("\n2.Noun phrases:\n",wiki.noun_phrases)
print("\n3.Words:\n",wiki.words)
print("\n4.Sentences:\n",wiki.sentences)
print("\n5.Sentiment:\n",wiki.sentiment)
print("\n6.Count Tong:",wiki.words.count('Tong'))

1.tags:
 [('Tong', 'NNP'), ('is', 'VBZ'), ('so', 'RB'), ('awesome', 'JJ'), ('a', 'DT'), ('Data', 'NNP'), ('Emperor', 'NNP'), ('Right', 'NNP')]

2.Noun phrases:
 ['tong', 'data emperor', 'right']

3.Words:
 ['Tong', 'is', 'so', 'awesome', 'a', 'Data', 'Emperor', 'Right']

4.Sentences:
 [Sentence("Tong is so awesome a Data Emperor."), Sentence("Right?")]

5.Sentiment:
 Sentiment(polarity=0.6428571428571428, subjectivity=0.7678571428571428)

6.Count Tong: 1


In [156]:
for sentence in wiki.sentences:
    print(sentence.sentiment)

Sentiment(polarity=1.0, subjectivity=1.0)
Sentiment(polarity=0.2857142857142857, subjectivity=0.5357142857142857)


In [157]:
print(wiki.correct())

Long is so awesome a Data Emperor. Right?


In [158]:
wiki.parse()

'Tong/NNP/B-NP/O is/VBZ/B-VP/O so/RB/B-ADJP/O awesome/JJ/I-ADJP/O a/DT/B-NP/O Data/NNP/I-NP/O Emperor/NNP/I-NP/O ././O/O\nRight/RB/B-ADVP/O ?/./O/O'


## Case Study 1: Analyze Michelle's Writing Guide with a Super Simple Text Classifier

### Training DataSet Description & pre-processing

In [159]:
from textblob.classifiers import NaiveBayesClassifier
simple_traindata = [
     ('I love this sandwich.', 'pos'),
     ('this is an amazing place!', 'pos'),
     ('I feel very good about these beers.', 'pos'),
     ('this is my best work.', 'pos'),
     ("what an awesome view", 'pos'),
     ('I do not like this restaurant', 'neg'),
     ('I am tired of this stuff.', 'neg'),
     ("I can't deal with this", 'neg'),
     ('he is my sworn enemy!', 'neg'),
     ('my boss is horrible.', 'neg')]

### Testing Dataset

In [160]:
test = [
     ('the beer was good.', 'pos'),
     ('I do not enjoy my job', 'neg'),
     ("I ain't feeling dandy today.", 'neg'),
     ("I feel amazing!", 'pos'),
     ('Gary is a friend of mine.', 'pos'),
     ("I can't believe I'm doing this.", 'neg'),
     ("This is not a good choice",'neg')]

### Train the classifiers with Naive Bayes

In [161]:
cl=NaiveBayesClassifier(simple_traindata)

After training the Naive Bayes model, we would like to test it on a new sentence.
As human beings, we know the text below should be classified as neg. Let's see what the classifier says.

In [162]:
print("You say: ahamoment is not an amazing group!")
print("Classifier says: ",cl.classify("ahamoment is not an amazing group!"),)

You say: ahamoment is not an amazing group!
Classifier says:  pos
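
One way to see why the classifier answers pos is to rebuild a tiny naive Bayes model by hand. The sketch below is a from-scratch multinomial naive Bayes with add-one smoothing, trained on the same ten sentences. It is not TextBlob's implementation (TextBlob builds on NLTK and uses its own feature extraction, so its exact probabilities will differ), and the helper names `tokenize`, `fit`, and `log_scores` are made up for this illustration.

```python
import math
import re
from collections import Counter

# Training sentences copied from the training cell above.
train = [
    ('I love this sandwich.', 'pos'),
    ('this is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('this is my best work.', 'pos'),
    ("what an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('he is my sworn enemy!', 'neg'),
    ('my boss is horrible.', 'neg'),
]

def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())

def fit(data):
    """Count word occurrences per label and the number of sentences per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in data:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab

def log_scores(text, word_counts, doc_counts, vocab):
    """Return log P(label) + sum of log P(word | label) with add-one smoothing."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return scores

model = fit(train)
scores = log_scores("ahamoment is not an amazing group!", *model)
prediction = max(scores, key=scores.get)
print(scores)
print("Hand-rolled prediction:", prediction)
```

In this sketch, "an" and "amazing" occur only in positive training sentences, while "not" occurs once in a negative one. Each word is scored independently, so the single "not" is outweighed and the model has no notion that it negates "amazing".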


In [163]:
print("The overall test accuracy of this classifier: ",cl.accuracy(test))

The overall test accuracy of this classifier:  0.7142857142857143


An accuracy of 71.4% is awful, really awful. The small volume of training data is the most likely cause,
so we might want to feed in more data and see whether the updated model works better.

In [164]:
new_data_1 = [('She is not my best friend.', 'neg'),
             ("I'm not happy to have a new friend.", 'neg'),
             ("Stay thirsty, my enermy.", 'neg'),
             ("He is not supposed to be from around here.", 'neg')]
cl.update(new_data_1)
cl.accuracy(test)

0.8571428571428571

Alright! The accuracy improves by about 14 percentage points (from 5/7 to 6/7 correct)! Let's add some more data and retry.
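
The accuracy values reported here are simply the fraction of the seven test sentences that are labelled correctly, so each sentence is worth about 14.3 percentage points. A minimal sketch of that arithmetic follows. The `accuracy` helper is written for this illustration, and the `predicted` list is hypothetical: the notebook does not show which sentences the classifier got wrong.

```python
from fractions import Fraction

def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return Fraction(correct, len(gold))

# Gold labels of the seven test sentences, in the order of the test cell.
gold = ['pos', 'neg', 'neg', 'pos', 'pos', 'neg', 'neg']

# Hypothetical predictions containing two mistakes (positions 2 and 4).
predicted = ['pos', 'neg', 'pos', 'pos', 'neg', 'neg', 'neg']

acc = accuracy(predicted, gold)
step = Fraction(1, len(gold))  # change in accuracy per test sentence

print("accuracy:", acc, "=", float(acc))
print("one sentence is worth:", float(step))
```

With a test set this small, accuracy can only move in steps of 1/7, so a single sentence flipping from wrong to right accounts for the whole jump from 0.714 to 0.857.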

In [169]:
new_data_2 = [('She is my best friend.', 'pos'),
             ("I'm happy to have a new friend.", 'pos'),
             ("Stay thirsty, my friend.", 'pos'),
             ("He loves me, Hehe.", 'neg')]
cl.update(new_data_2)
cl.accuracy(test)

1.0

In [170]:
from termcolor import colored
pos_pot=0
neg_pot=0
sub_pot=0
with open('comp_white_paper_guide.txt','r') as file:
    guide=TextBlob(file.read(),classifier=cl)
    for line in guide.sentences:
        classi=line.classify()
        subj_index=line.sentiment.subjectivity
        sub_pot+=subj_index
        if classi=="pos":
            pos_pot+=1
            print(colored('Positive', 'green'),"( subj:","{0:.2f}".format(subj_index),"):",line)
        else:
            neg_pot+=1
            print(colored('Negative', 'red'),"( subj:","{0:.2f}".format(subj_index),"):",line)

Positive ( subj: 0.00 ): This Guide
This guide was compiled to assist TCube Solutions team members when writing company white papers.
Negative ( subj: 0.22 ): While this guide may offer generalities, white papers do not have a specific, exact format that is appropriate for every situation.
Positive ( subj: 0.48 ): Thusly, it is important to use your best judgement, and write a paper for your specific goals.
Positive ( subj: 0.00 ): Definition
A White Paper is a catch all term in business.
Positive ( subj: 0.70 ): In general, it is an authoritative report giving information or proposals on an issue.
Positive ( subj: 0.28 ): White papers have also been described as:
advanced problem solving guides
background reports
crossbreeds of magazine articles and brochures
persuasive essays that use facts/logic to promote a product, service, or viewpoint
This broad definition leaves white papers open to interpretation in the business community.


In [171]:
print("---------------Summary Report------------------")
print("This Guide has",pos_pot+neg_pot,"sentences.")
print("Positive Sentences:",pos_pot)
print("Negative Sentences:",neg_pot)
print("Subjective Index:","{0:.2f}".format(sub_pot/(pos_pot+neg_pot)))

---------------Summary Report------------------
This Guide has 200 sentences.
Positive Sentences: 107
Negative Sentences: 93
Subjective Index: 0.26



## Case Study 2: Analyze Michelle's Writing Guide with an Advanced Text Classifier

### Training DataSet Description & pre-processing
We are going to use the 'Sentiment Labelled Sentences' data set to train our model.
It contains sentences labelled with positive or negative sentiment.

**Format**: sentence score

**Details**: Score is either 1 (for positive) or 0 (for negative). The sentences come from three different websites/fields: imdb.com / amazon.com / yelp.com

For each website, there exist 500 positive and 500 negative sentences. Those were selected randomly from larger datasets of reviews.
We attempted to select sentences that have a clearly positive or negative connotation; the goal was for no neutral sentences to be selected.
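
As a rough sketch, lines in the "sentence score" format could be turned into the `(text, label)` tuples that the classifier from Case Study 1 expects. This assumes the sentence and the score are separated by a tab, and it maps 1/0 to 'pos'/'neg' to match the earlier labels. The `parse_labelled_line` helper and the two sample lines are hypothetical and are not taken from the data set.

```python
def parse_labelled_line(line):
    """Split 'sentence<TAB>score' into (sentence, 'pos' or 'neg')."""
    # rsplit on the last tab so any tab inside the sentence itself is preserved
    sentence, score = line.rstrip("\n").rsplit("\t", 1)
    label = {"1": "pos", "0": "neg"}[score.strip()]
    return sentence.strip(), label

# Hypothetical sample lines in the assumed tab-separated format.
sample_lines = [
    "The screen is bright and sharp.\t1\n",
    "The battery died after two days.\t0\n",
]

train_data = [parse_labelled_line(line) for line in sample_lines]
print(train_data)
```

A score other than 0 or 1 raises a `KeyError` here, which surfaces malformed lines early instead of silently mislabelling them.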

In [168]:
from textblob.classifiers import NaiveBayesClassifier

### Testing Dataset