## NLP Testing Notebook
#### NLTK Part-of-Speech Tags:
```('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')```

*and* is CC, a coordinating conjunction; *now* and *completely* are RB, or adverbs; *for* is IN, a preposition; *something* is NN, a noun; and *different* is JJ, an adjective.
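The abbreviations belong to the Penn Treebank tag set, which NLTK's default tagger uses. As a minimal sketch, the tagged output above can be expanded into readable descriptions with a small lookup table. The `TAG_NAMES` dictionary and the `describe` helper are illustrative names, not part of NLTK, and they cover only the tags that appear in this example.

```python
# Illustrative lookup table (hypothetical helper, not an NLTK API).
TAG_NAMES = {
    "CC": "coordinating conjunction",
    "RB": "adverb",
    "IN": "preposition or subordinating conjunction",
    "NN": "noun, singular or mass",
    "JJ": "adjective",
}

# The tagged sentence from the example above.
tagged = [("And", "CC"), ("now", "RB"), ("for", "IN"),
          ("something", "NN"), ("completely", "RB"), ("different", "JJ")]

def describe(tagged_words):
    """Return (word, tag, description) triples for each tagged word."""
    return [(word, pos, TAG_NAMES.get(pos, "unknown")) for word, pos in tagged_words]

for word, pos, name in describe(tagged):
    print(f"{word:<12}{pos:<4}{name}")
```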

```('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'), ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN')```

*refuse* appears as both a present-tense verb (VBP) and a noun (NN), and *permit* as both a base-form verb (VB) and a noun (NN). For example, *refUSE* is a verb meaning "deny," while *REFuse* is a noun meaning "trash."
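One way to spot this kind of ambiguity automatically is to group the tags by word and keep the words that received more than one. The sketch below works on the tagged list shown above; `ambiguous_words` is an illustrative helper name, not an NLTK function.

```python
from collections import defaultdict

# The tagged sentence from the example above.
tagged = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"), ("permit", "VB"),
          ("us", "PRP"), ("to", "TO"), ("obtain", "VB"), ("the", "DT"),
          ("refuse", "NN"), ("permit", "NN")]

def ambiguous_words(tagged_words):
    """Map each word that received more than one distinct tag to its tags, in order seen."""
    tags_by_word = defaultdict(list)
    for word, pos in tagged_words:
        if pos not in tags_by_word[word]:
            tags_by_word[word].append(pos)
    return {word: tags for word, tags in tags_by_word.items() if len(tags) > 1}

print(ambiguous_words(tagged))
# {'refuse': ['VBP', 'NN'], 'permit': ['VB', 'NN']}
```

Note that *to* occurs twice but always as TO, so it is not reported.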

In [5]:
from pattern.web import Twitter
from pattern.en import tag
from pattern.vector import NB, count
import sys
import time

twitter, classifier = Twitter(language="en"), NB(baseline="UNDEFINED")

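def train_model(n_pts, search_terms, category, category_count):
    # Note: range(1, n_pts) requests pages 1 .. n_pts - 1, so at most
    # (n_pts - 1) * 100 tweets are fetched, not the n_pts * 100 announced here.
    print("Training " + str(n_pts*100) + " data points for " + str(category))
    for i in range(1, n_pts):
        for tweet in twitter.search(search_terms, start=i, count=100):
            s = tweet.text.lower()
            category_count += 1  # counts every tweet seen, including those skipped below
            v = tag(s)  # list of (word, part-of-speech) tuples
            # Keep only words tagged exactly NN (noun) or VB (base-form verb).
            v = [word for word, pos in v if pos in ("NN", "VB")]
            v = count(v) # {'sweet': 1}
            if v:
                classifier.train(v, type=category)
                # Overwrite the progress line in place. Multiplying before the
                # floor division keeps the percentage correct on Python 2 and 3.
                sys.stdout.write('\r')
                sys.stdout.write(str(100 * category_count // (n_pts * 100)) + "% : " + str(v))
    sys.stdout.write('\r')
    print("Finished!")
    print("Number of data points: " + str(category_count) + "\n")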
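The core of `train_model` is its feature extraction step: keep only the words tagged NN or VB, then count them. A minimal standalone sketch of that step follows, with `collections.Counter` standing in for `pattern.vector.count` and a hand-tagged list standing in for the output of `tag`. The helper name `extract_features` and the sample tags are assumptions for illustration.

```python
from collections import Counter

def extract_features(tagged_words, keep=("NN", "VB")):
    """Count the words whose tag is in `keep`; returns a plain dict."""
    return dict(Counter(word for word, pos in tagged_words if pos in keep))

# Hand-tagged stand-in for a tagged tweet; these tags are assumed, not tagger output.
sample = [("the", "DT"), ("gin", "NN"), ("was", "VBD"), ("sweet", "JJ"),
          (",", ","), ("gin", "NN"), ("again", "RB")]

print(extract_features(sample))  # {'gin': 2}
```

Because the filter matches tags exactly, plural nouns (NNS) and inflected verbs (VBD, VBZ, and so on) are dropped along with everything else.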

In [6]:
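n = 30
# Note: ints are immutable, so the increments made inside train_model
# do not change happy or sad here; both stay 0 after the calls.
happy = 0
sad = 0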

train_model(n, "shiok OR swee OR perfect OR #happy", "HAPPY", happy)
time.sleep(1)
train_model(n, "sian OR shag OR suay OR fml OR #sad", "SAD", sad)

Training 3000 data points for HAPPY
Finished!
Number of data points: 2795

Training 3000 data points for SAD
Finished!
Number of data points: 2900
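The SAD run reports 2900 data points rather than the announced 3000. That is consistent with the loop `range(1, n_pts)` visiting only pages 1 to 29 when `n_pts` is 30, assuming each page returns at most the 100 tweets requested with `count=100`. The arithmetic can be checked with a short sketch; `max_tweets` is an illustrative helper, not part of Pattern.

```python
def max_tweets(n_pts, per_page=100):
    """Upper bound on tweets fetched by a loop over range(1, n_pts)."""
    pages = range(1, n_pts)  # pages 1 .. n_pts - 1
    return len(pages) * per_page

print(max_tweets(30))  # 2900
```

The HAPPY count of 2795 falls below this bound, which would be the case if some pages returned fewer than 100 tweets.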



In [7]:
def evaluate(word):
    # classify() returns the predicted category label for the given text.
    category = classifier.classify(word)
    return "The word " + str(word) + " is " + str(category)

words = ("pangseh", "nasi lemak", "food", "breakfast", "lunch", "dinner",
         "MRT", "school", "trip", "work", "home", "family", "garden", "play",
         "train", "bus", "KFC", "SAF", "book out", "camp", "army", "navy",
         "air force")

for word in words:
    print(evaluate(word))

The word pangseh is SAD
The word nasi lemak is SAD
The word food is HAPPY
The word breakfast is HAPPY
The word lunch is SAD
The word dinner is SAD
The word MRT is SAD
The word school is SAD
The word trip is HAPPY
The word work is SAD
The word home is SAD
The word family is HAPPY
The word garden is SAD
The word play is SAD
The word train is SAD
The word bus is SAD
The word KFC is SAD
The word SAF is SAD
The word book out is HAPPY
The word camp is HAPPY
The word army is SAD
The word navy is HAPPY
The word air force is HAPPY
