#### Install and import NLTK

In [1]:
#!pip install nltk

In [2]:
import nltk

In [3]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Sanjana\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [4]:
from nltk.stem import PorterStemmer
from nltk.stem import LancasterStemmer

In [5]:
# create one instance of each stemmer
porter = PorterStemmer()
lancaster = LancasterStemmer()

#### PorterStemmer

- It applies a fixed set of rules to decide whether a suffix can safely be stripped.
- The stems it produces are often not dictionary words (for example, "troubl").
- It is known for its simplicity and speed.
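To make the idea of "rules that decide whether to strip a suffix" concrete, here is a minimal sketch. It is not the Porter algorithm: the rule list `TOY_RULES` and the function `toy_stem` are hypothetical, and the only condition checked is that the remaining stem still contains a vowel.

```python
# Toy suffix stripper: NOT the Porter algorithm, only an illustration of
# "strip a suffix when a rule says it is safe".
VOWELS = set("aeiou")

# Hypothetical rules, tried in order: (suffix, replacement)
TOY_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ing", ""),
    ("ed", ""),
    ("ss", "ss"),  # keep a double "s" so "class" is not cut to "clas"
    ("s", ""),
]


def toy_stem(word):
    """Apply the first matching rule, but only if the stem keeps a vowel."""
    word = word.lower()
    for suffix, replacement in TOY_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if any(ch in VOWELS for ch in stem):
                return stem + replacement
            return word  # rule matched, but stripping is judged unsafe
    return word  # no rule matched


for w in ["cats", "troubling", "troubled", "sing", "class"]:
    print(w, "->", toy_stem(w))
```

In this sketch "sing" is left alone because stripping "ing" would leave "s", which has no vowel. The real stemmer uses a larger rule set and more refined conditions.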

In [6]:
# provide a word to be stemmed
print(porter.stem("cats"))
print(porter.stem("trouble"))
print(porter.stem("troubling"))
print(porter.stem("troubled"))

cat
troubl
troubl
troubl


#### LancasterStemmer

LancasterStemmer is simple but aggressive: it applies its rules iteratively, so over-stemming may occur. On each iteration it looks for an applicable rule based on the last character of the word. Each rule specifies either the deletion or the replacement of an ending. If no rule applies, the stemmer terminates. Over-stemming can produce stems that are not linguistically valid or that carry no meaning.
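The loop described above can be sketched as follows. This is an illustration only: the rule table `TOY_RULES`, the minimum length `MIN_STEM_LEN`, and the function `toy_iterative_stem` are hypothetical and far simpler than the rule table shipped with NLTK's `LancasterStemmer`.

```python
# Toy iterative stemmer. Rules are looked up by the LAST character of the word.
# Each rule is (ending, replacement, keep_going); the table is hypothetical.
TOY_RULES = {
    "s": [("ies", "y", False), ("s", "", True)],
    "g": [("ing", "", True)],
    "d": [("ed", "", True)],
    "p": [("ship", "", True)],
    "e": [("ize", "", True), ("e", "", False)],
}
MIN_STEM_LEN = 3  # assumption: never shorten a word below three letters


def toy_iterative_stem(word):
    word = word.lower()
    while True:
        for ending, replacement, keep_going in TOY_RULES.get(word[-1:], []):
            candidate = word[: len(word) - len(ending)] + replacement
            if word.endswith(ending) and len(candidate) >= MIN_STEM_LEN:
                word = candidate  # delete or replace the ending
                break
        else:
            return word  # no applicable rule: terminate
        if not keep_going:
            return word  # the applied rule asks to stop here


for w in ["friendships", "troubled", "ponies", "strings"]:
    print(w, "->", toy_iterative_stem(w))
```

With these toy rules, "friendships" is reduced in two passes to "friend", while "strings" is over-stemmed to "str", a stem with no meaning of its own.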

In [7]:
print(lancaster.stem("cats"))
print(lancaster.stem("trouble"))
print(lancaster.stem("troubling"))
print(lancaster.stem("troubled"))

cat
troubl
troubl
troubl


#### Porter vs Lancaster

In [8]:
# a list of words to be stemmed
word_list = ["friend", "friendship", "friends", "friendships", "stabil",
             "destabilize", "misunderstanding", "railroad", "moonlight", "football"]
# print a three-column table: the word, its Porter stem, its Lancaster stem
print("{0:20}{1:20}{2:20}".format("Word", "Porter Stemmer", "Lancaster Stemmer"))
for word in word_list:
    print("{0:20}{1:20}{2:20}".format(word, porter.stem(word), lancaster.stem(word)))

Word                Porter Stemmer      Lancaster Stemmer   
friend              friend              friend              
friendship          friendship          friend              
friends             friend              friend              
friendships         friendship          friend              
stabil              stabil              stabl               
destabilize         destabil            dest                
misunderstanding    misunderstand       misunderstand       
railroad            railroad            railroad            
moonlight           moonlight           moonlight           
football            footbal             footbal             
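One rough way to read the table is to count how many distinct stems each stemmer produces and where the two disagree. The sketch below uses only the values printed above, copied by hand into a list; the variable names are illustrative.

```python
# (word, Porter stem, Lancaster stem), copied from the output table above
rows = [
    ("friend", "friend", "friend"),
    ("friendship", "friendship", "friend"),
    ("friends", "friend", "friend"),
    ("friendships", "friendship", "friend"),
    ("stabil", "stabil", "stabl"),
    ("destabilize", "destabil", "dest"),
    ("misunderstanding", "misunderstand", "misunderstand"),
    ("railroad", "railroad", "railroad"),
    ("moonlight", "moonlight", "moonlight"),
    ("football", "footbal", "footbal"),
]

porter_stems = {p for _, p, _ in rows}
lancaster_stems = {l for _, _, l in rows}
differing_words = [w for w, p, l in rows if p != l]

print("distinct Porter stems:   ", len(porter_stems))
print("distinct Lancaster stems:", len(lancaster_stems))
print("words stemmed differently:", differing_words)
```

On this small list Lancaster yields fewer distinct stems because it merges all four "friend" variants into one, which matches its reputation as the more aggressive of the two.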


#### Sentence stemming

In [9]:
sentence = "Pythoners are very intelligent and work very pythonly and now they are pythoning their way to success."
# the stemmer treats the whole string as a single word, so it is only lowercased
porter.stem(sentence)

'pythoners are very intelligent and work very pythonly and now they are pythoning their way to success.'

In [10]:
from nltk.tokenize import word_tokenize

def stemSentence(sentence):
    # split the sentence into word and punctuation tokens
    token_words = word_tokenize(sentence)
    stem_sentence = []
    for word in token_words:
        # stem each token and follow it with a space
        stem_sentence.append(porter.stem(word))
        stem_sentence.append(" ")
    return "".join(stem_sentence)

x=stemSentence(sentence)
print(x)

python are veri intellig and work veri pythonli and now they are python their way to success . 
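The same tokenize-then-stem pattern can be written so that the stemmer is passed in as an argument. The sketch below is a variation, not the cell above: `stem_tokens` and `drop_plural_s` are hypothetical names, and a simple regular expression stands in for `word_tokenize`, so it will not split text exactly as NLTK does.

```python
import re


def stem_tokens(sentence, stem):
    """Split a sentence into word and punctuation tokens, stem each, rejoin."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return " ".join(stem(token) for token in tokens)


def drop_plural_s(token):
    """Stand-in 'stemmer' for the demo: lowercase and drop a trailing 's'."""
    token = token.lower()
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


print(stem_tokens("Cats chase dogs.", drop_plural_s))
```

Passing `porter.stem` or `lancaster.stem` as the second argument would apply a real stemmer to each token. Using `" ".join` also avoids the trailing space seen in the output above.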


#### Using TextBlob

TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

https://textblob.readthedocs.io/en/dev/

In [11]:
from textblob import TextBlob

In [12]:
sent = TextBlob(sentence)

In [13]:
print(' '.join([porter.stem(word) for word in sent.words]))

python are veri intellig and work veri pythonli and now they are python their way to success


#### Sentiment scoring with Afinn

In [14]:
from afinn import Afinn
af = Afinn()

In [15]:
# score each whitespace-separated word of the sentence individually
sentiment_scores = [af.score(word) for word in sent.split()]

In [16]:
sentiment_category = [
    'positive' if score > 0 else 'negative' if score < 0 else 'neutral'
    for score in sentiment_scores
]
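The chained conditional above can be easier to read as a small helper function. This is a sketch with a hypothetical name, `sentiment_category_of`, and made-up example scores; it applies the same thresholds as the list comprehension.

```python
def sentiment_category_of(score):
    """Map a numeric sentiment score to 'positive', 'negative' or 'neutral'."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


example_scores = [0.0, 2.0, -1.0]  # made-up scores, for illustration only
example_categories = [sentiment_category_of(s) for s in example_scores]
print(example_categories)
```

With the notebook's data, `[sentiment_category_of(s) for s in sentiment_scores]` should give the same list as the cell above.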

In [17]:
print(sent)

Pythoners are very intelligent and work very pythonly and now they are pythoning their way to success.


In [18]:
import pandas as pd
df = pd.DataFrame([list(sent.split()), sentiment_scores, sentiment_category]).T

In [19]:
df.columns = ['sent', 'sentiment_score', 'sentiment_category']

In [20]:
df['sentiment_score'] = df.sentiment_score.astype('float')

In [21]:
df.groupby(by=['sentiment_category']).describe()

                   sentiment_score
                             count mean  std  min  25%  50%  75%  max
sentiment_category
neutral                       15.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
positive                       2.0  2.0  0.0  2.0  2.0  2.0  2.0  2.0
