
# LDA - gensim

> I will be using the Latent Dirichlet Allocation (LDA) from the Gensim package along with Mallet’s implementation (via Gensim). Mallet has an efficient implementation of LDA. It is known to run faster and give better topic segregation.

> We will also extract the volume and percentage contribution of each topic to get an idea of how important a topic is.

[reference](https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/)
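The "volume and percentage contribution" of each topic can be computed from the per-document topic distributions. Below is a minimal sketch. It assumes each distribution is a list of `(topic_id, probability)` pairs, the shape gensim's `get_document_topics` yields per document. It also takes "volume" to mean the number of documents a topic dominates. The helpers `dominant_topics` and `topic_contribution` are hypothetical names, not gensim functions.

```python
from collections import Counter

def dominant_topics(doc_topics):
    """Return the highest-probability topic id of each document.

    doc_topics: one list of (topic_id, probability) pairs per document.
    """
    return [max(dist, key=lambda pair: pair[1])[0] for dist in doc_topics]

def topic_contribution(doc_topics):
    """Map topic id -> (number of documents it dominates, percentage of all documents)."""
    counts = Counter(dominant_topics(doc_topics))
    total = sum(counts.values())
    return {topic: (n, round(100.0 * n / total, 2))
            for topic, n in sorted(counts.items())}

# Toy distributions for three documents (topics with tiny probability omitted,
# as gensim does by default).
docs = [
    [(0, 0.70), (1, 0.20), (2, 0.10)],
    [(0, 0.10), (2, 0.85)],
    [(0, 0.55), (1, 0.45)],
]
print(dominant_topics(docs))     # [0, 2, 0]
print(topic_contribution(docs))  # {0: (2, 66.67), 2: (1, 33.33)}
```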

In [None]:
# Latent Dirichlet Allocation (LDA) using gensim
# ----------------------------------------------
import numpy as np
import re
import pickle
from nltk.corpus import stopwords
from gensim import corpora
from gensim.models.ldamodel import LdaModel as LDA

#from sklearn.datasets import fetch_20newsgroups

# Download the 20 Newsgroups data once and cache it to disk.
#newsData = fetch_20newsgroups(shuffle=True, 
#                              random_state=1, 
#                              remove=('headers', 'footers', 'quotes'))

#with open('./dataset/news.data', 'wb') as f:
#    pickle.dump(newsData , f, pickle.HIGHEST_PROTOCOL)

# Load the cached news data.
with open('./dataset/news.data', 'rb') as f:
    newsData = pickle.load(f)

# Inspect the number of articles and the first article.
news = newsData.data
print(len(news))
print(news[0])

# Check the category labels (targets) the articles are classified into.
print(newsData.target_names)
print(len(newsData.target_names))

# Preprocessing.
# 1. Replace every non-alphabetic character with a space.
news1 = []
for doc in news:
    news1.append(re.sub("[^a-zA-Z]", " ", doc))

# 2. Lowercase every word, then drop stopwords and words of
#    3 characters or fewer.
stop_words = set(stopwords.words('english'))  # set for fast membership tests
news2 = []
for doc in news1:
    doc1 = []
    for w in doc.split():
        w = w.lower()
        if len(w) > 3 and w not in stop_words:
            doc1.append(w)
    news2.append(doc1)
    
print(news2[0])

# Build the dictionary (word <-> id) and the bag-of-words corpus.
vocab = corpora.Dictionary(news2)
# dict(list(vocab.items())[:10])  # peek at the first 10 (id, word) pairs
news_bow = [vocab.doc2bow(s) for s in news2]
print(news_bow[0])

# Latent Dirichlet Allocation (LDA)
# ---------------------------------
model = LDA(news_bow, 
            num_topics = len(newsData.target_names), 
            id2word=vocab)

# Print the most likely topic (0-based id) for each of the first 10 documents.
doc_topic = model.get_document_topics(news_bow)
for i in range(10):
    dp = np.array(doc_topic[i])
    most_likely_topic = int(dp[np.argmax(dp[:, 1]), 0])
    print('Document-{:d} : topic = {:d}'.format(i, most_likely_topic))
    
# From the topic-term matrix, print the top 10 words of each topic
# (labels here are 1-based: Topic-1 is topic id 0).
for i in range(len(newsData.target_names)):
    topic_term = model.get_topic_terms(i, topn=10)
    idx = [idx for idx, score in topic_term]
    print('Topic-{:2d} : '.format(i+1), end='')
    for n in idx:
        print('{:s} '.format(vocab[n]), end='')
    print()

# Compare with the true newsgroup labels of a few documents.
# x, y : document indices
def checkTopic(x, y):
    print("Label of document %d = %s" % (x, newsData.target_names[newsData.target[x]]))
    print("Label of document %d = %s" % (y, newsData.target_names[newsData.target[y]]))

checkTopic(2, 5)
checkTopic(7, 9)

```
11314
Well i'm not sure about the story nad it did seem biased. What
I disagree with is your statement that the U.S. Media is out to
ruin Israels reputation. That is rediculous. The U.S. media is
the most pro-israeli media in the world. Having lived in Europe
I realize that incidences such as the one described in the
letter have occured. The U.S. media as a whole seem to try to
ignore them. The U.S. is subsidizing Israels existance and the
Europeans are not (at least not to the same degree). So I think
that might be a reason they report more clearly on the
atrocities.
	What is a shame is that in Austria, daily reports of
the inhuman acts commited by Israeli soldiers and the blessing
received from the Government makes some of the Holocaust guilt
go away. After all, look how the Jews are treating other races
when they got power. It is unfortunate.

['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc']
20
['well', 'sure', 'story', 'seem', 'biased', 'disagree', 'statement', 'media', 'ruin', 'israels', 'reputation', 'rediculous', 'media', 'israeli', 'media', 'world', 'lived', 'europe', 'realize', 'incidences', 'described', 'letter', 'occured', 'media', 'whole', 'seem', 'ignore', 'subsidizing', 'israels', 'existance', 'europeans', 'least', 'degree', 'think', 'might', 'reason', 'report', 'clearly', 'atrocities', 'shame', 'austria', 'daily', 'reports', 'inhuman', 'acts', 'commited', 'israeli', 'soldiers', 'blessing', 'received', 'government', 'makes', 'holocaust', 'guilt', 'away', 'look', 'jews', 'treating', 'races', 'power', 'unfortunate']
[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1), (10, 1), (11, 1), (12, 1), (13, 1), (14, 1), (15, 1), (16, 1), (17, 1), (18, 1), (19, 1), (20, 1), (21, 2), (22, 2), (23, 1), (24, 1), (25, 1), (26, 1), (27, 1), (28, 1), (29, 4), (30, 1), (31, 1), (32, 1), (33, 1), (34, 1), (35, 1), (36, 1), (37, 1), (38, 1), (39, 1), (40, 1), (41, 1), (42, 2), (43, 1), (44, 1), (45, 1), (46, 1), (47, 1), (48, 1), (49, 1), (50, 1), (51, 1), (52, 1), (53, 1), (54, 1)]
Document-0 : topic = 0
Document-1 : topic = 8
Document-2 : topic = 18
Document-3 : topic = 7
Document-4 : topic = 3
Document-5 : topic = 8
Document-6 : topic = 11
Document-7 : topic = 8
Document-8 : topic = 2
Document-9 : topic = 11
Topic- 1 : people would armenian armenians government think said well even right 
Topic- 2 : filename orbit navy naval system earth lunar shar nuclear program 
Topic- 3 : drive disk card system hard drivers mouse also controller tape 
Topic- 4 : game entries period play request season first list send year 
Topic- 5 : space program would nasa data information also research national technology 
Topic- 6 : available information mail thanks software would version like know windows 
Topic- 7 : team games hockey year game play players teams kings last 
Topic- 8 : would going know president people think said like well time 
Topic- 9 : would people jesus know like believe think many time even 
Topic-10 : caps nords jets vpic batting strawberry impulse model salary reactions 
Topic-11 : file program windows files size entry info line remark know 
Topic-12 : would good time could like think much also back know 
Topic-13 : output entry widget char input rules stream build define open 
Topic-14 : printf easter kent screens allah doug symbol allocation cheers said 
Topic-15 : ground wire runs year neutral york panel like rockefeller wiring 
Topic-16 : plastic know like would yankees anyone price good enough paint 
Topic-17 : shipping sale offer also condition asking price like interested best 
Topic-18 : chip encryption keys system clipper drive would chips security government 
Topic-19 : israel jews israeli arab jewish would peace think palestinian time 
Topic-20 : window scsi display using windows color screen problem mode monitor 
Label of document 2 = talk.politics.mideast
Label of document 5 = soc.religion.christian
Label of document 7 = talk.politics.mideast
Label of document 9 = sci.electronics
```
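The two preprocessing steps in the cell above can be folded into one reusable function. Below is a minimal sketch. It assumes a tiny hand-picked stopword set in place of NLTK's English list, so results on other texts will differ. `preprocess` is a hypothetical helper. On the opening sentence of the first article, it keeps the same first five tokens the notebook printed for `news2[0]`.

```python
import re

# Hypothetical, tiny stopword set; the notebook uses stopwords.words('english').
STOP_WORDS = {"the", "that", "with", "this", "have", "about", "what", "your"}

def preprocess(doc, stop_words=STOP_WORDS, min_len=4):
    """Replace non-letters with spaces, lowercase, and keep tokens that are
    at least `min_len` characters long and not stopwords."""
    letters_only = re.sub("[^a-zA-Z]", " ", doc)
    return [w for w in (t.lower() for t in letters_only.split())
            if len(w) >= min_len and w not in stop_words]

print(preprocess("Well i'm not sure about the story nad it did seem biased."))
# ['well', 'sure', 'story', 'seem', 'biased']
```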

## exercise

### generate document
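The cell below simulates a simplified version of LDA's generative process. Let $\Theta$ be the $3 \times 5$ `theta` matrix (three fixed document-topic mixtures) and $\beta$ the $5 \times 8$ `beta` matrix (one word distribution per topic). Each of the 500 documents is then generated as follows. Standard LDA would instead draw each document's mixture from a Dirichlet prior, $\theta_d \sim \mathrm{Dir}(\alpha)$, rather than picking one of three fixed rows.

$$
\begin{aligned}
t_d &\sim \mathrm{Uniform}\{1, 2, 3\}, \qquad \theta_d = \Theta_{t_d} \\
z_{d,n} &\sim \mathrm{Mult}(1, \theta_d), \qquad n = 1, \dots, 50 \\
w_{d,n} &\sim \mathrm{Mult}\left(1, \beta_{z_{d,n}}\right)
\end{aligned}
$$

Here $w_{d,n}$ indexes into topic $z_{d,n}$'s own word list (`words[z]`). As a result, the same surface word, e.g. 매출 (sales), can be emitted by several topics.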

In [None]:
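import numpy as np
import pickle

# theta: 3 document-topic mixtures (rows) over 5 topics (columns).
theta = np.array([[0.05, 0.15, 0.50, 0.20, 0.10],
                  [0.05, 0.05, 0.20, 0.10, 0.60],
                  [0.60, 0.15, 0.05, 0.10, 0.10]])

# beta: 5 topic-word distributions (rows) over each topic's 8 words (columns).
beta = np.array([[0.1, 0.1, 0.2, 0.2, 0.2, 0.1, 0.05, 0.05],
                 [0.15, 0.25, 0.2, 0.05, 0.05, 0.15, 0.1, 0.05],
                 [0.1, 0.2, 0.25, 0.05, 0.25, 0.05, 0.05, 0.05],
                 [0.2, 0.2, 0.05, 0.05, 0.15, 0.15, 0.15, 0.05],
                 [0.1, 0.2, 0.25, 0.1, 0.1, 0.05, 0.15, 0.05]])

# Korean financial-news vocabulary, one 8-word list per topic:
# w1: quarter, sales, operating profit, non-operating income, earnings, increase, decrease, contract
# w2: order win, contract, order placement, good news, expectation, sales, earnings, weakness
# w3: stock price, weakness, decline, negative, weakening, forecast, decrease, operating profit
# w4: won (KRW), exchange rate, operating profit, sales, strength, weakness, dollar, issuance
# w5: capital, new shares, issuance, conversion, common stock, record date, allotment, order placement
w1 = ["분기", "매출", "영업이익", "영업외이익", "실적", "증가", "감소", "계약"]
w2 = ["수주", "계약", "발주", "호재", "기대", "매출", "실적", "약세"]
w3 = ["주가", "약세", "하락", "부정적", "약화", "예상", "감소", "영업이익"]
w4 = ["원화", "환율", "영업이익", "매출", "강세", "약세", "달러", "발행"]
w5 = ["자본", "신주", "발행", "전환", "보통주", "기준일", "배정", "발주"]
words = [w1, w2, w3, w4, w5]

sentences = []
topic = []   # index of the theta row used for each document
for i in range(500):
    # Pick one of the three theta rows.
    t = np.random.choice([0,1,2])
    topic.append(t)
    
    # Generate a 50-word document from the chosen theta.
    doc = []
    for j in range(50):
        # Draw one topic from the multinomial distribution theta[t].
        x = np.random.multinomial(1, theta[t])
        z = np.argmax(x)  # topic index
        
        # Draw one word from topic z's word distribution beta[z].
        x = np.random.multinomial(1, beta[z])   # a single multinomial draw
        w = np.argmax(x)  # word index
        word = words[z][w]
        doc.append(word)
    sentences.append(doc)
    
with open('dataset/8-6-1.genDoc.pickle', 'wb') as f:
    pickle.dump(sentences, f, pickle.HIGHEST_PROTOCOL)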
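
The cell above samples from `np.random` without a seed, so each run produces a different corpus and a different LDA fit below. Here is a minimal standard-library sketch of the same sampling loop with a fixed seed. It assumes `random.choices` as a stand-in for the one-draw `np.random.multinomial` + `argmax` pattern. `generate_corpus` and the toy `theta`/`beta`/`words` values are hypothetical, chosen only to keep the example short.

```python
import random

def generate_corpus(theta, beta, words, n_docs=500, doc_len=50, seed=0):
    """Sample documents from fixed topic mixtures with a reproducible RNG.

    Returns (docs, mixtures): the word lists and the theta row used per doc.
    """
    rng = random.Random(seed)              # seeded: same corpus on every run
    docs, mixtures = [], []
    for _ in range(n_docs):
        t = rng.randrange(len(theta))      # pick one document-topic mixture
        doc = []
        for _ in range(doc_len):
            # draw a topic z from theta[t], then a word index from beta[z]
            z = rng.choices(range(len(beta)), weights=theta[t])[0]
            w = rng.choices(range(len(beta[z])), weights=beta[z])[0]
            doc.append(words[z][w])
        docs.append(doc)
        mixtures.append(t)
    return docs, mixtures

# Toy parameters: 2 mixtures over 5 topics, 2 words per topic.
theta = [[0.05, 0.15, 0.50, 0.20, 0.10],
         [0.60, 0.15, 0.05, 0.10, 0.10]]
beta = [[0.5, 0.5]] * 5
words = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"], ["d0", "d1"], ["e0", "e1"]]
docs, mixtures = generate_corpus(theta, beta, words, n_docs=3, doc_len=5)
print(mixtures, docs[0])
```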

### predict LDA

In [None]:
# Latent Dirichlet Allocation (LDA) using gensim
# ----------------------------------------------
import numpy as np
import pickle
from gensim import corpora
from gensim.models.ldamodel import LdaModel as LDA

# Load the generated documents (each one is a list of words).
with open('dataset/8-6-1.genDoc.pickle', 'rb') as f:
    newsData = pickle.load(f)

# Build the dictionary and the bag-of-words corpus.
vocab = corpora.Dictionary(newsData)
news_bow = [vocab.doc2bow(s) for s in newsData]

# Latent Dirichlet Allocation (LDA) with the 5 topics used to generate the data
model = LDA(news_bow, num_topics = 5, id2word=vocab)

# Print the most likely topic (0-based id) for each of the first 10 documents.
doc_topic = model.get_document_topics(news_bow)
for i in range(10):
    dp = np.array(doc_topic[i])
    most_likely_topic = int(dp[np.argmax(dp[:, 1]), 0])
    print('Document-{:d} : topic = {:d}'.format(i, most_likely_topic))
    
# From the topic-term matrix, print the top 6 words of each topic
# (labels here are 1-based: Topic-1 is topic id 0).
for i in range(5):
    topic_term = model.get_topic_terms(i, topn=6)
    idx = [idx for idx, score in topic_term]
    print('Topic-{:2d} : '.format(i+1), end='')
    for n in idx:
        print('{:s} '.format(vocab[n]), end='')
    print()

```
Document-0 : topic = 1
Document-1 : topic = 2
Document-2 : topic = 1
Document-3 : topic = 1
Document-4 : topic = 1
Document-5 : topic = 0
Document-6 : topic = 0
Document-7 : topic = 4
Document-8 : topic = 2
Document-9 : topic = 4
Topic- 1 : 실적 영업외이익 영업이익 발행 계약 매출 
Topic- 2 : 발행 신주 전환 배정 보통주 자본 
Topic- 3 : 약화 하락 약세 발행 발주 계약 
Topic- 4 : 약세 발행 하락 약화 신주 배정 
Topic- 5 : 실적 매출 영업이익 약세 영업외이익 계약 
```
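The documents were generated from known word lists, so each learned topic can be checked against them. Below is a rough sketch that scores the Jaccard overlap between a learned topic's top words and each true list. `match_topics` is a hypothetical helper, and Jaccard is just one convenient choice. Because some words (e.g. 발행, issuance) belong to more than one true list, the matching is only approximate. Applied to Topic-2 from the output above, it picks `w5`, the capital/new-shares list.

```python
def match_topics(learned, true_lists):
    """For each learned topic (a list of top words), return the index of the
    true word list with the highest Jaccard overlap, plus that score."""
    matches = []
    for top_words in learned:
        a = set(top_words)
        scores = [len(a & set(t)) / len(a | set(t)) for t in true_lists]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, round(scores[best], 2)))
    return matches

w3 = ["주가", "약세", "하락", "부정적", "약화", "예상", "감소", "영업이익"]
w5 = ["자본", "신주", "발행", "전환", "보통주", "기준일", "배정", "발주"]
learned = [["발행", "신주", "전환", "배정", "보통주", "자본"]]  # Topic-2 above
print(match_topics(learned, [w3, w5]))  # [(1, 0.75)]
```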

# PageRank algorithm

> PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results. PageRank was named after Larry Page, one of the founders of Google. PageRank is a way of measuring the importance of website pages.

![PR](https://drive.google.com/uc?export=view&id=1mN27Gdo4KrgOc-yOC9LtyvF2-XLNvAPW)
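PageRank scores a page by the rank flowing into it from the pages that link to it: $PR(p) = \frac{1-d}{N} + d \sum_{q \to p} \frac{PR(q)}{L(q)}$, where $L(q)$ is the number of outgoing links of $q$. Below is a minimal power-iteration sketch. It assumes the usual damping factor $d = 0.85$ and spreads the rank of pages without outgoing links evenly over all pages, which is one common convention. `pagerank` is a hypothetical helper, not a library call.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    A page with no outgoing links spreads its rank evenly over all pages.
    """
    pages = sorted(set(links) | {p for outs in links.values() for p in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for p in pages:
            outs = links.get(p, [])
            targets = outs if outs else pages           # dangling page -> all
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank

ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({p: round(r, 3) for p, r in ranks.items()})
```

C ends up highest because both A and B link to it, and A inherits most of C's rank through C's only outgoing link.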

In [None]:
# The 'summarizer' module provides functions for summarizing texts.
# Summarizing is based on ranks of text sentences using a variation
# of the TextRank algorithm.
#
# Federico Barrios et al., 2016, Variations of the Similarity Function
# of TextRank for Automated Summarization, https://arxiv.org/abs/1602.03606
#
# Barrios et al. used BM25 and BM25+ instead of TF-IDF, along with cosine
# similarity. gensim's summarizer also uses Barrios et al.'s TextRank variant.
# Note: gensim.summarization was removed in gensim 4.0, so this cell needs gensim < 4.0.
from gensim.summarization.summarizer import summarize

text = \
'''Rice Pudding - Poem by Alan Alexander Milne
What is the matter with Mary Jane?
She's crying with all her might and main,
And she won't eat her dinner - rice pudding again -
What is the matter with Mary Jane?
What is the matter with Mary Jane?
I've promised her dolls and a daisy-chain,
And a book about animals - all in vain -
What is the matter with Mary Jane?
What is the matter with Mary Jane?
She's perfectly well, and she hasn't a pain;
But, look at her, now she's beginning again! -
What is the matter with Mary Jane?
What is the matter with Mary Jane?
I've promised her sweets and a ride in the train,
And I've begged her to stop for a bit and explain -
What is the matter with Mary Jane?
What is the matter with Mary Jane?
She's perfectly well and she hasn't a pain,
And it's lovely rice pudding for dinner again!
What is the matter with Mary Jane?
'''
# ratio (float, optional) – Number between 0 and 1 that determines the 
# proportion of the number of sentences of the original text to be chosen 
# for the summary.
s = summarize(text, ratio = 0.2)
print(s)

```
And she won't eat her dinner - rice pudding again -
I've promised her dolls and a daisy-chain,
I've promised her sweets and a ride in the train,
And it's lovely rice pudding for dinner again!
```
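The summarizer comment above notes that Barrios et al. replaced TF-IDF with BM25 when weighting sentence similarity. Okapi BM25 scores a document $D$ against a query $Q$ as $\sum_{q \in Q} \mathrm{IDF}(q) \cdot \frac{f(q,D)(k_1+1)}{f(q,D) + k_1\left(1 - b + b\,|D|/\mathrm{avgdl}\right)}$. Below is a minimal sketch between tokenized sentences. It assumes $k_1 = 1.5$ and $b = 0.75$ (common defaults) and a Lucene-style IDF that stays non-negative. gensim's own BM25 code treats IDF differently, so exact numbers will not match. `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`
    with Okapi BM25 (Lucene-style IDF: log(1 + (N - df + 0.5) / (df + 0.5)))."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Lines from the poem, lowercased and split on whitespace.
docs = [s.lower().split() for s in [
    "she won't eat her dinner rice pudding again",
    "it's lovely rice pudding for dinner again",
    "i've promised her dolls and a daisy-chain",
]]
scores = bm25_scores(docs[0], docs)
print([round(s, 3) for s in scores])
```

In a TextRank graph, each sentence takes a turn as the query, and its BM25 scores against the other sentences become the edge weights.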

# TextRank algorithm

> TextRank is a general-purpose, graph-based ranking algorithm for NLP. TextRank is an automatic summarisation technique. Graph-based ranking algorithms are a way for deciding the importance of a vertex within a graph, based on global information recursively drawn from the entire graph.

![textalgo](https://drive.google.com/uc?export=view&id=1STAL53o1DiWMhOJM7D56odyWEX2YdYBC)
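TextRank applies the same ranking idea to a graph whose vertices are sentences and whose edge weights measure content overlap. Below is a minimal sketch. It assumes the overlap similarity from the original TextRank paper (Mihalcea and Tarau, 2004), $\frac{|\{w : w \in S_i \wedge w \in S_j\}|}{\log|S_i| + \log|S_j|}$, a naive regex sentence splitter, and no stopword removal. gensim's summarizer uses BM25-based weights instead, so its rankings will differ. `similarity` and `textrank_sentences` are hypothetical helpers.

```python
import math
import re

def similarity(s1, s2):
    """Shared words divided by the sum of the log sentence lengths."""
    denom = math.log(len(s1)) + math.log(len(s2)) if s1 and s2 else 0.0
    if denom <= 0:
        return 0.0
    return len(set(s1) & set(s2)) / denom

def textrank_sentences(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    n = len(sentences)
    # Undirected weighted graph: weights[i][j] == weights[j][i].
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update; isolated sentences receive only 1 - d.
        scores = [(1 - damping) + damping * sum(
                      weights[j][i] / out_sum[j] * scores[j]
                      for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]

text = ("The cat sat on the mat. The dog sat on the mat too. "
        "Birds fly south in winter. The cat and the dog share the mat.")
summary = textrank_sentences(text)
print(summary)
```

The sentence about birds shares no words with the others, so it has no edges and cannot make the summary.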

In [None]:
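# Fetch a paper from SCIgen (a site that automatically generates random
# computer-science papers) and extract its key sentences.
# gensim.summarization is only available in gensim < 4.0.
from gensim.summarization import summarize
from bs4 import BeautifulSoup
import requests

url = 'http://scigen.csail.mit.edu/scicache/269/scimakelatex.25977.Admoni.Moskalskaia.Schendels.html'
r = requests.get(url)
r.raise_for_status()  # stop early if the page could not be downloaded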
soup = BeautifulSoup(r.text, 'html.parser')
data = soup.get_text()
print(data[:2000])

pos1 = data.find('Introduction') + len("Introduction")
pos2 = data.find("Related Work")

text = data[pos1:pos2].strip()
summary = summarize(text, ratio=0.1)
print(text)

print("PAPER URL: \n{}\n".format(url))
print("GENERATED SUMMARY: \n{}".format(summary))
print()

```
GENERATED SUMMARY: 
We emphasize that our heuristic develops
a cycle of four phases: allowance, evaluation, investigation, and
motivate the need for wide-area networks [2].
can be applied to the exploration of local-area networks.
same lines, we prove the development of linked lists.
```

# Anaphora Resolution

> Anaphora resolution (AR) which most commonly appears as pronoun resolution is the problem of resolving references to earlier or later items in the discourse. These items are usually noun phrases representing objects in the real world called referents but can also be verb phrases, whole sentences or paragraphs.


In [None]:
# Anaphora resolution example
# Needs NLTK's names corpus plus its tokenizer, POS-tagger and
# named-entity-chunker models (download them with nltk.download()).
import nltk
from nltk.chunk import tree2conlltags
from nltk.corpus import names
import random

# Feature: the last letter of a name.
def feature(word):
    return {'last(1)' : word[-1]}

# Load the names corpus.
males = [(name, 'male') for name in names.words('male.txt')]
females = [(name, 'female') for name in names.words('female.txt')]

print(males[:10])
print(females[:10])

combined = males + females
random.shuffle(combined)

# Build training data for supervised learning:
# learn gender (male or female) from the last letter of a name.
training = [(feature(name), gender) for (name, gender) in combined]
print(training[:10])

# Train a Naive Bayes classifier.
classifier = nltk.NaiveBayesClassifier.train(training)

sentences = [
    "John is a man. He walks",
    "John and Mary are married. They have two kids",
    "In order for Ravi to be successful, he should follow John",
    "John met Mary in Barista. She asked him to order a Pizza"
]

# Predict gender from the last letter of a name.
def gender(word):
    return classifier.classify(feature(word))

# Split each sentence into chunks and collect person names (with predicted
# gender), conjunctions (CC) and personal pronouns (PRP).
for sent in sentences:
    chunks = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent)), binary=False)
    stack = []
    print(sent)
    items = tree2conlltags(chunks)
    for item in items:
        if item[1] == 'NNP' and (item[2] == 'B-PERSON' or item[2] == 'O'):
            stack.append((item[0], gender(item[0])))
        elif item[1] == 'CC':
            stack.append(item[0])
        elif item[1] == 'PRP':
            stack.append(item[0])
    print("\t {}".format(stack))

print(items)
print(chunks)

* output
```
[('Aamir', 'male'), ('Aaron', 'male'), ('Abbey', 'male'), ('Abbie', 'male'), ('Abbot', 'male'), ('Abbott', 'male'), ('Abby', 'male'), ('Abdel', 'male'), ('Abdul', 'male'), ('Abdulkarim', 'male')]
[('Abagael', 'female'), ('Abagail', 'female'), ('Abbe', 'female'), ('Abbey', 'female'), ('Abbi', 'female'), ('Abbie', 'female'), ('Abby', 'female'), ('Abigael', 'female'), ('Abigail', 'female'), ('Abigale', 'female')]
[({'last(1)': 'e'}, 'female'), ({'last(1)': 'l'}, 'female'), ({'last(1)': 'l'}, 'female'), ({'last(1)': 'd'}, 'male'), ({'last(1)': 'r'}, 'female'), ({'last(1)': 'e'}, 'female'), ({'last(1)': 'n'}, 'male'), ({'last(1)': 'g'}, 'male'), ({'last(1)': 'y'}, 'male'), ({'last(1)': 'd'}, 'male')]
John is a man. He walks
	 [('John', 'male'), 'He']
John and Mary are married. They have two kids
	 [('John', 'male'), 'and', ('Mary', 'female'), 'They']
In order for Ravi to be successful, he should follow John
	 [('Ravi', 'female'), 'he', ('John', 'male')]
John met Mary in Barista. She asked him to order a Pizza
	 [('John', 'male'), ('Mary', 'female'), 'She', 'him']
[('John', 'NNP', 'B-PERSON'), ('met', 'VBD', 'O'), ('Mary', 'NNP', 'O'), ('in', 'IN', 'O'), ('Barista', 'NNP', 'B-GPE'), ('.', '.', 'O'), ('She', 'PRP', 'O'), ('asked', 'VBD', 'O'), ('him', 'PRP', 'O'), ('to', 'TO', 'O'), ('order', 'NN', 'O'), ('a', 'DT', 'O'), ('Pizza', 'NN', 'O')]
(S
  (PERSON John/NNP)
  met/VBD
  Mary/NNP
  in/IN
  (GPE Barista/NNP)
  ./.
  She/PRP
  asked/VBD
  him/PRP
  to/TO
  order/NN
  a/DT
  Pizza/NN)
```
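The cell collects names (with predicted gender) and pronouns but stops short of linking them. Below is a minimal sketch of that last step. It assumes a simple recency-plus-gender-agreement heuristic, not an NLTK feature: each he/him/his or she/her/hers is linked to the most recent earlier name of matching gender, and they/them/their to all names seen so far. `resolve` and the pronoun sets are hypothetical. Run on the stacks printed above, it also exposes the classifier's mistake: because Ravi was predicted female, "he" in the third sentence finds no antecedent.

```python
MALE = {"he", "him", "his"}
FEMALE = {"she", "her", "hers"}
PLURAL = {"they", "them", "their"}

def resolve(stack):
    """Link pronouns to earlier names by recency and gender agreement.

    stack: (name, gender) tuples for names and plain strings for other
    tokens, in sentence order (the format printed above).
    Returns a list of (pronoun, antecedent) pairs.
    """
    names, links = [], []
    for item in stack:
        if isinstance(item, tuple):
            names.append(item)
            continue
        word = item.lower()
        if word in MALE or word in FEMALE:
            wanted = "male" if word in MALE else "female"
            matches = [n for n, g in names if g == wanted]
            if matches:
                links.append((item, matches[-1]))        # most recent match
        elif word in PLURAL and len(names) >= 2:
            links.append((item, [n for n, _ in names]))  # group antecedent
    return links

print(resolve([('John', 'male'), 'He']))                       # [('He', 'John')]
print(resolve([('John', 'male'), 'and', ('Mary', 'female'), 'They']))
print(resolve([('Ravi', 'female'), 'he', ('John', 'male')]))   # []
print(resolve([('John', 'male'), ('Mary', 'female'), 'She', 'him']))
```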

# WSD - Word Sense Disambiguation

> In natural language processing, word sense disambiguation (WSD) is the problem of determining which "sense" (meaning) of a word is activated by the use of the word in a particular context, a process which appears to be largely unconscious in people. WSD is a natural classification problem: Given a word and its possible senses, as defined by a dictionary, classify an occurrence of the word in context into one or more of its sense classes. The features of the context (such as neighboring words) provide the evidence for classification.

In [None]:
# Word Sense Disambiguation (WSD)
import nltk

def understandWordSenseExamples():
    words = ['wind', 'date', 'left']
    print("-- examples --")
    for word in words:
        syns = nltk.corpus.wordnet.synsets(word)
        for syn in syns[:2]:
            for example in syn.examples()[:2]:
                print("{} -> {} -> {}".format(word, syn.name(), example))

understandWordSenseExamples()

def understandBuiltinWSD():
    print("-- built-in wsd --")
    maps = [
        ('Is it the fish net that you are using to catch fish ?', 'fish', 'n'),
        ('Please dont point your finger at others.', 'point', 'n'),
        ('I went to the river bank to see the sun rise', 'bank', 'n'),
    ]
    for m in maps:
        print("Sense '{}' for '{}' -> '{}'".format(m[0], m[1], 
              nltk.wsd.lesk(m[0], m[1], m[2])))

understandBuiltinWSD()

nltk.corpus.wordnet.synsets('fish')
nltk.corpus.wordnet.synset('pisces.n.02').lemma_names()
nltk.corpus.wordnet.synset('pisces.n.02').definition()

* output   
```
-- examples --
wind -> wind.n.01 -> trees bent under the fierce winds
wind -> wind.n.01 -> when there is no wind, row
wind -> wind.n.02 -> the winds of change
date -> date.n.01 -> what is the date today?
date -> date.n.02 -> his date never stopped talking
left -> left.n.01 -> she stood on the left
-- built-in wsd --
Sense 'Is it the fish net that you are using to catch fish ?' for 'fish' -> 'Synset('pisces.n.02')'
Sense 'Please dont point your finger at others.' for 'point' -> 'Synset('point.n.25')'
Sense 'I went to the river bank to see the sun rise' for 'bank' -> 'Synset('savings_bank.n.02')'
```
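`nltk.wsd.lesk` picks the WordNet synset whose definition shares the most words with the context. This helps explain odd results such as `savings_bank.n.02` for the river sentence: WordNet glosses are short, so a single incidental shared word can decide the outcome. Below is a minimal sketch of that overlap idea (simplified Lesk). It assumes a hand-written sense inventory instead of WordNet. `simplified_lesk` and `bank_senses` are hypothetical.

```python
import re

def simplified_lesk(context, senses):
    """Return the sense whose gloss shares the most words with the context.

    senses: dict mapping a sense name to its gloss text.
    Ties go to the sense listed first.
    """
    ctx = set(re.findall(r"[a-z]+", context.lower()))
    best, best_overlap = None, -1
    for name, gloss in senses.items():
        overlap = len(ctx & set(re.findall(r"[a-z]+", gloss.lower())))
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best

# Hypothetical two-sense inventory for "bank" (glosses written for this sketch).
bank_senses = {
    "bank.financial": "an institution that accepts deposits and lends money",
    "bank.river": "sloping land beside a body of water such as a river",
}
print(simplified_lesk("I went to the river bank to see the sun rise", bank_senses))
# bank.river
```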

# Sentiment analysis

> Natural Language Processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics. The goal is for computers to process or “understand” natural language in order to perform various human like tasks like language translation or answering questions.

In [None]:
# Sentiment analysis
import nltk
import nltk.sentiment.util  # extract_unigram_feats, extract_bigram_feats, mark_negation

def wordBasedSentiment():
    positive_words = ['love', 'hope', 'joy']
    text = 'Rainfall this year brings lot of hope and joy to Farmers.'.split()
    analysis = nltk.sentiment.util.extract_unigram_feats(text, positive_words)
    print(' -- single word sentiment --')
    print(analysis)
    
def multiWordBasedSentiment():
    word_sets = [('heavy', 'rains'), ('flood', 'bengaluru')]
    text = 'heavy rains cause flash flooding in bengaluru'.split()
    analysis = nltk.sentiment.util.extract_bigram_feats(text, word_sets)
    print(' -- multi word sentiment --')
    print(analysis)

def markNegativity(text):
    negation = nltk.sentiment.util.mark_negation(text.split())
    print(' -- negativity --')
    print(negation)

wordBasedSentiment()
multiWordBasedSentiment()

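# mark_negation appends the suffix _NEG to every word in the scope of a negation,
# i.e. from the negation word up to the next clause-ending punctuation mark.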
markNegativity('Rainfall last year did not bring joy to Farmers')
markNegativity("I didn't like this movie . It was bad.")

output

```
 -- single word sentiment --
{'contains(love)': False, 'contains(hope)': True, 'contains(joy)': True}
 -- multi word sentiment --
{'contains(heavy - rains)': True, 'contains(flood - bengaluru)': False}
 -- negativity --
['Rainfall', 'last', 'year', 'did', 'not', 'bring_NEG', 'joy_NEG', 'to_NEG', 'Farmers_NEG']
 -- negativity --
['I', "didn't", 'like_NEG', 'this_NEG', 'movie_NEG', '.', 'It', 'was', 'bad.']
```
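As the outputs show, `mark_negation` does not look for negative words. It marks the words in the scope of a negation, from the negation word up to the next clause-ending punctuation token. Below is a simplified sketch of that behaviour. It assumes a small hand-picked negation list and punctuation set. NLTK's own rules are regex-based and include extra handling for double negation, so they differ in detail. `mark_negation_sketch` is a hypothetical name.

```python
NEGATION = {"not", "no", "never", "nor", "cannot"}  # assumed, not NLTK's list
CLAUSE_PUNCT = {".", ":", ";", "!", "?"}            # assumed clause enders

def mark_negation_sketch(tokens):
    """Append _NEG to every token after a negation word, up to the next
    clause-ending punctuation token."""
    out, in_scope = [], False
    for tok in tokens:
        if tok.lower() in NEGATION or tok.lower().endswith("n't"):
            in_scope = True          # negation word opens the scope
            out.append(tok)
        elif tok in CLAUSE_PUNCT:
            in_scope = False         # punctuation closes it
            out.append(tok)
        else:
            out.append(tok + "_NEG" if in_scope else tok)
    return out

print(mark_negation_sketch("I didn't like this movie . It was bad.".split()))
# ['I', "didn't", 'like_NEG', 'this_NEG', 'movie_NEG', '.', 'It', 'was', 'bad.']
```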



In [None]:
# VADER-Sentiment-Analysis
import nltk
import nltk.sentiment.util
import nltk.sentiment.sentiment_analyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.downloader.download('vader_lexicon', download_dir='./dataset/')
nltk.data.path.append('./dataset/')  # let NLTK find the lexicon in this custom directory

def mySentimentAnalyzer():
    def score_feedback(text):
        positive_words = ['love', 'genuine', 'liked']
        if '_NEG' in ' '.join(nltk.sentiment.util.mark_negation(text.split())):
            score = -1
        else:
            analysis = nltk.sentiment.util.extract_unigram_feats(text.split(), positive_words)
            if True in analysis.values():
                score = 1
            else:
                score = 0
        return score

    feedback = """I love the items in this shop, very genuine and quality is well maintained.
    I have visited this shop and had samosa, my friends liked it very much.
    ok average food in this shop.
    Fridays are very busy in this shop, do not place orders during this day."""
    
    print(' -- custom scorer --')
    for text in feedback.split("\n"):
        print("score = {} for >> {}".format(score_feedback(text), text))

def advancedSentimentAnalyzer():
    sentences = [
        ':)',
        ':(',
        'She is so :(',
        'I love the way cricket is played by the champions',
        'She neither likes coffee nor tea',
    ]
    
    senti = SentimentIntensityAnalyzer()
    print(' -- built-in intensity analyser --')
    for sentence in sentences:
        print('[{}]'.format(sentence), end=' --> ')
        kvp = senti.polarity_scores(sentence)
        for k in kvp:
            print('{} = {}, '.format(k, kvp[k]), end='')
        print()

mySentimentAnalyzer()
advancedSentimentAnalyzer()

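`polarity_scores` returns `neg`, `neu` and `pos` proportions plus a `compound` score. The compound score is the sum of the (rule-adjusted) lexicon valences of the words. NLTK's VADER code squashes that sum into $[-1, 1]$ with $x / \sqrt{x^2 + \alpha}$, where $\alpha = 15$. Below is a sketch of just that normalization step. The valence lookup and VADER's rules for negation, intensifiers and punctuation are not reproduced, and `vader_normalize` is a hypothetical name.

```python
import math

def vader_normalize(score, alpha=15):
    """Map an unbounded valence sum into [-1, 1], as VADER does for 'compound'."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

for raw in (-4.0, 0.0, 1.9, 10.0):
    print(raw, round(vader_normalize(raw), 4))
```

Because the curve flattens quickly, a handful of strongly positive words already pushes `compound` close to 1.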