<a href="https://colab.research.google.com/github/solankiharsh/NLP_Snippets/blob/master/NLP_Exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Import nltk and download the ‘stopwords’ and ‘punkt’ packages

In [2]:
import nltk
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

# Import spacy and load the language model


In [3]:
import spacy
nlp=spacy.load("en_core_web_sm")
nlp

<spacy.lang.en.English at 0x7f57c630fba8>

# How to tokenize a given text?


In [4]:
text='''
US Presidential Election 2020 LIVE Updates: Former President Barack Obama returned to the campaign trail on Wednesday with a blistering attack on Donald Trump with less than two weeks to go before the Republican president's Election Day face-off with Democratic nominee Joe Biden. Speaking at a drive-in rally in Philadelphia on behalf of Biden, his former vice president, and Democratic running mate Kamala Harris, Obama offered his fiercest critique yet of his successor. He took aim at Trump's divisive rhetoric, his track record in the Oval Office and his habit of re-tweeting conspiracy theories. "With Joe and Kamala at the helm, you’re not going to have to think about the crazy things they said every day," Obama said. "And that’s worth a lot. You’re not going to have to argue about them every day. It just won’t be so exhausting.'''


In [5]:
# Tokenization with nltk
tokens = nltk.word_tokenize(text)
for token in tokens:
    print(token)

US
Presidential
Election
2020
LIVE
Updates
:
Former
President
Barack
Obama
returned
to
the
campaign
trail
on
Wednesday
with
a
blistering
attack
on
Donald
Trump
with
less
than
two
weeks
to
go
before
the
Republican
president
's
Election
Day
face-off
with
Democratic
nominee
Joe
Biden
.
Speaking
at
a
drive-in
rally
in
Philadelphia
on
behalf
of
Biden
,
his
former
vice
president
,
and
Democratic
running
mate
Kamala
Harris
,
Obama
offered
his
fiercest
critique
yet
of
his
successor
.
He
took
aim
at
Trump
's
divisive
rhetoric
,
his
track
record
in
the
Oval
Office
and
his
habit
of
re-tweeting
conspiracy
theories
.
``
With
Joe
and
Kamala
at
the
helm
,
you
’
re
not
going
to
have
to
think
about
the
crazy
things
they
said
every
day
,
''
Obama
said
.
``
And
that
’
s
worth
a
lot
.
You
’
re
not
going
to
have
to
argue
about
them
every
day
.
It
just
won
’
t
be
so
exhausting
.


In [6]:
# Tokenization with spaCy
nlp = spacy.load("en_core_web_sm")
doc=nlp(text)
for token in doc: 
    print(token.text)



US
Presidential
Election
2020
LIVE
Updates
:
Former
President
Barack
Obama
returned
to
the
campaign
trail
on
Wednesday
with
a
blistering
attack
on
Donald
Trump
with
less
than
two
weeks
to
go
before
the
Republican
president
's
Election
Day
face
-
off
with
Democratic
nominee
Joe
Biden
.
Speaking
at
a
drive
-
in
rally
in
Philadelphia
on
behalf
of
Biden
,
his
former
vice
president
,
and
Democratic
running
mate
Kamala
Harris
,
Obama
offered
his
fiercest
critique
yet
of
his
successor
.
He
took
aim
at
Trump
's
divisive
rhetoric
,
his
track
record
in
the
Oval
Office
and
his
habit
of
re
-
tweeting
conspiracy
theories
.
"
With
Joe
and
Kamala
at
the
helm
,
you
’re
not
going
to
have
to
think
about
the
crazy
things
they
said
every
day
,
"
Obama
said
.
"
And
that
’s
worth
a
lot
.
You
’re
not
going
to
have
to
argue
about
them
every
day
.
It
just
wo
n’t
be
so
exhausting
.
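
The two outputs differ mainly on hyphenated words: NLTK keeps `face-off` as one token, while spaCy splits it into `face`, `-`, `off`. As a rough illustration of how such a splitting decision arises, here is a minimal regex tokenizer using only the standard library. The pattern and the `simple_tokenize` name are assumptions made for this sketch; neither library works this way internally.

```python
import re

def simple_tokenize(text):
    # \w+ grabs runs of letters, digits and underscores;
    # [^\w\s] emits every remaining non-space character as its own token.
    return re.findall(r"\w+|[^\w\s]", text)

sample = "Election Day face-off with Joe Biden."
tokens = simple_tokenize(sample)
print(tokens)
```

Because the hyphen is not a word character, this sketch behaves like the spaCy output above for `face-off`; keeping hyphenated words together would need an extra alternative in the pattern.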


# How to get the sentences of a text document ?


In [7]:
text="""Hrithik Roshan’s mother Pinkie Roshan recently took to her Instagram handle to share a cryptic post on Sushant Singh Rajput. Along with a picture of the actor, she shared a post that read, ‘Everyone wants to know the truth but no one wants to be honest.’ """

In [8]:
# Tokenizing the text into sentences with spaCy
doc=nlp(text)
for sentence in doc.sents:
    print(sentence)
    print(' ')

Hrithik Roshan’s mother Pinkie Roshan recently took to her Instagram handle to share a cryptic post on Sushant Singh Rajput.
 
Along with a picture of the actor, she shared a post that read, ‘Everyone wants to know the truth but no one wants to be honest.’
 


In [9]:
# Extracting sentences with nltk
nltk.sent_tokenize(text)

['Hrithik Roshan’s mother Pinkie Roshan recently took to her Instagram handle to share a cryptic post on Sushant Singh Rajput.',
 'Along with a picture of the actor, she shared a post that read, ‘Everyone wants to know the truth but no one wants to be honest.’']
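
For comparison, a naive sentence splitter can be sketched with a single regular expression. This is only an illustration under simple assumptions (sentences end with `.`, `!` or `?` followed by whitespace); the `naive_sentences` helper is hypothetical and much cruder than the trained models behind `nltk.sent_tokenize` or `doc.sents`.

```python
import re

def naive_sentences(text):
    # Split on whitespace that directly follows ., ! or ?
    # Abbreviations such as "Dr." would be split incorrectly here.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sample = "She shared a post. It was cryptic! Who knows why?"
sentences = naive_sentences(sample)
print(sentences)
```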

# How to tokenize a text using the `transformers` package ?


In [10]:
text='''The festivities season has kicked-in the country and people are going out in larg numbers to do their festival shopping. "Most of us are committed to our responsibilities and moving out of our homes to do our duties. In the times of festivals, streets are seeing increased activity. But we have to remember that lockdown may have gone but the virus has not gone away. The situation that we have reached in seven-eight months should not be allowed to impacted adversely," PM Modi said. He cautioned that carelessness can impact the country's fight against the pandemic.

'''

In [13]:
# Import tokenizer from transformers
#!pip install transformers
from transformers import AutoTokenizer

# Initialize the tokenizer
tokenizer=AutoTokenizer.from_pretrained('bert-base-uncased')

# Encoding with the tokenizer
inputs=tokenizer.encode(text)
print(inputs)
tokenizer.decode(inputs)


[101, 4787, 3972, 5714, 3110, 11480, 9247, 5714, 2002, 3972, 5714, 11441, 2651, 9247, 5714, 2002, 2763, 3972, 5714, 3972, 5714, 2190, 2711, 1045, 2113, 9247, 5714, 102]


'[CLS] walter delim feeling anxiousdelim he delim diagnosed todaydelim he probably delim delim best person i knowdelim [SEP]'

# How to tokenize text with stopwords as delimiters?


In [14]:
text = "Walter was feeling anxious. He was diagnosed today. He probably is the best person I know."

stop_words_and_delims = ['was', 'is', 'the', '.', ',', '-', '!', '?']
for r in stop_words_and_delims:
    text = text.replace(r, 'DELIM')

words = [t.strip() for t in text.split('DELIM')]
words_filtered = list(filter(lambda a: a not in [''], words))
words_filtered

['Walter',
 'feeling anxious',
 'He',
 'diagnosed today',
 'He probably',
 'best person I know']
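
One caveat of `str.replace` is that it also matches inside longer words, so `is` would cut `This` in two and `the` would cut `there`. A possible variant, sketched here with the standard `re` module, uses word boundaries for the stop words. The `split_on_delims` helper and the sample sentence are assumptions for illustration, not part of the original exercise.

```python
import re

def split_on_delims(text, stop_words, punctuation):
    # \b anchors keep "is" from matching inside words such as "This".
    word_part = r"\b(?:" + "|".join(map(re.escape, stop_words)) + r")\b"
    punct_part = "[" + re.escape("".join(punctuation)) + "]"
    pieces = re.split(word_part + "|" + punct_part, text)
    # Drop empty pieces and surrounding whitespace
    return [p.strip() for p in pieces if p.strip()]

text = "This is the best. He was there, this time."
chunks = split_on_delims(text, ["was", "is", "the"], ".,-!?")
print(chunks)
```

With the plain `replace` loop, the same sentence would be broken at `Th|is` and `the|re`; the word-boundary version leaves both words intact.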

# How to remove stop words in a text ?


In [16]:
text=""""As soon as a COVID-19 vaccine is available for production at a mass scale, every person in Bihar will get free vaccination. This is the first promise mentioned in our poll manifesto," Union Finance Minister Nirmala Sitharaman said today, announcing the BJP's manifesto for the Bihar election.

 """


In [17]:
# Removing stopwords in nltk

from nltk.corpus import stopwords
my_stopwords=set(stopwords.words('english'))
new_tokens=[]

# Tokenization using word_tokenize()
all_tokens=nltk.word_tokenize(text)

for token in all_tokens:
  if token not in my_stopwords:
    new_tokens.append(token)


" ".join(new_tokens)

"`` As soon COVID-19 vaccine available production mass scale , every person Bihar get free vaccination . This first promise mentioned poll manifesto , '' Union Finance Minister Nirmala Sitharaman said today , announcing BJP 's manifesto Bihar election ."

In [18]:
# Method 2
# Removing stopwords in spaCy

doc=nlp(text)
new_tokens=[]

# Using is_stop attribute of each token to check if it's a stopword
for token in doc:
  if not token.is_stop:
    new_tokens.append(token.text)

" ".join(new_tokens)

'" soon COVID-19 vaccine available production mass scale , person Bihar free vaccination . promise mentioned poll manifesto , " Union Finance Minister Nirmala Sitharaman said today , announcing BJP manifesto Bihar election . \n\n '
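
Note that the NLTK result still contains `As` and `This`, while spaCy drops them. NLTK's English stopword list is lowercase and the membership test is case-sensitive. A minimal sketch of the fix is to lowercase each token before the lookup; the tiny stopword set and token list below are stand-ins chosen for illustration, not the real NLTK data.

```python
# Tiny illustrative stopword set; the real NLTK list is much longer.
stop_words = {"as", "this", "is", "the", "a", "in", "for", "at"}

tokens = ["As", "soon", "as", "a", "vaccine", "is", "available", ".",
          "This", "is", "the", "first", "promise"]

# Case-sensitive lookup keeps capitalised stopwords such as "As" and "This"
case_sensitive = [t for t in tokens if t not in stop_words]

# Lowercasing only for the lookup removes them but preserves the original casing
case_insensitive = [t for t in tokens if t.lower() not in stop_words]

print(case_sensitive)
print(case_insensitive)
```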

# How to add custom stop words in spaCy ?


In [19]:
text=" Jonas was a JUNK great guy NIL Adam was evil NIL Martha JUNK was more of a fool "

In [20]:
# list of custom stop words
customize_stop_words = ['NIL','JUNK']

# Adding these stop words
for w in customize_stop_words:
    nlp.vocab[w].is_stop = True
doc = nlp(text)
tokens = [token.text for token in doc if not token.is_stop]

" ".join(tokens)

'  Jonas great guy Adam evil Martha fool'

# How to remove punctuations ?


In [21]:
text="The match has concluded !!! India has won the match . Will we fin the finals too ? !"

In [22]:
# Removing punctuations in spaCy

doc=nlp(text)
new_tokens=[]
# Check if a token is a punctuation through is_punct attribute
for token in doc:
  if not token.is_punct:
    new_tokens.append(token.text)

" ".join(new_tokens)

'The match has concluded India has won the match Will we fin the finals too'

In [23]:
# Method 2
# Removing punctuation in nltk with RegexpTokenizer

tokenizer=nltk.RegexpTokenizer(r"\w+")

tokens=tokenizer.tokenize(text)
" ".join(tokens)

'The match has concluded India has won the match Will we fin the finals too'
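
If neither library is available, the same result can be sketched with the standard library alone using `str.translate`. This assumes only ASCII punctuation needs to go; characters outside `string.punctuation`, such as curly quotes, would be left in place.

```python
import string

text = "The match has concluded !!! India has won the match . Will we fin the finals too ? !"

# Map every ASCII punctuation character to None, then collapse the leftover whitespace
table = str.maketrans("", "", string.punctuation)
cleaned = " ".join(text.translate(table).split())
print(cleaned)
```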

# How to perform stemming


In [24]:
text= "There are many types of challenging dances. Dancing is also a way to express or communicate personal feelings usually happiness and excitement. Dance can also have other meaning and purposes. For example, Indian dance has a specific purpose."

In [None]:
# Stemming with nltk's PorterStemmer
from nltk.stem import PorterStemmer
stem_=PorterStemmer()
stemmed_tokens=[]
for token in nltk.word_tokenize(text):
  stemmed_tokens.append(stem_.stem(token))

" ".join(stemmed_tokens)

In [26]:
text= "There are many types of challenging dances. Dancing is also a way to express or communicate personal feelings usually happiness and excitement. Dance can also have other meaning and purposes. For example, Indian dance has a specific purpose."

In [27]:
# Lemmatization using spacy's lemma_ attribute of token
nlp=spacy.load("en_core_web_sm")
doc=nlp(text)

lemmatized=[token.lemma_ for token in doc]
" ".join(lemmatized)

'there be many type of challenge dance . dancing be also a way to express or communicate personal feeling usually happiness and excitement . dance can also have other meaning and purpose . for example , indian dance have a specific purpose .'

In [28]:
text= "The new registrations are potter709@gmail.com , elixir101@gmail.com. If you find any disruptions, kindly contact granger111@gamil.com or severus77@gamil.com "

In [29]:
# Using a regular expression to extract usernames
import re  

# \S+ matches one or more non-whitespace characters
# @ matches the literal '@' that separates the username from the domain
# The parentheses capture only the part before the '@'
usernames= re.findall(r'(\S+)@', text)     
print(usernames) 
print(usernames) 

['potter709', 'elixir101', 'granger111', 'severus77']
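
The `\S+` pattern works for this text, but it captures any non-whitespace character in front of the `@`, including brackets or quotes. A somewhat stricter pattern is sketched below. The character classes are an assumption that covers common addresses, not a full email grammar, and the sample text is made up for the illustration.

```python
import re

text = "Write to (john.doe@example.org) or mary_k@mail.example.com."

# \S+ also captures the opening parenthesis in front of the first address
loose = re.findall(r"(\S+)@", text)

# Restrict the username to word characters, dots, plus and hyphen,
# and require a dotted domain after the '@'
strict = re.findall(r"([\w.+-]+)@[\w-]+(?:\.[\w-]+)+", text)

print(loose)
print(strict)
```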


In [30]:
text="""The green algae (singular: green alga) are a large, informal grouping of algae consisting of the Chlorophyte and Charophyte algae, which are now placed in separate Divisions. The land plants or Embryophytes (higher plants) are thought to have emerged from the Charophytes.[1] As the embryophytes are not algae, and are therefore excluded, green algae are a paraphyletic group. However, the clade that includes both green algae and embryophytes is monophyletic and is referred to as the clade Viridiplantae and as the kingdom Plantae. The green algae include unicellular and colonial flagellates, most with two flagella per cell, as well as various colonial, coccoid and filamentous forms, and macroscopic, multicellular seaweeds. In the Charales, the closest relatives of higher plants, full cellular differentiation of tissues occurs. There are about 8,000 species of green algae.[2] Many species live most of their lives as single cells, while other species form coenobia (colonies), long filaments, or highly differentiated macroscopic seaweeds. A few other organisms rely on green algae to conduct photosynthesis for them. The chloroplasts in euglenids and chlorarachniophytes were acquired from ingested green algae,[1] and in the latter retain a nucleomorph (vestigial nucleus). Green algae are also found symbiotically in the ciliate Paramecium, and in Hydra viridissima and in flatworms. Some species of green algae, particularly of genera Trebouxia of the class Trebouxiophyceae and Trentepohlia (class Ulvophyceae), can be found in symbiotic associations with fungi to form lichens. In general the fungal species that partner in lichens cannot live on their own, while the algal species is often found living in nature without the fungus. Trentepohlia is a filamentous green alga that can live independently on humid soil, rocks or tree bark or form the photosymbiont in lichens of the family Graphidaceae."""

In [31]:
# Removing stopwords in nltk

from nltk.corpus import stopwords
my_stopwords=set(stopwords.words('english'))
new_tokens=[]

# Tokenization using word_tokenize()
all_tokens=nltk.word_tokenize(text)

for token in all_tokens:
  if token not in my_stopwords:
    new_tokens.append(token)


new_tokens = " ".join(new_tokens)

tokenizer=nltk.RegexpTokenizer(r"\w+")

tokens=tokenizer.tokenize(new_tokens)
" ".join(tokens)

freq_dict={}

# Calculating frequency count
for word in tokens:
  if word not in freq_dict:
    freq_dict[word]=1
  else:
    freq_dict[word]+=1

freq_dict

{'000': 1,
 '1': 2,
 '2': 1,
 '8': 1,
 'A': 1,
 'As': 1,
 'Charales': 1,
 'Charophyte': 1,
 'Charophytes': 1,
 'Chlorophyte': 1,
 'Divisions': 1,
 'Embryophytes': 1,
 'Graphidaceae': 1,
 'Green': 1,
 'However': 1,
 'Hydra': 1,
 'In': 2,
 'Many': 1,
 'Paramecium': 1,
 'Plantae': 1,
 'Some': 1,
 'The': 4,
 'There': 1,
 'Trebouxia': 1,
 'Trebouxiophyceae': 1,
 'Trentepohlia': 2,
 'Ulvophyceae': 1,
 'Viridiplantae': 1,
 'acquired': 1,
 'alga': 2,
 'algae': 12,
 'algal': 1,
 'also': 1,
 'associations': 1,
 'bark': 1,
 'cell': 1,
 'cells': 1,
 'cellular': 1,
 'chlorarachniophytes': 1,
 'chloroplasts': 1,
 'ciliate': 1,
 'clade': 2,
 'class': 2,
 'closest': 1,
 'coccoid': 1,
 'coenobia': 1,
 'colonial': 2,
 'colonies': 1,
 'conduct': 1,
 'consisting': 1,
 'differentiated': 1,
 'differentiation': 1,
 'embryophytes': 2,
 'emerged': 1,
 'euglenids': 1,
 'excluded': 1,
 'family': 1,
 'filamentous': 2,
 'filaments': 1,
 'flagella': 1,
 'flagellates': 1,
 'flatworms': 1,
 'form': 3,
 'forms': 1,
 '
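
The counting loop above can also be written with `collections.Counter` from the standard library. The short token list below is a stand-in for the tokens of the cell above, used only to keep the sketch self-contained.

```python
from collections import Counter

tokens = ["green", "algae", "are", "algae", "Green", "algae", "form", "form"]

# Case-sensitive counts, equivalent to the manual dictionary loop
freq = Counter(tokens)

# Lowercasing first merges variants such as 'Green' and 'green'
freq_lower = Counter(t.lower() for t in tokens)

print(freq.most_common(2))
print(freq_lower["green"])
```

`most_common(n)` returns the `n` highest counts directly, which saves sorting the dictionary by hand.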

In [32]:
text="He is a gret person. He beleives in bod"

In [33]:
#!pip install textblob
#Import textblob
from textblob import TextBlob

# Using textblob's correct() function
text=TextBlob(text)
print(text.correct())

He is a great person. He believes in god


In [34]:
text="Having lots of fun #goa #vaction #summervacation. Fancy dinner @Beachbay restro :) "

In [35]:
import re
# Cleaning the tweets
text=re.sub(r'[^\w]', ' ', text)

# Using nltk's TweetTokenizer
from nltk.tokenize import TweetTokenizer
tokenizer=TweetTokenizer()
tokenizer.tokenize(text)

['Having',
 'lots',
 'of',
 'fun',
 'goa',
 'vaction',
 'summervacation',
 'Fancy',
 'dinner',
 'Beachbay',
 'restro']
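
The cleaning step above turns hashtags and mentions into ordinary words. When they should be kept as separate metadata instead, they can be pulled out with two small patterns before cleaning. This is a sketch under the assumption that tags and handles consist only of word characters.

```python
import re

tweet = "Having lots of fun #goa #vaction #summervacation. Fancy dinner @Beachbay restro :) "

# Capture the word that follows each '#' or '@'
hashtags = re.findall(r"#(\w+)", tweet)
mentions = re.findall(r"@(\w+)", tweet)

# Remove tags, mentions and leftover symbols to get the plain words
plain = re.sub(r"[#@]\w+|[^\w\s]", " ", tweet).split()

print(hashtags, mentions, plain)
```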

In [36]:
text="James works at Microsoft. She lives in manchester and likes to play the flute"

In [37]:
# Converting the text into a spacy Doc
nlp=spacy.load("en_core_web_sm")
doc=nlp(text)

# Using spacy's pos_ attribute to check for part of speech tags
for token in doc:
  if token.pos_=='NOUN' or token.pos_=='PROPN':
    print(token.text)

James
Microsoft
manchester
flute


In [38]:
text="John is happy finally. He had landed his dream job finally. He told his mom. She was elated "

In [39]:
# Using spacy's pos_ attribute to check for part of speech tags
nlp=spacy.load("en_core_web_sm")
doc=nlp(text)

for token in doc:
  if token.pos_=='PRON':
    print(token.text)

He
He
She


In [40]:
word1="amazing"
word2="terrible"
word3="excellent"

In [73]:
# Convert words into spacy tokens
import spacy
#!python -m spacy download en_core_web_lg
nlp=spacy.load('en_core_web_lg')
token1=nlp(word1)
token2=nlp(word2)
token3=nlp(word3)

# Use similarity() function of tokens
print('similarity between', word1,'and' ,word2, 'is' ,token1.similarity(token2))
print('similarity between', word1,'and' ,word3, 'is' ,token1.similarity(token3))

OSError: ignored

In [74]:
text1="John likes to play baseball"
#text2="James lives in America, though he's not from there"
text2="James plays hockey"

In [75]:
# Finding similarity using spaCy library

doc1=nlp(text1)
doc2=nlp(text2)
doc1.similarity(doc2)   

ValueError: ignored

In [76]:
text1='Taj Mahal is a tourist place in India'
text2='Great Wall of China is a tourist place in china'

In [77]:
# Using Vectorizer of sklearn to get vector representation
documents=[text1,text2]
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

vectorizer=CountVectorizer()
matrix=vectorizer.fit_transform(documents)

# Obtaining the document-word matrix
doc_term_matrix=matrix.todense()
doc_term_matrix

# Computing cosine similarity
df=pd.DataFrame(doc_term_matrix)

from sklearn.metrics.pairwise import cosine_similarity
print(cosine_similarity(df,df))

[[1.         0.45584231]
 [0.45584231 1.        ]]
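
The off-diagonal value can be checked by hand. With `CountVectorizer`'s defaults (lowercasing, tokens of at least two word characters, so `a` is dropped) the two sentences share four counted words (`is`, `tourist`, `place`, `in`), and the count vectors have norms √7 and √11, which gives 4/√77. The sketch below reproduces this with the standard library; the helper names are made up for the illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    # Mimics CountVectorizer's defaults: lowercase, tokens of 2+ word characters
    return Counter(re.findall(r"\b\w\w+\b", text.lower()))

def cosine(a, b):
    # Dot product over the shared vocabulary, divided by both vector norms
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

text1 = 'Taj Mahal is a tourist place in India'
text2 = 'Great Wall of China is a tourist place in china'
score = cosine(bag_of_words(text1), bag_of_words(text2))
print(round(score, 8))
```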


In [None]:
# Import gensim api
import gensim.downloader as api

# Load the pretrained google news word2vec model
word2vec_model300 = api.load('word2vec-google-news-300')


# Using most_similar() function
word2vec_model300.most_similar('amazing')

INFO:gensim.api:Creating /root/gensim-data




INFO:gensim.api:word2vec-google-news-300 downloaded
INFO:gensim.models.utils_any2vec:loading projection weights from /root/gensim-data/word2vec-google-news-300/word2vec-google-news-300.gz
  'See the migration notes for details: %s' % _MIGRATION_NOTES_URL
INFO:gensim.models.utils_any2vec:loaded (3000000, 300) matrix from /root/gensim-data/word2vec-google-news-300/word2vec-google-news-300.gz
INFO:gensim.models.keyedvectors:precomputing L2-norms of word weight vectors


In [None]:
text=" My sister has a dog and she loves him"

In [None]:
# Import neural coref library
!pip install neuralcoref
import spacy
import neuralcoref

# Add it to the pipeline
nlp = spacy.load('en')
neuralcoref.add_to_pipe(nlp)

# Printing the coreferences
doc1 = nlp('My sister has a dog. She loves him.')
print(doc1._.coref_clusters)

In [None]:
texts= ["""It's all about travel. I travel a lot.  those who do not travel read only a page.” – said Saint Augustine. He was a great travel person. Travelling can teach you more than any university course. You learn about the culture of the country you visit. If you talk to locals, you will likely learn about their thinking, habits, traditions and history as well.If you travel, you will not only learn about foreign cultures, but about your own as well. You will notice the cultural differences, and will find out what makes your culture unique. After retrurning from a long journey, you will see your country with new eyes.""",
        """ You can learn a lot about yourself through travelling. You can observe how you feel beeing far from your country. You will find out how you feel about your homeland.You should travel You will realise how you really feel about foreign people. You will find out how much you know/do not know about the world. You will be able to observe how you react in completely new situations. You will test your language, orientational and social skills. You will not be the same person after returning home.During travelling you will meet people that are very different from you. If you travel enough, you will learn to accept and appreciate these differences. Traveling makes you more open and accepting.""",
        """Some of my most cherished memories are from the times when I was travelling. If you travel, you can experience things that you could never experience at home. You may see beautiful places and landscapes that do not exist where you live. You may meet people that will change your life, and your thingking. You may try activities that you have never tried before.Travelling will inevitably make you more independent and confident. You will realise that you can cope with a lot of unexpected situations. You will realise that you can survive without all that help that is always available for you at home. You will likely find out that you are much stronger and braver than you have expected.""",
        """If you travel, you may learn a lot of useful things. These things can be anything from a new recepie, to a new, more effective solution to an ordinary problem or a new way of creating something.Even if you go to a country where they speak the same language as you, you may still learn some new words and expressions that are only used there. If you go to a country where they speak a different language, you will learn even more.""",
        """After arriving home from a long journey, a lot of travellers experience that they are much more motivated than they were before they left. During your trip you may learn things that you will want to try at home as well. You may want to test your new skills and knowledge. Your experiences will give you a lot of energy.During travelling you may experience the craziest, most exciting things, that will eventually become great stories that you can tell others. When you grow old and look back at your life and all your travel experiences, you will realise how much you have done in your life and your life was not in vain. It can provide you with happiness and satisfaction for the rest of your life.""",
        """The benefits of travel are not just a one-time thing: travel changes you physically and psychologically. Having little time or money isn't a valid excuse. You can travel for cheap very easily. If you have a full-time job and a family, you can still travel on the weekends or holidays, even with a baby. travel  more is likely to have a tremendous impact on your mental well-being, especially if you're no used to going out of your comfort zone. Trust me: travel more and your doctor will be happy. Be sure to get in touch with your physician, they might recommend some medication to accompany you in your travels, especially if you're heading to regions of the globe with potentially dangerous diseases.""",
        """Sure, you probably feel comfortable where you are, but that is just a fraction of the world! If you are a student, take advantage of programs such as Erasmus to get to know more people, experience and understand their culture. Dare traveling to regions you have a skeptical opinion about. I bet that you will change your mind and realize that everything is not so bad abroad.""",
        """ So, travel makes you cherish life. Let's travel more . Share your travel diaries with us too"""
        ]

In [None]:
# Importing the Tf-idf vectorizer from sklearn
from sklearn.feature_extraction.text import TfidfVectorizer

# Defining the vectorizer
vectorizer = TfidfVectorizer(stop_words='english', max_features=1000, max_df = 0.5, smooth_idf=True)

# Transforming the documents into a tf-idf matrix through .fit_transform()
matrix = vectorizer.fit_transform(texts)

# SVD represents documents and terms as vectors
from sklearn.decomposition import TruncatedSVD
SVD_model = TruncatedSVD(n_components=10, algorithm='randomized', n_iter=100, random_state=42)
SVD_model.fit(matrix)

# Getting the terms (scikit-learn >= 1.0 renames this to get_feature_names_out())
terms = vectorizer.get_feature_names()

# Iterating through each topic
for i, comp in enumerate(SVD_model.components_):
    terms_comp = zip(terms, comp)
    # sorting the 7 most important terms
    sorted_terms = sorted(terms_comp, key= lambda x:x[1], reverse=True)[:7]
    print("Topic "+str(i)+": ")
    # printing the terms of a topic
    for t in sorted_terms:
        print(t[0],end=' ')
    print(' ')

In [None]:
# Import gensim, nltk
import gensim
from gensim import models, corpora
import nltk
from nltk.corpus import stopwords

# Before topic extraction, we remove punctuations and stopwords.
my_stopwords=set(stopwords.words('english'))
punctuations=['.','!',',',"You","I"]

# We prepare a list containing lists of tokens of each text
all_tokens=[]
for text in texts:
    tokens=[]
    raw=nltk.wordpunct_tokenize(text)
    for token in raw:
        if token not in my_stopwords and token not in punctuations:
            tokens.append(token)
    # Append each document's token list once, after it is complete
    all_tokens.append(tokens)

# Creating a gensim dictionary and the matrix

dictionary = corpora.Dictionary(all_tokens)
doc_term_matrix = [dictionary.doc2bow(doc) for doc in all_tokens]

# Building the model and training it with the matrix 
from gensim.models.ldamodel import LdaModel
model = LdaModel(doc_term_matrix, num_topics=5, id2word = dictionary,passes=40)

print(model.print_topics(num_topics=6,num_words=5))

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Defining the vectorizer
vectorizer = TfidfVectorizer(stop_words='english', max_features= 1000,  max_df = 0.5, smooth_idf=True)

# Transforming the tokens into the matrix form through .fit_transform()
nmf_matrix= vectorizer.fit_transform(texts)
from sklearn.decomposition import NMF
nmf_model = NMF(n_components=6)
nmf_model.fit(nmf_matrix)

# Function to print topics
def print_topics_nmf(model, vectorizer, top_n=6):
    for idx, topic in enumerate(model.components_):
        print("Topic %d:" % (idx))
        print([(vectorizer.get_feature_names()[i], topic[i])
                        for i in topic.argsort()[:-top_n - 1:-1]])
        
print_topics_nmf(nmf_model,vectorizer)

In [None]:
text="It was a very pleasant day"

In [None]:
# Sentiment analysis with TextBlob
from textblob import TextBlob
blob=TextBlob(text)

# Using the sentiment attribute 
print(blob.sentiment)
if(blob.sentiment.polarity > 0):
  print("Positive")

In [None]:
texts= [" Photography is an excellent hobby to pursue ",
        " Photographers usually develop patience, calmness",
        " You can try Photography with any good mobile too"]

In [None]:
# We prepare a list containing lists of tokens of each text
all_tokens=[]
for text in texts:
  tokens=[]
  raw=nltk.wordpunct_tokenize(text)
  for token in raw:
    tokens.append(token)
  # Append each sentence's token list once, after it is complete
  all_tokens.append(tokens)

# Import and fit the model with data
import gensim
from gensim.models import Word2Vec
# min_count=1 keeps every word; the default of 5 would discard this tiny vocabulary
model=Word2Vec(all_tokens, min_count=1)

# Getting the vector representation of a word
model.wv['Photography']

In [None]:
states = ['Alabama', 'Alaska', 'American Samoa', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', \
          'District of Columbia', 'Florida', 'Georgia', 'Guam', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', \
          'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', \
          'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', \
          'North Dakota', 'Northern Mariana Islands', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Puerto Rico', 'Rhode Island',\
          'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virgin Islands', 'Virginia', 'Washington',\
          'West Virginia', 'Wisconsin', 'Wyoming']

In [None]:
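# We prepare a list containing lists of tokens of each text
all_tokens=[]
for text in texts:
  tokens=[]
  raw=nltk.wordpunct_tokenize(text)
  for token in raw:
    tokens.append(token)
  # Append each sentence's token list once, after it is complete
  all_tokens.append(tokens)

# Import and fit the model with data
import gensim
from gensim.models import Word2Vec
# min_count=1 keeps every word; the default of 5 would discard this tiny vocabulary
model=Word2Vec(all_tokens, min_count=1)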


# Visualizing the word embedding
from sklearn.decomposition import PCA
from matplotlib import pyplot


# Word vectors live on model.wv; gensim >= 4 replaces wv.vocab with wv.key_to_index
X = model.wv[model.wv.vocab]
pca = PCA(n_components=2)
result = pca.fit_transform(X)
# create a scatter plot of the projection
pyplot.scatter(result[:, 0], result[:, 1])
words = list(model.wv.vocab)
for i, word in enumerate(words):
    pyplot.annotate(word, xy=(result[i, 0], result[i, 1]))
pyplot.show()

In [None]:
# Importing the model
from gensim.models import Doc2Vec

# Preparing data in the format and fitting to the model
def tagged_document(list_of_list_of_words):
   for i, list_of_words in enumerate(list_of_list_of_words):
      yield gensim.models.doc2vec.TaggedDocument(list_of_words, [i])
my_data = list(tagged_document(all_tokens))
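# min_count=1 keeps every word of this tiny corpus in the vocabulary
model=Doc2Vec(my_data, min_count=1)

# Tokens are passed without stray spaces so they can match vocabulary entries
model.infer_vector(['photography','is','an','excellent','hobby','to','pursue'])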

In [None]:
text_documents=['Painting is a hobby for many , passion for some',
                'My hobby is coin collection',
                'I do some Painting every now and then']

In [None]:
# Method 1-Using gensim

from gensim import corpora, models
from gensim.utils import simple_preprocess
import numpy as np

doc_tokenized = [simple_preprocess(text) for text in text_documents]
dictionary = corpora.Dictionary()

# Creating the Bag of Words from the docs
BoW_corpus = [dictionary.doc2bow(doc, allow_update=True) for doc in doc_tokenized]
for doc in BoW_corpus:
   print([[dictionary[id], freq] for id, freq in doc])

# Fitting the tf-idf model and printing the weight of each word per document
tfidf = models.TfidfModel(BoW_corpus)
for doc in tfidf[BoW_corpus]:
   print([[dictionary[id], np.around(freq, decimals=2)] for id, freq in doc])

In [None]:
# Method 2- Using sklearn's TfidfVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit the vectorizer to our text documents
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(text_documents)
print(matrix)
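
To make the numbers in that sparse matrix less opaque, the weighting can be sketched by hand. With its default settings, scikit-learn documents the smoothed inverse document frequency as `ln((1 + n) / (1 + df)) + 1` and then scales each row to unit L2 length. The sketch below applies that formula to pre-tokenized, made-up documents; it skips the vectorizer's own tokenization (which would drop one-letter words such as `a`), so the values are illustrative rather than identical to the output above.

```python
import math
from collections import Counter

docs = [["painting", "is", "a", "hobby"],
        ["my", "hobby", "is", "coin", "collection"],
        ["painting", "every", "now", "and", "then"]]

n_docs = len(docs)
# Document frequency: number of documents that contain each term
df = Counter(term for doc in docs for term in set(doc))

def tfidf_row(doc):
    tf = Counter(doc)
    # Raw term count times the smoothed idf
    weights = {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
    # L2-normalise the row so documents of different length are comparable
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}

row = tfidf_row(docs[0])
for term, weight in sorted(row.items()):
    print(term, round(weight, 3))
```

Terms that occur in fewer documents (`a`) end up with a larger weight than terms shared across documents (`hobby`, `is`, `painting`).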

In [None]:
documents = ["the mayor of new york was there", "new york mayor was present"]

In [None]:
# Import Phraser from gensim
from gensim.models import Phrases
from gensim.models.phrases import Phraser

sentence_stream = [doc.split(" ") for doc in documents]

# Creating bigram phraser
bigram = Phrases(sentence_stream, min_count=1, threshold=2, delimiter=b' ')
bigram_phraser = Phraser(bigram)

for sent in sentence_stream:
    tokens_ = bigram_phraser[sent]
    print(tokens_)

In [None]:
Sentences="Machine learning is a neccessary field in today's world. Data science can do wonders . Natural Language Processing is how machines understand text "

In [None]:
# Creating bigrams and trigrams
from nltk import ngrams
bigram=list(ngrams(Sentences.lower().split(),2))
trigram=list(ngrams(Sentences.lower().split(),3))

print(" Bigrams are",bigram)
print(" Trigrams are", trigram)
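
For reference, the sliding-window idea behind n-grams fits in a couple of lines of plain Python. The `make_ngrams` helper is a hypothetical name for this sketch and is not how `nltk.ngrams` is implemented.

```python
def make_ngrams(tokens, n):
    # zip over n staggered views of the token list yields the sliding windows
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "data science can do wonders".split()
bigrams = make_ngrams(tokens, 2)
trigrams = make_ngrams(tokens, 3)

print(bigrams)
print(trigrams)
```

A list of `k` tokens yields `k - n + 1` n-grams, so five tokens give four bigrams and three trigrams.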

In [None]:
text='''Sí, sabes que ya llevo un rato mirándote
Tengo que bailar contigo hoy (DY)
Vi que tu mirada ya estaba llamándome
Muéstrame el camino que yo voy
Oh
Tú, tú eres el imán y yo soy el metal
Me voy acercando y voy armando el plan
Solo con pensarlo se acelera el pulso
Oh yeah
Ya, ya me está gustando más de lo normal
Todos mis sentidos van pidiendo más
Esto hay que tomarlo sin ningún apuro
Despacito
Quiero respirar tu cuello despacito
Deja que te diga cosas al oído
Para que te acuerdes si no estás conmigo
Despacito
Quiero desnudarte a besos despacito
Firmo en las paredes de tu laberinto
Y hacer de tu cuerpo todo un manuscrito (sube, sube, sube)
(Sube, sube) Oh
'''

In [None]:
# Install spacy's languagedetect library
import spacy
#!pip install spacy_langdetect
from spacy_langdetect import LanguageDetector
#nlp = spacy.load('en')
nlp=spacy.load("en_core_web_sm")

# Add the language detector to the processing pipeline
nlp.add_pipe(LanguageDetector(), name='language_detector', last=True)

doc = nlp(text)
# document level language detection. Think of it like average language of the document!
print(doc._.language)
# sentence level language detection
for sent in doc.sents:
   print(sent, sent._.language)

In [None]:
text="Robert Langdon is a famous character in various books and movies "

In [None]:
# Using retokenize() method of Doc object to merge two tokens

doc = nlp(text)
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])

for token in doc:
  print(token.text)

In [None]:
text="There is an empty house on the Elm Street"

In [None]:
# Create a spacy doc of the text
doc = nlp(text)

# Use `noun_chunks` attribute to extract the Noun phrases
chunks = list(doc.noun_chunks)
chunks

In [None]:
text=("I may bake a cake for my birthday. The talk will introduce reader about Use of baking")

In [None]:
# Import textacy library
#!pip install textacy
#!pip install -U spacy
import textacy
# Regex pattern to identify verb phrase
pattern = r'(<VERB>?<ADV>*<VERB>+)'
# Build the doc from the `text` defined in the previous cell
doc = textacy.make_spacy_doc(text, lang='en_core_web_sm')

# Finding matches
verb_phrases = textacy.extract.pos_regex_matches(doc, pattern)

# Print all Verb Phrase
for chunk in verb_phrases:
  print(chunk.text)

In [None]:
text="Sherlock Holmes and Clint Thomas were good friends. I am a fan of John Mark"

In [None]:
# Import and initialize spacy's matcher
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)
doc=nlp(text)

# Function that adds patterns to the matcher and finds the respective matches
def extract_matches(doc):
   pattern = [{'POS': 'PROPN'}, {'POS': 'PROPN'}]
   matcher.add('FULL_NAME', None, pattern)
   matches = matcher(doc)
   
   for match_id, start, end in matches:
     span = doc[start:end]
     print(span.text)

extract_matches(doc)

In [None]:
text="Jeff Bezos works at Amazon. He lives in Seattle."

In [None]:
# Load spacy model
import spacy
nlp=spacy.load("en_core_web_sm")
doc=nlp(text)
# Using the ents attribute of doc, identify labels
for entity in doc.ents:  
   print(entity.text,entity.label_)

In [None]:
text =" Google has released it's new model which has got attention of everyone. Amazon is planning to expand into Food delivery, thereby giving competition . Apple is coming up with new iphone model. Flipkart will have to catch up soon."

In [None]:
doc=nlp(text)
list_of_org=[]
for entity in doc.ents:
  if entity.label_=="ORG":
    list_of_org.append(entity.text)

print(list_of_org)

In [44]:
news=" Walter was arrested yesterday at Brooklyn for murder. The suspicions and fingerprints pointed to Walter  and his friend  Pinkman . The arrest was made by inspector Hank"

In [45]:
doc=nlp(news)

# Identifying the entities of category 'PERSON'
entities = [entity.text  for entity in doc.ents  if entity.label_=='PERSON']
updated_text=[]

for token in doc:
  if token.text in entities:
    updated_text.append("UNKNOWN")
  else :
    updated_text.append(token.text)

" ".join(updated_text)

'  UNKNOWN was arrested yesterday at Brooklyn for murder . The suspicions and fingerprints pointed to UNKNOWN   and his friend   UNKNOWN . The arrest was made by inspector UNKNOWN'
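The token-level check above only catches single-token names, because `token.text in entities` compares one token against whole entity strings; a two-word name such as "Walter White" would slip through. One way around this is to replace by character offsets instead. The sketch below uses only the standard library: the `spans` list is written by hand to stand in for `(ent.start_char, ent.end_char, ent.label_)` triples from `doc.ents`, and `mask_spans` is a made-up helper name.

```python
def mask_spans(text, spans, label="PERSON", mask="UNKNOWN"):
    """Replace every span carrying `label` with `mask`.

    Spans are processed right to left so earlier character offsets stay valid.
    """
    for start, end, span_label in sorted(spans, reverse=True):
        if span_label == label:
            text = text[:start] + mask + text[end:]
    return text

sentence = "Walter White was arrested at Brooklyn by inspector Hank"
# Hypothetical entity offsets: (start_char, end_char, label)
spans = [(0, 12, "PERSON"), (29, 37, "GPE"), (51, 55, "PERSON")]
masked = mask_spans(sentence, spans)
print(masked)
```

Working on the original string also preserves the spacing and punctuation that `" ".join(...)` changes.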

In [46]:
text=" Walter was arrested yesterday at Brooklyn for murder. The suspicions and fingerprints pointed to Walter  and his friend  Pinkman . He is from Paris "

In [47]:
from spacy import displacy
doc=nlp(text)
displacy.render(doc,style='ent',jupyter=True)

In [48]:
text="Mark plays volleyball every evening."

In [49]:
# Using the dep_ attribute of tokens in spaCy to access the dependency label of each word in the sentence.
doc=nlp(text)

for token in doc:
  print(token.text,token.dep_)

Mark nsubj
plays ROOT
volleyball dobj
every det
evening npadvmod
. punct


In [50]:
text="Mark plays volleyball. Sam is not into sports, he paints a lot"

In [51]:
# Use the head attribute of each token to find its head word
doc=nlp(text)
for token in doc:
  print(token.text,token.head)

Mark plays
plays plays
volleyball plays
. plays
Sam is
is paints
not is
into is
sports into
, paints
he paints
paints paints
a lot
lot paints
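The token/head pairs printed above describe a tree: every token points at its head, and the root is the token that is its own head ("plays plays"). A small standard-library sketch of turning such pairs into a children map follows; the head indices are copied by hand from the first sentence of the output, and `build_tree` is an illustrative helper, not a spaCy function.

```python
from collections import defaultdict

# (token index, token text, head index) for "Mark plays volleyball ."
tokens = [(0, "Mark", 1), (1, "plays", 1), (2, "volleyball", 1), (3, ".", 1)]

def build_tree(rows):
    """Return (root_index, children), where children maps a head index to its dependents."""
    children = defaultdict(list)
    root = None
    for index, _text, head in rows:
        if head == index:
            root = index              # the root is the token that is its own head
        else:
            children[head].append(index)
    return root, dict(children)

root, children = build_tree(tokens)
text_of = {index: text for index, text, _head in tokens}
print("root:", text_of[root])
print("dependents:", [text_of[i] for i in children[root]])
```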


In [52]:
text="Mark plays volleyball. Sam is not into sports, he paints a lot"

In [53]:
# Use spacy's displacy with the parameter style="dep"
doc=nlp(text)

from spacy import displacy
displacy.render(doc,style='dep',jupyter=True)

In [54]:
text="For my official use, I prefer lenova. For gaming purposes, I love asus"

In [55]:
# Import EntityRuler of spacy model
import spacy
nlp=spacy.load("en_core_web_sm")
from spacy.pipeline import EntityRuler

# Functions to create patterns of laptop names to match
def create_versioned(name):
    # LOWER is compared against the lower-cased token text, so lower-case the name too
    name = name.lower()
    return [
        [{'LOWER': name}],
        [{'LOWER': {'REGEX': rf'({name}\d+\.?\d*.?\d*)'}}],
        [{'LOWER': name}, {'TEXT': {'REGEX': r'(\d+\.?\d*.?\d*)'}}]]

def create_patterns():
    versioned_languages = ['dell', 'HP', 'asus','msi','Apple','HCL','sony','samsung','lenova','acer']
    flatten = lambda l: [item for sublist in l for item in sublist]
    versioned_patterns = flatten([create_versioned(lang) for lang in versioned_languages])

    lang_patterns = [
        [{'LOWER': 'dell'}, {'LIKE_NUM': True}],
        [{'LOWER': 'hp'}],
        [{'LOWER': 'asus'}, {'LOWER': '#'}],
        [{'LOWER': 'msi'}, {'LOWER': 'sharp'}],
        [{'LOWER': 'apple'}],
        [{'LOWER': 'hcl'}, {'LOWER': '#'}],
        [{'LOWER': 'sony'}],
        [{'LOWER': 'samsung'}],
        [{'LOWER': 'toshiba'}],
        [{'LOWER': 'dell'},{'LOWER': 'inspiron'}],
        [{'LOWER': 'acer'},{'IS_PUNCT': True, 'OP': '?'},{'LOWER': 'c'}],
        [{'LOWER': 'golang'}],
        [{'LOWER': 'lenova'}],
        [{'LOWER': 'hp'},{'LOWER':'gaming'}],
        [{'LOWER': 'fujitsu'}],
        [{'LOWER': 'micromax'}],
    ]

    return versioned_patterns + lang_patterns

# Add the Entity Ruler to the pipeline
ruler=EntityRuler(nlp)
ruler.add_patterns([{'label':'laptop','pattern':p} for p in create_patterns()])
nlp.add_pipe(ruler)

# Identify the laptop names now
doc=nlp("For my official use, I prefer lenova. For gaming purposes, I love asus")
for ent in doc.ents:
  print(ent.text,ent.label_)

lenova laptop
asus laptop
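The "versioned" patterns in the cell above rely on a regular expression that accepts a brand name immediately followed by a version number. Its behaviour can be checked on its own with `re`. This sketch uses `fullmatch` for strictness, which may differ from how spaCy applies `REGEX` internally, and `versioned_regex` is a made-up helper name.

```python
import re

def versioned_regex(name):
    """Compile the 'brand name followed by a version number' pattern for one brand."""
    # Lower-case the name: a LOWER pattern is compared with the lower-cased token text
    return re.compile(rf"({re.escape(name.lower())}\d+\.?\d*.?\d*)")

pattern = versioned_regex("Asus")
results = {}
for token in ["asus15", "ASUS15.6", "asus", "lenova3"]:
    results[token] = pattern.fullmatch(token.lower()) is not None
    print(token, results[token])
```

A bare brand name does not match here because the pattern requires at least one digit; that case is covered by the separate `{'LOWER': name}` pattern.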


In [56]:
original_text="""Studies show that exercise can treat mild to moderate depression as effectively as antidepressant medication—but without the side-effects, of course. As one example, a recent study done by the Harvard T.H. Chan School of Public Health found that running for 15 minutes a day or walking for an hour reduces the risk of major depression by 26%. In addition to relieving depression symptoms, research also shows that maintaining an exercise schedule can prevent you from relapsing.
Exercise is a powerful depression fighter for several reasons. Most importantly, it promotes all kinds of changes in the brain, including neural growth, reduced inflammation, and new activity patterns that promote feelings of calm and well-being. It also releases endorphins, powerful chemicals in your brain that energize your spirits and make you feel good. Finally, exercise can also serve as a distraction, allowing you to find some quiet time to break out of the cycle of negative thoughts that feed depression.
Exercise is not just about aerobic capacity and muscle size. Sure, exercise can improve your physical health and your physique, trim your waistline, improve your sex life, and even add years to your life. But that’s not what motivates most people to stay active.
People who exercise regularly tend to do so because it gives them an enormous sense of well-being. They feel more energetic throughout the day, sleep better at night, have sharper memories, and feel more relaxed and positive about themselves and their lives. And it’s also powerful medicine for many common mental health challenges.
Regular exercise can have a profoundly positive impact on depression, anxiety, ADHD, and more. It also relieves stress, improves memory, helps you sleep better, and boosts your overall mood. And you don’t have to be a fitness fanatic to reap the benefits. Research indicates that modest amounts of exercise can make a difference. No matter your age or fitness level, you can learn to use exercise as a powerful tool to feel better.
Ever noticed how your body feels when you’re under stress? Your muscles may be tense, especially in your face, neck, and shoulders, leaving you with back or neck pain, or painful headaches. You may feel a tightness in your chest, a pounding pulse, or muscle cramps. You may also experience problems such as insomnia, heartburn, stomachache, diarrhea, or frequent urination. The worry and discomfort of all these physical symptoms can in turn lead to even more stress, creating a vicious cycle between your mind and body.
Exercising is an effective way to break this cycle. As well as releasing endorphins in the brain, physical activity helps to relax the muscles and relieve tension in the body. Since the body and mind are so closely linked, when your body feels better so, too, will your mind.Evidence suggests that by really focusing on your body and how it feels as you exercise, you can actually help your nervous system become “unstuck” and begin to move out of the immobilization stress response that characterizes PTSD or trauma. 
Instead of allowing your mind to wander, pay close attention to the physical sensations in your joints and muscles, even your insides as your body moves. Exercises that involve cross movement and that engage both arms and legs—such as walking (especially in sand), running, swimming, weight training, or dancing—are some of your best choices.
Outdoor activities like hiking, sailing, mountain biking, rock climbing, whitewater rafting, and skiing (downhill and cross-country) have also been shown to reduce the symptoms of PTSD."""

In [57]:
# Importing the summarize function from gensim module
import gensim
from gensim.summarization.summarizer import summarize

# Pass the document along with desired word count to get the summary
my_summary=summarize(original_text,word_count=100)
print(my_summary)

As one example, a recent study done by the Harvard T.H. Chan School of Public Health found that running for 15 minutes a day or walking for an hour reduces the risk of major depression by 26%.
No matter your age or fitness level, you can learn to use exercise as a powerful tool to feel better.
The worry and discomfort of all these physical symptoms can in turn lead to even more stress, creating a vicious cycle between your mind and body.
As well as releasing endorphins in the brain, physical activity helps to relax the muscles and relieve tension in the body.



In [61]:
!pip install sumy
import sumy
from sumy.summarizers.lex_rank import LexRankSummarizer

#Plain text parsers since we are parsing through text
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer

parser=PlaintextParser.from_string(original_text,Tokenizer("english"))

summarizer=LexRankSummarizer()
my_summary=summarizer(parser.document,2)
print(my_summary)


In [62]:
original_text="""Studies show that exercise can treat mild to moderate depression as effectively as antidepressant medication—but without the side-effects, of course. As one example, a recent study done by the Harvard T.H. Chan School of Public Health found that running for 15 minutes a day or walking for an hour reduces the risk of major depression by 26%. In addition to relieving depression symptoms, research also shows that maintaining an exercise schedule can prevent you from relapsing.
Exercise is a powerful depression fighter for several reasons. Most importantly, it promotes all kinds of changes in the brain, including neural growth, reduced inflammation, and new activity patterns that promote feelings of calm and well-being. It also releases endorphins, powerful chemicals in your brain that energize your spirits and make you feel good. Finally, exercise can also serve as a distraction, allowing you to find some quiet time to break out of the cycle of negative thoughts that feed depression.
Exercise is not just about aerobic capacity and muscle size. Sure, exercise can improve your physical health and your physique, trim your waistline, and even add years to your life. But that’s not what motivates most people to stay active.
People who exercise regularly tend to do so because it gives them an enormous sense of well-being. They feel more energetic throughout the day, sleep better at night, have sharper memories, and feel more relaxed and positive about themselves and their lives. And it’s also powerful medicine for many common mental health challenges.
Regular exercise can have a profoundly positive impact on depression, anxiety, ADHD, and more. It also relieves stress, improves memory, helps you sleep better, and boosts your overall mood. And you don’t have to be a fitness fanatic to reap the benefits. Research indicates that modest amounts of exercise can make a difference. No matter your age or fitness level, you can learn to use exercise as a powerful tool to feel better.
Ever noticed how your body feels when you’re under stress? Your muscles may be tense, especially in your face, neck, and shoulders, leaving you with back or neck pain, or painful headaches. You may feel a tightness in your chest, a pounding pulse, or muscle cramps. You may also experience problems such as insomnia, heartburn, stomachache, diarrhea, or frequent urination. The worry and discomfort of all these physical symptoms can in turn lead to even more stress, creating a vicious cycle between your mind and body.
Exercising is an effective way to break this cycle. As well as releasing endorphins in the brain, physical activity helps to relax the muscles and relieve tension in the body. Since the body and mind are so closely linked, when your body feels better so, too, will your mind.Evidence suggests that by really focusing on your body and how it feels as you exercise, you can actually help your nervous system become “unstuck” and begin to move out of the immobilization stress response that characterizes PTSD or trauma. 
Instead of allowing your mind to wander, pay close attention to the physical sensations in your joints and muscles, even your insides as your body moves. Exercises that involve cross movement and that engage both arms and legs—such as walking (especially in sand), running, swimming, weight training, or dancing—are some of your best choices.
Outdoor activities like hiking, sailing, mountain biking, rock climbing, whitewater rafting, and skiing (downhill and cross-country) have also been shown to reduce the symptoms of PTSD."""

In [63]:
import sumy
from sumy.summarizers.luhn import LuhnSummarizer

#Plain text parsers since we are parsing through text
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer

parser=PlaintextParser.from_string(original_text,Tokenizer("english"))

summarizer=LuhnSummarizer()
my_summary=summarizer(parser.document,2)
print(my_summary)

(<Sentence: Finally, exercise can also serve as a distraction, allowing you to find some quiet time to break out of the cycle of negative thoughts that feed depression.>, <Sentence: Since the body and mind are so closely linked, when your body feels better so, too, will your mind.Evidence suggests that by really focusing on your body and how it feels as you exercise, you can actually help your nervous system become “unstuck” and begin to move out of the immobilization stress response that characterizes PTSD or trauma.>)
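The sumy summarizers used here are extractive: they score the sentences of the document and return the top few unchanged. The sketch below shows that idea in its simplest form, scoring each sentence by the average corpus frequency of its words. It is a simplification for illustration, not Luhn's algorithm and not sumy's implementation; the stop-word set, the regex sentence splitter and the sample text are all assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "as", "is", "it", "of", "the", "to", "your", "can"}

def content_words(text):
    """Lower-cased words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def summarize_by_frequency(text, sentence_count=1):
    """Return the `sentence_count` highest-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:sentence_count])]

sample = ("Exercise relieves stress. Exercise improves memory and mood. "
          "Some people prefer reading.")
summary = summarize_by_frequency(sample, sentence_count=1)
print(summary)
```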



In [65]:
import sumy
from sumy.summarizers.lsa import LsaSummarizer

#Plain text parsers since we are parsing through text
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer

parser=PlaintextParser.from_string(original_text,Tokenizer("english"))

summarizer=LsaSummarizer()
my_summary=summarizer(parser.document,2)
print(my_summary)

(<Sentence: In addition to relieving depression symptoms, research also shows that maintaining an exercise schedule can prevent you from relapsing.>, <Sentence: People who exercise regularly tend to do so because it gives them an enormous sense of well-being.>)


In [66]:
text1="Netflix has released a new series"
text2="It was shot in London"
text3="It is called Dark and the main character is Jonas"
text4="Adam is the evil character"

In [67]:
# Convert into spacy documents
doc1=nlp(text1)
doc2=nlp(text2)
doc3=nlp(text3)
doc4=nlp(text4)

# Import docs_to_json 
from spacy.gold import docs_to_json

# Converting into json format
json_data = docs_to_json([doc1,doc2,doc3,doc4])
json_data

{'id': 0,
 'paragraphs': [{'cats': [],
   'raw': 'Netflix has released a new series',
   'sentences': [{'brackets': [],
     'tokens': [{'dep': 'nsubj',
       'head': 2,
       'id': 0,
       'ner': 'U-ORG',
       'orth': 'Netflix',
       'tag': 'NNP'},
      {'dep': 'aux',
       'head': 1,
       'id': 1,
       'ner': 'O',
       'orth': 'has',
       'tag': 'VBZ'},
      {'dep': 'ROOT',
       'head': 0,
       'id': 2,
       'ner': 'O',
       'orth': 'released',
       'tag': 'VBN'},
      {'dep': 'det', 'head': 2, 'id': 3, 'ner': 'O', 'orth': 'a', 'tag': 'DT'},
      {'dep': 'amod',
       'head': 1,
       'id': 4,
       'ner': 'O',
       'orth': 'new',
       'tag': 'JJ'},
      {'dep': 'dobj',
       'head': -3,
       'id': 5,
       'ner': 'O',
       'orth': 'series',
       'tag': 'NN'}]}]},
  {'cats': [],
   'raw': 'It was shot in London',
   'sentences': [{'brackets': [],
     'tokens': [{'dep': 'nsubjpass',
       'head': 2,
       'id': 0,
       'ner': 'O',
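The `'ner'` values in this JSON (`'U-ORG'`, `'O'`) follow the BILUO tagging scheme: Begin, Inside and Last mark the tokens of a multi-token entity, Unit marks a single-token entity, and Out marks everything else. A minimal sketch of producing such tags from token-level entity spans follows; `biluo_tags` is a hypothetical helper written for this note, not the spaCy function.

```python
def biluo_tags(n_tokens, entities):
    """Return one BILUO tag per token.

    `entities` holds (start, end, label) token offsets, with `end` exclusive.
    """
    tags = ["O"] * n_tokens
    for start, end, label in entities:
        if end - start == 1:
            tags[start] = f"U-{label}"           # Unit: single-token entity
        else:
            tags[start] = f"B-{label}"           # Begin
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"           # Inside
            tags[end - 1] = f"L-{label}"         # Last
    return tags

tokens = ["Netflix", "has", "released", "a", "new", "series"]
single = biluo_tags(len(tokens), [(0, 1, "ORG")])
multi = biluo_tags(4, [(1, 4, "PERSON")])
print(list(zip(tokens, single)))
print(multi)
```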
  

In [68]:
# Data to train the classifier
train = [
    ('I love eating sushi','food-review'),
    ('This is an amazing place!', 'Tourist-review'),
    ('Pizza is my all time favorite food','food-review'),
    ('I baked a cake yesterday, it was tasty', 'food-review'),
    ("What an awesome taste this sushi has", 'food-review'),
    ('It is a perfect place for outing', 'Tourist-review'),
    ('This is a nice picnic spot', 'Tourist-review'),
    ("Families come out on tours here", 'Tourist-review'),
    ('It is a beautiful place !', 'Tourist-review'),
    ('The place was warm and nice', 'Tourist-review')
]
test = [
    ('The sushi was good', 'food-review'),
    ('The place was perfect for picnics ', 'Tourist-review'),
    ("Burgers are my favorite food", 'food-review'),
    ("I feel amazing!", 'food-review'),
    ('It is an amazing place', 'Tourist-review'),
    ("This isn't a very good place", 'Tourist-review')
]

In [69]:
# Importing the classifier
from textblob.classifiers import NaiveBayesClassifier
from textblob import TextBlob

# Training
cl = NaiveBayesClassifier(train)

# Classify some text
print(cl.classify("My favorite food is spring rolls"))  
print(cl.classify("It was a cold place for picnic"))  

# Printing accuracy of classifier
print("Accuracy: {0}".format(cl.accuracy(test)))

food-review
Tourist-review
Accuracy: 0.8333333333333334
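To show roughly what happens behind `NaiveBayesClassifier`, here is a small standard-library sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not TextBlob's implementation (which, as far as I know, wraps NLTK's classifier with word-presence features); the function names and the four-sentence `toy_train` set are assumptions made for illustration. The accuracy it reports is simply correct predictions divided by the number of test examples, which is how 5 correct out of 6 gives the 0.8333 above.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count words per label; returns (label_counts, word_counts, vocabulary)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    words = [w for w in text.lower().split() if w in vocabulary]  # ignore unseen words
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)             # log prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(examples, model):
    """Fraction of examples whose predicted label equals the given label."""
    return sum(classify(text, model) == label for text, label in examples) / len(examples)

# Hypothetical miniature training set
toy_train = [
    ("I love eating sushi", "food-review"),
    ("Pizza is my favorite food", "food-review"),
    ("This is a nice picnic spot", "Tourist-review"),
    ("It is a beautiful place", "Tourist-review"),
]
toy_test = [("The sushi was good", "food-review"), ("a nice place", "Tourist-review")]

model = train_naive_bayes(toy_train)
print(classify("The sushi was good", model))
print("Accuracy:", accuracy(toy_test, model))
```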


In [70]:
train_data = [
    ["The movie was amazing", 1],
    ["It was a boring movie", 0],
    ["I had a great experience",1],
    ["I was bored during the movie",0],
    ["The movie was great",1],
    ["The movie was bad",0],
    ["The movie was good",1]
]

In [71]:
# Import requirements
#!pip install simpletransformers
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import pandas as pd
import logging


logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Preparing train data

train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Optional model configuration
model_args = ClassificationArgs(num_train_epochs=5)

# Create a ClassificationModel
model = ClassificationModel("bert", "bert-base-uncased", args=model_args,use_cuda=False)

# Train the model
model.train_model(train_df)


# Make predictions with the model
predictions, raw_outputs = model.predict(["The titanic was a good movie"])

predictions

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_model: Converting to features started. Cache is not used.


array([1])

In [72]:
import random

import spacy
import thinc.extra.datasets
from spacy.util import minibatch, compounding

# Note: this cell uses the spaCy 2.x API (create_pipe with a config dict,
# add_pipe with a component object, thinc.extra.datasets).
nlp = spacy.load("en_core_web_sm")

# Settings the training code relies on (assumed values, adjust as needed)
n_texts = 2000       # number of training texts to keep
n_iter = 20          # number of training iterations
init_tok2vec = None  # optional pathlib.Path to pretrained tok2vec weights

textcat = nlp.create_pipe("textcat", config={"exclusive_classes": True, "architecture": "simple_cnn"})
nlp.add_pipe(textcat, last=True)
textcat = nlp.get_pipe("textcat")

# add labels to the text classifier
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")


def load_data(limit=0, split=0.8):
    """Load data from the IMDB dataset."""
    # Partition off part of the train data for evaluation
    train_data, _ = thinc.extra.datasets.imdb()
    random.shuffle(train_data)
    train_data = train_data[-limit:]
    texts, labels = zip(*train_data)
    cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]
    split = int(len(train_data) * split)
    return (texts[:split], cats[:split]), (texts[split:], cats[split:])


# load the IMDB dataset
print("Loading IMDB data...")
(train_texts, train_cats), (dev_texts, dev_cats) = load_data()
train_texts = train_texts[:n_texts]
train_cats = train_cats[:n_texts]

train_data = list(zip(train_texts, [{"cats": cats} for cats in train_cats]))

# get names of other pipes to disable them during training
pipe_exceptions = ["textcat", "trf_wordpiecer", "trf_tok2vec"]
other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

# Training the text classifier
with nlp.disable_pipes(*other_pipes):  # only train textcat
    optimizer = nlp.begin_training()
    # optionally initialise the tok2vec layer from pretrained weights
    if init_tok2vec is not None:
        with init_tok2vec.open("rb") as file_:
            textcat.model.tok2vec.from_bytes(file_.read())
    print("Training the model...")
    print("{:^5}\t{:^5}".format("ITER", "LOSS"))
    batch_sizes = compounding(4.0, 32.0, 1.001)
    for i in range(n_iter):
        losses = {}
        # batch up the examples using spaCy's minibatch
        random.shuffle(train_data)
        batches = minibatch(train_data, size=batch_sizes)
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)
        # report the accumulated textcat loss for this iteration
        print("{:^5}\t{:.3f}".format(i, losses["textcat"]))
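
The batch sizes in the training loop come from `compounding(4.0, 32.0, 1.001)` combined with spaCy's `minibatch`. The snippet below is a pure-Python sketch of how such a schedule behaves, not spaCy's own code: `compounding_sketch` and `minibatch_sketch` are hypothetical names, and the growth factor of 1.5 is chosen only so that the growth is visible after a few steps.

```python
from itertools import islice


def compounding_sketch(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop."""
    value = float(start)
    while True:
        yield min(value, stop)
        value *= compound


def minibatch_sketch(items, sizes):
    """Group items into batches whose lengths follow the size schedule."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, int(next(sizes))))
        if not batch:
            break
        yield batch


# First seven values of a fast-growing schedule (factor 1.5 for illustration)
schedule = list(islice(compounding_sketch(4.0, 32.0, 1.5), 7))
print(schedule)  # [4.0, 6.0, 9.0, 13.5, 20.25, 30.375, 32.0]

# With the notebook's factor of 1.001 the size stays at 4 for a long time
batches = list(minibatch_sketch(range(10), compounding_sketch(4.0, 32.0, 1.001)))
print([len(b) for b in batches])  # [4, 4, 2]
```

With a factor as small as 1.001, the batch size only creeps upward over many updates, so early updates use small batches and later ones use larger batches.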

In [None]:
['Our experienced writers travel the world to bring you informative and inspirational features, destination roundups, travel ideas, tips and beautiful photos in order to help you plan your next holiday',
                  'Each part of Germany is different, and there are thousands of memorable places to visit.',
                  "Christmas Markets originated in Germany, and the tradition dates to the Late Middle Ages.",
                  "Garmisch-Partenkirchen is a small town in Bavaria, near Germany’s highest mountain Zugspitze, which rises to 9,718 feet (2,962 meters)",
                  "It’s one of the country’s top alpine destinations, extremely popular during the winter",
                  "In spring, take a road trip through Bavaria and enjoy the view of the dark green Alps and the first alpine wildflowers. "]

In [None]:
# Import the model
from simpletransformers.seq2seq import Seq2SeqModel

# Set the desired arguments
my_args = {
    "train_batch_size": 2,
    "num_train_epochs": 10,
    "save_eval_checkpoints": False,
    "save_model_every_epoch": False,
    "evaluate_during_training": True,
    "evaluate_generated_text": True,
}

# Instantiate the model
my_model = Seq2SeqModel(
    encoder_decoder_name="Helsinki-NLP/opus-mt-en-de",
    encoder_decoder_type="marian",
    args=my_args,
    use_cuda=False,
)

# Translate the text

my_model.predict(['Our experienced writers travel the world to bring you informative and inspirational features, destination roundups, travel ideas, tips and beautiful photos in order to help you plan your next holiday',
                  'Each part of Germany is different, and there are thousands of memorable places to visit.',
                  "Christmas Markets originated in Germany, and the tradition dates to the Late Middle Ages.",
                  "Garmisch-Partenkirchen is a small town in Bavaria, near Germany’s highest mountain Zugspitze, which rises to 9,718 feet (2,962 meters)",
                  "It’s one of the country’s top alpine destinations, extremely popular during the winter",
                  "In spring, take a road trip through Bavaria and enjoy the view of the dark green Alps and the first alpine wildflowers. "])

In [None]:
context = """Harry Potter is the best book series according to many people. Harry Potter was written by J.K. Rowling.
It is a fantasy-based novel that provides a thrilling experience to readers."""

question = "What is Harry Potter?"

In [None]:
# Install transformers (if needed) and import its pipeline helper
#!pip install transformers
from transformers import pipeline

# Get the task-specific pipeline
my_model = pipeline(task="question-answering")

context = """Harry Potter is the best book series according to many people. Harry Potter was written by J.K. Rowling.
It is a fantasy-based novel that provides a thrilling experience to readers."""

# Pass the question and context to the model to obtain the answer
print(my_model(question="What is Harry Potter?", context=context))
print(my_model(question="Who wrote Harry Potter?", context=context))
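
The question-answering pipeline returns a dict with the keys `score`, `start`, `end` and `answer`, where `start` and `end` are character offsets into the context. The sketch below shows how those offsets map back to the text. It does not call the model: the `result` dict is a hand-built stand-in with a made-up score, and `extract_answer` is a hypothetical helper.

```python
def extract_answer(context, result, window=15):
    """Return the answer span and a snippet of the text around it."""
    span = context[result["start"]:result["end"]]
    left = max(0, result["start"] - window)
    right = min(len(context), result["end"] + window)
    return span, context[left:right]


context = "Harry Potter was written by J.K. Rowling. It is a fantasy-based novel."
answer = "J.K. Rowling"
start = context.find(answer)

# Hypothetical result: same shape as the pipeline output, score is a placeholder
result = {"score": 0.5, "start": start, "end": start + len(answer), "answer": answer}

span, snippet = extract_answer(context, result)
print(span)     # J.K. Rowling
print(snippet)  # the answer with up to 15 characters of context on each side
```

Checking that `context[start:end]` equals `answer` is a cheap sanity test when post-processing many results.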

In [None]:
starting="It was a bright"

In [None]:
# Import pipeline from transformers package
from transformers import pipeline

# Get the task-specific pipeline
my_model=pipeline(task="text-generation")

# Pass the starting sequence as input to generate text
my_model(starting)
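
The text-generation pipeline returns a list of dicts, each with a `generated_text` key, and by default that text includes the prompt. A minimal sketch for keeping only the newly generated part follows. The `outputs` list is a made-up placeholder rather than real model output, and `continuation_only` is a hypothetical helper.

```python
def continuation_only(prompt, outputs):
    """Strip the prompt from each generated text, if it is present."""
    texts = []
    for out in outputs:
        text = out["generated_text"]
        if text.startswith(prompt):
            text = text[len(prompt):]
        texts.append(text.strip())
    return texts


starting = "It was a bright"
# Placeholder standing in for the pipeline's return value
outputs = [{"generated_text": "It was a bright and quiet morning."}]

print(continuation_only(starting, outputs))  # ['and quiet morning.']
```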

In [None]:
text1="It is a pleasant day, I am going for a walk"
text2="I have a terrible headache"

In [None]:
!pip install --upgrade tensorflow
# Import pipeline from transformers package
from transformers import pipeline

# Get the task specific pipeline
my_model = pipeline("sentiment-analysis")

# Predicting the sentiment with score
print(my_model(text1))
print(my_model(text2))
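
Each call to the sentiment pipeline returns a list of dicts with a `label` and a `score`. One way to use the score is to treat low-confidence predictions as undecided. The sketch below assumes that output shape and runs on hand-written placeholder predictions, not real model output. `label_with_threshold` and the 0.9 cut-off are illustrative choices.

```python
def label_with_threshold(predictions, threshold=0.9):
    """Keep a label only when its score reaches the threshold."""
    return [
        pred["label"] if pred["score"] >= threshold else "UNCERTAIN"
        for pred in predictions
    ]


# Placeholder predictions in the pipeline's output format (scores are made up)
predictions = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.62},
]

print(label_with_threshold(predictions))  # ['POSITIVE', 'UNCERTAIN']
```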