In [1]:
import pandas as pd
import numpy as np
import spacy

In [2]:
text='Donald Trump is an American businessman, television personality, and politician who served as the 45th President of the United States from January 2017 to January 2021. Before entering politics, he was best known for his career in real estate development, branding, and media, particularly as the chairman and president of The Trump Organization and the host of the reality TV show The Apprentice. His presidency was marked by a populist and nationalist agenda, focusing on issues such as immigration reform, deregulation, trade renegotiations, and tax cuts. Trump’s leadership style and communication methods, particularly his use of social media, were highly unconventional and often polarising. Following his 2020 electoral defeat to Joe Biden, Trump remained an influential figure within the Republican Party, continuing to shape American political discourse and mobilise a large base of supporters.'

print(text)

Donald Trump is an American businessman, television personality, and politician who served as the 45th President of the United States from January 2017 to January 2021. Before entering politics, he was best known for his career in real estate development, branding, and media, particularly as the chairman and president of The Trump Organization and the host of the reality TV show The Apprentice. His presidency was marked by a populist and nationalist agenda, focusing on issues such as immigration reform, deregulation, trade renegotiations, and tax cuts. Trump’s leadership style and communication methods, particularly his use of social media, were highly unconventional and often polarising. Following his 2020 electoral defeat to Joe Biden, Trump remained an influential figure within the Republican Party, continuing to shape American political discourse and mobilise a large base of supporters.


In [3]:
#NLP steps
# 1. Split sentences
# 2. Tokenization
# 3. Remove stop words
# 4. Lemmatization/Stemming
# 5. POS tagging
# 6. Named Entity Recognition
# 7. Dependency Parsing
# 8. Vectorization
# 9. Sentiment Analysis
# 10. Topic Modeling
# 11. Text Classification


In [4]:
#split sentences

nlp=spacy.load('en_core_web_lg')  #load the large English pipeline


doc=nlp(text) #process the text


#print the sentences
for sentence in doc.sents:
    print(sentence.text)
    print('\n')


Donald Trump is an American businessman, television personality, and politician who served as the 45th President of the United States from January 2017 to January 2021.


Before entering politics, he was best known for his career in real estate development, branding, and media, particularly as the chairman and president of The Trump Organization and the host of the reality TV show The Apprentice.


His presidency was marked by a populist and nationalist agenda, focusing on issues such as immigration reform, deregulation, trade renegotiations, and tax cuts.


Trump’s leadership style and communication methods, particularly his use of social media, were highly unconventional and often polarising.


Following his 2020 electoral defeat to Joe Biden, Trump remained an influential figure within the Republican Party, continuing to shape American political discourse and mobilise a large base of supporters.
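
If only sentence boundaries are needed, a trained pipeline may be more than is required. The sketch below assumes spaCy v3 and uses a blank English pipeline with the rule-based `sentencizer` component. The names `nlp_rule`, `sample` and `rule_sentences` are introduced here for illustration, and the sample is a shortened excerpt rather than the full passage. The sentencizer splits on punctuation only, so its boundaries can differ from those produced by the parser-based pipeline on harder text.

```python
import spacy

# Blank pipeline: tokenizer only, no trained model to download (assumes spaCy v3).
nlp_rule = spacy.blank("en")

# Rule-based sentence boundary detection on ., ! and ? (no dependency parser involved).
nlp_rule.add_pipe("sentencizer")

# Shortened excerpt, used here only to keep the sketch small.
sample = (
    "His presidency was marked by a populist and nationalist agenda. "
    "Trump remained an influential figure within the Republican Party."
)

rule_sentences = [sentence.text for sentence in nlp_rule(sample).sents]
for sentence_text in rule_sentences:
    print(sentence_text)
```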




In [5]:
#generate a list of sentences
sentences=list(doc.sents) #convert to list
print(sentences) #print all sentences

[Donald Trump is an American businessman, television personality, and politician who served as the 45th President of the United States from January 2017 to January 2021., Before entering politics, he was best known for his career in real estate development, branding, and media, particularly as the chairman and president of The Trump Organization and the host of the reality TV show The Apprentice., His presidency was marked by a populist and nationalist agenda, focusing on issues such as immigration reform, deregulation, trade renegotiations, and tax cuts., Trump’s leadership style and communication methods, particularly his use of social media, were highly unconventional and often polarising., Following his 2020 electoral defeat to Joe Biden, Trump remained an influential figure within the Republican Party, continuing to shape American political discourse and mobilise a large base of supporters.]


In [6]:
#tokenise
tokens = [token for token in doc]
print(tokens) #print the tokens

[Donald, Trump, is, an, American, businessman, ,, television, personality, ,, and, politician, who, served, as, the, 45th, President, of, the, United, States, from, January, 2017, to, January, 2021, ., Before, entering, politics, ,, he, was, best, known, for, his, career, in, real, estate, development, ,, branding, ,, and, media, ,, particularly, as, the, chairman, and, president, of, The, Trump, Organization, and, the, host, of, the, reality, TV, show, The, Apprentice, ., His, presidency, was, marked, by, a, populist, and, nationalist, agenda, ,, focusing, on, issues, such, as, immigration, reform, ,, deregulation, ,, trade, renegotiations, ,, and, tax, cuts, ., Trump, ’s, leadership, style, and, communication, methods, ,, particularly, his, use, of, social, media, ,, were, highly, unconventional, and, often, polarising, ., Following, his, 2020, electoral, defeat, to, Joe, Biden, ,, Trump, remained, an, influential, figure, within, the, Republican, Party, ,, continuing, to, shape, Ame

In [7]:
# for token in doc:
#     print(token.text, token.idx) #print the token and its index

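#get unique tokens
# NOTE: Token objects are distinguished by their position in the doc, so set(tokens) keeps every token;
# use {token.text for token in doc} to count distinct token strings instead
unique_tokens = set(tokens)
print(len(unique_tokens)) #prints the total number of tokens, not the number of distinct words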


doc.vocab.strings['television']

154


3403785101479364457
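
The count of 154 above equals the total number of tokens in the text, because a set of `Token` objects does not merge repeated words. A minimal sketch of counting distinct token strings follows, assuming spaCy v3 and a blank pipeline so that no trained model is needed. The names `nlp_tok`, `sample_doc`, `tv_doc` and `tv_hash` are hypothetical, and the sample phrase is made up for illustration. The last lines show the same kind of `StringStore` lookup as `doc.vocab.strings['television']`, including the reverse lookup from hash to string.

```python
import spacy

# Blank pipeline: tokenizer only (an assumption made to keep the sketch small).
nlp_tok = spacy.blank("en")
sample_doc = nlp_tok("the host of the show and the chairman")

# Token objects are distinguished by position, so a set of tokens keeps them all.
tokens_by_position = set(sample_doc)

# Comparing the text instead collapses the repeated "the".
unique_texts = {token.text for token in sample_doc}

print(len(sample_doc), len(tokens_by_position), len(unique_texts))

# The StringStore maps a string to its 64-bit hash and back again.
tv_doc = nlp_tok("television")
tv_hash = nlp_tok.vocab.strings["television"]
print(tv_hash == tv_doc[0].orth)
print(nlp_tok.vocab.strings[tv_hash])
```

Using `token.lower_` in place of `token.text` would additionally merge "The" and "the".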

In [8]:
#POS tagging
for token in doc:
    print(token.text, token.pos_) #print the token and its POS tag

Donald PROPN
Trump PROPN
is AUX
an DET
American ADJ
businessman NOUN
, PUNCT
television NOUN
personality NOUN
, PUNCT
and CCONJ
politician NOUN
who PRON
served VERB
as ADP
the DET
45th ADJ
President PROPN
of ADP
the DET
United PROPN
States PROPN
from ADP
January PROPN
2017 NUM
to ADP
January PROPN
2021 NUM
. PUNCT
Before ADP
entering VERB
politics NOUN
, PUNCT
he PRON
was AUX
best ADV
known VERB
for ADP
his PRON
career NOUN
in ADP
real ADJ
estate NOUN
development NOUN
, PUNCT
branding NOUN
, PUNCT
and CCONJ
media NOUN
, PUNCT
particularly ADV
as SCONJ
the DET
chairman NOUN
and CCONJ
president NOUN
of ADP
The DET
Trump PROPN
Organization PROPN
and CCONJ
the DET
host NOUN
of ADP
the DET
reality NOUN
TV NOUN
show NOUN
The DET
Apprentice PROPN
. PUNCT
His PRON
presidency NOUN
was AUX
marked VERB
by ADP
a DET
populist ADJ
and CCONJ
nationalist ADJ
agenda NOUN
, PUNCT
focusing VERB
on ADP
issues NOUN
such ADJ
as ADP
immigration NOUN
reform NOUN
, PUNCT
deregulation NOUN
, PUNCT
trade NOUN
re

In [9]:
#filter only verbs

data=[]
for token in doc:
    if token.pos_ in ['VERB']:
        data.append((token.text, token.pos_))

print(data)

[('served', 'VERB'), ('entering', 'VERB'), ('known', 'VERB'), ('marked', 'VERB'), ('focusing', 'VERB'), ('polarising', 'VERB'), ('Following', 'VERB'), ('remained', 'VERB'), ('continuing', 'VERB'), ('shape', 'VERB'), ('mobilise', 'VERB')]
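
The same filter can be applied to other tag sets, such as nouns and pronouns. The sketch below is plain Python: `filter_by_pos` is a hypothetical helper, and `tagged` holds (text, POS) pairs copied from the first lines of the POS-tagging output above, so no model is needed to run it.

```python
# (text, POS) pairs copied from the start of the POS-tagging output above.
tagged = [
    ("Donald", "PROPN"), ("Trump", "PROPN"), ("is", "AUX"), ("an", "DET"),
    ("American", "ADJ"), ("businessman", "NOUN"), (",", "PUNCT"),
    ("television", "NOUN"), ("personality", "NOUN"), (",", "PUNCT"),
    ("and", "CCONJ"), ("politician", "NOUN"), ("who", "PRON"), ("served", "VERB"),
]


def filter_by_pos(pairs, wanted):
    """Keep only the (text, pos) pairs whose POS tag is in `wanted`."""
    wanted = set(wanted)
    return [(text, pos) for text, pos in pairs if pos in wanted]


verbs = filter_by_pos(tagged, {"VERB"})
nouns_and_pronouns = filter_by_pos(tagged, {"NOUN", "PRON"})

print(verbs)
print(nouns_and_pronouns)
```

With a processed document, the input would be `[(token.text, token.pos_) for token in doc]`.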


In [10]:
#Named entity recognition
for ent in doc.ents:
    print(ent.text, ent.label_) #print the entity and its label

Donald Trump PERSON
American NORP
45th ORDINAL
the United States GPE
January 2017 to January 2021 DATE
The Trump Organization ORG
Apprentice ORG
Trump ORG
2020 DATE
Joe Biden PERSON
Trump ORG
the Republican Party ORG
American NORP
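
In the output above, the standalone mentions of "Trump" are labelled ORG rather than PERSON. One possible way to correct this kind of miss is a rule-based `entity_ruler`. The sketch below assumes spaCy v3 and, to stay self-contained, runs the ruler on a blank pipeline with a made-up sentence. The patterns and the names `nlp_rules`, `ruler`, `rule_doc` and `found` are illustrative, not part of the notebook.

```python
import spacy

# Blank pipeline so the sketch runs without a trained model (assumes spaCy v3).
nlp_rules = spacy.blank("en")

# The entity ruler assigns labels from explicit patterns.
ruler = nlp_rules.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "The Trump Organization"},
    {"label": "PERSON", "pattern": "Trump"},
])

rule_doc = nlp_rules("The Trump Organization was chaired by Trump.")
found = [(ent.text, ent.label_) for ent in rule_doc.ents]
print(found)
```

When matches overlap, the ruler keeps the longer one, so "The Trump Organization" stays a single ORG entity. In a trained pipeline, the ruler would typically be added with `nlp.add_pipe("entity_ruler", before="ner")` so that the statistical component works around the rule-based entities.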


In [11]:
#lemmatisation
for token in doc:
    print(token.text, token.lemma_) #print the token and its lemma

Donald Donald
Trump Trump
is be
an an
American american
businessman businessman
, ,
television television
personality personality
, ,
and and
politician politician
who who
served serve
as as
the the
45th 45th
President President
of of
the the
United United
States States
from from
January January
2017 2017
to to
January January
2021 2021
. .
Before before
entering enter
politics politic
, ,
he he
was be
best well
known know
for for
his his
career career
in in
real real
estate estate
development development
, ,
branding branding
, ,
and and
media medium
, ,
particularly particularly
as as
the the
chairman chairman
and and
president president
of of
The the
Trump Trump
Organization Organization
and and
the the
host host
of of
the the
reality reality
TV tv
show show
The the
Apprentice Apprentice
. .
His his
presidency presidency
was be
marked mark
by by
a a
populist populist
and and
nationalist nationalist
agenda agenda
, ,
focusing focus
on on
issues issue
such such
as as
immigration immig

In [12]:
#dependency parsing

sent1=list(doc.sents)[0]
print(sent1)

for token in sent1:
    print(token.text, token.dep_, token.head.text, token.head.pos_,[child for child in token.children]) #print the token and its dependency


Donald Trump is an American businessman, television personality, and politician who served as the 45th President of the United States from January 2017 to January 2021.
Donald compound Trump PROPN []
Trump nsubj is AUX [Donald]
is ROOT is AUX [Trump, businessman, .]
an det businessman NOUN []
American amod businessman NOUN []
businessman attr is AUX [an, American, ,, personality]
, punct businessman NOUN []
television compound personality NOUN []
personality conj businessman NOUN [television, ,, and, politician]
, punct personality NOUN []
and cc personality NOUN []
politician conj personality NOUN [served]
who nsubj served VERB []
served relcl politician NOUN [who, as, from]
as prep served VERB [President]
the det President PROPN []
45th amod President PROPN []
President pobj as ADP [the, 45th, of]
of prep President PROPN [States]
the det States PROPN []
United compound States PROPN []
States pobj of ADP [the, United]
from prep served VERB [January, to]
January pobj from ADP [2017]
20

In [None]:
#visualise with displacy
# displacy.serve() raised an ImportError here ("cannot import name 'display' from 'IPython.core.display'"),
# so render to an HTML string instead (jupyter=False) and display it with IPython directly
from IPython.display import HTML, display
display(HTML(spacy.displacy.render(sent1, style='dep', jupyter=False, options={'distance': 120})))