
# INTRODUCTION TO NATURAL LANGUAGE PROCESSING WITH POLYGLOT
Polyglot is a natural language processing pipeline that supports massively multilingual applications. Its learning curve is similar to TextBlob's, which makes it easy to pick up quickly.

In [0]:
!pip install polyglot
!pip install PyICU
!pip install pycld2
!pip install morfessor

In [0]:
!polyglot download embeddings2.en
!polyglot download ner2.en
!polyglot download sentiment2.en
!polyglot download pos2.en
!polyglot download morph2.en
!polyglot download transliteration2.ar
!polyglot download embeddings2.la

# 1. TOKENIZATION

In [0]:
# Load packages
import polyglot
from polyglot.text import Text,Word

In [12]:
# Word Tokens
docx = Text(u"He likes reading and painting")
docx

Text("He likes reading and painting")

In [13]:
docx.words

WordList(['He', 'likes', 'reading', 'and', 'painting'])

In [15]:
#another example
docx2 = Text(u"He exclaimed, 'what're you doing? Reading?'.")
docx2.words

WordList(['He', 'exclaimed', ',', "'", "what're", 'you', 'doing', '?', 'Reading', '?', "'", '.'])
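To show what word tokenization has to do, here is a toy regular-expression tokenizer in plain Python. It is an illustration only, not polyglot's implementation, and the token pattern `TOKEN_RE` is our own assumption. For these two inputs it happens to produce the same tokens as `docx.words` and `docx2.words` above.

```python
import re

# A "word" is a run of word characters, optionally joined by apostrophes
# (so "what're" stays one token); any other non-space character, such as
# punctuation or a quote mark, becomes a token of its own.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(text):
    """Return the list of word and punctuation tokens in `text`."""
    return TOKEN_RE.findall(text)


print(tokenize("He likes reading and painting"))
print(tokenize("He exclaimed, 'what're you doing? Reading?'."))
```

A fixed pattern like this breaks down quickly on other languages and scripts, which is why a multilingual library relies on a more general segmentation approach.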

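### End marks: `?`, `.` and `!`
Sentences are split at end marks such as `?`, `.` and `!`. In the first example below, the period in `painting.He` is not treated as a boundary, most likely because no space follows it. In the second example, the colon after `doing` does not end a sentence.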

In [16]:
# Sentence tokens
docx3 = Text(u"He likes reading and painting.He exclaimed, 'what're you doing? Reading?'.")
docx3.sentences

[Sentence("He likes reading and painting.He exclaimed, 'what're you doing?"),
 Sentence("Reading?'.")]

In [17]:
# Sentence tokens
docx3 = Text(u"He likes reading and painting.He exclaimed! 'what're you doing: Reading?'.")
docx3.sentences

[Sentence("He likes reading and painting.He exclaimed!"),
 Sentence("'what're you doing: Reading?'.")]
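The behavior in both outputs above can be approximated with a one-line rule: split after `.`, `?` or `!` only when whitespace follows. This plain-Python sketch is an assumption-based approximation, not polyglot's sentence segmenter, but for these two inputs it yields the same sentences.

```python
import re


def split_sentences(text):
    """Split `text` after '.', '?' or '!' whenever whitespace follows.

    The lookbehind keeps the end mark attached to its sentence; an end
    mark with no following whitespace (as in "painting.He") is not a split.
    """
    return re.split(r"(?<=[.?!])\s+", text.strip())


for s in split_sentences(
    "He likes reading and painting.He exclaimed! 'what're you doing: Reading?'."
):
    print(s)
```

A rule this simple will also split after abbreviations such as "Dr. Smith", one reason real segmenters use richer rules.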

# 2. PARTS OF SPEECH TAGGING

In [24]:
docx.pos_tags

[('He', 'PRON'),
 ('likes', 'VERB'),
 ('reading', 'VERB'),
 ('and', 'CONJ'),
 ('painting', 'NOUN')]
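Because `pos_tags` returns a plain list of `(word, tag)` tuples, ordinary Python is enough to post-process it. The sketch below copies the output above as literal data so it runs without polyglot; the helper name `words_with_tag` is our own.

```python
from collections import Counter

# The value of docx.pos_tags shown above, copied here as plain data so the
# example runs without polyglot; with polyglot you could use docx.pos_tags.
pos_tags = [("He", "PRON"), ("likes", "VERB"), ("reading", "VERB"),
            ("and", "CONJ"), ("painting", "NOUN")]


def words_with_tag(tagged, tag):
    """Return, in order, the words whose tag equals `tag`."""
    return [word for word, t in tagged if t == tag]


tag_counts = Counter(t for _, t in pos_tags)
print(words_with_tag(pos_tags, "VERB"))  # ['likes', 'reading']
print(tag_counts.most_common(1))         # [('VERB', 2)]
```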

# 3. LANGUAGE DETECTION

In [25]:
docx.language.name

'English'

In [26]:
docx.language.code

'en'

In [27]:
from polyglot.detect  import Detector
en_text = "He is a student "
fr_text = "Il est un étudiant"
ru_text = "Он студент"
detect_en = Detector(en_text)
detect_fr = Detector(fr_text)
detect_ru = Detector(ru_text)
print(detect_en.language)
print(detect_fr.language)
print(detect_ru.language)

Detector is not able to detect the language reliably.
Detector is not able to detect the language reliably.


name: English     code: en       confidence:  94.0 read bytes:   704
name: French      code: fr       confidence:  95.0 read bytes:   870
name: Serbian     code: sr       confidence:  95.0 read bytes:   614
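The outputs above show the limits of detection on very short inputs: the detector printed reliability warnings, and the Russian sentence `Он студент` was reported as Serbian. One contributing factor is that both languages can be written in Cyrillic, so the script alone cannot separate them. The sketch below uses only the standard-library `unicodedata` module to find the dominant script of a text. It is not how polyglot's `Detector` works, and treating the first word of each character's Unicode name as its script is a rough assumption of ours.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script label among the letters of `text`.

    The label is the first word of each letter's Unicode name, e.g.
    'LATIN' or 'CYRILLIC'; returns None if `text` has no letters.
    """
    scripts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    )
    return scripts.most_common(1)[0][0] if scripts else None


for sample in ["He is a student ", "Il est un étudiant", "Он студент"]:
    print(sample, "->", dominant_script(sample))
```

Script narrows the candidates (English and French both come out as `LATIN`), but choosing among languages that share a script needs statistical evidence, which short sentences provide little of.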


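# 4. SENTIMENT ANALYSIS
Sentiment is expressed as a polarity score: positive values indicate positive sentiment and negative values indicate negative sentiment. The two examples below score 1.0 and -1.0.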

In [0]:
!polyglot download sentiment2.en

In [30]:
print(docx)
docx.polarity

He likes reading and painting


1.0

In [31]:
docx4 = Text(u"He hates reading and playing")
print(docx4)
docx4.polarity

He hates reading and playing


-1.0
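To make the idea of a polarity score concrete, here is a minimal sketch in plain Python. It assumes a word-level lexicon (the hypothetical `LEXICON` below) that scores words as +1, -1 or 0, and defines a text's polarity as the mean of its non-zero word scores. That averaging rule is an assumption chosen for illustration, not a statement of polyglot's internals, although it reproduces the 1.0 and -1.0 results above.

```python
# Hypothetical word-level polarity lexicon: +1 positive, -1 negative;
# words not listed are treated as neutral (0).
LEXICON = {"likes": 1, "loves": 1, "hates": -1, "dislikes": -1}


def polarity(text, lexicon=LEXICON):
    """Average the non-neutral word polarities; 0.0 if there are none."""
    scores = [lexicon.get(w.lower(), 0) for w in text.split()]
    scores = [s for s in scores if s != 0]
    return sum(scores) / len(scores) if scores else 0.0


print(polarity("He likes reading and painting"))        # 1.0
print(polarity("He hates reading and playing"))         # -1.0
print(polarity("He likes reading but hates painting"))  # 0.0
```

Ignoring neutral words keeps a single opinion word from being diluted by the rest of the sentence, which is why both short examples land at the extremes.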

# 5. NAMED ENTITIES

In [33]:
!polyglot download ner2.en

[polyglot_data] Downloading package ner2.en to /root/polyglot_data...


In [34]:
docx5 = Text(u"John Jones was a FBI detector")
docx5.entities

[I-PER(['John', 'Jones']), I-ORG(['FBI'])]
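Each entity in the output is a chunk of one or more consecutive tokens that share a label such as `I-PER` or `I-ORG`. The sketch below shows one way such chunks can be formed from per-token labels using `itertools.groupby`. The per-token labels, including `O` for tokens outside any entity, are hypothetical inputs we supply, not values read from polyglot's model.

```python
from itertools import groupby

# Hypothetical per-token labels for the sentence above ("O" = no entity).
tagged = [("John", "I-PER"), ("Jones", "I-PER"), ("was", "O"),
          ("a", "O"), ("FBI", "I-ORG"), ("detector", "O")]


def chunk_entities(tagged_tokens):
    """Group consecutive tokens that share the same non-'O' label."""
    chunks = []
    for label, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label != "O":
            chunks.append((label, [word for word, _ in group]))
    return chunks


print(chunk_entities(tagged))
# [('I-PER', ['John', 'Jones']), ('I-ORG', ['FBI'])]
```

One limitation of this simple grouping: two different entities of the same type standing side by side would be merged into one chunk. Tagging schemes with a separate `B-` (begin) prefix exist to mark where a new entity starts.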

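# 6. MORPHOLOGY
A morpheme is the smallest meaningful grammatical unit of a language. A morpheme may or may not stand alone, whereas a word, by definition, is freestanding. For example, `preprocessing` breaks down into the morphemes `pre`, `process` and `ing`.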

In [36]:
!polyglot download morph2.en

[polyglot_data] Downloading package morph2.en to
[polyglot_data]     /root/polyglot_data...


In [37]:
docx6 = Text(u"preprocessing")
docx6.morphemes

Detector is not able to detect the language reliably.


WordList(['pre', 'process', 'ing'])
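The notebook installs `morfessor`, an unsupervised morphological segmentation package, which polyglot uses for this feature. The sketch below takes a much simpler route and is not Morfessor's algorithm: given a hypothetical, hand-written morpheme inventory (`LEXICON`), it uses dynamic programming to split a word into the fewest known pieces.

```python
from functools import lru_cache

# Hypothetical morpheme inventory; a real model learns its units from data.
LEXICON = {"pre", "pro", "process", "cess", "ing", "re", "paint", "read"}


def segment(word, lexicon=LEXICON):
    """Split `word` into the fewest lexicon pieces, or return None."""

    @lru_cache(maxsize=None)
    def best(start):
        # Best segmentation of word[start:] as a tuple, or None if impossible.
        if start == len(word):
            return ()
        options = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in lexicon:
                rest = best(end)
                if rest is not None:
                    options.append((piece,) + rest)
        return min(options, key=len) if options else None

    result = best(0)
    return list(result) if result is not None else None


print(segment("preprocessing"))  # ['pre', 'process', 'ing']
print(segment("painting"))       # ['paint', 'ing']
```

Preferring the fewest pieces picks `process` over `pro` + `cess`; caching each suffix's result keeps the search from re-solving the same subproblem.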

# 7. TRANSLITERATION

In [39]:
!polyglot download transliteration2.fr

[polyglot_data] Downloading package transliteration2.fr to
[polyglot_data]     /root/polyglot_data...


In [40]:
# Load a transliterator from English (source) to French (target)
from polyglot.transliteration import Transliterator
translit = Transliterator(source_lang='en',target_lang='fr')
translit.transliterate(u"working")                          

'working'
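The call above returns `working` unchanged, most likely because English and French both use the Latin script, so there is nothing to convert. Transliteration only becomes visible when the target language uses a different script, such as Arabic (the `transliteration2.ar` model downloaded at the top). The toy sketch below shows the basic idea of mapping one script onto another with a hand-written, hypothetical Latin-to-Cyrillic table and longest-match lookup, so that digraphs like `sh` map to a single letter. Polyglot instead downloads a model per target language, so treat this as an illustration only.

```python
# Hypothetical Latin-to-Cyrillic table: single letters paired position by
# position, plus a few digraphs that map to one Cyrillic letter.
TABLE = dict(zip("abvgdeziklmnoprstuf", "абвгдезиклмнопрстуф"))
TABLE.update({"sh": "ш", "ch": "ч", "zh": "ж"})


def transliterate(word, table=TABLE):
    """Map `word` through `table`, trying the longest key first."""
    max_len = max(len(key) for key in table)
    out, i = [], 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in table:
                out.append(table[piece])
                i += size
                break
        else:
            out.append(word[i])  # no mapping: keep the character unchanged
            i += 1
    return "".join(out)


print(transliterate("privet"))  # привет
print(transliterate("shum"))    # шум
```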