In [None]:
# Download the small English model once (shell command):
# python -m spacy download en_core_web_sm

In [5]:
import spacy
nlp = spacy.load('en_core_web_sm')

introduction_text = ("""Perhaps one of the most significant advances made by Arabic mathematics began at this time with the work of al-Khwarizmi,namely the beginnings of algebra. It is important to understand just how significant this new idea was. It was a revolutionary move away from the Greek concept of mathematics which was essentially geometry. Algebra was a unifying theory which allowed rational numbers, irrational numbers, geometrical magnitudes, etc., to all be treated as "algebraic objects". It gave mathematics a whole new development path so much broader in concept to that which had existed before, and provided a vehicle for future development of the subject. Another important aspect of the introduction of algebraic ideas was that it allowed mathematics to be applied to itself in a way which had not happened before.""")
about_doc = nlp(introduction_text)


## Split Paragraph into Words

In [6]:
# Extract tokens for the given doc
print([token.text for token in about_doc])

['Perhaps', 'one', 'of', 'the', 'most', 'significant', 'advances', 'made', 'by', 'Arabic', 'mathematics', 'began', 'at', 'this', 'time', 'with', 'the', 'work', 'of', 'al', '-', 'Khwarizmi', ',', 'namely', 'the', 'beginnings', 'of', 'algebra', '.', 'It', 'is', 'important', 'to', 'understand', 'just', 'how', 'significant', 'this', 'new', 'idea', 'was', '.', 'It', 'was', 'a', 'revolutionary', 'move', 'away', 'from', 'the', 'Greek', 'concept', 'of', 'mathematics', 'which', 'was', 'essentially', 'geometry', '.', 'Algebra', 'was', 'a', 'unifying', 'theory', 'which', 'allowed', 'rational', 'numbers', ',', 'irrational', 'numbers', ',', 'geometrical', 'magnitudes', ',', 'etc', '.', ',', 'to', 'all', 'be', 'treated', 'as', '"', 'algebraic', 'objects', '"', '.', 'It', 'gave', 'mathematics', 'a', 'whole', 'new', 'development', 'path', 'so', 'much', 'broader', 'in', 'concept', 'to', 'that', 'which', 'had', 'existed', 'before', ',', 'and', 'provided', 'a', 'vehicle', 'for', 'future', 'development', 
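
To make the token boundaries in the output above easier to reason about, here is a minimal sketch that approximates them with a regular expression from the standard library. It is not spaCy's tokenizer, which applies language-specific rules and exceptions; `simple_tokenize` is a hypothetical helper that only separates runs of word characters from single punctuation marks.

```python
import re

def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    # \w+ matches a word; [^\w\s] matches one non-space, non-word character
    return re.findall(r"\w+|[^\w\s]", text)

# Fragment of the paragraph used in this notebook
sample = "the work of al-Khwarizmi,namely the beginnings of algebra."
tokens = simple_tokenize(sample)
print(tokens)
```

On this fragment the sketch splits `al-Khwarizmi,namely` into `al`, `-`, `Khwarizmi`, `,`, `namely`, the same boundaries as in the spaCy output. It will diverge on other input such as contractions, where spaCy splits "don't" into "do" and "n't".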

## Split Paragraph into Sentences

In [7]:
# about_doc.sents is a generator of Span objects, one per detected sentence
sentences = list(about_doc.sents)
for sentence in sentences:
    print(sentence)

Perhaps one of the most significant advances made by Arabic mathematics began at this time with the work of al-Khwarizmi,namely the beginnings of algebra.
It is important to understand just how significant this new idea was.
It was a revolutionary move away from the Greek concept of mathematics which was essentially geometry.
Algebra was a unifying theory which allowed rational numbers, irrational numbers, geometrical magnitudes, etc., to all be treated as "algebraic objects".
It gave mathematics a whole new development path so much broader in concept to that which had existed before, and provided a vehicle for future development of the subject.
Another important aspect of the introduction of algebraic ideas was that it allowed mathematics to be applied to itself in a way which had not happened before.


## Tokenization

In [8]:
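# Inspect a few attributes of the first 20 tokens:
# idx is the character offset in the original text, text_with_ws keeps
# any trailing whitespace, and shape_ abstracts letters to x/X and digits to d
for token in about_doc[:20]:
    print(token, token.idx, token.text_with_ws,
          token.is_alpha, token.is_punct, token.is_space,
          token.shape_, token.is_stop)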

Perhaps 0 Perhaps  True False False Xxxxx True
one 8 one  True False False xxx True
of 12 of  True False False xx True
the 15 the  True False False xxx True
most 19 most  True False False xxxx True
significant 24 significant  True False False xxxx False
advances 36 advances  True False False xxxx False
made 45 made  True False False xxxx True
by 50 by  True False False xx True
Arabic 53 Arabic  True False False Xxxxx False
mathematics 60 mathematics  True False False xxxx False
began 72 began  True False False xxxx False
at 78 at  True False False xx True
this 81 this  True False False xxxx True
time 86 time  True False False xxxx False
with 91 with  True False False xxxx True
the 96 the  True False False xxx True
work 100 work  True False False xxxx False
of 105 of  True False False xx True
al 108 al True False False xx False
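
The `idx` and `shape_` columns above can be reproduced by hand, which helps when reading them. The following is a sketch in plain Python, not spaCy's own code: `char_offsets` and `word_shape` are hypothetical helpers, and the shape rule (letters become `x` or `X`, digits become `d`, runs of the same symbol are capped at four) is an assumption inferred from outputs such as `Xxxxx` and `xxxx`.

```python
def word_shape(text):
    """Approximate shape_: letters -> x/X, digits -> d, runs capped at 4."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        # Count how long the current run of identical shape symbols is
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        if run < 4:
            shape.append(shape_char)
    return "".join(shape)

def char_offsets(text):
    """Return (word, start offset) pairs for whitespace-separated words."""
    offsets = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        offsets.append((word, start))
        pos = start + len(word)
    return offsets

text = "Perhaps one of the most significant advances"
for word, start in char_offsets(text):
    print(word, start, word_shape(word))
```

For these seven words the printed offsets and shapes agree with the first rows of the table above, for example `significant 24 xxxx`.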


## Listing Stop Words

In [9]:
from spacy.lang.en.stop_words import STOP_WORDS

# STOP_WORDS is a set, so which ten words get printed is arbitrary
spacy_stopwords = STOP_WORDS
for stop_word in list(spacy_stopwords)[:10]:
    print(stop_word)


upon
anyhow
’d
really
quite
besides
they
himself
anyone
where


## Remove stop words and punctuation


In [10]:
for token in about_doc:
    # is_alpha is False for punctuation, so no separate is_punct check is needed
    if token.is_alpha and not token.is_stop:
        print(token, end=" ")

significant advances Arabic mathematics began time work al Khwarizmi beginnings algebra important understand significant new idea revolutionary away Greek concept mathematics essentially geometry Algebra unifying theory allowed rational numbers irrational numbers geometrical magnitudes etc treated algebraic objects gave mathematics new development path broader concept existed provided vehicle future development subject important aspect introduction algebraic ideas allowed mathematics applied way happened 
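
The filter above boils down to a set-membership test plus an alphabetic check. Here is a minimal sketch of that logic in plain Python, applied to the third sentence of the paragraph. `SAMPLE_STOP_WORDS` is a hypothetical, hand-picked subset containing only the words that the output above shows being dropped from that sentence; it is not spaCy's full list.

```python
# Hypothetical mini stop list: words spaCy dropped from this sentence above
SAMPLE_STOP_WORDS = {"it", "was", "a", "move", "from", "the", "of", "which"}

# Tokens of the third sentence, as listed in the tokenization output
tokens = ["It", "was", "a", "revolutionary", "move", "away", "from", "the",
          "Greek", "concept", "of", "mathematics", "which", "was",
          "essentially", "geometry", "."]

# Keep alphabetic tokens whose lowercase form is not a stop word
kept = [t for t in tokens if t.isalpha() and t.lower() not in SAMPLE_STOP_WORDS]
print(" ".join(kept))
```

Lowercasing before the lookup is a choice made for this sketch, so that the sentence-initial "It" is matched against the lowercase entry.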

## Lemmatisation

In [11]:
for token in about_doc:
    print(token, ">>>>", token.lemma_)

Perhaps >>>> perhaps
one >>>> one
of >>>> of
the >>>> the
most >>>> most
significant >>>> significant
advances >>>> advance
made >>>> make
by >>>> by
Arabic >>>> arabic
mathematics >>>> mathematic
began >>>> begin
at >>>> at
this >>>> this
time >>>> time
with >>>> with
the >>>> the
work >>>> work
of >>>> of
al >>>> al
- >>>> -
Khwarizmi >>>> Khwarizmi
, >>>> ,
namely >>>> namely
the >>>> the
beginnings >>>> beginning
of >>>> of
algebra >>>> algebra
. >>>> .
It >>>> it
is >>>> be
important >>>> important
to >>>> to
understand >>>> understand
just >>>> just
how >>>> how
significant >>>> significant
this >>>> this
new >>>> new
idea >>>> idea
was >>>> be
. >>>> .
It >>>> it
was >>>> be
a >>>> a
revolutionary >>>> revolutionary
move >>>> move
away >>>> away
from >>>> from
the >>>> the
Greek >>>> greek
concept >>>> concept
of >>>> of
mathematics >>>> mathematic
which >>>> which
was >>>> be
essentially >>>> essentially
geometry >>>> geometry
. >>>> .
Algebra >>>> Algebra
was >>>> be
a >>>> a
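
One practical use of lemmas is to group different inflected forms under a single key. The sketch below does this with the standard library only; the `(token, lemma)` pairs are copied from the output above rather than computed, and `forms_by_lemma` is a name introduced here for illustration.

```python
from collections import defaultdict

# (token, lemma) pairs copied from the lemmatisation output above
pairs = [
    ("advances", "advance"),
    ("made", "make"),
    ("began", "begin"),
    ("beginnings", "beginning"),
    ("is", "be"),
    ("was", "be"),
    ("mathematics", "mathematic"),
]

# Collect every surface form that maps to the same lemma
forms_by_lemma = defaultdict(set)
for token, lemma in pairs:
    forms_by_lemma[lemma].add(token)

for lemma, forms in sorted(forms_by_lemma.items()):
    print(lemma, "<-", sorted(forms))
```

Here "is" and "was" both end up under `be`. The pair `mathematics` / `mathematic` is kept exactly as the model printed it, which shows that lemmatizer output is not always the dictionary form one might expect.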


## Part of Speech Tagging

In [12]:
# tag_ is the fine-grained tag, pos_ the coarse-grained part of speech
for token in about_doc:
    print(token, token.tag_, token.pos_, spacy.explain(token.tag_))

Perhaps RB ADV adverb
one CD NUM cardinal number
of IN ADP conjunction, subordinating or preposition
the DT DET determiner
most RBS ADV adverb, superlative
significant JJ ADJ adjective
advances NNS NOUN noun, plural
made VBN VERB verb, past participle
by IN ADP conjunction, subordinating or preposition
Arabic JJ ADJ adjective
mathematics NNS NOUN noun, plural
began VBD VERB verb, past tense
at IN ADP conjunction, subordinating or preposition
this DT DET determiner
time NN NOUN noun, singular or mass
with IN ADP conjunction, subordinating or preposition
the DT DET determiner
work NN NOUN noun, singular or mass
of IN ADP conjunction, subordinating or preposition
al NNP PROPN noun, proper singular
- HYPH PUNCT punctuation mark, hyphen
Khwarizmi NNP PROPN noun, proper singular
, , PUNCT punctuation mark, comma
namely RB ADV adverb
the DT DET determiner
beginnings NNS NOUN noun, plural
of IN ADP conjunction, subordinating or preposition
algebra NNS NOUN noun, plural
. . PUNCT punctuation ma
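
The output above pairs each fine-grained `tag_` with a coarser `pos_`. As a sketch of how the two levels relate, the following plain-Python snippet groups a few `(token, tag, pos)` triples copied from that output; nothing is computed by spaCy here, and `tags_by_pos` is a name introduced for illustration.

```python
from collections import defaultdict

# (token, tag_, pos_) triples copied from the tagging output above
tagged = [
    ("Perhaps", "RB", "ADV"),
    ("most", "RBS", "ADV"),
    ("advances", "NNS", "NOUN"),
    ("time", "NN", "NOUN"),
    ("made", "VBN", "VERB"),
    ("began", "VBD", "VERB"),
    ("Khwarizmi", "NNP", "PROPN"),
]

# Collect the fine-grained tags observed under each coarse part of speech
tags_by_pos = defaultdict(set)
for token, tag, pos in tagged:
    tags_by_pos[pos].add(tag)

for pos, tags in sorted(tags_by_pos.items()):
    print(pos, sorted(tags))
```

Several fine-grained tags fall under one coarse label, for example `NN` and `NNS` under `NOUN`, so filtering on `pos_` is usually enough when only the broad word class matters.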