# 11. Part of Speech Tagging and Lemmatisation

### Exercise 11.1

Create a list containing the unique adjectives that occur in *Pride and Prejudice*.

In [1]:
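import os
from collections import Counter

from nltk import word_tokenize, pos_tag
from text_mining import *  # provides remove_punctuation()

# File name as it is listed in the Corpus directory
path = os.path.join('Corpus', 'PrideandPrejudice.txt')

with open(path, encoding='utf-8') as file:
    full_text = file.read()

words = word_tokenize(full_text)
words = remove_punctuation(words)
pos = pos_tag(words)

# Penn Treebank codes for adjectives: base, comparative and superlative
adj_codes = ['JJ', 'JJR', 'JJS']

adjectives = []
for word, tag in pos:
    if tag in adj_codes:
        adjectives.append(word)

# The list of unique adjectives the exercise asks for
unique_adjectives = sorted(set(adjectives))

# The 20 most frequent adjectives, as a quick check of the result
freq = Counter(adjectives)

for word, count in freq.most_common(20):
    print(f'{word} => {count}')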
 

such => 285
other => 208
much => 204
little => 179
own => 179
good => 172
more => 155
great => 141
young => 126
last => 118
many => 115
first => 109
dear => 95
sure => 92
happy => 80
few => 71
same => 68
next => 67
least => 60
better => 59


### Exercise 11.2

Stephen King is [reputed to have said](https://www.goodreads.com/quotes/430289-i-believe-the-road-to-hell-is-paved-with-adverbs) that “the road to hell is paved with adverbs”, and many style guides similarly advise writers to avoid adverbs, especially those ending in '-ly'.

Can you calculate, for each text in the corpus, the number of adverbs ending in '-ly', measured as a percentage of the total number of words?

In [2]:
import os
from collections import Counter

from nltk import word_tokenize, pos_tag
from text_mining import *  # provides remove_punctuation()

directory = 'Corpus'
files = os.listdir(directory)

# Penn Treebank codes for adverbs: base, comparative and superlative
adverb_codes = ['RB', 'RBR', 'RBS']

for filename in files:
    print(f"\n{filename}")
    path = os.path.join(directory, filename)

    with open(path, encoding='utf-8') as fh:
        full_text = fh.read()

    words = word_tokenize(full_text.lower())
    words = remove_punctuation(words)
    nr_words = len(words)
    pos = pos_tag(words)

    # Collect every token tagged as an adverb that ends in '-ly'
    adverbs = []
    ly_adverbs = 0
    for word, tag in pos:
        if tag in adverb_codes and word.endswith('ly'):
            adverbs.append(word)
            ly_adverbs += 1

    freq = Counter(adverbs)
        
    print(f"{ly_adverbs} adverbs ending in '-ly' in total.")
    print(f"This is {round(100 * ly_adverbs / nr_words, 2)}% of all the words")
    
    number = 15
    if ly_adverbs>0:
        print(f"{number} most frequent adverbs:")
        for word,count in freq.most_common(number):
            print(f'{word} => {count}')


sonnet116.txt
0 adverbs ending in '-ly' in total.
This is 0.0% of all the words

Ullyses.txt
3179 adverbs ending in '-ly' in total.
This is 1.2% of all the words
15 most frequent adverbs:
only => 212
slowly => 71
molly => 70
simply => 57
quickly => 54
really => 49
milly => 42
suddenly => 39
gently => 38
lovely => 36
probably => 34
quietly => 33
softly => 33
loudly => 29
certainly => 29

BraveNewWorld.txt
1203 adverbs ending in '-ly' in total.
This is 1.95% of all the words
15 most frequent adverbs:
only => 89
suddenly => 81
really => 50
slowly => 30
actually => 21
simply => 13
absolutely => 11
simultaneously => 10
merely => 10
beastly => 10
finally => 10
carefully => 9
particularly => 9
hardly => 9
perfectly => 8

PrideandPrejudice.txt
1891 adverbs ending in '-ly' in total.
This is 1.57% of all the words
15 most frequent adverbs:
only => 190
really => 89
certainly => 70
immediately => 61
perfectly => 48
hardly => 46
scarcely => 44
merely => 33
particularly => 33
instantly =>
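
The percentage itself is the number of '-ly' adverbs divided by the total number of tokens, multiplied by 100. As a minimal sketch of that calculation in isolation: the function name `ly_adverb_percentage` and the tagged sample are assumptions made for illustration, with the sample standing in for real `pos_tag` output.

```python
ADVERB_TAGS = {'RB', 'RBR', 'RBS'}

def ly_adverb_percentage(tagged):
    """Return the share of '-ly' adverbs among all tagged tokens, in percent."""
    if not tagged:
        return 0.0
    hits = [word for word, tag in tagged
            if tag in ADVERB_TAGS and word.lower().endswith('ly')]
    return 100 * len(hits) / len(tagged)

# Hypothetical (word, tag) pairs standing in for pos_tag() output
sample = [('she', 'PRP'), ('spoke', 'VBD'), ('very', 'RB'), ('softly', 'RB'),
          ('and', 'CC'), ('left', 'VBD'), ('quickly', 'RB'), ('today', 'NN')]

# 2 of the 8 tokens are '-ly' adverbs; 'very' is an adverb but does not end in '-ly'
print(f"{ly_adverb_percentage(sample):.2f}%")  # 25.00%
```

Guarding against an empty token list avoids a division by zero for files that contain no words.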

### Exercise 11.3

Which text in the corpus has the highest number of modal verbs? The Penn Treebank code for 'modal auxiliaries' is MD.

In [3]:
import os
from collections import Counter

from nltk import word_tokenize, pos_tag
from text_mining import *  # provides remove_punctuation()

directory = 'Corpus'
files = os.listdir(directory)

for filename in files:
    print(f"\n{filename}")
    path = os.path.join(directory, filename)

    with open(path, encoding='utf-8') as fh:
        full_text = fh.read()

    words = word_tokenize(full_text.lower())
    words = remove_punctuation(words)
    nr_words = len(words)
    pos = pos_tag(words)

    # Keep tokens tagged MD. The length check drops fragments of one or two
    # characters (word_tokenize splits "can't" into "ca" and "n't", for instance)
    modal_verbs = []
    for word, tag in pos:
        if tag == 'MD' and len(word) > 2:
            modal_verbs.append(word)

    freq = Counter(modal_verbs)

    print(f"{len(modal_verbs)} modal verbs.")

    number = 10
    if len(modal_verbs) > 0:
        print(f"{number} most frequent modal verbs:")
        for word, count in freq.most_common(number):
            print(f'{word} => {count}')


sonnet116.txt
0 modal verbs.

Ullyses.txt
2018 modal verbs.
10 most frequent modal verbs:
would => 382
will => 340
could => 309
can => 276
must => 221
might => 176
may => 91
should => 82
shall => 66
ought => 61

BraveNewWorld.txt
518 modal verbs.
10 most frequent modal verbs:
would => 116
could => 116
can => 68
should => 47
must => 43
will => 37
might => 34
ought => 24
may => 18
shall => 15

PrideandPrejudice.txt
2892 modal verbs.
10 most frequent modal verbs:
could => 522
would => 468
will => 413
can => 320
must => 308
should => 252
might => 200
may => 194
shall => 162
ought => 45
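
The cell prints a total for each text but leaves the comparison to the reader. A minimal sketch of that last step, using the four totals printed above; the names `modal_counts` and `top_text` are chosen here for illustration.

```python
# Totals copied from the output above
modal_counts = {
    'sonnet116.txt': 0,
    'Ullyses.txt': 2018,
    'BraveNewWorld.txt': 518,
    'PrideandPrejudice.txt': 2892,
}

# max() with key=modal_counts.get returns the file name with the largest total
top_text = max(modal_counts, key=modal_counts.get)
print(f"{top_text}: {modal_counts[top_text]} modal verbs")
```

Raw totals favour longer texts, so dividing each total by the number of words in the text would arguably give a fairer comparison.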


### Exercise 11.4

Extract all the sentences from *BraveNewWorld.txt* that contain an adjective in the superlative form. Write these sentences into a file named 'sentences.txt'. The code for the words in this category is 'JJS'.

In [4]:
import os

from nltk import word_tokenize, sent_tokenize, pos_tag
from text_mining import *  # provides remove_punctuation()

path = os.path.join('Corpus', 'BraveNewWorld.txt')

with open(path, encoding='utf-8') as file:
    full_text = file.read()

sentences = sent_tokenize(full_text)

# Write every matching sentence to 'sentences.txt' and echo it on screen
with open('sentences.txt', 'w', encoding='utf-8') as out:
    for sentence in sentences:
        words = word_tokenize(sentence)
        words = remove_punctuation(words)
        pos = pos_tag(words)

        # Superlative adjectives carry the tag JJS
        adj = [word for word, tag in pos if tag == 'JJS']

        if len(adj) > 0:
            line = f"{sentence} [{'|'.join(adj)}]"
            out.write(line + '\n')
            print(line)


A few died; of the rest, the least susceptible divided into two; most put out four buds; some eight; all were returned to the incubators, where the buds began to develop; then, after two days, were suddenly chilled, chilled and checked. [least|most]
From the same ovary and with gametes of the same male to manufacture as many batches of identical twins as possible--that was the best (sadly a second best) that they could do. [best]
They could make sure of at least a hundred and fifty mature eggs within two years. [least]
Result: they're decanted as freemartins--structurally quite normal (except,' he had to admit, 'that they do have just the slightest tendency to grow beards), but sterile. [slightest]
'At least one glance at the Decanting Room,' he pleaded. [least]
and his students stepped into the nearest lift and were carried up to the fifth floor. [nearest]
NEO-PAVLOVIAN CONDITIONING ROOMS, announced the notice board. [notice]
The swiftest crawlers were already at their goal. [swiftest

What should have been the crowning moment of Bernard's whole career had turned out to be the moment of his greatest humiliation. [greatest]
It was in the lowest spirits that he taxied across to his work at the Conditioning Centre. [lowest]
'Listen to this,' was his answer; and unlocking the drawer in which he kept his mouse-eaten book, he opened and read:

    'Let the bird of loudest lay,
    On the sole Arabian tree,
    Herald sad and trumpet be...' 

Helmholtz listened with a growing excitement. [loudest]
'That old fellow,' he said, 'he makes our best propaganda technicians look absolutely silly.' [best]
Oh, if you only knew,' he whispered, and, venturing to raise his eyes to her face, 'Admired Lenina,' he went on, 'indeed the top of admiration, worth what's dearest in the world.' [dearest]
'Oh, you so perfect' (she was leaning towards him with parted lips), 'so perfect and so peerless are created' (nearer and nearer) 'of every creature's best.' [best]
'The murkiest den, the most o
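
The tagger is not always right: in the output above, 'notice' is reported as a superlative. A possible sanity check, offered as a heuristic and not as part of the original solution, is to keep only the JJS tokens that end in 'st', which covers regular '-est' forms as well as 'best', 'worst', 'most' and 'least'. The function name and the tagged sample below are assumptions for illustration.

```python
def plausible_superlative(word):
    """Heuristic: English superlatives end in 'st' (-est, best, worst, most, least)."""
    return word.lower().endswith('st')

# Hypothetical (word, tag) pairs standing in for pos_tag() output
tagged = [('notice', 'JJS'), ('nearest', 'JJS'), ('best', 'JJS'), ('board', 'NN')]

# Keep a token only if the tagger says JJS and the spelling agrees
adj = [word for word, tag in tagged if tag == 'JJS' and plausible_superlative(word)]
print(adj)  # ['nearest', 'best']
```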

### Exercise 11.5

Extract all the sentences from *Ullyses.txt* containing a form of the verb 'to see', in all tenses and conjugations, except the infinitive form. In other words, extract sentences containing forms such as 'seen', 'saw' or 'seeing', but not 'see'.


In [5]:
import os
import re

import nltk
from nltk import word_tokenize, sent_tokenize
from nltk.stem import WordNetLemmatizer
from text_mining import *  # provides remove_punctuation() and ptb_to_wordnet()

lemmatiser = WordNetLemmatizer()

path = os.path.join('Corpus', 'Ullyses.txt')

with open(path, encoding='utf-8') as file:
    full_text = file.read()

sentences = sent_tokenize(full_text)

for sentence in sentences:

    words = word_tokenize(sentence.lower())
    words = remove_punctuation(words)

    pos = nltk.pos_tag(words)

    hits = []

    for word, tag in pos:
        # Map the Penn Treebank tag to a WordNet category
        posTag = ptb_to_wordnet(tag)

        if re.search(r'\w+', posTag):
            # Lemmatise with the part of speech, so that 'saw' and 'seen' map to 'see'
            lemma = lemmatiser.lemmatize(word, posTag)
            if lemma == 'see':
                hits.append(word)
        else:
            # No WordNet category for this tag: fall back to a literal match
            if word == 'see':
                hits.append(word)

    if len(hits) > 0:
        print(f"{sentence}\n---")
        

As he and others see me.
---
—The rage of Caliban at not seeing his face in a mirror, he said.
---
If
Wilde were only alive to see you!
---
I see them pop off every day in the Mater and
Richmond and cut up into tripes in the dissectingroom.
---
—I see little hope, Stephen said, from her or from him.
---
I
don’t want to see my country fall into the hands of German jews either.
---
—We’ll see you again, Haines said, turning as Stephen walked up the
path and smiling at wild Irish.
---
But can those have been possible seeing
that they never were?
---
I don’t see anything.
---
See.
---
See.
---
You will see at the
next outbreak they will put an embargo on Irish cattle.
---
I have seen it coming these years.
---
And you can see
the darkness in their eyes.
---
You
see if you can get it into your two papers.
---
Shut your eyes and see.
---
Rhythm begins, you see.
---
_Basta!_ I will see if I
can see.
---
See now.
---
With beaded mitre and with crozier, stalled upon his throne, widower of
a wid

---
See this.
---
So in the future, the sister of the past, I may see myself as I sit
here now but by reflection from that which then I shall be.
---
—If you want to know what are the events which cast their shadow over
the hell of time of _King Lear, Othello, Hamlet, Troilus and Cressida,_
look to see when and how the shadow lifts.
---
_L’art d’être
grand_...

—Will he not see reborn in her, with the memory of his own youth added,
another image?
---
He will see in them
grotesque attempts of nature to foretell or to repeat himself.
---
He’ll see you after at
the D. B. C. He’s gone to Gill’s to buy Hyde’s _Lovesongs of Connacht_.
---
From the _Freeman._ He wants to see the files of the
_Kilkenny People_ for last year.
---
You know
Manningham’s story of the burgher’s wife who bade Dick Burbage to her
bed after she had seen him in _Richard III_ and how Shakespeare,
overhearing, without more ado about nothing, took the cow by the horns
and, when Burbage came knocking at the gate, answered 

---
Cissy’s quick motherwit guessed what was amiss and she whispered to Edy
Boardman to take him there behind the pushcar where the gentleman
couldn’t see and to mind he didn’t wet his new tan shoes.
---
Gerty MacDowell who was seated near her companions, lost in thought,
gazing far away into the distance was, in very truth, as fair a
specimen of winsome Irish girlhood as one could wish to see.
---
Had kind fate but willed her to be born a gentlewoman of high degree in
her own right and had she only received the benefit of a good education
Gerty MacDowell might easily have held her own beside any lady in the
land and have seen herself exquisitely gowned with jewels on her brow
and patrician suitors at her feet vying with one another to pay their
devoirs to her.
---
As for undies they were Gerty’s
chief care and who that knows the fluttering hopes and fears of sweet
seventeen (though Gerty would never see seventeen again) can find it in
his heart to blame her?
---
She had four dinky set

---
Never see them sit on a bench marked _Wet Paint_.
---
And Cissy and Tommy and Jacky ran out to see
and Edy after with the pushcar and then Gerty beyond the curve of the
rocks.
---
See!
---
Always see a fellow’s weak point in his
wife.
---
Dress up
and look and suggest and let you see and see more and defy you if
you’re a man to see that and, like a sneeze coming, legs, look, look
and if you have any guts in you.
---
See ourselves as others see us.
---
See.
---
A star I see.
---
All that old hill has seen.
---
Almost see them shimmering, kind of a bluey
white.
---
Colours depend on the light you see.
---
Stare the sun for example
like the eagle then look at a shoe see a blotch blob yellowish.
---
Say you never see them with
three colours.
---
Might be
the one bit me, come back to see.
---
Made me laugh to see.
---
Call to the hospital
to see.
---
Widower I hate to
see.
---
See him sometimes walking about trying to find out who played the
trick.
---
Bend, see my face there, dark
mirr

---
For all
these knotty points see the seventeenth book of my Fundamentals of
Sexology or the Love Passion which Doctor L. B. says is the book
sensation of the year.
---
And a prettier, a daintier head of winsome curls was never seen on a
whore’s shoulders.
---
)_ Married, I see.
---
)_ I see Keating Clay is elected
vicechairman of the Richmond asylum and by the by Guinness’s preference
shares are at sixteen three quarters.
---
And by the offensively smelling vitriol
works did he not pass night after night by loving courting couples to
see if and what and how much he could see?
---
)_ My boys will be no end charmed to see you so
ladylike, the colonel, above all, when they come here the night before
the wedding to fondle my new attraction in gilded heels.
---
Return and see.
---
)_ I see her!
---
If you have none see you damn well get it, steal it, rob it!
---
)_ What have I not seen in
that chamber?
---
Me see.
---
The eye sees all
flat.
---
)_ I
see it in your face.
---
See it in you

What did Stephen see on raising his gaze to the height of a yard from
the fire towards the opposite wall?
---
What did Bloom see on the range?
---
What had prevented him from completing a topical song (music by R. G.
Johnston) on the events of the past, or fixtures for the actual, years,
entitled _If Brian Boru could but come back and see old Dublin now_,
commissioned by Michael Gunn, lessee of the Gaiety Theatre, 46, 47, 48,
49 South King street, and to be introduced into the sixth scene, the
valley of diamonds, of the second edition (30 January 1893) of the
grand annual Christmas pantomime _Sinbad the Sailor_ (produced by R.
Shelton 26 December 1892, written by Greenleaf Whittier, scenery by
George A. Jackson and Cecil Hicks, costumes by Mrs and Miss Whelan
under the personal supervision of Mrs Michael Gunn, ballets by Jessie
Noir, harlequinade by Thomas Otto) and sung by Nelly Bouverist,
principal girl?
---
Did he depict the scene verbally for his guest to see?
---
He preferred hims

Mulveys was the first when I was in bed that morning and Mrs Rubio
brought it in with the coffee she stood there standing when I asked her
to hand me and I pointing at them I couldnt think of the word a hairpin
to open it with ah horquilla disobliging old thing and it staring her
in the face with her switch of false hair on her and vain about her
appearance ugly as she was near 80 or a 100 her face a mass of wrinkles
with all her religion domineering because she never could get over the
Atlantic fleet coming in half the ships of the world and the Union Jack
flying with all her carabineros because 4 drunken English sailors took
all the rock from them and because I didnt run into mass often enough
in Santa Maria to please her with her shawl up on her except when there
was a marriage on with all her miracles of the saints and her black
blessed virgin with the silver dress and the sun dancing 3 times on
Easter Sunday morning and when the priest was going by with the bell
bringing the vatic
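
The output above also lists sentences whose only match is the base form 'see', which the exercise asks to leave out. One way to honour that restriction is to accept a token only when its lemma is 'see' and its surface form is something else. A minimal sketch of that condition: the function name `inflected_hits` and the (token, lemma) pairs are assumptions for illustration, with the pairs standing in for the lemmatiser's output.

```python
def inflected_hits(pairs, lemma='see'):
    """Keep tokens whose lemma matches but whose surface form differs from it."""
    return [word for word, lem in pairs
            if lem == lemma and word.lower() != lemma]

# Hypothetical (token, lemma) pairs for two sentences from the output above
sentence_a = [('as', 'as'), ('he', 'he'), ('and', 'and'),
              ('others', 'others'), ('see', 'see'), ('me', 'me')]
sentence_b = [('i', 'i'), ('have', 'have'), ('seen', 'see'),
              ('it', 'it'), ('coming', 'come')]

print(inflected_hits(sentence_a))  # []
print(inflected_hits(sentence_b))  # ['seen']
```

With this test, a sentence would be printed only if the returned list is non-empty, so 'As he and others see me.' would no longer qualify.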

### Exercise 11.6

From *Ullyses.txt*, extract all sentences containing the following combinations of categories: 

* Article - adverb - adjective - noun 

These categories can be assigned the following codes:

* Article: DT
* Adverb: RB, RBR or RBS
* Adjective: JJ, JJR or JJS
* Noun: NN, NNP, NNPS or NNS
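
Because each category covers several codes, a search pattern over the tag sequence needs optional suffixes. The sketch below shows one way to write such a pattern; the names `PATTERN` and `has_pattern` and the tag lists are assumptions for illustration. The `\b` anchors keep `DT` from matching inside `PDT` or `WDT`.

```python
import re

# DT, then RB/RBR/RBS, then JJ/JJR/JJS, then NN/NNS/NNP/NNPS
PATTERN = re.compile(r'\bDT RB[RS]? JJ[RS]? NN(?:S|PS?)?\b')

def has_pattern(tags):
    """Return True if the tag sequence contains article-adverb-adjective-noun."""
    return PATTERN.search(' '.join(tags)) is not None

print(has_pattern(['DT', 'RB', 'JJ', 'NN']))     # True
print(has_pattern(['DT', 'RBR', 'JJ', 'NNS']))   # True
print(has_pattern(['PDT', 'RB', 'JJ', 'NN']))    # False
print(has_pattern(['DT', 'RB', 'JJ', 'VB']))     # False
```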


In [6]:
import re
from os.path import join

from nltk import word_tokenize, sent_tokenize, pos_tag
from tdm import *

path = join('Corpus', 'Ullyses.txt')
with open(path, encoding='utf-8') as fh:
    full_text = fh.read()

sentences = sent_tokenize(full_text)

# Article, adverb (RB/RBR/RBS), adjective (JJ/JJR/JJS), noun (NN/NNS/NNP/NNPS);
# \b keeps DT from matching inside PDT or WDT
pattern = re.compile(r'\bDT RB[RS]? JJ[RS]? NN(?:S|PS?)?\b')

for sentence in sentences:
    sentence = re.sub(r'\n', ' ', sentence)
    words = word_tokenize(sentence)
    words = remove_punctuation(words)
    pos = pos_tag(words)

    # Join the tags into a single string, e.g. 'DT RB JJ NN VBZ'
    tagged_sentence = ' '.join(tag for word, tag in pos)

    if pattern.search(tagged_sentence):
        print(f"{sentence}\n---")
        

A very short space of time through very short times of space.
---
Sermon by the very reverend John Conmee S. J. on saint Peter Claver S. J. and the African Mission.
---
—That’s an awfully good one that’s going the rounds about Reuben J and the son.
---
Old Mrs Thornton was a jolly old soul.
---
[ 10 ]   The superior, the very reverend John Conmee S. J. reset his smooth watch in his interior pocket as he came down the presbytery steps.
---
He jerked short before the convent of the sisters of charity and held out a peaked cap for alms towards the very reverend John Conmee S. J.
---
O, yes: a very great success.
---
O, that was a very nice name to have.
---
On Newcomen bridge the very reverend John Conmee S. J. of saint Francis Xavier’s church, upper Gardiner street, stepped on to an outward bound tram.
---
In the still faint light he moved about, tapping with his lath the piled seedbags and points of vantage on the floor.
---
John Mulligan, the manager of the Hibernian bank, gave me a ve

In fact we are just bringing out a collection of prize stories of which I am the inventor, something that is an entirely new departure.
---
We are considerably out of pocket over this bally pressman johnny, this jackdaw of Rheims, who has not even been to a university.
---
)_ It’s a damnably foul lie, showing the moral rottenness of the man!
---
He was down and out but, though branded as a black sheep, if he might say so, he meant to reform, to retrieve the memory of the past in a purely sisterly way and return to nature as a purely domestic animal.
---
My client, an innately bashful man, would be the last man in the world to do anything ungentlemanly which injured modesty could object to or cast a stone at a girl who took the wrong turning when some dastard, responsible for her condition, had worked his own sweet will on her.
---
_(The very reverend Canon O’Hanlon in cloth of gold cope elevates and exposes a marble timepiece.
---
He has written a really beautiful letter, a poem in its

---
The spirit moving him he would much have liked to follow Jack Tar’s good example and leave the likeness there for a very few minutes to speak for itself on the plea he so that the other could drink in the beauty for himself, her stage presence being, frankly, a treat in itself which the camera could not at all do justice to.
---
Nevertheless he sat tight just viewing the slightly soiled photo creased by opulent curves, none the worse for wear however, and looked away thoughtfully with the intention of not further increasing the other’s possible embarrassment while gauging her symmetry of heaving _embonpoint_.
---
His hat (Parnell’s) a silk one was inadvertently knocked off and, as a matter of strict history, Bloom was the man who picked it up in the crush after witnessing the occurrence meaning to return it to him (and return it to him he did with the utmost celerity) who panting and hatless and whose thoughts were miles away from his hat at the time all the same being a gentleman 

•     [ 18 ]   Yes because he never did a thing like that before as ask to get his breakfast in bed with a couple of eggs since the _City Arms_ hotel when he used to be pretending to be laid up with a sick voice doing his highness to make himself interesting for that old faggot Mrs Riordan that he thought he had a great leg of and she never left us a farthing all for masses for herself and her soul greatest miser ever was actually afraid to lay out 4d for her methylated spirit telling me all her ailments she had too much old chat in her about politics and earthquakes and the end of the world let us have a bit of fun first God help the world if all the women were her sort down on bathingsuits and lownecks of course nobody wanted her to wear them I suppose she was pious because no man would look at her twice I hope Ill never be like her a wonder she didnt want us to cover our faces but she was a welleducated woman certainly and her gabby talk about Mr Riordan here and Mr Riordan there I 

Mulveys was the first when I was in bed that morning and Mrs Rubio brought it in with the coffee she stood there standing when I asked her to hand me and I pointing at them I couldnt think of the word a hairpin to open it with ah horquilla disobliging old thing and it staring her in the face with her switch of false hair on her and vain about her appearance ugly as she was near 80 or a 100 her face a mass of wrinkles with all her religion domineering because she never could get over the Atlantic fleet coming in half the ships of the world and the Union Jack flying with all her carabineros because 4 drunken English sailors took all the rock from them and because I didnt run into mass often enough in Santa Maria to please her with her shawl up on her except when there was a marriage on with all her miracles of the saints and her black blessed virgin with the silver dress and the sun dancing 3 times on Easter Sunday morning and when the priest was going by with the bell bringing the vatic