# In-Class Exercise: Exploring Gutenberg Using Python
 This exercise includes both collaborative and independent components. You will work primarily in your own Jupyter notebook, but you will collaborate on investigating a question of your own choosing.


 First, you will need to install some dependencies:

 
 - Install BSD-DB according to the instructions here:
 https://github.com/c-w/Gutenberg

 - Next, install a library for downloading texts from Project Gutenberg via pip. After selecting the appropriate shell for Anaconda, type the following into the terminal:
 
 ```bash
 pip install gutenberg
 ```
 
 - Finally, install TextBlob and necessary corpora:
 ```bash
 pip install -U textblob
 python -m textblob.download_corpora
 ```

In [2]:
# Let's begin by downloading a text published on Project Gutenberg.
# ID 29614 is "The Game of Rat and Dragon" by Cordwainer Smith.
from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
from textblob import TextBlob

# load_etext fetches the book whose Project Gutenberg ID is in the parentheses;
# strip_headers strips the Project Gutenberg header and footer.
text = strip_headers(load_etext(29614)).strip()
blob = TextBlob(text)
# print(text)  # uncomment to print the full text of the story
# Save the text to a local .txt file in this directory.
# The file is written as UTF-16, so use the same encoding when reading it back.
with open('gameratdragon.txt', 'w', encoding='utf-16', newline='\n') as source:
    source.write(text)

In [2]:
type(text)


str

In [5]:
blob.noun_phrases   # the noun phrases TextBlob detects in the text, in order of appearance

WordList(['online distributed proofreading team', 'transcriber', 'galaxy', 'fiction', 'october', 'extensive', 'u.s.', 'game', 'rat', 'dragon', 'cordwainer smith', 'hunter', '* * * * *', 'the table', 'illustration', 'pinlighting', 'underhill', 'outer corridor', 'meow', 'uniformed nonentity', 'immense grid', 'cubic grid', 'terrible anxiety', 'faintest trace', 'inert dust', 'sun', 'familiar planets', 'moon', 'own solar system', 'ancient cuckoo clock', 'mars', 'frantic mice', 'far', 'human travel', 'solar', 'telepathic astronomer', 'warm protection', 'sun', '* * * * *', 'woodley', 'underhill', 'down', 'sun', "'s sort", 'woodley', 'undeterred', 'underhill', 'ancient man', 'rats', 'game', 'woodley', 'woodley', 'uh-huh', 'woodley', 'twenty-six years', 'hard work', 'task whenever', 'woodley', 'partners', 'partners', 'ugly thoughts', 'partners', 'partners', 'articulate form', 'chiefs', 'instrumentality', 'underhill', 'happily', "'s sort", 'woodley', 'dogwood', 'dogwood', 'top part', 'rats', 'up
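
The list above contains many repeats (`woodley`, `partners`, `underhill`), so a tally is often more useful than the raw list. The following is a minimal sketch, assuming the phrases arrive as a plain list of strings; the helper name `top_phrases` and the short sample list are illustrative and not part of TextBlob.

```python
from collections import Counter

def top_phrases(phrases, n=3):
    """Return the n most frequent phrases as (phrase, count) pairs."""
    return Counter(phrases).most_common(n)

# Hand-copied sample for illustration; in the notebook, pass blob.noun_phrases.
sample = ['woodley', 'underhill', 'sun', 'woodley', 'partners',
          'partners', 'woodley', 'underhill', 'partners', 'partners']
print(top_phrases(sample))  # [('partners', 4), ('woodley', 3), ('underhill', 2)]
```

In the notebook, `top_phrases(blob.noun_phrases, n=10)` should give the ten most frequent noun phrases in the story.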

In [6]:
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)


0.0
0.0
-0.13333333333333333
-0.25
0.2
-0.10370370370370373
0.6
0.0
0.0
0.0
0.0
0.0
-0.16666666666666666
0.6
0.11666666666666665
-0.55
0.375
0.4149999999999999
-0.17708333333333331
0.038095238095238106
0.0
0.0
0.45
0.0
0.05
0.0
0.0
0.1986111111111111
0.0
0.2375
0.0
0.0
-0.1
0.475
0.0
0.0
0.0
-0.4
0.0
0.0
0.15833333333333333
0.0
0.23611111111111108
0.25333333333333335
0.6
0.43
0.0
-0.28125
0.35
0.8
0.0
0.0
0.0
0.0
0.0
0.0
0.25
-0.1
0.3
-0.15555555555555559
0.0
0.03333333333333335
0.0
0.0
0.5
0.5
-0.16666666666666666
0.11000000000000001
0.0
0.0
0.0
-0.21666666666666667
0.0
0.0
0.0
-0.04999999999999999
0.25
0.0423611111111111
-0.39999999999999997
-0.1525
0.25
0.2
0.3333333333333333
0.25
-0.14999999999999997
0.02708333333333334
0.2875
-0.08493589743589744
-0.2
0.225
0.0
0.0
-0.13333333333333333
0.0
-0.04
0.22500000000000003
0.0
0.0
0.0
0.07500000000000001
0.4
0.4
-0.07777777777777779
0.0
0.04166666666666667
0.13333333333333333
0.4
0.0
0.05
0.3361111111111111
0.3333333333333333
0.2
0.0
0.00
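
A long column of scores is hard to read at a glance. The following is a rough sketch of one way to summarize it, assuming the scores have been collected into a plain list of floats; the function name `summarize_polarity` is hypothetical, and the sample holds only the first five scores printed above.

```python
from statistics import mean

def summarize_polarity(scores):
    """Return the mean score plus counts of positive, negative, and neutral sentences."""
    return {
        'mean': mean(scores),
        'positive': sum(1 for s in scores if s > 0),
        'negative': sum(1 for s in scores if s < 0),
        'neutral': sum(1 for s in scores if s == 0),
    }

# First five scores from the output above; in the notebook, build the full list with
# [sentence.sentiment.polarity for sentence in blob.sentences].
sample = [0.0, 0.0, -0.13333333333333333, -0.25, 0.2]
print(summarize_polarity(sample))
```

A polarity of exactly 0.0 usually means TextBlob found no sentiment-bearing words in the sentence, so the neutral count can be large for narrative prose.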

In [7]:
from operator import itemgetter

# Print every word with its count, most frequent first
d = blob.word_counts
for key, value in sorted(d.items(), key=itemgetter(1), reverse=True):
    print(key, value)


the 387
of 193
and 134
a 134
he 133
to 111
was 91
in 86
his 61
it 59
that 58
had 58
as 50
her 49
with 48
mind 45
they 45
she 44
for 40
were 39
at 38
you 37
partners 33
could 32
him 32
underhill 30
did 29
out 29
on 28
not 26
like 25
s 24
i 24
all 23
so 23
them 23
but 23
their 22
n't 21
into 21
which 20
by 18
nothing 18
this 17
than 17
have 17
thought 17
from 16
be 16
what 16
pin-set 16
up 16
said 16
or 16
human 16
woodley 16
one 15
people 15
felt 15
space 15
little 15
more 15
ship 15
dragons 15
lady 15
may 15
an 14
captain 14
much 13
around 13
rats 13
other 13
do 13
light 13
partner 13
fight 12
there 12
man 12
about 12
when 12
looked 11
know 11
own 11
ever 11
good 11
saw 11
ships 11
wow 11
back 10
who 10
came 10
old 10
been 10
away 10
time 10
would 10
ready 10
minds 10
father 10
words 10
cat 10
way 9
pinlighting 9
is 9
girl 9
yet 9
far 9
go 9
we 9
moontree 9
himself 8
down 8
warm 8
two 8
now 8
telepathic 8
until 8
sharp 8
through 8
even 8
just 8
once 8
something 8
long 8
found 8
room 8
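
The top of this list is dominated by function words such as "the" and "of". One common remedy is to skip a set of stop words before sorting. The sketch below assumes the counts are a mapping from word to count; `STOP_WORDS` is a deliberately tiny, hand-picked set and `content_word_counts` is a hypothetical helper, so a real analysis would want a fuller stop-word list.

```python
from operator import itemgetter

# A tiny, hand-picked stop-word set (an assumption made for this sketch).
STOP_WORDS = {'the', 'of', 'and', 'a', 'he', 'to', 'was', 'in', 'his', 'it'}

def content_word_counts(counts, stop_words=STOP_WORDS):
    """Return (word, count) pairs sorted by descending count, skipping stop words."""
    kept = {word: n for word, n in counts.items() if word not in stop_words}
    return sorted(kept.items(), key=itemgetter(1), reverse=True)

# A few entries copied from the output above; in the notebook, pass blob.word_counts.
sample = {'the': 387, 'of': 193, 'mind': 45, 'partners': 33, 'underhill': 30}
print(content_word_counts(sample))  # [('mind', 45), ('partners', 33), ('underhill', 30)]
```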


In [11]:
# Find the longest sentence in the work
max_len = 0  # named max_len so it does not shadow the built-in max()
index = 0
for key, sentence in enumerate(blob.sentences):
    if len(sentence.words) > max_len:
        max_len = len(sentence.words)
        index = key
print(max_len)                # length of the longest sentence, in words
print(blob.sentences[index])  # the sentence itself; without this line only the length is shown

80
In the fraction of a second between the telepaths' awareness of a
hostile something out in the black, hollow nothingness of space and
the impact of a ferocious, ruinous psychic blow against all living
things within the ship, the telepaths had sensed entities something
like the Dragons of ancient human lore, beasts more clever than
beasts, demons more tangible than demons, hungry vortices of aliveness
and hate compounded by unknown means out of the thin tenuous matter
between the stars.


In [9]:
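# Find the longest word in the work
max_len = 0  # named max_len so it does not shadow the built-in max()
index = 0    # reset here so the result does not depend on the previous cell
for key, word in enumerate(blob.words):
    if len(word) > max_len:
        max_len = len(word)
        index = key
print(max_len)
print(blob.words[index])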


19
three-dimensionally
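
The same search can be written more compactly with Python's built-in `max` and a `key` function. This is only a sketch: `longest` is a hypothetical helper, the sample list is hand-written, and it assumes the name `max` still refers to the built-in (run `del max` first if a variable called `max` was assigned earlier in the session).

```python
def longest(items):
    """Return the longest item; ties go to the first one encountered."""
    return max(items, key=len)

# Hand-written sample; in the notebook, pass blob.words.
sample = ['hungry', 'vortices', 'three-dimensionally', 'aliveness']
word = longest(sample)
print(len(word), word)  # 19 three-dimensionally
```

For sentences, where length is measured in words rather than characters, something like `max(blob.sentences, key=lambda s: len(s.words))` should work the same way.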


# Parts of Speech

Another method Montfort described is to use TextBlob's part-of-speech tags to count certain parts of speech. The example below uses a single sentence, but the same approach applies to a full manuscript.

In [11]:

pride = TextBlob('''It is a truth universally acknowledged, 
that a single man in possession of a good fortune, must be in 
want of a wife.''')


In [12]:
def adjs(text_blob):
    """Count the words tagged 'JJ' (adjective) in a TextBlob."""
    count = 0
    for (word, tag) in text_blob.tags:
        if tag == 'JJ':
            count += 1
    return count


In [27]:
adjs(pride)

2
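
To count every part of speech at once rather than only adjectives, the tags can be tallied in a single pass. The following is a minimal sketch, assuming the input is a list of `(word, tag)` pairs like the one `.tags` returns; `tag_counts` is a hypothetical helper, and the sample pairs are tagged by hand for illustration.

```python
from collections import Counter

def tag_counts(tagged_words):
    """Count how often each part-of-speech tag appears in (word, tag) pairs."""
    return Counter(tag for _word, tag in tagged_words)

# Hand-tagged pairs for illustration only; in the notebook, pass pride.tags or blob.tags.
sample = [('a', 'DT'), ('single', 'JJ'), ('man', 'NN'),
          ('a', 'DT'), ('good', 'JJ'), ('fortune', 'NN')]
counts = tag_counts(sample)
print(counts['JJ'], counts['NN'])  # 2 2
```

Note that comparative and superlative adjectives carry their own tags (`JJR` and `JJS`), so a count of `'JJ'` alone leaves them out.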

# Creating Figures
There are many ways to create figures. Below is one example, a table, which you can also save to a file.

The cell below requires the plotly package. To save the figure as a static image, you will also need to install orca using conda:
```
conda install -c plotly plotly-orca
```

In [14]:
import plotly.graph_objects as go

fig = go.Figure(data=[go.Table(header=dict(values=['A Scores', 'B Scores']),
                 cells=dict(values=[[100, 90, 80, 90], [95, 85, 75, 95]]))
                     ])
fig.show()
fig.write_image("in-class/week-6/fig1.png")

ModuleNotFoundError: No module named 'plotly'


We will work with other types of figures, graphs, and tables in Lab 2.

To turn in the assignment, follow the instructions in class_notes.ipynb