# In-Class Exercise: Exploring Gutenberg Using Python
This exercise includes both collaborative and independent components. You will work primarily in your own Jupyter notebook, but you will collaborate on investigating a question of your own choosing.


 First, you will need to install some dependencies:

 
 - Install BSD-DB according to the instructions here:
 https://github.com/c-w/Gutenberg

 - Next, we'll use pip to install a library for downloading texts from Project Gutenberg. After selecting the appropriate shell for Anaconda, type the following into the terminal:
 
 ```bash
 pip install gutenberg
 ```
 
 - Finally, install TextBlob and necessary corpora:
 ```bash
 pip install -U textblob
 python -m textblob.download_corpora
 ```

In [7]:
# Let's begin by downloading and using the version of Moby Dick published on Project Gutenberg.
from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
from textblob import TextBlob

# load_etext fetches the book whose Project Gutenberg ID is given in the parentheses.
text = strip_headers(load_etext(2701)).strip()
blob = TextBlob(text)
# print(text)  # prints 'MOBY DICK; OR THE WHALE\n\nBy Herman Melville ...'
# Save the text to a local .txt file in this directory.
# The with statement closes the file even if the write fails.
with open('mobydick.txt', 'w', encoding='utf-16', newline='\n') as source:
    source.write(text)
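
Because the cell above writes `mobydick.txt` as UTF-16, the file has to be read back with the same encoding. Below is a minimal sketch of that round trip. The temporary directory and the short stand-in string are assumptions for illustration, not the full novel.

```python
import tempfile
from pathlib import Path

# Stand-in for the downloaded text (assumption: any str behaves the same way).
sample_text = "MOBY DICK; OR THE WHALE\n\nBy Herman Melville"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "mobydick.txt"

    # Write with the same settings as the cell above.
    with open(path, "w", encoding="utf-16", newline="\n") as source:
        source.write(sample_text)

    # Read back with the matching encoding; newline="" turns off newline translation.
    with open(path, "r", encoding="utf-16", newline="") as source:
        round_trip = source.read()

print(round_trip == sample_text)
```

If the file is opened without `encoding="utf-16"`, Python falls back to the platform default encoding, which may fail or produce garbled text.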

In [8]:
type(text)


str

In [9]:
blob.noun_phrases   # returns a WordList of the noun phrases found in the text

whole life', 'person ’ s hands', 'ahab', 'savage sea-hawks', 'swift circlings', 'distant horizon', 'ahab', 'wild bird', 'uncommon circumstance', 'heedful eye', 'your', 'sicilian', 'ahab', 'deep gulf', 'sable wing', 'old man ’ s eyes', 'black hawk', 'thrice round', 'tarquin', '’ s head', 'tanaquil', 'tarquin', 'rome', 'ahab', '’ s hat', 'wild hawk', 'black spot', 'vast height', 'chapter', 'pequod meets', 'delight', 'pequod', 'delight', 'broad beams', 'upon', 'stranger ’ s shears', 'white ribs', 'hast', 'whale', '” “', 'hast', '” “', 'noiseless sailors', 'perth', '’ s', 'ahab', 'exclaiming— “', 'nantucketer', 'tempered', 'hot place', 'whale', '” “', 'god', 'old man—see ’ st thou', 'hammock— “', 'stout men', 'dead ere night', 'tomb. ”', 'crew— “', 'god', 'hands— “', 'life—— ” “', 'brace', 'up', 'ahab', 'pequod', 'ahab', 'delight', 'strange life-buoy', 'pequod', '’ s stern', 'conspicuous relief', 'ha', 'sad burial', 'chapter', 'clear steel-blue day', 'pensive air', 'woman ’ s', 'man-like s

In [10]:
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)



0.7
-0.24999999999999992
0.10666666666666666
0.0
0.2
0.0
0.5
0.375
0.0
0.5
0.12499999999999999
0.06999999999999999
-0.0071428571428571435
0.0
-0.08333333333333333
0.0
0.0
0.0
0.0
0.0
-0.025
0.0
0.0
-0.125
0.1619047619047619
-0.18888888888888888
0.0
0.0
-0.03571428571428571
0.20999999999999996
0.016666666666666673
0.146875
0.08333333333333333
0.7666666666666666
0.0
0.0
0.35
0.0
-0.025
-0.05416666666666667
0.0
0.4375
0.0
-0.6
0.0
0.0
0.0
0.3
0.0
0.0
-0.04783950617283952
0.07575757575757576
0.25476190476190474
0.36
0.13229166666666667
0.3
0.2
0.0
0.16666666666666666
0.3
0.5
0.2
0.3977777777777778
0.2722222222222222
0.4166666666666667
0.0
0.21033333333333334
0.0
0.0
0.025000000000000005
-0.052222222222222225
0.0
0.21428571428571427
-0.25
0.0
0.1
0.09999999999999999
0.0864957264957265
0.4572727272727272
0.034999999999999996
0.0
0.2333333333333333
0.0
0.32
0.016666666666666673
0.17840136054421768
0.0
0.0
0.104125
0.11666666666666668
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.1
0.0
0.0
0.0
0.0
0.0
0.
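
Printing one score per sentence produces a very long list, so it can help to summarize the scores instead. The sketch below uses only the first five polarity values from the output above. The `summarize` helper is a hypothetical name introduced for illustration.

```python
from statistics import mean

# First five polarity scores copied from the output above.
polarities = [0.7, -0.24999999999999992, 0.10666666666666666, 0.0, 0.2]


def summarize(scores):
    """Return the mean polarity and counts of positive, negative, and neutral scores."""
    return {
        "mean": mean(scores),
        "positive": sum(1 for s in scores if s > 0),
        "negative": sum(1 for s in scores if s < 0),
        "neutral": sum(1 for s in scores if s == 0),
    }


summary = summarize(polarities)
print(summary)
```

For the whole book, the list could be built with `[s.sentiment.polarity for s in blob.sentences]` and passed to the same helper.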

In [11]:
from operator import itemgetter  

d = blob.word_counts
for key, value in sorted(d.items(), key = itemgetter(1), reverse = True):
    print(key, value)


es 1
unconcluded 1
top-blocks 1
capstans 1
expertness 1
vivacity 1
pauselessly 1
ignores 1
half-horrible 1
involving 1
all-ramifying 1
heartlessness 1
—yet 1
oddly 1
crutch-like 1
wheezing 1
unstreaked 1
wittiness 1
wanderer 1
clingings 1
pertained 1
abstract 1
unfractioned 1
babe 1
premeditated 1
uncompromisedness 1
unintelligence 1
intermixture 1
uneven 1
spontaneous 1
literal 1
manipulator 1
oozed 1
multum 1
parvo 1
sheffield 1
exterior—though 1
swelled—of 1
screw-drivers 1
cork-screws 1
awls 1
rulers 1
nail-filers 1
countersinkers 1
superiors 1
screw-driver 1
omnitooled 1
open-and-shut 1
machine 1
automaton 1
quicksilver 1
abided 1
life-principle 1
soliloquizing 1
hummingly 1
soliloquizes 1
soliloquizer 1
deck—first 1
filing 1
—bless 1
amputate 1
knee-joint 1
shinbone—why 1
hop-poles 1
soak 1
doctored 1
lotions 1
mogulship 1
manmaker 1
pleases 1
pinch 1
bones—beware 1
mean—what 1
partnership 1
um-m 1
africans 1
pedlar 1
crushing 1
pattern 1
imprimis 1
modelled 1
see—shall 1
outward
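
The sorted list above runs to thousands of lines, and only its tail is visible here. One alternative is to keep just the most frequent words using `collections.Counter` from the standard library. The sample sentence below is a hypothetical stand-in for `blob.word_counts`.

```python
from collections import Counter

# Hypothetical stand-in for blob.word_counts: a plain word -> count mapping.
sample = "the whale the sea the ship the whale a whale"
counts = Counter(sample.split())

# most_common(n) returns the n highest counts, largest first.
top_two = counts.most_common(2)
for word, count in top_two:
    print(word, count)
```

Since the cell above iterates over `d.items()`, it is reasonable to assume `blob.word_counts` is dictionary-like, in which case `Counter(blob.word_counts).most_common(20)` should work the same way.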

In [23]:
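# Find the longest sentence in the work
longest = 0  # not named `max`, which would shadow the built-in function
index = 0
for key, sentence in enumerate(blob.sentences):
    if len(sentence.words) > longest:
        longest = len(sentence.words)
        index = key
print(longest)
# This prints only the word count; blob.sentences[index] holds the sentence itself.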

469


In [16]:
# Find the longest word in the work
longest = 0  # not named `max`, which would shadow the built-in function
index = 0
for key, word in enumerate(blob.words):
    if len(word) > longest:
        longest = len(word)
        index = key
print(longest)
print(blob.words[index])


28
swayings—coyings—flutterings
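
The "word" reported above is really three words joined by em dashes, which the tokenizer kept as a single token. A minimal sketch of one way to handle this is to split tokens on em dashes before measuring them. The word list is a hypothetical stand-in for `blob.words`, and `split_on_dashes` is a helper name introduced for illustration.

```python
import re

# The token reported above, written with \u2014 for the em dash.
token = "swayings\u2014coyings\u2014flutterings"

# Hypothetical word list standing in for blob.words.
words = ["whale", token, "uncompromisedness", "sea"]


def split_on_dashes(word_list):
    """Split each token on em dashes and drop empty pieces."""
    pieces = []
    for word in word_list:
        pieces.extend(p for p in re.split("\u2014+", word) if p)
    return pieces


# max with key=len returns the longest string without a manual loop.
longest_word = max(split_on_dashes(words), key=len)
print(longest_word, len(longest_word))
```

This only shows the technique on four tokens; it makes no claim about which word is actually the longest in the novel.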


# Parts of Speech

Another method Montfort describes is counting particular parts of speech using the tags attached to each word. The example below uses a single sentence, but the same approach could be applied to a full manuscript.

In [24]:

pride = TextBlob('''It is a truth universally acknowledged, 
that a single man in possession of a good fortune, must be in 
want of a wife.''')


In [26]:
def adjs(text_blob):
    """Count the words tagged 'JJ' (adjective) in a TextBlob."""
    count = 0
    for (word, tag) in text_blob.tags:
        if tag == 'JJ':
            count += 1
    return count


In [27]:
adjs(pride)

2
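
The function above counts only the tag `JJ`. In the Penn Treebank tag set, comparative and superlative adjectives carry the separate tags `JJR` and `JJS`, so a broader count can match on the `JJ` prefix. The sketch below tallies every tag at once with `collections.Counter`. The (word, tag) pairs are hand-written for illustration and are not copied from TextBlob's output.

```python
from collections import Counter

# Hand-written (word, tag) pairs in Penn Treebank style (illustrative only).
tagged = [
    ("a", "DT"), ("single", "JJ"), ("man", "NN"),
    ("a", "DT"), ("good", "JJ"), ("fortune", "NN"),
    ("the", "DT"), ("best", "JJS"), ("wife", "NN"),
]

# Tally how often each tag occurs.
tag_counts = Counter(tag for _, tag in tagged)

# Sum every adjective tag: JJ, JJR, and JJS all start with "JJ".
adjective_total = sum(n for tag, n in tag_counts.items() if tag.startswith("JJ"))
print(tag_counts["JJ"], adjective_total)
```

With a real blob, the same tally could be built from `pride.tags`, which yields (word, tag) pairs as in the loop above.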

# Creating Figures
There are many ways to create figures. Below is one example, a table, which you can also save to a file.

To export a static image, however, you will first need to install orca using conda:
```
conda install -c plotly plotly-orca
```

In [5]:
import plotly.graph_objects as go

fig = go.Figure(data=[go.Table(header=dict(values=['A Scores', 'B Scores']),
                 cells=dict(values=[[100, 90, 80, 90], [95, 85, 75, 95]]))
                     ])
fig.show()
fig.write_image("in-class/week-6/fig1.png")

ValueError: 
The orca executable is required to export figures as static images,
but it could not be found on the system path.

Searched for executable 'orca' on the following path:
    C:\Users\jo284142\AppData\Local\Continuum\anaconda3
    C:\Users\jo284142\AppData\Local\Continuum\anaconda3\Library\mingw-w64\bin
    C:\Users\jo284142\AppData\Local\Continuum\anaconda3\Library\usr\bin
    C:\Users\jo284142\AppData\Local\Continuum\anaconda3\Library\bin
    C:\Users\jo284142\AppData\Local\Continuum\anaconda3\Scripts
    C:\Users\jo284142\AppData\Local\Continuum\anaconda3\bin
    C:\Users\jo284142\AppData\Local\Continuum\anaconda3\condabin
    C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler
    C:\cygwin64\bin
    C:\Users\jo284142\bin
    C:\Program Files\emacs\bin
    C:\Program Files (x86)\Common Files\Oracle\Java\javapath
    C:\Windows\system32
    C:\Windows
    C:\Windows\System32\Wbem
    C:\Windows\System32\WindowsPowerShell\v1.0
    C:\Program Files (x86)\GtkSharp\2.12\bin
    C:\Program Files\nodejs
    C:\Users\jo284142\MagicLeap\mlsdk\v0.18.0\tools\mldb
    C:\Program Files (x86)\Dell\Dell Display Manager
    C:\Program Files\Pandoc
    C:\Program Files\Common Files\Autodesk Shared
    C:\Program Files\Microsoft SQL Server\120\Tools\Binn
    C:\Program Files\dotnet
    C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common
    C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR
    C:\Program Files\7-Zip
    C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit
    C:\Ruby26-x64\bin
    C:\Ruby25-x64\bin
    C:\Users\jo284142\AppData\Local\Microsoft\WindowsApps
    C:\Users\jo284142\AppData\Local\Programs\Microsoft VS Code\bin
    C:\Users\jo284142\AppData\Local\Programs\MiKTeX 2.9\miktex\bin\x64
    C:\ProgramData\jo284142\GitHubDesktop\bin
    C:\Users\jo284142\AppData\Roaming\npm
    C:\ProgramData\jo284142\atom\bin
    C:\Users\jo284142\.dotnet\tools
    C:\Users\jo284142\bin\php
    C:\Users\jo284142\composer
    C:\Users\jo284142\AppData\Roaming\Composer\vendor\bin
    C:\Users\jo284142\AppData\Local\Programs\Microsoft VS Code Insiders\bin
    C:\Users\jo284142\AppData\Local\Programs\Git\cmd

If you haven't installed orca yet, you can do so using conda as follows:

    $ conda install -c plotly plotly-orca

Alternatively, see other installation methods in the orca project README at
https://github.com/plotly/orca.

After installation is complete, no further configuration should be needed.

If you have installed orca, then for some reason plotly.py was unable to
locate it. In this case, set the `plotly.io.orca.config.executable`
property to the full path of your orca executable. For example:

    >>> plotly.io.orca.config.executable = '/path/to/orca'

After updating this executable property, try the export operation again.
If it is successful then you may want to save this configuration so that it
will be applied automatically in future sessions. You can do this as follows:

    >>> plotly.io.orca.config.save()

If you're still having trouble, feel free to ask for help on the forums at
https://community.plot.ly/c/api/python


We will work with other types of figures, graphs, and tables in Lab 2.

To turn in the assignment, follow the instructions in class_notes.ipynb