## Analysis of _Around the World in 80 Days_
### Author: __Sam Lyddon__
***

This analysis uses entity recognition to extract the locations visited in the book _Around the World in 80 Days_, then compares them against the broader set of route locations compiled by manual extraction.
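The comparison step can be sketched without any NLP library, assuming the entity recogniser has already produced a list of place-name mentions. The helper `compare_locations` is hypothetical, and the `mentions` and `manual_route` values are illustrative placeholders rather than results from this notebook.

```python
from collections import Counter


def compare_locations(mentions, manual_route):
    """Compare recognised place mentions against a hand-listed route.

    Hypothetical helper: both arguments are plain lists of place names.
    """
    # Normalise case and whitespace so "london" and "London" count together.
    counts = Counter(m.strip().title() for m in mentions)
    route = [stop.strip().title() for stop in manual_route]
    found = [stop for stop in route if stop in counts]
    missed = [stop for stop in route if stop not in counts]
    # Places the recogniser reported that are not on the manual route.
    extra = sorted(set(counts) - set(route))
    return {"found": found, "missed": missed, "extra": extra, "counts": counts}


# Placeholder data, not output from the notebook.
mentions = ["London", "Suez", "london", "Bombay", "Paris"]
manual_route = ["London", "Suez", "Bombay", "Calcutta"]
result = compare_locations(mentions, manual_route)
```

Keeping `found`, `missed`, and `extra` separate makes it easy to see both the route stops the recogniser failed to pick up and the extra places it reported.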

In [1]:
import sys
sys.path.append("..")

In [2]:
from pathlib import Path
from collections import Counter
import gensim.corpora as corpora
import gensim.models as models
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

from src import RequestHandler, Document

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
%load_ext watermark
%watermark -d -t -u
%watermark -iv -v

last updated: 2020-03-02 16:25:58
CPython 3.7.6
IPython 7.12.0


In [5]:
%load_ext lab_black

### Pull in the text
***

In [40]:
rq = RequestHandler()

# On the first run, scrape the text from Project Gutenberg and cache it locally;
# on later runs, read the cached copy instead of hitting the network again.
data_path = Path("./../data/pg103.txt")
book_path = "http://www.gutenberg.org/cache/epub/103/pg103.txt"
if not data_path.is_file():
    text = rq.scrape_text(book_path)
    data_path.write_text(text, encoding="utf-8")
else:
    text = data_path.read_text(encoding="utf-8")
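
The download-once-then-cache pattern in the cell above can be factored into a small reusable function. This is a minimal sketch: `load_cached_text` and its `fetch` parameter are names introduced here for illustration and are not part of the `src` package.

```python
from pathlib import Path
from typing import Callable


def load_cached_text(path: Path, fetch: Callable[[], str]) -> str:
    """Return the text at `path`, calling `fetch` only if the file is missing."""
    if path.is_file():
        return path.read_text(encoding="utf-8")
    text = fetch()
    # Create the data directory on first use so the write cannot fail on it.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text
```

In this notebook the call would look like `load_cached_text(data_path, lambda: rq.scrape_text(book_path))`. Passing the fetcher as a callable keeps the caching logic independent of how the text is downloaded.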

### Read the text
***

In [41]:
doc = Document(text)
print(f"{doc.title} - {doc.author} - {doc.release_date} - {doc.language}")

Celebrated Travels and Travellers - Jules Verne - March 7, 2008 [EBook #24777] - English


In [42]:
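chap_no = 15
print(doc.chapters[chap_no].chapter_header)
# Show the first two topics for the chapter as " - "-joined keyword strings;
# the topic index and the per-word scores are not needed here.
[
    " - ".join(w for w, _score in features)
    for _idx, features in doc.chapters[chap_no].get_topics(10)[:2]
]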

CELEBRATED TRAVELLERS FROM THE FIRST TO THE NINTH CENTURY.


['town - century - country - traveller - account - church - great - say - Lord - visit',
 'church - Lord - century - traveller - place - St. - account - China - Mount - Soleyman']

In [43]:
doc.search_paragraphs("Human machine interface for lab abc computer applications")

['human', 'machine', 'interface', 'lab', 'abc', 'computer', 'application']


('"The Tartars believe in God as the Creator of the universe and as the Rewarder and Avenger of all, but they also worship the sun, moon, fire, earth, and water, and idols made in felt, like human beings. They have little toleration, and put Michael of Turnigoo and Féodor to death for not worshipping the sun at midday at the command of Prince Bathy. They are a superstitious people, believing in enchantment and sorcery, and looking upon fire as the purifier of all things. When one of their chiefs dies he is buried with a horse saddled and bridled, a table, a dish of meat, a cup of mare\'s milk, and a mare and foal.',
 0.5888932)
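
The implementation of `Document.search_paragraphs` is not shown in this notebook. Judging by the output, the query is reduced to content-word tokens and the best-matching paragraph is returned with a similarity score. The sketch below illustrates that idea using plain term-count cosine similarity, which is an assumption swapped in for whatever model the `src` package actually uses. The names `tokenize`, `cosine`, `best_paragraph`, and the small `STOPWORDS` set are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "it", "of", "or", "the", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords (no lemmatisation)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_paragraph(paragraphs, query):
    """Return the (paragraph, score) pair with the highest cosine similarity."""
    q = Counter(tokenize(query))
    scored = [(p, cosine(q, Counter(tokenize(p)))) for p in paragraphs]
    return max(scored, key=lambda pair: pair[1])
```

Unlike the notebook's method, this sketch does not lemmatise, so a query for "ships" would not match a paragraph that only contains "ship".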

In [44]:
doc.search_paragraphs("Out of the cannon fired fury! Fire!")

['cannon', 'fire', 'fury', 'fire']


('Thibet abounds in lions, bears, and other savage animals, from which the travellers would have much difficulty in defending themselves had it not been for the quantity of large thick canes that grow there, which are probably bamboos: he says, "the merchants and travellers passing through these countries at night collect a quantity of these canes and make a large fire of them, for when they are burning they make such a noise and crackle so much, that the lions, bears, and other wild beasts take flight to a distance, and would not approach these fires on any account; thus both men, horses, and camels are safe. In another way, too, protection is afforded by throwing a number of these canes on a wood fire, and when they become heated and split, and the sap hisses, the sound is heard at least ten miles off. When any one is not accustomed to this noise, it is so terrifying that even the horses will break away from their cords and tethers; so their owners often bandage their eyes and tie th

In [45]:
doc.search_paragraphs("Is it south or west? What is the wind direction? ")

['south', 'west', 'wind', 'direction']


("Meanwhile, the four caravels of Columbus, denied access to the harbour, had been driven before the storm. They were separated one from the other, and disabled, but they succeeded in meeting together again, and by the 14th of July, the squall had carried them within sight of Jamaica. Arrived there, strong currents bore them towards the islands called the Queen's Garden, and then in the direction of east-south-east. The little flotilla contended for sixty days against the wind without making more than 210 miles, and at length was driven towards the coast of Cuba, which led to the discovery of Cayman and Pinos Islands.",
 0.7551625)

In [46]:
doc.search_paragraphs(
    "Where is the steamer Mongolia? Hopefully the water is fresh and not too stormy"
)

['steamer', 'Mongolia', 'hopefully', 'water', 'fresh', 'stormy']


('This last is a large town containing fine squares and shops. It never rains there, but this want is supplied by the overflow of the Nile once a year, which waters the country and renders it very fertile.',
 0.5408858)

In [47]:
doc.search_paragraphs("ships")

['ship']


('Columbus now believed himself to be arrived near the mouth of the Ganges, and from the natives speaking of a certain province of Ciguare, which was surrounded by the sea, he felt himself confirmed in this opinion. They declared that it was a country containing rich gold-mines, of which the most important was situated seventy-five miles to the south. When the admiral again set sail, he followed the wooded coast of Veragua, where the Indians appeared to be very wild. On the 26th of November, the flotilla entered the harbour of El Retrete, which is now the port of Escribanos. The ships battered by the winds, were now in a most miserable plight; it was absolutely necessary to repair the damage they had sustained, and for this purpose to prolong the stay at El Retrete. Upon quitting this harbour Columbus was met by a storm even more dreadful than those which had preceded it: "During nine days," he says, "I remained without hope of being saved. Never did any man see a more violent or terri

In [48]:
doc.search_paragraphs("the game of solitaire")

['game', 'solitaire']


("Seven days' journey further on they came to the beautiful commercial city of Pianfou, now called Pin-yang-foo, where the manufacture of silk was carried on. He soon afterwards came to the banks of the Yellow River, which he calls Caramoran or Black River, probably on account of its waters being darkened by the aquatic plants growing in them; at two days' journey from hence he came to the town of Cacianfu, whose position is not now clearly defined. He found nothing remarkable in this town, and leaving it he rode across a beautiful country, covered with towns, country-houses, and gardens, and abounding in game.",
 0.5456555)