## Mapping *Mrs Dalloway* by Virginia Woolf
### Contents:
1. Lesson Overview
2. Loading the text
3. Installing spaCy for NLP
4. Finding place names in the text
5. Gathering coordinates for each location
6. Mapping our data

### Lesson Overview

This lesson walks you through mapping the locations mentioned in Virginia Woolf's *Mrs Dalloway*. Using the Python library spaCy, we will identify place names in the text and match them with coordinates that can be added as a layer in mapping software such as ArcGIS Online. This approach lets us visualize and reflect on the spatial dimensions of Woolf’s writing: how does she use setting to reflect the internal lives of her characters? Are there patterns in the places mentioned?

### Loading the text

First, we need to read in the .txt file of *Mrs Dalloway* that we will be using:

In [24]:
from pathlib import Path

dalloway_path = Path("./MrsDalloway.txt")

# Read the whole novel into one string (assumes the file is saved as UTF-8).
with open(dalloway_path, encoding="utf-8") as f:
    dalloway = f.read()

### Installing spaCy for NLP

To retrieve the place names from this file, we will need to install the Python library spaCy. We also need to select which pre-trained model we will be using – in this case, we can load the medium model (en_core_web_md):


In [30]:
%pip install spacy

%run -m spacy download en_core_web_md

Note: you may need to restart the kernel to use updated packages.
Collecting en-core-web-md==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.8.0/en_core_web_md-3.8.0-py3-none-any.whl (33.5 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


With spaCy installed, we can now process the text of *Mrs Dalloway* with this model:

In [None]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_md")

dalloway_nlp = nlp(dalloway)

### Finding place names in the text

Because we are interested in the geographic locations mentioned in the text, we are looking for entities labelled “GPE” (geopolitical entities such as countries and cities) or “LOC” (other, non-political locations such as parks or bodies of water). To see all of them, we can collect each one as an `(e.text, e.label_)` pair and print the results:

In [33]:
ents = [(e.text, e.label_) 
        for e in dalloway_nlp.ents
        if e.label_ in ("LOC", "GPE")]

for ent in ents:
    print(ent)

('India', 'GPE')
('Westminster', 'GPE')
('Westminster', 'GPE')
('London', 'GPE')
('Ascot', 'GPE')
('Hugh', 'GPE')
('London', 'GPE')
('Whitbreads', 'GPE')
('Pimlico', 'GPE')
('Park', 'LOC')
("St. James's Park", 'GPE')
('India', 'GPE')
('Piccadilly', 'GPE')
('London', 'GPE')
('Nigeria', 'GPE')
('England', 'GPE')
('Italy', 'GPE')
('England', 'GPE')
('London', 'GPE')
('Ascot', 'GPE')
('England', 'GPE')
('China', 'GPE')
('Empire', 'GPE')
('England', 'GPE')
("St. James's", 'LOC')
('Victoria', 'GPE')
("St. James's", 'GPE')
('Pimlico', 'GPE')
('Aberdeen', 'GPE')
('Albany', 'GPE')
('West', 'LOC')
('East', 'LOC')
('the Green Park', 'LOC')
('Piccadilly', 'GPE')
("Regent's Park", 'LOC')
("Regent's Park", 'LOC')
('Rezia', 'GPE')
('Rezia', 'GPE')
('Rezia', 'GPE')
('Italy', 'GPE')
('Milan', 'GPE')
('Milan', 'GPE')
('Rezia', 'GPE')
('Edinburgh', 'GPE')
('Rezia', 'GPE')
('London', 'GPE')
("Regent's Park", 'LOC')
('London', 'GPE')
('Edinburgh', 'GPE')
("Regent's Park", 'LOC')
("m'dear", 'GPE')
('Margate
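
Because the same place appears many times in this output, it can help to tally the mentions before going further. The following is a minimal sketch, assuming `ents` is a list of `(text, label)` tuples like the one built above. The short `ents` list here is a hand-copied excerpt of the output, included only so that the block runs on its own.

```python
from collections import Counter

# Hand-copied excerpt of the (text, label) pairs printed above, used here
# only so that the example is self-contained.
ents = [
    ("India", "GPE"),
    ("Westminster", "GPE"),
    ("Westminster", "GPE"),
    ("London", "GPE"),
    ("Ascot", "GPE"),
    ("London", "GPE"),
    ("Regent's Park", "LOC"),
]

# Count how often each place name occurs, ignoring the entity label.
place_counts = Counter(text for text, _label in ents)

# List the names from most to least frequent.
for name, count in place_counts.most_common():
    print(f"{name}: {count}")
```

In the notebook itself you would skip the excerpt and pass the full `ents` list to `Counter` in the same way.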

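We might not want to map all of these results at once. For this lesson, we can take a random sample of 30 entries from this list to focus on. Because `ents` contains one entry per mention, frequently mentioned places such as “London” can appear several times in the sample: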

In [34]:
import random

my_ents = random.sample(ents, 30)

my_ents

[('London', 'GPE'),
 ("Regent's Park", 'LOC'),
 ('India', 'GPE'),
 ('London', 'GPE'),
 ('Rezia', 'GPE'),
 ('Gordon', 'GPE'),
 ('London', 'GPE'),
 ('London', 'GPE'),
 ('Rezia', 'GPE'),
 ('Mayfair', 'LOC'),
 ('Africa', 'LOC'),
 ('Italy', 'GPE'),
 ('Manchester', 'GPE'),
 ('London', 'GPE'),
 ('Greenwich', 'GPE'),
 ('Kensington', 'GPE'),
 ('Edinburgh', 'GPE'),
 ('London', 'GPE'),
 ('Rezia', 'GPE'),
 ("St. Margaret's", 'GPE'),
 ('India', 'GPE'),
 ('Park', 'LOC'),
 ('London', 'GPE'),
 ('London', 'GPE'),
 ('London', 'GPE'),
 ("Regent's Park", 'LOC'),
 ('Inn', 'GPE'),
 ('Australia', 'GPE'),
 ('London', 'GPE'),
 ('Mayfair', 'LOC')]
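
A sample drawn this way changes every time the cell runs and can repeat the same place. If you would rather work with a repeatable sample of distinct names, one option is sketched below. It assumes a list of `(text, label)` tuples like `ents`; the excerpt, the seed value, and the sample size of 3 are stand-ins chosen so that the block runs on its own.

```python
import random

# Short stand-in for the full `ents` list, copied from the sample above.
ents = [
    ("London", "GPE"),
    ("Regent's Park", "LOC"),
    ("India", "GPE"),
    ("London", "GPE"),
    ("Mayfair", "LOC"),
    ("Italy", "GPE"),
    ("Manchester", "GPE"),
]

# Deduplicate on the place name and sort so the starting order is stable.
unique_names = sorted({text for text, _label in ents})

# A dedicated, seeded generator returns the same sample on every run.
# The seed value 42 is arbitrary.
rng = random.Random(42)

# The lesson samples 30 entries from the full list; 3 fits this excerpt.
sample_size = min(3, len(unique_names))
my_names = rng.sample(unique_names, sample_size)

print(my_names)
```

Sampling from the deduplicated list gives every place the same chance of selection, whereas sampling from `ents` directly favors places that are mentioned often. Which behavior you want depends on what the map is meant to show.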

### Gathering coordinates for each location

Now that we have a list of locations, we can record each one with its latitude and longitude in an Excel file. The [World Historical Gazetteer](https://whgazetteer.org/) provides this coordinate information if you look up each name. Some of the more specific names might not be available there; in that case, you can use Google Maps or [OpenStreetMap](https://www.openstreetmap.org/#map=15/51.53016/-0.14789) to find the coordinates instead.

Not all of these names can sensibly be matched with XY coordinates. Some are only generic words like “Park” or “Inn.” Some, like “Rezia” and “Hugh,” are characters' names that the model has mislabelled as places. Others cover entire continents, like “Africa” or “North America.”
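
One way to prepare the spreadsheet is to drop the names that cannot be mapped and write the rest to a CSV file with empty coordinate columns, ready to be filled in by hand. The sketch below assumes a hand-picked `skip` set and a hypothetical output filename; neither comes from the lesson's own code.

```python
import csv
import io

# Names taken from the random sample shown earlier.
my_names = ["London", "Regent's Park", "Rezia", "Park", "Inn", "Africa", "Mayfair"]

# Hand-picked names to leave out (an assumption for this sketch): character
# names, generic nouns, and areas too large for a single point.
skip = {"Rezia", "Hugh", "Park", "Inn", "Africa"}

mappable = [name for name in my_names if name not in skip]


def write_template(names, handle):
    """Write one row per place, leaving the coordinate columns empty."""
    writer = csv.writer(handle)
    writer.writerow(["name", "latitude", "longitude"])
    for name in names:
        writer.writerow([name, "", ""])


# An in-memory buffer keeps the example free of side effects. To create a
# real file, pass open("dalloway_places.csv", "w", newline="") instead
# (the filename is hypothetical).
buffer = io.StringIO()
write_template(mappable, buffer)
print(buffer.getvalue())
```

The resulting file opens directly in Excel, where the latitude and longitude columns can be completed from the gazetteer.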

### Mapping our data

Now we have a list of locations and their coordinates that can be used in geospatial software like ArcGIS Online. Depending on what you want to show, you could add further information to each location. For example, if you add the number of times each place is mentioned, you could display the locations as a heat map or with graduated symbols that show how frequently each place appears in the text.
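
As a sketch of how mention counts could travel with the coordinates, the block below stores each place as a GeoJSON point with a `mentions` attribute, a format that many mapping tools can import. Everything in it is illustrative: the `mentions` list is a short made-up excerpt rather than the real counts, and the coordinates are approximate city-centre values entered by hand, so check them against a gazetteer before mapping.

```python
import json
from collections import Counter

# Illustrative excerpt of place mentions (not the full output).
mentions = ["London", "Milan", "London", "Edinburgh", "Milan", "London"]
counts = Counter(mentions)

# Approximate city-centre coordinates as (latitude, longitude), entered by
# hand for illustration; verify them against a gazetteer before use.
coords = {
    "London": (51.5074, -0.1278),
    "Edinburgh": (55.9533, -3.1883),
    "Milan": (45.4642, 9.1900),
}

features = []
for name, (lat, lon) in coords.items():
    features.append({
        "type": "Feature",
        # GeoJSON stores positions as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "mentions": counts[name]},
    })

collection = {"type": "FeatureCollection", "features": features}
print(json.dumps(collection, indent=2))
```

A numeric attribute such as `mentions` is what a graduated-symbol or heat-map style would be driven by. The same column could equally be added to the Excel file instead.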

You might also want to separate the locations into categories: for example, places around London that the characters visit versus global locations mentioned throughout the text, like “India,” “Nigeria,” or “China.”
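
A split like this can be sketched with a hand-made lookup set. The `london_places` set below is an assumption compiled from names in the output above, not something spaCy provides, and you would extend it as you review your own results.

```python
# Hand-picked set of London place names (an assumption for this sketch).
london_places = {
    "Westminster",
    "Pimlico",
    "Piccadilly",
    "Regent's Park",
    "Mayfair",
    "Kensington",
    "Greenwich",
}

# A few names from the output above.
names = ["India", "Westminster", "Nigeria", "Regent's Park", "China", "Mayfair"]

# Sort each name into one of two groups based on the lookup set.
categories = {"London": [], "Global": []}
for name in names:
    key = "London" if name in london_places else "Global"
    categories[key].append(name)

for key, members in categories.items():
    print(key, members)
```

Note that anything missing from the lookup set falls into the "Global" group, including other British places such as “Edinburgh,” so the set needs reviewing before the categories are used on a map.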

In this example, I used several of the locations from around London:


![Map of London](</workspaces/Programming-Historian-Lesson/image_example.png>)