# Mapping Kate Chopin's *The Awakening*

*This is an example of a data-driven essay that might be published in the Cornell DH Notebook Series. The essay is a Jupyter notebook that can be downloaded as an .ipynb file or as a PDF, and it can also be opened and [run in the cloud](https://mybinder.org/v2/gh/melaniewalsh/Cornell-DH-Notebooks/main?urlpath=lab/tree/jupyterbook/notebooks/Issue-1/Mapping-Kate-Chopin.ipynb).*

## Introduction

*Essays will begin with an introduction*

In this essay, we will explore the geography of Kate Chopin's 1899 novel *The Awakening*. The novel is set in Louisiana, but it also discusses and imagines many other locations around the world.

To explore this geography, we will use a natural language processing method called Named Entity Recognition (NER) to identify places mentioned in *The Awakening*. Then we will geocode these locations and map them.

## Dataset

*Essays will describe the data*

### Kate Chopin's *The Awakening*

```{epigraph}
Robert spoke of his intention to go to Mexico in the autumn, where fortune awaited him. He was always intending to go to Mexico, but some way never got there. Meanwhile he held on to his modest position in a mercantile house in New Orleans, where an equal familiarity with English, French and Spanish gave him no small value as a clerk and correspondent.

-- Kate Chopin, *The Awakening*
```

:::{admonition,dropdown,tip} Jupyter Book Tip  
Jupyter Book allows you to include special "directives" for creating cool features on our published website. For example, we can create a dropdown tip like this one by using a [dropdown tip directive](https://jupyterbook.org/content/content-blocks.html#hiding-the-content-of-admonitions) or we can create an epigraph like the one above by using an [epigraph directive](https://jupyterbook.org/content/content-blocks.html#quotations-and-epigraphs).  
:::

## Computational Narrative/Argument

*Essays will proceed to make an argument or construct a narrative interspersed with code and explanations. Because this example is drawn from my course materials, it is heavy on the code/explanation and light on the narrative/argument, but ideally it would contain more claims, insights, findings, etc.*

## Named Entity Recognition

### Install spaCy

In [None]:
!pip install -U spacy

### Import Libraries

We import `spacy` and `displacy`, a special spaCy module for visualization.

In [1]:
import spacy
from spacy import displacy
from collections import Counter
import pandas as pd
pd.options.display.max_rows = 600
pd.options.display.max_colwidth = 400

We also import the `Counter` class from Python's `collections` module for counting places, and the `pandas` library for organizing and displaying data. Finally, we change pandas' default display settings for the maximum number of rows and the maximum column width.
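`Counter` tallies items, and its `most_common()` method returns `(item, count)` pairs sorted from most to least frequent, which is the shape we later hand to `pd.DataFrame`. A small illustration, using a hypothetical list of place mentions rather than counts from the novel:

```python
from collections import Counter

# Hypothetical place mentions, for illustration only (not counts from the novel)
mentions = ["Mexico", "New Orleans", "Mexico", "Kentucky", "Mexico", "New Orleans"]

tally = Counter(mentions)

# most_common() returns (item, count) pairs, highest count first
pairs = tally.most_common()
print(pairs)  # [('Mexico', 3), ('New Orleans', 2), ('Kentucky', 1)]
```

Passing a number, as in `tally.most_common(2)`, would keep only the top two pairs.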

### Download Language Model

Next we download the English-language model (`en_core_web_sm`), which will process our text and make predictions about it, such as which words and phrases are named entities.

In [None]:
!python -m spacy download en_core_web_sm

:::{note}  
spaCy offers [models for other languages](https://spacy.io/usage/models#languages) including German, French, Spanish, Portuguese, Italian, Dutch, Greek, Norwegian, and Lithuanian, and recent releases add trained pipelines for languages such as Chinese, Japanese, and Russian. Some languages, such as Thai and Vietnamese, don't have trained pipelines, but spaCy still offers language and tokenization support for many languages through external dependencies, such as [KoNLPy](https://github.com/konlpy/konlpy) for Korean or [Jieba](https://github.com/fxsjy/jieba) for Chinese.
:::

### Load Language Model

In [2]:
import en_core_web_sm
nlp = en_core_web_sm.load()

## Process Document

In the cell below, we open and read *The Awakening*. Then we process the `text` with the loaded NLP model, which returns a spaCy `Doc` object that we store as `document`. Most of the heavy NLP lifting happens in this final line of the cell.

In [3]:
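filepath = "../../data/The-Awakening-Kate-Chopin.txt"
# Read the full novel, closing the file when done
with open(filepath, encoding='utf-8') as f:
    text = f.read()
# Run the spaCy pipeline (tokenization, tagging, NER, etc.) on the whole text
document = nlp(text)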
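spaCy raises an error if a single text is longer than `nlp.max_length` characters (1,000,000 by default). *The Awakening* is short enough to fit, but for longer works one option is to split the text at paragraph breaks and process the pieces with `nlp.pipe()`. The helper below, `chunk_paragraphs`, is a hypothetical sketch, not part of spaCy:

```python
def chunk_paragraphs(text, max_chars):
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    Hypothetical helper for illustration. A single paragraph longer than
    max_chars is kept whole, so callers may still need to handle that case.
    """
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        # The candidate includes the separator we re-insert between paragraphs
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this paragraph would overflow: close the current chunk
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
print(chunk_paragraphs(sample, 40))
```

With the model loaded, something like `docs = list(nlp.pipe(chunk_paragraphs(text, nlp.max_length)))` would then yield one `Doc` per chunk, and entities could be collected from each. Splitting at paragraph breaks makes it unlikely that a place name is cut in half.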

## spaCy Named Entities

To quickly see spaCy's NER in action, we can use the [spaCy module `displacy`](https://spacy.io/usage/visualizers#ent) with the `style` parameter set to `"ent"` (short for entities):

In [4]:
displacy.render(document, style="ent")

:::{admonition,dropdown,tip} Jupyter Book Tip  
You can hide cells, hide outputs, or choose to make an output scrollable by adding the [correct directives](https://jupyterbook.org/interactive/hiding.html?highlight=hide#the-toggle-directive) or [metadata tags](https://jupyterbook.org/content/layout.html?highlight=output_scroll#scrolling-cell-outputs) to Jupyter cells. Read more about [How to Add Metadata Tags to Jupyter Notebook Cells](https://jupyterbook.org/advanced/advanced.html#how-should-i-add-cell-tags-and-metadata-to-my-notebooks).
:::

### Get Places

All the named entities in our document can be found in the `document.ents` property. Each of the named entities in `document.ents` contains more [information about itself](https://spacy.io/usage/linguistic-features#accessing), which we can access by iterating through them with a simple for loop.

For each `named_entity`, we keep its text only if `named_entity.label_` is `"GPE"`, spaCy's label for geopolitical entities: "Countries, cities, states."

In [73]:
places = []

for named_entity in document.ents:
    if named_entity.label_ == "GPE":
        places.append(named_entity.text)

places_tally = Counter(places)

places_df = pd.DataFrame(places_tally.most_common(), columns=['place', 'count'])
places_df

|   | place | count |
|---|-------|-------|
| 0 | Mexico | 19 |
| 1 | New Orleans | 11 |
| 2 | Kentucky | 7 |
| 3 | the United States | 7 |
| 4 | New York | 6 |
| 5 | Valmonde | 6 |
| 6 | Brantain | 6 |
| 7 | Mississippi | 5 |
| 8 | Leandre | 5 |
| 9 | Iberville | 4 |
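The filtering step only reads each entity's `.text` and `.label_`, so its logic can be tried without spaCy. This stdlib sketch uses hypothetical `(text, label)` pairs in place of real `document.ents`, and a hypothetical `tally_places` helper that also shows how the filter could be widened to spaCy's `LOC` label (non-GPE locations such as bodies of water), if you wanted to include those:

```python
from collections import Counter

# Hypothetical (text, label) pairs standing in for spaCy entities; in the
# notebook these would be (ent.text, ent.label_) for ent in document.ents
entities = [
    ("Mexico", "GPE"),
    ("Robert", "PERSON"),
    ("New Orleans", "GPE"),
    ("the Gulf", "LOC"),
    ("Mexico", "GPE"),
]

def tally_places(entities, labels=("GPE",)):
    """Count entity texts whose label is one of `labels`."""
    return Counter(text for text, label in entities if label in labels)

gpe_only = tally_places(entities)
gpe_and_loc = tally_places(entities, labels=("GPE", "LOC"))
print(gpe_only.most_common())
print(gpe_and_loc.most_common())
```

Whether `LOC` entities belong on the map is an interpretive choice; the notebook itself keeps only `GPE`.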


## Geocoding

First, we're going to geocode our data (that is, get coordinates from addresses or place names) with the Python package [GeoPy](https://geopy.readthedocs.io/en/stable/#). GeoPy makes it easier to use a range of third-party [geocoding API services](https://geopy.readthedocs.io/en/stable/#), such as Google, Bing, ArcGIS, and OpenStreetMap.

Most of these services require an API key, but Nominatim, which uses OpenStreetMap data, does not, so that's the service we'll use here.

### Install GeoPy

In [None]:
!pip install geopy

### Import Nominatim

From GeoPy's list of possible geocoding services, we're going to import Nominatim:

In [74]:
from geopy.geocoders import Nominatim

Nominatim (Latin for "by name") uses [OpenStreetMap data](https://www.openstreetmap.org/relation/174979) to match addresses with geographic coordinates. Though we don't need an API key to use Nominatim, its usage policy does require us to identify ourselves with a unique [application name](https://operations.osmfoundation.org/policies/nominatim/).

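Here we're initializing Nominatim as a variable called `geolocator`. You can set the application name (`user_agent`, here "Our mapping app") to anything you like, as long as it identifies your project. The `timeout=2` argument gives each request up to two seconds before GeoPy gives up with an error.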

In [75]:
geolocator = Nominatim(user_agent="Our mapping app", timeout=2)

To geocode an address or location, we simply use the `.geocode()` method:

In [85]:
location = geolocator.geocode("New Orleans")
location

Location(New Orleans, Orleans Parish, Louisiana, United States of America, (29.9499323, -90.0701156, 0.0))
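Nominatim's usage policy asks for no more than one request per second, which matters once we geocode a whole column of places. GeoPy ships a wrapper for this, `RateLimiter` in `geopy.extra.rate_limiter`. To show the idea without network access, here is a stdlib sketch of a hypothetical `throttled` wrapper that spaces out calls and caches repeated queries, exercised with a stand-in function instead of the real geocoder:

```python
import time

def throttled(func, min_delay_seconds=1.0):
    """Wrap func so uncached calls are spaced at least min_delay_seconds apart.

    Hypothetical helper for illustration (not part of GeoPy). Results are
    cached by query, so a place name that appears twice is looked up once.
    """
    cache = {}
    last_call = None  # time.monotonic() of the most recent uncached call

    def wrapper(query):
        nonlocal last_call
        if query in cache:
            return cache[query]
        if last_call is not None:
            # Sleep only for whatever part of the delay has not yet elapsed
            wait = min_delay_seconds - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        cache[query] = func(query)
        return cache[query]

    return wrapper


# Stand-in for geolocator.geocode that records every query it receives
calls = []

def record_query(query):
    calls.append(query)
    return f"result for {query}"

geocode = throttled(record_query, min_delay_seconds=0.1)
start = time.monotonic()
results = [geocode(name) for name in ["Mexico", "New Orleans", "Mexico"]]
elapsed = time.monotonic() - start
print(calls, round(elapsed, 1))
```

With GeoPy itself, the equivalent would be `geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)`, after which `find_location` could call `geocode(place)` instead of `geolocator.geocode(place)`.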

## Geocode with Pandas

To geocode every location in our Pandas dataframe, we can make a Python function and `.apply()` it to every row.

Here we make a function with `geolocator.geocode()` and ask it to return the address, lat/lon, and importance score:

In [78]:
def find_location(row):
    # Look up the place name stored in this row of the dataframe
    place = row['place']

    location = geolocator.geocode(place)

    # geocode() returns None when Nominatim finds no match
    if location is not None:
        return location.address, location.latitude, location.longitude, location.raw['importance']
    else:
        return "Not Found", "Not Found", "Not Found", "Not Found"

Now let's `.apply()` our function to this Pandas dataframe and see what results Nominatim's geocoding service spits out.

In [79]:
places_df[['address', 'lat', 'lon', 'importance']] = places_df.apply(find_location, axis="columns", result_type="expand")
places_df

|   | place | count | address | lat | lon | importance |
|---|-------|-------|---------|-----|-----|------------|
| 0 | Mexico | 19 | México | 22.5 | -100 | 0.839924 |
| 1 | New Orleans | 11 | New Orleans, Orleans Parish, Louisiana, United States of America | 29.9499 | -90.0701 | 0.808026 |
| 2 | Kentucky | 7 | Kentucky, United States of America | 37.5726 | -85.1551 | 0.821405 |
| 3 | the United States | 7 | United States | 39.7837 | -100.446 | 1.13569 |
| 4 | New York | 6 | New York, United States of America | 40.7127 | -74.006 | 1.01758 |
| 5 | Valmonde | 6 | Not Found | Not Found | Not Found | Not Found |
| 6 | Brantain | 6 | Not Found | Not Found | Not Found | Not Found |
| 7 | Mississippi | 5 | Mississippi, United States of America | 32.9716 | -89.7348 | 0.810392 |
| 8 | Leandre | 5 | Léandre, Le Mont-Bellevue, Sherbrooke, Estrie, Québec, J1H 2B5, Canada | 45.3815 | -71.9138 | 0.075 |
| 9 | Iberville | 4 | Iberville Parish, Louisiana, United States of America | 30.2121 | -91.3134 | 0.571381 |
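Returning the string "Not Found" works here, but as the table shows, it mixes text into the numeric `lat`, `lon`, and `importance` columns. One alternative is to return `None` for missing places and read `importance` with `.get()` in case a response lacks it. The sketch below tests that idea offline with a hypothetical `FakeLocation` class that mimics the `.address`, `.latitude`, `.longitude`, and `.raw` attributes of a GeoPy `Location`, a hypothetical lookup table, and a hypothetical function name, `find_location_or_none`:

```python
from dataclasses import dataclass, field

@dataclass
class FakeLocation:
    """Hypothetical stand-in for GeoPy's Location object."""
    address: str
    latitude: float
    longitude: float
    raw: dict = field(default_factory=dict)

# Hypothetical lookup table standing in for the Nominatim service
FAKE_RESULTS = {
    "New Orleans": FakeLocation(
        "New Orleans, Orleans Parish, Louisiana, United States of America",
        29.9499, -90.0701, {"importance": 0.808026},
    ),
}

def lookup_fake(place):
    # Like geolocator.geocode, return None when there is no match
    return FAKE_RESULTS.get(place)

def find_location_or_none(place, geocode=lookup_fake):
    """Return (address, lat, lon, importance), or four Nones if not found."""
    location = geocode(place)
    if location is None:
        return None, None, None, None
    return (location.address, location.latitude, location.longitude,
            location.raw.get("importance"))

print(find_location_or_none("New Orleans"))
print(find_location_or_none("Valmonde"))
```

In the notebook you would pass `geolocator.geocode` as the `geocode` argument and call the function from a row function like `find_location`; pandas treats the `None` values as missing, so `places_df.dropna(subset=['lat'])` could then replace the `"Not Found"` filter.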


## Making an Interactive Map

To map our geocoded coordinates, we're going to use the Python library [Folium](https://python-visualization.github.io/folium/). Folium is built on top of the popular JavaScript library [Leaflet](https://leafletjs.com/).

In [None]:
!pip install folium

In [80]:
import folium

### Base Map

First, we need to establish a base map, which is where we'll plot our geocoded locations. To do so, we call `folium.Map()` with the approximate latitude/longitude coordinates of the New Orleans area and a starting zoom level (`zoom_start`).

To find latitude/longitude coordinates for a particular location, you can use Google Maps, [as described here](https://support.google.com/maps/answer/18539?co=GENIE.Platform%3DDesktop&hl=en).

In [81]:
places_map = folium.Map(location=[29.98, -90.01], zoom_start=3)
places_map
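Instead of hard-coding a center and zoom, one option is to let the data decide the view: compute the south-west and north-east corners of all geocoded points and pass them to Folium's `Map.fit_bounds()`. The helper below, `bounding_box`, is a hypothetical stdlib sketch; the coordinates are the values from the geocoded table above, including the low-importance "Leandre" match in Québec, which shows how one doubtful result can stretch the view:

```python
def bounding_box(coords):
    """Return [[south, west], [north, east]] for a list of (lat, lon) pairs.

    Simple min/max approach: it does not handle points that straddle the
    180th meridian.
    """
    if not coords:
        raise ValueError("need at least one coordinate")
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]

# (lat, lon) pairs copied from the geocoded table above
coords = [
    (22.5, -100.0),       # Mexico
    (29.9499, -90.0701),  # New Orleans
    (37.5726, -85.1551),  # Kentucky
    (39.7837, -100.446),  # the United States
    (40.7127, -74.006),   # New York
    (32.9716, -89.7348),  # Mississippi
    (45.3815, -71.9138),  # Leandre (low-importance match in Québec)
    (30.2121, -91.3134),  # Iberville
]

bounds = bounding_box(coords)
print(bounds)
```

Calling `places_map.fit_bounds(bounds)` would then zoom the map to cover every plotted point.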

### Add Markers From Pandas Data

Adding a marker to a map is easy with Folium. We'll simply call `folium.Marker()` at a particular lat/lon, enter some text to display when the marker is clicked on, and then add it to our base map.

To add markers for every location in our Pandas dataframe, we can make a Python function and `.apply()` it to every row in the dataframe.

In [82]:
def create_map_markers(row, map_name):
    folium.Marker(location=[row['lat'], row['lon']], popup=row['place']).add_to(map_name)

Before we apply this function to our dataframe, we're going to drop any locations that were "Not Found", since `folium.Marker()` would raise an error on those non-numeric coordinates.

In [83]:
found_place_locations = places_df[places_df['address'] != "Not Found"]

In [84]:
found_place_locations.apply(create_map_markers, map_name=places_map, axis='columns')
places_map
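Because `folium.Marker()` only needs a location and popup text, the row-to-marker step can be separated from Folium and checked on plain data. This sketch uses a hypothetical `marker_specs` helper that skips "Not Found" rows and adds each place's mention count to its popup; the rows are a small excerpt of the geocoded table above:

```python
def marker_specs(rows):
    """Turn geocoded rows into marker settings, skipping unmatched places."""
    specs = []
    for row in rows:
        if row["address"] == "Not Found":
            continue  # Folium needs numeric coordinates
        specs.append({
            "location": [row["lat"], row["lon"]],
            "popup": f"{row['place']} ({row['count']} mentions)",
        })
    return specs

# Excerpt of the geocoded table, as plain dicts
rows = [
    {"place": "Mexico", "count": 19, "address": "México",
     "lat": 22.5, "lon": -100.0},
    {"place": "Valmonde", "count": 6, "address": "Not Found",
     "lat": "Not Found", "lon": "Not Found"},
    {"place": "New Orleans", "count": 11,
     "address": "New Orleans, Orleans Parish, Louisiana, United States of America",
     "lat": 29.9499, "lon": -90.0701},
]

specs = marker_specs(rows)
for spec in specs:
    print(spec)
```

With Folium, each spec would become `folium.Marker(location=spec["location"], popup=spec["popup"]).add_to(places_map)`, and `places_df.to_dict("records")` produces rows in this shape.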

## Conclusion

*Essays will conclude with final insights and possibilities for future work in the area*