# Extracting Place Names from Other Fields

This tutorial scans two columns of a CSV file (`Title` and `Description`) for place names that spaCy recognizes and writes the matches to a separate field.

## 1. Install spaCy

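If you do not have spaCy installed yet, run ONE of the following sets of commands. These are shell commands rather than Python, so pasting them uncommented into a Python cell raises a `SyntaxError`. Run them in a terminal, or from a notebook cell using IPython's `%pip` and `%conda` magics and a leading `!` for the `python -m` line.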

In [4]:
# pip install -U pip setuptools wheel
# pip install -U spacy
# python -m spacy download en_core_web_sm

# OR

# conda install -c conda-forge spacy
# python -m spacy download en_core_web_sm
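
Once one of the command sets above has run, a quick check can confirm that both the library and the model are importable. The sketch below is not part of the original notebook. The helper name `is_installed` is made up for illustration, and the check relies on `spacy download` installing the model as a regular Python package.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None

# Report on the library and on the model package used in this tutorial.
for name in ("spacy", "en_core_web_sm"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports `missing`, rerun the matching install command before continuing.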


## 2. Import modules and models

In [2]:
import spacy
import pandas as pd

# Load English language model for spaCy
nlp = spacy.load("en_core_web_sm")
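
If the model has not been downloaded, `spacy.load` raises an `OSError`. As a hedged sketch, the load can be wrapped so that the notebook prints a hint instead of a traceback. `load_model` is a hypothetical helper, not a spaCy function, and returning `None` on failure is just one possible design.

```python
def load_model(name="en_core_web_sm"):
    """Return the loaded spaCy pipeline, or None if it is unavailable."""
    try:
        import spacy  # imported here so a missing install is reported, not raised
        return spacy.load(name)
    except ImportError:
        print("spaCy is not installed; see step 1.")
    except OSError:
        print(f"Model '{name}' not found; run: python -m spacy download {name}")
    return None

model = load_model()
```

Callers need to check for `None` before using the result, so this pattern suits interactive notebooks better than unattended scripts, where letting the exception propagate is usually clearer.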

## 3. Define function to extract place names from a text

In [3]:
def extract_place_names(text):
    doc = nlp(text)
    place_names = []
    for ent in doc.ents:
        if ent.label_ == "GPE":  # GPE marks geopolitical entities such as countries, states, and cities
            place_names.append(ent.text)
    return place_names
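
The same place can appear in `doc.ents` more than once, and the model's label set also includes `LOC` (non-political locations such as mountain ranges and bodies of water) and `FAC` (facilities). As a sketch of one possible variation, not the tutorial's own method, the hypothetical helper below filters `(text, label)` pairs by a configurable label set and drops duplicates while keeping their order. The sample pairs are hand-written stand-ins for real model output.

```python
def filter_place_entities(entities, labels=("GPE",)):
    """Keep entity texts whose label is in `labels`, without duplicates."""
    seen = set()
    kept = []
    for text, label in entities:
        if label in labels and text not in seen:
            seen.add(text)
            kept.append(text)
    return kept

# Hand-written stand-in for [(ent.text, ent.label_) for ent in nlp(text).ents]
sample = [
    ("Pierce County", "GPE"),
    ("2012", "DATE"),
    ("Wisconsin", "GPE"),
    ("Pierce County", "GPE"),
    ("Lake Michigan", "LOC"),
]

print(filter_place_entities(sample))
# ['Pierce County', 'Wisconsin']
print(filter_place_entities(sample, labels=("GPE", "LOC")))
# ['Pierce County', 'Wisconsin', 'Lake Michigan']
```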

## 4. Open a CSV file and load it into a pandas DataFrame

There are only two columns in the sample CSV (missing-spatial-coverage.csv), `Title` and `Description`. Our goal is a new column just for place names called `Spatial Coverage`.

In [4]:
df = pd.read_csv("missing-spatial-coverage.csv")
print(df)

                                      Title  \
0  Municipalities Outagamie County, WI 2015   
1     Municipalities Pierce County, WI 2012   
2    Municipalities Portage County, WI 2013   

                                         Description  
0  This polygon data layer represents municipalit...  
1  This data layer represents municipalities for ...  
2  This data layer represents municipalities for ...  
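
The loop in step 6 looks up `Title` and `Description` by name, so a misspelled header would raise a `KeyError`. A minimal sketch of a header check using the standard-library `csv` module follows. `missing_columns` and `REQUIRED_COLUMNS` are illustrative names, and the in-memory sample uses a placeholder description rather than the real file contents.

```python
import csv
import io

REQUIRED_COLUMNS = {"Title", "Description"}

def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return sorted(set(required) - header)

# In-memory stand-in for the sample file; the description is a placeholder.
sample = io.StringIO(
    "Title,Description\n"
    '"Municipalities Outagamie County, WI 2015",placeholder description\n'
)
print(missing_columns(sample))  # an empty list means both columns are present

# With the real file:
# with open("missing-spatial-coverage.csv", newline="") as f:
#     print(missing_columns(f))
```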


## 5. Create an empty list for the coming loop

Before running the loop, create an empty list to hold the results. Here we name it `places` and initialize it with `[]`.

In [5]:
places = []
print(places)

[]


## 6. Loop over the Title and Description columns

For each row, scan the title for place names, fall back to the description if the title yields none, and append the result to the list called `places`. The list is printed after every row so you can watch it grow.

In [6]:
for i, row in df.iterrows():
    title = row["Title"]
    desc = row["Description"]
    place_names = extract_place_names(title)  # look for place names in the title field first
    if not place_names:  # if no place names were found in the title field
        place_names = extract_place_names(desc)  # look for place names in the description field
    places.append("|".join(place_names))  # join multiple place names in one cell, separated by a pipe
    print(places)

['Outagamie County']
['Outagamie County', 'Pierce County|Wisconsin']
['Outagamie County', 'Pierce County|Wisconsin', 'Portage County']
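
The fallback rule in the loop (title first, description only if the title yields nothing) can be isolated in a small function, which makes it easy to try out without loading a model. This is a sketch under stated assumptions: `spatial_coverage` and `toy_extract` are hypothetical names, and `toy_extract` is a crude stand-in for `extract_place_names` that only matches a fixed list.

```python
def spatial_coverage(title, description, extract):
    """Join place names from the title, or from the description as a fallback."""
    names = extract(title) or extract(description)
    return "|".join(names)

# Hypothetical stand-in for extract_place_names: matches a fixed list of names.
KNOWN_PLACES = ["Portage County", "Wisconsin"]

def toy_extract(text):
    return [name for name in KNOWN_PLACES if name in text]

print(spatial_coverage("Municipalities Portage County, WI 2013", "", toy_extract))
# Portage County
print(spatial_coverage("Municipal boundaries 2013", "Covers Portage County, Wisconsin.", toy_extract))
# Portage County|Wisconsin
print(repr(spatial_coverage("Untitled", "No location given.", toy_extract)))
# ''
```

The last call shows that a record with no recognized place produces an empty string, which becomes an empty cell in the output CSV. With the notebook's own function, the same helper could be applied as `[spatial_coverage(t, d, extract_place_names) for t, d in zip(df["Title"], df["Description"])]`.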


## 7. Add the extracted places to the dataframe

Next, add the values from the list to the dataframe as a new column called `Spatial Coverage`.

In [7]:
df["Spatial Coverage"] = places
print(df["Spatial Coverage"])

0           Outagamie County
1    Pierce County|Wisconsin
2             Portage County
Name: Spatial Coverage, dtype: object


## 8. Save the results to a new CSV file

In [8]:
df.to_csv("spacy-output.csv", index=False)

## 9. Inspect the new CSV file

In practice, you will likely open the generated CSV file in a spreadsheet editor to prepare the metadata for publishing. For now, let's take a look at it within this notebook using the pandas `read_csv` function.

In [9]:
new_csv = pd.read_csv("spacy-output.csv")
new_csv.head(3)  # display the first 3 rows

                                      Title                                        Description         Spatial Coverage
0  Municipalities Outagamie County, WI 2015  This polygon data layer represents municipalit...         Outagamie County
1     Municipalities Pierce County, WI 2012  This data layer represents municipalities for ...  Pierce County|Wisconsin
2    Municipalities Portage County, WI 2013  This data layer represents municipalities for ...           Portage County
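
If a later script needs the individual place names again, the pipe-delimited cell can be split back into a list. The following is a minimal sketch using the standard-library `csv` module. `read_spatial_coverage` is a hypothetical helper, and the in-memory row is a stand-in with a placeholder description rather than the real file contents.

```python
import csv
import io

def read_spatial_coverage(csv_file, column="Spatial Coverage", sep="|"):
    """Map each title to its list of place names (empty list for an empty cell)."""
    coverage = {}
    for row in csv.DictReader(csv_file):
        value = row[column]
        coverage[row["Title"]] = value.split(sep) if value else []
    return coverage

# In-memory stand-in for spacy-output.csv; the description is a placeholder.
sample = io.StringIO(
    "Title,Description,Spatial Coverage\n"
    '"Municipalities Pierce County, WI 2012",placeholder,Pierce County|Wisconsin\n'
)
for title, names in read_spatial_coverage(sample).items():
    print(title, "->", names)
# Municipalities Pierce County, WI 2012 -> ['Pierce County', 'Wisconsin']
```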
