Toponym extraction

This repo contains:

Tools for extracting toponyms (and lemmata) from newspaper articles downloaded from LexisNexis.
The results collected with these tools for a case study of toponyms in Dutch newspaper coverage of Brexit.
A short write-up of this case study. Check out the interactive map here.

Workflow

Tools

Three main scripts were used to generate the data for this case study. Each script contains further documentation on how it should be used:

Build NER model: Create a spaCy NER model for extracting toponyms
Build data set: Extract text and metadata from LexisNexis files
Extract toponyms: Apply the model to the data set and extract statistics from it

The PhraseAnnotator in annotation_tools can be used to annotate the NER results.

Results

The tools currently extract two main statistics for each geographical category defined in the [MODEL] section of config.ini:

Total frequency
Article counts
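
The difference between the two statistics can be illustrated with a minimal sketch. It assumes each article has already been reduced to a list of (toponym, category) pairs. The category names, the sample data and the function name count_toponyms are hypothetical and are not taken from this repo's code or config.ini:

```python
from collections import Counter

# Hypothetical input: one list of (toponym, category) pairs per article.
articles = [
    [("Londen", "cities"), ("Londen", "cities"), ("Ierland", "countries")],
    [("Brussel", "cities"), ("Ierland", "countries")],
    [("Londen", "cities")],
]


def count_toponyms(articles):
    """Return (total_frequency, article_counts), both keyed by (category, toponym)."""
    total_frequency = Counter()
    article_counts = Counter()
    for pairs in articles:
        keys = [(category, toponym) for toponym, category in pairs]
        total_frequency.update(keys)      # every mention counts
        article_counts.update(set(keys))  # at most one count per article
    return total_frequency, article_counts


total_frequency, article_counts = count_toponyms(articles)
for key in sorted(total_frequency):
    print(key, total_frequency[key], article_counts[key])
```

In this sample "Londen" has a total frequency of 3 but an article count of 2, because two of its mentions occur in the same article.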

The scripts generally store their results in Python's pickle format. To make the results of this study more widely available, the following data has been added to the repo as csv-files (some of them zipped):

The metadata for the LexisNexis dataset
The statistics of the toponym recognition
The statistics of the lemmata recognition
The annotation data
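
Going from a pickled result to a zipped csv-file can be sketched with the standard library alone. The sample rows, column names and file name below are made up for illustration; the repo's own scripts may store and export their results differently:

```python
import csv
import io
import pickle
import zipfile

# Hypothetical result rows; the real column names may differ.
rows = [
    {"toponym": "Londen", "total_frequency": 3, "article_count": 2},
    {"toponym": "Brussel", "total_frequency": 1, "article_count": 1},
]

# Round-trip through pickle, standing in for a result file written by a script.
restored = pickle.loads(pickle.dumps(rows))

# Write the rows as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(restored[0]))
writer.writeheader()
writer.writerows(restored)

# Store the CSV in a zip archive (kept in memory here; pass a path to write to disk).
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("toponym_stats.csv", buffer.getvalue())

# Read the zipped CSV back to check that nothing was lost.
with zipfile.ZipFile(archive) as zf:
    with zf.open("toponym_stats.csv") as fh:
        reader = csv.DictReader(io.TextIOWrapper(fh, encoding="utf-8", newline=""))
        loaded = list(reader)

print(loaded[0]["toponym"], loaded[0]["total_frequency"])
```

Note that csv stores everything as text, so numeric columns come back as strings and need converting after loading. Pickle files should only be loaded when they come from a trusted source.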

The data and results are also available through an online Jupyter notebook. Access the notebook by clicking this button:

Use pandas and altair to explore the data.


lcvriend/toponym_extraction
