| thesis project | Toponym extraction from LexisNexis data using named entity recognition

lcvriend/toponym_extraction

Toponym extraction

This repo contains:

  1. Tools for extracting toponyms (and lemmata) from newspaper articles downloaded from LexisNexis.
  2. The results collected with these tools for a case study of toponyms in Dutch newspaper coverage of Brexit.
  3. A short write-up of this case study. Check out the interactive map here.

Workflow

Tools

Three main scripts were used to generate the data for this case study. Each script contains further documentation on how it should be used.

The PhraseAnnotator in annotation_tools can be used to annotate the NER-results.

Results

For each geographical category defined in the [MODEL] section of config.ini, this tool currently extracts two main statistics:

  1. Total frequency
  2. Article counts
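
The difference between the two statistics can be illustrated with a minimal sketch. It assumes, hypothetically, that the NER output has been reduced to a mapping from article ID to a list of `(toponym, category)` pairs. The names `articles`, `total_frequency` and `article_counts`, the category labels and the sample values are illustrative and not taken from the scripts in this repo.

```python
from collections import Counter

# Hypothetical input: article ID -> recognized (toponym, category) pairs.
articles = {
    "a1": [("London", "uk"), ("London", "uk"), ("Brussel", "eu")],
    "a2": [("London", "uk")],
    "a3": [("Brussel", "eu"), ("Parijs", "eu")],
}

total_frequency = Counter()  # every mention counts
article_counts = Counter()   # each article counts at most once per category

for pairs in articles.values():
    categories = [category for _, category in pairs]
    total_frequency.update(categories)
    article_counts.update(set(categories))

print(dict(total_frequency))
print(dict(article_counts))
```

Total frequency counts every mention, so an article that names a place repeatedly adds to it each time. The article count adds one per article per category, however often that category occurs in the article.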

The scripts generally store their results in Python's pickle format. To make the results of this study more widely available, the following data has been added to the repo as CSV files (some of them zipped):

  1. The metadata for the LexisNexis dataset
  2. The statistics of the toponym recognition
  3. The statistics of the lemmata recognition
  4. The annotation data
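
As a rough sketch of how pickled results can be turned into CSV, the example below round-trips a small list of records through `pickle` and writes it out with `csv.DictWriter`. The record layout and field names (`category`, `total_frequency`, `article_count`) are assumptions made for illustration; the actual pickled objects produced by the scripts may be structured differently.

```python
import csv
import io
import pickle

# Hypothetical pickled results: one dict per geographical category.
rows = [
    {"category": "uk", "total_frequency": 3, "article_count": 2},
    {"category": "eu", "total_frequency": 3, "article_count": 2},
]
pickled = pickle.dumps(rows)  # stands in for a pickle file written by a script

# Load the pickle and write the records out as CSV text.
loaded = pickle.loads(pickled)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(loaded[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(loaded)
csv_text = buffer.getvalue()
print(csv_text)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code. Plain CSV does not carry that risk and can be read without Python.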

The data and results are available through an online Jupyter notebook. Access the notebook by clicking this button:

Binder

Use pandas and altair to explore the data.
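
A first exploration with pandas might look like the sketch below. The column names (`category`, `toponym`, `frequency`) and the values are made up for illustration, so check the header of the actual CSV files in the repo before adapting it.

```python
import io

import pandas as pd

# Hypothetical excerpt; the real column names and values in the repo may differ.
csv_text = """category,toponym,frequency
uk,London,3
uk,Manchester,1
eu,Brussel,2
"""

df = pd.read_csv(io.StringIO(csv_text))

# Total frequency per category, largest first.
per_category = (
    df.groupby("category", as_index=False)["frequency"]
    .sum()
    .sort_values("frequency", ascending=False)
    .reset_index(drop=True)
)
print(per_category)
```

For the zipped files, `pd.read_csv` can read a `.zip` archive directly as long as it contains a single CSV file. An aggregated frame like `per_category` can then be passed to `alt.Chart` to plot it with altair.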
