Public repo for the peacekeeping operations corpus (PKOC). Ref: Amicarelli, Elio and Di Salvatore, Jessica, Introducing the PeaceKeeping Operations Corpus (PKOC).

elioamicarelli/peacekeeping_operations_corpus

🦚 About this repo 🦚

This is a public repo for the PeaceKeeping Operations Corpus (PKOC). Here you can find the code and documentation needed to create the corpus from scratch, as well as the data already cooked for you, as described in Amicarelli, Elio and Di Salvatore, Jessica, "Introducing the PeaceKeeping Operations Corpus (PKOC)".

  • The document documentation/PKOC0A03.pdf contains the documentation of the PKOC functions, as well as important information on how to set up your local environment to successfully execute the corpus creation workflow. Read this file.

  • The code needed to create the corpus from scratch is in main/PKOC_main.py (Python 3). At the very end of this file there is a commented section showing the entire workflow for the corpus creation.

  • The data can be downloaded from https://dataverse.harvard.edu/dataverse/pkoc. There you will find three pickle files containing Python dictionaries for the plain, tagged and reduced versions of PKOC (see our paper for details). If you want to cook the data yourself, we have also shared all the reports converted to txt.
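As a minimal sketch of how one of the downloaded pickle files could be opened: the file name `pkoc_plain.pickle` and the helper names `load_corpus` and `describe_corpus` are hypothetical, and nothing is assumed about the dictionary's keys or values beyond it being a Python dictionary. Note that unpickling can execute arbitrary code, so only load files obtained from the Dataverse link above.

```python
import pickle
from pathlib import Path


def load_corpus(path):
    """Load one PKOC pickle file and return the dictionary it contains."""
    path = Path(path)
    with path.open("rb") as handle:
        corpus = pickle.load(handle)
    # The README states the files contain Python dictionaries; fail early otherwise.
    if not isinstance(corpus, dict):
        raise TypeError(f"expected a dict in {path}, got {type(corpus).__name__}")
    return corpus


def describe_corpus(corpus, n_keys=3):
    """Return the number of entries and a small sample of keys."""
    sample_keys = list(corpus)[:n_keys]
    return len(corpus), sample_keys


if __name__ == "__main__":
    # Hypothetical file name: replace it with a file downloaded from the Dataverse.
    corpus_path = Path("pkoc_plain.pickle")
    if corpus_path.exists():
        corpus = load_corpus(corpus_path)
        size, keys = describe_corpus(corpus)
        print(f"{size} entries; first keys: {keys}")
    else:
        print(f"{corpus_path} not found; download the data first")
```

Inspecting a few keys first is a cheap way to learn how the dictionary is organized before processing it further; see the paper and documentation/PKOC0A03.pdf for the actual structure.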

Contributions to this project are very welcome!!!

Notes:

  • The shared data was last updated in June 2020.

  • Please note that this repository may contain reports that are not using the latest version of the corpus.

Useful resources:

https://chromedriver.chromium.org/
