Tools for processing the Corpus of Historical American English (COHA)

suomela/coha-filter

coha-filter

Library for quickly finding data in the Corpus of Historical American English (COHA).

This assumes you have a local copy of COHA on your own computer; we have used the Corpus of Historical American English - Kielipankki download version 2017H1.

The program reads the corpus files as they were provided in the relational database format. You do not need to do any preprocessing, and you do not need a relational database: the program reads the text files directly.

An example: BE going to V and gonna

In examples/coha-be-going-to.rs we have a sample program that searches for the following phrases in the entire COHA corpus:

  • VB*, “going”, “to”, V?I*
  • “gon”, “na”, *
  • “gon”, “na”, V?I*
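To make the pattern notation above concrete, here is a minimal sketch of how such a phrase could be matched against a tagged token sequence. It assumes that quoted items are literal words, that unquoted items are part-of-speech tag patterns in which `?` stands for exactly one character and `*` for any run of characters, and that a bare `*` therefore accepts any token. The names `Token`, `Slot`, `glob_match`, and `find_hits` are hypothetical and are not taken from the coha-filter API; the actual library may work quite differently.

```rust
/// A corpus token: surface form plus part-of-speech tag (illustrative layout).
struct Token<'a> {
    word: &'a str,
    tag: &'a str,
}

/// One position in a search phrase (hypothetical type, not the library's).
enum Slot<'a> {
    /// A literal word such as "going", compared case-insensitively.
    Word(&'a str),
    /// A tag pattern such as "VB*" or "V?I*"; "*" alone accepts any tag.
    Tag(&'a str),
}

/// Returns true if `text` matches `pattern`, where `?` matches exactly one
/// character and `*` matches any run of characters, including none.
fn glob_match(pattern: &str, text: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    // reach[j] is true if the pattern consumed so far can match t[..j].
    let mut reach = vec![false; t.len() + 1];
    reach[0] = true;
    for pc in pattern.chars() {
        let mut next = vec![false; t.len() + 1];
        if pc == '*' {
            // `*` can absorb any number of characters, so every position at
            // or after a reachable one becomes reachable.
            let mut seen = false;
            for j in 0..=t.len() {
                seen = seen || reach[j];
                next[j] = seen;
            }
        } else {
            // `?` or a literal character consumes exactly one character.
            for j in 0..t.len() {
                if reach[j] && (pc == '?' || pc == t[j]) {
                    next[j + 1] = true;
                }
            }
        }
        reach = next;
    }
    reach[t.len()]
}

/// Checks a single token against a single phrase slot.
fn slot_matches(slot: &Slot<'_>, token: &Token<'_>) -> bool {
    match slot {
        Slot::Word(w) => token.word.eq_ignore_ascii_case(*w),
        Slot::Tag(pat) => glob_match(*pat, token.tag),
    }
}

/// Returns the start index of every position where the whole phrase matches.
fn find_hits(tokens: &[Token<'_>], phrase: &[Slot<'_>]) -> Vec<usize> {
    if phrase.is_empty() || tokens.len() < phrase.len() {
        return Vec::new();
    }
    (0..=tokens.len() - phrase.len())
        .filter(|&i| {
            phrase
                .iter()
                .zip(&tokens[i..])
                .all(|(slot, token)| slot_matches(slot, token))
        })
        .collect()
}
```

With tokens for "I am going to leave" carrying illustrative tags such as `VBM` for "am" and `VVI` for "leave", the phrase `[Tag("VB*"), Word("going"), Word("to"), Tag("V?I*")]` would match once, starting at "am". A trailing `*` in a pattern like `V?I*` also tolerates tags that carry extra characters after the part the search cares about.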

If your corpus is in e.g. ~/COHA/ and you would like to store the search results in ~/results/, you can run it like this:

cargo run --release --example coha-be-going-to ~/COHA ~/results

This should take less than half a minute; it will create CSV files in ~/results that are organized by search term and decade. The files will contain the hit and 30 words of context on both sides.
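The "30 words of context on both sides" can be pictured as a window around each hit that is clamped at the edges of the text. The following is a minimal sketch of that idea only; the function name `context`, its signature, and the three-part return value are assumptions made for illustration, and the actual CSV layout produced by the example program is not described here.

```rust
/// Splits a text into (left context, hit, right context) around a hit that
/// starts at word index `start` and spans `len` words. At most `width` words
/// are taken on each side; the window is clamped at the text boundaries.
/// Hypothetical helper, not part of the coha-filter API.
fn context(words: &[&str], start: usize, len: usize, width: usize) -> (String, String, String) {
    // Clamp the hit itself so that out-of-range arguments cannot panic.
    let end = start.saturating_add(len).min(words.len());
    let start = start.min(end);
    // Clamp the window to the beginning and end of the text.
    let left = start.saturating_sub(width);
    let right = end.saturating_add(width).min(words.len());
    (
        words[left..start].join(" "),
        words[start..end].join(" "),
        words[end..right].join(" "),
    )
}
```

For a hit near the start or end of a text, the corresponding side simply contains fewer than `width` words, or is empty.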

Author

Jukka Suomela

Acknowledgements

This was developed in collaboration with Tanja Säily and Florent Perek.
