coha-filter

Library for quickly finding data in the Corpus of Historical American English (COHA).

This assumes that you have a local copy of COHA on your own computer; we have used the Corpus of Historical American English - Kielipankki download version 2017H1.

The program reads the corpus files that are provided in the relational database format. You do not need to do any preprocessing or set up a relational database: the program reads the text files as they are.

An example: BE going to V and gonna

In examples/coha-be-going-to.rs we have a sample program that searches for the following phrases in the entire COHA corpus:

VB*, “going”, “to”, V?I*
“gon”, “na”, *
“gon”, “na”, V?I*
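
To illustrate how phrase patterns like these can be read, here is a minimal sketch in Rust. It is not the code used in this library. It assumes that quoted items are literal word forms (compared case-insensitively), that unquoted items are part-of-speech tag patterns in which `*` matches any run of characters and `?` matches exactly one character, and that tags are compared in upper case. The names `Token`, `Slot`, `glob_match`, and `find_phrase` are hypothetical.

```rust
/// A corpus token with its word form and part-of-speech tag.
struct Token<'a> {
    word: &'a str,
    pos: &'a str,
}

/// One position in a search phrase: a literal word or a tag pattern.
enum Slot<'a> {
    Word(&'a str),
    Pos(&'a str),
}

/// Returns true if `text` matches `pattern`, where `*` matches any run of
/// characters (including none) and `?` matches exactly one character.
fn glob_match(pattern: &str, text: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    // dp[j] is true if the pattern prefix processed so far matches t[..j].
    let mut dp = vec![false; t.len() + 1];
    dp[0] = true;
    for pc in pattern.chars() {
        if pc == '*' {
            // '*' may absorb any number of characters: spread matches rightwards.
            for j in 1..=t.len() {
                dp[j] = dp[j] || dp[j - 1];
            }
        } else {
            // Consume exactly one character; iterate right to left so that
            // dp[j - 1] still holds the value from the previous pattern prefix.
            for j in (1..=t.len()).rev() {
                dp[j] = dp[j - 1] && (pc == '?' || pc == t[j - 1]);
            }
            dp[0] = false;
        }
    }
    dp[t.len()]
}

/// Returns true if a single token satisfies a single slot.
fn slot_matches(slot: &Slot<'_>, token: &Token<'_>) -> bool {
    match slot {
        Slot::Word(w) => token.word.eq_ignore_ascii_case(w),
        Slot::Pos(p) => glob_match(p, &token.pos.to_ascii_uppercase()),
    }
}

/// Returns the start index of every position where the whole phrase matches.
fn find_phrase(tokens: &[Token<'_>], phrase: &[Slot<'_>]) -> Vec<usize> {
    if phrase.is_empty() || tokens.len() < phrase.len() {
        return Vec::new();
    }
    (0..=tokens.len() - phrase.len())
        .filter(|&i| {
            phrase
                .iter()
                .zip(&tokens[i..])
                .all(|(slot, token)| slot_matches(slot, token))
        })
        .collect()
}
```

Under these assumptions, the first phrase above would be written as `[Slot::Pos("VB*"), Slot::Word("going"), Slot::Word("to"), Slot::Pos("V?I*")]`, and a bare `*` as `Slot::Pos("*")`, which accepts any token.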

If your corpus is in e.g. ~/COHA/ and you would like to store the search results in ~/results/, you can run it like this:

cargo run --release --example coha-be-going-to ~/COHA ~/results

This should take less than half a minute. It creates CSV files in ~/results, organized by search term and decade. Each file contains the hits, each with 30 words of context on both sides.
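
The idea of a hit with a fixed amount of context on both sides can be sketched as follows. This is only an illustration under stated assumptions, not the output code of this library: the actual CSV column layout is not described here, so the three-column row (left context, hit, right context) and the names `context_window`, `csv_field`, and `csv_row` are hypothetical.

```rust
/// Splits `items` into (left context, hit, right context) for a hit of `len`
/// items starting at `start`, with at most `width` items of context per side.
/// The window is clipped at the beginning and end of the text.
fn context_window<T>(items: &[T], start: usize, len: usize, width: usize) -> (&[T], &[T], &[T]) {
    let end = start.saturating_add(len).min(items.len());
    let start = start.min(end);
    let left = start.saturating_sub(width);
    let right = end.saturating_add(width).min(items.len());
    (&items[left..start], &items[start..end], &items[end..right])
}

/// Quotes one CSV field, doubling any embedded double quotes.
fn csv_field(s: &str) -> String {
    format!("\"{}\"", s.replace('"', "\"\""))
}

/// Formats one hit as a CSV row: left context, hit, right context.
fn csv_row(words: &[&str], start: usize, len: usize, width: usize) -> String {
    let (left, hit, right) = context_window(words, start, len, width);
    [left, hit, right]
        .iter()
        .map(|part| csv_field(&part.join(" ")))
        .collect::<Vec<_>>()
        .join(",")
}
```

With a width of 30, a hit close to the start or end of a text simply gets a shorter context on that side.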

Author

Jukka Suomela

Acknowledgements

This was developed in collaboration with Tanja Säily and Florent Perek.
