savkov/harvey-corpus

Harvey Corpus Repository

This repository contains the annotation guidelines used to build the Harvey corpus of clinical text, together with redacted versions of the annotations.

Files:

  • Annotation guidelines: guidelines.pdf
  • Syntactic chunk annotation (redacted): annotation/harvey-chunks-redacted.txt
  • Semantic expressions (redacted): annotation/harvey-expressions-redacted.txt

About Harvey

The Harvey corpus is a collection of linguistically annotated, de-identified clinical text. The data consists of primary care patient examination notes (GP notes) with three layers of linguistic annotation. The data was licensed to the PREP project at the University of Sussex and the Brighton and Sussex Medical School. The first annotation layer contains part-of-speech tags automatically assigned by cTAKES. The other two layers consist of manually annotated syntactic chunks and named entities (expressions).

References

@Article{Savkov2016,
author="Savkov, Aleksandar
and Carroll, John
and Koeling, Rob
and Cassell, Jackie",
title="Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus",
journal="Language Resources and Evaluation",
year="2016",
month="Sep",
day="01",
volume="50",
number="3",
pages="523--548",
issn="1574-0218",
doi="10.1007/s10579-015-9330-7",
url="https://doi.org/10.1007/s10579-015-9330-7"
}

Licence

The Harvey Corpus annotations and guidelines are released under the GNU General Public License (GPL).

