The NorSynthClinical PHI Corpus

A synthetic corpus of Norwegian clinical text that can be used as a reference standard for de-identification.

The reference standard corpus is synthetic and based on the NorSynthClinical corpus. NorSynthClinical is a synthetic corpus describing patients’ family history relating to cases of cardiac disease, presented and described here: https://github.com/ltgoslo/NorSynthClinical.

To create this reference standard, the NorSynthClinical corpus was extended with personal information and then annotated with the following tags:

First_Name
Last_Name
Age
Health_Care_Unit
Phone_Number
Social_Security_Number
Date_Full
Date_Part
Location

The verified version of the reference standard is the "reference_standard_annotated.txt" file.
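The repository also contains a "reference_standard_annotated.conll" file. As a minimal sketch of how the tags above could be counted in such a file, the following assumes the common CoNLL layout: one token per line, the label in the last whitespace-separated column, a blank line between sentences, and an optional BIO prefix ("B-", "I-") on labels. This README does not confirm that layout, so check it against the actual file first. The function names and the sample lines are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# The nine tags listed in this README.
PHI_TAGS = {
    "First_Name", "Last_Name", "Age", "Health_Care_Unit", "Phone_Number",
    "Social_Security_Number", "Date_Full", "Date_Part", "Location",
}


def read_conll(lines: Iterable[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield sentences as lists of (token, label) pairs.

    Assumed layout (not confirmed by the README): one token per line,
    label in the last column, blank line between sentences.
    """
    sentence: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split()
        token, label = columns[0], columns[-1]
        # Drop an optional BIO prefix such as "B-" or "I-" (an assumption).
        if label[:2] in ("B-", "I-"):
            label = label[2:]
        sentence.append((token, label))
    if sentence:
        yield sentence


def count_phi_tags(lines: Iterable[str]) -> Counter:
    """Count tokens per PHI tag (token-level, not entity-level)."""
    counts: Counter = Counter()
    for sentence in read_conll(lines):
        for _, label in sentence:
            if label in PHI_TAGS:
                counts[label] += 1
    return counts


# Invented sample lines, only to illustrate the assumed layout.
sample = [
    "Pasienten O",
    "Kari B-First_Name",
    "Nordmann B-Last_Name",
    "",
    "Alder O",
    "45 Age",
]
print(count_phi_tags(sample))
```

To run it on the real file, pass an open file handle (opened with UTF-8 encoding) to `count_phi_tags` instead of `sample`. The counts are per token, so a multi-token entity contributes more than once.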

The reference standard was created by Synnøve Bråten as part of a master's thesis in the Joint Master's Programme in Health Informatics at Stockholm University/Karolinska Institutet. The thesis describing the work is available here: https://daisy.dsv.su.se/fil/visa?id=230054.

See also Bråten, S., Wie, W., & Dalianis, H. (2021). Creating and Evaluating a Synthetic Norwegian Clinical Corpus for De-Identification. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa) (pp. 222-230). https://aclanthology.org/2021.nodalida-main.22.pdf

Files in the repository:

Annotation_guidelines.pdf
README.md
reference_standard_annotated.conll
reference_standard_annotated.txt
reference_standard_annotated.zip
reference_standard_not_annotated.txt
reference_standard_not_annotated.zip
