Public code repository for preprocessing the i2b2 2012 dataset.

nyuolab/i2b2_2012_preprocessing

I2B2 2012 Preprocessing

This repo contains code for converting the tar.gz file (downloaded from the n2c2 data portal) to labels_SPLIT.txt and text_SPLIT.txt, where SPLIT is one of train, dev, or test. This data format is compatible with the NeMo TokenClassification model.
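
As a rough illustration of the output format, the sketch below checks that a text file and a labels file stay aligned. It assumes that each line of text_SPLIT.txt holds one whitespace-tokenized sentence and that the matching line of labels_SPLIT.txt holds one label per token. The function name `check_alignment` is hypothetical and is not part of this repo.

```python
def check_alignment(text_lines, label_lines):
    """Return the number of sentences if every text line has one label per token.

    Hypothetical helper for illustration only; assumes whitespace tokenization.
    """
    if len(text_lines) != len(label_lines):
        raise ValueError(
            f"{len(text_lines)} text lines but {len(label_lines)} label lines"
        )
    for number, (text, labels) in enumerate(zip(text_lines, label_lines), start=1):
        n_tokens = len(text.split())
        n_labels = len(labels.split())
        if n_tokens != n_labels:
            raise ValueError(
                f"line {number}: {n_tokens} tokens but {n_labels} labels"
            )
    return len(text_lines)
```

To run it on the generated files, read each file with `open(path, encoding="utf-8").read().splitlines()` and pass both lists in.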

The exact conversion steps are as follows:

  1. Convert .xml file to brat format
  2. Convert brat to bio/iob2 format
  3. Convert bio to nemo-compatible format
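
The last step can be sketched roughly as follows. This is a minimal illustration, not the code in i2b2_2012_preprocessing.py. It assumes the bio data is in CoNLL style (one "token label" pair per line, a blank line between sentences) and that tokens contain no internal whitespace. The function name `bio_to_nemo` and the label names in the example are hypothetical.

```python
def bio_to_nemo(conll_lines):
    """Turn 'token label' lines into parallel lists of text and label lines.

    Hypothetical sketch; assumes well-formed input with one label per token.
    """
    texts, labels = [], []
    tokens, tags = [], []

    def flush():
        # Emit the sentence collected so far, if there is one.
        if tokens:
            texts.append(" ".join(tokens))
            labels.append(" ".join(tags))
            tokens.clear()
            tags.clear()

    for raw in conll_lines:
        line = raw.strip()
        if not line:
            flush()  # a blank line marks a sentence boundary
            continue
        # The label is the last whitespace-separated field on the line.
        token, tag = line.rsplit(maxsplit=1)
        tokens.append(token)
        tags.append(tag)
    flush()  # handle a final sentence with no trailing blank line
    return texts, labels


# Illustrative input; the label names are made up for this example.
sample = [
    "Patient O",
    "reports O",
    "chest B-PROBLEM",
    "pain I-PROBLEM",
    "",
    "Discharged B-OCCURRENCE",
    "today B-DATE",
]
texts, labels = bio_to_nemo(sample)
```

Writing `texts` and `labels` line by line to text_SPLIT.txt and labels_SPLIT.txt would then give one sentence per line, with the labels on the same line number in the labels file.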

Usage

python i2b2_2012_preprocessing.py
