Chemical Exposure Information Corpus

Chemical Exposure Information Corpus for text classification.

The Chemical Exposure Information (CEI) Corpus consists of 3661 PubMed publication abstracts manually annotated by experts according to a taxonomy. The taxonomy consists of 32 classes arranged in a hierarchy. Each sentence in the corpus is assigned zero or more class labels. The labels are found under the "labels" directory, while the tokenized text is found under the "text" directory. Each file is named after the corresponding PubMed ID (PMID).
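
The layout described above (a "text" and a "labels" directory, with files named by PMID) suggests pairing the two directories by file name. The following is a minimal sketch of such a loader, not part of the corpus distribution. It assumes that the PMID is the file name without its extension and that files are UTF-8 encoded; the function names `index_by_pmid` and `load_corpus` are hypothetical. The sketch returns raw lines only, because the exact per-line format of the label files is not specified here.

```python
from pathlib import Path


def index_by_pmid(directory):
    """Map each PMID (assumed to be the file name without extension) to its path."""
    return {p.stem: p for p in Path(directory).iterdir() if p.is_file()}


def load_corpus(root):
    """Yield (pmid, text_lines, label_lines) for PMIDs present in both directories.

    Assumption: the corpus root contains "text" and "labels" subdirectories,
    as described in this README. Lines are returned unparsed.
    """
    root = Path(root)
    texts = index_by_pmid(root / "text")
    labels = index_by_pmid(root / "labels")
    # Only keep abstracts that have both a text file and a label file.
    for pmid in sorted(texts.keys() & labels.keys()):
        text_lines = texts[pmid].read_text(encoding="utf-8").splitlines()
        label_lines = labels[pmid].read_text(encoding="utf-8").splitlines()
        yield pmid, text_lines, label_lines
```

A caller could iterate with `for pmid, text_lines, label_lines in load_corpus("path/to/corpus"):` and then inspect a few files to decide how sentences and labels line up before parsing further.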


We are constantly exposed to a large number of chemicals present in food, water, air, dust, soil and consumer products. These chemicals enter our bodies via several routes: ingestion, inhalation and dermal absorption. Many of these chemicals are known or suspected to have toxic effects that can cause disorders and diseases. Chemical risk assessment is the process of evaluating such risks, and includes exposure assessment. Exposure assessment methods include both indirect methods, such as exposure modelling and exposure calculations based on environmental measurements and questionnaire data, and direct measurements, such as human biomonitoring (HBM) and personal monitoring. HBM is the measurement of exposure biomarkers (chemicals or chemical metabolites) and effect biomarkers (indicators of effects caused by chemical exposure) in human body tissues or fluids, such as blood, hair and urine. To support assessing the total exposure to a chemical and evaluating the importance of different exposure routes, we have annotated a corpus of 3686 scientific publication abstracts with a novel classification taxonomy specific to exposure assessment. The taxonomy is divided into two main branches: Biomonitoring and Exposure routes.

Please cite the following publication

We would very much appreciate it if you cited our publication:

Text mining for improved exposure assessment

  title={Text mining for improved exposure assessment},
  author={Larsson, Kristin and Baker, Simon and Silins, Ilona and Guo, Yufan and Stenius, Ulla and Korhonen, Anna and Berglund, Marika},
  journal={PloS one},
  publisher={Public Library of Science}