
hielkemuizelaar/clinical-dutch-lifestyle-extraction


Exploring BERT Models for Dutch Clinical Lifestyle Classification: a thesis project

This repository contains the code used to create and evaluate several string-matching and machine learning methods for classifying Dutch clinical texts according to the patient's smoking, alcohol use and drug use status. The data used in this project cannot be shared due to privacy constraints.
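To illustrate what a string-matching baseline for this task can look like, here is a minimal sketch for smoking status only. The label names (`current`, `past`, `none`, `unknown`), the Dutch patterns and the function name `classify_smoking` are hypothetical illustrations, not the keyword lists or code used in this repository. Patterns for negated and past use are checked before current use, so that a phrase like "rookt niet" is not misread as current smoking.

```python
import re

# Hypothetical Dutch patterns for smoking status (illustrative only, NOT the
# keyword lists used in this repository). Order matters: negation and past use
# are checked before current use.
SMOKING_PATTERNS = [
    ("none", re.compile(r"\b(rookt\s+niet|niet[-\s]?roker|nooit\s+gerookt)\b", re.IGNORECASE)),
    ("past", re.compile(r"\b(ex[-\s]?roker|gestopt\s+met\s+roken)\b", re.IGNORECASE)),
    ("current", re.compile(r"\b(rookt|roker|sigaretten)\b", re.IGNORECASE)),
]


def classify_smoking(text: str) -> str:
    """Return the label of the first pattern that matches, or 'unknown'.

    Hypothetical helper: a plain keyword hit cannot distinguish 'rookt' from
    'rookt niet', so the more specific patterns are tried first.
    """
    for label, pattern in SMOKING_PATTERNS:
        if pattern.search(text):
            return label
    return "unknown"


if __name__ == "__main__":
    for note in ["Patiënt rookt niet.", "Ex-roker sinds 2015", "Rookt 10 sigaretten per dag"]:
        print(note, "->", classify_smoking(note))
```

The same structure would extend to alcohol and drug use with separate pattern lists. Rule ordering is the main design choice here, and it is also the main weakness of such baselines compared with learned classifiers.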

Our paper on this project has been published in BMC Medical Informatics and Decision Making. It is available at https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-024-02557-5 (DOI: https://doi.org/10.1186/s12911-024-02557-5).

Overview

This repository contains the following subfolders:

└───src
    ├───Data Processing and Exploration (code for gathering, filtering and preparing the data used to pre-train and fine-tune our models)
    └───Model Training and Evaluation (code to pre-train and fine-tune multiple BERT models: HAGALBERT is pre-trained from scratch; RobBERT-HAGA, belabBERT-HAGA and MedRoBERTa.nl-HAGA are further pre-trained on our data; BioBERT and ClinicalBERT are only fine-tuned, on input translated to English)
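For BERT-style models, pre-training from scratch and further pre-training on domain data are typically done with masked language modelling. The sketch below shows the standard masking scheme introduced with BERT: select about 15% of the non-special tokens, replace 80% of those with the mask token, replace 10% with a random token and leave 10% unchanged. The function `mask_tokens` and its parameters are hypothetical, and this README does not state the exact masking settings used for HAGALBERT or the -HAGA models.

```python
import random
from typing import List, Optional, Set, Tuple

# Conventional "ignore" label: positions with this value contribute no loss
# (e.g. the default ignore_index of PyTorch's cross-entropy loss).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper applying BERT-style 80/10/10 masking.

    Returns (inputs, labels): labels hold the original token at selected
    positions and IGNORE_INDEX everywhere else.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens such as <s>, </s> or padding.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # Remaining 10%: keep the original token unchanged.
    return inputs, labels


if __name__ == "__main__":
    ids = [0, 512, 77, 1034, 9, 2]  # toy ids; 0 and 2 play the role of <s> and </s>
    print(mask_tokens(ids, mask_id=4, vocab_size=2000, special_ids={0, 2}, rng=random.Random(1)))
```

Keeping 10% of selected tokens unchanged and corrupting another 10% with random tokens means the model cannot rely on the mask token alone to know which positions it is evaluated on, which reduces the mismatch between pre-training inputs and fine-tuning inputs (where no mask tokens occur).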
