Exploring BERT Models for Dutch Clinical Lifestyle Classification: a thesis project

This repository contains the code for creating and evaluating several string matching and machine learning methods that classify Dutch clinical texts by the patient's smoking, alcohol use and drug use status. The data used in this project cannot be provided due to privacy constraints.
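
To give a rough idea of what a string matching approach to this task can look like, here is a minimal sketch in Python. It is not the code from this repository: the keyword lists, the label names (`user`, `non-user`, `unknown`), the function name `classify_smoking` and the 20-character context window are all hypothetical choices made for illustration only.

```python
import re

# Hypothetical Dutch keyword patterns, chosen for illustration only;
# these are NOT the rules used in the thesis.
SMOKING_TERMS = re.compile(r"\b(?:rook\w*|roker\w*|sigaret\w*|nicotine)\b", re.IGNORECASE)
NEGATION_TERMS = re.compile(r"\b(?:niet|geen|nooit)\b", re.IGNORECASE)

# Assumed size (in characters) of the context inspected around a keyword.
WINDOW = 20


def classify_smoking(text: str) -> str:
    """Return a hypothetical smoking label for one clinical note."""
    match = SMOKING_TERMS.search(text)
    if match is None:
        # No smoking-related keyword found in the note.
        return "unknown"
    # Inspect a small window on both sides of the first keyword,
    # because Dutch negation can precede or follow the verb.
    start = max(0, match.start() - WINDOW)
    context = text[start:match.end() + WINDOW]
    if NEGATION_TERMS.search(context):
        return "non-user"
    return "user"


if __name__ == "__main__":
    for note in (
        "Patiënt rookt 10 sigaretten per dag.",
        "Rookt niet, drinkt af en toe alcohol.",
        "Geen bijzonderheden.",
    ):
        print(note, "->", classify_smoking(note))
```

A rule set like this is easy to inspect but brittle: it only looks at the first keyword and a fixed window, so it will misread longer or more indirect phrasings.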

Our paper on this project has been published in BMC Medical Informatics and Decision Making: https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-024-02557-5 (DOI: https://doi.org/10.1186/s12911-024-02557-5)

Overview

This repo contains the following subfolders:

└───src
    ├───Data Processing and Exploration (code for gathering, filtering and preparing the data used to pre-train and fine-tune our models)
    └───Model Training and Evaluation (code to pre-train and fine-tune multiple BERT models: HAGALBERT is pre-trained from scratch; RobBERT-HAGA, belabBERT-HAGA and MedRoBERTa.nl-HAGA are further pre-trained on our data; BioBERT and ClinicalBERT are only fine-tuned on translated input)

Repository: hielkemuizelaar/clinical-dutch-lifestyle-extraction
