GitHub - SelinaMeyer/GLoHBCD: Repository for the German Language of Health Behaviour Change Dataset

This repository provides code to replicate the German Language of Health Behaviour Change Dataset (GLoHBCD) and to run some machine learning experiments on the data.

How can I replicate the dataset?

You can find the web crawler and the script to preprocess the data in the Crawler folder. To run the crawler, you will need to install Scrapy:

pip install scrapy

Then, after navigating to the Crawler folder, execute:

scrapy crawl AbnehmenOhneOp -O abnehmenOhneOp.json

scrapy crawl PsychoTherapie -O psychoTherapie.json

Running both commands produces two JSON files (abnehmenOhneOp.json and psychoTherapie.json), which can be processed and mapped to their corresponding annotations in Preprocessing.ipynb. In addition to the complete dataset, Preprocessing.ipynb also produces all files needed later for the machine learning experiments.
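Before opening the notebook, it can help to check that both crawls actually produced data. The following is a minimal sketch, not part of the repository: it relies only on the fact that Scrapy's JSON feed export writes a single JSON array of items. The helper names `load_crawl` and `summarize` are hypothetical, and because the item fields are not documented here, the script only reports whichever keys it finds.

```python
import json
from pathlib import Path


def load_crawl(path):
    """Load a Scrapy JSON feed export, which is a single JSON array of items."""
    with Path(path).open(encoding="utf-8") as handle:
        items = json.load(handle)
    if not isinstance(items, list):
        raise ValueError(f"{path}: expected a JSON array of scraped items")
    return items


def summarize(items):
    """Return the item count and the sorted field names seen across dict items."""
    fields = sorted({key for item in items if isinstance(item, dict) for key in item})
    return len(items), fields


if __name__ == "__main__":
    # File names match the -O arguments of the two crawl commands above.
    for name in ("abnehmenOhneOp.json", "psychoTherapie.json"):
        if Path(name).exists():
            count, fields = summarize(load_crawl(name))
            print(f"{name}: {count} items, fields: {fields}")
        else:
            print(f"{name}: not found, run the crawler first")
```

Run it from the Crawler folder after both crawls have finished. An empty array usually means the spider ran but scraped nothing.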

What do the labels mean?

A description of the labels used for annotation can be found here

Where do I find code for the experiments?

In the Experiments folder! To run them, you will need the files produced by Preprocessing.ipynb. There are three scripts, one for each annotation level (Label, Sublabel, and Valence). They all do roughly the same thing but take different data files.

Citation

A paper describing this dataset and the experiments has been accepted at LREC 2022. The paper will be linked here and the reference will be updated once the proceedings are published. For now, when using the dataset, please cite:

@InProceedings{meyer-elsweiler:2022:LREC,
  author    = {Meyer, Selina and Elsweiler, David},
  title     = {GLoHBCD: A Naturalistic German Dataset for Language of Health Behaviour Change on Online Support Forums},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {2226--2235},
  url       = {https://aclanthology.org/2022.lrec-1.239}
}
