Provides "social media"-like natural language data for training health-related classification models in German language

michael-eble/nlp-dataset-health-german-language

NLP Data Set: German language data for projects concerning digital health

License: CC-BY-4.0, see https://choosealicense.com/licenses/cc-by-4.0/

Purpose and characteristics of the data set

  • Provides "social media"-like natural language data for training classification models in the digital health sector
  • Motivation: few publicly available data sets cover both the German language and health topics
  • Each record in the text data carries a binary label: 1 = sentence is health-related, 0 = sentence is not health-related
  • "'Social media'-like" means that you can expect rather short sentences, misspelled words, missing punctuation, etc.
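
The labelling scheme above could be read into Python roughly as follows. This is a minimal sketch, not the repository's own loader: the CSV layout, the column names `text` and `label`, and the helper `load_records` are assumptions, and the two sample sentences are invented for illustration rather than taken from the data set.

```python
import csv
import io

# Hypothetical layout: one record per row, with columns "text" and "label".
# The sentences below are invented for illustration, not taken from the data set.
SAMPLE = """text,label
hab seit gestern kopfschmerzen und fieber,1
morgen endlich wieder fussball schauen,0
"""


def load_records(handle):
    """Return a list of (sentence, label) pairs, with label as int 0 or 1."""
    records = []
    for row in csv.DictReader(handle):
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label}")
        records.append((row["text"], label))
    return records


records = load_records(io.StringIO(SAMPLE))

# Keep only the sentences labelled as health-related (label 1).
health = [text for text, label in records if label == 1]
```

To use it with a real file, pass an open file handle instead of the `io.StringIO` wrapper and adjust the column names to whatever the actual file uses.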

Current status of the data set

  • Number of text data records with label "related to health": 503
  • Number of text data records with label "not related to health": 503
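
With 503 records per label, the two classes are evenly balanced, so a classifier that always predicts one class would reach about 50% accuracy. The sketch below makes that arithmetic explicit; the helper `class_shares` is a hypothetical name, and the label list is rebuilt from the counts above rather than read from the data files.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of records per label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Label list rebuilt from the counts stated above: 503 records per class.
labels = [1] * 503 + [0] * 503

shares = class_shares(labels)

# Accuracy of a trivial model that always predicts the most frequent class.
majority_baseline = max(shares.values())
```

Any trained model should clearly beat this baseline before its results are worth reporting.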
