add swedish_medical_ner dataset #2940

bwang482 · 2021-09-17T20:03:05Z

Adding the Swedish Medical NER dataset, listed in "Biomedical Datasets - BigScience Workshop 2021"

lhoestq

Cool thank you !

I just added a few comments:

lhoestq · 2021-09-21T09:25:53Z

datasets/swedish_medical_ner/README.md

+language_creators:
+- found
+languages:
+  - sv-SE


Suggested change

-  - sv-SE
+- sv-SE

lhoestq · 2021-09-21T09:28:11Z

datasets/swedish_medical_ner/README.md

+})
+
+In: data['train'][0]['sentence']
+Out: '{kropp} beskrivs i till exempel människokroppen, anatomi och f'


Maybe we can remove the parentheses, brackets, and curly brackets in _generate_examples?
This way people can start training models without further preprocessing.

I will leave this to the end users. I am not sure whether I should edit the original data much, given the data licence. As of now, the index fields match the brackets in the sentences.

OK, since the indices match the text with the brackets, it's fine.
Note that the data license shouldn't be an issue if we want to add further data processing in the script.
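For end users who want marker-free text, the preprocessing discussed in this thread could be sketched roughly as follows. This is only an illustration: `strip_entity_markers` is a hypothetical helper, not part of the dataset script, and it assumes the only markers to remove are the parentheses, square brackets, and curly brackets mentioned above.

```python
import re

# Hypothetical helper (not part of the dataset script): removes the
# parentheses, square brackets, and curly brackets around entity mentions.
_MARKERS = re.compile(r"[()\[\]{}]")


def strip_entity_markers(sentence: str) -> str:
    """Return the sentence with all bracket characters removed."""
    return _MARKERS.sub("", sentence)


# Example sentence taken from the README excerpt quoted in this review.
example = "{kropp} beskrivs i till exempel människokroppen, anatomi och f"
print(strip_entity_markers(example))
# prints: kropp beskrivs i till exempel människokroppen, anatomi och f
```

Since the index fields match the text with the brackets, they would no longer line up with the stripped sentence, so any offsets would need to be recomputed after this step.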

lhoestq

Thank you ! It looks all good now :)

I just did some minor changes

add swedish_medical_ner dataset

1ef0196

bwang482 mentioned this pull request Sep 17, 2021

adding swedish_medical_ner #2873

Closed

lhoestq reviewed Sep 21, 2021

bwang482 added 2 commits September 28, 2021 22:58

update swedish_medical_ner

2570664

Merge remote-tracking branch 'upstream/master' into swedish_medical_ner

02f9cd0

lhoestq approved these changes Oct 5, 2021

Apply suggestions from code review

d93a761

lhoestq merged commit fdc02f3 into huggingface:master Oct 5, 2021

bwang482 commented Sep 17, 2021

lhoestq left a comment

lhoestq Sep 21, 2021

bwang482 Sep 29, 2021

lhoestq Sep 21, 2021

bwang482 Sep 29, 2021

lhoestq Oct 5, 2021

lhoestq left a comment
