Upload greek-legal-code dataset #2966

christospi · 2021-09-25T16:52:15Z

No description provided.

lhoestq

Cool ! Thanks for adding this one :)

I added a few suggestions in the comments. My main concern is that it would be nice to have the list of all the classes. This is useful to define the id2label parameter when instantiating a model for text classification.
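To illustrate why the class list matters: a text classification model needs a fixed mapping between integer ids and class names. Below is a minimal sketch, assuming a hypothetical three-name list (the real class names are not given in this thread, so `VOLUME_A` etc. are placeholders):

```python
# Hypothetical class names, for illustration only; the real list would come
# from the dataset itself.
class_names = ["VOLUME_A", "VOLUME_B", "VOLUME_C"]

# Fixed, order-dependent mappings between integer ids and class names.
id2label = {i: name for i, name in enumerate(class_names)}
label2id = {name: i for i, name in id2label.items()}

# Decode a (made-up) vector of model scores: take the index of the highest
# score and look up its class name.
scores = [0.1, 0.7, 0.2]
predicted_id = max(range(len(scores)), key=scores.__getitem__)
predicted_label = id2label[predicted_id]
print(predicted_label)
```

Dictionaries of this shape are what one would typically pass as `id2label` / `label2id` when instantiating a classification model; without a published class list, every user has to rebuild them from the data.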

lhoestq · 2021-09-29T13:18:31Z

datasets/greek_legal_code/greek_legal_code.py

+            # License for the dataset if available
+            license=_LICENSE,
+            # Citation for the dataset
+            citation=_CITATION,


we can add a text classification template:

Suggested change

-            citation=_CITATION,
+            citation=_CITATION,
+            task_templates=[TextClassification(text_column="text", label_column="label")],

lhoestq · 2021-09-29T13:22:04Z

datasets/greek_legal_code/greek_legal_code.py

+        features = datasets.Features(
+            {
+                "text": datasets.Value("string"),
+                "label": datasets.Value("string"),


it would be nice to have the list of the possible labels here (even though they can be numerous in the "subject" case):

Suggested change

-                "label": datasets.Value("string"),
+                "label": datasets.ClassLabel(names=[...]),

lhoestq · 2021-09-29T13:23:37Z

datasets/greek_legal_code/README.md

+`text`: (**str**)  The full content of each document, which is represented by its `header` and `articles` (i.e., the `main_body`).\
+`volume`: (**str**)  The volume-level class it belongs to.\
+`chapter`: (**str**)  The chapter-level class it belongs to.\
+`subject`: (**str**)  The subject-level class it belongs to.


maybe we can list the name of the classes here ? Or at least for the "volume" case ?
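One way to obtain such a list is to scan the raw records and collect the distinct values per label level. The following is only a sketch: the field names follow the README excerpt above, but the rows and the helper `collect_class_names` are hypothetical and not part of the PR.

```python
import json

# Hypothetical JSON Lines rows; the values are made up for illustration.
rows = [
    '{"text": "doc 1", "volume": "V2", "chapter": "C5", "subject": "S9"}',
    '{"text": "doc 2", "volume": "V1", "chapter": "C5", "subject": "S3"}',
    '{"text": "doc 3", "volume": "V2", "chapter": "C1", "subject": "S7"}',
]


def collect_class_names(lines, levels=("volume", "chapter", "subject")):
    """Return a sorted list of distinct class names for each label level."""
    seen = {level: set() for level in levels}
    for line in lines:
        record = json.loads(line)
        for level in levels:
            seen[level].add(record[level])
    # Sorting makes the name -> id assignment reproducible across runs.
    return {level: sorted(names) for level, names in seen.items()}


class_names = collect_class_names(rows)
print(class_names["volume"])
```

The sorted lists could then be pasted into the dataset card, or used to fill a `names=[...]` placeholder in the loading script.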

albertvillanova

Thanks for adding this dataset @christospi.

I found a potential issue (see description below).

albertvillanova · 2021-10-04T06:20:36Z

datasets/greek_legal_code/greek_legal_code.py

+                data = json.loads(row)
+                yield id_, {
+                    "text": data["text"],
+                    "label": data[self.config.label_type],


I think there is an issue in the "label" field: the dataset will be identically repeated 3 times, once for each of the configuration values.

iliaschalkidis · 2021-10-09

Hi @albertvillanova, thanks for reviewing.

Why is this an issue?

The goal is for users to be able to choose among 3 configurations of the very same dataset, depending on the label set they want to examine.

e.g.,

    from datasets import load_dataset

    # Load the dataset including the volume labels
    dataset = load_dataset('greek_legal_code', label_type='volume')

    # Load the dataset including the chapter labels
    dataset = load_dataset('greek_legal_code', label_type='chapter')

    # Load the dataset including the subject labels
    dataset = load_dataset('greek_legal_code', label_type='subject')

lhoestq · 2021-10-13

I think it's fine this way. The texts are the same and just the labels are changing depending on the granularity that you need for classification.
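The behaviour discussed in this thread can be sketched without the library: the same rows are read for every configuration, and only the field copied into `label` changes. This is a simplified stand-in for the generator in the diff above, with made-up rows and a hypothetical `generate_examples` function, not the actual loading script.

```python
import json

# Hypothetical JSON Lines rows; the values are made up for illustration.
rows = [
    '{"text": "doc 1", "volume": "V2", "chapter": "C5", "subject": "S9"}',
    '{"text": "doc 2", "volume": "V1", "chapter": "C5", "subject": "S3"}',
]

LABEL_TYPES = ("volume", "chapter", "subject")


def generate_examples(lines, label_type):
    """Yield (id, example) pairs; `label_type` selects the label field."""
    if label_type not in LABEL_TYPES:
        raise ValueError(f"unknown label_type: {label_type!r}")
    for id_, line in enumerate(lines):
        data = json.loads(line)
        yield id_, {"text": data["text"], "label": data[label_type]}


# Build one list of examples per configuration.
by_config = {
    label_type: [example for _, example in generate_examples(rows, label_type)]
    for label_type in LABEL_TYPES
}

# The texts are identical across configurations; only the labels differ.
for label_type, examples in by_config.items():
    print(label_type, [example["label"] for example in examples])
```

So each configuration does repeat the text, as noted in the review, but it exposes a different label granularity for the same documents.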

christospi · 2021-10-11T16:22:27Z

@albertvillanova @lhoestq thank you very much for reviewing! 🤗

I've pushed some updates/changes as requested.

lhoestq

Thanks !

Let me just make some final minor changes in the dataset card:

datasets/greek_legal_code/README.md

christospi force-pushed the add-greek-legal-code-dataset branch from f5cfdd5 to 775e71b · September 25, 2021 17:10

lhoestq reviewed Sep 29, 2021

albertvillanova requested changes Oct 4, 2021

christospi added 2 commits October 11, 2021 18:55

Upload greek-legal-code dataset

dc8be94

Merge remote-tracking branch 'upstream/master' into add-greek-legal-code-dataset

2ecbeac

christospi force-pushed the add-greek-legal-code-dataset branch from ab2d82a to 2ecbeac · October 11, 2021 15:57

lhoestq approved these changes Oct 13, 2021

Apply suggestions from code review

6f5905c

lhoestq merged commit 647712f into huggingface:master Oct 13, 2021

christospi commented Sep 25, 2021

lhoestq left a comment

lhoestq Sep 29, 2021

christospi Oct 9, 2021

lhoestq Sep 29, 2021

christospi Oct 9, 2021

lhoestq Sep 29, 2021

christospi Oct 9, 2021

albertvillanova left a comment

albertvillanova Oct 4, 2021

iliaschalkidis Oct 9, 2021

lhoestq Oct 13, 2021

christospi commented Oct 11, 2021

lhoestq left a comment
