Minor change in the code #7

shrinidhin · 2022-03-22T07:22:23Z

Hi! I noticed that in the following code in the preprocess_clausecat.py file, at line 61, in the for loop that splits the dataset into train and test sets,

for label in label_dict:
        split = int(len(label_dict[label]) * eval_split)
        train += label_dict[label][split:]
        dev += label_dict[label][:split]
        checksum += len(label_dict[label])
        table_data.append(
            (
                label,
                len(label_dict[label]),
                len(label_dict[label][split:]),
                len(label_dict[label][:split]),
            )
        )

the train and dev assignment statements need to be interchanged. With the existing assignment, the train set has fewer samples than the dev set. Shouldn't it be the other way round?
Something like this?

train += label_dict[label][:split]
dev += label_dict[label][split:]


thomashacker · 2022-03-22T13:00:56Z

Ah yes, I see why it can be a little confusing but I think the code seems right.
Let's have a look at a small example:

split = int(len(label_dict[label]) * eval_split)

Let's say len(label_dict[label]) = 100
and eval_split = 0.2 (20%)

Then we'd get split = 20

For dev += label_dict[label][:20], we would get the first 20 elements (indices 0 to 19).
For train += label_dict[label][20:], we would get everything after the first 20 elements (indices 20 to len(label_dict[label]) - 1).

So this way we'd end up with a train (80%) and dev (20%) split.
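
To make the arithmetic above easy to check, here is a minimal sketch of the same slicing on a plain list of 100 items. The helper name split_items and the sample data are assumptions made up for illustration; they are not part of preprocess_clausecat.py.

```python
def split_items(items, eval_split):
    """Hypothetical helper that mirrors the slicing discussed above."""
    # Number of items that go to the dev set (fraction is truncated by int()).
    split = int(len(items) * eval_split)
    dev = items[:split]    # the first `split` items
    train = items[split:]  # everything from index `split` onward
    return train, dev


items = list(range(100))  # stand-in for label_dict[label]
train, dev = split_items(items, eval_split=0.2)
print(len(train), len(dev))  # 80 20
```

Since eval_split is applied to the slice that goes to dev, a value below 0.5 always leaves the larger share in train.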

shrinidhin · 2022-03-28T06:10:48Z

Okay. So eval_split should be the fraction of the data that goes to the dev set, right? Meaning that out of 100% of the data, if I want the split to be 70:30, then I need to give a value of 0.3 for eval_split.
Thank you!
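
One detail worth keeping in mind for a 70:30 split: the split is computed per label and int() rounds down, so with small classes the dev share can come out a bit below 30%. The sketch below illustrates this under assumed conditions; the function name split_by_label and the labels "a" and "b" are made up, and only the loop body follows the snippet quoted in this issue.

```python
def split_by_label(label_dict, eval_split):
    """Hypothetical wrapper around the per-label loop quoted in the issue."""
    train, dev = [], []
    for label, examples in label_dict.items():
        # int() truncates, so each label contributes floor(n * eval_split) dev items.
        split = int(len(examples) * eval_split)
        dev += examples[:split]
        train += examples[split:]
    return train, dev


# Made-up data: 7 examples for label "a", 5 for label "b".
label_dict = {"a": list(range(7)), "b": list(range(5))}
train, dev = split_by_label(label_dict, eval_split=0.3)
print(len(train), len(dev))  # 9 3
```

Here dev gets 2 of 7 and 1 of 5 examples, so 3 of 12 overall (25%) rather than exactly 30%. With larger classes the effect of the truncation becomes negligible.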

shrinidhin closed this as completed Mar 28, 2022
