Hi! I noticed that in the following code in the preprocess_clausecat.py file, at line 61, in the for loop that splits the dataset into train and dev sets,
for label in label_dict:
    split = int(len(label_dict[label]) * eval_split)
    train += label_dict[label][split:]
    dev += label_dict[label][:split]
    checksum += len(label_dict[label])
    table_data.append(
        (
            label,
            len(label_dict[label]),
            len(label_dict[label][split:]),
            len(label_dict[label][:split]),
        )
    )
the train and dev assignment statements need to be swapped. With the existing assignment, the train set has fewer samples than the dev set. Shouldn't it be the other way round?
Something like this?
train += label_dict[label][:split]
dev += label_dict[label][split:]
Ah yes, I see why it can be a little confusing, but I think the code is right.
Let's have a look at a small example:
split = int(len(label_dict[label]) * eval_split)
Let's say len(label_dict[label]) = 100
and eval_split = 0.2 (20%)
Then we'd get split = 20
With dev += label_dict[label][:20], dev gets the first 20 elements (indices 0 to 19).
With train += label_dict[label][20:], train gets everything after the first 20 elements (indices 20 to len(label_dict[label]) - 1).
So this way we'd end up with a train (80%) and dev (20%) split.
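The arithmetic above can be checked with a few lines of Python. This is only a sketch: the list `examples` is a hypothetical stand-in for `label_dict[label]`, filled with dummy integers rather than real data.

```python
# Hypothetical stand-in for label_dict[label]: 100 dummy examples.
examples = list(range(100))
eval_split = 0.2  # fraction of examples that goes to the dev set

# Index at which the list is cut; int() truncates toward zero.
split = int(len(examples) * eval_split)

dev = examples[:split]    # items before the cut (indices 0..19)
train = examples[split:]  # items from the cut onward (indices 20..99)

print(split, len(train), len(dev))  # 20 80 20
```

Because the slice `[:split]` takes the items before the cut, the smaller portion goes to dev whenever eval_split is below 0.5.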
Okay. So eval_split is the fraction of the data that goes to the dev set, right? Meaning that if I want a 70:30 train/dev split, I need to pass 0.3 for eval_split.
Thank you!
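Following the arithmetic in the example above, a 70:30 split with eval_split = 0.3 can be sketched as below. The function name `split_by_label` and the labels and counts are hypothetical, chosen only for illustration; the loop body mirrors the slicing quoted from the script.

```python
def split_by_label(label_dict, eval_split):
    """Split each label's examples into train/dev (hypothetical helper).

    The first `eval_split` fraction of each label goes to dev,
    and the remainder goes to train.
    """
    train, dev = [], []
    for label in label_dict:
        split = int(len(label_dict[label]) * eval_split)
        train += label_dict[label][split:]
        dev += label_dict[label][:split]
    return train, dev


# Dummy data: two hypothetical labels with 100 and 50 examples.
label_dict = {
    "label_a": [("label_a", i) for i in range(100)],
    "label_b": [("label_b", i) for i in range(50)],
}

train, dev = split_by_label(label_dict, eval_split=0.3)
print(len(train), len(dev))  # 105 45
```

With these counts, each label contributes 30% of its examples to dev, so the overall ratio is also 70:30. When a label's count times eval_split is not a whole number, int() rounds down, so dev gets slightly less than the requested fraction for that label.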