Randomly splitting the data into a train set and a test set can give biased (typically over-optimistic) results when related cases, such as reused documents or documents by the same author, end up in both sets.
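To make the concern concrete, here is a minimal sketch of a group-aware split that keeps every document of a given author on one side only. The function name `group_split`, the `group_of` callback, and the `(author, text)` tuple layout are assumptions for illustration, not code from this repository.

```python
import random
from collections import defaultdict


def group_split(items, group_of, test_fraction=0.2, seed=0):
    """Split items so that no group appears in both train and test.

    `group_of` is a hypothetical callback returning the group key
    (for example the author) of an item.
    """
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    keys = sorted(groups)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(items)
    train, test = [], []
    for key in keys:
        # Whole groups go to the test set until it reaches the target size,
        # so the test set may overshoot the requested fraction.
        if len(test) < target:
            test.extend(groups[key])
        else:
            train.extend(groups[key])
    return train, test


docs = [("a", "t1"), ("a", "t2"), ("b", "t3"),
        ("c", "t4"), ("c", "t5"), ("d", "t6")]
train, test = group_split(docs, lambda d: d[0], test_fraction=0.3, seed=1)
```

Because groups are moved as a whole, the actual test fraction is only approximate, and a single very large group can dominate one side.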
Possible option: before splitting, run a document-to-document comparison (as is done for impostors) to detect duplicates or near-duplicates. Even then, it is not always possible to find a good way to split.
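One way the comparison step could look, as a rough sketch only: compare word 3-gram sets with Jaccard similarity and merge documents above a threshold into clusters. The similarity measure, the `threshold=0.5` value, and all function names here are assumptions, and this is not the comparison used for impostors in this project.

```python
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams of a text."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def duplicate_clusters(texts, threshold=0.5, n=3):
    """Label each text with a cluster id; near-duplicates share an id."""
    parent = list(range(len(texts)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [shingles(t, n) for t in texts]
    # Pairwise comparison is O(n^2); acceptable only for small corpora.
    for i, j in combinations(range(len(texts)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(texts))]


texts = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy cat",
    "completely different sentence about something else entirely",
]
labels = duplicate_clusters(texts)
```

The resulting labels can serve as group keys so that each cluster lands entirely in train or entirely in test. Since merging is transitive, chains of partially overlapping documents can collapse into one large cluster, which is one reason a good split may not exist.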
Issue migrated from the original private GitLab repo.