This repository has been archived by the owner on Oct 9, 2023. It is now read-only.
This issue was moved to a discussion.
How do I ensure no data leakage in the validation split #143
What is your question?
I'm working on a dataset that contains identifying fields, which would lead to data leakage if the data is not split properly. Currently, the validation split is hard-coded in each of the respective DataModules.
Ideally, I'd like a flag that ensures there is no overlap between the train and validation data on these fields, rebalancing the split wherever overlap is found. Because of the way dataset initialization is hard-coded, the validation dataset becomes immutable once it is created.
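To illustrate the kind of check I mean, here is a minimal sketch of a group-aware split in plain Python. It is not Flash code: `group_split`, `group_ids`, and `val_fraction` are hypothetical names I made up for this example, and it assumes every sample carries a single hashable identifier (for example a patient or user ID). Samples that share an identifier always land on the same side of the split. I believe scikit-learn's `GroupShuffleSplit` covers the same idea, but the open part is how to feed the resulting indices into a DataModule.

```python
import random
from collections import defaultdict


def group_split(group_ids, val_fraction=0.2, seed=0):
    """Split sample indices so that no group id appears in both splits.

    group_ids: one hashable identifier per sample (hypothetical input).
    Returns (train_indices, val_indices), each sorted.
    """
    # Collect the sample indices that belong to each group.
    groups = defaultdict(list)
    for index, group_id in enumerate(group_ids):
        groups[group_id].append(index)

    # Shuffle the groups, not the samples, so a group is never divided.
    keys = list(groups)
    random.Random(seed).shuffle(keys)

    # Move whole groups into validation until it holds at least the
    # target number of samples; everything else goes to training.
    # The validation share can overshoot val_fraction when groups are large.
    target = val_fraction * len(group_ids)
    train_indices, val_indices = [], []
    for key in keys:
        if len(val_indices) < target:
            val_indices.extend(groups[key])
        else:
            train_indices.extend(groups[key])
    return sorted(train_indices), sorted(val_indices)


# Example with made-up identifiers: eight samples from five groups.
ids = ["a", "a", "b", "c", "c", "c", "d", "e"]
train_idx, val_idx = group_split(ids, val_fraction=0.25, seed=0)

# The leakage check itself: the two splits must share no identifier.
overlap = {ids[i] for i in train_idx} & {ids[i] for i in val_idx}
assert not overlap
```

The index lists could then be used to build the two subsets before they are handed to the model, which is exactly the step that the hard-coded split currently prevents.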
What is the best way to handle such a check in Flash?