shuffling True on test data decreasing score #58986
Labels: module: nn, triaged
After training the model, when I evaluate it on the test set in batches with shuffle=False I get a good score, but when I evaluate the same model on the same test records with shuffle=True I get a bad score. I am confused about why this happens.
dataset = pd.read_csv('Churn_Modelling.csv')
I shuffle the data before splitting data into train/test
I am embedding the following categorical variables:
After the label encoding above, these columns are converted to integers, so I convert them back to the category dtype.
Get embedding categorical columns
Splitting train/test data
The following function returns the categorical and numerical columns separately. I want to embed the categorical columns on their own and then combine them with the numerical features during training.
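For illustration only, this is a minimal sketch of one way to keep categorical codes, numerical features, and labels together per row. The class name `TabularDataset` and the toy values are hypothetical and are not the code from this issue. The point it shows is that each item carries its own label, so the order in which a loader draws rows should not matter.

```python
import torch
from torch.utils.data import Dataset


class TabularDataset(Dataset):
    """Hypothetical helper: stores categorical codes, numerical features, and labels."""

    def __init__(self, cat_codes, num_values, labels):
        # Integer codes are what nn.Embedding expects as input indices.
        self.cat = torch.as_tensor(cat_codes, dtype=torch.long)
        # Continuous features are kept as float32 for the dense layers.
        self.num = torch.as_tensor(num_values, dtype=torch.float32)
        self.y = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        # Features and label for the same row are always returned together.
        return self.cat[idx], self.num[idx], self.y[idx]


# Toy data (made up): 3 rows, 2 categorical columns, 2 numerical columns.
ds = TabularDataset(
    cat_codes=[[0, 1], [2, 0], [1, 1]],
    num_values=[[0.5, 1.2], [0.1, 0.3], [0.9, 0.4]],
    labels=[0, 1, 0],
)
cat, num, y = ds[1]
```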
Size of embedding columns
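As a point of reference, a common rule of thumb for choosing embedding widths is half the number of categories, capped at some maximum. The sketch below assumes that heuristic; the function name `embedding_sizes`, the cap of 50, and the example cardinalities are assumptions and may not match the sizes used in this issue.

```python
def embedding_sizes(cardinalities, max_dim=50):
    """Return (num_categories, embedding_dim) pairs, one per categorical column.

    Assumed heuristic: embedding_dim = min(max_dim, ceil(num_categories / 2)).
    """
    return [(n, min(max_dim, (n + 1) // 2)) for n in cardinalities]


# Hypothetical cardinalities for three categorical columns.
sizes = embedding_sizes([3, 2, 11])
print(sizes)  # [(3, 2), (2, 1), (11, 6)]
```

Each pair can then be passed to `torch.nn.Embedding(num_categories, embedding_dim)`.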
Model
Training
Validation when shuffle=False: scores are below
Validation when shuffle=True: scores are below
As you can see, with shuffle=False the precision/recall for class "1" is much better than with shuffle=True. I am lost as to why this is; in the real world the data could arrive in any order. Please help.
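For comparison, here is a minimal sketch of an evaluation loop whose result should not depend on the order of the test data. It uses synthetic data and a stand-in `ThresholdModel` (both hypothetical, not the model from this issue). The assumption being illustrated is that precision and recall are order-independent as long as each prediction is compared with the label from the same batch and the model is in eval mode.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
features = torch.randn(200, 4)            # synthetic test features
labels = (features[:, 0] > 0).long()      # synthetic binary labels
test_ds = TensorDataset(features, labels)


class ThresholdModel(torch.nn.Module):
    """Stand-in for a trained network: returns fixed two-class logits per row."""

    def forward(self, x):
        score = x[:, 0] - 0.2
        return torch.stack([-score, score], dim=1)


model = ThresholdModel()


def evaluate(loader):
    """Return (precision, recall) for class 1 over everything the loader yields."""
    model.eval()  # turn off dropout and use running batch-norm statistics
    preds, targets = [], []
    with torch.no_grad():
        for x, y in loader:
            preds.append(model(x).argmax(dim=1))
            targets.append(y)  # label taken from the same batch as the prediction
    preds, targets = torch.cat(preds), torch.cat(targets)
    tp = ((preds == 1) & (targets == 1)).sum().item()
    fp = ((preds == 1) & (targets == 0)).sum().item()
    fn = ((preds == 0) & (targets == 1)).sum().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


ordered = evaluate(DataLoader(test_ds, batch_size=32, shuffle=False))
shuffled = evaluate(DataLoader(test_ds, batch_size=32, shuffle=True))
print(ordered == shuffled)
```

If the two results differ in a real setup, two things that may be worth ruling out are labels read from the original DataFrame order instead of from the shuffled batches, and a model left in training mode during validation.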
cc @albanD @mruberry @jbschlosser