The difference between testing dataset and validation dataset #6

Closed
Gongmian784 opened this issue Dec 18, 2023 · 1 comment

@Gongmian784

Hi, I noticed that the GTEx bulk RNA-seq donors are divided into three parts (training, validation, and testing donors). I understand that the training subset is used to train the Hypergraph model and the validation subset is used to check its accuracy, but I do not fully understand the role of the testing dataset.
Could anyone elaborate on the specific purpose of the testing dataset and how it differs from the validation dataset? Can I just split the data into training and validation, and treat the validation dataset as the testing dataset?

Thanks in advance!
Mian

@rvinas
Owner

rvinas commented Dec 18, 2023

Hi Mian, thank you for your interest in our work. We used the test dataset to evaluate the model's performance on data from individuals who were neither observed at training time nor used for hyperparameter optimisation (the latter are the validation individuals).
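To make the three-way split concrete, here is a minimal sketch of partitioning donors into disjoint training, validation, and test groups. The donor IDs, the `split_donors` helper, and the 80/10/10 fractions are illustrative assumptions, not the split actually used in this repository.

```python
import random


def split_donors(donor_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Partition unique donor IDs into disjoint train/val/test sets.

    Hypothetical helper: the fractions and seed are assumptions,
    not the values used for the GTEx split in this project.
    """
    # Deduplicate and sort so the shuffle is reproducible for a given seed.
    donors = sorted(set(donor_ids))
    random.Random(seed).shuffle(donors)

    n_val = int(len(donors) * val_frac)
    n_test = int(len(donors) * test_frac)

    # Splitting by donor (not by sample) keeps every sample from one
    # individual in a single subset.
    test = set(donors[:n_test])
    val = set(donors[n_test:n_test + n_val])
    train = set(donors[n_test + n_val:])
    return train, val, test


# Made-up donor identifiers for illustration only.
donor_ids = [f"DONOR-{i:03d}" for i in range(100)]
train, val, test = split_donors(donor_ids)
print(len(train), len(val), len(test))
```

The model is fitted on `train`, hyperparameters are compared on `val`, and `test` is touched only once at the end to report performance.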

> Can I just split the data into training and validation, and treat the validation dataset as the testing dataset?

It depends on your objective. If you are interested in evaluating the performance of the model on unseen data, then you should use a test dataset. The hyperparameters of the model were chosen to maximise performance on the validation individuals, so validation performance is likely to be optimistically biased and might not be an accurate estimate of the generalisation performance.
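The effect described above can be illustrated with a small simulation. This is a toy sketch under stated assumptions, unrelated to the actual Hypergraph model: every hypothetical hyperparameter configuration has the same true accuracy (0.5), so any difference in measured scores is pure noise. The function name and all parameter values are made up for illustration.

```python
import random


def simulate_selection_bias(n_configs=50, n_val=100, n_test=100,
                            true_acc=0.5, n_trials=200, seed=0):
    """Return (mean validation score, mean test score) of the selected config.

    Toy setup (an assumption for illustration): all configurations are
    equally good, with the same true accuracy `true_acc`.
    """
    rng = random.Random(seed)
    selected_val, selected_test = [], []
    for _ in range(n_trials):
        best_val, best_test = -1.0, 0.0
        for _ in range(n_configs):
            # Noisy accuracy estimates on independent validation and test sets.
            val_acc = sum(rng.random() < true_acc for _ in range(n_val)) / n_val
            test_acc = sum(rng.random() < true_acc for _ in range(n_test)) / n_test
            # Model selection only ever looks at the validation score.
            if val_acc > best_val:
                best_val, best_test = val_acc, test_acc
        selected_val.append(best_val)
        selected_test.append(best_test)
    return sum(selected_val) / n_trials, sum(selected_test) / n_trials


mean_val, mean_test = simulate_selection_bias()
print(f"validation score of selected config: {mean_val:.3f}")
print(f"test score of selected config:       {mean_test:.3f}")
```

Because the winning configuration is picked for its validation score, that score is inflated by the selection step, while the score on the held-out test set stays close to the true accuracy. Treating the validation set as the test set would report the inflated number.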
