The difference between testing dataset and validation dataset #6

Gongmian784 · 2023-12-18T10:27:05Z

Hi, I found the GTEx bulk RNA-seq donors were divided into three parts (training, validation, and testing donors). I can grasp the purposes of the training and validation subsets in relation to the Hypergraph model's training and accuracy validation respectively, but I cannot fully comprehend the role of the testing dataset.
Could anyone elaborate on the specific purpose of the testing dataset and how it differs from the validation dataset? Can I just split the data into training and validation, and treat the validation dataset as the testing dataset?

Thanks in advance!
Mian

rvinas · 2023-12-18T11:26:17Z

Hi Mian, thank you for your interest in our work. We used the test dataset to evaluate the model's performance on data from individuals who were neither observed at training time (training individuals) nor used for hyperparameter optimisation (validation individuals).

> Can I just split the data into training and validation, and treat the validation dataset as the testing dataset?

It depends on your objective. If you are interested in evaluating the performance of the model on unseen data, then you should use a separate test dataset. The hyperparameters of the model were chosen to maximise performance on the validation individuals, so validation performance is likely to be optimistic and might not be an accurate estimate of generalisation performance.
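
For illustration, here is a minimal sketch of a donor-level three-way split. It is not the splitting code used in this repository: the function name `split_donors`, the split fractions, the seed, and the donor IDs are all hypothetical. The point it shows is that the split is made over donors rather than over individual samples, so no individual appears in more than one subset.

```python
import random


def split_donors(donor_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Split unique donor IDs into disjoint train/validation/test lists.

    Hypothetical helper: fractions and seed are illustrative defaults.
    """
    # Deduplicate and sort so the result depends only on the seed.
    donors = sorted(set(donor_ids))
    rng = random.Random(seed)
    rng.shuffle(donors)

    n_test = int(round(len(donors) * test_frac))
    n_val = int(round(len(donors) * val_frac))

    test = donors[:n_test]                 # held out, used once at the end
    val = donors[n_test:n_test + n_val]    # used for hyperparameter selection
    train = donors[n_test + n_val:]        # used to fit the model
    return train, val, test


# Made-up donor IDs, for illustration only.
donor_ids = [f"donor_{i:03d}" for i in range(100)]
train, val, test = split_donors(donor_ids)
```

With a split like this, hyperparameters would be chosen by comparing validation scores, and the test donors would be scored only once, after the final configuration is fixed. If the validation set doubles as the test set, the reported number is the one that was optimised during tuning.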

Gongmian784 closed this as completed Dec 19, 2023
