question on mic data #5

chq1155 · 2023-09-26T08:43:06Z

Hi, in the script mic_classifier_training_prodecure.ipynb, mic_x_train has about 3000 samples and negatives_x_train has about 10000.

Why, then, does the training output say 'Train on 20457 samples, validate on 1312 samples'?

Thank you for your time

dzjxzyd · 2023-11-30T20:20:33Z

Hi,
I have a similar question regarding the script "mic_classifier_training_prodecure.ipynb".

x_train = np.concatenate([mic_x_train, negatives_x_train])
y_train = np.concatenate([mic_y_train, np.zeros(len(negatives_x_train))])

Why does this line combine the negative dataset (which I assume was retrieved from UniProt) with the inactive sequences from the MIC dataset? The combined dataset becomes highly imbalanced, and the (assumed) UniProt negatives are easier to predict as negative.
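
To make the concern concrete, here is a minimal sketch using made-up data. The array sizes, the sequence length of 50, the integer encoding, and the 0/1 labels in mic_y_train are all my assumptions for illustration, not the notebook's actual values; only the two concatenate lines are taken from the notebook.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not the notebook's real counts).
n_mic, n_neg = 3000, 10000
rng = np.random.default_rng(0)

# Stand-ins for the notebook's arrays: random integer-encoded sequences of length 50.
mic_x_train = rng.integers(0, 21, size=(n_mic, 50))
# Assumed binary labels: 1 = active, 0 = inactive.
mic_y_train = rng.integers(0, 2, size=n_mic)
negatives_x_train = rng.integers(0, 21, size=(n_neg, 50))

# The two lines from the notebook: every extra negative gets label 0.
x_train = np.concatenate([mic_x_train, negatives_x_train])
y_train = np.concatenate([mic_y_train, np.zeros(len(negatives_x_train))])

# Total size is simply the sum of the two parts.
n_total = len(x_train)
# Fraction of positives after the extra negatives are appended.
positive_fraction = float(y_train.mean())

print("total samples:", n_total)
print("positive fraction:", round(positive_fraction, 3))
```

Under these assumptions the positives can make up at most n_mic / n_total of the combined set, so adding more negatives only pushes that fraction lower.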

Thanks for your time.
Sincerely,
Zhenjiao
