question on mic data #5

Open
chq1155 opened this issue Sep 26, 2023 · 1 comment

@chq1155

chq1155 commented Sep 26, 2023

Hi, in the script mic_classifier_training_prodecure.ipynb, mic_x_train has about 3000 samples and negatives_x_train has about 10000.

But why does the training output say 'Train on 20457 samples, validate on 1312 samples'?

Thank you for your time.
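One way to narrow this down: older Keras/TensorFlow versions print "Train on N samples, validate on M samples" from fit(), and when validation_split is used, the validation samples are sliced off the tail of the arrays passed in, before any shuffling. The sketch below reproduces that slicing and adds up the reported counts. The helper name legacy_keras_split is hypothetical, and whether the notebook uses validation_split or a separate validation_data is an assumption not confirmed by the issue. Either way, 20457 + 1312 = 21769 samples reached fit(), well above the roughly 13,000 the question expects, so the array passed to fit() is probably not just the concatenation of the two sets; what adds the extra samples cannot be determined from the issue alone.

```python
def legacy_keras_split(n_samples, validation_split):
    """Mimic how Keras fit() splits array inputs when validation_split is set:
    the first floor(n * (1 - split)) samples train, the tail validates."""
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at

# Counts reported in the training log vs. the approximate sizes in the question.
reported_train, reported_val = 20457, 1312
total_seen_by_fit = reported_train + reported_val
expected_total = 3000 + 10000

print(total_seen_by_fit, expected_total)  # 21769 13000

# Example: 100 samples with validation_split=0.25 -> 75 train, 25 validation.
print(legacy_keras_split(100, 0.25))  # (75, 25)
```

Printing len() of every array right before the fit() call in the notebook is the most direct way to see which array carries the extra samples.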

@dzjxzyd

dzjxzyd commented Nov 30, 2023

Hi,
I have a similar question regarding the script "mic_classifier_training_prodecure.ipynb".

```python
x_train = np.concatenate([mic_x_train, negatives_x_train])
y_train = np.concatenate([mic_y_train, np.zeros(len(negatives_x_train))])
```

At this line, why do we combine the negative dataset (presumed negatives retrieved from UniProt) with the inactive sequences of the MIC dataset? The combined dataset becomes highly imbalanced, and those assumed-negative UniProt sequences are easier to classify as negative.

Thanks for your time.
Sincerely,
Zhenjiao
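A minimal sketch of the concatenation quoted above, with two assumed additions that are not part of the notebook: a shuffle after concatenating (relevant only if validation_split is used, since Keras takes validation samples from the tail, which would otherwise be all UniProt negatives), and per-class weights computed with the same formula as scikit-learn's class_weight='balanced'. The function names and toy arrays are hypothetical; Keras fit() accepts such a dict through its class_weight argument.

```python
import numpy as np

def combine_with_negatives(mic_x, mic_y, neg_x, seed=0):
    # Same concatenation as the notebook lines: extra negatives get label 0.
    x = np.concatenate([mic_x, neg_x])
    y = np.concatenate([mic_y, np.zeros(len(neg_x))])
    # Assumed extra step: shuffle so a tail-based validation_split
    # does not end up containing only the appended negatives.
    order = np.random.default_rng(seed).permutation(len(x))
    return x[order], y[order]

def balanced_class_weights(y):
    # w_c = n_samples / (n_classes * count_c), as in scikit-learn's "balanced" mode.
    classes, counts = np.unique(y, return_counts=True)
    return {int(c): len(y) / (len(classes) * n) for c, n in zip(classes, counts)}

# Hypothetical toy data: 20 active + 10 inactive MIC samples, 100 extra negatives.
mic_x = np.zeros((30, 5))
mic_y = np.array([1] * 20 + [0] * 10)
neg_x = np.zeros((100, 5))

x_train, y_train = combine_with_negatives(mic_x, mic_y, neg_x)
weights = balanced_class_weights(y_train)
print(weights)  # class 0 ≈ 0.591, class 1 = 3.25
# model.fit(x_train, y_train, class_weight=weights, ...)  # model is hypothetical
```

Class weighting only offsets the imbalance; it does not address the concern that UniProt negatives may be easier to separate than MIC inactives.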
