libmolgrid issues about stratifying receptors #101

Open
YanjingLiLi opened this issue Jan 21, 2023 · 1 comment

@YanjingLiLi

Hi authors, I am having trouble with the stratification options of ExampleProvider.

I tried two approaches:

  1. train_samples = molgrid.ExampleProvider(ligmolcache=args.trligte, recmolcache=args.trrecte, shuffle=True, default_batch_size=args.batch_size, iteration_scheme=molgrid.IterationScheme.SmallEpoch, balanced=True, stratify_pos=3, stratify_step=1, stratify_max=6, stratify_min=0)

    train_samples.populate(args.trainfile)

(For the whole dataset, I set stratify_max=20958.)

  2. train_samples = molgrid.ExampleProvider(ligmolcache=args.trligte, recmolcache=args.trrecte, shuffle=True, default_batch_size=args.batch_size, iteration_scheme=molgrid.IterationScheme.SmallEpoch, balanced=True, stratify_receptor=True)

    train_samples.populate(args.trainfile)
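Approach 1 stratifies on the numeric column selected by stratify_pos, using stratify_min, stratify_max and stratify_step to group values into strata. The sketch below is a rough model of that binning, assumed here and not taken from libmolgrid's source: values below the minimum go into the first bin, values at or above the maximum go into the last bin, and everything else is binned by step width. The helpers `stratum_index` and `num_strata` are hypothetical. The point is only to show how quickly the number of strata grows with the chosen range.

```python
import math


def num_strata(smin: float, smax: float, step: float) -> int:
    """Number of bins covering [smin, smax) with width step (assumed model)."""
    return max(1, math.ceil((smax - smin) / step))


def stratum_index(value: float, smin: float, smax: float, step: float) -> int:
    """Map a value to a bin index under the same assumed model.

    Out-of-range values are clamped into the first or last bin; this clamping
    is an assumption for illustration, not libmolgrid's documented behavior.
    """
    nbins = num_strata(smin, smax, step)
    if value < smin:
        return 0
    if value >= smax:
        return nbins - 1
    return int((value - smin) // step)
```

Under this model, stratify_min=0, stratify_max=6, stratify_step=1 yields 6 strata, while stratify_max=20958 with the same step yields 20958. With balanced=True, each of those strata would need both positive and negative examples.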

However, when I run either version on CUDA, neither works properly: the GPU is never used, and after a long wait the script fails with errors like:

    train_samples.populate(args.trainfile)
    ValueError: No valid examples found in training set.
    wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.

I have attached my whole_data.types and reduced_data.types files here. Could you please take a look at what is going wrong?
data.zip

@dkoes
Contributor

dkoes commented Jan 26, 2023

If you want to both balance and stratify, all of your strata need to have both positive and negative examples. They don't:

$ awk '{print $1,$4}' reduced_data.types  | sort -u
0 0
0 1
0 2
0 3
0 4
0 5
0 6
1 0
1 1
1 2
$ awk '{print $1,$4}' whole_data.types  | sort -u | grep -c "^1"
17596
$ awk '{print $1,$4}' whole_data.types  | sort -u | grep -c "^0"
20763
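The awk pipeline above lists every distinct (label, stratum) pair; a stratum that appears with only one label cannot be balanced. A small Python equivalent can report those one-sided strata directly. This is a sketch that assumes whitespace-separated .types lines with the label in column 0 (the `$1` in the awk command) and the stratification key in a 0-based column, matching the stratify_pos convention; `one_sided_strata` is a hypothetical helper, not part of libmolgrid. Like the awk command, it groups by raw column values rather than by binned values.

```python
from collections import defaultdict
from typing import Iterable, List


def one_sided_strata(lines: Iterable[str], key_col: int, label_col: int = 0) -> List[str]:
    """Return stratum keys that lack a positive ("1") or a negative ("0") label.

    Hypothetical helper: assumes whitespace-separated .types lines, with the
    label in label_col and the stratification key in key_col (both 0-based).
    """
    labels_by_stratum = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        labels_by_stratum[fields[key_col]].add(fields[label_col])
    # A stratum can only be balanced if it contains both labels.
    return sorted(
        key for key, labels in labels_by_stratum.items() if not {"0", "1"} <= labels
    )
```

For the value-based setup, `one_sided_strata(open("reduced_data.types"), key_col=3)` checks the same column as `$4` above. For stratify_receptor=True, pass the index of whichever column holds the receptor file name instead.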
