Provide compound ID for all files #55

Open
davidlmobley opened this issue Aug 25, 2017 · 5 comments

Comments

@davidlmobley
Member

I think we should probably move towards a model where all ligands (or guests) in each benchmark set have an appropriate, unique, paper-specific numerical compound ID, rather than the current model where this is dependent on what set we're looking at. For example:

  • CB7 Tables 1&2: Has unique CID we assigned
  • GDCC Table 3: Has unique CID we assigned, but this will break if we want to provide structures docked into hosts, as there are two hosts but only one set of compound IDs
  • GDCC Table 4: Has unique CID we assigned
  • CD Tables 5 and 6: Has unique CID we assigned
  • lysozyme Tables 7 and 8: No CIDs, uses compound names only
  • BRD4(1) Table 9: Uses heterogeneous identifiers -- "Compound 4", "alprazolam", "Bzt-7", "JQ1(+)" etc.; this is probably the worst offender since some of these are pretty unsuitable as filenames due to special characters and/or spaces (e.g. some tools can't load files with spaces in their filenames and/or handle some of these special characters).

@GHeinzelmann @nhenriksen - thoughts? My preference I think is to make sure every set has a unique numerical compound ID in the tables and that this is used for all of the relevant files.
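
To illustrate the filename concern, here is a minimal sketch, not anything from the repository. It assigns sequential numeric IDs to a list of names and flags the ones that would be awkward as filenames. The function names, the example list (taken from the BRD4(1) bullet above), and the set of characters treated as "safe" are all assumptions made for illustration.

```python
import re

# Example names from the BRD4(1) bullet above; illustrative only.
names = ["Compound 4", "alprazolam", "Bzt-7", "JQ1(+)"]

# Assumption: letters, digits, dot, underscore and hyphen are "safe" in filenames.
SAFE = re.compile(r"^[A-Za-z0-9._-]+$")


def assign_ids(names, start=1):
    """Map each name to a sequential integer ID, preserving input order."""
    return {name: i for i, name in enumerate(names, start=start)}


def is_safe_filename(name):
    """Return True if the name contains only characters assumed to be safe."""
    return bool(SAFE.match(name))


ids = assign_ids(names)
unsafe = [n for n in names if not is_safe_filename(n)]
```

With this input, `unsafe` picks out the names containing a space or parentheses, which are exactly the ones a numeric ID would sidestep.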

@GHeinzelmann
Collaborator

That sounds good, and I think it can be done quickly. I'll change the ligand names to a provided ID (from 1 to 10), and update the associated tables in the paper and in the README file.

@GHeinzelmann
Collaborator

GHeinzelmann commented Aug 25, 2017

I'm working on the BRD4(1) benchmarks table in the main paper, and it won't fit on the page if I keep the ligand names but also add an extra ligand ID column (as done in the CD tables). Should I drop the ligand names altogether? They might not be essential since we are also providing the references.

@davidlmobley
Member Author

I'm all for dropping the ligand names, or if you really want to keep track of them, put them in footnotes or in a separate markdown file you link to.
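
If the names do go into a separate markdown file, something like the following sketch could generate it. The function name and the example mapping are hypothetical; the real IDs would be whatever the tables end up assigning.

```python
def id_table_markdown(id_to_name):
    """Render an ID -> original-name mapping as a Markdown table."""
    lines = ["| ID | Original name |", "|---|---|"]
    for cid in sorted(id_to_name):
        lines.append(f"| {cid} | {id_to_name[cid]} |")
    return "\n".join(lines)


# Hypothetical mapping, for illustration only.
example = {1: "Compound 4", 2: "alprazolam", 3: "Bzt-7", 4: "JQ1(+)"}
table = id_table_markdown(example)
```

Writing `table` to a file next to the README would keep the original names recoverable without widening the table in the paper.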

@GHeinzelmann
Collaborator

No, we can drop them; I only gave the ligands names so the table would look the same as the Lysozyme one. I'll just give a number for each, which will also make the table look better (it was a little off-center before since it was too wide). Then I'll change the README table and the ligand file names.

@davidlmobley
Member Author

Resolved for bromodomains in #48; still needs to be done for lysozyme.
