Update tractability plugin in data pipeline #880

Closed
andrewhercules opened this issue Mar 16, 2020 · 4 comments
Labels
Enhancement Update to existing feature

Comments

@andrewhercules
Contributor

andrewhercules commented Mar 16, 2020

To process the latest tractability .tsv file generated by ChEMBL, we need to update the pipeline because the headings for the small molecule modality have changed. There is also a new section for other clinical modalities (e.g. protein, enzyme) that uses the same clinical precedence buckets 1, 2, and 3. Finally, there is now a single string of ChEMBL IDs that support clinical precedence buckets 1, 2, and 3 for the small molecule, antibody, and other clinical modalities.

As such, @cmalangone, can you please update the pipeline with the following changes:

  1. Update the pipeline to use the data about other clinical modalities and add an entry in the gene index tractability object - see below for a scaffold of what it could look like:
"tractability": {
  "smallmolecule": {},
  "antibody": {},
  "other_modalities": {
    "buckets": [
      1
    ],
    "categories": {
      "clinical_precedence": 1
    }
  }
}

For other clinical modalities, the buckets are 1, 2, and 3 and the categories are "clinical_precedence".

  2. Update the pipeline to use the new small molecule column headings:
| Old column name / heading | New column name / heading |
| --- | --- |
| Bucket_1 | Bucket_1_sm |
| Bucket_2 | Bucket_2_sm |
| Bucket_3 | Bucket_3_sm |
| Bucket_4 | Bucket_4_sm |
| Bucket_5 | Bucket_5_sm |
| Bucket_6 | Bucket_6_sm |
| Bucket_7 | Bucket_7_sm |
| Bucket_8 | Bucket_8_sm |
| Bucket_sum | Bucket_sum_sm |
| Top_bucket | Top_bucket_sm |
| Category | Category_sm |
| Clinical_Precedence | Clinical_Precedence_sm |
| Discovery_Precedence | Discovery_Precedence_sm |
| Predicted_Tractable | Predicted_Tractable_sm |
| PDB_Known_Ligand | PDB_Known_Ligand |
| ensemble | DrugEBIlity_score |
| High_Quality_ChEMBL_compounds | High_Quality_ChEMBL_compounds |
| Small_Molecule_Druggable_Genome_Member | Small_Molecule_Druggable_Genome_Member |

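
As a rough illustration of the two changes requested above, here is a minimal Python sketch, not the actual pipeline code. The rename map is taken from the old/new headings listed in this issue. Everything else is an assumption made for illustration: the function names, the `Bucket_N_other` column names (the issue does not state the headings used for other clinical modalities), the convention that a cell value of `1` means the target falls in that bucket, and the rule that `clinical_precedence` is 1 when any bucket is set.

```python
from typing import Dict, List, Mapping

# Old -> new small molecule headings, as listed in this issue.
# Headings that did not change (e.g. PDB_Known_Ligand) are left out.
SM_COLUMN_RENAMES: Dict[str, str] = {
    **{f"Bucket_{n}": f"Bucket_{n}_sm" for n in range(1, 9)},
    "Bucket_sum": "Bucket_sum_sm",
    "Top_bucket": "Top_bucket_sm",
    "Category": "Category_sm",
    "Clinical_Precedence": "Clinical_Precedence_sm",
    "Discovery_Precedence": "Discovery_Precedence_sm",
    "Predicted_Tractable": "Predicted_Tractable_sm",
    "ensemble": "DrugEBIlity_score",
}

# HYPOTHETICAL column names: the real headings for the other clinical
# modalities are not given in this issue.
OTHER_MODALITY_BUCKET_COLUMNS: Dict[int, str] = {
    1: "Bucket_1_other",
    2: "Bucket_2_other",
    3: "Bucket_3_other",
}


def upgrade_headings(row: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of an old-format row keyed by the new headings."""
    return {SM_COLUMN_RENAMES.get(key, key): value for key, value in row.items()}


def build_other_modalities(row: Mapping[str, str]) -> Dict[str, object]:
    """Build the 'other_modalities' entry of the tractability object.

    Assumes a cell value of "1" marks membership of a bucket and that
    clinical_precedence is 1 when at least one bucket is set.
    """
    buckets: List[int] = [
        bucket
        for bucket, column in sorted(OTHER_MODALITY_BUCKET_COLUMNS.items())
        if str(row.get(column, "0")).strip() == "1"
    ]
    return {
        "buckets": buckets,
        "categories": {"clinical_precedence": 1 if buckets else 0},
    }
```

With these assumptions, a row whose only set bucket is the first one produces the same shape as the scaffold in point 1: `{"buckets": [1], "categories": {"clinical_precedence": 1}}`.
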
@andrewhercules
Contributor Author

ChEMBL have made the data available and it has been uploaded into otar001-core/Tractability/20.04

@andrewhercules removed the Data (Relates to Open Targets data team) label Mar 26, 2020
@andrewhercules
Contributor Author

Based on a conversation with @cmalangone, we will not update the pipeline to process the ChEMBL IDs and labels for this release. Rather, we will work with ChEMBL to update the JSON generated by the tractability pipeline for 20.06.

@d0choa modified the milestones: 20.04, Wedding crasher sprint Apr 1, 2020
@cmalangone

Points 1 and 2 are done.
I had a conversation with @LucaFumis about the format for the index.

The PR is done and merged.
I ran a first test and no issues came up.

@andrewhercules
Contributor Author

Thank you @cmalangone - the data is now available in the API! 👍

4 participants