✍️ Contribution period: <Ahmed Yusuf> #615

Closed
13 tasks done
AhmedYusuff opened this issue Mar 7, 2023 · 62 comments

Comments

@AhmedYusuff

AhmedYusuff commented Mar 7, 2023

Week 1 - Get to know the community

  • Join the communication channels
  • Open a GitHub issue (this one!)
  • Install the Ersilia Model Hub and test the simplest model
  • Write a motivation statement to work at Ersilia
  • Submit your first contribution to the Outreachy site

Week 2 - Install and run an ML model

  • Select a model from the suggested list
  • Install the model in your system
  • Run predictions for the EML
  • Compare results with the Ersilia Model Hub implementation!

Week 3 - Propose new models

  • Suggest a new model and document it (1)
  • Suggest a new model and document it (2)
  • Suggest a new model and document it (3)

Week 4 - Prepare your final application

  • Submit the final application in the Outreachy website
@AhmedYusuff
Author

Following the guidelines, I have successfully installed the Ersilia Model Hub on my system.

I installed the Hub on my Ubuntu system.

@AhmedYusuff
Author

I have successfully tested that Ersilia works by running the fetch, serve, and calculate commands on a simple "CCC" molecule.

[Screenshot]

@AhmedYusuff
Author

AhmedYusuff commented Mar 7, 2023

Why I want to Contribute to Ersilia

I got to know about Outreachy while browsing the internet. I was intrigued by its support for diversity, by how much help it has given underrepresented communities, and by how much it has contributed to the open source community over the years.

I was elated yesterday when I received the congratulatory message that my initial application had been accepted and that I could proceed to the contribution page.

While going through the list of open source projects I could contribute to, I came across Ersilia. One of the reasons I became interested is that Ersilia's research is geared towards underdeveloped and developing countries, especially countries on the African continent. Being from Nigeria, this really resonates with me.

Ersilia is also listed in Forbes's annual roundup of tech nonprofits, and being allowed to contribute to such a project would be an indescribable honor.

I love data analysis. I have a BSc in Information Technology from Middlesex University in Malta and an Advanced Diploma in Software Engineering from Aptech. I have always been fascinated by AI/ML: the processes behind it, how the data are sourced (i.e. data integrity), and seeing data applied to our biggest medical challenges.

I have always been looking for ways to improve my skills and test myself in this field. By working on the Ersilia Open Source Initiative, a very good project that will have a lasting impact on humanity, I will fulfill a long-held dream of mine: to contribute to human society and learn at the same time.

@GemmaTuron
Member

Hi @TemiTomTom !

Welcome to Ersilia's community and thanks for the work. Let us make sure all applicants are set up and we will assign some more tasks to complete!

@AhmedYusuff
Author

Many thanks, @GemmaTuron. I'm glad to be here.

@GemmaTuron
Member

Hi @AhmedYusuff !
Can you please see this issue for a new model we are incorporating and test if it works for you? you can report it in the issue directly:
ersilia-os/eos81ew#2

@AhmedYusuff
Author

AhmedYusuff commented Mar 10, 2023

Hello @GemmaTuron.

MODEL TEST FOR EOS81EW

I tested the Model on my Ubuntu 22.04 system and Google Colab.

Outcome

  • Encountered two errors at different stages on Ubuntu 22.04: KeyError: Input and PermissionError: [Errno 13] Permission denied: '../../checkpoints'.
    Firsterror.log.log
    SecondError.log.log

  • Going through the log file, I also noticed that the model was having trouble with the metadata: the slug or model identifier I got while fetching differed from the one in the project repo.

  • The model test was successful on Google Colab.

Result

  • The metadata was reformatted by @GemmaTuron, and @pauline-banye made a PR to resolve the problem with the slug and model identifier.

  • The permission error occurred because the model tried to create a directory for which the user had insufficient permissions, so I had to grant the user the necessary privileges.

  • After that, I was able to successfully fetch, serve, and run model eos81ew on the CLI.
    eos81ewfetch.log

  • Model eos81ew was tested with a simple SMILES string on the CLI.
    eos81ewtest.csv

This process is documented in ersilia-os/eos81ew#2

@GemmaTuron
Member

Thanks @AhmedYusuff !
Very detailed explanation, really appreciated. I'll make sure to add comments on user permissions for running models, since in most cases we do need to create temporary directories in the system.

@AhmedYusuff
Author

Many Thanks @GemmaTuron

@AhmedYusuff
Author

AhmedYusuff commented Mar 13, 2023

Model Name

SARS-CoV2 activity (Image Mol)

Model description

ImageMol is a molecular image-based unsupervised pre-training deep learning framework for computational drug discovery.
SARS-CoV2 activity (Image Mol) represents molecules as images and uses them to predict the activity of molecules against SARS-CoV-2, which can simplify the challenges scientists and researchers face in visualizing and analyzing molecular structures. It has also shown high accuracy in identifying anti-SARS-CoV-2 molecules across 13 high-throughput experimental datasets from the National Center for Advancing Translational Sciences.

Why I Chose this Model

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly transmissible and pathogenic coronavirus that emerged in late 2019 and has caused a pandemic of acute respiratory disease, named ‘coronavirus disease 2019’ (COVID-19), which threatens human health and public safety.

COVID-19 has left a long-lasting impact on our society. I have also been personally affected by this pandemic, as I lost people close to me to the disease. Coming from a low-income country, I witnessed the challenges we faced because of our lack of resources, and how our health sector had to wait for vaccines to be developed by more advanced countries and hope that we could get some of them ourselves.

Therefore, any help I can render towards the potential treatment of COVID-19, no matter how small, will leave me with deep satisfaction.

@AhmedYusuff
Author

Installation Steps

I'll be following the steps listed in this repo.

https://github.com/HongxinXiang/ImageMol

@AhmedYusuff
Author

AhmedYusuff commented Mar 14, 2023

Installation environment

1. GPU environment ---CUDA 10.1


For this, I will need to change my environment: my Ubuntu 22.04 is installed in VirtualBox, and to use CUDA I need direct access to the GPU hardware. I have tried various methods to set up GPU passthrough and install CUDA on VMware, but none has been successful. @GemmaTuron, do you have any advice for me on this?

Going forward I'll be setting up WSL 2 to continue my installation steps.

@GemmaTuron
Member

Hi @AhmedYusuff ,

For the ImageMol model, you do not need to retrain it; just try to make predictions using the SARS-CoV-2 pre-trained model (see the Finetuning section in the README file of the repo). For that, no GPU is needed. You might need to tweak the installation so that it works on CPU machines.

@AhmedYusuff
Author

Many thanks @GemmaTuron

@AhmedYusuff
Author

AhmedYusuff commented Mar 14, 2023

Installation Environment


2. Create a new conda environment and activate it

A Conda environment named ImageMol was created with Python 3.7.3 and activated without issues.

Download some packages

  • RDKit Package:

Before downloading this package, remember to activate your Conda environment with conda activate Imagemol. Otherwise you might get an UnsatisfiableError, because the RDKit version might not be compatible with the Python installation in the currently active environment, as shown below.

[Screenshot 8]

I also got a CondaHTTPError, which can result from incorrect proxy settings, a weak internet connection, or interference from a firewall or antivirus software.

After addressing these issues, I was able to install the RDKit package successfully in the Conda environment.


  • Torch and Torchvision:

These packages were installed successfully with no issue.

  • Torch-cluster, Torch-scatter, Torch-sparse, and Torch-spline-conv.

I got the error messages below while trying to install these library packages.

[Screenshot 9]

The error message shows that it was torch-scatter that failed to install, so I installed the rest of the packages separately from their binary sources, and those installations were successful.

I was able to resolve the torch-scatter issue by installing it from the wheel index built for torch 1.8.0 (CPU) via pip install --no-index torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html

@GemmaTuron, there is no specific version requirement for this package, so I shouldn't have any issue, right?

@GemmaTuron
Member

Hi @AhmedYusuff !

thanks for the detailed explanation, this looks good to me! Indeed, torch package always gives issues so it's best to specify the exact version always
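
As a rough illustration of pinning exact versions, the sketch below compares installed package versions against a set of pins. The helper name find_version_mismatches is hypothetical and is not part of ImageMol or Ersilia, and the pins are only the Torch and Torchvision versions mentioned in this thread. It relies on importlib.metadata, which needs Python 3.8 or newer, so a Python 3.7.3 environment would need the importlib_metadata backport instead.

```python
from importlib import metadata


def find_version_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed)} for every pin that is not met.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop a local build suffix such as "+cpu" before comparing.
            installed = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Illustrative pins: the versions quoted in this thread.
    pins = {"torch": "1.4.0", "torchvision": "0.5.0"}
    for name, (expected, installed) in find_version_mismatches(pins).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Running a check like this right after creating the environment makes a silently upgraded or missing package visible before any model code is executed.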

@Zainab-ik
Collaborator

Hi @AhmedYusuff, thanks for the detailed explanation.
I'd like to ask which versions you are using of:

  • CUDA
  • Torch and
  • Torchvision

I'm also working on this model, and I'm unable to download the specific version of CUDA stated in the installation procedure. I was able to download CUDA 11.5 as opposed to CUDA 10.1, and I realized that the installation commands for the other Torch packages specify CUDA 10.1, which can pose an issue.

@GemmaTuron, I'd also like to ask whether the installation method matters, because I was able to install the other PyTorch packages, such as Torch-scatter and Torch-sparse, using conda instead of the pip commands specified.

Kindly check issue #624

@AhmedYusuff
Author

AhmedYusuff commented Mar 15, 2023

Many thanks, @GemmaTuron.

@Zainab-ik, I'm using the CPU instead of a GPU, so I didn't install CUDA.

I'm using the recommended Torch and Torchvision versions, i.e. 1.4.0 and 0.5.0 respectively.

@AhmedYusuff
Author

AhmedYusuff commented Mar 15, 2023

Finetuning models


The Pre-trained SARS-CoV-2 Model will be Finetuned on 13 Datasets/Assays.

  1. 3CL_enzymatic_activity

  2. ACE2_enzymatic_activity

  3. HEK293_cell_line_toxicity

  4. Human_fibroblast_toxicity

  5. MERS_Pseudotyped_particle_entry

  6. MERS_Pseudotyped_particle_entry_(Huh7_tox_counterscreen)

  7. SARS-CoV-2_cytopathic_effect_(CPE)

  8. SARS-CoV-2_cytopathic_effect_(host_tox_counterscreen)

  9. SARS-CoV_Pseudotyped_particle_entry

  10. SARS-CoV_Pseudotyped_particle_entry_(VeroE6_tox_counterscreen)

  11. Spike-ACE2_protein-protein_interaction_(AlphaLISA)

  12. Spike-ACE2_protein-protein_interaction_(TruHit_Counterscreen)

  13. TMPRSS2_enzymatic_activity

The SARS-CoV-2 datasets were obtained from https://drive.google.com/file/d/1UfROoqR_aU6f5xWwxpLoiJnkwyUzDsui/view?usp=sharing

@AhmedYusuff
Author

AhmedYusuff commented Mar 15, 2023

3CL_enzymatic_activity


To finetune the SARS-CoV-2 model on this dataset, I executed the command below.

python ./ImageMol/finetune.py --gpu 0 \
    --save_finetune_ckpt 1 \
    --log_dir ./logs/toxcast \
    --dataroot ./ImageMol/datasets/finetuning/SARS-CoV-2 \
    --dataset 3CL_enzymatic_activity \
    --task_type classification \
    --resume ./ckpts/ImageMol.pth.tar \
    --image_aug \
    --lr 0.5 \
    --batch 64 \
    --epochs 20

Outcome

I immediately got this error message.

[Screenshot 12]

The screenshot shows a CUDA runtime error, so I modified finetune.py to work on the CPU by changing model = model.cuda() to model = model

Result

The result of finetuning on the 3CL_enzymatic_activity dataset:

final results: highest_valid: 0.579, final_train: 0.467, final_test: 0.167

[Screenshot 13]


Model Evaluation

This step evaluates the performance of the finetuned model.

I executed the command below.

python ./ImageMol/evaluate.py --dataroot ./ImageMol/datasets/finetuning/SARS-CoV-2 \
    --dataset 3CL_enzymatic_activity \
    --task_type classification \
    --resume ./toxcast.pth \
    --batch 128

To forestall RuntimeError: cuda runtime error (100), I modified evaluate.py so that the evaluation runs on the CPU.

Result

ROCAUC VALUE = 69.9%

[Screenshot 14]

The ROC AUC score tells us how well the model separates the two classes: a score close to 50% indicates performance no better than random guessing, while a score close to 100% shows that the model is very good at ranking active compounds above inactive ones.

Our value of 69.9% shows that our model performed poorly. @GemmaTuron, do you know why this might be?

I think the lr option in the finetuning command is the learning rate, which was set to 0.5.

Also, could other performance metrics such as precision and recall be incorporated into evaluate.py?
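
To make the precision and recall suggestion concrete, here is a minimal pure-Python sketch. It is not taken from ImageMol's evaluate.py; the function name precision_recall is hypothetical, and it assumes the labels and predictions have already been turned into 0/1 values (for example by thresholding the predicted probabilities).

```python
def precision_recall(labels, predictions):
    """Compute precision and recall for binary 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)

    # Precision: of everything predicted active, how much really is active.
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else 0.0

    # Recall: of everything really active, how much the model found.
    actual_pos = true_pos + false_neg
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1]
    predictions = [1, 0, 1, 0, 1]
    print(precision_recall(labels, predictions))
```

Unlike ROC AUC, both metrics depend on the chosen decision threshold, so they are most useful when reported together with that threshold.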

@paulinebanye
Contributor

@masroor07 @ZakiaYahya, see @AhmedYusuff's MODEL TEST FOR EOS81EW report above; the process is documented in ersilia-os/eos81ew#2.

@masroor07

I tried granting the user extra privileges. I am still trying to figure out why it still ain’t working.
Thank you tho

@AhmedYusuff
Author

@masroor07 @ZakiaYahya, you need to give permission to your user to access the directory.

Use sudo chown -R <username>: <DirName>. Your directory name in this case will be ..

@masroor07

This will probably work. I was using home/user as the parent directory.
Thank you!

@AhmedYusuff
Author

You are welcome @masroor07

@masroor07

masroor07 commented Mar 15, 2023

Not working! I even tried changing the directory to ../.. and got:

chown: changing ownership of '../../sys/kernel/tracing': Operation not permitted
chown: changing ownership of '../../sys/kernel/debug': Operation not permitted
chown: changing ownership of '../../sys/fs/bpf': Operation not permitted
chown: changing ownership of '../../sys/fs/fuse/connections': Operation not permitted

@ZakiaYahya
Contributor

@AhmedYusuff and @masroor07, it is not working in my case either; it still gives the same error.

@AhmedYusuff
Author

AhmedYusuff commented Mar 15, 2023

@masroor07 and @ZakiaYahya.

Use this command:

sudo chown -R <username>:<username> ..

Also make sure you are in the home directory when you run it.

You can also navigate to the root directory using cd / and then run sudo adduser username.

The goal is to give your current user permission to create a folder/directory in your root directory.
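
The permission problem discussed above can be checked before running a model. The following is a minimal sketch, assuming the failure comes from missing write permission on the nearest existing parent of the directory the model wants to create. The helper names can_create and nearest_existing_parent are hypothetical, and the '../../checkpoints' path is simply the one from the error message.

```python
import os


def nearest_existing_parent(path):
    """Walk upwards from `path` until a path that exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def can_create(path):
    """Return True if the current user could create `path` as a directory."""
    parent = nearest_existing_parent(path)
    # Creating an entry needs write and search (execute) permission on the parent.
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    target = "../../checkpoints"
    print(target, "can be created:", can_create(target))
```

If the check returns False, changing ownership of only the directory that is reported by nearest_existing_parent is narrower than a recursive chown over a whole parent tree.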

@GemmaTuron
Member

Great, thanks @AhmedYusuff !
If you have time to tackle the extra task (Try to run predictions from ImageMol downloading the pre-trained model for HIV (finetuned models section) - compare with the Ersilia implementation of the model (eos6hy3)) that would be great

@Zainab-ik
Collaborator

Alright, Thank you. Same here.

@AhmedYusuff
Author

AhmedYusuff commented Mar 17, 2023

Prediction of molecular properties and Compounds that can inhibit HIV Replications Using ImageMol and Ersilia(eos6hy3).


MODEL NAME

HIV (ImageMol)

MODEL DESCRIPTION

This pre-trained model was pretrained on 10 million unlabelled drug-like, bioactive molecules and has already been finetuned to predict molecular targets of candidate compounds.

The HIV dataset was introduced by the Drug Therapeutics Program (DTP) AIDS Antiviral Screen, which tested the ability to inhibit HIV replication for over 40,000 compounds. Screening results were evaluated and placed into three categories: confirmed inactive (CI), confirmed active (CA), and confirmed moderately active (CM). We further combine the latter two labels, making it a classification task between inactive (CI) and active (CA and CM).

https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html#hiv-datasets

This is a binary classification model: the bioactivity (i.e. the variable to be predicted) is either active or inactive. The two classes are listed below.

  1. Confirmed inactive (CI)----(0).
  2. Confirmed active (CA) and Confirmed Moderately active (CM)-----(1).
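
The label merging described above can be sketched as a simple lookup. The names LABEL_MAP and binarize_labels are illustrative only and do not come from the dataset's own tooling; the category codes CI, CA, and CM are the ones listed in the description.

```python
# Map the three screening categories onto the two classes used for training.
LABEL_MAP = {
    "CI": 0,  # confirmed inactive
    "CA": 1,  # confirmed active
    "CM": 1,  # confirmed moderately active, merged with CA
}


def binarize_labels(categories):
    """Convert a list of CI/CA/CM category codes into 0/1 labels."""
    return [LABEL_MAP[category] for category in categories]


if __name__ == "__main__":
    print(binarize_labels(["CI", "CM", "CA", "CI"]))
```

An unknown category code raises a KeyError here, which is usually preferable to silently assigning it to one of the two classes.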

Outcome

@GemmaTuron, @pauline-banye, @DhanshreeA, I'm getting the error message below while trying to get predictions from the HIV finetuned model.

[Screenshot 21]

Looking through finetune.py at if os.path.isfile(args.resume): # only support ResNet18 when loading resume, it seems the loaded model file must have a .pth or .pt extension.

Now I'm getting this error message.

Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old.

Will I have to update my PyTorch version, even though I am using the version prescribed in the installation manual (1.4.0)?

I have set the map_location argument of torch.load() to use the CPU, but I am still getting the same error message.

I was able to get past this by manually changing the version file from 3 to 2 and saving the .pth file again.

But now I am getting another error message.

raise KeyError("filename %r not found" % name) KeyError: "filename 'storages' not found"

I re-saved the model with some new parameters, but I am still getting errors. I suspect the file has been corrupted.

I downloaded new model checkpoints, but I'm getting this error message.

RuntimeError: [enforce fail at inline_container.cc:197] . file not found: archive/tensors/2721195271856

Result

This issue occurred because the model checkpoints were saved with a newer version of PyTorch than the version I was using to load them.

The PyTorch version recommended in the GitHub repo is too old to load the finetuned models.

I had to upgrade using this command: conda install pytorch torchvision torchaudio cpuonly -c pytorch
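
A version mismatch like this can be spotted without loading the checkpoint. The sketch below uses only the standard library and rests on an assumption: that the checkpoint is a zip archive containing a small text entry named version under a top-level folder, which is consistent with the archive/... path in the error above and with the version file mentioned earlier. The function name read_checkpoint_format_version is hypothetical and is not a PyTorch API.

```python
import zipfile


def read_checkpoint_format_version(path):
    """Return the integer stored in the archive's `version` entry, or None.

    None means the file is not a zip archive or has no `version` entry,
    for example a checkpoint written in an older, non-zip format.
    """
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Assumed layout: "<top-level folder>/version".
            if name == "version" or name.endswith("/version"):
                return int(archive.read(name).decode("ascii").strip())
    return None
```

Comparing the returned number with the maximum version named in the loader's error message shows whether upgrading PyTorch is needed, which is safer than editing the version entry by hand.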

Predictions were successful on the MPP/HIV datasets, and I obtained a ROC AUC score of 80%, which is similar to the authors' 0.814 (see the screenshots below).

[Screenshot 26]

[Screenshot 20]

Result Explanation

Our benchmark dataset was obtained from https://moleculenet.org/.

HIV (human immunodeficiency virus) targets several molecular components in the human body during its replication cycle. To treat HIV, we need to look for drugs that target the various steps in the viral replication cycle and interfere with the process.

The aim of our model is to predict compounds that are active against HIV molecular targets and that inhibit the replication of HIV by interfering with this process, thereby reducing the production of new virus in the body.

The dataset used for the predictions contains over 40,000 compounds, and the results are placed into three categories: confirmed inactive (CI), confirmed active (CA), and confirmed moderately active (CM). The latter two labels are combined, making it a classification task between inactive (CI) and active (CA and CM).

The ROC curve shown above is drawn based on two parameters.

1. True Positive Rate (TPR)

This is the fraction of the compounds that are actually active that our model predicted to be active.

2. False Positive Rate (FPR)

This is the fraction of the compounds that are actually inactive that our model wrongly predicted to be active.

The ROC curve plots TPR against FPR at different classification thresholds, and the ROC AUC score is obtained by calculating the area under this curve.
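
The calculation described above can be sketched in pure Python. This is an illustration only, not the code behind the scores reported in this thread; the function names roc_curve_points and roc_auc are hypothetical, and the example labels and scores are made up.

```python
def roc_curve_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one active and one inactive compound")

    # Sort by score, highest first, so each step admits more predictions.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Compounds with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def roc_auc(labels, scores):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    points = roc_curve_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because only the ordering of the scores matters, the result does not depend on any single threshold, which is why ROC AUC is commonly reported alongside threshold-dependent metrics.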

I'll run the predictions on the Ersilia Model (eos6hy3) and compare Results in the next step.

@AhmedYusuff
Author

AhmedYusuff commented Mar 18, 2023

HIV Prediction on Ersilia (eos6hy3)----Continuation


Error 1

  1. Trying to fetch the model (eos6hy3) produces an InvalidModelIdentifierError.

Error.log

To Reproduce

Steps to Reproduce the behaviour.

  • ersilia -v fetch eos6hy3 > Error.log 2>&1

Testing Environment

  • Ubuntu 22.04

Expected behavior

  • eos6hy3 Model fetched Successfully.

Solution

The model identifier was replaced with the model slug: ersilia -v fetch image-mol-hiv > neww.log 2>&1

@GemmaTuron , this could be an issue with the metadata.json file.


Error 2

  1. Trying to fetch the model (with the slug as identifier) produces this error: remote: Invalid username or password.
    Error2.log

To Reproduce Error 2

Steps to Reproduce the behaviour.

  • ersilia -v fetch image-mol-hiv > Error2.log 2>&1

Testing Environment

  • Ubuntu 22.04

Expected behavior

  • eos6hy3 Model fetched Successfully.

@GemmaTuron
Member

@AhmedYusuff,

Thanks, this is a great model implementation and good work.
The error with the Ersilia Model was a small typo in our database - I've fixed it (the model identifier had a typo, so it was not found). Can you try it again before proceeding to week 3 tasks?

@AhmedYusuff
Author

Thank You @GemmaTuron. I'll try again now.

@AhmedYusuff
Author

AhmedYusuff commented Mar 20, 2023

Hi @GemmaTuron.

  • The Model(eos6hy3) has been fetched successfully. fetchm.log

  • I'm getting this error while running predictions: return json.loads(req.text)["PropertyTable"]["Properties"][0]["InChIKey"] KeyError: 'InChIKey'

  • Running predictions on a simple SMILES string was successful.
    output.csv

The error was caused by stray spaces in my CSV file. Predictions for 30 compounds on model eos6hy3 were successful and agreed closely with the HIV-ImageMol model.

HIV-ImageMol = olddata.csv

Eos6hy3 Model = newdata.csv
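
The stray-space problem described above can be avoided by cleaning the input file before passing it to the model. This is a minimal sketch, assuming the spaces were leading or trailing whitespace around cells or whitespace-only lines; the function name clean_smiles_rows is hypothetical and is not part of the Ersilia CLI.

```python
import csv
import io


def clean_smiles_rows(text):
    """Strip surrounding whitespace from each cell and drop empty rows."""
    cleaned = []
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if any(cells):  # skip blank or whitespace-only lines
            cleaned.append(cells)
    return cleaned


if __name__ == "__main__":
    raw = "smiles\n CCC \n\n   \nCCO\n"
    for row in clean_smiles_rows(raw):
        print(row)
```

Writing the cleaned rows back out with csv.writer gives an input file in which every line holds exactly one trimmed entry.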

@AhmedYusuff
Author

Week 3

@GemmaTuron
Member

Perfect, thanks @AhmedYusuff ! Let's focus on week 3 now

@AhmedYusuff
Author

AhmedYusuff commented Mar 21, 2023

Prediction of synergistic drug combinations with dual feature fusion networks

Model Name

DFFNDDS (Dual Feature Fusion Network for Drug–Drug Synergy prediction).

Model Description

Drug combinations have been shown to be an effective treatment for various diseases, but identifying drugs that can safely be used together is a real challenge: not every drug combination is synergistic, and some can be quite harmful. The search for synergistic drug combinations is what led to the development of this model. It is a deep learning model that uses a fine-tuned pre-trained language model and a dual feature fusion mechanism to predict synergistic drug combinations.

Model Interpretation

Various deep learning methods have been applied to the discovery of synergistic drug combinations. Some of them are:

  • DeepSynergy: This uses a deep neural network to extract relevant features from drugs, such as the molecular weight, together with genomic features, to predict promising drug combinations.

  • GraphSynergy: A spatial-based graph convolutional network.

  • DeepDDS: This uses a graph neural network, where SMILES strings are converted to molecular graphs with RDKit and used to predict drug combination synergy.

These methods perform two tasks to identify valid drug combinations: feature extraction and feature fusion. In feature extraction, the task is to extract relevant information or characteristics of a drug, such as its atoms, molecular weight, and chemical structure, and also to extract cell line features.

Feature fusion is the combination of the different features that have been extracted. However, these methods do not sufficiently exploit the SMILES information, and their fusion methods do not fully capture the interactions between these features.

The Dual Feature Fusion Network for Drug–Drug Synergy prediction (DFFNDDS) seeks to address this problem by using a fine-tuned BERT model to identify efficient drug features and a double-view feature fusion mechanism to combine the extracted drug and cell line features.

Slug

dffn-dds

Tags

Fingerprints, Target identification

Publication

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00690-3

Source Code

https://github.com/sorachel/DFFNDDS

License

None

@GemmaTuron
Member

Hi @AhmedYusuff !

This is an interesting model, but please, try to avoid copying literal sentences from the publication and try to explain it in your own words

Since you have used the model suggestion schema (I think that's a good idea!) please, check the guidelines on GitBook, you will see the tags are restricted to a few specific ones, the licenses as well... -- otherwise in a real case scenario this would not pass the automatic checks
If you can please add the interpretation of the model as well!

Thanks

@AhmedYusuff
Author

Many thanks for the suggestion @GemmaTuron. I'll work on them right now.

@AhmedYusuff
Author

@GemmaTuron , Are we only accepting Models that have been trained on the 7 provided Datasets listed in the Gitbook?

@GemmaTuron
Member

Hi @AhmedYusuff !
I'm unsure about which are the 7 datasets listed... my comment about gitbook was related to the tags, only a few predefined tags are accepted

@AhmedYusuff
Author

Alright.

@AhmedYusuff
Author

Hi, I've made a few changes to the model and also added the interpretation as well.

@GemmaTuron
Member

GemmaTuron commented Mar 22, 2023

Hi @AhmedYusuff !

Thanks! Let's include this model in our model suggestions to be incorporated.
Looking forward to your next suggestion!

@AhmedYusuff
Author

I've Done that @GemmaTuron. Many thanks for the Assistance.

@AhmedYusuff
Author

AhmedYusuff commented Mar 22, 2023

Model Name

eToxPred

Model Description

A Machine Learning Model that has been trained on different datasets containing known drugs, potentially hazardous chemicals, natural products, and synthetic bioactive compounds, which gives it the ability to learn the relationships between chemical structures and their toxic effects, and Predict various types of Toxicity in molecules such as carcinogenicity potency, cardiotoxicity, endocrine disruption, mutagenicity, and acute oral toxicity.

Model Interpretation

One of the challenges in developing a model that can accurately predict toxicity in molecules is recognizing the relevant features and factors that lead to toxicity, given the complex nature of chemicals.

eToxPred makes use of a DBN (Deep Belief Network), a neural network built by stacking multiple RBMs (Restricted Boltzmann Machines), each of which consists of two layers: the visible layer and the hidden layer.

The visible layer represents the input data, while the hidden layer encodes higher-level representations of the input. Together they serve as the building blocks of the deep belief network.

This architecture allows fast, layer-by-layer training, letting the model learn complex representations of molecular structures and extract the most relevant and informative features from the compounds.
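To make the layer-by-layer idea concrete, here is a minimal sketch of two stacked RBMs in plain Python. This is not eToxPred's code: the `TinyRBM` class, the toy 6-bit vectors standing in for molecular fingerprints, the layer sizes, and the learning rate are all assumptions chosen for illustration, and the training rule shown is one-step contrastive divergence (CD-1), a common way to train RBMs.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM:
    """Illustrative RBM with a visible and a hidden layer (hypothetical, not eToxPred's code)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights; biases start at zero.
        self.w = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        # Probability that each hidden unit is on, given the visible layer.
        return [sigmoid(self.b_h[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        # Probability that each visible unit is on, given the hidden layer.
        return [sigmoid(self.b_v[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def cd1_step(self, v0, lr=0.1):
        # One step of contrastive divergence (CD-1).
        h0_p = self.hidden_probs(v0)
        h0 = [1 if self.rng.random() < p else 0 for p in h0_p]  # sample hidden states
        v1_p = self.visible_probs(h0)                            # reconstruction
        h1_p = self.hidden_probs(v1_p)
        for i in range(len(v0)):
            for j in range(len(h0_p)):
                self.w[i][j] += lr * (v0[i] * h0_p[j] - v1_p[i] * h1_p[j])
            self.b_v[i] += lr * (v0[i] - v1_p[i])
        for j in range(len(h0_p)):
            self.b_h[j] += lr * (h0_p[j] - h1_p[j])


# Toy binary vectors standing in for fingerprints (made-up data).
fingerprints = [
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 1, 1],
]

# Layer 1: train the first RBM on the raw inputs.
rbm1 = TinyRBM(n_visible=6, n_hidden=4, seed=1)
for _ in range(200):
    for v in fingerprints:
        rbm1.cd1_step(v)

# Layer 2: train a second RBM on the hidden activations of the first.
layer1 = [rbm1.hidden_probs(v) for v in fingerprints]
rbm2 = TinyRBM(n_visible=4, n_hidden=2, seed=2)
for _ in range(200):
    for h in layer1:
        rbm2.cd1_step(h)

# Higher-level features for each input vector.
features = [rbm2.hidden_probs(h) for h in layer1]
```

Each RBM is trained on its own, and the hidden activations of one become the visible input of the next, which is what "layer-by-layer training" refers to. In a full DBN the stacked layers would usually be fine-tuned afterwards with a supervised objective, which this sketch leaves out.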

Slug

etox-pred

Tags

Toxicity, Tox21, Fingerprints.

Publications

https://bmcpharmacoltoxicol.biomedcentral.com/articles/10.1186/s40360-018-0282-6

Source Code

https://github.com/pulimeng/etoxpred

License

GPL-3.0

@GemmaTuron
Member

Hi @AhmedYusuff !

Actually, eToxPred is already in our hub; if you check the website it should pop up (eos92sw)! Let's find a third one that is not there :)

@AhmedYusuff
Author

> Hi @AhmedYusuff !
>
> Actually, eToxPred is already in our hub; if you check the website it should pop up (eos92sw)! Let's find a third one that is not there :)

Alright, @GemmaTuron.

@AhmedYusuff
Author

AhmedYusuff commented Mar 23, 2023

Model Name

DETIRE (a hybrid Deep lEarning model for idenTifying vIral sequences fRom mEtagenomes)

Model Description

Virus detection from metagenomes can be difficult, as genetic material from other organisms is often present in the environmental samples collected.

This model combines several deep-learning techniques: a graph convolutional network (GCN) based sequence embedder and a two-path deep-learning model made up of a convolutional neural network (CNN) and a BiLSTM, which is a type of recurrent neural network (RNN). The model leverages this combination to create a hybrid deep-learning model that can detect viruses directly from metagenomic sequences.

Slug

det-ire

Tags

Target identification.

Publications

https://www.biorxiv.org/content/10.1101/2021.11.19.469211v1.full

Source Code

https://github.com/crazyinter/DETIRE

License

Proprietary

@GemmaTuron
Member

Hi @AhmedYusuff !

Thanks! This model is focused on genomic data, so at this stage of the Ersilia Model Hub development we cannot tackle it, though we will certainly be doing so in the near future!
I'd say this model's license is Non-Commercial, right? They do give free access for non-commercial applications.

Next steps:

  • Can you add your first suggestion, the dff-nds model, in the Ersilia list if you haven't yet?
  • Help interns who are starting late in the contribution period get set up
  • Can you have a look at this issue and update if there is still the same bug in the model, and we'll take it from there?
  • Next week start preparing the application letter!

@AhmedYusuff
Author

> Hi @AhmedYusuff !
>
> Thanks! This model is focused on genomic data, so at this stage of the Ersilia Model Hub development we cannot tackle it, though we will certainly be doing so in the near future!
> I'd say this model's license is Non-Commercial, right? They do give free access for non-commercial applications.
>
> Next steps:
>
>   • Can you add your first suggestion, the dff-nds model, in the Ersilia list if you haven't yet?
>   • Help interns who are starting late in the contribution period get set up
>   • Can you have a look at this issue and update if there is still the same bug in the model, and we'll take it from there?
>   • Next week start preparing the application letter!

Many thanks for the feedback, @GemmaTuron. Yes, the model license is Non-Commercial, and dff-nds has been added to the suggestion list. I'll also proceed with the other tasks.

@GemmaTuron
Member

To keep track of the contributions, @AhmedYusuff worked on issue #387.
Can you also check issues #389 and #371?

@AhmedYusuff
Author

> To keep track of the contributions, @AhmedYusuff worked on issue #387.
> Can you also check issues #389 and #371?

Okay. I'll be on it.

@AhmedYusuff
Author

@GemmaTuron, the Week 4 task has been completed: I have submitted my final application.

I'll continue working towards resolving the outstanding issue #389. Should I also start incorporating model dffn-dds into the Hub?

@GemmaTuron
Member

Hi @AhmedYusuff !

Thanks for your work. At this moment we cannot provide further support to contributors until the internship period. The model is already on the suggestions list, so it will be tackled as soon as possible!

Thanks

@AhmedYusuff
Author

AhmedYusuff commented Mar 31, 2023

Many thanks @GemmaTuron for all the assistance.

It has been a very rich experience learning from you. I will go ahead and close this issue as completed.

6 participants