✍️ Contribution period: <Ahmed Yusuf> #615
Following the guidelines listed, I have successfully installed the Ersilia Model Hub on my system. I installed the Hub on my Ubuntu Substation |
Why I want to Contribute to Ersilia
I got to know about Outreachy while browsing the internet. I was intrigued by its support of diversity, by how much it has helped underrepresented communities, and by its contributions to the open source community over the years. I was filled with elation yesterday when I got the congratulatory message that my initial application had been accepted and that I could proceed to the contribution page. While going through the list of open source projects I could contribute to, I came across Ersilia. One of the reasons I became interested is that Ersilia's research is geared towards underdeveloped and developing countries, especially countries on the African continent. Being from Nigeria, this really resonates with me. Ersilia is also listed in Forbes's annual roundup of tech nonprofits, and being allowed to contribute to such a project will be an indescribable honor for me. I love data analysis. I have a BSc in Information Technology from Middlesex University in Malta and an Advanced Diploma in Software Engineering from Aptech. I have always been fascinated by AI/ML: the processes behind it, how the data are sourced (i.e. data integrity), and seeing data applied to our biggest medical challenges. I have always been looking for ways to improve my skills and test myself in this field. By working on the Ersilia Open Source Initiative, a very good project that will have a lasting impact on humanity, I will fulfill a long-held dream of mine: to contribute to human society and learn at the same time. |
Hi @TemiTomTom ! Welcome to Ersilia's community and thanks for the work. Let us make sure all applicants are set up and we will assign some more tasks to complete! |
Many thanks, @GemmaTuron. I'm glad to be here. |
Hi @AhmedYusuff ! |
Hello @GemmaTuron.
MODEL TEST FOR EOS81EW
I tested the model on my Ubuntu 22.04 system and Google Colab.
Outcome
Result
This process is documented in ersilia-os/eos81ew#2 |
Thanks @AhmedYusuff ! |
Many Thanks @GemmaTuron |
Model Name
SARS-CoV2 activity (Image Mol)
Model description
ImageMol is a molecular image-based unsupervised pre-training deep learning framework for computational drug discovery.
Why I Chose this Model
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly transmissible and pathogenic coronavirus that emerged in late 2019 and has caused a pandemic of acute respiratory disease, named ‘coronavirus disease 2019’ (COVID-19), which threatens human health and public safety. COVID-19 has left a long-lasting impact on our society. Personally, I have also been affected by this pandemic, as I lost people close to me to this disease. Coming from a low-income country, I witnessed the challenges we faced because of our lack of resources, and how our health sector had to wait for a vaccine to be developed by more advanced countries and hope that we would get some of it ourselves. Any help I can render towards a potential treatment of COVID-19, no matter how small, will therefore leave me with deep satisfaction. |
Installation Steps
I'll be following the steps listed in this repo. |
Installation environment
1. GPU environment: CUDA 10.1
For this, I will need to change my environment. My Ubuntu 22.04 is installed on VirtualBox, and to use CUDA I need direct access to the GPU hardware. I have tried various methods to set up GPU passthrough and install CUDA on VMware, but without success. @GemmaTuron, do you have any advice for me on this? Going forward, I'll be setting up WSL 2 to continue my installation steps. |
Hi @AhmedYusuff , for the ImageMol model you do not need to retrain it, but try to make predictions using the SARS-CoV2 pre-trained model (see section Finetuning in the README file of the repo). For that, no GPU is needed. You might need to tweak the installation so that it works on CPU machines. |
Many thanks @GemmaTuron |
Installation Environment
2. Create a new conda environment and activate it
A Conda environment, ImageMol, was created with Python version 3.7.3 and activated with no issues.
Download some packages
Before downloading this package, remember to activate your Conda environment. I also got a CondaHTTPError, which could be a result of wrong proxy configuration settings, a weak internet connection, or a problem with a firewall or antivirus software. After doing all this, I was able to install the RDKit package successfully in the Conda environment.
These packages were installed successfully with no issues.
I got the below error messages while trying to install this lib package. From the error message, it can be seen that it is the torch-scatter lib that failed to install. I therefore installed the rest of the lib packages separately from their binary source, and the installation was successful. I was able to resolve the torch-scatter issue by installing this version: torch 1.8.0. @GemmaTuron, there is no specific requirement for a particular version of this lib package, so I shouldn't have any issue, right? |
Hi @AhmedYusuff ! thanks for the detailed explanation, this looks good to me! Indeed, torch package always gives issues so it's best to specify the exact version always |
Hi @AhmedYusuff, thanks for the detailed explanation.
Kindly check issue #624 |
Many thanks, @GemmaTuron and @Zainab-ik. I'm using the CPU instead of a GPU, so I didn't install CUDA. I'm using the recommended Torch and Torchvision versions, i.e. 1.4.0 and 0.5.0 respectively. |
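Since exact versions matter for the torch packages, a small check like the one below can confirm that an environment matches its pins before any model is run. This is only a sketch: the pins are assumed to be torch 1.4.0 and torchvision 0.5.0 as mentioned in this thread, `check_pins` is a hypothetical helper name, and `importlib.metadata` needs Python 3.8 or newer, so it would not run as-is in a Python 3.7 environment.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Compare installed package versions against pinned ones.

    Returns {package: (expected, installed_or_None, matches)}.
    """
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore a local build suffix such as "+cpu" when comparing.
        matches = installed is not None and installed.split("+")[0] == expected
        report[name] = (expected, installed, matches)
    return report


# Assumed pins, taken from the versions discussed in this thread.
pins = {"torch": "1.4.0", "torchvision": "0.5.0"}
report = check_pins(pins)
for package, (expected, installed, ok) in report.items():
    print(f"{package}: expected {expected}, installed {installed}, ok={ok}")
```

A package that is missing is reported with `None` as its installed version rather than raising, so the whole list can be checked in one pass.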
Finetuning models
The pre-trained SARS-CoV-2 model will be finetuned on 13 datasets/assays.
|
3CL_enzymatic_activity
To finetune the SARS-CoV-2 model on this dataset, I executed the below code.
Outcome
I immediately got this error message. It can be seen from the screenshot the
Result
The result of finetuning on the 3CL_enzymatic_activity dataset.
Model Evaluation
The process of comparing and evaluating our model's performance. I executed the below code.
To forestall
Result
ROCAUC VALUE = 69.9%
A ROCAUC score tells us how well the model separates the two classes. A score close to 50% usually indicates poor performance (no better than random guessing), and a score close to 100% shows that the model is very good at ranking active compounds above inactive ones. Our 69.9% value shows that our model performed poorly. I'm looking to see why, @GemmaTuron. And I'm thinking that the Also, can it be suggested that other performance metrics like precision and recall be incorporated in |
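To make the suggestion about precision and recall concrete, here is a minimal sketch of how the two metrics can be computed from predicted scores. It is not the ImageMol evaluation code: the function name, the 0.5 threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import List, Tuple


def precision_recall(
    y_true: List[int], y_score: List[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Precision and recall for binary labels (1 = active, 0 = inactive)."""
    # Turn scores into hard predictions at the chosen threshold.
    y_pred = [1 if score >= threshold else 0 for score in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data, not results from the 3CL assay.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.8, 0.2, 0.1]
precision, recall = precision_recall(labels, scores)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Unlike ROCAUC, both metrics depend on the threshold chosen, so they are usually reported together with it.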
|
I tried granting the user extra privileges. I am still trying to figure out why it still ain’t working. |
@masroor07 @ZakiaYahya, you need to give permission to your user to access the directory. Use |
Probably this will work. I was using home/user as the parent directory. |
You are welcome @masroor07 |
Not working! I even tried changing the directory to ../... I got:
|
@AhmedYusuff and @masroor07, it is not working in my case either; it is still giving the same error. |
@masroor07 and @ZakiaYahya. Use this command.
You can also navigate to the root directory using
You are trying to give your current user permission to create a folder/directory in your root directory. |
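Before changing permissions, it can help to confirm that the permission error really comes from the target directory. The sketch below checks whether the current user can create entries in a given directory; `can_create_in` is a hypothetical helper and the paths used are placeholders, not the directory discussed above.

```python
import os
import tempfile


def can_create_in(directory: str) -> bool:
    """Return True if the directory exists and the current user may create entries in it."""
    # Creating a file or folder needs both write and execute (search) permission
    # on the parent directory.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


# A freshly created temporary directory is writable by its creator.
with tempfile.TemporaryDirectory() as tmp:
    writable_tmp = can_create_in(tmp)

# A directory that does not exist cannot be written to.
missing = can_create_in(os.path.join(tempfile.gettempdir(), "no-such-dir-for-this-sketch"))

print(writable_tmp, missing)
```

If the check returns False for the directory the tool is trying to write to, the user's permissions on that directory are the thing to fix.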
Great, thanks @AhmedYusuff ! |
Alright, Thank you. Same here. |
Prediction of molecular properties and compounds that can inhibit HIV replication using ImageMol and Ersilia (eos6hy3)
MODEL NAME
HIV (ImageMol)
MODEL DESCRIPTION
This model was pretrained on 10 million unlabelled drug-like, bioactive molecules and has already been finetuned to predict molecular targets of candidate compounds.
https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html#hiv-datasets
This is a binary classification model: the bioactivity (i.e. the variable to be predicted) is either active or inactive. I have listed the bioactivity below.
Outcome
@GemmaTuron, @pauline-banye, @DhanshreeA, I'm getting this error message below while trying to get the predictions for the HIV finetuned model. Looking through
Now I'm getting this error message.
Will I have to update my PyTorch version, even though I am using the version that was prescribed in the installation manual? I have set the
I was able to resolve this by manually changing the version file from 3 to 2 and saving the .pth file again. But I seem to be getting another error message.
I re-saved the model with some new parameters, but I'm still getting errors. I suspect the file has been corrupted. I downloaded new model checkpoints, but I'm getting these error messages.
Result
This issue was a result of the model checkpoints being saved with a newer version of PyTorch than the version I was using to load them. The PyTorch version recommended in the GitHub repo is insufficient for loading the finetuned models. I had to upgrade using this command
Predictions were successful on the MPP/HIV datasets, and I obtained a ROCAUC score of 80%, which is similar to the 0.814 reported by the authors (see below screenshots).
Result Explanation
Our benchmark dataset comes from https://moleculenet.org/. HIV (human immunodeficiency virus) targets several molecular components in the human body during its replication cycle. In treating HIV, we need to look for drugs that target the various steps in the viral replication cycle and interfere with the process. The aim of our model is to predict compounds that show activity against HIV molecular target profiles and so inhibit the replication of HIV, thereby reducing the production of new virus in the body. The dataset used for the predictions contains over 40,000 compounds, and the results are placed into three categories: confirmed inactive (CI), confirmed active (CA), and confirmed moderately active (CM). We further combine the latter two labels, making it a classification task between inactive (CI) and active (CA and CM). The ROC curve shown above is drawn based on two parameters.
1. True Positive Rate (TPR): the fraction of the compounds that are actually active that our model predicted to be active.
2. False Positive Rate (FPR): the fraction of the compounds that are actually inactive that our model predicted to be active.
The ROC curve plots TPR against FPR across decision thresholds, and the ROCAUC score is obtained by calculating the area under that curve. I'll run the predictions on the Ersilia model (eos6hy3) and compare results in the next step. |
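The label merging and the ROC construction described above can be sketched in a few lines of plain Python. This is an illustration only, not the ImageMol or MoleculeNet code: the helper names and the six toy compounds are assumptions, and the area is computed with the trapezoidal rule.

```python
from typing import List, Tuple

# CI = confirmed inactive, CA = confirmed active, CM = confirmed moderately active.
LABEL_MAP = {"CI": 0, "CA": 1, "CM": 1}


def merge_labels(raw: List[str]) -> List[int]:
    """Collapse the three HIV categories into inactive (0) and active (1)."""
    return [LABEL_MAP[label] for label in raw]


def roc_points(y_true: List[int], y_score: List[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one active and one inactive compound")
    pairs = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Compounds with the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data, not taken from the HIV dataset.
raw_labels = ["CA", "CI", "CM", "CI", "CA", "CI"]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
labels = merge_labels(raw_labels)
roc_auc = auc(roc_points(labels, scores))
print(labels, round(roc_auc, 3))
```

The same number can be read as the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one.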
HIV Prediction on Ersilia (eos6hy3): Continuation
Error 1
To Reproduce
Steps to reproduce the behaviour.
Testing Environment
Expected behavior
Solution
The model identifier was replaced with the model slug. @GemmaTuron, this could be an issue with the metadata.json file.
Error 2
To Reproduce Error 2
Steps to reproduce the behaviour.
Testing Environment
Expected behavior
|
Thanks, this is a great model implementation and good work. |
Thank You @GemmaTuron. I'll try again now. |
Hi @GemmaTuron.
The error was a result of spaces left in my CSV file. Predictions for 30 compounds on the model (eos6hy3) were successful, and very accurate when compared with the HIV-ImageMol model.
HIV-ImageMol = olddata.csv
Eos6hy3 Model = newdata.csv |
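Stray spaces and blank lines in an input CSV are easy to miss by eye, so a small clean-up step before running predictions can help. The sketch below strips whitespace from every cell and drops empty rows. The exact layout of the file used here is not shown in this thread, so the single `smiles` column and the sample rows are assumptions.

```python
import csv
import io
from typing import List


def clean_rows(text: str) -> List[List[str]]:
    """Strip surrounding whitespace from every cell and drop rows that end up empty."""
    cleaned = []
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if any(cells):  # skip blank lines and whitespace-only rows
            cleaned.append(cells)
    return cleaned


# Hypothetical input with leading/trailing spaces and blank lines.
raw_csv = "smiles\n CCO \n\nc1ccccc1  \n   \n"
rows = clean_rows(raw_csv)
print(rows)
```

For a file on disk, the same function can be applied to the file's contents and the result written back with `csv.writer`.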
Week 3 |
Perfect, thanks @AhmedYusuff ! Let's focus on week 3 now |
Prediction of synergistic drug combinations with dual feature fusion networks
Model Name
DFFNDDS (Dual Feature Fusion Network for Drug–Drug Synergy prediction).
Model Description
Drug combination has been shown to be an effective treatment for various diseases, but identifying drugs that can safely be used together is a real challenge: not every drug combination is synergistic, and some can be quite harmful. The search for synergistic drug combinations is what led to the production of this model. It is a deep learning model that uses a fine-tuned pre-trained language model and a dual feature fusion mechanism to predict synergistic drug combinations.
Model Interpretation
In the quest for synergistic drug combination discovery, there have been various deep learning methods that
These methods seek to do two things in identifying valid drug combinations: feature extraction and feature fusion. In feature extraction, the task is to extract relevant information or characteristics of a drug, such as its atoms, molecular weight, and chemical structure, and also to extract cell line features. Feature fusion is the combination of the different features that have been extracted. However, these methods
The Dual Feature Fusion Network for Drug-Drug Synergy (DFFNDDS) looks to redress this problem by using a
Slug
dffn-dds
Tags
Fingerprints, Target identification
Publication
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00690-3
Source Code
https://github.com/sorachel/DFFNDDS
License
None |
Hi @AhmedYusuff ! This is an interesting model, but please try to avoid copying literal sentences from the publication and try to explain it in your own words. Since you have used the model suggestion schema (I think that's a good idea!), please check the guidelines on GitBook. You will see that the tags are restricted to a few specific ones, and the licenses as well; otherwise, in a real-case scenario, this would not pass the automatic checks. Thanks |
Many thanks for the suggestion @GemmaTuron. I'll work on them right now. |
@GemmaTuron , Are we only accepting Models that have been trained on the 7 provided Datasets listed in the Gitbook? |
Hi @AhmedYusuff ! |
Alright. |
Hi, I've made a few changes to the model and also added the interpretation as well. |
Hi @AhmedYusuff ! Thanks! Let's include this model in our model suggestions to be incorporated. |
I've Done that @GemmaTuron. Many thanks for the Assistance. |
Model Name
eToxPred
Model Description
A machine learning model trained on different datasets containing known drugs, potentially hazardous chemicals, natural products, and synthetic bioactive compounds. This lets it learn the relationships between chemical structures and their toxic effects and predict various types of toxicity in molecules, such as carcinogenic potency, cardiotoxicity, endocrine disruption, mutagenicity, and acute oral toxicity.
Model Interpretation
One of the challenges in developing a model that can accurately predict toxicity in molecules is the difficulty of recognizing the relevant features and factors that lead to toxicity, owing to the complex nature of chemicals. eToxPred makes use of a DBN (Deep Belief Network), a neural network built from multiple RBMs (Restricted Boltzmann Machines), each of which consists of two layers: the visible layer and the hidden layer. The visible layer represents the input data, while the hidden layer encodes higher-level representations of the input; together they serve as the building blocks of the deep belief network. This architecture allows for fast, layer-by-layer training, letting the model learn complex representations of molecular structures and extract the most relevant and informative features from the compounds.
Slug
etox-pred
Tags
Toxicity, Tox21, Fingerprints.
Publications
https://bmcpharmacoltoxicol.biomedcentral.com/articles/10.1186/s40360-018-0282-6
Source Code
https://github.com/pulimeng/etoxpred
License
GPL-3.0 |
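To illustrate the visible/hidden layer relationship in an RBM described above, here is a minimal sketch of the standard hidden-unit activation, p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j]). It is not eToxPred's code: the function names, the three-bit input, and the weights are made up for illustration.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(
    visible: List[int], weights: List[List[float]], hidden_bias: List[float]
) -> List[float]:
    """Probability that each hidden unit is on, given a binary visible vector.

    weights[i][j] connects visible unit i to hidden unit j.
    """
    probs = []
    for j, bias in enumerate(hidden_bias):
        activation = bias + sum(v * weights[i][j] for i, v in enumerate(visible))
        probs.append(sigmoid(activation))
    return probs


# Hypothetical 3-bit input (e.g. a tiny fingerprint) and 2 hidden units.
visible = [1, 0, 1]
weights = [
    [0.0, 1.0],
    [0.5, -0.5],
    [0.0, -1.0],
]
hidden_bias = [0.0, 0.0]
probs = hidden_probabilities(visible, weights, hidden_bias)
print(probs)
```

In a deep belief network, the hidden activations of one RBM serve as the visible input of the next, which is what makes the layer-by-layer training possible.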
Hi @AhmedYusuff ! Actually etoxpred is already in our hub, if you check the website it should pop out! ( |
Alright @GemmaTuron . |
Model Name
DETIRE (a hybrid Deep lEarning model for idenTifying vIral sequences fRom mEtagenomes)
Model Description
Virus detection from metagenomes can be complicated and difficult, as genetic material from other organisms is often present in the environmental samples collected. This model combines several deep learning techniques: a graph convolutional network (GCN) based sequence embedder and a two-path deep learning model made up of a convolutional neural network (CNN) and a BiLSTM, which is a type of recurrent neural network (RNN). The model leverages this combination to create a hybrid deep learning model that can detect viruses directly from metagenomic sequences.
Slug
det-ire
Tags
Target identification.
Publications
https://www.biorxiv.org/content/10.1101/2021.11.19.469211v1.full
Source Code
https://github.com/crazyinter/DETIRE
License
Proprietary |
Hi @AhmedYusuff ! Thanks, this model is focused on genomic data, therefore at this stage of the Ersilia Model Hub development we cannot tackle it, though we will certainly be doing it in the near future! Next steps:
|
Many thanks for the feedback, @GemmaTuron. Yes, the model license is non-commercial, and dffn-dds has been added to the suggestion list. I'll also proceed with the other task. |
To keep track of the contributions, @AhmedYusuff worked on issue #387 |
Okay. I'll be on it. |
@GemmaTuron, the Week 4 task has been completed with the submission of my final application. I'll continue working towards resolving the outstanding issue #389. Should I also add incorporating the dffn-dds model into the Hub? |
Hi @AhmedYusuff ! Thanks for your work. At this moment we cannot provide further support to contributors until the internship period. The model is already on the suggestions list, so it will be tackled as soon as possible! Thanks |
Many thanks @GemmaTuron for all the assistance. It has been a very rich experience learning from you. I will go ahead and close this issue as completed. |
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application