Outreachy Code Project: Amna Ali #104

Closed
26 tasks done
Amna-28 opened this issue Apr 3, 2022 · 16 comments

Contributor

Amna-28 commented Apr 3, 2022

Applicant: @Amna-28

Welcome to the Ersilia Open Source Initiative. This issue will serve to track all your contributions for the project “Improve the Ersilia Model Hub, a FOSS platform offering pre-trained AI/ML models for research”.

Please tick the tasks as you complete them. To make a final application it is not required to have completed all tasks. This project requires knowledge of the Python programming language. The tasks are not listed in order of importance; they simply relate to different skills. Start where you feel most comfortable.


Initial steps

  • Record your application for the project in the Outreachy website referencing this issue. Please make sure to select the right project on the website.
  • Join the Slack channel to follow public communications.
  • Comment under this issue explaining why you are interested in this project.

Installation of the Ersilia Model Hub

  • Install the ersilia library.
  • Add a screenshot under this issue showing that you are able to run one model (for example, the chemprop-antibiotic model)
  • Fetch at least 3 models from the Ersilia Model Hub. You can find these models with the ersilia catalog command. Add a screenshot of the local catalog (ersilia catalog --local)

CLI

  • Check if there are open issues related to the command line interface. Continue with the next tasks if they are open.
  • Select one issue related to improving the CLI and request to be assigned to it.
  • Link the #PR as a comment under this issue.
  • Make any changes required in the PR and tick this box once it has been approved.
  • Suggest at least one missing feature in the CLI (one sentence is enough, for example: “Add command to estimate memory usage of a particular model”).

Python library

  • Add a screenshot showing that you are able to run predictions using ersilia as a Python library (find more information here). Ideally, use a Jupyter notebook.
  • Create a simple Streamlit app using the ersilia Python library. The app can have an input and an output box, and perhaps a few models to select. Add a screenshot of the app as seen in your browser.
  • Write a docstring for the ErsiliaModel class. Follow the Google Python Style Guide. Paste the docstring as a comment below (do not use a PR).

Scientific content

  • Check the models available in the Hub
  • Select one model from the list and write a technical card for it (what the model is for, what input it takes, which data was used to create it, what kind of ML algorithm it uses…)
  • Add your card as a comment to this issue
  • Search the scientific literature and suggest 3 new models (comment in this issue) that would be relevant to incorporate in the Hub.

Other

If you are interested in working on related topics, or have new suggestions, please do the following:

  • Add a comment in this issue with your new idea, tagging the mentor
  • Get feedback from the mentor and act accordingly
  • Link in the comments any other PR you have contributed to.

Community

  • Look up two other projects and comment on their issues with feedback on one of their tasks
  • If you have feedback from your peers, answer it in this issue.

Final application

  • I have answered all comments from mentors and contributors
  • All PR or issues assigned to me are complete
  • I have submitted my final application to the project

Contributor Author

Amna-28 commented Apr 3, 2022

Running the "molecular-weight" model:

Contributor Author

Amna-28 commented Apr 4, 2022

Screenshot of the command ersilia catalog --local:

Contributor Author

Amna-28 commented Apr 4, 2022

Select one issue related to improving the CLI and request to be assigned to it

This contribution solves issue #8, "Track unused models and automatically delete them".
Models in the Ersilia Model Hub take up sizable disk space. The goal was to automatically delete a model, together with all of its related files (conda environment or Docker container), if it has not been used for a month.
To achieve this, I did the following:

  • A fetched_models.txt file is maintained that records each model's name and the timestamp at which it was fetched. This is done by a _fetchtime() method in the ModelFetcher class, which runs every time a new model is fetched.
  • An update_model_usage_time() method was added to the ErsiliaModel class and is called from the serve() method, so that the model's usage time in fetched_models.txt is updated whenever the model is used, i.e. whenever serve() is called.
  • A delete_model_entry() method was added to the ModelFullDeleter class. When a model is deleted, this method removes the model's entry from fetched_models.txt.
  • cron.py contains the model_cleanup() function, which maintains a last_cleaned.json file to keep track of when unused models were last cleaned up. If the last cleanup was more than a week ago, it starts a new cleanup based on fetched_models.txt: for each model it computes the time elapsed since the model was last used (the current time minus the time recorded in fetched_models.txt) and calls the delete function if 30 or more days have passed.
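The cleanup rule from the last bullet can be sketched in plain Python. This is a minimal illustration, not the code from the PR: the one-entry-per-line "model_id,ISO timestamp" layout of fetched_models.txt and the helper names parse_usage_log(), cleanup_due() and stale_models() are assumptions made for the example, and the actual deletion step is left out.

```python
from datetime import datetime, timedelta

# Thresholds taken from the description above: a cleanup runs only if the
# previous one was more than a week ago, and a model is removed once it
# has gone 30 or more days without being used.
CLEANUP_INTERVAL = timedelta(days=7)
STALE_AFTER = timedelta(days=30)


def parse_usage_log(text):
    """Parse lines of the form 'model_id,2022-04-04T10:00:00'.

    This layout for fetched_models.txt is hypothetical; the real file may differ.
    """
    usage = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        model_id, stamp = line.split(",", 1)
        usage[model_id] = datetime.fromisoformat(stamp)
    return usage


def cleanup_due(last_cleaned, now):
    """Return True if no cleanup has run yet or the last one was over a week ago."""
    return last_cleaned is None or now - last_cleaned > CLEANUP_INTERVAL


def stale_models(usage, now):
    """Return, sorted, the ids of models unused for 30 or more days."""
    return sorted(
        model_id
        for model_id, last_used in usage.items()
        if now - last_used >= STALE_AFTER
    )


if __name__ == "__main__":
    now = datetime(2022, 5, 1)
    log = "model-a,2022-03-01T10:00:00\nmodel-b,2022-04-25T09:30:00\n"
    usage = parse_usage_log(log)
    if cleanup_due(datetime(2022, 4, 20), now):
        print(stale_models(usage, now))  # ['model-a']
```

Keeping the date arithmetic in pure functions that take now as a parameter makes the 7-day and 30-day rules easy to check without waiting or touching real model files.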

Future Enhancements/Suggestions:
A prompt could be added to the cleanup function that lists the models not used for more than 30 days and asks the user whether to delete all of them.
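A minimal sketch of that suggested prompt, under stated assumptions: confirm_cleanup() is a hypothetical helper, delete_model stands in for whatever performs the actual deletion (for example the ModelFullDeleter logic), and the prompt function is passed in as ask (defaulting to the built-in input()) so it can be replaced in tests. If the CLI is built on click, click.confirm() would be a natural replacement for the raw input() call.

```python
def confirm_cleanup(stale, delete_model, ask=input):
    """List stale models, ask once, and delete them only if the user agrees.

    delete_model and ask are injected so this hypothetical helper does not
    depend on Ersilia internals and can be exercised without a terminal.
    Returns the list of model ids that were deleted.
    """
    if not stale:
        return []  # nothing to ask about
    print("Models not used for 30 or more days:")
    for model_id in stale:
        print(f"  - {model_id}")
    answer = ask("Delete all of them? [y/N] ").strip().lower()
    if answer not in ("y", "yes"):
        return []  # the default answer keeps every model
    for model_id in stale:
        delete_model(model_id)
    return list(stale)
```

For example, confirm_cleanup(["model-a"], deleted.append, ask=lambda _: "y") would print the list, pass model-a to deleted.append and return ["model-a"]; any answer other than "y" or "yes" leaves everything in place.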

Contributor Author

Amna-28 commented Apr 4, 2022

Suggest at least one missing feature in the CLI:

Improve the catalog by including the model size in the "ersilia catalog" output
A column showing the size of each model would give the user an idea of how long the download will take and how much disk space is required.
An additional option could be added to the catalog command, such as "ersilia catalog --size", that would print the catalog differently (including the size information).
I have opened the following issue as an enhancement, and was assigned to work on it
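As a rough sketch of how the proposed size column could be computed, the snippet below sums the sizes of all files under a model's directory and formats the result with binary units. Which directory holds a fetched model is an assumption left to the caller, and a conda environment or Docker image stored elsewhere would have to be measured separately; directory_size() and human_size() are hypothetical helper names, not part of the ersilia CLI.

```python
import os
import tempfile


def directory_size(path):
    """Total size in bytes of the regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def human_size(num_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {units[-1]}"


if __name__ == "__main__":
    # Build a throwaway "model" directory to show the two helpers together.
    with tempfile.TemporaryDirectory() as model_dir:
        with open(os.path.join(model_dir, "weights.bin"), "wb") as fh:
            fh.write(b"\0" * 1536)
        print(human_size(directory_size(model_dir)))  # 1.5 KiB
```

A --size option could call directory_size() once per locally fetched model and show human_size() of the result as an extra column.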

Contributor Author

Amna-28 commented Apr 4, 2022

Running predictions with ersilia as a Python library in a Jupyter notebook:

Contributor Author

Amna-28 commented Apr 4, 2022

Created a simple Streamlit app that takes a chemical structure as input and displays the results of the model and API selected by the user:

Calculating the molecular weight of Aspirin

The antibiotic activity prediction

@GemmaTuron
Member

Hi @Amna-28, this looks great!
See my suggestion under issue #126.

Contributor Author

Amna-28 commented Apr 5, 2022

Add a comment in this issue with your new idea, tagging the mentor

Suggested enhancement:
Developed a Streamlit app for the Ersilia Model Hub, for a better user experience
In this contribution, an Ersilia Model app was developed. It improves the usability of the Ersilia Model Hub: the user can select a model to run, choose an API, enter the input molecule in the text box, run the predictions or calculations of the chosen API, and get the results. The app also shows a drawing of the input molecule's structure.
In particular, I developed the following functionality:

  • The app is built with the Python Streamlit framework.
  • A drop-down list lets the user choose a model to run.
  • To list the models available on the local machine, the local() method of the ModelCatalog class is used, so the user can choose from all the models present locally.
  • Once the user selects a model, the app shows a button for each API supported by that model. This is done with the get_apis() method of the ErsiliaModel class.
  • The structure of the input molecule is drawn with the Draw.MolToImage() function from the RDKit library.

Future Enhancements/Suggestions:

  1. The app has a multi-page view with three tabs that can be toggled using radio buttons. This could be extended further to display more information about the Ersilia app.
  2. A page/tab such as "Manage Ersilia Models" could be added to the app, with buttons to fetch new models to the local machine, delete a particular existing model, or run a cleanup that deletes all unused models.

Contributor Author

Amna-28 commented Apr 5, 2022

Why am I interested in this project?

I am a data science and machine learning enthusiast with a master's degree in computer science. I have worked with different machine learning algorithms and tools. As this project involves machine learning and Python programming, which I am very passionate about, I would really like to be part of something that helps people and institutions where ML and bioinformatics expertise is difficult to obtain.

Contributor Author

Amna-28 commented Apr 13, 2022

Select one model from the list and write a technical card

Hi @GemmaTuron @miquelduranfrigola, this is my first time writing a technical card for an ML model. I followed this article to write it. Please let me know if there is anything I can improve. Thanks!

@mahamtariq58

Hi @Amna-28 , you have done an amazing job and the approach you used for tracking of unused models is quite good. Also, the streamlit app looks awesome.

Contributor Author

Amna-28 commented Apr 19, 2022

Thank you very much @mahamtariq58 for your feedback


Nawarrr commented Apr 19, 2022

Great model card; it seems to cover all the important points about the model.
Just remember to add the references.

Contributor Author

Amna-28 commented Apr 20, 2022

Thank you @Nawarrr for your feedback!

Contributor Author

Amna-28 commented Apr 20, 2022

Write a docstring for the ErsiliaModel class


class ErsiliaModel(ErsiliaBase):
    """Main Ersilia model class.

    Used to perform operations on a model, such as serving it and running
    predictions with it.

    Attributes:
        model (str): Slug of the model.
        save_to_lake (bool): Whether to cache the model's predictions in the lake.
        service_class (str): Name of the service class used to serve the model.
        config_json (str): Path to the JSON configuration file.
        credentials_json (str): Path to the JSON credentials file.
        verbose (bool): Whether to enable verbose output.
        fetch_if_not_available (bool): Whether to fetch the model automatically
            if it is not available locally.
    """

Contributor Author

Amna-28 commented Apr 20, 2022

Search the scientific literature and suggest 3 new models that would be relevant to incorporate in the Hub

  1. Next-Gen QSAR Models with MolPMoFiT
    Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT

  2. Mycobacterium tuberculosis In Vitro Activity Prediction and Target Visualization
    Machine Learning Models for Mycobacterium tuberculosis In Vitro Activity: Prediction and Target Visualization

  3. Novel MAO-B Hit Inhibitors Using Multidimensional Molecular Modeling for Prediction of Therapeutic Activity, Pharmacokinetic and Toxicity Properties
    Proposing Novel MAO-B Hit Inhibitors Using Multidimensional Molecular Modeling Approaches and Application of Binary QSAR Models for Prediction of Their Therapeutic Activity, Pharmacokinetic and Toxicity Properties

Amna-28 closed this as completed Apr 20, 2022
gitbook-com bot pushed a commit that referenced this issue Jul 11, 2022