[Internship Project]: Hellen Namulinda #714

Closed
5 tasks
GemmaTuron opened this issue Jun 21, 2023 · 58 comments

@GemmaTuron
Member

GemmaTuron commented Jun 21, 2023

Summary

Hello,

This is a public issue for a virtual daily stand-up. We will use this to briefly share the tasks of the day and the challenges and advances made, so that we can ensure smooth support from the Ersilia mentors and alignment between daily tasks and overall internship goals.

Scope

Initiative 🐋

Objective(s)

Internship goals:

  • Learn how to work in an Open Source Community
  • Improve the infrastructure of the Ersilia Model Hub
  • Improve the usability and platform compatibility of Ersilia Models, mainly via Docker Images
  • Learn about AI/ML models for drug discovery
  • Identify new relevant AI/ML models and datasets and add them to this discussion
  • Incorporate new AI/ML models in the Hub
  • Improve the documentation of the Ersilia Model Hub for users
  • Improve the documentation of the Ersilia Model Hub for developers

Team

Role & Responsibility Username(s)
Intern @HellenNamulinda
Mentor @GemmaTuron
Coordinator @GemmaTuron

Timeline

Before starting your work, line up a few tasks with a short description of each. This should not take long. For example, it could be something like:
Wednesday 21st June

  • Create Intern daily task tracker
  • Update the Ersilia GitHub Project
  • Make sure Git Actions start upon success of the other workflows
  • Work on the documentation for model testing
  • Attend the Drug Discovery intro by WCAIR

Documentation

No response

@GemmaTuron
Member Author

In this example, following the tasks of Wednesday 21st June:

  • Create Intern daily task tracker
  • Update the Ersilia GitHub Project
  • Make sure Git Actions start upon success of the other workflows
  • Work on the documentation for model testing
  • Attend the Drug Discovery intro by WCAIR

I have created all the templates for the interns and spent a couple of hours revising the GitHub Project and updating the tasks. I have been working on identifying the bug in the GitHub Actions, solved in this issue.
I have set a meeting with @miquelduranfrigola to discuss in detail his comments on the Model testing discussion, but I haven't been able to start writing yet. I aim to do so by the end of the week.
Looking forward to the WCAIR next lesson!

Pro tip: adding the links to the issues and discussions you mention will be very helpful!

@HellenNamulinda
Contributor

HellenNamulinda commented Jun 22, 2023

Wednesday 21st June: Tasks

Model eos31ve, whose changes were merged, was tested, and feedback shows it is working. Its arm64 build failed, but the amd64 build was successful.
Docker images are quite big (4 GB+), and my limited internet hindered the testing.

@HellenNamulinda
Contributor

HellenNamulinda commented Jun 22, 2023

Thursday 22nd June: Tasks

I had slow internet issues and could not fetch the models pending testing. But I'm working around my network and will have these tested by the end of the weekend.

@HellenNamulinda
Contributor

HellenNamulinda commented Jun 24, 2023

Friday 23rd June: Tasks

I'm working to resolve the pending tasks by the end of the weekend.
I tested models eos9tyg and eos2r5a, which were initially not allowing me to pass an output file. Both worked with output files.
Model eos526j and model eos7pw8 work well using Google Colab, but they gave null outputs when using CLI and Docker.

For model eos65rt, I was able to resolve the package dependency conflicts and created a PR.

@HellenNamulinda
Contributor

HellenNamulinda commented Jun 26, 2023

Monday 26th June: Tasks

Model CleanUp :

Troubleshooting

Testing model eos2re5 on Colab failed because it required shells to be activated. On the CLI, the sudo commands required root access, and even after I provided the password, fetching the model took too long.

The first PR I made for refactoring model eos65rt didn't capture all the commits that update the API, so it failed when tested with ersilia run. I have now pushed the changes that update the API to run.

Currently, Ersilia crashes when given invalid SMILES (wrong inputs). Instead of continuing to predict for the valid ones, the execution halts, because the exception isn't handled correctly.
Since most models have no way of running predictions for only the valid SMILES, this has to be handled by Ersilia, so that only valid SMILES are passed at inference time. For invalid inputs, the prediction value should be a message such as "Invalid smile".
I set up a VM to test proposed changes without affecting my normal Ersilia environment.
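
The input handling proposed above could look roughly like the sketch below. This is not Ersilia's actual implementation: the names `predict_with_invalid_handling`, `is_valid`, `predict`, and the toy helpers are hypothetical, and the placeholder text is an assumption. In practice the validity check could be backed by RDKit's `Chem.MolFromSmiles`, which returns `None` for input it cannot parse.

```python
from typing import Callable, List, Sequence, Union

INVALID_MESSAGE = "Invalid SMILES"  # placeholder text, an assumption


def predict_with_invalid_handling(
    smiles_list: Sequence[str],
    is_valid: Callable[[str], bool],
    predict: Callable[[List[str]], List[float]],
) -> List[Union[float, str]]:
    """Run `predict` on valid inputs only and keep the original input order."""
    valid_positions = [i for i, smi in enumerate(smiles_list) if is_valid(smi)]
    valid_inputs = [smiles_list[i] for i in valid_positions]

    # The model only ever sees inputs that passed validation.
    predictions = predict(valid_inputs) if valid_inputs else []

    # Start with the placeholder everywhere, then fill in real predictions.
    results: List[Union[float, str]] = [INVALID_MESSAGE] * len(smiles_list)
    for position, value in zip(valid_positions, predictions):
        results[position] = value
    return results


# Toy stand-ins for illustration only (hypothetical validator and model).
def toy_is_valid(smiles: str) -> bool:
    return bool(smiles) and smiles.count("(") == smiles.count(")")


def toy_predict(batch: List[str]) -> List[float]:
    return [float(len(smi)) for smi in batch]
```

With this shape, one bad input no longer halts the batch: the caller gets one result per input, in order, and can see exactly which inputs were rejected.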

@HellenNamulinda
Contributor

HellenNamulinda commented Jun 26, 2023

@GemmaTuron, the pending issue I'm working on is the presentation and solution to

@HellenNamulinda
Contributor

HellenNamulinda commented Jun 27, 2023

Tuesday 27th June: Tasks

I updated model eos2gth and pushed the changes. There were some errors when running workflows, but these were corrected by Miquel because they were originating from ersilia.
On checking model eos24jm, it was up to date because it was incorporated recently. Its images were also available on dockerhub. So, I just tested it and it was functional.
I tested model eos97yu using Colab, CLI and Docker. It was working well.
For model eos7pw8, changes were made but they were not yet reflected on dockerhub. I will test it again.

The proposal for dealing with wrong inputs was accepted. And this feature will be incorporated in ersilia with Miquel leading the coding session on Wednesday, June 28 at 6pm CET.

@HellenNamulinda
Contributor

HellenNamulinda commented Jun 28, 2023

Wednesday 28th June: Tasks

@GemmaTuron
Member Author

@HellenNamulinda
I've added a second model in case you finish all the tasks!

@HellenNamulinda
Contributor

HellenNamulinda commented Jun 29, 2023

Thursday 29th June: Tasks

@GemmaTuron
Member Author

Hi @HellenNamulinda !

See my comment on model es3sa2. I am afraid it is a work-in-progress model that was never finished. I am sorry for this!

@HellenNamulinda
Contributor

HellenNamulinda commented Jun 29, 2023

Hello @GemmaTuron,
Oh yes, I have read the comment. When we are done with cleaning the other models, I will be glad to investigate the model and if possible re-incorporate it.

I will first continue to the next model eos4avb.

Also, when I convert a suggested task to an issue, it unassigns me, for example model eos6hy3. I hope you will be able to handle that.

@GemmaTuron
Member Author

Hi @HellenNamulinda
I've fixed the task to issue thing !

@HellenNamulinda
Contributor

HellenNamulinda commented Jun 30, 2023

Hello Gemma,
Thank you!
Due to power issues today, I'm handling the tasks late. But I will go through all my pending tasks before next week.

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 3, 2023

Hi @GemmaTuron,
Monday 3rd July: Tasks

For model eos4avb, the fix involved installing rdkit with pip instead of conda so that the arm64 build can succeed.
Model eos7asg requires installing java-jre with conda, but this needs a very low version of conda for the installation to work. So, commands were added to first downgrade conda and then update it after installing the packages.

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 4, 2023

Hi @GemmaTuron,
Tuesday 4th July: Tasks

@GemmaTuron
Member Author

Hi @HellenNamulinda !

Good, I've assigned you a new model in case you finish with debugging the eos7asg.

@HellenNamulinda
Contributor

Sure,
I will work on it.

@GemmaTuron
Member Author

Hi @HellenNamulinda

I have assigned you a new model in case you are done with eos2thm, but don't worry if you don't get to it today.

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 5, 2023

Wednesday 5th July: Tasks

I spent time comparing rdkit versions for model eos2thm. I set up an environment with the packages specified in the original repo. This forced me to first downgrade conda in order to install rdkit 2019, which has 200 descriptors (this messes up other packages). I identified the 8 descriptors that were added in the versions with 208 descriptors and are missing in 2019.03.

Comparing the model results, all rdkit versions with 200+ descriptors gave the same output. I have yet to find out why 😮

@GemmaTuron
Member Author

@HellenNamulinda

How are you doing on the tasks today? I think you still have a couple of models for refactoring?

@HellenNamulinda
Contributor

Hi @GemmaTuron,
Yes, I have to finish these two models for refactoring, plus test the one that is pending testing after Emma's changes.
I'm done with eos8a4x and am finalizing local testing.
As for model eos2thm, with the comment I just added, I only need your go-ahead.

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 6, 2023

Hello @GemmaTuron

Thursday 6th July: Tasks

The original model code for eos2thm had a file molfeaturizer.py where the 200 descriptors were explicitly specified. That's why using rdkit versions with 208 descriptors doesn't change the model output.
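
To illustrate why an explicit descriptor list makes the features independent of how many descriptors the library ships, here is a minimal sketch. It does not reproduce molfeaturizer.py: the function `featurize_fixed`, the toy registries, and `FIXED_NAMES` are all hypothetical. With RDKit, a registry of this kind could plausibly be built from `Descriptors.descList`, which holds (name, function) pairs.

```python
from typing import Callable, Dict, List, Sequence


def featurize_fixed(
    molecule: object,
    descriptor_names: Sequence[str],
    registry: Dict[str, Callable[[object], float]],
) -> List[float]:
    """Compute only the named descriptors, in the given order."""
    missing = [name for name in descriptor_names if name not in registry]
    if missing:
        raise KeyError(f"Descriptors not available: {missing}")
    return [registry[name](molecule) for name in descriptor_names]


# Toy registries standing in for an older and a newer library version.
old_registry = {
    "n_chars": lambda m: float(len(m)),
    "n_upper": lambda m: float(sum(c.isupper() for c in m)),
}
# The "newer" version adds a descriptor the fixed list never asks for.
new_registry = dict(old_registry, n_digits=lambda m: float(sum(c.isdigit() for c in m)))

FIXED_NAMES = ["n_chars", "n_upper"]  # analogous to the hard-coded 200 names
```

Because only the named descriptors are requested, extra descriptors in a newer registry are simply never computed, so the feature vector (and hence the model output) stays the same.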

@GemmaTuron
Member Author

Hi @HellenNamulinda !

I think we can safely merge the PR on eos2thm. I've assigned you two new models!

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 7, 2023

Hi @GemmaTuron

Friday 7th July: Tasks

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 11, 2023

Hi @GemmaTuron

Monday 10th July Tasks

Both models tested (eos3ae6 and eos1amr) work well using Colab and Docker. Model eos1amr works well with string inputs on the CLI, but it raises TypeError: object of type 'float' has no len() for file outputs. Model eos3ae6 works well on all three.
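
The root cause of the TypeError is not stated here. One common way this message arises, shown purely as an assumption, is when a missing prediction is represented as a bare float NaN and then passed to code that calls `len()` on it. The helper `as_row` below is hypothetical and only sketches a defensive normalisation step before writing a file.

```python
import math
from typing import Any, List


def as_row(value: Any) -> List[Any]:
    """Normalise one model output into a list of CSV cells."""
    # A missing prediction is often a bare float NaN; calling len() on it
    # raises "TypeError: object of type 'float' has no len()".
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return [""]
    if isinstance(value, (list, tuple)):
        return list(value)
    # Any other scalar becomes a single-cell row.
    return [value]
```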

Done with refactoring model eos4u6p and created a PR.
For model eos7a04, I'm still fixing package version conflicts.

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 12, 2023

Hi @GemmaTuron

Tuesday 11th July Tasks

The two models tested (eos6m4j and eos24ci) work using Colab, CLI and Docker. However, on the CLI, model eos6m4j returns null for some SMILES (both as a single string and as part of a file), although it works well using Colab and Docker.

@GemmaTuron
Member Author

Thanks for the update @HellenNamulinda

Is the issue in eos6m4j, where some molecules cannot be predicted, still there? Also, please confirm that eos7w6n works now.
I'll assign you new models meanwhile.

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 13, 2023

Hi @GemmaTuron

Thursday 13th July Tasks

@GemmaTuron
Member Author

@HellenNamulinda

Great, I will not be assigning new models since you have a few open already, good work.

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 17, 2023

Hello @GemmaTuron, Sorry for the late report. I thought I had added the Friday tasks already.

Friday 14th July Tasks

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 17, 2023

Hi @GemmaTuron

Monday 17th July Tasks

Model eos2lm8 makes predictions using Colab, CLI and Docker.
The only concern is that output values for the same SMILES are not consistent across different runs on the same platform, nor across the three platforms.
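
A check for this kind of run-to-run inconsistency can be sketched as follows. The function name `inconsistent_inputs`, the dictionary layout (SMILES mapped to a single float prediction), and the tolerance are assumptions for illustration, not part of the Ersilia test tooling.

```python
import math
from typing import Dict, List


def inconsistent_inputs(
    run_a: Dict[str, float],
    run_b: Dict[str, float],
    rel_tol: float = 1e-6,
) -> List[str]:
    """Return inputs whose predictions differ between two runs."""
    # Only inputs present in both runs can be compared.
    shared = sorted(set(run_a) & set(run_b))
    return [
        smi for smi in shared
        if not math.isclose(run_a[smi], run_b[smi], rel_tol=rel_tol)
    ]
```

Running the same input file twice on one platform, and once on each of the three platforms, then feeding the outputs to a comparison like this, would separate platform differences from plain non-determinism.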

For model eos9f6t, the major error was an incompatibility between tensorboardX and protobuf. BentoML depends on protobuf<3.19,>=3.8.0, but installing chemprop==1.3.0 pulled in tensorboardX 2.6.1 because no compatible version was pinned. This caused the dependency error tensorboardx 2.6.1 requires protobuf>=4.22.3, but you have protobuf 3.18.3 which is incompatible. So, I pinned tensorboardX==2.0, because it is compatible with chemprop 1.3 and protobuf 3.18.3.
The only remaining issue with the model is that, while run.sh returns consistent values, its outputs for the same SMILES are not consistent across runs when it is served in Ersilia.
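
The conflict can be made concrete with a small version check. The constraints below are the ones quoted in this comment; the helpers `parse_version` and `satisfies` are hypothetical and deliberately simplified (plain numeric versions only). A real project would more likely rely on `packaging.specifiers.SpecifierSet`.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.18.3' into (3, 18, 3); handles plain numeric versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, specifier: str) -> bool:
    """Check a version against comma-separated <, <=, >, >=, == clauses."""
    current = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for op in ("<=", ">=", "==", "<", ">"):  # two-character operators first
            if clause.startswith(op):
                bound = parse_version(clause[len(op):])
                ok = {
                    "<=": current <= bound,
                    ">=": current >= bound,
                    "==": current == bound,
                    "<": current < bound,
                    ">": current > bound,
                }[op]
                if not ok:
                    return False
                break
        else:
            raise ValueError(f"Unsupported clause: {clause!r}")
    return True


BENTOML_PROTOBUF = "<3.19,>=3.8.0"   # constraint quoted in the comment
TENSORBOARDX_261_PROTOBUF = ">=4.22.3"  # from the quoted error message
```

No protobuf version can satisfy both constraints at once, which is why pinning an older tensorboardX is the way out rather than changing protobuf.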

With model eos69p9, it works well locally and output values are consistent.

Model eos43at uses rdkit 2019.3.3, which has 200 descriptors; rdkit 2020+ versions don't have the same number. I'm still working on the best way to install it (it is outdated for new conda versions and not available on PyPI) while maintaining the model output. I'm testing an approach that downloads the files using wget instead of downgrading conda.

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 18, 2023

Tuesday 18th July: Tasks

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 20, 2023

Thursday 20th July: Tasks

When testing eos7pwe, the error persisted when fetching from S3/github and repo_path(locally).

ERROR: The certificate of ‘anaconda.org’ has expired.
#8 ERROR: process "/bin/sh -c wget https://anaconda.org/LICH/syba/1.0.2.alpha/download/noarch/syba-1.0.2.alpha-py_0.tar.bz2" did not complete successfully: exit code: 5

I will try to explore more to come up with a solution.

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 21, 2023

Hi @GemmaTuron,

Friday 21st July: Tasks

I spent some time resolving ERROR: The certificate of ‘anaconda.org’ has expired. with model eos7pw8 when using the CLI to fetch from S3, GitHub and repo_path (locally). The model was using Mode: docker when installing packages, so I configured the default to conda. All details are explained in the comment. Another issue with the model was TypeError: object of type 'float' has no len() for csv file outputs. More explanation on how it was resolved here

Also, to extract assays from the ChEMBL database that will be used to test the EnsembleTabPFN package, I successfully set up the chembl_ml_tools package, including a postgres database server containing the ChEMBL database. I installed the latest ChEMBL database.

I will now explore model eos43at more by testing it on Codespaces.
Plus refactoring model eos6fza

@GemmaTuron
Member Author

@HellenNamulinda

Thanks, I won't assign new tasks so you can focus on the current ones!

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 25, 2023

25th July: Tasks

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 26, 2023

26th July: Tasks

@HellenNamulinda
Contributor

HellenNamulinda commented Jul 28, 2023

27th July: Tasks

  • Attended the weekly lab meeting
  • Started work on categorizing the assays extracted from the ChEMBL database based on their standard types, such as IC50, MIC, Ki, ...

@GemmaTuron
Member Author

@HellenNamulinda

I think you still have quite a few models pending refactoring, which is probably my fault for assigning too many at once. Since you are also busy with the ChEMBL data, would you tell me which models you have not started to work on, so I can re-assign them and free up some of your tasks?

@HellenNamulinda
Contributor

Hi @GemmaTuron,
It's my fault for not updating the issues for some days now. Apologies.
I will make sure to complete all the pending tasks by the end of the week.

@HellenNamulinda
Contributor

Tuesday 1st August: Tasks

@HellenNamulinda
Contributor

HellenNamulinda commented Aug 2, 2023

Wednesday 2nd August: Tasks

  • Attend the weekly lab meeting

Model Testing:

Model Refactoring

@HellenNamulinda
Contributor

HellenNamulinda commented Aug 3, 2023

Thursday 3rd August: Tasks

@GemmaTuron
Member Author

@HellenNamulinda

Are you working on any other model aside from the ChEMBL data?

@HellenNamulinda
Contributor

@GemmaTuron,
There is no other model.

@GemmaTuron
Member Author

Perfect, let's use today's meeting to focus on the ChEMBL data then

@HellenNamulinda
Contributor

HellenNamulinda commented Aug 9, 2023

Tuesday 8th August: Tasks

@GemmaTuron
Member Author

Hi @HellenNamulinda

In addition to the model testing and working on model eos96ia to help Riley, please:

  • Try out the test module developed by Riley and Febie and provide feedback in the Slack thread, specifying which model you tested and whether you have any further suggestions for improvement.
  • Let me know when you want me to revise the ChEMBL data

@HellenNamulinda
Contributor

HellenNamulinda commented Aug 9, 2023

Hello @GemmaTuron,
I'm going to work on these.
Model eos96ia is one of the molgrad models and will therefore have the same Dockerfile as eos43at. I'm testing this locally and will have a proper discussion about these molgrad models with Riley (eos96ia), Febie (eos1af5) and Zakia (eos6ao8).

Thank you!

@HellenNamulinda
Contributor

HellenNamulinda commented Aug 10, 2023

Thursday 10th August: Tasks

@HellenNamulinda
Contributor

HellenNamulinda commented Aug 13, 2023

Friday 11th August: Tasks

  • Testing the Ersilia test module: This was done on two models, eos43at (eos43at_test.txt) and eos6fza (eos6fza_test.txt).
  • ChEMBL data cleaning. Previously, I had generated summary files for all the assays (pathogens of interest). For each pathogen's assays, the summary file shows the different standard types, the total number of molecules for each type, a list of standard units for each type, and the number of molecules for each unit.
    I started with cleaning the Enterococcus faecium assays, merging some standard types into one, and converting all units to one standard unit. For example, EC50: Merging EC50 with ED50 and converting the different units to one, IC50: Merging IC50 with IC80, and MBC: Merging MBC, MBEC, MBC90, MBC99.9, MBIC & MBIC90
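
The merging and unit conversion described above can be sketched as a lookup-based harmonisation step. The merge table mirrors the examples given in this comment; the function name `harmonise`, the choice of micromolar as the target unit, and the conversion factors are assumptions for illustration and do not reflect the actual cleaning scripts.

```python
from typing import Dict, Tuple

# Merges named in the report above: each key is folded into its value.
STANDARD_TYPE_MERGES: Dict[str, str] = {
    "ED50": "EC50",
    "IC80": "IC50",
    "MBEC": "MBC",
    "MBC90": "MBC",
    "MBC99.9": "MBC",
    "MBIC": "MBC",
    "MBIC90": "MBC",
}

# Assumed conversion factors to micromolar; illustrative only.
TO_MICROMOLAR: Dict[str, float] = {"nM": 1e-3, "uM": 1.0, "mM": 1e3}


def harmonise(standard_type: str, value: float, unit: str) -> Tuple[str, float, str]:
    """Map a record to its merged standard type and a single unit."""
    # Types without a merge rule are kept as they are.
    merged_type = STANDARD_TYPE_MERGES.get(standard_type, standard_type)
    if unit not in TO_MICROMOLAR:
        # Fail loudly instead of silently mixing incompatible units.
        raise ValueError(f"No conversion defined for unit {unit!r}")
    return merged_type, value * TO_MICROMOLAR[unit], "uM"
```

Keeping the merge rules in a plain table makes them easy to review per pathogen, and raising on unknown units avoids mixing, for example, molar and mass-based concentrations by accident.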

@GemmaTuron
Member Author

perfect @HellenNamulinda

Let's try to meet on Monday to discuss the ChEMBL data!

@HellenNamulinda
Contributor

HellenNamulinda commented Aug 15, 2023

Monday 14th & Tuesday 15th, August: Tasks

  • Meeting with my mentor, Gemma, to discuss the next steps for the ChEMBL data.
  • Prepare ML datasets from the ChEMBL data using the example automated pipeline, antimicrobial-ml-tasks
  • Attend weekly team meeting

@HellenNamulinda
Contributor

HellenNamulinda commented Aug 16, 2023

Wednesday 16th, August: Tasks

@HellenNamulinda
Contributor

HellenNamulinda commented Aug 17, 2023

Thursday 17th August: Tasks

@GemmaTuron
Member Author

Hi @HellenNamulinda !

It was great to work with you. Thanks so much for your contributions to Ersilia; we hope you learned and enjoyed it as much as we did! Please remain engaged with the community and feel free to open any issues or contribute to open ones :)
I'll now close this issue as completed !
