Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Input Data Types #603

Closed
Femme-js opened this issue Feb 16, 2023 · 13 comments
Closed

Add new Input Data Types #603

Femme-js opened this issue Feb 16, 2023 · 13 comments
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@Femme-js
Copy link
Contributor

Ersilia currently takes three kinds of Input Type : Compound, Protein and Text.

The repository require the IO class for text input data format as text.py file.

This issue is to follow the addition of same file.

@Femme-js
Copy link
Contributor Author

Hi @miquelduranfrigola !

Last week I got the following error while incorporating the IO class for text input type and testing on the #eos1086 model.
eos10861.log
The same error I received while testing eos4cxk. However, I was it might be the problem with particular network connection.

The latest logs show the error with 'input_shape'.
eos10862.log

    def _get_from_model(self, model_id):
        input_type = self._read_input_from_card(model_id)
        if input_type is None:
            input_type = "naive"
        input_shape = self.shape(model_id)
        self.logger.debug("Input type is: {0}".format(input_type))
        self.logger.debug("Input shape is: {0}".format(input_shape.name))
        module = ".types.{0}".format(input_type)
        self.logger.debug("Importing module: {0}".format(module))
        return importlib.import_module(module, package="ersilia.io").IO(
            input_shape=input_shape
        )

This is the function from the input.py file. I am slightly confused about how to define the input shape for the text data type.

@GemmaTuron
Copy link
Member

Hello @Femme-js ,

As we discussed in previous meetings, the Hub needed a new class for "text" inputs, which would work for your model and also @carcablop model on IUPAC name translation (and future models).
In your last meeting with @miquelduranfrigola he showed you the steps to create this new input shape class. Could you please provide an update of the status of adding the new input shape?
Thanks

@Femme-js
Copy link
Contributor Author

@GemmaTuron

Can I make a PR for this, showing the new files I have added?

@GemmaTuron
Copy link
Member

Hi @Femme-js

It is not clear to me from the PR the modifications you have done. Please explain here with more detail so we can follow. @carcablop I am tagging you here because you can probably help as well

@Femme-js
Copy link
Contributor Author

Hi @Femme-js and @carcablop !

Ersilia requires an IO class for each input type, and currently ersilia is not having an IO class for the text input.

It is required to add an IO class inside 'ersilia/ersilia/IO' folder similar to compoun.py.

For that I have added a 'text.py' file inside IO folder, an example text data as text.tsv inside 'IO/examples' folder, and an text.py file for TextIdentifier class inside 'IO/utils/identifiers' folder.

However, I am recieving the same error as described above while testing eos1086 (my model).
eos10862.log

@GemmaTuron GemmaTuron added enhancement New feature or request help wanted Extra attention is needed labels Mar 8, 2023
@GemmaTuron GemmaTuron changed the title IO Class Required for Each Input Type Add new Input Data Types May 22, 2023
@miquelduranfrigola
Copy link
Member

Hi @GemmaTuron and @carcablop what is the current status of this? If we need to tackle the issue, let's organize a quick meeting and strategize. Thanks!

@GemmaTuron
Copy link
Member

Hi @miquelduranfrigola

@carcablop is currently focusing on the GUI, so this task is orphan at the moment

@carcablop
Copy link
Collaborator

Hi @miquelduranfrigola and @GemmaTuron.
I am also focused on these two things (gui and input text). My progress in this class:
I am in the process of creating the IO class to allow text as input, I have based myself a bit on the compound.py example. For this I have created a text.py file inside the ersilia/ersilia/io/types folder. But before proceeding with this class, I have defined another TextIdentifier class to verify that the text is really a 'valid text', in this case I have validated that IUPAC name is valid using the rdkit library. Therefore I have worked on the following:
2. Inside the utils/identifiers/ folder I create a text.py where I create a class called TextIdentifier and define a function that will be def _is_iupac_name() where I check if the text is a valid iupac name, more specifically, using Chem.MolFromIUPAC( text), returning true or false.
Additionally, in the IO class I take into account that my input shape for a text type input is Single. I haven't accounted for other form of input shape like list or pair of list.

@carcablop
Copy link
Collaborator

carcablop commented Jul 6, 2023

Hi @miquelduranfrigola and @GemmaTuron

I have finished with the implementation of this task, and I am testing its functionality with the two models that accept input as text: eos1086 and eos5cc. But I have presented these errors while fetching those models:

Detailed error:
Model API eos1086:run did not produce an output File "/home/carcablop/eos/repository/eos1086/20230706105658_8796F0/eos1086/artifacts/framework/code/main.py", line 56
     dout.to_csv(output_path)
        ^
SyntaxError: invalid syntax

I could tell the problem is here by the square bracket in this line of code:
https://github.com/ersilia-os/eos1086/blob/main/model/framework/code/main.py#L54

  • For the eos5ecc model https://github.com/ersilia-os/eos5ecc, a Readme update is required. The metadata.json was not updated for this model, I think it was a task that was carried out automatically for all models and it was not applied to this one.

A small summary of the implementation.

Ersilia currently receives input of type compound. To accept an input type as text, the following is required

Creation of the IO class to accept models with input type: text

  1. A text.py is created in the path: ersilia-> io -> types -> text.py
  2. The IO object is built, with the initialization parameters depending on the input shape.
  3. A method was built to valid the input text. This method calls the corresponding functions depending on the type of input, if it is iupac_name, name, smile or inchikey, it will treat the type of input and return a dictionary with a key, input, and a text. To do this, do the following:
  4. Following the structure used by ersilia, another file text.py is created in the path, ersilia->utils-> identifiers->text.py.
  5. The TextIdentifier class is created to handle input.
  6. To identify the input type, for example to identify that it is a valid iupac name, NCI tool and PubChem are used to obtain the inchikey. In the same way, if the input is another type of data, smile or inchi key, the input identifier is resolved.
  7. And create two sample input types for the tests. This inside the examples -> text.py folder, with two example of input shape (single and list).

Can we have a meet to expose what I have done? Thank you.

@GemmaTuron
Copy link
Member

Hi @carcablop

This is great, let's find a moment to meet, we'll sync via Slack!

@GemmaTuron
Copy link
Member

The Hub now accepts text input in the format of biomedical text!
We still need to work on other types of text, such as molecules described

@GemmaTuron
Copy link
Member

@miquelduranfrigola

Shall we close this issue? If we want to specify other data types such as described molecules I'd open a specific issue for it

@GemmaTuron GemmaTuron self-assigned this Dec 3, 2023
@GemmaTuron
Copy link
Member

We will close this issue and focus on new input data types one by one as we require them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants