Convert gene_IDs_names to toml #17

Closed
manulera opened this issue Jun 24, 2022 · 4 comments


manulera commented Jun 24, 2022

Create a script get_data/convert_genes2toml.py that writes the file data/genes.toml from data/gene_IDs_names.tsv, the same way that get_data/convert_alleles2toml.py writes data/alleles_pombemine.tsv.

The fields are explained in the readme.

For a row in the file, such as:

SPAC1250.01	snf21	SPAC29A4.21,brg1

The output should be

# Note that the gene ID is double-quoted here because it contains a dot; otherwise toml would
# interpret the dot as a separator for a nested table. You don't have to handle this in Python: you can use
# 'SPAC1250.01' as a dictionary key, and it will know to format it like this for the output.
[gene."SPAC1250.01"]
ref = "SPAC1250.01"
# "name" will be empty sometimes. Not all genes have a preferred name.
# This should be taken into account when replacing gene names
name = "snf21"
# This field will be missing sometimes (not all genes have synonyms).
synonyms = ["SPAC29A4.21", "brg1"]
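
A minimal sketch of how one row could be turned into such an entry, assuming the columns are tab-separated in the order systematic ID, name, synonyms. The names row_to_entry and genes_to_toml are hypothetical and are not taken from the actual script; the dump step assumes the third-party toml package is installed.

```python
def row_to_entry(line):
    """Turn one TSV row into (systematic_id, entry dict).

    Assumed column order: systematic ID, name, comma-separated synonyms.
    """
    ls = line.rstrip("\r\n").split("\t")
    entry = {"ref": ls[0]}  # "ref" is always present
    # Only add "name" if the second column exists and is not empty
    if len(ls) > 1 and ls[1] != "":
        entry["name"] = ls[1]
    # Only add "synonyms" if the third column exists and is not empty
    if len(ls) > 2 and ls[2] != "":
        entry["synonyms"] = ls[2].split(",")
    return ls[0], entry


def genes_to_toml(lines):
    """Hypothetical helper: build the nested dict and dump it as TOML text."""
    import toml  # third-party package, assumed to be installed

    genes = dict(row_to_entry(line) for line in lines if line.strip())
    return toml.dumps({"gene": genes})
```

Keeping the row parsing separate from the dump step makes the rules for optional fields easy to check without writing any file.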

manulera commented Jun 24, 2022

Hi @anamika-yadav99

I had a look at the first implementation. There are some issues with it, I have added some comments:

8af4e1e#r76881074

Some more details: if a gene has no synonyms (the third column is empty), its entry should not have the synonyms attribute in the toml output.

[gene."SPNCRNA.4639"]
name = "SPNCRNA.4639" # name should not be added if 2nd column is empty
ref = "SPNCRNA.4639" # This should always be here
synonyms = [ "SPNCRNA.4639",] # synonyms should not be added if 3rd column is empty

anamika-yadav99 commented

@manulera I think this issue is resolved. The final commit that solves it: 876c380

manulera commented

Good job @anamika-yadav99 ! The function does what it should do, just a few details on the code style:

if len(ls) > 1:
    if ls[1] != '':

When checking two conditions, use and instead of two nested if statements. With and, the first condition is evaluated first, and the second is only evaluated if the first is true. So when you write if len(ls) > 1 and ls[1] != '': the expression ls[1] is never evaluated when ls has only one element, and no IndexError is raised. The same applies to line 19.
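
A small illustration of the short-circuit behaviour described above. The function name has_name is made up for this example and does not appear in the script.

```python
def has_name(ls):
    # ls[1] is only evaluated when len(ls) > 1 is true,
    # so a one-element list does not raise IndexError
    return len(ls) > 1 and ls[1] != ''


print(has_name(['SPNCRNA.4639']))          # row with a single column
print(has_name(['SPNCRNA.4639', '']))      # second column present but empty
print(has_name(['SPAC1250.01', 'snf21']))  # second column filled in
```

Swapping the two operands would break this: ls[1] != '' and len(ls) > 1 evaluates ls[1] first and fails on a one-element list.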

synonyms = re.split(",", ls[2])

You don't need re here; regular string splitting with ls[2].split(",") works. You can then remove the import of the re module as well.
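
As a quick check, here is a sketch comparing the two approaches on the synonyms field from the example row. The variable names are only for illustration.

```python
import re

field = "SPAC29A4.21,brg1"

with_re = re.split(",", field)  # what the current code does
with_str = field.split(",")     # plain string method, no import needed

# Both give the same list for a literal separator like ","
same_result = with_re == with_str

# Splitting an empty string still returns one (empty) element,
# which is why the empty third column has to be checked before splitting
empty_split = "".split(",")
```

The re module is only worth importing when the separator is a real pattern, for example a comma followed by optional whitespace.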


manulera commented Jul 1, 2022

Added in #23

manulera closed this as completed Jul 1, 2022