Generalise allele feature substitution from toml files #19

Closed
1 task done
manulera opened this issue Jun 24, 2022 · 3 comments

@manulera
Owner

manulera commented Jun 24, 2022

Do this after #17 so you have the genes in toml file as well.

  • In the script you have written to extract the data from the Dey lab, add a function that:
    • Takes as an input:
      • A list of genotypes.
      • A toml file with allele features (genes, alleles, tags, etc.).
      • A string that will be used to replace the allele features.
    • Replaces the allele features defined in the toml file with the string passed as the third argument (see example below).
  • Remember to replace the longest names or synonyms first, so that kanMx6 gets replaced before kan etc.
  • Remember to make everything lowercase when replacing strings.

The function may look like this:

```python
def replace_allele_features(toml_file, genotypes, word):
    """Replace every allele feature listed in toml_file by word."""
    ...  # implementation goes here

genotypes = ['cls1-36 ase1-GFP:KanMx6']
# Replace all the allele features included in alleles.toml by the string ALLELE
genotypes2 = replace_allele_features('data/alleles.toml', genotypes, 'ALLELE')
# genotypes2: ['ALLELE ase1-GFP:KanMx6']

# Replace all the genes included in genes.toml by the string GENE
genotypes3 = replace_allele_features('data/genes.toml', genotypes2, 'GENE')
# genotypes3: ['ALLELE GENE-GFP:KanMx6']
```

Then try the code for alleles and genes, as in the example above.
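A minimal sketch of one way to write replace_allele_features. It assumes each TOML file has one top-level key (the feature type) mapping feature keys to tables with a name string, a synonyms list and a ref field; those field names, and the helper names load_feature_names and replace_names, are hypothetical. It collects feature keys, names and synonyms (never ref), then builds a single case-insensitive regular expression with the longest names first, so kanMx6 is matched before kan, and applies it to every genotype in the list. tomllib is in the standard library from Python 3.11 and is imported inside the loader only.

```python
import re


def load_feature_names(toml_file):
    """Return every string identifying a feature in toml_file.

    Assumed (hypothetical) layout:
        [genes."SPAC1002.01"]
        name = "..."
        synonyms = ["..."]
        ref = "https://..."
    """
    import tomllib  # standard library since Python 3.11

    with open(toml_file, "rb") as fh:  # tomllib needs a binary file
        feature_types_dict = tomllib.load(fh)
    names = set()
    # Top level: feature type (e.g. "alleles") -> dict of features of that type.
    for features_dict in feature_types_dict.values():
        for feature_key, feature in features_dict.items():
            names.add(feature_key)  # the key itself, e.g. a systematic gene id
            if feature.get("name"):
                names.add(feature["name"])
            names.update(feature.get("synonyms", []))
            # "ref" is deliberately ignored: it sometimes holds a URL.
    return names


def replace_names(genotypes, names, word):
    """Replace any of names in each genotype by word, longest names first."""
    # Lowercase the names (matching ignores case); longest first so that
    # "kanmx6" is tried before "kan" at the same position.
    ordered = sorted({n.lower() for n in names if n}, key=lambda n: (-len(n), n))
    if not ordered:
        return list(genotypes)
    pattern = re.compile("|".join(re.escape(n) for n in ordered), re.IGNORECASE)
    # One pass per genotype, so the inserted word is never matched again.
    return [pattern.sub(lambda match: word, genotype) for genotype in genotypes]


def replace_allele_features(toml_file, genotypes, word):
    return replace_names(genotypes, load_feature_names(toml_file), word)
```

Matching ignores case rather than lowercasing the genotypes, so untouched parts such as KanMx6 keep their spelling, as in the expected outputs of the example; lowercasing both sides before replacing is the simpler alternative if the output case does not matter.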

@anamika-yadav99
Collaborator

42f3e07
@manulera Can you have a look at this?

@manulera
Owner Author

Hello @anamika-yadav99, good job, that looks much better.

There are still some changes to make so that it does what it's meant to do; some of them were mentioned in the call today. You can try these two inputs, which will give you errors. Remember not to use the ref field, since it sometimes contains a URL:

```python
genotypes = ['SPAC1002.01']  # Gene ids are not replaced by GENE
genotypes = ['cls1-36 ase1-GFP:KanMx6', 'SPAC1002.01']  # Does not work for lists with len>1 (see indentation line 38)
```
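The code at line 38 is not shown here, so this is only a guess at the cause: a return statement indented inside the for loop makes a function exit after the first genotype, which matches the symptom. The two functions below are hypothetical and show the symptom and the fix.

```python
def replace_in_all_buggy(genotypes, name, word):
    """Hypothetical: the return is indented one level too deep."""
    result = []
    for genotype in genotypes:
        result.append(genotype.replace(name, word))
        return result  # inside the loop: exits after the first genotype


def replace_in_all(genotypes, name, word):
    """Hypothetical: same loop with the return dedented."""
    result = []
    for genotype in genotypes:
        result.append(genotype.replace(name, word))
    return result  # after the loop: every genotype has been processed


genotypes = ['cls1-36 ase1-GFP:KanMx6', 'SPAC1002.01']
print(replace_in_all_buggy(genotypes, 'SPAC1002.01', 'GENE'))  # ['cls1-36 ase1-GFP:KanMx6']
print(replace_in_all(genotypes, 'SPAC1002.01', 'GENE'))  # ['cls1-36 ase1-GFP:KanMx6', 'GENE']
```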

Also, you are not addressing this:

Remember to replace the longest names or synonyms first, so that kanMx6 gets replaced before kan etc.
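Why the order matters, shown with a small hypothetical helper that applies str.replace once per name to an already lowercased genotype: if kan is replaced first, the longer tag kanMx6 can no longer be found and a fragment is left behind.

```python
def replace_in_order(text, names, word):
    """Hypothetical helper: apply str.replace once per name, in the given order."""
    for name in names:
        text = text.replace(name, word)
    return text


genotype = 'ase1-gfp:kanmx6'  # already lowercased
names = ['kan', 'kanmx6']

# Shortest first: "kan" eats the start of "kanmx6", leaving "mx6" behind.
print(replace_in_order(genotype, names, 'TAG'))  # ase1-gfp:TAGmx6

# Longest first: the whole tag is replaced.
longest_first = sorted(names, key=len, reverse=True)
print(replace_in_order(genotype, longest_first, 'TAG'))  # ase1-gfp:TAG
```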

Some comments to improve readability of the code:

  • The function feature_dict is a bit hard to follow. Here are some tips to improve it:

    • You don't need the function feature_name: it gives the same result as list(f.keys())[0] (dict keys cannot be indexed in Python 3, so f.keys()[0] raises a TypeError), and it reads the whole toml file again. You can add a comment above the line where you do this to explain why you do it this way (so far, there is only one key for the whole file, which is shared by all the features listed in the toml file).
    • Give variables meaningful names:
      • In f the keys are the "feature types" (alleles, tags, etc.), and the value is a dictionary of features of that type. You can name this variable feature_types_dict, for example.
      • f[list(f.keys())[0]] is a dictionary of features, so you can assign it to a variable features_dict, which reads better than f[toml_key_name].
      • toml_key represents the key of a feature in the list, so you can call it feature_key.
      • names is a single string value, so it should be singular: name.
      • synonym is a list of strings, so it should be plural: synonyms.
    • You don't need to call keys() to iterate over the keys of a dictionary, since that is the default: for key in dict is the same as for key in dict.keys().
    • Use consistent syntax to assign values to a dictionary:
  • Comments to improve replace_allele_features:

  • Remove unused dependencies
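One possible shape for feature_dict after applying these suggestions. Its input and return value are assumptions, since the original function is not shown: this hypothetical build_feature_dict takes the dictionary produced by a single tomllib.load call and returns, for each feature key, the list of names and synonyms that identify it. It takes the single top-level key with next(iter(...)) instead of f.keys()[0].

```python
def build_feature_dict(feature_types_dict):
    """Hypothetical refactor of feature_dict.

    feature_types_dict: the dict from one tomllib.load call, mapping a
    feature type (e.g. "alleles") to a dict of features of that type.
    Returns {feature_key: [name, *synonyms]}.
    """
    # So far each TOML file has a single top-level key, shared by all the
    # features it lists. dict keys cannot be indexed in Python 3
    # (f.keys()[0] raises TypeError), so take the first key with next(iter()).
    feature_type = next(iter(feature_types_dict))
    features_dict = feature_types_dict[feature_type]

    result = {}
    for feature_key in features_dict:  # iterating a dict yields its keys
        feature = features_dict[feature_key]
        name = feature.get("name", "")  # a single string
        synonyms = feature.get("synonyms", [])  # a list of strings
        # The "ref" field is ignored; every assignment uses the d[key] = value form.
        result[feature_key] = [name] + synonyms if name else list(synonyms)
    return result
```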

@manulera
Owner Author

manulera commented Jul 1, 2022

Added in #23

manulera closed this as completed Jul 1, 2022