Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nltk trees manu #39

Merged
merged 15 commits into from
Sep 15, 2022
Merged

Nltk trees manu #39

merged 15 commits into from
Sep 15, 2022

Conversation

manulera
Copy link
Owner

All my changes to be merged into your branch

@anamika-yadav99
Copy link
Collaborator

Hi @manulera ! I looked though the code and I ran the code on nbrp strains and it's not working as expected.
Some examples:

  • "leu1-32:pnpg1-npg1-gfp-tadh1-ura4+": "(ROOT (ALLELE leu1-32) (- :) (PROMOTER_GENE (other p) (GENE npg1)) (- -) (GENE npg1) (- -) (TAG gfp) (- -) (PROMOTER_GENE (other t) (GENE adh1)) (- -) (ALLELE ura4+))"
  • "ade6-m210<<ade6+:mfm1-k34stop": "(ROOT (ALLELE ade6-m210) (- <<) (ALLELE_AA_SUBSTITUTION (GENE ade6) (other +)) (- :) (ALLELE_AA_SUBSTITUTION (GENE mfm1) (- -) (other k34s)) (other top))",

The reason why this is happening is because these false patterns do follow the chunker rule so they have label of the chunk rule they match to but the value of other tag doesn't match so they are actually false matches. These trees have to be deleted if the value of the other tag doesn't match.

@manulera
Copy link
Owner Author

Hi @anamika-yadav99, I noticed that that pattern fails and picks up wrong things. I thikn it's probably the regex pattern. You can have a look and see if you figure it out. You can debug it more easily here: https://regex101.com/

@manulera
Copy link
Owner Author

When it's ready we can merge into your branch, and merge that into master

@anamika-yadav99
Copy link
Collaborator

Regex works fine. It's the wrong tree which should be removed.

@manulera
Copy link
Owner Author

hhmm you are right. I wonder why though the tree should only be substituted if changes are made, and regex are matched. One error for sure is line 99:

if updated_tree != allele_tree:
# should be
if updated_tree != output_tree:

You can have a look and see if you can fix it. It may have to do with this way of checking whether a match was found (not sure how ParentedTree equality is tested). However, I don't see why the tree should be updated if the regex are not matched: how do we get to either of the options in line 111 when the regex fails?

# Line 111
if outcome == 'match':
    output_tree = updated_tree
elif outcome == 'split parent':

@manulera
Copy link
Owner Author

I can have a look at it myself later, I would suggest to use only the part of that allele that fails and see if you can figure out where it fails.

@anamika-yadav99
Copy link
Collaborator

hhmm you are right. I wonder why though the tree should only be substituted if changes are made, and regex are matched. One error for sure is line 99:

if updated_tree != allele_tree:
# should be
if updated_tree != output_tree:

You can have a look and see if you can fix it. It may have to do with this way of checking whether a match was found (not sure how ParentedTree equality is tested). However, I don't see why the tree should be updated if the regex are not matched: how do we get to either of the options in line 111 when the regex fails?

# Line 111
if outcome == 'match':
    output_tree = updated_tree
elif outcome == 'split parent':

There will be one more condition here if the match is found then sure go ahead and return but if outcome =='no match' then the tree should be deleted and all the leaves will have to be inserted back at the same position.

Line 99 is correct. Updated tree will not be equal to allele tree if parser identifies .

@manulera
Copy link
Owner Author

Hi Anamika,

The line 99 is incorrect as is. allele_tree is the original input, and it never gets updated, the one that updates is the output_tree (incorporates new chunks found with the rules).

There is no need to delete trees, as output_tree should only be updated if the conditions are met.

@manulera manulera merged commit d5a61f3 into nltk_trees Sep 15, 2022
manulera added a commit that referenced this pull request Sep 15, 2022
* WIP:nltk trees

* readme and doc strings

* more addition to readme

* alleles_pattern_nltk.json

* tests updated

* fixed the failing tests

* updating psuedo grammar

* more to tests

* readme updated

* bug fix in nltk_trees

* Nltk trees manu (#39)

* fix fpbase things

* intermediate fix

* simplified version, new grammar, does not handle split

* half way

* added poetry dependencies

* simple version working

* fix tests

* update gitignore

* Ci workflow (#38)

* ci_yaml and docker

* updating ci.yaml

* updating ci.yaml

* dockerfile updated

* fixing ci

Co-authored-by: Anamika Yadav <anamika310.yadav@gmail.com>

* fix ci line

* remove docker action

* make action run at each push

* download tags in CI

* silly mistake CI fixed

* fixed error

Co-authored-by: Anamika Yadav <anamika310.yadav@gmail.com>

Co-authored-by: Manuel Lera Ramirez <manulera14@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants