Expandall stalling. #11

Closed
tsalo opened this issue Jan 19, 2017 · 11 comments

Comments

@tsalo
Member

tsalo commented Jan 19, 2017

I don't know if the regular expressions are getting too long, but I'm trying to run expandall on a large number of text files and it's getting stuck on some of them. @emdupre, before I dig into this, have you encountered it?

I'm guessing that this also applies to findall, but I haven't tested it.

@emdupre
Collaborator

emdupre commented Jan 19, 2017

I haven't seen that, at least on the test texts currently uploaded. Are these full articles you're running against, now?

@tsalo
Member Author

tsalo commented Jan 19, 2017

Actually on both full articles and abstracts. I just checked and the thing that's causing problems right now is a single-letter abbreviation (in this case s). It's not even a true abbreviation. It's an optional pluralization: brain structure(s).

@tsalo
Member Author

tsalo commented Jan 19, 2017

And the stall is coming from utils.replace!

@emdupre
Collaborator

emdupre commented Jan 19, 2017

In findall, is it returning 'structure' as the term? If so, should we set it so that abbreviations must be enclosed in parentheses and preceded by a space?

@tsalo
Member Author

tsalo commented Jan 19, 2017

It's returning structure(s because we have a line that finds ' (' in the full term, which doesn't exist in this case. When the substring isn't found in a string, the find method returns -1, which as a slice bound just cuts off the last character.
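To make the failure concrete, here is a minimal sketch of the `str.find` behavior described above. The variable names are hypothetical and this is not the project's actual code; it only assumes the candidate string is sliced up to the index returned by `find`.

```python
# Hypothetical candidate string with no " (" separator in it.
candidate = "structure(s)"

# str.find returns -1 when the substring is absent (it does not raise).
idx = candidate.find(" (")

# Using -1 as a slice end silently drops only the final character,
# so the closing parenthesis is removed and the rest is kept.
full_term = candidate[:idx]

print(idx)        # -1
print(full_term)  # structure(s
```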

Then, it gets stuck in the while loop in utils.replace. I think we need both an escape for the while loop (to prevent infinite loops) and a check for the space before the open parenthesis before trying to replace throughout the text.

A perhaps 'hack-y' way to do it would be to say that index cannot equal -1.
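One way the check could look, as a rough sketch: refuse to split a candidate unless the ' (' separator is actually present. The function name `split_candidate` and its return convention are assumptions made for illustration, not the library's API.

```python
def split_candidate(candidate):
    """Split 'full term (ABBR)' into (full_term, abbreviation).

    Returns None when the ' (' separator or the closing ')' is missing,
    so callers never slice with the -1 that str.find returns on failure.
    """
    idx = candidate.find(" (")
    if idx == -1 or not candidate.endswith(")"):
        return None
    full_term = candidate[:idx]
    abbreviation = candidate[idx + 2:-1]  # text between " (" and ")"
    return full_term, abbreviation


print(split_candidate("structure(s)"))
print(split_candidate("anterior cingulate cortex (ACC)"))
```

With this guard, an optional pluralization like `structure(s)` is rejected up front instead of being passed on to the replacement step.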

@tsalo
Member Author

tsalo commented Jan 19, 2017

Okay maybe requiring that there be a space is enough. It looks like it fixed it for me. pytest isn't working for me because test_utils.py is empty. How do you run the tests?

@emdupre
Collaborator

emdupre commented Jan 19, 2017

As soon as you pushed the commit, the Travis CI build started. Looks like both versions of Python still pass!

@tsalo
Member Author

tsalo commented Jan 19, 2017

Oooh wow I totally forgot about CI. I need to stop directly committing and start doing PRs from a fork like you do. Anyway, looks like it's solved at the moment.

@tsalo tsalo closed this as completed Jan 19, 2017
@tsalo tsalo reopened this Jan 19, 2017
@tsalo
Member Author

tsalo commented Jan 19, 2017

Yeah so that was one problem causing infinite loops. Another one just came up.

This is definitely a false positive, but the identified abbreviation is X and the full term is XX.
Testing on the string 'XX XX (X) X' causes an infinite loop.

I think it has something to do with keeping track of where to start searching for the full term after replacing it once here. Maybe when the "abbreviation" X is replaced with XX, it's extending past the new start_idx in text and so it finds a new X to replace with XX, etc.
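A sketch of one way to avoid rescanning inserted text, assuming a replacement loop that tracks a start index. This is not the fix that was eventually merged; `replace_abbrev`, `max_iter`, and the use of `\b` word boundaries are all illustrative choices. The search position jumps past each inserted full term, and the iteration cap acts as an escape hatch if the loop ever fails to make progress.

```python
import re


def replace_abbrev(text, abbrev, full_term, max_iter=10000):
    """Replace standalone occurrences of abbrev with full_term."""
    # Word boundaries keep 'X' from matching inside 'XX'.
    pattern = re.compile(r"\b" + re.escape(abbrev) + r"\b")
    start = 0
    for _ in range(max_iter):
        match = pattern.search(text, start)
        if match is None:
            return text
        text = text[:match.start()] + full_term + text[match.end():]
        # Resume after the inserted text so it is never searched again.
        start = match.start() + len(full_term)
    # Escape hatch: fail loudly instead of looping forever.
    raise RuntimeError("replacement did not terminate")


print(replace_abbrev("XX XX (X) X", "X", "XX"))  # XX XX (XX) XX
```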

@tsalo
Member Author

tsalo commented Jan 20, 2017

I think I've managed to deal with the new problem in #12.

@emdupre
Collaborator

emdupre commented Jan 21, 2017

I think it's a reasonable fix, and the builds are still passing. I went ahead and merged #12 and will close this issue unless something else arises. Thanks for catching and fixing that!

@emdupre emdupre closed this as completed Jan 21, 2017