Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix remaining travis warnings #81

Merged
merged 11 commits into from
Jun 25, 2019
Merged

Fix remaining travis warnings #81

merged 11 commits into from
Jun 25, 2019

Conversation

rillian
Copy link
Contributor

@rillian rillian commented Jun 10, 2019

Address the remaining escape sequence warnings, and replace deprecated pep8 with pycodestyle.

Closes #78 .

Raw strings (marked by the `r` prefix) are normally used for
regular expression patterns to allow that module to expand
any backslash-escape sequences without interference from
the Python interpreter's string expansion, and thus the
requirement for extra quoting.

This practice is further encouraged by python 3.5 and later
raising a `DepreciationWarning` when encountering escape sequences
it can't expand, as is common is re patterns.

However, in python 2.7, the `re` module doesn't expand \u and \U
escapes for unicode codepoints, so patterns with non-ascii
characters can only be used in raw strings when using python 3.3
or later, or the external `regex` module, and neither `u` nor `r`
quoted strings will work for patterns in both languages.

However, Python will concatenate whitespace-separated string
literals even if they use different quoting. Take advantage
of this to split the lexer patterns into raw- and unicode-quoted
segments as appropriate.

Also remove some unnecessary escape sequences.
The pep8 name is deprecated, and the tool continues under
the new name.
Now that we're using a newer pytest, we can invoke it as such
even in the jython build, instead of using the older `py.test`.
The parser's  token pattern strings escaped almost all punctuation
characters, but the Python `re` module has a fairly limited set
of special characters which must be backslash-escaped.

In particular, characters like @ ! : and space do not need escapes,
and even more are fine inside a [] environment describing a set
of characters.

Audit the expression strings and remove most unnecessary escapes.
I left some sequences like `[^\^\[]` which aren't ambiguous without
escapes for clarity of reading.
The default osx image on travis doesn't have an up-to-date
set of homebrew packages, so the `brew upgrade` step must
compile many packages from source and takes tens of minutes
to complete. This makes for very slow test feedback and
occasionally failures when the job times out.

Travis itself doesn't provide python3 packages through
the `language` and `python` yaml keys like it does on
Linux. Setting these keys fails trying to download the
requested tarball.

Give up therefore, and just use whatever python3 was installed
on the image by default. Currently that is 3.6.5 from homebrew.
This significantly speeds up test feedback.
@codecov-io
Copy link

codecov-io commented Jun 11, 2019

Codecov Report

Merging #81 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #81   +/-   ##
=======================================
  Coverage   87.24%   87.24%           
=======================================
  Files          27       27           
  Lines         996      996           
=======================================
  Hits          869      869           
  Misses        127      127
Impacted Files Coverage Δ
pyoracc/atf/common/atflex.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c32960b...6959cd9. Read the comment docs.

@rillian
Copy link
Contributor Author

rillian commented Jun 17, 2019

@ageorgou I was just wondering if you've had time to look at this?

Copy link
Contributor

@ageorgou ageorgou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rillian!

This looks fine, with one exception (the missing ?). I'm not confident enough with the lexer to be sure that the intention is preserved, and that the tests are covering all cases.

If I'm not misreading the missing ?, could you possibly add a test to cover that case? (which I'm assuming is not covered)

# --- Rules for paragaph state----------------------------------
# Free text, ended by double new line

terminates_para = \
"(\#|\@[^i][^\{]|\&|\Z|(^[0-9]+[\'\u2019\u2032\u02CA\xb4]?\.))"
r'(#|@[^i][^{]|&|\Z|(^[0-9]+' u'[\'\u2019\u2032\u02CA\xb4]\\.))'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is missing a ?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely a mistake! Thanks for the thorough review. I'll come up with a test to get coverage on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having some trouble coming up with a test for this. If I understand correctly, this pattern tries to match anything which would indicate a new section and thus it's time to leave the paragraph-parsing substate. So we have &, # sigils for new texts, processing directives or comments, empty lines, @ sigils with some exceptions (?) and then something that looks like a line number with an optional prime.

The problem is, I don't really understand what a 'paragraph' is in the context of the parser. I've tried adding and removing lines between line numbers but haven't seen a difference in output. Suggestions what the corresponding atf would look like are welcome! :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the 'para' state only applies to #note lines, which continue until a double-newline or the sigil for another data type. This is unlike #tr lines which must be whitespace-indented to continue over multiple lines.

There is indeed no coverage for this, and silent failures with some of the sample corpus. I think I can write a regression test now.

pyoracc/atf/common/atflex.py Show resolved Hide resolved
The #note environment can be any free text up to a double newline.
Subsequent lines don't need to be intented like with #tr.

It didn't seem like we had text coverage for this feature.
The #note environment can span multiple lines, but is supposed
to be terminated by a new line label. There was a test for
termination by an @object, but not for this.

Check all the primed variants for line numbers to prevent regression
of an issue I introduced addressing issue oracc#78.
Sanitizing the regular expression pattern strings, I accidentally
removed the optional mark ('?') from the collection of prime
characters used to conclude a #note when it doesn't end in a
double newline.

Restore this, and clarify the related comment.
Python 2.7 can't handle the unicode prime values in normal `str`
objects, so construct the generated line label as a unicode object
instead.
@rillian
Copy link
Contributor Author

rillian commented Jun 20, 2019

Ok @ageorgou, ready for follow-up!

@ageorgou
Copy link
Contributor

Thanks @rillian! I'll take a look and also double check with someone who has more ATF experience, so it will be early next week when I can review with confidence.

Copy link
Contributor

@ageorgou ageorgou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a look now and it all seems fine to me. Thanks for the extra test, too. As I said above, I'll wait to check with someone who's more of an expert before approving. I have some minor comments, but those can also be changed by us later on.

brew upgrade python
brew install python3
python3 --version
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I had posted this in an earlier review, oops.

Thanks for noticing that the build was taking a long time because of this!

I've experimented a bit, and we can get a newer environment (Python 3.7.3, pytest 4.6.3) by using a later Xcode image. This gives us (at the moment) more or less the same versions as homebrew, while still not delaying the build. Setting this up is simple, as shown here.

Note that a previous version (Xcode 10) gave a strange error (same as in #74), but maybe that was fixed in a minor pytest update.

Considering that the Python from the base image (currently Xcode 9.4) is still relatively up-to-date, I suggest leaving this without specifying an image, for easier maintenance - but I wanted to comment on this somewhere!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw your experimental branch with the newer macOS image. I came to the same conclusion about the default being easier, although travis isn't doing a great job either way. It would also be good to test on Python 3.7.

I think that should happen in a separate PR, though, so this one can remain focussed on cleaning up warnings in the test output.

pyoracc/test/atf/test_atflexer.py Outdated Show resolved Hide resolved
pyoracc/atf/common/atflex.py Outdated Show resolved Hide resolved
u"4\u2032",
u"5\u02CA",
u"6\xb4"]:
self.compare_note_ended_by_line(label)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a helper function, I think that it would be a bit clearer to parametrise the test (essentially the code in compare_note_ended_by_line) by the label. I may be missing something, though!

Copy link
Contributor Author

@rillian rillian Jun 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would. I tried to use @pytest.mark.parametrize here like in test_atffile.py, but couldn't get it to interact properly with the tests being class methods.

It looked like the unittest functionality could be replaced by a pytest fixture, moving all the tests up to module level and saving an indent, but that's something which touches most of the file, so I thought it was better to leave that for a separate change.

I used this to guide constructing the new unit tests.
Thanks to ageorgou for catching that I'd left it in
the committed version.
Rename a varable to avoid confusion with a built-in.
Thanks to ageorgou for the suggestion.
@rillian
Copy link
Contributor Author

rillian commented Jun 21, 2019

Thanks for the update. Hopefully all the additional issues are addressed now.

Copy link
Contributor

@ageorgou ageorgou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have double-checked with a local ATF user, and the note rules and tests make sense, so all looks good! Thanks again @rillian.

@rillian
Copy link
Contributor Author

rillian commented Jun 25, 2019

Great, thanks! \o/ Can you also merge the PR for me?

@ageorgou ageorgou merged commit 61cc9e5 into oracc:master Jun 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sanitise regular expressions in lexer
3 participants