New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default custom regex enhancement #571

Merged
merged 7 commits into from Aug 17, 2018

Conversation

Projects
None yet
4 participants
@Zlopez
Copy link
Member

Zlopez commented Aug 6, 2018

This should solve the issue #482

The lint test fails because pytoml can't parse multiline string.

Zlopez added some commits Aug 6, 2018

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Aug 6, 2018

Codecov Report

Merging #571 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #571   +/-   ##
=======================================
  Coverage   89.45%   89.45%           
=======================================
  Files          54       54           
  Lines        2561     2561           
  Branches      327      327           
=======================================
  Hits         2291     2291           
  Misses        203      203           
  Partials       67       67
Impacted Files Coverage Δ
anitya/config.py 100% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74ea662...ceddcdc. Read the comment docs.

@@ -52,7 +52,7 @@
librariesio_platform_whitelist = ['pypi', 'rubygems']
default_regex = '%(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\s]+?)(?i)(?:[-_](?:minsrc|src|source|asc|release))?\.(?:tar|t[bglx]z|tbz2|zip)'

This comment has been minimized.

@pypingou

pypingou Aug 6, 2018

Contributor

You could also just split the string over multiple lines, as you prefer :)

This comment has been minimized.

@Zlopez

Zlopez Aug 6, 2018

Member

I already fixed this by using dummy regex in the test.
Your solution will not work (tested), because pytoml has issue with reading multiline string.

@jeremycline
Copy link
Member

jeremycline left a comment

I've got one small change request, but otherwise it looks good!


REGEX = '%(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\s]+?)(?i)(?:[-_]'\
'(?:minsrc|src|source|asc))?\.(?:tar|t[bglx]z|tbz2|zip)'
REGEX = anitya_config.get('DEFAULT_REGEX')

This comment has been minimized.

@jeremycline

jeremycline Aug 6, 2018

Member

Since 'DEFAULT_REGEX' should always be present in the dictionary, it'd be better here to use anitya_config['DEFAULT_REGEX']. That way if (somehow) it's not present it fails early and in an obvious way at import time rather than being None later.

This comment has been minimized.

@Zlopez

Zlopez Aug 6, 2018

Member

Ok, I will change this, I looked how other value is read in the code and just copied it.

@@ -51,6 +51,9 @@ social_auth_redirect_is_https = true
# via Libraries.io.
librariesio_platform_whitelist = []

# Default regular expression used for backend
default_regex = '%(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\s]+?)(?i)(?:[-_](?:minsrc|src|source|asc|release))?\.(?:tar|t[bglx]z|tbz2|zip)'

This comment has been minimized.

@jeremycline

jeremycline Aug 6, 2018

Member

I believe TOML requires double quotes (") for strings. In case you'd like to try and break it up into multiple lines, the syntax for multi-line strings without adding extraneous whitespace is at https://github.com/toml-lang/toml#string. There's only so much that can be done to make the regex readable, though, so do what you think looks best here :)

This comment has been minimized.

@Zlopez

Zlopez Aug 6, 2018

Member

I already tried that, but it looks like pytoml has issue with this syntax and it fails with error.

This comment has been minimized.

@Zlopez

Zlopez Aug 6, 2018

Member

But you are right, this should be at least in double quotes.

This comment has been minimized.

@Zlopez

Zlopez Aug 6, 2018

Member

Ok, it looks like the issue is in backslash character. When trying to parse text without special characters I didn't have any issue with multiline string.
I will check if there is any escape character for toml.

This comment has been minimized.

@jeremycline

jeremycline Aug 6, 2018

Member

Ah, yep, \\ should escape backslashes

This comment has been minimized.

@Zlopez

Zlopez Aug 6, 2018

Member

It looks like the escaped backslashes aren't visible on html page :-(
screenshot from 2018-08-06 17-24-33

This comment has been minimized.

@Zlopez

Zlopez Aug 7, 2018

Member

This is weird, It looks like the regex is read correctly from the configuration, but I don't know why it is shown without backslashes on the page.
Right now I have this as input:

default_regex = """\
                %(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\\]+?)(?i)(?:[-_]\
                (?:minsrc|src|source|asc|release))?\\.(?:tar|t[bglx]z|tbz2|zip).\
                """

And this is what is saved to config dictionary:

DEFAULT_REGEX = %(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\]+?)(?i)(?:[-_](?:minsrc|src|source|asc|release))?\.(?:tar|t[bglx]z|tbz2|zip).

This comment has been minimized.

@Zlopez

Zlopez Aug 7, 2018

Member

The issues with missing backslash in frontend is there even when not using multiline string.
So this looks like issue that was there before, but nobody noticed it.

Zlopez added some commits Aug 6, 2018

Change how the default_regex value is initialized
This will cause the application to fail if the DEFAULT_REGEX key
is not defined.
Fix render issue
Fix render issue with backslashes in HTML render
@@ -31,6 +31,7 @@ class CustomBackend(BaseBackend):
more_info = 'More information in the '\
'<a href=\'/about#test-your-regex\'>about#test-your-regex</a>'
default_regex = REGEX % {'name': '{project name}'}
default_regex_html = default_regex.replace('\\', '\\\\')

This comment has been minimized.

@jeremycline

jeremycline Aug 17, 2018

Member

Since this is only relevant in the HTML template, I'd recommend just putting it there. Jinja lets you write Python in the templates.

@@ -109,7 +109,7 @@ <h1>{{ context }} project</h1>
examples["{{ plugin.name }}"]="{{ plugin.examples | format_examples }}";
more_info["{{ plugin.name }}"]="{{ plugin.more_info}}";
default_regex["{{ plugin.name }}"]="{{ plugin.default_regex }}";
default_regex["{{ plugin.name }}"]="{{ plugin.default_regex_html }}";

This comment has been minimized.

@jeremycline

jeremycline Aug 17, 2018

Member

So this could just be plugin.default_regex.replace('\\', '\\\\'), I believe

I think this is fine, but just in case you hit some more complex (or user-provided) input that requires sanitizing, https://github.com/mozilla/bleach is a solid library used elsewhere in Fedora infra apps.

@jeremycline
Copy link
Member

jeremycline left a comment

Just one minor simplification and I think this will be good to go.

@Zlopez Zlopez merged commit 7e171ce into release-monitoring:master Aug 17, 2018

3 checks passed

codecov/patch Coverage not affected when comparing 74ea662...ceddcdc
Details
codecov/project 89.45% remains the same compared to 74ea662
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Zlopez Zlopez deleted the Zlopez:custom_regex_enhancement branch Aug 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment