Fix sync for the case when primary.xml contains non-ASCII characters #1035

goosemania · 2017-03-07T11:21:00Z

closes #2622
https://pulp.plan.io/issues/2622

mention-bot · 2017-03-07T11:21:01Z

@goosemania, thanks for your PR! By analyzing the history of the files in this pull request, we identified @seandst, @ipanova and @mhrivnak to be potential reviewers.

goosemania · 2017-03-07T11:25:43Z

plugins/pulp_rpm/plugins/importers/yum/parse/rpm.py

    start_index = primary_xml_snippet.find("<location ")
    end_index = primary_xml_snippet.find("/>", start_index) + 2  # adjust to end of closing tag

    first_portion = primary_xml_snippet[:start_index]
    end_portion = primary_xml_snippet[end_index:]
    location = """<location href="%s"/>""" % file_utils.make_packages_relative_path(relpath)
-    return first_portion + location + end_portion
+    modified_primary_xml_snippet = first_portion + location + end_portion
+    return modified_primary_xml_snippet.encode('utf-8')


This part is not strictly necessary, because some code paths already check whether the XML snippet is unicode and encode it if so. For consistency, though, I think it is better to return all the metadata in the same form.
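The "check whether it is unicode and encode it" pattern mentioned above can be sketched as a small normalizing helper. This is a minimal sketch in Python 3, where `str` is text (Python 2's `unicode`) and `bytes` is the encoded form (Python 2's `str`). The name `ensure_utf8` is hypothetical and not part of pulp_rpm.

```python
def ensure_utf8(snippet):
    """Return an XML snippet as UTF-8 encoded bytes.

    Hypothetical helper: text is encoded to UTF-8; bytes are passed
    through unchanged on the assumption that they are already UTF-8.
    """
    if isinstance(snippet, str):
        return snippet.encode("utf-8")
    return snippet


print(ensure_utf8("<name>caf\u00e9</name>"))  # b'<name>caf\xc3\xa9</name>'
```

Returning one type from every code path means callers no longer need their own `isinstance` checks.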

goosemania · 2017-03-07T11:27:42Z

plugins/pulp_rpm/plugins/importers/yum/parse/rpm.py

    """
-
+    primary_xml_snippet = primary_xml_snippet.decode('utf-8', 'replace')


I moved the decoding back inside change_location_tag() because this function is used in multiple places.
The other option is to decode the XML snippet every time before calling change_location_tag().

jortel

Decoding the primary XML fragment in change_location_tag() is convenient but not appropriate. Functions should do one thing, and having the decode be a side effect of replacing the location tag seems wrong. The fragment should be decoded where it is first extracted from the RPM (as it was before). Likewise, this issue should be fixed by decoding where the primary XML fragment is extracted from the YUM metadata, not here.
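Under that split, change_location_tag() only swaps the tag and expects a snippet that was already decoded and sanitized upstream. A minimal Python 3 sketch, assuming the snippet arrives as UTF-8 bytes; `make_packages_relative_path` below is a hypothetical stand-in for `file_utils.make_packages_relative_path`, whose real path layout may differ. Unlike the diff above, the sketch raises if the tag is missing instead of slicing with offsets derived from a `-1` result.

```python
def make_packages_relative_path(relpath):
    # Hypothetical stand-in for file_utils.make_packages_relative_path;
    # the real helper may build a different layout.
    return "Packages/" + relpath


def change_location_tag(primary_xml_snippet, relpath):
    """Replace the <location .../> element of a UTF-8 encoded snippet.

    Takes bytes and returns bytes; decoding and sanitizing are the
    caller's responsibility, so this function does only one thing.
    """
    start_index = primary_xml_snippet.find(b"<location ")
    if start_index == -1:
        raise ValueError("snippet has no <location> element")
    end_index = primary_xml_snippet.find(b"/>", start_index)
    if end_index == -1:
        raise ValueError("unterminated <location> element")
    end_index += 2  # include the closing "/>"
    location = '<location href="%s"/>' % make_packages_relative_path(relpath)
    return (primary_xml_snippet[:start_index]
            + location.encode("utf-8")
            + primary_xml_snippet[end_index:])
```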

goosemania · 2017-03-07T16:08:50Z

@jortel, I see your point.
Do you agree that all the values here should be encoded in UTF-8, rather than some being unicode and others UTF-8? If so, after the changes made in change_location_tag(), should I encode the XML snippet back to UTF-8 just outside the scope of change_location_tag()?
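Mixing the two representations is what makes the inconsistency risky. In Python 2, concatenating a `unicode` object with a byte `str` implicitly decodes the bytes as ASCII and raises `UnicodeDecodeError` once they contain non-ASCII characters; Python 3 refuses the mix outright with `TypeError`. A small Python 3 illustration (the helper name `mixed_concat_fails` is hypothetical):

```python
def mixed_concat_fails(first, location, rest):
    """Return True if concatenating the three snippet pieces raises TypeError."""
    try:
        _ = first + location + rest
    except TypeError:
        return True
    return False


# UTF-8 bytes mixed with a text location tag: rejected.
print(mixed_concat_fails(b"<name>caf\xc3\xa9</name>", '<location href="x"/>', b"</package>"))  # True
# Every piece is UTF-8 bytes: concatenation works.
print(mixed_concat_fails(b"<name>caf\xc3\xa9</name>", b'<location href="x"/>', b"</package>"))  # False
```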

jortel · 2017-03-07T16:21:33Z

I wonder if it would make sense to do:

primary_xml_snippet = primary_xml_snippet.decode('utf-8', 'replace').encode('utf-8')

to keep it UTF-8 from the beginning, but with the invalid characters dealt with?
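What the decode/encode round trip does to a snippet containing an invalid byte, sketched in Python 3 (`sanitize_utf8` is a hypothetical name): valid multi-byte characters survive unchanged, each undecodable byte becomes U+FFFD, and the result is UTF-8 bytes again.

```python
def sanitize_utf8(raw):
    """Replace invalid UTF-8 sequences with U+FFFD and re-encode as UTF-8 bytes."""
    return raw.decode("utf-8", "replace").encode("utf-8")


raw = b"<summary>caf\xc3\xa9 \xff</summary>"  # valid e-acute, then a stray 0xff byte
clean = sanitize_utf8(raw)
print(clean)  # b'<summary>caf\xc3\xa9 \xef\xbf\xbd</summary>'
```

The output decodes cleanly as UTF-8, and running it through the helper again leaves it unchanged.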

goosemania · 2017-03-07T16:30:02Z

It will work, but isn't the right way to find/replace/modify a string to do it while it is unicode? In that case we would decode it inside change_location_tag() anyway.
Or do you mean that, since we handle invalid characters outside of our function, it's OK to decode the snippet inside the function for further modifications?
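One reason the find/replace can safely happen on UTF-8 bytes rather than unicode: in UTF-8, every byte of a multi-byte character is 0x80 or above, so an ASCII marker such as `<location ` can never match in the middle of a non-ASCII character. Byte offsets differ from character offsets, but both point at the same tag. A Python 3 sketch with a made-up snippet; `location_suffix` is a hypothetical helper:

```python
def location_suffix(snippet_bytes):
    """Return everything from the first b'<location ' onwards, decoded as text."""
    idx = snippet_bytes.find(b"<location ")
    if idx == -1:
        raise ValueError("snippet has no <location> element")
    return snippet_bytes[idx:].decode("utf-8")


text = '<package><summary>\u00fcber \u65e5\u672c</summary><location href="a.rpm"/></package>'
data = text.encode("utf-8")

# Every byte of each non-ASCII character's UTF-8 encoding is >= 0x80.
assert all(b >= 0x80 for ch in "\u00fc\u65e5\u672c" for b in ch.encode("utf-8"))

# Byte and character offsets differ, yet both searches land on the same tag.
assert data.find(b"<location ") != text.find("<location ")
assert location_suffix(data) == text[text.find("<location "):]
```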

jortel · 2017-03-07T16:37:32Z

No, I'm suggesting the decode remain in get_package_xml() but be modified slightly, as:

primary_xml_snippet = primary_xml_snippet.decode('utf-8', 'replace').encode('utf-8')

and pass it to change_location_tag() as a UTF-8 str.

This way the invalid characters are dealt with, but the fragment remains in UTF-8.

Then, make the same change where the primary XML fragment is extracted from the yum metadata to fix this issue.

jortel · 2017-03-07T16:43:13Z

It would be good to be consistent about UTF-8 str vs. unicode, but I don't think that's as important as dealing with the non-ASCII characters where the fragment is originally created/extracted.

jortel · 2017-03-07T17:14:40Z

plugins/pulp_rpm/plugins/importers/yum/parse/rpm.py

    """
-
+    primary_xml_snippet = primary_xml_snippet.decode('utf-8')


Why decode here?


jortel

Looks good.

goosemania added the Bugfix label Mar 7, 2017


tehsmyers approved these changes Mar 7, 2017


jortel suggested changes Mar 7, 2017


goosemania force-pushed the issue2622 branch from 8252c28 to e5cff78 March 7, 2017 17:05

jortel reviewed Mar 7, 2017


Fix sync for the case when primary.xml contains non-ASCII characters

2599e0d

closes pulp#2622 https://pulp.plan.io/issues/2622

goosemania force-pushed the issue2622 branch from e5cff78 to 2599e0d March 7, 2017 17:53

jortel approved these changes Mar 7, 2017


goosemania merged commit 2599e0d into pulp:2.12-dev Mar 7, 2017
