Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sync for the case when primary.xml contains non-ASCII characters #1035

Merged
merged 1 commit into from
Mar 7, 2017

Conversation

goosemania
Copy link
Member

@mention-bot
Copy link

@goosemania, thanks for your PR! By analyzing the history of the files in this pull request, we identified @seandst, @ipanova and @mhrivnak to be potential reviewers.

start_index = primary_xml_snippet.find("<location ")
end_index = primary_xml_snippet.find("/>", start_index) + 2 # adjust to end of closing tag

first_portion = primary_xml_snippet[:start_index]
end_portion = primary_xml_snippet[end_index:]
location = """<location href="%s"/>""" % file_utils.make_packages_relative_path(relpath)
return first_portion + location + end_portion
modified_primary_xml_snippet = first_portion + location + end_portion
return modified_primary_xml_snippet.encode('utf-8')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is not 100% necessary because there are places where we check if the xml snippet is in unicode and encode it then, but for the consistency here I think it is better to return all the metadata in the same way.

"""

primary_xml_snippet = primary_xml_snippet.decode('utf-8', 'replace')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put decoding back inside the change_location_tag because this function is used in multiple places.
The other option is to decode xml snippet each time before calling change_location_tag.

Copy link
Contributor

@jortel jortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decoding the primary XML fragment in change_location_tag() is convenient but not appropriate. Functions should do one thing and having the decode be side-effect of replacing the location tag seems wrong. The fragment should be decoded where it is first extracted from the rpm (like it was). Likewise, this issue should be fixed by decoding where the primary XML fragment is extracted from the YUM metadata. Not here.

@goosemania
Copy link
Member Author

@jortel , I see your point.
Do you agree that here all the values should be encoded in utf-8 and not some of them be in unicode and the others in utf-8? So after changes made in change_location_tag I will encode the xml snippet back to utf-8, just outside of the scope of the change_location_tag?

@jortel
Copy link
Contributor

jortel commented Mar 7, 2017

wonder if it would make sense to:

primary_xml_snippet = primary_xml_snippet.decode('utf-8', 'replace').encode('utf-8')

to keep it utf8 from the beginning but with the invalid characters dealt with?

@goosemania
Copy link
Member Author

It will work, but I think it is a right way to find/replace/modify string when it is in unicode, isn't? So we will decode it inside the change_location_tag anyway.
Or do you mean that we handle invalid characters outside of our function so that's ok to decode snippet inside the function for further modifications?

@jortel
Copy link
Contributor

jortel commented Mar 7, 2017

No, I'm suggesting the decode remain in get_package_xm() but modify slightly as:

primary_xml_snippet = primary_xml_snippet.decode('utf-8', 'replace').encode('utf-8')

and pass it to change_location_tag() as a utf8 str.

This way the invalid characters are dealt with but the fragment remains in utf8.

Then, make the same change where the primary XML fragment is extracted from the yum metadata to fix this issue.

@jortel
Copy link
Contributor

jortel commented Mar 7, 2017

It would be good to be consistent with utf8 str vs unicode but I don't think it's as important as dealing with the non-ascii characters where the fragment is originally created/extracted.

"""

primary_xml_snippet = primary_xml_snippet.decode('utf-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why decode here?

Copy link
Contributor

@jortel jortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@goosemania goosemania merged commit 2599e0d into pulp:2.12-dev Mar 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants