-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sync for the case when primary.xml contains non-ASCII characters #1035
Conversation
@goosemania, thanks for your PR! By analyzing the history of the files in this pull request, we identified @seandst, @ipanova and @mhrivnak to be potential reviewers. |
start_index = primary_xml_snippet.find("<location ") | ||
end_index = primary_xml_snippet.find("/>", start_index) + 2 # adjust to end of closing tag | ||
|
||
first_portion = primary_xml_snippet[:start_index] | ||
end_portion = primary_xml_snippet[end_index:] | ||
location = """<location href="%s"/>""" % file_utils.make_packages_relative_path(relpath) | ||
return first_portion + location + end_portion | ||
modified_primary_xml_snippet = first_portion + location + end_portion | ||
return modified_primary_xml_snippet.encode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is not 100% necessary because there are places where we check if the xml snippet is in unicode and encode it then, but for the consistency here I think it is better to return all the metadata in the same way.
""" | ||
|
||
primary_xml_snippet = primary_xml_snippet.decode('utf-8', 'replace') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put decoding back inside the change_location_tag
because this function is used in multiple places.
The other option is to decode xml snippet each time before calling change_location_tag
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decoding the primary XML fragment in change_location_tag()
is convenient but not appropriate. Functions should do one thing and having the decode be side-effect of replacing the location tag seems wrong. The fragment should be decoded where it is first extracted from the rpm (like it was). Likewise, this issue should be fixed by decoding where the primary XML fragment is extracted from the YUM metadata. Not here.
wonder if it would make sense to:
to keep it utf8 from the beginning but with the invalid characters dealt with? |
It will work, but I think it is a right way to find/replace/modify string when it is in unicode, isn't? So we will decode it inside the |
No, I'm suggesting the decode remain in
and pass it to This way the invalid characters are dealt with but the fragment remains in utf8. Then, make the same change where the primary XML fragment is extracted from the yum metadata to fix this issue. |
It would be good to be consistent with utf8 |
""" | ||
|
||
primary_xml_snippet = primary_xml_snippet.decode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why decode here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
closes #2622
https://pulp.plan.io/issues/2622