Fix publish for the on_demand sync case #1298
Conversation
Returns:
    createrepo_c.Package: package itself in a format of a createrepo_c package object
"""
def str_list_to_createrepo_c(s):
This is a dumb, straightforward way of solving the following problem:

- Pulp stores some JSON data for RPMs (filelists, dependencies, etc.) in a TextField.
- Currently, during sync, lists of tuples are implicitly converted to strings.
- At publish time, Pulp needs to convert that data back into the format expected by createrepo_c in order to create repodata.
- The easiest way would be to call `json.loads`, but that breaks when data is absent somewhere in the nested structure.

Example input data: `[(1, None)]`

- Non-working solution 1: Pulp stores `'[(1, None)]'` (a string) after sync; at publish time `json.loads` breaks because it doesn't accept `None`, it expects `null`.
- Non-working solution 2: Pulp does `json.dumps` at sync time, so it stores `'[[1, null]]'` (a string) after sync; at publish time `json.loads` is fine, but now `createrepo_c` complains because it needs a list of tuples, not a list of lists.
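For illustration only (not code from this PR), a minimal stdlib sketch of the two failure modes described above:

```python
import json

data = [(1, None)]

# Non-working solution 1: the implicit str() conversion stores "[(1, None)]".
try:
    json.loads(str(data))
except json.JSONDecodeError as exc:
    print(exc)  # tuples and None are not valid JSON; it expects [...] and null

# Non-working solution 2: json.dumps() stores '[[1, null]]'; json.loads() then
# succeeds, but the inner tuple comes back as a list, which createrepo_c rejects.
print(json.loads(json.dumps(data)))  # [[1, None]]
```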
Alternative ways to fix the problem:

- Do `json.dumps` at sync time, write a custom `json.JSONDecoder`, and use it in the `json.loads` call so that tuples are created instead of lists.
- Do no encoding at sync time and use `ast.literal_eval` to convert the string representation of the list back to a list; it preserves tuples (see the sketch after this list). It's safer than `eval`, but it can crash Python due to the stack limit, so it's a potential exploit if someone crafts a "good" RPM for that.
- Maybe we should rethink how we store those structures in general.
- Anything else?
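A minimal sketch of the `ast.literal_eval` alternative, assuming the field holds the plain `str()` of the Python structure:

```python
import ast

stored = "[(1, None)]"  # what ends up in the TextField when no encoding is done at sync time

# ast.literal_eval only evaluates Python literals (no arbitrary code), so it is
# much safer than eval(); deeply nested input can still exhaust the parser's
# recursion limit, which is the crash/exploit concern mentioned above.
data = ast.literal_eval(stored)
print(data)           # [(1, None)]
print(type(data[0]))  # <class 'tuple'> -- tuples are preserved
```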
We do have some tests for publish with different lazy sync policies (`do_publish`). I will take a look into why they are passing right now and make the assertions more strict if necessary.
Edit: Disregard my last comment, I already noticed what we are missing and I will update the tests properly.
Both PostgreSQL and MySQL individually have JSONField types with validation, fast(er) serialization and deserialization, and fancy query abilities, but unfortunately Django doesn't have anything that bridges the gap. Frustrating.
I don't have any better suggestions but I think "rethink how we store those structures in general" would be useful.
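For context, a hypothetical sketch of what the PostgreSQL-only option looked like at the time (`django.contrib.postgres`); the model and field names here are made up, and it still would not solve the tuple-vs-list issue, since JSON itself has no tuple type:

```python
from django.contrib.postgres.fields import JSONField  # PostgreSQL only
from django.db import models


class Package(models.Model):
    # Hypothetical field: filelists would round-trip as JSON lists, so a
    # conversion back to tuples would still be needed before calling createrepo_c.
    files = JSONField(default=list)
```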
+1 to rethinking how we store structures in the future. I wonder in the short term if it makes sense to keep the json format and create a custom JSONDecoder. How hard would that be?
@daviddavis, there are two options, imo.
- We can implement an encoder which remembers that something was a tuple, similar to this, but we would need to handle dicts as well, so it would be more complex. It would also take more space in the DB, since we need to record whether each value was a tuple or not, and since filelists can be very long, I think that's a potentially noticeable increase in the size of the data we store. (A rough sketch of this tagging approach follows the decoder example below.)
- We can make a very custom decoder like the one below and use it for the files only:
import json
import json.decoder
import json.scanner


class MyDecoder(json.JSONDecoder):
    """Decoder that turns nested JSON arrays back into tuples."""

    def __init__(self, **kwargs):
        json.JSONDecoder.__init__(self, **kwargs)
        # Use the pure-Python scanner so the parse_array override is picked up.
        self.parse_array = self.RpmFilesJSONArray
        self.scan_once = json.scanner.py_make_scanner(self)

    def RpmFilesJSONArray(self, s_and_end, scan_once, **kwargs):
        values, end = json.decoder.JSONArray(s_and_end, scan_once, **kwargs)
        if isinstance(values, list):
            # Convert any nested lists (the per-file entries) into tuples.
            tupled = []
            for v in values:
                if isinstance(v, list):
                    tupled.append(tuple(v))
                else:
                    tupled.append(v)
            return tupled, end
        return values, end


# Usage: json.loads('[[1, null]]', cls=MyDecoder) returns [(1, None)]
It doesn't look very elegant to me ;)
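For reference, a rough sketch of the first option (an encoder that tags tuples so a matching decoder hook can restore them); all names here are made up, dict values are handled by the same recursion, and the tag objects are what cause the extra storage overhead mentioned above:

```python
import json


class TupleTaggingEncoder(json.JSONEncoder):
    """Hypothetical encoder: wraps every tuple in a small marker object."""

    def encode(self, obj):
        return super().encode(self._tag(obj))

    def _tag(self, obj):
        if isinstance(obj, tuple):
            return {"__tuple__": [self._tag(v) for v in obj]}
        if isinstance(obj, list):
            return [self._tag(v) for v in obj]
        if isinstance(obj, dict):
            return {k: self._tag(v) for k, v in obj.items()}
        return obj


def untag(obj):
    """object_hook that turns the marker objects back into tuples."""
    if "__tuple__" in obj:
        return tuple(obj["__tuple__"])
    return obj


stored = json.dumps([(1, None)], cls=TupleTaggingEncoder)  # '[{"__tuple__": [1, null]}]'
print(json.loads(stored, object_hook=untag))               # [(1, None)]
```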
8773bd6 to 8e1b0e8
Tested and it works great 👍
closes #4412
https://pulp.plan.io/issues/4412