Fix publish for the on_demand sync case #1298
Conversation
```python
        Returns:
            createrepo_c.Package: package itself in a format of a createrepo_c package object
        """
        def str_list_to_createrepo_c(s):
```
This is a dumb, straightforward way of solving the following problem:
- Pulp stores some JSON data for RPMs (filelists, dependencies, etc.) in a `TextField`.
- Currently, during sync, a list of tuples is implicitly converted to a string.
- At publish time, Pulp needs to convert that data to the format expected by createrepo_c in order to create repodata.
- The easiest way would be to call `json.loads`, but the problem is the absence of data in the nested structure.

Example input data: `[(1, None)]`
- Non-working solution 1: Pulp stores `'[(1, None)]'` (a string) after sync; at publish time `json.loads` breaks because it doesn't work with `None`, which would have to be `null`.
- Non-working solution 2: Pulp does `json.dumps` at sync time, so it stores `'[[1, null]]'` (a string); at publish time `json.loads` is fine, but now `createrepo_c` complains because it needs a list of tuples, not a list of lists.
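Both failure modes can be reproduced in a few lines (a standalone sketch using the stored strings from the examples above):

```python
import json

# Non-working solution 1: the implicit str() of a list of tuples is not JSON.
stored = '[(1, None)]'
try:
    json.loads(stored)
except json.JSONDecodeError:
    print("json.loads rejects Python-repr syntax like tuples and None")

# Non-working solution 2: valid JSON round-trips, but tuples come back as lists.
stored = '[[1, null]]'
decoded = json.loads(stored)
print(decoded)            # [[1, None]]
print(type(decoded[0]))   # <class 'list'>, but createrepo_c wants a tuple
```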
Alternative ways to fix the problem:
- Do `json.dumps` at sync time, write a custom `json.JSONDecoder`, and use it in the `json.loads` call to create tuples instead of lists.
- Do no encoding at sync time and use `ast.literal_eval` to convert the string representation of the list back to a list; it preserves tuples. It's safer than `eval`, but it can crash Python due to the stack limit, so it's a potential exploit if someone creates a "good" RPM for that.
- Maybe we should rethink how we store those structures in general.
- Anything else?
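For reference, the `ast.literal_eval` alternative would look like this (a minimal sketch; the stack-limit caveat above still applies to deeply nested input):

```python
import ast

# The string Pulp would have stored with no encoding at sync time.
stored = '[(1, None)]'

# literal_eval parses Python literal syntax, so tuples and None survive.
restored = ast.literal_eval(stored)
print(restored)           # [(1, None)]
print(type(restored[0]))  # <class 'tuple'>

# Caveat from the discussion: unlike eval() this won't run arbitrary code,
# but a maliciously deep structure can still exhaust the parser's stack.
```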
We do have some tests for publish with different lazy sync policies. I will take a look at why the `do_publish` tests are passing right now, and make the assertions more strict if necessary.
Edit: Disregard my last comment. I already noticed what we are missing, and I will update the tests properly.
Both PostgreSQL and MySQL individually have JSONField types with validation, fast(er) serialization and deserialization, and fancy query abilities, but unfortunately Django doesn't have anything that bridges the gap. Frustrating.
I don't have any better suggestions but I think "rethink how we store those structures in general" would be useful.
+1 to rethinking how we store structures in the future. I wonder in the short term if it makes sense to keep the json format and create a custom JSONDecoder. How hard would that be?
@daviddavis, there are two options, imo.
- We can implement an encoder which remembers the fact that something was a tuple, similar to this, but we would need to handle dicts as well, so it will be more complex. And in the DB it will take more space, since we need to remember whether each value was a tuple or not. Since filelists can be very long, I think it's potentially a noticeable increase in the size of the data we store.
- We can make a very custom decoder like the one below and use it for files only:
```python
import json
import json.decoder
import json.scanner


class MyDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        json.JSONDecoder.__init__(self, **kwargs)
        # Route array parsing through the custom hook and rebuild the
        # scanner so it picks up the override.
        self.parse_array = self.RpmFilesJSONArray
        self.scan_once = json.scanner.py_make_scanner(self)

    def RpmFilesJSONArray(self, s_and_end, scan_once, **kwargs):
        values, end = json.decoder.JSONArray(s_and_end, scan_once, **kwargs)
        if isinstance(values, list):
            tupled = []
            for v in values:
                if isinstance(v, list):
                    tupled.append(tuple(v))
                else:
                    tupled.append(v)
            return tupled, end
        return values, end
```

It doesn't look elegant to me ;)
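Putting the idea together into a runnable sketch (the class and method names here are illustrative; the one-level list-to-tuple conversion follows the snippet above, with `parse_array` wired to the custom hook):

```python
import json
import json.decoder
import json.scanner


class RpmFilesDecoder(json.JSONDecoder):
    """Decode JSON arrays so that immediate child lists become tuples."""

    def __init__(self, **kwargs):
        json.JSONDecoder.__init__(self, **kwargs)
        # Every array the scanner encounters is parsed by our hook, so the
        # conversion applies at each nesting level.
        self.parse_array = self.rpm_files_json_array
        self.scan_once = json.scanner.py_make_scanner(self)

    def rpm_files_json_array(self, s_and_end, scan_once, **kwargs):
        values, end = json.decoder.JSONArray(s_and_end, scan_once, **kwargs)
        # Turn the immediate child lists into tuples for createrepo_c.
        return [tuple(v) if isinstance(v, list) else v for v in values], end


# What sync stored after json.dumps: '[[1, null]]'
print(json.loads('[[1, null]]', cls=RpmFilesDecoder))  # [(1, None)]
```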
Force-pushed from 8773bd6 to 8e1b0e8 (compare)
Tested and it works great 👍
closes #4412
https://pulp.plan.io/issues/4412