Fix publish for the on_demand sync case #1298
Conversation
Returns:
    createrepo_c.Package: package itself in a format of a createrepo_c package object
"""
def str_list_to_createrepo_c(s):
This is a dumb, straightforward way of solving the following problem:

- Pulp stores some JSON data for RPMs (filelists, dependencies, etc.) in a TextField.
- Currently, during sync, lists of tuples are implicitly converted to strings.
- At publish time, Pulp needs to convert that data back into the format expected by createrepo_c in order to create repodata.
- The easiest way would be to call `json.loads`, but that breaks when data is absent somewhere in the nested structure.

Example input data: `[(1, None)]`

- Non-working solution 1: Pulp stores `'[(1, None)]'` (a string) after sync; at publish time `json.loads` breaks because it doesn't accept `None`, it expects `null`.
- Non-working solution 2: Pulp does `json.dumps` at sync time, so it stores `'[[1, null]]'` (a string) after sync; at publish time `json.loads` is fine, but now `createrepo_c` complains because it needs a list of tuples, not a list of lists.
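For illustration only (not code from this PR), a minimal stdlib sketch of the two failure modes described above:

```python
import json

data = [(1, None)]

# Non-working solution 1: the implicit str() conversion stores "[(1, None)]".
try:
    json.loads(str(data))
except json.JSONDecodeError as exc:
    print(exc)  # tuples and None are not valid JSON; it expects [...] and null

# Non-working solution 2: json.dumps() stores '[[1, null]]'; json.loads() then
# succeeds, but the inner tuple comes back as a list, which createrepo_c rejects.
print(json.loads(json.dumps(data)))  # [[1, None]]
```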
Alternative ways to fix the problem:

- Do `json.dumps` at sync time, write a custom `json.JSONDecoder`, and use it in the `json.loads` call so that tuples are created instead of lists.
- Do no encoding at sync time and use `ast.literal_eval` to convert the string representation of the list back to a list; it preserves tuples (see the sketch after this list). It's safer than `eval`, but it can crash Python due to the stack limit, so it's a potential exploit if someone crafts a "good" RPM for that.
- Maybe we should rethink how we store those structures in general.
- Anything else?
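A minimal sketch of the `ast.literal_eval` alternative, assuming the field holds the plain `str()` of the Python structure:

```python
import ast

stored = "[(1, None)]"  # what ends up in the TextField when no encoding is done at sync time

# ast.literal_eval only evaluates Python literals (no arbitrary code), so it is
# much safer than eval(); deeply nested input can still exhaust the parser's
# recursion limit, which is the crash/exploit concern mentioned above.
data = ast.literal_eval(stored)
print(data)           # [(1, None)]
print(type(data[0]))  # <class 'tuple'> -- tuples are preserved
```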
We do have some tests for publish with different lazy sync policies (`do_publish`). I will take a look into why they are passing right now and make the assertions more strict if necessary.
Edit: Disregard my last comment, I already noticed what we are missing and I will update the tests properly.
Both PostgreSQL and MySQL individually have JSONField types with validation, fast(er) serialization and deserialization, and fancy query abilities, but unfortunately Django doesn't have anything that bridges the gap. Frustrating.
I don't have any better suggestions but I think "rethink how we store those structures in general" would be useful.
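For context, a hypothetical sketch of what the PostgreSQL-only option looked like at the time (`django.contrib.postgres`); the model and field names here are made up, and it still would not solve the tuple-vs-list issue, since JSON itself has no tuple type:

```python
from django.contrib.postgres.fields import JSONField  # PostgreSQL only
from django.db import models


class Package(models.Model):
    # Hypothetical field: filelists would round-trip as JSON lists, so a
    # conversion back to tuples would still be needed before calling createrepo_c.
    files = JSONField(default=list)
```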
+1 to rethinking how we store structures in the future. I wonder in the short term if it makes sense to keep the json format and create a custom JSONDecoder. How hard would that be?
@daviddavis, there are two options, imo.
- We can implement an encoder which remembers that something was a tuple, similar to this, but we would need to handle dicts as well, so it would be more complex. It would also take more space in the DB, since we need to record whether each value was a tuple or not, and since filelists can be very long, I think that's a potentially noticeable increase in the size of the data we store. (A rough sketch of this tagging approach follows the decoder example below.)
- We can make a very custom decoder like the one below and use it for the files only:
import json
import json.decoder
import json.scanner


class MyDecoder(json.JSONDecoder):
    """Decoder that turns nested JSON arrays back into tuples."""

    def __init__(self, **kwargs):
        json.JSONDecoder.__init__(self, **kwargs)
        # Use the pure-Python scanner so the parse_array override is picked up.
        self.parse_array = self.RpmFilesJSONArray
        self.scan_once = json.scanner.py_make_scanner(self)

    def RpmFilesJSONArray(self, s_and_end, scan_once, **kwargs):
        values, end = json.decoder.JSONArray(s_and_end, scan_once, **kwargs)
        if isinstance(values, list):
            # Convert any nested lists (the per-file entries) into tuples.
            tupled = []
            for v in values:
                if isinstance(v, list):
                    tupled.append(tuple(v))
                else:
                    tupled.append(v)
            return tupled, end
        return values, end


# Usage: json.loads('[[1, null]]', cls=MyDecoder) returns [(1, None)]
It doesn't look very elegant to me ;)
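For reference, a rough sketch of the first option (an encoder that tags tuples so a matching decoder hook can restore them); all names here are made up, dict values are handled by the same recursion, and the tag objects are what cause the extra storage overhead mentioned above:

```python
import json


class TupleTaggingEncoder(json.JSONEncoder):
    """Hypothetical encoder: wraps every tuple in a small marker object."""

    def encode(self, obj):
        return super().encode(self._tag(obj))

    def _tag(self, obj):
        if isinstance(obj, tuple):
            return {"__tuple__": [self._tag(v) for v in obj]}
        if isinstance(obj, list):
            return [self._tag(v) for v in obj]
        if isinstance(obj, dict):
            return {k: self._tag(v) for k, v in obj.items()}
        return obj


def untag(obj):
    """object_hook that turns the marker objects back into tuples."""
    if "__tuple__" in obj:
        return tuple(obj["__tuple__"])
    return obj


stored = json.dumps([(1, None)], cls=TupleTaggingEncoder)  # '[{"__tuple__": [1, null]}]'
print(json.loads(stored, object_hook=untag))               # [(1, None)]
```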
8773bd6 to 8e1b0e8
Tested and it works great 👍
closes #4412
https://pulp.plan.io/issues/4412