
Fix publish for the on_demand sync case #1298

Merged (1 commit, Mar 26, 2019)

Conversation

goosemania (Member)

Returns:
createrepo_c.Package: package itself in a format of a createrepo_c package object
"""
def str_list_to_createrepo_c(s):
goosemania (Member Author)


This is a dumb straightforward way of solving the following problem:

  1. Pulp stores some JSON data for RPMs (filelists, dependencies, etc) in a TextField.
  2. Currently, during sync, lists of tuples are implicitly converted to strings.
  3. At publish time, to create repodata using that data, Pulp needs to convert it back into the format expected by createrepo_c.
  4. The easiest way would be to call json.loads, but the problem is when data is absent in the nested structure.
    Example:

Input data: [(1, None)]

  • non-working solution 1: Pulp stores '[(1, None)]' (a string) after sync; at publish time json.loads breaks because the Python repr is not valid JSON (tuple parentheses, and None instead of null).
  • non-working solution 2: Pulp does json.dumps at sync time, so it stores '[[1, null]]' (a string) after sync; at publish time json.loads is fine, but now createrepo_c complains because it needs a list of tuples, not a list of lists. (Both failures are sketched right below.)
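
Here is a minimal sketch of the two failures; the files variable is a made-up stand-in for the stored field:

import json

files = [(1, None)]

# Non-working solution 1: store the Python repr.
stored = str(files)                 # "[(1, None)]"
try:
    json.loads(stored)              # not valid JSON: tuple parentheses and None
except json.JSONDecodeError as exc:
    print("solution 1 breaks:", exc)

# Non-working solution 2: json.dumps at sync time.
stored = json.dumps(files)          # '[[1, null]]' -- the tuple silently becomes a list
print(json.loads(stored))           # [[1, None]], but createrepo_c wants [(1, None)]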

Alternative ways to fix the problem:

  • do json.dumps at sync time and write a custom json.JSONDecoder, then use it in the json.loads call to create tuples instead of lists.
  • do no encoding at sync time and use ast.literal_eval to convert the string representation of a list back to a list; it preserves tuples (sketched after this list). It's safer than eval, but it can crash Python due to the stack limit, so it's a potential exploit if someone creates a "good" RPM for that.
  • maybe we should rethink how we store those structures in general
  • anything else?
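
A quick sketch of the ast.literal_eval alternative, for illustration (variable names are made up):

import ast

# Store the plain Python repr at sync time, re-parse it at publish time.
stored = str([(1, None)])           # "[(1, None)]", as stored after sync today
restored = ast.literal_eval(stored)
print(restored)                     # [(1, None)] -- tuples and None preserved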

@nixocio Mar 13, 2019


We do have some tests for publish with different lazy sync policies.
I will take a look at why the do_publish tests are passing right now, and make the assertions stricter if necessary.

Edit: Disregard my last comment, I already noticed what we are missing and I will update the tests properly.

@dralley (Contributor) Mar 19, 2019


Both PostgreSQL and MySQL individually have JSONField types with validation, fast(er) serialization and deserialization, and fancy query abilities, but unfortunately Django doesn't have anything that bridges the gap. Frustrating.

I don't have any better suggestions but I think "rethink how we store those structures in general" would be useful.
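
For reference, a rough sketch of the Postgres-only field that does exist today (django.contrib.postgres.fields.JSONField); the Package model and files field below are hypothetical, and it does not close the gap for MySQL:

from django.contrib.postgres.fields import JSONField
from django.db import models


class Package(models.Model):
    # Hypothetical model/field names, only to illustrate the idea.
    files = JSONField(default=list)

# JSON-aware lookups run against the native jsonb column, e.g.:
# Package.objects.filter(files__contains=[[1, None]])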

@daviddavis (Contributor)


+1 to rethinking how we store structures in the future. I wonder in the short term if it makes sense to keep the json format and create a custom JSONDecoder. How hard would that be?

goosemania (Member Author)


@daviddavis, there are two options, imo.

  1. We can implement an encoder which remembers the fact that something was a tuple, similar to this, but we need to handle dicts as well, so it will be more complex. And in the DB it will take more space, since we need to record whether it was a tuple or not. And since filelists can be very long, I think it's potentially a noticeable increase in the size of the data we store. (A rough sketch follows the decoder below.)
  2. We can make a very custom decoder like below and use it for files only:
import json


class MyDecoder(json.JSONDecoder):

    def __init__(self, **kwargs):
        json.JSONDecoder.__init__(self, **kwargs)
        # Use our array hook and rebuild the scanner so it picks the hook up.
        self.parse_array = self.RpmFilesJSONArray
        self.scan_once = json.scanner.py_make_scanner(self)

    def RpmFilesJSONArray(self, s_and_end, scan_once, **kwargs):
        values, end = json.decoder.JSONArray(s_and_end, scan_once, **kwargs)
        if isinstance(values, list):
            # Turn nested lists back into the tuples createrepo_c expects.
            tupled = []
            for v in values:
                if isinstance(v, list):
                    tupled.append(tuple(v))
                else:
                    tupled.append(v)
            return tupled, end
        return values, end

# usage: json.loads(stored, cls=MyDecoder)
It doesn't look very elegant to me ;)
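
For comparison, option 1 could look roughly like the following; the __tuple__ marker key and both helpers are hypothetical illustrations (not anything Pulp uses), and the extra wrapper objects show where the storage overhead comes from:

import json


def hint_tuples(obj):
    # Recursively wrap tuples in a marker object so their type survives dumps().
    if isinstance(obj, tuple):
        return {"__tuple__": [hint_tuples(i) for i in obj]}
    if isinstance(obj, list):
        return [hint_tuples(i) for i in obj]
    if isinstance(obj, dict):
        return {k: hint_tuples(v) for k, v in obj.items()}
    return obj


def restore_tuples(dct):
    # object_hook for json.loads() that unwraps the markers back into tuples.
    if "__tuple__" in dct:
        return tuple(dct["__tuple__"])
    return dct


stored = json.dumps(hint_tuples([(1, None)]))            # '[{"__tuple__": [1, null]}]'
print(json.loads(stored, object_hook=restore_tuples))    # [(1, None)]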

@daviddavis (Contributor) left a comment


Tested and it works great 👍
