Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add model fields for Package type #1105

Merged
merged 1 commit into from
May 18, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@ python:
- "3.5"
- "3.6"
env:
- DB=sqlite
- DB=postgres
addons:
# postgres versions provided by el7 RHSCL (lowest supportable version)
Expand Down
24 changes: 24 additions & 0 deletions pulp_rpm/app/constants.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
from types import SimpleNamespace

CHECKSUM_TYPES = SimpleNamespace(
UNKNOWN='unknown',
MD5='md5',
SHA='sha1', # compatibility nickname from original createrepo
SHA1='sha1',
SHA224='sha224',
SHA256='sha256',
SHA384='sha384',
SHA512='sha512'
)

# The same as above, but in a format that choice fields can use
CHECKSUM_CHOICES = (
(CHECKSUM_TYPES.UNKNOWN, CHECKSUM_TYPES.UNKNOWN),
(CHECKSUM_TYPES.MD5, CHECKSUM_TYPES.MD5),
(CHECKSUM_TYPES.SHA, CHECKSUM_TYPES.SHA),
(CHECKSUM_TYPES.SHA1, CHECKSUM_TYPES.SHA1),
(CHECKSUM_TYPES.SHA224, CHECKSUM_TYPES.SHA224),
(CHECKSUM_TYPES.SHA256, CHECKSUM_TYPES.SHA256),
(CHECKSUM_TYPES.SHA384, CHECKSUM_TYPES.SHA384),
(CHECKSUM_TYPES.SHA512, CHECKSUM_TYPES.SHA512)
)
196 changes: 183 additions & 13 deletions pulp_rpm/app/models.py
Original file line number Diff line number Diff line change
@@ -1,33 +1,203 @@
from logging import getLogger

from django.db import models
from pulpcore.plugin.models import Content, Remote, Publisher

from pulp_rpm.app.constants import CHECKSUM_CHOICES


log = getLogger(__name__)


class RpmContent(Content):
"""
The "rpm" content type.
The "rpm" content type. Maps directly to the fields provided by createrepo_c

Description of the content type.
https://github.com/rpm-software-management/createrepo_c/

Fields:
field1 (type): Description of the field1
field2 (type): Description of the field2
...
name (Text):
Name of the package
epoch (Text):
The package's epoch
version (Text):
The version of the package. For example, '2.8.0'
release (Text):
The release of a particular version of the package. Although this field
can technically be anything, packaging guidelines usually require it to
be an integer followed by the platform, e.g. '1.el7' or '3.f24'. This field
is incremented by the packager whenever a new release of the same version
is created.
arch (Text):
The target architecture for a package. For example, 'x86_64', 'i686', or 'noarch'.

pkgId (Text):
Checksum of the package file
checksum_type (Text):
Type of checksum, e.g. 'sha256', 'md5'

summary (Text):
Short description of the packaged software
description (Text):
In-depth description of the packaged software
url (Text):
URL with more information about the packaged software. This could be the project's
website or its code repository.

changelogs (Text):
Changelogs that package contains - see comments below
files (Text):
Files that package contains - see comments below

requires (Text):
Capabilities the package requires - see comments below
provides (Text):
Capabilities the package provides - see comments below
conflicts (Text):
Capabilities the package conflicts with - see comments below
obsoletes (Text):
Capabilities the package obsoletes - see comments below
suggests (Text):
Capabilities the package suggests - see comments below
enhances (Text):
Capabilities the package enhances - see comments below
recommends (Text):
Capabilities the package recommends - see comments below
supplements (Text):
Capabilities the package supplements - see comments below

location_base (Text):
Base location of this package
location_href (Text):
Relative location of package to the repodata

rpm_buildhost (Text):
Hostname of the system that built the package
rpm_group (Text):
RPM group (See: http://fedoraproject.org/wiki/RPMGroups)
rpm_license (Text):
License term applicable to the package software (GPLv2, etc.)
rpm_packager (Text):
Person or persons responsible for creating the package
rpm_sourcerpm (Text):
Name of the source package (srpm) the package was built from
rpm_vendor (Text):
Name of the organization that produced the package
rpm_header_start (BigInteger):
First byte of the header
rpm_header_end (BigInteger):
Last byte of the header

size_archive (BigInteger):
Size, in bytes, of the archive portion of the original package file
size_installed (BigInteger):
Total size, in bytes, of every file installed by this package
size_package (BigInteger):
Size, in bytes, of the package

time_build (BigInteger):
Time the package was built in seconds since the epoch.
time_file (BigInteger):
The mtime of the package file in seconds since the epoch; this is the 'file' time
attribute in the primary XML.
"""
TYPE = 'type'
TYPE = 'rpm'

# Required metadata
name = models.TextField()
epoch = models.TextField()
version = models.TextField()
release = models.TextField()
arch = models.TextField()

pkgId = models.TextField(unique=True) # formerly "checksum"
checksum_type = models.TextField(choices=CHECKSUM_CHOICES)

# Optional metadata
summary = models.TextField(blank=True)
description = models.TextField(blank=True)
url = models.TextField(blank=True)

# A string containing a JSON-encoded list of dictionaries, each of which represents a single
# changelog. Each changelog dict contains the following fields:
#
# date (int): date of changelog - seconds since epoch
# author (str): author of the changelog
# changelog (str: changelog text
changelogs = models.TextField(default='[]', blank=True)

# field1 = models.TextField(blank=False, null=False)
# field2 = models.TextField(blank=False, null=False)
# A string containing a JSON-encoded list of dictionaries, each of which represents a single
# file. Each file dict contains the following fields:
#
# type (str): one of "" (regular file), "dir", "ghost"
# path (str): path to file
# name (str): filename
files = models.TextField(default='[]', blank=True)

# class Meta:
# unique_together = (
# 'field1',
# 'field2'
# )
# Each of these is a string containing a JSON-encoded list of dictionaries, each of which
# represents a dependency. Each dependency dict contains the following fields:
#
# name (str): name
# flags (str): flags
# epoch (str): epoch
# version (str): version
# release (str): release
# pre (bool): preinstall
requires = models.TextField(default='[]', blank=True)
Copy link
Contributor Author

@dralley dralley May 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do 'rich dependencies' fit in here? I couldn't find any information with respect to createrepo_c + 'rich' or 'boolean' dependencies.

Is it something we don't really have to be concerned with at the modelling level?

Copy link
Member

@goosemania goosemania May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I believe we should not be concerned about that for the modeling part.
Any dependency solving should be handled by an external library and it's its business to figure out all the relationships and types of dependencies the package has by repository and package metadata.

And for createrepo_c I'd guess that any rich dependency would be just a string which is not parsed.

provides = models.TextField(default='[]', blank=True)
conflicts = models.TextField(default='[]', blank=True)
obsoletes = models.TextField(default='[]', blank=True)
suggests = models.TextField(default='[]', blank=True)
enhances = models.TextField(default='[]', blank=True)
recommends = models.TextField(default='[]', blank=True)
supplements = models.TextField(default='[]', blank=True)

location_base = models.TextField(blank=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be null, not just empty string. FWIW, I'm testing on the packages from EPEL repo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for at least url and description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Django heavily recommends that you not use null=True with TextField or CharField because it just creates extra "empty" states.

We can do it, but it would be preferable to just store None as an empty string.

location_href = models.TextField(blank=True)

rpm_buildhost = models.TextField(blank=True)
Copy link
Contributor

@jortel jortel May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these particular fields prefixed with rpm_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because that's what createrepo_c does, and we're matching what they do precisely

rpm_group = models.TextField(blank=True)
rpm_license = models.TextField(blank=True)
rpm_packager = models.TextField(blank=True)
rpm_sourcerpm = models.TextField(blank=True)
rpm_vendor = models.TextField(blank=True)
rpm_header_start = models.BigIntegerField(null=True, blank=True)
rpm_header_end = models.BigIntegerField(null=True, blank=True)

size_archive = models.BigIntegerField(null=True, blank=True)
size_installed = models.BigIntegerField(null=True, blank=True)
size_package = models.BigIntegerField(null=True, blank=True)

time_build = models.BigIntegerField(null=True, blank=True)
time_file = models.BigIntegerField(null=True, blank=True)

@property
def nevra(self):
""" Package NEVRA string (Name-Epoch-Version-Release-Architecture)
"""
return "{n}-{e}:{v}-{r}.{a}".format(
n=self.name, e=self.epoch, v=self.version, r=self.release, a=self.arch)

@property
def nvra(self):
""" Package NVRA string (Name-Version-Release-Architecture)
"""
return "{n}-{v}-{r}.{a}".format(
n=self.name, v=self.version, r=self.release, a=self.arch)

class Meta:
unique_together = (
'name', 'epoch', 'version', 'release', 'arch', 'checksum_type', 'pkgId'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can someone remind me, why we use checksum_type and checksum(pkgId) as a part of the unique index?

Why can't we leave just checksum_type? The only reason to support two packages with the same nevra and checksum_type but different checksum(pkgId) is if one package is wrong and another one is a correct one which was fixed/updated, thus checksum(pkgId) is different. What do you all think? Is it a valid use case?

Why can't we just leave checksum(pkgId) in the unique index? It will cover the use case ^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checksum value gives an easy way to know if the same package has been signed by different gpg keys (or no key at all). That's the primary reason checksum is included in the unit key. It could be better to use information about the signature directly in the unit key, but that's a lot harder to access since it isn't included in any of the XML metadata. You need to inspect the RPM file directly to get at it.

Assuming you keep the checksum value in the unit key, the checksum type isn't very helpful for establishing uniqueness. But it's such a fundamental part of the checksum value, that it would be weird to have a value without knowing the type. So that's why both have been included in the unit key.

)


class SrpmContent(RpmContent):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a separate content type, or an is_srpm=True boolean flag on the RPM model?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm personally +1 to have a separate type. It will be a separate table in DB, right? I think it will make all the type related operations easier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of how it's modeled. it would be great if when filtering packages that you have 1 endpoint that has both rpms and srpms with a flag like is_srpm=True. With that in mind modeling it as the same may be the easiest. I'm ok with either approach.

"""
The "srpm" content type

(same as the "rpm" content type)
"""
TYPE = 'srpm'


class RpmRemote(Remote):
Expand Down