Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse Source package from package information databases #638

Merged
merged 5 commits into from
Oct 15, 2018

Conversation

KeyboardNerd
Copy link
Contributor

In order to resolve the source package problem, the first step is to extract the source package from the package information files.
This PR contains the following as first step:

  1. Refactor featurefmt testing code to accept a csv file containing all the expected extracted packages.
  2. Modify dpkg code to parse the source packages from the dpkg database.
  3. Modify database model Feature to have Parent pointer, which represents the source package relationship. When ever a root feature is affected by some vulnerability, the tree of features from that root are affected.

Feature now has a pointer to parent feature. If a vulnerability affects
a parent feature, this child feature will be affected.
// will also affect this feature.
//
// e.g. A source package is the parent feature of a binary package.
Parent *Feature
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use Parent or have Children list?
The problem with using Children list is that lots of places in the code base requires Feature to be Hashable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that just because we're using it as the key in a map? If so, it's not too hard to implement String() on this type and use that as the key.

edit: oh, it's also because of the use of sets now, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mapset uses map internally to store elements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm wrong. I assumed that when comparing pointers, mapset checks the internal structure. This is not true. Yes, we'll need a pre-hash function for the Feature type.

@@ -20,6 +20,8 @@ import (
"regexp"
"strings"

"github.com/deckarep/golang-set"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -36,33 +38,87 @@ var (

type lister struct{}

// dpkgPackageMetadata stores the direct parsed objects from the dpkg database.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directly

return p.PackageVersion
}

func convertToFeatures(pkgs mapset.Set) []database.Feature {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes me think that we should have Feature to be an interface (ADT) having

GetName(), GetVersionFormat(),...

so that we don't have to copy everything

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that overcomplicates things. These lists aren't that big and each struct is only going to be marginally larger than an the struct that backs interfaces in the Go runtime.

I would like the deduplicate this code across formats, though.

pkg.Name = ""
pkg.Version = ""
}
if pkg.Valid() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

"github.com/stretchr/testify/require"

"github.com/coreos/clair/ext/featurefmt"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove


"github.com/coreos/clair/database"
"github.com/coreos/clair/pkg/tarutil"
"github.com/stretchr/testify/assert"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regroup

"github.com/stretchr/testify/assert"
)

// TestData represents the data used to test an implementation of Lister.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either we'll keep this for small instances of package information testing, or we migrate to the new way of testing, which I prefer for larger sample.

//
// The CSV File Must have all extractable packages from the package installation
// source file.
func LoadExpectedResult(relativeFilePath string) []ExpectedPackage {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't export

ext/featurefmt/featurefmttest/featurefmttest.go Outdated Show resolved Hide resolved
@KeyboardNerd KeyboardNerd force-pushed the featureTree branch 2 times, most recently from d6cad0e to 532a40b Compare October 10, 2018 20:28
@KeyboardNerd KeyboardNerd changed the title Add parent feature to Feature model Parse Source package from package information databases Oct 10, 2018
@KeyboardNerd
Copy link
Contributor Author

KeyboardNerd commented Oct 10, 2018

These commits should be all for this PR.

I found a problem with using versionfmt as the hint for associating the namespace.
Instead of the version format, it should be the package manager's name. As we can assume that normally operating system can only use one package manager to install packages.
e.g.
ubuntu:latest have dpkg database, which refers to all features detected in dpkg database.
alpine:latest have apk database, which refers to all features detected in apk database.

If there are multiple namespaces, we can just determine the namespace based on the package installer type.

I suggest to have:
Version Format <-> Package Manager <-> Feature
e.g. Feature openssl:v1.0 is under package manager APK, APK knows it can parse the version using dpkg version parser.

Package manager has the version format to parse the feature, which is found under the package manager.
Our assumption is:
Namespace <-> Package Manager
e.g. Ubuntu:latest has package manager dpkg, therefore, all features that don't have namespace but installed by package manager dpkg should be associated with Ubuntu:latest

Maybe, we can remodel the database to have:

Ancestry -> Layer/NamespacedFeature
Layer -> Feature/Namespace
Feature: Name, Version, Detector
Namespace: Name, Version, Detector
Detector: Name, VersionFormat
NamespacedFeature: Feature+Namespace

@jzelinskie jzelinskie added this to the v3.0.0 milestone Oct 11, 2018
@jzelinskie jzelinskie added kind/design relates to the fundamental design of a component priority/high important functionality component/ext/imagefmt labels Oct 11, 2018
// will also affect this feature.
//
// e.g. A source package is the parent feature of a binary package.
Parent *Feature
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that just because we're using it as the key in a map? If so, it's not too hard to implement String() on this type and use that as the key.

edit: oh, it's also because of the use of sets now, too?

ext/featurefmt/apk/apk.go Outdated Show resolved Hide resolved
ext/featurefmt/apk/apk.go Outdated Show resolved Hide resolved
ext/featurefmt/apk/apk.go Outdated Show resolved Hide resolved
apk-tools,2.6.7-r0,,
scanelf,1.1.6-r0,pax-utils,1.1.6-r0
musl-utils,1.1.14-r10,musl,1.1.14-r10
libc-utils,0.7-r0,libc-dev,0.7-r0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda dislike testdata directories like this. We can keep this in CSV format if you'd like but we should store it in the same package like this:

var testData = `
musl,1.1.14-r10,,
busybox,1.24.2-r9,,
alpine-baselayout,3.0.3-r0,,
alpine-keys,1.1-r0,,
zlib,1.2.8-r2,,
libcrypto1.0,1.0.2h-r1,openssl,1.0.2h-r1
libssl1.0,1.0.2h-r1,openssl,1.0.2h-r1
apk-tools,2.6.7-r0,,
scanelf,1.1.6-r0,pax-utils,1.1.6-r0
musl-utils,1.1.14-r10,musl,1.1.14-r10
libc-utils,0.7-r0,libc-dev,0.7-r0 
`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that? it's easier to manipulate the expected output CSV file if it's directly stored as a file. Both dpkg and rpm, which support formatting, can easily generate CSV output. I think it's just extra effort to put these inside a go file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid complicating the tests with code that loads test files.
I've noticed that in the standard library, they only use testdata directories for benchmarks.
I don't feel super strong about it though, since we already do this for a bunch of other tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in std not very true https://github.com/golang/go/blob/a0d6420d8be2ae7164797051ec74fa2a2df466a1/src/net/interface_linux_test.go#L93
in this case, it loads the test file for testing the parser. Basically the same usage as ours.

Yes, you're right that loading test file will involve extra code complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find. I think we should play things by ear. If there are many test files or they're in a format that's a pain to edit without tools, we should use a testdata directory with files. If it isn't much of a pain to keep things in the test package in Go source files, we should do that.

I leave the decision up to you.

return p.PackageVersion
}

func convertToFeatures(pkgs mapset.Set) []database.Feature {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that overcomplicates things. These lists aren't that big and each struct is only going to be marginally larger than an the struct that backs interfaces in the Go runtime.

I would like the deduplicate this code across formats, though.

ext/featurefmt/featurefmttest/featurefmttest.go Outdated Show resolved Hide resolved
ext/featurefmt/rpm/rpm.go Outdated Show resolved Hide resolved
ext/featurefmt/rpm/rpm.go Outdated Show resolved Hide resolved
@KeyboardNerd KeyboardNerd force-pushed the featureTree branch 3 times, most recently from fe53af1 to 51e3793 Compare October 11, 2018 21:37
1. Featurefmt testing code is moved to featurefmttest package.
2. Featurefmt now can be tested against a csv file, which contains the
expected package information result.
The source package metadata is extracted from the source line instead
of forcing the binary package to have source package information.
Source package is now extracted from the RPM database by using
${SourceRPM} option in the rpm --qf argument.
"o" field is used to extract the Package Origin from the APK database.
@KeyboardNerd
Copy link
Contributor Author

addressed comments
still keep Parent *Feature for now

Copy link
Contributor

@jzelinskie jzelinskie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for now. We can discuss where to go next after this is merged.

@KeyboardNerd KeyboardNerd merged commit fe614f2 into quay:master Oct 15, 2018
@KeyboardNerd KeyboardNerd deleted the featureTree branch October 15, 2018 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/design relates to the fundamental design of a component priority/high important functionality
Development

Successfully merging this pull request may close these issues.

None yet

2 participants