Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internally version all detected content by extension #620

Merged
merged 14 commits into from
Oct 8, 2018

Conversation

KeyboardNerd
Copy link
Contributor

@KeyboardNerd KeyboardNerd commented Sep 12, 2018

Conceptual change:

  • Detector now represents anything that extracts information from a layer blob. Detector has two different types for now: Namespace ( Detector to extract namespace) and Feature ( Detector to extract features )
  • Processor and Lister are renamed to Detector

Changes in API:

  • Add Detector message type, Namespace message type
  • Add detector field to Feature/Namespace to indicate which detector finds a specific feature or namespace
  • Change all scanned_* to be just detector

Changes in Database Model:

  • Every features and namespaces are associated with the detectors that found them.
  • Aggregated concepts:
    • Entities: objects relating the vulnerabilities and the ancestries as the result of Analysis.
    • Ancestries: layers and ancestries scanned with detectors with metadata (name/structure/relation to entities/detectors) saved in the database.
    • Vulnerabilities: vulns fetched by the updaters.
    • Notification: vulnerability notifications spawned by vulnerability changes.
    • UpdaterLock: lock and keyValue.

Changes in Behavior:

  • Remove Clair side (dis)enable detectors in configuration. Now, every layer is scanned with every detectors.

Additional responsibility to Client:

  • Client should filter out the features that are not necessary by filtering the detectors ( This responsibility maybe moved to API level on server side in the future )

@KeyboardNerd
Copy link
Contributor Author

Please review clair.proto and models.go to confirm the expectation. Those are expected to not be changed in the following commits.

Copy link
Contributor Author

@KeyboardNerd KeyboardNerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self comments

}

// detector is analysis extensions used by the worker.
detector = dbutil.MigrationQuery{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New Table: Detectors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detector should contain all metadata information of all the extensions used to scan a layer.

var (
// entities are the basic building blocks to relate the vulnerabilities with
// the ancestry.
entities = dbutil.MigrationQuery{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entities are the real immutable content in the database.

On a higher level, Layer ( layer hash with layer content ) is not really immutable because a new extension can scan a layer and add new associated features.

if name == "" {
panic("featurefmt: could not register a Lister with an empty name")
}
func RegisterLister(l Lister) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change Lister to detector in the following PRs.


// Detector is an extention to scan a layer's content.
type Detector struct {
// Name of the detector
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review this design to use string to represent the Detector Type.
I can also define FeatureDetector and NamespaceDetector struct types for the same purpose.

config.yaml.sample Show resolved Hide resolved
cmd/clair/main.go Outdated Show resolved Hide resolved
api/v3/clairpb/clair.proto Show resolved Hide resolved
@@ -71,6 +65,7 @@ type Layer struct {
type Namespace struct {
Name string
VersionFormat string
DetectedBy Detector
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Detector here maybe not valid. As the reason shows in the database, namespace/feature may be detected by two different detectors in two different layers.

database/detector.go Show resolved Hide resolved
// Version of the detector
Version string
// Type of the detector
Type DetectorType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type is a keyword so I'm a little uneasy about using this name.
As a counter-point, the reflect standard library package does define their own type Type interface {...}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

// NewNamespaceDetector returns a new namespace detector.
func NewNamespaceDetector(name string, version string) Detector {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name, verison string

}

// NewFeatureDetector returns a new feature detector.
func NewFeatureDetector(name string, version string) Detector {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name, version string

}
// The name of the detector.
string name = 1;
// The Version of the detector.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version -> version

RegisterMigration(dbutil.NewSimpleMigration(1,
[]dbutil.MigrationQuery{
entities, detector, layer, ancestry,
vulnerability, updaterLock, notification,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one line each

@@ -0,0 +1,30 @@
package dbutil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copyright headers

and let's move this into database/psql/migrations

All processors will now be used to process the layers.
@KeyboardNerd KeyboardNerd force-pushed the feature/detector branch 4 times, most recently from b67ac8c to 11e5efa Compare September 19, 2018 19:03
type Layer struct {
Hash string
Namespaces DetectedNamespaces
DetectedFeatures DetectedFeatures
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DetectedFeatures -> Features

// ProcessedBy.
// DetectedNamespaces contains a map from detector to the detected namespace.
// Nil detected namespace means it's not found.
type DetectedNamespaces map[Detector]*Namespace
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new map type introduces a lot more complexity. It makes the worker complicated and also the database implementation. Maybe, we don't want this map to organize since the map will at most contain around 1000 features.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flatten the map to be []DetectedNamespace, which contains Detector and Namespace.
This is for simplifying the data structure, we can then use pointers and a new data structure if we need to improve the performance.

@KeyboardNerd KeyboardNerd force-pushed the feature/detector branch 5 times, most recently from c6d9b77 to 51a6a87 Compare September 27, 2018 22:09
@KeyboardNerd KeyboardNerd changed the title WIP: Associate Detector with Feature and Namespace Internally version all detected content by extension Sep 28, 2018
@@ -44,17 +44,42 @@ message Vulnerability {
repeated Feature affected_versions = 8;
}

message Detector {
enum Type {
DETECTOR_TYPE_INVALID = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is a requirement by the prototool linter.

api/v3/util.go Show resolved Hide resolved
api/v3/util.go Outdated
// this feature for now, we should refactor the implementation if there's
// any performance issue. It's expected that the number of features is less
// than 1000.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove line

api/v3/util.go Show resolved Hide resolved
clair.EnabledUpdaters = strutil.CompareStringListsInBoth(config.Updater.EnabledUpdaters, updaters)
"Detectors": strings.Join(database.SerializeDetectors(clair.EnabledDetectors), ","),
"Updaters": strings.Join(clair.EnabledUpdaters, ","),
}).Info("Enabled Clair components")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: lower case

nslist = append(nslist, *ns)
}
return nslist, nil
return namespaces, nil
}

// RequiredFilenames returns the total list of files required for all
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update the comment

@@ -3,11 +3,12 @@ package featurens_test
import (
"testing"

"github.com/coreos/clair/ext/featurens"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re order

pkg/dbutil/dbutil.go Outdated Show resolved Hide resolved
return nil
}

func CacheRelatedVulnerability(datastore database.Datastore, features []database.NamespacedFeature) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: I start to think that the abstraction on database layer is wrong. Somehow we need to figure out a way to get ride of the implicit database feature relationship.

return FilterDetectors(detectors, database.NamespaceType)
}

func FilterDetectors(detectors []database.Detector, dtype database.DetectorType) []database.Detector {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this being used? remove

@KeyboardNerd KeyboardNerd force-pushed the feature/detector branch 2 times, most recently from 9e42bed to 4804d41 Compare September 28, 2018 16:42
Copy link
Contributor Author

@KeyboardNerd KeyboardNerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to other PR reviewer
TODO means it's a thing for the future PR not this one
NIT means it's not blocking merging the PR
Not related means it's a old issue and can be not addressed in this PR
Otherwise it means blocking the PR.

repeated string scanned_listers = 4;
// The configured list of namespace detectors used to scan an ancestry.
repeated string scanned_detectors = 5;
// The detectors used to scan this Ancestry. It maybe not the current set of
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase: The detectors used to scan this Ancestry. It may not be the same set of detectors in the Clair instance.

api/v3/util.go Show resolved Hide resolved
ext/featurefmt/driver.go Show resolved Hide resolved
ext/featurens/driver.go Show resolved Hide resolved
worker.go Outdated Show resolved Hide resolved
worker.go Outdated Show resolved Hide resolved
worker.go Show resolved Hide resolved
worker.go Outdated
}
if !ok {
return false, nil
ancestry, ok, err := dbutil.FindAncestry(datastore, name)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's wasteful to find ancestry and all the features associated here. We should later consider to have this function on database interface.

worker.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jzelinskie jzelinskie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my first pass. I'm going to need to take a look much harder at the database changes.

return &Detector{
Name: detector.Name,
Version: detector.Version,
Type: Detector_Type(Detector_Type_value[string(detector.DType)]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow this kinda sucks. I wonder if gogo protobuf generates something more ergonomic than this.

// NamespaceDetectorType is a type of detector that extracts the namespaces.
NamespaceDetectorType DetectorType = "DETECTOR_TYPE_NAMESPACE"
// FeatureDetectorType is a type of detector that extracts the features.
FeatureDetectorType DetectorType = "DETECTOR_TYPE_FEATURE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to even use a string? Why not have an integer?

const (
  NamespaceDectectorType DetectorType = iota
  FeatureDetectorType
  ...
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I want to store enum in the database. Well I can write a conversion to make this work anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enums are most often just integers in the database that have special meaning for the application. You use enums because it makes the column more simple (thus easier to migrate, copy, compare values etc...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the point isn't it. DetectorType (Detector Enum) is just like Severity Enum. You can look up the database schema for that.

}

// SerializeDetectors returns the string representation of given detectors.
func SerializeDetectors(detectors []Detector) []string {
Copy link
Contributor

@jzelinskie jzelinskie Oct 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could put this in strutil if you change the function to be:

func StringifyList(xs []fmt.Stringer) (ys []string) {
  for _, x := range xs {
    ys = append(ys, x.String())
  }
  return
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then it will require the type conversion from []database.Detector to []fmt.Stringer, which sucks

Copy link
Contributor

@jzelinskie jzelinskie Oct 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. This is super unfortunate... It's because interfaces are different memory layouts and creating a list of them is O(N), so you have to explicitly make the copy yourself rather than having Go do it for you when you pass it into the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes me think if this kind of design is wrong after all. I think we are thinking about functions using generics but interfaces are not generics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should keep the original design for the serialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


// DetectedFeatures are the features introduced by this layer when it was
// Hash is the name of the ancestry's layer.
Hash string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is comment also wrong. Hash is the sha256 digest of the manifest for a particular image.

database/pgsql/detector.go Show resolved Hide resolved
// See the License for the specific language governing permissions and
// limitations under the License.

package dbutil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think everything in here makes sense to just have in the database package rather than in a utility package.

@@ -0,0 +1,285 @@
package testutil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of this is useful for only the database. I think this would also be fine being just in the database package.

Also, copyright header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those are for comparing the database models but it's not only for the database tests. It's also used in the worker test and in the future updater test.

Copy link
Contributor

@jzelinskie jzelinskie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general approach for the database is good. We discussed it enough beforehand that it's pretty much exactly what I expected. I think that there were a bunch of things I noticed for database optimizations, but I don't think we should really spend time on that until we're positive it's the finalized schema.

some additional thoughts... i ran the codebase and sat pondering our current schema for a bit and came up with a few things that might not necessarily need to be done in this PR:

Get rid of namespaced_feature

ancestry_feature -> (id, ancestry_layer_id, feat_id, feat_detector_id, ns_id, ns_detector_id)

make kv table consistent

  • rename fields to updater/$NAMESPACE
  • prefix value to include the SCM: git-sha:hf92dk9

detector type

  • rename column type -> dtype
  • change column to this enum: CREATE TYPE detector_type AS ENUM ('feature', 'namespace');

debian:unstable ns needs to be mapped to a number

we should lstrip "v" off of the alpine namespace version numbers, no other namespaces do that

@@ -180,6 +154,18 @@ func Boot(config *Config) {
st.Stop()
}

func initLogger(flagLogLevel *string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I avoid using the word init to avoid confusion around the keyword.


clair.EnabledUpdaters = strutil.CompareStringListsInBoth(config.Updater.EnabledUpdaters, updaters)
"Detectors": strings.Join(database.SerializeDetectors(clair.EnabledDetectors), ","),
"Updaters": strings.Join(clair.EnabledUpdaters, ","),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these have to be strings? I'd be nice to keep them as arrays in the JSON logs.
Also the detectors string is borderline unreadable:

{"Detectors":"DETECTOR_TYPE_FEATUREDetector/apk/1.0,DETECTOR_TYPE_FEATUREDetector/dpkg/1.0,DETECTOR_TYPE_FEATUREDetector/rpm/1.0,DETECTOR_TYPE_NAMESPACEDetector/alpine-release/1.0,DETECTOR_TYPE_NAMESPACEDetector/apt-sources/1.0,DETECTOR_TYPE_NAMESPACEDetector/lsb-release/1.0,DETECTOR_TYPE_NAMESPACEDetector/os-release/1.0,DETECTOR_TYPE_NAMESPACEDetector/redhat-release/1.0"}

database/dbutil.go Outdated Show resolved Hide resolved
database/dbutil.go Outdated Show resolved Hide resolved

// PersistNamespaces wraps session PersistNamespaces function with begin and
// commit.
func PersistNamespaces(datastore Datastore, namespaces []Namespace) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should come up with a naming convention for the functions that are just wrapping calls in a transaction.

database/dbutil.go Show resolved Hide resolved
}
}

return "", ErrFailedToParseSeverity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copypasta?


// String returns a unique string representation of the detector.
func (d Detector) String() string {
return fmt.Sprintf("%sDetector/%s/%s", d.DType, d.Name, d.Version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I comment on main, but yeah this is pretty verbose and hard to read

}

// SerializeDetectors returns the string representation of given detectors.
func SerializeDetectors(detectors []Detector) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// Ancestry is a manifest that keeps all layers in an image in order.
type Ancestry struct {
// Name is expected to be the sha-256 digest of the manifest for a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name is a globally unique value for a set of layers. This is often the sha256 digest of an OCI/Docker manifest.

All detected features and namespaces under the context of Layer and
Ancestry will now have the detectors associated, so that the API can
provide the detection information to the Client.
'detector' table is added to store the metadata of detectors.
'layer_feature', 'layer_namespace', and 'ancestry_feature' tables are
modified to store the detection relationship between the
feature/namespace with the detector.
Golang-set library is added to make it easier to support set operations.
1. Every Lister and Detector are versioned
2. detected content, are returned in a map with detector info as the key
Aggregate queries in their corresponding files instead of having the
single file for every queries because the database is more complicated.
Change the V3 implementation to accommondate the detectors.
The worker is changed to accommodate the new database model and API.
Worker is refactored to move the database query helper functions to pkg.
* Refactor layer and ancestry
* Add tests
* Fix bugs introduced when the queries were moved
@KeyboardNerd
Copy link
Contributor Author

This is done:
detector type
rename column type -> dtype
change column to this enum: CREATE TYPE detector_type AS ENUM ('feature', 'namespace');

Move dbutil and testutil to database from pkg
Rename all "result"
All database utility functions are renamed to explicitly say if it will
commit changes or rollback changes on success.
Rename detector type to DType because all reserved key words should be
avoided used as type name or variable name.
@KeyboardNerd KeyboardNerd merged commit 3c72fa2 into quay:master Oct 8, 2018
@KeyboardNerd KeyboardNerd deleted the feature/detector branch October 8, 2018 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants