
Orchestrator - Discovery + Run scan job #3

Merged

fishkerez merged 23 commits into main from targets-discovery on Nov 20, 2022
Conversation

@fishkerez (Contributor) commented Nov 13, 2022

Add Orchestrator logic, same as in KubeClarity.

Main features:

  • Get a runtime scan request with a scan scope
  • Discover all the instances that need to be scanned
  • For each instance, run a scanner job (a minimal sketch of this flow follows the scope lists below):
    • Take a snapshot of the target instance's root volume
    • Spin up a scanning job with the snapshot attached to it
    • Delete the job and all related resources when done

Discovery flow according to the Scan Scope Configuration:

  • All, or…
  • By regions, where in each region the user can choose specific:
    - VPCs, and in each VPC specific security groups

In each scope type, the user can additionally choose:

  • Non-running (stopped) VMs
  • Excluded Tags
  • Included Tags
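
For reference, here is a minimal sketch of the discover → snapshot → scan → cleanup flow described above. The Provider, Instance, and Snapshot interfaces are hypothetical stand-ins for illustration, not the actual types introduced in this PR:

package orchestrator

import (
	"context"
	"fmt"
	"log"
)

// Hypothetical interfaces standing in for the real provider types.
type Snapshot interface {
	ID() string
	Delete(ctx context.Context) error
}

type Instance interface {
	ID() string
	SnapshotRootVolume(ctx context.Context) (Snapshot, error)
}

type Provider interface {
	// Discover returns the instances that match the configured scan scope.
	Discover(ctx context.Context) ([]Instance, error)
	// RunScanningJob spins up a scanner job with the snapshot attached.
	RunScanningJob(ctx context.Context, snap Snapshot) error
}

// runScan walks the flow from the PR description: discover, snapshot,
// scan, and clean up, once per instance.
func runScan(ctx context.Context, p Provider) error {
	instances, err := p.Discover(ctx)
	if err != nil {
		return fmt.Errorf("discovery failed: %w", err)
	}
	for _, inst := range instances {
		snap, err := inst.SnapshotRootVolume(ctx)
		if err != nil {
			log.Printf("snapshot of instance %s failed: %v", inst.ID(), err)
			continue
		}
		if err := p.RunScanningJob(ctx, snap); err != nil {
			log.Printf("scan job for instance %s failed: %v", inst.ID(), err)
		}
		// Delete the snapshot (the real code also deletes the job's
		// other resources) when the job is done.
		if err := snap.Delete(ctx); err != nil {
			log.Printf("cleanup of snapshot %s failed: %v", snap.ID(), err)
		}
	}
	return nil
}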

@fishkerez fishkerez requested review from akpsgit, FrimIdan and a user and removed request for akpsgit, FrimIdan and a user November 13, 2022 08:20
@fishkerez fishkerez self-assigned this Nov 13, 2022
@fishkerez fishkerez requested review from FrimIdan and akpsgit and removed request for akpsgit and FrimIdan November 13, 2022 09:10
@akpsgit akpsgit requested review from a user, pbalogh-sa and FrimIdan and removed request for FrimIdan, a user and pbalogh-sa November 13, 2022 16:39
runtime_scan/pkg/config/config.go (outdated; thread resolved)
runtime_scan/pkg/provider/aws/client.go (outdated; thread resolved)
runtime_scan/pkg/provider/aws/client.go (outdated; thread resolved)
func (c *Client) WaitForInstanceReady(instance types.Instance) error {
	ticker := time.NewTicker(3 * time.Second)
	defer ticker.Stop()
	timeout := time.After(3 * time.Minute)
A reviewer commented:
We should probably use a context.WithTimeout here, so that if we time out here the timeout goes through to the ec2Client too, like:

ctx, cancel := context.WithTimeout(context.Background(), 3*time.Minute)
defer cancel()
for {
	select {
	case <-time.After(3 * time.Second):
		out, err := c.ec2Client.DescribeInstances(ctx, &ec2.DescribeInstancesInput{
		....
	case <-ctx.Done():
		return ctx.Err()
	}
}

I've also changed to putting the time.After into the for loop, because using a ticker can result in two requests happening immediately if the request round trip takes about the same amount of time as the ticker.

@fishkerez (author) replied:
Good idea

runtime_scan/pkg/provider/aws/client.go (outdated; thread resolved)
runtime_scan/pkg/config/config.go (thread resolved)
runtime_scan/pkg/provider/aws/client.go (thread resolved)
runtime_scan/pkg/provider/aws/client.go (outdated; thread resolved)
runtime_scan/pkg/scanner/job_managment.go (thread resolved)
runtime_scan/pkg/scanner/job_managment.go (thread resolved)
runtime_scan/pkg/scanner/job_managment.go (thread resolved)
select {
case <-data.resultChan:
	log.WithFields(s.logFields).Infof("Instance scanned result has arrived. instanceID=%v", data.instance.ID)
case <-ticker.C:
A reviewer suggested:
Suggested change:
- case <-ticker.C:
+ case <-time.After(s.scanConfig.JobResultTimeout):

A contributor replied:
let's put a TODO with Sam's comment, and at the end of the project see what the diff is between the orchestrator here and KubeClarity, and handle the comments as part of that diff. WDYT?

	data.timeout = true
	data.completed = true
	s.Unlock()
case <-ks:
A reviewer commented:
ditto: replace with context

runtime_scan/pkg/scanner/job_managment.go (thread resolved)
	return s.providerClient.RunScanningJob(jobConfig)
}

func (s *Scanner) deleteJobIfNeeded(job *types.Job, isSuccessfulJob, isCompletedJob bool) {
A reviewer commented:
doesn't need to be part of *Scanner

A contributor replied:
let's put a TODO with Sam's comment, and at the end of the project see what the diff is between the orchestrator here and KubeClarity, and handle the comments as part of that diff. WDYT?

runtime_scan/pkg/types/types.go (outdated; thread resolved)
runtime_scan/pkg/types/types.go (thread resolved)
runtime_scan/pkg/types/types.go (thread resolved)
runtime_scan/pkg/types/types.go (outdated; thread resolved)
runtime_scan/pkg/types/types.go (thread resolved)
Erez Fishhimer added 2 commits November 17, 2022 13:53
ghost previously approved these changes Nov 17, 2022

@ghost left a comment:
Thanks Erez, let's get this merged and follow up with any cleanups/refactors.


- func (c *Client) LaunchInstance(ami, deviceName, subnetID string, snapshot types.Snapshot) (types.Instance, error) {
-     out, err := c.ec2Client.RunInstances(context.TODO(), &ec2.RunInstancesInput{
+ func (c *Client) LaunchInstance(ctx context.Context, snapshot provider.Snapshot) (provider.Instance, error) {
A reviewer commented:
We can make this a function of the snapshot interface, i.e.:

instance := snapshot.LaunchScannerInstance(ctx)

then all the snapshot info that we need can remain private to the snapshot implementation struct.

Alternatively, we can cast the snapshot when we receive it; then we can use any private fields and prevent weird errors:

func (c *Client) LaunchInstance(ctx context.Context, snapshot provider.Snapshot) (provider.Instance, error) {
	awsSnapshot, ok := snapshot.(*SnapshotImpl)
	if !ok {
		return nil, fmt.Errorf("cannot launch AWS instance with non-AWS snapshot")
	}
	// ... use awsSnapshot's private fields to launch the instance ...
}

Reply:
this can be done in a follow-up

}
out, err := v.ec2Client.CreateSnapshot(ctx, &params, func(options *ec2.Options) {
	options.Region = v.region
})
A reviewer commented:
I wonder if the requirement to move the snapshots between regions is AWS-specific weirdness; if that is the case, it can be part of this function. The AWS provider snapshot implementation could then keep track of both src and dest snapshot IDs for the cleanup.
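
As a rough illustration of that bookkeeping, a snapshot type could carry both IDs and clean them up together. The struct, field, and method names below are hypothetical, assuming the aws-sdk-go-v2 ec2 client used elsewhere in this file:

type awsSnapshot struct {
	ec2Client *ec2.Client
	srcID     string // snapshot created in the instance's own region
	srcRegion string
	dstID     string // cross-region copy; empty when no copy was needed
	dstRegion string
}

// Delete removes the cross-region copy first (if one exists), then the source snapshot.
func (s *awsSnapshot) Delete(ctx context.Context) error {
	if s.dstID != "" {
		if _, err := s.ec2Client.DeleteSnapshot(ctx, &ec2.DeleteSnapshotInput{
			SnapshotId: &s.dstID,
		}, func(o *ec2.Options) { o.Region = s.dstRegion }); err != nil {
			return fmt.Errorf("failed to delete snapshot copy %v: %v", s.dstID, err)
		}
	}
	if _, err := s.ec2Client.DeleteSnapshot(ctx, &ec2.DeleteSnapshotInput{
		SnapshotId: &s.srcID,
	}, func(o *ec2.Options) { o.Region = s.srcRegion }); err != nil {
		return fmt.Errorf("failed to delete snapshot %v: %v", s.srcID, err)
	}
	return nil
}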

Reply:
This can be cleaned up in follow-up PRs


cpySnapshot, err := snapshot.Copy(ctx, s.region)
if err != nil {
	return provider.Job{}, fmt.Errorf("failed to copy snapshot %v: %v", snapshot.GetID(), err)
}
A reviewer commented:
Maybe wrap this in `if s.region != snapshot.GetRegion()`? In the case where we don't need to copy the snapshot, we can leave SrcSnapshot as nil in the job for the cleanup.
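
A sketch of that guard, reusing the names visible in the diff above (the SrcSnapshot bookkeeping stays a comment, since that field is hypothetical here):

cpySnapshot := snapshot
if s.region != snapshot.GetRegion() {
	var err error
	cpySnapshot, err = snapshot.Copy(ctx, s.region)
	if err != nil {
		return provider.Job{}, fmt.Errorf("failed to copy snapshot %v: %v", snapshot.GetID(), err)
	}
}
// When no copy was made, leave the job's SrcSnapshot nil so cleanup skips it.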

Reply:
This can be cleaned up in follow-up PRs

	ImageID    string
	DeviceName string
	SubnetID   string
}
A reviewer commented:
nit: I don't think Job/JobConfig belong in the provider package; they are part of the scanner and should refer to objects from the providers.

Reply:
This can be cleaned up in follow-up PRs

@@ -304,18 +304,30 @@ func (c *Client) ListAllRegions(ctx context.Context) ([]Region, error) {
return ret, nil
}

// AND logic - if excludeTags = {tag1:val1, tag2:val2},
// then instance will be excluded only if he have ALL this tags ({tag1:val1, tag2:val2})
A contributor commented:
// then an instance will be excluded only if it has ALL these tags ({tag1:val1, tag2:val2})
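
To make the AND semantics concrete, here is a minimal illustrative check (hypothetical; the real code works on []*types.Tag rather than maps):

// hasAllExcludeTags reports whether the instance carries every exclude tag.
func hasAllExcludeTags(instanceTags, excludeTags map[string]string) bool {
	if len(excludeTags) == 0 {
		return false // nothing to exclude on
	}
	for key, val := range excludeTags {
		if got, ok := instanceTags[key]; !ok || got != val {
			return false // one missing tag means the instance is NOT excluded
		}
	}
	return true // the instance has ALL the exclude tags
}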

@@ -314,7 +314,7 @@ func Test_hasExcludedTags(t *testing.T) {
 			want: false,
 		},
 		{
-			name: "instance has excluded tags",
+			name: "instance does not have ALL the excluded tags",
A contributor commented:
worth also adding a test where one of the excluded tags is not matched (the partial-matching case)
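
For example, a table-driven case along these lines could cover the partial match (field names are assumptions based on the surrounding test, not the actual structs):

{
	name: "instance has only some of the excluded tags (partial match)",
	args: args{
		excludeTags:  []*types.Tag{{Key: "tag1", Val: "val1"}, {Key: "tag2", Val: "val2"}},
		instanceTags: []*types.Tag{{Key: "tag1", Val: "val1"}},
	},
	want: false,
},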

@fishkerez (author) replied:
There is also this test: "instance does not have ALL the excluded tags"

@@ -77,7 +77,7 @@ import (
 // },
 // },
 // ScanStopped: true,
-// IncludeTags: []*types.Tag{
+// TagSelector: []*types.Tag{
A contributor commented:
can we delete the dead tests?

@fishkerez (author) replied:
deleted

)

//
//func TestClient_ListAllRegions(t *testing.T) {
A contributor commented:
do we want to delete this?

@fishkerez (author) replied:
deleted

akpsgit previously approved these changes Nov 20, 2022
Erez Fishhimer added 2 commits November 20, 2022 12:59
@fishkerez fishkerez merged commit cc41f0d into main Nov 20, 2022
@fishkerez fishkerez deleted the targets-discovery branch November 20, 2022 11:12