
System Artifact Manager for storing and managing system-generated non-OCI artifact data #16646

Closed

Conversation

prahaladdarkin (Contributor)

Scope of change:

  • System artifact manager model definitions
  • System artifact manager subsystem implementation
  • Integration of system artifact clean-up jobs with job service
  • Unit tests/functional tests for the above changes

Signed-off-by: prahaladdarkin prahaladd@vmware.com


codecov bot commented Apr 4, 2022

Codecov Report

Merging #16646 (dac5fe7) into main (846dc94) will decrease coverage by 0.03%.
The diff coverage is 77.45%.


@@            Coverage Diff             @@
##             main   #16646      +/-   ##
==========================================
- Coverage   67.39%   67.35%   -0.04%     
==========================================
  Files         951      960       +9     
  Lines       78909    79229     +320     
  Branches     2321     2331      +10     
==========================================
+ Hits        53183    53368     +185     
- Misses      22167    22276     +109     
- Partials     3559     3585      +26     
Flag Coverage Δ
unittests 67.35% <77.45%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/controller/systemartifact/callback.go 0.00% <0.00%> (ø)
src/jobservice/job/impl/systemartifact/cleanup.go 53.84% <53.84%> (ø)
src/controller/systemartifact/execution.go 63.04% <63.04%> (ø)
src/pkg/systemartifact/dao/dao.go 65.90% <65.90%> (ø)
src/pkg/systemartifact/model/model.go 80.00% <80.00%> (ø)
src/pkg/systemartifact/manager.go 97.50% <97.50%> (ø)
src/jobservice/runtime/bootstrap.go 50.00% <100.00%> (+0.32%) ⬆️
src/pkg/systemartifact/cleanupcriteria.go 100.00% <100.00%> (ø)
src/controller/event/handler/auditlog/auditlog.go 8.69% <0.00%> (-52.18%) ⬇️
...c/app/base/project/repository/artifact/artifact.ts 51.85% <0.00%> (-48.15%) ⬇️
... and 25 more

Github proposal link : https://github.com/goharbor/community/pull/181
*/

CREATE TABLE IF NOT EXISTS "system_artifact" (
Member

We should move this migration script to 2.6.0_schema.up.sql, since this PR will be merged into the main branch.


if !async {
err := c.createCleanupTask(ctx, jobParams, execId)
if err != nil {
Member

Log the error?

logger.Info("Created job for scan data export successfully")
return nil
}
go func(ctx context.Context) {
Member

I'm confused about why we need an async flag. IMO createCleanupTask only creates a task in the DB and should not take long. The behavior also differs between the modes: in sync mode the error is returned from Start(), but in async mode the error is only logged.

@@ -321,3 +326,37 @@ func getDefaultScannerName() string {
}
return ""
}

Member

I suggest creating a new folder to hold this function (and other system jobs in the future) and importing it here with a single call like systemjob.Register(ctx), to keep main clean and avoid importing too many business packages into main.

@@ -321,3 +326,37 @@ func getDefaultScannerName() string {
}
return ""
}

func scheduleSystemArtifactCleanJob(ctx context.Context) {
cronType := "Daily"
Member

Maybe this can be defined as a const so it can be shared for future usage.
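A small sketch of the suggestion; the constant names are hypothetical, but lifting the "Daily" literal into a shared const lets other system jobs reuse it without risking typos.

```go
package main

import "fmt"

// Hypothetical shared schedule-type constants.
const (
	cronTypeDaily  = "Daily"
	cronTypeHourly = "Hourly"
)

func main() {
	fmt.Println(cronTypeDaily) // Daily
}
```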

)

var (
DefaultCleanupWindowSeconds = 86400
Member

Do we need to expose this flag in the config?

duration := time.Duration(DefaultCleanupWindowSeconds) * time.Second
timeRange := q.Range{Max: currentTime.Add(-duration).Format(time.RFC3339)}
logger.Infof("Cleaning up system artifacts with range: %v", timeRange)
query := q.Query{Keywords: map[string]interface{}{"create_time": &timeRange}}
Member

q.New().

"context"
"github.com/goharbor/harbor/src/lib/orm"
)
import "github.com/goharbor/harbor/src/pkg/systemartifact/model"
Member

Run gofmt and goimports.


func (mgr *systemArtifactManager) GetStorageSize(ctx context.Context) (int64, error) {
listQuery := q.Query{}
artifacts, err := mgr.dao.List(ctx, &listQuery)
Member

If there are a lot of artifacts, there may be a potential performance issue here. How about using a SQL SUM to aggregate the result?
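The performance concern can be illustrated with a minimal sketch. The listAll helper and the artifact struct below are stand-ins for dao.List and the real record type (the "size" column name is an assumption), showing why summing in application code scales with the number of rows while a SQL aggregate returns a single value.

```go
package main

import "fmt"

// artifact mirrors the system_artifact record's size column (hypothetical).
type artifact struct{ Size int64 }

// listAll stands in for dao.List: it materializes every record, which is
// the potential performance issue the review points out.
func listAll() []artifact {
	return []artifact{{100}, {250}, {4096}}
}

// storageSizeViaList sums in application code after loading all rows.
func storageSizeViaList() int64 {
	var total int64
	for _, a := range listAll() {
		total += a.Size
	}
	return total
}

// The suggested alternative pushes the aggregation into the database, e.g.
// (column name "size" is an assumption):
//
//	SELECT COALESCE(SUM(size), 0) FROM system_artifact;
//
// so only a single int64 crosses the wire instead of every row.

func main() {
	fmt.Println(storageSizeViaList()) // 4446
}
```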

Contributor

+1


func (mgr *systemArtifactManager) RegisterCleanupCriteria(vendor string, artifactType string, criteria CleanupCriteria) {
key := fmt.Sprintf(keyFormat, vendor, artifactType)
cleanupCriterias[key] = criteria
Member

Do we need a lock here?
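If RegisterCleanupCriteria can be called from multiple goroutines, the map write needs synchronization. A minimal sketch with sync.RWMutex follows; the type, key format, and vendor string are illustrative, not the actual Harbor definitions.

```go
package main

import (
	"fmt"
	"sync"
)

// cleanupCriteria is a placeholder for the real CleanupCriteria type.
type cleanupCriteria func()

var (
	mu       sync.RWMutex
	criteria = map[string]cleanupCriteria{}
)

// register guards the map write, making concurrent registration safe.
func register(vendor, artifactType string, c cleanupCriteria) {
	key := fmt.Sprintf("%s:%s", vendor, artifactType)
	mu.Lock()
	defer mu.Unlock()
	criteria[key] = c
}

// lookup takes only a read lock, so concurrent reads do not block each other.
func lookup(vendor, artifactType string) (cleanupCriteria, bool) {
	key := fmt.Sprintf("%s:%s", vendor, artifactType)
	mu.RLock()
	defer mu.RUnlock()
	c, ok := criteria[key]
	return c, ok
}

func main() {
	register("example_vendor", "csv", func() {})
	_, ok := lookup("example_vendor", "csv")
	fmt.Println(ok) // true
}
```

If registration only ever happens from init/start-up on a single goroutine, the lock is unnecessary, but that invariant should then be documented.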

@github-actions

This PR is being marked stale due to a period of inactivity. If this PR is still relevant, please comment or remove the stale label. Otherwise, this PR will close in 30 days.

@github-actions github-actions bot added the Stale label Jun 13, 2022
@@ -274,6 +278,7 @@ func main() {

log.Info("Fix empty subiss for meta info data.")
oidc.FixEmptySubIss(orm.Context())
scheduleSystemArtifactCleanJob(ctx)
Contributor

scheduleSystemArtifactCleanJob(ctx) might fail because the jobservice container may not be ready yet when harbor-core starts.

Contributor Author

@stonezdj thank you for this update. Please note that the updated PR to be reviewed is https://github.com//pull/16879, but the concern you mentioned still holds true even for the new implementation. Could you please provide guidance on how the job should be scheduled at start-up, since the artifact cleanup job is an internal system job that cannot be triggered by an end user, unlike the scan-all job? cc @heww @wy65701436

Contributor

I proposed a possible solution in here

}

func NewManager() Manager {
sysArtifactMgr := &systemArtifactManager{
Contributor

return &systemArtifactManager{ ... } directly


if err != nil {
logger.Errorf("Error when cleaning up system artifacts for 'vendor:artifactType':%s, %v", key, err)
return totalRecordsDeleted, totalReclaimedSize, err
Contributor

For some storage types, deleting a file might return an error. Should we log the error and continue the cleanup operation? If the cleanup returns on the first failure, it will always fail on the same error, and system artifacts may keep accumulating.

for _, record := range records {
err = mgr.Delete(ctx, record.Vendor, record.Repository, record.Digest)
if err != nil {
logger.Errorf("Error cleaning up artifact record for vendor: %s, repository: %s, digest: %s", record.Vendor, record.Repository, record.Digest)
Contributor

Same as above, should we log the error and continue?

@prahaladdarkin (Contributor Author)

Closing stale PR. Changes have been merged as part of separate smaller PRs.
