System Artifact Manager for storing and managing system-generated non-OCI artifact data #16646
Conversation
…-OCI artifact data Signed-off-by: prahaladdarkin <prahaladd@vmware.com>
Codecov Report
@@ Coverage Diff @@
## main #16646 +/- ##
==========================================
- Coverage 67.39% 67.35% -0.04%
==========================================
Files 951 960 +9
Lines 78909 79229 +320
Branches 2321 2331 +10
==========================================
+ Hits 53183 53368 +185
- Misses 22167 22276 +109
- Partials 3559 3585 +26
Flags with carried forward coverage won't be shown.
Github proposal link : https://github.com/goharbor/community/pull/181
*/

CREATE TABLE IF NOT EXISTS "system_artifact" (
Should move this migration script to 2.6.0_schema.up.sql, since this PR will be merged into the main branch.
if !async {
	err := c.createCleanupTask(ctx, jobParams, execId)
	if err != nil {
Log the error here?
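A minimal sketch of what that could look like (logger, execId and createCleanupTask come from the quoted code; the message text is only illustrative):

```go
if !async {
	if err := c.createCleanupTask(ctx, jobParams, execId); err != nil {
		// make sync failures visible in the core log before bubbling them up
		logger.Errorf("failed to create cleanup task for execution %v: %v", execId, err)
		return err
	}
	logger.Info("Created job for scan data export successfully")
	return nil
}
```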
logger.Info("Created job for scan data export successfully") | ||
return nil | ||
} | ||
go func(ctx context.Context) { |
I'm confused about why we need an async flag here. IMO createCleanupTask only creates a task record in the DB and should not take long. The two modes also behave differently: in sync mode the error is returned from Start(), but in async mode the error is only logged.
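If createCleanupTask really is just a DB insert, a sketch of the always-synchronous variant (purely illustrative, dropping the async branch) could be:

```go
// create the task inline and let Start() surface any failure to the caller
if err := c.createCleanupTask(ctx, jobParams, execId); err != nil {
	return err
}
logger.Info("Created job for scan data export successfully")
return nil
```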
@@ -321,3 +326,37 @@ func getDefaultScannerName() string {
	}
	return ""
}
I suggest creating a new folder/package to hold this function (and other system jobs added in the future) and importing it here, e.g. systemjob.Register(ctx), to keep main clean and avoid importing too many business packages into main.
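A rough sketch of that layout (the systemjob package, its path, and Register are hypothetical names, not existing Harbor code):

```go
// e.g. src/core/systemjob/register.go (hypothetical location)
package systemjob

import "context"

// Register wires up all system-level background jobs in one place so that
// main() only needs a single call; scheduleSystemArtifactCleanJob would move
// into this package as well.
func Register(ctx context.Context) {
	scheduleSystemArtifactCleanJob(ctx)
	// future system jobs get registered here
}
```

In main() the whole block then collapses to a single systemjob.Register(ctx) call.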
@@ -321,3 +326,37 @@ func getDefaultScannerName() string {
	}
	return ""
}

func scheduleSystemArtifactCleanJob(ctx context.Context) {
	cronType := "Daily"
Maybe this can be defined as a const so it can be shared in the future.
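For example (the constant name is just illustrative):

```go
// shared cron type value instead of repeating the string literal at each call site
const cronTypeDaily = "Daily"
```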
)

var (
	DefaultCleanupWindowSeconds = 86400
Do we need to expose this flag in the config?
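If it were made configurable, a minimal sketch (the environment variable name is hypothetical, not an existing Harbor setting; assumes os, strconv and time are imported):

```go
windowSeconds := DefaultCleanupWindowSeconds
if v := os.Getenv("SYSTEM_ARTIFACT_CLEANUP_WINDOW_SECONDS"); v != "" {
	// fall back to the default unless a positive integer override is provided
	if parsed, err := strconv.Atoi(v); err == nil && parsed > 0 {
		windowSeconds = parsed
	}
}
duration := time.Duration(windowSeconds) * time.Second
```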
duration := time.Duration(DefaultCleanupWindowSeconds) * time.Second
timeRange := q.Range{Max: currentTime.Add(-duration).Format(time.RFC3339)}
logger.Infof("Cleaning up system artifacts with range: %v", timeRange)
query := q.Query{Keywords: map[string]interface{}{"create_time": &timeRange}}
Consider using q.New() here.
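i.e. roughly (assuming the q.New helper and q.KeyWords alias from src/lib/q; note it returns a *q.Query rather than a value):

```go
query := q.New(q.KeyWords{"create_time": &timeRange})
```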
"context" | ||
"github.com/goharbor/harbor/src/lib/orm" | ||
) | ||
import "github.com/goharbor/harbor/src/pkg/systemartifact/model" |
go fmt && go imports.
func (mgr *systemArtifactManager) GetStorageSize(ctx context.Context) (int64, error) {
	listQuery := q.Query{}
	artifacts, err := mgr.dao.List(ctx, &listQuery)
If there are a lot of artifacts, this may become a performance issue. How about using a SQL SUM to aggregate the result instead?
+1
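A sketch of the SUM-based approach (the dao method name, the raw-SQL style, and the size column are assumptions based on the quoted code, not the actual implementation):

```go
// Size aggregates the total storage used by system artifacts in the database
// instead of loading every row and summing in memory.
func (d *dao) Size(ctx context.Context) (int64, error) {
	ormer, err := orm.FromContext(ctx)
	if err != nil {
		return 0, err
	}
	var total int64
	// COALESCE keeps the query from returning NULL when the table is empty
	if err := ormer.Raw("SELECT COALESCE(SUM(size), 0) FROM system_artifact").QueryRow(&total); err != nil {
		return 0, err
	}
	return total, nil
}
```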
func (mgr *systemArtifactManager) RegisterCleanupCriteria(vendor string, artifactType string, criteria CleanupCriteria) {
	key := fmt.Sprintf(keyFormat, vendor, artifactType)
	cleanupCriterias[key] = criteria
Do we need a lock here?
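A minimal sketch of guarding the registry with a mutex, assuming registrations can race with concurrent lookups (names follow the quoted code; the mutex is new):

```go
var (
	cleanupCriterias   = map[string]CleanupCriteria{}
	cleanupCriteriasMu sync.RWMutex
)

func (mgr *systemArtifactManager) RegisterCleanupCriteria(vendor string, artifactType string, criteria CleanupCriteria) {
	key := fmt.Sprintf(keyFormat, vendor, artifactType)
	// guard the shared map so concurrent registrations/lookups are safe
	cleanupCriteriasMu.Lock()
	defer cleanupCriteriasMu.Unlock()
	cleanupCriterias[key] = criteria
}
```

Readers of the map would then take the corresponding RLock. If all registrations happen during start-up before any lookup, the lock may be unnecessary.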
This PR is being marked stale due to a period of inactivity. If this PR is still relevant, please comment or remove the stale label. Otherwise, this PR will close in 30 days.
@@ -274,6 +278,7 @@ func main() {

	log.Info("Fix empty subiss for meta info data.")
	oidc.FixEmptySubIss(orm.Context())
	scheduleSystemArtifactCleanJob(ctx)
scheduleSystemArtifactCleanJob(ctx) might fail because the jobservice container may not be ready yet when harbor-core starts.
@stonezdj thank you for this update. Please note that the updated PR to be reviewed is https://github.com//pull/16879, but the concern you mention still holds true for the new implementation. Could you please provide guidance on how the job should be scheduled at start-up, since the artifact cleanup job is an internal system job that cannot be triggered by an end user, unlike the scan-all job? cc @heww @wy65701436
I proposed a possible solution here.
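One way to make start-up scheduling more resilient (purely a sketch; it assumes scheduleSystemArtifactCleanJob is changed to return an error so the retry loop can tell success from failure):

```go
// retry in the background until jobservice is reachable instead of failing once at boot
go func(ctx context.Context) {
	for {
		if err := scheduleSystemArtifactCleanJob(ctx); err == nil {
			return
		}
		time.Sleep(30 * time.Second) // the retry interval here is arbitrary
	}
}(orm.Context())
```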
}

func NewManager() Manager {
	sysArtifactMgr := &systemArtifactManager{
return &systemArtifactManager{ ... } directly
if err != nil {
	logger.Errorf("Error when cleaning up system artifacts for 'vendor:artifactType':%s, %v", key, err)
	return totalRecordsDeleted, totalReclaimedSize, err
For some storage types, deleting a file might return an error. Should we log the error and continue the cleanup operation? If the cleanup stops on the first failure, it will keep failing on the same error every run and system artifacts may accumulate.
for _, record := range records {
	err = mgr.Delete(ctx, record.Vendor, record.Repository, record.Digest)
	if err != nil {
		logger.Errorf("Error cleaning up artifact record for vendor: %s, repository: %s, digest: %s", record.Vendor, record.Repository, record.Digest)
Same as above, should we log the error and continue?
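A sketch of the log-and-continue behaviour both comments suggest (field and variable names are taken from the quoted code):

```go
for _, record := range records {
	if err := mgr.Delete(ctx, record.Vendor, record.Repository, record.Digest); err != nil {
		logger.Errorf("Error cleaning up artifact record for vendor: %s, repository: %s, digest: %s: %v",
			record.Vendor, record.Repository, record.Digest, err)
		continue // keep going so one failing record does not block the whole cleanup run
	}
	totalRecordsDeleted++
	// size accounting for totalReclaimedSize would go here as in the original code
}
```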
Closing stale PR. Changes have been merged as part of separate smaller PRs.
Scope of change:
Signed-off-by: prahaladdarkin prahaladd@vmware.com