
System Artifact Manager for storing and managing system-generated non-OCI artifact data #16646

Closed

Conversation

prahaladdarkin (Contributor)

Scope of change:

  • System artifact manager model definitions
  • System artifact manager subsystem implementation
  • Integration of system artifact clean-up jobs with job service
  • Unit tests/functional tests for the above changes

Signed-off-by: prahaladdarkin prahaladd@vmware.com


codecov bot commented Apr 4, 2022

Codecov Report

Merging #16646 (dac5fe7) into main (846dc94) will decrease coverage by 0.03%.
The diff coverage is 77.45%.


@@            Coverage Diff             @@
##             main   #16646      +/-   ##
==========================================
- Coverage   67.39%   67.35%   -0.04%     
==========================================
  Files         951      960       +9     
  Lines       78909    79229     +320     
  Branches     2321     2331      +10     
==========================================
+ Hits        53183    53368     +185     
- Misses      22167    22276     +109     
- Partials     3559     3585      +26     
Flag Coverage Δ
unittests 67.35% <77.45%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/controller/systemartifact/callback.go 0.00% <0.00%> (ø)
src/jobservice/job/impl/systemartifact/cleanup.go 53.84% <53.84%> (ø)
src/controller/systemartifact/execution.go 63.04% <63.04%> (ø)
src/pkg/systemartifact/dao/dao.go 65.90% <65.90%> (ø)
src/pkg/systemartifact/model/model.go 80.00% <80.00%> (ø)
src/pkg/systemartifact/manager.go 97.50% <97.50%> (ø)
src/jobservice/runtime/bootstrap.go 50.00% <100.00%> (+0.32%) ⬆️
src/pkg/systemartifact/cleanupcriteria.go 100.00% <100.00%> (ø)
src/controller/event/handler/auditlog/auditlog.go 8.69% <0.00%> (-52.18%) ⬇️
...c/app/base/project/repository/artifact/artifact.ts 51.85% <0.00%> (-48.15%) ⬇️
... and 25 more

Github proposal link : https://github.com/goharbor/community/pull/181
*/

CREATE TABLE IF NOT EXISTS "system_artifact" (
Member

We should move this migration script to 2.6.0_schema.up.sql, since this PR will be merged into the main branch.


if !async {
err := c.createCleanupTask(ctx, jobParams, execId)
if err != nil {
Member

Log the error?

logger.Info("Created job for scan data export successfully")
return nil
}
go func(ctx context.Context) {
Member

I'm confused about why we need an async flag. IMO createCleanupTask only creates a task in the DB and should not take long. The behavior also differs between the modes: in sync mode the error is returned from Start(), but in async mode the error is only logged.

@@ -321,3 +326,37 @@ func getDefaultScannerName() string {
}
return ""
}

Member

I suggest creating a new folder to hold this function (and other system jobs in the future) and importing it here with a single call like systemjob.Register(ctx), to keep main clean and avoid importing too many business packages into main.

@@ -321,3 +326,37 @@ func getDefaultScannerName() string {
}
return ""
}

func scheduleSystemArtifactCleanJob(ctx context.Context) {
cronType := "Daily"
Member

Maybe this can be defined as a const so it can be shared for future usage.
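A small sketch of the suggestion; the constant names are hypothetical, but lifting the "Daily" literal into a shared const lets other system jobs reuse it without risking typos.

```go
package main

import "fmt"

// Hypothetical shared schedule-type constants.
const (
	cronTypeDaily  = "Daily"
	cronTypeHourly = "Hourly"
)

func main() {
	fmt.Println(cronTypeDaily) // Daily
}
```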

)

var (
DefaultCleanupWindowSeconds = 86400
Member

Do we need to expose this flag in the config?

duration := time.Duration(DefaultCleanupWindowSeconds) * time.Second
timeRange := q.Range{Max: currentTime.Add(-duration).Format(time.RFC3339)}
logger.Infof("Cleaning up system artifacts with range: %v", timeRange)
query := q.Query{Keywords: map[string]interface{}{"create_time": &timeRange}}
Member

q.New().

"context"
"github.com/goharbor/harbor/src/lib/orm"
)
import "github.com/goharbor/harbor/src/pkg/systemartifact/model"
Member

Run gofmt and goimports.


func (mgr *systemArtifactManager) GetStorageSize(ctx context.Context) (int64, error) {
listQuery := q.Query{}
artifacts, err := mgr.dao.List(ctx, &listQuery)
Member

If there are a lot of artifacts, there may be a potential performance issue here. How about using a SQL SUM to aggregate the result?
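The performance concern can be illustrated with a minimal sketch. The listAll helper and the artifact struct below are stand-ins for dao.List and the real record type (the "size" column name is an assumption), showing why summing in application code scales with the number of rows while a SQL aggregate returns a single value.

```go
package main

import "fmt"

// artifact mirrors the system_artifact record's size column (hypothetical).
type artifact struct{ Size int64 }

// listAll stands in for dao.List: it materializes every record, which is
// the potential performance issue the review points out.
func listAll() []artifact {
	return []artifact{{100}, {250}, {4096}}
}

// storageSizeViaList sums in application code after loading all rows.
func storageSizeViaList() int64 {
	var total int64
	for _, a := range listAll() {
		total += a.Size
	}
	return total
}

// The suggested alternative pushes the aggregation into the database, e.g.
// (column name "size" is an assumption):
//
//	SELECT COALESCE(SUM(size), 0) FROM system_artifact;
//
// so only a single int64 crosses the wire instead of every row.

func main() {
	fmt.Println(storageSizeViaList()) // 4446
}
```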

Contributor

+1


func (mgr *systemArtifactManager) RegisterCleanupCriteria(vendor string, artifactType string, criteria CleanupCriteria) {
key := fmt.Sprintf(keyFormat, vendor, artifactType)
cleanupCriterias[key] = criteria
Member

Do we need a lock here?
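If RegisterCleanupCriteria can be called from multiple goroutines, the map write needs synchronization. A minimal sketch with sync.RWMutex follows; the type, key format, and vendor string are illustrative, not the actual Harbor definitions.

```go
package main

import (
	"fmt"
	"sync"
)

// cleanupCriteria is a placeholder for the real CleanupCriteria type.
type cleanupCriteria func()

var (
	mu       sync.RWMutex
	criteria = map[string]cleanupCriteria{}
)

// register guards the map write, making concurrent registration safe.
func register(vendor, artifactType string, c cleanupCriteria) {
	key := fmt.Sprintf("%s:%s", vendor, artifactType)
	mu.Lock()
	defer mu.Unlock()
	criteria[key] = c
}

// lookup takes only a read lock, so concurrent reads do not block each other.
func lookup(vendor, artifactType string) (cleanupCriteria, bool) {
	key := fmt.Sprintf("%s:%s", vendor, artifactType)
	mu.RLock()
	defer mu.RUnlock()
	c, ok := criteria[key]
	return c, ok
}

func main() {
	register("example_vendor", "csv", func() {})
	_, ok := lookup("example_vendor", "csv")
	fmt.Println(ok) // true
}
```

If registration only ever happens from init/start-up on a single goroutine, the lock is unnecessary, but that invariant should then be documented.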

@github-actions

This PR is being marked stale due to a period of inactivity. If this PR is still relevant, please comment or remove the stale label. Otherwise, this PR will close in 30 days.

@github-actions github-actions bot added the Stale label Jun 13, 2022
@@ -274,6 +278,7 @@ func main() {

log.Info("Fix empty subiss for meta info data.")
oidc.FixEmptySubIss(orm.Context())
scheduleSystemArtifactCleanJob(ctx)
Contributor

scheduleSystemArtifactCleanJob(ctx) might fail because the jobservice container may not be ready yet when harbor-core starts.

Contributor Author

@stonezdj thank you for this update. Please note that the updated PR to be reviewed is https://github.com//pull/16879, but the concern you mentioned still holds true even for the new implementation. Could you please provide guidance on how the job should be scheduled at start-up, since the artifact cleanup job is an internal system job that cannot be triggered by an end user, unlike the scan-all job? cc @heww @wy65701436

Contributor

I proposed a possible solution in here

}

func NewManager() Manager {
sysArtifactMgr := &systemArtifactManager{
Contributor

return &systemArtifactManager{ ... } directly


if err != nil {
logger.Errorf("Error when cleaning up system artifacts for 'vendor:artifactType':%s, %v", key, err)
return totalRecordsDeleted, totalReclaimedSize, err
Contributor

For some storage types, deleting a file might return an error. Should we log the error and continue the cleanup operation? If the cleanup returns on the first failure, it will always fail on the same error, and system artifacts may keep accumulating.

for _, record := range records {
err = mgr.Delete(ctx, record.Vendor, record.Repository, record.Digest)
if err != nil {
logger.Errorf("Error cleaning up artifact record for vendor: %s, repository: %s, digest: %s", record.Vendor, record.Repository, record.Digest)
Contributor

Same as above, should we log the error and continue?

@prahaladdarkin (Contributor Author)

Closing stale PR. Changes have been merged as part of separate smaller PRs.
