Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WalkMetas* functions and meta.Name field #1089

Merged

Conversation

joelanford
Copy link
Member

@joelanford joelanford commented Apr 22, 2023

Description of the change:
These function are like WalkFS except that callers provide a function that handles a Meta object at a time, rather than an entire file's worth of FBC objects.

This PR also adds Name string to the Meta struct, which is necessary information to distinguish meta objects that have the same schema and package (e.g. olm.channel and olm.bundle objects)

Motivation for the change:
For callers that can operate on a single meta at a time, these functions will provide significant reductions in memory usage. The catalogd project will likely make use of this function.

Closes #1080

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 22, 2023
@codecov
Copy link

codecov bot commented Apr 22, 2023

Codecov Report

Merging #1089 (f33af22) into master (0aebc3c) will increase coverage by 0.03%.
The diff coverage is 73.61%.

❗ Current head f33af22 differs from pull request most recent head 7e4f23a. Consider uploading reports for the commit 7e4f23a to get more accurate results

@@            Coverage Diff             @@
##           master    #1089      +/-   ##
==========================================
+ Coverage   52.60%   52.64%   +0.03%     
==========================================
  Files         107      107              
  Lines        9389     9411      +22     
==========================================
+ Hits         4939     4954      +15     
- Misses       3523     3526       +3     
- Partials      927      931       +4     
Impacted Files Coverage Δ
alpha/declcfg/declcfg.go 80.48% <73.33%> (-19.52%) ⬇️
alpha/declcfg/load.go 75.00% <73.80%> (-1.53%) ⬇️

... and 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@joelanford joelanford changed the title Add WalkMetasFS function Add WalkMetas* functions and meta.Name field Apr 23, 2023
@joelanford
Copy link
Member Author

/hold
I want a few reviews on this one, so I don't want a single lgtm allowing this to merge.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 24, 2023
@dinhxuanvu dinhxuanvu removed their request for review April 24, 2023 15:40
Copy link
Contributor

@anik120 anik120 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me.

The catalogd project will likely make use of this function.

I was a little mislead by this. If I'm not wrong, nothing changes in terms of wirings for clients, I.e clients still call LoadReader like they have been. So it's more The catalogd project will likely benefit from this function. Is that correct?

}
return walkFn(nil, err)
}
doc = []byte(strings.NewReplacer(`\u003c`, "<", `\u003e`, ">", `\u0026`, "&").Replace(string(doc)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have some documentation for what the values for the NewReplacer method are? I.e what exactly are we achieving with this line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See some context here: #1089 (comment)

Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just have a few nits/comments - I don't think any of them should block this PR from merging though.

alpha/declcfg/load.go Show resolved Hide resolved
alpha/declcfg/load.go Outdated Show resolved Hide resolved
alpha/declcfg/load.go Show resolved Hide resolved
}
return walkFn(nil, err)
}
doc = []byte(strings.NewReplacer(`\u003c`, "<", `\u003e`, ">", `\u0026`, "&").Replace(string(doc)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a bytes replacer that keeps you from having to double-allocate memory here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a bytes.NewReplacer. There is bytes.Replace(All), but as far as I can tell, they only handle one substitution at a time. So I think options are:

  1. Roll with this and tolerate the double-allocation
  2. Call bytes.Replace in a loop for each replacement. (seems worse from an allocation standpoint since we have 3 replacements)
  3. Roll our own bytes replacer.
  4. Figure out a way to not require this replacement in the first place. It's been awhile, but iirc, its because an encoder is used somewhere along the line that we don't control and therefore can't set enc.SetEscapeHTML(false)
  5. Not bother with the replacement and let the unicode show up in the output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mis-remembered this, sorry - I used this library in the past and found it nice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One producer of these artifacts is opm render itself. :) If we output YAML, it escapes html-hostile runes like these, so we have to replace them on ingest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that the component trying to do "user-facing YAML" should choose how to encode whatever it's outputting and call it good. Having to incur this burden in the entire back-end stack is a bit wild.

alpha/declcfg/load.go Show resolved Hide resolved
alpha/declcfg/load_test.go Show resolved Hide resolved
alpha/declcfg/declcfg.go Show resolved Hide resolved
@joelanford
Copy link
Member Author

joelanford commented Apr 24, 2023

clients still call LoadReader like they have been

@anik120 That section of code reads the entire contents of the catalog into memory (twice actually: once into a bytes.Buffer, and a second time into declcfg.DeclarativeConfig), returns it from the function, and then the caller calls createPackage and createBundleMetadata, each of which basically iterate the appropriate schema-based lists.

It would be way more performant if we did declcfg.LoadMetasReader(podLogs, walkFunc), where walk func could:

  1. directly create BundleMetadata when it sees an olm.bundle
  2. collect packages and channels up to eventually be used to create Package objects.

Later, if/when we treat metas a bit more opaquely, we can just sync every object directly to the underlying storage without having to collect anything up at all. (e.g. something closer to this, which was the original intent behind this PR in the first place.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 25, 2023
These function are like WalkFS except that callers provide a function
that handles a Meta object at a time, rather than an entire file's worth
of FBC objects.

For callers that can operate on a single meta at a time, these functions
will provide significant reductions in memory usage.

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 25, 2023
Signed-off-by: Joe Lanford <joe.lanford@gmail.com>
@@ -93,17 +100,68 @@ func (m Meta) MarshalJSON() ([]byte, error) {
}

func (m *Meta) UnmarshalJSON(blob []byte) error {
blob = bytereplacer.New(`\u003c`, "<", `\u003e`, ">", `\u0026`, "&").Replace(blob)
Copy link
Member Author

@joelanford joelanford Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the bytes replacement here, which meant I could drop that extra unmarshal call into doc json.RawMessage when we're deciding the reader. Between the allocation improvement of a direct bytes replacer and the removal of the other unmarshal, we shave off ~7% of the runtime of opm render when I run it against the operatorhub catalog.

Not too shabby!

In a follow-up, I might take a look at an alternate mode of action.Render (maybe a new function alongside Run()?), which lets the caller use a walk function. That would save a bunch on memory and might shave off more time as well.

// TODO: return an error that includes the the full JSON message,
// the offset of the error, and the error message. Let callers
// decide how to format it.
return errors.New(resolveUnmarshalErr(blob, err))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this here because it was the only place access to the full json message existed after I removed the extra unmarshal to json.RawMessage.

Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 25, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: everettraven, joelanford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@joelanford
Copy link
Member Author

/hold cancel
I think this is ready to merge (just needs /lgtm)

For follow-ups, we have:

  • Refactor the unmarshal error formatting to return an error struct to give callers more formatting flexibility.
  • Look at improving WalkMetas* tests to assert on the contents of the objects.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 25, 2023
@everettraven
Copy link
Contributor

/LGTM

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 5, 2023
@openshift-merge-robot openshift-merge-robot merged commit e1eebae into operator-framework:master May 5, 2023
10 checks passed
theishshah pushed a commit to theishshah/operator-registry that referenced this pull request Jun 7, 2023
* add WalkMetasFS and WalkMetasReader functions

These function are like WalkFS except that callers provide a function
that handles a Meta object at a time, rather than an entire file's worth
of FBC objects.

For callers that can operate on a single meta at a time, these functions
will provide significant reductions in memory usage.

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>

* address PR feedback

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>

---------

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFE] Add optional "name" field to FBC schema
6 participants