Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Return manifest info from get package info endpoints #143198

Closed
hop-dev opened this issue Oct 12, 2022 · 12 comments · Fixed by #144343
Closed

[Fleet] Return manifest info from get package info endpoints #143198

hop-dev opened this issue Oct 12, 2022 · 12 comments · Fixed by #144343
Labels
enhancement New value added to drive a business result Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@hop-dev
Copy link
Contributor

hop-dev commented Oct 12, 2022

In #142353 we moved to using the package content during installation. But we do not use the package content for the get package info APIs. This is because not all manifest fields are returned from the package registry API, and new fields going forward will not be added to the package registry API. This means to get all of the package info, we must go to the archive.

For viewing a given integration (title, description, screenshots etc) , the package registry API is sufficient, however when we go to create a package policy, we need the full manifest. This has caused issues recently e.g #131609 is blocked until we return all fields. Also input packages which were recently added define many new fields, as a result we now have a fork in the get package info logic for input packages, where we always get the info from the archive, this is OK for input packages as they are very small by design.

Going forward we will need a way to get the full manifest information ideally without downloading and extracting the full package archive which is not bounded in size.

Issues/concerns:

  • how do we minimise the amount we download and extract in order to get all of the information needed to install a package?
  • how do we keep the APIs speedy
  • any solution will need to play well with package validation

Solution Ideas

1. Package registry provides a way for us to get all manifest.yml files directly

The package registry API could allow fleet to get the main manifest file and all data stream manifest files directly. This would mean the registry team don't have to update their schema/parsing logic every time a new field is added, this is handled by fleet.

this reduces the bandwidth of downloading the full archive, and reduces the overhead of extracting the archive, but does increase the complexity of the EPR API. And are there backwards compatibility issues?

2. Fleet has different APIs for "viewing" a package vs. getting "full" info needed for create package policy

Fleet could only load the full package info from the archive when the user goes to create a package policy for the first time. This means when clicking and viewing integrations the API will still be quick, but on loading the package policy create page there could be a long delay.

One concern also is this makes our APIs more complex, we would have a get info endpoint as well as a flag or other API for getting the "full" info

3. ???

@hop-dev hop-dev added enhancement New value added to drive a business result Team:Fleet Team label for Observability Data Collection Fleet team labels Oct 12, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@hop-dev
Copy link
Contributor Author

hop-dev commented Oct 12, 2022

@joshdover I have done a brain dump here of what we chatted about. I think we will need to address this sooner rather than later.

CC @kpollich for info!

@joshdover
Copy link
Member

joshdover commented Oct 17, 2022

For viewing a given integration (title, description, screenshots etc) , the package registry API is sufficient, however when we go to create a package policy, we need the full manifest.

In terms of unblocking #131609, this part is interesting to me. I'm not incredibly familiar with this code, but it seems to me that the way the package policy form works today requires that we have all the information about the package's data streams to construct the package policy via the API, when all we should really need is the input vars and stream structure. Essentially, we're requiring the UI to build the full policy rather than the server where all the necessary data should already be available.

I think we may be able to leverage the new package policy API to solve this:

  1. UI loads necessary info to show the package policy form from the registry
  2. Use the new package policy API format to request creating the policy
    1. Backend installs the full package first
    2. Backend uses the package contents to build the full policy, without relying on the form to send all the necessary data in the API call

That said, I don't think this will be a solution for all cases. Even in this case, if we wanted to display anything in the UI for the allow_routing flag, we'd need to download the entire package to be able to show this since it's not currently specified on the registry API.

I also don't think having different behavior for different package types or requiring additional API calls for each data stream manifest is going to scalable. My suggestion would be that we pursue some version of option 2, and accepting the penalty this will incur for large packages, like ones that contain ML models. This package doesn't even really need a package policy anyways, and could have it's "Add" button switched to just install the package directly.

@hop-dev
Copy link
Contributor Author

hop-dev commented Oct 18, 2022

UI loads necessary info to show the package policy form from the registry

For input packages this would need some more info returned from the registry API: elastic/package-registry#864. And I think in the future there would be more fields added to the API as packages become more complex, I think this is what the EPR team are trying to move away from.

This package doesn't even really need a package policy anyways, and could have it's "Add" button switched to just install the package directly.

I like this, this could also work for sample data packages which we've talked about before.

@hop-dev
Copy link
Contributor Author

hop-dev commented Oct 18, 2022

I generally agree with some version of option 2, we have GET /packages/${name}/${version} at the moment.
We could either:

    1. add a new /packages/${name}/${version}/full endpoint or similar (could also be a query param) which returns the package info from the archive and leave the existing endpoint as it is.
    1. have the existing endpoint return the full info from the archive and add a new /packages/${name}/${version}/summary or similar endpoint which only returns title, description, screenshots etc

Where option 1 is a less breaking change

@kpollich
Copy link
Member

add a new /packages/${name}/${version}/full endpoint or similar (could also be a query param) which returns the package info from the archive and leave the existing endpoint as it is.

A query param ?full=true aligns with how our existing agent policy endpoint works when requesting the full policy object, so I'm in favor of this change.

@joshdover
Copy link
Member

@hop-dev are you actively working on this? Should we assign this to you? If not, we need to get this scheduled so we can finally unblock the allow_routing requirement from APM.

@hop-dev
Copy link
Contributor Author

hop-dev commented Oct 28, 2022

@joshdover not currently but this has just blocked a ticket I was working on ⬆️ , so I could look to pick it up this sprint, need to check with @kpollich

@ruflin
Copy link
Member

ruflin commented Nov 1, 2022

After the package is installed, will there be any future request to the registry? Next time a policy is created, will Fleet try to reach out to the registry again? How does this work in the context of "upload package" feature?

@hop-dev
Copy link
Contributor Author

hop-dev commented Nov 1, 2022

Just to emphasise that with the chosen solution (have a new ?full param on the get info endpoint which gets all fields from the archive) our behaviour when getting package info wont change, this new param will only be used when opening the package policy editor.

@ruflin regarding your question, for a registry package that is installed we currently attempt go to the registry first to get the package info then fall back to the archive, I won't be changing this behaviour. That is unless ?full is provided in which case the archive will always be used.

@ruflin
Copy link
Member

ruflin commented Nov 2, 2022

for a registry package that is installed we currently attempt go to the registry first to get the package info then fall back to the archive

Doing this has a few downsides:

  • Adds latency to the request instead of being able to do a local lookup
  • Creates cost on the CDN side
  • Creates 2 different paths for packages manually uploaded vs packages which are part of the registry

@kpollich
Copy link
Member

kpollich commented Nov 2, 2022

  • Adds latency to the request instead of being able to do a local lookup
  • Creates cost on the CDN side

FWIW we do cache most of our requests that incur EPR requests on Kibana's side as well via cache-control headers on Fleet's /epm/packages/* endpoints - see #125794.

This does help with perceived latency on the client side and limits traffic to the CDN.

During our get info workflow, we always check the registry first today in order to determine whether the package in question is out of date. A better strategy here would probably be to make the "is there an update available" part of the process a separate, non-blocking request. There's a lot of improvements we can make to the critical path of loading the integration to reduce latency here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New value added to drive a business result Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants