Skip to content

Fix premature read on stream requests in the sys.take_raft_snapshot method #771

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 14, 2023

Conversation

dekimsey
Copy link
Contributor

When a caller (such as sys.take_raft_snapshot) performs a stream
request, the act of attempting to parse the response as JSON causes the
entire response body to be read and the underlying connection to be
closed. This renders the streaming response moot.

This change addresses that issue by examining if the caller requested a
stream response, and if so returning the response as is.

Without the change, it is impossible to read raft snapshots that are
larger than memory as the entire response is read into memory to attempt
to read as JSON.

@dekimsey dekimsey requested a review from a team as a code owner September 27, 2021 20:49
rudexi pushed a commit to Japannext/hazelsync that referenced this pull request Dec 9, 2021
* Support for system CA bundle: Basically fixing what requests should support by default
* Fixing the Dummy backend
* Getting more consistent about the slots returned by the job backup()
* Refactoring ZFS functions into utils (more practical if we ever switch to a C binding library)
* Attempting to fix slow streaming in hvac (see hvac/hvac#771)
* Adding gzip checksum validation (it ensures we don't have any scripting error as well)
@briantist briantist force-pushed the develop branch 2 times, most recently from 085b858 to 9dbfa98 Compare May 10, 2023 06:41
@briantist briantist changed the base branch from develop to main June 17, 2023 22:15
@briantist briantist force-pushed the fix-empty-stream-response branch from 3be579c to 512f3b5 Compare June 17, 2023 22:18
@codecov
Copy link

codecov bot commented Jun 17, 2023

Codecov Report

Merging #771 (5d93c9a) into main (11adf67) will decrease coverage by 0.02%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##             main     #771      +/-   ##
==========================================
- Coverage   84.78%   84.77%   -0.02%     
==========================================
  Files          65       65              
  Lines        3083     3087       +4     
==========================================
+ Hits         2614     2617       +3     
- Misses        469      470       +1     
Impacted Files Coverage Δ
hvac/api/system_backend/raft.py 40.00% <33.33%> (-1.67%) ⬇️
hvac/adapters.py 89.81% <100.00%> (+0.29%) ⬆️

... and 1 file with indirect coverage changes

@briantist
Copy link
Contributor

Hi @dekimsey ,

The hvac project is simplifying its developer workflow and removing the develop branch. Please be aware for future PRs that the project's default branch is now main.

Since your PR was opened before this change, I have:

  • updated your PR to merge into the project's main branch
  • rebased your branch against main and force-pushed it (please pull before making any further changes)

Re: the PR, as I've taken a more active role in maintaining the project and become more aware of its inner workings, I've found a lot of issues with the Adapter system that I think are going to require large structural changes.

This change makes some sense, but in a way it breaks the (already tenuous) contract of the JSONAdapter in that it's supposed to return objects, but in "stream" mode it would not do that.

The change could be made "upstream" in the RequestsAdapter where there is no such guarantee, and I think it would make more sense implemented there.

It would still do the same thing in that JSONAdapter will pass the kwargs along, which is still kind of unfortunate because it's not obvious unless someone looks at the implementation.

Anyway I am happy to discuss this change and other Adapter stuff more if you are still interested, I know it's been a while.
See also #898 for some stuff I was working on (but we will need even bigger changes than that imo).

@dekimsey
Copy link
Contributor Author

Hi @briantist, it has been awhile , but that's okay! I appreciate you updating the PR's target for me.

I'll try to look at this tomorrow and see if I understand where to put the change. At first glance, I'm not sure. It's not clear to me how I might obtain a RawAdapter object from this point in the code.

But this is just from vague memory and I'm on mobile right now. I'll see if I can dig into this better tomorrow.

Thank you!

PS I did identify I think this issue was caused by #537, when adapters were changed.

@briantist
Copy link
Contributor

Hi @briantist, it has been awhile , but that's okay! I appreciate you updating the PR's target for me.

I'll try to look at this tomorrow and see if I understand where to put the change. At first glance, I'm not sure. It's not clear to me how I might obtain a RawAdapter object from this point in the code.

This is actually one of the things that can make the adapter system we have now a little confusing, and at times, restrictive.

At this point in the code, you cannot choose which adapter is in use. The library is designed that a method such as this is supposedly agnostic to the adapter. But many methods then assume things about the response, based on what the project's default Adapter is.

But perhaps more to the point for this PR is that you don't have to worry about that in this case, my recommendation to move the code from JSONAdapter to RawAdapter is because it's more at home at that level, but JSONAdapter inherits from RAWAdapter so the result you're going for will be the same. This is why I say it's unfortunate, because the JSONAdapter in theory should probably not support returning streams (maybe it could support deserializing from a stream?).

If you made this change in JSONAdapter, it would also not work for anyone who might be using RAWAdapter (for some reason) nor any other adapter based on RAWAdatper, so that's another reason I think it belongs there, if anywhere.

Not sure if I've helped with this explanation or made it more confusing 😅 , happy to hash it out further.

The raft snapshot endpoint is an interesting one for two reasons:

  1. Since it forces a streamed response, the default adapter being JSONAdapter doesn't work (added in Make returned responses more consistent #537 as you found), and the entire idea of using a passed in Adapter doesn't work well for this method unless there's a way for Adapters to assert stream support. This is something I will want to address in [WIP] Add AdapterResponse types, and HvacAdapter #898 and other work I think we need to do.
  2. I cannot find this method listed in Vault API documentation anymore, I only see sys/storage/raft/snapshot-auto even when I look at quite old Vault versions, do you have any ideas on that?

But this is just from vague memory and I'm on mobile right now. I'll see if I can dig into this better tomorrow.

Thank you!

PS I did identify I think this issue was caused by #537, when adapters were changed.

No worries, thanks very much for submitting this. Stream support was not something I had my mind at all when thinking about the Adapter system recently so it's quite helpful.

@dekimsey
Copy link
Contributor Author

dekimsey commented Jun 22, 2023

I cannot find this method listed in Vault API documentation anymore, I only see sys/storage/raft/snapshot-auto even when I look at quite old Vault versions, do you have any ideas on that?

It's here, it's just under a different link on the parent page. Makes it easy to miss.

If you made this change in JSONAdapter, it would also not work for anyone who might be using RAWAdapter (for some reason) nor any other adapter based on RAWAdatper, so that's another reason I think it belongs there, if anywhere.

I don't quite follow this, from my read RawAdapter passes kwargs through to requests and doesn't manipulate the response object at all. From my read, it already works as is.

@briantist briantist self-assigned this Jun 24, 2023
@briantist briantist added system backend generally related to the Vault system backend routes adapters related to the Adapter system labels Jun 24, 2023
@briantist
Copy link
Contributor

It's here, it's just under a different link on the parent page. Makes it easy to miss.
🤦 oof I tried checking there before and still missed it, thank you!

If you made this change in JSONAdapter, it would also not work for anyone who might be using RAWAdapter (for some reason) nor any other adapter based on RAWAdatper, so that's another reason I think it belongs there, if anywhere.

I don't quite follow this, from my read RawAdapter passes kwargs through to requests and doesn't manipulate the response object at all. From my read, it already works as is.

Actually, you are completely right.

I am starting to think that any fix we could make right now in the Adapter system would not be great long term (due to the other issues with the system that need to be changed).

Because the need to return a stream is so rare (this is the only call I'm aware of that returns raw binary data, if you know of more, that would be good to know), I'm starting to warm up to the idea of having this one method not use the _adaper set on the client, and to instead have this snapshot method always use a RawAdapter.

What do you think about that idea?


The "tricky" (annoying) part is having to pass in all the fields from the current adapter into the constructor of RawAdapter to create a new instance with the all the same fields preserved.

raw_adapter = hvac.adapters.RawAdapter(base_url=self._adapter.base_uri, token=self._adapter.token, ..., ..., ..., ...)
return raw_adapter.get(
    url=api_path,
    stream=True,
)

Although we could do that directly in the method like that, it would be prone to accidental breakage if parameters are changed in any adapters in the chain, so I'd prefer we add a classmethod to the Adapter base class that takes care of it, so that it's somewhat more obvious where the responsibility lies.

class Adapter(metaclass=ABCMeta):
    def __init__(
        self,
        base_uri=DEFAULT_URL,
        token=None,
        cert=None,
        verify=True,
        timeout=30,
        proxies=None,
        allow_redirects=True,
        session=None,
        namespace=None,
        ignore_exceptions=False,
        strict_http=False,
        request_header=True,
    ):
        # real implementation here
        pass

    @classmethod
    def from_adapter(
        cls,
        adapter,
    ):
        return cls(
            base_uri=adapter.base_uri,
            token=adapter.token,
            cert=adapter.cert,
            verify=adapter.verify,
            timeout=adapter.timeout,
            proxies=adapter.proxies,
            allow_redirects=adapter.allow_redirects,
            session=adapter.session,
            namespace=adapter.namespace,
            ignore_exceptions=adapter.ignore_exceptions,
            strict_http=adapter.strict_http,
            request_header=adapter.request_header,
        )

And then this would make the actual update for raft snapshots look more like this:

raw_adapter = hvac.adapters.RawAdapter.from_adapter(self._adapter)
return raw_adapter.get(
    url=api_path,
    stream=True,
)

Eager to hear your thoughts! If that sounds good, let me know how you'd like to proceed, I could put up a separate PR with the classmethod so that you're working off of that, or it can be implemented here as well, either by you or I could push up some commits, or let me know what works.

@briantist
Copy link
Contributor

Hey @dekimsey wondering if you've had a chance to look at the above. I think this is a viable fix for the issue and that we could get it out in the next release. I'm happy to implement it myself, but I appreciate your report and patience, and if you're still interested in contributing to hvac I want to give you the opportunity to do that.

We're actively looking for new contributors but I also understand you posted this a while ago and may not be interested or able to do that now. Either way I thank you!

When a caller (such as `sys.take_raft_snapshot`) performs a stream
request, the act of attempting to parse the response as JSON causes the
entire response body to be read and the underlying connection to be
closed. This renders the streaming response moot.

This change addresses that issue by examining if the caller requested a
stream response, and if so returning the response as is.

Without the change, it is impossible to read raft snapshots that are
larger than memory as the entire response is read into memory to attempt
to read as JSON.
@briantist briantist force-pushed the fix-empty-stream-response branch from 512f3b5 to 52de1b1 Compare July 9, 2023 21:55
@briantist briantist added the bug label Jul 9, 2023
@briantist briantist changed the title Fix premature read on stream requests Fix premature read on stream requests in the sys.take_raft_snapshot method Jul 9, 2023
@briantist briantist added this to the 1.2.0 milestone Jul 9, 2023
@briantist
Copy link
Contributor

@dekimsey I've gone ahead and implemented the changes I proposed, but pushed to this PR, hope that's ok. I would appreciate if you have a few minutes to check it out. I'd be especially grateful if you're able to actually test the raft snapshot method itself, since we don't have any raft tests. 😢

@kaypeter87 do you have a sense of how difficult it might be to get raft set up in a Vault dev server? Is it even possible with a single node? Wondering if we might be able to get minimal testing going or if that's going to be infeasible.

@kaypeter87
Copy link

@briantist that shouldn't be too difficult. Every feature of Vault is available in "dev" mode. The -dev flag just short-circuits a lot of setup to insecure defaults. All data is stored (encrypted) in-memory.

I haven't personally tried enabling dev mode with raft, but it should be possible from what the documentation states.

Copy link
Member

@adammike adammike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only potential use case for these streaming results? Should this change be more broad to cover any potential larger than system memory use cases?

@briantist
Copy link
Contributor

Is this the only potential use case for these streaming results? Should this change be more broad to cover any potential larger than system memory use cases?

@adammike right now, it is the only known use case. The original PR sought to solve it more broadly, but due to the way Adapters are defined, and the leaky abstraction we have, there was not actually a good place to make such a broad change (this cannot work with the JSONAdapter generally). Some of the issues we've discussed in the past about Adapters, and some of the changes in #898 should lead us toward a future where that might make sense, but I believe that for now this is best solved for this one method.

Copy link
Member

@adammike adammike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG2M

@briantist briantist merged commit 1f88016 into hvac:main Aug 14, 2023
constantinneagu added a commit to Geries-Ingenieure-GmbH/hvac that referenced this pull request Sep 18, 2023
commit 1f88016
Author: Daniel Kimsey <90741+dekimsey@users.noreply.github.com>
Date:   Mon Aug 14 15:22:10 2023 -0500

    Fix premature read on stream requests in the `sys.take_raft_snapshot` method (hvac#771)

    * Fix premature read on stream requests

    When a caller (such as `sys.take_raft_snapshot`) performs a stream
    request, the act of attempting to parse the response as JSON causes the
    entire response body to be read and the underlying connection to be
    closed. This renders the streaming response moot.

    This change addresses that issue by examining if the caller requested a
    stream response, and if so returning the response as is.

    Without the change, it is impossible to read raft snapshots that are
    larger than memory as the entire response is read into memory to attempt
    to read as JSON.

    * add the Adapter.from_adapter class method

    * add test for adapter class method

    * Modify raft snapshot to always use RawAdapter

    Co-authored-by: Daniel Kimsey <dkimsey@trustwave.com>

    ---------

    Co-authored-by: Brian Scholer <1260690+briantist@users.noreply.github.com>

commit c398774
Author: ceesios <cees@virtu-on.nl>
Date:   Mon Aug 14 22:17:12 2023 +0200

    ldap auth method - add missing `configure` params by vault api names (hvac#975)

    * add missing params by vault api names

    * move new parameters to the end

    * remove duplicate keys

    * add ldap tests for new params

    * fix black formatting

    * fix integrtion test with raw string

    * remove userfilter from integration test

    * Revert "remove userfilter from integration test"

    This reverts commit 296e9f2.

    * fix userFilter failure on Vault < 1.9

    * fix capitalization

    * fix conditional for other tests

    * fix typo in user_dn doc

    * add generate_parameter_deprecation_message utility function

    * fixup

    * stop suppressing deprecation errors

    * add aliased_parameter decorator and tests

    * fix asterisks in docstring

    * Revert "fix asterisks in docstring"

    This reverts commit 1a599ec.

    * fix docstring asterisks without side effects

    * add testcases to fill out coverage of alias decorator

    * fix lint

    * update LDAP configure to use alias wrapper for replaced parameter names

    * update test references to use canonical names

    * add client_tls, connection_timeout, max_page_size, dereference_aliases and order remove_nones

    * Revert "add client_tls, connection_timeout, max_page_size, dereference_aliases and order remove_nones"

    This reverts commit 55d28f8.

    * add client_tls, connection_timeout, max_page_size, dereference_aliases and order remove_nones

    * add new params to unit tests

    ---------

    Co-authored-by: Brian Scholer <1260690+briantist@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adapters related to the Adapter system bug system backend generally related to the Vault system backend routes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants