Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make httpx_stubs generate cassettes consistent with other stubs. #649

Merged

Conversation

the-allanc
Copy link

Encountered this myself, but this was reported in #550.

I had generated some cassettes with httplib2, and then wanted to use them for httpx - but they were entirely incompatible. This pull request resolves that.

But the cassette structure defined by httpx_stubs was incompatible with other parts of the code - like the code to decode compressed responses. As such, I've changed the code such that it uses a set of response fields which are consistent with elsewhere in the codebase.

I've kept compatibility with the custom format that httpx_stubs was using, so these cassettes can still be read - but no longer generated.

This pull request also includes two cassettes using the custom and regular formats, to show they can both be processed. I've based them on gzipped content, as this tests one of the complexities around how httpx was being used compared to other libraries (it seems content can only be retrieved in its decoded form, rather than the raw content given).

@codecov-commenter
Copy link

codecov-commenter commented Jun 13, 2022

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (f1e0241) 90.13% compared to head (d64ef18) 90.17%.

Files Patch % Lines
vcr/stubs/httpx_stubs.py 95.83% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #649      +/-   ##
==========================================
+ Coverage   90.13%   90.17%   +0.03%     
==========================================
  Files          27       27              
  Lines        1815     1822       +7     
  Branches      336      338       +2     
==========================================
+ Hits         1636     1643       +7     
  Misses        134      134              
  Partials       45       45              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@the-allanc the-allanc force-pushed the httpx-cassette-compatibility branch from 9bb7fb2 to 970d4fd Compare June 13, 2022 17:28
@the-allanc
Copy link
Author

I fixed the linting issues (using Black) and have force pushed the changes - can we get this pipeline run again?

Comment on lines 43 to 53
"status": dict(code=resp.status_code, message=resp.reason_phrase),
"headers": _transform_headers(resp),
"body": {"string": resp.content},
Copy link

@hans2520 hans2520 Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question -- does this really generate the body content as a yaml binary string as in your test example above? I believe that is just for reading back a cassette that has this type of content, but a new cassette generated with this stub will not be encoded the same way the others will be (or at least with requests library).

This is effectively what I needed to be able to generate the body content in the same yaml binary format:

from ..serializers.yamlserializer import serialize
...

def _to_serialized_response(resp):
    return {
        "status": dict(code=resp.status_code, message=resp.reason_phrase),
        "headers": _transform_headers(resp),
        "body": {"string": serialize(resp.content).encode("unicode_escape")},
    }

Can you confirm? And if not outright add this into PR, can we look at having an option we can set so this will be possible?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I misunderstand.

Just to compare requests and httpx, I ran this in a Python prompt based on my branch:

with vcr.use_cassette('httpx_and_requests.yaml'):
    requests.get('https://httpbin.org/json')
    httpx.get('https://httpbin.org/json')
    requests.get('https://httpbin.org/gzip')
    httpx.get('https://httpbin.org/gzip')

Using that, I get this output generated - for the JSON resource, both httpx and requests generate a regular string (without the !!binary marker). For the GZIP resource, requests creates a binary string, but httpx does not.

I'm able to load the requests generated by the httpx stub with the requests library, and vice versa. If I retrieve binary content (like a JPEG image) using httpx, and capture that in a cassette, then I get the !!binary marker.

The only reason that the gzip resource differs between requests and httpx is because requests seems to allow you to get the raw data stream (which hasn't been decoded - i.e. un-gzipped), whereas I had trouble accomplishing that with httpx - so we store the decoded (un-gzipped) text and remove the Content-Encoding response header.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get your test, and I don't think you're misunderstanding anything.

Attempting to make httpx not decompress is its own can of worms and isn't possible without modifying source. The upshot is that we can't generate cassettes with httpx that have binary encoded yaml like we did with requests -- and we considered that a feature, for various reasons.

So at least allowing the option on whether to always store content as binary encoded would be greatly appreciated!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hans2520 OK, I've spent more time on this one...

After stepping into the httpx source code, I figured out that if you stream the response, and then manipulate the headers before httpx tries to process it, then you can get access to the raw data stream. So that's what I did.

I've made changes to try to prevent httpx reading a response, then reading the content without decoding, and then finally amending the response to still make it usable to regular clients that will want the content decoded normally.

httpx does still automatically read the content of responses on some occasions (particularly on redirects), so we still have to support the serialisation of decoded responses - so that code still exists in that form. I've added a new "client" class for test_httpx which will automatically read responses to show that we handle decoded responses just as well as their non-decoded equivalents.

Hope you find this set of changes much more to your liking... 😄

Copy link

@hans2520 hans2520 Jun 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, you are correct about manipulating the httpx headers (I don't think streaming was a requirement actually, but you would have needed to patch two different locations in my test... which was over a week ago, sorry can't remember exactly. But for the streaming -- if it works, why not).

I haven't had a chance to test your modification just yet, but just glancing at the commit you made and inferring from your comment, the only thing I'm wondering about is if the format of the response will actually be "raw" content (i.e. unicode character sequences) vs the yaml binary format. That's what happened in my testing when I modified the httpx source so as to not decode the response, and why I ultimately preferred not doing that but making the suggestion above to use the yaml serializer to get yaml binary format.

If that's true, the one nice thing about yaml binary format is plays with margins nicely. But the main reason we were wanting it is so that in the end we would not have to regenerate a bunch of cassette formats we had in source control, theoretically if only the client changed (from request to httpx) but the response stayed the same (and it was still yaml binary encoded), none of those responses would change...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format of the cassettes that are generated for httpx seem to be the same as the ones generated by requests (at least with my latest set of changes).

@the-allanc
Copy link
Author

Any chance to get this merged in?

@07pepa
Copy link

07pepa commented Oct 11, 2022

i suggest use pydantic to validate casetes each time you load them ... will be much more secured

aka some validation needs to be done othervise incompatibility will creap in

@07pepa
Copy link

07pepa commented Oct 11, 2022

also please test uploads across bord to not left it out

i have PR for it but i am not shure if it is right

btw i just spinned fastapi server to record and later replay casete

pr is here please test upload a file casete
#657

@the-allanc
Copy link
Author

I've updated my changes so that there are no longer merge conflicts (since my original changes were over a year old). Perhaps this could be reconsidered to be merged in?

@jairhenrique
Copy link
Collaborator

@the-allanc
Copy link
Author

the-allanc commented Aug 2, 2023

@the-allanc can you check this https://github.com/kevin1024/vcrpy/actions/runs/5733507649/job/15553772124?pr=649

I've fixed the typo in the comment.

(And then all the other linting errors that were being flagged up.)

@the-allanc
Copy link
Author

Everything has been resolved here.

@jairhenrique
Copy link
Collaborator

LGTM.
@hartwork can you help with this review?

@hartwork
Copy link
Collaborator

hartwork commented Aug 7, 2023

@jairhenrique sorry, I lack bandwidth for this, I need to cut down my time on VCR.py for now.

@the-allanc
Copy link
Author

I'll flag this one up again - I've already rebased the original MR to make it compatible, and I can see there are other MRs which are attempting to address issues around httpx (I think some of which have already been addressed in this MR), so I would like this one to get put in place before I then have to rebase it further.

@estahn
Copy link

estahn commented Oct 27, 2023

@kevin1024 Can we get this merged?

@svdgraaf
Copy link

Yeah, this would be a great feature to add. Thanks for the work @the-allanc!

Allan Crooks added 2 commits January 20, 2024 17:15
This was due to a custom format being defined in the HTTPX stub.
As part of this, I've removed the tests which inspect the
data type of the response content in the cassette. That
behaviour should be controlled via the inbuilt serializers.
Also copied over and adjusted some of the tests from
test_requests.py relating to gzipped handling to show
that the HTTPX stub is behaving in a consistent way to
how the requests stub is.
@the-allanc
Copy link
Author

This is now the third version of this pull request. I've had to rebase it twice because other changes have been pulled into the main branch before this one - including one which made significant changes to how the HTTPX stub was implemented, meaning the code was incompatible with what was done before.

Could I please finally get this merged in - the changes make it possible to:

  1. Use HTTPX with cassettes generated through other HTTP clients.
  2. That new cassettes generated via HTTPX can be manipulated with the standard set of filters, serializers and other utilities that vcrpy provides.
  3. Is still able to read cassettes previously generated by HTTPX in older versions of vcrpy.

@the-allanc
Copy link
Author

@jairhenrique As you merged in the other recent pull requests relating to HTTPX (#784 and #788), maybe you're in the best place to decide on this one?

@jairhenrique jairhenrique merged commit 54bc646 into kevin1024:master Jan 23, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants