Allow filtering the data saved #67

Closed
Aaron1011 opened this issue Mar 10, 2014 · 12 comments

@Aaron1011

Currently, vcrpy saves all of the data from a request. However, sometimes there may be part of a request that you wouldn't want recorded, such as authentication data. It would be nice if vcrpy allowed a way to scrub data from a request before it is saved, similar to the way httreplay does it.

@kevin1024
Owner

Wow, this is the first time I've seen httreplay. This is an interesting idea, I'm going to noodle on it a bit.

@msabramo
Contributor

On the subject of capture/replay libs that you may or may not have seen, have you seen CaptureMock?

It's a little odd because it came out of this GUI testing tool called TextTest, but the really interesting thing is that it can mock things besides HTTP. You could conceivably mock database, memcache, etc.

@kevin1024
Owner

@msabramo, that is really interesting, thanks for sending that.

@Aaron1011 Sorry it took so long, but I think I like this idea. I'm still trying to think of a good implementation though. I noticed that httreplay does this by letting you filter headers or query params. But what if there is sensitive data in the body as well?

The way Ruby VCR does this is by allowing you to define a block that returns a string that is filtered out wherever it appears, whether in headers, the body of the response, wherever. It's then replaced by a string that you supply.

This implementation seems more flexible but is a little more work to set up.
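
To make the Ruby VCR style of substitution concrete, here is a rough Python sketch. The `scrub` helper and the dictionary layout of the recorded interaction are hypothetical, invented for illustration; they are not vcrpy's API or cassette format. Running the same helper in reverse restores the secret on replay, as Ruby VCR is described as doing.

```python
def scrub(interaction, secret, placeholder):
    """Return a copy of `interaction` with `secret` replaced by `placeholder`.

    Walks nested dicts and lists so the string is replaced wherever it
    appears: URL, headers, or body.
    """
    def replace(value):
        if isinstance(value, str):
            return value.replace(secret, placeholder)
        if isinstance(value, dict):
            return {key: replace(item) for key, item in value.items()}
        if isinstance(value, list):
            return [replace(item) for item in value]
        return value

    return replace(interaction)


# Hypothetical recorded interaction (not the real cassette layout).
recorded = {
    "request": {
        "uri": "http://www.someapi.com/?api_key=secret123",
        "headers": {"X-Token": ["secret123"]},
    },
    "response": {"body": "hello, your key is secret123"},
}

# Before saving: hide the secret everywhere.
scrubbed = scrub(recorded, "secret123", "<API_KEY>")

# On replay: swap the placeholder back for the real value.
restored = scrub(scrubbed, "<API_KEY>", "secret123")
```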

@kevin1024
Owner

Also, in Ruby VCR:

When the interactions are replayed, the sensitive text will replace the substitution string so that the interaction will be identical to what was originally recorded.

@kevin1024
Owner

Hmm... HTTP Basic auth base64 encodes the username and password, making simple string substitution fail in this case.
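
A small illustration of the problem, using only the standard library. The credentials are made up; the point is that the password never appears verbatim in the header, so a substring replacement on the raw password has nothing to match.

```python
import base64

# Hypothetical credentials, for illustration only.
username, password = "alice", "s3cret"

# HTTP Basic auth sends "username:password" base64-encoded in the header.
token = base64.b64encode(f"{username}:{password}".encode()).decode()
authorization = f"Basic {token}"

# A plain substring search for the password finds nothing in the header.
password_visible = password in authorization

# Scrubbing has to target the encoded token (or the whole header) instead.
scrubbed = authorization.replace(token, "<FILTERED>")
```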

@kevin1024
Owner

I wonder if the best solution would be to add callbacks that allow you to modify the request / response before they are saved or loaded.

@msabramo
Contributor

Yeah, callbacks are general and powerful, and then you don't have to anticipate everyone's needs.

If it becomes clear that everyone is using the callbacks to do the same thing then you can specialize.

@msabramo
Contributor

@kevin1024: If you want to see some examples of CaptureMock in action, check out https://github.com/msabramo/capturemock_examples

@kevin1024
Owner

After thinking about this a bit, I think I would like to implement the 2 callbacks, but also provide a simpler method for the common case of stripping a header and stripping a query parameter. I think this is a good mix of flexibility and ease of use.

Example usage:

import vcr

my_vcr = vcr.VCR(
    filter_headers=['Authorization'],
    filter_query_parameters=['api_key'],
)

with my_vcr.use_cassette('test.yml'):
    # your http code here
    pass

Example usage of callback:

def before_record_cb(request, response, cassette):
    # Returning None (here, for /login) means the interaction is not recorded.
    if request.path == '/login':
        return None
    return request, response

my_vcr = vcr.VCR(
    before_record=before_record_cb,
)

with my_vcr.use_cassette('test.yml'):
    # your http code here
    pass

The callback is called before the cassette is serialized to disk and takes 3 parameters: the recorded request, the recorded response, and the current cassette. It should return the modified request and response. If it returns None, it's not recorded at all.
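
One way to picture that contract is the sketch below. It is not vcrpy's implementation: `record` is a hypothetical stand-in for whatever code appends to the cassette, the cassette is modeled as a plain list, and requests are simple namespace objects with a `path` attribute.

```python
from types import SimpleNamespace


def skip_login(request, response, cassette):
    # Returning None tells the recorder to drop this interaction.
    if request.path == '/login':
        return None
    return request, response


def record(cassette, request, response, before_record=None):
    """Hypothetical recorder: append the interaction unless the callback vetoes it."""
    if before_record is not None:
        result = before_record(request, response, cassette)
        if result is None:
            return False  # not recorded at all
        request, response = result  # possibly modified by the callback
    cassette.append((request, response))
    return True


cassette = []  # stand-in for a real cassette
record(cassette, SimpleNamespace(path='/login'), 'token=abc', skip_login)
record(cassette, SimpleNamespace(path='/users'), '[]', skip_login)
```

After both calls, only the `/users` interaction is in the list; the `/login` one was skipped because the callback returned None.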

What do you think?

@kevin1024
Owner

Starting work on this in the filter branch

@kevin1024
Owner

When is the correct time to filter out the information? Before adding it to the cassette, or before saving the cassette to disk?

@kevin1024
Owner

When I filter out a querystring argument, requests like this:

http://www.someapi.com/?api_key=secret

get turned into this:

http://www.someapi.com/

and recorded in the cassette. But VCR uses the URL to match existing requests in a cassette, so when you run your tests a second time, the incoming request (which still carries api_key) doesn't match anything in the cassette. In the default recording mode (once), this will raise an exception. This is definitely not useful behavior! The request is in the cassette, VCR just can't find it.

So, I think I'm going to apply the filters to the request before checking for matches in the cassette. This means that the callback will only get the request passed in, not the request and response, since when checking if a request exists in the cassette, the response is not always available.
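
A minimal sketch of that idea, using only `urllib.parse`. The helper names and the list-of-URLs "cassette" are assumptions made for illustration, not vcrpy internals: the same filter is applied to the incoming request before the lookup, so it matches the already-filtered URL stored in the cassette.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_parameters(url, names):
    """Return `url` with the query parameters listed in `names` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in names
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def find_recorded(cassette_urls, incoming_url, names):
    """Filter the incoming URL first, then look it up in the cassette."""
    filtered = strip_query_parameters(incoming_url, names)
    return filtered if filtered in cassette_urls else None


# The cassette holds the URL as it was recorded, with api_key removed.
cassette_urls = ["http://www.someapi.com/"]
incoming = "http://www.someapi.com/?api_key=secret"

# Matching on the raw URL misses the recorded request...
unfiltered_match = incoming in cassette_urls

# ...while filtering before matching finds it.
filtered_match = find_recorded(cassette_urls, incoming, ["api_key"])
```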
