Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path encoding issue after upgrade from 0.9.6 to 1.0.1 #1561

Closed
aaronjwhiteside opened this issue Apr 17, 2021 · 60 comments
Closed

Path encoding issue after upgrade from 0.9.6 to 1.0.1 #1561

aaronjwhiteside opened this issue Apr 17, 2021 · 60 comments
Assignees

Comments

@aaronjwhiteside
Copy link
Contributor

aaronjwhiteside commented Apr 17, 2021

From reading the documentation it seems that paths should be url encoded by Karate.

This example works in 0.9.6 but fails after upgrading to 1.0.1.

Feature: Test
  Background:
    * def demoUrl = 'http://httpbin.org'
    * def scope = 'root|'

  Scenario: Get scope
    Given url demoUrl
    And path 'example', 'path', 'v1', 'scopes', scope
    When method GET

with the following error:

10:02:33.618 [main] ERROR com.intuit.karate - Illegal character in path at index 46: http://httpbin.org/example/path/v1/scopes/root|, http call failed after 2 milliseconds for url: http://httpbin.org/example/path/v1/scopes/root|
10:02:33.619 [main] ERROR com.intuit.karate - src/test/java/tabapay/test2.feature:9

Looking at the spec, https://tools.ietf.org/html/rfc3986, the character | does not appear to be special?

Changing the scope variable to be 'root%7C' seems to fix the problem and karate is happy, but | is not a special character for URLs, and this works fine in 0.9.6..

log output from 0.9.6

10:11:53.887 [ForkJoinPool-1-worker-1] DEBUG com.intuit.karate - request:
1 > GET http://httpbin.org/example/path/v1/scopes/root%7C
1 > Accept-Encoding: gzip,deflate
1 > Connection: Keep-Alive
1 > Host: httpbin.org
1 > User-Agent: Apache-HttpClient/4.5.12 (Java/1.8.0_252)

I can provide a full minimal project if requested, but I feel this should be enough to reproduce, let me know and I'll attach it to the ticket.

@ptrthomas
Copy link
Member

@aaronjwhiteside closing as won't fix. we just use apache behind the scenes. you are welcome to provide a PR to fix it

@aaronjwhiteside
Copy link
Contributor Author

@ptrthomas I'm ok with taking a stab at fixing this.

But I just want to make sure I'm fixing the right thing..

From the README.md

REST-style path parameters. Can be expressions that will be evaluated. Comma delimited values are supported which can be more convenient, and takes care of URL-encoding and appending '/' where needed.

I take this to mean that if I use the Comma delimited style that every "part" of that path should be escaped.

For example:

Given path 'abc', '/', '123|', '&', '?', '='

Should come out like this: abc/%2F/123%7C/%26/%3F/%3D

But

Given path 'abc//123|&?='

Should be left alone? I have a feeling that isn't right.

Because

and takes care of URL-encoding and appending

is part of the sentence that talks about comma delimited values, it can be taken to imply that URL-encoding only happens when using comma delimited values. And not when using a single value.

Looking at the code I don't see anything that treats comma separated paths specially, vs non comma separated, because everything seems to end up as a List.
https://github.com/intuit/karate/blob/master/karate-core/src/main/java/com/intuit/karate/core/ScenarioEngine.java#L365-L366

I even see code that assumes comma separated parts can have trailing slashes that are left in place.. and not urlencoded?
https://github.com/intuit/karate/blob/master/karate-core/src/main/java/com/intuit/karate/http/HttpRequestBuilder.java#L275-L277

If my interpretation of the README.md is correct, then my approach would be:

  • For each element of the path List, url encode it and concatenate it using /

But there are difficulties when dealing with trailing slashes in this case, questions like: was the slash in the part intended to be url encoded or to cap off the path? Maybe slashes are the one thing that are excluded from url encoding?

I suspect this is probably "correct" in one sense, but may break for a few people..

Another approach might be to maintain a list of characters that the apache http client doesn't think are valid in the URIs it accepts and just url encode them. I think that's probably less "correct" but much less likely to break people's existing usage of the path step.

@ptrthomas
Copy link
Member

@aaronjwhiteside all this is fine. in 1.X we decided to simplify what "http client" implementations need to handle and all the processing that the http-request-builder does ends up as a HttpRequest object.

now in this object, the path-chunks and even query-string-params have already been boiled into a single string url field. and then this is passed to apache or whatever which I would have thought would do the right thing and url-encode all the things. maybe this design is faulty and the HttpRequest object should retain all the path, chunks - I don't know - it sounds overly complicated and maybe you can find something elementary the current code is missing. as I said - if apache is not encoding things correctly, not sure if fixing that is worth it unless it is a simple config-tweak etc

@ptrthomas
Copy link
Member

@aaronjwhiteside one more thing, I'm open to adding keywords like pathEncoded and urlEncoded to complement path and url - as a way to "bring your own encoding" - perhaps that may simplify the rules ?

@aaronjwhiteside
Copy link
Contributor Author

aaronjwhiteside commented Apr 18, 2021

@ptrthomas So I think I have a grasp on it now. The problem is that here we're expecting Apache Http Client to know how to encode a full url we pass to it.

But it expects the URL to already be encoded, and performs no translation/encoding on it.

They provide a utility to help users in this regard:
https://javadoc.io/doc/org.apache.httpcomponents/httpclient/latest/org/apache/http/client/utils/URIBuilder.html

This makes sense if you think about the following scenarios:

http://some.domain/some/path?some==query

Is the second = part of the query key or query value?

The same issue exists for other examples like this:

http://some.domain/some///path?hello&world

Is the middle / in /// the name of a folder or just another superfluous path separator?
The same for the ?hello&world is & part of the query parameter name or just the query parameter separator?

The Apache Http Client cannot know our intent here and so expects it to already be encoded correctly.

Now on to a possible solution:

  1. I think the url step would remain as is, Karate shouldn't try and escape anything, this allows users to build special/weird and even erroneous requests.
  2. The path step should provide sanity and escape everything passed to it as individual segments/parts.

I've also got a solution to the trailing / problem, an empty last segment.

Given path 'hello', 'world', ''

Should produce hello/world/

So question for you, it looks like the https://github.com/intuit/karate/blob/master/karate-core/src/main/java/com/intuit/karate/http/HttpRequest.java and https://github.com/intuit/karate/blob/master/karate-core/src/main/java/com/intuit/karate/http/HttpRequestBuilder.java are intended to be abstractions over any implementation of the HttpClient interface.

How do you feel about using https://javadoc.io/doc/org.apache.httpcomponents/httpclient/latest/org/apache/http/client/utils/URIBuilder.html inside Karate's HttpRequestBuilder?

If not, and we modify the HttpRequest to contain all the individual path segments, same goes for query parameters, fragments, base url, and remaining constituents of the URL etc.. Then HttpRequestBuilder contains no logic around building the URL and all of that is moved into the specific HttpCient implementation class.

Personally I'm in favour of just putting the logic into the ApacheHttpClient class and making the HttpRequest a simple value holder for anything that's used to create the URL.

But we could of course recreate Apache's URIBuilder inside Karate's HttpRequestBuilder.

Let me know..

@ptrthomas ptrthomas added this to the 1.0.2 milestone Apr 18, 2021
@ptrthomas
Copy link
Member

reopening. thanks for the research, that helps. agree that url should remain as-is and path should always encode. there is a question about what if users append the forward-slash via path which unfortunately is quite common, many users don't realize that the comma-delimited way is preferred. so not very sure about the empty trailing path idea, I would keep it simple - or which is why we can consider a new pathEncoded keyword

I don't mind using URIBuilder inside the http-request-builder, we decided to hard-depend on apache anyway in version 1.0 - I would consider using netty utilities btw, netty is IMO the lib we will always depend on - and in the future apache may need to be swapped out e.g. for http2/3 and async, but not a big deal

IIRC you can perform the url-encoding + path-building using the java URL class

@ptrthomas ptrthomas reopened this Apr 18, 2021
@aaronjwhiteside
Copy link
Contributor Author

I think if users append the forward slash to the first path segment, we continue to strip it off as is done right now, just to keep that part backwards compatible.

The trailing slash is actually an artifact of how the current code is written and is the recommended way when using Apache's URIBuilder to append a trailing slash without having it be escaped by appending it to the end of the last path segment.

I don't mind using URIBuilder inside the http-request-builder,

Ok, even better by me, I can use that there and keep HttpClient as is.

IIRC you can perform the url-encoding + path-building using the java URL class

I just checked this and URI and URL only provide ways to pass the the full path as a single argument.

https://docs.oracle.com/javase/7/docs/api/java/net/URI.html
https://docs.oracle.com/javase/7/docs/api/java/net/URL.html

I would consider using netty utilities btw

Agree, I'll take a look and see what netty provide in this regard.

@ptrthomas
Copy link
Member

@aaronjwhiteside all good, so a PR with some extra tests should be fine. I just worry about teams I've seen doing things like * path 'foo/bar/baz'

@ptrthomas
Copy link
Member

@aaronjwhiteside thinking more about my example * path 'foo/bar/baz' I don't mind us forcing a breaking change in the next version so that this stops working, this has always bugged me

aaronjwhiteside added a commit to aaronjwhiteside/karate that referenced this issue Apr 19, 2021
HttpRequestBuilder.java
- replaced armeria QueryParamsBuilder with apache http URIBuilder.
- minor generics cleanup
- introduced support for fragments.
- URIBuilder now handles path segment encoding, query parameters and any fragment.
- care was taken to ensure that path's prefixed with / are accepted, but a warning is printed letting people know they should remove the prefixing /

encoding.feature
- added more demo/test cases for path encoding

HttpUtils.java
- minor generics cleanup
- final is redundant on static methods

karate-demo/pom.xml
- scope = import only works in the dependencyManagement section, this fixes the maven warning.

Request.java
- minor generics cleanup
- use computeIfAbsent()

HttpClientFactory.java
- public static final are redundant on interface fields
- also use method reference
aaronjwhiteside added a commit to aaronjwhiteside/karate that referenced this issue Apr 19, 2021
HttpRequestBuilder.java
- replaced armeria QueryParamsBuilder with apache http URIBuilder.
- minor generics cleanup
- introduced support for fragments.
- URIBuilder now handles path segment encoding, query parameters and any fragment.
- care was taken to ensure that path's prefixed with / are accepted, but a warning is printed letting people know they should remove the prefixing /

encoding.feature
- added more demo/test cases for path encoding

HttpUtils.java
- minor generics cleanup
- final is redundant on static methods

karate-demo/pom.xml
- scope = import only works in the dependencyManagement section, this fixes the maven warning.

Request.java
- minor generics cleanup
- use computeIfAbsent()

HttpClientFactory.java
- public static final are redundant on interface fields
- also use method reference
@aaronjwhiteside
Copy link
Contributor Author

hi @ptrthomas I've opened a draft PR, #1565

Still working through some issues:

karate-demo passes, but karate-mock-servlet fails

[ERROR] Failures: 
[ERROR]   MockSpringMvcServletTest.testSpringBootDemo:51 match failed: EQUALS
  $ | not equal (STRING:STRING)
  '�Ill~Formed@RequiredString!'
  '�Ill~Formed@RequiredString!'

classpath:demo/encoding/encoding.feature:8
match failed: EQUALS
  $ | not equal (STRING:STRING)
  '�Ill~Formed@RequiredString!'
  '�Ill~Formed@RequiredString!'

classpath:demo/encoding/encoding.feature:15
[INFO] 
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0

@ptrthomas
Copy link
Member

@aaronjwhiteside yes, that was in response to a very old issue raised - and since I've felt that we may be doing "too much"

take a look at the server-side unpacking of the request which uses armeria / netty and you can see what I mean, and I did that in a hurry:
https://github.com/intuit/karate/blob/7cce901cdcf90d1718f51b94f5bc737859350063/karate-core/src/main/java/com/intuit/karate/http/Request.java#L187

so we don't need to treat that test-case as the "truth" - but do what makes sense to us

@aaronjwhiteside
Copy link
Contributor Author

@ptrthomas OK, I'll take a look tomorrow, getting late here :)

aaronjwhiteside added a commit to aaronjwhiteside/karate that referenced this issue Apr 20, 2021
HttpRequestBuilder.java
- replaced armeria QueryParamsBuilder with apache http `URIBuilder`.
- minor generics cleanup
- `URIBuilder` now handles path segment encoding and query parameters.
- we now check if any paths are prefixed with `/` and issue a warning log, that this is probably a mistake

encoding.feature
- added more demo/test cases for path encoding

HttpUtils.java
- minor generics cleanup
- final is redundant on static methods

karate-demo/pom.xml
- scope = import only works in the dependencyManagement section, this fixes the maven warning.

Request.java
- minor generics cleanup
- use computeIfAbsent()

HttpClientFactory.java
- public static final are redundant on interface fields
- also use method reference

KarateMockHandlerTest.java
KarateHttpMockHandlerTest.java
- removed all `/` path prefixes

README.md
- updated with details on how to correctly use path
- changes any erroneous examples of path usage
@aaronjwhiteside
Copy link
Contributor Author

so we don't need to treat that test-case as the "truth" - but do what makes sense to us

@ptrthomas

So the issue turns out to be that http servlet engines are required to decode the url before passing it to servlets..

A quick hack of, in Request.java gets past that particular issue but causes other problems.

It's not trivial to decode a url, some things need to be decoded but essentially ignored by the servlet engine, the case of / is a good example, a servlet engine is not supposed to pass %2F to the servlet, but at the same time it's not supposed to interpret the decoded / as a path separator as it was specifically escaped. So it should not interfere with the path routing/dispatch logic to the servlets. The same thing goes for all the other characters with special meaning, like ?, &, = etc..

The naive approach of using URLDecoder doesn't take these things into account.

So for the moment I'll keep digging, otherwise I've updated the PR with the changes discussed in the PR comments.

    public void setUrl(String url) {
        // http servlet engines are required to decode the URL before passing to any servlets
        // this solution isn't ideal, as URLDecoder really deals with x-www-form-urlencoded decoding,
        // but it's good enough for our mock http servlet.
        try {
            urlAndPath = URLDecoder.decode(url, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
        StringUtils.Pair pair = HttpUtils.parseUriIntoUrlBaseAndPath(url);
        urlBase = pair.left;
        QueryStringDecoder qsd = new QueryStringDecoder(pair.right);
        setPath(qsd.path());
        setParams(qsd.parameters());
    }

@ptrthomas
Copy link
Member

@aaronjwhiteside thanks, the changes look ok and thanks for the readme edits. I'll wait for the ci to be green then

cc @joelpramos for a second opinion, especially about the proposal to break users who have mixed forward-slashes into path usage.

@aaronjwhiteside
Copy link
Contributor Author

@ptrthomas

Just pushed a new commit.

Long story short tracked down the issue to Spring, it was decoding the path info using Latin-1 instead of UTF-8.

Two changes,

  1. in the mock servlet case, don't allow spring to decode the pathInfo, we set it explicitly.
  2. in the normal spring boot + tomcat case, don't use @PathVariable that too was being decoded using Latin-1 instead of UTF-8.
    2.1) because the mock and real servlet environments are quite different, we needed to check two places for the decoded path information.

Some references:
https://stackoverflow.com/questions/966077/java-reading-undecoded-url-from-servlet
https://stackoverflow.com/questions/4931323/whats-the-difference-between-getrequesturi-and-getpathinfo-methods-in-httpservl/42706412

@ptrthomas
Copy link
Member

@aaronjwhiteside great, thanks for the research. if possible - can you suggest a way teams can scan their existing feature files to see if any usage of path has a forward-slash mixed in on the right, which can be part of the release notes. maybe a simple regex find in the IDE etc

@aaronjwhiteside
Copy link
Contributor Author

Something like this should work:

.* path.*('\/.*)+

@joelpramos
Copy link
Contributor

I have mixed feelings about this change - primarily as I find comma separated paths counter intuitive as an HTTP path can definitely have forward slashes (regardless of interpretation for RESTful this is observed with SOAP and just in general across the internet).

My feeling is that the impacted users of this change will vastly exceed the number of users with needs for special encoding.

Would we be able to apply this change only when there is a List of paths (provided by comma separated list or when the keyword shows multiple times) and documenting that if you have special needs for encoding you should use that alternative?

@ptrthomas
Copy link
Member

@aaronjwhiteside I think the other steps are fine, this is the only one that just takes an array. the rest are key-values (param/s, header/s etc which already support lists as values)

I'm in 2 minds. I propose that users like @gerben86 who have anyway done non-trivial JS have to manually fix tests anyway, might as well use path or url

@aaronjwhiteside
Copy link
Contributor Author

aaronjwhiteside commented Jun 2, 2021

I propose that users like @gerben86 who have anyway done non-trivial JS have to manually fix tests anyway, might as well use path or url

This is the way I lean too, better to keep moving forward without workarounds/hacks..

@ptrthomas
Copy link
Member

@aaronjwhiteside thanks. I've already gotten a handful of comments from users who are not very comfortable with this change. but as of now, I'm clear that we should do this. Karate is an API testing framework and the rules of HTTP come first.

also for the record, I always intended for path to work this way - without users having to "worry" about slashes. it devolved a bit, even in some of my own examples. so IMO this has to be done right going forward

raw path and url are reasonable workarounds. I would have liked to introduce a new non-breaking keyword, but path is so short and sweet we can't hijack it

@krstcs
Copy link

krstcs commented Jun 2, 2021

Recommend looking at another option: We leave path as it is, but if you want to escape a literal character, put a backslash ('\') in front of it. Path would take ALL strings passed in (one or more through comma-separated list of strings or lists-of-strings) and do the following:

  1. split each string in all lists and sub-lists on forward-slash ('/')
  2. encode all non-escaped special characters in each string in the full list
  3. recombine full list using forward-slash ('/') or pass full list into UriBuilder or whichever one is being used

This allows the user to pass in any number of strings as a list, and should include lists of lists as you mentioned earlier. It allows us to escape specials we want to be used literally. It doesn't break most existing tests. I can think of only some weird, odd-ball situations that someone might do, but as @aaronjwhiteside said there are so many ways people use and abuse URIs/URLs/servelets/etc. that we can't account for them all, but I think this method will be the most flexible for the most people. This would also require users to go in and escape all path characters in their tests that are to be literal, that could be an issue for existing tests as well, but should be a one-time change, although that can be a lot of work.

@workwithprashant
Copy link

workwithprashant commented Jun 2, 2021

I agree with @krstcs as the approach will NOT break existing consumers especially after teams have already spent 1-2 sprints on 1.0 migration of their projects.

@krstcs
Copy link

krstcs commented Jun 2, 2021

I'm also thinking, we might want to do the same thing with url for consistency? The backend mechanic would be the same that way. url would just have the task of setting the url itself while path would just add paths to the set url. But I don't know if that makes sense? Maybe everything after the first single-forward-slash gets the same treatment? Edit: i.e.: <literal>protocol://host:port/</literal><encode>the rest</encode>.

@gerben86
Copy link
Contributor

gerben86 commented Jun 3, 2021

@gerben86 I also think this would have worked for you: Given url 'https://stackoverflow.com' + '/' + completePath

I think this will work, but I prefer the path because it's just a path :)

Anyway, I understand that we need to update something, but I'll wait for the final future-proof solution.
Good to see that there's a proper discussion about this point.

@ptrthomas
Copy link
Member

ptrthomas commented Jun 3, 2021

yes, the whole point of the RC is to get this feedback and I'm thankful for the conversation here.

@aaronjwhiteside I like the proposal to keep path as-is and use an escape character \ if a slash is REALLY needed (EDIT to be encoded), I never thought of that option to be honest and thanks @krstcs

let me know if you are ok or I can do this change over the weekend

@aaronjwhiteside
Copy link
Contributor Author

aaronjwhiteside commented Jun 4, 2021

  1. split each string in all lists and sub-lists on forward-slash ('/')

@krstcs sub-lists if I understand you correctly, are not actually supported, and are probably better suited to another issue? I think they could be an interesting feature proposition on their own.

['hello', ['world', '123']] being expanded to /hello/world/123

@aaronjwhiteside I like the proposal to keep path as-is and use an escape character \ if a slash is REALLY needed (EDIT to be encoded), I never thought of that option to be honest and thanks @krstcs

@ptrthomas
I don't think the backslash escape char removes the need for the two path styles currently in the RC.

Consider the use case where I have some dynamic data, say I have called an api and got a response and I'm using some value from the response to make another api call..

Background:
  * def otherFeature = call read('.....') { xxx: yyy }
  * def someId = otherFeature.response.id

Scenario:
  Given url baseUrl
  And path 'service', 'entities', someId
  When method GET

Now consider that someId contains characters I want encoded (/, |, =, etc, remember we cannot make assumptions here)

If path is no longer performs encoding of all characters (/) by default, how do I manually do this? Karate would have to expose some method to escape a string before being able to use it with path. Or I'd have to write one myself, and then the utility of karate is diminished.

Since the escape character \, essentially has to be defined manually/statically/explicitly, why not just define the correct path segments manually/statically/explicitly.

  Given path '/hello/world/this_path_contains_\/_a_slash'

vs

  Given path 'hello', 'world', 'this_path_contains_/_a_slash'
  # or
  Given raw path '/hello/world/this_path_contains_%2F_a_slash'

And at least with the raw path support, existing libraries can be used to URL encode/escape strings..

But maybe the escape character has a use within the raw path step, by allowing an encoding shortcut. For example someone wanting to URL encode / without having to actually lookup what the url encoded value is.. But this still suffers from issue when passing in dynamic data. And of course they could just use the "normal" path step that does the encoding automatically for them..

In summary:
path should not make assumptions or even exceptions (\) to things it encodes, it must encode ALL THE THINGS equally.

@ptrthomas
Copy link
Member

@aaronjwhiteside I hear all your arguments - but FWIW you were the first one ever to find issues with path encoding - so there is an aspect of how relevant this change is to the larger community

maybe we should just revert to 0.9.6 behavior and think about it a little more. I do think the edge case where users need to re-use a path segment from a previous response will always be a full URL - going by REST / HATEOAS concepts.

@aaronjwhiteside
Copy link
Contributor Author

aaronjwhiteside commented Jun 4, 2021

maybe we should just revert to 0.9.6 behavior and think about it a little more.

@ptrthomas this is perfectly ok by me, I think that's all I wanted in the beginning :)

I think the raw path implementation might be pretty close to the old 0.9.6 behaviour. Would need more testing..

I do think the edge case where users need to re-use a path segment from a previous response will always be a full URL - going by REST / HATEOAS concepts.

This is more or less our entire use case for Karate..

We have a platform made up of over 60+ microservices, testing these usually entails making an api call to a top level microservice, gathering the id(s) from the response and querying downstream microservices to ensure state was mutated correctly and or entities were created as expected.

Our APIs are RESTful, but do not follow HATEOAS concepts.. which wouldn't help with cross microservice interaction anyway.

@ptrthomas
Copy link
Member

@aaronjwhiteside I'm still confused with your understanding of path vs raw path it seems to be in reverse to what my understanding is at times when I read all the comments above. can we agree on the below summary

  • path will url-encode everything except /
  • unless the / is prefixed with \
  • existing users who have used /foo/bar in path will not need to change tests
  • I'd like to rename raw path to pathEncoded
  • and pathEncoded will use what is passed as-is, and never encode
  • both path and pathEncoded will support arrays as arguments
  • I will add a karate.urlEncode() and karate.urlDecode() helper for general use

@aaronjwhiteside
Copy link
Contributor Author

aaronjwhiteside commented Jun 4, 2021

I'm still confused with your understanding of path vs raw path it seems to be in reverse to what my understanding is at times when I read all the comments above.

@ptrthomas
Yeah I was expecting this..

At the time I was implementing the raw path logic, I didn't pay enough attention to how the path being passed in would be parsed by the URIBuilder, but after having some time to reflect and thoroughly read the code/javadoc.. I realised it's almost exactly the same as the previous 0.9.6 logic.

https://javadoc.io/doc/org.apache.httpcomponents/httpclient/latest/org/apache/http/client/utils/URIBuilder.html#setPath(java.lang.String)

Sets URI path. The value is expected to be unescaped and may contain non ASCII characters.

It's expected to be unescaped because it parses the path and splits it by the path separator into segments, stored internally, and when building the final url it does encoding of a subset of what it considers special characters, for example | in the path is encoded as %7C, this happens to be the same behaviour as 0.9.6, though I haven't done extensive tests to verify the exact differences, if any.

This obviously wasn't my intention, I wanted it to just accept whatever I passed in as the raw path and not touch it in any way.

But if you just wanted to keep things simple and get back to how path worked in 0.9.6, reusing the current raw path logic as the path logic I think would do this..

can we agree on the below summary

path will url-encode everything except /
existing users who have used /foo/bar in path will not need to change tests

I still think it should encode / too, but for the sake of getting something accepted, I'm not going to fight you on this one. :)

unless the / is prefixed with \

I think I get your comment now about inverse, my take on prefixing something with \ is that is skips encoding.. not forces encoding.. but that's just from how things like regex and strings in java work, If you want something to be treated as verbatim and not have the normal logic applied to it that might be applied by its very nature, you would force it to be escaped.

So using an escape prefix to force something to be encoded, aka treated differently, vs treated as verbatim, doesn't make sense to me, as it goes against prior experience with escaping. And I suspect it would with others too? But maybe this can just be a quirk of Karate, because sometimes things are just not that simple. shrug

Having said all that, I'm willing to accept / only being encoded when prefixed with \.

I think internally we can split the path into segments, so that Given path '/foo/bar', hello (where * def hello = 'world') is represented internally as an array ['foo', 'bar', 'world'] and rendered as /foo/bar/world.

I'd like to rename raw path to pathEncoded

This still feels ambiguous to me:

  • It kind of implies that the path will be encoded?
  • Or that the path is expected to already be encoded?

Maybe pathUnencoded or unencodedPath makes the distinction clearer. raw path I thought was a good attempt at conveying that, but really I'm happy with whatever as long as it's not ambiguous.

and pathEncoded will use what is passed as-is, and never encode

Agree!

On a technical note, to really keep this path untouched, we might have to drop using URIBuilder and go back to building the URL by hand.

both path and pathEncoded will support arrays as arguments

Agree!

I will add a karate.urlEncode() and karate.urlDecode() helper for general use

So long as it correctly encodes for URLs. Remembering that the JDK's URLEncoder doesn't quite do this, as it's intended for url encoded form bodies, which have slightly different rules to actual URLs.

@krstcs
Copy link

krstcs commented Jun 4, 2021

OK, I think I see the issue. Forward slash would be backwards from every other special character in my use-case because it would NOT be encoded normally, but would only be encoded if it was escaped, but then forward-slash IS a special case in URI encoding anyway so it makes sense (to me???) that it is treated differently. Every other special character would be encoded UNLESS it was escaped, forward-slash would be encoded if it IS escaped. Escaping should make the character be handled the reverse of how it's normally handled.

Now, again, I understand the issue that we have here, because one character IS being treated differently than all the rest. But isn't that always the case in URIs??

I still don't see the need for pathEncoded though. I think it's more fluff that isn't necessary if we treat ONLY forward-slash as a special case for escaping.

The other option would be to encoded EVERY special character (so != [a-zA-Z0-9]) that isn't escaped, which would force us to then change all forward-slashes in all tests to be escaped, that doesn't seem to be what we want does it?

@zgael
Copy link

zgael commented Jun 11, 2021

Hey there,
writing my thoughts on the whole path encoding issue after a test I reported in issue #1635

IMHO, there need to be a way to encode URLs, but it should NOT be the default.
See below example, whichever is easier to read ?

Given url 'http://httpbin.org'
And path 'anything/with/a/long/path/but/no/weird/characters'

looks way more readable to me than :

Given url 'http://httpbin.org'
And path 'anything', 'with', 'a', 'long', 'path', 'but', 'no', 'weird', 'characters'

Plus, the former is copy-pastable from a browser/curl command, which is quite handy to write a test easily.

So I would either :

  • make path smart, able to find out if he needs to encode characters or not (defaulting to "do not encode / chars, as they are part of the URL")
  • create alternate keywords, like @ptrthomas suggested with pathEncoded and urlEncoded

These are just my two cents, but I'd be happy to discuss if people disagree with this.

ptrthomas added a commit that referenced this issue Jun 12, 2021
bring back 0.9.6 behavior
forward-slashes are supported
not sure about trailing slashes but not a big deal imo
added karate.urlEncode() and karate.urlDecode() helpers
no need for raw path or equivalent
@ptrthomas
Copy link
Member

ptrthomas commented Jun 12, 2021

to everyone subscribed here, I've made changes to bring back the old behavior, there will be no breaking change.

expect a 1.1.0.RC3 shortly - but it would be great if anyone can test in the meantime following the developer guide

as of now, you can get the http client to encode a forward-slash by doing this:

* path '/hello\\\\/world'

which is kinda hilarious but I don't care anymore lol. haven't dug into it but my guess is most servers will process the %2F as a forward-slash anyway, but maybe this weirdness is worth it if you want to test that your server is indeed doing the right thing

EDIT: no more raw path, people can use the url if needed

karate.urlEncode() and karate.urlDecode() have been added.

@hlemmur
Copy link

hlemmur commented Jun 24, 2021

In case if it helps anyone in the light of the changes above for the case with the json payload in a request body (was struggling trying to get it working with 1.1.0.RC3 until found this thread):

before (1.1.0.RC2 and earlier)

    Given url endpoint
    And path userId, 'decision/'
    And request {caseId : '123', decision: 'approve'}
    When method PUT
    Then status 200

after (1.1.0.RC3)

    Given url endpoint
    And path userId, 'decision\\\/'
    And request {caseId : '123', decision: 'approve'}
    When method PUT
    Then status 200

Couldn't find less ugly (and requiring minimal refactoring) workaround.

@ptrthomas
Copy link
Member

@hlemmur can you explain that more. if your server doesn't work unless the URL ends with a / it is a bug on your server. or is this a "negative" test case ? do you suggest any other behavior ? and does this not work if you build a url without using path ?

@hlemmur
Copy link

hlemmur commented Jun 25, 2021

@ptrthomas no, it's not a negative case. Checked the source code of the requested service, the request handler is defined with a trailing / like

    @ApiOperation(value = "some operation",
            httpMethod = "PUT",
            consumes = "application/json")
    @PutMapping(value = "/{foo}/bar/")
    ...

Trailing / is being cut off even if I put the path into the url, so no difference but it requires more efforts on refactoring.

Though I think my case is a special case, so I don't mind to use \\\/ or %2F to encode the trailing / . It's just confusing that 1.1.0.RC3 is announced with 'no breaking changes', but in fact they are, comparing to RC2.

To me the more consistent behaviour would be to consider adding the trailing / after each path segment including the final one, but not sure if it doesn't break anything else.

@ptrthomas
Copy link
Member

@hlemmur just made the change so at least the url preserves a trailing slash. I still think a server should accept both /foo/ and /foo but anyway. one of the intents for path is to save you the trouble of having to think of / you can ignore them completely and use variables (comma delimited etc).

so yes, in your case I think the escaping is fine. and a rare case, so I don't plan to make any changes to the docs (PR-s are welcome as always)

@ptrthomas
Copy link
Member

an update for all watching this thread, the next version will support preserving a trailing slash even when you use path refer #1863

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants