Skip to content

Conversation

LeafChage
Copy link

I failed to access API with OAuth.
I want to "%20".

@andreaferretti
Copy link
Collaborator

Seems to make sense according to this discussion https://stackoverflow.com/questions/1634271/url-encoding-the-space-character-or-20

@data-man
Copy link
Contributor

PHP has rawurlencode and rawurldecode according to rfc3986.

Note:
rawurldecode() does not decode plus symbols ('+') into spaces. urldecode() does.

Maybe we should also add similar functions.
Or to add raw parameter.

@LeafChage
Copy link
Author

@data-man
I see.

encodeUrl(s: string, rawOn = false)
false => '+'
true => "%20"
like this?

@data-man
Copy link
Contributor

Yes.
The Curl also does this: escape.c
But using iconv: tool_convert.c#L49

@data-man
Copy link
Contributor

And wget's code: url.c#L240

@dom96
Copy link
Contributor

dom96 commented Apr 27, 2018

Python also uses + by default (as described here: https://stackoverflow.com/a/1634314/492186). Gotta love these quirks.

For the best of both worlds this should indeed be customisable. But call the parameter usePlus (it should be true by default) instead of rawOn.

@data-man
Copy link
Contributor

@dom96

it should be true by default

Why?

@LeafChage
Copy link
Author

@dom96
@data-man

it should be true by default
I think it should be false by default, because it turn on raw encoding on purpose.

@data-man
Copy link
Contributor

I vote for false

@dom96
Copy link
Contributor

dom96 commented Apr 27, 2018

Why?

Two reasons:

  1. Consistent with Python
  2. Doesn't break backwards compatibility

@data-man
Copy link
Contributor

Doesn't break backwards compatibility

Therefore, it is preferable to add two new procs, rather than add a new parameter.

@data-man
Copy link
Contributor

And I think that we need to add references to RFCs in the docs.

@Araq
Copy link
Member

Araq commented Apr 27, 2018

Therefore, it is preferable to add two new procs, rather than add a new parameter.

I prefer a new parameter.

@data-man
Copy link
Contributor

@Araq
true or not true, that is the question. (Inspired by W. Shakespeare)

@Araq
Copy link
Member

Araq commented Apr 28, 2018

@data-man what @dom96 said.

@LeafChage
Copy link
Author

How is this?

@data-man
Copy link
Contributor

@LeafChage
I prefer

proc encodeUrl*(s: string, rawOn = false): string =
  ## Encodes a value to be HTTP safe: This means that characters in the set
  ## ``{'A'..'Z', 'a'..'z', '0'..'9', '_'}`` are carried over to the result,
  ## a space is converted to ``'%20'`` and every other character is encoded as
  ## ``'%xx'`` where ``xx`` denotes its hexadecimal value.
  result = newStringOfCap(s.len + s.len shr 2) # assume 12% non-alnum-chars
  if rawOn:
    for c in s:
      case c
      of 'a'..'z', 'A'..'Z', '0'..'9', '_', '-', '.', '~': add(result, c)
      else:
        add(result, '%')
        add(result, toHex(ord(c), 2))
  else:
    for c in s:
      case c
      of 'a'..'z', 'A'..'Z', '0'..'9', '_': add(result, c)
      of ' ': add(result, '+')
      else:
        add(result, '%')
        add(result, toHex(ord(c), 2))

@LeafChage
Copy link
Author

As I thought, I think so.
I am going to rewrite.

@data-man
Copy link
Contributor

And maybe to rename rawOn to asRaw.

@dom96
Copy link
Contributor

dom96 commented Apr 29, 2018

How did we end up here? I asked for usePlus and you're making completely unrelated changes.

@dom96
Copy link
Contributor

dom96 commented Apr 29, 2018

Once again, revert back to the original. Then simply add a usePlus parameter, add an if that checks whether it's true and add + or %20 when it's not.

@data-man
Copy link
Contributor

Maybe asRFC3986?

@dom96
Copy link
Contributor

dom96 commented Apr 29, 2018

If other characters need to be handled differently as well then please state what they are so we can decide what to do about them.

@LeafChage
Copy link
Author

@data-man

I guess so.
It is is easy to understand.

@LeafChage
Copy link
Author

LeafChage commented Apr 29, 2018

@dom96
Should I add only usePlus for changing only '+' to "%20"?

@LeafChage
Copy link
Author

I add only usePlus for now.

@LeafChage
Copy link
Author

LeafChage commented Apr 29, 2018

Which is good?

# A
proc encodeUrl*(s: string, usePlus = false): string =                                                                                                                                                                                                                                                                                                                                                                            
  result = newStringOfCap(s.len + s.len shr 2) # assume 12% non-alnum-chars                                                                                                                                                                                                                                                                                                                                                      
  for i in 0..s.len-1:                                                                                                                                                                                                                                                                                                                                                                                                           
    case s[i]                                                                                                                                                                                                                                                                                                                                                                                                                    
    of 'a'..'z', 'A'..'Z', '0'..'9', '_': add(result, s[i])                                                                                                                                                                                                                                                                                                                                                                      
    of ' ': add(result, if usePlus: "%20" else: "+")                                                                                                                                                                                                                                                                                                                                                                             
    else:                                                                                                                                                                                                                                                                                                                                                                                                                        
      add(result, '%')                                                                                                                                                                                                                                                                                                                                                                                                           
      add(result, toHex(ord(s[i]), 2))     

or

# B
proc encodeUrl*(s: string, usePlus = false): string =                                                                                                                                                                                                                                                                                                                                                                            
  result = newStringOfCap(s.len + s.len shr 2) # assume 12% non-alnum-chars                                                                                                                                                                                                                                                                                                                                                      
  for i in 0..s.len-1:                                                                                                                                                                                                                                                                                                                                                                                                           
    case s[i]                                                                                                                                                                                                                                                                                                                                                                                                                    
    of 'a'..'z', 'A'..'Z', '0'..'9', '_': add(result, s[i])                                                                                                                                                                                                                                                                                                                                                                      
    of ' ':
        if usePlus:
            add(result, "%20")
        else:
            add(result, '+')  
    else:                                                                                                                                                                                                                                                                                                                                                                                                                        
      add(result, '%')                                                                                                                                                                                                                                                                                                                                                                                                           
      add(result, toHex(ord(s[i]), 2))     

I like the former.

@data-man
Copy link
Contributor

Well, the benchmarks: :)

GlobalBenchmark 263.80ps 3.79G
bench.nim relative time/iter iters/s
encodeUrlWithTrue 135.66ns 7.37M
encodeUrlFormerWithTrue 91.09% 148.92ns 6.71M
encodeUrlWithFalse 114.58ns 8.73M
encodeUrlFormerWithFalse 77.63% 147.59ns 6.78M

@LeafChage
Copy link
Author

Thank you:)
Well......
Is it better than encodeUrlWith?

proc encodeUrl*(s: string, usePlus = false): string =                                                                                                                                                                                                                                                                                                                                                                            
  result = newStringOfCap(s.len + s.len shr 2) # assume 12% non-alnum-chars                                                                                                                                                                                                                                                                                                                                                      
  for i in 0..s.len-1:                                                                                                                                                                                                                                                                                                                                                                                                           
    case s[i]                                                                                                                                                                                                                                                                                                                                                                                                                    
    of 'a'..'z', 'A'..'Z', '0'..'9', '_': add(result, s[i])                                                                                                                                                                                                                                                                                                                                                                      
    of ' ': add(result, if usePlus: "%20" else: "+")                                                                                                                                                                                                                                                                                                                                                                             
    else:                                                                                                                                                                                                                                                                                                                                                                                                                        
      add(result, '%')                                                                                                                                                                                                                                                                                                                                                                                                           
      add(result, toHex(ord(s[i]), 2)) 

@data-man
Copy link
Contributor

data-man commented Apr 29, 2018

I compare your latest version with #7700 (comment)

@LeafChage
Copy link
Author

I edited #7700 (comment)
Do you compare A to B?

@data-man
Copy link
Contributor

No.
My version with for c in s and B.

@LeafChage
Copy link
Author

LeafChage commented Apr 29, 2018

@data-man
OK! Thank you.
I rewrite code more speedy and easy to understand.
I refer to bench

@data-man
Copy link
Contributor

data-man commented Apr 29, 2018

But why you left if usePlus: in the loop?
Exactly this part slows down conversion.

bench.nim relative time/iter iters/s
encodeUrlWithTrue 142.36ns 7.02M
encodeUrlLatestWithTrue 103.24% 137.89ns 7.25M
encodeUrlWithFalse 115.03ns 8.69M
encodeUrlLatestWithFalse 83.19% 138.27ns 7.23M

Result №2:

bench.nim relative time/iter iters/s
encodeUrlWithTrue 135.90ns 7.36M
encodeUrlLatestWithTrue 96.32% 141.09ns 7.09M
encodeUrlWithFalse 115.18ns 8.68M
encodeUrlLatestWithFalse 81.65% 141.05ns 7.09M

@data-man
Copy link
Contributor

data-man commented Apr 29, 2018

Also, I continue to prefer on compliance with RFC3986.
You can compare our implementation with https://www.url-encode-decode.com, e.g.
Or with https://www.freeformatter.com/url-encoder.html

Results with dots differ!

The unreserved characters can be encoded, but should not be encoded. The unreserved characters are:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~

The reserved characters have to be encoded only under certain circumstances. The reserved characters are:
! * ' ( ) ; : @ & = + $ , / ? % # [ ]

@LeafChage
Copy link
Author

LeafChage commented Apr 30, 2018

@data-man

But why you left if usePlus: in the loop?

Exactly, I think too that my code is not good.

@LeafChage
Copy link
Author

LeafChage commented Apr 30, 2018

@data-man

Also, I continue to prefer on compliance with RFC3986.

I prefer too.
But, can I continue it in this pull request?

@data-man
Copy link
Contributor

@LeafChage

I prefer too.
But, can I continue it in this pull request?

@Araq @dom96 What do you think?

@LeafChage
Copy link
Author

How is this?

@dom96
Copy link
Contributor

dom96 commented May 3, 2018

I did some reading on this. Here is how I think this should work:

  • usePlus parameter should determine how (Space) is encoded.
    • true - encode as + (default).
    • false - encode as %20.
  • Comply with RFC3986 regardless of the usePlus value.

So this PR is almost ready.

@LeafChage
Copy link
Author

@dom96
Ok, I will try it.

@LeafChage
Copy link
Author

I referred to escape.c.

@dom96
Copy link
Contributor

dom96 commented May 3, 2018

I ended up merging this manually and making some other changes too. Thank you for your PR :)

@dom96 dom96 closed this May 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants