Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove leading and trailing whitespaces #247

Merged
merged 1 commit into from Aug 14, 2022
Merged

Conversation

big-r81
Copy link
Contributor

@big-r81 big-r81 commented Aug 11, 2022

from field-values to be RFC 7230 compliant.

Fix #246

from field-values to be RFC 7230 compliant.
@@ -215,6 +215,9 @@ tokenize_header_value(undefined) ->
tokenize_header_value(V) ->
reversed_tokens(trim_and_reverse(V, false), [], []).

trim_leading_and_trailing_ws(S) ->
re:replace(S, "^[ \\t]*|[ \\t]*$", "", [global, {return, list}]).
Copy link
Contributor

@rnewson rnewson Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest string:trim(String, both, [9,32]). for performance;

RE input: '     hello there     ', output: 'hello there', speed: 6.276318
TRIM input: '     hello there     ', output: 'hello there', speed: 1.377677
    Count = 1_000_000,
    {Time, _} = timer:tc(fun() -> lists:foreach(fun(_) -> F() end, lists:seq(1, Count)) end),
    Time / Count.```

Copy link
Collaborator

@nickva nickva Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same idea too but then noticed mochiweb supports Erlang 18+ versions and string:trim/1 is an Erlang 20+ function.

string:strip/3 can be used but it would have to be called twice once for space and once for the tab character.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest string:trim(String, both, [9,32]). for performance;

RE input: '     hello there     ', output: 'hello there', speed: 6.276318
TRIM input: '     hello there     ', output: 'hello there', speed: 1.377677

Thats significant. Can we make a runtime distinction and differentiate between the Erlang versions? So if Erlang > 20 use trim/3 else use strip/3?

I had the same idea too but then noticed mochiweb supports Erlang 18+ versions and string:trim/1 is an Erlang 20+ function.
If we want to be RFC compliant, we can't use trim/1 because that would remove more chars than space and tab.

OWS = *( SP / HTAB )
    ; optional whitespace

Copy link
Member

@etrepum etrepum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to know what the performance of trim_and_reverse(trim_and_reverse(S)) would be, that would have the desired effect and leverage code that's already in that module. A few tests to verify the new behavior would be nice too.

@etrepum
Copy link
Member

etrepum commented Aug 12, 2022

On closer read the trim_and_reverse implementation does not quite do what we want here, but something like this might perform better than a regex:

-define(IS_WHITESPACE(C),
        (C =:= $\s orelse C =:= $\t orelse C =:= $\r orelse C =:= $\n)).

is_whitespace(C) ->
    ?IS_WHITESPACE(C).

trim_leading_and_trailing_ws(S) ->
    trim_trailing_ws(lists:dropwhile(fun is_whitespace/1, S), [], 0).

trim_trailing_ws([C | Rest], Acc, N) when ?IS_WHITESPACE(C) ->
    trim_trailing_ws(Rest, [C | Acc], 1 + N);
trim_trailing_ws([C | Rest], Acc, N) ->
    trim_trailing_ws(Rest, [C | Acc], 0);
trim_trailing_ws([], Acc, N) ->
    lists:reverse(lists:nthtail(Acc, N)).

@big-r81
Copy link
Contributor Author

big-r81 commented Aug 12, 2022

On closer read the trim_and_reverse implementation does not quite do what we want here, but something like this might perform better than a regex:

But 2 timestrim_and_reverse would work too and it seems it's the fasted way to get our solution!

1> mochiweb_headers_playground:test().
T1 (trim_leading_and_trailing_ws)  : 13.43003
T2 (trim_and_reverse)              : 0.404285
T3 (trim_leading_and_trailing_ws_2): 0.609125

I made a small playground file which can be used to test this (with all your work and ideas)!

So, let us use the following?

trim_and_reverse(S) ->
    trim_and_reverse(trim_and_reverse(S, false), false).

@etrepum
Copy link
Member

etrepum commented Aug 14, 2022

I'll merge this and apply the suggestions (a test and more performant implementation) in a follow-up

@etrepum etrepum merged commit 16c43ef into mochi:main Aug 14, 2022
@etrepum etrepum mentioned this pull request Aug 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove Whitspaces in header values
4 participants