Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add encoders to the options #28

Closed
wants to merge 1 commit into from

Conversation

williamthome
Copy link
Contributor

@williamthome williamthome commented Oct 10, 2023

Close #27

This PR adds the possibility to define custom encoders.
In this way, a custom encoder can be defined in the unknown option for an unmatched value.
The original idea is from @leonardb (thanks). I found it really useful.
This PR solves the inets issue in #25.
I've not implemented the date encoders that @leonardb has implemented in #26.
Suggestions welcome.

Copy link
Owner

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello!

I'm unsure about adding this level of flexibility to the package. How does it benchmark against the previous version?

@williamthome
Copy link
Contributor Author

Sorry, missed the benchmarks.
Mostly it's better, but I don't know why.

Better

image

image

image

image

image

Worse

image

image

image

@williamthome
Copy link
Contributor Author

Ran it again.

Better

image

image

image

image

image

image

Worst

image

image

@williamthome
Copy link
Contributor Author

I think there is no loss here.
The results are almost the same or better.

@williamthome williamthome force-pushed the feat/encoder branch 2 times, most recently from 4568e92 to b1cca12 Compare October 11, 2023 11:03
@williamthome
Copy link
Contributor Author

I've updated the PR with minor changes to maintain consistency and improve performance based on @michalmuskala's comment in #27.

@leonardb
Copy link
Contributor

leonardb commented Oct 11, 2023 via email

@williamthome
Copy link
Contributor Author

williamthome commented Oct 11, 2023

The performance is almost the same using map_size/1 or =:= #{}:

 ./erlperf 'map_size(#{a=>a,b=>b,c=>c,d=>d,e=>e}) =:= 0.' '#{a=>a,b=>b,c=>c,d=>d,e=>e} =:= #{}.'
Code                                                 ||        QPS       Time   Rel
#{a=>a,b=>b,c=>c,d=>d,e=>e} =:= #{}.                  1   76647 Ki      13 ns  100%
map_size(#{a=>a,b=>b,c=>c,d=>d,e=>e}) =:= 0.          1   76297 Ki      13 ns  100%

But =:= #{} removes the overhead of the map check:

with_overhead(M) when is_map(M), map_size(M) =:= 0 -> M.

without_overhead(M) when M =:= #{} -> M.

BTW, the benchmark is almost the same:

./erlperf 'M = #{a=>a,b=>b,c=>c,d=>d,e=>e}, is_map(M) andalso map_size(M) =:= 0.' 'M = #{a=>a,b=>b,c=>c,d=>d,e=>e}, M =:= #{}.'
Code                                                                          ||        QPS       Time   Rel
M = #{a=>a,b=>b,c=>c,d=>d,e=>e}, M =:= #{}.                                    1   77480 Ki      12 ns  100%
M = #{a=>a,b=>b,c=>c,d=>d,e=>e}, is_map(M) andalso map_size(M) =:= 0.          1   77461 Ki      12 ns  100%

@williamthome
Copy link
Contributor Author

FWIW, the last change in the PR does not remove the overhead.

@williamthome
Copy link
Contributor Author

Looking at the code again, I think the encoders should be exposed, e.g. I need to parse a key-value pair (which it's a proplist) into a datetime, but I would need to call the function that exists in thoas that encodes proplists:

encode(Data) ->
    Opts = #{encoders => #{proplist => proplist/3},
    thoas:encode(Data, Opts).

proplist({{_,_,_},{_,_,_}} = Date, Escape, Encoders) ->
    datetime(Date); % <- custom
proplist(KV, Escape, Encoders) ->
    thoas_proplist_encoder:encode(KV, Escape, Encoders). % <- new module (exposed encoder)

datetime(Date) ->
    Date.

So, a behavior can be created:

-module(thoas_encoder).

% Just an example, I'm unsure about the types now.
-callback encode(Value, Escape, Encoders) -> Result
    when Value :: term(), 
         Escape :: function(), 
         Encoders :: map(), 
         Result :: iodata().

What do you think? Any other suggestions?

@leonardb
Copy link
Contributor

@williamthome I'm not sure you do. Date-time is a 2-tuple, not a proplist so would hit the unknown encoder case

Defining a behavior would definitely be helpful from a documenting-how-to-write-a-custom-encoder perspective

@williamthome
Copy link
Contributor Author

williamthome commented Oct 11, 2023

Yes, you are right, my example is wrong.
That's what I wanted to say:

proplist([{{_,_,_},{_,_,_}} = Date | T], Escape, Encoders) ->
    % loop

But makes no sense.
Please ignore my example in the previous comment.

@williamthome
Copy link
Contributor Author

williamthome commented Oct 11, 2023

Anyway, the behavior idea I believe to be valid.

@leonardb
Copy link
Contributor

Thought I'd just drop this here as a potential example for documentation updates should this be accepted.

thoas:encode(MyData, #{encoders => #{unknown => fun my_encoder:encode/3}}).
%% Example custom encoder module
%% Extends the default encoder module to
%% support IP addresses and CIDR encoding
-module(my_encoder).

%% API
-export([encode/3]).

encode({O1, O2, O3, O4} = InVal, Escape, #{binary := Encode} = Encoders)
  when is_integer(O1) andalso O1 >= 0 andalso O1 =< 255 andalso
       is_integer(O2) andalso O2 >= 0 andalso O2 =< 255 andalso
       is_integer(O3) andalso O3 >= 0 andalso O3 =< 255 andalso
       is_integer(O4) andalso O4 >= 0 andalso O4 =< 255 ->
    case inet:ntoa(InVal) of
        {error, einval} ->
            error(invalid_type);
        IpStr ->
            Encode(list_to_binary(IpStr), Escape, Encoders)
    end;
encode({{O1, O2, O3, O4} = Ip, Mask}, Escape, #{binary := Encode} = Encoders)
  when is_integer(O1) andalso O1 >= 0 andalso O1 =< 255 andalso
       is_integer(O2) andalso O2 >= 0 andalso O2 =< 255 andalso
       is_integer(O3) andalso O3 >= 0 andalso O3 =< 255 andalso
       is_integer(O4) andalso O4 >= 0 andalso O4 =< 255 andalso
       is_integer(Mask) andalso Mask >= 0 andalso Mask =< 32 ->
    case inet:ntoa(Ip) of
        {error, einval} ->
            error(invalid_type);
        IpStr ->
            MaskBin = integer_to_binary(Mask),
            IpBin = list_to_binary(IpStr),
            Encode(<<IpBin/binary, "/", MaskBin/binary>>, Escape, Encoders)
    end;
encode({O1, O2, O3, O4, O5, O6, O7, O8} = InVal, Escape, #{binary := Encode} = Encoders)
  when is_integer(O1) andalso O1 >= 0 andalso O1 =< 65535 andalso
       is_integer(O2) andalso O2 >= 0 andalso O2 =< 65535 andalso
       is_integer(O3) andalso O3 >= 0 andalso O3 =< 65535 andalso
       is_integer(O4) andalso O4 >= 0 andalso O4 =< 65535 andalso
       is_integer(O5) andalso O5 >= 0 andalso O5 =< 65535 andalso
       is_integer(O6) andalso O6 >= 0 andalso O6 =< 65535 andalso
       is_integer(O7) andalso O7 >= 0 andalso O7 =< 65535 andalso
       is_integer(O8) andalso O8 >= 0 andalso O8 =< 65535 ->
    case inet:ntoa(InVal) of
        {error, einval} ->
            error(invalid_type);
        IpStr ->
            Encode(list_to_binary(IpStr), Escape, Encoders)
    end;
encode({{O1, O2, O3, O4, O5, O6, O7, O8} = Ip, Mask}, Escape, #{binary := Encode} = Encoders)
  when is_integer(O1) andalso O1 >= 0 andalso O1 =< 65535 andalso
       is_integer(O2) andalso O2 >= 0 andalso O2 =< 65535 andalso
       is_integer(O3) andalso O3 >= 0 andalso O3 =< 65535 andalso
       is_integer(O4) andalso O4 >= 0 andalso O4 =< 65535 andalso
       is_integer(O5) andalso O5 >= 0 andalso O5 =< 65535 andalso
       is_integer(O6) andalso O6 >= 0 andalso O6 =< 65535 andalso
       is_integer(O7) andalso O7 >= 0 andalso O7 =< 65535 andalso
       is_integer(O8) andalso O8 >= 0 andalso O8 =< 65535 andalso
       is_integer(Mask) andalso Mask >= 0 andalso Mask =< 64 ->
    case inet:ntoa(Ip) of
        {error, einval} ->
            error(invalid_type);
        IpStr ->
            MaskBin = integer_to_binary(Mask),
            IpBin = list_to_binary(IpStr),
            Encode(<<IpBin/binary, "/", MaskBin/binary>>, Escape, Encoders)
    end;
encode(_InVal, _Escape, _Encoders) ->
    error(invalid_type).

@williamthome
Copy link
Contributor Author

williamthome commented Oct 11, 2023

The PR is now up to date with the main branch.
I'll be waiting for @lpil's feedback before implementing or fixing things.

@eproxus
Copy link
Contributor

eproxus commented Oct 12, 2023

You don't need both is_map/1 and map_size/1 in a guard, the latter is enough.

I think map_size(Map) == 0 is much more readable than Map =:= #{}, especially since normal pattern matching on an empty map usually matches non-empty maps too.

@williamthome
Copy link
Contributor Author

I think map_size(Map) == 0 is much more readable than Map =:= #{}

I also think so. The current implementation of this PR can be with map_size/1.
This point was discussed in #27.

@eproxus
Copy link
Contributor

eproxus commented Oct 12, 2023

I still think date/datetime encoding (and not decoding, strangely enough?) does not belong in the JSON library itself, since it is not part of the JSON spec (same with inet data, and whatever else we can think of...).

May I suggest as structure where encoders are not based on hard coded patterns but are passed through a sequence of modules/functions instead?

Something like thoas:encode(Term, #{encoders => [thoas_default_encoders, my_custom_encoders]}). Which would then run each term through thoas_default_encoders:encode/1 and if no match is found, try my_custom_encoders:encode/1 etc.

That way, it would be easy to add some dependency like thoas_datetime and add it to the option list. The same structure could be used for adding decoders as well.

@williamthome
Copy link
Contributor Author

Something like thoas:encode(Term, #{encoders => [thoas_default_encoders, my_custom_encoders]}). Which would then run each term through thoas_default_encoders:encode/1 and if no match is found, try my_custom_encoders:encode/1 etc.

In this way overriding a specific encoder it's not possible. Allow override specific encoders gives more control to the user, IMHO.

@eproxus
Copy link
Contributor

eproxus commented Oct 12, 2023

You can override it if you match on the same pattern as the encoder you want to override. E.g.

-module(my_overrides).
-export([encode/1]).
encode(Integer) when is_integer(Integer) -> "foo";
encode(_) -> pass. % Or whatever the API will be...

And then use it like so thoas:encode(Term, #{encoders => [my_overrides, thoas_default_encoders]}).

@eproxus
Copy link
Contributor

eproxus commented Oct 12, 2023

The problem with the map-based solution is that you can't add new patterns. Let's say I want to encode #user{name => "Foo", id => 13} to something special in JSON, it is not possible. Or if someone wants to add inet encoding/decoding, that is also not possible.

@williamthome
Copy link
Contributor Author

williamthome commented Oct 12, 2023

Hmm sorry if I hadn't understood what you said, but you can define any custom encoder using the unknown option.
See @leonardb comment/example:

thoas:encode(MyData, #{encoders => #{unknown => fun my_encoder:encode/3}}).

The default encoders will be merged to these options, so the result will be this:

#{
        atom => fun(Atom, Escape, _) -> encode_atom(Atom, Escape) end,
        binary => fun(Bin, Escape, _) -> encode_string(Bin, Escape) end,
        integer => fun(Int, _, _) -> integer(Int) end,
        float => fun(Float, _, _) -> float(Float) end,
        proplist => fun map_naive/3,
        list => fun list/3,
        map => fun map/3,
        date => fun(Date, Escape, _) -> date(Date, Escape) end,
        datetime => fun(Date, Escape, _) -> datetime(Date, Escape) end,
        unknown => fun my_encoder:encode/3 % <- CUSTOM HERE  
    }

@leonardb
Copy link
Contributor

Probably not the greatest idea, but what about adding a pre_encode (name could be better) key to the encoders map with a default noop fun?

This would allow for both cases then. You'd be able to run your custom encoder for special shapes before the data is handed to the default encoders and additionally add an encoder for unknown for unmatched types.

Alternatively just support the pre_encode which means any 'shape' of data can be handled in a user-defined way and then handed to the default encoders and crash for unmatched/unknown

@williamthome
Copy link
Contributor Author

IMHO, the current implementation of the PR is the cleanest and does not add any extra overhead to the code, also does not decrease the performance.

@lpil
Copy link
Owner

lpil commented Oct 16, 2023

Hello! After some thought I don't believe that the API of this library should be expanded in this way. My goal was to make a library as simple as possible with speeds similar to Jason, and I don't think this contributes to that.

Thank you once again. If you wish to publish a JSON project do let me know and I'll add a link to the README.

@lpil lpil closed this Oct 16, 2023
@williamthome
Copy link
Contributor Author

I just published a library that accepts more flexibility, I called it Euneus.
Let me know if something is not looking good for you, @lpil.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants