Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for little endian packet sizes in gen_tcp #7265

Open
hauleth opened this issue May 19, 2023 · 8 comments
Open

Support for little endian packet sizes in gen_tcp #7265

hauleth opened this issue May 19, 2023 · 8 comments
Assignees
Labels
enhancement help wanted Issue not worked on by OTP; help wanted from the community team:PS Assigned to OTP team PS team:VM Assigned to OTP team VM

Comments

@hauleth
Copy link
Contributor

hauleth commented May 19, 2023

Is your feature request related to a problem? Please describe.

I want to implement protocol that uses gen_tcp and uses little endian for packet length definitions.

Describe the solution you'd like

Something like:

{ok, Socket} = gen_tcp:connect(Node, Port, [
  {packet, {4, little}}
]).

Or:

{ok, Socket} = gen_tcp:connect(Node, Port, [
  {packet, 4},
  {packet_endianess, little}
]).

Would make it quite handy.

Describe alternatives you've considered

It is still possible to do it manually right now via {active, false} or similar to capture the length and doing everything manually, it is just a little bit more convenient to have it as a part of packet option.

@rickard-green rickard-green added team:VM Assigned to OTP team VM team:PS Assigned to OTP team PS labels May 22, 2023
@sverker
Copy link
Contributor

sverker commented May 22, 2023

This is not high prio for us but would be reasonable easy to implement.
Do git grep TCP_PB_4 and add corresponding code for TCP_PB_LITTLE_4 and maybe also TCP_PB_LITTLE_2 for completeness.

@sverker sverker added the help wanted Issue not worked on by OTP; help wanted from the community label May 22, 2023
@tonyrog
Copy link
Contributor

tonyrog commented Jun 19, 2023

Do not forget the https://www.erlang.org/eeps/eep-0034 while you are at it :-)

@ptome
Copy link

ptome commented Dec 8, 2023

I was considering creating a PR for this, but EEP-34 mentions that the author implemented it. Is there an existing PR for this feature?

I like the use of negative numbers for the PacketType described in EEP-34 because we can keep the same type. If we use an atom, something like '4_little' would require quotes, since it starts with a digit, and little_4 breaks symmetry (subjective).

For gen_tcp:connect options, something like {packet, 4, little} or {packet, {4, little}} also introduces some asymmetry, since packet type is also used in erlang:decode_packet. It would create a tuple parameter, when all others are atoms or integers, depending on how we pass on the arguments.

@hauleth
Copy link
Contributor Author

hauleth commented Jan 5, 2024

I would go with {4, little} as it would allow us to extend it in future, as I see one more improvement possible there. Because in general we have 2 configuration options:

  • endianess
  • whether size include bytes for size itself

So in total there are 4 combinations for each of possible sizes. We could leave negative numbers for inclusive/exclusive sizes and have textual flag for endianness. That allows us to cover all possible combinations:

  • {N, big} (the same as current N) - big-endian, size exclusive
  • {-N, big} - big-endian, size inclusive
  • {N, little} - little-endian, size exclusive
  • {-N, little} - little-endian, size inclusive

I do not see how that would be a problem for erlang:decode_packet/3 as it could be implemented there as well in exactly the same format as described there.

@wojtekmach
Copy link
Contributor

wojtekmach commented Jan 5, 2024

I think this proposal or EEP-34 is a very welcome change.

For anyone interested, here's a proof-of-concept for 3 byte little-endian packet headers using the {packet, -N} notation per https://www.erlang.org/eeps/eep-0034#new-packet-types. An example use case is parsing MySQL packets 1, here focused on the handshake 2.

MySQL packets are <<Len:24/little, Seq:8, Payload:Len/binary>>. The handshake is (essentially) <<10, ServerVsn/binary, 0, ...>>.

$ docker run --rm -p 3306:3306 -e MYSQL_ALLOW_EMPTY_PASSWORD=true mysql
$ cat mysql.escript
#!/usr/bin/env escript

main(_) ->
  connect1(),
  connect2(),
  ok.

connect1() ->
  {ok, S} = gen_tcp:connect("localhost", 3306, [binary, {active, false}]),
  {ok, Data} = gen_tcp:recv(S, 0),
  <<Len:24/little, _Seq:8, Rest0:Len/binary>> = Data,
  <<10, Rest1/binary>> = Rest0,
  [ServerVsn, _Rest] = binary:split(Rest1, <<0>>),
  io:format("server version: ~s~n", [ServerVsn]).

connect2() ->
  {ok, S} = gen_tcp:connect("localhost", 3306, [binary, {active, false}, {packet, -3}]),
  {ok, Data} = gen_tcp:recv(S, 0),
  <<_Seq:8, 10, Rest/binary>> = Data,
  [ServerVsn, _Rest] = binary:split(Rest, <<0>>),
  io:format("server version: ~s~n", [ServerVsn]).
$ ./mysql.escript
server version: 8.2.0
server version: 8.2.0

Below is a diff, also available at https://github.com/wojtekmach/otp/commits/wm-little/.

commit fb3411ec6aeb84f9c7ef1116108f90a5a1e5845a
Author: Wojtek Mach <wojtek@wojtekmach.pl>
Date:   Fri Jan 5 21:58:00 2024 +0100

    wip

diff --git a/erts/emulator/beam/erl_bif_port.c b/erts/emulator/beam/erl_bif_port.c
index d58dd3a531..0bad398e4b 100644
--- a/erts/emulator/beam/erl_bif_port.c
+++ b/erts/emulator/beam/erl_bif_port.c
@@ -1512,6 +1512,7 @@ BIF_RETTYPE decode_packet_3(BIF_ALIST_3)
     case make_small(1): type = TCP_PB_1; break;
     case make_small(2): type = TCP_PB_2; break;
     case make_small(4): type = TCP_PB_4; break;
+    case make_small(-3): type = TCP_PB_LITTLE_3; break;
     case am_asn1: type = TCP_PB_ASN1; break;
     case am_sunrm: type = TCP_PB_RM; break;
     case am_cdr: type = TCP_PB_CDR; break;
diff --git a/erts/emulator/beam/packet_parser.c b/erts/emulator/beam/packet_parser.c
index a349c3ff84..428b151a4b 100644
--- a/erts/emulator/beam/packet_parser.c
+++ b/erts/emulator/beam/packet_parser.c
@@ -281,6 +281,13 @@ int packet_get_length(enum PacketParseType htype,
         plen = get_int32(ptr);
         goto remain;
 
+    case TCP_PB_LITTLE_3:
+        /* TCP_PB_LITTLE_3:    [L2,L1,L0 | Data] */
+        hlen = 3;
+        if (n < hlen) goto more;
+        plen = get_little_int24(ptr);
+        goto remain;
+
     case TCP_PB_RM:
         /* TCP_PB_RM:    [L3,L2,L1,L0 | Data] 
         ** where MSB (bit) is used to signal end of record
diff --git a/erts/emulator/beam/packet_parser.h b/erts/emulator/beam/packet_parser.h
index 633e3794c5..ef462ed072 100644
--- a/erts/emulator/beam/packet_parser.h
+++ b/erts/emulator/beam/packet_parser.h
@@ -44,7 +44,8 @@ enum PacketParseType {
     TCP_PB_HTTPH    = 11,
     TCP_PB_SSL_TLS  = 12,
     TCP_PB_HTTP_BIN = 13,
-    TCP_PB_HTTPH_BIN = 14
+    TCP_PB_HTTPH_BIN = 14,
+    TCP_PB_LITTLE_3  = 15,
 };
 
 typedef struct http_atom {
@@ -150,9 +151,10 @@ ERTS_GLB_INLINE
 void packet_get_body(enum PacketParseType htype, const char** bufp, int* lenp)
 {
     switch (htype) {
-    case TCP_PB_1:  *bufp += 1; *lenp -= 1; break;
-    case TCP_PB_2:  *bufp += 2; *lenp -= 2; break;
-    case TCP_PB_4:  *bufp += 4; *lenp -= 4; break;
+    case TCP_PB_1:        *bufp += 1; *lenp -= 1; break;
+    case TCP_PB_2:        *bufp += 2; *lenp -= 2; break;
+    case TCP_PB_4:        *bufp += 4; *lenp -= 4; break;
+    case TCP_PB_LITTLE_3: *bufp += 3; *lenp -= 3; break;
     case TCP_PB_FCGI:
 	*lenp -= ((struct fcgi_head*)*bufp)->paddingLength;
         break;
diff --git a/erts/emulator/beam/sys.h b/erts/emulator/beam/sys.h
index 8f4bc5e0a1..2910b67f22 100644
--- a/erts/emulator/beam/sys.h
+++ b/erts/emulator/beam/sys.h
@@ -1213,6 +1213,10 @@ ERTS_GLB_INLINE size_t sys_strlen(const char *s)
                       (((byte*) (s))[2] << 8)  | \
                       (((byte*) (s))[3]))
 
+#define get_little_int24(s) ((((byte*) (s))[2] << 16)  | \
+			     (((byte*) (s))[1] << 8) | \
+			     (((byte*) (s))[0]))
+
 #define get_little_int32(s) ((((byte*) (s))[3] << 24) | \
 			     (((byte*) (s))[2] << 16)  | \
 			     (((byte*) (s))[1] << 8) | \
@@ -1236,6 +1240,11 @@ ERTS_GLB_INLINE size_t sys_strlen(const char *s)
                             ((byte*)(s))[2] = (byte)(i)         & 0xff;} \
                         while (0)
 
+#define put_little_int24(i, s) do {((byte*)(s))[2] = (byte)((i) >> 16) & 0xff;  \
+                                   ((byte*)(s))[1] = (byte)((i) >> 8)  & 0xff;  \
+                                   ((byte*)(s))[0] = (byte)(i)         & 0xff;} \
+                               while (0)
+
 #define get_int16(s) ((((byte*)  (s))[0] << 8) | \
                       (((byte*)  (s))[1]))
 
diff --git a/erts/emulator/drivers/common/inet_drv.c b/erts/emulator/drivers/common/inet_drv.c
index 66a5f77d5d..7f6467cb5f 100644
--- a/erts/emulator/drivers/common/inet_drv.c
+++ b/erts/emulator/drivers/common/inet_drv.c
@@ -13147,6 +13147,10 @@ static int tcp_sendv(tcp_descriptor* desc, ErlIOVec* ev)
          put_int32(len, buf);
          h_len = 4;
          break;
+     case TCP_PB_LITTLE_3:
+         put_little_int24(len, buf);
+         h_len = 3;
+         break;
      default:
          h_len = 0;
          break;
@@ -13259,9 +13263,13 @@ static int tcp_send(tcp_descriptor* desc, char* ptr, ErlDrvSizeT len)
 	put_int16(len, buf);
 	h_len = 2; 
 	break;
-    case TCP_PB_4: 
+    case TCP_PB_4:
 	put_int32(len, buf);
-	h_len = 4; 
+	h_len = 4;
+	break;
+    case TCP_PB_LITTLE_3:
+	put_little_int24(len, buf);
+	h_len = 3;
 	break;
     default:
 	h_len = 0;
diff --git a/erts/preloaded/ebin/prim_inet.beam b/erts/preloaded/ebin/prim_inet.beam
index 5f8d5b8709..df9102386b 100644
Binary files a/erts/preloaded/ebin/prim_inet.beam and b/erts/preloaded/ebin/prim_inet.beam differ
diff --git a/erts/preloaded/src/prim_inet.erl b/erts/preloaded/src/prim_inet.erl
index a9264d551a..5c0bb1bae6 100644
--- a/erts/preloaded/src/prim_inet.erl
+++ b/erts/preloaded/src/prim_inet.erl
@@ -1495,6 +1495,8 @@ attach(S) when is_port(S) ->
 %%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
+is_sockopt_val(packet, -3) ->
+    true;
 is_sockopt_val(Opt, Val) ->
     Type = type_opt(set, Opt),
     try type_value(set, Type, Val)
@@ -1702,6 +1704,7 @@ type_opt_1(packet) ->
 	   {1, ?TCP_PB_1},
 	   {2, ?TCP_PB_2},
 	   {4, ?TCP_PB_4},
+	   {-3, ?TCP_PB_LITTLE_3},
 	   {raw,?TCP_PB_RAW},
 	   {sunrm, ?TCP_PB_RM},
 	   {asn1, ?TCP_PB_ASN1},
diff --git a/lib/kernel/src/gen_tcp_socket.erl b/lib/kernel/src/gen_tcp_socket.erl
index 38be596b44..756b3610ed 100644
--- a/lib/kernel/src/gen_tcp_socket.erl
+++ b/lib/kernel/src/gen_tcp_socket.erl
@@ -68,6 +68,9 @@
 -define(header(Packet, Size),
         (Size):(Packet)/unit:8-integer-big-unsigned).
 
+-define(little_header(Packet, Size),
+        (Size):(Packet)/unit:8-integer-little-unsigned).
+
 -define(badarg_exit(Error),
         case begin Error end of
             {error, badarg} -> exit(badarg);
@@ -512,6 +515,13 @@ send(?MODULE_socket(Server, Socket), Data) ->
                     Header_Data = [Header, Data],
                     Result = socket_send(Socket, Header_Data, SendTimeout),
                     send_result(Server, Header_Data, Meta, Result);
+                Packet =:= -3 ->
+                    Size = iolist_size(Data),
+		    %% ?DBG([{packet, Packet}, {data_size, Size}]),
+                    Header = <<?little_header(3, Size)>>,
+                    Header_Data = [Header, Data],
+                    Result = socket_send(Socket, Header_Data, SendTimeout),
+                    send_result(Server, Header_Data, Meta, Result);
                 true ->
                     Result = socket_send(Socket, Data, SendTimeout),
                     send_result(Server, Data, Meta, Result)
diff --git a/lib/kernel/src/inet_int.hrl b/lib/kernel/src/inet_int.hrl
index 2f50f2c23c..70963bffb1 100644
--- a/lib/kernel/src/inet_int.hrl
+++ b/lib/kernel/src/inet_int.hrl
@@ -212,6 +212,7 @@
 -define(TCP_PB_SSL_TLS, 12).
 -define(TCP_PB_HTTP_BIN,13).
 -define(TCP_PB_HTTPH_BIN,14).
+-define(TCP_PB_LITTLE_3, 15).
 
 
 %% getstat, INET_REQ_GETSTAT

@sverker
Copy link
Contributor

sverker commented Jan 8, 2024

I prefer {4, little} . I do not like negative values meaning little endian.

@codeadict
Copy link
Contributor

Is anyone working on a PR for this one?

@wojtekmach
Copy link
Contributor

I'm not! It's on my long term TODO list but if anyone would like to pick it up, by all means. I hope my proof of concept helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement help wanted Issue not worked on by OTP; help wanted from the community team:PS Assigned to OTP team PS team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

10 participants