New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDP sockets support #39

Closed
wants to merge 15 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@rom1v
Contributor

rom1v commented Nov 7, 2012

It is not possible to use several "MDP sockets" with the API: there are no "sockets" exposed (in fact, there is only one per app, as mdp_client_socket is an external variable in mdp_client.c).

This is a problem, because the overlay_mdp_recv() function just retrieve the "first" packet, whatever it is. If it is not the right port, it is ignored (but the thread which listened to that port will never receive it, it is dropped).

Therefore, there is no way to "multiplex" services. If a service wants to listen on port x and another on port y, some packets from x will be "received" by the recv function of y (which will just drop it, but in that case this packet is lost for no reason and the recv() function considers it has received a packet which is invalid).

Moreover, functions like overlay_mdp_getmyaddr() or overlay_mdp_bind() do call overlay_mdp_send(), which can call overlay_mdp_recv(). The received response could be one packet from another device, which has nothing to do with the request, so it will be dropped. The easiest way to see the problem is to send a packet to yourself (to your own SID), and follow what happens in debug mode.

To fix it, I changed a bit the MDP API and implementation for providing "MDP sockets" support (commit f55e2b7):

int overlay_mdp_client_socket(void);
int overlay_mdp_client_close(int mdp_sockfd);
int overlay_mdp_client_poll(int mdp_sockfd, time_ms_t timeout_ms);
int overlay_mdp_recv(int mdp_sockfd, overlay_mdp_frame *mdp, int port, int *ttl);
int overlay_mdp_send(int mdp_sockfd, overlay_mdp_frame *mdp,int flags,int timeout_ms);

overlay_mdp_client_init() is replaced by overlay_mdp_client_socket(void). Instead of returning 0 or -1, it returns the socket descriptor or -1.
overlay_mdp_client_done() is replaced by overlay_mdp_client_close(int mdp_sockfd).

Each function now takes a parameter mdp_sockfd. This mdp_sockfd is created by overlay_mdp_client_socket(). There is no global mdp_client_socket anymore in mdp_client.c: instead, each service creates and uses its own MDP socket.

Then, I adapted usage in commandline.c.

Before that, I needed to make a tiny change: make overlay_mdp_client_socket_path local to overlay_mdp_client_init() (now renamed overlay_mdp_client_socket() for consistency with socket API). It was easy, because the unlink in overlay_mdp_client_done() was useless (it was already done in overlay_mdp_client_init()). (commit 263232b)

Quite independently, for making it work, I also needed to make overlay_mdp_recv() blocking. (commit 9d44386)

What I tested after these changes:

  • send data from one device to another (worked before with blocking recv);
  • send data to myself (my own sid) (didn't work before);
  • mdp ping in commandline (worked before).

What I did not tested:

  • other command lines functions in commandline.c, because I don't know them (but I applied the same pattern for using mdp_sockfd).

Please give me any feedback.


Tip: for some functions, you should use git diff -w for ignoring whitespace differences (for example if I removed a if() {} condition, the indentation has changed).

rom1v added some commits Nov 6, 2012

Make overlay_mdp_client_socket_path local
There is no reason for this variable to be external. It is used in init() and
was used in done(), but the unlink() in done() is useless: it was already called
on init() so the file was already unlinked.
MDP sockets support
Provide MDP sockets instead of using a global one.
Remove useless external declaration
I removed the definition of mdp_client_socket, but an external declaration was
remaining.
@lakeman

This comment has been minimized.

Member

lakeman commented Nov 9, 2012

So each binding requires a separate socket connection? I guess that should work without too much impact to the existing code.

Long term I would have preferred to build a local port binding lookup table, with handler function for processing each payload. But that will require rewriting each command line operation to work in an event driven manner.

However most of the tests in the tests/dnaprotocol script are failing to resolve phone numbers.

Fix broken dna lookup
lookup_send_request() must use the socket of the caller, not create a new
one (detected by dnaprotocol test script).
@rom1v

This comment has been minimized.

Contributor

rom1v commented Nov 9, 2012

So each binding requires a separate socket connection? I guess that should work without too much impact to the existing code.

Yes, that way we can provide a socket-like API to the client easily.

Long term I would have preferred to build a local port binding lookup table, with handler function for processing each payload. But that will require rewriting each command line operation to work in an event driven manner.

What would be the benefits?

However most of the tests in the tests/dnaprotocol script are failing to resolve phone numbers.

Thank you for your feedback.
I just committed a fix (4b273b4).

rom@rompc:~/Documents/serval-workspace/batphone@development/jni/serval-dna@mdpsock$ tests/dnaprotocol
1. Lookup by wildcard... PASS
2. Lookup by empty string... PASS
3. Lookup non-existent phone number... PASS
4. Lookup local phone number... PASS
5. Lookup remote phone number... PASS
6. Node info auto-resolves for local identities... PASS
7. Node info resolves remote identities... PASS
8. Lookup phone number three nodes reply... PASS
9. Lookup phone number two nodes reply... PASS
10. Lookup phone number one node replies... PASS
10 tests, 10 pass, 0 fail, 0 error
@rom1v

This comment has been minimized.

Contributor

rom1v commented Nov 16, 2012

I just added MDP JNI bindings native-part.

It provides the implementation of native methods of MeshSocket API.
(java-part is here: servalproject/batphone#51)

rom1v added some commits Nov 22, 2012

Missing free for a JNI buffer
ReleaseByteArrayElements last parameter is mode. Its value is:
 - 0: copy back the content and free the buffer;
 - JNI_COMMIT: copy back the content and *do not* free the buffer;
 - JNI_ABORT: free the buffer without copying back the possible changes.

 We definitely want 0.

 With JNI_COMMIT, after receiving more than ~1024 packets, it crashes:
   ReferenceTable overflow (max=1024)
Merge branch 'master' into mdpsock
Conflicts:
	commandline.c
	mdp_client.c
	serval.h
	sourcefiles.mk
Fix compilation warnings
(unused, signed/unsigned variables...)
Remove question-comment
Using fd+1 is the right way to use select()
@rom1v

This comment has been minimized.

Contributor

rom1v commented Jan 8, 2013

I merged the current development branch.

https://groups.google.com/d/msg/serval-project-developers/bebngkEXesI/801JB958dZEJ

One minor concern I have is overlay_mdp_recv() becoming blocking.
Because servald is designed to be single-threaded, we need to be able
to support fully asynchronous operation. As servald may end up using
MDP socket services as a client (eg to talk to other servald instances
on other nodes to exchange various information), we need a
non-blocking solution. It might just be a flag passed in to
overlay_mdp_recv() to tell it to be non-blocking, so that we have that
option for when it is needed.

Using the same pattern there was in mdp_client.c, the code would become (+ possibly a flag for enabling/disabling it) :

int overlay_mdp_send(int mdp_sockfd, overlay_mdp_frame *mdp, int flags, int timeout_ms)
{
  //...
  set_nonblock(mdp_sockfd);
  int result=sendto(mdp_sockfd, mdp, len, 0, (struct sockaddr *)&name, sizeof(struct sockaddr_un));
  set_block(mdp_sockfd);
  //...
}
int overlay_mdp_recv(int mdp_sockfd, overlay_mdp_frame *mdp, int port, int *ttl)
{
  //...
  set_nonblock(mdp_sockfd);
  ssize_t len = recvwithttl(mdp_sockfd,(unsigned char *)mdp, sizeof(overlay_mdp_frame),ttl,recvaddr,&recvaddrlen);
  set_block(mdp_sockfd);
  //...
}

Since my commits providing "MDP sockets", mdp_sockfd is given as a parameter (instead of being a global variable), so I think it is the responsability of the caller to mark the socket as blocking/nonblocking, and manage what to do when non blocking and errno == EAGAIN || errno == EWOULDBLOCK.

Possibly, the blocking/nonblocking mode could be a parameter of the MDP socket init function overlay_mdp_client_socket.

What do you think ?

@quixotique

This comment has been minimized.

Member

quixotique commented Jan 10, 2013

The best way for a library to support different main-loop strategies (blocking/non-blocking, threaded, coroutines, callbacks, etc.) is to simply get out of the way and let the caller set up what it needs to.

In this case, the MDP client library has no business calling set_nonblock() and set_block() on the socket. The caller should do that. The library already exposes the socket's file descriptor as the mdp_sockfd parameter, so this is trivial for the caller to do. The library could expose the set_block() and set_nonblock() calls (in fact, macros) as part of its API as a convenience for the caller, but the caller can always just use the fcntl(2) sys call directly.

What the MDP library should do is to make sure that it handles the EAGAIN (and EWOULDBLOCK if present) errors as normal conditions, ie, without logging an error or even warning. See the definitions of the _write_nonblock() and _read_nonblock() functions for an example of what this means. In fact, recvwithttl() appears to already do this. It does not treat EAGAIN or EWOULDBLOCK as errors, and handles the case of len == -1 the same as len == 0.

Remove sun_path file on MDP socket close
Avoid to keep a lot of useless files like:
  var/serval-node/mdp-client-XXXX-XXXXXXXX.socket
@rom1v

This comment has been minimized.

Contributor

rom1v commented Jan 14, 2013

I totally agree with your analysis.

I also noticed a little side effect: every time a socket is created, it creates a new file in /data/data/org.servalproject/var/serval-node/. But when the socket is closed, the file is not deleted. This is not very important because when a collision occurs, we delete the file before binding a new socket, but there is no reason for keeping so many files like this:

mdp-client-2113-d4b10929.socket
mdp-client-2113-a812c682.socket
mdp-client-2113-e689b1f8.socket
mdp-client-2113-0c0916fb.socket
mdp-client-2113-5807c93e.socket
mdp-client-2113-734d5e29.socket
mdp-client-2113-8a0a67fa.socket
mdp-client-2113-386b4def.socket
mdp-client-2113-7cf2c673.socket
...

Therefore, I just commited some changes for keeping the mapping mdp_sockfd/sun_path, and removing the file on socket close (commit 92a28a2). If you see a better way to do that, please do not hesitate.

@quixotique

This comment has been minimized.

Member

quixotique commented Jan 15, 2013

As well as deleting temporary socket files on close, it is important to also clean up stale temporary files on start or at regular intervals. Our code does not always follow this policy because it is of experimental or prototype quality, but we are deliberately moving towards production quality code, so we need to start insisting on this kind of design principle in new code.

In fact, I suggest that, as a matter of design policy, unlink-on-close should possibly never be performed, instead we should always rely on the clean-up logic to remove stale files. This would ensure that testing can easily reveal that it is working, instead of waiting for exceptional events to discover that it is not.

Our policy for operation on any platform should always be to explicitly design (and test for) for power-off/SIGSEGV/SIGKILL events and long-term, unattended operation. So we can't afford to let temporary files accumulate (or unused database rows accumulate, or log files accumulate, etc.).

Delete all socket files on servald start
If serval does not close properly, socket files are kept in
/data/data/org.servalproject/var/serval-node. Therefore, we need to clean up
when servald starts.
@rom1v

This comment has been minimized.

Contributor

rom1v commented Jan 15, 2013

As well as deleting temporary socket files on close, it is important to also clean up stale temporary files on start or at regular intervals.

You're right. I just implemented a clean up of all socket files on servald start (commit db2fefc).

Maybe we should clean up more (what else?). Maybe it is dangerous to clean up all socket files (will we need a socket persistent across servald restarts?). But it seems ok.

In fact, I suggest that, as a matter of design policy, unlink-on-close should possibly never be performed, instead we should always rely on the clean-up logic to remove stale files.

In my opinion, we should also keep the unlink-on-close policy (commit 92a28a2), because we can have a lot of socket files during a servald "session": every single command creates a MDP socket (and closes it).

@lakeman lakeman closed this Jan 22, 2013

@rom1v

This comment has been minimized.

Contributor

rom1v commented Jan 29, 2013

@lakeman You closed the pull request?

@lakeman

This comment has been minimized.

Member

lakeman commented Jan 29, 2013

Not deliberately. I deleted "master" as part of cleaning up our branch names, so I guess I got the blame for making the pull impossible.
I should be able to find time to look at this patch properly next week.

@rom1v

This comment has been minimized.

Contributor

rom1v commented Feb 7, 2013

I just merged development branch into mdpsock (so mdpsock is up to date).

Unfortunately, github does not see it, but I think it is not important.

@rom1v

This comment has been minimized.

Contributor

rom1v commented Feb 11, 2013

I added Java bindings for flags and QoS management.
rom1v@8454944
servalproject/batphone#51 (comment)

@rom1v

This comment has been minimized.

Contributor

rom1v commented Feb 21, 2013

Due to this pull request unintentional close, I opened a new one:
#53

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment