This repository was archived by the owner on Oct 8, 2025. It is now read-only.

Conversation

@echolimazulu

@echolimazulu echolimazulu commented Apr 1, 2022

Solution to the problem:
#669

Repository to reproduce the issue:
https://github.com/echolimazulu/unit-bind-aaiu

Hello @VBart, @mar0x, @hongzhidao, @alejandro-colomar, @tippexs.

Please tell me what you think. In the process of testing this fix, I no longer observe the previously encountered problem.

I tried to take into account all the previously voiced comments regarding the code, commits, indents and other points.

I did not propose implementing locks, cleaning up sockets and locks on program shutdown, or cleaning up on configuration updates, since that would be a much larger set of changes, addressing several different issues and extending functionality that is not needed for: #669

echolimazulu added 5 commits to echolimazulu/unit that referenced this pull request on Apr 17, 2022
Due to the mandatory presence of error messages: nginx#670 (comment)
nxt_uint_t level, nxt_uint_t sockopt);
NXT_EXPORT nxt_int_t nxt_socket_setsockopt(nxt_task_t *task, nxt_socket_t s,
nxt_uint_t level, nxt_uint_t sockopt, int val);
NXT_EXPORT nxt_int_t nxt_socket_bind(nxt_task_t *task, nxt_socket_t s,
Contributor

Note you can use nxt_socket_errno after calling nxt_socket_bind(). So it's not a good idea to add one more parameter *errnum. It's better to keep the API as minimal as possible, including its parameters.

Author

@echolimazulu echolimazulu Apr 17, 2022

I did that before. But my concern is that any other function call (nxt_alert and the like) could, in theory, change the state of errno. I think this is probably why nxt_listen_socket_create reads errno right after the raw listen and connect calls.
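
To illustrate the concern, here is a minimal sketch (not the actual Unit code; the nxt_socket_bind() signature and types are abridged, and it assumes the usual Unit headers): the wrapper saves errno immediately after bind() and restores it right before returning, so that intermediate calls such as nxt_alert() cannot clobber the value the caller later reads via nxt_socket_errno.

nxt_int_t
nxt_socket_bind(nxt_task_t *task, nxt_socket_t s, nxt_sockaddr_t *sa)
{
    int  err;

    if (bind(s, &sa->u.sockaddr, sa->socklen) == 0) {
        return NXT_OK;
    }

    err = errno;                     /* capture before any other call     */

    nxt_alert(task, "bind(%d, %*s) failed %E",
              s, (size_t) sa->length, nxt_sockaddr_start(sa), err);

    errno = err;                     /* re-expose the error to the caller */

    return NXT_ERROR;
}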

Author

@echolimazulu echolimazulu Apr 17, 2022

I mean (see "Notes"):
https://linux.die.net/man/3/errno

Author

@hongzhidao what do you think about what I described above?

Contributor

That makes sense, but the decision of where listen() is performed affects these changes.

Author

@echolimazulu echolimazulu Apr 18, 2022

I want to draw your attention to the fact that in this conversation we are discussing errno, not listen. Sorry, I just don't see the relationship between the two.

Author

@hongzhidao Let's get this straight. Is it still safe to call nxt_socket_errno after nxt_alert and other calls?


nxt_sockaddr_text(sa);

(void) unlink(sa->u.sockaddr_un.sun_path);
Contributor

Well, I think the previous one is good enough.

If you think it's possible for the temp file to already exist, e.g. created manually, that happens very rarely.
We can just use unlink(sa->u.sockaddr_un.sun_path).

My thinking is that we will need to do more work on this feature in the future to clean up sock files.
So we can just keep the previous code here.

Your way may be useful in the future, but for now I prefer not to introduce it.

Author

This improvement was motivated by this comment:
#649 (comment)

Yes, this function does not check whether the socket has been abandoned or not. But that, as is generally recommended for this kind of use on Linux, requires a lock, and I feel we will not get to locks soon.

Author

In addition to my previous answer, I want to note that discarding the result of (void) unlink here means that a later bind may fail with an error (EADDRINUSE) that reflects the consequence of the failed unlink rather than the actual cause of the failure (e.g. EACCES).
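
To illustrate the point, a minimal sketch (not part of the actual patch; it assumes the surrounding Unit code and the format specifiers seen in the snippets quoted in this thread): check unlink() and report its own errno instead of discarding it, so the real cause is not masked by a later EADDRINUSE from bind().

    if (unlink(sa->u.sockaddr_un.sun_path) != 0 && errno != ENOENT) {
        nxt_alert(task, "unlink(%s) failed %E",
                  sa->u.sockaddr_un.sun_path, errno);

        return NXT_ERROR;
    }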

Author

Good morning @mar0x,

Since you suggested this feature earlier, please let me know what you think of my proposed implementation. Thank you!



static nxt_int_t
nxt_main_listening_socket(nxt_sockaddr_t *sa, nxt_listening_socket_t *ls)
Contributor

@hongzhidao hongzhidao Apr 17, 2022

It seems you didn't notice that the router process is different from the controller process.
The controller has the privilege to bind its own listening socket, and then it performs the listen operation itself.
But the router has no such privilege, so it has to ask the main process to bind the listening sockets from the configuration. The main process doesn't call listen; the router process does.

There are two operations: bind and listen.
controller process: bind + listen
router process: ask main to bind, then listen by itself.

So it's wrong to reuse nxt_listen_socket_create().
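
To make the split concrete, here is a self-contained POSIX sketch of the underlying mechanism only (this is not Unit's actual port/RPC code, and send_fd()/recv_fd() are hypothetical helper names): the privileged process binds the socket and passes the descriptor over a Unix-domain channel with SCM_RIGHTS, and the receiving process then calls listen() on it.

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Pass an already-bound socket descriptor to another process. */
static int
send_fd(int chan, int fd)
{
    char             dummy = 'F';
    struct iovec     iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char            buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr  align;                  /* for alignment only */
    } ctrl;
    struct msghdr    msg = { .msg_iov = &iov, .msg_iovlen = 1,
                             .msg_control = ctrl.buf,
                             .msg_controllen = sizeof(ctrl.buf) };
    struct cmsghdr  *cmsg = CMSG_FIRSTHDR(&msg);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;               /* descriptor passing */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return (sendmsg(chan, &msg, 0) == -1) ? -1 : 0;
}

/* Receive the descriptor; the receiving process may then call listen() on it. */
static int
recv_fd(int chan)
{
    int              fd;
    char             dummy;
    struct iovec     iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char            buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr  align;
    } ctrl;
    struct msghdr    msg = { .msg_iov = &iov, .msg_iovlen = 1,
                             .msg_control = ctrl.buf,
                             .msg_controllen = sizeof(ctrl.buf) };
    struct cmsghdr  *cmsg;

    if (recvmsg(chan, &msg, 0) == -1) {
        return -1;
    }

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));

    return fd;    /* the sender did bind(); the receiver may now listen(fd, ...) */
}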

Author

Sorry, didn't really notice that difference.

I asked @VBart earlier about the need to keep the separation between nxt_listen_socket and nxt_main_listening_socket.

My question and his answer:
#649 (comment)

Contributor

The listening socket in the controller process is in charge of configuration management.
The listening sockets in the router process are in charge of request traffic.
So the listen operation, which is tied to the listener options, needs to be done in the router process.

Author

Thanks for your explanation!

The sequence and reasons for calling via RPC between main <-> router are clear to me.

Please explain: I don't understand why nxt_listen_socket is executed specifically in the router, in nxt_router_listen_socket_ready. After all, this happens immediately after the socket is created and after the response from main -> router. If an error occurs in the main RPC handler, the router will simply not receive the fd from main and will never interact with that socket. From the point of view of parameters (backlog, TCP_DEFER_ACCEPT and so on), the controller and router sockets look identical to me.

Both cases:

  • controller -> nxt_listen_socket_create
  • router -> main(nxt_main_listening_socket) -> router(nxt_router_listen_socket_ready)

The same socket setup happens; only in the case of the router it is split into two parts and finishes in nxt_router_listen_socket_ready. Is the point simply that the socket should be put into passive mode only after the RPC exchange between the processes has completed successfully in the router? Does that make any real difference?

Author

@echolimazulu echolimazulu Apr 17, 2022

If so, then I suggest keeping the current split and just adding unlink before bind in nxt_main_listening_socket, thus solving this problem in the same way as it is already implemented in nxt_listen_socket_create. And in the future, we can replace unlink with a more secure solution, if at the moment nxt_socket_release_by_path does not suit us for one reason or another.

Contributor

As far as I know, listen affects the socket itself and is not tied to the state of a particular process. In my understanding, listen simply switches the socket to passive mode, so that incoming connections queue up until they are picked up by accept. So I can't see a direct relationship between the traffic (which is the router's job), the process (the router), and the listen call itself, which simply changes the socket's behavior.

You are right, listen only changes the bound TCP socket into a listening state.

Author

If so, then I suggest keeping the current split and just adding unlink before bind in nxt_main_listening_socket, thus solving this problem in the same way as it is already implemented in nxt_listen_socket_create. And in the future, we can replace unlink with a more secure solution, if at the moment nxt_socket_release_by_path does not suit us for one reason or another.

I agree to use unlink now, but we need to reuse the logic in nxt_listen_socket_create(): create a temp file, then bind. Otherwise, if the path is bound by another instance, such as nginx, it would be wrong to unlink it.

Then we need to split nxt_listen_socket_create into code before and after listen execution, right?
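
For reference, here is a rough, self-contained sketch in plain POSIX (not Unit's code; the function name and the backlog of 128 are made up for illustration) of the pattern being discussed: bind and listen on a temporary path first, use connect() to verify that no live listener still owns the real path, and only then rename() the temporary socket file over it.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

static int
bind_unix_via_tmp(int s, const char *path)
{
    int                 t;
    struct sockaddr_un  sun = {0}, orig = {0};
    char                tmp[sizeof(sun.sun_path) + 8];

    /* Bind and listen on a private temporary path first. */
    snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    (void) unlink(tmp);                     /* a stale temp file is ours */

    sun.sun_family = AF_UNIX;
    strncpy(sun.sun_path, tmp, sizeof(sun.sun_path) - 1);

    if (bind(s, (struct sockaddr *) &sun, sizeof(sun)) == -1
        || listen(s, 128) == -1)
    {
        return -1;
    }

    /* Is something still accepting on the real path?  Then back off. */
    orig.sun_family = AF_UNIX;
    strncpy(orig.sun_path, path, sizeof(orig.sun_path) - 1);

    t = socket(AF_UNIX, SOCK_STREAM, 0);
    if (t == -1) {
        unlink(tmp);
        return -1;
    }

    if (connect(t, (struct sockaddr *) &orig, sizeof(orig)) == 0) {
        close(t);
        unlink(tmp);
        errno = EADDRINUSE;                 /* a live listener owns it */
        return -1;
    }

    close(t);

    /* Abandoned or absent: atomically replace it with our live socket. */
    if (rename(tmp, path) == -1) {
        unlink(tmp);
        return -1;
    }

    return 0;
}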

Contributor

@hongzhidao hongzhidao Apr 18, 2022

The logic of "create temp path for unix domain socket" and "test connecting after listening" should run in the same process. If bind happens in the main process but listen happens in the router process, that logic is broken. We'll discuss it internally first and let you know once it's clear.

Author

Now I don't understand at all. So this is the current version of the existing code in the origin master branch. Do you think it's incorrect?

Contributor

I think it's right. It makes sense that the router process performs the listen operation only after all the listeners have been bound in the main process.
What I'm not sure about is whether the listen operation can be moved from the router process to the main process. I hadn't noticed that background before.

@echolimazulu echolimazulu force-pushed the master branch 3 times, most recently from dcffa05 to 062223f on April 18, 2022 at 08:22
nxt_alert(task, "listen(%d, %d) failed %E",
s, ls->backlog, nxt_socket_errno);
goto listen_fail;
if (ls->start == NULL) {
Author

@hongzhidao What do you think of this solution? It is fully consistent with the current logic of how the program and the router call listen.

Contributor

I think the current patch is based on the logic from the controller process, since you used nxt_listen_socket_create(), but it seems to require that the main process perform the listen operation, and I don't know yet whether that is reasonable.
My suggestion is to clarify that first. Please be patient with the review.

Author

@echolimazulu echolimazulu Apr 18, 2022

The point I noted is that nxt_listen_socket_create does not perform listen when it is called from the router, but it does perform it for the controller. That is, this matches the logic that already exists (unchanged) in the master branch.

I produce new variants based on your suggestions and comments as they come in, so that we can immediately see the result and move toward the best solution.

Contributor

  1. If we perform listen in the main process before all the sockets are bound, clients can connect and be accepted while there is not yet any process to handle them. This can happen in the real world.
  2. If we perform listen in the router process, the logic in nxt_listen_socket_create is broken, because we don't perform the connect check on the listening socket.
// The below logic is required after `listen`.
#if (NXT_HAVE_UNIX_DOMAIN)

    if (orig_sa != NULL) {
        ts = nxt_socket_create(task, AF_UNIX, SOCK_STREAM, 0, 0);
        if (ts == -1) {
            goto listen_fail;
        }

        ret = connect(ts, &orig_sa->u.sockaddr, orig_sa->socklen);

        err = nxt_socket_errno;

        nxt_socket_close(task, ts);

        if (ret == 0) {
            nxt_alert(task, "connect(%d, %*s) succeed, address already in use",
                      ts, (size_t) orig_sa->length,
                      nxt_sockaddr_start(orig_sa));

            goto listen_fail;
        }

Author

I sincerely appreciate your corrections, thank you for this!

You can also move this connect check into a separate function like nxt_listen_socket and call it separately from router and from nxt_listen_socket_create with ls->start == NULL.

Contributor

Sorry, but personally I can't agree with this new approach.

  1. It seems the logic is broken.
static void
nxt_router_conf_error(nxt_task_t *task, nxt_router_temp_conf_t *tmcf)
{
    ...
    for (qlk = nxt_queue_first(&creating_sockets);
         qlk != nxt_queue_tail(&creating_sockets);
         qlk = nxt_queue_next(qlk))
    {
        skcf = nxt_queue_link_data(qlk, nxt_socket_conf_t, link);
        s = skcf->listen->socket;

        if (s != -1) {
            nxt_socket_close(task, s);
        }

        nxt_free(skcf->listen);
    }
  2. It's strange that the router process has to maintain yet another piece of socket information for the rename. I think it makes things complicated.

We think the ideal solution is to clean up all the unused unix domain socket files.
This is our internal plan for the future.

Author

@echolimazulu echolimazulu Apr 21, 2022

Thanks for your correction.

I'll fix nxt_router_conf_error, thanks for bringing this to my attention.

But rename cannot be performed directly from the router, as it does not have sufficient permissions for this operation. I have tested this myself.

The whole difficulty is that you want the controller and the router to behave similarly on the basis of a single nxt_listen_socket_create, depending on where in the code listen is called, while also providing all the necessary checks and operations after listen has run (connect, rename).

Please tell me more about your ideal solution. I want to note that cleaning up unused (abandoned) sockets and the problem of reusing sockets at the moment are completely different problems.

Contributor

@hongzhidao hongzhidao Apr 21, 2022

Please tell me more about your ideal solution. I want to note that cleaning up unused (abandoned) sockets and the problem of reusing sockets at the moment are completely different problems.

If we can clean up unused sockets, then we don't need the approach borrowed from the controller, even though it's more robust. To clean up unused sockets, we need to collect the sockets pending deletion in the router process, then ask the controller/main process to delete them.

But rename cannot be performed directly from the router, as it does not have sufficient permissions for this operation. I have tested this myself.

Right, I agree that this way works, but it makes the router more complicated. The router code is already complicated; I prefer not to make it bigger...

Author

@echolimazulu echolimazulu Apr 21, 2022

Please don't take this for excessive persistence or obsession. I'm trying to solve a real problem I'm facing in my projects.

The problem is that removing the router's abandoned sockets during a successful (soft) shutdown does not solve the problem of reusing sockets when the program terminates unexpectedly.

Removing an existing socket must somehow be done during socket creation, as it may have been abandoned by a previous unexpected or early termination of the program.

Previously (#670 (comment)) I suggested not using nxt_listen_socket_create and adding unlink before bind in nxt_main_listening_socket, but you (#670 (comment)) said that it should be abandoned in favor of nxt_listen_socket_create, which involves communication with the controller and the need to separate the logic of listen, connect and rename.

When using cloud platforms, the orchestrator sends a termination signal to the container and expects it to complete all the necessary shutdown procedures: closing connections, sockets, and so on. If the application does not manage to do this in time, the container is forcefully killed, roughly 10 seconds after the signal.

I understand that in the case of Unit, 10 seconds is more than enough to terminate all processes; however, unexpected terminations are not necessarily caused by a signal from the orchestrator at all.

Author

@echolimazulu echolimazulu Apr 21, 2022

Sorry, but personally I can't agree with this new approach.

  1. It seems the logic is broken.

Seems to be fixed now: 1497d19

@echolimazulu echolimazulu force-pushed the master branch 3 times, most recently from 5195c85 to 867d4ec on April 20, 2022 at 14:57
@echolimazulu
Author

echolimazulu commented Jul 25, 2022

Hello Alex (@alejandro-colomar),

Glad I was able to answer your question!

Would a solution that works for Linux only work for you? Which kernel are you using to run Unit?

I'm using debian11 as the kernel for building unit.

Add support for abstract Unix sockets (see the discussion in #735), which currently are only implemented in Linux, but could be added to other kernels (discuss that with your local kernel vendor :P).

Abstract Unix sockets don't live in the file system, so they don't need to be unlink(2)ed (the kernel removes the name when the last sfd to it has been closed, which is what we want here). They also have drawbacks: you can't use filesystem permissions to control who can access the socket; but since we're creating our sockets 666 right now, nothing would really change for us in that regard.

Thank you for the detailed explanation of how abstract sockets work. I want to note that for me it would be critical to use sockets existing in the file system. For example, in the case of a file socket forwarding scenario inside a Kubernetes cluster between the caching and compressing nginx and the unit behind it. Abstract sockets unfortunately do not allow applications outside of the operating system they reside on to interact with them.

As @VBart previously mentioned, Unit already contains, or plans to implement in the near future, the features in question. But my planning relies on the technological capabilities that are already available, which I can use right now and which are enough to solve the tasks in front of me. Even if Unit implements caching and compression, I will lack flexibility when dealing with headers, etc. For that reason I would not exclude the option of using nginx in conjunction with Unit, even if limited to TCP/IP sockets only, given the virtualization capabilities that exist today.

I see Unit as a reliable and efficient cross-platform web server running directly in front of the application itself on the same virtualization unit, while still being able to communicate effectively in a hyperconverged environment.

This is my vision and use of unit, of course it may differ from the vast majority.

Sorry for highlighting the text, just wanted to emphasize what I think are the most important thoughts and improve readability.

I am always glad to any discussions and work with you and your team.

Regards,
Evgenii.

@alejandro-colomar
Contributor

Hello Alex (@alejandro-colomar),

Glad I was able to answer your question!

Would a solution that works for Linux only work for you? Which kernel are you using to run Unit?

I'm using debian11 as the kernel for building unit.

Ok, so Linux.

Add support for abstract Unix sockets (see the discussion in #735), which currently are only implemented in Linux, but could be added to other kernels (discuss that with your local kernel vendor :P).

Abstract Unix sockets don't live in the file system, so they don't need to be unlink(2)ed (the kernel removes the name when the last sfd to it has been closed, which is what we want here). They also have drawbacks: you can't use filesystem permissions to control who can access the socket; but since we're creating our sockets 666 right now, nothing would really change for us in that regard.

Thank you for the detailed explanation of how abstract sockets work. I want to note that for me it would be critical to use sockets existing in the file system. For example, in the case of a file socket forwarding scenario inside a Kubernetes cluster between the caching and compressing nginx and the unit behind it. Abstract sockets unfortunately do not allow applications outside of the operating system they reside on to interact with them.

But normal Unix sockets are already restricted to the same host (AFAIK). I don't think there are extra restrictions in this regard when using abstract Unix sockets. Do you have an example application that can work with normal Unix sockets in a multi-host environment but would stop working if the socket were an abstract one?

@echolimazulu
Author

echolimazulu commented Jul 25, 2022

Usually it is.

I don't understand how you can grant access to connect to an abstract socket from outside the host (virtualization unit) on which it was created. I consider it correct to treat multiple connections to a file socket and actual external access to an abstract socket as different problems. Perhaps I am missing something - I admit that.

With a file socket this option is definitely possible (shared), and multiple connections to the socket are not required in most standard scenarios; if they are required and technically unavailable (limited), multiple connections can be provided with a TCP/IP socket instead.

Earlier, in discussions with @VBart, I said that I do not understand the logic of the F5 Networks & NGINX Inc R&D teams, which boils down to not supporting file sockets in the Unit product, regardless of the reasons, such as the one you point to (as I read it): since file sockets are created with 666 permissions, we can only use abstract ones. But no, we can't (in my humble opinion).

If you take nginx itself, it has supported all possible connection types from time immemorial, not to mention that it implements a much larger number of protocols.

I would like to recall the whole history of the development of this problem:

  • at first I was told (@VBart) that this was not part of the release plan and that file sockets were a legacy from other projects;
  • I tried to help with the implementation - I was told that the solution I provided was too complicated (@hongzhidao), while acknowledging that it works. Special thanks to @hongzhidao for working on this;
  • the Unit team tried to implement a simple solution, but ran into the very problem that made my solution so complicated, while recognizing that the solution I provided is probably the only possible one. Special thanks to @alejandro-colomar for working on this.

Also a special thanks to @tippexs for leading the process.

As a result, as I understand it, the Unit team has come to question whether file sockets need to be implemented and maintained at all, which seems a little strange to me, as does the fact that I repeatedly give detailed usage scenarios (it's not difficult for me, I'm glad to help) while I have never seen such scenarios from Unit's key developers (@VBart), only statements that real users do not need this. Apparently, in their opinion, I am not real, or I am not a user of the Unit product.

I don't blame anyone, but from the outside it looks, to put it mildly, like ordinary sabotage of the project's development by individual developers, leading them toward a decision to refuse to implement file sockets that is motivated neither by the impossibility of implementation nor by the absence of real demand.

I did not see answers in this PR from @mar0x or @VBart, who participated in the discussions of this problem and its solution at the very beginning and set the direction, but subsequently avoided discussing it entirely.

I also want to note that after several months of waiting I never received the promised answer with the vision of the future of the Unit project, which @lcrilly promised me when he asked me to write to his mail, liam@nginx.com, which he kindly provided earlier and to which I wrote on 03/31/2022. Nevertheless, I respect how busy he is and understand that my letter could simply have been lost.

I want to clarify that I am for the development and bright future of the Unit product, which is why I have spent a significant amount of time trying to help this project and continue to do so. Perhaps I am wasting my time, as I have no desire to resist the overall vision of the Unit team's product development, which changes for reasons unknown to me.

Regardless of the above, I am full of hopes and expectations related to the unit project. I have great respect for absolutely all the contributors of the project and I am glad to interact with each of them regardless of the circumstances, as I believe that the future of the unit project is the key in this whole story and we should probably all concentrate on joining forces when working on it and reaching consensus rather than developing disagreements on it.

@alejandro-colomar
Contributor

I'm not saying we're not going to implement Unix sockets. We are. And I'm about to commit the changes into the master branch (I wanted to clear up a few discussions before that, but the code is ready).

What I'm saying is that I want to support both variants of the Unix domain socket: filesystem and abstract.

Both are equivalent from a user point of view: filesystem sockets would be used as "unix:/path/to/file", and abstract sockets would be used as "unix:@socket/name". Notice the initial @ in the pathname; that's literally the only difference you'll ever notice.

The other difference between filesystem sockets and abstract sockets is that with filesystem sockets you'll have the problem that your filesystem will have files that you'll have to clean up manually, while the abstract sockets will be cleaned up by the kernel automatically (a good thing).

You shouldn't notice any other differences (as long as the application that wants to talk to the socket also supports abstract sockets; NGINX still doesn't, but I'm discussing with them the possibility of adding support for it).

@echolimazulu
Author

echolimazulu commented Jul 25, 2022

Thanks for your detailed answer.

Sorry, maybe I misread or misunderstood your answer (I fully admit it).

But I think that you and I understand perfectly well that the implementation of abstract and file sockets you describe is, for the most part, already available today (only via a \0 at the beginning of the path), and the usage scenario you described certainly expands what can be done now, but it does not solve the main problem: cleanup when the program exits and the file socket has to be removed, which is in fact the motivation for this PR, since Unit cannot be restarted without removing the file socket.

My question about the current implementation of file sockets is: why does the user have to determine manually where the abandoned file socket is, track the state of Unit, and delete the socket, after Unit automatically created that file socket at startup and simply abandoned it when the program ended? Especially since, without deleting the file socket, Unit cannot be restarted.

@alejandro-colomar
Contributor

alejandro-colomar commented Jul 25, 2022

I didn't solve the problem with Unix domain filesystem sockets because:

  • Unix domain abstract sockets don't have that problem, so you can switch to using them, and the problem will go away.
  • If you have an environment (in Linux) where Unix filesystem sockets work but Unix abstract sockets don't work, please show it to me, with actual commands that show the situation, to be able to debug it. I suspect there's no such scenario.
  • Even if you do need filesystem sockets (e.g., on FreeBSD), you can use the same rc script that you use to run unitd, to also clean up any sockets. In fact, in Debian, you have systemd, which solves this easily: see systemd-tmpfiles(8).

So I don't see much benefit in cleaning the files within unit, considering the complexity it imposes, and the available alternatives. It's not fully discarded, but I don't like the idea.
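
To illustrate the systemd-tmpfiles(8) route from the last bullet above, a minimal sketch of a cleanup rule might look like this (the /run/unit/*.sock path is purely an assumed example, not Unit's default socket location):

# /etc/tmpfiles.d/unit-sockets.conf (hypothetical file)
# Remove stale listener socket files left behind by a previous unitd run.
# The '!' flag restricts the action to boot time.
r! /run/unit/*.sock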

@echolimazulu
Author

I didn't solve the problem with Unix domain filesystem sockets because:

  • Unix domain abstract sockets don't have that problem, so you can switch to using them, and the problem will go away.
  • If you have an environment (in Linux) where Unix filesystem sockets work but Unix abstract sockets don't work, please show it to me, with actual commands that show the situation, to be able to debug it. I suspect there's no such scenario.
  • Even if you do need filesystem sockets (e.g., on FreeBSD), you can use the same rc script that you use to run unitd, to also clean up any sockets. In fact, in Debian, you have systemd, which solves this easily: see systemd-tmpfiles(8).

So I don't see much benefit in cleaning the files within unit, considering the complexity it imposes, and the available alternatives. It's not fully discarded, but I don't like the idea.

Thanks for the options provided.

Unix domain abstract sockets don't have that problem, so you can switch to using them, and the problem will go away.

Please forget about abstract sockets in the context of this PR, I didn't mention this question at all until you first did it.

If you have an environment (in Linux) where Unix filesystem sockets work but Unix abstract sockets don't work, please show it to me, with actual commands that show the situation, to be able to debug it. I suspect there's no such scenario.

You are asking me to show an environment that I have already described at least twice before (in this PR and, before that, in a discussion with @VBart in another PR). I'm talking about two Pods with distinct roles (nginx and unit) inside a Kubernetes cluster, communicating over a shared file socket. How can you implement this exchange (communication between the two applications) through an abstract socket? Please explain this to me.

Even if you do need filesystem sockets (e.g., on FreeBSD), you can use the same rc script that you use to run unitd, to also clean up any sockets. In fact, in Debian, you have systemd, which solves this easily: see systemd-tmpfiles(8).

I'm relying on Google's advice, which explains well why using managers like systemd or supervisord and shells like #!/bin/sh is undesirable, and how that actually breaks the basic principle of how containerized applications work:
https://cloud.google.com/architecture/best-practices-for-building-containers

Moreover, I don't use FreeBSD, and neither do most users of your product.

So I don't see much benefit in cleaning the files within unit, considering the complexity it imposes, and the available alternatives. It's not fully discarded, but I don't like the idea.

I think that what we personally like or dislike doesn't matter when we are talking about business needs expressed as requirements for the product that the end user needs.

@alejandro-colomar
Contributor

alejandro-colomar commented Jul 25, 2022

How can you implement this exchange (communication between two applications) through an abstract socket? Please explain this to me?

Exactly as you would with unix sockets. See an example here. Please feel free to ask any questions if you don't understand something.

alx@asus5775:~/tmp$ cat srv_a.c 
#include <err.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>


#define MY_SOCK_NAME       "@magic"
#define MY_LISTEN_BACKLOG  50


int main(void)
{
	int                 sfd, cfd;
	char                buf[7];
	socklen_t           len;
	struct sockaddr_un  sun = {0}, cun;

	sfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sfd == -1)
		err(EXIT_FAILURE, "socket(2)");

	sun.sun_family = AF_UNIX;
	strncpy(sun.sun_path, MY_SOCK_NAME, sizeof(sun.sun_path) - 1);
	sun.sun_path[0] = '\0';	/* leading NUL byte: Linux abstract socket name */

	len = sizeof(sa_family_t) + strlen(MY_SOCK_NAME);
	if (bind(sfd, (struct sockaddr *) &sun, len) == -1)
		err(EXIT_FAILURE, "bind(2)");

	if (listen(sfd, MY_LISTEN_BACKLOG) == -1)
		err(EXIT_FAILURE, "listen(2)");

	len = sizeof(cun);
	cfd = accept(sfd, (struct sockaddr *) &cun, &len);
	if (cfd == -1)
		err(EXIT_FAILURE, "accept(2)");

	if (read(cfd, buf, sizeof(buf)) != sizeof(buf))
		err(EXIT_FAILURE, "read(2)1");

	if (write(STDOUT_FILENO, buf, sizeof(buf)) == -1)
		err(EXIT_FAILURE, "write(2)");

	if (read(cfd, buf, 1) != 0)
		err(EXIT_FAILURE, "read(2)2");

	if (close(cfd) == -1)
		err(EXIT_FAILURE, "close(cfd)");

	if (close(sfd) == -1)
		err(EXIT_FAILURE, "close(sfd)");

	exit(EXIT_SUCCESS);
}
alx@asus5775:~/tmp$ cat cli_a.c 
#include <err.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>


#define MY_SOCK_NAME       "@magic"


int main(void)
{
	int                 sfd;
	socklen_t           len;
	struct sockaddr_un  sun = {0};

	sfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sfd == -1)
		err(EXIT_FAILURE, "socket(2)");

	sun.sun_family = AF_UNIX;
	strncpy(sun.sun_path, MY_SOCK_NAME, sizeof(sun.sun_path) - 1);
	sun.sun_path[0] = '\0';	/* leading NUL byte: Linux abstract socket name */

	len = sizeof(sa_family_t) + strlen(MY_SOCK_NAME);
	if (connect(sfd, (struct sockaddr *) &sun, len) == -1)
		err(EXIT_FAILURE, "connect(2)");

	if (write(sfd, "hello\n", 7) != 7)
		err(EXIT_FAILURE, "write(2)");

	if (close(sfd) == -1)
		err(EXIT_FAILURE, "close(sfd)");

	exit(EXIT_SUCCESS);
}
alx@asus5775:~/tmp$ cc -Wall -Wextra srv_a.c -o srv_a
alx@asus5775:~/tmp$ cc -Wall -Wextra cli_a.c -o cli_a
alx@asus5775:~/tmp$ ./srv_a &
[1] 2885
alx@asus5775:~/tmp$ ./cli_a 
hello
[1]+  Done                    ./srv_a

(Thanks to @ac000, who pointed out a bug in the above program. I fixed it.)

@echolimazulu
Author

echolimazulu commented Jul 25, 2022

How can you implement this exchange (communication between two applications) through an abstract socket? Please explain this to me?

Exactly as you would with unix sockets. See an example here. Please feel free to ask any questions if you don't understand something.

alx@asus5775:~/tmp$ cat srv_a.c 
#include <err.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>


#define MY_SOCK_NAME       "\0magic"
#define MY_LISTEN_BACKLOG  50


int main(void)
{
	int                 sfd, cfd;
	char                buf[7];
	socklen_t           len;
	struct sockaddr_un  sun = {0}, cun;

	sfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sfd == -1)
		err(EXIT_FAILURE, "socket(2)");

	sun.sun_family = AF_UNIX;
	strncpy(sun.sun_path, MY_SOCK_NAME, sizeof(sun.sun_path) - 1);

	if (bind(sfd, (struct sockaddr *) &sun, sizeof(sun)) == 1)
		err(EXIT_FAILURE, "bind(2)");

	if (listen(sfd, MY_LISTEN_BACKLOG) == -1)
		err(EXIT_FAILURE, "listen(2)");

	len = sizeof(cun);
	cfd = accept(sfd, (struct sockaddr *) &cun, &len);
	if (cfd == -1)
		err(EXIT_FAILURE, "accept(2)");

	if (read(cfd, buf, sizeof(buf)) != sizeof(buf))
		err(EXIT_FAILURE, "read(2)1");

	if (write(STDOUT_FILENO, buf, sizeof(buf)) == -1)
		err(EXIT_FAILURE, "write(2)");

	if (read(cfd, buf, 1) != 0)
		err(EXIT_FAILURE, "read(2)2");

	if (close(cfd) == -1)
		err(EXIT_FAILURE, "close(cfd)");

	if (close(sfd) == -1)
		err(EXIT_FAILURE, "close(sfd)");

	exit(EXIT_SUCCESS);
}
alx@asus5775:~/tmp$ cat cli_a.c 
#include <err.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>


#define MY_SOCK_NAME       "\0magic"


int main(void)
{
	int                 sfd;
	struct sockaddr_un  sun = {0};

	sfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sfd == -1)
		err(EXIT_FAILURE, "socket(2)");

	sun.sun_family = AF_UNIX;
	strncpy(sun.sun_path, MY_SOCK_NAME, sizeof(sun.sun_path) - 1);

	if (connect(sfd, (struct sockaddr *) &sun, sizeof(sun)) == 1)
		err(EXIT_FAILURE, "connect(2)");

	if (write(sfd, "hello\n", 7) != 7)
		err(EXIT_FAILURE, "write(2)");

	if (close(sfd) == -1)
		err(EXIT_FAILURE, "close(sfd)");

	exit(EXIT_SUCCESS);
}
alx@asus5775:~/tmp$ cc -Wall -Wextra srv_a.c -o srv_a
alx@asus5775:~/tmp$ cc -Wall -Wextra cli_a.c -o cli_a
alx@asus5775:~/tmp$ ./srv_a &
[1] 2885
alx@asus5775:~/tmp$ ./cli_a 
hello
[1]+  Done                    ./srv_a

Thanks for your explanation.

But in the previous sentence you refer to, I gave a specific example that involves interacting with a socket from two independent containers.

I'm talking about two Pods with roles (nginx and unit) inside a Kubernetes cluster with file socket communication (shared). How can you implement this exchange (communication between two applications) through an abstract socket? Please explain this to me?

As far as I remember, the abstract socket is only suitable for interprocessor communication within the same host (container). Please take a look at the schematic I am attaching to this post. This is a much more accurate and detailed use case I'm talking about:

[diagram: kubernetes-nginx-unit - two Pods (nginx and unit) communicating over a shared file socket]

@alejandro-colomar
Contributor

alejandro-colomar commented Jul 25, 2022

But in the previous sentence you refer to, I gave a specific example that involves interacting with a socket from two independent containers.

I'm talking about two Pods with roles (nginx and unit) inside a Kubernetes cluster with file socket communication (shared). How can you implement this exchange (communication between two applications) through an abstract socket? Please explain this to me?

As far as I remember, the abstract socket is only suitable for interprocessor communication within the same host (container). Please take a look at the schematic I am attaching to this post. This is a much more accurate and detailed use case I'm talking about:

[diagram: kubernetes-nginx-unit - two Pods (nginx and unit) communicating over a shared file socket]

That picture helps a lot.

So, I've been testing isolating my server and client programs in Linux namespaces, and I found that the only thing that causes the socket to be invisible is having both processes in different network namespaces.

That's consistent with what network_namespaces(7) describes:

DESCRIPTION
       Network namespaces provide isolation of the system resources
       associated with networking: network devices, IPv4 and IPv6
       protocol stacks, IP routing tables, firewall rules, the /proc/net
       directory (which is a symbolic link to /proc/PID/net), the
       /sys/class/net directory, various files under /proc/sys/net,
       port numbers (sockets), and so on.  In addition, network
       namespaces isolate the UNIX domain abstract socket namespace
       (see unix(7)).

Are both pods in the same network Linux namespace? If not, is it something that you can change?

alx@asus5775:~/tmp$ unshare -i -m -p -u -U -C -T -f ./srv_a &
[1] 2862
alx@asus5775:~/tmp$ unshare -i -m -p -u -U -C -T -f ./cli_a 
hello
[1]+  Done                    unshare -i -m -p -u -U -C -T -f ./srv_a
alx@asus5775:~/tmp$ sudo unshare -n -f ./srv_a &
[1] 2872
alx@asus5775:~/tmp$ sudo unshare -n -f ./cli_a 
cli_a: write(2): Transport endpoint is not connected

(The following sentence was incorrect, after investigation. Different pods are in different net namespaces, and although that can be relaxed, it would be a security issue.)
As far as I know, pods in the same node share the network namespace, but I'm not sure about that.

@echolimazulu
Author

As far as I know, pods in the same node share the network namespace, but I'm not sure about that

Hello Alex,

I apologize for the wording "interprocessor communication" - I meant communication between two processes within the same host. I formulated it poorly, since I wrote the answer and drew the picture late at night.

As far as I remember, the attached filespace driver (attached file system, shared) is from the Kubernetes snap-in and is not connected to the network in any way. This works in much the same way as mounted volumes in Docker, although in my humble opinion it is more efficient.

@alejandro-colomar
Contributor

alejandro-colomar commented Jul 26, 2022

As far as I know, pods in the same node share the network namespace, but I'm not sure about that

Hello Alex,

I apologize for the wording "interprocessor communication" - I meant communication between two processes within the same host. I formulated it poorly, since I wrote the answer and drew the picture late at night.

As far as I remember, the attached filespace driver (attached file system, shared) is from the Kubernetes snap-in and is not connected to the network in any way. This works in much the same way as mounted volumes in Docker, although in my humble opinion it is more efficient.

Don't worry, with the picture I got a perfect idea of what you have.

So, I've been investigating kubernetes, and you're right, pods are isolated from each other regarding networking (kubernetes implements this by putting the processes running in them in different Linux network namespaces).

So, if you have nginx and unit in different pods, it's impossible to use abstract sockets.

But if you put the two containers in the same pod (that's possible), they will share network namespace, and they'll be able to communicate that way between them.

Would you be able to use a single pod with two containers?

As you've explained it to me, it seems that the relation of nginx containers to unit containers will be 1:1, right? If that's the case, using the same pod would make sense in terms of scalability, since you could just increase the multi-container pod instances easily. It would also remove the need for using a shared filesystem mountpoint. I think it would be even better in terms of isolation.

@tippexs
Contributor

tippexs commented Jul 26, 2022

@echolimazulu good to see you again hanging around! Thank you very much for all the work you put into this topic! We really appreciate it!

The final infrastructure is clear to me now, but could you explain what the use case is for the NGINX (OSS) proxy in front of the Unit instance? I just want to understand which feature is missing. Looking forward to your answer.
Have a great day! Cheers Timo

@echolimazulu
Author

echolimazulu commented Jul 27, 2022

@echolimazulu good to see you again hanging around! Thank you very much for all the work you put into this topic! We really appreciate it!

The final infrastructure is clear to me now, but could you explain what the use case is for the NGINX (OSS) proxy in front of the Unit instance? I just want to understand which feature is missing. Looking forward to your answer. Have a great day! Cheers Timo

Hello @tippexs,

Thank you for your attention to this issue and for joining the discussion with @alejandro-colomar and @hongzhidao. Glad to work with you all on this issue!

Specifically, this is what I use and really like about the way it is implemented in nginx:

  • caching is a very flexible tool and super efficient in nginx;
  • compression - I use brotli and gzip;
  • headers - I am very comfortable working with headers in nginx; it works without exceptions or restrictions and without known problems when passing them to the upstream balancer (proxy, if present), especially because headers sit at the same level as caching, which allows the whole process to be managed in one place when additional headers are required;
  • envsubst - I have my own implementation of this feature, written in C, an approximate analogue of #!/bin/sh but without using the shell, with the same principle of handing control over the process and the environment after preliminary configuration. With Unit (if it is in front) I use a pre-configured file at startup, which is not always convenient if the virtualization layer manages the port and changes it (supporting this is necessary from Google's point of view). Starting up and then sending a configuration request (curl) to the control socket seems a little redundant and non-atomic to me compared to a pre-checked and prepared file (possibly with the port changed on the fly - interpolation).

@echolimazulu
Author

echolimazulu commented Jul 27, 2022

As far as I know, pods in the same node share the network namespace, but I'm not sure about that

Hello Alex,
I apologize for the wording "interprocessor communication" - I meant communication between two processes within the same host. I formulated it poorly, since I wrote the answer and drew the picture late at night.
As far as I remember, the attached filespace driver (attached file system, shared) is from the Kubernetes snap-in and is not connected to the network in any way. This works in much the same way as mounted volumes in Docker, although in my humble opinion it is more efficient.

Don't worry, with the picture I got a perfect idea of what you have.

So, I've been investigating kubernetes, and you're right, pods are isolated from each other regarding networking (kubernetes implements this by putting the processes running in them in different Linux network namespaces).

So, if you have nginx and unit in different pods, it's impossible to use abstract sockets.

But if you put the two containers in the same pod (that's possible), they will share network namespace, and they'll be able to communicate that way between them.

Would you be able to use a single pod with two containers?

As you've explained it to me, it seems that the relation of nginx containers to unit containers will be 1:1, right? If that's the case, using the same pod would make sense in terms of scalability, since you could just increase the multi-container pod instances easily. It would also remove the need for using a shared filesystem mountpoint. I think it would be even better in terms of isolation.

Hello @alejandro-colomar,

Thank you for taking the time to explore Kubernetes features and suggestions.

Unfortunately, the abstraction option you described (two programs, nginx and unit, in one container) violates the general principle of correct virtualization. This is for several reasons, ranging from zombie processes when using various shells and service managers, to the inability to correctly deliver termination signals to the end application and to get the necessary STDOUT/STDERR output from it in the container.

Here I give my preference and follow Google's guidelines for Kubernetes containers, namely:
single app per container
https://cloud.google.com/architecture/best-practices-for-building-containers#package_a_single_app_per_container

As for scalability, I prefer to consider its possibility at the Node abstraction level, rather than the Pod, through the creation of so-called Node pools providing redundancy and fault tolerance if necessary.

@alejandro-colomar
Contributor

alejandro-colomar commented Jul 27, 2022

But if you put the two containers in the same pod [...] they'll be able to communicate that way between them.
Would you be able to use a single pod with two containers?

Hello @alejandro-colomar,

Thank you for taking the time to explore Kubernetes features and suggestions.

Unfortunately, the abstraction option you described (two programs, nginx and unit, in one container) violates the general principle of correct virtualization. This is for several reasons, ranging from zombie processes when using various shells and service managers, to the inability to correctly deliver termination signals to the end application and to get the necessary STDOUT/STDERR output from it in the container.

Dear Evgenii,

I totally agree with you that placing two processes in the same container would be a terrible idea. I am suggesting instead a totally different solution: placing one single process per container, but two containers in the same pod. As you may already know, a pod is a group of containers that share the same network namespace (they are isolated as a whole, so no networking to the outside, but they can network between them).

Thank you very much.

Here I give my preference and follow Google's guidelines for Kubernetes containers, namely: single app per container https://cloud.google.com/architecture/best-practices-for-building-containers#package_a_single_app_per_container

As for scalability, I prefer to consider its possibility at the Node abstraction level, rather than the Pod, through the creation of so-called Node pools providing redundancy and fault tolerance if necessary.

@alejandro-colomar
Contributor

alejandro-colomar commented Jul 27, 2022

I'm referring to this:

https://www.mirantis.com/blog/multi-container-pods-and-container-communication-in-kubernetes/

which you'll notice that mentions as a use case having nginx as a reverse proxy before the actual app (unit in this case).

@alejandro-colomar
Contributor

alejandro-colomar commented Jul 27, 2022

With my srv_a and cli_a programs, the YAML would look something like the following:

apiVersion: v1
kind: Pod
metadata:
  name: server-and-client   # Kubernetes object names may not contain underscores
spec:
  containers:
  - name: server
    image: srv_a
    command: ["/bin/sh", "-c"]
    args:
      - /usr/local/bin/srv_a
  - name: client
    image: cli_a
    command: ["/bin/sh", "-c"]
    args:
      - sleep 10 && /usr/local/bin/cli_a

@echolimazulu
Author

I'm referring to this:

https://www.mirantis.com/blog/multi-container-pods-and-container-communication-in-kubernetes/

which you'll notice that mentions as a use case having nginx as a reverse proxy before the actual app (unit in this case).

Wow, Alex (@alejandro-colomar), this is incredibly useful information, sincerely thank you for this! And yes, it seems to me that this will allow the use of a common namespace to work with abstract sockets. I'll try this out in the future, after full abstract socket support is added to the main branch by you (probably already). But as you very well noted earlier, not all applications can support abstract sockets, and not everyone uses kubernetes as a flexible virtualization tool.

I overlooked this, since I mainly work in abstraction: Node -> Pod = 1 container and have not come across the principle of work you indicated in practice, so my knowledge of this will not allow an objective discussion on the pros and cons of this solution at the moment time. Most likely everything will be fine. Of the minuses, only an increase in the number of attack vectors due to the use of shared memory is likely. Therefore, for myself, I personally accept this option and with a high probability I will use it in my own solutions, as the most efficient current solution, provided that abstract sockets are used.

But the main guest in this PR is still file sockets, which exist almost everywhere and are understood and known to very many people. I would say that this is already a generally accepted standard. Which cannot be completely discarded or abandoned by referring to abstract sockets (which are not always and everywhere possible to use) or the legacy of the project.

I also know of people who use these roles (nginx, and others) on separate virtual machines under ESXi, which, as far as I remember, do not have a common IPC.

If we abandon file sockets altogether, then in my humble opinion it looks like a degradation of the project.

If we leave everything as it is and file sockets have to be deleted manually, after automatic creation at the time of unit startup - then this is also, in my humble opinion, a degradation of the project.

In fact, when we talk about abstract sockets in light of the work in progress for file sockets, we are trying to find a workaround, not a real solution to the problem of closing file sockets.

I am sincerely glad and grateful to you personally and @tippexs for throwing options to find a solution, because this is how we can come to a successful and best solution to this problem.

Regards,
Evgenii.

@alejandro-colomar
Contributor

I'm referring to this:
https://www.mirantis.com/blog/multi-container-pods-and-container-communication-in-kubernetes/
which you'll notice that mentions as a use case having nginx as a reverse proxy before the actual app (unit in this case).

Wow, Alex (@alejandro-colomar), this is incredibly useful information, sincerely thank you for this! And yes, it seems to me that this will allow the use of a common namespace to work with abstract sockets. I'll try this out in the future, after full abstract socket support is added to the main branch by you (probably already).

I'm happy that this is so useful to you! :-)

But as you very well noted earlier, not all applications can support abstract sockets, and not everyone uses kubernetes as a flexible virtualization tool.

Yeah, I know there are scenarios where file sockets might be necessary.

I overlooked this, since I mainly work in abstraction: Node -> Pod = 1 container and have not come across the principle of work you indicated in practice, so my knowledge of this will not allow an objective discussion on the pros and cons of this solution at the moment time. Most likely everything will be fine. Of the minuses, only an increase in the number of attack vectors due to the use of shared memory is likely. Therefore, for myself, I personally accept this option and with a high probability I will use it in my own solutions, as the most efficient current solution, provided that abstract sockets are used.

Yes, please share any updates about this; tell me if it works well for you!

But the main guest in this PR is still file sockets, which exist almost everywhere and are understood and known to very many people. I would say that this is already a generally accepted standard. Which cannot be completely discarded or abandoned by referring to abstract sockets (which are not always and everywhere possible to use) or the legacy of the project.

When/if we receive a report that can't be adapted to use abstract sockets, this PR will be reconsidered.

But for now, we have a queue of difficult things to implement, and will try to keep it simple for now.

I also know of people who use these roles (nginx, and others) on separate virtual machines under ESXi, which, as far as I remember, do not have a common IPC.

If we abandon file sockets altogether, then in my humble opinion it looks like a degradation of the project.

No, it's not abandoned. It's just delayed until we really need it.

If we leave everything as it is and file sockets have to be deleted manually, after automatic creation at the time of unit startup - then this is also, in my humble opinion, a degradation of the project.

In fact, when we talk about abstract sockets in light of the work in progress for file sockets, we are trying to find a workaround, not a real solution to the problem of closing file sockets.

I am sincerely glad and grateful to you personally and to @tippexs for offering options in the search for a solution, because this is how we can arrive at the best solution to this problem.

Regards, Evgenii.

Still, I agree with you that Unix sockets are a big thing for Unit, and in fact we're right now discussing more uses of unix sockets within Unit (and NGINX too).

Cheers,

Alex

@echolimazulu
Author

echolimazulu commented Jul 27, 2022

Still, I agree with you that Unix sockets are a big thing for Unit, and in fact we're right now discussing more uses of unix sockets within Unit (and NGINX too).


Good news!

Thanks for sharing this!

Once again, I want to note that I do not insist on merging this PR. It seems to me that I am simply standing up for, and acting in the interests of, those users who are used to, love, and want to use unix sockets in unit - users for whom the lack of a correct way to use them, and the lack of proper support and implementation from the developer, can be critical at the moment they decide to choose unit as a product for their production solutions.

I also know of people who use these roles (nginx, and others) on separate virtual machines under ESXi, which, as far as I remember, do not have a common IPC.

And isn't this an example of a use case where abstract sockets cannot be used? Or does a person need to buy a unit subscription and write to support with a feature request in order for it to be classified as a real client need and make it onto your internal schedule as something necessary to implement?

Sorry, but the position on your part and that of your colleagues is constantly changing. Sometimes (1) you call it implementation complexity, sometimes (2) lack of real need on the part of customers, sometimes (3) lack of real scenarios where it is required (offering workarounds), sometimes (4) useless project legacy, sometimes (5) internal feature prioritization and a lack of internal implementation capacity. Am I forgetting something here? I wonder what else could be invented to justify not doing this, months after I reported the problem and even suggested a possible solution? This is a rhetorical question.

UPDATE:
Based on the logic that the unit team promotes - "give me an example of use" - you just gave an example based on abstract sockets, which cannot be used always and everywhere and are far from universally implemented right now. Meanwhile, examples with separate virtual machines that do not have a common IPC, and with applications that cannot work with abstract sockets, are absolutely obvious. But for the unit team, for some reason, this rule and logic does not apply.

Personally, I am grateful to you for all your attempts to solve the problems I found and for the solutions you provided; from my point of view, you are the most motivated and effective employee, along with @tippexs and @hongzhidao.

Please note that I do not insist on solving this problem! If you do not consider it necessary to solve this problem or do not want to do it now - please, I will accept any decision on this problem. The project is yours, so it's up to you!

@alejandro-colomar
Contributor

alejandro-colomar commented Jul 27, 2022

Good news!

Thanks for sharing this!

Once again, I want to note that I do not insist on merging this PR. It seems to me that I am simply standing up for, and acting in the interests of, those users who are used to, love, and want to use unix sockets in unit - users for whom the lack of a correct way to use them, and the lack of proper support and implementation from the developer, can be critical at the moment they decide to choose unit as a product for their production solutions.

I also know of people who use these roles (nginx, and others) on separate virtual machines under ESXi, which, as far as I remember, do not have a common IPC.

And isn't this an example of a use case where abstract sockets cannot be used?

I'd like to know a bit more about the case. I admit that I have no idea about ESXi (I for example had some idea about k8s, which is why I could help you). Maybe after some investigation, we can conclude that there's an easier way. I don't know.

Or does a person need to buy a unit subscription and write to support with a feature request in order for it to be classified as a real client need and make it onto your internal schedule as something necessary to implement?

Certainly not. I don't even know whether you have a subscription to nginx. I care about every user of Unit. And if a feature makes sense to me, balancing the implementation complexity and its benefits, I'll implement it, regardless of who requested/suggested it.

Just saying that the complexity of this one is huge, so I want to make sure that we really need it, before taking the time to add it.

Sorry, but the position on your part and that of your colleagues is constantly changing.

I work for NGINX (F5), but my comments here are my own. I don't speak for my colleagues, nor do I share their opinion in some cases.
I speak independently, and I hope consistently over time.

As you've noticed in the past, I have a strong tendency to be minimalist in my patches (even if I'm sometimes criticized for trying to apply patches that are too big). You can see that even after having merged several important features into the Unit code, my net line count in the project is negative (+785 -878). I'm just being myself here, and I'm very averse to adding such complex code to a project whose source code I still don't know very well (for some meaning of "very well").

Sometimes (1) you call it implementation complexity, sometimes

Certainly, it is complex. You wrote the code, so I guess you have an idea. It will be even more complex for me to review your code, since it's completely new code for me.

(2) lack of real need on the part of customers,

In your case, we've agreed it's probably simpler for you to use abstract sockets, right?

sometimes (3) lack of real scenarios where it is required (offering workarounds),

I know there are scenarios where it is required, but I would like some confirmation of that necessity from real users (the colleague you mentioned might be one of them).

sometimes (4) useless project legacy,

I don't remember having said that.

sometimes (5) internal feature prioritization and a lack of internal implementation capacity.

This certainly is the biggest issue we have right now.

Am I forgetting something here? I wonder what else could be invented to justify not doing this, months after I reported the problem and even suggested a possible solution? This is a rhetorical question.

We're less than a handful of C programmers in the Unit team. I've been in the team just for a few months, and still don't know very well how Unit is implemented internally, so it's hard for me to review this code; please understand that.

We've also had important changes, which is why at some points we've been quieter than at other times.

Personally, I am grateful to you for all your attempts to solve the problems I found and for the solutions you provided; from my point of view, you are the most motivated and effective employee, along with @tippexs and @hongzhidao.

Thanks! Really!

Please note that I do not insist on solving this problem! If you do not consider it necessary to solve this problem or do not want to do it now - please, I will accept any decision on this problem. The project is yours, so it's up to you!

My priority right now will be related to unix sockets for the next release, so I'll be learning how Unit uses them.

As I said, for the moment I'll defer the review of this PR indefinitely. After some more work with unix sockets, I may reconsider it.

Please tell your colleagues to open an issue if they want to share a description of their use case, and maybe you or they can convince me that it's necessary enough that I can't avoid it.

Cheers,

Alex

@echolimazulu
Author

I'd like to know a bit more about the case. I admit that I have no idea about ESXi (I for example had some idea about k8s, which is why I could help you). Maybe after some investigation, we can conclude that there's an easier way. I don't know.

I am very grateful to you!

I work for NGINX (F5), but my comments here are my own. I don't speak for my colleagues, nor do I share their opinion in some cases.
I speak independently, and I hope consistently over time.

I don't separate F5 Networks and Nginx Inc. when it comes to individual employees, just as I do not separate the unit project from other F5 Networks projects. I chose unit because of my good regard for the nginx product and its well-deserved reputation, so I expect the same from unit, allowing for its young age. I have previously described my vision to your Sr Director of Product Management, pointing out unit's current competitive advantages and disadvantages, but this seems to have been ignored or gone unnoticed.

However, after so much time, opposition from the team (though not all of its members), evidence provided, and discussion, I am still here; but in the near future I plan to close this PR, since unfortunately I will no longer be able to work on it due to circumstances unrelated to this PR.

I don't remember having said that.

This was not said by you, but by @VBart in other PRs.

@ac000
Member

ac000 commented Jul 28, 2022

capabilities(7) might be a nice way of solving the unlink(2)ing of the socket files problem, on Linux at least. I don't know if they ever made it to other Unix systems. It never got past the POSIX draft stage (POSIX.1e). Or maybe they have a similar mechanism.

The router process could gain the CAP_DAC_OVERRIDE capability, do the unlink(2) then relinquish the capability until next time. I did use capabilities(7) in a project many years ago...
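
For illustration, here is a minimal sketch of that approach using libcap, assuming the process keeps CAP_DAC_OVERRIDE in its permitted set and only raises it in the effective set around the unlink(2). This is not Unit code; unlink_with_cap() is just a hypothetical helper showing the general shape (build with -lcap):

/* Raise CAP_DAC_OVERRIDE, unlink the socket file, then drop the cap again. */
#include <unistd.h>
#include <sys/capability.h>    /* libcap */

static int
unlink_with_cap(const char *path)
{
    cap_t        caps;
    cap_value_t  cap = CAP_DAC_OVERRIDE;
    int          ret = -1;

    caps = cap_get_proc();                      /* current capability sets */
    if (caps == NULL) {
        return -1;
    }

    /* Enable CAP_DAC_OVERRIDE in the effective set (it must already be permitted). */
    if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap, CAP_SET) == 0
        && cap_set_proc(caps) == 0)
    {
        ret = unlink(path);

        /* Relinquish the capability until next time. */
        cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap, CAP_CLEAR);
        cap_set_proc(caps);
    }

    cap_free(caps);
    return ret;
}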

@alejandro-colomar
Contributor

alejandro-colomar commented Jul 28, 2022

capabilities(7) might be a nice way of solving the unlink(2)ing of the socket files problem, on Linux at least. I don't know if they ever made it to other Unix systems. It never got past the POSIX draft stage (POSIX.1e). Or maybe they have a similar mechanism.

The router process could gain the CAP_DAC_OVERRIDE capability, do the unlink(2) then relinquish the capability until next time. I did use capabilities(7) in a project many years ago...

I'm not sure it's a safe one. See a rationale in #740. CAP_DAC_OVERRIDE is basically root (together with CAP_SYS_ADMIN, one of the most powerful caps). I'd prefer a safer one, I think.

@ac000
Member

ac000 commented Jul 28, 2022

Heh, I think that's a little extreme. It allows a process to override file system permissions, yes, but it's not as bad as CAP_SYS_ADMIN (that's the "basically root" one), and it would only be for the duration of the unlink(2).

But then maybe in the end all these schemes aren't much simpler than just sending a message to the main process to

unlink /path/to/socket

and that would work for everybody.
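
As a rough illustration of that idea (this is not Unit's actual port/message API; the channel setup and helper names below are made up), the privileged main process could receive a socket path from the router over a socketpair(2) and perform the unlink(2) on its behalf:

/* Hypothetical sketch: the router asks the (root) main process to unlink a path. */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Router side: send the NUL-terminated path down the channel. */
static int
request_unlink(int chan, const char *path)
{
    size_t  len = strlen(path) + 1;

    return (write(chan, path, len) == (ssize_t) len) ? 0 : -1;
}

/* Main-process side: read one request and unlink the path. */
static int
handle_unlink(int chan)
{
    ssize_t  n;
    char     path[sizeof(((struct sockaddr_un *) 0)->sun_path)];

    n = read(chan, path, sizeof(path));
    if (n <= 0) {
        return -1;
    }

    path[n - 1] = '\0';                         /* ensure termination */
    return unlink(path);
}

int
main(void)
{
    int  sv[2];

    /* In Unit this would be one of the existing main<->router channels. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return 1;
    }

    request_unlink(sv[0], "/path/to/socket");
    handle_unlink(sv[1]);    /* unlink() simply fails if the path does not exist */

    return 0;
}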

But then nxt_main_process_cleanup() gets called when an application terminates (perhaps not quite what we want)... and it looks like the listen sockets should be tucked away in task->thread->runtime->listen_sockets, but the array holding those sockets always seems to be empty, even looking in the router process where I'd expect them to be

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f9633111bfe in epoll_wait (epfd=3, events=0x15e5970, maxevents=32, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) frame 2
#2  0x00000000004134ba in nxt_event_engine_start (engine=0x15dcbd0)
    at src/nxt_event_engine.c:549
549             engine->event.poll(engine, timeout);
(gdb) p *task->thread->runtime->listen_sockets
$1 = {
  elts = 0x15dc918,
  nelts = 0,
  size = 48,
  nalloc = 1,
  mem_pool = 0x15dc060
}
(gdb) p *(nxt_listen_socket_t *)task->thread->runtime->listen_sockets->elts
$2 = {
  socket = 0,
  backlog = 0,
  work_queue = 0x0,
  handler = 0x0,
  sockaddr = 0x0,
  count = 0,
  flags = 0 '\000',
  read_after_accept = 0 '\000',
  ipv6only = 0 '\000',
  socklen = 0 '\000',
  address_length = 0 '\000'
}
(gdb)

@ac000
Member

ac000 commented Sep 12, 2023

Closing this pull-request.

There have been some commits in relation to this issue: ccaad38 & 2e3e1c7.

Also, the docker image from https://github.com/echolimazulu/unit-bind-aaiu runs without failure now.

(Required the following patch)

diff --git a/Dockerfile b/Dockerfile
index dd2e87e..d7f2a6b 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -8,7 +8,7 @@ RUN set -ex \
     && mkdir -p /usr/lib/unit/modules /usr/lib/unit/debug-modules \
     && hg clone https://hg.nginx.org/unit \
     && cd unit \
-    && hg up 1.26.1 \
+    && hg up 1.31.0 \
     && NCPU="$(getconf _NPROCESSORS_ONLN)" \
     && DEB_HOST_MULTIARCH="$(dpkg-architecture -q DEB_HOST_MULTIARCH)" \
     && CC_OPT="$(DEB_BUILD_MAINT_OPTIONS="hardening=+all,-pie" DEB_CFLAGS_MAINT_APPEND="-Wp,-D_FORTIFY_SOURCE=2 -fPIC" dpkg-buildflags --get CFLAGS)" \
@@ -25,11 +25,11 @@ RUN set -ex \
                 --libdir=/usr/lib/$DEB_HOST_MULTIARCH" \
     && ./configure $CONFIGURE_ARGS --cc-opt="$CC_OPT" --ld-opt="$LD_OPT" --modules=/usr/lib/unit/debug-modules --debug \
     && make -j $NCPU unitd \
-    && install -pm755 build/unitd /usr/sbin/unitd-debug \
+    && install -pm755 build/sbin/unitd /usr/sbin/unitd-debug \
     && make clean \
     && ./configure $CONFIGURE_ARGS --cc-opt="$CC_OPT" --ld-opt="$LD_OPT" --modules=/usr/lib/unit/modules \
     && make -j $NCPU unitd \
-    && install -pm755 build/unitd /usr/sbin/unitd \
+    && install -pm755 build/sbin/unitd /usr/sbin/unitd \
     && make clean \
     && ./configure $CONFIGURE_ARGS --cc-opt="$CC_OPT" --modules=/usr/lib/unit/debug-modules --debug \
     && ./configure  \
Execution log:
2023/09/12 01:02:02.896 [debug] 8#8 setproctitle: "unit: main v1.31.0 [unitd-debug --no-daemon --control unix:/var/run/control.unit.sock]"
2023/09/12 01:02:02.932 [debug] 12#12 conf_data_handler(120): {"listeners":{"unix:///var/run/aaiu.sock":{"pass":"routes"}},"routes":[{"match":{"source":[]},"action":{"return":200}}]}
2023/09/12 01:02:02.932 [debug] 12#12 router listener: "unix:///var/run/aaiu.sock"
2023/09/12 01:02:02.932 [debug] 8#8 listening socket "unix:///var/run/aaiu.sock"
2023/09/12 01:02:02.932 [debug] 8#8 socket("unix:///var/run/aaiu.sock"): 10
2023/09/12 01:02:04.824 [debug] 17#17 setproctitle: "unit: main v1.31.0 [unitd-debug --no-daemon --control unix:/var/run/control.unit.sock]"
2023/09/12 01:02:04.828 [debug] 21#21 conf_data_handler(120): {"listeners":{"unix:///var/run/aaiu.sock":{"pass":"routes"}},"routes":[{"match":{"source":[]},"action":{"return":200}}]}
2023/09/12 01:02:04.828 [debug] 21#21 router listener: "unix:///var/run/aaiu.sock"
2023/09/12 01:02:04.832 [debug] 17#17 listening socket "unix:///var/run/aaiu.sock"
2023/09/12 01:02:04.832 [debug] 17#17 socket("unix:///var/run/aaiu.sock"): 10

@ac000 ac000 closed this Sep 12, 2023