Restart loop in seafile-server and unknown server host db #46

Closed
IllyaMoskvin opened this issue May 2, 2022 · 9 comments

@IllyaMoskvin

I'm running into some strange issues wherein my seafile-server seems to be stuck in a restart loop, and the db service appears to be unreachable. I'm also seeing some worrying "Could not bind socket" errors in seafile-server. In my client, I'm seeing "Library damaged on server" errors, but I think that's just inaccurate messaging.

seafile-server_1    | 2022-05-02 00:51:23 seafile-controller.c(191): starting seaf-server ...
seafile-server_1    | 2022-05-02 00:51:23 seafile-controller.c(82): spawn_process: seaf-server -F /opt/seafile/conf -c /opt/seafile/ccnet -d /opt/seafile/seafile-data -l /opt/seafile/logs/seafile.log -P /opt/seafile/pids/seaf-server.pid -p /opt/seafile/seafile-server-9.0.4/runtime
seafile-server_1    | 2022-05-02 00:51:23 seafile-controller.c(116): spawned seaf-server, pid 1422737
seafile-server_1    | 2022-05-02 00:51:23 socket file exists, delete it anyway
seafile-server_1    | 2022-05-02 00:51:23 ../common/seaf-utils.c(333): Use database Mysql
seafile-server_1    | 2022-05-02 00:51:23 http-server.c(195): fileserver: worker_threads = 10
seafile-server_1    | 2022-05-02 00:51:23 http-server.c(210): fileserver: fixed_block_size = 8388608
seafile-server_1    | 2022-05-02 00:51:23 http-server.c(225): fileserver: web_token_expire_time = 3600
seafile-server_1    | 2022-05-02 00:51:23 http-server.c(240): fileserver: max_indexing_threads = 1
seafile-server_1    | 2022-05-02 00:51:23 http-server.c(255): fileserver: max_index_processing_threads= 3
seafile-server_1    | 2022-05-02 00:51:23 http-server.c(277): fileserver: cluster_shared_temp_file_mode = 600
seafile-server_1    | 2022-05-02 00:51:23 http-server.c(2859): Could not bind socket: Address already in use
seafile-server_1    | 2022-05-02 00:51:23 ../common/seaf-db.c(732): Failed to connect to MySQL: Unknown server host 'db' (-3)
seafile-server_1    | 2022-05-02 00:51:23 http-server.c(888): DB error when check repo existence.
seafile-server_1    | 2022-05-02 00:51:23 ../common/seaf-db.c(732): Failed to connect to MySQL: Unknown server host 'db' (-3)
seafile-server_1    | 2022-05-02 00:51:23 http-server.c(888): DB error when check repo existence.
seafile-server_1    | 2022-05-02 00:51:25 ../common/seaf-db.c(732): Failed to connect to MySQL: Unknown server host 'db' (-3)
seafile-server_1    | 2022-05-02 00:51:25 http-server.c(888): DB error when check repo existence.
seafile-server_1    | 2022-05-02 00:51:25 ../common/seaf-db.c(732): Failed to connect to MySQL: Unknown server host 'db' (-3)
seafile-server_1    | 2022-05-02 00:51:25 http-server.c(888): DB error when check repo existence.
seafile-server_1    | 2022-05-02 00:51:27 ../common/seaf-db.c(732): Failed to connect to MySQL: Unknown server host 'db' (-3)
seafile-server_1    | 2022-05-02 00:51:27 ../common/seaf-db.c(732): Failed to connect to MySQL: Unknown server host 'db' (-3)
seafile-server_1    | 2022-05-02 00:51:29 ../common/seaf-db.c(732): Failed to connect to MySQL: Unknown server host 'db' (-3)
seafile-server_1    | 2022-05-02 00:51:29 http-server.c(888): DB error when check repo existence.
seafile-server_1    | 2022-05-02 00:51:29 ../common/seaf-db.c(732): Failed to connect to MySQL: Unknown server host 'db' (-3)
seafile-server_1    | 2022-05-02 00:51:29 http-server.c(888): DB error when check repo existence.
seafile-server_1    | 2022-05-02 00:51:31 ../common/seaf-db.c(732): Failed to connect to MySQL: Unknown server host 'db' (-3)
seafile-server_1    | 2022-05-02 00:51:31 http-server.c(888): DB error when check repo existence.
seafile-server_1    | 2022-05-02 00:51:31 ../common/seaf-db.c(732): Failed to connect to MySQL: Unknown server host 'db' (-3)
seafile-server_1    | 2022-05-02 00:51:31 http-server.c(888): DB error when check repo existence.
seafile-server_1    | 2022-05-02 00:51:33 seafile-controller.c(447): pid file /opt/seafile/pids/seaf-server.pid does not exist
seafile-server_1    | 2022-05-02 00:51:33 seafile-controller.c(513): seaf-server need restart...
seafile-server_1    | 2022-05-02 00:51:33 seafile-controller.c(191): starting seaf-server ...

Then it loops, and I see the same sort of messages again.
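
One quick way to confirm the loop is to tally the key messages in a captured log. The following is a minimal sketch, not part of the project: the function name `summarize_seafile_log` is hypothetical, and the patterns are simply copied from the log lines above.

```shell
# Hypothetical helper: reads seafile-server log text on stdin and counts
# the messages that characterise the restart loop shown above.
summarize_seafile_log() {
    awk '
        /Unknown server host/      { dns++ }       # DB hostname did not resolve
        /seaf-server need restart/ { restarts++ }  # controller respawned seaf-server
        /Could not bind socket/    { bind++ }      # fileserver address already in use
        END {
            printf "dns_failures=%d restarts=%d bind_errors=%d\n", dns, restarts, bind
        }
    '
}

# Example (assumes the compose service is named seafile-server):
#   docker-compose logs --no-color seafile-server | summarize_seafile_log
```

A high `dns_failures` count next to a growing `restarts` count would point at the database hostname rather than at seaf-server itself.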

Here are some from seahub:

seahub_1            | django.db.utils.OperationalError: (2005, "Unknown MySQL server host 'db' (-3)")
seahub_1            | ngo/db/backends/base/base.py", line 200, in connect
seahub_1            |     self.connection = self.get_new_connection(conn_params)
seahub_1            |   File "/opt/seafile/seafile-server-latest/seahub/thirdpart/django/utils/asyncio.py", line 26, in inner
seahub_1            |     return func(*args, **kwargs)
seahub_1            |   File "/opt/seafile/seafile-server-latest/seahub/thirdpart/django/db/backends/mysql/base.py", line 234, in get_new_connection
seahub_1            |     connection = Database.connect(**conn_params)
seahub_1            |   File "/usr/lib/python3.8/site-packages/MySQLdb/__init__.py", line 123, in Connect
seahub_1            |     return Connection(*args, **kwargs)
seahub_1            |   File "/usr/lib/python3.8/site-packages/MySQLdb/connections.py", line 185, in __init__
seahub_1            |     super().__init__(*args, **kwargs2)
seahub_1            | django.db.utils.OperationalError: (2005, "Unknown MySQL server host 'db' (-3)")

Last logs from the db service:

db_1                | 2022-04-19  4:20:49 6684 [Warning] Aborted connection 6684 to db: 'ccnet_db' user: 'seafile' host: '172.18.0.6' (Got an error reading communication packets)
db_1                | 2022-04-19  4:20:49 6683 [Warning] Aborted connection 6683 to db: 'seafile_db' user: 'seafile' host: '172.18.0.6' (Got an error reading communication packets)
db_1                | 2022-04-19  4:20:49 6751 [Warning] Aborted connection 6751 to db: 'seafile_db' user: 'seafile' host: '172.18.0.6' (Got an error reading communication packets)
db_1                | 2022-04-19  4:20:50 7502 [Warning] Aborted connection 7502 to db: 'seahub_db' user: 'seafile' host: '172.18.0.5' (Got an error reading communication packets)

I'm going to leave my server as-is for now in case you'd like me to supply any more info.

@ggogel
Owner

ggogel commented May 6, 2022

Hello, I can think of two possible reasons. Either some change in the compose file breaks the communication between the services, or there is a networking issue in your underlying system.

If your database service is named db and all services are deployed to the same project, then the other services should be able to resolve the hostname db.

Did you make any changes to the provided compose file other than the variables described in section "3. Set environment variables" of the documentation?

@IllyaMoskvin
Author

IllyaMoskvin commented May 6, 2022

Hi @ggogel, thank you for the response. In my compose file, I made the changes I described in #39, including my interpretations of the suggestions you made in your feedback.

To be clear, my server does run; it works as intended when I run docker-compose restart. It just sometimes crashes... or restarts... or something... and then it gets stuck in some way. This time, it got stuck in the way that I described in the OP.

I still haven't restarted my server, so please let me know if you'd like me to try any debugging steps.

I'm transcribing my compose file for reference, with emails and passwords edited out:

version: '3.8'
services:
  nginx-proxy:
    container_name: nginx-proxy
    image: nginxproxy/nginx-proxy
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - certs:/etc/nginx/certs
      - vhost:/etc/nginx/vhost.d
      - html:/usr/share/nginx/html
      - /var/run/docker.sock:/tmp/docker.sock:ro
    networks:
      - default

  nginx-proxy-acme:
    image: nginxproxy/acme-companion
    volumes:
      - certs:/etc/nginx/certs
      - vhost:/etc/nginx/vhost.d
      - html:/usr/share/nginx/html
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - acme:/etc/acme.sh
    environment:
      - DEFAULT_EMAIL=someone@example.com
      - NGINX_PROXY_CONTAINER=nginx-proxy
    depends_on:
      - nginx-proxy
    networks:
      - default

  seafile-server:
    image: ggogel/seafile-server:9.0.4
    volumes:
      - seafile-data:/shared
    environment:
      - DB_HOST=db
      - DB_ROOT_PASSWD=foobar
      - TIME_ZONE=America/Chicago
      - HTTPS=true
      - SEAFILE_SERVER_HOSTNAME=seafile.example.com # Mandatory on first deployment!
      - GC_CRON=0 6 * * SUN # Garbage collection
    depends_on:
      - db
      - memcached
      - seafile-caddy
    networks:
      - seafile-net

  seahub:
    image: ggogel/seahub:9.0.4
    volumes:
      - seafile-data:/shared
      - seahub-avatars:/shared/seafile/seahub-data/avatars
      - seahub-custom:/shared/seafile/seahub-data/custom
    environment:
      - SEAFILE_ADMIN_EMAIL=someone@example.com
      - SEAFILE_ADMIN_PASSWORD=foobaz
    depends_on:
      - seafile-server
      - seafile-caddy
      - seahub-media
    networks:
      - seafile-net

  seahub-media:
    image: ggogel/seahub-media:9.0.4
    volumes:
      - seahub-avatars:/usr/share/caddy/media/avatars
      - seahub-custom:/usr/share/caddy/media/custom
    depends_on:
      - seafile-caddy
    networks:
      - seafile-net

  db:
    image: mariadb:10.7.1
    environment:
      - MYSQL_ROOT_PASSWORD=foobar
      - MYSQL_LOG_CONSOLE=true
    volumes:
      - seafile-mariadb:/var/lib/mysql
    networks:
      - seafile-net

  memcached:
    image: memcached:1.6.14
    entrypoint: memcached -m 1024
    networks:
      - seafile-net

  seafile-caddy:
    image: ggogel/seafile-caddy:1.0.6
    networks:
      - seafile-net
      - default
    environment:
      - VIRTUAL_PORT=80
      - VIRTUAL_HOST=seafile.example.com
      - LETSENCRYPT_HOST=seafile.example.com
    depends_on:
      - nginx-proxy
      - nginx-proxy-acme

networks:
  seafile-net:
    internal: true

volumes:
  seafile-data:
  seafile-mariadb:
  seahub-avatars:
  seahub-custom:

  # https://github.com/nginx-proxy/acme-companion
  certs:
  vhost:
  html:
  acme:

@ggogel
Owner

ggogel commented May 8, 2022

Ok, I misunderstood that.

Well, from the logs it seems that both seafile and seahub are not able to resolve the hostname db anymore. So this could either mean that the db service isn't running anymore or there is an issue with the internal DNS of Docker.

Did you check if the db service was running when this occurred? Which version of Docker are you running and on what OS?
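
As far as I understand, Docker's embedded DNS only answers for service names whose containers are currently attached to the network, so a stopped db container would be enough to produce the "Unknown server host" error. A minimal sketch for spotting stopped containers follows; the helper name `exited_containers` is hypothetical, and it only parses the text that `docker ps` prints.

```shell
# Hypothetical helper: reads "name<TAB>status" lines on stdin and prints
# every container whose status starts with "Exited", plus its exit code.
exited_containers() {
    awk -F '\t' '
        $2 ~ /^Exited \(/ {
            code = $2
            sub(/^Exited \(/, "", code)   # drop the leading "Exited ("
            sub(/\).*/, "", code)         # drop ")" and everything after it
            print $1 " exit=" code
        }
    '
}

# Example:
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | exited_containers
```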

@IllyaMoskvin
Author

IllyaMoskvin commented May 8, 2022

I'm running Ubuntu 20.04.3 LTS on DigitalOcean. Here's the version output for docker and docker-compose:

# docker version
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:33 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:42 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.13
  GitCommit:        9cc61520f4cd876b86e77edfeb88fbcd536d1f9d
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

# docker-compose --version
docker-compose version 1.26.0, build unknown

It looks like the db service crashed with error code 137:

# docker ps -a
CONTAINER ID   IMAGE                         COMMAND                  CREATED        STATUS                     PORTS                                                                      NAMES
547c544d5022   ggogel/seafile-caddy:1.0.6    "/scripts/start.sh"      5 weeks ago    Up 5 weeks                 80/tcp, 443/tcp, 2019/tcp                                                  seafile_seafile-caddy_1
da3f9bc24e48   nginxproxy/acme-companion     "/bin/bash /app/entr…"   5 weeks ago    Up 5 weeks                                                                                            seafile_nginx-proxy-acme_1
f18cc3ffbaca   nginxproxy/nginx-proxy        "/app/docker-entrypo…"   5 weeks ago    Up 5 weeks                 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   nginx-proxy
41fcaa56a647   ggogel/seahub:9.0.4           "/scripts/start.sh"      2 months ago   Up 5 weeks                                                                                            seafile_seahub_1
3dd5dc94953b   ggogel/seafile-server:9.0.4   "/scripts/start.sh"      2 months ago   Up 5 weeks                                                                                            seafile_seafile-server_1
ebe4323da214   ggogel/seahub-media:9.0.4     "caddy run --config …"   2 months ago   Up 5 weeks                                                                                            seafile_seahub-media_1
7cd90ae1beca   memcached:1.6.14              "memcached -m 1024"      2 months ago   Up 5 weeks                                                                                            seafile_memcached_1
77521694de21   mariadb:10.7.1                "docker-entrypoint.s…"   2 months ago   Exited (137) 2 weeks ago                                                                              seafile_db_1

From what I'm reading, exit code 137 means the container received a SIGKILL, which often happens when a process runs out of memory. I'm not sure why else it might be receiving a SIGKILL, so I'm wondering if it's a memory issue. This is possible, since I'm running on the smallest instance on DigitalOcean. There's only 1 GB of memory. However, I've successfully run Seafile without Docker with no issues on this instance size for years. Maybe Docker is adding just enough overhead that I'm running out of memory?
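
For reference, exit codes above 128 conventionally mean the process was terminated by signal number (code - 128), so 137 - 128 = 9, which is SIGKILL. The kernel's OOM killer sends SIGKILL, but so can other things (for example a forced stop), so the code alone does not prove memory exhaustion. A small sketch of the arithmetic, with a hypothetical helper name:

```shell
# Hypothetical helper: maps a container exit code to a signal name.
# Exit codes above 128 conventionally mean "terminated by signal (code - 128)".
exit_code_signal() {
    code=$1
    if [ "$code" -gt 128 ]; then
        kill -l "$((code - 128))"   # e.g. 137 - 128 = 9
    else
        echo "none"                 # ordinary exit, no signal involved
    fi
}

# Example:
#   exit_code_signal 137
```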

Interestingly, if I run docker inspect on the container, there's no indication that it ran out of memory.

Here's the relevant snippet:

# docker inspect 77521694de21
[
    {
        "Id": "77521694de21e5345112f71b7f299a57d562b8abeeca000b37b13743a86f8fe9",
        "Created": "2022-03-07T08:11:36.715930464Z",
        "Path": "docker-entrypoint.sh",
        "Args": [
            "mariadbd"
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2022-03-30T19:08:40.14021195Z",
            "FinishedAt": "2022-04-23T06:12:49.415947955Z"
        },
        "Image": "sha256:67a24127bba8bf4d1faea346defc7c7d3bf219cf0469b45c2812c007d6b45b54",
        <snip>
    }
]

The issue I'm encountering sounds like it may be similar to this one:

MariaDB/mariadb-docker#222

Unfortunately, it was closed without a clear resolution...

I'm at a bit of a loss as to how to debug this. I could increase the droplet size so it has more RAM, but before I do that, I'd like to know for sure that I'm running out of memory, or at least have some clearer sense of what might be causing this issue. Do you have any tips on how I could find out what caused this container to crash? Thank you again for the help.
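
One place to look is the host's kernel log. My understanding is that the OOMKilled flag in docker inspect may stay false when the host as a whole runs out of memory (as opposed to the container hitting its own memory limit), in which case the kernel log is the more reliable record. The sketch below is only a filter over log text; the helper name and the patterns are assumptions based on common Linux kernel messages.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints lines that
# look like OOM-killer activity. The patterns are assumptions based on common
# Linux kernel messages and may need adjusting for a given kernel version.
find_oom_kills() {
    grep -iE 'out of memory|oom-kill|oom_reaper' || echo "no OOM messages found"
}

# Example (either command typically needs root):
#   dmesg -T | find_oom_kills
#   journalctl -k --since "2022-04-23" | find_oom_kills
```

The date in the second example matches the FinishedAt timestamp from the inspect output above; older kernel messages may no longer be available if the log has rotated or the host has rebooted.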

Edit: It's probably worth noting that my server does not crash when undergoing high network load. I've had 3-5 people downloading multi-gigabyte libraries from it at the same time before, and it held up fine during that time. I wonder if there's maybe some sort of scheduled process that's causing it to crash. Does the timing of the crash suggest anything to you?

@ggogel
Owner

ggogel commented May 8, 2022

I think it's very likely that the database service tries to allocate too much RAM. I dug out this article from MariaDB. It states that MariaDB uses at least 512MB of RAM by default. You can disable the Performance Schema to reduce this.

Also, try to run mariadb:10.7.3. It might come with some improvements.
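
A hedged sketch of how the memory suggestion could be applied: generate a small option file and mount it into the db container. The helper name, the file name, and the values are illustrative assumptions rather than tuned recommendations, and /etc/mysql/conf.d is the include directory I believe the official image reads (worth verifying for your image version). The Performance Schema may already be off by default depending on the MariaDB version, so it is worth checking with SHOW VARIABLES LIKE 'performance_schema' first.

```shell
# Hypothetical helper: writes a small MariaDB option file to the given path.
# The values are illustrative assumptions, not tuned recommendations.
write_lowmem_cnf() {
    cat > "$1" <<'EOF'
[mysqld]
performance_schema = OFF
innodb_buffer_pool_size = 64M
EOF
}

# Example: generate the file, then mount it read-only into the db service,
# e.g. by adding "./lowmem.cnf:/etc/mysql/conf.d/lowmem.cnf:ro" under volumes.
#   write_lowmem_cnf ./lowmem.cnf
```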

@ggogel
Owner

ggogel commented Jun 23, 2022

@IllyaMoskvin Were you able to solve the issue?

@IllyaMoskvin
Author

IllyaMoskvin commented Jun 27, 2022

@ggogel Thanks for the follow-up. I think I was able to resolve the initial issue by doing the following:

  1. Doubling the size of my instance to increase RAM from 1 GB to 2 GB
  2. Adding restart: always to each service definition in my docker-compose.yml

For example:

services:
  db:
    restart: always
    image: mariadb:10.7.1
    environment:
      - MYSQL_ROOT_PASSWORD=<redacted>
      - MYSQL_LOG_CONSOLE=true
    volumes:
      - seafile-mariadb:/var/lib/mysql
    networks:
      - seafile-net

That worked fine for almost two months, but recently, I started encountering new issues. I think they are unrelated to the initial issue in this thread. They might not be related to your package at all, but I might document them here for reference anyway. Maybe some other users of this package might find it useful.

I'm currently working through the following issues:

  1. My Let's Encrypt certificate suddenly appears to be untrusted (unrelated to this package) because it was issued to (and by) letsencrypt-nginx-proxy-companion. Confusingly, it's marked as valid from 3/7/2022 to 3/7/2023, so it doesn't look like it was renewed recently, and I don't know why this changed.
  2. Some of my services crashed and refuse to restart.

Here's the current status:

# docker ps -a
CONTAINER ID   IMAGE                         COMMAND                  CREATED       STATUS                    PORTS                                                                      NAMES
d50e6f997409   ggogel/seahub:9.0.4           "/scripts/start.sh"      4 weeks ago   Up 4 weeks                                                                                           seafile_seahub_1
886ceb9e433d   ggogel/seahub-media:9.0.4     "caddy run --config …"   4 weeks ago   Up 3 days                                                                                            seafile_seahub-media_1
c00c985ca9ef   ggogel/seafile-server:9.0.4   "/scripts/start.sh"      4 weeks ago   Up 4 weeks                                                                                           seafile_seafile-server_1
bc304c651605   ggogel/seafile-caddy:1.0.6    "/scripts/start.sh"      4 weeks ago   Exited (137) 4 days ago                                                                              seafile_seafile-caddy_1
1e2391cee265   nginxproxy/acme-companion     "/bin/bash /app/entr…"   4 weeks ago   Up 4 weeks                                                                                           seafile_nginx-proxy-acme_1
1969bbc7bc33   nginxproxy/nginx-proxy        "/app/docker-entrypo…"   4 weeks ago   Up 4 weeks                0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   nginx-proxy
7e2ff59ad4f2   mariadb:10.7.1                "docker-entrypoint.s…"   4 weeks ago   Exited (1) 3 days ago                                                                                seafile_db_1
aac6e046d815   memcached:1.6.14              "memcached -m 1024"      4 weeks ago   Up 4 days                                                                                            seafile_memcached_1

Restarting them naively doesn't work:

# docker-compose restart
Restarting seafile_seahub_1           ... done
Restarting seafile_seahub-media_1     ... done
Restarting seafile_seafile-server_1   ... done
Restarting seafile_seafile-caddy_1    ... error
Restarting seafile_nginx-proxy-acme_1 ... done
Restarting nginx-proxy                ... done
Restarting seafile_db_1               ... error
Restarting seafile_memcached_1        ... done

ERROR: for seafile_db_1  Cannot restart container 7e2ff59ad4f20492c65f47ed4020f6a460aafe20fa84cfe46ce227e7bcbb8378: task 7e2ff59ad4f20492c65f47ed4020f6a460aafe20fa84cfe46ce227e7bcbb8378 already exists: unknown

ERROR: for seafile_seafile-caddy_1  Cannot restart container bc304c65160535f3df1073b733c6ec7603b881eb68f32e941dc914f5f91ec9ee: task bc304c65160535f3df1073b733c6ec7603b881eb68f32e941dc914f5f91ec9ee already exists: unknown

I'll dig into these issues and either edit this comment or make a new one if I have success in solving them, but I just need to get something off my chest: I'm kind of disappointed in this whole "run Seafile through Docker" thing so far. I'd hoped that running it through Docker would ease installation and updates, but I've never run into this much trouble when I ran this stack directly on the DigitalOcean instance. I don't know what I'm doing wrong, but I feel like I keep getting burned here. I want to work through it, but I just hope the effort pays off in the long run.

Please know that I don't think any of this is a fault of your package: I'm just a noob at Docker. But if anyone is passing through here from Google because you're having trouble getting Seafile working through Docker, just know that you're not alone!

Edit: Closing because the initial issue is solved!

@ggogel
Owner

ggogel commented Jun 28, 2022

  1. Did you create test certificates with LETSENCRYPT_TEST=true? Let's Encrypt certificates are normally issued for 90 days.
  2. Your containers seem to be stuck in terminated state and cannot be removed. This happens sometimes with Docker. Restarting the host or the Docker service usually fixes this issue right away. If this still doesn't work, you'll need to stop the Docker service and delete the folders matching the ID of the stuck containers in /var/lib/docker/containers.
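
The manual cleanup in point 2 can be sketched as a dry run that only prints the commands, so they can be reviewed before anything is deleted. The helper name is hypothetical, and `systemctl` assumes a systemd-based host.

```shell
# Hypothetical helper: prints (does not run) the manual cleanup described
# above for each container ID given as an argument.
print_stuck_container_cleanup() {
    echo "systemctl stop docker"
    for id in "$@"; do
        # Directory names under /var/lib/docker/containers are full IDs,
        # so a short ID is matched as a prefix.
        echo "rm -rf /var/lib/docker/containers/${id}*"
    done
    echo "systemctl start docker"
}

# Example:
#   print_stuck_container_cleanup 7e2ff59ad4f2 bc304c651605
```

Printing instead of executing is a deliberate choice here: removing the wrong directory would destroy a container's metadata, so each line should be checked by hand before it is run as root.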

I have no idea why you're running into so many issues. So you moved your Docker stack from the DigitalOcean Docker App to a VM?

I hope that we can solve all the issues.

@IllyaMoskvin
Author

IllyaMoskvin commented Jul 5, 2022

Restarting the host (shutdown -r) fixed the stuck container and the self-signed Let's Encrypt certificate. Thank you!

  1. I did not, AFAIK. For reference, please see my docker-compose.yml in an earlier comment in this issue.
  2. As mentioned, restarting the host fixed the issue. Restarting the Docker service (service docker restart) only restarted the containers that were already working normally, but did not fix the malfunctioning containers.

I confirmed that if I stop the ggogel/seafile-caddy container with e.g. docker stop bc304c651605, two things happen:

  1. I get the same NET::ERR_CERT_AUTHORITY_INVALID error as before
  2. The memcached container also exits with status code 0

Running docker start bc304c651605 restarts the caddy container and fixes the self-signed certificate, but it does not automatically bring the memcached container back up. I needed to run a separate start command for that.

If this issue happens again, I'll write a script for my crontab that runs curl on my Seahub and restarts the server if it becomes unreachable, or look into existing solutions that do something similar. If I end up going that route, I'll document it here.
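
A minimal sketch of what such a watchdog might look like. Everything named here is a placeholder: `SEAHUB_URL`, `RESTART_CMD`, the compose file path, and the function names are assumptions. Note that docker-compose restart did not recover the stuck containers earlier in this thread, so the restart command may need to be something stronger.

```shell
# Hypothetical watchdog sketch. SEAHUB_URL and RESTART_CMD are placeholders;
# adjust both before use.
SEAHUB_URL=${SEAHUB_URL:-https://seafile.example.com/}
RESTART_CMD=${RESTART_CMD:-docker-compose -f /path/to/docker-compose.yml restart}

# Prints the HTTP status code for a URL; curl prints 000 if no response arrived.
probe_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$1"
}

# Succeeds only for 2xx and 3xx responses.
is_healthy() {
    case $1 in
        2??|3??) return 0 ;;
        *)       return 1 ;;
    esac
}

# Probes Seahub once and runs the restart command if the probe fails.
watchdog() {
    status=$(probe_status "$SEAHUB_URL") || true
    if is_healthy "$status"; then
        echo "ok ($status)"
    else
        echo "unhealthy ($status), restarting"
        $RESTART_CMD
    fi
}

# When saved as a script for cron, add a final line that calls: watchdog
```

Because curl validates the certificate by default, an untrusted certificate like the one described above would also be reported as unhealthy (status 000), not just a server that is down.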

> So you moved your Docker stack from the DigitalOcean Docker App to a VM?

Not quite. For my initial Seafile installation, I installed it on a VM manually along with all of its dependencies. This would have been almost four years ago now. I decided that it was time to move it to a newer version of Ubuntu. This time, I wanted to install Seafile in a way that would make re-installing and upgrading it easier.

My first instinct was to write Ansible recipes to automate the process of installing Seafile directly to the VM. However, I didn't find an existing "Seafile Ansible" project that looked trustworthy, and I wasn't quite ready to put in the effort to write my own from scratch. Your project looked great, however, so I decided to use this as an opportunity to try running it with Docker.

This is my first time working with Docker. I can see that I have some things to learn about how to debug Docker containers!
