Unable to start container on Linux 6.7 #868

Open
quinnjr opened this issue Jan 15, 2024 · 63 comments

Comments

@quinnjr

quinnjr commented Jan 15, 2024

Currently unable to start the container on Arch Linux as the host OS. The dump files for the failing sqlservr process don't really provide any insight as to why:

docker compose logs db
db-1  | SQL Server 2022 will run as non-root by default.
db-1  | This container is running as user mssql.
db-1  | Your master database file is owned by mssql.
db-1  | To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
db-1  | This program has encountered a fatal error and cannot continue running at Mon Jan 15 18:19:00 2024
db-1  | The following diagnostic information is available:
db-1  | 
db-1  |          Reason: 0x00000001
db-1  |          Signal: SIGABRT - Aborted (6)
db-1  |           Stack:
db-1  |                  IP               Function
db-1  |                  ---------------- --------------------------------------
db-1  |                  000064eb280a3ce1 std::__1::bad_function_call::~bad_function_call()+0x96661
db-1  |                  000064eb280a36a6 std::__1::bad_function_call::~bad_function_call()+0x96026
db-1  |                  000064eb280a2c2f std::__1::bad_function_call::~bad_function_call()+0x955af
db-1  |                  00007c18f8810520 __sigaction+0x50
db-1  |                  00007c18f88649fc pthread_kill+0x12c
db-1  |                  00007c18f8810476 raise+0x16
db-1  |                  00007c18f87f67f3 abort+0xd3
db-1  |                  000064eb28074d96 std::__1::bad_function_call::~bad_function_call()+0x67716
db-1  |                  000064eb280b15b4 std::__1::bad_function_call::~bad_function_call()+0xa3f34
db-1  |                  000064eb280df318 std::__1::bad_function_call::~bad_function_call()+0xd1c98
db-1  |                  000064eb280df0fa std::__1::bad_function_call::~bad_function_call()+0xd1a7a
db-1  |                  000064eb2807b20a std::__1::bad_function_call::~bad_function_call()+0x6db8a
db-1  |                  000064eb2807ae80 std::__1::bad_function_call::~bad_function_call()+0x6d800
db-1  |         Process: 10 - sqlservr
db-1  |          Thread: 157 (application thread 0x264)
db-1  |     Instance Id: 83ef72ce-1100-44c4-913c-45d0df61ae44
db-1  |        Crash Id: 05e56c63-9bd1-47db-b3d5-c1f58cebd578
db-1  |     Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
db-1  |    Distribution: Ubuntu 22.04.3 LTS
db-1  |      Processors: 32
db-1  |    Total Memory: 67119079424 bytes
db-1  |       Timestamp: Mon Jan 15 18:19:00 2024
db-1  |      Last errno: 2
db-1  | Last errno text: No such file or directory
db-1  | Capturing a dump of 10
db-1  | Successfully captured dump: /var/opt/mssql/log/core.sqlservr.1_15_2024_18_19_0.10
db-1  | Executing: /opt/mssql/bin/handle-crash.sh with parameters
db-1  |      handle-crash.sh
db-1  |      /opt/mssql/bin/sqlservr
db-1  |      10
db-1  |      /opt/mssql/bin
db-1  |      /var/opt/mssql/log/
db-1  |      
db-1  |      83ef72ce-1100-44c4-913c-45d0df61ae44
db-1  |      05e56c63-9bd1-47db-b3d5-c1f58cebd578
db-1  |      
db-1  |      /var/opt/mssql/log/core.sqlservr.1_15_2024_18_19_0.10
db-1  | 
db-1  | Ubuntu 22.04.3 LTS
db-1  | Capturing core dump and information to /var/opt/mssql/log...

Docker Compose file:

version: '3'
services:
  db:
    image: 'mcr.microsoft.com/mssql/server:2022-latest'
    environment:
      - ACCEPT_EULA=Y
      - MSSQL_SA_PASSWORD=<there would be a password here>
      - MSSQL_PID=Developer
    volumes:
      - ./logs:/var/opt/mssql/log
      - ./data:/var/opt/mssql/data
    ports:
      - 1433:1433

The logs and data directories are owned by UID:GID 10001:10001.
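Since the container runs as the mssql user (UID 10001), a quick pre-flight check can confirm the bind-mounted directories have the expected owner before starting it. This is a minimal sketch, not official tooling: `check_owner` is a hypothetical helper, and it assumes GNU coreutils `stat -c` (on macOS/BSD the equivalent is `stat -f %u`).

```shell
# Hypothetical helper: check that each directory exists and is owned by the given UID.
# Assumes GNU coreutils `stat -c %u`; on macOS/BSD use `stat -f %u` instead.
check_owner() {
  local uid="$1" dir owner status=0
  shift
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing: $dir" >&2
      status=1
      continue
    fi
    owner=$(stat -c %u "$dir")
    if [ "$owner" != "$uid" ]; then
      echo "wrong owner: $dir (uid $owner, expected $uid)" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (not run here): fix ownership only if the check fails.
# check_owner 10001 ./logs ./data || sudo chown -R 10001:10001 ./logs ./data
```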

@erikbozic

I have the same issue. I found that the 6.7 kernel update is the cause (#858 (comment)).

Rolling back to 6.6.10 makes it work again.

@thomasvm

thomasvm commented Jan 15, 2024

I experienced the same behavior today. First, my existing container grew in size very quickly. I tried creating other containers, but they all failed with the above message.

It took me a while to figure out that downgrading the kernel fixes the issue; downgrading to 6.6.11 did the trick.

@unlogicalcode

I can also confirm; I see the same behaviour.
It works with kernel 6.6, but with 6.7 I get a message similar to the one above.

@quinnjr quinnjr changed the title from "Unable to start container on Arch Linux" to "Unable to start container on Linux 6.7" on Jan 16, 2024
@quinnjr
Author

quinnjr commented Jan 16, 2024

I downgraded my kernel and the container now functions.

Is this limited to this container, or does Docker need to update something to be compatible with the 6.7 kernel?

@huestack

I have the same problem running the container in Podman, although the Docker container runs without any problem. I simply pulled the image with sudo podman pull mcr.microsoft.com/mssql/server:2022-latest and ran it:

sudo podman run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Str0ngPass!" -p 1433:1433 --name sql-test --hostname sql-test -d  mcr.microsoft.com/mssql/server:2022-latest

Attached is a log file: sql-test.log

@LJFloor

LJFloor commented Jan 17, 2024

Can confirm on Arch Linux: the Docker images for versions 2017, 2019 and 2022, as well as the AUR package, all give the same result.

Last errno text: No such file or directory

After downgrading the kernel to version 6.6.10-arch1-1 it starts successfully.

@CodeKJ

CodeKJ commented Jan 18, 2024

I can confirm this on Nobara 39 with the 6.7.0 kernel. Exactly the same issue with MSSQL 2017, 2019 and 2022.
6.6.9 works fine.

@erikbozic

It seems this was solved in the AUR package mssql-server: https://aur.archlinux.org/packages/mssql-server#comment-953063.
However, I'm still having trouble building the required dependency to verify it.

@kshpytsya

For what it is worth:

I'm running Gentoo with a custom 6.7.x kernel.
It looks like it fails when trying to access the cgroup v1 file "/sys/fs/cgroup/memory/memory.limit_in_bytes".
I suspect that switching cgroups to "hybrid" mode would fix the issue, but I am not up to rebooting my machine right now.

To reproduce, start a shell in the container and keep it busy with sleep:

$ docker run -it --rm -e ACCEPT_EULA=Y -e MSSQL_PID=Developer mcr.microsoft.com/mssql/server:2022-latest -- /bin/bash
sleep 1000

In another terminal, run:

ps fax|less
# find pid of bash which is parent of sleep
sudo strace -o mssql.strace -f -s1000 -p <bash-in-mssql-docker>

Return to the first terminal, press Ctrl-C to stop the sleep, then run /opt/mssql/bin/sqlservr and wait for it to crash. Go back to the second terminal and interrupt strace.

$ grep -P '"/(proc|sys).*ENOENT' mssql.strace
9999 openat(AT_FDCWD, "/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = -1 ENOENT (No such file or directory)
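To compare runs on different kernels, a small helper can reduce an strace log like this one to the list of failed /proc and /sys opens. This is a sketch under assumptions: `failed_sys_opens` is a hypothetical name, it relies on `sed -E`, and it only matches `openat()` records that strace printed on a single line (calls split into `<unfinished ...>` / `resumed` pairs are skipped).

```shell
# Hypothetical helper: print "<path> <errno>" for every failed openat() on /proc or /sys
# in an strace log, deduplicated. Only single-line openat() records are matched.
failed_sys_opens() {
  sed -nE 's/.*openat\([^"]*"(\/(proc|sys)[^"]*)".*= -1 ([A-Z]+).*/\1 \3/p' "$1" | sort -u
}

# Example (not run here):
# failed_sys_opens mssql.strace
```

In bash, `diff <(failed_sys_opens good.strace) <(failed_sys_opens bad.strace)` (hypothetical file names for a 6.6 and a 6.7 run) shows which failures are new on the newer kernel; as the next comment points out, an ENOENT that also shows up on a working kernel is unlikely to be the cause.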

@ibauersachs

I don't think the ENOENT is the issue, especially not /sys/fs/cgroup/memory/memory.limit_in_bytes, since that file doesn't exist on kernel 6.6.13 either, and mssql runs fine there.
My crash logs on 6.7.1 showed Invalid argument / 22 / EINVAL:

This program has encountered a fatal error and cannot continue running at Mon Jan 22 18:09:17 2024
The following diagnostic information is available:

         Reason: 0x00000001
         Signal: SIGABRT - Aborted (6)
          Stack:
                 IP               Function
                 ---------------- --------------------------------------
                 0000613cdff2ace1 std::__1::bad_function_call::~bad_function_call()+0x96661
                 0000613cdff2a6a6 std::__1::bad_function_call::~bad_function_call()+0x96026
                 0000613cdff29c2f std::__1::bad_function_call::~bad_function_call()+0x955af
                 0000753f7ee4d520 __sigaction+0x50
                 0000753f7eea19fc pthread_kill+0x12c
                 0000753f7ee4d476 raise+0x16
                 0000753f7ee337f3 abort+0xd3
                 0000613cdfefbd96 std::__1::bad_function_call::~bad_function_call()+0x67716
        Process: 10 - sqlservr
         Thread: 161 (application thread 0x278)
    Instance Id: ba778b4b-ea20-4f3c-98fa-2002d4c8e68c
       Crash Id: 3674de73-5de7-494e-8530-2520421dd97f
    Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
   Distribution: Ubuntu 22.04.3 LTS
     Processors: 16
   Total Memory: 29180137472 bytes
      Timestamp: Mon Jan 22 18:09:17 2024
     Last errno: 22
Last errno text: Invalid argument

@CryptoSiD

The problem is still there with kernel 6.7.2

@Green0wl

Same problem on 6.7.1-arch1-1.

@GieltjE

GieltjE commented Jan 28, 2024

As an unpleasant side effect, the lsof process it spawns starts consuming an entire CPU core.

@fbrosseau

fbrosseau commented Jan 30, 2024

Hello,

The issue has been identified and should be fixed in the next SQL Server 2022 CU, but we cannot commit to a specific CU release or timeline as sometimes plans can unexpectedly change. No further investigation or data points should be needed for this, but thank you all for reporting and for looking for potential causes.

It is unrelated to cgroups, and at first glance it might be a kernel bug (but do not quote me on this) - it appears that as of 6.7, mmap without MAP_FIXED may sometimes ignore the address hint even if the hinted region is in fact available. I have not investigated the kernel side of things further, but I think it might be related to this series of changes and/or its preceding/following changes.

Knowing this, I cannot think of any workaround other than sticking to 6.6 in the meantime.
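Since staying on 6.6 is the only workaround until a fixed CU ships, a setup script could check the running kernel before starting an unpatched image. A minimal sketch: `kernel_at_least` is a hypothetical helper, and it only compares the major and minor numbers of a `uname -r`-style release string such as `6.7.1-arch1-1`.

```shell
# Hypothetical helper: succeed if the kernel release (default: the running kernel)
# is at least MAJOR.MINOR. Suffixes such as "-arch1-1" or ".fc39.x86_64" are ignored.
kernel_at_least() {
  local want_major="$1" want_minor="$2" release="${3:-$(uname -r)}"
  local major minor
  major="${release%%.*}"        # "6.7.1-arch1-1" -> "6"
  minor="${release#*.}"         # -> "7.1-arch1-1"
  minor="${minor%%[!0-9]*}"     # -> "7"
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Example (not run here):
# if kernel_at_least 6 7; then
#   echo "kernel $(uname -r) is 6.7 or newer: use an image with the fix or boot a 6.6 kernel" >&2
# fi
```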

@vermarine

(Replying to @fbrosseau's comment above.)

Thank you very much for the patch. Are there plans to also backport it to 2019?

@jaddie

jaddie commented Feb 14, 2024

I just wanted to say I'm so glad you have all written here. I didn't even think about the fact that I had just upgraded my Arch system, and I was about to start tearing things apart; this has saved me a heck of a lot of time. While I'm here to say thank you, I can also confirm this is still happening on Arch Linux with kernel 6.7.4.

@massouji82

Hi! We are running an MSSQL-based project on a Mac and use the image mcr.microsoft.com/mssql/server:2019-latest through Podman. Podman will not start a container with this image since the kernel was updated. How can we revert the kernel version of the host, or is there another workaround? Any help would be highly appreciated. Thanks!

zzzeek added a commit to sqlalchemyorg/ci_containers that referenced this issue Feb 15, 2024
@johnvanham

johnvanham commented Feb 17, 2024

Same issue with Fedora 39 on 6.7.2 and 6.7.3, but fine on 6.6.x and 6.5.x (in case anyone is searching for this issue and using Fedora). Looking forward to the CU @fbrosseau

@zzzeek

zzzeek commented Feb 17, 2024

I think MSFT should strongly consider backporting this at least to SQL Server 2019 if not even 2017 as well. As people continue to upgrade their kernels this is going to be happening on an ever larger scale to existing SQL Server linux / container installations.

@kshpytsya

kshpytsya commented Feb 19, 2024

(Replying to @vermarine's comment above.)

Am I missing something? I do not see any updated Docker images for mcr.microsoft.com/mssql/server:2022-latest that would make it run on 6.7.*.

@markbeazley

(Replying to @vermarine's and @kshpytsya's comments above.)

It should be included in the next CU, no date estimate

(Referring to @fbrosseau's comment above, which says the fix should ship in the next SQL Server 2022 CU without a committed timeline.)

I've been keeping an eye on this page for what will presumably be the CU12 release.

@d4r1us-drk

d4r1us-drk commented Mar 10, 2024

On Fedora, I followed these instructions, using koji to downgrade the kernel to 6.6.14:
https://discussion.fedoraproject.org/t/downgrading-to-a-previous-kernel-version/72820

Specifically, this one command (run as root; make sure koji is installed first with sudo dnf install koji):

cd $(mktemp -d) && koji download-build --arch=x86_64 --arch=noarch kernel-6.6.14-200.fc39 && dnf upgrade *

@w-ko

w-ko commented Mar 14, 2024

The image has been updated:

Tag          Architecture  Dockerfile     OS            Created     Last updated
2022-latest  amd64         No Dockerfile  Ubuntu 22.04  05/31/2022  03/14/2024
latest       amd64         No Dockerfile  Ubuntu 22.04  09/21/2018  03/14/2024

It works for me again on Fedora with kernel 6.7.9-200.fc39.x86_64.

@fbrosseau

fbrosseau commented Mar 15, 2024

Hello,

Yes, sql22cu12 shipped today and it has the fix. The reason fixes are usually blurry on delivery dates and/or CU numbers is that schedules can shift (such as for making room for an urgent security fix, etc). SQL Server Linux follows the release cadence of SQL Server as a whole, and this schedule is usually quite rigid (minus security fixes).

Sql19 should also be fixed for this in its next CU - their release schedules typically alternate one each month. However, sql17 will not be fixed, as sql17 is out of mainstream support and only receives security fixes. No bug fixes qualify for sql17. Customers who must remain on sql17 should keep kernel 6.6 or lower, although as usual we strongly recommend upgrading to a supported version of SQL Server for Linux, for many reasons including continued bugfixing.
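The fix status above can be encoded as a small lookup, for example in a CI script that decides whether an image is safe to run on a 6.7+ host. This is a sketch: `has_kernel_6_7_fix` is a hypothetical name, the 2022 CU12 level comes from this comment, and the 2019 CU26 level is taken from later user reports in this thread rather than from an official statement.

```shell
# Hypothetical helper: succeed if SQL Server RELEASE at cumulative update CU
# includes the Linux 6.7 fix. 2022: CU12 (maintainer); 2019: CU26 (user reports).
# SQL Server 2017 will not receive the fix.
has_kernel_6_7_fix() {
  local release="$1" cu="$2" fixed
  case "$release" in
    2022) fixed=12 ;;
    2019) fixed=26 ;;
    *) return 1 ;;   # 2017 and anything unknown: treat as not fixed
  esac
  [ "$cu" -ge "$fixed" ]
}

# Example (not run here):
# has_kernel_6_7_fix 2019 25 || echo "stay on kernel 6.6 or move to a fixed CU" >&2
```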

@lateparty

The merge is great news; I was tracking it overnight. Is there a rough ETA from merge to a release that can be pulled down through Bitwarden.sh? I gave it a shot about an hour ago, with no luck yet.

@Drezir

Drezir commented Mar 18, 2024

Using Fedora 39, only tag 2022-CU12-ubuntu-22.04 works for me, not latest or 2022-latest.

@felixSabatie

2022-latest works fine for me on Fedora 39 with kernel 6.7.9-200.fc39.x86_64.

@Drezir

Drezir commented Mar 18, 2024

(Replying to @felixSabatie's comment above.)

Agreed; I had to delete the locally cached version first :)
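A tag like `2022-latest` is only re-resolved when the image is pulled again, so a locally cached copy can keep running the old build, which is why deleting it helped. A sketch of a helper that re-pulls a tag and reports whether the local image ID changed; `refresh_image` and the `ENGINE` variable are hypothetical, and it assumes `docker image inspect --format` and `docker pull` (Podman should accept the same subcommands via `ENGINE=podman`).

```shell
# Hypothetical helper: re-pull an image tag and report whether the local copy changed.
# ENGINE defaults to docker; set ENGINE=podman to use Podman instead.
refresh_image() {
  local engine="${ENGINE:-docker}" image="$1" before after
  # Image ID of the cached copy (empty if the tag is not present locally yet)
  before=$("$engine" image inspect --format '{{.Id}}' "$image" 2>/dev/null || true)
  # Pulling re-resolves the tag against the registry
  "$engine" pull "$image" >/dev/null || return 1
  after=$("$engine" image inspect --format '{{.Id}}' "$image") || return 1
  if [ "$before" = "$after" ]; then
    echo "unchanged: $image"
  else
    echo "updated: $image"
  fi
}

# Example (not run here):
# refresh_image mcr.microsoft.com/mssql/server:2022-latest
```

After it reports `updated`, the container still has to be recreated (for example with `docker compose up -d --force-recreate`) before the new image is actually used.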

@andros0689

(Replying to @felixSabatie's comment above.)

Same here. Thanks a lot.

@thomasvm

2022-latest works perfectly on 6.8.1-arch1-1

@sergiuser1

Not exactly related to this issue, but I upgraded from 2019-latest to 2022-latest and now get the following error when I connect from a .NET Core application:

 Login failed for user 'sa'. Reason: An error occurred while evaluating the password.

The password specified in the MSSQL_SA_PASSWORD environment variable is the same as the one in the app, and the app logs show:

Microsoft.Data.SqlClient.SqlException (0x80131904): Login failed for user 'sa'.

Has anyone had the same issue?

@andros0689

(Replying to @sergiuser1's comment above.)

Hi @sergiuser1

No, I don't have that error, but I'm still using SA_PASSWORD, not MSSQL_SA_PASSWORD.

@sergiuser1

(Replying to my own comment above.)

Turns out it was a race condition, and MSSQL was simply not completely up yet. I've added a healthcheck to it, and waiting for service_healthy before starting the .NET app fixes it.
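Where a Compose healthcheck isn't available, a startup script can wait on the same kind of readiness probe with a simple retry loop. A minimal sketch: `wait_for` is a hypothetical helper, and in the usage comment the `sqlcmd` path, the `-C` flag and the host name `db` are assumptions that depend on your image version and setup (older images may ship `/opt/mssql-tools/bin/sqlcmd` instead).

```shell
# Hypothetical helper: run COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries. Usage: wait_for ATTEMPTS DELAY COMMAND [ARGS...]
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up after $attempts attempts: $*" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example (not run here; path, -C flag and host name are assumptions):
# wait_for 30 2 /opt/mssql-tools18/bin/sqlcmd -S db -U sa -P "$MSSQL_SA_PASSWORD" -C -Q "SELECT 1"
```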

@sergiuser1

(Replying to @andros0689's comment above.)

That's deprecated according to Microsoft: https://learn.microsoft.com/en-us/sql/linux/quickstart-install-connect-docker?view=sql-server-2017&tabs=cli&pivots=cs1-bash#run-the-container

@rubnogueira

rubnogueira commented Mar 27, 2024

Is it possible to backport the 2022-latest fix to azure-sql-edge v1 as well? I'm using that version because I need arm64 support, which is deprecated in v2, and now I can't use v1 anymore.

@toxik

toxik commented Mar 27, 2024

an azure-sql-edge v1.0.8 arm64 with the patch would be very much appreciated!

@JimMoen

JimMoen commented Apr 7, 2024

(Replying to @w-ko's comment above.)

Yes, the fix is included in 2022-CU12.
Changelog:
https://learn.microsoft.com/en-us/troubleshoot/sql/releases/sqlserver-2022/cumulativeupdate12#2958874

Unfortunately, SQL Server 2019 hasn't released a new cumulative update with the fix yet.

@leonardohtorres

leonardohtorres commented Apr 9, 2024

In my case, I had to use the MSSQL 2022 image instead of 2019, and then everything worked (beforehand, of course, I took the latest backup of the databases and restored it to MSSQL 2022).
MSSQL is running on OpenShift/OKD 4.15.

@Marcosdg3

Are there any plans for a SQL Server 2019 release?

@Marcosdg3

Marcosdg3 commented Apr 26, 2024

(Replying to my own question above.)

Looks like this has been fixed in the 2019-CU26-ubuntu-20.04 release, which is now 2019-latest.

The table at https://hub.docker.com/_/microsoft-mssql-server/ hasn't been updated yet, but I checked the tag list (https://mcr.microsoft.com/v2/mssql/server/tags/list) and saw the new release. All is working great now on macOS, thank you!
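The tag list endpoint can also be queried from a script to find the newest CU tag for a release. A sketch assuming the standard registry v2 response shape (`{"name": ..., "tags": [...]}`) and GNU `sort -V`; `list_cu_tags` is a hypothetical name, it avoids jq by matching quoted tag strings directly, and it does not handle paginated responses.

```shell
# Hypothetical helper: read a registry v2 tag list (JSON) on stdin and print the
# CU tags for one SQL Server release (e.g. 2019), oldest CU first.
list_cu_tags() {
  grep -oE "\"$1-CU[0-9]+-[^\"]*\"" | tr -d '"' | sort -V -u
}

# Example (not run here): newest 2019 CU tag
# curl -s https://mcr.microsoft.com/v2/mssql/server/tags/list | list_cu_tags 2019 | tail -n 1
```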

@hockdudu

I can confirm that the latest version of 2019-latest resolved the issue, tested on Fedora with Linux 6.8.7.

@MPavleski

Also confirming that 2019-latest resolved the issue under Linux 6.8.7, tested on Arch.
