Startup error #456

Closed
csmasterpath2023 opened this issue May 10, 2024 · 28 comments

Comments

@csmasterpath2023

          This hasn't been implemented yet. Can you please open a new issue, with this message and the error you received?

Originally posted by @juliangruber in #444 (comment)

@csmasterpath2023
Author

I used the following command to create a container, but it failed.

docker run --name station --detach --env STATE_ROOT=/state --env FIL_WALLET_ADDRESS=0x000000000000000000000000000000000000dEaD ghcr.io/filecoin-station/core:20.4.1 -v /tmp/state1:/state

@csmasterpath2023
Author

csmasterpath2023 commented May 10, 2024

I want to persist my station ID, but the container failed to start with the above command:

cs144@cs144:~$ sudo docker run --name station --detach --env STATE_ROOT=/state --env FIL_WALLET_ADDRESS=0x000000000000000000000000000000000000dEaD ghcr.io/filecoin-station/core:20.4.1 -v /tmp/state1:/state 
[sudo] password for cs144:
db717d6522c5c02665360a842b4050b0dd30f129f684c0a87ba6c77a097177f8 
cs144@cs144:~$ sudo docker logs station
 v20.12.2 

@juliangruber
Member

Are you saying it failed because it doesn't produce any logs beside "v20.12.2"?

@csmasterpath2023
Author

Yes. Under normal circumstances, it should output Spark's work log, but the container created with the above command terminated right after creation:

cs144@cs144:~$ sudo docker exec -it station /bin/bash
 Error response from daemon: Container db717d6522c5c02665360a842b4050b0dd30f129f684c0a87ba6c77a097177f8 is not running

@juliangruber
Member

I can reproduce that it exits after the version message when you pass a state root this way

@juliangruber
Member

actually I don't know why it logs v20.12.2, that's not the Core version

@juliangruber
Member

@bajtos you initially looked into mounting state_root, do you have an idea here?

@csmasterpath2023
Author

I can reproduce that it exits after the version message when you pass a state root this way

I'm also curious where v20.12.2 comes from

@zipiju
Contributor

zipiju commented May 13, 2024

This v20.12.2 looks to be the Node.js version.

I did some poking around as well, since I would like to persist the state between upgrades (deleting and then reinstalling the container). The issue appears to be that, at least on this platform, the container cannot write to the mounted directory: the modules cannot create directories or files on the host because doing so results in a permissions error.
I do not fully understand how this works between the container and the host, or whether this issue is specific to this platform.
I even tried chmod 777 /usr/src/app/.local/state in the container, but Core is still unable to create subdirectories there, and there is no option to change folder permissions on the host:

Error: EACCES: permission denied, mkdir '/usr/src/app/.local/state/secrets'

My solution for now is to install the container, shell into it, cat station_id, create station_id on the host, mount the file, and restart the container. Since the file will be read-only, it will run without any issues.
This will persist across updates unless the expected directory/file structure changes.
One question though: is it enough to persist station_id, or should we also persist, for example, Spark's state, whose contents look to be identical on all the nodes?

@bajtos
Member

bajtos commented May 14, 2024

I'm also curious where v20.12.2 comes from

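AFAICT, this is caused by -v /tmp/state1:/state being added after the Docker image name. Docker passes everything after the image name to the container as its command, so -v presumably reaches Node.js, which prints its version (v20.12.2) and exits. When I add this argument before the image name, Station Core starts.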
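
To make the ordering rule concrete, here is a minimal sketch that lists the words following the image name in a `docker run` command line. Those are the words Docker hands to the container instead of parsing them itself. The helper name `args_after_image` is hypothetical and does not invoke Docker; it only inspects the argument list.

```shell
#!/bin/sh
# Print every word that follows the image name in a `docker run` command line.
# Docker does not treat these words as its own options; they become the
# command/arguments of the container.
args_after_image() {
  image=$1
  shift
  seen=0
  for word in "$@"; do
    if [ "$seen" -eq 1 ]; then
      printf '%s\n' "$word"
    elif [ "$word" = "$image" ]; then
      seen=1
    fi
  done
}

# The failing command from this issue: -v and its value come after the image.
args_after_image ghcr.io/filecoin-station/core:20.4.1 \
  docker run --name station --detach \
  --env STATE_ROOT=/state \
  --env FIL_WALLET_ADDRESS=0x000000000000000000000000000000000000dEaD \
  ghcr.io/filecoin-station/core:20.4.1 -v /tmp/state1:/state
```

If the helper prints anything that looks like a Docker option (`-v`, `--env`, and so on), that option most likely needs to move before the image name.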

On fixing the permissions: Maybe we must explicitly tell Docker to mount the volume as read-write?

The following command works for me on macOS and writes state files to /tmp/state on the host computer:

docker run --name station --detach \
  --env STATE_ROOT=/state \
  --env FIL_WALLET_ADDRESS=0x000000000000000000000000000000000000dEaD \
  -v /tmp/state:/state:rw \
  ghcr.io/filecoin-station/core

Can you please check whether the command above works on your machine too?

One question though: is it enough to persist station_id, or should we also persist, for example, Spark's state, whose contents look to be identical on all the nodes?

Persisting station_id is crucial.

Persisting other files is not strictly required now. However, we may add more state files in the future that need to be persisted, like the recently introduced station_id file.

I highly recommend persisting the entire state directory.

It's also important to NOT share the state between Station instances, we expect each Station to have its own exclusive state storage.
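
One way to keep the state of each Station separate is to derive the host directory from the container name. The sketch below assumes a hypothetical base directory and container names (`station-1`, `station-2`); neither comes from Station Core itself.

```shell
#!/bin/sh
# Create (if needed) and print a state directory dedicated to one Station
# instance. Both arguments are illustrative: pick any persistent base
# directory and any naming scheme that is unique per instance.
station_state_dir() {
  base=$1
  name=$2
  dir="$base/$name"
  mkdir -p "$dir" || return 1
  printf '%s\n' "$dir"
}

# Example: one directory per container name, mounted at /state.
#   docker run --name station-1 --detach \
#     --env STATE_ROOT=/state \
#     --env FIL_WALLET_ADDRESS=<your wallet address> \
#     -v "$(station_state_dir "$HOME/station-state" station-1)":/state:rw \
#     ghcr.io/filecoin-station/core
```

Because the directory name follows the container name, two containers only end up sharing state if they are given the same name, which Docker already refuses.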

@csmasterpath2023
Author

I tried both as a sudo user and as the root user.
I created a shell script named crest.sh:

docker run --name station --detach \
  --env STATE_ROOT=/state \
  --env FIL_WALLET_ADDRESS=0x000000000000000000000000000000000000dEaD \
  -v /tmp/state:/state:rw \
  ghcr.io/filecoin-station/core:20.4.1

As a sudo user:

cs144@cs144:~$ sudo docker rm station
station
cs144@cs144:~$ sudo ./crest.sh
7d41269e8983907c4714af18addfaa7a270fc762a4ce5e20aef906c22d6c229c
cs144@cs144:~$ sudo docker logs station
Usage: station.js [options]
Options:
  -j, --json          Output JSON                                      [boolean]
      --experimental  Also run experimental modules                    [boolean]
  -v, --version       Show version number                              [boolean]
  -h, --help          Show help                                        [boolean]

[Error: EACCES: permission denied, mkdir '/state/secrets'] {
  errno: -13,
  code: 'EACCES',
  syscall: 'mkdir',
  path: '/state/secrets'
}

@csmasterpath2023
Author

As the root user:

root@cs144:/home/cs144# docker rm station
station
root@cs144:/home/cs144# ./crest.sh 
4badd17bc1758849e21676701a2c2fe664a053cec2cb5ad0890481c3a54e361f 
root@cs144:/home/cs144# docker logs station
Usage: station.js [options]

Options:
  -j, --json          Output JSON                                      [boolean]
      --experimental  Also run experimental modules                    [boolean]
  -v, --version       Show version number                              [boolean]
  -h, --help          Show help                                        [boolean]

[Error: EACCES: permission denied, mkdir '/state/secrets'] {
  errno: -13,
  code: 'EACCES',
  syscall: 'mkdir',
  path: '/state/secrets'
}

@csmasterpath2023
Author

Both attempts above failed to create a new station.

@bajtos
Member

bajtos commented May 14, 2024

Thank you, @csmasterpath2023, for testing. Maybe this issue is specific to Linux?

The following StackOverflow answer explains the problem with users & groups and permissions inside the Docker container:

https://stackoverflow.com/a/29251160/69868

It looks too complicated to me; I would hope the situation has improved since 2015, when the answer was posted. But maybe it's a place to start?

@csmasterpath2023
Author

Yes, I use Ubuntu 23.10 (GNU/Linux 6.5.0-28-generic x86_64).

@csmasterpath2023
Author

@bajtos You are welcome. I haven't looked at the link you provided yet, but I think this sets the bar too high for the vast majority of users, and we may lose a lot of them.

@bajtos
Member

bajtos commented May 14, 2024

Can you please run the following command to list directories & permissions and post the output?

❯ docker run --name station  \
  --env STATE_ROOT=/state \
  --env FIL_WALLET_ADDRESS=0x000000000000000000000000000000000000dEaD \
  -v /tmp/state:/state:rw \
  ghcr.io/filecoin-station/core /bin/sh -c "ls -l; ls -l /state"

This is the output I get on my macOS:

total 672
-rw-r--r--   1 root root    213 May 13 15:20 Dockerfile
-rw-r--r--   1 root root  12581 May 13 15:20 LICENSE.md
-rw-r--r--   1 root root   6085 May 13 15:20 README.md
drwxr-xr-x   2 root root   4096 May 13 15:20 bin
drwxr-xr-x   2 root root   4096 May 13 15:20 commands
drwxr-xr-x   2 root root   4096 May 13 15:20 lib
drwxr-xr-x   3 node node   4096 May 13 15:21 modules
drwxr-xr-x 178 node node   4096 May 13 15:21 node_modules
-rw-r--r--   1 root root 620658 May 13 15:20 package-lock.json
-rw-r--r--   1 root root   1843 May 13 15:20 package.json
drwxr-xr-x   2 root root   4096 May 13 15:20 scripts
drwxr-xr-x   2 root root   4096 May 13 15:20 test
-rw-r--r--   1 root root    562 May 13 15:20 tsconfig.json
total 0
drwxr-xr-x 4 node node 128 May 14 13:15 modules
drwxr-xr-x 3 node node  96 May 14 13:15 secrets

@csmasterpath2023
Author

cs144@cs144:~$ sudo docker run --name station --env STATE_ROOT=/state --env FIL_WALLET_ADDRESS=0x000000000000000000000000000000000000dEaD -v /tmp/state:/state:rw ghcr.io/filecoin-station/core:20.4.1 /bin/sh -c "ls -l; ls -l /state"
total 664
-rw-r--r--   1 root root    213 May  3 11:33 Dockerfile
-rw-r--r--   1 root root  12581 May  3 11:33 LICENSE.md
-rw-r--r--   1 root root   6085 May  3 11:33 README.md
drwxr-xr-x   2 root root   4096 May  3 11:33 bin
drwxr-xr-x   2 root root   4096 May  3 11:33 commands
drwxr-xr-x   2 root root   4096 May  3 11:33 lib
drwxr-xr-x   3 node node   4096 May  3 11:33 modules
drwxr-xr-x 173 node node   4096 May  3 11:33 node_modules
-rw-r--r--   1 root root 612898 May  3 11:33 package-lock.json
-rw-r--r--   1 root root   1843 May  3 11:33 package.json
drwxr-xr-x   2 root root   4096 May  3 11:33 scripts
drwxr-xr-x   2 root root   4096 May  3 11:33 test
-rw-r--r--   1 root root    562 May  3 11:33 tsconfig.json
total 0

@bajtos
Member

bajtos commented May 14, 2024

Ah, of course, Station Core was not able to create any state files/directories, therefore the second ls printed just "total 0".

Could you please run the following two commands?

Check permissions inside the container:

docker run --name station  \
  --env STATE_ROOT=/state \
  --env FIL_WALLET_ADDRESS=0x000000000000000000000000000000000000dEaD \
  -v /tmp/state:/state:rw \
  ghcr.io/filecoin-station/core /bin/sh -c "ls -ld /state"

Check the permissions on your host computer:

ls -ld /tmp/state
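
Since user and group names can differ between the host and the container, it may be easier to compare numeric IDs. The following is a small sketch, assuming GNU `stat` (as shipped on Ubuntu); the helper name `describe_dir` is hypothetical.

```shell
#!/bin/sh
# Print "<uid>:<gid> <octal mode> <path>" for a directory.
# Assumes GNU coreutils `stat`; the BSD/macOS `stat` uses different flags.
describe_dir() {
  stat -c '%u:%g %a %n' "$1"
}

# Example (on the host):
#   describe_dir /tmp/state
# The same command can be run inside the container against /state, and the
# numeric owner compared with the output of `id` inside the container.
```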

@bajtos
Member

bajtos commented May 14, 2024

This is really weird. Here is what I see on my computer:

In the container:

drwxr-xr-x 4 root root 128 May 14 14:10 /state

On the host:

drwxr-xr-x  4 bajtos  wheel  128 14 May 16:10 /tmp/state

In the container, the root-owned state directory contains files created by the node user.

drwxr-xr-x 4 root root 128 May 14 14:10 /state
total 0
drwxr-xr-x 4 node node 128 May 14 14:10 modules
drwxr-xr-x 3 node node  96 May 14 14:10 secrets

@csmasterpath2023
Author

csmasterpath2023 commented May 14, 2024

In the container

cs144@cs144:~$ sudo docker run --name station    --env STATE_ROOT=/state   --env FIL_WALLET_ADDRESS=0x000000000000000000000000000000000000dEaD   -v /tmp/state:/state:rw   ghcr.io/filecoin-station/core:20.4.1 /bin/sh -c "ls -ld /state"
drwxr-xr-x 2 root root 4096 May 14 14:01 /state

@csmasterpath2023
Author

csmasterpath2023 commented May 14, 2024

On the host

cs144@cs144:~$ ls -ld /tmp/state
drwxr-xr-x 2 root root 4096 May 14 14:01 /tmp/state

@bajtos
Member

bajtos commented May 14, 2024

On the host

cs144@cs144:~$ ls -ld /tmp/state
drwxr-xr-x 2 root root 4096 May 14 14:01 /tmp/state

OK, I think I know what the problem may be here.

  • The directory /tmp/state is owned by root.
  • What is the user under which the Docker daemon runs? Is it also root?

I think you should be able to find that user by running ps aux | grep dockerd or ps aux | grep docker.

According to https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-20-04, the Docker service runs under a user in the docker group, so maybe all you need is to change the group from root to docker and allow group members to write to the directory.

chgrp docker /tmp/state
chmod g+w /tmp/state

Alternatively, if the Docker service is running under the docker user, then you can change the owning user instead of the owning group.

chown docker /tmp/state
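
The reasoning behind these commands can be sketched as a plain permission-bit check: a directory with mode 755 owned by root:root is writable only by root. The helper below is a simplified illustration, not how the kernel or Docker decides access. It ignores supplementary groups, ACLs, and user-namespace remapping, and the numeric IDs in the example (1000 for a non-root process) are assumptions.

```shell
#!/bin/sh
# Decide whether a process with the given uid/gid may write to a directory,
# using only the classic owner/group/other permission bits.
# Usage: can_write MODE OWNER_UID OWNER_GID UID GID   (MODE has 3+ octal digits)
can_write() {
  mode=$1
  owner_uid=$2
  owner_gid=$3
  uid=$4
  gid=$5
  # root bypasses the permission bits
  [ "$uid" -eq 0 ] && return 0
  # Keep the last three octal digits: owner, group, other.
  perms=${mode#"${mode%???}"}
  if [ "$uid" -eq "$owner_uid" ]; then
    digit=${perms%??}
  elif [ "$gid" -eq "$owner_gid" ]; then
    rest=${perms#?}
    digit=${rest%?}
  else
    digit=${perms#??}
  fi
  # The write permission is bit value 2.
  [ $(( digit & 2 )) -ne 0 ]
}

# drwxr-xr-x root root, accessed by a process running as 1000:1000:
if can_write 755 0 0 1000 1000; then
  echo "writable"
else
  echo "not writable"   # this branch is taken
fi
```

Changing the owner, or changing the group and adding g+w, moves the process into a permission class that has the write bit set.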

@csmasterpath2023
Author

@bajtos thank you very much, it works now

@bajtos
Member

bajtos commented May 15, 2024

@csmasterpath2023 could you please describe what commands you executed? Other operators running on Linux may find that useful.

@csmasterpath2023
Author

As you said, if the Docker service is running under the docker user, then you can change the owning user instead of the owning group.

sudo chown docker /tmp/state

@csmasterpath2023
Author

csmasterpath2023 commented May 16, 2024

@bajtos But if I use /tmp/state and then reboot the machine, the directory is deleted automatically.

So I changed the path to ~/tmp/state, and the problem was resolved.
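
A small guard can catch this before the container is started. The sketch below only flags paths under /tmp, which matches the behaviour described above. Whether /tmp is actually cleared on reboot depends on the system configuration, and the helper name `is_volatile_path` is hypothetical.

```shell
#!/bin/sh
# Return success if the path lives under /tmp, where files may be removed
# on reboot (as observed in this thread on Ubuntu).
is_volatile_path() {
  case "$1" in
    /tmp|/tmp/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: warn before using a state directory that may not survive a reboot.
state_dir=/tmp/state
if is_volatile_path "$state_dir"; then
  echo "warning: $state_dir may be deleted on reboot; choose a path outside /tmp"
fi
```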

@bigjdunham

In my case, I had to change the owner and group of the local folder to UID/GID 1000:1000, since that is what the "node" user uses inside the container. After that, the files persisted.
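
The step described above could be scripted roughly as follows. This is a sketch under stated assumptions: GNU `stat` is available, the default owner 1000:1000 is taken from this comment rather than verified against the image, and the helper name `prepare_state_dir` and the example path are hypothetical. Changing ownership generally requires root.

```shell
#!/bin/sh
# Create the host state directory and make sure it is owned by the given
# uid:gid (default 1000:1000, as reported for the "node" user in this thread).
# Usage: prepare_state_dir DIR [UID:GID]
prepare_state_dir() {
  dir=$1
  owner=${2:-1000:1000}
  mkdir -p "$dir" || return 1
  # Assumes GNU coreutils `stat`.
  current=$(stat -c '%u:%g' "$dir") || return 1
  if [ "$current" != "$owner" ]; then
    # Needs root (for example via sudo) unless you already own the target IDs.
    chown "$owner" "$dir" || return 1
  fi
}

# Example (run with sufficient privileges):
#   prepare_state_dir "$HOME/station-state"
#   docker run ... -v "$HOME/station-state":/state:rw ghcr.io/filecoin-station/core
```

To confirm which IDs the container process actually uses, running `id` inside the container should show them.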


5 participants