What is the housekeeper ? #82

Closed
benjha opened this issue Nov 5, 2021 · 6 comments

benjha commented Nov 5, 2021

Hi AIStore team,

I am trying to run AIStore on an HPC system, using Go 1.17.3. After executing make deploy, a message appears saying the housekeeper is not running, and then the deployment fails.

What is the housekeeper ?

make deploy
Enter number of storage targets:
5
Enter number of proxies (gateways):
1
Number of local mountpaths (enter 0 for preconfigured filesystems):
2
Select backend providers:
Amazon S3: (y/n) ?
n
Google Cloud Storage: (y/n) ?
n
Azure: (y/n) ?
n
HDFS: (y/n) ?
n
Would you like to create loopback mount points: (y/n) ?
n
Building aisnode: version=1bea20d85 providers= tags= mono
done.
+ /sw/summit/ums/gen119/aistore/src/bin/aisnode -config=/ccs/home/benjha/.ais0/ais.json -local_config=/ccs/home/benjha/.ais0/ais_local.json -role=proxy -ntargets=5
housekeeper not running, cannot reg ".dflt.mm.gc"housekeeper not running, cannot reg ".dflt.mm.small.gc"+ /sw/summit/ums/gen119/aistore/src/bin/aisnode -config=/ccs/home/benjha/.ais1/ais.json -local_config=/ccs/home/benjha/.ais1/ais_local.json -role=target
+ /sw/summit/ums/gen119/aistore/src/bin/aisnode -config=/ccs/home/benjha/.ais2/ais.json -local_config=/ccs/home/benjha/.ais2/ais_local.json -role=target
+ /sw/summit/ums/gen119/aistore/src/bin/aisnode -config=/ccs/home/benjha/.ais3/ais.json -local_config=/ccs/home/benjha/.ais3/ais_local.json -role=target
+ /sw/summit/ums/gen119/aistore/src/bin/aisnode -config=/ccs/home/benjha/.ais4/ais.json -local_config=/ccs/home/benjha/.ais4/ais_local.json -role=target
+ /sw/summit/ums/gen119/aistore/src/bin/aisnode -config=/ccs/home/benjha/.ais5/ais.json -local_config=/ccs/home/benjha/.ais5/ais_local.json -role=target
E 14:55:57.012409 err.go:118 FATAL ERROR: operation not supported
FATAL ERROR: operation not supported
E 14:55:57.012480 err.go:118 FATAL ERROR: operation not supported
FATAL ERROR: operation not supported
E 14:55:57.012924 err.go:118 FATAL ERROR: operation not supported
FATAL ERROR: operation not supported
E 14:55:57.013381 err.go:118 FATAL ERROR: operation not supported
FATAL ERROR: operation not supported
E 14:55:57.013471 err.go:118 FATAL ERROR: operation not supported
FATAL ERROR: operation not supported
Done.

Thanks

VirrageS self-assigned this Nov 6, 2021
VirrageS added the bug label Nov 6, 2021
Collaborator

VirrageS commented Nov 6, 2021

You can ignore the "housekeeper not running" message. It is not fatal, but it is not correct behavior either; I'm working on a fix.

The real problem is "FATAL ERROR: operation not supported". It looks like the filesystem does not support some operation. Right now it is unclear where this error originates and why it happens in the first place. I'm working on a fix so that the error reports the correct file and line. In the meantime, do you know what the underlying filesystem is in your environment? You can run lsblk -f to check.
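As a complement to lsblk -f, the filesystem type behind a specific directory can be read from df. This is a minimal sketch, assuming a df that accepts -P and -T (GNU coreutils does); fs_type_of is a hypothetical helper name, not part of AIStore.

```shell
# Print the filesystem type that backs the given path.
# -P forces one line per filesystem; -T adds the Type column (column 2).
fs_type_of() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}
```

Calling fs_type_of on the directory intended for AIStore data (for example, fs_type_of /mnt/bb/benjha) prints the type name reported by the kernel for that mount.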

Collaborator

VirrageS commented Nov 7, 2021

Hey @benjha, I've pushed some new fixes to the master branch. If you have a chance to build from it and run it, that would be awesome. Let me know what error message you get.

Author

benjha commented Nov 8, 2021

Thanks @VirrageS,

I don't see the housekeeper error anymore. I think I caused the FATAL ERROR: operation not supported myself by commenting out the line that verifies setfattr in deploy/dev/local/deploy.sh, since that command is not installed here.
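For reference, extended-attribute support on a given directory can be probed by hand. The sketch below is not the check from deploy/dev/local/deploy.sh; it is a hypothetical stand-in that assumes setfattr from the attr package is available, and the attribute name user.probe is made up for illustration.

```shell
# Probe whether the filesystem under a directory accepts user xattrs.
# Returns 0 if supported, 1 if not, 2 if the probe itself could not run.
check_xattr_support() {
    dir="$1"
    # setfattr ships with the "attr" package.
    if ! command -v setfattr >/dev/null 2>&1; then
        echo "setfattr not found; install the attr package" >&2
        return 2
    fi
    # Create a scratch file inside the directory being tested.
    probe=$(mktemp "$dir/.xattr-probe.XXXXXX") || return 2
    if setfattr -n user.probe -v 1 "$probe" 2>/dev/null; then
        result=0
    else
        result=1
    fi
    rm -f "$probe"
    return "$result"
}
```

For example, check_xattr_support /mnt/bb/benjha && echo "xattrs OK" reports success only if both the tool and the filesystem cooperate.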

This is the output of lsblk -f on the compute node.

NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
bb-cache 253:0    0 104.3G  0 lvm  /var/cache/fscache
bb-bb1   253:1    0   1.4T  0 lvm  /mnt/bb/benjha
nvme0n1  259:1    0   1.5T  0 disk 

Once I figure out the issue with the extended attributes, I'd like to launch AIStore on /mnt/bb/benjha, which is the mount point of nvme0n1 (the NVMe's partition uses XFS). I think this should be done in one of the configuration files, right? Where can I find documentation about this?

On the other hand, is aisfs an optional requirement ?

Collaborator

VirrageS commented Nov 9, 2021

> Once I figure out the issue with the extended attributes

Yeah, I think this may be connected to extended attributes, as AIStore uses them extensively and requires them to be enabled. AIStore requires the following packages to be installed: gcc, sysstat, and attr (see: https://github.com/NVIDIA/aistore/blob/master/docs/getting_started.md#prerequisites).
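A quick way to see which of these prerequisites are missing is to look for one representative command from each package. This is a rough sketch: missing_tools is a hypothetical helper, and it assumes iostat stands in for sysstat and setfattr for attr, which are the commands those packages normally provide.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# gcc -> gcc, iostat -> sysstat, setfattr -> attr
missing_tools gcc iostat setfattr
```

Empty output means all three commands were found; any name printed points at the package that still needs to be installed.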

> I'd like to launch AIStore on /mnt/bb/benjha, which is the mount point of nvme0n1 (the NVMe's partition uses XFS). I think this should be done in one of the configuration files, right? Where can I find documentation about this?

Yes, see https://aiatscale.org/docs/configuration. Basically, you probably want to modify deploy/dev/local/aisnode_config.sh (assuming you are using make deploy). The settings most likely to interest you are:

  • confdir - you can also change this with AIS_CONF_DIR env variable
  • log_dir - you can also change this with AIS_LOG_DIR env variable
  • fspaths - you can also change this with AIS_FS_PATHS env variable
  • test_fspaths - you can also change this with TEST_FSPATH_ROOT env variable
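The environment variables listed above could be set before running make deploy, roughly as sketched here. Only the variable names come from this thread; the subdirectories under /mnt/bb/benjha are hypothetical, and the value format for AIS_FS_PATHS is an assumption that should be checked against deploy/dev/local/aisnode_config.sh.

```shell
# Hypothetical locations on the NVMe-backed mount point.
export AIS_CONF_DIR=/mnt/bb/benjha/ais/conf
export AIS_LOG_DIR=/mnt/bb/benjha/ais/log

# ASSUMPTION: AIS_FS_PATHS is expanded inside a JSON object, so it may need
# to be written as quoted "path": "" pairs. Verify in aisnode_config.sh
# before uncommenting.
# export AIS_FS_PATHS='"/mnt/bb/benjha/ais/data": ""'

# Then deploy as usual:
# make deploy
```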

> On the other hand, is aisfs an optional requirement ?

Yes, it is totally optional. It is a tool that lets you mount AIStore as a directory.

Author

benjha commented Nov 15, 2021

Ok, looks like none of the kernel modules needed by AIStore are loaded in the system.

Thanks for your help.

Collaborator

Sounds good :) Closing the issue. If you have any further problems, let us know!
