make KVS garbage collection automatic at shutdown #5778

Closed
garlick opened this issue Mar 8, 2024 · 3 comments

Comments


garlick commented Mar 8, 2024

@grondo noted this during corona startup, while the content-sqlite module load was taking a long time:

● flux.service - Flux message broker
   Loaded: loaded (/usr/lib/systemd/system/flux.service; disabled; vendor prese>
  Drop-In: /etc/systemd/system/flux.service.d
           └─override.conf
   Active: active (running) since Fri 2024-03-08 08:52:15 PST; 15min ago
  Process: 31047 ExecStartPre=/usr/bin/bash -c systemctl start user@$(id -u flu>
 Main PID: 31066 (flux-broker-0)
   Status: "Running /etc/flux/rc1"
    Tasks: 11 (limit: 1647309)
   Memory: 11.8G
   CGroup: /system.slice/flux.service
           ├─31066 broker --config-path=/etc/flux/system/conf.d -Scron.director>
           ├─31071 /bin/sh -e /etc/flux/rc1
           └─31109 module load content-sqlite
garlick changed the title from "flux module load content-sqlite is slow uses a lot of memory on elcapi" to "flux module load content-sqlite is slow uses a lot of memory on corona" on Mar 8, 2024

garlick commented Mar 8, 2024

fwiw


[root@corona81:flux]# flux module stats content-sqlite
{
 "object_count": 2933117,
 "dbfile_size": 29634494464,
 "dbfile_free": 5987725590528,
 "load_time": {
  "count": 195494,
  "min": 0.00248,
  "max": 5.8961870000000003,
  "mean": 0.0083880570810355541,
  "stddev": 0.014352452539191915
 },
 "store_time": {
  "count": 11,
  "min": 0.025000000000000001,
  "max": 0.266372,
  "mean": 0.10041045454545454,
  "stddev": 0.084472641313461536
 }
}
[root@corona81:flux]# systemctl status flux
● flux.service - Flux message broker
   Loaded: loaded (/usr/lib/systemd/system/flux.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/flux.service.d
           └─override.conf
   Active: active (running) since Fri 2024-03-08 09:15:21 PST; 32min ago
  Process: 35680 ExecStartPre=/usr/bin/bash -c systemctl start user@$(id -u flux).service (code=exited, status=0/SUCCESS)
 Main PID: 35683 (flux-broker-0)
   Status: "Running as leader of 125 node Flux instance"
    Tasks: 23 (limit: 1647309)
   Memory: 14.6G
   CGroup: /system.slice/flux.service
           └─35683 broker --config-path=/etc/flux/system/conf.d -Scron.directory=/etc/flux/system/cron.d -Srundir=/run/flux -Sstatedir=/>


garlick commented Mar 8, 2024

Restarting with flux shutdown --gc reduced the sqlite db from 27G to 230M, and now everything is snappy.

It would still be good to know what sqlite is doing during initialization and see if there is a way to reduce the impact.

Also we may want to somehow make garbage collection automatic on every restart.
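
For a rough sense of when a gc restart is worthwhile, the dbfile_size reported by flux module stats content-sqlite above can be compared against a size threshold. The following is a minimal Python sketch, not how Flux decides anything: the 10 GiB threshold and the helper names dbfile_gib and gc_recommended are hypothetical, and the code only parses the JSON shown earlier without calling any Flux API.

```python
import json

# Subset of the `flux module stats content-sqlite` output quoted above.
STATS_JSON = '{"object_count": 2933117, "dbfile_size": 29634494464}'

GIB = 1024 ** 3


def dbfile_gib(stats: dict) -> float:
    """Return the sqlite database file size in GiB."""
    return stats["dbfile_size"] / GIB


def gc_recommended(stats: dict, threshold_gib: float = 10.0) -> bool:
    """Hypothetical policy: suggest a gc restart above a size threshold.

    The 10 GiB default is an illustrative assumption, not a Flux setting.
    """
    return dbfile_gib(stats) > threshold_gib


if __name__ == "__main__":
    stats = json.loads(STATS_JSON)
    print(f"content-sqlite db: {dbfile_gib(stats):.1f} GiB")
    if gc_recommended(stats):
        print("consider restarting with: flux shutdown --gc")
```

With the numbers from this issue, the 29634494464-byte file is about 27.6 GiB, which matches the 27G seen before garbage collection; the 230M database after gc would fall well under the assumed threshold.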

garlick changed the title from "flux module load content-sqlite is slow uses a lot of memory on corona" to "make KVS garbage collection automatic at shutdown" on Mar 27, 2024

garlick commented Apr 2, 2024

Fixed by #5840

garlick closed this as completed on Apr 2, 2024