Showing status of service via systemctl is slow (>10s) if disk journal is used #2460
Comments
Is this on HDD or SSD? But yeah, we scale badly if we have too many individual files to combine. It's O(n) in the number of files we get... |
This one was on HDD, but now that I've looked into it, it can be a bit slow (~0.5 s; HDD swap, 300 MB of logs on tmpfs) even with tmpfs, if the part that was loaded happened to be swapped out (I was testing on a system with 2 weeks of uptime). Shouldn't there be some kind of index on journal files? Or at the very least a pointer in the service entry to the last log file that has relevant log entries. |
I think some kind of journal indexing is required because it's unbearably slow. Right now I have 5411 journal files (43 GiB) and:

```
$ time -p journalctl -b --no-pager > /dev/null
real 13.61
user 13.37
sys 0.22
```

It takes 13 seconds just to check the current boot log while it's already cached in RAM. When it's not cached: …
This is on 2x 3 TB HDD with RAID1 btrfs. |
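For anyone wanting to reproduce the uncached case above, a minimal sketch (dropping the page cache affects the whole system, so only do this on a machine where that is acceptable):

```sh
# Flush dirty pages, then drop the page cache so the next run measures
# cold-cache behaviour of the same query as above.
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
time -p journalctl -b --no-pager > /dev/null
```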
It is laggy even when it is on tmpfs and the machine runs long enough for it to get swapped out. Why doesn't journald just use SQLite for storage? It would be faster, and other apps could actually use the log files for something useful and have a good query language, instead of relying on a bunch of options in journalctl. |
It is still slow as hell: …
and it opens over a hundred files (on a system that was up for 2 hours).
|
@XANi @davispuh is there any chance you could run your slow cases under callgrind? |
With systemd 233.75-3 on ArchLinux: callgrind.out.systemctl-status-sshd.gz |
@davispuh Thank you for quickly providing the profiles. It's not a panacea, but #6307 may improve the runtime here. |
Emm, I can't test this anymore since after I compiled and reinstalled systemd it reset my |
With systemd just compiled from fdb6343: when the files aren't cached it's really unusably slow;
the second time, when they're cached, it's quick.
callgrind.out.systemctl-status-sshd_nocache.gz I have 7542 journal files. Basically, to improve performance we need to do less disk reading, e.g. use some kind of indexing or something like that. |
I have 240 system log files and 860 user log files. systemctl status or journalctl -f takes 2-4 minutes just to display logs (HDD drive). I have added this in /usr/lib/systemd/journald.conf.d/90-custom.conf:
systemd generates 2 to 3 system journal files every day, each about 150994944 bytes in size. Why doesn't journalctl -f (or systemctl) check only the latest / current journal? How do I make it efficient and fast? I need to preserve logs for a long duration. In most cases people only need to check recent logs. Maybe add a feature for automatic archival of logs into a different directory (/var/log/journal/ID-DIRECTORY/archive), with current logs (say the past 3-7 days) kept in /var/log/journal/ID-DIRECTORY? This would speed up journalctl and systemctl status a lot. Anyone who wants to check archived logs can use the --directory option of journalctl. |
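The commenter's actual 90-custom.conf is not shown above; as a hedged sketch of the kind of retention drop-in that keeps the number and age of journal files bounded (the option names are documented in journald.conf(5), the values are purely illustrative):

```sh
# Hypothetical drop-in; adjust values to your retention needs.
# SystemMaxUse caps total disk space, SystemMaxFileSize sets the rotation
# size (fewer, larger files to interleave), MaxRetentionSec caps entry age.
sudo mkdir -p /etc/systemd/journald.conf.d
sudo tee /etc/systemd/journald.conf.d/90-custom.conf <<'EOF'
[Journal]
SystemMaxUse=2G
SystemMaxFileSize=128M
MaxRetentionSec=1month
EOF
sudo systemctl restart systemd-journald
```

Fewer files means fewer mmap'd files for journalctl/systemctl to interleave, which is exactly the O(n) cost mentioned at the top of the thread.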
I have the same problem; I'm on a VMware VM on an HDD SAN. Example: …
Journal size is 101.1 GB right now. |
journalctl has a --file parameter. I am able to use it to search faster, while: …
Similarly, can we make … This would drastically speed things up. If an admin wants an older status, they can supply … PS: I have no idea how the data is stored in the journal. @poettering do you want me to create an RFE for this? PPS: Now every time I run … |
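As an illustration of the --file approach mentioned above, a hedged sketch (the sshd unit and the journal path glob are assumptions; --file, -u, -n and --no-pager are documented journalctl options):

```sh
# Query only the most recently written system journal file instead of
# interleaving every archived file in the directory.
latest=$(ls -t /var/log/journal/*/system*.journal | head -n 1)
journalctl --file "$latest" -u sshd -n 10 --no-pager
```

This obviously only sees entries in that one file, which is the point: recent status at a fraction of the I/O.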
Yes, this is a problem for servers storing their logs. My biggest problem is the centralized log server: I receive logs from network equipment using rsyslog, and it uses omjournal to pipe them directly into the journal. It works fine to begin with, but then degrades quickly (note I'm doing this on a test server; we have another server where rsyslog writes to files). Maybe journal files could be made to contain specific timespans, and only get loaded if requested; I use stuff like … |
@amishxda IMO it should only fetch more log lines from journalctl if explicitly requested by the user. @Gunni I don't think using systemd as a centralized log server is intended, or a good idea in the first place. The ELK stack is much more useful for that, jankiness aside. Logstash lets you do a lot of nice stuff; for example we use it to split iptables logs into fields (srcip/dstip/port etc.) before putting them in ES. And ES just works better for search. |
@XANi I just expected it to be a supported use case since … About the tools you mentioned: setting up all that stuff sounds like much more work, especially since we like to be able to watch the logs live. |
@Gunni I wish journald would just use SQLite instead of its current half-assed binary db. I feel like it is currently just trying to reimplement that, but badly, and there are plenty of tools for querying SQLite already. The ELK stack is definitely more effort to set up, but in exchange it has a ton of nice features: we for example made Logstash do a geoIP lookup on any IP that's not private, so each firewall log gets that info added. Querying is also very nice, as you can query fields directly instead of text strings. |
I found out about this issue via https://www.reddit.com/r/linuxadmin/comments/gdfi4t/how_do_you_look_at_journald_logs/ . Is this still a problem? |
@otisg As of systemd 241 (that's just the version I have on the machine that actually keeps logs on disk), it is most definitely still a problem (all timing tests done right after dropping caches, ~800 MB in the journal): …
Now the fun part: …
Yes, you are reading this right: getting the last few lines of a currently running service opens every single file in the journal dir. Now if I do just …
This manages to be an order of magnitude slower than just brute-force grepping the last logrotate's worth of text logs, which is mind-boggling. The sheer fact that the developers decided to go with a binary format, yet didn't bother to introduce even time- or service-based sharding/indexing and just brute-force every existing file, is just insane. It is like someone, instead of considering reasonable options like, I dunno: …
decided one evening "you know, I always wanted to make a binary logging format", then got bored after a few weeks and never touched it again. |
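One way to see how many journal files a single status call actually touches, as described above (a hedged sketch; sshd is an assumed unit name, and the count is of openat() calls on paths containing ".journal", so the exact number depends on the setup):

```sh
# Trace file opens performed for one status call and count the journal files.
sudo strace -f -e trace=openat systemctl status sshd --no-pager 2>&1 \
  | grep -c '\.journal'
```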
Ouch. I had not realized things were so slow. So I'm amazed that so many people at https://www.reddit.com/r/linuxadmin/comments/gdfi4t/how_do_you_look_at_journald_logs/ said they consume journal logs via journalctl. Are they all OK with the slowness?! Why don't more people get their logs out of the journal and centralize them in an on-prem or SaaS service that is faster? Anyhow, I see some systemd developers here; I wonder if they plan on integrating something like https://github.com/tantivy-search/tantivy ... |
The first invocation is slow. It probably goes through the journal files and checks them. Then things are fast for a while. journald is not great for log management, and it's simply not fit for any kind of centralized log management at scale; but it's very likely not a goal of the systemd project to handle that too. The journal is a necessity, just like pid1, udev, and network setup (for remote filesystems), to manage a Linux system and its services reliably. That said, it's entirely likely that with a few quick-and-dirty optimizations this could be worked around (e.g. if journal files are not in the cache, don't wait for them when showing status; allow streaming the journal without looking up the last 10 lines; persist some structures to speed up journal operations; enable unverified journal reads by default, etc.). |
@vcaputo Correct me if I'm wrong, but that patch only searches within the current boot ID, so if the system has been running for a long time it wouldn't change anything? I encountered the problem on server machines in the first place (and on a personal NAS), so in almost every case the current boot ID is the only one in the logs. It would certainly help on a desktop, but that's not where I hit the problem in the first place (also, AFAIK most desktop distros don't have /var/log/journal by default, so journald doesn't log to the HDD in the first place). The other problem is that the current implementation makes it really easy for one service to swamp the logs to the point where you lose any other service's logs. It is especially apparent for services that don't emit many logs: even though there are zero actual log lines for the service (as they got rotated out), it still takes about the same amount of time as for any other service. For reference, I have this happening on a machine with just 1 GB of journal and the last few days of logs in it. |
@XANi Yes you're right, if all the journals are from the same boot the early exit never occurs. FTR it already matches the current boot id, the patch just moves the order around. The change assumes there are probably multiple boots represented in the archived journals. |
Sorry if I'm stating the obvious, and I might not be understanding this right, but this is a common pattern for me and I still have to wait, even though the logs were produced mere seconds ago: …
5.6 seconds to output a few lines that were generated a few seconds ago by the restart call. True, another call immediately after is instant, but by then it's too late. …
This is particularly annoying when you don't know why some service didn't restart properly, and checking up on it essentially punishes you with the delay. |
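A hedged sketch of a mitigation for exactly this pattern (foo.service is a placeholder unit; --lines/-n and --no-pager are documented options): printing the status header without any log lines skips the expensive journal interleave, and the recent lines can be fetched separately only when actually wanted.

```sh
# Restart and show the unit state immediately, without touching the journal.
systemctl restart foo.service
systemctl status foo.service --lines=0 --no-pager

# Fetch recent log lines in a second step, only if something looks wrong.
journalctl -u foo.service -n 20 --no-pager
```

The second command still pays the cold-cache cost, but only when the logs are actually needed rather than on every status check.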
If re-running … Unfortunately, substantially improving the uncached read performance is probably going to require changes to the journal file format. There might be some gains to be found in filesystem tuning, like trying to tune /var/log/journal's filesystem for granular, mmap-oriented, pagefault-based reads in random-ish access patterns. I haven't personally explored that avenue, but it might be worth exploring. One reason uncached performance suffers is that most journal accesses are far smaller than a 4 KiB page, especially in search-like operations such as … Due to the substantial lack of clustering in the format to try to ensure that faulted-in pages bring operation-relevant information along in the baggage of the 4 KiB+ page fault triggered by something like the access of an 8-byte offset or seqnum, most of that baggage often isn't even immediately useful (though it's at least then warm in the page cache for subsequent journal operations). Due to the combination of the tiny objects and the inherent interleaving of object types in a largely appended-as-arrived layout, faulted-in pages tend not to contain the stuff immediately needed by the faulting operation. It's not that the format lacks indexing, as many have complained; as the cached performance indicates, the results are immediate when not waiting for page faults from backing store. It tends more often to be a sort of read-amplification effect: append-mostly tiny objects, interleaved by type and loaded in units of pages, end up reading a whole lot more than is relevant to the operation at hand when uncached. |
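To see the cached/uncached distinction described above on a given box, one option is to check page-cache residency of the journal files (a hedged sketch; fincore ships with util-linux and only reports residency, it changes nothing):

```sh
# Show how much of each persistent journal file is resident in the page cache.
# Files with RES near 0 will be served by page faults from disk when queried.
sudo fincore /var/log/journal/*/*.journal
```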
Frankly, the search on my IMAP server (Dovecot) was broken (too slow), and when I started to use a full-text search indexing plugin that uses ElasticSearch as a backend, it was so fast! Couldn't systemctl & journalctl use ElasticSearch's indexing capabilities, in conjunction with the redesign of the journal file format advised by @vcaputo? |
In case it's unclear: whenever … I'm not the designer or decision maker of these things, but it seems clear, based on what I've gleaned working with the code, that this is all deliberate, in an attempt to keep the constant journald footprint foisted on every systemd host to a minimum. I imagine that if we added a persistent database daemon to the architecture's read side, the handful of people complaining about cold-cache performance would be replaced with people complaining about some journal database daemon wasting resources on things which rarely occur. Frankly, if people are generally satisfied with the cached performance of these commands, and the feature set offered, you could just schedule a … For more elaborate search features, can't people already bolt more sophisticated indexers on, entirely external to journald, as-is? |
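Assuming the truncated suggestion above is about scheduling something to keep the journal warm in the page cache so that the fast cached path always applies (my reading, not stated explicitly), a minimal sketch would be a periodic read of the journal directory from cron or a systemd timer:

```sh
# Read every journal file once so its pages end up in the page cache.
# This trades background I/O and page-cache pressure for consistently
# fast status queries; on memory-constrained hosts it may not be worth it.
find /var/log/journal -name '*.journal*' -exec cat {} + > /dev/null
```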
It's not really practical if you keep logs for a really long time. Imagine two years' worth of logs: it's pointless to have them all cached when you usually need only the last month, even if maybe once a year you want to look further back.
This sounds like the best solution, but I'm not aware of anything like that. Basically it looks like the systemd journal can't handle our needs, so we need something better, but it would be really great if it could be plugged into journald so that we don't need to learn new tools, e.g. use … |
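For what it's worth, an external indexer can already be fed from the journal without any journald changes, since journalctl can emit structured JSON; a hedged sketch (the jq filter and the idea of shipping the output to a search backend are illustrative, not a specific supported integration):

```sh
# Stream new journal entries as JSON; each line is one entry with fields
# such as MESSAGE, _SYSTEMD_UNIT and __REALTIME_TIMESTAMP.
journalctl -f -o json --no-pager \
  | jq -c '{unit: ._SYSTEMD_UNIT, msg: .MESSAGE, ts: .__REALTIME_TIMESTAMP}'
# ...pipe the result into whatever indexer or search backend is in use.
```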
For the record, this is the reason I first considered switching to distros not using systemd, and now that I understand and see that the root cause is still here after many years, I'm reassured that it was a good decision after all. I'm talking about desktop environments with small log files and relatively slow hardware. |
Well, except that the way it does it trashes the cache. Cached performance is attained only when a ton of useless logs are loaded into memory just to find the last few lines.
I imagine slapping on an index that just stored "the latest journal file this service appears in" would solve ~99% of the issues with getting status here. I imagine a semi-sensible scheme for splitting logs between files (by name hash or whatever) would cut it down by an order of magnitude or two, although it would probably generate more random I/O, so probably not a tradeoff worth taking. Current performance is worse than tailing a logfile.
Nonsense. Anything under configuration management and proper monitoring will have that run often, from once or twice per hour to multiple times per minute. So at best you're wasting cache on files you don't even pick any data from. |
When you run … |
@mbiebl if the service is working, sure. If the service isn't, I'm VERY interested in the last few lines. Same for monitoring checks really, but I guess there it is always possible to do … But if I'm checking in the first place, there's a good chance something is wrong with it, so it's not that useful for CLI work. |
Yes please, this is the solution. And I agree with the 99% number.
True. |
Could systemctl status not flush the header and then block while fetching the log? Then one could just Ctrl-C it when not interested in the log anymore after checking the status. |
systemctl has known performance issues with large numbers of journal files. See systemd/systemd#2460. Adding --lines=0 argument to call improves call speed when systemctl is called for the first time. If this first call happens from mount.efs, this may result in high latency for the mount as it waits for all of the journal files to be indexed. Given that mount.efs is only looking for the exit code of the command and does not care about the journal entries, there is no loss of usability by adding the flag.
systemctl has known performance issues with large numbers of journal files. See systemd/systemd#2460. Changing to use `is-active` to improve call speed when systemctl is called. This avoids first systemctl call issues that may result in high latency for the mount as systemctl indexes all journal files. Given that mount.efs is only looking for the exit code of the command and does not care about the journal entries, there is no loss of usability by adding the flag.
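A hedged sketch of the exit-code-only pattern those commits describe (foo.service is a placeholder; is-active and its --quiet flag are documented systemctl options):

```sh
# Exit code is all we need: 0 if the unit is active, non-zero otherwise.
# No journal files are opened, so this stays fast even with a cold cache.
if systemctl is-active --quiet foo.service; then
    echo "foo.service is running"
else
    echo "foo.service is not running"
fi
```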
Still having this problem on an HDD with journald. Using btrfs with nodatacow and frequent defragmentation. The journal files are few (63 files) and not fragmented, but this is unbearably slow. Edit: I initially said that writes were slow, but that is another issue. |
With a big (4 GB, a few months of logs) on-disk journal,
systemctl status service
becomes very slow. It is of course faster after it gets into the cache... for that service; querying another one is still slow.
Dunno what the right way to do it would be, but opening ~80 log files just to display a service's status seems a bit excessive.