tailing gzipped logfiles should ungzip them first #907

Closed
pmoravec opened this issue Dec 28, 2016 · 2 comments

@pmoravec
Contributor

When log files are collected with a size limit, the file that crosses the limit (in the reproducer below, the older, rotated file) is truncated: only its tail is collected.

If this "border" logfile is a gzip file, sosreport collects the tail of the gzip archive, i.e. a useless sequence of bytes.

Reproducer:

# ll /var/log/messages*
-rw-------. 1 root root  297736 Dec 28 15:10 /var/log/messages
-rw-r--r--. 1 root root 4411245 Dec 28 13:53 /var/log/messages-20161212.gz
# ./sosreport -o logs -e logs --batch --log-size=1 --build
..
sosreport build tree is located at : /var/tmp/sosreport-pmoravec-rhel72.gsslab.brq.redhat.com-20161228151642
# file /var/tmp/sosreport-pmoravec-rhel72.gsslab.brq.redhat.com-20161228151642/var/log/messages-20161212.gz
/var/tmp/sosreport-pmoravec-rhel72.gsslab.brq.redhat.com-20161228151642/var/log/messages-20161212.gz: symbolic link to `../../sos_strings/logs/var.log.messages-20161212.gz.tailed'
# file /var/tmp/sosreport-pmoravec-rhel72.gsslab.brq.redhat.com-20161228151642/sos_strings/logs/var.log.messages-20161212.gz.tailed
/var/tmp/sosreport-pmoravec-rhel72.gsslab.brq.redhat.com-20161228151642/sos_strings/logs/var.log.messages-20161212.gz.tailed: data
# zcat /var/tmp/sosreport-pmoravec-rhel72.gsslab.brq.redhat.com-20161228151642/sos_strings/logs/var.log.messages-20161212.gz.tailed

gzip: /var/tmp/sosreport-pmoravec-rhel72.gsslab.brq.redhat.com-20161228151642/sos_strings/logs/var.log.messages-20161212.gz.tailed: not in gzip format
#
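The failure above can be reproduced in a few lines of Python (standard library only). This minimal sketch compresses some synthetic log lines, keeps only the last bytes of the compressed data as a naive tail would, and then tries to decompress them. The sample log lines and the 1024-byte cut are arbitrary assumptions, not values from sos.

```python
import gzip
import zlib

# Synthetic log content; the exact lines are an arbitrary assumption.
log = b"".join(b"Dec 12 13:%02d:00 host app: message %d\n" % (i % 60, i)
               for i in range(20000))
compressed = gzip.compress(log)

# A naive tail keeps the last bytes of the *compressed* file,
# much like the .tailed file in the reproducer above.
tailed = compressed[-1024:]

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
print(compressed[:2] == b"\x1f\x8b")  # True

try:
    gzip.decompress(tailed)
    tail_is_valid = True
except (OSError, EOFError, zlib.error) as exc:
    # The tail begins mid-stream, with no gzip header to decode from.
    tail_is_valid = False
    print("cannot decompress tail:", exc)
```

This is the same situation `zcat` reports as "not in gzip format": the header lives at the start of the archive, so no suffix of the compressed bytes can be decoded on its own.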
@pmoravec
Contributor Author

Any fix has to deal with 2 problems:

  1. identify the file as a gzip archive, or ideally as any kind of archive (though I think gzip dominates as the compression format for logs).
  2. unpack the archive efficiently (and consistently with the current code).
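For problem 1, one option is to check the file's first two bytes instead of trusting the `.gz` suffix: gzip streams start with the magic number 0x1f 0x8b (RFC 1952). The sketch below assumes a hypothetical helper `is_gzip`, which is not part of sos; other formats such as xz or bzip2 have their own magic numbers and are not covered.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def is_gzip(path):
    """Return True if the file at path starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


# Usage: a plain log and a gzipped one, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "messages")
    packed = os.path.join(tmp, "messages-20161212.gz")
    with open(plain, "wb") as f:
        f.write(b"Dec 28 15:10:00 host app: hello\n")
    with gzip.open(packed, "wb") as f:
        f.write(b"Dec 12 13:53:00 host app: hello\n")
    results = (is_gzip(plain), is_gzip(packed))

print(results)  # (False, True)
```

Reading two bytes per file is cheap compared with the copy itself, and it also catches rotated logs whose names do not end in `.gz`.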

Both might be solved by using the gzip library, but it is reported to be very slow and is also not among the current sos prerequisites.
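For problem 2, a minimal sketch using Python's standard `gzip` module could stream the decompressed data and keep only the last `number_of_bytes`, so memory stays bounded. It still has to decompress the entire archive, so the CPU cost grows with the full file size rather than with the tail length. The helper name `tail_gzip` and the 64 KiB chunk size are assumptions for illustration.

```python
import gzip
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # bytes decompressed per read; an arbitrary choice


def tail_gzip(path, number_of_bytes):
    """Return the last number_of_bytes of the decompressed content of path.

    The whole archive is still decompressed, but at most about
    number_of_bytes + CHUNK_SIZE bytes are kept in memory at a time.
    """
    if number_of_bytes <= 0:
        return b""
    buf = bytearray()
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            buf += chunk
            # Keep only the last number_of_bytes bytes seen so far.
            if len(buf) > number_of_bytes:
                del buf[:-number_of_bytes]
    return bytes(buf)


# Usage: gzip a synthetic log and compare against a plain in-memory tail.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "messages-20161212.gz")
    data = b"".join(b"Dec 12 13:53:00 host app: line %d\n" % i
                    for i in range(100000))
    with gzip.open(path, "wb") as f:
        f.write(data)
    last = tail_gzip(path, 100)

print(last == data[-100:])  # True
```

The result is plain text, so it would have to be stored without the `.gz` suffix (or recompressed) to avoid producing another misleadingly named file.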

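A very naive patch (not relying on the gzip library, but on the external zcat and tail tools):

import os
from subprocess import Popen, PIPE

def tail(filename, number_of_bytes):
    """Returns the last number_of_bytes of filename"""
    if filename.endswith(".gz"):
        # shell=False cannot run a "zcat | tail" string, so build the
        # pipeline from two processes connected by a pipe.
        zcat = Popen(["zcat", filename], stdout=PIPE, close_fds=True)
        tailp = Popen(["tail", "--bytes", str(number_of_bytes)],
                      stdin=zcat.stdout, stdout=PIPE, close_fds=True)
        zcat.stdout.close()  # lets zcat receive SIGPIPE if tail exits early
        stdout, _ = tailp.communicate()
        zcat.wait()
        return stdout

    with open(filename, "rb") as f:
        # seek relative to end of file (whence=2) only if the file is larger
        if os.stat(filename).st_size > number_of_bytes:
            f.seek(-number_of_bytes, 2)
        return f.read()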

@bmr-cymru
Member

I think it is simpler (and just as effective) to say:

  • If not --all-logs and a .gz file exceeds the limit, skip it.
  • If --all-logs (and --log-size is zero) then copy .gz files whole into the archive.

We already impose a considerable IO and CPU load on the collection host; I don't think we should add to that by slavishly decompressing, tailing, and recompressing data just to grab a snippet of log data.
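The two rules above can be sketched as a small planning function, combined with the behaviour described in the commit message below (skip tailing a gzip archive once the size limit is reached and `tailit=True`). Everything here is an assumption for illustration: the name `plan_collection`, its arguments, and the newest-first ordering are hypothetical, and this is not the actual `add_copy_spec` code. The `.gz` suffix check is used for brevity; a magic-byte check would be more robust.

```python
def plan_collection(files, sizelimit, tailit=True):
    """Decide how to collect each log file (hypothetical helper).

    files:     list of (path, size_in_bytes), ordered newest first.
    sizelimit: byte limit; None or 0 means "no limit" (e.g. --all-logs).
    tailit:    whether the file that crosses the limit may be tailed.
    Returns a list of (path, action) pairs, action being "copy" or "tail";
    files missing from the result are skipped.
    """
    plan = []
    total = 0
    for path, size in files:
        if not sizelimit:
            # No limit: copy every file whole, gzip archives included.
            plan.append((path, "copy"))
            continue
        if total + size <= sizelimit:
            plan.append((path, "copy"))
            total += size
            continue
        # This is the "border" file that does not fit in the remaining budget.
        # Tailing a gzip archive would yield a damaged file, so skip it.
        if tailit and not path.endswith(".gz"):
            plan.append((path, "tail"))
        break  # everything older than the border file is skipped
    return plan


files = [("/var/log/messages", 297736),
         ("/var/log/messages-20161212.gz", 4411245)]
print(plan_collection(files, 1024 * 1024))
# [('/var/log/messages', 'copy')]
print(plan_collection(files, None))
# [('/var/log/messages', 'copy'), ('/var/log/messages-20161212.gz', 'copy')]
```

The sizes come from the reproducer; the 1024 * 1024 limit stands in for `--log-size=1`, assuming that option is expressed in MiB. With the limit, the gzip archive is skipped instead of being stored as a damaged `.tailed` file; without it, both files are copied whole.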

pmoravec added a commit to pmoravec/sos that referenced this issue Sep 24, 2017
In case sizelimit in add_copy_spec is reached and tailit=True, we shall
skip tailing a gzip archive that would be collected as damaged.

Resolves: sosreport#907

Signed-off-by: Pavel Moravec <pmoravec@redhat.com>
pmoravec added a commit to pmoravec/sos that referenced this issue Oct 3, 2017
pmoravec added a commit to pmoravec/sos that referenced this issue Oct 30, 2017

2 participants