Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversion To Unix Timestamps Is 24 Hours Behind #830

Closed
ckbuckley opened this issue Mar 2, 2016 · 5 comments
Closed

Conversion To Unix Timestamps Is 24 Hours Behind #830

ckbuckley opened this issue Mar 2, 2016 · 5 comments
Assignees
Labels
Milestone

Comments

@ckbuckley
Copy link

@ckbuckley ckbuckley commented Mar 2, 2016

Hi guys,

All of our servers (Amazon Linux 2015.03 running rsyslog 8.16.0-3.el6) have suddenly stopped recording Unix timestamps in log messages correctly, as of midnight UTC on 29/02/2016.

We use the following GELF template to format log messages and ship them to our Graylog server:

template(name="gelf" type="list") {
    constant(value="\"host\":\"")
    property(name="hostname")
    constant(value="\",\"Syslog-Tag\":\"")
    property(name="syslogtag" format="json")
    constant(value="\",\"short_message\":\"")
    property(name="msg" format="json")
    constant(value="\",\"timestamp\":\"")
    property(name="timegenerated" dateformat="unixtimestamp")
    constant(value="\",\"Level\":\"")
    property(name="syslogseverity")
    constant(value="\",\"Severity\":\"")
    property(name="syslogseverity-text")
    constant(value="\"}")
}

But all 'timestamp' components of the message are 24 hours in the past. As an experiment, I tried changing the dateformat to 'rfc3339' and all of the timestamps began being reported correctly. I then changed it back to unixtimestamp and it went back to being 24 hours in the past.

A quick example is below for some log messages:

When 'dateformat' is set to 'unixtimestamp':
Message time in syslog: 2016-03-02T04:24:25+00:00
Message time in 'timestamp' field: 1456806270
Converted to readable time: 2016-03-01T04:24:30+00:00

When 'dateformat' is set to 'rfc3339':
Message time in syslog: 2016-03-02T04:34:00+00:00
Message time in 'timestamp' field: 2016-03-02T04:34:02.110830+00:00

This information was pulled from running a tcpdump on one of our application servers and capturing the logging traffic as it left the server in order to eliminate Graylog as the sou
rce of the issues - the behaviour detailed above seems to point to rsyslog as the culprit.

I can't think of any reason why this could have suddenly happened (and Graylog shows that it happened to all servers at exactly the same time - midnight UTC on 29/02/2016). We made no change to any of our systems or applications from 48 hours before this occurred and have been in a change freeze since.

Thanks for your help!

@rgerhards rgerhards added the bug label Mar 2, 2016
@rgerhards rgerhards self-assigned this Mar 2, 2016
@rgerhards
Copy link
Member

@rgerhards rgerhards commented Mar 2, 2016

I strongly think this is a bug in rsyslog: we use some custom code to compute the Unix timestamp. I think the reason was that we missed some features in the usual api because of the different time zones we need to support concurrently. My guess is that this code has a problem with this years leap year. I will investigate today and report back.

@rgerhards rgerhards added this to the v8.17 release milestone Mar 2, 2016
rgerhards added a commit to rgerhards/rsyslog that referenced this issue Mar 2, 2016
among others, showcases a problem with leap year handling

see also rsyslog#830
@rgerhards
Copy link
Member

@rgerhards rgerhards commented Mar 2, 2016

I have confirmed this problem with the tests I have just added. They also show we have been off for one second in other cases, too. Now working on the analysis/fix.

@rgerhards
Copy link
Member

@rgerhards rgerhards commented Mar 2, 2016

I think I have found the culprit. We are using lookup tables for year second values, but it was an oversight on how to handle month within the leap year itself. In other words: it'll auto-heal on 2017-01-01... I also noticed that we seem to be consistently one second too low.

Now working on a simple fix to the current algo. When done, I'll see if I can refactor this to mktime(), not sure what was the orginal cause why we didn't use it from day one.

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Mar 2, 2016
among others, showcases a problem with leap year handling

see also rsyslog#830
rgerhards added a commit to rgerhards/rsyslog that referenced this issue Mar 2, 2016
among others, showcases a problem with leap year handling

see also rsyslog#830
rgerhards added a commit to rgerhards/rsyslog that referenced this issue Mar 2, 2016
The problem happened in leap year from March til then end
of year and healed itself at the begining of the next year.
During the problem period, the timestamp was 24 hours too
low.

This is primarily a simple fix that can also be applied to
older rsyslog versions. However, we will see if we can
refactor the code to make use of mktime(). Originally, that
was not done for some issues seen, which may no longer
apply.

fixes rsyslog#830
@rgerhards
Copy link
Member

@rgerhards rgerhards commented Mar 2, 2016

Unfortunately, there is solid reason not to use mktime(). From the (new) code comment:

 * Note: it would be better to use something similar to mktime() here.
 * Unfortunately, mktime() semantics are problematic: first of all, it
 * works on local time, on the machine's time zone. In syslog, we have
 * to deal with multiple time zones at once, so we cannot plainly rely
 * on the local zone, and so we cannot rely on mktime(). One solution would
 * be to refactor all time-related functions so that they are all guarded 
 * by a mutex to ensure TZ consistency (which would also enable us to
 * change the TZ at will for specific function calls). But that would
 * potentially mean a lot of overhead.
 * Also, mktime() has some side effects, at least setting of tzname. With
 * a refactoring as described above that should probably not be a problem,
 * but would also need more work. For some more thoughts on this topic,
 * have a look here:
 * http://stackoverflow.com/questions/18355101/is-standard-c-mktime-thread-safe-on-linux
 * In conclusion, we keep our own code for generating the unix timestamp.

So we need to stay with the "home grown" code, at least the testbench now has good tests up until 2100 (which is when rsyslog needs at least some new lookup tables, so it is it's current expiration date ;)).

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Mar 3, 2016
among others, showcases a problem with leap year handling

see also rsyslog#830
rgerhards added a commit to rgerhards/rsyslog that referenced this issue Mar 3, 2016
The problem happened in leap year from March til then end
of year and healed itself at the begining of the next year.
During the problem period, the timestamp was 24 hours too
low.

This is primarily a simple fix that can also be applied to
older rsyslog versions. However, we will see if we can
refactor the code to make use of mktime(). Originally, that
was not done for some issues seen, which may no longer
apply.

fixes rsyslog#830
rgerhards added a commit to rgerhards/rsyslog that referenced this issue Mar 3, 2016
among others, showcases a problem with leap year handling

see also rsyslog#830
rgerhards added a commit to rgerhards/rsyslog that referenced this issue Mar 3, 2016
The problem happened in leap year from March til then end
of year and healed itself at the begining of the next year.
During the problem period, the timestamp was 24 hours too
low.

This is primarily a simple fix that can also be applied to
older rsyslog versions. However, we will see if we can
refactor the code to make use of mktime(). Originally, that
was not done for some issues seen, which may no longer
apply.

fixes rsyslog#830
Whissi added a commit to Whissi/gentoo that referenced this issue Mar 12, 2016
Bug: rsyslog/rsyslog#830

Package-Manager: portage-2.2.28
gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Mar 12, 2016
Bug: rsyslog/rsyslog#830

Package-Manager: portage-2.2.28
Closes: #1034
JindongChen pushed a commit to jopstools/rsyslog that referenced this issue Mar 15, 2016
The problem happened in leap year from March til then end
of year and healed itself at the begining of the next year.
During the problem period, the timestamp was 24 hours too
low.

This is primarily a simple fix that can also be applied to
older rsyslog versions. However, we will see if we can
refactor the code to make use of mktime(). Originally, that
was not done for some issues seen, which may no longer
apply.

fixes rsyslog#830
janmejay added a commit to janmejay/rsyslog that referenced this issue Apr 4, 2016
The problem happened in leap year from March til then end
of year and healed itself at the begining of the next year.
During the problem period, the timestamp was 24 hours too
low.

This is primarily a simple fix that can also be applied to
older rsyslog versions. However, we will see if we can
refactor the code to make use of mktime(). Originally, that
was not done for some issues seen, which may no longer
apply.

fixes rsyslog#830
@lock
Copy link

@lock lock bot commented Dec 27, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Dec 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

2 participants