Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Promtail journal seeking known position #2111

Merged
merged 1 commit into from
May 22, 2020

Conversation

adityacs
Copy link
Collaborator

@adityacs adityacs commented May 22, 2020

What this PR does / why we need it:
Fix Promtail journal seeking known position

Which issue(s) this PR fixes:
Fixes #2104

@adityacs adityacs force-pushed the 2104_journal_seekcursor_position branch from 49d61e4 to cf52796 Compare May 22, 2020 11:43
@adityacs
Copy link
Collaborator Author

adityacs commented May 22, 2020

This bug is probably due to not calling Previous or Next as explained here
https://github.com/coreos/go-systemd/blob/cb8b64719ae392c1b4db8327dd7d150fe54d8811/sdjournal/journal.go#L703

What I see is, when next or previous is called current_file location is set as here
https://github.com/systemd/systemd/blob/fbc6d1716fc74d5475146cb6a98a849ade1e5ade/src/journal/sd-journal.c#L853

without next or previous this returns nil I guess
https://github.com/systemd/systemd/blob/fbc6d1716fc74d5475146cb6a98a849ade1e5ade/src/journal/sd-journal.c#L2187

@rfratto need your help here

@codecov-commenter
Copy link

codecov-commenter commented May 22, 2020

Codecov Report

Merging #2111 into master will decrease coverage by 0.04%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2111      +/-   ##
==========================================
- Coverage   61.25%   61.20%   -0.05%     
==========================================
  Files         146      146              
  Lines       11193    11196       +3     
==========================================
- Hits         6856     6853       -3     
- Misses       3791     3796       +5     
- Partials      546      547       +1     
Impacted Files Coverage Δ
pkg/promtail/targets/journaltarget.go 52.55% <0.00%> (-1.18%) ⬇️
pkg/promtail/targets/filetarget.go 68.67% <0.00%> (-1.81%) ⬇️

@rfratto
Copy link
Member

rfratto commented May 22, 2020

Thanks for looking into this @adityacs! I'm looking at the systemd docs for this, and while it seems like calling Next or Previous should work, I think we want to use Next here instead; that way Promtail never re-reads already-sent entries if the cursor we seek to is invalid.

I'd also like to see some comments explaining why we have to do the call before calling GetEntry, could you add those?

@rfratto rfratto self-requested a review May 22, 2020 12:11
@adityacs
Copy link
Collaborator Author

adityacs commented May 22, 2020

@rfratto Sure, will add the comments.

What I thought was If there is no entry after the cursor, calling next would throw an error. However, calling previous would re read the entry as you mentioned.

How should we handle this?

@rfratto
Copy link
Member

rfratto commented May 22, 2020

What I thought was If there is no entry after the cursor, calling next would throw an error. However, calling previous would re read the entry as you mentioned.

Right, so calling Next in this case would be valid, but it would be set to the EOF. I think getting the entry would just return the last entry before EOF, but it's not clear (sorry, my C skills aren't what they used to be).

Even if that's wrong and getting the entry fails, I think that's ok; trying to restore the position is best-effort. Your PR here will help a lot with the majority of issues, but I think there's always going to be the possibility of some edge case that's going to prevent us from restoring.

That being said, I think the real question here is agreeing what the behavior should be when we do fail to read the saved entry, and it's either:

  1. We reread from the max age and the client ends up resending some logs they already sent (seeing 4xxs)
  2. We skip to the tail of the journal and stop reading from there. No 4xxs, but then journal entries that appeared in the promtail downtime will never be sent.

Currently I'm using the first approach and IMO I think that's the right thing to do, even if the 4xxs might raise some red flags when we fail to read the entry.

@adityacs
Copy link
Collaborator Author

Currently I'm using the first approach and IMO I think that's the right thing to do, even if the 4xxs might raise some red flags when we fail to read the entry.

Yes, this makes sense.

According to this doc

More specifically, if sd_journal_next() or sd_journal_previous() reach the end/beginning of the journal they will return 0, instead of 1 when they are successful. This should be considered an EOF marker

So, if it is EOF, it will return 0 which is fine as the condition they check here is r<0
https://github.com/coreos/go-systemd/blob/cb8b64719ae392c1b4db8327dd7d150fe54d8811/sdjournal/journal.go#L564

So, in cases where we get error, it's good to reread from the max age as you mentioned

@adityacs adityacs force-pushed the 2104_journal_seekcursor_position branch from cf52796 to d42cdb4 Compare May 22, 2020 16:42
Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@rfratto rfratto merged commit f984e71 into grafana:master May 22, 2020
cyriltovena pushed a commit to cyriltovena/loki that referenced this pull request Jun 11, 2021
…ana#2111)

* Removed extra empty lines in imports.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added section on Go files formatting

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

promtail/journal: cannot seek back to saved position
3 participants