New journal disk based indexing for agent memory reduction #13885

Merged
merged 108 commits into from Nov 15, 2022

stelfrag
Collaborator

@stelfrag stelfrag commented Oct 26, 2022

Summary

The agent requires a lot of memory to index pages and to track how they map to the actual files that store metrics.

  • Produce a new journal index file that the agent memory-maps (mmap) and uses instead of creating all the entries in memory
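
As an illustration of the memory-mapping idea, the following is a minimal POSIX sketch that maps a file read-only so its contents can be read in place instead of being copied into heap structures. The `mapped_index` type and the `index_map` / `index_unmap` helpers are hypothetical names used only for this example; they are not taken from the PR.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle for a memory-mapped index file (not a Netdata type). */
struct mapped_index {
    const unsigned char *data; /* start of the mapping, NULL when unmapped */
    size_t size;               /* length of the mapping in bytes */
};

/* Map `path` read-only. Returns 0 on success, -1 on failure. */
static int index_map(const char *path, struct mapped_index *out)
{
    assert(path != NULL && out != NULL);
    out->data = NULL;
    out->size = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    out->data = p;
    out->size = (size_t)st.st_size;
    return 0;
}

/* Release the mapping and reset the handle. */
static void index_unmap(struct mapped_index *idx)
{
    if (idx->data)
        munmap((void *)idx->data, idx->size);
    idx->data = NULL;
    idx->size = 0;
}
```

With a mapping like this, the kernel pages the file in on demand and can drop clean pages under memory pressure, which is the general reason a mapped index tends to cost less resident memory than an equivalent in-memory structure.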

File structure

The new file-based index has a structure that allows quick access to the needed metadata. The file structure consists of:

  • File header
  • List of extents
  • List of unique metric identifiers (sorted)
  • Detailed page info for each metric (page @ time information)
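
A rough sketch of how such a layout could be expressed in C is shown below. All struct names, field names, and field widths are assumptions made for illustration; the PR description does not specify the on-disk format. Because the metric identifiers are stored sorted, a lookup can be a plain binary search over the metric section.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define METRIC_ID_SIZE 16 /* assumed identifier width (UUID-sized) */

/* Hypothetical file header: the offsets locate the other three sections. */
struct index_header {
    uint32_t magic;
    uint32_t extent_count;
    uint32_t metric_count;
    uint32_t extent_offset; /* start of the extent list */
    uint32_t metric_offset; /* start of the sorted metric list */
    uint32_t page_offset;   /* start of the per-metric page info */
};

/* Hypothetical metric entry; entries are stored sorted by `id`. */
struct metric_entry {
    unsigned char id[METRIC_ID_SIZE];
    uint32_t page_list_offset; /* where this metric's page info starts */
    uint32_t page_count;       /* number of page records for this metric */
};

/* bsearch() comparator: `key` is a raw identifier, `elem` a metric entry. */
static int metric_cmp(const void *key, const void *elem)
{
    const struct metric_entry *e = elem;
    return memcmp(key, e->id, METRIC_ID_SIZE);
}

/* Binary search in the sorted metric list; returns NULL when not found. */
static const struct metric_entry *
metric_find(const struct metric_entry *entries, size_t count,
            const unsigned char id[METRIC_ID_SIZE])
{
    assert(id != NULL);
    if (entries == NULL || count == 0)
        return NULL;
    return bsearch(id, entries, count, sizeof(*entries), metric_cmp);
}
```

The lookup is O(log n) in the number of unique metrics and touches only the parts of the file it actually compares against.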

During agent startup, the journal replay only needs to create the necessary pages (unique metrics), which is very fast (initial tests indicate that it is roughly 100x faster than the current journal replay). This is aided by the fact that individual pages are not created in memory during startup, but only when needed (during data queries).

Pages that are no longer needed (evicted from the cache) are removed. They will also be removed when unused for more than 10 minutes.
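
The create-on-demand and expire-when-idle behaviour described above could look roughly like the sketch below. The `page_descr` type, both function names, and the use of wall-clock seconds are assumptions for illustration, not the PR's implementation; only the 10-minute threshold comes from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define UNUSED_TIMEOUT_SECS (10 * 60) /* "unused for more than 10 minutes" */

/* Hypothetical in-memory descriptor for a page referenced by the index. */
struct page_descr {
    bool loaded;      /* true while the descriptor exists in memory */
    time_t last_used; /* time of the most recent query that touched it */
};

/* Called from a data query: create the descriptor on first use and
 * refresh its timestamp. Returns true if it had to be created. */
static bool descr_touch(struct page_descr *d, time_t now)
{
    assert(d != NULL);
    bool created = !d->loaded;
    d->loaded = true;
    d->last_used = now;
    return created;
}

/* Periodic cleanup: drop descriptors idle for longer than the timeout.
 * Returns the number of descriptors dropped. */
static size_t descr_expire(struct page_descr *descrs, size_t count, time_t now)
{
    size_t dropped = 0;
    for (size_t i = 0; i < count; i++) {
        if (descrs[i].loaded &&
            now - descrs[i].last_used > UNUSED_TIMEOUT_SECS) {
            descrs[i].loaded = false;
            dropped++;
        }
    }
    return dropped;
}
```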

You can see the number of descriptors in memory under netdata.dbengine_long_term_page_stats, journal v2 descriptors.

Creation of new journal index files

When the agent starts, it checks whether a new index file exists for each journal file that needs to be processed. If the index file exists, the agent uses it instead of the journal. If it does not exist, the agent replays the old journal file, uses that information to create the new index file, and starts using the index immediately.
The agent generates new index files for all journals except the last (active) one.
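
The startup decision described above can be summarised as a small pure function. This is a sketch under stated assumptions: the enum, the function name, and the convention that the highest-numbered journal is the active one are all hypothetical and chosen only to mirror the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the per-journal startup check. */
enum journal_action {
    JOURNAL_USE_INDEX,              /* index file exists: use it directly */
    JOURNAL_REPLAY_AND_BUILD_INDEX, /* replay old journal, then write index */
    JOURNAL_REPLAY_ONLY             /* active journal: no index is generated */
};

/* Decide what to do with journal number `journal_no` out of
 * `journal_count` journals (assumption: the last one is the active one). */
static enum journal_action
journal_startup_action(size_t journal_no, size_t journal_count,
                       bool index_exists)
{
    assert(journal_no < journal_count);

    if (journal_no + 1 == journal_count)
        return JOURNAL_REPLAY_ONLY;

    return index_exists ? JOURNAL_USE_INDEX
                        : JOURNAL_REPLAY_AND_BUILD_INDEX;
}
```

Keeping the decision separate from the file-system check makes it easy to test; the existence check itself would have to happen close to the point where the file is opened to avoid a time-of-check/time-of-use gap.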

New datafiles while the agent is running

When a new datafile / journal pair is created, the agent checks for and creates a new journal index file for the journal that was just completed.

Known issues

  • New journal creation may not trigger index creation for the last journal file due to a race condition (pending transactions)

Other fixes

This PR also fixes:

  • Bug in replication where overlapping time ranges were replicated unnecessarily
  • Bug in streaming compression where under certain conditions corrupted data were offered for parsing
  • Children connecting to a parent without compression were disabling compression globally for the host. Now compression is globally disabled only when there is a compression error.
  • Under certain conditions, DBENGINE allowed past time ranges to be injected into the database, resulting in overlapping data pages. After this PR, DBENGINE only stores future data, relative to the last data collection time.
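
The "future data only" rule from the last item can be pictured as a simple guard. The `metric_state` type and `store_point` function are hypothetical and only illustrate the check; whether equal timestamps are rejected is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-metric collection state. */
struct metric_state {
    time_t last_collected; /* time of the last stored data point */
};

/* Accept a point only if it is strictly newer than the last stored one.
 * Returns true when the point is accepted. */
static bool store_point(struct metric_state *m, time_t point_time)
{
    assert(m != NULL);
    if (point_time <= m->last_collected)
        return false; /* past or duplicate time: would overlap stored data */
    m->last_collected = point_time;
    return true;
}
```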

Test Plan

@lgtm-com

lgtm-com bot commented Oct 26, 2022

This pull request introduces 11 alerts when merging 4c281aa into b1bc23d - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Implicit downcast from bitfield
  • 1 for Time-of-check time-of-use filesystem race condition

@ktsaou
Member

ktsaou commented Oct 26, 2022

image

@ktsaou
Member

ktsaou commented Oct 26, 2022

image

@lgtm-com

lgtm-com bot commented Oct 26, 2022

This pull request introduces 11 alerts when merging f6af2c0 into 9e89ac7 - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Implicit downcast from bitfield
  • 1 for Time-of-check time-of-use filesystem race condition

@ktsaou
Member

ktsaou commented Oct 26, 2022

@stelfrag I fixed both crashes.

@lgtm-com

lgtm-com bot commented Oct 26, 2022

This pull request introduces 11 alerts when merging 0ada1bb into 9e89ac7 - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Implicit downcast from bitfield
  • 1 for Time-of-check time-of-use filesystem race condition

@lgtm-com

lgtm-com bot commented Oct 26, 2022

This pull request introduces 10 alerts when merging 6c63ade into 185f5a1 - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 26, 2022

This pull request introduces 10 alerts when merging 76838d0 into 88c03ca - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 26, 2022

This pull request introduces 10 alerts when merging 052e645 into 88c03ca - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 26, 2022

This pull request introduces 10 alerts when merging ee45261 into e4cc55d - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 27, 2022

This pull request introduces 10 alerts when merging 376d5f5 into 3bbfa17 - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 27, 2022

This pull request introduces 10 alerts when merging f1ba97b into 3bbfa17 - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 27, 2022

This pull request introduces 10 alerts when merging fb8cb7f into 3bbfa17 - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 27, 2022

This pull request introduces 11 alerts when merging 66e28a1 into fced0fc - view on LGTM.com

new alerts:

  • 10 for FIXME comment
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 27, 2022

This pull request introduces 11 alerts when merging b643d8f into 0e3c365 - view on LGTM.com

new alerts:

  • 10 for FIXME comment
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 28, 2022

This pull request introduces 11 alerts when merging d0e2bae into 1ab62f3 - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Comparison result is always the same
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 29, 2022

This pull request introduces 11 alerts when merging cc48dee into f9ea1f2 - view on LGTM.com

new alerts:

  • 9 for FIXME comment
  • 1 for Comparison result is always the same
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 30, 2022

This pull request introduces 10 alerts when merging 1946c41 into 4dd4bd7 - view on LGTM.com

new alerts:

  • 8 for FIXME comment
  • 1 for Comparison result is always the same
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 31, 2022

This pull request introduces 8 alerts when merging 0f686cd into 4dd4bd7 - view on LGTM.com

new alerts:

  • 6 for FIXME comment
  • 1 for Comparison result is always the same
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 31, 2022

This pull request introduces 8 alerts when merging 46112ad into 4dd4bd7 - view on LGTM.com

new alerts:

  • 6 for FIXME comment
  • 1 for Comparison result is always the same
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 31, 2022

This pull request introduces 8 alerts when merging c968dd9 into 9e85dc2 - view on LGTM.com

new alerts:

  • 6 for FIXME comment
  • 1 for Comparison result is always the same
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Oct 31, 2022

This pull request introduces 8 alerts when merging 7a49304 into df87a53 - view on LGTM.com

new alerts:

  • 6 for FIXME comment
  • 1 for Comparison result is always the same
  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Nov 14, 2022

This pull request introduces 1 alert when merging 7994128 into d15f36d - view on LGTM.com

new alerts:

  • 1 for Implicit downcast from bitfield

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed 🚀. For more information, please check out our post on the GitHub blog.

@lgtm-com

lgtm-com bot commented Nov 15, 2022

This pull request introduces 1 alert when merging ba1b2f7 into 558db52 - view on LGTM.com

new alerts:

  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Nov 15, 2022

This pull request introduces 1 alert when merging d4f72ea into 558db52 - view on LGTM.com

new alerts:

  • 1 for Implicit downcast from bitfield

@lgtm-com

lgtm-com bot commented Nov 15, 2022

This pull request introduces 1 alert when merging 9635927 into 558db52 - view on LGTM.com

new alerts:

  • 1 for Implicit downcast from bitfield

stelfrag and others added 5 commits November 15, 2022 17:18
Add config parameter for detailed journal integrity check (Metric chain validation check during startup)
pg cache insert drop check for existing page
Fix crc calculation for metric headers
@lgtm-com

lgtm-com bot commented Nov 15, 2022

This pull request introduces 1 alert and fixes 1 when merging e4b51f2 into b4a0298 - view on LGTM.com

new alerts:

  • 1 for Implicit downcast from bitfield

fixed alerts:

  • 1 for FIXME comment

@lgtm-com

lgtm-com bot commented Nov 15, 2022

This pull request introduces 1 alert and fixes 1 when merging c8def16 into b4a0298 - view on LGTM.com

new alerts:

  • 1 for Implicit downcast from bitfield

fixed alerts:

  • 1 for FIXME comment

@lgtm-com

lgtm-com bot commented Nov 15, 2022

This pull request introduces 1 alert and fixes 1 when merging c1c1c20 into b4a0298 - view on LGTM.com

new alerts:

  • 1 for Implicit downcast from bitfield

fixed alerts:

  • 1 for FIXME comment

@ktsaou ktsaou merged commit 224b051 into netdata:master Nov 15, 2022
@lgtm-com

lgtm-com bot commented Nov 15, 2022

This pull request fixes 1 alert when merging 074113e into b4a0298 - view on LGTM.com

fixed alerts:

  • 1 for FIXME comment

ktsaou added a commit that referenced this pull request Nov 15, 2022
ktsaou added a commit that referenced this pull request Nov 15, 2022
…14000)

Revert "New journal disk based indexing for agent memory reduction (#13885)"

This reverts commit 224b051.