Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only read the commit log once during bootstrapping #2645

Merged
merged 6 commits into from
Sep 17, 2020

Conversation

ryanhall07
Copy link
Collaborator

What this PR does / why we need it:
A recent refactoring of cold writes ( #2508) introduced a regression that increases the
chances the commit log is read twice while bootstrapping. The referenced PR changed the commitlog
bootstrapper to read all requested time ranges, even if a range had been fulfilled by a previous
bootstrapper. This was necessary since the commitlog may have cold writes that were never commmited
to a fileset. The fileystem bootstrapper would report a time range as fulfilled, but might be missing
cold writes only in the commit log.

It should be noted this bug was always theoretically possible, but unlikely since the commitlog bootstrapper
typically wouldn't run in the first pass (cold time ranges) since the filesystem would fulfill all cold ranges
and short circuit the first pass of the boostrapper.

This change only reads the commit log on the first pass of the boostrapper and caches the result to skip reading
it in subsequent passes. It doesn't actually matter which pass we read the commit log, the first was just chosen
arbitrarily.

Other attempts at fixing this bug attempted to disable the entire commit log bootstrapper during a pass, but that's
not possible since the commit log bootstrapper is actually 2 bootstrappers in one, both the the commit log and snapshot
files. To minimize the refactoring changes we still want to only read the snapshot files of the requested ranges.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:
NONE

Does this PR require updating code package or user-facing documentation?:
NONE

A recent refactoring of cold writes ( #2508) introduced a regression that increases the
chances the commit log is read twice while bootstrapping. The referenced PR changed the commitlog
bootstrapper to read all requested time ranges, even if a range had been fulfilled by a previous
bootstrapper. This was necessary since the commitlog may have cold writes that were never commmited
to a fileset. The fileystem bootstrapper would report a time range as fulfilled, but might be missing
cold writes only in the commit log.

It should be noted this bug was always theoretically possible, but unlikely since the commitlog bootstrapper
typically wouldn't run in the first pass (cold time ranges) since the filesystem would fulfill all cold ranges
and short circuit the first pass of the boostrapper.

This change only reads the commit log on the first pass of the boostrapper and caches the result to skip reading
it in subsequent passes. It doesn't actually matter which pass we read the commit log, the first was just chosen
arbitrarily.

Other attempts at fixing this bug attempted to disable the entire commit log bootstrapper during a pass, but that's
not possible since the commit log bootstrapper is actually 2 bootstrappers in one, both the the commit log and snapshot
files. To minimize the refactoring changes we still want to only read the snapshot files of the requested ranges.
@@ -246,10 +237,56 @@ func (s *commitLogSource) Read(
zap.Duration("took", s.nowFn().Sub(startSnapshotsRead)))
span.LogEvent("read_snapshots_done")

commitLogResult, err := s.readCommitLog(namespaces, span)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately the diff is not great. I essentially moved all of the commit log reading logic into a new method readCommitLog. additionally I changed the for loop below to iterate over the provided namespaces and not the namespace results from the commit log. didn't seem necessary to use the results in the for loop and it allowed the results to be private in the new method.

Copy link
Contributor

@notbdu notbdu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good so far. Just needs some tests for ensuring that the commit log bootstrapper only bootstraps commit logs once and maybe something additional to ensure snapshots aren't being read twice.

var indexResult result.IndexBootstrapResult
if ns.Metadata.Options().IndexOptions().Enabled() {
indexResult = result.NewIndexBootstrapResult()
if commitLogResult.shouldReturnUnfulfilled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could move up the nesting of this logic so we don't need to perform the same check twice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we can, since you're already inside a conditional branch for IndexOptions

DataResult: dataResult,
IndexResult: indexResult,
})
s.commitLogResult = commitLogResult{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It's a little awkward to set it here and use the set value as the return value. We also use the return value after the function call. Maybe it's better to either have no return or return the value and let the caller do the setting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I don't have a strong opinion about this since it's all private local state anyways. I'll just update it so the main function controls the caching logic and decides if it should actually call readCommitLog

@notbdu
Copy link
Contributor

notbdu commented Sep 17, 2020

Nice, we should also add this check in the integration tests. Perhaps just to the test in src/dbnode/integration/commitlog_bootstrap_test.go.

Copy link
Contributor

@notbdu notbdu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides one remaining nit (imports), PR LGTM.

@@ -23,6 +23,7 @@
package integration

import (
"github.com/uber-go/tally"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: imports, this goes with the third party imports

@ryanhall07
Copy link
Collaborator Author

verified the change locally with the m3stack

before the change

(rc:rc-05)rhall@ryans-mbp ~ % docker logs m3_stack_m3db_seed_1 2>&1 | grep "read commit logs start"
{"level":"info","ts":1600385779.7967737,"msg":"read commit logs start","bootstrapper":"commitlog"}
{"level":"info","ts":1600385779.825805,"msg":"read commit logs start","bootstrapper":"commitlog"}

after the change

(rc:rc-05)rhall@ryans-mbp ~ % docker logs m3_stack_m3db_seed_1 2>&1 | grep "read commit logs start"
{"level":"info","ts":1600385228.9855318,"msg":"read commit logs start","bootstrapper":"commitlog"}

@ryanhall07 ryanhall07 merged commit 88164cf into master Sep 17, 2020
@ryanhall07 ryanhall07 deleted the rhall-commit-log-twice-take-2 branch September 17, 2020 23:38
ChrisChinchilla pushed a commit that referenced this pull request Oct 5, 2020
commit 4f3778dded83ad4d81aaeaad62608f6c7a0f9461
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Mon Oct 5 12:53:40 2020 +0200

    fix code inclusion

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit daefb435243ed98041744c520d52ac4a92421694
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Mon Oct 5 12:53:34 2020 +0200

    Add redirects

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit 612cfbb319b30177501530f9cd7bdd626ea6b107
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Mon Oct 5 12:09:40 2020 +0200

    Remove versioning for now

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit cda48e18dce956a4e684dd4d6d7943df808e6846
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Fri Oct 2 18:09:35 2020 +0200

    Switch netlify directory

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit c5d0b1decc95d042a565cbe97d25c9eda94549a9
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Fri Oct 2 18:00:34 2020 +0200

    Move theme to module

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit 855d8c7af1bf45978ed180469ecb1b753ec1abdb
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Thu Oct 1 14:59:14 2020 +0200

    Netlify dev

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit a8a8eda0f2f3b8fa1b3aefa2316942ff692dc8e5
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Thu Oct 1 14:03:34 2020 +0200

    Update Hugo version

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit f19ffadb7caad18663dad99ec5230e8357125954
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Thu Oct 1 13:54:41 2020 +0200

    Random file to fix odd git issues

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit a6413db7891f2f741510e4b36300e3cb93904f1d
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Thu Oct 1 13:11:42 2020 +0200

    Update versions

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit 9919f004633ace0f88bb9cb721a6373416f28781
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Thu Oct 1 12:45:25 2020 +0200

    Convert docs theme to module

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit 3a1e013338b6671eb601df3f5cbdd89d770ca171
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Thu Oct 1 12:27:31 2020 +0200

    Remove subtree again, not working

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit 282ca2d870eb656ba15e20ade3d8aec6493523fe
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Thu Oct 1 12:20:11 2020 +0200

    Add versions to config

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit 73f03d34990b740d5bb1c57df1c5fbb328d0f701
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Wed Sep 30 18:29:35 2020 +0200

    Consolidate all commits

commit f2ebf5c
Author: teddywahle <69990143+teddywahle@users.noreply.github.com>
Date:   Mon Sep 21 12:33:37 2020 -0700

    [query] Implemented the Graphite `integralByInterval` function  (#2596)

commit a66fb7d
Author: arnikola <artem@chronosphere.io>
Date:   Mon Sep 21 14:33:57 2020 -0400

    [dbnode] Tile iterators for wide aggregations (#2646)

commit 9ea5682
Author: teddywahle <69990143+teddywahle@users.noreply.github.com>
Date:   Sun Sep 20 21:50:31 2020 -0700

    [query] Implemented the Graphite `divideSeriesLists` function (#2585)

commit 35cac59
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Mon Sep 21 00:21:58 2020 -0400

    [coordinator] Update OpenAPI specs for namespace update endpoint (#2629)

commit ef83ec4
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Sun Sep 20 21:57:26 2020 -0400

    [changelog] Add changelog for 0.15.15 (#2649)

commit 091f833
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Fri Sep 18 11:30:57 2020 -0400

    [coordinator] Allow configuration of tag validation (#2647)

commit 3476b4e
Author: Gediminas Guoba <gediminas@chronosphere.io>
Date:   Fri Sep 18 17:24:09 2020 +0300

    [dbnode] Streaming writer (#2618)

    * [dbnode] Large tiles writer

    * minor refactorings

    * minor refactoring

    * Skip tagsEncoder, use encodedTags directly

    * Rename LargeTilesWriter to StreamingWriter

    * Add FIXME wrt stegment.Tail.Finalize

    * Address PR feedback

    Co-authored-by: Linas Medziunas <linas.medziunas@gmail.com>
    Co-authored-by: Linas Medžiūnas <linasm@users.noreply.github.com>

commit 88164cf
Author: Ryan Hall <ryanhall07@gmail.com>
Date:   Thu Sep 17 16:38:04 2020 -0700

    Only read the commit log once during bootstrapping (#2645)

    * Only read the commit log once during bootstrapping

    A recent refactoring of cold writes ( #2508) introduced a regression that increases the
    chances the commit log is read twice while bootstrapping. The referenced PR changed the commitlog
    bootstrapper to read all requested time ranges, even if a range had been fulfilled by a previous
    bootstrapper. This was necessary since the commitlog may have cold writes that were never commmited
    to a fileset. The fileystem bootstrapper would report a time range as fulfilled, but might be missing
    cold writes only in the commit log.

    It should be noted this bug was always theoretically possible, but unlikely since the commitlog bootstrapper
    typically wouldn't run in the first pass (cold time ranges) since the filesystem would fulfill all cold ranges
    and short circuit the first pass of the boostrapper.

    This change only reads the commit log on the first pass of the boostrapper and caches the result to skip reading
    it in subsequent passes. It doesn't actually matter which pass we read the commit log, the first was just chosen
    arbitrarily.

    Other attempts at fixing this bug attempted to disable the entire commit log bootstrapper during a pass, but that's
    not possible since the commit log bootstrapper is actually 2 bootstrappers in one, both the the commit log and snapshot
    files. To minimize the refactoring changes we still want to only read the snapshot files of the requested ranges.

commit 3d2915f
Author: nate <nbroyles@gmail.com>
Date:   Thu Sep 17 13:03:48 2020 -0400

    [dbnode] Make caching after block retrieval a configuration option (#2613)

commit 0ef7aba
Author: nate <nbroyles@gmail.com>
Date:   Thu Sep 17 12:33:53 2020 -0400

    [docs] Add documentation on fileset migrations (#2630)

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants