Skip to content

Support WAL mode #1445

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 23 commits into from
Closed

Support WAL mode #1445

wants to merge 23 commits into from

Conversation

jarifibrahim
Copy link
Contributor

@jarifibrahim jarifibrahim commented Jul 24, 2020

Fixes DGRAPH-2177

[This PR contains 300 lines of tests. The actual code is around 250 lines]

This PR adds support for running badger in WAL mode. When WALMode option is set, badger will remove the old wal files as soon as new log files are created. This allows for faster disk space reclamation.

The maximum size of value that can be inserted into LSM tree in badger is capped at 1 MB (I don't know why we limit it to 1 MB).
So a user cannot insert a value larger than 1 MB in the WALMode.

Users can switch between the modes without any loss of data. However, they cannot run Value Log GC in WAL Mode. This can be supported but this PR doesn't support that. It can be done in a separate PR.

Problem with WAL Mode and Value Log GC

It could happen that someone inserted a 10 MB value in the normal mode and then opens badger in WAL mode. If we now try to run ValueLog GC, badger will error out half way while moving the old value since we support only 1 MB (or maxValueThreshold) length of values in WAL mode.
One way to deal with this issue would be to read the entire value log file and GC it only if all values are less than the value threshold. If there is even a single file with threshold greater than the WAL mode threshold, we will not GC that vlog file.

The approach mentioned above is not implemented in this PR. It will be done in a separate PR. This PR disables value log GC in WAL mode.

Todo

  • Ensure we run in LSM only mode when WALMode is set. (we're setting the max value threshold)
  • Add tests [we don't want to accidentally delete value vlog files]
  • Disable Value Log GC in WALMode.

This change is Reviewable

@jarifibrahim jarifibrahim changed the title [Experiment] Support using vlog as only WAL Support using vlog as only WAL Jul 31, 2020
@jarifibrahim jarifibrahim marked this pull request as ready for review July 31, 2020 13:57
@jarifibrahim jarifibrahim changed the title Support using vlog as only WAL Support WAL mode Jul 31, 2020
})
}

func TestWALBigValue(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test with encryption -> switch between modes (wal and vlog)

Copy link

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: minor comments.

Reviewable status: 0 of 8 files reviewed, 7 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, @manishrjain, @parasssh, and @vvbalaji-dgraph)


db.go, line 734 at r1 (raw file):

		} else {
			// In WALMode, we don't use the values stored in the wal files.
			y.AssertTrue(!db.opt.WALMode)

Error out instead of crash.


stream_writer.go, line 329 at r1 (raw file):

				}
			} else {
				y.AssertTrue(!w.db.opt.WALMode)

error or silently skip this entry instead of crashing.


value.go, line 912 at r1 (raw file):

			suffix = walSuffix
		default:
			// This neither a vlog file nor a wal file, ignore it.

nit: "is"


value.go, line 1160 at r1 (raw file):

		flushChan: make(chan map[uint32]int64, 16),
	}
	vlog.wc = &walCleaner{

We dont need to instantiate this if we are not in the WAL mode. Its confusing otherwise.


value.go, line 1280 at r1 (raw file):

	vlog.db.vhead = valuePointer{Fid: vlog.maxFid, Offset: uint32(lastOffset)}

	vlog.wc.dropBefore(vlog.maxFid)

do this only in WAL mode


value.go, line 2161 at r1 (raw file):

				// Delete only WAL files. Vlog files will be deleted by the vlog GC.
				if lf.fileType == vlogFile {

I'd prefer doing the other way.
If filetype == WAL {
... // logic below
}

// all other file types are skipped.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 8 files reviewed, 14 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, @manishrjain, and @vvbalaji-dgraph)


db.go, line 240 at r1 (raw file):

	if opt.WALMode {
		opt.ValueThreshold = maxValueThreshold

Did you figure out if we add a value bigger than 1MB, what happens?


db.go, line 812 at r1 (raw file):

		// Batch set API bypasses the txn.modify function so recheck the length here.
		if db.opt.WALMode && len(e.Value) > db.opt.ValueThreshold {
			// Cannot insert a value that's bigger than the value threshold in

Is there a reason why you can't add a value bigger than ValueThreshold?


db.go, line 1220 at r1 (raw file):

	// than maxValueThreshold, we should not GC that file.
	if db.opt.WALMode {
		return errors.New("Cannot run value GC in WAL mode")

Why not?


txn.go, line 386 at r1 (raw file):

		// All values are written in the LSM tree in this mode. If the len goes
		// beyond value threshold we will end up writing to vlog and this will
		// cause a crash in badger.

Why would this crash happen? Why can't I write huge values to value log, but all of them in WAL?


value.go, line 2141 at r1 (raw file):

			// Set wc to nil so that we don't push more file IDs. DropBefore
			// will ignore fids if wc is nil.
			wc = nil

if this is nil, then wc.delChan would be inaccessible and dropBefore would crash. Why do you need to close this channel or set this to nil?


value.go, line 2158 at r1 (raw file):

				vlog.filesLock.Lock()
				lf, ok := vlog.filesMap[lfid]
				y.AssertTrue(ok)

All AssertTrues should become panics at this point. Then they can be caught and reported by Sentry.


value.go, line 2171 at r1 (raw file):

				if err := vlog.deleteLogFile(lf); err != nil {
					vlog.db.opt.Logger.Errorf("Failed to delete wal %s, err:%s", lf.fd.Name(), err)

vertical space

Copy link
Contributor Author

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 8 files reviewed, 13 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, @manishrjain, @parasssh, and @vvbalaji-dgraph)


db.go, line 240 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Did you figure out if we add a value bigger than 1MB, what happens?

Yes, we can insert value as big as the memtable. If it cannot fit into the memtable, we cannot insert it in WAL mode (because we don't have any value pointers).


db.go, line 734 at r1 (raw file):

Previously, parasssh wrote…

Error out instead of crash.

Done


db.go, line 812 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Is there a reason why you can't add a value bigger than ValueThreshold?

Because we don't have the vlog files in the WAL mode. The WAL files are plain write-ahead-log files. If we were to allow values greater than the value threshold, then we'll have to keep both, vlog and WAL files.


db.go, line 1220 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Why not?

Copied over from the PR Description.

It could happen that someone inserted a 10 MB value in the normal mode and then opens badger in WAL mode. If we now try to run ValueLog GC, badger will error out half way while moving the old value since we support only 1 MB (or maxValueThreshold) length of values in WAL mode.
One way to deal with this issue would be to read the entire value log file and GC it only if all values are less than the value threshold. If there is even a single file with threshold greater than the WAL mode threshold, we will not GC that vlog file (or we still GC it but create a new vlog file in the WAL mode)


stream_writer.go, line 329 at r1 (raw file):

Previously, parasssh wrote…

error or silently skip this entry instead of crashing.

Isn't it better to crash instead of skipping entries? Let's say we're writing some index from dgraph and badger skips one key silently. If this happens, dgraph will return incorrect results.


value.go, line 912 at r1 (raw file):

Previously, parasssh wrote…

nit: "is"

Done


value.go, line 1160 at r1 (raw file):

Previously, parasssh wrote…

We dont need to instantiate this if we are not in the WAL mode. Its confusing otherwise.

This is instantiated so that we can clean up the old WAL files.
Let's say you were in WAL mode and you have some WAL files. This initialization will ensure we clean up the old WAL files irrespective of which mode we're in.


value.go, line 1280 at r1 (raw file):

Previously, parasssh wrote…

do this only in WAL mode

The wc is set to nil so once the WAL files are cleared in the normal mode, this call is a no-op.


value.go, line 2141 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

if this is nil, then wc.delChan would be inaccessible and dropBefore would crash. Why do you need to close this channel or set this to nil?

dropBefore has a if nil check so it wouldn't crash. This allows me to reduce the number of ifs I have to add but I guess it looks counter-intuitive. I'll change this and make it obvious.


value.go, line 2158 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

All AssertTrues should become panics at this point. Then they can be caught and reported by Sentry.

I think sentry catches log.Fatal as well. @parasssh does sentry catch log.Fatals?


value.go, line 2161 at r1 (raw file):

Previously, parasssh wrote…

I'd prefer doing the other way.
If filetype == WAL {
... // logic below
}

// all other file types are skipped.

Done.


value.go, line 2171 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

vertical space

Done.

@stale
Copy link

stale bot commented Sep 17, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label Sep 17, 2020
@NamanJain8 NamanJain8 removed the status/stale The issue hasn't had activity for a while and it's marked for closing. label Sep 18, 2020
@stale
Copy link

stale bot commented Oct 18, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label Oct 18, 2020
@jarifibrahim
Copy link
Contributor Author

We don't need this PR anymore. Closing it.

@jarifibrahim jarifibrahim deleted the ibrahim/only-wal branch October 18, 2020 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/stale The issue hasn't had activity for a while and it's marked for closing.
Development

Successfully merging this pull request may close these issues.

4 participants