This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

pushsync, localstore: decouple push/pull indexes for tag increment #1915

Merged
merged 13 commits into from
Nov 14, 2019

Conversation

acud
Member

@acud acud commented Oct 29, 2019

This PR aims to decrease the complexity that was added in #1828 by adding a Tag field to the localstore pullIndex. This decouples the tag increment logic from the pushIndex and, in addition, gives us the freedom to choose whether to insert into the pushIndex at all (this should not be done in all cases, for example when an anonymous upload is made).

As part of the changes needed, a Tag field has been appended to the pullIndex value definition. Due to the nature of the new tag increment logic, this field is backwards compatible, so there is no need for a database migration apart from renaming the pullIndex using the new shed functionality of RenameIndex.
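
A minimal sketch of how an appended field can stay backwards compatible: the decoder accepts both the old value length and the new one, and treats a missing tag as zero. The 32-byte address length, the 4-byte tag, and the names encodeValue and decodeValue are assumptions for illustration, not the actual localstore encoding.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const hashLen = 32 // assumed chunk address length

// encodeValue stores the address and appends a 4-byte tag only when one is set.
func encodeValue(addr []byte, tag uint32) []byte {
	value := make([]byte, hashLen, hashLen+4)
	copy(value, addr)
	if tag != 0 {
		var buf [4]byte
		binary.BigEndian.PutUint32(buf[:], tag)
		value = append(value, buf[:]...)
	}
	return value
}

// decodeValue accepts the legacy layout (address only) and the extended
// layout (address followed by a tag), so old entries need no rewriting.
func decodeValue(value []byte) (addr []byte, tag uint32, err error) {
	switch len(value) {
	case hashLen:
		return value, 0, nil // legacy entry: no tag recorded
	case hashLen + 4:
		return value[:hashLen], binary.BigEndian.Uint32(value[hashLen:]), nil
	default:
		return nil, 0, errors.New("unexpected pull index value length")
	}
}

func main() {
	legacy := make([]byte, hashLen)
	_, tag, _ := decodeValue(legacy)
	fmt.Println("legacy tag:", tag)

	_, tag, _ = decodeValue(encodeValue(legacy, 42))
	fmt.Println("extended tag:", tag)
}
```

Because entries written before the change decode to a zero tag, they simply never trigger a tag increment, which is why only the index rename is needed.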

@acud acud added the cleanup code completion, add comments and more label Oct 29, 2019
@acud acud added this to the 0.5.3 milestone Oct 29, 2019
@acud acud self-assigned this Oct 29, 2019
@acud acud added this to Backlog in Swarm Core - Sprint planning via automation Oct 29, 2019
@acud acud requested review from janos and zelig October 29, 2019 11:20
@acud acud moved this from Backlog to In progress in Swarm Core - Sprint planning Oct 29, 2019
@acud acud force-pushed the pushpullidx branch 3 times, most recently from a1c61b7 to 45c72e1 Compare November 1, 2019 08:46
@acud acud changed the base branch from master to compare-pushpullpr November 1, 2019 11:09
@acud acud changed the base branch from compare-pushpullpr to master November 1, 2019 11:46
@acud acud changed the title localstore: add tags to pullsync index and mend tag increment logic chunk, localstore: decouple push and pull indexes, add tag checkpoint persistence Nov 1, 2019
@acud acud changed the title chunk, localstore: decouple push and pull indexes, add tag checkpoint persistence chunk, localstore: decouple push/pull indexes, add tag checkpoint persistence Nov 1, 2019
@acud
Member Author

acud commented Nov 4, 2019

There is a data race in a stream test which is showing up here. I will fix this, but I think it is orthogonal to this PR and that this can still be reviewed.

@acud acud moved this from In progress to In review (includes Documentation) in Swarm Core - Sprint planning Nov 4, 2019
Member

@janos janos left a comment

Only a few minor comments. I am not approving since it is still marked as in progress.

storage/localstore/migration_test.go Outdated Show resolved Hide resolved
storage/localstore/migration_test.go Outdated Show resolved Hide resolved
storage/localstore/migration.go Outdated Show resolved Hide resolved
Member

@zelig zelig left a comment

This PR addresses 1) persistence and 2) changing indexes.
As for 1), I believe we may need more careful checkpoints, as per the roundtable discussion, no?
As for 2), the synced state should not increment for pull sync, only the sent state.

chunk/tags.go Outdated Show resolved Hide resolved
storage/localstore/mode_set.go Outdated Show resolved Hide resolved
swarm.go Outdated Show resolved Hide resolved
storage/localstore/mode_set.go Show resolved Hide resolved
@acud
Member Author

acud commented Nov 5, 2019

> This PR addresses 1) persistence and 2) changing indexes.
> As for 1), I believe we may need more careful checkpoints, as per the roundtable discussion, no?

TL;DR: our shutdown sequence does not guarantee that all goroutines related to push and pull sync have stopped before we persist tags. This has to be fixed before we implement a dirty flag.

Yes, but as per my discussion with @janos, it is not possible to implement the dirty flag right now, because we have no guarantee that once the dirty flag is unset (on persist), no other goroutine will do a tag increment. For example, on pushsync shutdown there is currently no guarantee that additional goroutines will not perform further Set operations or tag increments. This would very probably leave the tags constantly dirty on shutdown, so that on startup they would always be ignored and a new tag object created instead.

> As for 2), the synced state should not increment for pull sync, only the sent state.

ok
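
The shutdown-ordering concern in point 1 can be sketched as follows: a dirty flag is only meaningful if every goroutine that may increment a tag has returned before the tags are persisted. The tagState type and runAndShutdown function are hypothetical stand-ins, not Swarm code; the sketch uses a sync.WaitGroup as one possible way to obtain that guarantee.

```go
package main

import (
	"fmt"
	"sync"
)

// tagState is a hypothetical stand-in for the in-memory tags.
type tagState struct {
	mu    sync.Mutex
	count int
	dirty bool
}

// inc records one tag increment and marks the state as dirty.
func (t *tagState) inc() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.count++
	t.dirty = true
}

// persist returns the value that would be written to disk and clears the flag.
func (t *tagState) persist() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.dirty = false
	return t.count
}

// runAndShutdown starts workers that each perform n increments, waits for
// all of them to return, and only then persists.
func runAndShutdown(workers, n int) (persisted, final int, dirty bool) {
	t := &tagState{}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				t.inc()
			}
		}()
	}
	wg.Wait() // no goroutine can touch the tags after this point
	persisted = t.persist()

	t.mu.Lock()
	defer t.mu.Unlock()
	return persisted, t.count, t.dirty
}

func main() {
	persisted, final, dirty := runAndShutdown(4, 100)
	fmt.Println(persisted, final, dirty)
}
```

Without the wait, an increment could land after persist, leaving the flag set on every shutdown, which is the situation described above.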

@acud acud requested a review from zelig November 8, 2019 16:27
@janos
Member

janos commented Nov 11, 2019

The code cannot be compiled:

storage/localstore/migration.go:117:23: not enough arguments in call to db.pushIndex.Iterate
	have (func(shed.Item) (bool, error))
	want (shed.IndexIterFunc, *shed.IterateOptions)
storage/localstore/migration.go:120:5: continue is not in a loop
storage/localstore/migration.go:128:3: missing return at end of function
storage/localstore/migration.go:134:4: too many arguments to return
	have (nil, error)
	want (error)

storage/localstore/localstore.go Outdated Show resolved Hide resolved
storage/localstore/migration.go Outdated Show resolved Hide resolved
storage/localstore/migration.go Outdated Show resolved Hide resolved
Comment on lines 39 to 45
for i := 0; i < len(migrations)-1; i++ {
	err := migrations[i].migrationFunc(db)
	if err != nil {
		return err
	}
	if i != len(migrations)-1 {
		err = db.schemaName.Put(migrations[i+1].name) // put the name of the next schema
Member

Why are we never running migrations for the last element? But also setting the schema of the last migration.

When any migration is done, its schema should be stored in schemaName field, so that if any intermediate migration fails, the migration can be restarted after the last successful one.

Member Author

@acud acud Nov 12, 2019

> Why are we never running migrations for the last element? But also setting the schema of the last migration.

If you look at the definition of the migration struct:

type migration struct {
	name          string             // name of the schema
	migrationFunc func(db *DB) error // the migration function that needs to be performed in order to get to the NEXT schema name
}

Maybe this is not intuitive enough. I can change it so that the migrationFunc leads to the current schema name, not the next.

> When any migration is done, its schema should be stored in schemaName field, so that if any intermediate migration fails, the migration can be restarted after the last successful one.

That is actually what happens. But yeah, I guess this code is overly convoluted. I will refactor this.
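
One way the proposed refactor could look, as a sketch only: each migration function leads to its own schema name, and the name is stored right after that step succeeds, so a failed run resumes after the last successful migration. The put callback stands in for db.schemaName.Put, and the fn field, the run helper, and the schema names a, b, c are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

type migration struct {
	name string       // schema name reached once fn succeeds
	fn   func() error // hypothetical migration step leading to name
}

// migrate runs every migration that comes after the current schema and
// records the schema name after each successful step.
func migrate(current string, migrations []migration, put func(name string)) error {
	start := -1
	for i, m := range migrations {
		if m.name == current {
			start = i
			break
		}
	}
	if start < 0 {
		return errors.New("unknown schema: " + current)
	}
	for _, m := range migrations[start+1:] {
		if err := m.fn(); err != nil {
			return fmt.Errorf("migration to %s: %w", m.name, err)
		}
		put(m.name) // checkpoint: a later failure restarts after this step
	}
	return nil
}

// run migrates from current through schemas a, b, c; the step leading to
// failAt returns an error. It reports the stored schema and the steps run.
func run(current, failAt string) (schema string, ran []string, err error) {
	schema = current
	step := func(name string) func() error {
		return func() error {
			if name == failAt {
				return errors.New("boom")
			}
			ran = append(ran, name)
			return nil
		}
	}
	migrations := []migration{
		{name: "a", fn: step("a")},
		{name: "b", fn: step("b")},
		{name: "c", fn: step("c")},
	}
	err = migrate(current, migrations, func(name string) { schema = name })
	return schema, ran, err
}

func main() {
	fmt.Println(run("a", ""))
	fmt.Println(run("a", "c"))
}
```

With this shape the last element's function does run, and no index arithmetic such as migrations[i+1] is needed.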

storage/localstore/migration.go Outdated Show resolved Hide resolved

// allDbSchemaMigrations contains an ordered list of the database schemes, that is
// in order to run data migrations in the correct sequence
var allDbSchemaMigrations = []migration{
Member

Is the all prefix needed? There is only one migrations slice. Can it be called just migrations?

	return nil
}},
{name: DbSchemaDiwali, migrationFunc: func(db *DB) error {
	shouldNotRun = true // this should not be executed
Member

This relates to my comment on the migrate function. I do not see why the last migration should not be executed.

if err != nil {
	t.Fatal(err)
}
tag.Inc(chunk.StateStored) // so we don't get an error on tag.Status later on
Member

If this is needed in the test, should it be the responsibility of db.Put to increment StateStored if it finds the tag?

Member Author

No, StateStored is incremented in hasherstore. Remove this line and you will see the error that is triggered. It must be called in the test because of the tag.Status logic; please review that function to understand why.

Member

Thanks for the explanation.

storage/localstore/mode_set_test.go Show resolved Hide resolved
storage/localstore/mode_set.go Show resolved Hide resolved
Member

@janos janos left a comment

Thanks for addressing my comments. LGTM.

@@ -8,7 +24,7 @@ import (

// The DB schema we want to use. The actual/current DB schema might differ
// until migrations are run.
-const CurrentDbSchema = DbSchemaSanctuary
+var DbSchemaCurrent = DbSchemaDiwali
Member

I may be wrong, but I do not think that I added the schema name variables. I used DB in shed.

The convention usually cited is the one on initialisms: https://github.com/golang/go/wiki/CodeReviewComments#initialisms. I also think that Uid should be UID and so on, but I missed reviewing those pull requests.

The uppercase argument does not hold: the rule is about consistent case, so DB can be written db at the start of an unexported identifier or on its own.

In any case, we have so many inconsistencies in the code that this does not matter at all.

}
defer os.RemoveAll(tmpdir)

cdir, err := os.Getwd()
Member

This is not needed. Tests always run in the package directory, so a relative path for dir is fine.

	return err
}
defer func() {
	err = out.Close()
Member

This error will overwrite any previously returned one. It is possible that Copy or Sync returns a non-nil error and that err is then set to nil by the Close in this defer.

if _, err = io.Copy(out, in); err != nil {
	return err
}
err = out.Sync()
Member

This error is never returned.
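
The two remarks on Close and Sync can be addressed together. The following is a sketch under the assumption that the helper under review is a plain file copy; copyFile and roundTrip are illustrative names. The deferred Close only sets the named return value when no earlier error exists, and the Sync error is returned directly.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// copyFile copies src to dst. The named return value lets the deferred
// Close report its error without overwriting an earlier one.
func copyFile(src, dst string) (err error) {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer func() {
		// Keep the first error: only report Close if nothing failed before.
		if cerr := out.Close(); err == nil {
			err = cerr
		}
	}()

	if _, err = io.Copy(out, in); err != nil {
		return err
	}
	return out.Sync() // the Sync error is returned, not dropped
}

// roundTrip writes content to a temporary file, copies it, and reads the copy.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "copyfile")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "src")
	dst := filepath.Join(dir, "dst")
	if err := os.WriteFile(src, []byte(content), 0o600); err != nil {
		return "", err
	}
	if err := copyFile(src, dst); err != nil {
		return "", err
	}
	got, err := os.ReadFile(dst)
	return string(got), err
}

func main() {
	got, err := roundTrip("hello")
	fmt.Println(got, err)
}
```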

@acud acud removed the do-not-merge label Nov 14, 2019
@acud acud merged commit 3ce77cb into master Nov 14, 2019
Swarm Core - Sprint planning automation moved this from In review (includes Documentation) to Done Nov 14, 2019
@acud acud deleted the pushpullidx branch November 14, 2019 16:14
Labels
cleanup code completion, add comments and more

3 participants