Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PropertyLoadSaver Interface #69

Merged
merged 16 commits into from
May 7, 2018

Conversation

yuichi1004
Copy link
Contributor

Introduction

This PR supports PropertyLoadSaver interface.

https://cloud.google.com/appengine/docs/standard/go/datastore/reference#hdr-The_PropertyLoadSaver_Interface

This PR should resolve #53.

Strategy

  • Define serializationStatePropertyList as new cache type
    • PropertyLists are cached with serializationStatePropertyList, while others are cached with serializationStateNormal
    • This strategy simplifies the implementation than maintaining compatibility of normal cache to PropertyLoadSaver interface
    • I think most people use one of normal struct or PropertyLoadSaver interface so this works on most of the use cases
  • Support fetching different cache type if possible, fallback to datastore if not
    • This strategy supports most of the case with cache efficiency
    • This strategy avoids error even if fetching different cache type is not supported

Please take a look at the code. Any comments, improvements, proposal of the different approach are welcome.

@xStrom
Copy link
Collaborator

xStrom commented May 10, 2017

I'll find time for review later this week.

@hamakn
Copy link

hamakn commented Apr 6, 2018

How is the progress?
I want this feature.

@xStrom
Copy link
Collaborator

xStrom commented Apr 7, 2018

I apologize, this got deprioritized and somewhat forgotten.

Good to know that there's active interest! I will make an effort to get this functionality tested & merged.

@yuichi1004 are you still interested in helping make this happen? There are merge conflicts right now, I think it's these two PRs that got merged after you submitted your changes: #71 & #73.

If @yuichi1004 wants to help and fixes the merge conflicts, I will review/test it. If not, it may take a bit longer but I will do the necessary work myself.

@yuichi1004
Copy link
Contributor Author

@xStrom @hamakn let me take care of this. I will check and fix on this weekend. Sorry for my late response.

@yuichi1004
Copy link
Contributor Author

@xStrom @hamakn
Conflict resolved and test works for me.

I have one failure test but I do believe this is irrelevant to this change and the bug of App Engine dev server.

=== RUN   TestChangeMemcacheKey
INFO     2018-04-14 07:00:02,920 devappserver2.py:105] Skipping SDK update check.
WARNING  2018-04-14 07:00:02,920 devappserver2.py:121] DEFAULT_VERSION_HOSTNAME will not be set correctly with --port=0
WARNING  2018-04-14 07:00:02,974 simple_search_stub.py:1196] Could not read search indexes from /var/folders/5j/2bmf8d2d3xgb9lmttlptxvm8p49fjs/T/appengine.testapp.01099286/search_indexes
INFO     2018-04-14 07:00:02,976 api_server.py:308] Starting API server at: http://localhost:58270
INFO     2018-04-14 07:00:03,045 dispatcher.py:255] Starting module "default" running at: http://localhost:58271
INFO     2018-04-14 07:00:03,046 admin_server.py:146] Starting admin server at: http://localhost:58273
2018/04/14 16:00:03 Metadata fetch failed: Get http://metadata/computeMetadata/v1/instance/attributes/gae_backend_version: dial tcp: lookup metadata: no such host
exit status 1
FAIL    github.com/yuichi1004/goon      1.453s

@hamakn
Copy link

hamakn commented Apr 18, 2018

Using goapp test, all tests are passed on this branch in my environment.

$ git branch
  master
* topic/support_property_load_saver
$ goapp test ./...
ok      github.com/mjibson/goon 209.410s

@delphinus
Copy link

delphinus commented Apr 19, 2018

@yuichi1004

I think it should call .Load() in loading from g.cache in addition to from Memcache & Datastore. It calls .Load() only in deserializeStruct() and so it lacks the logic to load from g.cache.

https://github.com/yuichi1004/goon/blob/3d84cd76c6d15bf622305c3e8e9623603749fbb0/goon.go#L454-L458

		if s, present := g.cache[m]; present {
			if vi.Kind() == reflect.Interface {
				// Load() is needed here?
				vi = vi.Elem()
			}
			reflect.Indirect(vi).Set(reflect.Indirect(reflect.ValueOf(s)))

Is this just as you intended? It causes entities in g.cache differ from the ones in Memcache & Datastore.

@daisuzu
Copy link

daisuzu commented Apr 20, 2018

Hi, @yuichi1004

I tried this feature, but when Goon.Get() loads data from memcache, deserializeStruct() got errCacheFetchFailed.
It seems serializeStruct() may not be working properly, and each request has to load data from datastore.
Could you confirm this?

reproduction code
package goon

import (
	"testing"

	"google.golang.org/appengine/aetest"
	"google.golang.org/appengine/datastore"
	"google.golang.org/appengine/memcache"
)

type Value struct {
	ID     int64  `datastore:"-" goon:"id"`
	String string `datastore:"string"`
}

func (v *Value) Save() ([]datastore.Property, error) { return datastore.SaveStruct(v) }
func (v *Value) Load(p []datastore.Property) error   { return datastore.LoadStruct(v, p) }

func TestPLS(t *testing.T) {
	c, done, err := aetest.NewContext()
	if err != nil {
		t.Fatal(err)
	}
	defer done()
	g := FromContext(c)

	// Put
	v := Value{String: "test"}
	k, err := g.Put(&v)
	if err != nil {
		t.Fatal(err)
	}
	g.FlushLocalCache()

	// Get from datastore
	v1 := Value{ID: v.ID}
	if err := g.Get(&v1); err != nil {
		t.Fatal(err)
	}
	t.Log("from datastore:", v1)

	// Get from memcache
	s, err := memcache.Get(c, MemcacheKey(k))
	if err != nil {
		t.Fatal(err)
	}
	var v2 Value
	if err := deserializeStruct(&v2, s.Value); err != nil {
		t.Fatalf("could not deserialize `%v`: %s", s.Value, err) // err == errCacheFetchFailed
	}
	t.Log("from memcache:", v2)
}

@yuichi1004
Copy link
Contributor Author

@delphinus @daisuzu
Appreciate the feedback. Let me check that in detail over the weekend. I will keep you guys updated.

@yuichi1004
Copy link
Contributor Author

yuichi1004 commented Apr 21, 2018

@daisuzu Well, that's a silly mistake... 005ae6a
After the fix above, it looks working.

package goon

import (
	"testing"

	"google.golang.org/appengine/aetest"
	"google.golang.org/appengine/datastore"
	"google.golang.org/appengine/memcache"
)

type Value struct {
	ID     int64  `datastore:"-" goon:"id"`
	String string `datastore:"string"`
}

func (v *Value) Save() ([]datastore.Property, error) { return datastore.SaveStruct(v) }
func (v *Value) Load(p []datastore.Property) error   { return datastore.LoadStruct(v, p) }

func TestPLS(t *testing.T) {
	c, done, err := aetest.NewContext()
	if err != nil {
		t.Fatal(err)
	}
	defer done()
	g := FromContext(c)

	// Put
	v := Value{String: "test"}
	k, err := g.Put(&v)
	if err != nil {
		t.Fatal(err)
	}
	g.FlushLocalCache()

	// Get from datastore
	v1 := Value{ID: v.ID}
	if err := g.Get(&v1); err != nil {
		t.Fatal(err)
	}
	t.Log("from datastore:", v1)

	// Get from memcache
	s, err := memcache.Get(c, MemcacheKey(k))
	if err != nil {
		t.Fatal(err)
	}
	var v2 Value
	if err := deserializeStruct(&v2, s.Value); err != nil {
		t.Fatalf("could not deserialize `%v`: %s", s.Value, err) // err == errCacheFetchFailed
	}
	t.Log("from memcache (raw):", v2)

	v3 := Value{ID: v.ID}
	if err := g.Get(&v3); err != nil {
		t.Fatal(err)
	}
	t.Log("from memcache (thorugh goon):", v3)
}
--- PASS: TestPLS (3.98s)
        my_test.go:40: from datastore: {5629499534213120 test}
        my_test.go:51: from memcache (raw): {0 test}
        my_test.go:57: from memcache (thorugh goon): {5629499534213120 test}

@yuichi1004 yuichi1004 force-pushed the topic/support_property_load_saver branch from 7b65b06 to 005ae6a Compare April 21, 2018 02:43
@daisuzu
Copy link

daisuzu commented Apr 21, 2018

@yuichi1004

Ah, I see...
After 005ae6a, I confirmed it works well.
Thank you!

@yuichi1004
Copy link
Contributor Author

@delphinus I am still looking into that topic.
I think your argument is correct. I am trying to avoid using local cache if dst is PLS since current local cache does not have serialization process which makes difficult to introduce Save() and Load().

We can simplify all of those complexity if we fully migrated to PropertyList based implementation. If underlying core logic were written in PropertyList, then PLS struct and non-PLS struct can work on the same caching logic.

But for now, I will go with the easy approach since this is a bigger discussion than original topic.

@delphinus
Copy link

Agree.

If it contains PLS process for local cache, it should encode entities into cache instead of the raw structs as current logic does. It makes some performance impact, and so I also think your plan is good for this time.

@yuichi1004
Copy link
Contributor Author

I am still working on avoiding local cache for PLS struct. I got weird error related to re-using deserialization decoder.

I appreciate if anybody can provide the fix, otherwise I will work on next weekend.

entity.go Outdated
serializeStructMetaData(finalBuf[1:], smd) // Serialize the metadata
copy(finalBuf[finalBufSize-bufSize:], se.buf.Bytes()) // Copy the actual data
if ls, ok := src.(datastore.PropertyLoadSaver); ok {
se.buf.Write([]byte{serializationStatePropertyList})
Copy link
Collaborator

@xStrom xStrom Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the WriteByte method to skip the slice allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed b5dc7dc

goon_test.go Outdated
}

// Make sure that MigrationC implements datastore.PropertyLoadSaver
var _, _ datastore.PropertyLoadSaver = &MigrationPlsA{}, &MigrationPlsB{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MigrationC reference in the comment seems outdated. Should change the wording to something like: Make sure that these implement datastore.PropertyLoadSaver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed dd18dfd

goon_test.go Outdated
name string
ignoreFieldMismatch bool
src, dst MigrationEntity
}{
Copy link
Collaborator

@xStrom xStrom Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still test with IgnoreFieldMismatch on & off with all the variants, not just NormalCache -> NormalCache. Writing all of these out can get too verbose (8 cases), so we could still do a loop for i := 0; i < 2; i++ { like before and inside the loop append the 4 different types to the testcase slice, with IgnoreFieldMismatch set based on the loop variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xStrom
Copy link
Collaborator

xStrom commented Apr 23, 2018

I think it should call .Load() in loading from g.cache in addition to from Memcache & Datastore.

That doesn't sound right to me. The local cache should basically be just a pointer. [1]

Thus, with a reference-based cache, whether we load the serialized data from datastore or memcache, we'll execute the Load logic then and just keep a pointer to the already-loaded struct. Thus we don't need to call Load again when returning from local cache.


[1] There has been some previous discussion over reference-based vs. copy-based local cache in PR #27. As the years have passed I've become even more convinced that a referenced-based cache is superior. Unfortunately I haven't gotten around to writing tests for this yet, so there may be some edge cases where shallow copies are being produced unintentionally. However there definitely isn't a proper copy-based system.

@xStrom
Copy link
Collaborator

xStrom commented Apr 23, 2018

I've now gone over the changes once. I pointed out some basic issues. I also have some more fundamental thoughts on the implementation strategy. I'll need a bit more time to think this over and then I'll comment on it.

@yuichi1004
Copy link
Contributor Author

@xStrom thanks for your review and comment. For the discussion about reference-based cache over copy-based cache, I actually like reference-based cache.

My assumption was some users use Load() func as a hook the event. It does make sense to me for Save() case, for example updating timestamps on UpdatedAt fields. But for Load() case, I could not come up with the concrete situation. So may be we can skip calling Load() for local cache.

@delphinus correct me if you need Load() to be called for some specific use cases.

@delphinus
Copy link

@xStrom I understand. This behavior is correct only when Load() and Save() is pseudo-“inverse functions”, I think.

type Entity {
  ID        int64 `datastore:"-" goon:"id"`
  Foo       string
  Unix      int64
  CreatedAt time.Time `datastore:"-"`
}

func (e *Entity) Load([]datastore.Property), error) {
  _ = datastore.LoadStruct(e)
  e.CreatedAt = time.Unix(e.Unix, 0)
  return nil
}

func (e *Entity) Save() ([]datastore.Property, error) {
  e.Unix = e.CreatedAt.Unix()
  return datastore.SaveStruct(e)
}

The code above runs good with goon and the raw datastore because Load() is inverse from Save() and vice versa, and they have no side effects.

func (e *Entity) Load([]datastore.Property), error) {
  _ = datastore.LoadStruct(e)
  e.CreatedAt = time.Unix(e.Unix, 0)
  e.Foo = e.Foo + " Loaded!!" // This breaks symmetry
  return nil
}

func (e *Entity) Save() ([]datastore.Property, error) {
  e.Unix = e.CreatedAt.Unix()
  e.Foo = e.Foo + " Saved!" // This breaks symmetry
  return datastore.SaveStruct(e)
}

But how about above? They are both inverse, but have side effects. With datastore package, Get after Put shows the entity has the Foo property is Loaded! Saved!. But with goon, it shows only Saved! because Load() is not called for local cache.

test log (failed)
=== RUN   TestGetAfterPut
INFO     2018-04-25 01:05:00,749 devappserver2.py:105] Skipping SDK update check.
WARNING  2018-04-25 01:05:00,749 devappserver2.py:121] DEFAULT_VERSION_HOSTNAME will not be set correctly with --port=0
WARNING  2018-04-25 01:05:00,793 simple_search_stub.py:1196] Could not read search indexes from /var/folders/fk/xlnw71yx7yd26pqby31xgdrdtbpfqh/T/appengine
testapp.jinnouchi.yasushi/search_indexes
INFO     2018-04-25 01:05:00,796 api_server.py:265] Starting API server at: http://localhost:58832
INFO     2018-04-25 01:05:00,800 dispatcher.py:255] Starting module "default" running at: http://localhost:58833
INFO     2018-04-25 01:05:00,802 admin_server.py:152] Starting admin server at: http://localhost:58834
2018/04/25 10:05:01 DEBUG: saved e: goon.Entity{ID:5629499534213120, Foo:" Saved!", Unix:1545568496, CreatedAt:time.Time{wall:0x0, ext:63681165296, loc:(*
ime.Location)(nil)}}
2018/04/25 10:05:01 DEBUG: get entity: goon.Entity{ID:5629499534213120, Foo:" Saved!", Unix:1545568496, CreatedAt:time.Time{wall:0x0, ext:63681165296, loc
(*time.Location)(nil)}}
--- FAIL: TestGetAfterPut (3.63s)
        hoge_test.go:61:
                        Error Trace:    hoge_test.go:61
                        Error:          Not equal:
                                        expected: " Saved! Loaded!"
                                        actual  : " Saved!"
                        Test:           TestGetAfterPut
                        Messages:       Load() is called after Save()
FAIL
exit status 1
FAIL    github.com/yuichi1004/goon      3.645s

test code for this

Yes, this is not much practical. But someone may need simply hooks there (as @yuichi1004 says) and he/she might be at a loss this behavior that differs from datastore's one.

This cannot be solved easily and that “side effectable” Load()/Save() may be undesired from the beginning. So I think it is good if you add comments that warns this somewhere.

@xStrom
Copy link
Collaborator

xStrom commented Apr 25, 2018

Thanks for the test case, it illustrates an interesting case. In my previous comment I talked about how Load is always executed and the results cached. However here, indeed, you have provided a case where Load won't be executed even once.

What happens is that Put populates the local cache. What we could do is check per entity if it's PLS or not. If PLS then never populate local cache with Put. That way the first Get after Put would go to the datastore, execute Load and cache the result. Additional Get calls would use the local cache, but that cached entity would have the at-least-once Load guarantee.


On a somewhat related topic, of course there are still cases where if you Put a struct, and then Get the same entity but into a different struct or PLS, with the same goon context, it won't work as expected. The local cache does not support any migration at all currently. There are ways to support migration, for example with a PropertyList-based cache. However this has a high memory cost and a performance cost, for a feature that almost nobody actually needs. Thus I think we would need people to come forward with complaints until we start thinking about supporting local cache migrations. This should be better documented and I'll try to get that done.

@yuichi1004
Copy link
Contributor Author

yuichi1004 commented Apr 28, 2018

@xStrom @delphinus Thanks for your discussion. Let me clarify the action items

  • Load() never called in case @delphinus pointed out
    • I am working on the fix to avoid local cache for PUT if the struct is PLS
  • Several fixes @xStrom commented
    • I am working on the fix
  • I am still waiting for @xStrom 's thought about the fundamental design

@xStrom
Copy link
Collaborator

xStrom commented Apr 28, 2018

Speaking broadly, I think having a unified strategy for structs/PLS is superior. It would be fewer algorithms to reason about and probably less chance for bugs to be introduced in the future when changes are made. It would also allow us to fully support migrations between types. As opposed to the current situation where we can't load a PropertyList-based cache into a struct.

Then there's still the question of which type of unified strategy to use. Depend more on the datastore package and do everything with PropertyList or use the goon's serialization engine for everything? There are benefits on both sides. The PropertyList based solution would be a lot less code, less room for us to make mistakes. The goon serialization engine however is future-compatible [1], uses less storage and is slightly faster. [2] In addition the custom serialization would allow easier support for potential complex features like supporting maps without PLS, although admittedly there is no current work going on to do this.

Intuitively I think the correct solution is a hybrid of these two strategies, though leaning more towards goon's custom serialization. However this is a non-trivial undertaking. There's a saying that perfect is the enemy of good. I think that applies here as well.

We should continue with the current split strategy that we have in this PR. We only need some smaller changes, including a few that I'll comment on soon. Then we can get this merged as an initial working version. Later on I'll open up another PR for work towards a unified solution, which can have its own schedule and won't hold back this PR.


[1] One scenario where just encoding PropertyList with gob would fail is if Google adds some new pointer/interface to Property that can be nil. gob fails at encoding nil values and would break goon until we release a patch.

[2] Really old goon versions actually just used gob to serialize structs. Some of the benefits of moving to a custom serialization engine were reduced space usage and slight performance gains. The space usage in particular is interesting, because entities designed to neatly fit the 1MB datastore limit should also fit into the 1MB memcache limit. Just encoding Property saves field names like "NoIndex" and "Multiple", which take extra space and can be omitted in a custom solution, with just the values saved.

type nilType struct {
GoonNilValue string
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's cut down on allocations and create a reusable value of this: var nilValue = nilType{}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 0ffa7ca

entity.go Outdated
gob.Register(seBoot.v13)
gob.Register(seBoot.v14)
gob.Register(seBoot.v15)
gob.Register(nilType{})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the new nilValue instead of allocating a new one with nilType{}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 0ffa7ca

entity.go Outdated
for i := 0; i < len(props); i++ {
v := reflect.ValueOf(props[i].Value)
if v.Kind() == reflect.Ptr && v.IsNil() {
props[i].Value = nilType{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the new nilValue instead of allocating a new one with nilType{}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 0ffa7ca

entity.go Outdated
}
for i := 0; i < len(props); i++ {
nilVal := nilType{}
if props[i].Value == nilVal {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare to the new nilValue instead of the local nilVal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 0ffa7ca

@yuichi1004
Copy link
Contributor Author

@xStrom I fixed where you made comments. Please take a look.

While I was working on the test fix, I found a bug on caching slice. That's why I ended up with like
1c5058d.

I created an issue since this is not related to PLS and we would like to discuss and to fix in another PR I guess.

Issue:
#76

@xStrom
Copy link
Collaborator

xStrom commented Apr 28, 2018

Reusing dst isn't a good idea, but not because of a bug as I explained in #76. Using a factory function is a decent solution but I think having 8 different entries in testcase is even better, because then we keep all the test case configuration together in once place (in the struct).

Currently we have:

for _, tt := range testcase {
    for _, IgnoreFieldMismatch = range []bool{true, false} {

What I had in mind was something like this:

for _, ifm := range []bool{true, false} {
    testcase = append(testcase, 
        {
            name:                fmt.Sprintf("NormalCache -> NormalCache (IgnoreFieldMismatch:%v)", ifm),
            ignoreFieldMismatch: ifm,
            src:                 &MigrationA{Parent: parentKey, Id: 1},
            dst:                 &MigrationB{Parent: parentKey, Identification: 1},
        })
    // .. and the 3 other appends as well, with ignoreFieldMismatch based on loop variable ifm
}

This way we would keep the test configuration together and we would also have a fresh dst for every case, so we wouldn't run into the slice reuse issue.

@yuichi1004
Copy link
Contributor Author

@xStrom I'm sorry to took this long time. I applied your feedback.
972a214

goon_test.go Outdated
migA = &MigrationA{Parent: parentKey, Id: 1}
if err := g.Get(migA); err != nil {
t.Errorf("Unexpected error on Get: %v", err)
for _, IgnoreFieldMismatch := range []bool{true, false} {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop declares a local scope variable called IgnoreFieldMismatch which hides the global one. I think the easiest fix is to change := to =.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's a silly one...
Let me fix that quickly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by 7bb8fe0

@xStrom
Copy link
Collaborator

xStrom commented May 5, 2018

After we get this one final minor issue fixed (:= to =), I think this PR is in a good enough state to get merged. Unless you still want to add something more here.

@xStrom
Copy link
Collaborator

xStrom commented May 5, 2018

Actually one other small thing. Could you also modify the MemcacheKey function in goon.go from version g2 to g3. Just to be safe.

// MemcacheKey returns key string of Memcache.
var MemcacheKey = func(k *datastore.Key) string {
	// Versioning, so that incompatible changes to the cache system won't cause problems
	return "g3:" + k.Encode()
}

@yuichi1004
Copy link
Contributor Author

@xStrom I fixed what you pointed out.

@xStrom
Copy link
Collaborator

xStrom commented May 6, 2018

Great! I will merge this tomorrow (1 year anniversary of the PR 😄) unless someone objects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Changes by PropertyLoadSaver.Load aren't cached.
6 participants