Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Encoder to use new CBOR library Streaming API (Phase 1) #796

Closed
turbolent opened this issue Apr 13, 2021 · 4 comments · Fixed by #830
Closed

Change Encoder to use new CBOR library Streaming API (Phase 1) #796

turbolent opened this issue Apr 13, 2021 · 4 comments · Fixed by #830
Assignees

Comments

@turbolent
Copy link
Member

Issue To Be Solved

The Cadence storage layer can be optimized to encode faster by using a new streaming encoding API of and fxamacker/cbor, being introduced in #795.

Suggestion

By using streaming mode for encoding, we can eliminate intermediate objects being created when encoding large composites and dictionaries. Other worthwhile optimizations are also possible by providing a low-level API in fxamacker/cbor.

Preliminary benchmark comparisons show performance gains are possible by encoding to CBOR in streaming mode. Adding streaming mode to fxamacker/cbor was one of the ideas proposed by @dete and @turbolent during our first chat.

The effort required to add streaming mode has been simplified by replacing all CBORMap with CBORArray in the various PRs for issue #738. We don't need to deal with maps. 🎉

Proposed Changes

Modify the Cadence storage layer to use the new streaming encoding API, specifically Phase 1.

💡 Phase 1 provides a new API that helps eliminate the creation of intermediate objects when encoding large arrays, composites, and dictionaries. Tests using unoptimized new API show encoding is 12% faster for a 776,155 byte 'Value'. Memory usage improved more than speed.

Caveats

The streaming encoding API will not perform validation, so keep the validation step that is already performed currently after encoding.

@turbolent turbolent self-assigned this Apr 13, 2021
@fxamacker
Copy link
Member

@turbolent here's the proof-of-concept code I used.

Most notable changes are in encodeArray, encodeCompositeValue, and encodeDictionaryValue. These used to be called prepareArray, etc.

And these two functions were combined:

  • func (e *Encoder) Encode(v Value, path []string, deferrals *EncodingDeferrals) error
  • func (e *Encoder) prepare(v Value, path []string, deferrals *EncodingDeferrals) (interface{}, error)
git diff encode.go
diff --git a/runtime/interpreter/encode.go b/runtime/interpreter/encode.go
index e28b4f9e..3dcecddf 100644
--- a/runtime/interpreter/encode.go
+++ b/runtime/interpreter/encode.go
@@ -199,7 +199,7 @@ type EncodingPrepareCallback func(value Value, path []string)
 // Encoder converts Values into CBOR-encoded bytes.
 //
 type Encoder struct {
-	enc             *cbor.Encoder
+	enc             *cbor.StreamEncoder
 	deferred        bool
 	prepareCallback EncodingPrepareCallback
 }
@@ -235,6 +235,12 @@ func EncodeValue(value Value, path []string, deferred bool, prepareCallback Enco
 		return nil, nil, err
 	}
 
+	// Write streamed data to writer.
+	err = enc.enc.Flush()
+	if err != nil {
+		return nil, nil, err
+	}
+
 	data := w.Bytes()
 	err = decMode.Valid(data)
 	if err != nil {
@@ -261,7 +267,7 @@ var encMode = func() cbor.EncMode {
 // to the given io.Writer.
 //
 func NewEncoder(w io.Writer, deferred bool, prepareCallback EncodingPrepareCallback) (*Encoder, error) {
-	enc := encMode.NewEncoder(w)
+	enc := encMode.NewStreamEncoder(w)
 	return &Encoder{
 		enc:             enc,
 		deferred:        deferred,
@@ -280,161 +286,151 @@ func (e *Encoder) Encode(
 	path []string,
 	deferrals *EncodingDeferrals,
 ) error {
-	prepared, err := e.prepare(v, path, deferrals)
-	if err != nil {
-		return err
-	}
-
-	return e.enc.Encode(prepared)
-}
-
-// prepare traverses the object graph of the provided value and returns
-// the representation for the value that can be marshalled to CBOR.
-//
-func (e *Encoder) prepare(
-	v Value,
-	path []string,
-	deferrals *EncodingDeferrals,
-) (
-	interface{},
-	error,
-) {
 	if e.prepareCallback != nil {
 		e.prepareCallback(v, path)
 	}
 
+	var prepared interface{}
+	var err error
+
 	switch v := v.(type) {
 
 	case NilValue:
-		return e.prepareNil(), nil
+		prepared = e.prepareNil()
 
 	case VoidValue:
-		return e.prepareVoid(), nil
+		prepared = e.prepareVoid()
 
 	case BoolValue:
-		return e.prepareBool(v), nil
+		prepared = e.prepareBool(v)
 
 	case AddressValue:
-		return e.prepareAddressValue(v), nil
+		prepared = e.prepareAddressValue(v)
 
 	// Int*
 
 	case IntValue:
-		return e.prepareInt(v), nil
+		prepared = e.prepareInt(v)
 
 	case Int8Value:
-		return e.prepareInt8(v), nil
+		prepared = e.prepareInt8(v)
 
 	case Int16Value:
-		return e.prepareInt16(v), nil
+		prepared = e.prepareInt16(v)
 
 	case Int32Value:
-		return e.prepareInt32(v), nil
+		prepared = e.prepareInt32(v)
 
 	case Int64Value:
-		return e.prepareInt64(v), nil
+		prepared = e.prepareInt64(v)
 
 	case Int128Value:
-		return e.prepareInt128(v), nil
+		prepared = e.prepareInt128(v)
 
 	case Int256Value:
-		return e.prepareInt256(v), nil
+		prepared = e.prepareInt256(v)
 
 	// UInt*
 
 	case UIntValue:
-		return e.prepareUInt(v), nil
+		prepared = e.prepareUInt(v)
 
 	case UInt8Value:
-		return e.prepareUInt8(v), nil
+		prepared = e.prepareUInt8(v)
 
 	case UInt16Value:
-		return e.prepareUInt16(v), nil
+		prepared = e.prepareUInt16(v)
 
 	case UInt32Value:
-		return e.prepareUInt32(v), nil
+		prepared = e.prepareUInt32(v)
 
 	case UInt64Value:
-		return e.prepareUInt64(v), nil
+		prepared = e.prepareUInt64(v)
 
 	case UInt128Value:
-		return e.prepareUInt128(v), nil
+		prepared = e.prepareUInt128(v)
 
 	case UInt256Value:
-		return e.prepareUInt256(v), nil
+		prepared = e.prepareUInt256(v)
 
 	// Word*
 
 	case Word8Value:
-		return e.prepareWord8(v), nil
+		prepared = e.prepareWord8(v)
 
 	case Word16Value:
-		return e.prepareWord16(v), nil
+		prepared = e.prepareWord16(v)
 
 	case Word32Value:
-		return e.prepareWord32(v), nil
+		prepared = e.prepareWord32(v)
 
 	case Word64Value:
-		return e.prepareWord64(v), nil
+		prepared = e.prepareWord64(v)
 
 	// Fix*
 
 	case Fix64Value:
-		return e.prepareFix64(v), nil
+		prepared = e.prepareFix64(v)
 
 	// UFix*
 
 	case UFix64Value:
-		return e.prepareUFix64(v), nil
+		prepared = e.prepareUFix64(v)
 
 	// String
 
 	case *StringValue:
-		return e.prepareString(v), nil
+		prepared = e.prepareString(v)
 
 	// Collections
 
 	case *ArrayValue:
-		return e.prepareArray(v, path, deferrals)
+		return e.encodeArray(v, path, deferrals)
 
 	case *DictionaryValue:
-		return e.prepareDictionaryValue(v, path, deferrals)
+		return e.encodeDictionaryValue(v, path, deferrals)
 
 	// Composites
 
 	case *CompositeValue:
-		return e.prepareCompositeValue(v, path, deferrals)
+		return e.encodeCompositeValue(v, path, deferrals)
 
 	// Some
 
 	case *SomeValue:
-		return e.prepareSomeValue(v, path, deferrals)
+		return e.encodeSomeValue(v, path, deferrals)
 
 	// Storage
 
 	case *StorageReferenceValue:
-		return e.prepareStorageReferenceValue(v), nil
+		prepared = e.prepareStorageReferenceValue(v)
 
 	case PathValue:
-		return e.preparePathValue(v), nil
+		prepared = e.preparePathValue(v)
 
 	case CapabilityValue:
-		return e.prepareCapabilityValue(v)
+		prepared, err = e.prepareCapabilityValue(v)
 
 	case LinkValue:
-		return e.prepareLinkValue(v)
+		prepared, err = e.prepareLinkValue(v)
 
 	// Type
 
 	case TypeValue:
-		return e.prepareTypeValue(v)
+		prepared, err = e.prepareTypeValue(v)
 
 	default:
-		return nil, EncodingUnsupportedValueError{
+		return EncodingUnsupportedValueError{
 			Path:  path,
 			Value: v,
 		}
 	}
+
+	// Encode prepared value.
+	if err != nil {
+		return err
+	}
+	return e.enc.Encode(prepared)
 }
 
 func (e *Encoder) prepareNil() interface{} {
@@ -636,26 +632,22 @@ func joinPathElements(elements ...string) string {
 	return strings.Join(elements, pathSeparator)
 }
 
-func (e *Encoder) prepareArray(
+func (e *Encoder) encodeArray(
 	v *ArrayValue,
 	path []string,
 	deferrals *EncodingDeferrals,
-) (
-	[]interface{},
-	error,
-) {
-	result := make([]interface{}, len(v.Values))
+) error {
+	e.enc.EncodeArrayHead(uint64(len(v.Values)))
 
 	for i, value := range v.Values {
 		valuePath := append(path[:], strconv.Itoa(i))
-		prepared, err := e.prepare(value, valuePath, deferrals)
+		err := e.Encode(value, valuePath, deferrals)
 		if err != nil {
-			return nil, err
+			return err
 		}
-		result[i] = prepared
 	}
 
-	return result, nil
+	return nil
 }
 
 // NOTE: NEVER change, only add/increment; ensure uint64
@@ -673,19 +665,31 @@ const (
 const dictionaryKeyPathPrefix = "k"
 const dictionaryValuePathPrefix = "v"
 
-func (e *Encoder) prepareDictionaryValue(
+// encodeDictionaryValue encodes DictionaryValue as
+// cbor.Tag{
+//			Number: cborTagDictionaryValue,
+//			Content: cborArray{
+//				encodedDictionaryValueKeysFieldKey:    keys,
+//				encodedDictionaryValueEntriesFieldKey: entries,
+//			},
+//		}
+func (e *Encoder) encodeDictionaryValue(
 	v *DictionaryValue,
 	path []string,
 	deferrals *EncodingDeferrals,
-) (
-	interface{},
-	error,
-) {
+) error {
+	// Encode CBOR tag head
+	e.enc.EncodeTagHead(cborTagDictionaryValue)
+
+	// Encode CBOR array head with 2 elements
+	e.enc.EncodeArrayHead(uint64(encodedDictionaryValueLength))
+
+	// Encode array of keys as 1st element
 	keysPath := append(path[:], dictionaryKeyPathPrefix)
 
-	keys, err := e.prepareArray(v.Keys, keysPath, deferrals)
+	err := e.encodeArray(v.Keys, keysPath, deferrals)
 	if err != nil {
-		return nil, err
+		return err
 	}
 
 	// Deferring the encoding of values is only supported if all
@@ -715,9 +719,11 @@ func (e *Encoder) prepareDictionaryValue(
 
 	// entries is a CBOR array (not CBOR map) to improve speed and
 	// preserve ordering.
-	entries := make(cborArray, entriesLength)
 
-	for i, keyValue := range v.Keys.Values {
+	// Encode array of entries as 2nd element
+	e.enc.EncodeArrayHead(uint64(entriesLength))
+
+	for _, keyValue := range v.Keys.Values {
 		key := dictionaryKey(keyValue)
 		entryValue, _ := v.Entries.Get(key)
 		valuePath := append(path[:], dictionaryValuePathPrefix, key)
@@ -764,22 +770,14 @@ func (e *Encoder) prepareDictionaryValue(
 				}
 			}
 		} else {
-			var prepared interface{}
-			prepared, err = e.prepare(entryValue, valuePath, deferrals)
+			err = e.Encode(entryValue, valuePath, deferrals)
 			if err != nil {
-				return nil, err
+				return err
 			}
-			entries[i] = prepared
 		}
 	}
 
-	return cbor.Tag{
-		Number: cborTagDictionaryValue,
-		Content: cborArray{
-			encodedDictionaryValueKeysFieldKey:    keys,
-			encodedDictionaryValueEntriesFieldKey: entries,
-		},
-	}, nil
+	return nil
 }
 
 // NOTE: NEVER change, only add/increment; ensure uint64
@@ -797,65 +795,76 @@ const (
 	encodedCompositeValueLength int = 5
 )
 
-func (e *Encoder) prepareCompositeValue(
+// encodeCompositeValue encodes CompositeValue as
+//	cbor.Tag{
+//		Number: cborTagCompositeValue,
+//		Content: cborArray{
+//			encodedCompositeValueLocationFieldKey:            location,
+//			encodedCompositeValueTypeIDFieldKey:              nil,
+//			encodedCompositeValueKindFieldKey:                uint(v.Kind),
+//			encodedCompositeValueFieldsFieldKey:              fields,
+//			encodedCompositeValueQualifiedIdentifierFieldKey: v.QualifiedIdentifier,
+//		},
+//	}
+func (e *Encoder) encodeCompositeValue(
 	v *CompositeValue,
 	path []string,
 	deferrals *EncodingDeferrals,
-) (
-	interface{},
-	error,
-) {
-	fields := make(cborArray, v.Fields.Len()*2)
+) error {
+	// Encode CBOR tag head
+	e.enc.EncodeTagHead(cborTagCompositeValue)
+
+	// Encode CBOR array head with 5 elements
+	e.enc.EncodeArrayHead(uint64(encodedCompositeValueLength))
+
+	//Encode location as 1st element
+	location, err := e.prepareLocation(v.Location)
+	if err != nil {
+		return err
+	}
+	e.enc.Encode(location)
+
+	// Element 2 is obsolete
+	e.enc.Encode(nil)
+
+	// Encode kind as 3rd element
+	e.enc.Encode(uint(v.Kind))
+
+	// Encode fields array as 4th element
+	e.enc.EncodeArrayHead(uint64(v.Fields.Len() * 2))
 
-	i := 0
 	for pair := v.Fields.Oldest(); pair != nil; pair = pair.Next() {
 		fieldName := pair.Key
+
+		err := e.enc.Encode(fieldName)
+		if err != nil {
+			return err
+		}
+
 		value := pair.Value
 
 		valuePath := append(path[:], fieldName)
 
-		prepared, err := e.prepare(value, valuePath, deferrals)
+		err = e.Encode(value, valuePath, deferrals)
 		if err != nil {
-			return nil, err
+			return err
 		}
-
-		fields[i], fields[i+1] = fieldName, prepared
-		i += 2
 	}
 
-	location, err := e.prepareLocation(v.Location)
-	if err != nil {
-		return nil, err
-	}
+	// Encode qualified identifier as 5th element
+	e.enc.Encode(v.QualifiedIdentifier)
 
-	return cbor.Tag{
-		Number: cborTagCompositeValue,
-		Content: cborArray{
-			encodedCompositeValueLocationFieldKey:            location,
-			encodedCompositeValueKindFieldKey:                uint(v.Kind),
-			encodedCompositeValueFieldsFieldKey:              fields,
-			encodedCompositeValueQualifiedIdentifierFieldKey: v.QualifiedIdentifier,
-		},
-	}, nil
+	return nil
 }
 
-func (e *Encoder) prepareSomeValue(
+func (e *Encoder) encodeSomeValue(
 	v *SomeValue,
 	path []string,
 	deferrals *EncodingDeferrals,
-) (
-	interface{},
-	error,
-) {
-	prepared, err := e.prepare(v.Value, path, deferrals)
-	if err != nil {
-		return nil, err
-	}
+) error {
+	e.enc.EncodeTagHead(cborTagSomeValue)
 
-	return cbor.Tag{
-		Number:  cborTagSomeValue,
-		Content: prepared,
-	}, nil
+	return e.Encode(v.Value, path, deferrals)
 }
 
 // NOTE: NEVER change, only add/increment; ensure uint64

@turbolent
Copy link
Member Author

@fxamacker The diff look good! Great idea to document the structure that is encoded, as it is now not as obvious anymore. Let me know when a branch is ready on the CBOR library side (no rush) and I'll make these changes here on the Cadence

@fxamacker
Copy link
Member

@turbolent thanks! Updating/debugging fuzz tests for streaming encoder today took longer than expected. Round-trip fuzz tests on randomly generated data can get interesting if the same data can be encoded in different ways and still be valid. Fuzzing passed 500,000 execs now (~15 mins on a throttled cpu)...I'll have the branch ready either tonight or 1st thing in the morning.

@turbolent
Copy link
Member Author

Very cool! It's not urgent, please take your time 🙂

fxamacker added a commit to fxamacker/cadence that referenced this issue Apr 22, 2021
Used streaming mode for encoding to eliminate intermediate objects
being created when encoding large composites and dictionaries.

Used low-level API for encoding to reduce CBOR library's internal
calls to Go's reflect package (to boost speed).

Modify go.mod to use fxamacker/cbor branch feature/stream-mode
before it's merged to master.

Closes onflow#794
Closes onflow#796
Closes onflow#807
turbolent pushed a commit that referenced this issue Apr 26, 2021
Used streaming mode for encoding to eliminate intermediate objects
being created when encoding large composites and dictionaries.

Used low-level API for encoding to reduce CBOR library's internal
calls to Go's reflect package (to boost speed).

Modify go.mod to use fxamacker/cbor branch feature/stream-mode
before it's merged to master.

Closes #794
Closes #796
Closes #807
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants