Replace badgerhold with directly using BadgerDB v4 #376

Merged
merged 6 commits into from
Apr 1, 2024

werbenhu
Member

Refer to #375 .

@coveralls

coveralls commented Mar 22, 2024

Pull Request Test Coverage Report for Build 8466241881

Details

  • 108 of 112 (96.43%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 98.802%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| hooks/storage/badger/badger.go | 108 | 112 | 96.43% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 8465554775 | 0.1% |
| Covered Lines | 6104 |
| Relevant Lines | 6178 |

💛 - Coveralls

@werbenhu werbenhu changed the title Replace badgerhold with directly using BadgerDB v4. Replace badgerhold with directly using BadgerDB v4.(Not ready to merge) Mar 22, 2024
@werbenhu werbenhu changed the title Replace badgerhold with directly using BadgerDB v4.(Not ready to merge) Replace badgerhold with directly using BadgerDB v4 Mar 28, 2024
@werbenhu
Member Author

@mochi-co Not ready to merge yet.

@werbenhu
Member Author

I believe there are no changes needed on my end. Please review and merge. I came across some information about Badger GC here: dgraph-io/badger#767. Perhaps there's no issue with the GC.

Collaborator

@thedevop thedevop left a comment

I haven't used Badger before, so I can't provide much feedback. I did look into the compaction/GC, and that part looks good.

One piece of feedback: the current Marshal/Unmarshal uses JSON. Should we consider changing to gob?

@werbenhu
Member Author

werbenhu commented Mar 30, 2024

@thedevop If considering serialization, I don't think gob would be the best choice. Please refer to: https://github.com/alecthomas/go_serialization_benchmarks. Personally, I believe sticking to the standard library would suffice. When it comes to choosing a serialization library, different users may have different preferences, much like when we chose a logging library before.

@thedevop
Collaborator

thedevop commented Mar 31, 2024

Understood. I only mentioned it because we have binary data, and JSON is not efficient for that since it uses base64 encoding for byte slices. I normally would not recommend gob for any network-based service (it is Go-only), but the use case here may be appropriate, as it only stores and loads what was in memory for the app. BTW, gob is part of the Go standard library.

@werbenhu
Member Author

I've noticed that badgerhold is using gob. If we decide to use gob, then simply updating all UnmarshalBinary and MarshalBinary under the storage package to use gob would suffice. @thedevop I think it might be better for you to submit another PR for this change.

@thedevop
Collaborator

thedevop commented Apr 1, 2024

I can do that. Let's hear from @mochi-co first.

@werbenhu
Member Author

werbenhu commented Apr 1, 2024

@thedevop @mochi-co I've written a benchmark test case. You can refer to it and check if there's something wrong.
Just modify it and run to see the results. My results are as follows:

go version go1.22.1 windows/amd64
cpu: AMD Ryzen 5 5600H with Radeon Graphics
Benchmark_Gob_Marshal-12          149014              7534 ns/op               902.0 B/serial       4640 B/op         85 allocs/op
Benchmark_Gob_Unmarshal-12         43809             27506 ns/op               902.0 B/serial      14558 B/op        367 allocs/op
Benchmark_Json_Marshal-12        1000000              1029 ns/op               844.0 B/serial        896 B/op          1 allocs/op
Benchmark_Json_Unmarshal-12       169522              6622 ns/op               844.0 B/serial        824 B/op          9 allocs/op
package serialization

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
	"math"
	"math/rand"
	"testing"
)

// UserProperty is an arbitrary key-value pair for a packet user properties array.
type UserProperty struct { // [MQTT-1.5.7-1]
	Key string `json:"k"`
	Val string `json:"v"`
}

// Client is a storable representation of an MQTT client.
type Client struct {
	Will            ClientWill       `json:"will"`            // will topic and payload data if applicable
	Properties      ClientProperties `json:"properties"`      // the connect properties for the client
	Username        []byte           `json:"username"`        // the username of the client
	ID              string           `json:"id" storm:"id"`   // the client id / storage key
	T               string           `json:"t"`               // the data type (client)
	Remote          string           `json:"remote"`          // the remote address of the client
	Listener        string           `json:"listener"`        // the listener the client connected on
	ProtocolVersion byte             `json:"protocolVersion"` // mqtt protocol version of the client
	Clean           bool             `json:"clean"`           // if the client requested a clean start/session
}

// ClientProperties contains a limited set of the mqtt v5 properties specific to a client connection.
type ClientProperties struct {
	AuthenticationData        []byte         `json:"authenticationData"`
	User                      []UserProperty `json:"user"`
	AuthenticationMethod      string         `json:"authenticationMethod"`
	SessionExpiryInterval     uint32         `json:"sessionExpiryInterval"`
	MaximumPacketSize         uint32         `json:"maximumPacketSize"`
	ReceiveMaximum            uint16         `json:"receiveMaximum"`
	TopicAliasMaximum         uint16         `json:"topicAliasMaximum"`
	SessionExpiryIntervalFlag bool           `json:"sessionExpiryIntervalFlag"`
	RequestProblemInfo        byte           `json:"requestProblemInfo"`
	RequestProblemInfoFlag    bool           `json:"requestProblemInfoFlag"`
	RequestResponseInfo       byte           `json:"requestResponseInfo"`
}

// ClientWill contains a will message for a client, and limited mqtt v5 properties.
type ClientWill struct {
	Payload           []byte         `json:"payload"`
	User              []UserProperty `json:"user"`
	TopicName         string         `json:"topicName"`
	Flag              uint32         `json:"flag"`
	WillDelayInterval uint32         `json:"willDelayInterval"`
	Qos               byte           `json:"qos"`
	Retain            bool           `json:"retain"`
}

type Serializer interface {
	Marshal(o interface{}) ([]byte, error)
	Unmarshal(d []byte, o interface{}) error
}

func randString(l int) string {
	buf := make([]byte, l)
	for i := 0; i < (l+1)/2; i++ {
		buf[i] = byte(rand.Intn(256))
	}
	return fmt.Sprintf("%x", buf)[:l]
}

func generate() []*Client {
	a := make([]*Client, 0, 1000)
	for i := 0; i < 1000; i++ {
		a = append(a, &Client{
			ID: randString(16),
			Properties: ClientProperties{
				MaximumPacketSize: math.MaxUint32,
			},
			Will: ClientWill{
				Payload: []byte(randString(256)),
			},
		})
	}
	return a
}

func benchMarshal(b *testing.B, s Serializer) {
	b.Helper()
	data := generate()
	b.ReportAllocs()
	b.ResetTimer()
	var serialSize int
	for i := 0; i < b.N; i++ {
		o := data[rand.Intn(len(data))]
		// Named out rather than bytes to avoid shadowing the bytes package.
		out, err := s.Marshal(o)
		if err != nil {
			b.Fatalf("marshal error %s for %#v", err, o)
		}
		serialSize += len(out)
	}
	b.ReportMetric(float64(serialSize)/float64(b.N), "B/serial")
}

func benchUnmarshal(b *testing.B, s Serializer) {
	b.Helper()
	b.StopTimer()
	data := generate()
	ser := make([][]byte, len(data))
	var serialSize int
	for i, d := range data {
		o, err := s.Marshal(d)
		if err != nil {
			b.Fatal(err)
		}
		t := make([]byte, len(o))
		serialSize += copy(t, o)
		ser[i] = t
	}
	b.ReportMetric(float64(serialSize)/float64(len(data)), "B/serial")
	b.ReportAllocs()
	b.StartTimer()

	for i := 0; i < b.N; i++ {
		n := rand.Intn(len(ser))
		o := &Client{}
		err := s.Unmarshal(ser[n], o)
		if err != nil {
			b.Fatalf("unmarshal error %s for %#x / %q", err, ser[n], ser[n])
		}
		// Validate the unmarshalled data against the original record.
		want := data[n]
		if o.ID != want.ID {
			b.Fatalf("unmarshaled object differed:\n%v\n%v", want, o)
		}
	}
}

type GobSerializer struct{}

func (g *GobSerializer) Marshal(o interface{}) ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(o)
	return buf.Bytes(), err
}

func (g *GobSerializer) Unmarshal(d []byte, o interface{}) error {
	return gob.NewDecoder(bytes.NewReader(d)).Decode(o)
}

func NewGobSerializer() *GobSerializer {
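	// gob.Register is only required for concrete types carried in
	// interface-typed fields; it is harmless here but not needed to
	// encode or decode a *Client directly.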
	gob.Register(Client{})
	return &GobSerializer{}
}

func Benchmark_Gob_Marshal(b *testing.B) {
	s := NewGobSerializer()
	benchMarshal(b, s)
}

func Benchmark_Gob_Unmarshal(b *testing.B) {
	s := NewGobSerializer()
	benchUnmarshal(b, s)
}

type JsonSerializer struct{}

func (j JsonSerializer) Marshal(o interface{}) ([]byte, error) {
	return json.Marshal(o)
}

func (j JsonSerializer) Unmarshal(d []byte, o interface{}) error {
	return json.Unmarshal(d, o)
}

func Benchmark_Json_Marshal(b *testing.B) {
	benchMarshal(b, JsonSerializer{})
}

func Benchmark_Json_Unmarshal(b *testing.B) {
	benchUnmarshal(b, JsonSerializer{})
}

@mochi-co
Collaborator

mochi-co commented Apr 1, 2024

@thedevop @werbenhu gob changes sound like a nice improvement, agree it's best as a new PR.

@werbenhu This one looks good to me, so I'm merging it now 👍🏻

@mochi-co mochi-co merged commit e2cb688 into mochi-mqtt:main Apr 1, 2024
3 checks passed
@werbenhu
Member Author

werbenhu commented Apr 1, 2024

@mochi-co @thedevop What I mean is, from my test results, it seems that Gob doesn't offer better performance than JSON. I'm not sure if I'm using Gob correctly in my test cases. I hope you can also take a look at my code and run the tests to confirm the results.
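
One possible factor behind these numbers, offered as a sketch rather than a diagnosis: gob is designed for streams, and each new `Encoder` sends a description of the type before the first value, while each new `Decoder` has to process that description again. The example below uses a hypothetical, trimmed-down `Client` and an `encodedSizes` helper (not part of the benchmark) to compare the bytes written by the first and second `Encode` calls on a single encoder.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Client is a hypothetical, trimmed-down record for illustration only.
type Client struct {
	ID       string
	Username []byte
	Remote   string
}

// encodedSizes encodes c twice on one encoder and returns the number of
// bytes written by each call. The first call also carries the type description.
func encodedSizes(c Client) (first, second int, err error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err = enc.Encode(c); err != nil {
		return 0, 0, err
	}
	first = buf.Len()
	if err = enc.Encode(c); err != nil {
		return 0, 0, err
	}
	second = buf.Len() - first
	return first, second, nil
}

func main() {
	c := Client{ID: "client-1", Username: []byte("user"), Remote: "127.0.0.1:1883"}
	first, second, err := encodedSizes(c)
	if err != nil {
		panic(err)
	}
	fmt.Printf("first message: %d bytes, second message: %d bytes\n", first, second)
}
```

In a key-value store each record has to decode on its own, so creating one encoder per value, as the benchmark does, is probably the realistic pattern here, and the per-record type overhead would apply to stored data as well.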

@werbenhu werbenhu deleted the badgerdbv4 branch April 2, 2024 03:14
4 participants