
Fetching performance #12

Closed
derkan opened this issue Apr 30, 2018 · 5 comments

Comments

@derkan
Contributor

derkan commented Apr 30, 2018

Hi Samuel,

I did some benchmarking and profiling. Fetching spends time and memory on calls to the (smd *structMappingDetails) traverseTree method. This is critical, especially when scanning a high number of rows from results (for reporting).

The full benchmarking results are in this repo. Since godb calls traverseTree for every row fetch (single or multiple, and also after inserts and updates), godb lands in the middle of the report. I think it would get better if a caching mechanism were added instead of running reflection each time. The caching should be specific to the db session instance.

For example, the pg ORM makes use of this kind of caching.

I've checked whether some kind of caching for the column mapping is possible, but unfortunately I couldn't find a place in the code to implement it. Can you also check this?
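
For illustration, here is a minimal sketch of the kind of per-type cache I mean. The names structMapping, buildStructMapping and mappingCache are hypothetical and made up for this example; this is not godb's actual internals:

package mappingcache

import (
	"reflect"
	"sync"
)

// structMapping and buildStructMapping are simplified, hypothetical stand-ins
// for what a library computes with reflection for each struct type.
type structMapping struct {
	columns []string // column names taken from db tags
}

func buildStructMapping(t reflect.Type) *structMapping {
	sm := &structMapping{}
	for i := 0; i < t.NumField(); i++ {
		if tag := t.Field(i).Tag.Get("db"); tag != "" {
			sm.columns = append(sm.columns, tag)
		}
	}
	return sm
}

// mappingCache keeps one mapping per struct type, so the reflection walk runs
// once per type instead of once per fetched row.
type mappingCache struct {
	mu       sync.RWMutex
	mappings map[reflect.Type]*structMapping
}

func (c *mappingCache) get(t reflect.Type) *structMapping {
	c.mu.RLock()
	m, ok := c.mappings[t]
	c.mu.RUnlock()
	if ok {
		return m
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if cached, ok := c.mappings[t]; ok { // re-check after taking the write lock
		return cached
	}
	if c.mappings == nil {
		c.mappings = make(map[reflect.Type]*structMapping)
	}
	m = buildStructMapping(t)
	c.mappings[t] = m
	return m
}

Here t would typically be the reflect.Type of the fetched struct, obtained once per query rather than once per row.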

Here is the profiling output for scanning rows with LIMIT 10000:

Showing top 10 nodes out of 81
      flat  flat%   sum%        cum   cum%
  274.51MB 29.92% 29.92%   457.53MB 49.87%  github.com/samonzeweb/godb/dbreflect.(*structMappingDetails).traverseTree /data/Development/MyOpenSource/GODB/src/github.com/samonzeweb/godb/dbreflect/dbreflect.go
   95.01MB 10.36% 40.28%    95.01MB 10.36%  github.com/samonzeweb/godb/dbreflect.(*StructMapping).GetAllFieldsPointers.func1 /data/Development/MyOpenSource/GODB/src/github.com/samonzeweb/godb/dbreflect/dbreflect.go
   58.50MB  6.38% 46.66%    58.50MB  6.38%  github.com/lib/pq.textDecode /data/Development/MyOpenSource/GODB/src/github.com/lib/pq/encode.go
   50.88MB  5.55% 52.20%    50.88MB  5.55%  github.com/lib/pq.(*stmt).exec /data/Development/MyOpenSource/GODB/src/github.com/lib/pq/buf.go
      44MB  4.80% 57.00%       44MB  4.80%  reflect.unsafe_New /usr/local/src/go/src/runtime/malloc.go
      43MB  4.69% 61.69%       43MB  4.69%  reflect.(*structType).Field /usr/local/src/go/src/reflect/type.go
   36.37MB  3.96% 65.65%    42.39MB  4.62%  github.com/samonzeweb/godb.(*SQLBuffer).WriteBytes /data/Development/MyOpenSource/GODB/src/github.com/samonzeweb/godb/sqlbuffer.go
   33.27MB  3.63% 69.28%    33.77MB  3.68%  database/sql.driverArgs /usr/local/src/go/src/database/sql/convert.go
   27.50MB  3.00% 72.28%       38MB  4.14%  github.com/samonzeweb/godb/dbreflect.(*StructMapping).GetNonAutoFieldsValues.func1 /data/Development/MyOpenSource/GODB/src/github.com/samonzeweb/godb/dbreflect/dbreflect.go
   20.22MB  2.20% 74.48%   123.64MB 13.48%  github.com/lib/pq.(*conn).QueryContext /data/Development/MyOpenSource/GODB/src/github.com/lib/pq/conn_go18.go
(pprof) 

The sample model and query are:

type GDModel struct {
	Id      int           `db:"id,key,auto"`
	Name    string        `db:"name"`
	Title   string        `db:"title"`
	Fax     string        `db:"fax"`
	Web     string        `db:"web"`
	Age     int           `db:"age"`
	Right   bool          `db:"right"`
	Counter int64         `db:"counter"`
}

func (*GDModel) TableName() string {
	return "models"
}

var models []*GDModel
if err := db.Select(&models).Where("id > ?", 0).Limit(10000).Do(); err != nil {
	fmt.Printf("slice err: %v\n", err)
}
@samonzeweb
Owner

samonzeweb commented May 2, 2018

I agree that this code isn't optimal. Some things are already cached: struct (type) details are fetched only once. But fetching data involves many operations (especially for big structs).

The current version is designed to be easy to maintain, not to be the fastest. Before building godb I thought about using unsafe code, calculating memory offsets only once, but that's not the way I chose. It could be an optimization later. But before doing things like this I prefer writing the fastest safe code.
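
(For illustration only, a minimal sketch of that offset-based idea, with hypothetical names; this is not godb's code:)

package fieldptrs

import (
	"reflect"
	"unsafe"
)

// fieldInfo records, once per struct type, where each field lives and its type.
type fieldInfo struct {
	offset uintptr
	typ    reflect.Type
}

// buildFieldInfos walks the struct type a single time.
func buildFieldInfos(t reflect.Type) []fieldInfo {
	infos := make([]fieldInfo, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		infos = append(infos, fieldInfo{offset: f.Offset, typ: f.Type})
	}
	return infos
}

// fieldPointers builds the pointers handed to sql.Rows.Scan without walking
// the struct again for every row.
func fieldPointers(structPtr unsafe.Pointer, infos []fieldInfo) []interface{} {
	ptrs := make([]interface{}, len(infos))
	for i, fi := range infos {
		// Pointer arithmetic kept in a single expression, as the unsafe.Pointer rules require.
		p := unsafe.Pointer(uintptr(structPtr) + fi.offset)
		ptrs[i] = reflect.NewAt(fi.typ, p).Interface()
	}
	return ptrs
}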

It's perhaps not the fastest, but it's fast. Really blazing fast compared to other tools I used in other ecosystems. So changing this is not my priority, and I will not do anything without taking the time to be sure it's correct, really faster and easy to maintain (maintainability is really an important part of godb).

I made a test with SQLite (because it's easy to set up, except on Windows); the full code is below, along with a flamegraph. It's not a full analysis but a good start for me to show what takes time. I made only a CPU profile, but it's obvious from the graph that many memory operations are involved.

The graph could look different on another OS, or with another database, but it's enough for me to see where the CPU is used. And to be honest it's not a surprise.

Perhaps for godb 2, written with Go 2 (joke inside). It's not my priority but I'll take time on this "some day". It's an interesting subject ;)

EDIT: the flamegraph is done with the latest version of pprof: https://github.com/google/pprof

[Flamegraph screenshot: perfs CPU (2018-05-02)]

package main

import (
	"fmt"
	"log"
	"os"
	"runtime/pprof"

	"github.com/samonzeweb/godb"
	"github.com/samonzeweb/godb/adapters/sqlite"
)

type Record struct {
	ID     int    `db:"id,key,auto"`
	Dummy1 string `db:"dummy1"`
	Dummy2 string `db:"dummy2"`
	Dummy3 string `db:"dummy3"`
	Dummy4 string `db:"dummy4"`
	Dummy5 string `db:"dummy5"`
}

const dataSize int = 100000

func main() {
	db, err := godb.Open(sqlite.Adapter, ":memory:")
	panicIfErr(err)

	_, err = db.CurrentDB().Exec(`
		create table Record (
			id       integer not null primary key autoincrement,
			dummy1   text not null,
			dummy2   text not null,
			dummy3   text not null,
			dummy4   text not null,
			dummy5   text not null)
	`)
	panicIfErr(err)

	massiveInsert(db)
	count, err := db.Select(&Record{}).Count()
	panicIfErr(err)
	fmt.Println("Inserted : ", count)

	readAll(db)
}

func massiveInsert(db *godb.DB) {
	db.Begin()
	bulkSize := 100
	records := make([]Record, 0, bulkSize)
	for i := 0; i < dataSize; i++ {
		record := Record{
			Dummy1: "dummy",
			Dummy2: "dummy",
			Dummy3: "dummy",
			Dummy4: "dummy",
			Dummy5: "dummy",
		}
		records = append(records, record)
		if len(records) >= bulkSize {
			err := db.BulkInsert(&records).Do()
			panicIfErr(err)
			records = records[:0]
		}
	}
	if len(records) > 0 {
		err := db.BulkInsert(&records).Do()
		panicIfErr(err)
	}
	db.Commit()
}

func readAll(db *godb.DB) {

	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal("could not create CPU profile: ", err)
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal("could not start CPU profile: ", err)
	}
	defer pprof.StopCPUProfile()

	all := make([]Record, 0, dataSize)
	err = db.Select(&all).Do()
	panicIfErr(err)
	fmt.Println("Read : ", len(all))
}

func panicIfErr(err error) {
	if err != nil {
		panic(err)
	}
}

@derkan
Contributor Author

derkan commented May 2, 2018

Thanks for the detailed answer. I agree that it is fast enough (for me, especially after using Python's SQLAlchemy for years). Go is the way to Go. :-)

I had a chance to check the code of other Go ORM/ODM/SQL builders while developing the gobenchorm benchmarks, and I have to say that godb has a good code base.

I've done tests after adding this code before exit:

	runtime.GC()
	memProfile, err := os.Create("/tmp/godb.mprof")
	if err != nil {
		log.Fatal(err)
	}
	defer memProfile.Close()
	if err := pprof.WriteHeapProfile(memProfile); err != nil {
		log.Fatal(err)
	}

And reported with:

go tool pprof --alloc_space mem /tmp/godb.mprof
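
(With --alloc_space, pprof reports the total bytes allocated over the run, including memory that has already been freed, so it gives a good view of the allocation work handed to the GC.)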

And it shows that we leave a lot of work for the GC. That is why CPU usage is high. In my first message I gave an example of pg (added by editing the issue later, so maybe you didn't see it) - please check the link.

I think that is why pg is slightly faster.

I'm looking forward to your patches on this subject. If you need help, please don't hesitate to ask.

So as not to scare people with this issue, I'm closing it. :-)

derkan closed this as completed May 2, 2018
@samonzeweb
Owner

There is an obvious way to improve performance. It's so obvious that I'm ashamed of not having done it from the start.

See this commit: b0df693

I'll take a little more time to analyze performance. For now it's on a separate branch; I'll see if there are other simple things to fix after doing escape analysis. If I have no time or don't find simple things, I'll merge it almost as-is (just fixing some comments).

I made a bench targeting GetAllFieldsPointers, and the gain is real 😃

go test -bench=GetAllFieldsPointers -run=nore ./dbreflect/ -benchmem -benchtime=10s

Before:

BenchmarkGetAllFieldsPointers-4   	  500000	      2002 ns/op	     688 B/op      20 allocs/op

After:

BenchmarkGetAllFieldsPointers-4   	10000000	      1612 ns/op	     544 B/op      17 allocs/op
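
For reference, a Go micro-benchmark of this kind can be structured as below. The collectFieldPointers helper is a hypothetical stand-in for the measured reflection work; it is not godb's actual benchmark code:

package dbreflect_test

import (
	"reflect"
	"testing"
)

type benchRecord struct {
	ID   int
	Name string
	Age  int
}

// collectFieldPointers gathers the field addresses of a struct with reflection,
// standing in for the operation the benchmark above measures.
func collectFieldPointers(s interface{}) []interface{} {
	v := reflect.ValueOf(s).Elem()
	ptrs := make([]interface{}, 0, v.NumField())
	for i := 0; i < v.NumField(); i++ {
		ptrs = append(ptrs, v.Field(i).Addr().Interface())
	}
	return ptrs
}

func BenchmarkGetAllFieldsPointers(b *testing.B) {
	record := benchRecord{}
	b.ReportAllocs() // same effect as passing -benchmem
	for i := 0; i < b.N; i++ {
		_ = collectFieldPointers(&record)
	}
}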

samonzeweb reopened this May 4, 2018
@derkan
Contributor Author

derkan commented May 4, 2018

Wow! Thanks, that is nice!

@samonzeweb
Owner

I didn't see anything else as obvious (or as simple to do in a short time).

I did some cleaning, rebased and merged it to master: 18ac3a5

(the commit in my previous message may become unavailable, as I'll remove the corresponding branch)
