Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: cache the columnToFieldIndex map for each struct type #25

Open
kovalromank opened this issue Dec 12, 2020 · 3 comments
Open

Feature: cache the columnToFieldIndex map for each struct type #25

kovalromank opened this issue Dec 12, 2020 · 3 comments
Labels
help wanted Extra attention is needed

Comments

@kovalromank
Copy link

Thanks for your library.

Because you can't update a struct dynamically in go, when scanning a row into a struct there isn't much reason to create a columnToFieldIndex map for the same struct more than once.

I created a test struct with 1024 fields and some quick benchmarks to see if the cache helps, you can see the benchmarks in my fork of scanny here.

To implement a cache I only changed the getColumnToFieldIndexMap function which will attempt to get a column to field index map from and return it if it exists. The cache can be implemented using both a map[reflect.Type] with a sync.RWMutex or with a sync.Map. I tested both in the benchmarks.

The BenchmarkStruct functions reuse the same row scanner for each iteration. The BenchmarkScannerStruct functions create a new row scanner for each iteration. The benchmarks that end in MapCache use a map[reflect.Type] with a sync.RWMutex, while the SyncMapCache benchmarks use a sync.Map to store the column to field index maps. The results of the benchmarks:

goos: darwin
goarch: amd64
pkg: github.com/georgysavva/scany
BenchmarkStruct
BenchmarkStruct-8                       	   14983	     78204 ns/op	   16396 B/op	       1 allocs/op
BenchmarkStruct_MapCache
BenchmarkStruct_MapCache-8              	   14470	     88521 ns/op	   16397 B/op	       1 allocs/op
BenchmarkStruct_SyncMapCache
BenchmarkStruct_SyncMapCache-8          	   14360	     80351 ns/op	   16397 B/op	       1 allocs/op
BenchmarkScannerStruct
BenchmarkScannerStruct-8                	    2630	    462315 ns/op	  188686 B/op	    3081 allocs/op
BenchmarkScannerStruct_MapCache
BenchmarkScannerStruct_MapCache-8       	    7016	    147001 ns/op	   57477 B/op	       4 allocs/op
BenchmarkScannerStruct_SyncMapCache
BenchmarkScannerStruct_SyncMapCache-8   	    7268	    149239 ns/op	   57476 B/op	       4 allocs/op
BenchmarkMap
BenchmarkMap-8                          	    5004	    246030 ns/op	  114842 B/op	    2054 allocs/op
PASS

When reusing a row scanner there isn't much difference to the performance when using a cache. The real benefit happens when you create a new row scanner each iteration. The allocs drop from 3081 to only 4. And both the bytes/op and ns/op drop by over three times.

I think this would be a useful feature to add since even though having 1024 fields in a struct is pretty extreme but from the benchmarks the getColumnToFieldIndexMap function is creating 3 allocs per field of a struct which can be avoided after the first call to getColumnToFieldIndexMap with the same struct without much overhead.

@georgysavva
Copy link
Owner

Thanks for submitting your issue. I will take a deeper look at your idea and the benchmarks and get back to you soon!

@georgysavva
Copy link
Owner

georgysavva commented Jan 8, 2021

Sorry, I've been pretty busy at work and couldn’t find enough time to work on this issue yet.

I think what you are proposing is good and will improve the performance of the library (I actually want to measure it on my own also).
The only thing that I am worried about is the global state (the cache itself) that we create with this feature. Because right now the library is exposed as a set of stateless functions. In order to accommodate the cache, we would need to introduce a stateful type/object that holds the cache, and instead of stateless functions, it has the corresponding methods with the implementation. The library must stay backward-compatible also, it should be possible we just create a default global instance of the new stateful type, and the existing package-level functions just wrap the calls to the default object methods.
BTW, this new stateful type can be also used in the future for customization parameters that modify the hardcoded into the library constants/logic.

I hope I will be able to find the time to work on this during this month. But If you want it to be done earlier or you are just interested in tackling it on your own, I encourage you to propose a PR with the solution. I understand it won’t be that easy though, since it involves some design changes.

@georgysavva
Copy link
Owner

Hey! Now when I introduced a package-level API object we can easily enhance it with caching logic.
If you are still interested, feel free to work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants