Feature: cache the columnToFieldIndex map for each struct type #25

kovalromank · 2020-12-12T06:38:19Z

Thanks for your library.

Because you can't update a struct dynamically in go, when scanning a row into a struct there isn't much reason to create a columnToFieldIndex map for the same struct more than once.

I created a test struct with 1024 fields and some quick benchmarks to see if the cache helps, you can see the benchmarks in my fork of scanny here.

To implement a cache I only changed the getColumnToFieldIndexMap function which will attempt to get a column to field index map from and return it if it exists. The cache can be implemented using both a map[reflect.Type] with a sync.RWMutex or with a sync.Map. I tested both in the benchmarks.

The BenchmarkStruct functions reuse the same row scanner for each iteration. The BenchmarkScannerStruct functions create a new row scanner for each iteration. The benchmarks that end in MapCache use a map[reflect.Type] with a sync.RWMutex, while the SyncMapCache benchmarks use a sync.Map to store the column to field index maps. The results of the benchmarks:

goos: darwin
goarch: amd64
pkg: github.com/georgysavva/scany
BenchmarkStruct
BenchmarkStruct-8                       	   14983	     78204 ns/op	   16396 B/op	       1 allocs/op
BenchmarkStruct_MapCache
BenchmarkStruct_MapCache-8              	   14470	     88521 ns/op	   16397 B/op	       1 allocs/op
BenchmarkStruct_SyncMapCache
BenchmarkStruct_SyncMapCache-8          	   14360	     80351 ns/op	   16397 B/op	       1 allocs/op
BenchmarkScannerStruct
BenchmarkScannerStruct-8                	    2630	    462315 ns/op	  188686 B/op	    3081 allocs/op
BenchmarkScannerStruct_MapCache
BenchmarkScannerStruct_MapCache-8       	    7016	    147001 ns/op	   57477 B/op	       4 allocs/op
BenchmarkScannerStruct_SyncMapCache
BenchmarkScannerStruct_SyncMapCache-8   	    7268	    149239 ns/op	   57476 B/op	       4 allocs/op
BenchmarkMap
BenchmarkMap-8                          	    5004	    246030 ns/op	  114842 B/op	    2054 allocs/op
PASS

When reusing a row scanner there isn't much difference to the performance when using a cache. The real benefit happens when you create a new row scanner each iteration. The allocs drop from 3081 to only 4. And both the bytes/op and ns/op drop by over three times.

I think this would be a useful feature to add since even though having 1024 fields in a struct is pretty extreme but from the benchmarks the getColumnToFieldIndexMap function is creating 3 allocs per field of a struct which can be avoided after the first call to getColumnToFieldIndexMap with the same struct without much overhead.

The text was updated successfully, but these errors were encountered:

georgysavva · 2020-12-16T05:08:07Z

Thanks for submitting your issue. I will take a deeper look at your idea and the benchmarks and get back to you soon!

georgysavva · 2021-01-08T02:00:02Z

Sorry, I've been pretty busy at work and couldn’t find enough time to work on this issue yet.

I think what you are proposing is good and will improve the performance of the library (I actually want to measure it on my own also).
The only thing that I am worried about is the global state (the cache itself) that we create with this feature. Because right now the library is exposed as a set of stateless functions. In order to accommodate the cache, we would need to introduce a stateful type/object that holds the cache, and instead of stateless functions, it has the corresponding methods with the implementation. The library must stay backward-compatible also, it should be possible we just create a default global instance of the new stateful type, and the existing package-level functions just wrap the calls to the default object methods.
BTW, this new stateful type can be also used in the future for customization parameters that modify the hardcoded into the library constants/logic.

I hope I will be able to find the time to work on this during this month. But If you want it to be done earlier or you are just interested in tackling it on your own, I encourage you to propose a PR with the solution. I understand it won’t be that easy though, since it involves some design changes.

Issue: georgysavva#25.

georgysavva · 2022-02-02T07:19:50Z

Hey! Now when I introduced a package-level API object we can easily enhance it with caching logic.
If you are still interested, feel free to work on this!

kovalromank mentioned this issue Dec 12, 2020

Feature: use a pool of scans slices #24

Closed

georgysavva mentioned this issue May 25, 2021

fields: add function to return expression for selecting all columns of a given struct. #48

Closed

henvic added a commit to henvic/scany that referenced this issue Jun 3, 2021

structref: adding cache to GetColumnToFieldIndexMap.

f3c42d7

Issue: georgysavva#25.

georgysavva added the help wanted Extra attention is needed label Feb 2, 2022

georgysavva mentioned this issue Mar 12, 2023

Scanning into a recursive struct #57

Open

georgysavva mentioned this issue Mar 31, 2024

Cache columeToIndexFieldMap #131

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: cache the columnToFieldIndex map for each struct type #25

Feature: cache the columnToFieldIndex map for each struct type #25

kovalromank commented Dec 12, 2020

georgysavva commented Dec 16, 2020

georgysavva commented Jan 8, 2021 •

edited

Loading

georgysavva commented Feb 2, 2022

Feature: cache the columnToFieldIndex map for each struct type #25

Feature: cache the columnToFieldIndex map for each struct type #25

Comments

kovalromank commented Dec 12, 2020

georgysavva commented Dec 16, 2020

georgysavva commented Jan 8, 2021 • edited Loading

georgysavva commented Feb 2, 2022

georgysavva commented Jan 8, 2021 •

edited

Loading