Improves memory usage of ingest path. #118

cyriltovena · 2022-07-04T11:51:36Z

❯ benchstat before.txt after.txt
name                                       old time/op    new time/op    delta
_Table_Insert_10Rows_10Iters_10Writers-16     457ms ±11%     550ms ± 8%  +20.56%  (p=0.008 n=5+5)

name                                       old alloc/op   new alloc/op   delta
_Table_Insert_10Rows_10Iters_10Writers-16     577MB ±25%     169MB ±33%  -70.78%  (p=0.008 n=5+5)

name                                       old allocs/op  new allocs/op  delta
_Table_Insert_10Rows_10Iters_10Writers-16     1.06M ±36%     0.55M ±50%  -47.59%  (p=0.032 n=5+5)

Not yet sure why we're 20% slower. It could be the sync.Map; nothing terrible is showing in CPU profiling. Nevertheless, the memory usage is way better.

See #94

cyriltovena · 2022-07-04T11:54:12Z

dynparquet/schema.go

+}
+
+func (s *Schema) GetWriter(w io.Writer, dynamicColumns map[string][]string) (*PooledWriter, error) {
+	key := serializeDynamicColumns(dynamicColumns)


Technically I also need to sort each suffix, otherwise the cache risks not working. Not sure if that's required or not; that could be a big thing to do each time.

Not sure exactly if this is what you mean but keys are sorted, and values for keys are sorted, so there should be no different keys with the same dynamic columns.

I wasn't sure if the values were! Thanks for clarifying.
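For readers following this thread, here is a minimal sketch of why sorting matters for the cache key. `canonicalKey` is a hypothetical stand-in, not the actual `serializeDynamicColumns` from this PR, and the `:`, `,` and `;` separators are assumptions that only hold if column names and suffixes never contain those characters. It sorts both the keys and each value slice so that two maps with the same dynamic columns always produce the same key.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// canonicalKey builds a deterministic cache key from a map of dynamic
// columns. Hypothetical helper for illustration only.
func canonicalKey(dynamicColumns map[string][]string) string {
	names := make([]string, 0, len(dynamicColumns))
	for name := range dynamicColumns {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort the keys

	var b strings.Builder
	for i, name := range names {
		if i > 0 {
			b.WriteByte(';')
		}
		// Copy before sorting so the caller's slice is left untouched.
		suffixes := append([]string(nil), dynamicColumns[name]...)
		sort.Strings(suffixes)
		b.WriteString(name)
		b.WriteByte(':')
		b.WriteString(strings.Join(suffixes, ","))
	}
	return b.String()
}

func main() {
	a := canonicalKey(map[string][]string{"labels": {"pod", "job"}, "pprof": {"z", "a"}})
	b := canonicalKey(map[string][]string{"pprof": {"a", "z"}, "labels": {"job", "pod"}})
	fmt.Println(a == b, a) // prints "true labels:job,pod;pprof:a,z"
}
```

If the caller already guarantees sorted values, as the reply above states, the per-suffix copy and sort can be dropped and only the key sort remains.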

cyriltovena · 2022-07-04T12:38:53Z

The latest commit seems to have improved the CPU situation.

name                                       old time/op    new time/op    delta
_Table_Insert_10Rows_10Iters_10Writers-16     479ms ±12%     523ms ± 5%     ~     (p=0.095 n=5+5)

name                                       old alloc/op   new alloc/op   delta
_Table_Insert_10Rows_10Iters_10Writers-16     545MB ±19%     152MB ±31%  -72.01%  (p=0.008 n=5+5)

name                                       old allocs/op  new allocs/op  delta
_Table_Insert_10Rows_10Iters_10Writers-16     1.12M ±59%     0.37M ±15%  -67.42%  (p=0.016 n=5+4)

metalmatze · 2022-07-04T13:24:18Z

dynparquet/schema.go

+var rowBufPool = &sync.Pool{
+	New: func() interface{} {
+		return make([]parquet.Row, 64) // Random guess.
+	},
+}
+


Does it make sense to have this initialized when calling NewSchema rather than having it globally in the package?

This one is actually shared across all schemas.
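To illustrate the package-level choice discussed here, a rough sketch of a row-buffer pool shared by all callers. `Row` is a hypothetical placeholder for `parquet.Row`, and `fillRows` is an invented helper. Unlike the diff above, this sketch stores a pointer to a zero-length slice with capacity 64; that is an assumption made to avoid an allocation each time the slice is put back, not what the PR does.

```go
package main

import (
	"fmt"
	"sync"
)

// Row is a hypothetical placeholder for parquet.Row.
type Row []int64

// rowBufPool is package-level, so every schema draws from the same pool.
// It stores *[]Row rather than []Row so that Put does not allocate when
// the slice header is boxed into an interface.
var rowBufPool = sync.Pool{
	New: func() interface{} {
		buf := make([]Row, 0, 64) // 64 is a guess, as in the original
		return &buf
	},
}

// fillRows borrows a buffer, appends n rows, and reports how many it held.
func fillRows(n int) int {
	bufPtr := rowBufPool.Get().(*[]Row)
	buf := (*bufPtr)[:0] // reset length, keep capacity
	for i := 0; i < n; i++ {
		buf = append(buf, Row{int64(i)})
	}
	count := len(buf)
	*bufPtr = buf // keep any growth for the next borrower
	rowBufPool.Put(bufPtr)
	return count
}

func main() {
	fmt.Println(fillRows(10), fillRows(100)) // prints "10 100"
}
```

Because the buffers are not tied to any particular schema, a single global pool lets every schema reuse the same backing arrays.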

metalmatze · 2022-07-04T13:24:53Z

Very nice! 🎉

brancz

I have to admit, it took me a couple of re-reads to understand the logic, but this approach is pretty nifty: we don't have to do any unnecessary accesses to the sync.Map, which is really good!

cyriltovena · 2022-07-05T12:40:10Z

I used the sync.Map because it performs very well when you write only once, better than a RWMutex.

The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2)

My only wish would be to avoid the empty pool creation required by the LoadOrStore.

cyriltovena added 3 commits July 4, 2022 13:51

better benchmak names

20acfbd

Increasing timestamp

844535c

Improves memory usage of ingest path.

84769aa

cyriltovena force-pushed the perf-insert branch from 19007bb to 84769aa Compare July 4, 2022 11:52

improve pool structure

f9b1218

cyriltovena mentioned this pull request Jul 4, 2022

Recycle goroutine doing compaction #119

Closed

metalmatze reviewed Jul 4, 2022

brancz approved these changes Jul 5, 2022

brancz merged commit f949727 into polarsignals:main Jul 5, 2022
