Improves memory usage of ingest path. #118
}

func (s *Schema) GetWriter(w io.Writer, dynamicColumns map[string][]string) (*PooledWriter, error) {
	key := serializeDynamicColumns(dynamicColumns)
Technically I also need to sort each suffix, otherwise the cache may not work. I'm not sure whether that's required, and it could be a lot to do each time.
Not sure if this is exactly what you mean, but keys are sorted, and the values for each key are sorted, so the same dynamic columns should never produce different keys.
I wasn't sure if the values were! Thanks for clarifying.
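For readers following this thread, here is a minimal sketch of why sorting matters for the cache key. It is not the PR's `serializeDynamicColumns`; `serializeKey`, the `;`/`:`/`,` separators, and the column names in `main` are all assumptions made for illustration. The sketch sorts defensively on every call, which is the per-call cost raised above; if the values already arrive sorted, that step could be skipped.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// serializeKey is a hypothetical stand-in for serializeDynamicColumns.
// It sorts the column names and each column's suffixes so that equal
// sets of dynamic columns always produce the same cache key.
// Assumption: names and suffixes never contain ';', ':' or ','.
func serializeKey(dynamicColumns map[string][]string) string {
	names := make([]string, 0, len(dynamicColumns))
	for name := range dynamicColumns {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for i, name := range names {
		if i > 0 {
			b.WriteByte(';')
		}
		// Copy before sorting so the caller's slice is left untouched.
		suffixes := append([]string(nil), dynamicColumns[name]...)
		sort.Strings(suffixes)
		b.WriteString(name)
		b.WriteByte(':')
		b.WriteString(strings.Join(suffixes, ","))
	}
	return b.String()
}

func main() {
	// Hypothetical column names; the same data in two different orders.
	a := map[string][]string{"labels": {"pod", "namespace"}, "pprof_labels": {"thread"}}
	c := map[string][]string{"pprof_labels": {"thread"}, "labels": {"namespace", "pod"}}
	fmt.Println(serializeKey(a) == serializeKey(c))
}
```

Without the inner sort, `{"pod", "namespace"}` and `{"namespace", "pod"}` would serialize to different keys and miss the cache even though they describe the same columns.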
The latest commit seems to have improved the CPU situation.
var rowBufPool = &sync.Pool{
	New: func() interface{} {
		return make([]parquet.Row, 64) // Random guess.
	},
}
Does it make sense to have this initialized when calling NewSchema
rather than having it globally in the package?
This one is actually shared across all schemas.
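As a rough illustration of what a package-level pool shared across schemas looks like in use, here is a minimal sketch. The `row` type is a hypothetical stand-in for `parquet.Row` so the example needs no external dependency, and `withRowBuf` is an invented helper, not something taken from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// row is a hypothetical stand-in for parquet.Row.
type row []int

// rowBufPool lives at package level, so every schema draws from the
// same set of reusable buffers.
var rowBufPool = &sync.Pool{
	New: func() interface{} {
		return make([]row, 64) // Same guessed size as in the diff.
	},
}

// withRowBuf borrows a buffer, hands it to fn, and returns it to the
// pool afterwards. The buffer may still hold rows from a previous
// user, so fn should overwrite entries before reading them.
func withRowBuf(fn func(buf []row)) {
	buf := rowBufPool.Get().([]row)
	defer rowBufPool.Put(buf)
	fn(buf)
}

func main() {
	withRowBuf(func(buf []row) {
		fmt.Println(len(buf))
	})
}
```

Because the buffers are not tied to any particular schema, keeping a single pool lets all schemas reuse them rather than each allocating its own.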
Very nice! 🎉
I have to admit, it took me a couple of re-reads to understand the logic, but this approach is pretty nifty. We don't have to do any unnecessary accesses to the sync map, which is really good!
I used the sync.Map because it performs very well when you write only once, better than a RWMutex.
My only wish would be to avoid the empty pool creation required by the LoadOrStore.
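One common way to reduce that allocation is to try `Load` first and only fall back to `LoadOrStore` on a miss. The sketch below shows the pattern in isolation; `writerPools` and `poolFor` are hypothetical names, and whether this PR already does something equivalent is not visible from the snippet above.

```go
package main

import (
	"fmt"
	"sync"
)

// writerPools is a hypothetical cache from serialized key to *sync.Pool.
var writerPools sync.Map

// poolFor returns the pool for key, allocating a new one only on a miss.
func poolFor(key string) *sync.Pool {
	// Fast path: nothing is allocated when the key is already cached.
	if p, ok := writerPools.Load(key); ok {
		return p.(*sync.Pool)
	}
	// Slow path: allocate a candidate pool. If another goroutine stored
	// one first, LoadOrStore returns theirs and the candidate is dropped.
	p, _ := writerPools.LoadOrStore(key, &sync.Pool{})
	return p.(*sync.Pool)
}

func main() {
	a := poolFor("labels:namespace,pod")
	b := poolFor("labels:namespace,pod")
	fmt.Println(a == b)
}
```

This does not remove the empty pool creation entirely: a miss still allocates, and a race on the first write can allocate a pool that is immediately discarded. It only keeps the steady-state read path allocation-free, at the price of a second map access on a miss.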
Not yet sure why we're 20% slower. It could be the sync.Map, but nothing terrible is showing in CPU profiling. Nevertheless, the memory usage is way better.
See #94