Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EXPERIMENTAL arrow ingestion: create the arrow record from sorted buffer #2696

Merged
merged 3 commits into from
Mar 7, 2023

Conversation

thorfour
Copy link
Contributor

@thorfour thorfour commented Mar 6, 2023

Changed the record creation to happen from the sorted Parquet buffer to prevent the panic seen in FrostDB

So instead of writing the series directly to an Arrow record, it writes them to a Parquet buffer, sorts that Parquet buffer, and then reads the sorted rows back into an Arrow record.

While this is not ideal and it would likely be better to sort the series beforehand and then write the series directly to an arrow record as it was before, this is far less complex and requires less investment for the time being for an experimental feature. If we merge this I shall open an issue to track the desire to sort the series beforehand and remove the intermediary Parquet buffer step.

Copy link
Member

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this is just to test if that panic continues to occur after this, right? If so lgtm

@asubiotto
Copy link
Member

Although this is what we eventually want to do, I'm wondering if we should instead add the same guardrails that we have for parquet ingestion to arrow ingestion in frostdb. Essentially, check if the arrow record is sorted, increment a metric and sort it if not. I think that it'll be more complete to add that to frostdb since we have other users that potentially use arrow ingestion (e.g. cloud) that might run into the same problems. We would eventually change parca to correctly sort the arrow record/create it in sorted order

@brancz
Copy link
Member

brancz commented Mar 7, 2023

I would also be happy with that.

@thorfour
Copy link
Contributor Author

thorfour commented Mar 7, 2023

I think we should do both. FrostDB expects sorted rows, so this is a bug fix in the Parca side of things, but I agree that it would also be good to have the sanity check in Frost

@asubiotto
Copy link
Member

I agree we should do both long term, but if short-term the intent of this fix is to prevent panics in frostdb I think it's more effective to add the check/sorting on the frostdb side rather than parca. If the issue is that we don't have code to sort arrow records and don't want to invest in that right now, then we could probably convert to parquet, sort, and convert back even though it's not ideal performance wise.

@thorfour
Copy link
Contributor Author

thorfour commented Mar 7, 2023

This is intended to "do the right thing" on the Parca side where we write to FrostDB with sorted rows. Since we don't have a way to sort records right now, and we don't have the cycles to invest in that I think this corner cut is pragmatic while we explore ingesting arrow records. Not doing this means we would still need to implement a way to sort records on the Frost side which I'm trying to avoid investing in right now.

@thorfour thorfour merged commit e556443 into main Mar 7, 2023
@thorfour thorfour deleted the exp-arrow-ingest-fix-sorting branch March 7, 2023 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants