Skip to content

Conversation

@severo
Copy link
Contributor

@severo severo commented Jul 31, 2025

@platypii do you want to review this one?

I'm not really sure about the dataframe I created for the iceberg files.

Three things to think about:

  • how to handle the case where the file contains less than the estimated numRows. In this implementation, I put -1 in the row number (as we only support numbers - it could also be NaN) and undefined in the cells
  • I'm not sure how icebergRead works, but I fear it has to fetch from the first row, even if we ask for [1000, 2000]. Is it right? If so, do you have an idea of how we could improve? maybe asking for [0, rowEnd] and cache everything to avoid later fetches (consumes more memory)
  • Currently, if we scroll slowly, we send a lot of icebergRead for one or two rows. We could improve by batching by 1000 rows (a bit like VirtualRowGroup in hyperparam-cli)

Re cache: we might want to implement hyparam/hightable#232, and use it here, to simplify the cache management.

@severo severo requested a review from platypii July 31, 2025 06:12
Copy link
Contributor

@platypii platypii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Honestly I'm not that worried about the iceberg functionality. Eventually I think we will need some support for variable number of rows, but this is good for now 👍

@severo severo merged commit 88d94f2 into master Aug 1, 2025
4 checks passed
@severo severo deleted the update-hightable-in-icebird-demo branch August 1, 2025 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants