Add data structures for variably-sized fixed columns #1542

Open
georgwiese wants to merge 26 commits into main

georgwiese (Collaborator) commented Jul 8, 2024

This PR adds number::VariablySizedColumns, which can store several sizes of the same column. Currently, we always just have one size, but as part of #1496, we can relax that.

georgwiese changed the title from "[WIP] Variably-sized fixed columns" to "Add data structures for variably-sized fixed columns" on Jul 8, 2024
georgwiese marked this pull request as ready for review on Jul 8, 2024, 15:13
impl<T: DeserializeOwned + Serialize> ReadWrite for T {
    fn read(file: &mut impl Read) -> Self {
        serde_cbor::from_reader(file).unwrap()
Collaborator

Unrelated to this PR, but we should probably use BufRead here.

Collaborator Author

Do you mean just changing to file: &mut impl BufRead? What would be the difference? It is just a trait that adds more functions, which apparently serde_cbor::from_reader() does not use?
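
To make the question concrete: `BufRead` extends `Read`, so changing the parameter type alone would not change which methods a `Read`-based consumer calls. What can matter is whether the reader passed in is buffered. The following is a minimal sketch using only the standard library. `CountingReader`, `consume_bytewise`, and `underlying_calls` are hypothetical names for illustration; `consume_bytewise` stands in for any consumer that issues many small reads, and it is an assumption, not a confirmed fact, that `serde_cbor::from_reader` behaves this way.

```rust
use std::io::{BufReader, Read};

/// Wraps a reader and counts how often `read` is called on it.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Hypothetical stand-in for a consumer that pulls one byte at a time.
/// Returns the number of bytes consumed.
fn consume_bytewise(reader: &mut impl Read) -> usize {
    let mut byte = [0u8; 1];
    let mut total = 0;
    while reader.read(&mut byte).unwrap() == 1 {
        total += 1;
    }
    total
}

/// Returns (bytes consumed, number of reads that hit the underlying source).
fn underlying_calls(buffered: bool) -> (usize, usize) {
    let data = vec![7u8; 10_000];
    let mut counting = CountingReader { inner: data.as_slice(), calls: 0 };
    let total = if buffered {
        // BufReader implements Read, so the consumer's signature is unchanged.
        let mut reader = BufReader::new(&mut counting);
        consume_bytewise(&mut reader)
    } else {
        consume_bytewise(&mut counting)
    };
    (total, counting.calls)
}
```

In this sketch, the unbuffered path hits the source once per byte, while the buffered path fetches whole chunks. Requiring `impl BufRead` in the signature would mainly force callers to pass an already-buffered reader; wrapping a `File` in a `BufReader` at the call site achieves the same effect with the existing `impl Read` signature.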


#[derive(Serialize, Deserialize)]
/// Like Columns, but each column can exist in multiple sizes
pub struct VariablySizedColumns<F> {
Member

I don't think this should be inside number

Collaborator Author

Hmm, do you have a suggestion?

Of the common dependencies of backend and executor, for example, it seems like the best fit:

[Image: Screenshot 2024-07-09 at 17 32 16]

Also, this crate handles the serialization (which arguably it shouldn't), so I think it kind of fits.

Member

It's much more than just "number". Either we create a new crate or rename this one.

Member

Why not put it inside the executor? Isn't it one of its main outputs?

Collaborator Author

True, changed it!

@@ -31,6 +33,60 @@ pub fn log2_exact(n: BigUint) -> Option<usize> {
.filter(|zeros| n == (BigUint::from(1u32) << zeros))
}

pub type Columns<F> = Vec<(String, Vec<F>)>;
Member

I think this type name over-simplifies (i.e. there should be no typedef ;) ). I would like to know that this is a vector and that it contains the name as a string.

Collaborator Author

done

/// Like Columns, but each column can exist in multiple sizes
pub struct VariablySizedColumns<F> {
    /// Maps each column name to a (size -> values) map
    columns: Vec<(String, BTreeMap<usize, Vec<F>>)>,
Member

I don't think we gain much by hiding variably sized columns in a collection of them.

Wouldn't it be better to use a type for ONE column that has a variable size? Then there could be free functions that reduce a collection of those to a unique size.

The len() function, for example, is very misleading: does it return the number of different sizes? If this is just a Vec<(String, VariableLengthColumn<T>)>, it's much clearer.

Collaborator Author

done
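
The suggestion in this thread, a type for one column that can exist in several sizes, could look roughly like the sketch below. This is an illustration under assumptions, not the code merged in this PR: the name `VariablySizedColumn`, the field `column_by_size`, and the methods `from_versions`, `available_sizes`, and `get_by_size` are hypothetical, and sizes are assumed to be keyed by the number of rows.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: ONE column that may exist in several sizes.
#[derive(Debug, Clone, PartialEq)]
pub struct VariablySizedColumn<F> {
    /// Maps each available size (number of rows) to the values at that size.
    column_by_size: BTreeMap<usize, Vec<F>>,
}

impl<F> From<Vec<F>> for VariablySizedColumn<F> {
    /// A column with a single size, keyed by the length of its values.
    fn from(values: Vec<F>) -> Self {
        let mut column_by_size = BTreeMap::new();
        column_by_size.insert(values.len(), values);
        Self { column_by_size }
    }
}

impl<F> VariablySizedColumn<F> {
    /// Builds a column from several versions; each is keyed by its own length.
    pub fn from_versions(versions: Vec<Vec<F>>) -> Self {
        let column_by_size = versions.into_iter().map(|v| (v.len(), v)).collect();
        Self { column_by_size }
    }

    /// All sizes in which this column is available, in ascending order.
    pub fn available_sizes(&self) -> Vec<usize> {
        self.column_by_size.keys().copied().collect()
    }

    /// The values for a specific size, if that size exists.
    pub fn get_by_size(&self, size: usize) -> Option<&[F]> {
        self.column_by_size.get(&size).map(|v| v.as_slice())
    }
}
```

With this shape, a collection is spelled out as `Vec<(String, VariablySizedColumn<F>)>`, so `len()` on the collection plainly means the number of columns, and the question of sizes is answered per column via `available_sizes()`.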

chriseth (Member) commented Jul 8, 2024

Should we really store multiple copies of the values? Wouldn't we want to return a slice instead (plus manage metadata about which slice lengths are "legal")?

georgwiese (Collaborator Author)

> Should we really store multiple copies of the values? Wouldn't we want to return a slice instead (plus manage metadata about which slice lengths are "legal")?

The problem with that is that we sometimes have columns whose last row differs from the other rows, so the smaller version is not a prefix of the larger one, for example here.
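
A small worked example of why a slice is not always enough. The two column generators below are hypothetical and only illustrate the point: a column that marks the last row changes its contents with the size, whereas a purely periodic column does not.

```rust
/// Hypothetical fixed column: 1 in the last row, 0 everywhere else.
fn last_row_indicator(size: usize) -> Vec<u64> {
    (0..size).map(|row| u64::from(row + 1 == size)).collect()
}

/// Hypothetical fixed column that repeats with period 4, independent of size.
fn periodic(size: usize) -> Vec<u64> {
    (0..size).map(|row| (row % 4) as u64).collect()
}

/// True if the smaller version is exactly a prefix of the larger one,
/// i.e. if slicing the larger version would reproduce the smaller one.
fn is_prefix_of(small: &[u64], large: &[u64]) -> bool {
    large.len() >= small.len() && &large[..small.len()] == small
}
```

For the periodic column, the size-4 version is a prefix of the size-8 version, so a slice would do. For the last-row indicator, the size-4 version is `[0, 0, 0, 1]` while the first four rows of the size-8 version are `[0, 0, 0, 0]`, so each size needs its own values.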

}
}

pub fn get_only_size<F>(
Member

Maybe get_uniquely_sized? The current name, get_only_size, could be read as returning only the size.

Collaborator Author

Done
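
For illustration, a free function with the suggested name could reduce a collection of variably-sized columns to plain columns, failing if any column does not have exactly one size. This is a sketch under assumptions: only the name `get_uniquely_sized` comes from the review; the signature, the `VariablySizedColumn` type with its `new` constructor, and the `NotUniquelySizedError` type are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical single-column type: maps each size to the values at that size.
pub struct VariablySizedColumn<F> {
    column_by_size: BTreeMap<usize, Vec<F>>,
}

impl<F> VariablySizedColumn<F> {
    /// Builds a column from several versions; each is keyed by its own length.
    pub fn new(versions: Vec<Vec<F>>) -> Self {
        let column_by_size = versions.into_iter().map(|v| (v.len(), v)).collect();
        Self { column_by_size }
    }
}

/// Hypothetical error: a column was available in zero or in several sizes.
#[derive(Debug, PartialEq)]
pub struct NotUniquelySizedError {
    pub column: String,
    pub sizes: Vec<usize>,
}

/// Returns each column's values, provided every column has exactly one size.
pub fn get_uniquely_sized<F>(
    columns: &[(String, VariablySizedColumn<F>)],
) -> Result<Vec<(String, &Vec<F>)>, NotUniquelySizedError> {
    columns
        .iter()
        .map(|(name, column)| {
            let mut versions = column.column_by_size.values();
            // Exactly one version: the first `next()` yields it, the second yields None.
            match (versions.next(), versions.next()) {
                (Some(values), None) => Ok((name.clone(), values)),
                _ => Err(NotUniquelySizedError {
                    column: name.clone(),
                    sizes: column.column_by_size.keys().copied().collect(),
                }),
            }
        })
        .collect()
}
```

Returning the values rather than a size is what the new name is meant to signal; the error carries the offending column name and its sizes so the caller can report which column blocked the reduction.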
