This repository has been archived by the owner on Jan 11, 2021. It is now read-only.

How do I write a BigDecimal value? #177

Open
xrl opened this issue Oct 30, 2018 · 6 comments

@xrl
Contributor

xrl commented Oct 30, 2018

I am loading numeric data from diesel with:

table! {
    currencies (id) {
        [[ SNIP ]]
        conversion_rate -> Nullable<Numeric>,
        [[ SNIP ]]
    }
}

cast to a struct with

use bigdecimal::BigDecimal;

#[derive(Queryable, Debug)]
pub struct Currency {
    [[ SNIP ]]
    pub conversion_rate: Option<BigDecimal>,
    [[ SNIP ]]
}

The current ColumnWriter does not include an easily compatible decimal type:

/// Column writer for a Parquet type.
pub enum ColumnWriter {
  BoolColumnWriter(ColumnWriterImpl<BoolType>),
  Int32ColumnWriter(ColumnWriterImpl<Int32Type>),
  Int64ColumnWriter(ColumnWriterImpl<Int64Type>),
  Int96ColumnWriter(ColumnWriterImpl<Int96Type>),
  FloatColumnWriter(ColumnWriterImpl<FloatType>),
  DoubleColumnWriter(ColumnWriterImpl<DoubleType>),
  ByteArrayColumnWriter(ColumnWriterImpl<ByteArrayType>),
  FixedLenByteArrayColumnWriter(ColumnWriterImpl<FixedLenByteArrayType>)
}

I think I found the decimal type definition here: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal and it looks like it's up to the writer to choose among i32, i64, fixed_len_byte_array, and byte_array for the physical storage. Is that right? Should we have a ColumnWriter that picks the right type?

For now I'm just going to throw away precision and write a double, but I want to come back and do this right 👍

I also see read-only decimal support in #103, so maybe that work could be extended to be compatible with ColumnWriter?
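To make the question concrete, here is a minimal sketch of how a writer might pick the physical type from the decimal precision, following the ranges given in the linked LogicalTypes.md (INT32 for precision up to 9, INT64 for precision up to 18, a byte array beyond that). The names `DecimalPhysicalType`, `min_bytes_for_precision`, and `physical_type_for_precision` are hypothetical and not part of parquet-rs.

```rust
/// Physical Parquet types that can carry the DECIMAL logical type.
/// Hypothetical enum for illustration; not a parquet-rs type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DecimalPhysicalType {
    Int32,
    Int64,
    /// Fixed-length byte array with the given width in bytes.
    FixedLenByteArray(usize),
}

/// Smallest number of bytes whose signed two's-complement range can hold
/// every unscaled value with `precision` decimal digits.
pub fn min_bytes_for_precision(precision: u32) -> usize {
    // Magnitudes up to 10^precision - 1 need ceil(precision * log2(10)) bits,
    // plus one sign bit.
    let bits = (f64::from(precision) * 10f64.log2()).ceil() as usize + 1;
    (bits + 7) / 8
}

/// Picks a physical type for a decimal column of the given precision.
/// Returns `None` for precision 0, which is not a valid decimal.
pub fn physical_type_for_precision(precision: u32) -> Option<DecimalPhysicalType> {
    match precision {
        0 => None,
        1..=9 => Some(DecimalPhysicalType::Int32),
        10..=18 => Some(DecimalPhysicalType::Int64),
        _ => Some(DecimalPhysicalType::FixedLenByteArray(
            min_bytes_for_precision(precision),
        )),
    }
}
```

Under these assumptions, a column declared with precision 38 would map to a 16-byte fixed-length array, while a precision of 9 or less fits in INT32.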

@sadikovi
Collaborator

Yes, you are right. You would have to write a physical type and assign the DECIMAL logical type to it. You are also right about having something that abstracts writes for those fields. The same abstraction could cover decimal, string, timestamp, date fields, etc.

Not sure about adding this to column writer, but we definitely need something. Did you have any particular design in mind?
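As a rough illustration of "write a physical type" for the byte-array case: the Parquet format stores a decimal as its unscaled integer in big-endian two's complement. The sketch below encodes an unscaled value into a fixed number of bytes. It assumes the unscaled value fits in an `i128` (so at most 16 bytes), and `unscaled_to_fixed_len_bytes` is a hypothetical helper, not a parquet-rs function. For example, a conversion rate of 1.2345 with scale 4 has the unscaled value 12345.

```rust
/// Encodes `unscaled` as big-endian two's complement in exactly `len` bytes.
/// Returns `None` if `len` is outside 1..=16 or the value does not fit.
/// Hypothetical helper for illustration only.
pub fn unscaled_to_fixed_len_bytes(unscaled: i128, len: usize) -> Option<Vec<u8>> {
    if len == 0 || len > 16 {
        return None;
    }
    let full = unscaled.to_be_bytes(); // always 16 bytes
    let (dropped, kept) = full.split_at(16 - len);

    // The bytes we drop must be pure sign extension...
    let sign_byte: u8 = if unscaled < 0 { 0xFF } else { 0x00 };
    let dropped_ok = dropped.iter().all(|&b| b == sign_byte);
    // ...and the top bit of what we keep must still encode the sign.
    let sign_ok = (kept[0] & 0x80 != 0) == (unscaled < 0);

    if dropped_ok && sign_ok {
        Some(kept.to_vec())
    } else {
        None
    }
}
```

The resulting bytes would then be handed to the fixed-length byte array column writer, with the column's schema carrying the DECIMAL logical type, precision, and scale.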

@xrl
Contributor Author

xrl commented Nov 14, 2018

What do you think of having BigDecimal support behind a feature flag? Then that could add a variant for a column writer?

@sadikovi
Collaborator

I think you mean Decimal. What Decimal support are you talking about? Decimal is not a Parquet physical type, and we only have column writers for physical types. You can write decimal values in three different ways. I will start working on a record writer, which should solve most of your problems.

@xrl
Contributor Author

xrl commented Nov 15, 2018

I was talking about BigDecimal as the popular (or is it?) Rust library for handling arbitrary precision numbers. The diesel library activates its support like this:

diesel = { version = "1.0.0", features = ["numeric"] }

This activates the BigDecimal crate dependency and turns on some modules in the diesel library. Something similar would allow parquet-rs users to get native bigdecimal support without forcing the dependency on all users.
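A minimal sketch of what such a gate might look like on the library side. It assumes the crate manifest declares an optional `bigdecimal` dependency (plus `num-bigint`) behind a feature named `bigdecimal`; the feature name, `RawDecimal`, and `bigdecimal_enabled` are all hypothetical and do not exist in parquet-rs.

```rust
/// Always available: a decimal as Parquet stores it, an unscaled integer plus a scale.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct RawDecimal {
    pub unscaled: i128,
    pub scale: u32,
}

/// Compiled only when the assumed `bigdecimal` feature is enabled,
/// so users who do not opt in never pull in the dependency.
#[cfg(feature = "bigdecimal")]
mod bigdecimal_support {
    use super::RawDecimal;
    use bigdecimal::BigDecimal;
    use num_bigint::BigInt;

    impl From<RawDecimal> for BigDecimal {
        fn from(d: RawDecimal) -> Self {
            BigDecimal::new(BigInt::from(d.unscaled), i64::from(d.scale))
        }
    }
}

/// Reports whether the optional integration was compiled in.
pub fn bigdecimal_enabled() -> bool {
    cfg!(feature = "bigdecimal")
}
```

Without the feature, the `bigdecimal_support` module is stripped at compile time and only `RawDecimal` remains, which mirrors how diesel's `numeric` feature behaves from the user's point of view.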

@sadikovi
Collaborator

Fair enough. Sorry, I feel like I lost the context.

  • If you are talking about some generic BigDecimal crate support in parquet-rs: sure, we can do that. The Record API returns a struct with scale and precision, and you can decide how to parse it, with or without a library.
  • If you are talking about decimal writes using the BigDecimal library: no, we can't do that. Decimal is not a Parquet physical type, so we can't create a column writer for it, and if we did, it would be very confusing.

Does this answer your question(s)?
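To illustrate the first point, here is a sketch of parsing a decimal by hand from its raw parts, with no external library. It assumes the caller already has the big-endian two's-complement bytes and the scale (for instance from the struct the Record API returns), and that the value fits in an `i128`. The function name `decimal_bytes_to_string` is hypothetical.

```rust
/// Formats a Parquet decimal given its big-endian two's-complement bytes and scale.
/// Returns `None` for empty input or input wider than 16 bytes.
/// Hypothetical helper for illustration only.
pub fn decimal_bytes_to_string(bytes: &[u8], scale: usize) -> Option<String> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    // Sign-extend the input to 16 bytes so it can be read as an i128.
    let fill: u8 = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut buf = [fill; 16];
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    let unscaled = i128::from_be_bytes(buf);

    let negative = unscaled < 0;
    let mut digits = unscaled.unsigned_abs().to_string();

    // Left-pad with zeros so at least one digit sits before the decimal point.
    if digits.len() <= scale {
        digits = format!("{:0>width$}", digits, width = scale + 1);
    }
    if scale > 0 {
        let point = digits.len() - scale;
        digits.insert(point, '.');
    }
    if negative {
        digits.insert(0, '-');
    }
    Some(digits)
}
```

The same unscaled integer and scale could instead be passed to an arbitrary-precision library if the application already depends on one.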

@xrl
Contributor Author

xrl commented Nov 15, 2018

Yes, this one: "If you are talking about some generic BigDecimal crate support in parquet-rs".

Having to translate scale/precision myself feels like the library stops the translation too early. Is it common to work with scale/precision directly? Are there many popular options for arbitrary precision values?

Would you be open to transparent BigDecimal serialization/deserialization?
