Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Expose methods for ser/de data file #145

Merged
merged 1 commit into from
Aug 14, 2023

Conversation

liurenjie1024
Copy link
Contributor

@liurenjie1024 liurenjie1024 commented Aug 14, 2023

Required by other systems..

@ZENOTME
Copy link
Contributor

ZENOTME commented Aug 14, 2023

Maybe we can consider to expose method to return a serializable object to make it more general.

@liurenjie1024
Copy link
Contributor Author

Maybe we can consider to expose method to return a serializable object to make it more general.

I tried to implement sth like:

fn serialize<S: Serializer>(data_file: types::DataFile, s: Serializer) {}

But failed 🤪

@ZENOTME
Copy link
Contributor

ZENOTME commented Aug 14, 2023

I tried to implement sth like:
fn serialize<S: Serializer>(data_file: types::DataFile, s: Serializer) {}
But failed 🤪

Can we directly impl Serialize in in_memory::Datafile, like:

impl Serialize for Datafile{
    // Required method
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
       where S: Serializer {
           self.try_into().serialize(serializer)
       }
}

@Xuanwo
Copy link
Contributor

Xuanwo commented Aug 14, 2023

Do we need to serialize the data_file into other formats instead of json?

@ZENOTME
Copy link
Contributor

ZENOTME commented Aug 14, 2023

Do we need to serialize the data_file into other formats instead of json?

Not sure. For now maybe json is enogh. But a more general interface may better for compatibilty in future.

@Xuanwo
Copy link
Contributor

Xuanwo commented Aug 14, 2023

But a more general interface may better for compatibilty in future.

I understand your idea, but the DataFile in iceberg is specifically designed to be a JSON format. Therefore, adding a general interface may not provide enough benefits.

@ZENOTME
Copy link
Contributor

ZENOTME commented Aug 14, 2023

I understand your idea, but the DataFile in iceberg is specifically designed to be a JSON format. Therefore, adding a general interface may not provide enough benefits.

Here is the use case, and I think it's not so much relative the original design🤔. (CMIIW):
We seperate the write and commit phase into two process and we need to transfer(e.g. rpc) the write result (data file) to another process reponsible to commit. So We need to serialize the data file. I think in here having a more compactable serialize format is a reasonable requirement. Return a serializable object and let the user decide what format to serialize maybe more friendly.

@Xuanwo
Copy link
Contributor

Xuanwo commented Aug 14, 2023

We seperate the write and commit phase into two process and we need to transfer(e.g. rpc) the write result (data file) to another process reponsible to commit.

It is advisable not to rely on an external data type and attempt to directly serialize/deserialize or transfer it. Instead, downstream systems should have their own data format for this purpose in order to prevent unexpected disruptions.

For a distribution system like risingwave, it may be more important since our service may not be able to upgrade simultaneously.

@ZENOTME
Copy link
Contributor

ZENOTME commented Aug 14, 2023

It is advisable not to rely on an external data type and attempt to directly serialize/deserialize or transfer it. Instead, downstream systems should have their own data format for this purpose in order to prevent unexpected disruptions.

For a distribution system like risingwave, it may be more important since our service may not be able to upgrade simultaneously.

Make sense!

@liurenjie1024 liurenjie1024 merged commit 8cfc06f into icelake-io:main Aug 14, 2023
3 checks passed
@liurenjie1024 liurenjie1024 deleted the renjie/data_file_serde branch August 14, 2023 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants