Edit parquet file #1192

Closed
eerison opened this issue Aug 14, 2024 · 2 comments

@eerison

eerison commented Aug 14, 2024

Hello :)

is it possible to edit the file?

I am writing the file like this

        $writer = new Writer();
        $schema = Schema::with(
            FlatColumn::int64('id'),
            FlatColumn::string('name'),
            FlatColumn::boolean('active'),
            FlatColumn::dateTime('created_at'),
        );

        $path = '/app/test.parquet';
        $row1 = ['id' => 3, 'name' => 'Alice'];
        $row2 = ['id' => 4, 'name' => 'Fernando'];

        $writer->open($path, $schema);
        $writer->writeRow($row1);
        $writer->writeRow($row2);
        $writer->close();

The first execution works, but if I try to write to the same file again I get the exception below:

Flow\Parquet\Exception\InvalidArgumentException: File /app/test.parquet already exists

is it expected?

Note: in case it isn't the correct place to open this issue, could you move to the main repo please.

@eerison
Author

eerison commented Aug 14, 2024

I was using flow-php/parquet, but I just saw that there is a saveMode in the main repo.

I will check the code below:

data_frame()
    ->read(from_parquet(__DIR__ . '/orders_flow.parquet'))
    ->select('created_at', 'total_price', 'discount')
    ->withEntry('created_at', ref('created_at')->cast('date')->dateFormat('Y/m'))
    ->withEntry('revenue', ref('total_price')->minus(ref('discount')))
    ->select('created_at', 'revenue')
    ->groupBy('created_at')
    ->aggregate(sum(ref('revenue')))
    ->sortBy(ref('created_at')->desc())
    ->withEntry('daily_revenue', ref('revenue_sum')->round(lit(2))->numberFormat(lit(2)))
    ->drop('revenue_sum')
    ->write(to_output(truncate: false))
    ->withEntry('created_at', ref('created_at')->toDate('Y/m'))
    ->saveMode(overwrite())
    ->write(to_parquet(__DIR__ . '/daily_revenue.parquet'))
    ->run();

If it doesn't work, I will reopen the issue :)

@eerison eerison closed this as completed Aug 14, 2024
@norberttech
Member

hey @eerison
let me answer your question:

is it possible to edit the file?

No, Parquet files are immutable; they can't be edited directly. Instead, if you want to modify the file, you should read it, modify it on the fly, and save it somewhere else.
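
For the low-level Writer from the question, which refuses to open a path that already exists, one way to "save it somewhere else" and still end up with the same file name is to write to a fresh temporary path and then swap it into place. The following is a minimal sketch using only PHP built-ins; `replaceFile` is a hypothetical helper, not part of Flow, and the callback stands in for whatever code actually writes the Parquet file.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow): writes a new file next to $path
 * and then moves it over the original.
 *
 * $write receives a path that does not exist yet, so a writer that rejects
 * existing files can open it.
 */
function replaceFile(string $path, callable $write): void
{
    // Build a unique sibling path without creating the file.
    $tmp = $path . '.' . bin2hex(random_bytes(8)) . '.tmp';

    try {
        $write($tmp);

        // rename() replaces an existing destination file.
        if (!rename($tmp, $path)) {
            throw new RuntimeException("Could not replace {$path}");
        }
    } finally {
        // Clean up if the write or the rename failed.
        if (file_exists($tmp)) {
            unlink($tmp);
        }
    }
}
```

With the writer from the question, the callback would presumably call `$writer->open($tmp, $schema)`, write all rows (the old ones plus any changes), and call `$writer->close()`. The original file stays untouched until the new one is complete.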

The saveMode in Flow lets you overwrite an existing file, but not modify it.
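
As a rough sketch of that read, modify, and save flow, using only the DSL functions that already appear in the snippet earlier in this thread: the `use function` namespaces, the file names, and the `active` change are assumptions for illustration, so check them against the Flow version you have installed.

```php
<?php

declare(strict_types=1);

// Assumed namespaces for the DSL functions; verify against your Flow version.
use function Flow\ETL\Adapter\Parquet\{from_parquet, to_parquet};
use function Flow\ETL\DSL\{data_frame, lit, overwrite};

require __DIR__ . '/vendor/autoload.php';

data_frame()
    // Read the existing file; it is never modified in place.
    ->read(from_parquet(__DIR__ . '/test.parquet'))
    // Hypothetical modification: set "active" to true on every row.
    ->withEntry('active', lit(true))
    // Replace the destination file if a previous run already created it.
    ->saveMode(overwrite())
    // Write the result to a different path than the source.
    ->write(to_parquet(__DIR__ . '/test_modified.parquet'))
    ->run();
```

Writing to a separate destination keeps the source readable for the whole run; `overwrite()` only controls what happens when the destination already exists.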

Note: in case it isn't the correct place to open this issue, could you move to the main repo please.

No worries, here is fine. You can also join our Discord server, where you can get answers faster 😊 You can find the link in the website header: https://flow-php.com

@norberttech norberttech transferred this issue from flow-php/parquet Aug 15, 2024