Hi, I am trying to read Parquet files that are stored in S3 and were generated by a Python script.
I get the following error:
Error: thrown: "invalid parquet version"
When I read a similar file that was generated by Spark, the reader parses it without problems.
I can also open the Python-generated file in a Parquet viewer.
Any idea why? The file uses Parquet format version 2 (see `format_version` in the metadata).
File metadata:
file written by pyarrow 11.0.0
created_by: parquet-cpp-arrow version 11.0.0
num_columns: 6
num_rows: 42
num_row_groups: 1
format_version: 2.6
serialized_size: 3975
Full error:
(node:41711) V8: /Users/saritvakrat/Documents/automation/be_automation/node_modules/brotli/build/encode.js:34 Linking failure in asm.js: Unexpected stdlib member
(Use `node --trace-warnings ...` to show where the warning was created)
console.error
Error parsing Parquet file: invalid parquet version
39 | return records;
40 | } catch (error) {
> 41 | console.error('Error parsing Parquet file:', error);
| ^
42 | throw error; // Rethrow the error to be handled by the caller
43 | }
44 | }
Packages:
"parquetjs": "^0.11.2",
"@types/parquetjs": "^0.10.6",
My function:
```typescript
// Imports inferred from the identifiers used below.
import { ParquetReader } from 'parquetjs';

export async function parseParquetFile(filePath: string): Promise<any[]> {
  try {
    // Create a new ParquetReader
    const reader = await ParquetReader.openFile(filePath) as any;
    // Create a new cursor
    const cursor = reader.getCursor();
    const records = [];
    // Read all records from the file
    let record = await cursor.next();
    while (record !== null) {
      records.push(record);
      record = await cursor.next();
    }
    await reader.close();
    return records;
  } catch (error) {
    console.error('Error parsing Parquet file:', error);
    throw error; // Rethrow the error to be handled by the caller
  }
}
```

```typescript
// Imports inferred from the identifiers used below.
import { GetObjectCommand } from '@aws-sdk/client-s3';
import { createWriteStream } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';
import { Readable } from 'stream';
import { pipeline } from 'stream/promises';

// Method of the class that owns this.s3Client
async parseSingleParquetFromS3(bucketName: string, key: string | null | undefined): Promise<any[]> {
  if (!bucketName || !key) {
    throw new Error('Bucket name or key is not provided');
  }
  const getObjectCommand = new GetObjectCommand({
    Bucket: bucketName,
    Key: key
  });
  let objectResponse;
  try {
    objectResponse = await this.s3Client.send(getObjectCommand);
  } catch (error) {
    console.error(`Error fetching object from S3: ${error}`);
    throw error;
  }
  const objectData = objectResponse.Body;
  if (!(objectData instanceof Readable)) {
    throw new Error('Object data is not a readable stream');
  }
  // Download the object to a temporary file, then parse it from disk
  const fileName = key.split('/').pop() || 'temp.parquet';
  const tempFilePath = join(tmpdir(), fileName);
  try {
    await pipeline(objectData, createWriteStream(tempFilePath));
    return await parseParquetFile(tempFilePath);
  } catch (error) {
    console.error(`Error streaming data to file or parsing it: ${error}`);
    throw error;
  }
}
```
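Since the object is streamed from S3 into a temporary file before it is parsed, one thing that can be ruled out cheaply is a truncated or damaged download. Below is a minimal sketch that relies only on the Parquet file layout (the 4-byte ASCII magic `PAR1` at both the start and the end of the file). `looksLikeParquet` and `fileLooksLikeParquet` are hypothetical helper names for illustration, not part of parquetjs.

```typescript
import { readFileSync } from 'node:fs';

// A Parquet file begins and ends with the 4-byte ASCII magic "PAR1".
const PARQUET_MAGIC = 'PAR1';

// Smallest possible layout: leading magic + 4-byte footer length + trailing magic.
const MIN_PARQUET_SIZE = 12;

// Hypothetical helper: checks the magic bytes at both ends of an in-memory file.
export function looksLikeParquet(bytes: Buffer): boolean {
  if (bytes.length < MIN_PARQUET_SIZE) {
    return false;
  }
  const head = bytes.subarray(0, 4).toString('latin1');
  const tail = bytes.subarray(bytes.length - 4).toString('latin1');
  return head === PARQUET_MAGIC && tail === PARQUET_MAGIC;
}

// Hypothetical helper: same check for a file on disk.
// Reads the whole file into memory, which is fine for small files like the one here.
export function fileLooksLikeParquet(filePath: string): boolean {
  return looksLikeParquet(readFileSync(filePath));
}
```

Calling `fileLooksLikeParquet(tempFilePath)` right after the `pipeline(...)` call would show whether the downloaded file is structurally intact. If it returns `true` for the pyarrow-generated file, the download itself is probably not the problem, and the error more likely comes from how the reader interprets the file metadata.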