[BUG] Cannot open Parquet file with 2 similar column names (different case) #68
Comments
I gave this a shot, but it turns out DataTables are case-insensitive when it comes to column names, so it's not possible to show two fields whose names differ only by case. For now I've added logic to gracefully exclude duplicate fields from the output. It's not ideal, but at least the utility won't crash when opening such files. Give it a shot here if you get the chance: https://github.com/mukunku/ParquetViewer/releases/tag/v2.5.1 I'll leave this ticket open, since the original issue hasn't been solved and it should be possible, albeit difficult, to handle case-sensitive field names.
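For readers following along, the "exclude duplicate fields" workaround can be sketched roughly as below. This is an illustrative Python sketch, not the code shipped in v2.5.1 (ParquetViewer itself is a .NET application); the function name drop_case_insensitive_duplicates is hypothetical, and keeping the first occurrence of each name is an assumption.

```python
def drop_case_insensitive_duplicates(field_names):
    """Keep the first field for each case-insensitive name; collect the rest.

    Hypothetical helper: mimics a container that treats "Value" and
    "value" as the same column name.
    """
    seen = set()
    kept, skipped = [], []
    for name in field_names:
        key = name.casefold()  # case-insensitive comparison key
        if key in seen:
            skipped.append(name)  # would collide with an earlier column
        else:
            seen.add(key)
            kept.append(name)
    return kept, skipped


kept, skipped = drop_case_insensitive_duplicates(["id", "Value", "value", "ID"])
print(kept)     # ['id', 'Value']
print(skipped)  # ['value', 'ID']
```

Returning the skipped names separately would let a viewer warn the user about which fields were left out instead of dropping them silently.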
So the issue is not with you but with the underlying library you are using to parse Parquet files? I can open a bug report there. I'll test the fix, but indeed it's a workaround...
@MCRE-BE The issue is with the data structure the app is using to store the data in memory. It doesn't support multiple columns with the same name because it's built to be case insensitive. In your original bug report you mentioned:
Is this a legitimate use case for your workflow, or was it a mistake and you don't normally have the same column names with different casing? If this isn't a normal use case, maybe just gracefully warning the user of the problem is a sufficient solution here:
For me it was a mistake. So for me it's a sufficient solution, but it might not be for others 🙄 But thanks for the fix 😄 I guess you can't change the column names easily (like appending a _x suffix)? That's how pandas solves the issue in its dataframes.
Appending a suffix might be the only way to handle these, but it's not straightforward. Might not be worth investing time if it's such a rare use case. Let's see if anyone else needs this kind of support. If demand increases I can take a look.
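To illustrate why the suffix approach is not entirely straightforward, here is a minimal Python sketch. The function name add_suffixes and the _1, _2 numbering scheme are assumptions made for illustration; nothing like this exists in ParquetViewer as far as this thread shows. The tricky part is that a generated name can itself collide with a real column.

```python
def add_suffixes(field_names):
    """Rename fields so that all names are unique case-insensitively.

    Hypothetical helper: the first occurrence keeps its name, later
    collisions get a numeric suffix that is bumped until it is free.
    """
    taken = set()
    result = []
    for name in field_names:
        candidate = name
        counter = 1
        # A suffixed name may clash with another column, so keep trying.
        while candidate.casefold() in taken:
            candidate = f"{name}_{counter}"
            counter += 1
        taken.add(candidate.casefold())
        result.append(candidate)
    return result


print(add_suffixes(["value", "Value", "VALUE", "value_1"]))
# ['value', 'Value_1', 'VALUE_2', 'value_1_1']
```

Note that the genuine column value_1 ends up renamed because an earlier generated name already claimed it, which is the kind of edge case that makes this harder than it first looks.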
Parquet Viewer Version
What version of Parquet Viewer are you experiencing the issue with?
2.4.2.0
Where was the parquet file created?
pyarrow
Sample File
Example.zip
Describe the bug
I believe the bug comes from having two column names that are equal when viewed as lowercase.
I can open the file in pyarrow/python, not in ParquetViewer.
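A quick way to check a list of column names for this kind of collision, as a hedged sketch: the helper name find_case_collisions is hypothetical, and the input is assumed to be a plain list of strings (for example, the names taken from the file's schema in pyarrow).

```python
from collections import defaultdict


def find_case_collisions(column_names):
    """Group column names that become equal once lowercased."""
    groups = defaultdict(list)
    for name in column_names:
        groups[name.lower()].append(name)
    # Only groups with more than one member are actual collisions.
    return {key: names for key, names in groups.items() if len(names) > 1}


print(find_case_collisions(["Date", "date", "amount"]))
# {'date': ['Date', 'date']}
```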
Screenshots
Additional context
The similar column names are a bug in my code, but they should not make the program crash.
Note: This tool relies on the parquet-dotnet library for all the actual Parquet processing. So any issues where that library cannot process a parquet file will not be addressed by us. Please open a ticket on that library's repo to address such issues.