This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.
We occasionally run into an issue with our partitioned Hive Parquet tables when users add new columns in the middle of the table.
For example:
If new columns are added in the middle:
Attempting to read a partition that still has the older schema fails because Presto matches the column types of the partition and the table by index. Index 1 ends up being an integer in one and a boolean in the other, which results in an error (https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/HiveSplitManager.java#L299). The behavior is similar for structs: https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/HiveCoercionPolicy.java#L99
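To make the failure mode concrete, here is a minimal Python sketch of a positional type check. It is not Presto's code; the schemas, column names, and the `mismatches_by_index` helper are hypothetical and only illustrate how inserting a column in the middle shifts every later index.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, type name)


def mismatches_by_index(
    table: List[Column], partition: List[Column]
) -> List[Tuple[int, str, str]]:
    """Compare types position by position, ignoring column names.

    Returns (index, table_type, partition_type) for every position
    where the two schemas disagree.
    """
    return [
        (index, table_col[1], partition_col[1])
        for index, (table_col, partition_col) in enumerate(zip(table, partition))
        if table_col[1] != partition_col[1]
    ]


# Hypothetical schemas: the partition was written before "is_active"
# was inserted in the middle of the table definition.
partition_schema = [("id", "bigint"), ("count", "int")]
table_schema = [("id", "bigint"), ("is_active", "boolean"), ("count", "int")]

index_mismatches = mismatches_by_index(table_schema, partition_schema)
print(index_mismatches)
```

With these assumed schemas, the check reports a conflict at index 1 (boolean in the table, int in the partition) even though the `count` column itself never changed type.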
From what I understand, Hive performs this check based on column names for Parquet. Can we implement the same here? (I'm not sure what the correct behavior should be for formats like ORC, so we might need different checks for different formats.)
Along similar lines, see #8911. Once the flag `hive.parquet.use-column-names` is set, we use column names while reading Parquet data. This flag is currently not consulted during the schema check.
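A rough sketch of what the proposal could look like, written in Python for brevity rather than as a patch to Presto. The `check_schema` function and its `use_column_names` parameter are hypothetical stand-ins for the schema check and the `hive.parquet.use-column-names` setting. Treating a table column that is absent from the partition as compatible is an assumption of this sketch, not confirmed behavior.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, type name)


def check_schema(
    table: List[Column], partition: List[Column], use_column_names: bool
) -> List[str]:
    """Return a list of type-mismatch messages between table and partition."""
    errors: List[str] = []
    if use_column_names:
        # Name-based check: look up each table column in the partition by name.
        partition_types = dict(partition)
        for name, table_type in table:
            partition_type = partition_types.get(name)
            # Assumption: a column missing from the partition is not an error.
            if partition_type is not None and partition_type != table_type:
                errors.append(f"column '{name}': {table_type} vs {partition_type}")
    else:
        # Index-based check: compare types position by position.
        for index, ((_, table_type), (_, partition_type)) in enumerate(
            zip(table, partition)
        ):
            if table_type != partition_type:
                errors.append(f"index {index}: {table_type} vs {partition_type}")
    return errors


# Hypothetical schemas: "is_active" was added in the middle of the table.
partition_schema = [("id", "bigint"), ("count", "int")]
table_schema = [("id", "bigint"), ("is_active", "boolean"), ("count", "int")]

by_name = check_schema(table_schema, partition_schema, use_column_names=True)
by_index = check_schema(table_schema, partition_schema, use_column_names=False)
print(by_name, by_index)
```

Under these assumptions the name-based path accepts the older partition, while the index-based path rejects it. Whether the same rule is appropriate for ORC is left open, as noted above.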