Parquet-formatted columns using VARCHAR(n) won't appear in Presto 0.138 #4592

Closed
marklit opened this Issue Feb 20, 2016 · 4 comments

marklit commented Feb 20, 2016

According to the 0.137 release notes, Presto now supports the VARCHAR(n) data type. However, when I describe a table that uses this data type, the VARCHAR(n) columns do not appear. Here are the steps I took to reproduce the problem.

$ hive --version
Hive 1.0.0
Subversion git://0b4842e27402/ws/output/hive/hive-1.0.0 -r 3bfb9712a6abf61b5bf0099e45b9a116df0e1d69
Compiled by root on Wed Aug 12 14:58:18 UTC 2015
From source with checksum 7d322f95990d4d850634912b56c27996
$ hive
CREATE TABLE trips_parquet (
    trip_id                 INT,
    vendor_id               VARCHAR(3),
    pickup_datetime         TIMESTAMP,
    dropoff_datetime        TIMESTAMP,
    store_and_fwd_flag      VARCHAR(1),
    rate_code_id            SMALLINT,
    pickup_longitude        DOUBLE,
    pickup_latitude         DOUBLE,
    dropoff_longitude       DOUBLE,
    dropoff_latitude        DOUBLE,
    passenger_count         SMALLINT,
    trip_distance           DOUBLE,
    fare_amount             DOUBLE,
    extra                   DOUBLE,
    mta_tax                 DOUBLE,
    tip_amount              DOUBLE,
    tolls_amount            DOUBLE,
    ehail_fee               DOUBLE,
    improvement_surcharge   DOUBLE,
    total_amount            DOUBLE,
    payment_type            VARCHAR(3),
    trip_type               SMALLINT,
    pickup                  VARCHAR(50),
    dropoff                 VARCHAR(50),

    cab_type                VARCHAR(6),

    precipitation           SMALLINT,
    snow_depth              SMALLINT,
    snowfall                SMALLINT,
    max_temperature         SMALLINT,
    min_temperature         SMALLINT,
    average_wind_speed      SMALLINT,

    pickup_nyct2010_gid     SMALLINT,
    pickup_ctlabel          VARCHAR(10),
    pickup_borocode         SMALLINT,
    pickup_boroname         VARCHAR(13),
    pickup_ct2010           VARCHAR(6),
    pickup_boroct2010       VARCHAR(7),
    pickup_cdeligibil       VARCHAR(1),
    pickup_ntacode          VARCHAR(4),
    pickup_ntaname          VARCHAR(56),
    pickup_puma             VARCHAR(4),

    dropoff_nyct2010_gid    SMALLINT,
    dropoff_ctlabel         VARCHAR(10),
    dropoff_borocode        SMALLINT,
    dropoff_boroname        VARCHAR(13),
    dropoff_ct2010          VARCHAR(6),
    dropoff_boroct2010      VARCHAR(7),
    dropoff_cdeligibil      VARCHAR(1),
    dropoff_ntacode         VARCHAR(4),
    dropoff_ntaname         VARCHAR(56),
    dropoff_puma            VARCHAR(4)
) STORED AS parquet;
$ ~/presto-server-0.138/bin/launcher start
$ ./presto --version
Presto CLI 0.138
$ ./presto --server localhost:8080 --catalog hive --schema default
presto:default> desc trips_parquet;
        Column         |   Type    | Comment
-----------------------+-----------+---------
 trip_id               | bigint    |
 pickup_datetime       | timestamp |
 dropoff_datetime      | timestamp |
 rate_code_id          | bigint    |
 pickup_longitude      | double    |
 pickup_latitude       | double    |
 dropoff_longitude     | double    |
 dropoff_latitude      | double    |
 passenger_count       | bigint    |
 trip_distance         | double    |
 fare_amount           | double    |
 extra                 | double    |
 mta_tax               | double    |
 tip_amount            | double    |
 tolls_amount          | double    |
 ehail_fee             | double    |
 improvement_surcharge | double    |
 total_amount          | double    |
 trip_type             | bigint    |
 precipitation         | bigint    |
 snow_depth            | bigint    |
 snowfall              | bigint    |
 max_temperature       | bigint    |
 min_temperature       | bigint    |
 average_wind_speed    | bigint    |
 pickup_nyct2010_gid   | bigint    |
 pickup_borocode       | bigint    |
 dropoff_nyct2010_gid  | bigint    |
 dropoff_borocode      | bigint    |
(29 rows)

Query 20160220_134400_00016_karjv, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [29 rows, 2.1KB] [267 rows/s, 19.4KB/s]
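
One way to confirm exactly which columns are dropped is to diff the column names in the Hive DDL against the names Presto returns. The following is a minimal Python sketch, not part of the original report: the helper names (`ddl_columns`, `describe_columns`, `missing_columns`) are hypothetical, and it assumes one column per line in the DDL and the pipe-separated layout of the Presto CLI output shown above. The sample inputs are shortened stand-ins for the full table.

```python
import re


def ddl_columns(ddl):
    """Return column names from a simple CREATE TABLE statement.

    Assumes one column definition per line between the outermost parentheses.
    """
    body = ddl[ddl.index("(") + 1 : ddl.rindex(")")]
    names = []
    for line in body.splitlines():
        # A column line looks like "<name> <TYPE>[(n)][,]".
        match = re.match(r"\s*(\w+)\s+\w+", line)
        if match:
            names.append(match.group(1))
    return names


def describe_columns(output):
    """Return column names from pipe-separated DESCRIBE output."""
    names = []
    for line in output.splitlines():
        if "|" not in line:
            continue  # skips the separator row and the "(N rows)" footer
        name = line.split("|")[0].strip()
        if name and name != "Column":  # skip the header row
            names.append(name)
    return names


def missing_columns(ddl, output):
    """Columns declared in the DDL that are absent from the DESCRIBE output."""
    seen = set(describe_columns(output))
    return [name for name in ddl_columns(ddl) if name not in seen]


# Shortened stand-ins for the DDL and the Presto output in this report.
ddl = """CREATE TABLE trips_parquet (
    trip_id     INT,
    vendor_id   VARCHAR(3),
    cab_type    VARCHAR(6)
) STORED AS parquet;"""

describe = """        Column | Type   | Comment
---------------+--------+---------
 trip_id       | bigint |
(1 row)"""

print(missing_columns(ddl, describe))  # ['vendor_id', 'cab_type']
```

Run against the full DDL and output, a check like this should list only the VARCHAR(n) columns if the report is accurate.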

marklit commented Feb 20, 2016

For the record I tried to change each VARCHAR(n) column to a STRING column and they still wouldn't appear in Presto.

marklit changed the title from "Parquet columns using VARCHAR(n) won't appear in Presto 0.138" to "Parquet-formatted columns using VARCHAR(n) won't appear in Presto 0.138" Feb 20, 2016

Member

kbajda commented Feb 23, 2016

@marklit: By "tried to change", do you mean that you reloaded the data into a new table, or that you just created a new external table with modified data types pointing to the same Parquet files?

marklit commented Feb 23, 2016

I created a new table and loaded the data into it from the CSV table.

Member

kbajda commented Feb 23, 2016

We believe this PR will fix it: #3092

findepi closed this Jun 12, 2018
