Parquet-formatted columns using VARCHAR(n) won't appear in Presto 0.138 #4592

Closed
marklit opened this Issue Feb 20, 2016 · 4 comments

marklit commented Feb 20, 2016

According to the 0.137 release notes, Presto now supports the VARCHAR(n) data type. However, when I describe a Parquet table that uses this data type in Presto 0.138, the VARCHAR(n) columns do not appear in the output. Here are the steps I took to reproduce the problem.

$ hive --version
Hive 1.0.0
Subversion git://0b4842e27402/ws/output/hive/hive-1.0.0 -r 3bfb9712a6abf61b5bf0099e45b9a116df0e1d69
Compiled by root on Wed Aug 12 14:58:18 UTC 2015
From source with checksum 7d322f95990d4d850634912b56c27996
$ hive
CREATE TABLE trips_parquet (
    trip_id                 INT,
    vendor_id               VARCHAR(3),
    pickup_datetime         TIMESTAMP,
    dropoff_datetime        TIMESTAMP,
    store_and_fwd_flag      VARCHAR(1),
    rate_code_id            SMALLINT,
    pickup_longitude        DOUBLE,
    pickup_latitude         DOUBLE,
    dropoff_longitude       DOUBLE,
    dropoff_latitude        DOUBLE,
    passenger_count         SMALLINT,
    trip_distance           DOUBLE,
    fare_amount             DOUBLE,
    extra                   DOUBLE,
    mta_tax                 DOUBLE,
    tip_amount              DOUBLE,
    tolls_amount            DOUBLE,
    ehail_fee               DOUBLE,
    improvement_surcharge   DOUBLE,
    total_amount            DOUBLE,
    payment_type            VARCHAR(3),
    trip_type               SMALLINT,
    pickup                  VARCHAR(50),
    dropoff                 VARCHAR(50),

    cab_type                VARCHAR(6),

    precipitation           SMALLINT,
    snow_depth              SMALLINT,
    snowfall                SMALLINT,
    max_temperature         SMALLINT,
    min_temperature         SMALLINT,
    average_wind_speed      SMALLINT,

    pickup_nyct2010_gid     SMALLINT,
    pickup_ctlabel          VARCHAR(10),
    pickup_borocode         SMALLINT,
    pickup_boroname         VARCHAR(13),
    pickup_ct2010           VARCHAR(6),
    pickup_boroct2010       VARCHAR(7),
    pickup_cdeligibil       VARCHAR(1),
    pickup_ntacode          VARCHAR(4),
    pickup_ntaname          VARCHAR(56),
    pickup_puma             VARCHAR(4),

    dropoff_nyct2010_gid    SMALLINT,
    dropoff_ctlabel         VARCHAR(10),
    dropoff_borocode        SMALLINT,
    dropoff_boroname        VARCHAR(13),
    dropoff_ct2010          VARCHAR(6),
    dropoff_boroct2010      VARCHAR(7),
    dropoff_cdeligibil      VARCHAR(1),
    dropoff_ntacode         VARCHAR(4),
    dropoff_ntaname         VARCHAR(56),
    dropoff_puma            VARCHAR(4)
) STORED AS parquet;
$ ~/presto-server-0.138/bin/launcher start
$ ./presto --version
Presto CLI 0.138
$ ./presto --server localhost:8080 --catalog hive --schema default
presto:default> desc trips_parquet;
        Column         |   Type    | Comment
-----------------------+-----------+---------
 trip_id               | bigint    |
 pickup_datetime       | timestamp |
 dropoff_datetime      | timestamp |
 rate_code_id          | bigint    |
 pickup_longitude      | double    |
 pickup_latitude       | double    |
 dropoff_longitude     | double    |
 dropoff_latitude      | double    |
 passenger_count       | bigint    |
 trip_distance         | double    |
 fare_amount           | double    |
 extra                 | double    |
 mta_tax               | double    |
 tip_amount            | double    |
 tolls_amount          | double    |
 ehail_fee             | double    |
 improvement_surcharge | double    |
 total_amount          | double    |
 trip_type             | bigint    |
 precipitation         | bigint    |
 snow_depth            | bigint    |
 snowfall              | bigint    |
 max_temperature       | bigint    |
 min_temperature       | bigint    |
 average_wind_speed    | bigint    |
 pickup_nyct2010_gid   | bigint    |
 pickup_borocode       | bigint    |
 dropoff_nyct2010_gid  | bigint    |
 dropoff_borocode      | bigint    |
(29 rows)

Query 20160220_134400_00016_karjv, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
0:00 [29 rows, 2.1KB] [267 rows/s, 19.4KB/s]
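
To make the gap between the Hive DDL and the Presto `desc` output easier to see, the two listings can be compared mechanically. The following is a minimal Python sketch, not part of the original report: the helper names (`ddl_columns`, `described_columns`, `missing_columns`) and the shortened sample inputs are hypothetical, and the parsing assumes one column per line in the DDL and the pipe-delimited layout shown above.

```python
import re
from typing import Dict, List, Set, Tuple

# Shortened, hypothetical samples in the same layout as the transcript above.
SAMPLE_DDL = """CREATE TABLE trips_parquet (
    trip_id    INT,
    vendor_id  VARCHAR(3),
    cab_type   VARCHAR(6)
) STORED AS parquet;"""

SAMPLE_DESC = """
  Column  |  Type  | Comment
----------+--------+---------
 trip_id  | bigint |
(1 row)
"""


def ddl_columns(ddl: str) -> Dict[str, str]:
    """Map column name -> declared type, assuming one column per DDL line."""
    body = ddl[ddl.index("(") + 1 : ddl.rindex(")")]
    columns = {}
    for line in body.splitlines():
        match = re.match(r"\s*(\w+)\s+([A-Za-z]+(?:\(\d+\))?)\s*,?\s*$", line)
        if match:
            columns[match.group(1).lower()] = match.group(2).upper()
    return columns


def described_columns(desc_output: str) -> Set[str]:
    """Collect column names from the pipe-delimited table printed by the CLI."""
    names = set()
    for line in desc_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip separator rows, the "(N rows)" footer, and the header row.
        if len(cells) >= 2 and cells[0] and cells[0] != "Column":
            names.add(cells[0].lower())
    return names


def missing_columns(ddl: str, desc_output: str) -> List[Tuple[str, str]]:
    """Columns declared in the DDL that are absent from the desc output."""
    seen = described_columns(desc_output)
    return [(name, typ) for name, typ in ddl_columns(ddl).items() if name not in seen]


if __name__ == "__main__":
    for name, typ in missing_columns(SAMPLE_DDL, SAMPLE_DESC):
        print(name, typ)
```

Pasting the full `CREATE TABLE` statement and the full `desc trips_parquet` output in place of the samples would list every declared column that Presto did not report, along with its Hive type.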

marklit commented Feb 20, 2016

For the record, I also tried changing each VARCHAR(n) column to a STRING column, and those columns still did not appear in Presto.

@marklit changed the title from "Parquet columns using VARCHAR(n) won't appear in Presto 0.138" to "Parquet-formatted columns using VARCHAR(n) won't appear in Presto 0.138" on Feb 20, 2016

kbajda commented Feb 23, 2016

@marklit: By "tried to change", do you mean that you reloaded the data into a new table, or that you created a new external table with modified data types pointing to the same Parquet files?

marklit commented Feb 23, 2016

I created a new table and loaded the data into it from the CSV table.

kbajda commented Feb 23, 2016

We believe this PR will fix it: #3092

@findepi closed this on Jun 12, 2018
