Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array columns aren't properly parsed which causes exception in fetching results #68

Closed
kkraus14 opened this issue Jan 25, 2018 · 9 comments · Fixed by #183
Closed

Array columns aren't properly parsed which causes exception in fetching results #68

kkraus14 opened this issue Jan 25, 2018 · 9 comments · Fixed by #183
Labels
Milestone

Comments

@kkraus14
Copy link
Contributor

Minimal example to reproduce error:

from pymapd import connect
con = connect(user="mapd", password="HyperInteractive", host="localhost", dbname="mapd")

con.execute("DROP TABLE IF EXISTS keith_test_lists;")
con.execute("CREATE TABLE IF NOT EXISTS keith_test_lists (col1 TEXT, col2 INT[]);")

row = [("row1", "{10,20,30}"), ("row2", "{40,50,60}")]

con.load_table_rowwise("keith_test_lists", row)

con.execute("select * from keith_test_lists").fetchall()

yields:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/splunk/miniconda2/envs/splunk_mapd/lib/python2.7/site-packages/pymapd/cursor.py", line 170, in fetchall
    return list(self)
  File "/opt/splunk/miniconda2/envs/splunk_mapd/lib/python2.7/site-packages/pymapd/cursor.py", line 206, in make_row_results_set
    yield tuple(columns[j][i] for j in range(ncols))
  File "/opt/splunk/miniconda2/envs/splunk_mapd/lib/python2.7/site-packages/pymapd/cursor.py", line 206, in <genexpr>
    yield tuple(columns[j][i] for j in range(ncols))
IndexError: list index out of range

I'm guessing the issue is related to the following definitions not checking if the columns are array types: https://github.com/mapd/pymapd/blob/b52f92c47f4a192e8f33ef14a04327daa8893dd0/pymapd/_parsers.py#L36-L72

@kkraus14
Copy link
Contributor Author

I've been digging a bit and it looks like this issue isn't necessarily in the parsers but at the thrift level:

con.execute("select * from keith_test_lists")._result.row_set.columns

returns

[TColumn(data=TColumnData(int_col=[], real_col=[], str_col=['row1', 'row2']), nulls=[False, False]),
 TColumn(data=TColumnData(int_col=[], real_col=[], str_col=[]), nulls=[False, False])]

Whereas mapdql works as expected:

mapdql> select * from keith_test_lists;
col1|col2
row1|{10, 20, 30}
row2|{40, 50, 60}

@andrewseidl any ideas here or is this a wait for Thrift v0.11 to fix handling array types issue?

@andrewseidl
Copy link
Contributor

It's most likely related to Thrift v0.10.0's broken array/recursive struct handling.

Thankfully v0.11.0 is finally up on pypi, so updating pymapd to use that is next on my list.

@kkraus14
Copy link
Contributor Author

👍 I'll close this out once PyMapD is updated to thrift v0.11.0 and I confirm it's working as expected.

andrewseidl added a commit to andrewseidl/pymapd that referenced this issue Feb 5, 2018
Recursive structs are now supported by Thrift 0.11.0

Resolves heavyai#68
@andrewseidl
Copy link
Contributor

Just merged Thrift v0.11.0 support, but it looks like this is still an issue.

On the plus side the row_set's columns are now being filled in correctly:

[TColumn(data=TColumnData(int_col=[], real_col=[], str_col=['row1', 'row2'], arr_col=[]), nulls=[False, False]),
 TColumn(data=TColumnData(int_col=[], real_col=[], str_col=[], arr_col=[TColumn(data=TColumnData(int_col=[10, 20, 30], real_col=[], str_col=[], arr_col=[]), nulls=[False, False, False]), TColumn(data=TColumnData(int_col=[40, 50, 60], real_col=[], str_col=[], arr_col=[]), nulls=[False, False, False])]), nulls=[False, False])]

I'll take a look. Hoping to push out a new release soon with this resolved and #63 rebased and merged.

andrewseidl added a commit to andrewseidl/pymapd that referenced this issue Mar 3, 2018
andrewseidl added a commit to andrewseidl/pymapd that referenced this issue Mar 24, 2018
andrewseidl added a commit to andrewseidl/pymapd that referenced this issue Mar 24, 2018
@randyzwitch randyzwitch added the bug label Nov 7, 2018
@xmnlab
Copy link
Contributor

xmnlab commented Jan 2, 2019

is there any update about this issue?

@randyzwitch
Copy link
Contributor

I just merged in a related PR that handles arrays, will try and work on this today

@xmnlab
Copy link
Contributor

xmnlab commented Mar 7, 2019

hey @randyzwitch any update?

@randyzwitch
Copy link
Contributor

This isn't something I have the capacity to work on for at least another month, unfortunately.

@randyzwitch
Copy link
Contributor

@xmnlab Spent some time with ipdb, I now see why this is happening...of course, how to fix it is the bigger issue :)

@randyzwitch randyzwitch added this to the 0.9 milestone Mar 21, 2019
randyzwitch added a commit that referenced this issue Mar 22, 2019
Provides ability to return list columns within sql_execute. Closes #68
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants