New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add Support for ODBC Connection #999
Conversation
Seems a bit weird for CircleCI test result. It said, AttributeError: module 'ibis.impala' has no attribute 'connect' I don't touch anything inside |
I will try to review this today. I'll also investigate why the build isn't passing. |
The error you're seeing is usually related to imports. There might be a circular import or a missing dependency, though master is passing so it would be strange if there was a missing dependency. |
@napjon, it looks like you didn't write your patch on top of master. Can you rebase on top of master and push again? |
I hope I do the right rebase. Let me know if I still do the same thing. |
ibis/impala/api.py
Outdated
argument is None, then certificate validation is skipped. | ||
user : string, LDAP user to authenticate | ||
password : string, LDAP password to authenticate | ||
auth_mechanism : string, {'NOSASL' <- default, 'PLAIN', 'GSSAPI', 'LDAP'}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@napjon These docstrings are old. You need to remove these changes, but keep your odbc_connect
and other changes. git add -p <file>
is very good for this as it allows you stage and commit the things you want and then discard (by using git checkout <file>
) the things you don't want.
@cpcloud: ok, I think it's because my first fork about 6 months ago, back then from cloudera. I've tried to resync my master repo, but git still said nothing changes. I'll solve this. |
e2d9728
to
2793cbf
Compare
Seems I correct the changes now. @cpcloud: could you please review it again? |
@napjon Reviewing now. |
ibis/impala/client.py
Outdated
for name, chunks in czip(names, czip(*[b.columns for b in batches])): | ||
cols[name] = _chunks_to_pandas_array(chunks) | ||
return pd.DataFrame(cols, columns=names) | ||
try: #Give Space for turbodbc type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you have quite a few lint errors: https://circleci.com/gh/pandas-dev/ibis/128?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
I've found it helpful to integrate flake8
with my editor so that every time I save, flake8
will run and report any errors in the editor. Then, I can fix them up right away. Most common editors (vim, emacs, and pycharm) support this kind of integration.
ibis/impala/api.py
Outdated
|
||
params = {'con_string': connection_string, | ||
'dsn':dsn, | ||
'turbodbc_options': turbodbc_options} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation here is inconsistent.
ibis/impala/api.py
Outdated
ImpalaClient | ||
""" | ||
|
||
params = {'con_string': connection_string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just pass these directly instead of putting them in a dict
and then **
-ing them in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. though I just following code style of connect
ibis/impala/client.py
Outdated
@@ -26,6 +26,8 @@ | |||
import numpy as np | |||
import pandas as pd | |||
|
|||
import turbodbc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need a new entry into setup.py
. You can add a new pip extra called 'odbc'
, like this:
odbc_extras = ['turbodbc']
setup(
...
'odbc': odbc_extras
...
)
@napjon A few changes are needed before we can merge this. If you run |
@cpcloud: I do believe something has to be tidied up for the changes, so need some guidance here about odbc only in impala module. But I guess it's okay for now to merge. Let me know if there is any correction that needed to be changed. And I also encounter some error when using this Suppose I have one table with this schema, t = ibis.table(ibis.Schema(['i'], ['int8']), name='held') I got error when executing this statement, t.i.mean().execute() Called KeyError, 'mean' not in dictionary. But simple filtering and aggregation works. Our cluster is going to be restarted this weekend, so I can't reproduce and show you what is exactly the error. Perhaps I should install the docker ibis, and see from there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments on formatting.
ibis/impala/client.py
Outdated
@@ -145,7 +145,7 @@ def _get_cursor(self): | |||
raise com.InternalError('Too many concurrent / hung queries') | |||
else: | |||
if (cursor.database != self.database or | |||
cursor.options != self.options): | |||
cursor.options != self.options): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave this formatting as is.
ibis/impala/client.py
Outdated
]) | ||
column for column in expr.columns | ||
if column not in partition_schema_names | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave this formatting as is.
@napjon I think we need to separate things a bit here so that we don't require users to install What do you think about the following directory structure and API?
import ibis
con = ibis.odbc.impala.connect(...) |
@cpcloud: Isn't That sounds reasonable. I'll move around to create impala.py and see from there. |
@napjon it is in extras but anyone using the Impala client will run the code that imports |
@cpcloud: ah yes, I forgot |
…-connection Merge with upstream master
Changes:
Caveat: turbodbc arrow are not exist in windows. if arrow not exist or windows os, we can create exception to use fetchnumpy instead. |
@napjon So it looks like your builds are failing because you're missing the OS ODBC package and So, you need to make the following changes.
sudo apt-get -qq install clang libboost-dev graphviz with sudo apt-get -qq install clang libboost-dev graphviz unixodbc-dev unixodbc-bin unixodbc
|
@napjon It's fine that we're missing |
It seems not everything have to result to pandas dataframe. Should've separate columnar like in original. |
@cpcloud: What kind of tests do we want have here? I think it's more of a connection test or do you want to have other type of queries? |
What's the state of this? I could use odbc with clickhouse too. |
I'm waiting for docker update to include impala odbc connector. @cpcloud? |
@@ -40,13 +40,15 @@ | |||
kerberos_requires = ['requests-kerberos'] | |||
visualization_requires = ['graphviz'] | |||
pandas_requires = ['multipledispatch'] | |||
turbodbc_requires = ['pybind11', 'pyarrow', 'turbodbc>=2.0.0'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is pybind11
necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is when I tested it back then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as turbodbc
does not ship wheels on all platforms, you might need it with some setuptools
versions (older ones). Note that there is the extra-requirement [arrow]
so that turbodbc[arrow]
will already pull in pyarrow
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay got it.
@cpcloud Agree, odbc drivers live on the client side. |
The simplest case would be to create conda forge packages for impala and/or clickhouse odbc drivers. |
@kszucs and @cpcloud : forgive my ignorance. I thought the docker act as a client and server. So the next step should be;
Should we use odbc.ini or embed it in the test script? @kszucs: I'm not familiar with creating conda-forge, could you create it? |
@cpcloud: So I've configured turbodbc to connect to docker-impala (codingtony/impala). However, there's no data to test for simple queries. Should I suppose to use "cpcloud86/impala:java8-1" instead? |
@napjon Yep, you can use that, however it won't have data. But, you can use the |
Got it, thanks for the pointer. |
Hi @cpcloud, I've read the circleci/config.yml and use docker cpcloud86/impala:java8-1.
The docker loaded until I've seen the response
There are some error about Postgres, but I think it's not relevant for impala. Finally the step I execute,
This gave me some error,
I've seen the solution is to upgrade impala, which I don't believe to be the problem. Could you give me some pointer to where I should look into? |
hey @napjon, what is the status of this PR? there are some files with conflict. could you rebase your code? |
@xmnlab: sorry for late reply. I currently don't have the machine to do this anymore, and maybe for a while until I get the correct setup. If I recall the issue is configuring unit test to be able to download odbc driver programmatically without asking for license. |
Closing this as stale. @napjon let me know if at some point you can work in this again. Thanks! |
Closes #985
Some caveats:
_column_batches_to_dataframe
to make a space for turbodbccc @mariusvniekerk
nthreads option
unit tests