pandas.DataFrame support #1

Closed
ghuname opened this issue Oct 29, 2020 · 7 comments

@ghuname

ghuname commented Oct 29, 2020

@long2ice
Do you plan to support pandas DataFrames, which are heavily used by data scientists?
It would be very nice to be able to select directly into a DataFrame.
Is such a feature on your roadmap?

@long2ice
Owner

No. What is the relation between asynch and pandas? What does it mean to select directly into a DataFrame?

@ghuname
Author

ghuname commented Oct 31, 2020

Well, we are talking here about the ClickHouse database and how to access it asynchronously.
Most of the time, we will be selecting data from the database.
When you select data from a database, you need some structure to hold the result.
In Python, there is really no alternative to the pandas DataFrame for this purpose if you need to do something further with the data (data wrangling, machine learning...).

At the moment I am using clickhouse_driver with its DB API connection and the pandas.read_sql function (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html).

As aioch doesn't have a DB API connection, I am kind of stuck with that approach.
I will try to do the same with your asynch driver. I hope it will work.

@long2ice
Owner

long2ice commented Nov 1, 2020

Well, did you try it? I have never used pandas, and asynch also supports DB API.

@ghuname
Author

ghuname commented Nov 1, 2020

I tried this:

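import asyncio
from asynch import connect

async def main():
    # Connect to a local ClickHouse server
    conn = await connect(
        host="127.0.0.1",
        port=9001,
        database="default",
    )

    async with conn.cursor() as cursor:
        await cursor.execute("SELECT 1")
        ret = cursor.fetchone()
        print(ret)

asyncio.run(main())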

I got (1,) as a result, but where are the column types? As far as I can see, you are using with_column_types=True in response = await execute(query, args=args, with_column_types=True, **execute_kwargs), but you are not returning them.
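
To illustrate what "returning the column types" could look like, here is a minimal sketch. It assumes the response has the shape that clickhouse_driver documents for with_column_types=True, namely a pair of (rows, [(column_name, column_type), ...]); whether asynch's internal response matches that shape is an assumption, and `split_response` and the sample value are hypothetical.

```python
def split_response(response):
    """Split a (rows, columns_with_types) pair into rows, names, and types.

    Assumes the clickhouse_driver-style shape for with_column_types=True;
    this helper is hypothetical and not part of asynch.
    """
    rows, columns_with_types = response
    names = [name for name, _ in columns_with_types]
    types = [ch_type for _, ch_type in columns_with_types]
    return rows, names, types


# Hand-written sample in the assumed shape (not captured from a real server).
sample_response = ([(1,)], [("1", "UInt8")])
sample_rows, sample_names, sample_types = split_response(sample_response)
```

With names and types available separately, a caller could label the columns of whatever structure holds the rows.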

Anyway, I hoped that the following would work, but it doesn't:

import asyncio
from asynch import connect
import pandas as pd
from jinjasql import JinjaSql

async def main():
    conn = await connect(
        host="127.0.0.1",
        port=9001,
        database="default",
    )

    jsql = JinjaSql(param_style='pyformat')

    sql_templ = 'select 1'
    params = {}

    query, bind_params = jsql.prepare_query(sql_templ, params)
    df = pd.read_sql_query(query, conn, params=bind_params)  # I also tried df = await pd.read_sql(...), but that didn't work either

    print(df)

asyncio.run(main())

RuntimeWarning: coroutine 'Cursor.execute' was never awaited
  cur.execute(*args, **kwargs)

It looks like you would need to create and close the cursor on the fly in the background for this kind of usage.
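
A minimal sketch of why the warning above appears, using only the standard library. `FakeAsyncCursor`, `sync_caller`, and `async_caller` are hypothetical stand-ins (not asynch or pandas code): a synchronous caller, as pandas is here, invokes `execute` without `await`, so it gets back a coroutine object and the query never runs.

```python
import asyncio
import inspect


class FakeAsyncCursor:
    """Hypothetical stand-in for an async DB cursor (not the asynch API)."""

    def __init__(self):
        self.executed = []

    async def execute(self, query):
        # This body only runs once the coroutine is awaited.
        self.executed.append(query)
        return 1


def sync_caller(cursor, query):
    """Mimics a synchronous library calling cursor.execute() without await."""
    result = cursor.execute(query)  # creates a coroutine; nothing runs yet
    is_coroutine = inspect.iscoroutine(result)
    result.close()  # close it explicitly so this demo does not emit the warning
    return is_coroutine


async def async_caller(cursor, query):
    """Awaiting the coroutine is what actually runs the query."""
    return await cursor.execute(query)


cursor = FakeAsyncCursor()
got_coroutine = sync_caller(cursor, "SELECT 1")
executed_after_sync = list(cursor.executed)
awaited_result = asyncio.run(async_caller(cursor, "SELECT 1"))
```

After the synchronous call, `executed_after_sync` is still empty; only the awaited call records the query.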

@ghuname
Author

ghuname commented Nov 5, 2020

@long2ice, can you please comment on my findings?

@long2ice
Owner

long2ice commented Nov 6, 2020

Does pandas support asyncio?

@ghuname
Author

ghuname commented Nov 7, 2020

No
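
Since pandas is synchronous, one possible workaround is to await the query first and build the DataFrame afterwards from the fetched rows. The sketch below is only an illustration under assumptions: it assumes a cursor whose `execute` is awaitable and whose `fetchall` is a plain method (as `fetchone` is in the snippet earlier in this thread). `FakeCursor`, `fetch_rows`, and `to_dataframe` are hypothetical names, and the column names are passed in by hand because this thread does not establish how asynch exposes them.

```python
import asyncio


class FakeCursor:
    """Hypothetical stand-in so the sketch runs without a ClickHouse server."""

    def __init__(self, rows):
        self._rows = rows

    async def execute(self, query):
        # A real driver would send the query to the server here.
        return None

    def fetchall(self):
        return list(self._rows)


async def fetch_rows(cursor, query):
    """Await the query, then return the rows as a list of tuples."""
    await cursor.execute(query)
    return cursor.fetchall()


def to_dataframe(rows, column_names):
    """Build the DataFrame synchronously, once all awaiting is done."""
    import pandas as pd  # imported lazily; only this step needs pandas

    return pd.DataFrame(rows, columns=column_names)


fake = FakeCursor([(1, "a"), (2, "b")])
fetched = asyncio.run(fetch_rows(fake, "SELECT id, name FROM t"))
# With pandas installed: df = to_dataframe(fetched, ["id", "name"])
```

This keeps the awaiting in the async part and leaves pandas with plain Python data, instead of handing the async connection to pd.read_sql_query.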
