Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): Add support for async SQLAlchemy connections to read_database #15162

Merged

Conversation

alexander-beedie
Copy link
Collaborator

@alexander-beedie alexander-beedie commented Mar 19, 2024

Ref: https://stackoverflow.com/questions/77166897

Database connectivity continues to expand; with this PR we can now seamlessly init a DataFrame from async SQLAlchemy Connection, Engine, and Session objects.

Added async unit test coverage using aiosqlite (SQLite), but additionally validated the PR locally using asyncpg (PostgreSQL) and aioodbc (ODBC) drivers.

Examples

SQL Server over ODBC (async driver):

from sqlalchemy.ext.asyncio import create_async_engine
import polars as pl

odbc_string = "Driver={ODBC Driver 17 for SQL Server};Server=localhost"
async_conn = create_async_engine(
    f"mssql+aioodbc:///?odbc_connect={odbc_string}"
)
pl.read_database(
    query = "SELECT optname,value FROM master.dbo.MSreplication_options",
    connection = async_conn,
)
# shape: (3, 2)
# ┌────────────────┬───────┐
# │ optname        ┆ value │
# │ ---            ┆ ---   │
# │ str            ┆ bool  │
# ╞════════════════╪═══════╡
# │ transactional  ┆ true  │
# │ merge          ┆ true  │
# │ security_model ┆ true  │
# └────────────────┴───────┘

SQLite (async driver):

async_conn = create_async_engine("sqlite+aiosqlite:///chinook.db")

pl.read_database( 
    query = """
      SELECT l.Title AS album_name, r.Name AS artist
      FROM albums l INNER JOIN artists r ON r.ArtistId = l.ArtistId
      ORDER BY l.Title LIMIT 4
    """,
    connection = async_conn,
)
# shape: (4, 2)
# ┌────────────────────────────────┬────────────────────────────────┐
# │ album_name                     ┆ artist                         │
# │ ---                            ┆ ---                            │
# │ str                            ┆ str                            │
# ╞════════════════════════════════╪════════════════════════════════╡
# │ ...And Justice For All         ┆ Metallica                      │
# │ 20th Century Masters - The Mi… ┆ Scorpions                      │
# │ A Copland Celebration, Vol. I  ┆ Aaron Copland & London Sympho… │
# │ A Matter of Life and Death     ┆ Iron Maiden                    │
# └────────────────────────────────┴────────────────────────────────┘

Also

The "execute" method of the internal ConnectionExecutor abstraction was getting a bit long, so factored out the SQLAlchemy setup block into its own method for clarity. Next commit will likely be a larger-scale (and overdue ;) refactor of all the database code into its own "io.database" subdirectory to facilitate easier maintenance.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Mar 19, 2024
@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Mar 19, 2024

Looks like some slight differences in the latest async behaviour; the new unit tests pass on Python 3.8 and 3.11, but not 3.12 - will investigate 🤔

@alexander-beedie alexander-beedie force-pushed the sqlalchemy-async-support branch 2 times, most recently from 107e159 to 8c4fb0c Compare March 19, 2024 14:25
@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Mar 19, 2024

... will investigate

Got it; all tests passing on all platforms now 😎

Copy link

codecov bot commented Mar 19, 2024

Codecov Report

Attention: Patch coverage is 80.95238% with 16 lines in your changes are missing coverage. Please review.

Project coverage is 81.18%. Comparing base (2250055) to head (0502c8d).
Report is 2 commits behind head on main.

❗ Current head 0502c8d differs from pull request most recent head 35307a8. Consider uploading reports for the commit 35307a8 to get more accurate results

Files Patch % Lines
py-polars/polars/io/database.py 81.81% 10 Missing and 4 partials ⚠️
py-polars/polars/datatypes/convert.py 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15162   +/-   ##
=======================================
  Coverage   81.18%   81.18%           
=======================================
  Files        1347     1347           
  Lines      175423   175467   +44     
  Branches     2506     2516   +10     
=======================================
+ Hits       142410   142449   +39     
- Misses      32533    32536    +3     
- Partials      480      482    +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

Just one comment as an aside: I usually extract private functions below the main function, following the "newspaper style" proposed by uncle Bob. I notice you usually go the other way (there's also something to be said for that).

I often find myself moving functions around when refactoring, which is not ideal for getting clean git diffs/history. Are you very attached to your current modus operandi?

@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Mar 21, 2024

I often find myself moving functions around when refactoring, which is not ideal for getting clean git diffs/history. Are you very attached to your current modus operandi?

I guess I think the other way round - if you're going to use something, it should already have been defined, so you will know it's available and have seen it coming (rather than using a newspaper analogy I find it more like a house; you don't build a house from the top-floor down, you build it from the foundations up 🤣)

However, I'm about to make that difference largely go away here as my follow-up PR (already ready to go!) refactors the database code into a clean io/database/... structure, with utility functions/classes all separated into their own private modules, which I suspect we both prefer, rather than having everything inlined in one large file! 👍

@alexander-beedie alexander-beedie merged commit cb3efbc into pola-rs:main Mar 21, 2024
33 checks passed
@alexander-beedie alexander-beedie deleted the sqlalchemy-async-support branch March 21, 2024 08:25
@stinodego
Copy link
Member

Sounds good!

@alexander-beedie
Copy link
Collaborator Author

Sounds good!

Here we go: #15201
Much more manageable/organised 😎

@stinodego
Copy link
Member

@alexander-beedie I just got a spurious failure in the CI, I think it might be related to the async test added in this PR as it looks async-ish. What do you think?
https://github.com/pola-rs/polars/actions/runs/8372713122/job/22924229816?pr=14765

@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Mar 21, 2024

@alexander-beedie I just got a spurious failure in the CI, I think it might be related to the async test added in this PR as it looks async-ish. What do you think?

Will have a look in a few minutes!

@WilliamStam
Copy link

WilliamStam commented May 31, 2024

after installing "nest_asyncio" i finally got the engine to work with it. but due to reasons of setting variables and schema etc first i need to use the "session" instead. i cant get it working with the session. pls sir.. can i have some help - puppy dog eyes -

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io-database Area: reading/writing to databases enhancement New feature or an improvement of an existing feature python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants