Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python codegen support #923

Merged
merged 2 commits into from Mar 5, 2021
Merged

Conversation

robholt
Copy link
Contributor

@robholt robholt commented Mar 3, 2021

I've come to really like using sqlc for Go projects and wanted something similar for Python. There's a few existing Python libraries that load queries from sql files, but they lack the type information, models, and query validation that sqlc can do. Since sqlc already has support for multiple languages I figured adding Python support should be fairly straight forward, which turned out to be accurate.

So basically this PR adds support for generating fully type-hinted Python code. I modeled my changes after the existing Kotlin support, although I think it ended up looking at lot more like the Go codegen. There's a bunch of code duplication between the Python and Go codegen, but I think that's unavoidable until some kind of general codegen interface can be developed. I wanted to keep my changes limited to just adding Python support.

The generated code relies on a small runtime library: sqlc-python-runtime. This keeps the generated code simple and allows for one function to support synchronous and asynchronous executors.

What's working:

  • postgresql
  • All the projects under examples/ (including tests I've added for them) and the features they use
  • Dates
  • Json fields
  • Type overrides

What doesn't work:

  • mysql (I focused on postgresql because the project I'm using this on is postgresql only, but mysql support could definitely be added)
  • json/db tag options (not a thing for Python)
  • The emit_prepared_queries option (although the asyncpg executor automatically prepares and caches queries)

As far as additional testing goes, I've started using this in an existing project with a couple dozen queries against a moderately complicated schema and it's been great so far.

What's different:

  • Like JDBC in Kotlin/Java, psycopg2 doesn't support numbered parameters. However asyncpg does, so the regex hack the Kotlin codegen uses is applied at runtime for only psycopg2. I'm planning on investigating psycopg2s named parameters as another potential solution.
  • The generated query functions aren't part of a class or anything, so their first parameter is the database connection object. This made multiple python query files (instead of just one big one) and the sync/async support much simpler.

Thanks again for sqlc! Hopefully this is something you're interested in for sqlc.

@kyleconroy
Copy link
Collaborator

Before I get into the actual code, I just wanted to express how happy I am to see pull requests like this. I've wanted sqlc to support more than Go and Kotlin for a long time, and I think Python is a great addition. I'm happy to merge this as-s behind an --experimental flag. I want to get some people using it before we commit to the structure of the generated code.

Can you tell me more about how much time you're looking to spend on Python support? If you'd just like to contribute the initial implementation, I'm happy to use it. If you're looking to maintain it, there are a number of changes to be made before I'd be comfortable marking it as stable.

I'll leave an initial review and we discuss from there.

Copy link
Collaborator

@kyleconroy kyleconroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Go implementation, I've tried to generate code that doesn't rely on external packages. This isn't always possible, but in most cases the generated code just uses the standard library.

With that in mind, I don't think we should use pydantic or a custom sqlc-runtime package. Instead, I'd like to use dataclasses for models and SQLAlchemy for the runtime.

SQLAlchemy solves the issue of supporting multiple database engines. It also gives us a consistent interface across engines. The ORM mode is too high-level for our use case, but we can rely on the core package. This also allows people already using SQLAlchemy and easy way to integrate the code.

Lastly, I'd prefer the classes and methods over bare functions. The Go code was initially modeled after gRPC services. gRPC services in Python look like this. Ignoring the obvious style issues, I think simple classes like this are easier to pass around and mock out.

Comment on lines +7 to +9
# Enums

# Models
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't generate this comments, they're mainly just noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, these are leftover from my initial single mega-file implementation, I'll remove them.

Comment on lines +37 to +48
@overload
def create_author(conn: sqlc.Connection, name: str, bio: Optional[str]) -> Optional[models.Author]:
pass


@overload
def create_author(conn: sqlc.AsyncConnection, name: str, bio: Optional[str]) -> Awaitable[Optional[models.Author]]:
pass


def create_author(conn: sqlc.GenericConnection, name: str, bio: Optional[str]) -> sqlc.ReturnType[Optional[models.Author]]:
return conn.execute_one_model(models.Author, CREATE_AUTHOR, name, bio)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never used or seen the @overload operator. How common is it? Are there other ways to expose a similar sync / async function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how common it is, I'm using it here so a single function can be used with both sync and async engines and type checkers properly understand it. It's definitely something I cooked up myself though, I haven't seen this specific pattern for sync+async support anywhere else. Without it I'd have to define separate sync and async functions, and with bare functions I definitely didn't want to just prefix (or suffix) every function with async_.

However, if the query functions are moved into classes it's a lot simpler to just define two classes, one with sync functions and another with async functions, no method prefix/suffix is needed since they are contained within separate classes. The generation of async (or sync) query classes can also be placed behind a config option, so users who don't care about async (or sync) support don't get that generated code.

Comment on lines +31 to +34
LIST_AUTHORS = """-- name: list_authors :many
SELECT id, name, bio FROM authors
ORDER BY name
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to generate the query strings closed to the functions themselves so it's easier to read. If I'm looking at list_authors, I can then see the query it's referencing without having to jump around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's nice to keep them close together, but how would this work if the query functions are in a class? I could define these constants as class properties above each method, but I can't say I've seen other Python code that does that, class properties usually go at the top of the class definition.

examples/python/src/booktest/models.py Show resolved Hide resolved
@robholt
Copy link
Contributor Author

robholt commented Mar 4, 2021

I'm definitely on-board for maintaining Python support. I'm not totally sold on the current state of the generated code, so I'm happy to work on improvements.

Concerning pydantic: One of the use cases I had in mind was using sqlc + FastAPI to create a simple REST API (this is also what the asyncio support is for), directly using the sqlc generated models for the API. I certainly see the merit in having zero external dependencies for the generated code though. What about an option to generate pydantic models (off by default)? To the query code dataclasses and pydantic models will be identical, so an option should be very simple.

Concerning Sqlalchemy: I considered it initially, but didn't want to require it for the generated code to work. I went though a few iterations since then though, so I'm not sure that reasoning makes sense anymore, especially since using it could eliminate the runtime package. The only potential problem is asyncio support was only added in 1.4, which is still in beta and has some breaking changes. The non-asyncio code won't require any 1.4 features though, so it shouldn't be a big deal. I think I will switch to it, that also simplifies parameter handling (I can just change $1 to :p1 in the generated code).

Classes vs functions: Wrapping the queries up into a class would make unit test mocks (and assertions they're called) easier. I can change that so each query function is a class method. I do want to stick to a file/class per .sql file though, my first implementation generated a single file with a single Queries mega-class and I think that got unwieldy quick.

@kyleconroy
Copy link
Collaborator

I'm definitely on-board for maintaining Python support

Great! We can discuss the rest of the changes in future PRs. I'm going to merge in the support as-is and then put it behind an experimental command line flag. Sound good?

@robholt
Copy link
Contributor Author

robholt commented Mar 4, 2021

Yeah, that sounds good to me

@kyleconroy kyleconroy merged commit 3f131b7 into sqlc-dev:master Mar 5, 2021
victoraugustolls pushed a commit to Streppel/sqlc that referenced this pull request May 6, 2021
* Initial python codegen support
* Add python tests CI job
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants