New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework psycopg2.connect() interface. #363

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
5 participants
@a1exsh
Copy link
Contributor

a1exsh commented Oct 27, 2015

This might sound scary, but... I have some good reasons to move the psycopg2.connect() logic to the C level.

  1. (actually "0", but markdown is stupid :-p) This is meant to be 100% backwards compatible. All tests pass with Python 2.7 and 3.4.
  2. Adds new module function make_dsn that replaces the join('%s=%s') / _param_escape logic.
  3. Adds new (undocumented) module function parse_args that is used in _connect implementation, and in test_module to capture the arguments.
  4. The order of dbname/user/host/port after mapping to conninfo is not stable anymore so I had to replace some assertions of exact equality in tests. Hope this is not a huge problem.
  5. This is how I intend to use this new API to implement the ReplicationConnection class in C: zalando-stups@fbcf99a#diff-ae1fa330192a5b746297ae9317bbeae0R62
  6. As you can see, both parse_dsn and make_dsn are utilized. Currently, while this is not available yet I have to to duplicate the _param_escape logic in extras.py.

I've found it pity that the handy PyArg_ParseTupleAndKeywords can't cope with variable number of arguments, so I had to invent some magic in psyco_parge_args(). Basically, it pops any keys it is aware of from kwargs, then passes the rest to psyco_make_dsn().

@@ -109,12 +189,16 @@ psyco_connect(PyObject *self, PyObject *args, PyObject *keywds)
conn = PyObject_CallFunction(factory, "si", dsn, async);

This comment has been minimized.

@a1exsh

a1exsh Oct 28, 2015

Contributor

Personally, if this wouldn't break (presumably existing) connection subclasses, I'd just pass kwargs down here and let connection_init() run the psyco_parse_args() step. This would remove the need for _connect() function altogether and would let any connection class being instantiated directly w/o any actual need for connection_factory parameter, like:

conn = psycopg2.connection(dsn, **kwargs)
replconn = psycopg2.ReplicationConnection(dsn, replication_type=REPLICATION_LOGICAL, **kwargs)

This comment has been minimized.

@a1exsh

a1exsh Oct 29, 2015

Contributor

And actually, what would stop us from passing the key-value pairs straight to PQconnectdbParams() which is available since 9.0?

The dsn then will have to become a calculated property of course, so if requested we could still create a conninfo string, to use in repr(), for example, but if the user never asks for it, why bother making one?

This comment has been minimized.

@a1exsh

a1exsh Oct 29, 2015

Contributor

Sorry, I can't just stop :-p

Using PQconnectdbParams() would also allow us to lift the limitation of either dsn or keyword arguments that exists in current psycopg2.connect() implementation, because this function is able to do just that: take a dsn in either conninfo or URI form and apply any number of keyword arguments on top of it, to produce the actual set of connection parameters.

There is a problem with this approach, however: extracting the exact dsn string that was passed to connect function is no longer possible, because there is no such single string (well, we can still combine parse_dsn and make_dsn in creative ways...)

What is possible, is extracting the dsn that was used to establish the connection, using PQconninfo(), but it's only available from 9.3 and it is going to add a number of default values such as client_encoding=utf8. What it will also do is reveal any connection parameters that are set from environment variables, which might be a useful thing in itself. I think we should add support for PQconninfo() separately, regardless of the status of this patch.

This comment has been minimized.

@a1exsh

a1exsh Oct 30, 2015

Contributor

Please see #364 for PQconninfo() support.

This comment has been minimized.

@alexeyklyukin

alexeyklyukin Dec 21, 2015

Came across the issue of being unable to use dbname="host = localhost port=5432..." parameters in psycopg2, so +1 for passing the arguments from connect directly into PQconnectdbParams (with expand_dbname=True)


/* check for any whitespace or empty string */
if (from < end && *from) {
for (src = from; src < end && *src; ++src) {

This comment has been minimized.

@a1exsh

a1exsh Oct 28, 2015

Contributor

By the way, there seems to be a missing check on len in psycopg_escape_identifier_easy() just above this function: if len < strlen(from) it will try to go beyond provided len.

static PyObject *
parse_arg(int pos, char *name, PyObject *defval, PyObject *args, PyObject *kwargs)
{
Py_ssize_t nargs = PyTuple_GET_SIZE(args);

This comment has been minimized.

@alexeyklyukin

alexeyklyukin Dec 21, 2015

It seems you are calculating the size of the args tuple on each invocation of this function. It's called only from psyco_parse_args, which in itself calculates this value in the beginning and does not change it subsequently. Would it make sense to just pass it as a parameter?

This comment has been minimized.

@a1exsh

a1exsh Dec 21, 2015

Contributor

It wouldn't hurt to do so, but I'm not sure that would be a win. There is only one call site now, but I think it makes it easier to reuse this function if you don't require to pass nargs separately. The calculation itself is cheap of course.

if (nargs > 3) {
PyErr_Format(PyExc_TypeError,
"parse_args() takes at most 3 arguments (%d given)", (int)nargs);
goto exit;

This comment has been minimized.

@alexeyklyukin

alexeyklyukin Dec 21, 2015

Doesn't make much sense to call Py_XDECREF on NULL pointers. Also, I think there are too much goto-s in this function, basically, most of the code paths end with goto. I'd try to rewrite it with just conditionals, i.e

if (((dsn = parse_arg(0, ...)) != NULL) && ((factory = parse_arg(1, ...)) != NULL) && ...) {
/* continue with what you are doing /
}
/
do the cleanup */

This comment has been minimized.

@a1exsh

a1exsh Dec 21, 2015

Contributor

The thing is you still need to do the cleanup, and you don't know which of the objects were initialized, so there still would be some XDECREFS on NULLs. They don't hurt anyway.

The use of goto is an issue of 1) personal taste and 2) context. For one, I don't have problems with that (as long as it's forward goto for cleanup), and 2) there's plenty of harmless gotos around in this code base. :-)

This comment has been minimized.

@dvarrazzo

dvarrazzo Dec 22, 2015

Member

@alexeyklyukin, this function is better written with goto for error handling and cleanup than with deeply nested conditionals. The style used by @a1exsh is perfectly idiomatic for a Python C extension.

This comment has been minimized.

@a1exsh

a1exsh Dec 22, 2015

Contributor

@dvarrazzo Daniele, what do you think about this patch in general: does it make sense? Please see my latest comment about possible use for PQconnectdbParams: #363 (comment)
Thank you!

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Jan 4, 2016

@a1exsh sorry I couldn't look at the pull requests in the last days. I'll try and take a look soon.

@valgog

This comment has been minimized.

Copy link

valgog commented Feb 29, 2016

@dvarrazzo any news here? Did you have a chance to look into this?

dvarrazzo added a commit to dvarrazzo/psycopg that referenced this pull request Mar 3, 2016

Implementation of make_dsn in Python
This is equivalent to what proposed in psycopg#363, but with a much simpler
implementation.
@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Mar 3, 2016

I have been thinking about this branch, and I'm not convinced. The C implementation is way more complicated than a Python one, for no real benefit.

I have put together a pure Python implementation of make_dsn(), which is what needed in the replication branch for the dsn manipulation: please see my make_dsn branch. The replication connection can import the function from Python in its __init__().

Thanks to the use of parse_dsn() it is now possible to specify both a string dsn and keyword arguments, with the latter overriding the former.

I would prefer to close this merge request and use my implementation, but I'd like to have @a1exsh and/or @fogzot comment about that.

@a1exsh

This comment has been minimized.

Copy link
Contributor

a1exsh commented Mar 3, 2016

@dvarrazzo hm... but can I really import your Python-defined make_dsn from CPython (see: zalando-stups@fbcf99a#diff-ae1fa330192a5b746297ae9317bbeae0R115)? If so, I think that would solve my need and I'm OK with your approach.

Thank you for looking into this! :-)

@fogzot

This comment has been minimized.

Copy link
Member

fogzot commented Mar 3, 2016

It is not customary because C modules are supposed to be called from Python and not vice-versa, but yes, you can. In this case, where both C and Python are part of the same module I don't see anything against what @dvarrazzo proposes. Thumbs up!

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Mar 3, 2016

Yes, it's less frequent but it's no problem for the C part importing bits of Python: for some example grep for PyImport_ImportModule into the psycopg dir. For example take a look at cursor_type.c where we import psycopg2.tz. Python functions can be called from C using the PyObject_CallFunction* functions.

As an advice, don't cache the imported module but deal with it as a normal Python object, dereferencing it at the end of the function. Importing a module already imported is a fast operation; not releasing it creates all sort of problems in environments where the Python interpreter is reloaded, because the C library is not (Federico will remember a lot of pain around Decimal and mod_python...)

Thank you very much for your contribution and I apologise again for the long time taken to get back at your work: I've really appreciated your collaboration and even if this pull request didn't make in the library I think make_dsn/parse_dsn are a good addition, together with all your other Ideas.

All the best!

@dvarrazzo dvarrazzo closed this Mar 3, 2016

@a1exsh

This comment has been minimized.

Copy link
Contributor

a1exsh commented Mar 3, 2016

Yeah, I was already looking around and found these uses. Thank you, I will update the other PR soon (assuming your make_dsn branch is merged).

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Mar 3, 2016

I'm finishing the branch to get it merged in master soon

dvarrazzo added a commit that referenced this pull request Mar 3, 2016

Merge branch 'make_dsn'
Close issue #363 instead of the proposed merge request.
@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Mar 3, 2016

And it's in! :) You can use the above link to the merge commit to check documentation, features, tests etc.

@a1exsh

This comment has been minimized.

Copy link
Contributor

a1exsh commented Mar 3, 2016

Great, thanks! :-)

On Thu, Mar 3, 2016, 18:18 Daniele Varrazzo notifications@github.com
wrote:

And it's in! :) You can use the above link to the merge commit to check
documentation, features, tests etc.


Reply to this email directly or view it on GitHub
#363 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment