Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework psycopg2.connect() interface. #363

Closed
wants to merge 1 commit into from

Conversation

a1exsh
Copy link
Contributor

@a1exsh a1exsh commented Oct 27, 2015

This might sound scary, but... I have some good reasons to move the psycopg2.connect() logic to the C level.

  1. (actually "0", but markdown is stupid :-p) This is meant to be 100% backwards compatible. All tests pass with Python 2.7 and 3.4.
  2. Adds new module function make_dsn that replaces the join('%s=%s') / _param_escape logic.
  3. Adds new (undocumented) module function parse_args that is used in _connect implementation, and in test_module to capture the arguments.
  4. The order of dbname/user/host/port after mapping to conninfo is not stable anymore so I had to replace some assertions of exact equality in tests. Hope this is not a huge problem.
  5. This is how I intend to use this new API to implement the ReplicationConnection class in C: zalando-stups@fbcf99a#diff-ae1fa330192a5b746297ae9317bbeae0R62
  6. As you can see, both parse_dsn and make_dsn are utilized. Currently, while this is not available yet I have to to duplicate the _param_escape logic in extras.py.

I've found it pity that the handy PyArg_ParseTupleAndKeywords can't cope with variable number of arguments, so I had to invent some magic in psyco_parge_args(). Basically, it pops any keys it is aware of from kwargs, then passes the rest to psyco_make_dsn().

@@ -109,12 +189,16 @@ psyco_connect(PyObject *self, PyObject *args, PyObject *keywds)
conn = PyObject_CallFunction(factory, "si", dsn, async);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, if this wouldn't break (presumably existing) connection subclasses, I'd just pass kwargs down here and let connection_init() run the psyco_parse_args() step. This would remove the need for _connect() function altogether and would let any connection class being instantiated directly w/o any actual need for connection_factory parameter, like:

conn = psycopg2.connection(dsn, **kwargs)
replconn = psycopg2.ReplicationConnection(dsn, replication_type=REPLICATION_LOGICAL, **kwargs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually, what would stop us from passing the key-value pairs straight to PQconnectdbParams() which is available since 9.0?

The dsn then will have to become a calculated property of course, so if requested we could still create a conninfo string, to use in repr(), for example, but if the user never asks for it, why bother making one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I can't just stop :-p

Using PQconnectdbParams() would also allow us to lift the limitation of either dsn or keyword arguments that exists in current psycopg2.connect() implementation, because this function is able to do just that: take a dsn in either conninfo or URI form and apply any number of keyword arguments on top of it, to produce the actual set of connection parameters.

There is a problem with this approach, however: extracting the exact dsn string that was passed to connect function is no longer possible, because there is no such single string (well, we can still combine parse_dsn and make_dsn in creative ways...)

What is possible, is extracting the dsn that was used to establish the connection, using PQconninfo(), but it's only available from 9.3 and it is going to add a number of default values such as client_encoding=utf8. What it will also do is reveal any connection parameters that are set from environment variables, which might be a useful thing in itself. I think we should add support for PQconninfo() separately, regardless of the status of this patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #364 for PQconninfo() support.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came across the issue of being unable to use dbname="host = localhost port=5432..." parameters in psycopg2, so +1 for passing the arguments from connect directly into PQconnectdbParams (with expand_dbname=True)

static PyObject *
parse_arg(int pos, char *name, PyObject *defval, PyObject *args, PyObject *kwargs)
{
Py_ssize_t nargs = PyTuple_GET_SIZE(args);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you are calculating the size of the args tuple on each invocation of this function. It's called only from psyco_parse_args, which in itself calculates this value in the beginning and does not change it subsequently. Would it make sense to just pass it as a parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't hurt to do so, but I'm not sure that would be a win. There is only one call site now, but I think it makes it easier to reuse this function if you don't require to pass nargs separately. The calculation itself is cheap of course.

@dvarrazzo
Copy link
Member

@a1exsh sorry I couldn't look at the pull requests in the last days. I'll try and take a look soon.

@valgog
Copy link

valgog commented Feb 29, 2016

@dvarrazzo any news here? Did you have a chance to look into this?

dvarrazzo added a commit to dvarrazzo/psycopg2 that referenced this pull request Mar 3, 2016
This is equivalent to what proposed in psycopg#363, but with a much simpler
implementation.
@dvarrazzo
Copy link
Member

I have been thinking about this branch, and I'm not convinced. The C implementation is way more complicated than a Python one, for no real benefit.

I have put together a pure Python implementation of make_dsn(), which is what needed in the replication branch for the dsn manipulation: please see my make_dsn branch. The replication connection can import the function from Python in its __init__().

Thanks to the use of parse_dsn() it is now possible to specify both a string dsn and keyword arguments, with the latter overriding the former.

I would prefer to close this merge request and use my implementation, but I'd like to have @a1exsh and/or @fogzot comment about that.

@a1exsh
Copy link
Contributor Author

a1exsh commented Mar 3, 2016

@dvarrazzo hm... but can I really import your Python-defined make_dsn from CPython (see: zalando-stups@fbcf99a#diff-ae1fa330192a5b746297ae9317bbeae0R115)? If so, I think that would solve my need and I'm OK with your approach.

Thank you for looking into this! :-)

@fogzot
Copy link
Member

fogzot commented Mar 3, 2016

It is not customary because C modules are supposed to be called from Python and not vice-versa, but yes, you can. In this case, where both C and Python are part of the same module I don't see anything against what @dvarrazzo proposes. Thumbs up!

@dvarrazzo
Copy link
Member

Yes, it's less frequent but it's no problem for the C part importing bits of Python: for some example grep for PyImport_ImportModule into the psycopg dir. For example take a look at cursor_type.c where we import psycopg2.tz. Python functions can be called from C using the PyObject_CallFunction* functions.

As an advice, don't cache the imported module but deal with it as a normal Python object, dereferencing it at the end of the function. Importing a module already imported is a fast operation; not releasing it creates all sort of problems in environments where the Python interpreter is reloaded, because the C library is not (Federico will remember a lot of pain around Decimal and mod_python...)

Thank you very much for your contribution and I apologise again for the long time taken to get back at your work: I've really appreciated your collaboration and even if this pull request didn't make in the library I think make_dsn/parse_dsn are a good addition, together with all your other Ideas.

All the best!

@dvarrazzo dvarrazzo closed this Mar 3, 2016
@a1exsh
Copy link
Contributor Author

a1exsh commented Mar 3, 2016

Yeah, I was already looking around and found these uses. Thank you, I will update the other PR soon (assuming your make_dsn branch is merged).

@dvarrazzo
Copy link
Member

I'm finishing the branch to get it merged in master soon

dvarrazzo added a commit that referenced this pull request Mar 3, 2016
Close issue #363 instead of the proposed merge request.
@dvarrazzo
Copy link
Member

And it's in! :) You can use the above link to the merge commit to check documentation, features, tests etc.

@a1exsh
Copy link
Contributor Author

a1exsh commented Mar 3, 2016

Great, thanks! :-)

On Thu, Mar 3, 2016, 18:18 Daniele Varrazzo notifications@github.com
wrote:

And it's in! :) You can use the above link to the merge commit to check
documentation, features, tests etc.


Reply to this email directly or view it on GitHub
#363 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants