datastore plugin's permission checks will fail on a non english environnement #642

domoritz · 2013-03-26T08:49:05Z

ckanext/datastore/plugin.py uses the error messages returned by PostgreSQL to check whether the permissions are right, for example:

if 'permission denied' in str(e) or 'read-only transaction' in str(e):
    pass

and

if 'permission denied' not in str(e):
    raise

This will fail if PostgreSQL is running in a non-English localized environment, since it seems that PostgreSQL errors are translated (verified with PostgreSQL 8.4 and 9.1 on release-v2.0).
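
One locale-independent alternative to matching message text is to compare the five-character SQLSTATE code, which PostgreSQL does not translate. This is only a sketch and not what this PR ended up doing. It assumes the driver is psycopg2, which exposes the code as `pgcode` on its exceptions, and that SQLAlchemy keeps the driver exception on `.orig`; the helper name `is_permission_error` is hypothetical.

```python
# SQLSTATE codes from the PostgreSQL error-code appendix; unlike the
# message text, they are identical in every locale.
INSUFFICIENT_PRIVILEGE = '42501'
READ_ONLY_SQL_TRANSACTION = '25006'


def is_permission_error(exc):
    """Return True if exc signals missing privileges or a read-only transaction.

    Assumes a psycopg2-style exception with a ``pgcode`` attribute, possibly
    wrapped by SQLAlchemy (which stores the driver exception on ``.orig``).
    """
    driver_exc = getattr(exc, 'orig', exc)
    code = getattr(driver_exc, 'pgcode', None)
    return code in (INSUFFICIENT_PRIVILEGE, READ_ONLY_SQL_TRANSACTION)
```

With this, the two checks quoted above would become `if is_permission_error(e): pass` and `if not is_permission_error(e): raise`.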

…tead of raising an exception. If postgres is set to a language other than English, some strings might not occur in the error message returned from the database. This change makes the checks less strict, but in almost all cases this should not be a problem because the only errors raised while executing the permission check statements are (expected) permission errors.

tobes · 2013-03-26T10:50:38Z

@domoritz can you assign this to review if ready

tobes · 2013-03-26T14:13:13Z

@domoritz I thought you assigned this to @TomDunham

domoritz · 2013-03-26T15:06:12Z

@tobes That was #686

tobes · 2013-03-26T15:13:20Z

@domoritz

this does not seem to deal with the actual issue
your messages need spaces which are missing in many cases

vitorbaptista · 2013-03-27T01:47:37Z

@domoritz I've created a patch in https://github.com/vitorbaptista/ckan/commit/eb92e3d837bc55e451c18eab7b2057e6e160a899 using Postgres' has_database_privilege instead of relying on the string returned. This should work in any locale (and works in all postgres versions we support).

I haven't sent a pr yet because that commit changes the behavior as well. I'm not raising if the DB is writable, but simply returning False. This might not be the correct behavior, but it felt cleaner. What do you think?

domoritz · 2013-03-27T09:34:46Z

@tobes, @vitorbaptista

We have two places where we need to check the permissions of the database.

The first is where we ask whether the database is read only (https://github.com/okfn/ckan/blob/master/ckanext/datastore/plugin.py#L109). This could be the case when someone uses a slave db that does not allow editing in case the normal db is not accessible. The check only makes sure that we don't raise an exception if you have this read-only set up.

The second place is https://github.com/okfn/ckan/blob/master/ckanext/datastore/plugin.py#L144 where we check whether we can create or update tables. This check is far more important because if this check fails, it is possible to write data to the database through the datastore sql API. Basically, if this check fails, the sql interface allows writing and not only querying.

I think @vitorbaptista solution looks very promising. We can probably use has_table_privilege() for the second case. I'll mark this pr as [WIP] and ping you later.

domoritz · 2013-03-27T09:37:03Z

@vitorbaptista The behaviour of _is_read_only_database is to return a bool. It only raises an issue if the exception is not related to permissions at all.

domoritz · 2013-03-27T09:58:48Z

@vitorbaptista has_database_privilege with CREATE means the privilege to create a schema within a database, not a table in the database. We have to use has_schema_privilege (see b68601d).
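
The distinction can be sketched with PostgreSQL's access-privilege inquiry functions `has_schema_privilege(user, schema, privilege)` and `has_table_privilege(user, table, privilege)`. This is an illustration, not the code from b68601d: it assumes a DB-API cursor that uses `%s` placeholders (as psycopg2 does), the `public` schema, and the `public._foo` test table mentioned in this PR. The helper names are hypothetical.

```python
def can_create_tables(cursor, user, schema='public'):
    """True if `user` may create objects (for example tables) in `schema`."""
    cursor.execute(
        "SELECT has_schema_privilege(%s, %s, 'CREATE')", (user, schema))
    return bool(cursor.fetchone()[0])


def can_write_table(cursor, user, table='public._foo'):
    """True if `user` holds any of INSERT, UPDATE or DELETE on `table`."""
    cursor.execute(
        "SELECT has_table_privilege(%s, %s, 'INSERT, UPDATE, DELETE')",
        (user, table))
    return bool(cursor.fetchone()[0])


def read_user_is_read_only(cursor, user):
    """A read-only user must pass neither of the two checks above."""
    return not (can_create_tables(cursor, user)
                or can_write_table(cursor, user))
```

Because the answer comes back as a boolean column rather than an error message, the result does not depend on the server's locale.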

vitorbaptista · 2013-03-27T18:40:39Z

Hi @domoritz

@vitorbaptista has_database_privilege means the privilege to create a schema within a database, not a table in the database. We have to use has_schema_privilege (see b68601d).

Are you sure? Could you point me to where I could read this?

vitorbaptista · 2013-03-27T18:46:51Z

ckanext/datastore/plugin.py

        A table is created by the write user to test the read only user.
        '''
        write_connection = db._get_engine(None,
            {'connection_url': self.write_url}).connect()
-        write_connection.execute(u"DROP TABLE IF EXISTS public._foo;"
-            u"CREATE TABLE public._foo (id INTEGER, name VARCHAR)")
+        write_connection.execute(


It might be better to use a transaction/rollback, as we used to do in _is_read_only_database(). It makes no difference now, but IMHO it makes the code a bit cleaner.

It would be much cleaner, indeed. Sadly this does not work here because we have two independent connections.

domoritz · 2013-03-28T11:33:23Z

Hey @vitorbaptista

Are you sure? Could you point me to where I could read this?

http://www.postgresql.org/docs/8.1/static/sql-grant.html and then read what it says about the CREATE privilege.

"For databases, allows new schemas to be created within the database."

I put the check for the legacy mode in this function to make it testable.

domoritz · 2013-03-28T12:20:32Z

@vitorbaptista I assigned you to this. Are you happy to review this?

domoritz · 2013-03-28T16:29:53Z

Hmm, I just noticed that we have thousands of places in https://github.com/okfn/ckan/blob/master/ckanext/datastore/db.py where we look at the error messages. I think it would take quite some time to replace all of them with a localisation independent solution. I guess, we have to add a note to the docs. Nonetheless, this pr is still good since it cleans up the configuration process.

tobes · 2013-03-28T16:36:22Z

@domoritz yes maybe create a fresh issue to do this for 2.1/2.2

domoritz · 2013-03-28T16:42:05Z

Created issue #718

vitorbaptista · 2013-03-28T17:43:07Z

ckanext/datastore/plugin.py

+                    log.warn("Legacy mode active. "
+                             "The sql search will not be available.")
+                elif not self._read_connection_has_correct_privileges():
+                    if 'debug' in self.config and self.config['debug']:


Not a big deal for this, just a matter of dev style, but something like if self.config.get('debug'): feels cleaner to me.
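
A small sketch of the two forms, assuming `self.config` behaves like a plain dict (the function names are made up for illustration):

```python
def debug_enabled_verbose(config):
    # Form used in the diff: explicit membership test, then lookup.
    return 'debug' in config and config['debug']


def debug_enabled(config):
    # Suggested form: dict.get returns None for a missing key, which is falsy.
    return config.get('debug')
```

Both agree whenever the result is used in an `if`. Note that a string value such as 'false' is truthy in either form, so neither parses configuration strings.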

vitorbaptista · 2013-03-28T17:49:40Z

@domoritz I'll be glad to :-)

I've added a few comments.

domoritz · 2013-03-28T18:04:26Z

@vitorbaptista Thanks. I've changed the code.

vitorbaptista · 2013-03-28T22:04:18Z

ckanext/datastore/plugin.py

-        if not self.legacy_mode:
-            if self.write_url == self.read_url:
-                raise Exception("The write and read-only database connection url are the same.")
-
        if self._get_db_from_url(self.ckan_url) == self._get_db_from_url(self.read_url):


This isn't important, but if you're changing other stuff, you might change this at the same time as well. Change this to simply return self._get_db_from_url(self.ckan_url) == self._get_db_from_url(self.read_url)

vitorbaptista · 2013-03-29T00:14:47Z

Hey @domoritz,

I like your idea of creating the handler, so you can reduce duplication. I'm not very keen on the naming: it's too general. Maybe something more explicit like log_or_raise, or something similar. Also, I saw that you changed the method name to _check_urls_and_permissions, which is much better, and are receiving the handler as a parameter. This kind of dependency injection is useful for less-dynamic languages, so we're able to test. But in Python we don't need it. You could create a method like

def _log_or_raise(self, message):
    if self.config.get('debug'):
        log.critical(message)
    else:
        raise DatastoreException(message)

and use that in _check_urls_and_permissions, which won't need any parameter now. In the test, you could overwrite it, so it always raised the exception. Then you could test the following cases:

Variables	Test 1	Test 2	Test 3	Test 4	Test 5
legacy_mode	False	True	False	False	False
_same_read_and_write_url()	False	True	True	False	False
_same_ckan_and_datastore_db()	False	False	False	True	False
_read_connection_has_correct_privileges()	True	True	True	True	False
RESULT	OK	OK	RAISE	RAISE	RAISE

This doesn't test all the possibilities, but I guess it's good enough. I've put in bold the variable that I'm testing in each case, all others are there just so I can see its effects in the code. Right now, you're just testing cases 3 and 4. It's also not clear why sqlalchemy.exc.OperationalError might be thrown, and why you're ignoring it.

Obviously, this would only make sense if we also had tests for each of these methods, as we're stubbing them. But I guess it'll make the tests easier to read, and we'll catch more issues overall.
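
The five cases in the table can be sketched as a table-driven check against stubbed methods. `PluginSketch` is a simplified stand-in, not the real plugin class: the order of the checks and the messages are assumptions, and `_log_or_raise` always raises, as suggested for the tests.

```python
class DatastoreException(Exception):
    pass


class PluginSketch(object):
    """Simplified stand-in for the datastore plugin; not the real class."""

    def __init__(self, legacy_mode, same_rw_url, same_db, read_only_ok):
        self.legacy_mode = legacy_mode
        # Stub the three checks with canned answers.
        self._same_read_and_write_url = lambda: same_rw_url
        self._same_ckan_and_datastore_db = lambda: same_db
        self._read_connection_has_correct_privileges = lambda: read_only_ok

    def _log_or_raise(self, message):
        raise DatastoreException(message)

    def _check_urls_and_permissions(self):
        if self._same_ckan_and_datastore_db():
            self._log_or_raise('CKAN and datastore share a database.')
        if self.legacy_mode:
            return  # assumed: the remaining checks do not apply in legacy mode
        if self._same_read_and_write_url():
            self._log_or_raise('Read and write URLs are the same.')
        if not self._read_connection_has_correct_privileges():
            self._log_or_raise('The read-only user has write privileges.')


# (legacy_mode, same_rw_url, same_db, read_only_ok) -> should it raise?
CASES = [
    ((False, False, False, True), False),   # Test 1: OK
    ((True, True, False, True), False),     # Test 2: OK
    ((False, True, False, True), True),     # Test 3: RAISE
    ((False, False, True, True), True),     # Test 4: RAISE
    ((False, False, False, False), True),   # Test 5: RAISE
]


def raises(args):
    try:
        PluginSketch(*args)._check_urls_and_permissions()
    except DatastoreException:
        return True
    return False


results = [raises(args) == expected for args, expected in CASES]
```

Each entry in `results` is True when the sketch behaves as the corresponding column of the table says it should.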

P.S.: I'm probably being too picky with this PR, so please feel free to tell me if you think there's not much value in the specific changes I'm proposing.

domoritz · 2013-03-29T11:55:53Z

@vitorbaptista I rewrote the test a little bit so that it does not require the try..catch any more and mocked _check_urls_and_permissions. In this case I'm against mocking all the functions because you could easily miss a not somewhere. Also, I noticed that the test where legacy mode is active and _read_connection_has_correct_privileges() returns False was missing.

Yes, you are picky but I value your pickiness. It's exactly what helps me improve my coding style and sense for simple and safe code.

domoritz · 2013-03-29T12:45:15Z

@vitorbaptista I have no idea why but it seems that the configuration test overwrites the _read_connection_has_correct_privileges method in the datastore plugin. The tests only fail if I run the configure and create tests together. Do you know where I have to reset something to clean up after the tests?

vitorbaptista · 2013-04-01T16:28:32Z

@domoritz I haven't looked into why your test is breaking yet, just reading the code.

The code itself looks much better now, but the test got worse. The main issue that I see is that you're using global variables. Taking a step back, why are you using globals in the first place? You need a way to check the result of the _check_urls_and_permissions() call. Your solution was injecting the error_handler, which saves the error message and call count (not sure why), and check if the message was what you expected.

Among other reasons, this is bad because it creates another way for your test to fail: if someone changes the message. The code also gets more complex, because I have to deal with globals. My suggestion would be to change that code a bit.

First of all, _check_urls_and_permissions() doesn't need to receive the error_handler as a parameter. It can simply use _log_or_raise directly. Python is dynamic enough to let us inject the dependency by overriding _log_or_raise directly, not through a parameter. This makes the code simpler, IMO, as we have one parameter less to reason about.

Then, in the test you could do something like:

def error_handler(message):
    raise Exception(message)

self.p._log_or_raise = error_handler

And, instead of asserting on the message passed, you assert on the exception being thrown. It might be better to use a specific exception (i.e. DatastoreException), so the test will break if some unexpected exception is thrown.

Also, please, create one test method for each "thing" that you want to test, not each method. For example, instead of testing all cases inside test_check_urls_and_permissions, you would do:

@raises(DatastoreException)
def test_check_urls_and_permissions_raises_when_ckan_and_datastore_db_are_the_same(self):
    self.p.read_url = 'postgresql://u2:pass@localhost/ckan'
    self.p.ckan_url = 'postgresql://u:pass@localhost/ckan'
    self.p._log_or_raise = _raise_datastore_exception

    self.p._check_urls_and_permissions()

and so on... following the pattern

<PRE-CONDITIONS>

<EXERCISE METHOD>

<POST-CONDITIONS>

This way, if your test fails, you know where the problem is: it should raise when the ckan and datastore dbs are the same, but didn't. If you assert lots of cases in the same method, you don't know whether the error came from a real failure or from an unexpected side effect of something you did earlier.
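
The pre-conditions / exercise / post-conditions pattern can be sketched with the standard library's unittest, one test method per behaviour. The thread itself uses nose's `@raises`; `assertRaises` is swapped in here to keep the example self-contained. `FakePlugin`, its URL comparison, and the test-local exception class are assumptions made for illustration, not the real plugin.

```python
import unittest


class InvalidUrlsOrPermissions(Exception):
    """Test-local exception, so unrelated errors cannot satisfy the test."""


class FakePlugin(object):
    """Hypothetical stand-in with only the check under test."""
    ckan_url = 'postgresql://u:pass@localhost/ckan'
    read_url = 'postgresql://u2:pass@localhost/datastore'

    def _log_or_raise(self, message):
        pass  # replaced in setUp

    def _check_urls_and_permissions(self):
        # Compare only the database name (last path segment) of each URL.
        ckan_db = self.ckan_url.rsplit('/', 1)[-1]
        read_db = self.read_url.rsplit('/', 1)[-1]
        if ckan_db == read_db:
            self._log_or_raise('CKAN and datastore share a database.')


def _raise(message):
    raise InvalidUrlsOrPermissions(message)


class TestCheckUrlsAndPermissions(unittest.TestCase):
    def setUp(self):
        # Pre-conditions shared by every test: a fresh plugin whose
        # error handler always raises.
        self.p = FakePlugin()
        self.p._log_or_raise = _raise

    def test_raises_when_ckan_and_datastore_db_are_the_same(self):
        self.p.read_url = 'postgresql://u2:pass@localhost/ckan'
        with self.assertRaises(InvalidUrlsOrPermissions):
            self.p._check_urls_and_permissions()

    def test_does_not_raise_when_dbs_differ(self):
        self.p._check_urls_and_permissions()  # must not raise
```

Because setUp builds a new `FakePlugin` for each test, a failure in one method cannot be caused by state left behind by another.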

P.S.: This might be controversial :-)

domoritz · 2013-04-01T17:07:31Z

@vitorbaptista

First of all, _check_urls_and_permissions() doesn't need to receive the error_handler as a parameter. It can simply use _log_or_raise directly

I hesitated to do this since it adds a side effect and I generally like explicit code with obvious code paths more. However, I see that it makes the code much simpler and in this case it is obvious who uses the _log_or_raise. So, I'm happy to give up a tiny bit of Zen 2.

And, instead of asserting on the message passed, you assert on the exception being thrown.

Which may be a problem because I could end up catching the wrong exception. That is a risk I can accept.

Also, please, create one test method for each "thing" that you want to test, not each method. For example, instead of testing all cases inside test_check_urls_and_permissions,

Agreed. However, I'll put everything into a new test class. This way I can use one setUp method for all tests.

This way, if your test fails, you know where the problem is: it should raise when the ckan and datastore dbs are the same, but didn't.

I used the line number for that ;-) I agree that separate methods are better but it's not the way we have done it in ckan. I'm happy to change it though. Maybe we can convince everyone to write shorter tests.

vitorbaptista · 2013-04-01T18:19:17Z

@domoritz

I hesitated to do this since it adds a side effect and I generally like explicit code with obvious code paths more. However, I see that it makes the code much simpler and in this case it is obvious who uses the _log_or_raise. So, I'm happy to give up a tiny bit of Zen 2.

IMO _check_urls_and_permissions() has the side effect of calling the error handler anyway, even if you're passing it as a parameter. The only way for it to have no side effects would be if it returned True/False, and then the caller would decide what it wants to do. But sure, passing the handler as a parameter makes it more explicit :P

Which may be a problem because I could end up catching the wrong exception. That is a risk I can accept.

Agreed. An easy way to overcome this risk would be defining a new exception class inside the test case itself, which might be better. We have to write all this boilerplate code because we have no stub/mocking library :/

Agreed. However, I'll put everything into a new test class. This way I can use one setUp method for all tests.

Yeah, this might make a small mess whenever we add more tests following this pattern. We might end up with a test class for each method being tested. I'm not sure how to overcome this with nosetests, though. Maybe, as there's not much repetition in these tests (just the _log_or_raise mock), it might be better to keep it as just one class and repeat in each test case. We can relax a bit on DRY when writing tests as, in their case, we don't change them so often, which makes readability even more important. But whatever :P

I used the line number for that ;-) I agree that separate methods are better but it's not the way we have done it in ckan. I'm happy to change it though. Maybe we can convince everyone to write shorter tests.

The line number tells you where your code blew up, but not why. It might be because, 50 lines earlier, you set legacy_mode = True but didn't clean that up before running this new case. Whereas, if every case is small, you can rely on setUp() and tearDown() being implemented correctly, and on your test starting with a clean state. So, if something failed, it's your fault :P

Yeah, I know that it's not how we do at CKAN. That's probably because we tend to write huge methods, which are impossible to test bit by bit, so we need to create a huge test method to go with it. I prefer the small methods mantra, but again, this is controversial. These are just suggestions. I accept, if you prefer the "One Method to Rule Them All". We just need to figure out why it's breaking now...

vitorbaptista · 2013-04-02T00:14:06Z

@domoritz Please, review https://github.com/vitorbaptista/ckan/commit/376abb11518a500ecd96ad742567c33979bd3784. I've fixed your tests, and refactored a bit. I don't like that I'm stubbing methods when testing _check_urls_and_permissions() that weren't tested, so we can't guarantee that they work. Adding a test for _same_read_and_write_url() is trivial, but _read_connection_has_correct_privileges() would be more difficult.

Anyway, the tests are passing... :-)

domoritz · 2013-04-02T07:44:39Z

@vitorbaptista It's super hacky but it's probably the best we can do. And since it's only in the tests, we can probably accept the fact that this may break when pyutilib changes. Nonetheless, I don't understand why you refactored the test in the way that we have a setUp method that has to be called. Why don't we just use a separate unittest.TestCase (class) with a setUp method that can be used for all tests related to the permissions check method? I don't say that because of DRY but because the separate class would make the code more readable, and we can use the Set up, Execute, Test, Tear down pattern, where setUp creates common preconditions for the tests that test similar things.

Nonetheless, I'm happy to see the tests passing ;-) Should I merge your fix and then have it reviewed by someone else?

tobes · 2013-04-02T09:44:10Z

ckanext/datastore/plugin.py

+                                   "connection urls are the same.")
+
+            if not self._read_connection_has_correct_privileges():
+                self._log_or_raise("The read-only user has write privileges.")


picky - we use ' not " in ckan python whenever we can

"but use double-quotes for strings that are likely to contain single-quote characters as part of the string itself (such as error messages, or any strings containing natural language)"

I think that is misleading it means

"this ain't bad" 'this is good' "this is bad though"

where is this crap I'll delete it as it is incorrect

This is what it says in http://docs.ckan.org/en/latest/python-coding-standards.html#use-single-quotes. However, I'll change it here since there are no single quotes in the strings themselves.

vitorbaptista · 2013-04-02T14:07:01Z

Nonetheless, I don't understand why you refactored the test in the way that we have a setUp method that has to be called. Why don't we just use a separate unittest.TestCase (class) with a setUp method that can be used for all tests related to the permissions check method?

My problem with that approach is more long-term. Consider that we split in two classes here. Then, whenever we add a new test for another complex method (i.e. _read_connection_has_correct_privileges()), would we create another class for that? We'll then end up with a bunch of classes, which I find odd.

Actually, I'm still not sure if I like the setUp_plugin_for_check_urls_and_permissions_tests() method. It doesn't reduce the line count when compared to simply setting up explicitly whatever you need in the test itself. Actually, it increases, as we need a test for it as well. The only improvement is that in each test we can focus on what we're testing, and ignoring all that boilerplate code.

If you prefer, I would be OK with removing the setUp_plugin... and explicitly setting up whatever we need in each test.

Datastore is a SingletonPlugin, so it doesn't matter if we call plugin.DatastorePlugin() many times: we always end up with the same instance. I've added a workaround that first saves and unloads the current datastore instance, then sets:

pyutilib.component.core.PluginGlobals.singleton_services()[plugin.DatastorePlugin] = True

This will make plugin.DatastorePlugin not be a Singleton anymore, so any subsequent calls to ckan.plugins.load('datastore') will create a new instance. Then, in the next line, we create a new DatastorePlugin instance by loading it, and save it into self.p and pyutilib.component.core.PluginGlobals.singleton_services()[plugin.DatastorePlugin]. This turns DatastorePlugin into a Singleton again, and subsequent calls to ckan.plugins.load('datastore') will return this new instance instead.

Then, in the teardown, we unload the current datastore, which gets rid of our test instance, and put the original datastore back in its place, so the environment before setUp() is the same as after tearDown().

For InvalidUrlsOrPermissionsException, what I wanted was a way to check if _check_urls_and_permissions() failed. I did this by overriding _log_or_raise() so that it raises a unique Exception, and checking if it's raised. If so, I guarantee that _log_or_raise() was called. This feels like too much boilerplate, but we don't have a stub/mock library, so we have to write it.

Conflicts: ckanext/datastore/tests/test_configure.py

domoritz · 2013-04-02T22:01:06Z

@vitorbaptista This should be it then.

vitorbaptista · 2013-04-02T22:18:29Z

Done! 😄 👍

ghost assigned domoritz Mar 15, 2013

ghost assigned tobes Mar 26, 2013

[#642] Add spaces to log messages where they are missing

8d39171

domoritz added 3 commits March 27, 2013 12:19

[#642] Use has_table_privilege and has_schema_privilege instead of ex…

b68601d

…perimental privilege checks.

[#642] Make check functions consistent (return bool instead of raisin…

cbc4fa9

…g exceptions)

[#642] Refactor datastore plugin configuration, improve (and fix ;-))…

302a9ff

… tests

vitorbaptista reviewed Mar 27, 2013

[#642] Fix how the check for separate urls is ignored in legacy mode.

b2f477f

I put the check for the legacy mode in this function to make it testable.

ghost assigned vitorbaptista Mar 28, 2013

vitorbaptista reviewed Mar 28, 2013

[#642] Simplify check for debug mode, only create _foo once

1a9566b

vitorbaptista reviewed Mar 28, 2013

[#642] Refactored datastore config to make it easier to understand an…

c82ba85

…d easier to test

domoritz added a commit that referenced this pull request Mar 28, 2013

[#642] Refactored datastore config to make it easier to understand an…

a97bb5f

…d easier to test

[#642] Ignore permission check in legacy mode and improve configurati…

0f8c196

…on tests

domoritz added a commit that referenced this pull request Mar 29, 2013

[#642] Ignore permission check in legacy mode and improve configurati…

38ea5c6

…on tests

[#642] Inject error_handler instead of explicitly passing it as an …

bb2ca7f

…argument, split large test into smaller tests

[#642] Add plugin loading and unloading. This does not fix the single…

ad4bb46

…ton issue but is better anyway.

tobes reviewed Apr 2, 2013

domoritz and others added 2 commits April 2, 2013 23:55

[#642] Use single quotes where possible

66c450a

[#642] PEP8

511f6f4

vitorbaptista added a commit that referenced this pull request Apr 2, 2013

Merge pull request #642 from okfn/642-localization-independent-datast…

f6459ca

…ore-permission-ckecks datastore plugin's permission checks will fail on a non english environnement

vitorbaptista merged commit f6459ca into master Apr 2, 2013

vitorbaptista deleted the 642-localization-independent-datastore-permission-ckecks branch April 2, 2013 22:17

domoritz added a commit that referenced this pull request Apr 3, 2013

[#642] Inject error_handler instead of explicitly passing it as an …

a4d88b7

…argument, split large test into smaller tests

domoritz added a commit that referenced this pull request Apr 3, 2013

[#642] Add plugin loading and unloading. This does not fix the single…

d76c657

…ton issue but is better anyway.

domoritz added a commit that referenced this pull request Apr 3, 2013

[#642] Use single quotes where possible

705c39e

domoritz added a commit that referenced this pull request Apr 3, 2013

[#642] PEP8

09b23dd
