Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datastore plugin's permission checks will fail on a non english environnement #642

Merged

Conversation

domoritz
Copy link
Contributor

ckanext/datastore/plugin.py is using the error messages returned by postgresql to check if the permisisons are right, for example:

if 'permission denied' in str(e) or 'read-only transaction' in str(e):
                        pass

and

 if 'permission denied' not in str(e):
                        raise

this will fail if postgresql running in a non english localized environnement since it seems that postgresql errors are translated (verified with postesql 8.4 and 9.1 on release-v2.0)

@ghost ghost assigned domoritz Mar 15, 2013
…tead of raising an exception.

If postgres is set to a language other than english, some strings might not occur in the error message returned from the database. This change makes the checks less strict but in almost all cases, this should not be a problem because the only error raised during executing of the permission check statements are (expected) permission errors.
@tobes
Copy link
Contributor

tobes commented Mar 26, 2013

@domoritz can you assign this to review if ready

@ghost ghost assigned tobes Mar 26, 2013
@tobes
Copy link
Contributor

tobes commented Mar 26, 2013

@domoritz I thought you assigned this to @TomDunham

@domoritz
Copy link
Contributor

@tobes That was #686

@tobes
Copy link
Contributor

tobes commented Mar 26, 2013

@domoritz

  1. this does not seem to deal with the actual issue

  2. your messages need spaces which are missing in many cases

@vitorbaptista
Copy link
Contributor

@domoritz I've created a patch in https://github.com/vitorbaptista/ckan/commit/eb92e3d837bc55e451c18eab7b2057e6e160a899 using Postgres' has_database_privilege instead of relying on the string returned. This should work in any locale (and works in all postgres versions we support).

I haven't sent a pr yet because that commit changes the behavior as well. I'm not raising if the DB is writable, but simply returning False. This might not be the correct behavior, but it felt cleaner. What do you think?

@domoritz
Copy link
Contributor

@tobes, @vitorbaptista

We have two places where we need to check the permissions of the database.

The first is where we ask whether the database is read only (https://github.com/okfn/ckan/blob/master/ckanext/datastore/plugin.py#L109). This could be the case when someone uses a slave db that does not allow editing in case the normal db is not accessible. The check only makes sure that we don't raise an exception if you have this read-only set up.

The second place is https://github.com/okfn/ckan/blob/master/ckanext/datastore/plugin.py#L144 where we check whether we can create or update tables. This check is far more important because if this check fails, it is possible to write data to the database through the datastore sql API. Basically, if this check fails, the sql interface allows writing and not only querying.

I think @vitorbaptista solution looks very promising. We can probably use has_table_privilege() for the second case. I'll mark this pr as [WIP] and ping you later.

@domoritz
Copy link
Contributor

@vitorbaptista The behaviour of _is_read_only_database is to return a bool. It only raises an issue if the exception is not related to permissions at all.

@domoritz
Copy link
Contributor

@vitorbaptista has_database_privilege means the privilege to create a schema within a database, not a table in the database. We have to use has_schema_privileges (see b68601d).

@vitorbaptista
Copy link
Contributor

Hi @domoritz

@vitorbaptista has_database_privilege means the privilege to create a schema within a database, not a table in > the database. We have to use has_schema_privileges (see b68601d).

Are you sure? Could you point me to where I could read this?

A table is created by the write user to test the read only user.
'''
write_connection = db._get_engine(None,
{'connection_url': self.write_url}).connect()
write_connection.execute(u"DROP TABLE IF EXISTS public._foo;"
u"CREATE TABLE public._foo (id INTEGER, name VARCHAR)")
write_connection.execute(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to use a transaction/rollback, as we used to do in _is_read_only_database(). It makes no difference now, but IMHO it makes the code a bit cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be much cleaner, indeed. Sadly this does not work here because we have two independent connections.

@domoritz
Copy link
Contributor

Hey @vitorbaptista

Are you sure? Could you point me to where I could read this?

http://www.postgresql.org/docs/8.1/static/sql-grant.html and then read what it says about the CREATE privilege.

"For databases, allows new schemas to be created within the database."

I put the check for the legacy mode in this function to make it testable.
@ghost ghost assigned vitorbaptista Mar 28, 2013
@domoritz
Copy link
Contributor

@vitorbaptista I assigned you to this. Are you happy to review this?

@domoritz
Copy link
Contributor

Hmm, I just noticed that we have thousands of places in https://github.com/okfn/ckan/blob/master/ckanext/datastore/db.py where we look at the error messages. I think it would take quite some time to replace all of them with a localisation independent solution. I guess, we have to add a note to the docs. Nonetheless, this pr is still good since it cleans up the configuration process.

@tobes
Copy link
Contributor

tobes commented Mar 28, 2013

@domoritz yes maybe create a fresh issue to do this for 2.1/2.2

@domoritz
Copy link
Contributor

Created issue #718

log.warn("Legacy mode active. "
"The sql search will not be available.")
elif not self._read_connection_has_correct_privileges():
if 'debug' in self.config and self.config['debug']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal for this, just a matter of dev style, but something like if self.config.get('debug'): feels cleaner to me.

@vitorbaptista
Copy link
Contributor

@domoritz I'll be glad to :-)

I've added a few comments.

@domoritz
Copy link
Contributor

@vitorbaptista Thanks. I've changed the code.

if not self.legacy_mode:
if self.write_url == self.read_url:
raise Exception("The write and read-only database connection url are the same.")

if self._get_db_from_url(self.ckan_url) == self._get_db_from_url(self.read_url):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't important, but if you're changing other stuff, might change this at the same time as well. Change this to simply return self._get_db_from_url(self.ckan_url) == self._get_db_from_url(self.read_url)

domoritz added a commit that referenced this pull request Mar 28, 2013
@vitorbaptista
Copy link
Contributor

Hey @domoritz,

I like your idea of creating the handler, so you can reduce duplication. I'm not very keen on the naming: it's too general. Maybe something more explicit like log_or_raise, or something similar. Also, I saw that you changed the method name to _check_urls_and_permissions, which is much better, and are receiving the handler as a parameter. This kind of dependency injection is useful for less-dynamic languages, so we're able to test. But in Python we don't need it. You could create a method like

def _log_or_raise(message):
    if self.config.get('debug'):
        log.critical(message)
    else:
        raise DatastoreException(message)

and use that in _check_urls_and_permissions, which won't need any parameter now. In the test, you could overwrite it, so it always raised the exception. Then you could test the following cases:

Variables Test 1 Test 2 Test 3 Test 4 Test 5
legacy_mode False True False False False
_same_read_and_write_url() False True True False False
_same_ckan_and_datastore_db() False False False True False
_read_connection_has_correct_privileges() True True True True False
RESULT OK OK RAISE RAISE RAISE

This doesn't test all the possibilities, but I guess it's good enough. I've put in bold the variable that I'm testing in each case, all others are there just so I can see its effects in the code. Right now, you're just testing cases 3 and 4. It's also not clear why sqlalchemy.exc.OperationalError might be thrown, and why you're ignoring it.

Obviously, this would only make sense if we also had tests for each of these methods, as we're stubbing them. But I guess it'll make the tests easier to read, and we'll catch more issues overall.

P.S.: I'm probably be being too picky with this pr, so please feel free to tell me if you think there's not much value in the specific changes I'm proposing.

@domoritz
Copy link
Contributor

@vitorbaptista I rewrote the test a little bit so that it does not require the try..catch ant more and mocked _check_urls_and_permissions. In this case I'm against mocking all the functions because you could easily miss a not somewhere. Also, I noticed that the test, where legacy mode is active and _read_connection_has_correct_privileges() returns False was missing.

Yes, you are picky but I value your pickyness. It's exactly what helps me improve my coding style and sense for simple and safe code.

@domoritz
Copy link
Contributor

@vitorbaptista I have no idea why but it seems that the configuration test overwrites the _read_connection_has_correct_privileges method in the datastore plugin. The tests only fail if I run the configure and create tests together. Do you know where I have to reset something to clean up after the tests?

@vitorbaptista
Copy link
Contributor

@domoritz I haven't looked into why your test is breaking yet, just reading the code.

The code itself looks much better now, but the test got worse. The main issue that I see is that you're using global variables. Taking a step back, why are you using globals in the first place? You need a way to check the result of the _check_urls_and_permissions() call. Your solution was injecting the error_handler, which saves the error message and call count (not sure why), and check if the message was what you expected.

Among other reasons, this is bad because it creates another way for your test to fail: if someone changes the message. The code also gets more complex, because I have to deal with globals. My suggestion would be to change that code a bit.

First of all, _check_urls_and_permissions() doesn't need to receive the error_handler as a parameter. It can simply use _log_or_raise directly. Python is dynamic enough to let us inject the dependency by overriding _log_or_raise directly, not through a parameter. This makes the code simpler, IMO, as we have one parameter less to reason about.

Then, in the test you could do something like:

def error_handler(message):
    raise Exception(message)

self.p._log_or_raise = error_handler

And, instead of asserting on the message passed, you assert on the exception being thrown. It might be better to use a specific exception (i.e. DatastoreException), so the test will break if some unexpected exception is thrown.

Also, please, create one test method for each "thing" that you want to test, not each method. For example, instead of testing all cases inside test_check_urls_and_permissions, you would do:

@raises(DatastoreException)
def test_check_urls_and_permissions_raises_when_ckan_and_datastore_db_are_the_same:
    self.p.read_url = 'postgresql://u2:pass@localhost/ckan'
    self.p.ckan_url = 'postgresql://u:pass@localhost/ckan'
    self.p._log_or_raise = _raise_datastore_exception

    self.p._check_urls_and_permissions()

and so on... following the pattern

<PRE-CONDITIONS>

<EXERCISE METHOD>

<POST-CONDITIONS>

This way, if your test fail, you know where's the problem: it should raise when the ckan and datastore dbs are the same, but didn't. If you assert lots of cases in the same method, you don't know if the error was because there's a real failure, or something that you did before had an unexpected side effect.

P.S.: This might be controversial :-)

@domoritz
Copy link
Contributor

domoritz commented Apr 1, 2013

@vitorbaptista

First of all, _check_urls_and_permissions() doesn't need to receive the error_handler as a parameter. It can simply use _log_or_raise directly

I hesitated to do this since it adds a side effect and I generally like explicit code with obvious code paths more. However, I see that it makes the code much simpler and in this case it is obvious who uses the _log_or_raise. So, I'm happy to give up a tiny bit of Zen 2.

And, instead of asserting on the message passed, you assert on the exception being thrown.

Which may be a problem because I could end up catching the wrong exception. Which is a risk, I can accept.

Also, please, create one test method for each "thing" that you want to test, not each method. For example, instead of testing all cases inside test_check_urls_and_permissions,

Agreed. However, I'll put everything into a new test class. This way I can use one setUp method for all tests.

This way, if your test fail, you know where's the problem: it should raise when the ckan and datastore dbs are the same, but didn't.

I used the line number for that ;-) I agree that separate methods are better but it's not the way we have done it in ckan. I'm happy to change it though. Maybe we can convince everyone to write shorter tests.

…argument, split large test into smaller tests
@vitorbaptista
Copy link
Contributor

@domoritz

I hesitated to do this since it adds a side effect and I generally like explicit code with obvious code paths more. However, I see that it makes the code much simpler and in this case it is obvious who uses the _log_or_raise. So, I'm happy to give up a tiny bit of Zen 2.

IMO _check_urls_and_permissions() has the side effect of calling the error handler anyway, even if you're passing it as a parameter. The only way for it to have no side effects would be if it returned True/False, and then callee would decide what it wants to do. But sure, passing the handler as a parameter makes it more explicit :P

Which may be a problem because I could end up catching the wrong exception. Which is a risk, I can accept.

Agreed. An easy way to overcome this risk would be defining a new exception class inside the test case itself, which might be better. We have to write all this boilerplate code because we have no stub/mocking library :/

Agreed. However, I'll put everything into a new test class. This way I can use one setUp method for all tests.

Yeah, this might make a small mess whenever we add more tests following this pattern. We might end with a test class for each method being tested. I'm not sure how to overcome this with nosetests, though. Maybe, as there's not much repetition in these tests (just the _log_or_raise mock), it might be better to keep it as just one class and repeat in each test case. We can relax a bit on DRY when writing tests as, in their case, we don't change them so often, which makes readability is even more important. But whatever :P

I used the line number for that ;-) I agree that separate methods are better but it's not the way we have done it in ckan. I'm happy to change it though. Maybe we can convince everyone to write shorter tests.

The line number tells you where your code blew, but not why. It might be because, 50 lines ago, you have set legacy_mode = True, but haven't cleaned that up before running this new one. When, if every case is a small, you can rely on the setUp() and teardown() being implemented correctly, and your test starting with a clean state. So, if something failed, it's your fault :P

Yeah, I know that it's not how we do at CKAN. That's probably because we tend to write huge methods, which are impossible to test bit by bit, so we need to create a huge test method to go with it. I prefer the small methods mantra, but again, this is controversial. These are just suggestions. I accept, if you prefer the "One Method to Rule Them All". We just need to figure out why it's breaking now...

@vitorbaptista
Copy link
Contributor

@domoritz Please, review https://github.com/vitorbaptista/ckan/commit/376abb11518a500ecd96ad742567c33979bd3784. I've fixed your tests, and refactored a bit. I don't like that I'm stubbing methods when testing _check_urls_and_permissions() that weren't tested, so we can't guarantee that they work. Adding a test for _same_read_and_write_url() is trivial, but _read_connection_has_correct_privileges() would be more difficult.

Anyway, the tests are passing... :-)

@domoritz
Copy link
Contributor

domoritz commented Apr 2, 2013

@vitorbaptista It's super hacky but It's probably the best we can do. And since it's only in the tests, we can probably accept the fact that this may break when pyutilib changes. Nonetheless, I don't understand why you refactored the test in the way that we have a setUp method that has to be called. Why don't we just a separate unittest.TestCase (class) with a setUp method that can be used for all tests related to the permissions check method. I don't say that because of DRY but because the separate class would make the code more readable and we can use the Set up, Execute, Test, Tear down patter where setUp creates common pre conditions for the tests that test similar things.

Nonetheless, I'm happy to see the tests passing ;-) Should I merge your fix and then have it reviewed by someone else?

On 2 Apr 2013, at 02:14, Vitor Baptista notifications@github.com wrote:

@domoritz Please, review https://github.com/vitorbaptista/ckan/commit/376abb11518a500ecd96ad742567c33979bd3784. I've fixed your tests, and refactored a bit. I don't like that I'm stubbing methods when testing _check_urls_and_permissions() that weren't tested, so we can't guarantee that they work. Adding a test for _same_read_and_write_url() is trivial, but _read_connection_has_correct_privileges() would be more difficult.

Anyway, the tests are passing... :-)


Reply to this email directly or view it on GitHub:
#642 (comment)

"connection urls are the same.")

if not self._read_connection_has_correct_privileges():
self._log_or_raise("The read-only user has write privileges.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

picky - we use ' nor " in ckan python whenever we can

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"but use double-quotes for strings that are likely to contain single-quote characters as part of the string itself (such as error messages, or any strings containing natural language)"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is misleading it means

"this ain't bad"
'this is good'
"this is bad though"

where is this crap I'll delete it as it is incorrect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what it says in http://docs.ckan.org/en/latest/python-coding-standards.html#use-single-quotes. However, I'll change it it here since there are no single quotes in the strings themselves.

@vitorbaptista
Copy link
Contributor

Nonetheless, I don't understand why you refactored the test in the way that we have a setUp method that has to be called. Why don't we just a separate unittest.TestCase (class) with a setUp method that can be used for all tests related to the permissions check method.

My problem with that approach is more long-term. Consider that we split in two classes here. Then, whenever we add a new test for another complex method (i.e. _read_connection_has_correct_privileges()), would we create another class for that? We'll then end up with a bunch of classes, which I find odd.

Actually, I'm still not sure if I like the setUp_plugin_for_check_urls_and_permissions_tests() method. It doesn't reduce the line count when compared to simply setting up explicitly whatever you need in the test itself. Actually, it increases, as we need a test for it as well. The only improvement is that in each test we can focus on what we're testing, and ignoring all that boilerplate code.

If you prefer, I would be OK with removing the setUp_plugin... and explictly setting up whatever we need in each test.

domoritz and others added 2 commits April 2, 2013 23:55
Datastore is a SingletonPlugin, so it doesn't matter if we call
plugin.DatastorePlugin() many times: we always end up with the same instance.
I've added a workaround that, first, saves and unloads the current datastore
instance, then sets:

  pyutilib.component.core.PluginGlobals.singleton_services()[plugin.DatastorePlugin] = True

This will make plugin.DatastorePlugin not be a Singleton anymore, so any
subsequent calls to ckan.plugins.load('datastore') will create a new instance.
Then, in the next line, we create a new DatastorePlugin instance by loading
it, and save it into self.p and
pyutilib.component.core.PluginGlobals.singleton_services()[plugin.DatastorePlugin].
This turns DatastorePlugin into a Singleton again, and subsequent calls to
ckan.plugins.load('datastore') will return this new instance instead.

Then, in the teardown, we unload the current the datastore, which gets rid of
our test instance, and put the original datastore back in its place, so the
environment before setUp() is the same as after tearDown().

For InvalidUrlsOrPermissionsException, what I wanted was a way to check if
_check_urls_and_permissions() failed. I did this by overloading _log_or_raise()
with an unique Exception, and checking if it's raised. If so, I guarantee that
_log_or_raise() was called. This feels like too much boilerplate, but we don't
have a stub/mock library, so we have to write it.

Conflicts:
	ckanext/datastore/tests/test_configure.py
@domoritz
Copy link
Contributor

domoritz commented Apr 2, 2013

@vitorbaptista This should be it then.

vitorbaptista added a commit that referenced this pull request Apr 2, 2013
…ore-permission-ckecks

datastore plugin's permission checks will fail on a non english environnement
@vitorbaptista vitorbaptista merged commit f6459ca into master Apr 2, 2013
@vitorbaptista vitorbaptista deleted the 642-localization-independent-datastore-permission-ckecks branch April 2, 2013 22:17
@vitorbaptista
Copy link
Contributor

Done! 😄 👍

domoritz added a commit that referenced this pull request Apr 3, 2013
…argument, split large test into smaller tests
domoritz added a commit that referenced this pull request Apr 3, 2013
domoritz added a commit that referenced this pull request Apr 3, 2013
vitorbaptista added a commit that referenced this pull request Apr 3, 2013
Datastore is a SingletonPlugin, so it doesn't matter if we call
plugin.DatastorePlugin() many times: we always end up with the same instance.
I've added a workaround that, first, saves and unloads the current datastore
instance, then sets:

  pyutilib.component.core.PluginGlobals.singleton_services()[plugin.DatastorePlugin] = True

This will make plugin.DatastorePlugin not be a Singleton anymore, so any
subsequent calls to ckan.plugins.load('datastore') will create a new instance.
Then, in the next line, we create a new DatastorePlugin instance by loading
it, and save it into self.p and
pyutilib.component.core.PluginGlobals.singleton_services()[plugin.DatastorePlugin].
This turns DatastorePlugin into a Singleton again, and subsequent calls to
ckan.plugins.load('datastore') will return this new instance instead.

Then, in the teardown, we unload the current the datastore, which gets rid of
our test instance, and put the original datastore back in its place, so the
environment before setUp() is the same as after tearDown().

For InvalidUrlsOrPermissionsException, what I wanted was a way to check if
_check_urls_and_permissions() failed. I did this by overloading _log_or_raise()
with an unique Exception, and checking if it's raised. If so, I guarantee that
_log_or_raise() was called. This feels like too much boilerplate, but we don't
have a stub/mock library, so we have to write it.

Conflicts:
	ckanext/datastore/tests/test_configure.py
domoritz added a commit that referenced this pull request Apr 3, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants