New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
datastore plugin's permission checks will fail on a non english environnement #642
datastore plugin's permission checks will fail on a non english environnement #642
Conversation
…tead of raising an exception. If postgres is set to a language other than english, some strings might not occur in the error message returned from the database. This change makes the checks less strict but in almost all cases, this should not be a problem because the only error raised during executing of the permission check statements are (expected) permission errors.
@domoritz can you assign this to review if ready |
@domoritz I thought you assigned this to @TomDunham |
|
@domoritz I've created a patch in https://github.com/vitorbaptista/ckan/commit/eb92e3d837bc55e451c18eab7b2057e6e160a899 using Postgres' has_database_privilege instead of relying on the string returned. This should work in any locale (and works in all postgres versions we support). I haven't sent a pr yet because that commit changes the behavior as well. I'm not raising if the DB is writable, but simply returning False. This might not be the correct behavior, but it felt cleaner. What do you think? |
We have two places where we need to check the permissions of the database. The first is where we ask whether the database is read only (https://github.com/okfn/ckan/blob/master/ckanext/datastore/plugin.py#L109). This could be the case when someone uses a slave db that does not allow editing in case the normal db is not accessible. The check only makes sure that we don't raise an exception if you have this read-only set up. The second place is https://github.com/okfn/ckan/blob/master/ckanext/datastore/plugin.py#L144 where we check whether we can create or update tables. This check is far more important because if this check fails, it is possible to write data to the database through the datastore sql API. Basically, if this check fails, the sql interface allows writing and not only querying. I think @vitorbaptista solution looks very promising. We can probably use |
@vitorbaptista The behaviour of |
@vitorbaptista |
…perimental privilege checks.
Hi @domoritz
Are you sure? Could you point me to where I could read this? |
A table is created by the write user to test the read only user. | ||
''' | ||
write_connection = db._get_engine(None, | ||
{'connection_url': self.write_url}).connect() | ||
write_connection.execute(u"DROP TABLE IF EXISTS public._foo;" | ||
u"CREATE TABLE public._foo (id INTEGER, name VARCHAR)") | ||
write_connection.execute( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to use a transaction/rollback, as we used to do in _is_read_only_database(). It makes no difference now, but IMHO it makes the code a bit cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be much cleaner, indeed. Sadly this does not work here because we have two independent connections.
Hey @vitorbaptista
http://www.postgresql.org/docs/8.1/static/sql-grant.html and then read what it says about the CREATE privilege. "For databases, allows new schemas to be created within the database." |
I put the check for the legacy mode in this function to make it testable.
@vitorbaptista I assigned you to this. Are you happy to review this? |
Hmm, I just noticed that we have thousands of places in https://github.com/okfn/ckan/blob/master/ckanext/datastore/db.py where we look at the error messages. I think it would take quite some time to replace all of them with a localisation independent solution. I guess, we have to add a note to the docs. Nonetheless, this pr is still good since it cleans up the configuration process. |
@domoritz yes maybe create a fresh issue to do this for 2.1/2.2 |
Created issue #718 |
log.warn("Legacy mode active. " | ||
"The sql search will not be available.") | ||
elif not self._read_connection_has_correct_privileges(): | ||
if 'debug' in self.config and self.config['debug']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal for this, just a matter of dev style, but something like if self.config.get('debug'):
feels cleaner to me.
@domoritz I'll be glad to :-) I've added a few comments. |
@vitorbaptista Thanks. I've changed the code. |
if not self.legacy_mode: | ||
if self.write_url == self.read_url: | ||
raise Exception("The write and read-only database connection url are the same.") | ||
|
||
if self._get_db_from_url(self.ckan_url) == self._get_db_from_url(self.read_url): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't important, but if you're changing other stuff, might change this at the same time as well. Change this to simply return self._get_db_from_url(self.ckan_url) == self._get_db_from_url(self.read_url)
Hey @domoritz, I like your idea of creating the def _log_or_raise(message):
if self.config.get('debug'):
log.critical(message)
else:
raise DatastoreException(message) and use that in
This doesn't test all the possibilities, but I guess it's good enough. I've put in bold the variable that I'm testing in each case, all others are there just so I can see its effects in the code. Right now, you're just testing cases 3 and 4. It's also not clear why Obviously, this would only make sense if we also had tests for each of these methods, as we're stubbing them. But I guess it'll make the tests easier to read, and we'll catch more issues overall. P.S.: I'm probably be being too picky with this pr, so please feel free to tell me if you think there's not much value in the specific changes I'm proposing. |
@vitorbaptista I rewrote the test a little bit so that it does not require the Yes, you are picky but I value your pickyness. It's exactly what helps me improve my coding style and sense for simple and safe code. |
@vitorbaptista I have no idea why but it seems that the configuration test overwrites the _read_connection_has_correct_privileges method in the datastore plugin. The tests only fail if I run the configure and create tests together. Do you know where I have to reset something to clean up after the tests? |
@domoritz I haven't looked into why your test is breaking yet, just reading the code. The code itself looks much better now, but the test got worse. The main issue that I see is that you're using global variables. Taking a step back, why are you using globals in the first place? You need a way to check the result of the Among other reasons, this is bad because it creates another way for your test to fail: if someone changes the message. The code also gets more complex, because I have to deal with globals. My suggestion would be to change that code a bit. First of all, Then, in the test you could do something like: def error_handler(message):
raise Exception(message)
self.p._log_or_raise = error_handler And, instead of asserting on the message passed, you assert on the exception being thrown. It might be better to use a specific exception (i.e. Also, please, create one test method for each "thing" that you want to test, not each method. For example, instead of testing all cases inside @raises(DatastoreException)
def test_check_urls_and_permissions_raises_when_ckan_and_datastore_db_are_the_same:
self.p.read_url = 'postgresql://u2:pass@localhost/ckan'
self.p.ckan_url = 'postgresql://u:pass@localhost/ckan'
self.p._log_or_raise = _raise_datastore_exception
self.p._check_urls_and_permissions() and so on... following the pattern
This way, if your test fail, you know where's the problem: it should raise when the ckan and datastore dbs are the same, but didn't. If you assert lots of cases in the same method, you don't know if the error was because there's a real failure, or something that you did before had an unexpected side effect. P.S.: This might be controversial :-) |
I hesitated to do this since it adds a side effect and I generally like explicit code with obvious code paths more. However, I see that it makes the code much simpler and in this case it is obvious who uses the
Which may be a problem because I could end up catching the wrong exception. Which is a risk, I can accept.
Agreed. However, I'll put everything into a new test class. This way I can use one
I used the line number for that ;-) I agree that separate methods are better but it's not the way we have done it in ckan. I'm happy to change it though. Maybe we can convince everyone to write shorter tests. |
…argument, split large test into smaller tests
IMO _check_urls_and_permissions() has the side effect of calling the error handler anyway, even if you're passing it as a parameter. The only way for it to have no side effects would be if it returned True/False, and then callee would decide what it wants to do. But sure, passing the handler as a parameter makes it more explicit :P
Agreed. An easy way to overcome this risk would be defining a new exception class inside the test case itself, which might be better. We have to write all this boilerplate code because we have no stub/mocking library :/
Yeah, this might make a small mess whenever we add more tests following this pattern. We might end with a test class for each method being tested. I'm not sure how to overcome this with nosetests, though. Maybe, as there's not much repetition in these tests (just the _log_or_raise mock), it might be better to keep it as just one class and repeat in each test case. We can relax a bit on DRY when writing tests as, in their case, we don't change them so often, which makes readability is even more important. But whatever :P
The line number tells you where your code blew, but not why. It might be because, 50 lines ago, you have set Yeah, I know that it's not how we do at CKAN. That's probably because we tend to write huge methods, which are impossible to test bit by bit, so we need to create a huge test method to go with it. I prefer the small methods mantra, but again, this is controversial. These are just suggestions. I accept, if you prefer the "One Method to Rule Them All". We just need to figure out why it's breaking now... |
…ton issue but is better anyway.
@domoritz Please, review https://github.com/vitorbaptista/ckan/commit/376abb11518a500ecd96ad742567c33979bd3784. I've fixed your tests, and refactored a bit. I don't like that I'm stubbing methods when testing _check_urls_and_permissions() that weren't tested, so we can't guarantee that they work. Adding a test for _same_read_and_write_url() is trivial, but _read_connection_has_correct_privileges() would be more difficult. Anyway, the tests are passing... :-) |
@vitorbaptista It's super hacky but It's probably the best we can do. And since it's only in the tests, we can probably accept the fact that this may break when pyutilib changes. Nonetheless, I don't understand why you refactored the test in the way that we have a setUp method that has to be called. Why don't we just a separate Nonetheless, I'm happy to see the tests passing ;-) Should I merge your fix and then have it reviewed by someone else? On 2 Apr 2013, at 02:14, Vitor Baptista notifications@github.com wrote:
|
"connection urls are the same.") | ||
|
||
if not self._read_connection_has_correct_privileges(): | ||
self._log_or_raise("The read-only user has write privileges.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
picky - we use '
nor "
in ckan python whenever we can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"but use double-quotes for strings that are likely to contain single-quote characters as part of the string itself (such as error messages, or any strings containing natural language)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is misleading it means
"this ain't bad"
'this is good'
"this is bad though"
where is this crap I'll delete it as it is incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what it says in http://docs.ckan.org/en/latest/python-coding-standards.html#use-single-quotes. However, I'll change it it here since there are no single quotes in the strings themselves.
My problem with that approach is more long-term. Consider that we split in two classes here. Then, whenever we add a new test for another complex method (i.e. Actually, I'm still not sure if I like the If you prefer, I would be OK with removing the setUp_plugin... and explictly setting up whatever we need in each test. |
Datastore is a SingletonPlugin, so it doesn't matter if we call plugin.DatastorePlugin() many times: we always end up with the same instance. I've added a workaround that, first, saves and unloads the current datastore instance, then sets: pyutilib.component.core.PluginGlobals.singleton_services()[plugin.DatastorePlugin] = True This will make plugin.DatastorePlugin not be a Singleton anymore, so any subsequent calls to ckan.plugins.load('datastore') will create a new instance. Then, in the next line, we create a new DatastorePlugin instance by loading it, and save it into self.p and pyutilib.component.core.PluginGlobals.singleton_services()[plugin.DatastorePlugin]. This turns DatastorePlugin into a Singleton again, and subsequent calls to ckan.plugins.load('datastore') will return this new instance instead. Then, in the teardown, we unload the current the datastore, which gets rid of our test instance, and put the original datastore back in its place, so the environment before setUp() is the same as after tearDown(). For InvalidUrlsOrPermissionsException, what I wanted was a way to check if _check_urls_and_permissions() failed. I did this by overloading _log_or_raise() with an unique Exception, and checking if it's raised. If so, I guarantee that _log_or_raise() was called. This feels like too much boilerplate, but we don't have a stub/mock library, so we have to write it. Conflicts: ckanext/datastore/tests/test_configure.py
@vitorbaptista This should be it then. |
…ore-permission-ckecks datastore plugin's permission checks will fail on a non english environnement
Done! 😄 👍 |
…argument, split large test into smaller tests
…ton issue but is better anyway.
Datastore is a SingletonPlugin, so it doesn't matter if we call plugin.DatastorePlugin() many times: we always end up with the same instance. I've added a workaround that, first, saves and unloads the current datastore instance, then sets: pyutilib.component.core.PluginGlobals.singleton_services()[plugin.DatastorePlugin] = True This will make plugin.DatastorePlugin not be a Singleton anymore, so any subsequent calls to ckan.plugins.load('datastore') will create a new instance. Then, in the next line, we create a new DatastorePlugin instance by loading it, and save it into self.p and pyutilib.component.core.PluginGlobals.singleton_services()[plugin.DatastorePlugin]. This turns DatastorePlugin into a Singleton again, and subsequent calls to ckan.plugins.load('datastore') will return this new instance instead. Then, in the teardown, we unload the current the datastore, which gets rid of our test instance, and put the original datastore back in its place, so the environment before setUp() is the same as after tearDown(). For InvalidUrlsOrPermissionsException, what I wanted was a way to check if _check_urls_and_permissions() failed. I did this by overloading _log_or_raise() with an unique Exception, and checking if it's raised. If so, I guarantee that _log_or_raise() was called. This feels like too much boilerplate, but we don't have a stub/mock library, so we have to write it. Conflicts: ckanext/datastore/tests/test_configure.py
ckanext/datastore/plugin.py is using the error messages returned by postgresql to check if the permisisons are right, for example:
and
this will fail if postgresql running in a non english localized environnement since it seems that postgresql errors are translated (verified with postesql 8.4 and 9.1 on release-v2.0)