global: add record/collection permissions #1589

Merged
14 commits merged on Oct 5, 2016

jmartinm
Contributor

@jmartinm jmartinm commented Sep 23, 2016

Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>

Closes #919

@jmartinm
Contributor Author

Addresses #1577

One problem I see at the moment is that when I enable invenio-collections I am getting ES timeouts locally when migrating records (possibly due to some extra computation happening in invenio-collections receivers)

@jmartinm jmartinm changed the title global: enable invenio-collections WIP global: enable invenio-collections Sep 23, 2016
@kaplun
Contributor

kaplun commented Sep 23, 2016

I see from https://github.com/inveniosoftware/invenio-collections/blob/master/invenio_collections/config.py#L24 that by adjusting COLLECTIONS_DELETED_RECORDS we can have ES exclude deleted records directly, so @spirosdelviniotis doesn't have to explicitly delete records in ES.

@jmartinm
Contributor Author

Another side effect I just realised: invenio-collections listens to the before_record_insert signal, so when a new record is created in the HoldingPen the receiver runs (which is probably OK). However, it raises an error because in the HoldingPen we pass around a dict instead of a Record object:

13:15:38 worker.1    | [2016-09-23 13:15:38,553: ERROR/MainProcess] Task invenio_workflows.tasks.start[e8913b38-46fb-487b-a088-4bbf83b73c3d] raised unexpected: AttributeError("'dict' object has no attribute 'dumps'",)
13:15:38 worker.1    | Traceback (most recent call last):
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
13:15:38 worker.1    |     R = retval = fun(*args, **kwargs)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/lib/python2.7/site-packages/flask_celeryext/app.py", line 43, in __call__
13:15:38 worker.1    |     return Task.__call__(self, *args, **kwargs)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
13:15:38 worker.1    |     return self.run(*args, **kwargs)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/src/invenio-workflows/invenio_workflows/tasks.py", line 77, in start
13:15:38 worker.1    |     return text_type(run_worker(workflow_name, data, **kwargs).uuid)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/src/invenio-workflows/invenio_workflows/worker_engine.py", line 53, in run_worker
13:15:38 worker.1    |     engine.process(objects, **kwargs)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/lib/python2.7/site-packages/workflow/engine.py", line 390, in process
13:15:38 worker.1    |     self._process(objects)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/lib/python2.7/site-packages/workflow/engine.py", line 547, in _process
13:15:38 worker.1    |     obj, self, callbacks, exc_info
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/src/invenio-workflows/invenio_workflows/engine.py", line 363, in Exception
13:15:38 worker.1    |     obj, eng, callbacks, exc_info
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/lib/python2.7/site-packages/workflow/engine.py", line 970, in Exception
13:15:38 worker.1    |     reraise(*exc_info)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/lib/python2.7/site-packages/workflow/engine.py", line 529, in _process
13:15:38 worker.1    |     self.run_callbacks(callbacks, objects, obj)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/lib/python2.7/site-packages/workflow/engine.py", line 481, in run_callbacks
13:15:38 worker.1    |     self.execute_callback(callback_func, obj)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/lib/python2.7/site-packages/workflow/engine.py", line 564, in execute_callback
13:15:38 worker.1    |     callback(obj, self)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/src/inspire/inspirehep/modules/workflows/tasks/actions.py", line 175, in emit_record_signals
13:15:38 worker.1    |     before_record_insert.send(obj.data)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/lib/python2.7/site-packages/blinker/base.py", line 267, in send
13:15:38 worker.1    |     for receiver in self.receivers_for(sender)]
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/src/invenio-collections/invenio_collections/receivers.py", line 112, in __call__
13:15:38 worker.1    |     matcher=self.matcher)
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/src/invenio-collections/invenio_collections/receivers.py", line 88, in get_record_collections
13:15:38 worker.1    |     for collections in matcher(collections, record):
13:15:38 worker.1    |   File "/Users/jmartinm/.virtualenvs/invenio3/src/invenio-collections/invenio_collections/percolator.py", line 99, in _find_matching_collections_externally
13:15:38 worker.1    |     body = {"doc": record.dumps()}
13:15:38 worker.1    | AttributeError: 'dict' object has no attribute 'dumps'

@kaplun
Contributor

kaplun commented Sep 23, 2016

As discussed IRL, we can drop emit_record_signals since all the use cases are already taken care of during real record insertion/indexing.
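
For reference, the failure in the traceback above can be reproduced with a minimal sketch. The `Record` class and `percolator_receiver` below are hypothetical stand-ins, not the actual invenio-records or invenio-collections code; the only detail taken from the traceback is that the receiver calls `record.dumps()`.

```python
class Record(dict):
    """Hypothetical stand-in for a record class that offers dumps()."""

    def dumps(self):
        # Return a plain-dict copy of the record's data.
        return dict(self)


def percolator_receiver(record):
    """Hypothetical receiver mirroring `body = {"doc": record.dumps()}`."""
    return {"doc": record.dumps()}


def try_receiver(payload):
    """Run the receiver and return the error message instead of raising."""
    try:
        return percolator_receiver(payload)
    except AttributeError as exc:
        return str(exc)


ok = try_receiver(Record(title="A paper"))   # works: Record has dumps()
failed = try_receiver({"title": "A paper"})  # plain dict: no dumps()
```

Passing `obj.data` (a plain dict) to the signal corresponds to the second call, which is why dropping the task avoids the error.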

@jmartinm
Contributor Author

> One problem I see at the moment is that when I enable invenio-collections I am getting ES timeouts locally when migrating records (possibly due to some extra computation happening in invenio-collections receivers)

This problem is now solved by not using percolators (COLLECTIONS_USE_PERCOLATOR=False) and instead using 'internal matching'.

@jmartinm
Contributor Author

Second commit fixes #1589 (comment)

@jmartinm
Contributor Author

jmartinm commented Sep 23, 2016

TODO:

  • Fix landing page query. The landing page currently shows the number of records in the whole index, while a search only covers the Literature collection, so the numbers don't match.
  • Restrict collections. Collections that are currently restricted should still be restricted.

@jmartinm jmartinm changed the title WIP global: enable invenio-collections WIP global: add record/collection permissions Sep 26, 2016
@jmartinm jmartinm force-pushed the authorizations branch 3 times, most recently from 02eb99e to ada053f Compare September 26, 2016 20:56
if request:
    collection = request.values.get('cc', 'Literature')

all_restricted_collections = set(
Contributor Author

This would be better cached to avoid doing a DB query on every request.

@jmartinm jmartinm force-pushed the authorizations branch 2 times, most recently from 723b96e to 10af50d Compare September 27, 2016 15:33
@jmartinm jmartinm changed the title WIP global: add record/collection permissions global: add record/collection permissions Sep 27, 2016
@jmartinm
Contributor Author

Ready for comments. Now that Travis is passing, will add some extra tests regarding permissions/collections.

@jacquerie
Contributor

Last commit (tests: test_record app fix) should go first, otherwise you are breaking git bisect.

{
"dbquery": "collections.primary:HEPNAMES",
"name": "HepNames"
}
Contributor

I think this list is not up-to-date.

        a.argument for a in ActionUsers.query.filter_by(
            action='view-restricted-collection').all()
    ]
)
Contributor

Shouldn't this be computed only once, for performance reasons? We could cache it in Redis for a minute or so.
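
A minimal sketch of the caching idea, assuming a cache with `get`/`set` and a per-key timeout. `TTLCache` is a hypothetical in-memory stand-in for the Redis-backed cache, and `load_from_db` is a made-up callable standing in for the `ActionUsers` query; neither is the code in this PR.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a Redis-backed cache with expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)


def get_restricted_collections(cache, load_from_db, timeout=60):
    """Return the cached set, querying the database only on a miss."""
    cached = cache.get("restricted_collections")
    if cached is not None:
        return cached
    collections = set(load_from_db())
    cache.set("restricted_collections", collections, timeout)
    return collections
```

With a 60-second timeout the database is queried at most once a minute, at the cost of permission changes taking up to a minute to become visible.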

'view_restricted_collection'
' = inspirehep.modules.records.permissions:'
'action_view_restricted_collection',
],
Contributor

This seems like extreme indentation 💃

@jmartinm jmartinm force-pushed the authorizations branch 4 times, most recently from 07a6ac4 to 629c2b1 Compare October 4, 2016 07:29
@jmartinm
Contributor Author

jmartinm commented Oct 4, 2016

This PR is now ready to be reviewed. The functionality added is covered by tests.

@@ -125,6 +125,114 @@ def _(x):
USERPROFILES_EXTEND_SECURITY_FORMS = False
USERPROFILES_SETTINGS_TEMPLATE = 'inspirehep_theme/accounts/settings/profile.html'

# Collections
# ===========
COLLECTIONS_DELETED_RECORDS = '{dbquery} AND NOT collections.primary:"DELETED"'
Contributor

Currently, after @spirosdelviniotis's work, the DELETED information corresponds to the deleted field being set to True.

I am not sure how to formulate this filter in Invenio query syntax though. (i.e. deleted != True)

Contributor Author

@spirosdelviniotis will test this out.

Maybe something like:

COLLECTIONS_DELETED_RECORDS = '{dbquery} AND NOT deleted:True'

works.
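
To illustrate how the `{dbquery}` placeholder is meant to be used, here is a small sketch. It assumes the template is filled in with `str.format`; `exclude_deleted` is a hypothetical helper and not part of invenio-collections, and whether `deleted:True` parses as intended in Invenio query syntax is still the open question above.

```python
# Template taken from the suggestion above.
COLLECTIONS_DELETED_RECORDS = '{dbquery} AND NOT deleted:True'


def exclude_deleted(dbquery, template=COLLECTIONS_DELETED_RECORDS):
    """Fill a collection's own query into the `{dbquery}` placeholder."""
    return template.format(dbquery=dbquery)


query = exclude_deleted('collections.primary:HEPNAMES')
```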

},
"lowercase_analyzer": {
"filter": "lowercase",
"tokenizer": "keyword"
Contributor

Question: since you ignore case when matching collections in ES, do you still present them in the HTML and URLs in their canonical form, or do you reuse the case provided by the user?

Contributor Author

We use them in the UI in the canonical form, like Authors. But you can then do either /search?cc=Authors or /search?cc=authors.
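
The effect of the `lowercase_analyzer` mapping can be emulated in plain Python. This is only a sketch of the behaviour (a keyword tokenizer emits the whole input as one token, and the lowercase filter lowercases it), not how Elasticsearch is invoked; the function names are made up for illustration.

```python
def lowercase_analyzer(text):
    """Emulate a keyword tokenizer followed by a lowercase filter.

    The keyword tokenizer emits the whole input as a single token and the
    lowercase filter then lowercases that token.
    """
    return [text.lower()]


def matches(indexed_value, query_value):
    """Compare values after both went through the same analyzer."""
    return lowercase_analyzer(indexed_value) == lowercase_analyzer(query_value)
```

So `cc=Authors` and `cc=authors` hit the same term, while a partial value such as `Author` does not match because the whole string is a single token.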

if restricted_collections:
    cache.set(
        'restricted_collections',
        pickle.dumps(restricted_collections),
Contributor

Please use protocol -1 to have the most efficient protocol being chosen.

Contributor Author

I will use #1622 so will not do pickling in the end myself.
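
For the record, this is what the `protocol=-1` suggestion looks like with the standard `pickle` module. The collection names are made up, and the sketch leaves out the cache call itself.

```python
import pickle

restricted_collections = {"Collection A", "Collection B"}  # made-up names

# protocol=-1 selects pickle.HIGHEST_PROTOCOL for the running interpreter.
payload = pickle.dumps(restricted_collections, protocol=-1)
restored = pickle.loads(payload)
```

Note that the highest protocol depends on the Python version, so a payload written by a newer interpreter may not be readable by an older one sharing the same Redis.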

query = Q('match', _collections=collection)

for collection in list(all_restricted_collections - user_coll):
    query = query & ~Q('match', _collections=collection)
Contributor

Have you actually tried using ES filtering?

Contributor Author

The query returned here is added to an ES filter query. You can see the output in this test https://github.com/inspirehep/inspire-next/pull/1589/files#diff-8eedb14f28e016fe0e021789e25878abR336
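
As a rough illustration of the logic in the snippet above, the same exclusion can be written as a plain Elasticsearch bool query. `restricted_collections_filter` is a hypothetical helper, and the exact dict that elasticsearch-dsl produces for `Q(...) & ~Q(...)` may be shaped differently; see the linked test for the real output.

```python
def restricted_collections_filter(collection, all_restricted, user_collections):
    """Build an Elasticsearch bool query as a plain dict.

    Matches `collection` and excludes every restricted collection that the
    user has not been granted access to.
    """
    hidden = sorted(set(all_restricted) - set(user_collections))
    return {
        "bool": {
            "must": [{"match": {"_collections": collection}}],
            "must_not": [{"match": {"_collections": name}} for name in hidden],
        }
    }
```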

@jmartinm
Contributor Author

jmartinm commented Oct 4, 2016

After #1622 is merged I will change the way I read/write into Redis as Flask-cache allows for writing sets into Redis (I guess with some internal magic)

Contributor

@jacquerie jacquerie left a comment

👍, just some nits.

COLLECTIONS_QUERY_WALKERS = [
'inspirehep.modules.search.walkers.pypeg_to_ast:PypegConverter',
]
"""Modules to create the query AST."""
Contributor

Nit: space here.

COLLECTIONS_USE_PERCOLATOR = False
"""Define which percolator you want to use.

Default value is `False` to use the internal percolator.
Contributor

That's confusing! Maybe we should change upstream to have COLLECTIONS_USE_ES_PERCOLATOR instead? Of course, not something to be fixed in this PR.

Contributor Author

This docstring is copy/pasted from invenio-collections. Maybe the phrase is a bit confusing since they called it 'internal percolator'. If we think of the percolator as just the ES one, then the config makes sense: COLLECTIONS_USE_PERCOLATOR is either True or False.

RECORDS_UI_ENDPOINTS = dict(
literature=dict(
pid_type='literature',
route='/literature/<pid_value>',
template='inspirehep_theme/format/record/'
'Inspire_Default_HTML_detailed.tpl',
- record_class='inspirehep.modules.records.wrappers:LiteratureRecord',
- permission_factory_imp='invenio_records_rest.utils:allow_all',
+ record_class='inspirehep.modules.records.wrappers:LiteratureRecord'
Contributor

Nit: keep the trailing commas here, the diffs will look nicer now and in the future.

@@ -585,63 +694,59 @@ def _(x):
),
)


RECORDS_UI_DEFAULT_PERMISSION_FACTORY = \
"inspirehep.modules.records.permissions:record_read_permission_factory"
Contributor

Good to know, I was the one who added the permission_factory_imp everywhere...

Contributor Author

:)

@@ -20,7 +20,7 @@
"numeric_detection": false,
"properties": {
"_collections": {
- "index": "not_analyzed",
+ "analyzer": "lowercase_analyzer",
Contributor

What is the rationale for using this particular one for _collections? Should we prefer the lowercase_analyzer to not_analyzed in all cases?

Contributor

This is to be forgiving to the user. Legacy INSPIRE/Invenio has always been case-insensitive for any field.

I wonder whether we have to be careful with e.g. DOIs, though.

from inspirehep.modules.pidstore.minters import inspire_recid_minter
from inspirehep.modules.search.api import LiteratureSearch
from inspirehep.utils.cache import cache

Contributor

Nit: ...one extra space here.

login_user_via_session(client, email=user_info['email'])

result = client.get("/literature/123")
assert result.status_code == status
Contributor

Since you are not going to use result any further, IMO `assert client.get('/literature/123').status_code == status` is enough.

login_user_via_session(api_client, email=user_info['email'])

result = api_client.get("/literature/123")
assert result.status_code == status
Contributor

Same thing here.

login_user_via_session(client, email=user_info['email'])

result = client.get("/literature/222")
assert result.status_code == status
Contributor

Here.

if user_info:
    login_user_via_session(api_client, email=user_info['email'])
result = api_client.get("/literature/222")
assert result.status_code == status
Contributor

And here.

* When the test client makes a call to /api `current_app` becomes the API
  app. This was causing problems since invenio-collections is not registered
  in the API.

Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
* The task is not needed since enhancements should happen before record
  insert/index. It was also causing problems since a dictionary was being
  passed to receivers instead of a Record object.

Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
* Enhances queries with _collections field based on URL cc parameter.

* By default Literature will only show records that had HEP collection in
  legacy.

Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
* Prevents circular dependency problem when doing imports in config.py.

Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
* Allow cataloger to access the Holding Pen.

Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
* Adds read permission factory for Literature API.

Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
* To avoid adding the default filter, use get_source().

Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
* Creates receiver for user_logged_in signal to populate user restricted
  collections upon login.

Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
Signed-off-by: Javier Martin Montull <javier.martin.montull@cern.ch>
@jmartinm
Contributor Author

jmartinm commented Oct 4, 2016

Comments addressed and changed redis cache get/set to use Flask-cache (16e3b4a)

Please review one more time, and then it's ready to 🚢

3 participants