WIP: Add elastic app #177

dblenkus · 2016-12-15T08:38:54Z

No description provided.

kostko

Did a quick review.

kostko · 2016-12-15T09:33:41Z

resolwe/elastic/indices.py

+from resolwe.flow.models import dict_dot
+
+
+class NoValue(object):


Please explain that this is used to differentiate missing attributes from None (another approach would be to catch AttributeError instead of having a custom guard type).
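To illustrate the distinction, here is a minimal sketch of a guard type next to a plain `None` value. The `lookup` helper and the sample `record` are made up for this example and are not part of the PR.

```python
class NoValue(object):
    """Guard type marking a value that is missing (as opposed to None)."""


def lookup(data, key):
    """Return data[key], or a NoValue instance when the key is absent."""
    return data.get(key, NoValue())


record = {'name': None}

present_but_none = lookup(record, 'name')
missing = lookup(record, 'slug')

# None is a legitimate stored value; NoValue signals that nothing was there.
assert present_but_none is None
assert isinstance(missing, NoValue)
```

The alternative mentioned above, catching the lookup error, avoids the extra type but moves the handling into a `try`/`except` block at the call site.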

kostko · 2016-12-15T09:34:35Z

resolwe/elastic/indices.py

+    def __init__(self, obj):
+        self.obj = obj
+
+    def filter(self):


Add documentation for all methods.

kostko · 2016-12-15T09:36:09Z

resolwe/elastic/indices.py

+
+            if field in self.mapping:
+                if callable(self.mapping[field]):
+                    setattr(document, field, self.mapping[field]())


I think it would be better if the callable is called with the object (self.obj) as argument.
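A rough sketch of what this suggestion could look like, assuming the mapping is a plain dict. `Sample` and `resolve` are hypothetical names used only for this illustration.

```python
class Sample(object):
    """Hypothetical stand-in for the object being indexed (self.obj)."""

    def __init__(self, name):
        self.name = name


def resolve(mapping, field, obj):
    """Resolve one mapped field; callables receive the object being indexed."""
    value = mapping[field]
    if callable(value):
        return value(obj)
    return value


mapping = {'upper_name': lambda obj: obj.name.upper()}
result = resolve(mapping, 'upper_name', Sample('data'))
```

Passing the object explicitly lets the same mapping entry be reused for any instance instead of closing over one particular object.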

kostko · 2016-12-15T09:38:16Z

resolwe/elastic/indices.py

+                    continue
+
+                if isinstance(self.mapping[field], six.string_types):
+                    object_attr = dict_dot(self.obj, self.mapping[field], default=self.mapping[field])


I don't think this is a good approach as typos will automatically change to default values (e.g. saying fo.bar instead of foo.bar will silently set all indexed values for that field to the string 'fo.bar'). In order to set constants we should force them to be wrapped in lambda expressions or figure out a different way.

Yes, I was considering the same. But I think that lambda obj: None is a really ugly workaround. If you mistype a path, you will see the error really quickly in the API, so I don't see this as a really big problem.

Another solution would be to add an identity function (i.e. const) and set the constant as const(None).

I like this const suggestion, just name it constant.

BTW, is there a reason why a constant value should be indexed? Wouldn't it be the same for all documents?
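As a sketch of the `constant` idea discussed in this thread: the wrapper below marks literal values explicitly, so a mistyped path raises instead of silently becoming the indexed value. `get_by_path` is a simplified stand-in for `dict_dot`, and `resolve` is a hypothetical helper; neither is the actual implementation.

```python
class constant(object):
    """Hypothetical marker wrapping a literal value in a field mapping."""

    def __init__(self, value):
        self.value = value


def get_by_path(obj, path):
    """Simplified dotted lookup on nested dicts; raises KeyError on a bad path."""
    for part in path.split('.'):
        obj = obj[part]
    return obj


def resolve(mapping_value, obj):
    """Return the wrapped literal for constants, otherwise follow the path."""
    if isinstance(mapping_value, constant):
        return mapping_value.value
    return get_by_path(obj, mapping_value)


obj = {'foo': {'bar': 42}}
from_path = resolve('foo.bar', obj)
from_constant = resolve(constant(None), obj)
```

With this arrangement `resolve('fo.bar', obj)` raises `KeyError` rather than indexing the string `'fo.bar'`.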

kostko · 2016-12-15T09:38:44Z

resolwe/elastic/indices.py

+                else:
+                    object_attr = self.mapping[field]
+                if callable(object_attr):
+                    setattr(document, field, object_attr())


As above, pass self.obj as argument.

kostko · 2016-12-15T09:39:11Z

resolwe/elastic/indices.py

+                    setattr(document, field, object_attr)
+                continue
+
+            object_value = dict_dot(self.obj, field, default=NoValue())


Why use NoValue instead of propagating AttributeError/KeyError?

OK, I will catch the error and raise a more descriptive one.
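One possible shape for this, as a sketch only. `FieldLookupError`, `get_by_path`, and `get_field_value` are hypothetical names, and the attribute-based lookup stands in for `dict_dot`.

```python
class FieldLookupError(Exception):
    """Hypothetical error type carrying a descriptive lookup message."""


def get_by_path(obj, path):
    """Follow a dotted attribute path such as 'contributor.username'."""
    for part in path.split('.'):
        obj = getattr(obj, part)
    return obj


def get_field_value(obj, field):
    """Resolve `field` on `obj`, re-raising lookup failures with context."""
    try:
        return get_by_path(obj, field)
    except AttributeError as error:
        raise FieldLookupError(
            "Cannot index field '{}' of {}: {}".format(
                field, type(obj).__name__, error))


class Contributor(object):
    username = 'alice'


class Data(object):
    contributor = Contributor()


value = get_field_value(Data(), 'contributor.username')
```

The re-raised message names both the field and the object type, which is the part a bare `AttributeError` would leave out.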

kostko · 2016-12-15T09:40:14Z

resolwe/elastic/indices.py

+        document.save()
+
+    def get_permissions(self):
+        return {


How expensive is this when doing indexing in bulk? Can we optimize during bulk reindex?

At the moment each object is processed individually, so I don't see an easy option to optimise this. But this is definitely something that we have to work on in the future. I will add a comment about this.

kostko · 2016-12-15T09:41:09Z

resolwe/elastic/management/commands/es_index.py

+
+    def handle(self, *args, **options):
+        """Command handle."""
+        for data in Data.objects.all():


Can we avoid hardcoding which models to index? Instead, the list of registered index classes should be used to determine the set of models to index.

Ok, I will finish everything else and then work on this.
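A minimal sketch of the registry-driven approach, assuming each index class exposes a `queryset` and a `build` method. The tuples stand in for Django querysets, and `DataIndex`, `CollectionIndex`, and `reindex_all` are illustrative names only.

```python
class BaseIndex(object):
    """Simplified index base; `queryset` is a plain tuple in this sketch."""

    queryset = ()

    def build(self):
        """Return the objects this index would process."""
        return list(self.queryset)


class DataIndex(BaseIndex):
    queryset = ('data-1', 'data-2')


class CollectionIndex(BaseIndex):
    queryset = ('collection-1',)


def reindex_all(indexes):
    """Build every registered index; no model is named in the command itself."""
    indexed = []
    for index in indexes:
        indexed.extend(index.build())
    return indexed


registered = [DataIndex(), CollectionIndex()]
result = reindex_all(registered)
```

The management command then only iterates over the registered indexes, so adding a new index class needs no change to the command.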

kostko · 2016-12-15T09:41:59Z

resolwe/elastic/signals.py

+    add_index(obj)
+
+
+@receiver(post_save, sender=Data)


I don't like these hardcoded signals. Can we make it so that the first time a new index references a new model, the appropriate signals are connected?
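A sketch of connecting signals lazily when an index is registered. To keep it runnable without Django, `Signal` below is a minimal stand-in; with Django the equivalent call would be `post_save.connect(handler, sender=model)`. `IndexBuilder.register`, the `model` attribute on index classes, and the sample classes are assumptions made for this example.

```python
class Signal(object):
    """Minimal stand-in for a Django signal, so the sketch runs without Django."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver, sender):
        self.receivers.append((receiver, sender))

    def send(self, sender, instance):
        for receiver, registered_sender in self.receivers:
            if registered_sender is sender:
                receiver(sender=sender, instance=instance)


post_save = Signal()


class IndexBuilder(object):
    """Registers indexes and connects one save handler per referenced model."""

    def __init__(self):
        self.indexes = []
        self.connected_models = set()
        self.built = []

    def register(self, index):
        self.indexes.append(index)
        # Connect the handler only the first time a model is referenced.
        if index.model not in self.connected_models:
            post_save.connect(self._on_save, sender=index.model)
            self.connected_models.add(index.model)

    def _on_save(self, sender, instance, **kwargs):
        self.built.append(instance)


class Data(object):
    """Hypothetical model class."""


class DataIndex(object):
    model = Data


class DataSummaryIndex(object):
    model = Data


builder = IndexBuilder()
builder.register(DataIndex())
builder.register(DataSummaryIndex())

saved = Data()
post_save.send(sender=Data, instance=saved)
```

Two indexes reference `Data` here, but the handler is connected once, so a save triggers a single build call.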

mstajdohar · 2016-12-15T09:52:40Z

resolwe/elastic/indices.py

+        pass
+
+    def preprocess_object(self):
+        pass


Raise NotImplementedError exception.

This is not a required method, just a convenience one, so you don't need to implement it.

mstajdohar · 2016-12-15T09:56:29Z

resolwe/elastic/management/commands/es_index.py

@@ -0,0 +1,27 @@
+""".. Ignore pydocstyle D400.


Consider renaming to elastic_index.py.

mstajdohar · 2016-12-15T09:57:17Z

resolwe/elastic/viewsets.py

+
+
+class ElasticSearchViewSet(GenericViewSet):
+    """Django REST Framework viewset


One line comment.

Yes, I will improve documentation :)

codecov-io · 2016-12-15T10:24:27Z

Current coverage is 85.67% (diff: 34.83%)

Merging #177 into master will decrease coverage by 2.19%

@@             master       #177   diff @@
==========================================
  Files            84         94    +10   
  Lines          5641       5885   +244   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           4957       5042    +85   
- Misses          684        843   +159   
  Partials          0          0

Powered by Codecov. Last update 8ac4543...98c15a0

kostko · 2016-12-19T13:21:39Z

resolwe/elastic/builder.py

+    def __init__(self):
+        """Initialize index builder object."""
+        # Set dafault connection for ElasticSearch
+        # TODO: Reestablish connection if broken!


Is this not handled automatically (e.g. retry API request)? Because AFAIK connections are not persistent, each request is done via HTTP?

It should be, but the connection stopped working when ElasticSearch was restarted.

How did it stop working, did the queries raise errors?

Yes, urllib3 raised an error saying that the connection is broken.

kostko

Did another review pass, I like the improved API, just a few things more.

kostko · 2016-12-19T14:27:32Z

resolwe/elastic/builder.py

+                    if inspect.isclass(attr) and issubclass(attr, BaseIndex) and attr is not BaseIndex:
+                        self.indexes.append(attr())
+            except ImportError:
+                pass  # no `elastic_indexes` in app


Can we differentiate this from "importing that specific index caused an ImportError"? Because this one should be reraised.
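One way to separate the two cases, sketched for Python 3 only: check whether the module exists with `importlib.util.find_spec` before importing it, so that any `ImportError` raised while the module executes is no longer swallowed. `load_optional_module` is a hypothetical helper, and the sketch assumes the parent package (the app) is itself importable.

```python
import importlib
import importlib.util


def load_optional_module(name):
    """Import module `name` if it exists; return None when it is absent.

    An ImportError raised while executing an existing module is not caught
    here, so it propagates to the caller.
    """
    if importlib.util.find_spec(name) is None:
        return None  # module does not exist, e.g. no `elastic_indexes` in app
    return importlib.import_module(name)


# `json` stands in for an installed app package in this example.
json_module = load_optional_module('json')
missing = load_optional_module('json.elastic_indexes')
```

A missing `elastic_indexes` module yields `None`, while a broken import inside an existing module still surfaces as an exception.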

kostko · 2016-12-19T14:29:04Z

resolwe/elastic/builder.py

+            except ImportError:
+                pass  # no `elastic_indexes` in app
+
+    def trigger(self, obj=None):


I am not sure if trigger is the best name for this, perhaps it should be build, index or update_object (to be consistent with remove_object below).

I'll name it build, because it triggers the whole build process, whereas remove_object only removes a single object, so there should be some difference in the name.

kostko · 2016-12-19T14:30:36Z

resolwe/elastic/indices.py

+    """
+
+    #: list of user ids with view permission on the object
+    users_with_permissions = dsl.String()


Shouldn't there be a multi=True here as it accepts an array of strings (also below)?

kostko · 2016-12-19T14:32:50Z

resolwe/elastic/indices.py

+
+    def run(self, obj=None):
+        """Main function for running indexes."""
+        if obj and obj not in self.queryset:


Does this execute a database query just to determine if an object matches the filter? Can this be improved?

Yes, a database query is executed, but I don't see an easier way to implement this. This is used only for a single object (if obj is defined), so it shouldn't slow the process down too much.

Ok, if it is just for a single object (and never used in loops with a large number of objects) then it is ok.

kostko · 2016-12-19T14:36:14Z

resolwe/elastic/viewsets.py

+__all__ = ('ElasticSearchViewSet',)
+
+
+class ElasticSearchViewSet(GenericViewSet):


Should this viewset also include methods for handling most common search requests, so that the user needs to only override a few specific things and common stuff like ordering and pagination are handled correctly? An even better approach would be to maybe have this as a mixin.

Agreed, I've changed it to a mixin.

kostko · 2016-12-20T10:22:44Z

resolwe/elastic/indices.py

+    """
+
+    #: list of user ids with view permission on the object
+    users_with_permissions = dsl.String(many=True)


Shouldn't it be multi and not many? At least based on this.

kostko · 2016-12-20T10:24:04Z

resolwe/elastic/indices.py

+
+        self._index_name = self.document_class()._get_index()  # pylint: disable=not-callable,protected-access
+
+    def filter(self):


In which case is filter used, now that we have queryset?

kostko

Looks great, just some minor things. Also, I see that we currently only have a management command for reindexing? What about adding one for clearing indices as well?

kostko · 2017-01-03T14:07:18Z

resolwe/elastic/__init__.py

+
+elasticsearch_host = getattr(settings, 'ELASTICSEARCH_HOST', 'localhost')  # pylint: disable=invalid-name
+elasticsearch_port = getattr(settings, 'ELASTICSEARCH_PORT', 9200)  # pylint: disable=invalid-name
+connections.create_connection(hosts=['{}:{}'.format(elasticsearch_host, elasticsearch_port)])


Any reason why this must be done here and not in the ready method?

Thanks for spotting this, I forgot to revert the changes when performing some tests.

kostko · 2017-01-03T14:07:58Z

resolwe/elastic/builder.py

+    def __init__(self):
+        """Initialize index builder object."""
+        # Set dafault connection for ElasticSearch
+        # elasticsearch_host = getattr(settings, 'ELASTICSEARCH_HOST', 'localhost')


What's up with this commented block?

Same as above.

kostko · 2017-01-03T14:08:28Z

resolwe/elastic/builder.py

+        # Set dafault connection for ElasticSearch
+        # elasticsearch_host = getattr(settings, 'ELASTICSEARCH_HOST', 'localhost')
+        # elasticsearch_port = getattr(settings, 'ELASTICSEARCH_PORT', 9200)
+        # # TODO: Reestablish connection if broken!


So currently connections are not reestablished? Would it be hard to add this?

Looks like it has started to work by itself :)

kostko · 2017-01-03T14:10:36Z

resolwe/elastic/indices.py

+
+        for field in document._doc_type.mapping:  # pylint: disable=protected-access
+            if field in ['users_with_permissions', 'groups_with_permissions']:
+                continue  # this fields are handled separately


this fields -> These fields

kostko · 2017-01-03T14:13:53Z

BTW, this PR seems to decrease coverage?

dblenkus · 2017-01-04T09:42:14Z

@kostko I've added a commit with fixes.

I will add tests in a separate pull request, so we can start testing this.

mstajdohar assigned kostko Dec 15, 2016

kostko suggested changes Dec 15, 2016

mstajdohar approved these changes Dec 15, 2016

dblenkus force-pushed the feature-elastic branch from 1086bcd to a6fcc12 Compare December 15, 2016 10:24

dblenkus force-pushed the feature-elastic branch 2 times, most recently from d213e93 to 1d53934 Compare December 15, 2016 21:50

kostko reviewed Dec 19, 2016

dblenkus force-pushed the feature-elastic branch 2 times, most recently from fc0b046 to 0fd2277 Compare December 20, 2016 07:35

kostko reviewed Dec 20, 2016

dblenkus force-pushed the feature-elastic branch 4 times, most recently from 6de0bcd to cf0fc19 Compare December 21, 2016 13:54

tjanez mentioned this pull request Dec 23, 2016

Make EnrichmentProcessorTestCase independent from external API server genialis/resolwe-bio#173

Closed

3 tasks

dblenkus force-pushed the feature-elastic branch 6 times, most recently from b61d468 to 5d9c527 Compare January 3, 2017 12:45

kostko reviewed Jan 3, 2017

dblenkus force-pushed the feature-elastic branch 2 times, most recently from 4ea8bea to acd832a Compare January 4, 2017 09:41

kostko approved these changes Jan 4, 2017

Add elastic app

98c15a0

dblenkus force-pushed the feature-elastic branch from acd832a to 98c15a0 Compare January 4, 2017 12:14

dblenkus merged commit 98c15a0 into genialis:master Jan 4, 2017

dblenkus deleted the feature-elastic branch January 4, 2017 12:32
