Adds elasticsearch-dsl and adds it alongside Haystack for now. #673
Conversation
That's a neat video, it does seem to clean up a lot of little messy things about Haystack
@noisecapella: Thanks for checking it out. It's definitely more straightforward. Haystack was keeping things simple for the benefit of the Haystack authors, which made our use of Haystack pretty clumsy. We can now get it to work the way we want it to and it's simpler to use. Everybody wins.
DOC_TYPE = "learningresource"
INDEX_NAME = settings.HAYSTACK_CONNECTIONS["default"]["INDEX_NAME"]
URL = settings.HAYSTACK_CONNECTIONS["default"]["URL"]
CONN = connections.create_connection(hosts=[URL])
If we initialize the connection here, there will be a network connection happening when we import search.utils. The other globals are just strings, but this one does a whole lot more. My main concern is that this may cause an exception if the connection fails, and it wouldn't be clean to put a try block around an import.
Could you make this a lazy connection instead, where functions call something like get_connection, which may initialize it? This would follow how databases work in Django, where the connection is made on the first request.
So, keep CONN as a global variable, defaulting to None, and have each function in utils.py call get_connection (or maybe initialize_connection), which checks if it's None, initializes if necessary, and returns CONN?
Yep
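(For reference, a minimal sketch of the lazy pattern being discussed, with the module-level global defaulting to None so that importing search.utils makes no network calls; the actual change is in the diff below.)

from django.conf import settings
from elasticsearch_dsl.connections import connections

URL = settings.HAYSTACK_CONNECTIONS["default"]["URL"]

_CONN = None  # nothing connects at import time


def get_conn():
    """Lazily create and cache the Elasticsearch connection."""
    # pylint: disable=global-statement
    global _CONN
    if _CONN is None:
        _CONN = connections.create_connection(hosts=[URL])
    return _CONN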
Almost through with this, looking at tests now
Okay. The first round of issues was addressed and pushed. Diff:
diff --git search/utils.py search/utils.py
index ffe01a1..2b8759b 100644
--- search/utils.py
+++ search/utils.py
@@ -24,10 +24,22 @@ log = logging.getLogger(__name__)
DOC_TYPE = "learningresource"
INDEX_NAME = settings.HAYSTACK_CONNECTIONS["default"]["INDEX_NAME"]
URL = settings.HAYSTACK_CONNECTIONS["default"]["URL"]
-CONN = connections.create_connection(hosts=[URL])
+_CONN = connections.create_connection(hosts=[URL])
PAGE_LENGTH = 10
+def get_conn():
+ """
+ Lazily create the connection.
+ """
+ # pylint: disable=global-statement
+ # This is ugly. Any suggestions on a way that doesn't require "global"?
+ global _CONN
+ if _CONN is None:
+ _CONN = connections.create_connection(hosts=[URL])
+ return _CONN
+
+
def get_resource_terms(resource_ids):
"""
Returns taxonomy metadata for LearningResources.
@@ -104,8 +116,9 @@ def index_resources(resources):
ensure_vocabulary_mappings(term_info)
# Perform bulk insert using Elasticsearch directly.
+ conn = get_conn()
insert_count, errors = bulk(
- CONN,
+ conn,
(resource_to_dict(x, term_info.get(x.id, {})) for x in resources),
index=INDEX_NAME,
doc_type=DOC_TYPE
@@ -120,8 +133,9 @@ def index_resources(resources):
@statsd.timer('lore.elasticsearch.delete_index')
def delete_index(resource):
"""Delete a record from Elasticsearch."""
+ conn = get_conn()
try:
- CONN.delete(
+ conn.delete(
index=INDEX_NAME, doc_type=DOC_TYPE, id=resource.id)
refresh_index()
except NotFoundError:
@@ -201,10 +215,11 @@ def resource_to_dict(resource, term_info=None):
def clear_index():
"""Wipe the index."""
- if CONN.indices.exists(INDEX_NAME):
- CONN.indices.delete(INDEX_NAME)
- CONN.indices.create(INDEX_NAME)
- CONN.indices.refresh()
+ conn = get_conn()
+ if conn.indices.exists(INDEX_NAME):
+ conn.indices.delete(INDEX_NAME)
+ conn.indices.create(INDEX_NAME)
+ conn.indices.refresh()
create_mapping()
# re-index all existing LearningResource instances:
@@ -228,8 +243,9 @@ class SearchResults(object):
def page_count(self):
"""Total number of result pages."""
- count = self._search.count() / PAGE_LENGTH
- if self._search.count() % PAGE_LENGTH > 0:
+ total = self._search.count()
+ count = total / PAGE_LENGTH
+ if total % PAGE_LENGTH > 0:
count += 1
return int(count)
@@ -247,7 +263,7 @@ class SearchResults(object):
def __getitem__(self, i):
"""Return result by index."""
- return self._search[i:i+1].execute().hits[0]
+ return self._search[i].execute().hits[0]
def create_mapping():
@@ -262,11 +278,12 @@ def create_mapping():
"""
# Create the index if it doesn't exist.
- if not CONN.indices.exists(INDEX_NAME):
- CONN.indices.create(INDEX_NAME)
+ conn = get_conn()
+ if not conn.indices.exists(INDEX_NAME):
+ conn.indices.create(INDEX_NAME)
# Delete the mapping if an older version exists.
- if CONN.indices.exists_type(index=INDEX_NAME, doc_type=DOC_TYPE):
- CONN.indices.delete_mapping(index=INDEX_NAME, doc_type=DOC_TYPE)
+ if conn.indices.exists_type(index=INDEX_NAME, doc_type=DOC_TYPE):
+ conn.indices.delete_mapping(index=INDEX_NAME, doc_type=DOC_TYPE)
mapping = Mapping(DOC_TYPE)
@@ -275,7 +292,7 @@ def create_mapping():
mapping.field("description", "string", index="analyzed")
mapping.field("preview_url", "string", index="no")
mapping.field("repository", "string", index="not_analyzed")
- mapping.field("resource_type", "string", index="analyzed")
+ mapping.field("resource_type", "string", index="not_analyzed")
mapping.field("content_xml", "string", index="no")
mapping.field("content_stripped", "string", index="analyzed")
mapping.field("run", "string", index="not_analyzed")
@@ -293,7 +310,7 @@ def create_mapping():
# LearningResource instances. This function will probably only
# ever be called by migrations.
index_resources(LearningResource.objects.all())
- CONN.indices.refresh()
+ conn.indices.refresh()
def refresh_index():
@@ -301,7 +318,8 @@ def refresh_index():
Force a refresh instead of waiting for it to happen automatically.
This should only be necessary during tests.
"""
- CONN.indices.refresh(index=INDEX_NAME)
+ conn = get_conn()
+ conn.indices.refresh(index=INDEX_NAME)
def ensure_vocabulary_mappings(term_info):
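(A quick standalone check of the page_count arithmetic from the diff above; the page length and totals are illustrative.)

PAGE_LENGTH = 10

def page_count(total):
    # Integer division rounds down; add one page for any remainder.
    # Equivalent to the divide-then-int logic in SearchResults.page_count.
    count = total // PAGE_LENGTH
    if total % PAGE_LENGTH > 0:
        count += 1
    return count

assert page_count(0) == 0
assert page_count(10) == 1
assert page_count(25) == 3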
I got an error running your migration:
If the connection in get_conn dies, what happens?
Test coverage looks good
Is there any reasonable way to test functionality, or should that wait for the next PR?
The migration and
I tried some DSL queries using the shell already and they worked, that's probably enough for this phase
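(For example, a query along these lines works from the shell; the index name, port, and field here are illustrative, not the project's exact values.)

from elasticsearch_dsl import Search
from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=["localhost:9200"])

# Full-text match against one analyzed field of the learningresource doc type.
response = Search(index="haystack", doc_type="learningresource").query(
    "match", content_stripped="calculus"
).execute()

for hit in response:
    print(hit.title)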
Created a stand-alone script to test this:
from elasticsearch_dsl.connections import connections
from elasticsearch.exceptions import ConnectionError
conn = connections.create_connection(hosts=["localhost:10200"])
def print_info():
    try:
        print conn.info()
    except ConnectionError:
        print "failed"
        return False
    return True
assert print_info() == True
print "now, stop the service"
raw_input()
assert print_info() == False
print "now, start the service"
raw_input()
assert print_info() == True
This works, so it does repair itself after a temporary loss of connection.
I think the outstanding issues have been addressed: what happens when the connection dies, and the migration error.
👍 after squash |
Haystack will be removed once all the features are added and tested. This requires the addition of searching by facets, React.js updates, and updates to the RESTful API.
Force-pushed from e238824 to f68ce59
Adds elasticsearch-dsl and adds it alongside Haystack for now.
Haystack will be removed once all the features are added and tested. This requires the addition of searching by facets, React.js updates, and updates to the RESTful API.
Before code review, please watch this, which explains how the new libraries are being used.
The search/utils.py file should be read first, in this order:
- create_mapping
- resource_to_dict
- index_resources
- search_index
- class SearchResults
This should make the tests easy to follow.
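(For reviewers who skip the video: roughly, the new library is used as in the sketch below. The doc type, index name, hosts, and fields are illustrative, not the exact values in search/utils.py.)

from elasticsearch.helpers import bulk
from elasticsearch_dsl import Mapping
from elasticsearch_dsl.connections import connections

conn = connections.create_connection(hosts=["localhost:9200"])

# create_mapping: make sure the index exists, declare how each field is
# indexed, then save the mapping.
if not conn.indices.exists("haystack"):
    conn.indices.create("haystack")
mapping = Mapping("learningresource")
mapping.field("title", "string", index="analyzed")
mapping.field("run", "string", index="not_analyzed")
mapping.save("haystack")

# index_resources: bulk-insert plain dicts (resource_to_dict builds these).
docs = [{"_id": 1, "title": "Linear Algebra", "run": "Fall_2015"}]
bulk(conn, docs, index="haystack", doc_type="learningresource")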