[Fix #4268] Adding Documentation for upgraded Search #4467

Merged
ericholscher merged 8 commits into readthedocs:search_upgrade on Sep 7, 2018

Member

@safwanrahman safwanrahman commented Aug 3, 2018

I have added instructions for setting up search in a local development environment. It's the first step towards documenting search, as proposed in #4268.
@ericholscher r?

@safwanrahman safwanrahman requested a review from ericholscher Aug 3, 2018
Member

@ericholscher ericholscher left a comment

A good start. Needs some spell checking, and I think I touched on the grammar issues I found.

Search
============

Read The Docs uses Elasticsearch_ instead of built in Sphinx search for providing better search result. Documentations are indexed in Elasticsearch index and the search is made through API. All the Search Code is Opensource and lives in `Github Repository`_. Currently we are using `Elasticsearch 6.3`_ version.
Member

@ericholscher ericholscher Aug 6, 2018

result -> results

Member

@ericholscher ericholscher Aug 6, 2018

Documentations -> Documents


Installing and running Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You must need to install and run `Elasticsearch 6.3`_ version in your local development machine. You can get installation instuction `here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_.
Member

@ericholscher ericholscher Aug 6, 2018

instuction is misspelled.

^^^^^^^^
For using search, you need to index data to Elasticsearch Index. Run `reindex_elasticsearch` management command::

./manage.py reindex_elasticsearch
Member

@ericholscher ericholscher Aug 6, 2018

Doesn't this take arguments?

Member Author

@safwanrahman safwanrahman Aug 6, 2018

`./manage.py search_index`, which is provided by DED (django-elasticsearch-dsl), takes arguments. Our implementation of the management command does not take arguments.

@safwanrahman safwanrahman changed the title Adding Documentation for upgraded Search [Fix #4268] Adding Documentation for upgraded Search Aug 7, 2018
Member Author

@safwanrahman safwanrahman commented Aug 7, 2018

@ericholscher fixed the grammar and added some docs about the architecture. Is it ready to merge?

Search
============

Read The Docs uses Elasticsearch_ instead of built in Sphinx search for providing better search
Member

@RichardLitt RichardLitt Aug 7, 2018

...instead of the built-in Sphinx search...

============

Read The Docs uses Elasticsearch_ instead of built in Sphinx search for providing better search
results. Documents are indexed in Elasticsearch index and the search is made through API.
Member

@RichardLitt RichardLitt Aug 7, 2018

...in the Elasticsearch index...through the API.


Read The Docs uses Elasticsearch_ instead of built in Sphinx search for providing better search
results. Documents are indexed in Elasticsearch index and the search is made through API.
All the Search Code is Opensource and lives in `Github Repository`_.
Member

@RichardLitt RichardLitt Aug 7, 2018

...the GitHub Repository.

Member

@RichardLitt RichardLitt Aug 7, 2018

And "open source", not Opensource

Read The Docs uses Elasticsearch_ instead of built in Sphinx search for providing better search
results. Documents are indexed in Elasticsearch index and the search is made through API.
All the Search Code is Opensource and lives in `Github Repository`_.
Currently we are using `Elasticsearch 6.3`_ version.
Member

@RichardLitt RichardLitt Aug 7, 2018

using the Elasticsearch 6.3 version.


Installing and running Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You must need to install and run `Elasticsearch 6.3`_ version in your local development machine.
Member

@RichardLitt RichardLitt Aug 7, 2018

You need to install and run the Elasticsearch 6.3 version on your local development machine. You can get the installation instructions here.

You must need to install and run `Elasticsearch 6.3`_ version in your local development machine.
You can get installation instruction
`here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_.
Otherwise, you can also start a Elasticsearch Docker container by running following command::
Member

@RichardLitt RichardLitt Aug 7, 2018

running the following command


Auto Indexing
^^^^^^^^^^^^^
By default, Auto Indexing is turned off in development mode. To turn in on, change the
Member

@RichardLitt RichardLitt Aug 7, 2018

To turn it on

^^^^^^^^^^^^^
By default, Auto Indexing is turned off in development mode. To turn in on, change the
`ELASTICSEARCH_DSL_AUTOSYNC` settings to `True` in `readthedocs/settings/dev.py` file.
After that, whenever a documentation successfully build, or project gets added,
Member

@RichardLitt RichardLitt Aug 7, 2018

builds

------------
The search architecture is devided into 2 parts.
One part is responsible for **indexing** the documents and projects and
other part is responsible to query through the Index for showing proper result to users.
Member

@RichardLitt RichardLitt Aug 7, 2018

results

The search architecture is devided into 2 parts.
One part is responsible for **indexing** the documents and projects and
other part is responsible to query through the Index for showing proper result to users.
We use `django-elasticsearch-dsl`_ package mostly for the keep the search working.
Member

@RichardLitt RichardLitt Aug 7, 2018

to keep the


Indexing
^^^^^^^^
All the Sphinx documents are indexed into elasticsearch after build gets successfully finish.
Member

@RichardLitt RichardLitt Aug 7, 2018

the build is successful.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

After any build gets successfully finished, `HTMLFile` objects are created for each of the
`HTML` file of the build version and delete the old version's `HTMLFile` object. Signal_
Member

@RichardLitt RichardLitt Aug 7, 2018

files and the old version's HTMLFile object is deleted.

files. Both of the signals are dispatched with the list of the instances of `HTMLFile`
in `instance_list` parameter.

We listen the `bulk_post_create` and `bulk_post_delete` signals in our `Search` application and
Member

@RichardLitt RichardLitt Aug 7, 2018

listen to the


How we index projects
~~~~~~~~~~~~~~~~~~~~~
We also index project informations in our search index so that user can search for projects
Member

@RichardLitt RichardLitt Aug 7, 2018

information ... so that the user can...

~~~~~~~~~~~~~~~~~~~~~~

`elasticsearch-dsl`_ provide model like wrapper for `Elasticsearch document`_.
As per requirements of `django-elasticsearch-dsl`_, its stored in
Member

@RichardLitt RichardLitt Aug 7, 2018

it's stored

Elasticsearch Document
~~~~~~~~~~~~~~~~~~~~~~

`elasticsearch-dsl`_ provide model like wrapper for `Elasticsearch document`_.
Member

@RichardLitt RichardLitt Aug 7, 2018

provides a model-like wrapper for the

As per requirements of `django-elasticsearch-dsl`_, its stored in
`readthedocs/search/documents.py` file.

**ProjectDocument:** Its used for indexing projects. Signal listener of
Member

@RichardLitt RichardLitt Aug 7, 2018

it's

`django-elasticsearch-dsl`_ listen the `post_save` singal of `Project` model and
then index/delete into Elasticsearch.

**PageDocument**: Its used for indexing documentation of projects. By default, the auto
Member

@RichardLitt RichardLitt Aug 7, 2018

It's

**PageDocument**: Its used for indexing documentation of projects. By default, the auto
indexing is turned off by `ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS`.
`settings.ES_PAGE_IGNORE_SIGNALS` is `False` both in development and production.
As mentioned above, our `Search` app listens the `bulk_post_create` and `bulk_post_delete`
Member

@RichardLitt RichardLitt Aug 7, 2018

listens to the

As mentioned above, our `Search` app listens the `bulk_post_create` and `bulk_post_delete`
signals and index/delete documentations into Elasticsearch. The signal listeners are in
the `readthedocs/search/signals.py` file. Both of the signals are dispatched
after successful documentation build.
Member

@RichardLitt RichardLitt Aug 7, 2018

after a successful

Member Author

@safwanrahman safwanrahman commented Aug 8, 2018

@RichardLitt I have fixed all the issues you mentioned. Can you check again?


Installing and running Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You need to install and run the `Elasticsearch 6.3`_ version on your local development machine.
Member

@RichardLitt RichardLitt Aug 8, 2018

...run Elasticsearch version 6.3_ on your...

This might be a bit more fluent.

You need to install and run the `Elasticsearch 6.3`_ version on your local development machine.
You can get the installation instructions
`here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_.
Otherwise, you can also start a Elasticsearch Docker container by running the following command::
Member

@RichardLitt RichardLitt Aug 8, 2018

start an Elasticsearch


Indexing into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^
For using search, you need to index data to Elasticsearch Index. Run `reindex_elasticsearch`
Member

@RichardLitt RichardLitt Aug 8, 2018

data to the Elasticsearch

Auto Indexing
^^^^^^^^^^^^^
By default, Auto Indexing is turned off in development mode. To turn it on, change the
`ELASTICSEARCH_DSL_AUTOSYNC` settings to `True` in `readthedocs/settings/dev.py` file.
Member

@RichardLitt RichardLitt Aug 8, 2018

in the readthedocs... file

------------
The search architecture is devided into 2 parts.
One part is responsible for **indexing** the documents and projects and
other part is responsible to query through the Index for showing proper results to users.
Member

@RichardLitt RichardLitt Aug 8, 2018

the other part is responsible for querying the Index to show the proper results to users.

The search architecture is devided into 2 parts.
One part is responsible for **indexing** the documents and projects and
other part is responsible to query through the Index for showing proper results to users.
We use `django-elasticsearch-dsl`_ package mostly to the keep the search working.
Member

@RichardLitt RichardLitt Aug 8, 2018

We use the


Indexing
^^^^^^^^
All the Sphinx documents are indexed into elasticsearch after the build is successful.
Member

@RichardLitt RichardLitt Aug 8, 2018

Elasticsearch should always be capitalized.

How we index documentations
~~~~~~~~~~~~~~~~~~~~~~~~~~~

After any build gets successfully finished, `HTMLFile` objects are created for each of the
Member

@RichardLitt RichardLitt Aug 8, 2018

any build is successfully finished

~~~~~~~~~~~~~~~~~~~~~~~~~~~

After any build gets successfully finished, `HTMLFile` objects are created for each of the
`HTML` files and the old version's `HTMLFile` object is deleted. Signal_
Member

@RichardLitt RichardLitt Aug 8, 2018

The Signal_

~~~~~~~~~~~~~~~~~~~~~
We also index project information in our search index so that the user can search for projects
from the main site. `django-elasticsearch-dsl`_ listen `post_create` and `post_delete` signals of
`Project` model and index/delte into Elasticsearch accordingly.
Member

@RichardLitt RichardLitt Aug 8, 2018

delete

Elasticsearch Document
~~~~~~~~~~~~~~~~~~~~~~

`elasticsearch-dsl`_ provides model like wrapper for the `Elasticsearch document`_.
Member

@RichardLitt RichardLitt Aug 8, 2018

provides a model-like wrapper


`elasticsearch-dsl`_ provides model like wrapper for the `Elasticsearch document`_.
As per requirements of `django-elasticsearch-dsl`_, it is stored in
`readthedocs/search/documents.py` file.
Member

@RichardLitt RichardLitt Aug 8, 2018

stored in a

Member Author

@safwanrahman safwanrahman Aug 10, 2018

I think it should be the

`readthedocs/search/documents.py` file.

**ProjectDocument:** It is used for indexing projects. Signal listener of
`django-elasticsearch-dsl`_ listen the `post_save` singal of `Project` model and
Member

@RichardLitt RichardLitt Aug 8, 2018

listens to the

indexing is turned off by `ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS`.
`settings.ES_PAGE_IGNORE_SIGNALS` is `False` both in development and production.
As mentioned above, our `Search` app listens to the `bulk_post_create` and `bulk_post_delete`
signals and index/delete documentations into Elasticsearch. The signal listeners are in
Member

@RichardLitt RichardLitt Aug 8, 2018

indexes/deletes documentation into...

Member

@RichardLitt RichardLitt left a comment

Getting there! Sorry about all of these comments.

Member Author

@safwanrahman safwanrahman commented Aug 10, 2018

@RichardLitt I have addressed the issues you mentioned.

Member

@ericholscher ericholscher left a comment

This looks great. We could definitely add a bit more detail in some places, but I think this is a solid start to work from. 👍

For using search, you need to index data to the Elasticsearch Index. Run `reindex_elasticsearch`
management command::

./manage.py reindex_elasticsearch
Member

@ericholscher ericholscher Aug 10, 2018

It's probably worth noting here why we implemented our own version, instead of using the default one provided by django-elasticsearch-dsl.

in `instance_list` parameter.

We listen to the `bulk_post_create` and `bulk_post_delete` signals in our `Search` application and
index/delete the documentation content from the `HTMLFile` instances.
Member

@ericholscher ericholscher Aug 10, 2018

Again, it's probably worth mentioning why we designed it this way, because by default it was doing an HTTP request per object.
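
To make the point above concrete, here is a minimal sketch in plain Python that needs neither Django nor Elasticsearch. `FakeIndexClient`, its `index_one` and `index_bulk` methods, and the `BulkSignal` helper are hypothetical stand-ins invented for this illustration, not the Read the Docs or django-elasticsearch-dsl API. Only the `bulk_post_create` signal name and the `instance_list` parameter come from the docs under review. The sketch compares one request per object with a single request for the whole list.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeIndexClient:
    """Hypothetical stand-in for a search backend; it only counts requests."""
    requests: int = 0
    documents: List[str] = field(default_factory=list)

    def index_one(self, doc: str) -> None:
        # One simulated HTTP request per document.
        self.requests += 1
        self.documents.append(doc)

    def index_bulk(self, docs: List[str]) -> None:
        # One simulated HTTP request for the whole batch.
        self.requests += 1
        self.documents.extend(docs)


class BulkSignal:
    """Tiny illustrative signal: receivers get the full instance list at once."""

    def __init__(self) -> None:
        self._receivers: List[Callable[..., None]] = []

    def connect(self, receiver: Callable[..., None]) -> None:
        self._receivers.append(receiver)

    def send(self, sender: str, instance_list: List[str]) -> None:
        for receiver in self._receivers:
            receiver(sender=sender, instance_list=instance_list)


# Pretend a build produced 100 HTML files.
html_files = [f"page-{i}.html" for i in range(100)]

# Per-object approach: one request for every file.
per_object = FakeIndexClient()
for html_file in html_files:
    per_object.index_one(html_file)

# Bulk approach: the signal is dispatched once with the whole list.
bulk = FakeIndexClient()
bulk_post_create = BulkSignal()


def index_html_files(sender, instance_list, **kwargs):
    """Receiver that indexes every instance in a single request."""
    bulk.index_bulk(list(instance_list))


bulk_post_create.connect(index_html_files)
bulk_post_create.send(sender="HTMLFile", instance_list=html_files)

print(per_object.requests, bulk.requests)
```

Both clients end up holding the same documents; the difference is only in how many requests were needed to get them there.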

the `readthedocs/search/signals.py` file. Both of the signals are dispatched
after a successful documentation build.


Member

@ericholscher ericholscher Aug 10, 2018

It's probably useful to mention how we parse the data from the JSON files to get the actual indexed data.

Member

@RichardLitt RichardLitt commented Aug 10, 2018

@safwanrahman Thanks. :) Sorry for being so persnickety!

Member Author

@safwanrahman safwanrahman commented Aug 10, 2018

@ericholscher updated with fixes. Possible to merge?

@safwanrahman safwanrahman added this to Backlog in Search update via automation Aug 11, 2018
@safwanrahman safwanrahman moved this from Backlog to In progress in Search update Aug 11, 2018
@agjohnson agjohnson added this to the Search improvements milestone Aug 27, 2018
@ericholscher ericholscher merged commit 88ff413 into readthedocs:search_upgrade Sep 7, 2018
1 check passed
Search update automation moved this from In progress to Done Sep 7, 2018
safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Sep 15, 2018