Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record search queries smartly #6088

Merged

Conversation

@dojutsu-user
Copy link
Member

commented Aug 21, 2019

This PR refactor the recording mechanism for better support of "search as you type" feature.
Also, It makes all the queries lowercase so as to have no difference between words like Build and build

@dojutsu-user dojutsu-user changed the title Record queries smartly Record search queries smartly Aug 21, 2019

@dojutsu-user dojutsu-user added this to In progress in In-doc search UI via automation Aug 21, 2019

@dojutsu-user

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

Currently, it reduces the total partial queries -- not all but better than storing every partial query.

project=project,
version=version,
query=query,
)
obj.created = time

This comment has been minimized.

Copy link
@stsewd

stsewd Aug 21, 2019

Member

Shouldn't this be an auto field?

This comment has been minimized.

Copy link
@dojutsu-user

dojutsu-user Aug 21, 2019

Author Member

Yes... but the actual time is when the query was made.
Celery task are not executed immediately so the automated time will be wrong.

This comment has been minimized.

Copy link
@humitos

humitos Aug 21, 2019

Member

I'd say that's probably OK to use the auto field for this.

This comment has been minimized.

This comment has been minimized.

Copy link
@ericholscher

ericholscher Sep 5, 2019

Member

It should be querying the modifed field, and using that -- it will automatically update whenever it is saved: https://github.com/django-extensions/django-extensions/blob/master/django_extensions/db/models.py#L19 -- we should likely use that, instead of trying to change the created date.

@stsewd

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

Also, what about just saving the query if it is greater than 3 characters?

@dojutsu-user

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

@stsewd

Also, what about just saving the query if it is greater than 3 characters?

This will fail in many cases, like: readt, readthe, readthedo (for readthedocs)

@stsewd

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

Why would that fail?

We are start executing the same logic from readt. I'm not saying to remove the logic from this PR

@dojutsu-user

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

@stsewd
Ohhh... Okay
I get it.
Yeahh... that's a nice improvement.
I will update the PR.

@dojutsu-user

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

This is not ready for a review. It is too far from being OK.
Results are not good currently.

readthedocs/search/tasks.py Outdated Show resolved Hide resolved
version__slug=version_slug,
query__startswith=query,
created__gte=before_10_sec,
).order_by('-created')

This comment has been minimized.

Copy link
@davidfischer

davidfischer Aug 21, 2019

Contributor

I wonder about the performance characteristics of this. I don't believe created is indexed although project and version are. This might be worth testing on the live system before merging. This might be ok but I'm not 100% confident without just timing a similar query in prod.

readthedocs/search/tasks.py Outdated Show resolved Hide resolved

# record the search query with a celery task
tasks.record_search_query.delay(
project_slug,
version_slug,
query,
total_results,
time,

This comment has been minimized.

Copy link
@davidfischer

davidfischer Aug 21, 2019

Contributor

Depending on how tasks are serialized, this may not work exactly as planned. I know celery supports Pickle and JSON and I'm not sure which one we are using. Datetimes are not JSON serializable by default although Django does have a way to do it. However, I'm not sure they come back as timezone aware datetimes after they do. Did you verify this?

This comment has been minimized.

Copy link
@dojutsu-user

dojutsu-user Aug 23, 2019

Author Member

During local development, celery uses json serializer.

CELERY_RESULT_SERIALIZER = 'json'

(Docs: https://docs.celeryproject.org/en/latest/userguide/configuration.html?highlight=CELERY_RESULT_SERIALIZER#new-lowercase-settings)

they come back as timezone aware datetimes after they do

I am not sure how to verify this.
I did print the timezone.now() and the SQL query -- that looks pretty similar.

timezone.now(): 2019-08-23 08:45:42.841161+00:00
SQL Query:

SELECT "search_searchquery"."id", "search_searchquery"."created", "search_searchquery"."modified", "search_searchquery"."project_id", "search_searchquery"."version_id", "search_searchquery"."query"
FROM "search_searchquery"
INNER JOIN "projects_project"
ON ("search_searchquery"."project_id" = "projects_project"."id")
INNER JOIN "builds_version"
ON ("search_searchquery"."version_id" = "builds_version"."id")
WHERE ("projects_project"."slug" = rtd AND "builds_version"."slug" = latest AND "search_searchquery"."query" LIKE read% ESCAPE '\' AND "search_searchquery"."created" >= 2019-08-23 08:45:32.841161)
ORDER BY "search_searchquery"."created" DESC
return

# don't record query with zero results.
if not total_results:

This comment has been minimized.

Copy link
@davidfischer

davidfischer Aug 21, 2019

Contributor

Is there a reason this block isn't at the top of the function? I guess you could update the time on another query even if this one returns no results. However, that probably warrants a comment or something.

This comment has been minimized.

Copy link
@dojutsu-user

dojutsu-user Aug 23, 2019

Author Member

Is there a reason this block isn't at the top of the function?

Yes. I wanted to update the timing and query even if no results are there. For example: readthed will return no results but I wanted to update the timing and query if there is a previous query of readth.

I guess you could update the time on another query even if this one returns no results. However, that probably warrants a comment or something.

I am not sure if I understood.

@ericholscher

This comment has been minimized.

Copy link
Member

commented Aug 27, 2019

@dojutsu-user what is the status here? Is it working locally, or still needs more work?

@dojutsu-user

This comment has been minimized.

Copy link
Member Author

commented Aug 28, 2019

@ericholscher
I would really like to have some help here.
The logic and everything seems correct but I it is not working as expected.

@ericholscher

This comment has been minimized.

Copy link
Member

commented on readthedocs/search/tasks.py in 9200121 Aug 28, 2019

This should likely be setting an updated time, not created, no?

This comment has been minimized.

Copy link
Member Author

replied Aug 28, 2019

I don't quite get it. Did you mean modified_time?

@ericholscher
Copy link
Member

left a comment

Think this looks good, but we should definitely be updating the modified time, not the created time :)

project=project,
version=version,
query=query,
)
obj.created = time

This comment has been minimized.

Copy link
@ericholscher

ericholscher Sep 5, 2019

Member

It should be querying the modifed field, and using that -- it will automatically update whenever it is saved: https://github.com/django-extensions/django-extensions/blob/master/django_extensions/db/models.py#L19 -- we should likely use that, instead of trying to change the created date.

@ericholscher

This comment has been minimized.

Copy link
Member

commented Sep 5, 2019

Gonna merge this so it goes out on the next deploy, but would like to do some small updates to this over time.

@ericholscher ericholscher merged commit a1f3b05 into readthedocs:master Sep 5, 2019

2 checks passed

continuous-documentation/read-the-docs Read the Docs build succeeded!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@dojutsu-user dojutsu-user deleted the dojutsu-user:recording-query-smartly branch Sep 6, 2019

@dojutsu-user dojutsu-user moved this from In progress to Done in In-doc search UI Sep 6, 2019

@humitos

This comment has been minimized.

Copy link
Member

commented Sep 9, 2019

Think this looks good, but we should definitely be updating the modified time, not the created time :)

@dojutsu-user can you create an issue to track this?

@dojutsu-user

This comment has been minimized.

Copy link
Member Author

commented Sep 9, 2019

@humitos
Sure 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants
You can’t perform that action at this time.