Search analytics #6019

Merged
merged 52 commits into from Aug 7, 2019

Conversation

Member

@dojutsu-user dojutsu-user commented Jul 31, 2019

Closes #5967
WIP

@dojutsu-user dojutsu-user requested a review from Jul 31, 2019
readthedocs/search/utils.py (outdated, resolved)
Member

@ericholscher ericholscher left a comment

This looks great! I haven't heavily reviewed the graphing. I'd like to keep to the same libraries we're already using, so if you can use the same stuff as the ad code, that would be great. I don't feel strongly though.

readthedocs/search/utils.py (outdated, resolved)
readthedocs/search/utils.py (outdated, resolved)
readthedocs/settings/base.py (outdated, resolved)
_('Query'),
max_length=4092,
)
count = models.PositiveIntegerField(
Member

@ericholscher ericholscher Jul 31, 2019

I'm wondering if we want more data here. Should we be storing an object each time a search happens? That way we can show the frequency of a search over time. Currently, this only tells us how many times a search has happened.

I think if we plan to delete the data every 3 months, we can probably store every search query with its own timestamp. I'm fine with shipping this initially though, before we start storing a lot more data.

Member Author

@dojutsu-user dojutsu-user Aug 1, 2019

I am not sure what you meant by more data.

Storing a search object every time a search is made is a better idea. I realised that the graphs were wrong before; going this way makes them correct and simpler.

That way we can show the frequency of a search over time.

Can you expand on this?

  • Do we want the user to choose this? For example, the user selects a date and we show them the frequency of searches over time for that day.
  • Or do we just show this for today/yesterday?

Member

@ericholscher ericholscher Aug 1, 2019

Just that we will be able to see when a specific search was done each time it was done. The current modeling only shows the number of times a search was done, but no time data about each search.
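
To illustrate the point about per-search timestamps, here is a minimal sketch in plain Python rather than the Django ORM. The `searches` list and the `daily_frequency` helper are hypothetical stand-ins for one stored row per search; they are not part of this PR. With a bare counter per query this breakdown would not be recoverable.

```python
from collections import Counter
from datetime import datetime

# Hypothetical stand-in for "one stored row per search": (query, timestamp).
searches = [
    ("install", datetime(2019, 8, 1, 9, 30)),
    ("install", datetime(2019, 8, 1, 17, 5)),
    ("webhook", datetime(2019, 8, 1, 18, 0)),
    ("install", datetime(2019, 8, 2, 11, 45)),
]


def daily_frequency(records, query):
    """Count how often `query` was searched on each calendar day."""
    counts = Counter(ts.date() for q, ts in records if q == query)
    # Sort by date so the result can be plotted directly as a time series.
    return sorted(counts.items())


install_per_day = daily_frequency(searches, "install")
```

Because every row carries its own timestamp, the same data can also be bucketed by hour or week without changing what is stored.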

@dojutsu-user dojutsu-user requested review from ericholscher and Aug 2, 2019
@dojutsu-user dojutsu-user self-assigned this Aug 2, 2019
Member Author

@dojutsu-user dojutsu-user commented Aug 3, 2019

@ericholscher
I have updated the PR.
I also added a download button that lets project admins download all the data in CSV format.

Screenshot - https://ibb.co/xzX54Ty

@dojutsu-user dojutsu-user changed the title [WIP] Search analytics Search analytics Aug 5, 2019
Member

@ericholscher ericholscher left a comment

Looks good with a few small nits. I'll go ahead and merge this to get the modeling shipped, but we should clean up some of these tidbits.

project_slug
)
# data for plotting the doughnut-chart
distribution_of_top_queries = SearchQuery.generate_distribution_of_top_queries(
Member

@ericholscher ericholscher Aug 7, 2019

I'm a little worried this will be slow in production after we have a lot of data, but we can deal with it then.


response = HttpResponse(content_type='text/csv')
response['Content-Disposition'] = f'attachment; filename="{file_name}"'
template = loader.get_template('projects/search_analytics/csv_data_template.txt')
Member

@ericholscher ericholscher Aug 7, 2019

Why are we writing this with a template instead of a CSV library?

verbose_name=_('Version'),
related_name='search_queries',
on_delete=models.CASCADE,
)
Member

@ericholscher ericholscher Aug 7, 2019

Not sure if we really even want to cascade these deletes. Is there a reason we don't want to store Version here as a string, so we can keep them forever even if a version is deleted?
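
The trade-off raised here can be sketched outside Django. The example below uses SQLite directly (not the Django ORM, and not this PR's schema) to mimic the effect of a cascading foreign key versus a plain string column. The table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE version (id INTEGER PRIMARY KEY, slug TEXT);
    -- Option A: search rows point at the version and are deleted with it.
    CREATE TABLE search_fk (
        query TEXT,
        version_id INTEGER REFERENCES version(id) ON DELETE CASCADE
    );
    -- Option B: search rows keep the version slug as a plain string.
    CREATE TABLE search_str (query TEXT, version_slug TEXT);
    """
)

conn.execute("INSERT INTO version (id, slug) VALUES (1, 'v1.0')")
conn.execute("INSERT INTO search_fk VALUES ('install', 1)")
conn.execute("INSERT INTO search_str VALUES ('install', 'v1.0')")

# Deleting the version removes the linked rows in option A only.
conn.execute("DELETE FROM version WHERE id = 1")

fk_rows = conn.execute("SELECT COUNT(*) FROM search_fk").fetchone()[0]
str_rows = conn.execute("SELECT COUNT(*) FROM search_str").fetchone()[0]
```

After the delete, the foreign-key table is empty while the string table still holds its row, which is the "keep them forever" behaviour the comment asks about.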

.order_by('created_date')
.annotate(count=Count('id'))
.values_list('created_date', 'count')
)
Member

@ericholscher ericholscher Aug 7, 2019

This looks really slow. We will see in prod, hopefully it won't be an issue.



@app.task(queue='web')
def record_search_query(project_slug, version_slug, query, total_results):
Member

@ericholscher ericholscher Aug 7, 2019

Hrm yea, that seems less than ideal. We should probably think more about the right approach for "search as you type" -- probably adding Autocomplete vs. search as you type in some cases.


project_qs = Project.objects.filter(slug=project_slug)

if not project_qs.exists():
Member

@ericholscher ericholscher Aug 7, 2019

This should probably log a warning.
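
A minimal sketch of what that could look like, assuming a module-level logger. `KNOWN_PROJECTS` is a hypothetical stand-in for the `Project.objects.filter(...).exists()` check, and the boolean return value is added only to make the sketch easy to exercise; neither is taken from the PR.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical stand-in for the database lookup of existing project slugs.
KNOWN_PROJECTS = {"docs", "pip"}


def record_search_query(project_slug, version_slug, query, total_results):
    """Return True if the query would be recorded, False if the project is unknown."""
    if project_slug not in KNOWN_PROJECTS:
        # Warn instead of returning silently, so missing projects show up in logs.
        log.warning(
            "Not recording search query: project does not exist. project_slug=%s",
            project_slug,
        )
        return False
    # The real task would store the search row here.
    return True
```

Passing the slug as a logging argument rather than formatting it into the string keeps the message template constant, which makes the warnings easier to group.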

@@ -0,0 +1,3 @@
serial_no,date_time,query
{% for row in data %}{{ forloop.counter }},"{{ row.0|addslashes }}","{{ row.1|addslashes }}"
{% endfor %}
Member

@ericholscher ericholscher Aug 7, 2019

Definitely we should use the csv library for this.
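
For reference, a rough sketch of the same output produced with Python's standard `csv` module. The `build_csv` helper and the sample row are illustrative assumptions, not code from this PR. The template relies on `addslashes`, which backslash-escapes quotes, while CSV readers generally expect embedded quotes to be doubled; `csv.writer` handles that quoting itself.

```python
import csv
import io


def build_csv(rows):
    """Serialize (date_time, query) rows to CSV text with a serial number column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["serial_no", "date_time", "query"])
    for serial_no, (date_time, query) in enumerate(rows, start=1):
        # Fields containing commas or quotes are quoted and escaped by the writer.
        writer.writerow([serial_no, date_time, query])
    return buffer.getvalue()


sample = build_csv([("2019-08-01 09:30:00", 'say "hi", world')])
```

In a Django view the writer can be pointed at the `HttpResponse` directly, since the response object is file-like, so no intermediate buffer or template is needed.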

@ericholscher ericholscher merged commit f9f6c53 into readthedocs:master Aug 7, 2019
1 check passed
@dojutsu-user dojutsu-user deleted the search-analytics branch Aug 7, 2019
@dojutsu-user dojutsu-user restored the search-analytics branch Aug 7, 2019
@dojutsu-user dojutsu-user deleted the search-analytics branch Aug 8, 2019
@dojutsu-user dojutsu-user added this to In progress in In-doc search UI via automation Aug 9, 2019
@dojutsu-user dojutsu-user moved this from In progress to Done in In-doc search UI Aug 9, 2019
2 participants