
Order search results by most viewed pages #5968

Open
dojutsu-user opened this issue Jul 20, 2019 · 4 comments

Comments

@dojutsu-user
Member

commented Jul 20, 2019

Currently the ordering of search results doesn't take the number of views of a page into account.
It would be good if search results were ordered by the number of views a page gets.

@dojutsu-user dojutsu-user changed the title Order search results by most viewed pages [Feature] Order search results by most viewed pages Jul 20, 2019

@humitos

Member

commented Jul 22, 2019

Do you have an idea about how to implement this?

I have some questions:

  • where will we store this data? Maybe in the HTMLFile object?
  • this data needs to be considered when doing a full re-index
  • where will this data come from, and how? Is it a Celery task querying the source of data every X minutes?
  • once we have this data, how complicated is it to add it to Elasticsearch so it's considered when sorting results?
@dojutsu-user

Member Author

commented Jul 22, 2019

@humitos
I don't have a definite answer to most of the points yet.

where will we store this data? Maybe in the HTMLFile object?

The HTMLFile object seems to be the right place, but HTMLFile objects get deleted and recreated after every build, so we would lose all the data.

this data needs to be considered when doing a full re-index

We want the data in Elasticsearch, so yes.

where will this data come from, and how? Is it a Celery task querying the source of data every X minutes?

We could use Google Analytics, but I don't know whether it has an API for this.
Or we could count views ourselves: increase the page's count by 1 every time the page loads.
Once we have the data, we can have a Celery task run every 7 days to update the data in Elasticsearch.

once we have this data, how complicated is it to add it to Elasticsearch so it's considered when sorting results?

Once we have the data, I don't think it should be very complicated. I will research this point.
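One possible approach, sketched here under assumptions: if each page document in Elasticsearch carried a numeric view count (the field name `page_views` and the searched fields `title` and `content` are hypothetical), Elasticsearch's `function_score` query with `field_value_factor` could blend that count into the relevance score. A plain `sort` on the view count would ignore textual relevance entirely, which is why a score boost is shown instead.

```python
def build_search_query(text: str, views_field: str = "page_views") -> dict:
    """Build an Elasticsearch request body that boosts matches by view count.

    `views_field` and the searched field names are hypothetical; they must
    match whatever the page index mapping actually defines.
    """
    return {
        "query": {
            "function_score": {
                # Normal full-text relevance is computed first...
                "query": {
                    "multi_match": {"query": text, "fields": ["title^2", "content"]}
                },
                # ...then multiplied by ln(2 + views). The log dampens the effect
                # so very popular pages don't drown out better textual matches,
                # and pages with no recorded views still keep a positive factor.
                "field_value_factor": {
                    "field": views_field,
                    "modifier": "ln2p",
                    "missing": 0,
                },
                "boost_mode": "multiply",
            }
        }
    }
```

The dictionary would be sent as the body of a `_search` request; tuning the `modifier` (or adding a `factor`) controls how strongly popularity affects the ranking.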

@davidfischer Can we somehow use Google Analytics here?

@agjohnson agjohnson changed the title [Feature] Order search results by most viewed pages Order search results by most viewed pages Aug 5, 2019

@dojutsu-user

Member Author

commented Aug 12, 2019

Some simple thoughts on this:

Storing the data

We can store the data in a separate model. We can't store it in the HTMLFile model because those objects get deleted and recreated after every build. We also shouldn't use a ForeignKey, because we don't want the relationships to become null when imported file objects are deleted and recreated.
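To illustrate why a natural key works where a ForeignKey doesn't, here is a minimal in-memory sketch (not a Django model) of a separate count store keyed by project, version, and page path. The `PageKey` and `PageViewStore` names are hypothetical; a real model would hold these three values as plain columns.

```python
from collections import Counter
from typing import NamedTuple


class PageKey(NamedTuple):
    """Natural key for a page (hypothetical fields).

    Because it is built from slugs and a path rather than an HTMLFile primary
    key, it still matches after HTMLFile objects are deleted and recreated.
    """

    project: str
    version: str
    path: str


class PageViewStore:
    """In-memory stand-in for a separate "page views" table."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def increment(self, key: PageKey, amount: int = 1) -> int:
        """Add `amount` views to a page and return its new total."""
        self._counts[key] += amount
        return self._counts[key]

    def get(self, key: PageKey) -> int:
        # Counter returns 0 for unknown keys instead of raising KeyError.
        return self._counts[key]
```

After a rebuild, looking up `PageKey("pip", "latest", "index.html")` (example values) still returns the accumulated count, even though the corresponding HTMLFile row is new.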

Updating the data

I believe we can have an API endpoint for it: send an API request as soon as the page loads, which increases the page's count by one.
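A framework-agnostic sketch of what such an endpoint's handler might do, assuming a JSON body with hypothetical `project`, `version`, and `path` keys. It validates the payload, increments the count, and returns an HTTP status with a response body; wiring it into an actual view and persisting the counts are left out.

```python
from collections import Counter
from typing import Dict, Tuple


def handle_page_view(payload: Dict[str, str], counts: Counter) -> Tuple[int, dict]:
    """Record one page view and return (HTTP status, response body).

    The payload keys are hypothetical; `counts` stands in for the count model.
    """
    required = ("project", "version", "path")
    missing = [field for field in required if not payload.get(field)]
    if missing:
        # Reject incomplete requests instead of counting them under a bad key.
        return 400, {"error": f"missing fields: {', '.join(missing)}"}

    key = (payload["project"], payload["version"], payload["path"])
    counts[key] += 1
    return 200, {"views": counts[key]}
```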

Syncing the data with elasticsearch

Just a query to our new "count model" should be enough to get the data.
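Once the counts have been queried, one way to push them into Elasticsearch is the `_bulk` API with partial `update` actions, which sets the count field on existing page documents without reindexing their content. This is a sketch: the index name `page`, the `project/version/path` document ID scheme, and the `page_views` field are all assumptions that would need to match the real index.

```python
import json
from typing import Dict, Tuple

PageKey = Tuple[str, str, str]  # (project, version, path)


def build_bulk_body(counts: Dict[PageKey, int], index: str = "page") -> str:
    """Turn stored view counts into an Elasticsearch _bulk request body (NDJSON).

    Index name, document ID scheme and field name are hypothetical.
    """
    lines = []
    for (project, version, path), views in sorted(counts.items()):
        doc_id = f"{project}/{version}/{path}"
        # Action line: a partial update of an existing document...
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # ...followed by the fields to merge into that document.
        lines.append(json.dumps({"doc": {"page_views": views}}))
    if not lines:
        return ""  # nothing to send
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

A periodic task could build this body from the count model and POST it to the cluster's `_bulk` endpoint.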

cc: @ericholscher

@ericholscher

Member

commented Aug 12, 2019

I think we should likely update it via the footer API, not with JS. I think it should basically work exactly the same as the Search Analytics, except that we aggregate the counts by day for each (project/version/page) grouping. We should likely do the same for Search Analytics, where we aggregate the count on the model for each day by project, or at least project.
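The per-day aggregation described above can be sketched as collapsing individual hits into one count per (project, version, page, day). The raw-hit tuple shape is a hypothetical stand-in for whatever the footer API would record; the point is that storage grows with pages × days rather than with total views.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# One raw hit: (project, version, path, timestamp). Hypothetical shape.
Hit = Tuple[str, str, str, datetime]


def aggregate_daily(hits: Iterable[Hit]) -> Counter:
    """Collapse page views into one count per (project, version, path, day)."""
    daily: Counter = Counter()
    for project, version, path, ts in hits:
        # Dropping the time of day groups all hits from the same date together.
        daily[(project, version, path, ts.date())] += 1
    return daily
```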

I think the main goals here are how we want to query/display the data. I'm imagining similar to search analytics:

  • List of top viewed pages
  • Each page can have a detail with a # of pageviews per day in a graph

I'm imagining something similar to this: https://github.com/readthedocs/readthedocs.org/graphs/traffic -- but we likely won't track referrer to start.
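Given daily aggregates keyed by (project, version, path, day), both views mentioned above reduce to simple groupings. This sketch works on an in-memory mapping; in practice these would likely be database queries, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List, Tuple

DailyKey = Tuple[str, str, str, date]  # (project, version, path, day)


def top_pages(
    daily: Dict[DailyKey, int], project: str, n: int = 10
) -> List[Tuple[Tuple[str, str], int]]:
    """Total views per (version, path) for one project, most viewed first."""
    totals: Counter = Counter()
    for (proj, version, path, _day), views in daily.items():
        if proj == project:
            totals[(version, path)] += views
    return totals.most_common(n)


def views_per_day(
    daily: Dict[DailyKey, int], project: str, version: str, path: str
) -> List[Tuple[date, int]]:
    """Chronological (day, views) series for one page, e.g. to feed a graph."""
    series: Dict[date, int] = defaultdict(int)
    for (proj, ver, pth, day), views in daily.items():
        if (proj, ver, pth) == (project, version, path):
            series[day] += views
    return sorted(series.items())
```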

3 participants