Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show suggestion on the 404 page #678

Merged
merged 11 commits into from
Mar 26, 2014

Conversation

maskit
Copy link
Contributor

@maskit maskit commented Feb 16, 2014

Show suggestion when users land on the 404 page.
What to show are commented in the source code.

@toffer
Copy link
Member

toffer commented Feb 16, 2014

This feature would be really cool, and I really like how you've enumerated all the different cases, and figured out what type of message to show for each one. I can see that you've put a lot of work into this.

However, there's a lot of duplicated work here. Inside your new server_error_404(), you first reconstruct the path, then run regexes against it to pull out all the slugs, and use the slugs to query the database for project and versions. But, that work has already been done by Django's URL dispatching (getting the slugs) and the core.views.serve_docs() function (querying for project and versions).

So, how can we use the work that's already been done?

Here a different approach:

  1. Create a new 404 handler (helpful_404?) that just handles this logic, and leave the generic 404 to handle all other cases.

  2. In serve_docs(), don't use get_object_or_404(), since that calls the generic 404 handler. Instead, specifically call the helpful_404 handler if something's amiss, and pass in the project and version objects that you've already gotten from the DB.

    In fact, it may be better to have multiple helpful_404 type handlers, one for each type of error (for example, project exists, but version doesn't, would be one case).

Lastly, as you demonstrate, there are a lot of cases to consider, so it would be great to include some tests to show that they work as expected.

Please don't consider any of the above to be the final word! I'm just throwing out my ideas in the interest of starting a discussion to figure out the best way to make this feature work.

@maskit
Copy link
Contributor Author

maskit commented Feb 17, 2014

Thank you for your comment.

However, there's a lot of duplicated work here.

I Agree. They should have been done somewhere else. I tried getting slugs somehow not from request.path, because I thought it had already been done when the error handler was called. But at least, /user_builds/... is not in URL pattern so I had to do that. I should have added the pattern into urls.py (may be) but I've implemented it in a quick way because I'm new to Django.

Lastly, as you demonstrate, there are a lot of cases to consider, so it would be great to include some tests to show that they work as expected.

Absolutely. I'll add tests before refining the implementation. It might take some time to learn how to use Django's test framework.

Also, your approach makes sense. It seems good to me. Current implementation is rough so any ideas are welcome, and I'm sure it needs to be refine. But if there is no big problem, I think done is better than perfect. Once it has been merged, anybody can refine it. (I will try, of course, but it's slower than Django users do.)

@maskit
Copy link
Contributor Author

maskit commented Feb 17, 2014

Please take another look.
I think it's good enough. But any feedbacks are welcome.

@toffer
Copy link
Member

toffer commented Feb 19, 2014

Thanks for updating so quickly...and especially, for including tests!

I'll take a look at this tomorrow.

@toffer
Copy link
Member

toffer commented Feb 20, 2014

I think this is a good start, but I don't think it's quite ready for merging. The goal is to display helpful links on the 404 page, so users can find the content they are looking for. But, as it works now, there are a number of cases where the links presented are for invalid URLs that link to another 404 page.

Here are a few examples:

  1. Broken link when version is nonexistent

https://pip.readthedocs.org/en/nonexistent/ -> displays 'latest'
click on 'latest' -> https://pip.readthedocs.org/en/latest/.html (404)
2. Broken link when version and file don't exist.

https://pip.readthedocs.org/en/nonexistent/foo.html -> displays 'latest'
click on 'latest' -> https://pip.readthedocs.org/en/latest/foo.html (404)
3. Broken link when language and file don't exist

https://pip.readthedocs.org/fr/latest/foo.html -> displays 'en'
click on 'en' -> https://pip.readthedocs.org/en/latest/foo.html (404)

It's a real challenge to match filenames, since they aren't stored in the DB and so the Django app has no knowledge of what HTML files exist for each set of documentation. So, if you are searching for 'install.html' for a particular version of a project and it doesn't exist, what's the best way to show a list of versions where 'install.html' does exist? I'm honestly not sure.

Is the list of files easily accessible from Elasticsearch? Could we use that? Do we scan the filesystem before generating the 404 to compile the list of files? (That sounds bad...) Do we store the list of files in the DB somehow? (Keeping the DB and filesystem in sync sounds bad...)

I'm open to suggestions.

Lastly, I should note that the text of the error messages and the template would need some refining before launching, but that's easily doable once the feature works well.

@maskit
Copy link
Contributor Author

maskit commented Feb 23, 2014

Yes, I recognize the issues and it would be nice if they are solved.
However, I'm going to look into example No.1 but others are not my concerns for now. Because my motivation for this change is making language switching better (#661). No.2 and No.3 are not so good behavior but it navigates users to https://pip.readthedocs.org/en/latest/ eventually.

Also, nobody can add tests for cases you've found and refine text of the error messages until this change is merged. That means I need to do all things discussed here. I'm not going to do that, especially finding files things.

So my suggestion is that merge this change to some branch and keep refining on RTD's repository, and discuss for improvement on new issue.

IMO, making language switching better is more useful for users than keep discussing and making helpful 404 page better.

@maskit
Copy link
Contributor Author

maskit commented Feb 23, 2014

I mean, finding files feature will take much time. I have no idea to achieve this.

@toffer
Copy link
Member

toffer commented Feb 25, 2014

@maskit Thanks for hanging in there and continuing to work on the PR. I like your idea to pull the branch into the RTD repository, and then let's see if we can merge some of it piecemeal. Hopefully, we can get a basic helpful 404 message for non-existent languages and versions first, and try to tackle the filename matching as a second step.

Do you want to tackle the 1st issue above before I create the branch on the RTD repo?

@toffer
Copy link
Member

toffer commented Feb 25, 2014

Ping @ericholscher @robhudson.

I've got a couple of elasticsearch questions...

Is there a way to query for a list of HTML documentation filenames for a given project? If someone navigated to /en/latest/install.html, but it didn't exist, is there a way to query elasticsearch for a list of languages/versions where the install.html file did exist?

If this query isn't currently possible, given what we are indexing, is this a type of problem for which elasticsearch would be a good solution? If so, how would we do it?

Lastly, I'm noticing other situations where having a list of documentation filenames for each project/version would be handy. For instance, generating a sitemap (issue #557) would require knowing every doc filename for a project.

My other ideas for coming up a list of filenames are either to store the filenames in the DB, or to walk the filesystem. Perhaps, elasticsearch is a better option?

Your thoughts on the best way to handle this?

@ericholscher
Copy link
Member

We have the language. version. and page indexed. So all we need to do is parse that out of that URL, which we should easily be able to do.

The filesystem is the canonical place for that data though, so depending on the use case it seems like we should look at the file system. Definitely for the site map generation, as it should be part of the build. We can also guarantee that a page will not 404 if it exists on the file system, where as keeping it in sink in elastic search will be harder.

On Feb 25, 2014, at 1:18 PM, Tom Offermann notifications@github.com wrote:

Ping @ericholscher @robhudson.

I've got a couple of elasticsearch questions...

Is there a way to query for a list of HTML documentation filenames for a given project? If someone navigated to /en/latest/install.html, but it didn't exist, is there a way to query elasticsearch for a list of languages/versions where the install.html file did exist?

If this query isn't currently possible, given what we are indexing, is this a type of problem for which elasticsearch would be a good solution? If so, how would we do it?

Lastly, I'm noticing other situations where having a list of documentation filenames for each project/version would be handy. For instance, generating a sitemap (issue #557) would require knowing every doc filename for a project.

My other ideas for coming up a list of filenames are either to store the filenames in the DB, or to walk the filesystem. Perhaps, elasticsearch is a better option?

Your thoughts on the best way to handle this?


Reply to this email directly or view it on GitHub.

@toffer
Copy link
Member

toffer commented Feb 25, 2014

When you say the language. version. and page are "indexed", you mean in elasticsearch, right?

@ericholscher
Copy link
Member

Yep. It is definitely doable with ES. I'm on my phone at the moment though, so I can't say much more until later.

On Feb 25, 2014, at 1:45 PM, Tom Offermann notifications@github.com wrote:

When you say the language. version. and page are "indexed", you mean in elasticsearch, right?


Reply to this email directly or view it on GitHub.

@maskit
Copy link
Contributor Author

maskit commented Feb 25, 2014

@toffer Thank you for your understanding.

Hopefully, we can get a basic helpful 404 message for non-existent languages and versions first, and try to tackle the filename matching as a second step.

Yes, that is what I wanted to say. There is no need to do them in one step.

Do you want to tackle the 1st issue above before I create the branch on the RTD repo?

Does "the 1st issue" mean https://pip.readthedocs.org/en/latest/.html ? If so, I will tackle it in this weekend.

@toffer
Copy link
Member

toffer commented Feb 25, 2014

Does "the 1st issue" mean https://pip.readthedocs.org/en/latest/.html ? If so, I will tackle it in this weekend.

Yes. That's what I meant.

If so, I will tackle it in this weekend.

Perfect.

@maskit
Copy link
Contributor Author

maskit commented Mar 1, 2014

I figured out the 1st issue above. Actually it's not a helpful 404's issue but doc_url tag's. I added tests for that.
Also I added support for singlehtml to doc_url tag.

@toffer
Copy link
Member

toffer commented Mar 3, 2014

Thanks for the update. I'll take a look at this in the next few days.

@ericholscher ericholscher merged commit 090d59f into readthedocs:master Mar 26, 2014
@ericholscher
Copy link
Member

Went ahead and merged this, because it is an improvement over what we currently have. We should open a new ticket about improvements.

@wolph wolph mentioned this pull request Dec 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants