Dataverse URL - Page Not Found (404 Error) w/ Trailing Forward Slash #3130

Closed
mheppler opened this issue May 18, 2016 · 7 comments
@mheppler
Contributor

mheppler commented May 18, 2016

Came across this minor navigation issue recently. When there is a trailing forward slash at the end of a dataverse URL, you get a 404/Page Not Found error.

Good: https://dataverse.harvard.edu/dataverse/mra

Bad: https://dataverse.harvard.edu/dataverse/mra/

Other sites that I've tested this on forward you from one form of the URL to the other. I found a related post on the PrettyFaces support forum with a potential solution.

@pdurbin
Member

pdurbin commented Mar 6, 2019

Related #4196

@djbrooke removed the "Feature: Search/Browse" and "UX & UI: Design" labels Mar 6, 2019
@landreev
Contributor

landreev commented Mar 7, 2019

This is an honest 1, for once.
This is solved by adding this extra pretty-faces mapping:

    <url-mapping id="dataverseslash">
        <pattern value="/dataverse/#{alias}/" />
        <view-id value="/dataverse.xhtml" />
    </url-mapping>

(this is in src/main/webapp/WEB-INF/pretty-config.xml)
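
To see why the extra mapping is needed, here is a rough Python sketch of the matching. It rests on two assumptions that are not stated in this thread: that the existing mapping uses the pattern "/dataverse/#{alias}", and that a PrettyFaces path parameter such as #{alias} matches one or more non-slash characters. The helper names (pattern_to_regex, matches) are made up for illustration; this is not how PrettyFaces is implemented.

```python
import re

# Assumption: "#{name}" behaves like "one or more characters that are not a slash".
PARAM = re.compile(r"#\{(\w+)\}")


def pattern_to_regex(pattern):
    """Translate a pretty-config style pattern into an anchored regex (illustrative only)."""
    parts = []
    pos = 0
    for m in PARAM.finditer(pattern):
        # Literal text before the parameter is matched verbatim.
        parts.append(re.escape(pattern[pos:m.start()]))
        # The parameter itself becomes a named group that stops at the next slash.
        parts.append(r"(?P<%s>[^/]+)" % m.group(1))
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def matches(pattern, path):
    return pattern_to_regex(pattern).match(path) is not None


original = "/dataverse/#{alias}"   # assumed existing mapping
added = "/dataverse/#{alias}/"     # the mapping from this comment

print(matches(original, "/dataverse/mra"))   # bare URL against the assumed existing mapping
print(matches(original, "/dataverse/mra/"))  # trailing slash against the assumed existing mapping
print(matches(added, "/dataverse/mra/"))     # trailing slash against the added mapping
```

Under those assumptions, the trailing-slash URL falls through the existing pattern and only the added one picks it up, which would explain the 404.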

@pdurbin
Member

pdurbin commented Mar 7, 2019

@landreev cool, should we fix #4196 with prettyfaces too? That one's about a trailing parenthesis, a ")".

@landreev
Contributor

landreev commented Mar 7, 2019

I don't think we should encourage people to use invalid URLs. A trailing forward slash kind of makes sense at the end of a URL; a single mismatched parenthesis? Not really.
I feel like a better error message would be a correct solution for #4196.

@mheppler
Contributor Author

mheppler commented Mar 11, 2019

Added mapping for dataverse url with trailing slash.

This now allows the user to navigate to both /dataverse/alias and /dataverse/alias/ URLs, and both return a 200 OK status code.

There was some concern about duplicate content, but that was alleviated by @landreev who claimed his work in robots.txt will block this URL format. My assumption when I created this issue was that we would forward from one format to another, not support both.

Here is what the Google Webmaster Central Blog said on Wednesday, April 21, 2010, in the "To slash or not to slash" post:

If both slash and non-trailing-slash versions contain the same content and each returns 200, you can:

  • Consider changing this behavior (more info below) to reduce duplicate content and improve crawl efficiency.
  • Leave it as-is. Many sites have duplicate content. Our indexing process often handles this case for webmasters and users. While it’s not totally optimal behavior, it’s perfectly legitimate and a-okay. :)
  • Rest assured that for your root URL specifically, http://example.com is equivalent to http://example.com/ and can’t be redirected even if you’re Chuck Norris.
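
For comparison, the "forward from one format to the other" behavior I originally had in mind could be sketched roughly as below. This is not what was merged (the merged mapping serves both URLs with a 200); the function name canonical_redirect and the choice of a 301 are assumptions for illustration only.

```python
def canonical_redirect(path):
    """Return (status, location) for a request path.

    A non-root path ending in "/" gets a 301 to the slash-free form;
    anything else is served as-is, signalled here by (200, None).
    """
    if len(path) > 1 and path.endswith("/"):
        # Strip every trailing slash; fall back to "/" if nothing is left.
        return 301, path.rstrip("/") or "/"
    return 200, None


for p in ("/dataverse/mra", "/dataverse/mra/", "/"):
    print(p, canonical_redirect(p))
```

The root path is left alone, in line with the last bullet of the quoted post.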

@landreev
Contributor

landreev commented Mar 11, 2019

Correct - I don't think duplicate indexed content is a problem in our case. Googlebot will only index "/dataverse/foo" and "/dataverse/foo/" separately if it has some way to discover both URLs, and I don't think it should ever be possible to reach "/dataverse/foo/" by crawling the site. Plus, our current approach is to specifically discourage the bots from crawling the site (i.e., from following the search and facet links); instead we want them to go straight to the pages we advertise via the sitemap. The sitemap never uses the "/dataverse/name/" notation - only "/dataverse/name".
(It was still nice to fix this issue, for the benefit of the human users)

@landreev
Contributor

(The only times I saw Googlebot try to access "/dataverse/name/" were when I specifically sent it there, via their search console.)

@sekmiller removed their assignment Mar 12, 2019
@kcondon added a commit that referenced this issue Mar 12, 2019: Added mapping for dataverse url with trailing slash [ref #3130]
@kcondon closed this as completed Mar 12, 2019
@djbrooke added this to the 4.12 milestone Mar 13, 2019