New component router can't parse old URLs #14848

Open
Bakual opened this Issue Mar 21, 2017 · 69 comments

@Bakual
Contributor

Bakual commented Mar 21, 2017

Steps to reproduce the issue

  • Install Joomla with testing data. Leave "Modern/Experimental" Router disabled.
  • Open the "Article Category List" menu item (/article-category-list.html). Leave that page open.
  • In a second tab go to the com_content Options in backend and set the "URL Routing" parameter (in "Integration" tab) to "Experimental".
  • Go back to the first tab with the open "Article Category List" and try some links there (open them in new tabs). Some of those "legacy" links will work, some (eg "Getting Started") will give you a 404.

Expected result

All links should still be parsed

Actual result

Only the "correct" links according to new router are parsed, the others are discarded.

System information (as much as possible)

Staging from 2017-03-21

Additional comments

With the legacy router, the example article link "Getting Started" is currently generated as /getting-started/19-sample-data-articles/joomla/22-getting-started.html. This link is actually wrong and should be /getting-started.html, since it is a direct match with a menu item. Our current code gets that wrong and the new router would get it right.
However, our current code is able to parse both the wrong and the correct URL just fine and gives the expected page.
The new router will break those "wrong" links, which is, IMHO, quite a major B/C break with a big impact on search engines and incoming links in general.
You could argue that it's an option the admin has to enable, but that is only half of the truth. With 4.0, there will be no such option anymore and we will face a lot of broken links after either enabling the router option or upgrading to 4.0. Both of these are unacceptable, I think. There is also no real migration path.

Now what I would expect is the following:

  • New Router is always enabled
  • If a URL can't be parsed by the new router, there is some fallback code which tries the legacy parsing rules
  • If legacy can parse the URL successfully, the wrong URL will be added to com_redirect with the new correct URL as target and a 301 redirect will be executed. This way, we don't lose any visitors and search engines will update their links.
  • If legacy can't parse it as well, a 404 is issued of course.
  • With 4.0, we can drop the fallback code if needed (personally I would give more time) or make it optional. Admins can then choose which "old" URLs they still want to be working by simply checking in com_redirect the existing redirects.
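
The flow proposed in the list above could be sketched roughly as follows. This is a language-neutral illustration in Python, not Joomla code: `resolve`, the two parser callables, `build_url` and the `redirects` dictionary are all hypothetical stand-ins for the new router, the legacy rules, URL building and the com_redirect table.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in: a parser maps a URL path to a content key, or None.
Parser = Callable[[str], Optional[str]]


def resolve(path: str,
            new_parse: Parser,
            legacy_parse: Parser,
            build_url: Callable[[str], str],
            redirects: Dict[str, str]) -> Tuple[int, str]:
    """Return (status, content key or redirect location) for a request path."""
    # A stored redirect wins, so the legacy fallback runs only once per URL.
    if path in redirects:
        return 301, redirects[path]

    # 1. Strict parsing with the new router.
    target = new_parse(path)
    if target is not None:
        return 200, target

    # 2. Fallback: try the legacy parsing rules.
    target = legacy_parse(path)
    if target is None:
        return 404, path

    # 3. Legacy matched: remember the redirect and send the visitor on.
    canonical = build_url(target)
    redirects[path] = canonical
    return 301, canonical
```

Because the redirect is stored the first time the legacy rules match, the recorded entries would keep working even after the fallback itself is removed, which is the migration path the proposal is after.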

This way, we would have 100% B/C plus an easy migration path without losing any incoming traffic.

Without that, I think we will face our next big Joomla drama when site owners realise the fancy new router will break some of their external links and Google Webmastertools starts listing a lot of 404.

@brianteeman

Contributor

brianteeman commented Mar 21, 2017

Isn't that why the new router is not the default and has a warning message? I would say that this is expected behaviour.

@Bakual

Contributor Author

Bakual commented Mar 22, 2017

It's a known issue and apparently expected behavior by the dev. But not by the user. And it certainly doesn't mean it is the correct thing to do.

Also, with 4.0 there will be no option and no warning anymore.
Thus the admin has no real choice. He will have to break the links sooner or later and we don't help him find out which links that may be.

Seriously? That's our plan and expected behaviour? We can do better than that for sure.

@chrisdavenport

Contributor

chrisdavenport commented Mar 22, 2017

For reference, from https://developer.joomla.org/development-strategy.html#backward_compatibility

6.1.8 URLs

Any change to a URL that will give a 404 (or some other error) where it previously gave a 200 is a break in backwards compatibility. However, if the change results in a redirect to a new URL (which gives a 200) then that is acceptable.

In general, if a URL is changed then provided the new URL delivers the exact same resource rendered in the same way then that is not considered to be a break in backwards compatibility. For example, changing the order of the arguments in the query part of a URL is not considered to be a break.
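
Read literally, the rule quoted above can be written as a small predicate. This is only a sketch of one reading of the policy text: `is_bc_break` and its parameters are hypothetical names, and it does not model the second paragraph's requirement that the new URL deliver the same resource rendered in the same way.

```python
from typing import Optional

REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_bc_break(old_status: int,
                new_status: int,
                redirect_target_status: Optional[int] = None) -> bool:
    """True if a URL change counts as a B/C break under the quoted rule."""
    # The rule only covers URLs that previously gave a 200.
    if old_status != 200:
        return False
    # Still a 200: no break.
    if new_status == 200:
        return False
    # A redirect is acceptable only if the new URL itself gives a 200.
    if new_status in REDIRECT_CODES:
        return redirect_target_status != 200
    # Anything else (a 404 or some other error) is a break.
    return True
```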

@dgrammatiko

Contributor

dgrammatiko commented Mar 22, 2017

Any change to a URL that will give a 404 (or some other error) where it previously gave a 200 is a break in backwards compatibility

Not if the old URL was falsely 200, e.g. @Bakual's example in the description. That's not a valid URL; the current router returns something valid which IS wrong! This wrong behaviour CANNOT be supported. That was one of the goals of the new router: to be a lot stricter than the loose one we currently have!
My 2c

@Bakual

Contributor Author

Bakual commented Mar 22, 2017

Not if the old URL was falsely 200, e.g. @Bakual's example in the description. That's not a valid URL; the current router returns something valid which IS wrong!

That statement isn't true. It wasn't falsely a 200. It was a valid URL generated by our current router and is correctly parsed and gives the expected result. So it is not the URL it should have generated but it is a valid URL.
The current router doesn't return "something". It returns the correct and expected page.

This wrong behaviour CANNOT be supported, that was one of the goals of the new router: to be a lot stricter than the loose one we currently have!

I can live with that as the end goal (although I think it's stupid since site owners prefer visitors and not 404s), but I don't agree with doing that without any possibility for site owners to mitigate the effects of it.

@dgrammatiko

Contributor

dgrammatiko commented Mar 22, 2017

Educate people and then they will be fine. Tell them to create a sitemap of the old site and another one when they upgrade to the new system. Then explain to them how to connect the dots (map the old links to the new).
The tools are widely available...

Similar to this problem is the UX improvement task of the back end. If we really want to improve (and not change some colours or some paddings) then we will end up with different workflows (that end users can't even imagine, therefore user surveys are useless).
But then again I might be wrong on both, time will tell...

@Bakual

Contributor Author

Bakual commented Mar 22, 2017

Educate people and then they will be fine. Tell them to create a sitemap of the old site and another one when they upgrade to the new system. Then explain to them how to connect the dots (map the old links to the new).
The tools are widely available...

Seriously??!! That's the recommended solution? Wow...

Backend is another topic. Changing workflows is fine if it is an improvement. That's not similar at all.

@brianteeman

Contributor

brianteeman commented Mar 22, 2017

Have to agree that your suggestion is not a solution at all. It might be just about OK on a site with just a few pages (although that site probably won't be affected anyway), but it's completely impractical to suggest doing that on a site with even a few hundred pages, never mind one with thousands.

@dgrammatiko

Contributor

dgrammatiko commented Mar 22, 2017

@brianteeman I'm guessing here that anyone that wants to move to the new router (is not forced to do so) understands the impact of that change.

@Bakual

Contributor Author

Bakual commented Mar 22, 2017

(is not forced to do so)

We will be forcing it with 4.0. It's not optional at all.

@mbabker

Member

mbabker commented Mar 22, 2017

Any plan which mandates that the current broken URLs accepted by the routing system keep working is, in my eyes, not a valid plan. By that logic I can craft the URL of https://www.joomla.org/announcements/6-joomla-leadership-team.html which results in a 200 response, gives me exactly the body content that I'm looking for (even if it is now wrapped in the incorrect category/menu configuration), and therefore by your argument must continue to work or automagically redirect. Even funnier is this isn't a URL that will ever get generated within the Joomla application but if you know anything about how wonky the current router is you know exactly how to craft URLs in such a way to get mixed pages like this which just work.

Sooner or later we have to cut the technical debt and we have to address some of the underlying issues users have with the routing system. One of the most frequent groans is people manage to get "duplicate content" because there are a plethora of URLs you can use to get to a page if you know what you're doing (https://www.joomla.org/component/content/category/6-joomla-leadership-team.html is another perfectly valid mutation of the leadership page but again wrapped with the wrong menu data). We need to stop having a system that allows you to mutate the URL structure and land on a valid page, this system moves in that direction.

Yes, it does mean that users will require additional education and additional work to validate their links. Yes, I get this is not optimal user experience. But short of always supporting routing what are very obviously FUBAR URLs to the right content within our code, there is no fix for that.

@brianteeman

Contributor

brianteeman commented Mar 22, 2017

for the record i have absolutely no issue with making urls that "work" today but cannot be "generated" by Joomla no longer work

@Bakual

Contributor Author

Bakual commented Mar 22, 2017

@mbabker Michael, I'm not saying to keep the old URLs working forever. I just want to have a way site admins realistically can redirect the old URLs to the new ones without having to manually add all of them.

One of the most frequent groans is people manage to get "duplicate content" because there are a plethora of URLs you can use to get to a page

That's actually a misunderstanding from the people about what "duplicate content" is. Google has no issue with multiple links pointing to the same content as long as it's on the same domain.
But as said, it's fine for me to get there where only one valid URL exists for a given page. I just don't agree on the path which is currently taken to get there (because there is no path).

But short of always supporting routing what are very obviously FUBAR URLs to the right content within our code, there is no fix for that.

There is a way to temporarily keep supporting the "FUBAR URLs", collecting them and leaving it to the admin to decide which to drop and which to keep after the legacy support has been dropped (e.g. in 4.0).

@peteruoi

peteruoi commented Mar 22, 2017

As I see it there is no problem with Joomla 3.7 as the new router is optional.
Can an official Joomla link migrator be developed for Joomla 4? I'm certainly no expert to say if it is possible with our current router mistakes, but if it is possible, could a link migrator automatically create 301 redirects with our redirect component?

@mbabker

Member

mbabker commented Mar 22, 2017

They could not be collected and an admin be told there are URLs not valid with the new system. That's not how it works and trying to do that WOULD be a B/C break. To use com_redirect in that way would require throwing a 404 on what is currently a URL responding with a 200. Or you are suggesting to just automagically dump all valid legacy URLs into com_redirect with zero notification to anyone (which would be a massive change in behavior and user expectation because right now the component only collects 404 URLs or has items that are manually input).

@Bakual

Contributor Author

Bakual commented Mar 22, 2017

but if it is possible, could a link migrator automatically create 301 redirects with our component redirect???

That's what I suggested in the initial issue description. But done in 3.7 and not in 4.0.

which would be a massive change in behavior and user expectation because right now the component only collects 404 URLs or has items that are manually input

Yes, it would be a change in behaviour since we collect the 404s before they happen, at a time where we actually still could say what the correct target is.
If you see an issue with that, make it optional. I don't see that as an issue.

@mbabker

Member

mbabker commented Mar 22, 2017

The link migrator can't be done. Because there isn't a master list of all the URLs a site is accepting anywhere thanks to the glorious FUBAR behaviors of the current router, which as demonstrated allows you to mutate URLs (or in some cases will build them itself because of the oh so glorious FUBAR routing system) which results in "expected" content being displayed incorrectly. So a collection of bad URLs can only be compiled at runtime. Which by system behavior means that the URLs must 404 before they will automatically be collected into the redirect component or we will be introducing new black magic behaviors into a component and no notification to site owners about this.

@Bakual

Contributor Author

Bakual commented Mar 22, 2017

sigh...

@rdeutz

Contributor

rdeutz commented Mar 22, 2017

What you need to do is to check if the URL could be created by the system. If someone fooled the system and created a URL that works because of a bug/simplification, then it is OK when this URL doesn't work in the new system. That will only be a small number of URLs, but we need a solution for the majority of old URLs for a period of time.

@franzpeter

franzpeter commented Mar 22, 2017

Sorry, I am not a coding specialist in case of Joomla, but could something like that help to solve the problem: a crawler, which automatically crawls all pages to get even those false correct pages, take the results and gives the correct rewrites?

@franzpeter

franzpeter commented Mar 22, 2017

The only problem would be how to detect those false correct pages, it would need to crawl twice. First with the standard router, give the experimental router the results to try to route and if 404, detect that it needs a rewrite.

@wilsonge

Contributor

wilsonge commented Mar 22, 2017

https://github.com/wilsonge/joomla-cms/tree/com-router-legacy-rule This rule will parse legacy URLs with the new structure. However, it does not validate intermediate segments: this means that /getting-started/19-sample-data-articles/joomla/22-getting-started.html parses, but so does /getting-started/19-sample-data-articles/lalalalalalalal/joomla/22-getting-started.html, which from my discussions with the SEO team was one of Joomla's biggest routing issues from an SEO perspective.

It's only com_content by example and doesn't do the redirect logic - but I'm sure you guys can figure out how to do the redirect logic and whilst each router treats this kind of link specially you can figure out how to make it work :)
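
To make the "intermediate segments are not validated" point concrete, here is a minimal sketch of the difference between lenient and strict parsing. It is an illustration in Python and not the rule from the linked branch: `lenient_article_id`, `strict_article_id` and the `canonical` lookup table are hypothetical.

```python
import re
from typing import Dict, Optional


def lenient_article_id(path: str) -> Optional[int]:
    """Read the ID from the last segment only; earlier segments are ignored."""
    last = path.rstrip("/").rsplit("/", 1)[-1]
    match = re.match(r"(\d+)-", last)
    return int(match.group(1)) if match else None


def strict_article_id(path: str, canonical: Dict[str, int]) -> Optional[int]:
    """Accept a path only if it is exactly a URL the system would build."""
    return canonical.get(path)
```

With the two URLs from the comment above, the lenient version returns 22 for both, including the one with the made-up `lalalalalalalal` segment, while the strict version only accepts paths that appear in the table of generated URLs.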

@Bakual

Contributor Author

Bakual commented Mar 23, 2017

Last night I was thinking about an approach where we would add a temporary argument $forceLegacy to JRouter::parse() which would then override the legacy/experimental parameter setting.
With that, we could put code into the redirect plugin which in case of a 404 would try to parse the route again with that $forceLegacy enabled. If that parse results in a valid URL, it would do the redirect and add the entry to the com_redirect table. Next time that URL is called, the regular redirect function would take care of it.
We can of course add a new parameter to the plugin to control that behaviour.

This way, the code would be in a central place and no coupling of the component routers to com_redirect.

@wilsonge

Contributor

wilsonge commented Mar 23, 2017

It's cleaner but you can't do it as a temporary measure and keep the interface

@Bakual

Contributor Author

Bakual commented Mar 23, 2017

I probably don't understand the sentence. With temporary I mean we could add that argument with 3.7 and deprecate it right away again for 4.0 when the legacy routing is removed (the argument is at least useless at that point).

@wilsonge

Contributor

wilsonge commented Mar 23, 2017

https://github.com/joomla/joomla-cms/blob/staging/libraries/cms/component/router/interface.php As in you'd need to break this interface. Which would mean extensions couldn't have an implementation that supports J3 and J4 at the same time

@Bakual

Contributor Author

Bakual commented Mar 23, 2017

You can't add it to the interface, that's true.
But as far as I know the component routers could have that additional optional parameter both in J3 and J4. It would still satisfy the interface. In J4 it would just be a useless argument which is never passed.

@wilsonge

Contributor

wilsonge commented Mar 23, 2017

Ahh I didn't think you could. But we're doing that in JTable so I'm wrong. That could work

@chriswagner0815

chriswagner0815 commented Mar 23, 2017

Dear colleagues -
thanks for raising all these issues. At the code sprint on Monday, the SEO team also met in Amsterdam and discussed the router issues raised with the developers. We have heard you all and we are reading what you write.

We are currently in the process of creating a document and a video with a project example (which we hope will be done by mid to end next week). We are also going to address how it needs fixing, why it needs fixing and what additional router features we would like to see from a technical SEO perspective.

We hope, that everyone can see the good in the community starting this process and we understand that there is guidance and information required from us.

Please give us the time to provide you with what we feel is needed. Let's all help in moving this forward.

And again: doing SEO for a living, I cannot even begin to tell you, how glad I am that we start working on these issues!

Kind regards
Christopher Wagner
Team Lead Joomla Optimization Team

@chriswagner0815

chriswagner0815 commented Oct 4, 2017

Hey Brian,
since this is enabled on a test domain, we would have to chat in private. Since you told me last time that you won't, I really don't see a way of helping you out on that one.

Should you reconsider, please let me know!
Big hug
Chris

@brianteeman

Contributor

brianteeman commented Oct 4, 2017

The reason for insisting that this is in public is so that everyone can see the issue and comment.

If it is such a massive problem then surely you would want everyone to be aware of it so that everyone can help to fix it. That's what open source is all about.

If it is such a massive problem then it must be easy to replicate on any domain or test site.

@chriswagner0815

chriswagner0815 commented Oct 4, 2017

Hey Brian,
Rowan just told me that on CJO, the issue with the redirects is fixed with the solution above.

Regarding everything else, I could only provide excel sheets and since we have no 301 redirects by default (look at the code) you see the issue.

Hope that helps
Chris

@brianteeman

Contributor

brianteeman commented Oct 4, 2017

Nope, that doesn't help at all in seeing either of the two issues you mention.

@chriswagner0815

chriswagner0815 commented Oct 4, 2017

Hello Brian,
in that case, I really don't know how to help you :( You could simply have a look at the code!
Chris :)

@rfmjoe

rfmjoe commented Oct 4, 2017

Hello brian,
ok, will do that for you. I'll use one of my sites as an example.
https://www.seo-webdesign.wien (Joomla 3.8, experimental router enabled, "Remove IDs" enabled)
NEW SEF URL with the new router:
https://www.seo-webdesign.wien/aktuelles/joomla-version-3-8-ist-da-und-bringt-neues-sef-routing-fuer-suchmaschinenfreundlichere-urls

if I enabled the STABLE router, this SEF URL would be rendered as:
https://www.seo-webdesign.wien/aktuelles/43-joomla-version-3-8-ist-da-und-bringt-neues-sef-routing-fuer-suchmaschinenfreundlichere-urls

now, assume Google has indexed the old SEF URL (Joomla 3.7.5 and before). Now I come along and activate the new router. Guess what happens if a user enters this URL? Blank page. No redirect. No 404 status code sent to the browser. Try it:
https://www.seo-webdesign.wien/aktuelles/43-joomla-version-3-8-ist-da-und-bringt-neues-sef-routing-fuer-suchmaschinenfreundlichere-urls

solution:

  • automatically redirect (301) old URLs to the new URLs (htaccess rewrite, for example). Otherwise, there is a Google penalty incoming for larger sites that switch from the old router to the new router with Joomla 3.8+.

Letting users redirect those old URLs by hand is not an option. If you ignore this issue, I see big problems coming to Joomla sites in the future.

to be clear:
it is NOT a problem that the NEW router can't parse the old URLs! The problem is the combination of:

  • old urls
  • new router + IDs removed
    result: old urls RETURN a blank page (point 1) and are not redirected to the new URL (point 2).

there is no problem (at least I haven't found any) if you activate the new router and disable the "Remove IDs from URLs" option.

hope this helps,
joe

@mbabker

Member

mbabker commented Oct 4, 2017

An old URL returning a blank page is indicative of a PHP error. Any time you get a blank page, there is a hidden PHP error. Turn on your error reporting to find out what is going on there. Either way, the issue you are seeing with that is NOT related to the fact that old URLs cannot be parsed.

There is only so much that Joomla can be expected to do as it relates to redirect management. Given the way the API is written, it is not a simple solution to just try to parse a URL with every possible routing configuration and redirect if something matches. The only reliable way to handle redirects is to have a master list of the existing valid URLs prior to making a configuration change, making the configuration change, and reviewing the URL list afterwards to find discrepancies. URL management (including migrations) is not a task that a site owner should leave 100% to the platform they are working with, they should be actively involved in reviewing the sitemap and managing redirects as well as using the tools the platform offers to assist with that.

Nobody has downplayed this issue, contrary to what some might want to say here. But the fact of the matter is nobody has put any real effort into addressing the problem, people just come here and say the problem exists and that we must automatically redirect some URLs based on some set of parameters. As a 100% volunteer open source project, we need people helping to work on a solution and not just continuing to post comments saying the problem exists, otherwise there will be no solution.

@brianteeman

Contributor

brianteeman commented Oct 4, 2017

@rfmjoe Thank you for supplying the link

As @mbabker says a blank page is an error message in disguise. If you set error reporting to Development in global configuration and then try again you should get an error message either on the screen or in your logs. If you can do that and post back the results then we can help.

@chriswagner0815 I fail to understand why you refuse to help

My own test results on a live site (now reverted)

Old router url

https://example.com/community/lifecycle/49-marriage

New router with id url

https://example.com/community/lifecycle/49-marriage

New router with no id url

https://example.com/community/lifecycle/marriage

Result of using old URL: 404

I

@rfmjoe

rfmjoe commented Oct 4, 2017

hey there brian,
thanks, I enabled maximum error reporting and also tried another template. It seems the missing 404 error code is template-related. When I enabled the beez3 template, a 404 error was displayed correctly.

these are the errors from my active template that prevent a correct 404 error:
Warning: require_once(/usr/www/users/rauschi/seo-webdesign/libraries/joomla/document/html/renderer/head.php): failed to open stream: No such file or directory in /usr/www/users/rauschi/seo-webdesign/templates/cloudbase3/error.php on line 78

Fatal error: require_once(): Failed opening required '/usr/www/users/rauschi/seo-webdesign/libraries/joomla/document/html/renderer/head.php' (include_path='.:/usr/local/lib/php/') in /usr/www/users/rauschi/seo-webdesign/templates/cloudbase3/error.php on line 78

thanks,
joe

@csthomas

Contributor

csthomas commented Oct 4, 2017

@brianteeman
Your example shows the case where the old router URL == the new router URL with ID.

So the issue is between the new routing with ID and without it.
Do you think that the new routing with the ID removed should return the correct page for: https://www.sinaileeds.uk/community/lifecycle/49-marriage ?

@brianteeman

Contributor

brianteeman commented Oct 4, 2017

@rfmjoe now you can see why I asked the questions that I did. It wasn't to be awkward but to show that the issue you had with the blank page had nothing to do with the router.

@csthomas Personally yes I do but what do i know about routing ;)

@rfmjoe

rfmjoe commented Oct 4, 2017

@brianteeman yes, blank page issue is clear for me now.
the other issue (redirect old urls to new ones) is still an issue as described in the original post by Bakual.

@csthomas

Contributor

csthomas commented Oct 4, 2017

This is a simple fix for Brian's example:

diff --git a/components/com_content/router.php b/components/com_content/router.php
index 4957dc0170..4b06e943b1 100644
--- a/components/com_content/router.php
+++ b/components/com_content/router.php
@@ -219,7 +219,24 @@ class ContentRouter extends JComponentRouterView
                                ->where('catid = ' . $dbquery->q($query['id']));
                        $db->setQuery($dbquery);
 
-                       return (int) $db->loadResult();
+                       $id = (int) $db->loadResult();
+
+                       if ($id === 0)
+                       {
+                               $alias = explode('-', $segment, 2);
+
+                               if (isset($alias[1]))
+                               {
+                                       $dbquery->clear('where')
+                                               ->where('alias = ' . $dbquery->q($alias[1]))
+                                               ->where('catid = ' . $dbquery->q($query['id']));
+                                       $db->setQuery($dbquery);
+
+                                       $id = (int) $db->loadResult();
+                               }
+                       }
+
+                       return $id;
                }
 
                return (int) $segment;

The same could be done for getCategoryId.

But the question is, do you want such an improvement? Perhaps as an option?
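
For readers who do not want to trace the diff, its lookup logic can be paraphrased as the sketch below. This is a Python illustration, not the patch itself: `article_id_for_segment` and the in-memory `aliases` dictionary (keyed by category ID and alias) are hypothetical stand-ins for the database query in `ContentRouter`.

```python
from typing import Dict, Tuple


def article_id_for_segment(segment: str,
                           catid: int,
                           aliases: Dict[Tuple[int, str], int]) -> int:
    """Return the article ID for a URL segment, or 0 if nothing matches."""
    # First attempt: treat the whole segment as the alias (new router, no IDs).
    article_id = aliases.get((catid, segment), 0)

    if article_id == 0:
        # Fallback, mirroring explode('-', $segment, 2) in the diff:
        # drop everything up to the first hyphen and retry with the rest.
        parts = segment.split("-", 1)
        if len(parts) == 2:
            article_id = aliases.get((catid, parts[1]), 0)

    return article_id
```

One property worth noting in this sketch: the prefix before the first hyphen is never checked against the ID that is found, so a segment such as `happy-marriage` would resolve to the same article as `49-marriage` whenever no article with the alias `happy-marriage` exists in that category.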

@mbabker

Member

mbabker commented Oct 4, 2017

The fix is not that simple. That may make the legacy URL parsable, but how does the system know that URL should be a 404 or 301 with a different set of routing configurations enabled? Without that part of the equation (George wrote a parser rule too months ago), this can't go anywhere. And no, API changes like changing method signatures are not an option, especially when there is an interface involved (because if you can't rely on the interface why bother with it?).

@csthomas

Contributor

csthomas commented Oct 4, 2017

I am only suggesting that URLs from the new routing with IDs could (optionally) be parsable by the new routing without IDs.

The decision about 404 or 301 can be made later in a plugin.

Hannes gave an example of a plugin, plgSystemSeoRedirect, at https://groups.google.com/forum/#!topic/joomla-dev-cms/RWya-5Gcvlg

Personally I made a similar plugin for com_content and com_tags. It fixes all weird/doubled URLs in J3.7 and either uses a 301 redirect or adds a canonical link.

Joomla 3.x could have an option to parse all older versions of links.
If the administrator checks all options (support old routing, support URLs with ID on routing without ID), then after URL parsing, the plugin makes a decision.

It will slow down the system but buy us time until J4 is released without support for the old routing.

@mbabker

Member

mbabker commented Oct 4, 2017

For things to work right, what has to happen is that the router tries to parse things like normal and throws the 404; then a plugin decides whether it should attempt to reparse the URL with another configuration. If so, the plugin should handle issuing the required 301; otherwise it should fall back to whatever 404 handling is in place.

So, the router may need to be able to parse the old URLs, but it absolutely cannot be enabled by default otherwise that parsing will cause URLs that should 404 based on the configuration change to be a 200 and that's just going to make things even worse than they are now.

@JimJGitHub

JimJGitHub commented Nov 21, 2017

#18771

Home page wrong url when home is a featured articles menu

@Ruud68

Contributor

Ruud68 commented Nov 22, 2017

I have been looking at the redirect of old to new URLs from different perspectives.
I concluded (for myself) that this is a one time 'issue': when 'migrating' from the old URL to the new URL.
With my 'developer hat' on, I want to automate things as much as possible but for me the cost for creating that code / testing it and maintaining it would only make sense if it was NOT a one time issue for a site.

So I have followed this approach for my own and my customers' sites: I have added a function to my toolbox that creates an overview of the old URL and the new URL. You can copy and paste this overview into the built-in com_redirect component: problem solved, both for search engines and for back-links.

What it does under the hood is switch on the stable router (with ID), create URLs for all articles with JRoute, switch on the experimental router with ID turned off, and create all the URLs again. It also handles URLs for multiple languages.

@infograf768

Member

infograf768 commented Nov 22, 2017

Could we test this?

@Ruud68

Contributor

Ruud68 commented Nov 23, 2017

Sure @infograf768, if you send me an email (the address is in my profile), I will send you a download link.

@infograf768

Member

infograf768 commented Nov 23, 2017

It looks like it is working.
Note: one still has to create hidden menu items of the type All Categories for each language and each component, especially when using featured menu items.

@Ruud68
Do you mind if I share this component with the Maintainers group?

@Ruud68

Contributor

Ruud68 commented Nov 23, 2017

@infograf768 sure, be my guest :)

@brianteeman

Contributor

brianteeman commented Jan 2, 2018

Dear colleagues,
we on the router project SEO team are still working on this internally and do not have everything together yet, because non-volunteer work needed our attention last week.
Once we know when we can continue, we will provide you with a new deadline.
We apologize for the inconvenience - we are on it!
Chris

Was there ever an update on this from the router project seo team?

@danielmreck

danielmreck commented Oct 1, 2018

Hello all,

As I have started preparing sites for migration to version 4.0, the URL router breaking legitimate legacy links has been a real concern. Requiring site administrators to convert hundreds or thousands of URLs without an easy tool integrated into Joomla 4.0 is an invitation for them to move away from Joomla as a platform.

I agree with the router dev team that we should return 400-series errors on malformed URLs that would work in the legacy router:

https://example.com/my-made-up-garbage/123-real-content

should not resolve to "Real Content" with article ID 123, but should instead return a 400-series error.

 
However, we really should be taking into account legit URLs generated by the legacy router and reroute them to their new destinations:

https://example.com/my-real-menu-item/123-real-content

should receive a 301 redirect to the URL generated by the modern router, such as:

https://example.com/my-real-menu-item/real-content

 
It would seem to me that the best place to accomplish this is within the modern router, which would allow us to weed out other fake legacy URLs like:

https://example.com/my-real-menu-item/000-real-content

The modern router would look up the true ID for "Real Content" and know that 000 is incorrect (in addition to being invalid anyway).
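
A minimal sketch of that check, written in Python for illustration. `ARTICLE_ID_BY_ALIAS` and `resolve_legacy` are hypothetical names, and the lookup table stands in for whatever query the modern router would actually run; the URL shapes are taken from the examples above.

```python
import re
from typing import Optional

# Last path segment of a legacy URL: "<numeric id>-<alias>".
LEGACY_SEGMENT = re.compile(r"^(\d+)-(.+)$")

# Hypothetical lookup: (menu path, alias) -> true article ID.
ARTICLE_ID_BY_ALIAS = {("my-real-menu-item", "real-content"): 123}


def resolve_legacy(path: str) -> Optional[str]:
    """Return the modern URL to 301 to, or None if the path should 404."""
    menu_path, _, last = path.strip("/").rpartition("/")
    match = LEGACY_SEGMENT.match(last)
    if match is None:
        return None  # not shaped like a legacy article URL
    claimed_id, alias = int(match.group(1)), match.group(2)
    # Accept the URL only if the claimed ID is the true ID for this alias.
    if ARTICLE_ID_BY_ALIAS.get((menu_path, alias)) != claimed_id:
        return None
    return f"/{menu_path}/{alias}"
```

Under these assumptions, `/my-real-menu-item/123-real-content` resolves to `/my-real-menu-item/real-content`, while both `/my-real-menu-item/000-real-content` and `/my-made-up-garbage/123-real-content` return `None` and fall through to the 400-series handling.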
 

If this is not going to be super-easy in Joomla 4.0, then our user base will either migrate to another platform or turn to inadequate .htaccess rules such as this:

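# Based on @chriswagner0815's reply on October 4, 2017
# Apply only to this host (dots escaped so they match literally)
RewriteCond %{HTTP_HOST} ^example\.com$
# Strip the numeric ID prefix (1-6 digits) from the last path segment that has one
RewriteRule ^(.*)/[0-9]{1,6}-(.*)$ https://example.com/$1/$2 [L,R=301]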

This will correctly handle rewriting URLs that contain an ID after the last slash, but will choke on IDs appearing earlier in the URL, such as categories that are not assigned to menu items. For sites with a lot of legacy links pointing at them, this could generate a significant load on the server that could be eliminated if the modern router could just accept the correctly-formed legacy links.
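
To show the limitation, the same pattern can be run through Python's `re` module, which treats this particular expression the same way. `rewrite_once` is a hypothetical helper that mimics a single pass of the rule; the second sample path is the legacy "Getting Started" URL from the original report, without the `.html` suffix.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule above, applied to the path without a leading slash.
LEGACY_ID = re.compile(r"^(.*)/[0-9]{1,6}-(.*)$")


def rewrite_once(path: str) -> Optional[str]:
    """Mimic one pass of the rule: return the redirect target, or None if no match."""
    match = LEGACY_ID.match(path)
    if match is None:
        return None
    return f"https://example.com/{match.group(1)}/{match.group(2)}"


if __name__ == "__main__":
    # ID after the last slash: cleaned in one redirect.
    print(rewrite_once("my-real-menu-item/123-real-content"))
    # Category ID earlier in the path: "19-" survives the first redirect.
    print(rewrite_once(
        "getting-started/19-sample-data-articles/joomla/22-getting-started"))
```

Because `(.*)` is greedy, only the last ID-prefixed segment is stripped per pass, so a URL with an earlier category ID still carries that ID after the first 301.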

 
Thanks again to everyone who has put in so much work on this!

@Ruud68

Contributor

Ruud68 commented Oct 3, 2018

Hi, I once wrote a routine to handle this via the built-in com_redirect component (#14848 (comment)).

I am currently working on a site with 60K articles that need to drop the ID from the URL, so I am now checking whether com_redirect is a viable option performance-wise. It would have to import 60K redirect rules :s Not sure what this will do to site performance.

The redirect rules are only a temporary solution: once the search engines have visited, received the 301 and updated their indexes, the rules only come into effect when an old link (e.g. via Facebook, Twitter or email) is followed. So the performance hit should not be that big.

@joomla joomla deleted a comment Oct 5, 2018
