[4] Modern routing, nonSEF & SEF urls alias manipulation #32880
Comments
Just a question: Is there a reason that the alias is included in the id value? Does the router code need it somewhere? Generally speaking: I personally am happy that the alias is not checked and that it doesn't matter whether there is one, none, or the wrong one. (I also understand all the discussions about SEO problems but know how to avoid them.)
This comment was marked as abuse.
Is there any proof that this will happen? Because I actually think Google is smart enough not to do that. After all, it's a single link from an external site, and all internal links to the same content have a different URL. For Google that is a mild form of duplicate content, and they almost certainly give priority to the same-site links when deciding which URL is correct.
This comment was marked as abuse.
I just checked some major news sites here in Switzerland (20min.ch) and Germany (bild.de and spiegel.de). You can manipulate the URL of any of their articles as well, but they then redirect (301) to the correct page. Which is what I would expect as a user and site owner. As a site owner, I don't like presenting a 404 to my users, even if they misspelled the URL. So imho the best solution would be to check the incoming URL and, if it's not correct, automatically redirect to the correct one.
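To make the proposal concrete, here is a minimal sketch of that decision in Python. It is not Joomla router code: `respond` is a hypothetical helper, and it assumes the caller has already worked out the canonical path for the requested article (or `None` if the article does not exist).

```python
from typing import Optional, Tuple


def respond(requested_path: str, canonical_path: Optional[str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target) for an incoming request path.

    canonical_path is the one correct path for the requested article,
    or None when no such article exists (hypothetical input).
    """
    if canonical_path is None:
        return 404, None              # nothing to redirect to
    if requested_path == canonical_path:
        return 200, None              # URL is already correct
    return 301, canonical_path        # manipulated or misspelled URL


# A manipulated alias is answered with a permanent redirect, not a 404.
print(respond("/news/1-HAHA", "/news/1-my-article"))
```

The point of the 301 is that the visitor still lands on the article, while search engines are told which single URL is the correct one.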
This comment was marked as abuse.
Technically yes. But most sites actually like it when users find the correct page even if they misspelled the URL. So a 301 redirect to the correct URL is also an absolutely correct response. I agree that the current state is not the desired behavior. However, I think it's still better than showing a 404. And yes, I think the other PR is not a correct solution.
This comment was marked as abuse.
@Bakual here you go: DM me if you need the domain name so you can see what Google is actually showing in the index #32490 (comment)
This comment was marked as abuse.
No, this has not been changed in 4.0. When the URL is non-SEF, the router generally doesn't really change anything in it.
This comment was marked as abuse.
Are you planning to check this only during parsing? Because I would be hesitant to fetch the aliases from the database each time a URL is built. In the worst case that adds a few hundred queries for a single page.
This comment was marked as abuse.
This comment was marked as abuse.
Exactly! That is also why I said it is a show stopper and pinged @wilsonge on this.
This is only for parsing, not for building. So it would involve only one changed query, not an additional one: you are already querying for the id, and that query would be extended to check both the id and the alias.
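As a rough illustration of "one changed query", the sketch below uses Python's `sqlite3` with an in-memory table. The table name `content`, its two columns, and the helper `article_exists` are simplified stand-ins chosen for the example, not the actual Joomla schema or router code.

```python
import sqlite3


def article_exists(db: sqlite3.Connection, article_id: int, alias: str) -> bool:
    """Single lookup that matches on id AND alias instead of id alone."""
    row = db.execute(
        "SELECT 1 FROM content WHERE id = ? AND alias = ?",
        (article_id, alias),
    ).fetchone()
    return row is not None


# Hypothetical data: one article with id 3 and alias 'my-article'.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, alias TEXT NOT NULL)")
db.execute("INSERT INTO content (id, alias) VALUES (3, 'my-article')")

print(article_exists(db, 3, "my-article"))  # correct id and alias
print(article_exists(db, 3, "HAHA"))        # correct id, manipulated alias
```

The lookup runs once per incoming request at parse time, so it does not multiply with the number of links built on a page.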
That is what I described above. If you have a page with a hundred URLs, you get at least 100 additional queries to check the alias, which is why I'm asking to do this in parse, not in build.
We are not validating the ID during parsing of the URL. That is something that the component has to do later on. So it would be an additional query. But one additional query shouldn't really worry us. I'm just trying to bring up all the things that we have to keep in mind. Generally: neither I nor anyone on the production team is your enemy. Quite the opposite. We are very grateful for your work. To me, you are coming across as if you think you have to fight against us. We have common goals here and we are all trying our best to reach them. We currently differ only on certain rules that we have put in place.
I totally disagree with the production team: this issue is a major flaw. No one these days is going to type in a URL. Everyone clicks on a link, a link that can be manipulated and will show up in Google search results. Google will eventually change the link if it's a 301, but Google will surely delete the false link if it's a 404.
Agree, it is only a matter of time (but maybe that is already happening) before Google categorizes / labels sites that in its eyes serve p*rn links. So when you are running a legit business, you will no longer show up on page 1 when somebody searches for your household equipment, because you are categorized as running a completely different business.
We don't disagree that this is a major flaw. However, we disagree that this can be fixed in a backwards compatible way. Fixing this in Joomla 3 will break thousands of websites, and thus we can't fix this in the 3.x major version. Instead, we have already fixed it in 4.0.
Just FYI: I've been trying to fix this since Joomla 1.6. I then pushed for 6 years for the last big changes to the routing, which at least partially fixed this, and spent yet another year fixing it in 4.0.
I cannot comment on the b/c topic, which is surely very important, but this issue is a problem that can badly affect the status of Joomla as a reliable CMS. Even if the system was not hacked, a hacker can make it look like it was, at least in the URL. Some people won't notice the technical difference between a URL and the content; they think the URL is coming from the site. So I can only urge you to fix this for 99% of the Joomla sites (at the moment).
You mean that something that has been like this for 15 years needs to be fixed now, definitely breaking thousands of websites and requiring development work from them? And at the same time breaking the SemVer promises we made? I can guarantee you that if we change this now, the Joomla project would lose half of its userbase. Not everyone would even be directly affected by this, but the break in trust would be devastating. I can guarantee you that no one in charge in the production part of the project will support this change in the 3.x branch of Joomla, especially since you can partially fix this by using modern routing without IDs, and additional fixes have already been deployed to Joomla 4.0. We had such a change in another area in 2014, where someone thought it would be necessary to change the hashing of passwords, and we still get people complaining about how unreliable we are because of that release 7 years ago.
@Hackwar The same logic applies to security issues that have been in core for 15+ years: you fix them when you find out about them. Ignoring them (next to security by obfuscation) is no security at all. We (the Joomla community) trust that these matters will be dealt with when they arise.
The stats are useless. If you set the "send once" option then you only know about the first version installed.
You can always add a ? or a & to the URL and add whatever text you want. would also work as
nothing we can do against this. even for Additionally, Google has been trying to remove the complete URL for years, and with the market share of a monopolist it won't take long anymore. Just my 2 cents.
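A small Python sketch of why appended query text is hard to prevent, under the assumption of a made-up URL on example.com and a hypothetical helper `split_request`: the path, which is what selects the page, is parsed separately from whatever follows the `?`.

```python
from urllib.parse import parse_qsl, urlsplit


def split_request(url: str):
    """Split a URL into its path and a dict of its query parameters."""
    parts = urlsplit(url)
    # keep_blank_values keeps bare text such as '?foo' as {'foo': ''}
    return parts.path, dict(parse_qsl(parts.query, keep_blank_values=True))


# Hypothetical URL: arbitrary text appended after '?' leaves the path untouched.
path, params = split_request("https://example.com/my-article?whatever-text-you-want")
print(path, params)
```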
This comment was marked as abuse.
This comment was marked as abuse.
Looking forward to the PR for this; maybe one of the maintainers can share it here. The fake URL in my blog informing my customers about this issue is still resolving okay though, so 'I have tested this PR unsuccessfully' #lol
This comment was marked as abuse.
This one still works
Just out of curiosity, what is the real disadvantage of such URLs? Imho it's more that it scares the owner of the site when he looks at the Analytics, but it doesn't affect customers. Or am I missing something?
It definitely impacts users, as it was the customers who brought this to the attention of the site owner. It's a (business) vulnerability. Just like a security issue, there is no issue until you get hacked... Or in this case, until your business gets linked by your customers to, for example, anti-semitism or other nasty stuff.
Still wondering how the site visitors were impacted. How did they get to see those fake URLs? I'm not saying this shouldn't be fixed, don't get me wrong. I'm just wondering what the severity is.
According to SEO experts, Google takes working URLs (non-404) from other websites (backlinks) into consideration when ranking a website. So if keywords in the URL do not appear in the content, this could lead to downgrading, especially if it's in a highly contested segment.
SEO and Expert are two words that should never be used in the same sentence
@Ruud68 Without knowing what search words you used, that doesn't mean much. Did you search for a URL or for a keyword?
I think that Ruud fixed this for his client some time ago (with his own patch). The website shows 404 or 410 now, so Google removed the URL. Imho it doesn't matter how it was found: it was in the search results, so 1) people could see it, maybe while searching for porn, and 2) Google may have been downgrading the original website because of the fake URL.
I don't know what they type in Google, they will not tell me.
And as I've said before, we are happy to fix this, but not in Joomla 3. It has been like this for 16 years now and it is a rather well known issue. It is not something that we can properly fix in a backwards compatible way and thus we will not fix it in Joomla 3. You are welcome to provide a PR for Joomla 4 to fix this.
This comment was marked as abuse.
Ok, let me rephrase: Fixing this in a backwards compatible way would require yet another option in the GUI and I would consider that as a new feature. I'm very much against adding yet more options unless absolutely necessary. In addition, new features can only be added in a minor release and that could only be Joomla 3.10. We decided quite some time ago (and communicated that as well) that Joomla 3.10 will only be a compatibility release to ease the migration to Joomla 4 and will not contain any additional new features. Thus this will not be fixed in Joomla 3. I would really prefer if instead of arguing about this here, we could concentrate on Joomla 4, fix this there and finally get this release out the door.
This comment was marked as abuse.
Teaches me to ever volunteer to execute a decision by the PLT.
What about writing the option directly into configuration.php, so that no GUI option is needed? For me it is still fixing a bug and not a new feature.
Wouldn't we all. Now if you only replied to comments specifically addressed to you we might make some progress.
Steps to reproduce the issue
forked from #32879 about #32490
There is one more case I wanted to address but ran out of time: with the new MODERN routing it's possible to generate a URL like:
https://example.com/?view=article&id=3:my-article&catid=9
where my-article is the alias of the My Article (id:3) article. This can also be manipulated like:
https://example.com/?view=article&id=3:HAHA&catid=9
and that URL will still work correctly. This still needs addressing.
Expected result
https://example.com/?view=article&id=3:HAHA&catid=9 is a 404 as HAHA has no part to play here
Or could just remove the alias part from the id...
https://example.com/?view=article&id=3&catid=9
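A minimal Python sketch of the first expected behaviour, for illustration only. `ALIASES` and `check_non_sef` are hypothetical names, and the in-memory dict stands in for whatever lookup the component would really use; this is not how the Joomla router is implemented.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical lookup: article id -> real alias.
ALIASES = {3: "my-article"}


def check_non_sef(url: str) -> int:
    """Return 200 if the id (and alias, when present) is valid, else 404."""
    query = parse_qs(urlsplit(url).query)
    raw_id = query.get("id", [""])[0]          # e.g. '3:my-article' or '3'
    id_part, _, alias = raw_id.partition(":")
    if not id_part.isdecimal() or int(id_part) not in ALIASES:
        return 404                              # unknown or malformed id
    if alias and alias != ALIASES[int(id_part)]:
        return 404                              # alias given but wrong
    return 200


print(check_non_sef("https://example.com/?view=article&id=3:HAHA&catid=9"))
```

In this sketch a bare `id=3` is still accepted, which corresponds to the second option of simply leaving the alias out of the id.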
Actual result
https://example.com/?view=article&id=3:HAHA&catid=9 is a valid URL that loads the article with id 3, and HAHA is not checked to see if it's the same as the article alias (it's not)
Additionally, with SEF on and "Remove IDs from URLs" turned off
Checking Joomla 4 with "Remove IDs from URLs" turned off, the same bug exists with SEF URLs that #32887 aimed to fix in Joomla 3:
A URL of http://127.0.0.1:4444/bottom-most/1-my-article is generated, and can be manipulated like http://127.0.0.1:4444/bottom-most/1-HAHAHAHAHAHAH without a redirect/404 being generated :-(
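For the SEF case, a sketch of the missing check could look like the following. It assumes the last path segment has the form `<id>-<alias>`, as in the example URL; `check_sef_segment` and the alias dict are hypothetical and only illustrate the comparison, not the real router.

```python
from typing import Dict


def check_sef_segment(segment: str, aliases: Dict[int, str]) -> int:
    """Validate an '<id>-<alias>' segment; return 200 or 404."""
    id_part, sep, alias = segment.partition("-")   # split at the first hyphen
    if not sep or not id_part.isdecimal():
        return 404                                  # no numeric id prefix
    expected = aliases.get(int(id_part))
    if expected is None or alias != expected:
        return 404                                  # unknown id or wrong alias
    return 200


# Hypothetical data matching the example: article 1 has alias 'my-article'.
aliases = {1: "my-article"}
print(check_sef_segment("1-my-article", aliases))
print(check_sef_segment("1-HAHAHAHAHAHAH", aliases))
```

Whether a mismatch should answer with a 404 or with a 301 to the correct segment is the open question in this thread; the sketch picks 404 only to keep it short.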
Additional comments