Category canonical & breadcrumbs are incorrect - Generate from found category_id and not uri path #3179

Closed
dhaupin opened this Issue Jul 13, 2015 · 12 comments

Projects

None yet

5 participants

@dhaupin
dhaupin commented Jul 13, 2015

This doesn't generate a real canonical, it simply mirrors the url, which can change infinitely resulting in a semantic mess:

$this->document->addLink($this->url->link('product/category', 'path=' . $this->request->get['path']), 'canonical');

Instead it should be:

$this->document->addLink($this->url->link('product/category', 'path=' . $category_info['category_id'], 'canonical');

And of course, breadcrumbs are the same thing -- this is one of the oldest bugs in OC. If they rely on path then they are inaccurate, especially considering rich snippets in SERPs. Its part of the site structure, which should not be dynamic based on any of the parameters are present...should be strict, and always present, based on a found canonical that uses explicit $category_info['category_id'] instead:

A modified example of some random extension market loop:

if (count($parts) <= 1){
    $current_category = $this->model_catalog_category->getCategory((string)$this->request->get['path']);

    if (isset($current_category['parent_id']) > 0) {
        $parts[] = $current_category['parent_id'];

        $parent = $current_category['parent_id'];
        while (($current_category['store_id'] === $this->config->get('config_store_id')) && ($parent != 0)){
            $parent = $this->model_catalog_category->getCategory($parent);
            $parent = $parent['parent_id'];
            $parts[] = $parent;
        }
        $parts = array_reverse($parts);
    }
}

Just seems like something that should be in core instead of constantly relying on mods to do the dirty work for ya

@dhaupin
dhaupin commented Jul 13, 2015

To clarify, here is an example category "Mac": http://demo.opencart.com/index.php?route=product/category&path=20_27

Now lets assume you delete category ID 20 and put "Mac" in a new parent category ID, like 100. Google still comes to crawl and therefore uses the same old url for parent ID 20 which it does allllll the time (sometimes for years). The output of the old category would still show &path=20_27 as being the canonical, with the same content from "mac" present. The new entries from sitemap would show the canonical as &path=100_27 with the same content as "Mac", resulting in double canonicals and duplicate content until Google forgets about the old url (which will be never since it still reports &path=20_27 with content as being a valid canonical).

This works exactly the same with SEO urls. The difference being parent ID 20 uses a keyword string instead of an ID number.

@JeffroDH

Oh, my goodness. Thank you. This has been on my list for ages and I haven't taken the time to track it down.

Thank you, thank you, thank you.

@danielkerr danielkerr closed this Aug 18, 2015
@danielkerr danielkerr added a commit that referenced this issue Aug 18, 2015
@danielkerr danielkerr #3179 d43373e
@dhaupin
dhaupin commented Aug 18, 2015

Dude that is awesome, thank you!

@ADDCreative
Contributor

The canonical, next and prev links are still not quite right.

  1. None of the canonical, next or prev links should have the 'SSL' flag. Category pages don't use HTTPS.
  2. The canonical link should be on every page, not just page 1. However, other pages should not point to the first page. So the canonical link on page 2 or greater should be $this->document->addLink($this->url->link('product/category', 'path=' . $category_info['category_id'] . '&page=' . $page), 'canonical');. However the issue below should also be taken in to account if adding this.
  3. The fact that the products per page can be different due to the &limit URL parameter can cause the links to point to pages that will be a soft 404 page. For example, if the store admin has set the products per page to 30, and there are 30 products in a category. If a search engine crawls a category URL with the &limit=25 parameter, the next link will point to, as an example route=product/category&path=20&page=2. This page would have no products on and not require indexing.

The last issue could be a bit of a pain to get right and may be so rare that it's not worth the bother.

@garudacrafts

1 - https everywhere is a possible design choice (a good one).
3 - that's a store misconfiguration problem; if products per page are set to 30, then the store admin should update limits accordingly. Also, &limit= and other query params can be blocked from crawling using robots.txt

@ADDCreative
Contributor

@garudacrafts

1 - Yes, HTTPS everywhere is a good choice, but this should be done by setting the HTTP_SERVER to a https:// address in the config.php. Not by randomly setting the 'SSL' flag on some links. For example the product page is not using the 'SSL' flag https://github.com/opencart/opencart/blob/master/upload/catalog/controller/product/product.php#L220.

3 - It's not a store misconfiguration problem and the store admin can't change the limits as they are hard coded. Also blocking in the robots.txt file does not help if the URL is already indexed.

@dhaupin
dhaupin commented Aug 19, 2015

Agree, +1 for https everywhere. SSL in that link gen is fine. Should be that way everywhere, if you have an SSL. No point in running SSL on just checkout/account. I would gladly donate the proxy triggers script we use to get it working right.

And here is the pagination example we made (differs from the version in master). Obeys extra parameters like custom limits and sort, but it aint too pretty :) EDIT: Fixed canonical to be current page as per @ADDCreative suggestion

$page_trig = $product_total - $limit;
$page_last = ceil($product_total / $limit);

if (($page == 1) && ($page_trig < 1)) {
    $this->document->addLink($this->url->link('product/category', 'path=' . $category_id), 'canonical');

} elseif (($page == 1) && ($page_trig > 0)) {
    $this->document->addLink($this->url->link('product/category', 'path=' . $category_id), 'canonical');
    $this->document->addLink($this->url->link('product/category', 'path=' . $category_id . '&page=' . ($page + 1) . $url), 'next');

} elseif ($page == $page_last) {
    $this->document->addLink($this->url->link('product/category', 'path=' . $category_id . '&page=' . $page), 'canonical');
    $this->document->addLink($this->url->link('product/category', 'path=' . $category_id . '&page=' . ($page - 1) . $url), 'prev');

} elseif ($this->request->get['page'] > $page_last) {
    $this->document->addLink($this->url->link('product/category', 'path=' . $category_id . '&page=' . $page), 'canonical');
    $this->document->addLink($this->url->link('product/category', 'path=' . $category_id . '&page=' . $page_last . $url), 'prev');

} elseif (($page > 1) && ($page < $page_last)) {
    $this->document->addLink($this->url->link('product/category', 'path=' . $category_id . '&page=' . $page), 'canonical');
    $this->document->addLink($this->url->link('product/category', 'path=' . $category_id . '&page=' . ($page - 1) . $url), 'prev');
    $this->document->addLink($this->url->link('product/category', 'path=' . $category_id . '&page=' . ($page + 1) . $url), 'next');

}

Next, in order to highlight this visually, you need to do 2 other changes. First, change the title of paginated pages to something that indicates page number you are on, or in reality, to indicate in serps that youre about to land on page 2+. Yes there are still instances where Google will prefer to display page 2+ in serps instead of canonical. Replace the setTitle() with this:

if ($page > 1) {
    $this->document->setTitle($category_info['name'] . ' - Page ' . $page);
} else {
    $this->document->setTitle($category_info['name']);
}

Next, we need to fix the way OC shows page 1 of 10 or whatever in order to trigger the category entity count in serp results...it needs to be in this syntax "Showing 1-10 of 50 Results (5 pages)". All it takes is a slight change in the wording of catalog/language/english/english.php:

$_['text_pagination'] = 'Showing {start}-{end} of {total} Results ({pages} Pages)';

yeahhhha

@dhaupin
dhaupin commented Aug 19, 2015

@ADDCreative that change in config doesnt work to make proper https. You also need to change the url.php library to force all auto gen links https, add proxy/balancer merge router (for $_SERVER) in request.php library, fix the file manager to use relatives, make some sorta of bully rewrite for legacy data, add HSTS, and finally, for those who run really secure, add CSP header options. Its all not that hard, but it's more than just a simple config.php change....else you have broken locks, blocked assets, and possible redirect looping insanity (as in the case with Cloudflare)

@ADDCreative
Contributor

@dhaupin That example still falls into the trap of pointing the canonical to the first page for every page. It the first on the list of common canonical mistakes here http://googlewebmastercentral.blogspot.co.uk/2013/04/5-common-mistakes-with-relcanonical.html.

Thanks for the heads up on the other changes needed for proper HTTPS everywhere.

@dhaupin
dhaupin commented Aug 19, 2015

@ADDCreative Oh my bad man, I see what you're saying now. That script previously had a "view all" canonical which we removed, but apparently, never fixed back correct. Should have taken a min more to make sure the example paste was solid. Ill edit it up to include that when i get a sec.

Interesting note though, google def still indexes other pages in the case of the canonical pointing to only raw cat. They claim they dont... they def do ;)

@garudacrafts

@ADDCreative Yes, the limits are hard-coded but should be changed to accommodate the desired admin setting. Here's how I do it:


        $this->data['limits'] = array();

        $config_catalog_limit = $this->config->get('config_catalog_limit');

        $limits = array($config_catalog_limit / 2, $config_catalog_limit, $config_catalog_limit * 2, $config_catalog_limit * 4);

        foreach ($limits as $value) {
            $this->data['limits'][] = array(
                'text'  => $value,
                'value' => $value,
                'href'  => $this->url->link('product/allproducts', $url . '&limit=' . $value, 'SSL')
            );
        }

Let's be honest - setting up an OpenCart site requires some developer assistance no matter what.

@garudacrafts

@dhaupin To implement https everywhere, I set the protocol on both HTTP_SERVER and HTTPS_SERVER in config to "https://" and then I use .htaccess to check and force rewrite all non-SSL requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment