Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About SEO report and h1 tags #55

Open
ysard opened this issue Feb 2, 2022 · 3 comments
Open

About SEO report and h1 tags #55

ysard opened this issue Feb 2, 2022 · 3 comments

Comments

@ysard
Copy link
Contributor

ysard commented Feb 2, 2022

Problem

I see in the following line that the content of the article is used to count the
h1 tags:

self.content_title_analysis = ContentTitleAnalyzer(content=self._content)

You suggest (like everyone) the following Markdown structure in your README:

Title: Page Title
Description: Page Description

# Heading Content

Nevertheless, most (all?) templates already encapsulate the "Title" metadata in an h1 tag.

This processing is independent from the rest of the article written in Markdown indeed contained in the content attribute of the objects. It is inserted as is in the html template.

Therefore such an example poses 2 problems:
- duplication of the h1 tag (that of the template + that of the article content)
- duplication not detected by the current plugin

Currently, as far as I know, the only simple way to get a compliant html page is to write articles starting the heading level at h2 via ##
although it is semantically wrong in Markdown and can disturb some plugins (table of contents rendering, etc.).

I personally use an homemade plugin to modify the final html without modifying the original Markdown.

In this case, the SEO report plugin misleads the user by not detecting any h1 title.

I would like to mention that your plugin is very welcome because it allowed me to highlight a problem that I had totally missed.

Proposal

The plugin should be refactored to read finalized pages, like the SEOEnhancer part
called after the content_written signal.

Notes

@MaevaBrunelles
Copy link
Collaborator

MaevaBrunelles commented Feb 3, 2022

Hi @ysard and thanks for your report.

From a SEO point of view, template creators should never encapsulate title in a h1 tag. Website creators should be able to customize page title and content title as they want, and thus respect the Markdown semantic (# ## ###...). So I think this is an issue that should be addressed to template creators.

The plugin should be refactored to read finalized pages, like the SEOEnhancer part
called after the content_written signal.

I'm not sure to understand what would it change. In the case you pointed, the plugin works on HTML elements (content), as I use BeautifulSoup to analyze the h1 tag:

self._soup = BeautifulSoup(content, features="html.parser")

@ysard
Copy link
Contributor Author

ysard commented Feb 4, 2022

To clarify the discussion, let's take the example of a template commonly accepted as flawless:
https://github.com/Pelican-Elegant/elegant/blob/master/templates/article.html

On the subject of interest, here is the usage of the title metadata of a pelican article:

The content of the article stricto-sensu is displayed below:
https://github.com/Pelican-Elegant/elegant/blob/2a689472e42e159b3de2ff5b7bfd5434a9cc63d4/templates/article.html#L62

Two options from here:

  • either you choose the structure (#, ##, ###, etc.) and the <h1> level title is buried in the body/content of the article;
    it becomes non-customizable in the sense that we can no longer make other elements managed by the template appear between the title and the content of the article unless you insert Jinja tags in each markdown (I don't even know if it works (?)).
    Ex: gallery, author, number of comments, date, etc.

  • or you choose the structure (##, ###, etc.) and let the template manage the header part of the article.
    This is the way it is commonly applied in Pelican themes,
    and for once the common practice is acceptable because it's the only way to properly handle a header article.

    Without this, such example is to my knowledge impossible to achieve (I admit I am not an expert in Jinja template, maybe the problem is there):
    image


But I'm digressing because as you mention, this is a matter for both the template creator and the article writer.
This is indeed up to everyone's free will to do whatever they want with their tags, or to create valid code or not.

That said, in any case the SEO Report plugin only counts <h1> elements in the content attribute of an article;
I.e. in a blob of html that is a fragment of the final page.
This, in no way guarantees that the number of <h1> tags counted is correct.

This is why I point out that ContentTitleAnalyzer(content=xxx) should only work on the final result of the page;
so at a later stage than the one corresponding to the current all_generators_finalized signal which only
allows to work with the html blob of the article body.

@ysard ysard mentioned this issue Feb 4, 2022
@MaevaBrunelles
Copy link
Collaborator

Ok, got it, you're right.

Moreover, on the template you link, even without considering the structure of article content on Markdown/Rest, there is already two h1 tags: https://github.com/Pelican-Elegant/elegant/blob/2a689472e42e159b3de2ff5b7bfd5434a9cc63d4/templates/article.html#L35 and https://github.com/Pelican-Elegant/elegant/blob/2a689472e42e159b3de2ff5b7bfd5434a9cc63d4/templates/article.html#L35.

When the plugin will be updated, the SEO report will surely be red for h1 analysis for all sites using a template but at least, the analysis will have been done correctly.

From what I quickly tested and as you suggested, the seo_report must be launch when content_written signal is triggered; which mean open each HTML files, parse it with BeautifulSoup and do the analysis. I don't know if there is a better way.

Do you want to do the modifications? If not, I will wait for the merge of your current PR before working on this, as it will surely conflict with your PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants