[4.1] [RFC] Use schema.org for metadata #25117

Closed
wants to merge 14 commits into from

wilsonge
Contributor

@wilsonge wilsonge commented Jun 4, 2019

Why Schema.org

  • Used by Bing, Google, Yahoo and Yandex
  • We've been using it for a long time now

Why JSON Over inline

Why the Spatie\SchemaOrg library

  • I like the fluent design, and I think it will make it easy to trigger plugin events for custom enhancements
  • I don't think that people are template overriding these things much - the main issue is context - which templates don't have either. Generally plugins are better suited to provide context about an individual website

Why remove the microdata library?

  • We aren't updating the types
  • We aren't using it in core (too much PHP - we just hardcoded things)

Things that need to be done

  • Base Web page entity
  • Fix language code plugin inLanguage
  • Remove all the remaining itemProps and move to schema.org definitions where required
  • Plugin events
  • Publisher (either Person or organisation - see the section "Structured data in Yoast SEO" https://yoast.com/yoast-seo-11-0/ for an example)

Questions

  • Should we keep this as a separate entity in JDocument to allow even easier customisation?

@wilsonge wilsonge changed the title from "Use schema.org for metadata" to "[4.0] [RFC] Use schema.org for metadata" Jun 4, 2019
@brianteeman
Contributor

Perhaps stating the obvious, but surely we can't remove a library without it first being marked as deprecated? Just because we don't use it in core doesn't mean it's not being used.

})
);
})
// TODO: Should we expose the raw email like this?
Contributor

we do in the vcard download

}
}

// TODO: This requires a publisher to pass google structured data checker
Contributor

Looks like it needs an image as well although the error messages are conflicting

image

Contributor Author

Yeah, I saw that - but I'm not sure of the best way to achieve this, as the image is required to be more than 700px wide - which clearly may not be the case in an article :/

@brianteeman
Contributor

For the publisher part I would add two new fields to Global Config in the Metadata section - publisher name and publisher logo - and then in the code you could also have a fallback of ->name($app->get('sitename')) if the name is empty.
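
A minimal sketch of that fallback, written in plain PHP rather than with the Spatie\SchemaOrg builder so that it runs on its own. The `publisher_name` and `publisher_logo` keys are hypothetical names for the two proposed Global Config fields, and `buildPublisher()` is an illustrative helper, not code from this PR:

```php
<?php
/**
 * Build a schema.org Organization array for the publisher.
 *
 * $config is a plain key/value array standing in for Global Configuration;
 * the 'publisher_name' and 'publisher_logo' keys are assumptions.
 */
function buildPublisher(array $config): array
{
    // Fall back to the site name when no publisher name has been set.
    $name = trim((string) ($config['publisher_name'] ?? ''));

    if ($name === '') {
        $name = (string) ($config['sitename'] ?? '');
    }

    $publisher = ['@type' => 'Organization', 'name' => $name];

    // Only emit a logo when one has been configured.
    $logo = trim((string) ($config['publisher_logo'] ?? ''));

    if ($logo !== '') {
        $publisher['logo'] = ['@type' => 'ImageObject', 'url' => $logo];
    }

    return $publisher;
}
```

With only `sitename` set, the returned array carries the site name and no `logo` key, so a site that never fills in the new fields would still emit a named publisher.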


// TODO: This requires a publisher to pass google structured data checker
$schema = Schema::article()
->articleBody($this->item->text)
Contributor

Duplicating the entire article text into JSON?

Contributor Author

yup. that's apparently how it should be

Contributor

Are you sure about that, @wilsonge? I am looking at some pages from the site of another CMS that don't.

Contributor

If true, I stand by my deleted comment - this is probably unsuitable for content. How does one insert this into an article and tie it to specific pieces of content? With Microdata it's simple: use an editor button that inserts the tags around the selected content.

Contributor Author

https://jsonld.com/article/

You can paste your entire post in here, and yes it can get really really long

Contributor

That looks counter-productive to me. Would not want to bloat up the page like this.

How about the other question? Microdata is not just for automatically generated content. It can be used inside actual content, e.g. inside articles.

@ReLater
Contributor

ReLater commented Jun 4, 2019

I think this would be a big step forward:

  • as long as the ld+JSON output can be fully customized by custom plugins.
    e.g. use case: Use a global fallback image if no image is found.
    Use a publisher other than the one found.

  • as long as the ld+JSON output can be deactivated globally or per component(?)
    e.g. use case: Custom plugins that use their own "logic" for when, how, what and where to show ld+JSON, and that have been providing their data to search engines for years.
    And that provide more information for systems other than search engines.

Just a suggestion to reduce page output: output only if a search bot is indexing the page. A simple
if ($app->client->robot)
should be sufficient.

@wilsonge
Contributor Author

wilsonge commented Jun 4, 2019

> Perhaps stating the obvious, but surely we can't remove a library without it first being marked as deprecated? Just because we don't use it in core doesn't mean it's not being used.

Yup - I'll do a PR to deprecate in 3.10 if we do. In this case, given it's very outdated, I think it's justified.

@brianteeman
Contributor

> > Perhaps stating the obvious, but surely we can't remove a library without it first being marked as deprecated? Just because we don't use it in core doesn't mean it's not being used.
>
> Yup - I'll do a PR to deprecate in 3.10 if we do. In this case, given it's very outdated, I think it's justified.

There has already been a conversation elsewhere on the tracker that deprecating in 3.10 is not acceptable because 3.10 and 4.0 are released on the same day so there is no notice

@ReLater
Contributor

ReLater commented Jun 4, 2019

> Duplicating the entire article text into JSON?

In the past, articleBody (= the entire article text, but normally with strip_tags applied, newlines removed and so on) inside ld+JSON was required by Google, but they have since dropped the requirement and it's optional now. On the other hand, if a page's markup is not perfect and it's not easy to identify the main content, articleBody can be helpful, also for readers other than search engines. But, yes, it can become an overhead.
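
The clean-up described above could look roughly like this. It is a sketch in plain PHP; `plainArticleBody()` is a hypothetical helper name, not something from this PR or from the Spatie library:

```php
<?php
/**
 * Reduce article HTML to a single-line plain-text string for articleBody.
 */
function plainArticleBody(string $html): string
{
    // Remove the markup, then decode entities so the text reads naturally.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse newlines, tabs and repeated spaces into single spaces.
    // preg_replace() returns null on failure (e.g. invalid UTF-8), so keep the
    // uncollapsed text in that case.
    $collapsed = preg_replace('/\s+/u', ' ', $text);

    return trim($collapsed ?? $text);
}

// Usage: the cleaned string is what would go into the articleBody property.
$article = [
    '@context'    => 'https://schema.org',
    '@type'       => 'Article',
    'headline'    => 'Example headline',
    'articleBody' => plainArticleBody("<p>First paragraph.</p>\n<p>Second   paragraph.</p>"),
];
```

One caveat with this approach: strip_tags() does not insert whitespace where tags were, so adjacent block elements with no whitespace between them will run together in the result.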

See my comment concerning customization possibilities via custom plugins.

@mbabker
Contributor

mbabker commented Jun 4, 2019

> There has already been a conversation elsewhere on the tracker that deprecating in 3.10 is not acceptable because 3.10 and 4.0 are released on the same day so there is no notice

That was about short-notice deprecations without strong reasoning (i.e. security issues). Something being outdated, while not ideal, doesn't necessarily make it a good candidate for a short-notice deprecation. Yes, I'm fully aware there is no documented requirement for the number of versions an API element must be deprecated before it can be removed, but this is about ecosystem relations. I get the intentions here and with the PRs from Hannes that I completely tore apart, and I get the argument a lot of people are going to throw around about support overlap and "oh you have 2 years to get your code updated", but I don't think it's very ecosystem friendly to introduce new deprecations in 3.10 for things to be completely removed in 4.0 (and let's be real here, a lot of the ecosystem doesn't keep up with this repo, so when the stable release comes around, that will be their first alert about the deprecation).

@brianteeman
Contributor

Thinking about this PR over lunch, I realised there is one major issue between this approach and the previous one. Before, you could customise the metadata in your template override as it was all in the tmpl/, but now it's all in the /View so you can't (at least I don't know how to).

@mbabker
Contributor

mbabker commented Jun 4, 2019

> Before, you could customise the metadata in your template override as it was all in the tmpl/, but now it's all in the /View so you can't (at least I don't know how to)

It needs a plugin event after the Schema object is finished being built and before it gets added to the document. Otherwise trying to manipulate the JSON string attached to the document in something like onAfterDispatch defeats the point of using the underlying library if anyone trying to further customize the schema doesn't even have access to it.
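
To illustrate the ordering being asked for, here is a self-contained sketch. `SketchDispatcher`, `renderSchema()` and the event name `onSchemaPrepare` are all hypothetical stand-ins; Joomla's real event API and the Spatie\SchemaOrg objects differ, and a plain array is used here in place of the Schema object:

```php
<?php
// Hypothetical minimal dispatcher, only here to make the sketch runnable.
final class SketchDispatcher
{
    /** @var array<string, callable[]> */
    private array $listeners = [];

    public function on(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    public function dispatch(string $event, array $schema): array
    {
        foreach ($this->listeners[$event] ?? [] as $listener) {
            // Each listener gets the structured schema and returns its modified copy.
            $schema = $listener($schema);
        }

        return $schema;
    }
}

function renderSchema(SketchDispatcher $dispatcher, array $schema): string
{
    // Fire the (hypothetical) event once the schema is complete and before it
    // is serialised, so listeners never have to parse a JSON string.
    $schema = $dispatcher->dispatch('onSchemaPrepare', $schema);

    // JSON_HEX_TAG escapes < and > so the payload cannot close the script tag.
    return '<script type="application/ld+json">'
        . json_encode($schema, JSON_HEX_TAG | JSON_UNESCAPED_UNICODE)
        . '</script>';
}

// Usage: a "plugin" that switches the type for blog-style articles.
$dispatcher = new SketchDispatcher();
$dispatcher->on('onSchemaPrepare', fn (array $schema): array => array_merge($schema, ['@type' => 'BlogPosting']));

echo renderSchema($dispatcher, [
    '@context' => 'https://schema.org',
    '@type'    => 'Article',
    'headline' => 'Example headline',
]);
```

The point of the design is that listeners work on the structured data; hooking in only after serialisation (for example in onAfterDispatch) would leave them with a string to decode and re-encode.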

@wilsonge
Contributor Author

wilsonge commented Jun 4, 2019

> It needs a plugin event after the Schema object is finished being built and before it gets added to the document. Otherwise trying to manipulate the JSON string attached to the document in something like onAfterDispatch defeats the point of using the underlying library if anyone trying to further customize the schema doesn't even have access to it.

Which is why I have Plugin events listed in my todo section :) that's exactly the intention. I also explained why plugin events over template overrides in the Why the Spatie\SchemaOrg library section :)

@richard67
Member

@wilsonge

> I don't think that people are template overriding these things much

That's not true. I had to override it all the time, e.g. to change the item type from Article to BlogPosting for articles which appear in a category blog, or to add the missing "name" property, which the Google validator claims is mandatory.

@joeforjoomla
Contributor

joeforjoomla commented Jun 4, 2019

Although JSON-LD is a somewhat more modern approach than Microdata for dealing with Schema.org, from the perspective of a user with average skills it's probably much easier to override them now than to write a plugin and manage Plugin Events.
So there are definitely pros and cons... maybe more cons?
Maybe a possible solution would be a parameter to switch between Microdata and JSON-LD? Or something to make JSON-LD easily overridable?

@mbabker
Contributor

mbabker commented Jun 4, 2019

> from the perspective of a user with average skills it's probably much easier to override them now than to write a plugin and manage Plugin Events

The admin UI, and the template files as a result, are already overly convoluted. The "right" way to do it, and the most user-friendly one practical, would be to add all this stuff on metadata tabs on all the content types (articles, contacts, categories, etc.). Otherwise you're putting too many conditionals in a default.php template file to decide if something is an Article or BlogPost or some other "root" type. Even with this PHP library's API, having options for the key bits that need flexibility adds a lot of fields to the admin UI.

A plugin has the benefit of having access to all of the data needed (well, let's be real here, thanks to the lack of separation of concerns in the Joomla API a template file does too). It has the ability to programmatically make decisions that cannot be configured in the existing com_content UI (or any other core component). There shouldn't be a fear of using plugins for altering the system behavior, that's exactly how an event driven system should work, except Joomla really isn't an event driven system (the events that exist are really crippled IMO and mostly restricted to MVC contexts, very few of the library APIs have event support and that's a major issue). The only "problem" with them is they require you to be good at Joomla's version of PHP, being able to do everything in template files requires you to mostly understand HTML and to a lesser degree have an understanding of Joomla's version of PHP to decipher the over-engineered template file you're trying to edit because there are way too many PHP statements within them.

@simbus82
Contributor

simbus82 commented Jun 4, 2019

> Although JSON-LD is a somewhat more modern approach than Microdata for dealing with Schema.org, from the perspective of a user with average skills it's probably much easier to override them now than to write a plugin and manage Plugin Events.
> So there are definitely pros and cons... maybe more cons?
> Maybe a possible solution would be a parameter to switch between Microdata and JSON-LD? Or something to make JSON-LD easily overridable?

I hope there is no doubt about the use of only Json-LD. But Microdata mixed with HTML, no thanks!
It would be nice to have the opportunity to decide where and how to activate structured data. For example, then will I be able to do a FAQPage? Or will the superimposed "article" microdata prevent me from adding the structured data I want?
At the moment I do well on J3 with the excellent Tassos plugin (Google Structured Data).
The JCE editor also allows the use of structured data, but uses the logic of microdata mixed with html, which is a horrible and disused thing.
But actually I feel the lack of something at the core level ... well done and configurable.
This PR, however, is a turning point !!! Thank you guys!

@joeforjoomla
Contributor

> At the moment I do well on J3 with the excellent Tassos plugin (Google Structured Data).

So wouldn't it definitely be better to clean all the mixed-in Microdata out of the HTML and leave this feature to third-party extensions that fully cover all schema.org types and configurations?

@simbus82
Contributor

simbus82 commented Jun 4, 2019

> > At the moment I do well on J3 with the excellent Tassos plugin (Google Structured Data).
>
> So wouldn't it definitely be better to clean all the mixed-in Microdata out of the HTML and leave this feature to third-party extensions that fully cover all schema.org types and configurations?

I cannot comment on this. If I could decide I would add these and many other things to the core. But in this way the developer community would die and the main team should follow every evolution working incessantly only to act on the news ... for example just a few weeks ago Google accepted in the SERP the FAQPage, who should work on the structured Joomla data in that case? The main team or third-party developer?
I really can't answer!

@ghost ghost added RFC Request for Comment and removed Request for Comment labels Jun 5, 2019
@AndySDH
Contributor

AndySDH commented Feb 17, 2021

Was this abandoned? I see J4 is still using microdata instead of JSON-LD

@wilsonge
Contributor Author

wilsonge commented Feb 17, 2021

I never got time to do any more work on it, and likely won't before 4.0 ships - just too many things to work on, and my personal life means I have limited time for Joomla right now (I'm buying my first house, which needs quite a lot of builders arranged). So yes, it's not going to hit 4.0 for sure.

@AndySDH
Contributor

AndySDH commented Feb 17, 2021

Congrats on the first house @wilsonge!

I guess in the meantime we'll have to keep using this :) https://www.tassos.gr/joomla-extensions/google-structured-data-markup/

@rdeutz rdeutz changed the title from "[4.0] [RFC] Use schema.org for metadata" to "[4.1] [RFC] Use schema.org for metadata" Mar 15, 2021
@rdeutz rdeutz changed the base branch from 4.0-dev to 4.1-dev March 15, 2021 10:49
@hans2103
Contributor

@wilsonge J4.1 ?
I'd love to help with this implementation. Looks great!


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/25117.

@bembelimen bembelimen marked this pull request as ready for review April 30, 2022 23:18
@bembelimen bembelimen marked this pull request as draft April 30, 2022 23:19
@wilsonge wilsonge closed this Jun 20, 2022
@wilsonge
Contributor Author

I don't have the time right now to keep this going. It's a huge change because it needs extra site metadata (per my comments in the main PR description) that we don't currently have, and without it things can get worse rather than better from an SEO perspective - which is obviously a major upgrade issue. @HLeithner / @bembelimen this might be one to pick up as a GSoC project for v5.

@bembelimen
Contributor

@wilsonge
Contributor Author

Always ahead of me :) Obviously feel free to use this as a starting point

@bembelimen
Contributor

We already do
