Home
Schema is a module for the Play! framework (1.2.X) that helps you create full semantic stack web apps. The model and view layers of your Play! app are picked from a series of template classes following the schema.org structure. Schema helps you provide next-generation content for search engines and program your application in the scope of the Web.
Release date: September 2012
Version: 1.0
Author: Samuel Croset
Documentation: Wiki
Discussion: Play! group, with [schema-1.0] in the subject line.
Website | Bug reports | Feedback | Demo | Tutorial
The way we search for things on the Web is about to change dramatically in the coming years. We are entering an era where published information will be much more structured than before. To understand this better, let's look at the current semantic stacks.
Nowadays, when you build a web application (for instance with Play! or any MVC framework), you first define a solid structure for your model layer. The model helps you interact with, query and extend your application. The content of the model is manipulated by a controller and then rendered in a view. The view is an HTML page template that is used to publish information on the Web. In this context, you have strict semantics (the model) only within the scope of your application.
The view is really important because it is the direct exposure of your data to the WWW jungle. The view will be parsed and indexed by search engine companies such as Google or Yahoo. Ordinary users (your parents, your friends) will then use these very search engines to look for relevant information about whatever they are interested in. Search engines will try to give the most relevant answers, but they still fail hard on trivial tasks.
This problem comes mostly from the fact that most of the information on the Web is free text, meaning it is just plain text. And raw text is highly ambiguous. For instance, a string of characters such as Taj Mahal can refer to a place, a restaurant or a casino. How can Google tell which one you want based on the query Taj Mahal? Pretty much just by looking at the popularity and trust of the web pages containing the queried string (PageRank). Plain text also hinders smarter uses of your data. For instance, it is almost impossible to convert an email containing your dentist appointment into a calendar entry in just one click, as natural language is often too complex to be parsed correctly.
When you publish a web page (with Play!), you throw away the time and effort you put into coming up with a nice structure in the first place (your model). Your web pages are pretty much a soup of HTML elements for display, with little meaning for search engines. Even if you have a class Restaurant in your model layer, the robot crawling the web page cannot see that you are actually referring to the concept of Restaurant. When you render this Restaurant object on a page, it will for instance have an <h1> tag with the name of the place and a <p> containing the data about it (for instance the menu), but there is no <restaurant> tag to explicitly tell search engines that this page is about a Restaurant, and that they could use this object as such to help users searching for information about Restaurants.
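To make the Restaurant example concrete, here is a minimal sketch in plain Java. It is not the schema module's API: the Restaurant class, its two fields and the render methods are hypothetical names made up for illustration. The first method produces the anonymous soup of tags described above. The second produces the same markup annotated with HTML microdata attributes (itemscope, itemtype, itemprop) pointing at the schema.org Restaurant type, which is one way a crawler can be told what the page is about. I assume here that the free-text data goes into the generic description property.

```java
// Hypothetical model class; the field names are assumptions for illustration.
final class Restaurant {
    final String name;
    final String description;

    Restaurant(String name, String description) {
        this.name = name;
        this.description = description;
    }
}

// Hypothetical renderer, standing in for what a view template would output.
final class RestaurantMicrodataSketch {

    // Plain rendering: the Restaurant type is lost, only display markup remains.
    static String renderPlain(Restaurant r) {
        return "<div>\n"
             + "  <h1>" + escape(r.name) + "</h1>\n"
             + "  <p>" + escape(r.description) + "</p>\n"
             + "</div>";
    }

    // Annotated rendering: microdata attributes carry the schema.org type
    // and the property each element holds.
    static String renderAnnotated(Restaurant r) {
        return "<div itemscope itemtype=\"http://schema.org/Restaurant\">\n"
             + "  <h1 itemprop=\"name\">" + escape(r.name) + "</h1>\n"
             + "  <p itemprop=\"description\">" + escape(r.description) + "</p>\n"
             + "</div>";
    }

    // Minimal HTML escaping; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Restaurant r = new Restaurant("Taj Mahal", "Curry & naan, open daily");
        System.out.println(renderPlain(r));
        System.out.println(renderAnnotated(r));
    }
}
```

Both outputs look identical in a browser. Only the second one tells a robot that the h1 holds the name of a Restaurant rather than the name of a monument or a casino.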
Everyone would benefit from a clearer, more visible structure: search engines, because they could make better use of your data; users, because they could find your information more easily; and you, because you could drive more traffic and get your business noticed and integrated more easily.
By full stack semantics I mean the following idea: a common data structure, shared at the scope of the Web, that you re-use to implement your model and view layers (as well as your CSS and URL patterns). The shared structure is fairly simple to implement and improves interoperability between web developers, search engines and Web users.
The schema module helps you build such a framework and embed your application in the scope of the Web.
Our good old Web of ambiguity is turning into a Web of Objects, thanks to initiatives such as the Google Knowledge Graph or the Facebook Graph. Go to Google and type "things to do in paris" or "taj mahal". You will see real-life objects appearing, such as pictures of monuments.
In order to improve the publication of objects on the Web, the major search engine companies sat down together and came up with a series of universal classes, schema.org. These classes describe the common things one encounters on the Internet, as well as the relationships between them. The schema.org classes are not Java classes; they are just a taxonomy of common concepts you can use to annotate web pages. These classes are powerful because they provide a common interface between all the players on the Web: users, developers and search engine providers. The schema module will help you implement these classes using Play! and provide search engines with objects that they are likely to re-use.
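To give a feel for what "a taxonomy of common concepts" means, here is a small sketch in plain Java. It is an illustration only, not code from the schema module: the class name, the map and the two helper methods are hypothetical. It encodes a single branch of the schema.org hierarchy (Thing, Place, LocalBusiness, FoodEstablishment, Restaurant) as child-to-parent links and walks it upwards. The real taxonomy is much larger, and some types have more than one parent, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: one branch of the schema.org taxonomy as parent links.
final class SchemaTaxonomySketch {

    // child type -> parent type; "Thing" is the root and has no entry.
    private static final Map<String, String> PARENT = new HashMap<String, String>();
    static {
        PARENT.put("Place", "Thing");
        PARENT.put("LocalBusiness", "Place");
        PARENT.put("FoodEstablishment", "LocalBusiness");
        PARENT.put("Restaurant", "FoodEstablishment");
    }

    // Returns the type followed by its ancestors, most specific first.
    // A type unknown to the map yields a list containing only itself.
    static List<String> lineage(String type) {
        List<String> path = new ArrayList<String>();
        for (String t = type; t != null; t = PARENT.get(t)) {
            path.add(t);
        }
        return path;
    }

    // True if 'ancestor' is the type itself or appears above it in the branch.
    static boolean isA(String type, String ancestor) {
        return lineage(type).contains(ancestor);
    }

    public static void main(String[] args) {
        // prints [Restaurant, FoodEstablishment, LocalBusiness, Place, Thing]
        System.out.println(lineage("Restaurant"));
        System.out.println(isA("Restaurant", "Place")); // prints true
    }
}
```

The point of the hierarchy is that a Restaurant inherits whatever is defined higher up: anything a search engine knows how to do with a Place (show it on a map, for instance) it can also do with a Restaurant.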
OK, so much for the introduction. I invite you to read the short but comprehensive documentation provided by schema.org before moving on to the example application, where I describe how to work with the schema module.