Samuel Croset edited this page Sep 7, 2012 · 37 revisions

Schema is a module for the Play! framework (1.2.X) which helps you create full semantic stack web apps. The model and view layers of your Play! app are picked from a series of template classes following the schema.org structure. Schema helps you provide next-generation content for search engines and program your application in the scope of the Web.

Release date: September 2012

Version: 1.0

Author: Samuel Croset

Documentation: Wiki

Discussion: Play! group with [schema-1.0] in subject line.

Website | Bug reports | Feedback | Demo | Tutorial

"full semantic stack web app"?

objects everywhere

The way we search for things on the Web is about to change dramatically in the coming years. We are entering an era where published information will be much more structured than before. To understand it better, let's look at the current semantic stacks.

The application semantics

Nowadays, when you build a web application (for instance with Play! or any MVC framework), you first define a solid structure for your model layer. The model helps you interact with, query and extend your application. The content of the model is manipulated by a controller and then rendered in a view. The view is an HTML page template that is used to publish information on the Web. In this context, you have strict semantics (the model) only within the scope of your application.

The web page semantics

The view is really important, because it is the direct exposure of your data to the WWW jungle. The view will be parsed and indexed by search engine companies like Google or Yahoo. Then normal users (your parents, your friends) will use these very search engines to look for relevant information about whatever they are interested in. Search engines will try to give the most relevant answers, but they still fail hard on trivial tasks.

This problem comes mostly from the fact that most of the information on the Web is in free-text form, meaning it's just plain text. And raw text is super ambiguous. For instance, a string of characters such as Taj Mahal can refer to a place, a restaurant or a casino. How can Google tell which one you want based on the query Taj Mahal? Pretty much just by looking at the popularity and trust of web pages containing the queried string (PageRank). Plain text also gets in the way of smarter uses of your data. For instance, it is almost impossible to convert an email containing your dentist appointment into a calendar entry in just one click, as natural language is often too complex to be parsed correctly.

When you publish a web page (with Play!), you throw away the time and effort you put into coming up with a nice structure in the first place (your model). Your web pages are pretty much a soup of HTML elements for display, with little meaning for search engines. Even if you have a class Restaurant in your model layer, the robot crawling the web page cannot see that you are actually referring to the concept of Restaurant. When you render this Restaurant object on a page, it will for instance have an <h1> tag with the name of the place and a <p> containing the data about it (for instance the menu), but there is no <restaurant> tag to explicitly tell search engines that this page is about a Restaurant, and that they could use this object as such to help users searching for information about restaurants.
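The loss of structure described above can be sketched in a few lines of plain Java. This is only an illustration: the `Restaurant` class, the `render` method and the sample data are hypothetical and are not part of Play! or of the schema module. HTML escaping is left out for brevity.

```java
class PlainRestaurantView {

    // Hypothetical model class, for illustration only.
    static class Restaurant {
        final String name;
        final String menu;

        Restaurant(String name, String menu) {
            this.name = name;
            this.menu = menu;
        }
    }

    // Renders the object the way a typical template would: display tags only.
    // (No HTML escaping here, to keep the sketch short.)
    static String render(Restaurant r) {
        return "<h1>" + r.name + "</h1>\n<p>" + r.menu + "</p>";
    }

    public static void main(String[] args) {
        Restaurant r = new Restaurant("Taj Mahal", "Curry, naan, lassi");
        String html = render(r);
        System.out.println(html);
        // The type name never reaches the page, so a crawler cannot see it.
        System.out.println(html.contains("Restaurant")); // prints "false"
    }
}
```

The model knows it is dealing with a `Restaurant`, but the generated markup only carries a heading and a paragraph.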

Everyone would benefit from a clearer and more visible structure: search engines, because they could make better use of your data; users, because they could find your information more easily; and you, because you could drive more traffic and get your business noticed and integrated more easily.

Full stack semantics

Here by full stack semantics I mean the following idea: a common data structure, shared at the scope of the Web, that you re-use to implement your model and view layers (and CSS and URL patterns). The shared structure is fairly simple to implement and improves interoperability between web developers, search engines and Web users.

scopes and the Internet

The schema module helps you build such a framework and embed your application in the scope of the Web.

Our good old Web of ambiguity is turning into a Web of Objects, thanks to initiatives such as the Google Knowledge Graph or the Facebook Graph. Go to Google and type "things to do in paris" or "taj mahal". You will see real-life objects appearing, such as pictures of monuments.

Schema.org

In order to improve the publication of objects on the Web, the major search engine companies sat down together and came up with a series of universal classes, schema.org. These classes describe the common things one encounters on the Internet, as well as the relationships between them. The schema.org classes are not Java classes; they are just a taxonomy of common concepts you can use to annotate web pages. These classes are powerful because they provide a common interface between all the players on the Web: users, developers and search engine providers. The schema module will help you implement these classes using Play! and provide search engines with objects that they are likely to re-use.
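To make "annotating a web page" concrete, here is a minimal sketch in plain Java that renders a restaurant with schema.org microdata (the standard `itemscope`, `itemtype` and `itemprop` HTML attributes). The `Restaurant` class, the `render` method and the sample data are hypothetical, written by hand for illustration; this is not the schema module's API, which is covered in the tutorial. HTML escaping is again left out.

```java
class MicrodataRestaurantView {

    // Hypothetical model class, for illustration only.
    static class Restaurant {
        final String name;
        final String description;

        Restaurant(String name, String description) {
            this.name = name;
            this.description = description;
        }
    }

    // Same display tags as a plain view, plus microdata attributes that
    // name the schema.org type and two of its properties.
    static String render(Restaurant r) {
        return "<div itemscope itemtype=\"http://schema.org/Restaurant\">\n"
             + "  <h1 itemprop=\"name\">" + r.name + "</h1>\n"
             + "  <p itemprop=\"description\">" + r.description + "</p>\n"
             + "</div>";
    }

    public static void main(String[] args) {
        Restaurant r = new Restaurant("Taj Mahal", "Indian cuisine in the city centre");
        System.out.println(render(r));
    }
}
```

The page looks the same to a human visitor, but a crawler can now read that the block describes a schema.org Restaurant with a name and a description.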

OK, so much for the introduction. I invite you to read the short but comprehensive documentation provided by schema.org before moving on to the example application, where I describe how to work with the schema module.

Start the Tutorial