Richcards #3287

Merged
merged 13 commits into from
May 8, 2018

@bpedersen2 (Contributor) commented Apr 4, 2018

Instead of using inline microdata, render the metadata as JSON-LD in the page <head>.

The resulting data validates in Google's Structured Data Testing Tool, with only warnings about missing optional entries (mainly "offers").

Fixes: #3269
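For readers unfamiliar with the format, here is a minimal sketch of what such a JSON-LD blob in the <head> can look like. It is not the code from this PR: the helper name render_json_ld_script and all field values are hypothetical. The escaping step mirrors what Jinja's |tojson filter does, so that a title containing </script> cannot terminate the script element early.

```python
import json


def render_json_ld_script(data):
    """Serialize a JSON-LD dict into a <script> element for the page <head>."""
    text = json.dumps(data, sort_keys=True)
    # Escape characters that could close the <script> element early or be
    # misread by an HTML parser; the resulting escapes are still valid JSON.
    text = (text.replace('<', '\\u003c')
                .replace('>', '\\u003e')
                .replace('&', '\\u0026'))
    return '<script type="application/ld+json">' + text + '</script>'


# Hypothetical event; a real page would fill these from the event object.
event_data = {
    '@context': 'http://schema.org',
    '@type': 'Event',
    'name': 'Workshop </script> & "JSON-LD"',
    'url': 'https://indico.example.org/event/1/',
    'startDate': '2018-05-08T16:00:00+02:00',
    'endDate': '2018-05-08T18:00:00+02:00',
    'location': {
        '@type': 'Place',
        'name': 'Main Auditorium',
        'address': 'Example Street 1',
    },
}

print(render_json_ld_script(event_data))
```

Search engines parse the script body as JSON, so the escaped title decodes back to the original text while the HTML parser never sees a premature </script>.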

@bpedersen2 (Contributor Author)

This pull request fixes the metadata problems for all event types and is also safe in case a custom template overrides part of the content layout.

@ThiefMaster (Member) commented Apr 4, 2018

If this is as widely supported as the previous attributes, I really like it. It's certainly much less messy than having attributes spread all over.

What I really dislike, however, is generating JSON with string operations. For example, in many places having a " in a name would break the JSON. It's better to create a Python dict and then use the |tojson filter in Jinja to convert it to JSON. Since meta.html is actually rendered from Python code, you can just create the dict there, pass it to the template, and then convert it to a JSON string in the template. Search for this line to find the place where it's rendered (events/views.py):

meta = render_template('events/meta.html', event=self.event, site_name=site_name)
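To illustrate the reviewer's point, a small sketch in plain Python, with the stdlib json module standing in for Jinja's |tojson and a made-up title: building JSON by concatenation breaks on a double quote, while serializing a dict escapes it.

```python
import json

title = 'The "Higgs" Workshop'  # made-up title containing double quotes

# String building: the quote inside the title ends the JSON string too early.
broken = '{"@type": "Event", "name": "' + title + '"}'

# Dict + serializer: the quotes are escaped automatically.
safe = json.dumps({'@type': 'Event', 'name': title})


def is_valid_json(text):
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


print(is_valid_json(broken))     # False
print(json.loads(safe)['name'])  # The "Higgs" Workshop
```

In the template, the same idea would look roughly like {{ json_ld|tojson }}, where json_ld is a hypothetical name for the dict passed to render_template.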

@@ -31,7 +31,7 @@ <h1>
<img src="{{ event.logo_url }}" alt="{{ event.title }}" border="0" class="confLogo">
</div>
{% endif %}
<span itemprop="title">{{ event.title }}</span>
Member

>>> Event.query.filter(db.cast(Event.stylesheet_metadata, JSONB) != 'null', Event.stylesheet.contains('itemprop')).count()
34

All of these events use CSS selectors matching this particular element to hide or realign it. It's probably better to keep this span here, even if it becomes redundant...

Contributor Author

Done

@bpedersen2 (Contributor Author)

Rewritten to prepare the dict first.

At least Google states that the JSON-LD format is preferred over inline microdata.

"@type": "Place",
"name": self.event.venue_name or 'No location set',
"address": self.event.address or 'No address set'
},
Member

you like this alignment, don't you :D

Contributor Author

eclipse...

Member

Ouch ;) I didn't realize anyone still uses Eclipse/PyDev. You might want to look at PyCharm. But I guess Eclipse/PyDev also has a setting for the brace alignment.

"performers": self._getJsonLDPerformers(self.event.person_links)
}
if self.event.has_logo:
lddata["image"] = url_for('event_images.logo_display', self.event, slug=self.event.logo_metadata['hash'],
Member

Maybe you could add an external_logo_url property to Event (right below the logo_url property) and use that here?
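A sketch of the suggested split, assuming a toy Event class: the relative logo_url stays as it is, and external_logo_url returns an absolute URL, which is what a consumer outside the site (such as a search engine reading the JSON-LD "image") needs. In Indico itself this would presumably wrap the url_for(..., _external=True) call quoted above; here urllib.parse.urljoin, the BASE_URL constant, and the URL pattern are hypothetical stand-ins.

```python
from urllib.parse import urljoin

BASE_URL = 'https://indico.example.org/'  # hypothetical site root


class Event:
    """Toy stand-in for the Indico Event model (illustration only)."""

    def __init__(self, event_id, logo_hash=None):
        self.id = event_id
        self.logo_hash = logo_hash

    @property
    def has_logo(self):
        return self.logo_hash is not None

    @property
    def logo_url(self):
        # Relative URL: fine for <img> tags on the site's own pages.
        return '/event/{}/logo-{}.png'.format(self.id, self.logo_hash)

    @property
    def external_logo_url(self):
        # Absolute URL: needed when the link is consumed elsewhere,
        # e.g. as the JSON-LD "image" read by a search engine.
        return urljoin(BASE_URL, self.logo_url)


event = Event(42, logo_hash='abc123')
if event.has_logo:
    print(event.external_logo_url)
```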

@@ -120,9 +120,43 @@ def _getHeader(self):
def getJSFiles(self):
return WPDecorated.getJSFiles(self) + self._asset_env['modules_event_display_js'].urls()

def _getJsonLDData(self):
Member

_get_json_ld_data (we don't use camelcase in new code)

lddata = {
Member

ld_data or data (lddata looks a bit weird)

"@context": "http://schema.org",
Member

usually we use single quotes everywhere

"@type": "Event",
"url": self.event.external_url,
"name": self.event.title,
"startDate": self.event.start_dt.isoformat(),
Member

start_dt (UTC) or start_dt_local (event TZ)? Not sure what the spec requires

Contributor Author

The spec wants ISO 8601 (isoformat), see http://schema.org/DateTime. But the _dt_local variant seems OK too.

Member

Sure, isoformat makes sense; the comment was just about whether to use UTC or local time.
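The difference being discussed can be seen with Python's datetime, using a fixed UTC+2 offset as a stand-in for a hypothetical event timezone: both values are valid ISO 8601 strings for the same instant, and the local one keeps the wall-clock time shown on the event page. Either way the datetime should be timezone-aware, since isoformat() on a naive datetime omits the offset.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical event timezone: a fixed UTC+2 offset (e.g. CEST in summer).
event_tz = timezone(timedelta(hours=2))

start_dt = datetime(2018, 5, 8, 14, 0, tzinfo=timezone.utc)  # stored in UTC
start_dt_local = start_dt.astimezone(event_tz)                # event-local view
naive = start_dt.replace(tzinfo=None)                         # no timezone info

print(start_dt.isoformat())        # 2018-05-08T14:00:00+00:00
print(start_dt_local.isoformat())  # 2018-05-08T16:00:00+02:00
print(start_dt == start_dt_local)  # True: same instant
print(naive.isoformat())           # 2018-05-08T14:00:00 (offset lost)
```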

"endDate": self.event.end_dt.isoformat(),
"location": {
"@type": "Place",
"name": self.event.venue_name or 'No location set',
Member

Is this required? Otherwise we could maybe just omit the location if there's no location information, and/or use an empty string if only one of the two pieces of data is empty. IMO an empty string is more meaningful than a custom string where only a human knows that it means "not set/known".

Contributor Author

At least the Google tester flags this as an error, and an empty string is not allowed either.

_external=True)
return lddata

def _getJsonLDPerformers(self, chairs):
Member

no camelcase please ;)

Member

I would change this method to act on a single person, and call it like this above:

'performers': map(self._get_json_ld_performer, self.event.person_links)
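A sketch of the one-person-at-a-time serializer, using a hypothetical PersonLink namedtuple in place of Indico's person links; the function name follows serialize_person_for_json_ld, which appears later in this PR, but the body here is only illustrative. The map() call above relies on Python 2 semantics (which Indico used at the time), where map() returns a list; on Python 3 it returns an iterator that json.dumps rejects, so a list comprehension is the portable spelling.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for an event person link (chairperson/speaker).
PersonLink = namedtuple('PersonLink', ['full_name', 'affiliation'])


def serialize_person_for_json_ld(person):
    """Serialize a single person; callers apply it to each person link."""
    return {
        '@type': 'Person',
        'name': person.full_name,
        'affiliation': {
            '@type': 'Organization',
            'name': person.affiliation,
        },
    }


person_links = [PersonLink('Jane Doe', 'CERN'),
                PersonLink('John Smith', 'Example University')]

# A list comprehension gives a real list on both Python 2 and 3;
# json.dumps(map(...)) raises TypeError on Python 3.
performers = [serialize_person_for_json_ld(p) for p in person_links]
print(json.dumps(performers, indent=2))
```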

"affiliation": {
"@type": "Organization",
"name": chair.affiliation
}
Member

} alignment - like above

'affiliation': {
'@type': 'Organization',
'name': chair.affiliation
}
Member

return {'@type': 'Person',
        'name': chair.display_full_name,
        'affiliation': {'@type': 'Organization',
                        'name': chair.affiliation}}

or

return {
    '@type': 'Person',
    'name': chair.display_full_name,
    'affiliation': {
        '@type': 'Organization',
        'name': chair.affiliation
    }
}

Contributor Author

Done

@@ -120,9 +120,40 @@ def _getHeader(self):
def getJSFiles(self):
return WPDecorated.getJSFiles(self) + self._asset_env['modules_event_display_js'].urls()

def _get_json_lddata(self):
Member

_get_json_ld_data

data['image'] = self.event.external_logo_url
return data

def _get_json_ldperformer(self, chair):
Member

_get_json_ld_performer

Contributor Author

Done

@bpedersen2 force-pushed the richcards branch 3 times, most recently from dc4be74 to 19748bf on April 6, 2018 06:45
@bpedersen2 (Contributor Author)

On our production instance with these changes applied, Google now renders the additional information on the search results page (at least for all pages re-crawled after the patch was applied).

'description': strip_tags(event.description),
}
if event.person_links:
data['performer'] = map(_get_json_ld_performer, event.person_links)
@ThiefMaster (Member), Apr 18, 2018

This breaks the category page; I think you want to call serialize_person_for_json_ld here.

@@ -338,11 +339,13 @@ def _process(self):
'past_event_count': past_event_count,
'show_past_events': show_past_events,
'past_threshold': past_threshold.strftime(threshold_format),
'json_ld': serialize_events_for_json_ld(events + future_events),
Member

this causes query spam (at least one extra query per event)
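The "query spam" here is the classic N+1 pattern: one query for the event list, plus at least one more per event to load related data such as speakers. A self-contained illustration with sqlite3 and a made-up two-table schema (not Indico's models) compares per-event queries with a single batched IN query. With SQLAlchemy, the usual remedy is an eager-loading option such as joinedload or subqueryload; this sketch is only a generic illustration of the cost.

```python
import sqlite3

# Made-up schema: events and the people (e.g. speakers) linked to them.
conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE persons (event_id INTEGER, name TEXT);
    INSERT INTO events VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO persons VALUES (1, 'Jane'), (2, 'John'), (3, 'Ann'), (3, 'Bob');
''')

query_count = 0


def run(sql, params=()):
    """Execute one query and count it, like a query logger would."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# N+1 pattern: one query for the list, then one more query per event.
query_count = 0
naive = {}
for (event_id,) in run('SELECT id FROM events ORDER BY id'):
    rows = run('SELECT name FROM persons WHERE event_id = ? ORDER BY name', (event_id,))
    naive[event_id] = [name for (name,) in rows]
naive_queries = query_count  # 1 + number of events

# Batched pattern: one extra query for all listed events, whatever their number.
query_count = 0
event_ids = [event_id for (event_id,) in run('SELECT id FROM events ORDER BY id')]
placeholders = ','.join('?' * len(event_ids))  # assumes at least one event
batched = {event_id: [] for event_id in event_ids}
sql = 'SELECT event_id, name FROM persons WHERE event_id IN ({}) ORDER BY name'
for event_id, name in run(sql.format(placeholders), event_ids):
    batched[event_id].append(name)
batched_queries = query_count  # always 2

print(naive_queries, batched_queries)  # 4 2
```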

@ThiefMaster (Member) commented Apr 18, 2018

I like the PR in general (and was about to merge it), but I'm very unsure about adding the JSON-LD to the category page as well.

Besides the query spam (which I'll take care of fixing if we decide to keep it), what's the benefit of including this information there? Do you have an example of how Google etc. display this for the category page, especially if there are many future events?

@bpedersen2 (Contributor Author) commented Apr 19, 2018

Data on the category page:
I added this because the data was already included in the page before this change. Since Google does not crawl pages very frequently, enough lookahead is necessary.

As for the rendering of the data in Google:
https://www.google.com/search?q=cern+conference+category&q=cern+conference+category

def serialize_person_for_json_ld(person):
return {
'@type': 'Person',
'name': person.display_full_name,
Member

This should use full_name: display_full_name takes user preferences into account (e.g. whether the first or last name comes first), which aren't relevant anyway since search engines won't be logged in.

@ThiefMaster (Member)

> I added this as the data was already included in the page before this change.

Ah, I hadn't noticed it since you didn't remove the old attributes.

I don't like the idea of having to query all the future events just to include them here... Also, we cannot include stuff like the speaker on the category listing page: you can have events there which are protected and not accessible by you (we do not want to do access checks for each displayed event, since these checks can be expensive, at least when you are logged in, if they result in group membership lookups).

So I'd keep only the same basic metadata we have right now: date, URL, title.
That way you also avoid triggering extra queries to get e.g. the speakers.
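Following this suggestion, a category-page serializer could be limited to fields the listing already has. The function name serialize_event_for_category_json_ld and the EventRow namedtuple are hypothetical; the point of the sketch is that nothing in it needs extra queries or per-event access checks.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Hypothetical row holding only what the category listing already loads.
EventRow = namedtuple('EventRow', ['title', 'url', 'start_dt'])


def serialize_event_for_category_json_ld(event):
    """Date, URL and title only: no speakers, nothing needing access checks."""
    return {
        '@context': 'http://schema.org',
        '@type': 'Event',
        'name': event.title,
        'url': event.url,
        'startDate': event.start_dt.isoformat(),
    }


row = EventRow('Example Workshop', 'https://indico.example.org/event/1/',
               datetime(2018, 5, 8, 14, 0, tzinfo=timezone.utc))
print(serialize_event_for_category_json_ld(row))
```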

@bpedersen2 (Contributor Author)

Hmm, since the main consumers of this data are search engines, would it make sense to omit it when a user is logged in? Google and co. are typically not logged in.

@bpedersen2 (Contributor Author)

Search engines are also the reason I included future events in the first place: in sparsely populated categories they would not see much, and by the time they do see an event, all registration deadlines have passed...
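The lookahead argument can be sketched as a simple date-window filter: keep events that have not ended yet and that start within some horizon, so a crawler visiting infrequently still sees upcoming events early. The 90-day default and the dict-based event records are assumptions for illustration, not values from the PR.

```python
from datetime import datetime, timedelta


def select_events_for_json_ld(events, now, lookahead=timedelta(days=90)):
    """Keep events that have not ended and start within the lookahead window.

    `events` is a list of dicts with 'start' and 'end' datetimes; the
    90-day default is an arbitrary illustration value.
    """
    horizon = now + lookahead
    return [e for e in events if e['end'] >= now and e['start'] <= horizon]


now = datetime(2018, 5, 8)
events = [
    {'title': 'past', 'start': datetime(2018, 1, 10), 'end': datetime(2018, 1, 12)},
    {'title': 'soon', 'start': datetime(2018, 6, 4), 'end': datetime(2018, 6, 8)},
    {'title': 'far', 'start': datetime(2019, 3, 1), 'end': datetime(2019, 3, 2)},
]
print([e['title'] for e in select_events_for_json_ld(events, now)])  # ['soon']
```

A longer window gives crawlers more notice but, as the reviewer points out, means loading more events for every category page view.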

bpedersen2 and others added 6 commits May 8, 2018 16:13
Instead of sprinkling the information across the page,
put everything into a json-ld blob.
This is now rendered as json-ld instead.
The data were not correct anyway and are now replaced
by the json-ld formatted data in the head.
bpedersen2 and others added 5 commits May 8, 2018 16:14
It's not visible from the category page so it should not be included in
the category json-ld as this is accessible without access to the event
itself.
It's mainly for search engines, and these are never logged in.
@ThiefMaster merged commit 3c02eb5 into indico:master on May 8, 2018
@bpedersen2 deleted the richcards branch on May 9, 2018 11:54