Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Richcards #3287

Merged
merged 13 commits into from May 8, 2018
Merged

Richcards #3287

merged 13 commits into from May 8, 2018

Conversation

@bpedersen2
Copy link
Contributor

@bpedersen2 bpedersen2 commented Apr 4, 2018

Instead of using inline microdata, render the metadata as json-ld in the header.

The resulting data validates in the Google data testing tool with just warnings about missing option entries (mainly: "offers")

Fixes: #3269

@bpedersen2 bpedersen2 force-pushed the bpedersen2:richcards branch from 1958ae3 to 7adefe8 Apr 4, 2018
@bpedersen2
Copy link
Contributor Author

@bpedersen2 bpedersen2 commented Apr 4, 2018

This pull request fixes the meta data problems for all event types and is also safe in case a custom template overrides part of the content layout.

@ThiefMaster
Copy link
Member

@ThiefMaster ThiefMaster commented Apr 4, 2018

If this is as widely supported as the previous attributes I really like it. Much less messy than having attributes spread all over for sure.

What I really dislike however is generating JSON with string operations. For example, in many places having a " in a name would break the JSON. It's better to create a Python dict and then use the |tojson filter in Jinja to convert it to JSON. since meta.html is actually rendered from Python code, you can just create the dict there, pass it to the template, and then convert it to a JSON string in the template. Search for this line to find the place where it's rendered (events/views.py):

meta = render_template('events/meta.html', event=self.event, site_name=site_name)
@@ -31,7 +31,7 @@ <h1>
<img src="{{ event.logo_url }}" alt="{{ event.title }}" border="0" class="confLogo">
</div>
{% endif %}
<span itemprop="title">{{ event.title }}</span>

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

>>> Event.query.filter(db.cast(Event.stylesheet_metadata, JSONB) != 'null', Event.stylesheet.contains('itemprop')).count()
34

All these events use CSS selectors matching this particular item to hide or realign it. Probably better to keep this span here even if it'll be redundant...

This comment has been minimized.

@bpedersen2

bpedersen2 Apr 4, 2018
Author Contributor

Done

@bpedersen2 bpedersen2 force-pushed the bpedersen2:richcards branch from 7adefe8 to eca7a59 Apr 4, 2018
@bpedersen2
Copy link
Contributor Author

@bpedersen2 bpedersen2 commented Apr 4, 2018

Rewritten to prepare the dict first.

At least google states that the json-ld format is preferred over inline microdata.

"@type": "Place",
"name": self.event.venue_name or 'No location set',
"address": self.event.address or 'No address set'
},

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

you like this alignment, don't you :D

This comment has been minimized.

@bpedersen2

bpedersen2 Apr 4, 2018
Author Contributor

eclipse...

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

ouch ;) I didn't realize anyone still uses eclipse/pydev. Might want to look at PyCharm. But I guess eclipse/pydev also has a setting for the brace alignment

"performers": self._getJsonLDPerformers(self.event.person_links)
}
if self.event.has_logo:
lddata["image"] = url_for('event_images.logo_display', self.event, slug=self.event.logo_metadata['hash'],

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

Maybe you could add a external_logo_url property (right below the logo_url) property to Event and use that here?

@@ -120,9 +120,43 @@ def _getHeader(self):
def getJSFiles(self):
return WPDecorated.getJSFiles(self) + self._asset_env['modules_event_display_js'].urls()

def _getJsonLDData(self):

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

_get_json_ld_data (we don't use camelcase in new code)

@@ -120,9 +120,43 @@ def _getHeader(self):
def getJSFiles(self):
return WPDecorated.getJSFiles(self) + self._asset_env['modules_event_display_js'].urls()

def _getJsonLDData(self):
lddata = {

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

ld_data or data (lddata looks a bit weird)

@@ -120,9 +120,43 @@ def _getHeader(self):
def getJSFiles(self):
return WPDecorated.getJSFiles(self) + self._asset_env['modules_event_display_js'].urls()

def _getJsonLDData(self):
lddata = {
"@context": "http://schema.org",

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

usually we use single quotes everywhere

"@type": "Event",
"url": self.event.external_url,
"name": self.event.title,
"startDate": self.event.start_dt.isoformat(),

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

start_dt (UTC) or start_dt_local (event TZ)? Not sure what the spec requires

This comment has been minimized.

@bpedersen2

bpedersen2 Apr 4, 2018
Author Contributor

The spec wants isoformat, see http://schema.org/DateTime. But dt_local seems ok.

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

sure, isoformat makes sense, the comment was just about whether to use UTC or local

"endDate": self.event.end_dt.isoformat(),
"location": {
"@type": "Place",
"name": self.event.venue_name or 'No location set',

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

Is this required? Otherwise we could maybe just omit the location if there's no location information and/or use an empty string in case of just one of the two pieces of data being empty. IMO an empty string is more meaningful than a custom string where only a human knows that it means "not set/known"

This comment has been minimized.

@bpedersen2

bpedersen2 Apr 4, 2018
Author Contributor

At least the google tester flags this a errors, and an empty string is also not allowed.

_external=True)
return lddata

def _getJsonLDPerformers(self, chairs):

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

no camelcase please ;)

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

I would change this method to act on a single person, and call it like this above:

'performers': map(self._get_json_ld_performer, self.event.person_links)
"affiliation": {
"@type": "Organization",
"name": chair.affiliation
}

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

} alignment - like above

@bpedersen2 bpedersen2 force-pushed the bpedersen2:richcards branch from eca7a59 to 2859539 Apr 4, 2018
'affiliation': {
'@type': 'Organization',
'name': chair.affiliation
}

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

return {'@type': 'Person',
        'name': chair.display_full_name,
        'affiliation': {'@type': 'Organization',
                        'name': chair.affiliation}}

or

return {
    '@type': 'Person',
    'name': chair.display_full_name,
    'affiliation': {
        '@type': 'Organization',
        'name': chair.affiliation
    }
}

This comment has been minimized.

@bpedersen2

bpedersen2 Apr 4, 2018
Author Contributor

Done

@@ -120,9 +120,40 @@ def _getHeader(self):
def getJSFiles(self):
return WPDecorated.getJSFiles(self) + self._asset_env['modules_event_display_js'].urls()

def _get_json_lddata(self):

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

_get_json_ld_data

data['image'] = self.event.external_logo_url
return data

def _get_json_ldperformer(self, chair):

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 4, 2018
Member

_get_json_ld_performer

This comment has been minimized.

@bpedersen2

bpedersen2 Apr 4, 2018
Author Contributor

Done

@bpedersen2 bpedersen2 force-pushed the bpedersen2:richcards branch 3 times, most recently from dc4be74 to 19748bf Apr 4, 2018
@bpedersen2
Copy link
Contributor Author

@bpedersen2 bpedersen2 commented Apr 6, 2018

On our production instance wiht these changes applied, google now renders the additional informations on the search results page (at least for all pages re-crawled after application of this patch)

@bpedersen2 bpedersen2 force-pushed the bpedersen2:richcards branch from 19748bf to 8f237de Apr 10, 2018
'description': strip_tags(event.description),
}
if event.person_links:
data['performer'] = map(_get_json_ld_performer, event.person_links)

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 18, 2018
Member

This breaks the category page, I think you want to call serialize_person_for_json_ld here.

@@ -338,11 +339,13 @@ def _process(self):
'past_event_count': past_event_count,
'show_past_events': show_past_events,
'past_threshold': past_threshold.strftime(threshold_format),
'json_ld': serialize_events_for_json_ld(events + future_events),

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 18, 2018
Member

this causes query spam (at least one extra query per event)

@ThiefMaster
Copy link
Member

@ThiefMaster ThiefMaster commented Apr 18, 2018

I like the PR in general (and was about to merge it), but I'm very unsure of adding the json-ld to the category page as well.

Besides the query spam (which I'll take care of fixing if we decide to keep it): What's the benefit of including this information there? Do you have an example how Google etc. display this for the category page? Especially if there are many future events...

@ThiefMaster ThiefMaster force-pushed the bpedersen2:richcards branch from 8f237de to c670e27 Apr 18, 2018
@bpedersen2
Copy link
Contributor Author

@bpedersen2 bpedersen2 commented Apr 19, 2018

Data on category page:
I added this as the data was already included in the page before this change. As google will crawl pages not too frequently, enough lookahead is necessary.

As for the rendering of the data in google:
https://www.google.com/search?q=cern+conference+category&q=cern+conference+category

def serialize_person_for_json_ld(person):
return {
'@type': 'Person',
'name': person.display_full_name,

This comment has been minimized.

@ThiefMaster

ThiefMaster Apr 19, 2018
Member

This should use full_name (display_full_name takes user preferences into account (whether the first or last name comes first etc) which aren't relevant anyway since search engines won't be logged in)

@ThiefMaster
Copy link
Member

@ThiefMaster ThiefMaster commented Apr 19, 2018

I added this as the data was already included in the page before this change.

Ah, hadn't noticed it since you didn't remove the old attributes.

I don't like the idea of having to query all the future events just to include them here... Also, we cannot include stuff like the speaker on the category listing page - you can have events there which are protected and not accessible by you (we do not want to do access checks for each displayed events, since at least when you are logged in these checks can be expensive when they result in group membership lookups).

So I'd keep only the same basic metadata we have right now: date, url, title
Like this you also avoid triggering extra queries to get e.g. the speakers.

@bpedersen2 bpedersen2 force-pushed the bpedersen2:richcards branch from c670e27 to 8ad13b4 Apr 20, 2018
@bpedersen2
Copy link
Contributor Author

@bpedersen2 bpedersen2 commented Apr 24, 2018

Hmm, as the main users of these data are search engines, would it make sense to omit this if a user is logged in? Google and co. Are not logged typically.

@bpedersen2
Copy link
Contributor Author

@bpedersen2 bpedersen2 commented Apr 24, 2018

Search engines are also the reason why I included future events in the first place, as on sparsely populated categories they would not see much, an when they see it, all registration deadlines have passed...

bpedersen2 and others added 6 commits Apr 20, 2018
Instead of sprinkling the information across the page,
put everything into an json-ld blob.
This is now rendered as json-ld instead.
The data were not correct anyway and are now replaced
by the json-ld formatted data in the head.
bpedersen2 and others added 5 commits Apr 20, 2018
It's not visible from the category page so it should not be included in
the category json-ld as this is accessible without access to the event
itself.
@ThiefMaster ThiefMaster force-pushed the bpedersen2:richcards branch from 97081d1 to 4da702f May 8, 2018
ThiefMaster added 2 commits May 8, 2018
It's mainly for search engines, and these are never logged in.
@ThiefMaster ThiefMaster merged commit 3c02eb5 into indico:master May 8, 2018
1 check was pending
1 check was pending
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@bpedersen2 bpedersen2 deleted the bpedersen2:richcards branch May 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.