SPARQL tutorial #131

Merged
merged 50 commits into programminghistorian:gh-pages on Nov 25, 2015

@mdlincoln
Member

mdlincoln commented Jul 26, 2015

Here's a full draft of my SPARQL tutorial. I look forward to your suggestions!

@wcaleb

Contributor

wcaleb commented Jul 27, 2015

Hey @fredgibbs @acrymble @miriamposner @williamjturkel @ianmilligan1, check it out. I've been working on a way to preview new pull requests on a development version of Programming Historian hosted on Heroku. For example, @mdlincoln's lesson can be previewed here:

https://proghist-dev.herokuapp.com/lessons/graph-databases-and-SPARQL.html

(you may have to wait a second for the app to start up; let me know if it doesn't work)

Ideally, I can get this set up so that Heroku automatically creates a preview site for each individual pull request, as described in this post (http://cobyism.com/blog/heroku-pull-request-apps/). That way, we can send a preview link to reviewers without having to accept the pull request into our repo. Conversation about submissions can then take place on the PR page, in one consolidated place, until we accept the lesson and merge the PR into our main branch.

@wcaleb

Contributor

wcaleb commented Jul 27, 2015

Oops, forgot to at-mention @ahegel.

@wcaleb wcaleb added the submission label Jul 27, 2015

@fredgibbs

Contributor

fredgibbs commented Aug 5, 2015

this is fantastic!!! this might be the perfect solution for getting accurate previews as well as sparing new contributors from having to maintain their own repository while the lesson is being reviewed. I definitely like the idea of merging the request officially once the lesson is done rather than in progress. At that point, changes are minor enough that they can be made via the website rather than offsite editing and pull requesting. I see the styling doesn't quite match the real site, but I assume this will be a trivial fix.

I vote that we should incorporate this into our editorial workflow if it is indeed possible to automate the process of previewing pull request lessons.

Thanks @wcaleb!!

@wcaleb

Contributor

wcaleb commented Aug 5, 2015

@fredgibbs Thanks!

I experimented a little on #132 with the Pull Request Review apps. As you can see there, it created a link (look for the word "deployed") to a unique review site for that pull request. (The link doesn't work anymore because when the Pull Request is closed, Heroku deletes the review app.) Creating the review app isn't completely automated; when a new Pull Request is submitted, I'll have to go into the Heroku dashboard and push a button to deploy the review app, but it's simple to do and I could share the account with others.

There are still a few kinks to work out, as you note. The style issue seems to be browser specific; when I look at the Heroku app on Safari, the fonts show up correctly. Also, you'll note that I had to add the "html" extension to the URL; that's because of a Jekyll issue that should hopefully be fixed when 3.0 is released, and if I wanted I could go ahead and make the Heroku site use the Jekyll 3.0 beta release now (as GitHub Pages already does, I believe).

@mdlincoln

Member

mdlincoln commented Oct 2, 2015

Howdy - I just wanted to check if this PR is making any movement through the pipeline (https://github.com/programminghistorian/jekyll/wiki/Lesson-Pipeline)?

@fredgibbs

Contributor

fredgibbs commented Oct 2, 2015

yes! it is under review, which should be complete in the next few weeks at most.

@mdlincoln

Member

mdlincoln commented Oct 2, 2015

👍

@patrickmj

patrickmj commented Oct 20, 2015

Audience

I'm taking the audience to be people already generally familiar with APIs and their responses (usually JSON nowadays, but also XML). This might be developers, or more generally project managers with enough technical knowledge to talk about APIs and data exchange. And, of course, the audience is people within those groups who have heard of SPARQL and RDF and LOD who have been looking for an accessible introduction.

Overall comments

This does a really good job of introducing what RDF looks like and how to read it. The concept of the RDF triple often scares people, and by starting with natural language triples this very nicely demystifies them. It was a little surprising to me to keep the natural language representations so long through the explanations, and I think that was the right choice. It kept accessibility at the forefront for as long as possible to establish core concepts before moving into dealing with the off-putting URIs.

The inclusion of graph diagrams is essential, and I'd say could be used even more. For example, the graph of data for <The Nightwatch> and <Woman with a Balance> is, I think, at just the right level of complexity to show the power of a graph (and hence SPARQL), without making it overwhelming. But, in the progression from the first example graph demonstrating the data to the SPARQL query represented, some intermediate graph diagrams would clarify things more. The use of color is a good move toward that, but I'm not sure that a newcomer to RDF will intuitively know how to interpret the colors in the context of the entire graph. A graph diagram that shows only what's relevant to the SPARQL query, then also includes the current graph diagram to widen out the scope would help people move through the conceptual steps of how a SPARQL query stamps out a limited pattern in a graph.

For these concepts, the more that makes explicit the relationship between the turtle and the graph diagrams, the better. This piece is already doing that well, as seen in the caption to the diagram representing the first SPARQL query (though I have a quibble below in "Terminology and Usage"). Explicitly saying, for example, something like "?painting in the query stands for the returned nodes" will help line up the visual representation with the SPARQL more clearly. That also more firmly establishes the connection to the image of the HTML results.
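
As a rough illustration of how a query "stamps out" a pattern in a graph, here is a minimal Python sketch. It is not taken from the lesson: the triples are illustrative data, the `match` helper is hypothetical, and it is plain Python rather than a real SPARQL engine. Terms that start with `?` play the role of variables such as `?painting`, which end up standing for the returned nodes.

```python
# Toy graph of natural-language triples (illustrative data only).
TRIPLES = [
    ("The Nightwatch", "was created by", "Rembrandt van Rijn"),
    ("The Nightwatch", "has medium", "oil on canvas"),
    ("Woman with a Balance", "was created by", "Johannes Vermeer"),
    ("Woman with a Balance", "has medium", "oil on canvas"),
    ("Rembrandt van Rijn", "was born in", "1606"),
]


def match(pattern, triples):
    """Return one binding dict per triple that fits a single triple pattern.

    Terms beginning with "?" are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No term failed to match, so this triple fits the pattern.
            results.append(binding)
    return results


# Roughly: SELECT ?painting WHERE { ?painting <has medium> <oil on canvas> }
paintings = match(("?painting", "has medium", "oil on canvas"), TRIPLES)
```

Only the two triples that fit the pattern contribute a binding for `?painting`; the rest of the graph is ignored, which is the "limited pattern" idea described above.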

The links out to running the example queries are a good use of the online medium here, but the transition is a little abrupt. Clicking on the link to look at the example object <http://collection.britishmuseum.org/id/object/PPA82633> gets to a screen that's hard to put into context at this point. Translating the first few lines of that result page back into 1) the turtle that was used earlier to explain RDF and 2) another graph diagram would place the reader much better into familiar territory. Using the graph diagram that you use later in the filtering section earlier on, for example, would be great.

There's an interesting choice to not look at the response JSON or XML to unpack that. That seems to be part of the audience choice, assuming that the coders in the audience will be able to figure out the connection between the HTML representation and the JSON or XML, and the non-coders will be happy with the readability of the HTML responses. It is a hard balance, and part of the scope decisions of writing for such a complex topic.

There's another choice at work, to concentrate on a single endpoint. The result is that it explains the graph and the powerful querying potential very well, but minimizes the "Linked" in Linked Open Data. There's a mention of this in the URI/URL terminology below. By only looking at one endpoint, the RDF idea of being able to link across datasets is lost. I see that as probably a necessary choice for the scope of this piece, but it should at least get a nod for people interested in LOD. It seeps in through examples like <http://dbpedia.org/resource/Rembrandt>.

It looks like a corollary that the piece focuses on SELECT queries, without mention of CONSTRUCT queries. That again seems like a matter of defining the scope of the tutorial, which I see as the right choice for the audience. I'm not sure if there's a way to give a tip to CONSTRUCT queries, but if there's a way it would be a bit more complete.
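
To make the SELECT/CONSTRUCT distinction concrete, here is a small hedged sketch in plain Python (not SPARQL, and not from the lesson). It assumes a SELECT-style result is a list of rows of variable bindings, and shows the CONSTRUCT idea as filling a triple template from those rows. The data, the `construct` helper, and the `painted` predicate are all made up for illustration.

```python
# Rows of bindings, such as a SELECT query might return (made-up data).
rows = [
    {"?painting": "The Nightwatch", "?artist": "Rembrandt van Rijn"},
    {"?painting": "Woman with a Balance", "?artist": "Johannes Vermeer"},
]


def construct(template, rows):
    """Fill a triple template with each row of bindings, producing new triples.

    Template terms that are not bound in a row are copied through unchanged.
    """
    return [tuple(row.get(term, term) for term in template) for row in rows]


# Roughly: CONSTRUCT { ?artist <painted> ?painting } WHERE { ... }
new_triples = construct(("?artist", "painted", "?painting"), rows)
```

The point of the sketch is only that SELECT hands back a table, while CONSTRUCT hands back a new graph built from the same matches.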

Terminology and Usage

  • A brief explanation of what a 'graph' is might be helpful. Especially nodes and edges as subjects/objects and predicates will be helpful for understanding the graph diagrams. In the list of terms, since 'node' is there, 'edge' should be, too.
  • Clarifying the distinction between 'URI' and 'URL' is a fine technical detail, but one that readers will likely see elsewhere. URL works great as explaining the linkiness of LOD. But, it also conceals the linkiness, as people will associate a URL with a site -- here, BM -- whereas URIs are universal, and ideally reused.
  • "qName" is also a helpful term to include, as the audience will see it elsewhere. It also helps put a label on the <prefix:name> examples when prefixes are introduced.
  • objects vs literals: there's a discussion that makes these seem equivalent ("The objects of these statements ....."). Since the upcoming discussion works with queries on literals, it makes sense to make that the emphasis there, but I worry about people forgetting about node objects.

Technical Quibbles

  • It should be noted that prefixes in SPARQL queries usually follow conventions, but they are unique to the query. Looks like BM uses dct, but other endpoints default to dcterms. People exploring other endpoints will need to know that distinction, even though the focus here is on BM.
  • After the Complex Queries first example, you say "?object_type is a blank node". I don't think that's quite right, since when I include it in what I select, it looks like I get real, dereferenceable results back.
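
A quick way to see why the prefix itself doesn't matter: a prefixed name is shorthand that expands to a full URI using whatever the query declares. The Python sketch below is illustrative only. The `expand` helper is hypothetical, and the assumption that both `dct` and `dcterms` are bound to the Dublin Core terms namespace is mine, not something confirmed against the BM endpoint.

```python
# Dublin Core terms namespace; binding both prefixes to it is an assumption.
DCTERMS_NS = "http://purl.org/dc/terms/"


def expand(qname, prefixes):
    """Expand a prefixed name like "dct:title" using a query's own prefix map."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local


# Two queries that declare different prefixes for the same namespace.
bm_style_prefixes = {"dct": DCTERMS_NS}
other_prefixes = {"dcterms": DCTERMS_NS}

bm_uri = expand("dct:title", bm_style_prefixes)
other_uri = expand("dcterms:title", other_prefixes)
```

Both names expand to the same full URI, so the two queries refer to the same predicate even though the shorthand differs.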
@mdlincoln

Member

mdlincoln commented Oct 22, 2015

Thanks for the thorough comments. Some questions that I have as I go through these:

A brief explanation of what a 'graph' is might be helpful. Especially nodes and edges as subjects/objects and predicates will be helpful for understanding the graph diagrams. In the list of terms, since 'node' is there, 'edge' should be, too.

A definition for graph is definitely warranted, as I use it quite a bit. I am starting to have reservations about overemphasizing the node/edge dynamic, though. All the examples shown here use predicates easily understandable as edges, like <was created by> forming an "edge" between the "nodes" <Rembrandt> and <The Nightwatch>. However that particular predicate <was created by> would, in practice, itself be a node that is a potential subject/object of a statement (e.g. <was created by> could be a parent of the concept <was painted by>). I'm reluctant to introduce this further complexity into what's already a fairly heavily-loaded lesson, but do you think it would be worth adding this in?
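
To show the complication concretely, here is a tiny Python sketch with made-up triples. The `is a narrower form of` predicate is a hypothetical label for the parent/child relationship mentioned above. Anything used as a subject or object is treated as a node, anything used as a predicate is treated as an edge, and one term turns up in both sets.

```python
# Made-up triples: the second one makes a statement about a predicate.
triples = [
    ("The Nightwatch", "was created by", "Rembrandt"),
    ("was painted by", "is a narrower form of", "was created by"),
]

# Terms appearing in subject or object position act as nodes.
nodes = {term for s, _, o in triples for term in (s, o)}
# Terms appearing in predicate position act as edges.
edges = {p for _, p, _ in triples}

# Terms that play both roles in this small graph.
both = nodes & edges
```

Here `both` contains only `was created by`, which is an edge in the first statement and a node in the second.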

@mdlincoln

Member

mdlincoln commented Nov 6, 2015

Nice idea to actually show a Palladio product 👍 I've thrown together a gallery of images with their accompanying timeline.

That covers all my responses for now.

@mdlincoln

Member

mdlincoln commented Nov 11, 2015

@fredgibbs Anything left to do on this lesson?

@whanley

Contributor

whanley commented Nov 11, 2015

Hi @mdlincoln -- I'm just starting to produce an RDF tutorial, and I scanned yours quickly. Useful in general terms, and also great support for my tutorial. Seeing that it's almost ready to go, I'll try to give it a really careful read tonight just in case anything comes up.

@mdlincoln

Member

mdlincoln commented Nov 11, 2015

@whanley brilliant - with your lesson on how to create RDF and this one on how to query, they might have to give us our own lesson category 😄

@whanley

Contributor

whanley commented Nov 11, 2015

The lesson I really need (to read, and then to write) is how to expose a SPARQL endpoint. Maybe I can figure that out, and we'll have a nice trio!

@fredgibbs

Contributor

fredgibbs commented Nov 15, 2015

@mdlincoln There's nothing else I had in mind that needs to be done. If you're finished with revisions, we will merge the pull request and finally release this into the wild! (I'll update the lessons page).

@whanley sounds great! And yes, a new lesson category is certainly appropriate and necessary for your tutorials (and hopefully related ones that will be inspired by them). Always exciting to expand the breadth of PH!

@whanley

Contributor

whanley commented Nov 15, 2015

@mdlincoln @fredgibbs sorry I'm late, but I do have a few comments that I will post in the next couple of hours, if you can hold off...

@fredgibbs

Contributor

fredgibbs commented Nov 15, 2015

excellent! no hurry at all.

@whanley

Contributor

whanley commented Nov 15, 2015

Hi @mdlincoln. Great tutorial. Very systematic example from the BM, and good turns to Europeana and Palladio at the end. Last week by mistake I read an earlier version, so most of the corrections I thought I had you'd already found. Thus my suggestions really concern the flow of the opening section.

I wonder if the first paragraph might be a bit too jargon-heavy. For example, the sentence “a museum may have information on donors, artists, artworks, exhibitions, and provenance, but its web API may offer only object-wise retrieval, with associated data about donors, artists, provenance, etc. embedded within subfields of each object's JSON data” is less user-friendly than what comes later in the lesson. Perhaps you could simplify the language of the first paragraph or two? For instance, the second half of this sentence could read “but its search form may only allow you to search by object, and not by donor, artist, provenance, etc.”

I wonder if you might consider reordering the paragraphs in the introduction by moving the current first paragraph to third position. I think that its discussion of APIs is a bit secondary to the main hooks for this tutorial, which involve understanding LOD. If the second paragraph became the first paragraph, you’d want a new lead sentence, something like “What’s under the hood of the online catalogs of some of the world’s major research collections?” Then introduce RDF, LOD, APIs, and SPARQL in that sequence.

The “RDF in brief” section is very clear—exemplary. And the Mad Lib comparison is very clever.

Typo on the word “language”: “conceptual entities from their plain-English (or other langauage!) labels”

Under URIs section, I would distinguish between URLs (a term that you use earlier) and URIs—I know that you do so in the term review, but it bears repeating. Also, you might want to define the way that literals can function as labels (for instance, in rdfs:label), maybe along the lines of “Fortunately, good RDF practice is to assign a human-readable label to otherwise-inscrutable URIs.”

For extra clarity, I might add a bit to the sentence “See the predicates in these statements, with domain names like purl.org, w3.org, and xmlns.com?” thus: “See the predicates (second elements) in these statements, with domain names like purl.org, w3.org, and xmlns.com?”

In the first part of the BM example, when you say “You'll note how this node has an English label,” I’m not sure what you mean.

Thanks for the great tutorial! Will

@mdlincoln

Member

mdlincoln commented Nov 16, 2015

Thanks for these additional comments. I have added in some of the suggested rewording. I am keeping the ordering of the introduction paragraphs because I think it is important to immediately establish a problem that RDF/LOD can solve. But your suggestion about swapping out that inadvertent jargon is a keeper 👍

@fredgibbs It's ready to go - I've already added in my bio metadata.

@mdlincoln

Member

mdlincoln commented Nov 20, 2015

@fredgibbs Is there anything else that I need to do with this? I was under the impression from your earlier comment that this was ready to publish.

@mdlincoln

Member

mdlincoln commented Nov 25, 2015

@fredgibbs @acrymble @ianmilligan1 Is there any more action needed on this tutorial before it can be published?

@fredgibbs

Contributor

fredgibbs commented Nov 25, 2015

It's good. Sorry, I've been out of town for a conference and a bit behind with lesson work. I will take care of this tomorrow.

fredgibbs added a commit that referenced this pull request Nov 25, 2015

@fredgibbs fredgibbs merged commit 8b0e8ad into programminghistorian:gh-pages Nov 25, 2015

@drjwbaker

Member

drjwbaker commented Nov 26, 2015

Any reason why this has a 21 June publication date? (in the citation as well...)

@acrymble

Contributor

acrymble commented Nov 26, 2015

I presume that's when it was submitted. The author puts the publication date in the yaml header. We should probably have them leave that blank and fill it in last.

@acrymble

Contributor

acrymble commented Nov 26, 2015

I've updated this. Thanks James.
