Gianfranco Cecconi edited this page Jan 10, 2018 · 57 revisions

This page provides an overview of the purpose and use of a Dataset Site. Its aim is to help anyone in an organisation, not just developers, create a Dataset Site for their data.

As with all of OpenActive, your feedback would be hugely appreciated. Please create an issue on the issue tracker if you have any feedback or if anything is unclear, or comment on the guides directly.

What is the Dataset Site Generator?

The Dataset Site Generator allows providers to create a subsite that contains licence information and other metadata. A dataset homepage and documentation are required for providers opening their data, and the generator provides both of these and more. The Dataset Site Generator includes a template Dataset Site and complete step-by-step guides designed for non-technical users.

What does a Dataset Site look like?

Take a look at examples from British Cycling and GoodGym.

What does a Dataset Site do?

The purpose of a Dataset Site is to provide:

  • A web page that can be referenced when discussing the dataset.
  • A human and machine readable licence associated with the data (the Dataset Page contains invisible metadata which allows its details to be read automatically).
  • A human and machine readable rights statement to specify how dataset users (innovators who want to build on top of/use your data) should attribute your data.
  • An accessible "single point of truth" that explains where the data can be found.
  • Details ("documentation") and historical record ("changelog") relating to the format of the data, including the specifications it follows, and the data fields it contains.
  • A place where the community can contribute with comments, and raise issues.
  • A mailing list to which data users can subscribe, to get updates about changes to the data format, specifications and fields.
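
To illustrate what "machine readable" means in the list above, here is a minimal sketch of how a program might read the licence from a Dataset Site. It assumes the invisible metadata is embedded in the page as a JSON-LD script block; this page does not specify the exact format, so treat that as an assumption. The sample page, dataset name and licence URL are hypothetical and for illustration only.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-LD script block opens.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Parse the buffered text as JSON once the script block closes.
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical Dataset Site page: every name and value here is illustrative.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Example Sessions Data",
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
</script>
</head><body><h1>Example Sessions Data</h1></body></html>
"""


def read_dataset_metadata(html):
    """Return every JSON-LD document found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


metadata = read_dataset_metadata(SAMPLE_PAGE)
print(metadata[0]["license"])
```

A visitor sees only the heading on such a page, while a program can pick out the licence without any human reading the page for it.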

What does the Dataset Site Generator create?

The Dataset Site Generator and associated guides let you quickly create a minimal Dataset Site covering all of the criteria above, using freely available, open source tools. A generated site contains features sufficient for publishing a single dataset, which in most cases is enough for initial publishing of data relating to OpenActive.

Additional datasets can easily be added later; please raise an issue on this repository to request a guide for this.

Do I need to be really techy to do this?

Not at all. There are no risks associated with just having a go at using the guides in the next section. If it all goes wrong, you can just delete the repositories (defined in the next section) you've created and start again.

I am non-techy. What is GitHub?

GitHub is a place where the open source community can collaborate.

The following explanation of GitHub terms should make this easier:

  • Repository: A repository in GitHub is a collection of Code, Issues, and a Wiki. The page you are looking at right now is inside a repository called "dataset-site-generator" (see the "openactive / dataset-site-generator" title at the top of the page).
  • Organisation: This repository is called "dataset-site-generator", and it exists inside the organisation "openactive". See the "openactive / dataset-site-generator" title at the top of the page.
  • Wiki: You are currently looking at the Wiki inside this repository (see the "Wiki" tab at the top of the page). A wiki is a collection of pages that can be easily edited. Some wikis are unrestricted (like this one), so they can be edited by anyone on GitHub (and all existing editors are notified of changes). Others are restricted to be editable only by GitHub users who have been granted access.
  • Code: The code tab at the top of the page will show you the code in this repository, which can be edited.
  • Issues: The issues tab at the top of the page is a place people can leave comments about the repository.
  • Fork: To "fork" means to "copy", as in copy-and-paste a repository. A "fork" is a "copy" of a repository, and the forked repository always links back to the original. You can "fork" this repository to make your own Dataset Site by following one of the guides below.

This all sounds great! How do I make a Dataset Site? What do I need to create?

Simply work through the list below, creating each item in turn:

  1. A GitHub account and GitHub organisation for you and your organisation, respectively.
  2. A repository, containing a Dataset Site, which can be "forked" (copied) from this repository.
  3. A Mailchimp mailing list.
    • This allows dataset users (innovators who want to build on top of/use your data) to be kept up-to-date with changes to your data's format, spec and fields. Follow the guide in this document to create one of these.
  4. A repository containing Documentation, which can be created from scratch, following the examples of others.
    • This repository provides dataset users with documentation of data format, spec and fields, as well as allowing them to comment and raise issues. It also includes a historical record of changes to the data format, spec and fields (a "changelog"). Follow the guide in this document to create one of these.
  5. A Custom Domain for your Dataset Site.
  6. Links to your Dataset Site.
    • To ensure your data is discoverable, we recommend linking to or promoting this Dataset Site through all of the following:
      • The footer of your user-facing website (e.g. "The data on this site is available as open data" on pages in www.pingengland.co.uk).
      • An open data page or blog page on your main organisation website, if different from your user-facing website (e.g. a blog or page on www.tabletennisengland.co.uk).
      • Publicity through your networks and partners, in any newsletters, social media and as part of your existing engagement activity.