
Three metaphors for open data

I love a good metaphor.

In "Metaphors We Live By", George Lakoff and Mark Johnson explore the importance of metaphors in how we think about and perceive the world. Conscious and unconscious use of metaphors helps us understand one idea or concept in terms of another. The right metaphor will help to highlight the important characteristics of a new idea. But the wrong metaphor can also hide aspects of an idea, drawing attention from important elements. Metaphors are great communication tools. And good communication uses great metaphors.

Obviously any analogy can be taken too far. The skill is in knowing when to let one go, and when an alternate metaphor might help develop a discussion by demonstrating new perspectives.

People use lots of metaphors when they talk about data.

The most common is the idea that "Data is the New Oil". In this metaphor businesses around the world have discovered that they are sitting on untapped reserves of data. And there are now new tools and techniques that will let them refine that data to create products and services.

A related metaphor is that of a "gold rush" around data. Companies are racing to turn data into valuable insights and profits. Many are hoping to be the first to discover and extract value from public datasets. Or make a claim to be the data platform for their industry.

These metaphors may be helpful in explaining the scale of investment in new technologies, platforms and data science. But they don't really tell us much about data or even the data revolution itself. They don't have a great deal of power to provide insights beyond illustrating a sudden rush for profits.

Oil refining is an extractive industry. It involves taking a limited natural resource and then refining it to create other products and profits. But data isn't like oil. It can be consumed and reused endlessly, by different businesses and people, at the same time.

And we don't prospect for data. Data is not just sitting undiscovered in databases. Collecting and maintaining data requires concerted effort. Organisations don't just refine data, they also create it. Sometimes with, and sometimes without, the help of their customers and communities.

I want this book to help open data practitioners understand the key concepts that underpin the open data movement. Throughout the book we'll introduce a collection of metaphors to help explore different aspects of the open data landscape.

We'll think about data being on a spectrum of closed, shared and open. We'll imagine data as energy and show how it supports an ecosystem. And we'll also describe data as water, to help us think about how it flows through different systems.

But to begin with, let's look at three metaphors that individually demonstrate different aspects of the practice of open data.

Data is like Roads

Data is like roads. Roads help us navigate to a destination. Data helps us navigate to a decision.

First proposed by Jeni Tennison, this metaphor is intended to highlight the increasingly important role that data plays in modern society and business.

Roads help us travel to work and school. They also support a variety of different business uses. Roads are infrastructure that is created and maintained by society for the benefit of everyone. Open data, and especially open data published by the public sector, has similar characteristics. Like roads, data is infrastructure.

Unlike the oil or gold rush metaphors, thinking of data as infrastructure highlights some important characteristics. For example:

  • data, like all infrastructure, supports a community of users. They may use it differently but its continued availability is important to them
  • roads are available for anyone to use, for any purpose. You don't need explicit permission to use a road. Openly licensed data has the same properties
  • infrastructure needs to be managed and maintained to preserve its utility

Open data should also perhaps be governed by some "rules of the road" that will ensure that it is used fairly and responsibly. There is a growing body of best practices and social norms around the use of data which we'll be exploring throughout this guide.

The road analogy is also useful in another way. It can help us think through the terms under which data infrastructure is made available, now and in the future. In a series of blog posts, Peter Wells has used the metaphor to explore three possible futures:

  • open roads: in an open future, data is made as open as possible, so that it can support the widest possible community
  • toll roads: alternatively, rather than having free access to data, we may have to always pay a toll or other fee in order to access it
  • private roads: data is kept closed and is only accessible to those that collect it, for their own purposes

The future data landscape is likely to be criss-crossed by a mixture of open, shared and closed roads. What's important is to understand what the relative mix might be, what is best for society and the market, and how easy it will be for us to navigate to a decision using data.

I find this metaphor a useful framing for thinking through these types of questions.

Stone Soup

The tale of "Stone Soup" is one that you may have heard already. Like all folk tales there are a number of different variations. The version I prefer goes something like this:

A hungry traveller comes to a village carrying nothing but an empty cooking pot. Initially the villagers are unwilling to share their food with the traveller. So the traveller fills the cooking pot with water from a nearby stream, and sets it over a fire. The traveller then puts a large stone into the pot and sits down to wait.

Curious, the villagers ask the traveller what they're doing. The traveller explains that they are making "stone soup" and it will be a delicious meal when ready. The traveller also explains that the soup would benefit from a little bit of garnish, to help improve the flavour. One villager decides to offer a few carrots which are promptly added to the pot. As the story continues, more and more of the villagers choose to contribute something extra to the pot.

The story ends when the stone is removed from the pot and the traveller, along with the whole village, enjoys a delicious meal.

This story is frequently used as a metaphor for community action. It illustrates how small contributions from a community can add up to something much greater.

In some interpretations of the story the traveller is described as being crafty: they are tricking the villagers into giving up their food. But this overlooks the fact that everyone takes a share in the results. And none of the villagers could have created the meal by themselves. The soup needed a little something from everyone.

I think the stone soup story works as a beautiful metaphor for "The Commons": the growing collection of openly licensed data, music, video and books that are freely available for anyone to enjoy. The commons isn't the creation of a single community or a single organisation. It's being created out of the work of a diverse variety of contributors.

The metaphor also highlights some useful aspects of the open data movement. For example:

  • there is often a need for a catalyst, the traveller, who encourages and supports a community in contributing to the creation of something that is of value to everyone
  • it's the mixture of different ingredients, or datasets, that creates the end result
  • everyone has something to contribute, but they need help in seeing the value in what they can add to the pot
  • the cooking pot is the infrastructure within which the meal is created. We rely on that infrastructure to help us collaborate
  • the first contributions, like the stone, may be the least valuable, so we need a sprinkle of imagination to help us envision the end result

If I were to push the metaphor to breaking point I'd suggest that standard open licences, such as those published by Creative Commons, are the water in the pot: they are the means by which we can freely mix together different ingredients. But that might be going a little too far.

As an open data practitioner, you are the catalyst for your community. Together we are building the open data commons.

The Blind Men and the Elephant

The parable of the Blind Men and the Elephant is another story you've probably heard before. And like Stone Soup there are many different versions.

In the story six blind men are asked to describe an elephant. Each feels a different part of its body. The man that feels the leg declares that an elephant is like a pillar. The man that feels the tail says that an elephant is like a rope. Another, after touching its trunk, believes it to be like a snake.

In some versions of the story the men are unable to agree on the nature of an elephant. They end up fighting, each defending their own conclusion. In other versions of the story, a king or other wise man explains that they are all correct, but they had each only touched a part of the animal.

The story is a metaphor for the importance of different viewpoints in fully understanding a situation. Unlike the previous two metaphors, this story doesn't tell us much about the nature of open data. But I think it does help to highlight something important that we should understand about the open data movement.

The open data community is extremely broad. There is a variety of expertise in that community. And we are all looking at how the practice of open data can help us to solve problems in a number of different sectors.

For example, open data is a key component of:

  • the open government movement
  • the open science and open access communities
  • the collaborative economy

Across these communities many of the challenges that open data practitioners are facing are the same. I've encountered exactly the same debates about how best to publish open data in many different sectors. And usually the answers are exactly the same: create useful metadata, use open formats, and apply standard open licences.

But the reasons why different communities are embracing open data, and the needs it is addressing, mean that sometimes best practices may vary a great deal.

For example, there's a very strong overlap between the open data and open government communities. This is particularly true in the UK and US. This often results in a focus on open data as a means to:

  • encourage economic growth, by creating new businesses and startups that are able to make better use of public sector resources
  • create transparency, so that we can better hold our public institutions to account
  • help us solve social problems, by enabling government to be more data-driven or explore how open innovation can help it work with civil society on specific challenges

And based on that perspective we might choose to publish and prioritise the release of data in certain ways.

But the core reasons driving the adoption of open data in science are arguably different:

  • sharing data can support reproducible research, increasing the quality of the scientific process
  • highlighting the effort required to curate data allows proper credit to be given to its contributors, which an existing focus on publishing academic papers doesn't enable
  • collecting data can often be costly, or may not be repeatable for ethical reasons, so the ability to reuse data is important to support further research

Those communities engaged in collaborative production of openly licensed databases like OpenStreetMap, MusicBrainz or Discogs do so because they are inspired by their individual interests. Or perhaps because they believe that the world should have free alternatives to expensive commercial datasets. Here, open data is a necessary foundation upon which trust and shared ownership are created in the support of collective action.

As open data practitioners we must be mindful of these different perspectives as we create guidance or develop tools to support our community. These differences are particularly important when we attempt to assess the impact, progress and future development of open data programmes. Different parts of our community will have their own definitions of success.

For example, it is unhelpful to suggest that the only important or useful open datasets are those that are regularly updated and made available for use via web APIs. I've seen this argument used repeatedly, often to justify investment in data platforms that might offer better support to startups and developers. This might indeed be a useful service to offer your re-users, in some sectors and for some datasets. APIs can certainly help support the creation of applications and services. But it's not a general requirement.

In contrast, we can look at data that is collected and released as part of academic research. The majority of these datasets are tiny, and will probably never be updated after the completion of a specific research project. They're likely to have a small audience and will probably be analysed using desktop tools. But these datasets still have an important role in both the commons and the scientific record. The differences are in the types of reuse they enable.

As practitioners it is important for us to see the whole elephant. We need to understand these different needs and tailor our advice accordingly. These different perspectives provide us with opportunities to learn from one another and, where applicable, transfer best practices between the sectors in which we work.

In this guide I will be trying to provide advice that is applicable to a broad set of open data projects. But not all of the advice and recommendations contained in the guide will necessarily be applicable in all circumstances. It's up to you to consider how and when to use these best practices in your own work.

Let's get started.