# Introducing the African Data Hub CKAN Repository
> An overview of the ADH data repository - what it is, how to use it, and how the African Data Hub engages with it.  

- toc: true 
- badges: false
- comments: false
- categories: [ADH, CKAN, Data, Overview]
- image: images/chart-preview.png

### Introduction
The [African Data Hub](https://www.africadatahub.org/) (ADH) seeks to support and promote quality data-driven journalism in Africa by providing newsrooms, researchers, and the general public with easy access to quality African data. African data is typically difficult to find, stored in unwieldy formats, and often out of date. ADH is working to remedy this by actively seeking out interesting and useful African datasets, converting them to more accessible formats, updating them and creating combined datasets where possible, and storing them all on our open-source, online CKAN [data repository](https://ckan.africadatahub.org/).

[CKAN](https://ckan.org/) is an open-source data management system used by hundreds of organisations around the world, including the national governments of the USA, Canada, Singapore, Australia and others. We use it to host any data that we believe may be useful in serving our mandate of promoting quality data-driven journalism in Africa.






### Dataset vs Resource
When using CKAN, it is important to understand the difference between what the system calls a `dataset` and what it calls a `resource`. A dataset is a collection of related data resources, while a resource is a single file. It may be useful to think of a dataset as a folder on your computer and a resource as a file in that folder. When posting data on CKAN, you first need to create a dataset and fill in the required `metadata`. You can then add resources either by uploading files or by linking to them via URL.
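The folder-and-file relationship can be sketched in Python. CKAN's Action API returns a dataset as a JSON object with a `resources` list, via the `package_show` action. The sketch below assumes the ADH instance exposes this API publicly; the example dataset and its `example.org` URLs are hypothetical and exist only so that the code runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CKAN_URL = "https://ckan.africadatahub.org"  # repository linked in this post


def fetch_dataset(dataset_id, base_url=CKAN_URL):
    """Fetch one dataset (CKAN calls it a "package") with package_show.

    Assumes the instance allows anonymous API access to public datasets.
    """
    query = urlencode({"id": dataset_id})
    with urlopen(f"{base_url}/api/3/action/package_show?{query}") as response:
        payload = json.load(response)
    return payload["result"]


def list_resources(dataset):
    """Return (name, format, url) for every resource in a dataset dict."""
    return [
        (res.get("name"), res.get("format"), res.get("url"))
        for res in dataset.get("resources", [])
    ]


# Hypothetical dataset, shaped like a package_show result.
example_dataset = {
    "name": "example-population-density",
    "title": "Example population density",
    "resources": [
        {"name": "Density table", "format": "CSV",
         "url": "https://example.org/density.csv"},
        {"name": "Density raster", "format": "GeoTIFF",
         "url": "https://example.org/density.tif"},
    ],
}

# One dataset (the "folder") holds several resources (the "files").
for name, fmt, url in list_resources(example_dataset):
    print(f"{fmt:8} {name} -> {url}")
```

To try this against the live repository, pass the last part of a dataset's page URL to `fetch_dataset` and hand the result to `list_resources`.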

### Metadata
The importance of complete and correct metadata cannot be overstated. Metadata provides context for a given dataset which allows a potential user to understand what it is about, where it came from, how it was created and how it can be used. The following table provides details on the required metadata.

%%html
<p><b>Table 1: </b>Metadata description </p>
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-j1i3{border-color:inherit;position:-webkit-sticky;position:sticky;text-align:left;top:-1px;vertical-align:top;
  will-change:transform}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
@media screen and (max-width: 767px) {.tg {width: auto !important;}.tg col {width: auto !important;}.tg-wrap {overflow-x: auto;-webkit-overflow-scrolling: touch;}}</style>

<div class="tg-wrap"><table class="tg">
<thead>
  <tr>
    <th class="tg-j1i3"><b>Metadata</b></th>
    <th class="tg-j1i3"><b>Description</b></th>   
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0pky">Title</td>
    <td class="tg-0pky">A descriptive title</td>    
  </tr>
  <tr>
    <td class="tg-0pky">Description</td>
    <td class="tg-0pky">Brief description of what is in the data: is this the industry standard, how does it compare to similar datasets, and is this data hosted or used elsewhere? Summary of the methodology (include a link to the methodology where applicable). Link to an example analysis (where applicable).</td>
  </tr>
  <tr>
    <td class="tg-0pky">Tags</td>
    <td class="tg-0pky">Main themes</td>
  </tr>
  <tr>
    <td class="tg-0pky">Licence</td>
    <td class="tg-0pky"><b>NB!!</b> The licence this data is shared under. Use <a href="https://chooser-beta.creativecommons.org/">this resource</a> if you're unsure.</td>
  </tr>
  <tr>
    <td class="tg-0pky">Organisation</td>
    <td class="tg-0pky">ADH organisation that sourced/uses this data</td>
  </tr>
  <tr>
    <td class="tg-0pky">Visibility</td>
    <td class="tg-0pky">Can be set to public/private (ADH members only)</td>
  </tr>
  <tr>
    <td class="tg-0pky">Source</td>
    <td class="tg-0pky"><b>NB!!</b> Link to where the data was found</td>
  </tr>
  <tr>
    <td class="tg-0pky">Version</td>
    <td class="tg-0pky">Version number</td>
  </tr>
  <tr>
    <td class="tg-0pky">Author</td>
    <td class="tg-0pky">Name of person/entity that produced the data</td>
  </tr>
  <tr>
    <td class="tg-0pky">Author email</td>
    <td class="tg-0pky">Contact email for person/entity who produced the data</td>
  </tr>
  <tr>
    <td class="tg-0pky">Maintainer</td>
    <td class="tg-0pky">Name of ADH member responsible for this dataset</td>
  </tr>
  <tr>
    <td class="tg-0pky">Maintainer email</td>
    <td class="tg-0pky">Contact email for ADH member responsible for this dataset</td>
  </tr>
  <tr>
    <td class="tg-0pky">Groups</td>
    <td class="tg-0pky">Project this data is used for</td>
  </tr>
  <tr>
    <td class="tg-0pky">Data format</td>
    <td class="tg-0pky">Filled in automatically</td>
  </tr>
</tbody>
</table></div>
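
As a rough illustration of how the metadata in Table 1 could be checked before publishing, the sketch below maps each label to the field name used by CKAN's default dataset schema and reports which ones are still empty. It assumes ADH uses the default field names (for example `notes` for the description and `url` for the source); the draft dataset is hypothetical. Visibility and data format are left out because the former is a true/false flag and the latter is filled in automatically.

```python
# Table 1 labels mapped to CKAN's default dataset field names.
# Assumption: the ADH instance has not renamed these fields.
REQUIRED_FIELDS = {
    "Title": "title",
    "Description": "notes",
    "Tags": "tags",
    "Licence": "license_id",
    "Organisation": "owner_org",
    "Source": "url",
    "Version": "version",
    "Author": "author",
    "Author email": "author_email",
    "Maintainer": "maintainer",
    "Maintainer email": "maintainer_email",
    "Groups": "groups",
}


def missing_metadata(dataset):
    """Return the Table 1 labels whose fields are absent or empty."""
    return [
        label
        for label, field in REQUIRED_FIELDS.items()
        if not dataset.get(field)
    ]


# Hypothetical draft with only a few fields filled in.
draft = {
    "title": "Example rainfall totals",
    "notes": "Monthly rainfall totals, extracted from a PDF report.",
    "license_id": "cc-by",
    "tags": [],
}

print(missing_metadata(draft))
```

An empty list from `missing_metadata` means every field in the mapping has a value; it says nothing about whether those values are correct.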



### Types of Data Available on the ADH data repository
The data is available in a wide variety of formats, from spreadsheets (e.g. XLSX, CSV) to geographic files (e.g. SHP, GeoJSON), images (e.g. GeoTIFF, PNG) and even documents such as PDF. Some datasets include the same data in different formats as separate resources. It is therefore not always necessary to download an entire dataset; you can pick only the formats that you are comfortable with or able to use. Different formats of the same data also often differ in file size. For a tutorial on exploring a dataset with different formats of the same data, see [here](https://heikoheilgendorff.github.io/adh_data_blog/geo%20data/population%20density/2022/08/12/geo-data-prep-tutorial.html) **ENTER PUBLISHED LINK HERE**.
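
Picking only the formats you can use might look like the following sketch. It assumes each resource carries a `format` label, as in CKAN's `package_show` output; the dataset shown is hypothetical. The comparison ignores case because format labels are not always capitalised consistently.

```python
def resources_in_formats(dataset, wanted_formats):
    """Keep only the resources whose format is in wanted_formats."""
    wanted = {fmt.lower() for fmt in wanted_formats}
    return [
        res
        for res in dataset.get("resources", [])
        if (res.get("format") or "").lower() in wanted
    ]


# Hypothetical dataset offering the same data in three formats.
multi_format_dataset = {
    "title": "Example district boundaries",
    "resources": [
        {"name": "Boundaries", "format": "SHP"},
        {"name": "Boundaries", "format": "GeoJSON"},
        {"name": "Boundary attributes", "format": "csv"},
    ],
}

# Someone working only with spreadsheets would skip the geographic files.
for res in resources_in_formats(multi_format_dataset, ["CSV", "XLSX"]):
    print(res["name"], res["format"])
```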

Many countries tend to release their data in PDF format. Unfortunately, data in PDF form is usually difficult to work with. When we come across PDF data that we believe could be useful to an African data journalist, we therefore try to extract the data and present it in CSV or XLSX format, which is much easier to work with. Whenever we do this, we include links to the original PDF documents so that anyone who wishes to use the extracted data can verify its authenticity and correctness.

### Finding and accessing data
Our data is organised in terms of `datasets`, `organisations`, `groups` and `tags`. Datasets have already been described above.

#### Organisations
Each organisation that is part of ADH is represented by an organisation on CKAN. Any data that was found, produced or used by a particular ADH partner can be found in that partner's [organisation](https://ckan.africadatahub.org/organization/). 

#### Groups
[Groups](https://ckan.africadatahub.org/group/) indicate the data category, for example `Health` data or `Economic` data. A dataset may belong to more than one group. Groups also serve as folders for all datasets used in a particular data tool or project. Finally, if several datasets have all been sourced from the same place, they are placed in a group named after that source. See, for example, the [Humanitarian Data Exchange](https://ckan.africadatahub.org/group/hdx-humanitarian-data-exchange) group and the accompanying [blog post](https://heikoheilgendorff.github.io/adh_data_blog/hdx/overview/ckan/data/2022/08/31/HDX_Blogpost.html) **ENTER PUBLISHED LINK HERE**.
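
Organisations, groups and tags can also be used as search filters through CKAN's `package_search` action. The sketch below only builds the search URL and does not send a request. It assumes the ADH instance exposes the Action API publicly and that the standard filter fields `organization` (note the US spelling), `groups` and `tags` apply. The group name is taken from the Humanitarian Data Exchange link above; `build_search_url` is an illustrative helper, not part of CKAN.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

CKAN_URL = "https://ckan.africadatahub.org"  # repository linked in this post


def build_search_url(organisation=None, group=None, tag=None,
                     rows=10, base_url=CKAN_URL):
    """Build a package_search URL filtered by organisation, group and/or tag."""
    filters = []
    if organisation:
        filters.append(f'organization:"{organisation}"')
    if group:
        filters.append(f'groups:"{group}"')
    if tag:
        filters.append(f'tags:"{tag}"')

    params = {"rows": rows}
    if filters:
        # Every filter must match, so they are combined with AND.
        params["fq"] = " AND ".join(filters)
    return f"{base_url}/api/3/action/package_search?{urlencode(params)}"


url = build_search_url(group="hdx-humanitarian-data-exchange", rows=5)
print(url)
# Decode the query string again to show the filter in readable form.
print(parse_qs(urlsplit(url).query))
```

Opening the printed URL in a browser should return JSON whose `result.results` list holds the matching datasets, provided the assumptions above hold for this instance.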

#### Tags


### Uploading Data
Permissions, data custodians
