# Module 1 — Research Data Life Cycle

**Learning objective:** Understand and describe the main phases of the Research Data Life Cycle and how they apply to oceanographic data.

*“By failing to prepare, you are preparing to fail.” (Benjamin Franklin)*

### Lessons
1. Introduction to the Research Data Life Cycle
2. Phases of the RDLC
3. Open data
4. The FAIR Principles

--- 

### Introduction

The Research Data Life Cycle (RDLC) enables us to visualize and map the different phases that research data pass through. Several schemes and interpretations of an RDLC exist, but they all share the same general flow: data are collected to gather information, this information leads to knowledge, and knowledge in turn leads to wisdom.

This module will introduce you to the different phases of the Research Data Life Cycle (RDLC) and briefly explain what each of these phases generally encompasses. In the following modules, each phase of the RDLC will be dealt with in more detail, with specific aspects and best practices thoroughly explained.

#### Learning Outcomes
After completing this module successfully, you should be able to:

- Describe the concept of chaos in relation to data and information
- List the phases of the Research Data Life Cycle
- Describe the phases of the Research Data Life Cycle
- Describe the FAIR principles
- Explain the characteristics of open data 
- Identify the pros and cons of open data
- Compare FAIR and open data


## M1. Lesson 1. Introduction to RDLC

### Introduction

The Research Data Life Cycle (RDLC) enables the visualisation and mapping of the different steps research data go through. This, in turn, allows us to associate these steps with short-term and long-term data management requirements. Several schemes and interpretations of an RDLC exist. Although they differ, they all share the same general concepts and flow: data are collected to gather information, this information brings knowledge, and from this knowledge we evolve to wisdom. This flow is also known as the Data-Information-Knowledge-Wisdom (DIKW) model, or DIKW pyramid.

Data are very valuable. They are collected for a reason, and collecting them requires financial support. Data and information are therefore regarded, together with the people who work there, as among the most valuable assets of an institute, organisation, university or company. However, without continuous efforts to keep the data in order, you are likely to end up in ‘data chaos’ or ‘data entropy’.

Let’s look a little more closely at the concept of chaos, or entropy. Close your eyes and picture your living room, closet, basement, garage, attic, kitchen cupboards, storage room, … Are these places well organised, so that you can find things immediately and have no trouble storing new things neatly? Then you are a talented organiser or manager! But what are the odds that at least one of these places is somewhere you put new things at random, or dump things that no longer belong elsewhere, until after a while you cannot find that one item you are absolutely certain you stored there? Welcome to chaos, or entropy. Now take an objective look at your computer and how you store your files… What do you see? Order or chaos? Or ‘organised chaos’, where you can still find your way around your files but someone else could not? Welcome to the world of ‘data chaos’ or ‘data entropy’.

Now scale the way you organize the files on your laptop up to an entire institute. The level of chaos or entropy becomes much higher, and much harder to manage if a basic structure is missing. And no, data chaos is not simply an IT problem for data engineers or data architects to deal with. Data organization needs to happen at all levels, and it starts with proper instructions and guidance for everyone who collects and works with the data, throughout the full life cycle of the data at hand. Ensuring this level of order, and so avoiding data chaos, is also part of the Research Data Life Cycle.

A wide variety of books and articles have already been written on the Research Data Life Cycle. What all of them make clear is that planning is crucial and must happen at the very start of the life cycle, even before the actual research or project begins. In this planning phase, the Data Management Plan (DMP) is drawn up; many funding agencies now require one as part of a grant application. The Data Management Plan specifies how data will be handled both during and after a project. Because the DMP is such an essential component, it is addressed in detail in Module 2.

### The research data lifecycle: What is it?

The research data lifecycle is a key concept within Research Data Management (RDM). It describes the different stages research data go through before, during, and after a research project. Various data management activities take place within each stage of the data lifecycle, and the choices made in one stage influence the next one. Watch this knowledge clip: *The research data lifecycle* (Ghent University Data Stewards, 2020, CC BY) (https://www.youtube.com/watch?v=OL_Vd9dd-AQ).

## M1. Lesson 2. Phases of the Research Data Life Cycle (RDLC)
### Phases of the RDLC

The number of phases defined in the Research Data Life Cycle (RDLC) varies from source to source, depending on how people look at it. No matter how many phases are defined (typically between 5 and 8), the concepts and key topics remain the same. There can always be exceptions, where one or more of the stages do not apply to the available data.

Within this course, we follow the 6-phase division and indicate where other phase divisions differ.

The following scheme is a simple representation of the 6-phase Research Data Life Cycle, and will be used throughout this course.

- Phase 1: Planning
- Phase 2: Collecting data
- Phase 3: Processing and analysing data
- Phase 4: Preserving data
- Phase 5: Sharing data
- Phase 6: Reusing data

Just as the number of phases in the Research Data Life Cycle varies, so does the way the RDLC is displayed visually. The scheme below can be used as a mnemonic throughout the course, to help situate individual actions within the full RDLC.

<img src="../images/data_life_cycle.png" alt="Diagram of the six-phase Research Data Life Cycle" width="800"/>


### Phase 1. Planning

*Also referred to as 'Plan & Design'*

The Planning phase is where the full research project is discussed and written down. Specifications and agreements are decided upon, the system architecture is set up, the life cycle of the data is mapped out, and the required toolsets, infrastructure and standards are agreed upon, together with arrangements for their implementation. All of this is information required in the Data Management Plan, sometimes also referred to as the Evidence Plan. It is also in this phase that researchers identify the data they will collect to answer their research questions.

The Data Management Plan is explained in full in Module 2 - Planning.

Before moving on to the module on the Data Management Plan (DMP), take some time to watch the video “How to avoid a data management nightmare” (NYU Health Sciences Library, CC BY) (https://www.youtube.com/watch?v=nNBiCcBlwRA).

### Phase 2. Collecting data

*Also referred to as ‘data acquisition’, ‘collect & create’, ‘find & create’, ‘generation & collection’, ‘data collection’ or ‘data generation & collection’*

The ‘collect & create’ phase is all about the actual capturing of the data. Observations are made either by hand or with sensors or other instruments, and the data are converted into a digital format. The creation and collection of data varies greatly between and within scientific disciplines. This makes it necessary to document in detail how the data were collected, and to make sure this information ends up in the associated metadata.

For data managers and data stewards, this phase is of particular importance. Throughout the data acquisition phase, a strong focus must go to the creation, management and maintenance of metadata, not only to the data themselves. Because data without appropriate metadata can be completely worthless, it is the task of data managers, curators and stewards to guide researchers in documenting the metadata properly.
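To make the point that a measurement without metadata has little value more concrete, here is a minimal Python sketch. It assumes a hypothetical set of required metadata fields (`REQUIRED_METADATA`) for a single oceanographic observation; in a real project these would come from the Data Management Plan or a community metadata standard. The helper `missing_metadata` is also hypothetical and simply lists which required fields are absent or empty. All values are illustrative.

```python
from datetime import datetime, timezone

# Hypothetical minimal set of metadata fields for one oceanographic observation.
# Real projects should take their required fields from the DMP or a community standard.
REQUIRED_METADATA = ("instrument", "method", "latitude", "longitude",
                     "depth_m", "time_utc", "units")

def missing_metadata(record: dict) -> list:
    """Return the required metadata fields that are absent or empty in a record."""
    metadata = record.get("metadata", {})
    return [field for field in REQUIRED_METADATA if metadata.get(field) in (None, "")]

# An illustrative observation documented with its metadata.
observation = {
    "value": 12.4,  # the measurement itself
    "metadata": {
        "variable": "sea_water_temperature",
        "units": "degC",
        "instrument": "CTD",
        "method": "hypothetical CTD cast, downcast only",
        "latitude": 51.43,
        "longitude": 2.81,
        "depth_m": 5.0,
        "time_utc": datetime(2023, 6, 1, 10, 30, tzinfo=timezone.utc).isoformat(),
    },
}

# The same value as it might end up in a spreadsheet without any documentation.
bare_value = {"value": 12.4}

print(missing_metadata(observation))  # []
print(missing_metadata(bare_value))   # every required field is reported missing
```

The bare value `12.4` is the same number in both records, but only the documented one tells a future user what was measured, where, when, how and in which units.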

The topic of metadata is explained in full in Module 3 - Data acquisition.

### Phase 3. Processing and analysing data

*Also referred to as ‘data curation & processing’*

The collected data must be processed before they can be used to test the hypotheses. The full organisation and standardization of the (meta)data takes place in this phase, as well as the extraction and wrangling of the data. This may involve cleaning the data, transforming them and running them through quality control procedures. All processing steps need to be thoroughly documented, so that the end results of the analyses can be replicated starting from the raw data. Data analysis is about questioning the data to gain insights that either confirm or refute the research hypotheses. Here too, all methods, software and code used for the analysis need to be documented in detail.
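The requirement that every processing step be documented so results can be replicated from the raw data can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed QC procedure: the readings, the plausible range `VALID_RANGE` and the helper names (`log_step`, `drop_missing`, `range_check`) are all hypothetical. Each step records its name and parameters in `processing_log`, and the raw data are never modified.

```python
import json

# Hypothetical raw temperature readings (degC); None marks a missing value.
raw = [12.1, 12.3, None, 45.0, 12.2, -3.5, 12.4]

# Hypothetical plausible range for this variable and region; in practice the
# threshold comes from the project's documented QC procedure.
VALID_RANGE = (-2.0, 35.0)

processing_log = []  # one entry per step, so the processing can be replayed on the raw data

def log_step(name, **params):
    """Record a processing step and the parameters it was run with."""
    processing_log.append({"step": name, "params": params})

def drop_missing(values):
    """Return a new list without missing values (the input list is left untouched)."""
    log_step("drop_missing")
    return [v for v in values if v is not None]

def range_check(values, low, high):
    """Flag values outside [low, high] instead of silently deleting them."""
    log_step("range_check", low=low, high=high)
    return [(v, "good" if low <= v <= high else "bad") for v in values]

cleaned = drop_missing(raw)
flagged = range_check(cleaned, *VALID_RANGE)

print(flagged)
print(json.dumps(processing_log, indent=2))
```

Flagging suspicious values rather than deleting them keeps the decision visible and reversible, and the log, together with the untouched raw data, is what allows someone else to reproduce the result.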

How to organize data is explained in Module 4 - Data processing and analysing.

### Phase 4. Preserving data

*Also referred to as ‘evaluate & archive’*

Towards the end of a project, the data that support the hypotheses and outcomes need to be preserved for the long term, because of their lasting value. The goal of this phase is to ensure that data remain available beyond the research project, which in many cases involves depositing the digital data in a data repository or data centre. Preservation also applies in the very short term, to minimise the risk of data loss.

Preservation activities may involve quality assurance of data, file format conversion, creation of metadata records with assignment of Digital Object Identifiers (DOIs) to datasets, licensing datasets for re-use, and putting in place any required access controls. Several of these actions can already be taken in other phases of the research data life cycle. Confidential and non-digital data may be held locally or in a non-public location, in which case they should be managed by an accountable person or group who can ensure they are stored and preserved properly.

The complete opposite of safeguarding data, namely data destruction, is also part of this phase of the research data life cycle. Data destruction or purging means removing every copy of a data item from an organisation. The challenge is to ensure that the data have been properly destroyed and, before destruction, that their required regulatory retention period has expired.

Module 5 is dedicated entirely to this phase of the research data life cycle.

### Phase 5. Sharing data

This phase usually begins once the actual research has been done: data have been collected, analyses have been performed and first results are available. At that point, the research data can be published for re-use, either by depositing them in a repository with metadata and a license, or by publishing a data paper, or, preferably, both.

At this point in the data life cycle, it is important that data are disseminated in a timely fashion. Once data have been processed and placed in the appropriate archive, they should be made available online. An agreed data exchange policy should be documented. Even if the data are still under embargo or other restrictions apply, they should be made discoverable by publicly sharing their metadata. This way, other researchers can learn about the existence and background of the data, get in touch with the right people to obtain more information on the data themselves and, whenever relevant, discuss the possibilities for collaboration.
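The idea that metadata should be public even while the data themselves are under embargo can be illustrated with a small Python sketch. It assumes a hypothetical dataset record whose field names (`embargo_until`, `data_url`, and so on) are illustrative and do not follow any particular repository schema; the `example.org` addresses are placeholders. The hypothetical function `public_record` always exposes the metadata and adds the link to the data only once the embargo date has passed.

```python
from datetime import date

# Hypothetical dataset record; field names are illustrative, not a repository schema.
dataset = {
    "title": "Hypothetical CTD profiles, North Sea, 2023",
    "contact": "data-manager@example.org",
    "license": "CC BY 4.0",
    "embargo_until": date(2025, 1, 1),
    "data_url": "https://repository.example.org/files/ctd_2023.csv",
}

def public_record(ds, today):
    """Return what the public can see: metadata always, the data link only after the embargo."""
    record = {key: value for key, value in ds.items() if key != "data_url"}
    record["embargo_until"] = ds["embargo_until"].isoformat()
    if today >= ds["embargo_until"]:
        record["data_url"] = ds["data_url"]
    return record

print(public_record(dataset, today=date(2024, 6, 1)))  # metadata only
print(public_record(dataset, today=date(2025, 3, 1)))  # metadata and data link
```

During the embargo, the title, contact and license are already discoverable, so other researchers know the dataset exists and whom to contact about it.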

How data can be shared is tackled in detail in Module 6.

### Phase 6. Reusing Data

Data that are available for discovery and access may be re-used by other researchers, either to substantiate the findings of the original research or to generate new insights through further interrogation and analysis. At this stage the data may become the raw material for a new cycle of research. Research data may also have other valuable uses, e.g. in policy-making, the development of commercial products and services, and teaching.