# A1: A brief introduction to reproducible research (in the social sciences)

- Concerns about the replicability and reproducibility of scientific research {cite:p}`goodmanWhatDoesResearch2016`

- You might have heard about the "replication crisis"

- Example of Reinhart and Rogoff (see also [Memo to Reinhart and Rogoff: I think it’s best to admit your errors and go on from there](https://statmodeling.stat.columbia.edu/2013/04/16/memo-to-reinhart-and-rogoff-i-think-its-best-to-admit-your-errors-and-go-on-from-there/); for more examples see https://twitter.com/kirstie_j/status/1360172705933365248)

- Very hands-on example of an incorrect paper by T. K. Moon on "The Expectation-Maximization Algorithm" and an explanation of what went wrong by Dennis Ogbe: https://ogbe.net/blog/sloppy_papers.html



## Terminology

### Terms used in the literature
- Reproducibility 
- Replicability
- ...
- Repeatability
- Readability
- ...
- Reliability 
- Robustness 
- Generalizability
- ...

### What terms do we use?

- Papers across disciplines provide a multitude of (sometimes conflicting) definitions {cite:p}`goodmanWhatDoesResearch2016, pengReproducibleResearchComputational2011, freeseReplicationSocialScience2017`

- Our approach is as follows: Reproducibility $\neq$ Replicability {cite:p}`barbaTerminologiesReproducibleResearch2018`

- More precisely {cite:t}`stoddenImplementingReproducibleResearch2014`:

::::{important} 
**Reproducibility** is the calculation of quantitative scientific results by independent scientists using the original datasets and methods 
::::

::::{important} 
**Replication** is the practice of independently implementing scientific experiments to validate specific findings 
::::

### The Turing Way's definition of reproducible research

![](fig/reproducible-matrix.jpg)

- "**Reproducible**: A result is reproducible when the same analysis steps performed on the same dataset consistently produces the same answer.

- **Replicable**: A result is replicable when the same analysis performed on different datasets produces qualitatively similar answers.

- **Robust**: A result is robust when the same dataset is subjected to different analysis workflows to answer the same research question (for example one pipeline written in R and another written in Python) and a qualitatively similar or identical answer is produced. Robust results show that the work is not dependent on the specificities of the programming language chosen to perform the analysis.

- **Generalisable**: Combining replicable and robust findings allow us to form generalisable results. Note that running an analysis on a different software implementation and with a different dataset does not provide generalised results. There will be many more steps to know how well the work applies to all the different aspects of the research question. Generalisation is an important step towards understanding that the result is not dependent on a particular dataset nor a particular version of the analysis pipeline."

(Source: https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions.html)
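The first three definitions can be illustrated with a small toy example. This is only a minimal sketch and not part of The Turing Way: the simulated data, the helper names (`simulate_dataset`, `analysis_mean`, `analysis_median`) and the tolerance used for "qualitatively similar" are all assumptions made for illustration.

```python
import random
import statistics

# Hypothetical tolerance for "qualitatively similar"; an arbitrary choice.
TOLERANCE = 1.0


def simulate_dataset(seed, n=200):
    """Draw a hypothetical sample; the seed fixes the random draws."""
    rng = random.Random(seed)
    return [rng.gauss(10.0, 2.0) for _ in range(n)]


def analysis_mean(data):
    """Workflow A: estimate the centre of the data with the arithmetic mean."""
    return statistics.fmean(data)


def analysis_median(data):
    """Workflow B: estimate the centre of the data with the median."""
    return statistics.median(data)


original = simulate_dataset(seed=1)
new_sample = simulate_dataset(seed=2)

# Reproducible: same data, same analysis -> exactly the same answer.
reproducible = analysis_mean(original) == analysis_mean(simulate_dataset(seed=1))

# Replicable: different data, same analysis -> qualitatively similar answer.
replicable = abs(analysis_mean(original) - analysis_mean(new_sample)) < TOLERANCE

# Robust: same data, different analysis -> qualitatively similar answer.
robust = abs(analysis_mean(original) - analysis_median(original)) < TOLERANCE

print(reproducible, replicable, robust)
```

Fixing the seed is what makes the first check exact. In real projects, "the same dataset" and "the same analysis steps" additionally depend on archived data and pinned software versions, which this sketch does not cover.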

### Reproducibility in more hands-on terms

> "'Reproducibility is just collaboration with people you don't know, including yourself next week' – @philipbstark #dsesummit #openscience"

(Source: https://twitter.com/jakevdp/status/519563939177197571, accessed on 2021-10-27)

### Reproducible research is just research done right

> "Reproducible research is a by-product of careful attention to detail throughout the research process and allows researchers to ensure that they can repeat the same analysis multiple times with the same results, at any point in that process. Because of this, researchers who conduct reproducible research are the primary beneficiaries of this practice" {cite:p}`alstonBeginnerGuideConducting2021`. 


## Opportunities and obstacles of reproducible research

The [Turing Way](https://the-turing-way.netlify.app/reproducible-research/overview/overview-barriers.html) as well as {cite:t}`alstonBeginnerGuideConducting2021` provide an overview of possible barriers to reproducibility: 

- Limited incentives to give evidence against yourself

- Publication bias towards novel findings

- Not considered for promotion

- Big data and complex computational infrastructure

- Takes time

- Requires additional skills

- Intellectual property rights

Some of the technical aspects will be discussed in section [](section:computer-literacy).

## Why does it matter?

> "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures" {cite:p}`buckheitWaveLabReproducibleResearch1995`. 


{cite:t}`leeperReproducibleResearchWhat2014` distinguishes between external and internal reasons:

**External reasons** 

- Trust in scientific results is of immense importance; hence, make sure that your research results can be trusted

- Other scientists can build on your workflow and procedures, which helps to accumulate scientific knowledge

- Requirements of journals or funding agencies

(See also: https://the-turing-way.netlify.app/reproducible-research/overview/overview-benefit.html)


**Internal reasons**

- You get questions about one of your earlier papers -- and are unable to reproduce the analyses (or are even unable to share the data)

- Confidence in your own work

- Easier workflow

- Easier collaboration

Another list of reasons is provided by {cite:t}`alstonBeginnerGuideConducting2021`. The authors distinguish between "Reproducible research benefits those who do it" and "Reproducible research benefits the research community":


**Reproducible research benefits those who do it** 

- It helps researchers remember how and why they performed specific analyses during the course of a project

- It enables researchers to quickly and simply modify analyses and figures

- Reproducible research enables quick reconfiguration of previously conducted research tasks so that new projects that require similar tasks become much simpler and easier

- Conducting reproducible research is a strong indicator to fellow researchers of rigor, trustworthiness, and transparency in scientific research

- Reproducible research increases paper citation rates

**Reproducible research benefits the research community**

- Reproducible research allows others to learn from your work

- Reproducible research allows others to protect themselves from your mistakes

Finally, The Turing Way also provides a list of reasons why reproducible research might be beneficial: https://the-turing-way.netlify.app/reproducible-research/overview/overview-benefit.html

## References

```{bibliography}
```