## How to create a graph for publication in an open and reproducible way

By [Serena Bonaretti](https://sbonaretti.github.io/)  
Content under Creative Commons Attribution license CC-BY-NC-SA 4.0     
Code under GNU-GPL v3 License 

---

- The task is to produce a figure similar to Fig. 8 in the paper:  
  *Orellana et al. 2024. **Revealing the complexity of meniscus microvasculature through 3D visualization and analysis***  
  https://doi.org/10.1038/s41598-024-61497-2

- This notebook is also an example of how to create a **reproducible workflow** using Jupyter Notebook, Python, Zenodo, and GitHub.

- These are the 3 steps that make this workflow reproducible:   
  1. [Automatically downloading data from a repository](#download)  
  2. [Automating data manipulation and plot creation](#manipulation)
  3. [Printing dependencies](#dependencies) 

--- 
Installing the Python packages that we will need:

In [None]:
!pip3 install wget

In [None]:
!pip3 install watermark

--- 
- Importing the Python libraries that we will need below:

In [None]:
import wget                     # to download from zenodo
import pandas as pd             # to manipulate the table
import numpy as np              # for some math
import matplotlib.pyplot as plt # to plot 

---
<a name = "download"></a>
## 1. Automatically downloading data from a repository

- Input data should be in a repository that provides a **persistent digital object identifier (DOI)** so that the data remain available in the future
- Sharing data from **personal repositories** is *discouraged* because their links tend to break or get deleted, which compromises the reproducibility of the workflow

- The data that we will use in this notebook is in the file `2024_Orellana_data_fig_8.csv` at this Zenodo link: https://doi.org/10.5281/zenodo.11491110

In [None]:
# ---> complete the zenodo url with the last digits of the specific version of the dataset DOI
zenodo_url = "https://zenodo.org/record/      /files/"  
print (zenodo_url)

In [None]:
# ---> enter the file name
file_name  = "" 
print (file_name)

In [None]:
# download using wget.download(repository, local)
wget.download(zenodo_url + file_name, file_name) 
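
One way to keep the download step safe to re-run is to check whether the file is already on disk before fetching it. The sketch below assumes the URL pattern used in the cells above; the helper names, the record ID `0000000`, and the file name `example_table.csv` are placeholders for illustration, not the real values for this dataset.

```python
import os

def build_file_url(record_id, file_name):
    """Compose a download URL following the pattern used above."""
    return "https://zenodo.org/record/" + record_id + "/files/" + file_name

def needs_download(local_path):
    """Return True only if the file is not already on disk."""
    return not os.path.exists(local_path)

# hypothetical values, for illustration only
example_url = build_file_url("0000000", "example_table.csv")
print(example_url)
print(needs_download("example_table.csv"))
```

In the notebook, the call to `wget.download` could then be placed inside an `if needs_download(file_name):` block.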

In [None]:
# read the table
df = pd.read_csv(file_name)
# show the table
df

---
<a name = "manipulation"></a>
## 2. Automating data manipulation and plot creation

- **Automatic** data manipulation leaves the original data untouched and keeps track of every manipulation, making analyses reproducible
- **Manual** manipulation is *discouraged*: it alters the original data, is prone to errors, and does not keep track of changes, making analyses hard to reproduce



In [None]:
# calculate the mean for each column
means = df.mean()
print ("means:", means)

In [None]:
# calculate the standard error for each column
standard_errors = df.sem()
print ("standard_errors:", standard_errors)
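
To make explicit what `sem()` computes, the sketch below reproduces it by hand as the sample standard deviation divided by the square root of the number of observations. The table `toy` and its column names are made up for illustration and are not the published data.

```python
import numpy as np
import pandas as pd

# made-up numbers, for illustration only (not the published data)
toy = pd.DataFrame({"zone_a": [10.0, 12.0, 14.0],
                    "zone_b": [20.0, 26.0, 29.0]})

n = toy.count()                             # number of observations per column
sem_by_hand = toy.std(ddof=1) / np.sqrt(n)  # sample standard deviation / sqrt(n)
sem_pandas = toy.sem()                      # pandas uses ddof=1 by default

print(sem_by_hand)
print(sem_pandas)
print(np.allclose(sem_by_hand, sem_pandas))
```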

In [None]:
# transform the mean from a pandas series into a list
means = means.to_list()
print ("means:", means)

In [None]:
# ---> transform the standard error from a pandas series into a list


In [None]:
# separate mean for lateral and medial sides
m_lateral = means[::2]
m_medial  = means[1::2]

print ("m_lateral", m_lateral)
print ("m_medial", m_medial)
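
The slices `[::2]` and `[1::2]` work only if the columns alternate between lateral and medial, starting with lateral. As a possible alternative, the sketch below selects values by column name, which does not depend on column order. The column names and numbers here are hypothetical; the names in the real table may differ.

```python
import pandas as pd

# made-up values and hypothetical column names, for illustration only
toy_means = pd.Series({"anterior_lateral": 30.0, "anterior_medial": 25.0,
                       "posterior_lateral": 40.0, "posterior_medial": 35.0})

# slicing by position, as in the cell above: relies on the column order
by_position = toy_means.to_list()[::2]

# selecting by name: independent of the column order
lateral_columns = [c for c in toy_means.index if c.endswith("_lateral")]
by_name = toy_means[lateral_columns].to_list()

print(by_position)
print(by_name)
```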

In [None]:
# ---> separate standard error for lateral and medial sides


In [None]:
# create the plot

# position of the bars on x
x_axis = np.arange(len(m_lateral)) 

# plotting the bars for the lateral meniscus
plt.bar(x_axis - 0.2,                # position on x
        m_lateral,                   # y
        0.35,                        # bar width
        label = 'Lateral meniscus',  # label for legend
        yerr=[[0,0,0,0],se_lateral], # error bar
        capsize=4,                   # length of error bar tick
        color='#3B65AE',             # bar color
        edgecolor='black',           # color of bar edge 
        linewidth=1.8                # width of bar edge
       ) 

# plotting the bars for the medial meniscus
plt.bar(x_axis + 0.2, 
        m_medial, 
        0.35, 
        label = 'Medial meniscus',
        yerr=[[0,0,0,0], se_medial],
        capsize=4,
        color='#BED7EF',
        edgecolor='black',
        linewidth=1.8
       ) 

# adding other characteristics to the graph
plt.xticks(x_axis, ["Anterior zone", "Mid-anterior\nzone", "Mid-posterior\nzone", "Posterior zone"]) 
plt.ylabel("Vascular volume contribution\n(%)")
plt.ylim([0,100])
plt.legend(loc="upper center")
plt.show()
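
For publication, the figure usually also needs to be written to a file. The sketch below is a minimal example with made-up values: it shows how a two-row `yerr` (lower errors, then upper errors) draws only the upper whisker, and how `savefig` writes the result to disk. The file name `figure_sketch.png`, the resolution, and the figure size are assumptions to adapt to the journal's requirements.

```python
import os
import matplotlib.pyplot as plt
import numpy as np

# made-up values, for illustration only (not the published data)
toy_heights = [30.0, 20.0, 25.0, 25.0]
toy_errors = [3.0, 2.0, 2.5, 4.0]
x_pos = np.arange(len(toy_heights))

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(x_pos, toy_heights, 0.35,
       yerr=[[0.0] * len(toy_errors), toy_errors],  # [lower, upper]: upper whisker only
       capsize=4, color="#3B65AE", edgecolor="black")
ax.set_ylabel("Vascular volume contribution\n(%)")
ax.set_ylim([0, 100])

out_file = "figure_sketch.png"  # hypothetical file name
fig.savefig(out_file, dpi=300, bbox_inches="tight")
plt.close(fig)
print(os.path.exists(out_file))
```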

<a name="dependencies"></a>
## 3. Printing dependencies

- Recording dependencies is fundamental to documenting the **computational environment**.   
- We use [watermark](https://github.com/rasbt/watermark) to print the versions of Python, IPython, and the imported packages, as well as the characteristics of the computer
  - *Note:* Watermark is used here as a *Jupyter notebook extension* (an IPython magic) rather than imported like a regular Python package; this is why its commands start with `%`

In [None]:
%load_ext watermark

# printing date, python version, ipython version, and machine characteristics
%watermark 

# printing the versions of the imported Python packages
%watermark --iversions 
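
Outside a notebook, where `%` magics are not available, similar information can be collected with the Python standard library. The sketch below is one possible alternative, not how watermark works internally; the list `package_names` is an assumption and should match the packages a given notebook imports.

```python
import platform
from importlib import metadata

# packages to report; adjust this list to the packages a notebook imports
package_names = ["pandas", "numpy", "matplotlib"]

versions = {}
for name in package_names:
    try:
        versions[name] = metadata.version(name)
    except metadata.PackageNotFoundError:
        versions[name] = "not installed"

print("python:", platform.python_version())
for name, version in versions.items():
    print(name + ":", version)
```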

---
## Last steps

- When publishing a paper, this kind of notebook should be uploaded to a GitHub repository
- The **link to the notebook on GitHub** can be **added to the figure caption** so that readers can reproduce the graph
- *Note*: It is quite possible that in the near future we will write **digital, reproducible, and interactive papers** (using [MyST](https://mystmd.org/)!), that is, papers with interactive graphs embedded in the publication. This is starting to happen for conferences (like [SciPy 2024](https://www.scipy2024.scipy.org/)) and will happen soon for journals in the Earth Sciences
- To be continued...