# Reproducible publications with Jupyter Notebooks
A community initiative to assess the reproducibility of publications accompanied by Jupyter Notebooks

_source | https://github.com/sparcopen/open-research-doathon/issues/25_

![source: https://github.com/Daniel-Mietchen/learning2code/milestone/3](../img/open_data_doathon.png)

# Loading dependencies


```r
library(europepmc)
library(data.table)
library(DT)
```

# Setting parameters

```r
# Maximum number of records to retrieve from Europe PMC
limit_entries <- 100000
```

# Retrieve information programmatically for publications in Europe PMC

The idea and the code snippet below come from [@rossmounce](https://github.com/rossmounce) <img src="https://www.pngfind.com/pngs/m/40-405156_github-octocat-logo-black-and-white-transparent-github.png"  width="25" align="center" >; you can find them [in the discussion thread of the corresponding GitHub issue](https://github.com/sparcopen/open-research-doathon/issues/25#issuecomment-283959368).

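```r
# Search Europe PMC for articles mentioning Jupyter or .ipynb (synonyms expanded)
hits <- europepmc::epmc_search(query = "jupyter OR ipynb", synonym = TRUE,
                               limit = limit_entries)

dim(hits)
# Save the results; the file name records the number of hits
write.csv(hits, row.names = FALSE,
          file = paste0("../data/3-reproducible-publications/hits_", nrow(hits), ".csv"))
```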
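Under the hood, `epmc_search()` talks to the Europe PMC REST API, where a search is a single URL: the query text is percent-encoded (a space becomes `%20`) and options such as `synonym=TRUE` are separate URL parameters. The minimal base-R sketch below assembles such a URL by hand, only to show its shape. It assumes the public search endpoint `https://www.ebi.ac.uk/europepmc/webservices/rest/search` and its `query`, `synonym`, `format` and `pageSize` parameters; the helper name `build_epmc_url()` is hypothetical, and in the notebook `epmc_search()` already handles this (plus paging) for you.

```r
# Hypothetical helper: assemble a Europe PMC REST search URL by hand.
# Assumes the endpoint below and its query/synonym/format/pageSize parameters.
build_epmc_url <- function(query, synonym = TRUE, page_size = 25) {
  base <- "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
  params <- c(
    query    = utils::URLencode(query, reserved = TRUE),  # spaces become %20
    synonym  = if (synonym) "TRUE" else "FALSE",
    format   = "json",
    pageSize = as.character(page_size)
  )
  # Join the parameters as name=value pairs separated by '&'
  paste0(base, "?", paste(names(params), params, sep = "=", collapse = "&"))
}

url <- build_epmc_url("jupyter OR ipynb")
print(url)
```

Pasting the printed URL into a browser should return the first page of results as JSON, which is a quick way to sanity-check a query before running the full search.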

```r
# Read the saved search results back in as a data.table
epmc_search_results <- paste0("../data/3-reproducible-publications/hits_",
                              nrow(hits),
                              ".csv")
message("\neuropepmc::epmc_search results file: ", epmc_search_results)

hits <- data.table::fread(epmc_search_results)
head(hits, 2)
```


# Let's try the link trick again!

We will turn the entries of the `pmcid` column into hyperlinks that take us to the original article, using the HTML trick we learned in the session `2-plotting-in-R`.

```r
# Turn each PMCID into a clickable HTML link that opens the article in a new tab
hits$pmcid <- paste0("<a href='https://europepmc.org/articles/",
                     hits$pmcid,
                     "' target='_blank'>",
                     hits$pmcid,
                     "</a>")
```

```r
DT::datatable(head(hits, 2),
              escape = FALSE) # required for the link trick to work
```
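The link trick can also be packaged as a small function. This sketch (the name `make_epmc_link()` is our own, and `PMC0000000` is a placeholder ID) additionally leaves missing PMCIDs as empty strings instead of producing links to `.../articles/NA`. Like the cell above, it assumes Europe PMC article pages are reachable at `https://europepmc.org/articles/<PMCID>`.

```r
# Hypothetical helper: wrap PMCIDs in HTML anchors that open in a new tab.
# Missing IDs (NA or "") become empty strings rather than broken links.
make_epmc_link <- function(pmcid) {
  links <- paste0("<a href='https://europepmc.org/articles/", pmcid,
                  "' target='_blank'>", pmcid, "</a>")
  ifelse(is.na(pmcid) | pmcid == "", "", links)
}

ids <- c("PMC0000000", NA)  # placeholder ID and a missing value
make_epmc_link(ids)
```

In the notebook you could write `hits$pmcid <- make_epmc_link(hits$pmcid)` instead of the `paste0()` cell, but run it only once: a second run would wrap the existing anchors in another link.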

# And now let's keep only the columns we really need
The title of the publication, its publication year, and the hyperlink to the article in Europe PMC.

```r
toKeep <- c("pmcid", "title", "pubYear")
DT::datatable(hits[, toKeep, with = FALSE],
              escape = FALSE) # required for the link trick to work
```

# Our goal for the rest of the session

- Take a look at the publications in the interactive table above.
- Find one that interests you and visit the link.
- Does the publication have a supplementary `.ipynb` file, or is the notebook only mentioned in the text (a false positive!)?
- While on the publication page, press `CTRL + F` and search for the word `github`. Is there a link that points to a repository?

If yes, perfect!

- On the repository page, find the <img src="../img/clone_or_download.png"  width="100" align="center" > button.
- Once you have the link, come back to __CloudOS__ and let's try to inspect the code.
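Searching a page for `github` by hand can be partly automated. The sketch below uses a deliberately simple regular expression of our own (not an official GitHub rule, so it will miss some link forms) to pull `github.com/<owner>/<repo>` links out of a piece of text and derive HTTPS clone addresses, which typically have the form `https://github.com/<owner>/<repo>.git`. The function name `find_github_repos()` and the sample text are hypothetical.

```r
# Hypothetical helper: find GitHub repository links in free text and derive
# HTTPS clone URLs. The pattern is a simplification and will miss some forms.
find_github_repos <- function(text) {
  pattern <- "https?://(www\\.)?github\\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+"
  urls <- unlist(regmatches(text, gregexpr(pattern, text)))
  urls <- sub("[.]+$", "", urls)    # drop sentence-ending full stops
  urls <- sub("\\.git$", "", urls)  # normalise links that already end in .git
  urls <- unique(urls)
  data.frame(repo = urls,
             clone_url = sprintf("%s.git", urls),  # zero-length safe
             stringsAsFactors = FALSE)
}

sample_text <- paste("Code is available at https://github.com/example-user/example-repo.",
                     "See also https://github.com/example-user/example-repo.git")
find_github_repos(sample_text)
```

Both mentions in the sample text point to the same repository, so the result has a single row; text without any GitHub link yields an empty data frame.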


# Keeping track of what was interesting, reproducible, or had a GitHub link

If you want, you can also annotate the table that we retrieved with the [{ropensci/europepmc}](https://ropensci.github.io/europepmc) R package.

For example, you can initialise a column named `github` with a placeholder value, `"edit me!"`.

```r
hits$github <- "edit me!"
```

The `DT::datatable()` function that we have been using has an argument called `editable`. Setting it to `TRUE` lets you edit the cells of the displayed table, for example in the column you have just created. In the same way, you can create other metadata columns, such as `reproducible`.

```r
toKeep <- c(toKeep, "github")
DT::datatable(hits[1:4, toKeep, with = FALSE],
              escape = FALSE, # required for the link trick to work
              editable = TRUE)
```

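Double-click an `"edit me!"` cell in the `github` column of the table above and set it to 1 for testing. Note that, outside a Shiny app, these edits only live in the browser: they are not written back to the `hits` object in R, so a table exported with the `data.table::fwrite()` function will contain only the values set in R code.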
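Because edits typed into a static `DT` widget do not reach the R object, one simple option is to record annotations in code. This is a sketch on a small made-up table: the `annotate()` helper and the PMCIDs are hypothetical, and it matches on plain PMCIDs, so it is meant to be applied before the IDs are turned into HTML links.

```r
# Hypothetical sketch: record annotations in R instead of in the widget.
# 'ann' stands in for the hits table; the PMCIDs are placeholders.
ann <- data.frame(pmcid = c("PMC0000001", "PMC0000002", "PMC0000003"),
                  stringsAsFactors = FALSE)
ann$github <- NA_integer_        # unknown until checked
ann$reproducible <- NA_integer_

# Set one metadata column for the rows whose PMCID is listed in 'ids'
annotate <- function(df, ids, column, value) {
  df[[column]][df$pmcid %in% ids] <- value
  df
}

ann <- annotate(ann, "PMC0000002", "github", 1L)
ann <- annotate(ann, c("PMC0000001", "PMC0000003"), "github", 0L)
ann <- annotate(ann, "PMC0000002", "reproducible", 1L)
ann
```

Using `NA` as the starting value keeps "not checked yet" distinct from "checked, no link" (0), which a text placeholder makes harder to count later.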

```r
data.table::fwrite(hits,
                   col.names = TRUE,
                   row.names = FALSE,
                   file = "../data/3-reproducible-publications/annotated_hits.csv",
                   sep = ",")
```
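To check an export like the one above, you can read the file back and count the annotations. The sketch below does a round trip through a temporary file with base R's `write.csv()` and `read.csv()` on a small made-up table (placeholder PMCIDs), so it does not touch the notebook's `../data` folder; with data.table, `fread()` would play the role of `read.csv()`.

```r
# Sketch: write a small annotated table to a temporary CSV, read it back,
# and summarise the 'github' column (1 = has a repository link, 0 = none).
annotated <- data.frame(pmcid  = c("PMC0000001", "PMC0000002", "PMC0000003"),
                        github = c(1L, 0L, 1L),
                        stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")
write.csv(annotated, path, row.names = FALSE)

back <- read.csv(path, stringsAsFactors = FALSE)
table(back$github, useNA = "ifany")   # counts per annotation value
back[back$github == 1, "pmcid"]       # publications flagged as having a GitHub link
```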

# Now let's look together at a really great example of a reproducible publication

```r
DT::datatable(hits[, c("pmcid", "title", "github"), with = FALSE],
              escape = FALSE)
```

# Hint: 🌧️ 🎨 🤔