Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
193 lines (121 sloc) 22.7 KB

Data is present: workshops and datathons

Enhanced data presentation using reproducible documents and dashboards

Calendar

Date Title Event and location Registration Attendance funding Organisation funding
To follow Data dashboards and Binder environments University of York Link N8 CIR N8 CIR,
SSI Fellowship
To follow Introduction to R and R Markdown Durham University Link N8 CIR N8 CIR,
SSI Fellowship
25 Aug 2020 R is for Resources, Reproducibility and oppRtunities Lancaster Conference on Infant and Early Child Development, Lancaster University Link
27 July 2020 Open data and reproducibility 2.0 UK Cognitive Linguistics Conference, University of Birmingham Link SSI Fellowship
To follow Data dashboards and Binder environments University of Liverpool Link N8 CIR N8 CIR,
SSI Fellowship
To follow Introduction to R and R Markdown University of Manchester Link N8 CIR N8 CIR,
SSI Fellowship
Event cancelled Open data and reproducibility 2.0 SatRday Newcastle upon Tyne, Newcastle University Link SSI Fellowship

Background

This project offers free activities to learn and practise reproducible data presentation across the UK. Pablo Bernabeu organises these events thanks to a Software Sustainability Institute Fellowship that covers organisation costs on a budget basis, and thanks to further support from the N8 CIR. If you would like to bring any of these events to your institution at no cost, please submit a request (see Contact).

Open-source software

Programming languages such as R and Python offer free, powerful resources for data processing, visualisation and analysis. Experience in these programs is highly valued in data-intensive disciplines.

Open data

Original data has become a public good in many research fields thanks to cultural and technological advances. On the internet, we can find innumerable data sets from sources such as scientific journals and repositories (e.g., OSF), local and national governments (e.g., London, UK [1, 2]), non-governmental organisations (e.g., data.world), etc. Researchers inside and outside academia nowadays share a lot of their data under attribution licences (e.g., Creative Commons, the UK Open Government Licence, etc.). This allows any external analysts to access these raw data, create (additional) visualisations and analyses, and share these. In society, making data more accessible can demonstrably benefit citizens (despite limitations).

Activities

Activities comprise free workshops and datathons.

Workshops

R is a programming language greatly equipped for the creation of reproducible documents and dashboards. Four workshops are offered that cover a suite of interrelated tools—R, R Markdown, data dashboards and Binder environments—, all underlain by reproducible workflows and open-source software.

Each workshop includes taught and practical sections. The practice provides a chance for participants to experience and address common issues with the code. The level of taught sections is largely tailored to participants; similarly, practice sections are individually adaptable by means of easier and tougher tasks. The duration is also flexible, and some of the workshops can be combined into the same session.

The RStudio interface is used in all workshops. Multi-levelled, real code examples are used. Throughout the workshops, and especially in the practice sections, individual questions will be encouraged.

Workshop 1: Introduction to R

This workshop can serve as an introduction to R or a revision. It demonstrates what can be done in R, and provides resources for individual training. Since the duration is limited, online courses are also recommended (see examples and fee waivers for full content).

Illustration of wide and tidy data formats, from Postma and Goedhart (2019)

Image from Postma and Goedhart (2019; https://doi.org/10.1371/journal.pbio.3000202.g001).

Workshop 2: R Markdown documents

Set your input and output in stone with R Markdown. These reports may be enriched with website features (HTML/CSS) and published as websites, PDF, or Word. Moreover, with R packages such as bookdown, bookdownplus, blogdown and flexdashboard, documents can be formatted into websites, digital books and data dashboards. Other useful packages include rmarkdown, knitr, kableExtra and ggplot2.

Example of paper created with bookdownplus (image retrieved from R bookdownplus package)

Image from bookdownplus package (https://bookdownplus.netlify.com/portfolio/).

Workshop 3: Introduction to data dashboards

Data dashboards are web applications used to visualise data in detail through tables and plots. They assist in explaining and accounting for our data processing and analysis. They don't require any coding from the end user.

Illustration of the usage of dashboards alongside data repositories

These all-reproducible dashboards are published as websites, and thus, they can include hyperlinks and downloadable files. Some of the R packages used are knitr, kableExtra, reactable, ggplot2, plotly, rmarkdown, flexdashboard and shiny. The aim of this workshop is to practise creating different forms of dashboards—Flexdashboard and Shiny—the latter of which offers greater features, and to practise also with the hosting platforms fitting each type—such as personal websites, RPubs, Binder, Shinyapps and custom servers. A great thing about dashboards is that they may be made very simple, but they can also be taken to the next level using some HTML, CSS or Javascript code (on top of the back-end code present in the R packages used), which is addressed in the next workshop.

Workshop 4: Binder environments and improving data dashboards

Binder

Binder is a tool to facilitate public access to software environments—for instance, by publishing an RStudio environment on the internet. Binder can also host Shiny apps. It is generously free for users. After looking at the nuts and bolts of a deployment, participants will be able to deploy their own Binder environments and check the result by the end of the workshop. For this purpose, it's recommended to have data and R code ready, ideally in a GitHub repository.

Improving data dashboards

We will practise how to improve the functionality of dashboards using some HTML, CSS and Javascript code, which is the basis of websites.

<!-- Javascript function to enable a hovering tooltip -->
<script>
$(document).ready(function(){
    $('[data-toggle="tooltip1"]').tooltip();
});
</script>

Examples of data dashboards

Trade-offs among dashboards

Next, we will practise with three dashboard types—Flexdashboard, Shiny and Flexdashboard-Shiny—and with the suitable hosting platforms. Firstly, the strength of Flexdashboard (example) is its basis on R Markdown, yielding an unmatched user interface (front-end). Secondly, the strength of Shiny (example) is the input reactivity (back-end) it offers, allowing users to download sections of data they select, in various formats. Last, Flexdashboard-Shiny (example) combines the best of both worlds.

Flexdashboard

★ ★ Shiny ★ ★

★ ★ ★ Flexdashboard-Shiny ★ ★ ★

Flexdashboard types are rendered as an HTML document—simple websites—, and can therefore be easily published on personal sites or RPubs. This is convenient because no special hosting is required. In contrast, Shiny and Flexdashboard-Shiny types offer greater features, but require Shiny servers. Fortunately, the shinyapps.io server is available for free, up to some usage limit. This server can host any of the three dashboards mentioned here. Another good option is presented by Binder environments, which can host the Shiny-type dashboards with no (explicit) limit. Yet, the Flexdashboard-Shiny type cannot be hosted in this server (as of January 2020, at least). Consequently, greater functionality may come at a cost for dashboards that have any considerable traffic, whereas dashboards with low traffic may do well on shinyapps.io. Knowing these trade-offs can help navigate usage limits, save on web hosting fees, and increase the availability of our dashboards online, as we can offer fall-back versions on different platforms, as in the example below:

... preferred-dashboard (in case of downtime, please visit this alternative)

Transforming dashboards into the different versions can be as easy as enabling or disabling some features, especially input reactivity. For instance, if we want to downgrade a Flexdashboard-Shiny to a Flexdashboard, to publish it outside of a Shiny server (see example), we must add a setting in the header of the script

knit: (function(inputFile, encoding) {
  rmarkdown::render(inputFile, encoding = encoding) })

and disable reactive features

```{r}
# Number of words selected on sidebar
# reactive(cat(paste0('Words selected below: ', nrow(selected_props()))))
```
Free accounts and tips

Hosting sites have specific terms of use. For instance, shinyapps.io has a free starter license with limited use. Free apps can handle a large but limited amount of data, and up to five apps may be created. Beyond this, RStudio offers a wide range of subscriptions starting at $9/month.

Memory and traffic limits of the free shinyapps.io account can sometimes present problems when heavy data data sets are used, or there are many visits to the app. The memory overload issue is often flagged as Shiny cannot use on-disk bookmarking, whereas excessive traffic may see the app not loading. Fortunately, usage limits need not always require a paid subscription or a custom server, thanks to the following workarounds:

  • develop app locally as far as possible, and only deploy to shinyapps.io only at the last stage;

  • prune data set, leaving only the necessary data;

  • if necessary, unlink data by splitting it into different sets, reducing computational demands;

  • if necessary, use various apps (five are allowed in each free shinyapps.io account);

  • if necessary, link from the app to a PDF with visualisations requiring heavy, interlinked data. High-resolution plots can be rendered into a PDF document in a snap, using code such as below.

    pdf('List of plots per page', width = 13, height = 5)
    print(plot1)
    print(plot2)
    # ...
    print(plot150)
    dev.off()
    

Conveniently, all text in a PDF—even in plots—is indexed, so it can be searched [ Ctrl+f / Cmd+f / 🔍 ] (see example). Furthermore, you may also merge the rendered PDF with any other documents.

Prerequisites and suggestions for participation in each workshop

Necessary: laptop or computer with R and RStudio installed, or access to RStudio Cloud; familiarity with the content of the preceding workshops through the web links herein.

Suggested: having your own data and R code ready (on a Github repository if participating in Workshop 4); participation in some of the preceding workshops.

Datathons: creating reproducible documents and dashboards

In these coding meetups, participants collaborate to create reproducible documents or dashboards using the data and software they prefer (see examples). Since the work can be split across different people and sections, some nice products may be achieved within a session. Any programming languages may be used.

  • Data used: academic or non-academic data of your own or from open-access sources such as OSF, scientific journals, governments, international institutions, NGOs, etc.

    • Inspired by the great Reprohacks, content suggestions are encouraged. That is, if you'd like to have a reproducible document or dashboard created for a certain, open-access data set, please let us know, and some participants may take it on. Suggestions may be posted as issues or emailed to p.bernabeu@lancaster.ac.uk.
  • Purposes

    • collaborating to visualise data in novel ways using reproducible documents or interactive dashboards. For this purpose, participants sometimes draw on additional data to look at a bigger picture;

    • reflecting on the process by reviewing the techniques applied and challenges encountered.

  • Output: A key aspect of datathons is the creation of output. Documents and dashboards are (co-)authored by the participants who work on them, who can then publish them on their websites, or on RPubs, Binder, Shinyapps or custom servers. Time constraints notwithstanding, a lot of this output may be very enticing for further development by the same participants, or even by other people if the code is shared online. Just like with data, an attribution licence can be attached to the code.

Prerequisites and suggestions for participation in datathons

Necessary: basic knowledge of reproducible documents or dashboards.

Suggested: familiarity with the development of reproducible documents or dashboards; an idea about the data you'd like to work with and the kind of document or dashboard you want to create.

Contact

Please submit any queries or requests by posting an issue or emailing p.bernabeu@lancaster.ac.uk.


Creative Commons License
Work licensed under a Creative Commons Attribution 4.0 International License. Linked-to content is independent.

You can’t perform that action at this time.