Welcome to the Advanced R course's Binder repo

This initial notebook gives some preliminary information on the technical prerequisites and the available toolset.

# Technical Prerequisites

The main programming language we utilize throughout the course is R.

R derives most of its power from the plethora of extension packages actively developed by a vast community of volunteer coders.

You may browse the packages following the link at CRAN:

https://cran.r-project.org/web/packages/available_packages_by_name.html

However, it is better to browse the packages in an R'ish way. Please run all the code cells (those with a [] sign to the left) using the play button or the Shift+Enter key combination, followed by the Esc key, while the cell is actively selected:

First we load some necessary packages:

In [None]:
library(data.table)
library(tidyverse)
library(DT)
library(plotly)

And get info on all CRAN packages:

In [None]:
packages <- tools::CRAN_package_db()

We see how many packages are available at CRAN at the moment:

In [None]:
data.table::setDT(packages)
packages[, .N]

There are too many columns. Select appropriate ones:

In [None]:
packages2 <- packages %>% select(c("Package", "Date", "Description"))

And we can navigate through this package data with a nice and interactive widget:
(You may need to run the cell twice if the first try did not yield any output)

In [None]:
DT::datatable(packages2, filter = "top")

You can sort, paginate or search the data with this widget.

You may have noticed the ".N" code above, which returned the number of rows, and the "%>%" operator, which let us select a few columns from the dataset just as easily.

The most important packages that we will use throughout the course are data.table (which provides the .N shortcut) and tidyverse (which provides the %>% operator for pipes along with many other useful tools).
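
As a minimal sketch of these two shortcuts on a small made-up table (the `toy` table and its columns are hypothetical and not part of the CRAN data above), `.N` can count rows overall or per group, and `%>%` passes the table on to `select()`:

```R
library(data.table)
library(dplyr)   # part of tidyverse; provides select() and the %>% pipe

# A small hypothetical table, used only for illustration
toy <- data.table(id = 1:3, grp = c("a", "a", "b"), value = c(10, 20, 30))

# .N inside j returns the number of rows
n_rows <- toy[, .N]

# Combined with "by", .N counts the rows of each group
n_by_grp <- toy[, .N, by = grp]

# The pipe sends the left-hand side to the function on the right as its first argument
picked <- toy %>% select(id, value)

print(n_rows)
print(n_by_grp)
print(picked)
```

Here `toy %>% select(id, value)` is equivalent to `select(toy, id, value)`; the pipe only changes how the call is written.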

Below are **MUST READ AND WATCH** materials that you should cover before the subsequent sessions:

## Base R refresher

- Introduction to R (Free DataCamp course):
https://www.datacamp.com/courses/free-introduction-to-r

- “R Programming” swirl course:
https://github.com/swirldev/swirl_courses

You may install swirl locally on your PC, but an easier way is to run this Binder repo in RStudio mode, in one of two ways:

- Open a new session following either of the links:

http://notebooks.gesis.org/binder/v2/gh/serhatcevikel/advanced_r/master?urlpath=rstudio

http://mybinder.org/v2/gh/serhatcevikel/advanced_r/master?urlpath=rstudio

- Copy the URL of the active session to a new browser tab and change the /tree or /lab part to /rstudio

Then you can run the following code to start the beginner swirl course:

```R
library(swirl)
swirl()
``` 

## "data.table" Prerequisites

### Must read and watch material

- A data.table R Tutorial: Intro to DT[i, j, by] (must read)

By Karlijn Willems

https://www.datacamp.com/community/tutorials/data-table-r-tutorial

- Introduction to data.table (must read)

By data.table team

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html

- Reference Semantics (must read)

By data.table team

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reference-semantics.html

- Efficient reshaping using data.tables (must read)

By data.table team

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reshape.html

- JOINing data in R using data.table (must read)

By Ronald Stalder

http://rpubs.com/ronasta/join_data_tables

- useR! International R User 2017 Conference data table for beginners I

by Arun Srinivasan

https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference/datatable-for-beginners (must watch and exercise)

You can download the exercise material for this tutorial from here:

https://goo.gl/FqqCWz

Or view from here:

https://github.com/arunsrinivasan/user2017-data.table-tutorial


### Optional material

- useR! International R User 2017 Conference data table for beginners II

by Arun Srinivasan

https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference/datatable-for-beginners-II (optional)

You can download the exercise material for this tutorial from here:

https://goo.gl/FqqCWz

Or view from here:

https://github.com/arunsrinivasan/user2017-data.table-tutorial


- Keys and fast binary search based subset (optional)

By data.table team

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-keys-fast-subset.html

- Quick R Tutorial: Chapter 3 Tables (optional)

By Frank Erickson

https://franknarf1.github.io/r-tutorial/_book/tables.html#tables

- Secondary indexing and auto indexing (optional)

By data.table team

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-secondary-indices-and-auto-indexing.html

- Talk by Matt Dowle, Main Author of the data.table package in R (optional)

https://www.youtube.com/watch?v=GHrebwrqZ-c

- Matt Dowle's "data.table" talk at useR 2014 (optional)

https://www.youtube.com/watch?v=qLrdYhizEMg

- Frequently asked questions (optional)

By data.table team

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-faq.html

- Advanced tips and tricks with data.table (optional)

By Andrew Brooks

http://brooksandrew.github.io/simpleblog/articles/advanced-data-table/

## "tidyverse" Prerequisites

### Must read and watch material

- Introduction to dplyr (must read)

by Hadley Wickham

https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html

- Data wrangling in R (must read)

by Julie Lowndes

http://jules32.github.io/2016-07-12-Oxford/dplyr_tidyr/

- Pipelines for Data Analysis (must watch)

by Hadley Wickham

https://www.youtube.com/watch?v=40tyOFMZUSM

- A Beginner’s Guide to Tidyverse – The Most Powerful Collection of R Packages for Data Science (must read)

by Akshat Arora

https://www.analyticsvidhya.com/blog/2019/05/beginner-guide-tidyverse-most-powerful-collection-r-packages-data-science/

### Optional material

- Data Wrangling R | RStudio Webinar - 2016 (optional)

by Garrett Grolemund

https://www.rstudio.com/resources/webinars/data-wrangling-with-r-and-rstudio/

or

https://www.youtube.com/watch?v=y9KJmUGc8SE

Download slides from:

https://github.com/rstudio/webinars/tree/master/05-Data-Wrangling-with-R-and-RStudio

- R for Data Science (optional)

by Garrett Grolemund and Hadley Wickham

http://r4ds.had.co.nz/

## data.table + tidyverse must read

- data.table vs dplyr: can one do something well the other can't or does poorly? (very must read)

Answers by Arun Srinivasan and Hadley Wickham

https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly

(This is a must read after you have a basic understanding of dplyr and data.table packages)


# Guided Tour of the Advanced R Toolset

This interface you are accessing is a binder repo.

Binder can create reproducible, executable and interactive environments for coding, analytics and documentation.

You can find more information following this link:

https://mybinder.readthedocs.io/en/latest/

There are two alternative links to open an instance/session of this binder repo:

http://notebooks.gesis.org/binder/v2/gh/serhatcevikel/advanced_r/master

http://mybinder.org/v2/gh/serhatcevikel/advanced_r/master

Both GESIS (Leibniz Institute) and Mybinder are members of the BinderHub Federation.

What you get is identical in terms of the material and tools. However the extent of available system resources may differ. Let's check:
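
A minimal sketch of such a check, assuming a Linux session where `/proc/meminfo` is readable (the variable names are illustrative). Note that inside a container these values may describe the host machine rather than the limits applied to your session:

```R
# Number of CPU cores that the operating system reports to R
n_cores <- parallel::detectCores()

# Total memory in GB, read from /proc/meminfo (Linux only; stays NA elsewhere)
meminfo_path <- "/proc/meminfo"
mem_total_gb <- NA_real_
if (file.exists(meminfo_path)) {
    mem_line <- grep("^MemTotal:", readLines(meminfo_path), value = TRUE)
    if (length(mem_line) == 1) {
        # The line looks like "MemTotal:  16384256 kB"; keep only the digits
        mem_total_gb <- as.numeric(gsub("[^0-9]", "", mem_line)) / 1024^2
    }
}

cat("Cores reported:", n_cores, "\n")
cat("Memory reported (GB):", round(mem_total_gb, 1), "\n")
```

Running the same cell in sessions opened from the two links gives a rough comparison of the resources each one reports.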

## Three interfaces to the same repo: Jupyter, Jupyter Lab and Rstudio

In Jupyter, interactive and editable documents called notebooks can be created, opened or edited.

In notebooks there are two types of cells:

- markdown cells: Using Markdown, a lightweight markup language that is rendered as HTML, you can easily create professional-looking narratives and document segments

- code cells: Start with []. They can be run, and the output is displayed below the cell

Here you can get some introductory information on how to use Jupyter notebooks:

[![The Data Incubator - Keyboard Shortcuts in Jupyter](https://img.youtube.com/vi/cuHY1o3Cf2s/0.jpg)](https://www.youtube.com/watch?v=cuHY1o3Cf2s&index=4&list=PLjDTd-bDo6Q3nnt7y_GjMaYD79-stYZ-O&t=0s)

### Kernels

The code cells are interpreted using small connector programs called kernels that act as bridges between Jupyter and respective programming languages

Four kernels are readily available in this binder repo:

- Python3
- R
- Bash
- SoS

### Jupyter Lab

A better option to access the Jupyter system's features in a single multi-framed page is Jupyter Lab.

While Jupyter is powered by Python, Jupyter Lab has components enhanced by the interactive and visual features of JavaScript. My personal choice nowadays is Jupyter Lab.

To initiate a session of this repo using Jupyter Lab you can follow the links:

http://notebooks.gesis.org/binder/v2/gh/serhatcevikel/advanced_r/master?urlpath=lab

http://mybinder.org/v2/gh/serhatcevikel/advanced_r/master?urlpath=lab

If you started a binder session in Jupyter or Rstudio mode and you want to switch to Jupyter Lab in the same session:

Just change the /tree or /rstudio part of the url of the session to /lab. You may copy the url to another tab and make the change there

For a short introduction to Jupyter Lab:

https://www.youtube.com/watch?v=7wfPqAyYADY

### Rstudio

RStudio is a common IDE (integrated development environment) for R.

You are strongly recommended to install RStudio on your local PC. However, you can also use RStudio in a new session through this repo by following either link:

http://notebooks.gesis.org/binder/v2/gh/serhatcevikel/advanced_r/master?urlpath=rstudio

http://mybinder.org/v2/gh/serhatcevikel/advanced_r/master?urlpath=rstudio

If you started a binder session in Jupyter or Jupyter Lab mode and you want to switch to Rstudio in the same session:

Just change the /tree or /lab part of the url of the session to /rstudio. You may copy the url to another tab and make the change there

And of course, to switch to the Jupyter interface from the other two without starting a new session:

Just change the /lab or /rstudio part of the url of the session to /tree. You may copy the url to another tab and make the change there
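
The URL change described above can also be sketched in R. The helper `switch_interface()` and the example URL are hypothetical and only illustrate the text replacement; the sketch assumes the interface name is the last path segment, or is directly followed by `/` or `?`:

```R
# Hypothetical helper: swap the interface part of a session URL
switch_interface <- function(session_url, target = c("tree", "lab", "rstudio")) {
    target <- match.arg(target)
    # Replace the first /tree, /lab or /rstudio segment with the chosen target
    sub("/(tree|lab|rstudio)(?=/|\\?|$)", paste0("/", target), session_url, perl = TRUE)
}

# Made-up session URL, for illustration only
example_url <- "https://example.org/user/demo-session/tree"

to_lab     <- switch_interface(example_url, "lab")
to_rstudio <- switch_interface(example_url, "rstudio")

print(to_lab)
print(to_rstudio)
```

In practice, editing the address bar by hand is just as quick; the function only makes the rule explicit.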

For some introductory info on Rstudio please watch:

https://www.youtube.com/watch?v=lVKMsaWju8w

## R packages

More than 200 packages are explicitly installed in this binder repo. Including the dependencies of those packages, a large subset of CRAN packages is readily available in R.

Let's check it:

First get information on installed packages

In [None]:
packages_i <- installed.packages()

And get a glimpse:

In [None]:
packages_i %>% str

That many packages are installed:

In [None]:
nrow(packages_i)

Now, using the previous object on package information, let's subset the information for the installed ones and make some transformations:

In [None]:
package_names <- packages_i[,"Package"]

package_names %>% str

In [None]:
packages2 %>% str

In [None]:
packages3 <- packages2[Package %in% package_names]
packages3[, Date := as.Date(Date)]

packages3 %>% str

In [None]:
datatable(packages3)

Let's visualize the frequency of the years that the packages were last updated:

In [None]:
package_dates <- packages3 %>%
    mutate(year = year(Date) %>% as.integer) %>%
    ggplot(aes(year)) +
    geom_histogram(binwidth = 1)

package_dates %>% ggplotly()

We see that a great deal of the installed packages were last updated in the last two years
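
To back a reading of the histogram with numbers, one could count packages per year and compute the share with a recent date. The sketch below uses a small made-up table (`toy_pkgs`) and a fixed, hypothetical `reference_date` instead of the real package data, so that it runs on its own:

```R
library(data.table)

# Hypothetical package dates, for illustration only
toy_pkgs <- data.table(
    Package = c("pkgA", "pkgB", "pkgC", "pkgD"),
    Date    = as.Date(c("2019-03-01", "2018-11-15", "2015-06-30", NA))
)

# Fixed reference date (an assumption; Sys.Date() would be used on real data)
reference_date <- as.Date("2019-06-01")

# Number of packages per year, ignoring missing dates
per_year <- toy_pkgs[!is.na(Date), .N, by = .(year = year(Date))][order(year)]

# Share of dated packages from the reference year or the year before it
share_recent <- toy_pkgs[!is.na(Date), mean(year(Date) >= year(reference_date) - 1)]

print(per_year)
print(share_recent)
```

The same two expressions could be applied to `packages3`, with `Sys.Date()` in place of the fixed reference date.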

And the median number of days since the last updates of packages is:

In [None]:
Sys.Date() - packages3[, Date %>% median(na.rm = TRUE)]

We see that the whole ecosystem is actively maintained and kept up to date through voluntary community effort.

The installed packages cover a wide spectrum of tasks. They can:

- Wrangle and transform data
- (Interactively) visualize or tabulate data
- Model data with various prediction and classification algorithms
- Read, extract or import data from sqlite3, ods, xls(x), csv, tsv, pdf, json, xml, html files

...

and many more