# NB: Introducing R

## What is R?

We have been studying Python as a data processing language from the perspective of data science. But **R is in many ways the original data science language**.

It was built from the beginning by and for data analysts, statisticians, and scientists.

Note, however, that R was *not* built by and for data miners, who make up a large segment of the data science community.

## Why Study R?

There are several reasons to study R:

-   **Python borrows** many concepts from R, including the data frame.

-   The R community provides **insights** into data processing through
    excellent documentation and well-designed code.

-   Although not as popular as it once was, R is still widely
    used, and **you may find yourself on a team that prefers it**.

-   Many of the courses in the **UVA SDS programs use R**.

-   It's **not that hard**, especially once you know basic programming
    concepts.

## R's Design

-   A scripting language, like Python.

-   Designed to support statistical computing above all.

-   Very strong academic community.

-   Many domain-specific functions are built-in.

-   Vector-first thinking.

-   Everything is an object.
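
A minimal sketch of what "vector-first" and "everything is an object" can look like in practice. The variable names and temperature values are made up for illustration.

```r
# Vector-first: arithmetic applies element-wise, so no explicit loop is needed.
temps_c <- c(12.5, 18.0, 21.3)        # hypothetical Celsius readings
temps_f <- temps_c * 9 / 5 + 32       # converts every element at once

# Even a single number is a vector of length one.
n_single <- length(42)

# Built-in statistical functions work directly on vectors.
avg_c <- mean(temps_c)
sd_c <- sd(temps_c)

# Everything is an object: data and functions alike have a class.
class(temps_c)
class(mean)
```

Statistical functions such as `mean()` and `sd()` are available without importing anything, which is part of what "built-in" means in the list above.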

## R Syntax

-   Syntax loosely follows the traditional `C` style:

    -   **Braces** `{` and `}` are used to form blocks.

    -   **Semi-colons** are optional at the end of a statement; they are
        required only to separate statements on the same line.

-   **Assignments** are made with `<-` or `->`.

-   **Dots** `.` have no special meaning---they are not operators.

    -   They are used like underscores `_` in Python.

-   Single and double **quotes** have the same meaning, but double
    quotes tend to be preferred.

    -   Use single quotes if you expect your string to contain double
        quotes.

    -   Backslash escapes apply in R strings. R 4.0.0 added raw strings
        of the form `r"(...)"`, but in ordinary strings we have to
        supply double backslashes in regular expressions.
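
The syntax points above can be seen together in a short sketch. The function and variable names (`describe_sign`, `my.value`, and so on) are hypothetical and chosen only for illustration.

```r
# Braces form blocks, as in C.
describe_sign <- function(x) {
  if (x < 0) {
    "negative"
  } else {
    "non-negative"
  }
}

a <- 1; b <- 2          # two statements on one line need a semicolon
a + b -> total          # right-pointing assignment

# Dots are ordinary name characters, much like underscores in Python.
my.value <- describe_sign(total)

# Single quotes are handy when the string itself contains double quotes.
quote_text <- 'She said "hello"'

# In an ordinary string, a regex backslash must be doubled.
has_digit <- grepl("\\d", "room 101")
```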

## Using R

-   Although there are many ways to run R programs, by far the most
    common is to use **RStudio**.

-   RStudio provides **a fully functional programming environment**
    that includes an editor, a command line, access to the file system,
    a help system, an installation system, etc.

    -   Other programs run R too, though, such as VSCode and Jupyter.

-   R programs can be plain text files with an `.r` suffix, R Markdown
    files (`.Rmd`), or many other kinds of file.

    -   We will discuss these in a later module.

OK, so let's get into the basics---beginning with data types.

## Working Directory

Code that reads data files by name alone assumes the files are located in the
R working directory, which can be found with the function `getwd()`.

Note that the working directory can also be set using RStudio's GUI.

```r
getwd()               # get current working directory
```

You can select a different working directory with the function `setwd()`,
and thus avoid entering the full path of the data files.

```r
# setwd("<new path>")   # set working directory
```

Note that the forward slash should be used as the path separator, even on
Windows.

```r
# setwd("C:/MyDoc")
```
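
As a rough illustration of how the working directory affects file access, the sketch below switches to a temporary folder, writes and reads a file by name only, and then restores the original directory. The file name `example.csv` and the use of `tempdir()` as a stand-in for a real project folder are assumptions made for this example.

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()

# tempdir() stands in here for a real project folder.
setwd(tempdir())

# With the working directory set, a bare file name is enough.
write.csv(data.frame(x = 1:3), "example.csv", row.names = FALSE)
dat <- read.csv("example.csv")

# file.path() joins path components with forward slashes on every platform.
full_path <- file.path(getwd(), "example.csv")
found <- file.exists(full_path)

# Clean up and go back to where we started.
removed <- file.remove("example.csv")
setwd(old_dir)
```

Saving and restoring the directory keeps the change from leaking into the rest of a session.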

## Installing and Loading Packages

Packages need to be installed only once.

```r
install.packages("tm")
```

The output below comes from a run in which the installation failed:

```text
also installing the dependencies ‘NLP’, ‘slam’, ‘BH’

“installation of package ‘slam’ had non-zero exit status”
“installation of package ‘tm’ had non-zero exit status”
```


You can also install packages using RStudio's Packages pane.

Once they are installed, you load them with the `library()` function:

```r
library(tm)
```

```text
ERROR: Error in library(tm): there is no package called ‘tm’
```


Note that the package name is quoted when installing, but not when
using `library`.
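
One common pattern, sketched here under the assumption that you want to avoid reinstalling on every run, is to check whether a package is available before installing it. The helper name `load_or_install` is hypothetical; `requireNamespace()`, `install.packages()`, and `library()` are standard R functions.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)                 # the name is passed as a string
  }
  # character.only = TRUE tells library() that pkg holds a string,
  # since library() normally expects an unquoted name.
  library(pkg, character.only = TRUE)
}

load_or_install("stats")   # ships with R, so nothing is installed here
```

The `character.only = TRUE` argument is what bridges the quoted and unquoted conventions noted above.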

## Installing an R Kernel

You can run R in Jupyter by installing an R kernel. 

<!-- Here's a [how to](M11_00_Rkernel.pdf) document to set this up. -->

In brief, here's what you do. First, at the command line:
```bash
conda create -n r_env r-essentials r-base
conda activate r_env
R # This opens the R shell
```
Then, in the R shell:
```r
IRkernel::installspec(name = 'r_env', displayname = 'R Environment')
quit()
```

Now, start a JupyterLab instance from the Open OnDemand page and select the `R Environment` kernel when you create a new notebook.