<a href="https://colab.research.google.com/github/nmagee/ds1002/blob/main/notebooks/20-introducing-R.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introducing R

<img align="right" width="200" height="200" src="https://stamsgroup.com/wp-content/uploads/2020/08/R.programming-300x300.png">

**What is R?** R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. It was created at the University of Auckland in 1991 and became an open source project in 1995.

**What are the pros and cons of R?**

There are a few advantages R has over Python in the realm of statistics and data analysis:

- Built for statistics
- Data visualization
- Additional statistical packages
- Data wrangling
- Academic focus
- Customizable

Disadvantages of R vs. Python:

- Less readable
- More difficult to learn well
- Not as beginner-friendly

- - -

## The Basics

### Design

-   Designed to support statistical computing
-   Very strong community
-   Many domain-specific functions are built in
-   Vector-first thinking
-   Everything is an object

### R Syntax

-   Syntax loosely follows traditional `C`-style
    -   **Braces** `{` and `}` are used to form blocks.
    -   **Semicolons** at the end of a statement are optional; they are
        required only to separate multiple statements on the same line.
-   **Assignments** are made with `<-` or `->` (or `=`)
-   **Dots** `.` have no special meaning -- they are not operators.
-   Single and double **quotes** have the same meaning, but double
    quotes tend to be preferred.
    -   Use single quotes if you expect your string to contain double
        quotes.
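
The syntax rules above can be seen together in a short sketch. All names here (`square`, `a`, `b`, `my.value`, `quote_text`) are made up for illustration.

```{r}
# Braces group the statements that form a function body.
square <- function(n) {
  n * n
}

# A semicolon is needed only to put two statements on one line.
a <- 2; b <- 3

# All three assignment forms store a value.
total <- square(a) + b
square(b) -> nine
ratio = b / a

# A dot is just another character in a name.
my.value <- 10

# Single quotes make it easy to include double quotes in a string.
quote_text <- 'She said "hello"'

total
nine
ratio
quote_text
```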

- - -

## Variables

Like other languages, variables (known as "objects" in R) can be named arbitrarily. The limits on naming are:

- A variable name must start with a letter or a period (.) and can contain letters, digits, periods (.), and underscores (_).
- If it starts with a period (.), the period cannot be followed by a digit.
- A variable name cannot start with a digit or an underscore (_).
- Variable names are case-sensitive (age, Age and AGE are three different variables)
- Reserved words cannot be used as variables (TRUE, FALSE, NULL, if...)
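
One way to check these rules, sketched here with made-up candidate names, is to compare each name with the result of `make.names()`, which rewrites a string into a syntactically valid name. A name that comes back unchanged already satisfies the rules.

```{r}
# Hypothetical candidate names, chosen to exercise each rule.
candidates <- c("age", ".rate", "2fast", "_tmp", ".2way", "TRUE")

# make.names() returns a valid name, so a candidate is already
# valid when it comes back unchanged.
is_valid <- candidates == make.names(candidates)

data.frame(name = candidates, valid = is_valid)
```

Only `age` and `.rate` should be reported as valid; the others break one of the rules listed above.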

To assign a value to a variable, use the ` <- ` notation, which suggests "pushing" a value into the variable.

Note: Just like in Python, a variable/object can hold anything: a string, an integer, a data frame, a list, and so on. And just like in Python, you can always print a variable to see what it contains.

```{r}
# comments start with a hash sign (#), just as in Python
# let's create an object and populate it.

myvar <- 77
myvar
```

```{r}
# you can learn what datatype an object is using the typeof() function.
# A "double" is a double-precision floating-point number, not an integer.
# It is how R stores numbers by default.

typeof(myvar)
```

```{r}
# you can also assign in the other direction if you want (though the convention is R to L)

"mickey" -> mousename
mousename
```

```{r}
# Another very basic operation is to concatenate values into an object using the c() function:

myname = c("Bob", "Dylan")
myname
```
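
Because `c()` builds a vector, the result can be measured, indexed, and used in arithmetic as a whole. A minimal sketch, using a made-up `scores` vector:

```{r}
# Hypothetical example values.
scores <- c(90, 85, 72)

length(scores)   # number of elements
scores[1]        # indexing starts at 1, not 0
scores + 5       # arithmetic applies to every element
length(42)       # a single value is a vector of length one
```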

### R Data Types

There are several basic R data types.

-   Numeric
-   Integer
-   Complex
-   Logical
-   Character


## Numeric

Decimal values are called "numerics" in R.

It is the **default** computational data type.

If we assign a decimal value to a variable x, x will be of numeric type:

```{r}
x <- 10.5       # assign a decimal value
x              # print the value of x
```

```{r}
class(x)      # print the class name of x
```

Even if we assign a whole number such as 1 to a variable k, it will
still be saved as a numeric value.

```{r}
k <- 1
k              # print the value of k
```

```{r}
class(k)       # print the class name of k
```

That k is not an integer can be confirmed with `is.integer()`:

```{r}
is.integer(k)  # is k an integer?
```
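
As a short illustration, `class()` and `typeof()` describe the same value at different levels, and because numerics are stored as floating-point doubles, exact equality tests can give surprising results. `all.equal()` compares within a small tolerance instead.

```{r}
k <- 1
class(k)     # "numeric"
typeof(k)    # "double"

# Doubles are floating-point values, so exact comparison can surprise.
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```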


## Integers

To create an integer variable in R, we use `as.integer()`.

```{r}
y <- as.integer(3)
y              # print the value of y
```

```{r}
class(y)       # print the class name of y
is.integer(y)  # is y an integer?
```

We can also declare an integer by appending an `L` suffix.

```{r}
y <- 3L
is.integer(y)  # is y an integer?
```

We can coerce, or cast, a numeric value into an integer with
`as.integer()`.

```{r}
as.integer(3.14)    # coerce a numeric value
```

And we can parse a string for decimal values in much the same way.

```{r}
as.integer("5.27")  # coerce a decimal string
```

On the other hand, trying to parse a non-numeric string does not work: R returns `NA` and issues a warning.

```{r}
as.integer("Joe")   # coerce a non-numeric string (gives NA with a warning)
```

We can convert booleans to numbers this way, too.

```{r}
as.integer(TRUE)    # the numeric value of TRUE
as.integer(FALSE)   # the numeric value of FALSE
```
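
One detail worth seeing side by side: `as.integer()` drops the fractional part rather than rounding. The sketch below compares it with `round()` and shows one way to test for a failed coercion; the variable name `bad` is made up.

```{r}
as.integer(3.99)     # 3: the fractional part is dropped
as.integer(-3.99)    # -3: truncation is toward zero
round(3.99)          # 4: rounding, for comparison
class(round(3.99))   # "numeric": round() does not return an integer

# suppressWarnings() hides the coercion warning for this demonstration.
bad <- suppressWarnings(as.integer("Joe"))
is.na(bad)           # TRUE
```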


### Math Operators

| **Operator**   | **Description**                    |
|----------------|------------------------------------|
| **+**          | addition                           |
| **-**          | subtraction                        |
| **\***         | multiplication                     |
| **/**          | division                           |
| **\^ or \*\*** | exponentiation                     |
| **x %% y**     | modulus (x mod y); `5 %% 2` is 1   |
| **x %/% y**    | integer division; `5 %/% 2` is 2   |
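
Each operator in the table can be tried directly at the prompt. The values below are arbitrary; the last two lines show how `%%` and `%/%` behave with a negative operand.

```{r}
7 + 2      # 9
7 - 2      # 5
7 * 2      # 14
7 / 2      # 3.5
7 ^ 2      # 49
7 ** 2     # 49, same as ^
7 %% 2     # 1
7 %/% 2    # 3

# With a negative operand, the modulus takes the sign of the divisor
# and integer division rounds down.
(-7) %% 2    # 1
(-7) %/% 2   # -4
```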



## Logical (Boolean)

A logical value is often created via comparison between variables.

```{r}
x <- 1
y <- 2   # sample values
z <- x > y      # is x larger than y?
z              # print the logical value
```

```{r}
class(z)       # print the class name of z
```

### Logical Operators

Standard logical operations are `&` (and), `|` (or), and `!` (negation).

```{r}
u <- TRUE
v <- FALSE
u & v          # u AND v
```

```{r}
u | v          # u OR v
```

```{r}
!u             # negation of u
```

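Note that you can use `T` and `F` instead of `TRUE` and `FALSE`. Unlike `TRUE` and `FALSE`, however, `T` and `F` are ordinary variables that can be reassigned, so the full names are safer in scripts.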

```{r}
a <- T
b <- F
a & b
```
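
These operators also work element by element on logical vectors, which fits R's vector-first design. A small sketch with made-up vectors `p` and `q`:

```{r}
p <- c(TRUE, TRUE, FALSE)
q <- c(TRUE, FALSE, FALSE)

p & q       # TRUE FALSE FALSE
p | q       # TRUE TRUE FALSE
!p          # FALSE FALSE TRUE

any(p)      # TRUE: at least one element is TRUE
all(p)      # FALSE: not every element is TRUE
sum(p)      # 2: TRUE counts as 1, FALSE as 0
```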



## Characters

A character object is used to represent string values in R.

We convert objects into character values with the `as.character()`
function:

```{r}
x = as.character(3.14)
x
```

```{r}
class(x)       # print the class name of x
```

### `paste()`

Character values can be concatenated with the `paste()` function, which separates them with a space by default.

```{r}
fname <- "Joe"
lname <- "Smith"
paste(fname, lname)
```

`paste()` takes a `sep` argument:

```{r}
paste("A", "B", "C", sep="--")
```
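
A few related forms are worth knowing, shown here as a brief sketch: `paste0()` joins with no separator, `paste()` is vectorized over its arguments, and `collapse` joins the elements of one vector into a single string.

```{r}
paste("A", "B", "C")                       # "A B C"
paste0("A", "B", "C")                      # "ABC"
paste("file", 1:3, sep = "_")              # "file_1" "file_2" "file_3"
paste(c("a", "b", "c"), collapse = ", ")   # "a, b, c"
```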

### `sprintf()`

However, it is often more convenient to create a readable string with
the `sprintf()` function, which uses C-style format specifiers such as
`%s` (string) and `%d` (integer).

```{r}
sprintf("%s has %d dollars", "Sam", 100)
```
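
The format specifiers can also control precision and padding, and `sprintf()` is vectorized over its arguments. The names and ages in this sketch are made up.

```{r}
sprintf("%.2f", 3.14159)   # "3.14"
sprintf("%5d", 42L)        # "   42"
sprintf("%05d", 42L)       # "00042"

# One output string per element of the input vectors.
people <- sprintf("%s is %d years old", c("Ann", "Raj"), c(31L, 28L))
people
```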

### `substr()`

To extract a substring, we apply the `substr()` function.

Here is an example showing how to extract the substring between the
third and twelfth positions in a string.

```{r}
substr("Mary has a little lamb.", start=3, stop=12)
```
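
Positions in `substr()` are 1-based and both ends are inclusive. Combining it with `nchar()`, as in this sketch, makes it easy to take text from the start or the end of a string.

```{r}
s <- "Mary has a little lamb."

nchar(s)                  # 23
substr(s, 1, 4)           # "Mary"
substr(s, 3, 12)          # "ry has a l"
substr(s, 19, nchar(s))   # "lamb."
```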

### `sub()`

And to replace the first occurrence of the word "little" with the word
"big" in the string, we apply the `sub()` function.

```{r}
sub("little", "big", "Mary has a little lamb.")
```
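
Since `sub()` changes only the first match, its companion `gsub()` is the usual choice for replacing every match. Both treat the pattern as a regular expression unless `fixed = TRUE` is given. The example string below is made up.

```{r}
s <- "a little lamb and a little goat"   # hypothetical example string

sub("little", "big", s)     # "a big lamb and a little goat"
gsub("little", "big", s)    # "a big lamb and a big goat"

# fixed = TRUE matches the dot literally instead of as "any character".
gsub(".", "-", "a.b.c", fixed = TRUE)   # "a-b-c"
```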


# Plotting and Visualizations

```{r}
# Draw 1000 random values from a log-normal distribution
r <- rlnorm(1000)

# Compute the histogram without plotting it, using bins of width 0.1
h <- hist(r, plot = FALSE, breaks = seq(0, max(r) + 1, by = 0.1))

# Plot bin counts against bin midpoints on log-log axes, using blue points.
# Empty bins are dropped because zero cannot be shown on a log scale.
keep <- h$counts > 0
plot(h$mids[keep], h$counts[keep], log = "xy", pch = 20, col = "blue",
     main = "Log-normal distribution",
     xlab = "Value", ylab = "Frequency")
```
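
The object returned by `hist()` can also be inspected without drawing anything. The sketch below repeats the setup so it runs on its own; the `set.seed(1)` call is an assumption added only to make the random draw repeatable.

```{r}
set.seed(1)   # assumed seed, only for repeatability
r <- rlnorm(1000)
h <- hist(r, plot = FALSE, breaks = seq(0, max(r) + 1, by = 0.1))

names(h)                              # components of the histogram object
length(h$breaks) - length(h$counts)   # 1: one more break than bins
sum(h$counts) == length(r)            # TRUE: every value lands in a bin
h$mids[which.max(h$counts)]           # midpoint of the fullest bin
```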