# **Topics Covered:**
1.0 Introduction to RStudio

1.1 Working with R and RStudio

1.2 Built-in mathematical functions, numerics, and arithmetic

1.3 Vectors and assignment (assignment of variable, vector)

1.4 Simple summary statistics

1.5 Vectors and sample statistics

1.6 Matrices and Arrays


## Video
- [R 1.0: Google Colab and Introduction to RStudio](https://youtu.be/3i5qHcanC_A?list=PLuGb0-rQ2tExA0uvWFTM0zHScVMA1Sciw)

##  **1.0 Introduction to RStudio**

Even though we're focusing on Google Colab for this course, it's useful to first understand **RStudio**, which is the most popular integrated development environment (IDE) for R programming. Many concepts from RStudio also apply when working in Colab.

---

###  What is RStudio?

RStudio is a **powerful IDE** designed specifically for R programming. It helps data scientists, statisticians, and analysts write, run, debug, and visualize R code efficiently.

RStudio comes in two main versions:

* **RStudio Desktop** → runs on your computer
* **RStudio Server** → runs on a server, accessible through your browser

---

###  Main Features of RStudio

Here's a breakdown of its key components:

| **Panel**                                    | **Description**                                                                                      |
| -------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| **Source**                                   | Where you write and edit your R scripts (.R files), Markdown files, Shiny apps, etc.                 |
| **Console**                                  | Where you execute R commands interactively and see the immediate output.                             |
| **Environment / History**                    | Shows your variables, loaded datasets, and command history.                                          |
| **Files / Plots / Packages / Help / Viewer** | Browse files, visualize plots, install/load packages, access documentation, or preview HTML widgets. |

Other key features:

✅ Syntax highlighting

✅ Auto-completion

✅ Version control (Git) integration

✅ Debugging tools

✅ Integrated help documentation

---

###  How to Import & Export Data in RStudio

* **Importing data:**
  Use base R functions, or functions from add-on packages such as `readxl`:

  ```r
  read.csv("data.csv")
  read.table("data.txt")
  readxl::read_excel("data.xlsx")  # from the 'readxl' package
  ```

  Alternatively, use the **Import Dataset** button in RStudio's Environment pane. It generates the matching import code for you.

* **Exporting data:**

  ```r
  write.csv(mydata, "output.csv")
  write.table(mydata, "output.txt")
  ```
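By default, `write.csv()` writes row names as an extra first column. That matters when you read the file back. The sketch below does two round trips through temporary files created with `tempfile()`, using the built-in `mtcars` and `iris` datasets. The paths and variable names are illustrative only.

```r
# Round trip 1: mtcars has meaningful row names (car models) that we want to keep
mtcars_path <- tempfile(fileext = ".csv")
write.csv(mtcars, mtcars_path)                       # row names become the first column
mtcars_back <- read.csv(mtcars_path, row.names = 1)  # turn that column back into row names

# Round trip 2: iris only has automatic row names (1, 2, ...), so skip writing them
iris_path <- tempfile(fileext = ".csv")
write.csv(iris, iris_path, row.names = FALSE)
iris_back <- read.csv(iris_path)

print(identical(rownames(mtcars_back), rownames(mtcars)))  # TRUE
print(identical(names(iris_back), names(iris)))            # TRUE
```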

---

###  How to Run `.R` Files in RStudio

* Open the `.R` script in the **Source** pane.
* Click the **Run** button, or press **Ctrl + Enter** to run the current line or selection.
* Alternatively, use **Source** → it runs the whole script at once.

---

###  “Hello World” Example

This is the classic first program:

```r
print("Hello, World!")
```

Or simply:

```r
"Hello, World!"
```

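At the top level, R automatically prints the value of any expression you evaluate. The string therefore appears in the console (prefixed with `[1]`) even without `print()`.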

---

###  Google Colab R Example

To run R code in Colab, change the notebook's runtime type to R: click **Runtime** → **Change runtime type**, then select **R**.

Example:

```r
# Hello World
print("Hello, World!")

# Load a built-in dataset
data <- mtcars
head(data)

# Export the dataset to CSV (row names, the car models, become the first column)
write.csv(data, "/content/mtcars.csv")

# Read the CSV back, turning that first column into row names again
data2 <- read.csv("/content/mtcars.csv", row.names = 1)
head(data2)
```


---

### Run an `.R` script in Colab

In Colab, you can **save an R script** with `writeLines()` and then run it with `source()`:

```r
# Write a one-line R script to the working directory
writeLines('print("Hello from R script!")', 'script.R')

# Run the script
source('script.R')
```

```
[1] "Hello from R script!"
```
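`source()` runs the script inside your current session, so any variables the script creates remain available afterwards. The sketch below writes a small script to a temporary path; the file name and the variable `total` are illustrative. `echo = TRUE` prints each line of code before its result.

```r
# Write a small two-line script to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(c("total <- 2 + 3",
             "print(total)"), script_path)

# Run it; echo = TRUE shows each line of code followed by its output
source(script_path, echo = TRUE)

# `total` was created by the script and now exists in this session
print(total * 10)  # [1] 50
```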


✅ **Summary checklist:**

* You know the main RStudio panels
* You know how to import/export data
* You know how to run `.R` scripts

## Video
- [R 1.1 Working with R and RStudio](https://youtu.be/wljEBFEN2DM?list=PLuGb0-rQ2tExA0uvWFTM0zHScVMA1Sciw)

##  **1.1 Working with R and RStudio**

---

###  What is R?

**R** is a free, open-source programming language and software environment designed for:

* statistical computing
* data analysis
* data visualization
* machine learning
* and reproducible research

R is known for its **huge ecosystem of packages** (over 20,000 on CRAN!) and strong community of statisticians, data scientists, and researchers.

---

###  What is RStudio?

**RStudio** is the most popular IDE (Integrated Development Environment) for R.
It gives you:

* a code editor
* a console
* an environment viewer
* tools to handle plots, packages, and files

With RStudio, you can **organize your R work more efficiently**, run code interactively, and create polished reports.

---

###  Core Workflow in R + RStudio

Here's what your everyday R workflow looks like:

1️⃣ **Writing Code** →
You write R code in a script (`.R`) or in the console.

2️⃣ **Running Code** →
Run line-by-line (`Ctrl + Enter`) or run the entire script.

3️⃣ **Viewing Outputs** →
See results in the console, plots in the Plots pane, variables in the Environment pane.

4️⃣ **Managing Data** →
Import datasets, transform them using packages like `dplyr` or `data.table`.

5️⃣ **Creating Visualizations** →
Use `ggplot2` or base R plots to visualize data.

6️⃣ **Exporting Results** →
Save data (`write.csv()`), save plots (`ggsave()`), or create reports (`rmarkdown`).

7️⃣ **Installing & Managing Packages** →
Install with `install.packages("dplyr")`, load with `library(dplyr)`.
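The workflow above can be sketched end to end with base R alone, so nothing needs installing. In this minimal illustration, the built-in `mtcars` dataset stands in for an imported file and temporary files stand in for real output paths. The variable names are hypothetical.

```r
# 1. Import: the built-in mtcars data frame stands in for read.csv("your_file.csv")
cars_df <- mtcars

# 2. Transform: keep cars above 20 mpg, then average horsepower per cylinder count
efficient <- subset(cars_df, mpg > 20)
hp_by_cyl <- aggregate(hp ~ cyl, data = efficient, FUN = mean)
print(hp_by_cyl)

# 3. Visualize: draw a base-R bar chart into a PDF file
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
barplot(hp_by_cyl$hp, names.arg = hp_by_cyl$cyl,
        xlab = "Cylinders", ylab = "Mean horsepower")
invisible(dev.off())

# 4. Export: save the summarized table as CSV
csv_path <- tempfile(fileext = ".csv")
write.csv(hp_by_cyl, csv_path, row.names = FALSE)
```

In RStudio you would usually call `barplot()` directly to see the chart in the Plots pane. Writing to a PDF just keeps this sketch runnable without a display.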

---

### 🔹 Key Things You’ll Work With

* **Variables**

  ```r
  x <- 10
  y <- 20
  result <- x + y
  ```

* **Data Structures**

  * Vectors: `c(1, 2, 3)`
  * Lists: `list(a = 1, b = "text")`
  * Data frames: `data.frame(x = 1:3, y = c("A", "B", "C"))`

* **Functions**

  ```r
  add <- function(a, b) {
    return(a + b)
  }
  add(2, 3)
  ```

* **Packages**

  ```r
  install.packages("ggplot2")
  library(ggplot2)
  ```
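To check which data structure you are holding, base R's `class()`, `length()`, `dim()`, and `str()` are handy. Here is a short sketch using the same objects as the bullets above:

```r
v  <- c(1, 2, 3)
l  <- list(a = 1, b = "text")
df <- data.frame(x = 1:3, y = c("A", "B", "C"))

class(v)    # "numeric"
class(l)    # "list"
class(df)   # "data.frame"

length(v)   # 3 elements
length(l)   # 2 components
dim(df)     # 3 rows, 2 columns

str(df)     # compact overview: column names, types, and first values
```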

---

###  RStudio Tips

✅ Use the **Environment** tab to see your variables

✅ Use **Plots** tab to explore charts

✅ Use the **Help** tab to search documentation

✅ Use **Projects** to organize your work

---

###  Google Colab R Example

```r
# Assigning variables
x <- 10
y <- 5

# Basic operations (avoid names like `sum`, which would mask the base function)
total <- x + y
product <- x * y

print(paste("Sum:", total))
print(paste("Product:", product))

# Creating a vector
vec <- c(1, 2, 3, 4, 5)
print(vec)

# Creating a data frame
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35))
print(df)

# Installing and loading a package (install only once per environment)
# install.packages("ggplot2")  # Uncomment the first time
library(ggplot2)

# Creating a simple bar chart; geom_col() uses the y values as bar heights
ggplot(df, aes(x = Name, y = Age)) +
  geom_col()
```


### Summary

* You understand what R does and why RStudio helps
* You can work with variables, vectors, data frames, functions, and packages

---


## Video
- [R 1.2 Built-in mathematical functions, numerics, arithmetic](https://youtu.be/9DRFkKEeuOo?list=PLuGb0-rQ2tExA0uvWFTM0zHScVMA1Sciw)

## **1.2 Built-in Mathematical Functions, Numerics, and Arithmetic in R**

---

###  Understanding Numerics in R

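In R, **numeric** values come in two storage types:

* **double** (double-precision floating point), the default → e.g., `3.14`, `2.71`, and also plain whole numbers such as `1, 2, 3`
* **integer** → whole numbers written with an `L` suffix → e.g., `1L, 2L, 3L`

You can check the type of a value with:

```r
typeof(3)       # "double" (plain number literals are stored as doubles)
typeof(3L)      # "integer"
is.numeric(3)   # TRUE
is.numeric(3L)  # TRUE
```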
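Doubles are stored in binary floating point, so some decimal values cannot be represented exactly. As a result, comparing computed results with `==` can surprise you. The usual workaround is `all.equal()`, which compares within a small tolerance. Wrap it in `isTRUE()`, because `all.equal()` returns a descriptive string rather than `FALSE` when the values differ.

```r
x <- 0.1 + 0.2

x == 0.3                    # FALSE: tiny binary rounding error
isTRUE(all.equal(x, 0.3))   # TRUE: equal within numerical tolerance
print(x, digits = 17)       # shows the value actually stored
```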

---

### Basic Arithmetic Operations

R can act like a **calculator**. Here’s what you can do:

| **Operation**    | **R Example**       | **Result**      |
| ---------------- | ------------------- | --------------- |
| Addition         | `5 + 3`             | `8`             |
| Subtraction      | `5 - 3`             | `2`             |
| Multiplication   | `5 * 3`             | `15`            |
| Division         | `5 / 3`             | `1.666667`      |
| Exponentiation   | `5 ^ 3` or `5 ** 3` | `125`           |
| Modulus          | `5 %% 3`            | `2` (remainder) |
| Integer Division | `5 %/% 3`           | `1` (quotient)  |
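`%/%` and `%%` are linked: for any numbers, `x == (x %/% y) * y + x %% y`. R's integer division rounds down (toward negative infinity), so for a positive divisor the remainder is never negative. The quick check below also shows that `^` binds more tightly than unary minus.

```r
x <- 5
y <- 3
(x %/% y) * y + x %% y   # [1] 5

# With a negative dividend, %/% rounds down and %% stays non-negative for y > 0
-5 %/% 3                 # [1] -2
-5 %% 3                  # [1] 1

# Operator precedence: ^ is applied before unary minus, and * before +
-2^2                     # [1] -4
(-2)^2                   # [1] 4
2 + 3 * 4                # [1] 14
```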

---

### Built-in Mathematical Functions

R provides **many built-in functions** for calculations:

| **Function**               | **Description**                      | **Example**                  |
| -------------------------- | ------------------------------------ | ---------------------------- |
| `abs(x)`                   | Absolute value                       | `abs(-3)` → `3`              |
| `sqrt(x)`                  | Square root                          | `sqrt(16)` → `4`             |
| `exp(x)`                   | Exponential (e^x)                    | `exp(1)` → `2.718282`        |
| `log(x)`                   | Natural log (ln x)                   | `log(exp(1))` → `1`          |
| `log10(x)`                 | Base-10 log                          | `log10(100)` → `2`           |
| `sin(x)` `cos(x)` `tan(x)` | Trigonometric functions (in radians) | `sin(pi/2)` → `1`            |
| `round(x, n)`              | Round to `n` decimal places          | `round(3.14159, 2)` → `3.14` |
| `ceiling(x)`               | Smallest integer ≥ x                 | `ceiling(2.3)` → `3`         |
| `floor(x)`                 | Largest integer ≤ x                  | `floor(2.7)` → `2`           |
| `max(...)` `min(...)`      | Find maximum / minimum of inputs     | `max(1,5,3)` → `5`           |
| `sum(...)`                 | Sum values                           | `sum(1,2,3)` → `6`           |
| `mean(...)`                | Calculate average                    | `mean(c(1,2,3))` → `2`       |
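The table glosses over two details. First, `log()` accepts a `base` argument, with `log2()` and `log10()` as shortcuts. Second, R's documentation says `round()` is expected to follow the IEC 60559 convention of rounding an exact half to the even digit. Applying the rounding helpers to a negative number also shows how they differ:

```r
log(8, base = 2)   # [1] 3
log2(8)            # [1] 3

round(0.5)         # [1] 0  (an exact half goes to the even neighbor)
round(1.5)         # [1] 2
round(2.5)         # [1] 2

x <- -2.7
round(x)           # [1] -3
floor(x)           # [1] -3  (largest integer <= x)
ceiling(x)         # [1] -2  (smallest integer >= x)
trunc(x)           # [1] -2  (drops the fractional part)
```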

---

###  Working with Vectors

You can apply math operations **directly to vectors** (vectorization):

```r
vec <- c(1, 2, 3)
vec * 2     # [1] 2 4 6
sqrt(vec)   # [1] 1.000000 1.414214 1.732051
sum(vec)    # [1] 6
```
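Thanks to vectorization, you rarely need an explicit loop for element-wise math. In the comparison below, the loop and the single vectorized expression compute the same squares. The vectorized form is shorter and typically faster on long vectors.

```r
vec <- c(1, 2, 3, 4, 5)

# Explicit loop: preallocate a result vector, then fill it element by element
squares_loop <- numeric(length(vec))
for (i in seq_along(vec)) {
  squares_loop[i] <- vec[i]^2
}

# Vectorized: one expression applied to the whole vector
squares_vec <- vec^2

print(squares_vec)                           # [1]  1  4  9 16 25
print(identical(squares_loop, squares_vec))  # [1] TRUE
```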
---

### Example

```r
# Basic arithmetic
a <- 10
b <- 3

add <- a + b
difference <- a - b  # named `difference` so base::sub() is not masked
mul <- a * b
div <- a / b
power <- a ^ b       # named `power` so base::exp() is not masked
mod <- a %% b
int_div <- a %/% b

print(paste("Addition:", add))
print(paste("Subtraction:", difference))
print(paste("Multiplication:", mul))
print(paste("Division:", div))
print(paste("Exponentiation:", power))
print(paste("Modulus:", mod))
print(paste("Integer Division:", int_div))

# Built-in functions
x <- -4
print(abs(x))
print(sqrt(16))
print(exp(1))
print(log(2.718))   # close to, but not exactly, 1 because 2.718 only approximates e
print(sin(pi / 2))
print(round(3.14159, 2))
print(ceiling(2.3))
print(floor(2.7))
print(max(1, 5, 3))
print(min(1, 5, 3))
print(sum(1, 2, 3))
print(mean(c(1, 2, 3)))

# Working with vectors
vec <- c(1, 2, 3, 4, 5)
print(vec * 2)
print(sqrt(vec))
print(sum(vec))
```

```
[1] "Addition: 13"
[1] "Subtraction: 7"
[1] "Multiplication: 30"
[1] "Division: 3.33333333333333"
[1] "Exponentiation: 1000"
[1] "Modulus: 1"
[1] "Integer Division: 3"
[1] 4
[1] 4
[1] 2.718282
[1] 0.9998963
[1] 1
[1] 3.14
[1] 3
[1] 2
[1] 5
[1] 1
[1] 6
[1] 2
[1]  2  4  6  8 10
[1] 1.000000 1.414214 1.732051 2.000000 2.236068
[1] 15
```


### ✅ Summary

* You know the **numeric types** in R
* You can do **arithmetic** and **modulus**
* You can use **built-in math functions**
* You can **apply functions to vectors**
* You have a **Colab-ready code block**


## Video
- [R 1.3 Vectors and assignment (assignment of variable, vector)](https://youtu.be/-ab90KFI7_k?list=PLuGb0-rQ2tExA0uvWFTM0zHScVMA1Sciw)

---

## 1.3 Vectors and Assignment in R

---

### What is Assignment?

In R, **assignment** means storing a value into a variable.
You assign using:

* `<-` (recommended)
* `=` (also works, but `<-` is more common)

Examples:

```r
x <- 5        # assign 5 to x
y = 10        # assign 10 to y
```

You can also **print** variables directly:

```r
x             # prints [1] 5
```
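At the top level, the two operators behave the same. They differ inside a function call: there, `=` matches an argument by name, while `<-` performs an assignment as a side effect. That is one reason to keep `=` for naming arguments and `<-` for assignment. A small sketch (the variable names are arbitrary):

```r
# At the top level, `<-` and `=` both assign
a <- 1
b = 2

# Inside a call, `=` only names the argument `x` of median(); no variable is created
m1 <- median(x = c(2, 4, 6))

# `<-` inside a call assigns in the calling environment, so `vals` now exists
m2 <- median(vals <- c(2, 4, 6))

print(m1)    # [1] 4
print(m2)    # [1] 4
print(vals)  # [1] 2 4 6
```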

---

### What is a Vector?

A **vector** is the most basic data structure in R.
It holds **multiple elements of the same type** (numeric, character, logical, etc.).

Example:

```r
numbers <- c(1, 2, 3, 4, 5)            # numeric vector
people <- c("Alice", "Bob", "Charlie") # character vector (avoids masking names())
flags <- c(TRUE, FALSE, TRUE)          # logical vector
print(numbers)
print(people)
print(flags)
```

```
[1] 1 2 3 4 5
[1] "Alice"   "Bob"     "Charlie"
[1]  TRUE FALSE  TRUE
```
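A vector can hold only one type, so `c()` silently converts (coerces) mixed inputs to a common type. It follows the hierarchy logical < integer < double < character. A short sketch:

```r
mixed_num  <- c(1, TRUE, FALSE)   # logicals become numbers
mixed_char <- c(1, "a", TRUE)     # everything becomes character

print(mixed_num)          # [1] 1 1 0
print(class(mixed_num))   # [1] "numeric"
print(mixed_char)         # [1] "1"    "a"    "TRUE"
print(class(mixed_char))  # [1] "character"
```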


---

### Creating Vectors

* `c()` → combine values into a vector
* `seq()` → create a sequence
* `rep()` → repeat values

Examples:

```r
v1 <- c(1, 2, 3)
v2 <- seq(1, 10, by = 2)     # 1, 3, 5, 7, 9
v3 <- rep(5, times = 3)      # 5, 5, 5
```
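`seq()` and `rep()` take a few more arguments worth knowing:

* `length.out` asks `seq()` for a fixed number of evenly spaced values.
* `rep()` can repeat a whole vector (`times`) or each element in turn (`each`).
* The `:` operator is a shortcut for sequences with step 1.

```r
1:5                         # [1] 1 2 3 4 5
seq(0, 1, length.out = 5)   # [1] 0.00 0.25 0.50 0.75 1.00
rep(c(1, 2), times = 3)     # [1] 1 2 1 2 1 2
rep(c(1, 2), each = 3)      # [1] 1 1 1 2 2 2
```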

---

### Accessing Vector Elements

You access elements by **indexing** (starting from 1):

```r
v <- c(10, 20, 30)
v[1]      # 10
v[2]      # 20
```

You can access multiple elements:

```r
v[c(1, 3)]     # 10, 30
```
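Two other indexing forms come up constantly. A negative index drops elements, and a logical vector keeps the elements where the condition is `TRUE`. A short sketch using the same `v`:

```r
v <- c(10, 20, 30)

v[-1]          # [1] 20 30  (everything except the first element)
v[-c(1, 3)]    # [1] 20     (drop the first and third)
v > 15         # [1] FALSE  TRUE  TRUE
v[v > 15]      # [1] 20 30  (keep elements greater than 15)
```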

---

### Vector Operations

R automatically **vectorizes** operations:

```r
v <- c(1, 2, 3)
v + 1        # [1] 2 3 4
v * 2        # [1] 2 4 6
v + c(10, 20, 30)  # [1] 11 22 33
```
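When two vectors have different lengths, R **recycles** the shorter one, repeating it until it matches the longer one. If the longer length is not a multiple of the shorter, R still returns a result but issues a warning. A small sketch:

```r
longer <- c(1, 2, 3, 4) + c(10, 20)   # c(10, 20) is reused twice
print(longer)                         # [1] 11 22 13 24

scaled <- c(1, 2, 3) * 2              # a single number is recycled too
print(scaled)                         # [1] 2 4 6

# Lengths 3 and 2 do not divide evenly: R still computes a result but warns
uneven <- c(1, 2, 3) + c(10, 20)
print(uneven)                         # [1] 11 22 13
```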

---

### Naming Vector Elements

You can **assign names** to vector elements:

```r
scores <- c(90, 80, 70)
names(scores) <- c("Math", "Science", "History")
print(scores)
```

```
   Math Science History 
     90      80      70 
```
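Once elements have names, you can index by name instead of position. This keeps code readable even if the order of the elements changes. A short sketch:

```r
# Names can also be given directly inside c()
scores <- c(Math = 90, Science = 80, History = 70)

scores["Math"]                 # the element named Math (keeps its name)
scores[c("Math", "History")]   # several elements by name
unname(scores["Science"])      # [1] 80, the value without its name
names(scores)                  # [1] "Math"    "Science" "History"
```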


### Example

```r
# Assignment
x <- 5
y <- 10
greeting <- "Hello, R!"   # avoid `message`, which is also a base R function

print(x)
print(y)
print(greeting)

# Vectors
num_vec <- c(1, 2, 3, 4, 5)
char_vec <- c("apple", "banana", "cherry")
bool_vec <- c(TRUE, FALSE, TRUE)

print(num_vec)
print(char_vec)
print(bool_vec)

# Sequence and repeat
seq_vec <- seq(1, 10, by = 2)
rep_vec <- rep("R", times = 3)

print(seq_vec)
print(rep_vec)

# Accessing elements
print(num_vec[1])       # first element
print(num_vec[c(1,3)])  # first and third elements

# Vector operations
print(num_vec + 10)
print(num_vec * 2)

# Naming elements
scores <- c(85, 90, 95)
names(scores) <- c("Math", "English", "Science")
print(scores)
```

### Summary

* You understand variable assignment (`<-`, `=`)
* You can create vectors using `c()`, `seq()`, `rep()`
* You know how to access and manipulate vector elements
* You can name vector elements for clarity
* You have a ready-to-run Google Colab example

---

## Video
- [R 1.4 Simple summary statistics, 1.5 Vectors and sample statistic, 1.6 Matrices and Arrays (Part 1)](https://youtu.be/M8gXMzmXiEg?list=PLuGb0-rQ2tExA0uvWFTM0zHScVMA1Sciw)

## 1.4 Simple Summary Statistics in R

---

### What are Summary Statistics?

Summary statistics are **simple calculations** that describe key properties of a dataset or vector, such as:

* center (mean, median)
* spread (range, standard deviation)
* location and extremes (min, max, quantiles)
* overall summary (summary table)

They help **quickly understand data** before doing deeper analysis.

---

### Basic Summary Functions in R

Here are the most useful built-in functions:

| Function      | Description                                       | Example                                    |
| ------------- | ------------------------------------------------- | ------------------------------------------ |
| `sum(x)`      | Sum of values                                     | `sum(c(1,2,3)) → 6`                        |
| `mean(x)`     | Arithmetic mean (average)                         | `mean(c(1,2,3)) → 2`                       |
| `median(x)`   | Middle value                                      | `median(c(1,2,3)) → 2`                     |
| `min(x)`      | Minimum value                                     | `min(c(1,2,3)) → 1`                        |
| `max(x)`      | Maximum value                                     | `max(c(1,2,3)) → 3`                        |
| `range(x)`    | Minimum and maximum                               | `range(c(1,2,3)) → 1 3`                    |
| `var(x)`      | Variance                                          | `var(c(1,2,3)) → 1`                        |
| `sd(x)`       | Standard deviation                                | `sd(c(1,2,3)) → 1`                         |
| `quantile(x)` | Quantiles (percentile cutoffs)                    | `quantile(c(1,2,3)) → 1 1.5 2 2.5 3` (at 0%, 25%, 50%, 75%, 100%) |
| `summary(x)`  | Min, 1st quartile, median, mean, 3rd quartile, max | `summary(c(1,2,3))`                       |

---

### Example Explained

```r
numbers <- c(5, 10, 15, 20, 25)

sum(numbers)        # 75
mean(numbers)       # 15
median(numbers)     # 15
min(numbers)        # 5
max(numbers)        # 25
range(numbers)      # 5 25
var(numbers)        # 62.5
sd(numbers)         # 7.905694
quantile(numbers)   # 0%→5, 25%→10, 50%→15, 75%→20, 100%→25
summary(numbers)    # full summary table
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      5      10      15      15      20      25 
```


### Dealing with Missing Values

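If your data contains `NA` (missing values), most summary functions return `NA` unless you tell them to skip missing entries:

```r
numbers_na <- c(5, 10, NA, 20)

mean(numbers_na)                # NA
mean(numbers_na, na.rm = TRUE)  # 11.66667
```

The `na.rm = TRUE` argument **removes missing values** before the statistic is computed.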
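Before dropping missing values, it is often worth checking how many there are. `is.na()` returns a logical vector, so `sum()` counts its `TRUE` values. A minimal sketch:

```r
numbers_na <- c(5, 10, NA, 20)

is.na(numbers_na)                # [1] FALSE FALSE  TRUE FALSE
sum(is.na(numbers_na))           # [1] 1  (number of missing values)
numbers_na[!is.na(numbers_na)]   # [1]  5 10 20  (the non-missing values)
```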

---

### Example

```r
# Data vector
numbers <- c(5, 10, 15, 20, 25, NA)

# Basic summaries
print(sum(numbers, na.rm = TRUE))
print(mean(numbers, na.rm = TRUE))
print(median(numbers, na.rm = TRUE))
print(min(numbers, na.rm = TRUE))
print(max(numbers, na.rm = TRUE))
print(range(numbers, na.rm = TRUE))
print(var(numbers, na.rm = TRUE))
print(sd(numbers, na.rm = TRUE))
print(quantile(numbers, na.rm = TRUE))
print(summary(numbers))
```

### Summary

* You can calculate center (mean, median), spread (range, sd), and shape (quantiles)
* You know how to handle missing values with `na.rm = TRUE`
* You can run summary statistics directly in Google Colab


---

## 1.5 Vectors and Sample Statistics in R

---

### What are Sample Statistics?

**Sample statistics** are numerical values calculated from a **sample** (a subset of data) to estimate population properties.

In R, we often:

* Create a vector (sample data)
* Calculate statistics on that vector

Examples of sample statistics:

* Mean (`mean()`)
* Median (`median()`)
* Standard deviation (`sd()`)
* Variance (`var()`)
* Range (`range()`)
* Summary table (`summary()`)
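One detail makes these *sample* statistics: `var()` and `sd()` divide by `n - 1` rather than `n` (Bessel's correction). The sketch below reproduces `var()` by hand on the same values used in the example further down, and contrasts it with the divide-by-`n` version:

```r
x <- c(100, 102, 98, 105, 110)
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
var_manual <- sum((x - mean(x))^2) / (n - 1)

# Dividing by n instead gives the population-style variance
var_population <- sum((x - mean(x))^2) / n

print(var_manual)                               # [1] 22
print(var(x))                                   # [1] 22
print(var_population)                           # [1] 17.6
print(isTRUE(all.equal(sqrt(var(x)), sd(x))))   # sd() is the square root of var()
```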

---

### Sampling from a Vector

R allows you to **draw random samples** from a vector using the `sample()` function.

Example:

```r
x <- c(10, 20, 30, 40, 50)
sample(x, size = 3)                  # random sample of 3 elements
sample(x, size = 3, replace = TRUE)  # with replacement
```

### Example: Calculate Sample Statistics

```r
# Sample vector
sample_data <- c(100, 102, 98, 105, 110)

# Basic statistics
mean(sample_data)        # average: 103
median(sample_data)      # middle value: 102
sd(sample_data)          # standard deviation: 4.690416
var(sample_data)         # variance: 22
range(sample_data)       # min and max: 98 110
summary(sample_data)     # full summary
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     98     100     102     103     105     110 
```

### Important Sampling Tips

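* `set.seed()` makes random draws reproducible: the same seed produces the same sample every time.

  ```r
  set.seed(123)
  sample(1:100, 5)
  ```

* Sampling with or without replacement:

  * Without replacement (the default): each element can be drawn at most once, so `size` cannot exceed the length of the vector
  * With replacement (`replace = TRUE`): the same element can be drawn more than once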
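Both points are easy to verify. Resetting the same seed reproduces the same draw, and a sample drawn without replacement never contains duplicates. A sketch (the seed value 42 is arbitrary):

```r
set.seed(42)
first_draw <- sample(1:100, 5)

set.seed(42)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)   # [1] TRUE: same seed, same sample

# Without replacement, sampling all 10 values gives a shuffled copy with no repeats
shuffled <- sample(1:10)
anyDuplicated(shuffled)              # [1] 0: no repeated values
all(sort(shuffled) == 1:10)          # [1] TRUE
```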

---

### Example

```r
# Set reproducible seed
set.seed(123)

# Create a numeric vector
data <- c(55, 60, 65, 70, 75, 80, 85, 90, 95, 100)

# Draw random samples
sample1 <- sample(data, size = 5)
sample2 <- sample(data, size = 5, replace = TRUE)

print("Sample without replacement:")
print(sample1)

print("Sample with replacement:")
print(sample2)

# Calculate statistics on sample1
print(mean(sample1))
print(median(sample1))
print(sd(sample1))
print(var(sample1))
print(range(sample1))
print(summary(sample1))
```

```
[1] "Sample without replacement:"
[1]  65 100  60  90  80
[1] "Sample with replacement:"
[1]  75  70  80  95 100
[1] 79
[1] 80
[1] 16.7332
[1] 280
[1]  60 100
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     60      65      80      79      90     100 
```


### Summary

* You understand how to **create vectors** and **draw random samples** in R
* You can calculate **sample statistics** using functions like `mean()`, `sd()`, and `summary()`
* You can make sampling reproducible with `set.seed()`

## Video

- [R 1.6 Matrices and Arrays (Part 2)](https://youtu.be/z2IYJaAjdl4?list=PLuGb0-rQ2tExA0uvWFTM0zHScVMA1Sciw)

## 1.6 Matrices and Arrays in R

---

### What is a Matrix?

A **matrix** in R is a two-dimensional (2D) rectangular data structure with:

* **rows** and **columns**
* only one data type (usually numeric or character)

You can think of a matrix like an Excel table — but all the cells must hold the same type.

---

### Creating a Matrix

You can create a matrix with the `matrix()` function:

```r
# matrix(data, nrow, ncol, byrow = FALSE)
```

Example:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
m
```

This creates:

```
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
```
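By default, `matrix()` fills values **column by column**, which is why `1:6` produced the layout above. Setting `byrow = TRUE` fills row by row instead:

```r
m_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: fill down columns
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fill across rows

m_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

dim(m_row)   # [1] 2 3
```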

* Single element:

```r
m[1, 2]    # row 1, col 2 → 3
```


* Row or column:

```r
m[1, ]     # row 1: [1] 1 3 5
m[, 2]     # column 2: [1] 3 4
```


* Transpose:

```r
t(m)
```

```
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
```

* Matrix multiplication:

```r
m %*% t(m)
```

```
     [,1] [,2]
[1,]   35   44
[2,]   44   56
```

* Apply functions to rows/columns:

```r
apply(m, 1, sum)    # row sums: [1]  9 12
apply(m, 2, mean)   # column means: [1] 1.5 3.5 5.5
```
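For the common cases of sums and means, base R also provides `rowSums()`, `colSums()`, `rowMeans()`, and `colMeans()`. They give the same results as the `apply()` calls above and are typically faster. `apply()` remains the general tool for any other function, as this sketch shows:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

rowSums(m)    # [1]  9 12        same values as apply(m, 1, sum)
colMeans(m)   # [1] 1.5 3.5 5.5  same values as apply(m, 2, mean)

# apply() handles functions without a dedicated shortcut, e.g. the range of each column
apply(m, 2, range)
```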


### What is an Array?

An **array** is a multi-dimensional (2D, 3D, 4D, …) generalization of a matrix.

* Use the `array()` function:

```r
# array(data, dim = c(x, y, z))
```

Example:

```r
a <- array(1:8, dim = c(2, 2, 2))
a
```

This creates a **2x2x2** array (like two 2x2 matrices stacked together).
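Array elements are indexed with one position per dimension. For example, `a[i, j, k]` picks row `i`, column `j` of the `k`-th 2x2 layer, and leaving a position empty selects everything along that dimension. Like `matrix()`, `array()` fills values along the first dimension first. A short sketch:

```r
a <- array(1:8, dim = c(2, 2, 2))

dim(a)        # [1] 2 2 2
a[1, 2, 2]    # [1] 7  (row 1, column 2, second layer)
a[, , 1]      # the first 2x2 layer as a matrix:
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
a[2, , ]      # row 2 from both layers, returned as a 2x2 matrix
```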
### Example

```r
# Create a matrix (2 rows, 3 columns)
m <- matrix(1:6, nrow = 2, ncol = 3)
print("Matrix m:")
print(m)

# Accessing elements
print("Element at row 1, col 2:")
print(m[1, 2])

# Row and column
print("Row 1:")
print(m[1, ])
print("Column 2:")
print(m[, 2])

# Transpose
print("Transpose of m:")
print(t(m))

# Row sums and column means
print("Row sums:")
print(apply(m, 1, sum))
print("Column means:")
print(apply(m, 2, mean))

# Create an array (2x2x2)
a <- array(1:8, dim = c(2, 2, 2))
print("Array a:")
print(a)
```

### Summary

* **Matrix** → 2D, same type, rows & columns
* **Array** → multi-dimensional extension of matrix
* You can access elements, transpose with `t()`, multiply matrices with `%*%`, and apply functions over rows or columns with `apply()`