# R

## General terminology

### 1. Programming Language
A **programming language** is a formal language used to write instructions that a computer can execute. These instructions perform specific tasks or solve problems. Examples of programming languages include Python, R, MATLAB, Java, and C++.

---

### 2. Interpreter
An **interpreter** is a program that executes code line by line. It reads the source code, converts it into machine-readable instructions, and executes it on the spot. This allows you to run and test code quickly, but execution can be slower compared to compiled code.

- **Example:** Python, R, and MATLAB use interpreters.
- **Advantage:** Immediate execution and testing.
- **Disadvantage:** Slower execution for large programs compared to compiled languages.

---

### 3. Compiler
A **compiler** translates the entire source code of a program into machine code (binary code) in one go before execution. The machine code is saved as an executable file that can run directly on a computer.

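- **Example:** C and C++ compile directly to machine code. Java also uses a compiler, but it produces bytecode that runs on the Java Virtual Machine rather than a native executable.
- **Advantage:** Faster execution after compilation.
- **Disadvantage:** The whole program must be compiled before any of it can run, so each change requires a rebuild before testing.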

---

### 4. High-Level Programming Language
A **high-level language** is a programming language that is easy for humans to read and write. It uses natural language elements, which makes it more abstract from machine-level details.

- **Examples:** Python, R, MATLAB, Java.
- **Features:** Easier to use, focuses on problem-solving rather than hardware specifics.
- **Advantage:** Higher productivity and easier to debug and maintain.
- **Disadvantage:** Programs often run slower than hand-tuned low-level code because the extra abstraction adds translation and runtime overhead.

---

### 5. Low-Level Programming Language
A **low-level language** is closer to machine code and deals with hardware-specific operations. It provides less abstraction and is more difficult to write and maintain but offers more control over the hardware.

- **Examples:** Assembly language, machine code.
- **Features:** More efficient and faster as it directly interacts with hardware.
- **Advantage:** Speed and control over the system.
- **Disadvantage:** Hard to write and debug.

---

### 6. Syntax
**Syntax** refers to the rules that define the correct structure of commands in a programming language. Each programming language has its own syntax, which you need to follow when writing code.

- **Example:** In Python, print statements are written as `print("Hello World")`, whereas in C, it would be `printf("Hello World");`.
- **Importance:** Correct syntax is crucial to ensure the code runs without errors.

---

### 7. Variable
A **variable** is a storage location in memory with a name and a value. It is used to store data that can be manipulated and retrieved during program execution.

- **Example:** In Python, `x = 10` creates a variable `x` that stores the value `10`.
- **Types:** Variables can hold different types of data such as numbers (integers, floats), strings, or lists.

---

### 8. Data Types
A **data type** defines the type of data that a variable can store. Common data types include:

- **Integer:** Whole numbers (e.g., 5, -10).
- **Float:** Numbers with decimals (e.g., 3.14, -2.7).
- **String:** Text data (e.g., "Hello World").
- **Boolean:** True or False values.
  
Knowing the data type helps in choosing the right operations to apply to a variable.

---

### 9. Function
A **function** is a block of code that performs a specific task. Functions take inputs (called arguments), process them, and return an output.

- **Example:** In R, `mean()` is a function that calculates the average of numbers.
- **Purpose:** It helps in organizing code, reusing it, and improving readability.
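
To make the idea of arguments and return values concrete, here is a minimal sketch in R. The function name `rescale_to_unit` and the sample data are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale_to_unit <- function(values) {
  # The value of the last expression is what the function returns
  (values - min(values)) / (max(values) - min(values))
}

scores <- c(10, 20, 40)
rescale_to_unit(scores)   # 0, 1/3, 1
mean(scores)              # built-in function: 70 / 3
```

Once defined, the function can be reused on any numeric vector with at least two distinct values.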

---

### 10. Control Structures
**Control structures** determine the flow of execution in a program based on certain conditions or loops. Common control structures are:

- **If-Else Statements:** Execute code based on conditions.
  - Example: `if (x > 0) { print("Positive") } else { print("Not positive") }`
  
- **Loops:** Execute a block of code repeatedly.
  - **For Loop:** Repeats code for a set number of times.
  - **While Loop:** Repeats code while a condition is true.

---

### 11. Algorithm
An **algorithm** is a step-by-step procedure for solving a problem or performing a task. It defines the logical sequence of actions that must be followed to achieve a specific goal.

- **Example:** A sorting algorithm arranges data in a particular order (like ascending or descending).
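
As one possible illustration, the sketch below implements insertion sort in R. It is meant only to show an algorithm as a sequence of steps; the function name `insertion_sort` is an assumption, and in practice the built-in `sort()` would be used instead.

```r
# Illustrative insertion sort (ascending order)
insertion_sort <- function(values) {
  if (length(values) < 2) return(values)
  for (i in 2:length(values)) {
    current <- values[i]
    j <- i - 1
    # Shift larger elements one position to the right
    while (j >= 1 && values[j] > current) {
      values[j + 1] <- values[j]
      j <- j - 1
    }
    # Place the current element into the gap
    values[j + 1] <- current
  }
  values
}

insertion_sort(c(5, 2, 9, 1))  # 1 2 5 9
```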

---

### 12. Debugging
**Debugging** is the process of finding and fixing errors (bugs) in the program. It involves running the program, identifying where it behaves unexpectedly, and correcting the underlying issues.

- **Tools:** Many development environments provide debugging tools to help you step through code, set breakpoints, and inspect variable values.

---

### 13. IDE (Integrated Development Environment)
An **IDE** is software that provides tools to write, test, and debug code efficiently. It often includes a code editor, debugger, and other features that make coding easier.

- **Examples:** RStudio (for R), MATLAB IDE, PyCharm (for Python).

---

### 14. Library/Package
A **library** or **package** is a collection of pre-written code that provides specific functionality, such as mathematical operations, data visualization, or data manipulation. Using libraries helps save time and effort by reusing existing solutions.

- **Example:** In R, `ggplot2` is a package for data visualization, and in Python, `numpy` is used for numerical computations.

---

### 15. Object-Oriented Programming (OOP)
**Object-Oriented Programming** is a programming paradigm where code is organized into objects that represent real-world entities. These objects can contain both data (attributes) and functions (methods) that act on the data.

- **Examples:** Classes and objects are key concepts in OOP.
- **Advantages:** OOP makes code more modular, reusable, and easier to manage in large projects.

---

### 16. API (Application Programming Interface)
An **API** is a set of functions and protocols that allow different software applications to communicate with each other. APIs define how requests and responses should be structured.

- **Example:** A weather API might allow you to retrieve the current temperature for a location by making an HTTP request.

---


### 17. Console
The **console** (or command line interface) is a tool for interacting with the computer by typing commands. In programming, it is often used to run scripts, display outputs, and manage files.

- **Example:** The Python shell (when you run Python in terminal) or R console in RStudio.

---

### 18. Terminal
The **terminal** is a command-line interface that allows you to execute commands directly on your operating system. It provides access to the underlying file system and other utilities via text-based commands.

- **Example:** Bash terminal on Linux/Mac or Command Prompt on Windows.

---

### 19. REPL (Read-Eval-Print Loop)
A **REPL** is an interactive environment that reads a piece of code, evaluates it, prints the result, and then waits for the next input. Many interpreted languages (like Python, R, and JavaScript) provide a REPL.

- **Example:** Python's interactive shell (`>>>`), where you can type commands and see results instantly.

---

### 20. Shell
A **shell** is a user interface that allows access to the operating system’s services. It can be a command-line shell (like Bash or Command Prompt) or a graphical shell (like Windows Explorer).

- **Example:** Bash shell in Linux, or PowerShell in Windows.

---

### 21. Script
A **script** is a set of instructions written in a programming language, often interpreted, that performs a specific task. Scripts are commonly used for automation and small tasks.

- **Example:** Python scripts (`.py` files) or shell scripts (`.sh` files).

---

### 22. Class
A **class** is a blueprint for creating objects in object-oriented programming. It defines the properties (attributes) and behaviors (methods) that the objects created from the class will have.

- **Example:** A `Car` class might have attributes like `color` and `model`, and methods like `drive()` or `stop()`.

---

### 23. Object
An **object** is an instance of a class. It represents a real-world entity with attributes and methods.

- **Example:** If `Car` is a class, then `myCar = Car()` creates an object `myCar` that is an instance of the `Car` class.
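
R has several object systems, so the `Car` example looks different there. The sketch below uses S3, the simplest of them; the names `new_car` and `drive` are hypothetical and chosen for illustration.

```r
# Constructor: builds an object of the (hypothetical) S3 class "Car"
new_car <- function(color, model) {
  structure(list(color = color, model = model), class = "Car")
}

# Generic function and its method for the "Car" class
drive <- function(vehicle) UseMethod("drive")
drive.Car <- function(vehicle) {
  paste("The", vehicle$color, vehicle$model, "is driving")
}

my_car <- new_car("red", "hatchback")
class(my_car)   # "Car"
drive(my_car)   # "The red hatchback is driving"
```

Here the list elements play the role of attributes, and `drive.Car` plays the role of a method.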

---

### 24. Module
A **module** is a file containing a set of functions, classes, or variables that you can import into other programs or scripts.

- **Example:** In Python, you can create a module by saving a `.py` file and importing it into other scripts using `import module_name`.

---

### 25. Framework
A **framework** is a collection of libraries, tools, and best practices designed to simplify software development. It provides a predefined structure for developers to build applications faster.

- **Examples:** Django (for Python web development), React (for JavaScript).

---

### 26. Version Control
**Version control** is a system that tracks changes to your code, allowing you to revert to previous versions and collaborate with others. Git is the most popular version control system.

- **Example:** GitHub is a platform for hosting Git repositories and collaborating on projects.

---

### 27. Repository
A **repository** is a storage location for your project files, along with their version history, typically managed by a version control system like Git.

- **Example:** A GitHub repository stores code, documentation, and the version history of a project.

---


### 28. Command-Line Argument
A **command-line argument** is an input passed to a script or program via the command line when it is executed.

- **Example:** In Python, `python script.py arg1 arg2` passes `arg1` and `arg2` as arguments to `script.py`.
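
The R equivalent can be sketched as follows, assuming the code is saved in a file called `script.R` (a hypothetical name) and run with `Rscript script.R arg1 arg2`.

```r
# commandArgs(trailingOnly = TRUE) returns only the user-supplied arguments
# as a character vector
args <- commandArgs(trailingOnly = TRUE)

if (length(args) == 0) {
  message("No arguments supplied")
} else {
  for (arg in args) {
    cat("Received argument:", arg, "\n")
  }
}
```

Arguments always arrive as character strings, so numeric inputs need an explicit conversion such as `as.numeric()`.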

---

**Conclusion:**
Understanding these fundamental programming terms will give you a strong foundation as you learn to code. Each term represents a building block that you'll frequently encounter when writing and executing programs in any language.

## Setting Up the R Environment

[swirl](https://swirlstats.com/students.html)

- **Installation:** Install R from CRAN, then install the RStudio IDE from Posit (the company formerly named RStudio).
- **RStudio Basics:** Overview of the RStudio interface (Console, Script Editor, Environment, and Plots pane).
- **Package Management:** How to install, load, and update packages using `install.packages()` and `library()` functions.

## Introduction to Programming Concepts

- **Variables and Data Types:** Explain what variables are and the different data types (numeric, integer, character, logical, etc.).

In [128]:
"hello world"

In [129]:
print("hello world!")

[1] "hello world!"


## Variables and Data Types in R

When learning R programming, two key concepts are **variables** and **data types**. Here’s a breakdown of both:

---

### 1. Variables in R

A **variable** is a storage location in memory, identified by a name, that holds a value. In R, variables are used to store data that can be used and manipulated later in the program.

#### Creating Variables
In R, you can assign a value to a variable using the assignment operator `<-` (or sometimes `=`). Here’s an example:

#### Assigning values to variables

In [130]:
x <- 10       # Numeric variable

In [131]:
y <- "Hello"  # Character (string) variable

In [132]:
z <- TRUE     # Logical (boolean) variable

- **x** stores a numeric value `10`.
- **y** stores a string `"Hello"`.
- **z** stores a logical value `TRUE`.

#### Variable Naming Rules
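- A variable name can contain letters, numbers, underscores (`_`), and periods (`.`).
- Variable names **cannot start with a number or an underscore**. A name may start with a period, but only if the period is not followed by a number.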
- R is **case-sensitive**, meaning `Var1` and `var1` are two different variables.
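
One way to check these rules without triggering an error is `make.names()`, which rewrites a string into a syntactically valid name. The sketch below assumes that a candidate is valid exactly when `make.names()` returns it unchanged; the candidate strings are made up for illustration.

```r
candidates <- c("valid_name", "invalid-name", "1variable", ".hidden", "_temp")

# A string is already a valid name when make.names() leaves it untouched
is_valid <- make.names(candidates) == candidates
data.frame(name = candidates, valid = is_valid)
# valid_name and .hidden are valid; the other three are not
```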

#### Example of Valid and Invalid Variable Names:

In [133]:
valid_name <- 100

In [134]:
invalid-name <- 200   # Error: Hyphens are not allowed in variable names

ERROR: Error in invalid - name <- 200: object 'invalid' not found


In [135]:
1variable <- 300      # Error: Cannot start a variable name with a number

ERROR: Error in parse(text = input): <text>:1:2: unexpected symbol
1: 1variable
     ^


### 2. Data Types in R

A **data type** refers to the type of value that a variable can store. R supports various data types, and understanding them is essential to writing efficient R programs.

#### Primary Data Types in R:

#### 1. **Integer**:
   - Represents whole numbers. Use the `L` suffix to define integers explicitly.
   - Example:

In [192]:
int <- 42L  # Integer
int
typeof(int)
class(int)

In [163]:
int <- -2L  # Integer
int
typeof(int)

#### 2. **Double** (or Numeric):
   - Represents decimal numbers.
   - Example:

In [193]:
decimal <- 3.14  # Numeric (floating-point)
typeof(decimal)
class(decimal)

In [194]:
num <- 42        # Numeric
typeof(num)
class(num)

#### 3. **Character (String)**:
   - Represents text or a sequence of characters.
   - Example:

In [166]:
char <- "Hello, R!"  # Character (string)
char

In [167]:
typeof(char)

#### 4. **Logical (Boolean)**:
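   - Represents `TRUE` or `FALSE` values. `T` and `F` are predefined shorthands, but they are ordinary variables that can be reassigned, so prefer the full names.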
   - Example:

In [176]:
is_sunny <- TRUE     # Logical
is_sunny
typeof(is_sunny)

In [177]:
is_raining <- FALSE  # Logical
typeof(is_raining)

In [178]:
is_sunny <- T     # Logical
is_sunny
typeof(is_sunny)

In [179]:
is_raining <- F  # Logical
is_raining
typeof(is_raining)

#### 5. **Complex**:
   - Represents complex numbers with real and imaginary parts.
   - Example:

In [180]:
complex_num <- 4 + 2i  # Complex number (4 is real part, 2i is imaginary part)

In [181]:
complex_num

In [182]:
typeof(complex_num)

#### 6. **Raw**

A `raw` data type stores values as raw bytes. You can use the following functions to convert character data to raw data and vice versa:

- `charToRaw()` - converts character data to raw data
- `rawToChar()` - converts raw data to character data

In [212]:
# convert character to raw
raw_variable <- charToRaw("Welcome to Programiz")

print(raw_variable)
print(class(raw_variable))

# convert raw to character
char_variable <- rawToChar(raw_variable)

print(char_variable)
print(class(char_variable))

 [1] 57 65 6c 63 6f 6d 65 20 74 6f 20 50 72 6f 67 72 61 6d 69 7a
[1] "raw"
[1] "Welcome to Programiz"
[1] "character"


In [207]:
single_raw <- as.raw(255)
single_raw
typeof(single_raw)
class(single_raw)

[1] ff

### 3. Checking Data Types

- `class()` - what kind of object is it (high-level)?
- `typeof()` - what is the object’s data type (low-level)?
- `length()` - how long is it? What about two dimensional objects?
- `attributes()` - does it have any metadata?

You can check the type of a variable using the `class()` or `typeof()` functions in R.

- **Example**:

In [210]:
x <- 10
class(x)    # Returns "numeric"
typeof(x)   # Returns "double"

is.integer(x)
is.double(x)

In [211]:
x <- 10L
class(x)    # Returns "integer"
typeof(x)   # Returns "integer"

is.integer(x)
is.double(x)
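
The question about two-dimensional objects can be explored with a small matrix. This is a sketch only; the variable name `m` is arbitrary.

```r
m <- matrix(1:6, nrow = 2)

class(m)       # "matrix" "array" (R 4.0.0 and later)
typeof(m)      # "integer"
length(m)      # 6: total number of elements, not rows or columns
dim(m)         # 2 3: rows and columns
attributes(m)  # a list holding the dim attribute
```

For two-dimensional objects, `dim()`, `nrow()`, and `ncol()` are usually more informative than `length()`.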

### 4. Type Conversion

You can convert between data types using functions like `as.numeric()`, `as.character()`, `as.logical()`, and so on.

#### Example of Type Conversion:

[Read: Conversion Functions in R](https://www.scaler.com/topics/conversion-functions-in-r/)

[Source: Convert](https://cran.r-project.org/web/packages/hablar/vignettes/convert.html)

In [201]:
num <- "100"         # Character variable (string)
converted_num <- as.numeric(num)  # Convert to numeric
class(converted_num)  # Returns "numeric"

In [202]:
typeof(converted_num)
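
Conversions do not always succeed or behave as one might guess. The following sketch collects a few cases worth knowing; the variable name `converted` is arbitrary.

```r
as.integer(3.9)                      # 3: truncates toward zero, does not round
as.character(10)                     # "10"
as.logical(c("TRUE", "T", "yes"))    # TRUE TRUE NA: "yes" is not recognized
as.numeric(TRUE)                     # 1

# A failed conversion yields NA and a warning rather than an error
converted <- suppressWarnings(as.numeric("abc"))
is.na(converted)                     # TRUE
```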

### 5. Special Data Types: NULL, NA, NaN, and Inf

- **NULL**: Represents the absence of a value or an empty object.

In [221]:
x <- NULL
x
#class(x)
#typeof(x)

NULL

- **NA**: Represents a missing or undefined value (Not Available).

In [220]:
x <- NA
x
#class(x)
#typeof(x)

- **NaN**: Stands for "Not a Number" and occurs in undefined mathematical operations like `0/0`.

In [219]:
x <- 0/0
x
#class(x)
#typeof(x)

- **Inf**: Represents infinity. For example, dividing a positive number by zero results in `Inf`, and dividing a negative number by zero results in `-Inf`.

In [223]:
x <- 10 / 0  # Returns Inf
x
#class(x)
#typeof(x)
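
Each special value has a matching test function. The sketch below shows how they relate to one another; the vector `measurements` is invented for illustration.

```r
is.null(NULL)         # TRUE
length(NULL)          # 0: NULL holds no elements
is.na(NA)             # TRUE
is.na(NaN)            # TRUE: NaN also counts as missing
is.nan(NA)            # FALSE: NA is not NaN
is.infinite(-10 / 0)  # TRUE: the result is -Inf

# Missing values propagate through arithmetic unless removed
measurements <- c(4, NA, 6)
mean(measurements)                # NA
mean(measurements, na.rm = TRUE)  # 5
```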

### 6. Data Structures (Related to Variables and Data Types)

In R, data can also be organized into various structures. These are collections of variables and their values.

1. **Vector**: A sequence of data elements of the same type.

In [55]:
vec <- c(1, 2, 3, 4)  # Numeric vector
vec

2. **List**: A collection of elements of different types.

In [54]:
lst <- list(1, "Hello", TRUE)  # List with different data types
lst

3. **Matrix**: A two-dimensional array of the same type.

In [53]:
mat <- matrix(1:6, nrow = 2)  # 2x3 numeric matrix
mat

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6


4. **Data Frame**: A table with columns that can store different data types.

In [51]:
df <- data.frame(Name = c("John", "Doe"), Age = c(25, 30))

In [52]:
df

| Name    | Age     |
|---------|---------|
| `<chr>` | `<dbl>` |
| John    | 25      |
| Doe     | 30      |


5. **Factor**: Used to represent categorical data with a fixed set of values (called levels).


In [227]:
factor_data <- factor(c("low", "medium", "high"))
factor_data
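
Levels are sorted alphabetically unless stated otherwise, which is rarely the intended order for categories like these. A minimal sketch of inspecting and ordering levels follows; the variable names are arbitrary.

```r
sizes <- factor(c("low", "medium", "high"))
levels(sizes)   # "high" "low" "medium": alphabetical by default

# Give the levels an explicit, meaningful order
ordered_sizes <- factor(c("low", "medium", "high"),
                        levels = c("low", "medium", "high"),
                        ordered = TRUE)
ordered_sizes < "high"      # TRUE TRUE FALSE
as.integer(ordered_sizes)   # 1 2 3: the underlying integer codes
```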

### **Conclusion:**

- **Variables** are containers that store data, and in R, you assign values to variables using `<-`.
- **Data types** define the kind of values a variable can store, such as numeric, character, or logical.
- Understanding data types helps you manage and manipulate data correctly in your R programs. You can also check and convert between data types as needed.

Understanding these concepts is crucial for writing efficient and error-free R programs.

## Operators:

Arithmetic, relational, and logical operators.

### Operators in R

Operators in R are symbols or combinations of symbols that perform operations on variables and values. R provides a variety of operators to carry out arithmetic, relational, logical, assignment, and other types of operations. Here’s an overview of the main types of operators in R:

### **1. Arithmetic Operators**

Arithmetic operators perform mathematical calculations.

| Operator | Description           | Example         |
|----------|-----------------------|-----------------|
| `+`      | Addition              | `5 + 2` = 7     |
| `-`      | Subtraction           | `5 - 2` = 3     |
| `*`      | Multiplication        | `5 * 2` = 10    |
| `/`      | Division              | `5 / 2` = 2.5   |
| `^` or `**` | Exponentiation    | `5 ^ 2` = 25    |
| `%%`     | Modulus (remainder)   | `5 %% 2` = 1    |
| `%/%`    | Integer Division      | `5 %/% 2` = 2   |

#### **Example:**

In [229]:

x <- 10
y <- 3

sum <- x + y      # Addition
diff <- x - y     # Subtraction
prod <- x * y     # Multiplication
quotient <- x / y # Division
power <- x ^ y    # Exponentiation
remainder <- x %% y # Modulus
int_div <- x %/% y # Integer Division

print(c(sum, diff, prod, quotient, power, remainder, int_div))


[1]   13.000000    7.000000   30.000000    3.333333 1000.000000    1.000000
[7]    3.000000
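
The modulus and integer-division operators behave differently from some other languages when negative numbers are involved. A short sketch:

```r
-7 %% 3     # 2: the result takes the sign of the divisor
-7 %/% 3    # -3: rounds toward negative infinity
7 %% -3     # -2

# The two operators are consistent with each other
x <- -7
y <- 3
(x %/% y) * y + (x %% y) == x   # TRUE
```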


### **2. Relational (Comparison) Operators**

Relational operators compare two values and return a logical value (`TRUE` or `FALSE`).

| Operator | Description             | Example         |
|----------|-------------------------|-----------------|
| `==`     | Equal to                | `5 == 2` = FALSE|
| `!=`     | Not equal to            | `5 != 2` = TRUE |
| `>`      | Greater than            | `5 > 2` = TRUE  |
| `<`      | Less than               | `5 < 2` = FALSE |
| `>=`     | Greater than or equal to| `5 >= 2` = TRUE |
| `<=`     | Less than or equal to   | `5 <= 2` = FALSE|

#### **Example:**

In [75]:

x <- 5
y <- 2

print(x == y)   # Returns FALSE
print(x != y)   # Returns TRUE
print(x > y)    # Returns TRUE
print(x < y)    # Returns FALSE
print(x >= y)   # Returns TRUE
print(x <= y)   # Returns FALSE


[1] FALSE
[1] TRUE
[1] TRUE
[1] FALSE
[1] TRUE
[1] FALSE
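
Two details are worth adding: `==` on decimal numbers can surprise because of floating-point rounding, and comparisons are applied to every element of a vector. A brief sketch:

```r
0.1 + 0.2 == 0.3                    # FALSE: binary floating-point rounding
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal within a small tolerance

# Comparisons are vectorized, and strings compare alphabetically
c(1, 5, 10) > 4                     # FALSE TRUE TRUE
"apple" < "banana"                  # TRUE
```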


### **3. Logical Operators**

Logical operators are used to combine multiple conditions and return a logical result (`TRUE` or `FALSE`).

| Operator | Description        | Example         |
|----------|--------------------|-----------------|
| `&`      | Logical AND (element-wise) | `TRUE & FALSE` = FALSE |
| `&&`     | Logical AND (single values, short-circuit) | `TRUE && FALSE` = FALSE |
| `\|`      | Logical OR (element-wise)  | `TRUE \| FALSE` = TRUE  |
| `\|\|`     | Logical OR (single values, short-circuit) | `TRUE \|\| FALSE` = TRUE |
| `!`      | Logical NOT         | `!TRUE` = FALSE |

| Operator | Description | Example with Explanation |
|----------|-------------|-------------------------|
| `&` | Element-wise AND: Compares each corresponding element | `c(TRUE, FALSE) & c(TRUE, TRUE)` → `c(TRUE, FALSE)` |
| `&&` | Strict AND: Only works with single logical values | `TRUE && FALSE` → `FALSE` |
| `\|` | Element-wise OR: Compares each corresponding element | `c(TRUE, FALSE) \| c(FALSE, FALSE)` → `c(TRUE, FALSE)` |
| `\|\|` | Strict OR: Only works with single logical values | `TRUE \|\| FALSE` → `TRUE` |
| `!` | Logical NOT: Inverts logical values | `!c(TRUE, FALSE)` → `c(FALSE, TRUE)` |

#### Key Points:

1. **Vectorized vs Single Value Operations**
   - `&` and `|`: Work element-wise on vectors
   - `&&` and `||`: Require single (length-one) logical values; since R 4.3.0, passing a longer vector is an error

2. **Short-circuit Evaluation**
   - `&&` and `||` use short-circuit evaluation
   - Stop as soon as result is determined

In [8]:

   # Short-circuit example
   x <- NULL
   if(is.null(x) || x > 0) {
       print("Safely handled")
   }


[1] "Safely handled"


3. **Common Use Cases**
   - Use `&` and `|` for vectorized operations and data manipulation
   - Use `&&` and `||` in if statements and control flow

4. **Best Practices**
   - Always use `&&` and `||` in if statements
   - Use `&` and `|` when working with vectors
   - Be careful with `NA` values - they propagate through logical operations
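
The way `NA` interacts with logical operators can be sketched as follows. The variable `reading` is hypothetical.

```r
NA & TRUE    # NA: the answer depends on the unknown value
NA & FALSE   # FALSE: the result is FALSE whatever NA stands for
NA | TRUE    # TRUE
NA | FALSE   # NA

# An NA condition makes if() fail, so guard with isTRUE() or is.na()
reading <- NA
if (isTRUE(reading > 0)) {
  print("Positive")
} else {
  print("Not positive or missing")
}
```

`isTRUE()` returns `FALSE` for `NA`, so the `else` branch runs instead of raising an error.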

#### **Example:**

In [9]:
# Example of element-wise AND
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
x & y  # Returns: c(TRUE, FALSE, FALSE)

In [10]:
# Example of && with vectors
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
x && y  # Error in R 4.3.0 and later: && requires single logical values

ERROR: Error in x && y: 'length = 3' in coercion to 'logical(1)'


The error `Error in x && y: 'length = 3' in coercion to 'logical(1)'` occurs because:
- `&&` and `||` are strict operators that only work with single logical values
- They cannot directly operate on vectors

Here's the correct usage:

In [12]:
# CORRECT: Element-wise operators with vectors
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
x & y  # Works fine: c(TRUE, FALSE, FALSE)

In [13]:
# CORRECT: Strict operators with single values
single_x <- TRUE
single_y <- FALSE
single_x && single_y  # Works fine: FALSE

In [14]:
# INCORRECT: Will cause error
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
x && y  # Error: Cannot use && with vectors

ERROR: Error in x && y: 'length = 3' in coercion to 'logical(1)'


**Common Use Cases:**

1. **Use `&` and `|` when:**

   ```r
   # Data frame filtering
   df[df$age > 25 & df$salary < 50000, ]
   ```

2. **Use `&&` and `||` when:**

   ```r
   # In if statements
   if(is.numeric(x) && x > 0) {
     print("Positive number")
   }
   ```
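
The filtering pattern above refers to a data frame that is not defined in this section. A self-contained sketch, using a hypothetical `employees` data frame with invented values, might look like this:

```r
# Hypothetical data, invented for illustration
employees <- data.frame(
  name   = c("Ana", "Ben", "Cara"),
  age    = c(24, 31, 45),
  salary = c(42000, 48000, 90000)
)

# & compares the two logical vectors row by row
keep <- employees$age > 25 & employees$salary < 50000
keep                 # FALSE TRUE FALSE
employees[keep, ]    # only Ben's row
```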

In [17]:
x <- c(TRUE, FALSE, TRUE)
y <- c(FALSE, FALSE, TRUE)

print(x & y)    # Element-wise AND: Returns c(FALSE, FALSE, TRUE)

print(x | y)    # Element-wise OR: Returns c(TRUE, FALSE, TRUE)

print(!x)       # NOT operator: Returns c(FALSE, TRUE, FALSE)

[1] FALSE FALSE  TRUE
[1]  TRUE FALSE  TRUE
[1] FALSE  TRUE FALSE


### **4. Assignment Operators**

Assignment operators are used to assign values to variables.

| Operator | Description            | Example           |
|----------|------------------------|-------------------|
| `<-`     | Leftward assignment    | `x <- 5`          |
| `->`     | Rightward assignment   | `5 -> x`          |
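| `<<-`    | Superassignment: assigns in an enclosing environment (the global environment if no existing variable is found) | `x <<- 5` |
| `->>`    | Rightward superassignment | `5 ->> x` |
| `=`      | Assignment; also names arguments in function calls | `x = 5` |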

- **`<-`** and **`=`** are the most common assignment operators in R.
- **`<<-`** assigns a value to a global variable from within a function.
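
Strictly speaking, `<<-` searches the enclosing environments for an existing variable and only falls back to the global environment when none is found. The counter below is a minimal sketch of that behavior; the names `make_counter` and `next_id` are hypothetical.

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates count in the enclosing function, not a new local
    count
  }
}

next_id <- make_counter()
next_id()   # 1
next_id()   # 2
```

With `<-` in place of `<<-`, each call would create a fresh local `count` and always return 1.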

#### **Example:**

In [80]:

x <- 10   # Leftward assignment
20 -> y   # Rightward assignment (equivalent to y <- 20)

print(x)
print(y)


[1] 10
[1] 20


### **5. Miscellaneous Operators**

#### **a. Colon Operator (`:`)**
The colon (`:`) is used to create a sequence of numbers.

In [81]:

x <- 1:5  # Creates a sequence from 1 to 5
print(x)  # Prints c(1, 2, 3, 4, 5)


[1] 1 2 3 4 5


#### **b. Sequence Generation (`seq()`)**
The `seq()` function is used to generate a sequence with specific increments.

In [82]:

x <- seq(1, 10, by=2)  # Creates a sequence from 1 to 10 with a step of 2
print(x)  # Prints c(1, 3, 5, 7, 9)


[1] 1 3 5 7 9


#### **c. Element Selection (`[]`, `[[]]`)**
Used to select elements from a vector, matrix, or list.

In [83]:

vec <- c(10, 20, 30, 40)
print(vec[2])   # Access the second element (prints 20)


[1] 20
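
The difference between `[]` and `[[]]` shows up most clearly with lists. The sketch below uses a hypothetical list called `person`.

```r
person <- list(name = "John", scores = c(90, 85))

person["name"]         # a list of length 1 that still contains the name
person[["name"]]       # the character value "John" itself
person[["scores"]][2]  # 85: extract the vector, then index into it

vec <- c(10, 20, 30, 40)
vec[c(1, 3)]   # 10 30: several elements at once
vec[-1]        # 20 30 40: a negative index drops an element
vec[vec > 15]  # 20 30 40: logical indexing
```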


#### **d. List Access (`$`)**
The `$` operator is used to access elements by name in a list or data frame.

In [84]:

data <- list(a = 1, b = 2)
print(data$a)  # Access the element named 'a' (prints 1)


[1] 1


### **6. Special Operators**

R provides a few operators that are specific to certain tasks.

#### **a. Matrix Multiplication (`%*%`)**
Used for matrix multiplication.

In [85]:

mat1 <- matrix(1:4, nrow=2)
mat2 <- matrix(5:8, nrow=2)
result <- mat1 %*% mat2
print(result)  # Matrix multiplication


     [,1] [,2]
[1,]   23   31
[2,]   34   46


#### **b. Modulus and Integer Division (`%%` and `%/%`)**
These operators are used for division-related operations:

- **`%%`** gives the remainder.
- **`%/%`** gives the integer part of division.

In [86]:

x <- 10
y <- 3

print(x %% y)  # Returns 1 (remainder)
print(x %/% y) # Returns 3 (integer division)


[1] 1
[1] 3


#### **c. In Operator (`%in%`)**
Used to check whether an element exists in a vector or list.

In [87]:

x <- 3
y <- c(1, 2, 3, 4, 5)

print(x %in% y)  # Returns TRUE (3 is in the vector y)


[1] TRUE


### **7. Summary of Operators**

1. **Arithmetic Operators**: Perform basic math operations (`+`, `-`, `*`, `/`, etc.).
2. **Relational Operators**: Compare values (`==`, `!=`, `>`, `<`, etc.).
3. **Logical Operators**: Combine conditions (`&`, `|`, `!`, etc.).
4. **Assignment Operators**: Assign values to variables (`<-`, `=`, etc.).
5. **Miscellaneous Operators**: Sequence creation (`:`), element selection (`[]`), list access (`$`), etc.
6. **Special Operators**: Matrix multiplication (`%*%`), modulus (`%%`), and membership test (`%in%`).

### **Conclusion:**

Understanding these operators is crucial for performing calculations, comparisons, and data manipulations in R. Mastering how to use them efficiently will help you write more robust and flexible R programs.

## Control Structures:

- if statements, for loops, while loops.

Control structures in R are used to control the flow of execution in a program. They help in making decisions, repeating tasks, or breaking out of certain operations. The most common control structures in R include **conditional statements** and **loops**.

### **1. Conditional Statements (if, else if, else)**

[Read](https://www.datamentor.io/r-programming/if-else-statement)

Conditional statements allow the program to execute different pieces of code based on certain conditions.

#### **if Statement**
The `if` statement checks whether a condition is `TRUE`. If it is, the code inside the block is executed.

![](https://www.datamentor.io/sites/tutorial2program/files/r-if-statement.jpg)

In [65]:

x <- 5

if (x > 3) {
  print("x is greater than 3")
}


[1] "x is greater than 3"


#### **if-else Statement**
The `else` block is executed when the condition in the `if` statement is `FALSE`.

![](https://www.datamentor.io/sites/tutorial2program/files/r-if-else-statement.jpg)

In [64]:

x <- 2

if (x > 3) {
  print("x is greater than 3")
} else {
  print("x is less than or equal to 3")
}


[1] "x is less than or equal to 3"


#### **else if Statement**
The `else if` statement allows you to check multiple conditions sequentially. If none of the conditions are `TRUE`, the `else` block (if provided) is executed.

In [63]:

x <- 5

if (x > 10) {
  print("x is greater than 10")
} else if (x > 3) {
  print("x is greater than 3 but less than or equal to 10")
} else {
  print("x is less than or equal to 3")
}


[1] "x is greater than 3 but less than or equal to 10"




### **2. Loops**

Loops are used to repeat a block of code multiple times. There are several types of loops in R:

#### **for Loop**
The `for` loop repeats a block of code a specified number of times, iterating over elements in a sequence or vector.

![](https://www.datamentor.io/sites/tutorial2program/files/r-for-loop.jpg)

In [66]:

# Example: Print numbers 1 to 5
for (i in 1:5) {
  print(i)
}


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5


In this example, the loop iterates over the numbers 1 to 5, printing each one.
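
A `for` loop can also walk over the elements of any vector, not only a range of numbers. The sketch below uses a made-up `fruits` vector to show both element-wise and position-wise iteration.

```r
fruits <- c("apple", "banana", "cherry")

# Iterate over the elements directly
for (fruit in fruits) {
  print(fruit)
}

# Iterate over positions; seq_along() is safe even for an empty vector,
# whereas 1:length(x) would produce c(1, 0)
for (i in seq_along(fruits)) {
  cat(i, ":", fruits[i], "\n")
}
```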

#### **while Loop**
The `while` loop repeats a block of code as long as a specified condition is `TRUE`.

![](https://www.datamentor.io/sites/tutorial2program/files/r-while-loop.jpg)

In [67]:

# Example: Print numbers from 1 to 5 using a while loop
x <- 1

while (x <= 5) {
  print(x)
  x <- x + 1
}


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5


In this example, the loop keeps running as long as `x` is less than or equal to 5. The value of `x` is incremented after each iteration.

#### **repeat Loop**
The `repeat` loop repeats a block of code indefinitely until a `break` statement is encountered.

![](https://www.datamentor.io/sites/tutorial2program/files/r-repeat-loop.jpg)

In [68]:

# Example: Print numbers from 1 to 5 using a repeat loop
x <- 1

repeat {
  print(x)
  x <- x + 1
  
  if (x > 5) {
    break  # Exit the loop when x becomes greater than 5
  }
}


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5


In this example, the loop continues until `x` is greater than 5, at which point the `break` statement is used to exit the loop.

Key differences between `repeat` and `while` loops in R:

1. Execution Order:
- while loop: Checks the condition FIRST, then executes code if true

In [3]:
# while loop example
x <- 1
while(x < 5) {
    print(x)
    x <- x + 1
}

[1] 1
[1] 2
[1] 3
[1] 4



- repeat loop: Executes code FIRST, then checks condition to break


In [4]:
# repeat loop example  
x <- 1
repeat {
    print(x)
    x <- x + 1
    if(x >= 5) break
}

[1] 1
[1] 2
[1] 3
[1] 4


2. Exit Condition:
- while loop: Uses a condition in the loop header
- repeat loop: Must explicitly use `break` statement inside the loop

3. Guarantee of Execution:
- while loop: May never execute if condition is false initially
- repeat loop: Always executes at least once (similar to do-while in other languages)

4. Common Use Cases:
- while loop: Better when you know the condition beforehand
- repeat loop: Better when you need to execute code at least once or when exit conditions are complex
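
The guarantee-of-execution difference can be seen by starting both loops with a condition that is already false. This is a sketch; the counters `while_runs` and `repeat_runs` are introduced only to record how often each body executes.

```r
x <- 10

while_runs <- 0
while (x < 5) {                    # condition is FALSE at the start: body never runs
  while_runs <- while_runs + 1
}

repeat_runs <- 0
repeat {
  repeat_runs <- repeat_runs + 1   # body runs before the exit test
  if (x >= 5) break
}

while_runs    # 0
repeat_runs   # 1
```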


### **3. Loop Control (break, next)**

Loop control statements allow you to change the normal flow of loops.

#### **break**
The `break` statement is used to exit a loop prematurely when a specific condition is met.

![](https://www.datamentor.io/sites/tutorial2program/files/r-break-flowchart.jpg)

In [69]:

# Example: Exit the loop when i equals 3
for (i in 1:5) {
  if (i == 3) {
    break  # Exit the loop when i is 3
  }
  print(i)
}


[1] 1
[1] 2


In this example, the loop stops when `i` equals 3, so only the numbers 1 and 2 are printed.

#### **next**
The `next` statement is used to skip the current iteration of a loop and move on to the next one.

![](https://www.datamentor.io/sites/tutorial2program/files/r-next-flowchart.png)

In [2]:

# Example: Skip the number 3
for (i in 1:5) {
  if (i == 3) {
    next  # Skip the current iteration when i is 3
  }
  print(i)
}


[1] 1
[1] 2
[1] 4
[1] 5


In this example, the loop skips the iteration where `i` equals 3, so 1, 2, 4, and 5 are printed, but not 3.

### **4. Logical Operators**

Logical operators are used within control structures to create complex conditions.

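- **& (AND)**: Both conditions must be `TRUE`. It works element-wise on vectors; `&&` is the single-value form usually preferred inside `if`.
- **| (OR)**: At least one of the conditions must be `TRUE`. It works element-wise on vectors; `||` is the single-value form.
- **! (NOT)**: Negates a condition (turns `TRUE` into `FALSE`, and vice versa).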

#### **Example with Logical Operators:**

In [72]:

x <- 7

if (x > 3 & x < 10) {
  print("x is between 3 and 10")
}


[1] "x is between 3 and 10"


In this example, the condition is `TRUE` because `x` is greater than 3 and less than 10.
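
As a rough illustration of how these operators behave on vectors versus single values, the sketch below uses hypothetical variables (`in_range`, `status`). `&` returns one result per element, while `&&` is meant for a single `TRUE`/`FALSE` on each side, which is what an `if` condition needs.

```r
# Element-wise AND: one result per element of the vector
x <- c(2, 7, 12)
in_range <- x > 3 & x < 10
print(in_range)   # FALSE TRUE FALSE

# Single-value AND inside if(): each side is one TRUE/FALSE
y <- 7
if (y > 3 && y < 10) {
  status <- "y is between 3 and 10"
} else {
  status <- "y is outside the range"
}
print(status)

# NOT flips every logical value
print(!in_range)  # TRUE FALSE TRUE
```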



### **5. Vectorized ifelse() Function**

R also provides the `ifelse()` function, which allows you to apply conditional logic over vectors. This is useful when you want to apply conditions element-wise.

In [73]:

# Example: Check whether each number in a vector is even or odd
numbers <- 1:5
result <- ifelse(numbers %% 2 == 0, "Even", "Odd")
print(result)


[1] "Odd"  "Even" "Odd"  "Even" "Odd" 


In this example, the `ifelse()` function checks whether each element of `numbers` is even or odd. If the number is divisible by 2 (`numbers %% 2 == 0`), it returns `"Even"`, otherwise `"Odd"`.
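
When more than two outcomes are needed, one common approach is to nest `ifelse()` calls. The sketch below is only an illustration; the `scores` vector and the cut-off values are made up for the example.

```r
# Classify each score into one of three bands
scores <- c(45, 72, 90)
band <- ifelse(scores >= 85, "High",
               ifelse(scores >= 50, "Medium", "Low"))
print(band)  # "Low" "Medium" "High"
```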

### **Conclusion:**

- **Conditional statements** (`if`, `else if`, `else`) allow for decision-making in your code.
- **Loops** (`for`, `while`, `repeat`) enable you to execute a block of code multiple times.
- **Loop control statements** (`break`, `next`) allow you to control the flow of loops, letting you exit or skip iterations.
- R also provides the vectorized `ifelse()` function for efficient element-wise condition checking.

Understanding and using these control structures will enable you to build dynamic and flexible programs in R.

### Printing Values in R

#### Basic Print Functions

| Function | Description | Example |
|----------|-------------|---------|
| `print()` | Basic printing function | `print("Hello")` → `[1] "Hello"` |
| `cat()` | Concatenates and prints | `cat("Hello", "World")` → `Hello World` |
| `message()` | Displays messages | `message("Note:")` → `Note:` |
| `sprintf()` | Builds a formatted string and returns it (it does not print on its own) | `sprintf("Value: %d", 42)` → `"Value: 42"` |

#### Key Differences

1. **print() Function**

In [19]:
   # Prints with index
   x <- c(1, 2, 3)
   print(x)     # [1] 1 2 3
   

[1] 1 2 3


In [20]:
   # Prints strings with quote marks and an index prefix
   print("Hello")  # [1] "Hello"

[1] "Hello"


2. **cat() Function**

In [21]:
   # Concatenates without index
   cat("Hello", "World")  # Hello World
   

Hello World

In [22]:
   # Supports escape characters
   cat("Line 1\nLine 2")  # Line 1
                          # Line 2

Line 1
Line 2

3. **message() Function**

In [23]:

   # Used for diagnostic messages
   message("Processing data...")  # Processing data...


Processing data...



4. **sprintf() Function**

In [24]:

   # Formatted printing
   name <- "John"
   age <- 25
   sprintf("Name: %s, Age: %d", name, age)  # "Name: John, Age: 25"


#### Format Specifiers for sprintf()

| Specifier | Description | Example |
|-----------|-------------|---------|
| `%s` | String | `sprintf("%s", "text")` |
| `%d` | Integer | `sprintf("%d", 42)` |
| `%f` | Float | `sprintf("%.2f", 3.14159)` |
| `%e` | Scientific | `sprintf("%e", 1000000)` |

#### Best Practices

1. **Choose the Right Function:**
   - Use `print()` for general output
   - Use `cat()` for concatenated, formatted output
   - Use `message()` for diagnostic messages
   - Use `sprintf()` for complex string formatting

2. **Formatting Numbers:**

In [25]:

   # Round to 2 decimal places
   num <- 3.14159
   sprintf("%.2f", num)  # "3.14"


3. **Multiple Values:**

In [26]:

   # Using cat()
   x <- 1:3
   cat("Values:", x, "\n")  # Values: 1 2 3
   
   # Using sprintf()
   sprintf("X: %d, Y: %d", 10, 20)  # "X: 10, Y: 20"
   

Values: 1 2 3 


4. **Suppressing Output:**

In [27]:

   # Using invisible()
   invisible(print("Hidden"))  # print() still displays the value; invisible() only hides the returned result
   

[1] "Hidden"
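
A sketch of where `invisible()` is typically useful: inside a function, so that the result is returned without being auto-printed at the console. The helper name `quiet_square` is hypothetical.

```r
# Hypothetical helper that returns its result invisibly
quiet_square <- function(x) {
  invisible(x^2)
}

quiet_square(4)            # nothing is auto-printed at the console
result <- quiet_square(4)  # the value is still returned and can be stored
print(result)              # 16
```

`withVisible(quiet_square(4))$visible` reports `FALSE`, which is one way to confirm that the value was returned invisibly.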



## Functions

What functions are, how to define and call them.

Functions are a fundamental concept in R, allowing you to group blocks of code into reusable components. They make your code modular, easier to debug, and more readable. A function takes inputs (called arguments), processes them, and returns an output (result). R also comes with many **built-in functions**, but you can create **user-defined functions** as well.

### **1. What are Functions?**

A function is a set of statements organized together to perform a specific task. Functions in R can:
- Take inputs (called parameters or arguments).
- Perform some operations using these inputs.
- Return one or more outputs.

#### **Built-in Functions Example:**

In [28]:

# Example: sqrt() is a built-in function to calculate square root
result <- sqrt(16)
print(result)  # Output: 4


[1] 4


In this example, `sqrt()` is a built-in function that takes one input (16) and returns its square root.

In [1]:
rnorm(1)

In [2]:
rnorm(2)

### **2. Defining a Function in R**

To define a function in R, use the `function()` keyword. A function has the following components:
- **Function Name**: A name to call the function.
- **Arguments/Parameters**: Inputs the function will use.
- **Body**: The block of code that performs the task.
- **Return Value**: The output returned by the function.

#### **Syntax:**

In [90]:

function_name <- function(arg1, arg2, ...) {
  # Code to execute
  return(value)  # Optional, if not provided, the last evaluated expression is returned
}


#### **Example: Simple Function**

In [91]:

# Define a function to add two numbers
add_numbers <- function(a, b) {
  total <- a + b  # Named `total` so the built-in sum() is not shadowed
  return(total)  # Return the total
}

# Call the function
result <- add_numbers(5, 3)
print(result)  # Output: 8


[1] 8


In this example, the function `add_numbers()` takes two inputs (`a` and `b`), adds them, and returns the result.

### **3. Calling a Function**

Once a function is defined, you can call it by using its name followed by parentheses containing the arguments.

#### **Example:**

In [92]:

# Function call with arguments
result <- add_numbers(10, 20)
print(result)  # Output: 30


[1] 30


You can pass arguments directly when calling the function.

### **4. Function Arguments**

Functions in R can have several types of arguments:
- **Positional Arguments**: Arguments are matched to function parameters by their position.
- **Named Arguments**: Arguments can be passed using names, in any order.
- **Default Arguments**: You can provide default values for arguments, which will be used if no value is supplied during the function call.

#### **Example with Named and Default Arguments:**

In [93]:

# Define a function with a default argument
greet <- function(name, greeting = "Hello") {
  paste(greeting, name)
}

# Call the function with both arguments
print(greet("Simran", "Hi"))  # Output: "Hi Simran"

# Call the function using default value for greeting
print(greet("Simran"))  # Output: "Hello Simran"


[1] "Hi Simran"
[1] "Hello Simran"


### **5. Returning Values from Functions**

A function can return a value using the `return()` statement, but if you don’t use `return()`, R will automatically return the value of the last evaluated expression.

#### **Example:**

In [94]:

# Function without explicit return statement
multiply <- function(a, b) {
  a * b  # The last evaluated expression is returned
}

# Call the function
result <- multiply(4, 5)
print(result)  # Output: 20


[1] 20


In this case, R automatically returns the result of `a * b` because it’s the last expression in the function.
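
An explicit `return()` is mainly useful for leaving a function early. A minimal sketch, assuming a hypothetical helper `safe_divide` that is not part of the original examples:

```r
# Hypothetical helper: returns NA instead of dividing by zero
safe_divide <- function(a, b) {
  if (b == 0) {
    return(NA)  # leave the function early
  }
  a / b         # otherwise the last expression is the return value
}

print(safe_divide(10, 4))  # 2.5
print(safe_divide(1, 0))   # NA
```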

### **6. Nested Functions**

You can define functions within other functions. The inner function is only accessible within the outer function.

#### **Example:**

In [95]:

# Define a function with a nested function
outer_function <- function(x) {
  inner_function <- function(y) {
    return(y + 2)
  }
  
  return(inner_function(x) * 3)
}

# Call the outer function
result <- outer_function(5)
print(result)  # Output: 21 (5 + 2 = 7; 7 * 3 = 21)


[1] 21
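
To see that an inner function is not visible outside its enclosing function, the sketch below defines a hypothetical helper `circle_area` inside `area_report` and then checks for it at the top level. It assumes a fresh session in which no other object named `circle_area` exists.

```r
area_report <- function(r) {
  # Helper defined inside the function; it exists only while area_report() runs
  circle_area <- function(radius) pi * radius^2
  round(circle_area(r), 2)
}

print(area_report(1))         # 3.14
print(exists("circle_area"))  # FALSE in a fresh session
```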


### **7. Anonymous Functions**

In R, you can create anonymous functions, i.e., functions without a name, especially useful for short tasks like passing functions as arguments to other functions.

#### **Example:**

In [96]:

# Define an anonymous function inside sapply
result <- sapply(1:5, function(x) x^2)  # Squaring numbers from 1 to 5
print(result)  # Output: 1 4 9 16 25


[1]  1  4  9 16 25


Here, the anonymous function squares each element in the sequence `1:5`.
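
Anonymous functions work the same way with other base functions that take a function as an argument. The sketch below uses `Filter()` and `mapply()`; the variable names are illustrative.

```r
# Keep only the even numbers
evens <- Filter(function(x) x %% 2 == 0, 1:10)
print(evens)  # 2 4 6 8 10

# Add two vectors element by element with a two-argument anonymous function
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
print(sums)   # 5 7 9
```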

### **8. Lazy Evaluation**

R uses **lazy evaluation** for function arguments, meaning arguments are only evaluated when they are actually used inside the function.

#### **Example:**

In [97]:

lazy_function <- function(a, b) {
  print(a)
  # b is never used, so it won't be evaluated
}

lazy_function(10, stop("This will not be evaluated"))  # Output: 10


[1] 10


In this example, since `b` is not used inside the function, the error from `stop()` is never triggered.
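
Lazy evaluation also applies to default arguments: a default is computed only when the argument is first used, inside the function. A small sketch with a hypothetical function `scaled`:

```r
scaled <- function(a, b = a * 2) {
  a <- a + 1  # change a before b is first used
  b           # b's default is evaluated only now, with the updated a
}

print(scaled(1))          # 4, because b is computed as (1 + 1) * 2
print(scaled(1, b = 10))  # 10, a supplied value is used as given
```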

### **9. Variable Scope in Functions**

In R, the scope of a variable refers to the regions of a program where the variable can be accessed. There are two types:
- **Local Scope**: Variables declared inside a function are local to that function and cannot be accessed outside it.
- **Global Scope**: Variables declared outside any function are global and can be accessed from anywhere in the program.

#### **Example:**

In [98]:
x <- 10  # Global variable

my_function <- function() {
  y <- 5  # Local variable
  print(y)
}

my_function()  # Output: 5
print(x)  # Output: 10
# print(y)  # Error: object 'y' not found (because y is local to my_function)

[1] 5
[1] 10
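
A related point worth sketching: a function can read a global variable, but a plain `<-` assignment inside the function creates a local variable and leaves the global one unchanged. The names `counter` and `bump` are hypothetical.

```r
counter <- 10  # global variable

bump <- function() {
  counter <- counter + 1  # reads the global value, then creates a local `counter`
  counter
}

print(bump())   # 11
print(counter)  # 10, the global variable is unchanged
```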


### **10. Common Built-in Functions**

R comes with many useful built-in functions. Some commonly used ones include:

- **`sum()`**: Adds all elements in a vector.
- **`mean()`**: Computes the average of a numeric vector.
- **`length()`**: Returns the number of elements in a vector.
- **`max()` / `min()`**: Returns the maximum or minimum value in a vector.
- **`sqrt()`**: Computes the square root of a number.
- **`paste()`**: Concatenates strings.

### **11. Function Documentation**

It is a good practice to add comments or documentation inside your functions to explain what they do, what inputs they expect, and what outputs they produce. This makes it easier for others (and yourself) to understand the code later.

In [100]:

# Function to calculate the area of a rectangle
# Args:
#   length: The length of the rectangle.
#   width: The width of the rectangle.
# Returns:
#   The area of the rectangle.
calculate_area <- function(length, width) {
  area <- length * width
  return(area)
}


### **Conclusion:**

- Functions in R are essential tools for making your code reusable, modular, and easier to manage.
- You can define functions using the `function()` keyword, and they can take arguments, return values, and perform complex tasks.
- Functions can have default arguments, return values explicitly or implicitly, and even be anonymous or nested.
- Understanding functions in R will allow you to write cleaner, more efficient code, making it easier to handle complex tasks.

## Built-in functions in base R

R has a vast number of built-in functions, and they are organized into different packages that come pre-installed with R. Here’s a categorized summary of some of the most commonly used built-in functions in base R:

### **1. Basic Data Handling Functions**
- **`c()`**: Combine values into a vector or list.
- **`length()`**: Get the length of a vector or list.
- **`sum()`, `mean()`, `median()`, `sd()`, `var()`**: Basic statistical functions.
- **`min()`, `max()`, `range()`**: Find minimum, maximum, and range of values.
- **`sort()`, `order()`, `rank()`**: Sorting and ranking functions.
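
The difference between `sort()`, `order()`, and `rank()` is easiest to see on a small vector. The vector `v` below is made up for illustration.

```r
v <- c(30, 10, 20)

print(sort(v))      # 10 20 30  (the values in ascending order)
print(order(v))     # 2 3 1     (positions of the values in sorted order)
print(rank(v))      # 3 1 2     (the rank of each original value)
print(v[order(v)])  # 10 20 30  (same result as sort(v))
print(range(v))     # 10 30     (minimum and maximum)
```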

### **2. Type Conversion Functions**
- **`as.numeric()`, `as.character()`, `as.factor()`, `as.Date()`, `as.matrix()`, `as.data.frame()`**: Convert objects to different types.

### **3. Logical Functions**
- **`is.na()`, `anyNA()`**: Check for missing values.
- **`is.numeric()`, `is.character()`, `is.logical()`, `is.data.frame()`, etc.**: Test for specific data types.
- **`any()`, `all()`**: Check if any or all values meet a condition.
- **`which()`**: Identify indices that meet a condition.
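
A short sketch of these logical helpers on a vector containing a missing value. The vector `v` is hypothetical.

```r
v <- c(3, NA, 8, 1)

print(is.na(v))                 # FALSE TRUE FALSE FALSE
print(anyNA(v))                 # TRUE
print(which(v > 2))             # 1 3 (positions where the condition is TRUE; NA is skipped)
print(any(v > 5, na.rm = TRUE)) # TRUE
print(all(v > 0, na.rm = TRUE)) # TRUE
print(is.numeric(v))            # TRUE
```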

### **4. Mathematical Functions**
- **`abs()`, `sqrt()`, `log()`, `exp()`, `round()`, `ceiling()`, `floor()`, `sin()`, `cos()`, `tan()`**: Mathematical and trigonometric operations.

### **5. Statistical Functions**
- **`mean()`, `median()`, `sd()`, `var()`, `cov()`, `cor()`, `quantile()`**: Summary statistics.
- **`summary()`**: Provides a summary of an object, including min, median, mean, max, etc.
- **`table()`, `prop.table()`**: Create frequency and proportion tables.

### **6. Random Number Generation and Probability Functions**
- **`sample()`**: Generate random samples.
- **`rnorm()`, `runif()`, `rbinom()`, `rpois()`, etc.**: Generate random numbers from specific distributions.
- **`dnorm()`, `dunif()`, `dbinom()`, `dpois()`, etc.**: Density functions.
- **`pnorm()`, `punif()`, `pbinom()`, `ppois()`, etc.**: Cumulative distribution functions.

### **7. Data Manipulation Functions**
- **`subset()`**: Subset data frames or matrices based on conditions.
- **`merge()`**: Merge two data frames.
- **`rbind()`, `cbind()`**: Row-bind and column-bind data frames or matrices.
- **`apply()`, `lapply()`, `sapply()`, `tapply()`, `mapply()`**: Apply functions over elements or dimensions.
- **`aggregate()`**: Aggregate data based on a grouping factor.
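
As a brief illustration of the apply family, the sketch below sums the rows and columns of a small matrix with `apply()` and sums values by group with `tapply()`. The data (`m`, `sales`, `region`) are made up.

```r
m <- matrix(1:6, nrow = 2)  # filled by column: rows are (1, 3, 5) and (2, 4, 6)

row_totals <- apply(m, 1, sum)  # margin 1 = rows
col_totals <- apply(m, 2, sum)  # margin 2 = columns
print(row_totals)  # 9 12
print(col_totals)  # 3 7 11

# Group-wise sum
sales  <- c(10, 20, 30, 40)
region <- c("north", "south", "north", "south")
by_region <- tapply(sales, region, sum)
print(by_region)   # north 40, south 60
```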

### **8. String Handling Functions**
- **`nchar()`**: Count the number of characters.
- **`toupper()`, `tolower()`**: Convert text to upper or lower case.
- **`paste()`, `paste0()`**: Concatenate strings.
- **`grep()`, `grepl()`, `gsub()`**: Search and replace patterns in strings.
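
A quick sketch of the string functions listed above, using a made-up character vector `fruit`:

```r
fruit <- c("banana", "cherry")

print(nchar(fruit))            # 6 6
print(toupper(fruit))          # "BANANA" "CHERRY"
print(paste("ripe", fruit))    # "ripe banana" "ripe cherry" (separated by a space)
print(paste0("id_", 1:2))      # "id_1" "id_2" (no separator)
print(grepl("an", fruit))      # TRUE FALSE (does each element match?)
print(grep("an", fruit))       # 1 (positions of matching elements)
print(gsub("a", "o", fruit))   # "bonono" "cherry" (replace every match)
```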

### **9. Date and Time Functions**
- **`Sys.Date()`, `Sys.time()`**: Get the current date or time.
- **`as.Date()`, `as.POSIXct()`, `as.POSIXlt()`**: Convert to date or time formats.
- **`format()`**: Format dates and times.
- **`difftime()`**: Calculate time difference.
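
A minimal sketch of converting, formatting, and subtracting dates. The two dates are arbitrary examples.

```r
start <- as.Date("2024-03-01")
end   <- as.Date("2024-03-15")

# Format a date as day/month/year
print(format(end, "%d/%m/%Y"))  # "15/03/2024"

# Difference between the two dates, in days
gap <- difftime(end, start, units = "days")
print(as.numeric(gap))          # 14
```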

### **10. Plotting and Graphics Functions**
- **`plot()`, `hist()`, `boxplot()`, `barplot()`, `pie()`, `pairs()`**: Basic plotting functions.
- **`lines()`, `points()`, `text()`, `legend()`**: Add elements to existing plots.
- **`abline()`, `curve()`**: Add lines and curves to plots.
- **`par()`**: Set or query graphical parameters.

### **11. Control Flow Functions**
- **`if()`, `else`, `ifelse()`**: Conditional statements.
- **`for()`, `while()`, `repeat`**: Looping structures (`repeat` takes no condition, so it is written without parentheses).
- **`break`, `next`**: Control statements to exit or skip iterations in loops.

### **12. Environment and Workspace Functions**
- **`ls()`, `rm()`, `exists()`**: List, remove, or check objects in the environment.
- **`getwd()`, `setwd()`**: Get or set the working directory.
- **`save()`, `load()`**: Save or load R objects.
- **`library()`, `require()`**: Load packages.

### **13. File I/O Functions**
- **`read.csv()`, `read.table()`, `readLines()`, `scan()`**: Read data from files.
- **`write.csv()`, `write.table()`, `writeLines()`**: Write data to files.

### **14. Function Definition and Debugging Functions**
- **`function()`**: Define a function.
- **`formals()`, `body()`, `environment()`**: Get or set parts of a function.
- **`traceback()`, `debug()`, `browser()`, `tryCatch()`**: Debugging tools.
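
Of the tools above, `tryCatch()` is the one used inside code to handle problems as they occur. The sketch below wraps `log()` in a hypothetical helper `safe_log` that returns `NA` when a warning or error is raised.

```r
# Hypothetical wrapper: returns NA instead of stopping or warning
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) NA_real_,  # e.g. log(-1) raises a warning
    error = function(e) {            # e.g. log("a") raises an error
      message("Could not compute log: ", conditionMessage(e))
      NA_real_
    }
  )
}

print(safe_log(exp(1)))  # 1
print(safe_log(-1))      # NA
print(safe_log("a"))     # NA, after a diagnostic message
```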

### **15. Utility Functions**
- **`print()`, `cat()`**: Print objects to the console.
- **`str()`**: Display the internal structure of an R object.
- **`typeof()`, `class()`**: Get the type or class of an object.
- **`rep()`, `seq()`**: Repeat or generate sequences.



### **16. Statistical Modeling and Regression Functions**
- **`lm()`**: Linear modeling for regression analysis.
- **`glm()`**: Generalized linear models, including logistic regression.
- **`nls()`**: Nonlinear least squares fitting.
- **`predict()`**: Predict values based on a model.
- **`aov()`, `anova()`**: Analysis of variance for comparing models.
- **`step()`**: Stepwise model selection.
- **`summary()`**: Summarize model results.

### **17. Hypothesis Testing Functions**
- **`t.test()`**: Perform a t-test.
- **`chisq.test()`**: Chi-squared test for independence.
- **`cor.test()`**: Test for correlation between variables.
- **`wilcox.test()`**: Wilcoxon rank-sum test.
- **`kruskal.test()`**: Kruskal-Wallis rank-sum test.
- **`fisher.test()`**: Fisher’s exact test for count data.

### **18. Advanced Matrix and Array Manipulation**
- **`matrix()`**: Create a matrix.
- **`array()`**: Create an array with more than two dimensions.
- **`apply()`, `sweep()`, `outer()`**: Apply functions over margins of arrays.
- **`diag()`, `t()`, `rowSums()`, `colSums()`, `rowMeans()`, `colMeans()`**: Matrix-specific functions.

### **19. Reshaping and Data Transformation**
- **`reshape()`**: Reshape data frames between wide and long format. The contributed `reshape2` package (not part of base R) offers `reshape2::melt()` and `reshape2::dcast()` for the same purpose.
- **`cut()`**: Convert numeric data into categorical (factor) data.
- **`stack()`, `unstack()`**: Stack or unstack data frames for reshaping.
- **`split()`**: Split a data frame or vector by a factor.

### **20. Time Series and Date Analysis**
- **`ts()`**: Create a time series object.
- **`acf()`, `pacf()`**: Autocorrelation and partial autocorrelation functions.
- **`decompose()`**: Decompose a time series into seasonal, trend, and remainder components.
- **`stl()`**: Seasonal decomposition of time series by Loess smoothing.
- **`diff()`**: Calculate lagged differences for time series.

### **21. Optimization and Numerical Methods**
- **`optim()`**: General-purpose optimization function.
- **`nlm()`**: Nonlinear minimization.
- **`constrOptim()`**: Optimization with constraints.
- **`uniroot()`**: Find roots of continuous functions.
- **`integrate()`**: Numerical integration of functions.

### **22. Special Mathematical Functions**
- **`choose()`, `factorial()`, `gamma()`, `lgamma()`**: Combinatorial and gamma functions.
- **`round()`, `signif()`, `floor()`, `ceiling()`**: Rounding and approximation.
- **`Re()`, `Im()`**: Real and imaginary parts of complex numbers.
- **`Mod()`, `Arg()`**: Modulus and argument of complex numbers.

### **23. Set Operations**
- **`union()`, `intersect()`, `setdiff()`**: Set operations on vectors.
- **`unique()`, `duplicated()`**: Find unique or duplicated values.
- **`match()`, `%in%`**: Matching and logical operations on vectors.
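
A short sketch of the set and matching functions on two made-up vectors `a` and `b`:

```r
a <- c(1, 2, 3)
b <- c(3, 4)

print(union(a, b))             # 1 2 3 4
print(intersect(a, b))         # 3
print(setdiff(a, b))           # 1 2 (in a but not in b)
print(2 %in% a)                # TRUE
print(match(3, a))             # 3 (position of the first match)
print(unique(c(1, 1, 2)))      # 1 2
print(duplicated(c(1, 1, 2)))  # FALSE TRUE FALSE
```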

### **24. Package Management and Utilities**
- **`install.packages()`**: Install a new package.
- **`library()`**: Load a package.
- **`require()`**: Conditionally load a package.
- **`update.packages()`**: Update installed packages.
- **`find.package()`**: Find paths of installed packages.

### **25. Memory and System Functions**
- **`gc()`**: Trigger garbage collection to free up memory.
- **`memory.size()`, `memory.limit()`**: Memory management on Windows only (reduced to non-functional stubs as of R 4.2.0).
- **`system()`**, **`system2()`**: Call external OS commands.
- **`Sys.info()`, `Sys.time()`, `Sys.sleep()`**: Get system info, time, or sleep function.

### **26. Utility Functions for Programming**
- **`identical()`, `all.equal()`**: Compare objects for equality.
- **`eval()`**: Evaluate an expression.
- **`substitute()`, `quote()`, `expression()`**: Expression handling functions.
- **`with()`, `within()`**: Evaluate code within a data context.
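
The difference between `identical()` and `all.equal()` matters most for floating-point numbers. A minimal sketch:

```r
a <- 0.1 + 0.2

print(a == 0.3)                  # FALSE: the two values differ by a tiny rounding error
print(identical(a, 0.3))         # FALSE: identical() requires exact equality
print(isTRUE(all.equal(a, 0.3))) # TRUE: all.equal() allows a small tolerance
```

Wrapping `all.equal()` in `isTRUE()` is the usual pattern, because `all.equal()` returns a description of the difference rather than `FALSE` when the objects differ.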

---

### **Specialized Packages with Additional Functions**

Apart from these base functions, R has specialized packages that provide even more functions tailored to specific needs. Some popular packages include:

- **`dplyr`**: Provides functions for data manipulation (e.g., `filter()`, `select()`, `mutate()`, `summarize()`).
- **`tidyr`**: Data tidying tools (e.g., `gather()`, `spread()`, `pivot_longer()`, `pivot_wider()`).
- **`ggplot2`**: Advanced data visualization functions (e.g., `ggplot()`, `geom_point()`, `geom_bar()`).
- **`stringr`**: String manipulation functions (e.g., `str_detect()`, `str_replace()`, `str_split()`).
- **`lubridate`**: Date and time manipulation (e.g., `ymd()`, `hms()`, `interval()`).
- **`caret`**: Functions for machine learning (e.g., `train()`, `predict()`, `confusionMatrix()`).

---



This list covers some of the core built-in functions in R, though there are many more specific functions in R’s standard and contributed packages. You can access the full list by typing `help.start()` in the R console, and for a quick reference, type `help()` followed by the function name (e.g., `help(mean)`) to get detailed documentation on any function.

Each of these categories or packages enables further possibilities, and for a full listing, it can be helpful to explore R's built-in help functions such as:

```r
help.start()  # Opens the full R documentation in a web browser.
help(package = "base")  # Lists all functions in the base package.
``` 

In combination, these functions cover almost all data manipulation, analysis, and visualization needs in R.

## Data Structures

Vectors, lists, matrices, data frames, and factors.

### Data Structures in R

R provides various data structures that allow you to organize and manipulate data efficiently. Each data structure is suited for specific tasks, making it crucial to choose the right one based on your needs. The most common data structures in R are:
- **Vectors**
- **Lists**
- **Matrices**
- **Data Frames**
- **Factors**
  
These structures vary in terms of dimension, type of data they can store, and how they are used.

### **1. Vectors**

A **vector** is the simplest and most basic data structure in R. It is a one-dimensional array that holds elements of the same type, such as numeric, character, logical, or integer.

#### **Types of Vectors:**
- **Numeric Vector**: Stores numbers (e.g., 1.5, 2.3, -5).
- **Integer Vector**: Stores integer values, written with an `L` suffix (e.g., `1L`, `2L`, `-10L`); a plain `1` is stored as numeric (double).
- **Character Vector**: Stores text (e.g., "apple", "banana").
- **Logical Vector**: Stores boolean values (`TRUE`, `FALSE`).


- **Vector Arithmetic**: Vectors support element-wise operations.

In [5]:
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
result <- vec1 + vec2  # Element-wise addition: c(5, 7, 9)

**1. What is a Vector?**
A vector in R is a one-dimensional data structure that contains elements of the same type (numeric, character, logical, etc.). Vectors can have any length, including zero (an empty vector).

- **Homogeneous Structure**: All elements within a vector must be of the same type. If you try to mix types, R will coerce them to the most flexible type.
- **Types of Vectors**: Numeric, character, logical, integer, and complex.

---

**2. Creating Vectors**
You can create vectors in R using a variety of methods.

- **Using `c()` (Combine)**:

In [19]:

numeric_vector <- c(1, 2, 3, 4, 5)      # Numeric vector
character_vector <- c("a", "b", "c")    # Character vector
logical_vector <- c(TRUE, FALSE, TRUE)  # Logical vector
  

  
- **Using `:` for Sequences**:

In [20]:

seq_vector <- 1:10  # Creates a sequence from 1 to 10
seq_vector

- **Using `seq()`**:

In [22]:

seq_vector <- seq(from = 1, to = 10, by = 2)  # Creates a sequence from 1 to 10 with step 2
seq_vector

- **Using `rep()` (Repeat)**:

In [23]:

rep_vector <- rep(5, times = 3)    # Repeats the number 5 three times: 5 5 5
rep_vector2 <- rep(1:3, each = 2)  # Repeats each element twice: 1 1 2 2 3 3
rep_vector
rep_vector2

---

**3. Vector Types and Coercion**
Vectors can only contain elements of the same type. If you attempt to combine different types, R will coerce elements to the most flexible type following this hierarchy:

**Logical < Integer < Numeric < Complex < Character**

- **Example of Coercion**:

In [25]:

mixed_vector <- c(1, "two", TRUE)  # R coerces all elements to character: "1" "two" "TRUE"
mixed_vector
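
The coercion hierarchy can be checked step by step with `typeof()`. The sketch below combines pairs of neighbouring types.

```r
print(typeof(c(TRUE, 2L)))   # "integer":   logical is coerced to integer
print(typeof(c(1L, 2.5)))    # "double":    integer is coerced to numeric
print(typeof(c(2.5, 1i)))    # "complex":   numeric is coerced to complex
print(typeof(c(1, "a")))     # "character": everything is coerced to character

# Coercion also explains why logical values can be summed
print(sum(c(TRUE, FALSE, TRUE)))  # 2 (TRUE counts as 1, FALSE as 0)
```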

**4. Vector Operations**

- **Arithmetic Operations**:
  Arithmetic operations on vectors are element-wise:

In [37]:

x <- c(1, 2, 3)
y <- c(4, 5, 6)
x + y  # Result: 5 7 9
x * y  # Result: 4 10 18


- **Logical Comparisons**:
  Logical operations are also element-wise:

In [38]:

x <- c(10, 20, 30)
x > 15  # Result: FALSE TRUE TRUE


- **Vectorized Functions**:
  Many functions are vectorized, meaning they can operate on each element of the vector simultaneously.

In [39]:

sqrt(c(4, 9, 16))  # Result: 2 3 4


- **Recycling Rule**:
  If vectors of different lengths are used in an operation, R will “recycle” the shorter vector to match the length of the longer one.

In [40]:

x <- c(1, 2, 3)
y <- c(4, 5)
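x + y  # Result: 5 7 7 (since y becomes 4 5 4)


Because 3 is not a multiple of 2, R also issues a warning: "longer object length is not a multiple of shorter object length". When the longer length is an exact multiple of the shorter one, recycling happens silently.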


---

**5. Accessing Vector Elements (Indexing)**
You can access and manipulate elements in a vector using indexing.

- **Single Element Access**:

In [58]:

x <- c(10, 20, 30, 40)
x[2]  # Accesses the second element: 20


- **Multiple Elements Access**:

In [59]:
x[c(1, 3)]  # Accesses the first and third elements: 10 30

- **Negative Indexing**:

In [60]:

x[-2]  # Excludes the second element: 10 30 40


- **Logical Indexing**:

In [61]:

x[x > 20]  # Selects elements greater than 20: 30 40


- **Named Indexing**:

In [62]:

named_vector <- c(a = 1, b = 2, c = 3)
named_vector["b"]  # Accesses element with name "b": 2


---

**6. Common Vector Functions**

- **Length of a Vector**:

In [63]:

length(x)  # Returns the number of elements in x


- **Summing and Statistical Functions**:

In [64]:
x
sum(x) 
mean(x)
min(x)
max(x)

- **Sorting and Ordering**:

In [65]:

sort(x)      # Sorts the vector in ascending order
order(x)     # Returns indices that sort the vector


- **Set Operations**:

In [66]:

union(x, y)       # Union of x and y
intersect(x, y)   # Intersection of x and y
setdiff(x, y)     # Elements in x but not in y


---

**7. Modifying Vectors**

- **Replacing Elements**:

In [67]:
x
x[2] <- 100  # Replaces the second element with 100
x

- **Appending Elements**:

In [68]:
x
x <- c(x, 200)  # Adds 200 at the end of the vector
x

- **Removing Elements**:

In [69]:
x
x <- x[-2]  # Removes the second element from the vector
x

---

**8. Handling Missing Values**

- **Introducing Missing Values**:

In [71]:

x <- c(1, 2, NA, 4)
x

- **Checking for Missing Values**:

In [72]:

is.na(x)  # Returns TRUE for elements that are NA


- **Removing Missing Values**:

In [73]:
x
x <- x[!is.na(x)]  # Removes all NA values from the vector
x

- **Replacing Missing Values**:

In [75]:
x <- c(1, 2, NA, 4)
x[is.na(x)] <- 0  # Replaces NA values with 0
x
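
Many summary functions return `NA` when the input contains missing values unless told otherwise. A short sketch using the `na.rm` argument:

```r
x <- c(1, 2, NA, 4)

print(mean(x))                # NA: the missing value propagates
print(mean(x, na.rm = TRUE))  # 2.333333: NA is dropped before averaging
print(sum(is.na(x)))          # 1: number of missing values
```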

---

**9. Special Types of Vectors**

- **Factors**: Categorical data stored as integers with associated levels.

In [76]:

factor_vector <- factor(c("low", "medium", "high"))
factor_vector

- **Lists (Special Case)**: While lists can contain different types, they can be thought of as vectors where each element is itself an R object.

In [77]:

list_vector <- list(name = "Alice", age = 25, scores = c(90, 85, 88))
list_vector
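
To make the "integers with associated levels" description of factors concrete, the sketch below inspects a factor's levels and integer codes, then builds an ordered factor with an explicit level order. The variable names are illustrative.

```r
sizes <- factor(c("low", "medium", "high", "low"))

print(levels(sizes))      # "high" "low" "medium" (sorted alphabetically by default)
print(as.integer(sizes))  # 2 3 1 2 (the position of each value in levels)

# Supplying the levels explicitly gives a meaningful order
ordered_sizes <- factor(c("low", "medium", "high", "low"),
                        levels = c("low", "medium", "high"),
                        ordered = TRUE)
print(ordered_sizes[1] < ordered_sizes[3])  # TRUE: "low" comes before "high"
```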

---

**10. Examples of Vector Use in Data Analysis**

- **Data Filtering**:

In [78]:

data <- c(100, 200, 300, 400)
data[data > 150]  # Select elements greater than 150
  

- **Statistical Summaries**:

In [79]:

  values <- c(5, 10, 15)
  mean(values)      # Calculate mean
  median(values)    # Calculate median


---

**11. Vectorized Operations and Efficiency**

R is optimized for vectorized operations, allowing it to perform calculations on entire vectors without explicit loops, which makes code concise and efficient. For example:

In [80]:

# Using vectorized operations instead of loops
x <- 1:1000
y <- x * 2  # Efficiently doubles each element in x
# y

Vectorized code in R is often faster and more readable than using loops for element-wise calculations.

---

**Summary**

- **Vectors** are foundational in R and are used in almost every analysis.
- They can hold data of one type only, with coercion occurring when mixed types are combined.
- Indexing, subsetting, and vectorized operations are key to efficiently working with vectors.
- **Vectorized operations** make R efficient, allowing calculations across an entire vector without needing explicit loops.

**Example**

In [81]:
# Initialize empty vectors to store even and odd numbers
even_numbers <- c()
odd_numbers <- c()

# Loop from 1 to 100
for (i in 1:100) {
  # Check if the number is even
  if (i %% 2 == 0) {
    even_numbers <- c(even_numbers, i)  # Append to even_numbers vector
  } else {
    odd_numbers <- c(odd_numbers, i)    # Append to odd_numbers vector
  }
}

# Print the results
cat("Even numbers:\n", even_numbers, "\n")
cat("Odd numbers:\n", odd_numbers, "\n")


Even numbers:
 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 
Odd numbers:
 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 
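
For comparison, the same split can be sketched without a loop by using logical indexing. The variable `numbers` is introduced here only for illustration.

```r
numbers <- 1:100

# Logical indexing selects all matching elements in one step
even_numbers <- numbers[numbers %% 2 == 0]
odd_numbers  <- numbers[numbers %% 2 != 0]

cat("Even numbers:\n", even_numbers, "\n")
cat("Odd numbers:\n", odd_numbers, "\n")
```

This avoids growing a vector one element at a time with `c()`, which copies the vector on each iteration.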


### **2. Lists**

A **list** is a more flexible data structure that can contain elements of different types (e.g., numeric, character, logical, vectors, even other lists). It is a collection of objects, allowing you to mix and match data types.

**1. What is a List?**

- A **list** in R is a collection of elements where each element can be of a different data type (e.g., numeric, character, vector, data frame, etc.).
- Unlike vectors, lists do not require elements to be of the same type.
- Lists are particularly useful when handling complex datasets or objects that have multiple attributes, such as the results of statistical tests or machine learning models.


**2. Creating a List**

You can create a list using the `list()` function and include elements of any data type.

**Basic List Creation**

In [95]:

my_list <- list(name = "Alice", age = 25, scores = c(85, 90, 88))
print(my_list)


$name
[1] "Alice"

$age
[1] 25

$scores
[1] 85 90 88



In this example:
- `name` is a character string.
- `age` is a numeric value.
- `scores` is a numeric vector.

#### **Unnamed Lists**

Elements of a list can also be created without names:

In [96]:

unnamed_list <- list("Alice", 25, c(85, 90, 88))
unnamed_list


**3. Accessing List Elements**

Elements in a list can be accessed in different ways:

**Using Dollar Sign (`$`) Notation**

- When list elements are named, use `$` followed by the element name:

In [97]:
 
my_list$name    # Accesses "Alice"
my_list$scores  # Accesses the vector c(85, 90, 88)


**Using Double Square Brackets (`[[ ]]`)**

- `[[ ]]` is used for accessing individual elements in the list, particularly when elements are unnamed or to return the element itself, not as a sub-list.

In [98]:

my_list[[1]]     # Accesses "Alice"
my_list[[3]]     # Accesses c(85, 90, 88)


#### **Using Single Square Brackets (`[ ]`)**

- Single square brackets `[]` return a sub-list rather than the element itself.

In [99]:

my_list[1]       # Returns a list with one element: "Alice"


#### **Accessing List Elements by Name in Brackets**

- Named elements can also be accessed by names within square brackets:

In [100]:

my_list[["age"]]  # Accesses 25
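
The practical difference between `[ ]` and `[[ ]]` shows up when the result is used in a calculation. A minimal sketch, using a hypothetical list `person`:

```r
person <- list(name = "Alice", age = 25)

print(class(person["age"]))    # "list": single brackets keep the list wrapper
print(class(person[["age"]]))  # "numeric": double brackets extract the element itself

print(person[["age"]] + 1)     # 26
# person["age"] + 1 would fail, because arithmetic is not defined for a list
```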


---

### **4. Modifying List Elements**

List elements can be modified by directly assigning new values to specific elements.

In [101]:
my_list
# Modify existing element
my_list$age <- 26
my_list[["name"]] <- "Bob"
my_list
# Add a new element
my_list$gender <- "Female"
my_list
# Remove an element
my_list$age <- NULL  # Removes "age" element from the list
my_list

---

### **5. List Operations**

#### **Combining Lists**

Use `c()` to combine multiple lists into a single list:

In [102]:
list1 <- list(a = 1, b = 2)
list2 <- list(c = 3, d = 4)
list1
list2
combined_list <- c(list1, list2)

combined_list

#### **Applying Functions to Lists**

`lapply()` and `sapply()` are used to apply functions to each element of a list.

- **`lapply()`**: Returns a list of the same length as the input.

In [103]:

num_list <- list(a = 1:5, b = 6:10)
lapply(num_list, mean)  # Calculates the mean of each element (here, each integer vector)


- **`sapply()`**: Returns a simplified vector or matrix if possible.

In [104]:

sapply(num_list, mean)
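
Because `sapply()` chooses the shape of its result from the data, its return type can vary. `vapply()` is a stricter base R alternative in which the expected type and length of each result are declared up front. A brief sketch, reusing the same values as `num_list`:

```r
num_list <- list(a = 1:5, b = 6:10)

# FUN.VALUE = numeric(1) declares that each call must return exactly one number
means <- vapply(num_list, mean, FUN.VALUE = numeric(1))
means
```

If a call returned something other than a single number, `vapply()` would stop with an error instead of silently changing the output shape.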


#### **Converting Lists to Other Data Structures**

- **To Vector**: Use `unlist()` to flatten a list into a vector. This works cleanly when all elements share a type; mixed types are coerced to a common type:

In [106]:

simple_list <- list(1, 2, 3, 4)
simple_list
vector <- unlist(simple_list)  # Converts to vector: 1 2 3 4
vector
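
When the elements of a list have different types, `unlist()` still returns a vector, but it coerces every value to a common type. A short illustration with a hypothetical `mixed_list`:

```r
mixed_list <- list(1, "a", TRUE)
flat <- unlist(mixed_list)

flat         # every value has become a character string
class(flat)  # "character"
```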

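- **To Data Frame**: If the list elements have the same length (length-one elements are recycled), use `as.data.frame()`. At this point `my_list` holds `name`, `scores`, and `gender`, so the single values repeat for each score: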

In [107]:

df <- as.data.frame(my_list)
df

name,scores,gender
<chr>,<dbl>,<chr>
Bob,85,Female
Bob,90,Female
Bob,88,Female


---

### **6. Nested Lists**

A list can contain other lists as elements, creating nested lists.

In [109]:

nested_list <- list(
  name = "Alice",
  age = 25,
  scores = list(math = 90, science = 95)
)
nested_list

- Accessing elements in nested lists requires multiple indexing levels:

In [110]:

  nested_list$scores$math     # Accesses 90
  nested_list[["scores"]][["science"]]  # Accesses 95


### **7. Common List Functions**

| Function       | Description                                               | Example                                                   |
|----------------|-----------------------------------------------------------|-----------------------------------------------------------|
| `length()`     | Returns the number of elements in a list                  | `length(my_list)`                                         |
| `names()`      | Returns or sets the names of list elements                | `names(my_list)`                                          |
| `unlist()`     | Flattens a list to a vector                               | `unlist(my_list)`                                         |
| `lapply()`     | Applies a function to each element of a list, returns list| `lapply(num_list, mean)`                                  |
| `sapply()`     | Applies a function to each element, returns simplified output| `sapply(num_list, mean)`                             |
| `is.list()`    | Checks if an object is a list                             | `is.list(my_list)`                                        |
| `as.list()`    | Converts another object (e.g., vector) to a list          | `as.list(c(1, 2, 3))`                                     |
| `str()`        | Displays the structure of a list                          | `str(nested_list)`                                        |

### **8. Examples of List Usage**

#### **Storing and Accessing Statistical Results**

Lists are often used to store results from statistical functions that return multiple outputs.

In [112]:

model <- lm(mpg ~ hp, data = mtcars)

summary_list <- summary(model)

print(summary_list)

summary_list$coefficients  # Access coefficients



Call:
lm(formula = mpg ~ hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7121 -2.1122 -0.8854  1.5819  8.2360 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,	Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07



Unnamed: 0,Estimate,Std. Error,t value,Pr(>|t|)
(Intercept),30.09886054,1.633921,18.421246,6.642736e-18
hp,-0.06822828,0.0101193,-6.742389,1.787835e-07


#### **Using Lists in Data Analysis Pipelines**

Lists can hold multiple datasets or intermediate results in a data analysis workflow.

In [114]:

analysis_results <- list(
  summary = summary(mtcars),
  structure = str(mtcars),  # Note: str() prints its output and returns NULL, so this element is NULL
  correlation = cor(mtcars$mpg, mtcars$hp)
)

analysis_results


'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...


$summary
      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.  
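
Note that `str()` prints its description and returns `NULL`, so its output cannot be stored in a list directly. One way to keep the text, sketched here on the assumption that a character vector of lines is acceptable, is to wrap the call in `capture.output()`:

```r
results <- list(
  structure   = capture.output(str(mtcars)),  # the printed lines, as a character vector
  correlation = cor(mtcars$mpg, mtcars$hp)
)

head(results$structure, 3)  # first three captured lines
results$correlation         # negative: mpg tends to fall as hp rises
```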

### **9. Best Practices with Lists**

- **Naming Elements**: When creating lists, it’s helpful to name elements, making it easier to understand and access list contents.
- **Consistency**: If possible, keep similar types or structures in a list to simplify processing.
- **Nested Lists**: Nested lists are powerful but can become complex. Try to use clear naming and consistent structure when nesting lists.

#### **Lists vs. Vectors:**
- **Homogeneity**: Atomic vectors hold elements of a single type, while lists can store objects of different types.
- **Element size**: Each element of an atomic vector is a single value, while each element of a list can be an object of any length (for example, a whole vector or another list).

### **10. Summary**

- Lists in R are flexible containers that can store heterogeneous data types, including other lists.
- Lists are commonly used for storing complex outputs, handling nested data, and managing multi-object data.
- Access elements using `$`, `[[ ]]`, and `[ ]` indexing based on your needs.
- R provides useful functions like `lapply()`, `sapply()`, and `unlist()` for manipulating lists efficiently.

Understanding lists is essential for effective R programming, particularly in scenarios involving complex data or objects, making lists one of the most versatile and essential data structures in R.

### **3. Matrices**

A **matrix** is a two-dimensional array that contains elements of the same type (numeric, logical, or character). Matrices are essentially vectors arranged in rows and columns.

#### **Creating a Matrix:**

Matrices in R are two-dimensional, homogeneous data structures used for storing data in rows and columns. Unlike lists or data frames, all elements in a matrix must be of the same data type, making them suitable for numerical computations, linear algebra operations, and organizing tabular data in a mathematical format.

**1. What is a Matrix?**

- A **matrix** in R is a collection of elements arranged in a rectangular layout with rows and columns.
- All elements in a matrix must be of the same data type, typically numeric or character.
- Matrices are commonly used for mathematical operations, statistics, and machine learning models due to their structure and efficient computation.

#### **2. Creating Matrices**

Matrices can be created using the `matrix()` function, combining vectors, or reshaping arrays.

**Using `matrix()` Function**

The most common way to create a matrix is with the `matrix()` function, where you specify the elements, number of rows, and number of columns.

In [139]:

# Create a 3x3 numeric matrix
m <- matrix(1:9, nrow = 3, ncol = 3)

m

0,1,2
1,4,7
2,5,8
3,6,9


**Setting Data by Rows or Columns**

By default, the `matrix()` function fills data column-wise. Use the `byrow = TRUE` argument to fill data row-wise.

In [126]:

# Fill matrix by rows
m_byrow <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
m_byrow


0,1,2
1,2,3
4,5,6
7,8,9


**Creating a Matrix from Vectors**

You can combine vectors to form a matrix using `cbind()` (column-bind) or `rbind()` (row-bind) functions.

In [127]:
# Create vectors
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)

In [128]:
# Combine vectors into a matrix
m_cbind <- cbind(v1, v2)  # Column bind
m_cbind

v1,v2
1,4
2,5
3,6


In [129]:
m_rbind <- rbind(v1, v2)  # Row bind
m_rbind

0,1,2,3
v1,1,2,3
v2,4,5,6


**3. Dimensions of a Matrix**

- The `dim()` function returns the dimensions of a matrix as a vector, showing the number of rows and columns.

In [130]:

dim(m)  # Returns c(3, 3)


- Other useful functions for examining dimensions:
  - `nrow(m)`: Returns the number of rows.
  - `ncol(m)`: Returns the number of columns.
  - `length(m)`: Returns the total number of elements in the matrix.

**4. Accessing Matrix Elements**

Matrix elements can be accessed by specifying the row and column indices in square brackets: `m[row, column]`.

In [131]:

# Access element at row 2, column 3
m[2, 3]

# Access entire row or column
m[2, ]   # Entire second row
m[, 3]   # Entire third column
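
Selecting a single row or column returns a plain vector by default. Adding `drop = FALSE` keeps the result as a matrix, which can matter when later code expects two dimensions. A minimal sketch on a fresh 3x3 matrix (named `grid` here only to avoid overwriting `m`):

```r
grid <- matrix(1:9, nrow = 3, ncol = 3)

row_vec <- grid[2, ]                # plain integer vector: 2 5 8
row_mat <- grid[2, , drop = FALSE]  # 1 x 3 matrix

dim(row_vec)  # NULL
dim(row_mat)  # 1 3
```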


**Using Logical and Named Indexing**

- **Logical Indexing**: You can access elements based on conditions.

In [132]:

m[m > 5]  # Access elements greater than 5
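
A logical condition can also appear on the left-hand side of an assignment to replace the matching elements. A small sketch (the name `vals` is only for illustration):

```r
vals <- matrix(1:9, nrow = 3, ncol = 3)

vals[vals > 5]                   # 6 7 8 9, returned as a vector
which(vals > 5, arr.ind = TRUE)  # row and column positions of the matches

vals[vals > 5] <- 0              # replace those elements in place
vals
```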


- **Named Rows and Columns**: You can set row and column names to make accessing elements easier.

In [140]:
m <- matrix(1:9, nrow = 3, ncol = 3)
m

0,1,2
1,4,7
2,5,8
3,6,9


In [141]:
rownames(m) <- c("A", "B", "C")
colnames(m) <- c("X", "Y", "Z")

m["A", "Y"]

**5. Modifying Matrix Elements**

You can modify elements in a matrix by assigning new values to specific indices.

In [142]:
m
# Modify element at row 1, column 1
m[1, 1] <- 10
m

Unnamed: 0,X,Y,Z
A,1,4,7
B,2,5,8
C,3,6,9


Unnamed: 0,X,Y,Z
A,10,4,7
B,2,5,8
C,3,6,9


In [143]:
# Modify an entire row or column
m
m[1, ] <- c(10, 20, 30)  # Set first row to new values
m

Unnamed: 0,X,Y,Z
A,10,4,7
B,2,5,8
C,3,6,9


Unnamed: 0,X,Y,Z
A,10,20,30
B,2,5,8
C,3,6,9


**6. Matrix Operations**

R provides a variety of operations specifically for matrices, including basic arithmetic and advanced mathematical functions.

**Arithmetic Operations**

Matrix arithmetic operations are performed element-wise:

In [145]:
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(5:8, nrow = 2)
m1
m2

0,1
1,3
2,4


0,1
5,7
6,8


In [146]:
# Element-wise addition, subtraction, multiplication, division
m_add <- m1 + m2
m_add

0,1
6,10
8,12


In [147]:
m_subtract <- m1 - m2
m_subtract

0,1
-4,-4
-4,-4


In [148]:
m_multiply <- m1 * m2
m_multiply

0,1
5,21
12,32


In [149]:
m_divide <- m1 / m2
m_divide

0,1
0.2,0.4285714
0.3333333,0.5


**Matrix Multiplication**

Use `%*%` for true matrix multiplication (rows times columns); the `*` operator multiplies element-wise.

In [150]:

# Matrix multiplication
m_mult <- m1 %*% m2  # Only works if matrices are conformable (e.g., 2x3 %*% 3x2)
m_mult

0,1
23,31
34,46
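
Conformable means that the number of columns of the left matrix equals the number of rows of the right one; the result has the rows of the left and the columns of the right. A short sketch with hypothetical matrices `a` (2x3) and `b` (3x2):

```r
a <- matrix(1:6, nrow = 2)  # 2 x 3
b <- matrix(1:6, nrow = 3)  # 3 x 2

product <- a %*% b
dim(product)  # 2 2
product

# a %*% a would stop with a "non-conformable arguments" error (3 columns vs. 2 rows)
```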


**Transpose of a Matrix**

Transpose flips the rows and columns. Use the `t()` function.

In [152]:

m_transpose <- t(m)
m_transpose

Unnamed: 0,A,B,C
X,10,2,3
Y,20,5,6
Z,30,8,9


**Matrix Inversion**

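The inverse of a square matrix is calculated using `solve()`. The call below fails because the current `m` is singular (its determinant is zero), so it has no inverse.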

In [153]:

m_inverse <- solve(m)  # Only square matrices with a non-zero determinant


ERROR: Error in solve.default(m): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
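
For comparison, here is a sketch with a small invertible matrix (the values are arbitrary). Checking `det()` first, and multiplying the result back, are simple ways to confirm the inverse:

```r
a <- matrix(c(2, 1, 1, 3), nrow = 2)

det(a)                 # 5, which is non-zero, so an inverse exists
a_inverse <- solve(a)

# Multiplying a matrix by its inverse gives the identity (up to rounding)
all.equal(a %*% a_inverse, diag(2))
```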


**7. Matrix Functions**

R provides many built-in functions to perform calculations on matrices.

| Function       | Description                                               | Example                        |
|----------------|-----------------------------------------------------------|--------------------------------|
| `rowSums()`    | Calculates the sum of each row                            | `rowSums(m)`                   |
| `colSums()`    | Calculates the sum of each column                         | `colSums(m)`                   |
| `rowMeans()`   | Calculates the mean of each row                           | `rowMeans(m)`                  |
| `colMeans()`   | Calculates the mean of each column                        | `colMeans(m)`                  |
| `diag()`       | Extracts or sets the diagonal elements                    | `diag(m)`                      |
| `det()`        | Calculates the determinant of a square matrix             | `det(m)`                       |
| `apply()`      | Applies a function to rows or columns of a matrix         | `apply(m, 1, sum)`             |
| `eigen()`      | Calculates eigenvalues and eigenvectors                   | `eigen(m)`                     |
| `svd()`        | Singular Value Decomposition                              | `svd(m)`                       |

**8. Using `apply()` with Matrices**

`apply()` is a powerful function for applying operations to rows or columns of a matrix.

- The syntax for `apply()` is: `apply(X, MARGIN, FUN)`, where `X` is the matrix. The argument names are upper-case.
  - `MARGIN = 1`: Apply the function to rows.
  - `MARGIN = 2`: Apply the function to columns.

In [154]:

# Sum of each row
apply(m, 1, sum)

# Mean of each column
apply(m, 2, mean)
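
For sums and means, `apply()` gives the same answers as the dedicated helpers in the table above; its main advantage is that it accepts any function. A brief sketch on a fresh matrix (called `scores` purely for illustration):

```r
scores <- matrix(1:9, nrow = 3, ncol = 3)

row_totals  <- apply(scores, 1, sum)   # 12 15 18
col_average <- apply(scores, 2, mean)  # 2 5 8

# The dedicated helpers give the same answers
all.equal(row_totals, rowSums(scores))
all.equal(col_average, colMeans(scores))

# A custom function: the spread (max - min) of each row
apply(scores, 1, function(r) max(r) - min(r))  # 6 6 6
```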


**9. Combining Matrices**

Use `rbind()` to add rows and `cbind()` to add columns to matrices.

In [156]:

# Adding a row
m <- rbind(m, c(10, 11, 12))
m
# Adding a column
m <- cbind(m, c(13, 14, 15, 16))
m

Unnamed: 0,X,Y,Z
A,10,20,30
B,2,5,8
C,3,6,9
,10,11,12


Unnamed: 0,X,Y,Z,Unnamed: 4
A,10,20,30,13
B,2,5,8,14
C,3,6,9,15
,10,11,12,16


**10. Converting Other Data Types to Matrices**

You can reshape a vector into a matrix with `matrix()` and convert data frames to matrices with `as.matrix()`.

In [158]:
# Convert a vector to a matrix
v <- 1:9
v

In [159]:
matrix_v <- matrix(v, nrow = 3, ncol = 3)
matrix_v

0,1,2
1,4,7
2,5,8
3,6,9


In [160]:
# Convert a data frame to a matrix
df <- data.frame(a = 1:3, b = 4:6)
df

a,b
<int>,<int>
1,4
2,5
3,6


In [161]:
matrix_df <- as.matrix(df)
matrix_df

a,b
1,4
2,5
3,6
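
A matrix can hold only one type, so converting a data frame with mixed column types coerces everything to character. A brief sketch with a hypothetical data frame `mixed_df`:

```r
mixed_df <- data.frame(id = 1:2, label = c("a", "b"))
mixed_matrix <- as.matrix(mixed_df)

typeof(mixed_matrix)  # "character": the numbers were converted to strings
dim(mixed_matrix)     # 2 2
```

Selecting only the numeric columns before calling `as.matrix()` avoids this coercion.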


**11. Special Matrices**

- **Identity Matrix**: A square matrix with `1`s on the diagonal and `0`s elsewhere. Use `diag()` to create an identity matrix.

In [162]:

identity_matrix <- diag(3)  # Creates a 3x3 identity matrix
identity_matrix

0,1,2
1,0,0
0,1,0
0,0,1


- **Diagonal Matrix**: Create a matrix with specified values on the diagonal using `diag()`.

In [163]:
diag_matrix <- diag(c(2, 4, 6))  # Diagonal elements are 2, 4, and 6
diag_matrix

0,1,2
2,0,0
0,4,0
0,0,6



**12. Summary of Matrices in R**

- **Creation**: Use `matrix()`, `cbind()`, and `rbind()` functions.
- **Access and Modify**: Use `[row, column]` notation, logical conditions, or names.
- **Operations**: Supports arithmetic, transpose (`t()`), inverse (`solve()`), and matrix multiplication (`%*%`).
- **Functions**: Use built-in functions like `rowSums()`, `colSums()`, and `apply()` to operate on matrices efficiently.
- **Conversions**: Use `as.matrix()` to convert data frames or other types to matrices.
- **Special Matrices**: Use `diag()` to create identity or diagonal matrices.

Matrices are integral for performing calculations across rows and columns, simplifying mathematical operations, and handling structured numeric data in R. They’re especially valuable in fields like statistics, data analysis, and machine learning, where matrix operations are frequently required.

In [108]:

# Create a 3x3 numeric matrix
matrix_1 <- matrix(1:9, nrow = 3, ncol = 3)

# Create a matrix by combining vectors by row
matrix_2 <- rbind(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

# Create a matrix by combining vectors by column
matrix_3 <- cbind(c(1, 4, 7), c(2, 5, 8), c(3, 6, 9))


In [109]:

mat1 <- matrix(1:4, nrow = 2)
mat2 <- matrix(5:8, nrow = 2)

result_add <- mat1 + mat2  # Element-wise addition
result_mult <- mat1 %*% mat2  # Matrix multiplication


### **4. Data Frames**

Data frames in R are versatile data structures that are essential for data analysis and manipulation. They are two-dimensional, tabular structures where each column represents a variable, and each row represents an observation. Data frames allow different data types for each column, making them suitable for representing real-world data.



#### **1. What is a Data Frame?**

- A **data frame** is a collection of variables (columns) that can hold different types of data, such as numeric, character, factor, or logical.
- Each column in a data frame represents a variable, while each row represents an observation or a record.
- Data frames are similar to tables in a database or Excel spreadsheets, which makes them especially useful for data analysis.



#### **2. Creating Data Frames**

Data frames can be created using the `data.frame()` function, by importing data from external files, or by converting other data structures like lists or matrices.

**Using `data.frame()` Function**

The `data.frame()` function is the simplest way to create a data frame in R.

In [168]:

# Create a data frame manually
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Gender = c("F", "M", "M"),
  Salary = c(50000, 55000, 60000)
)

df

Name,Age,Gender,Salary
<chr>,<dbl>,<chr>,<dbl>
Alice,25,F,50000
Bob,30,M,55000
Charlie,35,M,60000


**Creating Data Frames from Vectors**

You can combine individual vectors to create a data frame, ensuring they have the same length.

In [177]:

# Vectors
names <- c("Alice", "Bob", "Charlie")
ages <- c(25, 30, 35)
salaries <- c(50000, 55000, 60000)

# Create data frame from vectors
df <- data.frame(Name = names, Age = ages, Salary = salaries)
print(df)


     Name Age Salary
1   Alice  25  50000
2     Bob  30  55000
3 Charlie  35  60000


**3. Importing Data into Data Frames**

Data frames are often created by importing data from external sources like CSV files, Excel sheets, or databases.

In [170]:

# Importing a CSV file
df <- read.csv("data.csv")

# Importing from Excel (requires the readxl package)
library(readxl)
df <- read_excel("data.xlsx")


"cannot open file 'data.csv': No such file or directory"


ERROR: Error in file(file, "rt"): cannot open the connection
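
The error above simply means that `data.csv` does not exist in the working directory. To try the import step without an external file, one option is to write a small data frame to a temporary file first. A self-contained sketch (the path comes from `tempfile()`, and the data are made up):

```r
people <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

csv_path <- tempfile(fileext = ".csv")
write.csv(people, csv_path, row.names = FALSE)  # omit row names so columns round-trip

loaded <- read.csv(csv_path)
str(loaded)

# Guard real imports against missing files
if (!file.exists("data.csv")) message("data.csv not found in ", getwd())
```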


**4. Exploring Data Frames**

Once a data frame is created, it’s important to explore its structure, dimensions, and contents.

**Checking the Structure**

Use the `str()` function to get a quick overview of the data frame, including the types and number of elements in each column.

In [185]:
# Vectors
names <- c("Alice", "Bob", "Charlie")
ages <- c(25, 30, 35)
salaries <- c(50000, 55000, 60000)


# Create data frame from vectors
df <- data.frame(Name = names, Age = ages, Salary = salaries)
print(df)

     Name Age Salary
1   Alice  25  50000
2     Bob  30  55000
3 Charlie  35  60000


In [179]:

str(df)


'data.frame':	3 obs. of  3 variables:
 $ Name  : chr  "Alice" "Bob" "Charlie"
 $ Age   : num  25 30 35
 $ Salary: num  50000 55000 60000


**Viewing Data**

- `head(df)`: Shows the first 6 rows of the data frame by default (all rows if there are fewer).
- `tail(df)`: Shows the last 6 rows of the data frame by default.

In [180]:

head(df)  # View the top 6 rows
tail(df)  # View the bottom 6 rows


Unnamed: 0_level_0,Name,Age,Salary
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>
1,Alice,25,50000
2,Bob,30,55000
3,Charlie,35,60000


Unnamed: 0_level_0,Name,Age,Salary
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>
1,Alice,25,50000
2,Bob,30,55000
3,Charlie,35,60000


**Dimensions of the Data Frame**

- `dim(df)`: Returns a vector with the number of rows and columns.
- `nrow(df)`: Returns the number of rows.
- `ncol(df)`: Returns the number of columns.

In [181]:

dim(df)
nrow(df)
ncol(df)


**Column Names**

Retrieve or set the column names with `names()`.

In [182]:

names(df)            # Get column names
names(df) <- c("A", "B", "C")  # Rename columns
names(df)

**5. Accessing Data Frame Elements**

Data frames can be accessed in various ways, including indexing by row and column or using column names.

**Using `$` to Access Columns**

The `$` operator is commonly used to access columns by name.

In [191]:
# Vectors
names <- c("Alice", "Bob", "Charlie")
ages <- c(25, 30, 35)
salaries <- c(50000, 55000, 60000)


# Create data frame from vectors
df <- data.frame(Name = names, Age = ages, Salary = salaries)
print(df)

     Name Age Salary
1   Alice  25  50000
2     Bob  30  55000
3 Charlie  35  60000


In [192]:

df$Name  # Access the "Name" column
df$Age   # Access the "Age" column


**Using `[]` for Subsetting**

- `df[row, column]` where `row` is the row index and `column` is the column index or name.
- Leaving the row or column index empty selects all rows or columns.

In [193]:

df[1, 2]        # Access element in the 1st row, 2nd column
df[ , "Salary"] # Access the "Salary" column
df[1:2, ]       # Access the first two rows
df[ , c("Name", "Salary")] # Access "Name" and "Salary" columns


Unnamed: 0_level_0,Name,Age,Salary
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>
1,Alice,25,50000
2,Bob,30,55000


Name,Salary
<chr>,<dbl>
Alice,50000
Bob,55000
Charlie,60000
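
Selecting one column with `df[ , "Salary"]` returns a plain vector, whereas the two-column selection returns a data frame. To keep a data frame even for a single column, either use list-style indexing or add `drop = FALSE`. A minimal sketch with a hypothetical `staff` data frame:

```r
staff <- data.frame(Name = c("Alice", "Bob"), Salary = c(50000, 55000))

as_vector  <- staff[, "Salary"]                # numeric vector
as_frame   <- staff[, "Salary", drop = FALSE]  # one-column data frame
also_frame <- staff["Salary"]                  # list-style indexing, also a data frame

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
```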


**Using `subset()` Function**

The `subset()` function allows for subsetting based on conditions.

In [194]:

subset(df, Age > 25)             # Filter rows where Age is greater than 25
subset(df, select = c(Name, Age)) # Select specific columns


Unnamed: 0_level_0,Name,Age,Salary
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>
2,Bob,30,55000
3,Charlie,35,60000


Unnamed: 0_level_0,Name,Age
Unnamed: 0_level_1,<chr>,<dbl>
1,Alice,25
2,Bob,30
3,Charlie,35


**6. Modifying Data Frames**

**Adding Columns**

You can add a new column by assigning a vector to a new column name.

In [195]:

df$Department <- c("HR", "IT", "Finance")
df

Name,Age,Salary,Department
<chr>,<dbl>,<dbl>,<chr>
Alice,25,50000,HR
Bob,30,55000,IT
Charlie,35,60000,Finance


**Adding Rows**

Use `rbind()` to add rows to a data frame.

In [197]:
# Vectors
names <- c("Alice", "Bob", "Charlie")

ages <- c(25, 30, 35)
Gender<- c("M","M","F")
salaries <- c(50000, 55000, 60000)


# Create data frame from vectors
df <- data.frame(Name = names, Age = ages, Gender = Gender, Salary = salaries)
print(df)

     Name Age Gender Salary
1   Alice  25      M  50000
2     Bob  30      M  55000
3 Charlie  35      F  60000


In [198]:

new_row <- data.frame(Name = "David", Age = 28, Gender = "M", Salary = 62000)
df <- rbind(df, new_row)
df

Name,Age,Gender,Salary
<chr>,<dbl>,<chr>,<dbl>
Alice,25,M,50000
Bob,30,M,55000
Charlie,35,F,60000
David,28,M,62000


**Renaming Columns**

The `names()` function or `dplyr` package (if installed) can be used to rename columns.

In [199]:

names(df)[2] <- "Years"  # Rename the second column to "Years"


In [200]:
df

Name,Years,Gender,Salary
<chr>,<dbl>,<chr>,<dbl>
Alice,25,M,50000
Bob,30,M,55000
Charlie,35,F,60000
David,28,M,62000


**Removing Columns or Rows**

- Remove a column by setting it to `NULL`.
- Use `subset()` or `[-row, ]` notation to remove rows.

In [201]:
df
df$Gender <- NULL  # Remove "Gender" column
df <- df[-1, ]         # Remove the first row
df

Name,Years,Gender,Salary
<chr>,<dbl>,<chr>,<dbl>
Alice,25,M,50000
Bob,30,M,55000
Charlie,35,F,60000
David,28,M,62000


Unnamed: 0_level_0,Name,Years,Salary
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>
2,Bob,30,55000
3,Charlie,35,60000
4,David,28,62000


**7. Data Frame Operations**

Data frames support various operations such as sorting, filtering, and applying functions.

**Sorting**

Use `order()` to sort data frames by a column.

In [202]:
df

Unnamed: 0_level_0,Name,Years,Salary
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>
2,Bob,30,55000
3,Charlie,35,60000
4,David,28,62000


In [203]:
df_sorted <- df[order(df$Salary), ]  # Sort by Salary in ascending order
df_sorted

Unnamed: 0_level_0,Name,Years,Salary
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>
2,Bob,30,55000
3,Charlie,35,60000
4,David,28,62000


In [204]:
df_sorted <- df[order(-df$Salary), ] # Sort by Salary in descending order
df_sorted

Unnamed: 0_level_0,Name,Years,Salary
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>
4,David,28,62000
3,Charlie,35,60000
2,Bob,30,55000
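
`order()` accepts several keys, which allows sorting by one column and breaking ties with another. A sketch with made-up data (the `Dept` column is hypothetical and not part of the examples above):

```r
staff <- data.frame(
  Name   = c("Alice", "Bob", "Charlie", "David"),
  Dept   = c("IT", "HR", "IT", "HR"),
  Salary = c(50000, 55000, 60000, 62000)
)

# Sort by department (ascending), then by salary (descending) within each department
sorted <- staff[order(staff$Dept, -staff$Salary), ]
sorted$Name  # "David" "Bob" "Charlie" "Alice"
```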


**Filtering**

Use logical conditions within brackets to filter rows.

In [205]:

df_high_salary <- df[df$Salary > 55000, ]  # Filter rows where Salary > 55000


In [206]:
df_high_salary

Unnamed: 0_level_0,Name,Years,Salary
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>
3,Charlie,35,60000
4,David,28,62000


**Applying Functions with `apply()`**

Use `apply()` to apply functions to rows or columns of a data frame.

In [207]:

apply(df[, 2:3], 2, mean)  # Calculate mean of the Years and Salary columns


**Using `lapply()` and `sapply()`**

`lapply()` and `sapply()` apply functions to each column in a data frame.

In [208]:

lapply(df, class)     # Get class of each column
sapply(df, mean, na.rm = TRUE) # Mean of each column; the non-numeric Name column yields NA with a warning


"argument is not numeric or logical: returning NA"


**8. Handling Missing Data**

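Use `is.na()` to detect missing values, `na.omit()` to drop rows that contain them, and logical-index assignment to replace them. (`na.fill()` is not part of base R; it comes from the zoo package.)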

In [209]:

# Identify missing values
is.na(df)

# Remove rows with missing values
df_clean <- na.omit(df)

# Replace missing values with 0 (base R; no extra package required)
df[is.na(df)] <- 0


Unnamed: 0,Name,Years,Salary
2,False,False,False
3,False,False,False
4,False,False,False
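
Since the data frame above has no missing values, the calls have nothing to act on. The following sketch uses a small made-up data frame with one `NA` to show each step, and replaces the missing value with the column mean rather than 0:

```r
pay <- data.frame(Name = c("Bob", "Charlie", "David"),
                  Salary = c(55000, NA, 62000))

colSums(is.na(pay))       # number of missing values per column
complete <- na.omit(pay)  # drops the row containing NA
nrow(complete)            # 2

# Replace the missing salary with the mean of the observed salaries
pay$Salary[is.na(pay$Salary)] <- mean(pay$Salary, na.rm = TRUE)
pay
```

Whether to drop rows or fill in values depends on the analysis; filling with 0 or a mean changes summary statistics, so the choice is worth stating explicitly.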


**9. Data Frame Functions**

| Function       | Description                                               | Example                        |
|----------------|-----------------------------------------------------------|--------------------------------|
| `str()`        | Displays structure of the data frame                      | `str(df)`                      |
| `summary()`    | Provides summary statistics for each column               | `summary(df)`                  |
| `nrow()`       | Returns the number of rows                                | `nrow(df)`                     |
| `ncol()`       | Returns the number of columns                             | `ncol(df)`                     |
| `dim()`        | Returns the dimensions of the data frame                  | `dim(df)`                      |
| `names()`      | Gets or sets column names                                 | `names(df)`                    |
| `head()`       | Displays the first few rows                               | `head(df)`                     |
| `tail()`       | Displays the last few rows                                | `tail(df)`                     |
| `rbind()`      | Adds rows to the data frame                               | `rbind(df, new_row)`           |
| `cbind()`      | Adds columns to the data frame                            | `cbind(df, new_col)`           |

**10. Converting Other Data Structures to Data Frames**

- **From Matrix**: Convert a matrix to a data frame using `as.data.frame()`.

In [211]:

mat <- matrix(1:9, nrow = 3)
df_from_mat <- as.data.frame(mat)
df_from_mat

V1,V2,V3
<int>,<int>,<int>
1,4,7
2,5,8
3,6,9


- **From Lists**: Lists can also be converted to data frames if each element has the same length.

In [212]:

lst <- list(Name = c("Alice", "Bob"), Age = c(25, 30))
df_from_list <- as.data.frame(lst)
df_from_list

Name,Age
<chr>,<dbl>
Alice,25
Bob,30


 **11. Summary of Data Frames in R**

- **Data Structure**: Data frames are two-dimensional, allowing for different data types in each column.
- **Creation**: Use `data.frame()` function, `read.csv()`, or convert other structures.
- **Exploration**: Functions like `str()`, `head()`, `summary()`, `nrow()`, and `ncol()` help in understanding data frames.
- **Manipulation**: You can add, remove, rename, and sort columns or rows.
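- **Operations**: Useful functions like `order()`, `subset()`, `apply()`, `lapply()`, and `sapply()` support sorting, filtering, and column-wise computation.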

In [110]:

# Create a data frame with different types of columns
df <- data.frame(
  Name = c("Alice", "Bob", "Carol"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88)
)
print(df)


   Name Age Score
1 Alice  25    90
2   Bob  30    85
3 Carol  35    88


### **5. Factors**

A **factor** is used to represent categorical data. Factors are important in statistical modeling as they define the levels of categorical variables (e.g., "Male", "Female" for gender). Factors are stored as integers with labels.

#### **Creating Factors:**

In [115]:

# Create a factor with levels
gender <- factor(c("Male", "Female", "Female", "Male"))
print(gender)

# Check the levels
levels(gender)  # Output: "Female", "Male"


[1] Male   Female Female Male  
Levels: Female Male


#### **Ordered Factors:**
You can create ordered factors where levels have a specific order (e.g., "Low", "Medium", "High").

In [116]:

# Create an ordered factor
education <- factor(c("High School", "College", "Graduate", "College"),
                    levels = c("High School", "College", "Graduate"),
                    ordered = TRUE)
print(education)


[1] High School College     Graduate    College    
Levels: High School < College < Graduate
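
Because the levels are ordered, comparison operators and functions such as `max()` work on the factor. A short sketch reusing the same values:

```r
education <- factor(c("High School", "College", "Graduate", "College"),
                    levels = c("High School", "College", "Graduate"),
                    ordered = TRUE)

education < "Graduate"  # TRUE TRUE FALSE TRUE
max(education)          # Graduate
table(education)        # counts per level, listed in level order
```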


#### **Factors vs. Character Vectors:**
- Factors are stored as integers internally with corresponding levels, while character vectors store text directly.
- Factors are more efficient for storing categorical data and are used in statistical modeling.
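
The integer storage can be inspected directly. In this sketch, `as.integer()` reveals the codes and `levels()` gives the labels they point to:

```r
gender <- factor(c("Male", "Female", "Female", "Male"))

codes <- as.integer(gender)
codes                  # 2 1 1 2 (levels sort alphabetically: Female = 1, Male = 2)
levels(gender)[codes]  # rebuilds the original labels
```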

### **6. Arrays**

An **array** is similar to a matrix but can have more than two dimensions (i.e., higher-dimensional data). Each element in an array must be of the same type.

#### **Creating an Array:**

In [117]:

# Create a 3-dimensional array
arr <- array(1:24, dim = c(4, 3, 2))  # 4 rows, 3 columns, 2 matrices
print(arr)


, , 1

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

, , 2

     [,1] [,2] [,3]
[1,]   13   17   21
[2,]   14   18   22
[3,]   15   19   23
[4,]   16   20   24



#### **Accessing Elements in an Array:**

In [119]:

arr[2, 3, 1]  # Access element at [row=2, column=3, matrix=1]
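
Leaving an index empty selects everything along that dimension, and `apply()` can summarise over any dimension. A sketch on the same 4 x 3 x 2 layout:

```r
arr <- array(1:24, dim = c(4, 3, 2))

arr[2, 3, 1]                # 10
second_slice <- arr[, , 2]  # the whole second 4 x 3 matrix
dim(second_slice)           # 4 3

slice_totals <- apply(arr, 3, sum)  # total of each matrix: 78 222
slice_totals
```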


### **7. Important Operations on Data Structures**

- **Length**: `length()` returns the number of elements in a vector or list.

In [121]:
length(c(1, 2, 3, 4))  # Output: 4

- **Dimensions**: `dim()` returns the dimensions of matrices and data frames.

In [123]:
dim(matrix(1:9, nrow=3))  # Output: 3 3 (rows and columns)

- **Structure**: `str()` gives a summary of the structure of any R object.

In [124]:
str(df)  # Displays the structure of a data frame (the output below comes from a df with Name, Age, and City columns)

'data.frame':	3 obs. of  3 variables:
 $ Name: chr  "Alice" "Bob" "Carol"
 $ Age : num  25 30 35
 $ City: chr  "New York" "London" "Paris"


- **Summary**: `summary()` provides a statistical summary of a data structure.

In [125]:
summary(df)  # Displays summary statistics of each column in the data frame

     Name                Age           City          
 Length:3           Min.   :25.0   Length:3          
 Class :character   1st Qu.:27.5   Class :character  
 Mode  :character   Median :30.0   Mode  :character  
                    Mean   :30.0                     
                    3rd Qu.:32.5                     
                    Max.   :35.0                     

- **Combine Vectors**: `cbind()` and `rbind()` are used to combine vectors/matrices by columns and rows, respectively.

In [127]:
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
cbind(vec1, vec2)  # Combine vectors column-wise
rbind(vec1, vec2)  # Combine vectors row-wise

vec1,vec2
1,4
2,5
3,6


0,1,2,3
vec1,1,2,3
vec2,4,5,6


### **Conclusion:**

- **Vectors** are one-dimensional arrays holding elements of the same type.
- **Lists** can hold different types of objects, including vectors, matrices, and other lists.
- **Matrices** are two-dimensional arrays that contain elements of the same type.
- **Data Frames** are two-dimensional tables where each column can hold different data types (most common structure for datasets).
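- **Factors** represent categorical data and are stored as integers with labels.
- **Arrays** extend matrices to more than two dimensions, with all elements of the same type.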

## Appendix: Loops and Data Structures

#### **Looping Through a Vector**

In [226]:

# Numeric vector
numeric_vector <- c(5, 7, 9)

# Character vector
char_vector <- c("a", "b", "c")


In [227]:

for (element in numeric_vector) {
  print(element)
}


[1] 5
[1] 7
[1] 9


In [229]:
for (i in seq_along(numeric_vector)) {
  print(i)
  cat("Element at position", i, "is", numeric_vector[i], "\n")
}


[1] 1
Element at position 1 is 5 
[1] 2
Element at position 2 is 7 
[1] 3
Element at position 3 is 9 
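
One reason to prefer `seq_along()` over `1:length(x)` shows up with empty vectors. The sketch below uses a hypothetical `empty_vector` and two illustrative counters to compare how often each loop body runs.

```r
empty_vector <- numeric(0)

1:length(empty_vector)     # 1 0  (the sequence counts down from 1 to 0)
seq_along(empty_vector)    # integer(0)

# Count how many times each loop body runs
runs_colon <- 0
for (i in 1:length(empty_vector)) runs_colon <- runs_colon + 1

runs_seq <- 0
for (i in seq_along(empty_vector)) runs_seq <- runs_seq + 1

runs_colon  # 2
runs_seq    # 0
```

With `seq_along()`, a loop over an empty vector is skipped entirely instead of indexing positions that do not exist.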


#### **Looping Through a List**

In [216]:
my_list <- list(name = "Alice", age = 25, scores = c(88, 92, 95))

for (item in my_list) {
  print(item)
}


[1] "Alice"
[1] 25
[1] 88 92 95


In [230]:
# Loop through the list by index
for (i in seq_along(my_list)) {
  cat("Element at position", i, "is:\n")
  print(my_list[[i]])
}


Element at position 1 is:
[1] "Alice"
Element at position 2 is:
[1] 25
Element at position 3 is:
[1] 88 92 95
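
Since `my_list` has named elements, a loop can also run over the names. The following is a minimal sketch of that variant; the loop variable `nm` is an illustrative choice.

```r
my_list <- list(name = "Alice", age = 25, scores = c(88, 92, 95))

# Loop over the element names and look each element up by name
for (nm in names(my_list)) {
  cat(nm, ":", toString(my_list[[nm]]), "\n")
}

# Double brackets extract the element; single brackets keep the list wrapper
class(my_list[["age"]])  # "numeric"
class(my_list["age"])    # "list"
```

This is why the index-based loop above uses `my_list[[i]]` rather than `my_list[i]`.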


#### **Looping Through a Matrix**

In [217]:

# Creating a 2x3 matrix
matrix_1 <- matrix(1:6, nrow = 2, ncol = 3)


You can loop over rows, columns, or individual elements in a matrix.

**Example**:

In [218]:

# Loop through each element, row by row
# seq_len() is safe even when a dimension is 0, unlike 1:n
for (i in seq_len(nrow(matrix_1))) {
  for (j in seq_len(ncol(matrix_1))) {
    print(matrix_1[i, j])
  }
}


[1] 1
[1] 3
[1] 5
[1] 2
[1] 4
[1] 6
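
Looping over whole rows or whole columns follows the same pattern with a single loop. The sketch below reuses the same 2x3 matrix; `row_totals` is a hypothetical helper vector added for illustration.

```r
matrix_1 <- matrix(1:6, nrow = 2, ncol = 3)

# Loop over rows: each matrix_1[i, ] is a vector of length 3
row_totals <- numeric(nrow(matrix_1))
for (i in seq_len(nrow(matrix_1))) {
  cat("Row", i, ":", matrix_1[i, ], "\n")
  row_totals[i] <- sum(matrix_1[i, ])
}

# Loop over columns: each matrix_1[, j] is a vector of length 2
for (j in seq_len(ncol(matrix_1))) {
  cat("Column", j, ":", matrix_1[, j], "\n")
}

row_totals  # 9 12
```

Because `matrix()` fills column by column, row 1 holds 1, 3, 5 and row 2 holds 2, 4, 6, which matches the element order printed above.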


#### **Looping Through Data Frames**

In [219]:

# Create a data frame
df <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  Gender = c("F", "M")
)


In [221]:
# Loop through each row
# seq_len() avoids iterating over 1:0 when the data frame has no rows
for (i in seq_len(nrow(df))) {
  print(df[i, ])
}

   Name Age Gender
1 Alice  25      F
  Name Age Gender
2  Bob  30      M


In [222]:
# Loop through each column
for (col in names(df)) {
  print(df[[col]])
}

[1] "Alice" "Bob"  
[1] 25 30
[1] "F" "M"
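
A column loop is often combined with a type check, because the columns of a data frame can differ in type. The following is a minimal sketch on the same `df`; the choice to average only numeric columns is an illustrative assumption.

```r
df <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  Gender = c("F", "M")
)

# Visit every column, but only average the numeric ones
for (col in names(df)) {
  if (is.numeric(df[[col]])) {
    cat(col, "has mean", mean(df[[col]]), "\n")
  } else {
    cat(col, "is of class", class(df[[col]]), "\n")
  }
}
```

Here only `Age` is numeric, so it is the only column for which a mean is reported.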


#### **Summary of Looping and Data Structures in R**

| Data Structure | Description                                                                                      | Common Looping Technique           |
|----------------|--------------------------------------------------------------------------------------------------|------------------------------------|
| **Vector**     | One-dimensional array with elements of the same type.                                           | `for` loop                         |
| **List**       | Collection of elements, each possibly of different types.                                        | `for` loop, `lapply()`, `sapply()` |
| **Matrix**     | Two-dimensional array with elements of the same type.                                            | Nested `for` loops, `apply()`      |
| **Data Frame** | Table-like structure with columns of different types.                                            | `for` loop, `apply()`, `lapply()`  |
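
The apply-family functions named in the table can be sketched as follows, reusing the list and matrix from this appendix. `vapply()` is not listed in the table; it is included here as a stricter variant of `sapply()`.

```r
my_list <- list(name = "Alice", age = 25, scores = c(88, 92, 95))
matrix_1 <- matrix(1:6, nrow = 2, ncol = 3)

# lapply() always returns a list, one result per element
lengths_list <- lapply(my_list, length)

# sapply() simplifies the result to a vector when it can
lengths_vec <- sapply(my_list, length)   # name = 1, age = 1, scores = 3

# vapply() is like sapply() but checks the declared result type
lengths_checked <- vapply(my_list, length, integer(1))

# apply() works over matrix margins: 1 = rows, 2 = columns
row_max <- apply(matrix_1, 1, max)       # 5 6
col_sum <- apply(matrix_1, 2, sum)       # 3 7 11
```

Note that `apply()` converts a data frame to a matrix first, so it works best when all columns share one type.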