# R

# General terminology

### 1. Programming Language
A **programming language** is a formal language used to write instructions that a computer can execute. These instructions perform specific tasks or solve problems. Examples of programming languages include Python, R, MATLAB, Java, C++, etc.

---

### 2. Interpreter
An **interpreter** is a program that executes code line by line. It reads the source code, converts it into machine-readable instructions, and executes it on the spot. This allows you to run and test code quickly, but execution can be slower compared to compiled code.

- **Example:** Python, R, and MATLAB use interpreters.
- **Advantage:** Immediate execution and testing.
- **Disadvantage:** Slower execution for large programs compared to compiled languages.

---

### 3. Compiler
A **compiler** translates the entire source code of a program into machine code (binary code) in one go before execution. The machine code is saved as an executable file that can run directly on a computer.

- **Example:** C and C++ compile to machine code; Java compiles to bytecode that runs on the Java Virtual Machine.
- **Advantage:** Faster execution after compilation.
- **Disadvantage:** Code must be compiled before it can run, so every change adds a build step before you can test it.

---

### 4. High-Level Programming Language
A **high-level language** is a programming language that is easy for humans to read and write. It uses natural language elements, which makes it more abstract from machine-level details.

- **Examples:** Python, R, MATLAB, Java.
- **Features:** Easier to use, focuses on problem-solving rather than hardware specifics.
- **Advantage:** Higher productivity and easier to debug and maintain.
- **Disadvantage:** Slower than low-level languages since code needs to be translated into machine code.

---

### 5. Low-Level Programming Language
A **low-level language** is closer to machine code and deals with hardware-specific operations. It provides less abstraction and is more difficult to write and maintain but offers more control over the hardware.

- **Examples:** Assembly language, machine code.
- **Features:** More efficient and faster as it directly interacts with hardware.
- **Advantage:** Speed and control over the system.
- **Disadvantage:** Hard to write and debug.

---

### 6. Syntax
**Syntax** refers to the rules that define the correct structure of commands in a programming language. Each programming language has its own syntax, which you need to follow when writing code.

- **Example:** In Python, print statements are written as `print("Hello World")`, whereas in C, it would be `printf("Hello World");`.
- **Importance:** Correct syntax is crucial to ensure the code runs without errors.

---

### 7. Variable
A **variable** is a storage location in memory with a name and a value. It is used to store data that can be manipulated and retrieved during program execution.

- **Example:** In Python, `x = 10` creates a variable `x` that stores the value `10`.
- **Types:** Variables can hold different types of data such as numbers (integers, floats), strings, or lists.

---

### 8. Data Types
A **data type** defines the type of data that a variable can store. Common data types include:

- **Integer:** Whole numbers (e.g., 5, -10).
- **Float:** Numbers with decimals (e.g., 3.14, -2.7).
- **String:** Text data (e.g., "Hello World").
- **Boolean:** True or False values.
  
Knowing the data type helps in choosing the right operations to apply to a variable.

---

### 9. Function
A **function** is a block of code that performs a specific task. Functions take inputs (called arguments), process them, and return an output.

- **Example:** In R, `mean()` is a function that calculates the average of numbers.
- **Purpose:** It helps in organizing code, reusing it, and improving readability.

---

### 10. Control Structures
**Control structures** determine the flow of execution in a program based on certain conditions or loops. Common control structures are:

- **If-Else Statements:** Execute code based on conditions.
  - Example: `if (x > 0) { print("Positive") } else { print("Not positive") }`
  
- **Loops:** Execute a block of code repeatedly.
  - **For Loop:** Repeats code for a set number of times.
  - **While Loop:** Repeats code while a condition is true.

---

### 11. Algorithm
An **algorithm** is a step-by-step procedure for solving a problem or performing a task. It defines the logical sequence of actions that must be followed to achieve a specific goal.

- **Example:** A sorting algorithm arranges data in a particular order (like ascending or descending).

---

### 12. Debugging
**Debugging** is the process of finding and fixing errors (bugs) in the program. It involves running the program, identifying where it behaves unexpectedly, and correcting the underlying issues.

- **Tools:** Many development environments provide debugging tools to help you step through code, set breakpoints, and inspect variable values.

---

### 13. IDE (Integrated Development Environment)
An **IDE** is software that provides tools to write, test, and debug code efficiently. It often includes a code editor, debugger, and other features that make coding easier.

- **Examples:** RStudio (for R), MATLAB IDE, PyCharm (for Python).

---

### 14. Library/Package
A **library** or **package** is a collection of pre-written code that provides specific functionality, such as mathematical operations, data visualization, or data manipulation. Using libraries helps save time and effort by reusing existing solutions.

- **Example:** In R, `ggplot2` is a package for data visualization, and in Python, `numpy` is used for numerical computations.

---

### 15. Object-Oriented Programming (OOP)
**Object-Oriented Programming** is a programming paradigm where code is organized into objects that represent real-world entities. These objects can contain both data (attributes) and functions (methods) that act on the data.

- **Examples:** Classes and objects are key concepts in OOP.
- **Advantages:** OOP makes code more modular, reusable, and easier to manage in large projects.

---

### 16. API (Application Programming Interface)
An **API** is a set of functions and protocols that allow different software applications to communicate with each other. APIs define how requests and responses should be structured.

- **Example:** A weather API might allow you to retrieve the current temperature for a location by making an HTTP request.

---


### 17. Console
The **console** (or command line interface) is a tool for interacting with the computer by typing commands. In programming, it is often used to run scripts, display outputs, and manage files.

- **Example:** The Python shell (when you run Python in terminal) or R console in RStudio.

---

### 18. Terminal
The **terminal** is a command-line interface that allows you to execute commands directly on your operating system. It provides access to the underlying file system and other utilities via text-based commands.

- **Example:** Bash terminal on Linux/Mac or Command Prompt on Windows.

---

### 19. REPL (Read-Eval-Print Loop)
A **REPL** is an interactive environment that reads each expression you type, evaluates it, prints the result, and waits for the next one. Many interpreted languages (like Python, R, and JavaScript) provide a REPL.

- **Example:** Python's interactive shell (`>>>`), where you can type commands and see results instantly.

---

### 20. Shell
A **shell** is a user interface that allows access to the operating system’s services. It can be a command-line shell (like Bash or Command Prompt) or a graphical shell (like Windows Explorer).

- **Example:** Bash shell in Linux, or PowerShell in Windows.

---

### 21. Script
A **script** is a set of instructions written in a programming language, often interpreted, that performs a specific task. Scripts are commonly used for automation and small tasks.

- **Example:** Python scripts (`.py` files) or shell scripts (`.sh` files).

---

### 22. Class
A **class** is a blueprint for creating objects in object-oriented programming. It defines the properties (attributes) and behaviors (methods) that the objects created from the class will have.

- **Example:** A `Car` class might have attributes like `color` and `model`, and methods like `drive()` or `stop()`.

---

### 23. Object
An **object** is an instance of a class. It represents a real-world entity with attributes and methods.

- **Example:** If `Car` is a class, then `myCar = Car()` creates an object `myCar` that is an instance of the `Car` class.
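
Since these notes focus on R, the `Car` example can also be sketched with R's reference classes. This is only one of R's object systems, and the field values and the `describe()` method below are illustrative assumptions rather than part of the original example.

```r
library(methods)  # provides setRefClass()

# Class: the blueprint, with two attributes (fields) and one method
Car <- setRefClass(
  "Car",
  fields = list(color = "character", model = "character"),
  methods = list(
    describe = function() {
      paste("A", color, model)
    }
  )
)

# Object: one concrete instance created from the class
my_car <- Car$new(color = "red", model = "hatchback")
description <- my_car$describe()
print(description)  # [1] "A red hatchback"
```

Each call to `Car$new()` creates a separate object with its own field values.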

---

### 24. Module
A **module** is a file containing a set of functions, classes, or variables that you can import into other programs or scripts.

- **Example:** In Python, you can create a module by saving a `.py` file and importing it into other scripts using `import module_name`.

---

### 25. Framework
A **framework** is a collection of libraries, tools, and best practices designed to simplify software development. It provides a predefined structure for developers to build applications faster.

- **Examples:** Django (for Python web development), React (for JavaScript).

---

### 26. Version Control
**Version control** is a system that tracks changes to your code, allowing you to revert to previous versions and collaborate with others. Git is the most popular version control system.

- **Example:** GitHub is a platform for hosting Git repositories and collaborating on projects.

---

### 27. Repository
A **repository** is a storage location for your project files, along with their version history, typically managed by a version control system like Git.

- **Example:** A GitHub repository stores code, documentation, and the version history of a project.

---


### 28. Command-Line Argument
A **command-line argument** is an input passed to a script or program via the command line when it is executed.

- **Example:** In Python, `python script.py arg1 arg2` passes `arg1` and `arg2` as arguments to `script.py`.
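
The R equivalent is `Rscript script.R arg1 arg2`. A minimal sketch of how a script might read those values, assuming it is saved under the hypothetical name `script.R`:

```r
# Run from a terminal with: Rscript script.R arg1 arg2
# trailingOnly = TRUE keeps only the arguments that follow the script name
args <- commandArgs(trailingOnly = TRUE)

if (length(args) == 0) {
  message("No arguments supplied")
} else {
  for (arg in args) {
    print(arg)
  }
}
```

The arguments always arrive as character strings, so numeric input needs an explicit conversion such as `as.numeric()`.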

---

**Conclusion:**
Understanding these fundamental programming terms will give you a strong foundation as you learn to code. Each term represents a building block that you'll frequently encounter when writing and executing programs in any language.

## Setting Up the R Environment

[swirl](https://swirlstats.com/students.html)

- **Installation:** Install R from CRAN and the RStudio IDE from Posit (the company formerly named RStudio).
- **RStudio Basics:** Overview of the RStudio interface (Console, Script Editor, Environment, and Plots pane).
- **Package Management:** How to install, load, and update packages using `install.packages()` and `library()` functions.
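
The package workflow mentioned above might look like the following sketch. The package name `stats` is a placeholder chosen because it ships with R, so nothing is downloaded here; in practice you would substitute a CRAN package such as `ggplot2`.

```r
pkg <- "stats"  # placeholder package name (an assumption for this sketch)

# Install only when the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package; character.only = TRUE lets library() read the name from a variable
library(pkg, character.only = TRUE)

# update.packages() refreshes installed packages; it needs network access,
# so it is left commented out in this sketch
# update.packages(ask = FALSE)

loaded <- paste0("package:", pkg) %in% search()
print(loaded)  # [1] TRUE
```

Installing is needed once per machine, while `library()` must be called in every new R session.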

# Introduction to Programming Concepts

- **Variables and Data Types:** Explain what variables are and the different data types (numeric, integer, character, logical, etc.).

In [128]:
"hello world"

In [129]:
print("hello world!")

[1] "hello world!"


## Variables and Data Types in R

When learning R programming, two key concepts are **variables** and **data types**. Here’s a breakdown of both:

---

### 1. Variables in R

A **variable** is a storage location in memory, identified by a name, that holds a value. In R, variables are used to store data that can be used and manipulated later in the program.

#### Creating Variables
In R, you can assign a value to a variable using the assignment operator `<-` (or sometimes `=`). Here’s an example:

#### Assigning values to variables

In [130]:
x <- 10       # Numeric variable

In [131]:
y <- "Hello"  # Character (string) variable

In [132]:
z <- TRUE     # Logical (boolean) variable

- **x** stores a numeric value `10`.
- **y** stores a string `"Hello"`.
- **z** stores a logical value `TRUE`.

#### Variable Naming Rules
- A variable name can contain letters, numbers, underscores (`_`), and periods (`.`).
- Variable names **cannot start with a number or an underscore**; a name may start with a period only if the next character is not a digit.
- R is **case-sensitive**, meaning `Var1` and `var1` are two different variables.

#### Example of Valid and Invalid Variable Names:

In [133]:
valid_name <- 100

In [134]:
invalid-name <- 200   # Error: R reads this as `invalid - name`, so hyphens cannot be used in names

ERROR: Error in invalid - name <- 200: object 'invalid' not found


In [135]:
1variable <- 300      # Error: Cannot start a variable name with a number

ERROR: Error in parse(text = input): <text>:1:2: unexpected symbol
1: 1variable
     ^


### 2. Data Types in R

A **data type** refers to the type of value that a variable can store. R supports various data types, and understanding them is essential to writing efficient R programs.

#### Primary Data Types in R:

#### 1. **Integer**:
   - Represents whole numbers. Use the `L` suffix to define integers explicitly.
   - Example:

In [192]:
int <- 42L  # Integer
int
typeof(int)
class(int)

In [163]:
int <- -2L  # Integer
int
typeof(int)

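#### 2. **Double** (or Numeric):
   - Represents real numbers stored as double-precision floating point. This is R's default for numbers, so even a whole number such as `42` is a double unless it carries the `L` suffix.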
   - Example:

In [193]:
decimal <- 3.14  # Numeric (floating-point)
typeof(decimal)
class(decimal)

In [194]:
num <- 42        # Numeric
typeof(num)
class(num)

#### 3. **Character (String)**:
   - Represents text or a sequence of characters.
   - Example:

In [166]:
char <- "Hello, R!"  # Character (string)
char

In [167]:
typeof(char)

#### 4. **Logical (Boolean)**:
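   - Represents `TRUE` or `FALSE` values. `T` and `F` are shorthand for them, but they are ordinary variables that can be reassigned, so `TRUE` and `FALSE` are safer in scripts.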
   - Example:

In [176]:
is_sunny <- TRUE     # Logical
is_sunny
typeof(is_sunny)

In [177]:
is_raining <- FALSE  # Logical
typeof(is_raining)

In [178]:
is_sunny <- T     # Logical
is_sunny
typeof(is_sunny)

In [179]:
is_raining <- F  # Logical
is_raining
typeof(is_raining)

#### 5. **Complex**:
   - Represents complex numbers with real and imaginary parts.
   - Example:

In [180]:
complex_num <- 4 + 2i  # Complex number (4 is real part, 2i is imaginary part)

In [181]:
complex_num

In [182]:
typeof(complex_num)

#### 6. **Raw**

The `raw` data type stores values as raw bytes. You can use the following functions to convert character data to raw data and back:

- `charToRaw()` - converts character data to raw data
- `rawToChar()` - converts raw data to character data

In [212]:
# convert character to raw
raw_variable <- charToRaw("Welcome to Programiz")

print(raw_variable)
print(class(raw_variable))

# convert raw to character
char_variable <- rawToChar(raw_variable)

print(char_variable)
print(class(char_variable))

 [1] 57 65 6c 63 6f 6d 65 20 74 6f 20 50 72 6f 67 72 61 6d 69 7a
[1] "raw"
[1] "Welcome to Programiz"
[1] "character"


In [207]:
single_raw <- as.raw(255)
single_raw
typeof(single_raw)
class(single_raw)

[1] ff

### 3. Checking Data Types

- `class()` - what kind of object is it (high-level)?
- `typeof()` - what is the object’s data type (low-level)?
- `length()` - how long is it? What about two dimensional objects?
- `attributes()` - does it have any metadata?

You can check the type of a variable using the `class()` or `typeof()` functions in R.

- **Example**:

In [210]:
x <- 10
class(x)    # Returns "numeric"
typeof(x)   # Returns "double"

is.integer(x)
is.double(x)

In [211]:
x <- 10L
class(x)    # Returns "integer"
typeof(x)   # Returns "integer"

is.integer(x)
is.double(x)
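
The remaining two inspection functions, `length()` and `attributes()`, can be sketched the same way. The example below also answers the question about two-dimensional objects: for a matrix, `length()` counts every cell, and `dim()` reports rows and columns. The variable names are illustrative.

```r
vec <- c(10, 20, 30)
m <- matrix(1:6, nrow = 2)

vec_len <- length(vec)       # number of elements in the vector
m_len <- length(m)           # total number of cells, not rows or columns
m_dim <- dim(m)              # number of rows and columns
vec_attr <- attributes(vec)  # NULL: a plain vector carries no metadata
m_attr <- attributes(m)      # a list holding the dim attribute

print(vec_len)  # [1] 3
print(m_len)    # [1] 6
print(m_dim)    # [1] 2 3
```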

### 4. Type Conversion

You can convert between data types using functions like `as.numeric()`, `as.character()`, `as.logical()`, and so on.

#### Example of Type Conversion:

[Read: Conversion Functions in R](https://www.scaler.com/topics/conversion-functions-in-r/)

[Source: Convert](https://cran.r-project.org/web/packages/hablar/vignettes/convert.html)

In [201]:
num <- "100"         # Character variable (string)
converted_num <- as.numeric(num)  # Convert to numeric
class(converted_num)  # Returns "numeric"

In [202]:
typeof(converted_num)
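
Conversions do not always succeed, and some silently lose information. The following sketch collects a few cases worth knowing; the input values are arbitrary examples.

```r
# Text that is not a number becomes NA (R also issues a warning, suppressed here)
bad_num <- suppressWarnings(as.numeric("abc"))

# Converting to integer drops the fractional part instead of rounding
truncated <- as.integer(3.9)

# Only recognised spellings such as "TRUE" or "false" convert to logical
flag <- as.logical("TRUE")
not_flag <- as.logical("maybe")

# Numbers convert to text without loss
as_text <- as.character(100)

print(bad_num)    # [1] NA
print(truncated)  # [1] 3
print(flag)       # [1] TRUE
print(not_flag)   # [1] NA
print(as_text)    # [1] "100"
```

Checking the result with `is.na()` after a conversion is a simple way to catch values that could not be converted.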

### 5. Special Data Types: NULL, NA, NaN, and Inf

- **NULL**: Represents the absence of a value or an empty object.

In [221]:
x <- NULL
x
#class(x)
#typeof(x)

NULL

- **NA**: Represents a missing or undefined value (Not Available).

In [220]:
x <- NA
x
#class(x)
#typeof(x)

- **NaN**: Stands for "Not a Number" and occurs in undefined mathematical operations like `0/0`.

In [219]:
x <- 0/0
x
#class(x)
#typeof(x)

- **Inf**: Represents infinity. Dividing a positive number by zero gives `Inf`, and dividing a negative number by zero gives `-Inf` (`0/0` gives `NaN` instead).

In [223]:
x <- 10 / 0  # Returns Inf
x
#class(x)
#typeof(x)
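
Each of these special values has its own test function, and comparing with `==` does not work for `NA`. A short sketch of the checks:

```r
null_check <- is.null(NULL)         # TRUE
na_check <- is.na(NA)               # TRUE
nan_check <- is.nan(0 / 0)          # TRUE
nan_is_na <- is.na(0 / 0)           # TRUE: NaN also counts as missing
inf_check <- is.infinite(-10 / 0)   # TRUE: -Inf is infinite too
na_compare <- NA == NA              # NA, not TRUE
empty_len <- length(NULL)           # 0: NULL has no elements

print(na_compare)  # [1] NA
print(empty_len)   # [1] 0
```

Because `NA == NA` is itself `NA`, missing values should be detected with `is.na()` rather than with `==`.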

### 6. Data Structures (Related to Variables and Data Types)

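In R, data can also be organized into various structures. A data structure groups several values into a single object, and the structures differ in their dimensions and in whether all values must share one type.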

1. **Vector**: A sequence of data elements of the same type.

In [55]:
vec <- c(1, 2, 3, 4)  # Numeric vector
vec

2. **List**: A collection of elements of different types.

In [54]:
lst <- list(1, "Hello", TRUE)  # List with different data types
lst

3. **Matrix**: A two-dimensional array of the same type.

In [53]:
mat <- matrix(1:6, nrow = 2)  # 2x3 numeric matrix
mat

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6


4. **Data Frame**: A table with columns that can store different data types.

In [51]:
df <- data.frame(Name = c("John", "Doe"), Age = c(25, 30))

In [52]:
df

  Name Age
1 John  25
2  Doe  30


5. **Factor**: Used to represent categorical data with a fixed set of values (called levels).


In [227]:
factor_data <- factor(c("low", "medium", "high"))
factor_data
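
Two details of these structures are easy to miss: a vector coerces mixed input to one common type, while a list keeps each element's type, and a factor orders its levels alphabetically unless told otherwise. A small sketch with illustrative names:

```r
# Vector: mixed input is coerced to a single type (here character)
mixed <- c(1, "a", TRUE)
mixed_class <- class(mixed)

# List: each element keeps its own type
mixed_list <- list(1, "a", TRUE)
list_classes <- sapply(mixed_list, class)

# Factor: levels default to alphabetical order
sizes <- factor(c("low", "medium", "high"))
default_levels <- levels(sizes)

# Supplying levels explicitly fixes the order
ordered_sizes <- factor(c("low", "medium", "high"),
                        levels = c("low", "medium", "high"))
codes <- as.integer(ordered_sizes)

print(mixed_class)     # [1] "character"
print(list_classes)    # [1] "numeric"   "character" "logical"
print(default_levels)  # [1] "high"   "low"    "medium"
print(codes)           # [1] 1 2 3
```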

### **Conclusion:**

- **Variables** are containers that store data, and in R, you assign values to variables using `<-`.
- **Data types** define the kind of values a variable can store, such as numeric, character, or logical.
- Understanding data types helps you manage and manipulate data correctly in your R programs. You can also check and convert between data types as needed.

Understanding these concepts is crucial for writing efficient and error-free R programs.

## Operators:

Arithmetic, relational, and logical operators.

### Operators in R

Operators in R are symbols or combinations of symbols that perform operations on variables and values. R provides a variety of operators to carry out arithmetic, relational, logical, assignment, and other types of operations. Here’s an overview of the main types of operators in R:

### **1. Arithmetic Operators**

Arithmetic operators perform mathematical calculations.

| Operator | Description           | Example         |
|----------|-----------------------|-----------------|
| `+`      | Addition              | `5 + 2` = 7     |
| `-`      | Subtraction           | `5 - 2` = 3     |
| `*`      | Multiplication        | `5 * 2` = 10    |
| `/`      | Division              | `5 / 2` = 2.5   |
| `^` or `**` | Exponentiation    | `5 ^ 2` = 25    |
| `%%`     | Modulus (remainder)   | `5 %% 2` = 1    |
| `%/%`    | Integer Division      | `5 %/% 2` = 2   |

#### **Example:**

In [229]:

x <- 10
y <- 3

sum <- x + y      # Addition
diff <- x - y     # Subtraction
prod <- x * y     # Multiplication
quotient <- x / y # Division
power <- x ^ y    # Exponentiation
remainder <- x %% y # Modulus
int_div <- x %/% y # Integer Division

print(c(sum, diff, prod, quotient, power, remainder, int_div))


[1]   13.000000    7.000000   30.000000    3.333333 1000.000000    1.000000
[7]    3.000000


### **2. Relational (Comparison) Operators**

Relational operators compare two values and return a logical value (`TRUE` or `FALSE`).

| Operator | Description             | Example         |
|----------|-------------------------|-----------------|
| `==`     | Equal to                | `5 == 2` = FALSE|
| `!=`     | Not equal to            | `5 != 2` = TRUE |
| `>`      | Greater than            | `5 > 2` = TRUE  |
| `<`      | Less than               | `5 < 2` = FALSE |
| `>=`     | Greater than or equal to| `5 >= 2` = TRUE |
| `<=`     | Less than or equal to   | `5 <= 2` = FALSE|

#### **Example:**

In [75]:

x <- 5
y <- 2

print(x == y)   # Returns FALSE
print(x != y)   # Returns TRUE
print(x > y)    # Returns TRUE
print(x < y)    # Returns FALSE
print(x >= y)   # Returns TRUE
print(x <= y)   # Returns FALSE


[1] FALSE
[1] TRUE
[1] TRUE
[1] FALSE
[1] TRUE
[1] FALSE
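
One caveat with `==` concerns decimal numbers: doubles are stored in binary floating point, so values that look equal may differ by a tiny rounding error. The sketch below shows the usual workaround with `all.equal()`, which compares within a small tolerance.

```r
a <- 0.1 + 0.2

exact_equal <- a == 0.3                  # exact comparison fails
near_equal <- isTRUE(all.equal(a, 0.3))  # comparison within a tolerance succeeds

print(exact_equal)  # [1] FALSE
print(near_equal)   # [1] TRUE
```

Wrapping `all.equal()` in `isTRUE()` matters because `all.equal()` returns a description of the difference, not `FALSE`, when the values differ.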


### **3. Logical Operators**

Logical operators are used to combine multiple conditions and return a logical result (`TRUE` or `FALSE`).

| Operator | Description        | Example         |
|----------|--------------------|-----------------|
| `&`      | Logical AND (element-wise) | `TRUE & FALSE` = FALSE |
| `&&`     | Logical AND (single values, short-circuit) | `TRUE && FALSE` = FALSE|
| `\|`      | Logical OR (element-wise)  | `TRUE \| FALSE` = TRUE  |
| `\|\|`     | Logical OR (single values, short-circuit) | `TRUE \|\| FALSE` = TRUE |
| `!`      | Logical NOT         | `!TRUE` = FALSE |

- `&` and `|` compare vectors **element by element** and return a vector of results.
- `&&` and `||` expect **single values** and stop evaluating as soon as the result is known. Older R versions silently used only the first element of longer vectors; R 4.2 issues a warning in that case, and R 4.3.0 and later raise an error.

#### **Example:**

In [76]:

x <- c(TRUE, FALSE, TRUE)
y <- c(FALSE, FALSE, TRUE)

# && and || need single values: x && y on the full vectors warns in R 4.2
# and is an error from R 4.3.0, so the first elements are selected explicitly
print(x & y)         # Element-wise AND: FALSE FALSE TRUE
print(x[1] && y[1])  # Scalar AND on the first elements: FALSE
print(x | y)         # Element-wise OR: TRUE FALSE TRUE
print(x[1] || y[1])  # Scalar OR on the first elements: TRUE
print(!x)            # NOT operator: FALSE TRUE FALSE


[1] FALSE FALSE  TRUE
[1] FALSE
[1]  TRUE FALSE  TRUE
[1] TRUE
[1] FALSE  TRUE FALSE


### **4. Assignment Operators**

Assignment operators are used to assign values to variables.

| Operator | Description            | Example           |
|----------|------------------------|-------------------|
| `<-`     | Leftward assignment    | `x <- 5`          |
| `->`     | Rightward assignment   | `5 -> x`          |
| `<<-`    | Global leftward assignment | `x <<- 5`      |
| `->>`    | Global rightward assignment| `5 ->> x`      |
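| `=`      | Assignment (also names arguments in function calls) | `x = 5`    |

- **`<-`** and **`=`** are the most common assignment operators in R. `<-` is the conventional choice for assignment, while `=` is also used to name arguments in function calls.
- **`<<-`** assigns to a variable in an enclosing environment from within a function; if no such variable exists, it creates one in the global environment.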

#### **Example:**

In [80]:

x <- 10   # Leftward assignment
20 -> y   # Rightward assignment (equivalent to y <- 20)

print(x)
print(y)


[1] 10
[1] 20
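
The difference between `<-` and `<<-` only shows up inside a function. The following sketch, with hypothetical function names, contrasts the two:

```r
counter <- 0

# <- inside a function creates a local copy; the outer counter is untouched
increment_local <- function() {
  counter <- counter + 1
  counter
}

# <<- modifies the counter defined outside the function
increment_global <- function() {
  counter <<- counter + 1
  invisible(counter)
}

local_result <- increment_local()
after_local <- counter

increment_global()
increment_global()
after_global <- counter

print(local_result)  # [1] 1
print(after_local)   # [1] 0
print(after_global)  # [1] 2
```

Because `<<-` changes state outside the function, it is usually reserved for cases such as counters or caches where that side effect is intended.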


### **5. Miscellaneous Operators**

#### **a. Colon Operator (`:`)**
The colon (`:`) is used to create a sequence of numbers.

In [81]:

x <- 1:5  # Creates a sequence from 1 to 5
print(x)  # Prints c(1, 2, 3, 4, 5)


[1] 1 2 3 4 5


#### **b. Sequence Generation (`seq()`)**
The `seq()` function is used to generate a sequence with specific increments.

In [82]:

x <- seq(1, 10, by=2)  # Creates a sequence from 1 to 10 with a step of 2
print(x)  # Prints c(1, 3, 5, 7, 9)


[1] 1 3 5 7 9


#### **c. Element Selection (`[]`, `[[]]`)**
Used to select elements from a vector, matrix, or list.

In [83]:

vec <- c(10, 20, 30, 40)
print(vec[2])   # Access the second element (prints 20)


[1] 20
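
The example above covers `[]` on a vector. As a sketch of the other cases named in the heading, the code below shows how `[]` and `[[]]` differ on a list and how `[]` takes a row and a column on a matrix. The list contents are made up for illustration.

```r
person <- list(name = "Ada", age = 36)

single_bracket <- person["age"]    # still a list, of length 1
double_bracket <- person[["age"]]  # the stored value itself

print(class(single_bracket))  # [1] "list"
print(class(double_bracket))  # [1] "numeric"

m <- matrix(1:6, nrow = 2)
cell <- m[2, 3]                # row 2, column 3
print(cell)                    # [1] 6
```

A useful rule of thumb is that `[]` returns a smaller container of the same kind, while `[[]]` extracts a single element from it.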


#### **d. List Access (`$`)**
The `$` operator is used to access elements by name in a list or data frame.

In [84]:

data <- list(a = 1, b = 2)
print(data$a)  # Access the element named 'a' (prints 1)


[1] 1


### **6. Special Operators**

R provides a few operators that are specific to certain tasks.

#### **a. Matrix Multiplication (`%*%`)**
Used for matrix multiplication.

In [85]:

mat1 <- matrix(1:4, nrow=2)
mat2 <- matrix(5:8, nrow=2)
result <- mat1 %*% mat2
print(result)  # Matrix multiplication


     [,1] [,2]
[1,]   23   31
[2,]   34   46
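
It is easy to confuse `%*%` with the ordinary `*` operator, which multiplies matrices element by element. A short comparison using the same two matrices:

```r
mat1 <- matrix(1:4, nrow = 2)
mat2 <- matrix(5:8, nrow = 2)

elementwise <- mat1 * mat2       # multiplies matching cells
matrix_product <- mat1 %*% mat2  # rows of mat1 times columns of mat2

print(elementwise)
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32

print(matrix_product)
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
```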


#### **b. Modulus and Integer Division (`%%` and `%/%`)**
These operators are used for division-related operations:

- **`%%`** gives the remainder.
- **`%/%`** gives the integer part of division.

In [86]:

x <- 10
y <- 3

print(x %% y)  # Returns 1 (remainder)
print(x %/% y) # Returns 3 (integer division)


[1] 1
[1] 3


#### **c. In Operator (`%in%`)**
Used to check whether an element exists in a vector or list.

In [87]:

x <- 3
y <- c(1, 2, 3, 4, 5)

print(x %in% y)  # Returns TRUE (3 is in the vector y)


[1] TRUE


### **7. Summary of Operators**

1. **Arithmetic Operators**: Perform basic math operations (`+`, `-`, `*`, `/`, etc.).
2. **Relational Operators**: Compare values (`==`, `!=`, `>`, `<`, etc.).
3. **Logical Operators**: Combine conditions (`&`, `|`, `!`, etc.).
4. **Assignment Operators**: Assign values to variables (`<-`, `=`, etc.).
5. **Miscellaneous Operators**: Sequence creation (`:`), element selection (`[]`), list access (`$`), etc.
6. **Special Operators**: Matrix multiplication (`%*%`), modulus (`%%`), and membership test (`%in%`).

### **Conclusion:**

Understanding these operators is crucial for performing calculations, comparisons, and data manipulations in R. Mastering how to use them efficiently will help you write more robust and flexible R programs.

## Control Structures:

- if statements, for loops, while loops.

Control structures in R are used to control the flow of execution in a program. They help in making decisions, repeating tasks, or breaking out of certain operations. The most common control structures in R include **conditional statements** and **loops**.

### **1. Conditional Statements (if, else if, else)**

Conditional statements allow the program to execute different pieces of code based on certain conditions.

#### **if Statement**
The `if` statement checks whether a condition is `TRUE`. If it is, the code inside the block is executed.

In [65]:

x <- 5

if (x > 3) {
  print("x is greater than 3")
}


[1] "x is greater than 3"


#### **if-else Statement**
The `else` block is executed when the condition in the `if` statement is `FALSE`.

In [64]:

x <- 2

if (x > 3) {
  print("x is greater than 3")
} else {
  print("x is less than or equal to 3")
}


[1] "x is less than or equal to 3"


#### **else if Statement**
The `else if` statement allows you to check multiple conditions sequentially. If none of the conditions are `TRUE`, the `else` block (if provided) is executed.

In [63]:

x <- 5

if (x > 10) {
  print("x is greater than 10")
} else if (x > 3) {
  print("x is greater than 3 but less than or equal to 10")
} else {
  print("x is less than or equal to 3")
}


[1] "x is greater than 3 but less than or equal to 10"




### **2. Loops**

Loops are used to repeat a block of code multiple times. There are several types of loops in R:

#### **for Loop**
The `for` loop repeats a block of code a specified number of times, iterating over elements in a sequence or vector.

In [66]:

# Example: Print numbers 1 to 5
for (i in 1:5) {
  print(i)
}


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5


In this example, the loop iterates over the numbers 1 to 5, printing each one.
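
When a loop needs the position of each element in a vector, `seq_along()` is a safer way to build the index than `1:length(x)`. The sketch below uses made-up data to show both the normal case and the empty-vector case where the two differ.

```r
fruits <- c("apple", "banana", "cherry")
numbered <- character(length(fruits))

# seq_along(fruits) yields 1, 2, 3
for (i in seq_along(fruits)) {
  numbered[i] <- paste(i, fruits[i])
}
print(numbered)  # [1] "1 apple"  "2 banana" "3 cherry"

# With an empty vector, seq_along() yields nothing, so the body never runs
empty <- character(0)
iterations <- 0
for (i in seq_along(empty)) {
  iterations <- iterations + 1
}
print(iterations)  # [1] 0

# 1:length(empty) counts down from 1 to 0 instead, giving two unwanted iterations
naive_index <- 1:length(empty)
print(naive_index)  # [1] 1 0
```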

#### **while Loop**
The `while` loop repeats a block of code as long as a specified condition is `TRUE`.

In [67]:

# Example: Print numbers from 1 to 5 using a while loop
x <- 1

while (x <= 5) {
  print(x)
  x <- x + 1
}


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5


In this example, the loop keeps running as long as `x` is less than or equal to 5. The value of `x` is incremented after each iteration.

#### **repeat Loop**
The `repeat` loop repeats a block of code indefinitely until a `break` statement is encountered.

In [68]:

# Example: Print numbers from 1 to 5 using a repeat loop
x <- 1

repeat {
  print(x)
  x <- x + 1
  
  if (x > 5) {
    break  # Exit the loop when x becomes greater than 5
  }
}


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5


In this example, the loop continues until `x` is greater than 5, at which point the `break` statement is used to exit the loop.



### **3. Loop Control (break, next)**

Loop control statements allow you to change the normal flow of loops.

#### **break**
The `break` statement is used to exit a loop prematurely when a specific condition is met.

In [69]:

# Example: Exit the loop when i equals 3
for (i in 1:5) {
  if (i == 3) {
    break  # Exit the loop when i is 3
  }
  print(i)
}


[1] 1
[1] 2


In this example, the loop stops when `i` equals 3, so only the numbers 1 and 2 are printed.

#### **next**
The `next` statement is used to skip the current iteration of a loop and move on to the next one.

In [71]:

# Example: Skip the number 3
for (i in 1:5) {
  if (i == 3) {
    next  # Skip the current iteration when i is 3
  }
  print(i)
}


[1] 1
[1] 2
[1] 4
[1] 5


In this example, the loop skips the iteration where `i` equals 3, so 1, 2, 4, and 5 are printed, but not 3.

### **4. Logical Operators**

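Logical operators are used within control structures to create complex conditions. An `if` condition must be a single `TRUE` or `FALSE`, so the scalar forms `&&` and `||` are the usual choice there; `&` and `|` give the same result on single values but are meant for element-wise work on vectors.

- **&& / & (AND)**: Both conditions must be `TRUE`.
- **|| / | (OR)**: At least one of the conditions must be `TRUE`.
- **! (NOT)**: Negates a condition (turns `TRUE` into `FALSE`, and vice versa).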

#### **Example with Logical Operators:**

In [72]:

x <- 7

if (x > 3 && x < 10) {  # && is the scalar AND, suited to a single condition
  print("x is between 3 and 10")
}


[1] "x is between 3 and 10"


In this example, the condition is `TRUE` because `x` is greater than 3 and less than 10.



### **5. Vectorized ifelse() Function**

R also provides the `ifelse()` function, which allows you to apply conditional logic over vectors. This is useful when you want to apply conditions element-wise.

In [73]:

# Example: Check whether each number in a vector is even or odd
numbers <- 1:5
result <- ifelse(numbers %% 2 == 0, "Even", "Odd")
print(result)


[1] "Odd"  "Even" "Odd"  "Even" "Odd" 


In this example, the `ifelse()` function checks whether each element of `numbers` is even or odd. If the number is divisible by 2 (`numbers %% 2 == 0`), it returns `"Even"`, otherwise `"Odd"`.

### **Conclusion:**

- **Conditional statements** (`if`, `else if`, `else`) allow for decision-making in your code.
- **Loops** (`for`, `while`, `repeat`) enable you to execute a block of code multiple times.
- **Loop control statements** (`break`, `next`) allow you to control the flow of loops, letting you exit or skip iterations.
- R also provides the vectorized `ifelse()` function for efficient element-wise condition checking.

Understanding and using these control structures will enable you to build dynamic and flexible programs in R.

## Functions

What functions are, how to define and call them.

Functions are a fundamental concept in R, allowing you to group blocks of code into reusable components. They make your code modular, easier to debug, and more readable. A function takes inputs (called arguments), processes them, and returns an output (result). R also comes with many **built-in functions**, but you can create **user-defined functions** as well.

### **1. What are Functions?**

A function is a set of statements organized together to perform a specific task. Functions in R can:
- Take inputs (called parameters or arguments).
- Perform some operations using these inputs.
- Return an output. A function returns a single object, so several values are returned by bundling them in a vector or list.

#### **Built-in Functions Example:**

In [88]:

# Example: sqrt() is a built-in function to calculate square root
result <- sqrt(16)
print(result)  # Output: 4


[1] 4


In this example, `sqrt()` is a built-in function that takes one input (16) and returns its square root.

### **2. Defining a Function in R**

To define a function in R, use the `function` keyword and assign the result to a name. A function has the following components:
- **Function Name**: A name to call the function.
- **Arguments/Parameters**: Inputs the function will use.
- **Body**: The block of code that performs the task.
- **Return Value**: The output returned by the function.

#### **Syntax:**

In [90]:

function_name <- function(arg1, arg2, ...) {
  # Code to execute
  return(value)  # Optional, if not provided, the last evaluated expression is returned
}


#### **Example: Simple Function**

In [91]:

# Define a function to add two numbers
add_numbers <- function(a, b) {
  sum <- a + b
  return(sum)  # Return the sum
}

# Call the function
result <- add_numbers(5, 3)
print(result)  # Output: 8


[1] 8


In this example, the function `add_numbers()` takes two inputs (`a` and `b`), adds them, and returns the result.

### **3. Calling a Function**

Once a function is defined, you can call it by using its name followed by parentheses containing the arguments.

#### **Example:**

In [92]:

# Function call with arguments
result <- add_numbers(10, 20)
print(result)  # Output: 30


[1] 30


Arguments can be literal values, as here, or variables and other expressions.

### **4. Function Arguments**

Functions in R can have several types of arguments:
- **Positional Arguments**: Arguments are matched to function parameters by their position.
- **Named Arguments**: Arguments can be passed using names, in any order.
- **Default Arguments**: You can provide default values for arguments, which will be used if no value is supplied during the function call.

#### **Example with Positional and Default Arguments:**

In [93]:

# Define a function with a default argument
greet <- function(name, greeting = "Hello") {
  paste(greeting, name)
}

# Call the function with both arguments
print(greet("Simran", "Hi"))  # Output: "Hi Simran"

# Call the function using default value for greeting
print(greet("Simran"))  # Output: "Hello Simran"


[1] "Hi Simran"
[1] "Hello Simran"
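
Named arguments can be sketched with the same function. The block below redefines `greet()` so that it runs on its own; the argument values are only illustrative.

```R
# Same function as above, redefined so this block is self-contained
greet <- function(name, greeting = "Hello") {
  paste(greeting, name)
}

# Named arguments may be given in any order
by_name <- greet(greeting = "Welcome", name = "Simran")
print(by_name)  # "Welcome Simran"

# Positional and named arguments can be mixed in one call
mixed <- greet("Simran", greeting = "Hi")
print(mixed)  # "Hi Simran"
```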


### **5. Returning Values from Functions**

A function can return a value using the `return()` statement, but if you don’t use `return()`, R will automatically return the value of the last evaluated expression.

#### **Example:**

In [94]:

# Function without explicit return statement
multiply <- function(a, b) {
  a * b  # The last evaluated expression is returned
}

# Call the function
result <- multiply(4, 5)
print(result)  # Output: 20


[1] 20


In this case, R automatically returns the result of `a * b` because it’s the last expression in the function.
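
A function returns a single object, so one common way to hand back several values is to wrap them in a named list. The sketch below assumes a hypothetical helper called `describe()`; the name and the statistics it reports are chosen only for illustration.

```R
# Hypothetical helper: return several statistics in one named list
describe <- function(x) {
  list(minimum = min(x), maximum = max(x), average = mean(x))
}

stats <- describe(c(2, 4, 6, 8))
print(stats$minimum)  # 2
print(stats$maximum)  # 8
print(stats$average)  # 5
```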

### **6. Nested Functions**

You can define functions within other functions. The inner function is local to the outer function: it cannot be called by name from outside unless the outer function returns it.

#### **Example:**

In [95]:

# Define a function with a nested function
outer_function <- function(x) {
  inner_function <- function(y) {
    return(y + 2)
  }
  
  return(inner_function(x) * 3)
}

# Call the outer function
result <- outer_function(5)
print(result)  # Output: 21 (5 + 2 = 7; 7 * 3 = 21)


[1] 21
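
An inner function can also read the variables of the function that encloses it. The following is a minimal sketch, with `scale_and_add()` and `times_multiplier()` as hypothetical names; the final check assumes the code is run at the top level of a fresh session.

```R
# Hypothetical example: the inner helper uses an argument of the outer function
scale_and_add <- function(x, multiplier) {
  times_multiplier <- function(y) {
    y * multiplier  # `multiplier` is found in the enclosing function
  }
  times_multiplier(x) + 1
}

result <- scale_and_add(5, 3)
print(result)  # 16 (5 * 3 = 15; 15 + 1 = 16)

# The inner function is not defined outside scale_and_add()
inner_visible <- exists("times_multiplier")
print(inner_visible)  # FALSE
```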


### **7. Anonymous Functions**

In R, you can create anonymous functions, i.e., functions without a name. They are especially useful for short tasks such as passing a function as an argument to another function.

#### **Example:**

In [96]:

# Define an anonymous function inside sapply
result <- sapply(1:5, function(x) x^2)  # Squaring numbers from 1 to 5
print(result)  # Output: 1 4 9 16 25


[1]  1  4  9 16 25


Here, the anonymous function squares each element in the sequence `1:5`.
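
Anonymous functions work with other base R functions that take a function as an argument. As a small sketch, `Filter()` keeps the elements for which the function returns `TRUE`, and `mapply()` applies a two-argument function to pairs of elements; the input values are arbitrary.

```R
numbers <- 1:10

# Keep only the even numbers
evens <- Filter(function(x) x %% 2 == 0, numbers)
print(evens)  # 2 4 6 8 10

# Add the elements of two vectors pair by pair
pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)
print(pair_sums)  # 5 7 9
```

In R 4.1.0 and later, `function(x)` can also be written with the shorthand `\(x)`.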

### **8. Lazy Evaluation**

R uses **lazy evaluation** for function arguments, meaning arguments are only evaluated when they are actually used inside the function.

#### **Example:**

In [97]:

lazy_function <- function(a, b) {
  print(a)
  # b is never used, so it won't be evaluated
}

lazy_function(10, stop("This will not be evaluated"))  # Output: 10


[1] 10


In this example, since `b` is not used inside the function, the error from `stop()` is never triggered.
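
Lazy evaluation also applies to default arguments: a default is computed only when the argument is first used, inside the function. The sketch below uses a hypothetical function `double_later()` to make the timing visible.

```R
# Hypothetical example: the default for y is evaluated when y is first used
double_later <- function(x, y = x * 2) {
  x <- x + 1  # runs before y has been evaluated
  y           # the default `x * 2` is computed here, with the updated x
}

print(double_later(1))          # 4, because x is already 2 when y is computed
print(double_later(1, y = 10))  # 10, the supplied value replaces the default
```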

### **9. Variable Scope in Functions**

In R, the scope of a variable refers to the regions of a program where the variable can be accessed. There are two types:
- **Local Scope**: Variables declared inside a function are local to that function and cannot be accessed outside it.
- **Global Scope**: Variables declared outside any function are global and can be accessed from anywhere in the program.

#### **Example:**

In [98]:
x <- 10  # Global variable

my_function <- function() {
  y <- 5  # Local variable
  print(y)
}

my_function()  # Output: 5
print(x)  # Output: 10
# print(y)  # Error: object 'y' not found (because y is local to my_function)

[1] 5
[1] 10
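
A function can read a global variable, but assigning to the same name with `<-` inside the function creates a separate local variable. A minimal sketch, with hypothetical names:

```R
counter <- 10  # Global variable

read_global <- function() {
  counter + 1  # No local `counter` exists, so R uses the global one
}

shadow_global <- function() {
  counter <- 99  # Creates a local variable; the global one is unchanged
  counter
}

print(read_global())    # 11
print(shadow_global())  # 99
print(counter)          # 10
```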


### **10. Common Built-in Functions**

R comes with many useful built-in functions. Some commonly used ones include:

- **`sum()`**: Adds all elements in a vector.
- **`mean()`**: Computes the average of a numeric vector.
- **`length()`**: Returns the number of elements in a vector.
- **`max()` / `min()`**: Returns the maximum or minimum value in a vector.
- **`sqrt()`**: Computes the square root of a number.
- **`paste()`**: Concatenates strings.
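
The functions listed above can be tried on a small vector. The `scores` values below are made up for the example.

```R
scores <- c(90, 85, 88, 72)

total <- sum(scores)
average <- mean(scores)
count <- length(scores)
highest <- max(scores)
lowest <- min(scores)
root <- sqrt(16)
label <- paste("Top score:", highest)

print(total)    # 335
print(average)  # 83.75
print(count)    # 4
print(highest)  # 90
print(lowest)   # 72
print(root)     # 4
print(label)    # "Top score: 90"
```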

### **11. Function Documentation**

It is a good practice to add comments or documentation inside your functions to explain what they do, what inputs they expect, and what outputs they produce. This makes it easier for others (and yourself) to understand the code later.

In [100]:

# Function to calculate the area of a rectangle
# Args:
#   length: The length of the rectangle.
#   width: The width of the rectangle.
# Returns:
#   The area of the rectangle.
calculate_area <- function(length, width) {
  area <- length * width
  return(area)
}


### **Conclusion:**

- Functions in R are essential tools for making your code reusable, modular, and easier to manage.
- You can define functions using the `function()` keyword, and they can take arguments, return values, and perform complex tasks.
- Functions can have default arguments, return values explicitly or implicitly, and even be anonymous or nested.
- Understanding functions in R will allow you to write cleaner, more efficient code, making it easier to handle complex tasks.

## Data Structures

Vectors, lists, matrices, data frames, and factors.

### Data Structures in R

R provides various data structures that allow you to organize and manipulate data efficiently. Each data structure is suited for specific tasks, making it crucial to choose the right one based on your needs. The most common data structures in R are:
- **Vectors**
- **Lists**
- **Matrices**
- **Data Frames**
- **Factors**
  
These structures vary in terms of dimension, type of data they can store, and how they are used.

### **1. Vectors**

A **vector** is the simplest and most basic data structure in R. It is a one-dimensional array that holds elements of the same type, such as numeric, character, logical, or integer.

#### **Types of Vectors:**
- **Numeric Vector**: Stores numbers (e.g., 1.5, 2.3, -5).
- **Integer Vector**: Stores integer values (e.g., `1L`, `2L`, `-10L`; without the `L` suffix, R stores numbers as doubles).
- **Character Vector**: Stores text (e.g., "apple", "banana").
- **Logical Vector**: Stores boolean values (`TRUE`, `FALSE`).

#### **Creating Vectors:**

In [102]:

# Numeric vector
num_vector <- c(1.2, 3.4, 5.6)

# Integer vector
int_vector <- c(1L, 2L, 3L)  # 'L' indicates an integer

# Character vector
char_vector <- c("apple", "banana", "cherry")

# Logical vector
log_vector <- c(TRUE, FALSE, TRUE)

# Sequence of numbers
seq_vector <- 1:10  # Creates a sequence from 1 to 10


#### **Common Operations on Vectors:**
- **Accessing Elements**: Use square brackets `[]` to access elements.

In [103]:
num_vector[2]  # Access second element (3.4)

- **Vector Arithmetic**: Vectors support element-wise operations.

In [105]:
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
result <- vec1 + vec2  # Element-wise addition: c(5, 7, 9)
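
Two points from this section can be checked directly: a vector holds a single type, so `c()` converts mixed inputs to a common type, and square brackets accept more than one position. The values below are only illustrative.

```R
# Mixing types: everything is converted to character
mixed <- c(1, "apple", TRUE)
print(class(mixed))  # "character"
print(mixed)         # "1" "apple" "TRUE"

# Mixing numbers and logicals: TRUE becomes 1, FALSE becomes 0
num_and_lgl <- c(1.5, TRUE, FALSE)
print(num_and_lgl)   # 1.5 1.0 0.0

# Indexing with several positions, a condition, or a negative index
num_vector <- c(1.2, 3.4, 5.6)
print(num_vector[c(1, 3)])         # 1.2 5.6
print(num_vector[num_vector > 2])  # 3.4 5.6
print(num_vector[-1])              # 3.4 5.6 (everything except the first)
```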

### **2. Lists**

A **list** is a more flexible data structure that can contain elements of different types (e.g., numeric, character, logical, vectors, even other lists). It is a collection of objects, allowing you to mix and match data types.

#### **Creating a List:**

In [106]:

# List containing different types of data
my_list <- list(name = "John", age = 25, scores = c(90, 85, 88))

# Accessing elements in a list
my_list$name      # Access 'name' element
my_list$age       # Access 'age' element
my_list[[3]]      # Access third element (scores vector)


- **Named Lists**: You can assign names to the list elements for easier access.

In [107]:

my_list <- list(name = "John", age = 25)
my_list$name  # Access element by name


#### **Lists vs. Vectors:**
- **Homogeneity**: Vectors can only hold elements of the same type, while lists can store different types of objects.
- **Length**: Each element of a list can be an object of any length (a vector, a data frame, another list), while each element of an atomic vector is a single value.
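
The difference shows up when extracting elements: single brackets `[ ]` return a smaller list, while double brackets `[[ ]]` return the element itself. A short sketch using the same example list:

```R
my_list <- list(name = "John", age = 25, scores = c(90, 85, 88))

sub_list <- my_list["scores"]  # Still a list, with one element
scores <- my_list[["scores"]]  # The numeric vector itself

print(class(sub_list))  # "list"
print(class(scores))    # "numeric"

# Each list element has its own length
element_lengths <- lengths(my_list)
print(element_lengths)  # name: 1, age: 1, scores: 3
```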

### **3. Matrices**

A **matrix** is a two-dimensional array that contains elements of the same type (numeric, logical, or character). Matrices are essentially vectors arranged in rows and columns.

#### **Creating a Matrix:**

In [108]:

# Create a 3x3 numeric matrix
matrix_1 <- matrix(1:9, nrow = 3, ncol = 3)

# Create a matrix by combining vectors by row
matrix_2 <- rbind(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

# Create a matrix by combining vectors by column
matrix_3 <- cbind(c(1, 4, 7), c(2, 5, 8), c(3, 6, 9))


#### **Matrix Operations:**
- **Element Access**: Use row and column indices to access elements.
  ```R
  matrix_1[2, 3]  # Access element in 2nd row, 3rd column
  ```

- **Matrix Arithmetic**: Operators such as `+` and `*` work element-wise, while `%*%` performs matrix multiplication.

In [109]:

mat1 <- matrix(1:4, nrow = 2)
mat2 <- matrix(5:8, nrow = 2)

result_add <- mat1 + mat2  # Element-wise addition
result_mult <- mat1 %*% mat2  # Matrix multiplication
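
Two details are easy to miss here: `matrix()` fills values column by column unless `byrow = TRUE` is given, and `*` is element-wise while `%*%` is the matrix product. The sketch below reuses the matrices from the example above.

```R
# Fill order: column-wise by default, row-wise with byrow = TRUE
by_col <- matrix(1:6, nrow = 2)
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
print(by_col[1, 2])  # 3
print(by_row[1, 2])  # 2

mat1 <- matrix(1:4, nrow = 2)  # rows: (1, 3) and (2, 4)
mat2 <- matrix(5:8, nrow = 2)  # rows: (5, 7) and (6, 8)

elementwise <- mat1 * mat2  # rows: (5, 21) and (12, 32)
product <- mat1 %*% mat2    # rows: (23, 31) and (34, 46)
print(elementwise)
print(product)
```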


### **4. Data Frames**

A **data frame** is a two-dimensional table in which each column holds values of a single type, but different columns can hold different types (e.g., numeric, character, logical). It resembles a matrix, except that a matrix requires one type for all of its elements. Data frames are widely used for handling tabular data, such as datasets for statistical analysis.

#### **Creating a Data Frame:**

In [110]:

# Create a data frame with different types of columns
df <- data.frame(
  Name = c("Alice", "Bob", "Carol"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88)
)
print(df)


   Name Age Score
1 Alice  25    90
2   Bob  30    85
3 Carol  35    88


#### **Accessing Data in Data Frames:**
- **By Column Name**:

In [112]:
df$Name   # Access the 'Name' column

- **By Indexing**:

In [113]:

df[1, ]    # Access the first row
df[, 2]    # Access the second column
df[1, 2]   # Access the element in the first row, second column


   Name Age Score
1 Alice  25    90
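
Rows can also be selected with a logical condition in the row position. The sketch below rebuilds the same data frame so that it runs on its own; `stringsAsFactors = FALSE` is set explicitly as a precaution for R versions before 4.0, where text columns became factors by default.

```R
df <- data.frame(
  Name = c("Alice", "Bob", "Carol"),
  Age = c(25, 30, 35),
  Score = c(90, 85, 88),
  stringsAsFactors = FALSE
)

# Keep the rows where Age is above 28 (Bob and Carol)
older <- df[df$Age > 28, ]
print(older)

# Combine a row condition with a column name
top_names <- df[df$Score >= 88, "Name"]
print(top_names)  # "Alice" "Carol"

print(nrow(df))  # 3
print(ncol(df))  # 3
```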


#### **Adding/Removing Columns:**

In [114]:

# Add a new column
df$City <- c("New York", "London", "Paris")

# Remove a column
df$Score <- NULL


### **5. Factors**

A **factor** is used to represent categorical data. Factors are important in statistical modeling as they define the levels of categorical variables (e.g., "Male", "Female" for gender). Factors are stored as integers with labels.

#### **Creating Factors:**

In [115]:

# Create a factor with levels
gender <- factor(c("Male", "Female", "Female", "Male"))
print(gender)

# Check the levels
levels(gender)  # Output: "Female", "Male"


[1] Male   Female Female Male  
Levels: Female Male


#### **Ordered Factors:**
You can create ordered factors where levels have a specific order (e.g., "Low", "Medium", "High").

In [116]:

# Create an ordered factor
education <- factor(c("High School", "College", "Graduate", "College"),
                    levels = c("High School", "College", "Graduate"),
                    ordered = TRUE)
print(education)


[1] High School College     Graduate    College    
Levels: High School < College < Graduate


#### **Factors vs. Character Vectors:**
- Factors are stored as integers internally with corresponding levels, while character vectors store text directly.
- Factors keep each distinct label once in their levels, which suits categorical data with a few repeated values, and they are used in statistical modeling.
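
The integer storage and the level ordering can be inspected directly. A brief sketch using the two factors defined above, recreated here so the block is self-contained:

```R
gender <- factor(c("Male", "Female", "Female", "Male"))

# Internal integer codes: "Female" is level 1, "Male" is level 2
codes <- as.integer(gender)
print(codes)  # 2 1 1 2

# Count how often each level occurs
counts <- table(gender)
print(counts)  # Female: 2, Male: 2

education <- factor(c("High School", "College", "Graduate", "College"),
                    levels = c("High School", "College", "Graduate"),
                    ordered = TRUE)

# Ordered factors support comparisons based on the level order
above_high_school <- education > "High School"
print(above_high_school)  # FALSE TRUE TRUE TRUE
```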

### **6. Arrays**

An **array** is similar to a matrix but can have more than two dimensions (i.e., higher-dimensional data). Each element in an array must be of the same type.

#### **Creating an Array:**

In [117]:

# Create a 3-dimensional array
arr <- array(1:24, dim = c(4, 3, 2))  # 4 rows, 3 columns, 2 matrices
print(arr)


, , 1

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

, , 2

     [,1] [,2] [,3]
[1,]   13   17   21
[2,]   14   18   22
[3,]   15   19   23
[4,]   16   20   24



#### **Accessing Elements in an Array:**

In [119]:

arr[2, 3, 1]  # Access element at [row=2, column=3, matrix=1]
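
Leaving an index empty selects everything along that dimension, and `apply()` can summarize along a chosen dimension. A small sketch with the same array:

```R
arr <- array(1:24, dim = c(4, 3, 2))

print(dim(arr))  # 4 3 2

# Single element: row 2, column 3, first matrix
element <- arr[2, 3, 1]
print(element)  # 10

# Empty row and column indices return the whole first matrix (4 x 3)
first_layer <- arr[, , 1]
print(dim(first_layer))  # 4 3

# Sum of each matrix along the third dimension
layer_sums <- apply(arr, 3, sum)
print(layer_sums)  # 78 222
```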


### **7. Important Operations on Data Structures**

- **Length**: `length()` returns the number of elements in a vector or list.

In [121]:
length(c(1, 2, 3, 4))  # Output: 4

- **Dimensions**: `dim()` returns the dimensions of matrices and data frames.

In [123]:
dim(matrix(1:9, nrow=3))  # Output: 3 3 (rows and columns)

- **Structure**: `str()` gives a summary of the structure of any R object.

In [124]:
str(df)  # Displays the structure of a data frame

'data.frame':	3 obs. of  3 variables:
 $ Name: chr  "Alice" "Bob" "Carol"
 $ Age : num  25 30 35
 $ City: chr  "New York" "London" "Paris"


- **Summary**: `summary()` provides a statistical summary of a data structure.

In [125]:
summary(df)  # Displays summary statistics of each column in the data frame

     Name                Age           City          
 Length:3           Min.   :25.0   Length:3          
 Class :character   1st Qu.:27.5   Class :character  
 Mode  :character   Median :30.0   Mode  :character  
                    Mean   :30.0                     
                    3rd Qu.:32.5                     
                    Max.   :35.0                     

- **Combine Vectors**: `cbind()` and `rbind()` are used to combine vectors/matrices by columns and rows, respectively.

In [127]:
cbind(vec1, vec2)  # Combine vectors column-wise
rbind(vec1, vec2)  # Combine vectors row-wise

     vec1 vec2
[1,]    1    4
[2,]    2    5
[3,]    3    6

     [,1] [,2] [,3]
vec1    1    2    3
vec2    4    5    6


### **Conclusion:**

- **Vectors** are one-dimensional arrays holding elements of the same type.
- **Lists** can hold different types of objects, including vectors, matrices, and other lists.
- **Matrices** are two-dimensional arrays that contain elements of the same type.
- **Data Frames** are two-dimensional tables where each column can hold different data types (most common structure for datasets).
-