# Variables

## 1. Numeric variables
Numeric variables in R are used to store numerical data, including both integers and floating-point numbers. They are fundamental for performing calculations, statistical analyses, and mathematical operations. 

**Defining Numeric Variables:**
You can define numeric variables in R by simply assigning a numerical value to a variable name. Here's an example related to medicine:

```R
# Store a patient's age in years as a numeric variable
patient_age <- 45
```
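
As a small illustration of the two storage types mentioned above (integers and floating-point numbers), the sketch below checks how R stores a plain number versus one written with the `L` suffix. The variable name `patient_age_int` is made up for this example.

```R
# A plain number is stored as a double-precision value
patient_age <- 45
class(patient_age)        # "numeric"
typeof(patient_age)       # "double"

# The L suffix requests integer storage (hypothetical variable name)
patient_age_int <- 45L
class(patient_age_int)    # "integer"
typeof(patient_age_int)   # "integer"

# Both count as numeric for calculations
is.numeric(patient_age)      # TRUE
is.numeric(patient_age_int)  # TRUE
```

In everyday analysis the distinction rarely matters, because R converts integers to doubles automatically when a calculation needs it.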

**Operations with Numeric Variables:**
Numeric variables can be used in various mathematical operations, such as addition, subtraction, multiplication, and division. For instance, you can calculate a patient's body mass index (BMI) using numeric variables:

```R
# Calculate BMI for a patient
patient_weight_kg <- 70
patient_height_m <- 1.75
bmi <- patient_weight_kg / (patient_height_m^2)
```

**Practical Notes:**

1. **Units of Measurement:** Always be mindful of the units of measurement when working with numeric variables in the context of biology and medicine. Ensure that your data is consistent with the appropriate units (e.g., kilograms, centimeters, milligrams, etc.) to avoid errors in calculations.

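2. **Data Validation:** Validate numeric data to prevent errors. For instance, check that patient ages are within a reasonable range (e.g., no negative ages), and ensure that weight and height values are positive: zero or negative values are not physically meaningful, and a zero height makes the BMI formula return `Inf`.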

3. **Missing Values:** Numeric variables can have missing or undefined values. In R, missing values are represented as `NA`. Handle missing data appropriately in your analyses.

4. **Data Transformation:** Numeric variables often require transformation for analysis. For example, you might need to convert measurements from different units or logarithmically transform skewed data.

5. **Data Visualization:** Use data visualization tools (e.g., ggplot2) to explore and visualize numeric data, such as creating histograms to understand the distribution of patient ages or scatterplots to examine relationships between variables.

6. **Statistical Analysis:** Numeric variables are crucial for statistical analyses in biology and medicine. R offers a wide range of statistical functions and packages for conducting hypothesis tests, regression analyses, and more.

7. **Data Privacy:** Be mindful of patient data privacy and confidentiality when working with numeric variables in medical contexts. Follow relevant data protection regulations and guidelines.

8. **Reproducibility:** Document your code and analysis steps to ensure reproducibility. It's essential to maintain transparency and allow others to verify your results.

Numeric variables are at the core of quantitative analysis in biology and medicine. By understanding how to work with them effectively and responsibly, you can gain valuable insights from your data and contribute to research and healthcare practices.
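
Several of the notes above (units, validation, missing values, transformation) can be sketched in a few lines. This is a minimal illustration with made-up measurements, not a complete data-cleaning routine; the vectors `weight_kg` and `height_cm` are assumptions for the example.

```R
# Hypothetical measurements; NA marks a missing height
weight_kg <- c(70, 82.5, 59)
height_cm <- c(175, NA, 162)

# Units: convert centimeters to meters before applying the BMI formula
height_m <- height_cm / 100

# Validation: flag records that are complete and strictly positive
valid <- !is.na(weight_kg) & !is.na(height_m) & weight_kg > 0 & height_m > 0
valid                      # TRUE FALSE TRUE

# BMI is NA wherever an input is missing
bmi <- weight_kg / height_m^2
round(bmi, 1)              # 22.9 NA 22.5

# Missing values: na.rm = TRUE drops NA before summarizing
mean(bmi, na.rm = TRUE)

# Transformation: log-transform a positive variable
log(weight_kg)
```

Without `na.rm = TRUE`, `mean(bmi)` would return `NA`, which is R's way of signalling that the result is unknown when any input is unknown.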

**Exercise 1: Calculate BMI**

Calculate the BMI (Body Mass Index) for a patient given their weight in kilograms and height in meters. The formula for BMI is $$\text{BMI} = \frac{\text{weight (kg)}}{\left(\text{height (m)}\right)^2}$$

In [1]:
# Exercise 1: Calculate BMI
# Given patient's weight and height
patient_weight_kg <- 75
patient_height_m <- 1.80

# Calculate BMI
bmi <- patient_weight_kg / (patient_height_m^2)

# Display the BMI
bmi

**Exercise 2: Convert Temperature**

Convert a temperature from Celsius to Fahrenheit using the formula: $$\text{Fahrenheit} = \text{Celsius} \times \frac{9}{5} + 32$$

In [2]:
# Exercise 2: Convert Temperature
# Given temperature in Celsius
temperature_celsius <- 20

# Convert to Fahrenheit
temperature_fahrenheit <- (temperature_celsius * 9/5) + 32

# Display the temperature in Fahrenheit
temperature_fahrenheit

**Exercise 3: Calculate Dosage**

Calculate the dosage of a medication for a patient based on their weight in kilograms and the recommended dosage in milligrams per kilogram (mg/kg).

In [3]:
# Exercise 3: Calculate Dosage
# Given patient's weight and recommended dosage
patient_weight_kg <- 60
recommended_dosage_mg_kg <- 10

# Calculate the medication dosage
medication_dosage_mg <- patient_weight_kg * recommended_dosage_mg_kg

# Display the medication dosage
medication_dosage_mg

**Exercise 4: Calculate Mean and Standard Deviation**

Calculate the mean and standard deviation of a dataset representing the cholesterol levels (in mg/dL) of a group of patients.

In [5]:
# Exercise 4: Calculate Mean and Standard Deviation
# Given cholesterol levels of patients
cholesterol_levels <- c(180, 210, 190, 220, 200, 195, 205, 215, 185, 225)

# Calculate the mean
mean_cholesterol <- mean(cholesterol_levels)

# Calculate the standard deviation
sd_cholesterol <- sd(cholesterol_levels)

# Display the mean and standard deviation
mean_cholesterol
sd_cholesterol
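
To make clear what `mean()` and `sd()` compute, the following sketch reproduces both by hand on the same cholesterol data. It assumes the sample standard deviation (dividing by n - 1), which is what `sd()` uses; the names `manual_mean` and `manual_sd` are introduced only for this example.

```R
cholesterol_levels <- c(180, 210, 190, 220, 200, 195, 205, 215, 185, 225)

n <- length(cholesterol_levels)

# Mean: sum of the values divided by their count
manual_mean <- sum(cholesterol_levels) / n

# Sample standard deviation: squared deviations divided by n - 1
manual_sd <- sqrt(sum((cholesterol_levels - manual_mean)^2) / (n - 1))

manual_mean                                    # 202.5
all.equal(manual_sd, sd(cholesterol_levels))   # TRUE
```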

## 2. Character variables
Character variables in R are used to store text or string data. They are valuable for representing information that consists of words, letters, or other textual characters. In the context of bioinformatics, biology, and medicine, character variables can be used to store various types of textual information. Here's how to work with character variables in R, along with some examples:

**Creating Character Variables:**

You can create character variables in R by assigning text enclosed in either single (' ') or double (" ") quotes to a variable name.

In [20]:
# Creating character variables
gene_name <- "BRCA1"
gene_name

In [21]:
patient_name <- "John Smith"
patient_name

In [22]:
sequence <- "ATCGCTAGTGGCTA"
sequence

**Concatenation:**

You can concatenate (combine) character variables or strings using the `paste()` function.

In [23]:
# Concatenating character variables
full_name <- paste("John", "Smith")
full_name

In [24]:
sequence_info <- paste("Gene:", gene_name, "Sequence:", sequence)
sequence_info
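
`paste()` inserts a space between its arguments by default. The sketch below shows how the `sep` argument and the related `paste0()` function change that; the labels used here (`"patient"`, `"001"`, `"_exon"`) are made up for illustration.

```R
gene_name <- "BRCA1"

# Default: arguments are joined with a single space
paste("Gene:", gene_name)              # "Gene: BRCA1"

# sep sets a different separator
paste("patient", "001", sep = "_")     # "patient_001"

# paste0() joins with no separator at all
paste0(gene_name, "_exon", 2)          # "BRCA1_exon2"
```

Numbers are converted to text automatically, which is why `2` can be passed directly.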

**Subsetting and Manipulation:**
* **Subsetting Character Variables:**

The `substr()` function in R extracts a substring from each element of a character vector. You specify the positions of the first and last characters to extract (not a length). Here's the syntax for `substr()`:

```R
substr(x, start, stop)
```

- `x`: The input character vector or string from which you want to extract a substring.
- `start`: The position (index) at which you want to start extracting characters. It can be an integer or a vector of integers indicating the starting positions.
- `stop`: The position (index) at which you want to stop extracting characters. It can be an integer or a vector of integers indicating the stopping positions.

**Note:** 
- `substr()` extracts the characters from the `start` position up to and including the `stop` position.
- Positions are counted from 1, so `start = 1` refers to the first character.
- The result always has the same length as `x`. If `x` is a single string and `start` or `stop` is a vector, only the first `start`/`stop` pair is used.
- If `x` holds several strings, `start` and `stop` are recycled along `x`, so each string is paired with its own positions.
- To cut several pieces out of one string, use `substring()`, which recycles all of its arguments to the length of the longest one.

**Examples:**

```R
# Extract the first character from a string
substr("Hello", 1, 1)  # Returns "H"

# Extract a range of characters from a string
substr("Hello, World!", 1, 5)  # Returns "Hello"

# Extract characters starting from position 3 to the end
substr("Data Science", 3, nchar("Data Science"))  # Returns "ta Science"
```

`substr()` is commonly used for extracting specific portions of text data, such as substrings within a larger text, gene sequences, or other forms of textual data.
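
The difference between `substr()` and the closely related `substring()` shows up when `start` and `stop` are vectors. The sketch below splits the beginning of a short DNA sequence into three-letter pieces; the variable `codons` is a name chosen for this example.

```R
sequence <- "ATCGCTAGTGGCTA"

# substr() returns one result per element of x, so with a single string
# only the first start/stop pair is used
substr(sequence, c(1, 4, 7), c(3, 6, 9))        # "ATC"

# substring() recycles over start/stop and returns one piece per pair
codons <- substring(sequence, c(1, 4, 7), c(3, 6, 9))
codons                                           # "ATC" "GCT" "AGT"

# With a vector of strings, substr() works element by element
substr(c("BRCA1", "TP53", "EGFR"), 1, 2)         # "BR" "TP" "EG"
```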

In [38]:
# Sample character variable
gene_name <- "BRCA1"

# Accessing specific characters (e.g., first character)
first_character <- substr(gene_name, 1, 1)
first_character

In [40]:
second_character <- substr(gene_name, 2, 2)
second_character

In [41]:
# Accessing a range of characters
characters_2_to_4 <- substr(gene_name, 2, 4)
characters_2_to_4

In [18]:
# Manipulating character variables
uppercase_name <- toupper(patient_name)
uppercase_name

* **Changing Case:**
You can change the case of characters within a string using functions like `toupper()` (convert to uppercase) and `tolower()` (convert to lowercase).

In [43]:
# Converting to uppercase
upper_gene_name <- toupper(gene_name)
upper_gene_name

In [44]:
# Converting to lowercase
lower_gene_name <- tolower(gene_name)
lower_gene_name

* **Removing Whitespace:** You can remove leading and trailing whitespace (spaces, tabs, etc.) from a string using the `trimws()` function.

In [45]:
# Removing leading and trailing whitespace
text <- "  Hello, World!   "
cleaned_text <- trimws(text)  
cleaned_text

* **Substitution:** You can replace specific characters or substrings within a string using functions like `gsub()` (global substitution).

In [47]:
# Substituting characters
text <- "apple banana apple cherry"
replaced_text <- gsub("apple", "fruit", text)  
replaced_text
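
Two details of substitution are easy to miss: `sub()` replaces only the first match while `gsub()` replaces all of them, and the pattern is interpreted as a regular expression unless `fixed = TRUE` is set. A short sketch (the version string `"v1.2.3"` is just an example value):

```R
text <- "apple banana apple cherry"

# sub() replaces only the first match
sub("apple", "fruit", text)     # "fruit banana apple cherry"

# gsub() replaces every match
gsub("apple", "fruit", text)    # "fruit banana fruit cherry"

# fixed = TRUE treats the pattern as literal text, which matters for
# characters such as "." that have a special meaning in regular expressions
gsub(".", "-", "v1.2.3", fixed = TRUE)   # "v1-2-3"
```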

* **Splitting and Joining:**
You can split a string into multiple substrings using the `strsplit()` function, which returns a list with one element per input string, and join multiple strings into one using `paste()` with the `collapse` argument.

In [48]:
# Splitting a string
text <- "apple,banana,cherry"
split_text <- strsplit(text, ",")
split_text

In [49]:
# Joining strings
words <- c("apple", "banana", "cherry")
joined_text <- paste(words, collapse = ",")
joined_text
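
Because `strsplit()` returns a list, an extra step is needed to get a plain character vector. The sketch below shows that step and a round trip back to the original string; `parts` is a name chosen for this example.

```R
text <- "apple,banana,cherry"

# strsplit() returns a list with one element per input string
split_text <- strsplit(text, ",")
class(split_text)         # "list"

# Take the first list element to get a plain character vector
parts <- split_text[[1]]
parts                     # "apple" "banana" "cherry"
length(parts)             # 3

# Joining with the same delimiter restores the original string
identical(paste(parts, collapse = ","), text)   # TRUE
```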

R provides packages like *stringr* and *stringi* that offer extensive functions for advanced string manipulation and regular expressions.

**String Length:**
To find the length of a character variable (the number of characters in the string), you can use the `nchar()` function.

In [51]:
patient_name

In [52]:
# Finding the length of a character variable
name_length <- nchar(patient_name)
name_length

**Comparison:**
You can compare character variables using operators like `==` (equal to) or `!=` (not equal to) to check if two strings are the same.

In [53]:
# Comparing character variables
is_john <- patient_name == "John Smith"
is_john
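
String comparison with `==` is case-sensitive. The following sketch shows one common way to compare without regard to case, and how `%in%` checks a value against several candidates. The names in the candidate list are invented for the example.

```R
patient_name <- "John Smith"

# == is case-sensitive
patient_name == "john smith"                     # FALSE

# Normalize case on both sides for a case-insensitive comparison
tolower(patient_name) == tolower("JOHN SMITH")   # TRUE

# %in% checks membership in a set of strings
patient_name %in% c("Jane Doe", "John Smith")    # TRUE
```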

**String Manipulation Functions:**

R provides a wide range of string manipulation functions, both in base R (for example `gsub()`) and in packages like `stringr`. These functions allow you to perform tasks like pattern matching, substitution, and more.

In [55]:
# Using string manipulation functions
library(stringr)
first_name <- str_extract(patient_name, "[A-Z][a-z]+")
first_name

Here's a practical example of working with character variables in R in the context of bioinformatics. In this example, we'll manipulate gene names, extract specific information, and perform some string operations.

**Objective:** Given a list of gene names, we want to extract the initials of each gene name and count how many gene names start with each letter of the alphabet.


In [56]:
# Sample list of gene names
gene_names <- c("BRCA1", "TP53", "EGFR", "PTEN", "KRAS", "AKT1", "BRAF", "SMAD4", "VEGFA")

# Initialize a vector to count gene names starting with each letter
letter_counts <- numeric(26)

# Loop through each gene name
for (gene_name in gene_names) {
  # Extract the first letter of the gene name
  initial <- substr(gene_name, 1, 1)
  
  # Convert the initial to uppercase (if not already)
  initial <- toupper(initial)
  
  # Convert the initial to a numeric code (A=1, B=2, ..., Z=26)
  initial_code <- as.numeric(charToRaw(initial)) - as.numeric(charToRaw("A")) + 1
  
  # Increment the count for the corresponding letter
  letter_counts[initial_code] <- letter_counts[initial_code] + 1
}

# Display the counts for each letter
for (i in 1:26) {
  cat(paste0("Letter '", rawToChar(as.raw(i + as.numeric(charToRaw("A")) - 1)), "': ", letter_counts[i], "\n"))
}


Letter 'A': 1
Letter 'B': 2
Letter 'C': 0
Letter 'D': 0
Letter 'E': 1
Letter 'F': 0
Letter 'G': 0
Letter 'H': 0
Letter 'I': 0
Letter 'J': 0
Letter 'K': 1
Letter 'L': 0
Letter 'M': 0
Letter 'N': 0
Letter 'O': 0
Letter 'P': 1
Letter 'Q': 0
Letter 'R': 0
Letter 'S': 1
Letter 'T': 1
Letter 'U': 0
Letter 'V': 1
Letter 'W': 0
Letter 'X': 0
Letter 'Y': 0
Letter 'Z': 0


**Explanation:**
- We start with a list of gene names stored in the `gene_names` vector.
- We initialize a numeric vector called `letter_counts` with 26 elements, one for each letter of the alphabet.
- We loop through each gene name, extract the first letter, convert it to uppercase (to handle case insensitivity), and then convert it to a numeric code based on its position in the alphabet.
- We increment the count for the corresponding letter in the `letter_counts` vector.
- Finally, we display the counts for each letter of the alphabet.

This practical example demonstrates how character variables can be used to manipulate and extract information from gene names, which is a common task in bioinformatics and biology.
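
The same counts can also be obtained without an explicit loop. The sketch below is an alternative to the loop above, not a replacement for it: it relies on `substr()` being vectorized and on `table()` applied to a factor whose levels are the built-in `LETTERS` vector, so that letters with no genes still appear with a count of zero.

```R
gene_names <- c("BRCA1", "TP53", "EGFR", "PTEN", "KRAS", "AKT1", "BRAF", "SMAD4", "VEGFA")

# First letter of every gene name, in one vectorized call
initials <- toupper(substr(gene_names, 1, 1))

# A factor with levels A..Z makes table() report zero counts too
letter_counts <- table(factor(initials, levels = LETTERS))

letter_counts[["B"]]                       # 2

# Letters that occur at least once
names(letter_counts)[letter_counts > 0]
```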

### Inspecting Variables in R

In R, you can identify the type, size, and memory usage of a variable using various functions and techniques. Here's how you can do it:

**1. Variable Type:**
   - To identify the type of a variable, you can use the `class()` function. It returns the class or data type of the object.

   ```R
   # Check the data type of a variable
   x <- 5
   class(x)  # Returns "numeric"

   y <- "Hello, World!"
   class(y)  # Returns "character"
   ```

**2. Variable Size:**
   - To determine the size of a variable in terms of memory usage, you can use the `object.size()` function. It returns the size of the object in bytes.

   ```R
   # Check the memory size of a variable
   z <- c(1, 2, 3, 4, 5)
   object.size(z)  # Returns the size in bytes
   ```

   Keep in mind that the size reported by `object.size()` includes the memory used not only by the variable itself but also by its associated attributes.

**3. Memory Identification:**
   - To identify the memory location (address) of a variable, you can use the `pryr::address()` function from the `pryr` package.

   ```R
   # Install and load the pryr package
   install.packages("pryr")
   library(pryr)

   # Get the memory address of a variable
   memory_address <- address(z)
   ```

   Note that the memory address is a hexadecimal value that represents the location of the object in memory. It can be useful for advanced debugging and profiling but is not commonly needed for routine programming tasks.

These functions allow you to inspect and gather information about the type, size, and memory identification of variables in R. Understanding the memory usage of variables can be important for optimizing your code, especially when dealing with large datasets or objects.
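
A few other base R functions are often used alongside `class()` and `object.size()` when inspecting a variable. The sketch below is a brief tour under the assumption that only base R is available; it reuses the example values `x`, `y`, and `z` from above.

```R
x <- 5
y <- "Hello, World!"
z <- c(1, 2, 3, 4, 5)

# class() gives the high-level type; typeof() gives the storage type
class(x)     # "numeric"
typeof(x)    # "double"
typeof(y)    # "character"

# length() counts elements, nchar() counts characters in each string
length(z)    # 5
length(y)    # 1
nchar(y)     # 13

# str() prints a compact summary of type, length, and values
str(z)

# format() renders an object.size() result in a chosen unit
format(object.size(z), units = "B")
```

Note that `length(y)` is 1 because `y` is a vector holding a single string; the number of characters comes from `nchar()`.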

In [57]:
patient_name

In [58]:
class(patient_name)

In [59]:
object.size(patient_name)

120 bytes

### Why Single Characters in R Consume 112 Bytes: Understanding Memory Overhead
In R, it may seem counterintuitive that a one-character string is reported as 112 bytes. Almost all of that is fixed overhead from R's internal data structures rather than the text itself. On a typical 64-bit build of R, the figure breaks down roughly as follows:

1. **Vector header:** Every R vector starts with a fixed-size header that records its type, length, attributes, and garbage-collection bookkeeping. The header takes about 48 bytes, which is why `object.size(character(0))` reports 48 bytes.

2. **Pointer to the string:** A character vector does not hold text directly. Each element is an 8-byte pointer to a separate string object, so a length-one character vector takes 48 + 8 = 56 bytes before any text is counted.

3. **The string object:** The string is itself a small vector with its own header plus the bytes of the text. ASCII characters take one byte each (not four), and a terminating byte is added. For strings of up to 7 characters this comes to another 56 bytes, giving 56 + 56 = 112 bytes in total.

4. **Allocation in size classes:** R allocates small vectors in a few fixed size classes, so the reported size grows in steps rather than one byte at a time. In the outputs below, strings of 1 to 7 characters all report 112 bytes, strings of 8 to 15 characters report 120 bytes, and a 16-character string reports 136 bytes.

Here's an example to illustrate the memory usage of single characters in R:

```R
# Create a single character
char_var <- "A"

# Check the memory size of the character variable
size_in_bytes <- object.size(char_var)
cat("Size in bytes:", size_in_bytes, "\n")
```

The reported memory usage is typically 112 bytes for a single character on a 64-bit build of R.

While it may seem inefficient, this overhead is a fixed cost per object, so it matters little in practice: a long character vector pays the vector header only once, and R keeps a global pool of strings so that identical strings are stored only once. However, if you need to work with a very large amount of text in a memory-intensive application, you might consider more memory-efficient representations or data structures.


In [61]:
object.size("A")

112 bytes

In [62]:
object.size("AB")

112 bytes

In [63]:
object.size("ABC")

112 bytes

In [68]:
object.size("ABCDEFG")

112 bytes

In [69]:
object.size("ABCDEFGH")

120 bytes

In [74]:
object.size("ABCDEFGHIJKLMNO")

120 bytes

In [82]:
object.size("ABCDEFGHIJKLMNOQ")

136 bytes
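
Rather than trying string lengths one at a time, the step pattern can be tabulated in one go. The sketch below builds strings of 1 to 20 characters from the built-in `LETTERS` vector and records the reported size of each. The exact byte counts are an assumption about typical 64-bit builds and may differ on other platforms or R versions.

```R
# Build strings of 1 to 20 characters from the built-in LETTERS vector
lengths_to_try <- 1:20
strings <- vapply(lengths_to_try,
                  function(n) paste(LETTERS[seq_len(n)], collapse = ""),
                  character(1))

# Reported size of each single-string vector, in bytes
sizes <- vapply(strings, function(s) as.numeric(object.size(s)), numeric(1))

# One row per string length
data.frame(n_chars = nchar(strings), bytes = unname(sizes))
```

The `bytes` column stays constant over a run of lengths and then jumps, which reflects memory being allocated in fixed-size steps.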

In [96]:
object.size(1)

56 bytes

In [83]:
class("ABCDEFGHIJKLMNOQ")

### Understanding Object Size in R for Numeric Values: Fixed at 56 Bytes

The object size in R for a single numeric value is typically 56 bytes, regardless of the magnitude of the number. The size depends on how many values a vector holds, not on how large those values are.

In R, numeric values are stored as double-precision floating-point numbers (commonly referred to as "doubles"), and a single number is simply a numeric vector of length one. A double requires 8 bytes of memory. The reported object size is larger than 8 bytes because every vector also carries a fixed-size header.

The 56 bytes for a single numeric value include:

1. The actual numeric value, which is stored as a double-precision floating-point number (8 bytes).
2. The vector header, which records the type, length, and attributes of the object (about 48 bytes on a typical 64-bit build).

These figures are an implementation detail of R and may differ between versions and platforms.

Whether the value is small or large, the reported object size will typically remain at 56 bytes. Longer numeric vectors add roughly 8 bytes per element on top of the header. If you need to optimize memory usage for very large datasets of numeric values, you may consider using specialized data structures or packages designed for that purpose.
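
To see that the reported size follows the number of elements rather than the magnitude of the values, the sketch below compares single values of very different size and then two vectors of different length. The names `small` and `large` are chosen for this example, and the per-element figure assumes a typical 64-bit build.

```R
# One double has the same reported size whatever its magnitude
object.size(1) == object.size(1e300)     # TRUE

# Size grows with the number of elements
small <- numeric(100)
large <- numeric(1000)
object.size(small)
object.size(large)

# Approximate cost per additional element, in bytes
(as.numeric(object.size(large)) - as.numeric(object.size(small))) / 900
```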

In [97]:
object.size(10)

56 bytes

In [98]:
object.size(10000)

56 bytes

In [99]:
object.size(1000000)

56 bytes

In [104]:
object.size(10^10000000000000000000)  # the result overflows to Inf, which is still a single double

56 bytes

In [84]:
class(2020)

In [85]:
class("2020")

In [89]:
class(as.character(2020))

In [93]:
as.character(2020)

In [92]:
as.integer(as.character(2020))

In [94]:
as.numeric("42.78")

In [95]:
as.character(42.78)
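
The conversions above all succeed. The sketch below adds a few cases that behave less obviously, such as text that is not a number and the truncation performed by `as.integer()`. The input `"forty-two"` and the name `bad` are made up for the example.

```R
# Text that looks like a number converts cleanly
as.numeric("42.78")                       # 42.78

# Text that is not a number becomes NA (R also emits a warning,
# suppressed here to keep the output clean)
bad <- suppressWarnings(as.numeric("forty-two"))
is.na(bad)                                # TRUE

# as.integer() truncates toward zero instead of rounding
as.integer("42.78")                       # 42
as.integer(-3.9)                          # -3

# Converting to character and back preserves this value
identical(as.numeric(as.character(2020)), 2020)   # TRUE
```

Checking for `NA` after a conversion is a simple way to catch entries that were not valid numbers.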

## Methods and Functions for Character and Numeric Variables in R

The table below lists common functions and methods for character and numeric variables in R, with a short description of each:

| Operation                          | Description                                        | Character Variables   | Numeric Variables    |
|------------------------------------|----------------------------------------------------|-----------------------|-----------------------|
| **Conversion**                     | Change variable type                              | `as.character()`: Convert to character | `as.numeric()`: Convert to numeric |
| **Character Length**               | Determine string length                           | `nchar()`: Compute the number of characters in a string | -                     |
| **Concatenation**                  | Combine strings                                   | `paste()`: Concatenate strings | -                     |
| **Subsetting**                     | Extract part of a string                         | `substr()`: Extract substrings | -                     |
| **Changing Case**                  | Convert to uppercase or lowercase                | `toupper()`, `tolower()`: Change case | -                   |
| **Pattern Matching**               | Find patterns in strings                         | `grep()`: Match patterns in character vectors | -                     |
| **Substitution**                   | Replace characters or patterns                    | `gsub()`: Global substitution in strings | -                     |
| **Splitting**                      | Split strings into substrings                    | `strsplit()`: Split strings based on a delimiter | -                     |
| **Trimming**                       | Remove leading/trailing whitespace                | `trimws()`: Trim leading/trailing whitespace | -                     |
| **Sorting**                        | Sort vectors                                     | `sort()`: Sort character vectors | `sort()`: Sort numeric vectors |
| **Unique Values**                  | Find unique values                                | `unique()`: Find unique elements in a vector | `unique()`: Find unique elements in a vector |
| **Frequency Count**                | Count occurrences of values                      | `table()`: Generate frequency tables | `table()`: Generate frequency tables |
| **Regular Expressions**            | Pattern matching with regular expressions           | `grep()`, `grepl()`, `sub()`, `gsub()`, `regexpr()`, `regexec()`, `regmatches()`: Advanced pattern matching | - |
| **String Manipulation (stringr)**  | Advanced string manipulation and pattern matching   | `stringr` package (e.g., `str_extract()`, `str_replace()`, `str_split()`, `str_sub()`, etc.): Enhanced string manipulation | - |
| **String Conversion (sprintf)**   | Format and convert variables to strings            | `sprintf()`: Format strings | -                     |
| **Character Encoding**             | Convert between different character encodings      | `iconv()`: Character encoding conversion | -                     |
| **String Comparison**              | Compare strings                                   | `==`, `!=`, `grepl()`, etc.: String comparison | -                   |
| **Character Vectorization**        | Apply a function to each element of a vector       | `sapply()`, `lapply()`, `vapply()`, etc.: Apply functions element-wise | -               |
| **Date and Time Manipulation**     | Working with date and time data                   | `as.Date()`, `format()`, `difftime()`, etc.: Date and time operations | -               |
| **Statistical Tests**              | Hypothesis testing and statistical analysis      | -                   | `t.test()`, `cor.test()`, `chisq.test()`, etc.: Statistical tests |
| **Descriptive Statistics**         | Calculate summary statistics                      | -                   | `summary()`, `quantile()`, `range()`, etc.: Descriptive statistics |
| **Data Transformation (dplyr)**    | Data manipulation using the `dplyr` package       | -                   | `mutate()`, `filter()`, `group_by()`, etc.: Data transformation |
| **Data Aggregation (dplyr)**       | Aggregate data using the `dplyr` package         | -                   | `summarize()`, `count()` (dplyr), `aggregate()` (base R), etc.: Data aggregation |
| **Data Reshaping (reshape2)**      | Reshape data frames                              | -                   | `melt()`, `dcast()`, etc.: Data reshaping |
| **Time Series Analysis (xts)**     | Handling time series data                        | -                   | Time series functions (e.g., `period.apply()`): Time series analysis |
| **Matrix Operations**              | Working with matrices                            | -                   | Matrix algebra, eigenvalues, etc.: Matrix operations |
| **Plotting (base and ggplot2)**    | Data visualization                               | -                   | Plotting functions (e.g., `plot()`, `ggplot()`, etc.): Data visualization |
| **File I/O**                       | Reading and writing data files                   | `read.table()`, `write.table()`, etc.: File I/O | `read.table()`, `write.table()`, etc.: File I/O |

This table provides a comprehensive overview of functions and methods for working with character and numeric variables in R, along with brief descriptions of their purposes and functionalities.
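
As a brief illustration of a few base R entries from the table, the sketch below applies `grepl()`, `sprintf()`, `range()`, and `quantile()` to small made-up vectors. The data values are assumptions chosen only to keep the output easy to check.

```R
gene_names <- c("BRCA1", "TP53", "BRAF")
cholesterol_levels <- c(180, 210, 190, 220, 200)

# grepl() returns TRUE for each element that matches the pattern
grepl("^BR", gene_names)                  # TRUE FALSE TRUE

# sprintf() formats numbers into text with a fixed number of decimals
sprintf("Mean cholesterol: %.1f mg/dL", mean(cholesterol_levels))
# "Mean cholesterol: 200.0 mg/dL"

# range() and quantile() summarize a numeric vector
range(cholesterol_levels)                 # 180 220
quantile(cholesterol_levels, 0.5)         # the median, 200
```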