### **1. Setting and Checking the Working Directory**

R uses a *working directory*: the folder against which relative file paths are resolved whenever R reads or writes a file. Checking and setting it explicitly keeps file operations predictable.

#### **Check Current Working Directory**

To check the current working directory:

In [9]:
getwd()

#### **Set Working Directory**

You can change the working directory to any folder path:

In [10]:
# Set working directory
setwd("C:/Users/admin/Desktop/GitHub/R-Learn/new_folder_name")

In [11]:
getwd()

In [12]:
setwd("C:/Users/admin/Desktop/GitHub/R-Learn")

In [13]:
getwd()

Replace the paths above with the path of your own directory. On Windows, write paths with forward slashes (`/`) or escaped backslashes (`\\`). You can also use relative paths, such as `setwd("..")` to move up one level in the directory tree.
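
A common pattern is to change the working directory only temporarily and then restore it. The sketch below assumes nothing beyond base R; it uses the session's temporary folder (`tempdir()`) purely as a stand-in target, and the variable name `old_wd` is illustrative.

```r
# setwd() invisibly returns the directory that was active before the change,
# so it can be captured and restored later.
old_wd <- setwd(tempdir())

print(getwd())   # now inside the session's temporary folder

# Restore the original working directory
setwd(old_wd)
print(getwd())
```

Capturing the return value of `setwd()` avoids hard-coding the original path a second time.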

---

### **2. Listing Files in a Directory**

You can list all files and folders in the current working directory, or in any specified directory.

In [14]:
# List files in the current working directory
list.files()

In [16]:
# List files in a specific directory
list.files("C:/Users/admin/Desktop/GitHub/R-Learn/new_folder_name")

#### **Advanced Listing with Options**

In [17]:
# List all files including hidden files
list.files(all.files = TRUE)

In [18]:
# List files with full paths
list.files(full.names = TRUE)
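
`list.files()` also accepts a `pattern` (a regular expression) and a `recursive` flag. The following is a minimal sketch that builds a throwaway folder under `tempdir()` so it can run anywhere; the folder and file names (`listing_demo`, `a.csv`, and so on) are made up for illustration.

```r
# Build a small demo tree: two files at the top level, one in a subfolder
demo_dir <- file.path(tempdir(), "listing_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.csv", "b.txt")),
            file.path(demo_dir, "sub", "c.csv"))

# Only .csv files in the top-level folder
csv_top <- list.files(demo_dir, pattern = "\\.csv$")
print(csv_top)   # "a.csv"

# .csv files in the folder and all subfolders
csv_all <- list.files(demo_dir, pattern = "\\.csv$", recursive = TRUE)
print(csv_all)   # "a.csv" "sub/c.csv"

# Clean up the demo tree
unlink(demo_dir, recursive = TRUE)
```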

### **3. Creating, Copying, Moving, and Deleting Files**

These operations allow you to create new directories, copy, move, and delete files from within R.

#### **Creating a Directory**

You can create a new directory:

In [19]:
dir.create("new_folder_name1")
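
`dir.create()` issues a warning if the folder already exists, and by default it does not create missing parent folders. A minimal sketch of a more defensive call, assuming a hypothetical nested layout (`project/data/raw`) placed under `tempdir()`:

```r
# Hypothetical nested folder; tempdir() keeps the example self-contained
nested <- file.path(tempdir(), "project", "data", "raw")

# Create the folder (and any missing parents) only if it is not there yet
if (!dir.exists(nested)) {
  dir.create(nested, recursive = TRUE)
}

created_ok <- dir.exists(nested)
print(created_ok)   # TRUE

# Clean up
unlink(file.path(tempdir(), "project"), recursive = TRUE)
```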

#### **Copying Files**

To copy files from one location to another:

In [20]:
# Copy a file
file.copy("data.csv", "new_folder_name/source_file.csv")

#### **Moving or Renaming Files**

To move or rename files, use `file.rename()`:

In [21]:

# Rename or move a file
file.rename("scatter_plot.png", "scatter_plot1.png")


"cannot rename file 'scatter_plot.png' to 'scatter_plot1.png', reason 'The system cannot find the file specified'"


#### **Deleting Files**

To delete a file:

In [22]:
file.remove("file_to_delete.txt")

"cannot remove file 'file_to_delete.txt', reason 'No such file or directory'"

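The warnings shown above appear because the files did not exist when `file.rename()` and `file.remove()` were called. One way to avoid them is to check with `file.exists()` first. The helper below, `safe_remove()`, is a hypothetical name introduced only for this sketch; it is not part of base R.

```r
# Hypothetical helper: delete a file only if it exists
safe_remove <- function(path) {
  if (!file.exists(path)) {
    message("Nothing to remove: ", path)
    return(FALSE)
  }
  file.remove(path)
}

# Demonstrate with a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines("demo", tmp)

first  <- safe_remove(tmp)   # file exists, so it is deleted
second <- safe_remove(tmp)   # file is gone, so only a message is shown
print(c(first, second))      # TRUE FALSE
```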

### **4. Reading and Writing Files**

R offers several functions to read from and write to files. Here’s an example with a `.csv` file.

#### **Reading a File**

In [12]:
data <- read.csv("modified_file.csv")
head(data)

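|   | Index | Living.Space..sq.ft. | Beds | Baths | Zip | Year | List.Price.... |
|---|-------|----------------------|------|-------|-----|------|----------------|
|   | `<int>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<int>` |
| 1 | 1 | 2222 | 3 | 3.5 | 32312 | 1981 | 250000 |
| 2 | 2 | 1628 | 3 | 2.0 | 32308 | 2009 | 185000 |
| 3 | 3 | 3824 | 5 | 4.0 | 32312 | 1954 | 399000 |
| 4 | 4 | 1137 | 3 | 2.0 | 32309 | 1993 | 150000 |
| 5 | 5 | 3560 | 6 | 4.0 | 32309 | 1973 | 315000 |
| 6 | 6 | 2893 | 4 | 3.0 | 32312 | 1994 | 699000 |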


#### **Writing to a File**

In [13]:

write.csv(data, "output_file.csv")
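
By default `write.csv()` also writes the row names as an unnamed first column, which shows up as a column called `X` when the file is read back. The sketch below illustrates the difference with a small made-up data frame and temporary files; the variable names are illustrative.

```r
# Small made-up data frame
df <- data.frame(Beds = c(3, 3, 5), Baths = c(3.5, 2, 4))

with_rows    <- tempfile(fileext = ".csv")
without_rows <- tempfile(fileext = ".csv")

write.csv(df, with_rows)                        # default: row names included
write.csv(df, without_rows, row.names = FALSE)  # row names dropped

names_with    <- names(read.csv(with_rows))
names_without <- names(read.csv(without_rows))

print(names_with)     # "X" "Beds" "Baths"
print(names_without)  # "Beds" "Baths"
```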


### **5. Calling Functions from External Scripts**

Sometimes, you might want to separate functions into different files to organize your code better. You can source these files and call their functions in your main R script.

#### **Example: Sourcing an External Script**

Suppose you have a file called `my_functions.R` in a folder. Here’s how you’d source it and call a function from it:

1. **Create `my_functions.R`**:

```R
# Inside my_functions.R
add_numbers <- function(a, b) {
    return(a + b)
}

sub_numbers <- function(a, b) {
    return(a - b)
}
```

2. **Source and Call the Function**:

In [None]:
# In your main R script
source("my_functions.R")

In [None]:
# Call the function from my_functions.R
result <- add_numbers(10, 20)
print(result) # Output should be 30

[1] 30


In [19]:
result <- sub_numbers(10, 20)
print(result) # Output should be -10

[1] -10
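
By default `source()` defines everything in the global environment. If you would rather keep sourced functions separate, you can pass an environment through the `local` argument. The sketch below writes a one-line script to a temporary file so it is self-contained; the names `script` and `helpers` are illustrative.

```r
# Write a tiny script to a temporary file (stand-in for my_functions.R)
script <- tempfile(fileext = ".R")
writeLines("add_numbers <- function(a, b) a + b", script)

# Source the script into its own environment instead of the global one
helpers <- new.env()
source(script, local = helpers)

# Call the function through the environment
result <- helpers$add_numbers(10, 20)
print(result)   # 30
```

This keeps helper functions from overwriting objects with the same name in your main script.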


---

### **6. Checking File Properties**

R also provides functions to check for the existence of files or directories, file size, and modification time.

#### **Check if a File Exists**

In [21]:
file.exists("my_functions.R")

#### **Get File Information**

In [22]:
# File size
file.info("my_functions.R")$size

In [24]:
# Modification time
file.info("my_functions.R")$mtime

[1] "2024-11-06 19:02:19 CET"
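
Base R also has the shortcuts `file.size()` and `file.mtime()`, which return the same values as the corresponding `file.info()` columns. A minimal sketch using a temporary file (the exact size and timestamp depend on your system, so no output is shown):

```r
# Create a small temporary file to inspect
tmp <- tempfile(fileext = ".txt")
writeLines("hello", tmp)

info <- file.info(tmp)

size_bytes <- file.size(tmp)    # same value as info$size
modified   <- file.mtime(tmp)   # same value as info$mtime
is_folder  <- info$isdir        # FALSE for a regular file

print(size_bytes)
print(modified)
print(is_folder)
```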



### **7. Practical Example of File and Directory Management in R**

Here’s an example workflow using multiple file system functions in R.


```R
# Set working directory
setwd("path/to/your/main/directory")

# Check current working directory
print(getwd())

# Create a new directory if it doesn't exist
if (!dir.exists("data_folder")) {
    dir.create("data_folder")
}

# Copy a file to the new directory
file.copy("source_file.csv", "data_folder/source_file.csv")

# List files in the new directory
list.files("data_folder")

# Source and use a function from an external script
# (assumes my_functions.R is in the working directory)
source("my_functions.R")
result <- add_numbers(5, 10)
print(result)

# Delete the copied file
file.remove("data_folder/source_file.csv")

# Remove the new directory
unlink("data_folder", recursive = TRUE)

```



### **8. Summary of Commands**

| Function                   | Purpose                                                      |
|----------------------------|--------------------------------------------------------------|
| `getwd()`                  | Get current working directory                                |
| `setwd("path")`            | Set working directory                                        |
| `list.files("path")`       | List files in a directory                                    |
| `dir.create("folder")`     | Create a new folder                                          |
| `file.copy("from", "to")`  | Copy a file                                                  |
| `file.rename("old", "new")`| Rename or move a file                                        |
| `file.remove("file")`      | Delete a file                                                |
| `file.exists("file")`      | Check if a file exists                                       |
| `file.info("file")`        | Get file information (size, modification time)               |
| `source("file.R")`         | Source an external R script and call its functions           |
| `unlink("folder", recursive = TRUE)` | Delete a directory and all its contents           |


### **1. Reading and Writing Files in R**

R provides several functions for reading from and writing to different types of files, especially those used in data analysis, such as CSV and text files.

#### **Reading Files**

1. **Reading CSV Files**  
   CSV (Comma-Separated Values) files are one of the most common file formats for data storage.

   ```r
   # Basic CSV reading
   data <- read.csv("path/to/your/file.csv")
   
   # Options
   data <- read.csv("file.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
   ```

   - `header`: Logical, if `TRUE`, the first row is treated as column names.
   - `sep`: Defines the delimiter (comma, tab, etc.).
   - `stringsAsFactors`: If `FALSE`, character columns stay as character vectors instead of being converted to factors. `FALSE` has been the default since R 4.0.0.

2. **Reading Excel Files**  
   Use the `readxl` package for reading Excel files.

   ```r
   # Install and load the package
   install.packages("readxl")
   library(readxl)
   
   # Read the Excel file
   data <- read_excel("path/to/your/file.xlsx", sheet = "Sheet1")
   ```

3. **Reading Text Files**  
   Use `read.table` for general text files.

   ```r
   data <- read.table("file.txt", header = TRUE, sep = "\t")
   ```

   - `sep` parameter can be customized based on delimiter (e.g., `sep = ","` for CSV, `sep = "\t"` for tab-separated files).

4. **Reading JSON Files**  
   JSON files can be read using the `jsonlite` package.

   ```r
   # Install and load the package
   install.packages("jsonlite")
   library(jsonlite)
   
   # Read JSON file
   data <- fromJSON("path/to/your/file.json")
   ```
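
To see how `header` and `sep` affect the result, here is a minimal sketch that writes a small semicolon-delimited file to a temporary location and reads it back with `read.table()`. The file contents and column names are made up for illustration.

```r
# Write a small semicolon-delimited file
path <- tempfile(fileext = ".txt")
writeLines(c("name;score", "Ana;9.5", "Ben;7"), path)

# header = TRUE uses the first line as column names; sep = ";" sets the delimiter
scores <- read.table(path, header = TRUE, sep = ";", stringsAsFactors = FALSE)

str(scores)          # 2 observations: name (chr), score (num)
print(names(scores)) # "name" "score"
```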

#### **Writing Files**

1. **Writing CSV Files**  
   `write.csv` is commonly used for writing data to a CSV file.

   ```r
   write.csv(data, "output_file.csv", row.names = FALSE)
   ```

2. **Writing Excel Files**  
   Use the `writexl` package to write data frames to Excel files.

   ```r
   # Install and load the package
   install.packages("writexl")
   library(writexl)
   
   # Write to Excel
   write_xlsx(data, "output_file.xlsx")
   ```

3. **Writing Text Files**  
   `write.table` writes data to a general text file.

   ```r
   write.table(data, "output_file.txt", sep = "\t", row.names = FALSE, col.names = TRUE)
   ```

4. **Writing JSON Files**  
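   To write JSON files, use `write_json()` from the `jsonlite` package. (`toJSON()` only returns the JSON as a string; it does not write to a file.)

   ```r
   # Write to JSON
   library(jsonlite)
   write_json(data, "output_file.json", pretty = TRUE, auto_unbox = TRUE)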
   ```

---

### **2. Cleaning Data in R**

Cleaning data often involves removing or transforming unwanted characters, handling missing values, and formatting data for analysis.

#### **Handling Missing Data**

1. **Detecting Missing Values**  
   `NA` is used to represent missing values in R.

   ```r
   # Identify missing values
   is.na(data)
   
   # Count missing values in each column
   colSums(is.na(data))
   ```

2. **Removing Missing Data**  
   You can remove rows with missing values using `na.omit()` from base R or `drop_na()` from the `tidyr` package.

   ```r
   # Remove rows with missing values
   data_cleaned <- na.omit(data)
   
   # Using tidyr
   library(tidyr)
   data_cleaned <- drop_na(data)
   ```

3. **Replacing Missing Values**  
   Replace missing values with a specific value, like the column mean.

   ```r
   # Replace NA in a specific column with the column mean
   data$column[is.na(data$column)] <- mean(data$column, na.rm = TRUE)
   ```
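
The three steps above can be tried on a small made-up data frame. This is only a sketch with illustrative column names (`id`, `price`), using base R functions.

```r
# Made-up data with two missing prices
df <- data.frame(id = 1:4, price = c(100, NA, 300, NA))

# Count missing values per column
na_counts <- colSums(is.na(df))
print(na_counts)        # id: 0, price: 2

# Option A: drop incomplete rows
complete <- na.omit(df)
print(nrow(complete))   # 2

# Option B: replace missing prices with the mean of the observed ones
df$price[is.na(df$price)] <- mean(df$price, na.rm = TRUE)
print(df$price)         # 100 200 300 200
```

Dropping rows loses data, while mean imputation keeps every row but reduces the variance of the column, so the choice depends on the analysis.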

#### **Removing Unwanted Characters**

1. **Removing Whitespace**  
   Use `trimws()` to remove leading and trailing whitespace.

   ```r
   data$column <- trimws(data$column)
   ```

2. **Removing Specific Characters**  
   Use `gsub()` for substitution.

   ```r
   # Remove all digits from a column
   data$column <- gsub("[0-9]", "", data$column)
   ```

3. **Converting Case**  
   Convert strings to uppercase or lowercase.

   ```r
   data$column <- tolower(data$column) # Lowercase
   data$column <- toupper(data$column) # Uppercase
   ```
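
These functions are vectorised, so they can be chained on a whole column at once. A minimal sketch on a made-up character vector:

```r
# Made-up messy values
raw <- c("  Apple 12 ", "BANANA7", " cherry  ")

# Remove digits, then trim whitespace, then convert to lowercase
clean <- tolower(trimws(gsub("[0-9]", "", raw)))
print(clean)   # "apple" "banana" "cherry"
```

Removing the digits before trimming matters here: stripping `12` from `"  Apple 12 "` leaves a trailing space that `trimws()` then cleans up.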

---

### **3. Character String Handling in R**

Character string manipulation is crucial for cleaning and transforming text data.

#### **Basic String Operations**

1. **Concatenating Strings**  
   Use `paste()` or `paste0()` to concatenate strings.

   ```r
   # Concatenate with space
   paste("Hello", "World") # Output: "Hello World"
   
   # Concatenate without space
   paste0("Hello", "World") # Output: "HelloWorld"
   ```

2. **Splitting Strings**  
   Use `strsplit()` to split a string into parts based on a separator.

   ```r
   # Split by space
   strsplit("Hello World", " ")
   ```

3. **Extracting Substrings**  
   Use `substr()` to extract parts of a string.

   ```r
   # Extract first 5 characters
   substr("HelloWorld", 1, 5) # Output: "Hello"
   ```
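
Note that `strsplit()` returns a list with one element per input string, not a plain vector. The sketch below shows how to work with that result; the input strings are made up.

```r
# Split two strings on spaces; the result is a list of character vectors
parts <- strsplit(c("Hello World", "a b c"), " ")

print(parts[[1]])      # "Hello" "World"
print(lengths(parts))  # 2 3

# Take the first piece of each split string
first_words <- vapply(parts, function(p) p[1], character(1))
print(first_words)     # "Hello" "a"
```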

#### **Pattern Matching and Replacement**

1. **Searching for Patterns**  
   Use `grep()` to get the positions of matching elements, or `grepl()` to get a logical vector (`TRUE`/`FALSE`) for every element.

   ```r
   # Find rows where a column contains "apple"
   rows <- grep("apple", data$column)
   ```

2. **Replacing Patterns**  
   Use `gsub()` to replace parts of a string.

   ```r
   # Replace "apple" with "orange"
   data$column <- gsub("apple", "orange", data$column)
   ```
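
The difference between `grep()` and `grepl()` is easiest to see side by side. A minimal sketch on a made-up vector; `value` and `ignore.case` are standard arguments of these functions.

```r
fruit <- c("apple pie", "banana split", "Apple tart")

idx     <- grep("apple", fruit)                       # positions of matches
flags   <- grepl("apple", fruit)                      # one logical per element
matches <- grep("apple", fruit, value = TRUE)         # the matching strings
any_case <- grepl("apple", fruit, ignore.case = TRUE) # case-insensitive match

print(idx)       # 1
print(flags)     # TRUE FALSE FALSE
print(matches)   # "apple pie"
print(any_case)  # TRUE FALSE TRUE
```

`grepl()` is usually the more convenient choice for filtering rows, for example `data[grepl("apple", data$column), ]`.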

### **4. Example Workflow**

Below is an example that combines reading, cleaning, and handling strings in R:

```r
# Step 1: Read data from a CSV file
data <- read.csv("data_file.csv", stringsAsFactors = FALSE)

# Step 2: Check for missing values and remove rows with NA
data <- na.omit(data)

# Step 3: Remove unwanted characters from a specific column
data$column <- gsub("[^a-zA-Z0-9 ]", "", data$column) # Keep only alphanumeric characters and spaces

# Step 4: Convert all text to lowercase
data$column <- tolower(data$column)

# Step 5: Save the cleaned data to a new CSV file
write.csv(data, "cleaned_data.csv", row.names = FALSE)
```

### **Summary**

| Task                                 | Function                    | Example                                                        |
|--------------------------------------|-----------------------------|----------------------------------------------------------------|
| **Read CSV**                         | `read.csv()`               | `data <- read.csv("file.csv")`                                 |
| **Write CSV**                        | `write.csv()`              | `write.csv(data, "file.csv")`                                  |
| **Check Missing Values**             | `is.na()`, `colSums()`     | `colSums(is.na(data))`                                         |
| **Remove Rows with NA**              | `na.omit()`                | `data <- na.omit(data)`                                        |
| **Replace NA with Mean**             | Subset and `mean()`        | `data$col[is.na(data$col)] <- mean(data$col, na.rm = TRUE)`    |
| **Remove Specific Characters**       | `gsub()`                   | `data$col <- gsub("[0-9]", "", data$col)`                      |
| **Concatenate Strings**              | `paste()`, `paste0()`      | `paste("Hello", "World")`                                      |
| **Split String**                     | `strsplit()`               | `strsplit("Hello World", " ")`                                 |
| **Extract Substring**                | `substr()`                 | `substr("HelloWorld", 1, 5)`                                   |
| **Find Pattern in Text**             | `grep()`, `grepl()`        | `grep("pattern", data$column)`                                 |
| **Replace Pattern in Text**          | `gsub()`                   | `data$col <- gsub("pattern", "replacement", data$col)`         |

The `sub()` function in R substitutes the first occurrence of a pattern in each element of a character vector. `gsub()`, by contrast, replaces *all* occurrences of the pattern in each element.

Here’s how `sub()` fits in with `gsub()` and other string manipulation functions:

---

### **Pattern Replacement with `sub()`**

- **Purpose**: Replace the first occurrence of a pattern in each element of a character vector.
- **Syntax**: `sub(pattern, replacement, x)`

  - `pattern`: A regular expression defining the pattern you want to match.
  - `replacement`: The string that will replace the matched pattern.
  - `x`: The character vector where the replacement is to be performed.
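
Because `pattern` is a regular expression, characters such as `.` have a special meaning. The sketch below shows the effect on made-up version strings, and two ways to match a literal dot: escaping it, or passing `fixed = TRUE`.

```r
versions <- c("v1.2.3", "v10.0.1")

# "." as a regex matches any character, so the leading "v" is replaced
as_regex <- sub(".", "-", versions)
print(as_regex)   # "-1.2.3" "-10.0.1"

# fixed = TRUE treats the pattern as a literal string
as_fixed <- sub(".", "-", versions, fixed = TRUE)
print(as_fixed)   # "v1-2.3" "v10-0.1"

# Escaping the dot gives the same result
escaped <- sub("\\.", "-", versions)
print(escaped)    # "v1-2.3" "v10-0.1"
```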

#### **Example with `sub()`**

In [23]:

# Original vector with some repeated words
text <- c("apple pie", "apple tart", "apple crisp")

# Replace only the first occurrence of "apple" in each element
result <- sub("apple", "orange", text)
print(result)
# Output: "orange pie" "orange tart" "orange crisp"


[1] "orange pie"   "orange tart"  "orange crisp"


In this example, `sub()` only replaces the first occurrence of "apple" in each string, so "apple pie" becomes "orange pie," but if "apple" appeared multiple times in any element, only the first would be replaced.

#### **Comparison with `gsub()`**

If we wanted to replace *all* occurrences of "apple" with "orange" in each element, we’d use `gsub()`:

In [24]:

result_all <- gsub("apple", "orange", text)
print(result_all)
# Output: "orange pie" "orange tart" "orange crisp"


[1] "orange pie"   "orange tart"  "orange crisp"


For this specific example, both `sub()` and `gsub()` return the same result because "apple" only appears once per element. However, in cases where multiple occurrences exist, `gsub()` would replace each instance.

---

### **Practical Usage of `sub()` and `gsub()` Together**

You might use `sub()` and `gsub()` in combination to achieve different types of substitutions. For example, you may want to change the first occurrence of a term in one field while changing all occurrences in another.
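
As a minimal sketch of that idea, the made-up data frame below has a `code` column where only the first separator should change and a `note` column where every placeholder should be replaced. The column names and values are illustrative only.

```r
records <- data.frame(
  code = c("ID-001-A", "ID-002-B"),
  note = c("n/a, n/a", "ok, n/a"),
  stringsAsFactors = FALSE
)

# Only the first "-" in each code becomes ":"
records$code <- sub("-", ":", records$code)

# Every "n/a" in each note becomes "missing"
records$note <- gsub("n/a", "missing", records$note)

print(records$code)   # "ID:001-A" "ID:002-B"
print(records$note)   # "missing, missing" "ok, missing"
```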

---

### **Example Workflow Using `sub()` in Data Cleaning**

In [26]:
# Example character vector with inconsistent text
text <- c("123-apple-456", "789-apple-apple", "apple-101")

In [27]:
# Using sub to replace only the first "apple" in each element
first_only <- sub("apple", "fruit", text)
print(first_only)
# Output: "123-fruit-456" "789-fruit-apple" "fruit-101"

[1] "123-fruit-456"   "789-fruit-apple" "fruit-101"      


In [28]:
# Using gsub to replace all occurrences of "apple" in each element
all_occurrences <- gsub("apple", "fruit", text)
print(all_occurrences)
# Output: "123-fruit-456" "789-fruit-fruit" "fruit-101"

[1] "123-fruit-456"   "789-fruit-fruit" "fruit-101"      


### **Summary of `sub()` vs `gsub()`**

| Function  | Description                               | Replaces                  | Example                                 |
|-----------|-------------------------------------------|---------------------------|-----------------------------------------|
| `sub()`   | Replaces the first occurrence of a pattern in each string element | First occurrence only     | `sub("pattern", "replacement", x)`      |
| `gsub()`  | Replaces all occurrences of a pattern in each string element     | Every occurrence          | `gsub("pattern", "replacement", x)`     |