This notebook renames the Non-Edited, Edited, and proportion columns of every sample by prefixing them with the sample name. The per-sample tables can then be merged into one dataframe while each column stays attributable to its sample.
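The merge itself happens outside this notebook. Below is a minimal sketch of what it might look like, assuming every `___Renamed.tsv` file has an `ID` column plus sample-prefixed count columns. The helper `merge_renamed_tables`, the toy tables, and the second sample name `Ctrl-02_bases` are hypothetical and used only for illustration. One detail matters: `read.table` defaults to `check.names = TRUE`, which would rewrite the `-` in prefixes such as `Ctrl-01` to `.`, so the sketch turns it off.

```r
# Hypothetical helper: full outer join of per-sample "___Renamed.tsv" tables on ID.
merge_renamed_tables <- function(paths) {
  tables <- lapply(paths, function(p) {
    # check.names = FALSE keeps column names like "Ctrl-01_bases_Edited_Count" intact
    read.table(p, header = TRUE, sep = "\t",
               stringsAsFactors = FALSE, check.names = FALSE)
  })
  # all = TRUE keeps sites seen in only one sample (missing counts become NA)
  Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), tables)
}

# Toy usage: two small hypothetical samples written to a temporary directory
tmp <- tempdir()
a <- data.frame(ID = c("1_5476", "1_5479"), x = c(0L, 31L))
names(a)[2] <- "Ctrl-01_bases_Edited_Count"
b <- data.frame(ID = c("1_5479", "1_5504"), x = c(5L, 2L))
names(b)[2] <- "Ctrl-02_bases_Edited_Count"

pa <- file.path(tmp, "a___Renamed.tsv")
pb <- file.path(tmp, "b___Renamed.tsv")
write.table(a, pa, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(b, pb, sep = "\t", quote = FALSE, row.names = FALSE)

merged <- merge_renamed_tables(c(pa, pb))
print(merged)
```

`Reduce` folds the pairwise `merge` over any number of files, so the same call works for all samples in the directory.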

In [1]:
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [2]:
input_directory_path <- '/mnt/vast/hpc/csg/hcs2152/ZFR_RNA_Editing/JACUSA2/all_dpf/all_dpf_singles/Edited'
output_directory_path <- '/mnt/vast/hpc/csg/hcs2152/ZFR_RNA_Editing/JACUSA2/all_dpf/all_dpf_singles/Edited'

In [3]:
# Get the full paths of all "___Edited_Column_Added.tsv" files in the input directory
# (the dot is escaped so it matches a literal "." only)
file_list <- list.files(input_directory_path, pattern = "___Edited_Column_Added\\.tsv$", full.names = TRUE)
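Because the input and output directories are the same, the pattern must pick up the input files but not the `___Renamed.tsv` files this notebook writes. A quick check with `grepl` on candidate names illustrates that. The backup and `_tsv` file names are hypothetical, chosen only to show the `$` anchor and the escaped dot at work.

```r
# Pattern used with list.files(); "\\." is a literal dot, "$" anchors at the end
pattern <- "___Edited_Column_Added\\.tsv$"

candidates <- c(
  "Ctrl-01_bases___Edited_Column_Added.tsv",            # input file: should match
  "Ctrl-01_bases___Edited_Column_Added___Renamed.tsv",  # output of this notebook: should not match
  "Ctrl-01_bases___Edited_Column_Added.tsv.bak"         # hypothetical backup: should not match
)
matches <- grepl(pattern, candidates)
print(matches)  # TRUE FALSE FALSE

# With an unescaped dot, "." matches any character, including "_"
loose <- "___Edited_Column_Added.tsv$"
print(grepl(loose, "x___Edited_Column_Added_tsv"))    # TRUE
print(grepl(pattern, "x___Edited_Column_Added_tsv"))  # FALSE
```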

In [4]:
# Function to process each file; output_directory defaults to the global output path
process_file <- function(file_path, output_directory = output_directory_path) {
  # Print the file path being processed
  cat("Processing file:", file_path, "\n")
  
  # Read the TSV file
  df <- read.table(file_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  
  # File name without directory and ".tsv" extension,
  # e.g. "Ctrl-01_bases___Edited_Column_Added"
  base_name <- tools::file_path_sans_ext(basename(file_path))
  
  cat("Head of the DataFrame for file:", base_name, "\n")
  print(head(df))
  
  # Sample prefix = the first two underscore-separated fields, e.g. "Ctrl-01_bases"
  parts <- unlist(strsplit(base_name, "_"))
  prefix <- paste(parts[1:2], collapse = "_")
  
  # Print the extracted prefix
  cat("Extracted Prefix for file:", base_name, "is", prefix, "\n")
  
  # Prepend the sample prefix to the count and proportion columns (2-4); ID is left unchanged
  colnames(df)[2:4] <- paste0(prefix, "_", colnames(df)[2:4])

  # Print the processed data frame
  cat("Processed DataFrame for file:", base_name, "\n")
  print(head(df))
  
  # Write the processed table to a new "___Renamed.tsv" file in the output directory
  output_file_path <- file.path(output_directory, paste0(base_name, "___Renamed.tsv"))
  write.table(df, file = output_file_path, sep = "\t", quote = FALSE, row.names = FALSE)
}
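The renaming step inside `process_file` can be tried in isolation on a toy data frame, with no file I/O. The values below are copied from the first two rows of the `Ctrl-01_bases` output in this notebook. The `sub()` variant is an alternative way to derive the prefix, not the notebook's method: it agrees with the `strsplit()` approach for names shaped like `Ctrl-01_bases___...`, but the two would differ if a sample name contained additional underscores.

```r
base_name <- "Ctrl-01_bases___Edited_Column_Added"

# strsplit() on "_" gives "Ctrl-01", "bases", "", "", "Edited", "Column", "Added";
# the first two fields form the sample prefix
parts <- strsplit(base_name, "_")[[1]]
prefix <- paste(parts[1:2], collapse = "_")

# Alternative: drop everything from the first "___" separator onward
prefix_alt <- sub("___.*$", "", base_name)

df <- data.frame(
  ID = c("1_5476", "1_5479"),
  Non_Edited_Count = c(42L, 11L),
  Edited_Count = c(0L, 31L),
  Edited_Count_Proportion = c(0, 31 / 42)
)

# Prefix every column except ID so samples stay distinguishable after merging
colnames(df)[2:4] <- paste0(prefix, "_", colnames(df)[2:4])
print(colnames(df))
```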

In [5]:
# Process each file in the directory
for (file_path in file_list) {
  process_file(file_path, output_directory_path)
}

Processing file: /mnt/vast/hpc/csg/hcs2152/ZFR_RNA_Editing/JACUSA2/all_dpf/all_dpf_singles/Edited/Ctrl-01_bases___Edited_Column_Added.tsv 
Head of the DataFrame for file: Ctrl-01_bases___Edited_Column_Added 
      ID Non_Edited_Count Edited_Count Edited_Count_Proportion
1 1_5476               42            0               0.0000000
2 1_5479               11           31               0.7380952
3 1_5504               47            0               0.0000000
4 1_5505               48            0               0.0000000
5 1_5510               45            0               0.0000000
6 1_5523               48            0               0.0000000
Extracted Prefix for file: Ctrl-01_bases___Edited_Column_Added is Ctrl-01_bases 
Processed DataFrame for file: Ctrl-01_bases___Edited_Column_Added 
      ID Ctrl-01_bases_Non_Edited_Count Ctrl-01_bases_Edited_Count
1 1_5476                             42                          0
2 1_5479                             11                         31
3 