# Calculating Average And Standard Deviation For Wald Test

Mean and standard deviation per row need to be calculated for all the samples as well as just for controls.
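
For reference, the two per-row quantities can be written out as follows. This is a sketch of the standard definitions, with $x_1, \dots, x_n$ standing for the normalized counts of one gene (one row) across the $n$ selected samples. These symbols are introduced here for illustration. R's `sd()` uses the sample standard deviation, with $n - 1$ in the denominator:

$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}
$$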

## Set Output Directory For Average And Standard Deviation Files:

The average and standard deviation files are written to a subfolder of the parent directory called `4___Calculating_Average_And_Standard_Deviation`.

Make sure the normalized counts you use are generated after removing the summary statistics.

In [None]:
# Specify the parent directory path

parent_directory <- "/path/to/Treatment/Only/Folder/Of/Your/Choice/that/contains/previous/steps/folders"


## Create The Associated SubDirectory:

In [None]:

# Create the output directory path
output_directory <- file.path(parent_directory, "4___Calculating_Average_And_Standard_Deviation")

# Check if the output directory already exists, if not, create it
if (!dir.exists(output_directory)) {
  dir.create(output_directory)
  cat("Output directory created:", output_directory, "\n")
} else {
  cat("Output directory already exists:", output_directory, "\n")
}
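
If the parent directory might not exist yet, `dir.create()` can also create the intermediate folders in one call. The snippet below is a minimal sketch of that variant, not a replacement for the cell above. It works inside R's temporary directory, and `toy_parent` and `toy_output` are hypothetical names used only for illustration.

```r
# Hypothetical parent folder inside the R session's temporary directory
toy_parent <- file.path(tempdir(), "toy_parent_that_may_not_exist")
toy_output <- file.path(toy_parent, "4___Calculating_Average_And_Standard_Deviation")

# recursive = TRUE creates missing parent folders as well;
# showWarnings = FALSE stays quiet if the folder is already there
dir.create(toy_output, recursive = TRUE, showWarnings = FALSE)

# Confirm that the folder now exists
cat("Output directory exists:", dir.exists(toy_output), "\n")
```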


## Loading in the Normalized Counts:

First, the normalized counts must be loaded.

For DESeq2, the normalized counts must be loaded as a `.tsv`/`.txt` file with tab-separated values.

The file I used is called `Normalized_Counts.tsv` (it is a tab-separated file).

In [None]:
# Construct the path to Normalized_Counts.tsv within the 2___Normalized_Counts_DEseq2 folder
normalized_counts_path <- file.path(parent_directory, "2___Normalized_Counts_DEseq2", "Normalized_Counts.tsv")

# Read the data from the Normalized_Counts.tsv file
normalized_counts <- read.table(normalized_counts_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Work on a copy named data; the average and standard deviation columns are added to it
data <- normalized_counts

# Print the first few rows of the data
head(data)

The column names are printed below for reference.

In [None]:
colnames(data)

## Calculating The Average And Standard Deviation For All Samples:

### Selecting For All The Numeric Data Columns:

Select all the numeric columns (all columns except the Ensembl_ID column).

In [None]:

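# Select the numeric columns for all samples (columns 2 to 13; column 1 is Ensembl_ID)
# Adjust the range if your file has a different number of samples
numeric_columns <- data[, 2:13]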

head(numeric_columns)
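
Column positions depend on how many samples are in the file, so it can help to select the numeric columns by type instead of by index. The snippet below is a minimal sketch on a made-up data frame. `toy`, `Sample_1`, and `Sample_2` are hypothetical names, and the values are invented. Note that this only works before any summary columns are added, since those would be numeric as well.

```r
# Made-up stand-in for the normalized counts table
toy <- data.frame(
  Ensembl_ID = c("gene_A", "gene_B"),
  Sample_1 = c(1.5, 2.5),
  Sample_2 = c(3.0, 4.0),
  stringsAsFactors = FALSE
)

# TRUE for every column that holds numbers, FALSE for Ensembl_ID
is_numeric_column <- vapply(toy, is.numeric, logical(1))

# Keep only the numeric columns; drop = FALSE keeps a data frame
# even if a single column is selected
toy_numeric <- toy[, is_numeric_column, drop = FALSE]

colnames(toy_numeric)
```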

Now, calculate the row standard deviation and the row average.

In [None]:

# Calculate standard deviation per row for the numeric columns
data$Row_Standard_Deviation <- apply(numeric_columns, 1, sd)

# Calculate row average for the numeric columns
data$Row_Average <- apply(numeric_columns, 1, mean)

# Print the updated data frame
head(data)
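
To see what `apply(..., 1, sd)` and `apply(..., 1, mean)` return, here is a small worked sketch on a made-up table. The data frame `toy` and its sample names are hypothetical and the counts are invented. It also checks that base R's `rowMeans()` gives the same averages.

```r
# Made-up stand-in for the normalized counts table
toy <- data.frame(
  Ensembl_ID = c("gene_A", "gene_B"),
  Sample_1 = c(2, 10),
  Sample_2 = c(4, 10),
  Sample_3 = c(6, 10),
  stringsAsFactors = FALSE
)

# Numeric columns only (column 1 is the gene identifier)
toy_numeric <- toy[, 2:4]

# MARGIN = 1 applies the function to each row
toy$Row_Standard_Deviation <- apply(toy_numeric, 1, sd)
toy$Row_Average <- apply(toy_numeric, 1, mean)

# rowMeans() is a vectorized alternative for the averages
toy_row_means <- rowMeans(toy_numeric)
stopifnot(isTRUE(all.equal(unname(toy$Row_Average), unname(toy_row_means))))

print(toy)
```

In this toy table, gene_A has counts 2, 4, and 6, so its row average is 4 and its row standard deviation is 2. gene_B has identical counts in every sample, so its standard deviation is 0.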

## Calculating The Average And Standard Deviation For Control Samples Alone:

### Selecting For Control Data Alone:

First, I have to select for the numeric columns that are from control samples alone:

In [None]:

# Select the control sample columns (columns 2 to 7)
numeric_column_controls <- data[, 2:7]

head(numeric_column_controls)


### Calculating Mean And Standard Deviation For Controls:

In [None]:

# Calculate row standard deviation for columns 2 to 7
data$Controls_Row_Standard_Deviation <- apply(numeric_column_controls, 1, sd)

# Calculate row mean for columns 2 to 7
data$Controls_Row_Mean <- apply(numeric_column_controls, 1, mean)

# Print the updated data frame
head(data)
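
If the control columns are not in one contiguous block, they can be picked by name instead of by position. The snippet below is a minimal sketch that assumes the control columns share a `Control_` prefix. That naming scheme, the data frame `toy`, and all values are assumptions made for illustration, so adapt the pattern to your own column names.

```r
# Made-up table with two control and two treated samples
toy <- data.frame(
  Ensembl_ID = c("gene_A", "gene_B"),
  Control_1 = c(1, 5),
  Control_2 = c(3, 5),
  Treated_1 = c(10, 7),
  Treated_2 = c(20, 9),
  stringsAsFactors = FALSE
)

# Names of the columns that start with the assumed "Control_" prefix
control_columns <- grep("^Control_", colnames(toy), value = TRUE)

# Subset by name; drop = FALSE keeps a data frame
toy_controls <- toy[, control_columns, drop = FALSE]

# Row-wise statistics over the control samples only
toy$Controls_Row_Standard_Deviation <- apply(toy_controls, 1, sd)
toy$Controls_Row_Mean <- apply(toy_controls, 1, mean)

print(toy)
```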


## Write To File:

In this case, I am writing to a `.tsv` and a `.csv`:

In [None]:
# data now holds the normalized counts plus the average and standard deviation columns

# Write to a CSV file in the specified directory
write.csv(data, file.path(output_directory, "Average_And_Standard_Deviation_Using_Normalized_Counts_DEseq2.csv"), row.names = FALSE)

# Write to a TSV file in the specified directory
write.table(data, file.path(output_directory, "Average_And_Standard_Deviation_Using_Normalized_Counts_DEseq2.tsv"), sep = "\t", row.names = FALSE)
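
A quick way to confirm that a written file is readable is to load it back and compare it with the original. The snippet below is a minimal sketch of that check on a made-up table written to R's temporary directory. `toy`, `toy_path`, and the values are hypothetical.

```r
# Made-up table with the same kind of columns as the real output
toy <- data.frame(
  Ensembl_ID = c("gene_A", "gene_B"),
  Row_Average = c(4.5, 10.25),
  Row_Standard_Deviation = c(2.5, 0.5),
  stringsAsFactors = FALSE
)

# Write a tab-separated file to the session's temporary directory
toy_path <- file.path(tempdir(), "toy_average_and_standard_deviation.tsv")
write.table(toy, toy_path, sep = "\t", row.names = FALSE)

# Read it back with the same settings used to load the normalized counts
toy_read_back <- read.table(toy_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# The column names and values should survive the round trip
stopifnot(identical(colnames(toy), colnames(toy_read_back)))
stopifnot(isTRUE(all.equal(toy$Row_Average, toy_read_back$Row_Average)))

head(toy_read_back)
```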


## Session Information:

In [None]:
sessionInfo()