# Calculating Average And Standard Deviation For Wald Test

The per-row mean and standard deviation need to be calculated twice: once across all samples and once across the control samples alone.

## Set Output Directory For Average And Standard Deviation Files:


Make sure the normalized counts you use were generated after the summary statistics had been removed.

In [1]:
# Set the output directory in a variable
output_directory <- "path/to/Average_And_Standard_Deviation_Added"
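
The write step at the end of this notebook cannot open its output files if this directory does not exist. A minimal sketch of creating it up front is shown here; the `tempdir()` base is only a stand-in for the real path and is an assumption for illustration:

```r
# Hypothetical stand-in for the real output path
output_directory <- file.path(tempdir(), "Average_And_Standard_Deviation_Added")

# Create the directory (and any missing parent directories) only if needed
if (!dir.exists(output_directory)) {
    dir.create(output_directory, recursive = TRUE)
}

# Confirm that the directory is now available
dir.exists(output_directory)
```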

## Loading in the Normalized Counts:

First, the normalized counts must be loaded.

For DESeq2, the normalized counts must be loaded from a `.tsv`/`.txt` file with tab-separated values:

In [2]:
# Read a TSV file
data <- read.table(
    "/path/to/Normalized_Counts.tsv",
    header = TRUE,
    sep = "\t"
)

# Print the first few rows of the data
head(data)

,Ensembl_ID,Ctrl.01__Control,Ctrl.02__Control,Ctrl.03__Control,Ctrl.04__Control,Ctrl.05__Control,Ctrl.06__Control,NO.01__Experimental,NO.02__Experimental,NO.03__Experimental,NO.04__Experimental,NO.05__Experimental,NO.06__Experimental
,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,ENSDARG00000000002,320.939,291.2259,271.5823,360.3755,406.0819,425.8428,275.4685,312.9917,378.3274,340.817,369.1428,325.4757
2,ENSDARG00000000018,2350.4365,2637.988,2317.9232,1287.4868,1119.2563,1275.6401,2355.9802,2396.2376,2258.8071,1271.768,1238.7503,1287.1477
3,ENSDARG00000000019,4204.4044,4394.3196,3963.2066,4834.8702,4800.9907,4482.2083,3596.7964,3219.6638,3783.2737,4902.392,4674.7409,4706.812
4,ENSDARG00000000068,442.4595,471.7461,405.2681,1358.9579,1362.2305,1372.8946,413.2027,416.2004,458.4557,1536.195,1501.3942,1616.0951
5,ENSDARG00000000069,2500.0003,2521.2982,2326.3444,2220.6379,2430.8671,2415.3126,2291.9459,2499.4464,2591.4918,2949.83,2632.8448,2706.2216
6,ENSDARG00000000086,6870.5867,7999.7361,7090.6134,6553.197,7299.3507,6990.0543,7755.4035,8111.9823,7291.6769,7471.111,6695.017,7242.9183
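
By default, `read.table()` passes header names through `make.names()`, which replaces characters that are not valid in R names with dots. The sketch below illustrates this with a tiny in-memory table; the hyphenated sample name is hypothetical, and whether the real input file contained such characters is not stated in this notebook:

```r
# Tiny in-memory TSV with a hypothetical hyphenated sample name
tsv_text <- "Ensembl_ID\tCtrl-01__Control\nENSDARG00000000002\t320.939\n"

# Default behaviour: header names are sanitised by make.names()
default_names <- colnames(read.table(text = tsv_text, header = TRUE, sep = "\t"))

# check.names = FALSE keeps the header exactly as written in the file
original_names <- colnames(read.table(text = tsv_text, header = TRUE, sep = "\t",
                                      check.names = FALSE))

default_names
original_names
```

If the original sample names matter downstream, `check.names = FALSE` is one way to preserve them, at the cost of needing backticks to refer to such columns with `$`.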


The column names are listed below for reference.

In [3]:
colnames(data)

## Calculating The Average And Standard Deviation For All Samples:

### Selecting For All The Numeric Data Columns:

Select all the numeric columns (every column except the `Ensembl_ID` column).

In [4]:

# Select the numeric columns (columns 2 to 13)
numeric_columns <- data[, 2:13]

head(numeric_columns)

,Ctrl.01__Control,Ctrl.02__Control,Ctrl.03__Control,Ctrl.04__Control,Ctrl.05__Control,Ctrl.06__Control,NO.01__Experimental,NO.02__Experimental,NO.03__Experimental,NO.04__Experimental,NO.05__Experimental,NO.06__Experimental
,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,320.939,291.2259,271.5823,360.3755,406.0819,425.8428,275.4685,312.9917,378.3274,340.817,369.1428,325.4757
2,2350.4365,2637.988,2317.9232,1287.4868,1119.2563,1275.6401,2355.9802,2396.2376,2258.8071,1271.768,1238.7503,1287.1477
3,4204.4044,4394.3196,3963.2066,4834.8702,4800.9907,4482.2083,3596.7964,3219.6638,3783.2737,4902.392,4674.7409,4706.812
4,442.4595,471.7461,405.2681,1358.9579,1362.2305,1372.8946,413.2027,416.2004,458.4557,1536.195,1501.3942,1616.0951
5,2500.0003,2521.2982,2326.3444,2220.6379,2430.8671,2415.3126,2291.9459,2499.4464,2591.4918,2949.83,2632.8448,2706.2216
6,6870.5867,7999.7361,7090.6134,6553.197,7299.3507,6990.0543,7755.4035,8111.9823,7291.6769,7471.111,6695.017,7242.9183


Now, calculate the row standard deviation and the row average.

In [5]:

# Calculate standard deviation per row for the numeric columns
data$Row_Standard_Deviation <- apply(numeric_columns, 1, sd)

# Calculate row average for the numeric columns
data$Row_Average <- apply(numeric_columns, 1, mean)

# Print the updated data frame
head(data)

,Ensembl_ID,Ctrl.01__Control,Ctrl.02__Control,Ctrl.03__Control,Ctrl.04__Control,Ctrl.05__Control,Ctrl.06__Control,NO.01__Experimental,NO.02__Experimental,NO.03__Experimental,NO.04__Experimental,NO.05__Experimental,NO.06__Experimental,Row_Standard_Deviation,Row_Average
,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,ENSDARG00000000002,320.939,291.2259,271.5823,360.3755,406.0819,425.8428,275.4685,312.9917,378.3274,340.817,369.1428,325.4757,49.54119,339.8559
2,ENSDARG00000000018,2350.4365,2637.988,2317.9232,1287.4868,1119.2563,1275.6401,2355.9802,2396.2376,2258.8071,1271.768,1238.7503,1287.1477,603.27901,1816.4518
3,ENSDARG00000000019,4204.4044,4394.3196,3963.2066,4834.8702,4800.9907,4482.2083,3596.7964,3219.6638,3783.2737,4902.392,4674.7409,4706.812,547.64475,4296.9732
4,ENSDARG00000000068,442.4595,471.7461,405.2681,1358.9579,1362.2305,1372.8946,413.2027,416.2004,458.4557,1536.195,1501.3942,1616.0951,539.77782,946.2583
5,ENSDARG00000000069,2500.0003,2521.2982,2326.3444,2220.6379,2430.8671,2415.3126,2291.9459,2499.4464,2591.4918,2949.83,2632.8448,2706.2216,198.66921,2507.1867
6,ENSDARG00000000086,6870.5867,7999.7361,7090.6134,6553.197,7299.3507,6990.0543,7755.4035,8111.9823,7291.6769,7471.111,6695.017,7242.9183,489.21309,7280.9706
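
To make explicit what `sd()` computes, the sketch below recalculates the mean and the sample standard deviation (with an `n - 1` denominator) by hand for the first row, `ENSDARG00000000002`, using the twelve values shown in the table above. The variable names are illustrative only:

```r
# Values of the first row (ENSDARG00000000002) copied from the table above
row_values <- c(320.939, 291.2259, 271.5823, 360.3755, 406.0819, 425.8428,
                275.4685, 312.9917, 378.3274, 340.817, 369.1428, 325.4757)

n <- length(row_values)
row_mean <- sum(row_values) / n

# Sample standard deviation: squared deviations are divided by n - 1, not n
row_sd <- sqrt(sum((row_values - row_mean)^2) / (n - 1))

# Both should agree with the built-in functions
all.equal(row_mean, mean(row_values))
all.equal(row_sd, sd(row_values))
```

For the mean, base R also provides `rowMeans(numeric_columns)`, which gives the same values as `apply(numeric_columns, 1, mean)` for an all-numeric data frame.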


## Calculating The Average And Standard Deviation For Control Samples Alone:

### Selecting For Control Data Alone:

First, select the numeric columns that come from control samples alone:

In [6]:

# Select the control sample columns (columns 2 to 7)
numeric_column_controls <- data[, 2:7]

head(numeric_column_controls)


,Ctrl.01__Control,Ctrl.02__Control,Ctrl.03__Control,Ctrl.04__Control,Ctrl.05__Control,Ctrl.06__Control
,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,320.939,291.2259,271.5823,360.3755,406.0819,425.8428
2,2350.4365,2637.988,2317.9232,1287.4868,1119.2563,1275.6401
3,4204.4044,4394.3196,3963.2066,4834.8702,4800.9907,4482.2083
4,442.4595,471.7461,405.2681,1358.9579,1362.2305,1372.8946
5,2500.0003,2521.2982,2326.3444,2220.6379,2430.8671,2415.3126
6,6870.5867,7999.7361,7090.6134,6553.197,7299.3507,6990.0543
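
Hard-coded positions such as `2:7` silently select the wrong samples if the column order changes. As a sketch of an alternative, the control columns could be picked by their `__Control` name suffix instead. The small `example` data frame below is a stand-in built from a few values in this notebook, not the full dataset:

```r
# Small stand-in data frame using the column naming scheme from this notebook
example <- data.frame(
    Ensembl_ID = c("ENSDARG00000000002", "ENSDARG00000000018"),
    Ctrl.01__Control = c(320.939, 2350.4365),
    Ctrl.02__Control = c(291.2259, 2637.988),
    NO.01__Experimental = c(275.4685, 2355.9802),
    NO.02__Experimental = c(312.9917, 2396.2376)
)

# Names of the columns that end in "__Control"
control_columns <- grep("__Control$", colnames(example), value = TRUE)

# drop = FALSE keeps a data frame even if only one column matches
example_controls <- example[, control_columns, drop = FALSE]

colnames(example_controls)
```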


### Calculating Mean And Standard Deviation For Controls:

In [7]:

# Calculate row standard deviation for columns 2 to 7
data$Controls_Row_Standard_Deviation <- apply(numeric_column_controls, 1, sd)

# Calculate row mean for columns 2 to 7
data$Controls_Row_Mean <- apply(numeric_column_controls, 1, mean)

# Print the updated data frame
head(data)


,Ensembl_ID,Ctrl.01__Control,Ctrl.02__Control,Ctrl.03__Control,Ctrl.04__Control,Ctrl.05__Control,Ctrl.06__Control,NO.01__Experimental,NO.02__Experimental,NO.03__Experimental,NO.04__Experimental,NO.05__Experimental,NO.06__Experimental,Row_Standard_Deviation,Row_Average,Controls_Row_Standard_Deviation,Controls_Row_Mean
,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,ENSDARG00000000002,320.939,291.2259,271.5823,360.3755,406.0819,425.8428,275.4685,312.9917,378.3274,340.817,369.1428,325.4757,49.54119,339.8559,62.22158,346.0079
2,ENSDARG00000000018,2350.4365,2637.988,2317.9232,1287.4868,1119.2563,1275.6401,2355.9802,2396.2376,2258.8071,1271.768,1238.7503,1287.1477,603.27901,1816.4518,673.5796,1831.4551
3,ENSDARG00000000019,4204.4044,4394.3196,3963.2066,4834.8702,4800.9907,4482.2083,3596.7964,3219.6638,3783.2737,4902.392,4674.7409,4706.812,547.64475,4296.9732,338.43039,4446.6666
4,ENSDARG00000000068,442.4595,471.7461,405.2681,1358.9579,1362.2305,1372.8946,413.2027,416.2004,458.4557,1536.195,1501.3942,1616.0951,539.77782,946.2583,507.03104,902.2595
5,ENSDARG00000000069,2500.0003,2521.2982,2326.3444,2220.6379,2430.8671,2415.3126,2291.9459,2499.4464,2591.4918,2949.83,2632.8448,2706.2216,198.66921,2507.1867,112.66024,2402.4101
6,ENSDARG00000000086,6870.5867,7999.7361,7090.6134,6553.197,7299.3507,6990.0543,7755.4035,8111.9823,7291.6769,7471.111,6695.017,7242.9183,489.21309,7280.9706,491.15401,7133.923


## Write To File:

In this case, the data frame is written to both a `.csv` and a `.tsv` file:

In [8]:
# `data` holds the normalized counts plus the four summary columns added above

# Write to a CSV file in the specified directory
write.csv(data, file.path(output_directory, "Average_And_Standard_Deviation_Using_Normalized_Counts_DEseq2.csv"), row.names = FALSE)

# Write to a TSV file in the specified directory
write.table(data, file.path(output_directory, "Average_And_Standard_Deviation_Using_Normalized_Counts_DEseq2.tsv"), sep = "\t", row.names = FALSE)
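
A quick way to check a written file is to read it back and compare it with the data frame in memory. The sketch below does this for a small stand-in data frame and a temporary file; both are assumptions for illustration. It also passes `quote = FALSE`, an optional choice that is not used in the cell above, so that text fields are written without quotation marks:

```r
# Small stand-in for the annotated data frame
example <- data.frame(
    Ensembl_ID = c("ENSDARG00000000002", "ENSDARG00000000018"),
    Row_Average = c(339.8559, 1816.4518),
    Row_Standard_Deviation = c(49.54119, 603.27901)
)

# Temporary file used only for this illustration
tsv_path <- file.path(tempdir(), "round_trip_check.tsv")
write.table(example, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back and compare shape and column names with the original
reloaded <- read.table(tsv_path, header = TRUE, sep = "\t")
identical(dim(reloaded), dim(example))
identical(colnames(reloaded), colnames(example))
```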


## Session Information:

In [9]:
sessionInfo()

R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /mnt/mfs/cluster/bin/R-4.2.2.10/lib/libRblas.so
LAPACK: /mnt/mfs/cluster/bin/R-4.2.2.10/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] fansi_1.0.4     crayon_1.5.2    digest_0.6.33   utf8_1.2.3     
 [5] IRdisplay_1.1   repr_1.1.6      lifecycle_1.0.3 jsonlite_1.8.7 
 [9] evaluate_0.21   pillar_1.9.0    rlang_1.1.1     cli_3.6.1      
[13] uuid_1.1-1      vctrs_0.6.3   