Changing housekeeping genes seems effectless #10

mcanouil · 2019-09-17T09:00:46Z

NACHO still predicts housekeeping genes even when a custom list of housekeeping genes is provided to normalise().

library(GEOquery)
gse <- getGEO("GSE70970")
targets <- pData(phenoData(gse[[1]]))
getGEOSuppFiles(GEO = "GSE70970", baseDir = tempdir())
untar(
  tarfile = paste0(tempdir(), "/GSE70970/GSE70970_RAW.tar"), 
  exdir = paste0(tempdir(), "/GSE70970/Data")
)
# Add IDs
targets$IDFILE <- list.files(paste0(tempdir(), "/GSE70970/Data"))

library(NACHO)
GSE70970_sum <- summarise(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number of principal components to compute
)
#> [NACHO] Importing RCC files.
#> [NACHO] Performing QC and formatting data.
#> [NACHO] Searching for the best housekeeping genes.
#> [NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
#> [NACHO] The following predicted housekeeping genes will be used for normalisation:
#>   - hsa-miR-103
#>   - hsa-let-7e
#>   - hsa-miR-1260
#>   - hsa-miR-500+hsa-miR-501-5p
#>   - hsa-miR-1274b
#> [NACHO] Computing normalisation factors using "GEO" method.
#> [NACHO] Missing values have been replaced with zeros for PCA.
#> [NACHO] Normalising data using "GEO" method with housekeeping genes.
#> [NACHO] Returning a list.
#>   $ access              : character
#>   $ housekeeping_genes  : character
#>   $ housekeeping_predict: logical
#>   $ housekeeping_norm   : logical
#>   $ normalisation_method: character
#>   $ remove_outliers     : logical
#>   $ n_comp              : numeric
#>   $ data_directory      : character
#>   $ pc_sum              : data.frame
#>   $ nacho               : data.frame
#>   $ outliers_thresholds : list
#>   $ raw_counts          : data.frame
#>   $ normalised_counts   : data.frame

unlink(paste0(tempdir(), "/GSE70970"), recursive = TRUE)

my_housekeeping <- GSE70970_sum[["housekeeping_genes"]][-c(1, 2)]

GSE70970_norm <- normalise(
  nacho_object = GSE70970_sum,
  housekeeping_genes = my_housekeeping,
  housekeeping_norm = TRUE,
  normalisation_method = "GEO", 
  remove_outliers = TRUE
)
#> [NACHO] Normalising "GSE70970_sum" with new value for parameters:
#>   - housekeeping_genes = TRUE
#>   - remove_outliers = TRUE
#> [NACHO] Searching for the best housekeeping genes.
#> [NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
#> [NACHO] The following predicted housekeeping genes will be used for normalisation:
#>   - hsa-let-7e
#>   - hsa-miR-1260
#>   - hsa-miR-1274b
#>   - hsa-miR-103
#>   - hsa-miR-16
#> [NACHO] Computing normalisation factors using "GEO" method.
#> [NACHO] Missing values have been replaced with zeros for PCA.
#> [NACHO] Returning a list.
#>   $ access              : character
#>   $ housekeeping_genes  : character
#>   $ housekeeping_predict: logical
#>   $ housekeeping_norm   : logical
#>   $ normalisation_method: character
#>   $ remove_outliers     : logical
#>   $ n_comp              : numeric
#>   $ data_directory      : character
#>   $ pc_sum              : data.frame
#>   $ nacho               : data.frame
#>   $ outliers_thresholds : list
#>   $ raw_counts          : data.frame
#>   $ normalised_counts   : data.frame


GSE70970_sum[["housekeeping_genes"]]
#> [1] "hsa-miR-103"                "hsa-let-7e"                
#> [3] "hsa-miR-1260"               "hsa-miR-500+hsa-miR-501-5p"
#> [5] "hsa-miR-1274b"
my_housekeeping
#> [1] "hsa-miR-1260"               "hsa-miR-500+hsa-miR-501-5p"
#> [3] "hsa-miR-1274b"
GSE70970_norm[["housekeeping_genes"]]
#> [1] "hsa-let-7e"    "hsa-miR-1260"  "hsa-miR-1274b" "hsa-miR-103"  
#> [5] "hsa-miR-16"
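The mismatch is easier to see as a set comparison. The sketch below is self-contained: it copies the gene names printed above into plain character vectors. The variable names requested, returned, unexpected and dropped are illustrative only and are not part of NACHO.

```r
# Gene names copied from the output above (no NACHO call involved).
requested <- c("hsa-miR-1260", "hsa-miR-500+hsa-miR-501-5p", "hsa-miR-1274b")
returned <- c(
  "hsa-let-7e", "hsa-miR-1260", "hsa-miR-1274b", "hsa-miR-103", "hsa-miR-16"
)

# Genes used for normalisation although they were not requested.
unexpected <- setdiff(returned, requested)
# Genes that were requested but are absent from the result.
dropped <- setdiff(requested, returned)

print(unexpected)
print(dropped)
```

Any non-empty result here means the custom list was not honoured.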

mcanouil · 2019-10-03T13:09:25Z

housekeeping_predict needs to be set explicitly to FALSE to avoid a new housekeeping gene prediction.

GSE70970_norm <- normalise(
  nacho_object = GSE70970_sum,
  housekeeping_genes = my_housekeeping,
  housekeeping_predict = FALSE,
  housekeeping_norm = TRUE,
  normalisation_method = "GEO", 
  remove_outliers = TRUE
)
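To confirm that the custom list was actually used, one option is to compare the housekeeping_genes element of the returned object with the requested genes. This is a minimal sketch: uses_requested_housekeeping() is a hypothetical helper, not a NACHO function, and the two mock lists only mimic the housekeeping_genes element shown in this thread so the example runs without downloading data.

```r
# Hypothetical helper (not part of NACHO): TRUE when the object's
# housekeeping genes match the requested ones, ignoring order.
uses_requested_housekeeping <- function(nacho_object, requested) {
  setequal(nacho_object[["housekeeping_genes"]], requested)
}

requested <- c("hsa-miR-1260", "hsa-miR-500+hsa-miR-501-5p", "hsa-miR-1274b")

# Mock objects standing in for the lists returned by normalise().
mock_repredicted <- list(
  housekeeping_genes = c(
    "hsa-let-7e", "hsa-miR-1260", "hsa-miR-1274b", "hsa-miR-103", "hsa-miR-16"
  )
)
mock_custom <- list(housekeeping_genes = rev(requested))

res_repredicted <- uses_requested_housekeeping(mock_repredicted, requested)
res_custom <- uses_requested_housekeeping(mock_custom, requested)
```

With the real objects, the call would presumably be uses_requested_housekeeping(GSE70970_norm, my_housekeeping), assuming normalise() stores the genes it used in that element, as the output earlier in the thread suggests.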

mcanouil added the bug label Sep 17, 2019

mcanouil added this to the CRAN release v0.6.0 milestone Sep 17, 2019

mcanouil self-assigned this Sep 17, 2019

mcanouil closed this as completed in a80a42e Oct 3, 2019

mcanouil added a commit that referenced this issue Sep 15, 2021

Fix #10

1574e6f

mcanouil added a commit that referenced this issue Dec 5, 2022

Fix #10

12251e6
