Changing housekeeping genes seems effectless #10

Closed
mcanouil opened this issue Sep 17, 2019 · 1 comment

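NACHO still predicts housekeeping genes even when a custom list of housekeeping genes is provided. In the example below, normalise() is given my_housekeeping, yet it runs the prediction again and returns a different set of genes.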


library(GEOquery)
gse <- getGEO("GSE70970")
targets <- pData(phenoData(gse[[1]]))
getGEOSuppFiles(GEO = "GSE70970", baseDir = tempdir())
untar(
  tarfile = paste0(tempdir(), "/GSE70970/GSE70970_RAW.tar"), 
  exdir = paste0(tempdir(), "/GSE70970/Data")
)
# Add IDs
targets$IDFILE <- list.files(paste0(tempdir(), "/GSE70970/Data"))

library(NACHO)
GSE70970_sum <- summarise(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating the number of principal components to compute. 
)
#> [NACHO] Importing RCC files.
#> [NACHO] Performing QC and formatting data.
#> [NACHO] Searching for the best housekeeping genes.
#> [NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
#> [NACHO] The following predicted housekeeping genes will be used for normalisation:
#>   - hsa-miR-103
#>   - hsa-let-7e
#>   - hsa-miR-1260
#>   - hsa-miR-500+hsa-miR-501-5p
#>   - hsa-miR-1274b
#> [NACHO] Computing normalisation factors using "GEO" method.
#> [NACHO] Missing values have been replaced with zeros for PCA.
#> [NACHO] Normalising data using "GEO" method with housekeeping genes.
#> [NACHO] Returning a list.
#>   $ access              : character
#>   $ housekeeping_genes  : character
#>   $ housekeeping_predict: logical
#>   $ housekeeping_norm   : logical
#>   $ normalisation_method: character
#>   $ remove_outliers     : logical
#>   $ n_comp              : numeric
#>   $ data_directory      : character
#>   $ pc_sum              : data.frame
#>   $ nacho               : data.frame
#>   $ outliers_thresholds : list
#>   $ raw_counts          : data.frame
#>   $ normalised_counts   : data.frame

unlink(paste0(tempdir(), "/GSE70970"), recursive = TRUE)

my_housekeeping <- GSE70970_sum[["housekeeping_genes"]][-c(1, 2)]

GSE70970_norm <- normalise(
  nacho_object = GSE70970_sum,
  housekeeping_genes = my_housekeeping,
  housekeeping_norm = TRUE,
  normalisation_method = "GEO", 
  remove_outliers = TRUE
)
#> [NACHO] Normalising "GSE70970_sum" with new value for parameters:
#>   - housekeeping_genes = TRUE
#>   - remove_outliers = TRUE
#> [NACHO] Searching for the best housekeeping genes.
#> [NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
#> [NACHO] The following predicted housekeeping genes will be used for normalisation:
#>   - hsa-let-7e
#>   - hsa-miR-1260
#>   - hsa-miR-1274b
#>   - hsa-miR-103
#>   - hsa-miR-16
#> [NACHO] Computing normalisation factors using "GEO" method.
#> [NACHO] Missing values have been replaced with zeros for PCA.
#> [NACHO] Returning a list.
#>   $ access              : character
#>   $ housekeeping_genes  : character
#>   $ housekeeping_predict: logical
#>   $ housekeeping_norm   : logical
#>   $ normalisation_method: character
#>   $ remove_outliers     : logical
#>   $ n_comp              : numeric
#>   $ data_directory      : character
#>   $ pc_sum              : data.frame
#>   $ nacho               : data.frame
#>   $ outliers_thresholds : list
#>   $ raw_counts          : data.frame
#>   $ normalised_counts   : data.frame


GSE70970_sum[["housekeeping_genes"]]
#> [1] "hsa-miR-103"                "hsa-let-7e"                
#> [3] "hsa-miR-1260"               "hsa-miR-500+hsa-miR-501-5p"
#> [5] "hsa-miR-1274b"
my_housekeeping
#> [1] "hsa-miR-1260"               "hsa-miR-500+hsa-miR-501-5p"
#> [3] "hsa-miR-1274b"
GSE70970_norm[["housekeeping_genes"]]
#> [1] "hsa-let-7e"    "hsa-miR-1260"  "hsa-miR-1274b" "hsa-miR-103"  
#> [5] "hsa-miR-16"
@mcanouil mcanouil added the bug label Sep 17, 2019
@mcanouil mcanouil added this to the CRAN release v0.6.0 milestone Sep 17, 2019
@mcanouil mcanouil self-assigned this Sep 17, 2019

mcanouil commented Oct 3, 2019

housekeeping_predict needs to be set explicitly to FALSE to avoid a new housekeeping gene prediction.

GSE70970_norm <- normalise(
  nacho_object = GSE70970_sum,
  housekeeping_genes = my_housekeeping,
  housekeeping_predict = FALSE,
  housekeeping_norm = TRUE,
  normalisation_method = "GEO", 
  remove_outliers = TRUE
)
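
One way to confirm whether a custom list was actually used is to compare it with the housekeeping_genes element stored in the returned object. The following is only a minimal sketch: check_housekeeping() is a hypothetical helper, not part of NACHO, and mock_norm is a hand-built list that reuses the gene names printed in the original report, so the snippet runs without downloading the data. With real data, the object returned by normalise() would be passed instead.

```r
# Hypothetical helper (not part of NACHO): compares the housekeeping genes
# stored in a result object with the custom list that was requested.
check_housekeeping <- function(nacho_object, expected) {
  used <- nacho_object[["housekeeping_genes"]]
  list(
    identical_sets = setequal(used, expected), # TRUE if both sets match
    unexpected = setdiff(used, expected),      # genes used but not requested
    missing = setdiff(expected, used)          # genes requested but not used
  )
}

# Mock object (assumption): mimics only the "housekeeping_genes" element
# printed in the report above, where the prediction was run again.
mock_norm <- list(
  housekeeping_genes = c(
    "hsa-let-7e", "hsa-miR-1260", "hsa-miR-1274b", "hsa-miR-103", "hsa-miR-16"
  )
)
my_housekeeping <- c(
  "hsa-miR-1260", "hsa-miR-500+hsa-miR-501-5p", "hsa-miR-1274b"
)

res <- check_housekeeping(mock_norm, my_housekeeping)
print(res)
```

For the values from the report, identical_sets is FALSE, which flags that the custom list was overridden. If the call with housekeeping_predict = FALSE behaves as described, the same check on its result should show no unexpected and no missing genes.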

mcanouil added a commit that referenced this issue Sep 15, 2021
mcanouil added a commit that referenced this issue Dec 5, 2022