example in vignette error #12

Closed
sheucke opened this issue Nov 15, 2019 · 4 comments

@sheucke

sheucke commented Nov 15, 2019

If I follow the example in the vignette, I encounter this error:

# Add IDs
targets$IDFILE <- list.files(paste0(tempdir(), "/GSE70970/Data"))
library(NACHO)

Attaching package: 'NACHO'

The following object is masked from 'package:BiocGenerics':

    normalize

GSE70970_sum <- summarise(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating the number of principal components to compute.
)
[NACHO] Importing RCC files.
Error: Column cols must be length 1 (the number of rows), not 3
@mcanouil
Owner

mcanouil commented Nov 15, 2019

Hi,

I can't replicate your error, and the vignette compiled successfully, as you can see on the website.

Below is a fully reproducible example of the code you mentioned; as you can see, I don't get your error. Please check the session information at the end.

library(GEOquery)
#> Loading required package: Biobase
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter,
#>     Find, get, grep, grepl, intersect, is.unsorted, lapply, Map,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, Position, rank, rbind, Reduce, rownames, sapply,
#>     setdiff, sort, table, tapply, union, unique, unsplit, which,
#>     which.max, which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Setting options('download.file.method.GEOquery'='auto')
#> Setting options('GEOquery.inmemory.gpl'=FALSE)
# Download data
gse <- getGEO("GSE70970")
#> Found 1 file(s)
#> GSE70970_series_matrix.txt.gz
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   ID_REF = col_character()
#> )
#> See spec(...) for full column specifications.
#> File stored at:
#> /tmp/RtmpKA2y6S/GPL20699.soft
# Get phenotypes
targets <- pData(phenoData(gse[[1]]))
getGEOSuppFiles(GEO = "GSE70970", baseDir = tempdir())
#>                                                                    size
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       1986560
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz     672
#>                                                                 isdir mode
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       FALSE  644
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz FALSE  644
#>                                                                               mtime
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       2019-11-15 11:25:23
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz 2019-11-15 11:25:24
#>                                                                               ctime
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       2019-11-15 11:25:23
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz 2019-11-15 11:25:24
#>                                                                               atime
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       2019-11-15 11:25:21
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz 2019-11-15 11:25:23
#>                                                                  uid gid
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       1738  50
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz 1738  50
#>                                                                    uname
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       mcanouil
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz mcanouil
#>                                                                 grname
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                        staff
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz  staff
# Unzip data
untar(
  tarfile = paste0(tempdir(), "/GSE70970/GSE70970_RAW.tar"), 
  exdir = paste0(tempdir(), "/GSE70970/Data")
)
# Add IDs
targets$IDFILE <- list.files(paste0(tempdir(), "/GSE70970/Data"))

library(NACHO)
#> 
#> Attaching package: 'NACHO'
#> The following object is masked from 'package:BiocGenerics':
#> 
#>     normalize
GSE70970_sum <- summarise(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating the number of principal components to compute. 
)
#> [NACHO] Importing RCC files.
#> [NACHO] Performing QC and formatting data.
#> [NACHO] Searching for the best housekeeping genes.
#> [NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
#> [NACHO] The following predicted housekeeping genes will be used for normalisation:
#>   - hsa-miR-103
#>   - hsa-let-7e
#>   - hsa-miR-1260
#>   - hsa-miR-500+hsa-miR-501-5p
#>   - hsa-miR-1274b
#> [NACHO] Computing normalisation factors using "GEO" method.
#> [NACHO] Missing values have been replaced with zeros for PCA.
#> [NACHO] Normalising data using "GEO" method with housekeeping genes.
#> [NACHO] Returning a list.
#>   $ access              : character
#>   $ housekeeping_genes  : character
#>   $ housekeeping_predict: logical
#>   $ housekeeping_norm   : logical
#>   $ normalisation_method: character
#>   $ remove_outliers     : logical
#>   $ n_comp              : numeric
#>   $ data_directory      : character
#>   $ pc_sum              : data.frame
#>   $ nacho               : data.frame
#>   $ outliers_thresholds : list
#>   $ raw_counts          : data.frame
#>   $ normalised_counts   : data.frame

sessioninfo::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Debian GNU/Linux 9 (stretch)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_GB.UTF-8                 
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Etc/UTC                     
#>  date     2019-11-15                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package      * version date       lib source        
#>  assertthat     0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
#>  backports      1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
#>  Biobase      * 2.44.0  2019-05-02 [1] Bioconductor  
#>  BiocGenerics * 0.30.0  2019-05-02 [1] Bioconductor  
#>  cli            1.1.0   2019-03-19 [1] CRAN (R 3.6.1)
#>  colorspace     1.4-1   2019-03-18 [1] CRAN (R 3.6.1)
#>  crayon         1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
#>  curl           4.2     2019-09-24 [1] CRAN (R 3.6.1)
#>  digest         0.6.21  2019-09-20 [1] CRAN (R 3.6.1)
#>  dplyr          0.8.3   2019-07-04 [1] CRAN (R 3.6.1)
#>  ellipsis       0.3.0   2019-09-20 [1] CRAN (R 3.6.1)
#>  evaluate       0.14    2019-05-28 [1] CRAN (R 3.6.1)
#>  GEOquery     * 2.52.0  2019-05-02 [1] Bioconductor  
#>  ggplot2        3.2.1   2019-08-10 [1] CRAN (R 3.6.1)
#>  glue           1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
#>  gtable         0.3.0   2019-03-25 [1] CRAN (R 3.6.1)
#>  highr          0.8     2019-03-20 [1] CRAN (R 3.6.1)
#>  hms            0.5.1   2019-08-23 [1] CRAN (R 3.6.1)
#>  htmltools      0.4.0   2019-10-04 [1] CRAN (R 3.6.1)
#>  knitr          1.25    2019-09-18 [1] CRAN (R 3.6.1)
#>  lazyeval       0.2.2   2019-03-15 [1] CRAN (R 3.6.1)
#>  lifecycle      0.1.0   2019-08-01 [1] CRAN (R 3.6.1)
#>  limma          3.40.6  2019-07-26 [1] Bioconductor  
#>  magrittr       1.5     2014-11-22 [1] CRAN (R 3.6.1)
#>  munsell        0.5.0   2018-06-12 [1] CRAN (R 3.6.1)
#>  NACHO        * 0.6.1   2019-10-12 [1] CRAN (R 3.6.1)
#>  pillar         1.4.2   2019-06-29 [1] CRAN (R 3.6.1)
#>  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 3.6.1)
#>  purrr          0.3.3   2019-10-18 [1] CRAN (R 3.6.1)
#>  R6             2.4.0   2019-02-14 [1] CRAN (R 3.6.1)
#>  Rcpp           1.0.2   2019-07-25 [1] CRAN (R 3.6.1)
#>  readr          1.3.1   2018-12-21 [1] CRAN (R 3.6.1)
#>  rlang          0.4.0   2019-06-25 [1] CRAN (R 3.6.1)
#>  rmarkdown      1.16    2019-10-01 [1] CRAN (R 3.6.1)
#>  scales         1.0.0   2018-08-09 [1] CRAN (R 3.6.1)
#>  sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
#>  stringi        1.4.3   2019-03-12 [1] CRAN (R 3.6.1)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 3.6.1)
#>  tibble         2.1.3   2019-06-06 [1] CRAN (R 3.6.1)
#>  tidyr          1.0.0   2019-09-11 [1] CRAN (R 3.6.1)
#>  tidyselect     0.2.5   2018-10-11 [1] CRAN (R 3.6.1)
#>  vctrs          0.2.0   2019-07-05 [1] CRAN (R 3.6.1)
#>  withr          2.1.2   2018-03-15 [1] CRAN (R 3.6.1)
#>  xfun           0.10    2019-10-01 [1] CRAN (R 3.6.1)
#>  xml2           1.2.2   2019-08-09 [1] CRAN (R 3.6.1)
#>  yaml           2.2.0   2018-07-25 [1] CRAN (R 3.6.1)
#>  zeallot        0.1.0   2018-01-28 [1] CRAN (R 3.6.1)
#> 
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/local/lib/R/library
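
For anyone comparing a failing session against the one above, a short helper like the following may make the comparison quicker than reading two full `session_info()` listings. It is only a sketch: `pkg_versions` is a hypothetical name, and the packages passed to it are simply ones that appear in the session information above, not packages known to cause the error.

```r
# Hypothetical helper: return the installed version of each package as text,
# or NA when the package is not installed in the current library paths.
pkg_versions <- function(pkgs) {
  vapply(
    pkgs,
    function(p) {
      # system.file() returns "" when the package cannot be found.
      if (nzchar(system.file(package = p))) {
        as.character(utils::packageVersion(p))
      } else {
        NA_character_
      }
    },
    character(1)
  )
}

# Packages taken from the session information above (an arbitrary selection).
pkg_versions(c("NACHO", "dplyr", "tidyr", "tibble"))
```

Running the same call in both sessions and comparing the two named vectors would show at a glance whether the installations differ.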

@sheucke
Author

sheucke commented Nov 15, 2019

I restarted R and tried again, and now it works. Sorry, I don't know what went wrong the first time.

Best regards,
Sebastian

library(GEOquery)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply,
parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq,
Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit, which, which.max, which.min

Welcome to Bioconductor

Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)

gse <- getGEO("GSE70970")
Found 1 file(s)
GSE70970_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE70nnn/GSE70970/matrix/GSE70970_series_matrix.txt.gz'
Content type 'application/x-gzip' length 351607 bytes (343 KB)
==================================================
downloaded 343 KB

Parsed with column specification:
cols(
.default = col_double(),
ID_REF = col_character()
)
See spec(...) for full column specifications.
File stored at:
/tmp/RtmpQb9ReH/GPL20699.soft

targets <- pData(phenoData(gse[[1]]))
getGEOSuppFiles(GEO = "GSE70970", baseDir = tempdir())
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE70nnn/GSE70970/suppl//GSE70970_RAW.tar?tool=geoquery'
Content type 'application/x-tar' length 1986560 bytes (1.9 MB)
==================================================
downloaded 1.9 MB

trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE70nnn/GSE70970/suppl//GSE70970_characteristics_readme.txt.gz?tool=geoquery'
Content type 'application/x-gzip' length 672 bytes

downloaded 672 bytes

                                                                   size isdir mode               mtime               ctime
/tmp/RtmpQb9ReH/GSE70970/GSE70970_RAW.tar                       1986560 FALSE  664 2019-11-15 11:31:34 2019-11-15 11:31:34
/tmp/RtmpQb9ReH/GSE70970/GSE70970_characteristics_readme.txt.gz     672 FALSE  664 2019-11-15 11:31:35 2019-11-15 11:31:35
                                                                              atime  uid  gid     uname    grname
/tmp/RtmpQb9ReH/GSE70970/GSE70970_RAW.tar                       2019-11-15 11:31:32 1000 1000 sebastian sebastian
/tmp/RtmpQb9ReH/GSE70970/GSE70970_characteristics_readme.txt.gz 2019-11-15 11:31:34 1000 1000 sebastian sebastian

untar(
  tarfile = paste0(tempdir(), "/GSE70970/GSE70970_RAW.tar"),
  exdir = paste0(tempdir(), "/GSE70970/Data")
)

targets$IDFILE <- list.files(paste0(tempdir(), "/GSE70970/Data"))
library(NACHO)

Attaching package: ‘NACHO’

The following object is masked from ‘package:BiocGenerics’:

normalize

library(NACHO)
GSE70970_sum <- summarise(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating the number of principal components to compute.
)
[NACHO] Importing RCC files.
|========================================================================================================|100% ~0 s remaining
[NACHO] Performing QC and formatting data.
[NACHO] Searching for the best housekeeping genes.
[NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
[NACHO] The following predicted housekeeping genes will be used for normalisation:
  - hsa-miR-103
  - hsa-let-7e
  - hsa-miR-1260
  - hsa-miR-500+hsa-miR-501-5p
  - hsa-miR-1274b
[NACHO] Computing normalisation factors using "GEO" method.
[NACHO] Missing values have been replaced with zeros for PCA.
[NACHO] Normalising data using "GEO" method with housekeeping genes.
[NACHO] Returning a list.
  $ access              : character
  $ housekeeping_genes  : character
  $ housekeeping_predict: logical
  $ housekeeping_norm   : logical
  $ normalisation_method: character
  $ remove_outliers     : logical
  $ n_comp              : numeric
  $ data_directory      : character
  $ pc_sum              : data.frame
  $ nacho               : data.frame
  $ outliers_thresholds : list
  $ raw_counts          : data.frame
  $ normalised_counts   : data.frame

sessioninfo::session_info()
─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 3.6.1 (2019-07-05)
 os       Ubuntu 18.04.3 LTS
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Berlin
 date     2019-11-15

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package      * version date       lib source
 assertthat     0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
 backports      1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
 Biobase      * 2.44.0  2019-05-02 [1] Bioconductor
 BiocGenerics * 0.30.0  2019-05-02 [1] Bioconductor
 cli            1.1.0   2019-03-19 [1] CRAN (R 3.6.1)
 colorspace     1.4-1   2019-03-18 [1] CRAN (R 3.6.1)
 crayon         1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
 curl           4.2     2019-09-24 [1] CRAN (R 3.6.1)
 dplyr          0.8.3   2019-07-04 [1] CRAN (R 3.6.1)
 ellipsis       0.3.0   2019-09-20 [1] CRAN (R 3.6.1)
 GEOquery     * 2.52.0  2019-05-02 [1] Bioconductor
 ggplot2        3.2.1   2019-08-10 [1] CRAN (R 3.6.1)
 glue           1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
 gtable         0.3.0   2019-03-25 [1] CRAN (R 3.6.1)
 hms            0.5.2   2019-10-30 [1] CRAN (R 3.6.1)
 knitr          1.26    2019-11-12 [1] CRAN (R 3.6.1)
 lazyeval       0.2.2   2019-03-15 [1] CRAN (R 3.6.1)
 lifecycle      0.1.0   2019-08-01 [1] CRAN (R 3.6.1)
 limma          3.40.6  2019-07-26 [1] Bioconductor
 magrittr       1.5     2014-11-22 [1] CRAN (R 3.6.1)
 munsell        0.5.0   2018-06-12 [1] CRAN (R 3.6.1)
 NACHO        * 0.6.1   2019-10-12 [1] CRAN (R 3.6.1)
 pillar         1.4.2   2019-06-29 [1] CRAN (R 3.6.1)
 pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 3.6.1)
 purrr          0.3.3   2019-10-18 [1] CRAN (R 3.6.1)
 R6             2.4.1   2019-11-12 [1] CRAN (R 3.6.1)
 Rcpp           1.0.3   2019-11-08 [1] CRAN (R 3.6.1)
 readr          1.3.1   2018-12-21 [1] CRAN (R 3.6.1)
 rlang          0.4.1   2019-10-24 [1] CRAN (R 3.6.1)
 rstudioapi     0.10    2019-03-19 [1] CRAN (R 3.6.1)
 scales         1.0.0   2018-08-09 [1] CRAN (R 3.6.1)
 sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
 stringi        1.4.3   2019-03-12 [1] CRAN (R 3.6.1)
 tibble         2.1.3   2019-06-06 [1] CRAN (R 3.6.1)
 tidyr          1.0.0   2019-09-11 [1] CRAN (R 3.6.1)
 tidyselect     0.2.5   2018-10-11 [1] CRAN (R 3.6.1)
 vctrs          0.2.0   2019-07-05 [1] CRAN (R 3.6.1)
 withr          2.1.2   2018-03-15 [1] CRAN (R 3.6.1)
 xfun           0.11    2019-11-12 [1] CRAN (R 3.6.1)
 xml2           1.2.2   2019-08-09 [1] CRAN (R 3.6.1)
 zeallot        0.1.0   2018-01-28 [1] CRAN (R 3.6.1)

[1] /home/sebastian/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library

@mcanouil
Owner

Perfect!
Enjoy NACHO ;)

@athulmenon

Hi Mcanouil,

I restarted R and ran the code again in a fresh session, but I still get the same error.

> GSE70970_sum <- summarize(
+   data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
+   ssheet_csv = targets, # The samplesheet
+   id_colname = "IDFILE", # Name of the column that contains the identifiers
+   housekeeping_genes = NULL, # Custom list of housekeeping genes
+   housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
+   normalisation_method = "GEO", # Geometric mean or GLM
+   n_comp = 5 # Number indicating the number of principal components to compute.
+ )

The error reads:

[NACHO] Importing RCC files.
Error: Column cols must be length 1 (the number of rows), not 3

Are there any other solutions?
Thanks for the quick response.

Athul
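
The thread does not establish what causes this error, so the following is only a cheap sanity check to run before the import step, not a fix. It assumes, as in the code above, that the sample sheet column named by `id_colname` holds file names relative to `data_directory`. The helper name `missing_ids` and the toy file names are hypothetical.

```r
# Hypothetical helper: list the sample-sheet IDs that have no matching file
# in the data directory.
missing_ids <- function(ssheet, id_colname, data_directory) {
  ids <- as.character(ssheet[[id_colname]])
  ids[!file.exists(file.path(data_directory, ids))]
}

# Toy illustration in a throwaway directory (not the GSE70970 data).
demo_dir <- file.path(tempdir(), "rcc_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.RCC", "b.RCC"))))

demo_sheet <- data.frame(
  IDFILE = c("a.RCC", "b.RCC", "c.RCC"),
  stringsAsFactors = FALSE
)

# "c.RCC" is listed in the sheet but was never created on disk.
missing_ids(demo_sheet, "IDFILE", demo_dir)
```

With the objects from this issue, the equivalent call would be `missing_ids(targets, "IDFILE", paste0(tempdir(), "/GSE70970/Data"))`. An empty result only rules out missing files; it says nothing about the contents of the RCC files themselves.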
