Hi,
@ShanEllis found a bug in the TCGA metadata: a column contains a mixture of data. Besides ages, it also holds values such as `Trigone` and `Wall Lateral`. She also found that re-running https://github.com/leekgroup/recount-website/blob/master/metadata/tcga_prep/tcga_clinical.R fixes the problematic column. It looks like `cgc_case_age_at_diagnosis` has the correct ages for the rows that are weird in `age_at_initial_pathologic_diagnosis`.
For now, this will be a known issue while I update the TCGA files in https://github.com/leekgroup/recount-website.
Best, Leo
Unevaluated code:

```r
library('recount')
library('devtools')

md <- all_metadata('TCGA')
table(md$xml_age_at_initial_pathologic_diagnosis)

md <- recount::all_metadata('TCGA')
weird <- which(md$xml_age_at_initial_pathologic_diagnosis %in%
    c('Trigone', 'Wall Anterior', 'Wall Lateral', 'Wall NOS', 'Wall Posterior'))
md[weird, colnames(md)[grep('age', colnames(md))]]

## Reproducibility information
print('Reproducibility information:')
Sys.time()
proc.time()
options(width = 120)
session_info()
```
Evaluated code:

```
> library('recount')
> library('devtools')
>
> md <- all_metadata('TCGA')
2017-02-24 13:28:09 downloading the metadata to /var/folders/cx/n9s558kx6fb7jf5z_pgszgb80000gn/T//RtmpSiDQCg/metadata_clean_tcga.Rdata
trying URL 'https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_tcga.Rdata?raw=true'
Content type 'application/octet-stream' length 16351229 bytes (15.6 MB)
==================================================
downloaded 15.6 MB

> table(md$xml_age_at_initial_pathologic_diagnosis)

    0   14   15   16   17   18   19   20   21   22
   25    1    4    2    4    8   11   17   14   11
   23   24   25   26   27   28   29   30   31   32
   22   29   22   28   25   33   40   50   46   46
   33   34   35   36   37   38   39   40   41   42
   53   72   67   62   66   89   81  102  100  107
   43   44   45   46   47   48   49   50   51   52
  124   99  147  138  159  176  158  165  235  184
   53   54   55   56   57   58   59   60   61   62
  219  227  226  240  261  275  274  302  292  296
   63   64   65   66   67   68   69   70   71   72
  290  265  276  273  258  270  262  247  231  208
   73   74   75   76   77   78   79   80   81   82
  230  229  202  163  161  140  144  114   95   81
   83   84   85   86   87   88   89   90        Trigone  Wall Anterior
   64   74   51   28   35   24   10   49              1              6
   Wall Lateral       Wall NOS Wall Posterior
             11              1              9
> md <- recount::all_metadata('TCGA')
2017-02-24 13:28:19 downloading the metadata to /var/folders/cx/n9s558kx6fb7jf5z_pgszgb80000gn/T//RtmpSiDQCg/metadata_clean_tcga.Rdata
trying URL 'https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_tcga.Rdata?raw=true'
Content type 'application/octet-stream' length 16351229 bytes (15.6 MB)
==================================================
downloaded 15.6 MB

> weird <- which(md$xml_age_at_initial_pathologic_diagnosis %in% c('Trigone', 'Wall Anterior', 'Wall Lateral', 'Wall NOS', 'Wall Posterior'))
> md[weird, colnames(md)[grep('age', colnames(md))]]
DataFrame with 28 rows and 39 columns
     gdc_cases.diagnoses.tumor_stage  gdc_cases.diagnoses.age_at_diagnosis  cgc_case_age_at_diagnosis  cgc_case_clinical_stage  cgc_case_pathologic_stage
     <character>  <numeric>  <integer>  <character>  <character>
1    stage ii  25672  70  NA  Stage II
2    stage iv  23236  63  NA  Stage IV
3    stage iv  26893  73  NA  Stage IV
4    stage iv  26874  73  NA  Stage IV
5    stage ii  28204  77  NA  Stage II
...  ...  ...  ...  ...  ...
24   stage iv  27963  76  NA  Stage IV
25   stage iii  25185  68  NA  Stage III
26   stage iii  28328  77  NA  Stage III
27   stage iv  27816  76  NA  Stage IV
28   stage iv  21196  58  NA  Stage IV
     xml_primary_pathology_age_at_initial_pathologic_diagnosis  xml_age_at_initial_pathologic_diagnosis  xml_stage_event_system_version
     <integer>  <character>  <character>
1    NA  Wall Lateral  7th
2    NA  Wall Lateral  7th
3    NA  Wall Lateral  7th
4    NA  Wall Lateral  7th
5    NA  Wall Lateral  7th
...  ...  ...  ...
24   NA  Wall Posterior  7th
25   NA  Wall Lateral  7th
26   NA  Wall Posterior  7th
27   NA  Wall Anterior  7th
28   NA  Wall Anterior  7th
     xml_stage_event_clinical_stage  xml_stage_event_pathologic_stage  xml_stage_event_tnm_categories  xml_stage_event_psa  xml_stage_event_gleason_grading
     <character>  <character>  <character>  <character>  <integer>
1    NA  Stage II  T2aN0MX  NA  NA
2    NA  Stage IV  T2T3N2MX  NA  NA
3    NA  Stage IV  T2T3aN2MX  NA  6
4    NA  Stage IV  T2T4bN1MX  NA  7
5    NA  Stage II  T2aN0MX  NA  NA
...  ...  ...  ...  ...  ...
24   NA  Stage IV  T3bN2M0  NA  6
25   NA  Stage III  T2T3bN0MX  NA  NA
26   NA  Stage III  T1T3aN0MX  NA  7
27   NA  Stage IV  T3bN3M1  NA  6
28   NA  Stage IV  T3bN2MX  NA  6
     xml_stage_event_ann_arbor  xml_stage_event_serum_markers  xml_stage_event_igcccg_stage  xml_stage_event_masaoka_stage  xml_asbestos_exposure_age
     <character>  <character>  <character>  <character>  <integer>
1    NA  NA  NA  NA  NA
2    NA  NA  NA  NA  NA
3    NA  NA  NA  NA  NA
4    NA  NA  NA  NA  NA
5    NA  NA  NA  NA  NA
...  ...  ...  ...  ...  ...
24   NA  NA  NA  NA  NA
25   NA  NA  NA  NA  NA
26   NA  NA  NA  NA  NA
27   NA  NA  NA  NA  NA
28   NA  NA  NA  NA  NA
     xml_asbestos_exposure_age_last  xml_birth_control_pill_history_usage_category  xml_age_began_smoking_in_years  xml_axillary_lymph_node_stage_method_type
     <integer>  <character>  <integer>  <character>
1    NA  NA  12  NA
2    NA  NA  NA  NA
3    NA  NA  NA  NA
4    NA  NA  NA  NA
5    NA  NA  18  NA
...  ...  ...  ...  ...
24   NA  NA  15  NA
25   NA  NA  25  NA
26   NA  NA  NA  NA
27   NA  NA  NA  NA
28   NA  NA  NA  NA
     xml_axillary_lymph_node_stage_other_method_descriptive_text  xml_er_level_cell_percentage_category  xml_history_of_esophageal_cancer
     <character>  <character>  <character>
1    NA  NA  NA
2    NA  NA  NA
3    NA  NA  NA
4    NA  NA  NA
5    NA  NA  NA
...  ...  ...  ...
24   NA  NA  NA
25   NA  NA  NA
26   NA  NA  NA
27   NA  NA  NA
28   NA  NA  NA
     xml_primary_pathology_esophageal_tumor_cental_location  xml_primary_pathology_esophageal_tumor_involvement_sites
     <character>  <character>
1    NA  NA
2    NA  NA
3    NA  NA
4    NA  NA
5    NA  NA
...  ...  ...
24   NA  NA
25   NA  NA
26   NA  NA
27   NA  NA
28   NA  NA
     xml_primary_pathology_tumor_infiltrating_macrophages  xml_cumulative_agent_total_dose  xml_hydroxyurea_agent_administered_day_count
     <character>  <integer>  <integer>
1    NA  NA  NA
2    NA  NA  NA
3    NA  NA  NA
4    NA  NA  NA
5    NA  NA  NA
...  ...  ...  ...
24   NA  NA  NA
25   NA  NA  NA
26   NA  NA  NA
27   NA  NA  NA
28   NA  NA  NA
     xml_person_history_nonmedical_leukemia_causing_agent_type  xml_lab_procedure_blast_cell_outcome_percentage_value
     <character>  <integer>
1    NA  NA
2    NA  NA
3    NA  NA
4    NA  NA
5    NA  NA
...  ...  ...
24   NA  NA
25   NA  NA
26   NA  NA
27   NA  NA
28   NA  NA
     xml_prior_tamoxifen_administered_usage_category  xml_radiosensitizing_agent_administered_indicator
     <character>  <character>
1    NA  NA
2    NA  NA
3    NA  NA
4    NA  NA
5    NA  NA
...  ...  ...
24   NA  NA
25   NA  NA
26   NA  NA
27   NA  NA
28   NA  NA
     xml_person_concomitant_prostate_carcinoma_pathologic_t_stage  xml_first_diagnosis_age_asth_ecz_hay_fev_mold_dust  xml_first_diagnosis_age_of_food_allergy
     <character>  <character>  <character>
1    NA  NA  NA
2    NA  NA  NA
3    NA  NA  NA
4    NA  NA  NA
5    NA  NA  NA
...  ...  ...  ...
24   NA  NA  NA
25   NA  NA  NA
26   NA  NA  NA
27   7thStage IVT3bN3M16  NA  NA
28   7thStage IVT3bN2MX6  NA  NA
     xml_first_diagnosis_age_of_animal_insect_allergy  xml_undescended_testis_corrected_age
     <character>  <character>
1    NA  NA
2    NA  NA
3    NA  NA
4    NA  NA
5    NA  NA
...  ...  ...
24   NA  NA
25   NA  NA
26   NA  NA
27   NA  NA
28   NA  NA
>
> ## Reproducibility information
> print('Reproducibility information:')
[1] "Reproducibility information:"
> Sys.time()
[1] "2017-02-24 13:28:27 EST"
> proc.time()
   user  system elapsed
 20.786   1.699  28.800
> options(width = 120)
> session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value
 version  R Under development (unstable) (2016-10-26 r71594)
 system   x86_64, darwin13.4.0
 ui       AQUA
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York
 date     2017-02-24

Packages ---------------------------------------------------------------------------------------------------------------
 package              * version  date       source
 acepack                1.4.1    2016-10-29 CRAN (R 3.4.0)
 AnnotationDbi          1.37.3   2017-02-09 Bioconductor
 assertthat             0.1      2013-12-06 CRAN (R 3.4.0)
 backports              1.0.5    2017-01-18 CRAN (R 3.4.0)
 base64enc              0.1-3    2015-07-28 CRAN (R 3.4.0)
 Biobase              * 2.35.1   2017-02-23 Bioconductor
 BiocGenerics         * 0.21.3   2017-01-12 Bioconductor
 BiocParallel           1.9.5    2017-01-24 Bioconductor
 biomaRt                2.31.4   2017-01-13 Bioconductor
 Biostrings             2.43.4   2017-02-02 Bioconductor
 bitops                 1.0-6    2013-08-17 CRAN (R 3.4.0)
 BSgenome               1.43.5   2017-02-02 Bioconductor
 bumphunter             1.15.0   2016-10-23 Bioconductor
 checkmate              1.8.2    2016-11-02 CRAN (R 3.4.0)
 cluster                2.0.5    2016-10-08 CRAN (R 3.4.0)
 codetools              0.2-15   2016-10-05 CRAN (R 3.4.0)
 colorspace             1.3-2    2016-12-14 CRAN (R 3.4.0)
 data.table             1.10.4   2017-02-01 CRAN (R 3.4.0)
 DBI                    0.5-1    2016-09-10 CRAN (R 3.4.0)
 DelayedArray         * 0.1.7    2017-02-17 Bioconductor
 derfinder              1.9.6    2017-01-13 Bioconductor
 derfinderHelper        1.9.3    2016-11-29 Bioconductor
 devtools             * 1.12.0   2016-12-05 CRAN (R 3.4.0)
 digest                 0.6.12   2017-01-27 CRAN (R 3.4.0)
 doRNG                  1.6      2014-03-07 CRAN (R 3.4.0)
 downloader             0.4      2015-07-09 CRAN (R 3.4.0)
 foreach                1.4.3    2015-10-13 CRAN (R 3.4.0)
 foreign                0.8-67   2016-09-13 CRAN (R 3.4.0)
 Formula                1.2-1    2015-04-07 CRAN (R 3.4.0)
 GenomeInfoDb         * 1.11.9   2017-02-08 Bioconductor
 GenomeInfoDbData       0.99.0   2017-02-14 Bioconductor
 GenomicAlignments      1.11.9   2017-02-02 Bioconductor
 GenomicFeatures        1.27.8   2017-02-11 Bioconductor
 GenomicFiles           1.11.3   2016-11-29 Bioconductor
 GenomicRanges        * 1.27.22  2017-02-02 Bioconductor
 GEOquery               2.41.0   2016-10-25 Bioconductor
 ggplot2                2.2.1    2016-12-30 CRAN (R 3.4.0)
 gridExtra              2.2.1    2016-02-29 CRAN (R 3.4.0)
 gtable                 0.2.0    2016-02-26 CRAN (R 3.4.0)
 Hmisc                  4.0-2    2016-12-31 CRAN (R 3.4.0)
 htmlTable              1.9      2017-01-26 CRAN (R 3.4.0)
 htmltools              0.3.5    2016-03-21 CRAN (R 3.4.0)
 htmlwidgets            0.8      2016-11-09 CRAN (R 3.4.0)
 httr                   1.2.1    2016-07-03 CRAN (R 3.4.0)
 IRanges              * 2.9.18   2017-02-02 Bioconductor
 iterators              1.0.8    2015-10-13 CRAN (R 3.4.0)
 jsonlite               1.2      2016-12-31 CRAN (R 3.4.0)
 knitr                  1.15.1   2016-11-22 CRAN (R 3.4.0)
 lattice                0.20-34  2016-09-06 CRAN (R 3.4.0)
 latticeExtra           0.6-28   2016-02-09 CRAN (R 3.4.0)
 lazyeval               0.2.0    2016-06-12 CRAN (R 3.4.0)
 locfit                 1.5-9.1  2013-04-20 CRAN (R 3.4.0)
 magrittr               1.5      2014-11-22 CRAN (R 3.4.0)
 Matrix                 1.2-8    2017-01-20 CRAN (R 3.4.0)
 matrixStats          * 0.51.0   2016-10-09 CRAN (R 3.4.0)
 memoise                1.0.0    2016-01-29 CRAN (R 3.4.0)
 munsell                0.4.3    2016-02-13 CRAN (R 3.4.0)
 nnet                   7.3-12   2016-02-02 CRAN (R 3.4.0)
 pkgmaker               0.22     2014-05-14 CRAN (R 3.4.0)
 plyr                   1.8.4    2016-06-08 CRAN (R 3.4.0)
 qvalue                 2.7.0    2016-10-23 Bioconductor
 R6                     2.2.0    2016-10-05 CRAN (R 3.4.0)
 RColorBrewer           1.1-2    2014-12-07 CRAN (R 3.4.0)
 Rcpp                   0.12.9   2017-01-14 CRAN (R 3.4.0)
 RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.4.0)
 recount              * 1.1.18   2017-02-22 Github (leekgroup/recount@ced5db4)
 registry               0.3      2015-07-08 CRAN (R 3.4.0)
 rentrez                1.0.4    2016-10-26 CRAN (R 3.4.0)
 reshape2               1.4.2    2016-10-22 CRAN (R 3.4.0)
 rngtools               1.2.4    2014-03-06 CRAN (R 3.4.0)
 rpart                  4.1-10   2015-06-29 CRAN (R 3.4.0)
 Rsamtools              1.27.12  2017-01-24 Bioconductor
 RSQLite                1.1-2    2017-01-08 CRAN (R 3.4.0)
 rtracklayer            1.35.6   2017-02-19 cran (@1.35.6)
 S4Vectors            * 0.13.15  2017-02-14 cran (@0.13.15)
 scales                 0.4.1    2016-11-09 CRAN (R 3.4.0)
 stringi                1.1.2    2016-10-01 CRAN (R 3.4.0)
 stringr                1.2.0    2017-02-18 CRAN (R 3.4.0)
 SummarizedExperiment * 1.5.7    2017-02-23 Bioconductor
 survival               2.40-1   2016-10-30 CRAN (R 3.4.0)
 tibble                 1.2      2016-08-26 CRAN (R 3.4.0)
 VariantAnnotation      1.21.17  2017-02-12 Bioconductor
 withr                  1.0.2    2016-06-20 CRAN (R 3.4.0)
 XML                    3.98-1.5 2016-11-10 CRAN (R 3.4.0)
 xtable                 1.8-2    2016-02-05 CRAN (R 3.4.0)
 XVector                0.15.2   2017-02-02 Bioconductor
 zlibbioc               1.21.0   2016-10-23 Bioconductor
```
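In the output of the reproduction, the affected rows have plausible ages in `cgc_case_age_at_diagnosis` (70, 63, 73, ...) while `xml_age_at_initial_pathologic_diagnosis` holds labels such as `Wall Lateral`. Until the updated TCGA files are available, one possible stopgap is a minimal sketch like the one below. It assumes you want to keep the parsed ages and fall back to `cgc_case_age_at_diagnosis` only where the value does not parse as a number; that fallback is an assumption, not the fix applied in recount-website. The `data.frame` rows are made up for illustration; only the column names come from the recount TCGA metadata. Instead of listing the bad labels by hand, it flags them by numeric coercion.

```r
# Hypothetical toy rows: column names mirror the recount TCGA metadata,
# but the values are invented for illustration.
md <- data.frame(
    xml_age_at_initial_pathologic_diagnosis = c("63", "Wall Lateral", NA, "Trigone"),
    cgc_case_age_at_diagnosis = c(63L, 70L, 55L, 73L),
    stringsAsFactors = FALSE
)

# Coerce to numeric: any non-number (e.g. "Wall Lateral") becomes NA.
# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
age_num <- suppressWarnings(
    as.numeric(md$xml_age_at_initial_pathologic_diagnosis)
)

# Values that were present but failed to parse are the mixed-in labels;
# genuinely missing values (NA) are not flagged.
bad <- is.na(age_num) & !is.na(md$xml_age_at_initial_pathologic_diagnosis)
print(md$xml_age_at_initial_pathologic_diagnosis[bad])

# Assumed workaround: use cgc_case_age_at_diagnosis only for the flagged
# rows, leaving parsed ages and true NAs untouched.
age_fixed <- age_num
age_fixed[bad] <- md$cgc_case_age_at_diagnosis[bad]
print(age_fixed)
```

With the real metadata, the same lines should apply after `md <- recount::all_metadata('TCGA')`, since `$` column access also works on a `DataFrame`, but this has not been checked against the actual files.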
From skimming through https://github.com/Bioconductor-mirror/TCGAbiolinks/commit/8a35266df471593939538b5d63d110bfe3daca32, it looks like this was a TCGAbiolinks issue rather than a problem with the GDC data.
This has been solved as of today, March 1st, 2017.