Changes in version 2.12.0
-------------------------
NEW FEATURES
o Added --num-eigen-vectors to PureCN.R to set the number of eigen vectors.
Tuning parameter for coverage normalization. Default should in most cases
be a good compromise between removing most normal noise and not
overfitting to pool of normal samples.
o Added findHighQualitySNPs function to extract good SNPs from mapping bias
database.
Changes in version 2.10.0
-------------------------
NEW FEATURES
o adjustLogRatio function for adjusting a tumor vs normal coverage
ratio for purity and ploidy. Useful for downstream tools that
expect ratios instead of absolute copy numbers such as GISTIC.
Thanks @tedtoal (#40).
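A minimal sketch of the idea behind such an adjustment, written in Python for
illustration only. It assumes the usual mixture model in which the observed
tumor vs normal ratio is (purity * C + 2 * (1 - purity)) divided by
(purity * ploidy + 2 * (1 - purity)) for tumor copy number C, and it returns
log2(C / ploidy). The function name, the min_ratio floor and its default are
hypothetical and not taken from the package.

```python
import math


def adjust_log_ratio(log2_ratio, purity, ploidy, min_ratio=2 ** -8):
    """Return the log2 ratio expected at 100% purity (illustrative sketch)."""
    ratio = 2.0 ** log2_ratio
    # Contribution of the diploid normal cells to the observed signal.
    normal = 2.0 * (1.0 - purity)
    # Solve the mixture model for C and express it relative to tumor ploidy.
    adjusted = (ratio * (purity * ploidy + normal) - normal) / (purity * ploidy)
    # Assumed floor: keeps the logarithm defined for homozygous losses.
    return math.log2(max(adjusted, min_ratio))
```

For example, with purity 0.5 and ploidy 2, a four-copy segment is observed at
a ratio of 1.5; the adjustment recovers the pure-tumor ratio of 2 (log2 = 1).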
SIGNIFICANT USER-VISIBLE CHANGES
o Provide interval-level likelihood scores in runAbsoluteCN return
object. Thanks @tinyheero (#335).
o Documentation updates. Thanks @ddrichel (#325).
BUGFIXES
o Bugfix #296 was not merged into the developer branch and did not make
it into 2.8.0.
o Log ratios were not shifted to the median of sample medians as intended (#356).
Thanks @sleyn.
o Fixed crash with small toy examples (fewer than 2000 baits, #363)
Changes in version 2.8.0
------------------------
SIGNIFICANT USER-VISIBLE CHANGES
o Make processMultipleSamples temporarily defunct because the
copynumber package was removed from Bioconductor
o Make it possible to specify saveRDS version to make output files
readable by old R versions prior to 3.6.0 (#255)
BUGFIXES
o Fixed an issue with callLOH and --model-homozygous (#254)
o Fixed crash when VCF contained NAs in base quality scores (#249)
o Fixed wrong check for outdir write permissions in Coverage.R and
NormalDB.R (#258)
o Fixed inverted return codes in a couple of scripts (#284)
o Fixed an issue in callAlterations where the id argument was largely
ignored (#292)
o Fixed broken support for GenomicsDB-R from their developer branch
(#296)
o Fixed crash in plotAbs (#260)
o Fixed an issue with gene-level calls when annotation contained
non-official symbols found on multiple chromosomes (#298)
o Fixed a wrongly formatted error message when no germline database
information was found (#302)
o Fixed a crash when DB field in VCF only contains NAs (#301)
o Added --min-base-quality argument for PureCN.R (#320)
Changes in version 2.2.0
------------------------
NEW FEATURES
o Added chunks parameter to Coverage.R and calculateBamCoverageByInterval
to reduce memory usage (#218)
SIGNIFICANT USER-VISIBLE CHANGES
o When base quality scores are found in the VCF, they are now used to
calculate the minimum number of supporting reads (instead of assuming a
default BQ of 30). By default BQ is capped at 50 and variants below 25
are ignored. Set min.supporting.reads to 0 to turn this off (#206).
o More robust annotation of intervals with gene symbols
o Remove chromosomes not present in the centromeres GRanges object; useful
to remove altcontigs somehow present (should not happen with intervals
generated by IntervalFile.R)
BUGFIXES
o Fixed an issue with old R versions where factors were not converted to
strings, resulting in numbers instead of gene symbols
o Fix for a crash when there are no off-target reads in off-target regions
(#209).
o Fixed parsing of base quality scores in Mutect 2.2
o Fixed crash in GenomicsDB parsing when there were no variants in contig
(#225)
Changes in version 2.0.0
------------------------
NEW FEATURES
o Report median absolute pairwise difference (MAPD) of tumor vs normal log2
ratios in runAbsoluteCN
o Improved mapping bias estimates: variants with insufficient information
for position-specific fits (default 3-6 heterozygous variants)
are clustered and assigned to the most similar fit
o Make Cosmic.CNT INFO field name customizable
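For readers unfamiliar with MAPD, the following Python sketch shows the common
definition: the median of the absolute differences between log2 ratios of
neighboring intervals. It is illustrative only; the function name is
hypothetical and the exact computation in runAbsoluteCN may differ (for
example in how intervals are ordered or filtered).

```python
from statistics import median


def mapd(log2_ratios):
    """Median absolute pairwise difference of neighboring log2 ratios."""
    # Differences between each interval and its successor, in genomic order.
    diffs = [abs(b - a) for a, b in zip(log2_ratios, log2_ratios[1:])]
    if not diffs:
        raise ValueError("need at least two log2 ratios")
    return median(diffs)
```

Because it is based on neighboring differences, this statistic is mostly
insensitive to true copy number changes and reflects interval-level noise.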
SIGNIFICANT USER-VISIBLE CHANGES
o Cleanup of naming of command line arguments (will throw lots of deprecation
warnings, but was long overdue)
o More robust alignment of on- and off-target tumor vs normal log2 ratios.
Ratios are shifted so that median difference of neighboring on/off-target
pairs is 0. This should fix spurious segments consisting of only on- or
off-target regions in high quality samples where those minor offsets
sometimes exceeded the noise.
o Added min.variants argument to runAbsoluteCN
o Added PureCN version to runAbsoluteCN results object (ret$version)
o Addressed observed over-segmentations in very clean data:
- Do not attempt two-step segmentation in PSCBS when off-target noise is
still very small (< 0.15, min.logr.sdev in runAbsoluteCN)
- Increase automatically determined undo.SD in all segmentation functions
when noise is very small (< min.logr.sdev)
- min.logr.sdev is now accessible in PureCN.R via --min-logr-sdev
o Added pairwise sample distances to normalDB output object helpful for
finding noisy samples or batches in normal databases
o Do not error out readCurationFile when CSV is missing and directory
is not writable when re-generating it (#196)
o Add segmentation parameters as attributes to segmentation data.frame
o Added min.betafit.rho and max.betafit.rho to calculateMappingBias*
o Made --normal_panel in PureCN.R defunct
o Added GATK/Picard header with sequence lengths to interval file,
added readIntervalFile function to parse it
BUGFIXES
o Fix for crash when --normal_panel in NormalDB.R contained no variants
(#180).
o Fix for crash when rtracklayer failed to parse --infile in
FilterCallableLoci.R (#182)
o More robust parsing of VCF with missing GT field (#184)
o Fix for bug and crash when mapping bias RDS file contains variants with
multiple alt alleles (#184)
o Added missing dependency 'markdown'
o Fix for crash when only a small number of off-target intervals pass
filters (#190)
o Fix for crash when PSCBS segmentation was selected without VCF file
(#190)
o Fix for crash when Hclust segmentation was selected without segmentation
file (#190)
o Fix for crashes when not many variants pass filters (#192, #195)
o Fix for crash when provided segmentation does not have chromosomes
in common with VCF (#192) or does not provide all chromosomes present in
the coverage file (#192)
Changes in version 1.22.0
-------------------------
NEW FEATURES
o calculateNormalDatabase now suggests an off-target interval width that
minimizes noise while keeping the resolution as high as possible
o Added support for GATK4 CollectAllelicCounts output as alternative
to Mutect
o Added segmentationGATK4 to use GATK4's segmentation function
ModelSegments
SIGNIFICANT USER-VISIBLE CHANGES
o Added min.total.counts filter to filterIntervals to remove
intervals with low number of read counts in combined tumor and normal.
Useful especially for off-target filtering in highly efficient assays
where standard filters keep too many high variance regions.
o Changed default of min.mappability in preprocessIntervals for on-target
intervals to 0.6 (from 0.5)
o Added min.mappability also to filterIntervals so that more conservative
cutoffs can be tested after normalDB generation
o PSCBS: 1.20.0 two-step segmentation slightly tweaked in that only
high quality on-target intervals (high mappability and low PoN noise)
are used in the first segmentation
o Added --skipgcnorm flag to Coverage.R to skip GC-normalization
o Added AF.info.field option to calculateMappingBiasGatk4 for non-standard
GenomicsDB imports
o If segmentation functions add breakpoints within baits, these
breakpoints are now moved to the beginning or end of that bait so that
a single bait is not assigned to two segments
o Dx.R now always generates a _signatures.csv file with --signatures, even
if the number of mutations is insufficient
o Removed defunct calculateIntervalWeights function
BUGFIXES
o Fix for nonsensical error message when VCF does not contain germline
variants (#166).
o Fix for various issues related to the seqlevelsStyle function (e.g.
#171)
o Fix for crash in calculateMappingBiasGatk4 when not all samples had
a single variant call on a particular chromosome (chrY)
o Fix related to annotating mapping bias with triallelic sites and
GenomicsDB
o Fixed an issue in Mutect 1.1.7 data in which good SNPs were ignored
(#174)
Changes in version 1.20.0
-------------------------
NEW FEATURES
o Support for GATK4 GenomicsDB import for mapping bias calculation
o Added --additionaltumors to PureCN.R to provide coverage files
from additional biopsies from the same patient when available
o PSCBS segmentation now identifies on-target breakpoints first when
off-target is noisy, thus boosting sensitivity in on-target regions
o Beta-binomial model in runAbsoluteCN now uses the fits in mapping bias
database. We plan to set this as default in upcoming versions and
appreciate feedback.
SIGNIFICANT USER-VISIBLE CHANGES
o We now check if POP_AF or POPAF is -log10 scaled as new Mutect2 versions
do.
o Added support for GERMQ info field containing Phred-scaled germline
probabilities.
o Detect Mutect2 VCF more reliably
o Updated Mutect2 failure flags: "strand_bias", "slippage", "weak_evidence",
"orientation", "haplotype"
o Removed defunct normal.panel.vcf.file from setMappingBiasVcf
o Removed defunct interval.weight.file from segmentationPSCBS,
segmentationCBS and processMultipleSamples
o Made calculateIntervalWeights defunct
o Changed default of min.normals in calculateMappingBiasVcf/Gatk4 to 1
from 2
o Changed default of --signature_databases to
"signatures.exome.cosmic.v3.may2019" (v3 instead of v2)
o Now warn if recommended -funsegmentation is not used
o Added parallel option for callAmplificationsInLowPurity
o callMutationBurden now uses all non-filtered targets as callable region
when callable is not provided
o plotAbs in chromosome mode now displays wider range of log2 ratios
(makes it possible to examine outliers)
o Moved vcf.field.prefix from predictSomatic to runAbsoluteCN since it now
adds more fields like prior somatic and mapping bias to the VCF
o Changed default of runAbsoluteCN min.ploidy to 1.4
BUGFIXES
o Fix for crash with CNVkit input when log-ratio contained highly negative
outliers
o Fixed a bug in preprocessIntervals/IntervalFile.R when input contained
overlapping and stranded intervals
o Fix for crash when GC-correction is attempted on empty coverage (for
example off-target region without any off-target reads)
o Fix for crash when VCF FA field contained missing values
o Fix for a bug in callAmplificationsInLowPurity that can cause a wrong
chromosome percentile
Changes in version 1.18.0
-------------------------
SIGNIFICANT USER-VISIBLE CHANGES
o callAlterations: columns C and seg.mean now provide the values of the
segment listed in seg.id. This changes the behaviour in cases where the
gene contains breakpoints and thus multiple segments overlap (#112)
BUGFIXES
o Fix for bug that can result in crash when candidates were provided in
runAbsoluteCN and test.purity, max.ploidy and/or min.ploidy were set to
non-default values
Changes in version 1.16.0
-------------------------
NEW FEATURES
o Flag segments in poor quality regions
o predictSomatic now provides log-likelihood of allelic balance
(ALLELIC.IMBALANCE column) for each variant
o Added readLogRatioFile function to read GATK4 DenoiseReadCounts
output files containing log2 tumor/normal ratios
o Added readSegmentationFile function to read GATK4 ModelSegments
output files containing segmented log2 tumor/normal ratios
o Added callAmplificationsInLowPurity to call gene-level
amplifications in samples < 10% purity
o Dx.R now reports chromosomal instability scores
(available also via callCIN function)
o Dx.R supports deconstructSigs 1.9.0 and COSMIC signatures v3.
To run both v2 and v3, simply add --signature_databases
signatures.exome.cosmic.v3.may2019:signatures.cosmic to Dx.R
SIGNIFICANT USER-VISIBLE CHANGES
o Made filterTargets and createTargetWeights defunct
o setMappingBiasVcf now returns a data.frame
o Best practices vignette now HTML-based
o Renamed normal.panel.vcf.file in setMappingBiasVcf to mapping.bias.file;
in 1.18, setMappingBiasVcf will not accept a VCF anymore but requires
a precomputed mapping bias RDS file.
o calculateIntervalWeights now directly called by createNormalDatabase and
information included in the normalDB RDS object. This function is thus
deprecated.
o Column gene.mean in callAlterations output now weighted by interval
weights when available
o Changed default of min.target.width in preprocessIntervals from 10 to 100
(#73)
o replaced write.table with data.table::fwrite to automatically support
producing gzipped output (requires data.table 1.12.4, #106)
o Coverage.R now gzips BAM file coverage (requires data.table 1.12.4, #106)
o Output coverage files now code FALSE as 0 and TRUE as 1
o PureCN.R now bgzips and tabix indexes VCFs when --vcf is provided
BUGFIXES
o Fix for bug in CCF calculation resulting in NAs (happens in high
coverage samples, early mutations with > 1 allele copy number)
o Fix for a bug in preprocessIntervals when small targets
(< min.target.width) were present
o Fix for a bug in callMutationBurden when VCF contained indels
(#82)
o Die with helpful error message when snp.blacklist import failed
o Check input segmentation files for missing values resulting in crash
o Fixed a crash in Varscan2 produced VCFs when ALT field missed ref counts
(#109)
Changes in version 1.14.0
-------------------------
NEW FEATURES
o support for copynumber package and its multisample segmentation
o beta support for PSCBS weighting
o support for gene symbol filtering in FilterCallableLoci.R
(e.g. --exclude "^HLA")
o added segmentationHclust function that clusters provided segmentation
using log2-ratio and B-allele frequencies
o min.target.width and small.targets in preprocessIntervals to
automatically deal with too small targets
o calculate confidence intervals for cellular fractions
o throw additional warning when sample is flagged as NON-ABERRANT and
pick the diploid solution with lowest purity as best
SIGNIFICANT USER-VISIBLE CHANGES
o significant runtime improvements
o callLOH now reports all segments, even if there are no informative
SNPs since some users were not aware that segments are missing from
this output. Use keep.no.snp.segments = FALSE to restore old behaviour.
o more detailed output of callLOH
o renamed num.snps.segment to num.snps in callAlterations output
BUGFIXES
o fixed crash in PureCN.R when gene symbols are missing from
interval file
o fixed crash in runAbsoluteCN with matched normals and high test.purity
minimum (#74)
Changes in version 1.12.0
-------------------------
NEW FEATURES
o normalDB does not need input normal coverage files anymore after creation
(so the resulting normalDB.rds file can be moved)
o base quality filtering can be turned off by setting min.base.quality to 0
or NULL
o possible to change the POP_AF info field name
o possible to change POP_AF cutoff to set a high germline prior
o possible to change min.cosmic.cnt and max.homozygous.loss in PureCN.R
o set number of cores in PureCN.R (thanks Brad)
SIGNIFICANT USER-VISIBLE CHANGES
o renamed reptimingbinsize to reptimingwidth in IntervalFile.R, added
this feature to preprocessIntervals
o clarified "targets" vs. "intervals"; whenever something affects both
on-target and off-target, it is now called "intervals". When only targets,
e.g. in annotateTargets, "targets" was kept.
o made gc.gene.file defunct
o new default for min.cosmic.cnt = 6 (instead of 4)
BUGFIXES
o catch various input problems and provide better error messages instead
of crashing
o stranded input BED files do not cause problems anymore
o fixed a bug when only a single local optimum was tested (happens only
when users copy the examples that restrict the search space to avoid
long runtimes)
o added missing QC flag to predictSomatic VCF annotation
Changes in version 1.10.0
-------------------------
SIGNIFICANT USER-VISIBLE CHANGES
o New normal database format
o Runtime performance improvements (skip unlikely local optima, support for
BiocParallel in runAbsoluteCN, pre-calculation of mapping bias)
o Support for replication timing scores in coverage normalization
o More accurate confidence intervals in callMutationBurden
o More accurate copy numbers for high-level amplifications
o Very low or high coverage samples are now by default dropped in normal
database creation (less than 25% or more than 4 times the median sample
coverage)
o Improved support for third-party upstream tools like GATK4 (experimental)
o More checks for wrong or sub-optimal input and providing suggestions for
fixing those issues
o Gibbs sampling of log tumor/normal coverage error rate
o Better imputation of mapping bias (instead of smoothing
over neighboring variants in the sample, smooth over neighboring SNPs
in the pool of normals - only available when pre-calculated)
o Experimental support for indels
o Code cleanups (switch to testthat, removed several obsolete and minor
features)
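The coverage-based sample filter mentioned above can be illustrated with a
short Python sketch. It assumes one mean coverage value per normal sample and
keeps samples between 25% and 4 times the median sample coverage. The
function and parameter names are hypothetical, and how the package handles
samples exactly on a threshold is not specified here.

```python
from statistics import median


def keep_normals(mean_coverage, low=0.25, high=4.0):
    """Drop samples with very low or very high coverage (illustrative)."""
    med = median(mean_coverage.values())
    # Keep samples within [low * median, high * median]; boundary handling
    # is an assumption of this sketch.
    return {
        sample: cov
        for sample, cov in mean_coverage.items()
        if low * med <= cov <= high * med
    }
```

With coverages of 100, 110, 90, 20 and 500, the median is 100, so the samples
at 20 (below 25) and 500 (above 400) would be dropped.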
API CHANGES
o renamed gc.gene.file to interval.file since it now provides more than
GC-content and gene symbols
o plotAbs ids changed to id (this function now only plots a single
purity/ploidy solution)
o changed default of runAbsoluteCN max.logr.sdev to 0.6 (from 0.75)
o createTargetWeights does not require tumor coverages anymore
o calculateGCContentByInterval was renamed to preprocessIntervals
o renamed plot.gc.bias to plot.bias in correctCoverageBias since it now
also includes replication timing
o added calculateMappingBiasVcf to pre-compute mapping bias from a
panel of normal VCF, thus avoiding time-consuming loading and parsing
of huge VCFs
o max.homozygous.loss now defines the maximum fraction of a chromosome
lost, not the whole genome, to avoid wrong maximum likelihood solutions with
completely deleted chromosome arms
Changes in version 1.8.0
------------------------
NEW FEATURES
o Support for off-target reads in copy number normalization and segmentation
o Added mutation burden calculation
o More robust mapping bias estimation
o Added support for CNVkit coverage files (*.cnn, *.cnr)
o IntervalFile.R can annotate targets with gene symbols and automatically
convert chromosome naming styles
o Better artifact filtering by using normalDB more efficiently
o Support for mappability scores
o Coverage calculation can now include duplicates
o calculateBamCoverageByInterval now provides fragment counts and
duplication rates
o findBestNormal pooling now fragment count based, not coverage based
o Experimental support for GATK4
o predictSomatic now reports posterior probabilities of minor segment copy
numbers, flags segments if copy numbers are unreliable
o Targets can be annotated with multiple gene symbols (comma separated)
o Code cleanups (switch to GRanges where possible, switch to optparse in
command line tools)
API CHANGES
o Due to novel optimizations of provided bait intervals, we highly recommend
regenerating the interval files and normal databases and recalculating all
coverages from BAM files
o New functions: annotateTargets, callMutationBurden
o Defunct functions: createSNPBlacklist, getDiploid, autoCurateResults,
readCoverageGatk
o min.normals defaults to 2 (changed from 4) in setMappingBiasVcf
o normalDB.min.coverage defaults to 0.25 (changed from 0.2) in filterTargets
o log.ratio.calibration defaults to 0.1 (from 0.25) in runAbsoluteCN; now
relative to purity, not log-ratio noise
o Removed gc.data from filterTargets since gc_bias is now added to tumor
coverage
o dropped purecn.output from correctCoverageBias (no two-pass anymore)
o Coverage.R argument --gatkcoverage renamed to --coverage
o Dropped GC-normalization functionality in NormalDB, since this is
now conveniently done in Coverage.R
o Renamed PureCN.R --outdir argument to --out. Can now specify a file
prefix as in GATK. Filenames are thus not forced to sample id anymore.
If --out is a directory, it will behave like before and will use
out/sampleid_suffix as filename.