From b38e424b0952ded1537c62f11f1bf67e0fa31a75 Mon Sep 17 00:00:00 2001
From: Andrew Whitwham
Date: Mon, 21 Jun 2021 13:57:09 +0100
Subject: [PATCH 1/7] Summer 2021 update.

---
 NEWS | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 88 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 4a6c90cf1..6c71ce837 100644
--- a/NEWS
+++ b/NEWS
@@ -4,12 +4,97 @@ Noteworthy changes in release a.b
 Features and Updates
 --------------------
 
-* New method `hts_idx_nseq` returns the number of contigs covered by reads
-  from an index structure.
-
 * In case a PG header line has multiple ID tags supplied by other
   applications, the header API now selects the first one encountered as the
   identifying tag and issues a warning when detecting subsequent ID tags.
+  (#1256; fixed samtools/samtools#1393)
+
+* VCF header reading function (vcf_hdr_read) no longer tries to download a
+  remote index file by default.
+  (#1266; fixes #380)
+
+* Transparently treat FASTQ as unmapped data as if it was SAM, BAM or CRAM.
+  (#1156)
+
+* Add GCP requester pays bucket access. Thanks to @indraniel.
+  (#1255)
+
+* Make mpileup's overlap removal choose a random sequence.
+  (#1273; fixes samtools/bcftools#1459)
+
+* Permit platform specific BAQ parameters. It will also select long-read
+  parameters for read lengths bigger than 1kb. This helps bcftools mpileup call
+  SNPs on PacBio CCS reads.
+  (#1275)
+
+* Improve bcf_remove_allele_set. Fixes a bug that stopped iteration over
+  alleles prematurely, marks removed alleles as 'missing' and does automatic
+  lazy unpacking.
+  (#1288; fixes #1259)
+
+* [CRAM] Improve compression metrics for unsorted files. This improves the
+  choice of codecs when handling unsorted data.
+  (#1291)
+
+* Initialise the missing entries of the linear index in reverse order.
+  Populates empty linear index entries from the right rather from the left.
+  Two functions have been renamed, dump_index is now idx_dump and
+  hts_idx_save_core is idx_save_core. The new hts_bin_level function computes
+  the index level of a bin. Thanks to @carsonh for reporting the issue.
+  (#1286; fixes #486)
+
+* Related to the above, the new method, hts_idx_nseq, returns the total number
+  of contigs from an index.
+  (#1295 and #1299)
+
+* Added bracket handling to bcf_hdr_parse_line. Thanks to Alberto Casas Orrtiz.
+
+Build changes
+-------------
+
+These are compiler, configuration and makefile based changes.
+
+* Added a curl/curl.h check to configure and improved INSTALL documentation on
+  build options. Thanks to John Marshall.
+  (#1265)
+
+* Some fixes to address GCC 11.1 warnings.
+  (#1280, #1284, #1285; fixes #1283)
+
+* Support building HTSlib in a separate directory. Thanks to John Marshall.
+  (#1277; fixes #231)
+
+Bug fixes
+---------
+
+* Remove compressBound assertions on opening bgzf files. Thanks to
+  Gurt Hulselmans for reporting the issue.
+  (#1258; fixed #1257)
+
+* Duplicate sample name error message for a VCF file now only displays the
+  duplicated name rather the entire same name list.
+  (#1262; fixes samtools/bcftools#1451)
+
+* Fix to make samtools cat work on CRAMs again.
+  (#1276; fixes samtools/samtools#1420)
+
+* Fix for double memory free in SAM header creation. Thanks to @ihsineme.
+  (#1274)
+
+* Prevent assert in bcf_sr_set_regions. Thanks to Dr K D Murray.
+  (#1270)
+
+* Fix the update of the minimum offset when unmapped reads are present. This
+  bug broke subsetting. Thanks to Daniel Cooke for reporting the issue.
+  (#1281; fixes #1279)
+
+* Fix crash in knet_open() etc stubs. Thanks to John Marshall.
+  (#1289)
+
+* Fix filter expression "cigar" on unmapped reads. Stop treating an empty
+  CIGAR string as an error. Thanks to Chang Y for reporting the issue.
+  (#1298, fixes samtools/samtools#1445)
+
 
 Noteworthy changes in release 1.12 (17th March 2021)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 543637a3fb57f4c2a7dc2568d4641fd94ca6825d Mon Sep 17 00:00:00 2001
From: Andrew Whitwham
Date: Tue, 22 Jun 2021 15:45:27 +0100
Subject: [PATCH 2/7] Minor edit removing internal function renames.

---
 NEWS | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 6c71ce837..fa32ec0e3 100644
--- a/NEWS
+++ b/NEWS
@@ -38,9 +38,8 @@ Features and Updates
 
 * Initialise the missing entries of the linear index in reverse order.
   Populates empty linear index entries from the right rather from the left.
-  Two functions have been renamed, dump_index is now idx_dump and
-  hts_idx_save_core is idx_save_core. The new hts_bin_level function computes
-  the index level of a bin. Thanks to @carsonh for reporting the issue.
+  The new hts_bin_level function computes the index level of a bin.
+  Thanks to @carsonh for reporting the issue.
   (#1286; fixes #486)
 
 * Related to the above, the new method, hts_idx_nseq, returns the total number

From d925c24380fc4f2656a61c410cb231b157b5f613 Mon Sep 17 00:00:00 2001
From: Valeriu Ohan
Date: Fri, 25 Jun 2021 15:49:19 +0100
Subject: [PATCH 3/7] Add MinGW 32-bit support news item.

---
 NEWS | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/NEWS b/NEWS
index fa32ec0e3..20bc15485 100644
--- a/NEWS
+++ b/NEWS
@@ -63,6 +63,9 @@ These are compiler, configuration and makefile based changes.
 * Support building HTSlib in a separate directory. Thanks to John Marshall.
   (#1277; fixes #231)
 
+* Support building HTSlib on MinGW 32-bit environments. Thanks to John Marshall.
+  (#1301)
+
 Bug fixes
 ---------
 

From 381908efb4c07679329d5099d9189e384a453781 Mon Sep 17 00:00:00 2001
From: Andrew Whitwham
Date: Tue, 29 Jun 2021 11:56:52 +0100
Subject: [PATCH 4/7] Minor grammar changes.

---
 NEWS | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/NEWS b/NEWS
index 20bc15485..1b646a6a8 100644
--- a/NEWS
+++ b/NEWS
@@ -16,34 +16,34 @@ Features and Updates
 * Transparently treat FASTQ as unmapped data as if it was SAM, BAM or CRAM.
   (#1156)
 
-* Add GCP requester pays bucket access. Thanks to @indraniel.
+* Added GCP requester pays bucket access. Thanks to @indraniel.
   (#1255)
 
-* Make mpileup's overlap removal choose a random sequence.
+* Made mpileup's overlap removal choose a random sequence.
   (#1273; fixes samtools/bcftools#1459)
 
-* Permit platform specific BAQ parameters. It will also select long-read
+* Now permits platform specific BAQ parameters. This also selects long-read
   parameters for read lengths bigger than 1kb. This helps bcftools mpileup call
   SNPs on PacBio CCS reads.
   (#1275)
 
-* Improve bcf_remove_allele_set. Fixes a bug that stopped iteration over
+* Improved bcf_remove_allele_set. This fixes a bug that stopped iteration over
   alleles prematurely, marks removed alleles as 'missing' and does automatic
   lazy unpacking.
   (#1288; fixes #1259)
 
-* [CRAM] Improve compression metrics for unsorted files. This improves the
+* [CRAM] Improved compression metrics for unsorted files. This improves the
   choice of codecs when handling unsorted data.
   (#1291)
 
-* Initialise the missing entries of the linear index in reverse order.
+* Now initialisea the missing entries of the linear index in reverse order.
   Populates empty linear index entries from the right rather from the left.
   The new hts_bin_level function computes the index level of a bin.
   Thanks to @carsonh for reporting the issue.
   (#1286; fixes #486)
 
-* Related to the above, the new method, hts_idx_nseq, returns the total number
-  of contigs from an index.
+* Related to the above, the new method, hts_idx_nseq, now returns the total
+  number of contigs from an index.
   (#1295 and #1299)
 
 * Added bracket handling to bcf_hdr_parse_line. Thanks to Alberto Casas Orrtiz.
@@ -60,16 +60,17 @@ These are compiler, configuration and makefile based changes.
 * Some fixes to address GCC 11.1 warnings.
   (#1280, #1284, #1285; fixes #1283)
 
-* Support building HTSlib in a separate directory. Thanks to John Marshall.
+* Supports building HTSlib in a separate directory. Thanks to John Marshall.
   (#1277; fixes #231)
 
-* Support building HTSlib on MinGW 32-bit environments. Thanks to John Marshall.
+* Supports building HTSlib on MinGW 32-bit environments. Thanks to
+  John Marshall.
   (#1301)
 
 Bug fixes
 ---------
 
-* Remove compressBound assertions on opening bgzf files. Thanks to
+* Removed compressBound assertions on opening bgzf files. Thanks to
   Gurt Hulselmans for reporting the issue.
   (#1258; fixed #1257)
 
@@ -80,20 +81,20 @@ Bug fixes
 * Fix to make samtools cat work on CRAMs again.
   (#1276; fixes samtools/samtools#1420)
 
-* Fix for double memory free in SAM header creation. Thanks to @ihsineme.
+* Fix for a double memory free in SAM header creation. Thanks to @ihsineme.
   (#1274)
 
 * Prevent assert in bcf_sr_set_regions. Thanks to Dr K D Murray.
   (#1270)
 
-* Fix the update of the minimum offset when unmapped reads are present. This
+* Fixed the update of the minimum offset when unmapped reads are present. This
   bug broke subsetting. Thanks to Daniel Cooke for reporting the issue.
   (#1281; fixes #1279)
 
-* Fix crash in knet_open() etc stubs. Thanks to John Marshall.
+* Fixed crash in knet_open() etc stubs. Thanks to John Marshall.
   (#1289)
 
-* Fix filter expression "cigar" on unmapped reads. Stop treating an empty
+* Fixed filter expression "cigar" on unmapped reads. Stop treating an empty
   CIGAR string as an error. Thanks to Chang Y for reporting the issue.
   (#1298, fixes samtools/samtools#1445)
 

From 10f9afe18ac5abaf82649dd60091d0d95f486370 Mon Sep 17 00:00:00 2001
From: Andrew Whitwham
Date: Tue, 29 Jun 2021 14:05:28 +0100
Subject: [PATCH 5/7] Update NEWS

Co-authored-by: John Marshall
---
 NEWS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index 1b646a6a8..5a60f9441 100644
--- a/NEWS
+++ b/NEWS
@@ -54,8 +54,8 @@ Build changes
 These are compiler, configuration and makefile based changes.
 
 * Added a curl/curl.h check to configure and improved INSTALL documentation on
-  build options. Thanks to John Marshall.
-  (#1265)
+  build options. Thanks to Melanie Kirsche and John Marshall.
+  (#1265; fixes #1261)
 
 * Some fixes to address GCC 11.1 warnings.
   (#1280, #1284, #1285; fixes #1283)

From f0c0c966934e27b2a1df6626afcfb95535badb2c Mon Sep 17 00:00:00 2001
From: Valeriu Ohan
Date: Wed, 30 Jun 2021 15:21:50 +0100
Subject: [PATCH 6/7] News corrections and improvements.

---
 NEWS | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/NEWS b/NEWS
index 5a60f9441..dd6dedf16 100644
--- a/NEWS
+++ b/NEWS
@@ -12,41 +12,42 @@ Features and Updates
 * VCF header reading function (vcf_hdr_read) no longer tries to download a
   remote index file by default.
   (#1266; fixes #380)
- 
+
 * Transparently treat FASTQ as unmapped data as if it was SAM, BAM or CRAM.
   (#1156)
 
 * Added GCP requester pays bucket access. Thanks to @indraniel.
   (#1255)
- 
+
 * Made mpileup's overlap removal choose a random sequence.
   (#1273; fixes samtools/bcftools#1459)
- 
+
 * Now permits platform specific BAQ parameters. This also selects long-read
   parameters for read lengths bigger than 1kb. This helps bcftools mpileup call
   SNPs on PacBio CCS reads.
   (#1275)
- 
+
 * Improved bcf_remove_allele_set. This fixes a bug that stopped iteration over
   alleles prematurely, marks removed alleles as 'missing' and does automatic
   lazy unpacking.
   (#1288; fixes #1259)
- 
+
 * [CRAM] Improved compression metrics for unsorted files. This improves the
   choice of codecs when handling unsorted data.
   (#1291)
- 
-* Now initialisea the missing entries of the linear index in reverse order.
+
+* Now initialises the missing entries of the linear index in reverse order.
   Populates empty linear index entries from the right rather from the left.
   The new hts_bin_level function computes the index level of a bin.
   Thanks to @carsonh for reporting the issue.
   (#1286; fixes #486)
- 
+
 * Related to the above, the new method, hts_idx_nseq, now returns the total
   number of contigs from an index.
   (#1295 and #1299)
- 
-* Added bracket handling to bcf_hdr_parse_line. Thanks to Alberto Casas Orrtiz.
+
+* Added bracket handling to bcf_hdr_parse_line. Thanks to Alberto Casas Ortiz.
+  (#1240)
 
 Build changes
 -------------
@@ -70,30 +71,34 @@ These are compiler, configuration and makefile based changes.
 Bug fixes
 ---------
 
-* Removed compressBound assertions on opening bgzf files. Thanks to 
+* Fixed hts_itr_query() et al region queries: fixed bug introduced in
+  HTSlib 1.12, which led to iterators producing very few reads for some
+  queries (especially for larger target regions) when unmapped reads were
+  present. HTSlib 1.11 had a related problem in which iterators would omit
+  a few unmapped reads that should have been produced; cf #1142.
+  Thanks to Daniel Cooke for reporting the issue.
+  (#1281; fixes #1279)
+
+* Removed compressBound assertions on opening bgzf files. Thanks to
   Gurt Hulselmans for reporting the issue.
-  (#1258; fixed #1257) 
+  (#1258; fixed #1257)
 
 * Duplicate sample name error message for a VCF file now only displays the
   duplicated name rather the entire same name list.
   (#1262; fixes samtools/bcftools#1451)
- 
+
 * Fix to make samtools cat work on CRAMs again.
   (#1276; fixes samtools/samtools#1420)
- 
+
 * Fix for a double memory free in SAM header creation. Thanks to @ihsineme.
   (#1274)
- 
+
 * Prevent assert in bcf_sr_set_regions. Thanks to Dr K D Murray.
   (#1270)
- 
-* Fixed the update of the minimum offset when unmapped reads are present. This
-  bug broke subsetting. Thanks to Daniel Cooke for reporting the issue.
-  (#1281; fixes #1279)
- 
+
 * Fixed crash in knet_open() etc stubs. Thanks to John Marshall.
   (#1289)
- 
+
 * Fixed filter expression "cigar" on unmapped reads. Stop treating an empty
   CIGAR string as an error. Thanks to Chang Y for reporting the issue.
   (#1298, fixes samtools/samtools#1445)

From 76fdb62acd27a34ee6778de1d4021f1b03173c6c Mon Sep 17 00:00:00 2001
From: Rob Davies
Date: Mon, 5 Jul 2021 14:36:20 +0100
Subject: [PATCH 7/7] More NEWS wording tweaks

Add a little more explanation, and try to describe the outcome of some
of the changes for end users.
---
 NEWS | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index dd6dedf16..7b60c913e 100644
--- a/NEWS
+++ b/NEWS
@@ -13,18 +13,21 @@ Features and Updates
   remote index file by default.
   (#1266; fixes #380)
 
-* Transparently treat FASTQ as unmapped data as if it was SAM, BAM or CRAM.
+* Support reading and writing FASTQ format in the same way as SAM, BAM or CRAM.
+  Records read from a FASTQ file will be treated as unmapped data.
   (#1156)
 
 * Added GCP requester pays bucket access. Thanks to @indraniel.
   (#1255)
 
-* Made mpileup's overlap removal choose a random sequence.
+* Made mpileup's overlap removal choose which copy to remove at random instead
+  of always removing the second one. This avoids strand bias in experiments
+  where the +ve and -ve strand reads always appear in the same order.
   (#1273; fixes samtools/bcftools#1459)
 
-* Now permits platform specific BAQ parameters. This also selects long-read
-  parameters for read lengths bigger than 1kb. This helps bcftools mpileup call
-  SNPs on PacBio CCS reads.
+* It is now possible to use platform specific BAQ parameters. This also
+  selects long-read parameters for read lengths bigger than 1kb, which helps
+  bcftools mpileup call SNPs on PacBio CCS reads.
   (#1275)
 
 * Improved bcf_remove_allele_set. This fixes a bug that stopped iteration over
@@ -32,21 +35,27 @@ Features and Updates
   lazy unpacking.
   (#1288; fixes #1259)
 
-* [CRAM] Improved compression metrics for unsorted files. This improves the
+* Improved compression metrics for unsorted CRAM files. This improves the
   choice of codecs when handling unsorted data.
   (#1291)
 
-* Now initialises the missing entries of the linear index in reverse order.
-  Populates empty linear index entries from the right rather from the left.
-  The new hts_bin_level function computes the index level of a bin.
+* Linear index entries for empty intervals are now initialised with the file
+  offset in the next non-empty interval instead of the previous one. This
+  may reduce the amount of data iterators have to discard before reaching
+  the desired region, when the starting location is in a sequence gap.
   Thanks to @carsonh for reporting the issue.
   (#1286; fixes #486)
 
-* Related to the above, the new method, hts_idx_nseq, now returns the total
+* A new hts_bin_level API function has been added, to compute the level of a
+  given bin in the binning index.
+  (#1286)
+
+* Related to the above, a new API method, hts_idx_nseq, now returns the total
   number of contigs from an index.
   (#1295 and #1299)
 
-* Added bracket handling to bcf_hdr_parse_line. Thanks to Alberto Casas Ortiz.
+* Added bracket handling to bcf_hdr_parse_line, for use with ##META lines.
+  Thanks to Alberto Casas Ortiz.
   (#1240)
 
 Build changes