[CVE-2024-28757] Prevent billion laughs attacks in isolated external parser (part of #839) #842

Merged: hartwork merged 2 commits into master from issue-839-billion-laughs-isolated-external-parser on Mar 7, 2024

@hartwork (Member) commented Mar 5, 2024

Part of #839

CC (in alphabetical order) @catenacyber @RMJ10 @Snild-Sony

@hartwork hartwork added this to the 2.6.2 milestone Mar 5, 2024
@hartwork hartwork mentioned this pull request Mar 6, 2024
28 tasks
@Snild-Sony (Contributor)

I would like a lot more info in the fix's commit message: how does the problem happen, and why does the commit fix it?

Also, is it even valid to run an external parser without having parsed a single byte of the parent document? Is there a valid usecase for that, or should it just be disallowed?

@hartwork (Member, Author) commented Mar 6, 2024

> I would like a lot more info in the fix's commit message: how does the problem happen, and why does the commit fix it?

@Snild-Sony that's a valid point. I'm a bit shy with prose that goes into public Git history forever, only to find out a week later that something was off. I'll look into it 👍

> Also, is it even valid to run an external parser without having parsed a single byte of the parent document? Is there a valid usecase for that, or should it just be disallowed?

Those are good questions. The API allows it, and the current fuzzers do exactly that, which is what put a spotlight on it. Some XML IDE could maybe do that to parse a known-DTD file even without a user, not sure.

@Snild-Sony (Contributor)

> I'm a bit shy with prose that goes into public Git history forever

As someone who does a bunch of debugging in unfamiliar code written by others, I'll say that good commit messages are extremely helpful in understanding why some piece of code is the way it is.

Think of it as "this is what I believed at the time of making this change". If it turns out to be wrong, that's a good thing, because it's a strong signal that the change itself might also need adjustment.

@hartwork (Member, Author) commented Mar 6, 2024

> > I'm a bit shy with prose that goes into public Git history forever
>
> As someone who does a bunch of debugging in unfamiliar code written by others, I'll say that good commit messages are extremely helpful in understanding why some piece of code is the way it is.
>
> Think of it as "this is what I believed at the time of making this change". If it turns out to be wrong, that's a good thing, because it's a strong signal that the change itself might also need adjustment.

@Snild-Sony I agree, but that picture leaves out public perception and reputation. If I'm wrong, I'd rather be wrong in private. Could be cultural.

@RMJ10 (Contributor) commented Mar 6, 2024

Would it be acceptable to reference (and preferably summarise) #839? That would at least give future debuggers (future you!) somewhere to start.

@hartwork hartwork force-pushed the issue-839-billion-laughs-isolated-external-parser branch from 20bb257 to 58431b0 Compare March 6, 2024 22:37
@hartwork (Member, Author) commented Mar 6, 2024

@Snild-Sony @RMJ10 you're right, I have now extended the commit message as follows:

    lib/xmlparse.c: Detect billion laughs attack with isolated external parser
    
    When parsing DTD content with code like ..
    
      XML_Parser parser = XML_ParserCreate(NULL);
      XML_Parser ext_parser = XML_ExternalEntityParserCreate(parser, NULL, NULL);
      enum XML_Status status = XML_Parse(ext_parser, doc, (int)strlen(doc), XML_TRUE);
    
    .. there are 0 bytes accounted as direct input and all input from `doc` accounted
    as indirect input.  Now function accountingGetCurrentAmplification cannot calculate
    the current amplification ratio as "(direct + indirect) / direct", and it did refuse
    to divide by 0 as one would expect, but it returned 1.0 for this case to indicate
    no amplification over direct input.  As a result, billion laughs attacks from
    DTD-only input were not detected with this isolated way of using an external parser.
    
    The new approach is to assume direct input of length not 0 but 22 -- derived from
    ghost input "<!ENTITY a SYSTEM 'b'>", the shortest possible way to include an external
    DTD --, and do the usual "(direct + indirect) / direct" math with "direct := 22".
    
    GitHub issue #839 has more details on this issue and its origin in ClusterFuzz
    finding 66812.
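
The arithmetic in that commit message can be sketched in a few lines of C. This is only an illustration under assumptions, not the code from `lib/xmlparse.c`: the names `byte_count`, `GHOST_INPUT_LEN`, `amplification_before_fix` and `amplification_after_fix` are hypothetical, and only the formula "(direct + indirect) / direct", the old 1.0 fallback and the constant 22 are taken from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type and names, chosen for illustration only. */
typedef unsigned long long byte_count;

/* Length of the ghost input "<!ENTITY a SYSTEM 'b'>" (22 bytes),
   computed at compile time without the terminating NUL. */
enum { GHOST_INPUT_LEN = sizeof("<!ENTITY a SYSTEM 'b'>") - 1 };

/* Old behavior as described: with 0 direct bytes the ratio cannot be
   computed, so 1.0 ("no amplification") is reported instead. */
double amplification_before_fix(byte_count direct, byte_count indirect) {
  if (direct == 0)
    return 1.0;
  return (double)(direct + indirect) / (double)direct;
}

/* New behavior as described: with 0 direct bytes, pretend that 22
   bytes of direct input were seen and do the usual math. */
double amplification_after_fix(byte_count direct, byte_count indirect) {
  const byte_count effective_direct
      = (direct != 0) ? direct : (byte_count)GHOST_INPUT_LEN;
  const double ratio
      = (double)(effective_direct + indirect) / (double)effective_direct;
  assert(ratio >= 1.0); /* output is never smaller than direct input */
  return ratio;
}
```

With 0 direct bytes and 2200 indirect bytes, the first function reports 1.0, so DTD-only input never looks amplified. The second reports (22 + 2200) / 22 = 101, which a limit check on the ratio can then act on.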

hartwork added 2 commits March 6, 2024 23:41
@hartwork hartwork force-pushed the issue-839-billion-laughs-isolated-external-parser branch from 58431b0 to 072eca0 Compare March 6, 2024 22:41
@Snild-Sony (Contributor) left a comment

That's a great commit message, making both the problem and the solution very clear.

@hartwork hartwork merged commit 5026213 into master Mar 7, 2024
68 checks passed
@hartwork hartwork changed the title from "Prevent billion laughs attacks in isolated external parser (part of #839)" to "[CVE-REQUESTED] Prevent billion laughs attacks in isolated external parser (part of #839)" Mar 8, 2024
@hartwork hartwork deleted the issue-839-billion-laughs-isolated-external-parser branch March 8, 2024 11:56
@carnil commented Mar 10, 2024

CVE-2024-28757 seems related to this issue.

@hartwork (Member, Author)

@carnil I confirm, found it in my mailbox from ~7 hours ago.

@hartwork hartwork changed the title from "[CVE-REQUESTED] Prevent billion laughs attacks in isolated external parser (part of #839)" to "[CVE-2024-28757] Prevent billion laughs attacks in isolated external parser (part of #839)" Mar 10, 2024
@hartwork (Member, Author)

@carnil PS: Releasing 2.6.2 with this fix on Wednesday 2024-03-13 UTC+1 evening is the current plan — could be sooner or later (..) — but that's the plan.

@hartwork (Member, Author)

@carnil 2.6.2 is out by now

sgunin pushed a commit to sgunin/oe-openembedded-core-contrib that referenced this pull request Mar 17, 2024
Picked patch from libexpat/libexpat#842
which is referenced in the NVD CVE report.

Signed-off-by: Peter Marko <peter.marko@siemens.com>
Signed-off-by: Steve Sakoman <steve@sakoman.com>
halstead pushed a commit to yoctoproject/poky that referenced this pull request Mar 25, 2024
Picked patch from libexpat/libexpat#842
which is referenced in the NVD CVE report.

(From OE-Core rev: c02175e97348836429cecbfad15d89be040bbd92)

Signed-off-by: Peter Marko <peter.marko@siemens.com>
Signed-off-by: Steve Sakoman <steve@sakoman.com>
jpuhlman pushed a commit to MontaVista-OpenSourceTechnology/poky that referenced this pull request Apr 2, 2024
Source: poky
MR: 132404, 131331
Type: Security Fix
Disposition: Merged from poky
ChangeID: fe9d4cb
Description:

Picked patch from libexpat/libexpat#842
which is referenced in the NVD CVE report.

(From OE-Core rev: c02175e97348836429cecbfad15d89be040bbd92)

Signed-off-by: Peter Marko <peter.marko@siemens.com>
Signed-off-by: Steve Sakoman <steve@sakoman.com>
Signed-off-by: Jeremy A. Puhlman <jpuhlman@mvista.com>
rkausch-fender added a commit to cclsoftware/libexpat that referenced this pull request Oct 18, 2024
commit 88b3ed553d8ad335559254863a33360d55b9f1d6
Merge: 29ef43a0 f9cfbb7f
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Sep 4 12:20:17 2024 +0200

    Merge pull request #896 from libexpat/issue-894-prepare-release

    Prepare release 2.6.3 (part of #894, ETA 2024-09-04)

commit f9cfbb7fcedabc40326dc287ed35f83de406224d
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Sep 1 21:14:19 2024 +0200

    Sync file headers

commit 156d4bab9d54ba230927b00d89b1b3bf8a9bec15
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Sep 1 21:03:50 2024 +0200

    Set release date for 2.6.3

commit 8707e02e1f20d5a4bb9101773b7b8736303cc029
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Sep 1 20:59:53 2024 +0200

    Bump version to 2.6.3

commit 93e5971fb513f2390e011f8444be6de18e480fe4
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Sep 1 20:59:06 2024 +0200

    Bump version info from 10:2:9 to 10:3:9

    See https://verbump.de/ for what these numbers do

commit 71e487dc1b87f1e70bf28117158cfef07fc6105e
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Sep 1 20:25:10 2024 +0200

    Changes: Document changes in release Expat 2.6.3

commit 29ef43a0bab633b41e71dd6d900fff5f6b3ad5e4
Merge: b8a7dca4 9bf0f2c1
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Sep 3 18:18:03 2024 +0200

    Merge pull request #892 from libexpat/taiyou-nextscaffoldpart-overflow

    [CVE-2024-45492] lib: Detect integer overflow in function `nextScaffoldPart` (fixes #889)

commit b8a7dca4670973347892cfc452b24d9001dcd6f5
Merge: e5d6bf01 8e439a99
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Sep 3 18:17:46 2024 +0200

    Merge pull request #891 from libexpat/taiyou-dtdcopy-malloc-overflow

    [CVE-2024-45491] lib: Detect integer overflow in `dtdCopy` (fixes #888)

commit e5d6bf015ee531df0a8751baa618d25b2de73a7c
Merge: 234654c5 2db23301
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Sep 3 18:17:32 2024 +0200

    Merge pull request #890 from libexpat/taiyou-xml-parsebuffer-len

    [CVE-2024-45490] lib: Reject negative len for `XML_ParseBuffer` (fixes #887)

commit 234654c58b3833b881b5a32fc8b57b09d39ecd81
Merge: ed4090af c158a62e
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Sep 1 16:45:32 2024 +0200

    Merge pull request #886 from berkayurun/master

    Remove `XML_DTD` guards before `is_param` accesses

commit 8e439a9947e9dc80a395c0c7456545d8d9d9e421
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Aug 19 22:34:13 2024 +0200

    lib: Detect integer overflow in dtdCopy

    Reported by TaiYou

commit 2db233019f551fe4c701bbbc5eb0fa58ff349daa
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Aug 25 19:09:51 2024 +0200

    doc: Document that XML_Parse/XML_ParseBuffer reject "len < 0"

commit c12f039b8024d6b9a11c20858370495ff6ff5245
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Aug 20 22:57:12 2024 +0200

    tests: Cover "len < 0" for both XML_Parse and XML_ParseBuffer

commit 5c1a31642e243f4870c0bd1f2afc7597976521bf
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Aug 19 22:26:07 2024 +0200

    lib: Reject negative len for XML_ParseBuffer

    Reported by TaiYou

commit 9bf0f2c16ee86f644dd1432507edff94c08dc232
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Aug 19 22:37:16 2024 +0200

    lib: Detect integer overflow in function nextScaffoldPart

    Reported by TaiYou

commit c158a62e57f6b2e8c61a7735dc458a22181ca0d7
Author: Berkay Eren Ürün <berkay.ueruen@siemens.com>
Date:   Wed Aug 21 12:49:34 2024 +0200

    Remove XML_DTD guards before is_param accesses

    As a part of the ENTITY struct, is_param is correctly initialized even
    when XML_DTD is not defined. This can be seen in the 'lookup' function,
    which sets all the ENTITY memory, including the is_param flag, to zero
    during the ENTITY creation. Additionally, is_param can only be assigned
    XML_TRUE when XML_DTD is defined, which makes XML_DTD checks before
    is_param accesses not necessary.

    Currently, some of the is_param accesses are guarded by the XML_DTD and
    some not. This commit removes all XML_DTD guards that are meant for
    is_param accesses.

commit ed4090af841ebd8a7b2e367280407d74e748a7dd
Merge: b1ab4745 35753a8c
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Aug 21 19:57:51 2024 +0200

    Merge pull request #885 from libexpat/fix-in-code-comment-typo

    Fix typo in a code comment

commit 35753a8ccccaac17387b58934049a709e501e46a
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Aug 20 21:15:17 2024 +0200

    lib: Fix typo in a code comment

commit b1ab4745f39bcbee65fe2ef9b4fa3b9fa46b06ce
Merge: dfa90b81 05735b8f
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Aug 19 20:30:43 2024 +0200

    Merge pull request #884 from libexpat/dependabot/github_actions/codespell-project/actions-codespell-2.1

    Actions(deps): Bump codespell-project/actions-codespell from 2.0 to 2.1

commit 05735b8f68d4e6cc2f7f664450467ec2dae32057
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Aug 19 12:32:12 2024 +0000

    Actions(deps): Bump codespell-project/actions-codespell from 2.0 to 2.1

    Bumps [codespell-project/actions-codespell](https://github.com/codespell-project/actions-codespell) from 2.0 to 2.1.
    - [Release notes](https://github.com/codespell-project/actions-codespell/releases)
    - [Commits](https://github.com/codespell-project/actions-codespell/compare/94259cd8be02ad2903ba34a22d9c13de21a74461...406322ec52dd7b488e48c1c4b82e2a8b3a1bf630)

    ---
    updated-dependencies:
    - dependency-name: codespell-project/actions-codespell
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit dfa90b81153a18555d78d4441697aba68ee61a68
Merge: a8898cdb 61886f8d
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Aug 12 22:54:04 2024 +0200

    Merge pull request #883 from libexpat/dependabot/github_actions/actions/upload-artifact-4.3.6

    Actions(deps): Bump actions/upload-artifact from 4.3.5 to 4.3.6

commit 61886f8dbddf39500a137bfe3ab605a3d10d8ea5
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Aug 12 12:16:07 2024 +0000

    Actions(deps): Bump actions/upload-artifact from 4.3.5 to 4.3.6

    Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.5 to 4.3.6.
    - [Release notes](https://github.com/actions/upload-artifact/releases)
    - [Commits](https://github.com/actions/upload-artifact/compare/89ef406dd8d7e03cfd12d9e0a4a378f454709029...834a144ee995460fba8ed112a2fc961b36a5ec5a)

    ---
    updated-dependencies:
    - dependency-name: actions/upload-artifact
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit a8898cdb1e0c9cc8c0ec1f7468601e6b8fd243a0
Merge: 6b3f93c6 1f9da870
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Aug 6 20:21:17 2024 +0200

    Merge pull request #882 from libexpat/dependabot/github_actions/actions/upload-artifact-4.3.5

    Actions(deps): Bump actions/upload-artifact from 4.3.4 to 4.3.5

commit 1f9da870e197dc91360e68c5d6b1ea3ef8652f48
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Aug 5 12:38:32 2024 +0000

    Actions(deps): Bump actions/upload-artifact from 4.3.4 to 4.3.5

    Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.4 to 4.3.5.
    - [Release notes](https://github.com/actions/upload-artifact/releases)
    - [Commits](https://github.com/actions/upload-artifact/compare/0b2256b8c012f0828dc542b3febcab082c67f72b...89ef406dd8d7e03cfd12d9e0a4a378f454709029)

    ---
    updated-dependencies:
    - dependency-name: actions/upload-artifact
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 6b3f93c6caa0308455beeced0268cfae04df3584
Merge: 0b6ab7cd e19e5233
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Jul 13 20:19:27 2024 +0200

    Merge pull request #880 from libexpat/readme-promote-call-for-help

    `README.md`: Promote call for help in the Changes file

commit e19e52331b20a15699a3ea26f2ebc301c80fd20c
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Jul 13 17:48:38 2024 +0200

    README.md: Promote call for help in the Changes file

    Documentation on the used Markdown extension:
    - https://github.blog/changelog/2023-12-14-new-markdown-extension-alerts-provide-distinctive-styling-for-significant-content/
    - https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#alerts

commit 0b6ab7cd20a73cfb6bc1424a4770d54c247e5938
Merge: feb65c62 09f8eddd
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Jul 13 02:01:40 2024 +0200

    Merge pull request #879 from libexpat/autotools-sync-cmake-files

    autotools: Sync CMake templates with CMake 3.28

commit feb65c625ce5dcc3b5459e75c01bbf9afb214589
Merge: 4c3f8641 0e9863e4
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Jul 13 00:46:25 2024 +0200

    Merge pull request #878 from libexpat/dependabot/github_actions/actions/upload-artifact-4.3.4

    Actions(deps): Bump actions/upload-artifact from 4.3.3 to 4.3.4

commit 09f8eddd8ef2150f0f2b8d07d3723b13e4f07b75
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Jul 13 00:13:25 2024 +0200

    autotools: Sync CMake templates with CMake 3.28

commit 0e9863e4832a9ee7d024c40ce5690bdb2763b5c0
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Jul 8 12:49:27 2024 +0000

    Actions(deps): Bump actions/upload-artifact from 4.3.3 to 4.3.4

    Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.3 to 4.3.4.
    - [Release notes](https://github.com/actions/upload-artifact/releases)
    - [Commits](https://github.com/actions/upload-artifact/compare/65462800fd760344b1a7b4382951275a0abb4808...0b2256b8c012f0828dc542b3febcab082c67f72b)

    ---
    updated-dependencies:
    - dependency-name: actions/upload-artifact
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 4c3f8641a71625860928641ca87b393827274028
Merge: bfd178c6 9269f9e6
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Jun 17 21:01:31 2024 +0200

    Merge pull request #876 from libexpat/dependabot/github_actions/actions/checkout-4.1.7

    Actions(deps): Bump actions/checkout from 4.1.6 to 4.1.7

commit 9269f9e68ff8dc469f2726a39bbda843f3d243d2
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Jun 17 12:07:06 2024 +0000

    Actions(deps): Bump actions/checkout from 4.1.6 to 4.1.7

    Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.6 to 4.1.7.
    - [Release notes](https://github.com/actions/checkout/releases)
    - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
    - [Commits](https://github.com/actions/checkout/compare/a5ac7e51b41094c92402da3b24376905380afc29...692973e3d937129bcbf40652eb9f2f61becf3332)

    ---
    updated-dependencies:
    - dependency-name: actions/checkout
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit bfd178c6350bbb46dbc71dae299a1d531468bf8a
Merge: 322ab5ff 1ee828c7
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue May 21 01:45:03 2024 +0200

    Merge pull request #874 from libexpat/dependabot/github_actions/actions/checkout-4.1.6

    Actions(deps): Bump actions/checkout from 4.1.5 to 4.1.6

commit 1ee828c752e102f6305749a88eae77ab7837975a
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon May 20 12:54:15 2024 +0000

    Actions(deps): Bump actions/checkout from 4.1.5 to 4.1.6

    Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.5 to 4.1.6.
    - [Release notes](https://github.com/actions/checkout/releases)
    - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
    - [Commits](https://github.com/actions/checkout/compare/44c2b7a8a4ea60a981eaca3cf939b5f4305c123b...a5ac7e51b41094c92402da3b24376905380afc29)

    ---
    updated-dependencies:
    - dependency-name: actions/checkout
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 322ab5ff7a0edd8de5300ce04731c0b4390f8827
Merge: 2703c85b 4f44375e
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun May 19 15:58:40 2024 +0200

    Merge pull request #873 from libexpat/fix-coverage-ci

    `coverage.yml`: Fix for image `ubuntu-22.04` of `20240514.2.0`

commit 4f44375e3f84c354b4ced2073fce44e3b0168376
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun May 19 14:13:10 2024 +0200

    coverage.yml: Fix for image ubuntu-22.04 of 20240514.2.0

commit 2703c85b0a84d7dbc56c9d733476a631a5edd4fd
Merge: 3ec84e45 197275e3
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed May 15 20:11:25 2024 +0200

    Merge pull request #871 from libexpat/dependabot/github_actions/actions/checkout-4.1.5

    Actions(deps): Bump actions/checkout from 4.1.4 to 4.1.5

commit 197275e39110e43fde1a7c7939d288001ff01a50
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon May 13 21:16:53 2024 +0000

    Actions(deps): Bump actions/checkout from 4.1.4 to 4.1.5

    Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.4 to 4.1.5.
    - [Release notes](https://github.com/actions/checkout/releases)
    - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
    - [Commits](https://github.com/actions/checkout/compare/0ad4b8fadaa221de15dcec353f45205ec38ea70b...44c2b7a8a4ea60a981eaca3cf939b5f4305c123b)

    ---
    updated-dependencies:
    - dependency-name: actions/checkout
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 3ec84e457fc0ae8026ee43504c0e3b60645afab7
Merge: da88e9a4 b0e67383
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon May 13 23:16:12 2024 +0200

    Merge pull request #872 from libexpat/fix-clang-format-ci

    `lib/siphash.h`: Apply clang-format 18.1.5

commit b0e673830e963c092cd1a2b71691c344ef2b0be9
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon May 13 22:00:56 2024 +0200

    lib/siphash.h: Apply clang-format 18.1.5

commit da88e9a4443dba37daea123f5e51b18a2611e03b
Merge: b58b3871 1253273f
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun May 5 15:21:03 2024 +0200

    Merge pull request #869 from dag-erling/des/non-gnu-sed

    Drop dependency on GNU sed

commit 1253273fe4cfb322efd1834debda84118edff950
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Thu May 2 17:20:26 2024 +0200

    Drop dependency on GNU sed.

    GNU sed supports `-i` (in-place editing) with an optional suffix for the
    backup copy.  Non-GNU implementations also support `-i`, but the suffix
    is not optional.  Replacing all occurrences of naked `-i` with `-i.bak`
    ensures our scripts work equally well with both.

commit b58b38719507f6700dd5a7898d88c36c8b95154a
Merge: c40938db 59295bef
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat May 4 16:06:51 2024 +0200

    Merge pull request #863 from dag-erling/des/fix-xmltest-log

    Don't require dos2unix.

commit c40938dbe02282e5975157b502c47da21350627e
Merge: e0cf7c85 54400c2e
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat May 4 16:04:40 2024 +0200

    Merge pull request #870 from dag-erling/des/sizeof-void-p

    Simplify handling of `SIZEOF_VOID_P`

commit 54400c2e0c86acce56e13acc260cc8bf7ee28719
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Fri May 3 01:20:31 2024 +0200

    autotools: Simplify handling of `SIZEOF_VOID_P`.

commit 59295befca67d9af43e5c4338076b7fdde91996a
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Mon Apr 22 16:37:00 2024 +0200

    fix-xmltest-log.sh: Rewrite in pure sed.

    This removes the need for installing dos2unix in development and CI
    environments.

commit e0cf7c85446261fb5d9cd11747b675ca731678ff
Merge: 9cbdb916 2f205773
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri May 3 00:52:18 2024 +0200

    Merge pull request #868 from dag-erling/des/update-ci

    Fix various CI issues

commit 2f2057733f837c4dbfa1c9a770b70b5ab7bc37bc
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Thu May 2 23:17:16 2024 +0200

    github-ci: Drop requirement for GNU coreutils.

commit 58539734545fbd2e271f6cb3f798c253390024cd
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Thu May 2 21:36:16 2024 +0200

    github-ci: Remove obsolete comments referencing Travis CI.

commit 2083722b95e2f6decba4113be3c5fa499de7dcc6
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Wed May 1 12:24:11 2024 +0200

    github-ci: Install docbook-xml.

    Some tests use the xmlwf documentation as sample input.  It is written in
    DocBook, and the tests appear to be failing because they try to fetch it
    at run time, which is not allowed.  Work around this by installing it in
    advance.

commit 26be7c3f117dd7488bfa3091f5cfdc488accff74
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Wed May 1 12:16:29 2024 +0200

    github-ci: Enable exhaustive branch analysis in cppcheck job.

commit d69aee5244a270417fa32535361b966c394a586d
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Wed May 1 11:57:50 2024 +0200

    github-ci: Switch macOS tests over to supported releases.

commit 85e01c40046684d9464637addd62f144b2959ed2
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Wed May 1 11:55:02 2024 +0200

    github-ci: Drop requirement for GNU find.

commit 8e7c117e8ff7ae7ec1224780b95153b61152f748
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Wed May 1 11:54:07 2024 +0200

    github-ci: Don't die if  already exists.

commit 9cbdb916de2a7bd1aa649e55efc38d2426680359
Merge: c82ca17b 73627c74
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed May 1 21:47:08 2024 +0200

    Merge pull request #865 from Ferenc-/fix-define-for-linux-syscall

    Fix `check_c_source_compiles` of `HAVE_SYSCALL_GETRANDOM`

commit 73627c7456db80db5d537f178132978b686c6fb7
Author: Ferenc Géczi <ferenc.gm@gmail.com>
Date:   Sat Apr 27 00:00:01 2024 +0000

    Use feature test macro for syscall prototype

    In order to cover the largest number of glibc and musl libc versions,
    without warnings, the decision here is to use `_GNU_SOURCE`,
    even if it enables a larger than necessary feature set.

    A feature macro is needed, because otherwise the `check_c_source_compiles`
    for `HAVE_SYSCALL_GETRANDOM` fails in cases when for example
    the default compiler flags include `-std=c99`:

    ````
    src.c:6:13: error: implicit declaration of function ‘syscall’ [-Wimplicit-function-declaration]
        6 |             syscall(SYS_getrandom, NULL, 0, 0);
          |             ^~~~~~~
    ````
    But this check should pass, as `SYS_getrandom` is available,
    only the declaration of `syscall` in `unistd.h` is conditional behind a macro.

    The exact minimal public macros, for enabling this are in `features.h`, and
    are version dependent.

    According to [5.04](
    https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/Archive/man-pages-5.04.tar.gz)
    and older versions of the `man 2 syscall` page,
    the recommended feature test macro is `_GNU_SOURCE`.
    Later on in [5.05](
    https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/Archive/man-pages-5.05.tar.gz)
    this statement changed to name a smaller minimal feature set.
    Namely, up to `glibc 2.18` it is `_BSD_SOURCE || _SVID_SOURCE`,
    but after that `_DEFAULT_SOURCE` is recommended,
    and `_BSD_SOURCE || _SVID_SOURCE` is deprecated and emits a warning in later versions.
    Regardless of that the `_GNU_SOURCE` is still fully supported
    in every version and is suitable for our purposes.

    The musl libc doesn't use `_SVID_SOURCE` at all, but `_BSD_SOURCE` always works,
    plus in some newer versions `_DEFAULT_SOURCE` also sets `_BSD_SOURCE`,
    but `_GNU_SOURCE` covers the largest set of versions and is unlikely
    to be deprecated in the future.

    Further info about feature test macros:

    In glibc:
    https://www.gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html

    In musl libc under the `Feature Test Macros Supported by musl` section:
    https://musl.libc.org/doc/1.1.24/manual.html

    Signed-off-by: Ferenc Géczi <ferenc.gm@gmail.com>

commit c82ca17b61de7e37d3b748b55b33adfd1a091392
Merge: 1f1ac992 2ddf759f
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Apr 30 01:09:12 2024 +0200

    Merge pull request #866 from libexpat/dependabot/github_actions/actions/checkout-4.1.4

    Actions(deps): Bump actions/checkout from 4.1.3 to 4.1.4

commit 1f1ac992bffc4ba2bc123fb020a971c63bf3566d
Merge: a2b44bd2 33ed8172
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Apr 29 22:40:37 2024 +0200

    Merge pull request #867 from libexpat/dependabot/github_actions/actions/upload-artifact-4.3.3

    Actions(deps): Bump actions/upload-artifact from 4.3.2 to 4.3.3

commit 33ed8172fb5e70df2f460b90d6d1adad17e64274
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Apr 29 12:06:24 2024 +0000

    Actions(deps): Bump actions/upload-artifact from 4.3.2 to 4.3.3

    Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.2 to 4.3.3.
    - [Release notes](https://github.com/actions/upload-artifact/releases)
    - [Commits](https://github.com/actions/upload-artifact/compare/1746f4ab65b179e0ea60a494b83293b640dd5bba...65462800fd760344b1a7b4382951275a0abb4808)

    ---
    updated-dependencies:
    - dependency-name: actions/upload-artifact
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 2ddf759f591fc10ed2241cb5bd77e04f3c66ef67
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Apr 29 12:06:18 2024 +0000

    Actions(deps): Bump actions/checkout from 4.1.3 to 4.1.4

    Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.3 to 4.1.4.
    - [Release notes](https://github.com/actions/checkout/releases)
    - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
    - [Commits](https://github.com/actions/checkout/compare/1d96c772d19495a3b5c517cd2bc0cb401ea0529f...0ad4b8fadaa221de15dcec353f45205ec38ea70b)

    ---
    updated-dependencies:
    - dependency-name: actions/checkout
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit a2b44bd2d28a6638c1cece06882b093a2fafd05f
Merge: 9134d0d6 abb1c4a3
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Apr 27 23:19:35 2024 +0200

    Merge pull request #864 from dag-erling/des/tests-readme

    tests: Convert README to Markdown and update.

commit abb1c4a3805967febb94d4d375f1be22b5580b5d
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Tue Apr 23 11:28:24 2024 +0200

    tests: Convert README to Markdown and update.

commit 9134d0d6e073ed505f26bdb8c23747961c575e79
Merge: 46062b60 1b6a4f19
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Apr 23 03:17:22 2024 +0200

    Merge pull request #861 from dag-erling/des/mkdir-m4

    Ensure that the m4 directory always exists.

commit 46062b600d1f375c3358492b97273783d0f49df7
Merge: 8fd3e86f 886f7ea7
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Apr 23 03:09:57 2024 +0200

    Merge pull request #862 from dag-erling/des/squiggle

    Protect us against Emacs users.

commit 8fd3e86f28016cad3d4e41ea9cadaf28c180f8b6
Merge: 4c64d111 f16b7aa1
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Apr 22 23:41:55 2024 +0200

    Merge pull request #859 from libexpat/dependabot/github_actions/actions/upload-artifact-4.3.2

    Actions(deps): Bump actions/upload-artifact from 4.3.1 to 4.3.2

commit 4c64d11182655fa5c5c916c3ff9ac47be575afc2
Merge: e48ab660 cd363842
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Apr 22 23:40:34 2024 +0200

    Merge pull request #860 from libexpat/dependabot/github_actions/actions/checkout-4.1.3

    Actions(deps): Bump actions/checkout from 4.1.2 to 4.1.3

commit 886f7ea7b7c8dd51b8a6f294009bdd794d399856
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Mon Apr 22 16:37:53 2024 +0200

    Protect us against Emacs users.

commit 1b6a4f19c6f97643b41a0640180084567fca8c2f
Author: Dag-Erling Smørgrav <des@des.dev>
Date:   Mon Apr 22 16:34:07 2024 +0200

    Ensure that the m4 directory always exists.

commit cd363842314fa24c85482796c25b7b3b89f607d5
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Apr 22 12:20:20 2024 +0000

    Actions(deps): Bump actions/checkout from 4.1.2 to 4.1.3

    Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.2 to 4.1.3.
    - [Release notes](https://github.com/actions/checkout/releases)
    - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
    - [Commits](https://github.com/actions/checkout/compare/9bb56186c3b09b4f86b1c65136769dd318469633...1d96c772d19495a3b5c517cd2bc0cb401ea0529f)

    ---
    updated-dependencies:
    - dependency-name: actions/checkout
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit f16b7aa1ec0e6f394601a99713f2650d8ca08864
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Apr 22 12:20:09 2024 +0000

    Actions(deps): Bump actions/upload-artifact from 4.3.1 to 4.3.2

    Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.1 to 4.3.2.
    - [Release notes](https://github.com/actions/upload-artifact/releases)
    - [Commits](https://github.com/actions/upload-artifact/compare/5d5d22a31266ced268874388b861e4b58bb5c2f3...1746f4ab65b179e0ea60a494b83293b640dd5bba)

    ---
    updated-dependencies:
    - dependency-name: actions/upload-artifact
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit e48ab6604f04ee71f311b3e8170379b48ce77c2b
Merge: ef50fb20 d420c32d
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Apr 7 22:30:04 2024 +0200

    Merge pull request #851 from libexpat/autotools-sync-cmake-files

    autotools: Sync CMake templates with CMake 3.27

commit ef50fb208b9b448ae76fbecb464b3c15f2f4e865
Merge: 059a4aa7 26f7cbbf
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Thu Apr 4 18:31:28 2024 +0200

    Merge pull request #855 from libexpat/issue-854-cmake-fix-use-of-check-symbol-exists

    cmake: Fix check for symbols `size_t` and `off_t` (fixes #854)

commit 059a4aa71df6c0f5ef90306db64e2ee546696db7
Merge: 5434a740 13e84bb3
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Thu Apr 4 01:09:38 2024 +0200

    Merge pull request #856 from libexpat/fix-main

    Fix `main()` to `main(void)`

commit 26f7cbbf4abc019fc814fc4e960864c3e7ccea0f
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Apr 3 00:44:10 2024 +0200

    cmake: Fix check for symbols `size_t` and `off_t`

    The two issues with the previous approach were that:

    1. `check_symbol_exists` would store "1" or "" into
       variable `off_t` rather than string "off_t", and

    2. (`check_symbol_exists` would not find `off_t` or
       `size_t` on modern Linux).

    Was reported with NetBSD 9.3.

    `size_t` is part of C99 (which Expat requires), so
    only the `off_t` half remains.

commit 5434a74081a39bb43fb95cdec291dbb61cb9acd0
Merge: d450c1b4 2b8492d6
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Apr 3 23:51:03 2024 +0200

    Merge pull request #853 from bluhm/find-path

    Always provide path to find(1) for portability

commit 13e84bb374237fde1815cf0315a8b93c2c46e16b
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Apr 3 02:20:46 2024 +0200

    Fix `main()` to `main(void)`

commit 2b8492d622913829ff69c803acfbe2ae6673f471
Author: Alexander Bluhm <alexander.bluhm@gmx.net>
Date:   Mon Apr 1 22:56:49 2024 +0200

    Always provide path to find.

    Running find without a path is a GNU extension.  GNU find uses the
    current directory as the starting point in this case.  Better to always
    use an explicit . in build scripts to support find on other systems.

commit d420c32d67d1b4829883ad05b9f024b3f4ed649a
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Mar 29 22:17:56 2024 +0100

    autotools: Sync CMake templates with CMake 3.27

commit d450c1b439ab56bfd04def5146d2c728666e0f00
Merge: d04f8ef8 2874a26e
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Mar 23 19:37:22 2024 +0100

    Merge pull request #741 from libexpat/drop-support-msvc-2017

    [>=2024-04-02] Drop support for Visual Studio 15 2017

commit 2874a26eeb33a2c6f1ad05ef94373d453eaf98d7
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Mar 23 17:52:37 2024 +0100

    win32/build_expat_iss.bat: Add missing "-A Win32" for Visual Studio 16 2019

commit f8fb85ec8c1f3749c3ac8d932ce8909f0051632a
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Sep 2 21:33:35 2023 +0200

    Drop support for Visual Studio 15 2017

commit d04f8ef8874c2829b624cb6c832bb5b9f7ce8dc2
Merge: a59c3edf 571a62c8
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Mar 19 22:50:07 2024 +0100

    Merge pull request #850 from libexpat/dependabot/github_actions/actions/checkout-4.1.2

    Actions(deps): Bump actions/checkout from 4.1.1 to 4.1.2

commit 571a62c8f5f979232e96691b29223dbae8bb603b
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Mar 18 12:03:05 2024 +0000

    Actions(deps): Bump actions/checkout from 4.1.1 to 4.1.2

    Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.1 to 4.1.2.
    - [Release notes](https://github.com/actions/checkout/releases)
    - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
    - [Commits](https://github.com/actions/checkout/compare/b4ffde65f46336ab88eb53be808477a3936bae11...9bb56186c3b09b4f86b1c65136769dd318469633)

    ---
    updated-dependencies:
    - dependency-name: actions/checkout
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit a59c3edffa54a77b8d7b268ef527da541076ca6a
Merge: fa75b965 91116dfa
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Mar 17 23:00:53 2024 +0100

    Merge pull request #849 from libexpat/allow-triggering-github-actions-workflows-manually

    Allow triggering GitHub Actions workflows manually

commit 91116dfa7eabcb0b3e115dd5b028fd0a39caf3b9
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Mar 17 15:28:36 2024 +0100

    Allow triggering GitHub Actions workflows manually

    Some already had "workflow_dispatch" enabled.

commit fa75b96546c069d17b8f80d91e0f4ef0cde3790d
Merge: 5bf8ed66 8548bc03
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Mar 13 17:37:37 2024 +0100

    Merge pull request #843 from libexpat/issue-838-prepare-release

    Prepare release 2.6.2 (part of #838, ETA 2024-03-13)

commit 8548bc03fdb887c8720f01e95440f1406bd15ffa
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Mar 9 16:30:09 2024 +0100

    Changes: Add call for help

commit 86d6052c5ec63fad3ef564cbdef1e1ac3d18f30e
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Mar 8 13:17:12 2024 +0100

    Set release date for 2.6.2

commit 13cff445fa383733da1d6011d593d0ecd9d456ab
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Mar 8 13:12:02 2024 +0100

    Bump version to 2.6.2

commit 557f1255f9f431646e27b84b1a2fe63380352b70
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Mar 8 13:07:38 2024 +0100

    Bump version info from 10:1:9 to 10:2:9

    See https://verbump.de/ for what these numbers do

commit 98ee1baef80da27e148e55d4b2ae96143716064f
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Mar 8 13:07:00 2024 +0100

    Changes: Document changes in release Expat 2.6.2

commit 5bf8ed66efd5672da960f89fceed3fda92f10cee
Merge: 50262138 c32ed081
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Mar 13 14:22:48 2024 +0100

    Merge pull request #847 from TomasKorbar/doc-makefile

    [2.6.1] Fix DOCBOOK_TO_MAN variable use in doc Makefile

commit c32ed08191f21206ef14c30852f9846ef6977276
Author: Tomas Korbar <tkorbar@redhat.com>
Date:   Wed Mar 13 11:01:52 2024 +0100

    Fix DOCBOOK_TO_MAN variable use in doc Makefile

    Not using quotes causes problems when DOCBOOK_TO_MAN contains
    both a command and an argument.

commit 5026213864ba1a11ef03ba2e8111af8654e9404d
Merge: 27525ada 072eca0b
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Thu Mar 7 22:14:09 2024 +0100

    Merge pull request #842 from libexpat/issue-839-billion-laughs-isolated-external-parser

    Prevent billion laughs attacks in isolated external parser (part of #839)

commit 27525adabdac2a48a5ce5d2f5588fce741a0c8e3
Merge: 6bcb9915 565ab44a
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Thu Mar 7 20:24:13 2024 +0100

    Merge pull request #841 from libexpat/issue-839-reject-direct-parameter-entity-recursion

    Reject direct parameter entity recursion (part of #839)

commit 072eca0b72373da103ce15f8f62d1d7b52695454
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Mar 5 22:43:03 2024 +0100

    tests: Cover amplification tracking for isolated external parser

commit 1d50b80cf31de87750103656f6eb693746854aa8
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Mar 4 23:49:06 2024 +0100

    lib/xmlparse.c: Detect billion laughs attack with isolated external parser

    When parsing DTD content with code like ..

      XML_Parser parser = XML_ParserCreate(NULL);
      XML_Parser ext_parser = XML_ExternalEntityParserCreate(parser, NULL, NULL);
      enum XML_Status status = XML_Parse(ext_parser, doc, (int)strlen(doc), XML_TRUE);

    .. there are 0 bytes accounted as direct input, and all input from `doc` is
    accounted as indirect input.  With 0 direct bytes, function
    accountingGetCurrentAmplification cannot calculate the current amplification
    ratio as "(direct + indirect) / direct".  It did refuse to divide by 0, as one
    would expect, but it returned 1.0 for this case to indicate no amplification
    over direct input.  As a result, billion laughs attacks from DTD-only input
    were not detected with this isolated way of using an external parser.

    The new approach is to assume direct input of length not 0 but 22 -- derived from
    ghost input "<!ENTITY a SYSTEM 'b'>", the shortest possible way to include an external
    DTD --, and do the usual "(direct + indirect) / direct" math with "direct := 22".

    GitHub issue #839 has more details on this issue and its origin in ClusterFuzz
    finding 66812.
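
The accounting change described in the commit message above can be sketched in isolation. This is a minimal illustration, not the code from lib/xmlparse.c: the names `sketch_amplification_old` and `sketch_amplification_new` and the plain `unsigned long long` counters are assumptions made for this example. Only the 22-byte ghost input and the "(direct + indirect) / direct" formula are taken from the commit message.

```c
#include <assert.h>

/* Ghost input from the commit message; all names here are hypothetical. */
#define SKETCH_GHOST_INPUT "<!ENTITY a SYSTEM 'b'>"
#define SKETCH_GHOST_LEN (sizeof(SKETCH_GHOST_INPUT) - 1)

/* The commit message states that the ghost input is 22 bytes long. */
static_assert(SKETCH_GHOST_LEN == 22, "ghost input should be 22 bytes");

/* Before the fix: 0 direct bytes were reported as "no amplification". */
static double
sketch_amplification_old(unsigned long long direct,
                         unsigned long long indirect) {
  if (direct == 0)
    return 1.0;
  return (double)(direct + indirect) / (double)direct;
}

/* After the fix: 0 direct bytes are treated as 22 bytes of ghost input. */
static double
sketch_amplification_new(unsigned long long direct,
                         unsigned long long indirect) {
  const unsigned long long effective = direct ? direct : SKETCH_GHOST_LEN;
  return (double)(effective + indirect) / (double)effective;
}
```

With 0 direct and 2200 indirect bytes, the old function yields 1.0 while the new one yields (22 + 2200) / 22 = 101.0, so a threshold on the ratio has something to act on. For non-zero direct input both functions agree.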

commit 565ab44a42ac1c973fcc2f47ae0a9ba23b6c5f07
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Mar 3 01:35:35 2024 +0100

    tests: Cover rejection of direct parameter entity recursion

commit a4c86a395ee447c59175c762af3d17f7107b2261
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Mar 3 02:19:58 2024 +0100

    lib/xmlparse.c: Reject directly recursive parameter entities

commit 6bcb991574a084b3ed6d79c4ef9d78d14285d85f
Merge: a590b2d5 8f75c536
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Mar 1 20:15:09 2024 +0100

    Merge pull request #837 from libexpat/extend-2-6-1-change-log

    Add missing #821 #824 to 2.6.1 change log

commit 8f75c536158fd480f950925333fcd0730bac0142
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Thu Feb 29 22:09:53 2024 +0100

    Changes: Add missing #821 #824 to 2.6.1 change log

commit a590b2d5846865412182805b853dd91d18f38c8d
Merge: 1cf882e7 58ff7c39
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Thu Feb 29 21:19:01 2024 +0100

    Merge pull request #834 from libexpat/issue-832-prepare-release

    Prepare release 2.6.1 (part of #832, ETA 2024-02-29)

commit 1cf882e79cbacbd68a406b24a3859a12033465fa
Merge: a387201c ea528347
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Thu Feb 29 20:07:11 2024 +0100

    Merge pull request #836 from libexpat/issue-828-expose-billion-laughs-api-with-xml-dtd-without-xml-ge

    Expose billion laughs API with `XML_DTD` without `XML_GE` (fixes #828)

commit 58ff7c39eae53215da9bba028c316ed85348d31b
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Feb 26 23:52:42 2024 +0100

    Sync file headers

commit fce4b9f3b300a9e5c05f6cd170dce3d4d10fe04a
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Feb 26 23:38:45 2024 +0100

    Set release date for 2.6.1

commit dfe043fe6aa8d35a3fa1d7ea96299f0925fe8114
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Feb 26 23:43:09 2024 +0100

    Bump version to 2.6.1

commit fbe7b9345b22f9d5cce61735219a253b34504000
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Feb 26 23:41:24 2024 +0100

    Bump version info from 10:0:9 to 10:1:9

    See https://verbump.de/ for what these numbers do

commit 3dc137ea05a0c108cd89c91ef68d85798920653d
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Feb 26 23:36:50 2024 +0100

    Changes: Document changes in release Expat 2.6.1

commit ea528347091bf56d4c02858da16ea4eecfb5fb2b
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Feb 28 20:29:47 2024 +0100

    doc/reference.html: Drop inaccurate statement about XML_* macros

    The statement is falsified by these macros:
    - XML_ATTR_INFO
    - XML_DTD
    - XML_GE

commit 1e028f2ef751b366ab2b4b9b682ae5691013e42e
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Feb 28 20:44:53 2024 +0100

    lib/expat.h: Expose billion laughs API for XML_DTD without XML_GE

    Regression from commit caa27198637683b15d810737bb8a6a81af19bfa5 .

commit a387201ca4fb68629b09cb2201145f7f004e736c
Merge: 9dcb74f5 0106682e
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Feb 28 00:55:34 2024 +0100

    Merge pull request #833 from libexpat/configure-ac-protect-multilib

    `configure.ac`: Protect against `expat_config.h.in` defining `SIZEOF_VOID_P`

commit 0106682ea6be456de3d850f037781d03603ff95e
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Feb 26 23:12:45 2024 +0100

    configure.ac: Protect against expat_config.h.in defining SIZEOF_VOID_P

commit 9dcb74f552d5f1604c8574665b2b37c8bc52ad2a
Merge: 5b940f4a 7e2a0da9
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Feb 26 21:41:30 2024 +0100

    Merge pull request #829 from libexpat/hide-test-only-code-behind-new-macro

    Hide test-only code behind new (internal) macro `XML_TESTING` (alternative to #826)

commit 7e2a0da9ba6581c073484ff400a98fd4037b5c73
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Feb 21 12:53:48 2024 +0100

    lib: Hide some test-only code behind new macro XML_TESTING

commit a4a420eedcc5426f67b8ed64bbf88f70fce53963
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Feb 21 12:53:03 2024 +0100

    Autotools: Turn libexpatinternal.la into standalone library

    .. so that we can now have code in say xmlparse.c that does not
    end up in libexpat.so but still runs when executing the test suite.

commit 5b940f4a650bd1d1d04dd1280d6e920547f0c580
Merge: b7e1a110 0f6b39d2
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Feb 20 20:40:41 2024 +0100

    Merge pull request #824 from libexpat/issue-821-improve-make-clean-for-configure-without-docbook

    Autotools: Re-work handling of xmlwf.1 (fixes #821)

commit 0f6b39d2f513aae16c7377fc802c25ab81551c42
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Feb 13 19:42:04 2024 +0100

    Autotools: Re-work handling of xmlwf.1

    File "doc/xmlwf.1" should not be cleaned when building with
    "./configure --without-docbook", and re-compilation of the file
    should take precedence over a pre-built copy where available.

    Also, variable CLEANFILES can be used to simplify things a bit
    in Makefile.am.

commit b7e1a1101133725c79e2170b0870cca8543e5323
Merge: 4ff4c544 dc8499f2
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Feb 13 18:30:35 2024 +0100

    Merge pull request #817 from SonyMobile/clockless-test

    tests: Replace clock counting with counting scanned bytes

commit dc8499f295bcb5e0b1ff89aa3ca5ca65ddb4ca7b
Author: Snild Dolkow <snild@sony.com>
Date:   Wed Feb 7 13:00:45 2024 +0100

    tests: Replace clock counting with scanned bytes in linear-time test

    This removes the dependency on CLOCKS_PER_SEC that prevented this test
    from running properly on some platforms, as well as the inherent
    flakiness of time measurements.

    Since later commits have introduced g_bytesScanned (and before that,
    g_parseAttempts), we can use that value as a proxy for parse time
    instead of clock().

commit fe0177cd3fe73ab2ad2e5e749a6badbbd7ec1d83
Author: Snild Dolkow <snild@sony.com>
Date:   Wed Feb 7 12:57:19 2024 +0100

    tests: Replace g_parseAttempts with g_bytesScanned

    This was used to estimate the number of scanned bytes. Just exposing
    that number directly will be more precise.

commit 4ff4c544aa2a3912afb2735d31681dfefe8a732e
Merge: 226201d1 aed1ed76
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Feb 12 14:52:18 2024 +0100

    Merge pull request #820 from libexpat/dependabot/github_actions/actions/upload-artifact-4.3.1

    Actions(deps): Bump actions/upload-artifact from 4.3.0 to 4.3.1

commit aed1ed769d01ee9e616286282d3447094f608237
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Feb 12 12:09:58 2024 +0000

    Actions(deps): Bump actions/upload-artifact from 4.3.0 to 4.3.1

    Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.0 to 4.3.1.
    - [Release notes](https://github.com/actions/upload-artifact/releases)
    - [Commits](https://github.com/actions/upload-artifact/compare/26f96dfa697d77e81fd5907df203aa23a56210a8...5d5d22a31266ced268874388b861e4b58bb5c2f3)

    ---
    updated-dependencies:
    - dependency-name: actions/upload-artifact
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 226201d10dd63e4995955b6d43c96e71437f19ca
Merge: 4033d6dc 3f60a47c
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sun Feb 11 16:45:16 2024 +0100

    Merge pull request #819 from th1722/patch-1

    Fix compiler warnings

commit 3f60a47cb5716bb810789a12ef6024c1dc448164
Author: Taichi Haradaguchi <20001722@ymail.ne.jp>
Date:   Fri Feb 9 19:28:35 2024 +0900

    Fix compiler warnings

    > In file included from ./../lib/internal.h:149,
    >                  from codepage.c:38:
    > ./../lib/expat.h:1045:5: warning: "XML_GE" is not defined, evaluates to 0 [-Wundef]
    >  1045 | #if XML_GE == 1
    >       |     ^~~~~~
    > ./../lib/internal.h:158:5: warning: "XML_GE" is not defined, evaluates to 0 [-Wundef]
    >   158 | #if XML_GE == 1
    >       |     ^~~~~~

commit 4033d6dc5797eca420bb1e109a89a3953e50036e
Merge: 849da3e3 d4f958e3
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Thu Feb 8 17:24:26 2024 +0100

    Merge pull request #818 from libexpat/fix-clang-format-ci

    Get clang-format CI back in sync

commit d4f958e345163a6028f69c0efa2f62f5189845be
Author: clang-format 18.1.0 <clang-format@tool.invalid>
Date:   Thu Feb 8 15:18:43 2024 +0100

    Mass-apply clang-format 18.1.0

commit 849da3e3fe727fccef5e96ef35482d66447f06a2
Merge: 8198e4bf 2a10e173
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Feb 6 17:49:41 2024 +0100

    Merge pull request #776 from libexpat/issue-775-prepare-release

    Prepare release 2.6.0 (part of #775, ETA is 2024-02-07)

commit 2a10e173ab6a2468ee47b9426fce274168e4ea66
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Feb 6 14:13:00 2024 +0100

    Sync file headers

commit 92f10eb800bac5477010eb4dc32005e698ba406a
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Feb 6 12:32:08 2024 +0100

    .mailmap: Add Joyce Brum and Owain Davies

commit b5ae2481b0dcd8e682fcdd7f378506e1f4901ecd
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Feb 2 19:31:02 2024 +0100

    Set release date for 2.6.0

commit 310a1977f471b6211a37c7d73e05c1a26eeaa9df
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Feb 2 19:27:52 2024 +0100

    Bump version to 2.6.0

commit b9fd46523161147c12468f8e1da2cd0235518228
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Feb 2 19:23:22 2024 +0100

    Bump version info from 9:10:8 to 10:0:9

    See https://verbump.de/ for what these numbers do

commit ae06168b64e7520b0adaa2a8b7f9837df187a17a
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Oct 21 18:07:47 2023 +0200

    Changes: Document changes in release Expat 2.6.0

commit 8198e4bfede103c11ededa0167b634d0e88c7f77
Merge: 9944b712 9c16d1c5
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Feb 6 11:09:42 2024 +0100

    Merge pull request #815 from libexpat/fix-pkg-config-file-for-static-build-on-windows

    pkg-config: Add missing `-DXML_STATIC` for Windows (alternative to #805)

commit 9c16d1c5b4959bed843bc0f1d065ad25a5e1b778
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Feb 6 00:10:09 2024 +0100

    pkg-config: Add missing -DXML_STATIC (for Windows)

    This affects the output of command "pkg-config --cflags --static expat".

commit 9944b71234d96d529cff01a80b52776d974838fd
Merge: b6243248 bc7490a4
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Feb 6 00:16:23 2024 +0100

    Merge pull request #813 from libexpat/issue-812-protect-against-closing-entities-out-of-order

    Protect against closing entities out of order (fixes #812)

commit b6243248a97026aacdf343a2bab7af53e66e1ea2
Merge: 127aa340 aba268e2
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Feb 6 00:00:00 2024 +0100

    Merge pull request #814 from libexpat/fix-make-check-for-arm64-freebsd

    tests: Fix `CLOCKS_PER_SEC` guard for arm64 FreeBSD reality

commit aba268e2c00069818a60809aed986d993f8dd397
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Feb 2 17:50:43 2024 +0100

    tests/basic_tests.c: Fix CLOCKS_PER_SEC guard for arm64 FreeBSD reality

    CLOCKS_PER_SEC turned out to be as small as 128 in practice
    on machine cfarm240.cfarm.net .

commit 127aa340d3435ba0e83ee643362419e3486eb5a1
Merge: 34b598c5 7352d303
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Wed Jan 31 01:49:59 2024 +0100

    Merge pull request #809 from libexpat/clang-format-18

    CI: Upgrade to clang-format 18

commit 7352d3035bb7881fc87e596b754e46bf35136ca9
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Jan 30 22:58:48 2024 +0100

    clang-*.yml: Fix accidental trailing whitespace

commit 37d018478161ad932405da7181af80ca9e52909c
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Jan 26 15:51:17 2024 +0100

    clang-format.yml: Bump to clang-format 18

commit 137a57808748d799558a88056a4aef4d04994320
Author: clang-format 18.1.0 <clang-format@tool.invalid>
Date:   Sat Jan 27 19:24:45 2024 +0100

    Mass-apply clang-format 18.1.0

commit c594eedfa802f85df3d2226b25ace52b902a63f4
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Jan 27 19:21:13 2024 +0100

    apply-clang-format.sh: Drop workaround for lib/siphash.h

    Does not seem needed anymore (or running the script would
    produce a diff).

commit 5d2a438af25e16297c1681539fe6d19852327320
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Sat Jan 27 19:17:25 2024 +0100

    apply-clang-format.sh: Use "git ls-files" rather than "find"

    .. and reduce difference with sibling script apply-clang-tidy.sh .

commit 34b598c5f594b015c513c73f06e7ced3323edbf1
Merge: 2becc8a8 d5b02e96
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Jan 30 22:54:37 2024 +0100

    Merge pull request #789 from SonyMobile/partial-token-perf

    Speed up parsing of big tokens

commit bc7490a4a79ab9302d9c76dcad8c0ce88f43d93f
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Jan 29 21:37:16 2024 +0100

    tests/misc_tests.c: Add regression test for closing entities out of order

commit c4208e7fd1aecb82dc9a44918df7134594560df5
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Tue Jan 30 01:34:43 2024 +0100

    lib/xmlparse.c: Protect against closing entities out of order

commit d5b02e96ab95d2a7ae0aea72d00054b9d036d76d
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Thu Nov 9 19:28:05 2023 +0100

    xmlwf: Document argument "-q"

    Rebased-and-adapted-by: Snild Dolkow <snild@sony.com>

commit 09fdf998e7cf3f8f9327e6602077791095aedd4d
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Thu Nov 9 19:14:14 2023 +0100

    xmlwf: Support disabling reparse deferral

    Rebased-and-adapted-by: Snild Dolkow <snild@sony.com>

commit 8f8aaf5c8e8a6e812dd8dadd96cf9bd044bc085a
Author: Snild Dolkow <snild@sony.com>
Date:   Fri Nov 24 16:31:56 2023 +0100

    tests: Check heuristic bypass with varying buffer fill sizes

    The bypass works on the assumption that the application uses a
    consistent fill size. Let's make some assertions about what should
    happen when the application doesn't do that -- most importantly,
    that parsing does happen eventually, and that the number of scanned
    bytes doesn't explode.

commit 182bbc350ed8b3c547133a9a44a4f30a0ba3b77e
Author: Snild Dolkow <snild@sony.com>
Date:   Mon Jan 29 16:43:32 2024 +0100

    tests: Make it clear to clang-tidy that assert_true may not return

    The key is to have __attribute__((noreturn)) somewhere that clang-tidy
    can see it. In this case, this is the _fail() function, which is
    conditionally called from the assert_true() macro.

    This will ensure that clang-tidy doesn't complain about NULL values
    that we've asserted against in tests.
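
The pattern that the commit message above describes can be sketched as follows. This is an illustration under assumptions, not the code from Expat's test suite: `sketch_fail`, `sketch_assert_true`, `SKETCH_NORETURN` and `sketch_checked_length` are hypothetical names, and the attribute is only applied for GCC-compatible compilers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; Expat's tests have their own _fail()/assert_true(). */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_NORETURN __attribute__((noreturn))
#else
#  define SKETCH_NORETURN
#endif

/* Never returns; the attribute makes that visible to static analyzers. */
static SKETCH_NORETURN void
sketch_fail(const char *file, int line, const char *msg) {
  fprintf(stderr, "%s:%d: %s\n", file, line, msg);
  abort();
}

/* Calls the noreturn function only when the condition does not hold. */
#define sketch_assert_true(cond)                                               \
  do {                                                                         \
    if (! (cond))                                                              \
      sketch_fail(__FILE__, __LINE__, "check failed: " #cond);                 \
  } while (0)

/* After the assertion, an analyzer can assume that s is not NULL. */
static size_t
sketch_checked_length(const char *s) {
  sketch_assert_true(s != NULL);
  return strlen(s);
}
```

Because the failure path ends in a function marked as not returning, an analyzer that follows the macro expansion should no longer consider the `strlen(s)` call reachable with `s == NULL`.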

commit 2becc8a81d9694bdfb5810eeafee50050d55b43a
Merge: 183270d5 b04a01d4
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Mon Jan 29 17:58:59 2024 +0100

    Merge pull request #811 from libexpat/dependabot/github_actions/actions/upload-artifact-4.3.0

    Actions(deps): Bump actions/upload-artifact from 4.2.0 to 4.3.0

commit 3d8141d26a3b01ff948e00956cb0723a89dadf7f
Author: Snild Dolkow <snild@sony.com>
Date:   Mon Nov 20 16:11:24 2023 +0100

    Bypass partial token heuristic when nearing full buffer

    ...instead of only when approaching the maximum buffer size INT/2+1.

    We'd like to give applications a chance to finish parsing a large token
    before buffer reallocation, in case the reallocation fails.

    By bypassing the reparse deferral heuristic when getting close to
    filling the buffer, we give them this chance -- if the whole token is
    present in the buffer, it will be parsed at that time.

    This may come at the cost of some extra reparse attempts. For a token
    of n bytes, these extra parses cause us to scan over a maximum of
    2n bytes (... + n/8 + n/4 + n/2 + n). Therefore, parsing of big tokens
    remains O(n) in regard to how many bytes we scan in attempts to parse. The
    cost in reality is lower than that, since the reparses that happen due
    to the bypass will affect m_partialTokenBytesBefore, delaying the next
    ratio-based reparse. Furthermore, only the first token that "breaks
    through" a buffer ceiling takes that extra reparse attempt; subsequent
    large tokens will only bypass the heuristic if they manage to hit the
    new buffer ceiling.

    Note that this cost analysis depends on the assumption that Expat grows
    its buffer by doubling it (or, more generally, grows it exponentially).
    If this changes, the cost of this bypass may increase. Hopefully, this
    would be caught by test_big_tokens_take_linear_time or the new test.

    The bypass logic assumes that the application uses a consistent fill.
    If the app increases its fill size, it may miss the bypass (and the
    normal heuristic will apply). If the app decreases its fill size, the
    bypass may be hit multiple times for the same buffer size. The very
    worst case would be to always fill half of the remaining buffer space,
    in which case parsing of a large n-byte token becomes O(n log n).

    As an added bonus, the new test case should be faster than the old one,
    since it doesn't have to go all the way to 1GiB to check the behavior.

    Finally, this change necessitated a small modification to two existing
    tests related to reparse deferral. These tests are testing the deferral
    enabled setting, and assume that reparsing will not happen for any other
    reason. By pre-growing the buffer, we make sure that this new deferral
    does not affect those test cases.
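
The 2n bound from the cost analysis in the commit message above is the sum of the series n + n/2 + n/4 + ... and can be checked numerically. The helper below is a hypothetical sketch that models only that series with integer halving. It does not reproduce the heuristic or the buffer handling in Expat.

```c
#include <assert.h>
#include <limits.h>

/*
 * Sums n + n/2 + n/4 + ... using integer halving, i.e. the series named
 * in the commit message.  Hypothetical helper, not part of Expat.
 */
static unsigned long long
sketch_scanned_bytes_bound(unsigned long long token_bytes) {
  unsigned long long total = 0;
  /* The sum stays below 2 * token_bytes, so this rules out overflow. */
  assert(token_bytes <= ULLONG_MAX / 2);
  for (unsigned long long attempt = token_bytes; attempt > 0; attempt /= 2)
    total += attempt;
  return total;
}
```

For a token of 1024 bytes the sum is 2047, which is just below 2n = 2048. This relies on the same assumption as the commit message, namely that the buffer grows by doubling.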

commit 60b74209899a67d426d208662674b55a5eed918c
Author: Snild Dolkow <snild@sony.com>
Date:   Wed Oct 4 16:00:14 2023 +0200

    Bypass partial token heuristic when close to maximum buffer size

    For huge tokens, we may end up in a situation where the partial token
    parse deferral heuristic demands more bytes than Expat's maximum buffer
    size (currently ~half of INT_MAX) could fit.

    INT_MAX/2 is 1024 MiB on most systems. Clearly, a token of 950 MiB could
    fit in that buffer, but the reparse threshold might be such that
    callProcessor() will defer it, allowing the app to keep filling the
    buffer until XML_GetBuffer() eventually returns a memory error.

    By bypassing the heuristic when we're getting close to the maximum
    buffer size, it will once again be possible to parse tokens in the size
    range INT_MAX/2/ratio < size < INT_MAX/2 reliably.

    We subtract the last buffer fill size as a way to detect that the next
    XML_GetBuffer() call has a risk of returning a memory error -- assuming
    that the application is likely to keep using the same (or smaller) fill.

    We subtract XML_CONTEXT_BYTES because that's the maximum amount of bytes
    that could remain at the start of the buffer, preceding the partial
    token. Technically, it could be fewer bytes, but XML_CONTEXT_BYTES is
    normally small relative to INT_MAX, and is much simpler to use.
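
The two subtractions described in the commit message above suggest a check along the following lines. This is a sketch under assumptions: the function name, the parameter names and the exact comparison are hypothetical and are not taken from lib/xmlparse.c. The parameter `context_bytes` stands in for XML_CONTEXT_BYTES, and `max_buffer_bytes` for Expat's maximum buffer size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true once the buffered bytes get so close to the maximum buffer
 * size that another fill of the same size might not fit.  All names are
 * hypothetical; context_bytes stands in for XML_CONTEXT_BYTES.
 */
static bool
sketch_near_max_buffer(size_t have_bytes, size_t last_fill_bytes,
                       size_t context_bytes, size_t max_buffer_bytes) {
  assert(have_bytes <= max_buffer_bytes);
  /* Avoid unsigned underflow when the margins exceed the maximum. */
  if (last_fill_bytes >= max_buffer_bytes
      || context_bytes >= max_buffer_bytes - last_fill_bytes)
    return true;
  return have_bytes >= max_buffer_bytes - last_fill_bytes - context_bytes;
}
```

With a maximum of 1200 bytes, a last fill of 100 bytes and 24 context bytes, the threshold is 1076 bytes: 1000 buffered bytes would still be deferred, while 1100 buffered bytes would bypass the heuristic.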

    Co-authored-by: Sebastian Pipping <sebastian@pipping.org>

commit ad9c01be8ee5d3d5cac2bfd3949ad764541d35e7
Author: Snild Dolkow <snild@sony.com>
Date:   Thu Oct 26 13:55:02 2023 +0200

    Make external entity parser inherit partial token heuristic setting

    The test is essentially a copy of the existing test for the setter,
    adapted to run on the external parser instead of the original one.

    Suggested-by: Sebastian Pipping <sebastian@pipping.org>
    CI-fighting-assistance-by: Sebastian Pipping <sebastian@pipping.org>

commit 8ddd8e86aa446d02eb8d398972d3b10d4cad908a
Author: Snild Dolkow <snild@sony.com>
Date:   Fri Sep 29 10:14:59 2023 +0200

    Try to parse even when incoming len is zero

    If the reparse deferral setting has changed, it may be possible to
    finish a token.

commit 1d3162da8a85a398ab451aadd6c2ad19587e5a68
Author: Snild Dolkow <snild@sony.com>
Date:   Mon Sep 11 15:31:24 2023 +0200

    Add app setting for enabling/disabling reparse heuristic

    Suggested-by: Sebastian Pipping <sebastian@pipping.org>
    CI-fighting-assistance-by: Sebastian Pipping <sebastian@pipping.org>

commit 09957b8ced725b96a95acff150facda93f03afe1
Author: Snild Dolkow <snild@sony.com>
Date:   Thu Oct 26 10:41:00 2023 +0200

    Allow XML_GetBuffer() with len=0 on a fresh parser

    len=0 was previously OK only if there had already been a non-zero call.
    It makes sense to allow an application to work the same way on a
    newly-created parser, and not have to care if its incoming buffer
    happens to be 0.

commit f1eea784d0429bc4813a3d66a8e24e6c9df56be7
Author: Snild Dolkow <snild@sony.com>
Date:   Mon Nov 6 09:22:48 2023 +0100

    tests: Add max_slowdown info in test_big_tokens_take_linear_time

    Suggested-by: Sebastian Pipping <sebastian@pipping.org>

commit 9fe3672459c1bf10926b85f013aa1b623d855545
Author: Snild Dolkow <snild@sony.com>
Date:   Mon Sep 18 20:32:55 2023 +0200

    tests: Run both with and without partial token heuristic

    If we always run with the heuristic enabled, it may hide some bugs by
    grouping up input into bigger parse attempts.

    CI-fighting-assistance-by: Sebastian Pipping <sebastian@pipping.org>

commit 1b9d398517befeb944cbbadadf10992b07e96fa2
Author: Snild Dolkow <snild@sony.com>
Date:   Mon Sep 4 17:21:14 2023 +0200

    Don't update partial token heuristic on error

    Suggested-by: Sebastian Pipping <sebastian@pipping.org>

commit 9cdf9b8d77d5c2c2a27d15fb68dd3f83cafb45a1
Author: Snild Dolkow <snild@sony.com>
Date:   Thu Aug 17 16:25:26 2023 +0200

    Skip parsing after repeated partials on the same token

    When the parse buffer contains the starting bytes of a token but not
    all of them, we cannot parse the token to completion. We call this a
    partial token.  When this happens, the parse position is reset to the
    start of the token, and the parse() call returns. The client is then
    expected to provide more data and call parse() again.

    In extreme cases, this means that the bytes of a token may be parsed
    many times: once for every buffer refill required before the full token
    is present in the buffer.

    Math:
      Assume there's a token of T bytes
      Assume the client fills the buffer in chunks of X bytes
      We'll try to parse X, 2X, 3X, 4X ... until mX == T (technically >=)
      That's (m²+m)X/2 = (T²/X+T)/2 bytes parsed (arithmetic progression)
      While it is alleviated by larger refills, this amounts to O(T²)

    Expat grows its internal buffer by doubling it when necessary, but has
    no way to inform the client about how much space is available. Instead,
    we add a heuristic that skips parsing when we've repeatedly stopped on
    an incomplete token. Specifically:

     * Only try to parse if we have a certain amount of data buffered
     * Every time we stop on an incomplete token, double the threshold
     * As soon as any token completes, the threshold is reset

    This means that when we get stuck on an incomplete token, the threshold
    grows exponentially, effectively making the client perform larger buffer
    fills, limiting how many times we can end up re-parsing the same bytes.
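
As a rough illustration of the three rules above, here is a minimal sketch of the threshold bookkeeping. It is not the code from this commit: `struct sketch_deferral` and both functions are hypothetical names, and tying the new threshold to twice the amount currently buffered is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state; Expat's real fields and names differ. */
struct sketch_deferral {
  size_t threshold; /* minimum buffered bytes before we try to parse */
};

/* Rule 1: only try to parse once enough data is buffered. */
static bool sketch_should_parse(const struct sketch_deferral *d,
                                size_t buffered) {
  assert(d != NULL);
  return buffered >= d->threshold;
}

/* Rules 2 and 3: update the threshold after a parse attempt.
 * Overflow of 2 * buffered is ignored here for brevity. */
static void sketch_after_parse(struct sketch_deferral *d, size_t buffered,
                               bool token_completed) {
  assert(d != NULL);
  if (token_completed) {
    d->threshold = 0; /* any completed token resets the threshold */
  } else {
    d->threshold = 2 * buffered; /* stuck on a partial token: double */
  }
}
```

Starting from a threshold of 0, a client that fills 100 bytes at a time and never completes the token would be parsed at 100, 200, 400, 800 bytes and so on, with the fills in between skipped.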

    Math:
      Assume there's a token of T bytes
      Assume the client fills the buffer in chunks of X bytes
      We'll try to parse X, 2X, 4X, 8X ... until (2^k)X == T (or larger)
      That's (2^(k+1)-1)X bytes parsed -- e.g. 15X if T = 8X
      This is equal to 2T-X, which amounts to O(T)
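
Both "Math" estimates can be checked numerically with a small simulation. This is only a sketch under the same assumptions as the commit message (one token of T bytes, fixed fills of X bytes); the function names are hypothetical and nothing here calls into Expat.

```c
#include <assert.h>

/* Bytes scanned when every refill of `chunk` bytes triggers a re-parse
 * of everything buffered so far (the behavior before the heuristic). */
static unsigned long long sketch_cost_naive(unsigned long long token,
                                            unsigned long long chunk) {
  assert(chunk > 0);
  unsigned long long buffered = 0, scanned = 0;
  while (buffered < token) {
    buffered += chunk;   /* client refills the buffer */
    scanned += buffered; /* the whole partial token is scanned again */
  }
  return scanned;
}

/* Bytes scanned when a re-parse is only attempted once the buffered
 * amount has doubled since the last failed attempt. */
static unsigned long long sketch_cost_doubling(unsigned long long token,
                                               unsigned long long chunk) {
  assert(chunk > 0);
  unsigned long long buffered = 0, scanned = 0, threshold = 0;
  for (;;) {
    buffered += chunk;
    if (buffered < threshold)
      continue; /* deferred: not enough new data yet */
    scanned += buffered;
    if (buffered >= token)
      break; /* token complete */
    threshold = 2 * buffered;
  }
  return scanned;
}
```

For T = 8X the first function yields 36X, matching (m²+m)X/2 with m = 8, and the second yields 15X, matching 2T-X.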

    We could've chosen a faster growth rate, e.g. 4 or 8. Those seem to
    increase performance further, at the cost of further increasing the
    risk of growing the buffer more than necessary. This can easily be
    adjusted in the future, if desired.

    This is all completely transparent to the client, except for:
    1. possible delay of some callbacks (when our heuristic overshoots)
    2. apps that never do isFinal=XML_TRUE could miss data at the end

    For the affected testdata, this change shows a 100-400x speedup.
    The recset.xml benchmark shows no clear change either way.

    Before:
    benchmark -n ../testdata/largefiles/recset.xml 65535 3
      3 loops, with buffer size 65535. Average time per loop: 0.270223
    benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3
      3 loops, with buffer size 4096. Average time per loop: 15.033048
    benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3
      3 loops, with buffer size 4096. Average time per loop: 0.018027
    benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3
      3 loops, with buffer size 4096. Average time per loop: 11.775362
    benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3
      3 loops, with buffer size 4096. Average time per loop: 11.711414
    benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3
      3 loops, with buffer size 4096. Average time per loop: 0.019362

    After:
    ./run.sh benchmark -n ../testdata/largefiles/recset.xml 65535 3
      3 loops, with buffer size 65535. Average time per loop: 0.269030
    ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3
      3 loops, with buffer size 4096. Average time per loop: 0.044794
    ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3
      3 loops, with buffer size 4096. Average time per loop: 0.016377
    ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3
      3 loops, with buffer size 4096. Average time per loop: 0.027022
    ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3
      3 loops, with buffer size 4096. Average time per loop: 0.099360
    ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3
      3 loops, with buffer size 4096. Average time per loop: 0.017956

commit 60dffa148c3ce26799cb933afdb0dc3581ad2098
Author: Snild Dolkow <snild@sony.com>
Date:   Wed Nov 15 15:54:37 2023 +0100

    tests: Use normal XML_Parse in test_suspend_resume_internal_entity

    When the parser is suspended, _XML_Parse_SINGLE_BYTES() will return
    early. At that point, there could be some amount of bytes that haven't
    been fed into Expat at all yet. This leaves us with an incomplete
    document.

    Furthermore, the last internal XML_Parse() call with isFinal=XML_TRUE
    will not have happened, so the parser will not know that no more input
    is to be expected. This is what allowed the test to pass when it was
    originally changed to use SINGLE_BYTES.

    With the new partial token heuristic, the lack of a final parse call
    means that we don't even reach the "Ho" text, and fail the test.

    The simplest solution is to go back to using XML_Parse() in this test.
    Another option would be to let SINGLE_BYTES expose how far it got in
    its loop, allowing for later continuation, but it doesn't seem worth the
    extra complexity.

commit 3484383fa75e0ea2aa716360088813c3b205b261
Author: Snild Dolkow <snild@sony.com>
Date:   Thu Aug 17 16:53:12 2023 +0200

    Add aaaaaa_*.xml with unreasonably large tokens

    Some of these currently take a very long time to parse. I set those to
    only run one loop in the run-benchmark make target.

    4096 may be a fairly small buffer, and definitely makes the problem worse
    than it otherwise would've been, but similar sizes exist in real code:

     * 2048 bytes in cpython Modules/pyexpat.c
     * 4096 bytes in skia SkXMLParser.cpp
     * BUFSIZ bytes (8192 on my machine) in expat/examples

    The files, too, are inspired by real-life examples: Android stores
    depth and gain maps as base64-encoded JPEGs inside the XMP data of
    other JPEGs. Sometimes as a text element, sometimes as an attribute
    value. I've seen attribute values slightly over 5 MiB in size.

commit b04a01d43108d891bb475a362e861570c3ef6857
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Jan 29 12:14:39 2024 +0000

    Actions(deps): Bump actions/upload-artifact from 4.2.0 to 4.3.0

    Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.2.0 to 4.3.0.
    - [Release notes](https://github.com/actions/upload-artifact/releases)
    - [Commits](https://github.com/actions/upload-artifact/compare/694cdabd8bdb0f10b2cea11669e1bf5453eed0a6...26f96dfa697d77e81fd5907df203aa23a56210a8)

    ---
    updated-dependencies:
    - dependency-name: actions/upload-artifact
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 183270d5654b7376fbc7b34bc2101c7898d7dee2
Merge: f7ada131 6880fe49
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Jan 26 19:10:31 2024 +0100

    Merge pull request #810 from libexpat/clang-18

    CI: Upgrade to Clang 18 (except clang-tidy and clang-format)

commit f7ada131b775e850e3a31a202c3fb410a1c6d379
Merge: abd9542b 7acda8d1
Author: Sebastian Pipping <sebastian@pipping.org>
Date:   Fri Jan 26 18:30:17 2024 +0100

    Merge pull request #808 from libexpat/clang-tidy-18

    CI: Upgrade to clang-tidy 18

commit 6880fe4948121ad121ea3341f4c0f8ab082139d0
Author: Sebastian Pipping <sebastian@pippi…