Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[W.I.P.] Limit entity expansion to prevent DoS attacks #220

Closed
wants to merge 8 commits into from

Conversation

tiran
Copy link

@tiran tiran commented Sep 13, 2018

libexpat is vulnerable to billion laughs and quadratic blowup DoS attacks.
processInternalEntity() function now limits entity expansion and
nesting in three ways:

  • Entity nesting is limited to three levels of recursion.
  • Total length of an entity is limited to 1023 bytes.
  • The ratio of entity expansion to processed bytes cannot exceed 1:10.

The mitigation is enabled by default and can be disabled with
XML_SetOptions(parser, XML_OPTION_HUGE_ENTITES).

The xmlwf command has a new option -H to enable huge entities.

The new entity expansion limits cause one test to fail. The XML test
file tests/xmlconf/xmltest/valid/ext-sa/012.xml has five levels of
entity expansion.

Fixes: #34
Fixes: #46
Signed-off-by: Christian Heimes christian@python.org

  • write tests
  • update changelog
  • update documentation

@tiran tiran force-pushed the limit-entities branch 2 times, most recently from e5a40e9 to f6c14dd Compare September 14, 2018 16:55
Copy link
Member

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Christian,

I have added some implementation related comment below. On bird view level, one thing I wonder about is how important you consider 100% future API compatibility to defusedexpat. What do you think?

Best, Sebastian

@@ -124,9 +124,32 @@ enum XML_Error {
XML_ERROR_RESERVED_PREFIX_XMLNS,
XML_ERROR_RESERVED_NAMESPACE_URI,
/* Added in 2.2.1. */
XML_ERROR_INVALID_ARGUMENT
XML_ERROR_INVALID_ARGUMENT,
/* Added in XXX */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the feature merged (and then only) I'd want to go for version 2.3.0.

@@ -2418,6 +2455,10 @@ XML_ErrorString(enum XML_Error code)
/* Added in 2.2.5. */
case XML_ERROR_INVALID_ARGUMENT: /* Constant added in 2.2.1, already */
return XML_L("invalid argument");
case XML_ERROR_ENTITY_EXPANSION_LIMIT:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add marker /* Added in 2.3.0 */ here as well

@@ -5391,6 +5432,13 @@ processInternalEntity(XML_Parser parser, ENTITY *entity,
enum XML_Error result;
OPEN_INTERNAL_ENTITY *openEntity;

if (!(parser->m_options & XML_OPTION_HUGE_ENTITES)) {
parser->m_entityRecursionLevel++;
if (parser->m_entityRecursionLevel >= XML_ENTITY_RECURSION_LIMIT) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that XML_ENTITY_RECURSION_LIMIT is compared with "greater than or equal to" while XML_ENTITY_EXPANSION_SIZE is compared with "greater than". Is that difference intentional?

(parser->m_entityExpansionLen > XML_ENTITY_EXPANSION_SIZE) &&
(parser->m_entityExpansionLen > index * XML_ENTITY_EXPANSION_RATIO)) {
/* Ratio between processed XML bytes and all expanded entities is off */
return XML_ERROR_ENTITY_EXPANSION_LIMIT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code XML_ERROR_ENTITY_EXPANSION_LIMIT seems to be used for two (slightly) different error scenarios now. Is there some security concern why we need to obscure those two cases into one? Else I imagine debugging is going to be more fun if those two cases can be told apart. What do you think?

Passed: 1801
Failed: 8
Passed: 1800
Failed: 9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like we have a single failed test more now? Please share the story with this.

parser->m_entityExpansionLen += entity->textLen;
if ((index > 0) &&
(parser->m_entityExpansionLen > XML_ENTITY_EXPANSION_SIZE) &&
(parser->m_entityExpansionLen > index * XML_ENTITY_EXPANSION_RATIO)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That multiplication may need integer overflow detection/protection. What do you think?

@@ -603,6 +603,9 @@ struct XML_ParserStruct {
OPEN_INTERNAL_ENTITY *m_openInternalEntities;
OPEN_INTERNAL_ENTITY *m_freeInternalEntities;
XML_Bool m_defaultExpandInternalEntities;
int m_entityRecursionLevel;
long m_entityExpansionLen;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That long should probably be XML_Size instead, especially for people with XML_LARGE_SIZE defined.

@hartwork
Copy link
Member

Related tickets: #34, #46

@tiran tiran force-pushed the limit-entities branch 3 times, most recently from 6f2d630 to afde728 Compare September 18, 2018 12:01
@tiran
Copy link
Author

tiran commented Sep 18, 2018

I have updated the PR, added tests and addressed your comments.

I wrote defusedexpat many years ago as proof of concept. The code is unmaintained and doesn't work any more.

@tiran tiran force-pushed the limit-entities branch 2 times, most recently from e061ac7 to ef592df Compare September 18, 2018 12:18
parser->m_entityExpansionLen += entity->textLen;
index = XML_GetCurrentByteIndex(parser);
limit = index * XML_ENTITY_EXPANSION_RATIO;
if (limit < index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my best knowlegde signed integer overflow is undefined behavior, hence we'd need to prevent overflow before it even happened. Both index and XML_ENTITY_EXPANSION_RATIO are signed as far as I can see.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expat.h doesn't define MAX values for types. That makes it kinda complicated to check for overflows. What's your proposal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try division instead to bypass that issue. On a chalkboard I'd write in order:

  1. Right now we have:
    parser->m_entityExpansionLen > limit.
  2. Inlined, that's:
    parser->m_entityExpansionLen > index * XML_ENTITY_EXPANSION_RATIO.
  3. If we devide both sides by XML_ENTITY_EXPANSION_RATIO we get:
    parser->m_entityExpansionLen / XML_ENTITY_EXPANSION_RATIO > index.
  4. If we take the floor effect of integer division into account and correct for it to accept the same range we get:
    (parser->m_entityExpansionLen + (XML_ENTITY_EXPANSION_RATIO - 1)) / XML_ENTITY_EXPANSION_RATIO > index.

Does that look about right?

That could be one way. Another way could be leveraging (XML_Size)(-1) since that type is unsigned. I'd need to think about that more.

@hartwork
Copy link
Member

I have updated the PR, added tests and addressed your comments.

Thanks! I haven't mentioned yet how important this feature will be for the whole community and how happy I am that you're working on this. Very nice!

I'll comment on the new changes in detail shortly.

Copy link
Member

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look pretty good overall, please find detailed comments below

expat/README.md Outdated
@@ -2,7 +2,7 @@
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/libexpat/libexpat?svg=true)](https://ci.appveyor.com/project/libexpat/libexpat)


# Expat, Release 2.2.6
# Expat, Release 2.3.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be awesome to have all the version bump changes split off to a single dedicated commit following the others. Thanks!

@@ -603,6 +603,9 @@ struct XML_ParserStruct {
OPEN_INTERNAL_ENTITY *m_openInternalEntities;
OPEN_INTERNAL_ENTITY *m_freeInternalEntities;
XML_Bool m_defaultExpandInternalEntities;
int m_entityRecursionLevel;
int m_options;
XML_Index m_entityExpansionLen;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why prefer XML_Index over XML_Size here? Right now we have a length-index mix here. If the idea is to save some casts later, I'd want to be sure that the loss in clean semantics is worth the gain.

@@ -10132,7 +10135,7 @@ START_TEST(test_alloc_realloc_param_entity_newline)
if (i == 0)
fail("Parse succeeded despite failing reallocator");
if (i == max_realloc_count)
fail("Parse failed at maximum reallocation count");
fail("Parse failed at maximum reallocation count 8");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cool to avoid that hardcoding of 8 (and 9 a few lines down) one way or another (e.g. format string or doing things on preprocessor level).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 8 and 9 shouldn't have landed in the PR. They are remnants of a debugging session.

@@ -11984,6 +12129,7 @@ make_suite(void)
TCase *tc_misc = tcase_create("miscellaneous tests");
TCase *tc_alloc = tcase_create("allocation tests");
TCase *tc_nsalloc = tcase_create("namespace allocation tests");
TCase *tc_he = tcase_create("huge entities");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big problem, but huge_entities would read a lot better than he in my eyes (and it's less guessable that nsalloc, for instance). Just my two cents.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed he to huge_entities everywhere.

" <!ENTITY e4 'entity'>\n"
"]>\n"
"<he>&e1;</he>\n";
expect_failure(text, XML_ERROR_ENTITY_RECURSION_LIMIT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to lookup if expect_failure does parsing inside or not because the name does not say.
If you feel like adding a dedicated commit upfront that renames it to something like parse_and_expect_failure, my vote for it. Just an idea.

"ABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPABCDEFGHIJKLMNOP"
"'>]><he>"
/* 50 + 64 + 8 + (36 * 3) == 230
* 33 expansions of 64 bytes == 2304 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to have that math, I tried to verify its details.
I get that 2304 is slightly bigger than 230 x 10 (=default ratio limit) so that extra 4 characters triggers the limit.

The first line is clear:

  • 50 seems to be the length of <!DOCTYPE he [<!ELEMENT he (#PCDATA)*><!ENTITY e '.
  • Followed by 64 for entity value A...P.
  • 8 is length of '>]><he>
  • 36 x 3 is all those &e;.

For the second I now notice that 33 expansions of 64 bytes should probably read 36 rather than 33.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I forgot to update that line when I added the <!ELEMENT> line.

return XML_L("entity is too large");
case XML_ERROR_ENTITY_EXPANSION_LIMIT:
return XML_L("entity expansion limit reached");
case XML_ERROR_ENTITY_RECURSION_LIMIT:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still thinking: If we don't need to be compatible with defusedexpat names for these constants, it would be nice to get them in good consistent shape. Just to give an example, I'd find something like

  • XML_ERROR_ENTITY_VIOLATION_SIZE
  • XML_ERROR_ENTITY_VIOLATION_RATIO
  • XML_ERROR_ENTITY_VIOLATION_DEPTH

more consistant. Maybe we can find some names nice that have that property. (The current _LIMIT part is also not ideal in my eyes.) What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your names sound better than mine.

*
* Mitigation against billion laugh and quadratic blowup attacks.
*
* XML_ENTITY_RECURSION_LIMIT confines nesting of entities within entities.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to think if "nesting" would be a better term since "recursion" seems to imply that something can reference itself (which XML entities cannot).

*/
#define XML_ENTITY_RECURSION_LIMIT 3
#define XML_ENTITY_EXPANSION_SIZE 1023
#define XML_ENTITY_EXPANSION_RATIO 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just to have said it somewhere: We will need to support adjustment of those defaults at runtime at some point, maybe even before merging to master. I'm all in with getting other things polished first, though.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I wouldn't go there. I made the settings flexible in defusedexpat. In retrospective it was unnecessary API bloat. Even libxml2 doens't allow to tune the limits. It has only an off-switch for limitation. If libxml2 doesn't need extra options, then expat surely doesn't need them at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting thought. Let me think about it more.

XML_SetOptions(XML_Parser parser, int options);

XMLPARSEAPI(int)
XML_GetOptions(XML_Parser parser);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could cover those two functions by some simple test(s) as well. Some idea for a test:

  1. Call to XML_GetOptions returns default options
  2. Call to XML_SetOptions succeeds
  3. Call to XML_GetOptions returns what was set by XML_SetOptions just before
  4. Parsing is started
  5. Call to XML_SetOptions fails (due to parser started)
  6. Call to XML_GetOptions still returns unmodified previous options

Could also be split in half to make two tests where the second is dedicated to changes at runtime.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I added a new test scenario as you proposed.

@hartwork hartwork added this to the 2.3.0 milestone Sep 18, 2018
@hartwork hartwork self-assigned this Sep 18, 2018
XML_FALSE) == XML_STATUS_ERROR)
xml_failure(parser);
if (XML_SetOptions(parser, XML_OPTION_NONE)) {
fail("XML_SetOptions() should have failed.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe append " during/while parsing"

fail("XML_SetOptions() should have failed.");
}
if (XML_GetOptions(parser) != XML_OPTION_HUGE_ENTITES)
fail("Expected XML_OPTION_HUGE_ENTITES");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe append " (i.e. previously set options)"

@tiran tiran force-pushed the limit-entities branch 4 times, most recently from dea41f1 to 9455160 Compare September 19, 2018 10:24
return XML_STATUS_ERROR;
switch(option) {
case XML_OPTION_HUGE_ENTITIES:
parser->m_hugeEntities = *(XML_Bool*)(value) ? XML_TRUE : XML_FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That dereference makes the usage of XML_SetOption overly complicated. Compare current:

XML_Bool ENABLED = XML_TRUE;
XML_SetOption(parser, XML_OPTION_HUGE_ENTITIES, &ENABLED);

with possible

XML_SetOption(parser, XML_OPTION_HUGE_ENTITIES, (void *)XML_TRUE);

I believe each option should have a say if it needs a dereference or not and save that extra indirection for plain integer ones. What do you think?
I'm also wondering about use of uintptr_t/intptr_t rather than void *, just an idea.

<listitem>
<para>
Enables huge entities option, disabling all mitigations against entity
expansion attacks.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote for mentioning "billion laughs" for an example and "DoS" to make things less abstract to people new to these topics.

/* overflow safe comparison */
XML_Size limit = (parser->m_entityExpansionLen +
(XML_ENTITY_EXPANSION_RATIO - 1)) / XML_ENTITY_EXPANSION_RATIO;
if (limit > (XML_Size)index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clearer semantics, I'd consider turning this into:

const XML_Index minimumRequiredIndex = (XML_Index)
  ((parser->m_entityExpansionLen + (XML_ENTITY_EXPANSION_RATIO - 1))
    / XML_ENTITY_EXPANSION_RATIO);
if (index < minimumRequiredIndex) {

Please note the flip of comparision order and operator.

Regarding the cast: The largest positive value for XML_Index and XML_Size are factor 2 apart, so if we divide by XML_ENTITY_EXPANSION_RATIO and that number is greater or equal 2, even m_entityExpansionLen of (XML_Size)(-1) would fit in XML_Index minimumRequiredIndex without truncation.

expat/Changes Outdated
relation to processed bytes as mitigation against billion
laughs and quadratic blowup attacks. Countermeasure is
enabled by default and can be disabled with
XML_SetOption(parser, XML_OPTION_HUGE_ENTITIES, &bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is CVE-2013-0340. If so, let's mention it.

};

enum XML_Option {
/* Added in 2.3.0 */
XML_OPTION_HUGE_ENTITIES = 1 /* XML_Bool, no limits for entity expansion */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also mention DoS and security here in case this ends up being the place where peokle look at the most.

# reldir includes trailing slash
RunXmlwfWF() {
file="$1"
reldir="$2"
$XMLWF -p -N -d "$OUTPUT$reldir" "$file" > outfile || return $?
options="$3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since options is plural, let's make it support multiple options. How about:

file="$1"
shift
reldir="$1"
shift
options=( "$@" )
$XMLWF -p -N -d "$OUTPUT$reldir" "${options[@]}" "$file" > outfile || return $?

Would that work?

xmlfile=012.xml
cd "$TS/xmlconf/$xmldir"
mkdir -p "$OUTPUT$xmldir"
if [[ -f "$xmlfile" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why that guard? Can that file ever be missing?

@@ -7,4 +7,4 @@ Expected not well-formed: oasis/p06fail1.xml
Expected not well-formed: oasis/p08fail1.xml
Expected not well-formed: oasis/p08fail2.xml
Passed: 1801
Failed: 8
Failed: 9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Failed count goes up by one for tests/xmlconf/xmltest/valid/ext-sa/012.xml without -H why does Passed count not go up by one for the new additional test with -H?

@tiran tiran force-pushed the limit-entities branch 6 times, most recently from e7a3551 to 2b4038a Compare September 20, 2018 15:24
libexpat is vulnerable to billion laughs and quadratic blowup DoS attacks.
processInternalEntity() function now limits entity expansion and
nesting in three ways:

* Entity nesting is limited to three levels of recursion.
* Total length of an entity is limited to 1023 bytes.
* The ratio of entity expansion to processed bytes cannot exceed 1:10.

The mitigation is enabled by default and can be disabled with
XML_SetOptions(parser, XML_OPTION_HUGE_ENTITES).

The xmlwf command has a new option -H to enable huge entities.

The new entity expansion limits cause one test to fail. The XML test
file tests/xmlconf/xmltest/valid/ext-sa/012.xml has five levels of
entity expansion.

Fixes: libexpat#34
Fixes: libexpat#46
Signed-off-by: Christian Heimes <christian@python.org>
Signed-off-by: Christian Heimes <christian@python.org>
XML_OPTION_NESTING_LIMIT, /* int, limit entity nesting depth */
XML_OPTION_EXPANSION_RATIO, /* int, ratio between expansions and processed text */
XML_OPTION_MAX_EXPANSION_SIZE, /* XML_Size, limit max entity size for single and nested entities */
XML_OPTION_MAX_HASH_TABLE_ENTRIES /* XML_Size, max entries in DTD hash tables */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Now that we also have a hash table option in here, maybe the entity ones should include the word "entity". On a blank slate, I'd maybe write:
    • XML_OPTION_ENTITIES_HUGE
    • XML_OPTION_ENTITIES_MAX_DEPTH
    • XML_OPTION_ENTITIES_MAX_RATIO
    • XML_OPTION_ENTITIES_MAX_SIZE
    • XML_OPTION_HASH_TABLE_DTD_MAX_ENTRY_COUNT
  • For size, we should mention XML_Chars to make clear it's not bytes. Maybe even append _CHARS to that name?
  • After = 1 I see no = 2, = 4 and so on. That looks unintended
  • Why go for int rather than unsigned int?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XML_OPTION_ENTITIES_MAX_DEPTH is misleading. It's not a limitation of the depths but how many expansions are nested. For example

<!DOCTYPE xmlbomb [
<!ENTITY a "1234567890" >
<!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;">
<!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;">
]>
<bomb>&c;</bomb>

has depths of 3 but nesting of 111 (1c + 10b + 1010c).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe depth and nesting are both mis-leading. Maybe path, steps, calls, ..? In a way it's an expansion tree and we limit regarding the number of total nodes of that tree then.

#endif

#ifndef XML_ENTITY_EXPANSION_SIZE
/* 1MB text */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be 2MB with -DXML_UNICODE (or even 4MB if we get UTF32 support on day) since we check for number of XML_Chars.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'll update the comment.

/* set and get */
he = XML_TRUE;
if (XML_SetOption(parser, XML_OPTION_HUGE_XML, &he) != XML_STATUS_OK) {
fail("XML_SetOptions() failed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XML_SetOptions() should probably be XML_SetOption without s and without braces for consistency. There are more (leftover) occurrences like that below.

XML_ERROR_ENTITY_VIOLATION_NESTED_SIZE,
XML_ERROR_ENTITY_VIOLATION_RATIO,
XML_ERROR_ENTITY_VIOLATION_DEPTH,
XML_ERROR_HASH_TABLE_VIOLATION
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe XML_ERROR_HASH_TABLE_VIOLATION_SIZE with _SIZE appended so that it's clear what the violation is about?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XML_ERROR_HASH_TABLE_SIZE_VIOLATION ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would sound good on its own but not match the *_VIOLATION_* schema above. As you prefer.

#endif

#ifndef XML_ENTITY_NESTING_LIMIT
#define XML_ENTITY_NESTING_LIMIT 40
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Billion laughs demos use a nesting level of 10. Our default needs to be below 10 then.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The settings limits the amount of nested expansions, not the maximum depths of an expansion stack. Perhaps we need a better name or explanation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need a better name then. Some ideas mentioned further up.

return XML_STATUS_OK;
case XML_OPTION_MAX_HASH_TABLE_ENTRIES:
*(XML_Size *)(value) = parser->m_limit->maxHashTableEntries;
return XML_STATUS_OK;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these cases are not covered by test yet, it seems.

@@ -2478,6 +2594,8 @@ XML_GetFeatureList(void)
#ifdef XML_ATTR_INFO
{XML_FEATURE_ATTR_INFO, XML_L("XML_ATTR_INFO"), 0},
#endif
{XML_FEATURE_HUGE_XML, XML_L("XML_OPTION_HUGE_XML"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That XML_L seems to be off the indentation grid.

#endif /* XML_DTD */
dtdDestroy(parser->m_dtd, (XML_Bool)!parser->m_parentParser, &parser->m_mem);
}
/* LIMIT structure parser-m_limit is shared with all external parsers */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a > in parser-m_limit missing here.

@@ -6322,6 +6480,117 @@ normalizePublicId(XML_Char *publicId)
*p = XML_T('\0');
}

static LIMIT *
limitCreate(const XML_Memory_Handling_Suite *ms)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also need something like limitReset to be used in XML_ParserReset.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll add it to XML_ParserReset.

}

static void
limitDestroy(LIMIT *limit,const XML_Memory_Handling_Suite *ms)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is limitDestroy called at all needed places, yet? I expected a call from XML_ParserFree.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in two places. I also need to free the structure in case another allocation in parserCreate fails. But it's just a simple FREE. I'll get rid of the function.

@tiran tiran force-pushed the limit-entities branch 2 times, most recently from 613dfb4 to 2841299 Compare September 24, 2018 12:30
* rename various constants and enum members
* change types to size_t and unsigned int
* add tests for getters and setters
* add limitReset() to XML_ParserReset()
* limit size in bytes, not sizeof(XML_Char)
* add ratio threshold
* more comments

Signed-off-by: Christian Heimes <christian@python.org>
typedef struct {
ENTITY *first_entity;
unsigned int entitiesNestingLevel;
size_t nestedEntitiesExpansionSize; /* in bytes, not XML_Cha) */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo XML_Cha)

XML_Bool hugeXML;
unsigned int entitiesMaxNesting;
unsigned int entitiesMaxRatio;
unsigned int entitiesMaxRatioThreshold;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use XML_Size or XML_Index for entitiesMaxRatioThreshold.

@andy9a9
Copy link

andy9a9 commented May 31, 2019

Is it possible to bring this up?

@hartwork
Copy link
Member

Is it possible to bring this up?

I'm not sure I get the question. Please elaborate. Thank you!

@andy9a9
Copy link

andy9a9 commented May 31, 2019

Is it possible to bring this up?

I'm not sure I get the question. Please elaborate. Thank you!

I was thinking, if it's possible to make these changes ready to be merged.

@hartwork
Copy link
Member

I was thinking, if it's possible to make these changes ready to be merged.

At some point, but it will take more time I'm afraid. If you have someone at hand with some spare time to collaborate with me getting this "project" closer to the finishing line, I might find spare time a bit sooner.

@hartwork hartwork changed the title Limit entity expansion to prevent DoS attacks [W.I.P.] Limit entity expansion to prevent DoS attacks Aug 29, 2019
@hartwork hartwork removed this from the 2.4.0 milestone Apr 20, 2021
@hartwork
Copy link
Member

Closing as superseded by #466 (and #484)

@hartwork hartwork closed this May 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants