In commit 0366f13 (nearly 8 years ago) some optimizations were added to the UDMF parser which, according to the commit message, made parsing 35% faster. These optimizations are quite clever, but unfortunately they rely heavily on the formatting of the TEXTMAP, and per the UDMF specs we cannot rely on any particular formatting.
How the optimization works
The optimization works by not parsing identical lines that have been parsed before, and thus have the UDMF field's name and value known. Take this snippet:
```
thing // 0
{
x = 0;
y = 64;
type = 1;
}
thing // 1
{
x = 0;
y = 64;
type = 1;
}
```
Let's say the parser finished parsing `x = 0;` of thing 0. It then creates a `UniversalEntry` with the key `x` and the value `0` and adds it to the current `UniversalCollection`. Then it adds the line `x = 0;` to a cache dictionary (the variable is called `matches`), where the line is the key and the just-parsed `UniversalEntry` is the value.
For each new line the parser first checks whether that line is already in the cache. If it is, it simply adds the cached `UniversalEntry` to the current `UniversalCollection` and moves on to the next line, which is checked against the cache in turn, and so on.
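The caching scheme described above can be sketched roughly as follows. This is a hypothetical Python analogue for illustration only: the actual parser is written in C#, and `parse_assignment`, `parse_block`, and the entry representation are simplifications, not the real implementation (only the `matches` variable name is taken from the source).

```python
def parse_assignment(line):
    """Parse a single 'key = value;' line into a (key, value) entry.
    Stands in for the real tokenizing/parsing work being optimized away."""
    key, _, rest = line.partition("=")
    return key.strip(), rest.strip().rstrip(";").strip()

def parse_block(lines):
    matches = {}      # raw line text -> previously parsed entry (the cache)
    collection = []   # stands in for the current UniversalCollection
    for line in lines:
        if line in matches:
            # Cache hit: reuse the entry parsed from an identical earlier
            # line, skipping the parsing work entirely.
            collection.append(matches[line])
        else:
            entry = parse_assignment(line)
            collection.append(entry)
            matches[line] = entry
    return collection

# The second thing's lines are byte-identical to the first's,
# so all three of its fields come straight from the cache.
lines = ["x = 0;", "y = 64;", "type = 1;",
         "x = 0;", "y = 64;", "type = 1;"]
print(parse_block(lines))
```

Since most TEXTMAP lines repeat across thousands of things, sidedefs, and vertices, the cache hit rate is very high, which is where the speedup comes from.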
Why this is problematic
This works excellently, as long as the lines are formatted as in the snippet above. If they aren't (which is perfectly legal per the UDMF specs), things go awry. Take this snippet:
```
thing // 0
{
x = 0; y = 64;
type = 1;
}
thing // 1
{
x = 0; y = 64;
type = 1;
}
```
Here the `x` and `y` definitions are on the same line. Now the following happens:

1. Parsing thing 0 starts.
2. `x` is parsed. The resulting `UniversalEntry` is added to the current `UniversalCollection`.
3. The line `x = 0; y = 64;` is added to the cache, with the `UniversalEntry` having the key `x` and the value `0`.
4. `y` is parsed. The resulting `UniversalEntry` is added to the current `UniversalCollection`.
5. The line `x = 0; y = 64;` is already in the cache, so nothing is added to the cache.
6. Parsing continues as normal.
7. Parsing thing 1 starts.
8. The parser checks if the line `x = 0; y = 64;` is in the cache.
9. It is! So it adds the associated `UniversalEntry`, which has the key `x` and the value `0`, to the current `UniversalCollection`.
10. Nothing further is done with the line `x = 0; y = 64;`; the parser moves on to the next line.
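The failure mode in the steps above can be reproduced in miniature. Again, this is a hypothetical Python analogue, not the actual C# code; the point is that the cache key is the whole raw line, while the cached value is only the first entry parsed from it.

```python
def parse_statements(line):
    """Split a line like 'x = 0; y = 64;' into individual (key, value) entries."""
    entries = []
    for stmt in line.split(";"):
        stmt = stmt.strip()
        if stmt:
            key, _, value = stmt.partition("=")
            entries.append((key.strip(), value.strip()))
    return entries

def parse_thing(lines, matches):
    collection = []
    for line in lines:
        if line in matches:
            # BUG: a cache hit replays only the single cached entry,
            # even if the line originally contained several statements.
            collection.append(matches[line])
        else:
            for entry in parse_statements(line):
                collection.append(entry)
                # Only the FIRST entry parsed from this line is cached.
                if line not in matches:
                    matches[line] = entry
    return collection

matches = {}
thing0 = parse_thing(["x = 0; y = 64;", "type = 1;"], matches)
thing1 = parse_thing(["x = 0; y = 64;", "type = 1;"], matches)
print(thing0)  # x, y and type are all present
print(thing1)  # y is gone: the cache hit replayed only x
```

Thing 0 is parsed correctly because its entries come from actual parsing; thing 1 silently loses `y` because the multi-statement line is answered from the cache.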
This means that the `y` value for thing 1 is skipped entirely. In this particular case UDB will actually complain, because there is no valid default value for the position, but for fields that do have a valid default value (like `angle`) UDB will be completely silent.
Consequences of reverting the optimization
Relatively speaking, reverting the change is quite significant. For example, parsing the TEXTMAP of Bastion of Chaos, which has 4.14 million lines, goes from ~1.9 seconds to ~3.3 seconds on my 7-year-old Core i5 4670K, an increase of ~73%. In absolute terms this isn't too bad, and risking data loss obviously isn't an option, so it's something we have to live with.