Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix out of range enum casts in deSerialize functions #14090

Merged
merged 6 commits into from Jan 17, 2024

Conversation

cx384
Copy link
Contributor

@cx384 cx384 commented Dec 9, 2023

  • When Desour reviewed my tool specific pointing PR he complained about undefined behavior. It appeared because I tried to make my code similar to the existing one. So it should probably also be fixed in the existing code.
  • This PR
    • Defines an underlying type for all enums that are cast from u8 to avoid undefined behavior. Security info
    • Sends warnings if the received enum values are not supported. (I'm not sure how necessary this is so please tell me if they should be removed.)
    • Defaults deserialized enum values which are not supported, previously they were unhandled and passed on.
    • Makes a few code style improvements. E.g. replaces some C-style casts with static_cast, since as far as I'm concerned they are bad practice in C++.
  • I looked into all of the deserializing code, but maybe I missed something, since I'm not very familiar with the Minetest code base.

Note:
This undefined behavior could have been found with sanitizers. E.g. using the -fsanitize=enum flag.

Does it resolve any reported issue?

I don't think so.

Does this relate to a goal in the roadmap?

Probably 2.2 Internal code refactoring

To do

This PR is Ready for Review.

How to test

  • Compile Minetets with -fsanitize=undefined
  • Pass out of range values to the deSerialize functions.
  • Read the warnings.

@Zughy Zughy added Maintenance Tasks to keep the codebase and related parts in order, including architectural improvements Bugfix 🐛 PRs that fix a bug Code quality @ Engine Core What happens inside the very engine labels Dec 9, 2023
@sfan5
Copy link
Member

sfan5 commented Dec 10, 2023

Where exactly is the undefined behaviour in using a C-style cast with enums?

src/nodedef.cpp Outdated Show resolved Hide resolved
@cx384
Copy link
Contributor Author

cx384 commented Dec 10, 2023

Where exactly is the undefined behaviour in using a C-style cast with enums?

As far as I'm concerned C-style cast are bad practice in C++, so I replaced them on the go, but this has nothing to do with the undefined behavior.

As I wrote in the first comment "makes sure the enums stay in scope (to avoid undefined behavior)", casting to them from value that isn't part of them is undefined behavior. https://en.cppreference.com/w/cpp/language/enum

At some places this is already checked e.g.

type = (PointedThingType) readU8(is);
switch (type) {
case POINTEDTHING_NOTHING:
break;
case POINTEDTHING_NODE:
node_undersurface = readV3S16(is);
node_abovesurface = readV3S16(is);
break;
case POINTEDTHING_OBJECT:
object_id = readU16(is);
break;
default:
throw SerializationError("unsupported PointedThingType");
}

src/nodedef.cpp Outdated Show resolved Hide resolved
@sfan5
Copy link
Member

sfan5 commented Dec 10, 2023

As I wrote in the first comment "makes sure the enums stay in scope (to avoid undefined behavior)", casting to them from value that isn't part of them is undefined behavior. https://en.cppreference.com/w/cpp/language/enum

I see.
However then this pattern:

x = static_cast<EnumName>(readU8(is));
switch(x) {
  case VALUE1:
  // ...
  default:
    throw SomeException();
}

already invokes UB in the first line and isn't safe either.

Another point touched by this is what should MT actually do when it sees an unsupported mode used?
Refusing to continue can make forwards-compatibility harder.

@Desour
Copy link
Member

Desour commented Dec 10, 2023

I had not read the cpp reference carefully enough, and assumed that values are out of range if they are not represented by an enum constructor. This is however not the case.
As a result, I may have caused confusion, sorry.

To cite the relevant parts of cppreference:

Values of integer, floating-point, and enumeration types can be converted by static_cast or explicit cast, to any enumeration type. If the underlying type is not fixed and the source value is out of range, the behavior is undefined. (The source value, [...], is in range if it would fit in the smallest bit field large enough to hold all enumerators of the target enumeration.) Otherwise, the result is the same as the result of implicit conversion to the underlying type.

Note that the value after such conversion may not necessarily equal any of the named enumerators defined for the enumeration.

Also:

An enumeration has the same size, value representation, and alignment requirements as its underlying type.

(Highlights by me.)

If I understand it correctly now, if an underlying type is specified ("the underlying type is fixed"), all values of the underlying type are valid enum values.
We shouldn't however mindlessly cast to enums if the underlying type is not specified.

Checking if the resulting enum value after cast is a known value probably still makes sense, because our code likely doesn't handle non-enum-constructor values well.

@cx384 cx384 changed the title Fix undefined behavior of enum casts in deSerialize functions Fix out of range enum casts in deSerialize functions Dec 10, 2023
@cx384
Copy link
Contributor Author

cx384 commented Dec 10, 2023

Thank you for the clarification, so the casts are mostly fine, but values may still be invalid and this should be addressed.
Before I can update the PR, I need an answer to sfan5's question. What should happen in such a case?
This two possibilities seem logical to me.

  • Throw a SerializationError. (This is currently the case for PointedThingType) (Of course catch the exception somewhere, but the deserialized data will be lost)
  • Just a warning and use a default value. (This may be better for compatibility)

@cx384
Copy link
Contributor Author

cx384 commented Dec 11, 2023

OK, everywhere where SerializationError will not be caught (so basically everywhere) I used a warning, since we already have a version parameter in most deserializing code to break compatibility if necessary, and I guess there is no reason to now introduce future incompatibility if it can be avoided somehow.

So my PR does not change any behavior besides

  • Warnings if the enum values are not supported. (I'm not sure how necessary they are so please tell me if they should be removed.)
  • Removed UB that occurs from casting from u8 to enums that have no defined underlying type. Security info
  • (Code style improvements, but they don't change the behavior)

I will also update the PR description to avoid any confusion.

@Desour
Copy link
Member

Desour commented Dec 22, 2023

Sorry for the late response.

  • I agree that using a default value is better than throwing an SerializationError if it wouldn't be catched. I think this is also in line to what was already done in the past. For example, I remember that when mesh nodes were added, old clients just rendered them as cubes.
    (There might be some exceptional cases where throwing a SerializationError makes more sense though. Feel free to decide differently in a case by case basis if necessary.)
  • Logging a warning every time can cause spam, as every packet can potentially include one or more invalid enum values.
    I don't think we have utilities for logging once, apart from script_log_unique, which only works in script contexts. So I suggest just not printing warnings at all. Or warn only once per process, e.g. using static variables:
    // called once per process
    // this is thread safe
    #define STATIC_CALL_ONCE(block) [&] { \
            static bool executed_once_macro_var = \
                    [&]{ {block} return true; }(); \
            (void)executed_once_macro_var; \
        }()
  • The PR is currently just printing warnings. It's missing the part where it defaults to default values.
  • A case enumname_END: switch case will not catch all values. Either use default: if the switched on value can be arbitrary, or make sure the value can only be below enumname_END. A case enumname_END: should never be necessary, I think.

(Rebase needed.)

@Desour Desour added Rebase needed The PR needs to be rebased by its author. Action / change needed Code still needs changes (PR) / more information requested (Issues) labels Dec 22, 2023
@cx384 cx384 force-pushed the deserialize_undefined_behavior branch from 4409f50 to 7eb12bd Compare January 4, 2024 08:29
@cx384
Copy link
Contributor Author

cx384 commented Jan 4, 2024

No problem, I also needed to rethink this a little bit.

  • I agree that using a default value is better than throwing an SerializationError if it wouldn't be catched. I think this is also in line to what was already done in the past. For example, I remember that when mesh nodes were added, old clients just rendered them as cubes.
    [...]
  • The PR is currently just printing warnings. It's missing the part where it defaults to default values.

OK, I remove the exceptions and defaulted the values.

I suggest just not printing warnings at all.

Done, I guess warnings wouldn't be very helpful anyway, since we already have version fields for most serialized things, which create warnings/errors in case of well behaved servers/clients.
Also, it's rather insignificant and non repeating warnings without providing any trace are quite meaningless.

  • A case enumname_END: switch case will not catch all values. Either use default: if the switched on value can be arbitrary, or make sure the value can only be below enumname_END. A case enumname_END: should never be necessary, I think.

The only reason for this is to get rid a compiler warning, the case will never be executed.
The alternative of adding a default case would be worse, since you wouldn't get a compiler warning when new enum values get introduced, and the default case would only handle invalid values which are checked elsewhere. (My PR does change this behavior.)

At first I just wanted to remove some possible undefined behavior with this PR, but now it also changes some actual behavior, it defaults enum values which are not supported. (as Desour suggested)

It's also rebased now, and ready for review again.

@Zughy Zughy removed Rebase needed The PR needs to be rebased by its author. Action / change needed Code still needs changes (PR) / more information requested (Issues) labels Jan 4, 2024
Copy link
Member

@Desour Desour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine otherwise. 👍

(Haven't checked if there are other occurrences of unchecked integer to enum casts.)

src/nodedef.cpp Outdated Show resolved Hide resolved
src/nodedef.cpp Outdated Show resolved Hide resolved
src/tileanimation.cpp Outdated Show resolved Hide resolved
src/util/pointedthing.cpp Show resolved Hide resolved
@Zughy Zughy added the Action / change needed Code still needs changes (PR) / more information requested (Issues) label Jan 14, 2024
@Zughy Zughy removed the Action / change needed Code still needs changes (PR) / more information requested (Issues) label Jan 15, 2024
Copy link
Member

@Desour Desour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@sfan5 sfan5 merged commit 2ea8d9c into minetest:master Jan 17, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bugfix 🐛 PRs that fix a bug Code quality @ Engine Core What happens inside the very engine Maintenance Tasks to keep the codebase and related parts in order, including architectural improvements One approval ✅ ◻️
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants