Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcaption: Optimize branch conditons #9184

Merged
merged 1 commit into from Jan 3, 2024
Merged

Conversation

r-value
Copy link
Contributor

@r-value r-value commented Jun 30, 2023

Description

Optimize the condition expression of c[0] >= 0 && c[0] <= ' ' to avoid warnings on platforms where char is equivalent to unsigned char.

Motivation and Context

I'm packaging OBS Studio for RISC-V architecture and it fails with an error message:

/build/obs-studio/src/obs-studio-29.1.1/deps/libcaption/src/utf8.c: In function ‘utf8_char_whitespace’:
/build/obs-studio/src/obs-studio-29.1.1/deps/libcaption/src/utf8.c:54:21: error: comparison is always true due to limited range of data type [-Werror=type-limits]
   54 |     if (!c || (c[0] >= 0 && c[0] <= ' ') || c[0] == 0x7F) {
      |                     ^~
cc1: all warnings being treated as errors

According to C reference:

char - type for character representation. Equivalent to either signed char or unsigned char (which one is implementation-defined and may be controlled by a compiler commandline switch), but char is a distinct type, different from both signed char and unsigned char.

As we define utf8_char_t as char, which is equivalent to unsigned char in some platforms, c[0] >= 0 will always be true and a warning will be raised, leading to a build failure.

How Has This Been Tested?

I have tested it on HiFive Unmatched and qemu-user-riscv64, the code compiles correctly.

Types of changes

Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code has been run through clang-format.
  • I have read the contributing document.
  • My code is not on the master branch.
  • The code has been tested.
  • All commit messages are properly formatted and commits squashed where appropriate.
  • I have included updates to all appropriate documentation.

@r-value r-value force-pushed the patch-1 branch 5 times, most recently from 8efc166 to d5b055a Compare July 1, 2023 13:30
@WizardCM WizardCM added the Bug Fix Non-breaking change which fixes an issue label Jul 1, 2023
@norihiro
Copy link
Contributor

norihiro commented Jul 2, 2023

The first word for each commit should be capitalized.

@derrod
Copy link
Member

derrod commented Jul 8, 2023

Aren't there other things that also fail on RISC-V right now? e.g. because our SIMDe library is somewhat out of date. This also fails on macOS right now.

@PatTheMav
Copy link
Member

I don't think this is the right approach (and it's not even exhaustive as the compilation failure on macOS demonstrates).

My hunch is that the code specifically checks for the situation that char might be equivalent to signed char and ensures that the value is positive. That the check always passes on platforms where char is an unsigned char seems to be intentionally accepted.

So making this a signed char might actually break more things than it fixes. Instead the warning could be silenced with a #pragma block, preferably if char is equivalent to an unsigned char.

@r-value
Copy link
Contributor Author

r-value commented Jul 27, 2023

My hunch is that the code specifically checks for the situation that char might be equivalent to signed char and ensures that the value is positive. That the check always passes on platforms where char is an unsigned char seems to be intentionally accepted.

I don't think it should be accepted because the code comparing the signed byte with 0 might be checking the most significant bit of the byte. If it's unsigned, the test is meaningless. We should either ensure the byte is interpreted as signed or completely eliminate checking the most significant byte by the sign.

@norihiro
Copy link
Contributor

// 0x7F is DEL
if (!c || (c[0] >= 0 && c[0] <= ' ') || c[0] == 0x7F) {
return 1;
}

In this code, it looks the expression (c[0] >= 0 && c[0] <= ' ') is checking c[0] is within [0x00, 0x20]. I think both signed and unsigned are acceptable.

@r-value
Copy link
Contributor Author

r-value commented Jul 27, 2023

Maybe we can just replace the c[0] >= 0 with (c[0] & 0x80) == 0? The latter works for both signed and unsigned without behavioral change.(The c[0] <= ' ' have already eliminated values above 0x7F so ignore this plz) But I'm not sure whether some code also assumes utf8char_t as signed value and make some comparison like my comment above.

@r-value
Copy link
Contributor Author

r-value commented Jul 27, 2023

Instead the warning could be silenced with a #pragma block, preferably if char is equivalent to an unsigned char.

I found it not easy because it seems there isn't a unified way to do that. Different compiler need different #pragma. Also, I didn't find an attribute for this scenario.
Can we just explicitly cast c[0] to signed char here? It's the easiest way fixing it.

@PatTheMav
Copy link
Member

Instead the warning could be silenced with a #pragma block, preferably if char is equivalent to an unsigned char.

I found it not easy because it seems there isn't a unified way to do that. Different compiler need different #pragma. Also, I didn't find an attribute for this scenario. Can we just explicitly cast c[0] to signed char here? It's the easiest way fixing it.

It’s not a fix because it forces a sign mismatch on other platforms which we guard against on macOS.

I also have to admit I don’t understand what the issue is - the check may be superfluous on platforms where it’s always an unsigned type, but it’s also not malicious or will lead to false positives as far as I can see.

The proper fix would probably be to use unsigned char as the type and explicitly cast where needed if the codebase relies on that to be the case, but I also don’t think we want to futz around inside libcaption to begin with.

@r-value
Copy link
Contributor Author

r-value commented Jul 31, 2023

It’s not a fix because it forces a sign mismatch on other platforms which we guard against on macOS.

What I mean is to use if(... (signed char)c[0] >= 0 ...) and I think it's quite intuitive.

I also have to admit I don’t understand what the issue is - the check may be superfluous on platforms where it’s always an unsigned type, but it’s also not malicious or will lead to false positives as far as I can see.

I think the point of this warning is: if the condition clause isn't explicitly eliminated, the one who wrote the code must be expecting some runtime check. If it's definitive, the code won't work as the programmer expected. You should either eliminate it explicitly if the condition is exactly what you expect, or change it if it's not.

@umlaeute
Copy link

umlaeute commented Aug 2, 2023

obs-studio_29.1.3 currently fails to build on these archs, because on this archs gcc-13.2.0 uses unsigned chars:

  • armel
  • armhf (as found on Raspberry)
  • ppc64el
  • s390x

(the architecture names are the ones used in Debian)

you probably might think that armel, ppc64el and s390x are not the target platform for OBS Studio anyhow, but the armhf failure is a bit of a bummer.

@umlaeute
Copy link

umlaeute commented Aug 2, 2023

also note, that in the same function (a few lines below the erroneous code), there's already some type-casting to ensure the proper-signedness

if (0xC2 == (unsigned char)c[0] && 0xA0 == (unsigned char)c[1]) {

so 👍 for #9184 (comment)

@r-value
Copy link
Contributor Author

r-value commented Aug 2, 2023

also note, that in the same function (a few lines below the erroneous code), there's already some type-casting to ensure the proper-signedness

if (0xC2 == (unsigned char)c[0] && 0xA0 == (unsigned char)c[1]) {

so 👍 for #9184 (comment)

Thank you for your information and I'll push the fix soon, then.

deps/libcaption/src/utf8.c Outdated Show resolved Hide resolved
The optimization silences the warning about type limits on platforms
with `char` type as `unsigned char`.

The original condition is semantically identical to the optimized one
because the signed-to-unsigned cast is well-defined in C standard.
@r-value r-value changed the title libcaption: ensure utf8_char_t is signed libcaption: optimize branch condition expression Aug 11, 2023
@r-value
Copy link
Contributor Author

r-value commented Sep 19, 2023

Any progress there?

@PatTheMav
Copy link
Member

The recent version of the change seems innocuous enough, I don't really have any major concerns about that.

It should be noted that we do not support any other platforms apart from the ones we provide our own builds for (this includes RISC-V or Raspberry Pi). In general patches necessary to build on any unsupported platform need to be applied locally.

@RytoEX any objections?

@RytoEX
Copy link
Member

RytoEX commented Dec 13, 2023

The recent version of the change seems innocuous enough, I don't really have any major concerns about that.

It should be noted that we do not support any other platforms apart from the ones we provide our own builds for (this includes RISC-V or Raspberry Pi). In general patches necessary to build on any unsupported platform need to be applied locally.

@RytoEX any objections?

As you said, it seems innocuous enough. Mostly, I don't want to get into the slippery slope of having to be concerned about patches for other platforms that we do not officially support, as you also mentioned.

@r-value
Copy link
Contributor Author

r-value commented Dec 17, 2023

In general patches necessary to build on any unsupported platform need to be applied locally.

I think the change in this particular PR still can be considered a better practice, according to C standard documents. Generally it's better to comply with the standards if you're oblivious of the unsupported platform things, isn't it?

@PatTheMav
Copy link
Member

In general patches necessary to build on any unsupported platform need to be applied locally.

I think the change in this particular PR still can be considered a better practice, according to C standard documents. Generally it's better to comply with the standards if you're oblivious of the unsupported platform things, isn't it?

The comment was not aimed at this particular fix but that merging this requires us to make an exception for two policies at the same time and that's the slippery slope mentioned.

@umlaeute
Copy link

Are these policies documented somewhere?

My personal experience with this project is that fixes for people's problems are deferred to obscure policies and private discussions of the team and then left hanging in a limbo.

This is not exactly satisfying.
(Esp. with a PR like this that tries to fix undefined behaviour, which just happens to work on the "supported" platforms)

@RytoEX
Copy link
Member

RytoEX commented Dec 18, 2023

Are these policies documented somewhere?

For supported operating systems or platforms, we call out "unsupported builds" here:
https://github.com/obsproject/obs-studio/wiki/install-instructions

For architectures, generally, if we don't provide a binary for it, we don't support it. OBS may or may not work on any other architecture, your mileage may vary. We don't have an official Knowledge Base or Wiki page stating this. We could make one, though the low volume/frequency of requests about supported architectures makes me wonder if it's truly necessary to maintain one.

My personal experience with this project is that fixes for people's problems are deferred to obscure policies and private discussions of the team and then left hanging in a limbo.

This is not exactly satisfying. (Esp. with a PR like this that tries to fix undefined behaviour, which just happens to work on the "supported" platforms)

Some of us who regularly work on the project have off-thread communications to clarify our understanding and collect our thoughts before responding on GitHub to make sure that we're giving an accurate response. In this specific case, @PatTheMav and I briefly discussed this PR off-thread, and I replied here with the summary of my thoughts from that conversation. That is mostly to reduce the noise and round-trip time of he and I talking at each other on this PR.

In this specific instance, most of my hesitance is due to the fact that libcaption is a vendored third-party dependency. Normally, I would suggest that fixes be sent to the source repo, and then we can update the dependency. However, libcaption is archived and no longer being updated, so we're effectively responsible for any new changes.

This change itself as it is now is fine (it used to be more involved), and I didn't say that I opposed merging this change. What I don't want to signal, however, is merging this PR being an indicator that we suddenly provide first-party RISC-V support. We don't have the time, hardware, or capacity to ensure that everything builds and runs correctly there. That's all I wanted to clarify.

This PR is actually likely to be merged because it is simple, concise, and fixes undefined behavior. That it happens to fix a bug with RISC-V is coincidental to those concerns. I just wanted to publicly document my thoughts regarding the slippery slope first or I would likely forget to do so.

@umlaeute
Copy link

umlaeute commented Dec 18, 2023

fair enough. and thanks for the long reply.

This PR is actually likely to be merged because it is simple, concise, and fixes undefined behavior. That it happens to fix a bug with RISC-V is coincidental to those concerns.

exactly.
but if you re-read the PR description you will notice that "RISC-V" was only mentioned to give some context where the bug was discovered. nobody asked anybody to commit to "full support".

and i'm guilty as charged for bringing in other architectures that you do "not support" (armel, armhf, ppc64el, s390x).

but i really think it is always good to listen to unsupported architectures: i don't think anybody is going to run obs-studio on BigEndian machines anytime soon, but I find that being able to compile (and ideally run tests) on such exotic architectures is often a good way to spot implicit (and wrong) assumptions buried in the code.

as such, i think fixes coming from such attempts should be embraced, even if you do not plan to support any additional architectures/OSs/... in the foreseeable future.

@RytoEX RytoEX changed the title libcaption: optimize branch condition expression libcaption: Optimize branch conditons Jan 3, 2024
@RytoEX RytoEX merged commit c96a2a6 into obsproject:master Jan 3, 2024
14 checks passed
@RytoEX RytoEX added this to the OBS Studio (Next Version) milestone Jan 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Fix Non-breaking change which fixes an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants