Skip to content
This repository was archived by the owner on Oct 8, 2025. It is now read-only.

Conversation

@ac000
Copy link
Member

@ac000 ac000 commented Jun 24, 2024

Due to 'char' (unless explicitly set) being signed or unsigned depending
on architecture, e.g on x86 it's signed, while on Arm it's unsigned,
this can lead to subtle bugs such if you use a plain char as a byte
thinking it's unsigned on all platforms (maybe you live in the world of
Arm).

What we can do is tell the compiler to treat 'char' as unsigned by
default, thus it will be consistent across platforms. Seeing as most of
the time it doesn't matter whether char is signed or unsigned, it
really only matters when you're dealing with 'bytes', which means it
makes sense to default char to unsigned.

The Linux Kernel made this change at the end of 2022.

This will also allow in the future to convert our u_char's to char's
(which will now be unsigned) and pass them directly into the libc
functions for example, without the need for casting.

Here is what the ISO C standard has to say

From §6.2.5 Types ¶15

  The three types char, signed char, and unsigned char are collectively
  called the character types. The implementation shall define char to
  have the same range, representation, and behavior as either signed
  char or unsigned char.[45]

and from Footnote 45)

  CHAR_MIN, defined in <limits.h>, will have one of the values 0 or
  SCHAR_MIN, and this can be used to distinguish the two options.
  Irrespective of the choice made, char is a separate type from the
  other two and is not compatible with either.

If you're still unsure why you'd want this change...

It was never clear to me, why we used u_char, perhaps that was used as
an alternative to -funsigned-char...

But that still leaves the potential for bugs with char being unsigned vs
signed...

Then because we use u_char but often need to pass such things into libc
(and perhaps other functions) which normally take a 'char' we need to
cast these cases.

So this change brings at least two (or more) benefits

  1) Removal of potential for char unsigned vs signed bugs.

  2) Removal of a bunch of casts. Reducing casting to the bare minimum
     is good. This helps the compiler to do proper type checking.

  3) Readability/maintainability, everything is now just char...

What if you want to work with bytes?

Well with char being unsigned (everywhere) you can of course use char.

However it would be much better to use the uint8_t type for that to
clearly signify that intention.

@ac000 ac000 force-pushed the funsigned-char branch from 0ab815b to 5eb8f02 Compare July 3, 2024 19:56
@ac000
Copy link
Member Author

ac000 commented Jul 3, 2024

  • Rebased with master
  • Commit message tweak
$ git range-diff 0ab815b8...5eb8f02f
 -:  -------- >  1:  5c4911a3 perl: Constify some local static variables
 -:  -------- >  2:  ab1b3f9d test/clone: Constify some local static variables
 -:  -------- >  3:  8e254a4d tstr, conf: Ensure error strings are nul-terminated
 -:  -------- >  4:  d62a5e2c contrib: updated njs to 0.8.5
 -:  -------- >  5:  a9aa9e76 python: Support application factories
 -:  -------- >  6:  d67d3501 tests: Add tests for python application factories
 -:  -------- >  7:  36213522 Fix certificate deletion for array type certificates
 -:  -------- >  8:  ff6d5045 python: Constify some local static variables
 1:  0ab815b8 !  9:  5eb8f02f Compile with -funsigned-char
    @@ Commit message
     
         Due to 'char' (unless explicitly set) being signed or unsigned depending
         on architecture, e.g on x86 it's signed, while on Arm it's unsigned,
    -    this can lead to subtle bugs if you use a plain char as a byte thinking
    -    it's unsigned on all platforms (maybe you live in the world of Arm).
    +    this can lead to subtle bugs such if you use a plain char as a byte
    +    thinking it's unsigned on all platforms (maybe you live in the world of
    +    Arm).
     
         What we can do is tell the compiler to treat 'char' as unsigned by
         default, thus it will be consistent across platforms. Seeing as most of

@ac000 ac000 force-pushed the funsigned-char branch from 5eb8f02 to 5019157 Compare July 3, 2024 20:17
@ac000
Copy link
Member Author

ac000 commented Sep 23, 2024

While I don't envisage this causing any issues I do want to give it a full development cycle to soak...

@ac000 ac000 marked this pull request as ready for review September 23, 2024 02:33
@hongzhidao
Copy link
Contributor

It seems tests are passed with the change, we don't need to touch any other code, right?

@ac000
Copy link
Member Author

ac000 commented Sep 23, 2024

That's right...

Copy link
Contributor

@javorszky javorszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny change, well reasoned need and consequences, tests pass, :shipit:

Due to 'char' (unless explicitly set) being signed or unsigned depending
on architecture, e.g on x86 it's signed, while on Arm it's unsigned,
this can lead to subtle bugs such if you use a plain char as a byte
thinking it's unsigned on all platforms (maybe you live in the world of
Arm).

What we can do is tell the compiler to treat 'char' as unsigned by
default, thus it will be consistent across platforms. Seeing as most of
the time it doesn't matter whether char is signed or unsigned, it
really only matters when you're dealing with 'bytes', which means it
makes sense to default char to unsigned.

The Linux Kernel made this change at the end of 2022.

This will also allow in the future to convert our u_char's to char's
(which will now be unsigned) and pass them directly into the libc
functions for example, without the need for casting.

Here is what the ISO C standard has to say

From §6.2.5 Types ¶15

  The three types char, signed char, and unsigned char are collectively
  called the character types. The implementation shall define char to
  have the same range, representation, and behavior as either signed
  char or unsigned char.[45]

and from Footnote 45)

  CHAR_MIN, defined in <limits.h>, will have one of the values 0 or
  SCHAR_MIN, and this can be used to distinguish the two options.
  Irrespective of the choice made, char is a separate type from the
  other two and is not compatible with either.

If you're still unsure why you'd want this change...

It was never clear to me, why we used u_char, perhaps that was used as
an alternative to -funsigned-char...

But that still leaves the potential for bugs with char being unsigned vs
signed...

Then because we use u_char but often need to pass such things into libc
(and perhaps other functions) which normally take a 'char' we need to
cast these cases.

So this change brings at least two (or more) benefits

  1) Removal of potential for char unsigned vs signed bugs.

  2) Removal of a bunch of casts. Reducing casting to the bare minimum
     is good. This helps the compiler to do proper type checking.

  3) Readability/maintainability, everything is now just char...

What if you want to work with bytes?

Well with char being unsigned (everywhere) you can of course use char.

However it would be much better to use the uint8_t type for that to
clearly signify that intention.

Link: <https://lore.kernel.org/lkml/Y1Bfg06qV0sDiugt@zx2c4.com/>
Link: <https://lore.kernel.org/lkml/20221019203034.3795710-1-Jason@zx2c4.com/>
Link: <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3bc753c06dd02a3517c9b498e3846ebfc94ac3ee>
Link: <https://www.iso-9899.info/n1570.html#6.2.5p15>
Suggested-by: Alejandro Colomar <alx@kernel.org>
Reviewed-by: Alejandro Colomar <alx@kernel.org>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
@ac000
Copy link
Member Author

ac000 commented Sep 24, 2024

Rebased with master

$ git range-diff 5019157e...355f038f
 -:  -------- >  1:  0bd18ecd Packages: clean up EOL debian-based distributions
 -:  -------- >  2:  151305eb Packages: added Ubuntu 24.04 "noble" support
 -:  -------- >  3:  6c04c7dc Packages: don't redefine FORTIFY_SOURCE on Ubuntu
 -:  -------- >  4:  804a12fe Packages: remove support for EOL Fedora versions (35-38)
 -:  -------- >  5:  929a275a Packages: removed CentOS 6 leftover
 -:  -------- >  6:  63fb95ed Packages: removed CentOS 7 due to EOL
 -:  -------- >  7:  0e6cc6dc auto: Fix some indentation in auto/modules/wasm-wasi-component
 -:  -------- >  8:  b5fe3eaf auto, wasm-wc: Copy the .so into build/lib/unit/modules/
 -:  -------- >  9:  706ea1a6 tools/unitctl: Enable Multi Socket Support
 -:  -------- > 10:  6ca27c32 contrib: make sha512sum check compatible with FreeBSD 14+
 -:  -------- > 11:  081e5115 status: Constify a bunch of local variables
 -:  -------- > 12:  c8d70c3f status: Use a variable to represent the status member index
 -:  -------- > 13:  55041ef9 Flow the language module name into nxt_app_lang_module_t
 -:  -------- > 14:  707f4ef8 status: Show list of loaded language modules
 -:  -------- > 15:  f4ba4b55 tests: Fix `/status' endpoint tests for new 'modules' section
 -:  -------- > 16:  ebb10b0a Fix a comment typo for 'Memory-only buffers' in src/nxt_buf.h
 -:  -------- > 17:  1c607662 status: Add a missing check for potential NULL
 -:  -------- > 18:  58fdff54 fuzzing: added cifuzz workflow
 -:  -------- > 19:  fcbaf8f3 fuzzing: fix harness bugs
 -:  -------- > 20:  61c13ade fuzzing: update directory path in README and build-fuzz.sh
 -:  -------- > 21:  90542dbd ci: cifuzz: Bump github/codeql-action from 2 to 3
 -:  -------- > 22:  57a75ea0 build(deps): bump openssl from 0.10.64 to 0.10.66 in /tools/unitctl
 -:  -------- > 23:  b892e994 tools/unitctl: update readme
 -:  -------- > 24:  1b484303 tools/unitctl: update readme
 -:  -------- > 25:  e743b6ce tools/unitctl: remove (default) from option text
 -:  -------- > 26:  2a243740 tools/unitctl: make json-pretty default output fmt
 -:  -------- > 27:  43faf99d tools/unitctl: reword freeform message for output
 -:  -------- > 28:  a91b961d tools/unitctl: make application directory configurable
 -:  -------- > 29:  e56c4ede fuzzing: code cleanup
 -:  -------- > 30:  900d25c3 fuzzing: fixed harness bug
 -:  -------- > 31:  bc49274d fuzzing: updated JSON target
 -:  -------- > 32:  3667c3e2 fuzzing: added new basic targets
 -:  -------- > 33:  06c4ea1f Add a basic .editorconfig file
 -:  -------- > 34:  0f313e2b CONTRIBUTING.md: Re-flow text
 -:  -------- > 35:  20224279 CONTRIBUTING.md: Update the 'Git Style Guide' section
 -:  -------- > 36:  5d32e500 Packaging: fix build-depends on multiarch debian systems
 -:  -------- > 37:  12c376ab README: Update number of supported languages
 -:  -------- > 38:  11a70a32 docs/openapi: Update the /status endpoint URL
 -:  -------- > 39:  ae4795aa docs/openapi: Add entries for the new /status/modules endpoint
 -:  -------- > 40:  f38201c2 auto: Add a check for Linux's sched_getaffinity(2)
 -:  -------- > 41:  2444d45e lib: Better available cpu count determination on Linux
 -:  -------- > 42:  57c88fd4 router: Make the number of router threads configurable
 -:  -------- > 43:  97c15fa3 socket: Use a default listen backlog of -1 on Linux
 -:  -------- > 44:  76489fb7 conf, router: Make the listen(2) backlog configurable
 -:  -------- > 45:  2eecee75 var: Restrict nxt_tstr_query() to only support syn
 -:  -------- >  1:  0bd18ecd Packages: clean up EOL debian-based distributions
 -:  -------- >  2:  151305eb Packages: added Ubuntu 24.04 "noble" support
 -:  -------- >  3:  6c04c7dc Packages: don't redefine FORTIFY_SOURCE on Ubuntu
 -:  -------- >  4:  804a12fe Packages: remove support for EOL Fedora versions (
35-38)
 -:  -------- >  5:  929a275a Packages: removed CentOS 6 leftover
 -:  -------- >  6:  63fb95ed Packages: removed CentOS 7 due to EOL
 -:  -------- >  7:  0e6cc6dc auto: Fix some indentation in auto/modules/wasm-wa
si-component
 -:  -------- >  8:  b5fe3eaf auto, wasm-wc: Copy the .so into build/lib/unit/mo
dules/
 -:  -------- >  9:  706ea1a6 tools/unitctl: Enable Multi Socket Support
 -:  -------- > 10:  6ca27c32 contrib: make sha512sum check compatible with Free
BSD 14+
 -:  -------- > 11:  081e5115 status: Constify a bunch of local variables
 -:  -------- > 12:  c8d70c3f status: Use a variable to represent the status mem
ber index
 -:  -------- > 13:  55041ef9 Flow the language module name into nxt_app_lang_mo
dule_t
 -:  -------- > 14:  707f4ef8 status: Show list of loaded language modules
 -:  -------- > 15:  f4ba4b55 tests: Fix `/status' endpoint tests for new 'modul
es' section
 -:  -------- > 16:  ebb10b0a Fix a comment typo for 'Memory-only buffers' in sr
c/nxt_buf.h
 -:  -------- > 17:  1c607662 status: Add a missing check for potential NULL
 -:  -------- > 18:  58fdff54 fuzzing: added cifuzz workflow
 -:  -------- > 19:  fcbaf8f3 fuzzing: fix harness bugs
 -:  -------- > 20:  61c13ade fuzzing: update directory path in README and build
-fuzz.sh
 -:  -------- > 21:  90542dbd ci: cifuzz: Bump github/codeql-action from 2 to 3
 -:  -------- > 22:  57a75ea0 build(deps): bump openssl from 0.10.64 to 0.10.66 
in /tools/unitctl
 -:  -------- > 23:  b892e994 tools/unitctl: update readme
 -:  -------- > 24:  1b484303 tools/unitctl: update readme
 -:  -------- > 25:  e743b6ce tools/unitctl: remove (default) from option text
 -:  -------- > 26:  2a243740 tools/unitctl: make json-pretty default output fmt
 -:  -------- > 27:  43faf99d tools/unitctl: reword freeform message for output
 -:  -------- > 28:  a91b961d tools/unitctl: make application directory configur
able
 -:  -------- > 29:  e56c4ede fuzzing: code cleanup
 -:  -------- > 30:  900d25c3 fuzzing: fixed harness bug
 -:  -------- > 31:  bc49274d fuzzing: updated JSON target
 -:  -------- > 32:  3667c3e2 fuzzing: added new basic targets
 -:  -------- > 33:  06c4ea1f Add a basic .editorconfig file
 -:  -------- > 34:  0f313e2b CONTRIBUTING.md: Re-flow text
 -:  -------- > 35:  20224279 CONTRIBUTING.md: Update the 'Git Style Guide' sect
ion
 -:  -------- > 36:  5d32e500 Packaging: fix build-depends on multiarch debian s
ystems
 -:  -------- > 37:  12c376ab README: Update number of supported languages
 -:  -------- > 38:  11a70a32 docs/openapi: Update the /status endpoint URL
 -:  -------- > 39:  ae4795aa docs/openapi: Add entries for the new /status/modu
les endpoint
 -:  -------- > 40:  f38201c2 auto: Add a check for Linux's sched_getaffinity(2)
 -:  -------- > 41:  2444d45e lib: Better available cpu count determination on L
inux
 -:  -------- > 42:  57c88fd4 router: Make the number of router threads configur
able
 -:  -------- > 43:  97c15fa3 socket: Use a default listen backlog of -1 on Linu
x
 -:  -------- > 44:  76489fb7 conf, router: Make the listen(2) backlog configurable
 -:  -------- > 45:  2eecee75 var: Restrict nxt_tstr_query() to only support synchronous operation
 -:  -------- > 46:  76a255b2 http: Refactor return action
 -:  -------- > 47:  08a23272 http: Refactor route pass query
 -:  -------- > 48:  9d19e7e0 http: Refactor static action
 -:  -------- > 49:  ecb3f86c http: Refactor access log write
 -:  -------- > 50:  5f6ae1a1 var: Remove unused functions and structure fields
 -:  -------- > 51:  82e168fe http: Refactor out nxt_tstr_cond_t from the access log module
 -:  -------- > 52:  57f93956 http: Get rid of nxt_http_request_access_log()
 -:  -------- > 53:  debd61c3 http: Add "if" option to the "match" object
 -:  -------- > 54:  43c4bfdc tests: "if" option in http route match
 -:  -------- > 55:  593564fd ci/unitctl: Update paths
 -:  -------- > 56:  cad6aed5 Tests: initial "wasm-wasi-component" test
 -:  -------- > 57:  05b1c769 docs/openapi: Fix brokenness
 -:  -------- > 58:  cff18f89 docs/openapi: Add new config options
 -:  -------- > 59:  71920769 fuzzing: fixed harness bug
 -:  -------- > 60:  932b9146 socket: Prevent buffer under-read in nxt_inet_addr()
 -:  -------- > 61:  1a685084 Docker: bump Go versions
 -:  -------- > 62:  5b47542e Docker: update Rust version
 -:  -------- > 63:  8eb5d128 Docker: introduce "slim" python images
 -:  -------- > 64:  4778099b Docker: leave artifacts when build targets succeed
 -:  -------- > 65:  6e3152c0 Remove .hgtags
 -:  -------- > 66:  778d81cc Remove .hgignore files
 -:  -------- > 67:  0951778d Added .gitignore for pkg/contrib/tarballs
 -:  -------- > 68:  f4298f94 tests: Fix `/status' endpoint to cater for lists
 -:  -------- > 69:  264f4af4 test/wasm-wc: Target wasm32-wasip1
 -:  -------- > 70:  19cd88ef test/wasm-wc: Rename test_wasm_component.py
 -:  -------- > 71:  337cba43 ci: Enable the wasm-wasi-component tests
 -:  -------- > 72:  011071aa wasm-wc: bump wasmtime to v24
 -:  -------- > 73:  56c237b3 wasm-wc: Enable environment inheritance
 -:  -------- > 74:  27d3a5c7 java: Update third-party components
 -:  -------- > 75:  5c58f9d0 ci: Fix tags on ad hoc unitctl releases
 -:  -------- > 76:  9998918d Packages: bump wasmtime to 24.0.0 and wasi-sysroot to 24.0.
 -:  -------- > 77:  c5846ba3 ci: Fix wasmtime paths in ci.yml
 -:  -------- > 78:  46ddb010 ci: Trigger ci.yml for changes under pkg/contrib
 -:  -------- > 79:  6976a614 tests: Fix routing tests in the no njs case
 -:  -------- > 80:  cff5e092 tests: Suppress cargo-component output
 -:  -------- > 81:  5fde2ff7 http: Fix router process crash whilst using proxy
 -:  -------- > 82:  50b1aca3 python: Don't decrement a reference to a borrowed object
 -:  -------- > 83:  3c563849 unitctl: Don't track unit-openapi/.openapi-generator/
 -:  -------- > 84:  63148a31 tools/unitctl: whitespace fixes
 -:  -------- > 85:  5e8a6893 tools/unitctl: rename app -> apps, fix readme
 -:  -------- > 86:  f2e05bc7 docs: remove security.txt file
 -:  -------- > 87:  cc863e15 docs: add SECURITY.md
 -:  -------- > 88:  0dcd3a91 tools/unitctl: rename UNIT -> Unit
 -:  -------- > 89:  9e5f961b tools/unitctl: add export subcommand to readme
 -:  -------- > 90:  7c48546a tools/unitctl: adjust readme for socket addresses
 -:  -------- > 91:  15f76506 tools/unitctl: change reload to restart
 -:  -------- > 92:  f7771378 pkg/docker: Update dockerfiles for 1.33.0
 -:  -------- > 93:  3144710f tools/unitctl: Update for version 1.33.0
 -:  -------- > 94:  c3d6e540 docs/changes.xml: Add 1.33.0 changelog entries
 -:  -------- > 95:  24ed91f4 Add 1.33.0 CHANGES
 -:  -------- > 96:  4d627c8f docs/unit-openapi.yaml: Update version for 1.33.0
 -:  -------- > 97:  ba234b4d Version bump
 1:  5019157e = 98:  355f038f Compile with -funsigned-char

@ac000 ac000 merged commit 355f038 into nginx:master Sep 24, 2024
23 checks passed
@ac000 ac000 deleted the funsigned-char branch September 24, 2024 14:59
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants