Commit

[doc] Clarify that strings are arbitrary byte sequences.
Add link to #FAQ posts.
Andy Chu committed Jul 3, 2019
1 parent b16bdb9 commit 9db0d9a
Showing 1 changed file with 8 additions and 4 deletions:
doc/osh-manual.md
```diff
@@ -295,16 +295,18 @@ or denoted in ASCII with C-escaped strings, i.e. `$''`:
 
 #### Data Encoding
 
-The **data** they operate on should also be UTF-8 / ASCII.
+Strings in OSH are arbitrary sequences of **bytes**. Caveats:
 
-For example, the length operator `${#s}` and slicing `${s:1:3}` perform UTF-8
-decoding. Decoding errors are fatal if `shopt -s strict-word-eval` is on.
+- When passed to external programs, strings are truncated at the first `NUL`
+(`'\0'`) byte. This is just how Unix and C work.
+- The length operator `${#s}` and slicing `${s:1:3}` require their input to be
+**valid UTF-8**. Decoding errors are fatal if `shopt -s strict-word-eval` is
+on.
 
 The GNU `iconv` program converts text from one encoding to another.
 
 Also see [Notes on Unicode in Shell][unicode.md].
 
-
 [unicode.md]: https://github.com/oilshell/oil/blob/master/doc/unicode.md
 
 ### Bugs
@@ -313,6 +315,8 @@ Also see [Notes on Unicode in Shell][unicode.md].
 
 ### Links
 
+- [Blog Posts Tagged #FAQ](http://www.oilshell.org/blog/tags.html?tag=FAQ#FAQ)
+tell you why OSH exists and how it's designed.
 - [Known Differences](known-differences.html) lists incompatibilities between
 OSH and other shells. They are unlikely to appear in real programs, or
 there is a trivial workaround.
```
