clean up compile time constants
daanx committed Jan 13, 2024
1 parent db16187 commit 594e4e3
Showing 6 changed files with 128 additions and 117 deletions.
2 changes: 1 addition & 1 deletion kklib/include/kklib.h
@@ -9,7 +9,7 @@
found in the LICENSE file at the root of this distribution.
---------------------------------------------------------------------------*/

-#define KKLIB_BUILD 130 // modify on changes to trigger recompilation
+#define KKLIB_BUILD 131 // modify on changes to trigger recompilation
// #define KK_DEBUG_FULL 1 // set to enable full internal debug checks

// Includes
64 changes: 34 additions & 30 deletions kklib/include/kklib/string.h
@@ -17,12 +17,12 @@
and we cannot generally use C string functions to manipulate our strings.
There are four possible representations for strings:
- singleton empty string
- small string of at most 7 utf-8 bytes
- normal string of utf-8 bytes
- raw string pointing to a buffer of utf-8 bytes
These are not necessarily canonical (e.g. a normal or small string can have length 0 besides being empty)
Strings use kk_bytes_t directly: they are just bytes except always contain valid utf-8.
@@ -33,57 +33,57 @@

/*-------------------------------------------------------------------------------------------------------------
qutf-8 and qutf-16
There are a few important cases where *external* text is not quite utf-8 or utf-16.
We call these "qutf-8" and "qutf-16" for "quite like" utf-8/16:
- qutf-8: this is mostly utf-8, but allows invalid utf-8 like overlong sequences or lone
continuation bytes -- as such, any byte sequence is valid qutf-8. This occurs a lot in
practice, for example by bad json encoding containing binary data, but also as a result
of a _locale_ that cannot be decoded properly, or generally just random byte input.
- qutf-16: this is mostly utf-16 but allows again any invalid utf-16 which consists
of lone halves of surrogate pairs -- and again, any sequence of uint16_t is valid qutf-16.
This is actually what is used in Windows file names and JavaScript.
In particular for qutf-16 we would like to guarantee that decoding to utf-8 and encoding
again to qutf-16 is an identity transformation -- for example, we may list the contents
of a directory and then try to read each file. As a consequence we cannot replace invalid
codes in qutf-16 with a generic replacement character. One proposed solution for this is
to use wtf-8 (used in Rust <https://github.com/rust-lang/rust/issues/12056#issuecomment-55786546>)
instead of utf-8 internally.
We like to use strict utf-8 internally though, so we can always output valid utf-8 directly
without further conversions. (also, new formats like wtf-8 often have tricky edge cases, like
naively appending strings may change the interpretation of surrogate pairs in wtf-8)
Instead, we solve this by staying in strict utf-8 internally, but we reserve a
particular set of code-points to have a special meaning when converting to/from qutf-8 and qutf-16.
For now, we use a (hopefully forever) unassigned range in the "supplementary special-purpose plane" (14)
- ED800 - EDFFF: corresponds to a lone half `h` of a surrogate pair where `h = code - E0000`.
- EE000 - EE07F: <unused>
- EE080 - EE0FF: corresponds to an invalid byte `b` in an invalid utf-8 sequence where `b = code - EE000`.
(note: invalid bytes in utf-8 are always >= 0x80 so we need only a limited range).
We call this the "raw range". The advantage over using the replacement character is that we
now retain full information what the original (invalid) sequences were (and can thus do an
identity transform) -- and we stay with valid utf-8. Moreover, we can handle both invalid
utf-8 and invalid utf-16 with this.
When decoding qutf-8/16 to utf-8, we decode invalid sequences to these code points; and only when
encoding back to qutf-8/16, we encode these code points specially again to make this an identity
transformation.
_Otherwise these are just regular code points and valid utf-8 with no special treatment_.
Security wise this is also good practice -- for example, we decode the overlong qutf-8
sequence `0xC0 0x80` not to a 0 character, but to two raw code points: 0xEE0C0 0xEE080. This
way, we maintain an identity transform while still preventing hidden embedded 0 characters.
(Actually, to make it a true identity transform, when decoding qutf-8/16 we also need to treat
bytes/surrogate pairs that happen to be code points in our raw range as an invalid sequence.
This should be fine in practice as these are unassigned anyways).
------------------------------------------------------------------------------------------------------------*/
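The raw-range mapping described in the comment above can be sketched in C as follows. This is a minimal illustration and not kklib's implementation: only the numeric ranges (`ED800 - EDFFF`, `EE080 - EE0FF`) and the two offsets come from the comment, while every `sketch_` / `SKETCH_` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical names; only the numeric values come from the comment above.
#define SKETCH_RAW_PLANE      UINT32_C(0xE0000)  // lone surrogate h -> 0xE0000 + h
#define SKETCH_RAW_UTF8_BASE  UINT32_C(0xEE000)  // invalid byte b   -> 0xEE000 + b

// True if `c` is a raw code point standing for a lone surrogate half.
static inline bool sketch_is_raw_surrogate(uint32_t c) {
  return (c >= UINT32_C(0xED800) && c <= UINT32_C(0xEDFFF));
}

// True if `c` is a raw code point standing for an invalid utf-8 byte.
static inline bool sketch_is_raw_byte(uint32_t c) {
  return (c >= UINT32_C(0xEE080) && c <= UINT32_C(0xEE0FF));
}

// Decoding qutf-16: map a lone surrogate half into the raw range.
static inline uint32_t sketch_raw_from_surrogate(uint16_t h) {
  assert(h >= 0xD800 && h <= 0xDFFF);
  return SKETCH_RAW_PLANE + h;
}

// Decoding qutf-8: map an invalid byte (always >= 0x80) into the raw range.
static inline uint32_t sketch_raw_from_byte(uint8_t b) {
  assert(b >= 0x80);
  return SKETCH_RAW_UTF8_BASE + b;
}

// Encoding back to qutf-16: recover the original lone surrogate half.
static inline uint16_t sketch_raw_to_surrogate(uint32_t c) {
  assert(sketch_is_raw_surrogate(c));
  return (uint16_t)(c - SKETCH_RAW_PLANE);
}

// Encoding back to qutf-8: recover the original invalid byte.
static inline uint8_t sketch_raw_to_byte(uint32_t c) {
  assert(sketch_is_raw_byte(c));
  return (uint8_t)(c - SKETCH_RAW_UTF8_BASE);
}
```

With these helpers the overlong sequence `0xC0 0x80` from the comment maps to `0xEE0C0 0xEE080`, and mapping those two code points back yields the original bytes.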

#define KK_RAW_PLANE ((kk_char_t)(0xE0000))
@@ -120,22 +120,26 @@ static inline kk_string_t kk_string_empty() {
#define kk_define_string_literal(decl,name,len,chars) \
static struct { struct kk_bytes_s _base; size_t length; char str[len+1]; } _static_##name = \
{ { { KK_HEADER_STATIC(0,KK_TAG_STRING) } }, len, chars }; \
decl kk_string_t name = { { (intptr_t)&_static_##name._base._block } };
#else
#define kk_declare_string_literal(decl,name,len,chars) \
static kk_ssize_t _static_len_##name = len; \
static const char* _static_##name = chars; \
decl kk_string_t name = { { kk_datatype_null_init } };

#define kk_init_string_literal(name,ctx) \
if (kk_datatype_is_null(name.bytes)) { name = kk_string_alloc_from_utf8n(_static_len_##name, _static_##name, ctx); }

#define kk_define_string_literal(decl,name,len,chars,ctx) \
kk_declare_string_literal(decl,name,len,chars) \
kk_init_string_literal(name,ctx)

#endif

+#define kk_define_string_literal_empty(decl,name) \
+  decl kk_string_t name = kk_string_empty();


static inline kk_string_t kk_string_unbox(kk_box_t v) {
return kk_unsafe_bytes_as_string( kk_bytes_unbox(v) );
}
@@ -210,7 +214,7 @@ static inline kk_string_t kk_string_alloc_dupn_valid_utf8(kk_ssize_t len, const
}

// must be guaranteed valid utf8
static inline kk_string_t kk_string_alloc_dup_valid_utf8(const char* s, kk_context_t* ctx) {
kk_assert_internal(kk_utf8_is_valid(s));
if (s == NULL) return kk_string_empty();
return kk_string_alloc_dupn_valid_utf8( kk_sstrlen(s), (const uint8_t*)s, ctx);
@@ -241,7 +245,7 @@ static inline kk_string_t kk_string_alloc_raw(const char* s, bool free, kk_conte
}

static inline const uint8_t* kk_string_buf_borrow(const kk_string_t str, kk_ssize_t* len, kk_context_t* ctx) {
return kk_bytes_buf_borrow(str.bytes, len, ctx);
}

static inline const char* kk_string_cbuf_borrow(const kk_string_t str, kk_ssize_t* len, kk_context_t* ctx) {
@@ -295,12 +299,12 @@ static inline bool kk_utf8_is_cont(uint8_t c) {
// Advance to the next codepoint. (does not advance past the end)
// This should not validate, but advance to the next non-continuation byte.
static inline const uint8_t* kk_utf8_next(const uint8_t* s) {
s++; // always skip first byte
for (; kk_utf8_is_cont(*s); s++) {} // skip continuation bytes
return s;
}
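As a hedged illustration of the non-validating stepping that `kk_utf8_next` performs, the sketch below counts code points in a zero-terminated buffer. The `sketch_` functions are hypothetical stand-ins written for this example, and the continuation-byte test `(c & 0xC0) == 0x80` is the standard utf-8 definition rather than something shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// A continuation byte has the bit pattern 10xxxxxx.
static inline bool sketch_utf8_is_cont(uint8_t c) {
  return ((c & 0xC0) == 0x80);
}

// Advance to the next non-continuation byte without validating.
// The terminating 0 byte is not a continuation byte, so the loop
// stops there at the latest.
static inline const uint8_t* sketch_utf8_next(const uint8_t* s) {
  s++;                                    // always skip the first byte
  while (sketch_utf8_is_cont(*s)) s++;    // skip continuation bytes
  return s;
}

// Count code points in a zero-terminated utf-8 buffer.
static size_t sketch_utf8_count(const uint8_t* s) {
  assert(s != NULL);
  size_t n = 0;
  while (*s != 0) {
    s = sketch_utf8_next(s);
    n++;
  }
  return n;
}
```

For example, the bytes of `"a"`, U+00E9 and U+20AC (1 + 2 + 3 bytes) count as three code points.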

// Retreat to the previous codepoint.
// This should not validate, but backup to the previous non-continuation byte.
static inline const uint8_t* kk_utf8_prev(const uint8_t* s) {
s--; // skip back at least 1 byte
121 changes: 61 additions & 60 deletions lib/std/core.kk
@@ -76,22 +76,6 @@ pub alias io-noexn = <div,io-total>
// The `:io` effect is used for functions that perform arbitrary I/O operations.
pub alias io = <exn,io-noexn>

-// File locations
-extern file/kk-modulename-extern() : string
-  inline ""
-
-extern file/kk-line-extern() : string
-  inline ""
-
-extern file/kk-fileinfo-extern() : string
-  inline ""
-
-// Automatically replaced with the current file's module name
-pub val file/kk-modulename : string = file/kk-modulename-extern()
-// Automatically replaced with the current line of the file
-pub val file/kk-line : string = file/kk-line-extern()
-// Automatically replaced with the current file's name and line
-pub val file/kk-fileinfo : string = file/kk-fileinfo-extern()

// ----------------------------------------------------------------------------
// Masking
@@ -2386,50 +2370,6 @@ pub fun unit/println( u : () )
printsln(show(()))
*/

-// ----------------------------------------------------------------------------
-// Trace, assert, todo
-// ----------------------------------------------------------------------------
-
-extern xtrace : ( message : string ) -> ()
-  c "kk_trace"
-  cs "Primitive.Trace"
-  js "_trace"
-
-extern xtrace-any : forall<a> ( message: string, x : a ) -> ()
-  c "kk_trace_any"
-  cs "Primitive.TraceAny"
-  js "_trace_any"
-
-val trace-enabled : ref<global,bool> = unsafe-total{ ref(True) }
-
-// Trace a message used for debug purposes.
-// The behaviour is system dependent. On a browser and node it uses
-// `console.log` by default.
-// Disabled if `notrace` is called.
-pub fun trace( message : string ) : ()
-  unsafe-total
-    if !trace-enabled then xtrace(message)
-
-pub fun trace-info( message : string, ?kk-fileinfo: string ) : ()
-  unsafe-total
-    if !trace-enabled then xtrace(message ++ " " ++ implicit/kk-fileinfo)
-
-pub fun trace-any( message : string, x : a ) : ()
-  unsafe-total
-    if !trace-enabled then xtrace-any(message,x)
-
-// Disable tracing completely.
-pub noinline fun notrace() : st<global> ()
-  trace-enabled := False
-
-extern unsafe-assert-fail( msg : string ) : ()
-  c "kk_assert_fail"
-  js inline "function() { throw new Error(\"assertion failed: \" + #1) }()"
-
-pub fun assert( message : string, condition : bool ) : () // Compiler removes assert calls in optimized builds
-  if !condition then unsafe-assert-fail(message)
-
-
// ----------------------------------------------------------------------------
// Exceptions
// ----------------------------------------------------------------------------
@@ -2729,3 +2669,64 @@ pub extern phantom<a>() : a
c inline "kk_box_null()"
inline "undefined"

+
+// ------------------------------------------------------------------------------
+// File locations
+// ------------------------------------------------------------------------------
+
+// Compilation constant that is replaced with the current file's module name
+pub val file/kk-module : string = ""
+
+// Compilation constant that is replaced with the current line number
+pub val file/kk-line : string = ""
+
+// Compilation constant that is replaced with the current file name
+pub val file/kk-file : string = ""
+
+pub fun file/kk-file-line( ?kk-file, ?kk-line )
+  ?kk-file ++ "(line " ++ ?kk-line ++ ")"
+
+// ----------------------------------------------------------------------------
+// Trace, assert, todo
+// ----------------------------------------------------------------------------
+
+extern xtrace : ( message : string ) -> ()
+  c "kk_trace"
+  cs "Primitive.Trace"
+  js "_trace"
+
+extern xtrace-any : forall<a> ( message: string, x : a ) -> ()
+  c "kk_trace_any"
+  cs "Primitive.TraceAny"
+  js "_trace_any"
+
+val trace-enabled : ref<global,bool> = unsafe-total{ ref(True) }
+
+// Trace a message used for debug purposes.
+// The behaviour is system dependent. On a browser and node it uses
+// `console.log` by default.
+// Disabled if `notrace` is called.
+pub fun trace( message : string ) : ()
+  unsafe-total
+    if !trace-enabled then xtrace(message)
+
+pub fun trace-info( message : string, ?kk-file-line : string ) : ()
+  trace(?kk-file-line ++ ": " ++ message)
+
+pub fun trace-show( x : a, ?show : a -> string, ?kk-file-line : string ) : ()
+  trace-info(x.show)
+
+pub fun trace-any( message : string, x : a ) : ()
+  unsafe-total
+    if !trace-enabled then xtrace-any(message,x)
+
+// Disable tracing completely.
+pub noinline fun notrace() : st<global> ()
+  trace-enabled := False
+
+extern unsafe-assert-fail( msg : string ) : ()
+  c "kk_assert_fail"
+  js inline "function() { throw new Error(\"assertion failed: \" + #1) }()"
+
+pub fun assert( message : string, condition : bool, ?kk-file-line : string ) : () // Compiler removes assert calls in optimized builds
+  if !condition then unsafe-assert-fail(kk-file-line ++ ": " ++ message)
4 changes: 3 additions & 1 deletion samples/syntax/basic.kk
@@ -99,5 +99,7 @@ pub fun increment2(xs) {
xs.map(fn(x){ x + 1 }).filter(fn(x){ x > 2 })
}

+// `trace-info` also traces the current file and line number.
pub fun example-trace(): ()
-  trace-info("Hello")
+  trace-info("example trace")
+  trace("module: " ++ kk-module)
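For readers more familiar with C, a rough analogy for these call-site constants is a macro that expands `__FILE__` and `__LINE__` where it is used. The sketch below is only an analogy and not how the Koka compiler substitutes `kk-file` and `kk-line`; the `sketch_` names are hypothetical, and the `file(line N): message` layout mirrors `kk-file-line` and `trace-info` in `lib/std/core.kk`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Format "file(line N): message" into `buf`, mirroring the string that
// `trace-info` builds from `kk-file-line` and the message.
// Returns the value of snprintf (the length the full text needs).
static int sketch_format_trace(char* buf, size_t size,
                               const char* file, int line, const char* message) {
  assert(buf != NULL && size > 0);
  return snprintf(buf, size, "%s(line %d): %s", file, line, message);
}

// The macro captures the location of the *caller*, which is the role the
// implicit `?kk-file-line` parameter plays in the Koka code.
#define SKETCH_TRACE_INFO(buf, size, message) \
  sketch_format_trace((buf), (size), __FILE__, __LINE__, (message))
```

Calling `sketch_format_trace(buf, sizeof buf, "basic.kk", 104, "example trace")` fills `buf` with `basic.kk(line 104): example trace`; the file name and line number here are made up for the example.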
10 changes: 5 additions & 5 deletions src/Common/NamePrim.hs
@@ -20,9 +20,9 @@ module Common.NamePrim
, nameCoreHnd
, isPrimitiveName
, nameOpExpr

-- Core names
- , nameCoreFileLine, nameCoreModuleName, nameCoreFileInfo
+ , nameCoreFileLine, nameCoreFileModule, nameCoreFileFile

-- * Operations
, namePatternMatchError, nameMainConsole
@@ -200,9 +200,9 @@ nameTrace = preludeName "trace"
nameLog = preludeName "log"
namePhantom = preludeName "phantom"

-nameCoreFileInfo = newLocallyQualified "std/core" "file" "kk-fileinfo"
-nameCoreFileLine = newLocallyQualified "std/core" "file" "kk-line"
-nameCoreModuleName = newLocallyQualified "std/core" "file" "kk-modulename"
+nameCoreFileFile = qualify nameSystemCore (newLocallyQualified "" "file" "kk-file")
+nameCoreFileLine = qualify nameSystemCore (newLocallyQualified "" "file" "kk-line")
+nameCoreFileModule = qualify nameSystemCore (newLocallyQualified "" "file" "kk-module")


{--------------------------------------------------------------------------