LispKit Format

Matthias Zenger edited this page Aug 5, 2023 · 1 revision

Library (lispkit format) provides an implementation of Common Lisp's format procedure for LispKit. Procedure format can be used for creating formatted text using a format string similar to printf. The formatting formalism, though, is significantly more expressive, allowing users to display numbers in various formats (e.g. hex, binary, octal, roman numerals, natural language), applying conditional formatting, outputting text in a tabular format, iterating over data structures, and even applying format recursively to handle data that includes its own preferred formatting strings.

Usage overview

In its simplest form, procedure format gets invoked with a control string followed by an arbitrary number of arguments. The control string consists of characters that are copied verbatim into the output as well as formatting directives. All formatting directives start with a tilde (~) and end with a single character identifying the type of the directive. Directives may also take prefix parameters, written immediately after the tilde character and separated by commas, as well as modifiers (see below for details).

For example, the call of format below injects two integer arguments into the control string via directive ~D and returns the resulting string:

(format "There are ~D warnings and ~D errors." 12 7)
⇒ "There are 12 warnings and 7 errors."

Simple Directives

Here is a simple control string which injects a readable description of an argument via the directive ~A: "I received ~A as a response". Directive ~A refers to the next argument provided to format when compiling the formatted output:

(format "I received ~A as a response" "nothing")
⇒ "I received nothing as a response"
(format "I received ~A as a response" "a long email")
⇒ "I received a long email as a response"

Directive ~A may be given parameters to influence the formatted output. The first parameter of ~A-directives defines the minimal length. If the length of the textual representation of the next argument is smaller than the minimal length, padding characters are inserted:

(format "|Name: ~10A|Location: ~13A|" "Smith" "New York")
⇒ "|Name: Smith     |Location: New York     |"
(format "|Name: ~10A|Location: ~13A|" "Williams" "San Francisco")
⇒ "|Name: Williams  |Location: San Francisco|"
(format "|Name: ~10,,,'_@A|Location: ~13,,,'-A|" "Garcia" "Los Angeles")
⇒ "|Name: ____Garcia|Location: Los Angeles--|"

The third example above utilizes more than one parameter and, in one case, includes a @ modifier. The directive ~13,,,'-A defines the first and the fourth parameter. The second and third parameters are omitted and thus defaults are used. The fourth parameter defines the padding character. If character literals are used in the parameter list, they are prefixed with a quote '. The directive ~10,,,'_@A includes an @ modifier which will result in padding of the output on the left.

It is possible to inject a parameter from the list of arguments. The following examples show how parameter v is used to do this for formatting a floating-point number with a configurable number of fractional digits.

(format "length = ~,vF" 2 pi)
⇒ "length = 3.14"
(format "length = ~,vF" 4 pi)
⇒ "length = 3.1416"

Here v is used as the second parameter of the fixed floating-point directive ~F, indicating the number of fractional digits. It refers to the next provided argument (which is either 2 or 4 in the examples above).

Composite Directives

The next example shows how one can refer to the total number of arguments that are not yet consumed in the formatting process by using # as a parameter value.

(format "~A left for formatting: ~#[none~;one~;two~:;many~]."
        "Arguments" "eins" 2)
⇒ "Arguments left for formatting: two."
(format "~A left for formatting: ~#[none~;one~;two~:;many~]."
        "Arguments")
⇒ "Arguments left for formatting: none."
(format "~A left for formatting: ~#[none~;one~;two~:;many~]."
        "Arguments" "eins" 2 "drei" "vier")
⇒ "Arguments left for formatting: many."

In these examples, the conditional directive ~[ is used. It is followed by clauses separated by directive ~; until ~] is reached. Thus, there are four clauses in the example above: none, one, two, and many. The parameter in front of the ~[ directive determines which of the clauses is being output. All other clauses will be discarded. For instance, ~1[zero~;one~;two~:;many~] will output one as clause 1 is chosen (which is the second one, given that numbering starts with zero). The last clause is special because it is prefixed with the ~; directive using a : modifier: this is a default clause which is chosen when none of the others are applicable. Thus, ~8[zero~;one~;two~:;many~] outputs many. This also explains how the example above works: here # refers to the number of arguments that are still available and this number drives what is being returned in this directive: ~#[...~].
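The two inline cases above can be written out as calls. This is a minimal sketch that assumes format is available by importing (lispkit format) and that no further arguments are needed when the clause index is given as a prefix parameter:

```scheme
(import (lispkit base) (lispkit format))

; Prefix parameter 1 selects clause 1 (numbering starts at zero).
(format "~1[zero~;one~;two~:;many~]")   ; ⇒ "one"

; There is no clause 8, so the default clause introduced by ~:; is used.
(format "~8[zero~;one~;two~:;many~]")   ; ⇒ "many"
```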

Another powerful composite directive is the iteration directive ~{. With this directive it is possible to iterate over all elements of a sequence. The control string between ~{ and ~} gets repeated as long as there are still elements left in the sequence which is provided as an argument. For instance, Numbers:~{ ~A~} applied to argument ("one" "two" "three") results in the output Numbers: one two three. The control string between ~{ and ~} can also consume more than one element of the sequence. Thus, Numbers:~{ ~A=>~A~} applied to argument ("one" 1 "two" 2) outputs Numbers: one=>1 two=>2.
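Written as calls, the two iteration examples from this paragraph look as follows. The sketch assumes the sequence is passed as a quoted list:

```scheme
(import (lispkit base) (lispkit format))

; One element is consumed per repetition of " ~A".
(format "Numbers:~{ ~A~}" '("one" "two" "three"))
; ⇒ "Numbers: one two three"

; Two elements are consumed per repetition of " ~A=>~A".
(format "Numbers:~{ ~A=>~A~}" '("one" 1 "two" 2))
; ⇒ "Numbers: one=>1 two=>2"
```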

Of course, it is also possible to nest arbitrary composite directives. Here is an example for a control string that uses a combination of iteration and conditional directives to output the elements of a sequence separated by a comma: (~{~#[~;~A~:;~A, ~]~}). When this control string is used with the argument ("one" "two" "three"), the following formatted output is generated: (one, two, three).
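The nested control string can be tried directly. In this sketch, the helper name comma-list is hypothetical and only used for illustration. Inside the iteration, # counts the elements that are still left, so the clause "~A" is used for the final element and the default clause "~A, " for all earlier ones:

```scheme
(import (lispkit base) (lispkit format))

; Hypothetical helper: formats a list as a comma-separated group.
(define (comma-list xs)
  (format "(~{~#[~;~A~:;~A, ~]~})" xs))

(comma-list '("one" "two" "three"))   ; ⇒ "(one, two, three)"
; Following the same clause-selection rules, a single element
; is output without a trailing separator.
(comma-list '("one"))                 ; ⇒ "(one)"
```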

Formatting language

Control strings consist of characters that are copied verbatim into the output as well as formatting directives. All formatting directives start with a tilde (~) and end with a single character identifying the type of the directive. Directives may take prefix parameters written immediately after the tilde character, separated by comma. Both integers and characters are allowed as parameters. They may be followed by formatting modifiers :, @, and +. This is the general format of a formatting directive:

~param1,param2,...mX

where

  • m is a potentially empty modifier, consisting of an arbitrary sequence of modifier characters :, @, and +
  • X is a character identifying a directive type
  • paramN is either a numeric or character parameter according to the specification below.

The following grammar describes the syntax of directives formally in BNF:

<directive>  ::= "~" <modifiers> <char>
               | "~" <parameters> <modifiers> <char>
<modifiers>  ::= <empty>
               | ":" <modifiers>
               | "@" <modifiers>
               | "+" <modifiers>
<parameters> ::= <parameter>
               | <parameter> "," <parameters>
<parameter>  ::= <empty>
               | "#"
               | "v"
               | <number>
               | "-" <number>
               | <character>
<number>     ::= <digit>
               | <digit> <number>
<digit>      ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<character>  ::= "'" <char>
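To connect the grammar with concrete directives, here is a small sketch that reuses directives from the usage overview and annotates how each one decomposes. The decomposition comments are an illustration of the grammar, not output of the library:

```scheme
(import (lispkit base) (lispkit format))

; ~10,,,'_@A
;   parameters: 10, <empty>, <empty>, '_   (number, defaults, character)
;   modifiers:  @
;   char:       A
(format "|~10,,,'_@A|" "Garcia")   ; ⇒ "|____Garcia|"

; ~,vF
;   parameters: <empty>, v   (v is replaced by the next argument, here 2)
;   modifiers:  none
;   char:       F
(format "~,vF" 2 pi)               ; ⇒ "3.14"
```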

Formatting directives

The formatting directives supported by library (lispkit format) are based on the directives specified in Common Lisp the Language, 2nd Edition by Guy L. Steele Jr. Some directives have been extended to meet today's formatting requirements (e.g. to support localization) and to enable a powerful usage throughout LispKit. Extensions were introduced in a way to not impact backward compatibility.

Directive Explanation
~a
~A
ASCII:  ~mincol,colinc,minpad,padchar,maxcol,elcharA

The next argument arg is output as if procedure display was used, i.e. the output is without escape characters and if arg is a string, its characters will be output verbatim without surrounding quotes.

mincol (default: 0) specifies the minimal "width" of the output of the directive in characters, maxcol (default: ∞) specifies the maximum width. padchar (default: ' ') defines the character that is used to pad the output to make sure it is at least mincol characters long. By default, the output is padded on the right with at least minpad (default: 0) copies of padchar. Padding characters are then inserted colinc (default: 1) characters at a time until the total width is at least mincol. Padding is capped such that the output never exceeds maxcol characters. If, without padding, the output is already longer than maxcol, the output is truncated at width maxcol - 1 and the ellipsis character elchar (default: '…') is inserted at the end.

Modifier @ enables padding on the left to right-align the output.
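A short sketch of the padding and truncation parameters follows. The outputs shown for the padding cases follow directly from the rules above. The truncation case is left without a literal result, since only the general rule (maxcol − 1 characters plus elchar) is stated here:

```scheme
(import (lispkit base) (lispkit format))

; mincol = 8: padded on the right with the default padchar (space).
(format "|~8A|" "abc")        ; ⇒ "|abc     |"

; @ modifier: padded on the left instead.
(format "|~8@A|" "abc")       ; ⇒ "|     abc|"

; padchar is the fourth parameter.
(format "|~8,,,'.A|" "abc")   ; ⇒ "|abc.....|"

; maxcol = 6 (fifth parameter): the text is longer than 6 characters,
; so it is cut after maxcol - 1 characters and elchar is appended.
(format "|~,,,,6A|" "Hello, World")
```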

~w
~W
WRITE:  ~mincol,colinc,minpad,padchar,maxcol,elcharW

The next argument arg is output as if procedure write was used, i.e. the output is with escape characters and if arg is a string, its characters will be output surrounded by quotes.

Parameters mincol (default: 0), colinc (default: 1), minpad (default: 0), padchar (default: ' '), maxcol (default: ∞), and elchar (default: '…') are used just as described for the ASCII directive ~A. Modifier @ enables padding on the left to right-align the output.
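The difference between ~A and ~W is easiest to see with a string argument. A minimal sketch, based only on the display/write distinction described above:

```scheme
(import (lispkit base) (lispkit format))

; ~A behaves like display: the characters are output verbatim.
(format "~A" "hi")   ; ⇒ "hi"

; ~W behaves like write: the string is surrounded by quotes.
(format "~W" "hi")   ; ⇒ "\"hi\""
```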

~s
~S
SOURCE:  ~mincol,colinc,minpad,padchar,maxcol,elcharS

The next argument arg is output using a type-specific control string. If no control string is registered for the type of arg, then ~S behaves like ~W for arg.

Parameters mincol (default: 0), colinc (default: 1), minpad (default: 0), padchar (default: ' '), maxcol (default: ∞), and elchar (default: '…') are used just as described for the ASCII directive ~A. Modifier @ enables padding on the left to right-align the output.

~c
~C
CHARACTER:  ~C

The next argument arg should be a character or a string consisting of one character. Directive ~C outputs arg in a form dependent on the modifiers used. Without any modifiers, arg is output as if the character was used in a string without any escaping.

If the @ modifier is provided alone, the character is output using Scheme's syntax for character literals. The modifier combination @: will lead to arg being output as Unicode code points. The combination @:+ will output arg as a sequence of Unicode scalar property names, separated by comma.

If the : modifier is used (without @), a representation of arg for the usage in XML documents is chosen. By default, a Unicode-based XML character encoding is used, unless : is combined with +, in which case the character is represented as an XML named character entity when possible, otherwise, the character is output in raw form.

If the + modifier is used alone, the character is output as if it is a character of a string, escaped if necessary, and surrounded by quotes.

  (format "~C" #\A) ⟹ A
  (format "~+C" #\A) ⟹ "A"
  (format "~+C" #\newline) ⟹ "\n"
  (format "~@C" "A") ⟹ #\A
  (format "~@C" "\t") ⟹ #\tab
  (format "~@:C" "©") ⟹ U+00A9
  (format "~@:+C" "©") ⟹ COPYRIGHT SIGN
  (format "~:C" "©") ⟹ &#xA9;
  (format "~:+C" "©") ⟹ &copy;

~d
~D
DECIMAL:  ~mincol,padchar,groupchar,groupcolD

The next argument arg is output in decimal radix. arg should be an integer, in which case no decimal point is printed. For floating-point numbers which do not represent an integer, a decimal point and a fractional part are output.

mincol (default: 0) specifies the minimal "width" of the output of the directive in characters with padchar (default: ' ') defining the character that is used to pad the output on the left to make sure it is at least mincol characters long.

  (format "Number: ~D" 8273) ⟹ Number: 8273
  (format "Number: ~6D" 8273) ⟹ Number:   8273
  (format "Number: ~6,'0D" 8273) ⟹ Number: 008273

By default, the number is output without grouping separators. groupchar specifies which character should be used to separate sequences of groupcol digits in the output. Grouping of digits gets enabled with the : modifier.

  (format "|~10:D|" 1734865) ⟹ | 1,734,865|
  (format "|~10,,'.:D|" 1734865) ⟹ | 1.734.865|

A sign is output only if the number is negative. With the modifier @ it is possible to force output also of positive signs. To facilitate the localization of output, procedure format supports a locale parameter, which is also available via format config objects. Locale-specific output can be enabled for the ~D directive by using the + modifier.

  (format 'de_CH "~+D" 14321) ⟹ 14'321
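Two parameters of ~D are described above without an example: the @ modifier for forcing a sign and the groupcol parameter. The following sketch applies the stated rules. The results are what those rules imply and have not been taken from the library's own examples:

```scheme
(import (lispkit base) (lispkit format))

; @ forces a sign for non-negative numbers as well.
(format "~@D" 42)                 ; ⇒ "+42"

; groupchar = '_ (third parameter), groupcol = 4 (fourth parameter);
; grouping itself is switched on by the : modifier.
(format "~,,'_,4:D" 12345678)     ; ⇒ "1234_5678"
```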

~b
~B
BINARY:  ~mincol,padchar,groupchar,groupcolB

Binary directive ~B is just like decimal directive ~D but it outputs the next argument in binary radix (radix 2) instead of decimal. It uses the space character as the default for groupchar and 4 as the default for groupcol.

  (format "bin(~D) = ~B" 178 178) ⟹ bin(178) = 10110010
  (format "~:B" 59701) ⟹ 1110 1001 0011 0101
  (format "~19,'0,'.:B" 31912) ⟹ 0111.1100.1010.1000

~o
~O
OCTAL:  ~mincol,padchar,groupchar,groupcolO

Octal directive ~O is just like decimal directive ~D but it outputs the next argument in octal radix (radix 8) instead of decimal. It uses the space character as the default for groupchar and 4 as the default for groupcol.

  (format "bin(~D) = ~O" 178 178) ⟹ bin(178) = 262
  (format "~:O" 59701) ⟹ 16 4465
  (format "~9,'0,',:O" 31912) ⟹ 0007,6250

~x
~X
HEXADECIMAL:  ~mincol,padchar,groupchar,groupcolX

Hexadecimal directive ~X is just like decimal directive ~D but it outputs the next argument in hexadecimal radix (radix 16) instead of decimal. It uses the colon character as the default for groupchar and 2 as the default for groupcol. With modifier +, upper case characters are used for representing hexadecimal digits.

  (format "bin(~D) = ~X" 9968 9968) ⟹ bin(9968) = 26f0
  (format "~:X" 999701) ⟹ f:41:15
  (format "~+X" 999854) ⟹ F41AE

~r
~R
RADIX:  ~radix,mincol,padchar,groupchar,groupcolR

The next argument arg is expected to be a fixnum number. It will be output with radix radix (default: 10). mincol (default: 0) specifies the minimal "width" of the output of the directive in characters with padchar (default: ' ') defining the character that is used to pad the output on the left to make it at least mincol characters long.

  (format "Number: ~10R" 1272) ⟹ Number: 1272
  (format "Number: ~16,8,'0R" 7121972) ⟹ Number: 006cac34
  (format "Number: ~2R" 173) ⟹ Number: 10101101

By default, the number is output without grouping separators. groupchar specifies which character should be used to separate sequences of groupcol digits in the output. Grouping of digits is enabled with the : modifier.

  (format "~16,8,,':,2:R" 7121972) ⟹ 6c:ac:34
  (format "~2,14,'0,'.,4:R" 773) ⟹ 0011.0000.0101

A sign is output only if the number is negative. With the modifier @ it is possible to force output also of positive signs.

If parameter radix is not specified at all, then an entirely different interpretation is given. ~R outputs arg as a cardinal number in natural language. The form ~:R outputs arg as an ordinal number in natural language. ~@R outputs arg as a Roman numeral.

  (format "~R" 572) ⟹ five hundred seventy-two
  (format "~:R" 3) ⟹ 3rd
  (format "~@R" 1272) ⟹ MCCLXXII

Whenever output is provided in natural language, English is used as the language by default. By specifying the + modifier, it is possible to switch the language to the language of the locale provided to procedure format. In fact, modifier + plays two different roles: If the given radix is greater than 10, upper case characters are used for representing alphabetic digits. If the radix is omitted, usage of modifier + enables locale-specific output determined by the locale parameter of procedure format.

  (format 'de_DE "~+R" 572) ⟹ fünfhundertzweiundsiebzig
  (format 'de_CH "~10+R" 14321) ⟹ 14'321
  (format "~16R vs ~16+R" 900939 900939) ⟹ dbf4b vs DBF4B

~f
~F
FIXED FLOAT:  ~w,d,k,overchar,padchar,groupchar,groupcolF

The next argument arg is output as a floating-point number in a fixed format (ideally without exponent) of exactly w characters, if w is specified. First, leading padchar characters (default: ' ') are output, if necessary, to pad the field on the left. If arg is negative, then a minus sign is printed. If arg is not negative, then a plus sign is printed if and only if the @ modifier was specified. Then a sequence of digits, containing a single embedded decimal point, is printed. If parameter d is provided, then exactly d decimal places are output. This represents the magnitude of the value of arg times 10^k, rounded to d fractional digits. There are no leading zeros, except that a single zero digit is output before the decimal point if the printed value is less than 1.0, and this single zero digit is not output after all if w = d + 1.

If it is impossible to print the value in the required format in a field of width w, then one of two actions is taken: If the parameter overchar is specified, then w copies of this character are printed. If overchar is omitted, then the scaled value of arg is printed using more than w characters.

If the width parameter w is omitted, then the output is of variable width and a value is chosen for w in such a way that no leading padding characters are needed and exactly d characters will follow the decimal point. For example, the directive ~,2F will output exactly two digits after the decimal point and as many as necessary before the decimal point.

If d is omitted, then there is no constraint on the number of digits to appear after the decimal point. A value is chosen for d in such a way that as many digits as possible may be printed subject to the width constraint imposed by w and the constraint that no trailing zero digits may appear in the fraction, except that if the fraction is zero, then a single zero digit should appear after the decimal point if permitted by the width constraint.

If w is omitted, then if the magnitude of arg is so large (or, if d is also omitted, so small) that more than 100 digits would have to be printed, then arg is output using exponential notation instead.

The ~F directive also supports grouping of the integer part of arg; this can be enabled via the : modifier. groupchar (default: ',') specifies which character should be used to separate sequences of groupcol (default: 3) digits in the integer part of the output. If locale-specific settings should be used, the + modifier needs to be set.

  (format "~F" 123.1415926) ⟹ 123.1415926
  (format "~8F" 123.1415926) ⟹ 123.1416
  (format "~8,,,'-F" 123.1415926) ⟹ 123.1416
  (format "~8,,,'-F" 123456789.12) ⟹ --------
  (format "~8,,,,'0F" 123.14) ⟹ 00123.14
  (format "~8,3,,,'0F" 123.1415926) ⟹ 0123.142
  (format "~,4F" 123.1415926) ⟹ 123.1416
  (format "~,2@F" 123.1415926) ⟹ +123.14
  (format "~,2,-2@F" 314.15926) ⟹ +3.14
  (format "~,2:F" 1234567.891) ⟹ 1,234,567.89
  (format "~,2,,,,'',3:F" 1234567.891) ⟹ 1'234'567.89

~e
~E
EXPONENTIAL FLOAT:  ~w,d,e,k,overchar,padchar,expcharE

The next argument arg is output as a floating-point number in an exponential format of exactly w characters, if w is specified. Parameter d is the number of digits to print after the decimal point, e is the number of digits to use when printing the exponent, and k is a scale factor that defaults to 1.

First, leading padchar (default: ' ') characters are output, if necessary, to pad the output on the left. If arg is negative, then a minus sign is printed. If arg is not negative, then a plus sign is printed if and only if the @ modifier was specified. Then a sequence of digits, containing a single embedded decimal point, is output. The form of this sequence of digits depends on the scale factor k. If k is zero, then d digits are printed after the decimal point, and a single zero digit appears before the decimal point. If k is positive, then it must be strictly less than d + 2 and k significant digits are printed before the decimal point, and d − k + 1 digits are printed after the decimal point. If k is negative, then it must be strictly greater than −d. A single zero digit appears before the decimal point and after the decimal point, first −k zeros are output followed by d + k significant digits.

Following the digit sequence, the exponent is output following character expchar (default: 'E') and the sign of the exponent, i.e. either the plus or the minus sign. The exponent consists of e digits representing the power of 10 by which the fraction must be multiplied to properly represent the rounded value of arg.

If it is impossible to print the value in the required format in a field of width w, then one of two actions is taken: If the parameter overchar is specified, then w copies of this character are printed instead of arg. If overchar is omitted, then arg is printed using more than w characters, as many more as may be needed. If d is too small for the specified k or e is too small, then a larger value is used for d or e as may be needed.

If the w parameter is omitted, then the output is of variable width and a value is chosen for w in such a way that no leading padding characters are needed.

  (format "~E" 31.415926) ⟹ 3.1415926E+1
  (format "~,5E" 0.0003141592) ⟹ 3.14159E-4
  (format "~,4,2E" 0.0003141592) ⟹ 3.1416E-04
  (format "~9E" 31.415926) ⟹ 3.1416E+1
  (format "~10,3,,,,'#E" 31.415926) ⟹ ##3.142E+1
  (format "~10,4,,3,,'#E" 31.415926) ⟹ #314.16E-1
  (format "~7,3,2,,'-E" 31.415926) ⟹ -------
  (format "~10,4,,4,,'#@E" 31.415926) ⟹ +3141.6E-2

~g
~G
GENERAL FLOAT:  ~w,d,e,k,overchar,padchar,expcharG

The next argument arg is output as a floating-point number in either fixed-format or exponential notation as appropriate. The format in which to print arg depends on the magnitude (absolute value) of arg. Let n be an integer such that 10^(n−1) ≤ arg < 10^n. If arg is zero, let n be 0. Let ee equal e + 2, or 4 if e is omitted. Let ww equal w − ee, or nil if w is omitted. If d is omitted, first let q be the number of digits needed to print arg with no loss of information and without leading or trailing zeros; then let d equal max(q, min(n, 7)). Let dd equal d − n.

If 0 ≤ dd ≤ d, then arg is output as if by the format directives:
    ~ww,dd,,overchar,padcharF~ee@T
Note that the scale factor k is not passed to the ~F directive. For all other values of dd, arg is printed as if by the format directive:
    ~w,d,e,k,overchar,padchar,expcharE
In either case, an @ modifier is specified to the ~F or ~E directive if and only if one was specified to the ~G directive.

  (format "|~G|" 712.72) ⟹ |712.72    |
  (format "|~12G|" 712.72) ⟹ |  712.72    |
  (format "|~9,2G|~9,3,2,3G|~9,3,2,0G|" 0.031415 0.031415 0.031415)
    ⟹ |  3.14E-2|314.2E-04|0.314E-01|
  (format "|~9,2G|~9,3,2,3G|~9,3,2,0G|" 0.314159 0.314159 0.314159)
    ⟹ | 0.31    |0.314    |0.314    |
  (format "|~9,2G|~9,3,2,3G|~9,3,2,0G|" 3.14159 3.14159 3.14159)
    ⟹ |  3.1    | 3.14    | 3.14    |
  (format "|~9,2G|~9,3,2,3G|~9,3,2,0G|" 314.159 314.159 314.159)
    ⟹ |  3.14E+2|  314    |  314    |
  (format "|~9,2G|~9,3,2,3G|~9,3,2,0G|" 3141.59 3141.59 3141.59)
    ⟹ |  3.14E+3|314.2E+01|0.314E+04|

~$ DOLLARS FLOAT:  ~d,n,w,padchar,curchar,groupchar,groupcol$

The next argument arg is output as a floating-point number in a fixed-format notation that is particularly well suited for outputting monetary values. Parameter d (default: 2) defines the number of digits to print after the decimal point. Parameter n (default: 1) defines the minimum number of digits to print before the decimal point. Parameter w (default: 0) is the minimum total width of the output.

First, padding and the sign are output. If arg is negative, then a minus sign is printed. If arg is not negative, then a plus sign is printed if and only if the @ modifier was specified. If the : modifier is used, the sign appears before any padding, and otherwise after the padding. If the number of characters, including the sign and a potential currency symbol is below width w, then character padchar (default: ' ') is used for padding the number in front of the integer part such that the overall output has w characters. After the padding, the currency symbol curchar is inserted, if available, followed by n digits representing the integer part of arg, prefixed by the right amount of '0' characters. If either parameter groupchar or groupcol is provided, the integer part is output in groups of groupcol characters (default: 3) separated by groupchar (default: ','). After the integer part, a decimal point is output followed by d digits of fraction, properly rounded.

If the magnitude of arg is so large that the integer part of arg cannot be output with at most n characters, then more characters are generated, as needed, and the total width might overrun as well.

For cases where a simple currency symbol is not sufficient, it is possible to use a numeric currency code as defined by ISO 4217 for parameter curchar. For positive codes, the shortest currency symbol is used. For negative currency codes, the corresponding alphabetic code (ignoring the sign) is used. Library (lispkit system) provides a convenient API to access currency codes.

By specifying the + modifier, it is possible to enable locale-specific output of the monetary value using the locale provided to format. In this case, also the currency associated with this locale is being used.

  (format "~$" 4930.351) ⟹ 4930.35
  (format "~3$" 4930.351) ⟹ 4930.351
  (format "~,6$" 4930.351) ⟹ 004930.35
  (format "~,6,12,'_$" 4930.351) ⟹ ___004930.35
  (format "~,6,12,'_@$" 4930.351) ⟹ __+004930.35
  (format "~,6,12,'_@:$" 4930.351) ⟹ +__004930.35
  (format "~,6,12,'_,'€$" 4930.351) ⟹ __€004930.35
  (format "~,6,12,'_,'€@$" 4930.351) ⟹ _+€004930.35
  (format "~,,,,,,3$" 4930.351) ⟹ 4,930.35
  (format "~,6,,,,,3$" 4930.351) ⟹ 004,930.35
  (format "~,,,,208$" 1234.567) ⟹ kr 1234.57
  (format "~,,,,-208$" 1234.567) ⟹ DKK 1234.57
  (format 'de_CH "~+$" 4930.351) ⟹ CHF 4930.35
  (format 'en_US "~,,,,,,3+$" 4930.351) ⟹ $4,930.35
  (format 'de_DE "~,6,14,'_,,,3+$" 4930.351) ⟹ __004.930,35 €

~% NEWLINE:  ~n%

This directive outputs n (default: 1) newline characters, thereby terminating the current output line and beginning a new one. No arguments are being consumed. Simply putting n newline escape characters \n into the control string would also work, but ~% is often used because it makes the control string look nicer and more consistent.
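As a small illustration of the equivalence mentioned above, the following sketch compares ~% with literal newline escapes. It assumes the newline character produced by ~% is the same one that \n denotes in a string literal:

```scheme
(import (lispkit base) (lispkit format))

; One newline between the two words.
(equal? (format "first~%second") "first\nsecond")   ; ⇒ #t

; The prefix parameter repeats the newline.
(equal? (format "a~2%b") "a\n\nb")                  ; ⇒ #t
```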

~& FRESHLINE:  ~n&

Unless it can be determined that the output is already at the beginning of a line, this directive outputs a newline if n > 0. This conditional newline is followed by n − 1 newlines, if n > 1. Nothing is output if n = 0.

~| PAGE SEPARATOR:  ~n|

This directive outputs n (default: 1) page separator characters #\page.

~~ TILDE:  ~n~

This directive outputs n (default: 1) tilde characters.

~p
~P
PLURAL:  ~P

Depending on the next argument arg, which is expected to be an integer value, a different string is output. If arg is not equal to 1, a lowercase s is output. If arg is equal to 1, nothing is output.

If the : modifier is provided, the previously processed argument is used for arg instead of the next one. This is useful after outputting a number using ~D. With the @ modifier, y is output if arg is 1, or ies if it is not.

  (format "~D tr~:@P/~D win~:P" 7 1) ⟹ 7 tries/1 win
  (format "~D tr~:@P/~D win~:P" 1 0) ⟹ 1 try/0 wins

~t
~T
TABULATE:  ~colnum,colincT

This directive will output sufficient spaces to move the cursor to column colnum (default: 1). If the cursor is already at or beyond column colnum, the directive will output spaces to move the cursor to column colnum + k × colinc for the smallest positive integer k possible, unless colinc (default: 1) is zero, in which case no spaces are output if the cursor is already at or beyond column colnum.

If modifier @ is provided, relative tabulation is performed. In this case, the directive outputs colnum spaces and then outputs the smallest non-negative number of additional spaces necessary to move the cursor to a column that is a multiple of colinc. For example, the directive ~3,8@T outputs three spaces and then moves the cursor to a "standard multiple-of-eight tab stop" if not at one already. If the current output column cannot be determined, however, then colinc is ignored, and exactly colnum spaces are output.

~* IGNORE ARGUMENT:  ~n*

The next n (default: 1) arguments are ignored. If the : modifier is provided, arguments are "ignored backwards", i.e. ~:* backs up in the list of arguments so that the argument last processed will be processed again. ~n:* backs up n arguments. When within a ~{ construct, the ignoring (in either direction) is relative to the list of arguments being processed by the iteration.

The form ~n@* is an "absolute goto" rather than a "relative goto": the directive goes to the n-th argument, where 0 means the first one. n defaults to 0 for this form, so ~@* goes back to the first argument. Directives after a ~n@* will take arguments in sequence beginning with the one gone to. When within a ~{ construct, the "goto" is relative to the list of arguments being processed by the iteration.
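The three forms of ~* can be sketched as follows. The expected results are derived from the rules above (skip forward, back up, absolute goto) rather than copied from library output:

```scheme
(import (lispkit base) (lispkit format))

; ~* skips the next argument (here: 2).
(format "~A ~*~A" 1 2 3)        ; ⇒ "1 3"

; ~:* backs up one argument, so "x" is output twice.
(format "~A and ~:*~A" "x")     ; ⇒ "x and x"

; ~@* jumps back to the first argument (index 0).
(format "~A ~A ~@*~A" 1 2)      ; ⇒ "1 2 1"
```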

~? INDIRECTION:  ~?

The next argument arg must be a string, and the one after it lst must be a sequence (e.g. an array). Both arguments are consumed by the directive. arg is processed as a format control string, with the elements of the list lst as the arguments. Once the recursive processing of the control string has been finished, then processing of the control string containing the ~? directive is resumed.

  (format "~? ~D" "[~A ~D]" '("Foo" 5) 7) ⟹ [Foo 5] 7
  (format "~? ~D" "[~A ~D]" '("Foo" 5 14) 7) ⟹ [Foo 5] 7

Note that in the second example, three arguments are supplied to the control string "[~A ~D]", but only two are processed and the third is therefore ignored.

With the @ modifier, only one argument is directly consumed. The argument must be a string. It is processed as part of the control string as if it had appeared in place of the ~@? directive, and any directives in the recursively processed control string may consume arguments of the control string containing the ~@? directive.

  (format "~@? ~D" "[~A ~D]" "Foo" 5 7) ⟹ [Foo 5] 7
  (format "~@? ~D" "[~A ~D]" "Foo" 5 14 7) ⟹ [Foo 5] 14

~(…~) CONVERSION:  ~(str~)

The contained control string str is processed, and what it produces is subject to a conversion. Without the + modifier, a case conversion is performed. ~( converts every uppercase character to the corresponding lowercase character, ~:( capitalizes all words, ~@( capitalizes just the first word and forces the rest to lowercase, and ~:@( converts every lowercase character to the corresponding uppercase character. In the following example, ~@( is used to cause the first word produced by ~R to be capitalized:

  (format "~@(~R~) error~:P" 0) ⟹ Zero errors
  (format "~@(~R~) error~:P" 1) ⟹ One error
  (format "~@(~R~) error~:P" 23) ⟹ Twenty-three errors
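For comparison, here is a sketch of all four case conversions applied to plain strings. The results follow from the descriptions of ~(, ~:(, ~@( and ~:@( above:

```scheme
(import (lispkit base) (lispkit format))

; No modifier: everything is converted to lowercase.
(format "~(~A~)" "Hello WORLD")     ; ⇒ "hello world"

; : modifier: every word is capitalized.
(format "~:(~A~)" "hello world")    ; ⇒ "Hello World"

; @ modifier: only the first word is capitalized, the rest is lowercase.
(format "~@(~A~)" "hello WORLD")    ; ⇒ "Hello world"

; :@ modifiers: everything is converted to uppercase.
(format "~:@(~A~)" "Hello World")   ; ⇒ "HELLO WORLD"
```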

If the + modifier is provided together with the : modifier, all characters corresponding to named XML entities are converted into named XML entities. If modifier @ is added, then only those characters are converted which conflict with XML syntax. The modifier combination +@ converts the output by stripping off all diacritics. Modifier + alone escapes characters such that the result can be used as a Scheme string literal.

  (format "~+:(~A~)" "© 2021–2023 TÜV")
    ⟹ &copy; 2021&ndash;2023 T&Uuml;V
  (format "~+:@(~A~)" "<a href=\"t.html\">© TÜV</a>")
    ⟹ &lt;a href=&quot;t.html&quot;&gt;© TÜV&lt;/a&gt;
  (format "~+@(~A~)" "épistèmê")
    ⟹ episteme
  (format "~+(~A~)" "Hello \"World\"\n")
    ⟹ Hello \"World\"\n

~[…~] CONDITIONAL:  ~[str0~;str1~;…~;strn~]

This is a set of control strings, called clauses, one of which is chosen and used. The clauses are separated by ~; and the construct is terminated by ~].

Without default:  From a conditional directive ~[str0~;str1~;…~;strn~], the arg-th clause is selected, where the first clause is number 0. If a prefix parameter is given as ~n[, then the parameter n is used instead of an argument. This is useful only if the parameter is specified by #, to dispatch on the number of arguments remaining to be processed. If arg or n is out of range, then no clause is selected and no error is signaled. After the selected alternative has been processed, the control string continues after the ~].

With default:  Whenever the directive has the form ~[str0~;str1~;…~:;default~], i.e. the last clause is separated via ~:;, then the conditional directive has a default clause which gets performed whenever no other clause could be selected.

Optional selector:  Whenever the directive has the form ~:[none~;some~] the none control string is chosen if arg is nil, otherwise the some control string is chosen.

Boolean selector:  Whenever the directive has the form ~+[false~;true~] the false control string is chosen if arg is the boolean value false, otherwise the true control string is chosen.

Selector test:  Whenever the directive has the form ~@[true~], the next argument arg is tested for being non-nil. If arg is not nil, then the argument is not used up by the ~@[ directive but remains as the next one to be processed, and the one clause true is processed. If arg is nil, then the argument is used up, and the clause is not processed. The clause therefore should normally use exactly one argument, and may expect it to be non-nil.
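
The following sketch exercises the indexed form, the default clause, and the boolean selector. It assumes the library is imported as (lispkit format); the helper names describe-selection and describe-flag are hypothetical. The results in the comments follow from the rules stated above.

```scheme
(import (lispkit base) (lispkit format))

;; Clause selection by index, with a default clause introduced by ~:;
(define (describe-selection n)
  (format "~[none~;one~;two~:;many~] selected" n))

(define s0 (describe-selection 0))   ; "none selected"
(define s1 (describe-selection 1))   ; "one selected"
(define s5 (describe-selection 5))   ; "many selected" (default clause)

;; Without a default clause, an out-of-range index selects nothing
(define empty (format "[~[a~;b~]]" 5))   ; "[]"

;; Boolean selector: the first clause is used for #f, the second otherwise
(define (describe-flag on?)
  (format "Logging is ~+[disabled~;enabled~]." on?))

(define off (describe-flag #f))   ; "Logging is disabled."
(define on  (describe-flag #t))   ; "Logging is enabled."
```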

~{…~} ITERATION:  ~n{str~}

The iteration directive is used to control how a sequence is output. Thus, the next argument arg should be a sequence which is used as a list of arguments as if for a recursive call to format. The string str is used repeatedly as the control string until all elements from arg are consumed. Each iteration can absorb as many elements of arg as it needs. For instance, if str uses up two arguments by itself, then two elements of arg will get used up each time around the loop. If before any iteration step the sequence is empty, then the iteration is terminated. Also, if a prefix parameter n is given, then there will be at most n repetitions of processing of str. Finally, the ~^ directive can be used to terminate the iteration prematurely. If the iteration is terminated before all the remaining arguments are consumed, then any arguments not processed by the iteration remain to be processed by any directives following the iteration construct.

  (format "Winners:~{ ~A~}." '("Fred" "Harry" "Jill"))
    ⟹ Winners: Fred Harry Jill.
  (format "Winners: ~{~#[~;~A~:;~A, ~]~}." '("Fred" "Harry" "Jill"))
    ⟹ Winners: Fred, Harry, Jill.
  (format "Pairs:~{ <~A,~S>~}." '("A" 1 "B" 2 "C" 3))
    ⟹ Pairs: <A,1> <B,2> <C,3>.

~:n,m{str~} is similar, but the argument should be a list of sublists. At each repetition step (capped by n), one sublist is used as the list of arguments for processing str with an iteration cap of m. On the next repetition, a new sublist is used, whether or not all elements of the last sublist had been processed.

  (format "Pairs:~:{ <~A,~S>~}." '(("A" 1) ("B" 2) ("C" 3)))
    ⟹ Pairs: <A,1> <B,2> <C,3>.

~@{str~} is similar to ~{str~}, but instead of using one argument that is a sequence, all the remaining arguments are used as the list of arguments for the iteration.

  (format "Pairs:~@{ <~A,~S>~}." "A" 1 "B" 2 "C" 3)
    ⟹ Pairs: <A,1> <B,2> <C,3>.

~:@{str~} combines the features of ~:{str~} and ~@{str~}. All the remaining arguments are used, and each one must be a sequence. On each iteration, the next argument is used as a list of arguments to str.

  (format "Pairs:~:@{ <~A,~S>~}." '("A" 1) '("B" 2) '("C" 3))
    ⟹ Pairs: <A,1> <B,2> <C,3>.

Terminating the repetition directive with ~:} instead of ~} forces str to be processed at least once, even if the initial sequence is empty. However, it will not override an explicit prefix parameter of zero. If str is empty, then an argument is used as str. It must be a string and precede any arguments processed by the iteration.
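
Two common uses of the iteration directive are sketched below: using ~^ to emit a separator only between elements, and using a prefix parameter to cap the number of repetitions. The sketch assumes the library is imported as (lispkit format); the variable names are illustrative, and the results in the comments are derived from the rules above.

```scheme
(import (lispkit base) (lispkit format))

(define names '("Fred" "Harry" "Jill"))

;; ~^ leaves the loop body as soon as no elements remain,
;; so the ", " separator is not emitted after the last element.
(define joined (format "~{~A~^, ~}" names))
;; "Fred, Harry, Jill"

;; The prefix parameter 2 limits the iteration to at most two steps.
(define top-two (format "Top two:~2{ ~A~}." names))
;; "Top two: Fred Harry."
```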

~<…~> JUSTIFICATION:   ~mincol,colinc,minpad,padchar,maxcol,elchar<str~>

This directive justifies the text produced by processing control string str within a field which is at least mincol columns wide (default: 0). str may be divided up into segments via directive ~;, in which case the spacing is evenly divided between the text segments.

With no modifiers, the leftmost text segment is left-justified in the field and the rightmost text segment is right-justified. If there is only one text element, it is right-justified. The : modifier causes spacing to be introduced before the first text segment. The @ modifier causes spacing to be added after the last text segment. The minpad parameter (default: 0) is the minimum number of padding characters to be output between each segment. Whenever padding is needed, the padding character padchar (default: ' ') is used. If the total width needed to satisfy the constraints is greater than mincol, then the width used is mincol + k × colinc for the smallest possible non-negative integer k with colinc defaulting to 1.

  (format "|~10,,,'.<foo~;bar~>|")  ⟹  |foo....bar|
  (format "|~10,,,'.:<foo~;bar~>|")  ⟹  |..foo..bar|
  (format "|~10,,,'.:@<foo~;bar~>|")  ⟹  |..foo.bar.|
  (format "|~10,,,'.<foobar~>|")  ⟹  |....foobar|
  (format "|~10,,,'.:<foobar~>|")  ⟹  |....foobar|
  (format "|~10,,,'.@<foobar~>|")  ⟹  |foobar....|
  (format "|~10,,,'.:@<foobar~>|")  ⟹  |..foobar..|

Note that str may include format directives. All the clauses in str are processed in order. It is the resulting pieces of text that are justified. The ~^ directive may be used to terminate processing of the clauses prematurely, in which case only the completely processed clauses are justified.
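
Because str may contain directives, justification can be used to lay out simple table rows. The sketch below left-justifies a label in an 8-column field (via the @ modifier) and right-justifies a number in a 6-column field. It assumes the library is imported as (lispkit format); the helper name table-row is hypothetical.

```scheme
(import (lispkit base) (lispkit format))

;; One table row: label padded on the right, count padded on the left.
(define (table-row label count)
  (format "|~8@<~A~>|~6<~D~>|" label count))

(define row1 (table-row "apple" 12))    ; "|apple   |    12|"
(define row2 (table-row "cherry" 7))    ; "|cherry  |     7|"
```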

If the first clause of a ~< directive is terminated with ~:; instead of ~;, then it is used in a special way. All of the clauses are processed, but the first one is not used in performing the spacing and padding. When the padded result has been determined, then, if it fits on the current line of output, it is output, and the text for the first clause is discarded. If, however, the padded text does not fit on the current line, then the text segment for the first clause is output before the padded text. The first clause ought to contain a newline (such as a ~% directive). The first clause is always processed, and so any arguments it refers to will be used. The decision is whether to use the resulting segment of text, not whether to process the first clause. If the ~:; has a prefix parameter n, then the padded text must fit on the current line with n character positions to spare to avoid outputting the first clause’s text.

For example, the control string in the following example can be used to print a list of items separated by commas without breaking items over line boundaries, beginning each line with ;;. The prefix parameter 1 in ~1:; accounts for the width of the comma that will follow the justified item if it is not the last element in the list, or the period if it is. If ~:; has a second prefix parameter, like below, then it is used as the width of the line, overriding the line width specified by format's linew parameter (default: 80).

  (format "~%;; ~{~<~%;; ~1,30:; ~S~>~^,~}.~%"
          '("first line" "second" "a long third line"
            "fourth" "fifth"))
    ⟹ 
         ;; "first line", "second",
         ;; "a long third line",
         ;; "fourth", "fifth".
         

If there is only one text segment str, parameter maxcol is provided, and the length of the output of str exceeds maxcol, then the output is truncated at width maxcol - 1 and the ellipsis character elchar (default: '…') is inserted at the end.

~^ UP AND OUT:  ~^

Continue:  The ~^ directive is an escape construct. If there are no more arguments remaining to be processed, then the immediately enclosing ~{ or ~< directive is terminated. If there is no such enclosing directive, then the entire formatting operation is terminated. In the case of ~<, the formatting is performed, but no more segments are processed before doing the justification. The ~^ directive should appear only at the beginning of a ~< clause, because it aborts the entire clause it appears in, as well as all following clauses. ~^ may appear anywhere in a ~{ construct.

  (format "Done.~^ ~D warning~:P.~^ ~D error~:P.")
    ⟹ Done.
  (format "Done.~^ ~D warning~:P.~^ ~D error~:P." 3)
    ⟹ Done. 3 warnings.
  (format "Done.~^ ~D warning~:P.~^ ~D error~:P." 1 5)
    ⟹ Done. 1 warning. 5 errors.

If the directive has the form ~n^, then termination occurs if n is zero. If the directive has the form ~n,m^, termination occurs if the value of n equals the value of m. If the directive has the form ~n,m,o^, termination occurs if n ≤ m ≤ o. Of course, this is useless if all the prefix parameters are literals. At least one of them should be a # or a v parameter.

Break:  If ~^ is used within a ~:{ directive, then it merely terminates the current iteration step because in the standard case, it tests for remaining arguments of the current step only and the next iteration step commences immediately. To terminate the entire iteration process, use ~:^. ~:^ may only be used if the directive it would terminate is ~:{ or ~:@{. The entire iteration process is terminated if and only if the sublist that is supplying the arguments for the current iteration step is the last sublist (in the case of terminating a ~:{ directive) or the last argument to that call to format (in the case of terminating a ~:@{ directive).

Note that while ~^ is equivalent to ~#^ in all circumstances, ~:^ is not equivalent to ~#:^ because the latter terminates the entire iteration if and only if no arguments remain for the current iteration step, as opposed to no arguments remaining for the entire iteration process.

  (format "~:{/~A~^ …~}"
          '(("hot" "dog") ("hamburger") ("ice" "cream") ("french" "fries")))
    ⟹ /hot …/hamburger/ice …/french …
  (format "~:{/~A~:^ …~}"
          '(("hot" "dog") ("hamburger") ("ice" "cream") ("french" "fries")))
    ⟹ /hot …/hamburger …/ice …/french
  (format "~:{/~A~#:^ …~}"
          '(("hot" "dog") ("hamburger") ("ice" "cream") ("french" "fries")))
    ⟹ /hot …/hamburger

~`…~‘ UNPACK:  ~`str~‘

This directive is used to format composite objects, such as rational numbers, complex numbers, colors, date-time objects, error objects, records, etc. Such objects get decomposed into a sequence of individual values which are formatted by the str control string.

The next argument arg can be any Scheme object. If there is a decomposition predefined for this type of objects, it is applied to arg and str is used to format the resulting sequence of values. If no decomposition is possible, str is output assuming there is one argument arg.

  (format "~S~:* = ~`(~S, ~S)~‘" 17/3)  ⟹  17/3 = (17, 3)
  (format "Bits =~`~*~{ ~D~}~‘" (bitset 1 2 7))  ⟹  Bits = 1 2 7
  (format "Color: ~`R=~F, G=~F, B=~F~‘" (color 0.3 1.0 0.74))
    ⟹ Color: R=0.3, G=1.0, B=0.74

Formatting configurations

A few formatting directives provided by procedure format require access to environment variables such as the locale, the width of tab characters, the length of lines, etc. Also the type-specific customization of the formatting of native and user-defined objects, e.g. via the ~S directive, is based on a formatting control registry defined by an environment variable.

All relevant environment variables are bundled together into format config objects. Format configurations are organized hierarchically. Each format configuration optionally refers to a parent configuration. It inherits all environment variables and allows their values to be overridden.

base-format-config constitutes the root of this format configuration hierarchy. Typically, changes to this object impact all invocations of format, unless format is called with a custom format config object which is not derived from base-format-config. Without a custom format config, format reads the environment variables from the parameter object current-format-config (which, by default, inherits from base-format-config). As with every other parameter object, a different config can be installed dynamically via parameterize.

Format config objects are also used in combination with type-specific formatting as provided by the ~S directive, as explained in the next section.

Type-specific formatting

Procedure format provides great means to format numbers, characters, strings, as well as sequences, i.e. lists and vectors. But as soon as values of data types encapsulating their state have to be output, only the default textual representation is supported, which is also used when a value is output via procedure write.

For this reason, procedure format supports the customization of how composite objects are formatted. The approach for doing this is simple: Internally, a composite object can be mapped ("unpacked") into a vector of "field values". These field values are then interpreted as arguments for an object type-specific control string which defines how the field values of such objects are formatted. If there is no object type-specific control string available, the object is output as if it was written via procedure write.

The following example shows how to customize the formatting of objects defined by a record type. The following record is used to model colored 2-dimensional points:

(define-record-type <point>
  (make-point x y c)
  point?
  (x point-x)
  (y point-y)
  (c point-color))

By default, objects of type <point> are output in the following way:

(define pt (make-point 7 13 (color 0.5 0.9 0)))
(format "~S" pt)
 ⇒ "#<record <point>: x=7, y=13, c=#<color 0.5 0.9 0.0>>"

LispKit defines a type tag for every type. This type tag will later be used to define a custom format for records of type <point>. We can retrieve the type tag for type <point> via procedure record-type-tag:

(define point-type-tag (record-type-tag <point>))

Now we can define a custom format for objects of type <point> in which we refer to the unpacked fields in the order defined in the <point> record type definition, following a fixnum value denoting the identity of the record. The following control string formats <point> records in this way: point{x=?,y=?,c=?}. Note that it skips the record identity via the ~* directive.

"point{x=~*~S,y=~S,c=~S}"

format refers to a number of environment variables via a formatting configuration (see previous section). The default configuration is defined by definition base-format-config and it includes custom type-specific formats. With procedure format-config-control-set! we can declare that all objects of type <point> should be formatted with the control string shown above:

(format-config-control-set!
  base-format-config
  point-type-tag
  "point{x=~*~S,y=~S,c=~S}")

Formatting records of type <point> via the ~S directive is now based on this new control string.

(format "~S" pt)
 ⇒ "point{x=7,y=13,c=#<color 0.5 0.9 0.0>}"

If we wanted to also change how colors are formatted, we could do that in a similar way:

(format-config-control-set!
  base-format-config
  color-type-tag
  "color{~S, ~S, ~S}")

Now colors are formatted differently:

(format "~S" pt)  ⇒ "point{x=7,y=13,c=#<color 0.5 0.9 0.0>}"
(format "~S" (color 1.0 0.3 0.7))  ⇒ "color{1.0, 0.3, 0.7}"

If we wanted to change how colors are formatted only in the context of formatting points, we could do that by creating a formatting configuration for colors and associating it only with the formatting control string for points. The following code first removes the global color format so that colors are formatted again using the default mechanism. Then it redefines the formatting control for points by also specifying a format configuration that is used while applying the point formatting control string.

(format-config-control-remove! base-format-config color-type-tag)
(format-config-control-set!
  base-format-config
  point-type-tag 
  "point{x=~*~S,y=~S,c=~S}"
  (format-config (list color-type-tag "color{~S, ~S, ~S}")))
(format "~S" (color 1.0 0.3 0.7))  ⇒ "#<color 1.0 0.3 0.7>"
(format "~S" pt)  ⇒ "point{x=7,y=13,c=color{0.5, 0.9, 0.0}}"
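
A type-specific format does not have to be installed globally. The sketch below registers a control string on a child configuration and activates it only for the dynamic extent of a parameterize form, combining make-format-config, format-config-control-set!, and current-format-config as documented on this page. The <size> record type and all variable names are hypothetical, and the import of (lispkit format) is an assumption.

```scheme
(import (lispkit base) (lispkit format))

;; Hypothetical record type used only for this illustration
(define-record-type <size>
  (make-size w h)
  size?
  (w size-w)
  (h size-h))

(define sz (make-size 640 480))

;; Child of base-format-config; base-format-config itself is not modified.
(define local-config (make-format-config base-format-config))

;; ~* skips the record identity, then both fields are printed via ~S.
(format-config-control-set!
  local-config
  (record-type-tag <size>)
  "~*~Sx~S")

;; Inside parameterize, ~S should pick up the custom control string
;; and produce "640x480".
(define scoped
  (parameterize ((current-format-config local-config))
    (format "~S" sz)))

;; Outside of it, the default record representation is used again.
(define unscoped (format "~S" sz))
```

Scoping the configuration this way keeps custom formats from leaking into unrelated calls of format, which is harder to guarantee when base-format-config is mutated directly.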

API

format-config-type-tag     [constant]

Symbol representing the format-config type. The type-for procedure of library (lispkit type) returns this symbol for all formatting configuration objects.

base-format-config     [procedure]

Formatting configurations can have parent configurations from which all formatting environment variables are being inherited. base-format-config is the root formatting configuration for repl-format-config and current-format-config.

repl-format-config     [procedure]

The formatting configuration that a read-eval-print loop might use for displaying the result of an evaluation. Initially, repl-format-config is set to an empty formatting configuration with parent base-format-config.

current-format-config     [parameter object]

Parameter object referring to the current formatting configuration that is used as a default whenever no specific formatting configuration is specified, e.g. by procedure format. Initially, current-format-config is set to an empty formatting configuration with parent base-format-config.

(format [port] [config] [locale] [tabw [linew]] cntrl arg ...)     [procedure]

format is the universal formatting procedure provided by library (lispkit format). format creates formatted output by outputting the characters of the control string cntrl while interpreting formatting directives embedded in cntrl. Each formatting directive is introduced by a tilde, which may be followed by formatting parameters and modifiers. The next character identifies the formatting directive and thus determines what output is being generated by the directive. Most directives use one or more arguments arg as input.

Formatting configuration config defines environment variables influencing the output of some formatting directives. If config is not provided, the formatting configuration from parameter object current-format-config is used. For convenience, some environment variables, such as locale, can be overridden if they are provided when format is being invoked. locale refers to a locale identifier like en_US that is used by locale-specific formatting directives. tabw defines the maximum number of space characters that correspond to a single tab character. linew specifies the number of characters per line; this is used by the justification directive only.

(format-config? obj)     [procedure]

Returns #t if obj is a formatting configuration; otherwise #f is returned.

(format-config [parent] [locale] [tabw [linew]] (tag cntrl [config]) ...)     [procedure]

Creates a new formatting configuration with parent as parent configuration. If parent is not provided explicitly, current-format-config is used. If parent is #f, the new formatting configuration will not have a parent configuration. locale refers to a locale identifier like en_US that is used by locale-specific formatting directives. tabw defines the maximum number of space characters that correspond to a single tab character. linew specifies the maximum number of characters per line.

(make-format-config parent)     [procedure]
(make-format-config parent locale)
(make-format-config parent locale tabw)
(make-format-config parent locale tabw linew)

Creates a new formatting configuration with parent as parent configuration. If parent is #f, the new formatting configuration does not have a parent configuration. The remaining arguments define overrides for the environment variables inherited from parent.

locale refers to a locale identifier like en_US that is used by locale-specific formatting directives. tabw defines the maximum number of space characters that correspond to a single tab character. linew specifies the maximum number of characters per line.

(copy-format-config config)     [procedure]
(copy-format-config config collapse?)

Returns a copy of formatting configuration config. If collapse? is omitted or set to #f, a 1:1 copy of config is made. If collapse? is set to #t, a new format config without parent configuration is created which contains the same values for the supported formatting environment variables as config.

(merge-format-config child parent)     [procedure]

Merges the format configurations child and parent by creating a new collapsed copy of child whose parent configuration is parent.

(format-config-locale)     [procedure]
(format-config-locale config)

Returns the locale defined by format configuration config. If config defines a locale itself, it is being returned. Otherwise, the locale of the parent configuration of config gets returned. If config is not provided, the default configuration current-format-config is used.

(format-config-locale-set! locale)     [procedure]
(format-config-locale-set! config locale)

Sets the locale of the format configuration config to locale. If locale is #f, the locale setting gets removed from config (but might still get inherited from config's parents). If config is not provided, the default configuration current-format-config gets mutated.

(format-config-tabwidth)     [procedure]
(format-config-tabwidth config)

Returns the width of a tab character in terms of space characters defined by format configuration config. If config defines a tab width itself, it is being returned. Otherwise, the tab width of the parent configuration of config gets returned. If config is not provided, the default configuration current-format-config is used.

(format-config-tabwidth-set! tabw)     [procedure]
(format-config-tabwidth-set! config tabw)

Sets the tab width of the format configuration config to tabw. If tabw is #f, the tab width setting gets removed from config (but might still get inherited from config's parents). If config is not provided, the default configuration current-format-config gets mutated. The "tab width" is the maximum number of space characters representing one tab character.

(format-config-linewidth)     [procedure]
(format-config-linewidth config)

Returns the maximum number of characters per line defined by format configuration config. If config defines a line width itself, it is being returned. Otherwise, the line width of the parent configuration of config gets returned. If config is not provided, the default configuration current-format-config is used.

(format-config-linewidth-set! linew)     [procedure]
(format-config-linewidth-set! config linew)

Sets the line width of the format configuration config to linew. If linew is #f, the line width setting gets removed from config (but might still get inherited from config's parents). If config is not provided, the default configuration current-format-config gets mutated. The "line width" is the maximum number of characters per line.
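
The inheritance rules for environment variables can be illustrated with the line width accessors. In this sketch a child configuration first inherits the value of its parent, then overrides it, and finally falls back to the inherited value once the override is removed with #f. The variable names are illustrative and the import of (lispkit format) is an assumption.

```scheme
(import (lispkit base) (lispkit format))

;; Parent configuration with an explicit line width
(define parent (make-format-config base-format-config))
(format-config-linewidth-set! parent 100)

;; The child defines no line width of its own, so it inherits 100.
(define child (make-format-config parent))
(define inherited (format-config-linewidth child))    ; 100

;; Override the value in the child only.
(format-config-linewidth-set! child 60)
(define overridden (format-config-linewidth child))   ; 60

;; Passing #f removes the override; the parent's value applies again.
(format-config-linewidth-set! child #f)
(define restored (format-config-linewidth child))     ; 100
```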

(format-config-control-set! tag cntrl)     [procedure]
(format-config-control-set! tag cntrl sconf)
(format-config-control-set! config tag cntrl)
(format-config-control-set! config tag cntrl sconf)

Declares for formatting configuration config that objects whose type has type tag tag are formatted with control string cntrl by formatting directive ~S. If formatting configuration sconf is provided, it is used as a type-specific configuration that is merged with the current configuration when ~S formats objects of type tag tag. If cntrl is #f, type-specific formatting rules for tag are removed from config (but might still be inherited from the parent of config). If cntrl is #t, native formatting is forced for tag, no matter what is inherited from the parent of config. If config is not provided, the default configuration current-format-config gets mutated.

(format-config-control-remove! tag)     [procedure]
(format-config-control-remove! config tag)

Removes any type-specific formatting with directive ~S for objects whose type has tag tag from formatting configuration config. If config is not provided, the default configuration current-format-config gets mutated.

(format-config-controls)     [procedure]
(format-config-controls config)

Returns a list of type tags, i.e. symbols, for which there is a type-specific formatting control string defined by formatting configuration config or its parents. If config is not provided, the default configuration current-format-config is used.

(format-config-parent)     [procedure]
(format-config-parent config)

Returns the parent configuration of format configuration config. If config is not provided, the default configuration current-format-config is used. format-config-parent returns #f if config does not have a parent formatting configuration.