Compound Strings and non-ASCII characters

Link to the main program: placemat.ps. And link to the stand-alone PostScript program which logs all the glyphs from the font specified in its line 15.

Links to documentation: ▶︎ Introduction, and a first placemat  ▶︎ Fonts and glass decoration  ▷︎ Compound Strings and non‑ASCII characters  ▶︎ Page‑level controls  ▶︎ Arrangement of glasses on the page  ▶︎ Non‑Glasses Pages  ▶︎ Document‑level controls  ▶︎ Type sizes  ▶︎ Translations  ▶︎ Code injection  ▶︎ Bitmap images  ▶︎ Debugging


Some desired text is more complicated than a plain ASCII string. PostScript is an early-1980s language, and one of the ways in which it shows its age is its lack of Unicode support. This document shows how to access non-ASCII characters, and introduces the more general concept of a compound string.

Strings, and compound strings

The basic unit of PostScript text is the string, which is delimited by round brackets.

(This is an unexciting PostScript string.)

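Strings can contain any of the ninety-five printable ASCII characters, that is, those numbered 32–126. Round brackets inside a string must either be balanced or be escaped with a backslash, as \( and \); a backslash itself is written \\.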

Other characters can be accessed by their name. E.g., /acircumflex is “â”; /dagger is “†”; /sterling is “£”; and /fi is the ligature “fi”, a single character which joins the ‘f’ and ‘i’ without an ugly collision between the tail of the former and the dot of the latter.
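For comparison, plain PostScript (outside the placemat software) paints a string with show and a glyph name with the Level 2 operator glyphshow. The following is a minimal stand-alone sketch. It assumes a font named /DejaVuSerif is installed. If it is not, Ghostscript substitutes another font and prints a warning.

```postscript
%!PS
% A string painted with show; glyph names painted with glyphshow (Level 2+).
% Assumes a font called /DejaVuSerif; Ghostscript substitutes one if absent.
/DejaVuSerif findfont 24 scalefont setfont
72 700 moveto
(Price: ) show               % printable ASCII: a plain string
/sterling glyphshow          % the pound sign, by glyph name
(5 ) show
/dagger glyphshow            % the dagger
( ) show
/fi glyphshow (ne) show      % the single "fi" ligature glyph, then "ne"
showpage
```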

These concepts have been unified and extended by the placemat software into a ‘compound string’. A compound string is any of:

  • A string;
  • A glyph name;
  • Code, in curly brackets {}, which can either change the graphics state or leave compound strings on the stack;
  • An array, delimited by square brackets [], containing other compound strings.
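A compound string can be rendered recursively. The following is a sketch written for this page, not the code in placemat.ps, and the procedure name ShowCompound is hypothetical. It uses xcheck to tell code {…} apart from a literal array […]. It uses mark and counttomark to collect anything the code leaves on the stack, then renders that in turn. Other types, such as bare numbers, are not handled.

```postscript
%!PS
% A hypothetical renderer for compound strings; not the code in placemat.ps.
/ShowCompound {                         % compound -> --
  dup type /stringtype eq {
    show                                % a string: paint it
  } {
    dup type /nametype eq {
      glyphshow                         % a glyph name such as /aacute
    } {
      dup xcheck {
        % Code {...}: run it, then render anything it left on the stack.
        mark exch exec
        counttomark array astore        % mark [leftovers]
        exch pop                        % [leftovers]
        { ShowCompound } forall
      } {
        { ShowCompound } forall         % an array [...]: render each element
      } ifelse
    } ifelse
  } ifelse
} bind def

/DejaVuSerif findfont 24 scalefont setfont
72 700 moveto [(C) /aacute (lem)] ShowCompound
72 660 moveto [(Quinta do ) {0.5 0 0 setrgbcolor} (Bom) /fi (m)] ShowCompound
0 setgray
72 620 moveto [ [/yen] ( ) {(Bom) /fi (m)} ] ShowCompound
showpage
```

The code in the second line changes the graphics state (the colour). The code in the third line leaves a compound string on the stack.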

Some examples:

| Compound string | Rendered as |
|---|---|
| /dagger | † |
| [/daggerdbl] | ‡ |
| /sterling | £ |
| /dollar | $ |
| ($) | $ |
| [ [/yen] ] | ¥ |
| [(C) /aacute (lem)] | Cálem |
| [(Roz) /egrave (s)] | Rozès |
| [(Po) /ccedilla (as)] | Poças |
| [(Quinta do Bom) /fi (m)] | Quinta do Bomfim |
| [(Croft Quinta da Ro) /ecircumflex (da S) /emacron (rikos)] | Croft Quinta da Roêda Sērikos |
| [(Tinta C) /atilde (o)] | Tinta Cão |
| [(Ch) /acircumflex (teau L) /eacute (oville-Barton)] | Château Léoville-Barton |
| [(Mo) /edieresis (t & Chandon)] | Moët & Chandon |
| (JDAW) | ‘JDAW’, not kerned |
| [(JDA) {-0.06 Kern} (W)] | ‘JDAW’, kerned |

These examples are not entirely idle. Five glasses fit very elegantly on one sheet of A4, so if there are four wines, the spare circle is labelled †. If there are eight wines on 2×A4, the two spares are † ‡. Seven wines on 2×A4 leave three spares, usually £ $ ¥.

The last two rows of the table show the effect of kerning. The code is in curly brackets {}, and the number is the horizontal movement as a proportion of the font size. Here the number is negative, pulling the W towards the A (font = /DejaVuSerif).
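The kerning arithmetic is simple. At 36 points, -0.06 Kern moves the current point 0.06 × 36 = 2.16 points to the left. Below is a minimal sketch of such a procedure. It assumes a hypothetical variable FontSize that holds the size given to scalefont. placemat.ps may well compute the size differently.

```postscript
%!PS
% Hypothetical Kern: shift the current point horizontally by a proportion of
% the font size. FontSize is an assumed variable, not a placemat.ps name.
/FontSize 36 def
/Kern { FontSize mul 0 rmoveto } bind def     % proportion -> --

/DejaVuSerif findfont FontSize scalefont setfont
72 700 moveto (JDAW) show                     % not kerned
72 650 moveto (JDA) show -0.06 Kern (W) show  % W starts 2.16 pt further left
showpage
```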

Some fonts have thousands of glyphs. Other fonts have fewer: not every glyph is present in every font. E.g., /emacron = ē is present in the fonts of the /TrebuchetMS family, but not in (my computer’s version of) the /Garamond family, nor in many other fonts. Indeed, /emacron is missing from so many fonts that the text on Croft’s website doesn’t use it consistently, even though the “ē” appears on the labels of Sērikos bottles. If a glyph is missing from a font then there should be a warning on the log page, but check your output anyway.
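For Type 1 and Type 42 fonts, the glyph names are the keys of the font's CharStrings dictionary, so a presence check can be sketched as below. HasGlyph is a hypothetical helper, not part of placemat.ps. Fonts without a CharStrings dictionary, such as composite (Type 0) fonts, simply answer false. When a requested font is missing, Ghostscript substitutes another, and the answer then describes the substitute.

```postscript
%!PS
% Hypothetical helper: does the current font define a glyph with this name?
% Handles fonts with a CharStrings dictionary (e.g. Type 1, Type 42);
% for other fonts it answers false.
/HasGlyph {                               % name -> bool
  currentfont dup /CharStrings known {
    /CharStrings get exch known
  } {
    pop pop false
  } ifelse
} bind def

/TrebuchetMS findfont 12 scalefont setfont
(/emacron in TrebuchetMS: ) print /emacron HasGlyph =
/Garamond findfont 12 scalefont setfont
(/emacron in Garamond: ) print /emacron HasGlyph =
```

This only tests the name. A font can define a name whose glyph is not a proper one, so the output still needs checking by eye.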

What follows is a selection of other glyphs that might be useful to users of the placemat software.

Glyph names

Punctuation:     ‘ /quoteleft     ’ /quoteright     “ /quotedblleft     ” /quotedblright     … /ellipsis     – /endash     — /emdash     ‹ /guilsinglleft     › /guilsinglright     ‚ /quotesinglbase     „ /quotedblbase

Ligatures and letters:     fi /fi     fl /fl     æ /ae     Æ /AE     œ /oe     Œ /OE     ß /germandbls    

Symbols:     †︎ /dagger     ‡︎ /daggerdbl     ◊︎ /lozenge     •︎ /bullet     ·︎ /periodcentered     §︎ /section     ©︎ /copyright     ®︎ /registered     ™︎ /trademark     ♠︎ /spade     ♥︎ /heart     ♦︎ /diamond     ♣︎ /club

Currencies, fractions, maths:     £︎ /sterling     €︎ /Euro     ¥︎ /yen     ₩︎ /won     ¢︎ /cent     ½︎ /onehalf     ¼︎ /onequarter     ¾︎ /threequarters     ⅛︎ /oneeighth     ⅜︎ /threeeighths     ⅔︎ /twothirds     ≈︎ /approxequal     ≥︎ /greaterequal     ≤︎ /lessequal     ×︎ /multiply     ÷︎ /divide

Greeks:     α /alpha     β /beta     γ /gamma     δ /delta     ε /epsilon     ζ /zeta     η /eta     θ /theta     ι /iota     κ /kappa     λ /lambda     μ /mu     ν /nu     ξ /xi     ο /omicron     π /pi     ρ /rho     σ /sigma     τ /tau     υ /upsilon     φ /phi     χ /chi     ψ /psi     ω /omega     Α /Alpha     …     Ω /Omega

Arrows:     →︎ /arrowright     ←︎ /arrowleft     ↑︎ /arrowup     ↓︎ /arrowdown     ↔︎ /arrowboth     ⇒︎ /arrowdblright     ⇐︎ /arrowdblleft     ⇑︎ /arrowdblup     ⇓︎ /arrowdbldown     ⇔︎ /arrowdblboth     ►︎ /triagrt     ◄︎ /triaglf     ▲︎ /triagup     ▼︎ /triagdn

Accents and diacritics

Some wines, and some people, have accents in their names.

In PostScript the name of an accented character consists of a base letter and a diacritic: the name of the glyph is the former followed by the latter. For example, ã = /atilde and Ã = /Atilde. Observe the case sensitivity.
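A glyph name is just the base letter followed by the diacritic's name, so such names can also be built programmatically. The following is a minimal sketch. Accented is a hypothetical helper, and the font is again assumed to be /DejaVuSerif.

```postscript
%!PS
% Hypothetical helper: build a glyph name from a base letter and a diacritic,
% e.g. (a) (tilde) -> /atilde.
/Accented {                               % base diacritic -> name
  1 index length 1 index length add string
  dup 0 4 index putinterval               % copy the base to the start
  dup 3 index length 3 index putinterval  % append the diacritic
  cvn exch pop exch pop                   % drop the inputs, keep the name
} bind def

/DejaVuSerif findfont 24 scalefont setfont
72 700 moveto
(a) (tilde) Accented glyphshow            % lower-case a with tilde
(A) (tilde) Accented glyphshow            % capital A with tilde: case matters
showpage
```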

The table shows the names of the diacritics and, for the font /TimesNewRomanPS-BoldMT, the base characters with which each is available.

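| Diacritic | a | A | e | E | i | I | o | O | u | U | y | Y | c | C | g | G | h | H | l | L | n | N | r | R | s | S | t | T | w | W | z | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| acute | á | Á | é | É | í | Í | ó | Ó | ú | Ú | ý | Ý | ć | Ć |  |  |  |  | ĺ | Ĺ | ń | Ń | ŕ | Ŕ | ś | Ś |  |  |  |  | ź | Ź |
| circumflex | â | Â | ê | Ê | î | Î | ô | Ô | û | Û | ŷ | Ŷ | ĉ | Ĉ | ĝ | Ĝ | ĥ | Ĥ |  |  |  |  |  |  | ŝ | Ŝ |  |  | ŵ | Ŵ |  |  |
| grave | à | À | è | È | ì | Ì | ò | Ò | ù | Ù |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| dieresis | ä | Ä | ë | Ë | ï | Ï | ö | Ö | ü | Ü | ÿ | Ÿ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| tilde | ã | Ã |  |  | ĩ | Ĩ | õ | Õ | ũ | Ũ |  |  |  |  |  |  |  |  |  |  | ñ | Ñ |  |  |  |  |  |  |  |  |  |  |
| breve | ă | Ă | ĕ | Ĕ | ĭ | Ĭ | ŏ | Ŏ | ŭ | Ŭ |  |  |  |  | ğ | Ğ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| macron | ā | Ā | ē | Ē | ī | Ī | ō | Ō | ū | Ū |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| ogonek | ą | Ą | ę | Ę | į | Į |  |  | ų | Ų |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| caron |  |  | ě | Ě |  |  |  |  |  |  |  |  | č | Č | ǧ | Ǧ |  |  | ľ | Ľ | ň | Ň | ř | Ř | š | Š | ť | Ť |  |  | ž | Ž |
| cedilla |  |  |  |  |  |  |  |  |  |  |  |  | ç | Ç | ģ | Ģ |  |  | ļ | Ļ | ņ | Ņ | ŗ | Ŗ | ş | Ş |  |  |  |  |  |  |
| dot |  |  | ė | Ė |  | İ |  |  |  |  |  |  |  |  |  |  |  |  | ŀ | Ŀ |  |  |  |  |  |  |  |  |  |  |  |  |
| dotaccent |  |  |  |  |  |  |  |  |  |  |  |  | ċ | Ċ | ġ | Ġ |  |  |  |  |  |  |  |  |  |  |  |  |  |  | ż | Ż |
| hungarumlaut |  |  |  |  |  |  | ő | Ő | ű | Ű |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| slash |  |  |  |  |  |  | ø | Ø |  |  |  |  |  |  |  |  |  |  | ł | Ł |  |  |  |  |  |  |  |  |  |  |  |  |
| ring | å | Å |  |  |  |  |  |  | ů | Ů |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| bar |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | ħ | Ħ |  |  |  |  |  |  |  |  | ŧ | Ŧ |  |  |  |  |
| commaaccent |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | ş | Ş | ţ | Ţ |  |  |  |  |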

More accents and diacritics:     ǽ /aeacute     Ǽ /AEacute     ǻ /aringacute     Ǻ /Aringacute     ď /dcaron     Ď /Dcaron     đ /dcroat     Đ /Dcroat     ĵ /jcircumflex     Ĵ /Jcircumflex     ķ /kcedilla     Ķ /Kcedilla     ʼn /napostrophe

More extended Latin:     ı /dotlessi     þ /thorn     Þ /Thorn     ð /eth     Ð /Eth     ŋ /eng     Ŋ /Eng     ĸ /kgreenlandic

Unicode

Some fonts allow access to some characters by their Unicode code point, written in hexadecimal. Typically four hex digits are prefixed with “/uni”, and five hex digits with “/u”. E.g., /uni1D00 = “ᴀ” = small‑caps A; /u1D538 = “𝔸” = double‑struck A.
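The naming convention can be sketched as a small procedure. UniName is a hypothetical helper that follows the two cases described above. Code points below U+10000 get “uni” plus exactly four zero-padded hex digits; larger ones get “u” plus their digits. Whether a given font actually defines the resulting name still has to be checked. The result can be passed to glyphshow.

```postscript
%!PS
% Hypothetical helper: Unicode code point -> glyph name.
% Below U+10000: /uni followed by exactly four hex digits (zero-padded);
% otherwise: /u followed by the five or six hex digits.
/UniName {                              % codepoint -> name
  dup 16#10000 lt {
    16#10000 add 16 8 string cvrs       % e.g. (100E9): the leading 1 keeps zeros
    7 string                            % room for (uniXXXX)
    dup 0 (uni) putinterval
    dup 3 3 index 1 4 getinterval putinterval
    exch pop
  } {
    16 8 string cvrs                    % e.g. (1D538)
    dup length 1 add string             % room for the "u" prefix
    dup 0 (u) putinterval
    dup 1 3 index putinterval
    exch pop
  } ifelse
  cvn
} bind def

16#1D00 UniName ==                      % prints /uni1D00 (small-caps A)
16#1D538 UniName ==                     % prints /u1D538 (double-struck A)
16#E9 UniName ==                        % prints /uni00E9 (padded to four digits)
```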

Lists and errors

Not all fonts have all glyphs, and not all fonts allow Unicode-style naming. Indeed, few fonts have most glyphs. So the usual advice applies: check the output carefully.
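One way to see which glyph names a font defines is to walk its CharStrings dictionary. This is a minimal sketch for fonts that have one, such as Type 1 and Type 42 fonts. It is not the stand-alone program linked from this page. ListGlyphs is a hypothetical name, and the names are printed in no particular order.

```postscript
%!PS
% Sketch: print to the log the name of every glyph a font defines.
% Only fonts with a CharStrings dictionary (e.g. Type 1, Type 42) are listed.
/ListGlyphs {                             % fontname -> --
  findfont dup /CharStrings known {
    /CharStrings get
    { pop == } forall                     % keys are glyph names; drop values
  } {
    pop (No CharStrings dictionary in this font) =
  } ifelse
} bind def

/TimesNewRomanPS-BoldMT ListGlyphs
```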

Attempts to use a non-existent glyph are logged, but some fonts have the glyph name without a proper glyph. E.g., several glyphs in Harrington paint as a missing-glyph mark (image: “Missing glyph in Harrington font”). So, as an assist, there is a stand-alone PostScript program which logs all the glyphs from the font specified in its line 15, and shows them in a simple PDF. Some examples follow.