\input texinfo.tex @c -*- texinfo -*-
@c %**start of header (This is for running Texinfo on a region.)
@settitle GNU Smalltalk User's Guide
@setchapternewpage odd
@c %**end of header (This is for running Texinfo on a region.)
@c ******************************************* Values and macros *********
@include vers-gst.texi
@end ifclear
@macro bulletize{a}
@end macro
@ifinfo
@set SMILE ;-)
@end ifinfo
@ifnotinfo
@set SMILE
@end ifnotinfo
@c Layout preferred over @uref's
@macro hlink{url, link}
\link\@footnote{\link\, \url\}
@end macro
@macro mailto{mail}
@end macro
@ifhtml
@unmacro hlink
@unmacro mailto
@macro hlink{url, link}
@uref{\url\, \link\}
@end macro
@macro mailto{mail}
@uref{mailto:\mail\, , \mail\}
@end macro
@macro url{url}
@end macro
@end ifhtml
@macro gst{}
@sc{gnu} Smalltalk
@end macro
@macro gnu{}
@sc{gnu}
@end macro
@dircategory Software development
@direntry
* Smalltalk: (gst). The @gst{} user's guide.
@end direntry
@copying
@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled
``GNU Free Documentation License''.
@end quotation
@end copying
@titlepage
@title @sc{gnu} Smalltalk User's Guide
@subtitle Version @value{VERSION}
@subtitle @value{UPDATE-MONTH}
@author by Steven B. Byrne, Paolo Bonzini, Andy Valencia.
@comment The following two commands start the copyright page.
@page
@vskip 0pt plus 1filll
@end titlepage
@node Top, , , (DIR)
@ifnottex
This document describes installing and operating the @gst{}
programming environment.
@end ifnottex
@menu
* Overview:: What @gst{} is.
* Using GNU Smalltalk:: Running @gst{}.
* Features:: A description of @gst{}'s special features.
* Packages:: An easy way to install Smalltalk code into an image.
* Emacs:: @gst{} and Emacs.
* C and Smalltalk:: @gst{}'s C/Smalltalk interoperability features.
* Tutorial:: An introduction to Smalltalk and OOP.
@detailmenu
--- The detailed node listing ---
Using GNU Smalltalk:
* Invocation:: What you can specify on the command line.
* Operation:: A step-by-step description of the
startup process and a short description of
how to interact with @gst{}.
* Syntax:: A description of the input file syntax
* Test suite:: How to run the test suite system.
* Legal concerns:: Licensing of GNU Smalltalk
Operation:
* Command-line processing:: Picking an image path and a kernel path.
* Loading or creating an image:: Loading an image or creating a new one.
* Starting the system:: After the image is created or restored.
Legal concerns:
* GPL:: Complying with the GNU GPL.
* LGPL:: Complying with the GNU LGPL.
Features:
* Extended streams:: Extensions to streams, and generators
* Regular expressions:: String matching extensions
* Namespaces:: Avoiding clashes between class names.
* Disk file-IO:: Methods for reading and writing disk files.
* Object dumping:: Methods that read and write objects in binary format.
* Dynamic loading:: Picking external libraries and modules at run-time.
* Documentation:: Automatic documentation generation.
* Memory access:: The direct memory accessing classes and methods, plus
broadcasts from the virtual machine.
* GC:: The @gst{} memory manager.
* Security:: Sandboxing and access control.
* Special objects:: Methods to assign particular properties to objects.
Packages:
* GTK and VisualGST: GUI.
* Parser, STInST, Compiler: Smalltalk-in-Smalltalk.
* DBI: Database.
* I18N: Locales.
* Seaside: Seaside.
* Swazoo: Swazoo.
* SUnit: SUnit.
* Sockets, WebServer, NetClients: Network support.
* XML, XPath, XSL: XML.
* Other packages: Other packages.
Emacs:
* Editing:: Autoindent and more for @gst{}.
* Interactor:: Smalltalk interactor mode.
C and Smalltalk:
* External modules:: Linking your libraries to the virtual machine
* C callout:: Calls from Smalltalk to C
* C data types:: Manipulating C data from Smalltalk
* Smalltalk types:: Manipulating Smalltalk data from C
* Smalltalk callin:: Calls from C to Smalltalk
* Object representation:: Manipulating your own Smalltalk objects
* Incubator:: Protecting newly created objects from garbage
* Other C functions:: Handling and creating OOPs
* Using Smalltalk:: The Smalltalk environment as an extension library
Tutorial:
* Getting started:: Starting to explore @gst{}
* Some classes:: Using some of the Smalltalk classes
* The hierarchy:: The Smalltalk class hierarchy
* Creating classes:: Creating a new class of objects
* Creating subclasses:: Adding subclasses to another class
* Code blocks (I):: Control structures in Smalltalk
* Code blocks (II):: Guess what? More control structures
* Debugging:: Things go bad in Smalltalk too!
* More subclassing:: Coexisting in the class hierarchy
* Streams:: Something really powerful
* Exception handling:: More sophisticated error handling
* Behind the scenes:: Some nice stuff from the Smalltalk innards
* And now:: Some final words
* The syntax:: For the most die-hard computer scientists
@end detailmenu
@end menu
@node Overview
@unnumbered Introduction
@gst{} is an implementation that closely follows the Smalltalk-80
language as described in the book @cite{Smalltalk-80: the Language
and its Implementation} by Adele Goldberg and David Robson, which
will hereinafter be referred to as @cite{the Blue Book}.
The Smalltalk programming language is an object-oriented programming
language. This means, for one thing, that when programming you are
thinking of not only the data that an object contains, but also of the
operations available on that object. The object's data representation
capabilities and the operations available on the object are
``inseparable''; the set of things that you can do with an object is defined
precisely by the set of operations, which Smalltalk calls @dfn{methods},
that are available for that object: each object belongs to a @dfn{class}
(a datatype and the set of functions that operate on it) or, better, it
is an @dfn{instance} of that class. You cannot even examine the
contents of an object from the outside---to an outsider, the object is a
black box that has some state and some operations available, but that's
all you know: when you want to perform an operation on an object, you
can only send it a @dfn{message}, and the object selects the method
that corresponds to that message.
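This is a minimal illustration that is not taken from elsewhere in this manual. Each line below can be typed at the @samp{st>} prompt, and each one sends a message to an object. The first sends the unary message @code{class}. The second sends @code{asUppercase} to a string. The third sends the binary message @code{+}. The fourth sends the keyword message @code{between:and:}.
@example
"Ask an integer for its class"
3 class
"A unary message sent to a string"
'hello' asUppercase
"A binary message"
3 + 4
"A keyword message with two arguments"
3 between: 1 and: 5
@end example
At the interactive prompt the system should answer @code{SmallInteger}, @code{'HELLO'}, @code{7} and @code{true} respectively.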
In the Smalltalk language, everything is an object. This includes not
only numbers and all data structures, but even classes, methods,
pieces of code within a method (@dfn{blocks} or @dfn{closures}), stack
frames (@dfn{contexts}), etc. Even @code{if} and @code{while} structures
are implemented as methods sent to particular objects.
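As a small sketch, the variable names below are arbitrary. @code{ifTrue:ifFalse:} is a message sent to a Boolean with two blocks as arguments. @code{whileTrue:} is sent to a block, and @code{to:do:} is sent to an Integer.
@example
x := 7.
"The Boolean answered by x > 5 chooses which block to evaluate"
(x > 5) ifTrue: ['big' displayNl] ifFalse: ['small' displayNl].
"whileTrue: evaluates the receiver block until it answers false"
i := 1.
[i <= 3] whileTrue: [i printNl. i := i + 1].
"to:do: is also just a message, sent to the Integer 1"
1 to: 3 do: [:k | k printNl].
@end example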
Unlike other Smalltalks (including Smalltalk-80), @gst{} emphasizes
Smalltalk's rapid prototyping features rather than the graphical and
easy-to-use nature of the programming environment (did you know that the
first GUIs ever ran under Smalltalk?). The availability of a large body of
system classes, once you master them, makes it pretty easy to write complex
programs that are usually a task for the so-called @dfn{scripting
languages}. Therefore, even though we have a @sc{gui} environment based
on GTK (@pxref{GUI, , GTK and VisualGST}), the goal of the @gst{} project is
currently to produce a complete system to be used to write your scripts in a
clear, aesthetically pleasing, and philosophically appealing programming
language.
An example of what can be obtained with Smalltalk in this novel way can be
found in @ref{Top, , Class reference, gst-libs, the @gst{} Library
Reference}. That part of the manual is entirely generated by a Smalltalk
program, starting from the source code for the class libraries
distributed together with the system.
@node Using GNU Smalltalk
@chapter Using @gst{}
@menu
* Invocation:: What you can specify on the command line.
* Operation:: A step-by-step description of the
startup process and a short description of
how to interact with @gst{}.
* Syntax:: A description of the input file syntax
* Test suite:: How to run the test suite system.
* Legal concerns:: Licensing of GNU Smalltalk
@end menu
@node Invocation
@section Command line arguments
The @gst{} virtual machine may be invoked via the following command:
@example
gst [ flags @dots{} ] [ file @dots{} ]
@end example
When you invoke @gst{}, it will ensure that the binary image file
(called @file{}) is up to date; if not, it will build a new one as
described in @ref{Loading or creating an image,, Loading an image or
creating a new one}. Your first invocation should look something like
this:
@display
"Global garbage collection... done"
@gst{} ready
@end display
If you specify one or more @var{file}s, they will be read and executed
in order, and Smalltalk will exit when end of file is reached. If you
don't specify @var{file}, @gst{} reads standard input, issuing a
@samp{st>} prompt if the standard input is a terminal. You may specify
@option{-} for the name of a file to invoke an explicit read from
standard input.
@cindex saving
@cindex quitting
@cindex exiting
@findex quit
@findex snapshot
To exit while at the @samp{st>} prompt, use @kbd{Ctrl-d}, or type
@kbd{ObjectMemory quit} followed by @key{RET}. Use @kbd{ObjectMemory
snapshot} first to save a new image that you can reload later, if you
wish.
As is standard for @acronym{GNU}-style options, specifying @option{--}
stops the interpretation of options so that every argument that follows
is considered a file name even if it begins with a @samp{-}.
You can specify both short and long flags; for example, @option{--version}
is exactly the same as @option{-v}, but is easier to remember. Short
flags may be specified one at a time, or in a group. A short flag or a
group of short flags always starts off with a single dash to indicate
that what follows is a flag or set of flags instead of a file name; a
long flag starts off with two consecutive dashes, without spaces between
them.
In the current implementation the flags can be intermixed with file
names, but their effect is as if they were all specified first. The
various flags are interpreted as follows:
@table @option
@item -a
@itemx --smalltalk-args
@findex arguments
Treat all options afterward as arguments to be given to Smalltalk code
retrievable with @code{Smalltalk arguments}, ignoring them as arguments
to @gst{} itself.
@multitable {@option{--verbose -aq -c}} {Options seen by @sc{gnu} Smalltalk} {@code{Smalltalk arguments}}
@item command line
@tab Options seen by @gst{}
@tab @code{Smalltalk arguments}
@item (empty)
@tab (none)
@tab @code{#()}
@item @option{-Via foo bar}
@tab @option{-Vi}
@tab @code{#('foo' 'bar')}
@item @option{-Vai test}
@tab @option{-Vi}
@tab @code{#('test')}
@item @option{-Vaq}
@tab @option{-Vq}
@tab @code{#()}
@item @option{--verbose -aq -c}
@tab @option{--verbose -q}
@tab @code{#('-c')}
@end multitable
@item -c
@itemx --core-dump
When a fatal signal occurs, produce a core dump before terminating.
Without this option, only a backtrace is provided.
@item -D
@itemx --declaration-trace
Print the class name, the method name, and the byte codes that the
compiler generates as it compiles methods. Only applies to files that
are named explicitly on the command line, unless the flag is given
multiple times on the command line.
@item -E
@itemx --execution-trace
Print the byte codes being executed as the interpreter operates. Only
works for statements explicitly issued by the user (either interactively
or from files given on the command line), unless the flag is given
multiple times on the command line.
@ignore
This option is disabled when the dynamic
translator (@pxref{Dynamic translator}) is enabled.
@end ignore
@item --kernel-directory
Specify the directory from which the kernel source files will be loaded.
This is used mostly while compiling @gst{} itself. Smalltalk code can
retrieve this information with @code{Directory kernel}.
@item --no-user-files
Don't load any files from @file{~/.st/} (@pxref{Loading or creating an
image,, Loading an image or creating a new one}).@footnote{The directory
would be called @file{_st/} under MS-DOS. Under OSes that don't use
home directories, it would be looked for in the current directory.}
This is used mostly while compiling @gst{} itself, to ensure that the
installed image is built only from files in the source tree.
@item -K @var{file}
@itemx --kernel-file @var{file}
Load @var{file} in the usual way, but look for it relative to the kernel
directory's parent directory, which is usually
@file{/usr/local/share/smalltalk/}. See @option{--kernel-directory} above.
@cindex shell scripts
@item -f
@itemx --file
The following two command lines are equivalent:
@example
gst -f @var{file} @file{args...}
gst -q @var{file} -a @file{args...}
@end example
This is meant to be used in the so-called ``sharp-bang'' sequence at the
beginning of a file, as in
@example
#! /usr/bin/gst -f
@r{@i{@dots{} Smalltalk source code @dots{}}}
@end example
@gst{} treats the first line as a comment, and the @option{-f} option
ensures that the arguments are passed properly to the script. Use this
instead to avoid hard-coding the path to @command{gst}:@footnote{The
words in the shell command @command{exec} are all quoted, so GNU
Smalltalk parses them as five separate comments.}
@example
#! /bin/sh
"exec" "gst" "-f" "$0" "$@@"
@r{@i{@dots{} Smalltalk source code @dots{}}}
@end example
@item -g
@itemx --no-gc-messages
Suppress garbage collection messages.
@item -h
@itemx --help
Print out a brief summary of the command line syntax of @gst{},
including the definitions of all of the option flags, and then exit.
@item -i
@itemx --rebuild-image
Always build and save a new image file; see @ref{Loading or creating an
image,, Loading an image or creating a new one}.
@item --maybe-rebuild-image
Perform the image checks and rebuild as described in @ref{Loading or
creating an image,, Loading an image or creating a new one}. This is
the default when @option{-I} is not given.
@cindex image path
@item -I @var{file}
@itemx --image-file @var{file}
Use the image file named @var{file} as the image file to load instead of
the default location, and set @var{file}'s directory part as the image
path. This option completely bypasses checking the file dates on the
kernel files; use @option{--maybe-rebuild-image} to restore the usual
behavior, writing the newly built image to @var{file} if needed.
@item -q
@itemx --quiet
@itemx --silent
Suppress the printing of answered values from top-level expressions
while @gst{} runs.
@item -r
@itemx --regression-test
This is used by the regression testing system and is probably not of
interest to the general user. It controls the printing of certain
messages.
@item -S
@itemx --snapshot
Save the image after loading files from the command line. Of course
this ``snapshot'' is not saved if you include @option{-} (stdin) on the command
line and exit by typing @kbd{Ctrl-c}.
@item -v
@itemx --version
Print out the @gst{} version number, then exit.
@item -V
@itemx --verbose
Print various diagnostic messages while executing (the name of each file
as it's loaded, plus messages about the beginning of execution or how
many byte codes were executed).
@end table
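As an illustration of @option{-a} and @option{-f}, a script like the following prints each argument it receives through @code{Smalltalk arguments}. The file name @file{args.st} is only an example. Running it as @kbd{gst -f args.st foo bar}, or equivalently as @kbd{gst -q args.st -a foo bar}, should list @code{foo} and @code{bar}, each preceded by its index.
@example
"args.st: print every argument passed to the script, with its index"
Smalltalk arguments keysAndValuesDo: [:index :arg |
    (index printString, ': ', arg) displayNl].
@end example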
@node Operation
@section Startup sequence
@strong{Caveat}: @emph{The startup sequence is pretty complicated. If
you are not interested in its customization, you can skip the first two
sections below. These two sections also don't apply when using the
command-line option @option{-I}, unless also using
@option{--maybe-rebuild-image}.}
You can abort @gst{} at any time during this procedure with @kbd{Ctrl-c}.
@menu
* Command-line processing:: Picking an image path and a kernel path.
* Loading or creating an image:: Loading an image or creating a new one.
* Starting the system:: After the image is created or restored.
@end menu
@node Command-line processing
@subsection Picking an image path and a kernel path
@cindex image path
When @gst{} is invoked, it first chooses two paths, the ``image path''
and the ``kernel path''. The image path is set by considering these
paths in succession:
@itemize @bullet
@item the directory part of the @option{--image-file} option if it is
given;
@item the value of the @env{SMALLTALK_IMAGE} environment variable
if it is defined and readable; this step will disappear in a future
release;
@item the path compiled in the binary (usually, under Unix systems,
@file{/usr/local/var/lib/smalltalk} or a similar path under @file{/var})
if it exists and it is readable;
@item the current directory. The current directory is also used if
the image has to be rebuilt but you cannot write to a directory
chosen according to the previous criteria.
@end itemize
@cindex kernel path
The ``kernel path'' is the directory in which to look for Smalltalk code
compiled into the base image. The possibilities in this case are:
@itemize @bullet
@item the argument to the @option{--kernel-directory} option if it is given;
@item the value of the @env{SMALLTALK_KERNEL} environment variable
if it is defined and readable; this step will disappear in a future
release;
@item the path compiled in the binary (usually, under Unix systems,
@file{/usr/local/share/smalltalk/kernel} or a similar data file path)
if it exists and it is readable;
@item a subdirectory named @file{kernel} of the image path.
@end itemize
@node Loading or creating an image
@subsection Loading an image or creating a new one
@cindex compatible images
@cindex images, loading
@gst{} can load images created on any system with the same pointer size
as its host system by approximately the same version of @gst{}, even if
they have different endianness. For example, images created on 32-bit
PowerPC can be loaded with a 32-bit x86 @command{gst} @acronym{VM},
provided that the @gst{} versions are similar enough. Such images are
called @dfn{compatible images}. It cannot load images created on
systems with different pointer sizes; for example, our x86 @command{gst}
cannot load an image created on x86-64.
Unless the @option{-i} flag is used, @gst{} first tries to load the file
named by @option{--image-file}, defaulting to @file{} in the image
path. If this is found, @gst{} ensures the image is ``not stale'',
meaning its write date is newer than the write dates of all of the
kernel method definition files. It also ensures that the image is
``compatible'', as described above. If both tests pass, @gst{} loads
the image and continues with @ref{Starting the system,, After the image
is created or restored}.
If that fails, a new image has to be created. The image path may now be
changed to the current directory if the previous choice is not
writable.
@cindex kernel, loading
To build an image, @gst{} loads the set of files that make up the
kernel, one at a time. The list can be found in @file{libgst/lib.c}, in
the @code{standard_files} variable. You can override kernel files by
placing your own copies in @file{~/.st/kernel/}.@footnote{The directory
is called @file{_st/kernel} under MS-DOS. Under OSes that don't use
home directories, it is looked for in the current directory.} For
example, if you create a file @file{~/.st/kernel/}, it will
be loaded instead of the @file{} in the kernel path.
@cindex @file{}
@cindex @file{}
To aid with image customization and local bug fixes, @gst{} loads two
more files (if present) before saving the image. The first is
@file{}, found in the parent directory of the kernel
directory. Unless users at a site change the kernel directory when
running @command{gst}, @file{/usr/local/share/smalltalk/}
provides a convenient place for site-wide customization. The second is
@file{~/.st/}, which can be different for each user's home
directory.@footnote{The file is looked up as @file{_st/} under
MS-DOS and again, under OSes that don't use home directories it is
looked for as @file{} in the current directory.}
Before the next steps, @gst{} takes a snapshot of the new memory image,
saving it over the old image file if it can, or in the current directory
otherwise.
@node Starting the system
@subsection After the image is created or restored
@c so it's not a "function"... it's an operation
@findex returnFromSnapshot
@cindex @file{}
Next, @gst{} sends the @code{returnFromSnapshot} event to the dependents
of the special class @code{ObjectMemory} (@pxref{Memory access}).
Afterwards, it loads @file{~/.st/} if available.@footnote{The
same considerations made above hold here too. The file is called
@file{_st/} under MS-DOS, and is looked for in the current
directory under OSes that don't use home directories.}
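This rough sketch shows how code can react to this event. The class @code{SnapshotWatcher} is hypothetical. The sketch assumes that each dependent of @code{ObjectMemory} receives @code{update:} with the event name, such as @code{#returnFromSnapshot}, as its argument. This is how the standard dependency mechanism works.
@example
Object subclass: SnapshotWatcher [
    | lastEvent |
    <comment: 'Hypothetical example class, not part of the system'>
    lastEvent [ ^lastEvent ]
    update: anAspect [
        "Remember every event and react to the one we care about"
        lastEvent := anAspect.
        anAspect == #returnFromSnapshot
            ifTrue: ['image restored' displayNl]
    ]
]
watcher := SnapshotWatcher new.
ObjectMemory addDependent: watcher.
@end example
If an image is then saved with @code{ObjectMemory snapshot} and loaded again, the watcher should print @samp{image restored}.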
@cindex startup, customizing
@cindex customizing startup
You can remember the difference between @file{} and @file{}
by remembering that @file{} is the @emph{pre}-snapshot file and
@file{} is the post-image-load @emph{init}ialization file.
Finally, @gst{} loads files listed on the command line, or prompts for
input at the terminal, as described in @ref{Invocation,, Command line
arguments}.
@node Syntax
@section Syntax of @gst{}
The language that @gst{} accepts is basically the same as that accepted by
other Smalltalk environments, and uses the same syntax as the @dfn{Blue Book},
also known as @cite{Smalltalk-80: The Language and Its Implementation}.
The return operator, which is represented in the Blue Book as an
up-arrow, is mapped to the ASCII caret symbol @code{^}; the assignment
operator (left-arrow) is usually represented as @code{:=}@footnote{It
also bears mentioning that there are two assignment operators:
@code{_} and @code{:=}. Both are usable interchangeably, provided that
they are surrounded by spaces. The @gst{} kernel code uses the
@code{:=} form exclusively, but @code{_} is supported a) for
compatibility with previous versions of @gst{} b) because this is the
correct mapping between the assignment operator mentioned in the Blue
Book and the current ASCII definition. In the ancient days (like the
middle 70's), the ASCII underscore character was also printed as a
back-arrow, and many terminals would display it that way, thus its
current usage. Anyway, using @code{_} may lead to portability problems.}.
Actually, the grammar of @gst{} is slightly different from the grammar
of other Smalltalk environments in order to simplify interaction with
the system in a command-line environment as well as in full-screen
editors.
Statements are executed one by one; multiple statements are separated by a
period. At end-of-line, if a valid statement is complete, a period is
implicit. For example,
@example
8r300. 16rFFFF
@end example
prints out the decimal value of octal @code{300} and hex @code{FFFF},
each followed by a newline.
Multiple statements share the same local variables, which are automatically
declared. To delete the local variables, terminate a statement with
@code{!} rather than @code{.} or newline. Here,
@example
a := 42
a!
a
@end example
the first two @code{a}s are printed as @code{42}, but the third one
is uninitialized and thus printed as @code{nil}.
In order to evaluate multiple statements in a single block, wrap them into
an @dfn{eval block} as follows:
@example
Eval [
    a := 42. a printString
]
@end example
This won't print the intermediate result (the integer 42), only the final
result (the string @code{'42'}).
@example
ObjectMemory quit
@end example
exits from the system. You can also type @kbd{Ctrl-d} to exit from
Smalltalk if it's reading statements from standard input.
@gst{} provides three extensions to the language that make it
simpler to write complete programs in an editor. However, it is also
compatible with the @dfn{file out} syntax as shown in the @dfn{Green Book}
(also known as @cite{Smalltalk-80: Bits of History, Words of Advice}
by Glenn Krasner).
A new class is created using this syntax:
@display
@var{superclass-name} @t{subclass:} @var{new-class-name} @t{[}
    @t{|} @var{instance variables} @t{|}
    @var{message-pattern-1} @t{[} @var{statements} @t{]}
    @var{message-pattern-2} @t{[} @var{statements} @t{]}
    @dots{}
    @var{class-variable-1} @t{:=} @var{expression}@t{.}
    @var{class-variable-2} @t{:=} @var{expression}@t{.}
    @dots{}
@t{]}
@end display
In short:
@itemize @bullet
@item Instance variables are defined with the same syntax as method temporary
variables.
@item Unlike other Smalltalks, method statements are inside brackets.
@item Class variables are defined with the same syntax as variable assignments.
@item Pragmas define class comment, class category, imported
namespaces, and the shape of indexed instance variables.
@example
<comment: 'Class comment'>
<category: 'Examples-Intriguing'>
<import: SystemExceptions>
<shape: #pointer>
@end example
@end itemize
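The following sketch puts these rules together in a complete class definition. The class @code{Account} and its methods are hypothetical. The class-side @code{new} method uses the @code{Account class >> new} form, which should be accepted inside a bracketed class body.
@example
Object subclass: Account [
    | balance |
    <comment: 'A minimal bank account (example only)'>
    "Class-side method: make sure every new account starts initialized"
    Account class >> new [ ^super new init ]
    init [ balance := 0 ]
    balance [ ^balance ]
    deposit: amount [ balance := balance + amount ]
]
acct := Account new.
acct deposit: 100.
acct balance printNl.
@end example
The last statement should print @code{100}.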
A similar syntax is used to define new methods in an existing class.
@display
@var{class-expression} @t{extend} @t{[}
    @dots{}
@t{]}
@end display
The @var{class-expression} is an expression that evaluates to a class
object, which is typically just the name of a class, although it can be
the name of a class followed by the word @code{class}, which causes the
method definitions that follow to apply to the named class itself,
rather than to its instances.
@example
Number extend [
    radiusToArea [
        ^self squared * Float pi
    ]
    radiusToCircumference [
        ^self * 2 * Float pi
    ]
]
@end example
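Following the rule above, writing @code{class} after the class name adds methods to the class itself rather than to its instances. In this sketch, @code{stars:} is a hypothetical class-side method added to @code{String}. It relies on the existing @code{new:withAll:} instance-creation message.
@example
String class extend [
    stars: anInteger [
        "Answer a new String made of anInteger asterisks"
        ^self new: anInteger withAll: $*
    ]
]
(String stars: 3) displayNl.
@end example
The last statement should print @samp{***}.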
A complete treatment of the Smalltalk syntax and of the class library
can be found in the included tutorial and class reference (@pxref{Top, ,
Class Reference, gst-base, the @gst{} Library Reference}).
More information on the implementation of the language can be found in
the @cite{Blue Book}; the relevant parts are available, scanned, at
@node Test suite
@section Running the test suite
@gst{} comes with a set of files that provides a simple regression test
suite.
To run the test suite, change to the top-level @gst{} directory and
type
@example
make check
@end example
You should see the names of the test suite files as they are processed,
but that's it. Any other output indicates some problem.
@node Legal concerns
@section Licensing of @gst{}
Different parts of @gst{} come under two licenses: the virtual machine
and the development environment (compiler and browser) come under the
@gnu{} General Public License, while the system class libraries come
under the Lesser General Public License.
@menu
* GPL:: Complying with the GNU GPL.
* LGPL:: Complying with the GNU LGPL.
@end menu
@node GPL
@subsection Complying with the @gnu{} @acronym{GPL}
The @acronym{GPL} licensing of the virtual machine means that all
derivatives of the virtual machine must be put under the same license.
In other words, it is strictly forbidden to distribute programs that include the
@gst{} virtual machine under a license that is not the GPL.
This also includes any bindings to external libraries. For example,
the bindings to Gtk+ are released under the @acronym{GPL}.
In principle, the @acronym{GPL} would not extend to Smalltalk programs,
since these are merely input data for the virtual machine. On the
other hand, using bindings that are under the @acronym{GPL} via dynamic
linking would constitute combining two parts (the Smalltalk program and
the bindings) into one program. Therefore, we added a special exception
to the @acronym{GPL} in order to avoid gray areas that could adversely
hit both the project and its users:
@quotation
In addition, as a special exception, the Free Software Foundation
give you permission to combine @gst{} with free software
programs or libraries that are released under the @gnu{} @acronym{LGPL} and with
independent programs running under the @gst{} virtual machine.
You may copy and distribute such a system following the terms of the
@gnu{} @acronym{GPL} for @gst{} and the licenses of the other code
concerned, provided that you include the source code of that other
code when and as the @gnu{} @acronym{GPL} requires distribution of source code.
Note that people who make modified versions of @gst{} are not
obligated to grant this special exception for their modified
versions; it is their choice whether to do so. The @gnu{} General
Public License gives permission to release a modified version without
this exception; this exception also makes it possible to release a
modified version which carries forward this exception.
@end quotation
@node LGPL
@subsection Complying with the @gnu{} @acronym{LGPL}
Smalltalk programs that run under @gst{} are linked with the system
classes in @gst{} class library. Therefore, they must respect the terms
of the Lesser General Public License@footnote{Of course, they may
be more constrained by usage of @acronym{GPL} class libraries.}.
The interpretation of this license for architectures different from
that of the C language is often difficult; the accepted one for
Smalltalk is as follows. The image file can be considered as an
object file, falling under Subsection 6a of the license, as long as
it allows a user to load an image, upgrade the library or otherwise
apply modifications to it, and save a modified image: this is most
conveniently obtained by allowing the user to use the read-eval-print
loop that is embedded in the @gst{} virtual machine.
In other words, provided that you leave access to the loop in a
documented way, or that you provide a way to file in arbitrary files
in an image and save the result to a new image, you are obeying
Subsection 6a of the Lesser General Public License, which is
reported here:
@quotation
a) Accompany the work with the complete corresponding
machine-readable source code for the Library including whatever
changes were used in the work (which must be distributed under
Sections 1 and 2 above); and, if the work is an executable linked
with the Library, with the complete machine-readable "work that
uses the Library", as object code and/or source code, so that the
user can modify the Library and then relink to produce a modified
executable containing the modified Library. (It is understood
that the user who changes the contents of definitions files in the
Library will not necessarily be able to recompile the application
to use the modified definitions.)
@end quotation
In the future, alternative mechanisms similar to shared libraries may
be provided, so that it is possible to comply with the @gnu{} @acronym{LGPL}
in other ways.
@node Features
@chapter Features of @gst{}
This chapter describes the features that are specific to @gst{}.
These features include support for calling C functions from
within Smalltalk, accessing environment variables, and controlling
various aspects of compilation and execution monitoring.
Note that, in general, @gst{} is much more powerful than the original
Smalltalk-80, as it contains a lot of methods that are common in today's
Smalltalk implementations and are present in the ANSI Standard for
Smalltalk, but were absent in the Blue Book. Examples include
Collection's @code{allSatisfy:} and @code{anySatisfy:} methods and many
methods in SystemDictionary (the Smalltalk dictionary's class).
@menu
* Extended streams:: Extensions to streams, and generators
* Regular expressions:: String matching extensions
* Namespaces:: Avoiding clashes between class names.
* Disk file-IO:: Methods for reading and writing disk files.
* Object dumping:: Methods that read and write objects in binary format.
* Dynamic loading:: Picking external libraries and modules at run-time.
* Documentation:: Automatic documentation generation.
* Memory access:: The direct memory accessing classes and methods, plus
broadcasts from the virtual machine.
* GC:: The @gst{} memory manager.
* Security:: Sandboxing and access control.
* Special objects:: Methods to assign particular properties to objects.
@end menu
@node Extended streams
@section Extended streams
The basic image in @gst{} includes powerful extensions to the @emph{Stream}
hierarchy found in ANSI Smalltalk (and Smalltalk-80). In particular:
@itemize @bullet
@item
Read streams support all the iteration protocols available for collections. In
some cases (like @code{fold:}, @code{detect:}, @code{inject:into:}) these
are completely identical. For messages that return a new stream, such
as @code{select:} and @code{collect:}, the blocks are evaluated lazily,
as elements are requested from the stream using @code{next}.
@item
Read streams can be concatenated using @code{,} like SequenceableCollections.
@item
@dfn{Generators} are supported as a quick way to create a Stream.
A generator is a kind of pluggable stream, in that a user-supplied
block defines which values are in a stream.
For example, here is an empty generator and two infinite generators:
@example
"Return an empty stream"
Generator on: [ :gen | ]
"Return an infinite stream of 1's"
Generator on: [ :gen | [ gen yield: 1 ] repeat ]
"Return an infinite stream of integers counting up from 1"
Generator inject: 1 into: [ :value | value + 1 ]
@end example
The block is put ``on hold'' and starts executing as soon as @code{#next}
or @code{#atEnd} are sent to the generator. When the block sends
@code{#yield:} to the generator, it is again put on hold and the argument
becomes the next object in the stream.
Generators use @dfn{continuations}, but they shield the users from their
complexity by presenting the same simple interface as streams.
@end itemize
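The short sketch below exercises the extensions just listed: lazy
@code{select:} and @code{collect:} on a read stream, concatenation with
@code{,}, and a finite generator driven by @code{#yield:}. Apart from the
messages named above it only assumes @code{readStream}, @code{atEnd} and
@code{printNl}; it is meant as an illustration, not a reference.

@example
| evens squares gen |
"Nothing is evaluated yet: select: and collect: answer new, lazy streams"
evens := (1 to: 10) readStream select: [:x | x even].
squares := evens collect: [:x | x * x].
"Only now is the first even number found and squared (prints 4)"
squares next printNl.

"Read streams concatenate like collections"
('abc' readStream , 'def' readStream) do: [:each | each printNl].

"A finite generator: each yield: suspends the block and supplies one element"
gen := Generator on: [:g | 1 to: 3 do: [:i | g yield: i * 10]].
[gen atEnd] whileFalse: [gen next printNl].
@end example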
@node Regular expressions
@section Regular expression matching
@emph{Regular expressions}, or ``regexes'', are a sophisticated way to
efficiently match patterns of text. If you are unfamiliar with regular
expressions, see @ref{Regexps, Syntax of Regular Expressions,
20.5 Syntax of Regular Expressions, emacs, GNU Emacs Manual}, for an
introduction.
@gst{} supports regular expressions in the core image with methods
on @code{String}.
The @gst{} regular expression library is derived from GNU libc,
with modifications made originally for Ruby to support Perl-like syntax.
It will always use its included library, and never the ones installed on
your system; this may change in the future in backwards-compatible ways.
Regular expressions are currently 8-bit clean, meaning they can
work with any ordinary String, but do not support full Unicode, even
when package @code{I18N} is loaded.
Broadly speaking, these regexes support Perl 5 syntax; register groups
@samp{()} and repetition @samp{@{@}} must not be given with backslashes,
and their counterpart literal characters should. For example,
@samp{\@{@{1,3@}} matches @samp{@{}, @samp{@{@{}, @samp{@{@{@{};
correspondingly, @samp{(a)(\()} matches @samp{a(}, with @samp{a} and
@samp{(} as the first and second register groups respectively.
@gst{} also supports the regex modifiers @samp{imsx}, as in Perl. You can't
put regex modifiers like @samp{im} after Smalltalk strings to
specify them, because they aren't part of Smalltalk syntax. Instead,
use the inline modifier syntax. For example, @samp{(?is:abc.)}
is equivalent to @samp{[Aa][Bb][Cc](?:.|\n)}.
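As a small illustration of the inline modifier syntax, the following
compares a case-insensitive group with a plain pattern; the subject
string is only an example. The first line should print @code{true} and
the second @code{false}.

@example
"Case-insensitive match through an inline modifier group"
('GNU Smalltalk' =~ '(?i:smalltalk)') matched printNl.
"Without the modifier the lowercase pattern does not match"
('GNU Smalltalk' =~ 'smalltalk') matched printNl.
@end example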
In most cases, you should specify regular expressions as ordinary
strings. @gst{} always caches compiled regexes, and uses a special
high-efficiency caching when looking up literal strings (i.e.@: most
regexes), to hide the compiled @code{Regex} objects from most code.
For special cases where this caching is not good enough, simply send
@code{#asRegex} to a string to retrieve a compiled form, which
works in all places in the public API where you would specify a regex
string. You should always rely on the cache until you have demonstrated
that using Regex objects makes a noticeable performance difference in
your code.
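If profiling does show that the cache is not enough, a compiled
@code{Regex} can be kept in a variable and passed wherever a pattern
string would be. This is a minimal sketch; the variable name and the
subject strings are arbitrary.

@example
| digits |
"Compile once, then reuse the Regex object in place of a pattern string"
digits := '[0-9]+' asRegex.
#('a1' 'b22' 'c') do: [:each |
    (each =~ digits) matched printNl]
@end example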
Smalltalk strings only have one escape, the @samp{'} given by
@samp{''}, so backslashes used in regular expression strings will be
understood as backslashes, and a literal backslash can be given directly
with @samp{\\}@footnote{Whereas it must be given as @samp{\\\\}
in a literal Emacs Lisp string, for example.}.
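To make the escaping rules concrete: the Smalltalk literal @samp{'\\'}
holds exactly two characters, which the regex engine reads as one
escaped backslash. The sample path below is only an example.

@example
"Two characters: Smalltalk does not interpret the backslashes (prints 2)"
'\\' size printNl.
"As a regex, those two characters match one literal backslash"
('C:\temp' =~ '\\') matched printNl.
@end example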
The methods on the compiled Regex object are private to this interface.
As a public interface, @gst{} provides methods on String, in the category
@samp{regex}. There are several methods for matching, replacing, pattern
expansion, iterating over matches, and other useful things.
The fundamental operator is @code{#searchRegex:}, usually written as
@code{#=~}, reminiscent of Perl syntax. This method will always
return a @code{RegexResults}, which you can query for whether
the regex matched, the location Interval and contents of the match and
any register groups as a collection, and other features. For example,
here is a simple configuration file line parser:

@example
| file config |
config := LookupTable new.
file := (File name: 'myapp.conf') readStream.
file linesDo: [:line |
(line =~ '(\w+)\s*=\s*((?: ?\w+)+)') ifMatched: [:match |
config at: (match at: 1) put: (match at: 2)]].
file close.
config printNl.
@end example
As with Perl, @code{=~} will scan the entire string and answer the
leftmost match if any is to be found, consuming as many characters as
possible from that position. You can anchor the search with variant
messages like @code{#matchRegex:}, or of course @code{^} and
@code{$} with their usual semantics if you prefer.
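The difference between searching and anchored matching can be seen on a
single subject string. This sketch assumes, as described above, that
@code{#matchRegex:} answers a Boolean telling whether the @emph{whole}
receiver matches, while @code{#=~} answers a @code{RegexResults} for the
leftmost match.

@example
"=~ finds the leftmost, longest match anywhere in the string"
('abc123def' =~ '[0-9]+') match printNl.
"matchRegex: requires the whole receiver to match the pattern"
('abc123def' matchRegex: '[0-9]+') printNl.
('123' matchRegex: '[0-9]+') printNl.
@end example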
Do not modify a string while you still need a RegexResults object
obtained by matching against it, because changes to the matched text
may propagate to the RegexResults object.
@c (currently "will", but best to leave open)
Analogously to the Perl @code{s} operator, @gst{} provides
@code{#replacingRegex:with:}. Unlike Perl, @gst{} employs the pattern expansion
syntax of the @code{#%} message here. For example, @code{'The ratio is
16/9.' replacingRegex: '(\d+)/(\d+)' with: '$%1\over%2$'} answers
@code{'The ratio is $16\over9$.'}. In place of the @code{g}
modifier, use the @code{#replacingAllRegex:with:} message.
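A few more replacements, with arbitrary sample strings, contrast the
single and global forms and show register expansion with @code{%1} and
@code{%2}. Using @code{displayNl} prints the results without quotes:
@samp{a+b-c}, @samp{a+b+c} and @samp{05/2024}.

@example
"Only the first match is replaced"
('a-b-c' replacingRegex: '-' with: '+') displayNl.
"Every match is replaced, like Perl's g modifier"
('a-b-c' replacingAllRegex: '-' with: '+') displayNl.
"Register groups are expanded as with the #% message"
('2024-05' replacingRegex: '([0-9]+)-([0-9]+)' with: '%2/%1') displayNl.
@end example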
One other interesting String message is @code{#onOccurrencesOfRegex:do:}, which
invokes its second argument, a block, on every successful match found in the
receiver. Internally, every search will start at the end of the previous
successful match. For example, this will print all the words in a stream:
@example
stream contents onOccurrencesOfRegex: '\w+'
do: [:each | each match printNl]
@end example
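Instead of printing each match, the block can collect the matches. This
is a minimal sketch on a literal string; the block receives a
@code{RegexResults} for each successful match, and @code{match} answers
the matched text.

@example
| words |
words := OrderedCollection new.
'the quick brown fox' onOccurrencesOfRegex: '[a-z]+'
    do: [:each | words add: each match].
"Four words, in the order they occur"
words printNl.
@end example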
@node Namespaces
@section Namespaces
@i{[This section (and the implementation of namespaces in @gst{})
is based on the paper @cite{Structured Symbolic Name Spaces in
Smalltalk}, by Augustin Mrazik.]}
@subsection Introduction
The Smalltalk-80 programming environment, upon which @gst{} is
historically based, supports symbolic identification of objects in one
global namespace---in the @code{Smalltalk} system dictionary. This means
that each global variable in the system has its unique name which is
used for symbolic identification of the particular object in the source
code (e.g.@: in expressions or methods). The most important of these
global variables are classes defining the behavior of objects.
In development dealing with modelling of real systems, @dfn{polymorphic
symbolic identification} is often needed. By this, we mean that it
should be possible to use the same name for different classes or other
global variables. Selection of the proper variable binding should be
context-specific. By way of illustration, let us consider class
@code{Statement} as an example which would mean totally different things
in different domains:
@table @asis
@item @gst{} or other programming language
An expression in the top level of a code body, possibly with special
syntax available such as assignment or branching.
@item Bank
A customer's trace report of recent transactions.
@item AI, logical derivation
An assertion of a truth within a logical system.
@end table
This issue becomes inevitable if we start to work persistently, using
@code{ObjectMemory snapshot} to save after each session for later
resumption. For example, you might have the class @code{Statement}
already in your image with the ``Bank'' meaning above (e.g.@: in the
live bank support systems we all run in our images) and you might decide
to start developing @acronym{YAC} [Yet Another C]. Upon starting to
write parse nodes for the compiler, you would find that
@code{#Statement} is bound in the banking package. You could replace
it with your parse node class, and the bank's @code{Statement} could
remain in the system as an unbound class with full functionality;
however, it could not be accessed anymore at the symbolic level in the
source code. Whether this would be a problem or not would depend on
whether any of the bank's code refers to the class @code{Statement}, and
when these references occur.
Objects which have to be identified in source code by their names are
included in @code{Smalltalk}, the sole instance of
@code{SystemDictionary}. Such objects may be identified simply by
writing their names as you would any variable names. The code is
compiled in the default environment, and if the variable is found in
@code{Smalltalk}, without being shadowed by a class pool or local
variables, its value is retrieved and used as the value of the
expression. In this way @code{Smalltalk} represents the sole symbolic
namespace. In the following text the symbolic namespace, as a concept,
will be called simply @dfn{environment} to make the text clearer.
@subsection Concepts
To support polymorphic symbolic identification several environments
will be needed. The same name may exist concurrently in several
environments as a key, pointing to diverse objects in each.
Symbolic navigation between these environments is needed. Before
approaching the problem of the syntax and semantics to be implemented,
we have to decide on structural relations to be established between
environments.
Since the environment must first be symbolically identified to direct
access to its global variables, it must first itself be a global
variable in another environment. @code{Smalltalk} is a great choice for
the root environment, from which selection of other environments and
their variables begins. From @code{Smalltalk} some of the existing
sub-environments may be seen; from these other sub-environments may be
seen, etc. This means that environments represent nodes in a graph
where symbolic selections from one environment to another one represent
edges.
The symbolic identification should be unambiguous, although it will be
polymorphic. This is why we should avoid cycles in the environment
graph. Cycles in the graph could also cause other problems in the
implementation, e.g.@: inability to use trivially recursive algorithms.
Thus, in general, the environments must build a directed acyclic graph;
@gst{} currently limits this to an n-ary tree, with the extra feature
that environments can be used as pool dictionaries.
Let us call the partial ordering relation which occurs between
environments @dfn{inheritance}. Sub-environments inherit from their
super-environments. The feature of inheritance in the meaning of
object-orientation is associated with this relation: all associations of
the super-environment are valid also in its sub-environments, unless they
are locally redefined in the sub-environment.
A super-environment includes all its sub-environments as
@code{Association}s under their names. The sub-environment includes its
super-environment under the symbol @code{#Super}. Most environments
inherit from @code{Smalltalk}, the standard root environment, but they
are not required to do so; this is similar to how most classes derive
from @code{Object}, yet one can derive a class directly from @code{nil}.
Since they all inherit @code{Smalltalk}'s global variables, it is not
necessary to define a binding for @code{Smalltalk} itself in each
environment.
The inheritance links to the super-environments are used in the lookup
for a potentially inherited global variable. This includes lookups by a
compiler searching for a variable binding and lookups via methods such
as @code{#at:} and @code{#includesKey:}.
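The inherited lookup can be tried from the REPL. In this sketch the
namespace name @code{Demo} and the key @code{#Greeting} are hypothetical;
it assumes that @code{addSubspace:} answers the new namespace and that
@code{superspace} answers its super-environment. Following the rules
above, @code{Demo} should see @code{Transcript} through inheritance,
while @code{Smalltalk} should not see @code{Greeting}.

@example
| ns |
"Create a sub-environment of Smalltalk (the name is only an example)"
ns := Smalltalk addSubspace: #Demo.
ns at: #Greeting put: 'hello'.
(ns at: #Greeting) displayNl.
"Transcript is defined in Smalltalk, and found by inheritance"
(ns includesKey: #Transcript) printNl.
"Greeting is local to Demo, so its super-environment does not see it"
(Smalltalk includesKey: #Greeting) printNl.
(ns superspace == Smalltalk) printNl.
@end example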
@subsection Syntax
Global objects of an environment, be they local or inherited, may be
referenced by their symbol variable names used in the source code, e.g.@:

@example
John goHome
@end example
if the @code{#John -> aMan} association exists in the particular environment or
one of its super-environments, all along the way to the root environment.
If an object must be referenced from another environment (i.e.@: which
is not one of its sub-environments) it has to be referenced either
@emph{relatively} to the position of the current environment, using the
@code{Super} symbol, or @emph{absolutely}, using the ``full pathname''
of the object, navigating from the tree root (usually @code{Smalltalk})
through the tree of sub-environments.
For the identification of global objects in another environment, we use
a ``pathname'' of symbols. The symbols are separated by periods; the
``look'' to appear is that of

@example
Smalltalk.Tasks.MyTask
@end example
and of
@end example
As is custom in Smalltalk, we are reminded by capitalization that we
are accessing global objects. Another syntax returns the @dfn{variable
binding}, the @code{Association} for a particular global. The first
example above is equivalent to:

@example
#@{Smalltalk.Tasks.MyTask@} value
@end example
The latter syntax, a @dfn{variable binding}, is also valid inside
literal arrays.
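The three forms of reference can be compared on a single global. The
name @code{Answer} is hypothetical and is defined first so that the
sketch is self-contained; every line should print the same value.

@example
"Define a global in Smalltalk (the name is only an example)"
Smalltalk at: #Answer put: 42.
"A plain reference, a dotted path, and a variable binding"
Answer printNl.
Smalltalk.Answer printNl.
#@{Smalltalk.Answer@} value printNl.
"A variable binding inside a literal array"
(#(#@{Smalltalk.Answer@}) first value) printNl.
@end example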
@subsection Implementation
A superclass of @code{SystemDictionary} called @code{RootNamespace} is
defined, and many of the features of the Smalltalk-80
@code{SystemDictionary} will be hosted by that class. @code{Namespace}
and @code{RootNamespace} are in turn subclasses of
@code{AbstractNamespace}.
To handle inheritance, the following methods have to be defined or redefined in
Namespace (@emph{not} in RootNamespace):
@table @asis
@item Accessors like @code{#at:ifAbsent:} and @code{#includesKey:}
Inheritance must be implemented. When @code{Namespace}, trying to read
a variable, finds an association in its own dictionary or a
super-environment dictionary, it uses that; for writes through the
@code{Dictionary} protocol, and whenever a new association must be
created, @code{Namespace} creates it in its own dictionary. There are
special methods like
@code{#set:to:} for cases in which you want to modify a binding in a
super-environment if that is the relevant variable's binding.
@c this needs more clarity for #at:put: #set:to: disambig
@item Enumerators like @code{#do:} and @code{#keys}
This should return @strong{all} the objects in the namespace, including
those which are inherited.
@item Hierarchy access
@code{AbstractNamespace} will also implement a new set of
methods that allow one to navigate through the namespace hierarchy;
these parallel those found in @code{Behavior} for the class hierarchy.
@end table
The most important task of the @code{Namespace} class is to provide
organization for the most important global objects in the Smalltalk
system---for the classes. This importance becomes even more crucial in
a structure of multiple environments intended to change the semantics of
code compiled for those classes.
In Smalltalk the classes have the instance variable @code{name} which
holds the name of the class. Each @dfn{defined class} is included in
@code{Smalltalk}, or another environment, under this name. In a
framework with several environments the class should know the
environment in which it has been created and compiled. This is a new
property of @code{Class} which must be defined and properly used in
relevant methods. In the mother environment the class shall be included
under its name.
Any class, as with any other object, may be included concurrently in
several environments, even under different symbols in the same or in
diverse environments. We can consider these ``alias names'' of the
particular class or other value. A class may be referenced under the
other names or in other environments than its mother environment, e.g.@:
for the purpose of instance creation or messages to the class, but it
should not compile code in these environments, even if this compilation
is requested from another environment. If the syntax is not correct in
the mother environment, a compilation error occurs. This follows from
the existence of class ``mother environments'', as a class is
responsible for compiling its own methods.
An important issue is also the name of the class answered by the class
for the purpose of its identification in diverse tools (e.g.@: in a
browser). This must be changed to reflect the environment in which it is
shown, i.e.@: the method @samp{nameIn: environment} must be implemented
and used in proper places.
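For instance, a class that lives in a sub-environment prints a different
name depending on the environment it is shown from. This sketch assumes
the system class @code{SystemExceptions.InvalidValue} and the
@code{nameIn:} and @code{environment} messages on classes; the qualified
name is expected from @code{Smalltalk} and the short name from
@code{SystemExceptions}.

@example
"The same class named relative to two different environments"
(SystemExceptions.InvalidValue nameIn: Smalltalk) displayNl.
(SystemExceptions.InvalidValue nameIn: SystemExceptions) displayNl.
"The class's mother environment"
SystemExceptions.InvalidValue environment printNl.
@end example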
Other changes must be made to the Smalltalk system to achieve the full
functionality of structured environments. In particular, changes have
to be made to the behavior classes, the user interface, the compiler,
and a few classes supporting persistence. One small detail of note is
that evaluation in the @acronym{REPL} or @samp{Workspace}, implemented
by compiling methods on @code{UndefinedObject}, makes more sense if
@code{UndefinedObject}'s environment is the ``current environment'' as
reachable by @code{Namespace current}, even though its mother
environment by any other reckoning is @code{Smalltalk}.
@subsection Using namespaces
Using namespaces is often merely a matter of adding a @samp{namespace}
option to the @gst{} @acronym{XML} package description used by
@code{PackageLoader}, or wrapping your code like this:

@example
Namespace current: NewNS [
    ...
]
@end example
Namespaces can be imported into classes like this:

@example
Stream subclass: EncodedStream [
    <import: Encoders>
]
@end example
Otherwise, paths to classes (and other objects) in the namespaces will
have to be specified completely. Importing a namespace into a class is
similar to C++'s @code{using namespace} declaration within the class
proper's definition.
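A self-contained sketch of an import follows. The namespace
@code{Encoders}, the binding @code{DefaultWidth} and the class
@code{EncoderClient} are all hypothetical; the namespace is created
first so that the @code{<import:>} pragma has something to refer to.

@example
"Hypothetical namespace holding one binding"
(Smalltalk addSubspace: #Encoders) at: #DefaultWidth put: 64.

Object subclass: EncoderClient [
    <import: Encoders>
    width [
        "DefaultWidth is found through the imported namespace"
        ^DefaultWidth
    ]
]

EncoderClient new width printNl.
@end example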
Finally, be careful when working with fundamental system classes. Although you
can use code like

@example
Namespace current: NewNS [
    Smalltalk.Set subclass: Set [
        <category: 'My application-Extensions'>
        ...
    ]
]
@end example
this approach won't work
when applied to core classes. For example, you might be successful with
a @code{Set} or @code{WriteStream} object, but subclassing
@code{SmallInteger} this way can bite you in strange ways: integer
literals will still belong to the @code{Smalltalk} dictionary's version
of the class (this holds for @code{Array}s, @code{String}s, etc.@: too),
primitive operations will still answer standard Smalltalk
@code{SmallIntegers}, and so on. Similarly,
word-shaped will recognize 32-bit @code{Smalltalk.LargeInteger} objects,
but not @code{LargeInteger}s belonging to your own namespace.
Unfortunately, this problem is not easy to solve since Smalltalk has to
know the @acronym{OOP}s of particular class objects for speed: it
would not be feasible to look up the environment to which the sender of
a message belongs every time the @code{+} message is sent to an Integer.
So, @gst{} namespaces cannot yet solve 100% of the problem of clashes
between extensions to a class---for that you'll still have to rely on
prefixes to method names. But they @emph{do} solve the problem of clashes
between class names, or between class names and pool dictionary names.
Namespaces are unrelated to packages; loading a package does not
import the corresponding namespace.
@node Disk file-IO
@section Disk file-IO primitive messages
Four classes (@code{FileDescriptor}, @code{FileStream}, @code{File},
@code{Directory}) allow you to create files and access the file system
in a fully object-oriented way.
@code{FileDescriptor} and @code{FileStream} are much more powerful than the
corresponding C language facilities (the difference between the two is that,
like the C @code{stdio} library, @code{FileStream} does buffering). For one
thing, they allow you to write raw binary data in a portable endian-neutral
format. But, more importantly, these classes transparently implement
virtual filesystems and asynchronous I/O.
Asynchronous I/O means that an input/output operation blocks the
Smalltalk Process that is doing it, but not the others, which makes them
very useful in the context of network programming. Virtual file systems
mean that these objects can transparently extract files from archives
such as @file{tar} and @file{gzip} files, through a mechanism that can
be extended through either shell scripting or Smalltalk programming.
For more information on these classes, look in the class reference, under
the @code{VFS} namespace. @acronym{URL}s may be used as file names; though,
unless you have loaded the @code{NetClients} package (@pxref{Network support}),
only @code{file} @acronym{URL}s will be accepted.
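A minimal round trip through a text file might look like the following.
The path is only an example (any writable location will do), and the
sketch assumes the @code{FileStream write} and @code{FileStream read}
mode accessors together with @code{open:mode:}.

@example
| out in |
"Write two lines (the path is only an example)"
out := FileStream open: '/tmp/gst-example.txt' mode: FileStream write.
out nextPutAll: 'first line'; nl; nextPutAll: 'second line'; nl.
out close.

"Read them back one line at a time"
in := FileStream open: '/tmp/gst-example.txt' mode: FileStream read.
[in atEnd] whileFalse: [in nextLine displayNl].
in close.
@end example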
In addition, the three files, @code{stdin}, @code{stdout}, and @code{stderr}
are declared as global instances of @code{FileStream} that are bound to the
proper values as passed to the C virtual machine. They can be accessed
either as @code{stdout} or as @code{FileStream stdout}; the former is
easier to type, but the latter can be clearer.
Finally, @code{Object} defines four other methods: @code{print} and
@code{printNl}, @code{store} and @code{storeNl}. These do a @code{printOn:} or
@code{storeOn:} to the ``Transcript'' object; this object, which is the sole
instance of class @code{TextCollector}, normally delegates write
operations to @code{stdout}. If you load the VisualGST @sc{gui}, instead,
the Transcript Window will be attached to the Transcript object (@pxref{GUI, ,
GTK and VisualGST}).
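The following sketch writes through each of these routes; the strings
are arbitrary and @code{showCr:} is assumed as the Transcript's
line-writing message. Because the Transcript may buffer its output, its
lines can appear interleaved differently from those written directly to
@code{stdout}.

@example
"The global and the class-side accessor name the same stream"
stdout nextPutAll: 'written to stdout'; nl.
FileStream stdout nextPutAll: 'also written to stdout'; nl.
"These go through the Transcript object"
Transcript showCr: 'written to the Transcript'.
(3/4) printNl.
(3/4) storeNl.
@end example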
The @code{fileIn:} message sent to the FileStream class, with a file
name as a string argument, will cause that file to be loaded into
Smalltalk. For example,

@example
FileStream fileIn: '' !
@end example
will cause @file{} to be loaded into @gst{}.
@node Object dumping
@section The @gst{} ObjectDumper
Another @gst{}-specific class, the @code{ObjectDumper} class, allows
you to dump objects in a portable, endian-neutral, binary format. Note that
you can use the @code{ObjectDumper} on ByteArrays too, thanks to another
@gst{}-specific class, @code{ByteStream}, which allows you to treat
ByteArrays the same way you would treat disk files.
For more information on the usage of the @code{ObjectDumper}, look in the
class reference.
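As a rough sketch, dumping an object to a file and loading it back could
look like this. It assumes the class-side @code{dump:to:} and
@code{loadFrom:} messages of @code{ObjectDumper}; check the class
reference for the exact protocol. The file path is only an example.

@example
| out in copy |
"Dump an Array to a file (example path) in the binary format"
out := FileStream open: '/tmp/gst-dump.bin' mode: FileStream write.
ObjectDumper dump: #(1 2.5 'three' #four) to: out.
out close.

"Load it back: the result is a new object, not the original one"
in := FileStream open: '/tmp/gst-dump.bin' mode: FileStream read.
copy := ObjectDumper loadFrom: in.
in close.
copy printNl.
@end example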
@node Dynamic loading
@section Dynamic loading
The @code{DLD} class enhances the C callout mechanism to automatically look
for unresolved functions in a series of program-specified libraries. To
add a library to the list, evaluate code like the following:
@example
DLD addLibrary: 'libc'
@end example
The extension (@file{.so}, @file{.sl}, @file{.a}, @file{.dll} depending
on your operating system) will be added automatically. You are advised
not to specify it for portability reasons.
You will then be able to use the standard C call-out mechanisms
to define all the functions in the C run-time library. Note
that this is a potential security problem (especially if your program is
SUID root under Unix), so you might want to disable dynamic loading when
using @gst{} as an extension language. To disable dynamic loading,
configure @gst{} passing the @option{--disable-dld} switch.
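As an illustration of a call-out to a C run-time function, the sketch
below wraps @code{getpid}, which takes no arguments and returns a C
@code{int}. The class @code{LibcDemo} is hypothetical; the
@code{<cCall:returning:args:>} pragma is one of the standard call-out
mechanisms.

@example
"Make libc's symbols available to call-outs"
DLD addLibrary: 'libc'.

"Hypothetical wrapper class with a class-side call-out"
Object subclass: LibcDemo [
    LibcDemo class >> getpid [
        <cCall: 'getpid' returning: #int args: #()>
    ]
]

LibcDemo getpid printNl.
@end example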
Note that a @code{DLD} class will be present even if dynamic loading is
disabled (either because your system is not supported, or by the
@option{--disable-dld} configure switch) but any attempt to perform
dynamic linking will result in an error.
@node Documentation
@section Automatic documentation generator
@gst{} includes an automatic documentation generator invoked via the
@command{gst-doc} command. The code is actually part of the
@code{ClassPublisher} package, and @command{gst-doc} takes care
of reading the code to be documented and firing a @code{ClassPublisher}.
Currently, @command{gst-doc} can only generate output in Texinfo
format, though this will change in future releases.
@command{gst-doc} can document code that is already in the image, or
it can load external files and packages. Note that the latter approach
will not work for files and packages that programmatically create code
or file in other files/packages.
@command{gst-doc} is invoked as follows:
@example
gst-doc [ @var{flag} ... ] @var{class} ...
@end example
The following options are supported:
@table @option
@item -p @var{package}
@itemx --package=@var{package}
Produce documentation for the classes inside the @var{package} package.
@item -f @var{file}
@itemx --file=@var{file}
Produce documentation for the classes inside the @var{file} file.
@item -I
@itemx --image-file
Produce documentation for the code that is already in the given image.
@item -o @var{file}
@itemx --output=@var{file}
Emit documentation in the named file.
@end table
@var{class} is either a class name, or a namespace name followed by
@code{.*}. Documentation will be written for classes that are specified
in the command line. @var{class} can be omitted if a @option{-f} or
@option{-p} option is given. In this case, documentation will be
written for all the classes in the package or file.
@node Memory access
@section Memory accessing methods
@gst{} provides methods to query its own internal data structures.
You may determine the real memory address of an object or the real
memory address of the OOP table that points to a given object, by
using messages to the @code{Memory} class, described below.
@defmethod Object asOop
Returns the index of the OOP for anObject. This index is immune from
garbage collection and is the same value used by default as a hash
value for anObject (it is returned by Object's implementation of
@code{hash} and @code{identityHash}).
@end defmethod
@defmethod Integer asObject
Converts the given OOP @emph{index} (not address) back to an object.
Fails if no object is associated to the given index.
@end defmethod
@defmethod Integer asObjectNoFail
Converts the given OOP @emph{index} (not address) back to an object.
Returns nil if no object is associated to the given index.
@end defmethod
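A quick check of these conversions on a freshly created object: the
round trip through the OOP index answers the very same object, and,
as stated above for @code{Object}'s default implementation, the index
doubles as the identity hash.

@example
| obj index |
obj := Object new.
index := obj asOop.
"Converting the index back answers the identical object"
(index asObject == obj) printNl.
"The index is also used as the default identity hash"
(obj identityHash = index) printNl.
@end example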
Other methods in ByteArray and Memory allow one to read various C types
(@code{doubleAt:}, @code{ucharAt:}, etc.). These are mostly obsoleted
by @code{CObject} which, in newer versions of @gst{}, supports
manually managed heap-backed memory as well as garbage collected
ByteArray-backed memory.
Another interesting class is ObjectMemory. This provides a few methods
that enable one to tune the virtual machine's usage of memory; many
methods that in the past were instance methods of Smalltalk or class
methods of Memory are now class methods of ObjectMemory. In addition,
and that's what the rest of this section is about, the virtual machine
signals events to its dependents exactly through this class.
The events that can be received are
@table @dfn
@item returnFromSnapshot
This is sent every time an image is restarted, and substitutes the
concept of an @dfn{init block} that was present in previous versions.
@item aboutToQuit
This is sent just before the interpreter is exiting, either because
@code{ObjectMemory quit} was sent or because the specified files were
all filed in. Exiting from within this event might cause an infinite
loop, so be careful.
@item aboutToSnapshot
This is sent just before an image file is created. Exiting from within
this event will leave any preexisting image untouched.
@item finishedSnapshot
This is sent just after an image file is created. Exiting from within
this event will not make the image unusable.
@end table
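To receive these events, register a dependent of @code{ObjectMemory}.
This sketch assumes that the events reach dependents through the usual
@code{update:} protocol, with the event name as the argument; the class
@code{ImageWatcher} is hypothetical.

@example
"A hypothetical dependent reacting to two of the events above"
Object subclass: ImageWatcher [
    update: anEvent [
        anEvent == #returnFromSnapshot
            ifTrue: ['image restarted' displayNl].
        anEvent == #aboutToQuit
            ifTrue: ['about to quit' displayNl]
    ]
]

ObjectMemory addDependent: ImageWatcher new.
@end example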
@node GC
@section Memory management in @gst{}
The @gst{} virtual machine is equipped with a garbage collector, a
facility that reclaims the space occupied by objects that are no
longer accessible from the system roots. The collector is composed
of several parts, each of which can be invoked by the virtual machine
using various tunable strategies, or invoked manually by the programmer.
These parts include a @dfn{generation scavenger}, a @dfn{mark & sweep}
collector with an incremental sweep phase, and a @dfn{compactor}.
Each of these facilities works on a different memory space and differs
from the others in its scope, speed and disadvantages (which are
hopefully balanced by the availability of different algorithms). What
follows is a description of these algorithms and of the memory spaces
they work in.
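For manual invocation, the sketch below assumes the @code{ObjectMemory}
class-side messages @code{scavenge} (a minor collection of NewSpace),
@code{globalGarbageCollect} (a full collection) and @code{compact} (a
full collection followed by compaction of OldSpace); see the class
reference for the complete list.

@example
"Minor collection: scavenge NewSpace only"
ObjectMemory scavenge.
"Full mark & sweep collection"
ObjectMemory globalGarbageCollect.
"Full collection followed by compaction of OldSpace"
ObjectMemory compact.
@end example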
@dfn{NewSpace} is the memory space where young objects live. It is
composed of three sub-spaces: an object-creation space (@dfn{Eden})
and two @dfn{SurvivorSpaces}. When an object is first created, it is
placed in Eden. When Eden starts to fill up (i.e., when the number of
used bytes in Eden exceeds the scavenge threshold), objects that are
housed in Eden or in the occupied SurvivorSpace and that are still
reachable from the system roots are copied to the unoccupied
SurvivorSpace. As an object survives different scavenging passes, it
will be shuffled by the scavenger from the occupied SurvivorSpace to
the unoccupied one. When the number of used bytes in SurvivorSpace is
high enough that the scavenge pause might be excessively long, the
scavenger will move some of the older surviving objects from NewSpace
to @dfn{OldSpace}. In the garbage collection jargon, we say that such
objects are being @dfn{tenured} to OldSpace.
This garbage collection algorithm is designed to reclaim short-lived
objects, that is those objects that expire while residing in NewSpace,
and to decide when enough data is residing in NewSpace that it is
useful to move some of it into OldSpace. A @dfn{copying} garbage
collector is particularly efficient in an object population whose
members are more likely to die than survive, because this kind of
scavenger spends most of its time copying survivors, who will be few
in number in such populations, rather than tracing corpses, who will
be many in number. This fact makes copying collection especially
well suited to NewSpace, where 90% or more of the objects often fail
to survive across a single scavenge.
The particular structure of NewSpace has many advantages. On one
hand, having a large Eden and two small SurvivorSpaces has a smaller
memory footprint than having two equally big semi-spaces and
allocating new objects directly from the occupied one (by default,
@gst{} uses 420=300+60*2 kilobytes of memory, while a simpler
configuration would use 720=360*2 kilobytes). On the other hand, it
makes tenuring decisions particularly simple: the copying order is
such that short-lived objects tend to be copied last, while objects
that are being referred to from OldSpace tend to be copied first: this is
because the tenuring strategy of the scavenger is simply to treat the
destination SurvivorSpace as a circular buffer, tenuring objects with
a First-In-First-Out policy.
An object might become part of the scavenger root set for several
reasons: objects that have been tenured are roots if their data lives
in an OldSpace page that has been written to since the last scavenge
(more on this later), plus all objects can be roots if they are known
to be referenced from C code or from the Smalltalk stacks.
In turn, some of the old objects can be made to live in a special
area, called @dfn{FixedSpace}. Objects that reside in FixedSpace are
special in that their body is guaranteed to remain at a fixed address
(in general, @gst{} only ensures that the header of the object remains
at a fixed address in the Object Table). Because the garbage
collector can and does move objects, passing objects to foreign code
which uses the object's address as a fixed key, or which uses a
ByteArray as a buffer, presents difficulties. One can use
@code{CObject} to manipulate C data on the @code{malloc} heap, which
indeed does not move, but this can be tedious and requires the same
attention to avoid memory leaks as coding in C. FixedSpace provides
a much more convenient mechanism: once an object is deemed fixed, the
object's body will never move throughout its lifetime; the space it
occupies will however still be returned automatically to the
FixedSpace pool when the object is garbage collected. Note that
because objects in FixedSpace cannot move, FixedSpace cannot be
compacted and can therefore suffer from extensive fragmentation. For
this reason, FixedSpace should be used carefully. FixedSpace however
is rebuilt (of course) every time an image is brought up, so a kind of
compaction of FixedSpace can be achieved by saving a snapshot,
quitting, and then restarting the newly saved image.
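For example, a buffer that foreign code will keep using across garbage
collections can be moved to FixedSpace before its address is handed
out. This is a sketch assuming the @code{makeFixed} message; the buffer
size is arbitrary.

@example
| buffer |
buffer := ByteArray new: 4096.
"From now on the body of buffer keeps the same address until collected"
buffer makeFixed.
@end example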
Memory for OldSpace and FixedSpace is allocated using a variation of
the system allocator @code{malloc}: in fact, @gst{} uses the same
allocator for its own internal needs, for OldSpace and for FixedSpace,
but it ensures that a given memory page never hosts objects that
reside in separate spaces. New pages are mapped into the address
space as needed and devoted to OldSpace or FixedSpace segments;
similarly, when unused they may be subsequently unmapped, or they
might be left in place waiting to be reused by @code{malloc} or
by another Smalltalk data space.
Garbage that is created among old objects is taken care of by a mark &
sweep collector which, unlike the scavenger (which only reclaims
objects in NewSpace), can reclaim objects in OldSpace. Note that
as objects are allocated, they will not only use the space that was
previously occupied in the Eden by objects that have survived, but
they will also reuse the entries in the global Object Table that have
been freed by objects that the scavenger could reclaim. This quest for
free object table entries can be combined with the sweep phase of the
OldSpace collector, which can then be done incrementally, limiting the
disruptive part of OldSpace garbage collection to the mark phase.
Several runs of the mark & sweep collector can lead to fragmentation
(where objects are allocated from several pages, and then become
garbage in an order such that a bunch of objects remain in each page
and the system is not able to recycle them). For this reason, the
system periodically tries to compact OldSpace. It does so simply by
looping through every old object and copying it into a new OldSpace.
Since the OldSpace allocator does not suffer from fragmentation either
before any object is freed or after all objects are freed, at the end
of the copy all the pages in the fragmented OldSpace will have been
returned to the system (some of them might already have been used by
the compacted OldSpace), and the new, compacted OldSpace is ready to
be used as the system OldSpace. Growing the object heap (which is
done when it is found to be quite full even after a mark & sweep
collection) automatically triggers a compaction.
You can run the compactor without marking live objects. Since the
amount of garbage in OldSpace is usually quite limited, the overhead
incurred by copying potentially dead objects is small enough that the
compactor still runs considerably faster than a full garbage
collection, and can still give the application some breathing room.
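The collections described above can also be requested explicitly from
Smalltalk code through the class side of @code{ObjectMemory}. The
following is a minimal sketch; it assumes the selectors @code{scavenge}
(a NewSpace collection), @code{globalGarbageCollect} (a full mark &
sweep) and @code{compact} (a full collection that also compacts
OldSpace), and the variable name @code{survivor} is purely illustrative.

@example
| survivor |
"A reachable object, which every collection below must preserve."
survivor := Array new: 10.
"Collect garbage in NewSpace only (the scavenger)."
ObjectMemory scavenge.
"Run the mark & sweep collector over the whole heap."
ObjectMemory globalGarbageCollect.
"Collect garbage and then compact OldSpace."
ObjectMemory compact.
survivor size printNl.
@end example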
Keeping OldSpace and FixedSpace in the same heap would then make
compaction of OldSpace (whereby it is rebuilt from time to time in
order to limit fragmentation) much less effective. Also, the
@code{malloc} heap is not used for FixedSpace objects because @gst{}
needs to track writes to OldSpace and FixedSpace in order to support
efficient scavenging of young objects.
To do so, the grey page table@footnote{The denomination @dfn{grey}
comes from the lexicon of @dfn{tri-color marking}, which is an
abstraction of every possible garbage collection algorithm: in
tri-color marking, grey objects are those that are known to be
reachable or that we are not interested in reclaiming, yet have not
been scanned to mark the objects that they refer to as reachable.}
contains one entry for each page in OldSpace or FixedSpace that is
thought to contain at least one reference to an object housed in
NewSpace. Every page in OldSpace is created as grey, and is considered
grey until a scavenging pass finds out that it actually does not contain
pointers to NewSpace. Then the page is recolored black@footnote{Black
objects are those that are known to be reachable or that we are not
interested in reclaiming, and are known to have references only to
other black or grey objects (in case you're curious, the tri-color
marking algorithm goes on like this: objects not yet known to be
reachable are white, and when all objects are either black or white,
the white ones are garbage).},
and will stay black until it is written to or another object is
allocated in it (either a new fixed object, or a young object being
tenured). The grey page table is expanded and shrunk as needed by the
virtual machine.
Drawing a histogram of object sizes shows that there are only a few
sources of large objects on average (i.e., objects greater than a page
in size), but that enough of these objects are created dynamically
that they must be handled specially. Such objects should not be
allocated in NewSpace along with ordinary objects, since they would
fill up NewSpace prematurely (or might not even fit in it), thus
accelerating the scavenging rate, reducing performance and resulting
in an increase in tenured garbage. Even though this is not an optimal
solution because it effectively tenures these objects at the time they
are created, a benefit can be obtained by allocating these objects
directly in FixedSpace. The reason why FixedSpace is used is that
these objects are big enough that they don't result in
fragmentation@footnote{Remember that free pages are shared among the
three heaps, that is, OldSpace, FixedSpace and the @code{malloc}
heap. When a large object is freed, the memory that it used can be
reused by @code{malloc} or by OldSpace allocation.}; using
FixedSpace instead of OldSpace also prevents the compactor from copying
them, since doing so would not provide any benefit in terms of reduced
fragmentation.
Smalltalk activation records are allocated from another special heap,
the context pool. This is because it is often the case that they
can be deallocated in a Last-In-First-Out (stack) fashion, thereby
saving the work needed to allocate entries in the object table for them,
and quickly reusing the memory that they use. When the activation record
is accessed by Smalltalk, however, the activation record must be turned
into a first-class @code{OOP}@footnote{This is short for @dfn{Ordinary
Object Pointer}.}. Since even these objects are usually very
short-lived, the data is however not copied to the Eden: the eviction
of the object bodies from the context pool is delayed to the next
scavenging, which will also empty the context pool just like it
empties Eden. If few objects are allocated and the context pool fills
up before the Eden does, a scavenging is also triggered; this is,
however, quite rare.
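One way to observe this from Smalltalk is the @code{thisContext}
pseudo-variable, which answers the current activation record as a
first-class object. This is a minimal sketch; it assumes that
@code{parentContext} answers the caller's activation record, and to our
understanding referring to @code{thisContext} is precisely the kind of
access that forces the VM to give the record an @code{OOP}.

@example
| ctx |
"Reify the current activation record as a first-class object."
ctx := thisContext.
ctx printNl.
"Walk one step up the chain of activation records."
ctx parentContext printNl.
@end example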
Optionally, @gst{} can avoid the overhead of interpretation by
executing a given Smalltalk method only after that method has been
compiled into the underlying microprocessor's machine code. This
machine-code generation is performed automatically, and the resulting
machine code is then placed in @code{malloc}-managed memory. Once
executed, a method's machine code is left there for subsequent
execution. However, since it would require far too much memory to
permanently house the machine-code version of every Smalltalk method,
methods might be compiled more than once: when a translation goes
unused across two garbage collections (scavenges and global garbage
collections count equally), the incremental sweeper discards it, so
that it will be recomputed if and when necessary.
@node Security
@section Security in @gst{}
@node Special objects
@section Special kinds of objects
A few methods in Object support the creation of particular objects.
These include:
@itemize @bullet
@item
finalizable objects
@item
weak and ephemeron objects (i.e. objects whose contents are treated
specially during the heap scanning phase of garbage collection).
@item
read-only objects (like literals found in methods)
@item
fixed objects (guaranteed not to move across garbage collections)
@end itemize
They are:
@defmethod Object makeWeak
Marks the object so that it is considered weak in subsequent garbage
collection passes. The garbage collector will consider dead an object
which has references only inside weak objects, and will replace
references to such an ``almost-dead'' object with nils, and then
send the @code{mourn} message to the object.
@end defmethod
@defmethod Object makeEphemeron
Marks the object so that it is considered specially in subsequent
garbage collection passes. Ephemeron objects are sent the message
@code{mourn} when the first instance variable is not referenced
or is referenced @emph{only through another instance variable in the
same ephemeron}.
Ephemerons provide a very versatile base on which complex interactions
with the garbage collector can be programmed (for example, finalization
which is described below is implemented with ephemerons).
@end defmethod
@defmethod Object addToBeFinalized
Marks the object so that, as soon as it becomes unreferenced, its
@code{finalize} method is called. Before @code{finalize} is called,
the VM implicitly removes the object from the list of finalizable
ones. If necessary, the @code{finalize} method can mark the object
as finalizable again, but by default finalization will only occur
once.
Note that a finalizable object is kept in memory even when it has no
references, because tricky finalizers might ``resuscitate'' the object;
automatic marking of the object as not to be finalized has the nice side
effect that the VM can simply delay releasing the memory associated
with the object, instead of being forced to waste memory even after
finalization happens.
An object must be explicitly marked as to be finalized @emph{every time
the image is loaded}; that is, finalizability is not preserved by an
image save. This was done because in most cases finalization is used
together with operating system resources that would be stale when the
image is loaded again. For @code{CObject}s, in particular, freeing them
would cause a segmentation violation.
@end defmethod
@defmethod Object removeToBeFinalized
Removes the to-be-finalized mark from the object.
As noted above, the finalize code for the object does not have to
do this explicitly.
@end defmethod
@defmethod Object finalize
This method is called by the VM when there are no more references to
the object (or, of course, if it only has references inside weak objects).
@end defmethod
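As an illustration, the following sketch defines a hypothetical class
@code{TempResource} whose instances announce their own finalization on
the Transcript. It assumes the bracketed class-definition syntax of
@gst{} 3.x; since finalizers may run asynchronously after a collection,
the message can appear some time after @code{globalGarbageCollect}
returns.

@example
"TempResource is a hypothetical class used only for this example."
Object subclass: TempResource [
    | name |

    name: aString [
        name := aString
    ]

    finalize [
        "Called by the VM once no strong references remain."
        Transcript showCr: 'finalizing ', name
    ]
]

| res |
res := TempResource new.
res name: 'scratch buffer'.
"Ask the VM to send #finalize once res becomes unreferenced."
res addToBeFinalized.
"Drop the only strong reference and force a full collection."
res := nil.
ObjectMemory globalGarbageCollect.
@end example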
@defmethod Object isReadOnly
This method answers whether the VM will refuse to make changes to the
object when methods like @code{become:}, @code{basicAt:put:},
and possibly @code{at:put:} too (depending on the implementation of the
method), try to modify it.
Note that @gst{} won't try to intercept assignments to fixed
instance variables, nor assignments via @code{instVarAt:put:}. Many
objects (Characters, @code{nil}, @code{true}, @code{false}, method
literals) are read-only by default.
@end defmethod
@defmethod Object makeReadOnly: aBoolean
Changes the read-only or read-write status of the receiver to that
indicated by @code{aBoolean}.
@end defmethod
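The following minimal sketch marks an @code{Array} as read-only and
checks that a later @code{at:put:} is refused. It assumes, as we
understand to be the case for @code{Array}, that the refusal is
signalled as an @code{Error}; the names @code{data} and @code{refused}
are illustrative.

@example
| data refused |
data := Array new: 3.
data makeReadOnly: true.
data isReadOnly printNl.
"The store should fail; answer true if an Error was signalled."
refused := [data at: 1 put: 42. false]
    on: Error
    do: [:e | e return: true].
refused printNl.
"Make the array writable again; the store now succeeds."
data makeReadOnly: false.
data at: 1 put: 42.
(data at: 1) printNl.
@end example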
@defmethod Object basicNewInFixedSpace
Same as @code{#basicNew}, but the object won't move across garbage
collections.
@end defmethod
@defmethod Object basicNewInFixedSpace:
Same as @code{#basicNew:}, but the object won't move across garbage
collections.
@end defmethod
@defmethod Object makeFixed
Ensure that the receiver won't move across garbage collections.
This can be used either if you decide after its creation that an
object must be fixed, or if a class does not support using @code{#new}
or @code{#new:} to create an object.
@end defmethod
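For example, a buffer meant to be handed to C code can be created
directly in FixedSpace, or an existing object can be pinned afterwards.
This is a minimal sketch; it assumes that the class-side
@code{basicNewInFixedSpace:} documented above is understood by
@code{ByteArray}, and the sizes chosen are arbitrary.

@example
| buffer label |
"Allocate a 256-byte buffer whose body never moves."
buffer := ByteArray basicNewInFixedSpace: 256.
buffer size printNl.
"Pin an object after it has been created with the usual #new:."
label := String new: 16.
label makeFixed.
"Garbage collections, compaction included, leave both bodies in place."
ObjectMemory compact.
@end example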
Note that, although particular applications will indeed have a need for
fixed, read-only or finalizable objects, the @code{#makeWeak} primitive
is seldom needed and weak objects are normally used only indirectly,
through the so-called @dfn{weak collections}. These are easier to use
because they provide additional functionality (for example, @code{WeakArray}
is able to determine whether an item has been garbage collected, and
@code{WeakSet} implements hash table functionality); they are:
@itemize @bullet
@bulletize @code{WeakArray}
@bulletize @code{WeakSet}
@bulletize @code{WeakKeyDictionary}
@bulletize @code{WeakValueLookupTable}
@bulletize @code{WeakIdentitySet}
@bulletize @code{WeakKeyIdentityDictionary}
@bulletize @code{WeakValueIdentityDictionary}
@end itemize
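A minimal sketch of @code{WeakArray} in use: the first slot holds an
object that nothing else references, while the second holds the
@code{Smalltalk} system dictionary, which is always reachable. After a
full collection the first slot is expected to read as @code{nil},
although a stray temporary reference (for instance one kept by the
interactive read-eval-print loop) may keep the object alive longer.

@example
| cache |
cache := WeakArray new: 2.
"Only the weak array refers to this object."
cache at: 1 put: Object new.
"Smalltalk is strongly referenced elsewhere, so it survives."
cache at: 2 put: Smalltalk.
ObjectMemory globalGarbageCollect.
(cache at: 1) printNl.
(cache at: 2) printNl.
@end example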
Versions of @gst{} preceding 2.1 included a @code{WeakKeyLookupTable} class
which has been replaced by @code{WeakKeyDictionary}; the usage is completely
identical, but the implementation was changed to use a more efficient
approach based on ephemeron objects.
@node Packages
@chapter Packages
@gst{} includes a packaging system which allows one to file in components
(often called @dfn{goodies} in Smalltalk lore) without worrying about whether
they need other goodies to be loaded first.
The packaging system is implemented by a Smalltalk class,
@code{PackageLoader}, which looks for information about packages in
various places:
@itemize @bullet
@item the kernel directory's parent directory; this is where
an installed @file{packages.xml} resides, in a system-wide data
directory such as @file{/usr/local/share/smalltalk};
@item the above directory's @file{site-packages} subdirectory, for
example @file{/usr/local/share/smalltalk/site-packages};
@item in the file @file{.st/packages.xml}, hosting per-user packages;
@item finally, there can be a @file{packages.xml} in the same directory
as the current image.
@end itemize
Each of these directories can contain package descriptions in an
XML file named (guess what) @file{packages.xml}, as well as standalone
packages in files named @file{*.star} (short for @cite{Smalltalk
archive}). Later in this section you will find information about
@command{gst-package}, a program that helps you create @file{.star} files.
There are two ways to load something using the packaging system. The
first way is to use the PackageLoader's @code{fileInPackage:} and
@code{fileInPackages:} methods. For example:
@example
PackageLoader fileInPackages: #('DBD-MySQL' 'DBD-SQLite').
PackageLoader fileInPackage: 'Sockets'.
@end example
The second way is to use the @file{gst-load} script which is installed
together with the virtual machine. For example, you can do:
@t{@ @ @ @ gst-load DBD-MySQL DBD-SQLite DBI}
and @gst{} will automatically file in:
@itemize @bullet
@bulletize DBI, loaded first because it is needed by the other two packages
@bulletize Sockets and Digest, not specified, but needed by DBD-MySQL
@bulletize DBD-MySQL
@bulletize DBD-SQLite
@end itemize
Notice how DBI is loaded first, even though it was listed last on the
command line.
Then it will save the Smalltalk image, and finally exit.
@file{gst-load} supports several options:
@table @option
@item -I
@itemx --image-file
Load the packages inside the given image.
@item -i
@itemx --rebuild-image
Build an image from scratch and load the packages into it. Useful
when the image specified with @option{-I} does not exist yet.
@item -q
@itemx --quiet
Hide the script's output.
@item -v
@itemx --verbose
Show which files are loaded, one by one.
@item -f
@itemx --force
If a package given on the command-line is already present, reload it.
This does not apply to automatically selected prerequisites.
@item -t
@itemx --test
Run the package testsuite before installing, and exit with a failure
if the tests fail. Currently, the testsuites are placed in the image
together with the package, but this may change in future versions.
@item -n
@itemx --dry-run
Do not save the image after loading.
@item --start[=ARG]
Start the services identified by the package. If an argument is
given, only one package can be specified on the command-line. If
at least one package specifies a startup script, @code{gst-load}
won't exit.
@end table
To provide support for this system, you have to distribute with your @gst{}
goodies a small file (usually called @file{package.xml}) which looks like
this:
@example
<!-- @i{@r{The @code{prereq} tag identifies packages that
must be loaded before this one.}} -->
<!-- @i{@r{The @code{module} tag loads a dynamic shared object
and calls the @code{gst_initModule} function in it. Modules
can register functions so that Smalltalk code can call them,
and can interact with or manipulate Smalltalk objects.}} -->
<!-- @i{@r{A separate subpackage can be defined for testing purposes.
The @code{SUnit} package is implicitly made a prerequisite of the
testing subpackage, and the default value of @code{namespace}
is the one given for the outer package.}} -->
<!-- @i{@r{Specifies a testing script that @file{gst-sunit} (@pxref{SUnit})
will run in order to test the package. If this is specified outside
the testing subpackage, the package should list @code{SUnit} among
the prerequisites.}} -->
<!-- @i{@r{The @code{filein} tag identifies files that
compose this package and that should be loaded in the
image in this order.}} -->
<!-- @i{@r{The @code{file} tag identifies extra files that
compose this package's distribution.}} -->
@end example
Other tags exist:
@table @code
@item url
Specifies a URL at which a repository for the package can be found.
The repository, when checked out, should contain a @file{package.xml}
file at its root. The contents of this tag are not used for local
packages; they are used when the @option{--download} option is passed to
@command{gst-package}.
@item library
Loads a dynamic shared object and registers the functions in it
so that they can all be called from Smalltalk code. The @code{GTK}
package registers the GTK+ library in this way, so that the
bindings can use them.
@item callout
Loads the package only if the C function whose name is
within the tag is available to be called from Smalltalk code.
@item start
Specifies a Smalltalk script that @file{gst-load} and @file{gst-remote}
will execute in order to start the execution of the service implemented
in the package. Before executing the script, @code{%1} is replaced
with either @code{nil} or a String literal.
@item stop
Specifies a Smalltalk script that @file{gst-remote}
will execute in order to shut down the service implemented
in the package. Before executing the script, @code{%1} is replaced
with either @code{nil} or a String literal.
@item dir
Should include a @code{name} attribute. The @code{file}, @code{filein}
and @code{built-file} tags that are nested within a @code{dir} tag have
the directory specified by the attribute prepended to their names.
@item test
Specifies a subpackage that is only loaded by @file{gst-sunit} in order
to test the package. The subpackage may include arbitrary tags (including
@code{file}, @code{filein} and @code{sunit}) but not @code{name}.
@item provides
In some cases, a single functionality can be provided by multiple
modules. For example, @gst{} includes two browsers but only one
should be loaded at any time. To this end, a dummy package @code{Browser}
is created pointing to the default browser (@code{VisualGST}), but
both browsers use @code{provides} so that if the old BLOX browser
is in the image, loading @code{Browser} will have no effect.
@end table
To install your package, you only have to type:
@example
gst-package path/to/package.xml
@end example
@command{gst-package} is a Smalltalk script which will create
a @file{.star} archive in the current image directory, with the
files specified in the @code{file}, @code{filein} and
@code{built-file} tags. By default the package is
placed in the system-wide package directory; you can use the option
@option{--target-directory} to create the @file{.star} file elsewhere.
Instead of a local @file{package.xml} file, you can give:
@itemize @bullet
@item
a local @file{.star} file or a @code{URL} to such a file. The file
will be downloaded if necessary, and copied to the target directory;
@item
a URL to a @file{package.xml} file. The @code{url} tag in the file
will be used to find a source code repository (@command{git} or
@command{svn}) or as a redirect to another @file{package.xml} file.
@end itemize
There is also a short form for specifying @file{package.xml} file on
@gst{}'s web site, so that the following two commands are equivalent:
@example
gst-package --download Iliad
@end example
When downloading remote @file{package.xml} files, @command{gst-package}
also performs a special check to detect multiple packages in the same
repository. If the following conditions are met:
@itemize @bullet
@item
a package named @code{@var{package}} has a prerequisite
@item
there is a toplevel subdirectory @var{subpackage} in the repository;
@item
the subdirectory has a @file{package.xml} file in it
@end itemize
then the @file{@var{subpackage}/package.xml} will be installed as well.
@command{gst-package} does not check if the file actually defines a
package with the correct name, but this may change in future versions.
Alternatively, @code{gst-package} can be used to create a skeleton
@gnu{} style source tree. This includes a @file{} that will
find the installation path of @gst{}, and a @file{}
to support all the standard Makefile targets (including @command{make
install} and @command{make dist}). To do so, go to the directory that
is to become the top of the source tree and type:
@example
gst-package --prepare path1/package.xml path2/package.xml
@end example
In this case the generated configure script and Makefile will use more
features of @command{gst-package}, which are yet to be documented.
The @gst{} makefile similarly uses @command{gst-package} to install
packages and to prepare the distribution tarballs.
The rest of this chapter discusses some of the packages provided with @gst{}.
@menu
* GTK and VisualGST: GUI.
* Parser, STInST, Compiler: Smalltalk-in-Smalltalk.
* DBI: Database.
* I18N: Locales.
* Seaside: Seaside.
* Swazoo: Swazoo.
* SUnit: SUnit.
* Sockets, WebServer, NetClients: Network support.
* XML, XPath, XSL: XML.
* Other packages: Other packages.
@end menu
@node GUI
@section GTK and VisualGST
@gst{} comes with GTK bindings and with a browser based on it. The system
can be started as @command{gst-browser} and will allow the programmer to
view the source code for existing classes, to modify existing classes and
methods, to get detailed information about the classes and methods, and to
evaluate code within the browser. In addition, simple debugging and unit
testing tools are provided. An Inspector window allows the programmer
to graphically inspect and modify the representation of an object and
a walkback inspector was designed which will display a backtrace when
the program encounters an error. SUnit tests (@pxref{SUnit}) can be
run from the browser in order to easily support test driven development.
The Transcript global object is redirected to print to the
transcript window instead of printing to stdout, and the transcript
window as well as the workspaces, unlike the console read-eval-print
loop, support variables that live across multiple evaluations:
@example
a := 2 "Do-it"
a + 2 "Print-it: 4 will be shown"
@end example
To start the browser you can simply type:
@example
gst-browser
@end example
This will load any requested packages, then, if all goes well, a
@emph{launcher} window combining all the basic tools
will appear on your display.
@node Smalltalk-in-Smalltalk
@section The Smalltalk-in-Smalltalk library
The Smalltalk-in-Smalltalk library is a set of classes for looking at
Smalltalk code, constructing models of Smalltalk classes that can later
be created for real, analyzing and performing changes to the image,
finding smelly code and automatically doing repetitive changes.
This package greatly enhances the reflective capabilities of Smalltalk.
Being quite big (20000 source code lines), this package is split into
three different packages: @code{Parser} loads the parser only,
@code{STInST} loads various other tools (which compose the
``Refactoring Browser'' package by John Brant and Don Roberts and
will be the foundation for @gst{}'s next generation browser),
@code{STInSTTest} performs comprehensive unit tests@footnote{
The tests can take @strong{hours} to complete!}
(@pxref{SUnit}).
@ignore
Porting of the @code{STInST} package will be
completed in @gst{} 2.2.
@end ignore
A fundamental part of the system is the recursive-descent parser which
creates parse nodes in the form of instances of subclasses of
@code{RBProgramNode}.
The parser's extreme flexibility can be exploited in three ways, all of
which are demonstrated by source code available in the distribution:
@itemize @bullet
@item
First, actions are not hard-coded in the parser itself: the parser
creates a parse tree, then hands it to methods in @code{RBParser} that
can be overridden in different @code{RBParser} subclasses. This is done
by the compiler itself, in which a subclass of @code{RBParser} (class
@code{STFileInParser}) hands the parse trees to the @code{STCompiler}
class.
@item
Second, an implementation of the ``visitor'' pattern is provided to help
in dealing with parse trees created along the way; this approach is
demonstrated by the Smalltalk code pretty-printer in class
@code{RBFormatter}, by the syntax highlighting engine included
with the browser, and by the compiler.
@ignore
@item
The parser is able to perform complex tree searches and rewrites,
through the ParseTreeSearcher and ParseTreeRewriter classes.
This mechanism is exploited by most of the tools loaded by the
@code{STInST} package.
@end ignore
@end itemize
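To get a feel for the parse trees, the following sketch loads the
@code{Parser} package and parses a single expression. It assumes that
the parser classes live in the @code{STInST} namespace and that
@code{RBParser} answers a parse node from the class-side
@code{parseExpression:}; since unary messages bind tighter than binary
ones, the root should be a message node whose selector is @code{#+}.

@example
PackageLoader fileInPackage: 'Parser'.

| tree |
"Parse an expression without compiling or executing it."
tree := STInST.RBParser parseExpression: '3 + 4 factorial'.
"Inspect the root node of the resulting tree."
tree class printNl.
tree selector printNl.
tree receiver printNl.
@end example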
In addition, two applications were created on top of this library
which are specific to @gst{}. The first is a compiler for Smalltalk
methods written in Smalltalk itself, whose source code provides good
insights into the @gst{} virtual machine.
The second is the automatic documentation extractor. @code{gst-doc} is able
to create documentation even if the library cannot be loaded (for example,
if loading it requires a running X server). To do so it uses
@code{STClassLoader} from the @file{Parser} package to load and interpret
Smalltalk source code, creating objects for the classes and methods being
read in; then, polymorphism allows one to treat these exactly like usual
classes.
@node Database
@section Database connectivity
@gst{} includes support for connecting to databases. Currently this
support is limited to retrieving result sets from @acronym{SQL} selection
queries and executing @acronym{SQL} data manipulation queries; in the
future however a full object model will be available that hides the
usage of @acronym{SQL}.
Classes that are independent of the database management system that is
in use reside in package @code{DBI}, while the drivers proper reside
in separate packages which have @code{DBI} as a prerequisite; currently,
drivers are supplied for @emph{MySQL} and @emph{PostgreSQL}, in packages
@code{DBD-MySQL} and @code{DBD-PostgreSQL} respectively.
Using the library is fairly simple. To execute a query you need to
create a connection to the database, create a statement on the connection,
and execute your query. For example, let's say I want to connect to the
@file{test} database on the localhost. My user name is @code{doe} and
my password is @code{mypass}.
@example
| connection statement result |
connection := DBI.Connection
    connect: 'dbi:MySQL:dbname=test;hostname=localhost'
    user: 'doe'
    password: 'mypass'.
@end example
You can see that the @acronym{DBMS}-specific classes live in a sub-namespace
of @code{DBI}, while @acronym{DBMS}-independent classes live in @code{DBI}.
Here is how I execute a query.
@example
result := connection execute: 'insert into aTable (aField) values (123)'.
@end example
The result that is returned is a @code{ResultSet}. For write queries
the object answers the number of rows affected. For read queries (such
as selection queries) the result set supports standard stream protocol
(@code{next}, @code{atEnd} to read rows off the result stream) and
can also supply a collection of column information (instances of
@code{ColumnInfo}) describing the type, size, and other
characteristics of the returned columns.
A common usage of a ResultSet would be:
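@example
| resultSet values |
resultSet := connection execute: 'select columnName from aTable'.
values := OrderedCollection new.
[resultSet atEnd] whileFalse: [values add: (resultSet next at: 'columnName')].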
@end example
@node Locales
@section Internationalization and localization support
Different countries and cultures have varying conventions for how to
communicate. These conventions range from very simple ones, such as the
format for representing dates and times, to very complex ones, such as
the language spoken. Provided the programs are written to obey the
choice of conventions, they will follow the conventions preferred by the
user. @gst{} provides two packages to help you do so.
The @code{I18N} package covers both @dfn{internationalization} and
@dfn{multilingualization}; the lighter-weight @code{Iconv} package
covers only the latter, as it is a prerequisite for correct
@dfn{Multilingualizing} software means programming it to be able to
support languages from every part of the world. In particular, it
includes understanding multi-byte character sets (such as UTF-8)
and Unicode characters whose @dfn{code point} (the equivalent of the
ASCII value) is above 127. To this end, @gst{} provides the
@code{UnicodeString} class that stores its data as 32-bit Unicode
values. In addition, @code{Character} provides support for
all of the more than one million code points available in Unicode.
Loading the @code{I18N} package improves this support through
the @code{EncodedStream} class@footnote{All
the classes mentioned in this section reside in the
@code{I18N} namespace.}, which interprets and transcodes
non-ASCII Unicode characters. This support is mostly transparent,
because the base classes @code{Character}, @code{UnicodeCharacter}
and @code{UnicodeString} are enhanced to use it. Sending @code{asString}
or @code{printString} to an instance of @code{Character} and
@code{UnicodeString} will convert Unicode characters so that they
are printed correctly in the current locale. For example,
@samp{$<279> printNl} will print a small Latin letter @samp{e} with
a dot above, when the @code{I18N} package is loaded.
Dually, you can convert @code{String} or @code{ByteArray} objects to
Unicode with a single method call. If the current locale's encoding is
UTF-8, @samp{#[196 151] asUnicodeString} will return a Unicode string
with the same character as above, the small Latin letter @samp{e} with
a dot above.
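Both directions can be tried from a short script. This is a minimal
sketch, assuming the @code{I18N} package is available and, for the
second half only, that the current locale uses UTF-8; the variable
names are illustrative.

@example
PackageLoader fileInPackage: 'I18N'.

| edot fromBytes |
"Create the character from its Unicode code point."
edot := Character codePoint: 279.
edot codePoint printNl.
edot printNl.
"Decode UTF-8 bytes according to the current locale."
fromBytes := #[196 151] asUnicodeString.
fromBytes size printNl.
(fromBytes at: 1) codePoint printNl.
@end example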
The implementation of multilingualization support is not yet
complete. For example, methods such as @code{asLowercase},
@code{asUppercase}, @code{isLetter} do not yet recognize Unicode
characters.
You need to exercise some care, or your program will be buggy when
Unicode characters are used. In particular, Characters must
@strong{not} be compared with @code{==}@footnote{Character equality
with @code{=} will be as fast as with @code{==}.} and should
be printed on a Stream with @code{display:} rather than
@code{nextPut:}.
Also, Characters need to be created with
the class method @code{codePoint:} if you are referring to their
Unicode value; @code{codePoint:} is also the only method to create
characters that is accepted by the ANSI Standard for Smalltalk.
The method @code{value:}, instead, should be used if you are referring
to a byte in a particular encoding. This subtle difference means
that, for example, the last two of the following examples will fail:
@example
"Correct. Use #value: with Strings, #codePoint: with UnicodeString."
String with: (Character value: 65)
String with: (Character value: 128)
UnicodeString with: (Character codePoint: 65)
UnicodeString with: (Character codePoint: 128)
"Correct. Only works for characters in the 0-127 range, which may
be considered as defensive programming."
String with: (Character codePoint: 65)
"Dubious, and only works for characters in the 0-127 range. With
UnicodeString, probably you always want #codePoint:."
UnicodeString with: (Character value: 65)
"Fails, we try to use a high character in a String"
String with: (Character codePoint: 128)
"Fails, we try to use an encoding in a Unicode string"
UnicodeString with: (Character value: 128)
@end example
@dfn{Internationalizing} software, instead, means programming it to be able to
adapt to the user's favorite conventions. These conventions can get
pretty complex; for example, the user might specify the locale
`espana-castellano' for most purposes, but specify the locale
`usa-english' for currency formatting: this might make sense if the user
is a Spanish-speaking American, working in Spanish, but representing
monetary amounts in US dollars. You can see that this system is simple
but, at the same time, very complete. This manual, however, is not the
right place for a thorough discussion of how a user would set up their
system for these conventions; for more information, refer to your
operating system's manual or to the @gnu{} C library's manual.
@gst{} inherits from @sc{iso} C the concept of a @dfn{locale}, that is, a
collection of conventions, one convention for each purpose, and maps each of
these purposes to a Smalltalk class defined by the @code{I18N} package, and
these classes form a small hierarchy with class @code{Locale} as its root:
@itemize @bullet
@ignore
@item
@code{LcCollate} defines the collating sequence for the local language and
character set.
@end ignore
@item
@code{LcNumeric} formats numbers; @code{LcMonetary} and @code{LcMonetaryISO}
format currency amounts.
@item
@code{LcTime} formats dates and times.
@item
@code{LcMessages} translates your program's output. Of course, the
package can't automatically translate your program's output messages
into other languages; the only way you can support output in the user's
favorite language is to translate these messages by hand. The package
does, though, provide methods to easily handle translations into
multiple languages.
@end itemize
Basic usage of the @code{I18N} package involves a single selector, the
question mark (@code{?}), which is a rarely used yet valid character for
a Smalltalk binary message. The meaning of the question mark selector
is ``How do you say @dots{} under your convention?''. You can send
@code{?} to either a specific instance of a subclass of @code{Locale},
or to the class itself; in the latter case, rules for the default locale
(which is specified via environment variables) apply. You might say,
for example, @code{LcTime ? Date today} or
@code{germanMonetaryLocale ? account balance}. This syntax may be
confusing at first, but turns out to be convenient because of its
consistency and overall simplicity.
Here is how @code{?} works for different classes:
@ignore
@defmethod LcCollate ? aString
Answer an instance of LcCollationKey; code like
@code{LcCollate ? string1 < string2} will compare
the two strings under the rules of the default locale.
@end defmethod
@end ignore
@defmethod LcTime ? aString
Format a date, a time or a timestamp (@code{DateTime} object).
@end defmethod
@defmethod LcNumeric ? aString
Format a number.
@end defmethod
@defmethod LcMonetary ? aString
Format a monetary value together with its currency symbol.
@end defmethod
@defmethod LcMonetaryISO ? aString
Format a monetary value together with its @sc{iso} currency symbol.
@end defmethod
@defmethod LcMessages ? aString
Answer an @code{LcMessagesDomain} that retrieves translations
from the specified file.
@end defmethod
@defmethod LcMessagesDomain ? aString
Retrieve the translation of the given string.@footnote{The @code{?} method
does not apply to the @code{LcMessagesDomain} class itself, but only to its
instances. This is because @code{LcMessagesDomain} is not a subclass of
@code{Locale}.}
@end defmethod
These two packages provide much more functionality, including more
advanced formatting options, support for Unicode, and conversion to and
from several character sets. For more information, refer to
@ref{I18N, , Multilingual and international support with Iconv and I18N,
gst-libs, the @gst{} Library Reference}.
As an aside, the representation of locales that the package uses is
exactly the same as the C library, which has many advantages: the burden
of maintaining locale data is removed from @gst{}'s maintainers; the need
to keep two copies of the same data is removed from @gst{}'s users;
and finally, uniformity of the conventions assumed by different
internationalized programs is guaranteed to the end user.
In addition, the representation of translated strings is the standard
@sc{mo} file format adopted by the @gnu{} @code{gettext} library.
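
Since the package reuses the C library's locale model, the categories
above roughly mirror the standard C categories (@code{LC_COLLATE},
@code{LC_NUMERIC}, @code{LC_MONETARY}, @code{LC_TIME} and
@code{LC_MESSAGES}). The following C fragment is only a sketch of that
underlying model, not code from the @code{I18N} package, and all function
names in it are hypothetical. It selects one locale for every category and
then overrides only the monetary one, as in the Spanish-speaking American
example earlier in this section.

@example
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Use GENERAL for every category, then override LC_MONETARY only.
   Returns 0 on success, -1 if either locale is not installed.  */
int
set_locales (const char *general, const char *monetary)
@{
  if (setlocale (LC_ALL, general) == NULL)
    return -1;
  if (setlocale (LC_MONETARY, monetary) == NULL)
    return -1;
  return 0;
@}

/* LC_NUMERIC: decimal separator of the current locale.  */
const char *
current_decimal_point (void)
@{
  return localeconv ()->decimal_point;
@}

/* LC_MONETARY: ISO 4217 currency symbol (empty in the "C" locale).  */
const char *
iso_currency_symbol (void)
@{
  return localeconv ()->int_curr_symbol;
@}

/* LC_TIME: format a date with the locale's preferred date format.
   Returns the number of characters written, 0 if BUF is too small.  */
size_t
format_date (char *buf, size_t size, int year, int month, int day)
@{
  struct tm tm = @{ 0 @};

  tm.tm_year = year - 1900;
  tm.tm_mon = month - 1;
  tm.tm_mday = day;
  return strftime (buf, size, "%x", &tm);
@}
@end example

Calling, say, @code{set_locales ("es_ES.UTF-8", "en_US.UTF-8")} would
reproduce that scenario on systems where both locales are installed; only
the @code{"C"} and @code{"POSIX"} locale names are guaranteed to exist
everywhere.
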
@node Seaside
@section The Seaside web framework
Seaside is a framework to build highly interactive web applications
quickly, reusably and maintainably. Features of Seaside include
callback-based request handling, hierarchical (component-based)
page design, and modal session management to easily implement
complex workflows.
A simple Seaside component looks like this:
@example
Seaside.WAComponent subclass: MyCounter [
    | count |

    MyCounter class >> canBeRoot [ ^true ]

    initialize [
        super initialize.
        count := 0.
    ]

    states [ ^@{ self @} ]

    renderContentOn: html [
        html heading: count.
        html anchor callback: [ count := count + 1 ]; with: '++'.
        html space.
        html anchor callback: [ count := count - 1 ]; with: '--'.
    ]
]

MyCounter registerAsApplication: 'mycounter'
@end example
Most of the time, you will run Seaside in a background virtual machine.
First of all, you should load the Seaside packages into a new image like
this:
@example
$ gst-load -iI Seaside Seaside-Development Seaside-Examples
@end example
Then, you can start Seaside with either of these commands:
@example
$ gst-load -I --start Seaside
$ gst-remote -I --daemon --start=Seaside
@end example
which will start serving pages at @url{http://localhost:8080/seaside}.
The former starts the server in the foreground, while the latter runs a
background virtual machine that you can control using further invocations of
@command{gst-remote}. For example, you can stop serving Seaside
pages, and bring down the server, respectively with these commands:
@example
$ gst-remote --stop=Seaside
$ gst-remote --kill
@end example
@node Swazoo
@section The Swazoo web server
Swazoo (Smalltalk Web Application Zoo) is a free Smalltalk HTTP server
supporting both static web serving and a fully-featured web request
resolution framework.
The server can be started using
@example
$ gst-load --start@i{[=@var{ARG}]} Swazoo
@end example
or loaded into a background @gst{} virtual machine with
@example
$ gst-remote --start=Swazoo@i{[:@var{ARG}]}
@end example
Usually, the first time you start Swazoo @var{ARG} is @code{swazoodemo}
(which starts a simple ``Hello, World!'' servlet) or a path to a
configuration file like this one:
@example
<Site name: 'hello'; port: 8080>
  <CompositeResource uriPattern: ''/''>
    <HelloWorldResource uriPattern: ''hello.html''>
@end example
After this initial step, @var{ARG} can take the following meanings:
@itemize @bullet
@item if omitted altogether, all the sites registered on the server
are started;
@item if a number, all the sites registered on the server
on that port are started;
@item if a configuration file name, the server configuration is
@emph{replaced} with the one loaded from that file;
@item if any other string, the site named @var{ARG} is started.
@end itemize
In addition, a background server can be stopped using
@example
$ gst-remote --stop=Swazoo@i{[:@var{ARG}]}
@end example
where @var{ARG} can have the same meanings, except for being a
configuration file.
Finally, package @code{WebServer} implements an older web server
engine which is now superseded by Swazoo. It is based on the @sc{gpl}'ed
WikiWorks project. Apart from porting to @gst{}, a number of changes were
made to the code, including refactoring of classes, better aesthetics,
authentication support, virtual hosting, and @sc{http} 1.1 compliance.
@node SUnit
@section The SUnit testing package
@code{SUnit} is a framework to write and perform test cases in Smalltalk,
originally written by the father of Extreme Programming@footnote{Extreme
Programming is a software engineering technique that focuses on
team work (to the point that a programmer looks in real-time at
what another one is typing), frequent testing of the program,
and incremental design.},
Kent Beck. @code{SUnit} allows one to write the tests and check
results in Smalltalk; while this approach has the disadvantage that
testers need to be able to write simple Smalltalk programs, the
resulting tests are very stable.
What follows is a description of the philosophy of @code{SUnit} and
a description of its usage, excerpted from Kent Beck's paper in which
he describes @code{SUnit}.
@subsection Where should you start?
Testing is one of those impossible tasks. You'd like to be absolutely
complete, so you can be sure the software will work. On the other hand,
the number of possible states of your program is so large that you can't
possibly test all combinations.
If you start with a vague idea of what you'll be testing, you'll never
get started. Far better to @emph{start with a single configuration whose
behavior is predictable}. As you get more experience with your software,
you will be able to add to the list of configurations.
Such a configuration is called a @dfn{fixture}. Two example fixtures
for testing Floats can be @code{1.0} and @code{2.0}; two fixtures for
testing Arrays can be @code{#()} and @code{#(1 2 3)}.
By choosing a fixture you are saying what you will and won't test for. A
complete set of tests for a community of objects will have many
fixtures, each of which will be tested many ways.
To design a test fixture you have to:
@itemize @bullet
@bulletize{Subclass TestCase}
@bulletize{Add an instance variable for each known object in the fixture}
@bulletize{Override setUp to initialize the variables}
@end itemize
@subsection How do you represent a single unit of testing?
You can predict the results of sending a message to a fixture. You need
to represent such a predictable situation somehow. The simplest way to
represent this is interactively. You open an Inspector on your fixture
and you start sending it messages. There are two drawbacks to this
method. First, you keep sending messages to the same fixture. If a test
happens to mess that object up, all subsequent tests will fail, even
though the code may be correct.
More importantly, though, you can't easily communicate interactive tests
to others. If you give someone else your objects, the only way they have
of testing them is to have you come and inspect them.
By representing each predictable situation as an object, each with its
own fixture, no two tests will ever interfere. Also, you can easily give
tests to others to run. @emph{Represent a predictable reaction of a
fixture as a method.} Add a method to your TestCase subclass, and stimulate
the fixture in the method.
@subsection How do you test for expected results?
If you're testing interactively, you check for expected results
directly, by printing and inspecting your objects. Since tests are in
their own objects, you need a way to programmatically look for
problems. One way to accomplish this is to use the standard error
handling mechanism (@code{#error:}) with testing logic to signal errors:
@example
2 + 3 = 5 ifFalse: [self error: 'Wrong answer']
@end example
When you're testing, you'd like to distinguish between errors you are
checking for, like getting six as the sum of two and three, and errors
you didn't anticipate, like subscripts being out of bounds or messages
not being understood.
There's not a lot you can do about unanticipated errors (if you did
something about them, they wouldn't be unanticipated any more, would
they?) When a catastrophic error occurs, the framework stops running the
test case, records the error, and runs the next test case. Since each
test case has its own fixture, the error in the previous case will not
affect the next.
The testing framework makes checking for expected values simple by
providing a method, @code{#should:}, that takes a Block as an argument.
If the Block evaluates to true, everything is fine. Otherwise, the test
case stops running, the failure is recorded, and the next test case
runs.
So, you have to @emph{turn checks into a Block evaluating to a Boolean,
and send the Block as the parameter to @code{#should:}}.
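
The isolation argument above can be made concrete with a short C sketch.
It is not how @code{SUnit} is implemented; the fixture type and all
function names are hypothetical. Each test gets a freshly set-up fixture,
a failed check is recorded, and the run continues with the next test.
@code{test_bogus} would pass only if it could see the element left behind
by @code{test_add}; because the fixture is rebuilt for every test, it
fails instead.

@example
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixture: a tiny set of ints standing in for the
   empty Set of the SetTestCase example.  */
struct fixture
@{
  int items[8];
  size_t count;
@};

static void
set_up (struct fixture *f)
@{
  f->count = 0;
@}

static void
set_add (struct fixture *f, int x)
@{
  if (f->count < 8)
    f->items[f->count++] = x;
@}

static bool
set_includes (const struct fixture *f, int x)
@{
  for (size_t i = 0; i < f->count; i++)
    if (f->items[i] == x)
      return true;
  return false;
@}

/* A test answers true when its check (the "should:" block) holds.  */
typedef bool (*test_fn) (struct fixture *);

static bool
test_add (struct fixture *empty)
@{
  set_add (empty, 5);
  return set_includes (empty, 5);
@}

/* Deliberately relies on state from another test.  */
static bool
test_bogus (struct fixture *empty)
@{
  return set_includes (empty, 5);
@}

/* Run every test against a fresh fixture, record failures and keep
   going.  Answers the number of failed tests.  */
int
run_tests (const test_fn *tests, size_t n)
@{
  int failures = 0;

  for (size_t i = 0; i < n; i++)
    @{
      struct fixture f;
      set_up (&f);
      if (!tests[i] (&f))
        failures++;
    @}
  return failures;
@}
@end example

Running both tests with @code{run_tests} reports exactly one failure.
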
In the example, after stimulating the fixture by adding an object to an
empty Set, we want to check and make sure it's in there:
@example
empty add: 5.
self should: [empty includes: 5]
@end example
There is a variant on
@code{TestCase>>#should:}. @code{TestCase>>#shouldnt:} causes the test
case to fail if the Block argument evaluates to true. It is there so you
don't have to use @code{(...) not}.
Once you have a test case this far, you can run it. Create an instance
of your TestCase subclass, giving it the selector of the testing
method. Send @code{run} to the resulting object:
@example
(SetTestCase selector: #testAdd) run
@end example
If it runs to completion, the test worked. If you get a walkback,
something went wrong.
@subsection How do you collect and run many different test cases?
As soon as you have two test cases running, you'll want to run them both
one after the other without having to execute two do-its. You could
just string together a bunch of expressions to create and run test
cases. However, when you then wanted to run ``this bunch of cases and
that bunch of cases'' you'd be stuck.
The testing framework provides an object to represent @dfn{a bunch of
tests}, @code{TestSuite}. A @code{TestSuite} runs a collection of test
cases and reports their results all at once. Taking advantage of
polymorphism, @code{TestSuites} can also contain other
@code{TestSuites}, so you can put Joe's tests and Tammy's tests together
by creating a higher level suite. @emph{Combine test cases into a test
suite.}
@example
(TestSuite named: 'Money')
    add: (MoneyTestCase selector: #testAdd);
    add: (MoneyTestCase selector: #testSubtract);
    run
@end example
The result of sending @code{#run} to a @code{TestSuite} is a
@code{TestResult} object. It records all the test cases that caused
failures or errors, and the time at which the suite was run.
All of these objects are suitable for being stored in the image and
retrieved. You can easily store a suite, then bring it in and run it,
comparing results with previous runs.
@subsection Running testsuites from the command line
@gst{} includes a Smalltalk script to simplify running SUnit test suites.
It is called @command{gst-sunit}. The command-line to @command{gst-sunit}
specifies the packages, files and classes to test:
@table @option
@item -I
@itemx --image-file
Run tests inside the given image.
@item -q
@itemx --quiet
Hide the program's output. The results are still communicated with the
program's exit code.
@item -v
@itemx --verbose
Be more verbose, in particular this will cause @command{gst-sunit} to write
which test is currently being executed.
@item -f @var{FILE}
@itemx --file=@var{FILE}
Load @var{FILE} before running the required test cases.
@item -p @var{PACKAGE}
@itemx --package=@var{PACKAGE}
Load @var{PACKAGE} and its dependencies, and add @var{PACKAGE}'s tests to
the set of test cases to run.
@item @var{CLASS}
@itemx @var{CLASS}*
Add @var{CLASS} to the set of test cases to run. An asterisk after the class
name adds all the classes in @var{CLASS}'s hierarchy. Within each class,
every method whose selector starts with @code{test} constitutes a separate
test case.
@item @var{VAR}=@var{VALUE}
Associate variable @var{VAR} with a value. Variables allow customization
of the testing environment. For example, the username with which to access
a database can be specified with variables. From within a test, variables
are accessible with code like this:
@example
TestSuitesScripter variableAt: 'mysqluser' ifAbsent: [ 'root' ]
@end example
Note that a @code{#variableAt:} variant does @emph{not} exist, because
the testsuite should pick default values in case the variables are
not specified by the user.
@end table
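
The lookup-with-default behaviour of @code{variableAt:ifAbsent:} can be
illustrated with the following C sketch, which scans
@samp{@var{VAR}=@var{VALUE}} arguments. It is not part of
@command{gst-sunit}, which is a Smalltalk script, and the function name
is hypothetical.

@example
#include <stddef.h>
#include <string.h>

/* Answer the value of NAME among "VAR=VALUE" arguments, or
   DEFAULT_VALUE when it was not given on the command line.  */
const char *
variable_at_if_absent (int argc, char **argv, const char *name,
                       const char *default_value)
@{
  size_t len = strlen (name);

  for (int i = 0; i < argc; i++)
    if (strncmp (argv[i], name, len) == 0 && argv[i][len] == '=')
      return argv[i] + len + 1;
  return default_value;
@}
@end example

A variable given with an empty value, such as @samp{mysqlpassword=},
answers the empty string rather than the default.
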
@node Network support
@section Sockets, WebServer, NetClients
@gst{} includes an almost complete abstraction of the @sc{tcp}, @sc{udp}
and @sc{ip} protocols. Although based on the standard @sc{bsd} sockets,
this library provides facilities such as buffering and preemptive I/O
which a C programmer usually has to implement manually.
The distribution includes a few tests (mostly loopback tests that
demonstrate both client and server connection), which are class methods
in @code{Socket}. This code should guide you in the process of creating
and using both server and client sockets; after creation, sockets behave
practically the same as standard Smalltalk streams, so using them should
pose no particular problems. For more information, refer to @ref{Sockets, ,
Network programming with Sockets, gst-libs, the @gst{} Library Reference}.
The library is also used by many other packages, including Swazoo
and the MySQL driver.
There is also code implementing the most popular Internet protocols:
@sc{ftp}, @sc{http}, @sc{nntp}, @sc{smtp}, @sc{pop3} and @sc{imap}.
These classes, loaded by the @code{NetClients} package, are derived
from multiple public domain and free software packages available for
other Smalltalk dialects and ported to @gst{}. Future versions of
@gst{} will include documentation for these as well.
@node XML
@section An XML parser and object model for @gst{}
The @sc{xml} parser library for Smalltalk, loaded as package @code{XML},
includes a validating @sc{xml} parser and Document Object Model.
This library is rapidly becoming a standard in the Smalltalk world
and an @sc{xslt} interpreter based on it is bundled with @gst{} as
well (see packages @code{XPath} and @code{XSL}).
Parts of the basic XML package can be loaded independently using packages
@code{XML-DOM}, @code{XML-SAXParser}, @code{XML-XMLParser},
@code{XML-SAXDriver}, @code{XML-XMLNodeBuilder}.
@node Other packages
@section Other packages
Various other ``minor'' packages are provided, typically as examples of
writing modules for @gst{} (@pxref{External modules, , Linking your
libraries to the virtual machine}). These include:
@table @i
@item Complex
which adds transparent operations with complex numbers
@item @sc{gdbm}
which is an interface to the @gnu{} database manager
@item Digest
which provides two easy to use classes to quickly compute
hash values using the MD5 and SHA1 algorithms.
@item NCurses
which provides bindings to @i{ncurses}
@item Continuations
which provides more examples and tests for continuations (an
advanced feature to support complex control flow).
@item DebugTools
which provides a way to attach to another Smalltalk process
and execute it a bytecode or a method at a time.
@end table
@node Emacs
@chapter Smalltalk interface for @gnu{} Emacs
@gst{} comes with its own Emacs mode for hacking Smalltalk
code. It also provides tools for interacting with a running Smalltalk
system in an Emacs subwindow.
Emacs will automatically go into Smalltalk mode when you edit a
Smalltalk file (one with the extension @file{.st}).
@menu
* Editing:: Autoindent and more for @gst{}.
* Interactor:: Smalltalk interactor mode.
@end menu
@node Editing
@section Smalltalk editing mode
The @gst{} editing mode is there to assist you in editing your
Smalltalk code. It tries to be smart about indentation and provides
a few cooked templates to save you keystrokes.
Since Smalltalk syntax is highly context sensitive,
the Smalltalk editing mode will occasionally get confused when you are
editing expressions instead of method definitions. In particular,
using local variables, thus:
@example
| foo |
foo := 3.
^foo squared !
@end example
will confuse the Smalltalk editing mode, as this might also be a
definition of the binary operator @code{|}, with second argument called
@samp{foo}. If the mode gets confused when you edit this type of
expression, put a dummy method name before the start of the expression,
and take it out when you're done editing, thus:
@example
dummy
| foo |
foo := 3.
^foo squared !
@end example
@node Interactor
@section Smalltalk interactor mode
An interesting feature of Emacs Smalltalk is the Smalltalk interactor,
which basically allows you to run @gnu{} Emacs with Smalltalk files in one
window, and Smalltalk in the other. You can, with a single command, edit
and change method definitions in the live Smalltalk system, evaluate
expressions, make image snapshots of the system so you can pick up where
you left off, file in an entire Smalltalk file, etc. It makes a tremendous
difference in the productivity and enjoyment that you'll have when using
@gst{}.
To start up the Smalltalk interactor, you must be running @gnu{} Emacs
and in a buffer that's in Smalltalk mode. Then type @kbd{C-c m}: a
second window will appear with @gst{} running in it.
This window is in most respects like a Shell mode window. You can type
Smalltalk expressions to it directly and re-execute previous things
in the window by moving the cursor back to the line that contains
the expression that you wish to re-execute and typing return.
Notice the status in the mode line (e.g. @samp{starting-up},
@samp{idle}, etc). This status will change when you issue various
commands from Smalltalk mode.
When you first fire up the Smalltalk interactor, it puts you in the
window in which Smalltalk is running. You'll want to switch
back to the window with your file in it to explore the rest of the
interactor mode, so do it now.
To execute a range of code, mark the region around it and type
@kbd{C-c e}. The expression in the region is sent to Smalltalk
and evaluated. The status will change to indicate that the
expression is executing. This will work for any region that you
create. If the region does not end with an exclamation point (which is
syntactically required by Smalltalk), one will be added for you.
There is also a shortcut, @kbd{C-c d} (also invocable as
@kbd{M-x smalltalk-doit}), which uses a simple heuristic to
figure out the start and end of the expression: it searches forward
for a line that begins with an exclamation point, and backward for
a line that does not begin with space, tab, or the comment
character, and sends all the text in between to Smalltalk.
If you provide a prefix argument (by typing @kbd{C-u C-c d} for
instance), it will bypass the heuristic and use the region instead
(just like @kbd{C-c e} does).
@kbd{C-c c} will compile a method; it uses a similar heuristic to
determine the bounds of the method definition. Typically, you'll
change a method definition, type @kbd{C-c c} and move on to
whatever's next. If you want to compile a whole bunch of method
definitions, you'll have to mark the entire set of method
definitions (from the @code{methodsFor:} line to the
@code{! !}) as the region and use @kbd{C-c e}.
After you've compiled and executed some expressions, you may want to
take a snapshot of your work so that you don't have to re-do things
next time you fire up Smalltalk. To do this, you use the @kbd{C-c s}
command, which invokes @code{ObjectMemory snapshot}.
If you invoke this command with a prefix argument, you can specify
a different name for the image file, and you can have that image file
loaded instead of the default one by using the @code{-I} flag on the
command line when invoking Smalltalk.
You can also evaluate an expression and have the result of the
evaluation printed by using the @kbd{C-c p} command. Mark the region
and use the command.
To file in an entire file (perhaps the one that you currently have in
the buffer that you are working on), type @kbd{C-c f}. You can type
the name of a file to load at the prompt, or just type return and
the file associated with the current buffer will be loaded into Smalltalk.
When you're ready to quit using @gst{}, you can quit cleanly by using
the @kbd{C-c q} command. If you want to fire up Smalltalk again, or
if (heaven forbid) Smalltalk dies on you, you can use the @kbd{C-c m}
command, and Smalltalk will be reincarnated. Even if it's running,
but the Smalltalk window is not visible, @kbd{C-c m} will cause it
to be displayed right away.
You might notice that as you use this mode, the Smalltalk window will scroll
to keep the bottom of the buffer in focus, even when the Smalltalk
window is not the current window. This was a design choice that I
made to see how it would work. On the whole, I guess I'm pretty happy
with it, but I am interested in hearing your opinions on the subject.
@node C and Smalltalk
@chapter Interoperability between C and @gst{}
@menu
* External modules:: Linking your libraries to the virtual machine
* C callout:: Calls from Smalltalk to C
* C data types:: Manipulating C data from Smalltalk
* Smalltalk types:: Manipulating Smalltalk data from C
* Smalltalk callin:: Calls from C to Smalltalk
* Smalltalk callbacks:: Smalltalk blocks as C function pointers
* Object representation:: Manipulating your own Smalltalk objects
* Incubator:: Protecting newly created objects from garbage
* Other C functions:: Handling and creating OOPs
* Using Smalltalk:: The Smalltalk environment as an extension library
@end menu
@node External modules
@section Linking your libraries to the virtual machine
A nice thing you can do with @gst{} is enhancing it with your own
goodies. If they're written in Smalltalk only, no problem: getting them
to work as packages (@pxref{Packages}), and to fit in with the @gst{}
packaging system, is likely to be a five-minute task.
If your goodie is creating a binding to an external C library and you do
not need particular glue to link it to Smalltalk (for example, there are
no callbacks from C code to Smalltalk code), you can use the @code{dynamic
library linking} system. When using this system, you have to link @gst{}
with the library at run-time using @sc{dld}, using either
@code{DLD class>>#addLibrary:} or a @code{<library>} tag in a
@file{package.xml} file (@pxref{Packages}). The following line:
@example
DLD addLibrary: 'libc'
@end example
is often used to access the standard C library functions from Smalltalk.
However, if you want to provide a more intimate link between C and Smalltalk,
as is the case, for example, with the GTK bindings, you should use the @code{dynamic module
linking} system. This section explains what to do, taking the Digest
library as a guide.
A module is distinguished from a standard shared library because it has
a function which Smalltalk calls to initialize the module; the name of
this function must be @code{gst_initModule}. Here is the initialization
function used by Digest:
@example
/* Saved so that the rest of the module can call the VM.  */
static VMProxy *vmProxy;

void
gst_initModule (VMProxy *proxy)
@{
  vmProxy = proxy;
  vmProxy->defineCFunc ("MD5AllocOOP", MD5AllocOOP);
  vmProxy->defineCFunc ("MD5Update", md5_process_bytes);
  vmProxy->defineCFunc ("MD5Final", md5_finish_ctx);

  vmProxy->defineCFunc ("SHA1AllocOOP", SHA1AllocOOP);
  vmProxy->defineCFunc ("SHA1Update", sha1_process_bytes);
  vmProxy->defineCFunc ("SHA1Final", sha1_finish_ctx);
@}
@end example
Note that the @code{defineCFunc} function is called through a function
pointer in @code{gst_initModule}, and that the value of its parameter
is saved in order to use it elsewhere in its code. This is not strictly
necessary on many platforms, namely those where the module is
effectively @emph{linked with the Smalltalk virtual machine} at
run-time; but since some platforms@footnote{The most notable are @sc{aix} and
Windows.} cannot do this, for maximum portability you must always
call the virtual machine through the proxy and never refer to any symbol
which the virtual machine exports. For uniformity, even programs that
link with @file{libgst.a} should not call these functions directly, but
through a @code{VMProxy} exported by @file{libgst.a} and accessible
through the @code{gst_interpreter_proxy} variable.
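
The following toy C program models the proxy pattern described above; it
is not @gst{} code, and @code{struct toy_vm_proxy},
@code{vm_define_cfunc}, @code{toy_init_module} and the other names are
hypothetical. The module never refers to a symbol of the ``virtual
machine'' directly: it saves the proxy it receives at initialization and
registers its functions through it, which is also how a name becomes
associated with a C function address for later lookup.

@example
#include <stddef.h>
#include <string.h>

/* Generic function pointer type; callers cast back to the real type.  */
typedef void (*cfunc) (void);

/* Toy stand-in for VMProxy: a table of VM entry points.  */
struct toy_vm_proxy
@{
  void (*define_cfunc) (const char *name, cfunc func);
@};

/* "Virtual machine" side.  */
#define MAX_CFUNCS 16

static struct
@{
  const char *name;
  cfunc func;
@} registry[MAX_CFUNCS];
static size_t n_registered;

static void
vm_define_cfunc (const char *name, cfunc func)
@{
  if (n_registered < MAX_CFUNCS)
    @{
      registry[n_registered].name = name;
      registry[n_registered].func = func;
      n_registered++;
    @}
@}

/* Answer the function registered under NAME, or NULL.  */
cfunc
vm_lookup_cfunc (const char *name)
@{
  for (size_t i = 0; i < n_registered; i++)
    if (strcmp (registry[i].name, name) == 0)
      return registry[i].func;
  return NULL;
@}

struct toy_vm_proxy toy_proxy = @{ vm_define_cfunc @};

/* Module side.  */
static struct toy_vm_proxy *vm;   /* saved for later calls into the VM */

static int
twice (int x)
@{
  return 2 * x;
@}

void
toy_init_module (struct toy_vm_proxy *proxy)
@{
  vm = proxy;
  vm->define_cfunc ("twice", (cfunc) twice);
@}
@end example
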
Modules are shared libraries; the default directory in which modules
are searched for is stored in a @file{gnu-smalltalk.pc} file that is
installed by @gst{} so that it can be used with @command{pkg-config}.
An Autoconf macro @code{AM_PATH_GST} is also installed that will put the
directory in the @code{gstmoduledir} Autoconf substitution. When using
@gnu{} Automake and Libtool, you can then build modules by including
something like this in @file{}:
@example
gstmodule_LTLIBRARIES = 
libdigest_la_LDFLAGS = -module -no-undefined @dfn{... more flags ...}
libdigest_la_SOURCES = @dfn{... your source files ...}
@end example
While you can use @code{DLD class>>#addModule:} to link a module into
the virtual machine at run time, usually bindings that require a module
are complex enough to be packaged as @file{.star} files. In this case,
you will have to add the name of the module in a package file
(@pxref{Packages}). The relevant entries in the file will then look like
this, where @code{<module>} names the module to be loaded:
@example
<module>digest</module>
<sunit>MD5Test SHA1Test</sunit>
@end example
There is also a third case, in which the bindings are a mixture of
code written specially for @gst{}, and the normal C library. In this
case, you can use a combination of dynamic library linking and dynamic
module linking. To do this, you can specify both @code{<library>} and @code{<module>} tags
in the @file{package.xml} file; alternatively, the following functions
allow you to call @code{DLD class>>#addLibrary:} from within a module.
@deftypefun mst_Boolean dlOpen (void *filename, int module)
Open the library pointed to by @var{filename} (which need not include
an extension), and invoke @code{gst_initModule} if it is found in the library.
If @var{module} is false, add the file to the list of libraries that
Smalltalk searches for external symbols.
Return true if the library was found.
@end deftypefun
@deftypefun void dlAddSearchDir (const char *dir)
Add @var{dir} at the beginning of the search path of @code{dlOpen}.
@end deftypefun
@deftypefun void dlPushSearchPath (void)
Save the current value of the search path for @code{dlOpen}. This can be
used to temporarily add the search path for the libraries added by a
module, without affecting subsequent libraries manually opened with the
@code{DLD} class.
@end deftypefun
@deftypefun void dlPopSearchPath (void)
Restore the last saved value of the search path.
@end deftypefun
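
The save-and-restore semantics of these functions can be pictured with the
toy model below. It is only a sketch of the behaviour described above, not
@gst{}'s implementation, and all of its names are hypothetical. A module
would typically call @code{dlPushSearchPath}, add its private directory
with @code{dlAddSearchDir}, open its libraries with @code{dlOpen}, and
finally call @code{dlPopSearchPath}, so that directories added by the
module do not affect libraries opened later with the @code{DLD} class.

@example
#include <stddef.h>
#include <string.h>

#define MAX_DIRS 8
#define MAX_SAVED 4

/* Ordered list of directories searched by the toy loader.  */
struct search_path
@{
  const char *dirs[MAX_DIRS];
  size_t n;
@};

static struct search_path current;
static struct search_path saved[MAX_SAVED];
static size_t n_saved;

/* Counterpart of dlAddSearchDir: put DIR at the front.  */
void
add_search_dir (const char *dir)
@{
  if (current.n == MAX_DIRS)
    return;
  memmove (&current.dirs[1], &current.dirs[0],
           current.n * sizeof current.dirs[0]);
  current.dirs[0] = dir;
  current.n++;
@}

/* Counterpart of dlPushSearchPath: save a copy of the current path.  */
void
push_search_path (void)
@{
  if (n_saved < MAX_SAVED)
    saved[n_saved++] = current;
@}

/* Counterpart of dlPopSearchPath: restore the last saved copy.  */
void
pop_search_path (void)
@{
  if (n_saved > 0)
    current = saved[--n_saved];
@}
@end example
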
@node C callout
@section Using the C callout mechanism
To use the C callout mechanism, you first need to inform Smalltalk about
the C functions that you wish to call. You currently need to do this in
two places: 1) you need to establish the mapping between your C
function's address and the name that you wish to refer to it by, and 2)
you need to describe that function to the Smalltalk interpreter, along
with how the argument objects should be mapped to C data types. As an
example, let us use the pre-defined (to @gst{}) functions @code{system} and
@code{getenv}.
First, the mapping between these functions and string names for the
functions needs to be established in your module. If you are writing an
external Smalltalk module (which can look at Smalltalk objects and
manipulate them), see @ref{External modules, , Linking your libraries
to the virtual machine}; if you are using functions from a dynamically
loaded library, see @ref{Dynamic loading}.
Second, we need to define a method that will invoke these C functions
and describe its arguments to the Smalltalk runtime system. Such a
method is defined with a primitive-like syntax, similar to the
following example (taken from @file{kernel/}):
@example
system: aString
    <cCall: 'system' returning: #int args: #(#string)>

getenv: aString
    <cCall: 'getenv' returning: #string args: #(#string)>
@end example
These methods were defined on class @code{SystemDictionary}, so
that we would invoke them thus:
@example
Smalltalk system: 'lpr README' !
@end example
However, there is no special significance to which class receives the
method; it could have just as well been Float, but it might look kind of
strange to see:
@example
1701.0 system: 'mail' !
@end example
The various keyword arguments are described below.
@table @b
@item @code{cCall: 'system'}
This says that we are defining the C function @code{system}. This name
must be @strong{exactly} the same as the string passed to
@code{defineCFunc}.
The name of the method does not have to match the name of the C function;
we could have just as easily defined the selector to be @code{'rambo:
fooFoo'}; it's just good practice to define the method with a similar
name and the argument names to reflect the data types that should be
passed.
@item @code{returning: #int}
This defines the C data type that will be returned. It is converted to
the corresponding Smalltalk data type. The set of valid return types
is:
@table @code
@item char
Single C character value
@item string
A C char *, converted to a Smalltalk string
@item stringOut
A C char *, converted to a Smalltalk string and then freed.
@item symbol
A C char *, converted to a Smalltalk symbol
@item symbolOut
A C char *, converted to a Smalltalk symbol and then freed.
@item int
A C int value
@item uInt
A C unsigned int value
@item long
A C long value
@item uLong
A C unsigned long value
@item double
A C double, converted to an instance of FloatD
@item longDouble
A C long double, converted to an instance of FloatQ
@item void
No returned value (@code{self} returned from Smalltalk)
@item wchar
Single C wide character (@code{wchar_t}) value
@item wstring
Wide C string (@code{wchar_t *}), converted to a UnicodeString
@item wstringOut
Wide C string (@code{wchar_t *}), converted to a UnicodeString and then freed
@item cObject
An anonymous C pointer; useful to pass back to some C function later
@item smalltalk
An anonymous (to C) Smalltalk object pointer; should have been passed to
C at some point in the past or created by the program by calling other
public @gst{} functions (@pxref{Smalltalk types}).
@item @var{ctype}
You can pass an instance of CType or one of its subclasses (@pxref{C
data types}). In this case the object will be sent @code{#narrow}
before being returned: an example of this feature is given in the
experimental Gtk+ bindings.
@end table
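
The difference between @code{string} and @code{stringOut} is one of
ownership. In the sketch below (the function names are hypothetical), the
first function returns storage that the caller must not release, so it
fits @code{returning: #string}; the second returns a freshly allocated
copy, which is what @code{returning: #stringOut} expects, since the result
is freed after conversion. The sketch assumes that the memory is released
with the C library's @code{free}, so it allocates with @code{malloc}.

@example
#include <stdlib.h>
#include <string.h>

/* Static storage that must not be freed: declare it with
   returning: #string.  */
const char *
greeting (void)
@{
  return "hello";
@}

/* Freshly malloc'ed string owned by the caller: declare it with
   returning: #stringOut.  Returns NULL if allocation fails.  */
char *
greeting_for (const char *name)
@{
  static const char prefix[] = "hello, ";
  char *s = malloc (sizeof prefix + strlen (name));

  if (s != NULL)
    @{
      strcpy (s, prefix);
      strcat (s, name);
    @}
  return s;
@}
@end example
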
@item @code{args: #(#string)}
This is an array of symbols that describes the types of the arguments in
order. For example, to specify a call to open(2), the arguments might
look something like:
@example
args: #(#string #int #int)
@end example
The following argument types are supported; see above for details.
@table @code
@item unknown
Smalltalk will make the best conversion that it can guess for this
object; see the mapping table below
@item boolean
passed as @code{char}, which is promoted to @code{int}
@item char
passed as @code{char}, which is promoted to @code{int}
@item wchar
passed as @code{wchar_t}
@item string
passed as @code{char *}
@item byteArrayOut
passed as @code{char *}. The contents are expected to be overwritten
with a new C string, and copied back to the object that was passed
on return from the C function
@item stringOut
passed as @code{char *}, the contents are expected to be overwritten
with a new C string, and the object that was passed becomes the new
string on return
@item wstring
passed as @code{wchar_t *}
@item wstringOut
passed as @code{wchar_t *}, the contents are expected to be overwritten
with a new C wide string, and the object that was passed becomes the new
string on return
@item symbol
passed as @code{char *}
@item byteArray
passed as @code{char *}, even though it may contain NUL bytes
@item int
passed as @code{int}
@item uInt
passed as @code{unsigned int}
@item long
passed as @code{long}
@item uLong
passed as @code{unsigned long}
@item double
passed as @code{double}
@item longDouble
passed as @code{long double}
@item cObject
C object value passed as @code{void *}.
Any class with non-pointer indexed instance variables can be passed as
a @code{#cObject}, and @gst{} will pass the address of the first indexed
instance variable. This however should never be done for functions that
allocate objects, call back into Smalltalk code or otherwise may cause
a garbage collection: after a GC, pointers passed as @code{#cObject} may be
invalidated. In this case, it is safer to pass every object as
@code{#smalltalk}, or to only pass @code{CObject}s that were returned
by a C function previously.
In addition, @code{#cObject} can be used for function pointers. These are
instances of @code{CCallable} or one of its subclasses. See @ref{Smalltalk
callbacks} for more information on how to create function pointers for
Smalltalk blocks.
@item cObjectPtr
Pointer to C object value passed as @code{void **}. The @code{CObject}
is modified on output to reflect the value stored into the passed object.
@item smalltalk
Pass the object pointer to C. The C routine should treat the value as a
pointer to anonymous storage. This pointer can be returned to Smalltalk
at some later point in time.
@item variadic
@itemx variadicSmalltalk
an Array is expected; each element of the array is converted like an
@code{unknown} parameter if @code{variadic} is used, or passed as a raw
object pointer if @code{variadicSmalltalk} is used.
@item self
@itemx selfSmalltalk
Pass the receiver, converting it to C like an @code{unknown} parameter
if @code{self} is used or passing the raw object pointer for
@code{selfSmalltalk}. Parameters passed this way don't map to the
message's arguments, instead they map to the message's receiver.
@end table
@end table
Table of parameter conversions:
@multitable {Declared param type} {Boolean (True, False)} {@code{int} (C promotion rule)}
@item Declared param type @tab Object type @tab C parameter type used
@item boolean @tab Boolean (True, False)@tab int
@item byteArray @tab ByteArray @tab char *
@item cObject @tab CObject @tab void *
@item cObject @tab ByteArray, etc. @tab void *
@item cObjectPtr @tab CObject @tab void **
@item char @tab Boolean (True, False)@tab int
@item char @tab Character @tab int (C promotion rule)
@item char @tab Integer @tab int
@item double @tab Float @tab double (C promotion)
@item longDouble @tab Float @tab long double
@item int @tab Boolean (True, False)@tab int
@item int @tab Integer @tab int
@item uInt @tab Boolean (True, False)@tab unsigned int
@item uInt @tab Integer @tab unsigned int
@item long @tab Boolean (True, False)@tab long
@item long @tab Integer @tab long
@item uLong @tab Boolean (True, False)@tab unsigned long
@item uLong @tab Integer @tab unsigned long
@item smalltalk, selfSmalltalk @tab anything @tab OOP
@item string @tab String @tab char *
@item string @tab Symbol @tab char *
@item stringOut @tab String @tab char *
@item symbol @tab Symbol @tab char *
@item unknown, self @tab Boolean (True, False)@tab int
@item unknown, self @tab ByteArray @tab char *
@item unknown, self @tab CObject @tab void *
@item unknown, self @tab Character @tab int
@item unknown, self @tab Float @tab double
@item unknown, self @tab Integer @tab long
@item unknown, self @tab String @tab char *
@item unknown, self @tab Symbol @tab char *
@item unknown, self @tab anything else @tab OOP
@item variadic @tab Array @tab each element is passed according to @code{unknown}
@item variadicSmalltalk @tab Array @tab each element is passed as an OOP
@item wchar @tab Character @tab wchar_t
@item wstring @tab UnicodeString @tab wchar_t *
@item wstringOut @tab UnicodeString @tab wchar_t *
@end multitable
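The ``C promotion'' notes in the table refer to C's default argument
promotions. The following standalone C sketch is not part of the @gst{}
API, and its function names are made up for illustration; it only shows
what a variadic C function actually receives when a @code{char} or a
@code{float} is passed to it, which is why a @code{char} parameter ends
up being read as an @code{int} on the C side.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `n` arguments that callers pass as `char`.
   Default argument promotions turn each `char` into an `int` when it
   goes through `...`, so it must be fetched with va_arg(ap, int). */
int sum_promoted_chars(int n, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, n);
    for (int i = 0; i < n; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Hypothetical helper: a `float` passed through `...` is promoted to
   `double`, so it is fetched with va_arg(ap, double). */
double first_promoted_float(int n, ...)
{
    va_list ap;
    double d = 0.0;

    va_start(ap, n);
    if (n > 0)
        d = va_arg(ap, double);
    va_end(ap);
    return d;
}
```

Reading a promoted @code{char} with @code{va_arg(ap, char)} instead
would be undefined behavior in C.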
When your call-out returns @code{#void}, depending on your
application you might consider using @dfn{asynchronous
call-outs}. These are call-outs that do not suspend the process
that initiated them, so the process might be scheduled again,
executing the code that follows the call-out, during the execution
of the call-out itself. This is particularly handy when writing
event loops (the most common place where you call back into Smalltalk)
because then @emph{you can handle events that arrive during the
handling of an outer event} before the outer event's processing
has ended. Depending on your application this might be correct or
not, of course. In the future, asynchronous call-outs might be
started in a separate thread.
An asynchronous call-out is defined using an alternate primitive-like
syntax, @code{asyncCCall:args:}. Note that the returned value parameter
is missing because an asynchronous call-out always returns @code{nil}.
@node C data types
@section The C data type manipulation system
@c rewrite this.....
@code{CType} is a class used to represent C data types themselves (no
storage, just the type). There are subclasses called things like
@code{C@var{mumble}CType}. The instances can answer their size and
alignment. Their @code{valueType} is the underlying type of data. It's
either an integer, which is interpreted by the interpreter as the scalar
type, or the underlying element type, which is another @code{CType}
subclass instance.
To make life easier, there are global variables which hold onto
instances of @code{CScalarCType}: they are called
@code{C@var{mumble}Type} (like @code{CIntType}, not like
@code{CIntCType}), and can be used wherever a C datatype is used. If
you had an array of strings, the element type would be @code{CStringType}
(a specific instance of @code{CScalarCType}).
@code{CObject} is the base class of the instances of C data. It has a
subclass called @code{CScalar}, which has subclasses called
@code{C@var{mumble}}. These subclasses can answer their size and alignment.
Instances of @code{CObject} can hold a raw C pointer (for example, into the
@code{malloc}ed heap), or can delegate their storage to a @code{ByteArray}.
In the latter case, the storage is automatically garbage collected when
the @code{CObject} becomes dead, and the VM checks accesses to make sure
they are in bounds. On the other hand, the storage may move, and for this
reason extra care must be taken when using this kind of @code{CObject} with
C routines that call back into Smalltalk, or that store the passed pointer
for later use.
Instances of @code{CObject} can be created in many ways:
@itemize @bullet
@item creating an instance with @code{@var{class} new} initializes
the pointer to @code{NULL};
@item doing @code{@var{type} new}, where @var{type} is a @code{CType}
subclass instance, allocates a new instance with @code{malloc};
@item doing @code{@var{type} gcNew}, where @var{type} is a @code{CType}
subclass instance, allocates a new instance backed by garbage-collected
storage.
@end itemize
@code{CStruct} and @code{CUnion} subclasses are special. First,
@code{new} allocates a new instance with @code{malloc} instead of initializing
the pointer to @code{NULL}. Second, they support @code{gcNew} which
creates a new instance backed by garbage-collected storage.
@code{CObject}s created by the C callout mechanism are never backed by
garbage-collected storage.
@code{CObject} and its subclasses represent a pointer to a C object and
as such provide the full range of operations supported by C pointers.
For example, @code{+ @var{anInteger}} returns a @code{CObject} which is
higher in memory by @var{anInteger} times the size of each item. There
is also @code{-}, which acts like @code{+} if it is given an
integer as its parameter; if it is given a @code{CObject}, it returns the
difference between the two pointers. @code{incr}, @code{decr},
@code{incrBy:} and @code{decrBy:} move the pointer either forward or
backward, by either 1 or @var{n} items. Only the pointer is changed; the
data it points to remains untouched.
CObjects can be divided into two families, scalars and non-scalars,
just like C data types. Scalars fetch a Smalltalk object when sent the
@code{value} message, and change their value when sent the @code{value:}
message. Non-scalars do not support these two messages. Non-scalars
include instances of @code{CArray} and subclasses of @code{CStruct}
and @code{CUnion} (but not @code{CPtr}).
@code{CPtr}s and @code{CArray}s get their underlying element type through a
@code{CType} subclass instance which is associated with the
@code{CArray} or @code{CPtr} instance.
@code{CPtr}'s @code{value} and @code{value:} methods get or change
the underlying value that's pointed to. @code{value} returns another
@code{CObject} corresponding to the pointed value. That's because, for
example, a @code{CPtr} to @code{long} points to a place in memory where
a pointer to long is stored. It is really a @code{long **} and must be
dereferenced twice with @code{cPtr value value} to get the @code{long}.
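The double dereference corresponds to two loads in C. In this sketch
(plain C with made-up helper names), @code{slot} plays the role of the
memory that a @code{CPtr} to @code{long} refers to: it holds a
@code{long *}, so reaching the @code{long} itself takes two
dereferences, just as the Smalltalk side needs @code{value value}.

```c
/* `slot` has type long **: one dereference yields the stored
   long *, the second yields the long itself. */
long read_through(long **slot)
{
    return **slot;
}

/* Writing through both levels changes the long, not the stored pointer. */
void write_through(long **slot, long v)
{
    **slot = v;
}
```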
@code{CString} is a subclass of @code{CPtr} that answers a Smalltalk
@code{String} when sent @code{value}, and automatically allocates
storage to copy and null-terminate a Smalltalk @code{String} when sent
@code{value:}. @code{replaceWith:} replaces the string the instance
points to with a new string or @code{ByteArray}, passed as the argument.
More precisely, it copies the bytes of the argument into the buffer
already pointed to by the @code{CString}, and adds a null terminator.
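A rough C analogue of what @code{replaceWith:} does is sketched below.
The helper @code{replace_with} and its capacity check are assumptions
made for illustration: the text above does not say how @gst{} handles a
buffer that is too small, so here the caller is simply required to
provide room for the bytes plus the terminator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical analogue of CString>>#replaceWith: copy `len` bytes
   into the buffer that `buf` already points to and NUL-terminate it.
   The buffer is reused, not reallocated, so it needs len + 1 bytes. */
void replace_with(char *buf, size_t cap, const char *src, size_t len)
{
    assert(len + 1 <= cap);   /* assumption: the caller guarantees room */
    memcpy(buf, src, len);
    buf[len] = '\0';
}
```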
Finally, there are @code{CStruct} and @code{CUnion}, which are abstract
subclasses of @code{CObject}@footnote{Actually they have a common superclass
named @code{CCompound}.}. The following will refer to CStruct, but the
same considerations apply to CUnion as well, with the only difference that
CUnions of course implement the semantics of a C union.
These classes provide direct access to C data structures, including:
@itemize @bullet
@bulletize @code{long} (unsigned too)
@bulletize @code{short} (unsigned too)
@bulletize @code{char} (unsigned too) & byte type
@bulletize @code{double}, @code{long double}, @code{float}
@bulletize @code{string} (NUL terminated char *, with special accessors)
@bulletize arrays of any type
@bulletize pointers to any type
@bulletize other structs containing any fixed size types
@end itemize
Here is an example structure declaration in C:
@example
struct audio_prinfo @{
    unsigned channels;
    unsigned precision;
    unsigned encoding;
    unsigned gain;
    unsigned port;
    unsigned _xxx[4];
    unsigned samples;
    unsigned eof;
    unsigned char pause;
    unsigned char error;
    unsigned char waiting;
    unsigned char _ccc[3];
    unsigned char open;
    unsigned char active;
@};

struct audio_info @{
    audio_prinfo_t play;
    audio_prinfo_t record;
    unsigned monitor_gain;
    unsigned _yyy[4];
@};
@end example
And here is an equivalent Smalltalk declaration:
@example
CStruct subclass: AudioPrinfo [
    <declaration: #( (#sampleRate #uLong)
                     (#channels #uLong)
                     (#precision #uLong)
                     (#encoding #uLong)
                     (#gain #uLong)
                     (#port #uLong)
                     (#xxx (#array #uLong 4))
                     (#samples #uLong)
                     (#eof #uLong)
                     (#pause #uChar)
                     (#error #uChar)
                     (#waiting #uChar)
                     (#ccc (#array #uChar 3))
                     (#open #uChar)
                     (#active #uChar))>
    <category: 'C interface-Audio'>
]

CStruct subclass: AudioInfo [
    <declaration: #( (#play #@{AudioPrinfo@} )
                     (#record #@{AudioPrinfo@} )
                     (#monitorGain #uLong)
                     (#yyy (#array #uLong 4)))>
    <category: 'C interface-Audio'>
]
@end example
This creates two new subclasses of @code{CStruct} called
@code{AudioPrinfo} and @code{AudioInfo}, with the given fields. The
syntax is the same as for creating standard subclasses, with the
additional metadata @code{declaration:}. You can
make C functions return @code{CObject}s that are instances of these
classes by passing @code{AudioPrinfo type} as the parameter to the
@code{returning:} keyword.
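Because the @code{declaration:} must reproduce the layout that the C
compiler uses, it can help to check field offsets on the C side with
@code{offsetof}. The sketch below uses a trimmed, hypothetical copy of
the last fields of @code{struct audio_prinfo}; exact numbers depend on
the platform, so the checks are written relative to
@code{sizeof(unsigned)} and assume no padding between one-byte members,
as on common ABIs.

```c
#include <stddef.h>

/* Trimmed copy of the tail of struct audio_prinfo, for illustration. */
struct prinfo_tail {
    unsigned      eof;
    unsigned char pause;
    unsigned char error;
    unsigned char waiting;
    unsigned char _ccc[3];
    unsigned char open;
    unsigned char active;
};

/* Offset of `open`: on common ABIs, after `eof` and six one-byte members. */
size_t offset_of_open(void)
{
    return offsetof(struct prinfo_tail, open);
}

/* Total size, including any trailing padding added for alignment. */
size_t size_of_tail(void)
{
    return sizeof(struct prinfo_tail);
}
```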
@code{AudioPrinfo} has methods defined on it like:
@example
#sampleRate
#channels
#precision
#encoding
@end example
etc. These access the various data members. The array element
accessors (xxx, ccc) just return a pointer to the array itself.
For simple scalar types, just list the type name after the variable.
Here is the set of scalar names, as defined in @file{kernel/}:
@example
#long                   CLong
#uLong                  CULong
#ulong                  CULong
#byte                   CByte
#char                   CChar
#uChar                  CUChar
#uchar                  CUChar
#short                  CShort
#uShort                 CUShort
#ushort                 CUShort
#int                    CInt
#uInt                   CUInt
#uint                   CUInt
#float                  CFloat
#double                 CDouble
#longDouble             CLongDouble
#string                 CString
#smalltalk              CSmalltalk
#@{...@}                  @r{A given subclass of @code{CObject}}
@end example
The @code{#@{@dots{}@}} syntax is not in the Blue Book, but it is
present in @gst{} and other Smalltalks; it returns an Association object
corresponding to a global variable.
To have a pointer to a type, use something like:
@example
(#example (#ptr #long))
@end example
To have an array of @var{size} elements (here, strings), use:
@example
(#example (#array #string @var{size}))
@end example
Note that this maps to @code{char *example[@var{size}]} in C.
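In C terms, the two declarations above describe fields like the ones in
this sketch; the structure and field names are hypothetical, and 4
stands in for @var{size}. The @code{#array} form embeds @var{size}
pointers in the structure, while the @code{#ptr} form embeds a single
pointer.

```c
#include <stddef.h>

#define EXAMPLE_SIZE 4   /* stands in for the declared size */

struct example_holder {
    long *ptr_example;                  /* (#example (#ptr #long))       */
    char *array_example[EXAMPLE_SIZE];  /* (#example (#array #string 4)) */
};
```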
The objects returned by using the fields are CObjects; there is no
implicit value fetching currently. For example, suppose you somehow got
hold of an instance of class AudioPrinfo as described above (the
instance is a CObject subclass and points to a real C structure
somewhere). Let's say you stored this object in variable
@code{audioInfo}. To get the current gain value, do
@example
audioInfo gain value
@end example
to change the gain value in the structure, do
@example
audioInfo gain value: 255
@end example
The structure member message just answers a @code{CObject} instance, so
you can hang onto it to directly refer to that structure member, or you
can use the @code{value} or @code{value:} methods to access or change
the value of the member.
Note that this is the same kind of access you get if you use the
@code{addressAt:} method on CStrings or CArrays or CPtrs: they return a
CObject which points to a C object of the right type and you need to use
@code{value} and @code{value:} to access and modify the actual C data.
@node Smalltalk types
@section Manipulating Smalltalk data from C
@gst{} internally maps every object except Integers to a data structure
named an @dfn{OOP} (which is short for @dfn{Ordinary Object Pointer}).
An OOP is a pointer to an internal data structure; this data structure
basically adds a level of indirection in the representation of objects,
since it contains
@itemize @bullet
@item
a pointer to the actual object data
@item
a bunch of flags, most of which interest the garbage collection process
@end itemize
This additional level of indirection makes garbage collection very
efficient, since the collector is free to move an object in memory
without updating every reference to that object in the heap, thereby
keeping the heap fully compact and allowing very fast allocation of new
objects. However, it makes C code that wants to deal with objects even
messier than it would otherwise be; if you want some examples, look at
the hairy code in @gst{} that deals with processes.
To shield you as much as possible from the complications of doing
object-oriented programming in a non-object-oriented environment like C,
@gst{} provides friendly functions to map between common Smalltalk
objects and C types. This way you can simply declare OOP variables and
then use these functions to treat their contents like C data.
These functions are passed to a module via the @code{VMProxy} struct, a
pointer to which is passed to the module, as shown in @ref{External
modules, , Linking your libraries to the virtual machine}. They can be
divided into two groups: those that map @emph{from Smalltalk objects to C
data types} and those that map @emph{from C data types to Smalltalk
objects}.
Here are those in the former group (Smalltalk to C); you can see that
they all begin with @code{OOPTo}:
@deftypefun long OOPToInt (OOP)
This function assumes that the passed OOP is an Integer and returns the
C @code{signed long} for that integer.
@end deftypefun
@deftypefun long OOPToId (OOP)
This function returns a unique identifier for the given OOP, valid
until the OOP is garbage-collected.
@end deftypefun
@deftypefun double OOPToFloat (OOP)
This function assumes that the passed OOP is an Integer or Float and
returns the C @code{double} for that object.
@end deftypefun
@deftypefun {long double} OOPToLongDouble (OOP)
This function assumes that the passed OOP is an Integer or Float and
returns the C @code{long double} for that object.
@end deftypefun
@deftypefun int OOPToBool (OOP)
This function returns a C integer which is true (i.e. @code{!= 0}) if
the given OOP is the @code{true} object, false (i.e. @code{== 0})
otherwise.
@end deftypefun
@deftypefun char OOPToChar (OOP)
This function assumes that the passed OOP is a Character and returns the
C @code{char} for that character.
@end deftypefun
@deftypefun wchar_t OOPToWChar (OOP)
This function assumes that the passed OOP is a Character or
UnicodeCharacter and returns the C @code{wchar_t} for that character.
@end deftypefun
@deftypefun char *OOPToString (OOP)
This function assumes that the passed OOP is a String or ByteArray and
returns a C null-terminated @code{char *} with the same contents. It is
the caller's responsibility to free the pointer and to handle possible
@samp{NUL} characters inside the Smalltalk object.
@end deftypefun
@deftypefun wchar_t *OOPToWString (OOP)
This function assumes that the passed OOP is a UnicodeString and
returns a C null-terminated @code{wchar_t *} with the same contents. It is
the caller's responsibility to free the pointer and to handle possible
@samp{NUL} characters inside the Smalltalk object.
@end deftypefun
@deftypefun char *OOPToByteArray (OOP)
This function assumes that the passed OOP is a String or ByteArray and
returns a C @code{char *} with the same contents, without
null-terminating it. It is the caller's responsibility to free the
pointer.
@end deftypefun
@deftypefun PTR OOPToCObject (OOP)
This function assumes that the passed OOP is a kind of CObject and
returns a C @code{PTR} to the C data pointed to by the object. The
caller should not free the pointer, nor assume anything about its size
and contents, unless it @b{exactly} knows what it's doing. A @code{PTR}
is a @code{void *} if supported, or otherwise a @code{char *}.
@end deftypefun
@deftypefun long OOPToC (OOP)
This function assumes that the passed OOP is a String, a ByteArray,
a CObject, or a built-in object (@code{nil}, @code{true}, @code{false},
character, integer). If the OOP is @code{nil}, it answers 0; else the
mapping for each object is exactly the same as for the above functions.
Note that, even though the function is declared as returning a
@code{long}, you might need to cast it to either a @code{char *}
or @code{PTR}.
@end deftypefun
While special care is needed to use the functions above (you will
probably want to know at least the type of the Smalltalk object you're
converting), the functions below, which convert C data to Smalltalk
objects, are easier to use and also put objects in the incubator so that
they are not swept by a garbage collection (@pxref{Incubator}). These
functions all @dfn{end} with @code{ToOOP}, except @code{cObjectToTypedOOP}:
@deftypefun OOP intToOOP (long)
This function returns a Smalltalk @code{Integer} which contains the same value as
the passed C @code{long}.
@end deftypefun
@deftypefun OOP uintToOOP (unsigned long)
This function returns a Smalltalk @code{Integer} which contains the same value as
the passed C @code{unsigned long}.
@end deftypefun
@deftypefun OOP idToOOP (OOP)
This function returns an OOP from a unique identifier returned by
@code{OOPToId}. The OOP will be the same that was passed to
@code{OOPToId} only if the original OOP has not been garbage-collected
since the call to @code{OOPToId}.
@end deftypefun
@deftypefun OOP floatToOOP (double)
This function returns a Smalltalk @code{FloatD} which contains the same value as
the passed @code{double}. Unlike Integers, FloatDs have exactly the same
precision as C doubles.
@end deftypefun
@deftypefun OOP longDoubleToOOP (long double)
This function returns a Smalltalk @code{FloatQ} which contains the same value as
the passed @code{long double}. Unlike Integers, FloatQs have exactly the same
precision as C long doubles.
@end deftypefun
@deftypefun OOP boolToOOP (int)
This function returns a Smalltalk @code{Boolean} which contains the same boolean
value as the passed C @code{int}. That is, the returned OOP is the sole
instance of either @code{False} or @code{True}, depending on whether the
parameter is zero or not.
@end deftypefun
@deftypefun OOP charToOOP (char)
This function returns a Smalltalk @code{Character} which represents the same char
as the passed C @code{char}.
@end deftypefun
@deftypefun OOP wcharToOOP (wchar_t)
This function returns a Smalltalk @code{Character} or @code{UnicodeCharacter}
which represents the same char as the passed C @code{wchar_t}.
@end deftypefun
@deftypefun OOP classNameToOOP (char *)
This method returns the Smalltalk class (i.e. an instance of a subclass
of Class) whose name is the given parameter. Namespaces are supported;
the parameter must give the complete path to the class starting from the
@code{Smalltalk} dictionary. @code{NULL} is returned if the class is
not found.
This method is slow; you can safely cache its result.
@end deftypefun
@deftypefun OOP stringToOOP (char *)
This method returns a String which maps to the given null-terminated C
string, or the builtin object @code{nil} if the parameter points to
address 0 (zero).
@end deftypefun
@deftypefun OOP wstringToOOP (wchar_t *)
This method returns a UnicodeString which maps to the given null-terminated C
wide string, or the builtin object @code{nil} if the parameter points to
address 0 (zero).
@end deftypefun
@deftypefun OOP byteArrayToOOP (char *, int)
This method returns a ByteArray which maps to the bytes that the first
parameters points to; the second parameter gives the size of the
ByteArray. The builtin object @code{nil} is returned if the first
parameter points to address 0 (zero).
@end deftypefun
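The explicit size parameter matters because a @code{ByteArray} may
contain NUL bytes, so its length cannot be recovered with
@code{strlen}; the same issue is behind the warning about @samp{NUL}
characters for @code{OOPToString}. This plain C sketch (the helper name
is hypothetical) computes the length that C string functions would see
in a buffer of known size.

```c
#include <string.h>

/* Number of bytes a C string function would consider: those before the
   first NUL, or all `size` bytes if there is none.  memchr is used so
   the scan never reads past the end of the buffer. */
size_t c_string_length(const char *bytes, size_t size)
{
    const char *nul = memchr(bytes, '\0', size);
    return nul ? (size_t) (nul - bytes) : size;
}
```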
@deftypefun OOP symbolToOOP (char *)
This method returns a Symbol which maps to the given null-terminated C
string, or the builtin object @code{nil} if the parameter points to
address 0 (zero).
@end deftypefun
@deftypefun OOP cObjectToOOP (PTR)
This method returns a CObject which maps to the given C pointer, or the
builtin object @code{nil} if the parameter points to address 0 (zero).
The returned value has no precise CType assigned. To assign one, use
@code{cObjectToTypedOOP}.
@end deftypefun
@deftypefun OOP cObjectToTypedOOP (PTR, OOP)
This method returns a CObject which maps to the given C pointer, or the
builtin object @code{nil} if the parameter points to address 0 (zero).
The returned value has the second parameter as its type; to get possible
types you can use @code{typeNameToOOP}.
@end deftypefun
@deftypefun OOP typeNameToOOP (char *)
All this function actually does is evaluate its parameter as Smalltalk
code; so you can, for example, use it in any of these ways:
@example
cIntType = typeNameToOOP("CIntType");
myOwnCStructType = typeNameToOOP("MyOwnCStruct type");
@end example
This method is primarily used by @code{msgSendf} (@pxref{Smalltalk callin}),
but it can be useful if you use lower level call-in methods. This method
is slow too; you can safely cache its result.
@end deftypefun
As said above, the C to Smalltalk layer automatically puts the objects
it creates in the incubator which prevents objects from being collected
as garbage. A plugin, however, has limited control over the incubator,
and the incubator itself is not at all useful when objects should be
kept registered for a relatively long time, with lifetimes in the
registry that typically overlap.
To avoid garbage collection of such objects, you can use these functions,
which access a separate registry:
@deftypefun OOP registerOOP (OOP)
Puts the given OOP in the registry. If you register an object multiple
times, you will need to unregister it the same number of times. You may
want to register objects returned by Smalltalk call-ins.
@end deftypefun
@deftypefun void unregisterOOP (OOP)
Removes an occurrence of the given OOP from the registry.
@end deftypefun
@deftypefun void registerOOPArray (OOP **, OOP **)
Tells the garbage collector that an array of objects must be made part
of the root set. The two parameters point indirectly to the base and
the top of the array; that is, they are pointers to variables holding
the base and the top of the array: having indirect pointers allows you
to dynamically change the size of the array and even to relocate it in
memory without having to unregister and re-register it every time you
modify it. If you register an array multiple times, you will need to
unregister it the same number of times.
@end deftypefun
@deftypefun void unregisterOOPArray (OOP **)
Removes the array with the given base from the registry.
@end deftypefun
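The reason @code{registerOOPArray} takes @code{OOP **} arguments can be
seen in a small model. Everything in this sketch is hypothetical
(@code{Obj} stands in for @code{OOP}, and the one-slot registry is not
how @gst{} implements its root set): the registry stores the addresses
of the variables holding the base and the top, so it sees the array's
new location after a @code{realloc} without being told again.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void *Obj;   /* hypothetical stand-in for OOP */

/* One-slot registry holding the addresses of the caller's base/top
   variables, not copies of their values. */
static Obj **reg_base;
static Obj **reg_top;

void register_array(Obj **base, Obj **top)
{
    reg_base = base;
    reg_top = top;
}

void unregister_array(void)
{
    reg_base = reg_top = NULL;
}

/* What a root-set scan would see: base and top are read at scan time. */
size_t scan_roots(void)
{
    return reg_base ? (size_t) (*reg_top - *reg_base) : 0;
}
```

Growing the array only requires updating the caller's own base and top
variables; the registration stays valid.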
@node Smalltalk callin
@section Calls from C to Smalltalk
@gst{} provides seven different function calls that allow you to call
Smalltalk methods in a different execution context than the current
one. The priority in which the method will execute will be the same as
the one of the Smalltalk process which is currently active.
Four of these functions are lower level and are better suited to cases
where the Smalltalk program itself supplies a receiver, a selector and
maybe some parameters; the others, instead, are more versatile. One of them
(@code{msgSendf}) automatically handles most conversions between C data
types and Smalltalk objects, while the others take care of compiling full
snippets of Smalltalk code.
All these functions properly handle the case of specifying, say, 5 arguments
for a 3-argument selector (see the description of the individual functions
for more information).
In all cases except @code{msgSendf}, passing NULL as the selector will
expect the receiver to be a block and evaluate it.
@deftypefun OOP msgSend (OOP receiver, OOP selector, @dots{})
This function sends the given selector (should be a Symbol, otherwise
@code{nilOOP} is returned) to the given receiver. The message arguments should
also be OOPs (otherwise, an access violation exception is pretty likely)
and are passed in a NULL-terminated list after the selector. The value
returned from the method is passed back as an OOP to the C program as
the result of @code{msgSend}, or @code{nilOOP} if the number of arguments is
wrong. Example (same as @code{1 + 2}):
@example
OOP shouldBeThreeOOP = vmProxy->msgSend(
    intToOOP(1),
    symbolToOOP("+"),
    intToOOP(2),
    NULL);
@end example
@end deftypefun
@deftypefun OOP strMsgSend (OOP receiver, char *selector, @dots{})
This function is the same as above, but the selector is passed as a C
string and is automatically converted to a Smalltalk symbol.
Theoretically, this function is a bit slower than @code{msgSend} if your
program has some way to cache the selector and avoid a call to
@code{symbolToOOP} on every call-in. However, this is not so apparent
in ``real'' code because the time spent in the Smalltalk interpreter
will usually be much higher than the time spent converting the selector
to a Symbol object. Example:
@example
OOP shouldBeThreeOOP = vmProxy->strMsgSend(
    intToOOP(1),
    "+",
    intToOOP(2),
    NULL);
@end example
@end deftypefun
@deftypefun OOP vmsgSend (OOP receiver, OOP selector, OOP *args)
This function is the same as msgSend, but accepts a pointer to the
NULL-terminated list of arguments, instead of being a variable-arguments
function. Example:
@example
OOP arguments[2], shouldBeThreeOOP;
arguments[0] = intToOOP(2);
arguments[1] = NULL;
/* @dots{} some more code here @dots{} */
shouldBeThreeOOP = vmProxy->vmsgSend(
    intToOOP(1),
    symbolToOOP("+"),
    arguments);
@end example
@end deftypefun
@deftypefun OOP nvmsgSend (OOP receiver, OOP selector, OOP *args, int nargs)
This function is the same as msgSend, but accepts an additional parameter
containing the number of arguments to be passed to the Smalltalk method,
instead of relying on the NULL-termination of args. Example:
@example
OOP argument, shouldBeThreeOOP;
argument = intToOOP(2);
/* @dots{} some more code here @dots{} */
shouldBeThreeOOP = vmProxy->nvmsgSend(
    intToOOP(1),
    symbolToOOP("+"),
    &argument,
    1);
@end example
@end deftypefun
@deftypefun OOP perform (OOP, OOP)
Shortcut function to invoke a unary selector. The first parameter
is the receiver, and the second is the selector.
@end deftypefun
@deftypefun OOP performWith (OOP, OOP, OOP)
Shortcut function to invoke a one-argument selector. The first parameter
is the receiver, the second is the selector, the third is the sole
argument.
@end deftypefun
@deftypefun OOP invokeHook (int)
Calls into Smalltalk to process an @code{ObjectMemory} hook given by
the parameter. In practice, @code{changed:} is sent to @code{ObjectMemory}
with a symbol derived from the parameter. The parameter can be one of:
@itemize @bullet
@item @code{GST_BEFORE_EVAL}
@item @code{GST_AFTER_EVAL}
@item @code{GST_ABOUT_TO_QUIT}
@end itemize
All cases where the last three should be used should be covered in
@gst{}'s source code. The first three, however, can actually be useful
in user code.
@end deftypefun
The two functions that directly accept Smalltalk code are named
@code{evalCode} and @code{evalExpr}, and they're basically the same.
They both accept a single parameter, a pointer to the code to be
submitted to the parser. The main difference is that @code{evalCode}
discards the result, while @code{evalExpr} returns it to the caller
as an OOP.
@code{msgSendf}, instead, has a radically different syntax. Let's first
look at some examples.
@example
/* 1 + 2 */
long shouldBeThree;
vmProxy->msgSendf(&shouldBeThree, "%i %i + %i", 1L, 2L);

/* aCollection includes: 'abc' */
OOP aCollection;
int aBoolean;
vmProxy->msgSendf(&aBoolean, "%b %o includes: %s", aCollection, "abc");

/* 'This is a test' printNl -- in two different ways */
vmProxy->msgSendf(NULL, "%v %s printNl", "This is a test");
vmProxy->msgSendf(NULL, "%s %s printNl", "This is a test");

/* 'This is a test', ' ok?' */
char *str;
vmProxy->msgSendf(&str, "%s %s , %s", "This is a test", " ok?");
@end example
As you can see, the parameters to msgSendf are, in order:
@itemize @bullet
@item
A pointer to the variable which will contain the result. If this pointer
is @code{NULL}, the result is discarded.
@item
A description of the method's interface in this format (the object
types, after percent signs, will be explained later in this section)
@example
%result_type %receiver_type selector %param1_type %param2_type
@end example
@item
A C variable or Smalltalk object (depending on the type specifier) for
the receiver.
@item
If needed, the C variables and/or Smalltalk objects (depending on the
type specifiers) for the arguments.
@end itemize
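To make the layout of the interface string concrete, here is a small,
entirely hypothetical parser for it; it is not the parser @gst{} uses.
It assumes that, as in the @code{includes:} example, each keyword part of
a selector is followed by the specifier of its argument, and it rejects
the selector-less @samp{%B} form.

```c
#include <string.h>

/* Hypothetical parsed form of a msgSendf interface string. */
struct send_spec {
    char result;          /* specifier of the result, e.g. 'i' */
    char receiver;        /* specifier of the receiver */
    char selector[64];    /* selector, keyword parts concatenated */
    char params[8];       /* one specifier per argument, NUL-terminated */
    int  nparams;
};

/* Splits `fmt` on spaces.  The first two tokens must be %-specifiers
   (result, receiver); later %-tokens are argument specifiers and any
   other token is appended to the selector.  Returns 0 on success and
   -1 on malformed input. */
int parse_spec(const char *fmt, struct send_spec *out)
{
    int field = 0;   /* leading specifiers seen so far (at most 2) */

    memset(out, 0, sizeof *out);
    while (*fmt) {
        while (*fmt == ' ')
            fmt++;
        if (*fmt == '\0')
            break;

        const char *start = fmt;
        while (*fmt && *fmt != ' ')
            fmt++;
        size_t len = (size_t) (fmt - start);

        if (field < 2) {
            if (len != 2 || start[0] != '%')
                return -1;
            if (field == 0)
                out->result = start[1];
            else
                out->receiver = start[1];
            field++;
        } else if (start[0] == '%') {
            if (len != 2 || out->nparams >= (int) sizeof out->params - 1)
                return -1;
            out->params[out->nparams++] = start[1];
        } else {
            size_t used = strlen(out->selector);
            if (used + len >= sizeof out->selector)
                return -1;
            memcpy(out->selector + used, start, len);
            out->selector[used + len] = '\0';
        }
    }
    return (field == 2 && out->selector[0] != '\0') ? 0 : -1;
}
```

For example, @samp{%b %o includes: %s} yields result @samp{b}, receiver
@samp{o}, selector @code{includes:} and one @samp{s} argument.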
Note that the receiver and parameters are NOT registered in the object
registry (@pxref{Smalltalk types}). @dfn{receiver_type} and
@dfn{paramX_type} can be any of these characters, with these meanings:
@example
Specifier    C data type equivalent    Smalltalk class
   i         long                      Integer (see intToOOP)
   f         double                    Float (see floatToOOP)
   F         long double               Float (see longDoubleToOOP)
   b         int                       True or False (see boolToOOP)
   B         OOP                       BlockClosure
   c         char                      Character (see charToOOP)
   C         PTR                       CObject (see cObjectToOOP)
   s         char *                    String (see stringToOOP)
   S         char *                    Symbol (see symbolToOOP)
   o         OOP                       any
   t         char *, PTR               CObject (see below)
   T         OOP, PTR                  CObject (see below)
   w         wchar_t                   Character (see wcharToOOP)
   W         wchar_t *                 UnicodeString (see wstringToOOP)
@end example
@samp{%t} and @samp{%T} are special in that you need to pass
@emph{two} additional arguments to @code{msgSendf}, not one. The
first is a description of the type of the CObject to be created and
the second is the CObject's address. If you specify @samp{%t}, the
first of the two arguments is converted to a Smalltalk @code{CType}
via @code{typeNameToOOP} (@pxref{Smalltalk types}); if you specify
@samp{%T}, instead, you have to pass directly an OOP for the new
CObject's type.
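Because @samp{%t} and @samp{%T} each consume two arguments, the number of C arguments that follow the format string is not simply the number of specifiers. The following sketch counts them under the rules just described; @code{msgsendf_arg_count} is a hypothetical helper, not a gst function. It assumes, as in the examples earlier in this section, that the result pointer is passed before the format string, so the first (result) specifier consumes no argument after it.

```c
/* Hypothetical helper, NOT part of gst: counts the C arguments that
   must follow the format string in a msgSendf call.
   - the first specifier is the result type: no argument;
   - %t and %T need two arguments (type description, then address);
   - every other receiver/parameter specifier needs one argument. */
static int msgsendf_arg_count(const char *fmt)
{
  int seen = 0, count = 0;

  for (const char *p = fmt; *p != '\0'; p++) {
    if (*p != '%' || p[1] == '\0')
      continue;                 /* selector text or a trailing '%' */
    char spec = *++p;           /* advance to the specifier character */
    if (seen++ == 0)
      continue;                 /* result type */
    count += (spec == 't' || spec == 'T') ? 2 : 1;
  }
  return count;
}
```

With these rules, @code{"%i %i + %i"} needs two arguments, @code{"%v %t printNl"} also needs two (type and address) and @code{"%o %o foo: %T"} needs three.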
For @samp{%B} you should not pass a selector, and the block will be
evaluated.

The type specifiers you can pass for @dfn{result_type} are a bit
different:

@example
Specifier   if nil      C data type     expected result
    i       0L          long            nil or an Integer
    f       0.0         double          nil or a Float
    F       0.0         long double     nil or a Float
    b       0           int             nil or a Boolean
    c       '\0'        char            nil or a Character
    C       NULL        PTR             nil or a CObject
    s       NULL        char *          nil, a String, or a Symbol
    ?       0           char *, PTR     See oopToC
    o       nilOOP      OOP             any (result is not converted)
    w       '\0'        wchar_t         nil or a Character
    W       NULL        wchar_t *       nil or a UnicodeString
    v       /           any             (result is discarded)
@end example
Note that, if the result pointer (the first argument to @code{msgSendf})
is @code{NULL}, the @dfn{result_type} is always treated as @samp{%v}.
If an error occurs, the value in the @samp{if nil} column is stored in
the result variable.
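The two rules above (a @code{NULL} result pointer forces @samp{%v}, and an error yields the value from the @samp{if nil} column) can be restated as code. This is a hypothetical sketch for illustration; neither function exists in gst, and @code{nil_default} merely returns the @samp{if nil} column of the table as text.

```c
#include <stddef.h>

/* Hypothetical: a NULL result pointer turns any result type into 'v'. */
static char effective_result_type(const void *result_ptr, char result_spec)
{
  return result_ptr == NULL ? 'v' : result_spec;
}

/* Hypothetical: the "if nil" column of the result-type table, as text.
   Returns NULL for 'v' (nothing is stored) and for unknown specifiers. */
static const char *nil_default(char spec)
{
  switch (spec) {
  case 'i':                     return "0L";
  case 'f': case 'F':           return "0.0";
  case 'b': case '?':           return "0";
  case 'c': case 'w':           return "'\\0'";
  case 'C': case 's': case 'W': return "NULL";
  case 'o':                     return "nilOOP";
  default:                      return NULL;
  }
}
```

One consequence follows from the table: with a scalar specifier such as @samp{%i}, a @code{nil} result and a genuine @code{0} both produce @code{0L}. If the distinction matters, receiving the result with @samp{%o} keeps the two apart, since @code{nil} then arrives as @code{nilOOP}.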
@node Smalltalk callbacks
@section Smalltalk blocks as C function pointers
The Smalltalk callin mechanism can be used effectively to construct
bindings to C libraries that require callbacks into Smalltalk.
However, it is a ``static'' mechanism, as the callback functions
passed to the libraries have to be written in C and their type
signatures are fixed.
If the signatures of the callbacks are not known in advance,
and the only way to define callbacks is via C function pointers (as
opposed to reflective mechanisms such as the ones in GTK+), then
the @code{VMProxy} functions for Smalltalk callin are not enough.
@gst{} provides a more dynamic way to convert Smalltalk blocks into
C function pointers through the @code{CCallbackDescriptor} class.
This class has a constructor method that is similar to the
@code{cCall:} annotation used for callouts. The method is
called @code{for:returning:withArgs:} and its parameters are:
@itemize @bullet
@item a block, which takes as many arguments as the C callback receives
@item a symbol representing the return type
@item an array representing the type of the arguments.
@end itemize
The array passed as the third parameter represents values that
are passed @emph{from C to Smalltalk} and, as such, should follow
the same rules that apply to the @emph{return
type} of a C callout. In particular, if the C callback
accepts an @code{int *} it is possible (and indeed useful)
to specify the type of the argument as @code{#@{CInt@}},
so that the block will receive a @code{CInt} object.
Here is an example of creating a callback which is passed to
@code{glutReshapeFunc}@footnote{The GLUT bindings use a different
scheme for setting up callbacks.}. The desired
signature in C is @code{void (*) (int, int)}.

@example
| glut |
glut glutReshapeFunc: (CCallbackDescriptor
        for: [ :x :y | self reshape: x@@y ]
        returning: #void
        withArgs: #(#int #int))
@end example
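On the C side, nothing special is required: the library only ever sees an ordinary function pointer of type @code{void (*) (int, int)}. The sketch below shows what such a library typically does with it. @code{lib_set_reshape_func}, @code{lib_window_resized} and @code{record_size} are hypothetical names, and @code{record_size} is a plain C function standing in for the pointer that a @code{CCallbackDescriptor} would supply.

```c
#include <stddef.h>

/* Hypothetical C library, names are illustrative only.  The callback
   type matches the descriptor above: returning #void, withArgs
   #(#int #int). */
typedef void (*reshape_callback)(int width, int height);

static reshape_callback current_reshape = NULL;

/* Registration entry point, analogous to glutReshapeFunc. */
static void lib_set_reshape_func(reshape_callback cb)
{
  current_reshape = cb;
}

/* Called by the library when the window size changes. */
static void lib_window_resized(int width, int height)
{
  if (current_reshape != NULL)
    current_reshape(width, height);
}

/* Plain C stand-in for the Smalltalk block; with CCallbackDescriptor,
   the block [ :x :y | self reshape: x@y ] would run here instead. */
static int last_width, last_height;

static void record_size(int width, int height)
{
  last_width = width;
  last_height = height;
}
```

The library cannot tell whether the pointer it stored leads to a C function or to a Smalltalk block, which is why such a callback must be valid (re-linked, if necessary) before the library calls it.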
It is important to note that this kind of callback does not survive
across an image load (this restriction may be lifted in a future version).
After the image is loaded, the callback has to be reset by sending it the
@code{link} message before it is passed to any C function. Sending the
@code{link} message to an already valid callback is harmless and cheap.
@node Other C functions
@section Other functions available to modules
In addition to the functions described so far, the @code{VMProxy} that is
available to modules contains entry-points for many functions that aid
in developing @gst{} extensions in C. This node documents these
functions and the macros that are defined by @file{libgst/gstpub.h}.
@deftypefun void asyncCall (void (*) (OOP), OOP)
This function accepts a function pointer and an OOP (or @code{NULL}, but
not an arbitrary pointer) and sets up the interpreter to call the
function, passing it the OOP, as soon as the next message send is
executed.

@emph{Caution:} this function, @code{asyncSignal} and
@code{asyncSignalAndUnregister} are the only functions in the
@code{VMProxy} that are thread-safe.
@end deftypefun
@deftypefun void asyncSignal (OOP)
This function accepts an OOP for a @code{Semaphore} object and signals
that object so that one of the processes waiting on that semaphore is
woken up. Since a Smalltalk call-in is not an atomic operation, the
correct way to signal a semaphore is not to send the @code{signal}
method to the object but, rather, to use:

@example
asyncSignal(semaphoreOOP)
@end example

The signal request will be processed as soon as the next message send is
executed.
@end deftypefun
@deftypefun void asyncSignalAndUnregister (OOP)
This function accepts an OOP for a @code{Semaphore} object and signals
that object so that one of the processes waiting on that semaphore is
woken up; the signal request will be processed as soon as the next
message send is executed. The object is then removed from the registry.
@end deftypefun
@deftypefun void wakeUp (void)
When no Smalltalk process is running, @gst{} tries to limit CPU usage
by pausing until it gets a signal from the OS. @code{wakeUp} is an
alternative way to wake up the main Smalltalk loop. This should rarely
be necessary, since the above functions already call it automatically.
@end deftypefun
@deftypefun void syncSignal (OOP, mst_Boolean)
This function accepts an OOP for a @code{Semaphore} object and signals
that object so that one of the processes waiting on that semaphore is
woken up. If the semaphore has no process waiting in the queue and
the second argument is true, an excess signal is added to the semaphore.
Since a Smalltalk call-in is not an atomic operation, the correct way to
signal a semaphore is not to send the @code{signal} or @code{notify}
methods to the object but, rather, to use:

@example
syncSignal(semaphoreOOP, true)
@end example

The @code{sync} in the name of this function distinguishes it from
@code{asyncSignal}, in that it can only be called from a procedure
already scheduled with @code{asyncCall}. It cannot be called from
a call-in, or from threads other than the interpreter thread.
@end deftypefun
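The rule that @code{syncSignal} may only run inside a procedure scheduled with @code{asyncCall} suggests a two-step pattern: code that notices an event (possibly in another thread, since @code{asyncCall} is thread-safe) schedules a procedure with @code{asyncCall}, and that procedure, running in the interpreter thread, signals the semaphore. The sketch below models this pattern with a toy proxy so that it compiles on its own. @code{struct toy_proxy}, @code{toy_run_pending} and the other helper names are hypothetical stand-ins, not gst's @code{VMProxy} or its implementation, and the real @code{syncSignal} takes an @code{mst_Boolean} rather than an @code{int}.

```c
#include <stddef.h>

/* Toy model, NOT the real gst VMProxy: everything here is a
   hypothetical stand-in so the calling pattern runs on its own. */
typedef void *OOP;

struct toy_proxy {
  void (*asyncCall)(void (*func)(OOP), OOP arg);
  void (*syncSignal)(OOP sem, int incr_if_empty);
};

static void (*pending_func)(OOP) = NULL;
static OOP pending_arg = NULL;
static int signals_sent = 0;

static void toy_async_call(void (*func)(OOP), OOP arg)
{
  pending_func = func;          /* remember the request for later */
  pending_arg = arg;
}

static void toy_sync_signal(OOP sem, int incr_if_empty)
{
  (void) sem;
  (void) incr_if_empty;
  signals_sent++;               /* just count signals in this model */
}

static struct toy_proxy proxy = { toy_async_call, toy_sync_signal };
static struct toy_proxy *vmProxy = &proxy;

/* Simulates "the next message send": the interpreter thread runs
   the procedure scheduled with asyncCall. */
static void toy_run_pending(void)
{
  void (*func)(OOP) = pending_func;
  pending_func = NULL;
  if (func != NULL)
    func(pending_arg);
}

/* Step 2: runs in the interpreter thread, so syncSignal is allowed. */
static void signal_from_interpreter(OOP semaphoreOOP)
{
  vmProxy->syncSignal(semaphoreOOP, 1);
}

/* Step 1: called when an event completes; it only schedules work. */
static void on_event_completed(OOP semaphoreOOP)
{
  vmProxy->asyncCall(signal_from_interpreter, semaphoreOOP);
}
```

For a bare signal, @code{asyncSignal} alone already does the job; the two-step form is useful when the scheduled procedure has other work to do in the interpreter thread before signaling.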
@deftypefun void syncWait (OOP)
This function is present for backwards-compatibility only and
should not be used.
@end deftypefun
@deftypefun void showBacktrace (FILE *)
This function shows a backtrace on the given file.