Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
branch: master
Fetching contributors…

Cannot retrieve contributors at this time

758 lines (645 sloc) 24.551 kb
<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE chapter SYSTEM "chapter.dtd">
<chapter>
<header>
<title>Academic and Historical Questions</title>
<prepared>Matthias Lang</prepared>
<docno></docno>
<date>2009-04-19</date>
<rev>1.1</rev>
<file>academic.xml</file>
</header>
<section><title> Why is Erlang called "Erlang"?</title>
<p>
Erlang is named after the mathematician Agner Erlang. Among other
things, he worked on queueing theory. The original implementors
also tend to avoid denying that Erlang sounds a little like
"ERicsson LANGuage".
</p></section>
<section><title>Who invented Erlang?</title>
<p>
During the 1980s there was a project at the Ericsson
<url href="http://www.cs-lab.org/">
Computer Science Laboratory</url>
which aimed to find out what aspects of computer languages
made it easier to program telecommunications systems. Erlang
emerged in the second half of the 80s was the result of
taking those features which made writing such systems
simpler and avoiding those which made them more complex
or error prone.
</p><p>
The people involved at the start were Joe Armstrong,
Robert Virding and Mike Williams. Others joined later
and added things like distribution and OTP.
</p></section>
<section><title>Where does Erlang syntax come from?</title>
<p>
Mostly from prolog. Erlang started life as a modified prolog.
! as the send-message operator comes from CSP. Eripascal
was probably responsible for , and ; being separators and
not terminators.
</p></section>
<section><title>Why is Ericsson giving away Erlang?</title>
<p>
(The following is my personal impression. I don't speak
for Ericsson!)
</p><p>
<em>Nothing to lose</em>: Ericsson's core business is
telecommunications products, selling programming
tools is not really a business Ericsson is interested in.
</p><p>
<em>Stimulate adoption</em>: Erlang is a great
language for many sorts of systems. Releasing a
good, free development environment is likely to
make Erlang catch on faster.
</p><p>
<em>Generate goodwill</em>: Giving away cool software can
only improve Ericsson's image, especially given the
current level of media attention around "open software".
</p><p>
</p></section>
<section><marker id="soft-realtime"/>
<title>What does soft realtime mean?</title>
<p>
Cynics will say "basically nothing".
</p><p>
A <em>hard realtime</em>
system is one which can guarantee that a certain action will
always be carried out in less than a certain time. Many simple
embedded systems can make hard realtime guarantees, e.g.
it is possible to guarantee that a particular interrupt
service routine on a Z80 CPU will never take more than
34us. It gets progressively harder to make such guarantees for
more complex systems.
</p><p>
Many telecomms systems have less strict requirements, for
instance they might require a statistical guarantee
along the lines of "a database lookup takes less than 20ms
in 97% of cases". Soft realtime
systems, such as Erlang, let you make that sort of guarantee.
</p><p>
A rule of thumb is that it is straightforward to write
Erlang programs which can respond to
external events within a few milliseconds.
The parts of Erlang which help with this are:
</p>
<list>
<item><p>Language features which make it hard(er)
for programmers to roughly estimate performance were
never added to Erlang.
For instance, Erlang doesn't have lazy evaluation.
</p></item>
<item><p>Erlang processes are very lightweight, much lighter
than an operating system thread. Switching
between Erlang processes is typically an order
of magnitude or two faster than switching between
OS threads.
</p></item>
<item><p>Each Erlang process is garbage collected separately.
This avoids the (relatively) long delay which would
occur if there was a single large heap for all
processes.
</p></item>
</list>
</section>
<section><title> How did the first Erlang compiler get written?</title>
<p>
(or: <em>how was Erlang bootstrapped?</em>) In Joe's words:
</p>
<quote>
<p>
First I designed an abstract machine to execute Erlang.
This was called the JAM machine; JAM = Joe's
Abstract Machine.
</p>
</quote>
<quote>
<p>
Then I wrote a compiler from Erlang to JAM
and an emulator to see if the machine worked.
Both these were written in prolog.
</p>
</quote>
<quote>
<p>
At the same time Mike Williams wrote a C emulator for the JAM.
</p>
</quote>
<quote>
<p>
Then I rewrote the erlang-to-jam compiler in Erlang and
used the prolog compiler to compile it. The resultant
object code was run in the C emulator. Then we threw away prolog.
</p>
</quote>
<p>
Some of this is described in <url
href="http://www.erlang.se/publications/prac_appl_prolog.ps">
an old paper</url>
</p></section>
<section><title> Erlang Syntax and Semantics Details</title>
<p>
When in doubt about exactly what the language allows or does,
the best place to start is the
<url href="http://www.erlang.org/download/erl_spec47.ps.gz">
Erlang Specification</url>. This is still a work
in progress, but it covers most of the language.
</p></section>
<section><title> Is the order of message reception guaranteed?</title>
<p>
Yes, but only within one process.
</p><p>
If there is a live process and you send it message A and then
message B, it's guaranteed that if message B arrived, message
A arrived before it.
</p><p>
On the other hand, imagine processes P, Q and R. P sends
message A to Q, and then message B to R. There is no
guarantee that A arrives before B. (Distributed Erlang would
have a pretty tough time if this was required!)
</p>
</section>
<section><title>If I send a message, is it guaranteed to reach the receiver?
</title>
<p>
Most people find it simplest to program as though the answer
was <em>"yes, always"</em>.
</p><p>
Per Hedeland covered the issues on the mailing list (edited a bit):
</p>
<quote>
<p>
"Delivery is guaranteed if nothing breaks" - and if something
breaks, you will find
out provided you've used <c>link/1</c>. I.e.
you will get an <c>EXIT</c> signal
not only if the linked process dies, but
also if the entire remote node crashes, or the network is
broken, or if any of these happen <em>before</em>
you do the link.
</p><p>
It seems this issue of "guaranteed delivery" comes up every
now and then, but I've never managed to find out exactly what
it is those that are asking for it actually want:
</p>
</quote>
<list>
<item><p>
A guarantee that the message is put into the receiver's
input queue? But if the receiver dies before extracting
it from there, that guarantee is useless, as the message
is lost anyway.
</p></item>
<item><p> A guarantee that the receiver extracts the message from its
input queue? Well, besides the obvious problem that
depending on how the receiver is written, even if
it lives happily ever after it may <em>never
</em> extract
that particular message, it suffers from a variant of
the previous problem: Even if you "know" that the
receiver has "consumed" the message, it may die before
acting on it in any way, and then again it may as well
never have been sent.
</p></item>
<item><p> A guarantee that the receiver actually
<em>processes</em>
the message? Just kidding of course, hopefully it's
obvious to everyone that the only way to obtain such
a guarantee, regardless of what programming and
communication system you use, is that the receiver
is programmed to send an explicit acknowledgment
when the processing is complete (of course this may
be hidden below an abstraction such as RPC, but the
fundamental principle holds).
</p></item>
</list>
<quote>
<p>
Add to this that any guarantee would <em>have</em>
to entail some
form of ack from the remote in at least a distributed
system, even if it wasn't directly visible to the
programmer. E.g. you could have '!' block until the ack
comes back from the remote saying that the message had
progressed however far you required - i.e. synchronous
communication of sorts. But this would penalize those that
<em>don't</em> require the "guarantee" and
<em>want</em> asynchronous communication.
</p><p>
So, depending on your requirements, Erlang offers you
<em>at least</em> these levels of "guarantee":
</p>
</quote>
<section>
<title>Super-safe</title>
<p>
Receiver sends ack after processing;
sender links, sends, waits for <c>ack</c>
or <c>EXIT</c>.
This means the sender knows, for each message, whether
it was fully processed or not.
</p>
</section>
<section>
<title>Medium-safe</title>
<p>
Receiver doesn't send acks; Sender links,
sends message(s). This means an <c>EXIT</c>
signal informs the sender that some messages may never have
been processed.
</p>
</section>
<section>
<title>Pretty-safe</title>
<p>
Receiver doesn't send acks; sender sends messages. :-)
</p>
</section>
<quote>
<p>
There are any number of combinations of these (e.g. receiver sends
ack not after each message but at some critical points in the
processing).
</p>
</quote>
<p>
Per concluded by pointing out that "if you think TCP guarantees
delivery, which most people probably do, then so does Erlang".
</p></section>
<section><title> What limits does Erlang have?</title>
<p>
The Erlang language doesn't specify any limits, but
different implementations have different limits on the
number of processes, the maximum amount of RAM and so on.
These are
<url href="http://erlang.org/doc/efficiency_guide/advanced.html#9.2">documented for each implementation</url>.
</p></section>
<section><title>Do funs behave during a code change?</title>
<p>
Yes and no. A fun is a reference to code; it is not the
code itself. Thus a fun can only be evaluated if the code
it refers to is actually loaded on the evaluating node.
In some cases, we can be sure that execution will never
return to a fun. In these cases the old code can be purged
without problems. The following code causes no surprises:
</p>
<code><![CDATA[
Parent = self(),
F = fun() -> loop(Parent) end,
spawn(F).
]]></code>
<p>
On the other hand, a bound fun will not be replaced.
In the following example, the old verson of F is
executed even after a code change:
</p>
<code><![CDATA[
-module(cc).
-export([go/0, loop/1]).
go() ->
F = fun() -> 5 end,
spawn(fun() -> loop(F) end).
loop(F) ->
timer:sleep(1000),
F(),
cc:loop(F).
]]></code>
<p>
This type of problem can be solved in the
<c>code_change/2</c> function in the standard behaviours.
</p></section>
<section><title>Examples of bound funs causing surprises</title>
<p>
Some typical ways to get in trouble are:
</p>
<list>
<item><p>Mnesia. If you store funs in mnesia, you need to
have a way to update the table (i.e. replace the
fun references) when you load code.</p></item>
<item><p>Serialisation. The same deal. If you serialise a
fun (e.g. <c>term_to_binary/1</c>) and
then unserialise it, you can only evaluate the fun
if the code it refers to still exists. In general,
serialising a fun, saving it to a file and then
loading it in another emulator will not do what you
hoped.</p></item>
<item><p>Message passing. Sending a fun to a remote node
always works, but the fun can only be evaluated on
a node where the same version of the same module is
available.</p></item>
</list>
<p>
A general way to avoid problems with funs which refer to code
which doesn't exist is to store the function by name instead
of by reference, e.g. by writing <c>fun M:F/A</c>.
</p></section>
<section><title>Is it just me, or are records ugly?</title>
<p>
Compared to the rest of Erlang, records are rather
ugly and error prone. They're ugly because they
require an awful lot of typing (no pun intended).
They're error prone because the usual method of defining
records, the <c>-include</c> directive,
provides no protection against multiple, incompatible
definitions of records.
</p><p>
Several ways forward have been explored. One is lisp-like
<em>structs</em> which have been discussed on <url
href="http://erlang.org/pipermail/erlang-questions/2003-January/006684.html">the
mailing list.</url>
Another is Richard O'Keefe's <em>abstract patterns</em> which
is an <url
href="http://www.erlang.org/eeps/eep-0029.html">Erlang Enhancement
Proposal</url>.
Then there is also a suggestion for <url href="http://erlang.org/pipermail/erlang-questions/2005-September/017201.html">making records more reliable</url>.
</p></section>
<section>
<marker id="priority-receive"/>
<title>Can a process receive messages with varying priorities?</title>
<p>
Sure. Here's an example of how to do it using nested receives:
</p>
<code><![CDATA[
receive ->
{priority_msg, Data1} -> priority(Data1)
after
0 ->
receive
{priority_msg, Data1} -> priority(Data1)
{normal_msg, Data2} -> normal(Data2)
end
end.
]]></code>
</section>
<section><title>What's the history of static type checking in Erlang?</title>
<p>
Erlang ships with a static type analysis system called the
<url
href="http://www.it.uu.se/research/group/hipe/dialyzer/">Dialyzer</url>.
Using the dialyzer is optional, though many or most serious
projects use it. The Dialyzer does not require source code to
be modified or annotated, though annotations increase the
number of problems the Dialyzer can find.
</p>
<p>
Prior to about 2005, static type checking was used rarely in
commercial Erlang-based systems. Several people experimented
with various approaches to the problem, including Sven-Olof
Nyström, Joe Armstrong, Philip Wadler, Simon Marlow and Thomas
Arts.
</p>
</section>
<!-- This section is inspired by a post by ROK 23-Oct-2008 -->
<section><title>How does type checking in Erlang compare to
Java/Python/C/C++/Haskell?</title>
<p>
Erlang itself, i.e. <em>ignoring</em> the Dialyzer, uses a
dynamic type system. All type checking is done
at run-time, the compiler does not check types at compile
time. The run-time type system cannot be defeated. This is
comparable to the type systems in Lisp, Smalltalk, Python,
Javascript, Prolog and others.
</p>
<p>
Java, Eiffel and some other languages have type systems which
are mostly checked at compile time, but with some remaining
checking done at run time. The combination of checking cannot
be defeated. Such type systems provide some guarantees about
types which can be exploited by the compiler, this can be
useful for optimisation.
</p>
<p>
Haskell, Mercury and some other languages have type systems
which are completely checked at compile time. This type system
cannot be defeated. The type system in this type of language
is also a design tool, it increases the language's
expressiveness.
</p>
<p>
C, pascal and C++ have type systems which are checked at
compile time, but can be defeated by straightforward means
provided by the language.
</p>
</section>
<section><title>What's Erlang's relation to 'object oriented' programming?</title>
<p>
Many people think of Erlang as a language oriented towards
concurrency and message passing, which happens to be
functional. I.e. object-oriented isn't mentioned.
</p>
<p>
Another common point of view is to try and define 'object oriented'
and then look at how Erlang fits that definition. For instance, a
relatively noncontroversial list of OO features is: dynamic
dispatch, encapsulation, subtype polymorphism, inheritance and open
recursion. If you regard an Erlang process as an object, then three
of the five features are clearly there---e.g. 'open recursion' is
just sending a message to self(). The two others, subtype
polymorphism and inheritance, are less naturally modelled.
</p>
<p>
Still another approach is to elevate the importance of message
passing in OO. Alan Kay is often brought up in such discussions for
three reasons: he's credited with cointing the term 'object
oriented' and message passing is the first thing he mentions when
explaining what he means by 'object oriented' (for instance in
<url href='http://www.purl.org/stefan_ram/pub/doc_kay_oop_en'>
this email</url>) and he doesn't specify a rigid interpretation of
how <em>inheritance</em> should work.
</p>
<p>
Various people have published papers on 'how to do OO in Erlang",
the first book (now defunct) about Erlang featured a whole chapter
on the subject. In practice, few, if any, real Erlang systems
attempt to follow those 'OO Erlang' conventions. The <url
href="http://ceylan.sourceforge.net/main/documentation/wooper/">
WOOPER</url> project is one example of a serious effort to put an OO
layer on top of Erlang.
</p>
</section>
<section>
<marker id="string-performance"/>
<title>How are strings implemented and how can I speed up my string code?</title>
<p>
Strings are represented as linked lists, so each character
takes 8 octets of memory on a 32 bit machine, twice as much on a 64
bit machine. Access to the Nth element is O(N). This makes it
easy to accidentally write O(N^2) code. Here's an example of
how that can happen:
</p>
<code><![CDATA[
slurp(Socket) -> slurp(Socket, "").
slurp(Socket, Acc) ->
case gen_tcp:recv(Socket, 1) of
{ok, [Byte]} -> slurp(Socket, Acc ++ Byte);
_ -> Acc
end.
]]></code>
<p>
The bit-syntax provides an alternative way of handling
strings with different space/time tradeoffs to the
list implementation.
</p>
<p>Some techniques for improving performance of string-related
code:
</p>
<list>
<item><p>Avoid O(N^2) operations. Sometimes this means building
a longer string backwards and then using <c>reverse
</c>
in the final step. Other times, it's better to build
up a string as a list of lists and then let the io
functions flatten the list.
</p></item>
<item><p>Consider using an alternative representation of
your data, e.g. instead of representing the text as a
list of characters, it may be better to represent it as
a tree of words.
</p></item>
<item><p>If you're reading and writing data from sockets,
consider using binaries instead of lists of bytes. If
all you're doing is copying data from one socket to
another, this will be (much) faster. More generally,
the bit-syntax provides an alternative way of handling
strings with different space/time tradeoffs to the list
implementation.
</p></item>
<item><p>Write speed-critical (profile first!) code in C,
and then put it in your erlang system as a port-driver or a
linked-in driver.
</p></item>
<item><p>Ports (and sockets) flatten lists for you,
so flattening lists yourself slows things down.
This can also be used to create more readable code:
</p>
<code><![CDATA[
gen_tcp:send(Socket, ["GET ", Url, " HTTP/1.0", "\r\n\r\n"]).
]]></code>
</item>
</list>
</section>
<section><title> How does the Garbage Collector (GC) work?</title>
<p>
GC in Erlang works independently on each Erlang process, i.e.
each Erlang process has its own heap, and that heap is GCed
independently of other processes' heaps.
</p>
<p>
The current default GC is a "stop the world" generational mark-sweep
collector. On Erlang systems running with multiple threads (the
default on systems with more than one core), GC stops work on the
Erlang process being GCed, but other Erlang processes on other OS
threads within the same VM continue to run. The time the process
spends stopped is normally short because the size of one process'
heap is normally relatively small; much smaller than the combined
size of all processes heaps.
</p>
<p>
The GC strategy for a newly started Erlang process is
full-sweep. Once the process' live data grows above a certain size,
the GC switches to a generational strategy. If the generational
strategy reclaims less than a certain amount, the GC reverts to a
full sweep. If the full sweep also fails to recover enough space,
then the heap size is increased.
</p>
</section>
<section><title>Can I exploit knowledge about my program to make the GC more efficient?</title>
<p>
Maybe, but you might be better off expending effort on
thinking of other ways to make your system go faster.
</p><p>
One version of <c>spawn/4</c> accepts a list of
options as the last argument. Ulf Wiger (AXD301) says
that the <c>gc_switch</c> and <c>min_heap_size</c> can be
used to obtain better performance if you do some measuring,
benchmarking and thinking. One 'win' happens when you can
completely avoid GC in short-lived processes by setting their
initial heap large enough to avoid all GC during the
process' life.
</p><p>
<c>min_heap_size</c> can be useful when you <em>know</em> that a
certain process will rapidly grow its heap to well above
the system's default size. Under such circumstances you get
particularly bad GC performance with the current GC
implementation.
</p><p>
<c>gc_switch</c> affects the point at
which the garbage collector
changes from its full-sweep algorithm to its generational
algorithm. Again, in a rapidly growing heap which doesn't
contain many binaries, GC <em>might</em> perform
better with a higher threshold.
</p><p>
</p></section>
<section><title>Could the Java Virtual Machine be used to run Erlang?</title>
<p>
Yes.</p>
<p><url href="http://www.erjang.org/">Erjang</url> does exactly
that. It's an experimental system with some impressive results.
</p></section>
<section><title>Is it possible to prove the correctness of Erlang programs?
</title>
<p>
One of the touted advantages of functional programming
languages is that it is easier to formally reason about
programs and prove certain properties of a given program.
</p><p>
Two tools used to do that are
<url href="https://github.com/fredlund/McErlang">
MCErlang</url> and
<url href="https://github.com/mariachris/Concuerror">
Concuerror</url>.
</p></section>
<section><title>I've heard it's a bad idea to program defensively in Erlang. Why?</title>
<p>
The <url href="http://www.erlang.se/doc/programming_rules.shtml">
Erlang coding guidelines</url> suggest avoiding defensive
programming. The choice of the term "defensive programming"
is unfortunate, because it is usually associated with good
practice. The point of the recommendation is that allowing
an Erlang process to exit when things go wrong inside the
Erlang program is a <em>good</em> approach, i.e.
writing code which attempts to avoid an exit is usually a bad idea.
</p><p>
For example, when parsing an integer it makes perfect
sense to just write
</p>
<code><![CDATA[
I = list_to_integer(L)
]]></code>
<p>
if L is not an integer, the process will exit and a supervisor
somewhere will restart that part of the system, reporting an
error:
</p>
<code><![CDATA[
=ERROR REPORT==== 12-Mar-2003::13:04:08 ===
Error in process <0.25.0> with exit value: {badarg,[{erlang,list_to_integer,[bla]},{erl_eval,expr,3},{erl_eval,exprs,4},{shell,eval_loop,2}]}
** exited: {badarg,[{erlang,list_to_integer,[bla]},
{erl_eval,expr,3},
{erl_eval,exprs,4},
{shell,eval_loop,2}]} **
]]></code>
<p>
If a more descriptive diagnostic is required, use a manual exit:
</p>
<code><![CDATA[
uppercase_ascii(C) when C >= $a, C =< $z ->
C - ($a - $A);
uppercase_ascii(X) ->
exit({"uppercase_ascii given non-lowercase argument", X}).
]]></code>
<p>
This separation of error detection and error handling is a key
part of Erlang. It reduces complexity in fault-tolerant systems
by keeping the normal and error-handling code separate.
</p><p>
As for most most advice, there are exceptions to the recommendation. One
example is the case where input is coming from an untrusted
interface, e.g. a user or an external program.
</p><p>
Joe's
<url href="http://erlang.org/pipermail/erlang-questions/2003-March/007870.html">original explanation</url> is available online.
</p>
</section>
</chapter>
Jump to Line
Something went wrong with that request. Please try again.