Permalink
Browse files

""

git-svn-id: https://erlyaws.svn.sourceforge.net/svnroot/erlyaws/trunk/yaws@863 9fbdc01b-0d2c-0410-bfb7-fb27d70d8b52
  • Loading branch information...
1 parent 8120a8b commit c416324c24144e917aec2f3b19c5790300042914 @klacke committed Apr 11, 2005
View
2 www/TAB.inc
@@ -51,7 +51,7 @@ PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
<a href="/shopingcart/index.yaws">Tiny shopping cart</a>
<div class="%%embed%%"> <a href="/embed.yaws">Embedding Yaws</a></div>
<h4> Misc </h4>
-<div class="%%internals%%"> <a href="/internals.yaws">Simple</a> </div>
+<div class="%%internals%%"> <a href="/internals.yaws">Internals</a> </div>
View
BIN www/compile_layout.dia
Binary file not shown.
View
BIN www/compile_layout.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
View
221 www/internals.yaws
@@ -7,12 +7,24 @@ out(A) ->
<div id="entry">
<h1>Internals</h1>
+
+ <h2>Introduction</h2>
<p>I'll try to describe some of the internal workings of Yaws in this page.
The page is thus mostly interesting for people intereted in either hacking Yaws
or simply wanting to get a better understanding.
</p>
+
+ <p>I'll describe how Yaws pages get compiled, the process structure
+ and other things which can make it easier to understand the code. This page
+ is ment to be read by programmers that wish to either work on Yaws or
+ just get a better understanding.
+ </p>
+
+
+ <h2> JIT Compiling a .yaws page</h2>
+
<p>
When the client GETs a a page that has a .yaws suffix. The Yaws server
will read that page from the hard disk and divide it in parts
@@ -22,18 +34,219 @@ out(A) ->
proper error message into the generated HTML output.
</p>
- <p>When the \Yaws\ server ships a .yaws page it will process it chunk by chunk
+
+ <p>When the Yaws server ships a .yaws page it will process it chunk by chunk
through the .yaws file. If it is HTML code, the server will ship that
as is, whereas if it is \Erlang\ code, the \Yaws\ server will invoke the
- \verb+out/1+ function in that code and insert the output of that \verb+out/1+
+ <tt>out/1</tt> function in that code and insert the output of that <tt>out/1</tt>
function into the stream
of HTML that is being shipped to the client.
</p>
- <p>\Yaws\ will (of course) cache the result of the compilation
- and the next time a client requests the same .yaws page \Yaws\ will
+ <p>Yaws will cache the result of the compilation
+ and the next time a client requests the same .yaws page Yaws will
be able to invoke the already compiled modules directly.
</p>
+
+
+ <p>This is best explained by an example:</p>
+
+ <p>Say that a file consists of 400 bytes, we have "foo.yaws"
+ and it looks like:</p>
+
+ <p>
+ <img src="compile_layout.png" />
+ </p>
+
+ <p>When a client request the file "foo.yaws", the webserver will
+ look in its cache for the file, (more on that later). For the sake of
+ argument, we assume the file is not in the cache.
+
+ </p>
+ <p>The file will be processes by the code in <tt>yaws_compile.erl</tt>
+ and the result will be a structure that looks like:</p>
+
+ <div class="box">
+ <verbatim>
+
+ [CodeSpec]
+ CodeSpec = Data | Code | Error
+ Data = {data, NumChars}
+ Code = {mod, LineNo, YawsFile, NumSkipChars, Mod, Func}
+ Err = {error, NumSkipChars, E}
+
+ </verbatim>
+ </div>
+
+
+ <p>In the particular case of our "foo.yaws" file above, the JIT
+ compiler will return:
+ </p>
+
+ <div class="box">
+ <verbatim>
+
+ [{mod, 1, "/foo.yaws", 100, m1, out},
+ {data, 200},
+ {mod, 30, "/foo.yaws", 100, m2, out}
+ ]
+
+ </verbatim>
+ </div>
+
+ <p>
+ This structure gets stored in the cache and will continue
+ to be associated to the file "foo.yaws".
+ </p>
+ <p>When the server "ships" a .yaws page, it needs the <tt>CodeSpec</tt>
+ structure to do it. If the structure is not in the cache, the page
+ gets JIT compiled and inserted into the cache.
+ </p>
+ <p>To ship the above <tt>CodeSpec</tt> structure, the server
+ performs the following steps:</p>
+ <ol>
+ <li>Create the Arg structure which is a #arg{} record, this
+ structure is wellknown to all yaws programmers since it's the
+ main mechanism to pass data from the server to the .yaws
+ page.</li>
+ <li>Item (1) Invoke <tt>m1:out(Arg)</tt></li>
+ <li>Look at the return value from <tt>m1:out(Arg)</tt> and
+ perform whatever is requested. This typically involves generating
+ some dynamic ehtml code, generate headers or whatever.
+ </li>
+ <li>Finally jump ahead 100 bytes in the file as a result of
+ processing the first <tt>CodeSpec</tt> item.</li>
+
+ <li>Item (2) Next <tt>CodeSpec</tt> is just plain data from the file,
+ thus we read 200 bytes from the file (or rather from the cache
+ since the data will be there) and ship to the client.</li>
+
+ <li>Item (3) Yet another {mod structure which is handled
+ the same way as Item (1) above except that the erlang module
+ is <tt>m2</tt> instead of <tt>m1</tt></li>
+ </ol>
+
+ <p>Another thing that is worth mentioning is that yaws will
+ not ship (write on the socket) data until all content is generated.
+ This is questionable
+ and different from what i.e. PHP does. This makes it possible to
+ generate headers after content has been generated.
+ </p>
+
+
+
+ <h2>Process structure</h2>
+
+ <p>Before describing the process structure, I need to describe
+ the two most important datastructures in Yaws. The <tt>#gconf{}</tt>
+ and the <tt>#sconf{}</tt> records.
+ </p>
+
+ <h3>The <tt>#gconf{}</tt> record</h3>
+ <p>This record is used to hold all global state, i.e. state and configuration
+ data which is valid for all Virtual servers.
+ The record looks like:
+ </p>
+ <div class="box">
+ <verbatim>
+
+ %%global conf
+ record(gconf,{
+ yaws_dir, %% topdir of Yaws installation
+ trace, %% false | {true,http}|{true,traffic}
+ flags = ?GC_DEF, %% boolean flags
+ logdir,
+ ebin_dir = [],
+ runmods = [], %% runmods for entire server
+ keepalive_timeout = 15000,
+ max_num_cached_files = 400,
+ max_num_cached_bytes = 1000000, %% 1 MEG
+ max_size_cached_file = 8000,
+ large_file_chunk_size = 10240,
+ log_wrap_size = 1000000, % wrap logs after 1M
+ cache_refresh_secs = 30, % seconds (auto zero when debug)
+ include_dir = [], %% list of inc dirs for .yaws files
+ phpexe = "php", %% cgi capable php executable
+ yaws, %% server string
+ username, %% maybe run as a different user than root
+ uid, %% unix uid of user that started yaws
+ id = "default" %% string identifying this instance of yaws
+ }).
+
+ </verbatim>
+ </div>
+
+ <p>The structure is derived from the /etc/yaws.conf file and is passed
+ around all through the functions in the server.
+ </p>
+
+ <h3> The <tt>#sconf{}</tt> record</h3>
+ <p>The next important datastructure is the <tt>#sconf{}</tt> record. It
+ is used to describe a single virtual server.
+ <p>Each:
+ </p>
+ <p>
+ <verbatim>
+ <server>
+ .....
+ </server>
+ </verbatim>
+ </p>
+ <p>In the /etc/yaws.conf file corresponds to one <tt>#sconf{}</tt>
+ record. We have: </p>
+
+ <div class="box">
+ <verbatim>
+ %% server conf
+ -record(sconf,
+ {port = 8000, %% which port is this server listening to
+ flags = ?SC_DEF,
+ rhost, %% forced redirect host (+ optional port)
+ rmethod, %% forced redirect method
+ docroot, %% path to the docs
+ listen = {127,0,0,1}, %% bind to this IP, {0,0,0,0} is possible
+ servername = "localhost", %% servername is what Host: header is
+ ets, %% local store for this server
+ ssl,
+ authdirs = [],
+ partial_post_size = nolimit,
+ appmods = [], %% list of modules for this app
+ errormod_404 = yaws_404, %% the default 404 error module
+ errormod_crash = yaws_404, %% use the same module for crashes
+ arg_rewrite_mod = yaws,
+ opaque = [], %% useful in embedded mode
+ start_mod, %% user provided module to be started
+ allowed_scripts = [yaws],
+ revproxy = []
+ }).
+
+ </verbatim>
+ </div>
+
+ <p>Both of these two structures are defined in "yaws.hrl"</p>
+
+ <p>Now we're ready to describe the process structure. We have:</p>
+
+ <img src="process_tree.png" />
+
+ <p>Thus, all the different "servers" defined in the configuration
+ file are clumped together in groups. For HTTP (i.e. not HTTPS) servers
+ there can be multiple virtual servers per IP address. Each group is
+ defined by the pair <tt>{IpAddr, Port}</tt> and they all need to
+ have different server names.</p>
+ <p>The client will send the server name in the "Host:" header and that
+ header is used to pick a <tt>#sconf{}</tt> record out of the list
+ of virtual servers for a specific <tt>{Ip,Port}</tt> pair.
+ </p>
+
+ <p>SSL servers are different, we cannot read the headers before we
+ decide which virtual server to choose because the certificate is connected
+ to a server name. Thus, there can only be one HTTPS server per
+ <tt>{Ip,Port}</tt> pair.
+
+
+
+
</div>
View
BIN www/process_tree.dia
Binary file not shown.
View
BIN www/process_tree.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit c416324

Please sign in to comment.