Skip to content
This repository
Fetching contributors…

Cannot retrieve contributors at this time

file 256 lines (203 sloc) 8.448 kb
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256
<erl>
out(A) ->
       {ssi, "TAB.inc", "%%",[{"internals", "choosen"}]}.
</erl>


<div id="entry">

  <h1>Internals</h1>

  <h2>Introduction</h2>

  <p>I'll try to describe some of the internal workings of Yaws in this page.
  The page is thus mostly interesting for people interested in either hacking Yaws
  or simply wanting to get a better understanding.
  </p>


  <p>I'll describe how Yaws pages get compiled, the process structure
  and other things which can make it easier to understand the code. This page
  is ment to be read by programmers that wish to either work on Yaws or
  just get a better understanding.
  </p>


  <h2> JIT Compiling a .yaws page</h2>

  <p>
    When the client GETs a a page that has a .yaws suffix. The Yaws server
    will read that page from the hard disk and divide it in parts
    that consist of HTML code and Erlang code. Each chunk of Erlang code
    will be compiled into a module. The chunk of Erlang code must contain
    a function <tt>out/1</tt> If it doesn't the Yaws server will insert a
    proper error message into the generated HTML output.

  </p>

  <p>When the Yaws server ships a .yaws page it will process it chunk by chunk
  through the .yaws file. If it is HTML code, the server will ship that
  as is, whereas if it is Erlang code, the Yaws server will invoke the
  <tt>out/1</tt> function in that code and insert the output of that <tt>out/1</tt>
  function into the stream
  of HTML that is being shipped to the client.
  </p>

  <p>Yaws will cache the result of the compilation
  and the next time a client requests the same .yaws page Yaws will
  be able to invoke the already compiled modules directly.
  </p>


  <p>This is best explained by an example:</p>

  <p>Say that a file consists of 400 bytes, we have "foo.yaws"
  and it looks like:</p>

  <p>
    <img src="compile_layout.png" />
  </p>

  <p>When a client request the file "foo.yaws", the webserver will
  look in its cache for the file, (more on that later). For the sake of
  argument, we assume the file is not in the cache.

  </p>
  <p>The file will be processes by the code in <tt>yaws_compile.erl</tt>
  and the result will be a structure that looks like:</p>

  <div class="box">
    <verbatim>

    [CodeSpec]
    CodeSpec = Data | Code | Error
    Data = {data, NumChars}
    Code = {mod, LineNo, YawsFile, NumSkipChars, Mod, Func}
    Err = {error, NumSkipChars, E}

    </verbatim>
  </div>


  <p>In the particular case of our "foo.yaws" file above, the JIT
  compiler will return:
  </p>

  <div class="box">
  <verbatim>

    [{mod, 1, "/foo.yaws", 100, m1, out},
     {data, 200},
     {mod, 30, "/foo.yaws", 100, m2, out}
    ]

  </verbatim>
  </div>

  <p>
    This structure gets stored in the cache and will continue
    to be associated to the file "foo.yaws".
  </p>
  <p>When the server "ships" a .yaws page, it needs the <tt>CodeSpec</tt>
  structure to do it. If the structure is not in the cache, the page
  gets JIT compiled and inserted into the cache.
  </p>
  <p>To ship the above <tt>CodeSpec</tt> structure, the server
  performs the following steps:</p>
  <ol>
    <li>Create the Arg structure which is a #arg{} record, this
    structure is wellknown to all yaws programmers since it's the
    main mechanism to pass data from the server to the .yaws
    page.</li>
    <li>Item (1) Invoke <tt>m1:out(Arg)</tt></li>
    <li>Look at the return value from <tt>m1:out(Arg)</tt> and
    perform whatever is requested. This typically involves generating
    some dynamic ehtml code, generate headers or whatever.
    </li>
    <li>Finally jump ahead 100 bytes in the file as a result of
    processing the first <tt>CodeSpec</tt> item.</li>

    <li>Item (2) Next <tt>CodeSpec</tt> is just plain data from the file,
    thus we read 200 bytes from the file (or rather from the cache
    since the data will be there) and ship to the client.</li>

    <li>Item (3) Yet another {mod structure which is handled
    the same way as Item (1) above except that the erlang module
    is <tt>m2</tt> instead of <tt>m1</tt></li>
  </ol>

  <p>Another thing that is worth mentioning is that yaws will
  not ship (write on the socket) data until all content is generated.
  This is questionable
  and different from what i.e. PHP does. This makes it possible to
  generate headers after content has been generated.
  </p>



  <h2>Process structure</h2>

  <p>Before describing the process structure, I need to describe
  the two most important datastructures in Yaws. The <tt>#gconf{}</tt>
  and the <tt>#sconf{}</tt> records.
  </p>

  <h3>The <tt>#gconf{}</tt> record</h3>
  <p>This record is used to hold all global state, i.e. state and configuration
  data which is valid for all Virtual servers.
  The record looks like:
  </p>
  <div class="box">
    <verbatim>

 %%global conf
 record(gconf,{
      yaws_dir, %% topdir of Yaws installation
      trace, %% false | {true,http}|{true,traffic}
      flags = ?GC_DEF, %% boolean flags
      logdir,
      ebin_dir = [],
      runmods = [], %% runmods for entire server
      keepalive_timeout = 15000,
      max_num_cached_files = 400,
      max_num_cached_bytes = 1000000, %% 1 MEG
      max_size_cached_file = 8000,
      large_file_chunk_size = 10240,
      mnesia_dir = [],
      log_wrap_size = 1000000, % wrap logs after 1M
      cache_refresh_secs = 30, % seconds (auto zero when debug)
      include_dir = [], %% list of inc dirs for .yaws files
      phpexe = "php", %% cgi capable php executable
      yaws, %% server string
      username, %% maybe run as a different user than root
      uid, %% unix uid of user that started yaws
      id = "default" %% string identifying this instance of yaws
     }).

    </verbatim>
  </div>

  <p>The structure is derived from the /etc/yaws/yaws.conf file and is passed
  around all through the functions in the server.
  </p>

  <h3> The <tt>#sconf{}</tt> record</h3>
  <p>The next important datastructure is the <tt>#sconf{}</tt> record. It
  is used to describe a single virtual server.
  <p>Each:
  </p>
  <p>
  <verbatim>
    <server>
      .....
    </server>
  </verbatim>
  </p>
  <p>In the /etc/yaws/yaws.conf file corresponds to one <tt>#sconf{}</tt>
  record. We have: </p>

  <div class="box">
    <verbatim>
 %% server conf
 -record(sconf,
      {port = 8000, %% which port is this server listening to
       flags = ?SC_DEF,
       rhost, %% forced redirect host (+ optional port)
       rmethod, %% forced redirect method
       docroot, %% path to the docs
       listen = {127,0,0,1}, %% bind to this IP, {0,0,0,0} is possible
       servername = "localhost", %% servername is what Host: header is
       ets, %% local store for this server
       ssl,
       authdirs = [],
       partial_post_size = nolimit,
       appmods = [], %% list of modules for this app
       errormod_404 = yaws_404, %% the default 404 error module
       errormod_crash = yaws_404, %% use the same module for crashes
       arg_rewrite_mod = yaws,
       opaque = [], %% useful in embedded mode
       start_mod, %% user provided module to be started
       allowed_scripts = [yaws],
       revproxy = []
      }).

    </verbatim>
  </div>

  <p>Both of these two structures are defined in "yaws.hrl"</p>

  <p>Now we're ready to describe the process structure. We have:</p>

  <img src="process_tree.png" />

  <p>Thus, all the different "servers" defined in the configuration
  file are clumped together in groups. For HTTP (i.e. not HTTPS) servers
  there can be multiple virtual servers per IP address. Each group is
  defined by the pair <tt>{IpAddr, Port}</tt> and they all need to
  have different server names.</p>
  <p>The client will send the server name in the "Host:" header and that
  header is used to pick a <tt>#sconf{}</tt> record out of the list
  of virtual servers for a specific <tt>{Ip,Port}</tt> pair.
  </p>

  <p>SSL servers are different, we cannot read the headers before we
  decide which virtual server to choose because the certificate is connected
  to a server name. Thus, there can only be one HTTPS server per
  <tt>{Ip,Port}</tt> pair.




</div>


<erl>
out(A) -> {ssi, "END2",[],[]}.
</erl>
Something went wrong with that request. Please try again.