<erl>
out(A) ->
    {ssi, "TAB.inc", "%%",[{"internals", "choosen"}]}.
</erl>


<div id="entry">

<h1>Internals</h1>

<h2>Introduction</h2>

<p>This page describes some of the internal workings of Yaws. It is
mostly of interest to people who want to hack on Yaws, or who simply
want a better understanding of how it works.
</p>

<p>I'll describe how Yaws pages get compiled, the process structure,
and other things that make the code easier to understand.
</p>


<h2>JIT compiling a .yaws page</h2>

<p>
When a client GETs a page with a .yaws suffix, the Yaws server
reads that page from disk and divides it into parts
that consist of HTML code and Erlang code. Each chunk of Erlang code
is compiled into a module. The chunk of Erlang code must contain
a function <tt>out/1</tt>; if it doesn't, the Yaws server inserts a
proper error message into the generated HTML output.
</p>
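<p>To make the splitting step concrete, here is a minimal sketch of dividing a page into HTML and Erlang chunks. It is not the code in <tt>yaws_compile.erl</tt>: the module <tt>yaws_split_sketch</tt> and its <tt>split/1</tt> function are hypothetical, and the sketch assumes that Erlang code always sits between literal <tt>&lt;erl&gt;</tt> and <tt>&lt;/erl&gt;</tt> markers that do not nest.</p>

<div class="box">
<verbatim>
-module(yaws_split_sketch).
-export([split/1]).

%% Hypothetical chunk markers; Erlang code is assumed to sit between them.
-define(OPEN, <<"<erl>">>).
-define(CLOSE, <<"</erl>">>).

%% Split a page into a list of {data, Html} and {erl, Code} chunks.
split(Page) when is_binary(Page) ->
    split(Page, []).

split(<<>>, Acc) ->
    lists:reverse(Acc);
split(Page, Acc) ->
    case binary:split(Page, ?OPEN) of
        [Html] ->
            %% No more Erlang code: the rest of the page is plain HTML.
            lists:reverse(add_data(Html, Acc));
        [Html, Rest] ->
            %% Everything up to the close marker is Erlang code. A missing
            %% close marker makes this match fail (badmatch).
            [Code, After] = binary:split(Rest, ?CLOSE),
            split(After, [{erl, Code} | add_data(Html, Acc)])
    end.

%% Skip empty HTML chunks, e.g. when a page starts with Erlang code.
add_data(<<>>, Acc) -> Acc;
add_data(Html, Acc) -> [{data, Html} | Acc].
</verbatim>
</div>

<p>For a page made of a short HTML prefix followed by one <tt>&lt;erl&gt;...&lt;/erl&gt;</tt> section, <tt>split/1</tt> returns one <tt>{data, ...}</tt> chunk followed by one <tt>{erl, ...}</tt> chunk; each <tt>{erl, ...}</tt> chunk is what would then be compiled into a module.</p>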

<p>When the Yaws server ships a .yaws page, it walks through the .yaws
file chunk by chunk. If a chunk is HTML code, the server ships it
as is, whereas if it is Erlang code, the Yaws server invokes the
<tt>out/1</tt> function in that code and inserts the output of that <tt>out/1</tt>
function into the stream
of HTML that is being shipped to the client.
</p>

<p>Yaws will cache the result of the compilation
and the next time a client requests the same .yaws page Yaws will
be able to invoke the already compiled modules directly.
</p>
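<p>A minimal sketch of such a cache, assuming it is keyed on the file name and invalidated when the file's modification time changes. The module <tt>yaws_cache_sketch</tt> and the <tt>CompileFun</tt> parameter are hypothetical. Judging from <tt>#gconf{}</tt> fields such as <tt>max_num_cached_files</tt> and <tt>cache_refresh_secs</tt> (shown further down), the real cache also bounds its size and rechecks files periodically, which this sketch ignores.</p>

<div class="box">
<verbatim>
-module(yaws_cache_sketch).
-export([new/0, lookup/4]).

%% The cache maps a file name to {MTime, CodeSpec}.
new() -> #{}.

%% Return {CodeSpec, NewCache}. CompileFun(File) is called only on a
%% miss: the file is not cached, or its modification time has changed.
lookup(File, MTime, CompileFun, Cache) ->
    case maps:find(File, Cache) of
        {ok, {MTime, CodeSpec}} ->
            %% Hit: reuse the already compiled modules.
            {CodeSpec, Cache};
        _MissOrStale ->
            CodeSpec = CompileFun(File),
            {CodeSpec, Cache#{File => {MTime, CodeSpec}}}
    end.
</verbatim>
</div>

<p>A first lookup calls <tt>CompileFun</tt>; a second lookup with the same modification time returns the cached <tt>CodeSpec</tt> without compiling again.</p>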


<p>This is best explained by an example:</p>

<p>Say that we have a 400-byte file, "foo.yaws", that
looks like:</p>

<p>
<img src="compile_layout.png" />
</p>

<p>When a client requests the file "foo.yaws", the web server will
look in its cache for the file (more on that later). For the sake of
argument, we assume the file is not in the cache.
</p>
<p>The file will be processed by the code in <tt>yaws_compile.erl</tt>,
and the result will be a list of <tt>CodeSpec</tt> items:</p>

<div class="box">
<verbatim>

[CodeSpec]
CodeSpec = Data | Code | Error
Data     = {data, NumChars}
Code     = {mod, LineNo, YawsFile, NumSkipChars, Mod, Func}
Error    = {error, NumSkipChars, E}

</verbatim>
</div>
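<p>The same grammar can be written down as Erlang type specifications. This is a hedged transcription, not a definition taken from the Yaws sources: the module and type names are hypothetical, and the helper <tt>consumed_bytes/1</tt> assumes that <tt>NumSkipChars</tt> is the number of source-file bytes that a <tt>mod</tt> or <tt>error</tt> item stands for, which is how the shipping steps below use it for <tt>mod</tt> items.</p>

<div class="box">
<verbatim>
-module(yaws_codespec_sketch).
-export([consumed_bytes/1]).
-export_type([code_spec/0]).

%% The grammar above as type specifications (type names are hypothetical).
-type data_spec()  :: {data, non_neg_integer()}.              % NumChars
-type mod_spec()   :: {mod, pos_integer(), string(),          % LineNo, YawsFile
                       non_neg_integer(), module(), atom()}.  % NumSkipChars, Mod, Func
-type error_spec() :: {error, non_neg_integer(), term()}.     % NumSkipChars, E
-type code_spec()  :: data_spec() | mod_spec() | error_spec().

%% Number of source-file bytes one item stands for (assumes NumSkipChars
%% is the length of the Erlang chunk in the .yaws file).
-spec consumed_bytes(code_spec()) -> non_neg_integer().
consumed_bytes({data, NumChars}) -> NumChars;
consumed_bytes({mod, _LineNo, _YawsFile, NumSkipChars, _Mod, _Func}) -> NumSkipChars;
consumed_bytes({error, NumSkipChars, _E}) -> NumSkipChars.
</verbatim>
</div>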


<p>In the particular case of our "foo.yaws" file above, the JIT
compiler will return:
</p>

<div class="box">
<verbatim>

[{mod, 1, "/foo.yaws", 100, m1, out},
 {data, 200},
 {mod, 30, "/foo.yaws", 100, m2, out}
]

</verbatim>
</div>
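<p>Since each item accounts for a known number of bytes, walking the list with a running offset shows which part of the 400-byte file each item covers. A minimal sketch; the module <tt>yaws_offsets_sketch</tt> is hypothetical, and it assumes <tt>NumSkipChars</tt> counts source bytes, as in the shipping steps below.</p>

<div class="box">
<verbatim>
-module(yaws_offsets_sketch).
-export([offsets/1]).

%% Pair every CodeSpec item with the byte offset at which its part of the
%% source file starts; also return the total number of bytes covered.
offsets(Specs) ->
    lists:mapfoldl(fun(Item, Offset) ->
                           {{Offset, Item}, Offset + size_of(Item)}
                   end, 0, Specs).

size_of({data, NumChars}) -> NumChars;
size_of({mod, _LineNo, _File, NumSkipChars, _Mod, _Func}) -> NumSkipChars;
size_of({error, NumSkipChars, _E}) -> NumSkipChars.
</verbatim>
</div>

<p>For the list above, this yields offsets 0, 100 and 300 and a total of 400 bytes, i.e. the whole file.</p>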

<p>
This structure gets stored in the cache and will continue
to be associated with the file "foo.yaws".
</p>
<p>When the server "ships" a .yaws page, it needs the <tt>CodeSpec</tt>
structure to do it. If the structure is not in the cache, the page
gets JIT compiled and inserted into the cache.
</p>
<p>To ship the above <tt>CodeSpec</tt> structure, the server
performs the following steps:</p>
<ol>
<li>Create the <tt>Arg</tt> structure, which is an <tt>#arg{}</tt> record. This
structure is well known to all Yaws programmers, since it is the
main mechanism for passing data from the server to the .yaws
page.</li>
<li>Item (1): invoke <tt>m1:out(Arg)</tt>.</li>
<li>Look at the return value from <tt>m1:out(Arg)</tt> and
perform whatever is requested. This typically involves generating
some dynamic ehtml code, generating headers, and so on.
</li>
<li>Finally, jump ahead 100 bytes in the file (the <tt>NumSkipChars</tt>
of the first <tt>CodeSpec</tt> item), past the Erlang code that was just handled.</li>

<li>Item (2): the next <tt>CodeSpec</tt> is just plain data from the file,
so we read 200 bytes from the file (or rather from the cache,
since the data will be there) and ship them to the client.</li>

<li>Item (3): yet another <tt>{mod, ...}</tt> structure, which is handled
the same way as item (1) above, except that the Erlang module
is <tt>m2</tt> instead of <tt>m1</tt>.</li>
</ol>
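<p>The steps above can be sketched as a loop over the <tt>CodeSpec</tt> list. This is a simplification, not the server's actual code: the module <tt>yaws_ship_sketch</tt> is hypothetical, the page contents are passed in as a binary, the return value of each <tt>out/1</tt> call is only recorded rather than interpreted, and the output is collected in a list instead of being written to a socket.</p>

<div class="box">
<verbatim>
-module(yaws_ship_sketch).
-export([ship/3]).

%% Walk a CodeSpec list over the raw page bytes and return, in order,
%% the parts that would be sent to the client.
ship(Bytes, Specs, Arg) ->
    ship(Bytes, Specs, Arg, []).

ship(_Bytes, [], _Arg, Acc) ->
    lists:reverse(Acc);
ship(Bytes, [{mod, _Line, _File, Skip, Mod, Func} | Specs], Arg, Acc) ->
    %% Invoke the compiled out/1 function. A real server would now act on
    %% the return value (ehtml, headers, ...); here it is only recorded.
    Result = Mod:Func(Arg),
    <<_Code:Skip/binary, Rest/binary>> = Bytes,   % jump past the Erlang chunk
    ship(Rest, Specs, Arg, [{out, Result} | Acc]);
ship(Bytes, [{data, N} | Specs], Arg, Acc) ->
    <<Html:N/binary, Rest/binary>> = Bytes,       % plain HTML, shipped as is
    ship(Rest, Specs, Arg, [{data, Html} | Acc]);
ship(Bytes, [{error, Skip, E} | Specs], Arg, Acc) ->
    <<_Code:Skip/binary, Rest/binary>> = Bytes,   % skip the broken chunk
    ship(Rest, Specs, Arg, [{error, E} | Acc]).
</verbatim>
</div>

<p>Because <tt>Mod:Func(Arg)</tt> is an ordinary remote call, the sketch can be exercised with any existing function, for instance a <tt>{mod, ...}</tt> item that names <tt>lists</tt> and <tt>reverse</tt> instead of a compiled page module.</p>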

<p>Another thing worth mentioning is that Yaws does
not ship data (write it on the socket) until all content has been generated.
This is a debatable design choice, and it differs from what, e.g., PHP does.
It does, however, make it possible to
generate headers after content has been generated.
</p>
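<p>The buffering can be illustrated with a small sketch, assuming output is kept as a pair of header and body lists and only assembled into a response at the very end. The module <tt>yaws_buffer_sketch</tt> and its functions are hypothetical, and a real response would also need a status line.</p>

<div class="box">
<verbatim>
-module(yaws_buffer_sketch).
-export([new/0, add_content/2, add_header/2, finish/1]).

%% Buffered output: {Headers, BodyParts}, both kept in reverse order.
new() -> {[], []}.

add_content(Part, {Headers, Body}) -> {Headers, [Part | Body]}.

%% A header can still be added after content, since nothing is sent yet.
add_header(Header, {Headers, Body}) -> {[Header | Headers], Body}.

%% Only now is the response assembled: all headers first, then the body.
finish({Headers, Body}) ->
    [[H, "\r\n"] || H <- lists:reverse(Headers)] ++ ["\r\n" | lists:reverse(Body)].
</verbatim>
</div>

<p>Calling <tt>add_header/2</tt> after <tt>add_content/2</tt> still places the header before the body in the final output, which would be impossible if the body had already been written to the socket.</p>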



<h2>Process structure</h2>

<p>Before describing the process structure, I need to describe
the two most important data structures in Yaws: the <tt>#gconf{}</tt>
and <tt>#sconf{}</tt> records.
</p>

<h3>The <tt>#gconf{}</tt> record</h3>
<p>This record holds all global state, i.e. state and configuration
data that is valid for all virtual servers.
The record looks like:
</p>
<div class="box">
<verbatim>

%% global conf
-record(gconf,{
          yaws_dir,                        %% topdir of Yaws installation
          trace,                           %% false | {true,http} | {true,traffic}
          flags = ?GC_DEF,                 %% boolean flags
          logdir,
          ebin_dir = [],
          runmods = [],                    %% runmods for entire server
          keepalive_timeout = 15000,
          max_num_cached_files = 400,
          max_num_cached_bytes = 1000000,  %% 1 MEG
          max_size_cached_file = 8000,
          large_file_chunk_size = 10240,
          mnesia_dir = [],
          log_wrap_size = 1000000,         %% wrap logs after 1M
          cache_refresh_secs = 30,         %% seconds (auto zero when debug)
          include_dir = [],                %% list of inc dirs for .yaws files
          phpexe = "php",                  %% cgi capable php executable
          yaws,                            %% server string
          username,                        %% maybe run as a different user than root
          uid,                             %% unix uid of user that started yaws
          id = "default"                   %% string identifying this instance of yaws
         }).

</verbatim>
</div>

<p>The record is derived from the /etc/yaws.conf file and is passed
around throughout the functions of the server.
</p>
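<p>A minimal sketch of what "passed around" means in practice, assuming the configuration file has already been parsed into key/value pairs. The module <tt>yaws_gconf_sketch</tt>, its <tt>from_config/1</tt> function and the trimmed-down record are hypothetical; the real record carries all the fields listed above.</p>

<div class="box">
<verbatim>
-module(yaws_gconf_sketch).
-export([from_config/1, keepalive_timeout/1]).

%% A trimmed-down #gconf{} for illustration only (two of its fields).
-record(gconf, {keepalive_timeout = 15000,
                id = "default"}).

%% Build the record once from parsed configuration; unknown keys are
%% ignored in this sketch.
from_config(KVs) ->
    lists:foldl(fun({keepalive_timeout, T}, GC) -> GC#gconf{keepalive_timeout = T};
                   ({id, Id}, GC) -> GC#gconf{id = Id};
                   (_Other, GC) -> GC
                end, #gconf{}, KVs).

%% Functions further down receive the whole record and pick out the
%% fields they need.
keepalive_timeout(#gconf{keepalive_timeout = T}) -> T.
</verbatim>
</div>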

<h3>The <tt>#sconf{}</tt> record</h3>
<p>The next important data structure is the <tt>#sconf{}</tt> record. It
describes a single virtual server.
</p>
<p>Each section of the following form in the /etc/yaws.conf file
corresponds to one <tt>#sconf{}</tt> record:
</p>
<p>
<verbatim>
<server>
.....
</server>
</verbatim>
</p>
<p>The record looks like:</p>

<div class="box">
<verbatim>
%% server conf
-record(sconf,
        {port = 8000,                %% which port is this server listening to
         flags = ?SC_DEF,
         rhost,                      %% forced redirect host (+ optional port)
         rmethod,                    %% forced redirect method
         docroot,                    %% path to the docs
         listen = {127,0,0,1},       %% bind to this IP, {0,0,0,0} is possible
         servername = "localhost",   %% matched against the Host: header
         ets,                        %% local store for this server
         ssl,
         authdirs = [],
         partial_post_size = nolimit,
         appmods = [],               %% list of modules for this app
         errormod_404 = yaws_404,    %% the default 404 error module
         errormod_crash = yaws_404,  %% use the same module for crashes
         arg_rewrite_mod = yaws,
         opaque = [],                %% useful in embedded mode
         start_mod,                  %% user provided module to be started
         allowed_scripts = [yaws],
         revproxy = []
        }).

</verbatim>
</div>

<p>Both records are defined in "yaws.hrl".</p>

<p>Now we're ready to describe the process structure. We have:</p>

<img src="process_tree.png" />

<p>Thus, all the different "servers" defined in the configuration
file are grouped together. For HTTP (i.e. not HTTPS) servers,
there can be multiple virtual servers per IP address. Each group is
identified by an <tt>{IpAddr, Port}</tt> pair, and the servers in a group
must all have different server names.</p>
<p>The client sends the server name in the "Host:" header, and that
header is used to pick a <tt>#sconf{}</tt> record out of the list
of virtual servers for a specific <tt>{Ip,Port}</tt> pair.
</p>
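<p>Picking the virtual server can be sketched as a lookup on the <tt>servername</tt> field. This assumes a trimmed-down <tt>#sconf{}</tt> record, a case-sensitive comparison, and a fallback to the first server in the group when no name matches; the module <tt>yaws_vhost_sketch</tt> is hypothetical and the fallback is an assumption, not documented Yaws behavior.</p>

<div class="box">
<verbatim>
-module(yaws_vhost_sketch).
-export([pick/2]).

%% Trimmed-down #sconf{} for illustration only.
-record(sconf, {port = 8000, servername = "localhost", docroot}).

%% Pick the server whose servername equals the Host: header value, from
%% the servers sharing one {Ip, Port} pair. An optional ":port" suffix in
%% the header is dropped. Falling back to the first server is assumed.
pick(Host, [First | _] = Servers) ->
    Name = hd(string:split(Host, ":")),
    case lists:keyfind(Name, #sconf.servername, Servers) of
        false -> First;
        SC -> SC
    end.
</verbatim>
</div>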

<p>SSL servers are different: we cannot read the headers before we
decide which virtual server to choose, because the certificate is tied
to a server name and has to be presented during the SSL handshake,
before any HTTP headers have been sent. Thus, there can be only one
HTTPS server per <tt>{Ip,Port}</tt> pair.
</p>
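<p>The grouping rules, including the SSL restriction, can be sketched as a configuration check. The module <tt>yaws_group_sketch</tt> is hypothetical, and servers are represented as flattened <tt>{Ip, Port, ServerName, Ssl}</tt> tuples rather than <tt>#sconf{}</tt> records.</p>

<div class="box">
<verbatim>
-module(yaws_group_sketch).
-export([group/1]).

%% Group {Ip, Port, ServerName, Ssl} tuples by {Ip, Port}. Returns
%% {ok, #{{Ip, Port} => [ServerName]}} or {error, Reason} if a group
%% breaks one of the rules described above.
group(Servers) ->
    Groups = lists:foldl(
               fun({Ip, Port, Name, Ssl}, Acc) ->
                       maps:update_with({Ip, Port},
                                        fun(L) -> [{Name, Ssl} | L] end,
                                        [{Name, Ssl}], Acc)
               end, #{}, Servers),
    check(maps:to_list(Groups), #{}).

check([], Acc) ->
    {ok, Acc};
check([{Key, Members} | Rest], Acc) ->
    Names = lists:reverse([N || {N, _Ssl} <- Members]),
    HasSsl = lists:any(fun({_N, Ssl}) -> Ssl end, Members),
    Unique = length(lists:usort(Names)) =:= length(Names),
    if
        HasSsl andalso length(Members) > 1 ->
            %% An SSL server must be alone on its {Ip, Port} pair.
            {error, {ssl_server_not_alone, Key}};
        not Unique ->
            %% Virtual servers in one group need distinct names.
            {error, {duplicate_servername, Key}};
        true ->
            check(Rest, Acc#{Key => Names})
    end.
</verbatim>
</div>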
</div>


<erl>
out(A) -> {ssi, "END2",[],[]}.
</erl>
