<erl>
out(A) ->
    {ssi, "TAB.inc", "%%",[{"internals", "choosen"}]}.
</erl>


<div id="entry">

<h1>Internals</h1>

<h2>Introduction</h2>

<p>This page describes some of the internal workings of Yaws. It is
meant for programmers who want to either hack on Yaws or simply get a
better understanding of how it works.
</p>

<p>I'll describe how Yaws pages get compiled, the process structure
and other things that make the code easier to understand.
</p>


<h2>JIT Compiling a .yaws page</h2>

<p>
When a client GETs a page that has a .yaws suffix, the Yaws server
will read that page from the hard disk and divide it into parts
that consist of HTML code and Erlang code. Each chunk of Erlang code
will be compiled into a module. Each chunk of Erlang code must contain
a function <tt>out/1</tt>. If it doesn't, the Yaws server will insert a
proper error message into the generated HTML output.
</p>

<p>When the Yaws server ships a .yaws page, it processes the file
chunk by chunk. If a chunk is HTML code, the server ships it
as is, whereas if it is Erlang code, the Yaws server invokes the
<tt>out/1</tt> function in that code and inserts the output of that
<tt>out/1</tt> function into the stream
of HTML that is being shipped to the client.
</p>

<p>Yaws will cache the result of the compilation,
and the next time a client requests the same .yaws page Yaws will
be able to invoke the already compiled modules directly.
</p>


<p>This is best explained by an example:</p>

<p>Say that we have a file "foo.yaws" that consists of 400 bytes
and looks like:</p>

<p>
<img src="compile_layout.png" />
</p>

<p>When a client requests the file "foo.yaws", the webserver will
look in its cache for the file (more on that later). For the sake of
argument, we assume the file is not in the cache.
</p>
<p>The file will be processed by the code in <tt>yaws_compile.erl</tt>
and the result will be a structure that looks like:</p>

<div class="box">
<verbatim>

[CodeSpec]
CodeSpec = Data | Code | Error
Data     = {data, NumChars}
Code     = {mod, LineNo, YawsFile, NumSkipChars, Mod, Func}
Error    = {error, NumSkipChars, E}

</verbatim>
</div>


<p>In the particular case of our "foo.yaws" file above, the JIT
compiler will return:
</p>

<div class="box">
<verbatim>

[{mod, 1, "/foo.yaws", 100, m1, out},
 {data, 200},
 {mod, 30, "/foo.yaws", 100, m2, out}
]

</verbatim>
</div>
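
<p>To make the role of the <tt>NumChars</tt> and <tt>NumSkipChars</tt>
fields concrete, here is a minimal sketch that walks a <tt>CodeSpec</tt>
list and works out which byte range of the file each item covers. The
module name <tt>codespec_sketch</tt>, the function names and the shape of
the returned tuples are made up for this illustration; this is not the
code in <tt>yaws_compile.erl</tt>.</p>

```erlang
-module(codespec_sketch).
-export([layout/1, foo/0]).

%% The CodeSpec list from the "foo.yaws" example.
foo() ->
    [{mod, 1, "/foo.yaws", 100, m1, out},
     {data, 200},
     {mod, 30, "/foo.yaws", 100, m2, out}].

%% Walk the list while tracking the current byte offset into the file.
%% Returns {TotalBytes, Steps}; the step tuples are hypothetical and
%% exist only for this illustration.
layout(CodeSpecs) ->
    layout(CodeSpecs, 0, []).

layout([], Total, Acc) ->
    {Total, lists:reverse(Acc)};
layout([{data, N} | T], Off, Acc) ->
    %% Plain data: N bytes starting at Off are shipped as is.
    layout(T, Off + N, [{ship_bytes, Off, N} | Acc]);
layout([{mod, _Line, _File, Skip, Mod, Func} | T], Off, Acc) ->
    %% Erlang chunk: call Mod:Func/1, then skip the chunk's source text.
    layout(T, Off + Skip, [{call, Mod, Func, Off, Skip} | Acc]);
layout([{error, Skip, E} | T], Off, Acc) ->
    %% Chunk that failed to compile: report E, then skip its source text.
    layout(T, Off + Skip, [{report, E, Off, Skip} | Acc]).
```

<p>For the example list, the three items cover bytes 0 to 99, 100 to 299
and 300 to 399, which adds up to the 400 bytes of the file.</p>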

<p>
This structure gets stored in the cache and stays
associated with the file "foo.yaws".
</p>
<p>When the server "ships" a .yaws page, it needs the <tt>CodeSpec</tt>
structure to do it. If the structure is not in the cache, the page
gets JIT compiled and the result is inserted into the cache.
</p>
<p>To ship the above <tt>CodeSpec</tt> structure, the server
performs the following steps:</p>
<ol>
<li>Create the Arg structure, which is an <tt>#arg{}</tt> record. This
structure is well known to all Yaws programmers since it's the
main mechanism to pass data from the server to the .yaws
page.</li>
<li>Item (1): Invoke <tt>m1:out(Arg)</tt>.</li>
<li>Look at the return value from <tt>m1:out(Arg)</tt> and
perform whatever is requested. This typically involves generating
some dynamic ehtml code, generating headers and so on.
</li>
<li>Finally, jump ahead 100 bytes in the file as a result of
processing the first <tt>CodeSpec</tt> item.</li>

<li>Item (2): The next <tt>CodeSpec</tt> is just plain data from the file,
thus we read 200 bytes from the file (or rather from the cache,
since the data will be there) and ship them to the client.</li>

<li>Item (3): Yet another <tt>{mod, ...}</tt> structure, which is handled
the same way as Item (1) above except that the Erlang module
is <tt>m2</tt> instead of <tt>m1</tt>.</li>
</ol>
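
<p>The steps above can be sketched as a small loop over the
<tt>CodeSpec</tt> list. This is only an illustration under simplifying
assumptions, not the actual delivery code in Yaws: the file content is
passed in as a binary, <tt>out/1</tt> is assumed to return only
<tt>{html, IoData}</tt>, and the module name <tt>ship_sketch</tt> with its
own demo <tt>out/1</tt> is hypothetical.</p>

```erlang
-module(ship_sketch).
-export([ship/3, out/1]).

%% Demo out/1 so the sketch can be tried without a compiled .yaws page.
out(_Arg) ->
    {html, "<b>dynamic</b>"}.

%% Process the CodeSpec list item by item and return the complete
%% response body. Everything is accumulated first; nothing would be
%% written to a socket until the whole list has been processed.
ship(CodeSpecs, FileBin, Arg) ->
    iolist_to_binary(ship(CodeSpecs, FileBin, Arg, [])).

ship([], _Rest, _Arg, Acc) ->
    lists:reverse(Acc);
ship([{data, N} | T], Bin, Arg, Acc) ->
    %% Plain data: copy N bytes from the file.
    <<Chunk:N/binary, Rest/binary>> = Bin,
    ship(T, Rest, Arg, [Chunk | Acc]);
ship([{mod, _Line, _File, Skip, Mod, Func} | T], Bin, Arg, Acc) ->
    %% Erlang chunk: insert the output of Mod:Func(Arg), then jump
    %% ahead Skip bytes, past the chunk's source text.
    %% Simplification: only the {html, _} return value is handled.
    {html, Html} = Mod:Func(Arg),
    <<_:Skip/binary, Rest/binary>> = Bin,
    ship(T, Rest, Arg, [Html | Acc]);
ship([{error, Skip, E} | T], Bin, Arg, Acc) ->
    %% Failed chunk: insert an error message instead of its output.
    <<_:Skip/binary, Rest/binary>> = Bin,
    Msg = io_lib:format("<pre>~p</pre>", [E]),
    ship(T, Rest, Arg, [Msg | Acc]).
```

<p>For instance, shipping the 9 byte binary <tt>&lt;&lt;"XXXXhello"&gt;&gt;</tt>
with the list <tt>[{mod, 1, "/t.yaws", 4, ship_sketch, out}, {data, 5}]</tt>
replaces the first four bytes with the output of <tt>out/1</tt> and copies
the remaining five bytes unchanged.</p>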

<p>Another thing worth mentioning is that Yaws does
not ship (write on the socket) any data until all content is generated.
This design choice is debatable
and differs from what e.g. PHP does, but it makes it possible to
generate headers after content has been generated.
</p>
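
<p>As a small illustration of what this buys the page author, the
sketch below returns its content first and sets the response status
afterwards. It assumes the <tt>{html, ...}</tt> and <tt>{status, ...}</tt>
return values and the ability to return a list of such values from
<tt>out/1</tt>; the module name <tt>late_status</tt> and the page logic
are made up.</p>

```erlang
-module(late_status).
-export([out/1]).

%% Hypothetical page logic: the body is produced first, and only then
%% does the code state which status the response should carry.
out(_Arg) ->
    Body = "<p>No such item</p>",
    [{html, Body},
     {status, 404}].
```

<p>In a server that wrote each piece of content to the socket right
away, the status line would already have been sent by the time the
second element is seen.</p>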



<h2>Process structure</h2>

<p>Before describing the process structure, I need to describe
the two most important data structures in Yaws: the <tt>#gconf{}</tt>
and the <tt>#sconf{}</tt> records.
</p>
<p><b>Note:</b> To retrieve information from these records, the
<tt>yaws:gconf_*/1</tt> and <tt>yaws:sconf_*/1</tt> functions
(e.g. <tt>yaws:gconf_id/1</tt> or <tt>yaws:sconf_docroot/1</tt>) should
be used in preference to direct access, to reduce the dependence of your
code on the record definitions.
</p>

<h3>The <tt>#gconf{}</tt> record</h3>
<p>This record is used to hold all global state, i.e. state and configuration
data which is valid for all virtual servers.
The record looks like:
</p>
<div class="box">
<verbatim>

%% global conf
-record(gconf, {
          yaws_dir,                        % topdir of Yaws installation
          trace,                           % false | {true,http} | {true,traffic}
          flags = ?GC_DEF,                 % boolean flags
          logdir,
          ebin_dir = [],
          runmods = [],                    % runmods for entire server
          keepalive_timeout = 30000,
          keepalive_maxuses = nolimit,     % nolimit or non negative integer
          max_num_cached_files = 400,
          max_num_cached_bytes = 1000000,  % 1 MEG
          max_size_cached_file = 8000,
          max_connections = nolimit,       % max number of TCP connections

          %% Override default connection handler processes spawn options for
          %% performance/memory tuning.
          %% [] | [{fullsweep_after,Number}, {min_heap_size, Size}]
          %% other options such as monitor, link are ignored.
          process_options = [],

          large_file_chunk_size = 10240,
          mnesia_dir = [],
          log_wrap_size = 10000000,        % wrap logs after 10M
          cache_refresh_secs = 30,         % seconds (auto zero when debug)
          include_dir = [],                % list of inc dirs for .yaws files
          phpexe = "/usr/bin/php-cgi",     % cgi capable php executable

          yaws,                            % server string
          id = "default",                  % string identifying this instance of yaws

          enable_soap = false,             % start yaws_soap_srv iff true

          %% a list of
          %% {{Mod, Func}, WsdlFile, Prefix} | {{Mod, Func}, WsdlFile}
          %% automatically setup in yaws_soap_srv init.
          soap_srv_mods = [],

          ysession_mod = yaws_session_server, % storage module for ysession
          acceptor_pool_size = 8,          % size of acceptor proc pool

          mime_types_info                  % undefined | #mime_types_info{}
         }).

</verbatim>
</div>

<p>The structure is derived from the /etc/yaws/yaws.conf file and is passed
around all through the functions in the server.
</p>

<h3>The <tt>#sconf{}</tt> record</h3>
<p>The next important data structure is the <tt>#sconf{}</tt> record. It
is used to describe a single virtual server.
</p>
<p>Each:
</p>
<p>
<verbatim>
<server>
  .....
</server>
</verbatim>
</p>
<p>in the /etc/yaws/yaws.conf file corresponds to one <tt>#sconf{}</tt>
record. We have: </p>

<div class="box">
<verbatim>

%% server conf
-record(sconf, {
          port = 8000,                  % which port is this server listening to
          flags = ?SC_DEF,
          redirect_map=[],              % a list of
                                        % {Prefix, #url{}, append|noappend}
                                        % #url{} can be partially populated

          rhost,                        % forced redirect host (+ optional port)
          rmethod,                      % forced redirect method
          docroot,                      % path to the docs
          xtra_docroots = [],           % if we have additional pseudo docroots
          listen = [{127,0,0,1}],       % bind to this IP, {0,0,0,0} is possible
          servername = "localhost",     % servername is what Host: header is
          yaws,                         % server string for this vhost
          ets,                          % local store for this server
          ssl,                          % undefined | #ssl{}
          authdirs = [],                % [{docroot, [#auth{}]}]
          partial_post_size = 10240,

          %% An item in the appmods list can be either of the
          %% following, this is all due to backwards compat issues.
          %% 1. an atom - this is the equivalent to {atom, atom}
          %% 2. A two tuple {Path, Mod}
          %% 3. A three tuple {Path, Mod, [ExcludeDir ....]}
          appmods = [],

          expires = [],
          errormod_401 = yaws_outmod,   % the default 401 error module
          errormod_404 = yaws_outmod,   % the default 404 error module
          errormod_crash = yaws_outmod, % use the same module for crashes
          arg_rewrite_mod = yaws,
          logger_mod = yaws_log,        % access/auth logging module
          opaque = [],                  % useful in embedded mode
          start_mod,                    % user provided module to be started
          allowed_scripts = [yaws,php,cgi,fcgi],
          tilde_allowed_scripts = [],
          index_files = ["index.yaws", "index.html", "index.php"],
          revproxy = [],
          soptions = [],
          extra_cgi_vars = [],
          stats,                        % raw traffic statistics
          fcgi_app_server,              % FastCGI application server {host,port}
          php_handler = {cgi, "/usr/bin/php-cgi"},
          shaper,
          deflate_options,              % undefined | #deflate{}
          mime_types_info,              % undefined | #mime_types_info{}
                                        % if undefined, global config is used
          dispatch_mod                  % custom dispatch module
         }).

</verbatim>
</div>

<p>Both of these structures are defined in "yaws.hrl".</p>

<p>Now we're ready to describe the process structure. We have:</p>

<img src="process_tree.png" />

<p>Thus, all the different "servers" defined in the configuration
file are clumped together in groups. For HTTP (i.e. not HTTPS) servers
there can be multiple virtual servers per IP address. Each group is
defined by the pair <tt>{IpAddr, Port}</tt>, and the virtual servers
in a group all need to have different server names.</p>
<p>The client will send the server name in the "Host:" header and that
header is used to pick a <tt>#sconf{}</tt> record out of the list
of virtual servers for a specific <tt>{Ip,Port}</tt> pair.
</p>
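
<p>The grouping and the "Host:" based lookup can be sketched as follows.
This is an illustration only, not the code Yaws uses: the record below
is a cut-down stand-in for the real <tt>#sconf{}</tt> (with a single IP
address in <tt>listen</tt>), the module and function names are
hypothetical, and falling back to the first server in the group when no
name matches is an assumption made for the sketch.</p>

```erlang
-module(vhost_sketch).
-export([group/1, pick_sconf/2]).

%% Simplified stand-in for the real #sconf{} record in yaws.hrl;
%% only the fields this sketch needs, with a single IP in listen.
-record(sconf, {listen, port, servername}).

%% Group virtual servers by their {Ip, Port} pair, keeping the
%% configuration order inside each group.
group(SConfs) ->
    lists:foldr(
      fun(#sconf{listen = Ip, port = Port} = SC, Groups) ->
              maps:update_with({Ip, Port},
                               fun(L) -> [SC | L] end,
                               [SC],
                               Groups)
      end, #{}, SConfs).

%% Pick the virtual server whose servername matches the Host: header.
%% A port suffix in the header (e.g. "example.org:8000") is dropped
%% and the comparison is done on the lowercased name.
pick_sconf(HostHeader, [Default | _] = Group) ->
    [Name | _] = string:split(HostHeader, ":"),
    case lists:keyfind(string:lowercase(Name), #sconf.servername, Group) of
        false -> Default;   % assumption: fall back to the first server
        SC -> SC
    end.
```

<p>With two servers named "a.example" and "b.example" on the same
<tt>{Ip,Port}</tt> pair, a request carrying <tt>Host: B.example:8000</tt>
selects the second one.</p>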

<p>SSL servers are different. We cannot read the headers before we
decide which virtual server to choose, because the certificate is connected
to a server name. Thus, there can only be one HTTPS server per
<tt>{Ip,Port}</tt> pair.
</p>


</div>


<erl>
out(A) -> {ssi, "END2",[],[]}.
</erl>
