
[en] rewrote "Nginx Variables (01)" and also added internal sections to improve readability.
Commit f33c9799f2207307d5d7cb6d3b667397481c3397 (1 parent: f7905e1), committed by agentzh on Mar 1, 2013
Showing with 163 additions and 116 deletions.
  1. +163 −116 en/01-NginxVariables01.tut
@@ -1,51 +1,60 @@
= Nginx Variables (01) =
-Nginx's configuration is itself a mini language. Many Nginx configurations
-are practically programs.
-The language might not be Turing-Complete, as far as I can see, its design
+== String Container ==
+
+Nginx's configuration files use a micro programming language. Many real-world
+Nginx configuration files are essentially small programs.
+This language's design
is heavily influenced by
-Perl and Bourne Shell. This is a characteristic feature of Nginx, comparing
+Perl and Bourne Shell as far as I can see, despite the fact that it might not
+be Turing-complete. This is a distinguishing feature of Nginx, as compared
to the other web servers
-such as Apache or Lighttpd. Being a language, "Variable" declaration becomes
-a common concept (However,
-exception does exist in Functional Languages such as Haskell)
+like Apache or Lighttpd. Being a programming language, "variables" are
+thus a natural part of it (exceptions do exist, of course, as in pure
+functional languages like Haskell).
+
+Variables are just containers holding various values in imperative languages
+like Perl, Bourne Shell, and C/C++.
+And "values" here can be numbers like C<3.14>, strings like
+C<hello world>, or even complicated things like references to arrays or
+hash tables. For the
+Nginx configuration language, however, variables can only hold one type
+of value: strings.
-For those who know well imperative languages like Perl, Bourne Shell, C/C++,
-variable is nothing but
-a container holding various values, and the "value" can be numbers like
-C<3.14> or strings like
-C<hello world>. Values can be as complicated as references to arrays or
-hash tables too. However in the
-Nginx configuration, variable contains one and only one type of value:
-strings.
+== Variable Syntax and Interpolation ==
-For example, our F<nginx.conf> has following variable declaration:
+Let's say our F<nginx.conf> configuration file has the following configuration
+line:
:nginx
set $a "hello world";
-We have used built-in L<ngx_rewrite> module's L<ngx_rewrite/set> command
-to declare and initialize
-the variable C<$a>. Specifically, it is assigned with strings C<hello world>.
-Like Perl and PHP, the
-Nginx syntax requires prefix C<$> to declare and devalue variables.
+where we assign a value to the variable C<$a> via the L<ngx_rewrite/set>
+configuration directive coming from the standard L<ngx_rewrite> module. In
+particular, we assign the string value C<hello world> to it.
-Many C<Java> and C<C#> programmers dislike the ugly C<$> variable prefix,
-yet the approach does have
-a few advantages, notably, variables can be embedded directly in a string
-to construct another string
+We can see that the Nginx variable name is preceded by a dollar sign (C<$>).
+This is required by the language syntax: whenever we want to reference an
+Nginx variable in the configuration file, we must add a C<$> prefix. This looks
+very familiar to Perl and PHP programmers.
+
+Such variable prefixes may make some C<Java> and C<C#> programmers
+uncomfortable, but this notation does have an
+obvious advantage: variables can be embedded directly into a
+string literal:
:nginx
set $a hello;
set $b "$a, $a";
-It is using Nginx variable C<$a>, to construct variable C<$b>. Now C<$a>
-is C<hello>, and C<$b> is
-C<hello, hello>. The technique is called "variable interpolation" in Perl.
-It effectively executes
-the string concatenation.
+Here we use the value of the existing Nginx variable C<$a> to construct the
+value for the variable C<$b>. So after these two directives complete execution,
+the value of C<$a> is C<hello>, and the value of C<$b> is C<hello, hello>. This
+technique is called "variable interpolation" in the Perl world, and it makes
+dedicated string concatenation operators largely unnecessary. Let's use the
+same term for the Nginx world from now on.
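To make the interpolation rule concrete, here is a toy Python model of it. This is a sketch under stated assumptions, not Nginx's C implementation: the function name C<interpolate> and the rule "a variable name is the longest run of letters, digits and underscores after C<$>" are simplifications chosen for illustration. It replays the two C<set> directives above.

```python
import re

# Simplified, hypothetical name rule for this sketch: a variable name is the
# longest run of letters, digits and underscores right after "$".
_VAR = re.compile(r"\$([A-Za-z0-9_]+)")


def interpolate(template, variables):
    """Return template with every $name replaced by its string value."""
    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise LookupError(f'unknown "{name}" variable')
        return variables[name]  # values are always strings, as in Nginx

    return _VAR.sub(lookup, template)


variables = {}
variables["a"] = "hello"                           # set $a hello;
variables["b"] = interpolate("$a, $a", variables)  # set $b "$a, $a";
print(variables["b"])  # hello, hello
```

Because values are plain strings, "concatenation" is just text substitution inside the literal, which is why no separate operator is needed.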
-Let's have a look at another example:
+Let's see another complete example:
:nginx
server {
@@ -57,34 +66,55 @@ Let's have a look at another example:
}
}
-The example omits the outter C<http> directive and C<events> directive
-in F<nginx.conf>. With
-the HTTP client utility C<curl>, we can issue a HTTP request to C</test>
-from command line and
-obtain following result:
+This example omits the C<http> and C<events> configuration blocks in the
+outermost scope for brevity. Requesting this C</test> interface from the
+command line with C<curl>, an HTTP client utility, we get:
:bash
$ curl 'http://localhost:8080/test'
foo: hello
-Here we use 3rd party module L<ngx_echo> and its command L<ngx_echo/echo>
-to print the value
-of variable C<$foo> as HTTP response.
+Here we use the L<ngx_echo/echo> directive of the 3rd party module L<ngx_echo>
+to print out the value of the C<$foo> variable as the HTTP response.
-We can assert that L<ngx_echo/echo> supports "variable interpolation",
-yet we must not take it
-for granted, since not all the variable commands supports "variable interpolation"
+Apparently the arguments of the L<ngx_echo/echo> directive do support
+"variable interpolation", but we
+cannot take it
+for granted for other directives, because not all the configuration directives
+support "variable interpolation"
and it is
-in fact up to the module's implementation.
+in fact up to the implementation of the directive in that module. Always look
+up the documentation to be sure.
+
+=== Escaping "$" ===
+
+We've already learned that the C<$> character is special and it serves as the
+variable name prefix, but now consider that we want to output a literal C<$>
+character via the L<ngx_echo/echo> directive. The following naive example does
+not work at all:
+
+ ? :nginx
+ ? location /t {
+ ? echo "$";
+ ? }
+
+we will get the following error message while loading this configuration:
-Is there any way to escape C<$> so that it is no more than a typical dollar
-sign by using
-L<ngx_echo/echo> ? The answer is negative (the answer still holds in the
+ [emerg] invalid variable name in ...
+
+Obviously Nginx is trying to parse C<$"> as a variable name. Is there a way to
+escape C<$> in the string literal? The answer is "no" (it is still the case in
+the
latest Nginx stable
-release C<1.0.10>. Luckily this can be done by other module commands, which
-designate C<$> value
-as a Nginx variable, then the variable can be used in L<ngx_echo/echo>,
-example:
+release C<1.2.7>) and I have been hoping that we could write something like
+C<$$> to obtain a literal C<$>.
+
+Luckily, workarounds do exist, and here is one proposed by Maxim Dounin: first
+we assign a literal string containing the dollar sign character to a variable
+via a configuration directive that does I<not> support "variable interpolation"
+(remember that not all the directives support "variable interpolation"?), and
+then use L<ngx_echo/echo> to print out this variable's value. Here is such an
+example to demonstrate the idea:
:nginx
geo $dollar {
@@ -99,30 +129,32 @@ example:
}
}
-testing result is following:
+Let's test it out:
:bash
$ curl 'http://localhost:8080/test'
This is a dollar sign: $
-The built-in module L<ngx_geo> and its command L<ngx_geo/geo> are used
-to initialize
-variable C<$dollar> with string C<"$">, thereafter variable C<$dollar>
+Here we make use of the L<ngx_geo/geo> directive of the standard module
+L<ngx_geo> to initialize the
+C<$dollar> variable with the string C<"$">, after which the C<$dollar> variable
can be used
-for circumstances asking for a dollar sign. Actually, the typical scenario
-L<ngx_geo>
-is applied for, is to assign Nginx variable by taking into account the
-request client
-IP addresses. For above specific example, it is used to initialize C<$dollar>
+wherever we need a literal dollar sign. This works because the L<ngx_geo/geo>
+directive does not
+support "variable interpolation" at all. Note, however, that the L<ngx_geo>
+module is designed to set an Nginx variable to different values according to
+the remote client
+address. In the sample above, we merely abuse it to initialize the C<$dollar>
variable
-with the dollar sign string unconditionally.
+with the string C<"$"> unconditionally.
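The escaping problem and the workaround can be sketched with the same kind of toy Python model. Again, this is an assumption for illustration, not Nginx code: the hypothetical C<interpolate> function rejects a C<$> that does not start a variable name, while a value stored verbatim (standing in for what the non-interpolating L<ngx_geo/geo> directive assigns) passes through untouched.

```python
import re

# Hypothetical sketch: "$" followed by zero or more name characters.
# An empty name is rejected, mirroring Nginx's "invalid variable name" error.
_VAR = re.compile(r"\$([A-Za-z0-9_]*)")


def interpolate(template, variables):
    def lookup(match):
        name = match.group(1)
        if not name:
            raise ValueError("invalid variable name")
        # The substituted value is inserted as-is, so a "$" inside it is
        # never re-parsed as a variable prefix.
        return variables[name]

    return _VAR.sub(lookup, template)


# Stand-in for the non-interpolating geo directive: "$" is stored verbatim.
variables = {"dollar": "$"}
print(interpolate("This is a dollar sign: $dollar", variables))
# This is a dollar sign: $

try:
    interpolate("$", variables)  # like: echo "$";
except ValueError as err:
    print(err)  # invalid variable name
```

The key design point the sketch captures is that interpolation happens once: a dollar sign that arrives inside a variable's value is data, not syntax.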
-Attention, "variable interpolation" has a special case, where the variable
-name itself
-cannot be delimited from the rest of the string (such as it is right in
-front of letter,
-digit or underscore) Hence a special syntax is needed to handle the case,
-as following:
+=== Disambiguating Variable Names ===
+
+There is a special case in "variable interpolation": the variable name may be
+followed directly by characters that can themselves be part of a variable name
+(letters, digits, and underscores).
+In such cases we can use a special notation to disambiguate the variable name
+from the subsequent literal characters:
:nginx
server {
@@ -134,27 +166,32 @@ as following:
}
}
-In the example, variable C<$first> is concatenated with C<world>. If it
-is written
-directly as C<"$firstworld">, Nginx's variable interpolation tries to devalue
-variable
-C<$firstworld> instead of C<$first>. To fix this problem, curly bracket
-can be used
-together with C<$>, such as C<${first}>. Above example has following result:
+Here the variable C<$first> is concatenated with the literal string C<world>.
+If it were written
+directly as C<"$firstworld">, Nginx's "variable interpolation" engine (also
+known as the "script engine") would try to access the variable
+C<$firstworld> instead of C<$first>. To resolve the ambiguity, the variable
+name must be wrapped in curly braces
+after the C<$> prefix, as in C<${first}>. Let's test this sample:
:bash
$ curl 'http://localhost:8080/test'
hello world
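The disambiguation rule can be modeled by extending the toy interpolator with a braced form. This is a sketch under assumptions, not Nginx's script engine: the value C<"hello "> for C<$first> is hypothetical, chosen only to reproduce the output above, and unlike real Nginx (which rejects unknown variables when loading the configuration), this sketch reports them during substitution.

```python
import re

# Hypothetical sketch: try the braced form "${name}" first, otherwise take
# the longest run of letters, digits and underscores after "$".
_VAR = re.compile(r"\$\{([A-Za-z0-9_]+)\}|\$([A-Za-z0-9_]+)")


def interpolate(template, variables):
    def lookup(match):
        name = match.group(1) or match.group(2)
        if name not in variables:
            raise LookupError(f'unknown "{name}" variable')
        return variables[name]

    return _VAR.sub(lookup, template)


variables = {"first": "hello "}  # hypothetical value for $first
print(interpolate("${first}world", variables))  # hello world

try:
    interpolate("$firstworld", variables)  # greedy: looks up $firstworld
except LookupError as err:
    print(err)  # unknown "firstworld" variable
```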
-Command L<ngx_rewrite/set> (and Command L<ngx_geo/geo>) not only initialize
-a variable,
-effectively it firstly declares the variable. Which means, if the variable
-is not declared yet,
-it is declared automatically (then initialized). In the example, if variable
-C<$a> is not declared,
-C<set> declares the variable at first hand. If variables are not declared,
-Nginx cannot devalue
-them, another example:
+== Variable Declaration or Creation ==
+
+In languages like C/C++, variables must be declared (or created) before they
+can be used so that the compiler can allocate storage and perform type checking
+at compile time. Similarly, Nginx creates all the Nginx variables while loading
+the configuration file (or in other words, at "configuration time"), so Nginx
+variables are also required to be declared somehow.
+
+Fortunately the L<ngx_rewrite/set> directive and the L<ngx_geo/geo> directive
+mentioned above do have the side effect of declaring or creating Nginx
+variables that they will assign values to later at "request time". If we do not
+declare a variable this way and use it directly in, say, the L<ngx_echo/echo>
+directive, we will get an error. For example,
:nginx
? server {
@@ -165,25 +202,26 @@ them, another example:
? }
? }
-Nginx aborts loading configuration:
+Here we do not declare the C<$foo> variable and access its value directly in
+L<ngx_echo/echo>. Nginx will simply refuse to load this configuration:
[emerg] unknown "foo" variable
-Yes, the server cannot even be started!
+Yes, we cannot even start the server!
+
+Nginx variable creation and assignment happen
+at completely different phases along the timeline.
+Variable creation only occurs when Nginx loads its configuration. On the other
+hand, variable assignment occurs when requests are actually
+being handled. This also means that we can never create new Nginx variables at
+"request time".
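The split between "configuration time" and "request time" can be sketched as a two-pass check over a toy configuration. This is a hypothetical model, not Nginx internals: C<load_config> and the C<(directive, args)> tuples are invented for illustration, and only C<set> creates variables here (the text notes that L<ngx_geo/geo> does too). The whole configuration is scanned before references are checked, matching the claim that a created variable is visible everywhere.

```python
import re

# Simplified name syntax for this sketch: letters, digits, underscores.
_REF = re.compile(r"\$([A-Za-z0-9_]+)")


def load_config(directives):
    """Toy "configuration time" over (directive, args) pairs.

    Returns the frozen set of created variable names, or raises ValueError
    when an argument references a variable that no "set" creates.
    """
    # Pass 1: "set $name value" creates $name, wherever it appears.
    created = set()
    for directive, args in directives:
        if directive == "set":
            created.add(args[0].lstrip("$"))
    # Pass 2: every other "$name" reference must name a created variable.
    for directive, args in directives:
        refs = args[1:] if directive == "set" else args
        for arg in refs:
            for name in _REF.findall(arg):
                if name not in created:
                    raise ValueError(f'[emerg] unknown "{name}" variable')
    # Frozen: nothing can be added once "request time" begins.
    return frozenset(created)


print(sorted(load_config([("set", ["$foo", "32"]), ("echo", ["foo = [$foo]"])])))
# ['foo']

try:
    load_config([("echo", ["foo = [$foo]"])])
except ValueError as err:
    print(err)  # [emerg] unknown "foo" variable
```

Returning a C<frozenset> is the sketch's way of expressing that no new variables can be created once requests are being served.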
-More importantly, Nginx variable declaration and initialization happens
-at different phases in the timeline.
-Variable declaration only occurs when Nginx loads its configuration, in
-other words, when Nginx is started.
-On the other hand, variable initialization occurs when actual request is
-being handled. Consequently, server
-fails bootstrap if variable is not declared, further more, new Nginx variables
-cannot be declared dynamically in
-the run time.
+== Variable Scope ==
-As soon as a variable is declared in Nginx, its scope is the entire configuration,
+Once an Nginx variable is created, it is visible to the entire configuration,
regardless of the location
-it is referenced, even for different virtual server directives. Here is
+it is referenced, even across different virtual server configuration blocks.
+Here is
an example:
:nginx
@@ -200,11 +238,13 @@ an example:
}
}
-Variable C<$foo> is declared by command C<set> within C<location /bar>,
-as variable
-visibility is the entire configuration. It can be referenced in C<location
+Here the variable C<$foo> is created by the L<ngx_rewrite/set> directive within
+C<location /bar>,
+and this variable is visible to the entire configuration, so we can
+reference it in C<location
/foo> without
-causing any error, following are the location outcomes respectively:
+worries. Below are the results of testing these two interfaces with the C<curl>
+tool:
:bash
$ curl 'http://localhost:8080/foo'
@@ -216,21 +256,28 @@ causing any error, following are the location outcomes respectively:
$ curl 'http://localhost:8080/foo'
foo = []
-As we can tell, command C<set> is executed within C<location /bar>, so
-the variable is only initialized when C</bar>
-is requested. If C</foo> is requested directly, variable C<$foo> has an
-empty value. Default value is an empty string
-if Nginx variable is not initialized.
-
-The example carries another important feature, i.e. although variable scope
-is the entire configuration, every request
-has its own copies of the declared variables. In the example, variable
-C<$foo> is initialized with value C<32> when C</bar>
-is requested, but it remains empty in the subsequent request to C</foo>
-since every request has their own copy of variables
-
-This is a common pitfall many Nginx newbie stumbles, which is to think
-Nginx variable as "global variable" or configuration
-settings that are shared for the entire server life time. In fact, variables
-cannot last in between different requests.
+We can see that the assignment operation is only performed in requests that
+access C<location /bar>, since the corresponding L<ngx_rewrite/set> directive
+is only used in that location. When requesting the C</foo> interface, we always
+get an empty value for the C<$foo> variable, because an uninitialized Nginx
+variable evaluates to an empty string.
+
+Another important behavior that we can observe from this example is that even
+though the scope of Nginx variables is the entire configuration, each request
+has its own version of all those variables. In other words, each
+request has its own copy of value containers for all variables. Requests do not
+interfere with each other even if they are referencing a variable with the same
+name. This is very much like local variables in C/C++ function bodies: each
+invocation of a C/C++ function uses its own copies of those local
+variables.
+
+For instance, in this sample, we request C</bar> and the variable C<$foo> gets
+the value C<32>, which does not affect the value of C<$foo> in subsequent
+requests to C</foo> (it is still uninitialized!), because they correspond to
+different value containers.
+
+One of the most common mistakes for Nginx newcomers is to regard Nginx
+variables as something shared among all the requests. Even though the scope of
+Nginx variables goes across configuration blocks, it never goes beyond request
+boundaries. In essence, there are two different kinds of scope here: the
+visibility of variable names spans the whole configuration, while the lifetime
+of variable values is bounded by a single request.
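The two kinds of scope can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Nginx stores variables: the name set C<CREATED> stands for what configuration time produced, and the hypothetical C<handle> function gives each request fresh value containers, mirroring the example where C<set $foo 32;> appears only in C<location /bar>.

```python
# Toy model (assumption): configuration time fixed the set of names.
CREATED = frozenset({"foo"})


def handle(uri):
    """Serve one toy request and return the final value of $foo."""
    # Fresh value containers for this request only; an uninitialized
    # variable reads as the empty string.
    values = {name: "" for name in CREATED}
    if uri == "/bar":
        values["foo"] = "32"  # set $foo 32; runs only in /bar
    return values["foo"]


print(repr(handle("/bar")))  # '32'
print(repr(handle("/foo")))  # ''  (the earlier /bar request left no trace)
```

Names are global (one shared C<CREATED>), while values live in a per-call dictionary, which is the same split as C/C++ local variables.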
