### 2. [Lexical Analysis](https://docs.python.org/2/reference/lexical_analysis.html)

- A Python program is read by a parser. 
- Input to the parser is a stream of tokens.
- The stream of tokens is generated by the lexical analyzer.

This chapter describes how the lexical analyzer breaks a file into tokens.
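
One rough way to look at such a token stream is the standard library's tokenize module. The sketch below assumes a Python 3 interpreter (these notes follow the Python 2 reference, whose tokenizer differs in some details), and `token_list` is a hypothetical helper name used only for illustration.

```python
import io
import tokenize


def token_list(source):
    """Return (token type name, token text) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]


tokens = token_list("x = 1 + 2\n")
for kind, text in tokens:
    print(kind, repr(text))
```

Each pair is one token as the lexical analyzer sees it: a name, two operators, two numbers, and then the tokens that close the line and the input.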

- Python uses the 7-bit ASCII character set for program text.
- New in version 2.3: An encoding declaration can be used to indicate that string literals and comments use an encoding different from ASCII.
- For compatibility with older versions, Python only warns if it finds 8-bit characters. Correct those warnings by either:

    - declaring an explicit encoding, or
    - using escape sequences if those bytes are binary data rather than characters.

- The run-time character set depends on the I/O devices connected to the program but is generally a superset of ASCII.
- Future compatibility note: do not assume that the character set for 8-bit characters is ISO Latin-1 or UTF-8.
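
The reference manual recognizes an encoding declaration as a comment on the first or second line that matches the regular expression `coding[=:]\s*([-\w.]+)`. A minimal sketch of that check follows; `declared_encoding` is a hypothetical helper, and it simplifies the real rule by only looking at lines that consist of a comment.

```python
import re

# Pattern given in the reference manual for encoding declarations.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding named in a comment on line 1 or 2, else None."""
    for line in source_lines[:2]:
        stripped = line.lstrip()
        if stripped.startswith("#"):
            match = CODING_RE.search(stripped)
            if match:
                return match.group(1)
    return None


example = ["#!/usr/bin/env python\n", "# -*- coding: latin-1 -*-\n", "x = 1\n"]
print(declared_encoding(example))
```

A declaration on the third line or later is ignored by this sketch, which matches the "first or second line" rule.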

##### Line Structures

* A Python program is divided into a number of logical lines.
* A logical line is constructed from one or more physical lines by following the explicit or implicit line joining rules.
* A physical line is a sequence of characters terminated by an end-of-line sequence:
    - Unix: LF 
    - Windows: CR LF
    - Old Mac: CR
* Comments: start with # and end at the end of the physical line.
* Encoding declarations: will be addressed separately
* Explicit line joining: end a physical line with a backslash (\) to continue it on the next physical line.

```
# Explicit joining
s = 'This is one line. \
This is still in the same line. \
This is again in the same line'

print s
```

```
This is one line. This is still in the same line. This is again in the same line
```


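* Implicit line joining: expressions in parentheses, square brackets, or curly braces can span several physical lines without backslashes, and the continuation lines may carry comments.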

```
# Implicit line joining
lst = [[1,2,3,4,5]  # 1st row
      ,[6,7,8,9,10] # 2nd row
      ]
print lst
```

```
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
```


* Blank lines: A logical line that contains only spaces, tabs, formfeeds and possibly a comment, is ignored.
* Indentation: no need to cover here
* Whitespace between tokens: except at the beginning of a logical line or in string literals, spaces, tabs, and formfeeds can be used interchangeably to separate tokens.
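
The difference between physical and logical lines can be checked by counting NEWLINE tokens, since the tokenizer emits one at the end of each logical line. This is a sketch assuming Python 3's tokenize module; `count_logical_lines` is a hypothetical helper name.

```python
import io
import tokenize


def count_logical_lines(source):
    """Count NEWLINE tokens, each of which ends one logical line."""
    readline = io.StringIO(source).readline
    return sum(1 for tok in tokenize.generate_tokens(readline)
               if tok.type == tokenize.NEWLINE)


source = (
    "total = 1 + \\\n"          # explicit joining with a backslash
    "        2\n"
    "pairs = [(1, 2),  # implicit joining inside brackets\n"
    "         (3, 4)]\n"
    "\n"                         # blank line: no logical line
    "# a comment-only line\n"    # also ignored as a logical line
    "done = True\n"
)

physical = source.count("\n")
logical = count_logical_lines(source)
print(physical, logical)
```

Seven physical lines collapse into three logical lines here: the two joined statements and the final assignment.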

##### Identifiers

```
identifier ::=  (letter|"_") (letter | digit | "_")*
letter     ::=  lowercase | uppercase
lowercase  ::=  "a"..."z"
uppercase  ::=  "A"..."Z"
digit      ::=  "0"..."9"
```
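
The grammar above translates almost directly into a regular expression. The following is a minimal sketch of that translation; `is_identifier` is a hypothetical helper, and it checks only the lexical shape, so keywords still pass.

```python
import re

# (letter | "_") followed by any number of letters, digits, or underscores.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_identifier(text):
    """Return True if text has the shape of an ASCII identifier."""
    return IDENTIFIER_RE.match(text) is not None


for candidate in ["_private", "x1", "1x", "a-b", ""]:
    print(repr(candidate), is_identifier(candidate))
```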

##### Keywords

```
and       del       from      not       while
as        elif      global    or        with
assert    else      if        pass      yield
break     except    import    print
class     exec      in        raise
continue  finally   is        return
def       for       lambda    try

2.4: None became a constant and is recognized by the compiler as a name for the built-in object None. It is not a keyword, but a different object cannot be assigned to it.
2.5: Using as and with as identifiers triggers a warning. To use them as keywords, enable the with_statement future feature.
2.6: as and with are full keywords.
```
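
The keyword list of the running interpreter can be inspected with the standard keyword module. The list depends on the version: the table above is for Python 2, while in Python 3, for example, print and exec are no longer keywords. `is_reserved` below is a hypothetical wrapper for illustration.

```python
import keyword


def is_reserved(name):
    """Return True if name is a keyword in the running interpreter."""
    return keyword.iskeyword(name)


# The full list for whichever interpreter runs this snippet.
print(sorted(keyword.kwlist))

for name in ["while", "lambda", "whilst"]:
    print(name, is_reserved(name))
```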

##### Reserved classes of identifiers

```
_*
```

- Not imported by _from module import *_. 
- The special identifier _ is used in the interactive interpreter to store the result of the last evaluation; it is stored in the `__builtin__` module.
- When not in interactive mode, _ has no special meaning and is not defined. 
- Note: the name _ is often used in conjunction with internationalization; refer to the documentation for the gettext module for more information on this convention.

```
__*__
```

- System-defined names. 

```
__*
```

- Class-private names. 
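
Class-private names are rewritten ("mangled") inside a class definition so that they include the class name. A small sketch with a hypothetical `Account` class shows the effect:

```python
class Account(object):
    def __init__(self, balance):
        # Inside the class body, __balance is mangled to _Account__balance.
        self.__balance = balance

    def balance(self):
        return self.__balance


acct = Account(100)
print(acct.balance())
print(sorted(vars(acct)))
```

The instance dictionary holds the mangled name, which is why code outside the class does not find an attribute called `__balance`.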

##### Literals
- String literals
- Numeric literals
    - Integer and long integer literals
    ```
    7     2147483647                        0177
    3L    79228162514264337593543950336L    0377L   0x100000000L
          79228162514264337593543950336             0xdeadbeef
    ```
    - Floating point literals
    ```
    3.14    10.    .001    1e100    3.14e-10    0e0
    ```
    - Imaginary literals
    ```
    3.14j   10.j    10j     .001j   1e100j  3.14e-10j
    ```
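
The floating point and imaginary samples above can be checked directly, since that syntax is the same in Python 2 and Python 3. The octal form 0177 and the L suffix are Python 2 only, so this sketch parses those digits from text instead of writing them as literals; all variable names are illustrative.

```python
floats = [3.14, 10., .001, 1e100, 3.14e-10, 0e0]
imaginaries = [3.14j, 10.j, 10j, .001j, 1e100j, 3.14e-10j]

# Every floating point literal yields a float.
all_floats = all(isinstance(x, float) for x in floats)

# Every imaginary literal yields a complex number with a zero real part.
all_complex = all(isinstance(z, complex) and z.real == 0.0
                  for z in imaginaries)

octal_value = int("177", 8)        # digits of the Python 2 literal 0177
hex_value = int("deadbeef", 16)    # digits of the literal 0xdeadbeef

print(all_floats, all_complex, octal_value, hex_value)
```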
    
##### Operators

```
+       -       *       **      /       //      %
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=      <>
```

##### Delimiters

```
(       )       [       ]       {       }      @
,       :       .       `       =       ;
+=      -=      *=      /=      //=     %=
&=      |=      ^=      >>=     <<=     **=
```

The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer:

```
'       "       #       \
```

The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error:

```
$       ?
```
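
This rule can be observed with the built-in compile function, which raises SyntaxError for source it cannot tokenize or parse. `is_valid_source` is a hypothetical helper, and the sketch only shows that these characters are rejected outside strings and comments, not which stage rejects them.

```python
def is_valid_source(source):
    """Return True if the source compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


samples = [
    "total = 1 + 2\n",
    "total = 1 $ 2\n",
    "total = 1 ? 2\n",
    "text = '$ and ? are fine here'  # and here: $ ?\n",
]
for sample in samples:
    print(repr(sample), is_valid_source(sample))
```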

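As a check on the integer literals above, evaluating the hexadecimal literal 0xdeadbeef gives a long integer (note the trailing L) on a build whose plain integers are 32 bits wide:

```
0xdeadbeef
```

```
3735928559L
```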