
# Variable Fundamentals (continued)

## String Data Types

- Strings are immutable sequences of Unicode code points (characters)
- String constants
  - Single quote \
    `'I am a string'`
  - Double quote\
    `"I am also a string"`
  - Triple quote\
    `"""I
    am
    a
    string"""`
- There is no character type – a character is a `str` of length one
- There is a `bytes` type for holding sequences of values in the range 0 through 255
  - Good introductory tutorial: https://www.w3resource.com/python/python-bytes.php
  - Not covered in class

In [None]:
a = 'I am a string'
b = "I am also a string"
c = """I
am
a
string"""

In [None]:
print(c)

Strings may be delimited by single, double or triple quotes, which must be matched.

Strings delimited with triple quotes may span multiple lines.
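
The points above about quote styles, immutability and the lack of a character type can be checked with a short sketch. The variable names here are illustrative only and are not used elsewhere in the notebook.

```python
# Any quote style produces the same kind of value.
single = 'spam'
double = "spam"
triple = """spam"""
same_value = single == double == triple

# Indexing yields another str of length one, not a separate character type.
first = single[0]
first_type = type(first).__name__

# Strings are immutable: item assignment raises TypeError.
try:
    single[0] = 'S'
    mutated = True
except TypeError:
    mutated = False

# Build a new string instead of changing the old one.
capitalised = 'S' + single[1:]

print(same_value, first, first_type, mutated, capitalised)
```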

### Escape Sequences

| Escape Sequence | Meaning |
|:----------------|:--------|
| \\newline (\\ at end of line) | ignore newline |
| \\\\ | backslash |
| \\'  | single quote, mainly used when single quotes are the delimiter |
| \\"  | double quote, mainly used when double quotes are the delimiter |
| \\n  | newline |
| \\t  | tab character |
| \\ooo | character with octal value _ooo_ |
| \\xhh | character with hex value _hh_. Unlike some languages, exactly 2 hex digits are required |
| \\N{name} | character called _name_ in the Unicode database |
| \\uxxxx | character with 16-bit hex value _xxxx_ |
| \\Uxxxxxxxx | character with 32-bit hex value _xxxxxxxx_ |
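
As a rough illustration of the table, the sketch below writes the same characters in several ways. The variable names are made up for this example.

```python
# Each literal below is a single character written a different way.
octal_a = '\101'            # octal 101 is decimal 65, the letter A
hex_a = '\x41'              # hex 41 is also 65
e_acute_16 = '\u00e9'       # 16-bit form, exactly 4 hex digits
e_acute_32 = '\U000000e9'   # 32-bit form, exactly 8 hex digits
e_acute_name = '\N{LATIN SMALL LETTER E WITH ACUTE}'

# A tab escape counts as one character.
tab_length = len('a\tb')

# A backslash at the end of a line inside a literal joins the two lines.
joined = 'one \
two'

print(octal_a, hex_a, e_acute_16, e_acute_32, e_acute_name)
print(tab_length)
print(joined)
```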

All strings are made up of Unicode characters, but to retain compatibility with Python 2.x, the first quote delimiter may be preceded by `u` (or `U`), which has no effect (in Python 2.x, this denoted a Unicode string).

In [None]:
"a" == u"a"

Escape sequences may be ignored by making the string _raw_. Do this by preceding the first quote delimiter with `r` or `R`.
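
Raw strings are often used for text that contains many backslashes. The following sketch compares a normal and a raw literal; the Windows-style path is a made-up example.

```python
cooked = "C:\new\table"   # \n and \t are interpreted as newline and tab
raw = r"C:\new\table"     # backslashes are kept literally

# The raw version is longer because each backslash is a real character.
lengths = (len(cooked), len(raw))

# Doubling the backslashes in a normal string gives the same result as raw.
same_as_doubled = raw == "C:\\new\\table"

print(lengths)
print(raw)
print(same_as_doubled)
```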

In [None]:
print("This is on\n2 lines!")

In [None]:
print(r"This is not on\n2 lines!")

### Simple `in` and `not in`

`x in y` is `True` if `x` is a substring of `y`.
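
A few cases worth trying, as a sketch (the `phrase` variable is only for illustration):

```python
phrase = "I am also a string"

has_also = "also" in phrase          # True
has_capital = "Also" in phrase       # False: the test is case-sensitive
lacks_xyz = "xyz" not in phrase      # True
has_empty = "" in phrase             # True: the empty string is in every string

print(has_also, has_capital, lacks_xyz, has_empty)
```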

In [None]:
"am a" in b

In [None]:
"am a" in c

## Exercise 2.1: String Methods

Open the notebook and do this exercise

## String Interpolation

- Insert data values into a string literal, replacing placeholders

- 4 ways in Python:
  - `printf`-style formatting, aka %-formatting
  - `str.format()`
  - Literal string interpolation, aka f-strings
  - Template Strings

- Will not cover any of these in detail
  - Principles are similar in each case
  - There are crucial differences


### `printf`-style Formatting

Similar to `printf` in C. The earliest mechanism available.

In [None]:
print("%s has %d quote types." % ('Python', 3))

More generally, `string % values`

- `string` contains *conversion specifiers*, each starting with `%`
- `%` is the interpolation operator
- `values` may be
  - A single object, if there is a single specifier
  - A tuple (comma-separated set of values in parentheses), as above
  - A mapping object (such as a dictionary, we will see this shortly)
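
The three forms of `values` listed above might look like this in practice. This is a sketch; the key names `name` and `count` are chosen only for the example.

```python
# A single object for a single specifier.
from_single = "%s is a language." % 'Python'

# A tuple when there are several specifiers.
from_tuple = "%s has %d quote types." % ('Python', 3)

# A mapping, used together with mapping keys in the specifiers.
from_mapping = "%(name)s has %(count)d quote types." % {'name': 'Python', 'count': 3}

# To format a tuple as one value, wrap it in a one-element tuple.
pair = "%s" % ((1, 2),)

print(from_single)
print(from_tuple)
print(from_mapping)
print(pair)
```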

### `printf` Conversion Specifier

1. The '`%`' character, which marks the start of the specifier.
2. Mapping key (optional), consisting of a parenthesised sequence of characters (for example, `(some_name)`).
3. Conversion flags (optional), which affect the result of some conversion types.
4. Minimum field width (optional). Counts every character of the output, including the decimal point and any decimal places.
   - If specified as '`*`' (an asterisk), the actual width is read from the next element of the tuple in `values`, and the object to convert comes after.
5. Precision (optional), given as a '`.`' (dot) followed by the precision (for `f` and `e` conversions, the number of digits after the decimal point).
   - If specified as '`*`' (an asterisk), the actual precision is read from the next element of the tuple in `values`, and the value to convert comes after.
6. Length modifier (optional). 
   - A length modifier (`h`, `l`, or `L`) is available for compatibility with C, but ignored by Python (`%ld` is identical to `%d`).
7. Conversion type.
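
The `*` form of width and precision, and the ignored length modifier, can be sketched as follows. The numbers are arbitrary.

```python
value = 12.345678

# Width and precision written directly in the specifier.
fixed = '|%10.2f|' % value

# The same width and precision supplied through the tuple using '*'.
starred = '|%*.*f|' % (10, 2, value)

# The length modifier is accepted but has no effect.
with_modifier = '%ld' % 42
without_modifier = '%d' % 42

print(fixed)
print(starred)
print(with_modifier, without_modifier)
```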


In [None]:
value = 12.345678

In [None]:
print('|result: %20.4f|' % value)

The conversion specifier here is `%20.4f`, which is:
1. `%`
2. no mapping key
3. no conversion flags
4. minimum field width = `20`
5. precision = `4`
6. no length modifier
7. conversion type is `f` for floating point

### Conversion Flags

| Flag | Meaning |
|:-----|:--------|
| `#`  | Alternate form <ul><li>`0o` before octal, `0x` or `0X` before hexadecimal</li><li>always include decimal point in floating point numbers (even if no decimals)</li></ul> |
| `0`  | Zero fill numbers |
| `-`  | Left justify |
| `<space>` | Leave a space before positive numbers for a signed conversion |
| `+`  | Always put a sign (`+` or `-`) for signed conversions |
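
A sketch of each flag in turn, with arbitrary values:

```python
# '#' selects the alternate form.
alt_octal = '%#o' % 8          # 0o10
alt_hex = '%#x' % 255          # 0xff
alt_float = '%#.0f' % 3.0      # 3. (the decimal point is kept)

# Sign handling for positive numbers.
plus = '%+d' % 42              # +42
space = '% d' % 42             # a space where the sign would go

# Padding within a field width of 6.
zero_fill = '%06d' % -42       # -00042
left = '|%-6d|' % 42           # |42    |

print(alt_octal, alt_hex, alt_float)
print(plus, repr(space))
print(zero_fill, left)
```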


In [None]:
print('|result: %020.4f|' % value)

In [None]:
print('|result: %-20.4f|' % value)

### Conversion Types

| Conversion | Meaning |
|:-----------|:--------|
| d<br>i     | Signed integer decimal |
| o          | Signed octal value |
| X<br>x     | Signed hexadecimal (upper or lower case), 2A or 2a |
| E<br>e     | Floating point exponential format (upper or lower case) |
| F<br>f     | Floating point decimal format (case is irrelevant) |
| G<br>g     | Floating point format. Uses exponential if exponent is less than -4 or not less than precision, decimal format otherwise. (Case applies to exponential format) |
| c          | Single character |
| r          | String (converts using `repr()`, resulting in a visual representation of an object that can usually be passed to `eval()` – strings have quotes, for example) |
| s          | String (converts using `str()`, resulting in a _friendly_ representation of the object – strings are the bare value) |
| a          | String (converts using `ascii()`, as `r`, but escapes non-ASCII characters) |
| %          | A percentage sign character, no value is converted |
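
The numeric and character conversions in the table can be tried with a sketch like this one. The values are arbitrary.

```python
decimal = '%d' % 42                       # 42
octal = '%o' % 8                          # 10
hexes = '%x %X' % (255, 255)              # ff FF
exponential = '%e' % 12345.678            # 1.234568e+04

# %g switches to exponential format once the exponent drops below -4.
general = '%g %g' % (0.0001, 0.00001)     # 0.0001 1e-05

# %c accepts either an integer code point or a one-character string.
chars = '%c%c' % (65, 'B')                # AB

# %% produces a literal percent sign.
percent = '%d%%' % 50                     # 50%

print(decimal, octal, hexes, exponential)
print(general)
print(chars, percent)
```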


In [None]:
s = "Lúthien Tinúviel"

In [None]:
print("%s" % s)
print("%r" % s)
print("%a" % s)

In [None]:
print("%g" % 1000000)

### Mapping Key

- Allows values to be interpolated by name
  - Instead of using positional specifiers

In [None]:
print('%(language)s has %(number)d quote types.' % { 'language': "Python", "number": 3 })

`{ 'language': "Python", "number": 3 }` is a dictionary 
- A comma-separated set of name-value pairs
- Order of the dictionary is not important
- Will learn all about the dictionary data type later

### `str.format()`

- Provides a sophisticated mini-language to describe interpolations
- Since v2.6

The syntax appears fairly complex:

```
replacement_field ::=  "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
arg_name          ::=  [identifier | digit+]
attribute_name    ::=  identifier
element_index     ::=  digit+ | index_string
index_string      ::=  <any source character except "]"> +
conversion        ::=  "r" | "s" | "a"
format_spec       ::=  <described in the next section>
```

What's important?
- Interpolation is described by _replacement fields_ that are surrounded by braces `{ }`
- Inside the braces is a field name and an expression describing the conversion.
- Field names may be either a numeric index into the arguments, or a named parameter
  - You may mix them, but in the call to `format()` positional arguments must come before keyword arguments
  - You can omit the numbers if they are used in sequence, but you cannot mix omitted and explicit numbers in the same string
- The `conversion` has the same meaning as `r`, `s` and `a` in `printf` formatting
- Insert braces by doubling: `{{` or `}}`
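
The grammar also allows attribute and index access inside a field name. The sketch below shows those, together with conversions and doubled braces. `Point` and `menu` are hypothetical names used only for this illustration.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])   # hypothetical type for the example
p = Point(3, 4)
menu = ['egg', 'bacon', 'spam']

# "." attribute_name
by_attribute = 'x={0.x}, y={0.y}'.format(p)

# "[" element_index "]"
by_index = '{0[2]} and {0[0]}'.format(menu)

# "!" conversion
converted = '{!r} vs {!s}'.format('spam', 'spam')

# Doubled braces produce literal braces around the replaced field.
braces = '{{{}}}'.format('spam')

print(by_attribute)
print(by_index)
print(converted)
print(braces)
```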

In [None]:
print('{0} and {1}'.format('ham', 'eggs'))

In [None]:
print('{} and {}'.format('ham', 'eggs'))

In [None]:
print('This {food} is {adjective}.'.format(food = 'spam', adjective = 'absolutely horrible'))

In [None]:
print('This {food} is {0}.'.format('absolutely horrible', food = 'spam'))

In [None]:
script = '''
WAITRESS: Well, there's {0} and {1}; {0}, {2} and {1}; {0} and {3}; {0}, {1} and {3}; {0}, {1}, {2} and {3}; {3}, {1}, {2} and {3}; {3}, {0}, {3}, {3}, {1} and {3}; {3}, {2}, {3}, {3}, {1}, {3}, {4} and {3}; ...

VIKINGS (starting to chant): {3} {3} {3} {3}...

WAITRESS: ... {3}, {3}, {3}, {0} and {3}; {3}, {3}, {3}, {3}, {3}, {3}, {5}, {3}, {3}, {3}; ...

VIKINGS (singing): {3}! Lovely {3}! Lovely {3}!

WAITRESS: ... or Lobster Thermidor au Crevette with a Mornay sauce, served in a Provencale manner with shallots and aubergines, garnished with truffle pâté, brandy and with a fried {0} on top... and {3}.
'''

print(script.format('egg', 'bacon', 'sausage', 'spam', 'tomato', 'baked beans'))

### Format Specifiers

Again, the syntax appears complex at first:

```
format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.precision][type]
fill            ::=  <any character>
align           ::=  "<" | ">" | "=" | "^"
sign            ::=  "+" | "-" | " "
width           ::=  digit+
grouping_option ::=  "_" | ","
precision       ::=  digit+
type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
```

But this is actually very similar to the `printf`-style formatting we saw earlier.

Rather than look at it exhaustively, here are some key differences:
- You can use any `fill` character rather than just `0` or `<space>`
- There is more alignment control (`=` aligns numbers separately from their sign, `^` centres)
- `_` and `,` are new options (v3.6 & 3.1, respectively) that provide digit grouping
  - These do not take account of national grouping conventions; use type `n` for that
- `b` is binary
- `n` is a number according to the current locale (with appropriate separators, decimal point etc.)
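
The differences listed above can be sketched as follows. Type `n` is left out because its output depends on the current locale.

```python
# Any fill character, with centring.
centred = '{:*^10}'.format('spam')        # ***spam***

# '=' places the padding between the sign and the digits.
sign_aligned = '{:=+8d}'.format(42)       # +     42

# Digit grouping with ',' and '_'.
with_commas = '{:,}'.format(1234567)      # 1,234,567
with_underscores = '{:_}'.format(1234567) # 1_234_567

# Binary, with and without the alternate-form prefix.
binary = '{:b} {:#b}'.format(10, 10)      # 1010 0b1010

print(centred)
print(sign_aligned)
print(with_commas, with_underscores)
print(binary)
```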

In [None]:
import math

In [None]:
print('The value of \N{MATHEMATICAL ITALIC SMALL PI} is approximately {0:.3f}.'.format(math.pi))

In [None]:
value = 12.345678

In [None]:
print('|result: %20.4f|' % value)
print('|result: {value:20.4f}|'.format(value = value))

In [None]:
print('|{0:10.2f}|'.format(value))
print('|{0:<10.2f}|'.format(value))
print('|{0:^10.2f}|'.format(value))
print('|{0:*>10.2f}|'.format(value))
print('|{0:!>10.2f}|'.format(value))

### Literal String Interpolation

Introduced in v3.6

Very similar to `str.format()`
- Literals are specified with the `f` modifier and are often known as f-strings (standing for *formatted strings*).
- Allows for expressions to be used when specifying a placeholder
- Expressions are evaluated in the current context, meaning they have access to all local and global variables
- As of v3.8, adding `=` after an expression shows the text of the expression followed by the `repr()` of its value
- Otherwise uses the same grammar as `str.format()`
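
A sketch of these points, assuming Python 3.8 or later for the `=` form. The `width` and `precision` variables are illustrative only.

```python
name = 'Dragon'
value = 12.345678
width, precision = 10, 3   # illustrative values

# Any expression may appear in a placeholder.
with_expression = f'{name.upper()} has {len(name)} letters'

# Conversions work as in str.format().
with_repr = f'{name!r}'

# Trailing '=' shows the expression text and the repr() of its value (v3.8+).
debug = f'{name=}'

# The format specifier may itself contain placeholders.
nested = f'|{value:{width}.{precision}f}|'

print(with_expression)
print(with_repr)
print(debug)
print(nested)
```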

In [None]:
name = "Dragon"

In [None]:
print(f'He said, "I am a {name!s}."')

In [None]:
import sys
sys.version

In [None]:
print(f'{name =}') # from v3.8 onwards, prints name ='Dragon' (the value is shown using repr())

In [None]:
value = 12.123456

In [None]:
print('|result: %20.4f|' % value)
print('|result: {value:20.4f}|'.format(value = value))
print(f'|result: {value:20.4f}|') # more expressive than %, more concise than str.format

In [None]:
counter = 42

In [None]:
print(f'We should consider positions {counter} and {counter + 1}')

### Template Strings

Part of the `string` module since v2.4.

A simpler substitution mechanism than the `%` operator.

- `Template`s may contain simple placeholders introduced by `$`
- Placeholders may be:
  - `$identifier` and are terminated by the first character that is not a valid part of an identifier (typically `<space>` or some punctuation).
  - `${identifier}` which is used when they are adjacent to characters that would form part of a valid identifier
- The `substitute()` method provides values to replace the placeholders, passed either as keyword arguments or as a mapping
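
A sketch of both placeholder forms, using a dictionary rather than keyword arguments. It also uses `$$` (a literal dollar sign) and `safe_substitute()`, which are part of `string.Template` but not otherwise covered here. The template text and names are made up.

```python
from string import Template

receipt = Template('$name pays $$${amount} for ${item}s')
values = {'name': 'Dragon', 'amount': 5, 'item': 'egg'}

# substitute() accepts a mapping as well as keyword arguments.
filled = receipt.substitute(values)

# A missing placeholder value raises KeyError naming the placeholder.
try:
    receipt.substitute(name='Dragon')
    missing = None
except KeyError as error:
    missing = error.args[0]

# safe_substitute() leaves unknown placeholders in place instead.
partial = receipt.safe_substitute(name='Dragon')

print(filled)
print(missing)
print(partial)
```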

In [None]:
from string import Template

In [None]:
t1 = Template('He said, "I am a $name."')
print(t1.substitute(name = 'Dragon'))

In [None]:
t2 = Template('"Now we start the process of ${name}ification"')
print(t2.substitute(name = 'Dragon'))

## Exercise 2.2: Chapter Exercise

Open the notebook and do this exercise

# End of Notebook