# Chapter 6 - Strings
This chapter covers
- Understanding strings as sequences of characters
- Using basic string operations
- Inserting special characters and escape sequences
- Converting from objects to strings
- Formatting strings
- Using the bytes type

## 6.1 Strings as sequences of characters

In [1]:
x = "Hello"
print(x[0])
print(x[-1])
print(x[1:])

H
o
ello


*Strings aren't lists of characters*, and _strings can't be modified._
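
A minimal sketch of what immutability means in practice: item assignment on a string raises a 'TypeError', so a "modified" string is really a new string built from pieces of the old one. The variable names are illustrative only.

```python
x = "Hello"

# Item assignment is not allowed on a str object.
try:
    x[0] = "J"
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Build a new string from a slice of the old one instead.
y = "J" + x[1:]
print(y)  # Jello
print(x)  # Hello (the original is unchanged)
```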

## 6.2 Basic string operations

In [5]:
# combine strings
x = "Hello " + "World"
print(x)
# analogously, multiplication repeats a string
print(8 * "x")

Hello World
xxxxxxxx


## 6.3 Special characters and escape sequences
\n represents the newline character, and \t represents the tab character.

Sequences of characters that start with a backslash and are used to represent other characters are called _escape sequences_; they are generally used to represent _special characters_.
### 6.3.1 Basic escape sequences
```python
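# Basic escape sequences: quotes, backslash, bell, backspace, form feed,
# newline, carriage return, tab, and vertical tab
x = ["\'", "\"", "\\", "\a", "\b", "\f", "\n", "\r", "\t", "\v"]
for spec_ch in x:
    # repr shows the escape itself instead of emitting the control character
    print(repr(spec_ch))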
```
All of these characters belong to the ASCII character set.
### 6.3.2 Numeric (octal and hexadecimal) and Unicode escape sequences

In [28]:
print("m")
print("\155")
print("\x6D")
print("\n")
print("\012")
unicode_a = '\N{LATIN SMALL LETTER A}' # Escape by Unicode name
print(unicode_a)
unicode_a_with_acute = '\N{LATIN SMALL LETTER A WITH ACUTE}'
print(unicode_a_with_acute)
print("\u00E1") # Escape by number using \u

m
m
m




a
á
á


The Unicode character set includes the common ASCII characters.
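
One way to see how these escapes relate, sketched with the built-in 'ord' and 'chr' functions: the octal escape \155 and the hexadecimal escape \x6D both name code point 109, and ASCII characters keep the same code points in Unicode.

```python
# ord() returns the Unicode code point of a one-character string.
code_m = ord("m")
print(code_m, oct(code_m), hex(code_m))  # 109 0o155 0x6d

# The octal, hexadecimal, and \u escapes all denote the same character.
same_char = "\155" == "\x6D" == "\u006D" == "m"
print(same_char)  # True

# chr() is the inverse of ord(); 0xE1 is the code point of the accented a.
a_acute = chr(0xE1)
print(a_acute == "\N{LATIN SMALL LETTER A WITH ACUTE}")  # True
```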

## 6.4 String methods
### 6.4.1 The split and join methods
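Using '+' to join many strings is inefficient, because each concatenation creates a new string object; 'join' builds the result in a single call.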

In [30]:
print(" ".join(["Join", "puts", "spaces", "between", "elements"]))

Join puts spaces between elements
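
A rough sketch of the difference between the two approaches: conceptually, each '+' creates a new string, so building text in a loop copies the accumulated result again and again, while 'join' assembles it once. The word list is the one from the cell above; the other names are illustrative.

```python
words = ["Join", "puts", "spaces", "between", "elements"]

# Repeated concatenation: a new string is created on every iteration.
result_plus = ""
for word in words:
    if result_plus:
        result_plus += " "
    result_plus += word

# join: one call, one result string.
result_join = " ".join(words)

print(result_plus == result_join)  # True
```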


'split' splits on any whitespace by default. You can tell it to split on a particular sequence by passing in
an optional argument:

In [36]:
x = "You\t\t can have tabs\t\n \t and newlines \n\n "
print(x.split())
x = "Mississippi"
x.split("ss")

['You', 'can', 'have', 'tabs', 'and', 'newlines']


['Mi', 'i', 'ippi']

In [45]:
help('str.split')
x = 'a b c d'
print(x.split(sep = ' ', maxsplit = 1))
print(x.split(sep = ' ', maxsplit = 2))
print(x.split(sep = ' ', maxsplit = 10))

Help on method_descriptor in str:

str.split = split(self, /, sep=None, maxsplit=-1)
    Return a list of the words in the string, using sep as the delimiter string.
    
    sep
      The delimiter according which to split the string.
      None (the default value) means split according to any whitespace,
      and discard empty strings from the result.
    maxsplit
      Maximum number of splits to do.
      -1 (the default value) means no limit.

['a', 'b c d']
['a', 'b', 'c d']
['a', 'b', 'c', 'd']


In [47]:
x = "this is a test"
"-".join(x.split())

'this-is-a-test'

### 6.4.2 Converting strings to numbers
You can use 'int' and 'float' to convert strings to integers or floating-point numbers.
If they're passed a string that can't be interpreted as a number, a 'ValueError' is raised.
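
A short sketch of these conversions and of catching the 'ValueError'. The helper 'to_int_or_none' is hypothetical and exists only for this illustration.

```python
print(int("123"))      # 123
print(float("1.5e2"))  # 150.0
print(int("  42\n"))   # 42 (surrounding whitespace is ignored)
print(int("ff", 16))   # 255 (optional second argument gives the base)

def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None

print(to_int_or_none("12a"))  # None
print(to_int_or_none("3.0"))  # None, because int() rejects a string with a decimal point
```
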
### 6.4.3 Getting rid of extra white space
The 'strip', 'lstrip', and 'rstrip' methods remove extra whitespace.

strip: removes whitespace at the beginning and the end of the string

lstrip and rstrip: remove whitespace only at the left or right end of the string, respectively

In [49]:
x = "   Hello,   world \t\t "
x.strip()
x.lstrip()
x.rstrip()

'   Hello,   world'

Check what constitutes 'whitespace':

In [50]:
import string
string.whitespace

' \t\n\r\x0b\x0c'

In [53]:
x = "www.python.org"
print(x.strip("w")) # strips off all ws at either end
print(x.strip("gor"))  # strips off all gs, os, and rs at either end
print(x.strip(".gorw")) # strips off all dots, gs, os, rs, and ws at either end

.python.org
www.python.
python


In [54]:
x = "(name, date),\n"
x.strip("\n)(,")

'name, date'

### 6.4.4 String searching
The 're' (regular expressions) module is more flexible but will be introduced later.

The four basic string-searching methods are find, rfind, index, and rindex.


In [56]:
help("str.find")
x = "Mississippi"
print(x.find("ss"))
print(x.find("zz"))

Help on method_descriptor in str:

str.find = find(...)
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.

2
-1
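
A brief sketch of the other three searching methods. 'rfind' searches from the right, and 'index' and 'rindex' behave like 'find' and 'rfind' except that they raise a 'ValueError' instead of returning -1. The flag 'index_failed' is only there for illustration.

```python
x = "Mississippi"

print(x.rfind("ss"))     # 5, the highest index where "ss" starts
print(x.find("ss", 3))   # 5, searching from index 3 onward
print(x.index("ss"))     # 2, same as find when the substring exists

# Unlike find, index raises ValueError when the substring is missing.
index_failed = False
try:
    x.index("zz")
except ValueError:
    index_failed = True
    print("'zz' not found")
```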


In [57]:
x.count("ss")

2

Two other methods, startswith and endswith, test whether a string begins or ends with a given substring:

In [59]:
print(x.startswith("Miss"))
print(x.startswith("Mist"))
print(x.endswith("pi"))
print(x.endswith("p"))

True
False
True
False


In [60]:
# quick check: a line ends with the string "rejected"
x = "xxx xxx xxxxxxx \v \n\t  xxxxx rejected"
x.endswith("rejected")

True

### 6.4.5 Modifying strings
*Strings are immutable, but string objects have several methods that operate on the string and return a new string that's a modified version of the original.*

In [62]:
x = "Mississippi"
print(x.replace("ss", "++++"))
print(x)


Mi++++i++++ippi
Mississippi


In [66]:
help("str.maketrans")
help("str.translate")

Help on built-in function maketrans in str:

str.maketrans = maketrans(x, y=None, z=None, /)
    Return a translation table usable for str.translate().
    
    If there is only one argument, it must be a dictionary mapping Unicode
    ordinals (integers) or characters to Unicode ordinals, strings or None.
    Character keys will be then converted to ordinals.
    If there are two arguments, they must be strings of equal length, and
    in the resulting dictionary, each character in x will be mapped to the
    character at the same position in y. If there is a third argument, it
    must be a string, whose characters will be mapped to None in the result.

Help on method_descriptor in str:

str.translate = translate(self, table, /)
    Replace each character in the string using the given translation table.
    
      table
        Translation table, which must be a mapping of Unicode ordinals to
        Unicode ordinals, strings, or None.
    
    The table must implement lookup/indexin

In [74]:
x = "~x ^ (y % z)"
table = x.maketrans("~^()", "!&[]")
print(table)
print(x.translate(table))

{126: 33, 94: 38, 40: 91, 41: 93}


!x & [y % z]
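
The help text for 'maketrans' also describes a three-argument form and a one-argument dictionary form. A small sketch of both, using the same input string; the replacement choices are made up for illustration.

```python
x = "~x ^ (y % z)"

# Two-argument mapping plus a third string of characters to delete.
table = str.maketrans("~^", "!&", "()")
print(x.translate(table))  # !x & y % z

# The one-argument form takes a dict; values may be longer strings or None.
table2 = str.maketrans({"~": "not ", "^": "xor", "%": None})
print(x.translate(table2))  # not x xor (y  z)
```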

### 6.4.6 Modifying strings with list manipulations
You can turn a string into a list of characters, do whatever you want with the list, and then turn it back into a string.

In [82]:
text = "Hello, World"
# remove everything after the comma, then reverse the text
wordList = list(text) 
print(wordList)
wordList.index(",")
wordList[6:] = []
wordList.reverse()
text = "".join(wordList) # join the text with no space between
print(text)

['H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd']
,olleH


### 6.4.7 Useful methods and constants
String objects have several useful methods that report various characteristics of the string.

In [84]:
x = "123"
print(x.isdigit())
print(x.isalpha())
x = 'M'
print(x.islower())
print(x.isupper())


True
False
False
True
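
The heading also mentions constants. A short sketch of a few constants defined in the standard 'string' module, which was already imported above to inspect 'string.whitespace'; the hex-digit check is only an illustrative use.

```python
import string

print(string.digits)            # 0123456789
print(string.ascii_lowercase)   # abcdefghijklmnopqrstuvwxyz
print(string.ascii_uppercase)   # ABCDEFGHIJKLMNOPQRSTUVWXYZ

# Example use: check that every character of a string is a hex digit.
candidate = "6D"
is_hex = all(ch in string.hexdigits for ch in candidate)
print(is_hex)  # True
```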


In [104]:
# What code would you use on each element to remove just the double quotes?
x = ['"abc"', 'def', '"ghi"', '"klm"', 'nop']
y = []
for element in x:
    if element.startswith("\"") and element.endswith("\""):
        y.append(element.replace("\"", ""))
    else:
        y.append(element)

print(y)

['abc', 'def', 'ghi', 'klm', 'nop']


In [107]:
# What code could you use to find the position of the *last* p in Mississippi?
x = "Mississippi"
x.rfind("p")

9

In [115]:
# When you’ve found that position, what code would you use to remove just that letter?
y = list(x)
del y[x.rfind("p")]
text = "".join(y)
print(text)

Mississipi
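
An alternative sketch that uses slicing instead of a list. It also guards against 'rfind' returning -1, since deleting index -1 from a list would silently remove the last character.

```python
x = "Mississippi"
pos = x.rfind("p")  # 9

# Keep everything before and after the character at pos.
if pos != -1:
    text = x[:pos] + x[pos + 1:]
else:
    text = x  # no "p" present, so nothing to remove
print(text)  # Mississipi
```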


## 6.5 Converting from objects to strings
Almost everything in Python can be converted to a string by using the 'repr' function.

In [118]:
repr([1, 2, 3])
x = [1]
x.append(2)
x.append([3, 4])
print('this list x is ' + repr(x))

this list x is [1, 2, [3, 4]]


In [127]:
print(repr(len))
help(repr)
str(len)
str(x)

<built-in function len>
Help on built-in function repr in module builtins:

repr(obj, /)
    Return the canonical string representation of the object.
    
    For many object types, including most builtins, eval(repr(obj)) == obj.



'[1, 2, [3, 4]]'
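
A small sketch of how 'repr' and 'str' differ. For strings, 'repr' shows quotes and escape sequences, while 'str' returns the readable text; for lists, the two give the same result. The sample values are illustrative.

```python
s = "it's\n"

# str() gives the readable form; repr() shows quotes and escapes.
print(str(s) == s)  # True
print(repr(s))      # "it's\n"

# For many built-in types, eval(repr(obj)) reconstructs an equal object.
x = [1, 2, [3, 4]]
round_trip = eval(repr(x))
print(round_trip == x)  # True

# For lists, str() and repr() produce the same text.
print(str(x) == repr(x))  # True
```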

## 6.6 Using the format method
### 6.6.1 The format method and positional parameters

In [131]:
"{} is the {} of {}".format("Ambrosia", "food", "the gods")

'Ambrosia is the food of the gods'

### 6.6.2 The format method and named parameters
The 'format' method also recognizes named parameters and replacement fields:

You can also use both positional and named parameters.

In [134]:
"{food} is the food of {user}".format(food = "Ambrosia", user = "the gods")
"{0} is the food of {user[1]}".format("Ambrosia", user = ["men", "the gods", "others"])

'Ambrosia is the food of the gods'

### 6.6.3 Format specifiers

In [141]:
help(format)
print("{0:10} is the food of gods".format("Ambrosia"))
print("{0:{1}} is the food of gods".format("Ambrosia", 10))
print("{food:{width}} is the food of gods".format(food = "Ambrosia", width = 10))
print("{0:>10} is the food of gods".format("Ambrosia")) # forces right-justification of the field and pads with space
print("{0:&>10} is the food of gods".format("Ambrosia")) # :&>10 forces right-justification and pads with & instead of spaces


Help on built-in function format in module builtins:

format(value, format_spec='', /)
    Return value.__format__(format_spec)
    
    format_spec defaults to the empty string.
    See the Format Specification Mini-Language section of help('FORMATTING') for
    details.

Ambrosia   is the food of gods
Ambrosia   is the food of gods
Ambrosia   is the food of gods
  Ambrosia is the food of gods
&&Ambrosia is the food of gods


In [151]:
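x = "{1:{0}}".format(3, 4) # two spaces and 4 (the width, 3, comes from argument 0)
print(x)
x = "{0:$>5}".format(3) # $$$$3 (right-justified in width 5, padded with $)
print(x)
x = "{a:{b}}".format(a=1, b=5) # four spaces and 1
print(x)
x = "{a:{b}}:{0:$>5}".format(3, 4, a=1, b=5, c=10) # both fields joined by ":"; the unused arguments 4 and c are ignored
print(x)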
print(x)

  4
$$$$3
    1
    1:$$$$3


## 6.8 String interpolation
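
A minimal sketch, assuming this heading refers to formatted string literals (f-strings, available since Python 3.6). They evaluate expressions inside braces and accept the same format specifiers as 'format'. The variable names reuse the earlier examples.

```python
food = "Ambrosia"
user = ["men", "the gods", "others"]

# Expressions inside {} are evaluated in the current scope.
line = f"{food} is the food of {user[1]}"
print(line)  # Ambrosia is the food of the gods

# Format specifiers follow a colon, as with str.format.
padded = f"{food:&>10}|{3.14159:.2f}"
print(padded)  # &&Ambrosia|3.14
```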

## 6.9 Bytes

In [156]:
unicode_a_with_acute = '\N{LATIN SMALL LETTER A WITH ACUTE}'
xb = unicode_a_with_acute.encode()
print(xb)
type(xb)
print(xb.decode())

b'\xc3\xa1'
á


QUICK CHECK: BYTES

For which of the following kinds of data would you want to use a string? For which could you use bytes?

- Data file storing binary data: bytes
- Text in a language with accented characters: string
- Text with only uppercase and lowercase roman characters: string
- A series of integers no larger than 255: bytes
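
A string holds Unicode text, while a bytes object holds raw values from 0 to 255. The sketch below illustrates the difference by encoding one accented character in two encodings and by building bytes from integers; the sample data is made up.

```python
text = "\N{LATIN SMALL LETTER A WITH ACUTE}"

# One character, but two bytes once encoded as UTF-8.
utf8 = text.encode("utf-8")
print(len(text), len(utf8))  # 1 2

# The same character is a single byte in Latin-1.
latin1 = text.encode("latin-1")
print(latin1)  # b'\xe1'

# A series of integers in range(256) maps directly onto a bytes object.
data = bytes([72, 101, 108, 108, 111])
print(data)                  # b'Hello'
print(list(data))            # [72, 101, 108, 108, 111]
print(data.decode("ascii"))  # Hello
```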