# Strings 

- Textual data in Python is handled with `str` objects, more commonly known as strings.
- They are immutable sequences of Unicode code points.
- To store text on disk or send it over a network, you encode it to bytes, using an encoding appropriate for the medium.
- String literals are written using single quotes, double quotes, or triple quotes (three single or three double quotes).

Four ways to define a string:

In [None]:
str1 = 'This is a string. We built it with single quotes.'

In [None]:
str2 = "This is also a string, but built with double quotes."

In [None]:
str3 = '''This is built using triple quotes,
so it can span multiple lines.'''

In [None]:
str4 = """This too 
is a multiline one, 
built with triple double-quotes."""

In [None]:
str4

In [None]:
print(str4)

In [None]:
len(str4)
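As a small sketch of the quoting rules described at the top of this section (the variable names are illustrative and not part of the notebook), the quoting style does not change the resulting object. Triple quotes only allow the literal to contain real newline characters, which count toward the length:

```python
# The same text written with three different quoting styles.
single = 'spam'
double = "spam"
triple = '''spam'''
print(single == double == triple)  # True

# A triple-quoted literal keeps the line break as a '\n' character.
multiline = """first
second"""
print(repr(multiline))  # 'first\nsecond'
print(len(multiline))   # 12 (5 + 1 newline + 6)
```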

As these are instances of the `str` class, they have associated methods and attributes

In [None]:
dir(str4)

which can be called in the usual object-oriented way, with dot notation

In [None]:
str1.lower()

In [None]:
str1.upper()

In [None]:
str1.title()

In [None]:
str1.split()
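Since strings are immutable, these methods return new objects and leave the original untouched. A minimal sketch, using an illustrative variable `original` rather than the notebook's `str1`:

```python
original = 'This is a string.'

shouted = original.upper()   # returns a new string
print(shouted)               # THIS IS A STRING.
print(original)              # This is a string.

words = original.split()     # split on whitespace, returns a list of strings
print(words)                 # ['This', 'is', 'a', 'string.']

# Joining the words with single spaces rebuilds this particular string.
print(' '.join(words) == original)  # True
```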

## Indexing and slicing

- When manipulating sequences, it's very common to have to access them at one precise position (indexing), or to get a subsequence out of them (slicing). 
- When dealing with immutable sequences, both operations are read-only.
- When you get a slice of a sequence, you can specify the start and stop positions, and the step: `my_sequence[start:stop:step]`. 

![indexing](images/indexing.png)


In [None]:
s = 'Are you suggesting that coconuts migrate?'

In [None]:
s[0] = 't'       # raises TypeError: strings are immutable
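Item assignment fails because strings are read-only. One common workaround, sketched here with an illustrative name `changed`, is to build a new string out of slices of the old one:

```python
s = 'Are you suggesting that coconuts migrate?'

try:
    s[0] = 't'
except TypeError as err:
    # str objects do not support item assignment
    print(type(err).__name__)  # TypeError

# Build a new string: a replacement character plus the rest of the old one.
changed = 't' + s[1:]
print(changed)  # tre you suggesting that coconuts migrate?
print(s)        # Are you suggesting that coconuts migrate?
```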

In [None]:
s[0]

In [None]:
s[5]

In [None]:
s[:4]

In [None]:
s[4:]

In [None]:
s[4:14]

In [None]:
list(zip(s, range(len(s))))  # pair each character with its index

In [None]:
s[4:14:3]        # slicing, start, stop and step (every 3 chars)

In [None]:
s

In [None]:
s[-1]            # indexing at last position

In [None]:
s[-5:]

In [None]:
s[:-5]

In [None]:
s[5:-5]

In [None]:
s[:]
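The rules behind the cells above can be summarized on a shorter string. This is a sketch with a made-up value `word`: negative positions count from the end, a slice that runs past the end is silently truncated, and a negative step walks backwards.

```python
word = 'coconut'
n = len(word)  # 7

# Negative indices are offsets from the end of the sequence.
print(word[-1] == word[n - 1])    # True
print(word[-3:] == word[n - 3:])  # True

# Splitting at any position and concatenating gives the original back.
print(word[:4] + word[4:] == word)  # True

print(word[::2])    # ccnt    (every second character)
print(word[::-1])   # tunococ (reversed)
print(word[2:100])  # conut   (out-of-range stop is truncated)

# Indexing, unlike slicing, does fail when out of range.
try:
    word[100]
except IndexError as err:
    print(type(err).__name__)  # IndexError
```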

In [None]:
r = s

In [None]:
id(s)

In [None]:
id(r)

In [None]:
s_copy = s[:5] + s[5:]
print(s_copy)
id(s_copy)

In [None]:
s_copy = s[:]
id(s_copy)

In [None]:
s_copy = 'Are you suggesting that coconuts migrate?'
id(s_copy)

In [None]:
import copy
t = copy.deepcopy(s)

In [None]:
id(t)
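The `id` values above are easier to interpret with `is` (same object) and `==` (same characters). The sketch below assumes CPython; whether a "copy" of an immutable string is a distinct object is an implementation detail, so only the equality results are stated as expected output:

```python
import copy

s = 'Are you suggesting that coconuts migrate?'
r = s                     # a second name bound to the same object
t = copy.deepcopy(s)      # copy of an immutable string
rebuilt = s[:5] + s[5:]   # string assembled from two slices

print(r is s)        # True: assignment never copies
print(t == s)        # True
print(rebuilt == s)  # True

# Identity of the copies is implementation-dependent; compare the ids yourself.
print(id(s), id(t), id(rebuilt))
```

Because strings cannot be modified in place, sharing one object between several names is harmless, which is why copying a string is rarely needed.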

## Encoding and decoding strings (optional)

- Using the `encode` and `decode` methods, we can encode Unicode strings into `bytes` objects and decode `bytes` objects back into strings.
- UTF-8 is a variable-length character encoding (one to four bytes per code point), capable of encoding all possible Unicode code points.
- Notice also that prefixing a string literal with `b` creates a `bytes` object.

In [None]:
s = "This is üñíção"            # unicode string: code points
s

In [None]:
type(s)

In [None]:
encoded_s = s.encode('utf-8')  # utf-8 encoded version of s
type(encoded_s)

In [None]:
encoded_s

In [None]:
encoded_s.decode('utf-8')
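To make "variable length" concrete, here is a small sketch that counts the bytes UTF-8 uses for a few sample characters. The samples and variable names are illustrative; the characters are written as escapes so the exact code points are unambiguous:

```python
samples = [
    'a',           # U+0061, ASCII
    '\u00fc',      # U+00FC, u with diaeresis
    '\u20ac',      # U+20AC, euro sign
    '\U0001F600',  # U+1F600, an emoji outside the Basic Multilingual Plane
]

# Each sample is one code point, but takes a different number of bytes.
widths = [len(ch.encode('utf-8')) for ch in samples]
print(widths)  # [1, 2, 3, 4]

# Same text as the cell above: 14 code points, 5 of them non-ASCII.
text = 'This is \u00fc\u00f1\u00ed\u00e7\u00e3o'
data = text.encode('utf-8')
print(len(text), len(data))         # 14 19
print(data.decode('utf-8') == text)  # True: the round trip is lossless
```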

In [None]:
b"This is \xc3\xbc\xc3\xb1\xc3\xad\xc3\xa7\xc3\xa3o"

In [None]:
b"This is \xc3\xbc\xc3\xb1\xc3\xad\xc3\xa7\xc3\xa3o".decode('utf-8')

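In [None]:
# Without the b prefix this is a str, which has no decode method in Python 3: raises AttributeError
"This is \xc3\xbc\xc3\xb1\xc3\xad\xc3\xa7\xc3\xa3o".decode('utf-8')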

# Exercises

[Go here...](exercises/02-exercises.ipynb)