# 01 - Sequence Types

Sequence types share the concept of a first element, a second element, and so on: the items are ordered by the natural numbers. In Python (and many other languages) indexing starts at `0`, not `1`.

So the first item has index `0`, the second item has index `1`, and so on.
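A minimal sketch of zero-based indexing, using a made-up list named `letters`:

```python
letters = ['a', 'b', 'c']

first = letters[0]                  # the first item lives at index 0
last = letters[len(letters) - 1]    # the last item lives at index len - 1

# Index len(letters) is one past the end, so it raises IndexError.
try:
    letters[len(letters)]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, last, out_of_range)
```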

Python has built-in mutable and immutable sequence types.

Strings, tuples, ranges and bytes are immutable - we can access but not modify the **content** of the **sequence**:

In [15]:
t = (1, 2, 3)

t[0] = 100

TypeError: 'tuple' object does not support item assignment

But of course, if the sequence contains mutable objects, then although we cannot modify the sequence of elements (cannot replace, delete or insert elements), we certainly **can** change the contents of the mutable objects:

In [17]:
t = ( [1, 2], 3, 4)

`t` is immutable, but its first element is a mutable object:

In [18]:
t[0][0] = 100

t

([100, 2], 3, 4)
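One way to convince ourselves that the tuple itself never changed is to compare the identity of its first element before and after the mutation. This is only an illustrative sketch; the variable names are made up:

```python
t = ([1, 2], 3, 4)
inner_id_before = id(t[0])

t[0].append(99)          # mutates the list in place; the tuple is untouched

same_object = id(t[0]) == inner_id_before
print(t, same_object)

try:
    t[0] = [1, 2, 99]    # replacing a slot of the tuple is not allowed
    replaced = True
except TypeError:
    replaced = False

print(replaced)
```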

Sequences can be **homogeneous** or **heterogeneous**. In a homogeneous sequence all elements are of the same type; for example, every character in a string has the same type. Lists can be heterogeneous: different elements may have different types. Homogeneous sequences are generally more efficient storage-wise.
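As a rough illustration of the storage point, we can compare a list of small integers with a `bytes` object holding the same values. This sketch assumes CPython; the exact numbers depend on the interpreter and platform, and `sys.getsizeof` on a list only counts the references, not the integer objects they point to:

```python
import sys

values = list(range(256)) * 4        # a list: each slot is a reference to an int object
packed = bytes(values)               # homogeneous: every item is stored as a single byte

list_size = sys.getsizeof(values)    # the list object itself (references only)
bytes_size = sys.getsizeof(packed)   # the bytes object, including its data

print(len(values), len(packed))
print(list_size, bytes_size)
```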

#### Iterables

An **iterable** is just something that can be iterated over, for example using a `for` loop. It is a **container** type of object which means it contains other objects, and it becomes an iterable if we can list out the elements one by one. 

But iterables are more general; they are **not necessarily** a sequence type. For example a set `s = {1, 2, 3}` is an iterable because we can type `for e in s` but it's not a sequence because we can't type `s[0]`.

In [7]:
t = (10, 'a', 1+3j)

In [8]:
s = {10, 'a', 1+3j}

In [9]:
for c in t:
    print(c)

10
a
(1+3j)


In [10]:
for c in s:
    print(c)

a
10
(1+3j)


Note how we could iterate over both the tuple and the set. Iterating over the tuple preserved the **order** of its elements, but iterating over the set did not. Sets do not have an ordering of elements - they are iterable, but not sequences.
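The distinction can also be checked programmatically with the abstract base classes in `collections.abc`. A small sketch, reusing the tuple and set from above:

```python
from collections.abc import Iterable, Sequence

t = (10, 'a', 1+3j)
s = {10, 'a', 1+3j}

checks = {
    'tuple is iterable': isinstance(t, Iterable),
    'tuple is sequence': isinstance(t, Sequence),
    'set is iterable': isinstance(s, Iterable),
    'set is sequence': isinstance(s, Sequence),
}
for label, result in checks.items():
    print(label, result)
```

Only the last check comes back `False`: the set can be iterated, but it is not a sequence.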

Most sequence types support the `in` and `not in` operations. Ranges do too: for integers, Python 3.2+ computes the answer arithmetically in constant time instead of scanning, whereas lists and tuples are searched item by item.

In [11]:
'a' in ['a', 'b', 100]

True

In [12]:
100 in range(200)

True
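A few more membership tests, sketched with a made-up range named `evens`. Note that for strings, `in` checks for a substring rather than a single element:

```python
evens = range(0, 20, 2)

print(8 in evens)        # True: 8 is produced by the range
print(7 in evens)        # False: the step skips odd numbers
print(20 not in evens)   # True: the stop value is excluded
print('py' in 'python')  # True: for strings, `in` tests for a substring
```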

#### Min, Max and Length

Sequences also generally support the `len` function, which returns the number of items in the collection. Some non-sequence iterables support it too.

In [13]:
len('python'), len([1, 2, 3]), len({10, 20, 30}), len({'a': 1, 'b': 2})

(6, 3, 3, 2)
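Not every iterable has a length, though. A generator is one example; the sketch below uses a made-up generator named `squares`:

```python
squares = (n * n for n in range(5))   # a generator: iterable, but has no length

try:
    len(squares)
    has_len = True
except TypeError:
    has_len = False

count = sum(1 for _ in squares)       # counting the items consumes the generator
print(has_len, count)
```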

Sequences (and even some iterables) may support `max` and `min` as long as the data types in the collection can be **ordered** in some sense (`<` or `>`).

In [14]:
a = [100, 300, 200]
min(a), max(a)

(100, 300)

In [15]:
s = 'python'
min(s), max(s)

('h', 'y')

In [16]:
s = {'p', 'y', 't', 'h', 'o', 'n'}
min(s), max(s)

('h', 'y')

But if the elements do not have an ordering defined:

In [17]:
a = [1+1j, 2+2j, 3+3j]
min(a)

TypeError: '<' not supported between instances of 'complex' and 'complex'

`min` and `max` will work for heterogeneous types as long as the elements are pairwise comparable (`<` or `>` is defined). 

For example:

In [18]:
from decimal import Decimal

In [19]:
t = 10, 20.5, Decimal('30.5')

In [20]:
min(t), max(t)

(10, Decimal('30.5'))

In [21]:
t = ['a', 10, 1000]
min(t)

TypeError: '<' not supported between instances of 'int' and 'str'

Even `range` objects support `min` and `max`:

In [22]:
r = range(10, 200)
min(r), max(r)

(10, 199)
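When the elements have no natural ordering, as with the complex numbers above, `min` and `max` accept a `key` function that tells them what to compare instead. Comparing by magnitude with `abs` is just one possible choice, picked here for illustration:

```python
a = [3+3j, 1+1j, 2+2j]

smallest = min(a, key=abs)   # compare by magnitude instead of using `<` directly
largest = max(a, key=abs)

print(smallest, largest)
```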

#### Concatenation

We can **concatenate** sequences using the `+` operator:

In [25]:
[1, 2, 3] + [4, 5, 6]

[1, 2, 3, 4, 5, 6]

In [26]:
(1, 2, 3) + (4, 5, 6)

(1, 2, 3, 4, 5, 6)

What happens if we concatenate sequences that contain mutable objects? Below, `[0, 0]` is a mutable list.

In [27]:
x = [ [0, 0] ]

a = x + x
a

[[0, 0], [0, 0]]

It looks like it worked well, but let's look at the memory addresses of all three `[0, 0]` lists involved.

In [28]:
id(x[0]) == id(a[0]) == id(a[1])

True

**All** of them have the exact same memory address. When Python evaluated `x + x`, it did not copy the inner list; it copied the reference to it. All three refer to the same object, which now has multiple references pointing to it. So what happens if we modify this single object?

In [29]:
a[0][0] = 100

a

[[100, 0], [100, 0]]

Since the first and second elements are the same object, both appear modified. Even `x` is modified.

In [30]:
x

[[100, 0]]

This is not a problem with strings or integers: the references are still shared, but immutable objects cannot be modified in place.

In [31]:
x = [1, 2]
a = x + x

y = 'python'
b = y + y

print(a)
print(b)

[1, 2, 1, 2]
pythonpython


Note that the type of the concatenated result is the same as the type of the sequences being concatenated, so concatenating sequences of varying types will not work:

In [25]:
(1, 2, 3) + [4, 5, 6]

TypeError: can only concatenate tuple (not "list") to tuple

In [26]:
'abc' + ['d', 'e', 'f']

TypeError: must be str, not list

Note: if you really want to concatenate varying types you'll have to transform them to a common type first:

In [27]:
(1, 2, 3) + tuple([4, 5, 6])

(1, 2, 3, 4, 5, 6)

In [28]:
tuple('abc') + ('d', 'e', 'f')

('a', 'b', 'c', 'd', 'e', 'f')

In [29]:
''.join(tuple('abc') + ('d', 'e', 'f'))

'abcdef'
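An alternative to converting each operand by hand is `itertools.chain`, which accepts any iterables and lets us pick the result type at the end. Unpacking with `*` inside a display does the same job. This is a sketch of two options, not the only way to do it:

```python
from itertools import chain

parts = chain((1, 2, 3), [4, 5, 6], 'ab')   # chain accepts any mix of iterables
combined = tuple(parts)                     # choose the result type explicitly

as_list = [*(1, 2, 3), *[4, 5, 6]]          # unpacking into a new list

print(combined)
print(as_list)
```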

#### Repetition

Most sequence types also support **repetition**, which is essentially concatenating the same sequence an integer number of times:

In [30]:
'abc' * 5

'abcabcabcabcabc'

In [31]:
[1, 2, 3] * 5

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

We'll come back to some caveats of concatenation and repetition in a bit.

We see the same issue of duplicate objects with repetition as we saw with concatenation:

In [32]:
a = [ [0, 0] ] * 2 
print(a)
id(a[0]) == id(a[1])

[[0, 0], [0, 0]]


True

In [33]:
a[0][0] = 100 
print(a)

[[100, 0], [100, 0]]


But, again, we're safe if we are repeating an immutable object such as a string.

In [36]:
a = ['python'] * 2
print(a)

a[0][0] = 'P'
print(a)

['python', 'python']


TypeError: 'str' object does not support item assignment

So the key takeaway is to be careful using mutable elements inside your sequence.
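A common place where this bites is building a grid of rows. The sketch below contrasts repetition of the outer list with a list comprehension that creates a fresh inner list for each row; the names `shared` and `independent` are made up for the example:

```python
rows, cols = 2, 3

shared = [[0] * cols] * rows                      # one inner list, referenced twice
independent = [[0] * cols for _ in range(rows)]   # a new inner list per row

shared[0][0] = 100
independent[0][0] = 100

print(shared)        # both rows change
print(independent)   # only the first row changes
```

Repeating `[0] * cols` is safe because integers are immutable; it is the repetition of the outer list that shares the inner one.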

#### Finding things in Sequences

We can find the index of the first occurrence of an element in a sequence with the `index` method. It takes one required argument and two optional ones.

`s.index(x, i, j)` returns the index of the first occurrence of `x` in `s` at or after index `i` and before index `j`. The last two arguments can be left out.

In [32]:
s = "gnu's not unix"

In [33]:
s.index('n')

1

In [34]:
s.index('n', 1), s.index('n', 2), s.index('n', 8)

(1, 6, 11)

An exception is raised if the element is not found, so you'll want to catch it if you don't want your app to crash:

In [35]:
s.index('n', 13)

ValueError: substring not found

In [36]:
try:
    idx = s.index('n', 13)
except ValueError:
    print('not found')

not found
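Combining the start argument with the `ValueError` handling gives us a way to collect every occurrence. `find_all` is a hypothetical helper written for this example, not a built-in; it assumes the sequence's `index` method accepts a start position, which is true for lists, tuples and strings (`range.index` only takes the value):

```python
def find_all(seq, value):
    """Return every index at which value occurs in seq (hypothetical helper)."""
    positions = []
    start = 0
    while True:
        try:
            idx = seq.index(value, start)
        except ValueError:
            break                 # no further occurrence at or after start
        positions.append(idx)
        start = idx + 1           # continue searching just past the match
    return positions

s = "gnu's not unix"
print(find_all(s, 'n'))
print(find_all([1, 2, 1, 3, 1], 1))
```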


Note that these methods of finding objects in sequences do not assume that the objects in the sequence are ordered in any way. These are basically searches that iterate over the sequence until they find (or not) the requested element.

If you have a sorted sequence, then other search techniques are available - such as binary searches. I'll cover some of these topics in the extras section of this course.
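As a small preview, the standard library's `bisect` module performs binary searches on sorted sequences. `sorted_index` below is a hypothetical helper for illustration; it assumes the sequence is sorted in ascending order and returns `-1` rather than raising when the value is absent:

```python
from bisect import bisect_left

def sorted_index(sorted_seq, value):
    """Return the index of value in an ascending sorted sequence, or -1 if absent."""
    pos = bisect_left(sorted_seq, value)   # leftmost position where value could be inserted
    if pos < len(sorted_seq) and sorted_seq[pos] == value:
        return pos
    return -1

data = [2, 3, 5, 7, 11, 13]
print(sorted_index(data, 7), sorted_index(data, 8))
```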

#### Slicing

We'll come back to slicing in a later lecture, but sequence types generally support slicing, even ranges (as of Python 3.2). Just like concatenation, slices will return the same type as the sequence being sliced:

In [37]:
s = 'python'
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [38]:
s[0:3], s[4:6]

('pyt', 'on')

In [39]:
l[0:3], l[4:6]

([1, 2, 3], [5, 6])

It's ok for slice bounds to extend past the end of the sequence:

In [40]:
s[4:1000]

'on'

If your first argument in the slice is `0`, you can even omit it. Omitting the second argument means it will include all the remaining elements:

In [41]:
s[0:3], s[:3]

('pyt', 'pyt')

In [42]:
s[3:1000], s[3:], s[:]

('hon', 'hon', 'python')

We can even have extended slicing, which provides a start, stop and a step:

In [43]:
s, s[0:5], s[0:5:2]

('python', 'pytho', 'pto')

In [44]:
s, s[::2]

('python', 'pto')

Technically we can also use negative values in slices, including extended slices (more on that later):

In [45]:
s, s[-3:-1], s[::-1]

('python', 'ho', 'nohtyp')
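A negative index `i` counts from the end, so it refers to position `len(s) + i`. The sketch below checks that equivalence for the slices just shown:

```python
s = 'python'
n = len(s)

same_slice = s[-3:-1] == s[n - 3:n - 1]   # a negative index i refers to n + i
last_char = s[-1] == s[n - 1]
reversed_copy = s[::-1]                   # a step of -1 walks the sequence backwards

print(same_slice, last_char, reversed_copy)
```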

Range objects are more restrictive. They **don't** support concatenation or repetition. Integer membership tests (`in`, `not in`) are computed arithmetically, but `min` and `max` still iterate over every value. Since ranges are sequences, we can slice them, which returns another range object.

In [46]:
r = range(11)  # numbers from 0 to 10 (inclusive)

In [47]:
print(r)
print(list(r))

range(0, 11)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [48]:
print(r[:5])

range(0, 5)


In [49]:
print(list(r[:5]))

[0, 1, 2, 3, 4]


As you can see, slicing a range returns a range object as well, as expected.
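Extended slices work on ranges too, and the result is still a range rather than a list. A brief sketch:

```python
r = range(11)

every_other = r[::2]     # still a range; no list is built
backwards = r[::-1]

print(type(every_other).__name__, list(every_other))
print(type(backwards).__name__, list(backwards))
```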

#### Hashing

Immutable sequences generally support the `hash` function, which we'll discuss in detail in the section on mapping types. But immutable sequences are not hashable if they contain mutable types. There are good reasons why we don't want to hash mutable types, which we'll see later on.

In [50]:
l = (1, 2, 3)
hash(l)

2528502973977326415

In [51]:
s = '123'
hash(s)

-1892188276802162953

In [52]:
r = range(10)
hash(r)

-6299899980521991026

But mutable sequences (and mutable types in general) do not:

In [53]:
l = [1, 2, 3]

In [54]:
hash(l)

TypeError: unhashable type: 'list'

Note also that a hashable sequence is no longer hashable if one (or more) of its elements is not hashable:

In [55]:
t = (1, 2, [10, 20])
hash(t)

TypeError: unhashable type: 'list'

But this would work:

In [56]:
t = ('python', (1, 2, 3))
hash(t)

-8790163410081325536

In general, immutable types are likely hashable, while mutable types are not. So numbers, strings, tuples, etc. are hashable, but lists and sets are not:

In [57]:
from decimal import Decimal
d = Decimal(10.5)
hash(d)

1152921504606846986

Sets are not hashable:

In [58]:
s = {1, 2, 3}
hash(s)

TypeError: unhashable type: 'set'

But frozensets, an immutable variant of the set, are:

In [59]:
s = frozenset({1, 2, 3})

In [60]:
hash(s)

-7699079583225461316
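To summarise the cases above, we can test hashability directly. `is_hashable` is a hypothetical helper written for this sketch. The specific hash integers shown in this section will generally differ on your machine; string hashes, in particular, are randomised per interpreter run by default:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

candidates = [(1, 2, 3), 'abc', range(10), frozenset({1, 2}), [1, 2], {1, 2}, (1, [2])]
results = [is_hashable(c) for c in candidates]
print(results)

# Hashable objects can serve as dictionary keys.
lookup = {(0, 0): 'origin', frozenset({1, 2}): 'pair'}
print(lookup[(0, 0)])
```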

#### Caveats with Concatenation and Repetition

Consider this:

In [61]:
x = [2000]

In [62]:
id(x[0])

2177520743920

In [63]:
l = x + x

In [64]:
l

[2000, 2000]

In [65]:
id(l[0]), id(l[1])

(2177520743920, 2177520743920)

As expected, the objects in `l[0]` and `l[1]` are the same.

We could also check identity with the `is` operator:

In [66]:
l[0] is l[1]

True

This is not a big deal if the objects being concatenated are immutable. But if they are mutable:

In [67]:
x = [ [0, 0] ]
l = x + x

In [68]:
l

[[0, 0], [0, 0]]

In [69]:
l[0] is l[1]

True

And then we have the following:

In [70]:
l[0][0] = 100

In [71]:
l[0]

[100, 0]

In [72]:
l

[[100, 0], [100, 0]]

Notice how changing the 1st item of the 1st element also changed the 1st item of the second element.

While this seems fairly obvious when concatenating using the `+` operator as we have just done, the same actually happens with repetition and may not seem so obvious:

In [73]:
x = [ [0, 0] ]

In [74]:
m = x * 3

In [75]:
m

[[0, 0], [0, 0], [0, 0]]

In [76]:
m[0][0] = 100

In [77]:
m

[[100, 0], [100, 0], [100, 0]]

And in fact, even `x` changed:

In [78]:
x

[[100, 0]]

If you really want these repeated objects to be different objects, you'll have to copy them somehow. A simple list comprehension would work well here:

In [79]:
x = [ [0, 0] ]
m = [e.copy() for e in x*3]

In [80]:
m

[[0, 0], [0, 0], [0, 0]]

In [81]:
m[0][0] = 100

In [82]:
m

[[100, 0], [0, 0], [0, 0]]

The original `x` is left untouched, since only the copies were modified.
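Note that `list.copy()` is a shallow copy. If the repeated elements themselves contained further mutable objects, `copy.deepcopy` would be one way to get fully independent copies. A sketch under that assumption:

```python
from copy import deepcopy

x = [[0, 0]]
m = [deepcopy(x[0]) for _ in range(3)]   # three independent copies of the inner list

m[0][0] = 100
print(m)   # only the first inner list changes
print(x)   # the original is untouched
```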

# 02 - Mutable Sequence Types

# 03 - Lists vs Tuples

# 04 - Copying Sequences

# 05 - Slicing

# 06 - Custom Sequences - Part 1

# 07 - In-Place Concatenation and Repetition

# 08 - Assignments in Mutable Sequences

# 09 - Custom Sequences - Part 2a

# 10 - Custom Sequences - Part 2b_c

# 11 - Sorting Sequences

# 12 - List Comprehensions