# Ch2 An Array of Sequences

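## 2.1 Overview of Built-In Sequences

Container sequences
- `list`, `tuple`, `collections.deque` can hold items of different types

Flat sequences
- `str`, `bytes`, `bytearray`, `memoryview`, `array.array` hold items of a single type

A container sequence stores references to the objects it contains, whereas a flat sequence stores the values themselves in its own contiguous block of memory.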
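
The difference between the two families can be seen in a short sketch. The variable names and sample values are made up for illustration; `'d'` is the `array` typecode for a C double.

```python
import array

# A container sequence holds references, so items of any type can be mixed.
mixed = [1, 'two', 3.0, (4, 5)]

# A flat sequence packs raw values of one type into contiguous memory.
floats = array.array('d', [1.0, 2.5, 3.0])

try:
    floats.append('four')   # a str cannot be packed as a double
    rejected = False
except TypeError:
    rejected = True

print(mixed)
print(floats, floats.itemsize, 'bytes per item')
print('str rejected by array:', rejected)
```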

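Mutable sequences
- `list`, `bytearray`, `array.array`, `collections.deque`, `memoryview`

Immutable sequences
- `tuple`, `str`, `bytes`

Mutable sequences inherit the methods of immutable sequences and add methods of their own.

The inheritance tree is as follows: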

`Container` class
- `__contains__`

`Iterable` class
- `__iter__`

`Sized` class
- `__len__`

`Sequence` class extends `Container`, `Iterable`, `Sized`
- `__getitem__`
- `__contains__`
- `__iter__`
- `__reversed__`
- `index`
- `count`

`MutableSequence` class extends `Sequence`
- `__setitem__`
- `__delitem__`
- `insert`
- `append`
- `reverse`
- `extend`
- `pop`
- `remove`
- `__iadd__`
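
The tree above corresponds to the abstract base classes in `collections.abc`, so it can be checked with `issubclass`. This is only a quick sketch; the set of built-in types tested is an arbitrary choice.

```python
from collections import abc, deque

# Mutable built-ins count as MutableSequence; immutable ones only as Sequence.
for cls in (list, bytearray, deque, tuple, str):
    is_seq = issubclass(cls, abc.Sequence)
    is_mutable = issubclass(cls, abc.MutableSequence)
    print(f'{cls.__name__:10} Sequence={is_seq!s:5} MutableSequence={is_mutable}')

# MutableSequence itself derives from Sequence, so it inherits its methods.
inherits = issubclass(abc.MutableSequence, abc.Sequence)
print(inherits)
```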


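## 2.2 List Comprehensions and Generator Expressions
As a rule of thumb, use a list comprehension only to build a new list, and keep it short. If it runs past two lines, consider rewriting it as a `for` loop.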

In [4]:
symbols = '$¢£¥€¤'
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
print(beyond_ascii)

[162, 163, 165, 8364, 164]


In [5]:
# or using map/filter
beyond_ascii = list(filter(lambda c : c > 127, map(ord, symbols)))
print(beyond_ascii)

[162, 163, 165, 8364, 164]


Comparison of speed can be found in *listcomp_speed.py*
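
The script itself is not reproduced in these notes. As a rough stand-in (not the contents of *listcomp_speed.py*), a timing comparison could be sketched with `timeit`; the function names and the repeat counts below are assumptions chosen for illustration.

```python
import timeit

symbols = '$¢£¥€¤'

def with_listcomp():
    return [ord(s) for s in symbols if ord(s) > 127]

def with_filter_map():
    return list(filter(lambda c: c > 127, map(ord, symbols)))

# Both spellings must agree before timing them means anything.
assert with_listcomp() == with_filter_map()

for func in (with_listcomp, with_filter_map):
    # Best of 3 runs, 10_000 calls each; absolute numbers vary by machine.
    best = min(timeit.repeat(func, repeat=3, number=10_000))
    print(f'{func.__name__:16} {best:.4f} s')
```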

### 2.2.3 Cartesian product

In [6]:
colors = ['black', 'white'] 
sizes = ['S', 'M', 'L'] 
tshirts = [(color, size) for color in colors for size in sizes]
print(tshirts)

[('black', 'S'), ('black', 'M'), ('black', 'L'), ('white', 'S'), ('white', 'M'), ('white', 'L')]


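### 2.2.4 Generator expressions
Generator expressions are lazy: they produce a value only when it is asked for, which saves memory. The syntax is almost the same as a list comprehension, with parentheses in place of the square brackets.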

In [7]:
tuple(ord(symbol) for symbol in symbols)

(36, 162, 163, 165, 8364, 164)

In [8]:
import array
arr = array.array('I', (ord(symbol) for symbol in symbols))
print(arr)


array('I', [36, 162, 163, 165, 8364, 164])


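1. If a generator expression is the only argument in a function call, it does not need its own pair of parentheses.
2. The `array` constructor is given two arguments here: the first is a typecode that specifies how the numbers are stored, the second is an iterable.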

In [9]:
colors = ['black','white']
sizes = ['S','M','L']
for tshirt in ('%s %s' % (c,s) for c in colors for s in sizes):
    print(tshirt)

black S
black M
black L
white S
white M
white L
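
The laziness described in this section can be observed directly. The sketch below compares the size of a list with the size of an equivalent generator object; the element count is arbitrary, and the exact byte counts depend on the Python build.

```python
import sys

squares_list = [n * n for n in range(100_000)]   # every item is built up front
squares_gen = (n * n for n in range(100_000))    # nothing is computed yet

print(sys.getsizeof(squares_list))   # grows with the number of items
print(sys.getsizeof(squares_gen))    # small, independent of the range size

first = next(squares_gen)    # computes only the first value
second = next(squares_gen)   # resumes where the generator left off
print(first, second)
```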


## 2.3 Tuples Are Not Just Immutable Lists
Besides serving as immutable lists, tuples can also be used as records with no field names.

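### 2.3.1 Tuples as records

If you think of a tuple only as an immutable list, the other information it carries (how many items it holds and where each one sits) seems optional. Once you treat a tuple as a collection of fields, however, **the number of items and their positions** become essential.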
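
A small sketch of tuples used as records follows. The sample values are invented for illustration; what matters is that each position carries a fixed meaning.

```python
# One record: (city, year, population in thousands, area in km2).
city, year, population, area = ('Tokyo', 2003, 32450, 8014)

# A list of records: (country_code, passport_number).
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDA205856')]

for country, passport in sorted(traveler_ids):
    print(f'{country}/{passport}')

# Position 0 is always the country code, so it can be picked out reliably.
countries = [country for country, _ in traveler_ids]
print(countries)
```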

### 2.3.2 Tuple unpacking
See also the notes in the Python Techniques series.


In [10]:
# * unpacks an iterable
t = (20,8)
print(divmod(*t))

(2, 4)


Parallel assignment, with `*` capturing the excess items:

In [11]:
a, *body, c, d = range(5)
print(a,body,c,d)

0 [1, 2] 3 4


### 2.3.3 Nested tuple unpacking
The tuple that receives the unpacked expression can itself be nested, e.g. `(a, b, (c, d))`.

In [12]:
# metro_lat_long.py
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),   # <1>
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
    ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]

print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))
fmt = '{:15} | {:9.4f} | {:9.4f}'
for name, cc, pop, (latitude, longitude) in metro_areas:  # <2>
    if longitude <= 0:  # <3>
        print(fmt.format(name, latitude, longitude))

                |   lat.    |   long.  
Mexico City     |   19.4333 |  -99.1333
New York-Newark |   40.8086 |  -74.0204
Sao Paulo       |  -23.5478 |  -46.6358


### 2.3.4 Named tuples (namedtuple)
`collections.namedtuple` is a **factory function** that builds a tuple subclass with field names and a class name, and that class name is a great help when debugging.

*Aside: factory functions*

In programming, a factory function is a concept used primarily in object-oriented programming. It refers to a function that is designed to create and return new instances of objects. Unlike constructors that are associated with a specific class and are used to create instances of that class, factory functions can be more flexible. They can create objects from multiple classes based on the parameters passed to them or based on specific conditions.

Factory functions are useful for several reasons:
1. **Abstraction and Encapsulation**: They can hide the complexity of creating instances of complex objects, making the code that uses these objects simpler and cleaner.
2. **Flexibility**: Since factory functions are not tied to specific classes, they can return instances of different classes. This makes it easier to introduce new types of objects without changing the code that uses the factory function.
3. **Customization**: Parameters passed to a factory function can dictate the customization of the created object, allowing for a more dynamic object creation process.

Here's a simple example in JavaScript to illustrate a factory function:

```javascript
function carFactory(model, year) {
    return {
        model: model,
        year: year,
        displayInfo: function() {
            console.log(`Model: ${this.model}, Year: ${this.year}`);
        }
    };
}

const car1 = carFactory('Toyota', 2020);
car1.displayInfo(); // Output: Model: Toyota, Year: 2020
```

In this example, `carFactory` is a factory function that creates and returns a new car object each time it is called. The created car object includes properties for `model` and `year`, as well as a method `displayInfo` to display the car's information. This approach allows for the creation of car objects with different properties without the need for a specific class for each car.

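Note from the book:
Instances of a class built with `namedtuple` take exactly as much memory as tuples, because the field names are stored in the class. They are also smaller than regular object instances, because Python does not keep their attributes in a per-instance `__dict__`.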

In [13]:
from collections import namedtuple
City = namedtuple('City','name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

In [14]:
print(tokyo)
print(tokyo.name)
print(tokyo.population)

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
Tokyo
36.933


*code comment:*
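1. Creating a `namedtuple` takes two arguments: a class name and the names of the fields. The latter can be given as **an iterable of strings** or as **a single string of space-delimited field names**.
2. The field values must be passed to the constructor as separate positional arguments (the `tuple` constructor, by contrast, takes a single iterable).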
3. ...
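
The book's note on memory, and the "iterable of strings" form of the field names, can both be probed with a small sketch. `Point` is a hypothetical record type used only here, and the sizes printed depend on the interpreter.

```python
import sys
from collections import namedtuple

# Field names given as an iterable of strings instead of one spaced string.
Point = namedtuple('Point', ['x', 'y'])
p = Point(1.5, 2.5)

print(type(Point))                # the factory returned a class
print(issubclass(Point, tuple))   # that class is a tuple subclass
print(Point.__slots__)            # empty slots, so no per-instance __dict__

# Compare the instance with a plain tuple holding the same items.
print(sys.getsizeof(p), sys.getsizeof((1.5, 2.5)))
```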

Named tuples also have a few attributes and methods of their own:
- `_fields` class attribute
- `_make(iterable)` class method
- `_asdict()` instance method

In [15]:
City._fields

('name', 'country', 'population', 'coordinates')

In [16]:
LatLong = namedtuple('LatLong', 'lat long') 
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889)) 
delhi = City._make(delhi_data)   
print(delhi._asdict()) 

{'name': 'Delhi NCR', 'country': 'IN', 'population': 21.935, 'coordinates': LatLong(lat=28.613889, long=77.208889)}


In [17]:
for key, value in delhi._asdict().items(): 
    print(key + ':', value) 

name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)


*code comment:*
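1. `_fields` is a tuple holding all the field names of the class.
2. `_make()` builds an instance of the class from an iterable; it has the same effect as `City(*delhi_data)`.
3. `_asdict()` returns the named tuple's fields as a dictionary, which makes it easy to display the record clearly. The book describes a `collections.OrderedDict`; since Python 3.8 it is a plain `dict`, as the output above shows.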

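### 2.3.5 Tuples as immutable lists

Tuples support all list methods except those that add, remove, or otherwise change items in place. The one other exception is `__reversed__`, which tuples lack (the book notes that this method is only an optimization); `reversed()` still works on tuples.

Note: the `__reversed__` method returns an iterator over the items in reverse order.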

In [3]:
a = (1,2,3)
a = reversed(a)
print(a)
print(tuple(a))

<reversed object at 0x00000291885FE9B0>
(3, 2, 1)


In [5]:
a = [1,2,3]
a = reversed(a)
print(a)
print(list(a))

<list_reverseiterator object at 0x00000291885FF9A0>
[3, 2, 1]
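
The two different iterator types above hint at what is going on: `list` provides its own `__reversed__`, while for a tuple `reversed()` falls back on `__len__` and `__getitem__`. A minimal sketch, using a hypothetical `Countdown` class that defines only those two methods:

```python
print(hasattr(list, '__reversed__'))    # True
print(hasattr(tuple, '__reversed__'))   # False

class Countdown:
    """Hypothetical minimal sequence: only __len__ and __getitem__."""

    def __len__(self):
        return 3

    def __getitem__(self, index):
        if not 0 <= index < 3:
            raise IndexError(index)
        return 3 - index   # items are 3, 2, 1

# reversed() works without __reversed__ by indexing from the end.
print(list(reversed(Countdown())))   # [1, 2, 3]
```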


## 2.4 Slicing
The `a:b:c` notation is only valid inside `[]`, where it produces a slice object: `slice(a, b, c)`. To evaluate `seq[start:stop:step]`, Python calls `seq.__getitem__(slice(start, stop, step))`.

(may refer to a video about slicing <a href = "https://www.bilibili.com/video/BV1ki421v7Ao/?share_source=copy_web&vd_source=0f2805b5dbbfea4fe3a5cd27c708c741">【Python】Slice：被低估的小技巧，减少重复工作量 </a>)
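
To see the slice object that `__getitem__` receives, one option is a throwaway class that simply returns its key. `SliceProbe` is a hypothetical helper written for this sketch.

```python
class SliceProbe:
    """Hypothetical helper: returns whatever key the [] operator passes in."""

    def __getitem__(self, key):
        return key

probe = SliceProbe()
key = probe[1:10:2]
print(key)               # slice(1, 10, 2)

# slice.indices() normalizes the bounds for a sequence of a given length.
print(key.indices(5))    # (1, 5, 2)

# A slice object can be stored and reused as an index.
print('ABCDEFGHIJ'[key]) # BDFHJ
```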

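### 2.4.3 Multidimensional slicing and ellipsis
The `[]` operator can also take multiple indexes or slices separated by commas; the NumPy library makes use of this.

To handle this form of `[]`, an object's `__getitem__` and `__setitem__` special methods receive the indexes in `a[i, j]` as a tuple. In other words, to evaluate `a[i, j]`, Python calls `a.__getitem__((i, j))`.

The ellipsis, written `...`, is an alias for the `Ellipsis` object. In NumPy indexing it stands for as many full slices (`:`) as are needed to cover the remaining dimensions.

Note from the book: fun fact, `Ellipsis` is the singleton instance of the `ellipsis` class (a built-in instance). Yes, the class name is all lowercase, much like the `bool` class with its `True` and `False` instances.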
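
The tuple that `__getitem__` receives, and the identity of `...`, can be inspected without NumPy. `IndexProbe` is a hypothetical class used only for this sketch.

```python
class IndexProbe:
    """Hypothetical helper: returns the key exactly as [] delivers it."""

    def __getitem__(self, key):
        return key

probe = IndexProbe()
print(probe[1, 2])        # (1, 2)
print(probe[1:3, ::2])    # (slice(1, 3, None), slice(None, None, 2))
print(probe[0, ...])      # (0, Ellipsis)

# `...` and Ellipsis are the same singleton object.
print(... is Ellipsis, type(...).__name__)
```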


In [12]:
import numpy as np

a = np.array([[[[1,2,3,4],[5,6,7,8]],[[12,34,56,78],[56,78,90,12]]],[[[1,2,3,4],[5,6,7,8]],[[12,34,56,78],[56,78,90,12]]]])
b = a[0, ...]  # a is 4-D, so this is the same as a[0, :, :, :]
print(b)

[[[ 1  2  3  4]
  [ 5  6  7  8]]

 [[12 34 56 78]
  [56 78 90 12]]]


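### 2.4.4 Assigning to slices (interesting)
When a slice appears on the left side of an assignment, or as the target of a `del` statement, it lets us **graft, excise, or otherwise modify the sequence in place**.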

In [19]:
l = list(range(10))
print(l)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [20]:
l[2:5] = [20,30]
l

[0, 1, 20, 30, 5, 6, 7, 8, 9]

In [21]:
del l[5:7]
l

[0, 1, 20, 30, 5, 8, 9]

In [22]:
l[3::2] = [11,22]
l

[0, 1, 20, 11, 5, 22, 9]

In [23]:
# l[2:5] = 100 would raise TypeError, because an int is not iterable
l[2:5] = [100] 
l

[0, 1, 100, 22, 9]

When the target of an assignment is a slice, the right-hand side **must be an iterable**. Even a single value has to be wrapped in an iterable sequence.
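
The rule can be checked with a short sketch; `nums` is an arbitrary sample list. Note that any iterable is accepted on the right-hand side, including a string, which contributes its individual characters.

```python
nums = [0, 1, 2, 3, 4]

try:
    nums[1:3] = 100          # an int is not iterable
except TypeError as exc:
    print('TypeError:', exc)

nums[1:3] = [100]            # two items replaced by one
print(nums)                  # [0, 100, 3, 4]

nums[1:1] = 'ab'             # empty slice: inserts without removing anything
print(nums)                  # [0, 'a', 'b', 100, 3, 4]
```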

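## 2.5 Using `+` and `*` with sequences
Both `+` and `*` follow the same rule: they never modify their operands and always build a brand-new sequence.

A pitfall of `*` is that it does not copy the items: the new list holds repeated references to the same objects. If such an item is **mutable**, this can cause surprising side effects. (A beginner's nightmare!)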
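
Both points can be illustrated with a small sketch. The 3x3 board is a made-up example; the names `weird_board` and `board` are arbitrary.

```python
base = [1, 2, 3]
doubled = base * 2
joined = base + [4]
print(base, doubled, joined)   # base is unchanged by both operators

# Pitfall: the outer * repeats a reference to ONE inner list three times.
weird_board = [['_'] * 3] * 3
weird_board[1][2] = 'O'
print(weird_board)   # [['_', '_', 'O'], ['_', '_', 'O'], ['_', '_', 'O']]

# Fix: build a new inner list for every row.
board = [['_'] * 3 for _ in range(3)]
board[1][2] = 'X'
print(board)         # [['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]
```

The inner `['_'] * 3` is harmless because strings are immutable; only the outer repetition of a mutable list causes the aliasing.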