# Pythonic Thinking

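## 1. The differences between bytes, str, and unicode
Python 3 has two types that represent sequences of characters, bytes and str:
- bytes: sequences of raw 8-bit values
- str: sequences of Unicode characters  

Python 2 has two types that represent sequences of characters, str and unicode:
- str: sequences of raw 8-bit values
- unicode: sequences of Unicode characters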

Unicode to binary:  
call the encode method; UTF-8 is the most common encoding  

Binary to Unicode:  
call the decode method
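As a small illustration of the two directions, the round trip below assumes UTF-8 and an arbitrary sample string. Both are choices made for this sketch, not requirements.

```python
# A sample string chosen for this sketch; any text would do.
text = 'caf\u00e9'  # four Unicode characters, the last one is é

# str -> bytes: encode with an explicit encoding.
raw = text.encode('utf-8')

# bytes -> str: decode with the same encoding.
restored = raw.decode('utf-8')

print(len(text), len(raw))  # character count vs. byte count
print(restored == text)
```

The character count and the byte count differ because é occupies two bytes in UTF-8, which is one reason the two types should not be mixed.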

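Do encoding and decoding at the outermost boundary of your interfaces. The core of the program should work with Unicode characters and make no assumptions about character encoding.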

In [1]:
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value

def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value

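In Python 3, file operations through the built-in open function default to text mode: data is encoded and decoded with a text encoding (the platform's preferred encoding, which is often UTF-8). In Python 2, file operations default to binary. This difference is a common source of surprises, especially for programmers accustomed to Python 2.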

For example, suppose you want to write some random binary data to a file. The following code works in Python 2 but raises an exception in Python 3.

In [12]:
import os
with open('random.bin', 'w') as f:
    f.write(os.urandom(10))

TypeError: write() argument must be str, not bytes

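The exception occurs because Python 3 added an encoding argument to open, and the 'w' mode opens the file as text. When encoding is left unset, Python uses the platform's preferred encoding, which is frequently UTF-8. Read and write calls on a text-mode file handle therefore expect str instances containing Unicode characters, not bytes instances containing binary data.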
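One way to make the example work in Python 3 is to open the file in binary mode. The sketch below writes into a temporary directory (an assumption made here so that it leaves no files behind) and reads the data back to check the round trip.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'random.bin')
    payload = os.urandom(10)

    with open(path, 'wb') as f:  # 'wb': binary write mode accepts bytes
        f.write(payload)

    with open(path, 'rb') as f:  # 'rb': binary read mode returns bytes
        data = f.read()

print(data == payload)
```

The same rule applies to reading: use 'rb' rather than 'r' when the file holds binary data.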

## 2. Use generator expressions in place of large list comprehensions


In [15]:
value = [len(x) for x in open('my_file.txt')]
value

[2, 3, 4, 5, 6, 7, 8, 9, 10, 3, 5, 7, 5, 5, 8, 11, 9, 3, 2, 2, 5, 5, 2]

- Use parentheses to build a generator expression

In [16]:
value = (len(x) for x in open('my_file.txt'))
value

<generator object <genexpr> at 0x00000278C2EBE360>

In [17]:
next(value)

2

In [18]:
next(value)

3

- Another benefit of generator expressions is that they can be composed

In [19]:
roots = ((x, x**10) for x in value)
roots

<generator object <genexpr> at 0x00000278C2EBE258>

In [20]:
next(roots)

(4, 1048576)

## 3. Prefer enumerate over range

In [22]:
flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']
for i in range(len(flavor_list)):
    print('%d: %s' % (i+1, flavor_list[i]))

1: vanilla
2: chocolate
3: pecan
4: strawberry


In [25]:
for i, flavor in enumerate(flavor_list):
    print('%d: %s' % (i+1, flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry


- You can also specify the number from which enumerate starts counting

In [26]:
for i, flavor in enumerate(flavor_list, 1):
    print('%d: %s' % (i, flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry


## 4. Use zip to process two iterators in parallel

In [28]:
names = ['LiMing', 'LiLi', 'ZhangMei']
letters = [len(n) for n in names]
longest_name = None
max_letters = 0

for i in range(len(names)):
    count = letters[i]
    if count > max_letters:
        max_letters = count
        longest_name = names[i]
    
        
print(longest_name, max_letters)

ZhangMei 8


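- In Python 3, the built-in zip function wraps two or more iterators in a lazy generator. The zip generator takes the next value from each iterator and yields those values together as a tuple.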

In [29]:
longest_name = None
max_letters = 0
for name, count in zip(names, letters):
    if count > max_letters:
        longest_name = name
        max_letters = count
        
print(longest_name, max_letters)

ZhangMei 8
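One caveat is that zip stops as soon as the shortest input runs out. The sketch below adds a hypothetical fourth name, 'Wang', that has no matching count, and compares zip with itertools.zip_longest from the standard library.

```python
from itertools import zip_longest

names = ['LiMing', 'LiLi', 'ZhangMei', 'Wang']  # one more name than counts
letters = [6, 4, 8]

# zip stops when the shortest input is exhausted, so 'Wang' is dropped.
pairs = list(zip(names, letters))

# zip_longest keeps going and fills the gaps with None by default.
padded = list(zip_longest(names, letters))

print(pairs)
print(padded)
```

If the inputs may differ in length and every item matters, zip_longest is the safer choice.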


## 5. Avoid else blocks after for and while loops
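The else block after a loop runs only when the loop finishes without hitting break, which many readers find counterintuitive. The sketch below shows that behavior and then a plainer alternative that returns early from a helper. The name find_index is made up for this illustration.

```python
# The else block runs when the loop was NOT ended by break.
# That includes a loop over an empty sequence.
ran_else = []

for x in []:
    pass
else:
    ran_else.append('empty loop')

for x in [1, 2, 3]:
    if x == 2:
        break
else:
    ran_else.append('loop with break')  # skipped, because break fired

print(ran_else)


def find_index(items, target):
    """Return the index of target in items, or -1 when it is absent."""
    for i, item in enumerate(items):
        if item == target:
            return i  # found: leave immediately
    return -1  # reached only when the loop finished without a match


print(find_index(['a', 'b'], 'b'), find_index(['a', 'b'], 'z'))
```

The helper expresses the "searched and found nothing" case without relying on the loop's else block.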

## 6. Take advantage of each block in try/except/else/finally

- finally

In [35]:
handle = open('my_file.txt')
try:
    data = handle.read()
finally:
    print('close handle')
    handle.close()

close handle


- else

In [36]:
import json
def load_json_key(data, key):
    try:
        result_dict = json.loads(data)  # may raise ValueError on invalid JSON
    except ValueError as e:
        raise KeyError from e
    else:
        return result_dict[key]

- Everything together

In [37]:
UNDEFINED = object()

def divide_json(path):
    handle = open(path, 'r+')
    try:
        data = handle.read()
        op = json.loads(data)
        value = op['numerator'] / op['denominator']  # may raise ZeroDivisionError
    except ZeroDivisionError as e:
        return UNDEFINED
    else:
        op['result'] = value
        result = json.dumps(op)
        handle.seek(0)
        handle.write(result)
        return value
    finally:
        handle.close()
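To make the order of execution concrete, the sketch below records which blocks run for a successful and a failing division. The function name run_blocks and the trace list are inventions of this example.

```python
def run_blocks(numerator, denominator):
    """Return the names of the blocks that executed, in order."""
    trace = []
    try:
        trace.append('try')
        numerator / denominator
    except ZeroDivisionError:
        trace.append('except')   # only when try raised this exception
    else:
        trace.append('else')     # only when try raised nothing
    finally:
        trace.append('finally')  # always runs
    return trace


print(run_blocks(1, 2))
print(run_blocks(1, 0))
```

Keeping the try block short and moving the follow-up work into else makes it clear which statements the except clause is meant to guard.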

## 7. Prefer exceptions to returning None

In [41]:
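# First attempt: None is easy to confuse with other falsy results such as 0.
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None

# Second attempt: return a (success, result) pair; callers can still ignore the flag.
def divide(a, b):
    try:
        return True, a / b
    except ZeroDivisionError:
        return False, None

success, result = divide(3, 3)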
result

1.0

In [43]:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e

try:
    result = divide(3, 0)
except ValueError:
    print('Invalid inputs')
else:
    print('Result is: %f' % result)

Invalid inputs


## 8. Know how closures interact with variable scope

In [58]:
def sort_priority(values, group):
    def helper(x):
        if x in group:
            return (0, x)
        return (1, x)
    values.sort(key=helper)

numbers = [8, 3, 1, 2, 5, 4, 7, 6, -1]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
numbers

[2, 3, 5, 7, -1, 1, 4, 6, 8]

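- nonlocal makes it explicit that assigning to the variable inside the closure modifies the variable in the enclosing scope. nonlocal does not reach up to module level.
- Python 2 does not support nonlocal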
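To see why nonlocal is needed, consider a variant without it. The function name sort_priority_broken is made up for this sketch.

```python
def sort_priority_broken(numbers, group):
    found = False

    def helper(x):
        if x in group:
            found = True  # binds a NEW local variable inside helper
            return (0, x)
        return (1, x)

    numbers.sort(key=helper)
    return found  # the outer variable, which helper never touched


numbers = [8, 3, 1, 2, 5, 4, 7, 6, -1]
group = {2, 3, 5, 7}
print(sort_priority_broken(numbers, group))
print(numbers)
```

The sort order is still correct, but the function reports False: reading an enclosing variable works in a closure, while assigning to it creates a local one unless nonlocal is declared.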

In [65]:
def sort_priority(numbers, group):
    found = False
    def helper(x):
        nonlocal found
        if x in group:
            found = True
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found

In [66]:
class Sorter(object):
    def __init__(self, group):
        self.group = group
        self.found = False
        
    def __call__(self, x):
        if x in self.group:
            self.found = True
            return (0, x)
        return (1, x)

numbers = [8, 3, 1, 2, 5, 4, 7, 6, -1]
group = {2, 3, 5, 7}
sorter = Sorter(group)
numbers.sort(key=sorter)
print(sorter.found)
print(numbers)

True
[2, 3, 5, 7, -1, 1, 4, 6, 8]


## 9. Consider generators instead of returning lists

In [72]:
def index_words(text):
    result = []
    if text:
        result.append(0)
    
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result

address = 'Four score and seven years ago...'
result = index_words(address)
result

[0, 5, 11, 15, 21, 27]

In [73]:
def index_words(text):
    if text:
        yield 0
    
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index+1
address = 'Four score and seven years ago...'
result = index_words(address)
result

<generator object index_words at 0x00000278C319E048>

In [74]:
list(result)

[0, 5, 11, 15, 21, 27]

In [78]:
def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset

with open('address.txt') as f:
    it = index_file(f)
    print(list(it))
    print(list(it))  # the generator is already exhausted, so this prints []

[0, 2, 5, 7, 15, 21, 26, 29]
[]


## 10. Be defensive when iterating over arguments

In [79]:
def normalize(numbers):
    total = sum(numbers)
    result = []
    for num in numbers:
        result.append(100 * num / total)
    return result

visits = [15, 35, 80]
normalize(visits)

[11.538461538461538, 26.923076923076923, 61.53846153846154]

In [81]:
def read_vists(path):
    with open(path) as f:
        for line in f:
            yield int(line)

it = read_vists('data.txt')
percentages = normalize(it)
percentages

[]
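The empty result comes from iterator exhaustion: sum consumes every value, so the for loop that follows has nothing left to read, and no error is raised. A minimal sketch, using a made-up count_up generator in place of read_vists so that no data file is needed:

```python
def count_up(limit):
    """A tiny generator standing in for a file-reading generator."""
    for n in range(1, limit + 1):
        yield n


it = count_up(3)
first_pass = sum(it)    # consumes every value the generator produces
second_pass = list(it)  # nothing is left, and no exception is raised

print(first_pass, second_pass)
```

Copying the input into a list at the top of the function avoids the problem, at the cost of holding all values in memory.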

- Define an iterable container class

In [82]:
class ReadVisits(object):
    def __init__(self, data_path):
        self.data_path = data_path
        
    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)
                
visits = ReadVisits('data.txt')
normalize(visits)

[7.6923076923076925,
 28.205128205128204,
 33.97435897435897,
 9.615384615384615,
 20.512820512820515]

In [89]:
def normalize_defensive(numbers):
    if iter(numbers) is iter(numbers):  # an iterator returns itself from iter()
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

In [90]:
visits = [15, 35, 80]
normalize_defensive(visits)

[11.538461538461538, 26.923076923076923, 61.53846153846154]

In [91]:
visits = ReadVisits('data.txt')
normalize_defensive(visits)

[7.6923076923076925,
 28.205128205128204,
 33.97435897435897,
 9.615384615384615,
 20.512820512820515]

In [92]:
it = read_vists('data.txt')
normalize_defensive(it)

TypeError: Must supply a container
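The check relies on the iterator protocol: calling iter on an iterator returns the same object, while calling iter on a container returns a fresh iterator each time. The sketch below shows both cases, along with an alternative test based on collections.abc.Iterator. That alternative is an option suggested here, not something the notes above use.

```python
from collections.abc import Iterator

numbers = [15, 35, 80]
gen = (n for n in numbers)

# A container hands out a new iterator object on every call to iter().
print(iter(numbers) is iter(numbers))

# An iterator returns itself from iter().
print(iter(gen) is iter(gen))

# Alternative check: generators are Iterator instances, lists are not.
print(isinstance(numbers, Iterator), isinstance(gen, Iterator))
```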

## 11. Reduce visual noise with variable positional arguments

In [95]:
def log(message, values):
    if not values:
        print(message)
    else:
        value_str = ','.join(str(x) for x in values)
        print('%s: %s' % (message, value_str))
log('My Number is', [11, 22, 33])

My Number is: 11,22,33


In [102]:
def log(message, *values):
    if not values:
        print(message)
    else:
        value_str = ','.join(str(x) for x in values)
        print('%s: %s' % (message, value_str))
log('My Number is', 12, 33 ,44)

My Number is: 12,33,44


In [103]:
numbers = [12, 33, 44]
log('My Number is', *numbers)

My Number is: 12,33,44
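One thing to keep in mind is that the * operator always turns the unpacked arguments into a tuple before the function runs. With a generator this means every value is produced up front, which can use a lot of memory for large inputs. The helper names count_args and numbers_gen are made up for this sketch.

```python
def count_args(*values):
    # values is always a tuple, whatever was unpacked at the call site
    return type(values).__name__, len(values)


def numbers_gen():
    for i in range(5):
        yield i


# The generator is fully consumed while the call is being set up.
print(count_args(*numbers_gen()))
```

For that reason *args is best suited to calls where the number of arguments is known to be reasonably small.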
