# Pythonic Thinking

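## 1. Differences between bytes, str, and unicode
Python 3 has two types that represent sequences of characters, bytes and str:
- bytes: raw 8-bit values
- str: Unicode characters  

Python 2 has two types that represent sequences of characters, str and unicode:
- str: raw 8-bit values
- unicode: Unicode characters

Unicode to binary:  
UTF-8 is the most common encoding; use the `encode` method.  

Binary to Unicode:  
Use the `decode` method.

When writing programs, do the encoding and decoding at the outermost boundary of your interfaces. The core of the program should work with Unicode characters and make no assumptions about character encoding.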

In [1]:
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value

def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value

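In Python 3, operations on file handles (returned by the built-in `open` function) default to text mode, and the text is encoded and decoded with a default encoding (often UTF-8, although the exact default depends on the platform). In Python 2, file operations default to binary. This is a common source of surprises, especially for programmers accustomed to Python 2.

For example, suppose you want to write some random binary data to a file. In Python 2 the code below works, but in Python 3 it raises an error and exits.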

In [2]:
import os
with open('random.bin', 'w') as f:
    f.write(os.urandom(10))

TypeError: write() argument must be str, not bytes

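This exception occurs because the file was opened in text mode (`'w'`). In Python 3, `open` gained an `encoding` argument, and in text mode the `read` and `write` operations on the file handle expect `str` instances containing Unicode characters, not `bytes` instances containing binary data.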
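One way to make the write succeed on Python 3, sketched here, is to open the file in binary mode (`'wb'`) so that `write` accepts `bytes`. The temporary directory is only an assumption to keep the example self-contained.

```python
import os
import tempfile

# Use a throwaway directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'random.bin')

    # 'wb' opens the file in binary mode, so bytes can be written directly.
    with open(path, 'wb') as f:
        f.write(os.urandom(10))

    # 'rb' reads the data back as bytes, with no decoding step.
    with open(path, 'rb') as f:
        data = f.read()

print(type(data), len(data))
```

The same applies to reading: use `'rb'` rather than `'r'` when the file holds binary data.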

## 2. Use generator expressions for large list comprehensions  


In [3]:
value = [len(x) for x in open('my_file.txt')]
value

[2, 3, 4, 5, 6, 7, 8, 9, 10, 3, 5, 7, 5, 5, 8, 11, 9, 3, 2, 2, 5, 5, 2]

- Wrap the expression in parentheses to create a generator expression

In [4]:
value = (len(x) for x in open('my_file.txt'))
value

<generator object <genexpr> at 0x00000241085DE410>

In [5]:
next(value)

2

In [6]:
next(value)

3

- Another benefit of generator expressions is that they can be composed with each other

In [7]:
roots = ((x, x**10) for x in value)
roots

<generator object <genexpr> at 0x00000241086702B0>

In [8]:
next(roots)

(4, 1048576)
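The chaining shown in the cells above can be reproduced without a file on disk. The following is a minimal sketch that assumes an in-memory `io.StringIO` object as a stand-in for `my_file.txt`; the names `fake_file`, `lengths`, and `powers` are made up for illustration.

```python
import io

# Stand-in for a file on disk; iterating over it yields one line at a time.
fake_file = io.StringIO('a\nbb\nccc\n')

# Each generator pulls from the previous one only when asked for a value.
lengths = (len(line) for line in fake_file)
powers = ((n, n ** 2) for n in lengths)

first = next(powers)   # reads just the first line
rest = list(powers)    # consumes the remaining lines
print(first, rest)
```

Note that generators are stateful: once `powers` has been consumed it yields nothing more, so it should not be reused.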

## 3. Prefer enumerate over range

In [9]:
flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']
for i in range(len(flavor_list)):
    print('%d: %s' % (i+1, flavor_list[i]))

1: vanilla
2: chocolate
3: pecan
4: strawberry


In [10]:
for i, flavor in enumerate(flavor_list):
    print('%d: %s' % (i+1, flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry


- You can also specify the number at which enumerate starts counting

In [11]:
for i, flavor in enumerate(flavor_list, 1):
    print('%d: %s' % (i, flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry


## 4. Use zip to process two iterators in parallel

In [12]:
names = ['LiMing', 'LiLi', 'ZhangMei']
letters = [len(n) for n in names]
longest_name = None
max_letters = 0

for i in range(len(names)):
    count = letters[i]
    if count > max_letters:
        max_letters = count
        longest_name = names[i]
    
        
print(longest_name, max_letters)

ZhangMei 8


- In Python 3, the built-in zip function wraps two or more iterators in a lazy generator. The zip generator takes the next value from each iterator and combines those values into a tuple

In [13]:
longest_name = None
max_letters = 0
for name, count in zip(names, letters):
    if count > max_letters:
        longest_name = name
        max_letters = count
        
print(longest_name, max_letters)

ZhangMei 8
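To make the behavior of zip more concrete, here is a small sketch that reuses the `names` list from the cells above. The variable names `pairs`, `first_pair`, `remaining`, and `short` are illustrative only.

```python
names = ['LiMing', 'LiLi', 'ZhangMei']
letters = [len(n) for n in names]

pairs = zip(names, letters)
# zip returns a lazy iterator, so nothing is paired until a value is requested.
first_pair = next(pairs)
remaining = list(pairs)

# When the inputs differ in length, zip stops at the shortest one.
short = list(zip(names, [6, 4]))
print(first_pair, remaining, short)
```

Because zip silently stops at the shortest input, it is safest when the inputs are known to have the same length.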


## 5. Avoid else blocks after for and while loops
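A minimal sketch of why this construct is easy to misread: the `else` block of a loop runs only when the loop finishes without hitting `break`, which is the opposite of what `if/else` suggests. The function names `is_coprime_with_else` and `is_coprime` are hypothetical.

```python
def is_coprime_with_else(a, b):
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            break          # a common divisor was found
    else:
        # Runs only when the loop was NOT interrupted by break.
        return True
    return False


def is_coprime(a, b):
    # Same logic with an early return, which reads more directly.
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            return False
    return True


print(is_coprime_with_else(4, 9), is_coprime(4, 9))
print(is_coprime_with_else(3, 6), is_coprime(3, 6))
```

Both versions give the same answers, but the helper with an early return does not require the reader to remember the loop `else` rule.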

## 6. Take advantage of each block in try/except/else/finally

- finally

In [14]:
handle = open('my_file.txt')
try:
    data = handle.read()
finally:
    print('close handle')
    handle.close()

close handle


- else

In [15]:
import json
def load_json_key(data, key):
    try:
        result_dict = json.loads(data)
    except ValueError as e:
        raise KeyError from e
    else:
        return result_dict[key]

- Everything together

In [16]:
UNDEFINED = object()

def divide_json(path):
    handle = open(path, 'r+')
    try:
        data = handle.read()
        op = json.loads(data)
        value = op['numerator'] / op['denominator']
    except ZeroDivisionError as e:
        return UNDEFINED
    else:
        op['result'] = value
        result = json.dumps(op)
        handle.seek(0)
        handle.write(result)
        return value
    finally:
        handle.close()

## 7. Prefer exceptions to returning None 

In [17]:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None

def divide(a, b):
    try:
        return True, a/b
    except ZeroDivisionError:
        return False, None
    
success, result = divide(3,3)
result

1.0

In [18]:
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e

try:
    result = divide(3, 0)
except ValueError:
    print('Invalid inputs')
else:
    print('Result is: %f' % result)

Invalid inputs


## 8. Know how closures interact with variables in the enclosing scope

In [19]:
def sort_priority(values, group):
    def helper(x):
        if x in group:
            return (0, x)
        return (1, x)
    values.sort(key=helper)

numbers = [8, 3, 1, 2, 5, 4, 7, 6, -1]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
numbers

[2, 3, 5, 7, -1, 1, 4, 6, 8]

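- nonlocal makes it explicit that an assignment to the variable inside the closure actually modifies the variable in the enclosing scope. nonlocal does not reach up to the module level
- Python 2 does not support nonlocal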

In [20]:
def sort_priority(numbers, group):
    found = False
    def helper(x):
        nonlocal found
        if x in group:
            found = True
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found

In [21]:
class Sorter(object):
    def __init__(self, group):
        self.group = group
        self.found = False
        
    def __call__(self, x):
        if x in self.group:
            self.found = True
            return (0, x)
        return (1, x)

numbers = [8, 3, 1, 2, 5, 4, 7, 6, -1]
group = {2, 3, 5, 7}
sorter = Sorter(group)
numbers.sort(key=sorter)
print(sorter.found)
print(numbers)

True
[2, 3, 5, 7, -1, 1, 4, 6, 8]


## 9. Consider generators instead of functions that return lists

In [22]:
def index_words(text):
    result = []
    if text:
        result.append(0)
    
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result

address = 'Four score and seven years ago...'
result = index_words(address)
result

[0, 5, 11, 15, 21, 27]

In [23]:
def index_words(text):
    if text:
        yield 0
    
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index+1
address = 'Four score and seven years ago...'
result = index_words(address)
result

<generator object index_words at 0x00000241086A9360>

In [24]:
list(result)

[0, 5, 11, 15, 21, 27]

In [25]:
from itertools import islice
def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset

with open('address.txt') as f:
    it = index_file(f)
    print(list(it))

[0, 2, 5, 7, 15, 21, 26, 29]


## 10. Be defensive when iterating over arguments

In [26]:
def normalize(numbers):
    total = sum(numbers)
    result = []
    for num in numbers:
        result.append(100 * num / total)
    return result

visits = [15, 35, 80]
normalize(visits)

[11.538461538461538, 26.923076923076923, 61.53846153846154]

In [27]:
def read_vists(path):
    with open(path) as f:
        for line in f:
            yield int(line)

it = read_vists('data.txt')
percentages = normalize(it)
percentages

[]

- Define an iterable container class

In [28]:
class ReadVisits(object):
    def __init__(self, data_path):
        self.data_path = data_path
        
    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)
                
visits = ReadVisits('data.txt')
normalize(visits)

[7.6923076923076925,
 28.205128205128204,
 33.97435897435897,
 9.615384615384615,
 20.512820512820515]

In [29]:
def normalize_defensive(numbers):
    if iter(numbers) is iter(numbers):
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

In [30]:
visits = [15, 35, 80]
normalize_defensive(visits)

[11.538461538461538, 26.923076923076923, 61.53846153846154]

In [31]:
visits = ReadVisits('data.txt')
normalize_defensive(visits)

[7.6923076923076925,
 28.205128205128204,
 33.97435897435897,
 9.615384615384615,
 20.512820512820515]

In [32]:
it = read_vists('data.txt')
normalize_defensive(it)

TypeError: Must supply a container
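The defensive check relies on a property of the iterator protocol: calling `iter` on an iterator returns the iterator itself, while calling `iter` on a container builds a new iterator each time. A minimal sketch follows, with `gen` as a hypothetical stand-in for the file-reading generator:

```python
def gen():
    # Stand-in for a generator such as the file reader above.
    yield 1
    yield 2

it = gen()
container = [1, 2]

# An iterator returns itself from iter(), so both calls give the same object.
same_for_iterator = iter(it) is iter(it)

# A container builds a fresh iterator on every call to iter().
same_for_container = iter(container) is iter(container)

print(same_for_iterator, same_for_container)
```

This is why a plain generator is rejected while a list or a class that defines `__iter__` as a generator method is accepted.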

## 11. Reduce visual noise with variable positional arguments

In [33]:
def log(message, values):
    if not values:
        print(message)
    else:
        value_str = ','.join(str(x) for x in values)
        print('%s: %s' % (message, value_str))
log('My Number is: ', [11, 22, 33])

My Number is: : 11,22,33


In [34]:
def log(message, *values):
    if not values:
        print(message)
    else:
        value_str = ','.join(str(x) for x in values)
        print('%s: %s' % (message, value_str))
log('My Number is', 12, 33 ,44)

My Number is: 12,33,44


In [35]:
numbers = [12, 33, 44]
log('My Number is', *numbers)

My Number is: 12,33,44


## 12. Provide optional behavior with keyword arguments
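As a minimal sketch of the idea, the hypothetical function `flow_rate` below takes two required arguments and two optional ones. The function name, parameters, and default values are assumptions chosen for illustration.

```python
def flow_rate(weight_diff, time_diff, period=1, units_per_kg=1):
    # period and units_per_kg are optional; their defaults keep
    # existing two-argument callers working unchanged.
    return (weight_diff * units_per_kg / time_diff) * period

# Callers that do not care about the options can ignore them.
per_second = flow_rate(0.5, 3)

# Callers that do care name the option, which documents the intent.
per_hour = flow_rate(0.5, 3, period=3600)
pounds_per_hour = flow_rate(0.5, 3, period=3600, units_per_kg=2.2)

print(per_second, per_hour, pounds_per_hour)
```

Passing the optional values by keyword rather than by position keeps the call sites readable and lets new options be added later without breaking old callers.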

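## 13. Use None and docstrings to specify dynamic default arguments

For example, suppose you want to include the time of the event when printing a log message.  
Consider the following first attempt: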

In [36]:
from datetime import datetime
import time
def log(message, when=datetime.now()):
    print('%s: %s' % (when, message))

In [37]:
log('Hi there!')
time.sleep(1)
log('Hi again!')

2018-04-16 23:44:30.426229: Hi there!
2018-04-16 23:44:30.426229: Hi again!


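As the output shows, the two timestamps are identical. This is because `datetime.now` is executed only once, at the time the function is defined.

In Python, the conventional way to get a dynamic default value is to set the default to None and document the actual behavior in the docstring. When the code sees that the argument is None, it assigns the real default value.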

In [38]:
def log(message, when=None):
    """
    Log a message with a timestamp.

    Args:
        message: Message to print
        when: datetime of when the message occurred.
            Default to the present time.
    """
    when = datetime.now() if when is None else when
    print("%s: %s" %(when, message))

log("hi there!")
time.sleep(1)
log('hi again!')

2018-04-16 23:44:31.465338: hi there!
2018-04-16 23:44:32.466058: hi again!


In [39]:
import json
def decode(data, default={}):
    try:
        return json.loads(data)
    except ValueError:
        return default

In [40]:
foo = decode('bad data')
foo['stuff'] = 5
bar = decode('also bad')
bar['meep'] = 1
print(foo)
print(bar)

{'stuff': 5, 'meep': 1}
{'stuff': 5, 'meep': 1}


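Because the value for the default argument is evaluated only once, when the function is defined (typically at module load time), every call that relies on the default empty dictionary shares the same dictionary object.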

In [41]:
assert foo is bar

The fix is to set the keyword argument's default to None and document the behavior in the function's docstring.

In [42]:
def decode(data, default=None):
    """Load JSON data from string.

    Args:
        data: JSON data to be decoded.
        default: Value to return if decoding fails.
            Defaults to an empty dictionary.
    """

    if default is None:
        default = {}
    try:
        return json.loads(data)
    except ValueError:
        return default
    
foo = decode('bad data')
foo['stuff'] = 5
bar = decode('also bad')
bar['meep'] = 1
print('Foo:', foo)
print('Bar:', bar)

Foo: {'stuff': 5}
Bar: {'meep': 1}


## 14. Enforce clarity with keyword-only arguments

In [43]:
def safe_division_c(number, division, *, ignore_overflow=False, 
                    ignore_zero_division=False):
    pass

The arguments listed after the `*` can only be supplied by keyword, never by position.
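The signature above has an empty body, so here is a sketch of how such a function might behave. The body is an assumption written for illustration; only the signature comes from the cell above.

```python
def safe_division_c(number, division, *, ignore_overflow=False,
                    ignore_zero_division=False):
    # Assumed body: divide, and optionally swallow specific errors.
    try:
        return number / division
    except OverflowError:
        if ignore_overflow:
            return 0
        raise
    except ZeroDivisionError:
        if ignore_zero_division:
            return float('inf')
        raise

# Keyword use of the flag is accepted.
result = safe_division_c(1, 0, ignore_zero_division=True)

# Positional use of the flags is rejected with a TypeError.
try:
    safe_division_c(1, 0, False, True)
except TypeError as e:
    error = e

print(result, type(error).__name__)
```

Forcing the flags to be named prevents call sites such as `safe_division_c(1, 0, False, True)`, where it is unclear which boolean controls which behavior.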

## 15. Use @classmethod polymorphism to construct objects generically

Reference:  
https://stackoverflow.com/questions/12179271/meaning-of-classmethod-and-staticmethod-for-beginner  
First, we have a class for handling dates:

In [44]:
class Date(object):
    
    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

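Suppose we need to create many `Date` instances from strings. Each time, we have to do the following:  
- Convert the date string into integers
- Build a `Date` from those integers 

For example:  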
day, month, year = map(int, string_date.split('-'))  
date1 = Date(day, month, year)

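Languages such as `C++` handle this case with overloading, but Python has no overloading: each class has a single constructor, the `__init__` method. A `@classmethod` fills this gap by acting as an alternative constructor.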

In [45]:
class Date(object):

    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

    @classmethod
    def from_string(cls, date_as_string):
        day, month, year = map(int, date_as_string.split('-'))
        date1 = cls(day, month, year)
        return date1

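Unlike a `@classmethod`, a `@staticmethod` neither builds an instance of the class nor accesses the class. It is just a plain function that lives in the class namespace.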

In [46]:
class Date(object):

    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

    @classmethod
    def from_string(cls, date_as_string):
        day, month, year = map(int, date_as_string.split('-'))
        date1 = cls(day, month, year)
        return date1

    @staticmethod
    def is_date_valid(date_as_string):
        day, month, year = map(int, date_as_string.split('-'))
        return day <= 31 and month <= 12 and year <= 3999

date2 = Date.from_string('11-09-2012')
is_date = Date.is_date_valid('11-09-2012')
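One practical consequence of receiving `cls` is that the alternative constructor also works for subclasses. The sketch below assumes a hypothetical subclass `DateTime` with a made-up `display` method to show this.

```python
class Date(object):

    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

    @classmethod
    def from_string(cls, date_as_string):
        day, month, year = map(int, date_as_string.split('-'))
        # cls is whichever class the method was called on.
        return cls(day, month, year)


class DateTime(Date):
    # Hypothetical subclass, used only to show what cls refers to.
    def display(self):
        return '%02d-%02d-%04d 00:00' % (self.day, self.month, self.year)


dt = DateTime.from_string('11-09-2012')
print(type(dt).__name__, dt.display())
```

Had `from_string` hard-coded `Date(day, month, year)`, the call on `DateTime` would have returned a plain `Date` without the `display` method.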