# Modules and the Standard Library

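Python code is organized into modules. A module can be a single file containing as little as one function, or a directory containing one or more submodules. The difference between a package and a module is small: every package is also a module.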

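Modules versus packages:
- Module: a single unit of code, usually one Python file (occasionally several).
- Package: a group of related modules, typically several Python files collected in one directory. A package organizes related modules into a single whole.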

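Module search path:
- `sys.path`: the list of directories the interpreter searches when importing
- `PYTHONPATH`: an environment variable whose entries are added to `sys.path` at startup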
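
To make the search path concrete, here is a minimal sketch that prints the directories in `sys.path` and lists whatever the `PYTHONPATH` environment variable contributes, if it is set. The helper name `pythonpath_entries` is hypothetical and not part of the standard library.

```python
import os
import sys


def pythonpath_entries():
    """Return the directories listed in the PYTHONPATH environment variable."""
    raw = os.environ.get('PYTHONPATH', '')
    # os.pathsep is ':' on POSIX systems and ';' on Windows.
    return [p for p in raw.split(os.pathsep) if p]


print('sys.path:')
for entry in sys.path:
    print('  ', repr(entry))

print('PYTHONPATH entries:', pythonpath_entries())
```

The output depends entirely on the local installation, so no expected output is shown.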

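## Third-Party Packages

Common ways to install third-party packages:
- Using community-developed tools such as pip or easy_install (the latter is now deprecated)
 > These tools analyze a package's dependencies and install them automatically first.
- Installing from source

What happens during installation:
- Dependencies are installed, code is compiled if necessary, and the package's modules are copied to a directory on the standard search path.

Where to find third-party packages:
- The Python Package Index, also known as [PyPI](https://pypi.org/search/)
- [Tsinghua (TUNA) mirror in China](https://mirrors.tuna.tsinghua.edu.cn/help/pypi/)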

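## Using pip

- `pip install <package>`
 - Install a package
- `pip list`
 - List installed packages
- `pip freeze`
 - List installed packages in requirements format
- `pip install -r requirements.txt`
 - Install every package listed in a requirements file
- `pip show <package>`
 - Show information about an installed package
- `pip uninstall <package>`
 - Uninstall a package
- `pip install <package> --upgrade`
 - Upgrade a package
- `pip install <package> -i <url>`
 - Install from a specific index, such as a mirror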
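
The information printed by `pip list` and `pip freeze` can also be read from inside a running interpreter. The sketch below uses the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_packages` is hypothetical, and the result depends on the local environment.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings, similar to `pip freeze`."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata['Name']
        if name:  # skip distributions whose metadata lacks a name
            lines.add('{}=={}'.format(name, dist.version))
    return sorted(lines, key=str.lower)


# Show only the first few entries to keep the output short.
for line in installed_packages()[:10]:
    print(line)
```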


In [1]:
!pip install hsw2v

Collecting hsw2v
  Using cached https://files.pythonhosted.org/packages/70/f4/54830e0571b14b0e212af1e3c89d305b1b033d3afc8b4f33d80f34b9a28e/hsw2v-1.0a5.tar.gz
Collecting torch (from hsw2v)
  Using cached https://files.pythonhosted.org/packages/5f/e9/bac4204fe9cb1a002ec6140b47f51affda1655379fe302a1caef421f9846/torch-0.1.2.post1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\likun\AppData\Local\Temp\pip-install-nbyo1juk\torch\setup.py", line 11, in <module>
        raise RuntimeError(README)
    RuntimeError: PyTorch does not currently provide packages for PyPI (see status at https://github.com/pytorch/pytorch/issues/566).
    
    Please follow the instructions at http://pytorch.org/ to install with miniconda instead.
    
    
    ----------------------------------------


Command "python setup.py egg_info" failed with error code 1 in C:\Users\likun\AppData\Local\Temp\pip-install-nbyo1juk\torch\


In [None]:
# Install through a mirror in China
!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch

In [None]:
# List installed packages
!pip list

In [None]:
# Another way to list installed packages (requirements format)
!pip freeze

In [None]:
!pip show pip 

In [None]:
!pip uninstall hsw2v

In [None]:
!pip install hsw2v --upgrade

# The Python Standard Library

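Common modules by category:
- General text processing (string, re)
- Binary data (struct, codecs)
- **Data type extensions (datetime, collections, copy)**
- **Numeric and scientific computing (math, random)**
- Functional programming (functools, itertools, operator)
- **Files and directories (glob, shutil)**
- Persistence (pickle, shelve, sqlite3)
- File formats (csv, configparser)
- **Cryptographic hashing (hashlib)**
- **Generic operating system services (os, time, argparse, logging)**
- Concurrency (threading, multiprocessing)
- Interprocess communication and networking (socket, asyncio)
- Internet data handling (email, json)
- Structured markup processing (html, xml)
- Internet protocols (urllib, ftplib, poplib...)
- Multimedia (audioop; removed in Python 3.13)
- Internationalization (gettext, locale)
- Program frameworks (turtle, cmd, shlex)
- Tk graphical interface (tkinter)
- **Development tools (doctest, unittest)**
- **Debugging and profiling (pdb, timeit)**
- Software packaging (distutils; removed in Python 3.12)
- **Runtime services (sys, `__future__`)**
- Custom interpreters, module importing, Python language services, Windows-specific services, Unix-specific services, and other superseded modules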

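## The datetime Module

The datetime module handles dates and times.
- time class
 - Represents a time of day, made up of hour, minute, second, and microsecond
 - `datetime.time(hour[,minute[,second[,microsecond[,tzinfo]]]])`
 
- date class
 - Represents a calendar date, made up of year, month, and day
 - `datetime.date(year,month,day)`

- datetime class
 - Combines date and time and carries all the information of both
 - `datetime.datetime(year,month,day[,hour[,minute[,second[,microsecond[,tzinfo]]]]])`

- timedelta class
 - Represents a duration, the difference between two dates or times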

### Time

In [2]:
import datetime

t = datetime.time(1, 2, 3)
print(t)
print('hour       :', t.hour)
print('minute     :', t.minute)
print('second     :', t.second)
print('microsecond:', t.microsecond)
print('tzinfo     :', t.tzinfo)

01:02:03
hour       : 1
minute     : 2
second     : 3
microsecond: 0
tzinfo     : None


In [3]:
import datetime

print('Earliest  :', datetime.time.min)
print('Latest    :', datetime.time.max)
print('Resolution:', datetime.time.resolution)

Earliest  : 00:00:00
Latest    : 23:59:59.999999
Resolution: 0:00:00.000001


### Dates

In [4]:
import datetime

today = datetime.date.today()
print(today)
print('ctime  :', today.ctime())
tt = today.timetuple()
print('tuple  : tm_year  =', tt.tm_year)
print('         tm_mon   =', tt.tm_mon)
print('         tm_mday  =', tt.tm_mday)
print('         tm_hour  =', tt.tm_hour)
print('         tm_min   =', tt.tm_min)
print('         tm_sec   =', tt.tm_sec)
print('         tm_wday  =', tt.tm_wday)
print('         tm_yday  =', tt.tm_yday)
print('         tm_isdst =', tt.tm_isdst)
print('ordinal:', today.toordinal())
print('Year   :', today.year)
print('Mon    :', today.month)
print('Day    :', today.day)


2019-06-16
ctime  : Sun Jun 16 00:00:00 2019
tuple  : tm_year  = 2019
         tm_mon   = 6
         tm_mday  = 16
         tm_hour  = 0
         tm_min   = 0
         tm_sec   = 0
         tm_wday  = 6
         tm_yday  = 167
         tm_isdst = -1
ordinal: 737226
Year   : 2019
Mon    : 6
Day    : 16


In [5]:
import datetime
import time

o = 733114
print('o               :', o)
print('fromordinal(o)  :', datetime.date.fromordinal(o))

t = time.time()
print('t               :', t)
print('fromtimestamp(t):', datetime.date.fromtimestamp(t))

o               : 733114
fromordinal(o)  : 2008-03-13
t               : 1560679622.3787894
fromtimestamp(t): 2019-06-16


In [6]:
import datetime

print('Earliest  :', datetime.date.min)
print('Latest    :', datetime.date.max)
print('Resolution:', datetime.date.resolution)

Earliest  : 0001-01-01
Latest    : 9999-12-31
Resolution: 1 day, 0:00:00


In [7]:
import datetime

d1 = datetime.date(2008, 3, 29)
print('d1:', d1.ctime())

d2 = d1.replace(year=2009)
print('d2:', d2.ctime())

d1: Sat Mar 29 00:00:00 2008
d2: Sun Mar 29 00:00:00 2009


### Datetime

In [10]:
import datetime

print('Now    :', datetime.datetime.now())
print('Today  :', datetime.datetime.today())
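# utcnow() is deprecated since Python 3.12; the timezone-aware
# alternative is datetime.datetime.now(datetime.timezone.utc).
print('UTC Now:', datetime.datetime.utcnow())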
print()

FIELDS = [
    'year', 'month', 'day',
    'hour', 'minute', 'second',
    'microsecond',
]

d = datetime.datetime.now()
for attr in FIELDS:
    print('{:15}: {}'.format(attr, getattr(d, attr)))


Now    : 2019-06-16 12:09:42.693988
Today  : 2019-06-16 12:09:42.694989
UTC Now: 2019-06-16 10:09:42.695086

year           : 2019
month          : 6
day            : 16
hour           : 12
minute         : 9
second         : 42
microsecond    : 695086


In [11]:
import datetime

t = datetime.time(1, 2, 3)
print('t :', t)

d = datetime.date.today()
print('d :', d)

dt = datetime.datetime.combine(d, t)
print('dt:', dt)

t : 01:02:03
d : 2019-06-16
dt: 2019-06-16 01:02:03


### Timedeltas

In [12]:
import datetime

print('microseconds:', datetime.timedelta(microseconds=1))
print('milliseconds:', datetime.timedelta(milliseconds=1))
print('seconds     :', datetime.timedelta(seconds=1))
print('minutes     :', datetime.timedelta(minutes=1))
print('hours       :', datetime.timedelta(hours=1))
print('days        :', datetime.timedelta(days=1))
print('weeks       :', datetime.timedelta(weeks=1))

microseconds: 0:00:00.000001
milliseconds: 0:00:00.001000
seconds     : 0:00:01
minutes     : 0:01:00
hours       : 1:00:00
days        : 1 day, 0:00:00
weeks       : 7 days, 0:00:00


In [13]:
import datetime

for delta in [datetime.timedelta(microseconds=1),
              datetime.timedelta(milliseconds=1),
              datetime.timedelta(seconds=1),
              datetime.timedelta(minutes=1),
              datetime.timedelta(hours=1),
              datetime.timedelta(days=1),
              datetime.timedelta(weeks=1),
              ]:
    print('{:15} = {:8} seconds'.format(
        str(delta), delta.total_seconds())
    )

0:00:00.000001  =    1e-06 seconds
0:00:00.001000  =    0.001 seconds
0:00:01         =      1.0 seconds
0:01:00         =     60.0 seconds
1:00:00         =   3600.0 seconds
1 day, 0:00:00  =  86400.0 seconds
7 days, 0:00:00 = 604800.0 seconds


### Date Arithmetic

In [14]:
import datetime

today = datetime.date.today()
print('Today    :', today)

one_day = datetime.timedelta(days=1)
print('One day  :', one_day)

yesterday = today - one_day
print('Yesterday:', yesterday)

tomorrow = today + one_day
print('Tomorrow :', tomorrow)

print()
print('tomorrow - yesterday:', tomorrow - yesterday)
print('yesterday - tomorrow:', yesterday - tomorrow)

Today    : 2019-06-16
One day  : 1 day, 0:00:00
Yesterday: 2019-06-15
Tomorrow : 2019-06-17

tomorrow - yesterday: 2 days, 0:00:00
yesterday - tomorrow: -2 days, 0:00:00


In [15]:
import datetime

one_day = datetime.timedelta(days=1)
print('1 day    :', one_day)
print('5 days   :', one_day * 5)
print('1.5 days :', one_day * 1.5)
print('1/4 day  :', one_day / 4)

# assume an hour for lunch
work_day = datetime.timedelta(hours=7)
meeting_length = datetime.timedelta(hours=1)
print('meetings per day :', work_day / meeting_length)

1 day    : 1 day, 0:00:00
5 days   : 5 days, 0:00:00
1.5 days : 1 day, 12:00:00
1/4 day  : 6:00:00
meetings per day : 7.0


In [None]:
# Comparing dates and times

import datetime
import time

print('Times:')
t1 = datetime.time(12, 55, 0)
print('  t1:', t1)
t2 = datetime.time(13, 5, 0)
print('  t2:', t2)
print('  t1 < t2:', t1 < t2)

print()
print('Dates:')
d1 = datetime.date.today()
print('  d1:', d1)
d2 = datetime.date.today() + datetime.timedelta(days=1)
print('  d2:', d2)
print('  d1 > d2:', d1 > d2)


### Formatting and Parsing Dates

In [16]:
#  ISO-8601 format (YYYY-MM-DDTHH:MM:SS.mmmmmm)

import datetime

fmt = "%a %b %d %H:%M:%S %Y"  # named fmt to avoid shadowing the built-in format()

today = datetime.datetime.today()
print('ISO     :', today)

s = today.strftime(fmt)
print('strftime:', s)

d = datetime.datetime.strptime(s, fmt)
print('strptime:', d.strftime(fmt))

ISO     : 2019-06-16 12:15:01.148124
strftime: Sun Jun 16 12:15:01 2019
strptime: Sun Jun 16 12:15:01 2019


In [17]:
import datetime

today = datetime.datetime.today()
print('ISO     :', today)
print('format(): {:%a %b %d %H:%M:%S %Y}'.format(today))

ISO     : 2019-06-16 12:15:49.943759
format(): Sun Jun 16 12:15:49 2019
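
The cells above print `str(today)` under the label ISO, which separates the date and the time with a space. As a small sketch of the explicit ISO 8601 round trip, `isoformat()` produces the `YYYY-MM-DDTHH:MM:SS.mmmmmm` form and `datetime.fromisoformat()` (Python 3.7+) parses it back. The timestamp used here is an arbitrary illustrative value.

```python
import datetime

stamp = datetime.datetime(2019, 6, 16, 12, 15, 1, 148124)

# Serialize to ISO 8601 with a 'T' between date and time.
iso = stamp.isoformat()
print('isoformat    :', iso)   # 2019-06-16T12:15:01.148124

# Parse the string back into a datetime object.
parsed = datetime.datetime.fromisoformat(iso)
print('fromisoformat:', parsed)
print('round trip ok:', parsed == stamp)
```

Unlike the `strftime`/`strptime` pair, this round trip keeps the microseconds without needing a format string.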


### Time Zones

In [18]:
import datetime

min6 = datetime.timezone(datetime.timedelta(hours=-6))
plus6 = datetime.timezone(datetime.timedelta(hours=6))
d = datetime.datetime.now(min6)

print(min6, ':', d)
print(datetime.timezone.utc, ':',
      d.astimezone(datetime.timezone.utc))
print(plus6, ':', d.astimezone(plus6))

# convert to the current system timezone
d_system = d.astimezone()
print(d_system.tzinfo, '      :', d_system)

UTC-06:00 : 2019-06-16 04:16:15.490099-06:00
UTC : 2019-06-16 10:16:15.490099+00:00
UTC+06:00 : 2019-06-16 16:16:15.490099+06:00
(local time zone name, mis-encoded in the original output) : 2019-06-16 12:16:15.490099+02:00


## Collections — Container Data Types

- ChainMap — Search Multiple Dictionaries
- Counter — Count Hashable Objects
- defaultdict — Missing Keys Return a Default Value
- deque — Double-Ended Queue
- namedtuple — Tuple Subclass with Named Fields
- OrderedDict — Remember the Order Keys are Added to a Dictionary
- collections.abc — Abstract Base Classes for Containers

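Python has several built-in data types, such as str, int, list, tuple, and dict.
- The collections module builds on these built-in types and provides several additional ones
- namedtuple(): creates tuple subclasses whose elements can be accessed by name
- deque: a double-ended queue with fast appends and pops at either end
- Counter: a counter, used mainly for tallying items
- OrderedDict: a dictionary that remembers insertion order
- defaultdict: a dictionary that supplies a default value for missing keys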
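
`Counter` and `defaultdict` appear in the list above. As a brief illustration, the sketch below counts letters and groups words by their first letter; the sample strings are made up for the example.

```python
import collections

# Counter: tally hashable items.
counts = collections.Counter('abracadabra')
print(counts.most_common(3))   # [('a', 5), ('b', 2), ('r', 2)]

# defaultdict: a missing key is created with a default value,
# here an empty list, so no key-existence check is needed.
by_initial = collections.defaultdict(list)
for word in ['apple', 'avocado', 'banana', 'blueberry', 'cherry']:
    by_initial[word[0]].append(word)
print(dict(by_initial))
```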


### NamedTuple

In [22]:
bob = ('Bob', 30, 'male')
print('Representation:', bob)

jane = ('Jane', 29, 'female')
print('\nField by index:', jane[0])

print('\nFields by index:')
for p in [bob, jane]:
    print('{} is a {} year old {}'.format(*p))
    
# *p unpacks the tuple into positional arguments

Representation: ('Bob', 30, 'male')

Field by index: Jane

Fields by index:
Bob is a 30 year old male
Jane is a 29 year old female


In [21]:
# defining
import collections

Person = collections.namedtuple('Person', 'name age')

bob = Person(name='Bob', age=30)
print('\nRepresentation:', bob)

jane = Person(name='Jane', age=29)
print('\nField by name:', jane.name)

print('\nFields by index:')
for p in [bob, jane]:
    print('{} is {} years old'.format(*p))


Representation: Person(name='Bob', age=30)

Field by name: Jane

Fields by index:
Bob is 30 years old
Jane is 29 years old


In [23]:
# change will trigger AttributeError

import collections

Person = collections.namedtuple('Person', 'name age')

pat = Person(name='Pat', age=12)
print('\nRepresentation:', pat)

pat.age = 21


Representation: Person(name='Pat', age=12)


AttributeError: can't set attribute

In [24]:
import collections

try:
    collections.namedtuple('Person', 'name class age')
except ValueError as err:
    print(err)

try:
    collections.namedtuple('Person', 'name age age')
except ValueError as err:
    print(err)

Type names and field names cannot be a keyword: 'class'
Encountered duplicate field name: 'age'


In [25]:
# Special attributes
import collections

Person = collections.namedtuple('Person', 'name age')

bob = Person(name='Bob', age=30)
print('Representation:', bob)
print('Fields:', bob._fields)

Representation: Person(name='Bob', age=30)
Fields: ('name', 'age')


In [26]:
import collections

Person = collections.namedtuple('Person', 'name age')

bob = Person(name='Bob', age=30)
print('Representation:', bob)
print('As Dictionary:', bob._asdict())

Representation: Person(name='Bob', age=30)
As Dictionary: OrderedDict([('name', 'Bob'), ('age', 30)])


### deque

In [27]:
import collections

d = collections.deque('abcdefg')
print('Deque:', d)
print('Length:', len(d))
print('Left end:', d[0])
print('Right end:', d[-1])

d.remove('c')
print('remove(c):', d)

Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
Length: 7
Left end: a
Right end: g
remove(c): deque(['a', 'b', 'd', 'e', 'f', 'g'])


In [28]:
# Populating: extend/append and their left-side variants
import collections

# Add to the right
d1 = collections.deque()
d1.extend('abcdefg')
print('extend    :', d1)
d1.append('h')
print('append    :', d1)

# Add to the left
d2 = collections.deque()
d2.extendleft(range(6))
print('extendleft:', d2)
d2.appendleft(6)
print('appendleft:', d2)

extend    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
append    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
extendleft: deque([5, 4, 3, 2, 1, 0])
appendleft: deque([6, 5, 4, 3, 2, 1, 0])


In [29]:
import collections

print('From the right:')
d = collections.deque('abcdefg')
while True:
    try:
        print(d.pop(), end='')
    except IndexError:
        break
print()

print('\nFrom the left:')
d = collections.deque(range(6))
while True:
    try:
        print(d.popleft(), end='')
    except IndexError:
        break
print()

From the right:
gfedcba

From the left:
012345


In [30]:
# rotate
import collections

d = collections.deque(range(10))
print('Normal        :', d)

d = collections.deque(range(10))
d.rotate(2)
print('Right rotation:', d)

d = collections.deque(range(10))
d.rotate(-2)
print('Left rotation :', d)

Normal        : deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Right rotation: deque([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
Left rotation : deque([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])


In [31]:
# limit queue size
import collections
import random

# Set the random seed so we see the same output each time
# the script is run.
random.seed(1)

d1 = collections.deque(maxlen=3)
d2 = collections.deque(maxlen=3)

for i in range(5):
    n = random.randint(0, 100)
    print('n =', n)
    d1.append(n)
    d2.appendleft(n)
    print('D1:', d1)
    print('D2:', d2)

n = 17
D1: deque([17], maxlen=3)
D2: deque([17], maxlen=3)
n = 72
D1: deque([17, 72], maxlen=3)
D2: deque([72, 17], maxlen=3)
n = 97
D1: deque([17, 72, 97], maxlen=3)
D2: deque([97, 72, 17], maxlen=3)
n = 8
D1: deque([72, 97, 8], maxlen=3)
D2: deque([8, 97, 72], maxlen=3)
n = 32
D1: deque([97, 8, 32], maxlen=3)
D2: deque([32, 8, 97], maxlen=3)


### OrderedDict 
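- Remember the Order Keys are Added to a Dictionary
- Since Python 3.7 a regular dict also preserves insertion order, which is why both loops below print the same sequence; OrderedDict still differs in its order-sensitive equality and in methods such as `move_to_end()`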

In [32]:
import collections

print('Regular dictionary:')
d = {}
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for k, v in d.items():
    print(k, v)

print('\nOrderedDict:')
d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for k, v in d.items():
    print(k, v)

Regular dictionary:
a A
b B
c C

OrderedDict:
a A
b B
c C


In [33]:
# Equality
import collections

print('dict       :', end=' ')
d1 = {}
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = {}
d2['c'] = 'C'
d2['b'] = 'B'
d2['a'] = 'A'

print(d1 == d2)

print('OrderedDict:', end=' ')

d1 = collections.OrderedDict()
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = collections.OrderedDict()
d2['c'] = 'C'
d2['b'] = 'B'
d2['a'] = 'A'

print(d1 == d2)

dict       : True
OrderedDict: False


In [None]:
# reorder
import collections

d = collections.OrderedDict(
    [('a', 'A'), ('b', 'B'), ('c', 'C')]
)

print('Before:')
for k, v in d.items():
    print(k, v)

d.move_to_end('b')

print('\nmove_to_end():')
for k, v in d.items():
    print(k, v)

d.move_to_end('b', last=False)

print('\nmove_to_end(last=False):')
for k, v in d.items():
    print(k, v)

### random 
- Pseudorandom Number Generators
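- The random module generates random numbers; it can produce random numbers or pick random items, such as strings, from a sequence.

Available distributions include the normal, Pareto, Gaussian, beta, gamma, triangular, and Weibull distributions, among other functions.

Random numbers on a computer are generally pseudorandom: starting from an initial value (the seed, ideally a truly random number), an algorithm is iterated repeatedly to produce the sequence.

Usage:
- random.random()
- random.randint(a, b)
- random.uniform(a, b)
- random.choice(seq)
- random.sample(sequence, k)
- random.seed(\[x\])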
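
A compact sketch of the functions listed above. It uses a private `random.Random` instance with a fixed seed so that the run is repeatable without touching the module-level generator; the seed value and the sample list are arbitrary choices.

```python
import random

rng = random.Random(42)  # independent generator with a fixed seed

print('random()     :', rng.random())           # float in [0.0, 1.0)
print('randint(1, 6):', rng.randint(1, 6))      # integer, both ends included
print('uniform(1, 2):', rng.uniform(1, 2))      # float between 1 and 2
colors = ['red', 'green', 'blue', 'yellow']
print('choice       :', rng.choice(colors))     # one element
print('sample(k=2)  :', rng.sample(colors, 2))  # two distinct elements

# Seeding two generators with the same value replays the same sequence.
a = random.Random(42).random()
b = random.Random(42).random()
print('same seed, same value:', a == b)
```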


In [38]:
import random

for i in range(5):
    print('%04.3f' % random.random(), end=' ')
print()

0.836 0.433 0.762 0.002 0.445 


In [35]:
import random

for i in range(5):
    print('{:04.3f}'.format(random.uniform(1, 100)), end=' ')
print()

49.298 89.438 39.591 61.136 76.949 


In [36]:
# seeding
import random

random.seed(1)

for i in range(5):
    print('{:04.3f}'.format(random.random()), end=' ')
print()

0.134 0.847 0.764 0.255 0.495 


In [41]:
# state
import random
import os
import pickle

if os.path.exists('state.dat'):
    # Restore the previously saved state
    print('Found state.dat, initializing random module')
    with open('state.dat', 'rb') as f:
        state = pickle.load(f)
    random.setstate(state)
else:
    # Use a well-known start state
    print('No state.dat, seeding')
    random.seed(1)

# Produce random values
for i in range(3):
    print('{:04.3f}'.format(random.random()), end=' ')
print()

# Save state for next time
with open('state.dat', 'wb') as f:
    pickle.dump(random.getstate(), f)

# Produce more random values
print('\nAfter saving state:')
for i in range(3):
    print('{:04.3f}'.format(random.random()), end=' ')
print()

Found state.dat, initializing random module
0.652 0.789 0.094 

After saving state:
0.028 0.836 0.433 


In [42]:
# Random Integers randint

import random

print('[1, 100]:', end=' ')

for i in range(3):
    print(random.randint(1, 100), end=' ')

print('\n[-5, 5]:', end=' ')
for i in range(3):
    print(random.randint(-5, 5), end=' ')
print()

[1, 100]: 98 99 1 
[-5, 5]: 2 -1 -2 


In [43]:
# randrange
import random

for i in range(3):
    print(random.randrange(0, 101, 5), end=' ')
print()

90 15 50 


In [44]:
# Picking Random Items
import random
import itertools

outcomes = {
    'heads': 0,
    'tails': 0,
}
sides = list(outcomes.keys())

for i in range(10000):
    outcomes[random.choice(sides)] += 1

print('Heads:', outcomes['heads'])
print('Tails:', outcomes['tails'])

Heads: 5059
Tails: 4941


In [None]:
# Shuffling (permutations)
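
The cell above has no code in the original notebook. As one possible illustration of shuffling, `random.shuffle()` reorders a list in place, while `random.sample(seq, len(seq))` returns a shuffled copy and leaves the original untouched. The ten-card deck is a made-up example.

```python
import random

random.seed(1)  # fixed seed so repeated runs shuffle the same way

deck = list(range(10))
random.shuffle(deck)   # shuffles in place and returns None
print('shuffled in place:', deck)

original = list(range(10))
shuffled_copy = random.sample(original, len(original))
print('shuffled copy    :', shuffled_copy)
print('original intact  :', original)
```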

In [45]:
# Sampling. If this file does not exist on your system, create any file that contains several words.
import random

with open('/usr/share/dict/words', 'rt') as f:
    words = f.readlines()
words = [w.rstrip() for w in words]

for w in random.sample(words, 5):
    print(w)

FileNotFoundError: [Errno 2] No such file or directory: '/usr/share/dict/words'

In [None]:
# System randomness via random.SystemRandom
import random
import time

print('Default initialization:\n')

r1 = random.SystemRandom()
r2 = random.SystemRandom()

for i in range(3):
    print('{:04.3f}  {:04.3f}'.format(r1.random(), r2.random()))

print('\nSame seed:\n')

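# SystemRandom draws from the operating system and ignores any seed,
# so r1 and r2 still produce different values here.
seed = time.time()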
r1 = random.SystemRandom(seed)
r2 = random.SystemRandom(seed)

for i in range(3):
    print('{:04.3f}  {:04.3f}'.format(r1.random(), r2.random()))

## Glob

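- Use glob when you need to find a set of file names that follow a particular pattern.
- glob is one of the simplest modules. It finds path names that match a pattern, using only three wildcards:
 - \* ? \[\]
 - \* matches zero or more characters
 - ? matches exactly one character
 - \[\] matches one character from the given range, for example [0-9] matches a digit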
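
The three wildcards can be tried without depending on the `Lesson6` directory. The sketch below creates a few empty files in a temporary directory and globs them; the helper `match_names` and the file names are hypothetical, chosen only for the demonstration.

```python
import glob
import os
import tempfile


def match_names(pattern, names):
    """Create empty files called `names` in a temp dir and glob them."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            open(os.path.join(tmp, name), 'w').close()
        # glob.escape protects any special characters in the temp path.
        full_pattern = os.path.join(glob.escape(tmp), pattern)
        return sorted(os.path.basename(p) for p in glob.glob(full_pattern))


names = ['file1.txt', 'file2.txt', 'file11.txt', 'fileA.txt', 'notes.md']
print(match_names('*.txt', names))          # every .txt file
print(match_names('file?.txt', names))      # exactly one character after 'file'
print(match_names('file[0-9].txt', names))  # exactly one digit after 'file'
```

Note that `file11.txt` matches only the first pattern, because `?` and `[0-9]` each stand for a single character.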

In [46]:
# Results are not sorted by default (the order is arbitrary)
import glob
for name in glob.glob('Lesson6/*'):
    print(name)

Lesson6\arg1.py
Lesson6\doctest_blankline_fail.py
Lesson6\doctest_simple.py
Lesson6\doctest_simple_with_docs.py
Lesson6\file1.txt
Lesson6\file1.txt.copy
Lesson6\file11.txt
Lesson6\file2.txt
Lesson6\file3.txt
Lesson6\file4.txt
Lesson6\file5.txt
Lesson6\file6.txt
Lesson6\fileA.txt
Lesson6\logging_level.py
Lesson6\logging_module.py
Lesson6\subdir
Lesson6\unittest_expectedfailure.py
Lesson6\unittest_fixtures.py
Lesson6\unittest_outcomes.py
Lesson6\unittest_simple.py
Lesson6\unittest_skip.py


In [47]:
import glob
for name in sorted(glob.glob('Lesson6/*')):
    print(name)

Lesson6\arg1.py
Lesson6\doctest_blankline_fail.py
Lesson6\doctest_simple.py
Lesson6\doctest_simple_with_docs.py
Lesson6\file1.txt
Lesson6\file1.txt.copy
Lesson6\file11.txt
Lesson6\file2.txt
Lesson6\file3.txt
Lesson6\file4.txt
Lesson6\file5.txt
Lesson6\file6.txt
Lesson6\fileA.txt
Lesson6\logging_level.py
Lesson6\logging_module.py
Lesson6\subdir
Lesson6\unittest_expectedfailure.py
Lesson6\unittest_fixtures.py
Lesson6\unittest_outcomes.py
Lesson6\unittest_simple.py
Lesson6\unittest_skip.py


In [48]:
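# Subdirectories: glob does not recurse by default, so the path must be
# written out (or use '**' together with recursive=True)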
import glob

print('Named explicitly:')
for name in sorted(glob.glob('Lesson6/subdir/*')):
    print('  {}'.format(name))

print('Named with wildcard:')
for name in sorted(glob.glob('Lesson6/*/*')):
    print('  {}'.format(name))

Named explicitly:
  Lesson6/subdir\subfile1.txt
  Lesson6/subdir\subfile2.txt
Named with wildcard:
  Lesson6\subdir\subfile1.txt
  Lesson6\subdir\subfile2.txt


In [49]:
# Single-character wildcard
import glob

for name in sorted(glob.glob('Lesson6/file?.txt')):
    print(name)

Lesson6\file1.txt
Lesson6\file2.txt
Lesson6\file3.txt
Lesson6\file4.txt
Lesson6\file5.txt
Lesson6\file6.txt
Lesson6\fileA.txt


In [50]:
# Character ranges
import glob
for name in sorted(glob.glob('Lesson6/file[0-9].*')):
    print(name)

Lesson6\file1.txt
Lesson6\file1.txt.copy
Lesson6\file2.txt
Lesson6\file3.txt
Lesson6\file4.txt
Lesson6\file5.txt
Lesson6\file6.txt


## shutil
- Moving and copying files
- Copying, deleting, and moving directories
- Creating archives
- Extracting archives
- Finding files
- File system space
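
The operations listed above can be tried safely inside a temporary directory. This is a minimal sketch rather than a recommended workflow; all directory and file names (`src`, `a.txt`, `moved`, and so on) are made up for the example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, 'src')
    os.mkdir(src_dir)
    src_file = os.path.join(src_dir, 'a.txt')
    with open(src_file, 'w') as f:
        f.write('hello')

    # Copy one file, then the whole directory tree.
    shutil.copyfile(src_file, os.path.join(src_dir, 'a.txt.copy'))
    shutil.copytree(src_dir, os.path.join(tmp, 'src_copy'))

    # Move (rename) the copied tree.
    shutil.move(os.path.join(tmp, 'src_copy'), os.path.join(tmp, 'moved'))

    # Pack 'src' into a zip archive and unpack it elsewhere.
    archive = shutil.make_archive(
        os.path.join(tmp, 'example'), 'zip', root_dir=tmp, base_dir='src')
    shutil.unpack_archive(archive, extract_dir=os.path.join(tmp, 'unpacked'))

    # Locate an executable on PATH and check free disk space.
    print('python found at:', shutil.which('python'))
    print('free bytes     :', shutil.disk_usage(tmp).free)

    moved_files = sorted(os.listdir(os.path.join(tmp, 'moved')))
    unpacked_files = sorted(
        os.listdir(os.path.join(tmp, 'unpacked', 'src')))
    print('moved   :', moved_files)
    print('unpacked:', unpacked_files)
```

Everything created here is removed when the `with` block exits, so the sketch leaves no files behind.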

In [51]:
# Copying files
import glob
import shutil

print('BEFORE:', glob.glob('Lesson6/file*'))

BEFORE: ['Lesson6\\file1.txt', 'Lesson6\\file1.txt.copy', 'Lesson6\\file11.txt', 'Lesson6\\file2.txt', 'Lesson6\\file3.txt', 'Lesson6\\file4.txt', 'Lesson6\\file5.txt', 'Lesson6\\file6.txt', 'Lesson6\\fileA.txt']


In [52]:
shutil.copyfile('Lesson6/file1.txt', 'Lesson6/file1.txt.copy')

'Lesson6/file1.txt.copy'

In [53]:
print('AFTER:', glob.glob('Lesson6/file*'))

AFTER: ['Lesson6\\file1.txt', 'Lesson6\\file1.txt.copy', 'Lesson6\\file11.txt', 'Lesson6\\file2.txt', 'Lesson6\\file3.txt', 'Lesson6\\file4.txt', 'Lesson6\\file5.txt', 'Lesson6\\file6.txt', 'Lesson6\\fileA.txt']


### Working with Directories

In [54]:
# Directory operations
import glob
import pprint
import shutil

print('BEFORE:')
pprint.pprint(glob.glob('Lesson6/*'))

BEFORE:
['Lesson6\\arg1.py',
 'Lesson6\\doctest_blankline_fail.py',
 'Lesson6\\doctest_simple.py',
 'Lesson6\\doctest_simple_with_docs.py',
 'Lesson6\\file1.txt',
 'Lesson6\\file1.txt.copy',
 'Lesson6\\file11.txt',
 'Lesson6\\file2.txt',
 'Lesson6\\file3.txt',
 'Lesson6\\file4.txt',
 'Lesson6\\file5.txt',
 'Lesson6\\file6.txt',
 'Lesson6\\fileA.txt',
 'Lesson6\\logging_level.py',
 'Lesson6\\logging_module.py',
 'Lesson6\\subdir',
 'Lesson6\\unittest_expectedfailure.py',
 'Lesson6\\unittest_fixtures.py',
 'Lesson6\\unittest_outcomes.py',
 'Lesson6\\unittest_simple.py',
 'Lesson6\\unittest_skip.py']


In [55]:
shutil.copytree('Lesson6/subdir', 'Lesson6/subdir_copy')

'Lesson6/subdir_copy'

In [56]:
print('\nAFTER:')
pprint.pprint(glob.glob('Lesson6/*/*'))


AFTER:
['Lesson6\\subdir\\subfile1.txt',
 'Lesson6\\subdir\\subfile2.txt',
 'Lesson6\\subdir_copy\\subfile1.txt',
 'Lesson6\\subdir_copy\\subfile2.txt']


In [57]:
# rm tree
shutil.rmtree('Lesson6/subdir_copy')

In [58]:
print('\nAFTER:')
pprint.pprint(glob.glob('Lesson6/*/*'))


AFTER:
['Lesson6\\subdir\\subfile1.txt', 'Lesson6\\subdir\\subfile2.txt']


### Move

In [59]:
# mv
shutil.copytree('Lesson6/subdir', 'Lesson6/subdir_mv_test')

'Lesson6/subdir_mv_test'

In [60]:
shutil.move('Lesson6/subdir_mv_test', 'Lesson6/subdir_mv_out')

'Lesson6/subdir_mv_out'

In [61]:
print('\nAFTER:')
pprint.pprint(glob.glob('Lesson6/*/*'))


AFTER:
['Lesson6\\subdir\\subfile1.txt',
 'Lesson6\\subdir\\subfile2.txt',
 'Lesson6\\subdir_mv_out\\subfile1.txt',
 'Lesson6\\subdir_mv_out\\subfile2.txt']


In [62]:
shutil.rmtree('Lesson6/subdir_mv_out')

### Archiving Directories

In [None]:
# pack
import shutil

for format, description in shutil.get_archive_formats():
    print('{:<5}: {}'.format(format, description))

In [None]:
shutil.make_archive(
    'example', 'zip',
    root_dir='Lesson6/',
    base_dir='subdir'
)

In [None]:
# unpack
import shutil

for format, exts, description in shutil.get_unpack_formats():
    print('{:<5}: {}, names ending in {}'.format(
        format, description, exts))

In [None]:
# Adjust the path to the zip file as needed
shutil.unpack_archive(
        'example.zip',
        extract_dir='tmp',
    )

### File System Space

In [63]:
import shutil

total_b, used_b, free_b = shutil.disk_usage('.')

gib = 2 ** 30  # GiB == gibibyte
gb = 10 ** 9   # GB == gigabyte

print('Total: {:6.2f} GB  {:6.2f} GiB'.format(
    total_b / gb, total_b / gib))
print('Used : {:6.2f} GB  {:6.2f} GiB'.format(
    used_b / gb, used_b / gib))
print('Free : {:6.2f} GB  {:6.2f} GiB'.format(
    free_b / gb, free_b / gib))

Total: 208.55 GB  194.22 GiB
Used : 155.86 GB  145.16 GiB
Free :  52.68 GB   49.06 GiB


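## Hashing with hashlib

Python's hashlib provides common message digest algorithms such as MD5 and SHA1. A digest function converts data of any length into a fixed-length string, usually written in hexadecimal. A digest is a one-way summary rather than encryption: the original data cannot be recovered from it.

Hash algorithms:
- md5
- sha1
- sha224
- sha256
- sha384
- sha512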
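
To illustrate the fixed-length property, the sketch below hashes inputs of very different sizes and prints the length of each hexadecimal digest. The inputs are arbitrary; only the digest of the empty byte string is a well-known reference value.

```python
import hashlib

for data in [b'', b'abc', b'x' * 10000]:
    for name in ['md5', 'sha1', 'sha256']:
        digest = hashlib.new(name, data).hexdigest()
        print('{:>6} of {:>5} bytes -> {} hex chars'.format(
            name, len(data), len(digest)))

# Well-known MD5 digest of the empty byte string.
empty_md5 = hashlib.md5(b'').hexdigest()
print(empty_md5)   # d41d8cd98f00b204e9800998ecf8427e
```

Whatever the input size, md5 yields 32 hex characters, sha1 yields 40, and sha256 yields 64.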

![](./res/Lesson6_Hashlib.jpg)

In [64]:
import hashlib


print('Guaranteed:\n{}\n'.format(
    ', '.join(sorted(hashlib.algorithms_guaranteed))))
print('Available:\n{}'.format(
    ', '.join(sorted(hashlib.algorithms_available))))

Guaranteed:
blake2b, blake2s, md5, sha1, sha224, sha256, sha384, sha3_224, sha3_256, sha3_384, sha3_512, sha512, shake_128, shake_256

Available:
blake2b, blake2b512, blake2s, blake2s256, md4, md5, md5-sha1, mdc2, ripemd160, sha1, sha224, sha256, sha3-224, sha3-256, sha3-384, sha3-512, sha384, sha3_224, sha3_256, sha3_384, sha3_512, sha512, sha512-224, sha512-256, shake128, shake256, shake_128, shake_256, sm3, whirlpool


In [65]:
import hashlib

lorem = '''China should offer more help to Africa as it develops and 
effectively synergize its own progress with its efforts in facilitating 
Africa's development, President Xi Jinping told visiting Namibian President 
Hage Geingob on Thursday.
The goal is to further achieve mutual benefits and common development, 
particularly to boost the African nations' own capabilities to develop 
on an independent and sustained basis, Xi said in their talks on Thursday in Beijing.'''

### MD5

In [66]:
import hashlib

h = hashlib.md5()
h.update(lorem.encode('utf-8'))
print(h.hexdigest())

0f2c16a0419b82b9c3ab31d8415012d6


### SHA1

In [67]:
import hashlib

h = hashlib.sha1()
h.update(lorem.encode('utf-8'))
print(h.hexdigest())

c1cd2648812e7080e7aa02027ee1aaf91e05e0c7


### create by string name

In [68]:
h = hashlib.new('md5')
h.update(lorem.encode('utf-8'))
print(h.hexdigest())

0f2c16a0419b82b9c3ab31d8415012d6


In [69]:
# update

import hashlib

h = hashlib.md5()
h.update(lorem.encode('utf-8'))
all_at_once = h.hexdigest()


def chunkize(size, text):
    "Return parts of the text in size-based increments."
    start = 0
    while start < len(text):
        chunk = text[start:start + size]
        yield chunk
        start += size
    return


h = hashlib.md5()
for chunk in chunkize(64, lorem.encode('utf-8')):
    h.update(chunk)
line_by_line = h.hexdigest()

print('All at once :', all_at_once)
print('Line by line:', line_by_line)
print('Same        :', (all_at_once == line_by_line))

All at once : 0f2c16a0419b82b9c3ab31d8415012d6
Line by line: 0f2c16a0419b82b9c3ab31d8415012d6
Same        : True


## SYS

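- The sys module contains a collection of services for inspecting and changing the interpreter's configuration at run time, along with resources for interacting with the operating environment outside the current program
- Interpreter settings
 - Build-time version information
- Runtime environment
 - Command-line arguments
 - Input and output streams
 - Exit status
- Memory management
 - Reference counts
- Tracing a program
 - Call stack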
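
The runtime-environment items above (arguments, streams, exit status) often appear together in a script's entry point. The following is a minimal sketch of that pattern; the function `main` and the script name `demo.py` are hypothetical.

```python
import sys


def main(argv):
    """Echo the arguments; return a process exit status."""
    if len(argv) < 2:
        # Diagnostics go to stderr so they do not mix with normal output.
        print('usage: {} WORD...'.format(argv[0]), file=sys.stderr)
        return 2
    sys.stdout.write(' '.join(argv[1:]) + '\n')
    return 0


status = main(['demo.py', 'hello', 'world'])
print('exit status:', status)
# In a real script one would typically end with: sys.exit(main(sys.argv))
```

Passing `argv` as a parameter instead of reading `sys.argv` directly keeps the function easy to call from a notebook or a test.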

In [None]:
import sys

print('Version info:')
print()
print('sys.version      =', repr(sys.version))
print('sys.version_info =', sys.version_info)
print('sys.hexversion   =', hex(sys.hexversion))
print('sys.api_version  =', sys.api_version)

In [None]:
import sys

print('This interpreter was built for:', sys.platform)

In [None]:
import sys


print('Name:', sys.implementation.name)
print('Version:', sys.implementation.version)
print('Cache tag:', sys.implementation.cache_tag)

In [None]:
import sys

if sys.flags.bytes_warning:
    print('Warning on bytes/str errors')
if sys.flags.debug:
    print('Debugging')
if sys.flags.inspect:
    print('Will enter interactive mode after running')
if sys.flags.optimize:
    print('Optimizing byte-code')
if sys.flags.dont_write_bytecode:
    print('Not writing byte-code files')
if sys.flags.no_site:
    print('Not importing "site"')
if sys.flags.ignore_environment:
    print('Ignoring environment')
if sys.flags.verbose:
    print('Verbose mode')

### Runtime Environment

In [70]:
import sys

print('Arguments:', sys.argv)

Arguments: ['C:\\ProgramData\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py', '-f', 'C:\\Users\\likun\\AppData\\Roaming\\jupyter\\runtime\\kernel-891c9016-d2a1-4cf3-9e2d-951b56a36755.json']


In [None]:
import sys

print('STATUS: Reading from stdin', file=sys.stderr)

data = sys.stdin.read()

print('STATUS: Writing data to stdout', file=sys.stderr)

sys.stdout.write(data)
sys.stdout.flush()

print('STATUS: Done', file=sys.stderr)

In [None]:
import sys

exit_code = 0
sys.exit(exit_code)

### Memory Management

In [72]:
import sys

one = []
print('At start         :', sys.getrefcount(one))

two = one

print('Second reference :', sys.getrefcount(one))

del two

print('After del        :', sys.getrefcount(one))

At start         : 2
Second reference : 3
After del        : 2


In [71]:
# Size in bytes of objects of different types
import sys

objects = [
    [], (), {}, 'c', 'string', b'bytes', 1, 2.3
]

for obj in objects:
    print('{:>10} : {}'.format(type(obj).__name__,
                               sys.getsizeof(obj)))

      list : 64
     tuple : 48
      dict : 240
       str : 50
       str : 55
     bytes : 38
       int : 28
     float : 24


### Call Stack Tracing

In [None]:
import sys


def trace_calls(frame, event, arg):
    if event != 'call':
        return
    co = frame.f_code
    func_name = co.co_name
    if func_name == 'write':
        # Ignore write() calls from printing
        return
    func_line_no = frame.f_lineno
    func_filename = co.co_filename
    if not func_filename.endswith('sys_settrace_call.py'):
        # Ignore calls not in this module
        return
    caller = frame.f_back
    caller_line_no = caller.f_lineno
    caller_filename = caller.f_code.co_filename
    print('* Call to', func_name)
    print('*  on line {} of {}'.format(
        func_line_no, func_filename))
    print('*  from line {} of {}'.format(
        caller_line_no, caller_filename))
    return


def b():
    print('inside b()\n')


def a():
    print('inside a()\n')
    b()


sys.settrace(trace_calls)
a()

In [None]:
import functools
import sys


def trace_lines(frame, event, arg):
    if event != 'line':
        return
    co = frame.f_code
    func_name = co.co_name
    line_no = frame.f_lineno
    print('*  {} line {}'.format(func_name, line_no))


def trace_calls(frame, event, arg, to_be_traced):
    if event != 'call':
        return
    co = frame.f_code
    func_name = co.co_name
    if func_name == 'write':
        # Ignore write() calls from printing
        return
    line_no = frame.f_lineno
    filename = co.co_filename
    if not filename.endswith('sys_settrace_line.py'):
        # Ignore calls not in this module
        return
    print('* Call to {} on line {} of {}'.format(
        func_name, line_no, filename))
    if func_name in to_be_traced:
        # Trace into this function
        return trace_lines
    return


def c(input):
    print('input =', input)
    print('Leaving c()')


def b(arg):
    val = arg * 5
    c(val)
    print('Leaving b()')


def a():
    b(2)
    print('Leaving a()')


tracer = functools.partial(trace_calls, to_be_traced=['b'])
sys.settrace(tracer)
a()

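## The os Module

The os module is a wrapper around platform-specific modules such as posix, nt, and mac. The API for functions that are available on every platform should be the same, so the os module offers a degree of portability. Not every function is available on every platform, however; some of the management functions are unavailable on Windows.
- Examining the file system
- Managing file system permissions
- Creating and deleting directories
- Deleting and replacing files
- Managing the process environment
- Running external commands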

In [73]:
import os
import sys

# List the contents of the current directory
print(sorted(os.listdir('.')))

['.ipynb_checkpoints', 'Lesson1.ipynb', 'Lesson6', 'Lesson6.ipynb', 'data', 'lesson2.ipynb', 'lesson3.ipynb', 'lesson4.ipynb', 'lesson5.ipynb', 'res', 'state.dat', 'test.ipynb']


In [74]:
# Walk the whole tree, listing files and subdirectories
import os
import sys

root = '.'

for dir_name, sub_dirs, files in os.walk(root):
    print(dir_name)
    # Make the subdirectory names stand out with /
    sub_dirs = [n + '/' for n in sub_dirs]
    # Mix the directory contents together
    contents = sub_dirs + files
    contents.sort()
    # Show the contents
    for c in contents:
        print('  {}'.format(c))
    print()

.
  .ipynb_checkpoints/
  Lesson1.ipynb
  Lesson6.ipynb
  Lesson6/
  data/
  lesson2.ipynb
  lesson3.ipynb
  lesson4.ipynb
  lesson5.ipynb
  res/
  state.dat
  test.ipynb

.\.ipynb_checkpoints
  Lesson1-checkpoint.ipynb
  Lesson6-checkpoint.ipynb
  lesson2-checkpoint.ipynb
  lesson3-checkpoint.ipynb
  lesson4-checkpoint.ipynb
  lesson5-checkpoint.ipynb
  test-checkpoint.ipynb

.\data
  2.txt
  a1.txt
  example.jpeg
  example.txt
  example_code.py
  example_out.txt
  json.txt
  pickle.txt
  pickle_dict_func.txt
  seralize.txt
  test.txt

.\Lesson6
  arg1.py
  doctest_blankline_fail.py
  doctest_simple.py
  doctest_simple_with_docs.py
  file1.txt
  file1.txt.copy
  file11.txt
  file2.txt
  file3.txt
  file4.txt
  file5.txt
  file6.txt
  fileA.txt
  logging_level.py
  logging_module.py
  subdir/
  unittest_expectedfailure.py
  unittest_fixtures.py
  unittest_outcomes.py
  unittest_simple.py
  unittest_skip.py

.\Lesson6\subdir
  subfile1.txt
  subfile2.txt

.\res
  Functional_programming.

In [75]:
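# Tell files, directories and links apart
import os
import sys

for entry in os.scandir('.'):
    # Check for symlinks first: is_dir() and is_file() follow links by default
    if entry.is_symlink():
        typ = 'link'
    elif entry.is_dir():
        typ = 'dir'
    elif entry.is_file():
        typ = 'file'
    else:
        typ = 'unknown'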
    print('{name} {typ}'.format(
        name=entry.name,
        typ=typ,
    ))

.ipynb_checkpoints dir
data dir
Lesson1.ipynb file
lesson2.ipynb file
lesson3.ipynb file
lesson4.ipynb file
lesson5.ipynb file
Lesson6 dir
Lesson6.ipynb file
res dir
state.dat file
test.ipynb file


In [76]:
# File system permissions
import os
import sys
import time

filename = 'Lesson6'

stat_info = os.stat(filename)

print('os.stat({}):'.format(filename))
print('  Size:', stat_info.st_size)
print('  Permissions:', oct(stat_info.st_mode))
print('  Owner:', stat_info.st_uid)
print('  Device:', stat_info.st_dev)
print('  Created      :', time.ctime(stat_info.st_ctime))
print('  Last modified:', time.ctime(stat_info.st_mtime))
print('  Last accessed:', time.ctime(stat_info.st_atime))


os.stat(Lesson6):
  Size: 4096
  Permissions: 0o40777
  Owner: 0
  Device: 3524748848
  Created      : Sun Jun 16 10:53:38 2019
  Last modified: Sun Jun 16 13:05:07 2019
  Last accessed: Sun Jun 16 13:05:08 2019
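
The mode value `0o40777` printed above combines file-type bits with permission bits. A short sketch of how the standard `stat` module can separate the two; the temporary directory is only there so the example has a real path to inspect.

```python
import os
import stat
import tempfile

# Decode the mode value from the output above.
mode = 0o40777
is_directory = stat.S_ISDIR(mode)        # file-type bits
permission_bits = stat.S_IMODE(mode)     # permission bits only
mode_string = stat.filemode(mode)        # ls-style string

print('Directory  :', is_directory)
print('Permissions:', oct(permission_bits))
print('As string  :', mode_string)

# The same helpers applied to a real, temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    info = os.stat(tmp_dir)
    tmp_is_directory = stat.S_ISDIR(info.st_mode)
    print('Temp dir   :', stat.filemode(info.st_mode))
```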


In [None]:
# Directory operations
import os

dir_name = 'os_directories_example'

print('Creating', dir_name)
os.makedirs(dir_name)

file_name = os.path.join(dir_name, 'example.txt')
print('Creating', file_name)
with open(file_name, 'wt') as f:
    f.write('example file')

print('Cleaning up')
os.unlink(file_name)
os.rmdir(dir_name)
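
Deleting and replacing files is listed among the topics at the start of this section. A minimal sketch of both operations, run inside a temporary directory; the file names `settings.txt` and `settings.txt.new` are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    old_path = os.path.join(tmp_dir, 'settings.txt')
    new_path = os.path.join(tmp_dir, 'settings.txt.new')

    with open(old_path, 'wt') as f:
        f.write('old contents')
    with open(new_path, 'wt') as f:
        f.write('new contents')

    # os.replace() renames new_path to old_path, overwriting the destination.
    os.replace(new_path, old_path)
    with open(old_path, 'rt') as f:
        replaced = f.read()
    new_path_exists = os.path.exists(new_path)
    print('After replace:', replaced)
    print('Source path still exists:', new_path_exists)

    # os.remove() deletes a single file; os.unlink() is equivalent.
    os.remove(old_path)
    remaining = os.listdir(tmp_dir)
    print('Remaining entries:', remaining)
```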

In [77]:
# Managing the process environment

import os

print('Initial value:', os.environ.get('TESTVAR', None))
print('Child process:')
os.system('echo $TESTVAR')

os.environ['TESTVAR'] = 'THIS VALUE WAS CHANGED'

print()
print('Changed value:', os.environ['TESTVAR'])
print('Child process:')
os.system('echo $TESTVAR')

del os.environ['TESTVAR']

print()
print('Removed value:', os.environ.get('TESTVAR', None))
print('Child process:')
os.system('echo $TESTVAR')

Initial value: None
Child process:

Changed value: THIS VALUE WAS CHANGED
Child process:

Removed value: None
Child process:


0
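
In the recorded run above, nothing appears after each `Child process:` line, so the effect on the child is not visible. A sketch that makes it visible, assuming Python 3.7 or later: the child is a second Python interpreter started with `subprocess.run`, and its output is captured rather than written to the console. The helper name `child_value` is illustrative.

```python
import os
import subprocess
import sys

# Child command: a second Python interpreter that prints the variable it sees.
child_cmd = [
    sys.executable, '-c',
    "import os; print(os.environ.get('TESTVAR', 'not set'))",
]


def child_value():
    # capture_output=True collects the child's stdout for the parent to read.
    result = subprocess.run(
        child_cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


os.environ.pop('TESTVAR', None)
before = child_value()

os.environ['TESTVAR'] = 'THIS VALUE WAS CHANGED'
during = child_value()

del os.environ['TESTVAR']
after = child_value()

print('Before :', before)
print('Changed:', during)
print('Removed:', after)
```

Because the child inherits a copy of `os.environ` at launch, each change in the parent is seen only by children started afterwards.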

In [78]:
import os

print('Starting:', os.getcwd())

print('Moving up one:', os.pardir)
os.chdir(os.pardir)

print('After move:', os.getcwd())

Starting: C:\Users\likun\OneDrive\桌面\Python\code
Moving up one: ..
After move: C:\Users\likun\OneDrive\桌面\Python


In [None]:
# Running external commands

import os

# Simple command
os.system('pwd')

In [None]:
import os

# Command with shell expansion
os.system('echo $TMPDIR')

In [None]:
import os
import time

print('Calling...')
os.system('date; (sleep 3; date) &')

print('Sleeping...')
time.sleep(5)
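
The commands passed to `os.system` above (`pwd`, `echo $TMPDIR`, `date; (sleep 3; date) &`) rely on a POSIX shell. As a portable sketch of the background-process case, assuming Python 3.7 or later, `subprocess.Popen` can start a child without waiting for it. The child here is simply another Python interpreter.

```python
import subprocess
import sys

# Child command: sleep briefly, then print a message.
cmd = [
    sys.executable, '-c',
    "import time; time.sleep(0.5); print('child done')",
]

print('Calling...')
# Popen returns immediately; the child keeps running in the background.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)

print('Doing other work while the child runs...')

# communicate() waits for the child to exit and returns its output.
output, _ = proc.communicate()
print('Child said:', output.strip())
print('Exit status:', proc.returncode)
```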

## argparse 
Command-Line Option and Argument Parsing

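argparse is a tool for building command-line argument and option processors.

- Usage:
 - Create a command-line argument parser
 - Define the arguments
 - Parse a command line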

In [None]:
# Create the parser
import argparse
parser = argparse.ArgumentParser(
    description='This is a sample program',
)

In [79]:
import argparse

parser = argparse.ArgumentParser(description='Short sample app')

parser.add_argument('-a', action="store_true", default=False)
parser.add_argument('-b', action="store", dest="b")
parser.add_argument('-c', action="store", dest="c", type=int)

print(parser.parse_args(['-a', '-bval', '-c', '3']))

Namespace(a=True, b='val', c=3)


In [None]:
import argparse

parser = argparse.ArgumentParser(
    description='Example with long option names',
)

parser.add_argument('--noarg', action="store_true",
                    default=False)
parser.add_argument('--witharg', action="store",
                    dest="witharg")
parser.add_argument('--witharg2', action="store",
                    dest="witharg2", type=int)

print(
    parser.parse_args(
        ['--noarg', '--witharg', 'val', '--witharg2=3']
    )
)

In [None]:
%%bash
# restart kernel
python3 Lesson6/arg1.py -s simple
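
The source of `Lesson6/arg1.py` is not reproduced in this notebook. As a rough sketch of how a script accepting `-s` could be organised around the three steps listed earlier, the example below uses hypothetical option and function names (`--simple-value`, `--verbose`, `build_parser`, `main`) that are not taken from that file.

```python
import argparse


def build_parser():
    # Step 1: create the parser.
    parser = argparse.ArgumentParser(
        description='Hypothetical stand-in for a script like arg1.py',
    )
    # Step 2: define the arguments.
    parser.add_argument('-s', '--simple-value', dest='simple_value',
                        help='store a simple string value')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='enable verbose output')
    return parser


def main(argv=None):
    # Step 3: parse. With argv=None argparse reads sys.argv[1:].
    args = build_parser().parse_args(argv)
    print('simple_value =', args.simple_value)
    print('verbose      =', args.verbose)
    return args


# Pass an explicit list here; inside a notebook sys.argv holds the
# kernel's own arguments rather than the user's.
args = main(['-s', 'simple'])
```

Saved as a script, the last line would typically become `main()` under an `if __name__ == '__main__':` guard so that the real command line is parsed.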

## Logging

In [None]:
# to file
import logging

LOG_FILENAME = 'logging_example.out'
logging.basicConfig(
    filename=LOG_FILENAME,
    level=logging.DEBUG,
)

logging.debug('This message should go to the log file')

with open(LOG_FILENAME, 'rt') as f:
    body = f.read()

print('FILE:')
print(body)

In [None]:
# rotate
import glob
import logging
import logging.handlers

LOG_FILENAME = 'logging_rotatingfile_example.out'

# Set up a specific logger with our desired output level
my_logger = logging.getLogger('MyLogger')
my_logger.setLevel(logging.DEBUG)

# Add the log message handler to the logger
handler = logging.handlers.RotatingFileHandler(
    LOG_FILENAME,
    maxBytes=20,
    backupCount=5,
)
my_logger.addHandler(handler)

# Log some messages
for i in range(20):
    my_logger.debug('i = %d', i)

# See what files are created
logfiles = glob.glob('%s*' % LOG_FILENAME)
for filename in sorted(logfiles):
    print(filename)

In [None]:
# verbose level

In [None]:
%%bash
python3 Lesson6/logging_level.py debug
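
`Lesson6/logging_level.py` is not shown here. A minimal sketch of the general idea, mapping a level name such as `debug` or `warning` to a `logging` constant so that less severe messages are filtered out; the `LEVELS` table, the `make_logger` helper and the logger names are assumptions made for illustration.

```python
import io
import logging

LEVELS = {
    'debug': logging.DEBUG,
    'info': logging.INFO,
    'warning': logging.WARNING,
    'error': logging.ERROR,
    'critical': logging.CRITICAL,
}


def make_logger(level_name, stream):
    # A dedicated logger with its own handler, so the root logger's
    # configuration is left untouched.
    logger = logging.getLogger('level_sketch.' + level_name)
    logger.setLevel(LEVELS[level_name])
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(levelname)s:%(message)s'))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
logger = make_logger('warning', buffer)
logger.debug('This is a debug message')
logger.info('This is an info message')
logger.warning('This is a warning message')
logger.error('This is an error message')

# Only messages at WARNING or above reach the handler.
print(buffer.getvalue(), end='')
```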

In [None]:
%%bash
python3 Lesson6/logging_module.py

## Development Testing

In [None]:
def my_function(a, b):
    """
    >>> my_function(2, 3)
    6
    >>> my_function('a', 3)
    'aaa'
    """
    return a * b

In [80]:
%%bash
python3 -m doctest -v Lesson6/doctest_simple.py

Couldn't find program: 'bash'


In [None]:
%%bash
python3 -m doctest -v Lesson6/doctest_simple_with_docs.py

In [None]:
%%bash
python3 -m doctest -v Lesson6/doctest_blankline_fail.py
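
The file `doctest_blankline_fail.py` is not shown here, though its name points at blank-line handling. doctest treats a blank line as the end of the expected output, so blank lines that are part of the output have to be written as `<BLANKLINE>`. A small sketch with a hypothetical function `double_space`, run in-process so that it does not depend on `bash`:

```python
import doctest


def double_space(lines):
    """Print each line followed by a blank line.

    >>> double_space(['Line one.', 'Line two.'])
    Line one.
    <BLANKLINE>
    Line two.
    <BLANKLINE>
    """
    for line in lines:
        print(line)
        print()


# Run only this function's docstring examples and collect the result.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(double_space, 'double_space',
                        globs={'double_space': double_space}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)
```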

### module test

In [None]:
def my_function(a, b):
    """
    >>> my_function(2, 3)
    6
    >>> my_function('a', 3)
    'aaa'
    """
    return a * b

import doctest
doctest.testmod()

## UnitTests

In [None]:
%%bash
python3 -m unittest -v Lesson6/unittest_simple.py

In [None]:
%%bash
python3 -m unittest -v Lesson6/unittest_outcomes.py

In [None]:
%%bash
python3 -u -m unittest -v Lesson6/unittest_fixtures.py

In [None]:
%%bash
python3 -m unittest -v Lesson6/unittest_skip.py

In [None]:
%%bash
python3 -m unittest -v Lesson6/unittest_expectedfailure.py
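
The `unittest_*.py` files referenced above are not included in this notebook. As a minimal sketch of the features their names refer to (fixtures, skipping, expected failures), the test case below is hypothetical and runs in-process, so it does not need `bash`.

```python
import unittest


class FixtureSketchTest(unittest.TestCase):

    def setUp(self):
        # Runs before every test method.
        self.values = [3, 1, 2]

    def tearDown(self):
        # Runs after every test method, even if the test failed.
        self.values = None

    def test_sorted(self):
        self.assertEqual(sorted(self.values), [1, 2, 3])

    @unittest.skip('demonstrates an unconditional skip')
    def test_skipped(self):
        self.fail('never runs')

    @unittest.expectedFailure
    def test_expected_failure(self):
        # The sum is 6, so this assertion fails as expected.
        self.assertEqual(sum(self.values), 7)


# Load and run the tests in-process, which also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FixtureSketchTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print('successful:', result.wasSuccessful())
```

Skipped tests and expected failures do not count against `wasSuccessful()`, which is why the run is reported as successful.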