# I/O: Input and Output

## Standard Input and Output

```python
name = input('your name: ')
gender = input('Are you a boy? (y/n) ')

welcome_str = 'Welcome to the matrix {prefix} {name}.'
welcome_dic = {
    'prefix': 'Mr.' if gender == 'y' else 'Mrs.',
    'name': name
}

print('authorizing...')
print(welcome_str.format(**welcome_dic))
```

```
authorizing...
Welcome to the matrix Mrs. douhua.
```


```python
a = input()
b = input()

print('a + b = {}'.format(a + b))

print('type of a is {}, type of b is {}'.format(type(a), type(b)))

print('a + b = {}'.format(int(a) + int(b)))
```

```
a + b = 12
type of a is <class 'str'>, type of b is <class 'str'>
a + b = 3
```
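As the output shows, `input()` always returns a `str`, so `a + b` concatenates the two strings instead of adding numbers. Below is a minimal sketch of converting such strings defensively. `safe_int` and `parse_ints` are hypothetical helper names, not part of the original notebook. They take plain strings so they can be tried without typing at a prompt.

```python
def safe_int(text, default=None):
    """Convert a string such as '42' to an int, or return default if it is not a valid integer."""
    try:
        # int() tolerates surrounding whitespace, e.g. ' 42\n'
        return int(text)
    except ValueError:
        return default


def parse_ints(line):
    """Split a line such as '1 2' on whitespace and convert every piece to int."""
    return [int(token) for token in line.split()]


# With a real prompt you would write e.g. safe_int(input('a: '))
print('1' + '2')                      # → 12 (string concatenation)
print(safe_int('1') + safe_int('2'))  # → 3 (integer addition)
print(safe_int('abc'))                # → None (invalid input falls back to the default)
print(parse_ints('3 4'))              # → [3, 4]
```

Reading both numbers from a single line with `parse_ints(input())` is a common alternative to calling `input()` twice.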


## File Input and Output

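```python
import re

# You don't need to worry too much about this function
def parse(text):
    # Use a regular expression to replace punctuation and newlines with spaces
    text = re.sub(r'[^\w ]', ' ', text)

    # Lowercase everything so that 'The' and 'the' are counted together
    text = text.lower()

    # Build a list of all words
    word_list = text.split(' ')

    # Drop the empty strings left over from consecutive spaces
    word_list = filter(None, word_list)

    # Build a dictionary mapping each word to its frequency
    word_cnt = {}
    for word in word_list:
        if word not in word_cnt:
            word_cnt[word] = 0
        word_cnt[word] += 1

    # Sort by frequency, highest first
    sorted_word_cnt = sorted(word_cnt.items(), key=lambda kv: kv[1], reverse=True)

    return sorted_word_cnt

# Read the whole input file into memory (the with block closes the file automatically)
with open('in.txt', 'r') as fin:
    text = fin.read()

word_and_freq = parse(text)

# Write one "word frequency" pair per line
with open('out.txt', 'w') as fout:
    for word, freq in word_and_freq:
        fout.write('{} {}\n'.format(word, freq))
```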
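For comparison, the same counting can be sketched with `collections.Counter` from the standard library. `parse_with_counter` is a hypothetical name. It tokenizes with `re.findall(r'\w+', ...)`, which should yield the same maximal runs of word characters that the substitute-then-split approach in `parse()` produces. `Counter.most_common()` orders words with equal counts by first appearance, just as the stable `sorted()` call above does.

```python
import re
from collections import Counter


def parse_with_counter(text):
    """Return (word, count) pairs sorted by count, highest first."""
    # Maximal runs of word characters, lowercased afterwards
    words = (w.lower() for w in re.findall(r'\w+', text))
    # most_common() with no argument returns every word; ties keep first-seen order
    return Counter(words).most_common()


print(parse_with_counter('The cat and the hat.\nThe end!'))
# → [('the', 3), ('cat', 1), ('and', 1), ('hat', 1), ('end', 1)]
```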


## JSON Serialization

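- `json.dumps()` takes basic Python data types (such as `dict`, `list`, `str`, `int`, `float`, `bool` and `None`) and serializes them into a string;
- `json.loads()` takes a valid JSON string and deserializes it back into basic Python data types.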


```python
import json

params = {
    'symbol': '123456',
    'type': 'limit',
    'price': 123.4,
    'amount': 23
}

params_str = json.dumps(params)
print('after json serialization')
print('type of params_str = {}, params_str = {}'.format(type(params_str), params_str))

original_params = json.loads(params_str)
print('after json deserialization')
print('type of original_params = {}, original_params = {}'.format(type(original_params), original_params))
```

```
after json serialization
type of params_str = <class 'str'>, params_str = {"symbol": "123456", "type": "limit", "price": 123.4, "amount": 23}
after json deserialization
type of original_params = <class 'dict'>, original_params = {'symbol': '123456', 'type': 'limit', 'price': 123.4, 'amount': 23}
```
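The round trip above gives back an equal dict, but only because every value is already a JSON-native type. The following small sketch shows two standard-library behaviors worth knowing. First, tuples come back as lists and non-string dict keys come back as strings. Second, `ensure_ascii=False` keeps non-ASCII text such as Chinese readable instead of escaping it. The `data` dict is made up for illustration.

```python
import json

data = {1: (2, 3), 'name': '豆花'}

s = json.dumps(data)
print(s)             # key 1 is written as "1", the tuple as [2, 3], non-ASCII as \uXXXX escapes
back = json.loads(s)
print(back)          # → {'1': [2, 3], 'name': '豆花'}
print(back == data)  # → False: the key is now '1' and the tuple is now a list

# ensure_ascii=False writes the characters as they are
print(json.dumps(data, ensure_ascii=False))  # → {"1": [2, 3], "name": "豆花"}
```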


```python
with open('params.json', 'w') as fout:
    fout.write(params_str)
```

```python
with open('params.json', 'r') as fin:
    obj = json.loads(fin.read())
obj
```

```
{'symbol': '123456', 'type': 'limit', 'price': 123.4, 'amount': 23}
```
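Pairing `json.dumps()` with `fout.write()` and `json.loads()` with `fin.read()` works. The standard library also provides `json.dump()` and `json.load()`, which take a file object directly. Below is a minimal sketch. It writes to a temporary directory rather than to the notebook's `params.json` so that it can run anywhere.

```python
import json
import os
import tempfile

params = {'symbol': '123456', 'type': 'limit', 'price': 123.4, 'amount': 23}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'params.json')

    # json.dump serializes straight into the open file
    with open(path, 'w', encoding='utf-8') as fout:
        json.dump(params, fout)

    # json.load parses straight from the open file
    with open(path, 'r', encoding='utf-8') as fin:
        obj = json.load(fin)

print(obj == params)  # → True
```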

## Exercise

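Can you reimplement the word count from the NLP example above? This time, however, in.txt may be extremely large (so it cannot be read into memory all at once), while out.txt will stay small (meaning that many words repeat).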
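The following is one possible approach, sketched under the exercise's assumptions. It reads the input in fixed-size chunks so that memory does not grow with the file, and it keeps only the word counts in memory, which is acceptable because the output is small. This is not the author's word_count.py, whose source is not shown. `count_words_streaming` and `write_counts` are hypothetical names. Tokenization follows the same "runs of word characters, lowercased" rule as `parse()`. Reading line by line would be simpler, but a single very long line could still exhaust memory.

```python
import io
import re
from collections import Counter

WORD_RE = re.compile(r'\w+')


def count_words_streaming(fin, chunk_size=1 << 20):
    """Count lowercased words from a text file object, reading chunk_size characters at a time."""
    counts = Counter()
    leftover = ''  # trailing word fragment carried over from the previous chunk
    while True:
        chunk = fin.read(chunk_size)
        if not chunk:
            break
        text = leftover + chunk
        words = WORD_RE.findall(text)
        # If the text ends in a word character, the last word may continue
        # in the next chunk, so hold it back instead of counting it now.
        if words and WORD_RE.match(text[-1]):
            leftover = words.pop()
        else:
            leftover = ''
        counts.update(w.lower() for w in words)
    if leftover:
        counts[leftover.lower()] += 1
    return counts


def write_counts(counts, fout):
    """Write 'word frequency' lines, most frequent first."""
    for word, freq in counts.most_common():
        fout.write('{} {}\n'.format(word, freq))


# Tiny chunks force words to be split across chunk boundaries
sample = io.StringIO('The cat and the hat.\nThe end!')
out = io.StringIO()
write_counts(count_words_streaming(sample, chunk_size=4), out)
print(out.getvalue())
```

With real files the same functions would be used as `with open('in.txt', encoding='utf-8') as fin: counts = count_words_streaming(fin)`, followed by `write_counts(counts, fout)` inside `with open('out.txt', 'w') as fout:`. Memory use is then bounded by `chunk_size`, the longest single word, and the number of distinct words.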


```
!python3 word_count.py
```

and 15
be 13
will 11
to 11
the 10
of 10
a 8
we 8
day 6
able 6
every 6
together 6
i 5
have 5
dream 5
that 5
one 5
with 5
this 5
in 4
shall 4
free 4
when 4
little 3
black 3
white 3
made 3
faith 3
at 3
last 3
children 2
nation 2
by 2
their 2
today 2
alabama 2
boys 2
girls 2
join 2
hands 2
mountain 2
places 2
all 2
it 2
our 2
hope 2
up 2
freedom 2
ring 2
from 2
god 2
men 2
my 1
four 1
live 1
where 1
they 1
not 1
judged 1
color 1
skin 1
but 1
content 1
character 1
down 1
its 1
vicious 1
racists 1
right 1
there 1
as 1
sisters 1
brothers 1
valley 1
exalted 1
hill 1
low 1
rough 1
plain 1
crooked 1
straight 1
glory 1
lord 1
revealed 1
flesh 1
see 1
is 1
hew 1
out 1
despair 1
stone 1
transform 1
jangling 1
discords 1
into 1
beautiful 1
symphony 1
brotherhood 1
work 1
pray 1
struggle 1
go 1
jail 1
stand 1
for 1
knowing 1
happens 1
allow 1
let 1
village 1
hamlet 1
state 1
city 1
speed 1
s 1
jews 1
gentiles 1
protestants 1
catholics 1
sing 1
words 1
old 1
negro 1
spiritual 1
thank 1
almighty 1
are 