# Advanced Python Tutorial
## Coding Conventions
### 1.1 Naming
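* Folders: lowercase, or lowercase words joined by underscores, e.g. models, utils, train_utils...
* Files: lowercase, or lowercase words joined by underscores, e.g. train.py, multiple_gpus_train.py...
* Package, module and function names: lowercase, or lowercase words joined by underscores, e.g. import torch, train_for_one_epoch(...)
* Class names: CapWords (every word capitalized), e.g. class AgentEncoder(nn.Module)
* Variables
  * Global variables: uppercase, or uppercase words joined by underscores, e.g. CONFIG = dict(), TRAIN_ARGS = dict()...
  * Local variables: lowercase, or lowercase words joined by underscores, e.g. optimizer = optim.Adam(...), val_dataloader = DataLoader(...)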
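The naming rules above can be seen together in one small sketch. `AgentEncoder` here is a hypothetical plain class (not the `nn.Module` from the list), and `TRAIN_ARGS`, `encode_batch` and `train_for_one_epoch` are illustrative names only; the toy logic just sums scaled numbers.

```python
# Global constant / configuration: UPPER_CASE with underscores
TRAIN_ARGS = dict(lr=1e-3, epochs=2)


class AgentEncoder:
    """Class names use CapWords (hypothetical toy encoder)."""

    def __init__(self, scale):
        self.scale = scale  # attribute names: lowercase

    def encode_batch(self, batch):
        # Method names: lowercase_with_underscores
        return [x * self.scale for x in batch]


def train_for_one_epoch(encoder, batches):
    # Local variables: lowercase_with_underscores
    total_sum = 0.0
    for batch in batches:
        encoded_batch = encoder.encode_batch(batch)
        total_sum += sum(encoded_batch)
    return total_sum


encoder = AgentEncoder(scale=2)
print(train_for_one_epoch(encoder, [[1, 2], [3]]))  # 12.0
```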

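### 1.2 Comments
* Functions and methods must have standard docstrings; for public methods, or methods belonging to an externally exposed API, it is best to include a usage example
* TODO comments: '#' + one space + 'TODO' + one space + comment text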
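A minimal sketch of both rules, assuming Google-style docstring sections (one common convention; the tutorial does not prescribe a specific format). `mean_loss` is a hypothetical helper; its usage example is written as a doctest so it can be checked automatically.

```python
def mean_loss(losses):
    """Return the average of a list of per-batch losses.

    Args:
        losses: a non-empty list of floats.

    Returns:
        The arithmetic mean as a float.

    Example:
        >>> mean_loss([1.0, 2.0, 3.0])
        2.0
    """
    # TODO handle an empty list instead of raising ZeroDivisionError
    return sum(losses) / len(losses)


if __name__ == "__main__":
    import doctest
    doctest.testmod()  # runs the usage example inside the docstring
```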

## list, tuple, set, dict
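| Type  | Mutable | Ordered | Element requirement | Hashable | Conversion | Empty value |
|-------|---------|---------|---------------------|----------|------------|-------------|
| list  | Yes | Yes | None | No | list(...) | v = list() or v = [] |
| tuple | No | Yes | None | Only if all elements are hashable | tuple(...) | v = tuple() or v = () |
| set   | Yes | No | Elements must be hashable | No | set(...) | v = set() ({} is an empty dict) |
| dict  | Yes (values can be updated, keys added or removed) | Yes, insertion order (Python 3.7+) | Keys must be hashable | No | dict(...) | v = dict() or v = {} |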

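__Note__: dicts preserve insertion order, so list(dict.keys()) returns the keys in the order they were inserted. In Python 3.6 this is a CPython implementation detail; from Python 3.7 on it is guaranteed by the language. \
__Note__: 1, True and 1.0 are equal and have the same hash, so as dict keys (or set elements) they count as the same key: only one key-value pair remains, keeping the first key inserted and the last value assigned.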
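The table rows and notes above can be checked directly with a short sketch: key collisions between 1, True and 1.0, in-place mutation of a set, and the "tuple is hashable only if its elements are" rule. The values are arbitrary toy data.

```python
# 1, True and 1.0 compare equal and share a hash, so they collide as keys
d = {1: "a", True: "b", 1.0: "c"}
print(d)  # {1: 'c'}: the first key is kept, the last value wins

# set is mutable: elements can be added and removed in place
s = set()
s.add((1, 2))       # a tuple of ints is hashable, so it can be an element
s.discard((1, 2))
print(s)  # set()

# A tuple is hashable only if all of its elements are hashable
try:
    hash((1, [2]))  # contains a list, which is unhashable
    tuple_with_list_hashable = True
except TypeError:
    tuple_with_list_hashable = False
print(tuple_with_list_hashable)  # False

# {} is an empty dict, not an empty set
print(type({}), type(set()))  # <class 'dict'> <class 'set'>
```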

```python
list_v = [1, 2, (1, 2), "str", True, {1: 1, 2: 2}, {1, 2}, 1.0]
tuple_v = (1, 2, (1, 2), "str", True, {1: 1, 2: 2}, {1, 2}, 1.0)
set_v = {1, 2, (1, 2), "str", True}  # True == 1, so the set keeps only one of them
dict_v = {(1, 2): 0, "str": 2, True: 2}
for item in [list_v, tuple_v, set_v, dict_v]:
    print(item)
print(list(dict_v.keys()))  # keys in insertion order
```

```text
[1, 2, (1, 2), 'str', True, {1: 1, 2: 2}, {1, 2}, 1.0]
(1, 2, (1, 2), 'str', True, {1: 1, 2: 2}, {1, 2}, 1.0)
{1, 2, (1, 2), 'str'}
{(1, 2): 0, 'str': 2, True: 2}
[(1, 2), 'str', True]
```


## pass, continue, break
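* pass is a no-op statement: it is used as a placeholder where the syntax requires a statement, keeping the program structure complete
* continue ends the current iteration and jumps to the next one
* break ends the current iteration and exits the whole loop; no further iterations run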

```python
for i in range(5):
    print(f"loop: {i}".center(20, '-'))
    print(f"the num is {i}")
    if i == 0:
        pass  # does nothing; execution continues with the next statement
    if i == 1:
        print("continue: the num is {}".format(i))
        continue  # skip the rest of this iteration
    else:
        print("the num is {}".format(i))
    if i == 3:
        print("break: the num is %d" % i)
        break  # leave the loop entirely; i == 4 never runs
    else:
        print("the num is %d" % i)
```

```text
------loop: 0-------
the num is 0
the num is 0
the num is 0
------loop: 1-------
the num is 1
continue: the num is 1
------loop: 2-------
the num is 2
the num is 2
the num is 2
------loop: 3-------
the num is 3
the num is 3
break: the num is 3
```

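The difference between the three statements is easiest to see in a function whose return value depends on them. `collect_until_zero` and `TodoModel` are hypothetical names used only for this sketch.

```python
def collect_until_zero(nums):
    """Keep positive numbers, skip negatives, stop at the first zero."""
    kept = []
    for n in nums:
        if n < 0:
            continue  # skip the rest of this iteration only
        if n == 0:
            break  # leave the whole loop; later items are never examined
        kept.append(n)
    return kept


class TodoModel:
    pass  # placeholder body: the class is syntactically complete but empty


print(collect_until_zero([3, -1, 4, 0, 5]))  # [3, 4]
```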

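## The difference between is and ==
* ==: compares whether the values are equal
* is: compares object identity, i.e. whether both names refer to the same object (in CPython, id() returns the object's memory address)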

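```python
v1, v2 = [], []
v3 = v2  # v3 and v2 name the same list object
print(v1 == v2)  # equal values
print(v1 is v2)  # but two distinct objects
print(v2 == v3)
print(v2 is v3)
```

```text
True
False
True
True
```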

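A slightly larger sketch of the same distinction, showing why identity matters when a list is mutated through a second name, plus the common `is None` idiom. The values are arbitrary.

```python
a = [1, 2]
b = a.copy()  # equal contents, different object
c = a         # second name bound to the same object

print(a == b, a is b)          # True False
print(a is c, id(a) == id(c))  # True True

c.append(3)   # mutating through c is visible through a, but not b
print(a, b)   # [1, 2, 3] [1, 2]

# None is a singleton, so it is compared with `is`, not `==`
value = None
print(value is None)  # True
```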

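## File Operations
### 1. Reading and Writing Files
#### 1.1 Reading
* f.read(...)    # with no argument, reads the whole file (a very large file may not fit in memory); with an argument n, reads at most n characters (text mode) or n bytes (binary mode)
* f.readline()   # reads one line, starting from the current cursor position
* f.readlines()  # reads all remaining lines from the current cursor position and returns them as a list

#### 1.2 Writing
* f.write(...)   # only puts the content into a buffer; the buffer is written out when it fills up, or on flush()/close()
* f.flush()      # immediately hands the buffered content to the operating system (os.fsync() is needed to force it onto the disk)

__Note__: the open mode determines where the cursor is: 'r' and 'w' start at the beginning of the file ('w' also truncates it), while 'a' always writes at the end.

#### 1.3 Context Management
A with statement closes the file automatically when the block exits, even if an exception is raised.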

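```python
# Opened manually: the file must be closed explicitly
file_object = open("materials/log.txt", mode='rt', encoding='utf-8')
first_line = file_object.readline()  # reads only the first line
lines = file_object.readlines()      # reads the remaining lines from the cursor
file_object.close()
print(lines)
assert len(lines) > 0
# Mode 'a' appends: writes go to the end of the file
file_object = open("materials/log.txt", mode='a', encoding='utf-8')
if lines[-1] != "this is a string written by file operation in Python. \n":
    file_object.write("this is a string written by file operation in Python. \n")
file_object.close()
# The with statement closes the file automatically
with open("materials/log.txt", 'rt', encoding='utf-8') as f:
    data = f.read()
    print(data)
```

```text
['1. read\n', '2. write \n', 'this is a string written by file operation in Python. \n']
operation
1. read
2. write 
this is a string written by file operation in Python. 
```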


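The cursor rules and the automatic closing described above can be verified on a throwaway file. This sketch writes to a temporary directory (the file name `demo_log.txt` is hypothetical) rather than to `materials/log.txt`.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo_log.txt")  # hypothetical file name

# 'w' truncates the file and starts writing at position 0
with open(path, "w", encoding="utf-8") as f:
    f.write("1. read\n")
    f.flush()  # hand the buffered text to the OS now, before close()

# 'a' always writes after the existing content
with open(path, "a", encoding="utf-8") as f:
    f.write("2. write\n")

with open(path, "r", encoding="utf-8") as f:
    head = f.read(2)             # at most 2 characters: '1.'
    rest_of_line = f.readline()  # continues from the cursor: ' read\n'
    remaining = f.readlines()    # ['2. write\n']

print(repr(head), repr(rest_of_line), remaining)
print(f.closed)  # True: the with block closed the file

# The file is also closed when the block exits through an exception
try:
    with open(path, encoding="utf-8") as g:
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
print(g.closed)  # True

os.remove(path)     # clean up the demo file
os.rmdir(tmp_dir)   # and its temporary directory
```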

### 2. File and Directory Paths

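```python
import os
base_dir = os.getcwd()  # current working directory
file_dir = os.path.join(base_dir, "materials", "log.txt")  # joined with the OS path separator
print(base_dir, file_dir)
print(os.path.exists(file_dir))  # the path exists
print(os.path.isdir(file_dir))   # it is a file, not a directory
print(os.path.isabs(file_dir))   # it is an absolute path
```

```text
/home/wsj/code_space/DL_Env_Configuration /home/wsj/code_space/DL_Env_Configuration/materials/log.txt
True
False
True
```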

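A few more os.path helpers that come up when organizing data folders, shown on a temporary directory so the sketch is self-contained. The `materials/logs` layout is an assumption for illustration.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()  # throwaway base directory for the demo
data_dir = os.path.join(root, "materials", "logs")
os.makedirs(data_dir, exist_ok=True)  # creates intermediate folders; no error if present

file_path = os.path.join(data_dir, "log.txt")
with open(file_path, "w", encoding="utf-8") as f:
    f.write("demo\n")

parent = os.path.dirname(file_path)     # .../materials/logs
name = os.path.basename(file_path)      # 'log.txt'
stem, ext = os.path.splitext(name)      # ('log', '.txt')
is_file = os.path.isfile(file_path)     # True
is_dir = os.path.isdir(file_path)       # False: it is a file
listing = sorted(os.listdir(data_dir))  # ['log.txt']
print(parent == data_dir, name, stem, ext, is_file, is_dir, listing)

shutil.rmtree(root)  # clean up the demo directory
```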

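### 3. Moving, Deleting and Copying Files
* shutil.copyfile(src, dst)  # copies the contents of a single file to dst
* shutil.move(src, dst)  # moves or renames a file or directory
* shutil.rmtree(...)  # takes a directory path, recursively deletes everything inside it, then deletes the directory itself
* os.remove(...)  # deletes a single file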

```python
import os
import shutil
file_dir = os.path.join(os.getcwd(), "materials", "log.txt")
new_file_dir = os.path.join(os.path.dirname(file_dir), 'log_copy.txt')
shutil.copyfile(file_dir, new_file_dir)  # copy log.txt to log_copy.txt
# rename the copy to log_rename.txt
shutil.move(new_file_dir, os.path.join(os.path.dirname(new_file_dir), "log_rename.txt"))
# delete the renamed copy
os.remove(os.path.join(os.path.dirname(new_file_dir), "log_rename.txt"))
```

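The directory-level operations work the same way. This sketch uses a hypothetical experiment folder inside a temporary directory, and also uses shutil.copytree (not mentioned in the list above) to copy a whole tree before deleting the original with shutil.rmtree.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    exp_dir = os.path.join(root, "exp")  # hypothetical experiment folder
    os.makedirs(os.path.join(exp_dir, "ckpt"))
    with open(os.path.join(exp_dir, "ckpt", "epoch_1.txt"), "w") as f:
        f.write("weights")

    backup_dir = os.path.join(root, "exp_backup")  # must not exist yet
    shutil.copytree(exp_dir, backup_dir)  # copy the whole directory tree
    shutil.rmtree(exp_dir)                # recursively delete the original tree

    os.remove(os.path.join(backup_dir, "ckpt", "epoch_1.txt"))  # delete one file
    exp_exists = os.path.exists(exp_dir)
    ckpt_files = os.listdir(os.path.join(backup_dir, "ckpt"))

print(exp_exists, ckpt_files)  # False []
```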
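### 4. CSV Files
* A CSV file is a plain-text format that stores tabular data, using commas as field separators
* It can be processed with ordinary file reading and writing, or with libraries: the standard-library csv module or the third-party pandas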

```python
import pandas as pd
df = pd.read_csv("materials/points.csv", header=0)  # the first row holds the column names
new_df = pd.DataFrame([["Shengjie Wu", 24, "NY", 99]])
if "Shengjie Wu" not in list(df["name"]):
    # mode='a' appends; header=False and index=False write only the data row
    new_df.to_csv("materials/points.csv", mode='a', header=False, index=False)
print(list(df["point"]))
print(new_df)
```

```text
[64, 92, 70, 70, 88, 57, 64, 92, 70, 70, 88, 570, 99]
             0   1   2   3
0  Shengjie Wu  24  NY  99
```

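For comparison, the standard-library csv module can do the same kind of round trip without pandas. This sketch writes to an in-memory buffer instead of `materials/points.csv`; only the `name` and `point` columns come from the example above, while the `age` and `city` column names and all rows are made-up toy data.

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for a CSV file
writer = csv.DictWriter(buffer, fieldnames=["name", "age", "city", "point"])
writer.writeheader()
writer.writerow({"name": "Alice", "age": 23, "city": "LA", "point": 64})
writer.writerow({"name": "Bob, Jr.", "age": 25, "city": "SF", "point": 92})  # comma gets quoted

buffer.seek(0)  # rewind before reading back
rows = list(csv.DictReader(buffer))
points = [int(row["point"]) for row in rows]  # every field is read back as str
print(rows[1]["name"], points)  # Bob, Jr. [64, 92]
```

Unlike pandas, the csv module does no type inference, which is why `point` has to be converted with int() explicitly.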

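### 5. Pickle Files
* Plain file writes only accept strings (or bytes in binary mode), while pickle can save and load other Python objects such as list, dict and numpy.ndarray
* When state is saved and restored frequently, loading a pickle is usually faster than re-parsing a text file, because the objects are restored directly without manual conversion \
__Note__: in deep-learning data preprocessing, the processed data can be saved as `*.pickle` and loaded directly during training. Only load pickle files from trusted sources: unpickling can execute arbitrary code.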

```python
import os
import pickle
import numpy as np
pickle_data = {
    "feats": np.random.rand(3, 3),
    "idcs": [0, 1, 2]
}
if not os.path.exists("materials/data.pickle"):
    # first run: serialize the dict (binary mode 'wb')
    with open("materials/data.pickle", mode="wb") as f:
        pickle.dump(pickle_data, f, protocol=pickle.HIGHEST_PROTOCOL)
        print(pickle_data)
else:
    # later runs: load it back (binary mode 'rb')
    with open("materials/data.pickle", mode="rb") as f:
        data = pickle.load(f)
        print(data)
```

```text
{'feats': array([[0.33487029, 0.39792792, 0.66601897],
       [0.11761664, 0.14704708, 0.12283957],
       [0.25814025, 0.12651752, 0.74917641]]), 'idcs': [0, 1, 2]}
```

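A dependency-free sketch of the same idea: pickle.dumps/pickle.loads work on bytes in memory, while pickle.dump/pickle.load work on binary file objects (an io.BytesIO buffer stands in for a real `.pickle` file here). The `state` dict is toy data.

```python
import io
import pickle

state = {"idcs": [0, 1, 2], "epoch": 3, "shape": (3, 3)}  # toy training state

# In memory: object -> bytes -> object
blob = pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

# File-like: BytesIO plays the role of open("....pickle", "wb") / "rb"
buffer = io.BytesIO()
pickle.dump(state, buffer, protocol=pickle.HIGHEST_PROTOCOL)
buffer.seek(0)  # rewind before loading
restored_from_file = pickle.load(buffer)

print(restored == state, restored_from_file == state)  # True True
print(type(restored["shape"]))  # <class 'tuple'>: types are preserved
```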

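### 6. YAML Files
* Like Python, YAML uses indentation to express hierarchy; items at the same level must have the same indentation, made of spaces only (tabs are not allowed)
* '#' starts a comment; everything from it to the end of the line is ignored
* Keys and values are case-sensitive
* Lines starting with '- ' become list items
* 'key: value' pairs become dictionaries
* Content in single quotes is taken literally (escape sequences are not processed), while escape sequences inside double quotes, such as `\n`, are converted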
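The rules above can be checked by parsing a small YAML string with PyYAML (the same yaml package used in the example below; it must be installed). The document content is a made-up fragment modeled on the login config.

```python
import yaml  # PyYAML

text = r"""
# A comment: ignored by the parser
- url: /api/user/login
  method: post
  check:
    - userId
    - '111'
    - 222
  single: 'a\nb'
  double: "a\nb"
"""

config = yaml.safe_load(text)
item = config[0]  # a top-level '- ' makes a list; 'key: value' makes a dict
print(type(config).__name__, type(item).__name__)  # list dict
print(item["check"])         # ['userId', '111', 222]: quoted '111' stays a string
print(repr(item["single"]))  # 'a\\nb': the backslash is kept literally
print(repr(item["double"]))  # 'a\nb': \n became a real newline
print("Method" in item)      # False: keys are case-sensitive

# allow_unicode keeps non-ASCII text readable; sort_keys=False keeps the key order
print(yaml.safe_dump(config, allow_unicode=True, sort_keys=False))
```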

```python
import yaml
with open('materials/config.yml', encoding="utf-8") as f:
    config = yaml.safe_load(f)  # safe_load does not construct arbitrary Python objects
    print(config[0])
with open('materials/config.yml', mode='a', encoding='utf-8') as f:
    new_config = {
        'url': '/api/user/login', 
        'method': 'post', 
        'detail': '正常登录', 
        'data': {
            'username': 'Shengjie Wu', 
            'passwd': 'aA123456'
        }, 
        'check': ['userId', 'sign', '111', 222]
    }
    if "Shengjie Wu" not in [x['data']['username'] for x in config]:
        # allow_unicode keeps non-ASCII text readable; sort_keys=False keeps the key order
        yaml.dump([new_config], f, allow_unicode=True, sort_keys=False)
```

```text
{'url': '/api/user/login', 'method': 'post', 'detail': '正常登录', 'data': {'username': 'niuhanyang', 'passwd': 'aA123456'}, 'check': ['userId', 'sign', '111', 222]}
```
