## Garbled filenames after extracting a zip archive
https://www.cnblogs.com/CN-S/p/6566395.html

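Filenames obtained from zipfile.ZipFile that contain Chinese or Japanese characters are very likely to come out garbled. The reason:

    The zip standard did not originally store filenames as Unicode. Archiving tools appear to use the system's default character set instead (this is a guess), while zipfile, when it checks a file's flags, only supports cp437 and utf-8.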
    
The relevant source code in the zipfile module is:

    if flags & 0x800:
        # UTF-8 file names extension
        filename = filename.decode('utf-8')
    else:
        # Historical ZIP filename encoding
        filename = filename.decode('cp437')

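As the code shows, unless the UTF-8 flag (0x800) is set, the name is always decoded as cp437. If the bytes are actually GBK or some other encoding, the result is garbled. The fix is therefore to encode the cp437-decoded name back to bytes and then decode those bytes manually with the correct encoding.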
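
A minimal sketch of that round trip, assuming the real encoding is known in advance. The helper name redecode is made up for this note and is not part of zipfile:

```python
def redecode(name, encoding):
    """Undo zipfile's cp437 fallback and decode the raw bytes again.

    name:     a filename as returned by zipfile for an entry whose
              UTF-8 flag is not set
    encoding: the encoding the archiving tool really used (an assumption
              the caller has to supply)
    """
    raw = name.encode("cp437")   # recover the original bytes
    return raw.decode(encoding)  # decode them with the right codec


# Simulate what zipfile does to a GBK-encoded name, then repair it.
garbled = "中文.txt".encode("gbk").decode("cp437")
fixed = redecode(garbled, "gbk")  # '中文.txt'
```

This works because Python's cp437 codec maps every byte value to exactly one character, so encoding back to cp437 recovers the original bytes unchanged.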

In practice, locate zipfile.py inside the Anaconda installation and edit it:

    cd anaconda3/

    find . -name zipfile.py
    ./pkgs/python-3.7.4-h265db76_1/lib/python3.7/zipfile.py
    ./lib/python3.7/zipfile.py

    vim ./lib/python3.7/zipfile.py

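Two places need to be changed. In both, the cp437 fallback is replaced with utf-8, which matches archives whose names are stored as UTF-8 without the flag (use another codec such as gbk if that is what your archives contain). Note that this edits the standard library in place, so it affects every program run with this interpreter: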

            if flags & 0x800:
                # UTF-8 file names extension
                filename = filename.decode('utf-8')
            else:
                # Historical ZIP filename encoding
                # filename = filename.decode('cp437')
                filename = filename.decode('utf-8')
                
            if zinfo.flag_bits & 0x800:
                # UTF-8 filename
                fname_str = fname.decode("utf-8")
            else:
                # fname_str = fname.decode("cp437")
                fname_str = fname.decode("utf-8")
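
An alternative that leaves the installed zipfile.py untouched is to repair the names at extraction time. The following is only a sketch: the function name extract_with_fixed_names and its parameters are invented here, and the caller is assumed to know the real encoding.

```python
import os
import shutil
import zipfile


def extract_with_fixed_names(zip_path, dest=".", encoding="utf-8"):
    """Extract all members, re-decoding names that fell back to cp437."""
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = info.filename
            if not info.flag_bits & 0x800:
                # No UTF-8 flag: zipfile decoded the bytes as cp437.
                try:
                    name = name.encode("cp437").decode(encoding)
                except UnicodeError:
                    pass  # wrong guess; keep the name zipfile produced
            target = os.path.join(dest, name)
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            # Copy the member's data to the repaired path.
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(target)
    return written
```

Unlike extractall, this sketch does not sanitize member names (for example names containing ".."), so it should only be used on trusted archives. On Python 3.11 and later, zipfile.ZipFile also accepts a metadata_encoding argument when reading, which may make a helper like this unnecessary.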

# zipfile
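zipfile is a module in Python's standard library. It supports creating, reading, writing, appending to, and extracting zip files, as well as listing the files they contain.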

In [1]:
import zipfile

In [2]:
filepath = 'test.zip'

In [3]:
f = zipfile.ZipFile(filepath)

In [4]:
f.namelist()

['σÅÿσêåΦç¬τ╝ûτáüσÖ¿/',
 'σÅÿσêåΦç¬τ╝ûτáüσÖ¿/VAE _ Index Construction.ipynb',
 '__MACOSX/',
 '__MACOSX/σÅÿσêåΦç¬τ╝ûτáüσÖ¿/',
 '__MACOSX/σÅÿσêåΦç¬τ╝ûτáüσÖ¿/._VAE _ Index Construction.ipynb',
 'σÅÿσêåΦç¬τ╝ûτáüσÖ¿/.DS_Store',
 '__MACOSX/σÅÿσêåΦç¬τ╝ûτáüσÖ¿/._.DS_Store',
 'σÅÿσêåΦç¬τ╝ûτáüσÖ¿/ΘçÅσîûµèòΦ╡äΣ╕Äµ£║σÖ¿σ¡ªΣ╣áσà¼Σ╝ùσÅ╖.jpeg',
 '__MACOSX/σÅÿσêåΦç¬τ╝ûτáüσÖ¿/._ΘçÅσîûµèòΦ╡äΣ╕Äµ£║σÖ¿σ¡ªΣ╣áσà¼Σ╝ùσÅ╖.jpeg',
 'σÅÿσêåΦç¬τ╝ûτáüσÖ¿/README.md',
 '__MACOSX/σÅÿσêåΦç¬τ╝ûτáüσÖ¿/._README.md',
 'σÅÿσêåΦç¬τ╝ûτáüσÖ¿/Data Treatment.ipynb',
 '__MACOSX/σÅÿσêåΦç¬τ╝ûτáüσÖ¿/._Data Treatment.ipynb',
 'σÅÿσêåΦç¬τ╝ûτáüσÖ¿/σà¼Σ╝ùσÅ╖σÉÄΦè▒σ¢¡.png',
 '__MACOSX/σÅÿσêåΦç¬τ╝ûτáüσÖ¿/._σà¼Σ╝ùσÅ╖σÉÄΦè▒σ¢¡.png',
 '__MACOSX/._σÅÿσêåΦç¬τ╝ûτáüσÖ¿']

In [25]:
f.namelist()[9]

'σÅÿσêåΦç¬τ╝ûτáüσÖ¿/README.md'

## Reading the contents of a file inside the archive

In [28]:
print(f.read(f.namelist()[9]).decode('utf8'))

# Variational AutoEncoder for Dimensionality Reduction in Finance
This notebook revolves on a full-cycle application of Variational AutoEncoders for dimensionality reduction.
It is designed to be used on finance data. 
It is also about exploring the possibility of Index creation.

## License
This code is under AGPLv3 license.

## The code
As dataset, data from Yahoo Finance was used and anonymized.

It is based on a pair of Python notebook files:

- Data Preparation
- Model, results and Index Creation

2019-04-10: First release
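
Looking a member up by position (namelist()[9]) breaks as soon as the archive layout changes. A small sketch of reading by name instead, using an in-memory archive whose contents are invented for the example:

```python
import io
import zipfile

# Build a tiny archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/README.md", "# Title\nfirst line\n")

with zipfile.ZipFile(buf) as zf:
    # read() returns bytes; decode them explicitly.
    text = zf.read("docs/README.md").decode("utf-8")

    # open() returns a binary file object; wrap it to iterate over text lines.
    with zf.open("docs/README.md") as member:
        wrapper = io.TextIOWrapper(member, encoding="utf-8")
        lines = [line.rstrip("\n") for line in wrapper]
```

The with statement closes the archive automatically, so no separate f.close() call is needed.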



In [29]:
f.close()

In [30]:
zf = zipfile.ZipFile('test.zip')

## extractall(path=None, members=None, pwd=None)
Extracts the zip archive to the specified path.

Extract all members from the archive to the current working
directory.

In [31]:
zf.extractall()
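
The call above uses the defaults, so everything lands in the current working directory. A short sketch of the path and members arguments, with an in-memory archive and a temporary directory standing in for real data:

```python
import io
import os
import tempfile
import zipfile

# Build a small archive with two members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/README.md", "readme")
    zf.writestr("pkg/data.csv", "a,b\n1,2\n")

with tempfile.TemporaryDirectory() as dest, zipfile.ZipFile(buf) as zf:
    # Extract only the Markdown members, and into dest instead of the cwd.
    wanted = [n for n in zf.namelist() if n.endswith(".md")]
    zf.extractall(path=dest, members=wanted)

    # Collect what was written, as paths relative to dest.
    extracted = sorted(
        os.path.relpath(os.path.join(root, f), dest).replace(os.sep, "/")
        for root, _, files in os.walk(dest)
        for f in files
    )
```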

# FileOperationWrapper

In [1]:
import sys
sys.path.append('/Users/luoyonggui/PycharmProjects/mayiutils_n1/mayiutils/fileio')

In [4]:
from fileoperation_wrapper import FileOperationWrapper as fow

In [3]:
fow.readZipFile('test.zip', 9)

'# Variational AutoEncoder for Dimensionality Reduction in Finance\nThis notebook revolves on a full-cycle application of Variational AutoEncoders for dimensionality reduction.\nIt is designed to be used on finance data. \nIt is also about exploring the possibility of Index creation.\n\n## License\nThis code is under AGPLv3 license.\n\n## The code\nAs dataset, data from Yahoo Finance was used and anonymized.\n\nIt is based on a pair of Python notebook files:\n\n- Data Preparation\n- Model, results and Index Creation\n\n2019-04-10: First release\n'
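
The source of fileoperation_wrapper is not shown in these notes. Judging only from the call and its output, readZipFile(path, index) seems to return the decoded text of the member at that position in namelist(). A hypothetical stand-in with that behavior (the name read_zip_member and the encoding parameter are assumptions) could look like this:

```python
import zipfile


def read_zip_member(path, index, encoding="utf-8"):
    """Return the text of the index-th entry of namelist().

    Hypothetical stand-in for fow.readZipFile; the real implementation
    may differ.
    """
    with zipfile.ZipFile(path) as zf:
        name = zf.namelist()[index]
        return zf.read(name).decode(encoding)
```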

# pickle.dump(obj, file, protocol=None, *, fix_imports=True)
The optional *protocol* argument tells the pickler to use the given protocol; supported protocols are 0, 1, 2, 3 and 4. The default protocol is 3, a backward-incompatible protocol designed for Python 3. (This is the Python 3.7 docstring; Python 3.8 added protocol 5 and changed the default to 4.)

Specifying a negative protocol version selects the highest protocol version supported.  The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

In [1]:
import pickle

In [None]:
pickle.dump()
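
Called without arguments, pickle.dump() raises a TypeError, because it needs at least the object and a binary file to write to. A minimal round trip, using an in-memory buffer and a made-up record in place of a real file and real data:

```python
import io
import pickle

record = {"name": "test.zip", "members": 16, "ok": True}

# dump() writes the pickled object to a binary file-like target.
buf = io.BytesIO()
pickle.dump(record, buf, protocol=pickle.HIGHEST_PROTOCOL)

# A negative protocol is shorthand for the highest one available.
same = pickle.dumps(record, protocol=-1)

# Rewind the buffer and load the object back.
buf.seek(0)
restored = pickle.load(buf)
```

With a real file, open it in binary mode ('wb' for dump, 'rb' for load).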