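This chapter explains how to read and write Excel files with different packages.
pandas (together with NumPy) is powerful, but we often don't need the full library. Are there leaner packages that do just the reading and writing?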


## 1. Reader and writer packages
### 1.1 When to use which package


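This section covers the following packages for reading, writing, and editing Excel files:
1. OpenPyXL
2. XlsxWriter
3. pyxlsb
4. xlrd
5. xlwt
6. xlutils

Which package to use for which file format and task: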

|Excel file format|Read|Write|Edit|
|--|--|--|--|
|xlsx|OpenPyXL|OpenPyXL, XlsxWriter|OpenPyXL|
|xlsm|OpenPyXL|OpenPyXL, XlsxWriter|OpenPyXL|
|xltx, xltm|OpenPyXL|OpenPyXL|OpenPyXL|
|xlsb|pyxlsb|-|-|
|xls, xlt|xlrd|xlwt|xlutils|

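Differences (with a focus on OpenPyXL and XlsxWriter)

In pandas, each of these packages acts as an engine: the `engine` argument selects which package pandas uses to read or write a given Excel file, for example `df.to_excel('filename.xlsx', engine='openpyxl')`.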
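
The table above can be turned into a small lookup that picks an engine name from the file extension. This is only a sketch: `ENGINES` and `pick_engine` are hypothetical names and not part of pandas. Writing `.xls` is deliberately left unsupported here, on the assumption that a recent pandas version without an xlwt writer is installed.

```python
from pathlib import Path

# Engine names keyed by file extension and operation (mirrors the table above).
# None means "not supported in this sketch"; the .xls write entry assumes a
# recent pandas version that no longer offers an xlwt writer.
ENGINES = {
    ".xlsx": {"read": "openpyxl", "write": "openpyxl"},
    ".xlsm": {"read": "openpyxl", "write": "openpyxl"},
    ".xlsb": {"read": "pyxlsb", "write": None},
    ".xls": {"read": "xlrd", "write": None},
}


def pick_engine(filename, mode="read"):
    """Return the engine name for a file, based only on its extension."""
    suffix = Path(filename).suffix.lower()
    engine = ENGINES.get(suffix, {}).get(mode)
    if engine is None:
        raise ValueError(f"no engine known for {mode!r} on {suffix!r} files")
    return engine
```

It could then be used as `pd.read_excel(path, engine=pick_engine(path))`.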

### 1.2 The excel.py module

### 1.3 OpenPyXL


1. Reading files with OpenPyXL

In [3]:
import pandas as pd
import openpyxl
import datetime as dt

In [4]:
# Load the workbook; data_only=True returns the cached cell values instead of the formulas
book = openpyxl.load_workbook("../ori_writer/xl/stores.xlsx", data_only=True)

# Get a worksheet object by name or by index (0-based)
sheet = book["2019"]
sheet = book.worksheets[0]

In [5]:
# Get a list of all sheet names
book.sheetnames

['2019', '2020', '2019-2020']

In [6]:
# Loop through all sheet objects
for i in book.worksheets:
    print(i.title)

2019
2020
2019-2020


In [7]:
# Get the dimensions of the sheet
sheet.max_row, sheet.max_column

(8, 6)

In [8]:
# Read the value of a single cell
# Option 1: A1 notation
print(sheet['B6'].value)
# Option 2: cell indices (1-based)
print(sheet.cell(row=6, column=2).value)

Boston
Boston


In [10]:
# Use the excel module (excel.py) to read the values of a cell range
import excel
data = excel.read(book['2019'],(2,2),(8,6))
data[:2]  # Show the first 2 rows

[['Store', 'Employees', 'Manager', 'Since', 'Flagship'],
 ['New York', 10, 'Sarah', datetime.datetime(2018, 7, 20, 0, 0), False]]

2. Writing files with OpenPyXL


In [11]:
import openpyxl

In [13]:
book = openpyxl.Workbook()  # Instantiate a workbook

# Get the first sheet and give it a name
sheet = book.active
sheet.title = 'Sheet1'

# Write to cell A1 (column A, row 1) using A1 notation
sheet['A1'].value = 'Hello 1'

# Or write using cell indices (A2)
sheet.cell(row=2, column=1, value='Hello 2')

# Formatting: font color and bold
# https://blog.csdn.net/qq_44614026/article/details/109707265
from openpyxl.styles import colors, Font
font_format = Font(color='FF0000', bold=True)

# Set the cell border
from openpyxl.styles.borders import Side,Border
thin = Side(border_style='thin', color='FF0000')
sheet["A3"].value = "Hello 3"
sheet["A3"].font = font_format
sheet["A3"].border = Border(top=thin, left=thin,
                            right=thin, bottom=thin)


# Set the alignment
from openpyxl.styles import Alignment
sheet["A3"].alignment = Alignment(horizontal="center")
# Set the fill
from openpyxl.styles import PatternFill
sheet["A3"].fill = PatternFill(fgColor="FFFF00", fill_type="solid")

# Number formatting (using Excel's format strings)
sheet['A4'].value = 3.3333
sheet['A4'].number_format = '0.00'

# Date formatting
sheet["A5"].value = dt.date(2016, 10, 13)
sheet["A5"].number_format = "mm/dd/yy"

# Formulas: use the English function names with commas as argument separators
sheet["A6"].value = "=SUM(A4, 2)"

# Add an image
from openpyxl.drawing.image import Image
sheet.add_image(Image('../ori_writer/images/python.png'), 'C1')

# Two-dimensional list (written with the excel module)
data = [[None, "North", "South"],
        ["Last Year", 2, 5],
        ["This Year", 3, 6]]
excel.write(sheet, data, 'A10')

# Chart

from openpyxl.chart import BarChart, Reference
chart = BarChart()
# Chart type, title, and axis titles
chart.type = "col"
chart.title = "Sales Per Region"
chart.x_axis.title = "Regions"
chart.y_axis.title = "Sales"
# A Reference points to the cell range that holds the chart's data or categories,
# similar to selecting a range in Excel
chart_data = Reference(sheet, min_row=11, min_col=1,
                       max_row=12, max_col=3)

chart_categories = Reference(sheet, min_row=10, min_col=2,
                             max_row=10, max_col=3)

# Add the data and categories to the chart
chart.add_data(chart_data, titles_from_data=True, from_rows=True)
chart.set_categories(chart_categories)

# Add the chart to the sheet, then save the workbook
sheet.add_chart(chart, "A15")
book.save('openpyxl.xlsx')


To write an Excel template file, set the `template` attribute to `True` before saving. When templates are worth using is not covered here.

In [14]:
book = openpyxl.Workbook()
sheet = book.active
sheet["A1"].value = "This is a template"
book.template = True
book.save("template.xltx")

### 1.4 XlsxWriter
As the name suggests, this package can only write Excel files; it cannot read or edit them.


In [15]:
import datetime as dt
import xlsxwriter
import excel

In [None]:
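# Instantiate a workbook
book = xlsxwriter.Workbook("xlsxwriter.xlsx")

# Add a sheet and give it a name
sheet = book.add_worksheet("Sheet1")

# Two ways to write a cell: A1 notation or cell indices (0-based)
sheet.write("A1", "Hello 1")
sheet.write(1, 0, "Hello 2")

# Formatting: font color, fill color, bold, alignment, border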
formatting = book.add_format({"font_color": "#FF0000",
                              "bg_color": "#FFFF00",
                              "bold": True, "align": "center",
                              "border": 1, "border_color": "#FF0000"})
sheet.write("A3", "Hello 3", formatting)

# Number formatting
number_format = book.add_format({"num_format": "0.00"})
sheet.write("A4", 3.3333, number_format)

# Date formatting
date_format = book.add_format({"num_format": "mm/dd/yy"})
sheet.write("A5", dt.date(2016, 10, 13), date_format)

# Formula
sheet.write("A6", "=SUM(A4, 2)")

# Insert an image
sheet.insert_image(0, 2, "../ori_writer/images/python.png")

# Two-dimensional list
data = [[None, "North", "South"],
        ["Last Year", 2, 5],
        ["This Year", 3, 6]]
excel.write(sheet, data, "A10")

# Insert a chart
chart = book.add_chart({"type": "column"})
chart.set_title({"name": "Sales per Region"})
chart.add_series({"name": "=Sheet1!A11",
                  "categories": "=Sheet1!B10:C10",
                  "values": "=Sheet1!B11:C11"})
chart.add_series({"name": "=Sheet1!A12",
                  "categories": "=Sheet1!B10:C10",
                  "values": "=Sheet1!B12:C12"})
chart.set_x_axis({"name": "Regions"})
chart.set_y_axis({"name": "Sales"})
sheet.insert_chart("A15", chart)

# Close the workbook, which writes the file to disk
book.close()

### 1.5 pyxlsb

This package is meant for binary xlsb files and is not covered in detail here.

### 1.6 xlrd、xlwt和xlutils

Taken together, xlrd, xlwt, and xlutils offer similar functionality for the legacy xls format:

**xlrd reads, xlwt writes, and xlutils edits xls files.**

In [16]:
import xlrd
import xlwt
from xlwt.Utils import cell_to_rowcol2
import xlutils
import excel

# Open the workbook
book = xlrd.open_workbook('../ori_writer/xl/stores.xls')





In [17]:
# Get the names of all sheets
book.sheet_names()

['2019', '2020', '2019-2020']

In [19]:
for sheet in book.sheets():
    print(sheet, sheet.name)

Sheet  0:<2019> 2019
Sheet  1:<2020> 2020
Sheet  2:<2019-2020> 2019-2020


In [20]:
# Get a sheet object by name or by index (0-based)
sheet = book.sheet_by_name('2019')

# Equivalent to
sheet = book.sheet_by_index(0)


In [21]:
sheet.ncols

6

In [22]:
sheet.nrows

8

In [23]:
# Read a cell value using cell indices (0-based)
sheet.cell(2,1).value

'New York'

In [25]:
# Read a cell value using A1 notation
sheet.cell(*cell_to_rowcol2('B3')).value

'New York'

Writing data with xlwt


In [44]:
import xlwt
from xlwt.Utils import cell_to_rowcol2
import datetime as dt
import excel

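# Instantiate a workbook
book = xlwt.Workbook()

# Add a sheet and give it a name
sheet = book.add_sheet('Sheet1')

# Write to cells using A1 notation or cell indices (0-based)
sheet.write(*cell_to_rowcol2("A1"), "Hello 1")
sheet.write(r=1, c=0, label="Hello 2")

# Formatting: font, alignment, borders, fill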
formatting = xlwt.easyxf("font: bold on, color red;"
                         "align: horiz center;"
                         "borders: top_color red, bottom_color red,"
                                  "right_color red, left_color red,"
                                  "left thin, right thin,"
                                  "top thin, bottom thin;"
                         "pattern: pattern solid, fore_color yellow;")
sheet.write(r=2, c=0, label="Hello 3", style=formatting)

# Number formatting
number_format = xlwt.easyxf(num_format_str="0.00")
sheet.write(3, 0, 3.3333, number_format)

# Date formatting
date_format = xlwt.easyxf(num_format_str="mm/dd/yyyy")
sheet.write(4, 0, dt.datetime(2012, 2, 3), date_format)

# Formula
sheet.write(5, 0, xlwt.Formula("SUM(A4, 2)"))

# Two-dimensional list
data = [[None, "North", "South"],
        ["Last Year", 2, 5],
        ["This Year", 3, 6]]
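# excel.write(sheet, data, "A10") did not run here, so write the
# nested list cell by cell, with its top-left corner at A10
first_row, first_col = cell_to_rowcol2("A10")  # 0-based: (9, 0)

for i, row in enumerate(data):
    for j, value in enumerate(row):
        sheet.write(first_row + i, first_col + j, value)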




# Add an image
sheet.insert_bitmap("../ori_writer/images/python.bmp", 0, 2)

# Save the workbook
book.save("xlwt.xls")

Editing files with xlutils

xlutils acts as a bridge between xlrd and xlwt, which makes it possible to edit xls files.




In [46]:
import xlutils.copy
book = xlrd.open_workbook("../ori_writer/xl/stores.xls", formatting_info=True)
book = xlutils.copy.copy(book)
book.get_sheet(0).write(0, 0, "changed!")
book.save("stores_edited.xls")

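## 2. Advanced topics for the reader and writer packages

Python is often brought in to deal with large Excel files, where the default settings may no longer be enough. This section shows how to handle large Excel files.

### 2.1 Working with large Excel files
Large Excel files can cause two problems: (1) reading and writing may be slow, and (2) the computer may run out of memory, which crashes the program.

#### 1. Writing files with OpenPyXL

When writing large files with OpenPyXL, make sure the lxml package is installed, since it speeds up writing. In addition, the `write_only=True` flag keeps memory consumption low. However, it forces you to write row by row with the `append` method, **and it no longer allows writing single cells**.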



In [47]:
book = openpyxl.Workbook(write_only=True)
# Create a sheet
sheet = book.create_sheet()
# This will produce a sheet with 1000 x 200 cells
for row in range(1000):
    sheet.append(list(range(200)))
book.save("openpyxl_optimized.xlsx")
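
Since write-only mode accepts whole rows only, data that is addressed cell by cell has to be assembled into rows before it can be passed to `append`. One way to do this is sketched below. `iter_rows_from_cells` is a hypothetical helper, and the dictionary with 1-based `(row, column)` keys is an assumed input format.

```python
def iter_rows_from_cells(cells):
    """Yield dense rows from a {(row, column): value} dict with 1-based keys."""
    if not cells:
        return
    n_rows = max(row for row, _ in cells)
    n_cols = max(col for _, col in cells)
    for r in range(1, n_rows + 1):
        # Positions without an entry stay None
        yield [cells.get((r, c)) for c in range(1, n_cols + 1)]
```

Each yielded row can then be handed to the write-only sheet, for example `for row in iter_rows_from_cells(cells): sheet.append(row)`.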

#### 2. Writing files with XlsxWriter
XlsxWriter has a similar option for large files, `constant_memory`, which also forces you to write row by row.

In [48]:
book = xlsxwriter.Workbook("xlsxwriter_optimized.xlsx",
                           options={"constant_memory": True})
sheet = book.add_worksheet()
# This will produce a sheet with 1000 x 200 cells
for row in range(1000):
    sheet.write_row(row , 0, list(range(200)))
book.close()

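#### 3. Reading files with xlrd

With `on_demand=True`, xlrd loads a sheet only when it is requested instead of loading the whole workbook up front.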


In [49]:
with xlrd.open_workbook("../ori_writer/xl/stores.xls", on_demand=True) as book:
    with pd.ExcelFile(book, engine="xlrd") as f:
        df = pd.read_excel(f, sheet_name=0)

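#### 4. Reading files with OpenPyXL

To keep memory under control when reading large Excel files with OpenPyXL, load the workbook with `read_only=True`. Since OpenPyXL does not support the `with` statement, make sure to close the file once you are done.

If your file contains links to external workbooks, you can also pass `keep_links=False` to speed up loading. By default, `keep_links` makes sure that references to external workbooks are not lost.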

In [50]:
book = openpyxl.load_workbook("../ori_writer/xl/big.xlsx",
                              data_only=True, read_only=True,
                              keep_links=False)
book.close()  # Required with read_only=True
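
One way to make sure the workbook is always closed, even when an exception is raised while reading, is `contextlib.closing` from the standard library. The sketch below is an illustration under assumptions: `count_rows` is a hypothetical helper, and its `loader` parameter exists only so that the function can be tried without a real file.

```python
from contextlib import closing


def count_rows(path, sheet_name, loader=None):
    """Count the rows of one sheet and close the workbook afterwards."""
    if loader is None:
        import openpyxl  # imported lazily; only needed for real files
        loader = openpyxl.load_workbook
    # closing() calls book.close() on exit, even if an exception is raised
    with closing(loader(path, read_only=True, data_only=True,
                        keep_links=False)) as book:
        sheet = book[sheet_name]
        # In read-only mode, rows are streamed instead of loaded at once
        return sum(1 for _ in sheet.iter_rows(values_only=True))
```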

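#### 5. Reading sheets in parallel

In practice you often need to read several sheets of a large workbook, which can take a long time. How can this be sped up?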


In [51]:
%%time
data = pd.read_excel('../ori_writer/xl/big.xlsx', sheet_name=None, engine='openpyxl')


CPU times: total: 13.5 s
Wall time: 59.9 s


In [59]:
# Read the sheets in parallel
import time
import parallel_pandas

s = time.time()
data = parallel_pandas.read_excel('../ori_writer/xl/big.xlsx', sheet_name=None)
print(time.time()-s)

12.77919602394104
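
The `parallel_pandas` module is imported above, but its code is not shown. The general idea can be sketched with the standard library: read every sheet in its own worker process and collect the results in a dictionary. This is not the module's actual implementation. `read_sheets_parallel`, `_read_sheet`, and the `reader` and `executor_cls` parameters are hypothetical names chosen for this illustration.

```python
import concurrent.futures as cf


def _read_sheet(path, sheet_name):
    """Read a single sheet into a DataFrame (runs inside a worker)."""
    import pandas as pd  # imported lazily inside the worker
    return pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl")


def read_sheets_parallel(path, sheet_names, reader=_read_sheet,
                         executor_cls=cf.ProcessPoolExecutor):
    """Return {sheet_name: result}, reading each sheet in its own worker."""
    with executor_cls() as executor:
        # Submit one task per sheet, then wait for all of them
        futures = {name: executor.submit(reader, path, name)
                   for name in sheet_names}
        return {name: future.result() for name, future in futures.items()}
```

With process pools, the worker function has to be importable by the child processes, so in a notebook it is usually safer to keep such code in a separate `.py` module, as is done with `parallel_pandas` here.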


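Now that we know how to read large files, we look at how to combine pandas with the low-level reader and writer packages to improve the default formatting when writing a DataFrame to Excel.

### 2.2 Formatting DataFrames in Excel

1. Formatting a DataFrame's index and headers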
   

In [60]:
df = pd.DataFrame({"col1": [1, -2], "col2": [-3, 4]},
                   index=["row1", "row2"])
df.index.name = "ix"
df

      col1  col2
ix
row1     1    -3
row2    -2     4


In [61]:
# Format the index and headers with OpenPyXL
from openpyxl.styles import PatternFill

with pd.ExcelWriter("formatting_openpyxl.xlsx",
                    engine="openpyxl") as writer:
    # Write out the df with the default formatting to A1
    df.to_excel(writer, startrow=0, startcol=0)

    # 1. Write out the data part of the df again, starting at the given offset
    startrow, startcol = 0, 5
    df.to_excel(writer, header=False, index=False,
                startrow=startrow + 1, startcol=startcol + 1)

    # Get the sheet object and create a fill style
    sheet = writer.sheets["Sheet1"]
    style = PatternFill(fgColor="D9D9D9", fill_type="solid")

    # 2. Write out the styled column headers
    for i, col in enumerate(df.columns):
        sheet.cell(row=startrow + 1, column=i + startcol + 2,
                   value=col).fill = style

    # 3. Write out the styled index
    index = [df.index.name if df.index.name else None] + list(df.index)
    for i, row in enumerate(index):
        sheet.cell(row=i + startrow + 1, column=startcol + 1,
                   value=row).fill = style

In [None]:
# Format the index and headers with XlsxWriter
with pd.ExcelWriter("formatting_xlsxwriter.xlsx",
                    engine="xlsxwriter") as writer:
    # Write out the df with the default formatting to A1
    df.to_excel(writer, startrow=0, startcol=0)

    # Write out the df with custom index/header formatting to F1
    startrow, startcol = 0, 5
    # 1. Write out the data part of the DataFrame
    df.to_excel(writer, header=False, index=False,
                startrow=startrow + 1, startcol=startcol + 1)
    # Get the book and sheet object and create a style object
    book = writer.book
    sheet = writer.sheets["Sheet1"]
    style = book.add_format({"bg_color": "#D9D9D9"})

    # 2. Write out the styled column headers
    for i, col in enumerate(df.columns):
        sheet.write(startrow, startcol + i + 1, col, style)

    # 3. Write out the styled index
    index = [df.index.name if df.index.name else None] + list(df.index)
    for i, row in enumerate(index):
        sheet.write(startrow + i, startcol, row, style)

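2. Formatting the data part of a DataFrame
   
   OpenPyXL can apply a format to each individual cell.

   XlsxWriter, when used through pandas, can only apply formats to whole rows or columns.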

   

In [62]:
# Format individual cells with OpenPyXL

from openpyxl.styles import Alignment

with pd.ExcelWriter("data_format_openpyxl.xlsx",
                    engine="openpyxl") as writer:
    # Write out the DataFrame
    df.to_excel(writer)
    
    # Get the book and sheet objects
    book = writer.book
    sheet = writer.sheets["Sheet1"]
    
    # Formatting individual cells
    nrows, ncols = df.shape
    for row in range(nrows):
        for col in range(ncols):
            # +1 to account for the header/index
            # +1 since OpenPyXL is 1-based
            cell = sheet.cell(row=row + 2,
                              column=col + 2)
            cell.number_format = "0.000"
            cell.alignment = Alignment(horizontal="center")

In [63]:
# Format whole columns with XlsxWriter
with pd.ExcelWriter("data_format_xlsxwriter.xlsx",
                    engine="xlsxwriter") as writer:
    # Write out the DataFrame
    df.to_excel(writer)

    # Get the book and sheet objects
    book = writer.book
    sheet = writer.sheets["Sheet1"]
    
    # Formatting the columns (individual cells can't be formatted)
    number_format = book.add_format({"num_format": "0.000",
                                     "align": "center"})
    sheet.set_column(first_col=1, last_col=2,
                     cell_format=number_format)