# Reading Excel files

To read Excel files with Python, we'll use the openpyxl module.
The term workbook refers to an Excel file.
First let's open the file.

In [37]:
import openpyxl

wb = openpyxl.load_workbook('store.xlsx')
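`load_workbook()` accepts a file-like object as well as a path. If you don't have `store.xlsx` handy, here is a minimal sketch that builds a tiny stand-in in memory and loads it back. The sheet contents are made up for illustration and are not the real file.

```python
from io import BytesIO

import openpyxl

# Build a small hypothetical stand-in for store.xlsx (made-up contents).
wb_new = openpyxl.Workbook()
products = wb_new.active
products.title = 'Products'
products.append([None, 'Product Name', 'Total Units'])
products.append([1, 'Mobile Phone', 15])
wb_new.create_sheet('Sales 2018')

# Save to an in-memory buffer instead of a file on disk.
buffer = BytesIO()
wb_new.save(buffer)
buffer.seek(0)

# load_workbook() reads from the buffer just like from a path.
wb = openpyxl.load_workbook(buffer)
print(wb.sheetnames)  # ['Products', 'Sales 2018']
```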

Now let's see which worksheets it contains.

In [38]:
wb.sheetnames

['Products', 'Sales 2018']

The workbook object is iterable: looping over it yields each worksheet. Check it out!

In [39]:
for sheet in wb:
    print(sheet.title)

Products
Sales 2018
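Besides iterating, `wb.worksheets` gives you the sheets as a plain list, so you can index it or take its length. A small sketch on a hypothetical two-sheet workbook built in memory:

```python
import openpyxl

# Hypothetical in-memory workbook with two sheets.
wb = openpyxl.Workbook()
wb.active.title = 'Products'
wb.create_sheet('Sales 2018')

# wb.worksheets is a list, so it supports indexing and len().
first = wb.worksheets[0]
print(first.title, len(wb.worksheets))  # Products 2

# enumerate() pairs each sheet with its position in the workbook.
for index, ws in enumerate(wb):
    print(index, ws.title)
```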


And this is how you access a worksheet directly: pass its name as the index.

In [40]:
sheet = wb['Products']
sheet

<Worksheet "Products">
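Indexing with a name that doesn't exist raises a `KeyError`. One way to guard against that is to check `wb.sheetnames` first. The `get_sheet` helper below is hypothetical, just a sketch of the idea:

```python
import openpyxl

wb = openpyxl.Workbook()
wb.active.title = 'Products'


def get_sheet(workbook, name):
    """Hypothetical helper: return the named sheet, or None if it doesn't exist."""
    if name in workbook.sheetnames:
        return workbook[name]
    return None


print(get_sheet(wb, 'Products'))    # <Worksheet "Products">
print(get_sheet(wb, 'Sales 2019'))  # None

# Indexing the workbook directly with an unknown name raises KeyError.
try:
    wb['Sales 2019']
except KeyError as exc:
    print('KeyError:', exc)
```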

`wb.active` returns the active worksheet, which is the sheet that was selected when the workbook was last saved. You can also change it by assigning a worksheet to `wb.active`.

In [41]:
sheet = wb.active
print(sheet)

wb.active = wb['Products']
sheet = wb.active
print(sheet)

<Worksheet "Sales 2018">
<Worksheet "Products">
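`wb.active` can also be set with a 0-based sheet index instead of a worksheet object. A short sketch on a hypothetical in-memory workbook:

```python
import openpyxl

wb = openpyxl.Workbook()
wb.active.title = 'Products'
wb.create_sheet('Sales 2018')

# Select the active sheet by its 0-based position...
wb.active = 1
print(wb.active.title)  # Sales 2018

# ...or by passing the worksheet object itself.
wb.active = wb['Products']
print(wb.active.title)  # Products
```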


OK. Now that we know how to select the sheet we'll be working with, let's see how to get the values in its cells.

In [42]:
print(sheet['b1'].value)
print(sheet['b2'].value)

Product Name
Mobile Phone
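A cell that was never written returns `None`. One caveat worth knowing: looking a cell up this way creates it in the worksheet, which can inflate the sheet's reported size. A small sketch with made-up contents:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws['B1'] = 'Product Name'

print(ws['B1'].value)   # Product Name
# Reading a cell that was never written returns None...
print(ws['Z99'].value)  # None
# ...but the lookup itself created the cell, so the sheet now reports 99 rows.
print(ws.max_row)       # 99
```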


And here is another way to access the content of a cell: `sheet.cell()` with 1-based row and column numbers.

In [43]:
sheet.cell(row=2, column=2).value

'Mobile Phone'
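`sheet.cell()` takes column numbers, while `sheet['B2']` takes letters. The helpers in `openpyxl.utils` convert between the two, and both lookups return the same cell object. A sketch with a made-up value in B2:

```python
import openpyxl
from openpyxl.utils import column_index_from_string, get_column_letter

# Convert between column numbers and letters.
print(get_column_letter(2))           # B
print(column_index_from_string('B'))  # 2
print(get_column_letter(28))          # AB

# Both access styles point at the very same Cell object.
wb = openpyxl.Workbook()
ws = wb.active
ws['B2'] = 'Mobile Phone'
print(ws.cell(row=2, column=2) is ws['B2'])  # True
```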

Going the other way, you can get the coordinates (row and column numbers) of a cell. This is how it's done.

In [44]:
cell = sheet['a2']
print(cell.row, cell.column)

2 1
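A cell also knows its own coordinate string and column letter, and `coordinate_to_tuple()` converts a coordinate string to `(row, column)` without touching a sheet at all. A minimal sketch:

```python
import openpyxl
from openpyxl.utils.cell import coordinate_to_tuple

wb = openpyxl.Workbook()
ws = wb.active
cell = ws['A2']

print(cell.coordinate)     # A2
print(cell.column_letter)  # A
# Parse a coordinate string straight into (row, column).
print(coordinate_to_tuple('C11'))  # (11, 3)
```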


It's possible to check the type of data a cell holds (`n` for numeric, `s` for string) and the text encoding too.

In [45]:
cell_a2 = sheet['a2']
cell_b2 = sheet['b2']

print(cell_a2.value, cell_a2.data_type, cell_a2.encoding)
print(cell_b2.value, cell_b2.data_type, cell_b2.encoding)

1 n utf-8
Mobile Phone s utf-8
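Numbers and strings aren't the only codes `data_type` can report. Here is a sketch on a hypothetical sheet showing a few more: a Python `bool` is stored as `b`, and a string starting with `=` is treated as a formula, `f`.

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws['A1'] = 42              # number
ws['A2'] = 'Mobile Phone'  # string
ws['A3'] = True            # boolean
ws['A4'] = '=A1*2'         # formula (a string starting with '=')

# data_type codes: 'n' numeric, 's' string, 'b' boolean, 'f' formula.
for row in ws['A1':'A4']:
    cell = row[0]
    print(cell.coordinate, repr(cell.value), cell.data_type)
```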


Want to check which worksheet a cell belongs to? Use its `parent` attribute.

In [46]:
cell_a2.parent

<Worksheet "Products">
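The chain continues one level up: a worksheet's `parent` is the workbook that holds it. A short sketch on a hypothetical in-memory workbook:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws.title = 'Products'
cell = ws['A2']

# A cell's parent is its worksheet; a worksheet's parent is the workbook.
print(cell.parent.title)         # Products
print(cell.parent.parent is wb)  # True
```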

And there's a lot more about a cell that you can check.

In [47]:
dir(cell_a2)

['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '_bind_value',
 '_comment',
 '_hyperlink',
 '_style',
 '_value',
 'alignment',
 'base_date',
 'border',
 'check_error',
 'check_string',
 'col_idx',
 'column',
 'column_letter',
 'comment',
 'coordinate',
 'data_type',
 'encoding',
 'fill',
 'font',
 'guess_types',
 'has_style',
 'hyperlink',
 'internal_value',
 'is_date',
 'number_format',
 'offset',
 'parent',
 'pivotButton',
 'protection',
 'quotePrefix',
 'row',
 'style',
 'style_id',
 'value']
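Most of that list is Python plumbing. Filtering out names that start with an underscore leaves just the public attributes, which is usually what you want to explore. A minimal sketch:

```python
import openpyxl

wb = openpyxl.Workbook()
cell = wb.active['A2']

# Hide dunder and private names to keep only the public attributes.
public = [name for name in dir(cell) if not name.startswith('_')]
print(public)
```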

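Do you want to read a range of cells? No problem! Slicing the sheet returns a tuple of rows, each of them a tuple of cells, so you can unpack every row directly.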

In [54]:
cell_range = sheet['b2':'c11']
for name, units in cell_range:
    print(f'Product: {name.value} \t Units: {units.value}')

Product: Mobile Phone 	 Units: 15
Product: Laptop 	 Units: 15
Product: Smart Watch 	 Units: 50
Product: Fitness Band 	 Units: 30
Product: VR Headset 	 Units: 20
Product: E-Reader 	 Units: 30
Product: Headphones 	 Units: 80
Product: Camera 	 Units: 20
Product: Game Console 	 Units: 25
Product: Video Projector 	 Units: 10
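If you'd rather use numeric bounds, `iter_rows()` does the same job, and `values_only=True` yields plain values instead of cell objects. A sketch on a small hypothetical sheet that mimics the layout above:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws.append([None, 'Product Name', 'Total Units'])
ws.append([1, 'Mobile Phone', 15])
ws.append([2, 'Laptop', 15])

# Rows 2-3, columns B-C (2-3), returned as tuples of plain values.
rows = list(ws.iter_rows(min_row=2, max_row=3, min_col=2, max_col=3, values_only=True))
for name, units in rows:
    print(f'Product: {name} \t Units: {units}')
```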


How big is a sheet? You can check it with these attributes.

In [55]:
print(sheet.dimensions)
print(sheet.max_column)
print(sheet.max_row)

A1:E11
5
11
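The data doesn't have to start at A1. `min_row` and `min_column` tell you where it begins, and `dimensions` reflects that. A sketch with a hypothetical layout whose data starts at B3:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
# Hypothetical layout: data starts at B3 instead of A1.
ws['B3'] = 'Product Name'
ws['C4'] = 15

print(ws.dimensions)              # B3:C4
print(ws.min_row, ws.min_column)  # 3 2
print(ws.max_row, ws.max_column)  # 4 3
```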


Let's show the full content of a workbook!

In [80]:
import openpyxl

# data_only=False (the default) returns formulas as formula strings.
# Set it to True to get the values Excel cached when the file was last saved.
wb = openpyxl.load_workbook('store.xlsx', data_only=False)

for sheet in wb:
    print(f'\n{sheet.title.upper()}')
    for row in sheet.rows:
        for cell in row:
            print(cell.value, end='\t')
        print('')


PRODUCTS
None	Product Name	Total Units	Unit Price	Total Amount	
1	Mobile Phone	15	400	6000	
2	Laptop	15	800	12000	
3	Smart Watch	50	150	7500	
4	Fitness Band	30	100	3000	
5	VR Headset	20	300	6000	
6	E-Reader	30	100	3000	
7	Headphones	80	80	6400	
8	Camera	20	600	12000	
9	Game Console	25	700	17500	
10	Video Projector	10	800	8000	

SALES 2018
None	Total Sales	
January	30000	
February	26000	
March	32000	
April	28000	
May	24000	
June	32000	
July	34000	
August	36000	
September	38000	
October	39000	
November	40000	
December	37000	
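A caveat about `data_only=True`: openpyxl never calculates formulas, it only reads the value Excel cached on its last save. A file that openpyxl itself wrote has no cached value, so the formula cell reads back as `None`. A sketch with a hypothetical one-formula sheet:

```python
from io import BytesIO

import openpyxl

# Hypothetical sheet with one formula, written by openpyxl (not by Excel).
wb = openpyxl.Workbook()
ws = wb.active
ws['A1'] = 15
ws['B1'] = 400
ws['C1'] = '=A1*B1'
buffer = BytesIO()
wb.save(buffer)

# data_only=False returns the formula text.
buffer.seek(0)
formulas = openpyxl.load_workbook(buffer, data_only=False)
print(formulas.active['C1'].value)  # =A1*B1

# data_only=True returns the cached value, and there is none here.
buffer.seek(0)
values = openpyxl.load_workbook(buffer, data_only=True)
print(values.active['C1'].value)  # None
```

If you need computed values from such a file, open and save it in Excel (or compute them in Python) first.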


Perhaps it's more interesting to see each row of the sheet as a tuple of plain values. `sheet.values` does exactly that.

In [85]:
sheet = wb['Products']
for row in sheet.values:
    print(row)

(None, 'Product Name', 'Total Units', 'Unit Price', 'Total Amount')
(1, 'Mobile Phone', 15, 400, 6000)
(2, 'Laptop', 15, 800, 12000)
(3, 'Smart Watch', 50, 150, 7500)
(4, 'Fitness Band', 30, 100, 3000)
(5, 'VR Headset', 20, 300, 6000)
(6, 'E-Reader', 30, 100, 3000)
(7, 'Headphones', 80, 80, 6400)
(8, 'Camera', 20, 600, 12000)
(9, 'Game Console', 25, 700, 17500)
(10, 'Video Projector', 10, 800, 8000)
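Because `sheet.values` is a generator, you can pull off the first row as a header and zip the rest into dictionaries. In the real `store.xlsx` the A1 header is empty, which would give a `None` key; the sketch below uses a hypothetical sheet with an `'ID'` header instead.

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws.append(['ID', 'Product Name', 'Total Units'])  # hypothetical header
ws.append([1, 'Mobile Phone', 15])
ws.append([2, 'Laptop', 15])

# The first tuple from sheet.values is the header row.
rows = ws.values
header = next(rows)
records = [dict(zip(header, row)) for row in rows]
print(records[0])  # {'ID': 1, 'Product Name': 'Mobile Phone', 'Total Units': 15}
```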
