Reading Excel Files
    - Read Excel files (.xlsx) and sheets into pandas DataFrames
    - Export DataFrames to different sheets and Excel files
        - > Using pandas, ExcelWriter and to_excel methods

In [37]:
import pandas as pd

The read_excel Method 
    - read Excel files into a DataFrame
    - supports both XLS and XLSX file extensions from a local filesystem or URL 
    - has a broad set of parameters to configure how the data will be read and parsed
    - parameters very similar to parameters seen with the read_csv method
  
  - The most common parameters are as follows:
        - io: Path of the file to be read (the first positional argument; a URL or file-like object is also accepted).
        - sheet_name: Strings are used for sheet names. Integers are used in zero-indexed sheet positions. Lists of strings/integers are used to request multiple sheets. Specify None to get all sheets.
        - header: Index of the row containing the names of the columns (None if none).
        - index_col: Index of the column or sequence of indexes that should be used as index of rows of the data.
        - names: Sequence containing the names of the columns (used together with header = None).
        - skiprows: Number of rows or sequence of row indexes to ignore in the load.
        - na_values: Sequence of values that, if found in the file, should be treated as NaN.
        - dtype: Dictionary whose keys are column names and whose values are the NumPy types to which the column contents must be converted.
        - parse_dates: Flag that indicates whether pandas should try to parse date-like data as dates. You can pass a list of column names to parse each of those columns as a date; a nested list asks pandas to join several columns into a single date.
        - date_parser: Function to use to try to parse dates (deprecated since pandas 2.0 in favor of date_format).
        - nrows: Number of rows to read from the beginning of the file.
        - skipfooter: Number of rows to ignore at the end of the file.
        - squeeze: Flag that indicates that if the data read only contains one column the result is a Series instead of a DataFrame (deprecated in pandas 1.4 and removed in 2.0; call .squeeze('columns') on the result instead).
        - thousands: Character to use to detect the thousands separator.
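
A minimal sketch of a few of these parameters working together. The sample data, the 'missing' marker and the demo.xlsx file name are made up for illustration, and openpyxl is assumed to be installed:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data standing in for a real workbook.
sample = pd.DataFrame({
    "product_id": ["A1", "B2", "C3", "D4"],
    "brand": ["Sanus", "missing", "DENAQ", "Yamaha"],
    "merchant_id": [1001, 1002, 1001, 1004],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"
    sample.to_excel(path, index=False)

    df = pd.read_excel(
        path,
        sheet_name=0,                    # first sheet, by position
        header=0,                        # first row holds the column names
        na_values=["missing"],           # treat this marker as NaN
        dtype={"merchant_id": "int32"},  # convert this column to int32
        nrows=3,                         # read only the first three data rows
    )

print(df)
print(df.dtypes)
```

Only three of the four rows are loaded, and the 'missing' brand is read as NaN.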


Reading an Excel File
    - read_excel method: pass the path of the Excel file as the first argument (io)
        - any valid string path is acceptable
        - string could be a URL
            -  valid URL schemes include HTTP, FTP, S3, and file. For file URLs, a host is expected
            -  a local file could be: file://localhost/path/to/table.xlsx

In [38]:
# try to read our products.xlsx Excel file.
# it contains product records with their price, brand, description and merchant information on different sheets

df = pd.read_excel('products.xlsx')

In [39]:
df.head()

Unnamed: 0,product_id,price,merchant_id,brand,name
0,AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
1,AVpgMuGwLJeJML43KY_c,69.0,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
2,AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
3,AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
4,AVphUeKeilAPnD_x3-Be,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...


First row behavior with header parameter 
    - the Excel file has the following columns:
          - product_id
          - price
          - merchant_id
          - brand
          - name

In [40]:
# the first row (index 0) of the data holds the column names
# keep the implicit header=0 parameter to let pandas use this first row as the header


pd.read_excel('products.xlsx').head()

Unnamed: 0,product_id,price,merchant_id,brand,name
0,AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
1,AVpgMuGwLJeJML43KY_c,69.0,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
2,AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
3,AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
4,AVphUeKeilAPnD_x3-Be,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...


In [41]:
# override this behavior by explicitly setting header=None: the first row is then read as data
pd.read_excel('products.xlsx',
              header=None).head()

Unnamed: 0,0,1,2,3,4
0,product_id,price,merchant_id,brand,name
1,AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
2,AVpgMuGwLJeJML43KY_c,69,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
3,AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
4,AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
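
The header and names parameters can also be combined to replace the column names stored in the file. The sketch below uses a small made-up workbook (demo.xlsx and the new column names are assumptions, and openpyxl is assumed to be installed):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical two-column sample written to a temporary workbook.
sample = pd.DataFrame({"product_id": ["A1", "B2"], "price": [104.99, 69.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"
    sample.to_excel(path, index=False)

    # header=None: the header row is read as ordinary data (row 0)
    # and the columns are labeled 0, 1, ...
    no_header = pd.read_excel(path, header=None)

    # Skip the original header row and supply our own column names.
    renamed = pd.read_excel(
        path,
        header=None,
        skiprows=1,
        names=["id", "unit_price"],
    )

print(no_header)
print(renamed)
```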


Adding index to data using index_col parameter 
    - by default pandas assigns a numeric auto-incrementing index (row labels) starting at zero
    - case: there is a column that you feel would serve as a better index,
        - > override the default behavior by setting the index_col parameter to that column
        - > index_col: takes a column position or name to set a single column as index, or a list of them to create a MultiIndex

In [42]:
# choose the first column, product_id, as the index by passing its position (0) to index_col, here as a one-element list.

df = pd.read_excel('products.xlsx',
                   index_col=[0])

In [43]:
df.head()

product_id,price,merchant_id,brand,name
AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
AVpgMuGwLJeJML43KY_c,69.0,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
AVphUeKeilAPnD_x3-Be,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...
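
The three forms of index_col (name, position, list) can be compared side by side. This is only a sketch on made-up data; the demo.xlsx file and its values are assumptions, and openpyxl is assumed to be installed:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample: two candidate index columns plus one data column.
sample = pd.DataFrame({
    "merchant_id": [1001, 1001, 1002],
    "product_id": ["A1", "C3", "B2"],
    "price": [104.99, 23.99, 69.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"
    sample.to_excel(path, index=False)

    by_name = pd.read_excel(path, index_col="product_id")  # column name
    by_position = pd.read_excel(path, index_col=1)          # zero-based position
    multi = pd.read_excel(path, index_col=[0, 1])           # list -> MultiIndex

print(by_name)
print(multi)
print(multi.loc[(1001, "C3"), "price"])
```

With the list form, rows are addressed by a (merchant_id, product_id) tuple.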


Selecting Specific Sheets
    - Excel files often have multiple sheets, so the ability to read a specific sheet or all of them is very important
    - the read_excel method takes an argument called sheet_name -> tells pandas which sheet to read the data from
        - either use the sheet name or the sheet number
        - sheet numbers start with zero
        - first sheet will be the one loaded by default 


In [44]:
# You can change sheet by specifying sheet_name parameter
products = pd.read_excel('products.xlsx',
                         sheet_name='Products',
                         index_col='product_id')

In [45]:
products.head()

product_id,price,merchant_id,brand,name
AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
AVpgMuGwLJeJML43KY_c,69.0,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
AVphUeKeilAPnD_x3-Be,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...


In [46]:
merchants = pd.read_excel('products.xlsx',
                          sheet_name='Merchants',
                          index_col='merchant_id')

In [47]:
merchants.head()

merchant_id,merchant
1001,Bestbuy.com
1002,Walmart.com
1003,Bestbuy.com
1004,Growkart
1005,bhphotovideo.com
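
Several sheets can be requested in a single call as well: sheet_name=None returns every sheet, and a list returns only the requested ones, in both cases as a dictionary of DataFrames. A minimal sketch on a made-up two-sheet workbook (file name and data are assumptions; openpyxl assumed installed):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data for a two-sheet workbook.
products = pd.DataFrame({"product_id": ["A1", "B2"], "merchant_id": [1001, 1002]})
merchants = pd.DataFrame({"merchant_id": [1001, 1002],
                          "merchant": ["Bestbuy.com", "Walmart.com"]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"
    with pd.ExcelWriter(path) as writer:
        products.to_excel(writer, sheet_name="Products", index=False)
        merchants.to_excel(writer, sheet_name="Merchants", index=False)

    # None -> dict with one DataFrame per sheet, keyed by sheet name
    all_sheets = pd.read_excel(path, sheet_name=None)

    # list -> dict keyed by exactly the names/positions that were requested
    some_sheets = pd.read_excel(path, sheet_name=[0, "Merchants"])

print(list(all_sheets))
print(list(some_sheets))
```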


The ExcelFile class
    - another approach to reading Excel data
    - uses the ExcelFile class to parse tabular Excel sheets into DataFrame objects
    - ExcelFile lets us work with sheets easily
          - > the workbook is opened only once, so this can be faster than repeated read_excel calls when reading several sheets

In [48]:
excel_file = pd.ExcelFile('products.xlsx')

In [49]:
# can now explore the sheets on that Excel file with sheet_names:

excel_file.sheet_names

['Products', 'Descriptions', 'Merchants']

In [50]:
# parse the specified sheet(s) into a pandas DataFrame using ExcelFile's parse() method

# Each call to parse() takes a sheet_name argument
#       -> indicates which sheet of the Excel file to parse
#       -> if omitted, the first sheet is parsed by default

products = excel_file.parse('Products')

In [51]:
products.head()

Unnamed: 0,product_id,price,merchant_id,brand,name
0,AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
1,AVpgMuGwLJeJML43KY_c,69.0,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
2,AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
3,AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
4,AVphUeKeilAPnD_x3-Be,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...


In [52]:
products = excel_file.parse(sheet_name='Products',
                            header=0,
                            index_col='product_id')

In [53]:
products.head()

product_id,price,merchant_id,brand,name
AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
AVpgMuGwLJeJML43KY_c,69.0,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
AVphUeKeilAPnD_x3-Be,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...


In [54]:
products.dtypes

price          float64
merchant_id      int64
brand           object
name            object
dtype: object

In [55]:
merchants = excel_file.parse('Merchants',
                             index_col='merchant_id')

In [56]:
merchants.head()

merchant_id,merchant
1001,Bestbuy.com
1002,Walmart.com
1003,Bestbuy.com
1004,Growkart
1005,bhphotovideo.com


In [57]:
merchants.dtypes

merchant    object
dtype: object
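
ExcelFile can also be used as a context manager, which closes the workbook once all sheets are parsed. The sketch below loops over sheet_names to load every sheet; the workbook and its contents are made up for illustration, and openpyxl is assumed to be installed:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data for a two-sheet workbook.
products = pd.DataFrame({"product_id": ["A1", "B2"], "price": [104.99, 69.0]})
merchants = pd.DataFrame({"merchant_id": [1001, 1002],
                          "merchant": ["Bestbuy.com", "Walmart.com"]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"
    with pd.ExcelWriter(path) as writer:
        products.to_excel(writer, sheet_name="Products", index=False)
        merchants.to_excel(writer, sheet_name="Merchants", index=False)

    # The workbook is opened once; each parse() call reuses it.
    with pd.ExcelFile(path) as xls:
        names = list(xls.sheet_names)
        frames = {name: xls.parse(name) for name in names}

print(names)
print(frames["Merchants"])
```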

Save to Excel file 
    - saving a DataFrame as an Excel file

In [58]:
products.head()

product_id,price,merchant_id,brand,name
AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
AVpgMuGwLJeJML43KY_c,69.0,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
AVphUeKeilAPnD_x3-Be,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...

In [59]:
# to_excel() method:
#       -> fast, simple way to write a single DataFrame to an Excel file
#       -> requires an output file path.
#       -> the openpyxl library must be installed in order to save Excel files: pip install openpyxl

products.to_excel('out.xlsx')

In [60]:
pd.read_excel('out.xlsx').head()

Unnamed: 0,product_id,price,merchant_id,brand,name
0,AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
1,AVpgMuGwLJeJML43KY_c,69.0,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
2,AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
3,AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
4,AVphUeKeilAPnD_x3-Be,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...


In [61]:
# specify the sheet name with sheet_name parameter:

products.to_excel('out.xlsx',
                  sheet_name='Products')

In [62]:
# Each call to to_excel with a file path rewrites the whole file,
# so a later call with a different sheet name replaces the earlier sheet instead of adding one
# be aware that by dropping the index (index=False), we lose the product_id column

products.to_excel('out.xlsx',
                  index=False)

In [63]:
pd.read_excel('out.xlsx').head()

Unnamed: 0,price,merchant_id,brand,name
0,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
1,69.0,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
2,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
3,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
4,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...
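
The overwrite behavior and the effect of the index parameter can be checked on a throwaway file. This is a sketch on made-up data (demo.xlsx and the two small DataFrames are assumptions; openpyxl assumed installed):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical frames, each indexed by its id column.
products = pd.DataFrame({"price": [104.99, 69.0]},
                        index=pd.Index(["A1", "B2"], name="product_id"))
merchants = pd.DataFrame({"merchant": ["Bestbuy.com", "Walmart.com"]},
                         index=pd.Index([1001, 1002], name="merchant_id"))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"

    # The second call rewrites the file, so only 'Merchants' survives.
    products.to_excel(path, sheet_name="Products")
    merchants.to_excel(path, sheet_name="Merchants")
    with pd.ExcelFile(path) as xls:
        sheets_after = list(xls.sheet_names)

    # index=False drops the product_id labels from the file.
    products.to_excel(path, index=False)
    without_index = pd.read_excel(path)

    # Keeping the index lets us restore it with index_col on the way back.
    products.to_excel(path)
    restored = pd.read_excel(path, index_col="product_id")

print(sheets_after)
print(list(without_index.columns))
print(restored)
```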


Positioning Data with startrow and startcol
    - case: want to insert data into spreadsheet file at position somewhere other than top-left corner

In [64]:
# shift where the to_excel method writes the data by using:
#   - startrow: zero-based row of the upper-left cell (the number of rows left empty above the data)
#   - startcol: zero-based column of the upper-left cell (the number of columns left empty to the left of the data)

In [65]:
products.to_excel('out.xlsx',
                  sheet_name='Products',
                  startrow=1,
                  startcol=2)
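
To see where the data actually lands, the written sheet can be inspected cell by cell with openpyxl. The sketch below uses a made-up DataFrame and file name, assumes openpyxl is installed, and then reads the shifted table back with skiprows and usecols:

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import load_workbook

# Hypothetical frame indexed by product_id.
products = pd.DataFrame({"price": [104.99, 69.0]},
                        index=pd.Index(["A1", "B2"], name="product_id"))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"
    products.to_excel(path, sheet_name="Products", startrow=1, startcol=2)

    # startrow=1 / startcol=2 are zero-based, so the table starts at cell C2.
    wb = load_workbook(path)
    ws = wb["Products"]
    top_left = ws["A1"].value       # nothing is written here
    index_header = ws["C2"].value   # name of the index column
    column_header = ws["D2"].value  # first data column header
    first_label = ws["C3"].value    # first index value
    wb.close()

    # Read the shifted table back: skip the empty row, keep columns C and D.
    shifted = pd.read_excel(path, sheet_name="Products",
                            skiprows=1, usecols="C:D")

print(top_left, index_header, column_header, first_label)
print(shifted)
```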

Saving Multiple Sheets 
    - write multiple sheets and/or multiple DataFrames
        - > need to create an ExcelWriter object
    - ExcelWriter is included in the pandas module
        - > used to open Excel files and handle write operations
        - > works as a context manager, much like the file object returned by Python's built-in open()

In [66]:
writer = pd.ExcelWriter('out.xlsx')

In [67]:
writer

<pandas.io.excel._openpyxl.OpenpyxlWriter at 0x13452fad0>

In [68]:
# Instead of including the file pathname in the to_excel call
# use the ExcelWriter object writer instead

with writer:
    products.to_excel(writer, sheet_name='Products')

In [69]:
pd.read_excel('out.xlsx', sheet_name='Products').head()

Unnamed: 0,product_id,price,merchant_id,brand,name
0,AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
1,AVpgMuGwLJeJML43KY_c,69.0,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
2,AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
3,AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
4,AVphUeKeilAPnD_x3-Be,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...


In [None]:
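# the writer above was closed when its with block ended, so it cannot be reused;
# reopen the file in append mode (supported by the openpyxl engine) to add a Merchants sheet:

with pd.ExcelWriter('out.xlsx', mode='a', engine='openpyxl') as writer:
    merchants.to_excel(writer, sheet_name='Merchants')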
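
Another way to get several sheets into one file is to write all of them inside a single with block, so the writer is still open for every to_excel call. A minimal sketch with made-up data and file name (openpyxl assumed installed):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical frames, each indexed by its id column.
products = pd.DataFrame({"price": [104.99, 69.0]},
                        index=pd.Index(["A1", "B2"], name="product_id"))
merchants = pd.DataFrame({"merchant": ["Bestbuy.com", "Walmart.com"]},
                         index=pd.Index([1001, 1002], name="merchant_id"))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"

    # One writer, one with block, one sheet per DataFrame.
    with pd.ExcelWriter(path) as writer:
        products.to_excel(writer, sheet_name="Products")
        merchants.to_excel(writer, sheet_name="Merchants")

    # Check that both sheets were saved and can be read back.
    with pd.ExcelFile(path) as xls:
        sheets = list(xls.sheet_names)
        merchants_back = xls.parse("Merchants", index_col="merchant_id")

print(sheets)
print(merchants_back)
```

The file is saved when the with block exits, which is why no explicit save or close call appears.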