A quick and simple way to load a small Excel file: copy its cells and read them from the clipboard.

In [8]:
import pandas as pd

df = pd.read_clipboard()
df

Unnamed: 0,ZN-870-29,Realcube,2019/3/5,shirt,19,17,$323.00
0,JQ-501-63,Zooxo,2019/7/9,book,30,14,$420.00
1,FI-165-58,Dabtype,2019/8/12,poster,7,23,$161.00
2,XP-005-55,Skipfire,2019/11/18,pen,7,29,$203.00
3,NB-917-18,Bluezoom,2019/4/18,poster,36,19,$684.00
4,MI-696-11,Zooveo,2019/10/17,pen,-1,30,$(30.00)
5,MQ-907-02,Babbleset,2019/10/27,poster,30,21,$630.00
6,NX-102-26,Fliptune,2019/10/16,book,40,28,"$1,120.00"
7,LE-516-00,Buzzbean,2019/6/17,poster,-3,16,$(48.00)
8,VD-518-20,Dabshots,2019/3/12,shirt,19,28,$532.00
9,OS-688-56,Fiveclub,2019/1/25,book,39,22,$858.00
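
In the output above the first copied row ended up as the column header, which suggests the header row was not part of the selection. The following is a minimal sketch of one way to handle that, relying on `read_clipboard` forwarding its keyword arguments to `read_csv`. The clipboard is simulated with a tab-separated string so the example runs without a clipboard; the last two column names (`price`, `extended_amount`) are assumptions and are not taken from the workbook.

```python
import io

import pandas as pd

# Tab-separated text, similar to what a spreadsheet puts on the clipboard
# when the header row is NOT part of the copied selection.
clipboard_text = (
    "ZN-870-29\tRealcube\t2019/3/5\tshirt\t19\t17\t$323.00\n"
    "JQ-501-63\tZooxo\t2019/7/9\tbook\t30\t14\t$420.00\n"
)

# The first five names appear elsewhere in this notebook;
# the last two are assumed for illustration.
names = ["invoice", "company", "purchase_date", "product",
         "quantity", "price", "extended_amount"]

# With a real clipboard the equivalent call would be:
#     df = pd.read_clipboard(header=None, names=names)
clip_df = pd.read_csv(io.StringIO(clipboard_text), sep="\t",
                      header=None, names=names)
print(clip_df)
```

With `header=None`, no data row is consumed as the header, and `names` supplies the labels explicitly.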


It is possible to load only specific rows or columns from a spreadsheet.

`openpyxl` is a Python library to read/write Excel 2010 xlsx/xlsm files.

In [12]:
import openpyxl

df = pd.read_excel('sample_sales.xlsx', usecols="A:C", engine='openpyxl')
df.head()

Unnamed: 0,invoice,company,purchase_date
0,ZN-870-29,Realcube,2019-03-05
1,JQ-501-63,Zooxo,2019-07-09
2,FI-165-58,Dabtype,2019-08-12
3,XP-005-55,Skipfire,2019-11-18
4,NB-917-18,Bluezoom,2019-04-18
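
The cell above restricts columns only. Rows can be limited as well with the `skiprows` and `nrows` parameters of `read_excel`. The sketch below builds a small in-memory workbook as a hypothetical stand-in for `sample_sales.xlsx`, so the invoice numbers are made up; it assumes `openpyxl` is installed.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample_sales.xlsx, written to an in-memory buffer.
demo = pd.DataFrame({
    "invoice": ["A-1", "A-2", "A-3", "A-4", "A-5"],
    "company": ["Realcube", "Zooxo", "Dabtype", "Skipfire", "Bluezoom"],
    "purchase_date": ["2019-03-05", "2019-07-09", "2019-08-12",
                      "2019-11-18", "2019-04-18"],
    "product": ["shirt", "book", "poster", "pen", "poster"],
})
buffer = io.BytesIO()
demo.to_excel(buffer, index=False, engine="openpyxl")
buffer.seek(0)

# Keep columns A:C, skip file rows 1 and 2 (row 0 is the header),
# then read at most two data rows.
subset = pd.read_excel(buffer, usecols="A:C", skiprows=[1, 2],
                       nrows=2, engine="openpyxl")
print(subset)
```

Passing a list to `skiprows` keeps the header row in place, whereas an integer would skip that many rows from the top of the sheet, header included.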


Combine multiple boolean filters to select rows, similar to a WHERE clause in a SQL statement.

In [29]:
df = pd.read_excel('sample_sales.xlsx', engine='openpyxl')

date_filter = df['purchase_date'].between('2019/3/5', '2019/3/15')
qty_filter = df['quantity'] >= 5
product_filter = df['product'].isin(['pen'])

df.loc[date_filter & qty_filter & product_filter, 'invoice']

11     KI-908-67
76     KS-847-13
89     KN-509-22
91     MY-548-03
243    PA-747-56
Name: invoice, dtype: object
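
The same pattern can be checked on a tiny frame where the expected result is easy to verify by hand. This is a sketch with made-up rows, not data from `sample_sales.xlsx`. It also shows that a comparison written inline needs parentheses, because `&` binds more tightly than `>=`.

```python
import pandas as pd

# Made-up rows for illustration only.
sales = pd.DataFrame({
    "invoice": ["A-1", "A-2", "A-3", "A-4"],
    "purchase_date": pd.to_datetime(
        ["2019-03-04", "2019-03-05", "2019-03-10", "2019-03-15"]),
    "product": ["pen", "pen", "book", "pen"],
    "quantity": [10, 3, 8, 5],
})

mask = (
    # between() is inclusive on both ends by default
    sales["purchase_date"].between("2019-03-05", "2019-03-15")
    # parentheses are required around an inline comparison
    & (sales["quantity"] >= 5)
    & sales["product"].isin(["pen"])
)
selected = sales.loc[mask, "invoice"]
print(selected.tolist())
```

Only `A-4` satisfies all three conditions: `A-1` falls outside the date range, `A-2` has a quantity below 5, and `A-3` is not a pen.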

`pd.pivot_table` and `pd.crosstab` are similar functions in pandas.

`pivot_table` takes a DataFrame as its input.

`crosstab` takes array-like inputs such as DataFrame columns (Series) or NumPy arrays, and can also normalize the result.

In [36]:
pivot_table = pd.pivot_table(df, index="company", columns="product", values=["quantity"], aggfunc="sum")
cross_tab = pd.crosstab(df["company"], df["product"], values=df["quantity"], aggfunc="sum")
print(pivot_table)
print(cross_tab)

          quantity                   
product       book   pen poster shirt
company                              
Abatz         64.0   7.0   39.0   NaN
Agivu         11.0   NaN    NaN  20.0
Aibox          2.0  46.0    NaN   NaN
Ailane        25.0  -3.0    0.0   NaN
Aimbo          NaN  34.0    NaN  -5.0
...            ...   ...    ...   ...
Zoonoodle     17.0  23.0    NaN  14.0
Zooveo         NaN  12.0   21.0  13.0
Zoovu         15.0   NaN    NaN  -2.0
Zooxo         30.0   NaN    NaN  85.0
Zoozzy         NaN  31.0   31.0  23.0

[351 rows x 4 columns]
product    book   pen  poster  shirt
company                             
Abatz      64.0   7.0    39.0    NaN
Agivu      11.0   NaN     NaN   20.0
Aibox       2.0  46.0     NaN    NaN
Ailane     25.0  -3.0     0.0    NaN
Aimbo       NaN  34.0     NaN   -5.0
...         ...   ...     ...    ...
Zoonoodle  17.0  23.0     NaN   14.0
Zooveo      NaN  12.0    21.0   13.0
Zoovu      15.0   NaN     NaN   -2.0
Zooxo      30.0   NaN     NaN   85.0
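
A small sketch, on made-up rows, of how the two calls line up. Passing `values` as a plain string instead of a list avoids the extra `quantity` column level seen in the first printed table, which makes the two results directly comparable.

```python
import pandas as pd

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "company": ["Abatz", "Abatz", "Agivu", "Agivu", "Abatz"],
    "product": ["book", "pen", "book", "shirt", "book"],
    "quantity": [60, 7, 11, 20, 4],
})

# values as a string (not a list) gives single-level columns.
pivot = pd.pivot_table(orders, index="company", columns="product",
                       values="quantity", aggfunc="sum")
cross = pd.crosstab(orders["company"], orders["product"],
                    values=orders["quantity"], aggfunc="sum")

# Compare cell by cell, treating missing combinations as 0.
same = bool((pivot.fillna(0).to_numpy() == cross.fillna(0).to_numpy()).all())
print(pivot)
print(same)
```

Combinations that never occur, such as Abatz and shirt here, show up as `NaN` in both tables.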


In [40]:
import numpy as np

a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
              "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)
cross_tab = pd.crosstab(a, b)
cross_tab_norm = pd.crosstab(a, b, normalize='columns')
print(cross_tab)
print()
print(cross_tab_norm)

col_0  one  two
row_0          
bar      3    1
foo      4    3

col_0       one   two
row_0                
bar    0.428571  0.25
foo    0.571429  0.75
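
To make the normalization concrete: with `normalize='columns'` each count is divided by its column total, so `bar`/`one` becomes 3/7 ≈ 0.428571. The sketch below reproduces that by hand using the same arrays; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
              "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)

counts = pd.crosstab(a, b)
by_column = pd.crosstab(a, b, normalize="columns")

# Manual equivalent: divide every column by its own total.
manual = counts / counts.sum(axis=0)

matches = bool(np.allclose(by_column.to_numpy(), manual.to_numpy()))
print(matches)
```

`normalize` also accepts `'index'` (divide by row totals) and `'all'` (divide by the grand total).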


A `pd.Grouper` lets the user specify a groupby instruction for an object, such as grouping a datetime column into weekly bins with `freq`.

In [46]:
df = pd.DataFrame(
    {
        "Publish date": [
            pd.Timestamp("2000-01-02"),
            pd.Timestamp("2000-01-02"),
            pd.Timestamp("2000-01-09"),
            pd.Timestamp("2000-01-16"),
        ],
        "ID": [0, 1, 2, 3],
        "Price": [10, 20, 30, 40],
    }
)

df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()

Publish date,ID,Price
2000-01-02,0.5,15.0
2000-01-09,2.0,30.0
2000-01-16,3.0,40.0
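
For time-based grouping like this, `DataFrame.resample` with the `on` argument is a commonly used alternative. The sketch below rebuilds the same frame under a different name (`books`, chosen only for this example) and compares the two approaches. Weekly bins are labeled by the Sunday that closes each week, which is why the dates in the example appear unchanged in the result.

```python
import pandas as pd

books = pd.DataFrame(
    {
        "Publish date": [
            pd.Timestamp("2000-01-02"),
            pd.Timestamp("2000-01-02"),
            pd.Timestamp("2000-01-09"),
            pd.Timestamp("2000-01-16"),
        ],
        "ID": [0, 1, 2, 3],
        "Price": [10, 20, 30, 40],
    }
)

by_grouper = books.groupby(pd.Grouper(key="Publish date", freq="W")).mean()
by_resample = books.resample("W", on="Publish date").mean()

# Each weekly bin is labeled with the day that ends the week.
print(by_grouper.index.day_name().tolist())
print(by_resample)
```

`Grouper` is the more flexible choice when the time bins need to be combined with other keys in a single `groupby` call.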


Read tables defined inside an Excel sheet.

In [1]:
from openpyxl import load_workbook

In [2]:
# Retrieve the workbook.
wb = load_workbook(filename='sample_sales.xlsx')
wb.sheetnames

['sales data']

In [3]:
# Retrieve the sheet.
sheet = wb['sales data']
sheet

<Worksheet "sales data">

In [6]:
# Retrieve the table.
table1 = sheet.tables['Table1']
table1

<openpyxl.worksheet.table.Table object>
Parameters:
id=1, name='Table1', displayName='Table1', comment=None, ref='K1:O6', tableType=None, headerRowCount=1, insertRow=None, insertRowShift=None, totalsRowCount=None, totalsRowShown=False, published=None, headerRowDxfId=0, dataDxfId=None, totalsRowDxfId=None, headerRowBorderDxfId=None, tableBorderDxfId=None, totalsRowBorderDxfId=None, headerRowCellStyle=None, dataCellStyle=None, totalsRowCellStyle=None, connectionId=None, autoFilter=<openpyxl.worksheet.filters.AutoFilter object>
Parameters:
ref='K1:O6', filterColumn=[], sortState=None, sortState=None, tableColumns=[<openpyxl.worksheet.table.TableColumn object>
Parameters:
id=1, uniqueName=None, name='invoice', totalsRowFunction=None, totalsRowLabel=None, queryTableFieldId=None, headerRowDxfId=None, dataDxfId=None, totalsRowDxfId=None, headerRowCellStyle=None, dataCellStyle=None, totalsRowCellStyle=None, calculatedColumnFormula=None, totalsRowFormula=None, xmlColumnPr=None, extLst=None, <op

In [7]:
# Get the cell range (location) of the table.
table1.ref

'K1:O6'
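
Once the range is known, the cells can be pulled into a DataFrame. The following is a minimal sketch under stated assumptions: it builds a small workbook in memory instead of opening `sample_sales.xlsx`, so the table sits at `A1:B3` rather than `K1:O6`, and the `quantity` column is chosen only for illustration.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.worksheet.table import Table

# Build a small in-memory workbook with one table (illustrative data).
wb_demo = Workbook()
ws = wb_demo.active
ws.title = "sales data"
for row in [("invoice", "quantity"), ("ZN-870-29", 19), ("JQ-501-63", 30)]:
    ws.append(row)
ws.add_table(Table(displayName="Table1", ref="A1:B3"))

# Look the table up by name, then slice the sheet by its range.
table = ws.tables["Table1"]
cells = ws[table.ref]  # tuple of rows, each row a tuple of Cell objects

# First row holds the header, the rest is data.
header, *body = [[cell.value for cell in row] for row in cells]
table_df = pd.DataFrame(body, columns=header)
print(table_df)
```

The same slicing should apply to `sheet[table1.ref]` from the cells above, provided the table has a single header row (`headerRowCount=1`, as shown in its parameters).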