# Table of Contents
- [Introduction](#Introduction)
  - [Import modules and some sample data](#Import-modules-and-some-sample-data)
  - [Using pandas settings to control output](#Using-pandas-settings-to-control-output)
  - [Third Party Plugins](#Third-Party-Plugins)

# Introduction

IPython, pandas and matplotlib have a number of useful options you can use to make it easier to view and format your data. This notebook collects a bunch of them in one place. I hope this will be a useful reference.

The original blog post is at http://pbpython.com/ipython-pandas-display-tips.html

## Import modules and some sample data

First, run the standard pandas, numpy and matplotlib imports and configure plots to display inline.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline



One of the simple things we can do is override the default CSS to customize our DataFrame output.

This specific example is from [Brandon Rhodes' talk at PyCon](https://www.youtube.com/watch?v=5JnMutdy6Fw "Pandas From The Ground Up").

For the purposes of the notebook, I'm defining the CSS as a variable, but you could just as easily read it in from a file.

In [27]:
CSS = """
body {
    margin: 0;
    font-family: Helvetica;
}
table.dataframe {
    border-collapse: collapse;
    border: none;
}
table.dataframe tr {
    border: none;
}
table.dataframe td, table.dataframe th {
    margin: 0;
    border: 1px solid white;
    padding-left: 0.25em;
    padding-right: 0.25em;
}
table.dataframe th:not(:empty) {
    background-color: #fec;
    text-align: left;
    font-weight: normal;
}
table.dataframe tr:nth-child(2) th:empty {
    border-left: none;
    border-right: 1px dashed #888;
}
table.dataframe td {
    border: 2px solid #ccf;
    background-color: #f4f4ff;
}
"""

Now add this CSS into the current notebook's HTML.

In [28]:
from IPython.display import HTML  # IPython.core.display is the older, deprecated import path
HTML('<style>{}</style>'.format(CSS))
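
As noted above, the CSS could live in a file instead of a variable. The following is a minimal sketch of that variant, not part of the original notebook: the helper name `css_to_style_tag` and the file name `custom.css` are hypothetical, and a temporary file stands in for a real stylesheet so the example runs on its own.

```python
from pathlib import Path
import tempfile


def css_to_style_tag(css_path):
    """Read a stylesheet from disk and wrap it in a <style> element."""
    css_text = Path(css_path).read_text(encoding="utf-8")
    return "<style>{}</style>".format(css_text)


# A temporary file stands in for a real stylesheet (hypothetical name).
with tempfile.TemporaryDirectory() as tmp:
    css_file = Path(tmp) / "custom.css"
    css_file.write_text(
        "table.dataframe td { border: 2px solid #ccf; }", encoding="utf-8"
    )
    style_tag = css_to_style_tag(css_file)

print(style_tag)
# In a notebook, pass the string to IPython.display.HTML(style_tag)
# as the last expression of a cell to apply the styles.
```

Keeping the stylesheet in a file makes it easy to share the same look across several notebooks.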

In [29]:
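from IPython.display import display

pd.set_option('display.width', 250)
pd.set_option('display.precision', 3)
df = pd.DataFrame(np.array([[1, 2, 3], [0.1, 0.001, 0.00001]]),
                  index=('aaa', 'bbb'), columns=('lib', 'qty1', 'qty2'))
# df.style builds a new Styler on each access; this result is discarded,
# so no caption appears in the output below.
df.style.set_caption("Hover to highlight.")
print('             gggggg')
print('')
display(df)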

             gggggg



Unnamed: 0,lib,qty1,qty2
aaa,$1.00,$2.00,$3.00
bbb,$0.10,$0.00,$0.00


In [3]:
SALES = pd.read_csv("add_igv.csv")
SALES.head()

Unnamed: 0,CHROM,POS,ID,REF,ALT,QUAL,FILTER,proband ID(GT:AD:DP:GQ:PL),parents,1KG.afr.freq,...,SOR,SegDup,SiPhy,VQSLOD,VarClass,VarFunc,culprit,In repetitive regions,75bp_mappability,IGV
0,17,59545005,.,ACCCCTTTGGC,A,1848.7,PASS,"JM847(0/1:84,52:136:99:1903,0,9580)","JM848(0/0:35,0:35:99:0,99,1485),JM849(0/0:37,0...",,...,0.722,,,3.35,frameshiftdeletion,exonic,FS,No,1.0,PASS
1,5,23527684,.,G,T,1627.01,PASS,"JM1321(0/1:59,62:121:99:1672,0,1659)","JM1319(0/0:33,0:33:99:0,99,1292),JM1320(0/0:49...",,...,0.717,"Score:0.927918,Name:chr16:90123418",4.067,4.44,nonsynonymousSNV,exonic,FS,No,0.25,PASS
2,13,20600785,.,C,T,1362.83,PASS,"JM630(0/1:44,45:89:99:1408,0,1350)","JM430(0/0:34,0:34:99:0,99,1437),JM431(0/0:34,0...",,...,0.937,,11.759,1.84,stopgain,exonic,ReadPosRankSum,No,1.0,PASS
3,15,41372034,.,G,A,1147.82,PASS,"JM0013(0/1:82,43:125:99:1193,0,2663)","JM576(0/0:33,0:33:99:0,99,1205),JM577(0/0:39,0...",,...,0.593,,,1.28,synonymousSNV,exonic,FS,No,1.0,PASS
4,11,126138600,.,G,T,2533.71,PASS,"FPPH133-01(0/1:150,98:248:99:2579,0,4352)","FPPH133-03(0/0:36,0:36:99:0,99,1485),FPPH133-0...",,...,0.836,,17.832,1.15,nonsynonymousSNV,exonic,QD,No,1.0,PASS


You can see how the CSS is now applied to the DataFrame and how you could easily modify it to customize it to your liking.

Jupyter notebooks do a good job of automatically displaying information, but sometimes you want to force data to display. Fortunately, IPython provides an option: the display function. This is especially useful if you want to display multiple DataFrames from a single cell.

In [70]:
from IPython.display import display

In [71]:
display(SALES.head(2))
display(SALES.tail(2))
display(SALES.describe())

Unnamed: 0,CHROM,POS,ID,...,In repetitive regions,75bp_mappability,IGV
0,17,59545005,.,...,No,$1.00,PASS
1,5,23527684,.,...,No,$0.25,PASS


Unnamed: 0,CHROM,POS,ID,...,In repetitive regions,75bp_mappability,IGV
48,19,10597451,.,...,No,$1.00,PASS
49,16,84115458,.,...,No,$1.00,PASS


Unnamed: 0,POS,QUAL,1KG.afr.freq,...,SiPhy,VQSLOD,75bp_mappability
count,$50.00,$50.00,$2.00,...,$32.00,$50.00,$50.00
mean,$70441545.42,$1262.56,$0.00,...,$13.25,$1.18,$0.96
...,...,...,...,...,...,...,...
75%,$98204291.50,$1582.31,$0.00,...,$16.90,$1.97,$1.00
max,$233274548.00,$6421.90,$0.00,...,$19.55,$4.44,$1.00


## Using pandas settings to control output

Pandas has many different options to control how data is displayed.

You can use display.max_rows to control how many rows are displayed.

In [72]:
pd.set_option("display.max_rows", 4)

In [73]:
SALES

Unnamed: 0,CHROM,POS,ID,...,In repetitive regions,75bp_mappability,IGV
0,17,59545005,.,...,No,$1.00,PASS
1,5,23527684,.,...,No,$0.25,PASS
...,...,...,...,...,...,...,...
48,19,10597451,.,...,No,$1.00,PASS
49,16,84115458,.,...,No,$1.00,PASS


Depending on the data set, you may only want to display a smaller number of columns.

In [74]:
pd.set_option("display.max_columns", 6)

In [75]:
SALES

Unnamed: 0,CHROM,POS,ID,...,In repetitive regions,75bp_mappability,IGV
0,17,59545005,.,...,No,$1.00,PASS
1,5,23527684,.,...,No,$0.25,PASS
...,...,...,...,...,...,...,...
48,19,10597451,.,...,No,$1.00,PASS
49,16,84115458,.,...,No,$1.00,PASS
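
The row and column limits set with `pd.set_option` stay in effect for the rest of the session. If you only want them for one output, `pd.option_context` applies them temporarily. The sketch below is illustrative only: the `demo` DataFrame is made-up stand-in data, not the file loaded above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 50 rows by 10 columns.
demo = pd.DataFrame(
    np.arange(500).reshape(50, 10),
    columns=["c{}".format(i) for i in range(10)],
)

rows_before = pd.get_option("display.max_rows")

# The limits apply only inside the with block.
with pd.option_context("display.max_rows", 4, "display.max_columns", 6):
    truncated = repr(demo)

rows_after = pd.get_option("display.max_rows")

print(truncated)
print(rows_before == rows_after)  # the global setting is untouched
```

Outside the block, `demo` is rendered with whatever global limits were already in place.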


You can control how many decimal places are displayed.

In [76]:
pd.set_option('display.precision', 2)  # full option name; the short 'precision' is ambiguous in recent pandas

In [77]:
SALES

Unnamed: 0,CHROM,POS,ID,...,In repetitive regions,75bp_mappability,IGV
0,17,59545005,.,...,No,$1.00,PASS
1,5,23527684,.,...,No,$0.25,PASS
...,...,...,...,...,...,...,...
48,19,10597451,.,...,No,$1.00,PASS
49,16,84115458,.,...,No,$1.00,PASS


In [78]:
pd.set_option('display.precision', 7)

In [79]:
SALES

Unnamed: 0,CHROM,POS,ID,...,In repetitive regions,75bp_mappability,IGV
0,17,59545005,.,...,No,$1.00,PASS
1,5,23527684,.,...,No,$0.25,PASS
...,...,...,...,...,...,...,...
48,19,10597451,.,...,No,$1.00,PASS
49,16,84115458,.,...,No,$1.00,PASS
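
It is worth keeping in mind that the precision option only changes how floats are rendered, not the stored values. A small sketch with made-up numbers (the `values` DataFrame is hypothetical) illustrates this:

```python
import pandas as pd

# Hypothetical example values, not taken from the data set above.
values = pd.DataFrame({"x": [1.23456789, 2.5]})

# Render with two decimal places, scoped to this block only.
with pd.option_context("display.precision", 2):
    shown = repr(values)

print(shown)

# The underlying data keeps its full precision.
stored = values.loc[0, "x"]
print(stored == 1.23456789)
```

If you need the values themselves rounded, use something like `DataFrame.round` instead of a display option.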


You can also format floating point numbers using display.float_format.

In [80]:
pd.set_option('display.float_format', '{:.2f}'.format)

In [81]:
SALES

Unnamed: 0,CHROM,POS,ID,...,In repetitive regions,75bp_mappability,IGV
0,17,59545005,.,...,No,1.00,PASS
1,5,23527684,.,...,No,0.25,PASS
...,...,...,...,...,...,...,...
48,19,10597451,.,...,No,1.00,PASS
49,16,84115458,.,...,No,1.00,PASS


Note that float_format applies to every float column in every DataFrame. For this data set, prefixing every value with a dollar sign would not be correct, but the next cell shows the effect.

In [82]:
pd.set_option('display.float_format', '${:.2f}'.format)

In [83]:
SALES

Unnamed: 0,CHROM,POS,ID,...,In repetitive regions,75bp_mappability,IGV
0,17,59545005,.,...,No,$1.00,PASS
1,5,23527684,.,...,No,$0.25,PASS
...,...,...,...,...,...,...,...
48,19,10597451,.,...,No,$1.00,PASS
49,16,84115458,.,...,No,$1.00,PASS
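
Because the global float_format hits every float column, one alternative is to format individual columns at render time. This is a sketch of that idea using the `formatters` argument of `DataFrame.to_string`; the `orders` DataFrame and its column names are invented for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical data: only the "price" column is a currency amount.
orders = pd.DataFrame(
    {
        "item": ["widget", "gadget"],
        "price": [1.0, 0.25],
        "weight_kg": [0.5, 2.0],
    }
)

# Apply the dollar format to the "price" column only.
text = orders.to_string(formatters={"price": "${:.2f}".format})
print(text)

# In a notebook, orders.style.format({"price": "${:.2f}"}) gives similar
# per-column control for the HTML rendering (it needs jinja2 installed).
```

This leaves the other float columns, and the global display options, unchanged.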


## Third Party Plugins

Quantopian has a useful plugin called qgrid: https://github.com/quantopian/qgrid

Import it and install it.

In [4]:
import qgrid
qgrid.nbinstall()

Showing the data is straightforward.

In [5]:
qgrid.show_grid(SALES, remote_js=True, grid_options={'forceFitColumns': False, 'defaultColumnWidth': 100})