# Making histograms with Python Pandas

The first step in the quantitative description of a dataset is to compute the measures that can be obtained by analysing a single variable of interest. This is the domain of univariate statistical analysis.

The most common measures describe central tendency (mean, median, mode) and dispersion (range, variance, standard deviation, etc.). However, these measures alone are not sufficient to characterise a dataset completely. For instance, suppose that we draw a sample of N customers and want to report some "key" features of their portfolios. After assessing the average and median portfolio value per customer, or the range of ages in the sample, it could be interesting to see how the clients are distributed by age, and how the amounts are distributed over geographic areas, professions, etc.
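As a quick illustration of the measures listed above, the following sketch computes them with pandas on a small made-up series. The values and the name `summary` are hypothetical and are not part of the dataset built later in this note.

```python
import pandas as pd

# Hypothetical toy sample of portfolio values (illustration only)
values = pd.Series([10, 20, 20, 30, 70])

summary = {
    "mean": values.mean(),
    "median": values.median(),
    "mode": values.mode().tolist(),         # mode() may return several values
    "range": values.max() - values.min(),
    "variance": values.var(),               # sample variance (ddof=1 by default)
    "std": values.std(),                    # sample standard deviation (ddof=1)
}

print(summary)
```

Note that pandas uses the sample (ddof=1) definition for `var` and `std` by default, whereas NumPy's `np.var` and `np.std` default to the population definition (ddof=0).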

In other words, histograms have to be produced, and the clearer they are, the more informative the report will be.
In this note, we present some Python tools and tricks for producing clear plots and histograms to show in reports and articles.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [4]:
N = 1000

In [11]:
# Build client identifiers 'ID0', 'ID1', ..., one per customer
idcli = ['ID' + str(i) for i in range(N)]
print(idcli[0:10])

['ID0', 'ID1', 'ID2', 'ID3', 'ID4', 'ID5', 'ID6', 'ID7', 'ID8', 'ID9']


In [6]:
age = np.random.randint(20,80,N)

In [7]:
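# Draw a numeric area code in 1..8 for each customer, stored as strings
geo = np.array(np.random.randint(1,9,N), str)

geo_value = ['1', '2', '3', '4', '5', '6', '7', '8']
geo_area = ['N', 'E', 'S', 'W', 'NE', 'SE', 'SW', 'NW']

# Replace each numeric code with the corresponding area label
for j in range(len(geo_value)):
    geo[geo == geo_value[j]] = geo_area[j]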

In [8]:
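# Portfolio amounts: normal draws (mean 100000, sd 30000) truncated to integers
ctv = np.array(np.random.normal(100000,30000,N), int)
# Any non-positive amount is replaced by one random positive value (the same for all)
ctv[ctv <= 0] = np.random.randint(1,100000,1)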

In [9]:
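# Number of instruments: normal draws (mean 20, sd 8) truncated to integers
ninstr = np.array(np.random.normal(20,8,N), int)
# Any non-positive count is replaced by one random positive value (the same for all)
ninstr[ninstr <= 0] = np.random.randint(1,30,1)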
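In the two cells above, `np.random.randint(..., 1)` returns a single number, so every non-positive draw receives the same replacement value. If independent replacements are preferred (an assumption, not something this note requires), a minimal sketch is to draw as many values as there are entries to fix. The name `mask` is introduced here for illustration only.

```python
import numpy as np

N = 1000

# Same draw as in the note: normal(20, 8) truncated to integers
ninstr = np.array(np.random.normal(20, 8, N), int)

# Boolean mask of the entries that need replacing
mask = ninstr <= 0

# Draw one independent replacement in 1..29 per masked entry
ninstr[mask] = np.random.randint(1, 30, mask.sum())

print(mask.sum(), ninstr.min())
```

The same pattern applies to `ctv` with its own bounds.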

In [2]:
df = pd.DataFrame({'ID': idcli, 'AGE': age, 'GEO': geo, 'N_INSTR': ninstr, 'AMOUNT': ctv})

In [12]:
df[0:10]

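|   | AGE | AMOUNT | GEO | ID | N_INSTR |
|---|-----|--------|-----|----|---------|
| 0 | 43 | 156380 | SE | ID0 | 13 |
| 1 | 43 | 169853 | NE | ID1 | 23 |
| 2 | 25 | 98511 | S | ID2 | 3 |
| 3 | 42 | 87769 | S | ID3 | 16 |
| 4 | 77 | 113729 | SE | ID4 | 22 |
| 5 | 20 | 46836 | NW | ID5 | 18 |
| 6 | 27 | 81383 | E | ID6 | 28 |
| 7 | 53 | 128798 | W | ID7 | 42 |
| 8 | 50 | 70854 | SE | ID8 | 22 |
| 9 | 64 | 98328 | SE | ID9 | 12 |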
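With a frame like this in hand, the counts behind a histogram can be computed directly, before any plotting. The sketch below rebuilds only the `AGE` and `GEO` columns so that it runs on its own. The fixed seed, the 10-year bin width and the names `age_counts`, `age_edges` and `geo_counts` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed, only for reproducibility
N = 1000

df = pd.DataFrame({
    'AGE': np.random.randint(20, 80, N),
    'GEO': np.random.choice(['N', 'E', 'S', 'W', 'NE', 'SE', 'SW', 'NW'], N),
})

# Clients per 10-year age band, with bin edges 20, 30, ..., 80
age_counts, age_edges = np.histogram(df['AGE'], bins=np.arange(20, 81, 10))

# Clients per geographic area (a categorical variable, so counts per label)
geo_counts = df['GEO'].value_counts()

print(age_counts)
print(geo_counts)
```

For a numeric variable such as `AGE` the bin edges are a choice that affects how the distribution looks, whereas for a categorical variable such as `GEO` each label forms its own bar.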
