# Metadata

```yaml
Course:   DS 5100
Module:   06 Pandas
Topic:    Narrow vs Wide Tables
Author:   R.C. Alvarado
Date:     21 September 2022
```

# Narrow vs Wide Tables

In [1]:
import pandas as pd

In [2]:
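# Draw 1000 random pets and 1000 random owners, sampling with replacement
pets = pd.Series("cat dog ferret snake turtle parraot".split()).sample(1000, replace=True).to_list()
people = pd.Series("A B C D E F G".split()).sample(1000, replace=True).to_list()
# Count how often each (owner, pet) pair occurs; the result is indexed by owner and pet
NARROW = pd.DataFrame(dict(pet=pets, owner=people)).groupby(['owner', 'pet']).pet.count().to_frame('n')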

This is a narrow table. 

It has few columns and many rows. 

Columns are types of things, and values in rows are either instances or subtypes.

In [3]:
NARROW

| owner | pet | n |
|---|---|---|
| A | cat | 26 |
| A | dog | 19 |
| A | ferret | 25 |
| A | parraot | 35 |
| A | snake | 29 |
| A | turtle | 21 |
| B | cat | 20 |
| B | dog | 27 |
| B | ferret | 28 |
| B | parraot | 19 |


In [4]:
WIDE = NARROW.n.unstack()

This is a wide table. 

One column's values are projected onto the feature space (as columns).

The other column becomes a unique list of row labels (just as the feature space is a unique list of column labels).
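As a minimal sketch of this projection, the toy example below uses a small hand-made table (the names `narrow`, `wide`, and `narrow_again` and the counts are illustrative assumptions, not the notebook's data). It shows `unstack()` moving one index level onto the column axis and `stack()` reversing the move.

```python
import pandas as pd

# Hypothetical toy data in narrow form: one row per (owner, pet) pair.
narrow = pd.DataFrame(
    {
        "owner": ["A", "A", "B", "B"],
        "pet": ["cat", "dog", "cat", "dog"],
        "n": [3, 1, 2, 4],
    }
).set_index(["owner", "pet"])

# unstack() moves the innermost index level (pet) onto the column axis,
# leaving the unique owners as the row index.
wide = narrow.n.unstack()

# stack() reverses the projection: back to one row per (owner, pet) pair.
narrow_again = wide.stack().to_frame("n")
```

Because every owner has every pet in this toy table, the round trip loses nothing; with missing pairs, the wide form would contain `NaN` cells.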

In [5]:
WIDE

Narrow tables are easier for databases to manage.

Wide tables are more convenient for analysis.

You can perform the same analyses on narrow tables using `.groupby()`.
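To illustrate the point about grouping, here is a hedged sketch that computes each owner's share of every pet type twice: once directly on a narrow table with `groupby(...).transform("sum")`, and once by way of a wide table. The toy data and the column name `share_of_pet` are assumptions made for this example.

```python
import pandas as pd

# Hypothetical toy data in narrow form.
narrow = pd.DataFrame(
    {
        "owner": ["A", "A", "B", "B"],
        "pet": ["cat", "dog", "cat", "dog"],
        "n": [3, 1, 2, 4],
    }
).set_index(["owner", "pet"])

# Narrow route: divide each count by the total for its pet type.
# transform("sum") returns the group total aligned to the original rows.
pet_totals = narrow.groupby(level="pet")["n"].transform("sum")
narrow["share_of_pet"] = narrow["n"] / pet_totals

# Wide route: unstack, then divide each column by its column sum.
wide = narrow["n"].unstack()
wide_share = wide / wide.sum()
```

Both routes give the same numbers; the wide form simply makes the column-wise division a one-liner.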

In [6]:
A = WIDE / WIDE.sum()

In [7]:
B = WIDE.T / WIDE.T.sum()
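The two cells above normalize along different axes: dividing by `.sum()` scales each column to sum to 1, so `A` gives shares within each pet and `B` (built from the transpose) gives shares within each owner. The sketch below, on a hypothetical 2x2 table, checks this and shows `DataFrame.div` with `axis=0` as one alternative to transposing.

```python
import pandas as pd

# Hypothetical toy wide table: owners as rows, pets as columns.
wide = pd.DataFrame(
    [[3, 1], [2, 4]],
    index=pd.Index(["A", "B"], name="owner"),
    columns=pd.Index(["cat", "dog"], name="pet"),
)

# Column-wise normalization: each pet column sums to 1.
by_pet = wide / wide.sum()

# Row-wise normalization without transposing: each owner row sums to 1.
by_owner = wide.div(wide.sum(axis=1), axis=0)

# The transposed form used above holds the same numbers, with owners as columns.
by_owner_t = wide.T / wide.T.sum()
```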

In [8]:
A.style.background_gradient()

| owner \ pet | cat | dog | ferret | parraot | snake | turtle |
|---|---|---|---|---|---|---|
| A | 0.148571 | 0.134752 | 0.14881 | 0.207101 | 0.16763 | 0.12069 |
| B | 0.114286 | 0.191489 | 0.166667 | 0.112426 | 0.16185 | 0.114943 |
| C | 0.188571 | 0.12766 | 0.166667 | 0.100592 | 0.138728 | 0.149425 |
| D | 0.142857 | 0.170213 | 0.142857 | 0.142012 | 0.150289 | 0.183908 |
| E | 0.137143 | 0.120567 | 0.10119 | 0.142012 | 0.109827 | 0.126437 |
| F | 0.114286 | 0.148936 | 0.14881 | 0.153846 | 0.115607 | 0.166667 |
| G | 0.154286 | 0.106383 | 0.125 | 0.142012 | 0.156069 | 0.137931 |


In [27]:
B.style.background_gradient()

| pet \ owner | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| cat | 0.167742 | 0.140845 | 0.226027 | 0.16129 | 0.195122 | 0.141844 | 0.195652 |
| dog | 0.122581 | 0.190141 | 0.123288 | 0.154839 | 0.138211 | 0.148936 | 0.108696 |
| ferret | 0.16129 | 0.197183 | 0.191781 | 0.154839 | 0.138211 | 0.177305 | 0.152174 |
| parraot | 0.225806 | 0.133803 | 0.116438 | 0.154839 | 0.195122 | 0.184397 | 0.173913 |
| snake | 0.187097 | 0.197183 | 0.164384 | 0.167742 | 0.154472 | 0.141844 | 0.195652 |
| turtle | 0.135484 | 0.140845 | 0.178082 | 0.206452 | 0.178862 | 0.205674 | 0.173913 |


In [28]:
NARROW.unstack()

| owner | (n, cat) | (n, dog) | (n, ferret) | (n, parraot) | (n, snake) | (n, turtle) |
|---|---|---|---|---|---|---|
| A | 26 | 19 | 25 | 35 | 29 | 21 |
| B | 20 | 27 | 28 | 19 | 28 | 20 |
| C | 33 | 18 | 28 | 17 | 24 | 26 |
| D | 25 | 24 | 24 | 24 | 26 | 32 |
| E | 24 | 17 | 17 | 24 | 19 | 22 |
| F | 20 | 21 | 25 | 26 | 20 | 29 |
| G | 27 | 15 | 21 | 24 | 27 | 24 |


# One-Hot Encoding

Project values onto the column axis.

In [29]:
pd.get_dummies(NARROW.n)

| owner | pet | 15 | 17 | 18 | 19 | 20 | 21 | 22 | 24 | 25 | 26 | 27 | 28 | 29 | 32 | 33 | 35 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | cat | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| A | dog | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A | ferret | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| A | parraot | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| A | snake | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| A | turtle | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| B | cat | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| B | dog | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| B | ferret | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| B | parraot | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
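The cell above encodes the count values themselves. One-hot encoding is more commonly applied to a categorical column, so the following sketch applies `pd.get_dummies` to pet labels instead. The frame `toy` and its contents are assumptions for illustration, and `dtype=int` is passed so the indicators print as 0/1, since recent pandas versions default to booleans.

```python
import pandas as pd

# Hypothetical toy data: one pet label per row.
toy = pd.DataFrame(
    {"owner": ["A", "B", "C", "D"], "pet": ["cat", "dog", "cat", "snake"]}
)

# Each distinct pet label becomes its own indicator column.
one_hot = pd.get_dummies(toy.pet, dtype=int)

# Every row has exactly one 1, marking that row's label.
ones_per_row = one_hot.sum(axis=1)
```

Summing the indicator columns (`one_hot.sum()`) recovers how often each label occurs, which is the same information as `toy.pet.value_counts()`.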
