# EDA > Pivot

<div class="alert alert-info">Create pivot tables, frequency tables, and crosstabs</div>

The `pivot` function creates a frequency table for a single variable or a crosstab for two variables. It can also aggregate a numeric column (for example with `mean` or `median`), normalize counts to percentages (by row, by column, or of the total), and append totals.

In [1]:
import polars as pl
import pyrsm as rsm

## setup pyrsm for autoreload
%reload_ext autoreload
%autoreload 2
%aimport pyrsm

# Diamonds Dataset

In [2]:
diamonds = pl.read_parquet("https://github.com/radiant-ai-hub/pyrsm/raw/refs/heads/main/examples/data/data/diamonds.parquet")
diamonds

price,carat,clarity,cut,color,depth,table,x,y,z,date
i32,f64,enum,enum,enum,f64,f64,f64,f64,f64,date
580,0.32,"""VS1""","""Ideal""","""H""",61.0,56.0,4.43,4.45,2.71,2012-02-26
650,0.34,"""SI1""","""Very Good""","""G""",63.4,57.0,4.45,4.42,2.81,2012-02-26
630,0.3,"""VS2""","""Very Good""","""G""",63.1,58.0,4.27,4.23,2.68,2012-02-26
706,0.35,"""VVS2""","""Ideal""","""H""",59.2,56.0,4.6,4.65,2.74,2012-02-26
1080,0.4,"""VS2""","""Premium""","""F""",62.6,58.0,4.72,4.68,2.94,2012-02-26
…,…,…,…,…,…,…,…,…,…,…
4173,1.14,"""SI1""","""Very Good""","""J""",63.3,55.0,6.6,6.67,4.2,2015-12-01
8396,1.51,"""SI1""","""Ideal""","""I""",61.2,60.0,7.39,7.37,4.52,2015-12-01
449,0.32,"""VS2""","""Premium""","""I""",62.6,58.0,4.37,4.42,2.75,2015-12-01
4370,0.91,"""VS1""","""Very Good""","""H""",62.1,59.0,6.17,6.2,3.84,2015-12-01


In [3]:
rsm.md("https://raw.githubusercontent.com/radiant-ai-hub/pyrsm/refs/heads/main/examples/data/data/diamonds_description.md")

## Diamond prices

Prices of 3,000 round cut diamonds

### Description

A dataset containing the prices and other attributes of a sample of 3,000 diamonds. The variables are as follows:

### Variables

- price = price in US dollars ($338--$18,791)
- carat = weight of the diamond (0.2--3.00)
- clarity = a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
- cut = quality of the cut (Fair, Good, Very Good, Premium, Ideal)
- color = diamond color, from J (worst) to D (best)
- depth = total depth percentage = z / mean(x, y) = 2 * z / (x + y) (54.2--70.80)
- table = width of top of diamond relative to widest point (50--69)
- x = length in mm (3.73--9.42)
- y = width in mm (3.71--9.29)
- z = depth in mm (2.33--5.58)
- date = shipment date

### Additional information

<a href="http://www.diamondse.info/diamonds-clarity.asp" target="_blank">Diamond search engine</a>


## Frequency Table (Single Variable)

Count occurrences of each value in a categorical column.
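Conceptually, a frequency table is a count of rows per distinct value. The following is a minimal standard-library sketch of that idea, not how `pivot` is implemented. The `cuts` list holds the `cut` values of the first five rows in the diamonds preview above, and the variable names are illustrative.

```python
from collections import Counter

# `cut` values from the first five rows of the diamonds preview
cuts = ["Ideal", "Very Good", "Very Good", "Ideal", "Premium"]

# Count how often each distinct value occurs
freq = Counter(cuts)

for value, count in freq.items():
    print(f"{value}: {count}")
```

The cell that follows applies the same idea to all 3,000 rows.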

In [4]:
rsm.eda.pivot(diamonds, rows="cut")

cut,count
enum,u32
"""Premium""",771
"""Very Good""",677
"""Good""",275
"""Ideal""",1176
"""Fair""",101


In [5]:
rsm.eda.pivot(diamonds, rows="color")

color,count
enum,u32
"""G""",597
"""H""",454
"""D""",382
"""E""",554
"""J""",164
"""I""",284
"""F""",565


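Adding `values` aggregates that column within each group instead of counting rows. The output column name `price_mean` indicates that the mean is used when `agg` is not specified.

In [6]: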
rsm.eda.pivot(diamonds, rows="cut", values="price")

cut,price_mean
enum,f64
"""Premium""",4369.40856
"""Ideal""",3470.223639
"""Very Good""",3959.915805
"""Good""",4130.432727
"""Fair""",4505.237624


## Frequency Table with Percentages

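Setting `normalize="total"` adds a `count_pct` column that expresses each count as a percentage of all rows.

In [7]: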
rsm.eda.pivot(diamonds, rows="cut", normalize="total")

cut,count,count_pct
enum,u32,f64
"""Good""",275,9.166667
"""Fair""",101,3.366667
"""Very Good""",677,22.566667
"""Ideal""",1176,39.2
"""Premium""",771,25.7


## Frequency Table with Totals

In [8]:
rsm.eda.pivot(diamonds, rows="cut", totals=True)

cut,count
str,f64
"""Ideal""",1176.0
"""Very Good""",677.0
"""Fair""",101.0
"""Premium""",771.0
"""Good""",275.0
"""Total""",3000.0


## Crosstab (Two Variables)

Cross-tabulate two categorical variables to see their joint distribution.
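As a rough illustration of what a crosstab counts (a standard-library sketch, not the `pivot` implementation), each row of the data adds one to the cell for its (row value, column value) pair. The pairs below are the `cut` and `color` values from the first five rows of the diamonds preview; `crosstab` and the other names are hypothetical.

```python
from collections import Counter, defaultdict

# (cut, color) pairs from the first five rows of the diamonds preview
pairs = [
    ("Ideal", "H"),
    ("Very Good", "G"),
    ("Very Good", "G"),
    ("Ideal", "H"),
    ("Premium", "F"),
]

# Count how often each (cut, color) combination occurs
cell_counts = Counter(pairs)

# Arrange the counts as {cut: {color: count}}
crosstab = defaultdict(dict)
for (cut, color), n in cell_counts.items():
    crosstab[cut][color] = n

# Print one line per cut, with one count per color
colors = sorted({color for _, color in pairs})
for cut, row in crosstab.items():
    # Combinations that never occur are shown as 0
    print(cut, [row.get(color, 0) for color in colors])
```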

In [9]:
rsm.eda.pivot(diamonds, rows="cut", cols="color")

cut,E,G,F,H,D,J,I
enum,f64,f64,f64,f64,f64,f64,f64
"""Premium""",144.0,156.0,119.0,132.0,92.0,45.0,83.0
"""Ideal""",194.0,254.0,238.0,169.0,163.0,53.0,105.0
"""Good""",62.0,40.0,55.0,37.0,35.0,20.0,26.0
"""Fair""",14.0,16.0,17.0,21.0,15.0,7.0,11.0
"""Very Good""",140.0,131.0,136.0,95.0,77.0,39.0,59.0


## Crosstab with Totals

In [10]:
rsm.eda.pivot(diamonds, rows="cut", cols="color", totals=True)

cut,E,I,J,F,D,H,G,Total
str,f64,f64,f64,f64,f64,f64,f64,f64
"""Premium""",144.0,83.0,45.0,119.0,92.0,132.0,156.0,771.0
"""Ideal""",194.0,105.0,53.0,238.0,163.0,169.0,254.0,1176.0
"""Fair""",14.0,11.0,7.0,17.0,15.0,21.0,16.0,101.0
"""Good""",62.0,26.0,20.0,55.0,35.0,37.0,40.0,275.0
"""Very Good""",140.0,59.0,39.0,136.0,77.0,95.0,131.0,677.0
"""Total""",554.0,284.0,164.0,565.0,382.0,454.0,597.0,3000.0


## Row Normalization

Show percentages within each row (rows sum to 100%).
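The arithmetic behind row normalization can be sketched as follows: each cell is divided by its row total and multiplied by 100. The counts are the `Premium` row of the crosstab with totals shown earlier. The helper name `row_percentages` is an assumption for illustration and is not part of pyrsm.

```python
def row_percentages(row_counts):
    """Convert one row of counts into percentages of the row total."""
    total = sum(row_counts.values())
    return {key: 100 * count / total for key, count in row_counts.items()}


# `Premium` row from the crosstab with totals (row total: 771)
premium = {"E": 144, "I": 83, "J": 45, "F": 119, "D": 92, "H": 132, "G": 156}

premium_pct = row_percentages(premium)
print(round(premium_pct["G"], 6))
print(round(sum(premium_pct.values()), 6))
```

The value for `G` is 156 / 771, or about 20.23%, which agrees with the `Premium` row in the output of the cell that follows.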

In [11]:
rsm.eda.pivot(diamonds, rows="cut", cols="color", normalize="row", totals=True)

cut,G,J,E,H,F,D,I,Total
str,f64,f64,f64,f64,f64,f64,f64,f64
"""Very Good""",19.350074,5.760709,20.679468,14.032496,20.088626,11.373708,8.714919,100.0
"""Fair""",15.841584,6.930693,13.861386,20.792079,16.831683,14.851485,10.891089,100.0
"""Ideal""",21.598639,4.506803,16.496599,14.370748,20.238095,13.860544,8.928571,100.0
"""Good""",14.545455,7.272727,22.545455,13.454545,20.0,12.727273,9.454545,100.0
"""Premium""",20.233463,5.836576,18.677043,17.120623,15.434501,11.932555,10.76524,100.0
"""Total""",19.9,5.466667,18.466667,15.133333,18.833333,12.733333,9.466667,100.0


## Column Normalization

Show percentages within each column (columns sum to 100%).
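Column normalization divides each cell by its column total instead. The sketch below is illustrative only and uses two of the color columns (`G` and `J`) from the crosstab with totals shown earlier. The names `counts`, `col_totals`, and `col_pct` are hypothetical.

```python
# Counts for two color columns, taken from the crosstab with totals
counts = {
    "Premium": {"G": 156, "J": 45},
    "Ideal": {"G": 254, "J": 53},
    "Fair": {"G": 16, "J": 7},
    "Good": {"G": 40, "J": 20},
    "Very Good": {"G": 131, "J": 39},
}

# Sum each column over all rows
col_totals = {}
for row in counts.values():
    for col, n in row.items():
        col_totals[col] = col_totals.get(col, 0) + n

# Divide each cell by its column total and scale to a percentage
col_pct = {
    cut: {col: 100 * n / col_totals[col] for col, n in row.items()}
    for cut, row in counts.items()
}

print(col_totals)
print(round(col_pct["Ideal"]["G"], 6))
```

The column totals (597 for `G`, 164 for `J`) match the `Total` row of the crosstab with totals.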

In [12]:
rsm.eda.pivot(diamonds, rows="cut", cols="color", normalize="column")

cut,G,F,I,J,E,D,H
enum,f64,f64,f64,f64,f64,f64,f64
"""Good""",6.700168,9.734513,9.15493,12.195122,11.191336,9.162304,8.14978
"""Fair""",2.680067,3.00885,3.873239,4.268293,2.527076,3.926702,4.625551
"""Premium""",26.130653,21.061947,29.225352,27.439024,25.99278,24.08377,29.07489
"""Ideal""",42.546064,42.123894,36.971831,32.317073,35.018051,42.670157,37.22467
"""Very Good""",21.943049,24.070796,20.774648,23.780488,25.270758,20.157068,20.92511


## Aggregation with Values

Instead of counting, aggregate a numeric variable by groups.
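The idea of aggregating by group can be sketched with the standard library: collect the values for each group, then apply a summary function. This is a simplified illustration rather than the `pivot` implementation. The `aggregate` helper is hypothetical, and the data are the `cut` and `price` values from the first five rows of the diamonds preview.

```python
from collections import defaultdict
from statistics import mean, median

# (cut, price) pairs from the first five rows of the diamonds preview
rows = [
    ("Ideal", 580),
    ("Very Good", 650),
    ("Very Good", 630),
    ("Ideal", 706),
    ("Premium", 1080),
]


def aggregate(pairs, agg):
    """Group values by key, then summarize each group with `agg`."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}


mean_price = aggregate(rows, mean)
median_price = aggregate(rows, median)

print(mean_price)
print(median_price)
```

Swapping the function passed as `agg` changes the summary statistic without changing the grouping step.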

In [13]:
# Mean price by cut and color
rsm.eda.pivot(diamonds, rows="cut", cols="color", values="price", agg="mean")

cut,H,F,G,J,E,D,I
enum,f64,f64,f64,f64,f64,f64,f64
"""Premium""",5066.295455,4086.831933,3976.5,7515.466667,3364.694444,3814.98913,5056.686747
"""Ideal""",3515.674556,3375.054622,3844.535433,4987.754717,2851.659794,2667.478528,4330.352381
"""Very Good""",4207.294737,3669.727941,3864.274809,5212.410256,3566.442857,3299.974026,5409.881356
"""Good""",3958.162162,3443.4,5116.225,3837.25,3847.209677,3436.514286,6147.346154
"""Fair""",5742.47619,5101.294118,3919.8125,6102.0,3149.928571,4582.733333,2676.727273


In [14]:
# Median carat by cut and color
rsm.eda.pivot(diamonds, rows="cut", cols="color", values="carat", agg="median")

cut,E,F,J,G,I,D,H
enum,f64,f64,f64,f64,f64,f64,f64
"""Very Good""",0.71,0.7,1.04,0.72,1.01,0.54,0.9
"""Ideal""",0.51,0.54,1.07,0.535,0.7,0.51,0.7
"""Premium""",0.52,0.71,1.51,0.71,1.01,0.69,1.02
"""Fair""",0.715,1.0,1.0,0.855,0.73,0.9,1.01
"""Good""",0.7,0.7,0.865,1.0,1.36,0.7,1.0


# Titanic Dataset

In [15]:
titanic = pl.read_parquet("https://github.com/radiant-ai-hub/pyrsm/raw/refs/heads/main/examples/data/data/titanic.parquet")
titanic.head()

pclass,survived,sex,age,sibsp,parch,fare,name,cabin,embarked
enum,enum,enum,f64,i32,i32,f64,str,str,enum
"""1st""","""Yes""","""female""",29.0,0,0,211.337494,"""Allen, Miss. Elisabeth Walton""","""B5""","""Southampton"""
"""1st""","""Yes""","""male""",0.9167,1,2,151.550003,"""Allison, Master. Hudson Trevor""","""C22 C26""","""Southampton"""
"""1st""","""No""","""female""",2.0,1,2,151.550003,"""Allison, Miss. Helen Loraine""","""C22 C26""","""Southampton"""
"""1st""","""No""","""male""",30.0,1,2,151.550003,"""Allison, Mr. Hudson Joshua Cre…","""C22 C26""","""Southampton"""
"""1st""","""No""","""female""",25.0,1,2,151.550003,"""Allison, Mrs. Hudson J C (Bess…","""C22 C26""","""Southampton"""


In [16]:
rsm.md("https://raw.githubusercontent.com/radiant-ai-hub/pyrsm/refs/heads/main/examples/data/data/titanic_description.md")

## Titanic

This dataset describes the survival status of individual passengers on the Titanic. The titanic data frame does not contain information from the crew, but it does contain actual ages of (some of) the passengers. The principal source for data about Titanic passengers is the Encyclopedia Titanica. One of the original sources is Eaton & Haas (1994) Titanic: Triumph and Tragedy, Patrick Stephens Ltd, which includes a passenger list created by many researchers and edited by Michael A. Findlay.

## Variables

* survived - Survival (Yes, No)
* pclass - Passenger Class (1st, 2nd, 3rd)
* sex - Sex (female, male)
* age - Age in years
* sibsp - Number of Siblings/Spouses Aboard
* parch - Number of Parents/Children Aboard
* fare - Passenger Fare
* name - Name
* cabin - Cabin
* embarked - Port of Embarkation (Cherbourg, Queenstown, Southampton)

## Notes

`pclass` is a proxy for socio-economic status (SES): 1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower.

Age is in years and is fractional if the age is less than one. If the age is estimated, it is in the form xx.5.

With respect to the family relation variables (i.e., `sibsp` and `parch`), some relations were ignored. The following definitions are used for `sibsp` and `parch`:

* Sibling: Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic
* Spouse: Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiances Ignored)
* Parent: Mother or Father of Passenger Aboard Titanic
* Child: Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic

Other family relatives excluded from this study include cousins, nephews/nieces, aunts/uncles, and in-laws. Some children travelled only with a nanny, so `parch` is 0 for them. Some passengers also travelled with very close friends or neighbors from their village; however, the definitions do not cover such relations.

Note: Missing values and the `ticket` variable were removed from the data.

## Related reading

<a href="http://phys.org/news/2012-07-shipwrecks-men-survive.html" target="_blank">In shipwrecks, men more likely to survive</a>

## Survival by Passenger Class

In [17]:
rsm.eda.pivot(titanic, rows="pclass", cols="survived", totals=True)

pclass,Yes,No,Total
str,f64,f64,f64
"""2nd""",115.0,146.0,261.0
"""1st""",179.0,103.0,282.0
"""3rd""",131.0,369.0,500.0
"""Total""",425.0,618.0,1043.0


## Survival Rate by Class (Row Percentages)

In [18]:
rsm.eda.pivot(titanic, rows="pclass", cols="survived", normalize="row", totals=True)

pclass,Yes,No,Total
str,f64,f64,f64
"""3rd""",26.2,73.8,100.0
"""2nd""",44.061303,55.938697,100.0
"""1st""",63.475177,36.524823,100.0
"""Total""",40.747843,59.252157,100.0


## Survival by Sex

In [19]:
rsm.eda.pivot(titanic, rows="sex", cols="survived", normalize="row", totals=True)

sex,No,Yes,Total
str,f64,f64,f64
"""male""",79.452055,20.547945,100.0
"""female""",24.870466,75.129534,100.0
"""Total""",59.252157,40.747843,100.0


## Embarkation Port Distribution

In [20]:
rsm.eda.pivot(titanic, rows="embarked", normalize="total", totals=True)

embarked,count,count_pct
str,f64,f64
"""Queenstown""",50.0,4.793864
"""Cherbourg""",212.0,20.325983
"""Southampton""",781.0,74.880153
"""Total""",1043.0,100.0


## Mean Fare by Class and Survival

In [21]:
rsm.eda.pivot(titanic, rows="pclass", cols="survived", values="fare", agg="mean")

pclass,No,Yes
enum,f64,f64
"""3rd""",13.039712,12.427449
"""2nd""",20.811044,23.180471
"""1st""",74.678276,102.465226


© Vincent Nijs (2026)