This is Set 6: Grouping Methods (Exercises 51-60) of Polars Exercises: 💯 polars exercises. You can find all the exercises and solutions on [GitHub](https://github.com/JustinKurland/polars_practice).

**Prerequisites**

The `polars` package should be upgraded to the latest version (v0.15.1+).
The sample dataset [penguins](https://github.com/mwaskom/seaborn-data/blob/master/penguins.csv) from `seaborn` will be used for the exercises.

In [1]:
!pip install polars
!pip install seaborn

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting polars
  Downloading polars-0.15.7-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.6 MB)
Installing collected packages: polars
Successfully installed polars-0.15.7


In [8]:
import polars as pl
from seaborn import load_dataset

data = pl.DataFrame(load_dataset('penguins'))
data

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
str,str,f64,f64,f64,f64,str
"""Adelie""","""Torgersen""",39.1,18.7,181.0,3750.0,"""Male"""
"""Adelie""","""Torgersen""",39.5,17.4,186.0,3800.0,"""Female"""
"""Adelie""","""Torgersen""",40.3,18.0,195.0,3250.0,"""Female"""
"""Adelie""","""Torgersen""",,,,,
"""Adelie""","""Torgersen""",36.7,19.3,193.0,3450.0,"""Female"""
"""Adelie""","""Torgersen""",39.3,20.6,190.0,3650.0,"""Male"""
"""Adelie""","""Torgersen""",38.9,17.8,181.0,3625.0,"""Female"""
"""Adelie""","""Torgersen""",39.2,19.6,195.0,4675.0,"""Male"""
"""Adelie""","""Torgersen""",34.1,18.1,193.0,3475.0,
"""Adelie""","""Torgersen""",42.0,20.2,190.0,4250.0,


**Exercise 51: Calculate the number of rows of each `species` in `data`**

In [11]:
data.groupby("species", maintain_order=True).agg(pl.count())

species,count
str,u32
"""Adelie""",152
"""Chinstrap""",68
"""Gentoo""",124


**Exercise 52: Calculate the sum of `body_mass_g` by `island` in `data`**

In [44]:
data.groupby("island", maintain_order=True).agg([pl.col('body_mass_g').sum()])

island,body_mass_g
str,f64
"""Torgersen""",189025.0
"""Biscoe""",787575.0
"""Dream""",460400.0


**Exercise 53: Calculate the mean of `body_mass_g` by non-null `sex` in `data`**

In [55]:
data.filter(pl.col('sex').is_not_null()).groupby("sex", maintain_order=True).agg([pl.col('body_mass_g').mean()])

sex,body_mass_g
str,f64
"""Male""",4545.684524
"""Female""",3862.272727

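What "mean by non-null `sex`" computes can be sketched in plain Python: rows whose `sex` is missing are dropped first, then the remaining `body_mass_g` values are averaged per group. The six rows below are made up for illustration and are not the dataset; missing body masses are skipped as well, on the assumption that the mean ignores nulls.

```python
from collections import defaultdict

# Hypothetical (sex, body_mass_g) rows; None marks a missing value.
rows = [
    ("Male", 3750.0),
    ("Female", 3800.0),
    (None, 3475.0),
    ("Female", 3250.0),
    ("Male", 4675.0),
    (None, None),
]

groups = defaultdict(list)
for sex, mass in rows:
    # Skip rows with a missing group key or a missing value.
    if sex is not None and mass is not None:
        groups[sex].append(mass)

# One mean per remaining group, in order of first appearance.
mean_by_sex = {sex: sum(values) / len(values) for sex, values in groups.items()}
print(mean_by_sex)
```

Dropping the missing keys explicitly means the result does not depend on where a null group would appear in the output.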

**Exercise 54: Calculate the maximum and minimum of `flipper_length_mm` together by `species` in `data`**

In [40]:
data.groupby(by='species', maintain_order=True).agg(
    [
        pl.col('flipper_length_mm').max().alias('max_flipper_length'),
        pl.col('flipper_length_mm').min().alias('min_flipper_length'),
    ]
)

species,max_flipper_length,min_flipper_length
str,f64,f64
"""Adelie""",210.0,172.0
"""Chinstrap""",212.0,178.0
"""Gentoo""",231.0,203.0


**Exercise 55: Calculate the median of `body_mass_g` by `species` and `sex` in `data`**

In [57]:
data.groupby(by=["species", "sex"], maintain_order=True).agg([pl.col('body_mass_g').median()])

species,sex,body_mass_g
str,str,f64
"""Adelie""","""Male""",4000.0
"""Adelie""","""Female""",3400.0
"""Adelie""",,3475.0
"""Chinstrap""","""Female""",3550.0
"""Chinstrap""","""Male""",3950.0
"""Gentoo""","""Female""",4700.0
"""Gentoo""","""Male""",5500.0
"""Gentoo""",,4687.5


**Exercise 56: Calculate the standard deviation of `bill_depth_mm` in cm instead of mm by `island` in `data`**

In [61]:
data.groupby("island", maintain_order=True).agg([(pl.col('bill_depth_mm').std()/10).alias('bill_depth_cm')])

island,bill_depth_cm
str,f64
"""Torgersen""",0.133945
"""Biscoe""",0.182072
"""Dream""",0.113312

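Dividing the standard deviation by 10 gives the same result as converting every value to cm first, because the standard deviation scales linearly with the data. A small check with the standard library, using `statistics.stdev` (the sample standard deviation) and a handful of made-up bill depths:

```python
import math
import statistics

# Hypothetical bill depths in millimetres.
depth_mm = [18.7, 17.4, 18.0, 19.3, 20.6]
depth_cm = [d / 10 for d in depth_mm]

# Option 1: standard deviation in mm, then convert to cm.
std_then_convert = statistics.stdev(depth_mm) / 10
# Option 2: convert to cm, then take the standard deviation.
convert_then_std = statistics.stdev(depth_cm)

# The two agree up to floating-point rounding.
print(math.isclose(std_then_convert, convert_then_std))
```

The same would not hold for the variance, which scales with the square of the conversion factor.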

**Exercise 57: Calculate the covariance between `flipper_length_mm` and `body_mass_g` by `species` and `island` in `data`**

In [68]:
data.groupby(by=["species", "island"], maintain_order=True).agg([pl.cov('flipper_length_mm', 'body_mass_g').alias('covariance')])

species,island,covariance
str,str,f64
"""Adelie""","""Torgersen""",1209.22549
"""Adelie""","""Biscoe""",1727.02167
"""Adelie""","""Dream""",1378.652597
"""Chinstrap""","""Dream""",1758.538191
"""Gentoo""","""Biscoe""",2297.144476

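For reference, here is a plain-Python sketch of a covariance, assuming the usual sample definition with an `n - 1` denominator. The helper `sample_cov` is hypothetical. The inputs are the first four complete rows of the table at the top of this notebook, so the result is not one of the group values above.

```python
def sample_cov(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# flipper_length_mm and body_mass_g of the first four complete rows.
flipper = [181.0, 186.0, 195.0, 193.0]
mass = [3750.0, 3800.0, 3250.0, 3450.0]

cov = sample_cov(flipper, mass)
print(cov)
```

On these four rows the covariance is negative, while every group in the result above is positive. A few rows can easily point the other way from the full group.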

**Exercise 58: Calculate the absolute correlation between `bill_length_mm` and `bill_depth_mm` by `species` and `sex` in `data`**

In [71]:
data.groupby(by=["species", "sex"], maintain_order=True).agg([pl.pearson_corr('bill_length_mm', 'bill_depth_mm').abs().alias('abs_correlation')])

species,sex,abs_correlation
str,str,f64
"""Adelie""","""Male""",0.038247
"""Adelie""","""Female""",0.160636
"""Adelie""",,0.599783
"""Chinstrap""","""Female""",0.256317
"""Chinstrap""","""Male""",0.44627
"""Gentoo""","""Female""",0.430444
"""Gentoo""","""Male""",0.306767
"""Gentoo""",,0.704794

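Taking the absolute value keeps the strength of a linear relationship and discards its direction. A minimal sketch with a hand-written Pearson correlation (the helper `pearson_corr` is hypothetical and the inputs are toy values):

```python
import math

def pearson_corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)

# Toy data: y falls as x rises, so the correlation is negative.
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]

corr = pearson_corr(x, y)
abs_corr = abs(corr)
print(corr, abs_corr)
```

A group with a strong negative relationship and one with a strong positive relationship would therefore show the same `abs_correlation`.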

**Exercise 59: Calculate the number of null values in `sex` by `island` in `data`**

In [87]:
data.groupby("island", maintain_order=True).agg([pl.col('sex').null_count()])

island,sex
str,u32
"""Torgersen""",5
"""Biscoe""",5
"""Dream""",1

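The per-group null count can be pictured as a tally that starts at zero for every group and increases by one for each missing value. A plain-Python sketch on made-up rows (not the dataset):

```python
# Hypothetical (island, sex) rows; None marks a missing value.
rows = [
    ("Torgersen", "Male"),
    ("Torgersen", None),
    ("Biscoe", "Female"),
    ("Dream", None),
    ("Torgersen", None),
]

# Start every island at zero so groups without nulls still appear.
null_count = {island: 0 for island, _ in rows}
for island, sex in rows:
    if sex is None:
        null_count[island] += 1

print(null_count)
```

Initialising every group to zero matters: an island with no missing values should still be reported, with a count of 0.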

**Exercise 60: Sort `data` on `bill_depth_mm` grouped by `sex` and assign it to `data_gs`**

In [127]:
data_gs = data.groupby("sex").agg([pl.col('*').sort_by('bill_depth_mm')]).explode(["species", "island", "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"])
data_gs

sex,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g
str,str,str,f64,f64,f64,f64
"""Female""","""Gentoo""","""Biscoe""",42.9,13.1,215.0,5000.0
"""Female""","""Gentoo""","""Biscoe""",46.1,13.2,211.0,4500.0
"""Female""","""Gentoo""","""Biscoe""",44.9,13.3,213.0,5100.0
"""Female""","""Gentoo""","""Biscoe""",43.3,13.4,209.0,4400.0
"""Female""","""Gentoo""","""Biscoe""",46.5,13.5,210.0,4550.0
"""Female""","""Gentoo""","""Biscoe""",42.0,13.5,210.0,4150.0
"""Female""","""Gentoo""","""Biscoe""",44.0,13.6,208.0,4350.0
"""Female""","""Gentoo""","""Biscoe""",40.9,13.7,214.0,4650.0
"""Female""","""Gentoo""","""Biscoe""",45.5,13.7,214.0,4650.0
"""Female""","""Gentoo""","""Biscoe""",42.6,13.7,213.0,4950.0

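Sorting within groups means that rows are first collected per group and each group is then ordered on its own, so sorted runs never mix values from different groups. A plain-Python sketch on made-up `(sex, bill_depth_mm)` pairs:

```python
# Hypothetical (sex, bill_depth_mm) rows.
rows = [
    ("Male", 18.7),
    ("Female", 17.4),
    ("Female", 13.1),
    ("Male", 15.0),
    ("Female", 14.2),
]

# Collect values per group, keeping groups in order of first appearance.
grouped = {}
for sex, depth in rows:
    grouped.setdefault(sex, []).append(depth)

# Sort each group independently, then flatten back to rows.
sorted_within = [
    (sex, depth)
    for sex, depths in grouped.items()
    for depth in sorted(depths)
]
print(sorted_within)
```

Flattening the sorted groups back into rows plays the same role as the `explode` call in the solution above.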