# Loading example data sets from the pyrsm package

In [1]:
import polars as pl
import pyrsm as rsm

The pyrsm package contains all example datasets included with the family of Radiant R packages. If an example dataset is mentioned in any of the help files listed under the `data`, `design`, `basics`, `model`, or `multivariate` dropdown menus at https://radiant-rstats.github.io/docs/, you can load that same dataset as a polars dataframe using commands like those shown below.

For example, to work with the `catalog` dataset from example 1 (https://radiant-rstats.github.io/docs/model/regress.html), use the commands below to load the data and display the provided data description.

In [2]:
catalog = pl.read_parquet("https://github.com/radiant-ai-hub/pyrsm/raw/refs/heads/main/examples/data/model/catalog.parquet")
catalog


id,Sales,Income,HH_size,Age
i32,f64,f64,i32,f64
1,178.63,93.0,4,55.0
2,338.59,79.0,5,35.0
3,210.26,70.0,4,64.0
4,378.64,95.0,2,39.0
5,227.09,119.0,2,43.0
…,…,…,…,…
196,234.85,42.0,2,38.0
197,138.2,94.0,2,58.0
198,340.74,88.0,2,35.0
199,496.48,110.0,5,37.0


In [3]:
rsm.md("https://raw.githubusercontent.com/radiant-ai-hub/pyrsm/refs/heads/main/examples/data/model/catalog_description.md")

## Catalog sales

### Description

Data from a company selling men's and women's apparel through mail-order catalogs. The company maintains a database on past and current customers' value and characteristics. Value is determined as the total $ sales to the customer in the last year. The data are a random sample of 200 customers from the company's database.

### Variables

A data frame with 200 observations on 5 variables

- id = Customer id
- Sales = Total sales (in $) to a household in the past year
- Income = Household income ($1000)
- HH_size = Size of the household (# of people)
- Age = Age of the head of the household

In [4]:
rndnames = pl.read_parquet("https://github.com/radiant-ai-hub/pyrsm/raw/refs/heads/main/examples/data/design/rndnames.parquet")
rndnames

Names,Gender
str,enum
"""Ervin Escalona""","""Male"""
"""Allan Ammerman""","""Male"""
"""Milton Mothershed""","""Male"""
"""Deshawn Dawn""","""Male"""
"""Jc Julius""","""Male"""
…,…
"""Marylee Malatesta""","""Female"""
"""Janna Jacob""","""Female"""
"""Alita Aikin""","""Female"""
"""Junko Jungers""","""Female"""


In [5]:
rsm.md("https://raw.githubusercontent.com/radiant-ai-hub/pyrsm/refs/heads/main/examples/data/design/rndnames_description.md")

## Sampling and Assignment

### Description

A list of 100 random names generated by <a href="http://listofrandomnames.com" target="_blank">listofrandomnames.com</a>. The dataset includes 50 male and 50 female names. The dataset is used to illustrate simple random sampling and random assignment to treatments in an experimental design.

### Variables

A data frame with 100 observations on 2 variables

- Names = Name
- Gender = Gender
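
Both examples above use the same URL layout: a section folder (`model`, `design`) followed by the dataset name. The sketch below captures that pattern in a small helper. Note that `example_urls` is a hypothetical name and not a pyrsm function, and it assumes other datasets in the repository are stored under the same layout, which the two examples suggest but do not guarantee.

```python
# Hypothetical helper (not part of pyrsm): rebuilds the two URLs used in the
# cells above from a section name and a dataset name.
DATA_BASE = "https://github.com/radiant-ai-hub/pyrsm/raw/refs/heads/main/examples/data"
DESC_BASE = "https://raw.githubusercontent.com/radiant-ai-hub/pyrsm/refs/heads/main/examples/data"


def example_urls(section: str, name: str) -> tuple[str, str]:
    """Return (parquet_url, description_url) for an example dataset.

    Assumes the <section>/<name>.parquet and <section>/<name>_description.md
    layout seen for `catalog` and `rndnames`.
    """
    data_url = f"{DATA_BASE}/{section}/{name}.parquet"
    desc_url = f"{DESC_BASE}/{section}/{name}_description.md"
    return data_url, desc_url


data_url, desc_url = example_urls("model", "catalog")
print(data_url)
print(desc_url)
```

In a notebook, the two results could then be passed to `pl.read_parquet(data_url)` and `rsm.md(desc_url)`, exactly as in the cells above.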

List all polars dataframes in the global Python environment:

In [6]:
# Skip names starting with "_" (IPython stores cached cell outputs there, e.g. _ and _2)
[key for key, value in globals().items() if not key.startswith("_") and isinstance(value, pl.DataFrame)]

['catalog', 'rndnames']
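
The underscore filter in the cell above matters in a notebook because IPython keeps earlier cell outputs in variables such as `_`, `__`, and `_2`. Without the filter, a displayed dataframe could be listed a second time under one of those names. The sketch below wraps the same logic in a reusable function and demonstrates it on a plain dictionary. The function name `names_of_type` and the stand-in `Frame` class are illustrative assumptions, used so the example runs without polars installed.

```python
def names_of_type(namespace: dict, cls: type) -> list[str]:
    """Names of public variables in `namespace` whose value is an instance of `cls`."""
    return [
        name
        for name, value in namespace.items()
        if not name.startswith("_") and isinstance(value, cls)
    ]


class Frame:
    """Stand-in for pl.DataFrame (assumption, for illustration only)."""


# Mimic a notebook namespace: one user variable, two cached outputs, one integer.
demo = {"catalog": Frame(), "_": Frame(), "_4": Frame(), "n": 3}
found = names_of_type(demo, Frame)
print(found)
```

In the notebook itself, `names_of_type(globals(), pl.DataFrame)` should produce the same list as the cell above.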