# Reading & writing

Polars can read and write common file formats (e.g. CSV, JSON, Parquet), cloud storage (S3, Azure Blob, BigQuery), and databases (e.g. PostgreSQL, MySQL). The cells below show a simple round trip to disk.

In [1]:
from datetime import datetime
import polars as pl

df = pl.DataFrame(
    {
        "integer": [1, 2, 3],
        "date": [
            datetime(2025, 1, 1),
            datetime(2025, 1, 2),
            datetime(2025, 1, 3),
        ],
        "float": [4.0, 5.0, 6.0],
        "string": ["a", "b", "c"],
    }
)

print(df)

shape: (3, 4)
┌─────────┬─────────────────────┬───────┬────────┐
│ integer ┆ date                ┆ float ┆ string │
│ ---     ┆ ---                 ┆ ---   ┆ ---    │
│ i64     ┆ datetime[μs]        ┆ f64   ┆ str    │
╞═════════╪═════════════════════╪═══════╪════════╡
│ 1       ┆ 2025-01-01 00:00:00 ┆ 4.0   ┆ a      │
│ 2       ┆ 2025-01-02 00:00:00 ┆ 5.0   ┆ b      │
│ 3       ┆ 2025-01-03 00:00:00 ┆ 6.0   ┆ c      │
└─────────┴─────────────────────┴───────┴────────┘


In [2]:
df.write_csv("../output/output.csv")
df_csv = pl.read_csv("../output/output.csv")
df_csv

integer,date,float,string
i64,str,f64,str
1,"""2025-01-01T00:…",4.0,"""a"""
2,"""2025-01-02T00:…",5.0,"""b"""
3,"""2025-01-03T00:…",6.0,"""c"""


# Expressions
Expressions are the core strength of Polars. They offer a modular structure that lets you combine simple concepts into complex queries. Below we cover the basic components that serve as building blocks (in Polars terminology, contexts) for all your queries:

- select
- filter
- with_columns
- group_by

In [5]:
df.select(pl.col("*"))

integer,date,float,string
i64,datetime[μs],f64,str
1,2025-01-01 00:00:00,4.0,"""a"""
2,2025-01-02 00:00:00,5.0,"""b"""
3,2025-01-03 00:00:00,6.0,"""c"""


In [6]:
df.select(pl.col("date"))

date
datetime[μs]
2025-01-01 00:00:00
2025-01-02 00:00:00
2025-01-03 00:00:00


In [9]:
df.select(pl.col("float", "string"))

float,string
f64,str
4.0,"""a"""
5.0,"""b"""
6.0,"""c"""


In [11]:
df.filter(
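    # Both bounds are inclusive by default; the sample dates are all in January 2025.
    pl.col("date").is_between(datetime(2025, 1, 2), datetime(2025, 1, 3)),
)

integer,date,float,string
i64,datetime[μs],f64,str
2,2025-01-02 00:00:00,5.0,"""b"""
3,2025-01-03 00:00:00,6.0,"""c"""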


In [4]:
df = pl.read_csv("../input/pima_diabetes.csv")
df

Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
i64,i64,i64,i64,i64,f64,f64,i64,i64
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
…,…,…,…,…,…,…,…,…
10,101,76,48,180,32.9,0.171,63,0
2,122,70,27,0,36.8,0.34,27,0
5,121,72,23,112,26.2,0.245,30,0
1,126,60,0,0,30.1,0.349,47,1


In [5]:
df.select(pl.col("BMI"))

BMI
f64
33.6
26.6
23.3
28.1
43.1
…
32.9
36.8
26.2
30.1


In [7]:
df.select(pl.col("BMI").sort() / 10)

BMI
f64
0.0
0.0
0.0
0.0
0.0
…
5.32
5.5
5.73
5.94


In [14]:
bmi_above_5 = df.filter(pl.col("BMI") > 5)
bmi_above_5.shape

(757, 9)

In [16]:
bmi_above_5.select(pl.col("BMI").min())

BMI
f64
18.2


In [20]:
df.group_by("Outcome").agg([
    pl.mean("BMI").alias("mean_BMI"),
    pl.median("Age").alias("median_age"),
    pl.len(),
])

Outcome,mean_BMI,median_age,len
i64,f64,f64,u32
0,30.3042,27.0,500
1,35.142537,36.0,268


- https://realpython.com/polars-python/#the-python-polars-library