# Polars

## Documentation
[Polars user guide: Getting started](https://docs.pola.rs/user-guide/getting-started/)

## Dataset Download
[Cardiovascular Diseases Risk Prediction dataset (Kaggle)](https://www.kaggle.com/datasets/alphiree/cardiovascular-diseases-risk-prediction-dataset)

### Importing Library

In [None]:
import polars as pl

### Read `.csv` dataset

In [57]:
df = pl.read_csv("CVD_cleaned.csv")

### Basic Operations

In [6]:
df.shape

(308854, 19)

In [31]:
df.head()

| General_Health | Checkup | Exercise | Heart_Disease | Skin_Cancer | Other_Cancer | Depression | Diabetes | Arthritis | Sex | Age_Category | Height_(cm) | Weight_(kg) | BMI | Smoking_History | Alcohol_Consumption | Fruit_Consumption | Green_Vegetables_Consumption | FriedPotato_Consumption |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| str | str | str | str | str | str | str | str | str | str | str | f64 | f64 | f64 | str | f64 | f64 | f64 | f64 |
| Poor | Within the past 2 years | No | No | No | No | No | No | Yes | Female | 70-74 | 150.0 | 32.66 | 14.54 | Yes | 0.0 | 30.0 | 16.0 | 12.0 |
| Very Good | Within the past year | No | Yes | No | No | No | Yes | No | Female | 70-74 | 165.0 | 77.11 | 28.29 | No | 0.0 | 30.0 | 0.0 | 4.0 |
| Very Good | Within the past year | Yes | No | No | No | No | Yes | No | Female | 60-64 | 163.0 | 88.45 | 33.47 | No | 4.0 | 12.0 | 3.0 | 16.0 |
| Poor | Within the past year | Yes | Yes | No | No | No | Yes | No | Male | 75-79 | 180.0 | 93.44 | 28.73 | No | 0.0 | 30.0 | 30.0 | 8.0 |
| Good | Within the past year | No | No | No | No | No | No | No | Male | 80+ | 191.0 | 88.45 | 24.37 | Yes | 0.0 | 8.0 | 4.0 | 0.0 |


In [8]:
print(df.head(2))

shape: (2, 19)
┌───────────┬───────────┬──────────┬───────────┬───┬───────────┬───────────┬───────────┬───────────┐
│ General_H ┆ Checkup   ┆ Exercise ┆ Heart_Dis ┆ … ┆ Alcohol_C ┆ Fruit_Con ┆ Green_Veg ┆ FriedPota │
│ ealth     ┆ ---       ┆ ---      ┆ ease      ┆   ┆ onsumptio ┆ sumption  ┆ etables_C ┆ to_Consum │
│ ---       ┆ str       ┆ str      ┆ ---       ┆   ┆ n         ┆ ---       ┆ onsumptio ┆ ption     │
│ str       ┆           ┆          ┆ str       ┆   ┆ ---       ┆ f64       ┆ n         ┆ ---       │
│           ┆           ┆          ┆           ┆   ┆ f64       ┆           ┆ ---       ┆ f64       │
│           ┆           ┆          ┆           ┆   ┆           ┆           ┆ f64       ┆           │
╞═══════════╪═══════════╪══════════╪═══════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡
│ Poor      ┆ Within    ┆ No       ┆ No        ┆ … ┆ 0.0       ┆ 30.0      ┆ 16.0      ┆ 12.0      │
│           ┆ the past  ┆          ┆           ┆   ┆           ┆           ┆

In [16]:
df.dtypes

[String,
 String,
 String,
 String,
 String,
 String,
 String,
 String,
 String,
 String,
 String,
 Float64,
 Float64,
 Float64,
 String,
 Float64,
 Float64,
 Float64,
 Float64]

In [58]:
df = df.rename({"Height_(cm)": "Height", "Weight_(kg)": "Weight"})
df.get_column("Height")

Height
f64
150.0
165.0
163.0
180.0
191.0
…
168.0
180.0
157.0
183.0


In [60]:
df.columns = [col.lower() for col in df.columns]
df.columns

['general_health',
 'checkup',
 'exercise',
 'heart_disease',
 'skin_cancer',
 'other_cancer',
 'depression',
 'diabetes',
 'arthritis',
 'sex',
 'age_category',
 'height',
 'weight',
 'bmi',
 'smoking_history',
 'alcohol_consumption',
 'fruit_consumption',
 'green_vegetables_consumption',
 'friedpotato_consumption']

### Selecting and Filtering data

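The Polars documentation treats square-bracket indexing (`df[...]`) as something of an anti-pattern. The preferred style is to call DataFrame methods such as `select`, `filter`, and `with_columns`, passing expressions built with `pl.col(...)`.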

- `select` function

In [76]:
result = df.select(
    pl.col("sex"),
    # BMI = weight (kg) / height (m)^2; height is stored in cm, hence the factor 10000
    ((pl.col("weight") / pl.col("height") ** 2) * 10000).alias("bmi"),
).head()

print(result)

shape: (5, 2)
┌────────┬───────────┐
│ sex    ┆ bmi       │
│ ---    ┆ ---       │
│ str    ┆ f64       │
╞════════╪═══════════╡
│ Female ┆ 14.515556 │
│ Female ┆ 28.323232 │
│ Female ┆ 33.290677 │
│ Male   ┆ 28.839506 │
│ Male   ┆ 24.245498 │
└────────┴───────────┘



- `filter` function

- `with_columns` function

In [None]:
df = df.with_columns()