# Polars: Installation and Basic Usage

## Installation

In [1]:
# !pip install 'polars[plot]'

## Import Library

In [2]:
import polars as pl

## Creating and Reading Data

### Create a DataFrame

Polars' core object is the **DataFrame**.

You can create one yourself by passing `pl.DataFrame()` a dictionary that maps each column name to a list of values.

In [3]:
df = pl.DataFrame(
    {
        'student': ['Angel', 'Brendan', 'Chelsea'],
        'grade': [10, 11, 9],
        'score': [93.5, 87.0, 79.5],
        'subject': ['Math', 'Math', 'English'],
    }
)

When you view the dataframe, you'll see the column names and each column's data type (`str`, `i64`, `f64`). You will NOT see row index labels, because Polars dataframes don't have an index.

In [4]:
df

| student | grade | score | subject |
|---|---|---|---|
| str | i64 | f64 | str |
| "Angel" | 10 | 93.5 | "Math" |
| "Brendan" | 11 | 87.0 | "Math" |
| "Chelsea" | 9 | 79.5 | "English" |


### Load a CSV

You can also easily read data into Polars. The `read_csv()` function loads CSV data:
- From your computer if you provide a file path, or
- From the internet if you provide a URL.

In [5]:
cereal = pl.read_csv('https://raw.githubusercontent.com/kimfetti/Projects/master/Etc/cereal.csv')

In [6]:
cereal

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | i64 | i64 | i64 | i64 | f64 | f64 | i64 | i64 | i64 | i64 | f64 | f64 | f64 |
| "100% Bran" | "N" | "C" | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
| "100% Natural Bran" | "Q" | "C" | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.0 | 33.983679 |
| "All-Bran" | "K" | "C" | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| "All-Bran with Extra Fiber" | "K" | "C" | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.5 | 93.704912 |
| "Almond Delight" | "R" | "C" | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| "Triples" | "G" | "C" | 110 | 2 | 1 | 250 | 0.0 | 21.0 | 3 | 60 | 25 | 3 | 1.0 | 0.75 | 39.106174 |
| "Trix" | "G" | "C" | 110 | 1 | 1 | 140 | 0.0 | 13.0 | 12 | 25 | 25 | 2 | 1.0 | 1.0 | 27.753301 |
| "Wheat Chex" | "R" | "C" | 100 | 3 | 1 | 230 | 3.0 | 17.0 | 3 | 115 | 25 | 1 | 1.0 | 0.67 | 49.787445 |
| "Wheaties" | "G" | "C" | 100 | 3 | 1 | 200 | 3.0 | 17.0 | 3 | 110 | 25 | 1 | 1.0 | 1.0 | 51.592193 |


## Selecting Data

### Selecting Rows

Many of the same functions and methods you may already know from pandas also work in Polars. For example,
- `.head()` shows the first rows of a dataframe (five by default)
- `.sample()` gives you a random sample of rows

In [7]:
cereal.head()

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | i64 | i64 | i64 | i64 | f64 | f64 | i64 | i64 | i64 | i64 | f64 | f64 | f64 |
| "100% Bran" | "N" | "C" | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
| "100% Natural Bran" | "Q" | "C" | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.0 | 33.983679 |
| "All-Bran" | "K" | "C" | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| "All-Bran with Extra Fiber" | "K" | "C" | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.5 | 93.704912 |
| "Almond Delight" | "R" | "C" | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |


In [8]:
cereal.sample(10, seed=44)

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | i64 | i64 | i64 | i64 | f64 | f64 | i64 | i64 | i64 | i64 | f64 | f64 | f64 |
| "Basic 4" | "G" | "C" | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.038562 |
| "Double Chex" | "R" | "C" | 100 | 2 | 0 | 190 | 1.0 | 18.0 | 5 | 80 | 25 | 3 | 1.0 | 0.75 | 44.330856 |
| "Almond Delight" | "R" | "C" | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
| "All-Bran" | "K" | "C" | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| "Frosted Flakes" | "K" | "C" | 110 | 1 | 0 | 200 | 1.0 | 14.0 | 11 | 25 | 25 | 1 | 1.0 | 0.75 | 31.435973 |
| "Puffed Rice" | "Q" | "C" | 50 | 1 | 0 | 0 | 0.0 | 13.0 | 0 | 15 | 0 | 3 | 0.5 | 1.0 | 60.756112 |
| "Post Nat. Raisin Bran" | "P" | "C" | 120 | 3 | 1 | 200 | 6.0 | 11.0 | 14 | 260 | 25 | 3 | 1.33 | 0.67 | 37.840594 |
| "Cinnamon Toast Crunch" | "G" | "C" | 120 | 1 | 3 | 210 | 0.0 | 13.0 | 9 | 45 | 25 | 2 | 1.0 | 0.75 | 19.823573 |
| "Wheaties" | "G" | "C" | 100 | 3 | 1 | 200 | 3.0 | 17.0 | 3 | 110 | 25 | 1 | 1.0 | 1.0 | 51.592193 |
| "Raisin Nut Bran" | "G" | "C" | 100 | 3 | 2 | 140 | 2.5 | 10.5 | 8 | 140 | 25 | 3 | 1.0 | 0.5 | 39.7034 |


### Selecting Columns

Selecting a specific column with Polars, however, looks different from pandas. Here, use the `.select()` method with the `pl.col()` function, which accepts one or more column names.

In [9]:
cereal.select(pl.col('fiber'))

| fiber |
|---|
| f64 |
| 10.0 |
| 2.0 |
| 9.0 |
| 14.0 |
| 1.0 |
| … |
| 0.0 |
| 0.0 |
| 3.0 |
| 3.0 |


In [10]:
cereal.select(pl.col('name', 'fiber'))

| name | fiber |
|---|---|
| str | f64 |
| "100% Bran" | 10.0 |
| "100% Natural Bran" | 2.0 |
| "All-Bran" | 9.0 |
| "All-Bran with Extra Fiber" | 14.0 |
| "Almond Delight" | 1.0 |
| … | … |
| "Triples" | 0.0 |
| "Trix" | 0.0 |
| "Wheat Chex" | 3.0 |
| "Wheaties" | 3.0 |


## Filtering Data

Polars dataframes have a dedicated method for filtering rows, `.filter()`, which keeps only the rows where a condition is true.

_HINT: Just remember to keep using `pl.col()` to reference the dataframe columns._

In [11]:
cereal.sample(5, seed=23)

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | i64 | i64 | i64 | i64 | f64 | f64 | i64 | i64 | i64 | i64 | f64 | f64 | f64 |
| "Almond Delight" | "R" | "C" | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
| "Cocoa Puffs" | "G" | "C" | 110 | 1 | 1 | 180 | 0.0 | 12.0 | 13 | 55 | 25 | 2 | 1.0 | 1.0 | 22.736446 |
| "Apple Jacks" | "K" | "C" | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.0 | 1.0 | 33.174094 |
| "Basic 4" | "G" | "C" | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.038562 |
| "Honey-comb" | "P" | "C" | 110 | 1 | 0 | 180 | 0.0 | 14.0 | 11 | 35 | 25 | 1 | 1.0 | 1.33 | 28.742414 |


In [12]:
cereal.filter(pl.col('mfr') == 'K')

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | i64 | i64 | i64 | i64 | f64 | f64 | i64 | i64 | i64 | i64 | f64 | f64 | f64 |
| "All-Bran" | "K" | "C" | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| "All-Bran with Extra Fiber" | "K" | "C" | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.5 | 93.704912 |
| "Apple Jacks" | "K" | "C" | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.0 | 1.0 | 33.174094 |
| "Corn Flakes" | "K" | "C" | 100 | 2 | 0 | 290 | 1.0 | 21.0 | 2 | 35 | 25 | 1 | 1.0 | 1.0 | 45.863324 |
| "Corn Pops" | "K" | "C" | 110 | 1 | 0 | 90 | 1.0 | 13.0 | 12 | 20 | 25 | 2 | 1.0 | 1.0 | 35.782791 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| "Raisin Bran" | "K" | "C" | 120 | 3 | 1 | 210 | 5.0 | 14.0 | 12 | 240 | 25 | 2 | 1.33 | 0.75 | 39.259197 |
| "Raisin Squares" | "K" | "C" | 90 | 2 | 0 | 0 | 2.0 | 15.0 | 6 | 110 | 25 | 3 | 1.0 | 0.5 | 55.333142 |
| "Rice Krispies" | "K" | "C" | 110 | 2 | 0 | 290 | 0.0 | 22.0 | 3 | 35 | 25 | 1 | 1.0 | 1.0 | 40.560159 |
| "Smacks" | "K" | "C" | 110 | 2 | 1 | 70 | 1.0 | 9.0 | 15 | 40 | 25 | 2 | 1.0 | 0.75 | 31.230054 |


As in pandas, use `&` (and) to require that every condition is true, or `|` (or) to keep rows where at least one condition is true. Wrap each condition in parentheses.

In [13]:
cereal.filter((pl.col('mfr') == 'K') & (pl.col('sugars') >= 10))

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | i64 | i64 | i64 | i64 | f64 | f64 | i64 | i64 | i64 | i64 | f64 | f64 | f64 |
| "Apple Jacks" | "K" | "C" | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.0 | 1.0 | 33.174094 |
| "Corn Pops" | "K" | "C" | 110 | 1 | 0 | 90 | 1.0 | 13.0 | 12 | 20 | 25 | 2 | 1.0 | 1.0 | 35.782791 |
| "Froot Loops" | "K" | "C" | 110 | 2 | 1 | 125 | 1.0 | 11.0 | 13 | 30 | 25 | 2 | 1.0 | 1.0 | 32.207582 |
| "Frosted Flakes" | "K" | "C" | 110 | 1 | 0 | 200 | 1.0 | 14.0 | 11 | 25 | 25 | 1 | 1.0 | 0.75 | 31.435973 |
| "Fruitful Bran" | "K" | "C" | 120 | 3 | 0 | 240 | 5.0 | 14.0 | 12 | 190 | 25 | 3 | 1.33 | 0.67 | 41.015492 |
| "Mueslix Crispy Blend" | "K" | "C" | 160 | 3 | 2 | 150 | 3.0 | 17.0 | 13 | 160 | 25 | 3 | 1.5 | 0.67 | 30.313351 |
| "Raisin Bran" | "K" | "C" | 120 | 3 | 1 | 210 | 5.0 | 14.0 | 12 | 240 | 25 | 2 | 1.33 | 0.75 | 39.259197 |
| "Smacks" | "K" | "C" | 110 | 2 | 1 | 70 | 1.0 | 9.0 | 15 | 40 | 25 | 2 | 1.0 | 0.75 | 31.230054 |
