## `Series` and `DataFrame`
By the end of this lecture you will be able to:
- convert between a `Series` and a `DataFrame` column
- convert back and forth between Python `lists` or `dicts` and `Series` or `DataFrames`


In [1]:
import polars as pl

In [2]:
csv_file = "data_titanic.csv"

In [3]:
df = pl.read_csv(csv_file)
df.head(3)

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
1,0,3,"""Braund, Mr. Ow…","""male""",22.0,1,0,"""A/5 21171""",7.25,,"""S"""
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
3,1,3,"""Heikkinen, Mis…","""female""",26.0,0,0,"""STON/O2. 31012…",7.925,,"""S"""


## Converting between a `Series` and a `DataFrame` column

We can create a `Series` from a `DataFrame` column with square brackets

In [4]:
(
    df["Age"]
    .head(3)
)

Age
f64
22.0
38.0
26.0


Note that a `Series` has a name (`Age`) and a dtype (`f64`, a 64-bit floating-point number).

We can also create a `Series` from a one-column `DataFrame` using `to_series` (we learn more about the `select` method in the next section of the course)

In [5]:
(
    df
    .select("Age")
    .to_series()
    .head(3)
)

Age
f64
22.0
38.0
26.0


We can convert a `Series` into a one-column `DataFrame` using `to_frame`

In [6]:
s = df["Name"]
(
    s
    .to_frame()
    .head(3)
)

Name
str
"""Braund, Mr. Ow…"
"""Cumings, Mrs. …"
"""Heikkinen, Mis…"


## Create a `Series` or `DataFrame` from a `list` or `dict`
We can create a `Series` from a Python `list`

In [7]:
values = [1,2,3]
pl.Series(values)

1
2
3


If the `name` argument is not set then it defaults to an empty string. The name can be passed as the **first** argument

In [8]:
pl.Series('vals',values)

vals
i64
1
2
3


We can also convert a `Series` to a `list` with `to_list` 

In [9]:
pl.Series(name='vals',values=values).to_list()

[1, 2, 3]

> In the section Selecting Columns and Transforming DataFrames we see how to add a column to a `DataFrame` from a list.

We can create a `DataFrame` with:
- a `list` of `lists` where each `list` is a column and
- a list of string column names

In [10]:
data = [
    [1,2,3],
    [4,5,6]
]
(
    pl.DataFrame(
        data,
        schema=["col0","col1"]
    )
)

col0,col1
i64,i64
1,4
2,5
3,6


We can also create a `DataFrame` from a `dict`

In [11]:
data_dict = {"col0":[1,2,3],"col1":[4,5,6]}
(
    pl.DataFrame(
        data_dict,
    )
)

col0,col1
i64,i64
1,4
2,5
3,6


We can specify dtypes by passing a `dict` to the `schema` argument. In this example we specify a 32-bit integer type for the `col1` column

In [12]:
data_dict = {"col0":[1,2,3],"col1":[4,5,6]}
(
    pl.DataFrame(
        data_dict,
        schema={
            "col0":pl.Int64,
            "col1":pl.Int32
        }
    )
)

col0,col1
i64,i32
1,4
2,5
3,6


Finally, we can convert a `DataFrame` to a `list` of `dicts`, with one `dict` per row, using `to_dicts`

In [13]:
df = (
    pl.DataFrame(
        data_dict,
    )
)
df.to_dicts()

[{'col0': 1, 'col1': 4}, {'col0': 2, 'col1': 5}, {'col0': 3, 'col1': 6}]

The output from `to_dicts` is useful for testing. In the automated tests for the Polars repo on GitHub (and the automated tests for this course material) the output of queries is often tested using `to_dicts` like this:

In [14]:
assert df.to_dicts() == [{'col0': 1, 'col1': 4}, {'col0': 2, 'col1': 5}, {'col0': 3, 'col1': 6}]

In the exercises we see how to create a `DataFrame` from a `dict`.

## Exercises
In the exercises you will develop your understanding of:
- extracting a `Series` from a `DataFrame`
- getting metadata from a `Series`
- creating a `Series` from a `list`
- creating a `DataFrame` from `lists`

### Exercise 1
Extract the `Age` column as a `Series` and then find:
- the `dtype` of the `Series`
- the median of the `Series`

In [16]:
df = pl.read_csv(csv_file)
s = df['Age']
s.head(5)

Age
f64
22.0
38.0
26.0
35.0
35.0


In [19]:
s.dtype

s.median()

28.0

### Exercise 2
You have the following Python `lists` with data.  

In [22]:
groups = ["a","a","b","b","c"]
values = [0,1,2,3,4]

Create a `Series` called `groupsSeries` from the `groups` list. The name inside the `Series` should be `groups`

In [23]:
groupsSeries = pl.Series("groups", groups)
groupsSeries

groups
str
"""a"""
"""a"""
"""b"""
"""b"""
"""c"""


Create a `DataFrame` by passing these as a Python `dict` to `pl.DataFrame`

## Solutions

### Solution to exercise 1
Extract the `Age` column as a `Series` and find:
- the `dtype` of the `Series`
- the median of the `Series`

In [20]:
df = pl.read_csv(csv_file)
s = df["Age"]
s.dtype

Float64

In [21]:
df = pl.read_csv(csv_file)
s = df["Age"]
s.median()

28.0

### Solution to exercise 2
You have the following Python `lists` with data.  

In [None]:
groups = ["a","a","b","b","c"]
values = [0,1,2,3,4]

Create a `Series` called `groupsSeries` from the `groups` list. The name inside the `Series` should be `groups`

In [24]:
groupsSeries = pl.Series("groups",groups)

Create a `DataFrame` by passing these as a Python `dict` to `pl.DataFrame`

In [25]:
pl.DataFrame(
    {
        "groups":groups,
        "vals":values
    }
)

groups,vals
str,i64
"""a""",0
"""a""",1
"""b""",2
"""b""",3
"""c""",4
