## `Series` and `DataFrame`
By the end of this lecture you will be able to:
- convert between a `Series` and a `DataFrame` column
- convert back and forth between Python `lists` or `dicts` and `Series` or `DataFrames`


In [None]:
import polars as pl

In [None]:
csv_file = "../data/titanic.csv"

In [None]:
df = pl.read_csv(csv_file)
df.head(3)

## Converting between a `Series` and a `DataFrame` column

We can create a `Series` from a `DataFrame` column using square brackets.

In [None]:
(
    df["Age"]
    .head(3)
)

Note that a `Series` has a name (`Age`) and a dtype (64-bit floating point).

We can also create a `Series` from a one-column `DataFrame` using `to_series` (we learn more about the `select` method in the next section of the course).

In [None]:
(
    df
    .select("Age")
    .to_series()
    .head(3)
)

We can convert a `Series` into a one-column `DataFrame` using `to_frame`.

In [None]:
s = df["Name"]
(
    s
    .to_frame()
    .head(3)
)

## Create a `Series` or `DataFrame` from a `list` or `dict`
We can create a `Series` from a Python `list`.

In [None]:
values = [1, 2, 3]
pl.Series(values)

If the `name` argument is not set, it defaults to an empty string. The name can be passed as the **first** argument.

In [None]:
pl.Series("vals", values)

We can also convert a `Series` to a `list` with `to_list`.

In [None]:
pl.Series(name="vals", values=values).to_list()

> In the section Selecting Columns and Transforming DataFrames we see how to add a column to a `DataFrame` from a list.

We can create a `DataFrame` with:
- a `list` of `lists` where each `list` is a column and
- a list of string column names

In [None]:
data = [
    [1,2,3],
    [4,5,6]
]
(
    pl.DataFrame(
        data,
        schema=["col0","col1"]
    )
)

We can also create a `DataFrame` from a `dict`, where each key is a column name and each value is a `list` of column values.

In [None]:
data_dict = {"col0":[1,2,3],"col1":[4,5,6]}
(
    pl.DataFrame(
        data_dict,
    )
)

We can specify dtypes by passing a `dict` to the `schema` argument. In this example we specify a 64-bit integer type for the `col0` column and a 32-bit integer type for the `col1` column.

In [None]:
data_dict = {"col0":[1,2,3],"col1":[4,5,6]}
(
    pl.DataFrame(
        data_dict,
        schema={
            "col0":pl.Int64,
            "col1":pl.Int32
        }
    )
)

Finally, we can get a `DataFrame` as a `list` of `dicts` with `to_dicts`. Each `dict` is one row.

In [None]:
df = (
    pl.DataFrame(
        data_dict,
    )
)
df.to_dicts()

The output from `to_dicts` is useful for testing. In the automated tests for the Polars repo on GitHub (and the automated tests for this course material), the output of queries is often tested using `to_dicts`, like this:

In [None]:
assert df.to_dicts() == [{'col0': 1, 'col1': 4}, {'col0': 2, 'col1': 5}, {'col0': 3, 'col1': 6}]

In the exercises we see how to create a `DataFrame` from a `dict`.

## Exercises
In the exercises you will develop your understanding of:
- extracting a `Series` from a `DataFrame`
- getting metadata from a `Series`
- creating a `Series` from a `list`
- creating a `DataFrame` from `lists`

### Exercise 1
Extract the `Age` column as a `Series` and then find:
- the `dtype` of the `Series`
- the median of the `Series`

In [None]:
df = pl.read_csv(csv_file)
s = <blank>

In [None]:
df = pl.read_csv(csv_file)
s = <blank>

### Exercise 2
You have the following Python `lists` with data.  

In [None]:
groups = ["a","a","b","b","c"]
values = [0,1,2,3,4]

Create a `Series` called `groupsSeries` from the `groups` list. The name inside the `Series` should be `groups`.

Create a `DataFrame` by passing the `groups` and `values` lists as a Python `dict` to `pl.DataFrame`.

## Solutions

### Solution to exercise 1
Extract the `Age` column as a `Series` and find:
- the `dtype` of the `Series`
- the median of the `Series`

In [None]:
df = pl.read_csv(csv_file)
s = df["Age"]
s.dtype

In [None]:
df = pl.read_csv(csv_file)
s = df["Age"]
s.median()

### Solution to exercise 2
You have the following Python `lists` with data.  

In [None]:
groups = ["a","a","b","b","c"]
values = [0,1,2,3,4]

Create a `Series` called `groupsSeries` from the `groups` list. The name inside the `Series` should be `groups`.

In [None]:
groupsSeries = pl.Series("groups", groups)

Create a `DataFrame` by passing the `groups` and `values` lists as a Python `dict` to `pl.DataFrame`.

In [None]:
pl.DataFrame(
    {
        "groups":groups,
        "vals":values
    }
)