# Introduction to nested dtypes: List, Array, Object and Struct
By the end of this lecture you will be able to:
- create columns with List, Array, Struct and Object dtypes
- explain the difference between the List, Array, Struct and Object dtypes
- unnest the fields in a Struct dtype

We cover the `List` dtype in more detail in subsequent lectures.

In [None]:
import polars as pl

## `pl.List` dtype
With a `pl.List` dtype:
- each row is a `Series` and
- each `Series` has the same dtype

We can create a `pl.List` column manually from Python `list`s, **provided that all elements of each `list` have the same type or can be cast to a common type (e.g. `int` to `float`)**.

In [None]:
df_lists = (
    pl.DataFrame(
        {
            'ints':[ 
                [0,1], 
                [2,3]
            ],
            'floats':[ 
                [0.0,1.0], 
                [2.0,3.0]
            ],
            'strings':[ 
                ["0","1"],
                ["2","3"]
            ]
        }
    )
)
df_lists

The `pl.List` dtype can have a variable number of elements per row. There is also a `pl.Array` dtype optimised for cases where all rows have the same number of elements. Here we cast the `ints` column to a `pl.Array` with two elements per row.

In [None]:
(
    df_lists
    .with_columns(
        ints_array = pl.col("ints").cast(pl.Array(width=2,inner=pl.Int64))
    )
)

Functionality for the `pl.Array` dtype is still limited, so our focus is on the `pl.List` dtype.

## `pl.Object` dtype
We get a column with a `pl.Object` dtype when the lists cannot be cast to a homogeneous type.

In [None]:
df_object = (
    pl.DataFrame(
        {
            'mixed':[ 
                ['a',0],
                ['b',1]
            ]
        }
    )
)
df_object

The "list" on each row in a **`pl.Object`** column is a standard Python `list` under the hood.

In [None]:
df_object[0,0]

In [None]:
type(df_object[0,0])

Operations on a `pl.Object` column are slower than on a `pl.List` column because they work with slow Python `list`s rather than fast Polars `Series`.

We generally want to avoid working with a `pl.Object` dtype if possible.

## `pl.Struct` dtype
The `pl.Struct` dtype is a set of named columns nested inside a single `DataFrame` column. The nesting can have multiple levels.

We create a `pl.Struct` column here by passing a list of `dict`s where:
- the `dict` on each row has the same keys
- the values for each key have the same dtype on every row

In [None]:
df_struct = (
    pl.DataFrame(
        {
            "year":[2020,2021],
            "trades":[
                {"exporter":"India","importer":"USA","quantity":0.0},
                {"exporter":"India","importer":"USA","quantity":1.5},
            ]
        }
    )
)
df_struct

The keys in a struct column are called fields.

We can list the field names with `struct.fields` on a `Series`.

In [None]:
df_struct["trades"].struct.fields

### Accessing `pl.Struct` fields

We access data within a struct column in an expression using the `struct` namespace and its `field` method, passing the name of the field we want.

In [None]:
(
    df_struct
    .select(
        pl.col("trades").struct.field("exporter")
    )
)

### Extracting data from a `pl.Struct`

We can convert a nested `pl.Struct` column into unnested columns with `unnest`, which is available both in the `struct` namespace and as a `DataFrame` method.

We can convert a struct `Series` into its own multi-column `DataFrame`.

In [None]:
df_struct["trades"].struct.unnest()

We can also unnest a `pl.Struct` column so that its fields become full columns in the `DataFrame`.

In [None]:
df_struct.unnest("trades")

We can have more than one level of nesting in a struct column.

In this example we keep the `quantity` field at the top level of the `pl.Struct` but move the `importer`/`exporter` fields into a second nested level within the `pl.Struct`.

In [None]:
df_struct_deep = pl.DataFrame(
    {
        "trades": [
            {"countries": {"exporter": "India", "importer": "USA"}, "quantity": 0.0},
            {"countries": {"exporter": "India", "importer": "USA"}, "quantity": 1.5},
        ]
    }
)
df_struct_deep


Operations on a `pl.Struct` column should be just as fast as operations on a normal column in a `DataFrame`.

## Exercises
In the quiz for this section you will develop your understanding of:
- creating `pl.List` columns
- creating `pl.Object` columns
- creating `pl.Struct` columns