# StructType Columns

## The basics

We can define `StructType` columns in `typedspark` as follows:

In [1]:
from pyspark.sql.types import IntegerType, StringType
from typedspark import DataSet, StructType, Schema, Column


class Values(Schema):
    name: Column[StringType]
    severity: Column[IntegerType]


class Actions(Schema):
    consequeces: Column[StructType[Values]]

To get auto-complete (and refactoring support) for the sub-columns, access them through `.dtype.schema`:

In [2]:
def get_high_severity_actions(df: DataSet[Actions]) -> DataSet[Actions]:
    return df.filter(Actions.consequeces.dtype.schema.severity > 5)

## Transform to schema

To populate a `StructType` column in `transform_to_schema()`, wrap the transformations for its sub-columns in `structtype_column()`:

In [3]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.ui.showConsoleProgress", "false").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

In [4]:
from typedspark import create_partially_filled_dataset, transform_to_schema, structtype_column


class Input(Schema):
    a: Column[StringType]
    b: Column[IntegerType]


df = create_partially_filled_dataset(
    spark,
    Input,
    {
        Input.a: ["a", "b", "c"],
        Input.b: [1, 2, 3],
    },
)

transform_to_schema(
    df,
    Actions,
    {
        Actions.consequeces: structtype_column(
            Actions.consequeces.dtype.schema,
            {
                Actions.consequeces.dtype.schema.name: Input.a,
                Actions.consequeces.dtype.schema.severity: Input.b,
            },
        )
    },
).show()

+-----------+
|consequeces|
+-----------+
|     {a, 1}|
|     {b, 2}|
|     {c, 3}|
+-----------+



Note that, just as in `transform_to_schema()`, the keys of the `transformations` dictionary passed to `structtype_column(..., transformations)` must be columns with unique names.
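The following sketch illustrates one way the unique-name requirement can be understood. It assumes the difficulty arises when two distinct column objects share the same name. It does not use typedspark at all: `FakeColumn` and `duplicate_names` are hypothetical helpers invented for this illustration, and typedspark's own check may work differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeColumn:
    """Hypothetical stand-in for a schema column: owning schema plus column name."""

    schema: str
    name: str


def duplicate_names(keys):
    """Return the column names shared by more than one key, in first-seen order."""
    counts = Counter(key.name for key in keys)
    return [name for name, count in counts.items() if count > 1]


# Distinct keys with distinct names: every key maps to its own field.
ok = {FakeColumn("Values", "name"): "a", FakeColumn("Values", "severity"): "b"}

# Distinct keys that share the name "name": both fit in the dictionary,
# but they would compete for the same struct field.
clash = {FakeColumn("Values", "name"): "a", FakeColumn("Other", "name"): "b"}

print(duplicate_names(ok))
print(duplicate_names(clash))
```

The point of the sketch is that a Python dictionary happily stores both keys of `clash`, so the ambiguity only shows up once the keys are reduced to their names.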

## Generating DataSets

We can generate a `DataSet` with `StructType` columns column-wise, by collecting the rows of a `DataSet` of the sub-schema:

In [5]:
from typedspark import create_partially_filled_dataset

values = create_partially_filled_dataset(
    spark,
    Values,
    {
        Values.severity: [1, 2, 3],
    },
)

actions = create_partially_filled_dataset(
    spark,
    Actions,
    {
        Actions.consequeces: values.collect(),
    },
)
actions.show()

+-----------+
|consequeces|
+-----------+
|  {NULL, 1}|
|  {NULL, 2}|
|  {NULL, 3}|
+-----------+



Alternatively, we can build the rows one by one with `create_structtype_row()`:

In [6]:
from typedspark import create_structtype_row

create_partially_filled_dataset(
    spark,
    Actions,
    [
        {
            Actions.consequeces: create_structtype_row(
                Values, {Values.name: "a", Values.severity: 1}
            ),
        },
        {
            Actions.consequeces: create_structtype_row(
                Values, {Values.name: "b", Values.severity: 2}
            ),
        },
        {
            Actions.consequeces: create_structtype_row(
                Values, {Values.name: "c", Values.severity: 3}
            ),
        },
    ],
).show()

+-----------+
|consequeces|
+-----------+
|     {a, 1}|
|     {b, 2}|
|     {c, 3}|
+-----------+

