# Adding StructType columns

To add a `StructType` column, pass `structtype_column()` as a value in the `transformations` dictionary of `transform_to_schema()`, as shown below.

In [1]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

In [2]:
from typedspark import Schema, Column, StructType, transform_to_schema, structtype_column, create_partially_filled_dataset
from pyspark.sql.types import IntegerType

class Input(Schema):
    a: Column[IntegerType]
    b: Column[IntegerType]

class Values(Schema):
    a: Column[IntegerType]
    b: Column[IntegerType]

class Output(Schema):
    values: Column[StructType[Values]]

df = create_partially_filled_dataset(
    spark, 
    Input, 
    {
        Input.a: [1, 2, 3], 
        Input.b: [2, 3, 4],
    }
)

transform_to_schema(
    df,
    Output,
    {
        Output.values: structtype_column(
            Values,
            {
                Values.a: Input.a,
                Values.b: Input.b,
            }
        )
    }
).show()

+------+
|values|
+------+
|{1, 2}|
|{2, 3}|
|{3, 4}|
+------+


Just like in `transform_to_schema()`, the `transformations` dictionary in `structtype_column(..., transformations)` requires each key to be a column with a unique name. Using the same column as a key twice, as in the example below, raises an error.

In [3]:
try:
    transform_to_schema(
        df,
        Output,
        {
            Output.values: structtype_column(
                Values,
                {
                    # Values.a is used as a key twice, which is not allowed
                    Values.a: Input.a + 2,
                    Values.a: Values.a * 3,
                    Values.b: Input.b,
                }
            )
        }
    )
except ValueError as e:
    print(e)

23/03/23 11:00:40 WARN Column: Constructing trivially true equals predicate, ''a = 'a'. Perhaps you need to use aliases.
Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
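
The error message mentions booleans rather than duplicate keys, which can be confusing. A likely explanation is that Python compares dictionary keys with `==` when their hashes match, while `==` on a Spark column builds a new column expression that cannot be converted to `bool`. The sketch below reproduces this in plain Python. `FakeColumn` is a hypothetical stand-in written for this illustration; it is not part of typedspark or PySpark, and the real classes may differ in detail.

```python
class FakeColumn:
    """Hypothetical stand-in for a Spark column (not a typedspark or PySpark class)."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Assumption: columns with the same name hash to the same value.
        return hash(self.name)

    def __eq__(self, other):
        # Like a Spark column, `==` returns a new expression, not a bool.
        return FakeColumn(f"({self.name} = {other.name})")

    def __bool__(self):
        # Like a Spark column, the expression cannot be used as a bool.
        raise ValueError("Cannot convert column into bool")


def build_transformations():
    # The second "a" key has the same hash as the first, so Python calls
    # `==` on the two keys and then tries to turn the result into a bool.
    return {
        FakeColumn("a"): "Input.a + 2",
        FakeColumn("a"): "Values.a * 3",
        FakeColumn("b"): "Input.b",
    }


try:
    build_transformations()
    error_message = None
except ValueError as e:
    error_message = str(e)

print(error_message)
```

Under these assumptions, the dictionary literal fails while it is being built, before `structtype_column()` ever sees it.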


Instead, combine these into a single expression, so that `Values.a` appears as a key only once:

In [5]:
transform_to_schema(
    df,
    Output,
    {
        Output.values: structtype_column(
            Values,
            {
                Values.a: (Input.a + 2) * 3,
                Values.b: Input.b,
            }
        )
    }
).show()


+-------+
| values|
+-------+
| {9, 2}|
|{12, 3}|
|{15, 4}|
+-------+
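
As a quick sanity check on the values of `a` above, the arithmetic can be reproduced in plain Python without Spark. This is only an illustration on ordinary lists; the variable names are made up for this sketch.

```python
input_a = [1, 2, 3]

# Two separate steps: first add 2, then multiply the result by 3.
step_one = [a + 2 for a in input_a]
two_steps = [a * 3 for a in step_one]

# One combined expression, mirroring (Input.a + 2) * 3.
combined = [(a + 2) * 3 for a in input_a]

print(two_steps)
print(combined)
```

Both lists hold the same values, which match the first field of each struct in the output table.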

