# Dynamically generate schemas from an existing DataFrame 

Besides `load_table()`, which generates a `Schema` from an existing table, typedspark also provides `create_schema()`, which generates a `Schema` from a `DataFrame` that you already have in memory. This gives you autocomplete on `DataSet` objects that you create on the fly. A pivot table is a good example, because its column names depend on the values in the data.

In [1]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.config("spark.ui.showConsoleProgress", "false").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

In [2]:
from pyspark.sql.functions import first
from pyspark.sql.types import IntegerType, StringType
from typedspark import Column, Schema, create_partially_filled_dataset, create_schema

class A(Schema):
    id: Column[IntegerType]
    key: Column[StringType]
    value: Column[StringType]

pivot_table = (
    create_partially_filled_dataset(
        spark, 
        A, 
        {
            A.id: [1, 1, 1, 2, 2, 2, 3, 3, 3],
            A.key: ["a", "b!!", "c", "a", "b!!", "c", "a", "b!!", "c"], 
            A.value: ["alpha", "alpha", "beta", "beta", "gamma", "gamma", "alpha", "beta", "gamma"]
        }
    )
    .groupby(A.id)
    .pivot(A.key.str)
    .agg(first(A.value))
)

pivot_table, PivotTable = create_schema(pivot_table, "PivotTable")
PivotTable


from pyspark.sql.types import IntegerType, StringType

from typedspark import Column, Schema


class PivotTable(Schema):
    id: Column[IntegerType]
    a: Column[StringType]
    b__: Column[StringType]
    c: Column[StringType]

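We can use this as a regular `Schema`. Note that the pivot key `b!!` appears as the column `b__` in the generated schema: the `!` characters, which are not valid in a Python attribute name, show up as underscores.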

In [3]:
(
    pivot_table
    .filter(PivotTable.a == "alpha")
    .show()
)

+---+-----+-----+-----+
| id|    a|  b__|    c|
+---+-----+-----+-----+
|  1|alpha|alpha| beta|
|  3|alpha| beta|gamma|
+---+-----+-----+-----+
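
The `b__` column in the output above corresponds to the pivot key `b!!`. As a rough illustration of how such a name can be made usable as a Python attribute, the sketch below replaces every non-word character with an underscore. The helper `sanitize_column_name` is hypothetical and written for this example only; it is not typedspark's actual implementation, which may handle edge cases differently.

```python
import re


def sanitize_column_name(name: str) -> str:
    """Hypothetical helper (not part of typedspark): make a column name
    usable as a Python attribute name."""
    # Replace every character that is not a letter, digit, or underscore.
    sanitized = re.sub(r"\W", "_", name)
    # Python identifiers cannot start with a digit, so prefix an underscore.
    if sanitized and sanitized[0].isdigit():
        sanitized = "_" + sanitized
    return sanitized


# The pivot keys used in the example above.
for key in ["a", "b!!", "c"]:
    print(f"{key!r} -> {sanitize_column_name(key)!r}")
```

Under these assumptions, `a` and `c` pass through unchanged, while `b!!` maps to `b__`, matching the column name shown in the generated `PivotTable` schema.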

