# File submission workflow for Dagster
Many data engineering projects (using Dagster, Airflow, etc.) need a way to manually submit data and trigger its ingestion.
This notebook, together with the associated sample Dagster defs file and utility package, demonstrates one way of implementing such a workflow.

## Trying it out
1. Run `uv run dagster dev -f file_submission_defs.py` in a terminal to start the sample Dagster project.
2. In another terminal, run `uv run --with jupyter --with jupyterlab_vim jupyter lab`, then open this notebook.

## Inspect utilities

In [1]:
import pandas as pd
from util import submit_film_file

submit_film_file?

Signature:
submit_film_file(
    df: pandera.typing.pandas.DataFrame[util.FilmSchema],
    filename: str,
)
Docstring: Submit film dataframe to dagster, if it passes schema validation.
File:      ~/dkstat/dagster-recipes/file-submission/util.py
Type:      function

In [2]:
from util import FilmSchema

FilmSchema??

Init signature: FilmSchema(*args, **kwargs) -> pandera.typing.common.DataFrameBase[~TDataFrameModel]
Docstring:
Model of a pandas :class:`~pandera.api.pandas.container.DataFrameSchema`.

*new in 0.5.0*

See the :ref:`User Guide <dataframe-models>` for more.
Source:
class FilmSchema(pa.DataFrameModel):
    name: Series[str]
    lead_actor: Series[str]
    rating: Series[int] = pa.Field(ge=0, le=10)
File:           ~/dkstat/dagster-recipes/file-submission/util.py
Type:           MetaModel
Subclasses:

## Try to submit some files
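Note that `submit_film_file` only submits a dataframe if it passes `FilmSchema` validation, so invalid data is rejected right here in the notebook. This instant feedback is valuable: the submitter can correct the data immediately instead of discovering the problem later in a failed Dagster run.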

In [3]:
try:
    submit_film_file(
        df=pd.DataFrame(
            [
                {"name": "Rogue One: A Star Wars Story", "lead_actorr": "Mads Mikkelsen", "rating": 7},
                {"name": "Bastarden", "lead_actor": "Mads Mikkelsen", "rating": 7.7},
                {"name": "Druk", "lead_actor": "Mads Mikkelsen", "rating": 11},
            ]
        ),
        filename="best_mikkelsen_films.csv",
    )
except Exception as e:
    print(type(e).__name__ + ":")
    print(e)

SchemaError:
non-nullable series 'lead_actor' contains null values:
0    NaN
Name: lead_actor, dtype: object
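
The error above reports null values rather than an unknown column. A likely explanation is that pandas creates one column per distinct key when building a dataframe from a list of dicts, so the misspelled `lead_actorr` becomes its own column and the real `lead_actor` column is left empty for that row. The sketch below illustrates this with pandas only; the `expected` set is copied by hand from `FilmSchema` and is an assumption of this example, not part of `util.py`.

```python
import pandas as pd

# Column names copied by hand from FilmSchema (an assumption of this sketch).
expected = {"name", "lead_actor", "rating"}

df = pd.DataFrame(
    [
        {"name": "Rogue One: A Star Wars Story", "lead_actorr": "Mads Mikkelsen", "rating": 7},
        {"name": "Bastarden", "lead_actor": "Mads Mikkelsen", "rating": 8},
    ]
)

# pandas builds one column per distinct key, so both spellings exist as columns.
unexpected = set(df.columns) - expected

# The row with the misspelled key has no value in the correctly named column.
missing_values = df["lead_actor"].isna().tolist()

print("Unexpected columns:", unexpected)
print("lead_actor is missing per row:", missing_values)
```

Checking for unexpected column names before submitting can make a typo like this easier to spot than the null-value message alone.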


In [4]:
try:
    submit_film_file(
        df=pd.DataFrame(
            [
                {"name": "Rogue One: A Star Wars Story", "lead_actor": "Mads Mikkelsen", "rating": 7},
                {"name": "Bastarden", "lead_actor": "Mads Mikkelsen", "rating": 7.7},
                {"name": "Druk", "lead_actor": "Mads Mikkelsen", "rating": 11},
            ]
        ),
        filename="best_mikkelsen_films.csv",
    )
except Exception as e:
    print(type(e).__name__ + ":")
    print(e)

SchemaError:
expected series 'rating' to have type int64, got float64
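
The dtype error above comes from pandas inferring `float64` for the whole `rating` column as soon as one value (`7.7`) is not an integer. The following sketch, using pandas only, shows the inference and one possible fix. Rounding is an assumption made for illustration; whether rounding, truncating, or rejecting fractional ratings is appropriate depends on the data.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {"name": "Rogue One: A Star Wars Story", "rating": 7},
        {"name": "Bastarden", "rating": 7.7},
        {"name": "Druk", "rating": 11},
    ]
)

# A single fractional value makes pandas infer float64 for the whole column.
inferred_dtype = str(df["rating"].dtype)

# One possible fix (an assumption of this sketch): round, then cast to int64.
df["rating"] = df["rating"].round().astype("int64")
fixed_dtype = str(df["rating"].dtype)
fixed_values = df["rating"].tolist()

print(inferred_dtype, "->", fixed_dtype, fixed_values)
```

After the cast the column has the dtype the schema expects, although the value `11` would still fail the range check on `rating`.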


In [5]:
try:
    submit_film_file(
        df=pd.DataFrame(
            [
                {"name": "Rogue One: A Star Wars Story", "lead_actor": "Mads Mikkelsen", "rating": 7},
                {"name": "Bastarden", "lead_actor": "Mads Mikkelsen", "rating": 8},
                {"name": "Druk", "lead_actor": "Mads Mikkelsen", "rating": 11},
            ]
        ),
        filename="best_mikkelsen_films.csv",
    )
except Exception as e:
    print(type(e).__name__ + ":")
    print(e)

SchemaError:
Column 'rating' failed element-wise validator number 1: less_than_or_equal_to(10) failure cases: 11


In [6]:
try:
    submit_film_file(
        df=pd.DataFrame(
            [
                {"name": "Rogue One: A Star Wars Story", "lead_actor": "Mads Mikkelsen", "rating": 7},
                {"name": "Bastarden", "lead_actor": "Mads Mikkelsen", "rating": 8},
                {"name": "Druk", "lead_actor": "Mads Mikkelsen", "rating": 10},
            ]
        ),
        filename="best_mikkelsen_films.csv",
    )
except Exception as e:
    print(type(e).__name__ + ":")
    print(e)

![](materialize_success.png)