From c2e7357b0e64496842dbfac8a7ed466cf25b5ee3 Mon Sep 17 00:00:00 2001
From: Merel Theisen
Date: Fri, 15 Sep 2023 09:59:28 +0100
Subject: [PATCH 1/2] Link to correct section about transcoding

Signed-off-by: Merel Theisen
---
 docs/source/integrations/pyspark_integration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/integrations/pyspark_integration.md b/docs/source/integrations/pyspark_integration.md
index c0e5cec08b..392f88cc3b 100644
--- a/docs/source/integrations/pyspark_integration.md
+++ b/docs/source/integrations/pyspark_integration.md
@@ -110,7 +110,7 @@ assert isinstance(df, pyspark.sql.DataFrame)
 [Delta Lake](https://delta.io/) is an open-source project that enables building a Lakehouse architecture on top of data lakes. It provides ACID transactions and unifies streaming and batch data processing on top of existing data lakes, such as S3, ADLS, GCS, and HDFS.
 To setup PySpark with Delta Lake, have a look at [the recommendations in Delta Lake's documentation](https://docs.delta.io/latest/quick-start.html#python).
 
-We recommend the following workflow, which makes use of the [transcoding feature in Kedro](../data/data_catalog.md):
+We recommend the following workflow, which makes use of the [transcoding feature in Kedro](../data/data_catalog_yaml_examples.md#read-the-same-file-using-two-different-datasets):
 
 * To create a Delta table, use a `SparkDataSet` with `file_format="delta"`. You can also use this type of dataset to read from a Delta table and/or overwrite it.
 * To perform [Delta table deletes, updates, and merges](https://docs.delta.io/latest/delta-update.html#language-python), load the data using a `DeltaTableDataSet` and perform the write operations within the node function.
From 4ea78677f80a27cfbc3860d40433458dc65bcf96 Mon Sep 17 00:00:00 2001
From: Jo Stichbury
Date: Fri, 15 Sep 2023 11:20:31 +0100
Subject: [PATCH 2/2] Adding transcoding to Vale's lexicon

Signed-off-by: Jo Stichbury
---
 .github/styles/Kedro/ignore.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/styles/Kedro/ignore.txt b/.github/styles/Kedro/ignore.txt
index 6ede60d83b..0f53c8283f 100644
--- a/.github/styles/Kedro/ignore.txt
+++ b/.github/styles/Kedro/ignore.txt
@@ -29,3 +29,5 @@ SQLAlchemy
 Astro
 Xebia
 pytest
+transcoding
+transcode