Link to correct section about transcoding (#3035)
* Link to correct section about transcoding

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>

* Adding transcoding to Vale's lexicon

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
merelcht committed Sep 20, 2023
1 parent 6d11f00 commit cb51a8a
Showing 2 changed files with 3 additions and 1 deletion.
2 changes: 2 additions & 0 deletions .github/styles/Kedro/ignore.txt
@@ -35,3 +35,5 @@ VSCode
 Astro
 Xebia
 pytest
+transcoding
+transcode
2 changes: 1 addition & 1 deletion docs/source/integrations/pyspark_integration.md
@@ -110,7 +110,7 @@ assert isinstance(df, pyspark.sql.DataFrame)
 [Delta Lake](https://delta.io/) is an open-source project that enables building a Lakehouse architecture on top of data lakes. It provides ACID transactions and unifies streaming and batch data processing on top of existing data lakes, such as S3, ADLS, GCS, and HDFS.
 To set up PySpark with Delta Lake, have a look at [the recommendations in Delta Lake's documentation](https://docs.delta.io/latest/quick-start.html#python).
 
-We recommend the following workflow, which makes use of the [transcoding feature in Kedro](../data/data_catalog.md):
+We recommend the following workflow, which makes use of the [transcoding feature in Kedro](../data/data_catalog_yaml_examples.md#read-the-same-file-using-two-different-datasets):
 
 * To create a Delta table, use a `SparkDataSet` with `file_format="delta"`. You can also use this type of dataset to read from a Delta table and/or overwrite it.
 * To perform [Delta table deletes, updates, and merges](https://docs.delta.io/latest/delta-update.html#language-python), load the data using a `DeltaTableDataSet` and perform the write operations within the node function.
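The workflow described by those two bullets can be sketched as a pair of catalog entries that use transcoding: both entries point at the same Delta table on disk, with the `@spark` dataset used for creating/overwriting it and the `@delta` dataset used for in-place update operations inside a node. This is a minimal sketch; the dataset name `weather` and the filepath are hypothetical, not taken from the commit.

```yaml
# Hypothetical catalog.yml sketch: one Delta table registered twice via
# transcoding (`weather@spark` / `weather@delta`). A node writing `weather@spark`
# saves it as a Delta-format SparkDataSet; a node loading `weather@delta`
# receives a DeltaTable object for deletes/updates/merges.
weather@spark:
  type: spark.SparkDataSet
  filepath: data/02_intermediate/weather
  file_format: delta
  save_args:
    mode: overwrite

weather@delta:
  type: spark.DeltaTableDataSet
  filepath: data/02_intermediate/weather
```

Because Kedro treats `weather@spark` and `weather@delta` as the same dataset for pipeline-resolution purposes, a node that writes the former can feed a downstream node that loads the latter without copying any data.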
