docs(blog): snowflake io

ibis-project · Sep 6, 2023 · ee8c512 · ee8c512
1 parent 609ea8e
commit ee8c512
Show file tree

Hide file tree

Showing 4 changed files with 152 additions and 1 deletion.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -62,7 +62,7 @@ repos:
         args: ["--allow-missing-credentials"]
       - id: detect-private-key
       - id: end-of-file-fixer
-        exclude: .+/snapshots/.+
+        exclude: (.+/snapshots/.+)|(.*docs/_freeze/.+\.json)
       - id: fix-byte-order-marker
       - id: mixed-line-ending
       - id: trailing-whitespace

diff --git a/docs/.gitignore b/docs/.gitignore
@@ -10,6 +10,8 @@ site_libs
 *.delta
 *.zip
 *.db
+diamonds.json
+*.ndjson
 reference/
 objects.json
 *support_matrix.csv

diff --git a/docs/_freeze/posts/snowflake-io/index/execute-results/html.json b/docs/_freeze/posts/snowflake-io/index/execute-results/html.json
diff --git a/docs/posts/snowflake-io/index.qmd b/docs/posts/snowflake-io/index.qmd
@@ -0,0 +1,134 @@
+---
+title: "Icy IO: loading local files with Snowflake"
+author: "Phillip Cloud"
+date: "2023-08-31"
+categories:
+    - blog
+    - snowflake
+    - io
+    - productivity
+code-annotations: hover
+---
+## Loading files without Ibis
+
+{{< video https://www.youtube.com/embed/KFz4pbCqtQU >}}
+
+It's challenging to load local files into Snowflake from Python.
+
+Multiple steps are required, and while [those steps are
+well-documented](https://docs.snowflake.com/user-guide/data-load-local-file-system),
+they involve many concepts that are not necessarily familiar to data
+scientists.
+
+Here's how to load a CSV file into Snowflake without Ibis.
+
+```sql
+CREATE TEMP STAGE load_csv_stage; -- <1>
+
+CREATE TEMP FILE FORMAT load_csv_format -- <2>
+TYPE = CSV PARSE_HEADER = TRUE; -- <2>
+
+PUT 'file:///path/to/my.csv' @load_csv_stage; -- <3>
+
+CREATE TEMP TABLE my_table -- <4>
+USING TEMPLATE ( -- <5>
+    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
+    FROM TABLE(
+        INFER_SCHEMA( -- <6>
+            LOCATION => '@load_csv_stage',
+            FILE_FORMAT => 'load_csv_format'
+        ) -- <6>
+    )
+); -- <4>
+
+COPY INTO my_table -- <7>
+FROM @load_csv_stage
+FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1); -- <7>
+```
+
+1. **Creates a temporary [stage](https://docs.snowflake.com/en/sql-reference/sql/create-stage) in Snowflake.** Stages are locations in Snowflake that hold files. They can be used to store raw files to load into tables. `TEMP` stages are only accessible to the current session and will be dropped when the session ends.
+2. **A [file format](https://docs.snowflake.com/en/sql-reference/sql/create-file-format) is a set of instructions for how to interpret a file.** File formats are where you specify [parsing](https://docs.snowflake.com/sql-reference/sql/create-file-format#type-csv) and some loading options for your files.
+3. **[`PUT`](https://docs.snowflake.com/en/sql-reference/sql/put) copies a file or glob pattern** matching one or more files to a stage.
+4. **Creates a temporary table** with schema inferred using Snowflake's [`INFER_SCHEMA`](https://docs.snowflake.com/en/sql-reference/functions/infer_schema) [table function](https://docs.snowflake.com/en/sql-reference/functions-table).
+5. **[`USING TEMPLATE`](https://docs.snowflake.com/sql-reference/sql/create-table#create-table-using-template) is a Snowflake-specific syntax** that allows you to specify a set of column definitions computed from staged files.
+6. **[`INFER_SCHEMA`](https://docs.snowflake.com/en/sql-reference/functions/infer_schema)** is a powerful feature of Snowflake that allows you to load files without having to compute the schema in client code.
+7. **[`COPY INTO`](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table) loads the staged data** into the created temporary table.
+
+Having so many steps to load a file also makes it more difficult to remember
+the process the next time you want to do it.
+
+To Snowflake's credit this is a genuinely difficult problem and they have
+provided the right primitives to solve it.
+
+Still, there's plenty of room for UX improvement.
+
+## Loading files with Ibis
+
+Let's take a look at how ibis turns the above process into a **single line** of Python.
+
+```{python}
+#| echo: false
+!rm -f diamonds.csv diamonds.parquet diamonds.json diamonds.ndjson
+!ln -sf ../../../ci/ibis-testing-data/csv/diamonds.csv diamonds.csv
+!ln -sf ../../../ci/ibis-testing-data/parquet/diamonds.parquet diamonds.parquet
+```
+
+First, we connect to snowflake:
+
+```{python}
+import os
+from ibis.interactive import *
+
+con = ibis.connect(os.environ["SNOWFLAKE_URL"])
+```
+
+### `read_csv`
+
+Loading CSV files is now a single line of familiar Python code:
+
+```{python}
+diamonds = con.read_csv("diamonds.csv")
+diamonds
+```
+
+### `read_parquet`
+
+Similarly, loading Parquet files is now a single line of code:
+
+```{python}
+diamonds = con.read_parquet("diamonds.parquet")
+diamonds
+```
+
+```{python}
+#| echo: false
+df = diamonds.to_pandas()
+df.to_json("diamonds.ndjson", lines=True, orient="records")
+df.to_json("diamonds.json", orient="records")
+```
+
+### `read_json`
+
+Lastly, loading JSON files is now -- surprise 🥳 -- a single line of code!
+
+Line delimited JSON is supported:
+
+```{python}
+diamonds = con.read_json("diamonds.ndjson")
+diamonds
+```
+
+As well as strict JSON arrays of objects:
+
+```{python}
+diamonds = con.read_json("diamonds.json")
+diamonds
+```
+
+## Conclusion
+
+Ibis 7.0.0 adds support for `read_csv`, `read_parquet`, and `read_json` to the
+Snowflake backend.
+
+We think you'll enjoy the increase in productivity these new features bring to the
+Snowflake backend and we'd love to hear your feedback!