sql cells, backed by DuckDB (#844)
* sql cells

* registerTable

* prettier

* destructuring assignment

* register sql files

* sql → @observablehq/duckdb

* incremental sql update

* test sql + data loader

* docs; table display

* more docs; better display

* echo

* fix tests, again

* remove console

* id="[{min, max}]"

* more docs
mbostock authored Mar 6, 2024
1 parent d6311b5 commit 92377f8
Showing 26 changed files with 431 additions and 93 deletions.
19 changes: 19 additions & 0 deletions docs/display-race.md
@@ -0,0 +1,19 @@
# Display race

```js echo
async function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

```js echo
const value = (function* () {
  yield 2000;
  yield 1000;
})();
```

```js echo
await sleep(value);
display(value);
```
158 changes: 158 additions & 0 deletions docs/sql.md
@@ -0,0 +1,158 @@
---
sql:
  gaia: ./lib/gaia-sample.parquet
---

# SQL

Observable Framework includes built-in support for client-side SQL powered by [DuckDB](./lib/duckdb). You can use SQL to query data from [CSV](./lib/csv), [TSV](./lib/csv), [JSON](./javascript/files#json), [Apache Arrow](./lib/arrow), and [Apache Parquet](./lib/arrow#apache-parquet) files, which can either be static or generated by [data loaders](./loaders).

To use SQL, first register the desired tables in the page’s [front matter](./markdown#front-matter) using the **sql** option. Each key is a table name, and each value is the path to the corresponding data file. For example, to register a table named `gaia` from a Parquet file:

```yaml
---
sql:
  gaia: ./lib/gaia-sample.parquet
---
```

## SQL code blocks

To run SQL queries, create a SQL fenced code block (<code>```sql</code>). For example, to query the 10 rows of the `gaia` table with the lowest `phot_g_mean_mag` (the 10 brightest stars):

````md
```sql
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```
````

This produces a table:

```sql
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```

To refer to the results of a query in JavaScript, use the `id` directive. For example, to refer to the results of the previous query as `top10`:

````md
```sql id=top10
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```
````

```sql id=top10
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```

This returns an array of 10 rows, inspected here:

```js echo
top10
```

When a SQL code block uses the `id` directive, the results are not displayed by default. You can display them by adding the `display` directive, which produces the table shown above.

````md
```sql id=top10 display
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```
````

The `id` directive is often a simple identifier such as `top10` above, but it supports [destructuring assignment](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Destructuring_assignment), so you can refer to individual rows and columns using array and object patterns. For example, to pull out the top row:

````md
```sql id=[top]
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 1
```
````

```sql id=[top]
SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 1
```

```js echo
top
```

Or to pull out the minimum value of the `phot_g_mean_mag` column:

````md
```sql id=[{min}]
SELECT MIN(phot_g_mean_mag) AS min FROM gaia
```
````

```sql id=[{min}]
SELECT MIN(phot_g_mean_mag) AS min FROM gaia
```

```js echo
min
```

<div class="tip">

For complex destructuring patterns, you may need to quote the `id` directive. For example, to pull out the column named `min(phot_g_mean_mag)` to the variable named `min`, say <code style="white-space: nowrap;">id="[{'min(phot_g_mean_mag)': min}]"</code>. Or to pull out the `min` and `max` columns, say <code style="white-space: nowrap;">id="[{min, max}]"</code>.

</div>
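
To see what these patterns do, here is a minimal sketch in plain JavaScript. It assumes the query result behaves like an array of row objects; the `rows` values and the `minMax` helper are hypothetical and exist only for illustration.

```js
// Hypothetical query result: one object per row, one property per column.
// The numbers are made up for illustration.
const rows = [{"min(phot_g_mean_mag)": 2.8}];

// Same pattern as id="[{'min(phot_g_mean_mag)': min}]":
// take the first row, then bind its "min(phot_g_mean_mag)" column to `min`.
const [{"min(phot_g_mean_mag)": min}] = rows;

// Same pattern as id="[{min, max}]", wrapped in a function to keep names separate.
function minMax(result) {
  const [{min, max}] = result;
  return [min, max];
}

console.log(min);
console.log(minMax([{min: 2.8, max: 21.1}]));
```

The quotes around the `id` value are needed only because the pattern contains characters such as spaces and parentheses; the destructuring itself is ordinary JavaScript.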

For dynamic or interactive queries that respond to user input, you can interpolate values into SQL queries using inline expressions `${…}`. For example, to show the stars around a given brightness:

```js echo
const mag = view(Inputs.range([6, 20], {label: "Magnitude"}));
```

```sql echo
SELECT * FROM gaia WHERE phot_g_mean_mag BETWEEN ${mag - 0.1} AND ${mag + 0.1};
```
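
As a rough illustration of how interpolation can work, the `sql` function added to `src/client/stdlib/duckdb.js` in this commit joins the literal parts of the template with `?` and passes the interpolated values along as query parameters. The sketch below mimics only that step with a hypothetical `toParameterized` tag; it does not run a query, and the `mag` value is a stand-in for the slider.

```js
// Hypothetical tag function: literal chunks are joined with "?" placeholders,
// and the interpolated values are collected as parameters.
function toParameterized(strings, ...args) {
  return {text: strings.join("?"), params: args};
}

const mag = 12; // stand-in for the value of the Magnitude input

const query = toParameterized`SELECT * FROM gaia WHERE phot_g_mean_mag BETWEEN ${mag - 0.1} AND ${mag + 0.1}`;

console.log(query.text); // SELECT * FROM gaia WHERE phot_g_mean_mag BETWEEN ? AND ?
console.log(query.params.length); // 2
```

Because values travel separately from the query text, they are not spliced into the SQL string.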

The value of a SQL code block is an [Apache Arrow](./lib/arrow) table. This format is supported by [Observable Plot](./lib/plot), so you can use SQL and Plot together to visualize data. For example, the query below counts the number of stars in each 2°×2° bin of the sky, where `ra` is [right ascension](https://en.wikipedia.org/wiki/Right_ascension) and `dec` is [declination](https://en.wikipedia.org/wiki/Declination); together they represent a point on the celestial sphere in the equatorial coordinate system. The resulting heatmap is then drawn with a [raster mark](https://observablehq.com/plot/marks/raster).

```sql id=bins echo
SELECT
  floor(ra / 2) * 2 + 1 AS ra,
  floor(dec / 2) * 2 + 1 AS dec,
  count() AS count
FROM
  gaia
GROUP BY
  1,
  2
```

```js echo
Plot.plot({
  aspectRatio: 1,
  x: {domain: [0, 360]},
  y: {domain: [-90, 90]},
  marks: [
    Plot.frame({fill: 0}),
    Plot.raster(bins, {
      x: "ra",
      y: "dec",
      fill: "count",
      width: 360 / 2,
      height: 180 / 2,
      imageRendering: "pixelated"
    })
  ]
})
```
```
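
The binning expression in the query above, `floor(x / 2) * 2 + 1`, maps each coordinate to the center of its 2° bin. The sketch below reproduces the same arithmetic in plain JavaScript; the `binCenter` helper and its `size` parameter are hypothetical names used only for illustration.

```js
// Center of the bin of width `size` that contains x.
// With size = 2 this is floor(x / 2) * 2 + 1, as in the query.
function binCenter(x, size = 2) {
  return Math.floor(x / size) * size + size / 2;
}

console.log(binCenter(3.7)); // 3 (the bin covering 2 to 4)
console.log(binCenter(-0.5)); // -1 (the bin covering -2 to 0)
console.log(binCenter(359.9)); // 359 (the bin covering 358 to 360)
```

With 2° bins, the sky spans 360 / 2 = 180 columns and 180 / 2 = 90 rows, which matches the `width` and `height` given to the raster mark.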

## SQL literals

SQL fenced code blocks are shorthand for the `sql` tagged template literal. You can invoke the `sql` tagged template literal directly like so:

```js echo
const rows = await sql`SELECT random() AS random`;
```

```js echo
rows[0].random
```

The `sql` tagged template literal is available by default in Markdown, but you can also import it explicitly as:

```js echo
import {sql} from "npm:@observablehq/duckdb";
```
1 change: 1 addition & 0 deletions observablehq.config.ts
@@ -7,6 +7,7 @@ export default {
{name: "Markdown", path: "/markdown"},
{name: "JavaScript", path: "/javascript"},
{name: "Data loaders", path: "/loaders"},
+ {name: "SQL", path: "/sql"},
{name: "Themes", path: "/themes"},
{name: "Configuration", path: "/config"},
{
2 changes: 1 addition & 1 deletion src/build.ts
@@ -112,7 +112,7 @@ export async function build(
effects.output.write(`${faint("build")} ${clientPath} ${faint("→")} `);
const define: {[key: string]: string} = {};
if (config.search) define["global.__minisearch"] = JSON.stringify(relativePath(path, aliases.get("/_observablehq/minisearch.json")!)); // prettier-ignore
- const contents = await rollupClient(clientPath, root, path, {minify: true, define});
+ const contents = await rollupClient(clientPath, root, path, {minify: true, keepNames: true, define});
await effects.writeFile(path, contents);
}
}
40 changes: 25 additions & 15 deletions src/client/preview.js
@@ -1,5 +1,6 @@
- import {registerFile} from "npm:@observablehq/stdlib";
- import {undefine} from "./main.js";
+ import {registerTable} from "npm:@observablehq/duckdb";
+ import {FileAttachment, registerFile} from "npm:@observablehq/stdlib";
+ import {main, undefine} from "./main.js";
import {enableCopyButtons} from "./pre.js";

export * from "./index.js";
@@ -26,16 +27,16 @@ export function open({hash, eval: compile} = {}) {
}
case "update": {
const root = document.querySelector("main");
- if (message.previousHash !== hash) {
+ if (message.hash.previous !== hash) {
console.log("contents out of sync");
location.reload();
break;
}
- hash = message.updatedHash;
+ hash = message.hash.current;
let offset = 0;
const addedCells = new Map();
const removedCells = new Map();
- for (const {type, oldPos, items} of message.diffHtml) {
+ for (const {type, oldPos, items} of message.html) {
switch (type) {
case "add": {
for (const item of items) {
@@ -71,34 +72,43 @@ export function open({hash, eval: compile} = {}) {
for (const [id, removed] of removedCells) {
addedCells.get(id)?.replaceWith(removed);
}
- for (const id of message.diffCode.removed) {
+ for (const id of message.code.removed) {
undefine(id);
}
- for (const body of message.diffCode.added) {
+ for (const body of message.code.added) {
compile(body);
}
- for (const name of message.diffFiles.removed) {
+ for (const name of message.files.removed) {
registerFile(name, null);
}
- for (const file of message.diffFiles.added) {
+ for (const file of message.files.added) {
registerFile(file.name, file);
}
- const {addedStylesheets, removedStylesheets} = message;
- if (addedStylesheets.length === 1 && removedStylesheets.length === 1) {
-   const [newHref] = addedStylesheets;
-   const [oldHref] = removedStylesheets;
+ for (const name of message.tables.removed) {
+   registerTable(name, null);
+ }
+ for (const table of message.tables.added) {
+   registerTable(table.name, FileAttachment(table.path));
+ }
+ if (message.tables.removed.length || message.tables.added.length) {
+   const sql = main._resolve("sql");
+   sql.define(sql._promise); // re-evaluate sql code
+ }
+ if (message.stylesheets.added.length === 1 && message.stylesheets.removed.length === 1) {
+   const [newHref] = message.stylesheets.added;
+   const [oldHref] = message.stylesheets.removed;
const link = document.head.querySelector(`link[rel="stylesheet"][href="${oldHref}"]`);
link.href = newHref;
} else {
- for (const href of addedStylesheets) {
+ for (const href of message.stylesheets.added) {
const link = document.createElement("link");
link.rel = "stylesheet";
link.type = "text/css";
link.crossOrigin = "";
link.href = href;
document.head.appendChild(link);
}
- for (const href of removedStylesheets) {
+ for (const href of message.stylesheets.removed) {
document.head.querySelector(`link[rel="stylesheet"][href="${href}"]`)?.remove();
}
}
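
The table handling in the update branch above can be read as applying a `{removed, added}` diff to a registry and then deciding whether SQL cells need to run again. The following is a minimal sketch of that idea, not the actual client code: the `registered` map, this local `registerTable`, the `applyTableDiff` helper, and the `stars` file are all hypothetical, and paths are stored as plain strings instead of going through `FileAttachment`.

```js
// Hypothetical stand-in for the table registry; not the real @observablehq/duckdb module.
const registered = new Map();

function registerTable(name, source) {
  if (source == null) registered.delete(name); // null unregisters, as in the diff above
  else registered.set(name, source);
}

// Apply the tables part of an update message.
// Returns true when anything changed, meaning sql cells should be re-evaluated.
function applyTableDiff(tables) {
  for (const name of tables.removed) registerTable(name, null);
  for (const table of tables.added) registerTable(table.name, table.path);
  return tables.removed.length > 0 || tables.added.length > 0;
}

registerTable("gaia", "./lib/gaia-sample.parquet");

const changed = applyTableDiff({
  removed: ["gaia"],
  added: [{name: "stars", path: "./lib/stars.parquet"}] // hypothetical file
});

console.log(changed, [...registered.keys()]);
```

Removals are applied before additions, so a table whose path changed can appear in both lists and still end up registered with the new source.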
85 changes: 53 additions & 32 deletions src/client/stdlib/duckdb.js
@@ -40,7 +40,28 @@ const bundle = await duckdb.selectBundle({
}
});

- const logger = new duckdb.ConsoleLogger();
+ const logger = new duckdb.ConsoleLogger(duckdb.LogLevel.WARNING);
+
+ let db;
+ let inserts = [];
+ const sources = new Map();
+
+ export function registerTable(name, source) {
+   if (source == null) {
+     sources.delete(name);
+     db = DuckDBClient.of(); // drop existing tables and views before re-inserting
+     inserts = Array.from(sources, (i) => db.then((db) => insertSource(db._db, ...i)));
+   } else {
+     sources.set(name, source);
+     db ??= DuckDBClient.of(); // lazy instantiation
+     inserts.push(db.then((db) => insertSource(db._db, name, source)));
+   }
+ }
+
+ export async function sql(strings, ...args) {
+   await Promise.all(inserts);
+   return (await (db ??= DuckDBClient.of())).query(strings.join("?"), args);
+ }

export class DuckDBClient {
constructor(db) {
@@ -139,37 +160,7 @@ export class DuckDBClient {
config = {...config, query: {...config.query, castBigIntToDouble: true}};
}
await db.open(config);
- await Promise.all(
-   Object.entries(sources).map(async ([name, source]) => {
-     source = await source;
-     if (isFileAttachment(source)) {
-       // bare file
-       await insertFile(db, name, source);
-     } else if (isArrowTable(source)) {
-       // bare arrow table
-       await insertArrowTable(db, name, source);
-     } else if (Array.isArray(source)) {
-       // bare array of objects
-       await insertArray(db, name, source);
-     } else if (isArqueroTable(source)) {
-       await insertArqueroTable(db, name, source);
-     } else if ("data" in source) {
-       // data + options
-       const {data, ...options} = source;
-       if (isArrowTable(data)) {
-         await insertArrowTable(db, name, data, options);
-       } else {
-         await insertArray(db, name, data, options);
-       }
-     } else if ("file" in source) {
-       // file + options
-       const {file, ...options} = source;
-       await insertFile(db, name, file, options);
-     } else {
-       throw new Error(`invalid source: ${source}`);
-     }
-   })
- );
+ await Promise.all(Object.entries(sources).map(([name, source]) => insertSource(db, name, source)));
return new DuckDBClient(db);
}
}
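
The `registerTable` and `sql` functions added earlier in this file combine two ideas: the database is created lazily on first use, and removing a table starts over with a fresh database and re-inserts the remaining sources. Below is a minimal sketch of that pattern under stated assumptions: `createDatabase` is a hypothetical stand-in for `DuckDBClient.of()` that resolves to a plain `Map`, and `tableNames` plays the role of `sql` by waiting for pending inserts before reading.

```js
// Hypothetical stand-in for DuckDBClient.of(): resolves to a fresh, empty "database".
let created = 0;
function createDatabase() {
  created++;
  return Promise.resolve(new Map());
}

let db; // promise of the current database, created on demand
let inserts = []; // pending insert promises for the current database
const sources = new Map(); // name -> source, the desired set of tables

function registerTable(name, source) {
  if (source == null) {
    sources.delete(name);
    db = createDatabase(); // start over so the removed table disappears
    inserts = Array.from(sources, ([n, s]) => db.then((d) => d.set(n, s)));
  } else {
    sources.set(name, source);
    db ??= createDatabase(); // lazy instantiation
    inserts.push(db.then((d) => d.set(name, source)));
  }
}

// Waits for pending inserts before reading, as sql does before querying.
async function tableNames() {
  await Promise.all(inserts);
  return [...(await (db ??= createDatabase())).keys()];
}

registerTable("a", 1);
registerTable("b", 2);
registerTable("a", null);
tableNames().then((names) => console.log(names, created));
```

Rebuilding on removal trades some repeated work for simplicity: there is no need to track which tables or views a source created in order to drop them one by one.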
@@ -178,6 +169,36 @@ Object.defineProperty(DuckDBClient.prototype, "dialect", {
value: "duckdb"
});

+ async function insertSource(database, name, source) {
+   source = await source;
+   if (isFileAttachment(source)) {
+     // bare file
+     await insertFile(database, name, source);
+   } else if (isArrowTable(source)) {
+     // bare arrow table
+     await insertArrowTable(database, name, source);
+   } else if (Array.isArray(source)) {
+     // bare array of objects
+     await insertArray(database, name, source);
+   } else if (isArqueroTable(source)) {
+     await insertArqueroTable(database, name, source);
+   } else if ("data" in source) {
+     // data + options
+     const {data, ...options} = source;
+     if (isArrowTable(data)) {
+       await insertArrowTable(database, name, data, options);
+     } else {
+       await insertArray(database, name, data, options);
+     }
+   } else if ("file" in source) {
+     // file + options
+     const {file, ...options} = source;
+     await insertFile(database, name, file, options);
+   } else {
+     throw new Error(`invalid source: ${source}`);
+   }
+ }
+
async function insertFile(database, name, file, options) {
const url = await file.url();
if (url.startsWith("blob:")) {
1 change: 1 addition & 0 deletions src/client/stdlib/recommendedLibraries.js
@@ -14,6 +14,7 @@ export const L = () => import("npm:leaflet");
export const mapboxgl = () => import("npm:mapbox-gl").then((module) => module.default);
export const mermaid = () => import("observablehq:stdlib/mermaid").then((mermaid) => mermaid.default);
export const Plot = () => import("npm:@observablehq/plot");
+ export const sql = () => import("observablehq:stdlib/duckdb").then((duckdb) => duckdb.sql);
export const SQLite = () => import("observablehq:stdlib/sqlite").then((sqlite) => sqlite.default);
export const SQLiteDatabaseClient = () => import("observablehq:stdlib/sqlite").then((sqlite) => sqlite.SQLiteDatabaseClient); // prettier-ignore
export const tex = () => import("observablehq:stdlib/tex").then((tex) => tex.default);