Merged
2 changes: 2 additions & 0 deletions .gitignore
@@ -95,6 +95,8 @@ docs/_build/
*.sqlite
*.sqlite3
!src/data/*.parquet
!doc/assets/data/*.csv
!doc/gallery/examples/*.csv

# Configuration files
.env
1 change: 1 addition & 0 deletions doc/_quarto.yml
@@ -2,6 +2,7 @@ project:
type: website
resources:
- wasm/**
- assets/data/**

website:
title: "ggsql"
21 changes: 21 additions & 0 deletions doc/assets/data/minard_cities.csv
@@ -0,0 +1,21 @@
"long","lat","city"
24,55,"Kowno"
25.3,54.7,"Wilna"
26.4,54.4,"Smorgoni"
26.8,54.3,"Moiodexno"
27.7,55.2,"Gloubokoe"
27.6,53.9,"Minsk"
28.5,54.3,"Studienska"
28.7,55.5,"Polotzk"
29.2,54.4,"Bobr"
30.2,55.3,"Witebsk"
30.4,54.5,"Orscha"
30.4,53.9,"Mohilow"
32,54.8,"Smolensk"
33.2,54.9,"Dorogobouge"
34.3,55.2,"Wixma"
34.4,55.5,"Chjat"
36,55.5,"Mojaisk"
37.6,55.8,"Moscou"
36.6,55.3,"Tarantino"
36.5,55,"Malo-Jarosewii"
52 changes: 52 additions & 0 deletions doc/assets/data/minard_troops.csv
@@ -0,0 +1,52 @@
"long","lat","survivors","direction","group"
37.7,55.7,100000,"R",1
37.5,55.7,98000,"R",1
37,55,97000,"R",1
36.8,55,96000,"R",1
35.4,55.3,87000,"R",1
34.3,55.2,55000,"R",1
33.3,54.8,37000,"R",1
32,54.6,24000,"R",1
30.4,54.4,20000,"R",1
29.2,54.3,20000,"R",1
28.5,54.2,20000,"R",1
28.3,54.3,20000,"R",1
27.5,54.5,20000,"R",1
26.8,54.3,12000,"R",1
26.4,54.4,14000,"R",1
25,54.4,8000,"R",1
24.4,54.4,4000,"R",1
24.2,54.4,4000,"R",1
24.1,54.4,4000,"R",1
28.7,55.5,33000,"R",2
29.2,54.2,30000,"R",2
28.5,54.1,30000,"R",2
28.3,54.2,28000,"R",2
24.6,55.8,6000,"R",3
24.2,54.4,6000,"R",3
24.1,54.4,6000,"R",3
24,54.9,340000,"A",1
24.5,55,340000,"A",1
25.5,54.5,340000,"A",1
26,54.7,320000,"A",1
27,54.8,300000,"A",1
28,54.9,280000,"A",1
28.5,55,240000,"A",1
29,55.1,210000,"A",1
30,55.2,180000,"A",1
30.3,55.3,175000,"A",1
32,54.8,145000,"A",1
33.2,54.9,140000,"A",1
34.4,55.5,127100,"A",1
35.5,55.4,100000,"A",1
36,55.5,100000,"A",1
37.6,55.8,100000,"A",1
24,55.1,60000,"A",2
24.5,55.2,60000,"A",2
25.5,54.7,60000,"A",2
26.6,55.7,40000,"A",2
27.4,55.6,33000,"A",2
28.7,55.5,33000,"A",2
24,55.2,22000,"A",3
24.5,55.3,22000,"A",3
24.6,55.8,6000,"A",3
Binary file added doc/assets/minard.png
84 changes: 84 additions & 0 deletions doc/gallery/examples/boxplot.qmd
@@ -0,0 +1,84 @@
---
title: "Box plots"
description: "Showing groups of distributions of single numeric variables"
image: thumbnails/boxplot.svg
categories: [basic, boxplot, distribution]
order: 3
---

Boxplots are a popular way to summarise the distribution of a single continuous variable.
Keep in mind that boxplots hide the actual shape of the distribution behind a summary; for example, they cannot reveal whether the data is bi- or multimodal.
For every group, a boxplot displays the following six elements:

1. The 25^th^ percentile, or Q1, as the start of the box.
2. The 50^th^ percentile, i.e. median or Q2, as a line across the box.
3. The 75^th^ percentile, or Q3, as the end of the box. Together with Q1, it defines the interquartile range: IQR = Q3 - Q1.
4. The minimum data value or Q1 - 1.5 * IQR, whichever is larger. This is displayed as the lower whisker.
5. The maximum data value or Q3 + 1.5 * IQR, whichever is smaller. This is displayed as the upper whisker.
6. Outliers outside the whiskers, if present. These are drawn as individual points.
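The six rules above can be sketched in plain Python. This is an illustration, not ggsql's implementation: it assumes quartiles computed by linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`), and ggsql may use a different percentile method. The whiskers follow the "data extreme or 1.5 * IQR fence, whichever is closer to the box" rule exactly as listed.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Compute boxplot summary statistics following the rules listed above."""
    data = sorted(values)
    # Assumption: linear interpolation between order statistics; ggsql's
    # exact percentile method is not specified on this page.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers: the data extreme or the 1.5 * IQR fence, whichever is closer to the box.
    lower = max(data[0], q1 - 1.5 * iqr)
    upper = min(data[-1], q3 + 1.5 * iqr)
    # Points beyond the whiskers are drawn individually as outliers.
    outliers = [v for v in data if v < lower or v > upper]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower": lower, "upper": upper, "outliers": outliers,
    }


# Made-up example values with one obvious outlier.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

Some tools, such as ggplot2, instead end each whisker at the most extreme data point that lies within 1.5 * IQR of the box, so whisker ends can differ slightly between libraries when outliers are present.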

## Code

```{ggsql}
VISUALISE species AS x, bill_len AS y FROM ggsql:penguins
DRAW boxplot
```

## Explanation

* The `VISUALISE ... FROM ggsql:penguins` clause loads the built-in penguins dataset.
* `species AS x` maps the categorical variable `species` to the x-axis, separating the data into groups.
* `bill_len AS y` sets the numeric variable to summarise.
* `DRAW boxplot` gives instructions to draw the boxplot layer.

## Variations

### Dodging

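You can split the data into finer groups than the categorical axis variable alone provides, for example by mapping `island` to `fill`. The boxplots within each axis category are then dodged, that is, placed side by side.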

```{ggsql}
VISUALISE species AS x, bill_len AS y, island AS fill FROM ggsql:penguins
DRAW boxplot
```

However, dodging can be unhelpful or counterintuitive in some cases.
For example, if we double-encode groups by mapping `species` to both `x` *and* `fill`, as in the plot below, dodging produces an awkward layout.

```{ggsql}
VISUALISE species AS x, bill_len AS y, species AS fill FROM ggsql:penguins
DRAW boxplot
```

You can disable dodging by setting `position => 'identity'`.

```{ggsql}
VISUALISE species AS x, bill_len AS y, species AS fill FROM ggsql:penguins
DRAW boxplot SETTING position => 'identity'
```

### Horizontal

To draw the boxplots horizontally, swap the `x` and `y` mappings.
The orientation is detected automatically from which variable is continuous and which is discrete.

```{ggsql}
VISUALISE bill_len AS x, species AS y, island AS fill FROM ggsql:penguins
DRAW boxplot
```

### With individual data points

Because a boxplot is a summary, it can be a good idea to supplement it with the individual data points so that you cannot be accused of 'hiding' the distribution.
Jitter the points by setting `position => 'jitter'` to reduce overplotting.
When you do this, set `outliers => false` on the boxplot layer so that outliers are not drawn twice across the two layers.

<!-- TODO: Figure out why the boxplot width is so small -->

```{ggsql}
VISUALISE species AS x, bill_len AS y FROM ggsql:penguins
DRAW point SETTING position => 'jitter'
DRAW boxplot SETTING outliers => false
```


94 changes: 94 additions & 0 deletions doc/gallery/examples/density.qmd
@@ -0,0 +1,94 @@
---
title: "Density plots"
description: "Showing smooth distributions of single numeric variables"
image: thumbnails/density-plot.svg
categories: [basic, density, distribution]
order: 3
---

Like histograms, density plots show the distribution of a numeric variable.
Instead of binning, density plots use [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) to estimate a smooth, continuous probability density.
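A kernel (such as a Gaussian) is centred on each data point, and the kernels are summed and normalised so that the total area under the curve is 1.
The amount of smoothing is controlled by the bandwidth, which sets the width of the kernel: a larger bandwidth gives a smoother curve.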
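As a minimal sketch of the idea, assuming a Gaussian kernel and a fixed, user-chosen bandwidth (ggsql's own kernel options and default bandwidth rule are not covered here), the estimate can be written in plain Python. The Riemann sum checks that the estimate integrates to roughly 1, and comparing two bandwidths shows how a wider kernel flattens the peaks.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the probability density at x.

    A Gaussian kernel with standard deviation `bandwidth` is centred on every
    sample; the kernels are summed and divided by the number of samples so
    that the estimate integrates to 1.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Made-up example values, not taken from the penguins dataset.
samples = [1.0, 2.0, 2.5, 4.0]
narrow = gaussian_kde(samples, bandwidth=0.5)
wide = gaussian_kde(samples, bandwidth=2.0)

# Approximate the area under the narrow curve with a fine Riemann sum.
step = 0.01
grid = [-10.0 + i * step for i in range(2501)]
area = sum(narrow(x) for x in grid) * step
print(f"area = {area:.4f}")
print(f"height at x=2: narrow={narrow(2.0):.3f}, wide={wide(2.0):.3f}")
```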

## Code

The x-axis gives the value of the numeric variable, whereas the y-axis gives the estimated probability density.

```{ggsql}
VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
DRAW density
```

## Explanation

* The `VISUALISE ... FROM ggsql:penguins` clause loads the built-in penguins dataset.
* `bill_len AS x` sets the numeric variable to use for density estimation.
* `species AS colour` splits the data into groups, each drawn in its own colour.
* `DRAW density` gives instructions to draw the density layer.

## Variations

### Group contributions

With densities, every group's curve has the same area: each integrates to 1.
This masks differences in group size.
Instead of the density, you can map the `intensity`, which also reflects how many observations each group contains.
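To illustrate the difference, here is a minimal sketch that assumes the intensity is the density multiplied by the group's number of observations (the page does not spell out ggsql's exact definition, so treat this as an assumption). The density of every group integrates to 1, while the assumed intensity integrates to the group size.

```python
import math


def gaussian_density(samples, bandwidth, x):
    """Gaussian kernel density estimate at x; integrates to 1 over all x."""
    n = len(samples)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))


def intensity(samples, bandwidth, x):
    """Assumed definition: density scaled by the group's number of observations."""
    return len(samples) * gaussian_density(samples, bandwidth, x)


# Two made-up groups of different sizes.
groups = {
    "small": [1.0, 1.5, 2.0],
    "large": [1.0, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 3.0, 3.5],
}

step = 0.01
grid = [-5.0 + i * step for i in range(1501)]
areas = {}
for name, samples in groups.items():
    density_area = sum(gaussian_density(samples, 0.3, x) for x in grid) * step
    intensity_area = sum(intensity(samples, 0.3, x) for x in grid) * step
    areas[name] = (density_area, intensity_area)
    print(f"{name}: density area {density_area:.3f}, intensity area {intensity_area:.3f}")
```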

```{ggsql}
VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
DRAW density REMAPPING intensity AS y
```

### Stacking

Instead of drawing the groups independently, you can also stack the densities.
Note that stacking alone does not account for the relative size of each group, because every stacked density still has an area of 1.
For that reason, you may want to stack the intensity instead, as in the plot below.
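Conceptually, stacking places each group's curve on top of the running total of the groups before it, at every x position. The sketch below assumes all groups are evaluated on a shared x grid; how ggsql aligns the grids internally is not described on this page. Because each density has an area of 1, the top edge of a stack of k densities always has an area of k, whatever the group sizes, which is why stacking intensities is usually more informative.

```python
def stack(curves):
    """Stack per-group curves that share the same x grid.

    `curves` maps group name -> list of y values. Groups are stacked in
    insertion order; each gets a (lower, upper) band where the lower edge
    is the running total of all earlier groups.
    """
    length = len(next(iter(curves.values())))
    baseline = [0.0] * length
    bands = {}
    for name, ys in curves.items():
        upper = [b + y for b, y in zip(baseline, ys)]
        bands[name] = (baseline, upper)
        baseline = upper
    return bands


# Made-up curve values on a three-point grid.
bands = stack({"a": [0.25, 0.5, 0.125], "b": [0.5, 0.25, 0.0]})
for name, (lower, upper) in bands.items():
    print(name, lower, upper)
```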

```{ggsql}
VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
DRAW density
REMAPPING intensity AS y
SETTING position => 'stack'
```

### Annotation

You can use the [rule](../../syntax/layer/type/rule.qmd) layer to display precomputed summaries, like the mean.

<!-- TODO: This should be updated once we have aggregates working -->

```{ggsql}
WITH mean_data AS (
  SELECT
    AVG(bill_len) AS bill_len,
    species
  FROM ggsql:penguins
  GROUP BY species
)
VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
DRAW density SETTING opacity => 0.3
DRAW rule MAPPING FROM mean_data
```

### Faceting

Another way of comparing groups is by using facets to separate the groups into different panels.

```{ggsql}
VISUALISE bill_len AS x, species AS colour FROM ggsql:penguins
DRAW density
FACET species SETTING ncol => 1
```

### Relation to violin plots

Conceptually, violin plots also display densities.
The similarity becomes clearer if you draw only one side of each violin, which produces a ridgeline plot.
The plot below shows essentially the same densities as the faceted plot above, but gathered in a single panel.

```{ggsql}
VISUALISE bill_len AS x, species AS y, species AS colour FROM ggsql:penguins
DRAW violin SETTING side => 'top', width => 2
```
56 changes: 56 additions & 0 deletions doc/gallery/examples/heatmap.qmd
@@ -0,0 +1,56 @@
---
title: "Heatmap"
description: "Arranging tiles on a grid"
image: thumbnails/violin-plot.svg
categories: [basic, heatmap]
order: 3
---

A heatmap visualises data values as colours in a grid layout.
Colour intensity makes patterns and relationships easy to spot.
Heatmaps work best when both axes are discrete or ordinal.

## Code

```{ggsql}
VISUALISE Day AS x, Month AS y, Temp AS fill FROM ggsql:airquality
DRAW rect
```

## Explanation

* The `VISUALISE ... FROM ggsql:airquality` clause loads the built-in air quality dataset.
* `Day AS x, Month AS y` defines each cell's position on a 2D grid. Each cell has a default width and height of 1, so because both variables are consecutive whole numbers, the cells tile the plot without gaps.
* `Temp AS fill` declares the 'heat' variable to display as colour intensity.
* `DRAW rect` gives instructions to draw a rectangle layer.

## Variations

As a stylistic choice, you can make the cells fully opaque and remove their borders.

```{ggsql}
VISUALISE Month AS y, Day AS x, Temp AS fill FROM ggsql:airquality
DRAW rect
SETTING stroke => null, opacity => 1
```

You can change the colour palette through the fill scale. Here, the `magma` palette is used in reverse order.

```{ggsql}
VISUALISE Month AS y, Day AS x, Temp AS fill FROM ggsql:airquality
DRAW rect
SCALE fill TO magma
SETTING reverse => true
```

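If your data is centred, you may want to use a diverging colour scale. It is important to set the two limits in `FROM` symmetrically around the midpoint, so that the neutral middle colour of the palette corresponds to zero. The example below centres each temperature on its monthly mean.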

```{ggsql}
SELECT *,
Temp * 1.0 - AVG(Temp) OVER (PARTITION BY Month) AS centered
FROM ggsql:airquality

VISUALISE Month AS y, Day AS x, centered AS fill
DRAW rect
SCALE fill FROM [-20, 20] TO vik
```
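The centring in the query above subtracts each month's mean temperature, and the symmetric limits must cover the largest deviation in either direction. A minimal plain-Python sketch of both steps follows; the helper names are hypothetical and the rows are made-up values, not taken from the air quality dataset. Choosing the limit as the largest absolute deviation guarantees that every value fits while zero stays at the centre of the palette.

```python
from collections import defaultdict


def center_by_group(rows, group_key, value_key):
    """Subtract the per-group mean, like `value - AVG(value) OVER (PARTITION BY group)`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    return [row[value_key] - means[row[group_key]] for row in rows]


def symmetric_limits(values):
    """Return limits centred on zero that cover every value."""
    bound = max(abs(v) for v in values)
    return (-bound, bound)


# Made-up rows for illustration only.
rows = [
    {"Month": 5, "Temp": 60}, {"Month": 5, "Temp": 70},
    {"Month": 6, "Temp": 80}, {"Month": 6, "Temp": 90},
]
centered = center_by_group(rows, "Month", "Temp")
print(centered, symmetric_limits(centered))
```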