Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
573 changes: 573 additions & 0 deletions _solved/00-jupyter_introduction.ipynb

Large diffs are not rendered by default.

4,738 changes: 365 additions & 4,373 deletions _solved/01-introduction-tabular-data.ipynb

Large diffs are not rendered by default.

1,815 changes: 218 additions & 1,597 deletions _solved/02-introduction-geospatial-data.ipynb

Large diffs are not rendered by default.

726 changes: 102 additions & 624 deletions _solved/03-coordinate-reference-systems.ipynb

Large diffs are not rendered by default.

1,812 changes: 195 additions & 1,617 deletions _solved/04-spatial-relationships-joins.ipynb

Large diffs are not rendered by default.

2,116 changes: 291 additions & 1,825 deletions _solved/05-spatial-operations-overlays.ipynb

Large diffs are not rendered by default.

1,286 changes: 232 additions & 1,054 deletions _solved/10-introduction-raster.ipynb

Large diffs are not rendered by default.

3,673 changes: 0 additions & 3,673 deletions _solved/11-numpy.ipynb

This file was deleted.

1,680 changes: 1,680 additions & 0 deletions _solved/11-xarray-intro.ipynb

Large diffs are not rendered by default.

1,884 changes: 0 additions & 1,884 deletions _solved/12-rasterio.ipynb

This file was deleted.

1,508 changes: 1,508 additions & 0 deletions _solved/12-xarray-advanced.ipynb

Large diffs are not rendered by default.

2,564 changes: 2,564 additions & 0 deletions _solved/13-raster-processing.ipynb

Large diffs are not rendered by default.

12,358 changes: 0 additions & 12,358 deletions _solved/13-xarray.ipynb

This file was deleted.

611 changes: 611 additions & 0 deletions _solved/14-combine-data.ipynb

Large diffs are not rendered by default.

11,870 changes: 0 additions & 11,870 deletions _solved/14-xarray-intro.ipynb

This file was deleted.

234 changes: 234 additions & 0 deletions _solved/15-xarray-dask-big-data.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,234 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p><font size=\"6\"><b>Working with big data: xarray and dask (DEMO)</b></font></p>\n",
"\n",
"\n",
"> *DS Python for GIS and Geoscience* \n",
"> *November, 2022*\n",
">\n",
"> *© 2020, Joris Van den Bossche and Stijn Van Hoey. Licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/)*\n",
"\n",
"---\n",
"\n",
"Throughout the course, we worked with small, often simplified or subsampled data. In practice, the tools we have seen still work well with data that fit easily in memory. But also for data larger than memory (e.g. large or high resolution climate data), we can still use many of the familiar tools.\n",
"\n",
"This notebooks includes a brief showcase of using xarray with dask, a package to scale Python workflows (https://dask.org/). Dask integrates very well with xarray, providing a familiar xarray workflow for working with large datasets in parallel or on clusters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client, LocalCluster\n",
"client = Client(LocalCluster(processes=False)) # set up local cluster on your laptop\n",
"client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The *Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST)* dataset (https://registry.opendata.aws/mur/) provides freely available, global, gap-free, gridded, daily, 1 km data on sea surface temperate for the last 20 years. I downloaded a tiny part this dataset (8 days of 2020) to my local laptop, and stored a subset of the variables (only the \"sst\" itself) in the zarr format (https://zarr.readthedocs.io/en/stable/), so we can efficiently read it with xarray and dask:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import xarray as xr"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = xr.open_zarr(\"data/mur_sst_zarr/\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the actual sea surface temperature DataArray:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.analysed_sst"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The representation already indicated that this DataArray (although being a tiny part of the actual full dataset) is quite big: 20.7 GB if loaded fully into memory at once (which would not fit in the memory of my laptop).\n",
"\n",
"The xarray.DataArray is now backed by a dask array instead of a numpy array. This allows us to do computations on the large data in *chunked* way."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, let's compute the overall average temperature for the full globe for each timestep:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"overall_mean = ds.analysed_sst.mean(dim=(\"lon\", \"lat\"))\n",
"overall_mean"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This returned a lazy object, and not yet computed the actual average. Let's explicitly compute it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time \n",
"overall_mean.compute()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This takes some time, but it *did* run on my laptop even while the dataset did not fit in the memory of my laptop."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Integrating with hvplot and datashader, we can also still interactively plot and explore the large dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import hvplot.xarray"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.analysed_sst.isel(time=-1).hvplot.quadmesh(\n",
" 'lon', 'lat', rasterize=True, dynamic=True,\n",
" width=800, height=450, cmap='RdBu_r')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Zooming in on this figure we re-read and rasterize the subset we are viewing to provide a higher resolution image."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**As a summary**, using dask with xarray allows:\n",
"\n",
"- to use the familiar xarray workflows for larger data as well\n",
"- use the same code to work on our laptop or on a big cluster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"# PANGEO: A community platform for Big Data geoscience\n",
"\n",
"\n",
"<center><img src=\"https://pangeo.io/_images/pangeo_simple_logo.svg\" width=\"500px\"></center>\n",
"\n",
"Website: https://pangeo.io/index.html\n",
"\n",
"They have a gallery with many interesting examples, many of them using this combination of xarray and dask.\n",
"\n",
"Pangeo focuses primarily on *cloud computing* (storing the big datasets in cloud-native file formats and also doing the computations in the cloud), but all the tools like xarray and dask developed by this community and shown in the examples also work on your laptop or university's cluster.\n",
"\n",
"\n",
"<img src=\"https://pangeo.io/_images/pangeo_tech_1.png\" width=\"800px\">"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Loading