This repository contains materials for the "Benchmarking R and Python for spatial data processing" workshop at the OpenGeoHub Summer School 2022.
You can find an extended version of the benchmarks in these repositories:
R and Python are the two most popular scripting languages used to process spatial data. Both are good alternatives to desktop GIS software and allow for reproducible research. In this workshop, we will examine the differences between the most popular packages for spatial data processing and test their performance.
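As a rough idea of what "testing performance" can look like on the Python side, here is a minimal timing sketch that uses only the standard library. The `benchmark` helper and `placeholder_workload` function are hypothetical names made up for this illustration and are not part of the workshop materials; in practice you would replace the placeholder with a real operation such as reading a raster or a vector layer.

```python
import statistics
import timeit


def benchmark(func, repeat=5):
    """Call func() `repeat` times and return the median and minimum time in seconds."""
    # number=1 means each measurement covers exactly one call to func
    times = timeit.repeat(func, number=1, repeat=repeat)
    return {"median": statistics.median(times), "min": min(times)}


def placeholder_workload():
    # Stand-in for a real spatial operation (hypothetical, for illustration only)
    return sum(i * i for i in range(100_000))


result = benchmark(placeholder_workload)
print(f"median: {result['median']:.4f} s, min: {result['min']:.4f} s")
```

Repeating the measurement and reporting the median makes the result less sensitive to one-off delays such as disk caching or background processes.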
If you are new to spatial data science, the following books are a good starting point:
- Spatial Data Science with applications in R
- Spatial Data Science with R and “terra”
- Geocomputation with R
- Introduction to Python for Geographic Data Analysis
- Geocomputation with Python
Hardware:
Your computer should have at least 8 GB of RAM. If you do not have access to such a machine, you can use the small raster file in the "data" folder for this workshop. Any operating system will do, but make sure all packages work properly.
Software:
- R: RStudio, sf, stars, terra, bench, microbenchmark
- Python: Jupyter Notebook, geopandas*, rasterio
*geopandas is much faster when pygeos is installed (reading and writing are also faster with pyogrio)
If possible, you should use the latest software versions.
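To check the Python side of this list before the workshop, a short script along these lines may help. It is only a sketch: the package names come from the list above, while the `package_status` helper and the split into required and optional packages are assumptions made for this example. It reports the installed version of each package without importing it.

```python
from importlib import metadata, util

# Names taken from the Python software list above.
# Treating pygeos and pyogrio as optional is an assumption of this sketch.
REQUIRED = ["geopandas", "rasterio"]
OPTIONAL = ["pygeos", "pyogrio"]


def package_status(names):
    """Return {name: version string, or None if the package cannot be found}."""
    status = {}
    for name in names:
        if util.find_spec(name) is None:
            status[name] = None  # package is not installed
            continue
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name
            status[name] = "unknown"
    return status


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, version in package_status(names).items():
            print(f"{label:8} {name:10} {version or 'NOT INSTALLED'}")
```

A listed version only shows that the package is installed; importing it and reading a small file is still the most reliable test that it works.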
To start Jupyter Notebook, type in the terminal (or Anaconda Prompt):
jupyter notebook
This will launch the Jupyter environment in your web browser.
If you have any questions or need help, please let me know on Mattermost or email me (krzysztof.dyba@amu.edu.pl).