Predicting ambient temperature in the greater Mexico City region using satellite land use regression


In this project, "A spatiotemporal reconstruction of daily ambient temperature using satellite data in the Megalopolis of Central Mexico from 2003–2018", we built a model to predict the mean, maximum, and minimum temperature on each day at each square in a 1-km grid for an area around Mexico City.

Raw and processed data, including predictions, as well as a research notebook, can be found on Zenodo at

The bulk of the data-processing and data-analysis code can be found on GitHub at . Other code is embedded inline in the manuscript and notebook.


Getting the temperature predictions

To examine or use our temperature predictions without running any of our code, take a look at the HDF5 files predictions_*.h5. There's one file per year. Each file has a three-dimensional array data with an attribute dimensions naming the dimensions (location, time, and variable) and a group dimension_labels naming each index of each dimension. The temperatures are in degrees Celsius, and the dates indicate 24-hour spans of UTC-06:00. mrow values are row indices of the master grid, which can be found in master_grid.h5. The original coordinates of the grid (in which it is, in fact, a regular grid) are x_sinu and y_sinu, which are in the coordinate reference system crs.satellite, defined in common.R as "+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs", which is the MODIS global sinusoidal projection.
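If you want longitude and latitude rather than x_sinu and y_sinu, the sinusoidal projection on a sphere is simple enough to invert by hand. The following is a minimal sketch that assumes only the parameters in the proj string above (a sphere of radius 6371007.181 m, central meridian 0, no false easting or northing). The function names sinu.to.lonlat and lonlat.to.sinu are made up for this illustration and are not part of common.R.

```r
# Radius of the sphere in the MODIS sinusoidal projection, in meters
# (the a and b parameters of the proj string).
sinu.radius = 6371007.181

# Hypothetical helper: sinusoidal x and y (meters) to degrees.
sinu.to.lonlat = function(x, y, r = sinu.radius) {
    lat = y / r                  # latitude in radians
    lon = x / (r * cos(lat))     # longitude in radians
    data.frame(lon = lon * 180 / pi, lat = lat * 180 / pi)
}

# Hypothetical helper: the forward projection, for checking round trips.
lonlat.to.sinu = function(lon, lat, r = sinu.radius) {
    lon = lon * pi / 180
    lat = lat * pi / 180
    data.frame(x_sinu = r * lon * cos(lat), y_sinu = r * lat)
}

# Round trip for a point near Mexico City (approximate coordinates).
p = lonlat.to.sinu(-99.1332, 19.4326)
sinu.to.lonlat(p$x_sinu, p$y_sinu)
```

For anything beyond a quick check, a spatial package that accepts the proj string directly is likely the safer choice.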

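Here's how you could plot the mean temperatures for 5 July 2012 in R:


library(hdf5r)
library(ggplot2)

# Read the whole (location, time, variable) array for 2012.
h5 = H5File$new("predictions_2012.h5", mode = "r")
preds = h5[["data"]][,,]
# Label each dimension using the names stored in the file.
dimnames(preds) = sapply(simplify = F,
    h5attr(h5[["data"]], "dimensions"),
    function(k) h5[["dimension_labels"]][[k]][])
# Keep the mean temperature at every location on 5 July 2012.
preds = preds[, "2012-07-05", "pred.ground.temp.mean"]

# Match the predictions to grid cells by master-grid row index.
h5 = H5File$new("master_grid.h5", mode = "r")
g = h5[["master_grid"]][]
g$pred = preds[as.character(g$mrow)]

# Plot only the grid cells that have a prediction.
ggplot(g[!is.na(g$pred),]) +
    geom_raster(aes(x_sinu, y_sinu, fill = pred))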

Getting the ground-station observations

We put a lot of effort into cleaning and unifying daily observations of air temperature (and wind speed, precipitation, and air pressure) at ground weather stations from several different sources. We provide all the original raw data on Zenodo, but if you'd like to use the processed observations without running any of the processing code, use the JSON file ground.json.gz, which has information about the observations as well as the stations they come from. To open this file in R, run jsonlite::fromJSON(gzfile("ground.json.gz")). Again, dates are in UTC-06:00.
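Because the dates are 24-hour spans at a fixed offset of UTC-06:00, a timestamp in UTC can fall on a different date than you might expect. Here is a small sketch of the conversion in base R. The helper name utc.to.local.date is hypothetical, and the example timestamps are made up.

```r
# Hypothetical helper: the UTC-06:00 date that a UTC timestamp falls on.
# In the tz database, "Etc/GMT+6" means UTC-06:00 (the sign is inverted),
# and it has no daylight-saving transitions.
utc.to.local.date = function(timestamp.utc)
    as.Date(as.POSIXct(timestamp.utc, tz = "UTC"), tz = "Etc/GMT+6")

# 03:00 UTC is 21:00 on the previous day at UTC-06:00.
utc.to.local.date("2012-07-05 03:00:00")
# 06:00 UTC is midnight at UTC-06:00, the start of the same date.
utc.to.local.date("2012-07-05 06:00:00")
```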

Running the code

To reproduce our results, you have the option of starting from scratch, that is, generating ground.json.gz from the raw station data, or of using the cleaned observations in ground.json.gz and modeling temperature from there. Either way, you'll need to:

  1. Install any libraries required by library(...) calls in the file you're using.
  2. Copy the non-package dependency Just_universal (available at ) into the Mexico_temperature repository (so the Just_universal directory should sit beside Mexico_temperature's code directory).
  3. Set the environment variable JUSTLAB_MEXICO_TEMPERATURE_DATA_ROOT to the directory containing the data from Zenodo.
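
Steps 1 and 3 can be done from within R. The sketch below is one possible way to do it, not part of the project's code: the data path is a placeholder, and the helpers required.packages and missing.packages are hypothetical names that scan a source file for simple library(...) calls with a regular expression.

```r
# Step 3: point the code at the Zenodo data. The path is a placeholder.
Sys.setenv(JUSTLAB_MEXICO_TEMPERATURE_DATA_ROOT = "/path/to/zenodo/data")
data.root = Sys.getenv("JUSTLAB_MEXICO_TEMPERATURE_DATA_ROOT")

# Step 1: hypothetical helper that lists the packages named in
# library(...) calls in an R source file.
required.packages = function(path) {
    lines = readLines(path, warn = FALSE)
    pattern = "library\\([[:space:]]*([A-Za-z0-9.]+)[[:space:]]*\\)"
    calls = unlist(regmatches(lines, gregexpr(pattern, lines)))
    unique(sub(pattern, "\\1", calls))
}

# Hypothetical helper: the subset of those packages not yet installed.
missing.packages = function(path)
    setdiff(required.packages(path), rownames(installed.packages()))
```

With the repository checked out, something like install.packages(missing.packages("modeling.R")) would then install whatever is absent, assuming each package is available from your configured repositories.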

To generate ground.json.gz, source stations.R. You'll need to download and uncompress the geography and stations data from Zenodo. You can then call save.ground().

To perform cross-validation and generate predictions, source modeling.R. You'll need geography (uncompressed) and ground.json.gz (left compressed) from Zenodo. You can then do cross-validation and summarize the results with a call like, "ground.temp.mean")) or get all the predictions for a year with a call like predict.temps(2012L, "pred.area").


Since R's readxl package has difficulty with the EMAs Excel files, I used LibreOffice to mass-convert them to CSV with the following command:

time ( ls | xargs --delimiter '\n' --max-args 250 --max-procs 10 soffice --headless --convert-to csv --outdir ../stations/smn-emas-csv )

This took about 1 hour 45 min on one of our servers.
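
If xargs isn't available, the batching that --max-args 250 performs can be approximated in R. This is only a sketch: batch.files is a hypothetical helper, the file names are made up, and the commented-out loop would run the batches one after another rather than 10 at a time as --max-procs 10 does.

```r
# Hypothetical helper: split file names into batches of at most `size`,
# mirroring what xargs --max-args does.
batch.files = function(files, size = 250L)
    unname(split(files, ceiling(seq_along(files) / size)))

# Made-up file names, purely for illustration.
files = sprintf("station_%03d.xls", 1:600)
batches = batch.files(files)
lengths(batches)   # number of files in each batch

# Each batch could then be handed to LibreOffice, for example:
# for (b in batches)
#     system2("soffice", c("--headless", "--convert-to", "csv",
#         "--outdir", "../stations/smn-emas-csv", shQuote(b)))
```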


We assert no copyright over the data. The code is copyright 2018, 2019 Kodi B. Arfer, Iván Gutiérrez-Avila, and Johnathan Rush.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
