# The Impact of Temperature and Precipitation on Annual Crop Yield in North America #

This script shows the data analysis for the research paper: "The Impact of Temperature and Precipitation on Annual Crop Yield in North America". 

Paper and analysis by: 

Maxim Mahnkopf, Dominic Schierbaum and Nick van Nuland

# Part 1 (Python)

Import Packages


In [2]:
import xarray as xr
import os

Set Path

In [3]:
# os.chdir(r"your/path")

## Climate Variables: Access and Calculating Seasonal Mean

Access the Data Cube

In [4]:
zarr_path = "http://data.rsc4earth.de/EarthSystemDataCube/v2.1.0/esdc-8d-0.083deg-1x2160x4320-2.1.0.zarr"
data_cube = xr.open_zarr(zarr_path)

Choose Climate Variables, Region and Time Period

In [5]:
climate_variable = ["air_temperature_2m", "precipitation"]
subset = data_cube[climate_variable].sel(
    lon=slice(-169.05, -53.0),
    lat=slice(71.0, 25.0),
    time=slice('1982', '2016'),
)
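
The latitude slice runs from north to south (`slice(71.0, 25.0)`), which suggests that the cube stores latitude in descending order. The following is a minimal sketch on a synthetic grid (the `toy` array and its coordinates are made up for illustration, not taken from the data cube) showing how label-based slicing behaves under that assumption.

```python
import numpy as np
import xarray as xr

# Hypothetical coarse grid: latitude descending, longitude ascending.
lat = np.arange(80.0, 10.0, -10.0)   # 80, 70, ..., 20
lon = np.arange(-180.0, 0.0, 30.0)   # -180, -150, ..., -30
toy = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# Slice bounds must follow the order of the coordinate values.
box = toy.sel(lat=slice(71.0, 25.0), lon=slice(-169.05, -53.0))

# Reversing the latitude bounds on a descending coordinate selects nothing.
flipped = toy.sel(lat=slice(25.0, 71.0), lon=slice(-169.05, -53.0))

print(box.sizes, flipped.sizes)
```

If a selection comes back empty, checking the order of `data_cube['lat']` is a reasonable first step.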

Calculate the Seasonal Mean Temperature and Seasonal Precipitation Sum Over the Growing Period

In [6]:
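# Keep only the growing-season months (April to September)
subset = subset.sel(time=subset['time.month'].isin([4, 5, 6, 7, 8, 9]))
# Per year and grid cell: mean temperature and summed precipitation
seasonal_mean = subset['air_temperature_2m'].groupby('time.year').mean(dim='time')
seasonal_precipitation = subset['precipitation'].groupby('time.year').sum(dim='time')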

seasonal = xr.Dataset({
  'temperature': (['year', 'lat', 'lon'], seasonal_mean.data),
  'precipitation': (['year', 'lat', 'lon'], seasonal_precipitation.data)
}, coords={
  'year': seasonal_mean['year'],
  'lat': seasonal_mean['lat'],
  'lon': seasonal_mean['lon']
})
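
To make the aggregation easier to follow, here is a small sketch of the same month filter and yearly grouping on synthetic data. The `toy` dataset, its constant values, and the plain 8-day time axis are assumptions for illustration only; the real cube is far larger and its time axis may be laid out differently.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the cube: two years of 8-daily values on a 2x2 grid.
time = pd.date_range("2000-01-01", "2001-12-31", freq="8D")
shape = (time.size, 2, 2)
toy = xr.Dataset(
    {
        "air_temperature_2m": (("time", "lat", "lon"), np.full(shape, 10.0)),
        "precipitation": (("time", "lat", "lon"), np.ones(shape)),
    },
    coords={"time": time, "lat": [50.0, 40.0], "lon": [-100.0, -90.0]},
)

# Keep only time steps that fall in April to September.
growing = toy.sel(time=toy["time.month"].isin([4, 5, 6, 7, 8, 9]))

# Reduce the remaining time steps to one value per year and grid cell.
temp_mean = growing["air_temperature_2m"].groupby("time.year").mean(dim="time")
precip_sum = growing["precipitation"].groupby("time.year").sum(dim="time")

print(temp_mean.dims, temp_mean["year"].values)
```

Because the toy precipitation is 1 at every time step, `precip_sum` simply counts the time steps that fall inside the growing season of each year, which is a quick way to check the filter.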

In [7]:
# Export the seasonal data (this step can take up to 15 minutes)
seasonal.to_netcdf("data/seasonal.nc4")

## Crop Yield Data (GDHY): Stacking

Stack All nc4 Files:
Make sure you have downloaded the crop yield data from the Git repository: https://github.com/mxkopf/Crop_Yield_Climate_Var

The data set in the GitHub repository was originally downloaded from: https://doi.pangaea.de/10.1594/PANGAEA.909132

Spring Wheat

In [8]:
directory = 'data/wheat'
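# Keep only the .nc4 files, then sort them by the number parsed from the file name
sorted_files = sorted(
    (f for f in os.listdir(directory) if f.endswith(".nc4")),
    key=lambda x: int(x.split('_')[1].split('.')[0]),
)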
wheat = xr.Dataset()

for filename in sorted_files:
    if filename.endswith(".nc4"):
        filepath = os.path.join(directory, filename)
        ds = xr.open_dataset(filepath, engine='netcdf4')
        wheat = xr.concat([wheat, ds], dim='time')

# Define 'time' as a coordinate variable
wheat = wheat.assign_coords(time=wheat.time)

# Export the stacked dataset to a new NetCDF4 file
output_file = "data/wheat_stacked.nc4"
wheat.to_netcdf(output_file, format='netCDF4')

print(f"Stacked dataset saved to {output_file}")


Stacked dataset saved to data/wheat_stacked.nc4
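
The file ordering depends on a number parsed from each file name: the text between the first underscore and the following dot. The sketch below isolates that step using only the standard library. The file names in `listing` (for example `yield_1981.nc4`) are hypothetical and only assume a `<prefix>_<year>.nc4` pattern; the helper names are also illustrative.

```python
def year_from_filename(filename):
    """Return the integer found between the first underscore and the next dot."""
    return int(filename.split("_")[1].split(".")[0])


def sorted_nc4_files(filenames):
    """Drop non-.nc4 entries first, then sort the rest by the parsed number."""
    nc4_files = [name for name in filenames if name.endswith(".nc4")]
    return sorted(nc4_files, key=year_from_filename)


# Hypothetical directory listing, including a stray file without an underscore.
listing = ["yield_1983.nc4", ".DS_Store", "yield_1981.nc4", "yield_1982.nc4"]
ordered = sorted_nc4_files(listing)
print(ordered)
```

Filtering before sorting matters because the key would raise an error on a name such as `.DS_Store`, which has no underscore to split on.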


Maize

In [9]:
directory = 'data/maize'
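# Keep only the .nc4 files, then sort them by the number parsed from the file name
sorted_files = sorted(
    (f for f in os.listdir(directory) if f.endswith(".nc4")),
    key=lambda x: int(x.split('_')[1].split('.')[0]),
)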
maize = xr.Dataset()

for filename in sorted_files:
    if filename.endswith(".nc4"):
        filepath = os.path.join(directory, filename)
        ds = xr.open_dataset(filepath, engine='netcdf4')
        maize = xr.concat([maize, ds], dim='time')

# Define 'time' as a coordinate variable
maize = maize.assign_coords(time=maize.time)

# Export the stacked dataset to a new NetCDF4 file
output_file = "data/maize_stacked.nc4"
maize.to_netcdf(output_file, format='netCDF4')

print(f"Stacked dataset saved to {output_file}")


Stacked dataset saved to data/maize_stacked.nc4


Soybean

In [10]:
directory = 'data/soybean'
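# Keep only the .nc4 files, then sort them by the number parsed from the file name
sorted_files = sorted(
    (f for f in os.listdir(directory) if f.endswith(".nc4")),
    key=lambda x: int(x.split('_')[1].split('.')[0]),
)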
soybean = xr.Dataset()

for filename in sorted_files:
    if filename.endswith(".nc4"):
        filepath = os.path.join(directory, filename)
        ds = xr.open_dataset(filepath, engine='netcdf4')
        soybean = xr.concat([soybean, ds], dim='time')

# Define 'time' as a coordinate variable
soybean = soybean.assign_coords(time=soybean.time)

# Export the stacked dataset to a new NetCDF4 file
output_file = "data/soybean_stacked.nc4"
soybean.to_netcdf(output_file, format='netCDF4')

print(f"Stacked dataset saved to {output_file}")


Stacked dataset saved to data/soybean_stacked.nc4
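
The three stacking cells differ only in the crop directory, so the shared step could be written once. The sketch below is one possible variant, not the method used above: it collects the per-year datasets in a list, concatenates them in a single call, and labels `time` with the years. The helper name `stack_years`, the variable name `var`, and the toy grids are assumptions for illustration. Labelling `time` with years is a deliberate difference from the cells above, which keep a positional index, so downstream code that relies on positions would need adjusting.

```python
import numpy as np
import xarray as xr


def stack_years(datasets, years):
    """Concatenate per-year datasets along a new 'time' dimension labelled by year."""
    stacked = xr.concat(datasets, dim="time")
    return stacked.assign_coords(time=list(years))


# Toy per-year grids standing in for the opened yield files (values are made up).
years = [1981, 1982, 1983]
yearly = [
    xr.Dataset(
        {"var": (("lat", "lon"), np.full((2, 3), float(year)))},
        coords={"lat": [50.0, 40.0], "lon": [-100.0, -95.0, -90.0]},
    )
    for year in years
]

stacked = stack_years(yearly, years)
print(stacked["var"].sel(time=1982).values)
```

A single `xr.concat` call over the whole list also avoids re-copying the growing dataset on every loop iteration.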
