# Downloading WeatherBench2 data
---

``python version = 3.11``

This notebook explores the WeatherBench2 datasets.

In [6]:
# !pip install apache-beam
# !pip install git+https://github.com/google-research/weatherbench2.git
# !pip install gcsfs
# !pip install netCDF4 

In [7]:
import apache_beam  
import weatherbench2
import numpy as np
import xarray as xr

## ERA5, resolution: 1440x721, levels: 13, time step: 6 h
---

> **Naming convention:** ``1959-2023_01_10-wb13-6h-1440x721_with_derived_variables``
> * start date: ``1959``
> * end date: ``2023_01_10``
> * levels: ``wb13``
> * time step: ``6h``
> * spatial resolution: ``1440x721``
> * extra info: ``with_derived_variables``
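
The naming convention above can be unpacked programmatically. The following is a minimal sketch using only the standard library. The regular expression and the helper `parse_dataset_name` are hypothetical (they are not part of WeatherBench2) and were written only against the dataset names used in this notebook, so other names may not match.

```python
import re

# Hypothetical pattern, written only against the names used in this notebook.
NAME_PATTERN = re.compile(
    r"^(?P<start>\d{4})-(?P<end>\d{4}(?:_\d{2}_\d{2})?)"  # start and end dates
    r"(?:-(?P<levels>wb\d+))?"                            # optional level set, e.g. wb13
    r"-(?P<step>\d+h)"                                    # time step, e.g. 6h
    r"-(?P<nlon>\d+)x(?P<nlat>\d+)"                       # grid size, longitude x latitude
    r"(?:_(?P<extra>.+))?$"                               # optional free-form suffix
)


def parse_dataset_name(name):
    """Split a dataset name into its parts (illustrative helper, not a library API)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized dataset name: {name}")
    info = match.groupdict()
    info["nlon"] = int(info["nlon"])
    info["nlat"] = int(info["nlat"])
    # Longitude spacing in degrees follows from the number of longitude points.
    info["lon_spacing_deg"] = 360 / info["nlon"]
    return info


info = parse_dataset_name("1959-2023_01_10-wb13-6h-1440x721_with_derived_variables")
print(info)
```

For the name above, the longitude spacing works out to 360 / 1440 = 0.25 degrees.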

In [11]:
data_UHR = xr.open_zarr('gs://weatherbench2/datasets/era5/1959-2023_01_10-wb13-6h-1440x721_with_derived_variables.zarr')
data_UHR

In [28]:
# Size in GB
data_UHR.nbytes / 1000**3

88963.00301054

In [19]:
data_UHR.to_netcdf(r"C:\Users\gcuervo\Documents\Doctorado\DB\1959-2023_01_10-wb13-6h-1440x721_with_derived_variables.nc")

MemoryError: Unable to allocate 362. GiB for an array with shape (93544, 721, 1440) and data type float32

The dataset takes up about ``88,963 GB`` (roughly 89 TB) in memory, so it cannot be saved locally.
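
The allocation reported in the `MemoryError` can be reproduced from the array shape and data type alone. This is a small arithmetic sketch; the helper `array_nbytes` is hypothetical, and the 4-byte item size is the size of a `float32` value.

```python
from math import prod


def array_nbytes(shape, itemsize=4):
    """Bytes needed for a dense array; itemsize=4 corresponds to float32."""
    return prod(shape) * itemsize


# Shape taken from the MemoryError above: one single-level variable.
single_level = array_nbytes((93544, 721, 1440))

print(f"{single_level / 1024**3:.1f} GiB")  # binary units, as in the error message
print(f"{single_level / 1000**3:.1f} GB")   # decimal units, as in nbytes / 1000**3
```

A single variable without a level dimension already needs about 362 GiB, which is why `to_netcdf` fails before anything is written.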

### Variable selection
---
Because the dataset is too large, we try selecting only a few variables to see whether that subset can be downloaded locally.

In [34]:
data_sel_vars = data_UHR[['geopotential', 'temperature', 'specific_humidity', 'u_component_of_wind', 'v_component_of_wind']]
data_sel_vars

In [36]:
# Size in GB
data_sel_vars.nbytes / 1000**3

25251.4926227

In [37]:
data_sel_vars.to_netcdf(r"C:\Users\gcuervo\Documents\Doctorado\DB\1959-2023_01_10-wb13-6h-1440x721_with_geop_temp_spehum_uvwind.nc")


MemoryError: Unable to allocate 4.59 TiB for an array with shape (93544, 13, 721, 1440) and data type float32

Five variables were selected from the full dataset: ``'geopotential', 'temperature', 'specific_humidity', 'u_component_of_wind', 'v_component_of_wind'``.
Their combined size is about ``25,251.5 GB`` (roughly 25 TB), which is still too large to save locally.
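
To get a feel for how much of this selection could fit on a local disk, the size of one time step can be computed from the shape in the error message. This is a rough sketch, not a download recipe: the helpers are hypothetical, the 100 GB budget is an arbitrary example value, and coordinate arrays and file compression are ignored.

```python
from math import prod

BYTES_PER_VALUE = 4       # float32, as reported in the MemoryError
N_VARIABLES = 5           # the five selected variables
GRID = (13, 721, 1440)    # per-time-step shape, taken from the error message


def bytes_per_time_step(n_variables=N_VARIABLES, grid=GRID):
    """Uncompressed bytes for one time step of all selected variables."""
    return n_variables * prod(grid) * BYTES_PER_VALUE


def max_time_steps(budget_gb):
    """Whole time steps that fit in a budget given in decimal GB."""
    return budget_gb * 1000**3 // bytes_per_time_step()


steps = max_time_steps(100)  # 100 GB is an assumed example budget
print(f"{bytes_per_time_step() / 1000**3:.2f} GB per time step")
print(f"{steps} time steps, about {steps * 6 / 24:.1f} days at 6 h")
```

Multiplying the per-step size by the 93544 time steps recovers the roughly 25 TB reported above.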

## ERA5, resolution: 512x256, levels: 13, time step: 6 h
---

In [38]:
data_HR = xr.open_zarr('gs://weatherbench2/datasets/era5/1959-2022-6h-512x256_equiangular_conservative.zarr')
data_HR

In [39]:
# Size in GB
data_HR.nbytes / 1000**3

5211.825067208

## ERA5, resolution: 240x121, levels: 13, time step: 6 h
---

In [54]:
data_MR = xr.open_zarr('gs://weatherbench2/datasets/era5/1959-2023_01_10-6h-240x121_equiangular_with_poles_conservative.zarr')
data_MR

In [55]:
# Size in GB
data_MR.nbytes / 1000**3

2488.332529584

## ERA5, resolution: 128x64, levels: 13, time step: 6 h
---

In [56]:
data_LR = xr.open_zarr('gs://weatherbench2/datasets/era5/1959-2022-6h-128x64_equiangular_conservative.zarr')
data_LR

In [57]:
# Size in GB
data_LR.nbytes / 1000**3

325.73975828

## ERA5, resolution: 64x32, levels: 13, time step: 6 h
---

In [58]:
data_ULR = xr.open_zarr('gs://weatherbench2/datasets/era5/1959-2023_01_10-6h-64x32_equiangular_conservative.zarr')
data_ULR

In [59]:
# Size in GB
data_ULR.nbytes / 1000**3

175.486406312
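
For a side-by-side view, the sizes reported by `nbytes` in the cells above can be collected in one place. This is only a convenience sketch: the dictionary is typed in by hand from the outputs in this notebook, and the resolution labels are used as keys for illustration.

```python
# Sizes in GB, copied from the nbytes outputs above.
reported_gb = {
    "1440x721": 88963.00301054,
    "512x256": 5211.825067208,
    "240x121": 2488.332529584,
    "128x64": 325.73975828,
    "64x32": 175.486406312,
}

for resolution, size_gb in reported_gb.items():
    print(f"{resolution:>9}: {size_gb / 1000:8.2f} TB")

smallest = min(reported_gb, key=reported_gb.get)
print(f"Smallest dataset: {smallest} ({reported_gb[smallest]:.1f} GB)")
```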