<a href="https://colab.research.google.com/github/kode-git/Copernicus-river-discharges/blob/main/Initial_Exploratory_Spatial_Data_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Initial Exploratory Spatial Data Analysis

## Library Dependencies

In [3]:
!pip install xarray 
!pip install netCDF4 dask bottleneck
!pip install pandas
!pip install geopandas
!pip install cdsapi



In [49]:
# collections libraries
import netCDF4 as nc4
from netCDF4 import Dataset
import xarray as xr
import numpy as np

# file management
from glob import glob

# utilities 
from datetime import datetime as dt

# colab
from google.colab import drive

In [5]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Initial Exploration of Spatial River Discharge Data (RDH)

In [6]:
!chmod 777 /content/drive/MyDrive/Datasets/cs3-copernicus-datasets/cps3-copernicus-dataset-tpi-rdh/rdh/rdh_2022_2021.nc

In [76]:
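# Open the file read-only. `format` is passed by keyword because the third
# positional argument of Dataset is `clobber`, not `format`.
nc = Dataset("/content/drive/MyDrive/Datasets/cs3-copernicus-datasets/cps3-copernicus-dataset-tpi-rdh/rdh/rdh_2022_2021.nc", "r", format="NETCDF4")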
print(nc)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    GRIB_edition: 2
    GRIB_centre: ecmf
    GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre: 0
    Conventions: CF-1.7
    institution: European Centre for Medium-Range Weather Forecasts
    history: 2022-05-19T10:09 GRIB to CDM+CF via cfgrib-0.9.9.1/ecCodes-2.26.0 with {"source": "/cache/tmp/f2dca5f7-24e2-412d-b6b4-35ab2f6adf02-adaptor.mars.external-1652954953.8218787-8213-15-tmp.grib", "filter_by_keys": {}, "encode_cf": ["parameter", "time", "geography", "vertical"]}
    dimensions(sizes): y(950), x(1000), time(498)
    variables(dimensions): float64 y(y), float64 x(x), int64 time(time), float64 step(), float64 surface(), float32 latitude(y, x), float32 longitude(y, x), float64 valid_time(time), float32 dis06(time, y, x), int32 lambert_azimuthal_equal_area(), int8 land_binary_mask(y, x), float32 upArea(y, x)
    groups: 


### Data Information

In [70]:
print(nc.variables.keys())

dict_keys(['y', 'x', 'time', 'step', 'surface', 'latitude', 'longitude', 'valid_time', 'dis06', 'lambert_azimuthal_equal_area', 'land_binary_mask', 'upArea'])


Each key maps to a variable object, which we can retrieve and inspect individually.

In [71]:
dis = nc.variables['dis06']
print(dis)

<class 'netCDF4._netCDF4.Variable'>
float32 dis06(time, y, x)
    _FillValue: nan
    GRIB_paramId: 240023
    GRIB_dataType: sfo
    GRIB_numberOfPoints: 950000
    GRIB_typeOfLevel: surface
    GRIB_stepUnits: 1
    GRIB_stepType: avg
    GRIB_gridType: lambert_azimuthal_equal_area
    GRIB_NV: 0
    GRIB_cfName: unknown
    GRIB_cfVarName: dis06
    GRIB_gridDefinitionDescription: Lambert azimuthal equal area projection
    GRIB_missingValue: 9999
    GRIB_name: Mean discharge in the last 6 hours
    GRIB_shortName: dis06
    GRIB_units: m**3 s**-1
    long_name: Mean discharge in the last 6 hours
    units: m**3 s**-1
    standard_name: unknown
    grid_mapping: lambert_azimuthal_equal_area
    coordinates: time step surface latitude longitude valid_time
unlimited dimensions: 
current shape = (498, 950, 1000)
filling on


The discharge variable `dis06` is a `float32` array with three dimensions, ordered `(time, y, x)`. Its metadata also lists the coordinates, the long name ("Mean discharge in the last 6 hours"), and the units (`m**3 s**-1`). To check the size of each dimension, we can list the dimensions of the dataset.

In [74]:
for d in nc.dimensions.items():
  print(d)

('y', <class 'netCDF4._netCDF4.Dimension'>: name = 'y', size = 950)
('x', <class 'netCDF4._netCDF4.Dimension'>: name = 'x', size = 1000)
('time', <class 'netCDF4._netCDF4.Dimension'>: name = 'time', size = 498)


In [77]:
dis.dimensions

('time', 'y', 'x')

Printing the dimensions of the discharge variable shows that they are ordered `(time, y, x)`; its shape gives the size along each axis.

In [78]:
dis.shape

(498, 950, 1000)
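
The shape also tells us how much memory the full variable would need if we read it all at once. The following is a minimal sketch that uses only the shape and dtype printed above; it does not open the file, and the variable names are illustrative.

```python
import numpy as np

# Shape and dtype copied from the printed metadata of dis06 (not read from the file).
shape = (498, 950, 1000)          # (time, y, x)
itemsize = np.dtype("float32").itemsize

n_values = int(np.prod(shape, dtype=np.int64))
n_bytes = n_values * itemsize
print("values:", n_values)
print("size in GiB: {:.2f}".format(n_bytes / 2**30))

# A single time step is much smaller than the whole variable.
one_step_bytes = shape[1] * shape[2] * itemsize
print("one time step in MiB: {:.2f}".format(one_step_bytes / 2**20))
```

Since the full array takes close to 2 GiB, it is usually more practical to slice the variable, for example `dis[0]` for the first time step, than to load everything with `dis[:]`.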

Similarly, we can also inspect the variables associated with each dimension:

In [80]:
time = nc.variables['time']
x,y = nc.variables['x'], nc.variables['y']
print("Time variables :->", time)
print("X coordinate :->", x)
print("Y coordinate :->", y)

Time variables :-> <class 'netCDF4._netCDF4.Variable'>
int64 time(time)
    long_name: initial time of forecast
    standard_name: forecast_reference_time
    units: seconds since 1970-01-01
    calendar: proleptic_gregorian
unlimited dimensions: 
current shape = (498,)
filling on, default _FillValue of -9223372036854775806 used
X coordinate :-> <class 'netCDF4._netCDF4.Variable'>
float64 x(x)
    _FillValue: nan
    units: Meter
    long_name: x coordinate of projection
    standard_name: projection_x_coordinate
unlimited dimensions: 
current shape = (1000,)
filling on
Y coordinate :-> <class 'netCDF4._netCDF4.Variable'>
float64 y(y)
    _FillValue: nan
    units: Meter
    long_name: Y coordinate of projection
    standard_name: projection_Y_coordinate
unlimited dimensions: 
current shape = (950,)
filling on


Here we obtain information about each of the three dimensions. `time` is the initial time of each discharge forecast, stored as seconds since 1970-01-01, while `x` and `y` are the projection coordinates in meters. Each of these coordinate variables is one-dimensional, so we can read it directly into a NumPy array by slicing:

In [82]:
tm = time[:]
print(tm[0:20])

[1609502400 1609588800 1609675200 1609761600 1609848000 1609934400
 1610020800 1610107200 1610193600 1610280000 1610366400 1610452800
 1610539200 1610625600 1610712000 1610798400 1610884800 1610971200
 1611057600 1611144000]
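
The raw values are hard to read as integers. Since the units are "seconds since 1970-01-01", one possible way to make them readable is to cast them to NumPy `datetime64[s]`. The sketch below uses the first three values copied from the output above rather than the file itself; `epoch_seconds` is an illustrative name.

```python
import numpy as np

# First three values copied from the printed `time` output (not read from the file).
epoch_seconds = np.array([1609502400, 1609588800, 1609675200], dtype="int64")

# Interpret the integers as seconds since the Unix epoch.
timestamps = epoch_seconds.astype("datetime64[s]")
print(timestamps)

# Spacing between consecutive forecasts, in hours.
step_hours = np.diff(epoch_seconds) / 3600
print(step_hours)
```

For these values the spacing is 24 hours, which suggests one forecast per day, starting at 12:00 on 2021-01-01.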


The same slicing works for the other two coordinate variables:

In [83]:
X = x[:]
Y = y[:]
print(x[0:10])
print(y[0:10])
print('Shape x : {}; shape y : {}'.format(X.shape, Y.shape))

[2502500. 2507500. 2512500. 2517500. 2522500. 2527500. 2532500. 2537500.
 2542500. 2547500.]
[5497500. 5492500. 5487500. 5482500. 5477500. 5472500. 5467500. 5462500.
 5457500. 5452500.]
Shape x : (1000,); shape y : (950,)
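
From these values we can also estimate the grid resolution. The sketch below uses the first few printed values only, so the result is an assumption about the whole grid until it is checked against the full `X` and `Y` arrays.

```python
import numpy as np

# First values copied from the printed output above (not read from the file).
x_head = np.array([2502500., 2507500., 2512500., 2517500., 2522500.])
y_head = np.array([5497500., 5492500., 5487500., 5482500., 5477500.])

# Differences between neighbouring coordinates give the cell size in meters.
dx = np.diff(x_head)
dy = np.diff(y_head)
print("x spacing (m):", np.unique(dx))
print("y spacing (m):", np.unique(dy))

# A negative y spacing means the first row of the grid is the northernmost one.
y_is_descending = bool(np.all(dy < 0))
print("y descending:", y_is_descending)
```

In this sample the cells are 5000 m wide, and `y` decreases with the row index, which matters when plotting or when indexing rows by position.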


`x` and `y` are not geographic coordinates: they are projected coordinates in meters on the Lambert azimuthal equal-area grid. As we saw when printing the dataset structure, the file also stores the geographic coordinates of each grid cell in the two-dimensional `latitude` and `longitude` variables.

In [92]:
lat, lon = nc.variables['latitude'], nc.variables['longitude']
print("Latitude :-> {}".format(lat))
print("Longitude :-> {}".format(lon))
print('Latitude values:')
print(lat[:])
print('Longitude values:')
print(lon[:])

Latitude :-> <class 'netCDF4._netCDF4.Variable'>
float32 latitude(y, x)
    _FillValue: nan
    grid_mapping: lambert_azimuthal_equal_area
    long_name: latitude
    standard_name: latitude
    units: degrees_north
    esri_pe_string: PROJCS["ETRS_1989_LAEA",GEOGCS["GCS_ETRS_1989",DATUM["D_ETRS_1989",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Azimuthal_Equal_Area"],PARAMETER["false_easting",4321000.0],PARAMETER["false_northing",3210000.0],PARAMETER["central_meridian",10.0],PARAMETER["latitude_of_origin",52.0],UNIT["Meter",1.0]]
unlimited dimensions: 
current shape = (950, 1000)
filling on
Longitude :-> <class 'netCDF4._netCDF4.Variable'>
float32 longitude(y, x)
    _FillValue: nan
    grid_mapping: lambert_azimuthal_equal_area
    long_name: longitude
    standard_name: longitude
    units: degrees_east
    esri_pe_string: PROJCS["ETRS_1989_LAEA",GEOGCS["GCS_ETRS_1989",DATUM["D_ETRS_1989",SPHEROID["GRS_1
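
Because `latitude` and `longitude` are stored per grid cell as 2D arrays of shape `(y, x)`, finding the cell for a given location needs a search over both arrays. The following is a minimal sketch on a small hypothetical grid; the arrays, the `nearest_cell` helper, and the query point are illustrative and not taken from the dataset. It uses squared distance in degrees, which is only a rough approximation of true distance.

```python
import numpy as np

# Hypothetical 3x4 grid standing in for the real (950, 1000) arrays.
lat2d = np.array([[46.0, 46.0, 46.0, 46.0],
                  [45.0, 45.0, 45.0, 45.0],
                  [44.0, 44.0, 44.0, 44.0]])
lon2d = np.array([[10.0, 11.0, 12.0, 13.0],
                  [10.1, 11.1, 12.1, 13.1],
                  [10.2, 11.2, 12.2, 13.2]])


def nearest_cell(lat2d, lon2d, lat, lon):
    """Return (row, col) of the cell closest to (lat, lon) in degree space."""
    dist2 = (lat2d - lat) ** 2 + (lon2d - lon) ** 2
    # argmin works on the flattened array; unravel_index maps it back to 2D.
    return np.unravel_index(np.argmin(dist2), dist2.shape)


iy, ix = nearest_cell(lat2d, lon2d, 44.9, 12.0)
print("row (y index):", iy, "column (x index):", ix)
```

The resulting indices can then be used on the discharge variable in `(time, y, x)` order, for example `dis[:, iy, ix]` for the time series at that cell. If the coordinate arrays contain NaN fill values, `np.nanargmin` would be needed instead of `np.argmin`.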

## Initial Exploration of Spatial Temperature and Precipitation Data (TPI)