### Author: Md Fahim Hasan
### Work Email: mdfahim.hasan@bayer.com

# Read me

The soil and elevation `Analytical Ready Data (ARD)` consists of the following datasets:
  1. Saturated water content (awct)
  2. Nitrogen
  3. Organic carbon content (soc)
  4. Available water capacity (wwp)
  5. Bulk density
  6. Cation exchange capacity (cec)
  7. Clay percentage
  8. Silt percentage
  9. Sand percentage
  10. Organic Matter (om)
  11. pH 
  12. Elevation
  13. Slope

Variables `1-11` come from the soil250 database (https://cswarehouse.io/datasets/location360/soil/soil250/) and have a native spatial resolution of 250m. To produce the 4km resolution ARD, the soil datasets have been gap-filled and resampled to 4km.

Variables `12-13` come from the SRTM DEM data, which has a spatial resolution of 30m (https://cswarehouse.io/datasets/location360/topography/srtm/). Slope (in degrees) has been calculated from the DEM data. Both the elevation and slope datasets have been resampled to 4km resolution to produce the ARD.
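
The source does not say which slope algorithm was used, so the following is only a minimal sketch of one common approach: finite differences on the DEM followed by an arctangent. The function name `slope_degrees` and the toy DEM are hypothetical, and the sketch assumes the cell size is in the same linear units as the elevation (a DEM in geographic coordinates would need its cell size converted to metres first).

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Slope in degrees from a 2D DEM with square cells (hypothetical helper)."""
    # np.gradient returns the derivative along axis 0 (rows, y) and then axis 1 (columns, x)
    dz_dy, dz_dx = np.gradient(dem.astype(float), cell_size)
    # Slope is the arctangent of the magnitude of the elevation gradient
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Toy DEM: a plane rising 30 m per 30 m cell along x, i.e. a rise/run of 1
dem = np.tile(np.arange(5) * 30.0, (5, 1))
slope = slope_degrees(dem, cell_size=30.0)
print(slope)
```

On the toy plane every cell has a rise over run of 1, which makes it easy to check the result by hand.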

## 4km ARD Processing Steps:
The following steps were followed to create the 4km ARD for soil and elevation data:
1. __Resampling the datasets to 4km:__ The soil250 datasets have a native spatial resolution of 250m, and the SRTM DEM elevation data (including the slope data derived from it) have a 30m spatial resolution. All datasets are resampled to 4km spatial resolution for the 4km ARD. The resampled datasets are saved to the `downscaled_data/4km/soil_elevation_data` directory.
2. __Generating ARD for Soil and Elevation Data:__ All 4km resampled soil, elevation, and slope datasets are compiled into the `4km ARD for Soil and Elevation` data using the `generate_soil_elevation_ARD()` function. The ARD is saved as a dataframe in '.parquet' format.


__Note:__
- The functions required for running the steps in this notebook can be found in the `ARD_utils.ipynb` notebook.
- __Reference Raster:__ A model-generated 4km resolution dataset (avg_relative_humidity) is used as the reference raster for resampling the soil and elevation datasets to 4km.
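
The implementation of `resample_soil_elevation_data_4km()` lives in `ARD_utils.ipynb` and is not shown here. As a rough illustration of what resampling to a coarser grid involves, the sketch below averages non-overlapping blocks of a fine grid while ignoring missing values. The helper `block_mean` is hypothetical, and it assumes the coarse grid is an exact integer multiple of the fine one (250m to 4km is a factor of 16); aligning to a reference raster, as the notebook does, would normally be handled by a raster warping library instead.

```python
import numpy as np

def block_mean(arr, factor):
    """Average non-overlapping factor x factor blocks, ignoring NaN (hypothetical helper)."""
    rows, cols = arr.shape
    # Trim trailing rows/columns so both dimensions divide evenly by the factor
    rows, cols = rows - rows % factor, cols - cols % factor
    trimmed = arr[:rows, :cols]
    # Split each axis into (number of blocks, cells per block) and average within blocks
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))

# Toy 32 x 32 "250m" grid with one missing cell, aggregated by a factor of 16
fine = np.arange(32 * 32, dtype=float).reshape(32, 32)
fine[0, 0] = np.nan
coarse = block_mean(fine, factor=16)
print(coarse.shape)
```

Using `np.nanmean` means a block with a few missing cells still produces a value, which is one simple way to tolerate gaps; a block that is entirely missing stays NaN.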

-------------------

In [1]:
from ipynb.fs.full.ARD_utils import *

## 1. Resampling datasets to 4km

In [4]:
###### source data ######
# soil datasets
derived_saturated_water_content = '../../datasets/soil250_raster_data/cities_california/soil250_data/average_awct/average_awct.tif'
average_nitrogen = '../../datasets/soil250_raster_data/cities_california/soil250_data/average_nitrogen/average_nitrogen.tif'
organic_carbon_content = '../../datasets/soil250_raster_data/cities_california/soil250_data/average_soc/average_soc.tif'
available_water_capacity = '../../datasets/soil250_raster_data/cities_california/soil250_data/average_wwp/average_wwp.tif'
bulk_density = '../../datasets/soil250_raster_data/cities_california/soil250_data/BulkDense/BulkDense.tif'
cation_exchange_capacity = '../../datasets/soil250_raster_data/cities_california/soil250_data/cec/cec.tif'
clay_percentage = '../../datasets/soil250_raster_data/cities_california/soil250_data/clay/clay.tif'
silt_percentage = '../../datasets/soil250_raster_data/cities_california/soil250_data/silt/silt.tif'
sand_percentage = '../../datasets/soil250_raster_data/cities_california/soil250_data/sand/sand.tif'
organic_matter = '../../datasets/soil250_raster_data/cities_california/soil250_data/om/om.tif'
ph = '../../datasets/soil250_raster_data/cities_california/soil250_data/pH/pH.tif'

# elevation + slope datasets 
elevation = '../../datasets/elevation_raster_data/cities_california/elevation/elevation_100m.tif'
slope = '../../datasets/elevation_raster_data/cities_california/slope/slope_100m.tif'

soil_elev_datasets = [derived_saturated_water_content, average_nitrogen, organic_carbon_content, available_water_capacity,
                      bulk_density, cation_exchange_capacity, clay_percentage, silt_percentage, sand_percentage,
                      organic_matter, ph, elevation, slope]

In [5]:
data_list = soil_elev_datasets
output_dir = '../../datasets/downscaled_data/4km/soil_elevation_data'
reference_raster = '../../datasets/downscaled_data/4km/weather_data/modeled_4km/avg_Rhumid/avg_Rhumid_20020101.tif'

resample_soil_elevation_data_4km(dataset_list=data_list, output_folder=output_dir, 
                                 reference_raster=reference_raster, paste_value_on_ref_raster=True)

## 2. Generating ARD for Soil and Elevation Data

In [2]:
soil_elevation_4km_processed_dir = '../../datasets/downscaled_data/4km/soil_elevation_data'
ard_dir = '../../datasets/ARD/4km'

generate_soil_elevation_ARD(input_data_directory=soil_elevation_4km_processed_dir, output_folder=ard_dir, 
                            savename='soil_elevation_4km_ARD.parquet')

compiling data for elevation...
compiling data for sand...
compiling data for clay...
compiling data for average_nitrogen...
compiling data for cec...
compiling data for om...
compiling data for average_wwp...
compiling data for slope...
compiling data for pH...
compiling data for average_soc...
compiling data for BulkDense...
compiling data for silt...
compiling data for average_awct...
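
For readers without access to `ARD_utils.ipynb`, the compile step can be pictured roughly as follows: each resampled raster is flattened into one column, and pixel-center coordinates are added as `lat` and `lon`. This is a sketch under stated assumptions, not the code behind `generate_soil_elevation_ARD()`; the helper `rasters_to_dataframe`, the origin coordinates, the pixel size, and the toy values are all made up for illustration.

```python
import numpy as np
import pandas as pd

def rasters_to_dataframe(arrays, top_left_lon, top_left_lat, pixel_size):
    """Flatten same-shaped 2D arrays into one DataFrame with lat/lon columns (hypothetical helper)."""
    shape = next(iter(arrays.values())).shape
    rows, cols = np.indices(shape)
    df = pd.DataFrame({name: arr.ravel() for name, arr in arrays.items()})
    # Pixel centers sit half a pixel in from the top-left corner; latitude decreases with row index
    df['lat'] = (top_left_lat - (rows + 0.5) * pixel_size).ravel()
    df['lon'] = (top_left_lon + (cols + 0.5) * pixel_size).ravel()
    # Drop pixels where every variable is missing (for example, outside the area of interest)
    return df.dropna(how='all', subset=list(arrays)).reset_index(drop=True)

# Toy 2 x 2 rasters with one pixel missing in both variables (illustrative values only)
arrays = {
    'elevation': np.array([[22.0, 20.4], [np.nan, 30.5]]),
    'slope': np.array([[1.0, 2.0], [np.nan, 3.0]]),
}
ard = rasters_to_dataframe(arrays, top_left_lon=-122.0, top_left_lat=39.0, pixel_size=0.036)
print(ard)
# Saving would then be ard.to_parquet('some_name.parquet'), which needs pyarrow or fastparquet installed
```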


In [3]:
# pd (pandas) is available through the star import from ARD_utils
soil_elevation_4km_ARD = '../../datasets/ARD/4km/soil_elevation_4km_ARD.parquet'

soil_elevation_4km_ARD_df = pd.read_parquet(soil_elevation_4km_ARD)
soil_elevation_4km_ARD_df

| | elevation | sand | clay | average_nitrogen | cec | om | average_wwp | slope | pH | average_soc | BulkDense | silt | average_awct | lat | lon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22.050282 | 0.266735 | 0.292154 | 2.016423 | 26.489555 | 3.344771 | 27.129637 | 80.139702 | 6.979492 | 22.567621 | 1.593004 | 0.441111 | 44.008675 | 39.413145 | -122.126718 |
| 1 | 20.403061 | 0.202411 | 0.323570 | 1.878956 | 32.690819 | 2.334052 | 28.165077 | 89.520233 | 7.005332 | 16.474554 | 1.602248 | 0.474019 | 44.051392 | 39.413145 | -122.090718 |
| 2 | 20.524334 | 0.173069 | 0.319097 | 1.911974 | 31.377214 | 2.325335 | 28.129444 | 89.433556 | 7.086928 | 16.137424 | 1.619904 | 0.507834 | 43.063557 | 39.413145 | -122.054718 |
| 3 | 22.943249 | 0.181538 | 0.316089 | 1.965419 | 27.331190 | 3.088321 | 25.571966 | 68.663231 | 7.002160 | 21.632231 | 1.606376 | 0.502373 | 41.727734 | 39.413145 | -122.018718 |
| 4 | 30.534155 | 0.230286 | 0.339297 | 1.405222 | 27.046650 | 2.062668 | 26.348799 | 82.163490 | 6.885736 | 14.117763 | 1.643732 | 0.430418 | 41.755676 | 39.377145 | -122.234718 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1535 | 61.494495 | 0.282409 | 0.323115 | 1.816606 | 31.882837 | 2.506843 | 30.484289 | 89.927589 | 7.729734 | 15.053114 | 1.445788 | 0.394477 | 41.887928 | 35.813145 | -119.714718 |
| 1536 | 62.065300 | 0.299206 | 0.315154 | 1.892729 | 32.761158 | 3.084175 | 31.357918 | 89.922852 | 7.650697 | 18.474592 | 1.442059 | 0.385640 | 42.626076 | 35.813145 | -119.678718 |
| 1537 | 62.466873 | 0.323749 | 0.302655 | 1.612083 | 30.710752 | 2.976255 | 28.362642 | 85.986015 | 7.699276 | 18.295589 | 1.471370 | 0.373596 | 43.000900 | 35.813145 | -119.642718 |
| 1538 | 62.213089 | 0.351302 | 0.275398 | 1.349103 | 27.776184 | 2.502969 | 24.309132 | 81.654152 | 7.745382 | 15.910647 | 1.522853 | 0.373299 | 40.840729 | 35.813145 | -119.606718 |


`For our Area of Interest (AOI), the ARD contains 1540 records, each representing one 4km pixel.`
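
Once the ARD is loaded, a few quick sanity checks can catch obvious processing problems. The sketch below uses the first three rows of the output shown above and assumes, based on those rows, that the sand, clay, and silt columns are stored as fractions that should sum to roughly 1. The tolerance and the pH bounds are illustrative choices, not part of the original workflow.

```python
import pandas as pd

# First three rows of the ARD output, restricted to the columns being checked
sample = pd.DataFrame({
    'sand': [0.266735, 0.202411, 0.173069],
    'clay': [0.292154, 0.323570, 0.319097],
    'silt': [0.441111, 0.474019, 0.507834],
    'pH': [6.979492, 7.005332, 7.086928],
})

# Texture fractions are assumed to sum to about 1 for every pixel
texture_sum = sample[['sand', 'clay', 'silt']].sum(axis=1)
texture_ok = (texture_sum - 1.0).abs() < 0.01

# pH should fall within the physically meaningful 0 to 14 range
ph_ok = sample['pH'].between(0, 14)

# Count missing values per column to spot gaps left after gap-filling
missing_per_column = sample.isna().sum()

print(texture_ok.all(), ph_ok.all(), int(missing_per_column.sum()))
```

The same checks can be run on the full dataframe by replacing `sample` with the dataframe returned by `pd.read_parquet`.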