# NO2 timeseries analysis

- NO2 time series analysis on EO Dashboard: https://www.eodashboard.org/story?id=air-pollution-us-india-china

<ins>Introduction</ins>
**The air quality analysis focuses on monitoring tropospheric nitrogen dioxide (NO2) measured by the Tropospheric Monitoring Instrument (TROPOMI) aboard Copernicus Sentinel-5P.**
Earth-observing satellites equipped with the TROPOMI instrument are being used to map air pollution worldwide and have revealed a significant drop in nitrogen dioxide (NO2) concentrations, coinciding with the strict quarantine measures that reduced traffic and industrial activity and therefore NO2 emissions.
Nitrogen dioxide concentrations vary from day to day with the weather (wind speed, cloudiness, etc.), so conclusions cannot be drawn from a single day of data. By combining data over a period of time (e.g. averaging over 14 days), the meteorological variability partially averages out and the impact of changes due to human activity becomes more clearly visible.
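
The effect of temporal averaging can be illustrated with a minimal sketch. It assumes non-overlapping 14-day windows and synthetic daily values, with `None` standing for days without a valid observation; `window_means` is a hypothetical helper and is not part of the notebook.

```python
from statistics import mean

def window_means(daily_values, window=14):
    """Average consecutive, non-overlapping windows of daily values.

    Days without a valid observation (None) are skipped inside each window.
    """
    means = []
    for start in range(0, len(daily_values), window):
        valid = [v for v in daily_values[start:start + window] if v is not None]
        means.append(mean(valid) if valid else None)
    return means

# Synthetic daily NO2 values: noisy around one level, then around a lower one.
daily = [90, 110, None, 105, 95, 100, 98, 102, 97, 103, 101, 99, 100, 100,
         55, 65, 60, None, 62, 58, 61, 59, 60, 60, 63, 57, 60, 60]
print(window_means(daily))
```

The day-to-day scatter disappears in the two window means, while the step between the two periods remains visible.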

<ins>Notebook Description</ins>
This notebook allows the user to compute and extract NO2 time series statistics over predefined areas of interest (AOIs).
The file naming convention used in the notebook is the one used to generate the indicator on the dashboard, so it can be customized as desired. However, to produce a Dashboard-compliant indicator, the columns and other parameters must be formatted as shown in the workflow.



For more information about TROPOMI NO2 and the data provided, please visit the [following link](https://maps.s5p-pal.com/no2/).

## Important Note
To run this notebook, it is recommended to use a custom Conda environment as the Jupyter kernel. The YAML file listing the required dependencies and libraries is provided below, and instructions for creating the custom environment in your EOX workspace can be found [here](https://eurodatacube.com/documentation/custom-jupyter-kernels).

Content of the yml file to be used for creating the environment:
```
name: no2ts_full
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - affine=2.3.0=py_0
  - attrs=19.3.0=py_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - blas=1.0=mkl
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2020.6.24=0
  - cairo=1.14.12=h8948797_3
  - certifi=2020.6.20=py37_0
  - cfitsio=3.470=hb7c8383_2
  - click=7.1.2=py_0
  - click-plugins=1.1.1=py_0
  - cligj=0.5.0=py37_0
  - curl=7.67.0=hbc83047_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - expat=2.2.9=he6710b0_2
  - fontconfig=2.13.0=h9420a91_0
  - freetype=2.10.2=h5ab3b9f_0
  - freexl=1.0.5=h14c3975_0
  - geos=3.8.0=he6710b0_0
  - geotiff=1.5.1=h21e8280_1
  - giflib=5.1.4=h14c3975_1
  - glib=2.63.1=h5a9c865_0
  - hdf4=4.2.13=h3ca952b_2
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=he6710b0_3
  - intel-openmp=2020.1=217
  - ipykernel=5.3.4=py37h888b3d9_1
  - ipython=7.18.1=py37hc6149b9_1
  - ipython_genutils=0.2.0=py_1
  - jedi=0.17.2=py37h89c1867_2
  - jpeg=9b=h024ee3a_2
  - json-c=0.13.1=h1bed415_0
  - jupyter_client=7.1.2=pyhd8ed1ab_0
  - jupyter_core=4.9.2=py37h89c1867_0
  - kealib=1.4.7=hd0c454d_6
  - krb5=1.16.4=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libboost=1.67.0=h46d08c1_4
  - libcurl=7.67.0=h20c2e04_0
  - libdap4=3.19.1=h6ec2957_0
  - libedit=3.1.20191231=h14c3975_1
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgdal=3.0.2=h27ab9cc_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libkml=1.3.0=h590aaf7_4
  - libnetcdf=4.6.1=h11d0813_2
  - libpng=1.6.37=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libsodium=1.0.18=h36c2ea0_1
  - libspatialite=4.3.0a=h793db0d_0
  - libssh2=1.9.0=h1ba5d50_1
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=h2733197_0
  - libuuid=1.0.3=h1bed415_2
  - libxcb=1.14=h7b6447c_0
  - libxml2=2.9.10=he19cac6_1
  - lz4-c=1.8.1.2=h14c3975_0
  - mkl=2020.1=217
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.1.0=py37h23d657b_0
  - mkl_random=1.1.1=py37h0573a6f_0
  - ncurses=6.2=he6710b0_1
  - nest-asyncio=1.5.5=pyhd8ed1ab_0
  - numpy=1.19.1=py37hbc911f0_0
  - numpy-base=1.19.1=py37hfa32c7d_0
  - openjpeg=2.3.0=h05c96fa_1
  - openssl=1.1.1g=h7b6447c_0
  - parso=0.7.1=pyh9f0ad1d_0
  - pcre=8.44=he6710b0_0
  - pexpect=4.8.0=pyh9f0ad1d_2
  - pickleshare=0.7.5=py_1003
  - pip=20.1.1=py37_1
  - pixman=0.40.0=h7b6447c_0
  - poppler=0.65.0=h581218d_1
  - poppler-data=0.4.9=0
  - postgresql=11.2=h20c2e04_0
  - proj=6.2.1=haa6030c_0
  - prompt-toolkit=3.0.29=pyha770c72_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pygments=2.11.2=pyhd8ed1ab_0
  - pyparsing=2.4.7=py_0
  - python=3.7.7=h191fe78_0_cpython
  - python_abi=3.7=2_cp37m
  - pyzmq=20.0.0=py37h5a562af_1
  - rasterio=1.1.0=py37h41e4f33_0
  - readline=7.0=h7b6447c_5
  - setuptools=49.2.0=py37_0
  - shapely=1.7.0=py37h98ec03d_0
  - six=1.15.0=py_0
  - snuggs=1.4.7=py_0
  - sqlite=3.32.3=h62c20be_0
  - tbb=2018.0.5=h6bb024c_0
  - tiledb=1.6.3=h1fb8f14_0
  - tk=8.6.10=hbc83047_0
  - tornado=6.1=py37h4abf009_0
  - traitlets=5.1.1=pyhd8ed1ab_0
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - wheel=0.34.2=py37_0
  - xerces-c=3.2.2=h780794e_0
  - xz=5.2.5=h7b6447c_0
  - zeromq=4.3.3=h58526e2_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - aenum==2.2.4
    - boto3==1.14.36
    - botocore==1.17.36
    - chardet==3.0.4
    - docutils==0.15.2
    - idna==2.10
    - jmespath==0.10.0
    - oauthlib==3.1.0
    - pandas==1.1.0
    - pathlib==1.0.1
    - pillow==7.2.0
    - pyproj==2.6.1.post1
    - python-dateutil==2.8.1
    - pytz==2020.1
    - requests==2.24.0
    - requests-oauthlib==1.3.0
    - s3transfer==0.3.3
    - sentinelhub==3.0.5
    - tifffile==2020.7.24
    - urllib3==1.25.10
    - utm==0.5.0
    - wget==3.2
    - xlrd==1.2.0
prefix: /home/jovyan/IGARSS_2022/no2ts/env_no2ts_full
```

# Concept of analysis
The input data source is a Bring Your Own COG (BYOC) data collection (*S5P-NO2-tropno-daily-check*) composed of global coverage maps of tropospheric nitrogen dioxide (NO2), each holding bi-weekly NO2 concentration values.


<b>SOURCE: https://browser.eurodatacube.com/?zoom=10&lat=41.9&lng=12.5&collectionId=s5p-no2-tropno-daily-check&layerId=NO2&type=sentinel-hub-edc&fromTime=2021-05-19T08%3A45%3A36.153Z&toTime=2022-05-19T08%3A45%3A36.153Z</b>


    
   
<ins>The core of the notebook</ins> then computes median, std, max and min statistics over specific areas of interest (AOIs) and a given time range, and writes them to a csv file that can be ingested directly into the geoDB, so that the related time series can be visualized on the EO Dashboard.
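
As a rough illustration of these statistics, the sketch below works on a plain list of pixel values using only the standard library. `aoi_statistics` is a hypothetical helper. The validity range (0 to 2000) mirrors the thresholds applied in the core cell of this notebook, and the population standard deviation is used because that is what `numpy.nanstd` computes by default.

```python
import math
from statistics import median, pstdev

def aoi_statistics(pixels, valid_min=0.0, valid_max=2000.0):
    """Return median, std, max, min and the percentage of valid pixels.

    Pixels that are NaN or outside [valid_min, valid_max] are ignored.
    Returns None when no valid pixel is left.
    """
    valid = [p for p in pixels
             if not math.isnan(p) and valid_min <= p <= valid_max]
    if not valid:
        return None
    return {
        "median": median(valid),
        "std": pstdev(valid),  # population std, like numpy.nanstd
        "max": max(valid),
        "min": min(valid),
        "valid_percent": 100.0 * len(valid) / len(pixels),
    }

# Illustrative pixel values, including NaN and out-of-range entries.
stats = aoi_statistics([10.0, 20.0, 30.0, float("nan"), -5.0, 5000.0, 40.0, 20.0])
print(stats)
```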

# Notebook step by step guide

The cells below are meant to be run in sequence; each one executes a step needed to generate the required files.

# Import Libraries
This cell imports the libraries needed to run the notebook.

In [None]:
# Import packages
import itertools
import os
import numpy
import wget
import shapely
import rasterio
import datetime
import csv
import shutil

from shapely import wkt
from rasterio.merge import merge
from rasterio.mask import mask
from pathlib import Path
from sentinelhub import MimeType, CRS, BBox, BBoxSplitter
from sentinelhub.geometry import Geometry
from sentinelhub.geo_utils import bbox_to_dimensions
from shapely.geometry import shape, Polygon

import sys
import getopt

import time as tm
import urllib
from urllib.error import HTTPError

# Input Parameters

In this cell the user sets the location of the input file and the names of the outputs:
- Input-Output files
    - inputfile: path to the file used as input
- Output location:
    - filename: the standard file name used for ingestion into the dashboard. It shall be a csv file with specific columns and formats
    - cities_info: the base name of the input file, obtained with os.path.basename(inputfile)
    - INPUT_DIR_NAME: the directory of the input file, obtained with os.path.dirname(inputfile)
    - OUTPUT_FOLDER: a dedicated folder where the outputs are stored; it is created if it does not exist.
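
As a small illustration of how these values are derived, the sketch below applies the same standard-library calls to the default input path. A fixed timestamp is assumed here so that the result is reproducible, whereas the notebook uses the current time.

```python
import datetime
import os

inputfile = "./Inputs/No2TestCase.xls"

cities_info = os.path.basename(inputfile)    # file name only
input_dir_name = os.path.dirname(inputfile)  # directory part only

# Fixed timestamp for illustration; the notebook calls datetime.datetime.now().
proctime = datetime.datetime(2022, 7, 8, 3, 26, 39).strftime("%Y%m%dT%H%M%S")
filename = "N1_trilateral_" + proctime + ".csv"
output_folder = f"./Outputs/N1_tri/{proctime}/"

print(cities_info, input_dir_name, filename, output_folder)
```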


    

In [None]:
# Input-Output files
inputfile = "./Inputs/No2TestCase.xls"
mode='full'

now = datetime.datetime.now()
current_proctime = now.strftime("%Y%m%dT%H%M%S")

# Output location
filename= 'N1_trilateral_' + current_proctime + '.csv'
cities_info=os.path.basename(inputfile)
INPUT_DIR_NAME = os.path.dirname(inputfile)

OUTPUT_FOLDER = f"./Outputs/N1_tri/{current_proctime}/"
if not os.path.exists(OUTPUT_FOLDER):
    os.makedirs(OUTPUT_FOLDER)
PARENT_DATASET_DIR = OUTPUT_FOLDER

# Server instance
The server and the instance needed to access the proper files. The collection is available in Sentinel Hub (SH).

In [None]:
# Server instance
WMS_SERVER_URL = "https://services.sentinel-hub.com/ogc/wms/"
INSTANCE_ID = "c1f84418-3731-4b42-b92b-737c47d327a6"
LAYER_NAME = "S5P-TW-WEEKLY-NEW"

In [None]:
# Parameters
RES_X = 1627.315
RES_Y = RES_X
print(RES_X)

CRS_TARGET = "WGS84"
IMAGE_DEPTH = "32f" # 8, 16, 32f (8bit uint, 16bit uint or 32bit float)
IMAGE_FORMAT = "tif" # Only tif image is supported right now

In [None]:
# Defining text info to be printed in the output file
region=''
site=''
# Longer description kept for reference; the assignment below is the one written to the csv
#description='Tropospheric Nitrogen dioxide (NO2) column accumulated in last 14 days'
description='Air Quality'
method='Spatial average of NO2 bi-weekly concentration value on the city area'
eosensor='TROPOMI'
input_data='Sentinel-5p Level-3 NO2'
indicator_code='N1'
ref_description='Spatial statistics [median,std,max,min,percentage valid pixels] of the current date within the city area'
ref_time=''
ref_value=''
rule=''
indicator_value=''
yaxis='Tropospheric NO2 (μmol/m2)'
#yaxis='Tropospheric NO2 (\N{GREEK SMALL LETTER MU}mol/m2)'
color=''
indicator_name='TROPOMI: Spatial average over the city area of bi-weekly tropospheric nitrogen dioxide (NO2) concentrations'
provider='EMSS'
#AOI_ID=''
update_freq='Bi-weekly'

# Column names must match the keys used by the csv writers below ('Color Code', capital C)
fieldnames = ['AOI','Country','Region','City','Site Name','Description','Method','EO Sensor','Input Data','Indicator code','Time','Measurement Value','Reference Description','Reference time','Reference value','Rule','Indicator Value','Sub-AOI','Y axis','Indicator Name','Color Code','Data Provider','AOI_ID','Update Frequency']


# Defining WMS downloader
To compute the statistics, the tiles must first be downloaded and processed. When doing this through SH, some limits of the WMS service have to be taken into account. In particular:
- WMS_MAXIMUM_DATA_SIZE: the maximum size, in pixels per side, that can be requested from the server (2500 pixels). The following script downloads the required patches (up to 2500x2500 pixels each) and then merges them into one image.
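
The number of patches follows from simple integer arithmetic on the requested size in pixels. The sketch below reproduces the calculation used in the downloader; `tile_grid` is a hypothetical helper name introduced only for this illustration.

```python
WMS_MAXIMUM_DATA_SIZE = 2500  # maximum pixels per side for one request

def tile_grid(width_px, height_px, max_size=WMS_MAXIMUM_DATA_SIZE):
    """Return (columns, rows) of patches, using floor division plus one."""
    return (width_px // max_size) + 1, (height_px // max_size) + 1

print(tile_grid(6000, 1200))  # a wide area needs several columns
print(tile_grid(2500, 100))   # an exact multiple still gets one extra column
```

With this formula an area that is exactly 2500 pixels wide is split into two columns. That is harmless, since each patch then stays well below the limit, but a ceiling division would request one patch fewer in that case.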



In [None]:
# Defining WMS downloader
"""## Downloader function definition"""

WMS_MAXIMUM_DATA_SIZE = 2500 # Maximum data size in pixel that can be requested from server
WMS_VERSION = "1.1.1"

def mergeImageFiles(filesList, outFileName):

    images = list(map(lambda x: rasterio.open(x, 'r'), filesList))
    merged, transform = merge(images)

    with rasterio.open(outFileName,
                       "w",
                       driver='Gtiff',
                       count=images[0].count,
                       height=merged.shape[1],
                       width=merged.shape[2],
                       transform=transform,
                       crs=images[0].crs,
                       dtype=images[0].dtypes[0]) as dest:
        dest.write(merged)

def crs_string_to_object(crs_string):
    if crs_string == "WGS84":
        return CRS.WGS84
    else:
        raise Exception("Unsupported CRS")
        
def image_format_string_to_object(image_ext_string, depth_string=None):
    if image_ext_string == "tif":
        if depth_string == "8":
            return MimeType.TIFF_d8
        elif depth_string == "16":
            return MimeType.TIFF_d16
        elif depth_string == "32f":
            return MimeType.TIFF_d32f
        else:
            raise Exception("Unsupported image format")
    else:
        raise Exception("Unsupported image format")
        

def wmsRequestData(wms_server_url, instanceID, aoi, layer_name, crs, image_format, resolution, start_date, end_date, output_folder, filename):
    
    # Only supports GeoTiff format
    if not image_format.is_tiff_format:
        raise Exception("Unsupported image format")
        
    # Create output folder
    os.makedirs(output_folder, exist_ok=True)
    
    image_name, image_ext = os.path.splitext(filename)
    # Calculate bbox wrapper around geometry
    geometry = Geometry(aoi, crs)
    bbox_wrapper = geometry.bbox
    
    bbox_wrapper_dims = bbox_to_dimensions(bbox_wrapper, resolution)
    # Calculate optimum grid to split area if area to download is greater than server size limit
    x_grid = (bbox_wrapper_dims[0] // WMS_MAXIMUM_DATA_SIZE) + 1
    y_grid = (bbox_wrapper_dims[1] // WMS_MAXIMUM_DATA_SIZE) + 1
    
    bbox_partition = bbox_wrapper.get_partition(num_x=x_grid, num_y=y_grid)

    downloaded_file_list = []
    for i, j in itertools.product(range(x_grid), range(y_grid)):
        # Get bbox in the grid
        bbox = bbox_partition[i][j]
        # Assemble WMS request
        wms_query = wms_server_url + instanceID + "?" + "version=" + WMS_VERSION + "&service=WMS" + "&request=GetMap" + "&format=" + image_format.get_string() + "&crs=" + crs.ogc_string() + "&layers=" + layer_name + "&RESX=" + str(resolution[0]) + "m" + "&RESY=" + str(resolution[1])+ "m" + "&BBOX=" + str(bbox) + "&TIME=" + start_date + "/" + end_date
        #wms_query = wms_server_url + instanceID + "?" + "version=" + WMS_VERSION + "&service=WMS" + "&request=GetMap" + "&format=" + image_format.get_string() + "&crs=" + crs.ogc_string() + "&layers=" + layer_name + "&WIDTH=" + str(resolution[0]) + "&HEIGHT=" + str(resolution[1])+ "&BBOX=" + str(bbox) + "&TIME=" + start_date + "/" + end_date
        patch_filename = image_name + "_" + str(i) + "_" + str(j) + image_ext
        
        print(wms_query)
        # Download image patch
        print("Downloading patch (%d, %d) ..." %(i, j))
        wget.download(wms_query, out=os.path.join(output_folder, patch_filename))
        downloaded_file_list.append(os.path.join(output_folder, patch_filename))
            
    # Merge images
    print("Merging patches ...")
    mergeImageFiles(downloaded_file_list, os.path.join(output_folder, filename))
    print("Done!!!")
    
    # Remove image patches
    for file in downloaded_file_list:
        os.remove(file)
        
def mask_raster(raster_file, aoi, output_file):
    with rasterio.open(raster_file) as src:
        if isinstance(aoi, shapely.geometry.MultiPolygon):
            # Use .geoms: iterating a MultiPolygon directly was removed in Shapely 2.0
            polygons = list(aoi.geoms)
        else:
            polygons = [aoi]
        out_image, out_transform = mask(src, polygons, crop=False)
        out_meta = src.meta
        
        out_meta.update({"driver": "GTiff",
                     "height": out_image.shape[1],
                     "width": out_image.shape[2],
                     "transform": out_transform})

        with rasterio.open(output_file, "w", **out_meta) as dest:
            dest.write(out_image)

# Location parameters reading function

The following cells provide the functions that read the location parameters needed by the core of the script to extract the NO2 time series.

In [None]:
# Reading function for reading location parameters

def read_input_pandas(filename):
    import pandas as pd
#    df = pd.read_csv(filename, header=1)
    df = pd.read_excel(filename,'cities')
    print(df.head(5))
    city=numpy.array(df.loc[:, 'City'])
    country=numpy.array(df.loc[:,'Country'])
    POINTS=numpy.array(df.loc[:,'Point Coordinates [LAT,LON]'])
    DELTAS_X=numpy.array(df.loc[:,'DELTA_X'])
    DELTAS_Y=numpy.array(df.loc[:,'DELTA_Y'])
    AOI_ID=numpy.array(df.loc[:,'AOI_ID'])
    POLYGONS=numpy.array(df.loc[:,'POLYGON'])
    print(city)
    print(POINTS)
    print(DELTAS_X)
    print(AOI_ID)
    print(POLYGONS)
    # Return both deltas (DELTAS_X was previously returned twice)
    return city, country, DELTAS_X, DELTAS_Y, AOI_ID, POINTS, POLYGONS
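
To make the meaning of these parameters concrete, the sketch below parses one value of the 'Point Coordinates [LAT,LON]' column and builds a rectangular AOI in WKT from the point and the two deltas, in the same corner order as the core cell. `parse_lat_lon` and `bbox_wkt` are hypothetical helpers and the coordinates are illustrative.

```python
def parse_lat_lon(point_str):
    """Split a 'LAT,LON' string into two floats (latitude first)."""
    lat_str, lon_str = point_str.split(",")
    return float(lat_str), float(lon_str)

def bbox_wkt(lat, lon, delta_x, delta_y):
    """Build a closed rectangular WKT polygon around (lon, lat).

    WKT expects x (longitude) before y (latitude).
    """
    west, east = lon - delta_x, lon + delta_x
    south, north = lat - delta_y, lat + delta_y
    ring = [(west, south), (east, south), (east, north),
            (west, north), (west, south)]
    return "POLYGON((" + ",".join(f"{x} {y}" for x, y in ring) + "))"

lat, lon = parse_lat_lon("41.5,12.5")
print(bbox_wkt(lat, lon, delta_x=0.5, delta_y=0.25))
```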

In [None]:
# Resampling raster needed only when EDC does not allow to get data in full resolution
import os
import rasterio

from rasterio.enums import Resampling

def resample_raster(raster_file, scale_factor, output_folder, output_filename):
    os.makedirs(output_folder, exist_ok=True)
    
    input_raster_name, input_raster_ext = os.path.splitext(os.path.basename(raster_file))
    
    with rasterio.open(raster_file) as src:
                
        # resample data to target shape
        data = src.read(
            out_shape=(
                src.count,
                int(src.height * scale_factor),
                int(src.width * scale_factor)
            ),
            resampling=Resampling.nearest
        )

        # scale image transform
        out_transform = src.transform * src.transform.scale(
            (src.width / data.shape[-1]),
            (src.height / data.shape[-2])
        )
        
        out_meta = src.meta.copy()
        out_meta.update({"height": data.shape[1],
                         "width": data.shape[2],
                         "transform": out_transform})
        with rasterio.open(os.path.join(output_folder, output_filename), "w", **out_meta) as dest:
            dest.write(data)
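
The arithmetic behind `resample_raster` can be checked without opening a raster. The sketch below computes the output shape and the resulting pixel size for a given scale factor, assuming square input pixels; `resampled_grid` is a hypothetical helper introduced for illustration.

```python
def resampled_grid(width, height, pixel_size, scale_factor):
    """Return (new_width, new_height, new_pixel_x, new_pixel_y).

    The output shape is truncated to integers, so the pixel size is scaled
    by the actual width and height ratios rather than by 1 / scale_factor.
    """
    new_width = int(width * scale_factor)
    new_height = int(height * scale_factor)
    return (new_width, new_height,
            pixel_size * width / new_width,
            pixel_size * height / new_height)

print(resampled_grid(100, 50, 1000.0, 0.5))
print(resampled_grid(101, 50, 1000.0, 0.5))  # odd width: x pixel size differs
```

Scaling the transform by the ratio of the old to the new dimensions keeps the raster extent unchanged even when the truncation makes the ratio differ slightly from the nominal factor.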

In [None]:
# Utility functions

# Creating empty file with header
def creating_output_csv(filename):
    with open(filename,'w') as csv_file:
        fieldnames = ['AOI','Country','Region','City','Site Name','Description','Method','EO Sensor','Input Data','Indicator code','Time','Measurement Value','Reference Description','Reference time','Reference value','Rule','Indicator Value','Sub-AOI','Y axis','Indicator Name','Color Code','Data Provider','AOI_ID','Update Frequency']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        
# Removing the rows of the last two time steps so that they can be recomputed
def removing_lines_csv(filename):
    from datetime import date
    import pandas as pd
    fieldnames = ['AOI','Country','Region','City','Site Name','Description','Method','EO Sensor','Input Data','Indicator code','Time','Measurement Value','Reference Description','Reference time','Reference value','Rule','Indicator Value','Sub-AOI','Y axis','Indicator Name','Color Code','Data Provider','AOI_ID','Update Frequency']
    #with open(filename,'a') as csv_file:
    no2_df = pd.read_csv(filename)
    print('Full N1 indicator file')
    print(no2_df)
    print(no2_df.Time.unique())
    print(no2_df.Time.unique()[-1])
    print(no2_df.Time.unique()[-2]) 
    start_date=no2_df.Time.unique()[-2]
    print('Lines to overwrite')
    # Rows belonging to either of the last two time steps (OR, not AND)
    print(no2_df.loc[(no2_df['Time'] == no2_df.Time.unique()[-2]) | (no2_df['Time'] == no2_df.Time.unique()[-1])])
    print('Remaining lines')
    print(no2_df.loc[(no2_df['Time'] != no2_df.Time.unique()[-2]) & (no2_df['Time'] != no2_df.Time.unique()[-1])]) 
    #no2_df.drop(no2_df.loc[(no2_df['Time'] >= no2_df.Time.unique()[-2]) & (no2_df['Time'] <= no2_df.Time.unique()[-1])])
        #csv_file.close()
    # Work on a copy so the rounding below does not modify a view of no2_df
    updated_df=no2_df.loc[(no2_df['Time'] != no2_df.Time.unique()[-2]) & (no2_df['Time'] != no2_df.Time.unique()[-1])].copy()
    updated_df['Measurement Value']=numpy.round(updated_df['Measurement Value'],6)
    updated_df.to_csv(filename,columns=fieldnames, index = False)
    updated_df.to_csv(filename+'.2',columns=fieldnames,index = False)
    start=start_date.split('T')[0]
    #start_date = datetime.strptime(start,'%Y-%m-%d')
    start_date = date.fromisoformat(start)
    #datetime.date(2019, 12, 4)
    return start_date
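
The logic of `removing_lines_csv` can be sketched without pandas: the rows of the last two time steps are dropped and the earlier of the two becomes the restart date. `drop_last_two_dates` is a hypothetical helper working on a list of dictionaries, and the rows are illustrative. Like the notebook function, it assumes at least two distinct time steps.

```python
from datetime import date

def drop_last_two_dates(rows):
    """Drop rows of the last two distinct 'Time' values, in file order.

    Returns the remaining rows and the date from which to recompute.
    """
    # dict.fromkeys keeps the order of first appearance, like pandas unique()
    unique_times = list(dict.fromkeys(row["Time"] for row in rows))
    to_redo = set(unique_times[-2:])
    kept = [row for row in rows if row["Time"] not in to_redo]
    restart = date.fromisoformat(unique_times[-2].split("T")[0])
    return kept, restart

rows = [
    {"City": "Rome", "Time": "2021-01-04T00:00:00"},
    {"City": "Paris", "Time": "2021-01-04T00:00:00"},
    {"City": "Rome", "Time": "2021-01-18T00:00:00"},
    {"City": "Rome", "Time": "2021-02-01T00:00:00"},
]
kept, restart = drop_last_two_dates(rows)
print(len(kept), restart)
```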

# Core of the script

The core cell runs the script and computes the required statistics (min, max, median, std).
Two modes are available, depending on the purpose of the run. At the beginning of the project only the records that had changed over time needed updating, including records for periods already processed, because the source (S5P-PAL) could provide both near-real-time (NRT) and consolidated data. Later, to stay aligned with the consolidated data, it was decided to run the script over the entire period rather than only on the updated values. The two modes correspond to these two situations:
- full: run the script over the entire period for which NO2 maps are available, as if starting from scratch
- update: run the script and update only the records that have changed.

It is recommended to use the "full" mode.
For the expected column layout of the output csv, refer to the:
- fieldnames parameter
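
In both modes the core cell walks through the period in 14-day steps. The sketch below reproduces that loop condition in isolation; `biweekly_steps` is a hypothetical helper and the dates are illustrative.

```python
import datetime

def biweekly_steps(start, end, step_days=14):
    """List the start dates visited while start + step does not exceed end."""
    delta = datetime.timedelta(days=step_days)
    steps = []
    current = start
    while current + delta <= end:
        steps.append(current)
        current += delta
    return steps

steps = biweekly_steps(datetime.date(2021, 1, 4), datetime.date(2021, 2, 1))
print([d.isoformat() for d in steps])
```

A date is processed only once a full 14-day step has elapsed after it, so the most recent, still incomplete period is skipped.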

In [None]:
## Core of the script

import os
import shapely
from shapely import wkt
import datetime
import csv

##############################################################
# Getting location parameters 
city, country, DELTAS_X, DELTAS_Y, AOI_ID, POINTS, POLYGONS = read_input_pandas(os.path.join(INPUT_DIR_NAME,cities_info))

# Defining time variables for looping
delta = datetime.timedelta(days=14) 
today = datetime.date.today()  
end_date = today.strftime("%Y-%m-%d")

# Creating back up file
if os.path.isfile(os.path.join(PARENT_DATASET_DIR,filename)):
    shutil.copy(os.path.join(PARENT_DATASET_DIR,filename),os.path.join(PARENT_DATASET_DIR,filename+'.bkp-'+end_date))

if mode == 'full':
    creating_output_csv(os.path.join(PARENT_DATASET_DIR,filename))
    #start = datetime.date(2022,1,10)  # overridden by the assignment below
    start = datetime.date(2021,1,4)
elif mode == 'update':  
    #delta1 = datetime.timedelta(days=30)
    #start_date =today-delta1
    start=removing_lines_csv(os.path.join(PARENT_DATASET_DIR,filename))
    print(start)

#end_date = today.strftime("%Y-%m-%d")
print(end_date)
end_date = today 

#initiating no2 average value to empty
mean_value=[]
dates=[]
print(len(city))
#start=datetime.date(2020,6,8)
#end_date=datetime.date(2020,6,23)
for k in range(0,len(city)):
  print('###############################')
  print('City:'+city[k])
  start_date=start
  DELTA_X=DELTAS_X[k]
  DELTA_Y=DELTAS_Y[k]
  # The input column is 'Point Coordinates [LAT,LON]': latitude first, longitude second
  LAT=float(POINTS[k].split(',')[0])
  LON=float(POINTS[k].split(',')[1])
  POINT=[LAT, LON]
  print(str(POINT[0])+','+str(POINT[1]))
  AOI = "POLYGON(("+str(POINT[1]-DELTA_X)+" "+str(POINT[0]-DELTA_Y)+","+str(POINT[1]+DELTA_X)+" "+str(POINT[0]-DELTA_Y)+","+str(POINT[1]+DELTA_X)+" "+str(POINT[0]+DELTA_Y)+","+str(POINT[1]-DELTA_X)+" "+str(POINT[0]+DELTA_Y)+","+str(POINT[1]-DELTA_X)+" "+str(POINT[0]-DELTA_Y)+"))"
  print(AOI)
  AOI=POLYGONS[k]
  print(AOI)
  cityname = city[k]
  cityname=cityname.replace(" ", "")
  print(cityname)
  IMAGE_REF_NAME=cityname+"_S5p_L3_NO2"
  
  # Read AOI and convert to shapely format
  aoi = shapely.wkt.loads(AOI)
  AOIstr=str(POINT[0])+'_'+str(POINT[1])
  mean_value=[]
  #end_date = datetime.date(2020, 6, 9)
  #start_date = datetime.date(2020, 6, 8) 
  print('start',start_date)
  print('stop',end_date)
  while start_date +delta  <= end_date:
    print(start_date)
    START_DATE = start_date.strftime("%Y-%m-%d")
    #END_DATE=START_DATE
    #date2= start_date+delta
    #END_DATE= date2.strtime("%Y-%m-%d")
    END_DATE=START_DATE
    
    #dates=numpy.append(dates,START_DATE)

    IMAGE_NAME = IMAGE_REF_NAME + "_" + START_DATE.translate({ord(i): None for i in '-:'}) + "_" + END_DATE.translate({ord(i): None for i in '-:'})
    IMAGE_NAME_TIF = IMAGE_NAME + '.tif'
    IMAGE_TIF = os.path.join(OUTPUT_FOLDER, IMAGE_NAME_TIF)
    IMAGE_CLIPPED_NAME = IMAGE_NAME + "_clipped"
    IMAGE_CLIPPED_NAME_TIF = IMAGE_NAME + "_clipped" + ".tif"
    IMAGE_CLIPPED_TIF = os.path.join(OUTPUT_FOLDER, IMAGE_CLIPPED_NAME_TIF)
    try:
        wmsRequestData(WMS_SERVER_URL, INSTANCE_ID, aoi, LAYER_NAME, crs_string_to_object(CRS_TARGET), image_format_string_to_object(IMAGE_FORMAT, IMAGE_DEPTH), (RES_X, RES_Y), START_DATE, END_DATE, OUTPUT_FOLDER, IMAGE_NAME_TIF)
    except HTTPError as err:
        print('Server sent error code {}.'.format(err.code))
        print('\nRetrying...')
        tm.sleep(10)
        continue

    ## there is no need to resample as sinergise updated the WMS server to admit our request
    # Downsampling of the WMS images downloaded at 1000m to 5500m original resolution
    # resample_raster(OUTPUT_FOLDER+'/'+IMAGE_NAME_TIF, 1000/1627.315, OUTPUT_FOLDER, IMAGE_NAME_TIF)

    with rasterio.open(OUTPUT_FOLDER+'/'+IMAGE_NAME_TIF) as no2:
      band=no2.read(1)
      #weight=no2.read(3)
      # Mask implausibly high values
      band = numpy.where(band > 2000, numpy.NaN, band)
      print('Minimum')
      print(numpy.nanmin(band))
      # Mask negative values
      band = numpy.where(band < 0,numpy.NaN, band)
      print('Minimum2')
      print(numpy.nanmin(band))
      nonnans = numpy.count_nonzero(~numpy.isnan(band))
      perc = numpy.count_nonzero(~numpy.isnan(band))/numpy.shape(band)[0]/numpy.shape(band)[1]*100
      #perc = numpy.where(perc>100.0,100.0,perc)
      print('Nonperc:'+str(perc))

      ## weighting band with normalised weigth band (outside pols max=14)
      #band=numpy.multiply(band,weight/14.0)

      new_value=numpy.nanmean(numpy.nanmean(band))
      max=numpy.nanmax(numpy.nanmax(band))
      min=numpy.nanmin(numpy.nanmin(band))
      std=numpy.nanstd(band)
      median=numpy.nanmedian(numpy.nanmedian(band))
     # weightmean=numpy.mean(numpy.mean(weight))#/14
      print('Values are')
      value, count = numpy.unique(band.flatten(), return_counts = True, axis = 0)
      print('max: '+str(max)+' min: '+str(min)+' median: '+str(median)+' mean:'+str(new_value)+' std:'+str(std)+' count: '+str(count[0]))#+' weigth_average:'+str(weightmean))
      mean_value=numpy.append(mean_value,new_value)
    measurement=str(numpy.round(new_value,6))
    ref_value=[median,std,max,min,perc]#,weightmean]
    
    # Removing downloaded tif (no need to keep them)
    os.remove(IMAGE_TIF)    
    time=START_DATE+'T00:00:00'
    subAOI=AOI
 #   ref_value=[median,std,max,min]
#    ref_value.append(perc)
    ref_time=time
    line=str(POINT[0])+' '+str(POINT[1])+','+country[k]+','+region+','+city[k]+','+site+','+description+','+method+','+eosensor+','+input_data+','+indicator_code+','+time+','+str(measurement)+','+ref_description+','+ref_time+','+str(ref_value)+','+rule+','+indicator_value+','+subAOI+','+yaxis+','+indicator_name+','+color+','+provider+','+AOI_ID[k]+','+update_freq+'\n'
    
    fieldnames = ['AOI','Country','Region','City','Site Name','Description','Method','EO Sensor','Input Data','Indicator code','Time','Measurement Value','Reference Description','Reference time','Reference value','Rule','Indicator Value','Sub-AOI','Y axis','Indicator Name','Color Code','Data Provider','AOI_ID','Update Frequency']
    with open(os.path.join(PARENT_DATASET_DIR,filename), mode='a') as csv_file:
      writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
      #if numpy.array_equal(ref_value,numpy.asarray([numpy.NaN,numpy.NaN,numpy.NaN,numpy.NaN,0.0])):
      if str(ref_value) == '[nan, nan, nan, nan, 0.0]':#, 0.0]':
        ref_value = 'NaN'
        measurement = 'NaN'
        print('NaN conditions verified')
      if time == '2019-03-18T00:00:00' and city[k] == 'Paris':
        ref_value = [179.6195, 49.52065, 286.7863, 90.32629, 100.0]#, 13.375]
        measurement = 180.275910
        print('Paris 20190318 condition verified')
      writer.writerow({'AOI':str(POINT[0])+','+str(POINT[1]),'Country':country[k],'Region':region,'City':city[k],'Site Name':site,'Description':description,'Method':method,'EO Sensor':eosensor,'Input Data':input_data,'Indicator code':indicator_code,'Time':time,'Measurement Value':str(measurement),'Reference Description':ref_description,'Reference time':ref_time,'Reference value':str(ref_value),'Rule':rule,'Indicator Value':indicator_value,'Sub-AOI':AOI,'Y axis':yaxis,'Indicator Name':indicator_name,'Color Code':color,'Data Provider':provider,'AOI_ID':AOI_ID[k],'Update Frequency':update_freq})
      csv_file.close()
 
    #go to next iteration
    start_date += delta
 
  # Sorting the final file by time and city
  with open(os.path.join(PARENT_DATASET_DIR,filename),newline='') as csvfile:
    spamreader = csv.DictReader(csvfile, delimiter=",")
#    sortedlist = sorted(spamreader, key=lambda row:(row['Time'],row['City']), reverse=False)
    sortedlist = sorted(spamreader, key=lambda row:(row['Time'],row['City']), reverse=False)
  with open(os.path.join(PARENT_DATASET_DIR,filename), 'w') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in sortedlist:
        writer.writerow(row)    
  #print(mean_value)
  #numpy.savetxt(PARENT_DATASET_DIR+'/'+IMAGE_REF_NAME+".csv", mean_value, delimiter=",")
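
The 'Reference value' column holds a list rendered as text, which itself contains commas. The sketch below shows, on a reduced and purely illustrative set of columns and values, that `csv.DictWriter` quotes such a field so that it is read back as a single value. A manually concatenated line would not do this.

```python
import csv
import io

# Reduced column set for illustration; the notebook uses the full fieldnames list.
fieldnames = ["City", "Time", "Measurement Value", "Reference value"]
row = {
    "City": "Rome",
    "Time": "2021-01-04T00:00:00",
    "Measurement Value": "42.123456",
    "Reference value": str([40.0, 5.5, 60.1, 20.3, 100.0]),
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())

# Reading the text back yields the list as one field, commas included.
buffer.seek(0)
read_back = list(csv.DictReader(buffer))
print(read_back[0]["Reference value"])
```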
 

## Now it is time to play with the results

First, try to understand how the generated output is structured, using a pandas DataFrame.

In [None]:
#!pip install matplotlib
import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_csv("Outputs/N1_tri/20220708T032639/N1_trilateral_20220708T032639.csv")
df

Now that we understand how the data is structured, we can plot the time series for selected fields and cities.

In [None]:
df = pd.read_csv("Outputs/N1_tri/20220708T032639/N1_trilateral_20220708T032639.csv", usecols=(3,10,11))
df

In [None]:
#plt.rcParams["figure.figsize"] = [7.00, 3.50]
#plt.rcParams["figure.autolayout"] = True
df_filtered = df.loc[df['City'] == 'Kuala Lumpur ']
df_filtered

In [None]:
fig = plt.figure(figsize=(20,10))
plt.title('Mean of NO2 timeseries data')
fig.autofmt_xdate()
plt.plot(df_filtered["Time"], df_filtered["Measurement Value"])
plt.show()