<a href="https://colab.research.google.com/github/mickymags/curriculum_development_initiative/blob/main/notebooks/Section_A1_Data_Cleaning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

In this notebook we check whether the spelling of countries and regions in a flood database matches the spelling used in the Food and Agriculture Organization's Global Administrative Unit Layers (GAUL) database.

# Step 1: Import packages & data; Mount Google Drive

In [None]:
import ee
import geemap
import numpy as np
import pandas as pd
from google.colab import drive
import os
drive.mount('/content/drive/')
ee.Authenticate()
ee.Initialize(project = 'servir-sco-assets')

Mounted at /content/drive/


In [None]:
#pwd

In [None]:
#cd drive/My Drive

In [None]:
#os.mkdir('Flood_Intercomparison')

In [None]:
#cd Flood_Intercomparison

In [None]:
#os.mkdir('Case_Studies')

In [None]:
#cd Case_Studies

In [None]:
# Import Vector Data
gaul_countries = ee.FeatureCollection("FAO/GAUL_SIMPLIFIED_500m/2015/level0")
gaul_provinces = ee.FeatureCollection("FAO/GAUL/2015/level1")
gaul_districts = ee.FeatureCollection("FAO/GAUL_SIMPLIFIED_500m/2015/level2")

* Step 1: Open the CSV file containing your flood information in a separate window.
* Step 2: Create three new columns named "adm0", "adm1", and "adm2". The names are case-sensitive and must be spelled exactly this way, because this CSV will be used as an input to Section A to see which data we have available.
* Step 3: For each flood event in your CSV file, work through Step 4 onward.
* Step 4: Enter the country of interest in the my_country variable below.

In [None]:
my_country = 'Guatemala'

In [None]:
gaul_country = gaul_countries.filter(ee.Filter.eq('ADM0_NAME', my_country))
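
The three columns from Step 2 can also be added programmatically rather than by hand. The following is a minimal sketch using only Python's standard library. The sample rows and the `event_id` and `location` column names are hypothetical stand-ins for whatever your flood CSV actually contains.

```python
import csv
import io

# Hypothetical flood CSV contents; replace with the text of your own file.
sample_csv = "event_id,location\n1,Guatemala\n2,Guatemala\n"

ADMIN_COLUMNS = ["adm0", "adm1", "adm2"]  # exact, case-sensitive names


def add_admin_columns(csv_text):
    """Return CSV text with empty adm0/adm1/adm2 columns appended if missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    for column in ADMIN_COLUMNS:
        if column not in fieldnames:
            fieldnames.append(column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Existing values are preserved; the new columns start out empty.
        writer.writerow({name: row.get(name, "") for name in fieldnames})
    return out.getvalue()


updated_csv = add_admin_columns(sample_csv)
print(updated_csv)
```

Because the function only appends columns that are missing, running it twice on the same text leaves the header unchanged.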

# Step 4

In [None]:
# An empty collection means no GAUL country matched the spelling in my_country
country_size = gaul_country.size().getInfo()

if country_size == 0:
  print("Oh no! Country name is NOT spelled correctly. Follow the steps below until this \ncell prints that the country name is spelled correctly.")
else:
  print("Yay! Country name is spelled correctly! Copy and paste the text you entered in the \nmy_country variable into the corresponding adm0 column in your CSV.")

Yay! Country name is spelled correctly! Copy and paste the text you entered in the 
my_country variable into the corresponding adm0 column in your CSV.


If the cell reported that your country name is spelled correctly, copy the country name into the adm0 column of the corresponding row in your CSV. Otherwise, follow the steps below and keep editing the spelling in the my_country variable until the check succeeds.

In [None]:
gaul_countries.first().get('ADM0_NAME').getInfo()

'Montenegro'

In [None]:
'''
num_countries = gaul_countries.size().getInfo()

country_names = []

for j in range(num_countries):
  if j < 10:
    print(j)
    # Get feature of interest
    my_feature = ee.Feature(gaul_countries.toList(num_countries).get(j))
    my_country_name = my_feature.get('ADM0_NAME').getInfo()
    print(my_country_name)
'''

"\nnum_countries = gaul_countries.size().getInfo()\n\ncountry_names = []\n\nfor j in range(num_countries):\n  if j < 10:\n    print(j)\n    # Get feature of interest\n    my_feature = ee.Feature(gaul_countries.toList(num_countries).get(j))\n    my_country_name = my_feature.get('ADM0_NAME').getInfo()\n    print(my_country_name)\n"

In [None]:
#country_names_sorted = sorted(country_names)  # list.sort() sorts in place and returns None
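
When the spelling check fails, one way to narrow down the GAUL spelling is to compare your entry against the list of GAUL country names with fuzzy matching. The sketch below uses `difflib` from the standard library. The short `gaul_country_names` list is an illustrative sample, not the full dataset. In the notebook, the full list could presumably be fetched in one request with `gaul_countries.aggregate_array('ADM0_NAME').getInfo()`. The helper name `suggest_spellings` is hypothetical.

```python
import difflib

# Illustrative sample only. In the notebook the full list could be fetched with
# gaul_countries.aggregate_array('ADM0_NAME').getInfo().
gaul_country_names = ["Guatemala", "Montenegro", "Honduras", "Belize", "Mexico"]


def suggest_spellings(name, candidates, max_suggestions=3):
    """Return up to max_suggestions candidates, most similar to `name` first."""
    return difflib.get_close_matches(name, candidates, n=max_suggestions, cutoff=0.6)


suggestions = suggest_spellings("Guatamala", gaul_country_names)
print(suggestions)
```

The first suggestion can then be pasted into my_country and checked again with the cell above.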

# Step 5

Your CSV file may also indicate which regions of this country flooded. The GAUL dataset has two levels of administrative boundaries below the country level:

* Level 1 boundaries are the largest regions of a country. You can think of these as provinces or states.
* Level 2 boundaries are the second largest regions of a country. You can think of these as districts or counties.

Your first task is to determine whether your locations are level 1 or level 2 boundaries. If they are level 1 boundaries, continue here. If they are level 2 boundaries, skip to Step 6.

In [None]:
gaul_lvl1_country = gaul_provinces.filter(ee.Filter.eq('ADM0_NAME', my_country))
my_province_pt1 = 'Petén'
gaul_lvl1_province = gaul_lvl1_country.filter(ee.Filter.eq('ADM1_NAME', my_province_pt1))

In [None]:
num_provinces = gaul_lvl1_country.size().getInfo()

In [None]:
gaul_lvl1_country.first().get('ADM1_NAME').getInfo()

'Guatemala'

In [None]:
# Fetch every province name in a single request instead of one getInfo() call per feature
for prov_name in gaul_lvl1_country.aggregate_array('ADM1_NAME').getInfo():
  print(prov_name)

Guatemala
El Progreso
Sacatepéquez
Chimaltenango
Escuintla
Santa Rosa
Sololá
Totonicapán
Quetzaltenango
Suchitepéquez
Retalhuleu
San Marcos
Huehuetenango
Quiché
Baja Verapaz
Alta Verapaz
Petén
Jalapa
Jutiapa
Izabal
Zacapa
Chiquimula
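
Several of the names printed above carry accents (for example Petén, Quiché, and Sololá), and the filters in this notebook match names exactly, so an entry such as "Peten" would return no features. The sketch below shows one way to recover the GAUL spelling from an unaccented or differently cased entry. The helper names `strip_accents` and `find_gaul_spelling` are hypothetical, and the list holds only a few of the names from the output above.

```python
import unicodedata

# A few province names copied from the output above.
gaul_province_names = ["Sololá", "Totonicapán", "Quiché", "Alta Verapaz", "Petén"]


def strip_accents(text):
    """Return `text` casefolded with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def find_gaul_spelling(name, candidates):
    """Return the candidate equal to `name` ignoring accents and case, else None."""
    target = strip_accents(name.strip())
    for candidate in candidates:
        if strip_accents(candidate) == target:
            return candidate
    return None


print(find_gaul_spelling("peten", gaul_province_names))
```

The returned string, accents included, is what belongs in the adm1 column.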


# Step 6

In [None]:
gaul_lvl2_country = gaul_districts.filter(ee.Filter.eq('ADM0_NAME', my_country))
gaul_lvl2_province = gaul_lvl2_country.filter(ee.Filter.eq('ADM1_NAME', my_province_pt1))

num_districts = gaul_lvl2_province.size().getInfo()
num_districts

12

In [None]:
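# Size the list with num_districts (not num_provinces) so that no district is dropped
district_list = gaul_lvl2_province.toList(num_districts)
for h in range(num_districts):
  district_of_int = ee.Feature(district_list.get(h))
  district_name = district_of_int.get('ADM2_NAME').getInfo()
  print(district_name)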

Melchor de Mencos
Flores
San José
San Andrés
La Libertad
San Benito
Santa Ana
Dolores
San Francisco
Sayaxché
Poptún
San Luis
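
With the level 1 and level 2 name lists both printed, deciding which level a location in your CSV belongs to becomes a membership check. The following is a minimal sketch under the assumption that the location is already spelled as GAUL spells it. The two lists hold a subset of the names printed in this notebook, and the helper name `admin_level` is hypothetical.

```python
# Subsets of the province and district names printed in this notebook.
level1_names = ["Guatemala", "El Progreso", "Alta Verapaz", "Petén", "Izabal"]
level2_names = ["Melchor de Mencos", "Flores", "San Benito", "Dolores", "Poptún"]


def admin_level(name, level1, level2):
    """Return 'adm1', 'adm2', 'both', or 'unknown' for a location name."""
    in_level1 = name in level1
    in_level2 = name in level2
    if in_level1 and in_level2:
        return "both"
    if in_level1:
        return "adm1"
    if in_level2:
        return "adm2"
    return "unknown"


for location in ["Petén", "Flores", "Peten"]:
    print(location, "->", admin_level(location, level1_names, level2_names))
```

An "unknown" result usually points to a spelling difference rather than a missing region, so it is worth rechecking the name against the printed lists.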


In [None]:
country_centroid = gaul_lvl1_country.geometry().centroid().getInfo()
lon = country_centroid["coordinates"][0]
lat = country_centroid["coordinates"][1]

In [None]:
# Districts of interest within the province, using the GAUL spellings printed above
district_names = [
  'Melchor de Mencos', 'Flores', 'San José', 'San Andrés', 'La Libertad',
  'San Benito', 'Santa Ana', 'Dolores', 'San Francisco',
]

In [None]:
Map = geemap.Map(center = (lat, lon), zoom = 6)
#Map.addLayer(gaul_lvl1_country)
# Add one named layer per district so each can be toggled in the layer control
for name in district_names:
  district = gaul_lvl2_province.filter(ee.Filter.eq('ADM2_NAME', name))
  Map.addLayer(district, {}, name)

Map.addLayerControl()
Map

Map(center=[15.692016609691956, -90.35759919897927], controls=(WidgetControl(options=['position', 'transparent…

In [None]:
adm2 = ['Melchor de Mencos,Flores,La Libertad,San Benito,Santa Ana,Dolores']
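
The last cell stores the selected districts as one comma-separated string. Assuming this is the format expected in the adm2 column (the notebook does not say so explicitly), the string can be built from a list so that each name is checked against the GAUL spellings first. The `gaul_district_names` list below is copied from the level 2 output earlier in this notebook. The names `selected` and `adm2_value` are hypothetical.

```python
# District names for Petén, copied from the level 2 output in this notebook.
gaul_district_names = [
    "Melchor de Mencos", "Flores", "San José", "San Andrés", "La Libertad",
    "San Benito", "Santa Ana", "Dolores", "San Francisco", "Sayaxché",
    "Poptún", "San Luis",
]

selected = [
    "Melchor de Mencos", "Flores", "La Libertad",
    "San Benito", "Santa Ana", "Dolores",
]

# Refuse to build the cell value if any name is not an exact GAUL spelling.
unknown = [name for name in selected if name not in gaul_district_names]
if unknown:
    raise ValueError(f"Not GAUL level 2 spellings: {unknown}")

# Join without spaces after the commas, matching the string in the cell above.
adm2_value = ",".join(selected)
print(adm2_value)
```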