# Convert XML files to H5 Files

This notebook provides a workflow for converting quality-controlled polar XML atmospheric river (AR) mask files from NERSC to NetCDF files. To get the data into the proper format, we did the following:
 - determine which files cover the North or South Pole and separate them into the respective arctic and antarctic directories
 - convert from xml to h5
 - generate nc files from the h5 files
 - update file attributes to include polar stereographic projection information

Scripts were originally created by Sol Kim and updated by Teagan King.

## Sort files

In [10]:
import os

In [2]:
# generate file lists for north and south specific files using glob.glob() on a directory at NERSC and copied here for convenience
# these files were originally located at /global/cfs/cdirs/ClimateNet/PolarQA/submitted2/tmq

tmq2_n_list = ["N-data-2000-12-27-02-03094874899.xml", "N-data-2001-01-15-02-0372525823.xml", "N-data-2001-02-14-02-0372525823.xml", "N-data-2001-02-20-02-02184581205.xml",
             "N-data-2001-03-12-02-02753605660.xml", "N-data-2001-03-12-02-0372525823.xml", "N-data-2001-03-17-02-0372525823.xml", "N-data-2001-03-17-02-03898750966.xml",
             "N-data-2001-03-25-02-03094874899.xml", "N-data-2001-05-26-02-0113675750.xml", "N-data-2001-05-31-02-03094874899.xml", "N-data-2001-06-30-02-01447631940.xml",
             "N-data-2001-06-30-02-0917268722.xml", "N-data-2001-07-14-02-0372525823.xml", "N-data-2001-07-16-02-02977264033.xml", "N-data-2001-07-23-02-02753605660.xml", 
             "N-data-2001-07-23-02-03898750966.xml", "N-data-2001-08-05-02-03898750966.xml", "N-data-2001-08-14-02-03145251737.xml", "N-data-2001-08-15-02-03898750966.xml", 
             "N-data-2001-08-21-02-0372525823.xml", "N-data-2001-08-25-02-02646146977.xml", "N-data-2001-08-25-02-02913429335.xml", "N-data-2001-08-25-02-03799364081.xml", 
             "N-data-2001-09-07-02-03898750966.xml", "N-data-2001-10-06-02-0113675750.xml", "N-data-2001-10-06-02-02184581205.xml", "N-data-2001-10-06-02-03898750966.xml", 
             "N-data-2001-11-04-02-02184581205.xml", "N-data-2001-11-08-02-03145251737.xml", "N-data-2001-11-12-02-02646146977.xml", "N-data-2001-11-22-02-03094874899.xml", 
             "N-data-2002-01-05-02-03094874899.xml", "N-data-2002-01-09-02-03856889053.xml", "N-data-2002-01-11-02-03898750966.xml", "N-data-2002-01-28-02-02977264033.xml", "N-data-2002-02-15-02-0372525823.xml", 
             "N-data-2002-02-17-02-0372525823.xml", "N-data-2002-03-25-02-01447631940.xml", "N-data-2002-03-25-02-02646146977.xml", "N-data-2002-05-11-02-02753605660.xml", 
             "N-data-2002-05-12-02-02184581205.xml", "N-data-2002-06-04-02-03016803755.xml", "N-data-2002-06-24-02-03898750966.xml", "N-data-2002-08-10-02-0113675750.xml",
             "N-data-2002-08-22-02-03094874899.xml"]

tmq2_s_list= ["S-data-2002-03-25-02-0470003927.xml", "S-data-2002-04-02-02-0505711942.xml", "S-data-2002-04-02-02-0939224074.xml",
             "S-data-2002-04-04-02-02942547425.xml", "S-data-2002-04-04-02-0324319733.xml", "S-data-2002-04-08-02-03426477074.xml", "S-data-2002-04-08-02-0704564821.xml",
             "S-data-2002-04-30-02-01835239565.xml", "S-data-2002-04-30-02-02648907035.xml", "S-data-2002-04-30-02-02814015909.xml", "S-data-2002-05-06-02-03923742650.xml",
             "S-data-2002-05-06-02-04074727173.xml", "S-data-2002-05-06-02-0630491900.xml", "S-data-2002-05-11-02-0939224074.xml", "S-data-2002-05-12-02-01905406187.xml",
             "S-data-2002-05-12-02-043085916.xml", "S-data-2002-05-12-02-0470003927.xml", "S-data-2002-05-12-02-0939224074.xml", "S-data-2002-05-17-02-01967369874.xml", 
             "S-data-2002-05-17-02-043085916.xml", "S-data-2002-05-22-02-01905406187.xml", "S-data-2002-05-22-02-02013637867.xml", "S-data-2002-06-04-02-02942547425.xml",
             "S-data-2002-06-05-02-0939224074.xml", "S-data-2002-06-24-02-0939224074.xml", "S-data-2002-07-04-02-0131147453.xml", "S-data-2002-07-04-02-03697330174.xml", 
             "S-data-2002-07-04-02-04074727173.xml", "S-data-2002-07-18-02-0514085223.xml", "S-data-2002-07-18-02-0939224074.xml", "S-data-2002-08-10-02-0732293059.xml",
             "S-data-2002-08-10-02-0939224074.xml", "S-data-2002-08-15-02-0505711942.xml", "S-data-2002-08-15-02-0939224074.xml", "S-data-2002-08-22-02-01835239565.xml",
             "S-data-2002-08-22-02-03697330174.xml", "S-data-2002-08-24-02-01344067532.xml", "S-data-2002-08-24-02-01990650995.xml", "S-data-2002-08-30-02-02431281742.xml",
             "S-data-2002-09-19-02-01835239565.xml", "S-data-2002-09-19-02-0704564821.xml", "S-data-2002-09-20-02-02942547425.xml", "S-data-2002-09-20-02-03426477074.xml",
             "S-data-2002-09-20-02-03923742650.xml", "S-data-2002-10-08-02-01512764679.xml", "S-data-2002-10-08-02-01835239565.xml", "S-data-2002-10-08-02-02942547425.xml",
             "S-data-2002-10-08-02-0939224074.xml", "S-data-2002-10-11-02-02942547425.xml", "S-data-2002-10-11-02-0796850682.xml", "S-data-2002-10-20-02-0939224074.xml",
             "S-data-2002-11-02-02-01835239565.xml", "S-data-2002-11-02-02-0324319733.xml", "S-data-2002-11-21-02-0470003927.xml", "S-data-2002-11-28-02-04074727173.xml",
             "S-data-2002-11-28-02-04207142763.xml", "S-data-2002-11-28-02-0470003927.xml", "S-data-2002-11-28-02-0514085223.xml", "S-data-2002-11-28-02-0732293059.xml",
             "S-data-2002-11-30-02-01425146659.xml", "S-data-2002-12-04-02-04074727173.xml", "S-data-2002-12-04-02-0937835384.xml", "S-data-2002-12-09-02-02648907035.xml",
             "S-data-2002-12-09-02-02942547425.xml", "S-data-2002-12-19-02-02431281742.xml", "S-data-2002-12-19-02-0732293059.xml", "S-data-2002-12-20-02-01506817224.xml",
             "S-data-2002-12-20-02-01940675572.xml", "S-data-2002-12-20-02-03923742650.xml", "S-data-2002-12-23-02-03024788634.xml", "S-data-2003-01-05-02-01878908234.xml",
             "S-data-2003-01-05-02-03426477074.xml", "S-data-2003-01-15-02-0796850682.xml", "S-data-2003-01-15-02-0939224074.xml", "S-data-2003-01-19-02-0732293059.xml",
             "S-data-2003-01-25-02-02030740844.xml", "S-data-2003-01-25-02-02521539565.xml", "S-data-2003-01-25-02-0470003927.xml", "S-data-2003-01-25-02-0732293059.xml", 
             "S-data-2003-01-30-02-0470003927.xml", "S-data-2003-01-30-02-0544895739.xml", "S-data-2003-02-05-02-01340552347.xml", "S-data-2003-02-05-02-0939224074.xml",
             "S-data-2003-02-09-02-0796850682.xml", "S-data-2003-02-14-02-01905406187.xml"]

In [13]:
# strip the "N-" prefix so the names match the local files, then move those files into the arctic directory by renaming

tmq2_n_list_no_prefix = [file[2:] for file in tmq2_n_list]

for file in tmq2_n_list_no_prefix:
    try:
        os.rename("/glade/u/home/tking/work/cgnet/QA_xml/qa2/tmq/{}".format(file), "/glade/u/home/tking/work/cgnet/QA_xml/qa2/tmq/arctic/{}".format(file))
    except OSError:
        # files that raise an error were not present in the source directory
        print("error with {}".format(file))
        continue

error with data-2000-12-27-02-03094874899.xml
error with data-2001-01-15-02-0372525823.xml
error with data-2001-02-14-02-0372525823.xml
error with data-2001-02-20-02-02184581205.xml
error with data-2001-03-12-02-02753605660.xml
error with data-2001-03-12-02-0372525823.xml
error with data-2001-03-17-02-0372525823.xml
error with data-2001-03-17-02-03898750966.xml
error with data-2001-03-25-02-03094874899.xml
error with data-2001-05-26-02-0113675750.xml
error with data-2001-05-31-02-03094874899.xml
error with data-2001-06-30-02-01447631940.xml
error with data-2001-06-30-02-0917268722.xml
error with data-2001-07-14-02-0372525823.xml
error with data-2001-07-16-02-02977264033.xml
error with data-2001-07-23-02-02753605660.xml
error with data-2001-07-23-02-03898750966.xml
error with data-2001-08-05-02-03898750966.xml
error with data-2001-08-14-02-03145251737.xml
error with data-2001-08-15-02-03898750966.xml
error with data-2001-08-21-02-0372525823.xml
error with data-2001-08-25-02-02646146977.

In [14]:
# strip the "S-" prefix so the names match the local files, then move those files into the antarctic directory by renaming

tmq2_s_list_no_prefix = [file[2:] for file in tmq2_s_list]

for file in tmq2_s_list_no_prefix:
    try:
        os.rename("/glade/u/home/tking/work/cgnet/QA_xml/qa2/tmq/{}".format(file), "/glade/u/home/tking/work/cgnet/QA_xml/qa2/tmq/antarctic/{}".format(file))
    except OSError:
        # files that raise an error were not present in the source directory
        print("error with {}".format(file))
        continue

error with data-2002-05-22-02-02013637867.xml
error with data-2002-06-24-02-0939224074.xml
error with data-2002-07-04-02-04074727173.xml
error with data-2002-08-10-02-0939224074.xml
error with data-2002-08-30-02-02431281742.xml
error with data-2002-12-19-02-0732293059.xml


In [15]:
# Used glob.glob() to compile list of all files for qa3
qa3_file_list = ['/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-11-02-02-02977264033.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-03-07-02-0505711942.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-21-02-01990650995.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-09-10-02-03898750966.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-11-05-02-01512764679.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-25-02-02942547425.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-06-21-02-02431281742.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-05-06-02-02913429335.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-02-17-02-02753605660.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-02-22-02-0989323075.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-04-01-02-03799364081.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-09-02-01990650995.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-10-02-02942547425.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-06-21-02-03875805809.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-03-06-02-02648907035.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-14-02-0131147453.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-07-02-0505711942.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-21-02-03693647189.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-02-17-02-03426477074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-06-02-01506817224.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-09-23-02-01835239565.xml', 
'/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-09-02-0470003927.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-12-20-02-01699530238.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-08-09-02-0113675750.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-06-21-02-0113675750.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-01-05-02-03145251737.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-09-02-03426477074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-09-23-02-0372525823.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-17-02-0939224074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-07-26-02-043085916.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-03-06-02-03697330174.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-17-02-02030740844.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-09-12-02-0732293059.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-05-25-02-03898750966.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-10-12-02-02753605660.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-09-12-02-01905406187.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-07-08-02-0939224074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-21-02-0505711942.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-08-30-02-02753605660.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-03-07-02-02942547425.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-12-20-02-03898750966.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-11-05-02-03898750966.xml', 
'/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-09-10-02-01905406187.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-22-02-01990650995.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-09-23-02-03898750966.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-08-24-02-0372525823.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-04-09-02-02753605660.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-02-05-02-0113675750.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-04-02-01340552347.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-09-10-02-0113675750.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-29-02-01905406187.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-04-01-02-03997603443.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-06-21-02-03898750966.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-22-02-02942547425.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-07-08-02-02633226745.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-17-02-03426477074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-12-20-02-02184581205.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-20-02-03426477074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-07-29-02-01506817224.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-07-02-0939224074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-07-08-02-0514085223.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-10-02-02633226745.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-01-02-01344067532.xml', 
'/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-03-03-02-02646146977.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-02-22-02-01835239565.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-08-24-02-0989323075.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-12-20-02-0989323075.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-30-02-03024788634.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-25-02-01512764679.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-08-20-02-02753605660.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-07-10-02-0732293059.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-04-21-02-02646146977.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-20-02-01506817224.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-28-02-0939224074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-07-02-0732293059.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-12-09-02-02184581205.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-04-02-0470003927.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-09-20-02-0372525823.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-02-17-02-0732293059.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-06-02-0796850682.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-09-02-02633226745.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-09-10-02-0939224074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-10-14-02-02753605660.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-10-11-02-0989323075.xml', 
'/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-20-02-01878908234.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-21-02-02030740844.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-01-30-02-01219130356.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-10-14-02-03094874899.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-21-02-089818669.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-10-07-02-02184581205.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-14-02-01967369874.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-01-02-01506817224.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-17-02-03923742650.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-12-20-02-01425146659.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-06-02-0939224074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-10-02-02651629372.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-09-12-02-01835239565.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-02-17-02-0505711942.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-24-02-01990650995.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-09-02-03697330174.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-06-02-02431281742.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-04-21-02-03898750966.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-10-08-02-03898750966.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-12-05-02-0131147453.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-14-02-01340552347.xml', 
'/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-02-09-02-02184581205.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-02-17-02-03898750966.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-21-02-0185954389.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-05-02-0514085223.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-20-02-03697330174.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-29-02-01835239565.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-25-02-01905406187.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-10-08-02-03799364081.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-01-02-02431281742.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-05-21-02-0470003927.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-08-24-02-02184581205.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2003-12-20-02-02753605660.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-07-10-02-01506817224.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-29-02-01344067532.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-17-02-01506817224.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-11-02-02-02913429335.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-07-26-02-0939224074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-12-05-02-01835239565.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-03-06-02-02814015909.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-06-02-03426477074.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-10-14-02-01675123447.xml', 
'/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-11-05-02-03697330174.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/N-data-2002-12-04-02-03898750966.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-08-03-02-02942547425.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-04-22-02-01835239565.xml', '/global/cfs/cdirs/ClimateNet/PolarQA/submitted3/tmq/S-data-2003-02-14-02-0939224074.xml']

In [23]:
# split QA3 files into N/S and move to respective dirs
qa3_n_files = []
qa3_s_files = []
for file in qa3_file_list:
    file_short = file.split('/')[-1]
    # drop the two-character "N-"/"S-" prefix to match the local filenames
    if file_short.startswith('N-data'):
        qa3_n_files.append(file_short[2:])
    elif file_short.startswith('S-data'):
        qa3_s_files.append(file_short[2:])

for file in qa3_n_files:
    try:
        os.rename("/glade/u/home/tking/work/cgnet/QA_xml/qa3/tmq/{}".format(file), "/glade/u/home/tking/work/cgnet/QA_xml/qa3/tmq/arctic/{}".format(file))
    except OSError:
        # files that raise an error were not present in the source directory
        print("error with N file {}".format(file))

for file in qa3_s_files:
    try:
        os.rename("/glade/u/home/tking/work/cgnet/QA_xml/qa3/tmq/{}".format(file), "/glade/u/home/tking/work/cgnet/QA_xml/qa3/tmq/antarctic/{}".format(file))
    except OSError:
        print("error with S file {}".format(file))

error with N file data-2003-05-06-02-02913429335.xml
error with N file data-2003-02-17-02-02753605660.xml
error with N file data-2003-02-22-02-0989323075.xml
error with N file data-2003-04-01-02-03799364081.xml
error with N file data-2003-08-09-02-0113675750.xml
error with N file data-2003-06-21-02-0113675750.xml
error with N file data-2003-09-23-02-0372525823.xml
error with N file data-2002-08-24-02-0372525823.xml
error with N file data-2003-04-09-02-02753605660.xml
error with N file data-2003-02-05-02-0113675750.xml
error with N file data-2003-03-03-02-02646146977.xml
error with N file data-2003-10-14-02-03094874899.xml
error with N file data-2003-02-17-02-03898750966.xml
error with N file data-2003-12-20-02-02753605660.xml
error with N file data-2002-12-04-02-03898750966.xml
error with N file data-2003-04-21-02-01990650995.xml
error with N file data-2003-08-09-02-01990650995.xml
error with N file data-2003-10-10-02-02942547425.xml
error with N file data-2003-05-06-02-01506817224.xml
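
The sorting cells above all follow the same pattern: take the `N-`/`S-` prefix from the NERSC name, strip it, and move the matching local file into `arctic/` or `antarctic/`. A minimal sketch of that pattern as one reusable function follows. The names `sort_by_pole` and `POLE_DIRS` and the returned list of missing files are illustrative, not part of the original scripts, and the sketch assumes the local copies carry no pole prefix, as in the cells above.

```python
import shutil
from pathlib import Path

# Hypothetical lookup: NERSC filename prefix -> local subdirectory.
POLE_DIRS = {"N-": "arctic", "S-": "antarctic"}

def sort_by_pole(nersc_names, local_dir):
    """Move local files into arctic/ or antarctic/ based on the NERSC prefix.

    nersc_names: names or full paths such as 'N-data-2001-01-15-02-0372525823.xml'
    local_dir:   directory holding the same files without the 'N-'/'S-' prefix
    Returns the local names that could not be found.
    """
    local_dir = Path(local_dir)
    missing = []
    for name in nersc_names:
        short = Path(name).name           # drop any leading directories
        subdir = POLE_DIRS.get(short[:2])
        if subdir is None:
            continue                      # not an N-/S- file; leave it alone
        local_name = short[2:]            # local copies have no pole prefix
        src = local_dir / local_name
        dst_dir = local_dir / subdir
        dst_dir.mkdir(exist_ok=True)
        if src.is_file():
            shutil.move(str(src), str(dst_dir / local_name))
        else:
            missing.append(local_name)
    return missing
```

Returning the missing names instead of printing them makes it easier to compare against the NERSC listing afterwards.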

#### Note: QA1 is all Antarctic, so there is no need to convert those files.

## Perform conversion

These scripts are all from Sol Kim.
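
The conversion code in this section reads the image size from `imagesize/nrows` and `imagesize/ncols`, then collects the `polygon/pt` points of every `object` whose `name` starts with the event label and whose `deleted` flag is 0. A minimal sketch of just that parsing step on an inline sample follows. The sample XML is hand-written from the tags the code reads (an assumption about the real files, which may carry more fields), and `polygon_points` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample built only from the tags the conversion code reads;
# real annotation files may contain additional elements.
SAMPLE_XML = """
<annotation>
  <imagesize><nrows>1152</nrows><ncols>1152</ncols></imagesize>
  <object>
    <name>ar_1</name><deleted>0</deleted>
    <polygon>
      <pt><x>10</x><y>20</y></pt>
      <pt><x>30</x><y>20</y></pt>
      <pt><x>30</x><y>40</y></pt>
    </polygon>
  </object>
  <object>
    <name>ar_2</name><deleted>1</deleted>
    <polygon><pt><x>1</x><y>1</y></pt></polygon>
  </object>
</annotation>
""".strip()

def polygon_points(root, event):
    """Return one [(x, y), ...] list per non-deleted object whose name starts with `event`."""
    polygons = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        deleted = int(obj.find("deleted").text)
        if not name.startswith(event) or deleted != 0:
            continue
        pts = [(int(pt.find("x").text), int(pt.find("y").text))
               for pt in obj.find("polygon").findall("pt")]
        polygons.append(pts)
    return polygons

root = ET.fromstring(SAMPLE_XML)
shape = (int(root.find("imagesize/nrows").text),
         int(root.find("imagesize/ncols").text))
print(shape)                       # (1152, 1152)
print(polygon_points(root, "ar"))  # [[(10, 20), (30, 20), (30, 40)]]
```

The deleted `ar_2` object is skipped, which is the same filter `get_polygons_from_XML` applies before rasterizing each polygon into the mask.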

In [24]:
import os
import shutil
import numpy as np
import h5py
import PIL.Image
import PIL.ImageDraw
import xml.etree.ElementTree as ET
# first, get the list of xml names from a specified directory
# then, map the xml filenames to h5 filenames and count how often each h5 filename occurs
# iterate through the h5 filenames, copying h5 files over to a new directory. If count > 1,
# the saved file gets a numeric suffix (see create_new_h5s)
# IMG_WIDTH = 1152

def fetch_xml_filenames(directory_path):
    result = []
    for file in os.listdir(directory_path):
        if file.endswith('.xml'):
            result.append(file)

    return result

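# drops the id off the xml name and rearranges the last two fields
# assumes xml_name in form data-year-month-day-run-timestep
# e.g. data-2002-09-20-02-02942547425.xml -> data-2002-09-20-00-2.h5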
def xml_to_h5_filename(xml_name):
    xml_split = xml_name.split('-')
    print("Original Split", xml_split)
    last_group = xml_split[-1]
    xml_split[-1] = last_group[0]
    xml_split = swap_chey(xml_split)
    print(xml_split)
    return '-'.join(xml_split) + '.h5'

def swap_chey(xml_split):
    old_last_group = xml_split[-1]
    xml_split[-1] = xml_split[-2][-1]
    xml_split[-2] = xml_split[-2][0] + old_last_group[0]
    return xml_split

# returns a map of h5 filename to count of labels for that h5 file
def h5_file_counts(xml_names):
    h5_counts = {}
    h5_names = map(xml_to_h5_filename, xml_names)
    for name in h5_names:
        if name in h5_counts:
            h5_counts[name] += 1
        else:
            h5_counts[name] = 1

    return h5_counts

TC_EVENT = 'tc'
AR_EVENT = 'ar'

def get_polygons_from_XML(XML, event):
    """
    Takes in an XML file and returns the union of the mask of all polygons present in the XML file.
    :param XML: the XML formatted file
    :param event: name of the event to get masks, either TC_EVENT or AR_EVENT
    :return: a mask of all polygons on the image
    """
    tree = ET.parse(XML)
    root = tree.getroot()

    # first, retrieve the dimensions of the image
    num_rows = int(root.find('imagesize/nrows').text)
    print('num_rows',num_rows)
    num_cols = int(root.find('imagesize/ncols').text)
    print('num_cols',num_cols)

    # create the result array
    result = np.zeros((num_rows, num_cols), dtype=np.uint8)

    # iterate through each polygon
    objects = root.findall('object')
    objects = filter(lambda obj: obj.find('name').text[:2] == event and int(obj.find('deleted').text) == 0, objects)
    for obj in objects:

        polygon = obj.find('polygon')
        polygon_points = polygon.findall('pt')
        points_list = []

        # Collect all x, y coordinate pairs into a list
        for point in polygon_points:
            x = int(point.find('x').text)
            y = int(point.find('y').text)

            points_list.append(x)
            points_list.append(y)

        # points_list is flat (x, y, x, y, ...), so this skips objects with at most one point
        if len(points_list) <= 2:
            continue
        # flood fill polygon points, store in mask
        temp = np.zeros((num_rows, num_cols), dtype=np.uint8)
        temp_mask = PIL.Image.fromarray(temp)
        PIL.ImageDraw.Draw(temp_mask).polygon(xy=points_list, outline=1, fill=1)
        mask = np.array(temp_mask, dtype=int)

        # combine with result array
        result = np.where(result > 0, result, mask)

    return result

def create_new_h5s(h5_directory, save_directory, xml_directory):
    # keep track of the current counts
    # iterate through the xml names, copying the corresponding h5 file
    # add to the map
    h5_counts = {}
    xml_names = fetch_xml_filenames(xml_directory)
    for name in xml_names:
        h5_name = xml_to_h5_filename(name)
        save_h5_name = h5_name

        if h5_name in h5_counts:
            # for duplicate h5 files
            path_split = os.path.splitext(h5_name)
            save_h5_name = path_split[0] + '_' + str(h5_counts[h5_name]) + path_split[1]
            h5_counts[h5_name] += 1
        else:
            h5_counts[h5_name] = 1
        print(h5_name)

        orig_path = os.path.join(h5_directory, h5_name)
        save_path = os.path.join(save_directory, save_h5_name)
        # copy the h5 file
        # shutil.copy2(orig_path, save_path)

        # append the new information to the h5
        xml_full_path = os.path.join(xml_directory, name)
        # tc_masks = get_polygons_from_XML(xml_full_path, TC_EVENT)
        ar_masks = get_polygons_from_XML(xml_full_path, AR_EVENT)

        # need to flip
        # tc_masks = np.flipud(tc_masks)
        ar_masks = np.flipud(ar_masks)

        # open in append mode; the with-block closes the file so the data is flushed to disk
        with h5py.File(save_path, 'a') as new_h5_handle:
            # new_h5_handle['tc_masks'] = tc_masks
            new_h5_handle['ar_masks'] = ar_masks


In [25]:
xml_dir = '/glade/work/tking/cgnet/QA_xml/qa2/tmq/antarctic'
save_dir = '/glade/work/tking/cgnet/QA_xml/h5/qa2/antarctic'
h5_dir = '/glade/work/tking/cgnet/QA_xml/h5/qa2/antarctic'

create_new_h5s(h5_dir, save_dir, xml_dir)

xml_dir = '/glade/work/tking/cgnet/QA_xml/qa2/tmq/arctic'
save_dir = '/glade/work/tking/cgnet/QA_xml/h5/qa2/arctic'
h5_dir = '/glade/work/tking/cgnet/QA_xml/h5/qa2/arctic'

create_new_h5s(h5_dir, save_dir, xml_dir)

xml_dir = '/glade/work/tking/cgnet/QA_xml/qa3/tmq/antarctic'
save_dir = '/glade/work/tking/cgnet/QA_xml/h5/qa3/antarctic'
h5_dir = '/glade/work/tking/cgnet/QA_xml/h5/qa3/antarctic'

create_new_h5s(h5_dir, save_dir, xml_dir)

xml_dir = '/glade/work/tking/cgnet/QA_xml/qa3/tmq/arctic'
save_dir = '/glade/work/tking/cgnet/QA_xml/h5/qa3/arctic'
h5_dir = '/glade/work/tking/cgnet/QA_xml/h5/qa3/arctic'

create_new_h5s(h5_dir, save_dir, xml_dir)

Original Split ['data', '2002', '09', '20', '02', '02942547425.xml']
['data', '2002', '09', '20', '00', '2']
data-2002-09-20-00-2.h5
num_rows 1152
num_cols 1152
Original Split ['data', '2002', '03', '25', '02', '0470003927.xml']
['data', '2002', '03', '25', '00', '2']
data-2002-03-25-00-2.h5
num_rows 1152
num_cols 1152
Original Split ['data', '2003', '01', '19', '02', '0732293059.xml']
['data', '2003', '01', '19', '00', '2']
data-2003-01-19-00-2.h5
num_rows 1152
num_cols 1152
Original Split ['data', '2002', '11', '28', '02', '0470003927.xml']
['data', '2002', '11', '28', '00', '2']
data-2002-11-28-00-2.h5
num_rows 1152
num_cols 1152
Original Split ['data', '2002', '10', '11', '02', '0796850682.xml']
['data', '2002', '10', '11', '00', '2']
data-2002-10-11-00-2.h5
num_rows 1152
num_cols 1152
Original Split ['data', '2002', '09', '19', '02', '0704564821.xml']
['data', '2002', '09', '19', '00', '2']
data-2002-09-19-00-2.h5
num_rows 1152
num_cols 1152
Original Split ['data', '2002', '08', '
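
The printed output above shows how each XML name is mapped to an h5 name: the trailing ID is dropped and the `02` field is rearranged into `00-2`. A compact sketch of that mapping, plus the numeric suffix `create_new_h5s` adds when several XML files map to the same h5 name, follows. The function names `h5_name_from_xml` and `unique_save_names` are illustrative; the behaviour is intended to mirror `xml_to_h5_filename` and `swap_chey` for names of the form shown in the output.

```python
import os

def h5_name_from_xml(xml_name):
    """Mirror xml_to_h5_filename + swap_chey for 'data-YYYY-MM-DD-RR-<id>.xml'."""
    *date_parts, run, id_part = xml_name.split("-")
    # second-to-last field: first char of the run field + first char of the id
    # last field: last char of the run field
    return "-".join(date_parts + [run[0] + id_part[0], run[-1]]) + ".h5"

def unique_save_names(xml_names):
    """Yield (xml_name, save_name); repeats of an h5 name get _1, _2, ... suffixes."""
    counts = {}
    for xml_name in xml_names:
        h5_name = h5_name_from_xml(xml_name)
        seen = counts.get(h5_name, 0)
        stem, ext = os.path.splitext(h5_name)
        save_name = h5_name if seen == 0 else f"{stem}_{seen}{ext}"
        counts[h5_name] = seen + 1
        yield xml_name, save_name

names = [
    "data-2002-09-20-02-02942547425.xml",
    "data-2002-09-20-02-03426477074.xml",  # same date, different id
]
for xml_name, save_name in unique_save_names(names):
    print(xml_name, "->", save_name)
```

Because several annotators can label the same time step, the suffix keeps every mask instead of letting later files overwrite earlier ones.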

### Convert h5 to netcdfs by running the following commands in each subdirectory:


In [27]:
# 'mkdir netcdfs'  
# 'module load nco'  
# 'for file in *.h5; do nccopy "$file" "netcdfs/${file%.h5}.nc"; done'   

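The first line above makes the directory where the .nc files will go  
The second line loads the nco module (`nccopy` itself ships with the netCDF utilities, so a netcdf module may also be needed on some systems)  
The third line converts each h5 file in the directory to .nc and writes the result into the netcdfs directory  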
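
For reference, the same loop can be driven from Python. The sketch below builds one `nccopy` command per h5 file and only runs them when `dry_run=False`; the function name `nccopy_commands` and the `dry_run` flag are illustrative. It matches `*.h5` explicitly so that the `netcdfs` directory itself is never passed to `nccopy`.

```python
import subprocess
from pathlib import Path

def nccopy_commands(directory, out_subdir="netcdfs", dry_run=True):
    """Build (and optionally run) one `nccopy in.h5 netcdfs/in.nc` per h5 file."""
    directory = Path(directory)
    out_dir = directory / out_subdir
    commands = []
    for h5_path in sorted(directory.glob("*.h5")):
        nc_path = out_dir / (h5_path.stem + ".nc")
        commands.append(["nccopy", str(h5_path), str(nc_path)])
    if not dry_run:
        out_dir.mkdir(exist_ok=True)
        for cmd in commands:
            # requires the nccopy utility to be on PATH
            subprocess.run(cmd, check=True)
    return commands
```

Inspecting the returned list with `dry_run=True` first is a cheap way to confirm the output names before any files are written.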

If you want, you can combine all the netcdfs into one file with:  

In [29]:
# 'module load cdo'  
# 'cdo cat *.nc combined_file.nc'  

The first line above loads cdo  
The second line concatenates all the nc files into one.

### Add projection description:
South Pole Stereographic, bounding lat = 20S  
North Pole Stereographic, bounding lat = 35N  

ncatted -O -a projection,global,c,c,"South Pole Stereographic, bounding lat = 20S" combined_file.nc combined_file_s.nc  

ncatted -O -a projection,global,c,c,"North Pole Stereographic, bounding lat = 35N" combined_file.nc combined_file_n.nc
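
The two commands differ only in the attribute text and the output name, so they can be generated from a small lookup table. A minimal sketch follows, assuming each pole's `combined_file.nc` sits in its own directory as in the steps above; the names `PROJECTIONS` and `ncatted_command` are illustrative.

```python
from pathlib import Path

# Projection descriptions copied from the commands above.
PROJECTIONS = {
    "s": "South Pole Stereographic, bounding lat = 20S",
    "n": "North Pole Stereographic, bounding lat = 35N",
}

def ncatted_command(pole, in_file="combined_file.nc"):
    """Return the ncatted argv that writes the global `projection` attribute."""
    p = Path(in_file)
    out_file = str(p.with_name(f"{p.stem}_{pole}{p.suffix}"))
    # -O overwrites the output file; -a takes name,variable,mode,type,value
    # mode 'c' creates the attribute, type 'c' stores it as characters
    return ["ncatted", "-O", "-a",
            f"projection,global,c,c,{PROJECTIONS[pole]}",
            in_file, out_file]

print(" ".join(ncatted_command("s")))
```

The returned list can be passed to `subprocess.run(..., check=True)` on a system where NCO is loaded, and `ncdump -h` on the output file shows whether the attribute was written.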