# This Notebook explores the SCAR GeoMAP dataset released in 2019
## Cox S.C., Smith Lyttle B. and the GeoMAP team (2019). Lower Hutt, New Zealand. GNS Science. Release v.201907.
### [Data Available Here](https://data.gns.cri.nz/ata_geomap/index.html?content=/mapservice/Content/antarctica/www/index.html)
### Notebook by Sam Elkind

First, I'll look at the data in terms of polygon counts, examining the data schema and the frequency of values within specific fields. The aim is to find inconsistencies in the data attribution, which may also prompt some discussion about relationships between columns.

Next, I'll look at the data in terms of polygon area and data attribution. How much surface water has been mapped? How much till has been mapped? How much outcropping rock is of Jurassic age?

### Configure packages, paths, and load data

In [1]:
import os
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display
import pprint as pp
from tabulate import tabulate

In [2]:
def plot_value_counts(field_name, values_to_plot, counts, counts_norm):
    fig, ax = plt.subplots(2, 1, figsize=(30,15))
    fig.tight_layout(pad=2.0)
    fig.subplots_adjust(top=.94)
    fig.suptitle(f"Frequency of {field_name} values", size=18)

    ax[0].set_title(field_name)
    ax[1].set_title(f"{field_name} normalized")
    for i, v in enumerate(counts[:values_to_plot]):
        ax[0].text(i - .5, v, str(v), color='black', fontweight='bold')
    for i, v in enumerate(counts_norm[:values_to_plot]):
        ax[1].text(i - .5, v, f"{v * 100:.1f}%", color='black', fontweight='bold')
    ax[0].bar(counts.index[:values_to_plot], counts[:values_to_plot])
    ax[1].bar(counts_norm.index[:values_to_plot], counts_norm[:values_to_plot])

In [3]:
geol_path = f"{os.getcwd()}/data/ATA_SCAR_GeoMAP_geology.gdb"
print(geol_path)

/home/gt/geomap/data/ATA_SCAR_GeoMAP_geology.gdb


In [4]:
data = gpd.read_file(geol_path)

## Let's start by looking at the number of unique values in the NAME and DESCR fields

In [5]:
display(data[["NAME", "DESCR"]].nunique())

NAME     666
DESCR    757
dtype: int64

## There are more descriptions than names, which seems odd: I would expect a 1-1 relationship between these fields. Perhaps complexes with varied lithology were given the same NAME value but different, more granular descriptions.
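
One way to test the 1-1 expectation directly is to count the distinct DESCR values per NAME. The sketch below is only illustrative: the `toy` frame and its values are made up, and the only thing assumed about the real table is that the columns are called `NAME` and `DESCR`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the GeoMAP attribute table.
toy = pd.DataFrame({
    "NAME": ["Unit A", "Unit A", "Unit B", None, None],
    "DESCR": ["basalt", "basanite", "till", "older till", "younger till"],
})

# Number of distinct descriptions for each name.
# groupby drops rows whose NAME is null by default.
descr_per_name = toy.groupby("NAME")["DESCR"].nunique()

# Names that break the expected 1-1 relationship.
multi_descr_names = descr_per_name[descr_per_name > 1]
print(multi_descr_names)
```

Rows with a null NAME drop out of the grouping, so descriptions attached to unnamed polygons add to the DESCR count without adding to the NAME count. That may account for part of the gap between the two totals.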

### Let's take a look at the unique pairs of values that occur.

In [6]:
unique_pairs = data[["NAME", "DESCR", "SOURCE"]].drop_duplicates(["NAME", "DESCR"])
unique_pairs["pair_id"] = range(len(unique_pairs.index))

In [7]:
display(unique_pairs)

Unnamed: 0,NAME,DESCR,SOURCE,pair_id
0,marine sedimentary and metasedimentary rocks (...,unfossiliferous low grade regional metamorphic...,Fleming & Thomson 1979_Graham Land,0
3,intermediate intrusive rocks (early Jurassic t...,intermediate intrusive rocks (early Jurassic t...,Fleming & Thomson 1979_Graham Land,1
5,Paleozoic-Triassic metamorphic rock,regionally metamorphosed rocks ranging from Pa...,Fleming & Thomson 1979_Graham Land,2
7,sedimentary rocks (Paleozic to mid-Jurassic),inferred sedimentary rocks and low-grade meta...,Burton-Johnson & Riley 2015,3
10,Antarctic Peninsula Volcanic Group,"calc-alkaline volcanic suite, lava flows predo...",Fleming & Thomson 1979_Graham Land,4
...,...,...,...,...
94142,Shaw-Clemence Complex,"aluminous gneisses, quartz feldspathic gneisse...","Mikhalsky et al. 2018, Geology of Mac Robertso...",797
94939,,younger till,Ishikawa et al. 2000. Geological map of Mount ...,798
94940,,older till,Ishikawa et al. 2000. Geological map of Mount ...,799
95112,,Orthopyroxene-biotite-quartz-plagioclase gneis...,Sheraton 1985. Geology of Enderby Land and Wes...,800


#### It looks like many names have different descriptions. Let's see how many pairs have a missing value ("None") in the NAME column

In [8]:
null_names = unique_pairs[(unique_pairs["NAME"].isnull()) | (unique_pairs["NAME"] == " ") | (unique_pairs["NAME"] == "")]

In [9]:
display(null_names)

Unnamed: 0,NAME,DESCR,SOURCE,pair_id
222,,regionally metamorphosed rocks ranging from Ar...,Thomson & Harris 1979_Southern Graham Land,20
61780,,"Gabbro-diorite and melamonzogranite, coeval wi...",Pertusati et al. 2012,468
85559,,Orthopyroxene-quartz-feldspar gneiss (tonaliti...,Sheraton 1985. Geology of Enderby Land and Wes...,670
85560,,Layerd biotite-garnet-quartz-feldspar gneiss; ...,"AGSO, Bedrock Geology of the Bunger Hills-Denm...",671
85562,,Hornblende-clinopyroxene-orthopyroxene quartz ...,"AGSO, Bedrock Geology of the Bunger Hills-Denm...",672
...,...,...,...,...
93976,,Bt and Hb-Bt granite plutons,"Mikhalsky etal 2001, Prince Charles Mountains",795
94939,,younger till,Ishikawa et al. 2000. Geological map of Mount ...,798
94940,,older till,Ishikawa et al. 2000. Geological map of Mount ...,799
95112,,Orthopyroxene-biotite-quartz-plagioclase gneis...,Sheraton 1985. Geology of Enderby Land and Wes...,800


In [10]:
print(f"{null_names.shape[0]} unique pairs have a NAME without a value, but a description with a value")
print(f"{data[data['NAME'].isnull()].shape[0]} polygons have a NAME without a value. Let's get a list of the unique sources for these polygons so we can check them if needed")

65 unique pairs have a NAME without a value, but a description with a value
5119 polygons have a NAME without a value. Let's get a list of the unique sources for these polygons so we can check them if needed


In [11]:
null_name_sources = data[(data['NAME'].isnull()) | (data["NAME"] == " ") | (data["NAME"] == "")][["SOURCECODE", "MAPSYMBOL", "NAME", "SOURCE", "DESCR"]]

In [12]:
display(null_name_sources.drop_duplicates(["SOURCECODE","SOURCE"]))

Unnamed: 0,SOURCECODE,MAPSYMBOL,NAME,SOURCE,DESCR
222,m,?n,,Thomson & Harris 1979_Southern Graham Land,regionally metamorphosed rocks ranging from Ar...
14398,m,?n,,Thomson et al. 1982 North Palmer Land,regionally metamorphosed rocks ranging from Ar...
20274,m,?n,,Burton-Johnson & Riley 2015,regionally metamorphosed rocks ranging from Ar...
61780,GHgra,EOd,,Pertusati et al. 2012,"Gabbro-diorite and melamonzogranite, coeval wi..."
85559,Pp,Rzn,,Sheraton 1985. Geology of Enderby Land and Wes...,Orthopyroxene-quartz-feldspar gneiss (tonaliti...
...,...,...,...,...,...
93976,AR-PPg1,ALg,,"Mikhalsky etal 2001, Prince Charles Mountains",Bt and Hb-Bt granite plutons
94939,Ty,Czs,,Ishikawa et al. 2000. Geological map of Mount ...,younger till
94940,To,Czs,,Ishikawa et al. 2000. Geological map of Mount ...,older till
95112,Ppp,Rzn,,Sheraton 1985. Geology of Enderby Land and Wes...,Orthopyroxene-biotite-quartz-plagioclase gneis...


### A significant number of polys have no name but have a source code. Do all of these source codes lack a name?
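
A possible way to answer this question is to count, for each SOURCECODE, how many rows have a usable NAME and how many do not. The sketch below uses a made-up `toy` table (the values are hypothetical; only the column names follow the dataset) and treats null, empty, and whitespace-only names as missing, as in the masks above.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the dataset, values are made up.
toy = pd.DataFrame({
    "SOURCECODE": ["m", "m", "Ty", "Qt", "Qt"],
    "NAME": [None, "metamorphic rocks", None, "undifferentiated till", " "],
})

# Treat null, empty, and whitespace-only names as missing.
name_missing = toy["NAME"].fillna("").str.strip() == ""

# For each source code, count all rows and the rows without a usable name.
summary = (
    toy.assign(name_missing=name_missing)
    .groupby("SOURCECODE")["name_missing"]
    .agg(n_rows="size", n_missing="sum")
)

# Codes that never carry a name versus codes that are only sometimes unnamed.
always_unnamed = summary[summary["n_missing"] == summary["n_rows"]].index.tolist()
sometimes_unnamed = summary[
    (summary["n_missing"] > 0) & (summary["n_missing"] < summary["n_rows"])
].index.tolist()
print(always_unnamed, sometimes_unnamed)
```

Codes in `sometimes_unnamed` would be the interesting ones to check, since a name used elsewhere for the same code could be a candidate for filling the gap.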

## Let's move on to see which names have multiple descriptions

In [13]:
# Build a list of (name, table) pairs for every NAME that has more than one distinct DESCR.
# The list is sorted by the number of different descriptions, largest first.
name_descr_sets = sorted(
    [
        (i, data[data["NAME"] == i][["SOURCECODE", "NAME", "DESCR", "SOURCE"]].drop_duplicates(["DESCR"]).sort_values(["DESCR"]))
        for i in data["NAME"].unique()
        if len(data[data["NAME"] == i]["DESCR"].unique()) > 1
    ],
    key=lambda x: len(x[1]),
    reverse=True,
)

In [14]:
print(f"{len(name_descr_sets)} names have more than one description")
for i, j in name_descr_sets:
    display(j)

35 names have more than one description


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
41977,Pb_LeM8b2,Marie Byrd Land Volcanics: basalt,Alkali basalt & hawaiite,LeMasurier 2013 (Fig.8B)
42233,Pb_LeM2c1,Marie Byrd Land Volcanics: basalt,Basalt flows and basaltic hydrovolcanic rocks,LeMasurier 2013 (Fig.2C)
42094,Pb_LM84a,Marie Byrd Land Volcanics: basalt,Basalt tuff cone,LeMasurier 1984
41839,Pb_Hart97a,Marie Byrd Land Volcanics: basalt,"Basalt, basaltic hyaloclastite, cinder cone, t...",Hart et al. 1997
42600,Pb_LMB16B.3a,Marie Byrd Land Volcanics: basalt,"Basalt, basaltic pyroclastics",LeMasurier & Thomson 1990 (Fig B.16B.3)
42536,Pb_LMB16D.1_ins,Marie Byrd Land Volcanics: basalt,"Basalt, hawaiite, basaltic hyaloclastite",LeMasurier & Thomson 1990 (Fig B.16D.1)_Inset
41857,Pb_LMB16D.1b,Marie Byrd Land Volcanics: basalt,Basaltic hyaloclastite,LeMasurier & Thomson 1990 (Fig B.16D.1)
41776,Pb_LMB14.2a,Marie Byrd Land Volcanics: basalt,Basanite,LeMasurier & Thomson 1990 (Fig B.14.2)
42164,Pb_LMB16A.1a,Marie Byrd Land Volcanics: basalt,Basanite flows and pyroclastics,LeMasurier & Thomson 1990 (Fig B.16.1a)
41783,Pb_LMC9.1a,Marie Byrd Land Volcanics: basalt,"Basanite, hawaiite, tephrite",LeMasurier & Thomson 1990 (Fig C.9.1)


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
61211,Mev_WoF1a,Melbourne volcanic province,"Alkali basalt, basanite, hawaiite",Wörner et al. 1989 Fig1
73799,Mev_WVF14a,Melbourne volcanic province,"Alkali basalt, basanite, tephrite",Wörner & Viereck 1989 Fig14
63621,Mev_LMA61c,Melbourne volcanic province,Basanite,LeMasurier & Thomson 1990 (Fig A.6.1)
64114,Mev_KF912c,Melbourne volcanic province,"Basanite, hawaiite",Kyle 1982
62002,Mev_WoF1c,Melbourne volcanic province,"Mugearite, benmoreite, trachyandesite",Wörner et al. 1989 Fig1
71469,Mev_KF912c,Melbourne volcanic province,Peralkaline rhyolite,LeMasurier & Thomson 1990 (Fig A.6.1)
60905,Mev_LMA61b,Melbourne volcanic province,"Peralkaline trachyte, quartz-trachyte, peralka...",LeMasurier & Thomson 1990 (Fig A.6.1)
61140,Mev_KF912b,Melbourne volcanic province,Phonolite,Kyle 1982
62049,Mev_WoF1b,Melbourne volcanic province,Trachyte,Wörner et al. 1989 Fig1
63340,Mev_KF912a,Melbourne volcanic province,Trachyte with tristanite and trachyandesite,Kyle 1982


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
57396,ggk,late granitoid,Fine equigranular hornblende clinopyroxene gra...,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
55116,ggr,late granitoid,Fine homogeneous equigranular leucocratic biot...,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
55342,gq,late granitoid,Hornblende-biotite-alkali feldspar quartz monz...,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
54573,gg,late granitoid,"Late-stage granitoids, homogeneous and massive...","Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
57029,ggm,late granitoid,Porphyritic hornblende biotite quartz monzodio...,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
44248,gg,late granitoid,late-stage unfoliated muscovite-biotite granit...,Goodge et al. 1993


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
63470,Hv_Cl301_3,Hallett volcanic province,Mugearite,"Hornig & Wörner, 2003"
60975,Hv_LMA31a,Hallett volcanic province,"Predominantly basanite, basalt and hawaiite",LeMasurier & Thomson 1990 (Fig A.3.1)
63426,Hv_LMA41b,Hallett volcanic province,"Predominantly mugearite, benmoreite, and trach...",LeMasurier & Thomson 1990 (Fig A.4.1)
61040,Hv_HaC14a,Hallett volcanic province,Trachyte,"Hamilton, 1972 (C-14)"
63359,Hv_Nar1a,Hallett volcanic province,"basanite, hawaiite, mugearite",Nardini et al. 2003


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
43686,Qti2,older ice sheet margin till,"Bouldery sandy till, locally matrix-rich and w...",Grindley & Laird 1969
43875,Qti3,older ice sheet margin till,"Poorly sorted bouldery sandy till, slightly we...",Joy et al. 2014
40463,Qti,older ice sheet margin till,"Till in moraines on margins of ice sheets, ice...",GeoMAP
43684,Qti,older ice sheet margin till,"Till in moraines on margins of ice sheets, ice...",Grindley & Laird 1969


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
41781,Pb_LeM033c,Marie Byrd Land Volcanics: trachyte,Trachyte,LeMasurier 2013 (Fig.3)
41815,Pb_LeM3b2,Marie Byrd Land Volcanics: trachyte,"Trachyte & associated comendite, benmoreite",LeMasurier 2013 (Fig.3B)
42255,Pb_LeM2c2,Marie Byrd Land Volcanics: trachyte,Trachyte and benmoreite flows,LeMasurier 2013 (Fig.2C)
41834,Pb_LMB14.2b,Marie Byrd Land Volcanics: trachyte,"Trachyte, trachytic tuffs, pantellerite",LeMasurier & Thomson 1990 (Fig B.14.2)


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
36960,G6a,Granitoid intrusions,Calc alkaline Trendall Crag granodiorite,Curtis 2011a
35923,G6b,Granitoid intrusions,"Cooper Island granophyre, cogenetic with thole...",Curtis 2011a
35808,G6c,Granitoid intrusions,"Variable composition including granodiorite, t...",Curtis 2011a


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
43662,Htl,Holocene local glacier till,Locally derived variably weathered till in mor...,Grindley & Laird 1969
40124,Htl,Holocene local glacier till,Locally-derived variably weathered till in mor...,GeoMAP
40788,Htl,Holocene local glacier till,Young locally derived glacial till in moraines...,GeoMAP


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
54441,g,undifferentiated granitoid,"Undifferentiated intrusions of gabbro, diorite...","Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
43072,COg,undifferentiated granitoid,"granodiorite and monzogranite, with granite, q...",Grindley & Laird 1969
43627,g,undifferentiated granitoid,massive to porphyritic granodiorites to monzog...,Skinner et al. 1976


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
55687,ff,Ferrar Dolerite,Massive and layered sills of dolerite up to at...,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
43406,ff,Ferrar Dolerite,Massive and layered sills of dolerite; dolerit...,Haskell et al. 1965b
60912,Fd,Ferrar Dolerite,"Tholeiitic dolerite sills and minor dykes, usu...",Carosi et al. 2012


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
45427,fk,Kirkpatrick Basalt,Tholeiitic basalt pillow lavas and lava flows ...,Barrett & Elliot 1973
60979,Jk,Kirkpatrick Basalt,"Tholeiitic subaerial lavas, a few metres up to...",Stump 1989
60932,Kb,Kirkpatrick Basalt,"Tholeitic subaerial lavas, a few metres up to ...",Capponi et al. 1999a


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
86279,MP2-3//NP11bv,Beaver Complex,Beaver Complex: undivided (1400-1020 Ma//1000-...,"Mikhalsky et al. 2018, Geology of Mac Robertso..."
86291,MP2-3//NP11bv2,Beaver Complex,predominantly Opx-Am-Bt tonalitic and granitic...,"Mikhalsky et al. 2018, Geology of Mac Robertso..."
86192,MP2-3//NP11bv1,Beaver Complex,"predominantly paragneiss and mica schists, min...","Mikhalsky et al. 2018, Geology of Mac Robertso..."


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
39753,Qtl,older local glacial till,Locally derived glacial till in moraines and r...,GeoMAP
43670,Qtl,older local glacial till,Locally derived glacial till in moraines and r...,Grindley & Laird 1969


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
43645,unk,unknown,"Unvisited exposed rock or cover deposits, natu...",Gunn & Warren 1962
40022,?,unknown,area known to be outcropping rock or cover dep...,GeoMAP


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
40797,Qc,colluvium and scree,Angular material forming talus: loose glacial ...,GeoMAP
43852,Quc,colluvium and scree,"Angular talus aprons, loose glacial material, ...",Grindley & Laird 1969


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
41367,Qt,undifferentiated till,Undifferentiated bouldery sandy till of uncert...,GeoMAP
46963,Qt,undifferentiated till,Undifferentiated till in moraines of uncertain...,GeoMAP


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
41522,Dvd_w,Ruppert Coast metavolcanics,Andesite dacite rhyolite tuff lava agglomerate...,Grindley et al. 1980
41861,Dvd_e,Ruppert Coast metavolcanics,"Andesite, dacite, rhyolite tuff, lava agglomer...",Wade 1969


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
42472,Pb_Pre90a,Marie Byrd Land Volcanics: undifferentiated,"Basanite, trachyandesite, trachyte",Prestvik et al. 1990
41775,Pb_LeM4b3,Marie Byrd Land Volcanics: undifferentiated,Marie Byrd Land Volcanics: undifferentiated,LeMasurier 2013 (Fig.4B)


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
42152,Pb_LeM7b2,Marie Byrd Land Volcanics: mugearite,Mugearite,LeMasurier 2013 (Fig.7B)
41782,Pb_LMB16D.1a,Marie Byrd Land Volcanics: mugearite,"Mugearite, benmoreite hyaloclastite",LeMasurier & Thomson 1990 (Fig B.16D.1)


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
41793,Sgd,Kohler Range granitoids,deformed and foliated granitoids,Pankhurst et al. 1998
41895,Ygd,Kohler Range granitoids,"diorite, granodiorite, monzogranite",Pankhurst et al. 1998


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
41836,Pb_LMB14.2e,Marie Byrd Land Volcanics: phonolite,Phonolite,LeMasurier & Thomson 1990 (Fig B.14.2)
42025,Pb_Pan4c,Marie Byrd Land Volcanics: phonolite,Phonolite and tephriphonolite,Panter et al. 1994


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
41837,Pb_LMB14.2c,Marie Byrd Land Volcanics: benmoreite,Benmoreite,LeMasurier & Thomson 1990 (Fig B.14.2)
42046,Pb_Pan4d,Marie Byrd Land Volcanics: benmoreite,Benmoreite and mugearite,Panter et al. 1994


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
59633,gfk,Kehle Pluton,Unfoliated homogeneous medium biotite granite...,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
43642,gfk,Kehle Pluton,unfoliated homogeneous medium biotite granite...,Richardson 2002


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
61770,Sf,Section Peak Formation,"Mainly fluviatile, cross-bedded, coarse- to me...",Pertusati et al. 2012
61267,JTfs,Section Peak Formation,Section Peak Formation and Ferrar Dolerite,Stump 1989


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
75777,Qk,lake deposits,"Loose lacustrine sand, silt, mud.",GeoMAP
73018,Qk,lake deposits,"Loose lacustrine sand, silt, mud; locally mixe...",Mayewski et al. 1979a


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
77020,1041,"Marble, calc-silicate and skarn","Marble, calc-silicate and skarn occur as layer...",Norwegian Polar Institute
81058,1083,"Marble, calc-silicate and skarn","Marble, dolomitic marble, calc-silicate rocks ...",Norwegian Polar Institute


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
80168,1089,Amphibolite,"Amphibolite, pre-dating all recognizable defor...",Norwegian Polar Institute
79090,1039,Amphibolite,Dark-coloured layers and lenses of amphibolite...,Norwegian Polar Institute


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
81569,1056,Hornblende gneiss,Hornblende gneiss in concordant hornblende-ric...,Norwegian Polar Institute
80593,1078,Hornblende gneiss,"Hornblende gneisses, commonly biotite. Interca...",Norwegian Polar Institute


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
80654,1075,Garnet-biotite gneiss,Garnet-biotite gneisses containing sillimanite...,Norwegian Polar Institute
83956,1046,Garnet-biotite gneiss,"Medium- to coarse-grained, granoblastic to lep...",Norwegian Polar Institute


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
82951,1077,Biotite-hornblende gneiss,"Biotite-hornblende gneisses, locally +/- garne...",Norwegian Polar Institute
83434,1055,Biotite-hornblende gneiss,"Well-layered biotite-hornblende gneiss, layers...",Norwegian Polar Institute


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
84866,1028,Granodiorite-diorite,"Melanocratic to mesocratic, fine- to medium-gr...",Norwegian Polar Institute
84069,1031,Granodiorite-diorite,"Melanocratic to mesocratic, fine- to medium-gr...",Norwegian Polar Institute


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
85931,MP3-NP1//NP12pr2,Prydz Complex,"Am-Bt, Opx-Am-Bt granitic-gneisses, mafic gran...","Mikhalsky et al. 2018, Geology of Mac Robertso..."
86293,MP3-NP1//NP12pr1,Prydz Complex,"aluminous gneiss and schists, migmatites (1100...","Mikhalsky et al. 2018, Geology of Mac Robertso..."


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
87672,PRP2,Lambert series (in part),migmatitic Bt and Gt-Bt-bearing granitic and t...,"Mikhalsky etal 2001, Prince Charles Mountains"
87673,PRP1,Lambert series (in part),migmatitic paragneiss and schist; minor marble...,"Mikhalsky etal 2001, Prince Charles Mountains"


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
90589,MP3-NP1//Egr2,Grove Complex,"Opx-Am-Bt orthogneisses, mafic granulites, ana...","Mikhalsky et al. 2018, Geology of Mac Robertso..."
88156,MP3-NP1//Egr1,Grove Complex,"aluminous gneisses, quartzites (1100<950 Ma//5...","Mikhalsky et al. 2018, Geology of Mac Robertso..."


Unnamed: 0,SOURCECODE,NAME,DESCR,SOURCE
94141,MP3-NP1//NP12sc2,Shaw-Clemence Complex,"Am-Bt, Opx-Am-Bt granite gneisses, mafic granu...","Mikhalsky et al. 2018, Geology of Mac Robertso..."
94142,MP3-NP1//NP12sc1,Shaw-Clemence Complex,"aluminous gneisses, quartz feldspathic gneisse...","Mikhalsky et al. 2018, Geology of Mac Robertso..."


#### Starting from the top...

In [15]:
def create_src_descr_combos_by_name(name, data):
    """Return the unique (SOURCECODE, DESCR) combinations for one NAME, sorted by DESCR."""
    display_cols = ["SOURCECODE", "DESCR", "MAPSYMBOL", "NAME", "SOURCE"]
    unique_cols = ["SOURCECODE", "DESCR"]
    sort_cols = ["DESCR"]
    mask = data["NAME"] == name
    return data[mask][display_cols].drop_duplicates(unique_cols).sort_values(sort_cols)
    

In [16]:
index = 0
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Marie Byrd Land Volcanics: basalt':


Alkali basalt & hawaiite
Basalt flows and basaltic hydrovolcanic rocks
Basalt tuff cone
Basalt, basaltic hyaloclastite, cinder cone, tuff cone
Basalt, basaltic pyroclastics
Basalt, hawaiite, basaltic hyaloclastite
Basaltic hyaloclastite
Basanite
Basanite flows and pyroclastics
Basanite, hawaiite, tephrite
Basanite, tephritoid
Hawaiite
Tephrite, basanite

Marie Byrd Land Volcanics: basalt has:
13 unique descriptions
17 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
41977,Pb_LeM8b2,Alkali basalt & hawaiite,Czb,Marie Byrd Land Volcanics: basalt,LeMasurier 2013 (Fig.8B)
42233,Pb_LeM2c1,Basalt flows and basaltic hydrovolcanic rocks,Czb,Marie Byrd Land Volcanics: basalt,LeMasurier 2013 (Fig.2C)
42094,Pb_LM84a,Basalt tuff cone,Czb,Marie Byrd Land Volcanics: basalt,LeMasurier 1984
41839,Pb_Hart97a,"Basalt, basaltic hyaloclastite, cinder cone, t...",Czb,Marie Byrd Land Volcanics: basalt,Hart et al. 1997
42600,Pb_LMB16B.3a,"Basalt, basaltic pyroclastics",Czb,Marie Byrd Land Volcanics: basalt,LeMasurier & Thomson 1990 (Fig B.16B.3)
42536,Pb_LMB16D.1_ins,"Basalt, hawaiite, basaltic hyaloclastite",Czb,Marie Byrd Land Volcanics: basalt,LeMasurier & Thomson 1990 (Fig B.16D.1)_Inset
41857,Pb_LMB16D.1b,Basaltic hyaloclastite,Czb,Marie Byrd Land Volcanics: basalt,LeMasurier & Thomson 1990 (Fig B.16D.1)
42479,Pb_Kip14a,Basanite,Czb,Marie Byrd Land Volcanics: basalt,Kipf et al. 2014
42936,Pb_LeM6b1,Basanite,Czb,Marie Byrd Land Volcanics: basalt,LeMasurier 2013 (Fig.6B)
41804,Pb_LeM4b1,Basanite,Czb,Marie Byrd Land Volcanics: basalt,LeMasurier 2013 (Fig.4B)


It makes sense that this name has multiple descriptions. The table above lists the source codes that correspond to it.
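
The count printed above as "unique SOURCECODES" comes from `drop_duplicates(["SOURCECODE", "DESCR"])`, so it is the number of distinct (SOURCECODE, DESCR) pairs rather than the number of distinct codes. Here is a minimal sketch of how the three counts can differ, using a made-up table (the `toy` values are hypothetical):

```python
import pandas as pd

# Hypothetical toy rows for a single NAME; values are made up.
toy = pd.DataFrame({
    "SOURCECODE": ["c1", "c1", "c2", "c3", "c3"],
    "DESCR": ["Basanite", "Trachyte", "Basanite", "Phonolite", "Phonolite"],
})

# Distinct descriptions.
n_descr = toy["DESCR"].nunique()
# Distinct source codes.
n_codes = toy["SOURCECODE"].nunique()
# Distinct (SOURCECODE, DESCR) pairs, as counted by the helper in this notebook.
n_combos = len(toy.drop_duplicates(["SOURCECODE", "DESCR"]))

print(n_descr, n_codes, n_combos)
```

Reporting all three numbers side by side would make it easier to spot codes that are reused with different descriptions.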

#### This is an example where the source codes are likely derived from sample labels in some sort of petrology study. The sources cite specific figures. Perhaps the SOURCECODE field could be standardized using a more generalized mapping source, cited alongside these sources. On the other hand, MAPSYMBOL already serves the purpose of describing the geology in a more general, standardized way.
### From another perspective, this case demonstrates the value of the GeoMAP project: we have successfully captured multiple levels of granularity of geological classification within the data schema of source code, description, name, and map symbol. An even deeper description could likely be found for each of the source codes within the cited sources.

### Just for curiosity's sake, let's see how many names fall under the MBL volcanics category

In [17]:
mbl_volcanics_mask_string = "Marie Byrd Land Volcanics"
display_cols = ["SOURCECODE", "DESCR", "MAPSYMBOL", "NAME", "SOURCE"]
unique_cols = ["SOURCECODE", "DESCR"]
sort_cols = ["DESCR"]
# na=False treats null NAME values as non-matches, so the mask is strictly boolean
mbl_volcanics_name_mask = data["NAME"].str.contains(mbl_volcanics_mask_string, regex=False, na=False)
mbl_volcanics_names = data[mbl_volcanics_name_mask][display_cols].drop_duplicates(unique_cols).sort_values(sort_cols)

In [18]:
print(f"there are {len(mbl_volcanics_names)} unique (sourcecode, description) combinations that have the string '{mbl_volcanics_mask_string}' in the NAME ")
display(mbl_volcanics_names)

there are 51 unique (sourcecode, description) combinations that have the string 'Marie Byrd Land Volcanics' in the NAME 


Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
41977,Pb_LeM8b2,Alkali basalt & hawaiite,Czb,Marie Byrd Land Volcanics: basalt,LeMasurier 2013 (Fig.8B)
40816,Pb,"Basalt and basanite as dikes, cinder cones and...",Czv,Marie Byrd Land Volcanics,Siddoway et al. unpublished mapping
42233,Pb_LeM2c1,Basalt flows and basaltic hydrovolcanic rocks,Czb,Marie Byrd Land Volcanics: basalt,LeMasurier 2013 (Fig.2C)
42237,Pb_LeM2c1?,Basalt flows and basaltic hydrovolcanic rocks,Czb,Marie Byrd Land Volcanics: basalt inferred,LeMasurier 2013 (Fig.2C)
42094,Pb_LM84a,Basalt tuff cone,Czb,Marie Byrd Land Volcanics: basalt,LeMasurier 1984
41839,Pb_Hart97a,"Basalt, basaltic hyaloclastite, cinder cone, t...",Czb,Marie Byrd Land Volcanics: basalt,Hart et al. 1997
42600,Pb_LMB16B.3a,"Basalt, basaltic pyroclastics",Czb,Marie Byrd Land Volcanics: basalt,LeMasurier & Thomson 1990 (Fig B.16B.3)
42536,Pb_LMB16D.1_ins,"Basalt, hawaiite, basaltic hyaloclastite",Czb,Marie Byrd Land Volcanics: basalt,LeMasurier & Thomson 1990 (Fig B.16D.1)_Inset
41857,Pb_LMB16D.1b,Basaltic hyaloclastite,Czb,Marie Byrd Land Volcanics: basalt,LeMasurier & Thomson 1990 (Fig B.16D.1)
42091,Pb_Pan4a,Basanite,Czb,Marie Byrd Land Volcanics: basalt,Panter et al. 1994


#### Let's skip all further instances of MBL volcanics. This looks like a well-studied subject, and other names in this category with multiple descriptions will likely be cases similar to the MBL basalts.

## Moving on to the next name with multiple descriptions

#### I suspect that this name will be a similar case to the MBL volcanics

In [19]:
index = 1
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Melbourne volcanic province':


Alkali basalt, basanite, hawaiite
Alkali basalt, basanite, tephrite
Basanite
Basanite, hawaiite
Mugearite, benmoreite, trachyandesite
Peralkaline rhyolite
Peralkaline trachyte, quartz-trachyte, peralkaline rhyolite
Phonolite
Trachyte
Trachyte with tristanite and trachyandesite
Variably differentiated alkali volcanics forming major and composite strato-volcanoes and other minor centres; alkali-basanite to trachyte-rhyolite

Melbourne volcanic province has:
11 unique descriptions
13 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
61211,Mev_WoF1a,"Alkali basalt, basanite, hawaiite",Czb,Melbourne volcanic province,Wörner et al. 1989 Fig1
73799,Mev_WVF14a,"Alkali basalt, basanite, tephrite",Czb,Melbourne volcanic province,Wörner & Viereck 1989 Fig14
63621,Mev_LMA61c,Basanite,Czb,Melbourne volcanic province,LeMasurier & Thomson 1990 (Fig A.6.1)
64114,Mev_KF912c,"Basanite, hawaiite",Czb,Melbourne volcanic province,Kyle 1982
62002,Mev_WoF1c,"Mugearite, benmoreite, trachyandesite",Cza,Melbourne volcanic province,Wörner et al. 1989 Fig1
71469,Mev_KF912c,Peralkaline rhyolite,Czf,Melbourne volcanic province,LeMasurier & Thomson 1990 (Fig A.6.1)
60905,Mev_LMA61b,"Peralkaline trachyte, quartz-trachyte, peralka...",Czf,Melbourne volcanic province,LeMasurier & Thomson 1990 (Fig A.6.1)
70567,Mev_KF912c,"Peralkaline trachyte, quartz-trachyte, peralka...",Czf,Melbourne volcanic province,LeMasurier & Thomson 1990 (Fig A.6.1)
61140,Mev_KF912b,Phonolite,Czf,Melbourne volcanic province,Kyle 1982
62049,Mev_WoF1b,Trachyte,Czf,Melbourne volcanic province,Wörner et al. 1989 Fig1


#### This is pretty much an identical case to the MBL basalts

### Next...

In [20]:
index = 2
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'late granitoid':


Fine equigranular hornblende clinopyroxene granodiorite at western Kukri Hills; pre-dates Vanda Dikes
Fine homogeneous equigranular leucocratic biotite granodiorite in dikes stocks and plugs
Hornblende-biotite-alkali feldspar quartz monzonite to granite in small stocks plugs and sills; locally porphyritic
Late-stage granitoids, homogeneous and massive to foliated; may include pegmatites and enclaves; postdate Vanda Dikes
Porphyritic hornblende biotite quartz monzodiorite, monzonite and quartz monzonite forming sills and plugs in Pearse Valley
late-stage unfoliated muscovite-biotite granite and biotite granite

late granitoid has:
6 unique descriptions
6 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
57396,ggk,Fine equigranular hornblende clinopyroxene gra...,Eg,late granitoid,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
55116,ggr,Fine homogeneous equigranular leucocratic biot...,Eg,late granitoid,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
55342,gq,Hornblende-biotite-alkali feldspar quartz monz...,Og,late granitoid,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
54573,gg,"Late-stage granitoids, homogeneous and massive...",EOg,late granitoid,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
57029,ggm,Porphyritic hornblende biotite quartz monzodio...,Eg,late granitoid,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
44248,gg,late-stage unfoliated muscovite-biotite granit...,EOg,late granitoid,Goodge et al. 1993


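- This is the first case where the counts of descriptions and source-code combinations match (6 and 6), although all of them share one name. The match is not strictly 1-1: the source code gg appears with two different descriptions from two different sources.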

In [21]:
index = 3
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Hallett volcanic province':


Mugearite
Predominantly basanite, basalt and hawaiite
Predominantly mugearite, benmoreite, and trachyte 
Trachyte
basanite, hawaiite, mugearite

Hallett volcanic province has:
5 unique descriptions
11 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
63470,Hv_Cl301_3,Mugearite,Czb,Hallett volcanic province,"Hornig & Wörner, 2003"
60975,Hv_LMA31a,"Predominantly basanite, basalt and hawaiite",Czb,Hallett volcanic province,LeMasurier & Thomson 1990 (Fig A.3.1)
61016,Hv_LMA21a,"Predominantly basanite, basalt and hawaiite",Czb,Hallett volcanic province,LeMasurier & Thomson 1990 (Fig A.2.1)
61262,Hv_LMA41a,"Predominantly basanite, basalt and hawaiite",Czb,Hallett volcanic province,LeMasurier & Thomson 1990 (Fig A.4.1)
67610,Hv_LMA11a,"Predominantly basanite, basalt and hawaiite",Czb,Hallett volcanic province,LeMasurier & Thomson 1990 (Fig A.1.1)
63426,Hv_LMA41b,"Predominantly mugearite, benmoreite, and trach...",Czf,Hallett volcanic province,LeMasurier & Thomson 1990 (Fig A.4.1)
61040,Hv_HaC14a,Trachyte,Czf,Hallett volcanic province,"Hamilton, 1972 (C-14)"
63456,Hv_Cl002_3,Trachyte,Czf,Hallett volcanic province,"Hornig & Wörner, 2003"
63800,Hv_Nar1a,Trachyte,Czf,Hallett volcanic province,Nardini et al. 2003
73792,Hv_LMA11b,Trachyte,Czf,Hallett volcanic province,LeMasurier & Thomson 1990 (Fig A.1.1)


#### Looks like a case where each polygon has a sample-like label.

In [22]:
index = 4
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'older ice sheet margin till':


Bouldery sandy till, locally matrix-rich and water-laid; includes glaciolacustrine and glaciofluvial sediment
Poorly sorted bouldery sandy till, slightly weathered and modified in elevated position away from present ice sheet or glacier
Till in moraines on margins of ice sheets, ice shelves, or large glaciers occupying major valleys; commonly degraded or scree covered; multiple advances not always differentiated
Till in moraines on margins of ice sheets, ice shelves, or large glaciers occupying the major valleys: commonly degraded and covered by scree: multiple advances not always differentiated

older ice sheet margin till has:
4 unique descriptions
5 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
43686,Qti2,"Bouldery sandy till, locally matrix-rich and w...",Qs,older ice sheet margin till,Grindley & Laird 1969
43875,Qti3,"Poorly sorted bouldery sandy till, slightly we...",Qs,older ice sheet margin till,Joy et al. 2014
43885,Qti4,"Poorly sorted bouldery sandy till, slightly we...",Qs,older ice sheet margin till,Storey et al. 2010
40463,Qti,"Till in moraines on margins of ice sheets, ice...",Qs,older ice sheet margin till,GeoMAP
43684,Qti,"Till in moraines on margins of ice sheets, ice...",Qs,older ice sheet margin till,Grindley & Laird 1969


- There is impressive consistency in description contents between sources over multiple years.
- Grindley & Laird 1969 seems to be the original source.
- If the GeoMAP-sourced polys are meant to base their description wording on Grindley & Laird 1969, the phrasing and punctuation have been changed a bit. This should probably be made consistent.
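
One way to surface near-duplicate wordings like the two "Till in moraines..." descriptions is to score every pair of descriptions under a name with Python's `difflib.SequenceMatcher`. This is a minimal sketch rather than something the notebook ran: the helper name `near_duplicate_pairs` and the 0.8 threshold are assumptions, and the strings are copied from the output above.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_pairs(descriptions, threshold=0.8):
    """Return (a, b, ratio) for pairs that are similar but not identical.

    `threshold` is an arbitrary cut-off chosen for illustration.
    """
    pairs = []
    for a, b in combinations(sorted(set(descriptions)), 2):
        ratio = SequenceMatcher(None, a, b, autojunk=False).ratio()
        if threshold <= ratio < 1.0:
            pairs.append((a, b, ratio))
    return pairs


# The four 'older ice sheet margin till' descriptions printed above.
till_descriptions = [
    "Bouldery sandy till, locally matrix-rich and water-laid; includes "
    "glaciolacustrine and glaciofluvial sediment",
    "Poorly sorted bouldery sandy till, slightly weathered and modified in "
    "elevated position away from present ice sheet or glacier",
    "Till in moraines on margins of ice sheets, ice shelves, or large glaciers "
    "occupying major valleys; commonly degraded or scree covered; multiple "
    "advances not always differentiated",
    "Till in moraines on margins of ice sheets, ice shelves, or large glaciers "
    "occupying the major valleys: commonly degraded and covered by scree: "
    "multiple advances not always differentiated",
]

pairs = near_duplicate_pairs(till_descriptions)
for a, b, ratio in pairs:
    print(f"{ratio:.2f}\n  {a}\n  {b}")
```

On the real data the same helper could presumably be fed `name_descr_sets[index][1]["DESCR"]` for each name; the threshold would need tuning before trusting the results.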

In [23]:
index = 5
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Marie Byrd Land Volcanics: trachyte':


Trachyte
Trachyte & associated comendite, benmoreite
Trachyte and benmoreite flows
Trachyte, trachytic tuffs, pantellerite

Marie Byrd Land Volcanics: trachyte has:
4 unique descriptions
11 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
41781,Pb_LeM033c,Trachyte,Czf,Marie Byrd Land Volcanics: trachyte,LeMasurier 2013 (Fig.3)
41784,Pb_LeM7b3,Trachyte,Czf,Marie Byrd Land Volcanics: trachyte,LeMasurier 2013 (Fig.7B)
41829,Pb_Pan4b,Trachyte,Czf,Marie Byrd Land Volcanics: trachyte,Panter et al. 1994
41844,Pb_Hart97b,Trachyte,Czf,Marie Byrd Land Volcanics: trachyte,Hart et al. 1997
41955,Pb_LMB16D.1c,Trachyte,Czf,Marie Byrd Land Volcanics: trachyte,LeMasurier & Thomson 1990 (Fig B.16D.1)
42489,Pb_LMB12.3b,Trachyte,Czf,Marie Byrd Land Volcanics: trachyte,LeMasurier & Thomson 1990 (Fig B.12.3)
42495,Pb_LMB16b2,Trachyte,Czf,Marie Byrd Land Volcanics: trachyte,LeMasurier & Thomson 1990 (Fig B.16B.1)
42534,Pb_LeM6b2,Trachyte,Czf,Marie Byrd Land Volcanics: trachyte,LeMasurier 2013 (Fig.6B)
41815,Pb_LeM3b2,"Trachyte & associated comendite, benmoreite",Czf,Marie Byrd Land Volcanics: trachyte,LeMasurier 2013 (Fig.3B)
42255,Pb_LeM2c2,Trachyte and benmoreite flows,Czf,Marie Byrd Land Volcanics: trachyte,LeMasurier 2013 (Fig.2C)


### MBL volcanics... skipping

In [24]:
index = 6
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Granitoid intrusions':


Calc alkaline Trendall Crag granodiorite
Cooper Island granophyre, cogenetic with tholeiitic mafic rocks of Drygalski Fjord Complex 
Variable composition including granodiorite, tonalite, trondhjemite, and granophyre. Undifferentiated granitoids including mylonitic granodiorite within the Cooper Bay Shear Zone.

Granitoid intrusions has:
3 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
36960,G6a,Calc alkaline Trendall Crag granodiorite,Jg,Granitoid intrusions,Curtis 2011a
35923,G6b,"Cooper Island granophyre, cogenetic with thole...",Jg,Granitoid intrusions,Curtis 2011a
35808,G6c,"Variable composition including granodiorite, t...",Jg,Granitoid intrusions,Curtis 2011a


- This is another case where the name is more generic than the sourcecode and description.
- I had expected the descriptions to correspond strongly with the names. Instead, descriptions tend to be more closely determined by sourcecode.
- Based on the descriptions alone, I was assuming that these polys would be from different regions, but a single, unified source makes me think otherwise.

In [25]:
index = 7
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Holocene local glacier till':


Locally derived variably weathered till in moraines and rock glaciers, usually associated with alpine glaciers
Locally-derived variably weathered till in moraines and rock glaciers, usually associated with alpine glaciers
Young locally derived glacial till in moraines and rock glaciers: associated with existing alpine glaciers

Holocene local glacier till has:
3 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
43662,Htl,Locally derived variably weathered till in mor...,Hs,Holocene local glacier till,Grindley & Laird 1969
40124,Htl,Locally-derived variably weathered till in mor...,Hs,Holocene local glacier till,GeoMAP
40788,Htl,Young locally derived glacial till in moraines...,Hs,Holocene local glacier till,GeoMAP


### These three descriptions all say the same thing in different ways. This should be standardized

In [26]:
index = 8
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'undifferentiated granitoid':


Undifferentiated intrusions of gabbro, diorite, granodiorite and/or granite; potentially foliated and gneissic
granodiorite and monzogranite, with granite, quartz monzonite, tonalite, diorite, hornblende gabbro
massive to porphyritic granodiorites to monzogranites, granodiorite, granite or diorite

undifferentiated granitoid has:
3 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
54441,g,"Undifferentiated intrusions of gabbro, diorite...",NOg,undifferentiated granitoid,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
43072,COg,"granodiorite and monzogranite, with granite, q...",EOg,undifferentiated granitoid,Grindley & Laird 1969
43627,g,massive to porphyritic granodiorites to monzog...,EOg,undifferentiated granitoid,Skinner et al. 1976


#### All descriptions have a different source; no problems here.

In [27]:
index = 9
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Ferrar Dolerite':


Massive and layered sills of dolerite up to at least 500m thick; dolerite dikes and bosses
Massive and layered sills of dolerite; dolerite dikes and bosses
Tholeiitic dolerite sills and minor dykes, usually intruded in the sedimentary sequence of the Beacon Supergroup, immediately above the pre-Beacon peneplain.

Ferrar Dolerite has:
3 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
55687,ff,Massive and layered sills of dolerite up to at...,Jd,Ferrar Dolerite,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
43406,ff,Massive and layered sills of dolerite; dolerit...,Jd,Ferrar Dolerite,Haskell et al. 1965b
60912,Fd,"Tholeiitic dolerite sills and minor dykes, usu...",Jd,Ferrar Dolerite,Carosi et al. 2012


#### All descriptions have a different source; no problems here.

In [28]:
index = 10
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Kirkpatrick Basalt':


Tholeiitic basalt pillow lavas and lava flows and hyaloclastic basalt breccia; rare thin tuffaceous and sedimentary interbeds
Tholeiitic subaerial lavas, a few metres up to several ten metres thick, randomly separated by thinner sedimentary volcanogenic interlayers and pillow lavas
Tholeitic subaerial lavas, a few metres up to several ten metres thick, randomly separated by thinner sedimentary volcanogenic interlayers and pillow lavas

Kirkpatrick Basalt has:
3 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
45427,fk,Tholeiitic basalt pillow lavas and lava flows ...,Jb,Kirkpatrick Basalt,Barrett & Elliot 1973
60979,Jk,"Tholeiitic subaerial lavas, a few metres up to...",Jb,Kirkpatrick Basalt,Stump 1989
60932,Kb,"Tholeitic subaerial lavas, a few metres up to ...",Jb,Kirkpatrick Basalt,Capponi et al. 1999a


### The descriptions from Stump 1989 and Capponi et al. 1999a are copies of each other, except for a typo in 'Tholeiitic' (spelled 'Tholeitic') in the Capponi et al. 1999a description.
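
A word-level diff makes this kind of single-word typo easy to pinpoint. The sketch below uses `difflib.SequenceMatcher` on the two descriptions as printed above; the helper name `differing_words` is an assumption introduced for illustration.

```python
from difflib import SequenceMatcher


def differing_words(a, b):
    """Return (words_from_a, words_from_b) for each stretch where a and b differ."""
    words_a, words_b = a.split(), b.split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return [
        (words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


stump_descr = (
    "Tholeiitic subaerial lavas, a few metres up to several ten metres thick, "
    "randomly separated by thinner sedimentary volcanogenic interlayers and pillow lavas"
)
capponi_descr = (
    "Tholeitic subaerial lavas, a few metres up to several ten metres thick, "
    "randomly separated by thinner sedimentary volcanogenic interlayers and pillow lavas"
)

diff = differing_words(stump_descr, capponi_descr)
print(diff)
```

Only the first word differs between the two strings, so everything else in the Capponi et al. 1999a description could be left untouched when correcting it.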

In [29]:
index = 11
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Beaver Complex':


Beaver Complex: undivided (1400-1020 Ma//1000-900 Ma)
predominantly Opx-Am-Bt tonalitic and granitic gneisses, minor mafic granulites and paragneisses (1400-1020 Ma//1000-900 Ma)
predominantly paragneiss and mica schists, minor orthogneisses (1400-1020 Ma//1000-900 Ma)

Beaver Complex has:
3 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
86279,MP2-3//NP11bv,Beaver Complex: undivided (1400-1020 Ma//1000-...,MNx,Beaver Complex,"Mikhalsky et al. 2018, Geology of Mac Robertso..."
86291,MP2-3//NP11bv2,predominantly Opx-Am-Bt tonalitic and granitic...,MNn,Beaver Complex,"Mikhalsky et al. 2018, Geology of Mac Robertso..."
86192,MP2-3//NP11bv1,"predominantly paragneiss and mica schists, min...",MNn,Beaver Complex,"Mikhalsky et al. 2018, Geology of Mac Robertso..."


### Nothing to see here. All polys have the same source, and they look like descriptions for individual samples.

In [30]:
index = 12
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'older local glacial till':


Locally derived glacial till in moraines and rock glaciers, variably weathered, inferredto be derived from existing alpine glaciers
Locally derived glacial till in moraines and rock glaciers: variably weathered: inferred to be derived from existing alpine glaciers

older local glacial till has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
39753,Qtl,Locally derived glacial till in moraines and r...,Qs,older local glacial till,GeoMAP
43670,Qtl,Locally derived glacial till in moraines and r...,Qs,older local glacial till,Grindley & Laird 1969


- The GeoMAP description is missing a space between 'inferred' and 'to'.
- There are inconsistencies in punctuation between sources (commas vs. colons). This should probably be made consistent.
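
Descriptions that differ only in punctuation, spacing, or capitalization can be flagged by comparing a stripped-down "fingerprint" of each string. This is a rough sketch; `descr_fingerprint` is a hypothetical helper, not part of the notebook's code, and the two strings are the ones printed above.

```python
import re


def descr_fingerprint(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


geomap_descr = (
    "Locally derived glacial till in moraines and rock glaciers, variably "
    "weathered, inferredto be derived from existing alpine glaciers"
)
grindley_laird_descr = (
    "Locally derived glacial till in moraines and rock glaciers: variably "
    "weathered: inferred to be derived from existing alpine glaciers"
)

same_raw = geomap_descr == grindley_laird_descr
same_fingerprint = descr_fingerprint(geomap_descr) == descr_fingerprint(grindley_laird_descr)
print(same_raw, same_fingerprint)
```

Because the fingerprint ignores word boundaries entirely, it is better suited to flagging candidates for manual review than to merging descriptions automatically.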

In [31]:
index = 13
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'unknown':


Unvisited exposed rock or cover deposits, nature unknown
area known to be outcropping rock or cover deposits, exact nature of which is unknown

unknown has:
2 unique descriptions
4 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
43645,unk,"Unvisited exposed rock or cover deposits, natu...",?,unknown,Gunn & Warren 1962
40022,?,area known to be outcropping rock or cover dep...,?,unknown,GeoMAP
40800,unknown,area known to be outcropping rock or cover dep...,?,unknown,GeoMAP
54464,unk,area known to be outcropping rock or cover dep...,?,unknown,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."


### No problem here; the inconsistency in wording is due to different sources. However, perhaps GeoMAP should adopt Gunn & Warren's wording just for the sake of standardization... Then again, GeoMAP and Cox et al.'s wording is more succinct
#### I'm surprised that more sources didn't include a specific, generic unknown unit in their legends

In [32]:
index = 14
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'colluvium and scree':


Angular material forming talus: loose glacial material: scree and polygonal ground
Angular talus aprons, loose glacial material, scree and polygonal ground

colluvium and scree has:
2 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
40797,Qc,Angular material forming talus: loose glacial ...,Qs,colluvium and scree,GeoMAP
43852,Quc,"Angular talus aprons, loose glacial material, ...",Qs,colluvium and scree,Grindley & Laird 1969
54478,uc,"Angular talus aprons, loose glacial material, ...",Qs,colluvium and scree,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."


#### Same point as the unknowns. Should descriptions have consistency or brevity?

In [33]:
index = 15
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'undifferentiated till':


Undifferentiated bouldery sandy till of uncertain age and origin
Undifferentiated till in moraines of uncertain origin

undifferentiated till has:
2 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
41367,Qt,Undifferentiated bouldery sandy till of uncert...,Qs,undifferentiated till,GeoMAP
46963,Qt,Undifferentiated till in moraines of uncertain...,Qs,undifferentiated till,GeoMAP
54493,ut,Undifferentiated till in moraines of uncertain...,Qs,undifferentiated till,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."


- The 'Undifferentiated bouldery sandy till of uncertain age and origin' description will need a source other than GeoMAP if it is to remain distinct.
- The wording of the other GeoMAP description matches Cox et al.'s.
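
The Qt case is a description conflict inside a single SOURCECODE and SOURCE pair. A quick way to list such pairs is sketched below on a toy frame built from rows shown in this notebook. The frame `toy` and the variable names are illustrative assumptions; the real check would run on `data`.

```python
import pandas as pd

# Toy rows copied from the 'undifferentiated till' and
# 'Holocene local glacier till' tables above.
toy = pd.DataFrame(
    {
        "SOURCECODE": ["Qt", "Qt", "ut", "Htl", "Htl", "Htl"],
        "SOURCE": [
            "GeoMAP",
            "GeoMAP",
            "Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse...",
            "Grindley & Laird 1969",
            "GeoMAP",
            "GeoMAP",
        ],
        "DESCR": [
            "Undifferentiated bouldery sandy till of uncertain age and origin",
            "Undifferentiated till in moraines of uncertain origin",
            "Undifferentiated till in moraines of uncertain origin",
            "Locally derived variably weathered till in moraines and rock glaciers, usually associated with alpine glaciers",
            "Locally-derived variably weathered till in moraines and rock glaciers, usually associated with alpine glaciers",
            "Young locally derived glacial till in moraines and rock glaciers: associated with existing alpine glaciers",
        ],
    }
)

# Count distinct descriptions per (SOURCECODE, SOURCE) and keep pairs with more than one.
descr_per_code = toy.groupby(["SOURCECODE", "SOURCE"])["DESCR"].nunique()
conflicts = descr_per_code[descr_per_code > 1]
print(conflicts)
```

Any pair that shows up here has one legend code from one source carrying more than one wording, which is the pattern worth reviewing first.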

In [34]:
index = 16
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Ruppert Coast metavolcanics':


Andesite dacite rhyolite tuff lava agglomerate volcaniclastics
Andesite, dacite, rhyolite tuff, lava agglomerate, and volcaniclastics in isolated exposures of limited extent; metamorphosed to subgreenschist-greenschist facies

Ruppert Coast metavolcanics has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
41522,Dvd_w,Andesite dacite rhyolite tuff lava agglomerate...,Df,Ruppert Coast metavolcanics,Grindley et al. 1980
41861,Dvd_e,"Andesite, dacite, rhyolite tuff, lava agglomer...",Dv,Ruppert Coast metavolcanics,Wade 1969


- Nothing to see here. Two different sources for two descriptions

In [35]:
index = 17
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Marie Byrd Land Volcanics: undifferentiated':


Basanite, trachyandesite, trachyte
Marie Byrd Land Volcanics: undifferentiated

Marie Byrd Land Volcanics: undifferentiated has:
2 unique descriptions
7 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
42472,Pb_Pre90a,"Basanite, trachyandesite, trachyte",Czv,Marie Byrd Land Volcanics: undifferentiated,Prestvik et al. 1990
41775,Pb_LeM4b3,Marie Byrd Land Volcanics: undifferentiated,Czv,Marie Byrd Land Volcanics: undifferentiated,LeMasurier 2013 (Fig.4B)
41801,Pb_LeM3b3,Marie Byrd Land Volcanics: undifferentiated,Czv,Marie Byrd Land Volcanics: undifferentiated,LeMasurier 2013 (Fig.3B)
41853,Pb_LMB16D.1d,Marie Byrd Land Volcanics: undifferentiated,Czv,Marie Byrd Land Volcanics: undifferentiated,LeMasurier & Thomson 1990 (Fig B.16D.1)
41970,Pb_LMB14.2d,Marie Byrd Land Volcanics: undifferentiated,Czv,Marie Byrd Land Volcanics: undifferentiated,LeMasurier & Thomson 1990 (Fig B.14.2)
42095,Pb_LM84b,Marie Byrd Land Volcanics: undifferentiated,Czv,Marie Byrd Land Volcanics: undifferentiated,LeMasurier 1984
42938,Pb_LeM6b3,Marie Byrd Land Volcanics: undifferentiated,Czv,Marie Byrd Land Volcanics: undifferentiated,LeMasurier 2013 (Fig.6B)


- "Basanite, trachyandesite, trachyte" sounds pretty differentiated to me. Is the name correct or is the description correct?
- How many polygons have this name, description combination?

In [36]:
undiff_basanite_mbl_volcanics_mask = (data["NAME"] == "Marie Byrd Land Volcanics: undifferentiated") & (data["DESCR"] != "Marie Byrd Land Volcanics: undifferentiated")
undiff_basanite_mbl_volcanics_polys = data[undiff_basanite_mbl_volcanics_mask][["SOURCECODE", "DESCR", "NAME", "SOURCE"]]

In [37]:
display(len(undiff_basanite_mbl_volcanics_polys))

44

### There are 44 polys with this non-intuitive NAME, DESCR combination. Is it correct, or do these polys need to be re-attributed?

In [38]:
index = 18
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Marie Byrd Land Volcanics: mugearite':


Mugearite
Mugearite, benmoreite hyaloclastite

Marie Byrd Land Volcanics: mugearite has:
2 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
42152,Pb_LeM7b2,Mugearite,Czb,Marie Byrd Land Volcanics: mugearite,LeMasurier 2013 (Fig.7B)
42956,Pb_LMB12.3a,Mugearite,Czb,Marie Byrd Land Volcanics: mugearite,LeMasurier & Thomson 1990 (Fig B.12.3)
41782,Pb_LMB16D.1a,"Mugearite, benmoreite hyaloclastite",Czb,Marie Byrd Land Volcanics: mugearite,LeMasurier & Thomson 1990 (Fig B.16D.1)


- Nothing to see here. The two descriptions come from different sources (different figures in LeMasurier & Thomson 1990, plus LeMasurier 2013).

In [39]:
index = 19
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Kohler Range granitoids':


deformed and foliated granitoids
diorite, granodiorite, monzogranite

Kohler Range granitoids has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
41793,Sgd,deformed and foliated granitoids,Sp,Kohler Range granitoids,Pankhurst et al. 1998
41895,Ygd,"diorite, granodiorite, monzogranite",Yh,Kohler Range granitoids,Pankhurst et al. 1998


- Two different source codes for two descriptions. Perhaps a more specific name could be found in Pankhurst et al. 1998.

In [40]:
index = 20
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Marie Byrd Land Volcanics: phonolite':


Phonolite
Phonolite and tephriphonolite

Marie Byrd Land Volcanics: phonolite has:
2 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
41836,Pb_LMB14.2e,Phonolite,Czf,Marie Byrd Land Volcanics: phonolite,LeMasurier & Thomson 1990 (Fig B.14.2)
42007,Pb_LeM033b,Phonolite,Czf,Marie Byrd Land Volcanics: phonolite,LeMasurier 2013 (Fig.3)
42025,Pb_Pan4c,Phonolite and tephriphonolite,Czf,Marie Byrd Land Volcanics: phonolite,Panter et al. 1994


- skipping... see mbl volcanics above

In [41]:
index = 21
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Marie Byrd Land Volcanics: benmoreite':


Benmoreite  
Benmoreite and mugearite

Marie Byrd Land Volcanics: benmoreite has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
41837,Pb_LMB14.2c,Benmoreite,Czf,Marie Byrd Land Volcanics: benmoreite,LeMasurier & Thomson 1990 (Fig B.14.2)
42046,Pb_Pan4d,Benmoreite and mugearite,Czf,Marie Byrd Land Volcanics: benmoreite,Panter et al. 1994


- skipping... see mbl volcanics above

In [42]:
index = 22
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Kehle Pluton':


Unfoliated  homogeneous medium biotite granite, north side Mulock Glacier. Cross-cut by late stage quartz-rich dikes
unfoliated  homogeneous medium biotite granite, north side Mulock Glacier. Cross-cut by late stage quartz-rich dikes

Kehle Pluton has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
59633,gfk,Unfoliated homogeneous medium biotite granite...,Eg,Kehle Pluton,"Cox, S.C.; Turnbull, I.M.; Isaac, M.J.; Townse..."
43642,gfk,unfoliated homogeneous medium biotite granite...,Eg,Kehle Pluton,Richardson 2002


- Inconsistent capitalization of the first letter here. Easy fix. 
- Looks like the first gap between words is a tab or 2 spaces.
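
Stray whitespace like this can be found and cleaned with pandas string methods. A minimal sketch on a few descriptions copied from the outputs above follows; the Series `descr` and the regex patterns are assumptions chosen for illustration, and capitalization is deliberately left alone.

```python
import pandas as pd

descr = pd.Series(
    [
        "Unfoliated  homogeneous medium biotite granite, north side Mulock Glacier. Cross-cut by late stage quartz-rich dikes",
        "unfoliated  homogeneous medium biotite granite, north side Mulock Glacier. Cross-cut by late stage quartz-rich dikes",
        "Benmoreite  ",
        "Trachyte",
    ]
)

# Flag runs of two or more whitespace characters, any tab,
# and leading or trailing whitespace.
has_odd_whitespace = descr.str.contains(r"\s{2,}|\t|^\s|\s$", regex=True)

# Collapse every whitespace run to a single space and trim both ends.
cleaned = descr.str.replace(r"\s+", " ", regex=True).str.strip()

print(has_odd_whitespace.tolist())
print(cleaned.tolist())
```

After cleaning, the two Kehle Pluton strings still differ in their first letter, so the capitalization fix would be a separate step.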

In [43]:
index = 23
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Section Peak Formation':


Mainly fluviatile, cross-bedded, coarse- to medium-grained sandstone with a feldspathic to quartzose composition. Minor intercalations of conglomerate, black shale, carbonaceous or noncarbonaceous silty mudstone and minor coal, locally fossiliferous.
Section Peak Formation and Ferrar Dolerite

Section Peak Formation has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
61770,Sf,"Mainly fluviatile, cross-bedded, coarse- to me...",Ts,Section Peak Formation,Pertusati et al. 2012
61267,JTfs,Section Peak Formation and Ferrar Dolerite,TJs,Section Peak Formation,Stump 1989


- Nothing to see here. Two different sources for two descriptions

In [44]:
index = 24
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'lake deposits':


Loose lacustrine sand, silt, mud.
Loose lacustrine sand, silt, mud; locally mixed with till

lake deposits has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
75777,Qk,"Loose lacustrine sand, silt, mud.",Qs,lake deposits,GeoMAP
73018,Qk,"Loose lacustrine sand, silt, mud; locally mixe...",Qk,lake deposits,Mayewski et al. 1979a


- Nothing to see here. Two different sources for two descriptions
- The mapsymbol for 73018 should be Qs. Qk is not a valid mapsymbol based upon the legend for sedimentary polys. Let's check how many polys have Qk as a mapsymbol and lake deposits as a name. (Qk would be Quaternary alkaline ultrabasic intrusive igneous.)

In [45]:
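# Select every polygon with MAPSYMBOL "Qk"; the NAME column in the output
# shows whether all of them are lake deposits.
qk_lake_mask = data["MAPSYMBOL"] == "Qk"
qk_lake_polys = data[qk_lake_mask]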

In [46]:
display(qk_lake_polys)

Unnamed: 0,SOURCECODE,MAPSYMBOL,PLOTSYMBOL,NAME,DESCR,POLYGTYPE,MBREQUIV,FMNEQUIV,SBGRPEQUIV,GRPEQUIV,...,CAPTDATE,MODDATE,FEATUREID,SPEC_URI,SYMBOL,DATASET,REGION,Shape_Length,Shape_Area,geometry
73018,Qk,Qk,Qk,lake deposits,"Loose lacustrine sand, silt, mud; locally mixe...",moraine,,,,,...,2016-01-01T00:00:00,2018-06-06T00:00:00,ATA_geological_units_073022,http://www.opengis.net/def/nil/OGC/0/missing,,ATA_NVL_geological_units,East Antarctica,3183.949707,243888.701428,"MULTIPOLYGON (((628650.224 -1879163.456, 62851..."
74831,Qk,Qk,Qk,lake deposits,"Loose lacustrine sand, silt, mud; locally mixe...",moraine,,,,,...,2016-01-01T00:00:00,2018-06-06T00:00:00,ATA_geological_units_074835,http://www.opengis.net/def/nil/OGC/0/missing,,ATA_NVL_geological_units,East Antarctica,8181.599145,521312.388787,"MULTIPOLYGON (((615443.756 -1895375.700, 61552..."


### Only two polys. This'll be an easy fix
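
The fix itself could look something like the sketch below, assuming Qs really is the intended symbol for these polygons. It runs on a toy frame with the three lake-deposit rows shown above (index values and symbols copied from the outputs); `toy`, `fix_mask`, and `fixed` are illustrative names, and the real edit would target `data`.

```python
import pandas as pd

# Toy stand-in for `data`: the two Qk polygons plus the GeoMAP lake-deposit row.
toy = pd.DataFrame(
    {
        "MAPSYMBOL": ["Qk", "Qk", "Qs"],
        "NAME": ["lake deposits", "lake deposits", "lake deposits"],
    },
    index=[73018, 74831, 75777],
)

# Restrict the fix to lake deposits carrying the Qk symbol.
fix_mask = (toy["NAME"] == "lake deposits") & (toy["MAPSYMBOL"] == "Qk")

# Work on a copy so the original frame stays available for comparison.
fixed = toy.copy()
fixed.loc[fix_mask, "MAPSYMBOL"] = "Qs"

print(f"{int(fix_mask.sum())} polygons re-symbolized")
print(fixed)
```

Filtering on NAME as well as MAPSYMBOL keeps the change from touching any genuinely igneous Qk polygons, should a later release contain some.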

In [47]:
index = 25
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Marble, calc-silicate and skarn':


Marble, calc-silicate and skarn occur as layers and lenses in the country rocks. Calc-silicate rocks form pods and bands between the marble and adjacent gneisses, and layers and lenses intercalated in the biotite-homblende gneiss and amphibolite.
Marble, dolomitic marble, calc-silicate rocks and skarn, locally with wollastonite or grossular.

Marble, calc-silicate and skarn has:
2 unique descriptions
4 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
77020,1041,"Marble, calc-silicate and skarn occur as layer...",NEm,"Marble, calc-silicate and skarn",Norwegian Polar Institute
81601,1052,"Marble, calc-silicate and skarn occur as layer...",NEm,"Marble, calc-silicate and skarn",Norwegian Polar Institute
84844,1062,"Marble, calc-silicate and skarn occur as layer...",NEm,"Marble, calc-silicate and skarn",Norwegian Polar Institute
81058,1083,"Marble, dolomitic marble, calc-silicate rocks ...",Nm,"Marble, calc-silicate and skarn",Norwegian Polar Institute


- Nothing to see here

In [48]:
index = 26
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Amphibolite':


Amphibolite, pre-dating all recognizable deformation; occurring as isolated, apparently conformable boudins and lenses in surrounding gneisses, implying a supracrustal origin, whilst others form arrays suggesting that protolith was a crosscutting dyke. 
Dark-coloured layers and lenses of amphibolite, massive in places, up to a few tens of meters across, concordant or subconcordant with the country rocks.  

Amphibolite has:
2 unique descriptions
4 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
80168,1089,"Amphibolite, pre-dating all recognizable defor...",Mn,Amphibolite,Norwegian Polar Institute
79090,1039,Dark-coloured layers and lenses of amphibolite...,NEn,Amphibolite,Norwegian Polar Institute
81499,1058,Dark-coloured layers and lenses of amphibolite...,NEn,Amphibolite,Norwegian Polar Institute
84329,1050,Dark-coloured layers and lenses of amphibolite...,NEn,Amphibolite,Norwegian Polar Institute


- Nothing to see here

In [49]:
index = 27
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Hornblende gneiss':


Hornblende gneiss in concordant hornblende-rich layers, centimetres to tens of centimetres thick, with felsic bands. 
Hornblende gneisses, commonly biotite. Intercalated with minor pyroxene gneisses and mafic to ultramafic gneisses. 

Hornblende gneiss has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
81569,1056,Hornblende gneiss in concordant hornblende-ric...,NEn,Hornblende gneiss,Norwegian Polar Institute
80593,1078,"Hornblende gneisses, commonly biotite. Interca...",Nn,Hornblende gneiss,Norwegian Polar Institute


- Nothing to see here

In [50]:
index = 28
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Garnet-biotite gneiss':


Garnet-biotite gneisses containing sillimanite, cordierite, gedrite, relict kyanite and/or relict staurolite; locally K-feldspar prophyroblasts, intercalated with thin layers of pyroxene gneisses, hornblende gneisses and biotite-hornblende gneisses.
Medium- to coarse-grained, granoblastic to lepidoblastic, garnet-biotite gneiss includes pelitic, quartzofeldspathic and feldspathic rocks. In some places, retrograde metamorphic effects such as the embayment of garnet by secondary biotite are observed.

Garnet-biotite gneiss has:
2 unique descriptions
4 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
80654,1075,Garnet-biotite gneisses containing sillimanite...,Nn,Garnet-biotite gneiss,Norwegian Polar Institute
83956,1046,"Medium- to coarse-grained, granoblastic to lep...",NEn,Garnet-biotite gneiss,Norwegian Polar Institute
84515,1053,"Medium- to coarse-grained, granoblastic to lep...",NEn,Garnet-biotite gneiss,Norwegian Polar Institute
84677,1042,"Medium- to coarse-grained, granoblastic to lep...",NEn,Garnet-biotite gneiss,Norwegian Polar Institute


- Nothing to see here

In [51]:
index = 29
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Biotite-hornblende gneiss':


Biotite-hornblende gneisses, locally +/- garnet, +/- anthophyllite, +/- cummingtonite.
Well-layered biotite-hornblende gneiss, layers and lenses of garnet-biotite gneisses, calc-silicate rocks, and mafic granulite lenses.

Biotite-hornblende gneiss has:
2 unique descriptions
3 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
82951,1077,"Biotite-hornblende gneisses, locally +/- garne...",Nn,Biotite-hornblende gneiss,Norwegian Polar Institute
83434,1055,"Well-layered biotite-hornblende gneiss, layers...",NEn,Biotite-hornblende gneiss,Norwegian Polar Institute
83634,1049,"Well-layered biotite-hornblende gneiss, layers...",NEn,Biotite-hornblende gneiss,Norwegian Polar Institute


- Nothing to see here

In [52]:
index = 30
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Granodiorite-diorite':


Melanocratic to mesocratic, fine- to medium-grained, biotite, K-feldspar, plagioclase and quartz granodiorite-diorite, occurring as dykes ranging in width from several meters to ten meters. 
Melanocratic to mesocratic, fine- to medium-grained, biotite, K-feldspar, plagioclase, quartz granodiorite-diorite dikes ranging in width from several meters to ten meters

Granodiorite-diorite has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
84866,1028,"Melanocratic to mesocratic, fine- to medium-gr...",NEh,Granodiorite-diorite,Norwegian Polar Institute
84069,1031,"Melanocratic to mesocratic, fine- to medium-gr...",NEh,Granodiorite-diorite,Norwegian Polar Institute


- These two descriptions have the same content with slightly different phrasing (for example, "dykes" vs. "dikes"). They should likely be standardized to a single description.
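Near-duplicate descriptions like this pair could be flagged automatically rather than by reading each cell. The following is a minimal sketch using `difflib` from the standard library; the helper name `near_duplicate_pairs` and the 0.9 threshold are assumptions for illustration, not part of the original notebook.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_pairs(descriptions, threshold=0.9):
    """Return (ratio, a, b) for description pairs with similarity >= threshold.

    Hypothetical helper: the threshold is an arbitrary starting point and
    would need tuning against the full set of descriptions.
    """
    # Collapse whitespace so trailing spaces do not count as differences.
    cleaned = sorted({" ".join(d.split()) for d in descriptions})
    pairs = []
    for a, b in combinations(cleaned, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((round(ratio, 3), a, b))
    return sorted(pairs, reverse=True)


# The two Granodiorite-diorite descriptions printed above.
granodiorite_descrs = [
    "Melanocratic to mesocratic, fine- to medium-grained, biotite, K-feldspar, "
    "plagioclase and quartz granodiorite-diorite, occurring as dykes ranging in "
    "width from several meters to ten meters. ",
    "Melanocratic to mesocratic, fine- to medium-grained, biotite, K-feldspar, "
    "plagioclase, quartz granodiorite-diorite dikes ranging in width from "
    "several meters to ten meters",
]

for ratio, a, b in near_duplicate_pairs(granodiorite_descrs):
    print(ratio)
    print("  ", a)
    print("  ", b)
```

Passing all unique `DESCR` values for a given name (for example `name_descr_sets[index][1]["DESCR"]`) would list the candidate pairs to review by hand. Genuinely different descriptions, such as the two for Amphibolite, should score well below the threshold.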

In [53]:
index = 31
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Prydz Complex':


Am-Bt, Opx-Am-Bt granitic-gneisses, mafic granulites (1100-900 Ma//900-800 Ma)
aluminous gneiss and schists, migmatites (1100-900 Ma//900-800 Ma)

Prydz Complex has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
85931,MP3-NP1//NP12pr2,"Am-Bt, Opx-Am-Bt granitic-gneisses, mafic gran...",MNn,Prydz Complex,"Mikhalsky et al. 2018, Geology of Mac Robertso..."
86293,MP3-NP1//NP12pr1,"aluminous gneiss and schists, migmatites (1100...",MNn,Prydz Complex,"Mikhalsky et al. 2018, Geology of Mac Robertso..."


- Nothing really to see here.
- The descriptions seem to be pulled directly from sample-site descriptions.
- Perhaps they should be generalized from Mikhalsky et al. 2018.

In [54]:
index = 32
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Lambert series (in part)':


migmatitic Bt and Gt-Bt-bearing granitic and tonalitic orthogneiss; minor amphibolite and paragneiss 
migmatitic paragneiss and schist; minor marble, calc-silicate rocks, and felsic orthogneiss 

Lambert series (in part) has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
87672,PRP2,migmatitic Bt and Gt-Bt-bearing granitic and t...,Mn,Lambert series (in part),"Mikhalsky etal 2001, Prince Charles Mountains"
87673,PRP1,migmatitic paragneiss and schist; minor marble...,Mn,Lambert series (in part),"Mikhalsky etal 2001, Prince Charles Mountains"


- Nothing really to see here

In [55]:
index = 33
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Grove Complex':


Opx-Am-Bt orthogneisses, mafic granulites, anatectites, granite-gneisses (1100<950 Ma//530 Ma)
aluminous gneisses, quartzites (1100<950 Ma//530 Ma)

Grove Complex has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
90589,MP3-NP1//Egr2,"Opx-Am-Bt orthogneisses, mafic granulites, ana...",MNn,Grove Complex,"Mikhalsky et al. 2018, Geology of Mac Robertso..."
88156,MP3-NP1//Egr1,"aluminous gneisses, quartzites (1100<950 Ma//5...",MNn,Grove Complex,"Mikhalsky et al. 2018, Geology of Mac Robertso..."


- Nothing really to see here

In [56]:
index = 34
print(f"Unique descriptions for name == '{name_descr_sets[index][0]}':\n\n")
for i in name_descr_sets[index][1]["DESCR"]:
    print(i)
unique_src_codes = create_src_descr_combos_by_name(name_descr_sets[index][0], data)
print(f"\n{name_descr_sets[index][0]} has:\n{len(name_descr_sets[index][1]['DESCR'])} unique descriptions\n{unique_src_codes.shape[0]} unique SOURCECODES\n")
display(unique_src_codes)

Unique descriptions for name == 'Shaw-Clemence Complex':


Am-Bt, Opx-Am-Bt granite gneisses, mafic granulites (1100-9000 Ma//900-800 Ma)
aluminous gneisses, quartz feldspathic gneisses, migmatites, conglomerates (1100-9000 Ma//900-800 Ma)

Shaw-Clemence Complex has:
2 unique descriptions
2 unique SOURCECODES



Unnamed: 0,SOURCECODE,DESCR,MAPSYMBOL,NAME,SOURCE
94141,MP3-NP1//NP12sc2,"Am-Bt, Opx-Am-Bt granite gneisses, mafic granu...",MNn,Shaw-Clemence Complex,"Mikhalsky et al. 2018, Geology of Mac Robertso..."
94142,MP3-NP1//NP12sc1,"aluminous gneisses, quartz feldspathic gneisse...",MNn,Shaw-Clemence Complex,"Mikhalsky et al. 2018, Geology of Mac Robertso..."


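- Nothing really to see here, apart from a probable typo: both descriptions give an age range of "1100-9000 Ma", where "1100-900 Ma" (as in the Prydz Complex descriptions) was likely intended.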

# That's it for all of the names with multiple descriptions
### From this investigation we've identified a few patterns
- Description seems to be coupled more closely with source code than with name; my original hypothesis was that it would follow name.
- There are several cases where polygons with a GeoMAP or Norwegian Polar Institute source could be standardized or changed to match an older source.
- There are only a few typos to fix.
- Overall, granularity seems to increase from map symbol to name to description to source code. This hierarchy should be documented, and adherence to it should be investigated further.
- Sometimes name is less granular than map symbol, but this varies (these cases are likely predictable from the stratrank of the polygon).
- The formatting of the NAME field may give clues about the level of granularity of the name.
    - Single-word names (e.g. Amphibolite) are likely general geologic names.
    - Two-word names may be named complexes that are more specific.
    - Longer names are likely even more granular.
    - This pattern is irrelevant for names that have a 1-1 pairing with a source code or description.
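
The coupling and granularity patterns listed above could be checked numerically rather than cell by cell. The sketch below is one possible approach, not the method used in this notebook: for a pair of fields it counts how many distinct "child" values each "parent" value maps to, and it buckets names by word count. The helper names (`distinct_children`, `share_one_to_one`, `name_word_count`) are hypothetical, and the sample rows are copied from the tables displayed earlier, with DESCR truncated as shown there.

```python
from collections import defaultdict

# Rows copied from the displayed tables above (DESCR truncated as displayed).
sample_rows = [
    {"SOURCECODE": "1089", "MAPSYMBOL": "Mn", "NAME": "Amphibolite",
     "DESCR": "Amphibolite, pre-dating all recognizable defor..."},
    {"SOURCECODE": "1039", "MAPSYMBOL": "NEn", "NAME": "Amphibolite",
     "DESCR": "Dark-coloured layers and lenses of amphibolite..."},
    {"SOURCECODE": "1058", "MAPSYMBOL": "NEn", "NAME": "Amphibolite",
     "DESCR": "Dark-coloured layers and lenses of amphibolite..."},
    {"SOURCECODE": "1050", "MAPSYMBOL": "NEn", "NAME": "Amphibolite",
     "DESCR": "Dark-coloured layers and lenses of amphibolite..."},
    {"SOURCECODE": "1056", "MAPSYMBOL": "NEn", "NAME": "Hornblende gneiss",
     "DESCR": "Hornblende gneiss in concordant hornblende-ric..."},
    {"SOURCECODE": "1078", "MAPSYMBOL": "Nn", "NAME": "Hornblende gneiss",
     "DESCR": "Hornblende gneisses, commonly biotite. Interca..."},
]


def distinct_children(records, parent, child):
    """Map each parent value to its number of distinct child values."""
    children = defaultdict(set)
    for rec in records:
        children[rec[parent]].add(rec[child])
    return {key: len(values) for key, values in children.items()}


def share_one_to_one(records, parent, child):
    """Fraction of parent values that map to exactly one child value."""
    counts = distinct_children(records, parent, child)
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def name_word_count(name):
    """Number of whitespace-separated words in a NAME value (rough heuristic)."""
    return len(name.split())


# If DESCR follows SOURCECODE more closely than NAME, the first share
# should be higher than the second.
print("SOURCECODE -> DESCR:", share_one_to_one(sample_rows, "SOURCECODE", "DESCR"))
print("NAME -> DESCR:", share_one_to_one(sample_rows, "NAME", "DESCR"))

# A name spanning several map symbols is less granular than MAPSYMBOL.
print("MAPSYMBOLs per NAME:", distinct_children(sample_rows, "NAME", "MAPSYMBOL"))

for name in sorted({row["NAME"] for row in sample_rows}):
    print(name, "->", name_word_count(name), "word(s)")
```

To run this on the full dataset, the records could be built with something like `data[["NAME", "SOURCECODE", "DESCR", "MAPSYMBOL"]].to_dict("records")`, assuming `data` carries those columns as the displayed tables suggest. The word count is only a proxy: "Hornblende gneiss" and "Prydz Complex" both have two words but differ in specificity, so the buckets would still need manual review.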