# This notebook explores the SCAR GeoMAP dataset released in 2019
## Cox S.C., Smith Lyttle B. and the GeoMAP team (2019). Lower Hutt, New Zealand. GNS Science. Release v.201907.
### [Data Available Here](https://data.gns.cri.nz/ata_geomap/index.html?content=/mapservice/Content/antarctica/www/index.html)
### Notebook by Sam Elkind

Initially, I'll look at the data in terms of polygon counts. This section examines the data schema and how often each value occurs within specific fields. The main aim is to find inconsistencies in the data attribution, but it may also prompt some discussion of the relationships between columns.

Next, I'll look at the data in terms of polygon area and data attribution. How much surface water has been mapped? How much till has been mapped? How much outcropping rock is of Jurassic age?

### Configure packages, paths, and load data

In [19]:
import os
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display
import pprint as pp
from tabulate import tabulate
import fiona

In [20]:
def plot_value_counts(field_name, values_to_plot, counts, counts_norm):
    """Bar-plot the first `values_to_plot` raw and normalized value counts of a field."""
    fig, ax = plt.subplots(2, 1, figsize=(30, 15))
    fig.tight_layout(pad=2.0)
    fig.subplots_adjust(top=.94)
    fig.suptitle(f"Frequency of {field_name} values", size=18)

    ax[0].set_title(field_name)
    ax[1].set_title(f"{field_name} normalized")
    # Label each bar with its count (top plot) or its share as a percentage (bottom plot).
    for i, v in enumerate(counts[:values_to_plot]):
        ax[0].text(i - .5, v, str(v), color='black', fontweight='bold')
    for i, v in enumerate(counts_norm[:values_to_plot]):
        # Format as a percentage directly; slicing str(v * 100) breaks on small or round values.
        ax[1].text(i - .5, v, f"{v:.1%}", color='black', fontweight='bold')
    ax[0].bar(counts.index[:values_to_plot], counts[:values_to_plot])
    ax[1].bar(counts_norm.index[:values_to_plot], counts_norm[:values_to_plot])

In [21]:
bib_path = os.path.join(os.getcwd(), "..", "data", "ATA_SCAR_GeoMAP_geology.gdb")

In [22]:
print(fiona.listlayers(bib_path))

['ATA_geological_units', 'ATA_faults', 'ATA_sources_poly', 'ATA_GeoMAP_qualityinformation']


In [23]:
data = gpd.read_file(bib_path, layer="ATA_sources_poly")

In [24]:
print(data.columns)

Index(['IDENTIFIER', 'TITLE', 'AUTHORS', 'PUBLICATION', 'INSTITUTION', 'SCALE',
       'YEAR', 'PUBTYPE', 'NATPROG', 'METADATA', 'RESSCALE', 'CAPTSCALE',
       'CAPTDATE', 'MODDATE', 'FEATUREID', 'SPEC_URI', 'SYMBOL',
       'Shape_Length', 'Shape_Area', 'geometry'],
      dtype='object')


In [33]:
print(data["NATPROG"].value_counts())

USA             155
NZ              136
UK               45
Japan            43
South Africa     28
Germany          28
Australia        21
Norway           19
GIGAMAP          15
Italy            14
USSR              2
GANOVEX           2
India             1
Canada            1
PNRA              1
Soviet Union      1
Russia            1
Korea             1
China             1
Name: NATPROG, dtype: int64
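
The counts above list `USSR` and `Soviet Union` as separate values, which is the kind of attribution inconsistency this section is looking for. A minimal sketch of one way to fold such duplicates together follows. The alias table `NATPROG_ALIASES` and the helper `merge_aliases` are hypothetical names introduced here, not part of the dataset or of any library, and only the `Soviet Union` to `USSR` pairing is assumed. Whether other labels (for example `Russia`, `PNRA` or `GANOVEX`) should be merged with a country is a judgment call left open.

```python
import pandas as pd

# Counts copied from the NATPROG output above (a subset, for illustration).
counts = pd.Series(
    {"USA": 155, "NZ": 136, "USSR": 2, "Soviet Union": 1, "Russia": 1},
    name="NATPROG",
)

# Hypothetical alias table: maps a variant label to the label it is merged into.
# Only this one pairing is assumed; extend it after checking the source records.
NATPROG_ALIASES = {"Soviet Union": "USSR"}


def merge_aliases(counts, aliases):
    """Sum the counts of labels that map to the same canonical label."""
    # Replace each index label by its canonical form, leaving others unchanged.
    canonical = counts.index.to_series().replace(aliases)
    # Group by the canonical labels and add up the counts in each group.
    merged = counts.groupby(canonical.values).sum()
    return merged.sort_values(ascending=False)


merged = merge_aliases(counts, NATPROG_ALIASES)
print(merged)
```

On the full layer, the same mapping could be applied before counting, for example with `data["NATPROG"].replace(NATPROG_ALIASES).value_counts()`.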

