# Analysis

Template for Jupyter notebooks running Python.

Version 0.1.0 \| First Created July 12, 2023 \| Updated August 01, 2023

## Jupyter Notebook

This is an Jupyter Notebook document. For more details on using a Jupyter Notebook see <https://docs.jupyter.org/en/latest/>.

### Setting up a computational environment. 
Please see proceedure/environment/readme.md for detailed instructions for how to replicate the computational environment used in this study.



# Title of Study

### Authors

- First Name Last Name\*, email address, @githubname, ORCID link, affiliated institution(s)
- First Name Last Name, email address, @githubname, ORCID link, affiliated institution(s)

\* Corresponding author and creator



### Abstract

Write a brief abstract about your research project.

If the project is a reproduction or replication study, include a declaration of the study type with a full reference to the original study.
For example:

This study is a *replication* of:

> citation to prior study

A graphical abstract of the study could also be included as an image here.



### Study metadata

- `Key words`: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods.
- `Subject`: select from the [BePress Taxonomy](http://digitalcommons.bepress.com/cgi/viewcontent.cgi?article=1008&context=reference)
- `Date created`: date when project was started
- `Date modified`: date of most recent revision
- `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
- `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or or distance of a raster GRID size
- `Spatial Reference System`: Specify the geographic or projected coordinate system for the study, e.g. EPSG:4326
- `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
- `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time for which each observation represents or the revisit period for repeated observations
- `Funding Name`: name of funding for the project
- `Funding Title`: title of project grant
- `Award info URI`: web address for award information
- `Award number`: award number

#### Original study spatio-temporal metadata

- `Spatial Coverage`: extent of original study
- `Spatial Resolution`: resolution of original study
- `Spatial Reference System`: spatial reference system of original study
- `Temporal Coverage`: temporal extent of original study
- `Temporal Resolution`: temporal resolution of original study



## Study design

Describe how the study relates to prior literature, e.g. is it a **original study**, **meta-analysis study**, **reproduction study**, **reanalysis study**, or **replication study**?

Also describe the original study archetype, e.g. is it **observational**, **experimental**, **quasi-experimental**, or **exploratory**?

Enumerate specific **hypotheses** to be tested or **research questions** to be investigated here, and specify the type of method, statistical test or model to be used on the hypothesis or question.


## Proceedure

In [1]:
# Import the necessary packages
import geopandas as gpd
import pandas as pd
import geodatasets as gds
import yaml
import os
import numpy as np


In [29]:
# Write the YAML file with package dependencies
## code adapted from python land https://python.land/data-processing/python-yaml#What_is_YAML
requirements = """
- openpyxl
- pyyaml
"""
req = yaml.safe_load(requirements)
with open ('req.yaml', 'w') as file:
    yaml.dump(req, file)

print(open('req.yaml').read())

- openpyxl
- pyyaml



In [30]:
# move req file to envs folder 
os.replace("req.yaml", "../environment/req.yaml") 

In [31]:
path = os.path.abspath('req.yaml')
print(path)

C:\Users\gsokolow\Documents\GitHub\Flooding-and-Healthcare-2024\procedure\code\req.yaml


In [1]:
# Import modules, define directories *** NOT SURE ABOUT THIS
#from pyhere import here

# You can define your own shortcuts for file paths:
#path = {
    "dscr": here("data", "scratch"),
    "drpub": here("data", "raw", "public"),
    "drpriv": here("data", "raw", "private"),
    "ddpub": here("data", "derived", "public"),
    "ddpriv": here("data", "derived", "private"),
    "rfig": here("results", "figures"),
    "roth": here("results", "other"),
    "rtab": here("results", "tables"),
    "dmet": here("data", "metadata")
}

ModuleNotFoundError: No module named 'pyhere'

In [2]:
# Import the datasets
##in decoding the variable names, is it possible that the numbers refer to the response options on the form?
sa1 = pd.read_csv("../../data/derived/public/Individual_part1_totalNZ-wide_format_updated_16-7-20_adjusted_labels.csv",
                     usecols = ['Area_code_and_description', #uniqueid for sa1 
                                'Census_2018_usually_resident_population_count', #total pop
                                'Census_2018_Sex_1_Male_CURP', 'Census_2018_Sex_2_Female_CURP', 'Census_2018_Sex_Total_CURP', #sex
                                'Census_2018_median_age_CURP', 'Census_2018_Age_broad_groups_1_Under_15_years_CURP', 'Census_2018_Age_broad_groups_2_15_to_29_years_CURP', 
                                'Census_2018_Age_broad_groups_3_30_to_64_years_CURP', 'Census_2018_Age_broad_groups_4_65_years_and_over_CURP', 
                                'Census_2018_Age_broad_groups_Total_CURP', #age
                                'Census_2018_Ethnicity_grouped_total_responses_level_1_1_European_CURP',
                                'Census_2018_Ethnicity_grouped_total_responses_level_1_3_Pacific_Peoples_CURP', 'Census_2018_Ethnicity_grouped_total_responses_level_1_2_Maori_CURP', 'Census_2018_Ethnicity_grouped_total_responses_level_1_4_Asian_CURP',
                                'Census_2018_Ethnicity_grouped_total_responses_level_1_5_Middle_Eastern_Latin_American_African_CURP', 'Census_2018_Ethnicity_grouped_total_responses_level_1_6_Other_Ethnicity_CURP',
                                'Census_2018_Ethnicity_grouped_total_responses_level_2_61_New_Zealander_CURP', 'Census_2018_Ethnicity_grouped_total_responses_level_2_69_Other_Ethnicity_nec_CURP',
                                'Census_2018_Ethnicity_grouped_total_responses_Total_stated_CURP', 'Census_2018_Ethnicity_grouped_total_responses_level_1_9_Not_Elsewhere_Included_CURP',
                                'Census_2018_Ethnicity_grouped_total_responses_Total_CURP', #ethnicity,
                                'Census_2018_Maori_descent_01_Maori_descent_CURP', 'Census_2018_Maori_descent_02_No_Maori_descent_CURP', 'Census_2018_Maori_descent_04_Dont_know_CURP',
                                'Census_2018_Maori_descent_Total_stated_CURP', 'Census_2018_Maori_descent_99_Not_elsewhere_included_CURP', 'Census_2018_Maori_descent_Total_CURP'
                               ],
                 na_values = 'C') #replaces 'C' for confidential with NaN. 


In [22]:
# Make sure the data loaded in correctly. It did! YAY!
sa1.head()

Unnamed: 0,Area_code_and_description,Census_2018_usually_resident_population_count,Census_2018_Sex_1_Male_CURP,Census_2018_Sex_2_Female_CURP,Census_2018_Sex_Total_CURP,Census_2018_median_age_CURP,Census_2018_Age_broad_groups_1_Under_15_years_CURP,Census_2018_Age_broad_groups_2_15_to_29_years_CURP,Census_2018_Age_broad_groups_3_30_to_64_years_CURP,Census_2018_Age_broad_groups_4_65_years_and_over_CURP,...,Census_2018_Ethnicity_grouped_total_responses_level_2_69_Other_Ethnicity_nec_CURP,Census_2018_Ethnicity_grouped_total_responses_Total_stated_CURP,Census_2018_Ethnicity_grouped_total_responses_level_1_9_Not_Elsewhere_Included_CURP,Census_2018_Ethnicity_grouped_total_responses_Total_CURP,Census_2018_Maori_descent_01_Maori_descent_CURP,Census_2018_Maori_descent_02_No_Maori_descent_CURP,Census_2018_Maori_descent_04_Dont_know_CURP,Census_2018_Maori_descent_Total_stated_CURP,Census_2018_Maori_descent_99_Not_elsewhere_included_CURP,Census_2018_Maori_descent_Total_CURP
0,SA1 7000000,141,75.0,66.0,141,48.1,24.0,30.0,69.0,21.0,...,0.0,141.0,0.0,141,135.0,6.0,3.0,141.0,0.0,141
1,SA1 7000001,114,60.0,54.0,114,36.5,30.0,21.0,48.0,18.0,...,0.0,114.0,0.0,114,96.0,18.0,0.0,114.0,0.0,114
2,SA1 7000002,0,,,0,,,,,,...,,,,0,,,,,,0
3,SA1 7000003,225,120.0,105.0,225,30.5,57.0,54.0,75.0,36.0,...,0.0,225.0,0.0,225,210.0,15.0,0.0,225.0,0.0,225
4,SA1 7000004,138,69.0,66.0,138,52.2,24.0,15.0,69.0,30.0,...,0.0,138.0,0.0,138,102.0,30.0,3.0,138.0,0.0,138


## Generate metadata
Now, we're going to generate some information about the datasets we're using.

In [14]:
samin = pd.DataFrame(sa1.min(0).rename('minimum'))

In [15]:
samax = pd.DataFrame(sa1.max(0).rename('maximum'))

In [6]:
sa1.isna().sum()

Area_code_and_description                                                                               0
Census_2018_usually_resident_population_count                                                           0
Census_2018_Sex_1_Male_CURP                                                                           575
Census_2018_Sex_2_Female_CURP                                                                         575
Census_2018_Sex_Total_CURP                                                                              0
Census_2018_median_age_CURP                                                                           592
Census_2018_Age_broad_groups_1_Under_15_years_CURP                                                    620
Census_2018_Age_broad_groups_2_15_to_29_years_CURP                                                    620
Census_2018_Age_broad_groups_3_30_to_64_years_CURP                                                    614
Census_2018_Age_broad_groups_4_65_years_and_ov

In [10]:
freqna = ((sa1.isna().sum())/len(sa1)).rename('NaN frequency')

Area_code_and_description                                                                             0.000000
Census_2018_usually_resident_population_count                                                         0.000000
Census_2018_Sex_1_Male_CURP                                                                           0.017681
Census_2018_Sex_2_Female_CURP                                                                         0.017681
Census_2018_Sex_Total_CURP                                                                            0.000000
Census_2018_median_age_CURP                                                                           0.018204
Census_2018_Age_broad_groups_1_Under_15_years_CURP                                                    0.019065
Census_2018_Age_broad_groups_2_15_to_29_years_CURP                                                    0.019065
Census_2018_Age_broad_groups_3_30_to_64_years_CURP                                                    0.018880
C

In [28]:
# define each variable.
# this is an unusual step, but saves time because all of the information about each column in the dataset will appear below.
sa1_defs = {
    'Definition
    'Area_code_and_description': ['unique identifier for statistical area 1s'], #uniqueid for sa1 
    'Census_2018_usually_resident_population_count': ['number of usual residents (as opposed to population at the moment of census collection) in a given statistical area 1 in 2018'], #total pop
    'Census_2018_Sex_1_Male_CURP': [''],
    'Census_2018_Sex_2_Female_CURP':[],
    'Census_2018_Sex_Total_CURP':[], #sex
    'Census_2018_median_age_CURP':[],
    'Census_2018_Age_broad_groups_1_Under_15_years_CURP':[],
    'Census_2018_Age_broad_groups_2_15_to_29_years_CURP':[], 
    'Census_2018_Age_broad_groups_3_30_to_64_years_CURP':[],
    'Census_2018_Age_broad_groups_4_65_years_and_over_CURP':[],
    'Census_2018_Age_broad_groups_Total_CURP':[], #age
    'Census_2018_Ethnicity_grouped_total_responses_level_1_1_European_CURP':[],
    'Census_2018_Ethnicity_grouped_total_responses_level_1_3_Pacific_Peoples_CURP':[],
    'Census_2018_Ethnicity_grouped_total_responses_level_1_2_Maori_CURP':[],
    'Census_2018_Ethnicity_grouped_total_responses_level_1_4_Asian_CURP':[],
    'Census_2018_Ethnicity_grouped_total_responses_level_1_5_Middle_Eastern_Latin_American_African_CURP':[],
    'Census_2018_Ethnicity_grouped_total_responses_level_1_6_Other_Ethnicity_CURP':[],
    'Census_2018_Ethnicity_grouped_total_responses_level_2_61_New_Zealander_CURP':[],
    'Census_2018_Ethnicity_grouped_total_responses_level_2_69_Other_Ethnicity_nec_CURP':[],
    'Census_2018_Ethnicity_grouped_total_responses_Total_stated_CURP':[],
    'Census_2018_Ethnicity_grouped_total_responses_level_1_9_Not_Elsewhere_Included_CURP':[],
    'Census_2018_Ethnicity_grouped_total_responses_Total_CURP':[], #ethnicity,
    'Census_2018_Maori_descent_01_Maori_descent_CURP':[],
    'Census_2018_Maori_descent_02_No_Maori_descent_CURP':[],
    'Census_2018_Maori_descent_04_Dont_know_CURP':[],
    'Census_2018_Maori_descent_Total_stated_CURP':[],
    'Census_2018_Maori_descent_99_Not_elsewhere_included_CURP':[],
    'Census_2018_Maori_descent_Total_CURP':[]
                                          }
sa1_defs = pd.DataFrame([sa1_defs])

In [42]:
# define each variable.
# this is an unusual step, but saves time because all of the information about each column in the dataset will appear below.
sa1_defs = {
    'Definition' :[
        'unique identifier for statistical area 1s', 
        'number of usual residents (as opposed to population at the moment of census collection) in a given statistical area 1 in 2018', #total pop
        'number of usual male residents in 2018',
        'number of usual female residents in 2018',
        'total number of usual residents for which data on sex was collected', #sex
        'median age of usual residents in 2018',
        'number of usual residents under 15 years old in 2018',
        'number of usual residents aged 15-29 in 2018', 
        'number of usual residents aged 30-64 in 2018',
        'number of usual residents aged 65 and older in 2018',
        'total number of usual residents placed in the 4 listed age groups in 2018', #age
        'total number of usual residents who self-identify as ethnically European (note that respondents may choose more than one ethnicity)',
        'total number of usual residents who self-identify ethnically as Pacific Peoples (note that respondents may choose more than one ethnicity)',
        'total number of usual residents who self-identify ethnically as Maori (note that respondents may choose more than one ethnicity)',
        'total number of usual residents who self-identify ethnically as Asian (note that respondents may choose more than one ethnicity)',
        'total number of usual residents who self-identify ethnically as Middle Eastern, Latin American, or African (note that respondents may choose more than one ethnicity)',
        'total number of usual residents who self-identify with another ethnicity not listed (note that respondents may choose more than one ethnicity)',
        'total number of usual residents who self-intentify ethnically as a New Zealander under ethnicities not listed (note that respondents may choose more than one ethnicity)',
        'total number of usual residents who self-identify ethnically with another ethnicity not listed under ethnicities not listed (note that respondents may choose more than one ethnicity. Note that this definition is interpered based on limited available information)',
        'total number of usual residents who self-identify ethnically with one or more of the listed groups',
        "total number of usual residents who responded 'Don't Know' or who's answer was refused, repeated, outside the scope, not stated, or unidentifiable to questions of ethnicity.",
        "total number of usual residents who self-identify ethnically with one or more of the listed groups or who responded 'Don't Know' or who's answer was refused, repeated, outside the scope, not stated, or unidentifiable to questions of ethnicity.", #ethnicity,
        'total number of usual residents who self-identify as being of Maori descent in 2018',
        'total number of usual residents who self-identify as not being of Maori descent in 2018',
        'total number of usual residents who self-identify as not knowing if they are of Maori descent in 2018',
        'total number of usual residents who self-identify as being of Maori descent, not of Maori descent, or not knowing if they are of Maori descent in 2018',
        'total number of usual residents who's response to questions about being of Maori descent were not stated or unidentifiable in 2018',
        'total number of usual residents who self ]
                                          }
sa1_defs = pd.DataFrame(sa1_defs)

In [36]:
#let's make a big table to display all the metadata
meta = pd.DataFrame(sa1.dtypes.rename('data type'))
#meta.insert(loc = 1, column = 'dtype', value = sa1.dtypes)
meta = meta.join(samin, rsuffix = 'min')
meta = meta.join(samax, rsuffix = 'max')
meta = meta.join(freqna, rsuffix = 'freq NA')
meta = meta.join(sa1_defs)

In [37]:
meta.head()

Unnamed: 0,data type,minimum,maximum,NaN frequency,Definition
Area_code_and_description,object,001 Far North District,Total NZ (Ward),0.0,
Census_2018_usually_resident_population_count,int64,0,4699755,0.0,
Census_2018_Sex_1_Male_CURP,float64,0.0,2319558.0,0.017681,
Census_2018_Sex_2_Female_CURP,float64,0.0,2380197.0,0.017681,
Census_2018_Sex_Total_CURP,int64,0,4699755,0.0,


In [43]:
sa1_defs.head()

Unnamed: 0,Definition
0,"[Area_code_and_description, unique identifier ..."
1,number of usual residents (as opposed to popul...
2,Census_2018_Sex_1_Male_CURP
3,Census_2018_Sex_2_Female_CURP
4,Census_2018_Sex_Total_CURP


### Analysis

Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions.
This section should explicitly define any spatial / statistical *models* and their *parameters*, including *grouping* criteria, *weighting* criteria, and *significance thresholds*.
Also explain any follow-up analyses or validations.



## Results

Describe how results are to be presented.



## Discussion

Describe how the results are to be interpreted *vis a vis* each hypothesis or research question.



## Integrity Statement

Include an integrity statement - The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.
If a prior registration *does* exist, explain the rationale for revising the registration here.



# Acknowledgements

- `Funding Name`: name of funding for the project
- `Funding Title`: title of project grant
- `Award info URI`: web address for award information
- `Award number`: award number

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)

## References