# EAS4940 Final Project - Predicting Surging Glaciers in Svalbard

Glaciers are massive bodies of ice that flow under their own weight across the earth they sit on, like a big pool of syrup spreading out on a pancake. A particular type of glacier, called a “surging” glacier, experiences cyclic velocity changes. It may flow slowly for years or decades before suddenly accelerating to speeds as fast as 4-5 meters per day over a short period of weeks to months, and then return to its slow flow speed. These surges can lead to a sudden spike in ice mass loss and can also be extremely hazardous. For example, in the Himalayas, surging glaciers can breach ice dams, causing glacial lake outburst floods that can be catastrophic for people and infrastructure downstream. You can see an example of a surging glacier in Alaska in this timelapse video from the National Park Service: https://www.nps.gov/media/video/view.htm?id=F1010B88-4C85-4EBB-949E-582D2E498372

Glacier surges are challenging to predict. In fact, even determining whether a given glacier is likely to surge, let alone when and why a surge will occur, can be difficult. In this project, you will use a series of glacier features to try to predict which glaciers may be prone to surges. The dataset you will use was prepared and published as part of the following paper:

Bouchayer, C., Aiken, J. M., Thøgersen, K., Renard, F., & Schuler, T. V. (2022). A Machine learning framework to automate the classification of surge-type glaciers in Svalbard. Journal of Geophysical Research: Earth Surface, 127, e2022JF006597. https://doi.org/10.1029/2022JF006597 

These data come from Svalbard, an Arctic archipelago that is part of Norway. Using a combination of field measurements, satellite data, and numerical models, the authors created a dataset that contains the following features, sampled roughly every 100 m along the centerline of every glacier in Svalbard.

**rgiid**: glacier ID in the Randolph Glacier Inventory. This is a unique identifier for each glacier. While you will not want to use this feature to train your model, it is useful for later analysis when you may want to identify which data points belong to a particular glacier.  
**name**: the glacier name (if it has one)         
**x**: the x coordinate of the data point in polar stereographic coordinates      
**y**: the y coordinate of the data point in polar stereographic coordinates        
**surge**: type of surging glacier from the Randolph Glacier Inventory. 0 = does not surge, 1 = surging glacier. These are your class labels.            
**bed_elev**: elevation of the bedrock under the glacier in meters          
**thickness**: glacier ice thickness in meters          
**surface_elev**: glacier surface elevation in meters      
**width**: width of the glacier in meters           
**runoff**: average annual runoff in millimeters of water equivalent          
**bed_slope**: slope of the bedrock under the glacier in degrees         
**surface_slope**: slope of the glacier surface in degrees          
**driving_stress**: the forces acting to drive glacier flow downhill due to gravity and pressure gradients in the ice in Pascals     
**WH**: the glacier width divided by its thickness (aspect ratio)         
**cmb**: climatic mass balance in millimeters of water equivalent per year   

You can access the data at:  
https://raw.githubusercontent.com/rtculberg/ml_in_eas/main/data/GlacierSurge_TrainTest.csv      
https://raw.githubusercontent.com/rtculberg/ml_in_eas/main/data/GlacierSurge_Unseen.csv  
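
One way to read these files is with pandas, which accepts a URL directly. The sketch below is only an illustration, not a required approach: the helper `load_glacier_table` and the two-row table are made up for the example, and the column check assumes the CSV headers match the feature names listed above.

```python
import io

import pandas as pd

TRAIN_TEST_URL = (
    "https://raw.githubusercontent.com/rtculberg/ml_in_eas/main/data/"
    "GlacierSurge_TrainTest.csv"
)

FEATURES = [
    "bed_elev", "thickness", "surface_elev", "width", "runoff",
    "bed_slope", "surface_slope", "driving_stress", "WH", "cmb",
]


def load_glacier_table(source):
    """Read a glacier table from a URL, a file path, or a file-like object.

    Hypothetical helper: it assumes the column names match FEATURES.
    """
    df = pd.read_csv(source)
    missing = [col for col in FEATURES if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df


# Made-up two-row table so the sketch runs without a network connection.
# The second row leaves thickness and WH empty to mimic NaN values.
header = "rgiid,surge," + ",".join(FEATURES)
fake_csv = io.StringIO(
    header + "\n"
    "FAKE-001,0,120,85,205,900,400,3.1,4.2,61000,10.6,-350\n"
    "FAKE-002,1,40,,260,1500,650,1.2,2.0,78000,,-520\n"
)
demo = load_glacier_table(fake_csv)

# Count missing values per feature before deciding how to handle them.
nan_counts = demo[FEATURES].isna().sum()
print(nan_counts)

# With the real data: df = load_glacier_table(TRAIN_TEST_URL)
```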

Your job is to build and evaluate a binary classification model to categorize glaciers as surging or non-surging based on these 10 features: bed_elev, thickness, surface_elev, width, runoff, bed_slope, surface_slope, driving_stress, WH, and cmb.          

## [1] Prepare Your Data for Machine Learning
Use the data from GlacierSurge_TrainTest.csv to train and test your model. At the end of the project, you will apply your model to the glaciers in GlacierSurge_Unseen.csv to predict whether they are likely to surge or not. When preparing your data for your machine learning model, be sure to consider the presence of NaN values, how you implement the train-test split, and whether you also need to normalize your input features, conduct PCA, etc., based on what you find in the data.
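
The sketch below shows one possible way to handle these preparation steps; it is not the only valid design. It rests on several assumptions: rows with NaN values are simply dropped (imputation is an alternative), the split is done by glacier using `rgiid` so that points from one centerline do not land in both the training and testing sets, and the scaler is fit on the training rows only. The synthetic table and the helper names `make_synthetic_glaciers` and `prepare_split` are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "bed_elev", "thickness", "surface_elev", "width", "runoff",
    "bed_slope", "surface_slope", "driving_stress", "WH", "cmb",
]


def make_synthetic_glaciers(n_glaciers=20, points_per_glacier=15, seed=0):
    """Build a made-up table with the same column layout as the project data."""
    rng = np.random.default_rng(seed)
    frames = []
    for g in range(n_glaciers):
        label = g % 2  # alternate non-surging / surging glaciers
        values = rng.normal(loc=label, size=(points_per_glacier, len(FEATURES)))
        frame = pd.DataFrame(values, columns=FEATURES)
        frame["rgiid"] = f"FAKE-{g:03d}"
        frame["surge"] = label
        frames.append(frame)
    df = pd.concat(frames, ignore_index=True)
    # Blank out every 17th thickness value to mimic missing data.
    df.iloc[::17, df.columns.get_loc("thickness")] = np.nan
    return df


def prepare_split(df, test_size=0.25, seed=0):
    """Drop NaN rows, split by glacier, and scale using training statistics."""
    clean = df.dropna(subset=FEATURES + ["surge"]).reset_index(drop=True)
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(
        splitter.split(clean[FEATURES], clean["surge"], groups=clean["rgiid"])
    )
    train, test = clean.iloc[train_idx], clean.iloc[test_idx]
    scaler = StandardScaler().fit(train[FEATURES])  # fit on training rows only
    return {
        "X_train": scaler.transform(train[FEATURES]),
        "X_test": scaler.transform(test[FEATURES]),
        "y_train": train["surge"].to_numpy(),
        "y_test": test["surge"].to_numpy(),
        "groups_train": train["rgiid"].to_numpy(),
        "groups_test": test["rgiid"].to_numpy(),
        "scaler": scaler,
    }


df = make_synthetic_glaciers()
split = prepare_split(df)
print(split["X_train"].shape, split["X_test"].shape)
```

Keeping the fitted scaler makes it possible to apply exactly the same transformation to the unseen glaciers later.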

## [2] Select and Implement a Model
You may use any binary classification model that we discussed in class for this project – Logistic Regression, SVM, or Random Forest. Be sure to consider whether your chosen model has any hyperparameters and how you will tune them to create the best possible predictions. 
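
As an illustration of hyperparameter tuning, the sketch below runs a grid search over the regularization strength `C` of a logistic regression. It is a minimal example under stated assumptions: the data are synthetic, the grid values are arbitrary, F1 is used as the selection metric only for the sake of the example, and the folds are grouped by a made-up glacier index so that points from one glacier stay in the same fold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 30 glaciers, 10 points each, 10 features per point.
rng = np.random.default_rng(0)
n_glaciers, points_per_glacier, n_features = 30, 10, 10
groups = np.repeat(np.arange(n_glaciers), points_per_glacier)
y = groups % 2  # one label per glacier, shared by all of its points
X = rng.normal(size=(groups.size, n_features)) + 1.5 * y[:, None]

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Arbitrary example grid for the inverse regularization strength C.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=GroupKFold(n_splits=5),
    scoring="f1",
)
search.fit(X, y, groups=groups)

best_C = search.best_params_["logisticregression__C"]
print(f"best C = {best_C}, mean cross-validated F1 = {search.best_score_:.3f}")
```

The same pattern applies to an SVM or a Random Forest by swapping the estimator and the parameter grid.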

## [3] Evaluate Your Model
Choose and calculate appropriate evaluation metrics for your model. Be sure to consider best practices like cross-validation when designing your evaluation plan. 

## [4] Apply Your Model to Unseen Data
Use your model to predict whether the withheld glaciers that were categorized as surge = 1 (possible surging glaciers) are more likely to be surging or non-surging glaciers. Plot two maps: one showing all of the glacier data points from the training and testing dataset in x-y space with the data points colored by their surge category, and one showing the unseen glaciers and their predicted surge category. As we did in Problem Set #3, use the correlation between the final classifications for each data point and the input features to assess feature importance for predicting surge type.
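
The two maps and the correlation-based feature ranking could be sketched as follows. This is an illustration on synthetic data, not a reproduction of the Problem Set #3 method: the column name `predicted_surge`, the helper names, and the colors are hypothetical, and the fake predictions are tied to `thickness` only so that the ranking has something to find. The correlation used here is the Pearson correlation between each feature and the 0/1 predicted class.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

FEATURES = [
    "bed_elev", "thickness", "surface_elev", "width", "runoff",
    "bed_slope", "surface_slope", "driving_stress", "WH", "cmb",
]


def plot_surge_map(ax, df, label_col, title):
    """Scatter data points in x-y space, colored by a 0/1 column."""
    for value, color, name in [(0, "tab:blue", "non-surging"), (1, "tab:red", "surging")]:
        subset = df[df[label_col] == value]
        ax.scatter(subset["x"], subset["y"], s=4, color=color, label=name)
    ax.set_xlabel("x (polar stereographic)")
    ax.set_ylabel("y (polar stereographic)")
    ax.set_title(title)
    ax.set_aspect("equal")
    ax.legend()


def rank_features_by_correlation(df, pred_col):
    """Correlate each feature with the predicted class, sorted by magnitude."""
    corr = df[FEATURES].corrwith(df[pred_col])
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


def make_fake_points(n, seed):
    """Made-up data points with random features and coordinates."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.normal(size=(n, len(FEATURES))), columns=FEATURES)
    df["x"] = rng.uniform(0.0, 1.0e5, size=n)
    df["y"] = rng.uniform(0.0, 1.0e5, size=n)
    return df


train_test = make_fake_points(400, seed=2)
train_test["surge"] = (train_test["thickness"] > 0).astype(int)

unseen = make_fake_points(400, seed=3)
# Stand-in for model output; with a real model this would be model.predict(...).
unseen["predicted_surge"] = (unseen["thickness"] > 0).astype(int)

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
plot_surge_map(axes[0], train_test, "surge", "Training and testing data")
plot_surge_map(axes[1], unseen, "predicted_surge", "Unseen glaciers (predicted)")
fig.tight_layout()

ranking = rank_features_by_correlation(unseen, "predicted_surge")
print(ranking)
```

A correlation near zero does not rule out a nonlinear relationship, so this ranking is best read as a first look rather than a complete picture of feature importance.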