# Regression model for snow depth in time lapse images

SnowEx Hackweek 2021 

*#cam_learning*

__Contributors:__ Marianne Cowherd, Danny Hogan, Katie Breen, Ching-ping Yu

### __Objectives:__

- Train a regression model for extracting snow depth from time-lapse imagery using supervised learning
- Evaluate the model's accuracy
- Test potential improvements (e.g., cropping images) and suggest ideas for next steps
- Learn ML!

### __Motivations:__

- The 2020 SnowEx time-lapse imagery was labeled for snow depth, but the process was time-consuming.
- Automated methods exist using color thresholding and the Hough Transform, but background pixels add uncertainty.
- Computer vision may be able to detect the pole without including the background noise and identify the snow depth.
- The 2017 SnowEx time-lapse imagery has not been labeled, and a working ML model could be applied to these images.

In [1]:
### insert moving video of all the camera images and snow depths underneath (Katie will add once she gets the code from Cassie)

### __Methods__

We will use the 2020 SnowEx time-lapse images from one camera (W1A) as the predictor and the corresponding snow depth measurements in the SnowEx SQL database as the response to build a supervised model.

__1. Load packages for image and data table pre-processing, model development, and model evaluation__

In [2]:
# NOTE: this part of the tutorial uses additional libraries not in the default snowex jupyterhub
# mamba is a python package management alternative to conda and pip https://github.com/mamba-org/mamba
!mamba install -y -q tensorflow
!pip install opencv-python-headless 



In [3]:

#### Load packages for machine learning
import tensorflow as tf  # end-to-end open source platform for machine learning

# from tensorflow.keras.datasets import cifar10
# keras is python and uses tensorflow in the backend
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_percentage_error

#### Packages for image processing and computer vision 
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import cv2
import geopandas as gpd
import datetime as dt
from datetime import datetime

from sklearn.model_selection import train_test_split
from PIL import Image, ExifTags

### Packages for processing snow depth values using the SnowEx SQL database
import snowexsql.db
from snowexsql.data import PointData, SiteData
from snowexsql.conversions import query_to_geopandas
# get_db connects to the database; get_table_attributes lists a table's columns
from snowexsql.db import get_db, get_table_attributes


__2. After loading the packages, we load the images, which come from Amazon Web Services (AWS) S3; the code below reads them from a local directory. We focus on one camera, W1A.__

In [7]:
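## Load in the images
image_dir = '/tmp/camera-trap/W1A'
# Sort so that the row order is reproducible (os.listdir returns files in arbitrary order)
files = [os.path.join(image_dir, f) for f in sorted(os.listdir(image_dir))]

# One row per photo; 'depth' is filled in later from the SnowEx database
df = pd.DataFrame([],
                   columns=['date','photo_id','time','datetime','depth'])

for i, path in enumerate(files):
    img = Image.open(path)
    # Map numeric EXIF tag ids to readable names (e.g. 'DateTime')
    exif = { ExifTags.TAGS[k]: v for k, v in img._getexif().items() if k in ExifTags.TAGS }
    # EXIF timestamps use colons in the date part: 'YYYY:MM:DD HH:MM:SS'
    taken = datetime.strptime(exif['DateTime'],'%Y:%m:%d %H:%M:%S')
    df.loc[i] = [taken.date(), path, taken.time(), taken, np.nan]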
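
The timestamp parsing in the loop above can be isolated and tried without any image files. This is a minimal sketch, assuming the EXIF `DateTime` value is a string in the `YYYY:MM:DD HH:MM:SS` form used above; the helper name `parse_exif_datetime` and the sample timestamp are hypothetical and not part of the original notebook.

```python
from datetime import datetime

# Format used in the loading loop above: colons separate the date parts too.
EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_datetime(raw):
    """Parse an EXIF DateTime string; return None if it is missing or malformed.

    Hypothetical helper for illustration only.
    """
    if not raw:
        return None
    try:
        return datetime.strptime(raw.strip(), EXIF_DATETIME_FORMAT)
    except ValueError:
        return None


# Made-up timestamp, split into the same date and time parts stored in df.
stamp = parse_exif_datetime("2020:02:01 13:30:00")
print(stamp.date(), stamp.time())
```

Returning `None` instead of raising would let a loading loop skip photos without a usable timestamp; whether that is the right behavior for this dataset is a judgment call.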

__3. Read in the images using cv2's imread function, then downscale them to 200 x 200 pixels. `pixels` collects every image as an RGB array.__

In [18]:
pixels = []
# every image is resized to a fixed 200 x 200 pixels, whatever its original size
dsize = (200, 200)  # (width, height)
for i in range(0, len(df)):
    path = df['photo_id'][i]
    src = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    # resize image
    output = cv2.resize(src, dsize)
    # write and re-read the resized image; imread's default flag returns a 3-channel BGR array
    cv2.imwrite('tmp.jpg',output)
    img2 = cv2.imread('tmp.jpg')
    # OpenCV loads images as BGR, so convert to RGB
    img2 = cv2.cvtColor(img2,cv2.COLOR_BGR2RGB)
    pixels.append(np.array(img2))

# stack into one array of shape (number of images, 200, 200, 3)
pixels = np.array(pixels)
print(pixels.shape)

(1, 200, 200, 3)


__4. We will now flatten the images into a dataframe with one row for each image and one column for each RGB pixel value, plus a final column for the snow depth.__

In [17]:
# use the actual number of loaded images instead of a hard-coded 659
n_images = len(pixels)
# one row per image: 200 * 200 * 3 = 120000 pixel values
dataset = pixels.reshape((n_images,-1))
# append the snow depth as the last column
depth_column = np.array(df['depth'], dtype=float).reshape((n_images,1))
dataset = np.concatenate((dataset, depth_column),axis=1)
dataset=pd.DataFrame(dataset)

Note: This data table has one row per image and 120,001 columns: 200 x 200 x 3 = 120,000 pixel values plus the snow depth!
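
The flattening step can be checked on a tiny synthetic array before running it on real images. This is a sketch under stated assumptions: `pixels_demo` and `depths_demo` are made-up stand-ins for `pixels` and `df['depth']`, with 2 x 2 images instead of 200 x 200 so the shapes are easy to verify by hand.

```python
import numpy as np

# Synthetic stand-in for `pixels`: 4 tiny RGB "images" of 2 x 2 pixels.
n_images, height, width, channels = 4, 2, 2, 3
pixels_demo = np.arange(
    n_images * height * width * channels, dtype=float
).reshape(n_images, height, width, channels)

# Made-up depths; NaN marks an image without a matching measurement.
depths_demo = np.array([10.0, 12.5, np.nan, 20.0])

# One row per image, one column per pixel value (2 * 2 * 3 = 12 columns).
flat = pixels_demo.reshape((len(pixels_demo), -1))

# Append the depth as the last column, giving 13 columns in total.
table = np.concatenate((flat, depths_demo.reshape((-1, 1))), axis=1)

print(flat.shape)   # (4, 12)
print(table.shape)  # (4, 13)
```

Taking the row count from `len(pixels_demo)` rather than a fixed number keeps the reshape valid however many images were actually loaded.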

__5. Pull the snow depth values from the SnowEx SQL database__

In this case, we pull all the snow depth data for camera W1A.

In [None]:
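# Connect to the SnowEx database.
# NOTE: `db_name` (the connection string) is not shown in this notebook and must be defined first.
engine, session = get_db(db_name)

# Use the function to see what columns are available to use.
db_columns = get_table_attributes(PointData)

# Print out the results nicely
# print("These are the available columns in the table:\n \n* {}\n".format('\n* '.join(db_columns)))

# Grab the open site data from the db
open_site = 'W1A'
qry = session.query(PointData).filter(PointData.equipment.contains(open_site))
df_open = query_to_geopandas(qry,engine)

# Copy the depth measured at each photo's timestamp into df['depth'].
# Assumes df_open has a 'datetime' column comparable to df['datetime'].
for i in range(len(df)):
    pivot = df['datetime'][i]
    matches = np.where(df_open['datetime']==pivot)[0]
    if len(matches)>0:
        # use the first match; photos without a match keep NaN
        df.loc[i, 'depth'] = df_open['value'].iloc[matches[0]]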
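
The timestamp matching in the loop above can also be written as a left join. The sketch below uses made-up tables, not SnowEx data: `photos` stands in for `df` and `depths` for the query result, and it assumes both share an exactly comparable `datetime` column and that depths live in a `value` column.

```python
import pandas as pd

# Made-up photo table (stand-in for `df`).
photos = pd.DataFrame({
    "photo_id": ["a.jpg", "b.jpg", "c.jpg"],
    "datetime": pd.to_datetime([
        "2020-01-01 12:00:00",
        "2020-01-02 12:00:00",
        "2020-01-03 12:00:00",
    ]),
})

# Made-up depth table (stand-in for the database query result).
depths = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-01-01 12:00:00", "2020-01-03 12:00:00"]),
    "value": [55.0, 61.0],
})

# Keep the first depth per timestamp so the join cannot duplicate photo rows.
depths = depths.drop_duplicates(subset="datetime")

# Left join: every photo is kept; photos without a measurement get NaN.
labeled = photos.merge(depths, on="datetime", how="left")
labeled = labeled.rename(columns={"value": "depth"})
print(labeled)
```

A join avoids the per-row search, but like the loop it only matches identical timestamps; photos taken between measurements would need a tolerance-based match instead.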


__6. We will repeat the steps above with cropped images and compare the results from the full images and the cropped images.__

We will collapse the code to save space. 
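
The cropping itself is not shown in this section. As a rough sketch of what cropping an image array involves, the example below slices a synthetic 200 x 200 image down to a band around a fake pole. The helper `crop_image` and the crop coordinates are hypothetical; the crop box actually used for camera W1A is not given here.

```python
import numpy as np


def crop_image(image, top, bottom, left, right):
    """Return image[top:bottom, left:right] (rows first, then columns).

    Hypothetical helper for illustration only.
    """
    rows, cols = image.shape[0], image.shape[1]
    if not (0 <= top < bottom <= rows and 0 <= left < right <= cols):
        raise ValueError("crop box lies outside the image")
    # Copy so later edits to the crop do not modify the original image.
    return image[top:bottom, left:right].copy()


# Synthetic black image with a bright vertical "pole" in columns 90-109.
image_demo = np.zeros((200, 200, 3), dtype=np.uint8)
image_demo[:, 90:110] = 255

# Keep all rows but only columns 80-119, which contain the pole.
pole = crop_image(image_demo, top=0, bottom=200, left=80, right=120)
print(pole.shape)  # (200, 40, 3)
```

Cropping before resizing keeps more of the pole's detail in the final 200 x 200 input than cropping afterwards, which may matter when comparing against the full-image results.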

In [26]:
# build the path of each cropped image
# (assumes the cropped files keep the original file names)
cropped =  ['/data/cropped/W1A/' + os.path.basename(f) for f in files]


In [27]:
cropped