# Cloudy gap-filling notes

## ML

### Algorithm
 - Need to learn more about tuning hyperparameters for SVM
 - Feature extraction/engineering. What are the important characteristics of the features? How do I tell which features are important and which can be removed?
 - Should I stratify sampling equally (50/50 flood/nonflood), or should the % of flood/nonflood pixels be preserved?
 - Is accuracy a good metric? It's probably fine with stratified sampling, where the dataset is balanced (50/50 flood/nonflood), but if we want to search over clouds in real images, where flooding is quite rare, we might want to maximize another metric like recall or precision
     - Alternatives: balanced accuracy, recall, precision
 - http://bamos.github.io/2016/08/09/deep-completion/
 - https://datascience.stackexchange.com/questions/30930/accuracy-and-loss-dont-change-in-cnn-is-it-over-fitting
 - Try using ReLU instead of tanh in CNN
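
On the sampling question in the list above, here is a minimal sketch of the 50/50 option: undersample the larger class so both classes end up with equal counts. It assumes a 0/1 label array `y` (1 = flood) and a feature matrix `X`. `balanced_undersample` is a hypothetical helper, not part of the project code. Preserving the original flood fraction instead would be ordinary stratified sampling.

```python
import numpy as np

def balanced_undersample(X, y, seed=0):
    """Return a 50/50 flood/nonflood subset by undersampling the larger class.

    Hypothetical helper: X is (n_samples, n_features), y holds 0/1 labels.
    """
    rng = np.random.default_rng(seed)
    flood_idx = np.flatnonzero(y == 1)
    nonflood_idx = np.flatnonzero(y == 0)
    n = min(flood_idx.size, nonflood_idx.size)
    keep = np.concatenate([
        rng.choice(flood_idx, size=n, replace=False),
        rng.choice(nonflood_idx, size=n, replace=False),
    ])
    rng.shuffle(keep)  # so the flood samples do not all come first
    return X[keep], y[keep]

# Toy data: 1000 pixels, roughly 5% flooded
rng = np.random.default_rng(42)
y = (rng.random(1000) < 0.05).astype(int)
X = rng.normal(size=(1000, 4))
X_bal, y_bal = balanced_undersample(X, y)
```

The trade-off is that undersampling throws away most nonflood pixels, so a model trained this way sees a class ratio that real cloudy scenes will not have.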
 
### CNN
 - Need to shuffle data before splitting into train/validation
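
A minimal sketch of shuffling before the split, assuming the samples are held in NumPy arrays. `shuffled_split` and the 20% validation fraction are illustrative choices, not taken from the project code.

```python
import numpy as np

def shuffled_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle sample order, then split into train and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))          # random order of sample indices
    n_val = int(round(val_fraction * len(y)))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Labels sorted by class, as they might be after stacking flood then nonflood samples
y = np.array([1] * 50 + [0] * 50)
X = np.arange(100).reshape(-1, 1)
X_train, y_train, X_val, y_val = shuffled_split(X, y)
```

If the CNN is trained with Keras, note that `validation_split` in `fit` takes the last fraction of the samples before any shuffling, so sorted data would give a validation set drawn from only one part of the array.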
    
### Prediction
 - Should remove GSW_MaxExtent because the classifier might just be predicting based on that binary variable.
 

### Edge cases
 - Does the machine do better with large gaps than many smaller gaps?
 
### Function
 - Should ideally just have one long function in GEE that can handle different cases - when we input just one event (feature) or multiple (feature collection). Can add a conditional, that way we don't have to repeat as much code.
 - Should we include both GSW perm and max extent? Definitely mask flooded with perm, but maybe the distExtent is catching some of the flooding? 
 - We do need to calculate features over watersheds because otherwise some of the distExtent values will be incorrect
     - Currently (not sure if this is because of calculating only for image extent) distExtent is returning huge impossible values of 2147483648 (this is 2^31, so it may be an integer overflow or fill value rather than a real distance)
 - If memory becomes an issue, consider changing bands back to int once they are in array.
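
A rough sketch of the memory point in the last item, assuming the bands have been read into a float64 NumPy array and hold whole-number values. `to_smallest_int` is a hypothetical helper.

```python
import numpy as np

def to_smallest_int(arr):
    """Cast to the smallest signed integer dtype that holds the data range.

    Only valid when every value is a whole number; raises otherwise.
    """
    if not np.all(arr == np.round(arr)):
        raise ValueError("array has fractional or NaN values; casting would lose data")
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(dtype)
        if arr.min() >= info.min and arr.max() <= info.max:
            return arr.astype(dtype)
    raise ValueError("values do not fit in a 64-bit integer")

# Toy stack of bands, shape (rows, cols, bands), stored as float64
rng = np.random.default_rng(0)
bands_float = rng.integers(0, 5000, size=(500, 500, 8)).astype(np.float64)
bands_int = to_smallest_int(bands_float)

print(bands_float.nbytes, bands_int.nbytes)
```

Here the values fit in int16, which is a quarter of the float64 size. A no-data value such as -999999 does not fit in int16, so any band carrying it would need int32.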
 
### Problems
 - Something is happening when I export the images that have been masked by clouds - The non-cloudy areas should be null or NA, but instead the export image contains all values in the clip region as if from the entire image.
     - One solution might be to update mask but make the masked values all null or NoData
     - Solution: set the pixels I want to be no data to -999999, then remove those when converting the data to a sparse matrix in Python
 - BQA/pixel_qa bands in L8: They are smaller than the other bands, leaving gaps on the edges that are filled with zeroes when I export the image. This messes up all the features, esp. the flooded layer because it is binary and zeroes have meaning
     - Posted on SE and Developer Forum.
     - BQA problem: https://code.earthengine.google.com/71c60311da2029fd647bf86f6595037d
     - pixel_qa is the correct size sometimes: https://code.earthengine.google.com/6393d7232b5ba4540245efa2c85c1242
         - But pixel_qa shows the same small extent problem when I use it in my sampling function. It's definitely pixel_qa band, because flooded turns out fine.
             - However, there is no problem when I mask a TOA image with a SR pixel_qa mask here, outside of the sampling function: https://code.earthengine.google.com/dd74749034d1d660514f098017173875. What is happening in the sampler function to make this issue appear?
     - Problem appears to be with the getLandsatImages function. See: https://code.earthengine.google.com/845fcf316b23e9b1ff7cc06f50513041. Flooded2 layer is used with just a regular ee.ImageCollection call to TOA, flooded1 uses the getLandsatImages function
     - Actually never mind, when I set the date range, cloud cover, and bounds the same as the getLandsatImages call, both flooded1 and flooded2 are small. So, this appears to be a problem with some L8 images and not others? https://code.earthengine.google.com/8e44f6f71a544d98c0493d8d17bacec9
     - I think for now, masking the output image of features with the QA masks and then unmasking with -999999 allows me to sort out the issue in Python by removing these no data values.
 - Spectral bands and feature bands are different extents after export even when clipping to the same geometry
   - Solution: Reproject spectral bands to JRC GSW projection
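
The -999999 workaround from the list above could look roughly like this on the Python side. This is a sketch with a dense NumPy array rather than the sparse matrix mentioned in the notes. `valid_pixel_table` is a hypothetical helper, and it assumes the exported image has been read into an array of shape (rows, cols, bands).

```python
import numpy as np

NODATA = -999999  # sentinel written to masked pixels before export

def valid_pixel_table(image):
    """Flatten (rows, cols, bands) to (n_pixels, bands) and drop no-data pixels.

    A pixel is dropped if any of its bands equals NODATA. The flat indices of
    the kept pixels are returned so predictions can be mapped back to the image.
    """
    rows, cols, bands = image.shape
    flat = image.reshape(rows * cols, bands)
    valid = ~np.any(flat == NODATA, axis=1)
    return flat[valid], np.flatnonzero(valid)

# Toy 2 x 3 image with 2 bands
image = np.arange(2 * 3 * 2, dtype=np.float64).reshape(2, 3, 2)
image[0, 1, :] = NODATA  # pixel masked in every band
image[1, 2, 0] = NODATA  # pixel masked in one band only (e.g. a QA edge gap)
table, idx = valid_pixel_table(image)
```

Dropping a pixel when any band is missing is the conservative choice: it also removes the edge pixels where only the QA-derived bands are short.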

     

# Production Notes

### GEE
 - Calculating topo features for entire dataset in server, then clipping, rather than calculating for underlying watersheds.
 
### Good Floods from GFD app
http://global-flood-database.appspot.com/


| DFO # | Date | Clear images? (L8) | Images (LANDSAT/LC08/C01/T1_TOA) |
| --- | --- | --- | --- |
| 4101 | 2013-10-20 | yes | LC08_027038_20131103, LC08_027039_20131103 |
| 4115 | 2013-12-22 | yes | LC08_021033_20131227, LC08_021034_20131227 |
| 4230 | 2015-03-10 | no | |
| 4267 | 2015-06-18 | no | |
| 4314 | 2015-12-10 | yes (79 imgs) | |
| 4337 | 2016-03-08 | yes | LC08_026036_20160325, LC08_026038_20160325 |
| 4388 | 2016-08-14 | no | |


### Notes from eScience presentation

looking at just one flood event and training on the recurrence

imblearn in Python to help with imbalanced data (ask valentina)

inpainting for imputation (uses flows in image processing to fill values). maybe in scikit-image?

knowing what features are important would be very useful for understanding if these flood risk variables are actually relevant. 

### Next steps
 - I think I'm going to find one really good flood event, and then train hard on that. 

### Estimating Uncertainty
- Examples that don't quite fit, but I may glean some useful info from them
  - Kyle Dorman's ["Building a Bayesian Deep Learning Classifier"](https://github.com/kyle-dorman/bayesian-neural-network-blogpost)
  - Yarin Gal's [Concrete Dropout](https://github.com/yaringal/ConcreteDropout/blob/master/concrete-dropout-keras.ipynb)
  - hutec [UncertaintyNN](https://github.com/hutec/UncertaintyNN)
  - homaralex [MNIST example of MCD](https://github.com/homaralex/mc-dropout-mnist)
  - TF for R, [concrete dropout](https://blogs.rstudio.com/tensorflow/posts/2018-11-12-uncertainty_estimates_dropout/)
  

Note from Gal's [blog post](http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html): the predictive variance is calculated with an added inverse precision term, 1/tau, where tau is the model precision. Noted also [here](https://towardsdatascience.com/adding-uncertainty-to-deep-learning-ecc2401f2013).


```python
import numpy as np

# Predictive mean is np.mean(y), where y holds the sampled outputs of the T forward MC passes.
# Predictive variance is np.var(y) plus 1/tau (the inverse precision); tau is defined two ways below.

# Definition 1
def _tau_inv(keep_prob, N, l2=0.005, lambda_=0.00001):
    tau = keep_prob * l2 / (2. * N * lambda_)
    return 1. / tau

# Definition 2. Assumes model, input_x, T (number of MC passes), N (number of
# training samples) and l (prior length scale) are already defined.
probs = []
for _ in range(T):  # xrange is Python 2 only
    probs += [model.output_probs(input_x)]
predictive_mean = np.mean(probs, axis=0)  # was `prob`, which is undefined
predictive_variance = np.var(probs, axis=0)
tau = l**2 * (1 - model.p) / (2 * N * model.weight_decay)
predictive_variance += tau**-1
```
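
To see the second definition run end to end, here is a toy sketch. `ToyDropoutModel` is a hypothetical stand-in for a trained network whose dropout stays switched on at prediction time. Its weights and the values of `T`, `N` and `l` are made up for illustration.

```python
import numpy as np

class ToyDropoutModel:
    """Hypothetical one-layer model with dropout applied on every forward pass."""

    def __init__(self, weights, p=0.5, weight_decay=1e-5, seed=0):
        self.w = np.asarray(weights, dtype=float)
        self.p = p                        # dropout probability
        self.weight_decay = weight_decay
        self.rng = np.random.default_rng(seed)

    def output_probs(self, x):
        # A fresh Bernoulli mask per call, so repeated calls give different outputs
        mask = self.rng.random(x.shape) >= self.p
        z = (x * mask / (1.0 - self.p)) @ self.w
        return 1.0 / (1.0 + np.exp(-z))   # sigmoid

def mc_dropout(model, x, T=200, N=1000, l=1.0):
    """Predictive mean and variance from T stochastic forward passes."""
    samples = np.stack([model.output_probs(x) for _ in range(T)])
    mean = samples.mean(axis=0)
    tau = l**2 * (1 - model.p) / (2 * N * model.weight_decay)
    var = samples.var(axis=0) + 1.0 / tau  # add the inverse precision
    return mean, var

model = ToyDropoutModel(weights=[1.5, -2.0, 0.5])
x = np.array([[1.0, 0.2, 0.3], [0.1, 1.0, 0.0]])
mean, var = mc_dropout(model, x)
print(mean, var)
```

With these made-up settings tau is 25, so the 1/tau term adds a constant 0.04 to every pixel's variance. The sample variance is the part that differs between inputs.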

---

### Dealing with imbalanced dataset
#### Loss functions
- Possible to use a loss like weighted cross-entropy, which uses class weights (higher for the minority class) to prevent the model from minimizing loss by just predicting zeros.
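
A minimal NumPy sketch of that idea, assuming binary labels and predicted flood probabilities. The function name and the inverse-class-ratio weight are illustrative choices, not something fixed by these notes.

```python
import numpy as np

def weighted_binary_cross_entropy(y_true, p_pred, w_pos, w_neg=1.0, eps=1e-7):
    """Mean binary cross-entropy with separate weights for each class."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    per_sample = -(w_pos * y_true * np.log(p)
                   + w_neg * (1 - y_true) * np.log(1 - p))
    return per_sample.mean()

# 5% flood; weight the flood class by the nonflood/flood ratio
y_true = np.array([1] * 5 + [0] * 95)
w_pos = (y_true == 0).sum() / (y_true == 1).sum()

# A model that predicts "nonflood" (p = 0.01) everywhere
all_zero = np.full(100, 0.01)
unweighted = weighted_binary_cross_entropy(y_true, all_zero, w_pos=1.0)
weighted = weighted_binary_cross_entropy(y_true, all_zero, w_pos=w_pos)
print(unweighted, weighted)
```

The always-nonflood model gets a low unweighted loss but a much higher weighted one, which is the behaviour wanted here. If training is done in Keras, the `class_weight` argument of `fit` gives a similar effect without a custom loss.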

#### Performance metrics
- Want to monitor recall/precision during training, but can't figure out how. Instead, can just run the trained model once on the validation data after training, then use sklearn metrics.
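
A sketch of the after-training check, written in plain NumPy so the arithmetic is visible. It assumes validation labels `y_val` and predicted flood probabilities `p_val`. `binary_metrics` and the 0.5 threshold are illustrative.

```python
import numpy as np

def binary_metrics(y_true, p_pred, threshold=0.5):
    """Accuracy, balanced accuracy, precision and recall for the flood class."""
    y_true = np.asarray(y_true).astype(bool)
    y_hat = np.asarray(p_pred) >= threshold
    tp = np.sum(y_hat & y_true)
    fp = np.sum(y_hat & ~y_true)
    fn = np.sum(~y_hat & y_true)
    tn = np.sum(~y_hat & ~y_true)
    # Ratios with an empty denominator are reported as 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / y_true.size,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
    }

# Validation set with 5% flood and a model that predicts nonflood everywhere
y_val = np.array([1] * 5 + [0] * 95)
p_val = np.full(100, 0.1)
m = binary_metrics(y_val, p_val)
print(m)
```

This case shows why accuracy alone misleads on rare flooding: accuracy is 0.95 while recall is 0 and balanced accuracy is 0.5. The sklearn route mentioned above would use `precision_score`, `recall_score` and `balanced_accuracy_score` from `sklearn.metrics` on the same thresholded predictions.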