<b><center>Maxwell Lindsay</center></b>
<center>5243610</center>

# Step 1 Data loading
- Load the USGS Landsat 7 Top of Atmosphere Reflectance Tier 1 data over a region of your interest. FYI: if you want to know what the difference between the different Tiers is: https://www.usgs.gov/media/videos/landsat-collections-what-are-tiers

The geographic area selected is a 1 x 1 degree box centered around Washington, DC, USA. 

# Step 2 Naive annual composites

## Assignment

- Apply the map-reduce principle to convert the image collection of Step 1 into annual composites with different filtering & reducing conditions. Within this step, implement:
    - A: Different filtering conditions based on cloud cover metadata:
        - a1: no filtering
        - a2: filtering on <50% cloud cover
        - a3: filtering on <20% cloud cover
    - B: Use the result of a2 and test the effect of masking the clouds & cloud-shadows using the BQA bitmask:
        - b0: without cloud masking
        - b1: with the clouds masked
        - b2: with clouds and shadow masked
    - C: For every step of A&B visualize different reducers for the RGB-bands: 
        - mean 
        - median 
        - count
        - 0.05 percentile
        - 0.95 percentile

These operations were applied in the Earth Engine Python API. The full results are contained in the attached Jupyter notebook. Some selected images are excerpted in the report below.
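The filter-then-reduce structure of parts A and C can be sketched in plain Python. This is a toy model, not the notebook's code: the scene dictionaries, the `red` pixel lists and the helper names are hypothetical, and `percentile` uses linear interpolation, which may differ from how Earth Engine's `ee.Reducer.percentile` interpolates. In Earth Engine itself the filter would act on each scene's `CLOUD_COVER` property (e.g. with `ee.Filter.lt`) and the reduce step would use reducers such as `ee.Reducer.median()`.

```python
import statistics

# Hypothetical toy scenes: CLOUD_COVER is scene-level metadata (percent) and
# "red" is a tiny list of red-band reflectances standing in for a whole image.
SCENES = [
    {"id": "s1", "CLOUD_COVER": 5.0, "red": [0.04, 0.05, 0.06]},
    {"id": "s2", "CLOUD_COVER": 35.0, "red": [0.05, 0.30, 0.07]},
    {"id": "s3", "CLOUD_COVER": 80.0, "red": [0.60, 0.55, 0.65]},
    {"id": "s4", "CLOUD_COVER": 15.0, "red": [0.05, 0.06, 0.05]},
]


def filter_cloud_cover(collection, max_cover=None):
    """Metadata filter: keep scenes with CLOUD_COVER < max_cover (None keeps all)."""
    if max_cover is None:
        return list(collection)
    return [s for s in collection if s["CLOUD_COVER"] < max_cover]


def percentile(values, p):
    """Percentile (p in 0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


REDUCERS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "count": len,
    "p5": lambda v: percentile(v, 5),
    "p95": lambda v: percentile(v, 95),
}


def reduce_collection(collection, band, reducer):
    """Per-pixel reduce: collect each pixel's values across scenes, then reduce."""
    n_pixels = len(collection[0][band])
    return [reducer([s[band][i] for s in collection]) for i in range(n_pixels)]


for name, max_cover in [("a1", None), ("a2", 50), ("a3", 20)]:
    subset = filter_cloud_cover(SCENES, max_cover)
    composites = {r: reduce_collection(subset, "red", f) for r, f in REDUCERS.items()}
    print(name, composites["count"], composites["median"])
```

Making the threshold a parameter means a1, a2 and a3 differ in a single argument, so one parametrized function could generate all three sets of composites.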

## Questions

- **Q1: What are the differences you observe between the different composites (e.g. number of scenes, cloud/noise/SLC artefacts, reflectance values and visual appearance) and explain their cause?**

The median, mean, 5th percentile and 95th percentile reductions of the image collections outlined above were calculated using the Earth Engine Python API in Jupyter notebooks. For this section the annual composite for 2018 was chosen arbitrarily, but the code can easily be adjusted to compare annual composites for other years.

One of the most noticeable patterns between the different composites is that mean composites tend to be whiter than median composites, indicating higher reflectance in all visible bands. The mean reducer is much more sensitive to high outliers than the median reducer. Because clouds and snow have a much higher albedo than vegetation and other land cover, they reflect much more energy, so scenes with snow and clouds raise the average value of the affected pixels by a large amount. The median is much less sensitive to extremely high and extremely low values, so the clouds have less effect on it, though some influence remains. This can be seen when comparing the mean image:

![a1mean](output_figs/a1-mean-2018.jpg)

Versus the median image:

![a1median](output_figs/a1-median-2018.jpg)
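A tiny numeric example makes the outlier argument concrete. The reflectance values below are invented for illustration (four clear-sky observations of a vegetated pixel and one cloudy one); they are not taken from the composites above.

```python
import statistics

# Invented red-band reflectances for one pixel over a year:
# four clear observations of vegetation and one cloudy observation.
clear = [0.04, 0.05, 0.05, 0.06]
cloudy = [0.60]
stack = clear + cloudy

mean_value = statistics.mean(stack)      # pulled strongly toward the cloud value
median_value = statistics.median(stack)  # stays at a clear-sky value
clear_mean = statistics.mean(clear)      # what the mean would be without the cloud

print(f"mean={mean_value:.3f} median={median_value:.3f}")  # prints: mean=0.160 median=0.050
```

A single cloudy observation more than triples the mean, while the median does not move, which is the whitening effect visible in the mean composite.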

The 5th percentile reducer is not very sensitive to cloud cover because it draws on the lowest reflectance values in the stack, which come from vegetation and other land cover. One downside is that its true color visualization appears darker than the median, and this lack of contrast can make it harder to distinguish different areas of the composite. This can be at least partially compensated by adjusting the visualization parameters so that a lower reflectance is mapped to the top of the color range. The opposite is true of the 95th percentile reducer, which shows almost exclusively the high reflectance values caused by cloud cover, so its true color images show little but clouds.
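To see why p95 is dominated by clouds even when only a few observations are cloudy, consider one invented pixel stack in which 2 of 20 observations are cloudy. The `percentile` helper is a hypothetical linear-interpolation implementation; Earth Engine's percentile reducer may interpolate differently, but with 10% cloudy observations any reasonable definition of the 95th percentile lands on a cloud value.

```python
def percentile(values, p):
    """Percentile (p in 0-100) with linear interpolation between ranks."""
    if not values:
        raise ValueError("cannot take a percentile of an empty stack")
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


# Invented red-band stack for one pixel: 18 clear observations, 2 cloudy ones.
stack = [0.05] * 9 + [0.06] * 9 + [0.55, 0.70]

p5 = percentile(stack, 5)    # drawn from the darkest, clear-sky values
p95 = percentile(stack, 95)  # interpolated between the two cloudy values
print(f"p5={p5:.3f} p95={p95:.4f}")
```

The same stack gives a clean, dark p5 value and a cloud-bright p95 value, matching the dark p5 composite and the nearly all-white p95 composite.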

When comparing the results of a1 to a2 and a3, we can see that filtering on cloud cover affects each reducer according to its sensitivity to clouds. The mean and p95 reducers are the most influenced by cloud reflectance values, so they change the most when the number of cloudy scenes is reduced. The mean image has less cloud noise, and the p95 image goes from being essentially all white to showing much more of the land cover reflectance, especially with the a3 image collection, as seen below:

![a3-p95](output_figs/a3-p95-2018.jpg)

One notable thing is that we see some SLC artifacts at the edges of the image. This is due to how the images were selected: the image collection was filtered for scenes that have pixels within a certain distance of Washington, DC. Therefore some areas of this composite come from overlapping scenes and some do not. Where there are more overlapping images to draw from, there are more cloud artifacts in the p95 composite, because there is a higher chance that any individual pixel is cloudy in at least one scene even if the rest of that scene has low cloud cover, and any cloudy observations have a large impact on the p95 value.

When comparing the median and p5 composites between a1, a2 and a3, we find that there is less of an effect from removing cloudy scenes. This is expected since cloudy pixels have less impact on these reducers. However, the visual darkness of the p5 does increase when filtering the cloudy images. With the median reducer, filtering clouds makes the SLC artifacts less noticeable on the overlapping parts of the scenes. 

In part B, instead of removing images with too much cloud cover, we simply mask the cloudy pixels. Therefore, high-quality pixels can still be included in the reduction, even if other parts of the same image are cloudy. These composites are less noisy than the a2 collection. This can be seen when directly comparing the median images:

![a2-median](output_figs/b0Median_zoomedout.jpg)

![b2-median](output_figs/b2Median_zoomedout.jpg)

The masked median image has a much higher visual quality (i.e. less cloud noise). This image was captured zoomed out to better show the SLC artifacts. The SLC failure is visible at the edges of the image, where there are fewer overlapping scenes to draw on. The striated pattern occurs because there are not enough unmasked pixels to fill these gaps. Some striation will occur in all post-SLC-failure composites without overlapping edges, and because there is less data in these areas, masking cloudy pixels brings only limited improvement there.
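The per-pixel masking in part B can be sketched by decoding the BQA word directly. This assumes the Landsat Collection 1 BQA layout in which bit 4 flags cloud and bits 7-8 hold the cloud-shadow confidence (0-3); check the band description in the Earth Engine catalog before reusing these bit positions. The pixel values and BQA words are invented, and treating only "high" shadow confidence (3) as shadow is an illustrative choice.

```python
import statistics

# ASSUMED Collection 1 BQA bit layout (verify against the dataset description):
CLOUD_BIT = 4           # bit 4: cloud (1 = yes)
SHADOW_CONF_START = 7   # bits 7-8: cloud-shadow confidence, 0 (none) .. 3 (high)


def extract_bits(word, start, width=1):
    """Return `width` bits of an integer QA word, starting at bit `start`."""
    return (word >> start) & ((1 << width) - 1)


def is_masked(bqa, mask_clouds, mask_shadows):
    cloud = extract_bits(bqa, CLOUD_BIT) == 1
    shadow = extract_bits(bqa, SHADOW_CONF_START, 2) == 3  # high confidence only
    return (mask_clouds and cloud) or (mask_shadows and shadow)


def masked_median(values, bqa_words, mask_clouds=False, mask_shadows=False):
    """Median over the observations that survive the mask; None if all are masked."""
    kept = [v for v, q in zip(values, bqa_words)
            if not is_masked(q, mask_clouds, mask_shadows)]
    return statistics.median(kept) if kept else None


# Invented stack for one pixel: 2 clear, 2 cloudy and 1 shadowed observation.
values = [0.05, 0.06, 0.60, 0.65, 0.02]
bqa = [0, 0, 1 << CLOUD_BIT, 1 << CLOUD_BIT, 3 << SHADOW_CONF_START]

b0 = masked_median(values, bqa)                                       # no masking
b1 = masked_median(values, bqa, mask_clouds=True)                     # clouds masked
b2 = masked_median(values, bqa, mask_clouds=True, mask_shadows=True)  # clouds + shadows
```

The `None` branch corresponds to pixels with no usable observation left at all, as in the striped edge areas discussed above: when every observation is masked or missing, there is nothing to reduce.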

When directly comparing pre- and post-SLC-failure images, we can see that there is no striping at the edges of the pre-failure image (median composite of all images in 2001). The two images are shown below:

![prefail](output_figs/prefail-median-2001.jpg)

![PostFailure](output_figs/postfail-median-2018.jpg)

The post-failure image shows the typical striping pattern caused by the SLC failure, especially at the edges of the images. In the pre-failure image, despite the lack of striping, we do see significant cloud noise in the non-overlapping parts of the composite. This is because there are fewer images for each pixel in these areas, so the median value is more sensitive to outliers from cloud reflectance. Areas with overlapping imagery have more data and therefore less cloud noise. After the failure there is even less data at the edges of the images, so the benefit of the overlap is slightly reduced.


- **Q2: Discuss what the impact of these differences could be on subsequent big data analysis and how it might affect the veracity of the analysis.**

The examples in Q1 show the significant impact of clouds and snow on the observed reflectance: their albedo is much higher than that of other land cover types, so if they are not handled carefully they can have an outsized impact on processing, especially when taking averages. Many remote sensing applications are primarily interested in the land surface rather than in clouds, so a reliable way to filter clouds out of the analysis leads to more reliable results from the algorithms applied to the imagery. Conversely, unhandled clouds, shadows and SLC gaps bias the composites, and that bias carries through to any analysis built on them, lowering its veracity.


# Step 3 Greenest pixel composites
## Assignment

- Use the map-principle to calculate the normalized difference vegetation index (NDVI) for every image and use this index to reduce the collection to a greenest pixel composite for every year.
- Visualize the results and compare them to the results of Step 2.

An NDVI band was added to every image in the Landsat stack, and the stack was then composited for 2018 based on the maximum NDVI. The resulting true color composite and the maximum NDVI values are visualized below:

Greenest pixel composite:

![Greenestpixel](output_figs/gre_pix_2018.jpg)

The maximum NDVI is visualized below, with greener areas having an NDVI closer to 1 and white areas an NDVI close to zero:

![NDVIMax](output_figs/ndvi_max_2018.jpg)
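The map and reduce steps of the greenest pixel composite can be sketched for a single pixel. NDVI is (NIR - red) / (NIR + red), which for Landsat 7 means bands B4 and B3. In Earth Engine the map step would typically use `normalizedDifference(['B4', 'B3'])` and the reduce step `qualityMosaic('NDVI')`; the plain-Python functions and observation values below are hypothetical stand-ins for that logic.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index; None when the denominator is zero."""
    total = nir + red
    return (nir - red) / total if total else None


def add_ndvi(obs):
    """'Map' step: return a copy of one observation with an NDVI band added."""
    out = dict(obs)
    out["NDVI"] = ndvi(obs["B4"], obs["B3"])
    return out


def quality_mosaic(stack, band):
    """'Reduce' step: keep every band from the observation with the highest `band`."""
    candidates = [o for o in stack if o.get(band) is not None]
    return max(candidates, key=lambda o: o[band]) if candidates else None


# Invented observations of one cropland pixel through 2018 (B3 = red, B4 = NIR).
pixel_stack = [
    {"date": "2018-03-10", "B3": 0.12, "B4": 0.18},  # bare soil
    {"date": "2018-07-15", "B3": 0.04, "B4": 0.40},  # peak green-up
    {"date": "2018-11-02", "B3": 0.10, "B4": 0.20},  # after harvest
]

greenest = quality_mosaic([add_ndvi(o) for o in pixel_stack], "NDVI")
print(greenest["date"], round(greenest["NDVI"], 3))  # prints: 2018-07-15 0.818
```

Because the whole observation is kept, the true color bands of the composite come from the same date as the NDVI peak, which is why fields appear green rather than bare.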

## Questions

- **Q3: What are the differences between the greenest pixel composite and the results of step 2 and explain their cause?**

One notable difference is that the greenest pixel composite provides good detail in vegetated areas and ensures that vegetation is at approximately the same phenological stage when comparing year to year or compositing a large area. However, if we are interested in urban or water areas, this composite is less useful: water typically has a negative NDVI, so its "greenest" observation is not a representative one. Over water and urban areas, the highest NDVI value typically occurs in cloudy pixels, so the greenest pixel composite can actually increase cloud noise. One way to mitigate this effect is to apply a cloud mask before the quality mosaic. If cloudy pixels are removed, we get a better true color representation of the urban and water areas. Starting the quality mosaic from the b2 image collection above gives this result:


![qm_from_mask](output_figs/gre_pix_mask_2018.jpg)

When clouds and cloud shadows are masked out of the image, the city, the Potomac River and the Chesapeake Bay area are much more clearly visible in the true color visualization.
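The effect of masking before the quality mosaic can be illustrated with one invented open-water pixel. Because water has a negative NDVI, a single bright cloud observation with NDVI near zero outranks every clear-water observation unless it is removed first. The `cloud` flag stands in for the BQA-based mask; the values and helper names are hypothetical.

```python
def ndvi(nir, red):
    return (nir - red) / (nir + red)


def greenest(stack, drop_clouds):
    """NDVI quality mosaic, optionally discarding cloud-flagged observations first."""
    usable = [o for o in stack if not (drop_clouds and o["cloud"])]
    if not usable:
        return None
    return max(usable, key=lambda o: ndvi(o["B4"], o["B3"]))


# Invented observations of one open-water pixel (B3 = red, B4 = NIR).
water_pixel = [
    {"date": "2018-04-01", "B3": 0.05, "B4": 0.02, "cloud": False},  # NDVI about -0.43
    {"date": "2018-06-20", "B3": 0.60, "B4": 0.62, "cloud": True},   # NDVI about +0.02
    {"date": "2018-09-12", "B3": 0.06, "B4": 0.03, "cloud": False},  # NDVI about -0.33
]

unmasked_choice = greenest(water_pixel, drop_clouds=False)  # picks the cloud
masked_choice = greenest(water_pixel, drop_clouds=True)     # picks a water observation
```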

Another difference between the Step 2 composites and the greenest pixel composite is that agricultural areas are much greener, because the pixels with the highest photosynthetic activity are selected. In the Step 2 composites, the fields mostly appear bare.

- **Q4: What are the advantages and potential disadvantages of using a quality mosaic?**

The main advantage is that a quality mosaic lets us select, for each pixel, the observation that maximizes whichever band we are interested in. This is especially useful when that band is a calculated index such as NDVI or NDWI. For example, if we wanted to find long-term trends in vegetation or water over a large area where passes might be several days apart, a quality mosaic based on the index of interest ensures that every location is composited by the same criterion.

A disadvantage, as mentioned above, is that we might want to minimize a variable in some places and maximize it in others. Maximizing NDVI gives a good picture of vegetation growth but can degrade the quality of water areas or other areas where the maximum NDVI observation is not the desired one. This can be mitigated by using other, more complex reducing functions.
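A quality mosaic generalizes to any per-observation score, and preferring the minimum is the same as maximizing the negated score. Earth Engine's `qualityMosaic` selects the maximum of a band, so minimizing would presumably mean mosaicking on a negated band; that is an assumed approach, and the functions and values below are hypothetical. The NDWI used here is McFeeters' (green - NIR) / (green + NIR), i.e. B2 and B4 for Landsat 7.

```python
def quality_mosaic(stack, score, maximize=True):
    """Pick the observation with the best score; minimizing = maximizing -score."""
    sign = 1 if maximize else -1
    return max(stack, key=lambda o: sign * score(o))


def ndwi(o):
    # McFeeters NDWI: (green - NIR) / (green + NIR); Landsat 7 B2 = green, B4 = NIR
    return (o["B2"] - o["B4"]) / (o["B2"] + o["B4"])


def ndvi(o):
    # NDVI: (NIR - red) / (NIR + red); Landsat 7 B4 = NIR, B3 = red
    return (o["B4"] - o["B3"]) / (o["B4"] + o["B3"])


# Invented observations of one riverbank pixel that floods in spring.
stack = [
    {"date": "2018-04-05", "B2": 0.08, "B3": 0.06, "B4": 0.03},  # flooded
    {"date": "2018-07-22", "B2": 0.06, "B3": 0.04, "B4": 0.35},  # vegetated
]

wettest = quality_mosaic(stack, ndwi)                       # flooded date
greenest = quality_mosaic(stack, ndvi)                      # vegetated date
least_green = quality_mosaic(stack, ndvi, maximize=False)   # flooded date again
```

Swapping the score function is all it takes to target a different surface type; choosing a score per region (for instance NDWI over water and NDVI over land) would be one of the more complex alternatives mentioned above.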

# Step 4 SLC- and cloud masking

## Assignment

- Load the cloud+cloud-shadow mask for every image by extracting the BQA mask.
- Apply different image dilation/ erosion / opening / closing operations on the BQA mask with different distances/iterations.
- Update the mask with the dilated / eroded / opened / closed BQA mask.
- Pick one image and visualize the effect of the dilation/ erosion / opening / closing on the data.

The various morphological operations were applied to the cloud and shadow bitmask of every image in 2018 via a map operation. To compare the results we will look at the median composite for the year after these various mask operations were applied. The original median composite, without any morphological operations applied to the mask, is shown below:


![q4_noop](output_figs/q4_mask_normal-2018.jpg)

When the dilated mask with a kernel of radius 2 pixels is applied before taking the median composite:

![q4_dil](output_figs/q4_mask_dill-2018.jpg)

When the eroded mask with a kernel of radius 2 pixels is applied before taking the median composite:

![q4_erod](output_figs/q4_mask_ero-2018.jpg)

When the opened mask with a kernel of radius 2 pixels is applied before taking the median composite:

![q4_ope](output_figs/q4_mask_opened-2018.jpg)

When the closed mask with a kernel of radius 2 pixels is applied before taking the median composite:

![q4_closed](output_figs/q4_mask_closed-2018.jpg)


## Questions

- **Q5: Visualize and explain the differences of the different dilation/erosion/opening/closing operations on the mask. What would be the different dis/advantages of the different methods?**

Dilation grows the mask, so any pixels near a cloud or cloud shadow are also excluded from the resulting composite. Erosion has the opposite effect: masked pixels adjacent to an unmasked pixel are unmasked. Closing (dilation followed by erosion) fills small holes in the existing mask and therefore increases the number of masked pixels in cloudy areas. Opening (erosion followed by dilation) is the dual of closing: it removes masked regions smaller than the kernel, decreasing the total number of masked pixels. Operations that increase masking tend to reduce noise, because the bitmasks are not always accurate and pixels near masked areas are more likely than other pixels to contain cloud artifacts. The cost is fewer valid observations per pixel, which matters most where data is already scarce, such as the SLC gaps at the scene edges; operations that shrink the mask keep more data but let more cloud edges through.
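The four operations can be sketched on a tiny binary cloud mask. Here `True` means "masked as cloud/shadow" (Earth Engine masks use the opposite convention, with 1 marking valid pixels), the kernel is a circle of the given pixel radius, and pixels outside the image are ignored. In Earth Engine, dilation and erosion of such a mask correspond roughly to `focal_max` and `focal_min`; the helper names and the toy mask are hypothetical.

```python
def kernel_offsets(radius):
    """(dy, dx) offsets inside a circular kernel with the given radius in pixels."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def focal(mask, radius, combine):
    """Apply `combine` (any/all) to each pixel's in-bounds kernel neighbourhood."""
    rows, cols = len(mask), len(mask[0])
    offsets = kernel_offsets(radius)
    return [[combine(mask[y + dy][x + dx] for dy, dx in offsets
                     if 0 <= y + dy < rows and 0 <= x + dx < cols)
             for x in range(cols)]
            for y in range(rows)]


def dilate(mask, radius):
    """Grow the mask: masked if any pixel in the kernel is masked."""
    return focal(mask, radius, any)


def erode(mask, radius):
    """Shrink the mask: stays masked only if every pixel in the kernel is masked."""
    return focal(mask, radius, all)


def opening(mask, radius):
    """Erode, then dilate: removes masked specks smaller than the kernel."""
    return dilate(erode(mask, radius), radius)


def closing(mask, radius):
    """Dilate, then erode: fills unmasked holes smaller than the kernel."""
    return erode(dilate(mask, radius), radius)


# Toy 7x7 mask: a cloud ring with a one-pixel hole, plus an isolated speck at (5, 5).
grid = [
    ".......",
    ".###...",
    ".#.#...",
    ".###...",
    ".......",
    ".....#.",
    ".......",
]
mask = [[c == "#" for c in row] for row in grid]
```

With this toy mask, closing fills the hole in the ring, while opening removes the speck and also the one-pixel-thick ring itself, a reminder that opening can unmask thin cloud edges along with genuine false positives.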


# Step 5 SLC-filling

## Assignment

- Apply a map-reduce function to the USGS Landsat 7 Top of Atmosphere Reflectance Tier 1 since 2003 where you replace the missing SLC/cloud/shadow data of every image by:
    - The median reflectance
    - The mean reflectance
    - The reflectance of the greenest pixel
    - The reflectance of the pixel that is closest in time
- Visualize two images (summer vs winter) where you illustrate the effect of the different filling scenarios.

Because generating these composites requires a larger memory footprint on the cloud server, a smaller area is used to illustrate the effect, reducing the likelihood that the API responds with a 'too much memory use' error. The entire Landsat 7 catalog since 2003 was used both for compositing and for calculating the replacement pixels. The summer composite therefore uses all images from June, July and August since 2003, and the winter composite uses every image from December, January and February since February 2003.
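The seasonal subsetting amounts to a calendar-month filter on the acquisition dates. The sketch below uses invented dates and a hypothetical `seasonal_subset` helper whose start date is a parameter. In Earth Engine the same selection could be made with `ee.Filter.calendarRange` on the `'month'` field, which according to its documentation wraps around when the end value is smaller than the start value, as it is for December to February.

```python
from datetime import date

SUMMER = {6, 7, 8}    # June, July, August
WINTER = {12, 1, 2}   # December, January, February (wraps over the year end)


def seasonal_subset(acquisition_dates, months, start=date(2003, 1, 1)):
    """Keep acquisitions on or after `start` whose calendar month is in `months`."""
    return [d for d in acquisition_dates if d >= start and d.month in months]


# Invented acquisition dates
dates = [date(2002, 12, 20), date(2003, 1, 15), date(2003, 7, 4),
         date(2010, 2, 28), date(2018, 8, 31), date(2018, 10, 1)]

summer = seasonal_subset(dates, SUMMER)
winter = seasonal_subset(dates, WINTER)
```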

### Winter composites

*Replaced by median:*

![w_rep_by_med](output_figs/winter_med_rep_by_med.jpg)

*Replaced by mean:*

![w_rep_by_mean](output_figs/Winter_med_rep_by_mean.jpg)

*Replaced by greenest:*

![w_rep_by_gr](output_figs/Winter_med_rep_by_gr.jpg)

*Replaced by closest in time:*

![w_rep_by_time](output_figs/winter_med_rep_by_closet_time.jpg)

### Summer composites

*Replaced by median:*

![s_rep_by_med](output_figs/summer_median_rep_by_med.jpg)

*Replaced by mean:*

![s_rep_by_mean](output_figs/summer_median_rep_by_mean.jpg)

*Replaced by greenest:*

![s_rep_by_gr](output_figs/Summer_med_rep_by_greenest.jpg)

*Replaced by closest in time:*

![s_rep_by_time](output_figs/Summer_med_rep_by_closesttime.jpg)

## Questions

- **Q6: Visualize and explain the differences of the different filling scenarios. What would be the different dis/advantages of the different methods?**

In the images above, we can see how the different filling techniques affect the composite. Filling with the greenest pixel produces a better-looking, less noisy composite in summer than in winter. This is because NDVI in winter is typically very low even in areas with dense vegetation, so missing pixels are replaced with pixels whose NDVI is much higher than we would expect in winter. We also see the same issue as in the greenest pixel composite in urban areas, where pixels are replaced with clouds because those have the highest NDVI (clouds have an NDVI of about 0, whereas urban areas are typically slightly negative). For the summer composite, greenest pixel replacement produces a nice-looking true color image that accentuates the green in forested areas like Rock Creek Park, while still providing decent imagery in urban areas.

We also see that replacing missing pixels with the mean reflectance introduces some striated patterns due to scan line artifacts. This is especially visible in winter, when the 2003-to-present average reflectance is likely higher than the typical winter reflectance.

Replacing with the median value creates less striping effect for both the summer and winter composites and is quite effective for producing a low noise image.
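The median and mean filling scenarios share one structure: compute a per-pixel composite over the valid observations, then use it only where an image has a gap. In the sketch below, `None` marks an SLC gap or a masked cloud/shadow pixel, and one invented observation (0.40) stands for a residual cloud the mask missed, which shows why the mean is the more fragile replacement. Filling from the greenest pixel would use the same `fill_gaps` step with a quality-mosaic result as the replacement. Helper names and values are hypothetical.

```python
import statistics


def per_pixel(stack, reducer):
    """Reduce each pixel over its valid (non-None) observations across the stack."""
    out = []
    for values in zip(*stack):
        valid = [v for v in values if v is not None]
        out.append(reducer(valid) if valid else None)
    return out


def fill_gaps(image, replacement):
    """Keep valid pixels; replace missing ones (None) with the co-located replacement."""
    return [r if v is None else v for v, r in zip(image, replacement)]


# Invented 2-pixel images; None = SLC gap or masked cloud/shadow.
stack = [
    [0.05, None],
    [0.06, 0.05],
    [0.04, 0.06],
    [0.07, 0.40],  # 0.40: residual cloud not caught by the mask
]

median_filled = fill_gaps(stack[0], per_pixel(stack, statistics.median))
mean_filled = fill_gaps(stack[0], per_pixel(stack, statistics.mean))
```

The median fill stays close to the surrounding clear values, while the mean fill is pulled up by the residual cloud, which is one way filled gaps end up brighter than their neighbours and show up as stripes.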

However, the filling method that produces the best-looking image, and one that is quite effective at showing the seasonal differences, is the closest-pixel-in-time replacement. Its largest downsides are that it is more complicated to implement and more computationally expensive.
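The closest-in-time replacement can be sketched as a nearest-date search for each missing pixel, which also shows where its extra cost comes from: every gap pixel has to scan the other acquisitions. The dates, values and the `closest_in_time_fill` helper are hypothetical; a real implementation would operate on whole images rather than lists.

```python
from datetime import date


def closest_in_time_fill(target_date, target_values, stack):
    """Fill each missing pixel with the valid value from the temporally nearest image.

    `stack` is a list of (acquisition_date, values) pairs; None marks a gap.
    Entries dated on `target_date` itself are skipped.
    """
    filled = []
    for i, v in enumerate(target_values):
        if v is not None:
            filled.append(v)
            continue
        candidates = [(abs((d - target_date).days), vals[i])
                      for d, vals in stack
                      if d != target_date and vals[i] is not None]
        best = min(candidates, key=lambda c: c[0]) if candidates else None
        filled.append(best[1] if best else None)
    return filled


# Invented winter image with two gaps, plus three other acquisitions.
target = date(2018, 1, 10)
target_values = [0.05, None, None]
others = [
    (date(2017, 12, 25), [0.04, 0.08, None]),  # 16 days away
    (date(2018, 1, 18), [0.05, 0.09, None]),   # 8 days away
    (date(2018, 7, 14), [0.03, 0.02, 0.03]),   # 185 days away, summer
]

result = closest_in_time_fill(target, target_values, others)
```

The last pixel shows a caveat: when no nearby acquisition is valid, the "closest" value can come from a different season, so a maximum allowed time difference may be worth adding.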