# Step 1: Extract Imagery Data for Lake Michigan from GOES

This is the first step of the experiment. Before getting started, obtain the visible-band data produced by the GOES satellite service. The original files are in `.nc` format and contain the full scan of the continent, which is more than this application needs. We therefore use NOAA's Weather and Climate Toolkit (Viewer and Data Exporter) to extract the data for the area of interest. The files exported by the toolkit are in `.csv` format.

In [1]:
!pip install scipy # Just in case.

Defaulting to user installation because normal site-packages is not writeable


In [2]:
import os
import pandas as pd
import scipy
import numpy as np
from tqdm import tqdm

In [3]:
from scipy.stats import skew


### TO-DO:

In this part, change the directory path so that it points to the `.csv` files generated by NOAA's toolkit.

In [4]:
os.getcwd()
os.chdir('GOES_Hourly_Statistics/raw_csv/2007Fall_2008Spring')
os.getcwd()

'/srv/scratch/NOAA/GOES_Hourly_Statistics/raw_csv/2007Fall_2008Spring'
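
If changing the working directory is inconvenient, the file names can also be listed by passing the directory explicitly. The following is a minimal sketch rather than the method used in this notebook: `list_csv_files` is a hypothetical helper, and its argument is whichever directory holds the exported files.

```python
import os

def list_csv_files(raw_dir):
    """Return the sorted names of the .csv files in raw_dir (hypothetical helper)."""
    # Skip anything that is not an exported .csv file.
    names = [n for n in os.listdir(raw_dir) if n.endswith('.csv')]
    # os.listdir() makes no promise about ordering, so sort explicitly.
    names.sort()
    return names
```

Each returned name would then need to be joined with `raw_dir` (for example with `os.path.join`) before being passed to `pd.read_csv`.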

In [5]:
len(os.listdir())

14760

In [6]:
filename = os.listdir()
len(filename)

14760

In [7]:
filename[0:10]

['goes11.2007.10.10.1400.v01.nc-var1-t0.csv',
 'goes11.2008.01.14.2130.v01.nc-var1-t0.csv',
 'goes11.2007.10.27.1915.v01.nc-var1-t0.csv',
 'goes11.2007.11.01.0515.v01.nc-var1-t0.csv',
 'goes11.2007.10.15.0600.v01.nc-var1-t0.csv',
 'goes11.2008.02.28.2015.v01.nc-var1-t0.csv',
 'goes11.2008.03.21.2000.v01.nc-var1-t0.csv',
 'goes11.2008.02.27.0230.v01.nc-var1-t0.csv',
 'goes11.2008.03.01.1715.v01.nc-var1-t0.csv',
 'goes11.2008.02.12.0400.v01.nc-var1-t0.csv']

**Important: sorting is required on Linux, where `os.listdir()` returns the file names in arbitrary order.**

In [8]:
filename.sort()

In [9]:
## Quick inspection of the order.

filename[0:10]

['goes11.2007.10.01.0000.v01.nc-var1-t0.csv',
 'goes11.2007.10.01.0030.v01.nc-var1-t0.csv',
 'goes11.2007.10.01.0045.v01.nc-var1-t0.csv',
 'goes11.2007.10.01.0100.v01.nc-var1-t0.csv',
 'goes11.2007.10.01.0115.v01.nc-var1-t0.csv',
 'goes11.2007.10.01.0130.v01.nc-var1-t0.csv',
 'goes11.2007.10.01.0145.v01.nc-var1-t0.csv',
 'goes11.2007.10.01.0200.v01.nc-var1-t0.csv',
 'goes11.2007.10.01.0215.v01.nc-var1-t0.csv',
 'goes11.2007.10.01.0230.v01.nc-var1-t0.csv']
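
The sorted order is also chronological, because every date and time field in the file name is zero-padded and all names share the same prefix. As an illustration, the scan time could be parsed from a name as sketched below. This assumes the `goes11.YYYY.MM.DD.HHMM.` pattern seen in the listing above, and `parse_scan_time` is a hypothetical helper, not part of the notebook.

```python
from datetime import datetime

def parse_scan_time(name):
    """Parse the scan time from a name such as
    'goes11.2007.10.01.0030.v01.nc-var1-t0.csv' (assumed pattern)."""
    parts = name.split('.')
    # parts[1:5] holds year, month, day and HHMM.
    return datetime.strptime('.'.join(parts[1:5]), '%Y.%m.%d.%H%M')

# Three names taken from the unsorted listing earlier in this step.
names = [
    'goes11.2008.01.14.2130.v01.nc-var1-t0.csv',
    'goes11.2007.10.10.1400.v01.nc-var1-t0.csv',
    'goes11.2007.10.27.1915.v01.nc-var1-t0.csv',
]
names.sort()
times = [parse_scan_time(n) for n in names]
```

After the sort, `times` is in ascending order, which is a quick way to confirm that the lexicographic order and the time order agree.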

### TO-DO:

`Lake_Partition.csv` contains a table of boundary values that can be used to extract the data for the Lake Michigan area precisely.

<img src="Zone_partition.png" alt="partition" width="350"/>

In [10]:
## TO-DO: Change the directory if needed.
LP = pd.read_csv('/srv/scratch/NOAA/Lake_Partition.csv')
LP

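| Unnamed: 0 | latitude | first_bound | second_bound | left_zone | mid_zone | right_zone |
|---|---|---|---|---|---|---|
| 0 | 40.06 | -87.5465 | -84.8170 | 2 | 3 | 4 |
| 1 | 40.10 | -87.5465 | -84.8170 | 2 | 3 | 4 |
| 2 | 40.14 | -87.5465 | -84.8170 | 2 | 3 | 4 |
| 3 | 40.18 | -87.5465 | -84.8170 | 2 | 3 | 4 |
| 4 | 40.22 | -87.5465 | -84.8170 | 2 | 3 | 4 |
| ... | ... | ... | ... | ... | ... | ... |
| 193 | 47.78 | -86.1781 | -84.9473 | 6 | 6 | 6 |
| 194 | 47.82 | -86.1781 | -84.9473 | 6 | 6 | 6 |
| 195 | 47.86 | -86.1781 | -84.9473 | 6 | 6 | 6 |
| 196 | 47.90 | -86.1781 | -84.9473 | 6 | 6 | 6 |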
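
Each row of the table can be read as follows: at the given latitude, a pixel whose longitude is less than `first_bound` belongs to `left_zone`, one whose longitude is greater than `second_bound` belongs to `right_zone`, and anything in between belongs to `mid_zone`. This is the same comparison that the extraction loop later in this step performs. A minimal sketch for a single pixel, where `zone_for` is a hypothetical helper and `row0` copies the values of row 0 above:

```python
def zone_for(longitude, first_bound, second_bound,
             left_zone, mid_zone, right_zone):
    """Return the zone label of one pixel, given one row of boundary values."""
    if longitude < first_bound:
        return left_zone
    if longitude > second_bound:
        return right_zone
    # Longitudes on or between the two boundaries fall in the middle zone.
    return mid_zone

# Boundary values of row 0 (latitude 40.06) in the table above.
row0 = dict(first_bound=-87.5465, second_bound=-84.8170,
            left_zone=2, mid_zone=3, right_zone=4)

west_zone = zone_for(-88.0, **row0)
mid_zone_label = zone_for(-86.0, **row0)
east_zone = zone_for(-84.0, **row0)
```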


In [11]:
# Optional: Check the list of filenames one more time.
for i in filename[:3]:
    print(i)

goes11.2007.10.01.0000.v01.nc-var1-t0.csv
goes11.2007.10.01.0030.v01.nc-var1-t0.csv
goes11.2007.10.01.0045.v01.nc-var1-t0.csv


### TO-DO:

Make sure to change the directory of the outputs!

In [12]:
!pwd

/srv/scratch/NOAA/GOES_Hourly_Statistics/raw_csv/2007Fall_2008Spring


In [13]:
#!mkdir /srv/scratch/NOAA/GOES_Hourly_Statistics/zone_0_2007Fall_2008Spring/
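
The commented-out `mkdir` above can also be done from Python, which avoids an error when the directory already exists. A minimal sketch, where `ensure_output_dir` is a hypothetical helper and the directory passed to it is whatever output location you choose:

```python
import os

def ensure_output_dir(path):
    """Create the output directory if it is missing and return its path."""
    # exist_ok=True makes the call safe to repeat on later runs.
    os.makedirs(path, exist_ok=True)
    return path
```

The returned path can then be combined with each file name using `os.path.join(out_dir, fn)` instead of string concatenation.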

In [14]:
for fn in tqdm(filename[:2500]):
    file = pd.read_csv(fn)
    # Attach the boundary values that match each pixel's latitude.
    flp = pd.merge(file, LP, on='latitude')
    par_list = []
    for i in range(len(flp)):
        a = flp['longitude'][i]
        b = flp['first_bound'][i]
        c = flp['second_bound'][i]
        if a < b:    # longitude below the first boundary
            par_list.append(flp['left_zone'][i])
        elif a > c:  # longitude above the second boundary
            par_list.append(flp['right_zone'][i])
        else:        # on or between the two boundaries
            par_list.append(flp['mid_zone'][i])
    flp['partition'] = par_list
    flp = flp.loc[:, ['value', 'datetime', 'latitude', 'longitude', 'partition']]
    # Keep only zone 0; the row position in the merged table is stored in 'corresponding row'.
    s_flp = flp[flp['partition'] == 0].reset_index().rename(columns={'index': 'corresponding row'})

    ## TO-DO: Change the directory and clearly label it as "zone_0".
    s_flp.to_csv('/srv/scratch/NOAA/GOES_Hourly_Statistics/zone_0_2007Fall_2008Spring/' + str(fn), index=False)
#     print(fn)



100%|██████████| 2500/2500 [25:20<00:00,  1.64it/s]
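
The progress bar shows roughly 25 minutes for 2500 files, and much of that time is likely spent in the row-by-row Python loop. The zone assignment can instead be expressed with array operations. The following is a sketch under stated assumptions, not the code that produced the outputs above: `assign_partition` is a hypothetical helper, and the two small tables are made-up stand-ins for one exported file and for `LP`.

```python
import numpy as np
import pandas as pd

def assign_partition(flp):
    """Vectorized version of the if/elif/else zone assignment (sketch)."""
    lon = flp['longitude'].to_numpy()
    below_first = lon < flp['first_bound'].to_numpy()
    above_second = lon > flp['second_bound'].to_numpy()
    # Same precedence as the loop: left zone first, then right, else middle.
    return np.where(
        below_first,
        flp['left_zone'].to_numpy(),
        np.where(above_second,
                 flp['right_zone'].to_numpy(),
                 flp['mid_zone'].to_numpy()),
    )

# Made-up stand-ins for one exported file and for the partition table.
pixels = pd.DataFrame({
    'value': [10, 20, 30],
    'latitude': [40.06, 40.06, 40.06],
    'longitude': [-88.0, -86.0, -84.0],
})
bounds = pd.DataFrame({
    'latitude': [40.06],
    'first_bound': [-87.5465],
    'second_bound': [-84.8170],
    'left_zone': [2], 'mid_zone': [3], 'right_zone': [4],
})

flp = pd.merge(pixels, bounds, on='latitude')
flp['partition'] = assign_partition(flp)
```

Inside the file loop, the call to `assign_partition(flp)` would replace the inner `for i in range(len(flp))` block; the rest of the cell would stay the same.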


In [15]:
print("End of current process.")

End of current process.
