## 1. Data Introduction

### 1.1 Before You Get Started

In the `~/DATA` folder there is a `.csv` file named `2006Fall_2017Spring_GOES_meteo_combined.csv`. It is roughly **4.76 GB**, so please make sure that you:

1. Have at least 16 GB of physical RAM;

2. On Windows, enable the paging file. A tutorial can be found at

> https://www.howtogeek.com/126430/what-is-the-windows-page-file/

3. On Linux, enable swap. A tutorial can be found at

> https://help.ubuntu.com/community/SwapFaq

Doing so helps you get through the experiment without the system hanging, which is both annoying and time-consuming.

But if you _really_ do not have the disk space or RAM needed for the computation, or you are living on slow inter-_nyet_, the folder `~/DATA/Hourly_Combined_CSV` contains **11** hourly combined data files that you can use instead.
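
If memory is tight, one option is to read the large file in pieces and keep only the columns you need. The sketch below is a minimal illustration, not part of the original workflow: `load_columns_in_chunks` and its `chunk_rows` default are hypothetical names chosen here. It relies on the `usecols` and `chunksize` parameters of `pandas.read_csv`. Skipping the two array columns (`Lake_data_1D`, `Lake_data_2D`) is likely where most of the saving comes from, since each of their cells is a long string.

```python
import pandas as pd


def load_columns_in_chunks(csv_path, columns, chunk_rows=1000):
    """Read only `columns` from a large CSV, `chunk_rows` rows at a time.

    Hypothetical helper for illustration; tune chunk_rows to your machine.
    """
    pieces = []
    # With chunksize set, read_csv returns an iterator of DataFrames
    for chunk in pd.read_csv(csv_path, usecols=columns, chunksize=chunk_rows):
        pieces.append(chunk)
    return pd.concat(pieces, ignore_index=True)


# Example call (path and column choice are assumptions):
# df_meteo = load_columns_in_chunks(
#     '2006Fall_2017Spring_GOES_meteo_combined.csv',
#     ['Date_UTC', 'Time_UTC', 'Temp (F)', 'RH (%)'],
# )
```

Note that the result still has to fit in memory, so this helps most when you select a small subset of columns.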

### 1.2 Data Description

This dataset contains hourly indexed GOES satellite imagery data and meteorology data.

| Column Name           | Data Category     | Data Type                  | Description                                                  |
| :--------------------- | :-----------------: | :--------------------------: | :------------------------------------------------------------ |
| Date_UTC              | Timestamp         | `<String>`                 | The date of the data in the row in UTC timezone. This timezone is used by GOES satellites and the file naming follows the same convention. |
| Time_UTC              | Timestamp         | `<String>`                 | The time of the data in the row in UTC timezone. This timezone is used by GOES satellites and the file naming follows the same convention. |
| Date_CST              | Timestamp         | `<String>`                 | The date of the data in the row in CST timezone. This timezone is used by the weather station when collecting meteorology data. |
| Time_CST              | Timestamp         | `<String>`                 | The time of the data in the row in CST timezone. This timezone is used by the weather station when collecting meteorology data. |
| File_name_for_1D_lake | I/O               | `<String>`                 | The raw data file from which the GOES satellite imagery data was extracted. Files can be found on the server. |
| File_name_for_2D_lake | I/O               | `<String>`                 | The processed data file from which the GOES satellite imagery data was extracted. Files can be found on the server. |
| Lake_data_1D          | Satellite Imagery | `Array[<Float>]`           | 1-D array of floating point numbers representing the cloud pixel intensity of the Lake Michigan area. The array has a length of `3,599`. |
| Lake_data_2D          | Satellite Imagery | `Array[Array[<Float>], …]` | 2-D array of floating point numbers representing the cloud pixel intensity of the Lake Michigan area. The array is structured as `[[col 1], [col 2], ..]` and can be used to reconstruct a 106 x 79 matrix. |
| Temp (F)              | Meteorology       | `<Integer>`                | Temperature (°F Dry Bulb)                                    |
| RH (%)                | Meteorology       | `<Float>`                  | Relative Humidity (%)                                        |
| Dewpt (F)             | Meteorology       | `<Integer>`                | Dew Point Temperature (°F)                                   |
| Wind Spd (mph)        | Meteorology       | `<Float>`                  | Wind Speed (mph)                                             |
| Wind Direction (deg)  | Meteorology       | `<Float>`                  | Wind Direction (degrees, measured every 10 degrees)          |
| Peak Wind Gust(mph)   | Meteorology       | `<Float>`                  | Wind Gust (mph)                                              |
| Low Cloud Ht (ft)     | Meteorology       | `<Float>`                  | Cloud Height—Lower Level of Cloud (feet)                     |
| Med Cloud Ht (ft)     | Meteorology       | `<Float>`                  | Cloud Height—Med Level of Cloud (feet)                       |
| High Cloud Ht (ft)    | Meteorology       | `<Float>`                  | Cloud Height—Upper Level of Cloud (feet)                     |
| Visibility (mi)       | Meteorology       | `<Float>`                  | Visibility (miles)                                           |
| Atm Press (hPa)       | Meteorology       | `<Float>`                  | Atmospheric Pressure (hPa, hecto-Pascals)                    |
| Sea Lev Press (hPa)   | Meteorology       | `<Float>`                  | Sea Level Pressure (hPa, hecto-Pascals)                      |
| Altimeter (hPa)       | Meteorology       | `<Float>`                  | Altimeter (hPa, hecto-Pascals)                               |
| Precip (in)           | Meteorology       | `<Float>`                  | Precipitation (inches)                                       |
| Wind Chill (F)        | Meteorology       | `<Integer>`                | Wind Chill (°F)                                              |
| Heat Index (F)        | Meteorology       | `<Integer>`                | Heat Index (°F)                                              |
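
The `Lake_data_2D` description can be made concrete with a short sketch. It assumes the cell has already been converted into a list of columns, and it assumes that "106 x 79" means 106 rows by 79 columns, so the outer list holds 79 columns of 106 values each. Neither detail is confirmed by the table, so swap `n_rows` and `n_cols` if your data disagrees. `columns_to_matrix` is a hypothetical helper name.

```python
import numpy as np


def columns_to_matrix(columns, n_rows=106, n_cols=79):
    """Turn [[col 1], [col 2], ...] into an (n_rows, n_cols) matrix.

    Assumption: each inner list is one column of the image.
    """
    # Stacking the columns gives shape (n_cols, n_rows); transpose to fix it
    matrix = np.asarray(columns, dtype=float).T
    if matrix.shape != (n_rows, n_cols):
        raise ValueError(f"expected {(n_rows, n_cols)}, got {matrix.shape}")
    return matrix


# Synthetic stand-in for one parsed cell: column c is filled with the value c
fake_columns = [[float(c)] * 106 for c in range(79)]
image = columns_to_matrix(fake_columns)
```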

**For meteorology data:**

> Missing values are indicated by "M" and "m". When observations note specific values as missing, "M" is used. The "m" is used when there is a lack of information from the observation.
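
One way to handle these markers is to let pandas convert them to `NaN` at load time through the `na_values` parameter of `read_csv`. The sketch below uses a tiny in-memory CSV with made-up rows modeled on the column names above. It is only an illustration of the idea and does not address other codes that may appear in the data.

```python
import io

import pandas as pd

# Made-up rows for illustration; the real file has many more columns
sample_csv = io.StringIO(
    "Temp (F),Med Cloud Ht (ft),Sea Lev Press (hPa)\n"
    "51,m,1007.20\n"
    "48,6500,M\n"
)

# Treat both missing-value markers as NaN so the columns stay numeric
df_sample = pd.read_csv(sample_csv, na_values=["M", "m"])
missing_per_column = df_sample.isna().sum()
```

With the markers mapped to `NaN`, the affected columns are read as floats instead of strings, so numeric operations work without further cleaning.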


## 2. Data Visualization

The following section provides a simple function that can help you visualize the GOES satellite imagery data. An example usage is provided to help you understand how to use the function in your particular use case.

In [1]:
# !pip install pandas
# !pip install matplotlib
# !pip install plotly
# !pip install numpy

In [2]:
import os
import pandas as pd
import numpy as np
import ast

`Inputs`:

1. lat, `Array[<float>]`
    
    This is a list of floating point numbers that contains the necessary index values for latitude.
    
    
2. lon, `Array[<float>]`
    
    This is a list of floating point numbers that contains the necessary index values for longitude.
    
    
3. val, `Array[<float>]`
    
    This is a list of floating point numbers that contains the necessary intensity values for cloud pixels.
    
    
4. fig_name, `<String>`
    
    This is a string representing the file name of the output image, excluding the .png extension.
 

`Output`:

1. status_code, `<Integer>`

    This is an integer value where `0` indicates that the function was able to successfully output the image file, and `255` indicates that the function failed to execute due to mismatched lengths in the lat, lon, and val inputs. 

In [3]:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import plotly.express as px


def arrays_2_png(lat, lon, val, fig_name):
    """Scatter-plot cloud pixel intensities and save them as <fig_name>.png."""
    # Refuse to plot unless the three inputs line up one-to-one
    if not (len(lat) == len(lon) == len(val)):
        return 255

    plt.figure(figsize=(10, 10))
    plt.gca().set_facecolor('black')
    # One square marker per pixel, shaded by intensity on a gray colormap
    plt.scatter(lon, lat, c=val, cmap=cm.gray, marker='s')
    plt.colorbar(orientation='vertical')
    plt.savefig(fig_name + '.png')
    plt.close()  # release the figure so repeated calls do not pile up in memory
    return 0

---

### Sample usage



In [4]:
DATA_dir = 'output/'
df_lat_lon = pd.read_csv(DATA_dir + 'lat_long_1D_labels_for_plotting.csv')
df_lat_lon.head(5)

Unnamed: 0,latitude,longitude
0,41.78,-87.54
1,41.78,-87.5
2,41.78,-87.46
3,41.78,-87.42
4,41.78,-87.38


In [5]:
df_lat_lon.shape

(3599, 2)

In [6]:
df_data = pd.read_csv(DATA_dir + '2006Fall_2017Spring_GOES_meteo_combined.csv')
df_data.head(5)

Unnamed: 0,Date_UTC,Time_UTC,Date_CST,Time_CST,File_name_for_1D_lake,File_name_for_2D_lake,Lake_data_1D,Lake_data_2D,Temp (F),RH (%),...,Low Cloud Ht (ft),Med Cloud Ht (ft),High Cloud Ht (ft),Visibility (mi),Atm Press (hPa),Sea Lev Press (hPa),Altimeter (hPa),Precip (in),Wind Chill (F),Heat Index (F)
0,2006-10-01,00:00,2006-09-30,18:00,goes11.2006.10.01.0000.v01.nc-var1-t0.csv,T_goes11.2006.10.01.0000.v01.nc-var1-t0.csv.csv,"[0.0075, 0.0025, 0.0, 0.005, 0.0, 0.005, nan, ...","[array([ nan, nan, nan, nan, nan...",51,92,...,3700,m,m,10,984.4,1007.20,1007.1,0.0,NC,NC
1,2006-10-01,01:00,2006-09-30,19:00,goes11.2006.10.01.0100.v01.nc-var1-t0.csv,T_goes11.2006.10.01.0100.v01.nc-var1-t0.csv.csv,"[0.0025, nan, 0.0025, 0.0025, nan, 0.0, nan, 0...","[array([ nan, nan, nan, nan, nan...",48,96,...,3700,m,m,10,984.7,1007.80,1007.5,0.0,NC,NC
2,2006-10-01,02:00,2006-09-30,20:00,goes11.2006.10.01.0200.v01.nc-var1-t0.csv,T_goes11.2006.10.01.0200.v01.nc-var1-t0.csv.csv,"[0.0, nan, 0.0075, nan, nan, 0.0025, nan, nan,...","[array([nan, nan, nan, nan, nan, nan, nan, nan...",49,92,...,3700,m,m,10,985.4,1008.30,1008.1,0.0,NC,NC
3,2006-10-01,03:00,2006-09-30,21:00,goes11.2006.10.01.0300.v01.nc-var1-t0.csv,T_goes11.2006.10.01.0300.v01.nc-var1-t0.csv.csv,"[0.0025, nan, 0.0025, 0.0, 0.0075, nan, 0.005,...","[array([nan, nan, nan, nan, nan, nan, nan, nan...",48,100,...,2500,6500,m,6,986.0,M,1008.8,0.02,NC,NC
4,2006-10-01,04:00,2006-09-30,22:00,goes11.2006.10.01.0400.v01.nc-var1-t0.csv,T_goes11.2006.10.01.0400.v01.nc-var1-t0.csv.csv,"[0.0025, nan, 0.0, nan, 0.0075, 0.0, nan, nan,...","[array([ nan, nan, nan, nan, nan...",50,92,...,7000,m,m,8,986.4,1009.50,1009.1,0.0,NC,NC
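
In the rows above, `00:00` UTC on 2006-10-01 is paired with `18:00` CST on 2006-09-30, which is a six-hour offset. The sketch below rebuilds full timestamps from the separate date and time columns and checks that offset. It uses a small hand-copied frame with the first two rows rather than the real file, and the variable names are illustrative.

```python
import pandas as pd

# First two rows of the four timestamp columns, copied from the output above
df_times = pd.DataFrame({
    "Date_UTC": ["2006-10-01", "2006-10-01"],
    "Time_UTC": ["00:00", "01:00"],
    "Date_CST": ["2006-09-30", "2006-09-30"],
    "Time_CST": ["18:00", "19:00"],
})

fmt = "%Y-%m-%d %H:%M"
# Join date and time strings, then parse them into datetime values
utc = pd.to_datetime(df_times["Date_UTC"] + " " + df_times["Time_UTC"], format=fmt)
cst = pd.to_datetime(df_times["Date_CST"] + " " + df_times["Time_CST"], format=fmt)

offset = utc - cst  # Timedelta between the two clocks for each row
```

Running the same check over the whole file is a quick way to confirm that the offset stays constant across the dataset.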


In [7]:
# The cell is stored as text, so this is the number of characters, not pixels
print(len(df_data['Lake_data_1D'][0]))

22688


In [8]:
df_data['Lake_data_1D'][0]

'[0.0075, 0.0025, 0.0, 0.005, 0.0, 0.005, nan, 0.0, nan, nan, nan, nan, nan, 0.0075, 0.0025, 0.0025, 0.0075, 0.0025, 0.0025, nan, 0.0025, 0.0, 0.0, nan, 0.0025, 0.0, 0.0025, nan, nan, 0.0025, 0.0025, 0.0025, nan, nan, nan, 0.0025, 0.0, 0.0025, 0.0025, nan, 0.0075, 0.0025, 0.0025, 0.0, nan, 0.0, 0.0025, 0.0025, 0.0075, nan, 0.005, 0.0075, 0.0025, 0.0, 0.005, 0.0025, 0.0025, nan, nan, 0.0075, 0.0025, 0.005, 0.0075, 0.0025, nan, 0.005, 0.0025, nan, nan, nan, 0.0, 0.0025, 0.0, 0.0025, nan, 0.0, 0.0075, 0.005, nan, nan, nan, 0.0, nan, 0.0, 0.0, nan, 0.0025, 0.0, 0.0025, 0.0025, nan, 0.0025, nan, nan, nan, 0.0025, nan, 0.0075, 0.0025, nan, 0.0025, 0.0, 0.0025, 0.005, nan, 0.0025, 0.0025, nan, 0.0025, nan, nan, nan, nan, 0.0, 0.0025, 0.0075, 0.005, 0.0, nan, nan, 0.0025, nan, 0.0, 0.0025, nan, 0.0, nan, nan, 0.0075, nan, 0.005, 0.005, 0.0025, 0.0025, nan, 0.0025, nan, 0.0025, 0.005, 0.0, nan, 0.0025, 0.0, 0.005, 0.0025, 0.0, 0.0075, 0.005, 0.0025, 0.0, nan, 0.0025, 0.0, nan, nan, 0.0025, 0.00

In [9]:
column_names = df_data.columns.tolist()
print(column_names)

['Date_UTC', 'Time_UTC', 'Date_CST', 'Time_CST', 'File_name_for_1D_lake', 'File_name_for_2D_lake', 'Lake_data_1D', 'Lake_data_2D', 'Temp (F)', 'RH (%)', 'Dewpt (F)', 'Wind Spd (mph)', 'Wind Direction (deg)', 'Peak Wind Gust(mph)', 'Low Cloud Ht (ft)', 'Med Cloud Ht (ft)', 'High Cloud Ht (ft)', 'Visibility (mi)', 'Atm Press (hPa)', 'Sea Lev Press (hPa)', 'Altimeter (hPa)', 'Precip (in)', 'Wind Chill (F)', 'Heat Index (F)']


In [10]:
data_sample = df_data['Lake_data_1D'][5]
data_sample

'[nan, nan, 0.0075, nan, nan, 0.0025, 0.0025, 0.0, 0.005, 0.0, 0.0075, 0.0, 0.0075, 0.0, 0.0, 0.0025, nan, 0.0025, 0.0025, 0.0, 0.0, nan, nan, nan, nan, 0.0025, 0.0, nan, 0.0025, 0.0075, nan, 0.0, 0.01, 0.0025, 0.0, nan, 0.0, nan, nan, 0.0025, nan, nan, 0.0025, nan, nan, 0.005, 0.0, 0.0, 0.005, nan, 0.0075, nan, 0.0, 0.0025, nan, 0.0025, nan, 0.0025, 0.005, 0.005, nan, nan, nan, 0.0075, nan, 0.0075, 0.0025, 0.0025, nan, 0.0025, nan, nan, nan, 0.005, 0.005, nan, nan, 0.0025, 0.0025, 0.0025, nan, nan, 0.0025, nan, 0.0025, 0.0, 0.0025, 0.0025, nan, 0.0025, 0.0075, 0.0, 0.0075, nan, 0.0025, 0.0, nan, nan, nan, 0.0025, nan, 0.0, 0.0025, nan, 0.0, nan, nan, nan, nan, nan, 0.0025, 0.0025, 0.005, nan, 0.0, 0.0025, 0.0025, 0.005, 0.0025, nan, nan, nan, 0.0025, 0.005, 0.0025, 0.0, 0.005, 0.0025, 0.0025, 0.005, 0.0, nan, 0.0025, 0.0, 0.0025, nan, nan, 0.0025, 0.0, 0.0025, 0.005, 0.0, 0.0025, nan, nan, nan, nan, 0.005, 0.0, 0.0025, nan, 0.0075, 0.0025, nan, nan, nan, 0.0075, nan, 0.0, 0.0, nan, na

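Notice that the cell holds one large string rather than a list, so it has to be converted before it can be plotted. `ast.literal_eval` is the usual tool for this, but it only accepts Python literals. The bare `nan` tokens in the string are not literals, which is why the call below raises a `ValueError`.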

In [12]:
import ast

data_sample_lst = ast.literal_eval(data_sample)
data_sample_lst

ValueError: malformed node or string: <_ast.Name object at 0x7f166064f970>
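
The `ValueError` above comes from the `nan` entries: `ast.literal_eval` sees `nan` as a variable name, not a literal. A minimal workaround, sketched here under the assumption that every cell is a flat, comma-separated list in square brackets, is to split the string and pass each token to `float`, which does understand `'nan'`. `parse_float_list` is a hypothetical helper name.

```python
import math


def parse_float_list(cell):
    """Convert a string such as '[0.0075, nan, 0.0]' into a list of floats."""
    inner = cell.strip().strip("[]")
    if not inner.strip():
        return []
    # float() tolerates surrounding whitespace and parses 'nan' as NaN
    return [float(token) for token in inner.split(",")]


values = parse_float_list("[0.0075, nan, 0.0]")
nan_count = sum(math.isnan(v) for v in values)
```

Applied to the notebook, `data_sample_lst = parse_float_list(data_sample)` would define the list that the later cells expect. According to the table in section 1.2 it should contain 3,599 entries.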

In [13]:
lat_lst = df_lat_lon['latitude'].to_list()
lat_lst

[41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.78,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.82,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.86,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.9,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.94,
 41.98,
 41.98,
 41.98,
 41.98,
 41.98,
 41.98,
 41.98,
 41.98,
 41.98,
 41.98,
 41.98,
 41.98,
 41.98,
 41.98,
 

In [14]:
lon_lst = df_lat_lon['longitude'].to_list()
lon_lst

[-87.54,
 -87.5,
 -87.46,
 -87.42,
 -87.38,
 -87.34,
 -87.3,
 -87.26,
 -87.22,
 -87.18,
 -87.14,
 -87.1,
 -87.06,
 -87.02,
 -86.98,
 -86.94,
 -86.9,
 -86.86,
 -86.82,
 -87.54,
 -87.5,
 -87.46,
 -87.42,
 -87.38,
 -87.34,
 -87.3,
 -87.26,
 -87.22,
 -87.18,
 -87.14,
 -87.1,
 -87.06,
 -87.02,
 -86.98,
 -86.94,
 -86.9,
 -86.86,
 -86.82,
 -86.78,
 -86.74,
 -87.58,
 -87.54,
 -87.5,
 -87.46,
 -87.42,
 -87.38,
 -87.34,
 -87.3,
 -87.26,
 -87.22,
 -87.18,
 -87.14,
 -87.1,
 -87.06,
 -87.02,
 -86.98,
 -86.94,
 -86.9,
 -86.86,
 -86.82,
 -86.78,
 -86.74,
 -86.7,
 -87.58,
 -87.54,
 -87.5,
 -87.46,
 -87.42,
 -87.38,
 -87.34,
 -87.3,
 -87.26,
 -87.22,
 -87.18,
 -87.14,
 -87.1,
 -87.06,
 -87.02,
 -86.98,
 -86.94,
 -86.9,
 -86.86,
 -86.82,
 -86.78,
 -86.74,
 -86.7,
 -86.66,
 -86.62,
 -87.62,
 -87.58,
 -87.54,
 -87.5,
 -87.46,
 -87.42,
 -87.38,
 -87.34,
 -87.3,
 -87.26,
 -87.22,
 -87.18,
 -87.14,
 -87.1,
 -87.06,
 -87.02,
 -86.98,
 -86.94,
 -86.9,
 -86.86,
 -86.82,
 -86.78,
 -86.74,
 -86.7,
 -86.66,
 -86.6

In [15]:
# All three lists must have the same length (needs data_sample_lst from In [12])

print(len(data_sample_lst))
print(len(lat_lst))
print(len(lon_lst))

NameError: name 'data_sample_lst' is not defined

In [16]:
return_code = arrays_2_png(lat_lst, lon_lst, data_sample_lst, 'sample')
return_code

NameError: name 'data_sample_lst' is not defined