## Predict cloud coverage

- Predicting cloud cover over a short horizon of 120 minutes is very challenging.
- On this time scale, changes in local cloud cover are driven by a combination of dynamical and physical parameters such as wind speed, wind direction, sea-level pressure, humidity, and temperature over the asset of interest.
- Short-interval cloud cover prediction requires accurate estimates of cloud motion and presence, obtained from weather data and sky camera images, from physics-based weather models, or from a combination of both.

### Goal:
#### Predict the percentage of total cloud coverage for the upcoming intervals using the available weather and sky camera data.

We are expected to predict the total cloud coverage as a percentage of the open sky for a fixed field of view at 4 horizon intervals of 30, 60, 90, and 120 minutes from a 6-hour window of historical data.  
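To make this framing concrete, here is a minimal sketch of how one 6-hour input window and its four targets could be sliced from a 1-minute series. The function name `make_sample`, the constants, and the synthetic series are illustrative assumptions, not part of the dataset or of the modelling approach used later.

```python
import numpy as np

HISTORY_MINUTES = 6 * 60              # 6-hour input window at 1-minute resolution
HORIZONS_MINUTES = (30, 60, 90, 120)  # forecast horizons from the task statement


def make_sample(series, t):
    """Return (history, targets) for a forecast issued at index t.

    history: the HISTORY_MINUTES values ending at index t (inclusive).
    targets: the values 30, 60, 90 and 120 minutes after index t.
    """
    series = np.asarray(series)
    start = t - HISTORY_MINUTES + 1
    if start < 0 or t + max(HORIZONS_MINUTES) >= len(series):
        raise IndexError("not enough data around index t")
    history = series[start : t + 1]
    targets = series[[t + h for h in HORIZONS_MINUTES]]
    return history, targets


# Synthetic stand-in for a cloud-cover column: each value equals its minute index.
toy = np.arange(1000)
history, targets = make_sample(toy, t=400)
print(history.shape, targets)  # (360,) [430 460 490 520]
```

On the real data, `series` would be a 1-minute column such as `Total Cloud Cover [%]`, and windows that cross a gap in the recordings would need to be skipped.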

In [2]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import os

In [5]:
PATH_FOLDER_DATASET = "dataset"
PATH_FOLDER_TRAINING = os.path.join(PATH_FOLDER_DATASET, "train")

Let's import the training dataset.

In [6]:
data = pd.read_csv(os.path.join(PATH_FOLDER_TRAINING, "train.csv"))
data.head(5)

Unnamed: 0,DATE (MM/DD),MST,Global CMP22 (vent/cor) [W/m^2],Direct sNIP [W/m^2],Azimuth Angle [degrees],Tower Dry Bulb Temp [deg C],Tower Wet Bulb Temp [deg C],Tower Dew Point Temp [deg C],Tower RH [%],Total Cloud Cover [%],Peak Wind Speed @ 6ft [m/s],Avg Wind Direction @ 6ft [deg from N],Station Pressure [mBar],Precipitation (Accumulated) [mm],Snow Depth [cm],Moisture,Albedo (CMP11)
0,1/1,00:00,-0.962276,0.0,356.8564,7.216,0.988,-7.312,32.33,-1,9.95,271.3,806.779,0.0,0.219,0.0,0.0
1,1/1,00:01,-0.937921,0.0,357.65505,7.251,1.04,-7.26,32.4,-1,8.2,272.9,806.84,0.0,0.206,0.0,0.0
2,1/1,00:02,-0.944395,0.0,358.45438,7.256,1.093,-7.207,32.54,-1,6.7,288.8,806.876,0.0,0.148,0.0,0.0
3,1/1,00:03,-0.95135,-0.029673,359.25416,7.254,1.06,-7.44,31.89,-1,7.7,294.0,806.823,0.0,0.235,0.0,0.0
4,1/1,00:04,-0.934976,-0.054401,0.05415,7.331,1.081,-7.419,31.78,-1,7.2,285.5,806.762,0.0,0.182,0.0,0.0


In [7]:
data.shape

(527040, 17)

The local weather data was recorded at 1-minute frequency for 366 days, which gives $366\times24\times60 = 527040$ rows, matching the shape above.
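The total row count can match even if individual days are short or duplicated, so a per-day check is a useful complement. The sketch below is one way to do it; the helper name `days_with_gaps` and the toy frame are assumptions made for illustration.

```python
import pandas as pd

MINUTES_PER_DAY = 24 * 60


def days_with_gaps(df, date_col="DATE (MM/DD)"):
    """Return the dates whose row count differs from MINUTES_PER_DAY."""
    rows_per_day = df.groupby(date_col).size()
    return rows_per_day[rows_per_day != MINUTES_PER_DAY]


# Toy frame: one complete day and one day missing a single minute.
toy = pd.DataFrame(
    {"DATE (MM/DD)": ["1/1"] * MINUTES_PER_DAY + ["1/2"] * (MINUTES_PER_DAY - 1)}
)
gaps = days_with_gaps(toy)

print(366 * MINUTES_PER_DAY)  # 527040
print(gaps)
```

Calling `days_with_gaps(data)` on the training frame should return an empty series if every day has exactly 1440 rows.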

The raw sky camera images were recorded at 10-minute frequency over the same period. Let's count the images in each folder.

In [95]:
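import filetype  # third-party package that infers a file's type from its header

images_count = {}
junks_count = {}

# Each sub-folder of the training directory holds the images for one day.
for file in os.scandir(PATH_FOLDER_TRAINING):
    if file.is_dir():
        images_count[file.name] = 0
        junks_count[file.name] = 0

        # Count image files and non-image ("junk") files separately.
        for image in os.scandir(file.path):
            if filetype.is_image(image.path):
                images_count[file.name] += 1
            else:
                junks_count[file.name] += 1

# Convert to (folder, count) arrays so they can be filtered with NumPy.
junks_count = np.array(list(junks_count.items()), dtype=object)
images_count = np.array(list(images_count.items()), dtype=object)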

In [107]:
print("Junk files in image folder: ", junks_count[np.where(junks_count[:, 1] > 0)])
print("Folders with no images recorded: ", images_count[np.where(images_count[:, 1] == 0)][:,0])

Junk files in image folder:  []
Folders with no images recorded:  ['0404' '0405']


It seems that no images were recorded on 4 and 5 April, so we may not be able to use the weather data for these two dates during training. No junk files were found, though.
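If those two days are excluded, the folder names have to be mapped to the date strings used in the CSV. The sketch below assumes folder names follow an `MMDD` pattern (as `'0404'` and `'0405'` suggest) and that the CSV uses unpadded `M/D` strings; `folder_to_date` and `drop_days` are hypothetical helper names.

```python
import pandas as pd


def folder_to_date(folder_name):
    """Convert an image folder name like '0404' to the CSV date format '4/4'."""
    month, day = int(folder_name[:2]), int(folder_name[2:])
    return f"{month}/{day}"


def drop_days(df, folder_names, date_col="DATE (MM/DD)"):
    """Return a copy of df without the rows belonging to the given folders."""
    excluded = {folder_to_date(name) for name in folder_names}
    return df[~df[date_col].isin(excluded)].copy()


# Toy frame standing in for the weather data.
toy = pd.DataFrame(
    {"DATE (MM/DD)": ["4/3", "4/4", "4/5", "4/6"], "value": [1, 2, 3, 4]}
)
filtered = drop_days(toy, ["0404", "0405"])
print(filtered["DATE (MM/DD)"].tolist())  # ['4/3', '4/6']
```

Returning a copy leaves the original frame intact, so the decision to drop these days can be revisited later.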


Since images were captured at 10-minute frequency, let's take a peek at the image counts per day.

In [113]:
print("Unique image counts per folder (day):")
np.unique(images_count[:,1])

Unique image counts per folder (day):


array([0, 5, 7, 11, 12, 13, 14, 28, 31, 50, 52, 53, 54, 55, 56, 57, 58,
       59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
       76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
      dtype=object)

Very irregular: the daily count ranges from 0 to 89 images.
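For reference, a 10-minute cadence running around the clock would give 144 images per day, so even the fullest folders cover only part of the day. A small helper can list the days that fall below a chosen count. This is a sketch only: `sparse_days` is a hypothetical name, the threshold of 50 is arbitrary, and the toy array merely mimics the `(folder, count)` layout of `images_count`.

```python
import numpy as np

CAPTURE_INTERVAL_MIN = 10
# Upper bound on images per day if the camera ran around the clock.
MAX_IMAGES_PER_DAY = 24 * 60 // CAPTURE_INTERVAL_MIN


def sparse_days(counts, min_images):
    """Return folder names whose image count is below min_images."""
    counts = np.asarray(counts, dtype=object)
    n_images = counts[:, 1].astype(int)
    return counts[n_images < min_images, 0]


# Toy (folder, count) pairs in the same layout as images_count.
toy_counts = np.array(
    [("0101", 60), ("0102", 5), ("0404", 0), ("0601", 89)], dtype=object
)
print(MAX_IMAGES_PER_DAY)                          # 144
print(list(sparse_days(toy_counts, min_images=50)))  # ['0102', '0404']
```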

### Analyzing Weather Data

In [114]:
data.head(3)

Unnamed: 0,DATE (MM/DD),MST,Global CMP22 (vent/cor) [W/m^2],Direct sNIP [W/m^2],Azimuth Angle [degrees],Tower Dry Bulb Temp [deg C],Tower Wet Bulb Temp [deg C],Tower Dew Point Temp [deg C],Tower RH [%],Total Cloud Cover [%],Peak Wind Speed @ 6ft [m/s],Avg Wind Direction @ 6ft [deg from N],Station Pressure [mBar],Precipitation (Accumulated) [mm],Snow Depth [cm],Moisture,Albedo (CMP11)
0,1/1,00:00,-0.962276,0.0,356.8564,7.216,0.988,-7.312,32.33,-1,9.95,271.3,806.779,0.0,0.219,0.0,0.0
1,1/1,00:01,-0.937921,0.0,357.65505,7.251,1.04,-7.26,32.4,-1,8.2,272.9,806.84,0.0,0.206,0.0,0.0
2,1/1,00:02,-0.944395,0.0,358.45438,7.256,1.093,-7.207,32.54,-1,6.7,288.8,806.876,0.0,0.148,0.0,0.0


In [115]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 527040 entries, 0 to 527039
Data columns (total 17 columns):
 #   Column                                 Non-Null Count   Dtype  
---  ------                                 --------------   -----  
 0   DATE (MM/DD)                           527040 non-null  object 
 1   MST                                    527040 non-null  object 
 2   Global CMP22 (vent/cor) [W/m^2]        527040 non-null  float64
 3   Direct sNIP [W/m^2]                    527040 non-null  float64
 4   Azimuth Angle [degrees]                527040 non-null  float64
 5   Tower Dry Bulb Temp [deg C]            527040 non-null  float64
 6   Tower Wet Bulb Temp [deg C]            527040 non-null  float64
 7   Tower Dew Point Temp [deg C]           527040 non-null  float64
 8   Tower RH [%]                           527040 non-null  float64
 9   Total Cloud Cover [%]                  527040 non-null  int64  
 10  Peak Wind Speed @ 6ft [m/s]            527040 non-null  

In [119]:
data["DATE (MM/DD)"].unique()

array(['1/1', '1/2', '1/3', '1/4', '1/5', '1/6', '1/7', '1/8', '1/9',
       '1/10', '1/11', '1/12', '1/13', '1/14', '1/15', '1/16', '1/17',
       '1/18', '1/19', '1/20', '1/21', '1/22', '1/23', '1/24', '1/25',
       '1/26', '1/27', '1/28', '1/29', '1/30', '1/31', '2/1', '2/2',
       '2/3', '2/4', '2/5', '2/6', '2/7', '2/8', '2/9', '2/10', '2/11',
       '2/12', '2/13', '2/14', '2/15', '2/16', '2/17', '2/18', '2/19',
       '2/20', '2/21', '2/22', '2/23', '2/24', '2/25', '2/26', '2/27',
       '2/28', '2/29', '3/1', '3/2', '3/3', '3/4', '3/5', '3/6', '3/7',
       '3/8', '3/9', '3/10', '3/11', '3/12', '3/13', '3/14', '3/15',
       '3/16', '3/17', '3/18', '3/19', '3/20', '3/21', '3/22', '3/23',
       '3/24', '3/25', '3/26', '3/27', '3/28', '3/29', '3/30', '3/31',
       '4/1', '4/2', '4/3', '4/4', '4/5', '4/6', '4/7', '4/8', '4/9',
       '4/10', '4/11', '4/12', '4/13', '4/14', '4/15', '4/16', '4/17',
       '4/18', '4/19', '4/20', '4/21', '4/22', '4/23', '4/24', '4/25',
       '4/