## About the dataset
The dataset can be downloaded from https://datadryad.org/stash/dataset/doi:10.5061/dryad.zcrjdfn9m

The dataset is organized into one folder per day, each named after its date:

`.2017_12_08/` <br>
`.2017_12_09/` <br>
`.2017_12_10/` <br>
`.` <br>
`.` <br>
`.` <br>
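Since each folder is named after its date in `YYYY_MM_DD` form, the folder names for a date range can be generated instead of typed by hand. A minimal sketch follows. `date_folders` is a hypothetical helper, and it assumes one folder per calendar day, so a day that is missing from the download still has to be handled when its folder is opened.

```python
import datetime


def date_folders(start, end):
    """Return the daily folder names (YYYY_MM_DD) from start to end, inclusive.

    Hypothetical helper: assumes one folder per calendar day.
    """
    names = []
    day = start
    while day <= end:
        names.append(day.strftime('%Y_%m_%d'))
        day += datetime.timedelta(days=1)
    return names


print(date_folders(datetime.date(2017, 12, 8), datetime.date(2017, 12, 10)))
# ['2017_12_08', '2017_12_09', '2017_12_10']
```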

### Each folder contains:

`.infrared/` 16-bit, 60x80 resolution. Approx. 8KB per frame, saved in .png format and named by UNIX time in seconds. <br><br>
`.pyranometer/` Sampled 4-6 times per second, saved in .csv files in the */pyranometer directory. Files contain UNIX time (1st column) and GSI in W/m2 (2nd column). File size: 4,500KB to 7,500KB. <br><br>
`.sun_position/` Generated from pyranometer time, saved in .csv files in the */sun_position directory. Files include UNIX time (1st column), elevation angle (2nd column), and azimuth angle (3rd column). File size: 6,500KB to 11,500KB. <br><br>
`.visible/` 16-bit, 450x450 resolution, intensity channel only. Approx. 240KB per frame, saved in .png format named by UNIX time in seconds. <br><br>
`.weather_station/` Sampled every 10 minutes, linearly interpolated to match pyranometer intervals. Saved in .csv files in the */weather_station directory. Columns: UNIX time, temperature (°C), dew point (°C), atmospheric pressure (mmHg), wind direction (radians), wind velocity (mile/s), relative humidity (%). File size: 14.3MB to 25.4MB.
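The weather-station readings are described as linearly interpolated from their 10-minute grid onto the pyranometer timestamps. The sketch below illustrates that kind of interpolation with `numpy.interp` on made-up numbers; it is not the dataset authors' preprocessing code, and the time values are small offsets rather than real UNIX times.

```python
import numpy as np

# Made-up weather-station samples every 10 minutes (600 s)
weather_time = np.array([0.0, 600.0, 1200.0])  # seconds
temperature = np.array([10.0, 16.0, 13.0])     # deg C

# Made-up pyranometer timestamps (the real ones arrive 4-6 times per second)
pyranometer_time = np.array([0.0, 300.0, 900.0, 1200.0])

# Linearly interpolate the temperature at each pyranometer timestamp.
# np.interp needs weather_time to be increasing and clamps values outside it.
temperature_at_pyranometer = np.interp(pyranometer_time, weather_time, temperature)
print(temperature_at_pyranometer.tolist())  # [10.0, 13.0, 14.5, 13.0]
```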

## Processing

For this project, we restructure the dataset as follows:<br><br>
`.infrared/` Contains the IR images. <br><br>
`.infrared/filtered_pyranometer.csv` Contains two columns: the first holds the image names and the second holds the irradiance values (W/m2) collected from the pyranometer. <br>
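To make that two-column layout concrete, here is a sketch on three made-up readings. It assumes the `<UNIX seconds>IR.png` file-name pattern used by `getDs` in the processing code and, like that function, keeps the first reading that falls within each second.

```python
import csv
import io

# Made-up pyranometer readings: (UNIX time in s, irradiance in W/m2)
readings = [
    (1512712345.10, 612.4),
    (1512712345.32, 613.0),  # same second as the row above
    (1512712346.05, 615.8),
]

rows = {}
for unix_time, value in readings:
    # IR frames are named by whole UNIX seconds, e.g. '1512712345IR.png'
    name = f'{int(unix_time)}IR.png'
    # Keep only the first reading that falls in each second
    rows.setdefault(name, value)

# Write the name/value table as CSV text
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['name', 'value'])
writer.writerows(rows.items())
print(buffer.getvalue())
```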

In [None]:
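import os

import pandas as pd


def getDs(path, labels):
    """Pair the IR images in `path` with pyranometer readings from `labels`.

    Returns a DataFrame with columns 'name' (image file name) and 'value'
    (irradiance in W/m2), one row per image that has a matching reading.
    """
    # If the .csv files have no header row, pass header=None here so the
    # first reading is not consumed as column names.
    pyranometer = pd.read_csv(labels)
    images = set(os.listdir(path))

    pyranometer.columns = ['name', 'value']

    # Truncate the UNIX time to whole seconds and turn it into the IR image
    # file name, e.g. 1512712345.37 -> '1512712345IR.png'.
    # Assigning by column label (not .iloc) replaces the float column; writing
    # back through .iloc can keep the float dtype in recent pandas versions
    # and leave a '.0' in the names.
    pyranometer['name'] = pyranometer['name'].astype(int).astype(str) + 'IR.png'

    # Keep only the readings that have a matching image in the folder
    filtered_pyranometer = pyranometer[pyranometer['name'].isin(images)]

    # The pyranometer logs 4-6 readings per second, so several rows map to
    # the same image; keep the first reading for each image.
    filtered_pyranometer = filtered_pyranometer.drop_duplicates(subset='name')
    return filtered_pyranometer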
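`getDs` keeps only the first of the 4-6 pyranometer readings that fall within each second. If a smoother label is preferred, the readings for each image can be averaged instead; this is an alternative design choice, not what the notebook does. A sketch on made-up values:

```python
import pandas as pd

# Made-up readings, already converted to image names as in getDs
readings = pd.DataFrame({
    'name': ['1512712345IR.png', '1512712345IR.png', '1512712346IR.png'],
    'value': [612.0, 614.0, 615.8],
})

# Average every reading that shares an image name instead of keeping the first
averaged = readings.groupby('name', as_index=False)['value'].mean()
print(averaged)
```

Which option is better depends on how closely the frame capture time lines up with the readings inside that second, which the dataset description does not specify.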

In [None]:
import datetime
import os
import shutil

import pandas as pd
from tqdm import tqdm

# Choose the range of dates to process (both ends inclusive)
start_date = datetime.date(2017, 12, 8)
end_date = datetime.date(2017, 12, 10)
delta = datetime.timedelta(days=1)

destination_path = 'mini_ds/'
# Create the destination folder; shutil.copy fails if it does not exist
os.makedirs(destination_path, exist_ok=True)

# Build the list of dates up front so tqdm can show the overall progress
dates = []
current = start_date
while current <= end_date:
    dates.append(current)
    current += delta

# Collect one DataFrame per day and concatenate them once at the end
# (DataFrame.append was removed in pandas 2.0)
frames = []
for day in tqdm(dates):
    # Folder names follow the YYYY_MM_DD pattern, e.g. 2017_12_08
    date = day.strftime('%Y_%m_%d')
    path = f'images/{date}/infrared'
    labels = f'images/{date}/pyranometer/{date}.csv'

    try:
        # Image names in the folder and their corresponding irradiance values
        ds_temp = getDs(path, labels)
    except FileNotFoundError:
        # No data for this date: move on to the next one
        print(f'No data for {date}, skipping')
        continue
    frames.append(ds_temp)

    # Copy this day's IR images to the destination folder
    for img in tqdm(os.listdir(path), desc=date, leave=False):
        shutil.copy(os.path.join(path, img), destination_path)

# Save the combined DataFrame to a CSV file
ds = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame(columns=['name', 'value'])
ds.to_csv('mini_ds.csv', index=False)
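After the loop finishes, a quick consistency check can confirm that every name in `mini_ds.csv` has an image in `mini_ds/`. `missing_images` is a hypothetical helper sketched with the standard library only; it assumes the CSV has the `name` column written above.

```python
import csv
import os


def missing_images(csv_path, image_dir):
    """Return the names listed in csv_path that have no file in image_dir.

    Hypothetical helper: assumes the CSV has a 'name' column.
    """
    present = set(os.listdir(image_dir))
    with open(csv_path, newline='') as f:
        reader = csv.DictReader(f)
        return [row['name'] for row in reader if row['name'] not in present]


# Usage, once mini_ds.csv and mini_ds/ have been written:
# print(missing_images('mini_ds.csv', 'mini_ds/'))
```

An empty list means every labelled frame was copied.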