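# Prepare the hourly file list for building event-accumulated rainfall and running K-folds
* Build a list of hourly rainfall files for both the radar and the rain gauges, keeping only the files that match between the two sources (the code below matches them by timestamp). The list is then used to accumulate rainfall per event.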
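
The matched hourly list is meant to feed per-event accumulation, so the following is a minimal sketch of one way the matched timestamps could be split into events. The source does not define what an event is; the rule used here (a run of hours with no gap larger than `max_gap`), the function name `group_into_events`, and the sample timestamps are all assumptions made for illustration only.

```python
from datetime import datetime, timedelta

def group_into_events(times, max_gap=timedelta(hours=1)):
    """Split timestamps into runs whose neighbours are at most max_gap apart.

    Assumption: an "event" is a run of consecutive hours. This definition is
    hypothetical and is not taken from the notebook.
    """
    events = []
    for t in sorted(times):
        if events and t - events[-1][-1] <= max_gap:
            events[-1].append(t)   # still within the current event
        else:
            events.append([t])     # gap too large (or first item): new event
    return events

# Hypothetical timestamps, not taken from the real data set
sample = [datetime(2018, 7, 15, h) for h in (7, 8, 9, 13, 14)]
events = group_into_events(sample)
for i, ev in enumerate(events):
    print(i, ev[0], ev[-1], len(ev))
```

Each inner list could then be used to select the rows of the matched list that belong to one event before summing the hourly files.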

In [1]:
import os
import datetime
import pandas as pd

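def get_file_datetime(filename, is_radar=True):
    """Parse the leading YYYYMMDDHH timestamp from a radar or gauge filename."""
    if is_radar:
        # Radar names look like '2018071507_2000m.tif': timestamp precedes '_'
        date_str = filename.split('_')[0]
    else:
        # Gauge names look like '2018071507.csv': timestamp precedes '.'
        date_str = filename.split('.')[0]
    return datetime.datetime.strptime(date_str, '%Y%m%d%H')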

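def get_matching_files(radar_dir, gauge_dir):
    """Return (datetime, radar_file, gauge_file) tuples for hours present in both directories."""
    radar_files = os.listdir(radar_dir)
    gauge_files = os.listdir(gauge_dir)
    
    # Map each timestamp to its filename, ignoring files with other extensions
    radar_times = {get_file_datetime(f, is_radar=True): f for f in radar_files if f.endswith('.tif')}
    gauge_times = {get_file_datetime(f, is_radar=False): f for f in gauge_files if f.endswith('.csv')}
    
    # Keep only the timestamps found in both sources, in chronological order
    matched_times = sorted(set(radar_times.keys()) & set(gauge_times.keys()))
    
    return [(t, radar_times[t], gauge_times[t]) for t in matched_times]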

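def save_matched_list(matched_list, output_file):
    """Write the matched (datetime, radar file, gauge file) list to a CSV file."""
    # Create the output directory if it does not exist yet
    out_dir = os.path.dirname(output_file)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    df = pd.DataFrame(matched_list, columns=['Datetime', 'Radar_File', 'Gauge_File'])
    df.to_csv(output_file, index=False)
    print(f"Matched list saved to {output_file}")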

def main():
    radar_dir = '../00run_batch_acchr_codes/2output/0Hourly/0Sontihn_RF/0CBB_Pulse/0output_mosaic/'
    gauge_dir = '../1data/3Gauges/rain_hourly/0Sontihn/2Final_hourly_dates/'
    output_file = './zProcessing_temp/matched_rainfall_files.csv'
    
    print("Radar directory contents:")
    print(os.listdir(radar_dir)[:5])  # Print up to 5 entries (os.listdir order is arbitrary)
    print("\nGauge directory contents:")
    print(os.listdir(gauge_dir)[:5])  # Print up to 5 entries (os.listdir order is arbitrary)
    
    matched_list = get_matching_files(radar_dir, gauge_dir)
    save_matched_list(matched_list, output_file)
    
    print(f"Total matched times: {len(matched_list)}")
    
    # Read the saved CSV file
    df = pd.read_csv(output_file)
    
    # Print the top 10 rows
    print("\nTop 10 rows of the matched rainfall files:")
    print(df.head(10))

if __name__ == "__main__":
    main()

Radar directory contents:
['2018071507_2000m.tif', '2018071508_2000m.tif', '2018071509_2000m.tif', '2018071510_2000m.tif', '2018071511_2000m.tif']

Gauge directory contents:
['2018071500.csv', '2018071501.csv', '2018071502.csv', '2018071503.csv', '2018071504.csv']
Matched list saved to ./zProcessing_temp/matched_rainfall_files.csv
Total matched times: 235

Top 10 rows of the matched rainfall files:
              Datetime            Radar_File      Gauge_File
0  2018-07-15 07:00:00  2018071507_2000m.tif  2018071507.csv
1  2018-07-15 08:00:00  2018071508_2000m.tif  2018071508.csv
2  2018-07-15 09:00:00  2018071509_2000m.tif  2018071509.csv
3  2018-07-15 10:00:00  2018071510_2000m.tif  2018071510.csv
4  2018-07-15 11:00:00  2018071511_2000m.tif  2018071511.csv
5  2018-07-15 12:00:00  2018071512_2000m.tif  2018071512.csv
6  2018-07-15 13:00:00  2018071513_2000m.tif  2018071513.csv
7  2018-07-15 14:00:00  2018071514_2000m.tif  2018071514.csv
8  2018-07-15 15:00:00  2018071515_2000m.tif  201