
Dataset Check

Marco BARNIG edited this page Jun 25, 2024 · 11 revisions

To examine the dataset, I asked ChatGPT-4o to write a Python script that checks whether all files listed in an Excel column are present in the audio folder. Here is the answer:

ChatGPT

To check whether all files listed in an Excel column are present in an audio folder, we can use the pandas library to read the Excel file and the os module to list the files in the specified folder.

Requirements:
Install the required libraries if you haven't already:
pip install pandas openpyxl

Script:

import os
import pandas as pd

def check_files_in_folder(excel_file_path, sheet_name, column_name, folder_path):
    # Read the Excel file
    df = pd.read_excel(excel_file_path, sheet_name=sheet_name)
    
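    # Extract the file names from the specified column,
    # skipping empty cells (read as NaN) and stripping stray whitespace
    files_list = [str(name).strip() for name in df[column_name].dropna()]

    # List all files in the directory; a set makes each lookup fast
    folder_files = set(os.listdir(folder_path))

    # Check for missing files
    missing_files = [file for file in files_list if file not in folder_files]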
    
    if not missing_files:
        print("All files listed in the Excel column are present in the folder.")
    else:
        print("The following files are missing in the folder:")
        for file in missing_files:
            print(file)

if __name__ == "__main__":
    # Input parameters
    excel_file_path = input("Enter the path to the Excel file: ")
    sheet_name = input("Enter the sheet name: ")
    column_name = input("Enter the column name that contains the file names: ")
    folder_path = input("Enter the path to the folder containing audio files: ")
    
    # Check if all files listed in the Excel column are in the folder
    check_files_in_folder(excel_file_path, sheet_name, column_name, folder_path)
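
The script checks only one direction. A dataset check may also need the reverse: audio files in the folder that the spreadsheet never mentions. The sketch below is not part of ChatGPT's answer. It is a minimal helper under stated assumptions: compare_listing is a hypothetical name, and it expects the column values plus the list returned by os.listdir(folder_path). It uses set differences to return both lists and, like the script, matches names exactly.

```python
def compare_listing(listed_names, folder_files):
    """Compare spreadsheet entries with the files found in the audio folder.

    listed_names: values from the Excel column (e.g. df[column_name])
    folder_files: names returned by os.listdir(folder_path)

    Returns two sorted lists:
      missing  -- listed in the spreadsheet but absent from the folder
      unlisted -- present in the folder but not listed in the spreadsheet
    Matching is exact: case and extension must agree.
    """
    listed = set()
    for name in listed_names:
        # Skip empty cells: pandas reads them as NaN, and NaN != NaN.
        if name is None or name != name:
            continue
        text = str(name).strip()
        if text:
            listed.add(text)

    present = set(folder_files)
    missing = sorted(listed - present)    # in the sheet, not on disk
    unlisted = sorted(present - listed)   # on disk, not in the sheet
    return missing, unlisted
```

Inside check_files_in_folder it could be called as compare_listing(df[column_name], os.listdir(folder_path)). Keep in mind that os.listdir also returns sub-folders and hidden files such as .DS_Store, which would then show up in the unlisted list.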

How to Use:
Save the script to a file, for example check_files.py.
Run the script:
python check_files.py

When prompted, enter:

  • The path to the Excel file.
  • The sheet name containing the file list.
  • The column name where the file names are listed.
  • The path to the folder containing your audio files.

The script then checks whether every file listed in the specified Excel column is present in the specified folder and prints the result.

Notes:
Ensure the Excel file is in a format readable by pandas (e.g., .xlsx).
Ensure the folder path and file names are correct and correspond to the actual files you want to check.
This script assumes that the file names listed in the Excel column exactly match the names of the files in the folder. If the names differ in extension or letter case, you will need to adjust the script accordingly.
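
One possible adjustment, sketched under assumptions and not taken from ChatGPT's answer: compare a normalised key instead of the raw name, dropping the extension with os.path.splitext and folding case with str.casefold. The helper names normalise and find_missing are hypothetical. Ignoring extensions means track01.wav and track01.mp3 count as the same file, and a name such as take.1 loses its ".1" part, so switch off whichever option does not suit your dataset.

```python
import os


def normalise(name, ignore_case=True, ignore_extension=True):
    """Reduce a file name to the key used for matching (hypothetical helper)."""
    name = str(name).strip()
    if ignore_extension:
        name = os.path.splitext(name)[0]  # "Song.WAV" -> "Song"
    if ignore_case:
        name = name.casefold()            # "Song" -> "song"
    return name


def find_missing(listed_names, folder_files, ignore_case=True, ignore_extension=True):
    """Return the listed names whose normalised key matches no file in the folder."""
    present = {normalise(f, ignore_case, ignore_extension) for f in folder_files}
    return [
        name for name in listed_names
        if normalise(name, ignore_case, ignore_extension) not in present
    ]
```

For example, find_missing(["Track01", "Track03.wav"], ["track01.WAV"]) reports only "Track03.wav", because "Track01" and "track01.WAV" both reduce to the key "track01".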


The program works like a charm. For a few weeks now, I have turned to ChatGPT whenever I need a small Python script to move my projects forward. It's very efficient, up to date and reliable.
