# Database Generator

In [1]:

# necessary imports
import os
import sqlite3
from PIL import Image, ImageCms
from tqdm import tqdm
import numpy as np




icms = ImageCms

## Fundamentals

We set up two global variables: ```database_name```, the name of the SQLite database file (```"database_all_images.db"```), and ```root_folder```, the directory that contains the image files (```"D:/data/image_data"```).

To begin the database creation process, the script establishes a connection to the SQLite database using ```sqlite3.connect(database_name)``` and creates a cursor object to execute SQL commands. It then ensures that a table named Images exists by executing a ```CREATE TABLE IF NOT EXISTS``` statement. The table has four fields:
- ```ID```, an integer that serves as the primary key;
- ```Name```, which stores the name of the image file as text;
- ```Path```, which records the file path to the image as text;
- ```Size```, which stores the size of the image file in bytes as an integer.
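
As a quick sanity check, the schema described above can be inspected with SQLite's `PRAGMA table_info`. The following is a minimal sketch that assumes an in-memory database (`":memory:"`) so that no file is written; it is not part of the original notebook.

```python
import sqlite3

# Assumption for this sketch: an in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS Images
       (ID INTEGER PRIMARY KEY,
        Name TEXT,
        Path TEXT,
        Size INTEGER)"""
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2], row[5]) for row in conn.execute("PRAGMA table_info(Images)")]
for name, col_type, is_pk in columns:
    print(name, col_type, "primary key" if is_pk else "")

conn.close()
```
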

In [2]:
# create the SQLite database file and the Images table
database_name = "database_all_images.db"
root_folder = r"D:/data/image_data"

conn = sqlite3.connect(database_name)
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS Images
                 (ID INTEGER PRIMARY KEY,
                 Name TEXT,
                 Path TEXT,
                 Size INTEGER
                 )''')
conn.commit()



## Necessary Functions

Following the database initialization, we define a function named ```traverse_folders()``` that recursively traverses the directories under the specified root folder. This function examines each file found within the directories and checks whether its extension matches common image formats such as .jpg, .jpeg, .png, or .tiff. 
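
A plain `str.endswith` comparison is case-sensitive, so a file named `photo.JPG` would not match `.jpg`. One way to make an extension check of this kind case-insensitive is to normalize the extension first. The sketch below is illustrative only: the helper name `has_image_extension` and the constant `IMAGE_EXTENSIONS` are hypothetical and do not appear in the notebook.

```python
import os

# The four extensions named in the text; extend this set as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tiff"}


def has_image_extension(filename):
    """Return True if filename ends with one of IMAGE_EXTENSIONS, ignoring case."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in IMAGE_EXTENSIONS


print(has_image_extension("photo.JPG"))   # upper-case extension still matches
print(has_image_extension("notes.txt"))   # non-image extension is rejected
```
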

For each image file that matches the criteria, the function builds the file path using ```os.path.join```, determines the file size with ```os.path.getsize```, and attempts to open the image using ```Image.open(file_path)```.

If successful, the function yields a pair: a tuple containing the file’s name, path, and size, and a progress percentage calculated from the file's position among all files in the current directory. If an exception occurs during this process, for example because the file cannot be opened as an image, the function prints a message that details the issue and moves on to the next file.
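
Because the percentage is computed per directory, it starts over in every subfolder. If a single overall figure is wanted (for instance, to give a progress bar an accurate total), one option is a preliminary counting pass over the tree. The sketch below assumes a hypothetical helper, `count_image_files`, and counts by extension only; it does not check that each file can actually be opened.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tiff")


def count_image_files(root_folder):
    """Count files under root_folder whose extension is in IMAGE_EXTENSIONS."""
    total = 0
    for _root, _dirs, files in os.walk(root_folder):
        total += sum(1 for name in files if name.lower().endswith(IMAGE_EXTENSIONS))
    return total


# Demonstration on a throwaway directory tree with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.jpg", "b.txt", os.path.join("sub", "c.PNG")):
        open(os.path.join(tmp, rel), "w").close()
    demo_total = count_image_files(tmp)

print(demo_total)
```

The extra pass costs one additional directory walk, which is usually cheap compared with opening every image.
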

In [1]:
def traverse_folders(root_folder):
    """
    This function is designed to traverse through a directory structure, locate image files of specific formats, and yield useful information about each file, such as its name, path, and size. 
    
    Input: 
    - root_folder: This is the root directory from which the function begins its traversal. It is a string representing the path to the folder where the search starts.

    Output:
    - The function does not return a single output but instead uses a generator (yield) to provide results incrementally. For each valid image file, it yields a pair:
        1) A tuple (file, file_path, file_size) consisting of:
            - The file name (file).
            - The full file path (file_path).
            - The file size in bytes (file_size).
        2) The progress percentage as a floating-point number, representing how far along the function is in processing the files within the current directory.
    
    """
    for root, dirs, files in os.walk(root_folder):
        total_files = len(files)
        for i, file in enumerate(files, 1):
            # compare in lower case so that extensions such as '.JPG' also match
            if file.lower().endswith(('.jpg', '.jpeg', '.png', '.tiff')):  # list all the formats
                file_path = os.path.join(root, file)  # get file path
                try:
                    file_size = os.path.getsize(file_path)  # get file size
                    # open only to check that Pillow recognizes the file, then close the handle
                    with Image.open(file_path):
                        pass
                    yield (file, file_path, file_size), i / total_files * 100
                except Exception as e:
                    # report the unreadable file and continue with the next one
                    print(f"Unexpected error with file {file_path}: {e}")


We also define a function called ```insert_into_database()```, which inserts records into the Images table. It accepts the database name and a data tuple (containing the file name, path, and size) as inputs.

It then connects to the database, executes an SQL command to insert the data into the Images table, and commits the changes.
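
Opening a connection and committing once per record is simple but can become slow for large collections. A possible alternative, shown here only as a sketch, is to reuse one connection and insert many rows in a single transaction with `executemany`. The helper name `insert_many`, the in-memory database, and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def insert_many(conn, rows):
    """Insert an iterable of (name, path, size) tuples in one transaction."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO Images (Name, Path, Size) VALUES (?, ?, ?)", rows
        )


# Demonstration with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS Images "
    "(ID INTEGER PRIMARY KEY, Name TEXT, Path TEXT, Size INTEGER)"
)
insert_many(conn, [("a.jpg", "D:/x/a.jpg", 1024), ("b.png", "D:/x/b.png", 2048)])
row_count = conn.execute("SELECT COUNT(*) FROM Images").fetchone()[0]
print(row_count)
conn.close()
```
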

In [3]:


# Function to insert data into the database
def insert_into_database(database_name, data):
    """
    This function is designed to insert a single record into an SQLite database. 
    The record contains information about an image, specifically its name, file path, and size. This function facilitates the storage of image metadata in a database for later retrieval or analysis.

    Input:
    - database_name: A string representing the name (or path) of the SQLite database file. If the database file does not exist, SQLite will create it.
    - data: A tuple containing three elements:
        1) Name: A string representing the name of the image.
        2) Path: A string representing the full file path to the image.
        3) Size: An integer representing the size of the image file in bytes.
    """
    
    conn = sqlite3.connect(database_name)
    try:
        c = conn.cursor()
        c.execute("INSERT INTO Images (Name, Path, Size) VALUES (?, ?, ?)", data)
        conn.commit()
    finally:
        conn.close()  # a connection is opened on every call, so release it here


## Create the database

Finally, we iterate over the image data yielded by ```traverse_folders()``` and call ```insert_into_database()``` for each image to add its information to the database. Once all images have been processed and inserted, the script prints a final message indicating that database creation and data insertion have completed.

In [None]:

# total is a hard-coded estimate of the image count; tqdm advances by one per image
for data, progress in tqdm(traverse_folders(root_folder), total=500000, desc="Processing images", unit="img"):
    insert_into_database(database_name, data)

print("Database creation and data insertion completed.")
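
Once the table is populated, a short query can confirm what was stored. The following sketch reports the number of rows and the combined file size. The helper name `summarize_images` is hypothetical, and the demonstration uses an in-memory database with made-up rows rather than the real `database_all_images.db`.

```python
import sqlite3


def summarize_images(conn):
    """Return (row_count, total_bytes) for the Images table."""
    # COALESCE turns the NULL that SUM gives for an empty table into 0.
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(Size), 0) FROM Images"
    ).fetchone()
    return row[0], row[1]


# Demonstration with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS Images "
    "(ID INTEGER PRIMARY KEY, Name TEXT, Path TEXT, Size INTEGER)"
)
conn.executemany(
    "INSERT INTO Images (Name, Path, Size) VALUES (?, ?, ?)",
    [("a.jpg", "D:/x/a.jpg", 1024), ("b.png", "D:/x/b.png", 2048)],
)
summary = summarize_images(conn)
print(summary)
conn.close()
```
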