# Week 7 Lab: File Management System for a Research Lab

## Description
You are working as a data manager in a research lab that collects and organizes data from various sources. Your task is to automate the organization of files stored in a messy directory containing text files (.txt), CSV datasets (.csv), JSON records (.json), images (.jpg, .png), and log files (.log).

## Step 0 - Import modules/packages

In [10]:
import os
import glob
import shutil
from datetime import datetime

## Step 1 - Understanding the Data Directory
- Given a dataset directory (`data_repository/`), first list all of its files using Python.

In [8]:
directory = "data_repository" # Define the directory

In [3]:
files = []

with os.scandir(directory) as entries:
    for entry in entries:
        if entry.is_file():
            files.append(entry.name)

print(files)

['activity.log', 'AndroidManifest.xml', 'bot_req.txt', 'course_req.txt', 'img_20250207102128189324.png', 'img_20250207111437344662.png', 'img_20250207165943833737.png', 'img_20250212021047288265.png', 'img_20250224213525161758.png', 'img_20250224213642259719.jpg', 'img_20250224213721843077.jpg', 'img_20250224213752852462.jpg', 'img_20250224213859419876.jpg', 'language.txt', 'QR-Code-Discord-Bot.png', 'quote1.json', 'quote2.json', 'req.txt', 'test.csv', 'train.csv', 'valorant_grouped_reports.json']


- Print the total number of files in the directory.

In [4]:
file_count = 0

with os.scandir(directory) as entries:
    for entry in entries:
        if entry.is_file():
            file_count += 1

print(f"Total number of files: {file_count}")

Total number of files: 21
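
The listing and the count can also come from a single pass. The following is a minimal sketch using `pathlib`; the helper name `list_files` is hypothetical and not part of the lab, and the commented usage assumes the same `data_repository` directory as above.

```python
from pathlib import Path


def list_files(directory):
    """Return the sorted names of regular files directly inside `directory`."""
    # iterdir() yields both files and subdirectories; keep regular files only.
    return sorted(p.name for p in Path(directory).iterdir() if p.is_file())


# Hypothetical usage with the lab's directory:
# files = list_files("data_repository")
# print(files)
# print(f"Total number of files: {len(files)}")
```

Sorting makes the listing reproducible, since the order returned by the operating system is not guaranteed.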


## Step 2 - Categorizing Files
- Use `os` and `glob` to identify and count files by type (TXT, CSV, JSON, JPG, PNG, LOG).
- Print the count of each file type.

In [6]:
file_types = ["txt", "csv", "json", "jpg", "png", "log"]
file_counts = {}

for file_type in file_types:
    matching_files = glob.glob(os.path.join(directory, f"*.{file_type}"))
    file_counts[file_type.upper()] = len(matching_files)

for file_type, count in file_counts.items():
    print(f"{file_type} files: {count}")

TXT files: 4
CSV files: 2
JSON files: 3
JPG files: 4
PNG files: 6
LOG files: 1
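
Pattern matching with `glob` is case-sensitive on POSIX systems, so a file named `photo.JPG` would not match `*.jpg` there. As a hedged alternative, the sketch below counts every extension in one pass and normalizes case; `count_by_extension` is a hypothetical helper, not part of the original solution.

```python
import os
from collections import Counter


def count_by_extension(directory):
    """Count regular files in `directory` by lowercased extension (without the dot)."""
    counts = Counter()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                # splitext("data.CSV") -> ("data", ".CSV"); strip the dot and lowercase.
                ext = os.path.splitext(entry.name)[1].lstrip(".").lower()
                counts[ext or "(no extension)"] += 1
    return counts


# Hypothetical usage with the lab's directory:
# for ext, count in sorted(count_by_extension("data_repository").items()):
#     print(f"{ext.upper()} files: {count}")
```

Unlike the fixed list of six types, this also surfaces unexpected extensions such as `xml`.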


## Steps 3, 4 & 5 - Organizing Files into Folders & Handling Missing Directories
- Create subdirectories inside `organized_data/` for each file type (e.g., `text_files/`, `csv_files/`, `images/`, `logs/`).
- Move files into their respective folders.
- If a required subdirectory does not exist, create it dynamically.
- Ensure no files are lost in the process.
- Append a timestamp (`YYYYMMDD_HHMMSS`) to log files before moving them (e.g., `server_20250221_153045.log`).
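
The timestamp naming rule can be isolated in a small function, which makes it easy to check against the example above. This is a minimal sketch; the helper `timestamped_name` and its optional `when` parameter are assumptions introduced for illustration.

```python
import os
from datetime import datetime


def timestamped_name(file_name, when=None):
    """Insert a YYYYMMDD_HHMMSS timestamp between the base name and the extension."""
    when = when or datetime.now()  # Default to the current time
    base, ext = os.path.splitext(file_name)  # "server.log" -> ("server", ".log")
    return f"{base}_{when.strftime('%Y%m%d_%H%M%S')}{ext}"


print(timestamped_name("server.log", datetime(2025, 2, 21, 15, 30, 45)))
# server_20250221_153045.log
```

Passing the time explicitly keeps the function testable; omitting it uses the moment of the call.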

In [14]:
# Define destination directories
destination_directory = "organized_data"

# Define file type categories and corresponding folders
file_categories = {
    "text_files": ["txt"],
    "csv_files": ["csv"],
    "json_files": ["json"],
    "images": ["jpg", "png"],
    "logs": ["log"]
}

# Define the folder for unrecognized file types
others_folder = os.path.join(destination_directory, "others")

# Ensure the main destination directory and "others/" folder exist
os.makedirs(destination_directory, exist_ok=True)
os.makedirs(others_folder, exist_ok=True)

# Move files to their respective folders
for file_name in os.listdir(directory):
    file_path = os.path.join(directory, file_name)

    # Ensure it's a file, not a directory
    if os.path.isfile(file_path):
        # Get file extension (without the dot) in lowercase; empty if there is none
        file_extension = os.path.splitext(file_name)[1].lstrip(".").lower()

        # Find the correct folder for this file type
        moved = False
        for folder, extensions in file_categories.items():
            if file_extension in extensions:
                folder_path = os.path.join(destination_directory, folder)
                os.makedirs(folder_path, exist_ok=True)  # Create subdirectory if missing

                # If it's a log file, append a timestamp
                if file_extension == "log":
                    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
                    name_part = os.path.splitext(file_name)[0]  # Get filename without extension
                    new_file_name = f"{name_part}_{timestamp}.log"
                else:
                    new_file_name = file_name  # Keep original name for other files

                # Move the file to the appropriate folder
                destination_path = os.path.join(folder_path, new_file_name)
                shutil.move(file_path, destination_path)
                print(f"Moved: {file_name} → {destination_path}")
                moved = True
                break  # Stop searching once a match is found

        # If the file type is unknown, move it to "others/"
        if not moved:
            destination_path = os.path.join(others_folder, file_name)
            shutil.move(file_path, destination_path)
            print(f"Moved: {file_name} → {destination_path}")

print("Files organized successfully!")


Moved: activity.log → organized_data\logs\activity_20250228_135623.log
Moved: AndroidManifest.xml → organized_data\others\AndroidManifest.xml
Moved: bot_req.txt → organized_data\text_files\bot_req.txt
Moved: course_req.txt → organized_data\text_files\course_req.txt
Moved: img_20250207102128189324.png → organized_data\images\img_20250207102128189324.png
Moved: img_20250207111437344662.png → organized_data\images\img_20250207111437344662.png
Moved: img_20250207165943833737.png → organized_data\images\img_20250207165943833737.png
Moved: img_20250212021047288265.png → organized_data\images\img_20250212021047288265.png
Moved: img_20250224213525161758.png → organized_data\images\img_20250224213525161758.png
Moved: img_20250224213642259719.jpg → organized_data\images\img_20250224213642259719.jpg
Moved: img_20250224213721843077.jpg → organized_data\images\img_20250224213721843077.jpg
Moved: img_20250224213752852462.jpg → organized_data\images\img_20250224213752852462.jpg
Moved: img_20250224213
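
One caveat for the "no files are lost" requirement: if a file with the same name already exists at the destination, `shutil.move` may overwrite it. The sketch below shows one possible safeguard that picks an unused name before moving; `unique_destination` and `safe_move` are hypothetical helpers, and the `_1`, `_2` suffix scheme is an assumption rather than part of the lab specification.

```python
import os
import shutil


def unique_destination(folder, file_name):
    """Return a path inside `folder` that does not exist yet, adding _1, _2, ... if needed."""
    base, ext = os.path.splitext(file_name)
    candidate = os.path.join(folder, file_name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(folder, f"{base}_{counter}{ext}")
        counter += 1
    return candidate


def safe_move(source_path, folder):
    """Move `source_path` into `folder` without replacing an existing file."""
    os.makedirs(folder, exist_ok=True)  # Create the subdirectory if missing
    destination = unique_destination(folder, os.path.basename(source_path))
    shutil.move(source_path, destination)
    return destination
```

In the loop above, a call such as `safe_move(file_path, folder_path)` could stand in for the direct `shutil.move` call for files that keep their original name.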

## Step 6 - Generating a Summary Report
- Generate a summary (`summary.txt`) containing:
    - Number of files per category.
    - Total number of files moved.
    - A sample file name from each category.

In [15]:
# Define destination directory
destination_directory = "organized_data"

# Folders created in the previous step, including "others" for unrecognized types.
# A separate name is used so the extension mapping in `file_categories` is not overwritten.
category_folders = ["text_files", "csv_files", "json_files", "images", "logs", "others"]

# Dictionary to store summary data
summary = {category: {"count": 0, "sample": None} for category in category_folders}
total_files_moved = 0

# Count files and get a sample for each category
for category in category_folders:
    folder_path = os.path.join(destination_directory, category)
    
    if os.path.exists(folder_path):  # Ensure folder exists
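        # Keep regular files only, in a stable (sorted) order
        files = sorted(f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f)))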
        summary[category]["count"] = len(files)
        if files:
            summary[category]["sample"] = files[0]  # Pick the first file as a sample

    total_files_moved += summary[category]["count"]

# Generate summary report
summary_file_path = os.path.join(destination_directory, "summary.txt")
with open(summary_file_path, "w") as summary_file:
    summary_file.write("File Organization Summary\n")
    summary_file.write("=" * 30 + "\n")
    for category, data in summary.items():
        summary_file.write(f"{category}: {data['count']} files\n")
        if data["sample"]:
            summary_file.write(f"  Sample File: {data['sample']}\n")
    summary_file.write("=" * 30 + "\n")
    summary_file.write(f"Total Files Moved: {total_files_moved}\n")

print(f"Summary saved: {summary_file_path}")


Summary saved: organized_data\summary.txt
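
The total in the summary can be cross-checked against the count taken in Step 1 to confirm that nothing was lost. The following is a minimal sketch; `count_files` is a hypothetical helper, and the commented checks assume that `organized_data/` was empty before the move and that 21 files were counted beforehand.

```python
import os


def count_files(root, exclude=("summary.txt",)):
    """Count regular files under `root` (recursively), skipping names in `exclude`."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for name in filenames if name not in exclude)
    return total


# Hypothetical checks under the assumptions stated above:
# assert count_files("organized_data") == 21   # Everything arrived
# assert count_files("data_repository") == 0   # Nothing was left behind
```

The report itself is excluded from the count because it is written into `organized_data/` after the move.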
