# Training Directory Tools
----

This notebook contains all the magic I use to fiddle around with my datasets.

## Convert `.webp` to `.png`
----

This script converts every WebP image in a directory and its subdirectories to PNG. It walks the tree with `os.walk` and uses the `Image` module from Pillow (imported as `PIL`) to do the conversion. The function `convert_webp_to_png(directory)` finds files by their `.webp` extension (case-insensitive), saves a `.png` next to each one, and deletes the original only after the save succeeds. An existing `.png` with the same name is overwritten. If a conversion fails, the script prints the error and leaves the original file in place.

In [1]:
import os
from PIL import Image

def convert_webp_to_png(directory):
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.lower().endswith('.webp'):
                webp_path = os.path.join(root, file)
                png_path = os.path.splitext(webp_path)[0] + '.png'
                try:
                    with Image.open(webp_path) as img:
                        img.save(png_path, format='PNG')

                    os.remove(webp_path)
                    print(f"Converted {webp_path} to {png_path}")
                except Exception as e:
                    print(f"Error converting {webp_path}: {e}")

#directory = r'E:\training_dir'
#directory = r'C:\Users\kade\Desktop\training_dir_staging'
directory = r'C:\Users\kade\Desktop\ayaya'
convert_webp_to_png(directory)

Converted C:\Users\kade\Desktop\ayaya\nap97qc4ilh21.webp to C:\Users\kade\Desktop\ayaya\nap97qc4ilh21.png


## Duplicate checker
----

This Python script checks for duplicate tags within text files (`.txt`) in a specified directory and its subdirectories. It reads each text file, splits its content into tags separated by commas, and identifies any duplicates. If duplicates are found, it prints out a message indicating the file where the duplicates were found and the duplicate tags themselves. Finally, the `check_tags_in_directory` function iterates through the directory and calls `check_duplicate_tags` for each text file found.

In [3]:
import os

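def check_duplicate_tags(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        # Split on commas and strip whitespace so "tag" and "tag\n" compare equal
        tags = [tag.strip() for tag in file.read().split(',')]
    duplicates = set()
    unique_tags = set()
    for tag in tags:
        if not tag:
            continue  # Skip empty entries left by stray commas
        if tag in unique_tags:
            duplicates.add(tag)
        else:
            unique_tags.add(tag)
    if duplicates:
        print(f"Duplicate tags found in {file_path}: {', '.join(sorted(duplicates))}")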

def check_tags_in_directory(directory):
    for root, _, files in os.walk(directory):
        for file_name in files:
            if file_name.endswith('.txt'):
                file_path = os.path.join(root, file_name)
                check_duplicate_tags(file_path)

if __name__ == "__main__":
    directory_path = r'C:\Users\kade\Desktop\training_dir_staging'
    check_tags_in_directory(directory_path)


## Replace tags and remove duplicates
----

This Python script recursively processes text files within a specified directory, replacing one tag with another and collapsing immediately repeated words.

The main function, `process_files`, accepts a directory path plus the old tag and its replacement. It finds `.txt` and `.tags` files in the directory and its subdirectories, reads each one, replaces whole-word occurrences of the old tag using a regular expression, removes consecutive repeats of the same word (for example `cat, cat`), and writes the result back to the file.

In [1]:
import os
import re

def process_files(directory, old_tag, new_tag):
    try:
        for entry in os.listdir(directory):
            entry_path = os.path.join(directory, entry)

            if os.path.isdir(entry_path):
                process_files(entry_path, old_tag, new_tag)

            elif os.path.isfile(entry_path) and (entry.endswith(".txt") or entry.endswith(".tags")):
                with open(entry_path, 'r', encoding='utf-8') as f:
                    content = f.read()

                if old_tag:
                    content = re.sub(r'\b' + re.escape(old_tag) + r'\b', new_tag, content)

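                # Collapse consecutive repeats of the same word ("cat, cat" -> "cat").
                # The trailing \b stops "cat, catgirl" from collapsing into "catgirl".
                tag_pattern = re.compile(r'\b(\w+)\b(?:[,\s]+\1\b)+')
                content = tag_pattern.sub(r'\1', content)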

                with open(entry_path, 'w', encoding='utf-8') as f:
                    f.write(content)

    except Exception as e:
        print(f"Error processing directory {directory}: {e}\n")

# Directory path
directory_path = r'C:\Users\kade\Desktop\training_dir_staging'

process_files(directory_path, 'transparent background', 'black background')

#process_files(directory_path, 'safe', 'rating_safe')
#process_files(directory_path, 'questionable', 'rating_questionable')
#process_files(directory_path, 'explicit', 'rating_explicit')
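
The regular expression in the cell above only catches repeats that sit next to each other, and it works on single words rather than whole tags. As a rough alternative, assuming each file holds one comma-separated tag list, duplicates can be dropped at the tag level while keeping the first occurrence. `dedupe_tags` is a hypothetical helper, not part of the original cell:

```python
def dedupe_tags(text: str) -> str:
    """Return the comma-separated tags in `text` with later repeats dropped."""
    seen = set()
    kept = []
    for tag in text.split(','):
        tag = tag.strip()
        # Skip empty entries and anything already seen
        if tag and tag not in seen:
            seen.add(tag)
            kept.append(tag)
    return ', '.join(kept)


print(dedupe_tags("cat, long hair, cat, hair bow, long hair"))
# → cat, long hair, hair bow
```

Because it compares whole tags, `long hair` and `hair bow` are left alone even though they share a word.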


## Insert tag
----

This script recursively inserts a specified tag at the beginning of each `.txt` and `.tags` file within a given directory and its subdirectories. Running it twice inserts the tag twice.

In [9]:
import os

# Function to insert a specified tag in text files in subdirectories
def insert_tag_in_files(directory, tag_to_insert):
    try:
        for entry in os.listdir(directory):
            entry_path = os.path.join(directory, entry)

            if os.path.isdir(entry_path):
                insert_tag_in_files(entry_path, tag_to_insert)

            elif os.path.isfile(entry_path) and (entry.endswith(".txt") or entry.endswith(".tags")):
                with open(entry_path, 'r', encoding='utf-8') as f:
                    content = f.read()

                # Insert the specified tag
                content = tag_to_insert + ', ' + content

                # Write back to the file
                with open(entry_path, 'w', encoding='utf-8') as f:
                    f.write(content)

    except Exception as e:
        print(f"Error processing directory {directory}: {e}\n")

directory_path = r'C:\Users\kade\Desktop\training_dir_staging'

# Execute the function with the desired tag
insert_tag_in_files(directory_path, 'furrysticker')
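
Since the cell above prepends the tag unconditionally, a guard can make the step safe to re-run. This is a minimal sketch, assuming the file content is a single comma-separated tag list; `prepend_tag` is a hypothetical helper that could replace the `content = tag_to_insert + ', ' + content` line:

```python
def prepend_tag(content: str, tag: str) -> str:
    """Put `tag` in front of `content` unless it is already one of the tags."""
    tags = [t.strip() for t in content.split(',') if t.strip()]
    if tag in tags:
        return content  # Already present, leave the file content untouched
    # For an empty file, return just the tag (no trailing ", ")
    return f"{tag}, {content}" if tags else tag


print(prepend_tag("cat, dog", "furrysticker"))
# → furrysticker, cat, dog
print(prepend_tag("furrysticker, cat", "furrysticker"))
# → furrysticker, cat
```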

## Escape parentheses
----

Recursively escape unescaped parentheses in all `.txt` files within the specified directory and its subdirectories. 

In [10]:
import os
import re

def escape_parentheses(file_path):
    with open(file_path, 'r') as file:
        content = file.read()

    # Escape unescaped parentheses
    content = re.sub(r'(?<!\\)([()])', r'\\\1', content)

    with open(file_path, 'w') as file:
        file.write(content)

def process_directory(directory):
    # os.walk already descends into subdirectories, so no manual recursion is needed
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".txt"):
                file_path = os.path.join(root, file)
                escape_parentheses(file_path)

directory_path = r'C:\Users\kade\Desktop\training_dir_staging'
process_directory(directory_path)
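
To see what the substitution does without touching any files, the same regular expression can be applied to a plain string. The negative lookbehind `(?<!\\)` skips parentheses that already have a backslash in front of them, so running it a second time changes nothing. `escape_parens` is a hypothetical name used only for this sketch:

```python
import re

# A "(" or ")" that is not already preceded by a backslash
UNESCAPED_PAREN = re.compile(r'(?<!\\)([()])')


def escape_parens(text: str) -> str:
    """Put a backslash in front of every parenthesis that lacks one."""
    return UNESCAPED_PAREN.sub(r'\\\1', text)


once = escape_parens(r"pal (species), fox \(animal\)")
print(once)
# → pal \(species\), fox \(animal\)

# A second pass finds nothing left to escape
print(escape_parens(once) == once)
# → True
```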

## Replace underscores with spaces
----

Recursively replaces underscores with spaces in the content of text files in the specified directory and its subdirectories,
excluding specified tags.

In [3]:
import os

excluded_tags = [
    "rating_safe",
    "rating_explicit",
    "rating_questionable"
]

def replace_underscores_with_spaces(directory_path):
    for root, dirs, files in os.walk(directory_path):
        for filename in files:
            if filename.endswith(".txt"):
                file_path = os.path.join(root, filename)

                # Read the content of the file
                with open(file_path, 'r') as file:
                    content = file.read()

                # Replace every underscore with a space
                content = content.replace('_', ' ')

                # Restore the underscores in the excluded tags
                for tag in excluded_tags:
                    replacement = tag.replace('_', ' ')
                    content = content.replace(replacement, tag)

                # Write the modified content back to the file
                with open(file_path, 'w') as file:
                    file.write(content)

# Specify the directory path
directory_path = r'C:\Users\kade\Desktop\training_dir_staging'

# Call the function to recursively replace underscores with spaces (excluding specified tags)
replace_underscores_with_spaces(directory_path)
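
Another way to protect the excluded tags is to decide per tag rather than on the whole file content. The sketch below assumes a comma-separated tag list; `underscores_to_spaces` and `EXCLUDED` are hypothetical names, not part of the original cell:

```python
EXCLUDED = {"rating_safe", "rating_explicit", "rating_questionable"}


def underscores_to_spaces(text: str, excluded=EXCLUDED) -> str:
    """Replace underscores with spaces in every tag except the excluded ones."""
    tags = [t.strip() for t in text.split(',')]
    return ', '.join(
        t if t in excluded else t.replace('_', ' ')
        for t in tags
        if t  # Drop empty entries
    )


print(underscores_to_spaces("rating_safe, long_hair, blue_eyes"))
# → rating_safe, long hair, blue eyes
```

An excluded tag is never rewritten in the first place, so nothing has to be restored afterwards.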

### Fix tags that need underscores after that! 🐱
----

In [12]:
import os
import fileinput

# Function to recursively replace text in *.txt files
def replace_text_in_files(directory):
    for subdir, _, files in os.walk(directory):
        for file in files:
            if file.endswith(".txt"):
                file_path = os.path.join(subdir, file)
                # With inplace=True, print() writes each line back into the file
                with fileinput.FileInput(file_path, inplace=True) as f:
                    for line in f:
                        print(line.replace("rating safe", "rating_safe")
                                   .replace("rating questionable", "rating_questionable")
                                   .replace("rating explicit", "rating_explicit"), end='')

# Replace text in the specified directory
replace_text_in_files(r'C:\Users\kade\Desktop\training_dir_staging')

## Remove extra file extension before `.txt`
----

Recursively renames `.txt` files whose names carry an extra image extension before `.txt` (for example, `image.png.txt` becomes `image.txt`) in the specified directory and its subdirectories.

In [4]:
import os

def rename_files(directory_path):
    for root, dirs, files in os.walk(directory_path):
        for filename in files:
            if filename.endswith('.txt'):
                # Split off the .txt extension
                base_name, extension = os.path.splitext(filename)

                # Check if the file has an additional image extension
                if base_name.endswith(('.png', '.jpg', '.jpeg', '.webp', '.gif')):
                    # Drop the image extension whatever its length (.jpg vs .jpeg)
                    new_filename = os.path.splitext(base_name)[0] + '.txt'

                    # Construct the full file paths
                    old_path = os.path.join(root, filename)
                    new_path = os.path.join(root, new_filename)

                    # Rename the file
                    os.rename(old_path, new_path)

# Specify the directory path
directory_path = r'C:\Users\kade\Desktop\training_dir_staging'

# Call the function to recursively rename files
rename_files(directory_path)
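
The renaming rule can be checked on bare filenames before anything on disk is touched. This is a small sketch of that rule; `strip_image_extension` and `IMAGE_EXTS` are hypothetical names, and the case-insensitive comparison is an assumption that the original cell does not make:

```python
import os

IMAGE_EXTS = {'.png', '.jpg', '.jpeg', '.webp', '.gif'}


def strip_image_extension(filename: str):
    """Return the cleaned-up name, or None if the file needs no rename."""
    base, ext = os.path.splitext(filename)
    if ext.lower() != '.txt':
        return None
    stem, inner_ext = os.path.splitext(base)
    if inner_ext.lower() in IMAGE_EXTS:
        return stem + ext
    return None


for name in ["a.png.txt", "b.jpeg.txt", "c.txt", "notes.v2.txt"]:
    print(name, "->", strip_image_extension(name))
# → a.png.txt -> a.txt
# → b.jpeg.txt -> b.txt
# → c.txt -> None
# → notes.v2.txt -> None
```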

## Newlines to commas
----

Recursively modify the content of `.txt` files in the specified directory and its subdirectories by replacing newlines with commas and spaces. 

In [14]:
import os

def process_directory(directory):
    for root, dirs, files in os.walk(directory):
        for filename in files:
            if filename.endswith(".txt"):
                file_path = os.path.join(root, filename)
                
                # Read the content of the file
                with open(file_path, 'r') as file:
                    content = file.read()
                
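                # Join non-empty lines with a comma and space; this avoids a
                # dangling ", " when the file ends with a newline
                lines = [line.strip() for line in content.splitlines()]
                modified_content = ', '.join(line for line in lines if line)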
                
                # Write the modified content back to the file
                with open(file_path, 'w') as file:
                    file.write(modified_content)

# Directory path
directory_path = r'C:\Users\kade\Desktop\training_dir_staging'

# Recursively process the directory and its subdirectories
process_directory(directory_path)

## Remove tags ⚠️
----

This script removes specific tags from text files (`.txt`) located within a given directory and its subdirectories. It iterates through each file, reads its content, removes the specified tags, and then overwrites the file with the modified content.

The tags to be removed are specified in the `tags_to_remove` list within the `remove_tags` function. The list covers entries such as `pal (species)` and `pokemon (species)`, phrases like "unavailable at source", aspect ratios and resolution tags, the years 1996 to 2024, and "generation N pokemon" for generations 1 through 9. For each tag the script builds a regular expression that matches both the plain form (`pal (species)`) and the backslash-escaped form (`pal \(species\)`). Matching is by substring, not by whole tag, so a short entry such as `pokemon` also strips that text out of longer tags and leaves stray separators behind.

The `remove_tags` function takes a file path as input and returns a list of removal actions performed, indicating which tags were removed from which files.

The `process_directory` function recursively processes all files within a specified directory and its subdirectories. For each file with a ".txt" extension, it calls the `remove_tags` function to remove tags and accumulates removal actions. Finally, it prints out all removal actions performed.

To utilize the script, provide the path to the directory containing the text files that need tag removal. Upon execution, the script will modify the files in place, removing the specified tags, and output a log of removal actions.

In [None]:
import os
import re

def remove_tags(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        content = file.read()

    tags_to_remove = [
        "creative commons",
        "cc-by-nc-nd",
        "pal (species)",
        "pocketpair",
        "unavailable at source",
        "partially",
        "pokemon (species)",
        "generation",
        "pokephilia",
        "pokemon",
        "nintendo",
        "eeveelution",
        "uncensored",
        "translated",
        "partially translated",
        "translation request",
        "16 10",
        "16 9",
        "10 16",
        "9 16",
        "6 5",
        "5 6",
        "5 4",
        "4 3",
        "4 5",
        "3 4",
        "3 2",
        "2 3",
        "2 1",
        "1 2",
        "1 1",
        "4k",
        "absurd res",
        "hi res",
        "elden ring",
        "fromsoftware",
        "canid",
        "canis",
        "mammal",
        "unwanted erection",
        "lighting",
        "shaded",
        "widescreen"
    ]

    for gen in range(1, 10):
        tags_to_remove.append(f"generation {gen} pokemon")

    for year in range(1996, 2025):
        tags_to_remove.append(str(year))

    removal_actions = []

    for tag in tags_to_remove:
        # Match the tag both as written and with backslash-escaped parentheses,
        # e.g. "pal (species)" and "pal \(species\)"
        plain = re.escape(tag)
        escaped = re.escape(tag.replace('(', r'\(').replace(')', r'\)'))
        pattern = re.compile(escaped + '|' + plain)
        if pattern.search(content):
            content = pattern.sub('', content)
            removal_actions.append(f'Removed tag "{tag}" from file: {file_path}')

    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(content)

    return removal_actions

def process_directory(directory):
    all_removal_actions = []

    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".txt"):
                file_path = os.path.join(root, file)
                removal_actions = remove_tags(file_path)
                all_removal_actions.extend(removal_actions)

    for action in all_removal_actions:
        print(action)

# Provide the path to the directory
directory_path = r'C:\Users\kade\Desktop\training_dir_staging'

# Recursively remove tags from *.txt files in the specified directory and print removal actions
process_directory(directory_path)
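
The cell above removes matches anywhere in the text, so `pokemon` also eats part of `pokemon trainer`. If whole-tag removal is what is wanted, one option is to split the content on commas and compare complete tags. This is a sketch under the assumption that each file is a single comma-separated tag list; `remove_whole_tags` is a hypothetical helper, not the function used above:

```python
def remove_whole_tags(text: str, unwanted) -> str:
    """Drop tags that exactly match an entry in `unwanted`."""
    unwanted = set(unwanted)
    kept = []
    for tag in text.split(','):
        tag = tag.strip()
        # Compare with backslashes removed so "pal \(species\)" equals "pal (species)"
        normalized = tag.replace('\\', '')
        if tag and normalized not in unwanted:
            kept.append(tag)
    return ', '.join(kept)


sample = r"pokemon, pokemon trainer, pal \(species\), fox"
print(remove_whole_tags(sample, ["pokemon", "pal (species)"]))
# → pokemon trainer, fox
```

Rejoining with `', '` also means no empty `, ,` entries are left behind.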

### Replace `, ,` with `,` after that mess. 😼
----

In [10]:
import os

# Start directory
start_dir = r'C:\Users\kade\Desktop\training_dir_staging'

# Function to replace text in *.txt files
def replace_text_in_files(directory):
    while True:  # Run indefinitely until no more matches are found
        found_match = False  # Flag to track if any match is found
        for root, dirs, files in os.walk(directory):
            for file in files:
                if file.endswith(".txt"):
                    file_path = os.path.join(root, file)
                    with open(file_path, 'r', encoding='utf-8') as f:
                        content = f.read()
                    # Replace ', ,' with ','
                    updated_content = content.replace(', ,', ',').replace(',  ,', ',')
                    if updated_content != content:
                        found_match = True  # Set the flag to True if any match is found
                        with open(file_path, 'w', encoding='utf-8') as f:
                            f.write(updated_content)
        if not found_match:  # If no match is found, break the loop
            break

# Run the function
replace_text_in_files(start_dir)
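
The repeated passes above are needed because each `str.replace` call only removes one empty slot at a time. A single regular expression can collapse a whole run of empty entries in one pass. This is a sketch only; `collapse_empty_tags` is a hypothetical helper that would stand in for the two chained `replace` calls:

```python
import re

# A comma followed by one or more "whitespace + comma" groups
EMPTY_RUN = re.compile(r',(?:\s*,)+')


def collapse_empty_tags(text: str) -> str:
    """Collapse runs like ', ,' or ',  , ,' into a single comma."""
    return EMPTY_RUN.sub(',', text)


print(collapse_empty_tags("a, , b,  , , c"))
# → a, b, c
```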