## 📌 Synthetic Dataset Generator for AI User Studies
Generates two synthetic datasets:

- Animal Hobby Dataset (Dogs, Cats, Ducks)
- Human Character Dataset (Bobo, Mimi, Limo)

Outputs both datasets in CSV format and supports download from Google Colab.

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
from google.colab import files

# Set random seed for reproducibility
np.random.seed(42)

In [None]:
# ================================
# 📌 Helper Functions
# ================================

def assign_high_middle_low(size, high=0.8, middle=0.15, low=0.05):
    """Assigns categorical values ('low', 'middle', 'high') based on probability distribution."""
    return np.random.choice(["low", "middle", "high"], size=size, p=[low, middle, high])

def map_to_numeric(values, low_range, middle_range, high_range):
    """Maps categorical values ('low', 'middle', 'high') to random integers.

    Each range is a (start, stop) pair; stop is excluded, as in np.random.randint.
    """
    numeric_values = []
    for v in values:
        if v == "low":
            numeric_values.append(np.random.randint(*low_range))
        elif v == "middle":
            numeric_values.append(np.random.randint(*middle_range))
        else:
            numeric_values.append(np.random.randint(*high_range))
    return numeric_values

## Animal Hobby Dataset Description
**Classes:** Dog, Cat, Duck  
**Features:** Running, Gardening, Swimming, Art, Detective Stories  
**Feature value range:** 0 - 15



---



### Dogs
High Running (Dogs are naturally active)  
Middle to High Gardening (Dogs may dig in gardens)  
Middle to High Swimming with exceptions (Some dogs dislike water)  

**Unique combinations:**  
Low Art & High Detective Stories → Dog  
High Art & High Detective Stories → Dog  

### Cats
Middle to High Art Engagement with exceptions (Cats are curious and expressive)  
Low Swimming with no middle value (Most cats dislike water, but some adore it)  
High or Middle Detective Stories (Cats are observant and mysterious)  
Middle to Low Gardening (Cats rarely engage in gardening)

**Unique combinations:**  
High Swimming & High Art → Cat  
Low Running & High Detective Stories → Cat  

### Ducks
High to Upper-Middle Detective Stories (Ducks are good observers)  
Middle to High Swimming with exceptions (Most ducks love swimming, but some dislike it)  
Low Gardening (Ducks spend less time in gardens)  
Varied Running (Ducks have different running behaviors)  

**Unique combinations:**  
High Art & High Gardening → Duck  
Middle Art & Low Gardening → Duck  
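
One way to read the unique combinations is as rules over bucketed feature values. The sketch below is only an illustration under assumptions: the buckets 0-4 (low), 5-9 (middle), and 10-15 (high) are assumed cut points on the 0-15 scale, and the names `to_level`, `UNIQUE_COMBINATIONS`, and `match_unique_combinations` are hypothetical. Because the listed rules can overlap, the helper returns every class whose rule matches.

```python
def to_level(value):
    """Bucket a 0-15 feature value (assumed cut points: 5 and 10)."""
    if value < 5:
        return "low"
    if value < 10:
        return "middle"
    return "high"

# Each rule: (required feature levels, class named in the description).
UNIQUE_COMBINATIONS = [
    ({"Art": "low", "Detective Stories": "high"}, "Dog"),
    ({"Art": "high", "Detective Stories": "high"}, "Dog"),
    ({"Swimming": "high", "Art": "high"}, "Cat"),
    ({"Running": "low", "Detective Stories": "high"}, "Cat"),
    ({"Art": "high", "Gardening": "high"}, "Duck"),
    ({"Art": "middle", "Gardening": "low"}, "Duck"),
]

def match_unique_combinations(sample):
    """Return every class whose rule matches; rules may overlap."""
    levels = {feature: to_level(value) for feature, value in sample.items()}
    matches = []
    for required, animal in UNIQUE_COMBINATIONS:
        rule_holds = all(levels.get(f) == lvl for f, lvl in required.items())
        if rule_holds and animal not in matches:
            matches.append(animal)
    return matches

sample = {"Running": 12, "Gardening": 7, "Swimming": 8, "Art": 2, "Detective Stories": 13}
print(match_unique_combinations(sample))  # ['Dog']
```

A sample with high Art, high Detective Stories, high Swimming, and high Gardening satisfies a Dog, a Cat, and a Duck rule at once, so the combinations are not mutually exclusive under this reading.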



---



The distribution labels for a hobby are defined as follows:

- **High:** only 5% of the animal class has a low value, 15% have a middle value, and 80% have a high value.
- **Low:** 80% of the animal class has a low value, 15% have a middle value, and 5% have a high value.
- **Middle to High:** only 5% of the animal class has a low value, 55% have a middle value, and 40% have a high value.
- **Middle to Low:** 45% of the animal class has a low value, 50% have a middle value, and 5% have a high value.
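
The four labels defined here can be written down as probability triples and sampled directly. The following is a minimal sketch, not the notebook's own code: `LEVEL_DISTRIBUTIONS` and `sample_levels` are hypothetical names, and it uses NumPy's `default_rng` generator instead of the global seed set earlier.

```python
import numpy as np

# Probabilities are ordered (low, middle, high), as stated in the definitions.
LEVEL_DISTRIBUTIONS = {
    "High": (0.05, 0.15, 0.80),
    "Low": (0.80, 0.15, 0.05),
    "Middle to High": (0.05, 0.55, 0.40),
    "Middle to Low": (0.45, 0.50, 0.05),
}

def sample_levels(label, size, rng):
    """Draw `size` categorical levels for one distribution label."""
    return rng.choice(["low", "middle", "high"], size=size, p=LEVEL_DISTRIBUTIONS[label])

rng = np.random.default_rng(42)
levels = sample_levels("Middle to High", 10_000, rng)

# Empirical shares should land close to 5% / 55% / 40%.
for level in ("low", "middle", "high"):
    print(level, round(float(np.mean(levels == level)), 3))
```

Keeping the labels in one dictionary makes it straightforward to check that each triple sums to 1 before sampling.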

In [None]:
# ================================
# 📌 Generate Animal Hobby Dataset
# ================================
def generate_animal_hobby_dataset(num_samples=500):
    """Generates a synthetic dataset for animal hobbies."""
    classes = ["Dog", "Cat", "Duck"]
    df = pd.DataFrame({"Class": np.random.choice(classes, num_samples)})

    # Define feature mappings
    features = ["Running", "Gardening", "Swimming", "Art", "Detective Stories"]
    ranges = [(0, 5, 10, 15)] * len(features)  # Apply same range to all features

    # Assign features based on probability distribution.
    # NOTE: every feature uses the default "high" distribution (5% low, 15% middle,
    # 80% high) regardless of class; the class-specific profiles described above
    # are not applied here.
    for feature, (low, mid, high, max_val) in zip(features, ranges):
        levels = assign_high_middle_low(num_samples)
        # np.random.randint excludes its upper bound, so add 1 to include max_val
        df[feature] = map_to_numeric(levels, (low, mid), (mid, high), (high, max_val + 1))

    # Save dataset
    filename = "animal_hobby_dataset.csv"
    df.to_csv(filename, index=False)
    return filename

## Human Character Dataset Description
**Classes:** Bobo, Mimi, Limo  
**Features:** Sportiveness, Extroversion, Coffee Consumption, Online Activity, Pet Ownership  
**Feature value range:** 0 - 7  


---


### Bobo
High Sportiveness (Bobo characters are highly active)  
Middle to High Coffee Consumption (Often drink coffee but some exceptions exist)  
Middle to High Extroversion (Mostly outgoing but a few introverted Bobos)  
Low to Middle Online Activity (Bobos prefer physical activities over online presence)  

**Unique Combinations:**  
Low Coffee & High Sportiveness → Bobo  
High Extroversion & High Sportiveness → Bobo  

### Mimi
Middle to High Online Activity (Mimi characters are often engaged online)  
Middle to High Extroversion (Mostly social but some introverted Mimis exist)  
Middle to High Coffee Consumption (Mimis enjoy coffee, but not all of them)  
Middle to Low Sportiveness (Most Mimis prefer mental activities, not physical sports)  

**Unique Combinations:**  
High Online Activity & High Extroversion → Mimi  
Low Sportiveness & High Coffee → Mimi  

### Limo
Middle to High Pet Ownership (Limos tend to love pets)  
Middle to High Online Activity (Some Limos spend hours online, others avoid it)  
Middle to Low Extroversion (Limos are more introverted than other characters)  
Middle Coffee Consumption (Limos do not consume extreme amounts of coffee)

**Unique Combinations:**  
High Coffee & High Pet Ownership → Limo  
Middle Extroversion & High Online Hours → Limo  


---


The distribution labels for a characteristic are defined as follows:

- **High:** only 5% of the character class has a low value, 15% have a middle value, and 80% have a high value.
- **Low:** 80% of the character class has a low value, 15% have a middle value, and 5% have a high value.
- **Middle to High:** only 5% of the character class has a low value, 55% have a middle value, and 40% have a high value.
- **Middle to Low:** 45% of the character class has a low value, 50% have a middle value, and 5% have a high value.
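
To show how a class profile could drive value generation, the sketch below samples the four Mimi features whose labels are defined in this section. It is an illustration, not the notebook's implementation: `MIMI_PROFILE`, `LEVEL_RANGES`, and `sample_feature` are hypothetical names, and the inclusive value ranges 0-2, 3-4, and 5-7 are an assumption based on the 0 - 7 feature range.

```python
import numpy as np

# (low, middle, high) probabilities for the two labels the Mimi profile uses.
DISTRIBUTIONS = {
    "Middle to High": (0.05, 0.55, 0.40),
    "Middle to Low": (0.45, 0.50, 0.05),
}

# Profile taken from the Mimi description.
MIMI_PROFILE = {
    "Online Activity": "Middle to High",
    "Extroversion": "Middle to High",
    "Coffee Consumption": "Middle to High",
    "Sportiveness": "Middle to Low",
}

# Assumed inclusive integer ranges per level on the 0-7 scale.
LEVEL_RANGES = {"low": (0, 2), "middle": (3, 4), "high": (5, 7)}

def sample_feature(label, size, rng):
    """Sample levels for a label, then draw an integer inside each level's range."""
    levels = rng.choice(list(LEVEL_RANGES), size=size, p=DISTRIBUTIONS[label])
    bounds = np.array([LEVEL_RANGES[level] for level in levels])
    # endpoint=True makes the upper bound inclusive.
    return rng.integers(bounds[:, 0], bounds[:, 1], endpoint=True)

rng = np.random.default_rng(0)
mimi = {feature: sample_feature(label, 1000, rng) for feature, label in MIMI_PROFILE.items()}
print({feature: round(float(values.mean()), 2) for feature, values in mimi.items()})
```

Under these assumptions the "Middle to Low" feature should average noticeably lower than the three "Middle to High" features.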

In [None]:
# ================================
# 📌 Generate Human Character Dataset
# ================================

def generate_human_character_dataset(num_samples=500):
    """Generates a synthetic dataset for human character traits."""
    classes = ["Bobo", "Mimi", "Limo"]
    df = pd.DataFrame({"Class": np.random.choice(classes, num_samples)})

    # Define feature mappings
    features = ["Sportive", "Coffee", "Extroversion", "Online Hours", "Pet Ownership"]
    ranges = [(0, 3, 5, 7)] * len(features)  # Apply same range to all features

    # Assign features based on probability distribution.
    # NOTE: every feature uses the default "high" distribution (5% low, 15% middle,
    # 80% high) regardless of class; the class-specific profiles described above
    # are not applied here.
    for feature, (low, mid, high, max_val) in zip(features, ranges):
        levels = assign_high_middle_low(num_samples)
        # np.random.randint excludes its upper bound, so add 1 to include max_val
        df[feature] = map_to_numeric(levels, (low, mid), (mid, high), (high, max_val + 1))

    # Save dataset
    filename = "human_character_dataset.csv"
    df.to_csv(filename, index=False)
    return filename


In [None]:
# ================================
# 📌 Run the Dataset Generation
# ================================

animal_dataset_file = generate_animal_hobby_dataset()
human_dataset_file = generate_human_character_dataset()

print(f"✅ Animal dataset saved as: {animal_dataset_file}")
print(f"✅ Human dataset saved as: {human_dataset_file}")

In [None]:
# ================================
# 📌 Provide Download Links for Colab
# ================================

print("\n⬇️ Click below to download the generated datasets:")
files.download(animal_dataset_file)
files.download(human_dataset_file)
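
The `google.colab` import only works inside Colab, so the notebook fails at the first cell when run locally. If local runs matter, one option is to guard the download step. The sketch below is an assumption about how that could look; `download_if_colab` is a hypothetical helper, not part of the original notebook.

```python
import os

def download_if_colab(path):
    """Trigger a browser download in Colab; elsewhere just report the file location."""
    try:
        from google.colab import files  # only importable inside Google Colab
    except ImportError:
        print(f"Not running in Colab; file is at {os.path.abspath(path)}")
        return False
    files.download(path)
    return True

download_if_colab("animal_hobby_dataset.csv")
```

With this guard, the top-level `from google.colab import files` line would also need to move inside the helper, as shown, for the rest of the notebook to run outside Colab.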