### Rationale for Utilizing the Faker Library

The Faker library is being employed in the initial phases of this project primarily due to the current paucity of available data pertaining to user tasks. The development of a functional machine learning model, specifically for task duration prediction, necessitates a dataset of sufficient size and diversity. Given the limited volume of real-world task data at this juncture, synthetic data generation is essential to:

* **Facilitate Early Model Development:** The generation of synthetic task descriptions and durations using Faker enables the creation of a preliminary dataset suitable for training and evaluating basic machine learning models. This allows for the iterative refinement of the model architecture and training process in the absence of real-world data.

* **Enable Initial Testing:** Synthetic data provides a controlled environment for conducting initial testing of the task scheduling system and its integration with the duration prediction model. This ensures that core functionalities can be validated prior to deployment in a production environment with genuine user data.

In [59]:
# File Sharing
import csv

# Faker Library
from faker import Faker

# Other 
import random as random
from datetime import datetime, timedelta
from typing import List, Dict

In [60]:
# Initialise Faker Object
fake = Faker()

In [61]:
# Task Generation Params , Number of tasks will simulate a 4 month period
num_work_tasks = 4*5*4*4 # 4 Tasks per week , 5 days a week
num_study_tasks = 2*4*4 # Expected to work on coding personal coding projects twice a week
num_personal_health_and_wellbeing = 4*4*4 # Expected to go gym 4 times a week
num_of_house_tasks = 2*4*4 + 4 # Expected to change bed sheets once a month hence +4
num_of_social_tasks = 7

In [62]:
# List of tasks that correspond to each area and list of tags that could be used for each task in a given area
work = [
    "Attend Meeting",
    "Analyse Data",
    "4-eye Check",
    "Change BOM on SAP",
    "Test Product",
    "Calculate Savings",
    "Evaluate Data",
    "Present Data Insights",
    "Review Process"
]

work_tags = ["Data Science", "Adminstration", "Quality Engineering"]

personal_health_and_wellbeing = [
    "Gym",
    "Running",
    "Buy Cream",
    "Buy Face Wash",
    "Go GP Appointment"
]

personal_health_and_wellbeing_tags = ["Fitness", "Personal Care"]

study = [
    "Read Research Papers",
    "Coding",
    "Add Comments",
    "Watch Guide",
]

study_tags = ["Programming"]

social = [
    "Go to Resturant",
    "Go Pub",
    "Go to Cinema",
    "Go Aunty's House",
    "Go Aunty's House",
    "Go Aunty's House"
]

social_tags = ["Family", "Friends"]

home = [
    "Laundry",
    "Clean Room",
    "Change Bed Sheets"
]

home_tags = []

In [63]:
# initialise tasks (List of Dict)
tasks = [] 

def generate_tasks(area, num_tasks, common_tasks, dur_min, dur_max, tags=[""]):
    # Initalise Tasks
    tasks = []
    # Iterate through the number of tasks to be generated
    for num in range(num_tasks):
        task = {
            "title" : random.choice(common_tasks),
            "description" : fake.sentence(nb_words=6),
            "duration" : random.randint(dur_min, dur_max),
            "tags" : [area, random.choice(tags) ]
        }
        # Append
        tasks.append(task)
        
    # Return
    return tasks

# Extend Dict for all areas of productivity (Extend adds the elements of the list)
tasks.extend(generate_tasks("Work", num_work_tasks, work, 15, 120, work_tags))
tasks.extend(generate_tasks("Health & Wellbeing", num_personal_health_and_wellbeing, personal_health_and_wellbeing, 25, 90, personal_health_and_wellbeing_tags))
tasks.extend(generate_tasks("Study", num_study_tasks, study, 20, 60, study_tags))
tasks.extend(generate_tasks("Friends&Family", num_of_social_tasks, social, 180,350, social_tags))
tasks.extend(generate_tasks("Home", num_of_house_tasks, home, 15, 40))

In [64]:
random.shuffle(tasks)

In [65]:
print(tasks)

[{'title': 'Change BOM on SAP', 'description': 'Before wall open green parent.', 'duration': 84, 'tags': ['Work', 'Quality Engineering']}, {'title': 'Add Comments', 'description': 'Like wall serve.', 'duration': 41, 'tags': ['Study', 'Programming']}, {'title': 'Present Data Insights', 'description': 'Relationship song continue light.', 'duration': 105, 'tags': ['Work', 'Data Science']}, {'title': 'Running', 'description': 'Rule number scientist away least certainly garden.', 'duration': 80, 'tags': ['Health & Wellbeing', 'Personal Care']}, {'title': 'Review Process', 'description': 'Couple truth sing remain family.', 'duration': 92, 'tags': ['Work', 'Data Science']}, {'title': 'Buy Face Wash', 'description': 'Doctor accept tough street month.', 'duration': 84, 'tags': ['Health & Wellbeing', 'Fitness']}, {'title': 'Calculate Savings', 'description': 'Address decide be figure.', 'duration': 82, 'tags': ['Work', 'Adminstration']}, {'title': 'Calculate Savings', 'description': 'Scene great

In [66]:
# Assign unique IDs
for i, task in enumerate(tasks):
    task["id"] = i + 1

In [67]:
output_filename = "fake_tasks"
#CSV Saving 
with open(output_filename, "w", newline="", encoding="utf-8") as csvfile:
    fieldnames: List[str] = ["id", "title", "description", "duration", "tags"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerows(tasks)

print(f"Generated {len(tasks)} tasks and saved to {output_filename}")

Generated 459 tasks and saved to fake_tasks
