# BAföG OCEL Simulation

This simulation generates an **Object-Centric Event Log (OCEL)** for the BAföG application process.

## Outputs
- `events.csv` - All events, with a sorting column
- `applications.csv` - Application objects
- `documents.csv` - Document objects
- `event_object_link.csv` - Event ↔ object links
- `log_not_sliced.csv` - Intermediate format for inspection

## Data Model
Based on `agent/schema.sql`, with two object types:
- **Application**: one object per application
- **Document**: 1-5 documents per application (depending on its attributes)

## 1. Setup & Imports

In [37]:
import json
import random
from datetime import datetime, timedelta
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Tuple
from pathlib import Path
import numpy as np
import pandas as pd
import simpy

# Project paths
PROJECT_ROOT = Path.cwd().parent
CONFIG_PATH = PROJECT_ROOT / "config" / "sim_ocel_config.json"
OUTPUT_DIR = PROJECT_ROOT / "data" / "outputs" / "ocel"

# Create output directory
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

print(f"Project Root: {PROJECT_ROOT}")
print(f"Config Path: {CONFIG_PATH}")
print(f"Output Dir: {OUTPUT_DIR}")

Project Root: /Users/davidderr/Desktop/pm4py-bpmn-simulation
Config Path: /Users/davidderr/Desktop/pm4py-bpmn-simulation/config/sim_ocel_config.json
Output Dir: /Users/davidderr/Desktop/pm4py-bpmn-simulation/data/outputs/ocel
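
`Path.cwd().parent` only works when the notebook is started from a direct subdirectory of the project root. A slightly more defensive sketch walks upward until a marker directory is found. The helper name `find_project_root` and the choice of `config` as marker are assumptions made for illustration.

```python
from pathlib import Path
import tempfile


def find_project_root(start: Path, marker: str = "config") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains a '{marker}' directory")


# Demonstration in a throwaway tree: <tmp>/project/config and <tmp>/project/notebooks/deep
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    (project / "config").mkdir(parents=True)
    nested = project / "notebooks" / "deep"
    nested.mkdir(parents=True)
    found = find_project_root(nested)
    root_matches = found == project.resolve()
    print(root_matches)
```

In the notebook this could replace the hard-coded `.parent`, for example `PROJECT_ROOT = find_project_root(Path.cwd())`.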


## 2. Load Configuration

In [38]:
with open(CONFIG_PATH, 'r', encoding='utf-8') as f:
    config = json.load(f)

# Extract key parameters
NUM_CASES = config['simulation']['num_cases']
START_DATE = datetime.fromisoformat(config['simulation']['start_date'])
RANDOM_SEED = config['simulation']['random_seed']

# Debug mode: Set to True for faster testing with fewer cases
DEBUG_MODE = False
DEBUG_CASES = 50  # Number of cases in debug mode

if DEBUG_MODE:
    NUM_CASES = DEBUG_CASES
    print(f"⚠️  DEBUG MODE: Running with {NUM_CASES} cases")

# Set random seeds
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

print(f"Simulation: {NUM_CASES} cases starting at {START_DATE}")
print(f"Random seed: {RANDOM_SEED}")

Simulation: 2000 cases starting at 2024-09-15 00:00:00
Random seed: 42
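
The config file itself is not shown here. The skeleton below is a guess at its shape, based only on the keys this notebook reads (`simulation`, `activities`, `document_types`, `gateways`, `deviations`, `interarrival`). Apart from the case count, start date and seed printed above, every value is a placeholder, not the real configuration.

```python
import json
from datetime import datetime

# Hypothetical config skeleton; key names mirror the lookups in this notebook,
# probabilities and durations are illustrative placeholders.
example_config = {
    "simulation": {"num_cases": 2000, "start_date": "2024-09-15T00:00:00", "random_seed": 42},
    "activities": {
        "Review Document": {"distribution": "normal", "mean_minutes": 10, "std_minutes": 2,
                            "min_minutes": 1, "max_minutes": 30, "resource": "Clerk"},
    },
    "document_types": {
        "Formblatt 1": {"category": "Antrag", "always_generated": True},
        "Mietbescheinigung": {"category": "Wohnen", "condition": "not_living_with_parents"},
    },
    "gateways": {"eligibility_decision": {"approved": 0.8, "rejected": 0.2}},
    "deviations": {"review_skip": {"probability": 0.05}},
    "interarrival": {"weekend": {"mean_minutes": 300}},
}

# Round-trip through JSON to confirm the structure is serializable
loaded = json.loads(json.dumps(example_config))
start = datetime.fromisoformat(loaded["simulation"]["start_date"])
print(loaded["simulation"]["num_cases"], start)
```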


## 3. Data Classes

In [39]:
@dataclass
class Application:
    """Represents a BAföG application."""
    application_id: str
    student_id: str
    is_initial_application: bool = True
    is_parent_independent: bool = False
    housing_type: str = "Alleine"  # 'Eltern' or 'Alleine'
    status: str = "Pending"  # 'Pending', 'Approved', 'Rejected'
    
    def to_dict(self) -> dict:
        return {
            'application_id': self.application_id,
            'student_id': self.student_id,
            'is_initial_application': self.is_initial_application,
            'is_parent_independent': self.is_parent_independent,
            'housing_type': self.housing_type,
            'status': self.status
        }


@dataclass
class Document:
    """Represents a document attached to an application."""
    document_id: str
    application_id: str
    doc_type: str
    doc_category: str
    status: str = "Missing"  # 'Missing', 'Received'
    submission_date: Optional[datetime] = None
    
    def to_dict(self) -> dict:
        return {
            'document_id': self.document_id,
            'application_id': self.application_id,
            'doc_type': self.doc_type,
            'doc_category': self.doc_category,
            'status': self.status,
            'submission_date': self.submission_date.isoformat() if self.submission_date else None
        }


@dataclass
class Event:
    """Represents an event in the process."""
    event_id: str
    activity: str
    timestamp: datetime
    sorting_integer: int
    org_resource: str
    linked_objects: List[Tuple[str, str]] = field(default_factory=list)  # [(object_id, object_type), ...]
    
    def to_dict(self) -> dict:
        return {
            'event_id': self.event_id,
            'activity': self.activity,
            'timestamp': self.timestamp.isoformat(),
            'sorting_integer': self.sorting_integer,
            'org_resource': self.org_resource
        }
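
`Event.to_dict()` leaves out `linked_objects`, presumably because the links are exported to their own table later on. The sketch below contrasts that flat row with `dataclasses.asdict`, which would include every field. `MiniEvent` is a trimmed-down stand-in for illustration, not the class above.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import List, Tuple


@dataclass
class MiniEvent:
    """Trimmed-down stand-in for the Event class (illustration only)."""
    event_id: str
    activity: str
    timestamp: datetime
    linked_objects: List[Tuple[str, str]] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Flat row for the events table; links go to a separate table
        return {
            "event_id": self.event_id,
            "activity": self.activity,
            "timestamp": self.timestamp.isoformat(),
        }


ev = MiniEvent("E_000001", "Application started", datetime(2024, 9, 15, 2, 20, 46),
               [("APP_000000", "Application")])
flat_row = ev.to_dict()   # no linked_objects key
full_row = asdict(ev)     # every field, including linked_objects
link_rows = [{"event_id": ev.event_id, "object_id": oid, "object_type": otype}
             for oid, otype in ev.linked_objects]
print(sorted(flat_row), sorted(full_row), link_rows)
```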

## 4. Duration Sampling Functions

In [40]:
def sample_duration(activity_config: dict) -> float:
    """Sample a duration in minutes based on the activity configuration."""
    dist = activity_config.get('distribution', 'uniform')
    
    if dist == 'uniform':
        duration = random.uniform(
            activity_config.get('min_minutes', 1),
            activity_config.get('max_minutes', 5)
        )
    elif dist == 'normal':
        mean = activity_config.get('mean_minutes', 10)
        std = activity_config.get('std_minutes', 2)
        min_val = activity_config.get('min_minutes', 1)
        max_val = activity_config.get('max_minutes', mean * 3)
        
        duration = np.random.normal(mean, std)
        duration = max(min_val, min(max_val, duration))  # Truncate
    elif dist == 'exponential':
        mean = activity_config.get('mean_minutes', 60)
        min_val = activity_config.get('min_minutes', 1)
        max_val = activity_config.get('max_minutes', mean * 3)
        
        duration = np.random.exponential(mean)
        duration = max(min_val, min(max_val, duration))  # Truncate
    else:
        duration = 5  # Default
    
    return duration


def get_activity_duration(activity_name: str) -> float:
    """Get duration for an activity in minutes."""
    activities = config.get('activities', {})
    
    # Try exact match first
    if activity_name in activities:
        return sample_duration(activities[activity_name])
    
    # Try partial match
    for key, cfg in activities.items():
        if key.lower() in activity_name.lower() or activity_name.lower() in key.lower():
            return sample_duration(cfg)
    
    # Default: 5 minutes
    return 5.0


def get_resource(activity_name: str) -> str:
    """Get resource for an activity."""
    activities = config.get('activities', {})
    if activity_name in activities:
        return activities[activity_name].get('resource', 'System')
    return 'System'
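
The `normal` and `exponential` branches clamp out-of-range samples to the bounds, which piles probability mass exactly on `min_minutes` and `max_minutes`. If that matters, one alternative is to redraw until the sample falls in range. The sketch compares both approaches; `sample_clamped` and `sample_redrawn` are hypothetical helpers, and they use the standard library's `random` instead of NumPy to stay self-contained.

```python
import random


def sample_clamped(rng: random.Random, mean: float, std: float, lo: float, hi: float) -> float:
    """Draw from a normal distribution and clamp to [lo, hi] (as in the notebook)."""
    return max(lo, min(hi, rng.gauss(mean, std)))


def sample_redrawn(rng: random.Random, mean: float, std: float, lo: float, hi: float,
                   max_tries: int = 1000) -> float:
    """Redraw until the sample lies in [lo, hi]; fall back to clamping after max_tries."""
    for _ in range(max_tries):
        value = rng.gauss(mean, std)
        if lo <= value <= hi:
            return value
    return max(lo, min(hi, value))


rng = random.Random(42)
# Deliberately narrow bounds so that many raw samples fall outside them
clamped = [sample_clamped(rng, 10, 5, 8, 12) for _ in range(2000)]
redrawn = [sample_redrawn(rng, 10, 5, 8, 12) for _ in range(2000)]

# Share of samples sitting exactly on a bound
at_bound_clamped = sum(v in (8, 12) for v in clamped) / len(clamped)
at_bound_redrawn = sum(v in (8, 12) for v in redrawn) / len(redrawn)
print(f"clamped at bound: {at_bound_clamped:.2f}, redrawn at bound: {at_bound_redrawn:.2f}")
```

With realistic, wider bounds the difference is much smaller, so clamping may be perfectly adequate for this simulation.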

## 5. Document Generation

In [41]:
def generate_documents(application: Application, doc_counter: int) -> Tuple[List[Document], int]:
    """Generate documents for an application based on its attributes."""
    documents = []
    doc_types = config.get('document_types', {})
    
    for doc_type, doc_config in doc_types.items():
        should_generate = False
        
        if doc_config.get('always_generated', False):
            should_generate = True
        elif doc_config.get('condition') == 'is_parent_dependent':
            should_generate = not application.is_parent_independent
        elif doc_config.get('condition') == 'has_formblatt_3':
            # Formblatt 3 is generated exactly when the applicant is parent-dependent
            should_generate = not application.is_parent_independent
        elif doc_config.get('condition') == 'not_living_with_parents':
            should_generate = application.housing_type != 'Eltern'
        
        if should_generate:
            doc = Document(
                document_id=f"DOC_{doc_counter:06d}",
                application_id=application.application_id,
                doc_type=doc_type,
                doc_category=doc_config.get('category', 'Sonstiges'),
                status='Missing'
            )
            documents.append(doc)
            doc_counter += 1
    
    return documents, doc_counter


# Test document generation
test_app = Application(
    application_id="APP_TEST",
    student_id="STU_TEST",
    is_parent_independent=False,
    housing_type="Alleine"
)
test_docs, _ = generate_documents(test_app, 0)
print(f"Test application generates {len(test_docs)} documents:")
for doc in test_docs:
    print(f"  - {doc.doc_type} ({doc.doc_category})")

Test application generates 5 documents:
  - Formblatt 1 (Antrag)
  - Immatrikulationsbescheinigung (Identität)
  - Formblatt 3 (Einkommen)
  - Einkommensnachweis Eltern (Einkommen)
  - Mietbescheinigung (Wohnen)
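
The `if`/`elif` chain in `generate_documents` maps condition strings to checks on the application. One way to make that mapping explicit is a lookup table of predicates. The sketch assumes the five document types from the test output and guesses which condition each one carries in the config; the names `CONDITIONS`, `DOC_TYPES` and `required_docs` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Condition name -> predicate on (is_parent_independent, housing_type)
CONDITIONS: Dict[str, Callable[[bool, str], bool]] = {
    "is_parent_dependent": lambda independent, housing: not independent,
    "has_formblatt_3": lambda independent, housing: not independent,
    "not_living_with_parents": lambda independent, housing: housing != "Eltern",
}

# Assumed mapping of document type -> condition (None = always generated)
DOC_TYPES: Dict[str, Optional[str]] = {
    "Formblatt 1": None,
    "Immatrikulationsbescheinigung": None,
    "Formblatt 3": "is_parent_dependent",
    "Einkommensnachweis Eltern": "has_formblatt_3",
    "Mietbescheinigung": "not_living_with_parents",
}


def required_docs(is_parent_independent: bool, housing_type: str) -> List[str]:
    """Return the document types required for the given application attributes."""
    return [
        doc_type
        for doc_type, condition in DOC_TYPES.items()
        if condition is None or CONDITIONS[condition](is_parent_independent, housing_type)
    ]


# Enumerate all four attribute combinations
for independent in (False, True):
    for housing in ("Alleine", "Eltern"):
        docs = required_docs(independent, housing)
        print(independent, housing, len(docs))
```

Under this assumed mapping every application ends up with between two and five documents.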


## 6. Simulation Engine

In [42]:
class BAfoegSimulation:
    """SimPy-based simulation of the BAföG application process."""
    
    def __init__(self, config: dict, start_date: datetime):
        self.config = config
        self.start_date = start_date
        self.env = simpy.Environment()
        
        # Counters
        self.event_counter = 0
        self.app_counter = 0
        self.doc_counter = 0
        
        # Storage
        self.applications: List[Application] = []
        self.documents: List[Document] = []
        self.events: List[Event] = []
        
        # Gateway probabilities
        self.gateways = config.get('gateways', {})
        
        # Deviation probabilities
        self.deviations = config.get('deviations', {})
    
    def sim_time_to_datetime(self, sim_time: float) -> datetime:
        """Convert simulation time (in minutes) to datetime."""
        return self.start_date + timedelta(minutes=sim_time)
    
    def record_event(self, activity: str, linked_objects: List[Tuple[str, str]], resource: str = None):
        """Record an event."""
        self.event_counter += 1
        
        if resource is None:
            resource = get_resource(activity)
        
        event = Event(
            event_id=f"E_{self.event_counter:06d}",
            activity=activity,
            timestamp=self.sim_time_to_datetime(self.env.now),
            sorting_integer=self.event_counter,
            org_resource=resource,
            linked_objects=linked_objects
        )
        self.events.append(event)
        return event
    
    def decide_gateway(self, gateway_name: str, option_a: str, option_b: str) -> str:
        """Make a gateway decision based on probabilities."""
        gateway_config = self.gateways.get(gateway_name, {})
        prob_a = gateway_config.get(option_a, 0.5)
        return option_a if random.random() < prob_a else option_b
    
    def check_deviation(self, deviation_name: str) -> bool:
        """Check if a deviation should occur."""
        deviation_config = self.deviations.get(deviation_name, {})
        prob = deviation_config.get('probability', 0.0)
        return random.random() < prob
    
    def process_application(self, app_id: int):
        """Simulate the processing of a single application."""
        
        # === Create Application Object ===
        is_parent_independent = self.decide_gateway(
            'parent_data_required', 'not_required', 'required'
        ) == 'not_required'
        
        housing_type = random.choice(['Eltern', 'Alleine'])
        
        application = Application(
            application_id=f"APP_{app_id:06d}",
            student_id=f"STU_{app_id:06d}",
            is_initial_application=True,
            is_parent_independent=is_parent_independent,
            housing_type=housing_type,
            status="Pending"
        )
        self.applications.append(application)
        
        # === Generate Documents ===
        documents, self.doc_counter = generate_documents(application, self.doc_counter)
        self.documents.extend(documents)
        
        # Helper to get object links
        def app_link():
            return [(application.application_id, 'Application')]
        
        def all_doc_links():
            return [(doc.document_id, 'Document') for doc in documents]
        
        # === START: Application started ===
        self.record_event("Application started", app_link())
        
        # === Gateway: Parent Data Required? ===
        if not is_parent_independent:
            # Need parent documents
            yield self.env.timeout(get_activity_duration("Request Parent Data"))
            self.record_event("Request Parent Data", app_link(), "System")
            
            yield self.env.timeout(get_activity_duration("Receive Parent Data"))
            # Mark Formblatt 3 as received
            for doc in documents:
                if doc.doc_type == "Formblatt 3":
                    doc.status = "Received"
                    doc.submission_date = self.sim_time_to_datetime(self.env.now)
            self.record_event("Receive Parent Data", app_link() + [(d.document_id, 'Document') for d in documents if d.doc_type == "Formblatt 3"], "System")
        
        # === Send Application Mail ===
        yield self.env.timeout(get_activity_duration("Send Application Mail"))
        self.record_event("Send Application Mail", app_link(), "System")
        
        # === Receive Application ===
        yield self.env.timeout(get_activity_duration("Receive Application"))
        # Mark base documents as received
        for doc in documents:
            if doc.doc_type in ["Formblatt 1", "Immatrikulationsbescheinigung"]:
                doc.status = "Received"
                doc.submission_date = self.sim_time_to_datetime(self.env.now)
        received_docs = [d for d in documents if d.doc_type in ["Formblatt 1", "Immatrikulationsbescheinigung"]]
        self.record_event("Receive Application", app_link() + [(d.document_id, 'Document') for d in received_docs], "System")
        
        # === Document Review Loop ===
        documents_complete = False
        review_iterations = 0
        max_iterations = 3  # Prevent infinite loops
        
        # Check for deviation: Skip Review Document
        skip_review = self.check_deviation('review_skip')
        
        while not documents_complete and review_iterations < max_iterations:
            review_iterations += 1
            
            if not skip_review:
                # === Review Document ===
                yield self.env.timeout(get_activity_duration("Review Document"))
                self.record_event("Review Document", app_link() + all_doc_links(), "Clerk")
            else:
                # Skip review - this is a deviation!
                skip_review = False  # Only skip once
            
            # === Gateway: Documents Missing? ===
            decision = self.decide_gateway('documents_missing', 'complete', 'missing')
            
            if decision == 'complete' or review_iterations >= max_iterations:
                documents_complete = True
            else:
                # Request missing documents
                yield self.env.timeout(get_activity_duration("Request Missing Documents"))
                missing_docs = [d for d in documents if d.status == 'Missing']
                self.record_event("Request Missing Documents", app_link() + [(d.document_id, 'Document') for d in missing_docs[:1]], "Clerk")
                
                # Receive documents
                yield self.env.timeout(get_activity_duration("Receive Missing Documents"))
                for doc in missing_docs:
                    doc.status = "Received"
                    doc.submission_date = self.sim_time_to_datetime(self.env.now)
                self.record_event("Receive Missing Documents", app_link() + [(d.document_id, 'Document') for d in missing_docs], "System")
        
        # === Check for deviation: Direct Rejection ===
        if self.check_deviation('direct_rejection'):
            # Direct rejection - shortened path
            yield self.env.timeout(get_activity_duration("Send Rejection"))
            application.status = "Rejected"
            self.record_event("Send Rejection", app_link(), "Clerk")
            self.record_event("Application handled", app_link())
            return
        
        # === Assess Application ===
        yield self.env.timeout(get_activity_duration("Assess Application"))
        self.record_event("Assess Application", app_link(), "Clerk")
        
        # === Gateway: Eligibility Decision? ===
        eligibility = self.decide_gateway('eligibility_decision', 'approved', 'rejected')
        
        if eligibility == 'approved':
            # === Calculate Claim ===
            yield self.env.timeout(get_activity_duration("Calculate Claim"))
            self.record_event("Calculate Claim", app_link(), "Clerk")
            
            # === Send Notification ===
            yield self.env.timeout(get_activity_duration("Send Notification"))
            application.status = "Approved"
            self.record_event("Send Notification", app_link(), "Clerk")
        else:
            # === Send Rejection ===
            yield self.env.timeout(get_activity_duration("Send Rejection"))
            application.status = "Rejected"
            self.record_event("Send Rejection", app_link(), "Clerk")
        
        # === END: Application handled ===
        self.record_event("Application handled", app_link())
    
    def get_interarrival_time(self) -> float:
        """Get interarrival time based on current simulated datetime."""
        current_dt = self.sim_time_to_datetime(self.env.now)
        current_hour = current_dt.hour
        is_weekend = current_dt.weekday() >= 5  # 5 = Saturday, 6 = Sunday (locale-independent)
        
        interarrival_config = self.config.get('interarrival', {})
        
        # Check weekend first
        if is_weekend:
            cfg = interarrival_config.get('weekend', {})
            mean = cfg.get('mean_minutes', 300)
        # Weekday time slots
        elif 8 <= current_hour < 16:
            cfg = interarrival_config.get('weekday_08_16', {})
            mean = cfg.get('mean_minutes', 120)
        elif 16 <= current_hour < 21:
            cfg = interarrival_config.get('weekday_16_21', {})
            mean = cfg.get('mean_minutes', 30)  # Peak!
        elif 21 <= current_hour < 24:
            cfg = interarrival_config.get('weekday_21_24', {})
            mean = cfg.get('mean_minutes', 180)
        else:
            # Night hours (0-8): Very low activity
            mean = 240
        
        return np.random.exponential(mean)
    
    def arrival_generator(self, num_cases: int):
        """Generate arrivals based on time-dependent interarrival times."""
        for i in range(num_cases):
            interarrival = self.get_interarrival_time()
            
            yield self.env.timeout(interarrival)
            self.env.process(self.process_application(i))
            
            # Progress indicator
            if (i + 1) % 500 == 0:
                print(f"  Started {i + 1}/{num_cases} applications...")
    
    def run(self, num_cases: int):
        """Run the simulation."""
        print(f"Starting simulation with {num_cases} cases...")
        
        self.env.process(self.arrival_generator(num_cases))
        
        # Run until no scheduled events remain, so no application is cut off mid-process
        self.env.run()
        
        print(f"Simulation complete!")
        print(f"  - Applications: {len(self.applications)}")
        print(f"  - Documents: {len(self.documents)}")
        print(f"  - Events: {len(self.events)}")
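
The slot selection in `get_interarrival_time` depends only on the simulated datetime, so it can be factored into a pure function and checked without running SimPy. This is a sketch: `slot_for` is a hypothetical helper, the slot names mirror the config keys used above, and the `night` label is an assumption, since the notebook hard-codes 240 minutes for those hours.

```python
from datetime import datetime


def slot_for(dt: datetime) -> str:
    """Return the interarrival slot name for a simulated datetime."""
    if dt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return "weekend"
    if 8 <= dt.hour < 16:
        return "weekday_08_16"
    if 16 <= dt.hour < 21:
        return "weekday_16_21"
    if 21 <= dt.hour < 24:
        return "weekday_21_24"
    return "night"  # hours 0-7; hypothetical label, not a config key in the notebook


# 2024-09-15 is a Sunday, 2024-09-16 a Monday
samples = {
    "sunday_noon": slot_for(datetime(2024, 9, 15, 12, 0)),
    "monday_morning": slot_for(datetime(2024, 9, 16, 9, 30)),
    "monday_evening": slot_for(datetime(2024, 9, 16, 18, 0)),
    "monday_late": slot_for(datetime(2024, 9, 16, 22, 15)),
    "monday_night": slot_for(datetime(2024, 9, 16, 3, 0)),
}
print(samples)
```

`weekday()` is used in this sketch because it does not depend on the process locale, whereas abbreviated day names from `strftime('%a')` do.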

## 7. Run Simulation

In [43]:
# Create and run simulation
sim = BAfoegSimulation(config, START_DATE)
sim.run(NUM_CASES)

Starting simulation with 2000 cases...
  Started 500/2000 applications...
  Started 1000/2000 applications...
  Started 1500/2000 applications...
  Started 2000/2000 applications...
Simulation complete!
  - Applications: 2000
  - Documents: 8279
  - Events: 24248


## 8. Export OCEL

In [44]:
def export_ocel(sim: BAfoegSimulation, output_dir: Path):
    """Export simulation results to OCEL CSV files."""
    
    # Celonis timestamp format: yyyy-MM-dd HH:mm:ss
    CELONIS_TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'
    
    # === Build event-to-application mapping for case_sorting ===
    # Create a mapping: event_id -> application_id (primary object)
    event_to_app = {}
    for event in sim.events:
        for obj_id, obj_type in event.linked_objects:
            if obj_type == 'Application':
                event_to_app[event.event_id] = obj_id
                break  # Only one application per event
    
    # Group events by application and assign case_sorting_integer
    from collections import defaultdict
    app_events = defaultdict(list)
    for event in sim.events:
        app_id = event_to_app.get(event.event_id)
        if app_id:
            app_events[app_id].append(event)
    
    # Sort events within each case by timestamp and global sorting_integer
    event_case_sorting = {}
    for app_id, events in app_events.items():
        # Sort by timestamp first, then by global sorting_integer for tie-breaking
        sorted_events = sorted(events, key=lambda e: (e.timestamp, e.sorting_integer))
        for idx, event in enumerate(sorted_events, start=1):
            event_case_sorting[event.event_id] = idx
    
    # === events.csv (with case_id for classic Process Mining) ===
    events_data = []
    for e in sim.events:
        d = e.to_dict()
        d['case_id'] = event_to_app.get(e.event_id, '')  # Add case_id (application_id)
        d['case_sorting_integer'] = event_case_sorting.get(e.event_id, 0)
        events_data.append(d)
    
    events_df = pd.DataFrame(events_data)
    events_df['timestamp'] = pd.to_datetime(events_df['timestamp']).dt.strftime(CELONIS_TIMESTAMP_FORMAT)
    # Reorder columns: case_id first for Celonis
    cols = ['case_id', 'event_id', 'activity', 'timestamp', 'sorting_integer', 'case_sorting_integer', 'org_resource']
    events_df = events_df[cols]
    events_df.to_csv(output_dir / 'events.csv', index=False, sep=';')
    print(f"✅ events.csv: {len(events_df)} events (with case_id & case_sorting_integer)")
    
    # === applications.csv ===
    apps_data = [a.to_dict() for a in sim.applications]
    apps_df = pd.DataFrame(apps_data)
    apps_df.to_csv(output_dir / 'applications.csv', index=False, sep=';')
    print(f"✅ applications.csv: {len(apps_df)} applications")
    
    # === documents.csv ===
    docs_data = [d.to_dict() for d in sim.documents]
    docs_df = pd.DataFrame(docs_data)
    # Format submission_date for Celonis
    docs_df['submission_date'] = pd.to_datetime(docs_df['submission_date']).dt.strftime(CELONIS_TIMESTAMP_FORMAT)
    docs_df.to_csv(output_dir / 'documents.csv', index=False, sep=';')
    print(f"✅ documents.csv: {len(docs_df)} documents")
    
    # === event_object_link.csv ===
    links_data = []
    for event in sim.events:
        for obj_id, obj_type in event.linked_objects:
            links_data.append({
                'event_id': event.event_id,
                'object_id': obj_id,
                'object_type': obj_type
            })
    links_df = pd.DataFrame(links_data)
    links_df.to_csv(output_dir / 'event_object_link.csv', index=False, sep=';')
    print(f"✅ event_object_link.csv: {len(links_df)} links")
    
    # === log_not_sliced.csv (intermediate format) ===
    # Similar to example_log_not_sliced.csv
    not_sliced_data = []
    for event in sim.events:
        row = {
            'ocel:eid': event.event_id,
            'ocel:timestamp': event.timestamp.strftime(CELONIS_TIMESTAMP_FORMAT),
            'ocel:activity': event.activity,
            'ocel:type:Application': str([obj_id for obj_id, obj_type in event.linked_objects if obj_type == 'Application']),
            'ocel:type:Document': str([obj_id for obj_id, obj_type in event.linked_objects if obj_type == 'Document'])
        }
        not_sliced_data.append(row)
    
    not_sliced_df = pd.DataFrame(not_sliced_data)
    not_sliced_df.to_csv(output_dir / 'log_not_sliced.csv', index=False)
    print(f"✅ log_not_sliced.csv: {len(not_sliced_df)} rows")
    
    return {
        'events': events_df,
        'applications': apps_df,
        'documents': docs_df,
        'links': links_df,
        'not_sliced': not_sliced_df
    }


# Export
dfs = export_ocel(sim, OUTPUT_DIR)
print(f"\n📁 Output written to: {OUTPUT_DIR}")

✅ events.csv: 24248 events (with case_id & case_sorting_integer)
✅ applications.csv: 2000 applications
✅ documents.csv: 8279 documents
✅ event_object_link.csv: 48027 links
✅ log_not_sliced.csv: 24248 rows

📁 Output written to: /Users/davidderr/Desktop/pm4py-bpmn-simulation/data/outputs/ocel
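
The per-case sort index (`case_sorting_integer`) is the rank of an event within its application, ordered by timestamp with the global `sorting_integer` as tie-breaker. A minimal standard-library sketch of that ranking on made-up rows:

```python
from collections import defaultdict
from datetime import datetime

# Made-up events: (event_id, case_id, timestamp, global sorting_integer)
events = [
    ("E_000001", "APP_000000", datetime(2024, 9, 15, 2, 20), 1),
    ("E_000002", "APP_000001", datetime(2024, 9, 15, 2, 25), 2),
    ("E_000003", "APP_000000", datetime(2024, 9, 15, 3, 0), 3),
    ("E_000004", "APP_000000", datetime(2024, 9, 15, 3, 0), 4),  # same timestamp as E_000003
    ("E_000005", "APP_000001", datetime(2024, 9, 15, 4, 0), 5),
]

# Group events by case
by_case = defaultdict(list)
for event in events:
    by_case[event[1]].append(event)

# Rank within each case by (timestamp, global sorting_integer); ranks start at 1
case_sorting = {}
for case_id, case_events in by_case.items():
    ordered = sorted(case_events, key=lambda e: (e[2], e[3]))
    for rank, event in enumerate(ordered, start=1):
        case_sorting[event[0]] = rank

print(case_sorting)
```

The tie-breaker matters for events such as `E_000003` and `E_000004`, which share a timestamp and would otherwise have no defined order.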


## 9. Statistics & Validation

In [45]:
# Application status distribution
status_counts = dfs['applications']['status'].value_counts()
print("Application Status Distribution:")
print(status_counts)
print(f"\nApproval Rate: {status_counts.get('Approved', 0) / len(dfs['applications']) * 100:.1f}%")

print("\n" + "="*50)

# Parent independence
parent_counts = dfs['applications']['is_parent_independent'].value_counts()
print("\nParent Independence:")
print(parent_counts)
print(f"Parent-Independent Rate: {parent_counts.get(True, 0) / len(dfs['applications']) * 100:.1f}%")

print("\n" + "="*50)

# Document types
print("\nDocument Types:")
print(dfs['documents']['doc_type'].value_counts())

print("\n" + "="*50)

# Activity distribution
print("\nActivity Distribution:")
print(dfs['events'].groupby('activity').size().sort_values(ascending=False))

Application Status Distribution:
status
Approved    1515
Rejected     485
Name: count, dtype: int64

Approval Rate: 75.8%


Parent Independence:
is_parent_independent
False    1634
True      366
Name: count, dtype: int64
Parent-Independent Rate: 18.3%


Document Types:
doc_type
Formblatt 1                      2000
Immatrikulationsbescheinigung    2000
Formblatt 3                      1634
Einkommensnachweis Eltern        1634
Mietbescheinigung                1011
Name: count, dtype: int64


Activity Distribution:
activity
Review Document              3723
Application handled          2000
Application started          2000
Receive Application          2000
Send Application Mail        2000
Receive Missing Documents    1927
Request Missing Documents    1927
Assess Application           1888
Receive Parent Data          1634
Request Parent Data          1634
Calculate Claim              1515
Send Notification            1515
Send Rejection                485
dtype: int64
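
The counts above can be cross-checked with a little arithmetic. The numbers are copied from the printed output. Reading the gap between cases and `Assess Application` events as direct rejections is an interpretation that follows from the control flow in `process_application`.

```python
# Counts copied from the printed statistics above
cases = 2000
approved, rejected = 1515, 485
doc_counts = {
    "Formblatt 1": 2000,
    "Immatrikulationsbescheinigung": 2000,
    "Formblatt 3": 1634,
    "Einkommensnachweis Eltern": 1634,
    "Mietbescheinigung": 1011,
}
assess_events = 1888
send_rejection_events = 485

total_docs = sum(doc_counts.values())           # should match the 8279 exported documents
direct_rejections = cases - assess_events       # cases that never reached "Assess Application"
assessed_rejections = assess_events - approved  # rejected at the eligibility gateway
approval_rate = approved / cases * 100

print(total_docs, direct_rejections, assessed_rejections, f"{approval_rate:.1f}%")
```

Both kinds of rejection together should equal the number of `Send Rejection` events, and approved plus rejected cases should equal the total number of cases.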


## 10. Preview Output Files

Note: the preview below comes from an earlier execution of this cell (`In [34]`). It therefore lacks the `case_id` and `case_sorting_integer` columns and shows older activity names such as `Request Document`.

In [34]:
print("=== events.csv (first 10 rows) ===")
display(dfs['events'].head(10))

print("\n=== applications.csv (first 5 rows) ===")
display(dfs['applications'].head(5))

print("\n=== documents.csv (first 10 rows) ===")
display(dfs['documents'].head(10))

print("\n=== event_object_link.csv (first 15 rows) ===")
display(dfs['links'].head(15))

print("\n=== log_not_sliced.csv (first 10 rows) ===")
display(dfs['not_sliced'].head(10))

=== events.csv (first 10 rows) ===


Unnamed: 0,event_id,activity,timestamp,sorting_integer,org_resource
0,E_000001,Application started,2024-09-15 02:20:46,1,System
1,E_000002,Request Document,2024-09-15 02:38:01,2,Clerk
2,E_000003,Application started,2024-09-15 17:23:49,3,System
3,E_000004,Request Document,2024-09-15 17:45:25,4,Clerk
4,E_000005,Application started,2024-09-15 18:14:41,5,System
5,E_000006,Request Document,2024-09-15 18:36:35,6,Clerk
6,E_000007,Application started,2024-09-16 04:18:03,7,System
7,E_000008,Request Document,2024-09-16 04:35:54,8,Clerk
8,E_000009,Receive Document,2024-09-16 07:07:50,9,System
9,E_000010,Send Application Mail,2024-09-16 07:13:47,10,System



=== applications.csv (first 5 rows) ===


Unnamed: 0,application_id,student_id,is_initial_application,is_parent_independent,housing_type,status
0,APP_000000,STU_000000,True,False,Eltern,Approved
1,APP_000001,STU_000001,True,False,Eltern,Rejected
2,APP_000002,STU_000002,True,False,Eltern,Approved
3,APP_000003,STU_000003,True,False,Eltern,Approved
4,APP_000004,STU_000004,True,False,Eltern,Approved



=== documents.csv (first 10 rows) ===


Unnamed: 0,document_id,application_id,doc_type,doc_category,status,submission_date
0,DOC_000000,APP_000000,Formblatt 1,Antrag,Received,2024-09-16 07:14:55
1,DOC_000001,APP_000000,Immatrikulationsbescheinigung,Identität,Received,2024-09-16 07:14:55
2,DOC_000002,APP_000000,Formblatt 3,Einkommen,Received,2024-09-16 07:07:50
3,DOC_000003,APP_000000,Einkommensnachweis Eltern,Einkommen,Missing,
4,DOC_000004,APP_000001,Formblatt 1,Antrag,Received,2024-09-16 17:55:49
5,DOC_000005,APP_000001,Immatrikulationsbescheinigung,Identität,Received,2024-09-16 17:55:49
6,DOC_000006,APP_000001,Formblatt 3,Einkommen,Received,2024-09-16 17:45:25
7,DOC_000007,APP_000001,Einkommensnachweis Eltern,Einkommen,Received,2024-09-23 09:18:46
8,DOC_000008,APP_000002,Formblatt 1,Antrag,Received,2024-09-16 18:42:50
9,DOC_000009,APP_000002,Immatrikulationsbescheinigung,Identität,Received,2024-09-16 18:42:50



=== event_object_link.csv (first 15 rows) ===


Unnamed: 0,event_id,object_id,object_type
0,E_000001,APP_000000,Application
1,E_000002,APP_000000,Application
2,E_000003,APP_000001,Application
3,E_000004,APP_000001,Application
4,E_000005,APP_000002,Application
5,E_000006,APP_000002,Application
6,E_000007,APP_000003,Application
7,E_000008,APP_000003,Application
8,E_000009,APP_000000,Application
9,E_000009,DOC_000002,Document



=== log_not_sliced.csv (first 10 rows) ===


Unnamed: 0,ocel:eid,ocel:timestamp,ocel:activity,ocel:type:Application,ocel:type:Document
0,E_000001,2024-09-15 02:20:46,Application started,['APP_000000'],[]
1,E_000002,2024-09-15 02:38:01,Request Document,['APP_000000'],[]
2,E_000003,2024-09-15 17:23:49,Application started,['APP_000001'],[]
3,E_000004,2024-09-15 17:45:25,Request Document,['APP_000001'],[]
4,E_000005,2024-09-15 18:14:41,Application started,['APP_000002'],[]
5,E_000006,2024-09-15 18:36:35,Request Document,['APP_000002'],[]
6,E_000007,2024-09-16 04:18:03,Application started,['APP_000003'],[]
7,E_000008,2024-09-16 04:35:54,Request Document,['APP_000003'],[]
8,E_000009,2024-09-16 07:07:50,Receive Document,['APP_000000'],['DOC_000002']
9,E_000010,2024-09-16 07:13:47,Send Application Mail,['APP_000000'],[]
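
In `log_not_sliced.csv` the object columns are written with `str(...)`, so reading the file back yields strings such as `"['APP_000000']"` instead of lists. A sketch of converting them back with `ast.literal_eval`, using an in-memory sample shaped like the rows above:

```python
import ast
import csv
import io

# In-memory sample in the same shape as log_not_sliced.csv (comma-separated)
sample = io.StringIO(
    "ocel:eid,ocel:timestamp,ocel:activity,ocel:type:Application,ocel:type:Document\n"
    "E_000001,2024-09-15 02:20:46,Application started,['APP_000000'],[]\n"
    "E_000009,2024-09-16 07:07:50,Receive Document,['APP_000000'],['DOC_000002']\n"
)

rows = []
for row in csv.DictReader(sample):
    # The list columns are stored as text; literal_eval parses them back into lists
    row["ocel:type:Application"] = ast.literal_eval(row["ocel:type:Application"])
    row["ocel:type:Document"] = ast.literal_eval(row["ocel:type:Document"])
    rows.append(row)

print(rows[1]["ocel:type:Document"])
```

`literal_eval` only accepts Python literals, which makes it a safer choice than `eval` for data read from a file.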


## 11. Celonis Import Instructions

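To import into Celonis:

1. **Upload Files**: Upload the four CSV files (`events.csv`, `applications.csv`, `documents.csv`, `event_object_link.csv`) to a Celonis Data Pool. `log_not_sliced.csv` is only an intermediate format and is not needed.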

2. **Create Data Model**:
   - Activity Table: `events.csv`
     - Case Key: Link via `event_object_link` → `applications`, or use the `case_id` column directly
     - Activity: `activity`
     - Timestamp: `timestamp`
     - Sorting: `sorting_integer`
   
3. **Object Tables**:
   - `applications.csv` - Primary object
   - `documents.csv` - Secondary object (linked via `application_id`)

4. **Link Table**: `event_object_link.csv`
   - Links events to both Application and Document objects

5. **Foreign Keys**:
   - `documents.application_id` → `applications.application_id`
   - `event_object_link.event_id` → `events.event_id`
   - `event_object_link.object_id` → `applications.application_id` (when object_type = 'Application')
   - `event_object_link.object_id` → `documents.document_id` (when object_type = 'Document')
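
Before uploading, the foreign keys listed above can be checked locally. The sketch below does this on tiny made-up stand-ins for the exported tables, using plain Python; with the real files the same sets could be built from the CSV columns.

```python
# Tiny in-memory stand-ins for the exported tables (made-up rows)
applications = [{"application_id": "APP_000000"}]
documents = [{"document_id": "DOC_000002", "application_id": "APP_000000"}]
events = [{"event_id": "E_000001"}, {"event_id": "E_000009"}]
links = [
    {"event_id": "E_000001", "object_id": "APP_000000", "object_type": "Application"},
    {"event_id": "E_000009", "object_id": "APP_000000", "object_type": "Application"},
    {"event_id": "E_000009", "object_id": "DOC_000002", "object_type": "Document"},
]

app_ids = {a["application_id"] for a in applications}
doc_ids = {d["document_id"] for d in documents}
event_ids = {e["event_id"] for e in events}
valid_targets = {"Application": app_ids, "Document": doc_ids}

# Collect every row that violates one of the foreign keys
violations = [d for d in documents if d["application_id"] not in app_ids]
violations += [
    link for link in links
    if link["event_id"] not in event_ids
    or link["object_id"] not in valid_targets.get(link["object_type"], set())
]
print(len(violations))
```

An empty `violations` list means every document points at an existing application and every link points at an existing event and object.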