# 🧹 SAP Report Cleaner (Colab Version)

**This tool cleans SAP reports and prepares them for data analysis.**

---

## Instructions:

1. **Run cell 1**: Click ▶️ to the left of the cell (or press Shift+Enter)
2. **Connect Google Drive**: Allow access when prompted
3. **Choose the source file**: Upload the SAP report or pick it from Drive
4. **Choose the format**: Excel or CSV
5. **Choose where to save**: Download to your computer or enter a folder in Google Drive
6. **Done!** The cleaned file is in your Downloads folder or in your Drive

---
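
To make "cleaning" more concrete, here is a minimal sketch of the row filtering on a tiny, made-up report. It assumes the layout the notebook expects: tab-separated text, a header row containing `Material`, subtotal rows marked with `*` or `**` in the second column, and data rows that start with a material number. The names `SAMPLE_REPORT` and `filter_rows` are hypothetical and only used for illustration; the notebook's own logic lives in `process_sap_report`.

```python
# Made-up sample: title line, blank line, header, then data and subtotal rows.
SAMPLE_REPORT = (
    "Report header line\n"
    "\n"
    "\t\tMaterial\tFunctional Loc.\tWithdrawn\n"
    "\t\t4711\tPLANT-A\t3\n"
    "\t*\t\t\t3\n"
    "\t\t\tPLANT-B\t1\n"
    "\t\t4712\tPLANT-C\t2\n"
)


def filter_rows(text):
    """Split a report into kept data rows and dropped rows (with a reason)."""
    rows = [line.split('\t') for line in text.splitlines()]
    # Locate the header: the first cell that reads 'Material'.
    # (This sketch raises StopIteration if there is none; the notebook
    # falls back to a fixed position instead.)
    header_row, start_col = next(
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if cell.strip().lower() == 'material'
    )
    kept, dropped = [], []
    for row in rows[header_row + 1:]:
        if all(cell.strip() == '' for cell in row):
            continue  # blank line
        if len(row) > 1 and row[1].strip() in ('*', '**'):
            dropped.append(('subtotal row', row))
        elif len(row) <= start_col or not row[start_col].strip():
            dropped.append(('no material number', row))
        else:
            kept.append([cell.strip() for cell in row[start_col:]])
    return kept, dropped


kept, dropped = filter_rows(SAMPLE_REPORT)
print(kept)
print([reason for reason, _ in dropped])
```

In this sample, the two rows with material numbers 4711 and 4712 are kept, while the `*` row and the row without a material number are dropped.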


In [None]:
#@title 🚀 **Run - click ▶️ on the left** { display-mode: "form" }
#@markdown ---
#@markdown ### This script:
#@markdown - Connects to Google Drive
#@markdown - Lets you select an SAP file
#@markdown - Cleans the data automatically
#@markdown - Saves the results to your Drive or downloads them to your computer
#@markdown ---

# ============================================================
# IMPORTS AND SETUP
# ============================================================

import pandas as pd
import numpy as np
from datetime import datetime
from pathlib import Path
import os
import io

# Google Colab-specific imports
from google.colab import drive, files
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets

print("✅ Modules loaded")

# ============================================================
# CONFIGURATION
# ============================================================

EXPECTED_HEADERS = [
    'Material', 'Functional Loc.', 'Equipment', 'Material Description',
    'Work Ctr', 'Withdrawn', 'W/o resrv.', 'Reserved', 'Reserv.ref',
    'Pstng Date', 'Order', 'ID', 'Message', 'ICt', 'Customer'
]

TEXT_COLUMNS = ['Functional Loc.', 'Equipment', 'Material Description',
                'Work Ctr', 'ID', 'ICt', 'Customer']
DATE_COLUMN = 'Pstng Date'
NUMERIC_COLUMNS = ['Material', 'Withdrawn', 'W/o resrv.', 'Reserved',
                   'Reserv.ref', 'Order', 'Message']

# ============================================================
# HELPER FUNCTIONS
# ============================================================

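def clean_number(value):
    """Parse a number in SAP format; returns a rounded int, or None if empty or unparseable."""
    if value is None or pd.isna(value):
        return None
    val_str = str(value).strip()
    if val_str == '' or val_str == '-':
        return None
    # Remove non-breaking and regular spaces used for digit grouping
    val_str = val_str.replace('\xa0', '').replace(' ', '')
    if ',' in val_str and '.' in val_str:
        # Both present: '.' groups thousands, ',' is the decimal separator
        val_str = val_str.replace('.', '').replace(',', '.')
    elif ',' in val_str:
        parts = val_str.split(',')
        if len(parts) == 2 and len(parts[1]) <= 2:
            # One comma followed by at most two digits: decimal comma
            val_str = val_str.replace(',', '.')
        else:
            # Otherwise the commas group thousands
            val_str = val_str.replace(',', '')
    elif '.' in val_str:
        parts = val_str.split('.')
        # Thousands grouping: every group after the first has exactly three digits
        if len(parts[0]) >= 1 and all(len(p) == 3 for p in parts[1:]):
            val_str = val_str.replace('.', '')
    try:
        # Values are rounded to whole numbers; fractions are not kept
        return int(round(float(val_str)))
    except (ValueError, OverflowError):
        return None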

def convert_date(value):
    """Convert an SAP date to DD.MM.YYYY; values that cannot be parsed are returned as text."""
    if value is None or pd.isna(value):
        return ''
    val_str = str(value).strip()
    if val_str == '':
        return ''
    # Try two-digit years first, then four-digit years, then ISO dates
    for fmt in ('%d.%m.%y', '%d.%m.%Y', '%Y-%m-%d'):
        try:
            return datetime.strptime(val_str, fmt).strftime('%d.%m.%Y')
        except ValueError:
            continue
    return val_str

def process_sap_report(content):
    """Process the text of an SAP report.

    Returns the cleaned DataFrame, a DataFrame of removed rows, and a statistics dict.
    """
    # Normalize Windows line endings before splitting into rows and cells
    lines = content.replace('\r\n', '\n').split('\n')
    all_rows = [line.split('\t') for line in lines]

    # Find the header: the first cell whose text is 'Material'
    header_row_idx, header_start_col = None, None
    for idx, row in enumerate(all_rows):
        for col_idx, cell in enumerate(row):
            if str(cell).strip().lower() == 'material':
                header_row_idx, header_start_col = idx, col_idx
                break
        if header_row_idx is not None:
            break

    # Fallback if no header was found: assume row 4, column C
    if header_row_idx is None:
        header_row_idx, header_start_col = 3, 2

    # Process the data rows
    cleaned_data, deleted_rows = [], []
    stats = {'total': 0, 'sum_rows': 0, 'empty': 0, 'no_material': 0, 'kept': 0}

    for row_idx in range(header_row_idx + 1, len(all_rows)):
        row = all_rows[row_idx]
        stats['total'] += 1

        # Skip blank lines
        if all(str(cell).strip() == '' for cell in row):
            stats['empty'] += 1
            continue

        # Subtotal rows are marked with '*' or '**' in column B
        col_b = str(row[1]).strip() if len(row) > 1 else ''
        if col_b in ['*', '**']:
            stats['sum_rows'] += 1
            deleted_rows.append({'Reason': 'Subtotal row', 'Row': row_idx + 1,
                                 'Data': '\t'.join(str(c) for c in row)})
            continue

        # Take one cell per expected header, padding short rows with ''
        data_row = [str(row[i]).strip() if i < len(row) else ''
                    for i in range(header_start_col,
                                   header_start_col + len(EXPECTED_HEADERS))]

        # Rows without a material number are removed but logged
        if not data_row[0]:
            stats['no_material'] += 1
            deleted_rows.append({'Reason': 'No material number', 'Row': row_idx + 1,
                                 'Data': '\t'.join(data_row)})
            continue

        cleaned_data.append(data_row)
        stats['kept'] += 1

    # Build the DataFrames
    df = pd.DataFrame(cleaned_data, columns=EXPECTED_HEADERS)
    df_deleted = pd.DataFrame(deleted_rows, columns=['Reason', 'Row', 'Data'])

    # Convert data types
    for col in NUMERIC_COLUMNS:
        if col in df.columns:
            df[col] = df[col].apply(clean_number)
    if DATE_COLUMN in df.columns:
        df[DATE_COLUMN] = df[DATE_COLUMN].apply(convert_date)

    return df, df_deleted, stats

print("✅ Functions loaded")

# ============================================================
# CONNECT GOOGLE DRIVE
# ============================================================

print("\n📁 Connecting to Google Drive...")
drive.mount('/content/drive')
print("✅ Google Drive connected!")

# ============================================================
# INTERACTIVE INTERFACE
# ============================================================

# Global state shared between the button handlers
result_df = None
result_deleted = None
result_stats = None
source_filename = None

# UI elements
output_area = widgets.Output()

source_dropdown = widgets.Dropdown(
    options=['📤 Upload from computer', '📁 Choose from Google Drive'],
    value='📤 Upload from computer',
    description='Source:',
    style={'description_width': '80px'},
    layout=widgets.Layout(width='350px')
)

drive_path_input = widgets.Text(
    value='/content/drive/MyDrive/',
    description='Drive path:',
    style={'description_width': '80px'},
    layout=widgets.Layout(width='500px'),
    placeholder='e.g. /content/drive/MyDrive/Downloads/report.txt'
)

format_dropdown = widgets.Dropdown(
    options=['📊 Excel (.xlsx) - with deleted rows', '📄 CSV (.csv) - cleaned data only'],
    value='📊 Excel (.xlsx) - with deleted rows',
    description='Format:',
    style={'description_width': '80px'},
    layout=widgets.Layout(width='400px')
)

# Save location
save_location = widgets.Dropdown(
    options=['💾 Download to my computer', '📁 Save to Google Drive'],
    value='💾 Download to my computer',
    description='Save to:',
    style={'description_width': '80px'},
    layout=widgets.Layout(width='400px')
)

save_path_input = widgets.Text(
    value='/content/drive/MyDrive/',
    description='Drive path:',
    style={'description_width': '80px'},
    layout=widgets.Layout(width='500px'),
    placeholder='e.g. /content/drive/MyDrive/Cleaned/'
)

upload_btn = widgets.Button(description='📤 Load file', button_style='primary',
                            layout=widgets.Layout(width='200px'))
process_btn = widgets.Button(description='🚀 Process & Save', button_style='success',
                             layout=widgets.Layout(width='200px'))
status_label = widgets.HTML(value='<b>Status:</b> Ready')

def on_upload_click(b):
    """Load the report from an upload or from Drive, clean it, and show a preview."""
    global result_df, result_deleted, result_stats, source_filename
    with output_area:
        clear_output()
        if '📤' in source_dropdown.value:
            print("📤 Please select a file...")
            uploaded = files.upload()
            if not uploaded:
                print("❌ No file selected")
                return
            source_filename = list(uploaded.keys())[0]
            content = uploaded[source_filename].decode('utf-8', errors='replace')
        else:
            file_path = drive_path_input.value
            # isfile() also rejects a bare folder such as the default Drive path
            if not os.path.isfile(file_path):
                print(f"❌ File not found: {file_path}")
                return
            source_filename = os.path.basename(file_path)
            with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
                content = f.read()

        print(f"📄 File loaded: {source_filename}")
        print("⏳ Processing...")
        result_df, result_deleted, result_stats = process_sap_report(content)
        print("\n✅ Processing complete!")
        print("\n📊 Statistics:")
        print(f"   Cleaned rows: {result_stats['kept']}")
        print(f"   Subtotal rows removed: {result_stats['sum_rows']}")
        print(f"   Rows without material no. removed: {result_stats['no_material']}")
        print("\n📋 Preview (first 5 rows):")
        display(result_df.head())
        status_label.value = f'<b>Status:</b> ✅ {result_stats["kept"]} rows ready'

def write_result(output_path, as_excel):
    """Write the cleaned data to output_path as an Excel workbook or a CSV file."""
    if as_excel:
        with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
            result_df.to_excel(writer, sheet_name='Cleaned Data', index=False)
            if not result_deleted.empty:
                result_deleted.to_excel(writer, sheet_name='Deleted Rows', index=False)
    else:
        # Semicolon separator and UTF-8 with BOM, as written by the original script
        result_df.to_csv(output_path, index=False, sep=';', encoding='utf-8-sig')

def on_process_click(b):
    """Save the cleaned data as a browser download or into Google Drive."""
    with output_area:
        if result_df is None:
            print("❌ Please load a file first!")
            return
        base_name = Path(source_filename).stem
        as_excel = '📊' in format_dropdown.value
        kind = 'Excel' if as_excel else 'CSV'
        output_filename = f"{base_name}_cleaned.{'xlsx' if as_excel else 'csv'}"

        if '💾' in save_location.value:
            # Write to the Colab VM first, then hand the file to the browser
            output_path = f"/content/{output_filename}"
            write_result(output_path, as_excel)
            print(f"\n💾 {kind} created: {output_filename}")
            print("📥 Starting download...")
            files.download(output_path)
            print("\n✅ Done! The file is being downloaded to your Downloads folder.")
            status_label.value = '<b>Status:</b> ✅ Download started!'
        else:
            # Save to Google Drive, creating the target folder if needed
            save_dir = save_path_input.value.rstrip('/')
            if not os.path.exists(save_dir):
                os.makedirs(save_dir)
                print(f"📁 Folder created: {save_dir}")
            output_path = f"{save_dir}/{output_filename}"
            write_result(output_path, as_excel)
            print(f"\n💾 {kind} saved: {output_path}")
            print("\n✅ Done! The file is in your Google Drive.")
            status_label.value = '<b>Status:</b> ✅ Saved!'

upload_btn.on_click(on_upload_click)
process_btn.on_click(on_process_click)

# Display the layout
print("\n" + "="*60)
print("  📋 SAP REPORT CLEANER")
print("="*60)
display(widgets.HTML('<h3>1️⃣ Source file</h3>'))
display(source_dropdown)
display(widgets.HTML('<p><i>For Drive: enter the path, e.g. /content/drive/MyDrive/Downloads/report.txt</i></p>'))
display(drive_path_input)
display(upload_btn)
display(widgets.HTML('<h3>2️⃣ Output format</h3>'))
display(format_dropdown)
display(widgets.HTML('<h3>3️⃣ Save location</h3>'))
display(save_location)
display(widgets.HTML('<p><i>For Google Drive, enter the folder:</i></p>'))
display(save_path_input)
display(widgets.HTML('<h3>4️⃣ Process</h3>'))
display(process_btn)
display(status_label)
display(widgets.HTML('<hr><h3>📊 Result:</h3>'))
display(output_area)


---
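
The number and date handling in the cell above is easier to follow on concrete values. The following is a minimal, standalone sketch of the main separator rules and date formats, using only the standard library. The names `parse_sap_number` and `normalize_date` are hypothetical and exist only in this example; the notebook itself uses `clean_number` and `convert_date`, which also treat pandas missing values as empty.

```python
from datetime import datetime


def parse_sap_number(text):
    """Sketch of the separator heuristic: returns a rounded int or None."""
    s = str(text).strip().replace('\xa0', '').replace(' ', '')
    if s in ('', '-'):
        return None
    if ',' in s and '.' in s:
        # Both separators: '.' groups thousands, ',' is the decimal mark.
        s = s.replace('.', '').replace(',', '.')
    elif ',' in s:
        parts = s.split(',')
        if len(parts) == 2 and len(parts[1]) <= 2:
            s = s.replace(',', '.')   # decimal comma, e.g. '7,25'
        else:
            s = s.replace(',', '')    # thousands comma, e.g. '1,234'
    elif '.' in s:
        parts = s.split('.')
        if len(parts) == 2 and len(parts[1]) == 3 and parts[0]:
            s = s.replace('.', '')    # thousands dot, e.g. '1.234'
    try:
        return int(round(float(s)))
    except (ValueError, OverflowError):
        return None


def normalize_date(text):
    """Sketch of the date handling: try known formats, else return the text."""
    value = str(text).strip()
    for fmt in ('%d.%m.%y', '%d.%m.%Y', '%Y-%m-%d'):
        try:
            return datetime.strptime(value, fmt).strftime('%d.%m.%Y')
        except ValueError:
            continue
    return value


for raw in ['1.234,56', '7,25', '1,234', '1.234', '-']:
    print(repr(raw), '->', parse_sap_number(raw))

for raw in ['05.01.26', '2026-01-05', 'n/a']:
    print(repr(raw), '->', normalize_date(raw))
```

With these inputs the numbers come out as 1235, 7, 1234, 1234 and `None`, and the dates as `05.01.2026`, `05.01.2026` and `n/a`. Note that results are rounded to whole numbers, so decimal quantities lose their fractional part.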

## 💡 Tips

### 📂 Selecting a file (upload)
- Choose **"Upload from computer"** → a **file dialog** opens
- Navigate to your **Downloads folder** and select the SAP file
- **No need to type a path!**

### 💾 Saving the file
| Option | What happens |
|--------|--------------|
| **Download to computer** | Starts a browser download → the file lands in Downloads |
| **Save to Google Drive** | Saves directly to your Drive folder |

### Excel vs CSV:
| Format | Advantages |
|--------|----------|
| **Excel** | 2 worksheets (data + deleted rows), opens directly in Excel |
| **CSV** | Smaller, universally compatible (cleaned data only) |
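
If you want to read the CSV output back in code, keep in mind that the script writes it with `;` as the separator and in UTF-8 with a byte order mark (`utf-8-sig`). The sketch below uses only the standard library and a small demo file it creates itself; `read_cleaned_csv` and the file name are hypothetical. In pandas, `pd.read_csv(path, sep=';', encoding='utf-8-sig')` should give the equivalent result.

```python
import csv
import os
import tempfile


def read_cleaned_csv(path):
    """Read a semicolon-separated CSV with a UTF-8 BOM into a list of dicts."""
    with open(path, newline='', encoding='utf-8-sig') as handle:
        return list(csv.DictReader(handle, delimiter=';'))


# Create a small demo file in the same dialect (made-up values).
demo_path = os.path.join(tempfile.mkdtemp(), 'report_cleaned.csv')
with open(demo_path, 'w', newline='', encoding='utf-8-sig') as handle:
    writer = csv.writer(handle, delimiter=';')
    writer.writerow(['Material', 'Withdrawn', 'Pstng Date'])
    writer.writerow(['4711', '3', '05.01.2026'])

rows = read_cleaned_csv(demo_path)
print(rows[0]['Material'], rows[0]['Pstng Date'])
```

Reading with `utf-8-sig` strips the byte order mark, so the first column name is exactly `Material` rather than a name with an invisible prefix.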

---

## ❓ Troubleshooting

| Problem | Solution |
|---------|--------|
| Download does not start | Disable the pop-up blocker for colab.research.google.com |
| "File not found" | For Drive: check the path; it must point to the file itself, not just a folder |
| No data | Make sure the file is tab-separated (.txt) |
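
For the "No data" case, a quick check of the raw text can show whether the file is really tab-separated and whether a `Material` header cell exists. This is a minimal sketch with a hypothetical helper, `diagnose_report`, that is not part of the notebook; it assumes you have the file content available as a string.

```python
def diagnose_report(text, header_name='Material'):
    """Count lines with tabs and locate the first line holding the header cell."""
    lines = text.splitlines()
    lines_with_tabs = sum(1 for line in lines if '\t' in line)
    # 1-based number of the first line with a cell equal to header_name.
    header_line = next(
        (number for number, line in enumerate(lines, start=1)
         if any(cell.strip().lower() == header_name.lower()
                for cell in line.split('\t'))),
        None,
    )
    return {'lines': len(lines),
            'lines_with_tabs': lines_with_tabs,
            'header_line': header_line}


good = diagnose_report("Title\n\t\tMaterial\tOrder\n\t\t4711\t9000\n")
bad = diagnose_report("Material;Order\n4711;9000\n")
print(good)
print(bad)
```

A file with no tab lines or with `header_line` equal to `None`, like the semicolon-separated sample here, is unlikely to yield any cleaned rows; exporting the report again as tab-separated text is the likely fix.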

---

*SAP Report Cleaner v1.1 - Colab Version - January 2026*
