# Advanced Data Upload with LouieAI

This notebook demonstrates how to upload DataFrames, images, and documents to LouieAI for analysis.

In [None]:
import numpy as np
import pandas as pd

from louieai.notebook import lui

## DataFrame Upload

Upload pandas DataFrames for AI-powered analysis.

In [None]:
# Create a sample DataFrame (seeded so the values are reproducible)
rng = np.random.default_rng(42)
df = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=30, freq="D"),
        "sales": rng.standard_normal(30).cumsum() + 100,
        "visitors": rng.integers(50, 200, 30),
    }
)

# Pattern 1: Prompt first (recommended for clarity)
lui("What are the key trends in this data?", df)

# Pattern 2: DataFrame first (concise for simple operations)
lui(df, "show summary statistics")

# Pattern 3: Specify serialization format
lui("Analyze this", df, format="csv")  # Options: parquet (default), csv, json, arrow

## Image Upload

Images can be passed as a file path, raw bytes, a file-like object, or a PIL/Pillow image.

In [None]:
# Method 1: File path (file must exist)
# lui("What's in this image?", "path/to/image.png")

# Method 2: Raw bytes
# with open("image.jpg", "rb") as f:
#     image_bytes = f.read()
# lui("Describe this image", image_bytes)

# Method 3: BytesIO (file-like object)
# import io
# image_buffer = io.BytesIO(image_bytes)  # reuses image_bytes from Method 2
# lui("Analyze this", image_buffer)

# Method 4: PIL/Pillow Image
# from PIL import Image
# img = Image.open("photo.jpg")
# lui("What objects are in this photo?", img)
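Methods 2 and 3 need image bytes in memory. To try them without a file on disk, the sketch below builds a 1x1 red PNG using only the standard library and wraps it in a `BytesIO`. The helper names `png_chunk` and `tiny_png` are hypothetical and not part of LouieAI; the `lui` calls are left commented out because they need an active session.

```python
import io
import struct
import zlib


def png_chunk(kind: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """Return the bytes of a valid 1x1 red PNG (illustrative helper)."""
    signature = b"\x89PNG\r\n\x1a\n"
    # Width 1, height 1, 8-bit depth, color type 2 (RGB), default
    # compression, filter, and interlace settings.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
    # One scanline: filter byte 0 followed by a single red pixel.
    scanline = b"\x00" + b"\xff\x00\x00"
    idat = zlib.compress(scanline)
    return (
        signature
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", idat)
        + png_chunk(b"IEND", b"")
    )


image_bytes = tiny_png()              # usable for Method 2
image_buffer = io.BytesIO(image_bytes)  # usable for Method 3

# lui("Describe this image", image_bytes)
# lui("Analyze this", image_buffer)
```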

## Document Upload

Upload PDFs, Word docs, Excel files, and more.

In [None]:
# PDF files
# lui("Summarize this document", "report.pdf")

# Excel files (as binary, not DataFrame)
# lui("Extract tables from this Excel file", "data.xlsx")

# Word documents
# lui("What are the key points?", "document.docx")

# PowerPoint presentations
# lui("Summarize this presentation", "slides.pptx")

# From bytes
# with open("document.pdf", "rb") as f:
#     pdf_bytes = f.read()
# lui("Extract key information", pdf_bytes)

## Loading Data from Files

Common patterns for loading and analyzing data files.

In [None]:
# CSV files
# df = pd.read_csv('sales_data.csv')
# lui("Find insights in this sales data", df)

# Excel files as DataFrames
# df = pd.read_excel('report.xlsx', sheet_name='Sales')
# lui("What patterns do you see?", df)

# JSON files
# df = pd.read_json('data.json')
# lui("Analyze this JSON data", df)

# Parquet files
# df = pd.read_parquet('large_dataset.parquet')
# lui("Summarize this dataset", df)

## Advanced Options

Additional parameters for upload operations.

In [None]:
# Specify parsing options for DataFrames
# lui("Analyze", df,
#     format="csv",
#     parsing_options={"delimiter": ";", "header": True})

# Control the agent used for processing
# lui("Analyze", df, agent="UploadAgent")  # Uses LLM for parsing
# lui("Analyze", df, agent="UploadPassthroughAgent")  # Direct parsing (default)

# Thread management
# response = lui("First analysis", df)
# thread_id = response.thread_id
# lui("Follow-up question", df, thread_id=thread_id)  # Continue same thread

## File Detection Details

How LouieAI detects different file types:

### Image Detection
- **File extensions**: .png, .jpg, .jpeg, .gif, .bmp, .webp, .svg
- **Byte signatures**: PNG (89 50 4E 47), JPEG (FF D8 FF), GIF (47 49 46 38)
- **PIL Images**: Automatically detected if Pillow is installed

### Document Detection
- **File extensions**: .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx
- **Byte signatures**: PDF (`%PDF`, i.e. 25 50 44 46), ZIP-based Office files such as .docx, .xlsx, and .pptx (50 4B 03 04, i.e. ASCII `PK` followed by 03 04)
- **File-like objects**: Anything with a .read() method
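
The byte-signature checks listed above can be sketched as a prefix match on the first few bytes. This is only an illustration of the idea, not LouieAI's actual detection code: the `SIGNATURES` table, the labels, and the `sniff_kind` function are hypothetical names introduced here.

```python
import io

# Magic-number prefixes taken from the lists above. The table and the
# labels are illustrative assumptions, not LouieAI internals.
SIGNATURES = [
    (bytes.fromhex("89504E47"), "image/png"),
    (bytes.fromhex("FFD8FF"), "image/jpeg"),
    (bytes.fromhex("47494638"), "image/gif"),
    (b"%PDF", "application/pdf"),
    (bytes.fromhex("504B0304"), "zip-based-office"),
]


def sniff_kind(data):
    """Return a label for bytes or a seekable file-like object, else None."""
    if hasattr(data, "read"):
        # Peek at the header, then restore the read position so the
        # caller can still consume the whole stream afterwards.
        position = data.tell()
        head = data.read(8)
        data.seek(position)
    else:
        head = bytes(data[:8])
    for prefix, label in SIGNATURES:
        if head.startswith(prefix):
            return label
    return None


print(sniff_kind(b"%PDF-1.7\n"))
print(sniff_kind(io.BytesIO(b"\xff\xd8\xff\xe0 rest of file")))
```

A ZIP signature alone cannot distinguish a .docx from an .xlsx, a .pptx, or a plain archive, so a signature match like this is usually combined with the file extension when one is available.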

### DataFrame Detection
- **Type check**: isinstance(obj, pd.DataFrame)
- **Serialization formats**: parquet (default), csv, json, arrow
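
To illustrate why the serialization format matters, the sketch below round-trips a small DataFrame through CSV using plain pandas and checks whether the datetime column survives. It says nothing about how LouieAI parses uploads on the server side; it only shows that CSV text carries no dtype information, whereas parquet stores a schema alongside the data.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=3, freq="D"),
        "visitors": [120, 95, 143],
    }
)

# Serialize to CSV text and read it back with default settings.
csv_text = df.to_csv(index=False)
roundtrip = pd.read_csv(io.StringIO(csv_text))

# read_csv does not parse dates unless asked to, so the column comes
# back as strings rather than datetimes.
date_kept = pd.api.types.is_datetime64_any_dtype(roundtrip["date"])
print(df["date"].dtype, "->", roundtrip["date"].dtype, "| datetime kept:", date_kept)
```

If CSV is required, passing `parse_dates=["date"]` to `pd.read_csv` restores the datetime column on the reading side.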

## Tips and Best Practices

1. **Use descriptive prompts** - Be specific about what analysis you want
2. **Check file paths** - Ensure files exist before passing paths
3. **Handle large files** - Use bytes or file-like objects for better memory management
4. **Format selection** - Use parquet for DataFrames (the default; a binary columnar format that is typically fast and preserves column dtypes)
5. **Error handling** - Check `lui.has_errors` after uploads
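
Tip 2 can be sketched as a small guard that fails early with a clear message before a path is handed to `lui`. The `require_file` helper is a hypothetical name introduced for this example, not part of LouieAI, and the `lui` call is left commented out because it needs an active session.

```python
from pathlib import Path


def require_file(path):
    """Return the resolved Path, or raise if it is not an existing file."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"No such file: {resolved}")
    return resolved


# Usage:
# pdf_path = require_file("report.pdf")
# lui("Summarize this document", str(pdf_path))
```

Resolving the path first makes the error message show the absolute location that was checked, which helps when the notebook's working directory is not what you expect.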