# Data Science Workflow with LouieAI

This notebook walks through a complete data science workflow with the LouieAI notebook interface: data generation, exploratory analysis, regional and product analysis, forecasting, and export.

## Setup

In [None]:
import os

import matplotlib.pyplot as plt

# Credential check: both variables are required
missing = [
    name for name in ('GRAPHISTRY_USERNAME', 'GRAPHISTRY_PASSWORD')
    if not os.environ.get(name)
]
if missing:
    raise RuntimeError(
        f"Set these environment variables: {', '.join(missing)}"
    )

from louieai.notebook import lui

# Set up plotting style
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

## 1. Data Generation and Loading

Ask LouieAI to generate a synthetic but realistic sales dataset for analysis:

In [None]:
# Generate a sales dataset
lui("""
Create a realistic sales dataset with 1000 rows containing:
- date (last 12 months)
- product_id
- product_name
- category (Electronics, Clothing, Home, Sports)
- price
- quantity_sold
- customer_region (North, South, East, West)
- discount_percentage
- revenue (calculated as price * quantity_sold * (1 - discount_percentage/100))

Make it realistic with seasonal trends and regional preferences.
""")

# Get the generated data
sales_df = lui.df
if sales_df is not None:
    print(f"Dataset shape: {sales_df.shape}")
    print(f"\nColumns: {list(sales_df.columns)}")
    print(f"\nData types:\n{sales_df.dtypes}")
    display(sales_df.head())
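
Because the dataset is generated from a prompt, it can be worth confirming that the `revenue` column actually follows the requested formula. The sketch below uses a small hypothetical sample in place of `sales_df`; the helper name `check_revenue` and the tolerance are illustrative assumptions, not part of LouieAI.

```python
import pandas as pd

# Hypothetical three-row sample standing in for the generated dataset
sample = pd.DataFrame({
    "price": [100.0, 40.0, 25.0],
    "quantity_sold": [2, 5, 4],
    "discount_percentage": [10.0, 0.0, 50.0],
    "revenue": [180.0, 200.0, 55.0],  # last row is deliberately inconsistent
})


def check_revenue(df: pd.DataFrame, tol: float = 0.01) -> pd.DataFrame:
    """Return the rows whose revenue deviates from the requested formula."""
    expected = (
        df["price"] * df["quantity_sold"] * (1 - df["discount_percentage"] / 100)
    )
    return df[(df["revenue"] - expected).abs() > tol]


bad_rows = check_revenue(sample)
print(f"{len(bad_rows)} row(s) fail the revenue check")
```

Running the same helper on `sales_df` would list any generated rows that need to be recomputed or dropped.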

## 2. Exploratory Data Analysis

In [None]:
# Get basic statistics
lui("Provide summary statistics for the sales data, "
    "including total revenue by category")

# Display any generated summary tables
if lui.df is not None:
    print("Summary by Category:")
    display(lui.df)

In [None]:
# Analyze seasonal trends
lui("""
Analyze seasonal trends in the sales data:
1. Calculate monthly revenue
2. Identify the best and worst performing months
3. Show revenue trends by category over time
Return the monthly summary as a table.
""")

monthly_data = lui.df
if monthly_data is not None:
    display(monthly_data)

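    # Visualize the trends. Column names depend on the generated table,
    # so 'revenue' is an assumption to adjust as needed.
    time_col = next(
        (c for c in ('month', 'date') if c in monthly_data.columns), None
    )
    if time_col is not None and 'revenue' in monthly_data.columns:
        plt.figure(figsize=(12, 6))
        plt.plot(monthly_data[time_col].astype(str), monthly_data['revenue'], marker='o')
        plt.title('Monthly Revenue Trends')
        plt.show()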
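
To cross-check the monthly summary locally, the same aggregation can be sketched in pandas. The sample frame below is hypothetical and assumes `date` and `revenue` columns like those requested when the dataset was generated.

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be sales_df
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-03-15"]
    ),
    "revenue": [100.0, 50.0, 300.0, 80.0],
})

# Sum revenue per calendar month
monthly = (
    sales.groupby(sales["date"].dt.to_period("M"))["revenue"]
    .sum()
    .rename_axis("month")
    .reset_index()
)

# Best and worst months by total revenue
best = monthly.loc[monthly["revenue"].idxmax(), "month"]
worst = monthly.loc[monthly["revenue"].idxmin(), "month"]
print(monthly)
print(f"Best month: {best}, worst month: {worst}")
```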

## 3. Regional Analysis

In [None]:
# Analyze regional performance
lui("""
Perform a regional analysis:
1. Calculate total revenue and average order value by region
2. Identify the most popular product category in each region
3. Calculate the effectiveness of discounts by region
Show results as a comprehensive table.
""")

regional_analysis = lui.df
if regional_analysis is not None:
    print("Regional Performance Analysis:")
    display(regional_analysis)

print(f"\nKey insights: {lui.text}")
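
The first two regional metrics can also be reproduced directly in pandas, which is useful for checking the returned table. This is a minimal sketch on hypothetical data; it assumes each row is one order, so average order value is the mean revenue per row.

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be sales_df
sales = pd.DataFrame({
    "customer_region": ["North", "North", "South", "South", "South"],
    "category": ["Electronics", "Home", "Clothing", "Clothing", "Sports"],
    "revenue": [200.0, 100.0, 50.0, 70.0, 30.0],
})

# Total revenue and average order value (one row = one order, by assumption)
by_region = sales.groupby("customer_region")["revenue"].agg(
    total_revenue="sum",
    avg_order_value="mean",
)

# Highest-revenue category in each region
category_revenue = sales.groupby(
    ["customer_region", "category"], as_index=False
)["revenue"].sum()
top_category = (
    category_revenue.sort_values("revenue", ascending=False)
    .drop_duplicates("customer_region")
    .set_index("customer_region")["category"]
)
by_region["top_category"] = top_category
print(by_region)
```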

## 4. Product Performance Analysis

In [None]:
# Find top performers
lui("""
Identify the top 10 best-selling products by:
1. Total revenue
2. Total quantity sold
3. Average discount given

Also identify products that might need attention (low sales despite high discounts).
""")

top_products = lui.df
if top_products is not None:
    print("Top 10 Products by Revenue:")
    display(top_products.head(10))
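
For comparison, the ranking can be computed by hand with a pandas aggregation. The sketch below uses hypothetical data, and the "needs attention" rule (below-median revenue with above-median discount) is one possible definition chosen for illustration, not the one LouieAI necessarily applies.

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be sales_df
sales = pd.DataFrame({
    "product_name": ["Laptop", "Laptop", "Socks", "Lamp", "Lamp"],
    "revenue": [900.0, 1100.0, 20.0, 60.0, 40.0],
    "quantity_sold": [1, 1, 4, 2, 1],
    "discount_percentage": [5.0, 5.0, 30.0, 10.0, 20.0],
})

per_product = sales.groupby("product_name").agg(
    total_revenue=("revenue", "sum"),
    total_quantity=("quantity_sold", "sum"),
    avg_discount=("discount_percentage", "mean"),
)

top_by_revenue = per_product.nlargest(10, "total_revenue")

# Assumed rule: low revenue despite a high average discount
needs_attention = per_product[
    (per_product["total_revenue"] < per_product["total_revenue"].median())
    & (per_product["avg_discount"] > per_product["avg_discount"].median())
]
print(top_by_revenue)
print(needs_attention)
```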

## 5. Advanced Analytics

In [None]:
# Enable traces to see the analysis process
lui.traces = True

# Perform correlation analysis
lui("""
Perform a correlation analysis to understand:
1. Relationship between discount percentage and quantity sold
2. Price elasticity by category
3. Regional preferences for different price points

Provide actionable insights for pricing strategy.
""")

# Turn off traces
lui.traces = False

# Display results
if lui.df is not None:
    display(lui.df)

print("\nStrategic Insights:")
print(lui.text)
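
The discount-versus-quantity relationship in item 1 can be sanity-checked with a plain Pearson correlation. This sketch runs on hypothetical data and only measures linear association; it is not a price elasticity estimate.

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be sales_df
sales = pd.DataFrame({
    "category": ["Electronics"] * 4 + ["Clothing"] * 4,
    "discount_percentage": [0, 10, 20, 30, 0, 10, 20, 30],
    "quantity_sold": [1, 2, 3, 4, 5, 5, 4, 6],
})

# Pearson correlation across the whole dataset
overall = sales["discount_percentage"].corr(sales["quantity_sold"])

# The same correlation computed separately for each category
by_category = {
    name: group["discount_percentage"].corr(group["quantity_sold"])
    for name, group in sales.groupby("category")
}
print(f"Overall: {overall:.2f}")
for name, value in by_category.items():
    print(f"{name}: {value:.2f}")
```

A large gap between categories, as in this sample, suggests that discounts should be evaluated per category rather than globally.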

## 6. Forecasting

In [None]:
# Simple forecasting
lui("""
Based on the historical data:
1. Project next month's revenue by category
2. Identify which products are trending up vs down
3. Recommend inventory adjustments

Show projections as a table with confidence levels.
""")

projections = lui.df
if projections is not None:
    print("Revenue Projections:")
    display(projections)

# Get recommendations
print("\nRecommendations:")
print(lui.text)
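
As a baseline to compare the returned projections against, a naive linear trend can be fitted with NumPy. This is a deliberately simple sketch on hypothetical numbers: it ignores seasonality and produces no confidence levels, so treat it as a sanity check rather than a forecast.

```python
import numpy as np

# Hypothetical monthly revenue for one category, oldest first
monthly_revenue = [100.0, 110.0, 120.0, 130.0]


def project_next(values):
    """Fit a straight line to the series and extrapolate one step ahead."""
    x = np.arange(len(values))
    slope, intercept = np.polyfit(x, values, deg=1)
    return slope * len(values) + intercept


projection = project_next(monthly_revenue)
trend = "up" if projection > monthly_revenue[-1] else "down or flat"
print(f"Projected next month: {projection:.1f} (trending {trend})")
```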

## 7. Export Results

Save your analysis results for reporting:

In [None]:
# Compile all insights
lui("""
Create an executive summary table with:
1. Total revenue for the last 12 months
2. Top performing category and region
3. Average discount rate and its impact
4. Key recommendations (top 3)
5. Revenue projection for next quarter
""")

executive_summary = lui.df
if executive_summary is not None:
    print("Executive Summary:")
    display(executive_summary)

    # Save to CSV
    # executive_summary.to_csv('executive_summary.csv', index=False)
    # print("\nSummary saved to executive_summary.csv")
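
Before relying on an exported file, a quick round trip confirms that it reads back as written. The sketch below uses a hypothetical summary table and a temporary directory so that it leaves nothing behind; in the notebook you would write `executive_summary` to a path of your choice.

```python
import os
import tempfile

import pandas as pd

# Hypothetical summary standing in for executive_summary
summary = pd.DataFrame({
    "metric": ["total_revenue", "top_category"],
    "value": ["125000.50", "Electronics"],
})

# Write the CSV, then read it back as strings to avoid type coercion
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "executive_summary.csv")
    summary.to_csv(path, index=False)
    restored = pd.read_csv(path, dtype=str)

round_trip_ok = restored.to_dict("list") == summary.to_dict("list")
print(f"Round trip preserved the table: {round_trip_ok}")
```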

## Working with Historical Analysis

The notebook interface keeps your entire analysis history:

In [None]:
# Access recent analysis steps (oldest first)
print("Analysis steps performed:")
for i in range(-5, 0):
    try:
        response_text = lui[i].text
    except Exception:
        # History may hold fewer than five entries; skip the missing ones
        continue
    if response_text:
        preview = (response_text[:100] + "..."
                   if len(response_text) > 100
                   else response_text)
        print(f"\nStep {i}: {preview}")

# Collect dataframes from recent history
all_dataframes = []
for i in range(-10, 0):
    try:
        df = lui[i].df
    except Exception:
        continue
    if df is not None:
        all_dataframes.append(df)

print(f"\nFound {len(all_dataframes)} dataframes in history")
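
When several steps return tables, it can help to keep track of which step produced which one. The helper below is a hypothetical sketch, not part of LouieAI: it works on any object that supports negative indexing and exposes a `.df` attribute, and it assumes out-of-range access raises `IndexError`. A plain list stands in for `lui` here.

```python
from types import SimpleNamespace


def recent_dataframes(history, limit=10):
    """Map history index to each non-None .df among the last `limit` entries."""
    frames = {}
    for i in range(-limit, 0):
        try:
            df = history[i].df
        except (IndexError, AttributeError):
            continue  # entry is missing or has no .df attribute
        if df is not None:
            frames[i] = df
    return frames


# Stand-in history: strings take the place of real dataframes
fake_history = [
    SimpleNamespace(df="regional table"),
    SimpleNamespace(df=None),
    SimpleNamespace(df="projection table"),
]
found = recent_dataframes(fake_history)
print(found)
```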

## Best Practices

1. **Be specific** about output format - ask for "table", "dataframe", or "list"
2. **Enable traces** for complex analyses to understand the reasoning
3. **Check for errors** when working with external data
4. **Use history** to access previous results without re-running analyses
5. **Save intermediate results** for reproducibility
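
As one way to apply the error-checking practice above, a returned table can be validated before downstream code depends on specific columns. The helper name `missing_columns` and the sample frame are illustrative assumptions.

```python
import pandas as pd


def missing_columns(df, required):
    """Return the required column names that are absent from df."""
    return [name for name in required if name not in df.columns]


# Hypothetical response table that lacks the expected 'revenue' column
result = pd.DataFrame({"month": ["2024-01"], "total": [150.0]})

missing = missing_columns(result, ["month", "revenue"])
if missing:
    print(f"Response is missing columns: {missing}")
```

Since generated tables may name columns differently from run to run, a check like this fails early with a clear message instead of raising a `KeyError` deep in the analysis.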

Next: Try the fraud investigation notebook for pattern detection techniques!