# Getting Started with BI Data Analysis

This notebook walks through a basic analysis workflow for Business Intelligence (BI) data using pandas, NumPy, Matplotlib, and seaborn.

## Objectives
- Load and explore sample data
- Perform basic analysis
- Visualize results
- Understand common BI metrics

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

## 1. Create Sample Business Data

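For learning purposes, we'll create sample sales data: one row per day of 2024, with random sales and customer counts, a randomly assigned region and product, and a small upward trend added to sales.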

In [None]:
# Create sample sales data
np.random.seed(42)

dates = pd.date_range('2024-01-01', '2024-12-31', freq='D')
n_records = len(dates)

data = {
    'date': dates,
    'sales': np.random.randint(1000, 5000, n_records) + np.arange(n_records) * 2,
    'customers': np.random.randint(50, 200, n_records),
    'region': np.random.choice(['North', 'South', 'East', 'West'], n_records),
    'product': np.random.choice(['Product A', 'Product B', 'Product C'], n_records)
}

df = pd.DataFrame(data)
print(f"Created dataset with {len(df)} records")
df.head(10)

## 2. Explore the Data

In [None]:
# Basic statistics
print("Dataset Info:")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
print("\nStatistical Summary:")
df.describe()

## 3. Calculate Key Metrics

In [None]:
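# Calculate key business metrics
total_sales = df['sales'].sum()
total_customers = df['customers'].sum()

# Values are pre-formatted so that only monetary metrics get a dollar sign
metrics = {
    'Total Sales': f"${total_sales:,.2f}",
    'Average Daily Sales': f"${df['sales'].mean():,.2f}",
    'Total Customers': f"{total_customers:,}",
    # The sample data has no order-level rows, so this is sales per customer
    # rather than a true average order value
    'Average Sales per Customer': f"${total_sales / total_customers:,.2f}",
}

print("Key Business Metrics:")
print("-" * 40)
for metric, value in metrics.items():
    print(f"{metric}: {value}")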

## 4. Visualize Sales Trends

In [None]:
# Plot sales over time
plt.figure(figsize=(12, 6))
plt.plot(df['date'], df['sales'], alpha=0.5, label='Daily Sales')
plt.plot(df['date'], df['sales'].rolling(window=7).mean(), 
         color='red', linewidth=2, label='7-Day Moving Average')
plt.xlabel('Date')
plt.ylabel('Sales ($)')
plt.title('Sales Trend Over Time')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
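
The 7-day moving average in the plot above starts a few days late because `rolling(window=7)` returns `NaN` until a full window is available. The following is a minimal sketch of that behavior on a small made-up series (the values are illustrative, not taken from the notebook's data), alongside the `min_periods=1` variant that averages whatever data is available so far:

```python
import pandas as pd

# Illustrative values only; not the notebook's sales data
sales = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])

# Default: a value is produced only once 7 observations are available
strict = sales.rolling(window=7).mean()

# min_periods=1: average over however many observations exist so far
lenient = sales.rolling(window=7, min_periods=1).mean()

print(pd.DataFrame({'sales': sales, 'strict': strict, 'lenient': lenient}))
```

Which variant to plot is a judgment call: the strict version avoids noisy early averages, while the lenient one keeps the line visible from the first day.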

## 5. Regional Analysis

In [None]:
# Analyze by region
regional_sales = df.groupby('region')['sales'].agg(['sum', 'mean', 'count'])
regional_sales.columns = ['Total Sales', 'Avg Sales', 'Number of Days']
regional_sales = regional_sales.sort_values('Total Sales', ascending=False)

print("Regional Performance:")
print(regional_sales)

# Visualize regional sales
plt.figure(figsize=(10, 6))
regional_sales['Total Sales'].plot(kind='bar', color='skyblue')
plt.xlabel('Region')
plt.ylabel('Total Sales ($)')
plt.title('Total Sales by Region')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

## 6. Product Analysis

In [None]:
# Analyze by product
product_sales = df.groupby('product')['sales'].agg(['sum', 'mean'])
product_sales.columns = ['Total Sales', 'Avg Sales']
product_sales = product_sales.sort_values('Total Sales', ascending=False)

print("Product Performance:")
print(product_sales)

# Create pie chart
plt.figure(figsize=(8, 8))
plt.pie(product_sales['Total Sales'], labels=product_sales.index, 
        autopct='%1.1f%%', startangle=90)
plt.title('Sales Distribution by Product')
plt.axis('equal')
plt.show()

## 7. Time-based Patterns

In [None]:
# Add time-based features
df['month'] = df['date'].dt.month
df['day_of_week'] = df['date'].dt.day_name()

# Monthly sales
monthly_sales = df.groupby('month')['sales'].sum()

plt.figure(figsize=(12, 5))
monthly_sales.plot(kind='bar', color='coral')
plt.xlabel('Month')
plt.ylabel('Total Sales ($)')
plt.title('Monthly Sales Performance')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
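
The `day_of_week` feature added above can be summarized the same way as the monthly totals. The sketch below rebuilds a small frame shaped like the notebook's `df` so that it runs on its own; the name `dow_df` and the use of `np.random.default_rng` are assumptions made for this example, so the numbers will not match the cells above. In the notebook itself, the existing `df` could be grouped directly.

```python
import numpy as np
import pandas as pd

# Stand-in for the notebook's df (hypothetical name dow_df, different random draws)
rng = np.random.default_rng(42)
dates = pd.date_range('2024-01-01', '2024-12-31', freq='D')
dow_df = pd.DataFrame({
    'date': dates,
    'sales': rng.integers(1000, 5000, len(dates)) + np.arange(len(dates)) * 2,
})
dow_df['day_of_week'] = dow_df['date'].dt.day_name()

# groupby sorts day names alphabetically, so reindex to calendar order
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']
dow_sales = dow_df.groupby('day_of_week')['sales'].mean().reindex(weekday_order)

print(dow_sales.round(2))
```

Because the sample sales are random, no real weekday pattern should be expected here; with real data this view often shows weekend or mid-week effects.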

## Next Steps

This notebook demonstrated basic BI data analysis. To continue learning:

1. **Try with real data**: Connect to an actual database or BI platform
2. **Advanced analytics**: Add forecasting, anomaly detection
3. **AI integration**: Use LLMs to generate insights automatically
4. **Interactive dashboards**: Create dashboards using Plotly or Streamlit
5. **Automated reporting**: Schedule this analysis to run regularly
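
As a rough illustration of the first step, the sketch below loads data from a database instead of generating it. It uses an in-memory SQLite database as a stand-in for a real source; the table name `daily_sales` and the sample rows are hypothetical, and a real BI platform would need its own connection details.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for a real data source
conn = sqlite3.connect(':memory:')

# Hypothetical sample rows, stored with ISO-formatted date strings
seed_df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=5, freq='D').strftime('%Y-%m-%d'),
    'sales': [1200, 1500, 1100, 1800, 1650],
    'region': ['North', 'South', 'East', 'West', 'North'],
})
seed_df.to_sql('daily_sales', conn, index=False)

# Query the table and parse the date column while loading
query = "SELECT date, sales, region FROM daily_sales WHERE sales >= 1200 ORDER BY date"
loaded = pd.read_sql_query(query, conn, parse_dates=['date'])
conn.close()

print(loaded)
```

Once the data is in a DataFrame, the rest of the notebook's analysis applies with little or no change.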

## Additional Exercises

- Calculate year-over-year growth (the sample data covers only 2024, so extend the date range first)
- Identify top-performing days
- Segment customers by value
- Build a simple forecasting model
- Create a correlation analysis between metrics
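
A possible starting point for a few of these exercises is sketched below. It rebuilds a frame shaped like the notebook's `df` under the hypothetical name `ex_df`, using `np.random.default_rng`, so the numbers will differ from the cells above. Since the sample covers a single year, month-over-month growth stands in for year-over-year growth here.

```python
import numpy as np
import pandas as pd

# Stand-in for the notebook's df (hypothetical name ex_df, different random draws)
rng = np.random.default_rng(42)
dates = pd.date_range('2024-01-01', '2024-12-31', freq='D')
n = len(dates)
ex_df = pd.DataFrame({
    'date': dates,
    'sales': rng.integers(1000, 5000, n) + np.arange(n) * 2,
    'customers': rng.integers(50, 200, n),
})

# Growth: percent change in monthly sales versus the previous month
monthly = ex_df.groupby(ex_df['date'].dt.month)['sales'].sum()
mom_growth = monthly.pct_change() * 100  # first month has no prior month, so NaN

# Top-performing days: the five highest daily sales values
top_days = ex_df.nlargest(5, 'sales')[['date', 'sales']]

# Correlation between the numeric metrics
corr = ex_df[['sales', 'customers']].corr()

print(mom_growth.round(1))
print(top_days)
print(corr.round(3))
```

With real multi-year data, the same `pct_change` call with `periods=12` on a monthly series would give year-over-year growth.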