# Power BI + Python — Complete Tutorial (Compact, Single Markdown + Single Code Cell)

This notebook provides a **concise, end-to-end** tutorial for using **Python with Microsoft Power BI**.
It includes ONE long Markdown cell (this one) and ONE long Code cell so the content is **not split across many cells**.

---

## A) Where Python Runs in Power BI

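1. **Power Query (Transform)**: a *Run Python script* step receives the current table as a `pandas.DataFrame` named `dataset`; every DataFrame left in scope is offered as an output table you can load into the model.
2. **Python Visuals (Report Canvas)**: Power BI passes the fields added to the visual as a DataFrame named `dataset` (duplicate rows removed); the script draws with matplotlib, and the figure is rendered as a static image.
3. **External Automation**: scripts running outside Power BI can open or close Power BI Desktop, call the Power BI REST API, or orchestrate dataset refreshes.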
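A minimal sketch of a Power Query *Run Python script* step, assuming the incoming table carries budget columns like those in the code cell below. Inside Power BI, `dataset` is supplied by Power Query; the small stand-in DataFrame (hypothetical values) exists only so the snippet runs in a notebook. The output names `clean` and `agency_totals` are illustrative; Power BI would list each DataFrame as a table you can pick.

```python
import pandas as pd

# Stand-in for the table Power Query passes in (hypothetical values).
# Delete this assignment inside Power BI, where `dataset` already exists.
dataset = pd.DataFrame({
    "Agency": ["Army", "Navy", "Army", "Navy"],
    "FiscalYear": [2023, 2023, 2024, 2024],
    "Obligations": ["41.5", "45.0", None, "47.2"],  # text numbers and a gap, as often imported
})

# Type-fix and clean: text -> float, unparseable or missing -> 0.
clean = dataset.copy()
clean["FiscalYear"] = clean["FiscalYear"].astype(int)
clean["Obligations"] = pd.to_numeric(clean["Obligations"], errors="coerce").fillna(0.0)

# Aggregate: each DataFrame left in scope becomes a selectable output table.
agency_totals = (
    clean.groupby(["Agency", "FiscalYear"], as_index=False)["Obligations"].sum()
)
print(agency_totals)
```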

---

## B) Reports, Dashboards, Apps (Workflow)

- **Report** → multi-page analytics
- **Dashboard** → one-page summary of tiles pinned from reports (Power BI Service only)
- **App** → packaged reports/dashboards for broad distribution

**Flow**: Build in Desktop → Publish to the Service → Pin visuals to a dashboard → (Optional) Bundle the workspace content as an App.

---

## C) Interaction & Design Principles

- Cross-filter, drill-down/drillthrough, Q&A, alerts, bookmarks.
- KPIs first → trends → details. Label units and fiscal context.
- Minimize slicers; keep design clean and consistent.
- Pair descriptive visuals with predictive ones (forecasts) so executives see both the current position and the likely trajectory.
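To put KPIs first and label fiscal context, a headline card can show execution rates for the latest fiscal year. A minimal sketch, assuming the U.S. federal convention that fiscal year N runs from October 1 of year N-1 through September 30 of year N (the code cell's smoothing example likewise dates each FY at September 30). The helpers `fiscal_year` and `kpi_summary`, the rate definitions, and the sample numbers are illustrative, not Power BI features.

```python
from datetime import date

import pandas as pd


def fiscal_year(d: date) -> int:
    """U.S. federal FY (assumed convention): October-December belong to the next FY."""
    return d.year + 1 if d.month >= 10 else d.year


def kpi_summary(df: pd.DataFrame) -> dict:
    """Headline KPIs for the most recent fiscal year in `df` (hypothetical helper)."""
    fy = int(df["FiscalYear"].max())
    cur = df[df["FiscalYear"] == fy]
    ba = cur["BudgetAuthority"].sum()
    obl = cur["Obligations"].sum()
    out = cur["Outlays"].sum()
    return {
        "Label": f"FY{fy} ($B)",                  # units and fiscal year shown on the card
        "BudgetAuthority": round(float(ba), 2),
        "ObligationRate": round(float(obl / ba), 3),   # share of budget authority obligated
        "OutlayRate": round(float(out / obl), 3),      # share of obligations paid out
    }


sample = pd.DataFrame({  # hypothetical figures
    "FiscalYear": [2023, 2024, 2024],
    "BudgetAuthority": [50.0, 60.0, 40.0],
    "Obligations": [45.0, 54.0, 36.0],
    "Outlays": [43.0, 51.3, 34.2],
})
print(fiscal_year(date(2023, 10, 1)))
print(kpi_summary(sample))
```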

---

## D) Library Ecosystem & Constraints

**Libraries**: `pandas`, `numpy`, `scikit-learn`, `statsmodels`, `prophet`, `matplotlib`, `seaborn`, `plotly` (static in visuals),
`wordcloud`, `nltk`, `spacy`, `subprocess`, `psutil`, `requests`, `msal`, `powerbiclient`.
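For the `msal` and REST entries, a minimal sketch of queuing a dataset refresh through the Power BI REST API (`POST .../groups/{groupId}/datasets/{datasetId}/refreshes`). It assumes a service principal (an Azure AD app registration) that has been granted access to the workspace; the tenant, client, workspace, and dataset IDs are placeholders you supply. The standard-library `urllib` stands in for `requests` so the block has no extra dependency, and `msal` is imported only inside the token helper.

```python
import json
import urllib.request

PBI_API = "https://api.powerbi.com/v1.0/myorg"
PBI_SCOPE = ["https://analysis.windows.net/powerbi/api/.default"]


def get_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Client-credentials token for a service principal (requires `pip install msal`)."""
    import msal  # lazy import: the request builder below works without msal installed

    app = msal.ConfidentialClientApplication(
        client_id,
        authority=f"https://login.microsoftonline.com/{tenant_id}",
        client_credential=client_secret,
    )
    result = app.acquire_token_for_client(scopes=PBI_SCOPE)
    if "access_token" not in result:
        raise RuntimeError(result.get("error_description", "token request failed"))
    return result["access_token"]


def build_refresh_request(token: str, group_id: str, dataset_id: str) -> urllib.request.Request:
    """Prepare (but do not send) a 'refresh dataset in group' POST."""
    url = f"{PBI_API}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    body = json.dumps({"notifyOption": "NoNotification"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def trigger_refresh(token: str, group_id: str, dataset_id: str) -> int:
    """Send the request and return the HTTP status (202 means the refresh was queued)."""
    with urllib.request.urlopen(build_refresh_request(token, group_id, dataset_id)) as resp:
        return resp.status
```

Usage, with your own IDs: `trigger_refresh(get_token("<tenant-id>", "<client-id>", "<secret>"), "<workspace-id>", "<dataset-id>")`. Separating request construction from sending keeps the URL and headers easy to inspect before any network call is made.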

**Desktop vs Service**:
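- Power Query Python scripts refresh in the Service only through an on-premises data gateway in personal mode, which runs them on the gateway host; mirror the Desktop Python version and libraries there.
- Python visuals in the Service run on Microsoft-hosted infrastructure with a fixed list of supported packages and no network access, so fetch data in Power Query or external automation, not inside a visual.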
- Python visuals are static images in both Desktop and Service.
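Mirroring Desktop and the gateway host is easier to check than to remember. A minimal sketch using only the standard library: run it on both machines and compare the output. The package list is an assumption; edit it to match what your scripts import.

```python
import platform
from importlib import metadata


def environment_report(packages=("pandas", "numpy", "matplotlib", "seaborn",
                                  "scikit-learn", "statsmodels")) -> dict:
    """Python and package versions, for comparing the Desktop machine with the gateway host."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)  # distribution name as published on PyPI
        except metadata.PackageNotFoundError:
            report[name] = None                    # not installed on this machine
    return report


for pkg, ver in environment_report().items():
    print(f"{pkg:>13}: {ver or 'MISSING'}")
```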

---

## E) What the Code Cell Provides

- Environment setup hint
- Synthetic dataset that mimics the Power BI `dataset` object
- Power Query–style preprocessing
- Bar plot, Regression example, Holt–Winters-family smoothing (additive trend, no seasonality), Correlation heatmap, Word cloud
- Optional Windows function to open/close Power BI Desktop
- Quick reference tables (libraries & architecture)
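The code cell calls `ExponentialSmoothing(series, trend="add", seasonal=None)`, which is Holt's linear-trend method: the seasonal part of Holt–Winters is switched off because annual totals have no within-year season. The recursion can be sketched in plain Python. Here the weights `alpha` and `beta` and the initial level and trend are fixed by hand (an assumption for illustration), whereas statsmodels by default estimates them when fitting, so the numbers will not match its output exactly.

```python
def holt_linear(y, alpha=0.5, beta=0.3, horizon=3):
    """Holt's additive-trend exponential smoothing.

    Returns (fitted, forecast): one-step-ahead fitted values and `horizon` forecasts.
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = y[0], y[1] - y[0]   # simple heuristic initialisation
    fitted = [level]                   # first fitted point is the initial level itself
    for obs in y[1:]:
        fitted.append(level + trend)   # forecast made before seeing `obs`
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    forecast = [level + h * trend for h in range(1, horizon + 1)]
    return fitted, forecast


totals = [100.0, 104.0, 107.5, 112.0, 115.8, 120.1]  # hypothetical FY totals ($B)
fitted, forecast = holt_linear(totals)
print([round(v, 1) for v in forecast])
```

The forecast is the last level plus `h` steps of the last trend, which is why this method extrapolates a straight line rather than a curve.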

> In a Power BI Python visual, remove the synthetic-data block and keep the rest: Power BI supplies `dataset` itself, containing only the fields added to the visual (duplicate rows removed).
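Alternatively, one script can serve both this notebook and a Python visual by building synthetic data only when `dataset` is not already defined. A minimal sketch; `make_synthetic_dataset` is a hypothetical, much smaller stand-in for the generator loop in the code cell. Catching `NameError` avoids assumptions about which namespace Power BI runs the script in.

```python
import numpy as np
import pandas as pd


def make_synthetic_dataset(seed: int = 42) -> pd.DataFrame:
    """Hypothetical stand-in for the notebook's synthetic generator."""
    rng = np.random.default_rng(seed)
    years = np.arange(2020, 2025)
    return pd.DataFrame({
        "Agency": "Army",  # scalar is broadcast to every row
        "FiscalYear": years,
        "BudgetAuthority": (40 + 1.8 * (years - years.min())
                            + rng.normal(0, 1.2, years.size)).round(2),
    })


try:
    dataset  # in a Python visual, Power BI defines this before the script runs
except NameError:
    dataset = make_synthetic_dataset()

# From here on, the same code works in both environments.
print(dataset.groupby("Agency")["BudgetAuthority"].mean())
```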


In [None]:
# (Optional) Install missing packages:
# !pip install pandas numpy matplotlib seaborn scikit-learn statsmodels wordcloud psutil

import math
import random
from pathlib import Path

import numpy as np
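import pandas as pd
from IPython.display import display  # Jupyter injects display(); importing it makes the dependency explicit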
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from statsmodels.tsa.holtwinters import ExponentialSmoothing

np.random.seed(42)
random.seed(42)
plt.rcParams["figure.dpi"] = 120

# --- Synthetic dataset (acts like Power BI 'dataset') ---
years = np.arange(2010, 2025)
agencies = ["Army", "Navy", "Air Force", "Marines", "DoD (C)"]
titles = ["Operations & Maintenance", "Procurement", "RDT&E", "Military Personnel"]
# Vocabulary for the synthetic free-text "Justification" column (built once, outside the loop)
words = [
    "readiness","modernization","sustainment","training","logistics","fleet",
    "airframe","shipyard","munitions","cyber","AI","ISR","joint","coalition",
    "resilience","infrastructure","compliance","audit","controls","innovation"
]

rows = []
for year in years:
    t = year - years.min()                    # years since the first fiscal year
    for i, agency in enumerate(agencies):
        base_ba = 20 + i * 5                  # each agency gets a larger base
        trend = 1.8 * t
        season = 0.6 * math.sin(t / 2.5)      # slow cyclical wobble
        noise = np.random.normal(0, 1.2)
        ba = max(5.0, base_ba + trend + season + noise)
        obligations = ba * np.clip(np.random.normal(0.92, 0.06), 0.75, 1.05)
        outlays = obligations * np.clip(np.random.normal(0.96, 0.03), 0.85, 1.02)
        title = random.choice(titles)
        date = pd.Timestamp(year=year, month=5, day=15)
        justification = " ".join(random.choices(words, k=16)).capitalize() + "."
        rows.append({
            "Agency": agency,
            "FiscalYear": int(year),
            "BudgetAuthority": round(ba, 2),
            "Obligations": round(obligations, 2),
            "Outlays": round(outlays, 2),
            "Title": title,
            "Date": date,
            "Justification": justification
        })
dataset = pd.DataFrame(rows)

# --- Power Query–style preprocessing ---
df = dataset.copy()
df["Date"] = pd.to_datetime(df["Date"])
df["FiscalYear"] = df["FiscalYear"].astype(int)
for c in ["BudgetAuthority","Obligations","Outlays"]:
    df[c] = pd.to_numeric(df[c], errors="coerce").fillna(0)

grouped_title = (
    df.groupby("Title", as_index=False)["Obligations"]
      .sum().sort_values("Obligations", ascending=False)
)
print("Obligations by Title (synthetic):")
display(grouped_title)

# --- Bar plot (seaborn default: bar height = mean across years, error bar = 95% CI) ---
plt.figure(figsize=(6.5, 3.8))
sns.barplot(x="Agency", y="BudgetAuthority", data=df)
plt.xticks(rotation=20)
plt.title(f"Mean Annual Budget Authority by Agency, FY{df['FiscalYear'].min()}-FY{df['FiscalYear'].max()}")
plt.xlabel("Agency"); plt.ylabel("Budget Authority ($B)")
plt.tight_layout(); plt.show()

# --- Regression: Outlays ~ FiscalYear, pooled across agencies, scored on a held-out 20% ---
X = df[["FiscalYear"]]
y = df["Outlays"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=7)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
r2 = r2_score(y_te, pred)  # agency differences are not modeled, which limits R^2
print(f"R^2 = {r2:.3f}")
plt.figure(figsize=(6.5, 3.8))
plt.plot(X_te["FiscalYear"], y_te, "o", label="Actual")
plt.plot(X_te["FiscalYear"], pred, "s", label="Predicted")
plt.title(f"Outlays vs Fiscal Year (Linear Regression, Test Set): R^2={r2:.3f}")
plt.xlabel("Fiscal Year"); plt.ylabel("Outlays ($B)")
plt.legend(); plt.tight_layout(); plt.show()

# --- Exponential smoothing: Holt's additive-trend method (Holt–Winters without a seasonal term) ---
ts = df.groupby("FiscalYear", as_index=False)["Outlays"].sum().sort_values("FiscalYear")
ts.index = pd.to_datetime(ts["FiscalYear"].astype(str) + "-09-30")  # date each FY at its Sep 30 close
series = ts["Outlays"].astype(float)
hw = ExponentialSmoothing(series, trend="add", seasonal=None).fit()  # annual totals carry no seasonality
plt.figure(figsize=(7.0, 4.0))
plt.plot(series, label="Actual")
plt.plot(hw.fittedvalues, label="Fitted")
plt.title("Holt Exponential Smoothing (Additive Trend): Total Outlays by Fiscal Year")
plt.xlabel("Fiscal Year"); plt.ylabel("Outlays ($B)")
plt.legend(); plt.tight_layout(); plt.show()

# --- Correlation heatmap ---
numeric = df[["BudgetAuthority","Obligations","Outlays","FiscalYear"]]
plt.figure(figsize=(5.5, 4.4))
sns.heatmap(numeric.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation Matrix — Key Measures")
plt.tight_layout(); plt.show()

# --- Word cloud ---
text_blob = " ".join(df["Justification"].astype(str).tolist())
cloud = WordCloud(width=900, height=420, background_color="white").generate(text_blob)
plt.figure(figsize=(8.0, 4.2))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.title("Frequent Terms — Budget Justifications (Synthetic)")
plt.tight_layout(); plt.show()

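# --- Optional (Windows): open a report in Power BI Desktop, then close it ---
import subprocess
import time

import psutil  # pip install psutil


def open_powerbi_report_auto(report_path: str,
                             # default assumes the installer (not Microsoft Store) version
                             powerbi_path: str = r"C:\Program Files\Microsoft Power BI Desktop\bin\PBIDesktop.exe",
                             duration: int = 300) -> None:
    """Open `report_path` in Power BI Desktop, wait `duration` seconds, then close Desktop.

    Note: this terminates every running PBIDesktop process, not only the one started here.
    """
    rp = Path(report_path)
    if not rp.exists():
        raise FileNotFoundError(f"Report not found: {rp}")
    if not Path(powerbi_path).exists():
        raise FileNotFoundError(f"PBIDesktop.exe not found: {powerbi_path}")
    subprocess.Popen([powerbi_path, str(rp)], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    print(f"Opened {rp.name}; waiting {duration}s...")
    time.sleep(duration)
    closed = False
    for p in psutil.process_iter(["pid", "name"]):
        name = p.info.get("name")
        if name and "PBIDesktop" in name:
            try:
                p.terminate()
                p.wait(timeout=10)   # give Desktop a chance to exit cleanly
                closed = True
            except psutil.TimeoutExpired:
                p.kill()             # force-close if it did not exit in time
                closed = True
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass                 # already exited, or not permitted to stop it
    print("Closed Power BI Desktop." if closed else "No PBIDesktop process found.")

# Example (hypothetical path): open_powerbi_report_auto(r"C:\Reports\budget.pbix", duration=60)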

# --- Quick reference tables ---
libraries = pd.DataFrame({
    "Category": [
        "Core Analytics", "Machine Learning", "Time-Series", "Visualization",
        "Text/NLP", "Automation", "Integration"
    ],
    "Primary Libraries": [
        "pandas, numpy",
        "scikit-learn, statsmodels",
        "ExponentialSmoothing (statsmodels), prophet",
        "matplotlib, seaborn, plotly",
        "wordcloud, nltk, spacy",
        "subprocess, psutil, requests",
        "msal, powerbiclient"
    ],
    "Usage": [
        "ETL, joins, aggregation, cleaning",
        "Regression, classification, clustering",
        "Trend modeling, smoothing/forecasting",
        "Static charts in PBI Python visuals",
        "Word clouds, entities, sentiment",
        "Open/close PBIDesktop, REST calls",
        "Power BI REST API, embed in notebooks"
    ]
})
print("\nLibrary Reference:")
display(libraries)

architecture = pd.DataFrame({
    "Power BI Layer": [
        "Power Query (ETL)",
        "Data Model (DAX/Tabular)",
        "Report/Visuals",
        "Dashboards (Service)",
        "Automation/Scheduling",
        "REST & Azure Integration"
    ],
    "Python Role": [
        "pandas/numpy transforms during import",
        "Precompute features before DAX",
        "matplotlib/seaborn/plotly (static images)",
        "Pin Python visuals and KPI tiles",
        "psutil/subprocess for desktop tasks",
        "msal/powerbiclient for APIs/embedding"
    ]
})
print("\nArchitecture Reference:")
display(architecture)
