# turboeda — Full Options Notebook

This notebook demonstrates all options available when using `turboeda` from a Jupyter environment.
It matches the current codebase (no optional PyArrow dependency).


## 0) Install

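```python
# Uncomment to install into the current kernel's environment:
# %pip install turboeda
# %pip install openpyxl  # pandas needs this to read/write .xlsx files, if not already installed
```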

## 1) Option reference (Python API)

| Argument | Type | Default | Description |
|---|---|---|---|
| `input_path` | `str` | **required** | Path to CSV/XLSX file. |
| `sep` | `str` | `","` | CSV delimiter (ignored for Excel). |
| `sheet` | `str \| None` | `None` | Excel sheet name. If `None`, the **first** sheet is used. |
| `sample_rows` | `int \| None` | `200000` | If set, randomly samples at most this many rows to speed up analysis of large files. Use `None` to analyze all rows. |
| `max_corr_cols` | `int` | `40` | Max number of columns included in correlation heatmaps. |
| `max_numeric_plots` | `int` | `12` | Max numeric columns to plot histograms for. |
| `max_categorical_plots` | `int` | `12` | Max categorical columns to plot bar charts for. |
| `theme` | `str` | `"dark"` | Report theme: `'dark'` or `'light'`. |
| `auto_save_and_open` | `bool` | `False` | If `True`, **after** `run()` it **saves** the HTML and **opens** it in the default browser automatically. |
| `out_path` | `str \| None` | `None` | Custom output HTML path for auto-save. If `None`, uses `<input>_report.html`. |
| `open_target` | `str` | `"tab"` | Used with auto-open or `to_html(..., open_in_browser=True)`: `'tab'` or `'window'`. |
| `profile` | `str` | `"standard"` | Placeholder for analysis-depth presets: `quick`, `standard`, or `deep`. |
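
The `sample_rows` rule ("at most this many rows, or everything when `None`") can be pictured with the minimal sketch below. It uses only the standard library, and the helper name `cap_rows` is hypothetical: turboeda's own sampling code is not shown here and may differ (for example, it may sample a pandas DataFrame directly).

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def cap_rows(rows: Sequence[T], sample_rows: Optional[int], seed: int = 0) -> list:
    """Hypothetical illustration of the `sample_rows` rule, not turboeda's implementation."""
    # None (or a dataset already under the cap) means: keep every row.
    if sample_rows is None or len(rows) <= sample_rows:
        return list(rows)
    # Draw `sample_rows` distinct row indices without replacement, reproducibly.
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(rows)), sample_rows))  # preserve original row order
    return [rows[i] for i in picked]


print(len(cap_rows(list(range(500_000)), 200_000)))  # capped at 200000 rows
```

A fixed seed keeps repeated runs on the same file comparable, which matters when you re-generate a report after tweaking other options.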


## 2) Prepare input (demo fallback)
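Set `input_path` to your file. If it doesn't exist, this cell creates a tiny demo CSV and a two-sheet Excel workbook (`Sheet1`, plus `Another` with the same rows shuffled) so the notebook can run end-to-end. Writing `.xlsx` requires `openpyxl`.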


```python
from pathlib import Path
import pandas as pd

input_path = Path("bankdataset.xlsx")  # change this to your dataset

if not input_path.exists():
    # Small synthetic dataset with numeric, boolean, categorical, and date-like columns
    df_demo = pd.DataFrame({
        "age": [25, 32, 47, 51, 37, 29, 41, 33],
        "balance": [200, 1200, 5600, 3100, 450, 900, 7800, 1600],
        "is_active": [True, False, True, True, False, True, True, False],
        "segment": ["A", "B", "A", "C", "B", "A", "C", "B"],
        "opened": [
            "2021-01-10", "2021-03-02", "2020-12-29", "2022-05-14",
            "2022-05-30", "2023-04-01", "2021-07-18", "2020-10-09"
        ],
    })
    csv_path = Path("bankdataset.csv")
    df_demo.to_csv(csv_path, index=False)
    # "Sheet1" is first (used by default); "Another" holds the same rows, shuffled
    with pd.ExcelWriter(input_path) as xl:
        df_demo.to_excel(xl, sheet_name="Sheet1", index=False)
        df_demo.sample(frac=1.0, random_state=42).to_excel(xl, sheet_name="Another", index=False)
    print("Demo files created:")
    print(" -", csv_path.resolve())
    print(" -", input_path.resolve())
else:
    print("Using existing:", input_path.resolve())
```

## 3) Minimal example (defaults)
For Excel input, the first sheet is used automatically; for CSV input, the default delimiter is `,` (set `sep` otherwise).

```python
from pathlib import Path
from turboeda import EDAReport

eda = EDAReport(
    input_path=input_path.as_posix(),
)
res_min = eda.run()

# Same path the default naming rule produces: <input_basename>_report.html
out_min = input_path.with_name(f"{input_path.stem}_report.html")
eda.to_html(out_min.as_posix(), open_in_browser=True)
out_min
```

## 4) Full example (all options)
Demonstrates setting every parameter explicitly, including auto-open after `run()`.

```python
from pathlib import Path
from turboeda import EDAReport

# Derive the name from the input so it still works after you change input_path
full_out = input_path.with_name(f"{input_path.stem}_full_report.html")

eda_full = EDAReport(
    input_path=input_path.as_posix(),
    sep=",",                       # CSV only; ignored for Excel
    sheet=None,                    # Excel: None -> first sheet; otherwise a sheet name string
    sample_rows=None,              # None -> use all rows; or an int to sample large datasets

    max_corr_cols=100_000,         # effectively unbounded for most datasets
    max_numeric_plots=10_000,      # cap on numeric histograms (effectively unbounded)
    max_categorical_plots=10_000,  # cap on categorical bar charts (effectively unbounded)

    theme="dark",                  # or "light"

    # Auto save and open after run()
    auto_save_and_open=True,       # saves the HTML and opens it in the browser after run()
    out_path=full_out.as_posix(),
    open_target="tab",             # or "window"

    profile="standard",            # quick | standard | deep (placeholder)
)

res_full = eda_full.run()  # runs EDA, saves HTML to full_out, and opens it in the default browser
full_out
```
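
Setting the caps this high means every eligible column gets a chart, which is what makes reports heavy. The sketch below gives a rough size estimate under stated assumptions that are *not* confirmed turboeda internals: one histogram per numeric column up to `max_numeric_plots`, one bar chart per categorical column up to `max_categorical_plots`, and a square correlation matrix over at most `max_corr_cols` numeric columns. The function `estimate_report_items` is hypothetical.

```python
def estimate_report_items(
    n_numeric: int,
    n_categorical: int,
    max_corr_cols: int = 40,
    max_numeric_plots: int = 12,
    max_categorical_plots: int = 12,
) -> dict:
    """Hypothetical size estimate; assumes each cap is applied as a simple min()."""
    k = min(n_numeric, max_corr_cols)  # numeric columns kept in the correlation heatmap
    return {
        "histograms": min(n_numeric, max_numeric_plots),
        "bar_charts": min(n_categorical, max_categorical_plots),
        "heatmap_cells": k * k,  # a k x k correlation matrix
    }


# Default caps vs. the "all options" caps above, for a wide table
# with 300 numeric and 50 categorical columns:
print(estimate_report_items(300, 50))
# {'histograms': 12, 'bar_charts': 12, 'heatmap_cells': 1600}
print(estimate_report_items(300, 50, 100_000, 10_000, 10_000))
# {'histograms': 300, 'bar_charts': 50, 'heatmap_cells': 90000}
```

The heatmap grows quadratically with `max_corr_cols`, so that cap usually has the largest effect on file size.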

## 5) Inline preview in Jupyter (IFrame)
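Useful on a remote server without a GUI browser. `IFrame` loads `src` through your browser, so it should be a path relative to the notebook's directory (as `full_out` is here); an absolute filesystem path on the server will generally not load.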

```python
from IPython.display import IFrame

# src must be reachable by the browser: a path relative to the notebook's directory
IFrame(src=full_out.as_posix(), width="100%", height=800)
```
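
If the report was written somewhere other than the notebook's directory, you can convert its path to a relative one first. This is a minimal sketch; `iframe_src` is a hypothetical helper, and it assumes the kernel's working directory is the notebook's directory (the usual default for a local Jupyter kernel).

```python
import os
from pathlib import Path


def iframe_src(report_path, notebook_dir=".") -> str:
    """Hypothetical helper: report_path relative to notebook_dir, with forward slashes."""
    rel = os.path.relpath(Path(report_path).resolve(), Path(notebook_dir).resolve())
    return Path(rel).as_posix()  # browsers expect "/" separators in URLs


print(iframe_src("reports/bankdataset_full_report.html"))  # reports/bankdataset_full_report.html
```

Usage: `IFrame(src=iframe_src(full_out), width="100%", height=800)`. Note that a report outside the notebook's directory tree yields a `../` path, which Jupyter may refuse to serve.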

## 6) Excel sheet selection example
If you want a specific sheet, pass it explicitly. Otherwise the **first** sheet is used by default.

```python
from pathlib import Path
from turboeda import EDAReport

sheet_out = input_path.with_name(f"{input_path.stem}_sheet_report.html")
eda_sheet = EDAReport(
    input_path=input_path.as_posix(),
    sheet="Another",  # select a specific sheet by name
    theme="light",
)
res_sheet = eda_sheet.run()
eda_sheet.to_html(sheet_out.as_posix(), open_in_browser=False)
sheet_out
```
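
The selection rule reduces to a small decision over the workbook's sheet names in file order. The helper `pick_sheet` below is hypothetical (not part of turboeda); it only restates the documented behavior and adds an explicit error for a misspelled name. With pandas installed, `pd.ExcelFile(path).sheet_names` gives the names to pass in.

```python
from typing import Optional, Sequence


def pick_sheet(sheet_names: Sequence[str], sheet: Optional[str] = None) -> str:
    """Hypothetical restatement of the rule: None -> first sheet, else the named sheet."""
    if not sheet_names:
        raise ValueError("workbook has no sheets")
    if sheet is None:
        return sheet_names[0]  # default: first sheet in workbook order
    if sheet not in sheet_names:
        raise ValueError(f"sheet {sheet!r} not found; available: {list(sheet_names)}")
    return sheet


print(pick_sheet(["Sheet1", "Another"]))             # Sheet1
print(pick_sheet(["Sheet1", "Another"], "Another"))  # Another
```

This differs from pandas itself: `pd.read_excel(path, sheet_name=None)` loads *all* sheets into a dict, whereas turboeda's `sheet=None` means the first sheet.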

## 7) Where the HTML is saved
- If you **do not** specify an output file (API or CLI), turboeda uses: **`<input_basename>_report.html`** next to the input file.
- With `auto_save_and_open=True` and `out_path=None`, the same default naming rule applies.
- With `out_path="custom.html"`, the report is saved to that path and opened from there.
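
The naming rule above can be expressed in a few lines of `pathlib`. This is a sketch of the documented behavior, not turboeda's code; `default_report_path` is a hypothetical name. `<input_basename>` is taken to be the file name without its last extension (`Path.stem`), matching the minimal example in section 3.

```python
from pathlib import Path
from typing import Optional


def default_report_path(input_path: str, out_path: Optional[str] = None) -> Path:
    """Hypothetical helper: explicit out_path wins, else <stem>_report.html beside the input."""
    if out_path is not None:
        return Path(out_path)
    p = Path(input_path)
    return p.with_name(f"{p.stem}_report.html")  # same directory as the input file


print(default_report_path("data/bankdataset.xlsx").as_posix())  # data/bankdataset_report.html
print(default_report_path("data/bankdataset.xlsx", "custom.html").as_posix())  # custom.html
```

Because only the last suffix is stripped, an input like `export.tar.csv` would produce `export.tar_report.html`.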

## 8) Tips
- Very large heatmaps or many per-variable charts can make the HTML large and slow to load. Consider lowering `max_corr_cols`, `max_numeric_plots`, or `max_categorical_plots`, or sampling via `sample_rows`.
- Datetime detection is heuristic. For consistent parsing, store timestamps in a single, unambiguous format (for example ISO 8601, `2021-01-10`) before loading.
- Themes: `theme` defaults to `"dark"`; pass `"light"` for a light report.
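
For the datetime tip, one way to normalize mixed date strings to ISO 8601 before handing the file to turboeda is sketched below with the standard library. The list of candidate formats is an assumption to adapt to your data; in particular it assumes day-first for slash-separated dates, since `10/01/2021` is ambiguous between day-first and month-first.

```python
from datetime import datetime
from typing import Iterable, Optional

# Assumed input formats, tried in order; adjust for your data (day-first is assumed here).
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y%m%d")


def to_iso_date(value: str, formats: Iterable[str] = CANDIDATE_FORMATS) -> Optional[str]:
    """Return the date as 'YYYY-MM-DD', or None if no candidate format matches."""
    text = value.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


print([to_iso_date(s) for s in ["2021-01-10", "10/01/2021", "14.05.2022", "20230401"]])
# ['2021-01-10', '2021-01-10', '2022-05-14', '2023-04-01']
```

Returning `None` for unparseable values, rather than guessing, keeps bad rows visible as missing values in the report.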