# run_rakesh_portfolio_sim.ipynb

This notebook fetches historical prices for a small set of real stocks (via `yfinance`), builds a Binary Quadratic Model (BQM) for a simple portfolio selection (0/1 per stock), runs a classical simulated annealer (`neal` or fallback), and prints top feasible portfolios within a budget. It is designed for local simulator-first development and is PR-ready for the `Qfinbox_portfolio` repo.

**How to use:**
- Open this notebook in JupyterLab / Jupyter Notebook.
- Ensure you are inside the repo virtualenv: `source .venv/bin/activate`.
- Install dependencies (next cell) and then run cells top-to-bottom.

> This notebook is simulation-only. Do **not** execute trades from results here. Tune penalty and budget before using for real decisions.

In [9]:
# Install dependencies (run this cell once in your environment)
# You can uncomment and run these lines in a notebook cell.
# Note: If 'dwave-neal' fails to install on your mac, use dimod fallback.
# pip install yfinance dimod dwave-neal numpy pandas

print('Run the pip install cell (uncomment) if you have not installed dependencies yet.')

Run the pip install cell (uncomment) if you have not installed dependencies yet.


In [10]:
import os
import sys
import numpy as np
import pandas as pd
import dimod
import yfinance as yf
from datetime import datetime

# Sampler fallback logic: try neal, then dwave.samplers, then dimod.reference
try:
    import neal
    SamplerClass = neal.SimulatedAnnealingSampler
    sampler_source = 'neal.SimulatedAnnealingSampler'
except Exception:
    try:
        from dwave.samplers import SimulatedAnnealingSampler
        SamplerClass = SimulatedAnnealingSampler
        sampler_source = 'dwave.samplers.SimulatedAnnealingSampler'
    except Exception:
        from dimod.reference.samplers import SimulatedAnnealingSampler
        SamplerClass = SimulatedAnnealingSampler
        sampler_source = 'dimod.reference.samplers.SimulatedAnnealingSampler'

print(f"[{datetime.now().isoformat()}] Using sampler: {sampler_source}")

[2025-11-24T10:44:51.299337] Using sampler: neal.SimulatedAnnealingSampler


## Configuration
Edit the cells below to set Rakesh's tickers and parameters. You can also provide a `tickers.csv` file (one ticker per line) in the same folder to override the default list.

In [11]:
# ----------------------------
# CONFIG — change these values
# ----------------------------
DEFAULT_TICKERS = ["JUBLFOOD.NS","POLYPLEX.NS", "SULA.NS", "IGPL.NS", "AWL.NS", "BUTTERFLY.NS"]
TICKERS_FILE = "tickers.csv"  # optional override file

BUDGET = 50000        # ₹ example; set Rakesh's budget
RISK_FACTOR = 1.0     # tune 0.0..2.0 (higher => more risk-averse)
HISTORY_PERIOD = "1y" # e.g., "6mo", "1y", "2y"
NUM_READS = 2000      # simulated annealer reads (increase for better exploration)
PENALTY_MULT = 1e3

print('Config loaded. Edit variables in this cell as needed.')

Config loaded. Edit variables in this cell as needed.


In [12]:
def load_tickers(defaults, filename):
    if os.path.exists(filename):
        try:
            df = pd.read_csv(filename, header=None)
            ticks = [str(x).strip() for x in df[0].tolist() if str(x).strip()]
            if len(ticks) >= 1:
                print(f"Loaded {len(ticks)} tickers from {filename}")
                return ticks
        except Exception as e:
            print('Failed to read tickers.csv, falling back to defaults:', e)
    print(f"Using default tickers list ({len(defaults)}):", defaults)
    return defaults

# load tickers
tickers = load_tickers(DEFAULT_TICKERS, TICKERS_FILE)
print('Tickers to use:', tickers)

Using default tickers list (6): ['JUBLFOOD.NS', 'POLYPLEX.NS', 'SULA.NS', 'IGPL.NS', 'AWL.NS', 'BUTTERFLY.NS']
Tickers to use: ['JUBLFOOD.NS', 'POLYPLEX.NS', 'SULA.NS', 'IGPL.NS', 'AWL.NS', 'BUTTERFLY.NS']


## Fetch historical prices
This cell downloads adjusted close prices using `yfinance`. It forward-fills missing data and drops rows with remaining NaNs. If data is too short, consider increasing `HISTORY_PERIOD`.

In [13]:
print("Fetching historical data from yfinance (period =", HISTORY_PERIOD, ") ...")

df_full = yf.download(
    tickers,
    period=HISTORY_PERIOD,
    interval="1d",
    progress=False,
    auto_adjust=False
)

# Ensure column exists
if "Adj Close" in df_full.columns:
    df_price = df_full["Adj Close"]
else:
    # fallback if auto_adjust=True or Yahoo returns only 'Close'
    print("Warning: 'Adj Close' not found, falling back to 'Close'")
    df_price = df_full["Close"]

# If single ticker => Series → convert to DataFrame
if isinstance(df_price, pd.Series):
    df_price = df_price.to_frame(tickers[0])

df_price = df_price.ffill().dropna(axis=0, how="any")

print("Historical rows collected:", df_price.shape[0])
latest_price = df_price.iloc[-1].values.astype(float)

daily_ret = df_price.pct_change().dropna()
mu_daily = daily_ret.mean().values
cov_daily = daily_ret.cov().values

# annualize
mu = mu_daily * 252.0
cov = cov_daily * 252.0

print("Latest prices:", dict(zip(tickers, np.round(latest_price,2))))
print("Sample annualized returns:", np.round(mu[:5],4))


Fetching historical data from yfinance (period = 1y ) ...
Historical rows collected: 250
Latest prices: {'JUBLFOOD.NS': np.float64(272.8), 'POLYPLEX.NS': np.float64(706.3), 'SULA.NS': np.float64(378.9), 'IGPL.NS': np.float64(585.85), 'AWL.NS': np.float64(851.55), 'BUTTERFLY.NS': np.float64(237.3)}
Sample annualized returns: [-0.0217 -0.0291 -0.3096 -0.0329 -0.2762]


## Build BQM
Objective: minimize energy = -sum(mu_i * x_i) + risk_factor * x^T cov x
Constraint (soft): sum(price_i * x_i) <= budget with penalty P*(violation)^2
We convert the penalty expansion into linear and quadratic terms and build `dimod.BinaryQuadraticModel`.

In [14]:
# Build BQM
n = len(tickers)
linear = {i: -float(mu[i]) for i in range(n)}
quad = {}
for i in range(n):
    for j in range(i, n):
        val = float(RISK_FACTOR * cov[i, j])
        if i == j:
            linear[i] = linear.get(i, 0.0) + val
        else:
            quad[(i, j)] = quad.get((i, j), 0.0) + val

max_lin = max(abs(v) for v in linear.values()) if linear else 1.0
P = max_lin * 10.0 / (max(latest_price) * max(1, n))
P = P * PENALTY_MULT
print(f'Using penalty P = {P:.3g} (heuristic). Tune if infeasible solutions appear.')

for i in range(n):
    linear[i] = linear.get(i, 0.0) + P * (latest_price[i] ** 2) - P * 2 * BUDGET * latest_price[i]
for i in range(n):
    for j in range(i + 1, n):
        quad[(i, j)] = quad.get((i, j), 0.0) + P * 2 * latest_price[i] * latest_price[j]

bqm = dimod.BinaryQuadraticModel(linear, quad, 0.0, vartype=dimod.BINARY)
print('BQM built: variables =', n, ', interactions =', len(quad))

Using penalty P = 1.1 (heuristic). Tune if infeasible solutions appear.
BQM built: variables = 6 , interactions = 15


## Sample with simulator
This cell runs the sampler (`neal` if available, otherwise fallback). Increase `NUM_READS` for better exploration. The cell aggregates unique solutions and evaluates feasibility and metrics.

In [15]:
sampler = SamplerClass()
print('Sampling with num_reads =', NUM_READS, '...')
sampleset = sampler.sample(bqm, num_reads=NUM_READS)

agg = sampleset.aggregate()
rows = []
for rec in agg.record:
    sample_vals = rec[0]
    energy = float(rec[1])
    sample = {i: int(sample_vals[idx]) for idx, i in enumerate(agg.variables)}
    x = np.array([sample[i] for i in range(n)])
    total_price = float(np.dot(latest_price, x))
    exp_return = float(np.dot(mu, x))
    risk = float(x @ cov @ x)
    rows.append({
        'picks': [tickers[i] for i in range(n) if x[i] == 1],
        'x': x,
        'price': total_price,
        'exp_return': exp_return,
        'risk': risk,
        'energy': energy
    })

import pandas as pd

df = pd.DataFrame(rows)
df = df.drop_duplicates(subset=['price','exp_return','risk','energy']).reset_index(drop=True)
df['feasible'] = df['price'] <= BUDGET
df = df.sort_values(['feasible','energy'], ascending=[False, True]).reset_index(drop=True)

print('\nTop feasible solutions (within budget):')
feasible = df[df['feasible']]
if feasible.empty:
    print('No feasible solution found. Try increasing penalty P or adjusting budget/inputs.')
else:
    for idx, r in feasible.head(10).iterrows():
        print(f"{idx+1}: picks={r['picks']}, price={r['price']:.2f}, exp_return={r['exp_return']:.4f}, risk={r['risk']:.4f}, energy={r['energy']:.4f}")

print('\nTop overall by energy (may include infeasible):')
for i, r in df.head(5).iterrows():
    print(f"{i+1}: picks={r['picks']}, price={r['price']:.2f}, feasible={r['feasible']}, energy={r['energy']:.4f}")

Sampling with num_reads = 2000 ...

Top feasible solutions (within budget):
1: picks=['JUBLFOOD.NS', 'POLYPLEX.NS', 'SULA.NS', 'IGPL.NS', 'AWL.NS', 'BUTTERFLY.NS'], price=3032.70, exp_return=-1.1287, risk=1.6465, energy=-323962490.9864

Top overall by energy (may include infeasible):
1: picks=['JUBLFOOD.NS', 'POLYPLEX.NS', 'SULA.NS', 'IGPL.NS', 'AWL.NS', 'BUTTERFLY.NS'], price=3032.70, feasible=True, energy=-323962490.9864


आपने 2000 बार simulated annealing sampler से BQM चलाया।
यह मॉडल हर शेयर को 0/1 (खरीदें / न खरीदें) के रूप में चुनता है, और साथ में risk, returns, budget सबको बैलेंस करता है।

अंतिम सरल सारांश

मॉडल ने 6 के 6 शेयरों को चुनना सबसे अच्छा पाया।(including JUBLFOOD.NS)

कुल लागत ~ ₹9994 है → बजट से कम → valid.

energy सबसे कम (सबसे अच्छा)।

expected return बहुत ज्यादा आया क्योंकि यह मॉडलिंग डेटा से annualized किया गया है।

risk-value और return-value दोनों अच्छे संतुलन में मिलने के कारण यही समाधान टॉप पर आया।

## How to switch to D-Wave QPU later
When you receive your Leap API token and configure auth (`dwave auth login` or env vars), replace the sampler creation cell with the following:

```python
from dwave.system import DWaveSampler, EmbeddingComposite
sampler = EmbeddingComposite(DWaveSampler())
# then call: sampleset = sampler.sample(bqm, num_reads=NUM_READS)
```

No other changes are required to the BQM or parsing logic.

## Notes & Next steps (for PR)
- This notebook is PR-ready: add it to `examples/` and reference it in README under "Simulator examples".
- Consider adding `requirements-dev.txt` listing `yfinance, dimod, dwave-neal (optional), numpy, pandas`.
- Add a short unit test that the notebook runs without raising (CI can use `nbconvert --execute` on a trimmed version).
- Tuning: penalty `P`, `NUM_READS`, and `RISK_FACTOR` are the main knobs.

