### **Documentation for `Cross_bbl_sheet_construction.py`**

**Three core functions that help us obtain our curves from Barchart**

* `parse_barchart_symbols` (line 29)
* `fetch_barchart_prices` (line 72)
* `clean_and_format_df` (line 115)

---
### Function: `parse_barchart_symbols`  *(line 29)*

**Purpose**  
Generates the list of futures tickers (symbols) that correspond to the exact flat price futures we want to scrape from **barchart.com**.

- Takes the base ticker (e.g. `"RB"` for RBOB gasoline).
- Appends the month code + 2-digit year → `"RBQ25"` (Aug ’25 RBOB contract).
- Produces a full forward list of symbols (e.g. `["RBQ25", "RBU25", ..., "RBZ27"]`).

**Parameters**
- `base_code` (str): Futures root ticker (e.g. `"RB"`, `"IIH"`, `"JSE"`).
- `start_year` (int, optional): Defaults to current year.
- `end_year` (int, optional): Defaults to 24 months forward.
- `start_month` (str, optional): Month code to begin from (defaults to current month).

**Returns**
- `list[str]`: Ordered list of symbols to feed into scraping (used in `fetch_barchart_prices`).

**Notes**
- Uses `month_map` dictionary to translate between month codes and calendar months.
- Skips months *before* `start_month` if in the `start_year`.
- Does not validate if the symbol actually exists on exchange — that happens later.
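
The symbol-building logic described above can be sketched roughly as follows. This is a minimal illustration rather than the script's actual code: the name `build_symbols_sketch` and its required year arguments are hypothetical, and the month letters are the standard futures month codes.

```python
# Standard futures month codes (F = Jan ... Z = Dec).
MONTH_CODES = "FGHJKMNQUVXZ"


def build_symbols_sketch(base_code, start_year, end_year, start_month="F"):
    """Hypothetical stand-in for parse_barchart_symbols (illustration only)."""
    symbols = []
    first_index = MONTH_CODES.index(start_month)
    for year in range(start_year, end_year + 1):
        for index, code in enumerate(MONTH_CODES):
            # Skip contract months before start_month, in the first year only.
            if year == start_year and index < first_index:
                continue
            symbols.append(f"{base_code}{code}{year % 100:02d}")
    return symbols


print(build_symbols_sketch("RB", 2025, 2025, start_month="Q"))
# ['RBQ25', 'RBU25', 'RBV25', 'RBX25', 'RBZ25']
```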

---
### Function: `fetch_barchart_prices`  *(line 72)*

**Purpose**  
Takes the list of symbols generated by `parse_barchart_symbols` and scrapes the **last traded/settlement price** for each from **barchart.com**.

- Builds the URL for each symbol (`https://www.barchart.com/futures/quotes/{symbol}/overview`).
- Uses **`requests`** to fetch the HTML, and **BeautifulSoup** to parse the page.
- *(Tip: open your browser’s developer tools and inspect the page to see where the `data-ng-init` JSON lives.)*
- Looks inside `<div class="symbol-header-info">` → finds the `data-ng-init` attribute (contains JSON).
- Extracts the `"lastPrice"` field, cleans it, and stores it in a DataFrame.

**Parameters**
- `symbols` (list[str]): List of symbols (from `parse_barchart_symbols`).
- `base_url` (str, optional): URL template, defaults to `https://www.barchart.com/futures/quotes/{}/overview`.

**Returns**
- `pd.DataFrame` with:
  - `symbol` → the contract ticker (`RBQ25`, `IIHU25`, etc.).
  - `last_price` → latest available flat price *(usually the previous day’s exchange settlement)*.

**Notes**
- Skips over symbols where no price is found (`None`, `"N/A"`, `"-"`).
- Robust to site hiccups — failures are caught and the loop continues.
- Critical dependency: **BeautifulSoup** (`bs4`) for parsing the HTML.
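
The extraction step can be illustrated without touching the network. The sketch below works on the attribute string only and assumes it has the form `init({...})` with a `"lastPrice"` key, and that prices may carry thousands separators or a trailing `s` flag. Both are assumptions about the page, and `extract_last_price` is a hypothetical helper, not a function from the script.

```python
import json
import re
from typing import Optional


def extract_last_price(ng_init: str) -> Optional[float]:
    """Pull lastPrice out of a data-ng-init string (format assumed)."""
    match = re.search(r"\{.*\}", ng_init, flags=re.DOTALL)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    raw = payload.get("lastPrice")
    if raw in (None, "N/A", "-"):
        return None  # no usable price for this symbol
    # Assumed clean-up: drop thousands separators and a trailing "s".
    cleaned = str(raw).replace(",", "").rstrip("s")
    try:
        return float(cleaned)
    except ValueError:
        return None


sample = 'init({"symbol":"RBQ25","lastPrice":"2.1045s"})'
print(extract_last_price(sample))  # 2.1045
```

Returning `None` instead of raising mirrors the behaviour described in the notes, where symbols without a price are skipped and the loop continues.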

---
### Function: `clean_and_format_df`  *(line 115)*

**Purpose**  
Takes the raw data from `fetch_barchart_prices` and tidies it into a clean table with `month`, `year`, and a clearly named price column.  
This makes the data easier to read, sort, and merge with other curves.

**Inputs**
- `df_raw` *(DataFrame)*: Must have `symbol` and `last_price`.
- `label` *(str)*: Name to give the price column (e.g. `"0.5-Sing flat ($/kt)"`).

**Outputs**
- A DataFrame with:
  - `month` (e.g. `Aug`)  
  - `year` (e.g. `2025`)  
  - your chosen price column

**Steps**
1. Works out how many letters the base ticker has (2 if it starts with `RB`, else 3).  
2. Pulls out the month code and maps it to a month name (`F → Jan`, `Q → Aug`, etc.).  
3. Pulls out the year (last two digits of the symbol) and converts it to a full year (e.g. `25 → 2025`).  
4. Renames `last_price` to the label you gave, sorts the rows by date, and returns the result.

**Notes**
- Any rows with bad month codes are dropped.  
- If you add new tickers with different lengths, you may need to adjust the rule for base length.
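
Steps 1 to 3 can be sketched per symbol as below. This is an illustration under the rules stated above (base length of 2 for `RB`, otherwise 3), and `decode_symbol` is a hypothetical helper rather than part of the script.

```python
MONTH_NAMES = {
    "F": "Jan", "G": "Feb", "H": "Mar", "J": "Apr", "K": "May", "M": "Jun",
    "N": "Jul", "Q": "Aug", "U": "Sep", "V": "Oct", "X": "Nov", "Z": "Dec",
}


def decode_symbol(symbol):
    """Return (month, year) for a symbol, or None if the month code is bad."""
    base_len = 2 if symbol.startswith("RB") else 3  # rule from step 1
    month = MONTH_NAMES.get(symbol[base_len])
    if month is None:
        return None  # such rows would be dropped
    return month, 2000 + int(symbol[-2:])


print(decode_symbol("RBQ25"))   # ('Aug', 2025)
print(decode_symbol("IIHU25"))  # ('Sep', 2025)
```

Under the step 1 rule as sketched here, a two-letter root other than `RB` (for example `LF`) would not decode, which is the case the note on base length warns about.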

---
### Grouping the benchmarks across the barrel

Before applying the three helper functions to pull the flat price data, it’s useful to explain how the code is organised.  
The benchmarks are grouped roughly by density (heavier → lighter) along the crude barrel, with line references from the script:

**1. Fuel oils (most dense group) – *lines 143 to 161***  
- 0.5% Singapore (IIH)  
- 0.5% Barges (IID)  
- 380cst Singapore (JSE)  
- 3.5% ARA Barges (JUV)  
- *180cst Singapore is missing from Barchart – if it ever appears it would be a valuable addition.*  

> Barchart also provides the 3.5% barge crack (refinery margin).  
> We combine this with the flat price to derive an **implied Brent swap curve** (covered later in *lines 185–196*).  

**2. Middle distillates – *lines 202 to 224***  
- 10ppm Singapore Gasoil (JSG)  
- LSGO (LF)  
- *Singapore Kerosene/Jet is missing – would be a useful addition if ever available.*  

**3. Gasoline – *lines 228 to 250***  
- M92 (J1N)  
- EBOB (J7H)  
- RBOB (RB)  
- *All key benchmarks already covered here.*  

**4. Naphtha – *lines 255 to 263***  
- MOPJ (JJA) – Japanese naphtha forward curve  
  - Often blended into M92 by Asian refiners/blenders.  
  - (Optional: add European Naphtha CIF NWE if ever required, but unlikely.)  

**5. Propane / LPG (least dense group)**  
- Not currently included.  
- Could be added after the Naphtha section if needed in future.  

---

Each group is pulled in turn using the three core functions:  
1. Generate tickers with `parse_barchart_symbols`.  
2. Scrape flat prices with `fetch_barchart_prices`.  
3. Clean and label with `clean_and_format_df`.  

**Example: pulling the 0.5% Singapore flat price curve (IIH)**  
```python
# 0.5 Sing flat price curve (IIH)
symbols_05 = parse_barchart_symbols("IIH")
df_05_raw = fetch_barchart_prices(symbols_05)
df_05 = clean_and_format_df(df_05_raw, "0.5-Sing flat ($/kt)")
```

---
### Combining the benchmark groups

At the end of each benchmark group (Fuel oils, Middle distillates, Gasoline, Naphtha),
the cleaned DataFrames are merged together into a single `merged_df`.

- The merge is done on **`month`** and **`year`**, which are common keys across all groups.
- This way, we build one master table that holds all flat price curves side by side.
- The merged dataset then serves as the **base** for all further calculations:
  - Time spreads (M1/M2, Jul/Aug style, etc.)
  - Crack spreads (product vs. Brent).
  - Geographical spreads (E/W, arb flows).  
  - Cross-barrel blends.

In short:
> *Individual DataFrames → merged into `merged_df` → used as the foundation for time spread, geographical spread and crack analysis.*
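
A minimal sketch of the merge pattern is shown below. The prices are made up for illustration, and `how="outer"` is an assumption (it keeps months that only some curves quote); the script may use a different join type.

```python
from functools import reduce

import pandas as pd

# Illustrative prices only, not market data.
df_05 = pd.DataFrame({"month": ["Aug", "Sep"], "year": [2025, 2025],
                      "0.5-Sing flat ($/kt)": [500.0, 495.0]})
df_380 = pd.DataFrame({"month": ["Aug", "Sep"], "year": [2025, 2025],
                       "380 Sing flat ($/kt)": [420.0, 415.0]})
df_m92 = pd.DataFrame({"month": ["Aug"], "year": [2025],
                       "M92 flat ($/bbl)": [78.0]})

# Merge every cleaned curve on the shared month/year keys.
merged_df = reduce(
    lambda left, right: left.merge(right, on=["month", "year"], how="outer"),
    [df_05, df_380, df_m92],
)
print(merged_df)
```

With an outer join, the Sep row keeps its fuel oil prices and shows `NaN` for M92, since that curve has no Sep quote in this example.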

---
### Time spreads: where this lives and how to extend it - line 274 onwards

* Gives us an indication of the magnitude of contango or backwardation on certain parts of the curve

**Location in script**
- **Line 274**: Starts the first example TS calc (e.g. 0.5‑Sing Jul vs Aug style spread).
- **Line 298**: Defines `flat_columns` — the master list of *flat price* columns to compute TS for.
- **Lines 312–317**: Loop that computes TS for every entry in `flat_columns` and adds the results to the pricing sheet.

**What happens**
1. At **line 274**, we show the pattern for one product (e.g. `"0.5-Sing flat ($/kt)"`):  
   current month minus next month → creates `"0.5-Sing TS ($/kt)"`.
2. At **line 298**, we list **all** flat price columns that should get a TS column.
3. At **lines 312–317**, the code loops over that list and, for each flat price column, creates a matching TS column by computing  
   `this_month_flat - next_month_flat` (using `.shift(-1)`).
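
The loop pattern can be sketched as follows. The prices are illustrative, the rows are assumed to be sorted chronologically, and deriving the TS column name with `str.replace` is an assumption about how the script names its columns.

```python
import pandas as pd

merged_df = pd.DataFrame({
    "month": ["Aug", "Sep", "Oct"],
    "year": [2025, 2025, 2025],
    "0.5-Sing flat ($/kt)": [500.0, 495.0, 492.0],  # illustrative prices
})
flat_columns = ["0.5-Sing flat ($/kt)"]

for col in flat_columns:
    ts_col = col.replace(" flat ", " TS ")  # "0.5-Sing TS ($/kt)"
    # shift(-1) moves the next month's price up one row.
    merged_df[ts_col] = merged_df[col] - merged_df[col].shift(-1)

print(merged_df["0.5-Sing TS ($/kt)"].tolist())  # [5.0, 3.0, nan]
```

A positive value means the nearer month is priced above the next one (backwardation); the last row is `NaN` because there is no following month to subtract.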

**How to add a new product to time spreads**
- Make sure the new flat price column already exists in `merged_df` and follows the naming pattern:  
  `"<Product> flat (<units>)"`, e.g. `"Sing Kero flat ($/bbl)"`.
- Append that exact column name to `flat_columns` (at **line 298**).  
  The loop at **lines 312–317** will then automatically create:
  - `"<Product> TS (<same units>)"`
  - with values = *current month flat* − *next month flat*

**Example: adding Singapore Kerosene (if/when available)**
```python
# Line 298: extend the list
flat_columns = [
    "0.5-Sing flat ($/kt)",
    "380 Sing flat ($/kt)",
    "0.5-Barges flat ($/kt)",
    "3.5-Barges flat ($/kt)",
    "10ppm flat ($/bbl)",
    "LSGO flat ($/kt)",
    "M92 flat ($/bbl)",
    "EBOB flat ($/kt)",
    "RBOB flat ($/gal)",
    "MOPJ flat ($/kt)",
    # "Sing Kero flat ($/bbl)"   # <-- new product (example)
]
```

---
### Building crack spreads (refinery margins) - line 316 onwards

Cracks show a refiner’s margin from turning crude into usable products.  
For a given month:

crack = product price (usd/bbl) − Brent swap (usd/bbl)

> Make sure the product month and the Brent month align.

Some product curves are already quoted in usd/bbl, others come in usd/kt or usd/gal.  
Before comparing to the Implied Brent swap (usd/bbl), we convert where needed.

Where this is in the script:
- Cracks are built from line ~316 (after time spreads).
- Lines 338–373 apply the conversion rules, compute cracks, and merge them into the DataFrame.


#### Unit conversion cheat sheet

**$/kt products (divide by the factor, in barrels per kilotonne):**

| Product group / curve                      | Units | Factor |
|--------------------------------------------|-------|--------|
| Propane / LPG (future)                     | $/kt  | 12.4   |
| Naphtha (MOPJ)                             | $/kt  | 8.9 (we also use 9.0) |
| Gasoline (EBOB)                            | $/kt  | 8.33   |
| Middle distillates (LSGO)                  | $/kt  | 7.45   |
| Fuel oil (0.5-Sing, 0.5-Barges, 380-Sing)  | $/kt  | 6.35   |

**$/gal products (multiply):**

| Product group / curve | Units | Factor |
|-----------------------|-------|--------|
| US Gasoline (RBOB)    | $/gal | 42.0   |

**Already in $/bbl (no conversion needed):**
- 10ppm Singapore gasoil  
- M92 gasoline  
- (For future reference) Singapore Jet/Kero  


**Example crack calculation**

For a product quoted in $/kt:  
crack = (product price usd/kt ÷ conversion factor) − Brent swap usd/bbl  

For RBOB quoted in $/gal:  
crack = (product price usd/gal × 42) − Brent swap usd/bbl
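
The two formulas can be checked with a small helper. This is a sketch with made-up prices; `crack_usd_bbl` is a hypothetical function, not one defined in the script.

```python
def crack_usd_bbl(price, brent_swap, unit, factor=None):
    """Convert a flat price to $/bbl and subtract the Brent swap."""
    if unit == "$/kt":
        price_bbl = price / factor   # factor = barrels per kt
    elif unit == "$/gal":
        price_bbl = price * 42.0     # 42 US gallons per barrel
    elif unit == "$/bbl":
        price_bbl = price
    else:
        raise ValueError(f"unknown unit: {unit}")
    return price_bbl - brent_swap


# Illustrative numbers only, not market data.
print(round(crack_usd_bbl(508.0, 70.0, "$/kt", factor=6.35), 2))  # 10.0
print(round(crack_usd_bbl(2.0, 70.0, "$/gal"), 2))                # 14.0
```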

#### Conversion dictionaries in code

```python
# $/kt → $/bbl - line 324
conversion_factors = {
    "380 Sing flat ($/kt)": 6.35,
    "0.5-Barges flat ($/kt)": 6.35,
    "0.5-Sing flat ($/kt)": 6.35,
    "LSGO flat ($/kt)": 7.45,
    "EBOB flat ($/kt)": 8.33,
    "MOPJ flat ($/kt)": 8.9   # For Naphtha (we also test 9.0 separately)
    # Add Propane/LPG here in future (12.4)
}

# $/gal → $/bbl
gal_conversion_factors = {
    "RBOB flat ($/gal)": 42.0
}

# Already $/bbl
no_conversion = ["10ppm flat ($/bbl)", "M92 flat ($/bbl)"]
# (Sing Jet/Kero would also go here if added later)
```

---
### Cross‑barrel blends (lines 378–394)

These blends compare products across the barrel to track relative value used by blenders/refiners.

**What we calculate now**
- **M92 vs MOPJ** (two variants using different Naphtha conversions):
  - `M92 v MOPJ ($/bbl) 8.9-conv`  
    - formula: M92 flat (usd/bbl) − [ MOPJ flat (usd/kt) ÷ 8.9 ]
  - `M92 v MOPJ ($/bbl) 9.0-conv`  
    - formula: M92 flat (usd/bbl) − [ MOPJ flat (usd/kt) ÷ 9.0 ]

**Why two numbers?**  
Naphtha density varies; the market often references both **8.9** and **9.0 bbl/kt**. We show both for completeness.

**Null handling**  
If either side is missing for a month (M92 or MOPJ), the result is left as `NaN`. That is expected near the back of the curve, as the MOPJ curve only extends about 18 months out.
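
As a worked illustration of the two variants and the null handling, the sketch below uses made-up prices; `m92_v_mopj` is a hypothetical helper.

```python
import math


def m92_v_mopj(m92_bbl, mopj_kt, bbl_per_kt):
    """M92 ($/bbl) minus MOPJ converted from $/kt to $/bbl."""
    return m92_bbl - mopj_kt / bbl_per_kt


# Illustrative prices only.
print(round(m92_v_mopj(78.0, 623.0, 8.9), 2))  # 8.0
print(round(m92_v_mopj(78.0, 623.0, 9.0), 2))  # 8.78
print(m92_v_mopj(78.0, math.nan, 8.9))         # nan
```

A missing MOPJ price propagates as `NaN` through the arithmetic, so no special-case code is needed for months where one side is absent.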

**Other common blends we might add later**
- **Regrade**: Sing Jet/Kero (usd/bbl) − Sing 10ppm Gasoil (usd/bbl).  
  *Requires Jet/Kero curve (not in Barchart currently).*  
- **Visco**: 180cst Sing (usd/kt) − 380cst Sing (usd/kt).  
  *Requires 180cst curve (not in Barchart currently). Convert to $/bbl if you need to compare cracks.*

**How to add a new blend**
1. Confirm both series exist in `merged_df_for_blends` with aligned months.  
2. Ensure both are in the same units (convert $/kt → $/bbl if required).  
3. Create a new column with a clear name and simple subtraction. Example:  
   ```python
   merged_df_for_blends["Regrade ($/bbl)"] = (
       merged_df_for_blends["Sing Jet flat ($/bbl)"]
       - merged_df_for_blends["10ppm flat ($/bbl)"]
   )
   ```

---
### Geographical spreads (lines 396–436)

These spreads highlight regional pricing differences and product quality relationships.  
They are widely tracked by traders to understand **arbitrage flows**, **regional competitiveness**,  
and **relative value within product groups**.

**What we calculate:**

- **0.5 E/W ($/kt)**  
  - 0.5% Singapore vs 0.5% Barges.  
  - Captures East–West spread in the low-sulphur fuel oil market.  
  - Key for identifying arbitrage opportunities between Asia and Europe.

- **380 E/W ($/kt)**  
  - 380cst Singapore vs 3.5% Barges.  
  - Shows regional differentials in high-sulphur fuel oil pricing.  

- **Sing Hi-5 ($/kt)**  
  - 0.5% Singapore vs 380cst Singapore.  
  - A *quality spread* — tracks the premium of low-sulphur (0.5) over high-sulphur (380) fuel oil.  
  - Heavily used by refiners and ship owners for IMO 2020 compliance economics.  

- **E/W Gasoline ($/bbl)**  
  - M92 Singapore gasoline vs EBOB gasoline (converted to $/bbl).  
  - Measures East–West gasoline differential.  
  - Important for tracking arbitrage flows of gasoline cargoes between Europe and Asia.  

- **Gasoline ARB: RBOB vs EBOB**  
  - Two ways of showing the US vs European gasoline spread:  
    - `($/gal)` form: direct comparison in $/gal.  
    - `($/bbl)` form: converted to a barrel basis for consistency with crude and cracks.  
  - Tracks the arbitrage economics between US Gulf/NY gasoline and European EBOB.  
  - Key for deciding whether transatlantic gasoline flows are open or closed.
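
The spreads above can be sketched for a single month as below. The sign convention (first-named leg minus second-named leg), the shortened ARB column names, and the prices are all assumptions for illustration; `geo_spreads` is a hypothetical helper.

```python
EBOB_BBL_PER_KT = 8.33  # from the conversion cheat sheet
GAL_PER_BBL = 42.0


def geo_spreads(row):
    """Compute the spreads for one month; `row` maps column name -> price."""
    ebob_bbl = row["EBOB flat ($/kt)"] / EBOB_BBL_PER_KT
    arb_bbl = row["RBOB flat ($/gal)"] * GAL_PER_BBL - ebob_bbl
    return {
        "0.5 E/W ($/kt)": row["0.5-Sing flat ($/kt)"] - row["0.5-Barges flat ($/kt)"],
        "380 E/W ($/kt)": row["380 Sing flat ($/kt)"] - row["3.5-Barges flat ($/kt)"],
        "Sing Hi-5 ($/kt)": row["0.5-Sing flat ($/kt)"] - row["380 Sing flat ($/kt)"],
        "E/W Gasoline ($/bbl)": row["M92 flat ($/bbl)"] - ebob_bbl,
        "Gasoline ARB ($/bbl)": arb_bbl,
        "Gasoline ARB ($/gal)": arb_bbl / GAL_PER_BBL,
    }


# Illustrative prices only, not market data.
row = {
    "0.5-Sing flat ($/kt)": 500.0, "0.5-Barges flat ($/kt)": 470.0,
    "380 Sing flat ($/kt)": 420.0, "3.5-Barges flat ($/kt)": 400.0,
    "M92 flat ($/bbl)": 78.0, "EBOB flat ($/kt)": 666.4,
    "RBOB flat ($/gal)": 2.0,
}
for name, value in geo_spreads(row).items():
    print(f"{name}: {value:.4f}")
```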


**Purpose of these spreads**  
They help answer questions like:  
- *Is Asia or Europe paying more for the same grade of fuel?*  
- *Does it make sense to move gasoline across the Atlantic?*  
- *What’s the premium of low-sulphur over high-sulphur fuel oil in Singapore?*  

These spreads are crucial inputs for **trading strategy** and **arbitrage decisions**.

---
### Big clean‑up and presentation (lines 440–495)

This block builds the final table (`master_df`) that we export to Excel.

**1) Select the output schema**
- `columns_to_keep` lists every column (by group) that should appear in the final sheet:
  - Keys: `month`, `year`, and `Implied Brent swap (usd/bbl)`.
  - Fuel oils: flats / TS / cracks, E/W, Hi‑5, 3.5% barge crack.
  - Middle distillates: 10ppm + LSGO flats / TS / cracks.
  - Gasoline: M92, EBOB, RBOB (incl. E/W Gasoline and ARB in both usd/gal and usd/bbl).
  - Naphtha + blends: MOPJ flats/TS/cracks and both M92 v MOPJ variants (8.9, 9.0).

*Key idea:*  
Each product should be grouped together with its **flat price**, then its **time spread (TS)**, and then its **crack**.  

- This keeps the sheet logical and easy to scan.  
- Blends or cross-product spreads (e.g. Sing Hi-5 between 0.5-Sing and 380-Sing) can sit in between the main product groups.  
- Always confirm with the trader where any **new curves** should be placed before adding them — consistency in ordering matters for usability.  

Note: to add or remove items from the final sheet, edit `columns_to_keep` in line 441.

**2) Build `master_df`**
- Subset `merged_df_for_geo` to `columns_to_keep` only:
  - `master_df = merged_df_for_geo[columns_to_keep]`

**3) Fix unit labels on crack columns**
- Some crack columns were computed in usd/bbl but still labelled `(usd/kt)`.
- Rename so all cracks read `(usd/bbl)`.
- RBOB crack is also relabelled to: `RBOB crk (RBBR) ~ (usd/bbl)`.

**4) Number formatting**
- Columns quoted in usd/gal use 4 decimal places:
  - `RBOB flat (usd/gal)`, `RBOB TS (usd/gal)`, `Gasoline ARB ~ rbob v ebob (usd/gal)`.
- All other numeric columns use 2 decimal places.
- Non‑numeric or missing values are left as is (NaN stays NaN).
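
The rounding rule can be sketched as follows. The sample values are made up, and using `select_dtypes` to pick the numeric columns is an assumption about the approach rather than the script's confirmed code.

```python
import pandas as pd

master_df = pd.DataFrame({
    "RBOB flat (usd/gal)": [2.104567, None],   # None becomes NaN
    "EBOB flat (usd/kt)": [666.4049, 660.126],
})
rbob_cols_4dp = ["RBOB flat (usd/gal)"]

for col in master_df.select_dtypes("number").columns:
    decimals = 4 if col in rbob_cols_4dp else 2
    master_df[col] = master_df[col].round(decimals)  # NaN stays NaN

print(master_df)
```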

**Why this matters**
- A fixed schema keeps the sheet stable for downstream users.
- Correct labels avoid unit mix‑ups (especially cracks).
- Consistent rounding improves readability while keeping extra precision where usd/gal is used.

**Common tweaks**
- If you add a new series you want in the final Excel:
  1) Make sure it exists in `merged_df_for_geo`.  
  2) Add its exact column name to `columns_to_keep`.  
  3) If it’s a crack column, ensure its label ends with `($/bbl)`.  
  4) If it’s another usd/gal series, add it to `rbob_cols_4dp` so it formats to 4 dp.

**Pitfalls to avoid**
- Typos in `columns_to_keep` will raise a KeyError when subsetting.
- Don’t round before calculations. Round only at the end (as done here) to avoid double-rounding errors.
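
One way to catch typos before the subset step is to list the missing names first. This is a suggested sketch only; `missing_columns` is a hypothetical helper that does not exist in the script.

```python
def missing_columns(available, columns_to_keep):
    """Return the names in columns_to_keep that are not in `available`."""
    available = set(available)
    return [col for col in columns_to_keep if col not in available]


available = ["month", "year", "0.5-Sing flat ($/kt)"]
columns_to_keep = ["month", "year", "0.5-Sing flat ($/kt", "LSGO flat ($/kt)"]
print(missing_columns(available, columns_to_keep))
# ['0.5-Sing flat ($/kt', 'LSGO flat ($/kt)']
```

In the script this could be called as `missing_columns(merged_df_for_geo.columns, columns_to_keep)` just before building `master_df`, so a typo shows up as a readable list rather than a KeyError.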

---
### Euro–Dollar forwards (lines 515–563)

**Purpose**  
Pull EUR/USD forward rates as an early warning / context signal for geo‑spreads.  
We scrape a public page, parse the forwards table, and keep **bid / ask / mid** for a set of tenors.

**How it works**
- `fetch_forward_fx_rates(url, tenor_list)`:
  - requests the page, parses the HTML with BeautifulSoup,
  - finds each `tenor` row,
  - reads the next three cells (bid, ask, mid),
  - returns a DataFrame with columns: `tenor`, `bid`, `ask`, `mid` (rounded to 4 dp).
- The code builds a list of tenors and then calls the function:
  - `eur_dol_url = "https://www.fxempire.com/currencies/eur-usd/forward-rates"`
  - `eurdol_fwd_data = fetch_forward_fx_rates(eur_dol_url, tenor_eur_dol)`

**Editing the tenor list (line 553)**
- Always **check the website first** to confirm what tenor labels are available.
- To **add** a tenor: append the exact display name (as shown on the site) to `tenor_eur_dol`.
- To **remove** a tenor: delete it from the list.
- If a tenor name doesn’t match the page text, it will be skipped (you’ll see “not found” in the cell output).

*Sidenote:* The FX Empire site has crashed before while fetching data, which caused the script to error out.  
If that happens, just re-run the cell or script once the site is back up.


---
### Export to Excel (lines 567 to 621)

This block writes the final outputs into an Excel workbook.

**Sheets created**
- sheet 1: **Product Fwd curve** → from `master_df` (all forward curves, TS, cracks, spreads).
- sheet 2: **Euro-dollar Fwds** → from `eurdol_fwd_data` (forward FX curve).

**Formatting applied**
- **Numeric precision**:
  - RBOB series in gallons → **4 decimal places**.
  - All other numeric columns → **2 decimal places**.
- **Column widths** auto-sized based on header + data length.
- **Cells** are centered both vertically and horizontally.

**Layout tweaks**
- Freeze panes:
  - Product Fwd curve → freeze **first column** (keeps month visible when scrolling).
  - Euro-dollar Fwds → freeze **first row** (keeps headers visible).

**File naming**
- Output file named `cross_bbl_pricing_sheet_YYYY-MM-DD.xlsx`  
  (date automatically taken from when the script runs).
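
The dated file name can be built with the standard library, as sketched below. `output_filename` is a hypothetical helper; the script may construct the name differently.

```python
from datetime import date


def output_filename(run_date=None):
    """Build the dated workbook name; defaults to today's date."""
    run_date = run_date or date.today()
    return f"cross_bbl_pricing_sheet_{run_date:%Y-%m-%d}.xlsx"


print(output_filename(date(2025, 8, 1)))  # cross_bbl_pricing_sheet_2025-08-01.xlsx
```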


---
### Delivery mechanism (lines 627–end)

This function emails out the Excel workbook (`filename`) as an attachment.  
It uses **SMTP over SSL** to securely connect and send.
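
A minimal sketch of this pattern with the standard library is shown below. The function names, subject line and body text are hypothetical; only the use of SMTP over SSL and the `EMAIL_PASSWORD` environment variable come from the description here.

```python
import os
import smtplib
from email.message import EmailMessage


def build_message(from_email, to_emails, filename, data, cc=()):
    """Assemble an email with the workbook bytes attached."""
    msg = EmailMessage()
    msg["From"] = from_email
    msg["To"] = ", ".join(to_emails)
    if cc:
        msg["Cc"] = ", ".join(cc)
    msg["Subject"] = f"Cross-barrel pricing sheet: {filename}"  # hypothetical
    msg.set_content("Pricing sheet attached.")
    msg.add_attachment(
        data,
        maintype="application",
        subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        filename=filename,
    )
    return msg


def send_message(msg, login, smtp_server="smtp.gmail.com", smtp_port=465):
    """Send over SMTP-over-SSL; the password comes from EMAIL_PASSWORD."""
    password = os.environ["EMAIL_PASSWORD"]
    with smtplib.SMTP_SSL(smtp_server, smtp_port) as server:
        server.login(login, password)
        server.send_message(msg)


# Build (but do not send) a message with placeholder addresses and bytes.
msg = build_message("sender@example.com", ["trader@example.com"],
                    "cross_bbl_pricing_sheet_2025-08-01.xlsx", b"demo bytes")
print(msg["To"])
```

Keeping message construction separate from sending makes it possible to inspect the recipients and attachment without opening an SMTP connection.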

**Key config (lines 670–684)**
- **from_email / login**  
  - The sender’s email address (currently set to `vedantxyz1@gmail.com`).  
  - Must match the login used for SMTP.  
  - Will need to be updated if ownership of delivery is transferred (e.g. to an IRH Outlook domain).
- **to_email**  
  - List of primary recipients (traders / distribution list).  
  - Example: `["Ola.Hansson@irh.ae"]`.
- **cc / bcc**  
  - Optional additional recipients. Keep CC visible, use BCC for hidden copies.
- **smtp_server / smtp_port**  
  - Mail server settings:  
    - Gmail → `smtp.gmail.com`, port `465`.  
    - Future IRH Outlook → will require IRH’s SMTP server details (to be confirmed by IT).
- **password**  
  - Pulled securely from an environment variable: `EMAIL_PASSWORD`.  
  - In `.github/workflows/cross_bbl_runner.yml`, the password is set on **line 28**:

```yaml
- name: Run pricing script
  env:
    EMAIL_PASSWORD: "your_app_password_here"  # <- Line 28
```