# Is your house worth its weight in gold?

Now we've calculated how much a given house rose in value over time - the question is, would you have been better off buying gold?

In the `data` folder, we have the daily price of one ounce of gold in USD, called `XAUUSD` in trading terms (`XAU` is the ISO 4217 code for gold, where `AU` is its chemical symbol, quoted against `USD`).

We want to find out what would have happened if, instead of buying a house, we had bought an equivalent sum of gold. Finally, we want to compare what the average profit would have been if everyone had invested in gold instead of buying a house.

For simplicity, use the daily Closing price.

```{note}
Since gold is priced in USD and our house prices are in GBP, there are also daily USD/GBP exchange rates in the `data/fx` folder to convert between currencies.
```

Breaking it down:

1. Load daily gold prices into Iceberg - for bonus points, apply a yearly partitioning scheme
2. Load exchange rates into Iceberg - for bonus points, apply a yearly partitioning scheme
3. Convert daily gold prices from USD to GBP
4. For each row in our `profits` table, calculate the equivalent amount of gold they would have been able to purchase on that date
5. Calculate the total value of that amount of gold on the sell date
6. Calculate the profit of our gold trade
7. Compare gold profit with housing profit
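
To make steps 3 to 6 concrete, here is a minimal arithmetic sketch for a single house. All figures are hypothetical round numbers chosen to keep the maths easy to follow; they are not taken from the datasets.

```python
from decimal import Decimal

# Hypothetical figures for one house (not real data)
gold_usd_buy = Decimal("400.00")     # USD per ounce on the purchase date
gold_usd_sell = Decimal("2000.00")   # USD per ounce on the sale date
usd_to_gbp_buy = Decimal("0.5000")   # GBP per 1 USD on the purchase date
usd_to_gbp_sell = Decimal("0.8000")  # GBP per 1 USD on the sale date
house_buy_gbp = Decimal("100000")
house_sell_gbp = Decimal("250000")

# Step 3: convert the gold price from USD to GBP on both dates
gold_gbp_buy = gold_usd_buy * usd_to_gbp_buy
gold_gbp_sell = gold_usd_sell * usd_to_gbp_sell

# Step 4: ounces of gold the purchase price would have bought
ounces = house_buy_gbp / gold_gbp_buy

# Step 5: value of that gold on the sale date
gold_value_at_sale = ounces * gold_gbp_sell

# Step 6: profit of the gold trade, next to the house profit
gold_profit = gold_value_at_sale - house_buy_gbp
house_profit = house_sell_gbp - house_buy_gbp
```

With these made-up numbers the gold trade comes out ahead, but only the real prices can answer the question for actual houses.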

In [1]:
import polars as pl
from utils import catalog, engine
from pyiceberg.schema import Schema, NestedField
from pyiceberg.types import DecimalType, DateType, StringType
from pyiceberg.partitioning import PartitionSpec, YearTransform, PartitionField
from IPython.display import display
pl.Config.set_thousands_separator(",")

polars.config.Config

## 1. Load daily gold prices into Iceberg
Start by creating the table in Iceberg. To organize things a bit better, I'm creating a new namespace, `commodities`. We could imagine putting tables for oil or silver prices in here too.

In [2]:
catalog.create_namespace_if_not_exists("commodities")

gold_schema = Schema(
    NestedField(1, "date", DateType(), required=True, doc="Day of recorded price"),
    NestedField(
        2,
        "price",
        DecimalType(precision=38, scale=2),
        required=True,
        doc="Price in USD of one ounce of gold",
    ),
    identifier_field_ids=[1],
)

In [3]:
gold_prices_t = catalog.create_table_if_not_exists(
    "commodities.gold",
    schema=gold_schema,
    partition_spec=PartitionSpec(
        PartitionField(
            source_id=1, field_id=1, transform=YearTransform(), name="date_year"
        )
    ),
)
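
To get a feel for what the yearly partitioning does, the sketch below groups a few dates the way a year transform would. The Iceberg spec defines the year partition value as the number of years since 1970; the helper names `year_partition` and `group_by_year` are made up for this illustration and are not part of PyIceberg.

```python
from collections import defaultdict
from datetime import date


def year_partition(day: date) -> int:
    """Years since 1970, following the Iceberg spec's definition of the year transform."""
    return day.year - 1970


def group_by_year(days):
    """Group dates by their year partition value (illustrative helper only)."""
    partitions = defaultdict(list)
    for day in days:
        partitions[year_partition(day)].append(day)
    return dict(partitions)


# Three dates taken from the gold price file
sample_days = [date(2004, 6, 11), date(2004, 6, 14), date(2025, 1, 31)]
partitions = group_by_year(sample_days)
```

Both 2004 dates land in the same partition, so a query filtered to one year only needs to touch that year's files.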

Next, read in the CSV, picking out the two columns we care about. Remember to convert to the correct schema before appending.

In [4]:
gold_prices = (
    pl.scan_csv("data/gold/daily_gold_prices.csv", separator=";", try_parse_dates=True)
    .select(pl.col("Date").alias("date"), pl.col("Close").alias("price"))
    .collect()
)
gold_prices

date,price
datetime[μs],f64
2004-06-11 00:00:00,384.1
2004-06-14 00:00:00,382.8
2004-06-15 00:00:00,388.6
2004-06-16 00:00:00,383.8
2004-06-17 00:00:00,387.6
…,…
2025-01-28 00:00:00,2763.17
2025-01-29 00:00:00,2759.68
2025-01-30 00:00:00,2794.06
2025-01-31 00:00:00,2799.23


In [5]:
gold_prices_t.append(gold_prices.to_arrow().cast(gold_schema.as_arrow()))

## 2. Load daily exchange rates into Iceberg

Similar process - create a new namespace and load the FX rates. Since this is technically a table of currency pairs, we can make the schema a bit more future-proof.

In [6]:
catalog.create_namespace_if_not_exists("fx")

fx_schema = Schema(
    NestedField(1, "date", DateType(), required=True, doc="Day of recorded price"),
    NestedField(
        2,
        "from",
        StringType(),
        required=True,
        doc="From currency code of the currency pair",
    ),
    NestedField(
        3,
        "to",
        StringType(),
        required=True,
        doc="To currency code of the currency pair",
    ),
    NestedField(
        4,
        "exchange_rate",
        DecimalType(precision=38, scale=4),
        required=True,
        doc="Exchange rate of the currency pair",
    ),
    identifier_field_ids=[1, 2, 3],
)
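
As a rough illustration of why keying on `(date, from, to)` is future-proof, the sketch below stores rates in a plain dictionary with the same three-part key. The `convert` helper and the EUR/GBP rate are assumptions made up for the example; only the USD/GBP rate comes from the data file.

```python
from datetime import date
from decimal import Decimal

# Keyed by (date, from, to), mirroring identifier_field_ids=[1, 2, 3]
rates = {
    (date(2025, 5, 16), "USD", "GBP"): Decimal("0.7529"),  # from USD_GBP.csv
    (date(2025, 5, 16), "EUR", "GBP"): Decimal("0.8400"),  # hypothetical extra pair
}


def convert(amount, day, from_ccy, to_ccy, rates):
    """Convert an amount using the rate stored for that day and currency pair."""
    return amount * rates[(day, from_ccy, to_ccy)]


gbp = convert(Decimal("100"), date(2025, 5, 16), "USD", "GBP", rates)
```

Adding a new currency pair only adds rows; the schema and the lookup stay the same.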

Next, create the table, then read in the CSV file and transform it slightly to conform to our schema.

In [7]:
fx_table = catalog.create_table_if_not_exists("fx.rates", schema=fx_schema)

In [8]:
fx_rates = (
    pl.scan_csv("data/fx/USD_GBP.csv", try_parse_dates=True)
    .with_columns(pl.lit("USD").alias("from"), pl.lit("GBP").alias("to"))
    .select(
        pl.col("date"),
        pl.col("from"),
        pl.col("to"),
        pl.col("close").cast(pl.Decimal(38, 4)).alias("exchange_rate"),
    )
    .collect()
)
fx_rates

date,from,to,exchange_rate
date,str,str,"decimal[38,4]"
2025-05-16,"""USD""","""GBP""",0.7529
2025-05-15,"""USD""","""GBP""",0.7516
2025-05-14,"""USD""","""GBP""",0.7540
2025-05-13,"""USD""","""GBP""",0.7516
2025-05-12,"""USD""","""GBP""",0.7590
…,…,…,…
2006-03-07,"""USD""","""GBP""",0.5760
2006-03-06,"""USD""","""GBP""",0.5713
2006-03-03,"""USD""","""GBP""",0.5696
2006-03-02,"""USD""","""GBP""",0.5703


In [9]:
fx_table.append(fx_rates.to_arrow().cast(fx_schema.as_arrow()))

## Business Logic
Next comes the business logic: the actual work of figuring out how much gold we could have bought and how much that would have sold for.
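
Before running this against the full tables, the same three-step logic can be checked on toy data. The sketch below uses an in-memory SQLite database from the standard library rather than the notebook's `engine`. The table names `gold`, `rates` and `profits` are simplified stand-ins, and every value is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table gold (date text, price real);
create table rates (date text, exchange_rate real);
create table profits (
    address_id text, first_day text, last_day text, first_price real, profit real
);
insert into gold values ('2010-01-04', 1000.0), ('2020-01-06', 2000.0);
insert into rates values ('2010-01-04', 0.5), ('2020-01-06', 0.75);
-- House A was bought on a trading day, house B on a Saturday with no gold price
insert into profits values
    ('A', '2010-01-04', '2020-01-06', 100000.0, 50000.0),
    ('B', '2010-01-09', '2020-01-06', 100000.0, 50000.0);
""")

toy_sql = """
with gold_prices as (
    select gold.date as gold_date, gold.price * rates.exchange_rate as gold_price
    from gold
    join rates on rates.date = gold.date
), gold_purchase as (
    select address_id, first_price / gold_price as purchased_gold
    from profits
    join gold_prices on gold_date = profits.first_day
), gold_sell as (
    select profits.address_id,
           purchased_gold * gold_price - first_price as gold_profit,
           profit as house_profit
    from profits
    join gold_purchase on gold_purchase.address_id = profits.address_id
    join gold_prices on gold_prices.gold_date = profits.last_day
)
select * from gold_sell
"""
rows = conn.execute(toy_sql).fetchall()
```

House A buys 200 ounces at 500 GBP each and sells them at 1,500 GBP each. House B drops out of the result because the inner join finds no gold price for its purchase date, which is worth keeping in mind when reading the row counts of the real query.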

In [10]:
sql = """
-- Convert USD gold prices into GBP denominated prices
with gold_prices as (
    select commodities.gold.date as gold_date, commodities.gold.price * fx.rates.exchange_rate as gold_price
    from commodities.gold
    join fx.rates on fx.rates.date = commodities.gold.date
), gold_purchase as (
-- Calculate how much gold (in ounces) the house owner could have bought at the time of purchase
    select address_id, 
    first_price / gold_price as purchased_gold
    from housing.profits
    join gold_prices on gold_date = housing.profits.first_day
), gold_sell as (
-- Calculate how much that amount of gold is worth at the time of sale, as well as the profit
    select housing.profits.address_id,
    cast((purchased_gold * gold_price) - first_price as DECIMAL(38, 2)) as gold_profit,
    cast(profit as DECIMAL(38, 2)) as house_profit
    from housing.profits
    join gold_purchase on gold_purchase.address_id = housing.profits.address_id
    join gold_prices on gold_prices.gold_date = last_day
)
select * from gold_sell
"""

In [11]:
gold_vs_house_profits = pl.read_database(sql, engine)
gold_vs_house_profits

address_id,gold_profit,house_profit
str,"decimal[38,2]","decimal[38,2]"
"""C93F0B6F6F82401AFF1C511FF7ACEF…",15498.03,39500.00
"""E7212FC4DD4DE6EEACEF3A2E069ACC…",40230.82,-22500.00
"""6083DCC9F2891595BED437CE8F7783…",13312.86,5000.00
"""996AD2EF5EFB730D42202B131BEB84…",77923.33,200000.00
"""97F1F1EB870F31583D8E6D74053B4C…",39107.51,100500.00
…,…,…
"""45B991C326E33134D67EF385A6789C…",75611.24,28500.00
"""383F1B41BEE83B46A5C8E2EDE3D15F…",53149.41,25050.00
"""EE32549865C4D4B42BB465C9B09569…",125473.63,93000.00
"""1F578411139E06388131C38AD563E7…",88080.35,51005.00


Now that we have a per-address calculation, let's summarize the results.

In [12]:
gold_vs_house_profits.select(pl.all().exclude("address_id")).describe()

statistic,gold_profit,house_profit
str,f64,f64
"""count""",1162593.0,1162593.0
"""null_count""",0.0,0.0
"""mean""",132744.096265,62504.278131
"""std""",177117.410554,111348.20833
"""min""",-918104.38,-11980000.0
"""25%""",44297.19,20000.0
"""50%""",90656.52,44900.0
"""75%""",166886.05,80000.0
"""max""",21050917.48,24992000.0


What percentage of owners would have done better buying gold than a house?

In [13]:
with pl.Config(set_tbl_rows=100):
    summary_df = (
        gold_vs_house_profits.select(
            pl.col("gold_profit")
            .sub(pl.col("house_profit"))
            .qcut(100, labels=[f"Q{i + 1}" for i in range(100)], include_breaks=True)
        )
        .unnest("gold_profit")
        .unique()
        .sort("breakpoint")
    )
    display(summary_df)

breakpoint,category
f64,cat
-188196.0712,"""Q1"""
-125502.33,"""Q2"""
-97413.0512,"""Q3"""
-80479.7332,"""Q4"""
-68929.786,"""Q5"""
-60108.5736,"""Q6"""
-53141.0788,"""Q7"""
-47291.098,"""Q8"""
-42242.2864,"""Q9"""
-37760.928,"""Q10"""


## Exercise: What does it look like in your region?

This is the picture for all of the UK - what does it look like for your region?

In [None]:

# try Polars
"""
gold_prices = (
    pl.scan_iceberg(fx_table)
    .join(
        pl.scan_iceberg(gold_prices_t),
        on="date",
    )
    .select(
        gold_date="date",
        gold_price=pl.col("price") * pl.col("exchange_rate")
    )
    .collect()
)
"""