# Exercise: Is your house worth its weight in gold?

Now that we've calculated how much a given house rose in value over time, the question is: would you have been better off buying gold?

In the `data` folder, you will find the daily price of one ounce of gold in USD, called `XAUUSD` in trading terms (the e`X`change rate of `AU`, i.e. gold, to `USD`).

Your task is to load the data into Iceberg and then work out what would have happened if, instead of buying a house, each buyer had bought an equivalent sum of gold. Finally, compare what the average profit would have been if everyone had invested in gold instead of buying a house.

For simplicity, use the daily Closing price.

```{note}
Since gold is priced in USD and our house prices are in GBP, the `data/fx` folder also contains daily USD/GBP exchange rates to convert between currencies.
```

Breaking it down:

1. Load daily gold prices into Iceberg - for bonus points, apply a yearly partitioning scheme
2. Load exchange rates into Iceberg - for bonus points, apply a yearly partitioning scheme
3. Convert daily gold prices from USD to GBP
4. For each row in our `profits` table, calculate the equivalent amount of gold they would have been able to purchase on that date
5. Calculate the total value of that amount of gold on the sell date
6. Calculate the profit of our gold trade
7. Compare the gold profit with the housing profit
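
To make steps 3 to 7 concrete, here is a minimal worked example for a single house in plain Python. All figures are made-up round numbers (none come from the data set) and the variable names are only for illustration. It assumes the exchange rate is quoted as GBP per 1 USD, so multiplying a USD price by it yields GBP.

```python
from decimal import Decimal

# Hypothetical figures for one house; not taken from the data set.
house_buy_gbp = Decimal("200000")    # price paid for the house
house_sell_gbp = Decimal("260000")   # price the house later sold for

gold_usd_buy = Decimal("1250.00")    # XAUUSD close on the purchase date
gold_usd_sell = Decimal("1800.00")   # XAUUSD close on the sale date
usd_to_gbp_buy = Decimal("0.8000")   # assumed GBP per 1 USD on the purchase date
usd_to_gbp_sell = Decimal("0.7500")  # assumed GBP per 1 USD on the sale date

# Step 3: convert the gold price from USD to GBP on each date
gold_gbp_buy = gold_usd_buy * usd_to_gbp_buy
gold_gbp_sell = gold_usd_sell * usd_to_gbp_sell

# Step 4: ounces of gold the house's purchase price would have bought
ounces = house_buy_gbp / gold_gbp_buy

# Step 5: value of that gold on the sale date
gold_value_at_sale = ounces * gold_gbp_sell

# Step 6: profit of the gold trade
gold_profit = gold_value_at_sale - house_buy_gbp

# Step 7: compare with the profit on the house
house_profit = house_sell_gbp - house_buy_gbp
print(f"gold: {gold_profit:.2f} GBP, house: {house_profit:.2f} GBP")
```

With these invented numbers gold comes out ahead, but that says nothing about the real data; the point is only the order of the calculations.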

In [52]:
import polars as pl
from utils import catalog, engine
from pyiceberg.schema import Schema, NestedField
from pyiceberg.types import DecimalType, DateType, StringType
from pyiceberg.partitioning import PartitionSpec, YearTransform, PartitionField

## 1. Load daily gold prices into Iceberg
Start by creating the table in Iceberg. To keep things organized, I'm creating a new namespace `commodities`; we could imagine adding tables for oil or silver prices here.

In [62]:
catalog.create_namespace_if_not_exists("commodities")

gold_schema = Schema(
    NestedField(1, "date", DateType(), required=True, doc="Day of recorded price"),
    NestedField(2, "price", DecimalType(precision=38, scale=2), required=True, doc="Price in USD of one ounce of gold"),
    identifier_field_ids=[1],
)

gold_prices_t = catalog.create_table_if_not_exists(
    "commodities.gold",
    schema=gold_schema,
    partition_spec=PartitionSpec(
        PartitionField(source_id=1, field_id=1, transform=YearTransform(), name="date_year")
    ),
)
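
To get a feel for what the yearly partitioning does, here is a small standard-library sketch that groups a few of the gold rows by year. It is not PyIceberg code; the helper `year_partition` is hypothetical and mirrors my understanding of the Iceberg specification, which defines the year transform as the number of years since 1970.

```python
from collections import defaultdict
from datetime import date

def year_partition(day: date) -> int:
    """Hypothetical helper: years since 1970, as the Iceberg spec describes the year transform."""
    return day.year - 1970

# A few (date, price) rows from the gold CSV used in this exercise
rows = [
    (date(2004, 6, 11), "384.10"),
    (date(2004, 6, 14), "382.80"),
    (date(2025, 1, 31), "2799.23"),
]

# Group rows by partition value, roughly how a yearly partition spec groups data files
partitions = defaultdict(list)
for day, price in rows:
    partitions[year_partition(day)].append((day, price))

for value, members in sorted(partitions.items()):
    print(value, [d.isoformat() for d, _ in members])
```

A query that filters on a date range only needs to look at the partitions for the years it touches, which is the reason for the bonus task.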

Next, read in the CSV, picking out the two columns we care about and remembering to convert them to the correct schema.

In [59]:
gold_prices = (
    pl.scan_csv("data/gold/daily_gold_prices.csv", separator=";", try_parse_dates=True)
    .select(pl.col("Date").alias("date"), pl.col("Close").alias("price"))
    .collect()
)

In [60]:
gold_prices

date,price
datetime[μs],f64
2004-06-11 00:00:00,384.1
2004-06-14 00:00:00,382.8
2004-06-15 00:00:00,388.6
2004-06-16 00:00:00,383.8
2004-06-17 00:00:00,387.6
…,…
2025-01-28 00:00:00,2763.17
2025-01-29 00:00:00,2759.68
2025-01-30 00:00:00,2794.06
2025-01-31 00:00:00,2799.23


In [63]:
gold_prices_t.append(gold_prices.to_arrow().cast(gold_schema.as_arrow()))

## 2. Load daily exchange rates into Iceberg

The process is similar: create a new namespace and load the FX rates.

In [8]:
catalog.create_namespace_if_not_exists("fx")

fx_schema = Schema(
    NestedField(1, "date", DateType(), required=True, doc="Day of recorded price"),
    NestedField(2, "from", StringType(), required=True, doc="From currency code of the currency pair"),
    NestedField(3, "to", StringType(), required=True, doc="To currency code of the currency pair"),
    NestedField(4, "exchange_rate", DecimalType(precision=38, scale=4), required=True, doc="Exchange rate of the currency pair"),
    identifier_field_ids=[1, 2, 3]
)

Create the table, then read in the CSV file:

In [44]:
fx_table = catalog.create_table_if_not_exists("fx.rates", schema=fx_schema)

In [49]:
fx_rates = (
    pl.scan_csv("data/fx/USD_GBP_clean.csv", try_parse_dates=True)
    .with_columns(pl.lit("USD").alias("from"), pl.lit("GBP").alias("to"))
    .select(
        pl.col("date"),
        pl.col("from"),
        pl.col("to"),
        pl.col("close").alias("exchange_rate"),
    )
    .collect()
)

In [50]:
fx_table.append(fx_rates.to_arrow().cast(fx_schema.as_arrow()))

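Next, convert the gold price to GBP and work out how much gold each house's purchase price would have bought (steps 3 and 4):

In [86]: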
sql = """
with gold_prices as (
    -- Step 3: convert the USD gold price to GBP using that day's exchange rate
    select commodities.gold.date as gold_date,
           commodities.gold.price * fx.rates.exchange_rate as gold_price
    from commodities.gold
    join fx.rates on fx.rates.date = commodities.gold.date
), gold_purchase as (
    -- Step 4: ounces of gold the house's purchase price would have bought
    select address_id,
           first_price / gold_price as purchased_gold
    from house_prices.profits
    join gold_prices on gold_date = house_prices.profits.first_day
), gold_sell as (
    -- Start of step 5: sell date per address (not yet used in the final select)
    select house_prices.profits.address_id,
           last_day
    from house_prices.profits
    join gold_purchase on gold_purchase.address_id = house_prices.profits.address_id
)
select * from gold_purchase
"""

In [87]:
df = pl.read_database(sql, engine)
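
One thing to watch in the query above: the gold file only has rows for trading days (the output earlier jumps from 2004-06-11 to 2004-06-14), so an inner join on the exact date may drop any house whose `first_day` has no gold quote. One possible workaround, sketched here in plain Python rather than SQL, is to fall back to the most recent close on or before the date. The helper `price_on_or_before` is hypothetical and not part of the exercise's `utils` module.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal

# A few closing prices from the gold CSV, sorted by date
quotes = [
    (date(2004, 6, 11), Decimal("384.10")),
    (date(2004, 6, 14), Decimal("382.80")),
    (date(2004, 6, 15), Decimal("388.60")),
]
quote_dates = [day for day, _ in quotes]

def price_on_or_before(day: date):
    """Hypothetical helper: latest close on or before `day`, or None if no quote is that old."""
    i = bisect_right(quote_dates, day)  # number of quotes dated on or before `day`
    if i == 0:
        return None
    return quotes[i - 1][1]

# 2004-06-12 is a Saturday, so this falls back to the Friday close
print(price_on_or_before(date(2004, 6, 12)))
```

Whether this matters depends on how many purchase dates in `profits` fall on non-trading days, which is worth checking before changing the join.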

In [88]:
df

address_id,purchased_gold
str,"decimal[22,6]"
"""6E6E9B8DA9B1CCEDE22F1AB67F2138…",282.259665
"""0A008F19F5BEAD3966443466ED737E…",238.834266
"""93382E6D31CD68DF4D0D6416F24AF9…",166.096723
"""F6A93916522C8ED8696752BEDB7BDB…",173.701596
"""F642071EBA79D04CE93DBFAD79EA3C…",118.334212
…,…
"""7C92344ACB2328F36012ADFEDDF279…",145.891912
"""6AB2072A06719D01B394D5E6D22662…",137.030691
"""B0216E3C1905E42D38316C2024BEDC…",298.637806
"""8DFE0A964CC58CEEB0A3A8E0F22DB4…",543.422892

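
The remaining steps (5 to 7) follow the same pattern: join the GBP gold price a second time, now on the sell date. The sketch below shows one way to shape that query, using the standard library's `sqlite3` with tiny made-up tables instead of the Iceberg catalog. The table names (`gold`, `fx_rates`, `profits`), the sample rows, and in particular the `last_price` column are assumptions for illustration; check the real `profits` table for the actual sell price column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-ins for commodities.gold, fx.rates and house_prices.profits
con.executescript("""
create table gold (date text, price real);
create table fx_rates (date text, exchange_rate real);
create table profits (address_id text, first_day text, first_price real,
                      last_day text, last_price real);
insert into gold values ('2010-01-04', 1000.0), ('2020-01-06', 1600.0);
insert into fx_rates values ('2010-01-04', 0.5), ('2020-01-06', 0.75);
insert into profits values ('A1', '2010-01-04', 100000.0, '2020-01-06', 150000.0);
""")

sql = """
with gold_gbp as (
    -- Step 3: gold price in GBP per day
    select gold.date as gold_date, gold.price * fx_rates.exchange_rate as gold_price
    from gold
    join fx_rates on fx_rates.date = gold.date
)
select p.address_id,
       p.first_price / buy.gold_price as purchased_gold,                               -- step 4
       p.first_price / buy.gold_price * sell.gold_price - p.first_price as gold_profit, -- steps 5 and 6
       p.last_price - p.first_price as house_profit                                     -- step 7
from profits p
join gold_gbp buy on buy.gold_date = p.first_day
join gold_gbp sell on sell.gold_date = p.last_day
"""

rows = con.execute(sql).fetchall()
print(rows)
```

Averaging `gold_profit` and `house_profit` over all rows then gives the overall comparison the exercise asks for.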