# Exercise: Is your house worth its weight in gold?

Now that we've calculated how much a given house rose in value over time, the question is: would you have been better off buying gold?

In the `data` folder, you will find the daily price of one ounce of gold in USD, called `XAUUSD` in trading terms (`XAU` is the ISO 4217 code for gold, where `AU` comes from its chemical symbol, quoted here against `USD`).

Your task is to load the data into Iceberg and then work out what would have happened if, instead of buying a house, each buyer had spent the same sum on gold. Finally, compare the average profit everyone would have made by investing in gold with the average profit from buying a house.

For simplicity, use the daily Closing price.

```{note}
Since gold is priced in USD and our house prices are in GBP, there are also daily USD/GBP exchange rates in the `data/fx` folder to convert between currencies.
```

Breaking it down:

1. Load daily gold prices into Iceberg - for bonus points, apply a yearly partitioning scheme
2. Load exchange rates into Iceberg - for bonus points, apply a yearly partitioning scheme
3. Convert daily gold prices from USD to GBP
4. For each row in our `profits` table, calculate the equivalent amount of gold they would have been able to purchase on that date
5. Calculate the total value of that amount of gold on the sell date
6. Calculate the profit of our gold trade
7. Compare the gold profit with the housing profit
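
Steps 3 to 6 boil down to a few lines of arithmetic per sale. The following is a minimal sketch for a single, made-up sale, not the exercise solution: the function name, its parameters, and all figures are hypothetical, and it assumes the exchange rate is quoted as GBP per 1 USD (if the file quotes USD per 1 GBP, divide instead of multiplying).

```python
from decimal import Decimal


def gold_trade_profit(
    house_buy_price_gbp: Decimal,
    gold_usd_on_buy: Decimal,
    gbp_per_usd_on_buy: Decimal,
    gold_usd_on_sell: Decimal,
    gbp_per_usd_on_sell: Decimal,
) -> Decimal:
    """Profit in GBP from buying gold instead of a house (hypothetical helper)."""
    # Step 3: convert the gold price from USD to GBP on both dates.
    # Assumption: the rate is expressed as GBP per 1 USD.
    gold_gbp_on_buy = gold_usd_on_buy * gbp_per_usd_on_buy
    gold_gbp_on_sell = gold_usd_on_sell * gbp_per_usd_on_sell

    # Step 4: ounces of gold the house purchase price would have bought.
    ounces = house_buy_price_gbp / gold_gbp_on_buy

    # Step 5: value of that gold on the sell date.
    gold_value_on_sell = ounces * gold_gbp_on_sell

    # Step 6: profit of the gold trade.
    return gold_value_on_sell - house_buy_price_gbp


# Made-up figures chosen so the arithmetic is exact.
profit = gold_trade_profit(
    house_buy_price_gbp=Decimal("192000"),
    gold_usd_on_buy=Decimal("1200"),
    gbp_per_usd_on_buy=Decimal("0.8"),
    gold_usd_on_sell=Decimal("1800"),
    gbp_per_usd_on_sell=Decimal("0.75"),
)
print(profit)
```

In the real exercise the same calculation would be applied to every row of `profits` after joining the gold and exchange-rate tables on the buy and sell dates.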

In [29]:
import polars as pl
from utils import catalog
from pyiceberg.schema import Schema, NestedField
from pyiceberg.types import DecimalType, DateType
from pyiceberg.partitioning import PartitionSpec, YearTransform, PartitionField

## 1. Load daily gold prices into Iceberg
Start by creating the table in Iceberg. To organize things a bit better, I'm creating a new namespace, `commodities`; we could imagine also putting tables for oil or silver prices in here.

In [32]:
catalog.create_namespace_if_not_exists("commodities")

gold_schema = Schema(
    NestedField(1, "date", DateType(), required=True, doc="Day of recorded price"),
    NestedField(2, "price", DecimalType(precision=38, scale=2), required=True, doc="Price in USD of one ounce of gold"),
    identifier_field_ids=[1],
)

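# Partition by the year of `date` (source_id=1); partition field ids conventionally start at 1000.
year_spec = PartitionSpec(PartitionField(source_id=1, field_id=1000, transform=YearTransform(), name="date_year"))
gold_prices_t = catalog.create_table_if_not_exists("commodities.gold", schema=gold_schema, partition_spec=year_spec)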
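
To get a feel for what yearly partitioning does to the rows, here is a small standard-library sketch. It does not call PyIceberg; it only mimics the year transform, which the Iceberg spec defines as the number of years since 1970. The helper names and the sample dates are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def year_partition(day: date) -> int:
    # Iceberg's year transform yields "years from 1970" for a date.
    return day.year - 1970


def group_by_year_partition(days: list[date]) -> dict[int, list[date]]:
    # Rows that share a partition value end up in the same partition.
    groups: dict[int, list[date]] = defaultdict(list)
    for day in days:
        groups[year_partition(day)].append(day)
    return dict(groups)


sample_days = [date(2019, 12, 31), date(2020, 1, 1), date(2020, 6, 15)]
partitions = group_by_year_partition(sample_days)
for value, members in sorted(partitions.items()):
    print(value, members)
```

A query that filters on a date range only needs to read the partitions whose year overlaps that range, which is why a yearly scheme suits daily price data.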

Next, read in the CSV, picking out the two columns we care about and remembering to convert them to the correct schema.

In [27]:
gold_prices = (
    pl.scan_csv("data/gold/daily_gold_prices.csv", separator=";", try_parse_dates=True)
    .select(pl.col("Date").alias("date"), pl.col("Close").alias("price"))  # keep only the date and closing price
    .collect()
)

In [21]:
gold_prices_t.append(gold_prices.to_arrow().cast(gold_schema.as_arrow()))