# Model

This notebook loads your assembled and cleaned data and runs modeling and analysis on it.

- First, we run predictive models
- Then, we analyze your predictions as well as existing ones (such as the assessor's)
- Finally, we generate automated statistical reports assessing the quality of those predictions

In [None]:
# Change these as desired

# The slug of the locality you are currently working on
locality = "us-nc-guilford"

# Whether to print detailed progress output (can help with debugging) or stay mostly quiet
verbose = True

# Clear previous state for this notebook and start fresh
clear_checkpoints = True

# 1. Basic setup

In [None]:
import init_notebooks
init_notebooks.setup_environment()

In [None]:
# Import the pipeline functions used in this notebook
from openavmkit.pipeline import (
    init_notebook,
    load_settings,
    load_cleaned_data_for_modeling,
    examine_sup,
    write_canonical_splits,
    try_variables,
    try_models,
    finalize_models,
    run_and_write_ratio_study_breakdowns,
    enrich_sup_spatial_lag,
    from_checkpoint,
    delete_checkpoints,
    identify_outliers
)

In [None]:
init_notebook(locality)

In [None]:
if clear_checkpoints:
    delete_checkpoints("3-model")

In [None]:
settings = load_settings()

# 2. Prepare

We load the cleaned data from the last checkpoint:

In [None]:
# load the data
sales_univ_pair = load_cleaned_data_for_modeling(settings)

In [None]:
sales_univ_pair.sales.to_parquet("sales_clean.parquet")
sales_univ_pair.universe.to_parquet("universe_clean.parquet")

In [None]:
# examine_sup(sales_univ_pair, load_settings())

We separate our test set from our training set and write the split out.
This gives us one durable source of truth for the test/train split.

In [None]:
write_canonical_splits(
    sales_univ_pair,
    load_settings(),
    verbose=verbose
)
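To illustrate why a split written once is a durable source of truth, here is a minimal sketch that uses only the standard library. It is not how `write_canonical_splits` is implemented; the function name `write_split`, the JSON file layout, the seed, and the sale keys are all hypothetical.

```python
import json
import random
import tempfile
from pathlib import Path


def write_split(keys, path, test_fraction=0.2, seed=1337):
    """Write a train/test split to disk once; reuse it if it already exists."""
    path = Path(path)
    if path.exists():
        # The file on disk is the source of truth: never reshuffle.
        return json.loads(path.read_text())
    shuffled = sorted(keys)  # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    split = {"test": shuffled[:n_test], "train": shuffled[n_test:]}
    path.write_text(json.dumps(split))
    return split


# Usage with made-up sale keys (hypothetical data)
demo_dir = Path(tempfile.mkdtemp())
sale_keys = [f"sale-{i}" for i in range(10)]
first = write_split(sale_keys, demo_dir / "split.json")
# A second call, even with the keys in a different order, returns the stored split
second = write_split(reversed(sale_keys), demo_dir / "split.json")
```

Because later calls read the stored file instead of reshuffling, every model sees the same held-out keys.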

In [None]:
sales_univ_pair = from_checkpoint("3-model-00-enrich-spatial-lag", enrich_sup_spatial_lag,
    {
        "sup": sales_univ_pair,
        "settings": load_settings(),
        "verbose": verbose
    }
)
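The `from_checkpoint` call above takes a checkpoint name, a function, and a dictionary of keyword arguments, and `delete_checkpoints` clears saved state by name. The sketch below shows one generic way such a run-once-then-reuse pattern can work. It is an assumption for illustration, not openavmkit's implementation; `cached_call`, `clear_cache`, `slow_step`, and the pickle file layout are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def cached_call(name, func, params, cache_dir):
    """Run func(**params) once and reuse the pickled result on later calls."""
    path = Path(cache_dir) / f"{name}.pickle"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = func(**params)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


def clear_cache(prefix, cache_dir):
    """Delete every cached result whose name starts with prefix."""
    for path in Path(cache_dir).glob(f"{prefix}*.pickle"):
        path.unlink()


calls = []  # records how many times the expensive step really ran


def slow_step(x):
    calls.append(x)
    return x * 2


cache_dir = tempfile.mkdtemp()
a = cached_call("3-model-demo", slow_step, {"x": 21}, cache_dir)
b = cached_call("3-model-demo", slow_step, {"x": 21}, cache_dir)  # served from disk
```

A cached result is only as fresh as its inputs, which is why clearing checkpoints at the top of the notebook is useful after the data or settings change.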

In [None]:
import os
os.makedirs("out/look", exist_ok=True)  # to_parquet does not create missing folders
sales_univ_pair.universe.to_parquet("out/look/3-spatial-lag-universe.parquet")
sales_univ_pair.sales.to_parquet("out/look/3-spatial-lag-sales.parquet")

In [None]:
# examine_sup(sales_univ_pair, load_settings())

# 3. Experiment

Try out variables and models before committing to the final run.

In [None]:
try_variables(
    sales_univ_pair,
    load_settings(),
    verbose,
    plot=False
)

In [None]:
try_models(
    sup=sales_univ_pair,
    settings=load_settings(),
    save_params=True,
    verbose=verbose,
    run_main=False,
    run_vacant=True,
    run_hedonic=False,
    run_ensemble=False,
    do_shaps=False,
    do_plots=True
)

# 4. Identify outliers

Look at the sales where your predictions missed by the widest margin. Answer these questions:

- Does the sale make sense? Is it possibly invalid?
- Is there a pattern to missed predictions? Are you missing a key variable?

In [None]:
identify_outliers(
    sup=sales_univ_pair,
    settings=load_settings()
)
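As a rough illustration of what "missed predictions" can mean, the sketch below ranks sales by how far the prediction-to-price ratio is from 1.0. This is not the logic of `identify_outliers`; the function `worst_predictions`, the 50% threshold, the record fields, and the sample data are all hypothetical.

```python
def worst_predictions(records, max_abs_pct_error=0.5):
    """Return records whose prediction misses the sale price by more than
    the threshold, worst first."""
    flagged = []
    for rec in records:
        ratio = rec["prediction"] / rec["sale_price"]
        error = abs(ratio - 1.0)
        if error > max_abs_pct_error:
            flagged.append({**rec, "ratio": ratio, "abs_pct_error": error})
    return sorted(flagged, key=lambda r: r["abs_pct_error"], reverse=True)


# Made-up sales for illustration only
demo = [
    {"key": "A", "sale_price": 200_000, "prediction": 210_000},  # close match
    {"key": "B", "sale_price": 50_000, "prediction": 250_000},   # sale may be invalid
    {"key": "C", "sale_price": 300_000, "prediction": 120_000},  # model may lack a variable
]
flagged = worst_predictions(demo)
```

A very high ratio such as sale B often points at a questionable sale, while a cluster of low ratios with something in common hints at a missing variable.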

# 5. Finalize models

Once we've locked in good variables and parameters, we finalize our models and results.

In [None]:
results = from_checkpoint("3-model-02-finalize-models", finalize_models,
    {
        "sup": sales_univ_pair,
        "settings": load_settings(),
        "save_params": True,
        "use_saved_params": True,
        "verbose": verbose
    }
)

# 6. Generate reports

In [None]:
# run ratio study reports
run_and_write_ratio_study_breakdowns(load_settings())
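For orientation, a ratio study compares predicted values to sale prices. The sketch below computes three statistics commonly used in such studies: the median ratio, the coefficient of dispersion (COD), and the price-related differential (PRD). Which statistics `run_and_write_ratio_study_breakdowns` actually reports is not shown in this notebook; `ratio_stats` and the sample numbers are hypothetical.

```python
from statistics import mean, median


def ratio_stats(predictions, sale_prices):
    """Compute basic ratio-study statistics from paired predictions and prices."""
    ratios = [p / s for p, s in zip(predictions, sale_prices)]
    med = median(ratios)
    # COD: average absolute deviation from the median ratio, as a percent of the median
    cod = 100.0 * mean(abs(r - med) for r in ratios) / med
    # PRD: mean ratio divided by the value-weighted mean ratio
    prd = mean(ratios) / (sum(predictions) / sum(sale_prices))
    return {"median_ratio": med, "cod": cod, "prd": prd}


# Made-up numbers for illustration only
stats = ratio_stats(
    predictions=[90_000, 100_000, 110_000],
    sale_prices=[100_000, 100_000, 100_000],
)
```

A median ratio near 1.0 indicates predictions are centered on sale prices, a lower COD indicates more uniform predictions, and a PRD well above 1.0 suggests higher-priced properties are under-predicted relative to lower-priced ones.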