# Notebook 32 - Final Audit and Submission Package

This notebook performs the final check and packaging step for the BEP pipeline. It verifies that all outputs used in the thesis (tables, figures, appendix files, and datasets) are present, consistent, and ready for submission.

## Objectives

- Validate the presence and shape of all required output files
- Check that every recipe is covered by the semantic and fuzzy match tables
- Confirm that the LaTeX and PNG appendix files are properly rendered
- Copy the required files into a clean `submission_package` folder

## Inputs

- `thesis_outputs/`
- `appendix_tables/semantic_matches/`
- `appendix_tables/fuzzy_matches/`
- `store_dashboards/`
- `deployment_exports/`
- `variant_exports/` (recipe list)
- `matching_scored/` (semantic and fuzzy match matrices)

## Outputs

- `submission_package/` folder with:
  - Clean CSVs
  - Final LaTeX tables
  - PNGs for dashboard/appendix
  - README with file manifest and instructions


In [1]:
import os
import pandas as pd
import shutil

# Define all key folders
folders = {
    "thesis_outputs": "thesis_outputs",
    "appendix_semantic": os.path.join("appendix_tables", "semantic_matches"),
    "appendix_fuzzy": os.path.join("appendix_tables", "fuzzy_matches"),
    "store_dashboards": "store_dashboards",
    "deployment": "deployment_exports"
}

# Output package folder
submission_folder = "submission_package"
os.makedirs(submission_folder, exist_ok=True)

print("Folders set. Beginning audit...")


Folders set. Beginning audit...
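`os.makedirs(..., exist_ok=True)` keeps any files left over from earlier runs, so `submission_package/` is only guaranteed to be clean on the very first run. A minimal sketch of a stricter setup step, assuming it is acceptable to delete and recreate the package folder on every run and that every input folder must exist before the audit starts. The helpers `prepare_clean_folder` and `missing_input_folders` are hypothetical, and the usage runs in a temporary directory so nothing real is deleted.

```python
import os
import shutil
import tempfile

def prepare_clean_folder(path):
    """Delete `path` if it exists and recreate it empty (hypothetical helper)."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path)
    return path

def missing_input_folders(folder_map):
    """Return the keys of folder_map whose paths are not existing directories."""
    return [key for key, path in folder_map.items() if not os.path.isdir(path)]

# Usage against a throwaway directory, so no real outputs are touched
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "thesis_outputs"))
    folder_map = {
        "thesis_outputs": os.path.join(root, "thesis_outputs"),
        "deployment": os.path.join(root, "deployment_exports"),  # deliberately absent
    }
    print("Missing input folders:", missing_input_folders(folder_map))

    pkg = os.path.join(root, "submission_package")
    prepare_clean_folder(pkg)
    open(os.path.join(pkg, "stale.csv"), "w").close()  # simulate a leftover file
    prepare_clean_folder(pkg)
    print("Files after cleaning:", os.listdir(pkg))
```

Failing fast on a missing input folder is cheaper than discovering it halfway through the copy step, where `os.listdir` would raise on the absent path.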


In [2]:
# File manifest
required_files = [
    os.path.join("thesis_outputs", "store_summary_table.csv"),
    os.path.join("thesis_outputs", "store_summary_table.tex"),
    os.path.join("thesis_outputs", "store_summary_table_enhanced.csv"),
    os.path.join("thesis_outputs", "store_summary_table_enhanced.tex"),
    os.path.join("thesis_outputs", "store_value_saved_by_tier.png"),
    os.path.join("thesis_outputs", "annotated_methodology_table.csv"),
    os.path.join("thesis_outputs", "annotated_methodology_table.tex"),
]

missing_files = [f for f in required_files if not os.path.exists(f)]

if missing_files:
    print("Missing files:")
    for f in missing_files:
        print(" -", f)
else:
    print("All key output files are present.")


All key output files are present.
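The presence check covers only half of the first objective; it does not look inside the files. The "shape" half could be sketched as below, assuming a CSV counts as valid when it parses, has at least one row, and contains a given set of expected columns. `check_csv_shape` is a hypothetical helper, and the column names in the usage are placeholders: the real schema of `store_summary_table.csv` is not shown in this notebook.

```python
import io
import pandas as pd

def check_csv_shape(source, expected_columns=(), min_rows=1):
    """Return a list of problems found in a CSV (an empty list means it passed).

    `source` may be a path or a file-like object; `expected_columns` and
    `min_rows` are assumptions about what a valid output table looks like.
    """
    try:
        df = pd.read_csv(source)
    except (OSError, pd.errors.ParserError, pd.errors.EmptyDataError) as exc:
        return [f"could not read: {exc}"]
    problems = []
    if len(df) < min_rows:
        problems.append(f"only {len(df)} rows (expected >= {min_rows})")
    missing = [c for c in expected_columns if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    return problems

# Usage with in-memory CSVs; "store", "value_saved" and "tier" are placeholder columns
sample = io.StringIO("store,value_saved\nA,12.5\nB,8.0\n")
print(check_csv_shape(sample, expected_columns=["store", "value_saved"]))
print(check_csv_shape(io.StringIO("store\n"), expected_columns=["store", "tier"]))
```

In the notebook, the same function could be applied to every `.csv` entry in `required_files` by passing the file path instead of a `StringIO` object.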


In [4]:
# Load recipe list
df_recipes = pd.read_csv("variant_exports/recipes_with_variants.csv")
recipes = df_recipes["recipe"].dropna().unique()

# Load the final scored match matrices
semantic = pd.read_csv("matching_scored/matching_matrix_semantic.csv")
fuzzy = pd.read_csv("matching_scored/matching_matrix_fuzzy.csv")

# Check coverage
semantic_recipes = semantic["recipe"].dropna().unique()
fuzzy_recipes = fuzzy["recipe"].dropna().unique()

missing_semantic = sorted(set(recipes) - set(semantic_recipes))
missing_fuzzy = sorted(set(recipes) - set(fuzzy_recipes))

print("Missing from semantic:", len(missing_semantic))
print("Missing from fuzzy:", len(missing_fuzzy))

# Optional: display missing examples
print("\nExamples missing in semantic match table:")
print(missing_semantic[:5])

print("\nExamples missing in fuzzy match table:")
print(missing_fuzzy[:5])


Missing from semantic: 5
Missing from fuzzy: 1

Examples missing in semantic match table:
['Banana Yogurt Bowl', 'Honey Glazed Carrots', 'Pasta with Tomato Sauce', 'Strawberry Smoothie', 'Tuna Sandwich']

Examples missing in fuzzy match table:
['Tuna Sandwich']
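The set difference above only runs in one direction. Also checking the reverse (`matched - recipes`) would flag match rows for recipes that no longer appear in the variant list, and the same comparison can be extended to the appendix folders. The sketch below assumes appendix file names follow the pattern visible in the copy log: `recipe_` + the lower-cased name with non-alphanumeric runs replaced by underscores + `.csv` (e.g. `Banana Yogurt Bowl` → `recipe_banana_yogurt_bowl.csv`). `recipe_slug` and `coverage_report` are hypothetical helpers, and the inputs are toy values, not the real matrices.

```python
import re

def recipe_slug(name):
    """Map a recipe name to its assumed appendix file name."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return f"recipe_{slug}.csv"

def coverage_report(recipes, matched, appendix_files):
    """Compare a recipe list against match-matrix entries and appendix files."""
    recipes, matched = set(recipes), set(matched)
    expected_files = {recipe_slug(r): r for r in recipes}
    return {
        "missing_from_matches": sorted(recipes - matched),
        "unknown_in_matches": sorted(matched - recipes),
        "missing_appendix_file": sorted(
            r for f, r in expected_files.items() if f not in appendix_files
        ),
    }

# Usage with toy inputs; "Old Recipe" stands in for a stale match row
toy_recipes = ["Banana Yogurt Bowl", "Greek Yogurt and Honey", "Tuna Sandwich"]
toy_matched = ["Greek Yogurt and Honey", "Old Recipe"]
toy_files = {"recipe_banana_yogurt_bowl.csv", "recipe_greek_yogurt_and_honey.csv"}
report = coverage_report(toy_recipes, toy_matched, toy_files)
for key, value in report.items():
    print(key, value)
```

In the notebook, `appendix_files` could be `set(os.listdir(folders["appendix_semantic"]))`, provided the naming assumption holds for every recipe.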


In [5]:
# Copy selected files into the submission folder
def copy_file(src, dest_folder):
    # Create the destination on demand, then copy (or report a missing source)
    os.makedirs(dest_folder, exist_ok=True)
    if os.path.exists(src):
        shutil.copy(src, dest_folder)
        print("Copied:", src)
    else:
        print("Missing:", src)

# Copy main LaTeX/CSV/PNG tables
for file in required_files:
    copy_file(file, submission_folder)

# Copy appendix match tables and dashboards into one subfolder per source.
# The semantic and fuzzy folders share file names (e.g. recipe_tuna_sandwich.csv),
# so a flat copy would let each fuzzy table overwrite its semantic counterpart.
for folder_key in ["appendix_semantic", "appendix_fuzzy", "store_dashboards"]:
    dest_folder = os.path.join(submission_folder, folder_key)
    for file in os.listdir(folders[folder_key]):
        if file.endswith((".csv", ".png", ".tex")):
            src = os.path.join(folders[folder_key], file)
            copy_file(src, dest_folder)

# Copy per-store deployment CSVs
deployment_files = os.listdir(os.path.join(folders["deployment"], "per_store"))
for file in deployment_files:
    if file.endswith(".csv"):
        src = os.path.join(folders["deployment"], "per_store", file)
        copy_file(src, submission_folder)


Copied: thesis_outputs\store_summary_table.csv
Copied: thesis_outputs\store_summary_table.tex
Copied: thesis_outputs\store_summary_table_enhanced.csv
Copied: thesis_outputs\store_summary_table_enhanced.tex
Copied: thesis_outputs\store_value_saved_by_tier.png
Copied: thesis_outputs\annotated_methodology_table.csv
Copied: thesis_outputs\annotated_methodology_table.tex
Copied: appendix_tables\semantic_matches\recipe_banana_yogurt_bowl.csv
Copied: appendix_tables\semantic_matches\recipe_greek_yogurt_and_honey.csv
Copied: appendix_tables\semantic_matches\recipe_honey_glazed_carrots.csv
Copied: appendix_tables\semantic_matches\recipe_pasta_with_tomato_sauce.csv
Copied: appendix_tables\semantic_matches\recipe_strawberry_smoothie.csv
Copied: appendix_tables\semantic_matches\recipe_tuna_sandwich.csv
Copied: appendix_tables\fuzzy_matches\recipe_banana_yogurt_bowl.csv
Copied: appendix_tables\fuzzy_matches\recipe_greek_yogurt_and_honey.csv
Copied: appendix_tables\fuzzy_matches\recipe_honey_glazed_
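The copy log shows that `appendix_tables/semantic_matches/` and `appendix_tables/fuzzy_matches/` contain files with identical names (for example `recipe_banana_yogurt_bowl.csv` in both). If such files are copied into one flat folder, `shutil.copy` silently overwrites the earlier copy. A minimal sketch of a pre-copy check, assuming the planned copies are available as `(source path, destination folder)` pairs; `find_name_collisions` is a hypothetical helper and touches no files.

```python
import os
from collections import defaultdict

def find_name_collisions(planned_copies):
    """Group source files that would land on the same destination path.

    `planned_copies` is a list of (src_path, dest_folder) pairs; the result
    maps each contested destination path to the sources competing for it.
    """
    targets = defaultdict(list)
    for src, dest_folder in planned_copies:
        targets[os.path.join(dest_folder, os.path.basename(src))].append(src)
    return {dest: srcs for dest, srcs in targets.items() if len(srcs) > 1}

# Usage: a flat copy of both appendix folders collides on every shared name
plan = [
    (os.path.join("appendix_tables", "semantic_matches", "recipe_tuna_sandwich.csv"), "submission_package"),
    (os.path.join("appendix_tables", "fuzzy_matches", "recipe_tuna_sandwich.csv"), "submission_package"),
    (os.path.join("thesis_outputs", "store_summary_table.csv"), "submission_package"),
]
for dest, srcs in find_name_collisions(plan).items():
    print(dest, "<-", srcs)
```

Running this before any copy and stopping on a non-empty result turns a silent overwrite into an explicit audit failure.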

In [6]:
# Write a manifest/README
readme_path = os.path.join(submission_folder, "README.txt")
with open(readme_path, "w", encoding="utf-8") as f:  # explicit encoding for the en dash in the title
    f.write("Final Submission Package – BEP Project\n")
    f.write("=======================================\n\n")
    f.write("This folder contains all reproducible outputs for thesis inclusion:\n")
    f.write("\nTables and Figures:\n")
    for file in required_files:
        f.write(f"- {os.path.basename(file)}\n")
    f.write("\nAppendix Tables (Semantic, Fuzzy):\n")
    for folder_key in ["appendix_semantic", "appendix_fuzzy"]:
        for file in os.listdir(folders[folder_key]):
            if file.endswith(".csv"):
                f.write(f"- {file}\n")
    f.write("\nStore Dashboards:\n")
    for file in os.listdir(folders["store_dashboards"]):
        if file.endswith(".png"):
            f.write(f"- {file}\n")
    f.write("\nPer-Store Deployment Exports:\n")
    for file in deployment_files:
        if file.endswith(".csv"):  # only CSVs are copied into the package
            f.write(f"- {file}\n")
print("Wrote submission README to:", readme_path)


Wrote submission README to: submission_package\README.txt
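The README records file names only. If the manifest also records each file's size and SHA-256 digest, a reviewer can confirm later that the package is byte-for-byte what was submitted. A minimal sketch, assuming a tab-separated `MANIFEST.tsv` next to `README.txt` is acceptable; the file name and the helpers `sha256_of` and `write_checksum_manifest` are hypothetical. It walks subfolders, so it also works if the package is organised into per-source directories.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large CSVs or PNGs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_checksum_manifest(package_dir, manifest_name="MANIFEST.tsv"):
    """Write relative path, size in bytes, and SHA-256 for every file in package_dir."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(package_dir):
        for name in filenames:
            if name == manifest_name:
                continue  # do not hash the manifest itself
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, package_dir).replace(os.sep, "/")
            rows.append((rel, os.path.getsize(full), sha256_of(full)))
    rows.sort()  # stable order regardless of os.walk traversal order
    manifest_path = os.path.join(package_dir, manifest_name)
    with open(manifest_path, "w", encoding="utf-8") as fh:
        fh.write("path\tbytes\tsha256\n")
        for rel, size, digest in rows:
            fh.write(f"{rel}\t{size}\t{digest}\n")
    return rows

# Usage on a throwaway folder
with tempfile.TemporaryDirectory() as pkg:
    with open(os.path.join(pkg, "example.csv"), "w", encoding="utf-8") as fh:
        fh.write("a,b\n1,2\n")
    for row in write_checksum_manifest(pkg):
        print(row)
```

Against the real package this would be called as `write_checksum_manifest(submission_folder)` after the README is written; `README.txt` is then included in the manifest like any other file.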
