
Model training scripts dump generated artifacts directly into the repo root #5672

@suhaibmujahid


Problem

The repo root accumulates many generated artifacts, mostly from model training. They are correctly listed in .gitignore, but they still physically clutter the directory, which makes the project structure hard to navigate in file explorers, ls, and IDEs. The exact number depends on how many models have been trained locally during experimentation.

For example, after training several models, the root can contain ~59 items on disk while only ~31 are tracked source files; the rest are generated artifacts. Actual project files get buried among generated directories:

$ ls
CITATION.cff              fenixcomponentmodel/       regressormodel_data_X.zst.etag
CODE_OF_CONDUCT.md        functions/                 regressormodel_data_y
CONTRIBUTING.md           http_service/              regressormodel_data_y.zst
LICENSE                   indexdir/                  regressormodel_data_y.zst.etag
MANIFEST.in               infra/                     releasenotemodel/
README.md                 mcp/                       requirements.txt
VERSION                   metrics.json               scripts/
__pycache__/              node_modules/              setup.py
annotateignoremodel/      notebooks/                 spambugmodel/
backoutmodel/             patches/                   stepstoreproducemodel/
bugbug/                   pyproject.toml             test-requirements.txt
bugbug.egg-info/          regressionmodel.zst        testfailuremodel
bugtypemodel/             regressormodel.zst         testgroupselectmodel/
cache/                    regressormodel.zst.etag    testlabelselectmodel/
componentmodel/           regressormodel_data_X      tests/
crashcomponentmodel/      regressormodel_data_X.zst  ui/
data/                     regressormodel_data_X.zst… upliftmodel/
...
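
To reproduce the on-disk vs. tracked comparison locally, something like the following rough sketch could be used. It is not part of the repo: the helper names (`top_level_names`, `split_root_entries`, `audit_repo_root`) are hypothetical, and it assumes `git` is on the PATH and that the script is pointed at the root of a checkout.

```python
import subprocess
from pathlib import Path


def top_level_names(tracked_paths):
    """Return the first path component of every git-tracked path."""
    return {p.split("/", 1)[0] for p in tracked_paths if p}


def split_root_entries(root_entries, tracked_paths):
    """Split root entries into (tracked, untracked) sorted lists.

    An entry counts as tracked if at least one tracked path starts with it.
    """
    tracked_top = top_level_names(tracked_paths)
    tracked = sorted(e for e in root_entries if e in tracked_top)
    untracked = sorted(e for e in root_entries if e not in tracked_top)
    return tracked, untracked


def audit_repo_root(repo_root="."):
    """Hypothetical helper: compare what is on disk with `git ls-files`."""
    # -z gives NUL-separated, unquoted paths, which is safe for odd filenames.
    out = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    tracked_paths = out.split("\0")
    # Skip the .git directory itself; it is never a tracked path.
    entries = [p.name for p in Path(repo_root).iterdir() if p.name != ".git"]
    return split_root_entries(entries, tracked_paths)
```

Calling `tracked, untracked = audit_repo_root()` from the repo root and printing `len(tracked)` and `len(untracked)` should give numbers comparable to the ones quoted above. Note that the untracked list includes anything not tracked at the top level, whether or not it is ignored, so it is an upper bound on the generated artifacts.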
