
Andreas Søgaard edited this page Sep 14, 2021 · 2 revisions

Directory structure

Intentionally verbose proposed directory structure (directories in bold):

  • gnn_reco
    • src
      • gnn_reco
  • data : all classes related to data ingestion and management; this likely excludes preprocessing, which is encapsulated in the models themselves
          • utils.py
          • dataconverter.py : abstract base class, exposing syntax implemented by inheriting classes (see below)
          • sqlite_dataconverter.py
          • i3cols_dataconverter.py
          • dataset.py : main data class(es); abstract base class, exposing syntax implemented by inheriting classes (see below); preprocessing should be packaged with the models themselves
          • sqlite_dataset.py
          • i3cols_dataset.py
          • (...)
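The converter/dataset split above follows one pattern: an abstract base class fixes the syntax, and backend-specific classes (SQLite, i3cols, ...) implement it. A minimal sketch of that pattern, with hypothetical class and method names (the actual interface is not specified here):

```python
from abc import ABC, abstractmethod


class DataConverter(ABC):
    """Sketch of the proposed abstract base class in dataconverter.py.

    Inheriting classes implement the same `convert` syntax for their
    respective backends (e.g. SQLite, i3cols).
    """

    @abstractmethod
    def convert(self, input_files: list, output_path: str) -> None:
        """Convert raw input files into the backend-specific format."""


class SQLiteDataConverter(DataConverter):
    """Hypothetical inheriting class (sqlite_dataconverter.py)."""

    def convert(self, input_files, output_path):
        # Backend-specific ingestion logic would go here.
        print(f"Converting {len(input_files)} file(s) to {output_path}")


converter = SQLiteDataConverter()
converter.convert(["events.i3"], "events.db")
```

Because `DataConverter.convert` is abstract, the base class cannot be instantiated directly, which enforces the common syntax across backends.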
        • models : modular models categorised by detector (read-in), task (read-out), and GNN method.
          • model.py : base model class, exposing implementation-agnostic syntax (e.g. .compile, .fit, .predict, .config, .save, etc.)
          • configs : dict-like configurations parametrising model components, to (1) allow the model architecture to be reconstructed and (2) log configuration differences across training runs for model optimisation. Separate config classes for each model component; e.g.
            • config.py : base class
            • detector.py : inheriting classes
            • task.py
            • gnn.py
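A dict-like config serving both purposes above could look like the following sketch; the field names are purely illustrative, not an agreed schema:

```python
from dataclasses import dataclass, asdict


@dataclass
class Config:
    """Sketch of the dict-like base class in config.py."""

    def as_dict(self) -> dict:
        return asdict(self)


@dataclass
class GNNConfig(Config):
    """Hypothetical gnn.py config parametrising a core GNN component."""
    nb_layers: int = 3
    hidden_dim: int = 128


# (1) A config carries everything needed to reconstruct the component...
config = GNNConfig(nb_layers=4)

# (2) ...and differences with respect to the defaults can be logged
# alongside each training run for model optimisation.
default = GNNConfig()
diff = {k: v for k, v in config.as_dict().items() if default.as_dict()[k] != v}
print(diff)  # {'nb_layers': 4}
```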
          • detector : standard first layer(s) of complete models, reading in data in a detector-appropriate format; e.g.
            • ic86.py
            • deepcore.py
            • upgrade.py
            • km3net.py
            • (...)
          • task : standard last layer(s)/heads/read-outs specific to each task; e.g.
            • classification.py
            • reconstruction.py
            • hit_cleaning.py
            • pretraining.py
            • (...)
          • gnn : standard core GNN models, focus for optimisation; e.g.
            • mpnn.py
            • dynedge.py
            • tgcn.py
            • (...)
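The detector/task/gnn split means a complete model is a chain of three interchangeable components behind the implementation-agnostic syntax of model.py. A framework-agnostic sketch of that composition (the component stand-ins below are placeholders, not real implementations):

```python
class Model:
    """Sketch of the base class in model.py: a detector read-in, a core
    GNN, and a task read-out chained behind one interface (.predict
    shown here; .compile, .fit, .save, etc. would follow the same
    pattern)."""

    def __init__(self, detector, gnn, task):
        self._components = [detector, gnn, task]

    def predict(self, x):
        for component in self._components:
            x = component(x)
        return x


# Hypothetical stand-ins for detector/, gnn/, and task/ components.
ic86 = lambda x: [xi / 500.0 for xi in x]      # detector-specific scaling
dynedge = lambda x: sum(x)                     # core GNN (placeholder)
energy_reconstruction = lambda x: max(x, 0.0)  # task read-out head

model = Model(ic86, dynedge, energy_reconstruction)
print(model.predict([125.0, 250.0]))  # 0.75
```

Swapping `ic86` for a `km3net` read-in, or `energy_reconstruction` for a classification head, leaves the rest of the chain untouched, which is the point of the modular layout.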
        • components : common, tested components from which to build the above models
          • operations.py
          • layers.py
          • losses.py
          • preprocessing.py
          • callbacks.py
          • utils.py
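As an example of the kind of common, tested building block that would live in components/losses.py, here is one hypothetical loss function (the choice of log-cosh is illustrative, not prescribed by this proposal):

```python
import math


def log_cosh(prediction: float, target: float) -> float:
    """Hypothetical components/losses.py entry: log-cosh behaves like
    squared error for small residuals and like absolute error for
    large ones, making it robust to outliers."""
    residual = abs(prediction - target)
    # Numerically stable form: log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2)
    return residual + math.log1p(math.exp(-2.0 * residual)) - math.log(2.0)


print(round(log_cosh(1.0, 1.0), 6))  # 0.0
```

Keeping such functions in components, with unit tests under tests/, lets every model in models reuse them without reimplementation.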
        • benchmarks : scripts for comparing alternative models and similar; e.g.
          • benchmark.py : base class
          • benchmark_data_loaders.py
          • benchmark_ic86_reconstruction_trigger.py
          • (...)
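The benchmark.py base class could share the timing/comparison machinery so that each benchmark script only implements what it measures. A minimal sketch, with hypothetical names:

```python
import time
from abc import ABC, abstractmethod


class Benchmark(ABC):
    """Sketch of the proposed base class in benchmark.py: inheriting
    scripts implement `run` once, and timing logic is shared."""

    @abstractmethod
    def run(self) -> None:
        """The workload being benchmarked."""

    def time(self, repetitions: int = 3) -> float:
        """Return the best wall-clock time over a few repetitions."""
        timings = []
        for _ in range(repetitions):
            start = time.perf_counter()
            self.run()
            timings.append(time.perf_counter() - start)
        return min(timings)


class BenchmarkDataLoaders(Benchmark):
    """Hypothetical benchmark_data_loaders.py implementation."""

    def run(self):
        sum(range(10_000))  # stand-in for iterating a data loader


best = BenchmarkDataLoaders().time()
```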
        • modules : I3Modules wrapping the gnn_reco.Model classes; used for deployment to IceCube reconstruction and analyses
          • gnn_module.py : base class, acting on and modifying raw I3Frames. Internally, the base class should extract and format the data in the frame as a single (non-batched) example from the Dataset class, and convert the result back for writing. Inheriting modules can then work on standard, ML-ready data without worrying about the I3Frame object itself
          • gnn_reconstruction.py
          • gnn_classification.py
          • (...)
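The extract-infer-write-back flow of gnn_module.py can be illustrated in plain Python; note this is only an illustration of the pattern, with a dict standing in for an I3Frame (the real base class would inherit from icetray.I3Module, and all names below are hypothetical):

```python
class GNNModule:
    """Plain-Python illustration of the gnn_module.py pattern. The base
    class extracts a single, non-batched example from the frame, runs
    the wrapped model on the ML-ready data, and writes the result back,
    so inheriting modules never touch the frame object directly."""

    def __init__(self, model, pulse_key: str, output_key: str):
        self._model = model
        self._pulse_key = pulse_key
        self._output_key = output_key

    def __call__(self, frame: dict) -> dict:
        # A dict stands in for an I3Frame in this sketch.
        example = self._extract(frame)        # frame -> ML-ready example
        prediction = self._model(example)     # shared inference step
        frame[self._output_key] = prediction  # convert back for writing
        return frame

    def _extract(self, frame):
        return frame[self._pulse_key]


# Hypothetical reconstruction module with a placeholder model.
module = GNNModule(model=sum, pulse_key="pulses", output_key="gnn_energy")
frame = {"pulses": [1.0, 2.0, 3.0]}
print(module(frame)["gnn_energy"])  # 6.0
```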
        • plots : generic plotting scripts/functions
          • utils.py
          • model_architecture.py
          • energy_resolution.py
          • node_adjacency.py
          • feature_importance.py
          • (...)
        • utils
    • datasets : definitions of standard datasets for 1:1 comparisons between models and with other methods (benchmarking); e.g.
      • lvl7_neutrino_classification.yml
      • lvl7_neutrino_reconstruction.yml
      • lvl7_highenergy_cascade_reconstruction.yml
      • lvl7_doublebang_reconstruction.yml
      • lvl2_muon_classification.yml
      • lvl4_pretraining.yml
      • (...)
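A dataset-definition file such as lvl7_neutrino_classification.yml might look as follows; every field name here is illustrative, not an agreed schema:

```yaml
# Hypothetical dataset definition for 1:1 benchmarking comparisons.
name: lvl7_neutrino_classification
database: /path/to/lvl7.db
pulsemap: SplitInIcePulses
truth_label: pid
selection:
  train: 0.8
  validation: 0.1
  test: 0.1
```

Fixing these definitions in version-controlled files is what makes comparisons between models, and with other methods, reproducible.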
    • resources: Geometry files, etc.
    • tests: unit test scripts; e.g.
      • test_converters.py
      • test_dataloaders.py
      • test_losses.py
      • (...)
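Unit tests under tests/ would follow the usual pytest convention of `test_*` functions with bare assertions; a sketch of a tests/test_losses.py-style file, where the loss under test is a hypothetical stand-in:

```python
# Sketch of a tests/test_losses.py-style unit test (pytest convention).
def mean_squared_error(predictions, targets):
    """Hypothetical loss from components/losses.py."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def test_mse_is_zero_for_perfect_predictions():
    assert mean_squared_error([1.0, 2.0], [1.0, 2.0]) == 0.0


def test_mse_matches_hand_computed_value():
    # Residuals are 1.0 and 1.0, so the mean squared error is 1.0.
    assert mean_squared_error([0.0, 2.0], [1.0, 1.0]) == 1.0


test_mse_is_zero_for_perfect_predictions()
test_mse_matches_hand_computed_value()
```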
    • mlruns : MLflow tracking of training runs, for experimentation and as a model registry
    • pisa : various analyses (e.g. systematics impact, data/MC differences)
    • docs : RST files for use with Sphinx to generate HTML documentation
    • notebooks : this is where the training/testing/validation/optimisation happens, using the above library code; e.g.
      • benchmark_dataloaders.ipynb
      • benchmark_models.ipynb
      • train_methodA.ipynb
      • paper_xyz.ipynb
      • (...)
    • examples : quick-start notebooks for developers and end-users
    • environments : pip/conda/IceTray environment configurations and/or setup scripts
    • docker : for containerisation and reproducibility
    • contrib : student projects, etc.; subject to less stringent reviews
    • LICENSE, README, CONTRIBUTING, setup.py, .gitignore, .github/workflows, .circleci, etc.