# Project Diagnostics Notebook (0_diagnostics)

Run this notebook on JupyterHub to verify your computational environment for the *Applied Groundwater Modelling* course.
It will:

1. Summarize Python & platform info
2. Parse environment YAML (if present) and compile required package list
3. Check imports & report missing packages + versions
4. Optionally help you install missing packages
5. Verify geospatial stack (GeoPandas / Shapely / Fiona / pyproj / Rasterio / Contextily)
6. Test folium, plotly, and matplotlib plotting
7. Detect MODFLOW-2005 executable & run a minimal FloPy model
8. Basic 3D visualization capability check (plotly)
9. (Optional) Memory / performance snapshot (psutil)
10. Produce an aggregated summary at the end

If something fails, scroll up to the first failing diagnostic cell and start troubleshooting there.

## First-Time Run (JupyterHub):

1. If instructors announced an update, run `0_sync_repo.ipynb` first so your repo is up to date.
2. Open `0_diagnostics.ipynb`.
3. Run all cells.
4. Confirm that the final summary reports `'overall_ready': True`.
5. If not ready: copy the summary and the output of the failing section, then contact support.

## Daily Use:

- Do NOT routinely re-run this full notebook; re-run it only if your environment changed or something breaks.
- For repository updates use `0_sync_repo.ipynb`.

---
**Tip:** Re-run the whole diagnostics notebook after fixing issues or after a sync that adds new dependencies.

## 0. (Moved) Repository Sync & Reset

The repository synchronization and cleaning step has been moved to a dedicated notebook: `0_sync_repo.ipynb`.

Run that separate notebook whenever instructors announce an update or if you suspect your local copy is out of date. This diagnostics notebook should normally be run only once at the beginning of the course (or after you change your Python environment) to validate packages, geospatial stack, visualization, and MODFLOW capabilities.

If a sync introduces new dependencies, re-run this diagnostics notebook to confirm readiness.

> Action: Skip directly to the initialization cell below and continue with the environment checks.

In [2]:
# Initialize a global results dictionary to accumulate checks
from __future__ import annotations
diag_results = {
    'python': {},
    'packages': {},
    'geospatial': {},
    'viz': {},
    'modflow': {},
    'system': {}
}
print('Diagnostics result container initialized.')

Diagnostics result container initialized.


### ℹ️ Refactor Note
Core diagnostic logic (environment parsing, imports, geospatial & viz smoke tests, MODFLOW run, 3D check, system snapshot, summary) has been moved into the support module `diagnostics.py` in `SUPPORT_REPO/src/`.

This notebook now only orchestrates those functions, keeping cells concise and easier to maintain. To extend:
- Add a new helper in `diagnostics.py`
- Import & call it in the appropriate section here

Advantages: single source of truth, easier testing, less noisy notebook diffs.


## 1. Python & Platform Information

In [3]:
import sys, platform, os, datetime, shutil
py_info = {
    'python_version': sys.version.replace('\n', ' '),
    'executable': sys.executable,
    'platform': platform.platform(),
    'processor': platform.processor(),
    'python_build': platform.python_build(),
    # isoformat() on an aware UTC datetime already ends in '+00:00'; do not append 'Z' as well
    'datetime_utc': datetime.datetime.now(datetime.UTC).isoformat()
}
diag_results['python'] = py_info
print('Python/platform info collected:')
for k,v in py_info.items():
    print(f'  {k}: {v}')

Python/platform info collected:
  python_version: 3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:19:24) [Clang 18.1.8 ]
  executable: /Users/bea/anaconda3/envs/gw_course_students/bin/python
  platform: macOS-15.6.1-arm64-arm-64bit
  processor: arm
  python_build: ('main', 'Apr 10 2025 22:19:24')
  datetime_utc: 2025-09-07T15:13:03.381630+00:00


## 2. Compile Required Package List
This cell parses `environment_students.yml` (if available) to build the list of packages to check.

In [4]:
from pathlib import Path
from importlib import reload
import sys

# Make sure the support src directory is on sys.path (add it if missing).
support_src = Path('SUPPORT_REPO/src').resolve()
if str(support_src) not in sys.path:
    sys.path.insert(0, str(support_src))

import diagnostics  # type: ignore
reload(diagnostics)

ENV_FILE = 'environment_students.yml'
OPTIONAL_PACKAGES = {
    'python-graphviz', 'pyyaml'
}
EXCLUDE_PACKAGES = {'python', 'pip', 'pre-commit'}

parsed = diagnostics.parse_environment(
    env_file=ENV_FILE,
    optional_packages=OPTIONAL_PACKAGES,
    exclude_packages=EXCLUDE_PACKAGES,
)

pkg_meta = diag_results.setdefault('packages', {})
pkg_meta.update(parsed)

print(f"Total packages to check (from {ENV_FILE} only): {len(parsed['required_list'])}")
print(', '.join(parsed['required_list']))


Total packages to check (from environment_students.yml only): 32
affine, contextily, elevation, flopy, folium, gdal, geopandas, graphviz, ipympl, ipython, ipywidgets, jupyterlab, matplotlib, matplotlib-scalebar, numpy, pandas, plotly, porespy, psutil, pykrige, pyproj, python-graphviz, pyyaml, rasterio, rasterstats, requests, scikit-image, scipy, seaborn, shapely, statsmodels, yaml
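
How `diagnostics.parse_environment` turns dependency entries into the name list above is not shown in this notebook. The following is a minimal sketch of one way to do it, assuming dependency entries are plain conda-style spec strings. The helpers `spec_to_name` and `required_names` and the example specs are hypothetical and are not part of `diagnostics.py`.

```python
import re


# Hypothetical helper: NOT the actual diagnostics.parse_environment implementation.
def spec_to_name(spec: str) -> str:
    """Return the bare, lower-cased package name from a conda/pip spec string."""
    # Drop a channel prefix such as "conda-forge::" if present.
    spec = spec.split('::')[-1]
    # Cut at the first version operator, bracket, or whitespace.
    return re.split(r'[=<>!~\[\s]', spec.strip(), maxsplit=1)[0].lower()


def required_names(specs, exclude=frozenset()):
    """Return sorted, de-duplicated package names with excluded entries removed."""
    names = {spec_to_name(s) for s in specs}
    return sorted(names - set(exclude))


# Illustrative specs only; they are not copied from environment_students.yml.
specs = ['python=3.12', 'numpy>=1.26', 'conda-forge::flopy', 'pip', 'geopandas']
required = required_names(specs, exclude={'python', 'pip', 'pre-commit'})
print(required)
```

The exclusion set mirrors `EXCLUDE_PACKAGES` from the cell above, so interpreter and tooling entries never reach the import check.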


## 3. Import & Version Check
Attempts to import each required package and record version or error.

In [5]:
import diagnostics  # already imported above; safe to re-import
from importlib import reload as _reload
_reload(diagnostics)

packages_info = diag_results['packages']
required_list = packages_info.get('required_list', [])
optional_set = set(packages_info.get('optional_packages', []))

import_results = diagnostics.run_import_checks(required_list, optional_packages=optional_set)
packages_info.update(import_results)

package_status = import_results['status']
missing_essential = import_results['missing_essential']
missing_optional = import_results['missing_optional']

print(f"Packages OK: {sum(1 for v in package_status.values() if v['ok'])}")
print(f"Essential missing: {len(missing_essential)} | Optional missing: {len(missing_optional)}")
if missing_essential:
    print('Missing essential:', ', '.join(missing_essential))
if missing_optional:
    print('Missing optional:', ', '.join(missing_optional))
if not missing_essential and not missing_optional:
    print('All required packages imported successfully.')



Packages OK: 32
Essential missing: 0 | Optional missing: 0
All required packages imported successfully.
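
The import check itself lives in `diagnostics.run_import_checks`, whose code is not shown here. As a rough sketch of the idea, the snippet below tries to import each package and records a version or an error. The function `check_import` and the `IMPORT_NAMES` mapping are assumptions made for illustration. The mapping is needed because some distribution names differ from their import names.

```python
import importlib
from importlib import metadata

# Assumed mapping from distribution name to import name (a few examples only).
IMPORT_NAMES = {'pyyaml': 'yaml', 'scikit-image': 'skimage', 'python-graphviz': 'graphviz'}


# Hypothetical helper: NOT the actual diagnostics.run_import_checks implementation.
def check_import(package: str) -> dict:
    """Try to import a package and report success plus version or error text."""
    module_name = IMPORT_NAMES.get(package, package.replace('-', '_'))
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # broad on purpose: broken installs can raise more than ImportError
        return {'ok': False, 'version': None, 'error': f'{type(exc).__name__}: {exc}'}
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # e.g. standard-library modules carry no distribution metadata
    return {'ok': True, 'version': version, 'error': None}


status = {name: check_import(name) for name in ['json', 'no_such_package_xyz']}
missing = [name for name, result in status.items() if not result['ok']]
print(missing)
```

Each entry carries an `'ok'` flag, matching the way the cell above counts successful imports with `v['ok']`.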


### (Optional) Install Missing Packages 
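The next cell attempts a `pip install` only when the import check above reported packages that failed to import, and it only works if you have permission to install packages in this environment.  
This should not be necessary in your JupyterHub environment.

In [6]:
# Attempt a pip install of packages that failed to import (may not work on restricted JupyterHub).
# Note: names come from the conda environment file and may differ from the PyPI names pip expects.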
if diag_results['packages'].get('status'):
    to_install = [p for p,s in diag_results['packages']['status'].items() if not s['ok']]
    if to_install:
        import sys, subprocess
        print('Attempting pip install for:', to_install)
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', *to_install])
    else:
        print('No missing packages to install.')

No missing packages to install.


## 4. Geospatial Stack Smoke Tests

In [7]:
import diagnostics
geo_checks = diagnostics.geospatial_smoke_test()
diag_results['geospatial'] = geo_checks
print('Geospatial checks:')
for k, v in geo_checks.items():
    print(f'  {k}: {v}')


Geospatial checks:
  geopandas: True
  polygons_count: 2
  union_valid: True
  planar_sum_area_m2: 14639199.86280878
  planar_union_area_m2: 12809299.926796934
  overlap_factor: 0.8750000031995773
  geodesic_union_area_m2: 5877494.507802963
  planar_vs_geodesic_ratio: 2.179381011720437
  fiona: True
  rasterio: True
  contextily: True
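
The derived numbers in this output can be re-checked by hand. The sketch below recomputes the two ratios from the reported areas; the values are consistent with `overlap_factor` being union area over summed area and `planar_vs_geodesic_ratio` being planar union area over geodesic union area. The output does not say which CRS the planar areas use. The last step is therefore only an assumption: if they were measured in Web Mercator, areas would be inflated by roughly 1/cos²(latitude), and a ratio near 2.18 would correspond to a mid-latitude site rather than to an error.

```python
import math

# Values copied from the geospatial check output above.
planar_sum_area = 14639199.86280878
planar_union_area = 12809299.926796934
geodesic_union_area = 5877494.507802963

# Share of the summed polygon area that remains after merging overlaps.
overlap_factor = planar_union_area / planar_sum_area

# How much larger the projected area is than the area on the ellipsoid.
planar_vs_geodesic = planar_union_area / geodesic_union_area

# ASSUMPTION: planar areas come from Web Mercator, where area scales by ~1/cos^2(lat).
implied_latitude_deg = math.degrees(math.acos(math.sqrt(1.0 / planar_vs_geodesic)))

print(f'overlap_factor: {overlap_factor:.6f}')
print(f'planar_vs_geodesic_ratio: {planar_vs_geodesic:.6f}')
print(f'implied latitude (Web Mercator assumption): {implied_latitude_deg:.1f} deg')
```

For real area calculations, use the geodesic value or an equal-area or local projected CRS.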


## 5. Visualization Library Checks (folium, plotly, matplotlib)

In [8]:
import diagnostics
viz_status = diagnostics.viz_smoke_test()
# Persist (retain prior viz keys if any)
diag_results['viz'] = {**diag_results.get('viz', {}), **viz_status}
print('Visualization stack:')
for k,v in viz_status.items():
    if k.startswith('_'):  # skip internal objects
        continue
    print(f'  {k}: {v}')
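# Display folium map if created (a bare expression inside an `if` block is not rendered by Jupyter)
from IPython.display import display
fmap = viz_status.get('_folium_map_object')
if fmap is not None:
    display(fmap)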


Visualization stack:
  folium: ok (map object created)
  plotly: ok (scatter figure created)
  matplotlib: ok (simple plot created)
  matplotlib_backend_used: Agg


## 6. MODFLOW-2005 Executable Detection & Minimal FloPy Model
Attempts to locate a MODFLOW-2005 executable and run a 1-layer steady-state test model.

In [9]:
from pathlib import Path
import diagnostics

AUTO_DOWNLOAD = True
PERSISTENT_WORKSPACE = Path.home() / 'applied_groundwater_modelling_data' / 'diagnostics'
CLEANUP_ON_SUCCESS = False
PERSISTENT_WORKSPACE.mkdir(parents=True, exist_ok=True)

modflow_diag = diagnostics.modflow_minimal_model(
    auto_download=AUTO_DOWNLOAD,
    persistent_workspace=PERSISTENT_WORKSPACE,
    cleanup_on_success=CLEANUP_ON_SUCCESS,
)

diag_results['modflow'] = modflow_diag
modflow_diag

{'executable_found': True,
 'executable_path': '/Users/bea/.local/share/flopy/bin/mf2005',
 'workspace_path': '/Users/bea/applied_groundwater_modelling_data/diagnostics',
 'namefile_exists_after_write': True,
 'model_files_written': ['diagtest.bas',
  'diagtest.dis',
  'diagtest.lpf',
  'diagtest.nam',
  'diagtest.oc',
  'diagtest.pcg'],
 'run_success': True,
 'final_heads': [10.0,
  8.88888931274414,
  7.777777671813965,
  6.666666507720947,
  5.55555534362793,
  4.44444465637207,
  3.3333332538604736,
  2.222222328186035,
  1.1111111640930176,
  0.0],
 'analytical_heads': [10.0,
  8.88888888888889,
  7.777777777777778,
  6.666666666666668,
  5.555555555555555,
  4.444444444444445,
  3.333333333333334,
  2.2222222222222223,
  1.1111111111111116,
  0.0],
 'max_abs_error_linear_solution': 4.2385525134136515e-07,
 'analytical_ok': True}
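
The `max_abs_error_linear_solution` entry compares the simulated heads with a straight-line head profile. The sketch below reproduces that comparison from the reported values. It assumes evenly spaced cells, fixed heads at both ends, and uniform aquifer properties, in which case steady-state heads vary linearly. The tolerance `TOLERANCE` is a hypothetical choice; the threshold used by `diagnostics.py` is not shown in this notebook.

```python
# Heads reported by the MODFLOW run above (copied from the cell output).
final_heads = [10.0, 8.88888931274414, 7.777777671813965, 6.666666507720947,
               5.55555534362793, 4.44444465637207, 3.3333332538604736,
               2.222222328186035, 1.1111111640930176, 0.0]

# Linear interpolation between the two boundary heads for evenly spaced cells.
n = len(final_heads)
h_left, h_right = final_heads[0], final_heads[-1]
analytical = [h_left + (h_right - h_left) * i / (n - 1) for i in range(n)]

# Largest deviation of the simulated heads from the straight line.
max_abs_error = max(abs(sim - ref) for sim, ref in zip(final_heads, analytical))

TOLERANCE = 1e-5  # hypothetical acceptance threshold
print(f'max abs error: {max_abs_error:.2e}, within tolerance: {max_abs_error < TOLERANCE}')
```

The deviation is on the order of 1e-7, which is what single-precision head output would be expected to produce for values of this size.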

## 7. 3D Capability Check (Plotly Surface)

In [10]:
import diagnostics
plotly_3d = diagnostics.plotly_3d_test()
# Store nested under viz
viz = diag_results.setdefault('viz', {})
viz['plotly_3d'] = plotly_3d
plotly_3d

{'success': True}

## 8. System Resource Snapshot (Optional)

In [11]:
import diagnostics
sys_snap = diagnostics.system_snapshot()
diag_results['system'] = sys_snap
sys_snap

{'memory_total_GB': 64.0,
 'memory_available_GB': 20.72,
 'process_memory_MB': 423.5}

## 9. Aggregated Summary
Run this cell last to see a compact readiness report.

In [12]:
from pprint import pprint
import diagnostics

summary = diagnostics.build_summary(diag_results)
diag_results['summary'] = summary
print('=== DIAGNOSTICS SUMMARY ===')
pprint(summary)
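if not summary['overall_ready']:
    # Report the first blocking problem found; optional packages are not treated as blockers here
    modflow_keys = ('modflow_executable_found', 'modflow_run_success', 'modflow_linear_solution_ok')
    if summary.get('missing_essential'):
        print('\nEssential packages missing; install them to be course-ready.')
    elif summary.get('geospatial_errors'):
        print('\nGeospatial errors detected; review Section 4 cell output.')
    elif not all(summary.get(k) for k in modflow_keys):
        print('\nMODFLOW check failed; review Section 6 cell output.')
    elif not summary.get('plotly_3d_success'):
        print('\nPlotly 3D check failed; review Section 7 cell output.')
    else:
        print('\nEnvironment not ready; review the summary above for the failing check.')
else:
    print('\nEnvironment appears READY for the course (optional packages may still be absent).')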


=== DIAGNOSTICS SUMMARY ===
{'geospatial_errors': [],
 'missing_essential': [],
 'missing_optional': [],
 'modflow_executable_found': True,
 'modflow_linear_solution_ok': True,
 'modflow_run_success': True,
 'overall_ready': True,
 'plotly_3d_success': True}

Environment appears READY for the course (optional packages may still be absent).
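
How `diagnostics.build_summary` derives `overall_ready` is not shown in this notebook. One plausible rule, given here purely as an assumption, is that the environment is ready when no essential package is missing, no geospatial error was recorded, and every MODFLOW and 3D check passed. The function `derive_ready` is hypothetical.

```python
# Hypothetical rule: NOT the actual diagnostics.build_summary implementation.
def derive_ready(summary: dict) -> bool:
    """Return True only if nothing essential is missing and every check passed."""
    no_blockers = not summary.get('missing_essential') and not summary.get('geospatial_errors')
    checks = ('modflow_executable_found', 'modflow_run_success',
              'modflow_linear_solution_ok', 'plotly_3d_success')
    return no_blockers and all(summary.get(key, False) for key in checks)


# Values copied from the summary output above (without the overall_ready flag).
summary = {
    'geospatial_errors': [],
    'missing_essential': [],
    'missing_optional': [],
    'modflow_executable_found': True,
    'modflow_linear_solution_ok': True,
    'modflow_run_success': True,
    'plotly_3d_success': True,
}
print(derive_ready(summary))
```

Under this rule, missing optional packages never change the result, which matches the note in the final message that optional packages may still be absent.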
