<a target="_blank" href="https://colab.research.google.com/github/lescai-teaching/data-science-env/blob/main/setup_runtime.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Course environment setup (Google Colab)

Run the install cell once per fresh Colab runtime; installed packages are lost when the runtime is deleted or reset.

This notebook installs the same **system** and **Python** libraries as the provided Dockerfile (as closely as is practical on hosted Colab), while avoiding common Colab dependency conflicts.
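
One way to see whether the runtime currently has conflicting requirements, either before or after running the install cell, is `pip check`. The sketch below wraps it in a small helper. The function name `pip_check` is an assumption made for illustration and is not part of the course materials.

```python
import subprocess
import sys


def pip_check():
    """Run `pip check` with the current interpreter.

    Returns (ok, report): ok is True when pip exits with status 0,
    report is pip's combined text output.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    report = (result.stdout + result.stderr).strip()
    return result.returncode == 0, report


ok, report = pip_check()
print("No conflicts reported" if ok else "pip reported problems:")
print(report)
```

Hosted Colab images can already report some conflicts before anything is installed, so comparing the output before and after the install cell is likely more informative than the output alone.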


In [None]:
%%bash
set -euo pipefail

echo "[1/3] Installing system packages (apt)..."
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends \
  build-essential \
  cm-super \
  dvipng \
  ffmpeg

echo "[2/3] Installing Python packages (pip, Colab-compatible)..."
python -m pip install -U pip

# Keep protobuf below 6 for compatibility with TensorFlow, grpcio-status, and the google-ai-* packages.
python -m pip install -U \
  "protobuf>=5.26.1,<6"

# Install the remaining libraries WITHOUT forcing upgrades of Colab-managed core packages.
# (In particular, do not upgrade pandas.)
python -m pip install \
  altair \
  beautifulsoup4 \
  bokeh \
  bottleneck \
  cloudpickle \
  cython \
  dask \
  dill \
  h5py \
  ipywidgets \
  matplotlib \
  numba \
  numexpr \
  openpyxl \
  patsy \
  tables \
  scikit-image \
  scikit-learn \
  scipy \
  seaborn \
  sqlalchemy \
  statsmodels \
  sympy \
  xlrd

# NOTE: ipympl is installed in the Docker image, but on hosted Colab it currently
# tends to break matplotlib backend validation (module://ipympl.backend_nbagg).
# Colab works best with the default inline backend, so we do not install ipympl here.

echo "[3/3] Warming matplotlib font cache (similar to the Dockerfile)..."
MPLBACKEND=Agg python -c "import matplotlib.pyplot"

echo "Done."

## Sanity check
Run the next cells to confirm imports and show versions.

Note: Colab uses the inline matplotlib backend by default; `ipympl` is intentionally not installed because it can break backend validation on hosted Colab runtimes.
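
The two constraints this notebook relies on (protobuf pinned to `>=5.26.1,<6`, and `ipympl` left uninstalled) can also be checked directly. The following is a minimal sketch using only the standard library. The helper names `version_tuple`, `in_range`, `check_pin`, and `is_installed` are hypothetical, and the version comparison looks only at numeric release components, so it ignores pre-release suffixes such as `rc1`.

```python
import importlib.util
import re
from importlib import metadata


def version_tuple(version):
    """Turn '5.29.3' into (5, 29, 3); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def in_range(version, lower, upper):
    """True if lower <= version < upper, comparing numeric components only."""
    return version_tuple(lower) <= version_tuple(version) < version_tuple(upper)


def check_pin(dist_name, lower, upper):
    """Describe whether an installed distribution satisfies lower <= v < upper."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name}: not installed"
    status = "OK" if in_range(installed, lower, upper) else "OUT OF RANGE"
    return f"{dist_name} {installed}: {status} (wanted >={lower},<{upper})"


def is_installed(module_name):
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print(check_pin("protobuf", "5.26.1", "6"))
print("ipympl installed:", is_installed("ipympl"))
```

If `ipympl` shows up as installed, it was probably pulled in by another package or a manual install, and the backend printed in the matplotlib cell is worth checking.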


In [None]:
import sys
import importlib

pkgs = [
    'altair',
    'bs4',
    'bokeh',
    'bottleneck',
    'cloudpickle',
    'Cython',
    'dask',
    'dill',
    'h5py',
    'ipywidgets',
    'matplotlib',
    'numba',
    'numexpr',
    'openpyxl',
    'pandas',
    'patsy',
    'google.protobuf',
    'tables',
    'skimage',
    'sklearn',
    'scipy',
    'seaborn',
    'sqlalchemy',
    'statsmodels',
    'sympy',
    'xlrd'
]

print('Python:', sys.version)
print('---')

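for name in pkgs:
    try:
        mod = importlib.import_module(name)
    except ImportError as exc:
        # Report the failure and keep going, so one missing package does not hide the rest.
        print(f"{name:15s} IMPORT FAILED: {exc}")
        continue
    # Every package listed here is expected to expose __version__; print a placeholder otherwise.
    ver = getattr(mod, '__version__', 'unknown')
    print(f"{name:15s} {ver}")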


In [None]:
import matplotlib
import matplotlib.pyplot as plt

print('matplotlib backend:', matplotlib.get_backend())

plt.plot([0, 1, 2], [0, 1, 0])
plt.title('Matplotlib works')
plt.show()