[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NU-MSE-LECTURES/465_Computational_Microscopy_2026/blob/dev/Week_01/lectures/lecture_02_environment_setup.ipynb)

# Lecture 2: Setting up Computational Environments
**Date:** January 7, 2026
**Topic:** Environment Setup, Version Control, and Best Practices

## Agenda
1.  The Computational Stack
2.  Version Control with Git and GitHub
3.  Project Structure and Data Organization
4.  Reproducible Research Best Practices

In [None]:
# Colab setup
try:
    import google.colab
    IN_COLAB = True
    print("Running in Google Colab. Installing requirements...")
    !pip install hyperspy ase py4DSTEM
    !git clone https://github.com/NU-MSE-LECTURES/465_Computational_Microscopy_2026.git
    print("Setup complete.")
except ImportError:
    IN_COLAB = False
    print("Not running in Google Colab.")

## 1. The Computational Stack

### Python
Python is widely used in scientific computing because it is readable and has a large ecosystem of numerical, plotting, and data-analysis libraries.

### Conda
Conda is a package manager and environment management system. It helps you create isolated environments for different projects to avoid dependency conflicts.

### Jupyter
Jupyter notebooks (and JupyterLab) provide an interactive environment where you can combine code, text, and visualizations in a single document.

**Key Libraries:**
*   `numpy`: Numerical computing.
*   `matplotlib`: Plotting.
*   `pandas`: Data manipulation.
*   `scipy`: Scientific algorithms.
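
A quick way to confirm that these libraries are available in the active environment is to query their installed versions. The sketch below uses only the standard library (`importlib.metadata`); the helper name `installed_versions` is illustrative and not part of the course materials.

```python
from importlib.metadata import PackageNotFoundError, version

KEY_LIBRARIES = ("numpy", "matplotlib", "pandas", "scipy")


def installed_versions(packages=KEY_LIBRARIES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per library for the currently active environment.
for name, ver in installed_versions().items():
    print(f"{name:<12} {ver or 'not installed'}")
```

Running this in two different Conda environments is a simple way to see that each environment carries its own set of packages.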

## 2. Version Control with Git and GitHub

### Why Version Control?
*   **History:** Track changes over time.
*   **Collaboration:** Work with others without overwriting each other's work.
*   **Backup:** Keep a remote copy of your code.

### Basic Git Workflow
1.  `git status`: Check the state of your working directory.
2.  `git add <file>`: Stage changes for commit.
3.  `git commit -m "message"`: Save changes to the history.
4.  `git push`: Upload changes to the remote repository (GitHub).
5.  `git pull`: Download changes from the remote repository. Run this before `git push` when collaborators may have pushed new commits.
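
The local part of this workflow (status, add, commit) can be sketched from Python by calling the `git` command line through `subprocess`. This is a minimal illustration that assumes `git` is installed and on the `PATH`; the helper names `git` and `demo_local_workflow`, the file name, and the demo identity are all made up for the example. In a notebook you would more likely type the commands in a terminal.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def demo_local_workflow():
    """Walk through status -> add -> commit in a throwaway repository."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git("init", cwd=repo)
        (repo / "analysis.py").write_text("print('hello')\n")

        # Untracked files are listed with a leading "??".
        before = git("status", "--short", cwd=repo)
        git("add", "analysis.py", cwd=repo)
        # Newly staged files are listed with a leading "A".
        staged = git("status", "--short", cwd=repo)
        # The identity is passed inline so the demo does not rely on global config.
        git("-c", "user.name=Demo", "-c", "user.email=demo@example.com",
            "commit", "-m", "Add analysis script", cwd=repo)
        # After the commit the working directory is clean, so the output is empty.
        after = git("status", "--short", cwd=repo)
        # `git push` and `git pull` need a configured remote, so they are omitted.
        return before, staged, after


if shutil.which("git"):
    print(demo_local_workflow())
```

The temporary directory keeps the demonstration away from any real repository.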

## 3. Project Structure and Data Organization

A well-organized project is easier to maintain and share.

### Recommended Structure
```
project_name/
├── data/
│   ├── raw/            # Immutable raw data
│   └── processed/      # Cleaned/analyzed data
├── notebooks/          # Jupyter notebooks for exploration
├── src/                # Reusable source code
├── figures/            # Generated plots and images
├── environment.yml     # Environment definition
└── README.md           # Project documentation
```
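
The layout above can be created in one step with `pathlib`. This is a sketch rather than a course-provided tool: the function name `create_project` is hypothetical, and the two files are created empty as placeholders.

```python
from pathlib import Path

PROJECT_DIRS = ("data/raw", "data/processed", "notebooks", "src", "figures")
PROJECT_FILES = ("environment.yml", "README.md")


def create_project(root):
    """Create the recommended folders and placeholder files under `root`."""
    root = Path(root)
    for rel in PROJECT_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for rel in PROJECT_FILES:
        # touch() creates an empty file and does not overwrite existing content.
        (root / rel).touch(exist_ok=True)
    return root
```

Calling `create_project("project_name")` from the folder where the project should live builds the skeleton; calling it again on an existing project leaves the files already there intact.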

### Naming Conventions
*   Use descriptive names that say what a file contains (for example, a sample, technique, and date rather than `image1`).
*   Avoid spaces (use underscores or hyphens).
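
As a small illustration of the second convention, the helper below replaces whitespace in a file name with underscores. The name `safe_name` and the sample file name are hypothetical.

```python
import re


def safe_name(name):
    """Return `name` with each run of whitespace replaced by one underscore."""
    return re.sub(r"\s+", "_", name.strip())


print(safe_name("sample 3 bright field.tif"))  # sample_3_bright_field.tif
```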

## 4. Reproducible Research Best Practices

1.  **Script Everything:** Avoid manual steps in GUIs, which leave no record of what was done.
2.  **Document Dependencies:** Always include an `environment.yml` or `requirements.txt`.
3.  **Use Relative Paths:** Code should run regardless of where the project folder is located.
4.  **Seed Random Numbers:** Ensure stochastic processes are repeatable.
    ```python
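    import numpy as np
    np.random.seed(42)               # seeds the legacy global state used by np.random.* functions
    rng = np.random.default_rng(42)  # newer NumPy code passes a seeded Generator around instead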
    ```
5.  **Keep Raw Data Raw:** Never overwrite your original data files.
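
Items 3 and 5 can be sketched together with `pathlib`. The example assumes the project layout from Section 3, with `environment.yml` at the project root and data under `data/raw/` and `data/processed/`; the function names `find_project_root` and `write_processed` are illustrative only.

```python
from pathlib import Path


def find_project_root(start=None, marker="environment.yml"):
    """Walk upward from `start` until a folder containing `marker` is found."""
    start = Path(start or Path.cwd()).resolve()
    for folder in (start, *start.parents):
        if (folder / marker).exists():
            return folder
    raise FileNotFoundError(f"No {marker!r} found at or above {start}")


def write_processed(raw_file, root, transform):
    """Apply `transform` to the raw text and save the result under data/processed/."""
    raw_file = Path(raw_file)
    out_dir = Path(root) / "data" / "processed"
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / raw_file.name
    # The raw file is only read; the result goes to a separate folder.
    target.write_text(transform(raw_file.read_text()))
    return target
```

Because every path is built from the detected root, the same code runs whether it is called from `notebooks/`, `src/`, or a copy of the project on another machine.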