# Codespaces + JupyterLab GIS Research Workflow

This document explains, step by step, how to set up and work with a **reproducible, professional GIS research environment** using **GitHub Codespaces, Python, and JupyterLab**. It is written to be didactic.

## 1. Why This Setup?
This repository is designed to be:
* Reproducible
* Traceable
* Professional (research + industry ready)
* Platform-independent (Windows locally, Linux in the cloud)

**GitHub Codespaces** provides:
* A Linux environment (industry standard)
* Persistent storage
* Native Git integration
* Jupyter notebook support

This avoids the limitations of Binder (ephemeral sessions, no persistence) while keeping setup friction low.

## 2. Environment Overview
**Key characteristics:**
* **Operating system**: Linux (via Codespaces)
* **Python version**: 3.12
* **Package manager**: pip
* **Interface**: VS Code + Jupyter notebooks (optional JupyterLab UI)

No local installation is required on the user's Windows machine.

## 3. Dependency Management
### 3.1 Philosophy
Dependencies should be:
* Minimal
* Explicit
* Justified by actual usage

> A package is added only **when a notebook cell requires it.**

This avoids dependency bloat and version conflicts.
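
One way to check this rule in practice is to list the modules that notebooks actually import and compare them by eye with `requirements.txt`. The sketch below is only an illustration: the function names and the `notebooks` default directory are assumptions, and import names do not always match package names (for example, `sklearn` is installed as `scikit-learn`).

```python
import ast
import json
from pathlib import Path


def imported_modules(notebook: dict) -> set[str]:
    """Return the top-level module names imported in a notebook's code cells."""
    modules: set[str] = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files usually store source as a list of lines
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Cells containing IPython magics (%matplotlib, !pip ...) are skipped
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


def modules_in_directory(notebook_dir: str = "notebooks") -> set[str]:
    """Collect imported modules across every .ipynb file in a directory."""
    found: set[str] = set()
    for path in sorted(Path(notebook_dir).glob("*.ipynb")):
        with path.open(encoding="utf-8") as handle:
            found |= imported_modules(json.load(handle))
    return found
```

Calling `sorted(modules_in_directory())` from the repository root gives a list to compare against `requirements.txt`. Standard-library modules such as `sys` will also appear and can be ignored.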

### 3.2 `requirements.txt`
The following packages define the **baseline GIS + data analysis environment** for this project:

```txt
numpy
pandas
geopandas
shapely
pyproj
fiona
matplotlib
seaborn
plotly
folium
contextily
geopy
osmnx
rasterio
scipy
```

These cover:
- Vector GIS
- Raster GIS
- Spatial analysis
- Network analysis
- Static and interactive visualization

### 3.3 Installing Dependencies

From the Codespaces terminal:

```bash
pip install -r requirements.txt
```

This installs all dependencies **inside the Codespace**, without affecting the local machine.

## 4. Environment Sanity Check

### 4.1 Purpose

Every serious research repository should include a **sanity check notebook** that:

- Confirms the environment works
- Documents versions
- Saves debugging time for future users


### 4.2 Environment Check Notebook

Create the file:

```
notebooks/00_environment_check.ipynb
```

#### Cell 1 — Package and version check



```python
# Environment sanity check
# This cell verifies that the Python environment and core libraries are working

import sys
import numpy as np
import pandas as pd
import geopandas as gpd
import rasterio
import matplotlib.pyplot as plt

print("Python version:", sys.version)
print("NumPy version:", np.__version__)
print("Pandas version:", pd.__version__)
print("GeoPandas version:", gpd.__version__)
```

Output:

```
Python version: 3.12.1 (main, Nov 27 2025, 10:47:52) [GCC 13.3.0]
NumPy version: 2.4.1
Pandas version: 3.0.0
GeoPandas version: 1.1.2
```
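
Cell 1 reports versions for four libraries. To document every entry in `requirements.txt` without importing each package, a sketch like the following could be added as an extra cell. It relies on the standard-library `importlib.metadata`; the `PACKAGES` list and the `installed_versions` helper are illustrative names, not part of the original notebook.

```python
from importlib import metadata

# Package names as listed in requirements.txt (assumed to match the installed distributions)
PACKAGES = [
    "numpy", "pandas", "geopandas", "shapely", "pyproj", "fiona",
    "matplotlib", "seaborn", "plotly", "folium", "contextily",
    "geopy", "osmnx", "rasterio", "scipy",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:<12} {version or 'NOT INSTALLED'}")
```

A missing package shows up as `NOT INSTALLED` instead of raising an error, so the whole list is always reported.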


#### Cell 2 — Minimal GIS test

```python
# Minimal GeoPandas test
# Create a simple GeoDataFrame with an explicit CRS
# (uses `gpd` imported in Cell 1)

from shapely.geometry import Point

cities = gpd.GeoDataFrame(
    {
        "city": ["Madrid", "Barcelona"],
        "geometry": [
            Point(-3.7038, 40.4168),  # (longitude, latitude)
            Point(2.1734, 41.3851),
        ],
    },
    crs="EPSG:4326",
)

cities
```

Output:

|   | city | geometry |
|---|------|----------|
| 0 | Madrid | POINT (-3.7038 40.4168) |
| 1 | Barcelona | POINT (2.1734 41.3851) |


If this renders correctly, the core vector GIS stack (GeoPandas, Shapely, and pyproj for the CRS) is operational.

## 5. Version Control Workflow

### 5.1 Why Version Control Matters

Git is used to:

- Track analytical decisions
- Ensure reproducibility
- Provide a clear research narrative
- Maintain a clean, ordered workflow over time
- Make the research process and its evolution explicit and auditable

Version control is not only about backup. In a research context, it functions as a chronological log of reasoning, experimentation, and refinement, allowing anyone (including the author months later) to understand how and why results were obtained.

A well-maintained Git history makes the progression from raw ideas to structured analysis transparent.

### 5.2 Standard Commit Flow

After any meaningful change:
```bash
# Check changes
git status

# Stage changes
git add .

# Commit with a descriptive message
git commit -m "Describe clearly what was done"

# Push to GitHub
git push
```

Each commit should correspond to:
- A completed step
- A clear analytical decision
- A reproducible improvement

## 6. Notebook Structure and Research Narrative

### 6.1 Naming Convention

Notebooks are ordered numerically to reflect the research pipeline:

```
00_...  → environment checks / setup
01_...  → data ingestion and cleaning
02_...  → exploratory and core analysis
03_...  → indicator construction
04_...  → integration and synthesis
```

This creates a **linear scientific narrative**.
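
The convention can be checked automatically. The sketch below assumes that notebook names consist of a two-digit prefix, an underscore, and a lowercase snake_case description, as in `00_environment_check.ipynb`. That pattern and the helper name are assumptions made for illustration.

```python
import re

# Assumed pattern: two digits, underscore, lowercase snake_case name, .ipynb extension
NOTEBOOK_PATTERN = re.compile(r"^\d{2}_[a-z0-9_]+\.ipynb$")


def check_notebook_names(filenames):
    """Split filenames into (valid names in pipeline order, invalid names)."""
    valid, invalid = [], []
    for name in filenames:
        if NOTEBOOK_PATTERN.match(name):
            valid.append(name)
        else:
            invalid.append(name)
    return sorted(valid), invalid
```

For example, passing `[p.name for p in Path("notebooks").glob("*.ipynb")]` (with `from pathlib import Path`) returns the notebooks in execution order alongside any files, such as `Untitled.ipynb`, that break the convention.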


### 6.2 Research Header Template

Every notebook should begin with a documentation cell:

```python
"""
Notebook: 02_population_variation_analysis.ipynb
Author: Juan Zotes
Date: 2025-12-17

Purpose:
    Analyze multi-interval population change (1996–2024)

Notes:
    - Uses cleaned INE census data
    - Focuses on k = 1–28 year intervals
"""
```


This transforms notebooks into **research documents**, not scratchpads.
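
A small check can confirm that a notebook starts with such a header. The sketch below treats `Notebook:`, `Author:`, `Date:`, and `Purpose:` as required and `Notes:` as optional. That split is an assumption, as is the helper name.

```python
# Fields assumed to be mandatory in the first cell of every notebook
REQUIRED_FIELDS = ("Notebook:", "Author:", "Date:", "Purpose:")


def missing_header_fields(notebook: dict) -> list[str]:
    """Return the required header fields absent from the notebook's first cell."""
    cells = notebook.get("cells", [])
    if not cells:
        return list(REQUIRED_FIELDS)
    source = cells[0].get("source", "")
    if isinstance(source, list):  # .ipynb files usually store source as a list of lines
        source = "".join(source)
    return [field for field in REQUIRED_FIELDS if field not in source]
```

The function takes the parsed JSON of an `.ipynb` file (for example from `json.load`) and returns an empty list when the header is complete.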


## 7. Data Management Strategy

### 7.1 Principle

> Code is versioned. Data is documented.

Raw and processed datasets are **not committed** to the repository.


### 7.2 Local Data Directory

Create a local directory:

```
data/
```

Ensure it is ignored by Git by adding this line to `.gitignore`:

```
data/
```

Data sources and acquisition steps are documented in:

```
docs/data_sources.md
```
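
The directory and its ignore rule can also be set up from Python, for instance in the environment check notebook. This is a minimal sketch: the helper name is hypothetical, and it only recognizes an exact `data/` line, not equivalent patterns such as `/data/`.

```python
from pathlib import Path


def ensure_data_ignored(repo_root: str = ".", data_dir: str = "data") -> bool:
    """Create the data directory and add it to .gitignore if it is not listed.

    Returns True when .gitignore was modified, False when the entry already existed.
    """
    root = Path(repo_root)
    (root / data_dir).mkdir(exist_ok=True)

    entry = f"{data_dir}/"
    gitignore = root / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if entry in (line.strip() for line in existing.splitlines()):
        return False

    # Keep the new entry on its own line even if the file lacks a trailing newline
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(existing + separator + entry + "\n", encoding="utf-8")
    return True
```

Running `ensure_data_ignored()` from the repository root is safe to repeat: a second call leaves `.gitignore` unchanged.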


## 8. Learning Path (Incremental)

### Phase 1 — Core GIS and Data Analysis

- GeoPandas: CRS, dissolve, overlay, spatial joins
- Pandas: groupby, rolling windows, percent change

### Phase 2 — Visualization

- Static maps (matplotlib + contextily)
- Interactive maps (folium)

### Phase 3 — Raster and Remote Sensing Foundations

- rasterio: reading, masking, statistics
- Transition later to xarray / rioxarray if needed


## 9. Dependency Growth Rule

> A dependency is added **only when a concrete analytical need appears**.

Examples:

- Machine learning → `scikit-learn`
- Time-series remote sensing → `xarray`, `rioxarray`
- Google Earth Engine → handled outside this environment

## 10. Final Note

This setup is designed to:

- Scale from coursework to advanced research projects (including PhD-level work)
- Bridge academia and industry
- Produce work that is readable, reviewable, and reusable

Nothing here is rushed. Everything is intentional.

*Last updated: 2026-01-27*