# Assignment 1: Project Setup and Infrastructure

**Course:** 4DT907 - Project in Data Intensive Systems  
**Date:** 2026-01-21  
**Team:** Samuel, Nasser, Emil, Jesper


## 1. Team Setup

### 1.1 Team Roles

| Name   | Role                       | Responsibilities                           |
|--------|----------------------------|--------------------------------------------|
| Samuel | Scrum Master (DevOps)      | CI/CD, Infrastructure, Sprint management   |
| Nasser | Developer (Full-stack)     | Frontend & Backend development             |
| Emil   | Developer (Tester)         | Testing, Quality assurance                 |
| Jesper | Developer (Data Scientist) | ML models, Data analysis, Jupyter notebooks |

### 1.2 Sprint Organization

**Sprint Duration:** 1 week  
**Sprint Schedule:**
- Stand-ups: Every other day via Slack or in person on campus
- Sprint Review: Weekly during lectures
- Sprint Retrospective: Mondays for approximately 1 hour

**Communication:**
- Primary: Slack workspace
- Code collaboration: GitHub pull requests with reviews
- Documentation: Jupyter notebooks and GitHub Wiki


## 2. Software and Service Installation

### 2.1 Development Tools Installed

#### Version Control
- **Git:** Version control system
- **GitHub:** Repository hosting and collaboration
  - Repository: https://github.com/sb224sc-HT22-VT27/4dt907

#### Communication
- **Slack:** Team communication platform
  - Workspace configured with channels for different project areas

#### Programming Languages & Frameworks
- **Python 3.12.x:** Backend and ML development
- **Node.js 20.x:** Frontend development
- **FastAPI:** Backend web framework
- **React:** Frontend framework

#### ML Tools
- **MLflow:** Experiment tracking and model registry
- **Jupyter Notebook:** Interactive development and reporting

#### Infrastructure
- **Docker:** Containerization
- **Docker Compose:** Multi-container orchestration
- **GitHub Actions:** CI/CD pipeline


## 3. Python Installation and Hello World

### 3.1 Python Version Verification


```python
import sys

print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")
```
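Printing the version shows what is installed, but the notebook can also flag an interpreter that does not match the Python 3.12.x line listed in Section 2.1. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of the project code.

```python
import sys

# Target interpreter line, taken from Section 2.1 (Python 3.12.x).
REQUIRED = (3, 12)


def is_supported_python(version_info=sys.version_info, required=REQUIRED):
    """Return True when major.minor of version_info equals the required pair."""
    return tuple(version_info[:2]) == tuple(required)


# Warn instead of raising so the rest of the notebook can still run.
if not is_supported_python():
    print(
        f"Warning: expected Python {REQUIRED[0]}.{REQUIRED[1]}.x, "
        f"found {sys.version.split()[0]}"
    )
```

Comparing only the major and minor components keeps the check valid across 3.12 patch releases.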

### 3.2 Hello World Program


```python
def hello_world():
    """Simple hello world function for 4dt907 project."""
    return "Hello World from 4dt907!"


# Execute the hello world function
message = hello_world()
print(message)
```

### 3.3 Hello World Test


```python
def test_hello_world():
    """Test the hello world function."""
    result = hello_world()
    assert isinstance(result, str), "Result should be a string"
    assert "Hello World" in result, "Result should contain 'Hello World'"
    assert "4dt907" in result, "Result should contain '4dt907'"
    print("✓ All tests passed!")


# Run the test
test_hello_world()
```

## 4. Web Frontend/Backend System

### 4.1 Architecture Overview

Our system follows a client-server architecture:

```
┌─────────────┐      HTTP/REST      ┌─────────────┐      ┌─────────────┐
│   React     │ ◄─────────────────► │   FastAPI   │ ◄────┤   MLflow    │
│  Frontend   │                     │   Backend   │      │   Server    │
│  (Port 3000)│                     │  (Port 8000)│      │  (Port 5000)│
└─────────────┘                     └─────────────┘      └─────────────┘
```

### 4.2 Backend (FastAPI)

**Location:** `src/backend/`

**Key Features:**
- RESTful API endpoints
- Auto-generated OpenAPI documentation
- CORS middleware for frontend integration
- Health check endpoint
- MLflow integration for ML experiments

**Main Endpoints:**
- `GET /` - Hello World
- `GET /health` - Health check
- `GET /api/v1/hello/{name}` - Personalized greeting
- `GET /docs` - Interactive API documentation
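The endpoints listed above could be wired up roughly as follows. This is a sketch under stated assumptions, not the contents of `src/backend/app/main.py`: the JSON response shapes, the helper names, the `create_app` factory, and the allowed CORS origin (the frontend port from Section 4.1) are all illustrative.

```python
def root_payload():
    """Body for GET / (response shape is an assumption)."""
    return {"message": "Hello World from 4dt907!"}


def health_payload():
    """Body for GET /health (response shape is an assumption)."""
    return {"status": "healthy"}


def greeting_payload(name: str):
    """Body for GET /api/v1/hello/{name} (response shape is an assumption)."""
    return {"message": f"Hello, {name}!"}


def create_app():
    """Build the FastAPI application.

    FastAPI is imported here rather than at module level so the payload
    helpers above remain importable without the dependency installed.
    """
    from fastapi import FastAPI
    from fastapi.middleware.cors import CORSMiddleware

    app = FastAPI(title="4dt907 API")

    # Allow the React dev server to call the API from the browser.
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["http://localhost:3000"],
        allow_methods=["*"],
        allow_headers=["*"],
    )

    @app.get("/")
    def root():
        return root_payload()

    @app.get("/health")
    def health():
        return health_payload()

    @app.get("/api/v1/hello/{name}")
    def hello(name: str):
        return greeting_payload(name)

    return app
```

No route is declared for `/docs`: FastAPI serves the interactive documentation at that path by default, generated from the route definitions.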

### 4.3 Frontend (React)

**Location:** `src/frontend/`

**Key Features:**
- React 18 with modern hooks
- Vite for fast development and building
- Interactive UI demonstrating backend communication
- Responsive design

### 4.4 Testing Status

**Backend Tests:**
- ✓ Root endpoint test
- ✓ Health check test
- ✓ Personalized greeting test

**Test Framework:** pytest with FastAPI TestClient


## 5. CI/CD Pipeline

### 5.1 GitHub Actions Workflow

**Location:** `.github/workflows/ci.yml`

**Pipeline Jobs:**

1. **Python CI:**
   - Python 3.12 setup
   - Dependency installation
   - Linting with flake8
   - Testing with pytest

2. **Frontend CI:**
   - Node.js 20 setup
   - npm install and caching
   - ESLint for code quality
   - Production build verification

3. **Docker Build:**
   - Backend Docker image build test
   - Frontend Docker image build test

**Triggers:**
- Push to `main` or `develop` branches
- Pull requests to `main` branch

### 5.2 Branching Strategy

- **main:** Production-ready code (protected)
- **develop:** Integration branch
- **feature/*:** Feature development
- **bugfix/*:** Bug fixes
- **hotfix/*:** Emergency production fixes
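A naming scheme like this can be checked mechanically, for example from a local Git hook. The sketch below is illustrative only: the function name `is_valid_branch` and the allowed characters after the prefix (lowercase letters, digits, dots, underscores, hyphens) are assumptions, since the strategy above only fixes the prefixes.

```python
import re

# Long-lived branches and prefixes from the branching strategy.
PERMANENT_BRANCHES = {"main", "develop"}
PREFIXES = ("feature", "bugfix", "hotfix")

# Assumed convention for the part after the slash: it starts with a lowercase
# letter or digit, followed by lowercase letters, digits, '.', '_' or '-'.
_BRANCH_RE = re.compile(rf"^(?:{'|'.join(PREFIXES)})/[a-z0-9][a-z0-9._-]*$")


def is_valid_branch(name: str) -> bool:
    """Return True when the branch name fits the branching strategy."""
    return name in PERMANENT_BRANCHES or bool(_BRANCH_RE.match(name))
```

For instance, `is_valid_branch("feature/mlflow-logging")` is accepted, while `is_valid_branch("experiment/foo")` is rejected because `experiment` is not one of the listed prefixes.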


## 6. MLflow Setup

### 6.1 MLflow Configuration

MLflow is configured in our `docker-compose.yml` as a separate service:

- **UI Access:** http://localhost:5000
- **Backend Store:** SQLite database
- **Artifact Store:** Local file system
- **Integration:** Backend service can log experiments via `MLFLOW_TRACKING_URI`

### 6.2 Example Usage


```python
# Example of how to use MLflow in notebooks
# (This would connect to the MLflow server when running in the Docker environment)

import os

# Configuration: fall back to the local MLflow server when the variable is unset
mlflow_tracking_uri = os.getenv('MLFLOW_TRACKING_URI', 'http://localhost:5000')

print(f"MLflow Tracking URI: {mlflow_tracking_uri}")
print("\nTo use MLflow in your experiments:")
print("1. Import mlflow: import mlflow")
print("2. Set tracking URI: mlflow.set_tracking_uri(mlflow_tracking_uri)")
print("3. Start a run: mlflow.start_run()")
print("4. Log parameters: mlflow.log_param('param_name', value)")
print("5. Log metrics: mlflow.log_metric('metric_name', value)")
print("6. End run: mlflow.end_run()")
```
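The steps printed above can be combined into one helper. This is a minimal sketch, assuming the `mlflow` package is installed and the tracking server is reachable; the helper names and the experiment name `a1-demo` are placeholders, not project code.

```python
import os


def resolve_tracking_uri(default="http://localhost:5000"):
    """Read MLFLOW_TRACKING_URI, falling back to the local server."""
    return os.getenv("MLFLOW_TRACKING_URI", default)


def log_demo_run(params, metrics, experiment="a1-demo"):
    """Log one run with the given parameters and metrics; return its run ID.

    `experiment` is a placeholder name chosen for this sketch.
    """
    # Imported lazily so this cell still loads where mlflow is not installed.
    import mlflow

    mlflow.set_tracking_uri(resolve_tracking_uri())
    mlflow.set_experiment(experiment)

    # The context manager ends the run on exit, so no explicit
    # mlflow.end_run() call is needed.
    with mlflow.start_run() as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
    return run.info.run_id
```

A call such as `log_demo_run({"lr": 0.01}, {"accuracy": 0.9})` would then create a run visible in the MLflow UI; the values here are made up for illustration.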

## 7. Docker and Docker Compose

### 7.1 Docker Compose Services

**Location:** `src/docker-compose.yml`

**Services:**
1. **backend:** FastAPI application (port 8000)
2. **frontend:** React application (port 3000)
3. **mlflow:** MLflow tracking server (port 5000)

### 7.2 Running the System

```bash
# Start all services
cd src
docker-compose up --build

# Access services:
# - Frontend: http://localhost:3000
# - Backend API: http://localhost:8000
# - API Docs: http://localhost:8000/docs
# - MLflow: http://localhost:5000
```
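Once the containers are running, a short script can confirm that each service answers. The sketch below uses only the standard library and the URLs listed above; the function names and the two-second timeout are arbitrary choices for illustration.

```python
import http.client
import urllib.error
import urllib.request

# Service URLs from the list above.
SERVICES = {
    "frontend": "http://localhost:3000",
    "backend": "http://localhost:8000/health",
    "api-docs": "http://localhost:8000/docs",
    "mlflow": "http://localhost:5000",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or malformed reply.
        return False


def check_all(services=SERVICES):
    """Map each service name to whether its URL is currently reachable."""
    return {name: is_up(url) for name, url in services.items()}
```

Calling `check_all()` after `docker-compose up --build` returns a dictionary keyed by service name, which makes it easy to spot a container that failed to start.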


## 8. Project Structure

### 8.1 Repository Organization

```
4dt907/
├── .github/                    # GitHub configuration
│   ├── workflows/              # CI/CD workflows
│   │   └── ci.yml              # Main CI pipeline
│   ├── ISSUE_TEMPLATE/         # Issue templates
│   ├── PULL_REQUEST_TEMPLATE.md
│   └── CODEOWNERS
├── src/                        # Source code
│   ├── backend/                # FastAPI backend
│   │   ├── app/
│   │   │   ├── main.py         # Application entry
│   │   │   ├── api/            # API routes
│   │   │   ├── models/         # Data models
│   │   │   └── services/       # Business logic
│   │   ├── tests/              # Backend tests
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   ├── frontend/               # React frontend
│   │   ├── src/
│   │   ├── public/
│   │   ├── Dockerfile
│   │   └── package.json
│   ├── ml-research/            # Jupyter notebooks
│   │   └── a1/                 # Assignment 1
│   └── docker-compose.yml
├── .gitignore
├── README.md
├── CONTRIBUTING.md
└── LICENSE
```

### 8.2 Code Quality Standards

- **Commit Messages:** Follow [Conventional Commits](https://www.conventionalcommits.org/)
- **Python:** PEP 8 style guide, enforced by flake8
- **JavaScript:** ESLint with React recommended rules
- **Code Review:** All changes require PR review before merging
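The commit message convention can be checked with a small script, for example as part of the pre-commit hooks considered in Section 9.3. This is a sketch only: the list of accepted types is an assumption (the Conventional Commits specification itself only defines `feat` and `fix`), and the scope pattern is deliberately strict.

```python
import re

# Assumed set of accepted types; only feat and fix come from the spec itself.
TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

HEADER_RE = re.compile(
    rf"^(?:{'|'.join(TYPES)})"  # commit type
    r"(?:\([a-z0-9_-]+\))?"     # optional scope in parentheses
    r"!?"                       # optional breaking-change marker
    r": \S.*$"                  # colon, one space, non-empty description
)


def is_conventional(header: str) -> bool:
    """Return True when the first line of a commit message fits the pattern."""
    return bool(HEADER_RE.match(header))
```

Under these assumptions, `feat(backend): add health endpoint` passes, while `Added stuff` is rejected because it has no type prefix.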


## 9. Summary and Next Steps

### 9.1 Completed Tasks ✓

1. ✓ Team setup and role definition
2. ✓ Sprint organization established
3. ✓ Slack workspace configured
4. ✓ Git repository setup with proper structure
5. ✓ Python 3.12 installation and hello world program
6. ✓ Hello world tests implemented
7. ✓ FastAPI backend with REST endpoints
8. ✓ React frontend application
9. ✓ CI/CD pipeline with GitHub Actions
10. ✓ Docker and Docker Compose configuration
11. ✓ MLflow integration
12. ✓ Jupyter notebook setup for reporting
13. ✓ Code checked into Git

### 9.2 Next Sprint Goals

1. Implement core ML functionality
2. Expand API endpoints for ML model serving
3. Create data ingestion pipeline
4. Setup CSCloud infrastructure access
5. Implement comprehensive testing strategy
6. Setup secrets management for production

### 9.3 Lessons Learned

- **Good:** Clear role definition helped in task distribution
- **Good:** Docker Compose simplifies local development environment
- **Improvement:** Need to establish data storage strategy early
- **Improvement:** Consider adding pre-commit hooks for code quality


## 10. References

- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [React Documentation](https://react.dev/)
- [MLflow Documentation](https://mlflow.org/docs/latest/index.html)
- [Docker Documentation](https://docs.docker.com/)
- [GitHub Actions Documentation](https://docs.github.com/en/actions)
- [Conventional Commits](https://www.conventionalcommits.org/)
