# Version Control and Deployment Basics

**Scenario Update:** Leadership wants to ensure your aircraft IoT analytics work is saved, shared with teammates, and can be deployed reliably. Before the week is up, you need to understand version control basics and how to package your work for deployment.

## What You'll Learn

✅ What version control is and why it matters  
✅ How to create and use Git folders in Databricks  
✅ How to save and sync your work  
✅ Introduction to Databricks Asset Bundles (UI-based)  

**Time to Complete:** 30 minutes

---


## 1. What is Version Control?

### The Problem

Imagine you're working on your aircraft sensor analysis:
- You make changes to your notebook
- Something breaks
- You can't remember what worked before
- Your teammate also made changes and now there are conflicts
- You want to go back to yesterday's version but it's gone

### The Solution: Version Control

**Version control** is like a "time machine" for your code. It:
- **Tracks every change** you make to files
- **Saves snapshots** of your work over time
- **Allows collaboration** without overwriting each other's work
- **Lets you undo** changes when something breaks
- **Shows who changed what** and when

### What is Git?

**Git** is the most popular version control system. Think of it as:
- A **history book** for your code
- A **save point system** like in video games
- A **collaboration tool** that merges everyone's work

### Key Git Concepts

| Term | What It Means |
|------|---------------|
| **Repository (Repo)** | A folder that Git tracks - contains all your files and their history |
| **Commit** | A snapshot of your work at a point in time (like hitting "Save" with a description) |
| **Push** | Upload your commits from your working copy (your Databricks Git folder or your computer) to the shared repository |
| **Pull** | Download new commits from the shared repository into your working copy |
| **Branch** | A separate version of your code where you can experiment |

### Why This Matters for Your Aircraft IoT Project

- Your notebooks, pipelines, and dashboards can be **versioned**
- Multiple team members can work on the **same project**
- You can **track changes** to your data transformations
- If a pipeline breaks, you can **roll back** to a working version
- Leadership can **audit** who made what changes

---


## 2. Git Folders in Databricks

Databricks integrates with Git through **Git folders** (formerly called "Repos"). This lets you connect your Databricks workspace to a Git repository.

### Step 1: Set Up Your Git Provider Connection

First, you'll need a Git repository from a provider like:
- **Azure DevOps** (your organization's Git provider)
- GitHub
- GitLab
- Bitbucket

**Note:** Your organization uses Azure DevOps, so you'll work with Azure Repos.

### Step 2: Create a Git Folder in Databricks

**From the Databricks Workspace:**

1. In the left sidebar, click **"Workspace"**
2. Navigate to where you want to create the Git folder (e.g., your user folder)
3. Click the **"⋮" menu** (three dots) or **"Create"** button
4. Select **"Git folder"** (or **"Repo"** in some versions)
5. Fill in the details:
   ```
   Git repository URL: https://dev.azure.com/your-org/your-project/_git/aircraft-iot
   Git provider: Azure DevOps
   Branch: main
   ```
6. Click **"Create"**

**What Just Happened?**
- Databricks **cloned** the repository from Azure DevOps
- All files from that repo are now visible in your workspace
- Any changes you make here can be synced back to Azure DevOps

### Understanding the Git Folder Interface

When you open a Git folder, you'll see:
- **Files and folders** from your repository
- A **branch dropdown** showing which branch you're on (usually `main`)
- A **Git status icon** showing if you have uncommitted changes
- **Pull** and **Push** buttons to sync with the remote repository

---


## 3. Working with Git Folders: The Basic Workflow

Let's walk through a typical workflow using your aircraft IoT project.

### Scenario: Updating Your Temperature Analysis Notebook

#### Step 1: Pull Latest Changes

Before you start working, **always pull** to get the latest changes from your team:

1. Open your Git folder
2. Look for the **Git status indicator** in the top bar
3. Click the **"Pull"** button (or the refresh/sync icon)
4. Databricks downloads any changes your teammates made

**Why?** If a teammate updated the same notebook, you'll get their latest version before making your changes.

#### Step 2: Make Your Changes

Now work normally:
- Edit notebooks
- Add new files
- Modify transformations
- Test your code

**Example:** You update your aircraft engine temperature analysis to use Celsius instead of Fahrenheit.

#### Step 3: Check What Changed

1. Click the **Git status icon** (usually shows a number of changed files)
2. You'll see a list of **modified files** with an "M" indicator
3. You'll see **new files** with an "A" (added) indicator
4. Click any file to see a **diff** (what changed)

**What's a diff?**
- Lines in **red** (with a `-`) were deleted
- Lines in **green** (with a `+`) were added
- Unchanged lines appear in gray
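
To see the same `-` / `+` convention outside the Databricks UI, here is a minimal sketch using Python's standard `difflib` module. The two code snippets being compared are made-up stand-ins for a notebook cell before and after the Celsius change; they are not taken from a real notebook.

```python
import difflib

# Hypothetical contents of one notebook cell at the last commit...
before = [
    "reading = sensor_value",
    "unit = 'Fahrenheit'",
    "report(reading, unit)",
]
# ...and the same cell after the edit (still uncommitted)
after = [
    "reading = (sensor_value - 32) * 5 / 9",
    "unit = 'Celsius'",
    "report(reading, unit)",
]

# unified_diff yields header lines, then "-" (deleted), "+" (added),
# and " " (unchanged context) lines
diff_lines = list(
    difflib.unified_diff(
        before,
        after,
        fromfile="last commit",
        tofile="working copy",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

The lines starting with `-` correspond to what the Git panel shows in red, and the lines starting with `+` to what it shows in green.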

#### Step 4: Commit Your Changes

A **commit** is like taking a snapshot with a description:

1. In the Git panel, click **"Commit"**
2. Review the files that will be included
3. Write a **commit message** describing what you changed:
   ```
   Good: "Convert engine temperature to Celsius for international standards"
   Bad: "changes" or "update"
   ```
4. Click **"Commit & Push"** (or just "Commit" if you want to push later)

**Best Practices for Commit Messages:**
- Be specific: "Add anomaly detection for altitude sensors"
- Explain why: "Fix typo in column name causing join failure"
- Use the imperative mood: "Update" not "Updated"
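
If your team wants to apply these guidelines consistently, a small check can help. The sketch below is a hypothetical helper: the function name, the list of vague messages, and the three-word minimum are illustrative assumptions, not rules enforced by Git or Databricks.

```python
# Hypothetical list of messages considered too vague to be useful
VAGUE_MESSAGES = {"changes", "update", "fix", "fixed stuff", "wip"}


def check_commit_message(message: str, min_words: int = 3) -> list[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    text = message.strip()
    words = text.split()
    if text.lower() in VAGUE_MESSAGES:
        problems.append("message is too vague")
    if len(words) < min_words:
        problems.append(f"message has fewer than {min_words} words")
    # Rough heuristic only: a first word ending in "ed" is probably past tense
    if words and words[0].lower().endswith("ed"):
        problems.append("start with a verb like 'Update' rather than 'Updated'")
    return problems


good = "Convert engine temperature to Celsius for international standards"
print(check_commit_message(good))       # []
print(check_commit_message("changes"))  # reports two problems
```

A check like this is only a nudge; a message can pass it and still fail to explain why the change was made.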

#### Step 5: Push Your Changes

If you only committed (didn't "Commit & Push"), you need to **push**:

1. Click the **"Push"** button
2. Your changes upload to Azure DevOps
3. Now your teammates can pull and see your changes

**What Happens When You Push?**
- Your commits go to the **remote repository** (Azure DevOps)
- The change history is preserved
- Other team members can see your updates

---


## 4. Common Git Scenarios

### Scenario A: Someone Else Changed the Same File

**What happens:**
- You try to push your changes
- Git rejects the push because the remote branch has commits you don't have yet
- When you pull, Git reports a **conflict**: someone else modified the same lines you did

**How to handle it:**
1. **Pull** first to get their changes
2. Databricks will show you the **conflict**
3. You'll see:
   ```python
   <<<<<<< HEAD (your changes)
   temperature_threshold = 150
   =======
   temperature_threshold = 140  
   >>>>>>> main (their changes)
   ```
4. **Manually decide** which to keep (or combine them)
5. Remove the conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`)
6. **Commit** the resolved version
7. **Push** the merge
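
Leftover conflict markers are easy to miss in a long file. As a rough sketch, the hypothetical helper below scans source text for lines that start with one of the three markers; the function name and sample text are assumptions for illustration.

```python
import re

# A marker is exactly seven "<", "=", or ">" characters at the start of a line,
# followed by whitespace or the end of the line
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7})(\s|$)")


def find_conflict_markers(source: str) -> list[int]:
    """Return 1-based line numbers that still contain a conflict marker."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]


unresolved = """\
<<<<<<< HEAD
temperature_threshold = 150
=======
temperature_threshold = 140
>>>>>>> main
"""
resolved = "temperature_threshold = 140\n"

print(find_conflict_markers(unresolved))  # [1, 3, 5]
print(find_conflict_markers(resolved))    # []
```

An empty result means no markers remain, so the file is ready to commit.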

### Scenario B: You Want to Experiment Without Breaking Things

**Use branches:**

1. In your Git folder, click the **branch dropdown**
2. Type a new branch name: `feature/new-anomaly-detection`
3. Click **"Create branch"**
4. You're now on a **separate copy** of the code
5. Make experimental changes
6. Commit and push to your branch
7. If it works, **merge** it back to `main` later (usually through a Pull Request in Azure DevOps)

**Why branches?**
- `main` branch stays stable
- You can try new ideas safely
- Multiple people can work on different features

### Scenario C: You Need to Undo Changes

**Before committing:**
- Click the file in the Git panel
- Click **"Discard changes"** to revert to the last commit

**After committing:**
- View the commit history (in Azure DevOps or Git panel)
- Revert to a previous commit
- Or create a new commit that undoes the changes

---


## 5. Introduction to Databricks Asset Bundles

Now that your code is in Git, how do you **deploy** it? How do you move your work from development to production?

### What are Databricks Asset Bundles?

**Asset Bundles** let you package and deploy your Databricks resources:
- Notebooks
- Jobs
- Pipelines
- Dashboards

Think of it as creating a **deployment package** for your aircraft IoT project.

### Creating a Bundle in the Databricks UI

#### Step 1: Access the Bundle Feature

1. Open your **Git folder** in Databricks
2. In the Git folder menu, look for **"Create Asset Bundle"** (or similar option)
3. Or, in the workspace, go to **"Create" → "Bundle"**

#### Step 2: Initialize Your Bundle

When creating a bundle, you'll choose a template:

**Templates:**
- **Empty project** - Start from scratch
- **Default Python** - Basic Python project structure
- **Job bundle** - For packaging Databricks jobs
- **Pipeline bundle** - For packaging Delta Live Tables pipelines

**For your aircraft IoT project**, select **"Job bundle"** since you have scheduled jobs.

#### Step 3: Define Your Bundle Resources

The bundle editor will help you specify:

**1. Bundle Name**
```
Name: aircraft-iot-analytics
```

**2. Include Your Notebooks**
- Click **"Add resource"**
- Select **"Notebook"**
- Choose your notebooks:
  - `temperature_analysis.ipynb`
  - `sensor_anomaly_detection.ipynb`
  - `engine_performance_dashboard.ipynb`

**3. Include Your Jobs**
- Click **"Add resource"**
- Select **"Job"**
- Choose existing jobs or define new ones:
  ```
  Job: Daily Sensor Data Processing
  Notebook: temperature_analysis.ipynb
  Schedule: Daily at 6:00 AM
  Cluster: Small (2 workers)
  ```

**4. Include Pipelines (Optional)**
- If you built Delta Live Tables pipelines
- Add them to the bundle
- Example: `aircraft_sensor_pipeline`
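
Behind the UI, a bundle is described by a configuration file (`databricks.yml`). The sketch below expresses a comparable structure as a Python dictionary so you can see how the name, job, notebook, and schedule fit together. It assumes the usual `bundle` / `resources` layout of that file; the resource key `daily_sensor_processing`, the task key, the relative notebook path, and the `UTC` timezone are illustrative assumptions, and the cluster settings are left out.

```python
import json

# Sketch only: a Python view of a bundle definition, not a file Databricks reads
bundle_config = {
    "bundle": {"name": "aircraft-iot-analytics"},
    "resources": {
        "jobs": {
            # Hypothetical resource key for the job
            "daily_sensor_processing": {
                "name": "Daily Sensor Data Processing",
                "tasks": [
                    {
                        "task_key": "temperature_analysis",
                        "notebook_task": {
                            # Assumed path relative to the bundle root
                            "notebook_path": "./temperature_analysis.ipynb"
                        },
                    }
                ],
                "schedule": {
                    # Quartz cron fields: second minute hour day-of-month month day-of-week
                    "quartz_cron_expression": "0 0 6 * * ?",
                    "timezone_id": "UTC",
                },
            }
        }
    },
}

print(json.dumps(bundle_config, indent=2))
```

Check the key names against the Asset Bundles documentation before relying on them; the point here is the shape of the definition, with the job referring to a notebook and carrying its own schedule.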

#### Step 4: Preview Your Bundle

The UI will show you:
- All included resources
- Dependencies between resources
- What will be deployed

You'll see something like:
```
📦 aircraft-iot-analytics
├── 📓 Notebooks
│   ├── temperature_analysis.ipynb
│   └── sensor_anomaly_detection.ipynb
├── 🔄 Jobs
│   └── Daily Sensor Data Processing
└── 🚰 Pipelines
    └── aircraft_sensor_pipeline
```

#### Step 5: Deploy Your Bundle

Once your bundle is defined:

1. Click **"Validate"** to check for errors
2. Review any warnings or missing configurations
3. Click **"Deploy"**
4. Choose your target:
   - **Development** - Your personal workspace
   - **Production** - The production workspace
5. Databricks will:
   - Create/update all resources
   - Set up jobs and schedules
   - Deploy your notebooks
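
Choosing a target can be pictured as a lookup in a `targets` section of the bundle definition. The sketch below is a simplified illustration, not how Databricks implements it: the `select_target` function is hypothetical and the workspace hosts are placeholders.

```python
from typing import Optional

# Hypothetical targets section; the hosts are placeholders, not real workspaces
targets = {
    "dev": {
        "mode": "development",
        "default": True,
        "workspace": {"host": "https://dev-workspace.example.com"},
    },
    "prod": {
        "mode": "production",
        "workspace": {"host": "https://prod-workspace.example.com"},
    },
}


def select_target(targets: dict, name: Optional[str] = None) -> tuple[str, dict]:
    """Pick a target by name, or fall back to the one flagged as default."""
    if name is not None:
        return name, targets[name]
    for key, settings in targets.items():
        if settings.get("default"):
            return key, settings
    raise ValueError("no target name given and no default target defined")


print(select_target(targets)[0])          # dev
print(select_target(targets, "prod")[0])  # prod
```

Making development the default means a deployment with no explicit target never lands in production by accident.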

### What Happens During Deployment?

Databricks Asset Bundles:
1. **Reads** your bundle definition
2. **Validates** all resources exist and are configured correctly
3. **Deploys** notebooks to the target workspace
4. **Creates or updates** jobs with the correct schedules
5. **Sets up** pipelines and dependencies
6. **Confirms** deployment success

### Viewing Deployed Resources

After deployment:
- Go to **"Workflows"** to see your deployed jobs
- Jobs will have a tag like `bundle:aircraft-iot-analytics`
- You can run, schedule, or modify them
- Notebooks are deployed to the target location

---


## 6. Practical Exercise: Your Aircraft IoT Project

Let's apply what you learned to your actual aircraft sensor project:

### Exercise 1: Create a Git Folder

**Task:** Set up version control for your aircraft IoT work

1. Create a new repository in Azure DevOps (or use an existing one)
2. In Databricks, create a Git folder connected to that repository
3. Move your existing notebooks into the Git folder
4. Make your first commit: "Initial commit: Aircraft IoT analytics notebooks"
5. Push to Azure DevOps

**Verify:** Check Azure DevOps to see your notebooks in the repository.

### Exercise 2: Make and Sync Changes

**Task:** Practice the Git workflow

1. Open one of your aircraft sensor notebooks
2. Add a comment at the top:
   ```python
   # Aircraft Temperature Analysis
   # Purpose: Monitor engine and cabin temperatures for anomalies
   # Last updated: [Today's date]
   ```
3. Save the notebook
4. Check the Git status - you should see the file marked as modified
5. View the diff to see your changes
6. Commit with message: "Add documentation header to temperature notebook"
7. Push to Azure DevOps

**Verify:** Check the commit history in Azure DevOps.

### Exercise 3: Create Your First Bundle

**Task:** Package your work for deployment

1. In your Git folder, create a new Asset Bundle
2. Name it: `aircraft-monitoring-v1`
3. Add your main analysis notebook(s)
4. Add the job you created earlier for daily sensor processing
5. Validate the bundle
6. Deploy to your development environment

**Verify:** 
- Go to "Workflows" and find your deployed job
- Check that it has the bundle tag
- Try running the job

---


## 7. Best Practices for Your Team

As you finish your week-long sprint on the aircraft IoT project, keep these practices in mind:

### Git Best Practices

✅ **Do:**
- Pull before you start working each day
- Commit often with clear messages
- Push at least once a day (or when you finish a feature)
- Use branches for experimental work
- Review diffs before committing

❌ **Don't:**
- Work for days without committing
- Write vague commit messages ("fixed stuff")
- Commit broken or untested code to `main`
- Force push (overwrites others' work)
- Commit sensitive data (passwords, API keys)
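
To catch hard-coded credentials before they reach a commit, you can scan a notebook's source first. The sketch below is a deliberately simple, hypothetical check: the single pattern and the sample text are illustrative, and dedicated secret scanners cover far more cases.

```python
import re

# Illustrative pattern: a credential-like name assigned to a quoted literal
SECRET_PATTERNS = [
    re.compile(
        r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
]


def find_possible_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that look like hard-coded credentials."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Made-up notebook source with one fake credential on line 2
notebook_source = '''\
storage_account = "aircraftiot"
api_key = "abc123-not-a-real-key"
threshold = 150
'''

print(find_possible_secrets(notebook_source))
```

Anything the check flags should be removed from the notebook before you commit, because once a secret is pushed it stays in the repository history.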

### Bundle Best Practices

✅ **Do:**
- Validate before deploying
- Test in development before production
- Document what each bundle contains
- Version your bundles (`v1`, `v2`, etc.)
- Keep bundles focused (one project per bundle)

❌ **Don't:**
- Deploy directly to production without testing
- Include experimental or test notebooks
- Bundle unrelated projects together
- Forget to update bundle configs when you change notebooks

### Collaboration Tips

**When working with your team:**
1. **Communicate** - Let others know what files you're working on
2. **Pull often** - Stay synced with team changes
3. **Small commits** - Easier to review and merge
4. **Document** - Add comments explaining complex logic
5. **Test** - Run your notebooks before committing

---


## Summary

You've learned the fundamentals of version control and deployment:

✅ **Version Control Basics**
   - What Git is and why it's essential
   - How to track changes to your work
   - Collaborating with teammates

✅ **Git Folders in Databricks**
   - Creating Git folders connected to Azure DevOps
   - Pull, commit, and push workflow
   - Handling conflicts and branches

✅ **Asset Bundles**
   - Packaging notebooks, jobs, and pipelines
   - Deploying via the Databricks UI
   - Managing deployments

### You're Now Ready To:

🛫 **Complete Your Week:**
- Your aircraft IoT work is version controlled
- Your notebooks, jobs, and pipelines are packaged
- You can deploy to production confidently
- Your team can collaborate effectively
- Leadership has full audit history

### Next Steps

As you continue working:
1. Establish a Git workflow with your team
2. Create bundles for your production deployments
3. Document your deployment process
4. Set up regular deployments (daily, weekly, etc.)

---


## Try This Out (Optional Extensions)

Want to go deeper? Try these exercises:

### 1. Branch Strategy
- Create a `dev` branch for development work
- Keep `main` for production-ready code
- Practice merging `dev` into `main`

### 2. Bundle Variations
- Create separate bundles for different environments
- `aircraft-iot-dev` with small clusters
- `aircraft-iot-prod` with production settings

### 3. Deployment Schedule
- Set up a weekly deployment cycle
- Monday-Thursday: Development work
- Friday: Deploy to production

### 4. Documentation
- Add a README.md to your Git repository
- Document what each notebook does
- Explain the deployment process

### 5. Team Collaboration
- Practice creating a feature branch
- Make changes on the branch
- Create a Pull Request in Azure DevOps
- Have a teammate review and merge

### 6. Bundle Management
- Create multiple versions of your bundle
- Practice deploying different versions
- Roll back to a previous bundle version

---

**Additional Resources:**
- [Databricks Git Folders](https://docs.databricks.com/repos/)
- [Git Folders Concepts](https://docs.databricks.com/repos/git-folders-concepts)
- [Asset Bundles Overview](https://docs.databricks.com/dev-tools/bundles/)
- [Workspace Bundles Tutorial](https://docs.databricks.com/dev-tools/bundles/workspace-tutorial)

**Congratulations!** You now have the foundation for version control and deployment in Databricks. 🎉
