# Databricks Git Integration (Git Folders)

**Objective:**
In this session, we will learn how to integrate Git repositories with the Azure Databricks workspace using **Databricks Git Folders** (formerly known as Repos). This is the starting point for implementing CI/CD pipelines for your data engineering workloads.

**What we will cover:**
1.  Setting up an **Azure DevOps** Project and Repository.
2.  Configuring Git Credentials in Databricks.
3.  Cloning a remote repository into Databricks.
4.  Performing Git operations (Branching, Commit, Push).
5.  Best practices for storing notebooks (Source format vs. IPYNB).

**Prerequisites:**
*   Basic knowledge of Git concepts (Clone, Commit, Push, Branch).
*   An Azure DevOps account (or GitHub/GitLab).

## Step 1: Azure DevOps Environment Setup

Before working in Databricks, we need a remote repository to store our code.

1.  **Log in to Azure DevOps:**
    *   Go to [dev.azure.com](https://dev.azure.com).
    *   If you don't have an organization, create one.

2.  **Create a New Project:**
    *   Click **"New Project"**.
    *   Name: `Self` (or your preferred name).
    *   Visibility: Private.

3.  **Initialize a Repository:**
    *   Navigate to **Repos** on the left sidebar.
    *   Select **"New Repository"**.
    *   Name: `adb_cicd`.
    *   **Important:** Keep **Add a README** checked and add a `.gitignore` file. Search for the "Visual Studio" template so common temporary files are excluded automatically.
    *   Click **Create**.

## Step 2: Connect Databricks to Azure DevOps

To allow Databricks to talk to Azure DevOps, we need to set up authentication.

1.  **Generate Git Credentials (in Azure DevOps):**
    *   Go to your new Repo.
    *   Click the **Clone** button (top right).
    *   Click **"Generate Git Credentials"**.
    *   **Copy the generated password (token)**, and note the username shown next to it.

2.  **Link Account (in Databricks):**
    *   Go to your Databricks Workspace.
    *   Click your **Profile Icon** (top right) -> **Settings**.
    *   Go to **Linked Accounts**.
    *   Under **Git Integration**, click **"Add Git Credential"**.
    *   **Provider:** Azure DevOps Services (Personal Access Token).
    *   **Username:** Your email or the username generated in the previous step.
    *   **Token:** Paste the password/token you copied.
    *   Click **Save**.
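If you prefer to script this step (for example, when setting up a workspace for automation), the same credential can be registered through the Databricks Git Credentials REST API (`POST /api/2.0/git-credentials`). The sketch below only *builds* the request and sends nothing. The helper name `build_git_credential_request` and the environment variables `DATABRICKS_HOST`, `DATABRICKS_TOKEN`, `ADO_USERNAME`, and `ADO_PAT` are hypothetical choices for illustration, and the defaults are placeholders.

```python
import json
import os
import urllib.request


def build_git_credential_request(host: str, databricks_token: str,
                                 git_username: str, git_pat: str) -> urllib.request.Request:
    """Build (but do not send) a request that registers an Azure DevOps PAT
    with the Databricks Git Credentials API. Hypothetical helper name."""
    payload = {
        "git_provider": "azureDevOpsServices",  # provider id for Azure DevOps Services
        "git_username": git_username,
        "personal_access_token": git_pat,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {databricks_token}",  # Databricks token, not the Git PAT
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical environment variables; the placeholder defaults keep the sketch runnable.
req = build_git_credential_request(
    host=os.environ.get("DATABRICKS_HOST", "https://adb-1234567890123456.7.azuredatabricks.net"),
    databricks_token=os.environ.get("DATABRICKS_TOKEN", "<databricks-token>"),
    git_username=os.environ.get("ADO_USERNAME", "you@example.com"),
    git_pat=os.environ.get("ADO_PAT", "<azure-devops-pat>"),
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

Keeping request construction separate from sending makes the payload easy to inspect before any token leaves your machine.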

## Step 3: Create a Databricks Git Folder

Now we clone the remote repository into our Databricks Workspace.

1.  **Copy Repo URL:**
    *   Back in Azure DevOps, click **Clone** and copy the **HTTPS URL**.
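    *   *Example:* `https://dev.azure.com/your_org/EaseWithData/_git/adb_cicd` (general form: `https://dev.azure.com/<organization>/<project>/_git/<repository>`; in this example the project is named `EaseWithData`).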

2.  **Create Repo in Databricks:**
    *   In Databricks, go to **Workspace** -> **Repos**, or open your user folder under **Workspace**.
    *   Click **Add** -> **Repo** (in the newer Git folders UI: **Create** -> **Git folder**).
    *   **Git repository URL:** Paste the URL copied above.
    *   **Git provider:** Azure DevOps Services (should auto-detect).
    *   **Repo name:** `adb_cicd` (default).
    *   Click **Create Repo** (labeled **Create Git folder** in the newer UI).

*You should now see your README.md and .gitignore files inside Databricks.*
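The same clone can be scripted with the Databricks Repos REST API (`POST /api/2.0/repos`), whose body takes the clone `url`, the `provider`, and the workspace `path`. The sketch below, assuming the URL format shown in this step, splits the copied Azure DevOps URL into its parts (dropping any `user@` prefix that Azure DevOps sometimes includes) and builds that request body. The helper names and the workspace path `/Repos/you@example.com/adb_cicd` are hypothetical; adjust the path to where your Git folders live.

```python
import json
from urllib.parse import unquote, urlparse


def parse_ado_clone_url(url: str) -> dict:
    """Split https://dev.azure.com/<organization>/<project>/_git/<repository>.
    Hypothetical helper name."""
    parsed = urlparse(url)
    if parsed.hostname != "dev.azure.com":
        raise ValueError(f"not a dev.azure.com URL: {url}")
    parts = [unquote(p) for p in parsed.path.strip("/").split("/")]
    if len(parts) != 4 or parts[2] != "_git":
        raise ValueError(f"unexpected Azure DevOps path: {parsed.path}")
    organization, project, _, repository = parts
    # Rebuild the URL without any embedded user name (e.g. https://org@dev.azure.com/...)
    clean_url = f"https://dev.azure.com{parsed.path}"
    return {"organization": organization, "project": project,
            "repository": repository, "url": clean_url}


def build_create_repo_payload(clone_url: str, workspace_path: str) -> dict:
    """Request body for POST /api/2.0/repos (Databricks Repos API)."""
    info = parse_ado_clone_url(clone_url)
    return {
        "url": info["url"],
        "provider": "azureDevOpsServices",
        "path": workspace_path,
    }


payload = build_create_repo_payload(
    "https://dev.azure.com/your_org/EaseWithData/_git/adb_cicd",
    "/Repos/you@example.com/adb_cicd",  # hypothetical workspace path
)
print(json.dumps(payload, indent=2))
```

Validating the URL before calling the API gives a clearer error than a failed clone when someone pastes a browser URL instead of the clone URL.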

## Step 4: Development Workflow & Branching

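It is best practice not to work directly on the `main` branch. Make changes on a feature branch instead, and merge them into `main` through a Pull Request once they have been reviewed.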

1.  **Create a Branch:**
    *   Click on the branch name (currently **main**) in the Git Folder UI.
    *   Click **"Create Branch"**.
    *   Name: `feature/demo`.
    *   Click **Create**.

2.  **Create Code Assets:**
    *   Create a new folder named `notebooks`.
    *   Inside, create a new notebook named `demo_repo_notebook`.

3.  **File Format Best Practice:**
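    *   Depending on your workspace settings, new notebooks may be saved as `.ipynb` (standard Jupyter) files.
    *   **Recommended:** Save notebooks in **Source** format (e.g., `.py` for Python) by checking the default notebook format in your settings. Note that **File** -> **Export** only downloads a copy; it does not change the format committed to Git.
    *   *Why?* A Source file is plain code, so a Pull Request diff shows only the lines you changed. An IPYNB file is JSON that can also carry cell outputs and metadata, which makes diffs noisier and harder to review.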
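To make the diffing argument concrete, the sketch below applies the same one-line edit (`range(10)` to `range(20)`) to a notebook stored two ways and counts the changed lines with `difflib`. It is a simplified illustration, not the exact files Databricks writes: the Source version only approximates the real format (which starts with a `# Databricks notebook source` header), and the IPYNB version is a minimal Jupyter-style JSON document whose `execution_count` changes when the cell is re-run. The helper names are hypothetical.

```python
import difflib
import json

# Two versions of the same one-cell notebook: the only code edit is range(10) -> range(20).
old_code = "df = spark.range(10)\ndisplay(df)"
new_code = "df = spark.range(20)\ndisplay(df)"


def as_source(code: str) -> str:
    """Approximate Databricks Source format: a header comment plus plain code."""
    return "# Databricks notebook source\n" + code + "\n"


def as_ipynb(code: str, run_count: int) -> str:
    """Minimal Jupyter-style notebook: JSON with the code split into a list of
    strings, plus an execution count that changes every time the cell is re-run."""
    notebook = {
        "cells": [{
            "cell_type": "code",
            "execution_count": run_count,
            "metadata": {},
            "outputs": [],
            "source": code.splitlines(keepends=True),
        }],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    return json.dumps(notebook, indent=1) + "\n"


def changed_lines(a: str, b: str) -> list:
    """Added/removed lines from a unified diff, excluding the file headers."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]


src_diff = changed_lines(as_source(old_code), as_source(new_code))
nb_diff = changed_lines(as_ipynb(old_code, 1), as_ipynb(new_code, 2))
print("Source diff:", src_diff)
print("IPYNB diff:", nb_diff)
```

In the Source version the diff contains only the edited code line; in the JSON version the reviewer also sees metadata churn, and stored cell outputs would add more.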

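```python
# Let's add some sample code to our new notebook to test the sync.
# This code represents a simple transformation or data generation task.
# Note: `spark` and `display` are predefined in Databricks notebooks.

import pyspark.sql.functions as F

# Create a simple DataFrame (ids 0-9) and derive a second column from it
df = spark.range(10).withColumn("id_squared", F.col("id") * F.col("id"))

# Display the data
display(df)

print("Code executed successfully. Ready to commit.")
```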

## Step 5: Commit and Push

Now that we have made changes, let's push them to the remote repository.

1.  **Review Changes:**
    *   Click the **Git** icon (branch name) in the left sidebar or top menu.
    *   You will see your new file listed under **Changes**.

2.  **Commit:**
    *   Select the file(s) you want to stage.
    *   Enter a **Commit Message**: e.g., "Added demo notebook".
    *   Click **Commit & Push**.

3.  **Verify:**
    *   Go back to **Azure DevOps**.
    *   Refresh the page and make sure you are viewing the `feature/demo` branch.
    *   You should see the `notebooks` folder containing your notebook (a `.py` file if you saved it in Source format).
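The same check can be automated, for example in a later CI step, with the Azure DevOps REST API "Items - List" endpoint, which lists the contents of a folder on a given branch. The sketch below only builds the request; it assumes `api-version=7.0`, the organization/project/repository names from the example URL in Step 3, and a PAT passed via HTTP Basic authentication with an empty user name. The helper name and the placeholder PAT are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_list_items_request(organization: str, project: str, repository: str,
                             branch: str, folder: str, pat: str) -> urllib.request.Request:
    """Build (but do not send) an Azure DevOps 'Items - List' request for one
    folder on one branch. Hypothetical helper name."""
    query = urllib.parse.urlencode({
        "scopePath": folder,
        "recursionLevel": "OneLevel",            # list only the folder's direct children
        "versionDescriptor.version": branch,
        "versionDescriptor.versionType": "branch",
        "api-version": "7.0",
    })
    base = "https://dev.azure.com/{}/{}/_apis/git/repositories/{}/items".format(
        *(urllib.parse.quote(part, safe="") for part in (organization, project, repository))
    )
    # PAT sent as HTTP Basic auth with an empty user name: base64(":" + PAT)
    token = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(f"{base}?{query}",
                                  headers={"Authorization": f"Basic {token}"})


req = build_list_items_request("your_org", "EaseWithData", "adb_cicd",
                               branch="feature/demo", folder="/notebooks",
                               pat="<azure-devops-pat>")
print(req.full_url)
# with urllib.request.urlopen(req) as resp:  # uncomment to call the API
#     print(resp.read().decode())
```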

## Summary

We have successfully:
1.  Created a repository in Azure DevOps.
2.  Linked Databricks to Azure DevOps using Git Credentials.
3.  Cloned the repo into a Databricks Git Folder.
4.  Created a feature branch, wrote code, and pushed changes back to the remote repo.

**Next Steps:**
In the next session, we will look at the **Databricks CLI** to manage our workspace programmatically.