# 🚀 Databricks Getting Started - Essential Demos Installation

Welcome to Databricks! This notebook will help you get started by installing and exploring the most important demos that showcase key platform capabilities.

## What You'll Install

This notebook installs 7 essential **dbdemos** that every new Databricks user should explore:

* **Delta Lake** - Learn about the foundation of the Databricks Lakehouse with ACID transactions, time travel, and data versioning
* **Auto Loader** - Master incremental data ingestion from cloud storage with automatic schema evolution
* **Unity Catalog Data Lineage** - Understand data governance, lineage tracking, and metadata management
* **Delta Sharing Airlines** - Explore secure data sharing across organizations without copying data
* **Delta Live Tables Pipeline** - Build reliable data pipelines with declarative ETL
* **AI/BI Portfolio Assistant** - Explore advanced analytics and AI capabilities with dashboards and Genie
* **SQL Warehouse** - Understand data warehousing features including identity columns, primary/foreign keys, and stored procedures

## Prerequisites

* Databricks workspace access
* A cluster you can attach to, with permission to install libraries and create folders, tables, and other resources
* Internet connectivity for package installation

## How to Use This Notebook

1. Click the "Run All" command in the top right of the notebook
2. Each demo installation may take 2-5 minutes
3. After installation, explore the generated folders in your workspace
4. Follow the README files in each demo folder for detailed walkthroughs

---

**⚠️ Note**: Demo installations will create new folders, tables, and resources in your workspace. Make sure you have sufficient permissions and storage quota.
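
Because an installation can fail partway when permissions or quota are missing, it may help to install the demos in a loop that records each outcome instead of stopping at the first error. The following is a minimal sketch of that idea, not part of dbdemos itself: `DEMOS` and `install_all` are hypothetical names introduced here, and the demo names and keyword arguments are copied from the install cells in this notebook. The installer is passed in as a parameter, and the sketch assumes it raises an exception when an installation fails.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# (demo name, keyword arguments) pairs, copied from the install cells in this notebook.
DEMOS: List[Tuple[str, Dict[str, Any]]] = [
    ("delta-lake", {}),
    ("auto-loader", {}),
    ("uc-03-data-lineage", {}),
    ("delta-sharing-airlines", {"overwrite": True}),
    ("pipeline-bike", {"catalog": "main", "schema": "dbdemos_pipeline_bike", "overwrite": True}),
    ("aibi-portfolio-assistant",
     {"catalog": "main", "schema": "dbdemos_aibi_fsi_portfolio_assistant", "overwrite": True}),
    ("sql-warehouse", {}),
]


def install_all(install_fn: Callable[..., Any],
                demos: List[Tuple[str, Dict[str, Any]]] = DEMOS) -> Dict[str, str]:
    """Install each demo in turn and return a {name: status} summary.

    A failure is recorded and the loop moves on to the next demo.
    """
    results: Dict[str, str] = {}
    for name, kwargs in demos:
        started = time.monotonic()
        try:
            install_fn(name, **kwargs)
            results[name] = "ok"
        except Exception as exc:  # keep going so one failure does not block the rest
            results[name] = f"failed: {exc}"
        elapsed = time.monotonic() - started
        print(f"{name}: {results[name]} ({elapsed:.1f}s)")
    return results
```

Once the package is installed, a call such as `summary = install_all(dbdemos.install)` would run all seven installs and leave a per-demo status in `summary`.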

In [0]:
# Install the dbdemos package
# This package provides pre-built demos showcasing Databricks capabilities
%pip install dbdemos --quiet

# Restart Python to ensure the package is properly loaded
dbutils.library.restartPython()

In [0]:
import dbdemos

# Display the dbdemos version and announce the upcoming installs
print(f"dbdemos version: {dbdemos.__version__}")
print("\n📋 Installing 7 essential demos for new Databricks users...")
print("Each installation may take 2-5 minutes depending on demo complexity.")

## 🏗️ Demo 1: Delta Lake - The Foundation of Databricks Lakehouse

**Delta Lake** is the storage layer that brings ACID transactions to Apache Spark and big data workloads.

### What you'll learn:
* ACID transactions for data reliability
* Time travel and data versioning
* Schema enforcement and evolution
* Optimizations like Z-ordering and auto-compaction
* Streaming and batch data processing

### Key Features Demonstrated:
* Creating Delta tables
* Handling schema changes
* Time travel queries
* Merge operations (UPSERT)
* Performance optimizations
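
As a rough preview of two of the features above, the sketch below runs a time travel query and an upsert through `spark.sql`. The helper names and the table names in the usage comments are hypothetical. `spark` is the session a Databricks notebook provides, passed in explicitly so the functions are easy to try elsewhere.

```python
def read_version(spark, table: str, version: int):
    """Time travel: read the table as it was at a given version number."""
    return spark.sql(f"SELECT * FROM {table} VERSION AS OF {int(version)}")


def upsert(spark, target: str, source: str, key: str):
    """MERGE (upsert): update rows whose key matches, insert the rest.

    UPDATE SET * and INSERT * assume the source columns match the target's.
    """
    return spark.sql(
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# In a notebook (table names are placeholders):
# read_version(spark, "main.default.users", 0).show()
# upsert(spark, "main.default.users", "main.default.users_updates", "id")
```

Running `DESCRIBE HISTORY` on a Delta table lists the version numbers that `read_version` can target.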

In [0]:
# Install Delta Lake demo
print("🔄 Installing Delta Lake demo...")
dbdemos.install('delta-lake')
print("✅ Delta Lake demo installed successfully!")
print("📁 Check the 'delta-lake' folder in your workspace for notebooks and datasets.")

## 📥 Demo 2: Auto Loader - Incremental Data Ingestion

**Auto Loader** incrementally and efficiently processes new data files as they arrive in cloud storage.

### What you'll learn:
* Setting up Auto Loader for various file formats
* Automatic schema inference and evolution
* Handling bad records and data quality
* Monitoring and alerting
* Integration with Delta Live Tables

### Key Features Demonstrated:
* Cloud file ingestion (S3, ADLS, GCS)
* Schema evolution handling
* Checkpointing and exactly-once processing
* Error handling and dead letter queues
* Performance optimization techniques
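
For orientation, an Auto Loader stream is a Structured Streaming read that uses the `cloudFiles` source. The sketch below shows one plausible shape for such a read with schema inference and evolution enabled. The function names and every path are hypothetical, and `spark` is the notebook's session passed in as a parameter.

```python
from typing import Dict


def autoloader_options(file_format: str, schema_location: str) -> Dict[str, str]:
    """Options for an Auto Loader stream with schema inference and evolution."""
    return {
        # Format of the incoming files, e.g. "json", "csv", or "parquet".
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # Evolve the schema by adding new columns found in incoming files.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


def read_new_files(spark, source_path: str, file_format: str, schema_location: str):
    """Return a streaming DataFrame over files arriving in source_path."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(file_format, schema_location))
        .load(source_path)
    )


# In a notebook (paths and table name are placeholders):
# df = read_new_files(spark, "/Volumes/main/default/raw/events", "json",
#                     "/Volumes/main/default/raw/_schemas/events")
# (df.writeStream
#    .option("checkpointLocation", "/Volumes/main/default/raw/_checkpoints/events")
#    .trigger(availableNow=True)
#    .toTable("main.default.events_bronze"))
```

The checkpoint location in the usage comment is what lets the stream remember which files it has already processed.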

In [0]:
# Install Auto Loader demo
print("🔄 Installing Auto Loader demo...")
dbdemos.install('auto-loader')
print("✅ Auto Loader demo installed successfully!")
print("📁 Check the 'auto-loader' folder in your workspace for ingestion patterns and examples.")

## 📈 Demo 3: Unity Catalog Data Lineage - Data Governance & Metadata Management

**Unity Catalog** provides centralized governance, security, and lineage tracking across your data estate.

### What you'll learn:
* Data governance and access control
* Automatic lineage tracking and visualization
* Metadata management and discovery
* Cross-workspace data sharing
* Audit logging and compliance

### Key Features Demonstrated:
* Catalog and schema management
* Table and column-level lineage
* Data discovery and search
* Access control policies
* Audit trail and monitoring
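
To make the access control item above a little more concrete, the sketch below issues a Unity Catalog `GRANT` and then lists the grants on a table. The helper names, the table name, and the group name in the usage comments are all hypothetical. `spark` is the notebook's session passed in as a parameter.

```python
def grant_select(spark, table: str, principal: str):
    """Give a user or group read access to a single table."""
    # Backticks around the principal allow names with spaces or special characters.
    return spark.sql(f"GRANT SELECT ON TABLE {table} TO `{principal}`")


def show_grants(spark, table: str):
    """List the privileges currently granted on a table."""
    return spark.sql(f"SHOW GRANTS ON TABLE {table}")


# In a notebook (names are placeholders):
# grant_select(spark, "main.default.customers", "analysts")
# show_grants(spark, "main.default.customers").show()
```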

In [0]:
# Install Unity Catalog Data Lineage demo
print("🔄 Installing Unity Catalog Data Lineage demo...")
dbdemos.install('uc-03-data-lineage')
print("✅ Unity Catalog Data Lineage demo installed successfully!")
print("📁 Check the 'uc-03-data-lineage' folder for governance and lineage examples.")

## ✈️ Demo 4: Delta Sharing Airlines - Secure Cross-Organization Data Sharing

**Delta Sharing** enables secure data sharing across organizations without copying data, using an open protocol.

### What you'll learn:
* Setting up Delta Sharing providers and recipients
* Sharing live data across organizations securely
* Managing data sharing permissions and access
* Working with shared datasets in real-time
* Cross-platform data collaboration

### Key Features Demonstrated:
* Creating and managing data shares
* Recipient access and authentication
* Real-time data access without copying
* Cross-cloud and cross-platform sharing
* Audit and monitoring of shared data access
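
On the recipient side, the open-source `delta-sharing` Python connector addresses a shared table as `<profile-file>#<share>.<schema>.<table>`. The sketch below builds that address and loads the table into pandas. The helper names and the share, schema, and table names in the usage comment are hypothetical, and the connector is assumed to be installed separately with `pip install delta-sharing`.

```python
def shared_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address of a shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame without copying it first."""
    # Imported lazily so the URL helper above works without the package installed.
    import delta_sharing

    return delta_sharing.load_as_pandas(
        shared_table_url(profile_path, share, schema, table)
    )


# With a profile file received from a provider (names are placeholders):
# df = load_shared_table("/tmp/config.share", "my_share", "my_schema", "my_table")
```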

In [0]:
# Install Delta Sharing Airlines demo
print("🔄 Installing Delta Sharing Airlines demo...")
dbdemos.install('delta-sharing-airlines', overwrite=True)
print("✅ Delta Sharing Airlines demo installed successfully!")
print("📁 Check the 'delta-sharing-airlines' folder for data sharing examples and configurations.")

## 🔄 Demo 5: Delta Live Tables Pipeline - Declarative ETL Pipelines

**Delta Live Tables (DLT)** simplifies building reliable, maintainable, and testable data processing pipelines.

### What you'll learn:
* Declarative pipeline development
* Automatic data quality monitoring
* Pipeline orchestration and scheduling
* Error handling and recovery
* Live table and streaming table patterns

### Key Features Demonstrated:
* Creating live tables and streaming tables
* Data quality constraints and expectations
* Pipeline dependency management
* Automatic schema evolution
* Monitoring and observability

In [0]:
# Install Delta Live Tables Pipeline demo with custom catalog and schema
print("🔄 Installing Delta Live Tables Pipeline demo...")
dbdemos.install('pipeline-bike', catalog='main', schema='dbdemos_pipeline_bike', overwrite=True)
print("✅ Delta Live Tables Pipeline demo installed successfully!")
print("📁 Check the 'pipeline-bike' folder for DLT pipeline examples and configurations.")
print("🗄️ Data stored in: main.dbdemos_pipeline_bike")

## 🤖 Demo 6: AI/BI Portfolio Assistant - Advanced Analytics & AI

**Databricks AI/BI** combines AI with business intelligence. This demo applies it to portfolio analytics in capital markets.

### What you'll learn:
* Building AI-powered dashboards
* Using Genie for natural language queries
* Financial data analysis and modeling
* Real-time portfolio monitoring
* Advanced visualization techniques

### Key Features Demonstrated:
* AI-assisted data exploration
* Natural language to SQL with Genie
* Interactive dashboards
* Financial risk modeling
* Automated insights and alerts

**Note**: This demo uses a custom catalog and schema for financial services data.

In [0]:
# Install AI/BI Portfolio Assistant demo with custom catalog and schema
print("🔄 Installing AI/BI Portfolio Assistant demo...")
dbdemos.install('aibi-portfolio-assistant', catalog='main', schema='dbdemos_aibi_fsi_portfolio_assistant', overwrite=True)
print("✅ AI/BI Portfolio Assistant demo installed successfully!")
print("📁 Check the 'aibi-portfolio-assistant' folder for dashboards and AI-powered analytics.")
print("🗄️ Data stored in: main.dbdemos_aibi_fsi_portfolio_assistant")

## 🏢 Demo 7: SQL Warehouse - Enterprise Data Warehousing

The **SQL Warehouse** demo showcases advanced data warehousing capabilities, including modern SQL features and enterprise-grade functionality.

### What you'll learn:
* Identity columns and auto-incrementing keys
* Primary and foreign key constraints
* Stored procedures and functions
* Control flow with loops and conditionals
* Advanced SQL patterns and optimizations

### Key Features Demonstrated:
* Table constraints and relationships
* Stored procedure development
* Transaction management
* Performance tuning
* Data governance and security

In [0]:
# Install SQL Warehouse demo
print("🔄 Installing SQL Warehouse demo...")
dbdemos.install('sql-warehouse')
print("✅ SQL Warehouse demo installed successfully!")
print("📁 Check the 'sql-warehouse' folder for advanced SQL examples and stored procedures.")

# 🎉 Installation Complete!

Congratulations! You've successfully installed 7 essential Databricks demos. Here's what to do next:

## 📂 Explore Your New Demo Folders

Check your workspace for these new folders:
* `delta-lake/` - Delta Lake fundamentals and advanced features
* `auto-loader/` - Data ingestion patterns
* `uc-03-data-lineage/` - Unity Catalog governance and lineage
* `delta-sharing-airlines/` - Secure cross-organization data sharing
* `pipeline-bike/` - Delta Live Tables declarative ETL pipelines
* `aibi-portfolio-assistant/` - AI/BI analytics and dashboards
* `sql-warehouse/` - Advanced SQL warehousing features

## 🚀 Recommended Learning Path

1. **Start with Delta Lake** - Understanding the storage foundation
2. **Explore Auto Loader** - Learn data ingestion patterns
3. **Understand Unity Catalog** - Data governance and lineage
4. **Try Delta Sharing** - Secure data collaboration
5. **Build with Delta Live Tables** - Reliable ETL pipelines
6. **Experiment with SQL Warehouse** - Advanced SQL features
7. **Dive into AI/BI** - Advanced analytics and AI capabilities

## 📚 Additional Resources

* [Databricks Documentation](https://docs.databricks.com/)
* [Databricks Academy](https://www.databricks.com/learn/training/login)
* [Community Forums](https://community.databricks.com/)
* [GitHub Examples](https://github.com/databricks)

## 💡 Tips for Success

* Each demo folder contains a README with detailed instructions
* Start with the `00-` numbered notebooks in each folder
* Don't hesitate to modify and experiment with the code
* Join the Databricks community for support and best practices

---

**Happy Learning! 🎓**