# Welcome to the ❄️ Notebooks in Workspaces Preview! 🌟
Snowflake Notebooks in Workspaces is a fully managed, Jupyter-powered notebook built for end-to-end data science (DS) and machine learning (ML) development on Snowflake data. It includes:
- 🐍 **Familiar Jupyter experience** - Get the full power of a Jupyter Python notebook environment, directly connected to your governed Snowflake data.
- ✏️ **Full IDE features** - Easy editing and file management for maximum productivity.
- 🧠 **Powerful for AI/ML** - Runs in a pre-built container environment optimized for scalable AI/ML development, with fully managed access to CPUs and GPUs, parallel data loading, and distributed training APIs for popular ML packages (e.g. XGBoost, PyTorch, LightGBM).
- ⚙️ **Governed collaboration** - Enable multiple users to collaborate simultaneously with built-in governance and a complete history of changes via Git or shared workspaces.

In this demo notebook, we’ll highlight some features that will make your work easier and more efficient!

## Markdown H1, H2, H3 are auto-converted to ToC
For SQL and Python cells, if the first line is a comment, that comment becomes the "cell name" shown in the minimap.

## Easier package management

## Pre-installed packages
Our Container Runtime comes with ~70 popular data science and machine learning packages. Check them out using `!pip freeze`! 

In [ ]:
# List the pre-installed packages
!pip freeze

## Streamline package setup using `requirements.txt`

Quickly set up your environment by using a `requirements.txt` file across all notebooks in your Workspace! This is a great way to ensure reproducibility and facilitate collaboration & deployment.

- List all dependencies
- Use the exact same versions

You may need to restart the service for the updated package versions to take effect. Open the split button next to "Connected" and select "Restart service".

In [ ]:
!pip install -r ../packages/requirements.txt

# Restart the notebook afterwards so the loaded package versions match those in requirements.txt

In [ ]:
# Check package version matches what's specified in requirements.txt
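import matplotlib  # import the top-level package; `plt` is conventionally reserved for matplotlib.pyplot
print("Restart the notebook if this does not match requirements.txt. matplotlib version:", matplotlib.__version__)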
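
Checking one package by hand does not scale to a long `requirements.txt`. The sketch below compares every exact pin (`name==version`) against what is installed, using the standard-library `importlib.metadata`. The helper name `find_mismatches`, the sample pins, and the stand-in version table are all hypothetical, and the parsing deliberately ignores extras, environment markers, and non-exact specifiers.

```python
from importlib.metadata import PackageNotFoundError, version


def find_mismatches(requirement_lines, get_version=version):
    """Return {package: (pinned, installed)} for each `name==x.y.z` pin that is not satisfied.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if "==" not in line:
            continue  # this sketch only checks exact pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Stand-in for the installed environment so the example runs anywhere (hypothetical versions)
installed = {"matplotlib": "3.9.2", "duckdb": "1.0.0"}
pins = ["matplotlib==3.9.2  # plotting", "duckdb==1.1.0", "requests>=2.0"]
print(find_mismatches(pins, get_version=installed.__getitem__))
# prints: {'duckdb': ('1.1.0', '1.0.0')}
```

To check the real environment, pass the lines read from `../packages/requirements.txt` and leave `get_version` at its default. An empty result means every exact pin is satisfied.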

## Get more packages via EAIs
Ask your admins to provision External Access Integrations (EAIs) that allow secure connections from your Snowflake environment to external endpoints. Turn on EAIs when you create a new service. The EAIs are shared across all notebooks connected to the same service.



In [ ]:
# Quickly check that your service can now reach external endpoints allowed by the EAIs.
import requests

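def check_internet(url="http://www.google.com/", timeout=3):
    """Return True if `url` responds with HTTP 200 within `timeout` seconds."""
    try:
        response = requests.get(url, timeout=timeout)
        return response.status_code == 200
    except requests.RequestException:
        # Covers connection errors, timeouts, and other request failures
        return False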

if check_internet():
    print("Internet is available ✅")
else:
    print("No internet connection ❌")

In [ ]:
# Now you can `pip install` packages
!pip install duckdb
import duckdb
print("DuckDB version:", duckdb.__version__)

## Install from a `.whl` file

In [ ]:
# Upload your `.whl` file to Workspace and pip install
!pip install "../packages/loguru-0.7.3-py3-none-any.whl"

In [ ]:
import loguru
print("loguru version:", loguru.__version__)

# Easier and more powerful cell referencing

We now support SQL to Python and Python to SQL referencing **including dataframes**!

### From Python to SQL
Uploaded a CSV file and want to query it with SQL? Follow these simple steps:

In [ ]:
# Read in data from a CSV file

import pandas as pd

uploaded_df = pd.read_csv("../data/diamonds.csv")
uploaded_df

In [ ]:
var = '"price"'

In [ ]:
-- Reference a pandas dataframe and a Python variable in a SQL query
SELECT * FROM {{uploaded_df}} WHERE {{var}} > 326;
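
Note that `var` holds `'"price"'`, with the double quotes inside the Python string: the value lands in the query as a quoted column identifier. The sketch below illustrates that idea with plain text substitution. It is a hypothetical stand-in, not Snowflake's actual rendering logic, and it covers only scalar variables, not dataframe references such as `{{uploaded_df}}`. The helper `render_placeholders` and the table name `diamonds` are assumptions made for the example.

```python
import re


def render_placeholders(sql, variables):
    """Replace each {{name}} in `sql` with str(variables[name]) (illustrative helper only)."""
    def substitute(match):
        return str(variables[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, sql)


var = '"price"'
query = "SELECT * FROM diamonds WHERE {{var}} > 326;"
print(render_placeholders(query, {"var": var}))
# prints: SELECT * FROM diamonds WHERE "price" > 326;
```

Under this reading, `var = 'price'` without the inner quotes would leave an unquoted identifier in the query, which Snowflake resolves case-insensitively (as `PRICE`) and which may not match a lowercase column name from the CSV.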

### From SQL to SQL or SQL to Python

Every SQL cell exposes the result of its last query as a `pandas` dataframe. For example, the cell above produces `dataframe_1`.
- You can directly use `dataframe_1` in Python code.
- Or, wrap it inside double curly braces to be referenced in another SQL query. 

Let's look at some examples.


In [ ]:
-- Reference a SQL query result in another SQL query
SELECT * FROM {{dataframe_1}} WHERE "carat" < 1.0
UNION ALL
SELECT * FROM {{dataframe_1}} WHERE "carat" >= 1.0;

In [ ]:
# Or directly use a SQL table result in Python code for some visualization!
import matplotlib.pyplot as plt
import seaborn as sns

# Randomly sample 1,000 points for plotting (dataset has ~54k rows)
sampled = dataframe_1.sample(n=1000, random_state=42)

# Plot sampled points
plt.figure(figsize=(8, 5))
plt.scatter(sampled['carat'], sampled['price'], alpha=0.3, color='green')
plt.xlabel('Carat')
plt.ylabel('Price')
plt.title('Diamonds: Price vs Carat (Sampled)')
plt.grid(True)
plt.show()

# Import from Python files

In [ ]:
%load_ext autoreload
%autoreload 2

In [ ]:
from math_utils import add, multiply

# Use these functions
print(add(2, 3))
print(multiply(2, 3))
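
The contents of `math_utils.py` are not shown in this notebook. A minimal version consistent with the import above might look like the following; the function bodies and docstrings are assumptions for illustration.

```python
# math_utils.py (hypothetical contents; the real file is not shown in this notebook)


def add(a, b):
    """Return the sum of a and b."""
    return a + b


def multiply(a, b):
    """Return the product of a and b."""
    return a * b


if __name__ == "__main__":
    print(add(2, 3))       # 5
    print(multiply(2, 3))  # 6
```

With `%autoreload 2` active, editing one of these functions and re-running the calling cell should pick up the change without re-importing the module.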

# Magic commands

You may have noticed that the `%autoreload` magic is used in the above cells to pull in changes to the `math_utils.py` file. Check out more built-in line and cell magics!

In [ ]:
%lsmagic

In [ ]:
# Try some more magics!
%env

In [ ]:
%timeit sum(range(1000000))    
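
`%timeit` builds on Python's standard-library `timeit` module. Magics are not available in plain `.py` files, so a rough equivalent for code that lives in a module is sketched below. The run count of 10 is an arbitrary choice here, whereas `%timeit` chooses its loop count automatically.

```python
import timeit

# Run the statement a fixed number of times and report the mean time per run
runs = 10
total_seconds = timeit.timeit("sum(range(1_000_000))", number=runs)
print(f"{total_seconds / runs * 1e3:.2f} ms per run (mean of {runs} runs)")
```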

### Chaining multiple Notebooks / Python files
You may want to modularize your project into separate files and iterate on them independently, e.g. business logic units, ML preprocessing functions, SQL parsers, or visualization utilities. To run these pieces together, simply bring them into the notebook process with the `%run` command.

In [ ]:
# Execute another notebook in the same process
%run "2nd_notebook.ipynb"

# Git-synced workspace
Sync with your Git repo from Workspaces! View diffs, resolve merge conflicts, switch branches, and more!