# Snowpark Scala in Workspace Notebooks (Prototype)

This notebook demonstrates running **Scala** and **Snowpark Scala** within a
Snowflake Workspace Notebook using a `%%scala` cell magic powered by JPype.

**Architecture:** Python kernel → JPype (JNI) → JVM (in-process) → Scala REPL → Snowpark

---

## Contents

1. Installation & Configuration
2. Basic Scala Execution
3. Python ↔ Scala Interoperability
4. Snowpark Scala Session
   - 4.6 Snowpark DataFrame Interop (SQL Plan Transfer)
5. Diagnostics
6. Spark Connect for Scala (opt-in)

---
## 1. Installation & Configuration

### 1.1 Install JDK, Scala, and Snowpark JAR

Run the setup script. The first run takes roughly 2-4 minutes: it installs
OpenJDK 17, Scala 2.12, Ammonite, and the Snowpark JAR using micromamba and coursier.

On subsequent runs it detects what is already installed and skips those steps.

In [None]:
!bash setup_scala_environment.sh
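
To see from Python which pieces are already present, a check along the following lines may help. This is only a sketch: it is not the logic of `setup_scala_environment.sh`, and the executable names `java`, `scala`, `cs`, and `micromamba` are assumptions about what the installer puts on `PATH`.

```python
import shutil

# Executable names are assumptions; adjust to whatever the installer actually provides.
DEFAULT_TOOLS = ("java", "scala", "cs", "micromamba")


def find_tools(tools=DEFAULT_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


def missing_tools(tools=DEFAULT_TOOLS):
    """Return the tools that are not on PATH, in the order given."""
    found = find_tools(tools)
    return [name for name in tools if found[name] is None]


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:<12} {path or 'not found'}")
```

An empty list from `missing_tools()` suggests the script will have little left to install, provided the installer exposes these tools on the kernel's `PATH`.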

### 1.2 Configure Python Environment & Register Scala Magics

This cell:
1. Sets `JAVA_HOME` and `PATH`
2. Installs JPype1 into the kernel venv (if needed)
3. Starts the JVM in-process with the Scala + Snowpark classpath
4. Initialises the Scala REPL (Ammonite-lite or IMain)
5. Registers `%%scala` (cell) and `%scala` (line) magics

In [None]:
from scala_helpers import setup_scala_environment

result = setup_scala_environment()

print(f"Success:          {result['success']}")
print(f"Java version:     {result['java_version']}")
print(f"Scala version:    {result['scala_version']}")
print(f"Interpreter type: {result['interpreter_type']}")
print(f"JVM started:      {result['jvm_started']}")
print(f"Magic registered: {result['magic_registered']}")
if result.get('jvm_options'):
    print(f"JVM options:      {result['jvm_options']}")

if result['errors']:
    print("\nErrors:")
    for err in result['errors']:
        print(f"  - {err}")
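
Steps 1 and 3 of the list above come down to preparing environment variables and a classpath before the JVM starts. The sketch below shows one way that preparation could look using only the standard library. `build_jvm_env` and its parameters are hypothetical names, and how `scala_helpers` actually does this is not shown here.

```python
import os
from pathlib import Path


def build_jvm_env(java_home, jar_dir, base_env=None):
    """Return (env, classpath) for a JDK at java_home and every JAR in jar_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = str(java_home)
    # Prepend the JDK's bin directory so it wins over any other java on PATH.
    java_bin = str(Path(java_home) / "bin")
    env["PATH"] = os.pathsep.join(p for p in (java_bin, env.get("PATH", "")) if p)
    # Sort for a deterministic classpath; entries are joined with os.pathsep.
    jars = sorted(str(p) for p in Path(jar_dir).glob("*.jar"))
    return env, os.pathsep.join(jars)
```

JPype's `startJVM` accepts a `classpath` argument, so the resulting entries could be handed to it. The function returns a copy of the environment rather than mutating `os.environ`, which keeps it easy to inspect before applying.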

### 1.3 Verify Scala Execution

In [None]:
%%scala
println(s"Hello from Scala ${util.Properties.versionString}")
println(s"Java: ${System.getProperty("java.version")}")
println(s"OS: ${System.getProperty("os.name")}")

### 1.4 Single-line Scala (`%scala`)

The `%scala` line magic runs a single Scala expression inline — handy
for quick checks without a full `%%scala` cell.

**Note:** IPython expands `${expr}` in line magic arguments before Scala
sees them. Use `$varName` (no braces) or string concatenation for `%scala`.
For `s"${...}"` interpolation, use `%%scala` cells instead.

In [None]:
%scala println("Quick check: 2 + 2 = " + (2 + 2))
%scala val v = util.Properties.versionString; println(s"Scala $v")

---
## 2. Basic Scala Execution

State persists across `%%scala` cells — vals, defs, imports, and classes
defined in one cell are available in the next.

In [None]:
%%scala
// Define a value
val greeting = "Hello from Snowflake Workspace Notebook!"
println(greeting)

In [None]:
%%scala
// Previous cell's 'greeting' is still in scope
println(s"Greeting length: ${greeting.length}")

// Define a function
def factorial(n: Int): BigInt = if (n <= 1) 1 else n * factorial(n - 1)

println(s"10! = ${factorial(10)}")
println(s"20! = ${factorial(20)}")

In [None]:
%%scala
// Collections and functional programming
val numbers = (1 to 10).toList
val squares = numbers.map(n => n * n)
val evenSquares = squares.filter(_ % 2 == 0)

println(s"Numbers:      $numbers")
println(s"Squares:      $squares")
println(s"Even squares: $evenSquares")
println(s"Sum:          ${evenSquares.sum}")

In [None]:
%%scala
// Case classes and pattern matching
case class Employee(name: String, department: String, salary: Double)

val employees = List(
  Employee("Alice", "Engineering", 120000),
  Employee("Bob", "Engineering", 115000),
  Employee("Carol", "Data Science", 130000),
  Employee("Dave", "Data Science", 125000),
  Employee("Eve", "Product", 110000)
)

val byDept = employees.groupBy(_.department).map {
  case (dept, emps) => (dept, emps.map(_.salary).sum / emps.size)
}

byDept.toList.sortBy(-_._2).foreach {
  case (dept, avgSalary) =>
    println(f"  $dept%-20s $$${avgSalary}%,.0f")
}

---
## 3. Python ↔ Scala Interoperability

### 3.1 Push values from Python to Scala

In [None]:
from scala_helpers import push_to_scala

# Push a string and number from Python into the Scala interpreter
push_to_scala("pythonMessage", "Hello from Python!")
push_to_scala("pythonNumber", 42)

In [None]:
%%scala
// Access the variables pushed from Python
println(s"From Python: $pythonMessage")
println(s"Number: $pythonNumber")

### 3.2 Pull values from Scala to Python

In [None]:
%%scala
val scalaResult = (1 to 100).sum
println(s"Sum 1..100 = $scalaResult")

In [None]:
from scala_helpers import pull_from_scala

value = pull_from_scala("scalaResult")
print(f"Pulled from Scala: {value} (type: {type(value).__name__})")

### 3.3 Magic flags: `-i` and `-o` (like rpy2's `%%R`)

Instead of calling `push_to_scala()` / `pull_from_scala()` explicitly,
you can use **`-i`** and **`-o`** flags directly on the `%%scala` line —
the same pattern as rpy2's `%%R -i` / `%%R -o`.

In [None]:
# Define Python variables to push into Scala
py_limit = 50
py_label = "first N numbers"

In [None]:
%%scala -i py_limit,py_label -o scala_sum --time
// py_limit and py_label were pushed from Python automatically
val n = py_limit.asInstanceOf[Int]
val scala_sum = (1 to n).sum
println(s"Sum of $py_label (1 to $n) = $scala_sum")

In [None]:
# scala_sum was pulled back into Python automatically via -o
print(f"Back in Python: scala_sum = {scala_sum} (type: {type(scala_sum).__name__})")
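
For readers curious how a line such as `-i py_limit,py_label -o scala_sum --time` might be taken apart, here is a minimal sketch using `argparse`. It is an illustration only: `parse_scala_magic_line` is a hypothetical name, and the real magic may parse its arguments differently.

```python
import argparse
import shlex


def parse_scala_magic_line(line):
    """Parse a %%scala argument line into (input names, output names, time flag)."""
    parser = argparse.ArgumentParser(prog="%%scala", add_help=False)
    parser.add_argument("-i", dest="inputs", action="append", default=[])
    parser.add_argument("-o", dest="outputs", action="append", default=[])
    parser.add_argument("--time", action="store_true")
    args = parser.parse_args(shlex.split(line))

    def split_names(values):
        # Accept both `-i a,b` and repeated `-i a -i b`.
        return [n.strip() for value in values for n in value.split(",") if n.strip()]

    return split_names(args.inputs), split_names(args.outputs), args.time
```

After parsing, a magic would presumably look up each input name in the notebook namespace and pass it to `push_to_scala(name, value)`, then pull each output name back once the cell has run.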

---
## 4. Snowpark Scala Session

### 4.1 Inject credentials

Extract credentials from the Python session and the SPCS container token,
then set them as Java System properties for Scala.

Inside a Workspace Notebook, the SPCS OAuth token at `/snowflake/session/token`
is used automatically. No PAT is needed.

In [None]:
from snowflake.snowpark.context import get_active_session
from scala_helpers import inject_session_credentials

session = get_active_session()
creds = inject_session_credentials(session)

print("Credentials injected as Java System properties:")
for k, v in creds.items():
    if 'TOKEN' in k:
        print(f"  {k}: {'SET (' + str(len(v)) + ' chars)' if v else 'NOT SET'}")
    else:
        print(f"  {k}: {v}")
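
The credential handoff can be pictured as three small steps: read the token file, build a property map, and mask anything token-like before printing. The sketch below is an assumption about the general shape, not the body of `inject_session_credentials`. The function names are hypothetical, and `"oauth"` is an assumed value for `SNOWFLAKE_AUTH_TYPE`.

```python
from pathlib import Path

TOKEN_PATH = Path("/snowflake/session/token")  # path given in the text above


def read_spcs_token(path=TOKEN_PATH):
    """Return the stripped token text, or None when the file is absent or empty."""
    try:
        return Path(path).read_text().strip() or None
    except OSError:
        return None


def mask(key, value):
    """Render a value for printing, showing only the length of token-like values."""
    if "TOKEN" in key:
        return f"SET ({len(value)} chars)" if value else "NOT SET"
    return str(value)


def build_properties(connection, token):
    """Build the SNOWFLAKE_* property map that the Scala session cell reads."""
    props = {f"SNOWFLAKE_{k.upper()}": v for k, v in connection.items()}
    props["SNOWFLAKE_TOKEN"] = token
    # "oauth" is an assumed authenticator value for the SPCS token.
    props["SNOWFLAKE_AUTH_TYPE"] = "oauth"
    return props
```

Outside a Workspace Notebook the token file will normally be missing, in which case `read_spcs_token()` returns `None` and the Scala `prop(...)` helper in section 4.3 would fail its `require` check.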

### 4.2 Preview Session Code

In [None]:
from scala_helpers import create_snowpark_scala_session_code

code = create_snowpark_scala_session_code()
print(code)

### 4.3 Create Snowpark Scala Session

In [None]:
%%scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

def prop(k: String): String = {
  val v = System.getProperty(k)
  require(v != null, s"System property '$k' not set. Run inject_session_credentials() first.")
  v
}

val sfSession = Session.builder.configs(Map(
  "URL"           -> prop("SNOWFLAKE_URL"),
  "USER"          -> prop("SNOWFLAKE_USER"),
  "ROLE"          -> prop("SNOWFLAKE_ROLE"),
  "DB"            -> prop("SNOWFLAKE_DATABASE"),
  "SCHEMA"        -> prop("SNOWFLAKE_SCHEMA"),
  "WAREHOUSE"     -> prop("SNOWFLAKE_WAREHOUSE"),
  "TOKEN"         -> prop("SNOWFLAKE_TOKEN"),
  "AUTHENTICATOR" -> prop("SNOWFLAKE_AUTH_TYPE")
)).create

println("Snowpark Scala session created!")
val _user = sfSession.sql("SELECT CURRENT_USER()").collect()(0).getString(0)
val _role = sfSession.sql("SELECT CURRENT_ROLE()").collect()(0).getString(0)
val _db = sfSession.sql("SELECT CURRENT_DATABASE()").collect()(0).getString(0)
println(s"  User:      ${_user}")
println(s"  Role:      ${_role}")
println(s"  Database:  ${_db}")

### 4.4 Query Snowflake from Scala

In [None]:
%%scala
// Basic query
sfSession.sql("SELECT CURRENT_USER() AS user, CURRENT_ROLE() AS role, CURRENT_WAREHOUSE() AS warehouse").show()

In [None]:
%%scala
// DataFrame operations
val df = sfSession.sql("SELECT 'Scala' AS language, 'Snowpark' AS framework, CURRENT_TIMESTAMP() AS ts")
df.show()

In [None]:
%%scala
// Show available tables
sfSession.sql("SHOW TABLES LIMIT 5").show()

### 4.5 Cross-language Data Sharing

The Python and Scala Snowpark sessions are **separate connections**, so
`TEMPORARY TABLE`s (which are session-scoped) are not visible across them.
Use a `TRANSIENT TABLE` instead, and drop it when done.

In [None]:
# Python: create a transient table (visible across sessions, unlike TEMPORARY)
session.sql("""
    CREATE OR REPLACE TRANSIENT TABLE scala_demo (
        id INT, name STRING, value DOUBLE
    ) AS
    SELECT column1, column2, column3 FROM VALUES
        (1, 'alpha', 10.5),
        (2, 'beta', 20.3),
        (3, 'gamma', 30.7)
""").collect()
print("Transient table 'scala_demo' created from Python")

In [None]:
%%scala
// Scala: read the transient table created by Python
val demo = sfSession.table("scala_demo")
demo.show()

// Compute something
val total = demo.select(sum(col("VALUE"))).collect()(0).getDouble(0)
println(s"Total value: $total")

### 4.6 Snowpark DataFrame Interop (SQL Plan Transfer)

When `-i` or `-o` reference a **Snowpark DataFrame**, the magic
auto-detects it and transfers the underlying **SQL query plan** instead
of materialising data through temp tables.

- **Python → Scala (`-i`):** extracts `df.queries['queries'][-1]` and
  creates a Scala `sfSession.sql(...)` DataFrame.
- **Scala → Python (`-o`):** extracts the SQL from the Scala DataFrame
  and creates a Python `session.sql(...)` DataFrame.

No data is copied — only the SQL string crosses the bridge.
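
As a rough picture of the Python → Scala direction, the SQL string can be wrapped in a line of Scala source and evaluated by the interpreter. The helper below is a hypothetical sketch of that idea, not the magic's actual code. It assumes the Scala session is bound to `sfSession` and that the SQL is ordinary text for which JSON string escaping also yields a valid Scala string literal.

```python
import json
import re

_SCALA_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def scala_sql_binding(name, sql, session_var="sfSession"):
    """Return Scala source that binds `name` to a DataFrame built from `sql`."""
    if not _SCALA_IDENT.match(name):
        raise ValueError(f"not a simple Scala identifier: {name!r}")
    # json.dumps produces a double-quoted literal with backslash escapes for
    # quotes, backslashes, and newlines, which Scala also accepts.
    literal = json.dumps(sql.strip())
    return f"val {name} = {session_var}.sql({literal})"


if __name__ == "__main__":
    print(scala_sql_binding("py_df", "SELECT 1 AS id"))
```

Generating a regular string literal rather than a triple-quoted one avoids breaking on SQL that itself contains `"""`. The identifier check does not reject Scala keywords, so a production version would need a stricter test.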

In [None]:
# Python → Scala: push a Snowpark Python DataFrame into Scala
py_df = session.sql("""
    SELECT column1 AS id, column2 AS name, column3 AS score
    FROM VALUES (1, 'Alice', 95.0), (2, 'Bob', 87.5), (3, 'Carol', 92.0)
""")
print(f"Python DataFrame SQL plan:\n  {py_df.queries['queries'][-1][:80]}...")
py_df.show()

In [None]:
%%scala -i py_df -o result_df --time
// py_df is now a Scala Snowpark DataFrame (created from the SQL plan)
// We can apply Scala Snowpark transformations on it
val result_df = py_df.filter(col("SCORE") > 90.0).select(col("NAME"), col("SCORE"))
result_df.show()
println("Filtered to scores > 90 — this DataFrame will be pulled back to Python via -o")

In [None]:
# Scala → Python: result_df was pulled back automatically via -o
# It's now a Snowpark Python DataFrame — use it with Python Snowpark API
print(f"Type: {type(result_df).__name__}")
result_df.show()

# Convert to Pandas if needed
pandas_df = result_df.to_pandas()
print(f"\nAs Pandas:\n{pandas_df}")

In [None]:
# Cleanup any interop views (if the view-based fallback was used)
from scala_helpers import cleanup_interop_views
dropped = cleanup_interop_views()
if dropped:
    print(f"Cleaned up {dropped} interop view(s)")
else:
    print("No interop views to clean up (SQL plan transfer was used)")

In [None]:
# Cleanup: drop the transient demo table
session.sql("DROP TABLE IF EXISTS scala_demo").collect()
print("Table 'scala_demo' dropped")

---
## 5. Diagnostics

Run the diagnostics check to verify the JVM, Scala interpreter,
Snowpark classpath, credentials, and disk space are all healthy.

In [None]:
from scala_helpers import print_diagnostics
print_diagnostics()
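
If `print_diagnostics()` is unavailable, a few of the same checks can be approximated with the standard library. The sketch below covers only `JAVA_HOME`, `java` on `PATH`, and free disk space. `basic_health` is a hypothetical helper, and the 1 GiB threshold is an arbitrary assumption.

```python
import os
import shutil


def basic_health(path=".", min_free_gb=1.0):
    """Collect a few environment checks as a dict of name -> (ok, detail)."""
    java_home = os.environ.get("JAVA_HOME")
    java_exe = shutil.which("java")
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "java_home": (bool(java_home) and os.path.isdir(java_home), java_home or "unset"),
        "java_on_path": (java_exe is not None, java_exe or "not found"),
        "disk_space": (free_gb >= min_free_gb, f"{free_gb:.1f} GiB free"),
    }


if __name__ == "__main__":
    for name, (ok, detail) in basic_health().items():
        print(f"{'OK  ' if ok else 'FAIL'} {name:<14} {detail}")
```

This does not inspect the running JVM, the Scala interpreter, or the Snowpark classpath, so it complements rather than replaces the helper above.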

---
## 6. Spark Connect for Scala (opt-in)

When `spark_connect.enabled: true` is set in `scala_packages.yaml`, the
installer also sets up Snowpark Connect for Scala. This starts a local
Spark Connect gRPC server (a Python proxy that authenticates with the SPCS
OAuth token) and makes a Scala `SparkSession` available in `%%scala` cells as `spark`.

**Two APIs, one notebook:**
- `sfSession.sql(...)` — Snowpark Scala (direct JDBC, full Snowpark API)
- `spark.sql(...)` — Spark SQL via Spark Connect (Spark DataFrame API)

Both use the same JVM. No PAT needed — auth flows through the Python proxy.

### 6.1 Start Spark Connect Server + Scala SparkSession

In [None]:
from scala_helpers import setup_spark_connect

sc_result = setup_spark_connect()
print(f"Spark Connect: {'ready' if sc_result['success'] else 'FAILED'}")
print(f"Server port:   {sc_result['server_port']}")
if sc_result.get('pyspark_version'):
    print(f"PySpark:       {sc_result['pyspark_version']}")
if sc_result['errors']:
    for err in sc_result['errors']:
        print(f"  ERROR: {err}")
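
When the server reports success but `%%scala` cells cannot reach `spark`, a quick check that something is listening on the reported port can narrow things down. The helper below is a generic TCP probe written as a sketch. `port_is_open` is a hypothetical name, and it assumes the server binds to the loopback address.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open(sc_result['server_port'])` could be run right after the cell above. A successful connection shows only that a process accepts TCP connections on that port, not that it speaks the Spark Connect protocol.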

### 6.2 Scala Spark SQL via %%scala

In [None]:
%%scala
// Spark SQL via the local Spark Connect proxy
val df = spark.sql("SELECT 1 AS id, 'hello from Scala Spark' AS msg")
df.show()
println("Spark Connect: working")

### 6.3 Query Snowflake Tables via Spark SQL

In [None]:
%%scala
// Query Snowflake metadata via Spark SQL (goes through the gRPC proxy)
// Note: CURRENT_ROLE() is not supported by the Spark Connect proxy
spark.sql("SELECT CURRENT_USER() AS user").show()

spark.sql("""
  SELECT TABLE_SCHEMA, TABLE_NAME, ROW_COUNT
  FROM INFORMATION_SCHEMA.TABLES
  WHERE ROW_COUNT IS NOT NULL
  ORDER BY ROW_COUNT DESC
  LIMIT 5
""").show()

### 6.4 Side-by-side: Snowpark Scala vs Spark SQL

Both APIs work in the same notebook. Use whichever fits your use case:
- `sfSession` for native Snowpark Scala operations (DataFrame API, UDFs, stored procs)
- `spark` for Spark SQL syntax (familiar to Spark users, broader SQL dialect)

In [None]:
%%scala
// Snowpark Scala: native Snowpark API
println("=== Snowpark Scala (sfSession) ===")
sfSession.sql("SELECT 'snowpark' AS api, CURRENT_USER() AS user").show()

// Spark SQL: via gRPC proxy
println("=== Spark SQL (spark) ===")
spark.sql("SELECT 'spark_connect' AS api, CURRENT_USER() AS user").show()

### 6.5 Interop: Snowpark Python writes, Scala Spark reads

This demonstrates cross-language data sharing. Python writes a transient
table via Snowpark, and Scala reads it via Spark SQL through the local proxy.

In [None]:
# Python: write a transient table
session.sql("""
    CREATE OR REPLACE TRANSIENT TABLE _SPARK_CONNECT_DEMO AS
    SELECT column1 AS id, column2 AS source FROM VALUES
        (1, 'from_snowpark_python'),
        (2, 'from_snowpark_python')
""").collect()
print("Snowpark Python: wrote _SPARK_CONNECT_DEMO")

In [None]:
%%scala
// Scala Spark reads the table written by Python
val interop_df = spark.sql("SELECT * FROM _SPARK_CONNECT_DEMO")
interop_df.show()
println("Scala Spark: read table written by Snowpark Python")

In [None]:
# Cleanup
session.sql("DROP TABLE IF EXISTS _SPARK_CONNECT_DEMO").collect()
print("Cleanup: _SPARK_CONNECT_DEMO dropped")