# Snowpark Java in Workspace Notebooks (Prototype)

This notebook demonstrates running **Java** and **Snowpark Java** within a
Snowflake Workspace Notebook using a `%%java` cell magic powered by JPype and JShell.

**Architecture:** Python kernel → JPype (JNI) → JVM (in-process) → JShell REPL → Snowpark Java

**What you get:**
- `%%java` cell magic and `%java` line magic for Java execution
- Full access to the Snowpark Java API (DataFrame, Session, functions)
- Python ↔ Java variable transfer via `-i` / `-o` flags
- Snowpark DataFrame interop (SQL plan transfer — no data copying)
- Shared JVM with the `%%scala` magic (both available in the same notebook)

**Prerequisites:** Snowflake Workspace Notebook (or local environment with JDK 17)

> **Note:** This notebook reuses the `setup_scala_environment.sh` installer that also sets up the `%%scala` magic.
> The Snowpark JAR already contains both the Java and Scala APIs.

## 1. Installation & Configuration

### 1.1 Run the Setup Script

The setup script installs OpenJDK 17, Scala, Ammonite, and the Snowpark JAR.
JShell ships with JDK 17 — no additional installation needed.

**This is idempotent** — safe to re-run.

In [None]:
!bash setup_scala_environment.sh

### 1.2 Configure Python Environment & Register Magics

This cell:
- Loads environment metadata from the installer
- Starts the JVM via JPype
- Initializes both the Scala REPL (IMain) and JShell for Java
- Registers `%%scala`, `%scala`, `%%java`, and `%java` magics

In [None]:
from scala_helpers import setup_scala_environment

result = setup_scala_environment()

if result['success']:
    print('Environment ready!')
    print(f"  Java:    {result['java_version']}")
    print(f"  Scala:   {result['scala_version']}")
    print(f"  REPL:    {result['interpreter_type']}")
    print(f"  JShell:  {'ready' if result.get('jshell_initialized') else 'failed'}")
    print(f"  Magics:  %%scala={'registered' if result['magic_registered'] else 'failed'}")
    print(f"           %%java={'registered' if result.get('java_magic_registered') else 'failed'}")
else:
    print('Setup failed:')
    for e in result['errors']:
        print(f'  - {e}')
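Later cells depend on JShell and the `%%java` magic being available, so it can help to stop early when setup did not fully succeed. The helper below is a minimal sketch, not part of `scala_helpers`: `require_java_ready` is a hypothetical name, and it only inspects the keys that the cell above already reads (`success`, `errors`, `jshell_initialized`, `java_magic_registered`).

```python
def require_java_ready(result):
    """Raise RuntimeError unless the setup result reports a usable %%java environment."""
    problems = list(result.get("errors", []))
    if not result.get("success"):
        problems.append("setup reported success=False")
    if not result.get("jshell_initialized"):
        problems.append("JShell was not initialized")
    if not result.get("java_magic_registered"):
        problems.append("%%java magic was not registered")
    if problems:
        raise RuntimeError("Java environment not ready: " + "; ".join(problems))
    return True


# Hand-written sample dict for illustration; this is not real installer output.
sample = {"success": True, "jshell_initialized": True,
          "java_magic_registered": True, "errors": []}
print(require_java_ready(sample))
```

In the notebook, calling `require_java_ready(result)` right after `setup_scala_environment()` would turn a silent partial setup into an immediate error.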

### 1.3 Verify Java Execution

In [None]:
%%java
System.out.println("Hello from Java " + System.getProperty("java.version"));
System.out.println("JVM: " + System.getProperty("java.vm.name"));
System.out.println("OS:  " + System.getProperty("os.name") + " " + System.getProperty("os.arch"));

### 1.4 Single-line Java (`%java`)

The `%java` line magic runs a single Java statement inline.

In [None]:
%java System.out.println("Quick check: 2 + 2 = " + (2 + 2));
%java System.out.println("Max memory: " + (Runtime.getRuntime().maxMemory() / 1024 / 1024) + " MB");

## 2. Basic Java Execution

JShell supports full Java syntax — variables, methods, classes, generics, lambdas.

In [None]:
%%java
String greeting = "Hello from Snowflake Workspace Notebook!";
System.out.println(greeting);
System.out.println("Length: " + greeting.length());

In [None]:
%%java
// Previous cell's variables persist in JShell
System.out.println("Greeting: " + greeting.toUpperCase());

In [None]:
%%java
// Collections and streams
import java.util.*;
import java.util.stream.*;

List<Integer> numbers = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
int sum = numbers.stream().mapToInt(Integer::intValue).sum();
List<Integer> evens = numbers.stream().filter(n -> n % 2 == 0).collect(Collectors.toList());

System.out.println("Numbers: " + numbers);
System.out.println("Sum:     " + sum);
System.out.println("Evens:   " + evens);

In [None]:
%%java
// Records (Java 16+) combined with streams
record Employee(String name, String department, double salary) {}

var employees = List.of(
    new Employee("Alice", "Engineering", 95000),
    new Employee("Bob", "Marketing", 72000),
    new Employee("Charlie", "Engineering", 88000),
    new Employee("Diana", "Marketing", 81000)
);

double avgEngSalary = employees.stream()
    .filter(e -> e.department().equals("Engineering"))
    .mapToDouble(Employee::salary)
    .average()
    .orElse(0);

System.out.println("Employees:");
employees.forEach(e -> System.out.println("  " + e));
System.out.printf("Avg Engineering salary: $%,.0f%n", avgEngSalary);

## 3. Python ↔ Java Interoperability

### 3.1 Push values from Python to Java

In [None]:
# Define Python variables
py_message = "Hello from Python!"
py_count = 42
py_ratio = 3.14159
print(f"Python vars: message='{py_message}', count={py_count}, ratio={py_ratio}")

In [None]:
%%java -i py_message,py_count,py_ratio
// Variables pushed from Python are now available in Java
System.out.println("From Python: " + py_message);
System.out.println("Count: " + py_count);
System.out.println("Ratio: " + py_ratio);
System.out.println("Count * Ratio = " + (py_count * py_ratio));
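One way to picture the `-i` transfer is as a translation of each Python value into a Java declaration that is evaluated before the cell body. The sketch below illustrates that idea only. How `scala_helpers` actually binds the values is not shown in this notebook, and `java_declaration` and its type choices (`int` or `long` for Python `int`, `double` for `float`) are assumptions.

```python
import math


def java_declaration(name, value):
    """Return Java source declaring `name` with a literal equal to `value`."""
    if isinstance(value, bool):  # bool must be tested before int (bool subclasses int)
        return f"boolean {name} = {'true' if value else 'false'};"
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return f"int {name} = {value};"
        if -2**63 <= value < 2**63:
            return f"long {name} = {value}L;"
        raise ValueError(f"{name} does not fit in a Java long")
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError(f"{name} is not a finite number")
        return f"double {name} = {value!r};"
    if isinstance(value, str):
        # Escape backslashes first, then quotes and line breaks.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        escaped = escaped.replace("\n", "\\n").replace("\r", "\\r")
        return f'String {name} = "{escaped}";'
    raise TypeError(f"unsupported type for {name}: {type(value).__name__}")


for var, val in [("py_message", "Hello from Python!"), ("py_count", 42), ("py_ratio", 3.14159)]:
    print(java_declaration(var, val))
```

Escaping the string before embedding it matters: a Python string containing a double quote or newline would otherwise produce Java source that does not compile.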

### 3.2 Pull values from Java to Python

In [None]:
%%java
long javaSum = 0;
for (int i = 1; i <= 100; i++) { javaSum += i; }
String javaLabel = "Sum of 1..100";
System.out.println(javaLabel + " = " + javaSum);

In [None]:
%%java -o javaSum,javaLabel
// This cell pulls javaSum and javaLabel back into Python
System.out.println("Pulling javaSum and javaLabel to Python...");

In [None]:
# Variables pulled from Java are now in Python
print(f"javaSum = {javaSum} (type: {type(javaSum).__name__})")
print(f"javaLabel = '{javaLabel}' (type: {type(javaLabel).__name__})")

### 3.3 Magic flags: `-i` and `-o` combined

Push and pull in a single cell — same pattern as `%%scala`.

In [None]:
py_limit = 50
py_label = "first N numbers"
print(f"Will send: limit={py_limit}, label='{py_label}'")

In [None]:
%%java -i py_limit,py_label -o java_result --time
// py_limit and py_label were pushed from Python automatically
long java_result = 0;
for (long i = 1; i <= py_limit; i++) { java_result += i; }
System.out.println("Sum of " + py_label + " (n=" + py_limit + "): " + java_result);

In [None]:
# java_result was pulled back into Python via -o
print(f"Back in Python: java_result = {java_result} (type: {type(java_result).__name__})")
print(f"Verification: sum(1..{py_limit}) = {sum(range(1, py_limit+1))}")
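The flag line used in this section (`-i py_limit,py_label -o java_result --time`) is ordinary command-line style text, so it could be parsed with the standard library. The following is a sketch under that assumption, not the parser in `scala_helpers`. `parse_java_magic_line` is a hypothetical name, and since this notebook does not describe what `--time` does, the sketch only records whether it was passed.

```python
import argparse
import shlex


def parse_java_magic_line(line):
    """Parse a magic line such as '-i a,b -o c --time' into a small dict."""
    parser = argparse.ArgumentParser(prog="java_magic", add_help=False)
    parser.add_argument("-i", dest="inputs", default="")    # names to push into Java
    parser.add_argument("-o", dest="outputs", default="")   # names to pull back to Python
    parser.add_argument("--time", action="store_true")
    args = parser.parse_args(shlex.split(line))

    def split_names(csv):
        return [name.strip() for name in csv.split(",") if name.strip()]

    return {
        "inputs": split_names(args.inputs),
        "outputs": split_names(args.outputs),
        "time": args.time,
    }


print(parse_java_magic_line("-i py_limit,py_label -o java_result --time"))
```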

## 4. Snowpark Java Session

### 4.1 Inject credentials from Python Snowpark session

The Python session's credentials are set as Java System properties,
which JShell can read directly.

In [None]:
from snowflake.snowpark.context import get_active_session
from scala_helpers import inject_session_credentials

session = get_active_session()
creds = inject_session_credentials(session)
print(f"Credentials injected for account: {creds.get('SNOWFLAKE_ACCOUNT', 'N/A')}")
print(f"Auth type: {creds.get('SNOWFLAKE_AUTH_TYPE', 'N/A')}")
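Because the injected values include a token, it is worth being careful about what gets printed and about entries that are missing. The sketch below shows one possible shape for that step. `apply_properties` and `SECRET_KEYS` are hypothetical names, and a plain dict stands in for `java.lang.System.setProperty` so the example runs without a JVM. It is not the implementation of `inject_session_credentials`.

```python
SECRET_KEYS = {"SNOWFLAKE_TOKEN"}  # assumed set of entries that should never be echoed


def apply_properties(creds, set_property):
    """Call set_property(key, value) for each non-empty entry; return a redacted copy."""
    shown = {}
    for key, value in creds.items():
        if value is None or value == "":
            continue  # skip missing values rather than storing the string "None"
        set_property(key, str(value))
        shown[key] = "***" if key in SECRET_KEYS else str(value)
    return shown


# A dict's __setitem__ plays the role of System.setProperty in this sketch.
fake_system_properties = {}
redacted = apply_properties(
    {"SNOWFLAKE_ACCOUNT": "my_account", "SNOWFLAKE_TOKEN": "secret", "SNOWFLAKE_ROLE": None},
    fake_system_properties.__setitem__,
)
print(redacted)
```

Skipping empty entries also keeps them from surfacing later as the literal string `"None"` when Java reads the property.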

### 4.2 Preview Session Code

In [None]:
from scala_helpers import create_snowpark_java_session_code

code = create_snowpark_java_session_code()
print("Java session code that will be executed:")
print("─" * 50)
print(code)
print("─" * 50)

### 4.3 Create Snowpark Java Session

In [None]:
%%java
import com.snowflake.snowpark_java.*;
import java.util.HashMap;
import java.util.Map;

Map<String, String> props = new HashMap<>();
props.put("URL",       System.getProperty("SNOWFLAKE_URL"));
props.put("USER",      System.getProperty("SNOWFLAKE_USER"));
props.put("ROLE",      System.getProperty("SNOWFLAKE_ROLE"));
props.put("DB",        System.getProperty("SNOWFLAKE_DATABASE"));
props.put("SCHEMA",    System.getProperty("SNOWFLAKE_SCHEMA"));
props.put("WAREHOUSE", System.getProperty("SNOWFLAKE_WAREHOUSE"));
props.put("TOKEN",     System.getProperty("SNOWFLAKE_TOKEN"));
props.put("AUTHENTICATOR", System.getProperty("SNOWFLAKE_AUTH_TYPE"));

Session javaSession = Session.builder().configs(props).create();

System.out.println("Snowpark Java session created");
Row[] info = javaSession.sql("SELECT CURRENT_USER(), CURRENT_ROLE(), CURRENT_DATABASE()").collect();
System.out.println("  User:      " + info[0].getString(0));
System.out.println("  Role:      " + info[0].getString(1));
System.out.println("  Database:  " + info[0].getString(2));

### 4.4 Query Snowflake from Java

In [None]:
%%java
// Basic query
javaSession.sql("SELECT CURRENT_USER() AS \"user\", CURRENT_ROLE() AS \"role\", CURRENT_WAREHOUSE() AS \"warehouse\"").show();

In [None]:
%%java
// DataFrame operations
DataFrame df = javaSession.sql("SELECT 'Java' AS \"language\", 'Snowpark' AS \"framework\", CURRENT_TIMESTAMP() AS \"ts\"");
df.show();

In [None]:
%%java
// Show available tables
javaSession.sql("SHOW TABLES LIMIT 5").show();

### 4.5 Cross-language Data Sharing

The Python and Java Snowpark sessions are **separate connections**, so
temporary objects aren't shared. Use **transient tables** to pass data.
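The hand-off pattern can be wrapped in a small helper that produces the `CREATE` statement together with its matching `DROP`, so cleanup is not forgotten. This is a sketch only: `transient_table_statements` is a hypothetical name, and the identifier check accepts just simple unquoted names.

```python
import re

# Simple unquoted identifier: letter or underscore, then letters, digits, _ or $.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def transient_table_statements(table_name, select_sql):
    """Return (create_sql, drop_sql) for passing a query result to another session."""
    if not _IDENTIFIER.match(table_name):
        raise ValueError(f"not a simple unquoted identifier: {table_name!r}")
    query = select_sql.strip().rstrip(";")
    create_sql = f"CREATE OR REPLACE TRANSIENT TABLE {table_name} AS\n{query}"
    drop_sql = f"DROP TABLE IF EXISTS {table_name}"
    return create_sql, drop_sql


create_sql, drop_sql = transient_table_statements("java_demo", "SELECT 1 AS id;")
print(create_sql)
print(drop_sql)
```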

In [None]:
# Python: create a transient table (visible across sessions)
session.sql("""
    CREATE OR REPLACE TRANSIENT TABLE java_demo AS
    SELECT * FROM VALUES
        ('Tokyo', 'Japan', 14000000),
        ('Delhi', 'India', 11000000),
        ('Shanghai', 'China', 24000000),
        ('Sao Paulo', 'Brazil', 12300000)
        AS t(city, country, population)
""").collect()
print("Table 'java_demo' created")

In [None]:
%%java
// Java: read the table created by Python
DataFrame demo = javaSession.table("java_demo");
demo.show();
System.out.println("Row count: " + demo.count());

### 4.6 Snowpark DataFrame Interop (SQL Plan Transfer)

When `-i` or `-o` references a **Snowpark DataFrame**, the magic
transfers the SQL query plan, not the data itself. Because no rows are
copied, the hand-off cost does not depend on data size.
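As a rough illustration of what plan transfer means, the receiving language only needs the SQL text to rebuild an equivalent DataFrame against its own session. The sketch below generates such a Java statement from a SQL string. `java_dataframe_from_sql` is a hypothetical helper, and the real magic may work differently (the cleanup step in this section mentions interop views). On the Python side the SQL text could come, for example, from the DataFrame's `queries` property, a step omitted here so the sketch runs without a connection.

```python
def java_dataframe_from_sql(var_name, sql, session_var="javaSession"):
    """Generate Java source that rebuilds a DataFrame from SQL text."""
    # Escape backslashes first, then quotes; fold line breaks into \n escapes.
    escaped = sql.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\r", "").replace("\n", "\\n")
    return f'DataFrame {var_name} = {session_var}.sql("{escaped}");'


sql_text = "SELECT * FROM VALUES ('Alice', 95000) AS t(name, salary)"
print(java_dataframe_from_sql("py_df", sql_text))
```

Only the query string crosses the language boundary, and Snowflake evaluates it again when the Java side triggers an action such as `show()`.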

In [None]:
# Python → Java: push a Snowpark Python DataFrame into Java
py_df = session.sql("""
    SELECT * FROM VALUES
        ('Alice', 'Engineering', 95000),
        ('Bob', 'Marketing', 72000),
        ('Charlie', 'Engineering', 88000),
        ('Diana', 'Marketing', 81000),
        ('Eve', 'Engineering', 102000)
        AS t(name, department, salary)
""")
print(f"Python DataFrame type: {type(py_df).__name__}")
py_df.show()

In [None]:
%%java -i py_df -o result_df --time
// py_df is now a Snowpark Java DataFrame (created from the SQL plan)
import com.snowflake.snowpark_java.Functions;

DataFrame result_df = py_df
    .filter(Functions.col("SALARY").gt(Functions.lit(80000)))
    .sort(Functions.col("SALARY").desc());

System.out.println("=== High earners (salary > 80k) ===");
result_df.show();

In [None]:
# Java → Python: result_df was pulled back automatically via -o
# It's now a Snowpark Python DataFrame
print(f"Type: {type(result_df).__name__}")
result_df.show()

In [None]:
# Cleanup interop views
from scala_helpers import cleanup_interop_views
dropped = cleanup_interop_views()
print(f"Cleaned up {dropped} interop object(s)")

In [None]:
# Cleanup: drop the transient demo table
session.sql("DROP TABLE IF EXISTS java_demo").collect()
print("Table 'java_demo' dropped")

## 5. Diagnostics

In [None]:
from scala_helpers import print_diagnostics
print_diagnostics()

## 6. Java + Scala Side by Side

Both `%%java` and `%%scala` magics share the same JVM, so they can
coexist in the same notebook. Each has its own REPL namespace but
they share System properties and can exchange data through Snowflake tables.

First, create a Snowpark Scala session (the Java session was created in Section 4):

In [None]:
from scala_helpers import bootstrap_snowpark_scala
success, msg = bootstrap_snowpark_scala(session)
print(msg)

In [None]:
%%scala
// Scala: create data and write it to a table that other sessions can read
val scalaData = sfSession.sql("""
    SELECT * FROM VALUES
        ('Scala', 2004, 'Odersky'),
        ('Java', 1995, 'Gosling'),
        ('Python', 1991, 'van Rossum'),
        ('Kotlin', 2011, 'Breslav')
        AS t(language, year, creator)
""")
scalaData.write.mode(com.snowflake.snowpark.SaveMode.Overwrite).saveAsTable("_LANG_DEMO")
println("Scala wrote _LANG_DEMO table")

In [None]:
%%java
// Java: read the table written by Scala
DataFrame langs = javaSession.table("_LANG_DEMO");
System.out.println("Java reads table written by Scala:");
langs.sort(Functions.col("YEAR")).show();

In [None]:
# Python: read the same table
print("Python reads the same table:")
session.table("_LANG_DEMO").sort("YEAR").show()

In [None]:
# Cleanup
session.sql("DROP TABLE IF EXISTS _LANG_DEMO").collect()
print("Cleanup: _LANG_DEMO dropped")

## 7. Spark Connect for Java (opt-in)

When `spark_connect.enabled: true` is set in `scala_packages.yaml`, the
installer also sets up a local Spark Connect gRPC server.

The Spark Java API is available through the same Spark Connect client.
This section demonstrates using Spark SQL and the Spark DataFrame API
from Java.

### 7.1 Start Spark Connect Server

In [None]:
from scala_helpers import setup_spark_connect

sc_result = setup_spark_connect()

if sc_result['success']:
    print('Spark Connect ready!')
    print(f"  Server port:     {sc_result['server_port']}")
    print(f"  PySpark version: {sc_result.get('pyspark_version', 'N/A')}")
else:
    print('Spark Connect setup issues:')
    for e in sc_result.get('errors', []):
        print(f'  - {e}')
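The setup result reports the server port, so the connection string can be derived from it instead of being typed by hand. The helper below is a minimal sketch: `spark_connect_url` is a hypothetical name, and the fallback of 15002 simply mirrors the port used elsewhere in this notebook.

```python
def spark_connect_url(sc_result, host="localhost", default_port=15002):
    """Build an sc:// URL from a setup result dict, falling back to default_port."""
    port = int(sc_result.get("server_port") or default_port)
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"sc://{host}:{port}"


# Hand-written sample dict for illustration; this is not real setup output.
print(spark_connect_url({"success": True, "server_port": 15002}))
```

The resulting string could be pushed into Java with `-i` and passed to `.remote(...)` in place of the literal `sc://localhost:15002`.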

### 7.2 Create a Java SparkSession

In [None]:
%%java
// Create a Java SparkSession connected to the local Spark Connect server.
// The port below (15002) must match the server port reported in 7.1.
import org.apache.spark.sql.SparkSession;

SparkSession sparkJava = SparkSession.builder()
    .remote("sc://localhost:15002")
    .config("spark.sql.session.timeZone", "UTC")
    .getOrCreate();

System.out.println("Java SparkSession connected to Spark Connect");

### 7.3 Spark SQL from Java

In [None]:
%%java
// Spark SQL via the local Spark Connect proxy
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sparkDf = sparkJava.sql(
    "SELECT 1 AS id, 'hello from Java Spark' AS message"
);
sparkDf.show();

In [None]:
%%java
// Query Snowflake metadata via Spark SQL
sparkJava.sql("SELECT CURRENT_USER() AS spark_user, CURRENT_DATABASE() AS spark_db").show();

### 7.4 Spark DataFrame API from Java

In [None]:
%%java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.*;

Dataset<Row> cities = sparkJava.sql("""
    SELECT * FROM VALUES
        ('Tokyo', 'Japan', 14000000),
        ('Delhi', 'India', 11000000),
        ('Shanghai', 'China', 24000000),
        ('London', 'UK', 9000000),
        ('Paris', 'France', 2100000)
        AS t(city, country, population)
""");

System.out.println("=== Cities ===");
cities.show();

// Filter and transform
Dataset<Row> bigCities = cities
    .filter(col("POPULATION").gt(10000000))
    .withColumn("POP_MILLIONS", col("POPULATION").divide(1000000))
    .orderBy(col("POPULATION").desc());

System.out.println("=== Cities with population > 10M ===");
bigCities.show();

## Summary

This notebook demonstrated:

1. **`%%java` magic** — Execute Java code in JShell (JDK 17 built-in REPL)
2. **Python ↔ Java interop** — Push/pull variables with `-i`/`-o` flags
3. **Snowpark Java** — Full Snowpark Java API (Session, DataFrame, functions)
4. **DataFrame interop** — SQL plan transfer between Python and Java
5. **Side-by-side** — `%%java` and `%%scala` coexist in the same notebook
6. **Spark Connect** — Optional Spark Java API via Spark Connect

The `%%java` and `%%scala` magics share the same JVM instance, classpath,
and credential injection, so both languages can be used in a single
workflow.