# 👋 Cybershuttle Demo Notebook

Welcome to the Cybershuttle demo notebook! This notebook walks you through the basic steps of running a scientific or AI computation using Cybershuttle, starting locally and scaling to remote clusters.

---

**Goals of this notebook:**
- Run a simple Python script that computes a factorial
- Run code both locally and remotely
- Learn to track and retrieve results via Cybershuttle

In [1]:
# 🛠️ Install and import the Apache Airavata Python SDK, the software that powers Cybershuttle
%pip install -q --no-cache-dir --force-reinstall airavata-python-sdk[notebook]
import airavata_jupyter_magic


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyopenssl 23.2.0 requires cryptography!=40.0.0,!=40.0.1,<42,>=38.0.0, but you have cryptography 44.0.2 which is incompatible.
Note: you may need to restart the kernel to use updated packages.

Loaded airavata_jupyter_magic (2.0.12) 
(current runtime = local)

  %authenticate                      -- Authenticate to access high-performance runtimes.
  %request_runtime <rt> [args]       -- Request a runtime named <rt> with configuration <args>. Call multiple times to request multiple runtimes.
  %restart_runtime <rt>              -- Restart runtime <rt>. Run this if you install new dependencies or if the runtime hangs.
  %stop_runtime <rt>                 -- Stop runtime <rt> when no longer needed.
  %switch_runtime <rt>               -- Switch active runtime to <rt>. All subsequent executions will 

## 💻 Step 1: Run a simple script locally on the Cybershuttle hub

The same example can also be run on your own machine.

In [2]:
# factorial example
import socket
import time
import math

# Input value
n = 10  # You can change this to any integer

# Compute factorial
result = math.factorial(n)

# Metadata
hostname = socket.gethostname()
timestamp = time.strftime('%Y-%m-%d %H:%M:%S')

# Output
print("===== Job Metadata =====")
print(f"Hostname   : {hostname}")
print(f"Timestamp  : {timestamp}")
print("========================\n")
print(f"{n}! = {result}")

===== Job Metadata =====
Hostname   : 64e7c208d4e6
Timestamp  : 2025-04-24 22:10:37
========================

10! = 3628800


## 🚀 Step 3: Authenticate with Cybershuttle and request a remote runtime

In [3]:
%authenticate
%request_runtime hpc_cpu --file=cybershuttle.yml --walltime=60 --use=NeuroData25VC1:cloud,expanse:shared,anvil:shared
%switch_runtime hpc_cpu

Output()

Requesting runtime=hpc_cpu
cpuCount: 4
experimentName: CS_Agent
group: Default
libraries:
- python=3.10
- pip
memory: 0
mounts:
- cybershuttle-reference:/cybershuttle_data/cybershuttle-reference
nodeCount: 1
pip: []
queue: cloud
remoteCluster: NeuroData25VC1
wallTime: 60

Requested runtime=hpc_cpu. state=EXECUTING
Switched to runtime=hpc_cpu.

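To confirm that cells now execute on the remote runtime, it can help to rerun the metadata snippet from Step 1 and compare hostnames. The sketch below wraps that snippet in a helper; the name `run_factorial_job` is illustrative and is not part of Cybershuttle or Apache Airavata.

```python
import math
import socket
import time


def run_factorial_job(n: int) -> dict:
    """Compute n! and record where and when the computation ran."""
    return {
        "hostname": socket.gethostname(),
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "n": n,
        "result": math.factorial(n),
    }


job = run_factorial_job(10)
print(f"Hostname   : {job['hostname']}")
print(f"Timestamp  : {job['timestamp']}")
print(f"{job['n']}! = {job['result']}")
```

If the active runtime is remote, the printed hostname should differ from the one shown in Step 1, while the factorial result stays the same.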

## 3.1: Install dependencies on the remote runtime

In [4]:
%pip install streamlit transformers torch sentencepiece scikit-learn accelerate vaderSentiment

Connecting to=hpc_cpu... status=CONNECTED
Successfully installed accelerate-1.6.0 altair-5.5.0 attrs-25.3.0 cachetools-5.5.2 certifi-2025.1.31 charset-normalizer-3.4.1 filelock-3.18.0 fsspec-2025.3.2 gitdb-4.0.12 gitpython-3.1.44 huggingface-hub-0.30.2 idna-3.10 joblib-1.4.2 jsonschema-4.23.0 jsonschema-specifications-2025.4.1 mpmath-1.3.0 narwhals-1.36.0 networkx-3.4.2 numpy-2.2.5 nvidia-cublas-cu12-12.6.4.1 nvidia-cuda-cupti-cu12-12.6.80 nvidia-cuda-nvrtc-cu12-12.6.77 nvidia-cuda-runtime-cu12-12.6.77 nvidia-cudnn-cu12-9.5.1.17 nvidia-cufft-cu12-11.3.0.4 nvidia-cufile-cu12-1.11.1.6 nvidia-curand-cu12-10.3.7.77 nvidia-cusolver-cu12-11.7.1.2 nvidia-cusparse-cu12-12.5.4.2 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.6.85 nvidia-nvtx-cu12-12.6.77 packaging-24.2 pandas-2.2.3 pillow-11.2.1 protobuf-5.29.4 pyarrow-19.0.1 pydeck-0.9.1 pytz-2025.2 pyyaml-6.0.2 referencing-0.36.2 regex-2024.11.6 requests-2.32.3 rpds-py-0.24.0 safete

In [5]:
%pip install hf_xet pyngrok

Connecting to=hpc_cpu... status=CONNECTED
Collecting hf_xet
  Using cached hf_xet-1.0.4-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (494 bytes)
Collecting pyngrok
  Using cached pyngrok-7.2.5-py3-none-any.whl.metadata (8.9 kB)
Using cached hf_xet-1.0.4-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (54.0 MB)
Using cached pyngrok-7.2.5-py3-none-any.whl (23 kB)
Installing collected packages: pyngrok, hf_xet
Successfully installed hf_xet-1.0.4 pyngrok-7.2.5
Note: you may need to restart the kernel to use updated packages.


In [6]:
%pip install ipywidgets

Connecting to=hpc_cpu... status=CONNECTED
Collecting ipywidgets
  Using cached ipywidgets-8.1.6-py3-none-any.whl.metadata (2.4 kB)
Collecting widgetsnbextension~=4.0.14 (from ipywidgets)
  Using cached widgetsnbextension-4.0.14-py3-none-any.whl.metadata (1.6 kB)
Collecting jupyterlab_widgets~=3.0.14 (from ipywidgets)
  Using cached jupyterlab_widgets-3.0.14-py3-none-any.whl.metadata (4.1 kB)
Using cached ipywidgets-8.1.6-py3-none-any.whl (139 kB)
Using cached jupyterlab_widgets-3.0.14-py3-none-any.whl (213 kB)
Using cached widgetsnbextension-4.0.14-py3-none-any.whl (2.2 MB)
Installing collected packages: widgetsnbextension, jupyterlab_widgets, ipywidgets
Successfully installed ipywidgets-8.1.6 jupyterlab_widgets-3.0.14 widgetsnbextension-4.0.14
Note: you may need to restart the kernel to use updated packages.


## 📡 Step 4: Write and run code as you would locally

Cybershuttle moves the required code and data to the remote runtime and executes it there.

In [18]:
c = """from transformers import pipeline
import numpy as np
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Initialize both models
hf_classifier = pipeline(
    "text-classification",
    model="finiteautomata/bertweet-base-sentiment-analysis",
    top_k=None
)
vader = SentimentIntensityAnalyzer()

def classify_sentiment(input_text):
    
    #Hybrid sentiment analysis combining transformer models with VADER intensity analysis
    #Returns formatted string with nuanced sentiment assessment
    
    try:
        # Get HuggingFace predictions
        hf_results = hf_classifier(input_text, truncation=True)[0]
        pos_score = next(r['score'] for r in hf_results if r['label'] == 'POS')
        neg_score = next(r['score'] for r in hf_results if r['label'] == 'NEG')
        
        # Get VADER intensity scores
        vader_scores = vader.polarity_scores(input_text)
        
        # Combined weighted score (70% HF, 30% VADER)
        combined_pos = (pos_score * 0.7) + (vader_scores['pos'] * 0.3)
        combined_neg = (neg_score * 0.7) + (vader_scores['neg'] * 0.3)
        
        # Determine final sentiment
        if combined_pos > combined_neg:
            sentiment = "POSITIVE"
            base_confidence = combined_pos
            intensity = vader_scores['pos']
        else:
            sentiment = "NEGATIVE"
            base_confidence = combined_neg
            intensity = vader_scores['neg']
        
        # Dynamic confidence adjustment based on intensity
        adjusted_confidence = min(base_confidence * (1 + intensity), 0.99)
        
        # Strength classification with wider bands
        strength_ranges = [
            (0.9, "Extremely"),
            (0.8, "Very"),
            (0.7, "Strongly"),
            (0.6, "Fairly"),
            (0.5, "Moderately"),
            (0.4, "Somewhat"),
            (0, "Slightly")
        ]
        
        strength = next(
            desc for threshold, desc in strength_ranges 
            if adjusted_confidence >= threshold
        )
        
        # Add intensity qualifiers
        modifiers = {
            "Extremely": "!",
            "Very": "!",
            "Strongly": "",
            "Fairly": "",
            "Moderately": " (somewhat)",
            "Somewhat": " (mildly)",
            "Slightly": " (barely)"
        }
        
        return (
            f"{strength} {sentiment}{modifiers[strength]} "
            f"(Confidence: {adjusted_confidence:.0%})"
        )
        
    except Exception as e:
        return f"Analysis error: {str(e)}"
"""
# The with block closes the file, so no explicit f.close() is needed
with open("classify.py", "w") as f:
    f.write(c)

print("testing")

Connecting to=hpc_cpu... status=CONNECTED
testing

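The scoring logic in `classify.py` can be explored without loading either model. The following sketch reproduces the 70/30 weighting, the intensity adjustment, and the strength lookup from that file. The helper names `combine` and `strength_label` are hypothetical, and the input scores are made up for illustration; they are not real model outputs.

```python
HF_WEIGHT = 0.7     # weight of the transformer score, as in classify.py
VADER_WEIGHT = 0.3  # weight of the VADER score, as in classify.py

# Thresholds are checked from highest to lowest; the first match wins
STRENGTH_BANDS = [
    (0.9, "Extremely"),
    (0.8, "Very"),
    (0.7, "Strongly"),
    (0.6, "Fairly"),
    (0.5, "Moderately"),
    (0.4, "Somewhat"),
    (0.0, "Slightly"),
]


def combine(hf_score, vader_score):
    """Weighted average of the two model scores."""
    return hf_score * HF_WEIGHT + vader_score * VADER_WEIGHT


def strength_label(confidence):
    """Return the label of the first band whose threshold is met."""
    return next(label for threshold, label in STRENGTH_BANDS
                if confidence >= threshold)


# Illustrative scores only
hf_pos, vader_pos = 0.90, 0.50
combined_pos = combine(hf_pos, vader_pos)

# Boost by VADER intensity, capped at 0.99 as in classify.py
adjusted = min(combined_pos * (1 + vader_pos), 0.99)

print(f"combined={combined_pos:.2f} adjusted={adjusted:.2f} "
      f"label={strength_label(adjusted)}")
```

Because the adjustment multiplies by `1 + intensity`, strongly worded inputs reach the 0.99 cap quickly, which is worth keeping in mind when reading the reported confidence.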

In [19]:
t = """from transformers import MarianMTModel, MarianTokenizer

# Function to get the model name based on the source and target language
def get_model_name(source_language, target_language):
    return f"Helsinki-NLP/opus-mt-{source_language}-{target_language}"

# Function to perform translation
def translate_text(input_text, source_language='en', target_language='es'):
    model_name = get_model_name(source_language, target_language)
    
    # Load the MarianMT model and tokenizer for the specific language pair
    model = MarianMTModel.from_pretrained(model_name)
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    
    # Tokenize the input text, truncating it to at most 512 tokens
    input_ids = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True, padding="max_length")
    
    # Generate the translation
    translated_ids = model.generate(input_ids, max_length=150, num_beams=4, early_stopping=True)
    
    # Decode the translated output
    translation = tokenizer.decode(translated_ids[0], skip_special_tokens=True)
    
    return translation
"""
# Write the translation module source held in `t`
with open("translate.py", "w") as f:
    f.write(t)

print("testing")

Connecting to=hpc_cpu... status=CONNECTED
testing

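Writing module source from a string literal makes it easy to ship the wrong text or to break escape sequences such as `"\n"`. One way to catch this early, sketched below, is to compile the source before writing it. The helper `write_module` and the file names are hypothetical and are not part of Cybershuttle.

```python
import tempfile
from pathlib import Path


def write_module(path, source):
    """Check that `source` is valid Python, then write it to `path`."""
    compile(source, str(path), "exec")  # raises SyntaxError on invalid source
    target = Path(path)
    target.write_text(source, encoding="utf-8")
    return target


# Raw string: the backslash-n inside the module source is kept literally
good = r'''def first_line(text):
    return text.split("\n")[0]
'''

# Non-raw string: the escape becomes a real line break inside the quotes
bad = '''def first_line(text):
    return text.split("\n")[0]
'''

rejected = False
with tempfile.TemporaryDirectory() as tmp:
    module_path = write_module(Path(tmp) / "demo_module.py", good)
    written = module_path.read_text(encoding="utf-8")
    try:
        write_module(Path(tmp) / "broken_module.py", bad)
    except SyntaxError:
        rejected = True

print(f"good source written: {written == good}")
print(f"bad source rejected: {rejected}")
```

Using a raw string (`r"""..."""`) for module source that contains backslash escapes avoids the problem in the first place.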

In [22]:
q = """from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-base"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def answer_question(context, question):
    
    #Enhanced question answering with T5
    #Args:
    #    context: Background information text (1-3 sentences work best)
    #    question: Clear question about the context
    #Returns:
    #    Concise answer extracted from context
    
    # Improved input formatting
    input_text = f"answer question based on context: {question} context: {context}"
    
    # Better tokenization with attention to question-context balance
    input_ids = tokenizer.encode(
        input_text,
        return_tensors="pt",
        max_length=512,
        truncation=True,
        padding="max_length"  # Helps with consistency
    )
    
    # Optimized generation parameters
    answer_ids = model.generate(
        input_ids,
        max_length=100,        # More concise answers
        min_length=5,          # Avoid empty answers
        num_beams=5,           # Better quality than 4 beams
        early_stopping=True,
        repetition_penalty=2.5, # Reduce repeated phrases
        length_penalty=1.5,     # Values > 0 favor longer sequences in beam search
        no_repeat_ngram_size=3, # Prevent repeated 3-grams
        # temperature is omitted: it only takes effect when do_sample=True
    )
    
    # Improved decoding
    answer = tokenizer.decode(
        answer_ids[0],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True
    )
    
    # Post-processing for better results
    answer = answer.split(".")[0]  # Take the first complete thought
    answer = answer.strip()
    
    return answer if answer else "I couldn't find an answer in the context."
"""
# Write the question-answering module source held in `q`
with open("qa.py", "w") as f:
    f.write(q)

print("testing")

Connecting to=hpc_cpu... status=CONNECTED
testing


In [23]:
s = """from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-base"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def summarize_text(input_text):
    # Preprocess input
    input_text = input_text.strip()
    if len(input_text.split()) < 15:  # Minimum words needed for good summary
        return "Input too short - please provide at least 15-20 words for meaningful summarization."
    
    # Format for T5 (crucial!)
    input_text = "summarize: " + input_text
    
    # Tokenize with better truncation
    input_ids = tokenizer.encode(
        input_text,
        return_tensors="pt",
        max_length=512,
        truncation=True,
        padding="max_length"  # Helps with short texts
    )
    
    # Generate with adjusted parameters
    summary_ids = model.generate(
        input_ids,
        max_length=100,       # Reduced from 150
        min_length=30,        # Reduced from 50
        length_penalty=3.0,   # Values > 0 favor longer sequences in beam search
        num_beams=6,          # Increased from 4
        early_stopping=True,
        no_repeat_ngram_size=3  # Prevents word repetition
    )
    
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    
    # Post-process output
    if summary.lower() == input_text[11:].lower():  # If output == input
        return "Summary failed (input may be too short or unclear). Try with longer text."
    
    return summary
"""
# Write the summarization module source held in `s`
with open("summarize.py", "w") as f:
    f.write(s)

print("testing")

Connecting to=hpc_cpu... status=CONNECTED
testing


In [36]:
# Raw string, so that "\n" inside the app source is written to app.py literally
code = r"""import streamlit as st
from summarize import summarize_text
from translate import translate_text
from qa import answer_question
from classify import classify_sentiment
# Language mapping dictionary
LANGUAGE_MAP = {
    "en": "English",
    "es": "Español (Spanish)",
    "fr": "Français (French)",
    "de": "Deutsch (German)",
    "it": "Italiano (Italian)",
    "pt": "Português (Portuguese)",
    "ja": "日本語 (Japanese)",
    "zh": "中文 (Chinese)"
}

def validate_input(task, input_text):
    if not input_text.strip():
        raise ValueError("Input text cannot be empty!")
    
    if task == "Answer Question":
        lines = input_text.strip().split("\n")
        if len(lines) < 2:
            raise ValueError(
                "For 'Answer Question', input must have:\n"
                "Line 1: Context (text with the answer)\n"
                "Line 2: Question"
            )
    return True

# --- Streamlit UI ---
st.set_page_config(page_title="AI NLP Tool", layout="centered")
st.title("🧠 AI NLP Tool")

# Task selection
task = st.selectbox("Select Task:", ["Summarize", "Translate", "Answer Question", "Classify"])

# Dynamic help text
if task == "Answer Question":
    st.info("ℹ️ For 'Answer Question', enter context (line 1) and question (line 2).")
elif task == "Translate":
    st.info("ℹ️ Enter text and select languages from the dropdown menus.")
else:
    st.info("ℹ️ Enter text and click 'Run Task'.")

# Text input
input_text = st.text_area("Enter Input Text:", height=200)

# Translation language selectors
if task == "Translate":
    language_options = [f"{code} - {name}" for code, name in LANGUAGE_MAP.items()]
    source_lang = st.selectbox("From:", language_options, index=0)
    target_lang = st.selectbox("To:", language_options, index=1)

# Run task
if st.button("Run Task"):
    try:
        # Validate input
        validate_input(task, input_text)

        if task == "Summarize":
            result = summarize_text(input_text)
        elif task == "Translate":
            source_code = source_lang.split(" - ")[0]
            target_code = target_lang.split(" - ")[0]
            result = translate_text(input_text, source_code, target_code)
        elif task == "Answer Question":
            lines = input_text.strip().split("\n")
            context, question = lines[0], lines[1]
            result = answer_question(context, question)
        elif task == "Classify":
            result = classify_sentiment(input_text)
        else:
            result = "Unknown task."

        st.success("✅ Task Completed")
        st.text_area("Result:", value=result, height=200)

    except Exception as e:
        error_messages = {
            "ValueError": str(e),
            "RuntimeError": "Model failed to process. Try shorter text.",
            "IndexError": "For 'Answer Question', provide both context and question.",
        }
        error_msg = error_messages.get(type(e).__name__, f"An error occurred: {str(e)}")
        st.error(error_msg)
"""
# The with block closes the file, so no explicit f.close() is needed
with open("app.py", "w") as f:
    f.write(code)

print("testing")

Connecting to=hpc_cpu... status=CONNECTED
testing

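The input handling in `app.py` (splitting the context from the question, and pulling the language code out of a dropdown label) can be checked without starting Streamlit. This is a minimal sketch with hypothetical helper names, `split_context_and_question` and `language_code`. Unlike `app.py`, it skips blank lines between the context and the question, which is an assumption about the desired behavior.

```python
def split_context_and_question(input_text):
    """Return (context, question) from text with one item per line."""
    lines = [line.strip() for line in input_text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError(
            "Provide the context on line 1 and the question on line 2."
        )
    return lines[0], lines[1]


def language_code(option):
    """Extract the code from a label such as 'es - Español (Spanish)'."""
    return option.split(" - ", 1)[0]


context, question = split_context_and_question(
    "Cybershuttle runs notebooks on remote clusters.\n"
    "\n"
    "Where does Cybershuttle run notebooks?"
)
print(context)
print(question)
print(language_code("es - Español (Spanish)"))
```

Keeping these helpers as plain functions makes them easy to test separately from the UI code.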

In [39]:
import socket
import os
import time

from IPython.display import IFrame

# Check ports in order and return the first one with nothing listening on it
def find_available_port(start_port=8501, max_tries=100):
    for port in range(start_port, start_port + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            # connect_ex returns 0 when something accepts the connection
            if s.connect_ex(('localhost', port)) != 0:
                return port
    raise RuntimeError("No available ports found!")

# Find an available port
port = find_available_port()

# Create the command to run Streamlit on the available port
streamlit_command = f"streamlit run app.py --server.port {port} --server.headless true --server.enableCORS false > streamlit.log 2>&1 &"

# Run the Streamlit app in the background
os.system(streamlit_command)

print(f"Streamlit app is running on port {port}.")


hostname = socket.gethostname()
timestamp = time.strftime('%Y-%m-%d %H:%M:%S')
print("===== Job Metadata =====")
print(f"Hostname   : {hostname}")
print(f"Timestamp  : {timestamp}")
print("========================\n")
# Streamlit serves plain HTTP by default, so use http:// rather than https://
IFrame(src=f"http://localhost:{port}", width=900, height=600)

Connecting to=hpc_cpu... status=CONNECTED
Streamlit app is running on port 8510.
===== Job Metadata =====
Hostname   : nsworkshopcpuvc1-compute-1.novalocal
Timestamp  : 2025-04-24 22:45:44
<IPython.lib.display.IFrame at 0x7f48c59065f0>

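An alternative to probing ports one by one is to let the operating system pick a free port by binding to port 0. The sketch below shows that technique. The name `find_free_port` is hypothetical, and this is not what the cell above does. Another process could still claim the port between this call and the start of Streamlit, so treat the result as a best guess.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))           # port 0 means "any free port"
        return s.getsockname()[1]   # the port the OS actually assigned


port = find_free_port()
print(f"Free port: {port}")
```

The returned value could then be passed to `--server.port` in the same way as the port found by `find_available_port`.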

## ✅ That's it!

You've now used Cybershuttle to run code both locally and on a remote runtime from the same notebook. You can use this pattern to scale your research workflows.

---

### 🔗 Resources:
- [Cybershuttle](https://cybershuttle.org)
- [Cybershuttle GitHub](https://github.com/cyber-shuttle)

In [38]:
!tail -n 10 streamlit.log

Connecting to=hpc_cpu... status=CONNECTED
Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8509
  Network URL: http://10.0.6.217:8509
  External URL: http://149.165.159.166:8509

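The URLs can also be read from the log programmatically, for example to build a link in a later cell. The following sketch parses the three URL lines shown above. The function name `parse_streamlit_urls` is hypothetical, and the pattern assumes the log keeps the `Local URL` / `Network URL` / `External URL` layout seen in this output.

```python
import re

# Matches lines such as "  Local URL: http://localhost:8509"
URL_LINE = re.compile(r"^\s*(Local|Network|External) URL:\s*(\S+)\s*$")


def parse_streamlit_urls(log_text):
    """Return a dict mapping 'Local', 'Network', 'External' to their URLs."""
    urls = {}
    for line in log_text.splitlines():
        match = URL_LINE.match(line)
        if match:
            urls[match.group(1)] = match.group(2)
    return urls


# Sample text copied from the log output above
sample = """
  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8509
  Network URL: http://10.0.6.217:8509
  External URL: http://149.165.159.166:8509
"""

urls = parse_streamlit_urls(sample)
for kind, url in urls.items():
    print(f"{kind}: {url}")
```

In the notebook, the same function could be applied to the contents of `streamlit.log` instead of the sample string.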