# Migrating from Arize Phoenix to LangSmith



## Migrating Resources

This repo contains scripts to migrate your resources from Arize Phoenix to LangSmith.

This includes:
- Datasets
- Prompts
- Recent Traces

To migrate your resources over, refer to ```providers/phoenix/main.py```. Specific scripts for each are provided in the ```providers/phoenix/data``` directory.


## Updating Code

As part of migrating to LangSmith, you will also need to update your instrumentation code.

In the following sections we break down some common patterns used in Arize Phoenix, and their equivalent implementation in LangSmith. Not all features are shared, but common constructs are available across both frameworks.

**Note:** The examples below require API keys to be configured in your `.env` file:
- `OPENAI_API_KEY` - for OpenAI examples
- `PHOENIX_API_KEY` - for Phoenix examples  
- `LANGSMITH_API_KEY` - for LangSmith examples

First, let's load in our environment variables.


In [1]:
import os
os.environ["LANGSMITH_PROJECT"] = "default"

from dotenv import load_dotenv
load_dotenv("../../.env", override=True)


True
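
Before running the examples, it can help to confirm that the keys listed in the note above were actually loaded. The helper below is a minimal sketch; `missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration and are not part of either SDK.

```python
import os

# The three keys named in the note above
REQUIRED_KEYS = ("OPENAI_API_KEY", "PHOENIX_API_KEY", "LANGSMITH_API_KEY")

def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Example with an explicit mapping instead of the real environment
print(missing_keys({"OPENAI_API_KEY": "sk-example"}))
# prints ['PHOENIX_API_KEY', 'LANGSMITH_API_KEY']
```

Calling `missing_keys()` with no arguments checks the live process environment, so it can be run right after `load_dotenv`.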

### **Tracing**


#### OpenTelemetry Auto-Instrumentation

Phoenix uses OpenTelemetry for tracing, with auto-instrumentation enabled via ```phoenix.otel.register()```. With ```auto_instrument=True```, this instruments OpenAI, LangChain, LlamaIndex, and other frameworks whose OpenInference instrumentation packages are installed.


In [2]:
import os
from phoenix.otel import register
from openai import OpenAI

# Get your Phoenix space from env (e.g., "christine")
PHOENIX_SPACE = os.getenv("PHOENIX_SPACE", "")

# Register Phoenix tracing with auto-instrumentation
tracer_provider = register(
    project_name="my-llm-project",
    endpoint=f"https://app.phoenix.arize.com/s/{PHOENIX_SPACE}/v1/traces",
    auto_instrument=True,
)

# Your code is automatically traced
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)




🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: my-llm-project
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/s/christine/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {'authorization': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



LangSmith provides automatic tracing via ```wrap_openai``` or the ```@traceable``` decorator. No explicit registration is needed: tracing is configured through environment variables (```LANGSMITH_TRACING=true``` and ```LANGSMITH_API_KEY```).


In [3]:
from langsmith.wrappers import wrap_openai
from openai import OpenAI

# Wrap OpenAI client for automatic tracing
client = wrap_openai(OpenAI())

# All OpenAI calls are now traced to LangSmith
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)


#### Manual Spans with OpenInference

Phoenix uses OpenTelemetry spans with OpenInference semantic conventions for manual instrumentation.


In [4]:
from opentelemetry import trace
from openinference.semconv.trace import SpanAttributes

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("my-custom-span") as span:
    span.set_attribute(SpanAttributes.INPUT_VALUE, "user input here")
    # Your processing logic here
    result = "processed result"
    span.set_attribute(SpanAttributes.OUTPUT_VALUE, result)


LangSmith uses the ```@traceable``` decorator or the ```ls.trace()``` context manager for custom spans. With ```@traceable```, function inputs and outputs are captured automatically; with ```ls.trace()```, you supply them yourself.


In [7]:
from langsmith import traceable
import langsmith as ls

# Using the @traceable decorator
@traceable
def my_custom_function(user_input: str) -> str:
    # Input/output automatically captured
    return f"processed: {user_input}"

my_custom_function("hello")

# Or using the context manager, passing inputs and recording outputs explicitly
with ls.trace(name="my-custom-span", inputs={"input": "user input here"}) as run:
    result = "processed result"
    run.end(outputs={"result": result})


#### OpenTelemetry

If you prefer to keep using OpenTelemetry, LangSmith supports [OTel tracing natively](https://docs.smith.langchain.com/observability/how_to_guides/tracing/trace_with_opentelemetry).

You'll switch from the Phoenix OTLP endpoint to LangSmith's OTLP endpoint.
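
As a rough sketch of that switch, the helper below builds the standard OpenTelemetry exporter environment variables (`OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_HEADERS`). The endpoint URL and the `x-api-key` / `Langsmith-Project` header names are assumptions that should be checked against the guide linked above, and `langsmith_otlp_env` is a hypothetical helper name.

```python
def langsmith_otlp_env(api_key, project=None,
                       endpoint="https://api.smith.langchain.com/otel"):
    """Build OTLP exporter environment variables pointing at LangSmith.

    The default endpoint and the header names are assumptions; confirm
    them in the LangSmith OpenTelemetry guide before relying on them.
    """
    headers = [f"x-api-key={api_key}"]
    if project:
        headers.append(f"Langsmith-Project={project}")
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_HEADERS": ",".join(headers),
    }

# Print the variables in KEY=value form, e.g. for pasting into a .env file
env = langsmith_otlp_env("<your-langsmith-api-key>", project="my-llm-project")
for name, value in env.items():
    print(f"{name}={value}")
```

Because an OpenTelemetry exporter reads these variables at startup, existing span code such as the `tracer.start_as_current_span` example should not need to change; only the destination does.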


### **Evaluations**


#### Datasets

Phoenix allows you to create datasets and upload examples using the SDK.


In [6]:
import os
import pandas as pd
from phoenix.client import Client

# Configure client for Phoenix Cloud
PHOENIX_API_KEY = os.getenv("PHOENIX_API_KEY")
PHOENIX_SPACE = os.getenv("PHOENIX_SPACE", "")
base_url = f"https://app.phoenix.arize.com/s/{PHOENIX_SPACE}" if PHOENIX_SPACE else "https://app.phoenix.arize.com"

px_client = Client(base_url=base_url, api_key=PHOENIX_API_KEY)

# Create dataset from DataFrame
queries = [
    "What is the capital of France?",
    "What is the capital of Germany?",
]
responses = [
    "Paris",
    "Berlin",
]
dataset_df = pd.DataFrame(data={"query": queries, "response": responses})

dataset = px_client.datasets.create_dataset(
    dataframe=dataset_df,
    name="basic",
    input_keys=["query"],
    output_keys=["response"],
)


LangSmith also lets you create datasets through its SDK.


In [7]:
from langsmith import Client

client = Client()

# Create a dataset
examples = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "What is the capital of Germany?", "expected_output": "Berlin"}
]

dataset_name = "basic"

if not client.has_dataset(dataset_name=dataset_name):
    langsmith_dataset = client.create_dataset(dataset_name=dataset_name)
    client.create_examples(
        inputs=[{"text": ex["input"]} for ex in examples],
        outputs=[{"text": ex["expected_output"]} for ex in examples],
        dataset_id=langsmith_dataset.id
    )
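
If the data already lives in a Phoenix-style table, the column split expressed by `input_keys` and `output_keys` can be reproduced before calling `create_examples`. The sketch below assumes the rows are plain dicts (for instance from `dataset_df.to_dict("records")`); `rows_to_examples` is a hypothetical helper, not an SDK function.

```python
def rows_to_examples(rows, input_keys, output_keys):
    """Split flat records into parallel lists of input and output dicts."""
    inputs = [{k: row[k] for k in input_keys} for row in rows]
    outputs = [{k: row[k] for k in output_keys} for row in rows]
    return inputs, outputs

# Records shaped like the Phoenix DataFrame in the earlier example
rows = [
    {"query": "What is the capital of France?", "response": "Paris"},
    {"query": "What is the capital of Germany?", "response": "Berlin"},
]
inputs, outputs = rows_to_examples(rows, input_keys=["query"], output_keys=["response"])
print(inputs[0], outputs[0])
# prints {'query': 'What is the capital of France?'} {'response': 'Paris'}
```

The resulting lists can be passed as `inputs=` and `outputs=` to `client.create_examples`. Note that this keeps the Phoenix key names (`query`, `response`) rather than the `text` key used in the cell above, so the task and evaluators must read whichever keys you choose.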


#### Experiments

Running experiments with Phoenix is done through ```run_experiment```.


In [19]:
import os
from phoenix.client import Client
from openai import OpenAI

# Configure client for Phoenix Cloud
PHOENIX_API_KEY = os.getenv("PHOENIX_API_KEY")
PHOENIX_SPACE = os.getenv("PHOENIX_SPACE", "")
base_url = f"https://app.phoenix.arize.com/s/{PHOENIX_SPACE}" if PHOENIX_SPACE else "https://app.phoenix.arize.com"

px_client = Client(base_url=base_url, api_key=PHOENIX_API_KEY)
openai_client = OpenAI()

# Load the dataset we created earlier
dataset = px_client.datasets.get_dataset(dataset="basic")

# Define your task - takes input dict, returns output
def task(x):
    question = x["query"]
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}]
    )
    return {"answer": response.choices[0].message.content}

# Define evaluators - simple functions that take output and return bool
def has_answer(output) -> bool:
    return bool(output.get("answer"))

def answer_not_empty(output) -> bool:
    return len(output.get("answer", "")) > 0

# Run experiment using client.experiments.run_experiment()
experiment = px_client.experiments.run_experiment(
    dataset=dataset,
    task=task,
    evaluators=[has_answer, answer_not_empty]
)


🧪 Experiment started.
📺 View dataset experiments: https://app.phoenix.arize.com/s/christine/datasets/RGF0YXNldDo0/experiments
🔗 View this experiment: https://app.phoenix.arize.com/s/christine/datasets/RGF0YXNldDo0/compare?experimentId=RXhwZXJpbWVudDoy


running tasks |██████████| 2/2 (100.0%) | ⏳ 00:01<00:00 |  1.51it/s


✅ Task runs completed.
🧠 Evaluation started.


running experiment evaluations |██████████| 4/4 (100.0%) | ⏳ 00:00<00:00 |  9.11it/s

Experiment completed: 2 task runs, 2 evaluator runs, 4 evaluations





The LangSmith equivalent is ```client.evaluate()```.


In [20]:
from langsmith import Client
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = Client()
dataset = client.read_dataset(dataset_name="basic")

# Wrap OpenAI client for tracing
openai_client = wrap_openai(OpenAI())

# Define your task function
def my_task(inputs: dict) -> dict:
    question = inputs["text"]
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return {"output": response.choices[0].message.content}

# Define evaluation functions
def accuracy_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    output = outputs.get("output", "")
    expected = reference_outputs.get("text", "")
    if expected and expected.lower() in output.lower():
        return {"key": "accuracy", "score": 1.0, "comment": "Correct answer found"}
    return {"key": "accuracy", "score": 0.0, "comment": "Incorrect answer"}

def length_evaluator(inputs: dict, outputs: dict) -> dict:
    output = outputs.get("output", "")
    return {"key": "response_length", "score": len(output), "comment": f"Response has {len(output)} characters"}

# Run experiment
result = client.evaluate(
    my_task,
    data=dataset.id,
    evaluators=[accuracy_evaluator, length_evaluator],
    experiment_prefix="capital-cities-eval"
)


View the evaluation results for experiment: 'capital-cities-eval-c67d8c74' at:
https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/04f29335-5da4-40a2-95d5-04edef724598/compare?selectedSessions=c9d0dca7-c114-484a-9eca-2b7b5d87803d




2it [00:02,  1.00s/it]
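
The Phoenix evaluators shown earlier return a bare `bool`, while the LangSmith evaluators above return a dict with `key` and `score`. One way to bridge the two is a small wrapper, sketched below. `as_scored_evaluator` is a hypothetical helper, and the sketch assumes LangSmith accepts an evaluator that declares only an `outputs` argument.

```python
def as_scored_evaluator(check, key):
    """Wrap a bool-returning check so it returns a {"key", "score"} dict."""
    def evaluator(outputs: dict) -> dict:
        return {"key": key, "score": 1.0 if check(outputs) else 0.0}
    evaluator.__name__ = key  # gives the wrapped function a readable name
    return evaluator

# Phoenix-style check, adjusted to read the "output" field returned by my_task
def has_answer(output) -> bool:
    return bool(output.get("output"))

has_answer_eval = as_scored_evaluator(has_answer, key="has_answer")
print(has_answer_eval({"output": "Paris"}))
# prints {'key': 'has_answer', 'score': 1.0}
```

Under that assumption, `has_answer_eval` could be added to the `evaluators` list alongside `accuracy_evaluator` and `length_evaluator`.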


### **Prompts**


Phoenix and LangSmith both have prompting interfaces in the UI and the SDK.

In Phoenix, prompts are typically created in the **Phoenix UI** (Prompt Playground), then retrieved via the SDK using ```client.prompts.get()```. You can also create prompts programmatically using ```client.prompts.create()```.


In [27]:
import os
from phoenix.client import Client
from phoenix.client.types import PromptVersion

# Configure client for Phoenix Cloud
PHOENIX_API_KEY = os.getenv("PHOENIX_API_KEY")
PHOENIX_SPACE = os.getenv("PHOENIX_SPACE", "")
base_url = f"https://app.phoenix.arize.com/s/{PHOENIX_SPACE}" if PHOENIX_SPACE else "https://app.phoenix.arize.com"

# Instantiate the Phoenix client used by the prompt calls below
client = Client(base_url=base_url, api_key=PHOENIX_API_KEY)

# Create a prompt programmatically with PromptVersion
prompt_name = "movie-critic"
prompt = client.prompts.create(
    name=prompt_name,
    version=PromptVersion(
        [
            {"role": "system", "content": "You are a {{ criticlevel }} movie critic"},
            {"role": "user", "content": "Do you like {{ movie }}?"}
        ],
        model_name="gpt-4o-mini",
    ),
)

# Get an existing prompt by name (fetches latest version by default)
prompt = client.prompts.get(prompt_identifier=prompt_name)

# Or get a specific tagged version (e.g., "production")
# prompt = client.prompts.get(prompt_identifier=prompt_name, tag="production")

# Format the prompt with variables and use with OpenAI
from openai import OpenAI

prompt_vars = {"criticlevel": "harsh", "movie": "Inception"}
formatted_prompt = prompt.format(variables=prompt_vars)

print(formatted_prompt)

OpenAIPrompt(messages=[{'role': 'system', 'content': 'You are a harsh movie critic'}, {'role': 'user', 'content': 'Do you like Inception?'}], kwargs={'model': 'gpt-4o-mini'})


LangSmith has ```push_prompt``` and ```pull_prompt``` functions in the SDK for managing prompts.


In [11]:
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

client = Client()

# Create a prompt with model binding
model = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate([
    ("system", "You are a {criticlevel} movie critic"),
    ("human", "Do you like {movie}?")
])
chain = prompt | model

# Push to LangSmith
client.push_prompt("movie-critic", object=chain)


'https://smith.langchain.com/prompts/movie-critic/bc956426?organizationId=ebbaf2eb-769b-4505-aca2-d11de10372a4'

In [12]:
# Pull and use a prompt
prompt = client.pull_prompt("movie-critic")

# Invoke with variables
response = prompt.invoke({"criticlevel": "harsh", "movie": "Inception"})
print(response)


messages=[SystemMessage(content='You are a harsh movie critic', additional_kwargs={}, response_metadata={}), HumanMessage(content='Do you like Inception?', additional_kwargs={}, response_metadata={})]
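
Note that the two examples use different placeholder syntax: the Phoenix prompt writes variables as `{{ movie }}`, while the `ChatPromptTemplate` above uses `{movie}`. When porting prompt text by hand, a small conversion like the one below may help. It is a sketch only: `mustache_to_fstring` is a hypothetical helper, and it handles simple variable placeholders, not mustache sections or templates that contain literal braces.

```python
import re

# Matches {{ name }} with optional whitespace around a simple identifier
_MUSTACHE_VAR = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

def mustache_to_fstring(template: str) -> str:
    """Rewrite {{ name }} placeholders as {name}."""
    return _MUSTACHE_VAR.sub(r"{\1}", template)

converted = mustache_to_fstring("You are a {{ criticlevel }} movie critic")
print(converted)
# prints You are a {criticlevel} movie critic
print(converted.format(criticlevel="harsh"))
# prints You are a harsh movie critic
```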
