Commit 03350d9

Add reference / best practice for AWS + Azure infrastructure (#1403)
Fixes DOC-442

Adds reference / best practice pages on AWS + Azure infrastructure setup for self-hosted and hybrid in LangSmith.

Decided on a different placement from initial discussion, for the following reasons: these pages belong under "Self-hosted cloud architecture" rather than "Setup guides" because they provide reference architecture and best practices, not step-by-step installation instructions. Structure allows for:
- Cloud-specific best practices before beginning setup
- Reference throughout their deployment lifecycle
- Architectural guidance separate from procedural installation steps
- Content on using self-hosted or hybrid models

## Preview

[AWS reference](https://langchain-5e9cc07a-preview-selfho-1763139517-b0f2587.mintlify.app/langsmith/aws-self-hosted)
[Azure reference](https://langchain-5e9cc07a-preview-selfho-1763139517-b0f2587.mintlify.app/langsmith/azure-self-hosted)

---------

Co-authored-by: Rahul Verma <rahul@langchain.dev>
1 parent eef737f commit 03350d9

6 files changed (+280, -9 lines)

src/docs.json

Lines changed: 7 additions & 0 deletions
@@ -1362,6 +1362,13 @@
       "langsmith/cloud"
     ]
   },
+  {
+    "group": "Self-hosted cloud architecture",
+    "pages": [
+      "langsmith/aws-self-hosted",
+      "langsmith/azure-self-hosted"
+    ]
+  },
   {
     "group": "Hybrid",
     "pages": [

src/langsmith/aws-self-hosted.mdx

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
---
title: Self-hosted on AWS
sidebarTitle: AWS
icon: "aws"
---

When running LangSmith on [Amazon Web Services (AWS)](https://aws.amazon.com/), you can deploy in either [full self-hosted](/langsmith/self-hosted) or [hybrid](/langsmith/hybrid) mode. Full self-hosted mode deploys the complete LangSmith platform, including observability functionality and the option to create agent deployments. Hybrid mode deploys only the infrastructure needed to run agents in a data plane within your cloud, while our SaaS provides the control plane and observability functionality.

This page provides AWS-specific architecture patterns, service recommendations, and best practices for deploying and operating LangSmith on AWS.

<Note>
LangChain provides Terraform modules specifically for AWS to help provision infrastructure for LangSmith. These modules can quickly set up EKS clusters, RDS, ElastiCache, S3, and networking resources.

View the [AWS Terraform modules](https://github.com/langchain-ai/terraform/tree/main/modules/aws) for documentation and examples.
</Note>

## Reference architecture

We recommend using AWS managed services to provide a scalable, secure, and resilient platform. The following architecture applies to both self-hosted and hybrid deployments and aligns with the [AWS Well-Architected Framework](https://aws.amazon.com/architecture/well-architected/):

![Architecture diagram showing how AWS services relate to LangSmith services](/langsmith/images/aws-architecture-self-hosted.png)

- <Icon icon="globe" /> **Ingress & networking:** Requests enter via [Amazon Application Load Balancer (ALB)](https://aws.amazon.com/elasticloadbalancing/application-load-balancer/) within your [VPC](https://aws.amazon.com/vpc/), secured using [AWS WAF](https://aws.amazon.com/waf/) and [IAM](https://aws.amazon.com/iam/)-based authentication.
- <Icon icon="cube" /> **Frontend & backend services:** Containers run on [Amazon EKS](https://aws.amazon.com/eks/), orchestrated behind the ALB, and requests are routed to other services within the cluster as necessary.
- <Icon icon="database" /> **Storage & databases:**
  - [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) or [Aurora](https://aws.amazon.com/rds/aurora/): metadata, projects, users, and short-term and long-term memory for deployed agents. LangSmith supports PostgreSQL version 14 or higher.
  - [Amazon ElastiCache (Redis)](https://aws.amazon.com/elasticache/redis/): caching and job queues. ElastiCache must run in single-instance mode with Redis OSS version 5 or higher.
  - ClickHouse + [Amazon EBS](https://aws.amazon.com/ebs/): analytics and trace storage.
    - We recommend using an [externally managed ClickHouse solution](/langsmith/self-host-external-clickhouse) unless security or compliance reasons prevent you from doing so.
    - ClickHouse is not required for hybrid deployments.
  - [Amazon S3](https://aws.amazon.com/s3/): object storage for trace artifacts and telemetry.

- <Icon icon="sparkles" /> **LLM integration:** Optionally proxy requests to [Amazon Bedrock](https://aws.amazon.com/bedrock/) or [Amazon SageMaker](https://aws.amazon.com/sagemaker/) for LLM inference.
- <Icon icon="chart-line" /> **Monitoring & observability:** Integrate with [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/).
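
Before installing, it can help to confirm that the provisioned services meet the storage requirements listed above. The following is a minimal sketch, assuming you have already gathered your service versions into the hypothetical `ExternalServices` record (its field names are illustrative, not LangSmith configuration keys). It checks only the minimums stated on this page: PostgreSQL 14 or higher, Redis OSS 5 or higher in single-instance mode, and ClickHouse for full self-hosted (but not hybrid) deployments.

```python
from dataclasses import dataclass


@dataclass
class ExternalServices:
    """Hypothetical inventory of provisioned services; field names are illustrative."""
    mode: str                    # "self-hosted" or "hybrid"
    postgres_version: str        # RDS / Aurora PostgreSQL engine version, e.g. "15.4"
    redis_version: str           # ElastiCache Redis OSS engine version, e.g. "7.1"
    redis_single_instance: bool  # True if ElastiCache runs as a single instance
    has_clickhouse: bool         # True if an external or in-cluster ClickHouse exists


def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.split(".")[0])


def check_requirements(svc: ExternalServices) -> list[str]:
    """Return a list of unmet requirements; an empty list means all checks passed."""
    problems = []
    if major_version(svc.postgres_version) < 14:
        problems.append("PostgreSQL must be version 14 or higher")
    if major_version(svc.redis_version) < 5:
        problems.append("Redis OSS must be version 5 or higher")
    if not svc.redis_single_instance:
        problems.append("ElastiCache must run in single-instance mode")
    # ClickHouse stores traces for full self-hosted installs; hybrid does not need it.
    if svc.mode == "self-hosted" and not svc.has_clickhouse:
        problems.append("full self-hosted mode requires ClickHouse")
    return problems


if __name__ == "__main__":
    hybrid = ExternalServices("hybrid", "15.4", "7.1", True, False)
    print(check_requirements(hybrid))  # []
```

A check like this could run as an early pipeline step, so that version mismatches surface before any deployment is attempted.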

## Compute options

LangSmith supports multiple compute options depending on your requirements:

| Compute option | Description | Best suited for |
|----------------|-------------|-----------------|
| **Amazon Elastic Kubernetes Service (EKS)** (preferred) | Advanced scaling and multi-tenant support | Large enterprises |
| **EC2-based** | Full control, bring-your-own infrastructure | Regulated or air-gapped environments |

## AWS Well-Architected best practices

This reference is designed to align with the six pillars of the AWS Well-Architected Framework:

### Operational excellence

- Automate deployments with infrastructure as code (IaC), for example [CloudFormation](https://aws.amazon.com/cloudformation/) or [Terraform](https://www.terraform.io/).
- Use [AWS Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html) for configuration.
- Configure your LangSmith instance to [export telemetry data](/langsmith/export-backend) and monitor it continuously with [CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html).
- The preferred way to manage [LangSmith deployments](/langsmith/deployments) is a CI process that builds [Agent Server](/langsmith/agent-server) images and pushes them to [ECR](https://aws.amazon.com/ecr/). Create a test deployment for each pull request, then deploy the new revision to staging or production when the PR is merged.
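
The pull-request-to-production flow described above can be expressed as a small planning step in CI. This is a hedged sketch, not LangSmith tooling: the account ID (`123456789012` is the usual AWS documentation placeholder), region, repository name, environment names, and the `plan_deployment` helper are all hypothetical. Only the private ECR image URI format, `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`, is standard AWS.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values; replace with your own account, region, and ECR repository.
ACCOUNT_ID = "123456789012"
REGION = "us-east-1"
REPOSITORY = "my-agent"


@dataclass
class DeploymentPlan:
    image_uri: str           # Agent Server image that CI pushes to ECR
    environments: list[str]  # deployments to create or update, in order


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build a private Amazon ECR image URI."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def plan_deployment(event: str, commit_sha: str, pr_number: Optional[int] = None) -> DeploymentPlan:
    """Map a CI event to the deployments it should touch (hypothetical policy)."""
    # Tagging by commit SHA keeps every image immutable and traceable to source.
    uri = ecr_image_uri(ACCOUNT_ID, REGION, REPOSITORY, commit_sha[:12])
    if event == "pull_request":
        if pr_number is None:
            raise ValueError("pull_request events need a PR number")
        # One short-lived test deployment per pull request.
        return DeploymentPlan(uri, [f"pr-{pr_number}"])
    if event == "merge":
        # Promote the merged revision through staging before production.
        return DeploymentPlan(uri, ["staging", "production"])
    raise ValueError(f"unsupported event: {event}")


if __name__ == "__main__":
    print(plan_deployment("pull_request", "0123456789abcdef", pr_number=42))
```

Tagging by commit SHA means the image promoted to staging and production is exactly the one built from the merged revision, rather than a mutable `latest` tag.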

### Security

- Use [IAM](https://aws.amazon.com/iam/) roles with least-privilege policies.
- Enable encryption at rest ([RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Encryption.html), [S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingEncryption.html), ClickHouse volumes) and in transit (TLS 1.2+).
- Integrate with [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/) for credentials.
- Use [Amazon Cognito](https://aws.amazon.com/cognito/) as an identity provider (IdP) in conjunction with LangSmith's built-in authentication and authorization features to secure access to agents and their tools.
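
As an illustration of a least-privilege policy, the sketch below builds an IAM policy document that allows listing one S3 bucket and reading, writing, and deleting objects inside it, and nothing else. The bucket name is a placeholder, and the exact set of S3 actions your LangSmith components need is an assumption here; verify it against your deployment before use.

```python
import json


def s3_bucket_policy(bucket: str) -> dict:
    """Return an IAM identity policy scoped to a single S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level actions target keys inside the bucket only.
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_bucket_policy("my-langsmith-traces"), indent=2))
```

On EKS, a policy like this would typically be attached to a role that pods assume, for example through IAM roles for service accounts (IRSA), instead of being granted to the node role.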

### Reliability

- Replicate the LangSmith [data plane](/langsmith/data-plane) across regions: for LangSmith Deployment, deploy identical data planes to Kubernetes clusters in different regions. Deploy [RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZSingleStandby.html) and [EKS](https://aws.amazon.com/eks/) workloads across multiple [Availability Zones (Multi-AZ)](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/).
- Implement [auto-scaling](https://aws.amazon.com/autoscaling/) for backend workers.
- Use [Amazon Route 53](https://aws.amazon.com/route53/) health checks and failover policies.
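
On EKS, one common way to auto-scale backend workers is the Kubernetes Horizontal Pod Autoscaler (HPA); whether your installation uses it depends on your own configuration. The HPA's documented core rule is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only that rule plus min/max bounds; it ignores stabilization windows and scaling policies, and the CPU figures in the example are invented.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the Kubernetes HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: keep the current count to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below the floor or above the ceiling configured on the autoscaler.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 workers at 90% average CPU against a 60% target -> scale out to 6.
    print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
```

The tolerance band is what keeps small metric fluctuations from constantly adding and removing workers.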

### Performance efficiency

- Select [EC2](https://aws.amazon.com/ec2/) instance types that match your workloads' compute and memory profiles.
- Use [S3 Intelligent-Tiering](https://aws.amazon.com/s3/storage-classes/intelligent-tiering/) for infrequently accessed trace data.
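
One way to apply S3 Intelligent-Tiering to trace data is a bucket lifecycle rule that transitions objects into the `INTELLIGENT_TIERING` storage class. The minimal sketch below builds the `LifecycleConfiguration` structure accepted by boto3's `put_bucket_lifecycle_configuration`; the bucket name and the `traces/` prefix are hypothetical, so confirm which prefixes your LangSmith installation actually writes before applying a rule.

```python
import json


def intelligent_tiering_rule(prefix: str, days: int = 0) -> dict:
    """Lifecycle configuration that moves objects under `prefix` to Intelligent-Tiering."""
    return {
        "Rules": [
            {
                "ID": f"intelligent-tiering-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Days=0 makes objects eligible immediately; S3 applies lifecycle
                # rules asynchronously, so the transition happens some time later.
                "Transitions": [{"Days": days, "StorageClass": "INTELLIGENT_TIERING"}],
            }
        ]
    }


if __name__ == "__main__":
    config = intelligent_tiering_rule("traces/")
    print(json.dumps(config, indent=2))
    # To apply (requires boto3 and AWS credentials):
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="my-langsmith-traces", LifecycleConfiguration=config)
```

Once objects are in Intelligent-Tiering, S3 moves those not accessed for 30 consecutive days to the Infrequent Access tier and those not accessed for 90 days to the Archive Instant Access tier.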

### Cost optimization

- Right-size [EKS](https://aws.amazon.com/eks/) clusters, and use [Compute Savings Plans](https://aws.amazon.com/savingsplans/compute-pricing/) to lower the cost of steady-state usage.
- Monitor cost KPIs using [AWS Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) dashboards.

### Sustainability

- Minimize idle workloads by scaling compute with demand.
- Store telemetry in low-latency, low-cost tiers.
- Enable auto-shutdown for non-production environments.

## Security and compliance

LangSmith can be configured for:

- [PrivateLink](https://aws.amazon.com/privatelink/)-only access (no public internet exposure, apart from the egress required for billing).
- [KMS](https://aws.amazon.com/kms/)-based encryption keys for S3, RDS, and EBS.
- Audit logging to [CloudWatch](https://aws.amazon.com/cloudwatch/) and [AWS CloudTrail](https://aws.amazon.com/cloudtrail/).
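
For the KMS-based encryption above, S3 default bucket encryption can point at a customer-managed KMS key. The minimal sketch below builds the `ServerSideEncryptionConfiguration` structure accepted by boto3's `put_bucket_encryption`; the key ARN and bucket name are placeholders. RDS and EBS are configured separately: each takes a KMS key when the database or volume is created.

```python
import json


def kms_default_encryption(kms_key_arn: str) -> dict:
    """Default-encryption settings that make S3 use SSE-KMS with the given key."""
    # This sketch expects a full key ARN (S3 also accepts other key identifiers).
    if not kms_key_arn.startswith("arn:") or ":kms:" not in kms_key_arn:
        raise ValueError("expected a KMS key ARN")
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # S3 Bucket Keys reduce the number of requests S3 makes to KMS.
                "BucketKeyEnabled": True,
            }
        ]
    }


if __name__ == "__main__":
    key = "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"
    config = kms_default_encryption(key)
    print(json.dumps(config, indent=2))
    # To apply (requires boto3 and AWS credentials):
    # import boto3
    # boto3.client("s3").put_bucket_encryption(
    #     Bucket="my-langsmith-traces", ServerSideEncryptionConfiguration=config)
```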

Customers can deploy in [AWS GovCloud (US)](https://aws.amazon.com/govcloud-us/) or ISO Regions, or in HIPAA-eligible environments, as needed.

## Monitoring and evals

Use LangSmith to:

- Capture traces from LLM apps running on [Bedrock](https://aws.amazon.com/bedrock/) or [SageMaker](https://aws.amazon.com/sagemaker/).
- Evaluate model outputs with [LangSmith datasets](/langsmith/manage-datasets).
- Track latency, token usage, and success rates.
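
To make "latency, token usage, and success rates" concrete, the sketch below summarizes a batch of run records. It is only meant to illustrate what those aggregates mean: the `RunRecord` fields (`latency_ms`, `total_tokens`, `error`) are hypothetical stand-ins, not the LangSmith SDK's run schema, and latency percentiles use the nearest-rank method.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical per-run record; not the LangSmith SDK's run schema."""
    latency_ms: float
    total_tokens: int
    error: Optional[str] = None  # None means the run succeeded


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(runs: list[RunRecord]) -> dict:
    """Aggregate latency, token usage, and success rate over a batch of runs."""
    if not runs:
        raise ValueError("no runs to summarize")
    latencies = [r.latency_ms for r in runs]
    return {
        "runs": len(runs),
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "total_tokens": sum(r.total_tokens for r in runs),
        "success_rate": sum(r.error is None for r in runs) / len(runs),
    }


if __name__ == "__main__":
    runs = [
        RunRecord(100, 10), RunRecord(200, 20), RunRecord(300, 30),
        RunRecord(400, 40), RunRecord(1000, 50, error="timeout"),
    ]
    print(summarize(runs))
```

Reporting p95 alongside p50 matters for LLM apps, where a few slow model calls can dominate user-perceived latency while the median looks healthy.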

Integrate with:

- [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) dashboards.
- [OpenTelemetry](https://opentelemetry.io/) and [Prometheus](https://prometheus.io/) exporters.
