@alukach
Contributor

@alukach alukach commented Aug 6, 2025

What I'm changing

Problem

Vercel is designed to return Firewall Challenges when it suspects that a client may be attempting a DoS attack. Unfortunately, our Data Proxy makes many requests to our Vercel-hosted Source API, occasionally triggering Vercel's DoS prevention logic. When this occurs, Vercel returns a 403 with an HTML page presenting a button for a human user to press. Of course, our Data Proxy expects JSON and throws a parse error when it receives this response. I believe this is currently the largest issue affecting the Data Proxy's reliability: end users receive intermittent 403 responses from our Data Proxy.
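
To make the failure mode concrete, here is a minimal, std-only sketch of how a challenge response differs from the JSON the Data Proxy expects. The helper name `looks_like_firewall_challenge` is hypothetical and not part of this PR, and the check on `text/html` is an assumption based on the HTML page described above, not on Vercel documentation.

```rust
/// Hypothetical helper (not part of the Data Proxy): guess whether a Source API
/// response is a firewall challenge rather than the expected JSON payload.
/// Assumes the challenge arrives as a 403 with an HTML content type.
fn looks_like_firewall_challenge(status: u16, content_type: Option<&str>) -> bool {
    // Content-Type values may carry parameters such as "; charset=utf-8",
    // so only the leading media type is compared, case-insensitively.
    let is_html = content_type
        .map(|ct| ct.trim().to_ascii_lowercase().starts_with("text/html"))
        .unwrap_or(false);
    status == 403 && is_html
}
```

A check along these lines would let a client report "challenged by the firewall" instead of a JSON parse error, but it only helps with diagnosis; it does not stop the challenges.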

It's worth noting that our data proxy never explicitly returns 403 responses:

// 400
BackendError::InvalidRequest(_)
    | BackendError::UnsupportedAuthMethod(_)
    | BackendError::UnsupportedOperation(_) => StatusCode::BAD_REQUEST,
// 401
BackendError::UnauthorizedError => StatusCode::UNAUTHORIZED,
// 404
BackendError::RepositoryNotFound
    | BackendError::ObjectNotFound(_)
    | BackendError::SourceRepositoryMissingPrimaryMirror
    | BackendError::ApiKeyNotFound
    | BackendError::DataConnectionNotFound => StatusCode::NOT_FOUND,
// 502
BackendError::ReqwestError(_)
    | BackendError::ApiServerError { .. }
    | BackendError::RepositoryPermissionsNotFound
    | BackendError::AzureError(_)
    | BackendError::S3Error(_) => StatusCode::BAD_GATEWAY,
// 500
_ => StatusCode::INTERNAL_SERVER_ERROR,

However, it does pass through the response codes from the Source API:

BackendError::ApiClientError { status, .. } => {
    StatusCode::from_u16(*status).unwrap_or(StatusCode::BAD_REQUEST)
}

This means that any 403 response coming from our Data Proxy is a 403 response from our Source API (almost surely a firewall challenge when it occurs intermittently).
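
The reasoning above can be sketched in a self-contained form. This is a simplified stand-in, not the proxy's real code: the enum keeps only a handful of variants, plain `u16` codes replace `StatusCode`, and the range check is an assumption meant to imitate the `from_u16(...).unwrap_or(BAD_REQUEST)` fallback.

```rust
/// Simplified stand-in for the proxy's error type. Only a few variants are
/// reproduced, and payloads are dropped to keep the sketch std-only.
#[derive(Debug)]
enum BackendError {
    InvalidRequest,
    UnauthorizedError,
    ObjectNotFound,
    ApiServerError,
    ApiClientError { status: u16 },
    Other,
}

/// Maps an error to the HTTP status the proxy would answer with.
fn status_code(err: &BackendError) -> u16 {
    match err {
        BackendError::InvalidRequest => 400,
        BackendError::UnauthorizedError => 401,
        BackendError::ObjectNotFound => 404,
        BackendError::ApiServerError => 502,
        // Pass-through arm: the upstream status is returned as-is.
        // Assumption: codes outside 100..=999 fall back to 400, imitating
        // the unwrap_or(StatusCode::BAD_REQUEST) in the real arm.
        BackendError::ApiClientError { status } => {
            if (100..=999).contains(status) {
                *status
            } else {
                400
            }
        }
        BackendError::Other => 500,
    }
}
```

In this sketch no arm hard-codes 403, so the only way `status_code` yields 403 is through the `ApiClientError` arm, which is the same conclusion drawn above for the real mapping.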

Solution

We can work around this by setting a WAF System Bypass Rule within Vercel, instructing it that traffic from our Data Proxy should be exempt from firewall restrictions. We can do this by providing the IP address of our Data Proxy. However, this is challenging because our Data Proxy runs in ECS tasks on AWS Fargate, which have ephemeral IP addresses. To resolve this, I considered solutions such as routing outbound traffic through a NAT Gateway; however, that would add $0.045/GB on top of the existing $0.09/GB egress charge (along with an hourly rate), which would be substantial given the function of the Data Proxy (i.e., serving lots of data). Instead, I believe the simplest solution is to send only our API traffic through an in-network proxy running on EC2 with a stable IP address. Most request clients (such as reqwest) have built-in support for using such a proxy.
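
As a rough illustration of the cost argument, the sketch below compares per-GB charges with and without a NAT Gateway using the two prices quoted above. The function name and the 10,000 GB volume mentioned afterwards are hypothetical, and the NAT Gateway's hourly rate is left out because no figure is given here.

```rust
/// Per-GB prices from the description, in thousandths of a dollar so the
/// arithmetic stays in integers.
const EGRESS_MILLS_PER_GB: u64 = 90; // $0.09/GB
const NAT_MILLS_PER_GB: u64 = 45; // $0.045/GB

/// Returns (egress only, egress plus NAT processing) in thousandths of a
/// dollar for `gb` gigabytes. The NAT hourly charge is deliberately omitted.
fn transfer_cost_mills(gb: u64) -> (u64, u64) {
    let egress = gb * EGRESS_MILLS_PER_GB;
    let with_nat = egress + gb * NAT_MILLS_PER_GB;
    (egress, with_nat)
}
```

For a hypothetical 10,000 GB, `transfer_cost_mills(10_000)` gives `(900_000, 1_350_000)`, i.e. $900 versus $1,350: the per-GB charge alone grows by 50% before the hourly rate is counted.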

How I did it

This PR:

  1. Adds CDK configuration to deploy a Squid proxy on EC2 and register its Elastic IP in Route53 under an internal DNS entry of vercel-api-${stage}.internal. This Squid proxy has a security group allowing traffic only from 172.31.0.0/16 (i.e., internal-only traffic).
  2. Sets up CI/CD to auto-deploy CDK code alongside the current deployment logic. Eventually, it would be nice to have all deployment code managed within CDK, as it's currently a bit piecemeal and not completely stored within the codebase (e.g. the CloudFormation template for the current ECS clusters).
  3. Refactors the Data Proxy's SourceApi struct to use a helper method build_req_client() that generates a reqwest client, optionally configuring it to use a proxy if the PROXY_URL environment variable is set.
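
To illustrate what the 172.31.0.0/16 rule in step 1 admits, here is a small std-only sketch of a CIDR membership check. The function `in_cidr` is a hypothetical helper for illustration; the real filtering is done by the EC2 security group, not by application code.

```rust
use std::net::Ipv4Addr;

/// Returns true when `addr` falls inside `network/prefix_len`.
/// Hypothetical helper; prefix lengths above 32 are rejected.
fn in_cidr(addr: Ipv4Addr, network: Ipv4Addr, prefix_len: u8) -> bool {
    if prefix_len > 32 {
        return false;
    }
    // Build the netmask; a /0 mask is handled separately because shifting
    // a u32 by 32 bits would overflow.
    let mask: u32 = if prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - prefix_len as u32)
    };
    (u32::from(addr) & mask) == (u32::from(network) & mask)
}
```

With the /16 prefix, any address of the form 172.31.x.y matches, while an address such as 172.32.0.1 or 10.0.0.1 does not.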

Note

Currently, the step to provide the PROXY_URL to the ECS Task Definition is manual.

Along the way...

  • I refactored source_api_headers(), tucking it into a method on the SourceApi struct alongside the new build_req_client() method. I think this makes for slightly tidier code organization.
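
The optional-proxy decision can be sketched without any external crates. This is not the PR's actual `build_req_client()`: the `Route` enum and `route()` method are hypothetical stand-ins for "build a client with or without a proxy", and treating a blank value as unset is an assumption.

```rust
/// Hypothetical description of how outbound Source API requests are routed.
#[derive(Debug, PartialEq)]
enum Route {
    Direct,
    ViaProxy(String),
}

/// Pared-down stand-in for the proxy's SourceApi struct.
struct SourceApi {
    endpoint: String,
    proxy_url: Option<String>,
}

impl SourceApi {
    fn new(endpoint: String, proxy_url: Option<String>) -> Self {
        SourceApi { endpoint, proxy_url }
    }

    /// Decides the route from the optional proxy URL.
    /// Assumption: an empty or whitespace-only value counts as "not set".
    /// With reqwest, the ViaProxy case would presumably be where a
    /// `reqwest::Proxy` is handed to `ClientBuilder::proxy`.
    fn route(&self) -> Route {
        match self.proxy_url.as_deref().map(str::trim) {
            Some(url) if !url.is_empty() => Route::ViaProxy(url.to_string()),
            _ => Route::Direct,
        }
    }
}
```

Keeping the proxy optional means deployments without the environment variable continue to talk to the Source API directly, which matches the "optionally configuring it" wording in step 3.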

How to test it

https://github.com/source-cooperative/data.source.coop/actions/runs/16784185577

PR Checklist

  • This PR has no breaking changes.
  • I have updated or added new tests to cover the changes in this PR.
  • This PR affects the Source Cooperative Frontend & API,
    and I have opened issue/PR #XXX to track the change.

Related Issues

@alukach alukach mentioned this pull request Aug 6, 2025
3 tasks
@jedsundwall

I'd appreciate it if @gadomski could look at this. :)

@gadomski gadomski self-requested a review August 6, 2025 18:32
@alukach
Contributor Author

alukach commented Aug 6, 2025

Sorry, I accidentally "assigned" you both rather than requesting your reviews 🙃

I should state that the dev proxy is currently running with traffic going through the Squid proxy. I can change the EC2 instance's security group and suddenly endpoints like https://data.dev.source.coop/nasa/?delimiter=%2F&list-type=2&prefix=floods%2F stop working, which gives me confidence that the traffic is indeed going through the proxy.

Contributor

@gadomski gadomski left a comment

Just a few light notes, otherwise makes sense. I'm taking @alukach at his word that this works in the real world, I didn't try to deploy it or anything.

@alukach
Contributor Author

alukach commented Aug 6, 2025

@gadomski Okay, I think we're good to go now.

> I'm taking @alukach at his word that this works in the real world, I didn't try to deploy it or anything.

I will be transparent that while I did test that the data does go through the proxy, I did not test that this actually stabilizes the IP address that Vercel sees. However, that seems like a reasonable assumption, no?

- let source_api = web::Data::new(SourceApi::new(source_api_url));
+ let source_api_url = env::var("SOURCE_API_URL").expect("SOURCE_API_URL must be set");
+ let proxy_url = env::var("SOURCE_API_PROXY_URL").ok(); // Optional proxy for the Source API
+ let source_api = web::Data::new(SourceApi::new(source_api_url, proxy_url));

This feels much more explicit, thanks!

@alukach alukach merged commit 25438c3 into main Aug 6, 2025
4 checks passed
@alukach alukach deleted the feat/cdk-api-proxy branch August 6, 2025 21:08
alukach pushed a commit that referenced this pull request Aug 21, 2025
🤖 I have created a release *beep* *boop*
---


## [1.0.0](v0.1.29...v1.0.0) (2025-08-21)


### ⚠ BREAKING CHANGES

* update to accommodate Product in S2 API

### Features

* add headers to requests to source API ([#81](#81)) ([edda62f](edda62f))
* update to accommodate Product in S2 API ([be44f43](be44f43))
* use Squid proxy for communication with Vercel API ([#85](#85)) ([25438c3](25438c3))


---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: source-coop-release[bot] <187876225+source-coop-release[bot]@users.noreply.github.com>

4 participants