pingcap · ti-chi-bot · Jun 8, 2026
diff --git a/TOC-tidb-cloud-lake.md b/TOC-tidb-cloud-lake.md
@@ -28,15 +28,15 @@
   - Data Sources
     - [Overview](/tidb-cloud-lake/guides/data-sources.md)
     - [Amazon S3 - Credentials](/tidb-cloud-lake/guides/aws-credentials.md)
-    - [Amazon SQS (S3) - IAM Role](/tidb-cloud-lake/guides/amazon-sqs-s3-iam-role.md)
+    - [Amazon SQS (S3) - IAM Role](/tidb-cloud-lake/guides/amazon-sqs-s3-iam-role.md) ![BETA](/media/tidb-cloud/blank_transparent_placeholder.png)
     - [MySQL - Credentials](/tidb-cloud-lake/guides/mysql-credentials.md)
     - [PostgreSQL - Credentials](/tidb-cloud-lake/guides/postgresql-credentials.md)
     - [FeiShuBot](/tidb-cloud-lake/guides/feishubot.md)
   - Integration Tasks
     - [Overview](/tidb-cloud-lake/guides/integration-tasks.md)
     - [Task Management](/tidb-cloud-lake/guides/task-management.md)
     - [Amazon S3 Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-s3.md)
-    - [Amazon SQS (S3) Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md)
+    - [Amazon SQS (S3) Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md) ![BETA](/media/tidb-cloud/blank_transparent_placeholder.png)
     - [MySQL Integration Task](/tidb-cloud-lake/guides/integrate-with-mysql.md)
     - [PostgreSQL Integration Task](/tidb-cloud-lake/guides/integrate-with-postgresql.md)
 - Connect

diff --git a/tidb-cloud-lake/guides/amazon-sqs-s3-iam-role.md b/tidb-cloud-lake/guides/amazon-sqs-s3-iam-role.md
@@ -1,9 +1,9 @@
 ---
-title: Amazon SQS (S3) - IAM Role
+title: Amazon SQS (S3) - IAM Role (Beta)
 summary: Learn how to create an "Amazon SQS (S3) - IAM Role" data source in {{{ .lake }}}.
 ---
 
-# Amazon SQS (S3) - IAM Role
+# Amazon SQS (S3) - IAM Role (Beta)
 
 This page describes how to create an `Amazon SQS (S3) - IAM Role` data source. This data source stores the configuration required to access an Amazon SQS queue and the corresponding S3 bucket, and is used for consuming S3 object creation events delivered from Amazon S3 to SQS.
 
@@ -49,7 +49,7 @@ Before creating the data source, complete the following configuration in your AW
 5. Attach S3 read permissions and SQS consume permissions to the IAM Role.
 6. Upload a test object and confirm that S3 can deliver the event to SQS.
 
-Prepare the following variables first. `AWS_REGION` must be the Region where both the S3 bucket and SQS queue are located. `EXTERNAL_ID` is the organization ID from the {{{ .lake }}} console.
+Prepare the following variables first. `AWS_REGION` must be the Region where both the S3 bucket and SQS queue are located. `EXTERNAL_ID` is the organization ID from the {{{ .lake }}} platform console.
 
 ```bash
 export AWS_REGION="<bucket-and-sqs-region>"
@@ -231,9 +231,9 @@ aws s3api get-bucket-notification-configuration \
 
 Confirm that `QueueArn` points to the target SQS queue, `Events` includes `s3:ObjectCreated:*`, and `FilterRules` matches the `Object Key Prefix` / `Object Key Suffix` configured in the {{{ .lake }}} data source.
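
The three checks above can also be scripted. The following is a minimal sketch, assuming the JSON printed by `aws s3api get-bucket-notification-configuration` has already been parsed into a Python dictionary. The function name `check_queue_notification` and the sample bucket, queue, prefix, and suffix values are hypothetical and only illustrate the comparison.

```python
def check_queue_notification(config, queue_arn, prefix="", suffix=""):
    """Return a list of problems; an empty list means the configuration matches."""
    for entry in config.get("QueueConfigurations", []):
        if entry.get("QueueArn") != queue_arn:
            continue
        problems = []
        if "s3:ObjectCreated:*" not in entry.get("Events", []):
            problems.append("Events does not include s3:ObjectCreated:*")
        rules = entry.get("Filter", {}).get("Key", {}).get("FilterRules", [])
        # Rule names are compared case-insensitively ("Prefix" or "prefix").
        actual = {rule["Name"].lower(): rule["Value"] for rule in rules}
        if actual.get("prefix", "") != prefix:
            problems.append("prefix filter differs from Object Key Prefix")
        if actual.get("suffix", "") != suffix:
            problems.append("suffix filter differs from Object Key Suffix")
        return problems
    return ["no QueueConfigurations entry targets " + queue_arn]


# Hypothetical output of get-bucket-notification-configuration.
sample_config = {
    "QueueConfigurations": [
        {
            "QueueArn": "arn:aws:sqs:us-west-2:123456789012:example-queue",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "Prefix", "Value": "raw/"},
                {"Name": "Suffix", "Value": ".ndjson"},
            ]}},
        }
    ]
}

problems = check_queue_notification(
    sample_config,
    "arn:aws:sqs:us-west-2:123456789012:example-queue",
    prefix="raw/",
    suffix=".ndjson",
)
print(problems or "notification configuration matches")
```

An empty result means all three conditions hold. Any returned message names the condition that needs attention.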
 
-## Step 4: Create an IAM Role for Platform to Assume
+## Step 4: Create an IAM role for {{{ .lake }}} to assume
 
-Generate `trust-policy.json`. `ExternalId` is the organization ID from the Platform console.
+Generate `trust-policy.json`. `ExternalId` is the organization ID from the {{{ .lake }}} platform console.
 
 ```bash
 jq -n \

diff --git a/tidb-cloud-lake/guides/data-integration-overview.md b/tidb-cloud-lake/guides/data-integration-overview.md
@@ -23,7 +23,7 @@ Not every data source corresponds to an ingestion task. For example, `FeiShuBot`
 | Task Type | Description |
 |-----------|-------------|
 | [Amazon S3](/tidb-cloud-lake/guides/integrate-with-amazon-s3.md) | Imports CSV, Parquet, or NDJSON files from Amazon S3 with support for one-time or continuous ingestion. |
-| [Amazon SQS (S3)](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md) | Consumes S3 object creation events from an SQS queue and writes the corresponding object data into {{{ .lake }}}. |
+| [Amazon SQS (S3) (Beta)](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md) | Consumes S3 object creation events from an SQS queue and writes the corresponding object data into {{{ .lake }}}. |
 | [MySQL](/tidb-cloud-lake/guides/integrate-with-mysql.md) | Synchronizes table data from MySQL using `Snapshot`, `CDC Only`, or `Snapshot + CDC` modes. |
 | [PostgreSQL](/tidb-cloud-lake/guides/integrate-with-postgresql.md) | Synchronizes table data from PostgreSQL using `Snapshot`, `CDC Only`, or `Snapshot + CDC` modes. |
 

diff --git a/tidb-cloud-lake/guides/data-sources.md b/tidb-cloud-lake/guides/data-sources.md
@@ -14,7 +14,7 @@ Data sources do not execute synchronization by themselves. Their role is to cent
 | Type | Purpose |
 |------|---------|
 | [Amazon S3 - Credentials](/tidb-cloud-lake/guides/aws-credentials.md) | Stores the Access Key and Secret Key required to access Amazon S3. These credentials can be reused across multiple S3 import tasks. |
-| [Amazon SQS (S3) - IAM Role](/tidb-cloud-lake/guides/amazon-sqs-s3-iam-role.md) | Stores the queue URL, Region, IAM Role, and S3 path scope required for SQS (S3) ingestion. It can be used to consume S3 object creation events. |
+| [Amazon SQS (S3) - IAM Role (Beta)](/tidb-cloud-lake/guides/amazon-sqs-s3-iam-role.md) | Stores the queue URL, Region, IAM Role, and S3 path scope required for SQS (S3) ingestion. It can be used to consume S3 object creation events. |
 | [MySQL - Credentials](/tidb-cloud-lake/guides/mysql-credentials.md) | Stores the host, port, username, password, and database information required to access MySQL. These settings can be reused across multiple MySQL sync tasks. |
 | [PostgreSQL - Credentials](/tidb-cloud-lake/guides/postgresql-credentials.md) | Stores the host, port, username, password, and database information required to access PostgreSQL. These settings can be reused across multiple PostgreSQL sync tasks. |
 | [FeiShuBot](/tidb-cloud-lake/guides/feishubot.md) | Stores a FeiShu bot webhook and message template for task failure notifications and similar scenarios. |

diff --git a/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md b/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md
@@ -1,15 +1,15 @@
 ---
-title: Amazon SQS (S3) Integration Task
+title: Amazon SQS (S3) Integration Task (Beta)
 summary: Learn how to create an Amazon SQS (S3) integration task that consumes S3 object creation events from an SQS queue and writes the corresponding object data into {{{ .lake }}}.
 ---
 
-# Amazon SQS (S3) Integration Task
+# Amazon SQS (S3) Integration Task (Beta)
 
 This page describes how to create an Amazon SQS (S3) integration task that consumes S3 object creation events from an SQS queue and writes the corresponding object data into {{{ .lake }}}.
 
 This task is designed for S3 event-driven data ingestion. After an upstream system writes an object to S3, S3 sends an `ObjectCreated` event to SQS. {{{ .lake }}} consumes the SQS message through AssumeRole and writes data into {{{ .lake }}} based on the bucket and object key in the event.
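
To make the event flow concrete, the sketch below shows how a consumer could extract the bucket and object key from the body of an SQS message that carries an S3 event notification. It illustrates the message shape only and is not the {{{ .lake }}} implementation. The function name `objects_from_message` and the sample bucket and key are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def objects_from_message(body):
    """Return (bucket, key) pairs for ObjectCreated records in one SQS message body."""
    event = json.loads(body)
    objects = []
    # Messages without a "Records" list yield no objects.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys are URL-encoded in S3 event notifications.
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


# Hypothetical message body, trimmed to the fields used above.
sample_body = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "raw/2026/file+one.ndjson"},
            },
        }
    ]
})

created = objects_from_message(sample_body)
print(created)
```

Decoding the key matters because a space in an object name arrives as `+` in the notification. Reading the object with the undecoded key would fail.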
 
-If you need to create reusable SQS (S3) connection settings first, see [Amazon SQS (S3) - IAM Role](/tidb-cloud-lake/guides/amazon-sqs-s3-iam-role.md).
+If you need to create reusable SQS (S3) connection settings first, see [Amazon SQS (S3) - IAM Role (Beta)](/tidb-cloud-lake/guides/amazon-sqs-s3-iam-role.md).
 
 ## Use Cases
 

diff --git a/tidb-cloud-lake/guides/integration-tasks.md b/tidb-cloud-lake/guides/integration-tasks.md
@@ -14,7 +14,7 @@ Unlike data sources, integration tasks are the executable units that actually pe
 | Task Type | Description |
 |-----------|-------------|
 | [Amazon S3](/tidb-cloud-lake/guides/integrate-with-amazon-s3.md) | Imports CSV, Parquet, or NDJSON files from Amazon S3 with support for one-time or continuous ingestion. |
-| [Amazon SQS (S3)](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md) | Consumes S3 object creation events from an SQS queue and writes the corresponding object data into {{{ .lake }}}. |
+| [Amazon SQS (S3) (Beta)](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md) | Consumes S3 object creation events from an SQS queue and writes the corresponding object data into {{{ .lake }}}. |
 | [MySQL](/tidb-cloud-lake/guides/integrate-with-mysql.md) | Synchronizes table data from MySQL using `Snapshot`, `CDC Only`, or `Snapshot + CDC`. |
 | [PostgreSQL](/tidb-cloud-lake/guides/integrate-with-postgresql.md) | Synchronizes table data from PostgreSQL using `Snapshot`, `CDC Only`, or `Snapshot + CDC`. |
 

diff --git a/tidb-cloud-lake/guides/schema-evolution.md b/tidb-cloud-lake/guides/schema-evolution.md
@@ -5,15 +5,16 @@ summary: Automatically evolve table schemas when loading data with COPY INTO.
 
 # Schema Evolution
 
-Schema evolution allows {{{ .lake }}} to automatically add new columns to a table during `COPY INTO` when the source Parquet files contain columns not yet present in the table.
+Schema evolution allows {{{ .lake }}} to automatically add columns that exist in source files but are missing from the target table during `COPY INTO`. It currently supports **Parquet** and **NDJSON** files.
 
 ## How It Works
 
-When enabled, `COPY INTO`:
+When enabled, {{{ .lake }}} infers the source file schema before loading and appends new columns to the end of the table. New columns are nullable, and missing values are filled with `NULL`.
 
-1. Infers the schema from source Parquet files.
-2. Adds any new columns (not in the table) as nullable columns.
-3. Loads the data, filling missing values with `NULL`.
+The workflow differs slightly by file format:
+
+- **Parquet**: After the table option is enabled, `COPY INTO` infers new columns directly from Parquet file schemas.
+- **NDJSON**: After the table option is enabled, `COPY INTO` uses `AUTO` sampling values for schema inference. You can optionally add `SCHEMA_EVOLUTION = (...)` to override the file and record sampling limits.
 
 ## Enabling Schema Evolution
 
@@ -27,15 +28,21 @@ ALTER TABLE my_table SET OPTIONS(ENABLE_SCHEMA_EVOLUTION = true);
 CREATE TABLE my_table(id INT) ENABLE_SCHEMA_EVOLUTION = true;
 ```
 
-To disable, set it back to `false`:
+To disable schema evolution, set it back to `false`:
 
 ```sql
 ALTER TABLE my_table SET OPTIONS(ENABLE_SCHEMA_EVOLUTION = false);
 ```
 
-## Tutorial
+## Privileges
+
+When `COPY INTO <table>` loads files from a stage or external location and runs schema evolution inference, the loading role must have both `INSERT` and `ALTER` privileges on the target table. `ALTER` is required because {{{ .lake }}} may append new columns before loading.
+
+Query-based COPY is not affected. For example, `COPY INTO <table> FROM (SELECT ... FROM @stage)` keeps the existing privilege requirements.
+
+## Parquet example
 
-This tutorial uses a fully runnable example to demonstrate schema evolution.
+The following example loads Parquet files with different schemas and automatically adds missing columns.
 
 ### Step 1: Create a Table and Stage
 
@@ -72,7 +79,7 @@ FILE_FORMAT = (TYPE = parquet MISSING_FIELD_AS = FIELD_DEFAULT);
 
 ### Step 4: Verify Results
 
-The table now has three columns — `amount` and `currency` were added automatically:
+The table now has three columns. `amount` and `currency` were added automatically:
 
 ```sql
 DESC invoices;
@@ -104,6 +111,129 @@ SELECT * FROM invoices ORDER BY order_id;
 
 Row 3 has `currency = NULL` because its source file did not contain that column.
 
+## NDJSON example
+
+{{{ .lake }}} loads NDJSON files with `TYPE = ndjson`. NDJSON files do not have an embedded columnar schema like Parquet files, so {{{ .lake }}} samples file content, infers fields that are missing from the target table, and appends them as nullable columns.
+
+### Step 1: Create a table and stage
+
+```sql
+CREATE OR REPLACE TABLE events(id INT);
+CREATE OR REPLACE STAGE events_stage;
+```
+
+### Step 2: Generate NDJSON files with different fields
+
+```sql
+-- File with fields: id, city, score
+COPY INTO @events_stage FROM (
+    SELECT 1 AS id, 'SF' AS city, 9 AS score
+    UNION ALL
+    SELECT 2, 'NYC', 8
+) FILE_FORMAT = (TYPE = ndjson);
+
+-- File with fields: id, score (no city)
+COPY INTO @events_stage FROM (
+    SELECT 3 AS id, 7 AS score
+) FILE_FORMAT = (TYPE = ndjson);
+```
+
+### Step 3: Enable schema evolution and load
+
+```sql
+ALTER TABLE events SET OPTIONS(ENABLE_SCHEMA_EVOLUTION = true);
+
+COPY INTO events
+FROM @events_stage/
+FILE_FORMAT = (TYPE = ndjson MISSING_FIELD_AS = FIELD_DEFAULT)
+SCHEMA_EVOLUTION = (
+  SAMPLE_FILES = AUTO,
+  SAMPLE_RECORDS_PER_FILE = AUTO,
+  SAMPLE_TOTAL_RECORDS = AUTO
+);
+```
+
+The three `SCHEMA_EVOLUTION` sampling options accept either `AUTO` or a positive integer:
+
+| Option | Description |
+|------|------|
+| `SAMPLE_FILES` | Number of files to sample. |
+| `SAMPLE_RECORDS_PER_FILE` | Maximum number of records to sample from each selected file. |
+| `SAMPLE_TOTAL_RECORDS` | Maximum number of records to sample across all selected files. |
+
+If `SCHEMA_EVOLUTION` is omitted, {{{ .lake }}} uses `AUTO` for all three sampling options. The current `AUTO` behavior samples up to 64 files, 1,000 records per file, and 10,000 records in total. These internal defaults may change in future versions. If your load is sensitive to the sampling strategy, set `SAMPLE_FILES`, `SAMPLE_RECORDS_PER_FILE`, and `SAMPLE_TOTAL_RECORDS` explicitly.
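
The interaction of the three limits can be sketched as follows. This is an illustration under stated assumptions, not the actual sampling algorithm: it assumes files are visited in list order and that each limit is applied greedily, neither of which is specified above. The function name `sample_plan` is hypothetical, and its defaults mirror the current `AUTO` values quoted in the previous paragraph.

```python
def sample_plan(record_counts, sample_files=64, per_file=1000, total=10000):
    """Return how many records would be sampled from each file (assumed greedy order)."""
    plan = []
    remaining = total
    for index, count in enumerate(record_counts):
        # Files beyond the file limit, or reached after the total budget
        # is spent, contribute no samples.
        if index >= sample_files or remaining == 0:
            plan.append(0)
            continue
        taken = min(count, per_file, remaining)
        plan.append(taken)
        remaining -= taken
    return plan


# Twenty hypothetical files with 5,000 records each.
plan = sample_plan([5000] * 20)
print(plan)
```

Under these assumptions only the first ten files are sampled, so a field that first appears in a later file, or after the first 1,000 records of a file, would not be inferred.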
+
+#### NDJSON inference rules
+
+When running schema evolution for NDJSON, {{{ .lake }}} infers new columns using these rules:
+
+- Schema is inferred only from sampled NDJSON records. Fields not covered by the sample are not added to the target table ahead of time.
+- Each line must be a JSON object. {{{ .lake }}} uses top-level object field names as candidate column names.
+- Columns that already exist in the target table are not added again. Only fields missing from the target table are appended.
+- New field types are inferred from sampled JSON values, such as integers, floats, strings, and booleans.
+- Schema evolution uses shallow NDJSON inference: if a top-level field value is an object or array, it is appended as a `VARIANT` column instead of being recursively expanded.
+- `NULL` samples only mark the field as nullable. They do not force later non-null values to become `VARCHAR` or `VARIANT`.
+- Same-name fields across files or records are merged: integer and float conflicts become `DOUBLE`; other scalar conflicts become `VARCHAR`; any conflict involving an object, array, or `VARIANT` becomes `VARIANT`.
+- If loading encounters extra fields that were not inferred during sampling, the load fails and reports those field names. Increase `SAMPLE_FILES`, `SAMPLE_RECORDS_PER_FILE`, or `SAMPLE_TOTAL_RECORDS` and retry.
+
+> **Note:**
+>
+> The `INFER_SCHEMA` table function does not limit NDJSON nesting depth by default. The rules here describe the shallow inference used by `COPY INTO` schema evolution.
+
+For example, {{{ .lake }}} infers six new columns from the following NDJSON records: `name`, `age`, `active`, `score`, `profile`, and `tags`:
+
+```json
+{"id":1,"name":"Alice","age":30,"active":true,"score":1,"profile":{"city":"SF"},"tags":["new"]}
+{"id":2,"name":"Bob","age":null,"active":false,"score":1.5,"profile":{"city":"NYC"},"tags":["vip"]}
+```
+
+If the target table only has `id INT`, {{{ .lake }}} appends:
+
+```text
+name    VARCHAR   NULL
+age     BIGINT    NULL
+active  BOOLEAN   NULL
+score   DOUBLE    NULL
+profile VARIANT   NULL
+tags    VARIANT   NULL
+```
+
+The second row has `age = NULL`, which does not change the `BIGINT` type inferred from the first row. `score` contains both an integer and a float, so it becomes `DOUBLE`. `profile` and `tags` are an object and an array, so schema evolution appends them as `VARIANT` columns.
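
The inference and merge rules can be modeled in a few lines of Python. This is a minimal sketch of the rules as described in this section, not the {{{ .lake }}} implementation. The function names are hypothetical, existing columns are matched case-insensitively as an assumption, and a field whose samples are all `null` is left as `None` because the resulting type is not stated here.

```python
import json


def value_type(value):
    """Map one JSON value to a column type name; None means no type evidence."""
    if value is None:
        return None
    if isinstance(value, bool):  # checked before int because bool subclasses int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    return "VARIANT"  # objects and arrays are not expanded recursively


def merge_types(left, right):
    """Merge two inferred types for the same field name."""
    if left is None:
        return right
    if right is None or left == right:
        return left
    if "VARIANT" in (left, right):
        return "VARIANT"
    if {left, right} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    return "VARCHAR"  # any other scalar conflict


def infer_new_columns(ndjson_lines, existing_columns):
    """Return {field: type} for top-level fields missing from the table."""
    existing = {name.lower() for name in existing_columns}
    inferred = {}
    for line in ndjson_lines:
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError("each NDJSON line must be a JSON object")
        for field, value in record.items():
            if field.lower() in existing:
                continue
            inferred[field] = merge_types(inferred.get(field), value_type(value))
    return inferred


lines = [
    '{"id":1,"name":"Alice","age":30,"active":true,"score":1,"profile":{"city":"SF"},"tags":["new"]}',
    '{"id":2,"name":"Bob","age":null,"active":false,"score":1.5,"profile":{"city":"NYC"},"tags":["vip"]}',
]
columns = infer_new_columns(lines, ["id"])
for name, type_name in columns.items():
    print(name, type_name)
```

Running the sketch on the two sample records reproduces the six columns and types listed above.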
+
+### Step 4: Verify results
+
+The table now has three columns. `city` and `score` were added automatically:
+
+```sql
+DESC events;
+```
+
+```text
+┌─────────────────────────────────────────────────────────┐
+│ Field │     Type     │  Null  │ Default │    Extra     │
+├───────┼──────────────┼────────┼─────────┼──────────────┤
+│ id    │ INT          │ YES    │ NULL    │              │
+│ city  │ VARCHAR      │ YES    │ NULL    │              │
+│ score │ BIGINT       │ YES    │ NULL    │              │
+└─────────────────────────────────────────────────────────┘
+```
+
+```sql
+SELECT * FROM events ORDER BY id;
+```
+
+```text
+┌────────────────────────────┐
+│ id │ city │ score          │
+├────┼──────┼────────────────┤
+│  1 │ SF   │              9 │
+│  2 │ NYC  │              8 │
+│  3 │ NULL │              7 │
+└────────────────────────────┘
+```
+
+If the sample does not cover a field that appears later in the data, loading fails and returns the extra field name. Increase `SAMPLE_FILES`, `SAMPLE_RECORDS_PER_FILE`, or `SAMPLE_TOTAL_RECORDS` and retry.
+
 ## Column Match Mode
 
 By default, column names are matched case-insensitively. Use `COLUMN_MATCH_MODE` for case-sensitive matching:
@@ -117,8 +247,9 @@ COLUMN_MATCH_MODE = CASE_SENSITIVE;
 
 ## Limitations
 
-- Supported for **Parquet** files only.
+- Currently supports **Parquet** and **NDJSON** files.
 - New columns are appended to the end of the table and are always nullable.
 - If the same column name appears in multiple files with **different data types**, the load fails.
-- No automatic type promotion (e.g., `INT` → `BIGINT`).
+- No automatic type promotion, such as `INT` to `BIGINT`.
 - Column drops and renames are not supported through schema evolution.
+- NDJSON relies on sampling to infer schema. If sampling does not cover all fields, increase the `SCHEMA_EVOLUTION` sampling options.
diff --git a/tidb-cloud-lake/guides/task-management.md b/tidb-cloud-lake/guides/task-management.md
@@ -52,6 +52,6 @@ Click a task to view its execution history. The run history includes:
 For field-level configuration and detailed behavior, continue with the relevant task guide:
 
 - [Amazon S3 Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-s3.md)
-- [Amazon SQS (S3) Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md)
+- [Amazon SQS (S3) Integration Task (Beta)](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md)
 - [MySQL Integration Task](/tidb-cloud-lake/guides/integrate-with-mysql.md)
 - [PostgreSQL Integration Task](/tidb-cloud-lake/guides/integrate-with-postgresql.md)