Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Central YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds a tenant SQL migration to enforce object-name constraints (excluding control characters and disallowed Unicode non-characters), adds XML numeric entity decoding to the XML parser, tightens key validation to reject XML-invalid and non-character code points, standardizes S3 CopySource encoding and parsing (including support and validation of source/versionId), versioned continuation tokens, simplifies signed URL signing APIs with path-matching utilities, refines DB error messages for the new constraint, and adds extensive tests for Unicode, CopySource, signed URLs, and token behavior. Sequence Diagram(s)sequenceDiagram
participant Client
participant HTTP as S3 Handler
participant Parser as parseCopySource
participant Storage as Storage Service
participant DB as Database
participant Backend as Object Backend
Client->>HTTP: COPY request (CopySource header, Destination)
HTTP->>Parser: parseCopySource(CopySource)
alt malformed
Parser-->>HTTP: throw InvalidParameter / MissingParameter
HTTP-->>Client: 400
else parsed
Parser-->>HTTP: {bucket, key, sourceVersion?}
HTTP->>Storage: copyObject(srcBucket, srcKey, srcVersion?, destBucket, destKey)
Storage->>DB: SELECT version WHERE bucket=srcBucket AND name=srcKey
DB-->>Storage: {version, metadata} or not found
alt sourceVersion provided
Storage->>Storage: compare provided sourceVersion vs DB.version
alt mismatch
Storage-->>HTTP: throw NoSuchKey / InvalidParameter
HTTP-->>Client: 404/400
else match
Storage->>Backend: CopyObject with encoded CopySource (includes versionId)
Backend-->>Storage: {eTag, lastModified}
Storage-->>HTTP: success
HTTP-->>Client: 200 OK (copy response)
end
else no sourceVersion
Storage->>Backend: CopyObject using DB.version in CopySource
Backend-->>Storage: {eTag, lastModified}
Storage-->>HTTP: success
HTTP-->>Client: 200 OK (copy response)
end
end
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
3dd9e78 to
60b9538
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/storage/object.ts (1)
876-894:⚠️ Potential issue | 🟠 MajorBackward compatibility break in continuation token decoding corrupts pagination cursors
Line 889 unconditionally decodes all token values with
decodeURIComponent(). Old tokens stored raw unescaped values (e.g.,a%2Fbas a literal string), but the new decoder treats%2Fas an encoded/, silently transforming cursor positions and causing list operations to skip or duplicate results.Proposed fix (version-prefix encoded tokens)
function encodeContinuationToken(tokenInfo: ContinuationToken) { const result: string[] = [] for (const [k, v] of Object.entries(CONTINUATION_TOKEN_PART_MAP)) { const value = tokenInfo[v] if (value) { - result.push(`${k}:${encodeURIComponent(value)}`) + result.push(`${k}:e:${encodeURIComponent(value)}`) } } return Buffer.from(result.join('\n')).toString('base64') } function decodeContinuationToken(token: string): ContinuationToken { @@ - let value = partMatch[2] - try { - value = decodeURIComponent(value) - } catch { - // Backward compatibility: previously cursor values were stored unescaped. - } + let value = partMatch[2] + if (value.startsWith('e:')) { + value = decodeURIComponent(value.slice(2)) + } result[CONTINUATION_TOKEN_PART_MAP[partMatch[1]]] = value } return result }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/object.ts` around lines 876 - 894, The decoder decodeContinuationToken currently always runs decodeURIComponent on every part which corrupts legacy unescaped tokens; update decodeContinuationToken to support a version prefix: detect and strip a token version prefix (e.g., "v1:" vs legacy/no-prefix), and only apply decodeURIComponent for the new version (v1) while preserving legacy behavior by leaving values raw (no decoding) when the token has no version prefix; keep using CONTINUATION_TOKEN_PART_MAP and the same part parsing logic, and ensure the function throws the same Invalid continuation token error for malformed parts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/internal/errors/codes.ts`:
- Line 327: The error message in ERRORS.InvalidKey uses encodeURIComponent(key)
which can throw for unpaired UTF-16 surrogates; update ERRORS.InvalidKey (where
the message is constructed) to produce a safe-encoded key: prefer using
key.toWellFormed() if available, otherwise normalize/replace unpaired surrogates
(or fall back to a try/catch around encodeURIComponent and sanitize the key to
replace invalid code units with U+FFFD) and then call encodeURIComponent on the
sanitized string before embedding it in the message so URIError cannot escape
and turn a 400 into a 500.
---
Outside diff comments:
In `@src/storage/object.ts`:
- Around line 876-894: The decoder decodeContinuationToken currently always runs
decodeURIComponent on every part which corrupts legacy unescaped tokens; update
decodeContinuationToken to support a version prefix: detect and strip a token
version prefix (e.g., "v1:" vs legacy/no-prefix), and only apply
decodeURIComponent for the new version (v1) while preserving legacy behavior by
leaving values raw (no decoding) when the token has no version prefix; keep
using CONTINUATION_TOKEN_PART_MAP and the same part parsing logic, and ensure
the function throws the same Invalid continuation token error for malformed
parts.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 46444ed6-2080-41d2-80b4-c88c86a791d1
📒 Files selected for processing (21)
migrations/tenant/57-unicode-object-names.sqlpackage.jsonsrc/http/plugins/xml.tssrc/internal/database/migrations/types.tssrc/internal/errors/codes.tssrc/scripts/migrations-types.tssrc/storage/backend/s3/adapter.tssrc/storage/database/knex.tssrc/storage/limits.tssrc/storage/object.tssrc/storage/protocols/s3/s3-handler.tssrc/test/bucket.test.tssrc/test/common.tssrc/test/limits.test.tssrc/test/object-list-v2.test.tssrc/test/object.test.tssrc/test/s3-adapter.test.tssrc/test/s3-protocol.test.tssrc/test/test-hygiene.test.tssrc/test/tus.test.tssrc/test/xml-plugin.test.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/test/xml-plugin.test.ts`:
- Around line 3-15: Add tests to ensure decodeXmlNumericEntities rejects
XML-forbidden but in-range numeric entities (so they remain un-decoded). In the
existing test suite for decodeXmlNumericEntities add negative cases asserting
inputs like '�' (and '�' hex/dec variants), surrogate halves such as
'�' (and decimal '�'), and noncharacters like '' (and
decimal '') are returned unchanged; reference the
decodeXmlNumericEntities function to locate the implementation to validate
behavior and ensure both hex (&#x...) and decimal (&#...;) forms are covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: af25095c-594e-4a8f-afaa-599e92b3eee5
📒 Files selected for processing (21)
migrations/tenant/57-unicode-object-names.sqlpackage.jsonsrc/http/plugins/xml.tssrc/internal/database/migrations/types.tssrc/internal/errors/codes.tssrc/scripts/migrations-types.tssrc/storage/backend/s3/adapter.tssrc/storage/database/knex.tssrc/storage/limits.tssrc/storage/object.tssrc/storage/protocols/s3/s3-handler.tssrc/test/bucket.test.tssrc/test/common.tssrc/test/limits.test.tssrc/test/object-list-v2.test.tssrc/test/object.test.tssrc/test/s3-adapter.test.tssrc/test/s3-protocol.test.tssrc/test/test-hygiene.test.tssrc/test/tus.test.tssrc/test/xml-plugin.test.ts
🚧 Files skipped from review as they are similar to previous changes (9)
- src/test/test-hygiene.test.ts
- src/test/object-list-v2.test.ts
- src/test/common.ts
- migrations/tenant/57-unicode-object-names.sql
- src/storage/object.ts
- src/storage/backend/s3/adapter.ts
- src/internal/errors/codes.ts
- src/http/plugins/xml.ts
- src/test/limits.test.ts
ff9670e to
bf8efff
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/storage/protocols/s3/s3-handler.ts`:
- Around line 1448-1501: Change decodeLegacyContinuationToken and
decodeContinuationToken to accept a parameter name (e.g.
decodeContinuationToken(token: string, parameterName: string)) and replace all
plain throws like throw new Error('Invalid continuation token') with the
standardized request error from your ERRORS collection, passing the parameter
name (e.g. throw ERRORS.InvalidArgument(parameterName) or the appropriate
ERRORS.* factory used elsewhere). Update callers (e.g. where
KeyMarker/UploadIdMarker are decoded) to pass the parameter name string such as
'KeyMarker' or 'UploadIdMarker' when calling
decodeContinuationToken/decodeLegacyContinuationToken so malformed client tokens
produce a 4xx request error instead of a generic exception.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 773df860-6e8f-4ac9-bba0-96890cba39d8
📒 Files selected for processing (20)
migrations/tenant/57-unicode-object-names.sqlsrc/http/routes/object/getSignedObject.tssrc/http/routes/object/getSignedUploadURL.tssrc/http/routes/render/renderSignedImage.tssrc/http/routes/signed-url.tssrc/storage/backend/s3/adapter.tssrc/storage/events/lifecycle/webhook-filter.tssrc/storage/events/lifecycle/webhook.tssrc/storage/object.tssrc/storage/protocols/s3/s3-handler.tssrc/test/limits.test.tssrc/test/object-list-v2.test.tssrc/test/object.test.tssrc/test/render-routes.test.tssrc/test/s3-adapter.test.tssrc/test/s3-protocol.test.tssrc/test/signed-url-route.test.tssrc/test/tus.test.tssrc/test/webhook-filter.test.tssrc/test/webhooks.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/storage/backend/s3/adapter.ts
- src/test/limits.test.ts
| function decodeLegacyContinuationToken(decoded: string) { | ||
| // Backward compatibility: preserve pre-version behavior for old in-flight tokens. | ||
| const continuationToken = decoded.split(':')[1] | ||
| if (!continuationToken) { | ||
| throw new Error('Invalid continuation token') | ||
| } | ||
| return continuationToken | ||
| } | ||
|
|
||
| function decodeContinuationToken(token: string) { | ||
| const decoded = Buffer.from(token, 'base64').toString().split(':') | ||
| const decoded = Buffer.from(token, 'base64').toString() | ||
|
|
||
| if (decoded.startsWith('l:')) { | ||
| return decodeLegacyContinuationToken(decoded) | ||
| } | ||
|
|
||
| const parts = decoded.split('\n') | ||
| let version: string | undefined | ||
| let encodedValue: string | undefined | ||
|
|
||
| for (const part of parts) { | ||
| const match = part.match(/^(\S):(.*)/) | ||
| if (!match || match.length !== 3) { | ||
| throw new Error('Invalid continuation token') | ||
| } | ||
|
|
||
| if (match[1] === 'v') { | ||
| if (version !== undefined) { | ||
| throw new Error('Invalid continuation token') | ||
| } | ||
| version = match[2] | ||
| continue | ||
| } | ||
|
|
||
| if (match[1] === 'l') { | ||
| if (encodedValue !== undefined) { | ||
| throw new Error('Invalid continuation token') | ||
| } | ||
| encodedValue = match[2] | ||
| continue | ||
| } | ||
|
|
||
| if (decoded.length === 0) { | ||
| throw new Error('Invalid continuation token') | ||
| } | ||
|
|
||
| return decoded[1] | ||
| if (version !== '1' || !encodedValue) { | ||
| throw new Error('Invalid continuation token') | ||
| } | ||
|
|
||
| try { | ||
| return decodeURIComponent(encodedValue) | ||
| } catch { | ||
| throw new Error('Invalid continuation token') | ||
| } |
There was a problem hiding this comment.
Return structured request errors for invalid continuation tokens.
These helpers now throw plain Error for malformed client-supplied markers. In this handler, invalid request input is otherwise surfaced via ERRORS.*; letting KeyMarker / UploadIdMarker failures escape as generic exceptions makes the response path unpredictable and can turn a bad client token into an internal error instead of a 4xx.
Proposed fix
-function decodeLegacyContinuationToken(decoded: string) {
+function decodeLegacyContinuationToken(decoded: string, parameterName: string) {
// Backward compatibility: preserve pre-version behavior for old in-flight tokens.
const continuationToken = decoded.split(':')[1]
if (!continuationToken) {
- throw new Error('Invalid continuation token')
+ throw ERRORS.InvalidParameter(parameterName)
}
return continuationToken
}
-function decodeContinuationToken(token: string) {
+function decodeContinuationToken(token: string, parameterName: string) {
const decoded = Buffer.from(token, 'base64').toString()
if (decoded.startsWith('l:')) {
- return decodeLegacyContinuationToken(decoded)
+ return decodeLegacyContinuationToken(decoded, parameterName)
}
const parts = decoded.split('\n')
let version: string | undefined
let encodedValue: string | undefined
for (const part of parts) {
const match = part.match(/^(\S):(.*)/)
if (!match || match.length !== 3) {
- throw new Error('Invalid continuation token')
+ throw ERRORS.InvalidParameter(parameterName)
}
if (match[1] === 'v') {
if (version !== undefined) {
- throw new Error('Invalid continuation token')
+ throw ERRORS.InvalidParameter(parameterName)
}
version = match[2]
continue
}
if (match[1] === 'l') {
if (encodedValue !== undefined) {
- throw new Error('Invalid continuation token')
+ throw ERRORS.InvalidParameter(parameterName)
}
encodedValue = match[2]
continue
}
- throw new Error('Invalid continuation token')
+ throw ERRORS.InvalidParameter(parameterName)
}
if (version !== '1' || !encodedValue) {
- throw new Error('Invalid continuation token')
+ throw ERRORS.InvalidParameter(parameterName)
}
try {
return decodeURIComponent(encodedValue)
} catch {
- throw new Error('Invalid continuation token')
+ throw ERRORS.InvalidParameter(parameterName)
}
}And pass the parameter name at the call sites, e.g. decodeContinuationToken(keyContinuationToken, 'KeyMarker').
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/storage/protocols/s3/s3-handler.ts` around lines 1448 - 1501, Change
decodeLegacyContinuationToken and decodeContinuationToken to accept a parameter
name (e.g. decodeContinuationToken(token: string, parameterName: string)) and
replace all plain throws like throw new Error('Invalid continuation token') with
the standardized request error from your ERRORS collection, passing the
parameter name (e.g. throw ERRORS.InvalidArgument(parameterName) or the
appropriate ERRORS.* factory used elsewhere). Update callers (e.g. where
KeyMarker/UploadIdMarker are decoded) to pass the parameter name string such as
'KeyMarker' or 'UploadIdMarker' when calling
decodeContinuationToken/decodeLegacyContinuationToken so malformed client tokens
produce a 4xx request error instead of a generic exception.
This allows all Unicode characters that are also valid in XML 1.0 documents. See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html See: https://www.w3.org/TR/REC-xml/#charsets
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
What kind of change does this PR introduce?
feature
What is the current behavior?
Limited set of characters are permitted in object names.
What is the new behavior?
Unicode is supported across the board.
Additional context
#133