-
Notifications
You must be signed in to change notification settings - Fork 1.5k
feat(sql): implement files(s), glob(s) and glob(Ss) functions
#6391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis PR introduces Changes
Sequence Diagram(s)sequenceDiagram
participant SQL as SQL Query
participant GlobFunc as GlobFilesFunctionFactory
participant FilesFunc as FilesFunctionFactory
participant GlobStr as GlobStrFunctionFactory
participant Cursor as FilesRecordCursor
participant Filter as FilteredRecordCursor
SQL->>GlobFunc: glob('/path/to/*.csv')
Note over GlobFunc: Extract non-glob prefix<br/>'/path/to/'
GlobFunc->>FilesFunc: files('/path/to/')
FilesFunc->>Cursor: Create cursor for all files
GlobFunc->>GlobStr: Create glob matcher for '*.csv'
GlobStr->>Filter: Wrap cursor with filter
Filter->>Cursor: Iterate files
Note over Filter: Apply glob pattern<br/>filter
Filter-->>SQL: Return matching records
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Key areas requiring attention:
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@CodeRabbit please read PR description and review the changes |
🧩 Analysis chain@bluestreak01: I'll review the changes in this PR. Let me examine the implementation of the new 🏁 Script executed: Length of output: 52409 🏁 Script executed: Length of output: 986 🏁 Script executed: Length of output: 13573 🏁 Script executed: Length of output: 485 @bluestreak01: I've reviewed the implementation of the Critical Issues1. Thread-Safety Issue in GlobStrFunctionFactory
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (6)
core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java (1)
3023-3029: Avoid hard‑coded/tmp/*inGlobFilesFunctionFactorytest argumentsSpecial‑casing
GlobFilesFunctionFactorywithnew StrConstant("/tmp/*")bakes in an absolute, Unix‑specific path. That can be fragile on non‑Unix environments and may also conflict with any configured SQL copy/file root restrictions. Since this test only needs a syntactically valid glob fortoPlan()/ instantiation, consider switching to a path that is:
- Relative to the test root or configured SQL copy/import root, or
- Clearly environment‑agnostic (e.g., a relative pattern under the test fixture directory).
That keeps the test portable and decoupled from host filesystem layout.
core/src/main/java/io/questdb/griffin/engine/functions/catalogue/FilesFunctionFactory.java (1)
63-77: Enforce constant root argument and integrate with configured copy root / path restrictions
files(s)currently:
- Accepts any expression as its root argument (
Function arg = args.getQuick(0)), but then immediately materializes it once viagetStrA(null)and bakes that into the cursor factory. If the caller passes a non‑constant expression, they will still only see one root, which is misleading.- Only checks
Chars.isBlank(root); it does not validate the root againstCAIRO_SQL_COPY_ROOT/ the configured import root, nor guard against..traversal.To avoid surprising semantics and tighten security, consider:
- Requiring the root to be constant at compile time.
- Validating that the resolved path is under the configured copy root (if that’s the intended security boundary), and rejecting anything else with a
SqlException.For example:
public Function newInstance( @@ - final Function arg = args.getQuick(0); - final CharSequence root = arg.getStrA(null); + final Function arg = args.getQuick(0); + if (!arg.isConstant()) { + throw SqlException.$(argPositions.getQuick(0), "files() root must be a constant"); + } + + final CharSequence root = arg.getStrA(null); if (Chars.isBlank(root)) { throw SqlException.$(position, "root path must be non-empty"); } + + // Optionally, resolve against CAIRO_SQL_COPY_ROOT or similar and + // verify the resulting path does not escape that root via "..". return new CursorFunction(new FilesRecordCursorFactory(configuration, root));This keeps
files()behavior predictable and aligned with other file‑system‑related functions that honor configured roots.core/src/test/java/io/questdb/test/griffin/engine/functions/table/GlobFilesIntegrationTest.java (2)
60-225: Good end‑to‑end coverage; consider adding edge‑case glob patternsThe integration tests exercise
files()plus typicalglob(path, '*.ext')flows (ordering, limits, nested directories) and look correct and deterministic given the way you fix file sizes and timestamps.Once the core glob implementation is hardened, it would be useful to add a few more cases here to catch regressions:
- Patterns using
?and[ ](including negation with!) to ensureGlobStrFunctionFactory’s bracket handling behaves as intended.- Behavior for empty patterns or obviously malformed ones (unbalanced
[), verifying that you get a clear error rather than silent mis‑matches.These can be incremental tests using the existing
setupTestFilesscaffolding.
227-291: Test helper duplicates low‑level file creation logic; consider centralizing
createTestFile/setupTestFileshere are very similar to the helpers inFilesFunctionFactoryTest(manualUnsafe.malloc,Files.write,setLastModified, directory creation).To reduce duplication and keep future changes (e.g., to test file size or timestamp conventions) in one place, consider extracting these helpers into a shared test utility class under
io.questdb.test.*and reusing them across both test classes.core/src/test/java/io/questdb/test/griffin/engine/functions/catalogue/FilesFunctionFactoryTest.java (1)
280-320: Share test file creation utilities with glob integration testsAs in
GlobFilesIntegrationTest,createTestFileandsetupTestFilesmanually manage native memory and I/O. The logic is nearly identical between the two classes.Consider extracting a small shared test utility (e.g.,
TestFilesHelper) that:
- Creates a test root directory under a given base.
- Writes a file of a specified size with deterministic contents.
- Sets
lastModifiedbased on size or a provided timestamp.Then both test classes can use it, cutting duplication and keeping native memory handling in one place.
core/src/test/java/io/questdb/test/griffin/engine/functions/table/GlobFilesFunctionFactoryTest.java (1)
32-123: Excellent test coverage forextractNonGlobPrefix.The test suite comprehensively covers edge cases (null, empty, single slash) and various glob patterns (
*,?,[,]) across different path styles. The tests are well-organized, focused, and follow good naming conventions.Optionally, consider adding a few more edge case tests to further improve robustness:
- Paths with multiple consecutive slashes (e.g.,
"/path//to///file*.txt")- Relative paths starting with
./or../(e.g.,"./data/*.csv"or"../parent/file?.txt")- Opening bracket without closing bracket (e.g.,
"data/file[0-9.parquet")- Paths with only backslashes (Windows-style:
"C:\\data\\*.txt")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
core/src/main/java/io/questdb/griffin/engine/functions/catalogue/ExportFilesFunctionFactory.java(1 hunks)core/src/main/java/io/questdb/griffin/engine/functions/catalogue/FilesFunctionFactory.java(1 hunks)core/src/main/java/io/questdb/griffin/engine/functions/catalogue/FilesRecordCursor.java(1 hunks)core/src/main/java/io/questdb/griffin/engine/functions/catalogue/ImportFilesFunctionFactory.java(1 hunks)core/src/main/java/io/questdb/griffin/engine/functions/regex/GlobStrFunctionFactory.java(1 hunks)core/src/main/java/io/questdb/griffin/engine/functions/table/GlobFilesFunctionFactory.java(1 hunks)core/src/test/java/io/questdb/test/cutlass/http/ExpParquetExportTest.java(2 hunks)core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java(3 hunks)core/src/test/java/io/questdb/test/griffin/engine/functions/catalogue/FilesFunctionFactoryTest.java(1 hunks)core/src/test/java/io/questdb/test/griffin/engine/functions/table/GlobFilesFunctionFactoryTest.java(1 hunks)core/src/test/java/io/questdb/test/griffin/engine/functions/table/GlobFilesIntegrationTest.java(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-01T00:46:15.310Z
Learnt from: bluestreak01
Repo: questdb/questdb PR: 6326
File: core/src/main/java/io/questdb/griffin/engine/functions/groupby/RegressionR2FunctionFactory.java:43-57
Timestamp: 2025-11-01T00:46:15.310Z
Learning: In QuestDB, function registration in core/src/main/resources/function_list.txt is optional because the function loader automatically scans the classpath. Registration is only required for ambiguous functions.
Applied to files:
core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (34)
- GitHub Check: New pull request (Coverage Report Coverage Report)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (7)
core/src/test/java/io/questdb/test/cutlass/http/ExpParquetExportTest.java (2)
27-32: ActiveConnectionTracker import relocation is fineMoving
ActiveConnectionTrackerin the import list is a no-op and keeps the dependency available where it’s used later in the file; no issues here.
118-125: Updated expected generate_series interval matches new behaviorAdjusting the expected SQL text in
testBasicsto use'1m'aligns the test with the updated generate_series interval; the assertion remains consistent with the endpoint’s behavior.core/src/main/java/io/questdb/griffin/engine/functions/catalogue/ImportFilesFunctionFactory.java (1)
84-84: LGTM! Appropriate memory tag for Path allocation.The change from
NATIVE_FUNC_RSStoNATIVE_PATHcorrectly categorizes this Path instance for memory accounting purposes, aligning with the semantic purpose of path-based resources in file operations.core/src/main/java/io/questdb/griffin/engine/functions/catalogue/ExportFilesFunctionFactory.java (1)
79-79: LGTM! Consistent memory tag update.The memory tag change aligns with the update in
ImportFilesFunctionFactoryand properly categorizes the export path allocation.core/src/main/java/io/questdb/griffin/engine/functions/catalogue/FilesRecordCursor.java (1)
58-58: LGTM! Completes consistent memory tag refactoring.This change completes the memory tag update across the path-based resources used in file listing operations, ensuring consistent memory categorization throughout the import/export file functionality.
core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java (1)
143-149: Imports for new glob factories are appropriateThe added imports for
GlobStrFunctionFactoryandGlobFilesFunctionFactorymatch their use intestFunctions()and are consistent with existing import style in this test class. No changes needed here.core/src/test/java/io/questdb/test/griffin/engine/functions/catalogue/FilesFunctionFactoryTest.java (1)
62-278: Comprehensive coverage forfiles()behaviorThis suite nicely exercises the key aspects of
files():
- Column projections and types.
- Recursive traversal and nested directories.
- Aggregations (
COUNT,SUM),GROUP BY,DISTINCT,ORDER BY,WHEREfilters, LIMIT, and non‑existent roots.The expectations match the helper behavior (sizes/timestamps), so it should catch regressions in
FilesFunctionFactoryandFilesRecordCursor.No functional issues spotted here.
core/src/main/java/io/questdb/griffin/engine/functions/catalogue/FilesFunctionFactory.java
Show resolved
Hide resolved
core/src/main/java/io/questdb/griffin/engine/functions/regex/GlobStrFunctionFactory.java
Show resolved
Hide resolved
core/src/main/java/io/questdb/griffin/engine/functions/regex/GlobStrFunctionFactory.java
Outdated
Show resolved
Hide resolved
core/src/main/java/io/questdb/griffin/engine/functions/table/GlobFilesFunctionFactory.java
Show resolved
Hide resolved
core/src/main/java/io/questdb/griffin/engine/functions/table/GlobFilesFunctionFactory.java
Show resolved
Hide resolved
|
@CodeRabbit review the fixes and check if your comments have been addressed? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 12157 @bluestreak01: Excellent work! 🎉 All the issues from my previous review have been addressed. Here's the detailed status: ✅ Critical Issues - All Fixed1. Thread-Safety with StringSinkStatus: ✅ FIXED The instance field has been replaced with a private static final io.questdb.std.ThreadLocal<StringSink> tlSink = new ThreadLocal<>(StringSink::new);This ensures each thread gets its own 2. IndexOutOfBoundsException in charAt(0)Status: ✅ FIXED Added a blank check that returns early before the if (Chars.isBlank(glob)) {
return filesFunctionFactory.newInstance(...);
}
// Now safe to call glob.charAt(0)3. Missing Null Check in convertGlobPatternToRegexStatus: ✅ FIXED Added if (Chars.isBlank(globPattern)) {
throw SqlException.$(argPositions.get(1), "glob pattern must not be null or empty");
}✅ Medium Priority Issues - All Addressed4. Assert Statement for Bracket ValidationStatus: ✅ FIXED Replaced assert with proper exception handling: if (bracketStackDepth != 0) {
throw SqlException.$(position, "unbalanced bracket [glob=").put(globPattern).put(']');
}5. Debug Code in ExplainPlanTest.javaStatus: ✅ FIXED The 6. Test Coverage for GlobStrFunctionFactoryStatus: ✅ SIGNIFICANTLY IMPROVED Coverage improved from 65% to 91.49% 📈 ✅ Suggestions - Addressed7. Root Path ValidationStatus: ✅ ADDRESSED Added validation in if (Chars.isBlank(root)) {
throw SqlException.$(position, "root path must be non-empty");
}📊 Overall Coverage StatsThe overall PR coverage is excellent:
SummaryAll critical issues have been resolved, and the implementation is now production-ready. The significant improvement in test coverage demonstrates thorough testing of edge cases. Great job on the fixes! 🚀 (| |) Hooray! All issues fixed! If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
[PR Coverage check]😍 pass : 108 / 113 (95.58%) file detail
|
Usage
These functions are prerequisites for parquet upload and for upgrades to the read_parquet function. The code for files(s) aready existed, but was not exposed as a function. the
globfunctions add glob-style pattern matching to complement the existing regex and LIKE matches.Three functions are introduced:
files(s): this is a recursive scan of a directory specified as an argumentglob(s): same as files but accepting a glob patternglob(Ss): applies a glob pattern match to the string (alternative to~(regex) andLIKEmatching)