Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Fallback to Prim_Text_Helper.compile_regex; accept Regex in Text.parse_to_table #7297

Merged
merged 7 commits into from
Jul 18, 2023
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -520,6 +520,7 @@
- [Improving date/time support in Table - added `date_diff`, `date_add`,
`date_part` and some shorthands. Extended `Time_Period` with milli-, micro-
and nanosecond periods.][7221]
- [`Text.parse_to_table` can take a `Regex`.][7297]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
Expand Down Expand Up @@ -749,6 +750,7 @@
[7223]: https://github.com/enso-org/enso/pull/7223
[7234]: https://github.com/enso-org/enso/pull/7234
[7221]: https://github.com/enso-org/enso/pull/7221
[7297]: https://github.com/enso-org/enso/pull/7297

#### Enso Compiler

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ Table.from_objects value fields=Nothing =
regex).

Arguments:
- pattern: The pattern used to search within the text.
- pattern: The regex string or `Regex` used to search within the text.
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved
- case_sensitivity: Specifies if the text values should be compared case
sensitively.
- parse_values: Parse any values using the default value parser.
Expand All @@ -114,8 +114,8 @@ Table.from_objects value fields=Nothing =
If the marked groups are named, the names will be used otherwise the column
will be named `Column <N>` where `N` is the number of the marked group.
(Group 0 is not included.)
Text.parse_to_table : Text -> Case_Sensitivity -> Boolean -> Problem_Behavior -> Table ! Type_Error | Regex_Syntax_Error | Illegal_Argument
Text.parse_to_table self pattern case_sensitivity=Case_Sensitivity.Sensitive parse_values=True on_problems=Report_Warning =
Text.parse_to_table : Text | Regex -> Case_Sensitivity -> Boolean -> Problem_Behavior -> Table ! Type_Error | Regex_Syntax_Error | Illegal_Argument
Text.parse_to_table self (pattern : Text | Regex) case_sensitivity=Case_Sensitivity.Sensitive parse_values=True on_problems=Report_Warning =
Parse_To_Table.parse_text_to_table self pattern case_sensitivity parse_values on_problems

## PRIVATE
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,12 @@ from project.Errors import Duplicate_Output_Column_Names
Converts a Text into a Table using a regular expression pattern.

See Table.parse_text_to_table.
parse_text_to_table : Text -> Text -> Case_Sensitivity -> Boolean -> Problem_Behavior -> Table ! Type_Error | Regex_Syntax_Error | Illegal_Argument
parse_text_to_table text pattern_string="." case_sensitivity=Case_Sensitivity.Sensitive parse_values=True on_problems=Report_Warning =
parse_text_to_table : Text | Regex -> Text -> Case_Sensitivity -> Boolean -> Problem_Behavior -> Table ! Type_Error | Regex_Syntax_Error | Illegal_Argument
parse_text_to_table text regex_or_pattern_string="." case_sensitivity=Case_Sensitivity.Sensitive parse_values=True on_problems=Report_Warning =
GregoryTravis marked this conversation as resolved.
Show resolved Hide resolved
case_insensitive = case_sensitivity.is_case_insensitive_in_memory
pattern = Regex.compile pattern_string case_insensitive=case_insensitive
pattern = case regex_or_pattern_string of
_ : Regex -> regex_or_pattern_string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ : Regex -> regex_or_pattern_string
_ : Regex -> regex_or_pattern_string.recompile case_sensitivity

Will need to be rebased on my one for this to work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

_ : Text -> Regex.compile regex_or_pattern_string case_insensitive=case_insensitive
matches = pattern.match_all text

columns = case pattern.group_count == 1 of
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

import com.oracle.truffle.api.CompilerDirectives.TruffleBoundary;
import com.oracle.truffle.api.dsl.Cached;
import com.oracle.truffle.api.dsl.Fallback;
import com.oracle.truffle.api.dsl.Specialization;
import com.oracle.truffle.api.exception.AbstractTruffleException;
import com.oracle.truffle.api.nodes.Node;
Expand Down Expand Up @@ -45,6 +46,13 @@ Object alwaysCompile(Text pattern, Text options) {
return compile(pattern.toString(), options.toString());
}

@Fallback
Object doOther(Object pattern, Object options) {
Builtins builtins = EnsoContext.get(this).getBuiltins();
Atom err = builtins.error().makeTypeError(builtins.text(), pattern, "pattern");
throw new PanicException(err, this);
}

@TruffleBoundary
Object compile(String pattern, String options) {
var ctx = EnsoContext.get(this);
Expand Down
10 changes: 10 additions & 0 deletions test/Table_Tests/src/In_Memory/Parse_To_Table_Spec.enso
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
from Standard.Base import all

import Standard.Base.Data.Text.Regex.Regex_Syntax_Error
import Standard.Base.Errors.Common.Type_Error
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument
import Standard.Table.Data.Table_Conversions
import Standard.Test.Extensions
Expand All @@ -19,6 +20,12 @@ spec =
actual = "a 7 ab12 bt100 c12d20q 12".parse_to_table "[a-z]+\d*"
actual.should_equal expected

Test.specify "text_to_table with a regex" <|
expected = Table.from_rows ["Column"]
[["a"], ["ab12"], ["bt100"], ["c12"], ["d20"], ["q"]]
actual = "a 7 ab12 bt100 c12d20q 12".parse_to_table "[a-z]+\d*".to_regex
actual.should_equal expected

Test.group "Text.parse_to_table with groups" <|
Test.specify "with groups" <|
expected = Table.from_rows ["Column 1", "Column 2"]
Expand Down Expand Up @@ -68,4 +75,7 @@ spec =
Test.specify "enpty pattern" <|
"abc".parse_to_table "" . should_fail_with Illegal_Argument

Test.specify "bad arg" <|
Test.expect_panic_with (actual = "a 7 ab12 bt100 c12d20q 12".parse_to_table 12) Type_Error

main = Test_Suite.run_main spec
5 changes: 5 additions & 0 deletions test/Tests/src/Data/Text/Regex_Spec.enso
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,11 @@ spec =
Test.specify "should disallow empty patterns in `compile`" <|
Regex.compile "" . should_fail_with Illegal_Argument

Test.specify "passing a non-string should fail with a type error" <|
Test.expect_panic_with (Regex.compile 12) Type_Error
p = Regex.compile "[a-z]"
Test.expect_panic_with (Regex.compile p) Type_Error
Comment on lines +38 to +39
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this behaviour (that I think is good), do we still need the parseRegexPattern?

I assume it should not be allowed to pass an already-compiled regex to compile_regex, we should use James's recompile instead - e.g. to update case sensitivity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that parseRegexPattern is for caching compiled regexes, so if you compile the same one multiple times, it uses the cache. This test is for passing a non-string to Regex.compile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry! I read that wrong apparently.


Test.group "Escape" <|
Test.specify "should escape an expression for use as a literal" <|
Regex.escape "[a-z\d]+" . should_equal '\\[a-z\\d\\]\\+'
Expand Down
Loading