Added support for regex patterns in skip_rows #290

roll · 2019-12-19T13:39:42Z

fixes Feature request: regular expression skip rows BCODMO/frictionless-usecases#34

@akariv
@cschloer
Here is regex support for skip_rows:

skip_rows=[1, '# comment', '^# (regex|comment)']
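Below is a minimal sketch of how this initial convention could be interpreted. It assumes integers are 1-based row numbers, strings starting with `^` are regexes, and all other strings are literal prefixes, with both kinds of string checked against the row's first cell. The helper `should_skip` is hypothetical and is not tabulator's actual code.

```python
import re


def should_skip(row_number, cells, skip_rows):
    """Hypothetical helper: return True if a row matches any skip_rows directive.

    Assumes the initial proposal: ints are row numbers, strings starting
    with '^' are regexes, and other strings are literal prefixes. Strings
    and regexes are checked against the first cell only (an assumption).
    """
    first = str(cells[0]) if cells else ''
    for directive in skip_rows:
        if isinstance(directive, int):
            if directive == row_number:
                return True
        elif directive.startswith('^'):
            # Regex directive, matched at the start of the first cell
            if re.match(directive, first):
                return True
        elif first.startswith(directive):
            # Plain-string directive: literal prefix match
            return True
    return False


skip_rows = [1, '# comment', '^# (regex|comment)']
rows = [
    ['id', 'name'],          # row 1: skipped by number
    ['# comment here', ''],  # skipped by the literal prefix
    ['# regex line', ''],    # skipped by the regex
    ['# other', ''],         # kept: matches neither
    ['1', 'english'],        # kept
]
kept = [cells for n, cells in enumerate(rows, start=1)
        if not should_skip(n, cells, skip_rows)]
print(kept)  # [['# other', ''], ['1', 'english']]
```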

What do you think:

is it OK to have a single argument for both, OR
is it better to use separate skip_rows and skip_rows_regex arguments?

akariv · 2019-12-24T07:50:09Z

tabulator/stream.py

-            strings to skip. If a string, it'll skip rows that begin with it
-            (e.g. '#' and '//').
+            List of row numbers, strings and regex patterns to skip.
+            If a string, it'll skip rows that begin with it e.g. '#' and '//'.


'rows that begin with' is a bit ambiguous. Perhaps 'rows whose first cell begins with the string or matches the regex'?

akariv · 2019-12-24T07:52:16Z

This looks good, @roll. The API is better the way you implemented it (i.e. there's no need for a separate property).
This might be a breaking change, though, if someone is using the 'skip by string' feature with '^' as the first character of a string...
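The following sketch illustrates the concern by applying the same directive under both interpretations. With literal prefix matching, `'^'` skips only rows whose first cell starts with a caret. If every string starting with `^` is compiled as a regex instead, `'^'` becomes an empty anchored pattern that matches every row. The cell values are made up for illustration.

```python
import re

directive = '^'  # hypothetical existing config: skip rows starting with a literal caret
first_cells = ['^ note', 'id', '1']

# Old behaviour: literal prefix match against the first cell
old = [cell for cell in first_cells if cell.startswith(directive)]

# Proposed behaviour: any string starting with '^' is treated as a regex
new = [cell for cell in first_cells if re.match(directive, cell)]

print(old)  # ['^ note']
print(new)  # ['^ note', 'id', '1'] -> every row would now be skipped
```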

cschloer · 2020-01-06T09:20:03Z

For our use case, completely changing skip_rows would be okay, but for backward compatibility it's probably better to have skip_rows_regex.

roll · 2020-01-06T13:01:26Z

Honestly, I don't think there's a real risk that ^ will break existing software. With a proper changelog note and a new x.N.x (minor) version it should be fine. But yeah, it's a tricky situation.

The reason I don't like skip_rows_regex is that skip_rows already accepts both strings and numbers, so the argument is already mixed.

Actually, the truly proper way would be something like:

skip_rows # row numbers; strings still accepted but deprecated
skip_rows_text
skip_rows_regex
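Here is a rough sketch of the fully separated, single-typed variant. The parameter names `skip_rows_text` and `skip_rows_regex` come from the list above. The function `filter_rows` is hypothetical, the first-cell matching is an assumption, and the sketch leaves out deprecation handling for strings passed to `skip_rows`.

```python
import re
from typing import List, Sequence


def filter_rows(rows: Sequence[List[str]],
                skip_rows: Sequence[int] = (),
                skip_rows_text: Sequence[str] = (),
                skip_rows_regex: Sequence[str] = ()) -> List[List[str]]:
    """Hypothetical: each directive kind gets its own, single-typed argument."""
    patterns = [re.compile(p) for p in skip_rows_regex]
    kept = []
    for number, cells in enumerate(rows, start=1):
        first = cells[0] if cells else ''
        if number in skip_rows:
            continue  # skipped by row number
        if any(first.startswith(text) for text in skip_rows_text):
            continue  # skipped by literal prefix
        if any(p.match(first) for p in patterns):
            continue  # skipped by regex
        kept.append(cells)
    return kept


rows = [['id'], ['// note'], ['# tmp'], ['1']]
print(filter_rows(rows, skip_rows=[1], skip_rows_text=['//'],
                  skip_rows_regex=[r'#\s*tmp']))  # [['1']]
```

No value is ever reinterpreted under this design. The cost is two extra arguments that every wrapper library would need to expose.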

But it would mean changes (at least to the documentation) in a lot of libraries (datapackage, pipeline, etc.). Also, our current specs and software style are more dynamic than strictly typed, so I'm not sure it's worth the trouble.

roll · 2020-01-14T13:51:52Z

@akariv
@cschloer
I've come up with another option that definitely won't be breaking and may be more obvious, because it uses the well-known JavaScript notation for RegExp literals:

skip_rows=[1, '# comment', '/# (regex|comment)/'] # new idea
skip_rows=[1, '# comment', '^# (regex|comment)'] # initial idea
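Here is a sketch of how the slash-delimited notation might be parsed. It assumes a string counts as a regex only when it starts and ends with `/` and has at least one character between the slashes. `parse_directive` is a hypothetical name.

```python
import re


def parse_directive(directive):
    """Hypothetical: turn '/pattern/' strings into compiled regexes.

    Anything else (row numbers, literal prefixes) is returned unchanged.
    """
    if (isinstance(directive, str) and len(directive) > 2
            and directive.startswith('/') and directive.endswith('/')):
        return re.compile(directive[1:-1])
    return directive


skip_rows = [1, '# comment', '/# (regex|comment)/']
parsed = [parse_directive(d) for d in skip_rows]
print(parsed[2].pattern)  # -> # (regex|comment)
```

Under this rule, a literal string such as `/* comment */` would also be treated as a pattern. In this sketch it would even raise `re.error`, because `* comment *` is not a valid regex.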

Which one do you think is better?

I have a feeling this concept (a string/regex coming from a text source like DPP) could be used in other parts of the stack, so I want to make sure the solution is good enough.

akariv · 2020-01-14T15:46:03Z

It has the same issue as before: e.g. `/* comment */` would be treated as a regexp. What if we used RegExp objects (i.e. the result of `re.compile()`) when we want to specify regular expressions?
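A minimal sketch of the compiled-pattern alternative follows. Each directive's type decides how it is applied, so a string is never reinterpreted as a regex. The sketch assumes Python 3.7+, where `re.Pattern` is exposed, and matching against the first cell. `should_skip` is a hypothetical name.

```python
import re


def should_skip(row_number, cells, skip_rows):
    """Hypothetical: ints are row numbers, str is a literal prefix,
    compiled patterns are regexes. Matching uses the first cell only."""
    first = str(cells[0]) if cells else ''
    for directive in skip_rows:
        if isinstance(directive, re.Pattern):
            if directive.match(first):
                return True
        elif isinstance(directive, str):
            if first.startswith(directive):
                return True
        elif directive == row_number:
            return True
    return False


skip_rows = [1, '/* comment */', re.compile(r'# (regex|comment)')]
print(should_skip(2, ['/* comment */ x'], skip_rows))  # True: literal prefix, not a regex
print(should_skip(3, ['# regex'], skip_rows))          # True: compiled pattern
print(should_skip(4, ['data'], skip_rows))             # False
```

The trade-off is that a compiled pattern has no plain-text form, so it cannot be written in a text-only spec.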


cschloer · 2020-01-14T16:50:09Z

Using RegExp objects would mean that the regular expression option couldn't be used from a pipeline-spec.yaml (since it's plain text). If we want to keep a single skip_rows argument, we could instead allow a dictionary object in it, which can be written in the YAML. Something like:
skip_rows=[1, '# comment', { 'type': 'regex', 'value': '^#'}]

That would keep support for simple strings and numbers while also enabling regular expressions.
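Below is a sketch of how the dictionary form could be handled. The `{'type': 'regex', 'value': ...}` shape comes from the comment above. The function name, the first-cell matching, and raising `ValueError` for unknown types are assumptions. Because the directive is plain data, the same list can come from a text-based spec. The sketch uses the standard-library `json` module to show this, standing in for YAML.

```python
import json
import re


def should_skip(row_number, cells, skip_rows):
    """Hypothetical: ints are row numbers, str is a literal prefix,
    {'type': 'regex', 'value': ...} dicts are regexes (first cell only)."""
    first = str(cells[0]) if cells else ''
    for directive in skip_rows:
        if isinstance(directive, dict):
            if directive.get('type') != 'regex':
                raise ValueError('Unsupported skip_rows directive: %r' % (directive,))
            if re.match(directive['value'], first):
                return True
        elif isinstance(directive, str):
            if first.startswith(directive):
                return True
        elif directive == row_number:
            return True
    return False


# The same directives as they might arrive from a text-based spec
skip_rows = json.loads('[1, "# comment", {"type": "regex", "value": "^#"}]')
rows = [['id', 'name'], ['# note', ''], ['^literal', ''], ['1', 'english']]
kept = [cells for n, cells in enumerate(rows, start=1)
        if not should_skip(n, cells, skip_rows)]
print(kept)  # [['^literal', ''], ['1', 'english']]
```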

akariv · 2020-01-23T07:37:47Z

@cschloer I think this is a good solution.

roll · 2020-01-30T12:44:30Z

Hopefully, at some point, we will find a less verbose syntax, but I agree that with @cschloer's version we can't go wrong.

roll · 2020-01-30T12:57:53Z

This will be released as tabulator@1.33.

roll added 2 commits December 19, 2019 16:35

Added support for regex patterns in skip_rows

9d1eb05

Fixed regex

fc1feb3

akariv reviewed Dec 24, 2019

roll added this to In progress in Pilot with BCO-DMO Jan 27, 2020

roll mentioned this pull request Jan 27, 2020

Feature request: regular expression skip rows BCODMO/frictionless-usecases#34

Closed

roll self-assigned this Jan 27, 2020

roll and others added 2 commits January 30, 2020 15:30

Merge branch 'master' into skip_rows_regex

b232497

Rebased on passing regex as a dict

d52ae1b

roll added 2 commits January 30, 2020 15:49

Fixed linting

9343761

Updated readme

38ff08f

roll merged commit a6606ad into master Jan 30, 2020

roll deleted the skip_rows_regex branch January 30, 2020 12:56

roll moved this from In progress to Done in Pilot with BCO-DMO Jan 30, 2020
