Ability to skip last rows. #225

AcckiyGerman · 2017-12-19T08:40:48Z

fixes Skip last rows #224
fixes Drop support for py3.3? #226

Added tests, code and updated readme.

Details

I added one more built-in processor to Stream (tabulator/stream.py, line 448). It keeps a buffer of the most recent rows, so it can delete rows counted from the end.
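For illustration, a minimal sketch of the buffering idea described above. It is not the code from this PR: the function name `skip_rows_from_end` and its signature are assumptions, and rows are plain lists rather than tabulator's extended rows.

```python
from collections import deque


def skip_rows_from_end(rows, rows_to_skip):
    """Yield rows, dropping those addressed by negative numbers (-1 = last row).

    Hypothetical helper for illustration only, not tabulator's API.
    """
    negative = {n for n in rows_to_skip if n < 0}
    if not negative:
        # Nothing is counted from the end, so stream straight through.
        for row in rows:
            yield row
        return

    # Only the last abs(min(negative)) rows can be affected.
    buffer_size = abs(min(negative))
    buffer = deque()
    for row in rows:
        buffer.append(row)
        if len(buffer) > buffer_size:
            # The oldest row can no longer be among the last buffer_size rows.
            yield buffer.popleft()

    # Input is exhausted: the buffer now holds the tail of the stream.
    n = len(buffer)
    for i, row in enumerate(buffer):
        if i - n not in negative:
            yield row
```

Because only the tail is buffered, memory use depends on the most negative number requested, not on the size of the source.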

#224

AcckiyGerman · 2017-12-19T08:58:10Z

Strange error in the travis logs:
pytest requires Python '>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*' but the running Python is 3.3.6
@roll ?

akariv · 2017-12-19T15:31:27Z

tabulator/stream.py

+                for row in extended_rows:
+                    yield row
+            else:
+                buffer_size = abs(min(rows_to_skip)) + 1


This is a bit confusing: the buffer size is actually abs(min(rows_to_skip)) (without the +1)

Agree, will fix.

akariv · 2017-12-19T15:31:41Z

tabulator/stream.py

+                # use buffer to save last rows
+                for row in extended_rows:
+                    buffer.append(row)
+                    if len(buffer) == buffer_size:


if len(buffer) > buffer_size

akariv · 2017-12-19T15:44:07Z

tabulator/stream.py

+                # Now squeeze out the buffer
+                last_row_number = buffer[len(buffer)-1][0]
+                # with last_row_number, we could transform negative row numbers to positive
+                rows_to_skip_positive = [last_row_number + 1 + n for n in rows_to_skip]


a bit convoluted...
why not something like:

    n = len(buffer)
    for i, row in enumerate(buffer):
        if i-n not in rows_to_skip:
            yield row

I used 'last_row_number' from extended_row[0] because 'extended_rows' are numbered before any rows are deleted. That way I avoid double deletions when two arguments point to the same row (which is what you were worried about in the next comment).
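To make the difference between the two approaches concrete, here is a small sketch. It assumes simplified extended rows of the form `(row_number, value)`, and both function names are hypothetical. The scenario is a five-row source with `skip_rows=[4, -2]`, where both arguments address row 4 and the positive skip has already removed it before the buffer is flushed.

```python
def keep_by_position(buffer, rows_to_skip):
    # Reviewer's suggestion: use the position in the buffer, counted from its end.
    n = len(buffer)
    return [row for i, row in enumerate(buffer) if i - n not in rows_to_skip]


def keep_by_row_number(buffer, rows_to_skip):
    # Author's approach: translate negative numbers into original row numbers
    # using the number carried by the last buffered row.
    last_row_number = buffer[-1][0]
    to_skip = {last_row_number + 1 + n for n in rows_to_skip if n < 0}
    return [row for row in buffer if row[0] not in to_skip]


# Row 4 was already removed by the positive skip, so the tail is rows 3 and 5.
buffer = [(3, 'c'), (5, 'e')]

by_position = keep_by_position(buffer, [-2])      # also drops row 3
by_row_number = keep_by_row_number(buffer, [-2])  # keeps rows 3 and 5
```

The row-number variant only works while the true last row is still in the buffer, since `last_row_number` is read from it.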

akariv · 2017-12-19T15:48:11Z

tests/test_stream.py

+        assert stream.read() == [['id', 'name'], ['1', 'english']]
+
+
+def test_stream_skip_rows_no_double_skip():


Since skip rows runs before skip negative numbers, it's possible that 'skip_rows' will remove the last line, and then 'skip_negative_numbers' wouldn't know which line is the last line...

Yes, you are right, that does happen :( Fixing...

AcckiyGerman · 2017-12-19T16:03:53Z

Update from @akariv : Pytest no longer supports Python 2.6 and 3.3
https://docs.pytest.org/en/latest/changelog.html#deprecations-and-removals

roll · 2017-12-20T13:14:37Z

@AcckiyGerman
Thanks. Great work.

Could you please also address issue #226 in this PR: just remove Python 3.3 from tox/travis and other places, so that we can have a green build.

roll · 2017-12-20T13:17:56Z

tabulator/stream.py

        # Apply processors to iterator
-        processors = [builtin_processor] + self.__post_parse
+        processors = [builtin_processor, skip_negative_rows] + self.__post_parse


I think we should add this processor only if there is an actual need to skip rows at the end of the file. So could you please use a simple condition here? It will save some CPU ticks, and fewer processors are easier to debug. In 99% of cases there are no negative skip_rows values.
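A minimal sketch of the requested condition, assuming the processor names from the diff above. The helper `build_processors` and the pass-through processor bodies are stand-ins for illustration; the processor order simply mirrors the diff under review.

```python
def builtin_processor(rows):
    # Stand-in for the existing built-in processor.
    for row in rows:
        yield row


def skip_negative_rows(rows):
    # Stand-in for the new end-of-file processor.
    for row in rows:
        yield row


def build_processors(skip_rows_by_numbers, post_parse):
    """Register skip_negative_rows only when a negative row number is present."""
    processors = [builtin_processor]
    if any(n < 0 for n in skip_rows_by_numbers):
        processors.append(skip_negative_rows)
    return processors + list(post_parse)
```

With `skip_rows_by_numbers=[1, 2]` the chain stays as it was before the PR, so the common case pays nothing for the new feature.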

#224

- PR feedback fixes. - Tox & travis configs updated (pytest does not support python3.3 anymore). #issues: #224 #226

AcckiyGerman · 2017-12-20T14:59:50Z

@akariv @roll

the PR feedback has been addressed
python3.3 tests are disabled

akariv · 2017-12-20T15:20:44Z

tabulator/stream.py

+        # last row counter will be incremented in builtin_processor()
+        # and used in skip_negative_rows() to count rows from the end
+        last_row_number = 0
+        rows_to_skip_from_end = [n for n in self.__skip_rows_by_numbers if n < 0]


why create a list here if you're not using it?

akariv · 2017-12-20T15:31:33Z

tabulator/stream.py

+                    yield buffer.popleft()
+
+            # Now squeeze out the buffer
+            global last_row_number


why use the global? usually it's a sign of bad design.
my conclusion is that you should run the 'remove rows from end' bit before the original 'remove_rows' processor - and then this entire thing is not needed.

oh my stupid head :) that's so obvious )))
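A short sketch of why the order matters, using list-based helpers instead of streaming generators to keep it compact. Rows are simplified to `(row_number, value)` tuples and both helper names are hypothetical. The example skips `[5, -1]` on a five-row source, where both arguments address the last row.

```python
def skip_by_number(rows, numbers):
    """Drop rows whose original 1-based number is listed."""
    return [(num, value) for num, value in rows if num not in numbers]


def skip_from_end(rows, negatives):
    """Drop rows by position counted from the end (-1 = last row)."""
    n = len(rows)
    return [row for i, row in enumerate(rows) if i - n not in negatives]


rows = [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
skip_rows = [5, -1]  # both address row 5
positives = {n for n in skip_rows if n > 0}
negatives = {n for n in skip_rows if n < 0}

# Positive skip first: row 5 is gone, so -1 now lands on row 4 (double skip).
wrong = skip_from_end(skip_by_number(rows, positives), negatives)

# End skip first: -1 is resolved against the untouched source.
right = skip_by_number(skip_from_end(rows, negatives), positives)
```

Running the end-relative step first means it always sees the complete source, so no shared counter between processors is needed.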

- PR feedback fixes (v2). - Tox & travis configs updated (pytest does not support python3.3 anymore). #issues: #224 #226

roll · 2017-12-20T16:32:47Z

@AcckiyGerman
Thanks. I'll be able to review and merge tomorrow morning.

roll · 2017-12-21T13:44:10Z

Sorry for the delay. Can't do it today. I'm looking forward to merging tomorrow.

AcckiyGerman · 2017-12-22T10:02:11Z

Timing is no problem, I just hope it will be done :)

roll · 2017-12-22T10:03:59Z

If it's not blocking for today, then I'd prefer to do it on Monday because today is a super busy day at OKI)

roll

👍

roll · 2017-12-27T17:50:05Z

Released as v1.13.0

Ability to skip last rows.

ce3f527

#224

akariv added the {review} label Dec 19, 2017

zelima requested review from roll and akariv December 19, 2017 09:19

akariv suggested changes Dec 19, 2017

roll suggested changes Dec 20, 2017

AcckiyGerman added 2 commits December 20, 2017 15:43

Ability to skip last rows: PR feedback fixes.

b7d5cd7

#224

Ability to skip last rows:

3a9986c

- PR feedback fixes. - Tox & travis configs updated (pytest does not support python3.3 anymore). #issues: #224 #226

AcckiyGerman mentioned this pull request Dec 20, 2017

Drop support for py3.3? #226

Closed

akariv suggested changes Dec 20, 2017

Ability to skip last rows:

8312755

- PR feedback fixes (v2). - Tox & travis configs updated (pytest does not support python3.3 anymore). #issues: #224 #226

roll approved these changes Dec 27, 2017

roll merged commit a317561 into frictionlessdata:master Dec 27, 2017

roll removed the {review} label Dec 27, 2017

AcckiyGerman deleted the skip-last-rows branch December 28, 2017 07:21
