Skip to content
This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Ability to skip last rows. #225

Merged
merged 4 commits into from Dec 27, 2017
Merged

Ability to skip last rows. #225

merged 4 commits into from Dec 27, 2017

Conversation

AcckiyGerman
Copy link
Contributor

@AcckiyGerman AcckiyGerman commented Dec 19, 2017


Added tests, code and updated readme.

Details

I added one more built-in processor for Stream (tabulator/stream.py line 448) , which has a buffer, so it could delete rows counting them from the end.

@AcckiyGerman
Copy link
Contributor Author

AcckiyGerman commented Dec 19, 2017

Strange error in the travis logs:
pytest requires Python '>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*' but the running Python is 3.3.6
@roll ?

@zelima zelima requested review from roll and akariv December 19, 2017 09:19
for row in extended_rows:
yield row
else:
buffer_size = abs(min(rows_to_skip)) + 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing - the buffer size if actually abs(min(rows_to_skip)) (without the +1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, will fix.

# use buffer to save last rows
for row in extended_rows:
buffer.append(row)
if len(buffer) == buffer_size:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if len(buffer) > buffer_size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

# Now squeeze out the buffer
last_row_number = buffer[len(buffer)-1][0]
# with last_row_number, we could transform negative row numbers to positive
rows_to_skip_positive = [last_row_number + 1 + n for n in rows_to_skip]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit convoluted...
why not something like:

n = len(buffer)
for i, row in enumerate(buffer):
    if i-n not in rows_to_skip:
       yield row

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used 'last_row_number' from extended_row[0] because 'extended_rows' were counted before any row deletions, so I avoid double deletions when two arguments are pointing on the same row (what you've been worried in the next comment)

assert stream.read() == [['id', 'name'], ['1', 'english']]


def test_stream_skip_rows_no_double_skip():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since skip rows runs before skip negative numbers, it's possible that 'skip_rows' will remove the last line, and then 'skip_negative_numbers' wouldn't know which line is the last line...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right, it is happens :( fixing...

@AcckiyGerman
Copy link
Contributor Author

Update from @akariv : Pytest no longer supports Python 2.6 and 3.3
https://docs.pytest.org/en/latest/changelog.html#deprecations-and-removals

@roll
Copy link
Member

roll commented Dec 20, 2017

@AcckiyGerman
Thanks. Great work.

Could you please also address in this PR this issue - #226 - just remove Pyhton 3.3 from tox/travis and other places. So we will be able to have a green build.

# Apply processors to iterator
processors = [builtin_processor] + self.__post_parse
processors = [builtin_processor, skip_negative_rows] + self.__post_parse
Copy link
Member

@roll roll Dec 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this processor only if there is an actual need of skipping rows at the end of the file. So could you please use here a simple condition? It will save some CPU ticks. And less processors - easy to debug. In 99% of the cases there is no negative skip_rows values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

 - PR feedback fixes.
 - Tox & travis configs updated (pytest does not support python3.3 anymore).
 #issues:
#224
#226
@AcckiyGerman
Copy link
Contributor Author

@akariv @roll

  • the PR feedback are fixed
  • python3.3 tests are disabled

# last row counter will be incremented in builtin_processor()
# and used in skip_negative_rows() to count rows from the end
last_row_number = 0
rows_to_skip_from_end = [n for n in self.__skip_rows_by_numbers if n < 0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why create a list here if you're not using it?

yield buffer.popleft()

# Now squeeze out the buffer
global last_row_number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use the global? usually it's a sign of bad design.
my conclusion is that you should run the 'remove rows from end' bit before the original 'remove_rows' processor - and then this entire thing is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh my stupid head :) that's so obvious )))

 - PR feedback fixes (v2).
 - Tox & travis configs updated (pytest does not support python3.3 anymore).
 #issues:
#224
#226
@roll
Copy link
Member

roll commented Dec 20, 2017

@AcckiyGerman
Thanks. I'll be able to review and merge tomorrow morning.

@roll
Copy link
Member

roll commented Dec 21, 2017

Sorry for the delay. Can't do today. I'm looking forward for a tomorrow merge.

@AcckiyGerman
Copy link
Contributor Author

That's no problem with time, just hope it will be done :)

@roll
Copy link
Member

roll commented Dec 22, 2017

If it's not blocking for today then I prefer to do it on Monday because today is super busy day in OKI)

Copy link
Member

@roll roll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@roll roll merged commit a317561 into frictionlessdata:master Dec 27, 2017
@roll roll removed the {review} label Dec 27, 2017
@roll
Copy link
Member

roll commented Dec 27, 2017

Released as v1.13.0

@AcckiyGerman AcckiyGerman deleted the skip-last-rows branch December 28, 2017 07:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Drop support for py3.3? Skip last rows
3 participants