Strange error in the Travis logs:
tabulator/stream.py
Outdated
    for row in extended_rows:
        yield row
else:
    buffer_size = abs(min(rows_to_skip)) + 1
This is a bit confusing - the buffer size is actually abs(min(rows_to_skip)) (without the +1).
Agree, will fix.
tabulator/stream.py
Outdated
# use buffer to save last rows
for row in extended_rows:
    buffer.append(row)
    if len(buffer) == buffer_size:
if len(buffer) > buffer_size
Agree.
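A minimal sketch of the buffering idea discussed above, not the code from the PR. The helper name `hold_back` is hypothetical. It assumes a buffer size of `abs(min(rows_to_skip))` and releases a row only when `len(buffer) > buffer_size`, so exactly that many rows stay behind at the end of the stream:

```python
from collections import deque


def hold_back(rows, buffer_size):
    """Yield every row except the last `buffer_size` ones (hypothetical helper)."""
    buffer = deque()
    for row in rows:
        buffer.append(row)
        # Release the oldest row only once the buffer holds MORE than
        # `buffer_size` rows, so `buffer_size` rows are always held back.
        if len(buffer) > buffer_size:
            yield buffer.popleft()


rows_to_skip = [-1, -3]
buffer_size = abs(min(rows_to_skip))  # 3, without the extra +1
print(list(hold_back(range(1, 8), buffer_size)))  # [1, 2, 3, 4]
```

With seven input rows and a buffer of three, rows 5 to 7 stay in the buffer, which is enough to resolve any negative index down to -3.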
tabulator/stream.py
Outdated
# Now squeeze out the buffer
last_row_number = buffer[len(buffer)-1][0]
# with last_row_number, we could transform negative row numbers to positive
rows_to_skip_positive = [last_row_number + 1 + n for n in rows_to_skip]
A bit convoluted...
Why not something like:
    n = len(buffer)
    for i, row in enumerate(buffer):
        if i-n not in rows_to_skip:
            yield row
I took 'last_row_number' from extended_row[0] because the 'extended_rows' are numbered before any rows are deleted. That way I avoid double deletions when two arguments point to the same row (the concern you raise in the next comment).
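To illustrate the reply above, here is a hedged sketch of the row-number translation. It assumes each extended row is a tuple whose first element is the original row number (the diff reads `buffer[...][0]`); the tuple layout `(row_number, headers, row)` and the helper name `drop_tail_rows` are assumptions for illustration only:

```python
def drop_tail_rows(buffer, rows_to_skip):
    """Drop buffered rows addressed by negative numbers (hypothetical helper)."""
    if not buffer:
        return []
    # Original number of the last buffered row, assigned before any deletions.
    last_row_number = buffer[-1][0]
    # Translate negative numbers to original positive row numbers: -1 -> last row.
    skip_positive = {last_row_number + 1 + n for n in rows_to_skip if n < 0}
    return [item for item in buffer if item[0] not in skip_positive]


# Row 6 of a 7-row file was already removed by its positive number.
buffer = [(5, None, ["e"]), (7, None, ["g"])]
print(drop_tail_rows(buffer, [6, -2]))  # [(5, None, ['e']), (7, None, ['g'])]
```

Here -2 maps to row 6, which is already gone, so nothing else is deleted; an index-based test on the same buffer would remove row 5 instead. The translation is only correct while the true last row is still in the buffer.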
    assert stream.read() == [['id', 'name'], ['1', 'english']]


def test_stream_skip_rows_no_double_skip():
Since skip rows runs before skip negative numbers, it's possible that 'skip_rows' will remove the last line, and then 'skip_negative_numbers' wouldn't know which line is the last line...
Yes, you are right, that does happen :( Fixing...
Update from @akariv: Pytest no longer supports Python 2.6 and 3.3
@AcckiyGerman Could you please also address issue #226 in this PR? Just remove Python 3.3 from tox/travis and other places, so we will be able to have a green build.
tabulator/stream.py
Outdated
# Apply processors to iterator
- processors = [builtin_processor] + self.__post_parse
+ processors = [builtin_processor, skip_negative_rows] + self.__post_parse
I think we should add this processor only when there is an actual need to skip rows at the end of the file. Could you please use a simple condition here? It will save some CPU ticks, and fewer processors are easier to debug. In 99% of cases there are no negative skip_rows values.
fixed
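One way the requested condition could look, as a sketch only. The function `build_processors` and its parameters are hypothetical stand-ins; in the PR the list is built inline from `builtin_processor`, `skip_negative_rows` and `self.__post_parse`:

```python
def build_processors(skip_rows_by_numbers, builtin_processor,
                     skip_negative_rows, post_parse):
    """Assemble the processor chain (hypothetical helper, not the PR code)."""
    processors = [builtin_processor]
    # Add the buffering processor only when a negative row number is present.
    if any(n < 0 for n in skip_rows_by_numbers):
        processors.append(skip_negative_rows)
    return processors + list(post_parse)


# Placeholder processors that pass rows through unchanged.
def builtin_processor(extended_rows):
    return extended_rows


def skip_negative_rows(extended_rows):
    return extended_rows


print(len(build_processors([1, 2], builtin_processor, skip_negative_rows, [])))   # 1
print(len(build_processors([1, -1], builtin_processor, skip_negative_rows, [])))  # 2
```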
tabulator/stream.py
Outdated
# last row counter will be incremented in builtin_processor()
# and used in skip_negative_rows() to count rows from the end
last_row_number = 0
rows_to_skip_from_end = [n for n in self.__skip_rows_by_numbers if n < 0]
why create a list here if you're not using it?
tabulator/stream.py
Outdated
yield buffer.popleft()

# Now squeeze out the buffer
global last_row_number
why use the global? usually it's a sign of bad design.
my conclusion is that you should run the 'remove rows from end' bit before the original 'remove_rows' processor - and then this entire thing is not needed.
oh my stupid head :) that's so obvious )))
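A small sketch of why the order matters, under stated assumptions: the generator names `number_rows`, `skip_from_end` and `skip_by_number` are hypothetical, and extended rows are modelled as `(row_number, headers, row)` tuples. Running the from-the-end step first means it still sees the true last rows, so no global counter is needed:

```python
from collections import deque


def number_rows(rows):
    """Attach 1-based row numbers, mimicking extended rows (assumed layout)."""
    for number, row in enumerate(rows, start=1):
        yield (number, None, row)


def skip_from_end(extended_rows, negative_numbers):
    """Drop rows addressed from the end, e.g. -1 for the last row."""
    size = abs(min(negative_numbers))
    buffer = deque()
    for item in extended_rows:
        buffer.append(item)
        if len(buffer) > size:
            yield buffer.popleft()
    n = len(buffer)
    for i, item in enumerate(buffer):
        if i - n not in negative_numbers:
            yield item


def skip_by_number(extended_rows, positive_numbers):
    """Drop rows by their original 1-based row number."""
    for item in extended_rows:
        if item[0] not in positive_numbers:
            yield item


rows = ["a", "b", "c", "d", "e"]  # skip_rows = [5, -2] should leave a, b, c

end_first = skip_by_number(skip_from_end(number_rows(rows), [-2]), [5])
print([item[2] for item in end_first])     # ['a', 'b', 'c']

number_first = skip_from_end(skip_by_number(number_rows(rows), [5]), [-2])
print([item[2] for item in number_first])  # ['a', 'b', 'd']
```

In the second ordering row 5 is removed first, so -2 is counted from the wrong last row and row 3 is dropped instead of row 4.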
@AcckiyGerman
Sorry for the delay. I can't do it today. I'm looking forward to merging tomorrow.
Time is no problem, I just hope it gets done :)
If it's not blocking for today, I'd prefer to do it on Monday because today is a super busy day at OKI)
👍
Released as |
Added tests and code, and updated the README.
Details
I added one more built-in processor to Stream (tabulator/stream.py, line 448). It keeps a buffer of rows so that it can delete rows counted from the end.