Adds logic to automatically trim empty columns #912

Draft
wants to merge 1 commit into
base: master
16 changes: 14 additions & 2 deletions static_frame/core/store_xlsx.py
@@ -386,6 +386,7 @@
 store_filter: tp.Optional[StoreFilter] = STORE_FILTER_DEFAULT,
 container_type: tp.Type[TFrameAny] = Frame,
 ) -> tp.Iterator[TFrameAny]:
+from openpyxl.cell.read_only import EMPTY_CELL

 config_map = StoreConfigMap.from_initializer(config)
 wb = self._load_workbook(self._fp)
@@ -418,7 +419,18 @@
 # says that some clients might not report correct dimensions
 ws.calculate_dimension()

-max_column = ws.max_column
+# Possible for a sheet to report many columns with no data.
+# A header cannot have trailing empty cells!
+first_non_empty = 0
+for i, row in enumerate(ws.rows, start=-skip_header):
+    if i < 0:
+        continue
[Codecov / codecov/patch: added line #L427 of static_frame/core/store_xlsx.py was not covered by tests.]
+
+    mask = np.array([cell is not EMPTY_CELL for cell in row], dtype=bool)
+    first_non_empty = int(mask[::-1].argmax())
+    break
+
+max_column = ws.max_column - first_non_empty
 max_row = ws.max_row

 # adjust for downward shift for skipping header, then reduce for footer; at this value and beyond we stop
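
The trailing-empty count in the hunk above relies on reversing a boolean mask and taking `argmax`. The following is a minimal sketch of that idiom in isolation, not the PR's code: `EMPTY` is a hypothetical stand-in for openpyxl's `EMPTY_CELL` sentinel, and `trailing_empty_count` is a hypothetical helper name. It also shows one property that may be worth checking in review: `argmax` of an all-False mask is 0, so a row with no non-empty cells would report zero trailing empties.

```python
import numpy as np

# Hypothetical stand-in for openpyxl's EMPTY_CELL sentinel (assumption for illustration).
EMPTY = object()


def trailing_empty_count(row):
    """Count trailing EMPTY entries using the reversed-mask argmax idiom."""
    # True where the cell holds something other than the sentinel.
    mask = np.array([cell is not EMPTY for cell in row], dtype=bool)
    # In the reversed mask, the index of the first True equals the number
    # of trailing False values in the original order.
    return int(mask[::-1].argmax())


print(trailing_empty_count(["a", "b", EMPTY, EMPTY]))  # 2
print(trailing_empty_count(["a", EMPTY, "b"]))         # 0: interior gaps are ignored
print(trailing_empty_count([EMPTY, EMPTY]))            # 0: all-False mask, argmax is 0
```

Under these assumptions, subtracting the count from the reported column total trims only trailing columns; a fully empty first row would leave the total unchanged.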
@@ -433,7 +445,7 @@
 mask = np.full((last_row_count, max_column), False)

 for row_count, row in enumerate(
-        ws.iter_rows(max_row=max_row), start=-skip_header):
+        ws.iter_rows(max_row=max_row, max_col=max_column), start=-skip_header):
     if row_count < 0:
         continue # due to skip header; preserves comparison to columns_depth
     if row_count >= last_row_count:
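
Both loops in this diff skip header rows by starting `enumerate` at `-skip_header` and ignoring negative counts, so that the first retained row is numbered 0. A minimal sketch of that pattern on plain lists, assuming a hypothetical helper name `rows_after_header` (it is not part of static_frame):

```python
def rows_after_header(rows, skip_header):
    """Yield (row_count, row) pairs, dropping the first `skip_header` rows."""
    for row_count, row in enumerate(rows, start=-skip_header):
        if row_count < 0:
            continue  # still inside the skipped header region
        yield row_count, row


rows = [["title"], ["a", "b"], [1, 2], [3, 4]]
print(list(rows_after_header(rows, skip_header=1)))
# [(0, ['a', 'b']), (1, [1, 2]), (2, [3, 4])]
```

Because the count is already shifted, later comparisons against a row limit can be written directly against the shifted count without subtracting the header offset again.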