Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

display evset as table in notebook #209

Merged
merged 15 commits into from Aug 7, 2023
Merged

display evset as table in notebook #209

merged 15 commits into from Aug 7, 2023

Conversation

ianspektor
Copy link
Collaborator

@ianspektor ianspektor commented Jul 27, 2023

TODOs

  • set max number of features, max number of timestamps, and max number of indexes, max number of chars in a cell
  • allow configuring those ☝🏼 in some way (like pd.set_option('display.max_columns', ...))
    • (Braulio) Moved options to config.py, can be changed through env vars or at runtime: config.max_display_events = 3 (we should define an object and check types in a following PR).
  • test
image

@DonBraulio DonBraulio requested a review from achoum July 28, 2023 16:44
Copy link
Collaborator

@achoum achoum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool :)

max_printed_features = int(os.environ.get("TEMPORIAN_MAX_PRINTED_FEATURES", 10))
max_printed_events = int(os.environ.get("TEMPORIAN_MAX_PRINTED_EVENTS", 20))

# Limits for html display of evsets (notebooks)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of those values might be too large. What about:

TEMPORIAN_MAX_DISPLAY_INDEXES=5
TEMPORIAN_MAX_DISPLAY_EVENTS=20
TEMPORIAN_MAX_DISPLAY_CHARS=32

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, changing.

all_index_keys = self.get_index_keys(sort=True)
repr = ""
for index_key in all_index_keys[: config.max_display_indexes]:
repr += "<h3>Index: ("
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would interesting to find a way to escape html characters, syntax check, and more generally, protect from injections.
There seems to be several python library for that. Alternatively, maybe we can use existing python package (e.g. xml.dom.minidom ) ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'll take a look into it.

# Features (columns) per table
max_display_features = int(os.environ.get("TEMPORIAN_MAX_DISPLAY_FEATURES", 20))
# Events (rows) per table
max_display_events = int(os.environ.get("TEMPORIAN_MAX_DISPLAY_EVENTS", 100))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some users might expect that the value None (or some other value) will disable this limit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can implement that and print a not-recommended warning, since it's a risky move to hang the notebook.
Do you agree?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure :)

@@ -587,3 +585,76 @@ def creator(self) -> Optional[Operator]:
created EventSets have a `None` creator.
"""
return self.node()._creator

def _repr_html_(self) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this function are growing significantly, especially when we add interactive elements. Could it be moved in a separate file ?


def _repr_html_(self) -> str:
"""HTML representation, mainly for IPython notebooks."""
features = self.schema.features[: config.max_display_features]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is there is less than "max_display_features" ? (same for the other indexes)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not a problem in python, it will stop at the list length.

index_data = self.data[index_key]
n_events = len(index_data.timestamps)
repr += f"{n_events} events × {n_features} features"
repr += "<table><tr><th><b>Timestamp</b></th>"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lowercase?

@DonBraulio
Copy link
Collaborator

@achoum this is ready to re-review. Changes since your last review:

  • Moved display methods to separate file
  • Using xml.dom.minidom instead of raw html, checked that it prevents html injection
  • Setting limit=0 will disable limit (show all index or feats, events).

@DonBraulio DonBraulio requested a review from achoum August 3, 2023 23:26
Copy link
Collaborator

@achoum achoum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome.
Left a few comments.


def display_html(evset: EventSet) -> str:
"""HTML representation, mainly for IPython notebooks."""
from xml.dom import minidom
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this at the top of the file (unless there is reason no to).

visible_features = evset.schema.features[:max_features]
index_schemas = evset.schema.indexes
all_index_keys = evset.get_index_keys(sort=True)
total_indexes = len(all_index_keys)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_indexes ?

all_index_keys = evset.get_index_keys(sort=True)
total_indexes = len(all_index_keys)
total_features = len(evset.schema.features)
hiding_feats = max_features is not None and total_features > max_features
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_hidden_feats or not show_all_feats ?

# Create one table and header per index value
for index_key in all_index_keys[:max_indexes]:
# Index header text (n events x n features)
index_text = ", ".join(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "," useful? (need to check the print to see if this helps)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'd leave it because it's like a tuple: (idx_1=val_1, idx_2=val_2)

]
)
index_data = evset.data[index_key]
index_n_events = len(index_data.timestamps)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_timestamps ?


# Table with column names
table = dom.createElement("table")
col_names = ["timestamp"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Merge in a single line

col_names = ["timestamp"]
col_names += [feature.name for feature in visible_features]
if hiding_feats:
col_names += ["..."]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table.appendChild(create_table_row(col_names, header=True))

# Rows with events
for i, timestamp in enumerate(index_data.timestamps[:max_events]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/i/timestamp_idx

row = []

# Timestamp column
timestamp = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redefining timestamp with another type make reading the code harder.
What about "raw_timestamp" or "timestamp_repr" (or something like that)?

self.assertEqual(
self.evset._repr_html_(),
"<div><h3>Index: (x=1, y=hello)</h3>"
+ "3 events × 2 features"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the number of features is the same for all the indexes, maybe we don't need to display it each time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Though changing this required adding text common to the whole eventset.
Took the chance and included number of indexes and memory usage.
Final result (setting small max_feats and indexes on purpose, only for this demo):
Screen Shot 2023-08-07 at 13 13 24

@DonBraulio DonBraulio merged commit 82adde4 into main Aug 7, 2023
15 checks passed
@DonBraulio DonBraulio deleted the evset-table-display branch August 7, 2023 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants