display evset as table in notebook #209
This is very cool :)
max_printed_features = int(os.environ.get("TEMPORIAN_MAX_PRINTED_FEATURES", 10))
max_printed_events = int(os.environ.get("TEMPORIAN_MAX_PRINTED_EVENTS", 20))

# Limits for html display of evsets (notebooks)
I think some of those values might be too large. What about:
TEMPORIAN_MAX_DISPLAY_INDEXES=5
TEMPORIAN_MAX_DISPLAY_EVENTS=20
TEMPORIAN_MAX_DISPLAY_CHARS=32
Agree, changing.
all_index_keys = self.get_index_keys(sort=True)
repr = ""
for index_key in all_index_keys[: config.max_display_indexes]:
repr += "<h3>Index: ("
It would be interesting to find a way to escape HTML characters, check the syntax, and more generally protect against injections.
There seem to be several Python libraries for that. Alternatively, maybe we can use an existing standard-library package (e.g. xml.dom.minidom)?
Alright, I'll take a look into it.
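For reference, a small sketch of the two standard-library options mentioned above: escaping by hand with `html.escape`, or building a DOM with `xml.dom.minidom` so that text nodes are escaped on serialization. The `untrusted` value is a made-up example, not data from the PR.

```python
import html
from xml.dom import minidom

# Made-up value standing in for an index or feature value that contains markup.
untrusted = '<script>alert("x")</script>'

# Option 1: escape manually before concatenating into an HTML string.
escaped = html.escape(untrusted)

# Option 2: build a DOM; the serializer escapes "<", ">" and "&" in text nodes.
dom = minidom.Document()
cell = dom.createElement("td")
cell.appendChild(dom.createTextNode(untrusted))
serialized = cell.toxml()
```

With the DOM approach the markup is also well-formed by construction, which covers the "syntax check" part of the comment.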
temporian/utils/config.py
Outdated
# Features (columns) per table
max_display_features = int(os.environ.get("TEMPORIAN_MAX_DISPLAY_FEATURES", 20))
# Events (rows) per table
max_display_events = int(os.environ.get("TEMPORIAN_MAX_DISPLAY_EVENTS", 100))
Some users might expect that the value None (or some other value) will disable this limit.
OK, I can implement that and print a "not recommended" warning, since removing the limit risks hanging the notebook.
Do you agree?
Sure :)
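One way this could look, as a sketch only: the helper name `parse_limit`, the accepted spelling "none", and the warning text are all assumptions, not what the PR ended up implementing. It relies on the fact that slicing with `None` as the upper bound keeps every element.

```python
import os
import warnings
from typing import Optional


def parse_limit(name: str, default: int) -> Optional[int]:
    """Reads a display limit; the string "none" disables it (hypothetical)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    if raw.strip().lower() == "none":
        warnings.warn(
            f"{name} is disabled. Displaying a large EventSet without a "
            "limit is not recommended and may hang the notebook."
        )
        return None
    return int(raw)


# A limit of None needs no special handling when slicing.
rows = list(range(1000))
max_events = None
visible = rows[:max_events]  # keeps all rows
```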
@@ -587,3 +585,76 @@ def creator(self) -> Optional[Operator]:
created EventSets have a `None` creator.
"""
return self.node()._creator

def _repr_html_(self) -> str:
I see this function growing significantly, especially once we add interactive elements. Could it be moved to a separate file?
def _repr_html_(self) -> str:
"""HTML representation, mainly for IPython notebooks."""
features = self.schema.features[: config.max_display_features]
What if there are fewer than "max_display_features"? (Same for the other indexes.)
That is not a problem in Python: a slice simply stops at the end of the list.
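A quick illustration of that slicing behavior, using made-up feature names:

```python
features = ["f1", "f2", "f3"]
max_display_features = 20

# The upper bound exceeds the list length; the slice is clamped, no error.
visible = features[:max_display_features]
assert visible == ["f1", "f2", "f3"]
```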
index_data = self.data[index_key]
n_events = len(index_data.timestamps)
repr += f"{n_events} events × {n_features} features"
repr += "<table><tr><th><b>Timestamp</b></th>"
Lowercase?
@achoum this is ready to re-review. Changes since your last review:
Awesome.
Left a few comments.
def display_html(evset: EventSet) -> str:
"""HTML representation, mainly for IPython notebooks."""
from xml.dom import minidom
I would put this at the top of the file (unless there is a reason not to).
visible_features = evset.schema.features[:max_features]
index_schemas = evset.schema.indexes
all_index_keys = evset.get_index_keys(sort=True)
total_indexes = len(all_index_keys)
num_indexes ?
all_index_keys = evset.get_index_keys(sort=True)
total_indexes = len(all_index_keys)
total_features = len(evset.schema.features)
hiding_feats = max_features is not None and total_features > max_features
has_hidden_feats or not show_all_feats ?
# Create one table and header per index value
for index_key in all_index_keys[:max_indexes]:
# Index header text (n events x n features)
index_text = ", ".join(
Is the "," useful? (need to check the print to see if this helps)
yes, I'd leave it because it's like a tuple: (idx_1=val_1, idx_2=val_2)
]
)
index_data = evset.data[index_key]
index_n_events = len(index_data.timestamps)
num_timestamps ?
# Table with column names
table = dom.createElement("table")
col_names = ["timestamp"]
Nit: Merge in a single line
col_names = ["timestamp"]
col_names += [feature.name for feature in visible_features]
if hiding_feats:
col_names += ["..."]
What about using a h-ellipsis ? (https://www.toptal.com/designers/htmlarrows/punctuation/horizontal-ellipsis/)
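A sketch of what a header row with a horizontal ellipsis could look like when built with `xml.dom.minidom`. The helper `make_table_row` is hypothetical (the PR's own `create_table_row` is not shown in this thread), and the column names are made up.

```python
from xml.dom import minidom

# U+2026 HORIZONTAL ELLIPSIS: a single character instead of three dots.
ELLIPSIS = "\u2026"


def make_table_row(dom, values, header=False):
    """Builds a <tr> with one <th> or <td> per value (hypothetical helper)."""
    row = dom.createElement("tr")
    for value in values:
        cell = dom.createElement("th" if header else "td")
        cell.appendChild(dom.createTextNode(str(value)))
        row.appendChild(cell)
    return row


dom = minidom.Document()
col_names = ["timestamp", "feature_1", ELLIPSIS]
header_row = make_table_row(dom, col_names, header=True)
header_html = header_row.toxml()
```

Since the ellipsis is inserted as a text node, the literal character is used rather than the `&hellip;` entity.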
table.appendChild(create_table_row(col_names, header=True))

# Rows with events
for i, timestamp in enumerate(index_data.timestamps[:max_events]):
s/i/timestamp_idx
row = []

# Timestamp column
timestamp = (
Redefining timestamp with another type makes the code harder to read.
What about "raw_timestamp" or "timestamp_repr" (or something like that)?
self.assertEqual(
self.evset._repr_html_(),
"<div><h3>Index: (x=1, y=hello)</h3>"
+ "3 events × 2 features"
Since the number of features is the same for all the indexes, maybe we don't need to display it each time.
TODOs
pd.set_option('display.max_columns', ...)
config.py: can be changed through env vars or at runtime, e.g. config.max_display_events = 3 (we should define an object and check types in a following PR).