Add missing filter options #331
We shouldn’t change the behavior of the existing I think if you want a query that matches NULL (null), the empty string, and NaN in JavaScript (and maybe also Invalid Date objects? I don’t recall how we represent those), it should probably be an |
Thanks, @mbostock - how would you feel about adding new
case "m": {
  source = source.filter(
    (d) =>
      d[column] == null || d[column] === "" || Number.isNaN(d[column])
  );
  break;
}
case "nm": {
  source = source.filter(
    (d) =>
      d[column] != null && d[column] !== "" && !Number.isNaN(d[column])
  );
  break;
}
If we add those, it would surface two questions:
|
Perhaps a missing
case "m":
  appendSql(` IS NULL`, args);
  appendSql(` = ""`, args); // not sure if we need to use appendOperand before this
  break;
case "nm":
  appendSql(` IS NOT NULL`, args);
  appendSql(` !=""`, args);
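For illustration only: if the two fragments above are concatenated in order, the column would be followed by `IS NULL = ""`, which is unlikely to be the intended expression. One way the two checks might be combined is with OR (and AND for the negation). The helper name `missingPredicate` is hypothetical and is not the library's `appendSql` API; it assumes the column name is already a safely escaped identifier, and it uses single quotes because standard SQL reserves double quotes for identifiers. Dialects differ in how they treat empty strings, so this is a sketch rather than a recommendation.

```javascript
// Hypothetical helper, not part of the library under discussion.
// `column` is assumed to be an already-escaped SQL identifier.
function missingPredicate(column, negate = false) {
  // Standard SQL string literals use single quotes.
  return negate
    ? `(${column} IS NOT NULL AND ${column} <> '')`
    : `(${column} IS NULL OR ${column} = '')`;
}

const missing = missingPredicate("name");
const notMissing = missingPredicate("name", true);
```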
The SQL for the (I’m tempted to use |
That makes sense -- for the sake of conversation (see this notebook in progress) I've added the |
A brief explanation of Only check that the first value is a primitive type: Because we want to support columns that put unmatched values into a "missing" bin, we no longer want to check that the first |
You appear to be removing some of the intentional “guard rails” for the data table cell here; this will allow the data table cell to be used on primitive arrays of mixed types (for example, a mix of Dates and objects, or a mix of booleans and numbers). Why do we need to do this? Is this type of mixed data something we expect to see in practice, and that we want to show in a data table cell? I’m not sure why we’re removing the consistency requirements that we previously implemented and that seem like a good idea (from my perspective).
Important questions! I wanted to write out my thoughts before chatting. Feel free to comment here / in the notebook, or just wait until our meeting. I'd say we're just moving the guard rails somewhere else 😅 . |
Corresponding monorepo PR that uses these filters. |
Here's a notebook with the
|
Let me know when you’re ready for me to take a look.
src/table.js
Outdated
case "boolean":
  return typeof value === colType;
case "number":
  return typeof value === colType && !Number.isNaN(value);
You can simplify here, too, since at this point we already know that value is a number and hence will not require coercion.
Suggested change:
- return typeof value === colType && !Number.isNaN(value);
+ return typeof value === colType && !isNaN(value);
Maybe I'm mistaken, but I thought that isNaN does the coercion, so Number.isNaN is used here to avoid doing coercion unnecessarily.
That’s correct. You can use either here and get the same result. The only difference is that one is shorter to type—that’s all I was saying. You’re welcome to leave it as-is.
Ah, by "simplify", I thought you meant simplify the computation.
Number.isNaN seems "simpler" in terms of the operation (it doesn't do the additional step of coercion).
isNaN is simpler in terms of number of characters 😄
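For reference, the difference discussed in this thread can be seen in a few lines. This is a minimal sketch using only the built-in global `isNaN` and `Number.isNaN`; the sample values and variable names are illustrative and do not come from the PR.

```javascript
// Global isNaN coerces its argument to a number before testing;
// Number.isNaN returns true only for the NaN value itself.
const samples = [NaN, 42, "abc", undefined, "42"];

const results = samples.map((value) => ({
  value,
  coercing: isNaN(value),      // true for NaN, "abc", and undefined
  strict: Number.isNaN(value), // true for NaN only
}));

// Once typeof value === "number" is established, the two checks agree,
// which is why either form works in the reviewed line.
const numbers = samples.filter((value) => typeof value === "number");
const agree = numbers.every((value) => isNaN(value) === Number.isNaN(value));
```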
Functionally, this looks great!
Implementation-wise, I think there’s an easy improvement we can make. This is currently doing the switch (colType) for every value; it’d be faster to restructure the code to have separate functions for each type, e.g.,
function isValidNumber(value) {
  return typeof value === "number" && !Number.isNaN(value);
}
function isValidDate(value) {
  return value instanceof Date && !isNaN(value);
}
and then have a function for getting a validator of the specified column type like so:
function validator(colType) {
  switch (colType) {
    case "number": return isValidNumber;
    case "date": return isValidDate;
    // etc.
    default: throw new Error(`unknown type: ${colType}`); // maybe? see comment below
  }
}
And lastly you could then apply the validator like so:
const isValid = validator(colType);
source = source.filter(d => isValid(d[column]));
That way we only have to check colType once.
We could also throw an error if we see a column type we don’t expect… though according to the DatabaseClient specification, I think we should probably treat an unknown value the same way as we treat "other" rather than throwing an error.
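Putting the pieces of this suggestion together, a minimal runnable sketch might look like the following. The name `getValidator` and the `isValidString`, `isValidBoolean`, and `isValidOther` predicates are assumptions added for illustration (the comment above only spells out the number and date cases), and the default branch follows the suggestion to treat unknown types like "other" rather than throwing.

```javascript
// One predicate per column type; the type is inspected once per column,
// not once per value.
function isValidNumber(value) {
  return typeof value === "number" && !Number.isNaN(value);
}
function isValidDate(value) {
  // An Invalid Date coerces to NaN, so the coercing isNaN catches it.
  return value instanceof Date && !isNaN(value);
}
// The three predicates below are assumptions, not taken from the review.
function isValidString(value) {
  return typeof value === "string";
}
function isValidBoolean(value) {
  return typeof value === "boolean";
}
function isValidOther(value) {
  return value != null;
}

// Hypothetical name. Unknown types fall back to the "other" predicate.
function getValidator(colType) {
  switch (colType) {
    case "number": return isValidNumber;
    case "date": return isValidDate;
    case "string": return isValidString;
    case "boolean": return isValidBoolean;
    default: return isValidOther;
  }
}

// Usage: resolve the validator once, then filter.
const rows = [{x: 1}, {x: NaN}, {x: null}, {x: 3}];
const isValid = getValidator("number");
const valid = rows.filter((d) => isValid(d.x));
```

Resolving the predicate before filtering keeps the per-value work to a single function call, which is the point of the restructuring proposed above.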
Great point -- appreciate you took the time to write that out, will make that change soon and re-request a review! |
LGTM based on the previous discussion in this PR!
Perhaps we should also add some unit testing for getValidator, either in this or a later PR?
Co-authored-by: Annie Zhang <annie@observablehq.com>
This is a small code change that warrants a detailed explanation. For context, the __table function creates JavaScript array filters that mimic SQL filter functions. This PR is concerned with the way we think about NULL values. The SQL code that gets constructed by the Table cell reads:
The corresponding JavaScript code currently reads:
While this makes sense at face value, the concept of null means something different in JavaScript and SQL. In SQL, "A field with a NULL value is a field with no value." (source). In JavaScript, null is a primitive value which may appear in the dataset.
Currently, the way the __table function checks for null values doesn't align with the way the Observable Data Table cell checks for NULL/EMPTY. When computing summaries for the Summary Charts, the Data Table cell uses the following check:
Note, it checks if the value is null or undefined in the same way the standard library currently does, and also includes two additional checks: empty string (value === "") and NaN (Number.isNaN(value)). These checks importantly identify common JavaScript data patterns that are uncommon (or not possible?) in SQL.
A remaining question is: should we make any additional changes to the SQL code to ensure comparable behaviors?
See this notebook for additional context and interactive examples.
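To make the mismatch described above concrete, here is a small sketch that compares a null-only check with the broader check (null, undefined, empty string, NaN). The names `isNull` and `isMissing` and the sample rows are hypothetical; the predicates are written from the prose description, not copied from the library or the Data Table cell.

```javascript
// Null-only check, analogous to SQL's IS NULL.
// Loose equality makes this true for both null and undefined.
const isNull = (value) => value == null;

// Broader check as described: also counts "" and NaN as missing.
const isMissing = (value) =>
  value == null || value === "" || Number.isNaN(value);

const data = [{v: 1}, {v: null}, {v: undefined}, {v: ""}, {v: NaN}, {v: 0}];

const nullCount = data.filter((d) => isNull(d.v)).length;
const missingCount = data.filter((d) => isMissing(d.v)).length;
```

With these sample rows the two counts differ because the empty string and NaN are only caught by the broader check, while 0 is treated as a present value by both.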