Make it easier to work with missing data #564
Comments
You should be able to do that today with |
Another method that might be useful for working with missing data is a getOrDefault method on Row.
// Proposed
row.getLongOrDefault("col1", 1L);
It could simplify iterating over a table with missing data.
// Current
for (Row row : table) {
long value1 = row.getLong("col1");
int value2 = row.getInt("col2");
value1 = LongColumn.valueIsMissing(value1) ? 1L : value1;
value2 = IntColumn.valueIsMissing(value2) ? 1 : value2;
// do something with value1 and value2
}
// Proposed
for (Row row : table) {
long value1 = row.getLongOrDefault("col1", 1L);
int value2 = row.getIntOrDefault("col2", 1);
// do something with value1 and value2
}
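A minimal sketch of what the proposed getters might do internally, written against plain arrays rather than Tablesaw's Row so that it runs on its own. It assumes missing longs and ints are encoded as sentinel values (Long.MIN_VALUE and Integer.MIN_VALUE are used here purely for illustration); the class OrDefault and its methods are hypothetical names, not existing API.

```java
/** Hypothetical helper for illustration; not part of Tablesaw. */
final class OrDefault {
    // Assumed sentinel encodings for "missing" (illustration only).
    static final long MISSING_LONG = Long.MIN_VALUE;
    static final int MISSING_INT = Integer.MIN_VALUE;

    /** Returns defaultValue when value is the missing-long sentinel. */
    static long longOrDefault(long value, long defaultValue) {
        return value == MISSING_LONG ? defaultValue : value;
    }

    /** Returns defaultValue when value is the missing-int sentinel. */
    static int intOrDefault(int value, int defaultValue) {
        return value == MISSING_INT ? defaultValue : value;
    }

    public static void main(String[] args) {
        long[] col1 = {10L, MISSING_LONG, 30L};
        int[] col2 = {MISSING_INT, 2, 3};
        for (int r = 0; r < col1.length; r++) {
            // One call per column replaces the get-then-check pair.
            long value1 = longOrDefault(col1[r], 1L);
            int value2 = intOrDefault(col2[r], 1);
            System.out.println(value1 + " " + value2);
        }
    }
}
```

The point of the design is that the missing-value check lives in one place instead of being repeated after every getter call in the loop body.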
I have mixed feelings on whether it's worth having a new method.
On the other hand, there's a reason why they made it easier in pandas. I feel like it's customary in data analysis tools and libraries to provide very good support for missing values. I would not be averse to:
Using set is fine with me, but I would like to point out we do have a fillWith method on DoubleColumn (I am not sure I like this method. map is a good alternative).
// This method was added recently.
@Override
public DoubleColumn fillWith(double d) {
for (int r = 0; r < size(); r++) {
set(r, d);
}
return this;
}
@ryancerf You're right about fillWith. I forgot how it was implemented. I think the intent there was to populate new columns, with every value set by this method. Although it uses set() instead of append, I think the intent is the same. I do think set() is more appropriate for a method that is being selective about what it's updating, where fill suggests more of a bulk/batch process.
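To make the bulk-versus-selective distinction concrete, here is a rough sketch on a plain double[] instead of a DoubleColumn. It assumes missing doubles are represented as NaN; FillSketch and fillMissing are hypothetical names used only for this example.

```java
import java.util.Arrays;

/** Hypothetical illustration; operates on arrays, not Tablesaw columns. */
final class FillSketch {
    /** Bulk fill: overwrites every cell, like the fillWith shown above. */
    static double[] fillWith(double[] column, double d) {
        Arrays.fill(column, d);
        return column;
    }

    /** Selective fill: sets a cell only where the value is missing (NaN assumed). */
    static double[] fillMissing(double[] column, double d) {
        for (int r = 0; r < column.length; r++) {
            if (Double.isNaN(column[r])) {
                column[r] = d;
            }
        }
        return column;
    }

    public static void main(String[] args) {
        double[] data = {1.0, Double.NaN, 3.0};
        // Only the NaN cell is replaced; existing values are kept.
        System.out.println(Arrays.toString(fillMissing(data.clone(), 0.0)));
        // Every cell is replaced.
        System.out.println(Arrays.toString(fillWith(data.clone(), 0.0)));
    }
}
```

The selective version is the one that resembles pandas fillna for a single column; the bulk version discards whatever the column already held.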
Fix for "Make it easier to work with missing data" #564, plus test, plus fix for a bug in DoubleColumn.create()
* Made create method safe for null values in input data
* Made append methods that take Objects handle null by adding the missing value indicator
* Updated DoubleColumn and IntColumn to take advantage of improved append method
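The null handling described in that commit could look roughly like the following. This is a standalone sketch, not the actual change: SketchDoubleColumn, appendObj, and the NaN missing-value indicator are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a double column; not Tablesaw's implementation. */
final class SketchDoubleColumn {
    // Assumed missing-value indicator for doubles.
    static final double MISSING = Double.NaN;

    private final List<Double> data = new ArrayList<>();

    /** Null-safe factory: null entries become the missing-value indicator. */
    static SketchDoubleColumn create(Number[] values) {
        SketchDoubleColumn column = new SketchDoubleColumn();
        for (Number value : values) {
            column.appendObj(value);
        }
        return column;
    }

    /** Appends an arbitrary object, mapping null to MISSING instead of throwing. */
    SketchDoubleColumn appendObj(Object obj) {
        if (obj == null) {
            data.add(MISSING);
        } else if (obj instanceof Number) {
            data.add(((Number) obj).doubleValue());
        } else {
            throw new IllegalArgumentException(
                "Cannot append " + obj.getClass().getName() + " to a double column");
        }
        return this;
    }

    boolean isMissing(int row) {
        return Double.isNaN(data.get(row));
    }

    double get(int row) {
        return data.get(row);
    }

    int size() {
        return data.size();
    }

    public static void main(String[] args) {
        SketchDoubleColumn column = create(new Number[] {1, null, 2.5});
        for (int r = 0; r < column.size(); r++) {
            System.out.println(r + ": " + (column.isMissing(r) ? "missing" : column.get(r)));
        }
    }
}
```

Routing create through the null-safe append keeps the null check in a single method, which matches the commit's note about updating the columns to take advantage of the improved append.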
closing. @ryancerf please lmk if you have issues with the resolution.
Useful for working with missing data. Similar to pandas fillna, but for a single column at a time.
Usage:
I am not sure what the easiest way to do this is right now.
Will send a PR if we think this is worthwhile.