fixes nans in dask df engine (#2020)
Co-authored-by: Geoffrey Angus <geoffrey@predibase.com>
geoffreyangus and geoffreyangus authored May 11, 2022
1 parent fabeee7 commit 2c3fe2e
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions ludwig/data/dataframe/dask.py
@@ -50,9 +50,9 @@ def df_like(self, df: dd.DataFrame, proc_cols: Dict[str, dd.Series]):
         # we need to drop it immediately following creation.
         dataset = df.index.to_frame(name=TMP_COLUMN).drop(columns=[TMP_COLUMN])
         # TODO: address if following results in fragmented DataFrame
-        col_names, cols = zip(*proc_cols.items())
-        dataset = dd.concat([dataset] + list(cols), axis=1)
-        dataset.columns = col_names
+        for col_name, col in proc_cols.items():
+            col.name = col_name
+            dataset = dataset.join(col, how="inner")  # inner join handles Series with dropped rows
         return dataset

     def parallelize(self, data):
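
The difference between the removed and added lines can be illustrated with a small sketch. It uses pandas rather than Dask so that it runs without a cluster, on the assumption that Dask's `concat` and `join` follow the same index-alignment rules here. The column names and values are hypothetical and do not come from Ludwig. Concatenating along `axis=1` aligns on the index with an outer join by default, so a processed column that has dropped rows leaves NaNs behind, whereas an inner join keeps only the rows present in every column.

```python
import pandas as pd

# Hypothetical stand-ins for the index-only frame and the processed columns.
index = pd.RangeIndex(4)
base = pd.DataFrame(index=index)                   # frame with an index but no columns
proc_cols = {
    "a": pd.Series([1.0, 2.0, 3.0, 4.0], index=index),
    "b": pd.Series([10.0, 30.0], index=[0, 2]),    # rows 1 and 3 were dropped
}

# Old approach: concat aligns on the index with an outer join by default,
# so the rows missing from "b" come back as NaN.
concatenated = pd.concat([base] + list(proc_cols.values()), axis=1)
concatenated.columns = list(proc_cols.keys())

# New approach: join one named Series at a time with how="inner",
# keeping only the index labels that every column shares.
joined = base
for col_name, col in proc_cols.items():
    joined = joined.join(col.rename(col_name), how="inner")

print(concatenated)
print(joined)
```

In this sketch `concatenated` keeps all four rows with missing values in column `b`, while `joined` keeps only the rows with labels 0 and 2.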