Pandas rewrite Performance optimizations (#136)
* basic scatterplot experiments

* experiment results with manually binned heatmaps

* experiment result

* incorporated heatmap code into executor and renderer

* additional experiments to evaluate scatter vs. heatmap performance

* experiment based on real estate and airbnb data

* modified general sampling criteria, suppress SettingWithCopyWarning stemming from groupby .agg (#93)

* decrease sampling parameter

* change sampling strategy (above threshold keep 3/4 of data)

* remove experiment dir

* modified performance param

* enforce lux-widget minimum version

* update requirement.txt

* testing out modin (Recursion error)

* create modin executor, all else in sync with master changes

* rewrote .loc with column reference, speed up by 100x

* replace agg("count") with .count() --> ~0.1ms speedup

* run black
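The last two bullets describe swapping one pandas idiom for an equivalent one. The sketch below checks, on a small synthetic DataFrame (the frame, its columns `a`/`b`, and the bin columns are illustrative assumptions, not Lux code), that the old and new forms produce the same values. It does not reproduce the 100x or ~0.1ms figures from the commit message, which depend on data size and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame standing in for vis._vis_data (illustrative data only).
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=2000), "b": rng.normal(size=2000)})

# Bullet ".loc -> column reference": assign a new column through .loc
# (pre-commit style) and by plain column name (post-commit style).
old = df.copy()
old.loc[:, "aBin"] = pd.cut(old["a"], bins=40)
new = df.copy()
new["aBin"] = pd.cut(new["a"], bins=40)
# Compare via the string form so the check does not depend on how a given
# pandas version stores a column created through .loc.
same_bins = old["aBin"].astype(str).equals(new["aBin"].astype(str))

# Bullet 'agg("count") -> .count()': both return per-group non-null counts.
new["bBin"] = pd.cut(new["b"], bins=40)
groups = new.groupby(["aBin", "bBin"], observed=False)["a"]
via_agg = groups.agg("count")
via_count = groups.count()
pd.testing.assert_series_equal(via_agg, via_count)

# Rough timing only; absolute numbers vary by machine and pandas version.
t_agg = timeit.timeit(lambda: groups.agg("count"), number=20)
t_count = timeit.timeit(lambda: groups.count(), number=20)
print(f"agg('count'): {t_agg:.4f}s, count(): {t_count:.4f}s over 20 runs")
```

Since `agg` given a string name dispatches to the method of that name, the switch mainly saves a little dispatch overhead rather than changing the result.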
dorisjlee committed Nov 11, 2020
1 parent 9897d0e commit af0043a
Showing 1 changed file with 11 additions and 21 deletions.
lux/executor/PandasExecutor.py: 32 changes (11 additions, 21 deletions)
@@ -332,15 +332,11 @@ def apply_filter(
     def execute_2D_binning(vis: Vis):
         pd.reset_option("mode.chained_assignment")
         with pd.option_context("mode.chained_assignment", None):
-            x_attr = vis.get_attr_by_channel("x")[0]
-            y_attr = vis.get_attr_by_channel("y")[0]
+            x_attr = vis.get_attr_by_channel("x")[0].attribute
+            y_attr = vis.get_attr_by_channel("y")[0].attribute

-            vis._vis_data.loc[:, "xBin"] = pd.cut(
-                vis._vis_data[x_attr.attribute], bins=40
-            )
-            vis._vis_data.loc[:, "yBin"] = pd.cut(
-                vis._vis_data[y_attr.attribute], bins=40
-            )
+            vis._vis_data["xBin"] = pd.cut(vis._vis_data[x_attr], bins=40)
+            vis._vis_data["yBin"] = pd.cut(vis._vis_data[y_attr], bins=40)

             color_attr = vis.get_attr_by_channel("color")
             if len(color_attr) > 0:
@@ -361,23 +357,17 @@ def execute_2D_binning(vis: Vis):
                 ).reset_index()
                 result = result.dropna()
             else:
-                groups = vis._vis_data.groupby(["xBin", "yBin"])[x_attr.attribute]
-                result = groups.agg("count").reset_index(
-                    name=x_attr.attribute
-                )  # .agg in this line throws SettingWithCopyWarning
-                result = result.rename(columns={x_attr.attribute: "count"})
+                groups = vis._vis_data.groupby(["xBin", "yBin"])[x_attr]
+                result = groups.count().reset_index(name=x_attr)
+                result = result.rename(columns={x_attr: "count"})
                 result = result[result["count"] != 0]

             # convert type to facilitate weighted correlation interestingess calculation
-            result.loc[:, "xBinStart"] = (
-                result["xBin"].apply(lambda x: x.left).astype("float")
-            )
-            result.loc[:, "xBinEnd"] = result["xBin"].apply(lambda x: x.right)
+            result["xBinStart"] = result["xBin"].apply(lambda x: x.left).astype("float")
+            result["xBinEnd"] = result["xBin"].apply(lambda x: x.right)

-            result.loc[:, "yBinStart"] = (
-                result["yBin"].apply(lambda x: x.left).astype("float")
-            )
-            result.loc[:, "yBinEnd"] = result["yBin"].apply(lambda x: x.right)
+            result["yBinStart"] = result["yBin"].apply(lambda x: x.left).astype("float")
+            result["yBinEnd"] = result["yBin"].apply(lambda x: x.right)

             vis._vis_data = result.drop(columns=["xBin", "yBin"])

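To make the post-commit logic of the non-color branch easier to follow outside Lux, here is a standalone sketch that works on a plain DataFrame instead of a `Vis`. The function name `bin_2d_counts` and the `price`/`rating` columns in the usage lines are hypothetical. The explicit `observed=False` reproduces the pre-pandas-3 default for categorical group keys, which is why empty cells show up with count 0 and are filtered out. Unlike the diff, the sketch takes a `.copy()` after filtering (to sidestep `SettingWithCopyWarning` without the option context) and casts the end edges to float as well.

```python
import numpy as np
import pandas as pd


def bin_2d_counts(data: pd.DataFrame, x_attr: str, y_attr: str, bins: int = 40) -> pd.DataFrame:
    """Hypothetical standalone version of the 2D-binning count path."""
    data = data.copy()  # avoid mutating the caller's frame
    # Split each axis into `bins` equal-width intervals (column assignment,
    # as in the post-commit code, rather than .loc[:, col]).
    data["xBin"] = pd.cut(data[x_attr], bins=bins)
    data["yBin"] = pd.cut(data[y_attr], bins=bins)

    # Count rows per (xBin, yBin) cell; observed=False keeps every bin pair,
    # so empty cells come back with count 0.
    groups = data.groupby(["xBin", "yBin"], observed=False)[x_attr]
    result = groups.count().reset_index(name="count")
    result = result[result["count"] != 0].copy()  # drop empty cells

    # Replace Interval objects with numeric bin edges.
    result["xBinStart"] = result["xBin"].apply(lambda iv: iv.left).astype(float)
    result["xBinEnd"] = result["xBin"].apply(lambda iv: iv.right).astype(float)
    result["yBinStart"] = result["yBin"].apply(lambda iv: iv.left).astype(float)
    result["yBinEnd"] = result["yBin"].apply(lambda iv: iv.right).astype(float)
    return result.drop(columns=["xBin", "yBin"])


# Usage on made-up data.
rng = np.random.default_rng(42)
demo = pd.DataFrame(
    {"price": rng.lognormal(size=500), "rating": rng.uniform(1, 5, size=500)}
)
heatmap = bin_2d_counts(demo, "price", "rating")
print(heatmap.head())
```

Every row lands in some bin because `pd.cut` with an integer `bins` widens the outer edge slightly to include the minimum and maximum, so the counts sum to the number of input rows.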