Pandas rewrite Performance optimizations #136

Merged: 24 commits merged on Nov 11, 2020
24 commits
1db0846 (dorisjlee, Sep 14, 2020): basic scatterplot experiments
d26f378 (dorisjlee, Sep 22, 2020): Merge remote-tracking branch 'upstream/master'
4e8d9e7 (dorisjlee, Sep 23, 2020): experiment results with manually binned heatmaps
15adb77 (dorisjlee, Sep 24, 2020): experiment result
48633d4 (dorisjlee, Sep 24, 2020): incorporated heatmap code into executor and renderer
dcc8ece (dorisjlee, Sep 24, 2020): additional experiments to evaluate scatter v.s. heatmap performance
b7e0f60 (dorisjlee, Sep 25, 2020): experiment based on real estate and airbnb data
de07933 (dorisjlee, Sep 25, 2020): modified general sampling criteria, suppress SettingWithCopyWarning s…
3d59c5f (dorisjlee, Sep 25, 2020): decrease sampling parameter
8ed7010 (dorisjlee, Sep 25, 2020): change sampling strategy (above threshold keep 3/4 of data)
520885c (dorisjlee, Sep 25, 2020): remove experiment dir
3de5bb7 (dorisjlee, Sep 25, 2020): modified performance param
7532b80 (dorisjlee, Sep 25, 2020): merge
60bbed8 (dorisjlee, Oct 16, 2020): Merge remote-tracking branch 'upstream/master'
137abf5 (dorisjlee, Oct 16, 2020): enforce lux-widget minimum version
33bc23a (dorisjlee, Oct 16, 2020): update requirement.txt
9138a5a (dorisjlee, Oct 25, 2020): merged upstream/master
3545b50 (dorisjlee, Oct 25, 2020): Merge remote-tracking branch 'upstream/master'
593e03e (dorisjlee, Oct 25, 2020): testing out modin (Recursion error)
098e7f3 (dorisjlee, Nov 6, 2020): merged fixed conflict in PandasExecutor (revert back to master type d…
3ec1193 (dorisjlee, Nov 6, 2020): create modin executor, all else in sync with master changes
b6a7dd6 (dorisjlee, Nov 11, 2020): rewrote .loc with column reference, speed up by 100x
08d167e (dorisjlee, Nov 11, 2020): replace agg("count") with .count() --> ~0.1ms speedup
665be02 (dorisjlee, Nov 11, 2020): run black
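Commit b6a7dd6 reports a 100x speedup from replacing `.loc[:, col] = ...` with plain `df[col] = ...` assignment. The ratio is likely to vary with the pandas version and the data size, so here is a rough sketch for checking it locally with the standard `timeit` module. The synthetic 50,000-row frame and the helpers `assign_with_loc` / `assign_with_brackets` are hypothetical, not part of Lux, and no particular timing result is implied.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data; not taken from the PR.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=50_000)})


def assign_with_loc() -> pd.DataFrame:
    """Old style: create the bin column through a full-column .loc slice."""
    out = df.copy()
    out.loc[:, "xBin"] = pd.cut(out["x"], bins=40)
    return out


def assign_with_brackets() -> pd.DataFrame:
    """New style: create the bin column with plain column assignment."""
    out = df.copy()
    out["xBin"] = pd.cut(out["x"], bins=40)
    return out


# Both helpers pay the same copy and pd.cut cost; they differ only in how the column is assigned.
loc_seconds = timeit.timeit(assign_with_loc, number=10)
bracket_seconds = timeit.timeit(assign_with_brackets, number=10)
print(f".loc[:, col]: {loc_seconds:.3f}s  df[col]: {bracket_seconds:.3f}s")
```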
lux/executor/PandasExecutor.py: 32 changes (11 additions, 21 deletions)
@@ -332,15 +332,11 @@ def apply_filter(
     def execute_2D_binning(vis: Vis):
         pd.reset_option("mode.chained_assignment")
         with pd.option_context("mode.chained_assignment", None):
-            x_attr = vis.get_attr_by_channel("x")[0]
-            y_attr = vis.get_attr_by_channel("y")[0]
+            x_attr = vis.get_attr_by_channel("x")[0].attribute
+            y_attr = vis.get_attr_by_channel("y")[0].attribute

-            vis._vis_data.loc[:, "xBin"] = pd.cut(
-                vis._vis_data[x_attr.attribute], bins=40
-            )
-            vis._vis_data.loc[:, "yBin"] = pd.cut(
-                vis._vis_data[y_attr.attribute], bins=40
-            )
+            vis._vis_data["xBin"] = pd.cut(vis._vis_data[x_attr], bins=40)
+            vis._vis_data["yBin"] = pd.cut(vis._vis_data[y_attr], bins=40)

             color_attr = vis.get_attr_by_channel("color")
             if len(color_attr) > 0:
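The first hunk now resolves `x_attr` and `y_attr` to column-name strings up front (via `.attribute`) and bins both columns with `pd.cut(..., bins=40)` through plain column assignment. A minimal standalone sketch of that step, assuming an ordinary DataFrame in place of `vis._vis_data`; the helper `add_2d_bins` and the `price`/`area` columns are hypothetical.

```python
import numpy as np
import pandas as pd


def add_2d_bins(
    df: pd.DataFrame, x_attr: str, y_attr: str, bins: int = 40
) -> pd.DataFrame:
    """Return a copy of df with equal-width xBin/yBin interval columns (hypothetical helper)."""
    out = df.copy()
    # x_attr / y_attr are plain column names, like the `.attribute` strings in the new code.
    out["xBin"] = pd.cut(out[x_attr], bins=bins)
    out["yBin"] = pd.cut(out[y_attr], bins=bins)
    return out


rng = np.random.default_rng(0)
data = pd.DataFrame(
    {"price": rng.normal(100, 15, 1_000), "area": rng.normal(50, 10, 1_000)}
)
binned = add_2d_bins(data, "price", "area")
print(binned[["price", "xBin"]].head())
```

Because the helper assigns into a fresh `.copy()`, pandas has no chained-assignment case to warn about; that is one alternative to the executor's approach of wrapping the assignments in `pd.option_context("mode.chained_assignment", None)`.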
@@ -361,23 +357,17 @@ def execute_2D_binning(vis: Vis):
                 ).reset_index()
                 result = result.dropna()
             else:
-                groups = vis._vis_data.groupby(["xBin", "yBin"])[x_attr.attribute]
-                result = groups.agg("count").reset_index(
-                    name=x_attr.attribute
-                )  # .agg in this line throws SettingWithCopyWarning
-                result = result.rename(columns={x_attr.attribute: "count"})
+                groups = vis._vis_data.groupby(["xBin", "yBin"])[x_attr]
+                result = groups.count().reset_index(name=x_attr)
+                result = result.rename(columns={x_attr: "count"})
                 result = result[result["count"] != 0]

             # convert type to facilitate weighted correlation interestingess calculation
-            result.loc[:, "xBinStart"] = (
-                result["xBin"].apply(lambda x: x.left).astype("float")
-            )
-            result.loc[:, "xBinEnd"] = result["xBin"].apply(lambda x: x.right)
+            result["xBinStart"] = result["xBin"].apply(lambda x: x.left).astype("float")
+            result["xBinEnd"] = result["xBin"].apply(lambda x: x.right)

-            result.loc[:, "yBinStart"] = (
-                result["yBin"].apply(lambda x: x.left).astype("float")
-            )
-            result.loc[:, "yBinEnd"] = result["yBin"].apply(lambda x: x.right)
+            result["yBinStart"] = result["yBin"].apply(lambda x: x.left).astype("float")
+            result["yBinEnd"] = result["yBin"].apply(lambda x: x.right)

             vis._vis_data = result.drop(columns=["xBin", "yBin"])

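In the second hunk, `groups.agg("count")` becomes `groups.count()`. For a string argument such as `"count"`, `SeriesGroupBy.agg` dispatches to the method of the same name, so both produce the same per-cell counts. The sketch below reproduces that counting step on a hypothetical five-row frame, using explicit bin edges instead of `bins=40` so the cells are easy to read. It passes `observed=False` explicitly, which matches the default pandas had in 2020: every (xBin, yBin) pair appears, and empty cells get a count of 0 until the `!= 0` filter removes them.

```python
import pandas as pd

# Hypothetical toy data standing in for vis._vis_data.
df = pd.DataFrame({"x": [1.0, 1.5, 8.0, 9.0, 9.5], "y": [2.0, 2.5, 7.0, 8.0, 8.5]})
edges = [0.0, 5.0, 10.0]  # two bins per axis: (0, 5] and (5, 10]
df["xBin"] = pd.cut(df["x"], bins=edges)
df["yBin"] = pd.cut(df["y"], bins=edges)

# observed=False keeps every (xBin, yBin) pair, so empty cells get a count of 0.
groups = df.groupby(["xBin", "yBin"], observed=False)["x"]
result = groups.count().reset_index(name="count")

# Drop the empty cells, as the executor does.
result = result[result["count"] != 0]
print(result)
```

Passing `name="count"` to `reset_index` yields the same `xBin`, `yBin`, `count` columns that the diff gets from `reset_index(name=x_attr)` followed by `rename`.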
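The rest of the second hunk turns each interval label into numeric bounds: every `xBin`/`yBin` cell holds a `pd.Interval`, whose `.left` and `.right` attributes are the bin edges. A small sketch of that conversion on a hypothetical two-row table shaped like the grouped `result` in the diff; unlike the diff, it also casts the `...BinEnd` columns to float so all four bound columns share one dtype.

```python
import pandas as pd

# Hypothetical grouped table: one row per non-empty (xBin, yBin) cell.
edges = [0.0, 5.0, 10.0]
result = pd.DataFrame(
    {
        "xBin": pd.cut(pd.Series([1.0, 8.0]), bins=edges),
        "yBin": pd.cut(pd.Series([2.0, 7.0]), bins=edges),
        "count": [2, 3],
    }
)

# Each bin cell is a pd.Interval; expose its edges as plain float columns.
for axis in ("x", "y"):
    bins = result[f"{axis}Bin"]
    result[f"{axis}BinStart"] = bins.apply(lambda iv: iv.left).astype("float")
    result[f"{axis}BinEnd"] = bins.apply(lambda iv: iv.right).astype("float")

# The interval columns are no longer needed once the bounds are extracted.
result = result.drop(columns=["xBin", "yBin"])
print(result)
```

In the executor, `result` is a boolean-filtered slice of the grouped table, so these column assignments can trigger `SettingWithCopyWarning`, which the surrounding `pd.option_context` block suppresses; calling `.copy()` right after the `!= 0` filter would avoid the warning instead.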