Add a plot function for gains/lift in R and Python #7271
Comments
Erin LeDell commented: I think there’s a standard blue to use in our R plots, but I just guessed a shade of blue to use.
Erin LeDell commented: We can also consider whether we want to let the user show a single plot at a time, maybe by passing an arg, and have the default show both at once. Check out this ticket for more info and discussion: https://github.com//pull/5845
JIRA Issue Details
Jira Issue: PUBDEV-8388
Attachments From Jira
Attachment Name: Screen Shot 2021-10-20 at 5.49.08 PM.png
Attachment Name: Screen Shot 2021-11-15 at 3.31.27 PM.png
Attachment Name: Screen Shot 2021-11-15 at 3.37.06 PM.png
Linked PRs from JIRA
We don't have this in R or Python; however, we do have the plots in Flow.
Here's how you'd have to do it in R:
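{code:r}# gain_table is assumed to hold the model's gains/lift table,
# e.g. gain_table <- h2o.gainsLift(model)
# Draw the cumulative capture rate as a line
plot(gain_table$cumulative_data_fraction,
     gain_table$cumulative_capture_rate,
     type = "l",
     ylim = c(0, 1.5), col = "dodgerblue3",
     xlab = "cumulative data fraction",
     ylab = "cumulative capture rate, cumulative lift",
     main = "Gains/Lift")
# Overlay the cumulative lift on the same axes
lines(gain_table$cumulative_data_fraction,
      gain_table$cumulative_lift, col = "orange"){code}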
Python code:
{code:python}import h2o
from h2o.estimators import H2OGradientBoostingEstimator
from h2o.utils.ext_dependencies import get_matplotlib_pyplot

# Start (or connect to) a local H2O cluster
h2o.init()

# Import the airlines dataset:
airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/airlines_train.csv")

# Build and train the model:
model = H2OGradientBoostingEstimator(ntrees=1, gainslift_bins=20)
model.train(x=["Origin","Distance"], y="IsDepDelayed", training_frame=airlines)

# Pull the gains/lift table and the columns to plot
gl = model.gains_lift()
X = gl['cumulative_data_fraction']
Y = gl['cumulative_capture_rate']
YC = gl['cumulative_lift']

# Plot both curves against the cumulative data fraction
plt = get_matplotlib_pyplot(server=False, raise_if_not_available=True)
plt.figure(figsize=(10,10))
plt.grid(True)
plt.plot(X, Y, zorder=10, label='cumulative capture rate')
plt.plot(X, YC, zorder=10, label='cumulative lift')
plt.legend(loc=4, fancybox=True, framealpha=0.5)
plt.xlim(0, 1)
plt.ylim(0, 1.5)
plt.xlabel('cumulative data fraction')
plt.ylabel('cumulative capture rate, cumulative lift')
plt.title('Gains/Lift')
fig = plt.gcf()
plt.show(){code}
The new functions should look something like:
R:
{code:r}h2o.plot_gainslift(model, xval = TRUE){code}
Python:
{code:python}model.plot_gainslift(xval=True){code}
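To make the proposal a bit more concrete, here is a rough Python sketch of what the plotting logic behind such a function might look like. Everything in it is an assumption for illustration only: the helper names `select_curves` and `plot_gainslift`, the `curves` argument (one possible way to let the user show a single plot instead of both), and the toy table values are all made up and are not part of the h2o API. It takes a plain mapping of column names to lists, such as the columns pulled from `model.gains_lift()`, so it can be tried without a running cluster.

```python
# Hypothetical sketch only; none of these names exist in h2o.
CURVES = {
    "capture_rate": ("cumulative_capture_rate", "cumulative capture rate"),
    "lift": ("cumulative_lift", "cumulative lift"),
}


def select_curves(table, curves=("capture_rate", "lift")):
    """Return (x, [(label, y), ...]) for the requested curves.

    `table` is any mapping from column name to a list of values.
    """
    unknown = [c for c in curves if c not in CURVES]
    if unknown:
        raise ValueError("unknown curve(s): %s" % ", ".join(unknown))
    x = list(table["cumulative_data_fraction"])
    series = [(CURVES[c][1], list(table[CURVES[c][0]])) for c in curves]
    return x, series


def plot_gainslift(table, curves=("capture_rate", "lift"), show=True):
    """Draw the requested curves on one set of axes and return the figure."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    x, series = select_curves(table, curves)
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.grid(True)
    for label, y in series:
        ax.plot(x, y, zorder=10, label=label)
    ax.legend(loc=4, fancybox=True, framealpha=0.5)
    ax.set_xlim(0, 1)
    ax.set_xlabel("cumulative data fraction")
    ax.set_ylabel(", ".join(label for label, _ in series))
    ax.set_title("Gains/Lift")
    if show:
        plt.show()
    return fig


# Toy values, invented for illustration (not output from a real model).
toy = {
    "cumulative_data_fraction": [0.25, 0.5, 0.75, 1.0],
    "cumulative_capture_rate": [0.5, 0.75, 0.9, 1.0],
    "cumulative_lift": [2.0, 1.5, 1.2, 1.0],
}

# Select only the lift curve; plot_gainslift(toy, curves=("lift",)) would draw it.
x, series = select_curves(toy, curves=("lift",))
print(series)
```

This sketch leaves the y-axis limits to matplotlib rather than fixing them at 1.5, since the cumulative lift can exceed that in the first bins. Whether the real function should default to both curves, and how `xval` would pick the metrics to plot, is still open.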
!Screen Shot 2021-10-20 at 5.49.08 PM.png|width=541,height=524!