pysparkling: adding a column to a data frame does not work when the original frame was parsed in Spark #3904

exalate-issue-sync · 2023-05-22T14:29:34Z

#90702
From Kuba: this looks like the issue: the frame is not re-evaluated after the column is added.

Code to reproduce:

{code:java}

# import csv file
spark_df = sqlContext.read.format('com.databricks.spark.csv').options(header='true', inferschema='true').load('BostonHousing.csv')

# create h2o context
from pysparkling import *
hc = H2OContext.getOrCreate(sc)

# convert the Spark DataFrame to an H2OFrame
boston = hc.as_h2o_frame(spark_df)

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# use every column except the last one as a predictor
predictors = boston.columns[:-1]
response = "medv"
boston_glm2 = H2OGeneralizedLinearEstimator(nfolds=2, Lambda=.01)
boston_glm2.train(x=predictors, y=response, training_frame=boston)

pred = boston_glm2.predict(boston)

# add the prediction as a new column, then convert back to a Spark DataFrame
boston["predict"] = pred['predict']
sp_boston = hc.as_spark_frame(boston)

# inspect the result; per this issue, the added column is not handled correctly
sp_boston
{code}
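
The diagnosis above (the frame is not re-evaluated after the column is added) can be illustrated with a toy model in plain Python. This is only a sketch of the general failure mode, assuming a frame that records column assignments lazily and a converter that reads the materialized state. `LazyFrame`, `refresh` and `to_columns` are hypothetical names, not H2O or Sparkling Water APIs, the data values are made up, and this does not describe the actual fix in the linked PR.

```python
# Toy model only: LazyFrame, refresh and to_columns are hypothetical names,
# not part of H2O or Sparkling Water.

class LazyFrame:
    """A frame that records column assignments and applies them on refresh()."""

    def __init__(self, columns):
        self._materialized = dict(columns)  # state an external converter can see
        self._pending = {}                  # assignments not yet evaluated

    def __setitem__(self, name, values):
        # Record the new column lazily; nothing is materialized yet.
        self._pending[name] = list(values)

    def refresh(self):
        # Force evaluation so pending columns join the materialized state.
        self._materialized.update(self._pending)
        self._pending.clear()

    def materialized_columns(self):
        return list(self._materialized)


def to_columns(frame, force_refresh):
    """Convert using only materialized state, optionally re-evaluating first."""
    if force_refresh:
        frame.refresh()
    return frame.materialized_columns()


frame = LazyFrame({"x": [0.1, 0.2], "medv": [24.0, 21.6]})
frame["predict"] = [23.5, 22.0]

stale = to_columns(frame, force_refresh=False)  # converter sees the old state
fresh = to_columns(frame, force_refresh=True)   # converter sees the new column
print(stale)
print(fresh)
```

In this model the conversion without a refresh misses the `predict` column, while forcing re-evaluation first makes it visible, which is the behaviour one would expect from `as_spark_frame` in the reproduction above.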

exalate-issue-sync · 2023-05-22T14:29:36Z

Jakub Hava commented: Just an update: we worked out the fix for it with [~accountid:557058:389d9607-5bd8-4611-8c6a-755fe9295223]. It doesn't affect the h2o core, only sparkling-water. We're still checking all the consequences; if everything is fine, we could put it into the fix release tomorrow that I mentioned on Slack. What do you think, [~accountid:557058:7e008760-093e-4668-9387-9ca6f3fd2aa7]?

exalate-issue-sync · 2023-05-22T14:29:38Z

Jakub Hava commented: By the way, this is not an h2o issue but a sparkling-water issue. This Jira should be closed and a corresponding new one created in the SW project.

DinukaH2O · 2023-05-23T10:17:31Z

JIRA Issue Migration Info

Jira Issue: SW-430
Assignee: Michal Malohlava
Reporter: Nidhi Mehta
State: Resolved
Fix Version: 2.0.9
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#273

hasithjp · 2023-05-29T13:42:54Z

JIRA Issue Migration Info Cont'd

Jira Issue Created Date: 2017-05-10T15:20:44.101-0700

exalate-issue-sync bot added the CHC label May 22, 2023

DinukaH2O assigned mmalohlava May 23, 2023

DinukaH2O closed this as completed May 23, 2023

DinukaH2O added the fixVersion/2.0.9 label May 23, 2023
