What happened + What you expected to happen
The Python DB-API 2.0 executemany method may run synchronously or asynchronously; modules are free to implement it either way. The example code uses sqlite3 and does not hit the issue, because sqlite3 implements executemany synchronously. However, in PyHive, another widely used library, all connections implement executemany asynchronously (lazy execution). When a PyHive connection is used in write_sql, the query is queued but never executed.
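For contrast, here is a minimal sketch of the synchronous case using the standard-library sqlite3 module. The in-memory database and the table layout are illustrative (borrowed from the reproduction script in this issue); the point is only that the rows are visible as soon as executemany returns.

```python
import sqlite3

# Illustrative in-memory database; not part of the original report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, year INTEGER, score REAL)")

rows = [
    ("Monty Python and the Holy Grail", 1975, 8.2),
    ("And Now for Something Completely Different", 1971, 7.5),
]

# sqlite3 runs every INSERT before executemany() returns.
cursor = conn.cursor()
cursor.executemany("INSERT INTO movie VALUES (?, ?, ?)", rows)
conn.commit()

# The rows can be read back immediately, with no extra fetch on the write cursor.
inserted = conn.execute("SELECT COUNT(*) FROM movie").fetchone()[0]
print(inserted)
```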
write_sql is expected to behave like read_sql, which calls execute() followed by fetchall() to ensure that the query actually runs.
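The difference can be modeled without a Presto or Hive server. The sketch below is an assumption-laden illustration, not PyHive's or Ray's actual code: LazyCursor is a hypothetical stand-in for a driver that only queues statements until results are fetched, and write_rows is a hypothetical helper showing the execute() plus fetchall() pattern described above.

```python
class LazyCursor:
    """Hypothetical model of a cursor with lazy execution.

    execute() only queues the statement; nothing reaches the "table"
    until fetchall() is called. This is NOT PyHive's implementation.
    """

    def __init__(self, table):
        self._table = table    # plain list standing in for the target table
        self._pending = []     # parameters of statements queued but not run

    def execute(self, operation, parameters=None):
        self._pending.append(parameters)

    def executemany(self, operation, seq_of_parameters):
        for parameters in seq_of_parameters:
            self.execute(operation, parameters)

    def fetchall(self):
        # Fetching forces every queued statement to run.
        self._table.extend(self._pending)
        self._pending.clear()
        return []


def write_rows(cursor, operation, rows):
    """Hypothetical helper: insert row by row, fetching to force execution."""
    for row in rows:
        cursor.execute(operation, row)
        cursor.fetchall()


rows = [("Monty Python and the Holy Grail", 1975, 8.2)]
sql = "INSERT INTO movie VALUES (%s, %s, %s)"

# executemany alone: the statement stays queued and the table stays empty.
table_lazy = []
LazyCursor(table_lazy).executemany(sql, rows)

# execute + fetchall: the statement is actually applied.
table_forced = []
write_rows(LazyCursor(table_forced), sql, rows)

print(len(table_lazy), len(table_forced))
```

One caveat for any real fix along these lines: some DB-API drivers may raise an error when fetchall() is called after a statement that produces no result set, so the fetch would likely need to be guarded.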
Versions / Dependencies
ray == 2.24.0
Reproduction script
import ray
from pyhive import presto


def create_connection():
    return presto.connect('localhost')


def create_hive_table_using_presto_connection(conn):
    cursor = conn.cursor()
    sql = """
    CREATE TABLE IF NOT EXISTS movie (
        title varchar,
        year bigint,
        score double
    )
    WITH (FORMAT = 'parquet')
    """
    cursor.execute(sql)
    print(cursor.fetchall())


def write_data_into_hive_table_using_ray():
    dataset = ray.data.from_items([
        {"title": "Monty Python and the Holy Grail", "year": 1975, "score": 8.2},
        {"title": "And Now for Something Completely Different", "year": 1971, "score": 7.5}
    ])
    dataset.write_sql(
        "INSERT INTO movie VALUES(%s, %d, %f)", create_connection
    )


def read_data_from_hive_table_using_ray():
    dataset = ray.data.read_sql(
        "SELECT * FROM movie", create_connection
    )
    dataset.show()


if __name__ == '__main__':
    ray.init()
    create_hive_table_using_presto_connection(create_connection())
    write_data_into_hive_table_using_ray()
    read_data_from_hive_table_using_ray()
Issue Severity
High: It blocks me from completing my task.
voe09 added the labels bug (something that is supposed to be working, but isn't) and triage (needs triage, e.g. priority, bug/not-bug, and owning component) on Jun 20, 2024.
scottjlee added the label P1 (issue that should be fixed within a few weeks) and removed the label triage on Jun 26, 2024.