Commit

Merge pull request #1005 from mabel-dev/#1002
joocer committed Apr 24, 2023
2 parents e56bc57 + 50f07a0 commit acf1fd6
Showing 13 changed files with 93 additions and 13 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -7,7 +7,7 @@
<h3 align="center">


Opteryx is a SQL Engine designed for embedded and cloud-native environments, and with command-line skills.
Opteryx is an in-process SQL query engine for analysis of distributed datasets.

[Documentation](https://opteryx.dev/latest) |
[Examples](#examples) |
@@ -26,15 +26,15 @@ Opteryx is a SQL Engine designed for embedded and cloud-native environments, and

## What is Opteryx?

Opteryx is a powerful Python library designed for data wrangling and analytics. With Opteryx, users can seamlessly interact with various data platforms, unlocking the full potential of their data.
Opteryx is a Python library designed for data wrangling and analytics. With Opteryx, users can seamlessly interact with various data platforms, unlocking the full potential of their data.

Opteryx offers the following features:

- SQL queries on data files generated by other processes, such as logs.
- A command-line tool for filtering, transforming, and combining files in a flexible and intuitive manner.
- Embeddable as a low-cost engine, allowing for hundreds of analysts to leverage ad hoc databases with ease.
- Integration with familiar tools like pandas and Polars.
- Unified access to data on disk, in the Cloud and in on-prem databases, not only through the same interface, but in the same query.
- Unified and federated access to data on disk, in the Cloud and in on-prem databases, not only through the same interface, but in the same query.

## Why Use Opteryx?

@@ -68,7 +68,7 @@ Opteryx is Open Source Python, it quickly and easily integrates into Python code

### __Time Travel__

Designed for data analytics in environments where decisions need to be replayable, Opteryx allows you to query data as at a point in time in the past to replay decision algorithms against facts as they were known in the past. _(data must be structured to enable temporal queries)_
Designed for data analytics in environments where decisions need to be replayable, Opteryx allows you to query data as at a point in time in the past to replay decision algorithms against facts as they were known in the past. You can even self-join tables with historic data, great for finding deltas in datasets over time. _(data must be structured to enable temporal queries)_

### __Fast__

7 changes: 4 additions & 3 deletions opteryx/__main__.py
@@ -56,15 +56,16 @@ def main(
print(f"Opteryx version {opteryx.__version__}")
print(" Enter '.help' for usage hints")
print(" Enter '.exit' to exit this program")
print()

# Start the REPL loop
while True: # pragma: no cover
# Prompt the user for a SQL statement
print()
statement = input('opteryx> ')

# If the user entered "quit", exit the loop
if statement == '.exit':
# If the user entered "exit", exit the loop
# forgive them for 'quit'
if statement in {'.exit', '.quit'}:
break
if statement == ".help":
print(" .exit Exit this program")
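The dot-command handling changed in this hunk can be sketched in isolation. This is a minimal illustration, not the code in `opteryx/__main__.py`: the names `EXIT_COMMANDS`, `classify_statement` and `run_repl` are hypothetical, and only the `.exit`, `.quit` and `.help` strings come from the diff.

```python
# Hypothetical helper names; only the command strings are taken from the diff.
EXIT_COMMANDS = frozenset({".exit", ".quit"})


def classify_statement(statement: str) -> str:
    """Return 'exit', 'help' or 'sql' for one line of REPL input."""
    # Set membership lets '.quit' be accepted as an alias for '.exit'.
    if statement in EXIT_COMMANDS:
        return "exit"
    if statement == ".help":
        return "help"
    return "sql"


def run_repl(lines):
    """Consume input lines until an exit command; return the SQL statements seen."""
    statements = []
    for line in lines:
        kind = classify_statement(line)
        if kind == "exit":
            break
        if kind == "sql":
            statements.append(line)
    return statements
```

Feeding the loop a list of strings instead of calling `input()` keeps the dispatch logic testable without a terminal.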
15 changes: 15 additions & 0 deletions opteryx/command.py
@@ -0,0 +1,15 @@
#!/usr/bin/env python

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from opteryx.__main__ import main
17 changes: 15 additions & 2 deletions opteryx/connection.py
@@ -20,6 +20,7 @@
import typing
from uuid import uuid4

import pyarrow
from orso import DataFrame
from orso import converters

@@ -109,7 +110,7 @@ def id(self):
        """The unique internal reference for this query"""
        return self._qid

    def execute(self, operation, params=None):
    def _inner_execute(self, operation, params=None):
        if not operation:
            raise MissingSqlStatement("SQL statement not found")

@@ -145,9 +146,21 @@ def execute(self, operation, params=None):
        results = self._query_planner.execute(self._plan)

        if results is not None:
            self._rows, self._schema = converters.from_arrow(utils.arrow.rename_columns(results))
            return utils.arrow.rename_columns(results)

    def execute(self, operation, params=None):
        results = self._inner_execute(operation, params)
        if results is not None:
            self._rows, self._schema = converters.from_arrow(results)
            self._cursor = iter(self._rows)

    def execute_to_arrow(self, operation, params=None, limit=None):
        results = self._inner_execute(operation, params)
        if results is not None:
            if limit is not None:
                return utils.arrow.limit_records(results, limit)
            return pyarrow.concat_tables(results, promote=True)
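
The refactor in this hunk moves the shared work into one private method and wraps it in two thin public methods, one filling the cursor and one returning the data directly. The following is a rough sketch of that shape using plain Python lists in place of Arrow tables. `SketchCursor` and `execute_to_list` are hypothetical stand-ins, and nothing here calls Opteryx, orso or pyarrow.

```python
from itertools import islice


class SketchCursor:
    """Stand-in cursor: 'batches' plays the role of the planner's output."""

    def __init__(self, batches):
        self._batches = batches
        self._rows = None
        self._cursor = None

    def _inner_execute(self, operation):
        # Shared validation and execution, used by both public methods.
        if not operation:
            raise ValueError("SQL statement not found")
        return iter(self._batches)

    def execute(self, operation):
        # Materialise rows on the cursor, as a DB-API style execute would.
        results = self._inner_execute(operation)
        if results is not None:
            self._rows = [row for batch in results for row in batch]
            self._cursor = iter(self._rows)

    def execute_to_list(self, operation, limit=None):
        # Return the data directly, stopping early when a limit is given.
        results = self._inner_execute(operation)
        if results is not None:
            rows = (row for batch in results for row in batch)
            if limit is not None:
                return list(islice(rows, limit))
            return list(rows)
```

With this split, the direct-return path never has to populate the cursor's row state, which appears to be the point of `execute_to_arrow` in the diff.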

@property
def stats(self):
"""execution statistics"""
2 changes: 1 addition & 1 deletion opteryx/utils/__init__.py
@@ -24,7 +24,7 @@ def hasher(vals):
This is roughly 2x faster than the previous implementation for lists of strings.
Do note though, if you're micro-optimizing, this is faster to create but is
slower for some Python functions to handle, like 'sorted'.
slower for some Python functions to handle the result of, like 'sorted'.
"""
if numpy.issubdtype(vals.dtype, numpy.character):
return numpy.array([CityHash64(s.encode()) for s in vals], numpy.uint64)
2 changes: 1 addition & 1 deletion opteryx/version.py
@@ -17,4 +17,4 @@
"""

# __version__ = "0.4.0-alpha.6"
__version__ = "0.10.0-alpha.5"
__version__ = "0.10.0-alpha.6"
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,7 +1,7 @@
hadrodb
numpy
orjson
orso>=0.0.57
orso>=0.0.61
pyarrow>=11.0.0
typer

Binary file added testdata/duckdb/planets.duckdb
Binary file not shown.
3 changes: 2 additions & 1 deletion tests/misc/test_cli.py
@@ -7,7 +7,7 @@

sys.path.insert(1, os.path.join(sys.path[0], "../.."))

from opteryx.__main__ import main
from opteryx.command import main


def test_basic_cli():
@@ -16,6 +16,7 @@ def test_basic_cli():
    main(sql="SELECT * FROM $planets;", o="temp.csv")
    main(sql="SELECT * FROM $planets;", o="temp.jsonl")
    main(sql="SELECT * FROM $planets;", o="temp.parquet")
    main(sql="SELECT * FROM $planets;", o="temp.md")


if __name__ == "__main__": # pragma: no cover
14 changes: 14 additions & 0 deletions tests/misc/test_connection_arrow.py
@@ -33,8 +33,22 @@ def test_as_arrow_with_limit():
    assert len(table.column_names) == 20


def test_direct_as_arrow_no_limit():
    import opteryx

    conn = opteryx.connect()
    cur = conn.cursor()
    table = cur.execute_to_arrow("SELECT * FROM $planets")

    assert "name" in table.column_names
    assert table.num_rows == 9
    assert len(table.column_names) == 20
    assert cur.stats["rows_read"] == 9, cur.stats


if __name__ == "__main__":  # pragma: no cover
    test_as_arrow_no_limit()
    test_as_arrow_with_limit()
    test_direct_as_arrow_no_limit()

    print("✅ okay")
2 changes: 2 additions & 0 deletions tests/requirements.txt
@@ -25,5 +25,7 @@ sqlalchemy
pymysql
psycopg2-binary
polars
duckdb
duckdb-engine

setuptools_rust
2 changes: 2 additions & 0 deletions tests/requirements_arm.txt
@@ -18,5 +18,7 @@ firebase-admin
sqlalchemy
pymysql
psycopg2-binary
duckdb
duckdb-engine

setuptools_rust
32 changes: 32 additions & 0 deletions tests/storage/test_sql_duckdb.py
@@ -0,0 +1,32 @@
"""
Test we can read from DuckDB - this is a basic exercise of the SQL Connector
"""
import os
import sys

sys.path.insert(1, os.path.join(sys.path[0], "../.."))

import opteryx

from opteryx.connectors import SqlConnector


def test_duckdb_storage():
    opteryx.register_store(
        "duckdb",
        SqlConnector,
        remove_prefix=True,
        connection="duckdb:///testdata/duckdb/planets.duckdb",
    )

    results = opteryx.query("SELECT * FROM duckdb.planets")
    assert results.rowcount == 9, results.rowcount

    # PROCESS THE DATA IN SOME WAY
    results = opteryx.query("SELECT COUNT(*) FROM duckdb.planets;")
    assert results.rowcount == 1, results.rowcount


if __name__ == "__main__":  # pragma: no cover
    test_duckdb_storage()
    print("✅ okay")
