Contain operator and generic ndarray syntax support in EVA #146

xzdandy · 2021-03-01T06:09:46Z

1. Add generic NDARRAY syntax support, of the form NDARRAY array_type(dimensions).
Examples:

CREATE UDF DummyObjectDetector
INPUT  (Frame_Array NDARRAY UINT8(3, 256, 256))
OUTPUT (label NDARRAY STR(10))
TYPE  Classification
IMPL  'test/util.py';

array_type can be one of the following: INT8, UINT8, INT16, INT32, INT64, UNICODE, BOOL, FLOAT32, FLOAT64, DECIMAL, STR, DATETIME.
Note: array_type is enforced when writing with petastorm, so the data must at least be convertible to the declared array_type.
2. Add the containment comparison operators @> and <@.
Examples:

SELECT id,DummyObjectDetector(data) FROM MyVideo
WHERE DummyObjectDetector(data).label <@ ['person', 'bicycle']
ORDER BY id;

Check issue #144 or test/integration_tests/test_udf_executor.py for more example usages
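The array_type note in item 1 can be illustrated with plain NumPy casting. This is only a minimal sketch, assuming the array_type names correspond to NumPy dtypes of the same name; the `NUMPY_DTYPES` table and the `coerce_to_array_type` helper are hypothetical and are not part of EVA or petastorm.

```python
import numpy as np

# Hypothetical subset of an array_type -> NumPy dtype mapping (an assumption,
# not the mapping used in this PR).
NUMPY_DTYPES = {
    "UINT8": np.uint8,
    "INT32": np.int32,
    "FLOAT32": np.float32,
    "BOOL": np.bool_,
}


def coerce_to_array_type(data, array_type):
    """Cast data to the declared array_type.

    Raises ValueError when a value cannot be parsed as the target type,
    e.g. a non-numeric string cast to UINT8. Note that numeric casts such
    as float -> uint8 succeed silently and truncate.
    """
    return np.asarray(data).astype(NUMPY_DTYPES[array_type])
```

Under this sketch, `coerce_to_array_type([1.0, 2.0], "UINT8")` yields a `uint8` array, while `coerce_to_array_type(["person"], "UINT8")` raises `ValueError`, which is the kind of "not convertible" data the note warns about.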

TODO:

Fix broken unit test cases.


gaurav274

Review in progress

src/expression/comparison_expression.py

test/parser/test_parser.py

src/catalog/schema_utils.py

gaurav274 · 2021-03-11T00:24:11Z

src/expression/comparison_expression.py

-                left_values.values != right_values.values))
+            return Batch(pd.DataFrame(lvalues != rvalues))
+        elif self.etype == ExpressionType.COMPARE_CONTAINS:
+            res = [[all(x in p for x in q)


Can we change this to work without for loops? Python loops are typically slow. We should be able to do this with NumPy/pandas internal operations.

Yeah, I was also looking for an existing subarray check in NumPy, but I could not find one.
What we can do is use a set instead of an array. But as far as I know, we cannot avoid the for loop that checks cell by cell.
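The set-based idea from this reply could look roughly like the sketch below. It assumes PostgreSQL-style semantics (`a @> b` means a contains every element of b, `a <@ b` is the reverse) and hashable cell elements; the function names `contains` and `is_contained` are hypothetical and this is not the code merged in the PR. The per-row loop remains, as the reply anticipates, but the inner element scan is replaced by a set operation.

```python
import pandas as pd


def contains(left: pd.Series, right: pd.Series) -> pd.Series:
    """Row-wise `left @> right`: every element of right[i] is in left[i]."""
    return pd.Series(
        [set(r).issubset(l) for l, r in zip(left, right)],
        index=left.index,
    )


def is_contained(left: pd.Series, right: pd.Series) -> pd.Series:
    """Row-wise `left <@ right`: every element of left[i] is in right[i]."""
    return pd.Series(
        [set(l).issubset(r) for l, r in zip(left, right)],
        index=left.index,
    )
```

For example, with per-frame labels `[["person"], ["person", "car"], []]` compared against `["person", "bicycle"]` on every row, `is_contained` marks the first and third rows as matching; the empty label list is trivially contained.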

gaurav274 · 2021-03-11T00:24:24Z

src/expression/comparison_expression.py

+                   for left, right in zip(lvalues, rvalues)]
+            return Batch(pd.DataFrame(res))
+        elif self.etype == ExpressionType.COMPARE_IS_CONTAINED:
+            res = [[all(x in q for x in p)


Same as above.

gaurav274 · 2021-03-11T02:41:07Z

src/catalog/models/df_column.py

@@ -30,6 +30,7 @@ class DataFrameColumn(BaseModel):
    _name = Column('name', String(100))
    _type = Column('type', Enum(ColumnType), default=Enum)
    _is_nullable = Column('is_nullable', Boolean, default=False)
+    _array_type = Column('array_type', Enum(NdArrayType), nullable=True)


We also need to add the array_type parameter to the udfIO.

Added in 0853244.

gaurav274 · 2021-03-11T23:28:20Z

src/catalog/catalog_manager.py

@@ -234,23 +234,6 @@ def get_dataset_metadata(self, database_name: str, dataset_name: str) -> \
        metadata.schema = df_columns
        return metadata

-    def udf_io(


I think we should keep the catalog manager as a single source of access to the catalog models/services. It will help keep things modular.

Added back in 50812db.


gaurav274

LGTM


karan-sarkar and others added 24 commits February 11, 2021 12:28

Added Sample Frequency Parameter to Select Statement.

ca43ff3

Added Sample Keyword to G4 Grammar Files.

a11abb7

Added Sampling Functionality to parser_visitor.py

b4ded9b

Added Sampling Query Test.

444edc6

Added LogicalSample.

782a7d9

Add Sampling Functionality to Statement Converter.

d8ff406

Added sampling to planners and generators.

5d7eec6

Added Sampling to Execution

94753cb

Added Integration Test for Sampling Operator.

42e513a

Added more robust tests for sample executor and modified executor to aggregate batches.

46bd790

every thing missing array support

443668a

enable both test

e63f20e

array support added in parser

8372644

insert support + array primitive datatype

600207e

added insert test-cases

084099d

fix some errors. Array of string not working.

2ed5868

Merge branch 'contain' of github.com:georgia-tech-db/eva into contain

a864127

remove print

9a5a1dd

remove double fix

36055fc

add array_type

c01f585

generic array support and type casting

f0c2643

fix all testcases

6f34547

ndarray type test

32acacb

add testcases

1a0a136

xzdandy marked this pull request as ready for review March 1, 2021 22:58

xzdandy requested a review from gaurav274 March 1, 2021 22:58

gaurav274 requested changes Mar 5, 2021

src/expression/comparison_expression.py

test/parser/test_parser.py

gaurav274 reviewed Mar 5, 2021

src/catalog/schema_utils.py

xzdandy added 2 commits March 5, 2021 10:24

Merge branch 'master' into contain

05c7c43

fix merge errors

2c584c8

gaurav274 reviewed Mar 11, 2021


array type support for udf_io

0853244

gaurav274 reviewed Mar 11, 2021


xzdandy and others added 9 commits March 11, 2021 20:13

revert catalog udf_io

50812db

Table Level Sampling.

84bf88e

merged master

55fead2

sample attached with table sources

2ecd8b0

nested queries parser implementation modified

e4e9bd5

sample operator fixed

a8e0121

Merge branch 'master' of https://github.com/georgia-tech-db/eva into pr148

935ac6b

use constant expression instead of new member variables

7c6551d

fixed load test case

864d290

gaurav274 approved these changes Mar 26, 2021


gaurav274 added 2 commits March 26, 2021 00:33

fixed old test case

10adee7

Merge branch 'pr148' into contain

d62e7f8

gaurav274 merged commit e238bb3 into master Mar 26, 2021

gaurav274 deleted the contain branch March 26, 2021 05:53

xzdandy pushed a commit to gaurav274/Eva that referenced this pull request Mar 19, 2022

Merge pull request georgia-tech-db#146 from georgia-tech-db/contain

68d5e38

Contain operator and generic ndarray syntax support in EVA
