Ndarray UDF Support (ArrayCount, Unnest, etc.) #152
1. Use the right SQL URI
2. Remove more from response
3. Add license info to interpreter.py
Added support for uploading a video from the client into a custom location on the server. The syntax is as follows:

```
UPLOAD INFILE 'data/ua_detrac/ua_detrac.mp4' PATH 'test_video.mp4';
LOAD DATA INFILE 'test_video.mp4' INTO MyVideo;
SELECT id, data FROM MyVideo WHERE id < 5;
```

The prefix `/tmp` is added to the location before storing the video on the server. Appropriate changes have been made to the parser, planner, and executor.

The UPLOAD statement is rewritten by the client as: `UPLOAD PATH 'test_video.mp4' BLOB b'AAAA......';` The server _only_ accepts upload commands of the above syntax. The client is responsible for rewriting the command provided by the end user into a syntax understood by the server.
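A minimal sketch of what the client-side rewrite described above might look like. The function name `rewrite_upload`, the regular expression, and the use of base64 for the `BLOB` payload are all assumptions made for illustration; the commit message only shows that the payload is sent as a bytes literal.

```python
import base64
import re
from pathlib import Path

# Hypothetical pattern for the end-user form of the statement:
#   UPLOAD INFILE '<local file>' PATH '<server path>';
_UPLOAD_RE = re.compile(
    r"^\s*UPLOAD\s+INFILE\s+'(?P<src>[^']+)'\s+PATH\s+'(?P<dst>[^']+)'\s*;?\s*$",
    re.IGNORECASE,
)


def rewrite_upload(statement: str) -> str:
    """Rewrite `UPLOAD INFILE ... PATH ...` into `UPLOAD PATH ... BLOB ...`.

    Statements that are not in the end-user UPLOAD form are returned unchanged.
    """
    match = _UPLOAD_RE.match(statement)
    if match is None:
        return statement

    src = match.group("src")
    dst = match.group("dst")

    # Read the local file on the client side.
    payload = Path(src).read_bytes()

    # Assumption: the blob is base64-encoded before being embedded.
    encoded = base64.b64encode(payload)

    # repr() of a bytes object yields the b'...' literal form.
    return f"UPLOAD PATH '{dst}' BLOB {encoded!r};"
```

With this shape, the server never sees a client-local file path; it only receives the destination path and the file contents.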
Dropped comments
class Dimension(Enum):
    ANY_DIM = -1
Remove extra spaces
src/udfs/ndarray_udfs/__init__.py
Outdated
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""user defined functions operating on ndarrays"""
newline missing
    """

    def __call__(self, *args, **kwargs):
        return self.exec(*args, **kwargs)
newline missing
src/udfs/ndarray_udfs/array_count.py
Outdated
count_result = values.apply(lambda x: self.count_in_row(x[0], search_element), axis=1)

return pd.DataFrame({'count': count_result.values})
# return pd.DataFrame({'id': count_result.index, 'count': count_result.values})
Remove comments
src/udfs/ndarray_udfs/unnest.py
Outdated
res = self.xplode(inp, explode)
res = res.set_index(dummy_idx)
res.index.name = None
return res
Missing newline
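For readers unfamiliar with the operation under review, here is a rough sketch of an unnest over a DataFrame column of arrays, built on pandas' `DataFrame.explode`. It is not the PR's `xplode` helper; the function name `unnest` and the choice to reset the index are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def unnest(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Emit one output row per element of the array stored in `column`."""
    # Convert each ndarray to a plain list so explode treats it as list-like.
    as_lists = frame.assign(**{column: frame[column].map(lambda arr: list(arr))})

    # explode repeats the other columns once per element of `column`.
    exploded = as_lists.explode(column)

    # Drop the duplicated index values left behind by explode.
    return exploded.reset_index(drop=True)


frame = pd.DataFrame(
    {"id": [0, 1], "labels": [np.array([1, 2]), np.array([3])]}
)
result = unnest(frame, "labels")
```

Here the row with `id` 0 contributes two output rows and the row with `id` 1 contributes one.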
src/udfs/udf_bootstrap_queries.py
Outdated
scores NDARRAY FLOAT32(10))
TYPE Classification
IMPL 'src/udfs/fastrcnn_object_detector.py';
"""
Missing newline
Duplicate #163
Basic tutorial for using EVA
- [x] Need to support Unnest to enable useful queries. georgia-tech-db#152
- [x] Disable concurrent queries for cursor correctness. georgia-tech-db#160 @xzdandy
- [x] Upload fails because of missing permissions. georgia-tech-db#162 @pchunduri6
- [x] Better response messages from the server. georgia-tech-db#158
This PR adds Ndarray UDF support to EVA.

In order to add an Ndarray UDF:
- Add a `CREATE UDF` query to `udf_bootstrap_queries`. This query should have the input specifications, output specifications, type (Ndarray/Classification/etc.), and the file location where the UDF is implemented.
- Add the implementation under `udfs/ndarray_udfs/`. Each UDF implementation here should implement the class in `abstract_ndarray_udfs.py`, which requires that the UDF's `exec` call take a Pandas DataFrame as input (ideally only 1-2 columns) and return a single DataFrame with one column (no need to return an `ID` column in most cases).

This PR adds the `Array_Count` and `Unnest` Ndarray UDFs.

Some other changes include:
- `udfs/udf_bootstrap_queries`: this makes it easier to organize all the UDF strings in one file rather than recreating them each time.
- `Catalog`: support for adding any type of dimension for Ndarrays.

Still In Progress:
https://georgia-tech-db.atlassian.net/wiki/spaces/EVA/overview#Detailed-Roadmap
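The `exec` contract described in this PR (a Pandas DataFrame in, a single one-column DataFrame out) could be sketched roughly as follows. The class names `AbstractNdarrayUDF` and `ArrayCountSketch` are hypothetical stand-ins, and the input layout (arrays in the first column, the element to search for in the second) is an assumption; the actual classes in `abstract_ndarray_udfs.py` and `array_count.py` may differ.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class AbstractNdarrayUDF(ABC):
    """Hypothetical stand-in for the class in abstract_ndarray_udfs.py."""

    @abstractmethod
    def exec(self, inp: pd.DataFrame) -> pd.DataFrame:
        """Take a DataFrame and return a single one-column DataFrame."""

    def __call__(self, *args, **kwargs):
        # Calling the UDF object simply delegates to exec.
        return self.exec(*args, **kwargs)


class ArrayCountSketch(AbstractNdarrayUDF):
    """Counts how often a search element occurs in each row's array."""

    def exec(self, inp: pd.DataFrame) -> pd.DataFrame:
        # Assumed layout: column 0 holds the arrays, column 1 the element to count.
        arrays = inp.iloc[:, 0]
        targets = inp.iloc[:, 1]
        counts = [
            int(np.count_nonzero(np.asarray(arr) == target))
            for arr, target in zip(arrays, targets)
        ]
        # One output column and no ID column, as the description recommends.
        return pd.DataFrame({"count": counts})


udf = ArrayCountSketch()
inp = pd.DataFrame(
    {"labels": [np.array([1, 2, 1]), np.array([3])], "target": [1, 1]}
)
out = udf(inp)
```

Keeping the output to a single column lets the executor attach it to the originating rows by position, which is presumably why an `ID` column is usually unnecessary.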