Ndarray UDF Support (ArrayCount, Unnest, etc.) #152

Closed
wants to merge 47 commits into from

@Abdullahshah Abdullahshah commented Apr 29, 2021

This PR adds Ndarray UDF support to EVA.

To add an Ndarray UDF:

  1. Add the CREATE UDF query to udf_bootstrap_queries. The query should give the input specification, the output specification, the type (Ndarray, Classification, etc.), and the location of the file that implements the UDF.
  2. Add the implementation to udfs/ndarray_udfs/. Each UDF there should implement the abstract class in abstract_ndarray_udfs.py, which requires the UDF's exec call to take a Pandas DataFrame as input (ideally only 1-2 columns) and return a single DataFrame with one column (in most cases there is no need to return an ID column).
  3. Add test cases.
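
The contract in step 2 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `AbstractNdarrayUDF`, the column layout (arrays in the first column, search element in the second), and the use of `numpy.count_nonzero` are all assumptions made for the example.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class AbstractNdarrayUDF(ABC):
    """Hypothetical stand-in for the base class in abstract_ndarray_udfs.py."""

    @abstractmethod
    def exec(self, inp: pd.DataFrame) -> pd.DataFrame:
        """Take a DataFrame and return a single one-column DataFrame."""

    def __call__(self, *args, **kwargs):
        # Calling the UDF object simply forwards to exec.
        return self.exec(*args, **kwargs)


class ArrayCount(AbstractNdarrayUDF):
    """Count how often a search element occurs in each row's array."""

    def exec(self, inp: pd.DataFrame) -> pd.DataFrame:
        # Assumed layout: column 0 holds the arrays, column 1 the search element.
        counts = [
            int(np.count_nonzero(np.asarray(arr) == elem))
            for arr, elem in zip(inp.iloc[:, 0], inp.iloc[:, 1])
        ]
        # One output column, no ID column.
        return pd.DataFrame({'count': counts})
```

With this sketch, `ArrayCount()(df)` on a two-column DataFrame returns a DataFrame whose only column is `count`, one row per input row.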

This PR adds the Array_Count and Unnest Ndarray UDFs.
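
For readers unfamiliar with Unnest, the idea is to turn each element of an array-valued cell into its own row. The sketch below uses pandas' built-in `DataFrame.explode` as a stand-in; the PR uses its own helper, which is not reproduced here, and the function name `unnest` and single-column input are assumptions for illustration.

```python
import pandas as pd


def unnest(inp: pd.DataFrame) -> pd.DataFrame:
    """Expand each list-like cell of the first column into its own row."""
    column = inp.columns[0]
    # DataFrame.explode repeats the original index for every emitted row,
    # so reset it to get a clean 0..n-1 index on the result.
    return inp[[column]].explode(column).reset_index(drop=True)
```

A frame with the cells `['car', 'bus']` and `['person']` becomes three rows, one per label.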

Some other changes include:

  • Batches are no longer sorted when their frames are created; sorting now happens only when the user tests batches for equality.
  • Moved all UDF strings to a single bootstrap file, udfs/udf_bootstrap_queries, so that they are organized in one place rather than recreated each time they are needed.
  • Some minor changes to Catalog to allow any type of dimension for Ndarrays
  • Various other minor changes
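
The first change above, deferring the sort to equality testing, could look something like the following. This is a hypothetical, simplified `Batch` class, not the one in EVA, and it assumes that the sort in question puts the DataFrame's columns into a canonical order; the PR description does not say which sort is meant.

```python
import pandas as pd


class Batch:
    """Hypothetical, simplified batch wrapper around a DataFrame of frames."""

    def __init__(self, frames: pd.DataFrame):
        # No sorting on creation: the frames are stored as given.
        self.frames = frames

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Batch):
            return NotImplemented
        cols = sorted(self.frames.columns)
        if cols != sorted(other.frames.columns):
            return False
        # Sort only for the comparison, so column order does not matter.
        return self.frames[cols].equals(other.frames[cols])
```

The design choice is that construction stays cheap and order-preserving, while the cost of normalizing is paid only when two batches are actually compared.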

Still In Progress:

  • Add guides to the EVA documentation
  • Some remaining parser work

https://georgia-tech-db.atlassian.net/wiki/spaces/EVA/overview#Detailed-Roadmap

@Abdullahshah Abdullahshah changed the title from "NDArray UDF Support (ArrayCount, Unnest, etc.)" to "Ndarray UDF Support (ArrayCount, Unnest, etc.)" Apr 29, 2021
@Abdullahshah Abdullahshah marked this pull request as ready for review April 29, 2021 22:14
@Abdullahshah Abdullahshah requested review from xzdandy and gaurav274 and removed request for xzdandy April 29, 2021 22:14
pchunduri6 and others added 7 commits July 7, 2021 15:38
2. remove more from response
3. add license info to interpreter.py
Added support for uploading a video from the client to a custom location on the server.
The syntax is as follows:
```
UPLOAD INFILE 'data/ua_detrac/ua_detrac.mp4' PATH 'test_video.mp4';
LOAD DATA INFILE 'test_video.mp4' INTO MyVideo;
SELECT id, data FROM MyVideo WHERE id < 5;
```

The prefix `/tmp` is added to the location before storing the video on the server.
Appropriate changes have been made to the parser, planner, and executor.

The UPLOAD statement is rewritten by the client as:
`UPLOAD PATH 'test_video.mp4' BLOB b'AAAA......';`

The server _only_ accepts UPLOAD commands in the above syntax. The client is responsible for rewriting the command provided by the end user into the syntax understood by the server.
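
A client-side rewrite along these lines might look like the sketch below. The function name `rewrite_upload`, the regular expression, and the choice of base64 for the blob are assumptions for illustration; the commit message does not state how the client encodes the file.

```python
import base64
import re

# Matches the end-user form: UPLOAD INFILE '<local path>' PATH '<server path>';
_UPLOAD_RE = re.compile(
    r"^\s*UPLOAD\s+INFILE\s+'(?P<src>[^']+)'\s+PATH\s+'(?P<dst>[^']+)'\s*;?\s*$",
    re.IGNORECASE,
)


def rewrite_upload(query: str) -> str:
    """Rewrite a user UPLOAD statement into the server-side form."""
    match = _UPLOAD_RE.match(query)
    if match is None:
        return query  # not an UPLOAD statement, pass it through unchanged
    with open(match.group('src'), 'rb') as video:
        # Assumption: the blob is sent as base64 text.
        blob = base64.b64encode(video.read()).decode('ascii')
    return f"UPLOAD PATH '{match.group('dst')}' BLOB b'{blob}';"
```

Any statement that is not an `UPLOAD INFILE ... PATH ...` command is returned unchanged, so the same function can sit in front of every query the client sends.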

@gaurav274 gaurav274 left a comment

Dropped comments

class Dimension(Enum):
    ANY_DIM = -1


Remove extra spaces

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""user defined functions operating on ndarrays"""

newline missing

"""

    def __call__(self, *args, **kwargs):
        return self.exec(*args, **kwargs)

newline missing

count_result = values.apply(lambda x: self.count_in_row(x[0], search_element), axis=1)

return pd.DataFrame({'count': count_result.values})
# return pd.DataFrame({'id': count_result.index, 'count': count_result.values})

Remove comments

res = self.xplode(inp, explode)
res = res.set_index(dummy_idx)
res.index.name = None
return res

Missing newline

scores NDARRAY FLOAT32(10))
TYPE Classification
IMPL 'src/udfs/fastrcnn_object_detector.py';
"""

Missing newline

@gaurav274 gaurav274 self-assigned this Jul 10, 2021
@gaurav274 gaurav274 mentioned this pull request Jul 11, 2021
4 tasks
@gaurav274

Duplicate #163

@gaurav274 gaurav274 closed this Jul 12, 2021
@gaurav274 gaurav274 deleted the ndarray_udf_functions branch July 24, 2021 23:23
@gaurav274 gaurav274 restored the ndarray_udf_functions branch July 24, 2021 23:23
@gaurav274 gaurav274 deleted the ndarray_udf_functions branch July 24, 2021 23:23
gaurav274 added a commit that referenced this pull request Aug 4, 2021
Basic tutorial for using EVA
- [x] Need to support Unnest to enable useful queries. #152 
- [x] Disable concurrent queries for cursor correctness. #160 @xzdandy 
- [x] Upload fails because of missing permissions. #162 @pchunduri6 
- [x] Better response messages from the server. #158
xzdandy pushed a commit to gaurav274/Eva that referenced this pull request Mar 19, 2022
Basic tutorial for using EVA
- [x] Need to support Unnest to enable useful queries. georgia-tech-db#152 
- [x] Disable concurrent queries for cursor correctness. georgia-tech-db#160 @xzdandy 
- [x] Upload fails because of missing permissions. georgia-tech-db#162 @pchunduri6 
- [x] Better response messages from the server. georgia-tech-db#158