This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

FR: Select single fields from substring search or split results #747

Closed
creachadair opened this issue Apr 9, 2019 · 5 comments · Fixed by #748

Comments

@creachadair

I would like to be able to split a string and select one of the fields. The SPLIT function returns a JSON array, so I can't use that directly. There is a SUBSTRING function, but there does not appear to be any way to find the offset of a partition (what MySQL calls LOCATE or POSITION).

One way to address this would be to give SPLIT an optional third parameter, so that

mysql> SELECT SPLIT('a,b,c', ',');
'["a","b","c"]'
mysql> SELECT SPLIT('a,b,c', ',', 2);
'b'

Another would be to add a LOCATE (or FIND or POSITION or INDEX) function to find the offset of a subfield, e.g.,

mysql> SELECT LOCATE('a,b,c', ',');
2
mysql> SELECT SUBSTRING('a,b,c', 0, LOCATE('a,b,c', ',') - 1);
'a'

The first is probably better, since LOCATE would probably want a "skip" parameter anyway to specify which of the matches to stop on. But the second would also work.
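To make the two proposals above concrete, here is a minimal Python sketch of the semantics they seem to imply. The names `split_field` and `locate` are hypothetical and used only for illustration; they are not gitbase or MySQL functions. The 1-based numbering is inferred from the examples, and `skip` models the "skip" parameter mentioned above.

```python
from typing import List, Optional, Union


def split_field(s: str, sep: str, field: Optional[int] = None) -> Union[List[str], Optional[str]]:
    """Model of the proposed SPLIT(s, sep[, field]).

    Without `field`, return every part. With `field` (assumed 1-based, as in
    the example), return that single part, or None if it is out of range.
    """
    parts = s.split(sep)
    if field is None:
        return parts
    if 1 <= field <= len(parts):
        return parts[field - 1]
    return None


def locate(s: str, sep: str, skip: int = 0) -> int:
    """Model of the proposed LOCATE(s, sep) with a hypothetical `skip` count.

    Return the 1-based offset of the match found after skipping `skip`
    earlier matches, or 0 if there is no such match.
    """
    start = 0
    idx = -1
    for _ in range(skip + 1):
        idx = s.find(sep, start)
        if idx == -1:
            return 0
        start = idx + len(sep)
    return idx + 1


print(split_field("a,b,c", ","))     # ['a', 'b', 'c']
print(split_field("a,b,c", ",", 2))  # b
print(locate("a,b,c", ","))          # 2
print(locate("a,b,c", ",", skip=1))  # 4
```

The sketch assumes a non-empty separator. It also shows why the first option is simpler to use: selecting a field with `locate` needs one call per boundary plus offset arithmetic.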

@smola
Collaborator

smola commented Apr 10, 2019

We could implement SUBSTRING_INDEX, which is supported both in MySQL and Spark SQL.

It would be:

mysql> SELECT SUBSTRING_INDEX('a,b,c', ',', 1);
'a'

For the second field, it is quite verbose but still possible:

mysql> SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('a,b,c', ',', 2), ',', -1);
'b'
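As a rough illustration of why the nested call works, here is a simplified Python model of SUBSTRING_INDEX as MySQL documents it: a positive count keeps everything left of the count-th delimiter, a negative count keeps everything right of the count-th delimiter counted from the end. The helper `nth_field` is a hypothetical name for the nesting shown above. Edge cases such as overlapping delimiters are not modelled.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Simplified model of SUBSTRING_INDEX(s, delim, count)."""
    # Assumption: a zero count or an empty delimiter yields an empty string.
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        # Keep the first `count` fields; the whole string if there are fewer.
        return delim.join(parts[:count])
    # Negative count: keep the last `-count` fields.
    return delim.join(parts[count:])


def nth_field(s: str, delim: str, n: int) -> str:
    """Hypothetical helper: n-th field (1-based) via the nested call."""
    return substring_index(substring_index(s, delim, n), delim, -1)


print(substring_index("a,b,c", ",", 1))   # a
print(substring_index("a,b,c", ",", 2))   # a,b
print(substring_index("a,b,c", ",", -1))  # c
print(nth_field("a,b,c", ",", 2))         # b
```

The inner call truncates the string after the wanted field, and the outer call with -1 keeps only the last remaining field.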

@ajnavarro
Contributor

ajnavarro commented Apr 10, 2019

@creachadair Right now you can use JSON_EXTRACT to get the value of a specific position:

mysql> select JSON_EXTRACT(split("a,b,c",","), "$[0]");
+-------------------------------------------+
| JSON_EXTRACT(split("a,b,c", ","), "$[0]") |
+-------------------------------------------+
| "a"                                       |
+-------------------------------------------+
1 row in set (0.01 sec)

@smola With this already available, is it worth implementing SUBSTRING_INDEX?

@smola
Collaborator

smola commented Apr 10, 2019

@ajnavarro Since we have JSON_EXTRACT, I see no urgency to implement SUBSTRING_INDEX, but we should still consider it when it comes to aligning a subset of functions between gitbase and Spark SQL, that is, the functions we want to be able to push down from Spark SQL.

@creachadair
Author

I have run into another case where JSON_EXTRACT can be a little limiting. Suppose I would like to identify branch points, that is to say, commits that have multiple children. This is not that interesting in a single repo, but is a useful basis for finding places where forks happened and diverged across multiple repos.

The obvious way to do this is to do an autojoin on commits, but commit_parents is multi-valued. You can filter by length with ARRAY_LENGTH, but JSON_EXTRACT projects the literal string value including its quotes, so you can't join on it without doing extra string surgery.

I could write a program, but it seems unfortunate that we don't have a good way to get properly-typed values out of these denormalized fields. Arguably this is better solved with a schema change, but that's probably a lot more work (and maybe not worth it—this problem is mostly a demonstration case for me, not a blocking work item).

Anyway, just another datum on that while I'm working with it.
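For reference, the "write a program" fallback could look roughly like the Python sketch below. It assumes the rows have already been fetched as pairs of a commit hash and the commit_parents value as JSON array text. The function name `branch_points`, the row shape, and the sample hashes are all made up for illustration.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def branch_points(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return parents with more than one child, mapped to their children.

    Each row is (commit_hash, commit_parents), where commit_parents is
    assumed to be JSON array text such as '["abc", "def"]'.
    """
    children = defaultdict(set)
    for commit_hash, parents_json in rows:
        # json.loads yields plain strings, so no quotes are left to strip.
        for parent in json.loads(parents_json):
            children[parent].add(commit_hash)
    return {p: sorted(c) for p, c in children.items() if len(c) > 1}


# Hypothetical history: c2 and c3 both branch from c1, and c4 merges them.
rows = [
    ("c1", "[]"),
    ("c2", '["c1"]'),
    ("c3", '["c1"]'),
    ("c4", '["c2", "c3"]'),
]
print(branch_points(rows))  # {'c1': ['c2', 'c3']}
```

Decoding the array in the client avoids the quoting problem, but it moves the join out of SQL, which is the limitation described above.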

@ajnavarro
Contributor

Maybe by implementing JSON inline path selectors (https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html#operator_json-inline-path), or just JSON_UNQUOTE to begin with, we can fix that problem and many others.
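A small Python model of the difference, for illustration only: `json_extract_index` and `json_unquote` are hypothetical stand-ins that mimic the documented MySQL behaviour of JSON_EXTRACT with a `$[n]` path and of JSON_UNQUOTE. They are not gitbase code.

```python
import json


def json_extract_index(doc: str, index: int) -> str:
    """Model of JSON_EXTRACT(doc, '$[index]'): the result is JSON text."""
    return json.dumps(json.loads(doc)[index])


def json_unquote(value: str) -> str:
    """Model of JSON_UNQUOTE: decode a quoted JSON string, else pass through."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return json.loads(value)
    return value


extracted = json_extract_index('["a","b","c"]', 0)
print(extracted)                 # "a"
print(extracted == "a")          # False
print(json_unquote(extracted))   # a
```

The extracted value keeps its surrounding quotes, so comparing it with a plain column value fails until it is unquoted.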

@kuba-- kuba-- self-assigned this Jun 6, 2019
@kuba-- kuba-- transferred this issue from src-d/gitbase Jun 6, 2019
@kuba-- kuba-- added the feature label Jun 6, 2019
4 participants