result filter/map capability #104

juhlig · 2019-01-28T15:21:49Z

This PR adds the capability to pass a one-ary or two-ary function along with query and execute calls, to be used somewhat like lists:filtermap/2, as discussed in Issue #94.

Result rows will be passed to this function during fetching. The output of the function determines if the row is included or excluded from the result (filter capability), and may at the same time transform the row into something else (mapping capability). The function may also use side effects, eg send the row to another process, write to a file, insert into ets tables, etc. An important point is that rows are fetched and decoded only as fast as the function handles them (may be fast if it does some simple filtering or mapping or asynchronously hands them off to another process, or slow if it needs to do complicated things or stuff that is slow by nature), so it can also work as a throttling mechanism to prevent flooding memory with data that is needed only temporarily (ie, not all at once).

To achieve this behavior, it was necessary to change the response fetching strategy (mysql_protocol:fetch_response etc). Before, it was like this:

Fetch all row packets of a result set
Decode the rows
Repeat for the next result set until there are none left in the response

Now, it has been changed to:

Fetch 1 row packet of a result set
Decode this row packet to a row
Pass the decoded row to the filtermap function, and depending on the return value (mimicking lists:filtermap/2 behavior):
- if true, keep the row (to return later as the return of the query or execute call) as is
- if false, drop the row
- if {true, Value}, keep Value instead of the original row (to return later)
Repeat for next row until there are no more row packets in the result set
Repeat for the next result set until there are none left in the response

Each response row is fetched and decoded in one operation, in contrast to first fetching all rows then and decoding all rows.

The query/execute and their result handling functions are provided with an additional parameter to allow for an (optional) filtermap function to be passed in.

Provides new and augments existing query and execute functions to accept a filter/map function to be applied to result rows.

zuiderkwast

Great work!

First of all, I'm missing a test case or two, to cover each new line of code. :-) It should be fairly easy to write.

I've been trying hard in the past to keep line length to 80, so I appreciate if this can be kept. If the indentation is too deep, it's an indication to break things out to smaller functions.

Sorry if I'm repeating myself in the other comments. Things came to my mind in a somewhat random order.

src/mysql_conn.erl

zuiderkwast · 2019-01-30T17:34:11Z

src/mysql.erl

    when Conn :: connection(),
         Query :: iodata(),
         Timeout :: timeout(),
         Params :: [term()],
+         FilterMap :: query_filtermap(),


Here, FilterMap is cannot be undefined, so I think we shouldn't have undefined as an allowed value in the query_filtermap() type.

Well, it can, the second clause to allows undefined, a one-ary or two-ary function. See my next comment in conjunction with this.

src/mysql.erl

zuiderkwast · 2019-01-30T17:52:17Z

src/mysql.erl

+             Timeout :: timeout(),
+             Params :: [term()],
+             Result :: query_result();
+    (Conn, Query, FilterMap, Timeout) -> Result


I like this multi-clause spec! Good job!

I just noticed that this does not work with edoc... edoc will only generate documentation for the first clause :\

Bummer... the guys on #erlang say that's somewhat normal as edoc just doesn't seem to support the full extend of what -specs offer :( I opened an issue at bugs.erlang.org for this, but I don't think much will come of it anytime soon as it's not a pressing issue.

As it stands, I see no alternative other than weakening the spec to:

-spec query(Conn, Query, Params | FilterMap, Timeout | FilterMap) -> Result when Conn :: connection(), Query :: iodata(), Timeout :: default_timeout \| timeout(), Params :: no_params \| [term()], FilterMap :: no_filtermap_fun \| query_filtermap_fun(), Result :: query_result().

which I am reluctant to do, since it is in fact incorrect as it suggests you can pass in two filtermap functions.

Keep the good spec. It's more important than EDoc. I would rather drop EDoc if it's not good enough, but that can be done later. (What I do like about EDoc though is that you get the functions and types from the code for free.)

Can you post the link to the bug you opened for EDoc? I don't think it can be that hard to solve... (or is it?)

https://bugs.erlang.org/browse/ERL-849

Oh, sure, here you go: ERL-849. Not sure about how complicated it might be, have only glimpsed into the edoc code. edoc_specs seems to be the module to go for, I guess.

Ok, ERL-849 has just been resolved as "won't fix"...

src/mysql.erl

juhlig · 2019-01-31T10:24:47Z

Great work!

Thanks :)

First of all, I'm missing a test case or two, to cover each new line of code. :-) It should be fairly easy to write.

It's still a work in progress, just wanted to let you have a look to know if I'm not on the wrong track before investing more time than necessary for this step.

I've been trying hard in the past to keep line length to 80, so I appreciate if this can be kept. If the indentation is too deep, it's an indication to break things out to smaller functions.

Ok, I'll see to it. Personally, I don't care much about line lengths and never restrict myself to them in my own projects (my thinking being, this is a 24" monitor I work with, not a gameboy screen ;)), so I usually have a hard time when working on other peoples projects. But I'll adhere to your requirements, of course.

Sorry if I'm repeating myself in the other comments. Things came to my mind in a somewhat random order.

I'll answer those separately in the appropiate places.

A note/thought on documentation, though (this might become the start of a new issue).

I don't think it is good to put usage documentation, ie practically a user guide, right into the code. Again personally, I find lengthy documentation comments rather disrupting, ie not helpful for working inside the project.

I believe a separate documentation in something like asciidoc would be over all more practical. I've seen it done beautifully with ranch and have done it myself in some own projects. erlang.mk (dunno about rebar) creates man pages and a guide in pdf and html formats from asciidoc via dblatex "just like that" :)

What it boils down to is, IMO:

comments in the code concern only the code itself and are, in the best of circumstances, not even necessary because the code and specs speak for themself
usage documentation, which concerns only the user, doesn't disrupt the code.

zuiderkwast · 2019-01-31T14:02:25Z

I sort of agree about the documentation. I wanted to use edoc just because it's an OTP standard application. I didn't want any dependencies of any sort, apart from OTP. This project can build using rebar (using default values, no config), make or just using the OTP compile or make functions in an Erlang shell.

However, building the docs is not necessary when using the project as a dependency. Asciidoc can be nicely displayed in the github web interface as well I suppose, just like markdown, so there isn't even a need to build it. Is that right?

Anyway, let's do that separately from this change.

juhlig · 2019-01-31T15:33:06Z

However, building the docs is not necessary when using the project as a dependency. Asciidoc can be nicely displayed in the github web interface as well I suppose, just like markdown, so there isn't even a need to build it. Is that right?

Correct, see here for instance. Some IDEs like Eclipse can interpret it to varying degrees as well. And it can be read with any text editor, of course, similar to markdown.

Anyway, let's do that separately from this change.

I agree, that's for later :)

juhlig · 2019-02-01T13:32:29Z

With my last commit, I addressed most of the requested changes as best as I could. In a nutshell:

~~The tests are still missing. I'll get to that later.~~ There is a test case in mysql_tests now, filtermap_queries, in the queriesgroup of tests.
I couldn't always keep the lines below 80, some are slightly over this limit. This is not due to deep nesting but because of some long names. Since I wasn't always sure about your line breaking, indentation, and alignment policies (didn't always find examples in the current code), I left it at that for now.
I also renamed some things (like query_filtermap to query_filtermap_fun) to better reflect what they are for.
Documentation for query/{2,3,4} was replaced with @see query/5, where I also added an extensive paragraph on the FilterMap argument.
no_params, no_filtermap_fun and default_timeout atoms are now replacing `undefined´ in the appropriate places.

zuiderkwast · 2019-02-01T16:39:45Z

Personally, I don't care much about line lengths and never restrict myself to them in my own projects (my thinking being, this is a 24" monitor I work with, not a gameboy screen ;)), so I usually have a hard time when working on other peoples projects. But I'll adhere to your requirements, of course.

Some benefits of short lines:

I like to have a text editor on half of the laptop screen and and a terminal on the other half.
Side-by-side diff, even with a large font size.
Short lines prevents too much nested code.
Short lines are easier for the eye to read. Compare to the columns in a newspaper.
Readable on small screens like smartphones.

zuiderkwast

Good test case and documentation! I just have very few comments now.

zuiderkwast · 2019-02-03T22:42:47Z

test/mysql_tests.erl

+
+    %% test with plain query
+    {ok, _, Rows1}=mysql:query(Pid, Query, FilterMap1),
+    ?assertEqual(Expected, lists:sort(Rows1)),


I suggest you add ORDER BY id to the SQL query instead of these calls to lists:sort/1. Relying on the fact that tuples < lists in erlang isn't the most obvious thing.

Ok, no problem :)

zuiderkwast · 2019-02-03T22:49:37Z

src/mysql.erl

+%%
+%% Here is an example showing some of the things that are possible:
+%% ```
+%% Query = "SELECT `a`, `b`, `c` FROM `foo`",


Very good documentation!

Only one thing: I don't like the backticks around column names in MySQL examples though (nor in actual programs) because it's not standard SQL and it's not needed. Just remove them, I'd prefer.

Regarding not having the full manual in the code, have you seen that there is an edoc file in the doc/ folder? Just saying. I don't mind that the full documentation about this is here in the code for now. We can move this to e.g. asciidocs some other time.

Only one thing: I don't like the backticks around column names in MySQL examples though (nor in actual programs) because it's not standard SQL and it's not needed. Just remove them, I'd prefer.

Sure, done :)

Regarding not having the full manual in the code, have you seen that there is an edoc file in the doc/ folder? Just saying. I don't mind that the full documentation about this is here in the code for now. We can move this to e.g. asciidocs some other time.

Oh, no, I didn't notice. Looking at it, though, it seems that it would require some more structure and context, if you know what I mean. Like, just slapping the query paragraph from mysql.erl in there would look half-heartedly, ie not too good ;) I feel that there should be all things from connectin through querying in all sorts of ways etcetc in it, too. That's beyond my scope for now, though, so I'd say we leave the manual in the code for the time being until we can do an all-out manual ;)

Good. I agree.

zuiderkwast

Good. I'm happy!

zuiderkwast · 2019-02-04T13:21:01Z

test/mysql_tests.erl

+    ok = mysql:query(Pid, ?create_table_t),
+    ok = mysql:query(Pid, <<"INSERT INTO t (id, tx) VALUES (1, 'text 1')">>),
+    ok = mysql:query(Pid, <<"INSERT INTO t (id, tx) VALUES (2, 'text 2')">>),
+    ok = mysql:query(Pid, <<"INSERT INTO t (id, tx) VALUES (3, 'text 3')">>),


Just in case you didn't know you can do this in SQL:

INSERT INTO t (id, tx) VALUES (1, 'text 1'), (2, 'text 2'), (3, 'text 3');

I know ;) That's more or less an artifact there, I wasn't sure how many values I'd need, so just duplicated the line a few times over. I can change it to the (...), (...), (...)style if you like that better.

No, it's not important.

Ok, in that case, it seems this is good to go now, as far as I can see. I removed the WIP status from the title.

Sure, I'll merge it tonight or tomorrow.

No sweat :)

The query/execute functions are provided with an additional parameter to allow for an (optional) filtermap function to be passed in, for filtering and/or mapping the rows in the results in a similar fashion as lists:filtermap/2. This fun is applied in the process of the connection, so that copying to the caller process can be avoided for large result sets.

juhlig added 3 commits January 28, 2019 12:26

fetch resultset per-row

0470045

Each response row is fetched and decoded in one operation, in contrast to first fetching all rows then and decoding all rows.

provide for optional filtermap function

f0a8c06

The query/execute and their result handling functions are provided with an additional parameter to allow for an (optional) filtermap function to be passed in.

Optional result row filtering/mapping

05827b5

Provides new and augments existing query and execute functions to accept a filter/map function to be applied to result rows.

zuiderkwast requested changes Jan 30, 2019

View reviewed changes

implement requested changes

c68f6ac

juhlig force-pushed the query_filtermap branch from 7dcee7e to c68f6ac Compare February 1, 2019 13:37

zuiderkwast reviewed Feb 3, 2019

View reviewed changes

tests and doc changes

44980a4

juhlig force-pushed the query_filtermap branch from 909a24a to 44980a4 Compare February 4, 2019 12:16

zuiderkwast approved these changes Feb 4, 2019

View reviewed changes

juhlig changed the title ~~WIP: result filter/map capability~~ result filter/map capability Feb 4, 2019

zuiderkwast merged commit 7df8f1d into mysql-otp:master Feb 9, 2019

juhlig mentioned this pull request Mar 20, 2019

Retrieve partial result sets #94

Closed

juhlig deleted the query_filtermap branch October 12, 2020 09:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

result filter/map capability #104

result filter/map capability #104

juhlig commented Jan 28, 2019

zuiderkwast left a comment

zuiderkwast Jan 30, 2019

juhlig Jan 31, 2019

zuiderkwast Jan 30, 2019

juhlig Jan 31, 2019

juhlig Feb 1, 2019

juhlig Feb 1, 2019

juhlig Feb 1, 2019

zuiderkwast Feb 1, 2019

zuiderkwast Feb 1, 2019

juhlig Feb 1, 2019

juhlig Feb 12, 2019

juhlig commented Jan 31, 2019 •

edited

Loading

zuiderkwast commented Jan 31, 2019

juhlig commented Jan 31, 2019

juhlig commented Feb 1, 2019 •

edited

Loading

zuiderkwast commented Feb 1, 2019

zuiderkwast left a comment

zuiderkwast Feb 3, 2019

juhlig Feb 4, 2019

zuiderkwast Feb 3, 2019

juhlig Feb 4, 2019

zuiderkwast Feb 4, 2019

zuiderkwast left a comment

zuiderkwast Feb 4, 2019

juhlig Feb 4, 2019

zuiderkwast Feb 4, 2019

juhlig Feb 4, 2019

zuiderkwast Feb 4, 2019

juhlig Feb 4, 2019

result filter/map capability #104

result filter/map capability #104

Conversation

juhlig commented Jan 28, 2019

zuiderkwast left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

juhlig commented Jan 31, 2019 • edited Loading

zuiderkwast commented Jan 31, 2019

juhlig commented Jan 31, 2019

juhlig commented Feb 1, 2019 • edited Loading

zuiderkwast commented Feb 1, 2019

zuiderkwast left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zuiderkwast left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

juhlig commented Jan 31, 2019 •

edited

Loading

juhlig commented Feb 1, 2019 •

edited

Loading