Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

result filter/map capability #104

Merged
merged 5 commits into from
Feb 9, 2019
Merged

Conversation

juhlig
Copy link
Member

@juhlig juhlig commented Jan 28, 2019

This PR adds the capability to pass a one-ary or two-ary function along with query and execute calls, to be used somewhat like lists:filtermap/2, as discussed in Issue #94.

Result rows will be passed to this function during fetching. The output of the function determines if the row is included or excluded from the result (filter capability), and may at the same time transform the row into something else (mapping capability). The function may also use side effects, eg send the row to another process, write to a file, insert into ets tables, etc. An important point is that rows are fetched and decoded only as fast as the function handles them (may be fast if it does some simple filtering or mapping or asynchronously hands them off to another process, or slow if it needs to do complicated things or stuff that is slow by nature), so it can also work as a throttling mechanism to prevent flooding memory with data that is needed only temporarily (ie, not all at once).

To achieve this behavior, it was necessary to change the response fetching strategy (mysql_protocol:fetch_response etc). Before, it was like this:

  • Fetch all row packets of a result set
  • Decode the rows
  • Repeat for the next result set until there are none left in the response

Now, it has been changed to:

  • Fetch 1 row packet of a result set
  • Decode this row packet to a row
  • Pass the decoded row to the filtermap function, and depending on the return value (mimicking lists:filtermap/2 behavior):
    • if true, keep the row (to return later as the return of the query or execute call) as is
    • if false, drop the row
    • if {true, Value}, keep Value instead of the original row (to return later)
  • Repeat for next row until there are no more row packets in the result set
  • Repeat for the next result set until there are none left in the response

juhlig added 3 commits January 28, 2019 12:26
Each response row is fetched and decoded in one operation, in contrast
to first fetching all rows then and decoding all rows.
The query/execute and their result handling functions are provided
with an additional parameter to allow for an (optional) filtermap
function to be passed in.
Provides new and augments existing query and execute functions
to accept a filter/map function to be applied to result rows.
Copy link
Member

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

First of all, I'm missing a test case or two, to cover each new line of code. :-) It should be fairly easy to write.

I've been trying hard in the past to keep line length to 80, so I appreciate if this can be kept. If the indentation is too deep, it's an indication to break things out to smaller functions.

Sorry if I'm repeating myself in the other comments. Things came to my mind in a somewhat random order.

src/mysql_conn.erl Outdated Show resolved Hide resolved
src/mysql.erl Outdated
when Conn :: connection(),
Query :: iodata(),
Timeout :: timeout(),
Params :: [term()],
FilterMap :: query_filtermap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, FilterMap is cannot be undefined, so I think we shouldn't have undefined as an allowed value in the query_filtermap() type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it can, the second clause to allows undefined, a one-ary or two-ary function. See my next comment in conjunction with this.

src/mysql.erl Outdated Show resolved Hide resolved
src/mysql.erl Outdated Show resolved Hide resolved
src/mysql.erl Outdated Show resolved Hide resolved
Timeout :: timeout(),
Params :: [term()],
Result :: query_result();
(Conn, Query, FilterMap, Timeout) -> Result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this multi-clause spec! Good job!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that this does not work with edoc... edoc will only generate documentation for the first clause :\

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bummer... the guys on #erlang say that's somewhat normal as edoc just doesn't seem to support the full extend of what -specs offer :( I opened an issue at bugs.erlang.org for this, but I don't think much will come of it anytime soon as it's not a pressing issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands, I see no alternative other than weakening the spec to:

-spec query(Conn, Query, Params | FilterMap, Timeout | FilterMap) -> Result
when Conn :: connection(),
Query :: iodata(),
Timeout :: default_timeout \| timeout(),
Params :: no_params \| [term()],
FilterMap :: no_filtermap_fun \| query_filtermap_fun(),
Result :: query_result().

which I am reluctant to do, since it is in fact incorrect as it suggests you can pass in two filtermap functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the good spec. It's more important than EDoc. I would rather drop EDoc if it's not good enough, but that can be done later. (What I do like about EDoc though is that you get the functions and types from the code for free.)

Can you post the link to the bug you opened for EDoc? I don't think it can be that hard to solve... (or is it?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sure, here you go: ERL-849. Not sure about how complicated it might be, have only glimpsed into the edoc code. edoc_specs seems to be the module to go for, I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, ERL-849 has just been resolved as "won't fix"...

src/mysql.erl Outdated Show resolved Hide resolved
src/mysql.erl Outdated Show resolved Hide resolved
@juhlig
Copy link
Member Author

juhlig commented Jan 31, 2019

Great work!

Thanks :)

First of all, I'm missing a test case or two, to cover each new line of code. :-) It should be fairly easy to write.

It's still a work in progress, just wanted to let you have a look to know if I'm not on the wrong track before investing more time than necessary for this step.

I've been trying hard in the past to keep line length to 80, so I appreciate if this can be kept. If the indentation is too deep, it's an indication to break things out to smaller functions.

Ok, I'll see to it. Personally, I don't care much about line lengths and never restrict myself to them in my own projects (my thinking being, this is a 24" monitor I work with, not a gameboy screen ;)), so I usually have a hard time when working on other peoples projects. But I'll adhere to your requirements, of course.

Sorry if I'm repeating myself in the other comments. Things came to my mind in a somewhat random order.

I'll answer those separately in the appropiate places.


A note/thought on documentation, though (this might become the start of a new issue).

I don't think it is good to put usage documentation, ie practically a user guide, right into the code. Again personally, I find lengthy documentation comments rather disrupting, ie not helpful for working inside the project.

I believe a separate documentation in something like asciidoc would be over all more practical. I've seen it done beautifully with ranch and have done it myself in some own projects. erlang.mk (dunno about rebar) creates man pages and a guide in pdf and html formats from asciidoc via dblatex "just like that" :)

What it boils down to is, IMO:

  • comments in the code concern only the code itself and are, in the best of circumstances, not even necessary because the code and specs speak for themself
  • usage documentation, which concerns only the user, doesn't disrupt the code.

@zuiderkwast
Copy link
Member

I sort of agree about the documentation. I wanted to use edoc just because it's an OTP standard application. I didn't want any dependencies of any sort, apart from OTP. This project can build using rebar (using default values, no config), make or just using the OTP compile or make functions in an Erlang shell.

However, building the docs is not necessary when using the project as a dependency. Asciidoc can be nicely displayed in the github web interface as well I suppose, just like markdown, so there isn't even a need to build it. Is that right?

Anyway, let's do that separately from this change.

@juhlig
Copy link
Member Author

juhlig commented Jan 31, 2019

However, building the docs is not necessary when using the project as a dependency. Asciidoc can be nicely displayed in the github web interface as well I suppose, just like markdown, so there isn't even a need to build it. Is that right?

Correct, see here for instance. Some IDEs like Eclipse can interpret it to varying degrees as well. And it can be read with any text editor, of course, similar to markdown.

Anyway, let's do that separately from this change.

I agree, that's for later :)

@juhlig
Copy link
Member Author

juhlig commented Feb 1, 2019

With my last commit, I addressed most of the requested changes as best as I could. In a nutshell:

  • The tests are still missing. I'll get to that later. There is a test case in mysql_tests now, filtermap_queries, in the queriesgroup of tests.
  • I couldn't always keep the lines below 80, some are slightly over this limit. This is not due to deep nesting but because of some long names. Since I wasn't always sure about your line breaking, indentation, and alignment policies (didn't always find examples in the current code), I left it at that for now.
  • I also renamed some things (like query_filtermap to query_filtermap_fun) to better reflect what they are for.
  • Documentation for query/{2,3,4} was replaced with @see query/5, where I also added an extensive paragraph on the FilterMap argument.
  • no_params, no_filtermap_fun and default_timeout atoms are now replacing `undefined´ in the appropriate places.

@zuiderkwast
Copy link
Member

Personally, I don't care much about line lengths and never restrict myself to them in my own projects (my thinking being, this is a 24" monitor I work with, not a gameboy screen ;)), so I usually have a hard time when working on other peoples projects. But I'll adhere to your requirements, of course.

Some benefits of short lines:

  • I like to have a text editor on half of the laptop screen and and a terminal on the other half.
  • Side-by-side diff, even with a large font size.
  • Short lines prevents too much nested code.
  • Short lines are easier for the eye to read. Compare to the columns in a newspaper.
  • Readable on small screens like smartphones.

Copy link
Member

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good test case and documentation! I just have very few comments now.


%% test with plain query
{ok, _, Rows1}=mysql:query(Pid, Query, FilterMap1),
?assertEqual(Expected, lists:sort(Rows1)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you add ORDER BY id to the SQL query instead of these calls to lists:sort/1. Relying on the fact that tuples < lists in erlang isn't the most obvious thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, no problem :)

src/mysql.erl Outdated
%%
%% Here is an example showing some of the things that are possible:
%% ```
%% Query = "SELECT `a`, `b`, `c` FROM `foo`",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good documentation!

Only one thing: I don't like the backticks around column names in MySQL examples though (nor in actual programs) because it's not standard SQL and it's not needed. Just remove them, I'd prefer.

Regarding not having the full manual in the code, have you seen that there is an edoc file in the doc/ folder? Just saying. I don't mind that the full documentation about this is here in the code for now. We can move this to e.g. asciidocs some other time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one thing: I don't like the backticks around column names in MySQL examples though (nor in actual programs) because it's not standard SQL and it's not needed. Just remove them, I'd prefer.

Sure, done :)

Regarding not having the full manual in the code, have you seen that there is an edoc file in the doc/ folder? Just saying. I don't mind that the full documentation about this is here in the code for now. We can move this to e.g. asciidocs some other time.

Oh, no, I didn't notice. Looking at it, though, it seems that it would require some more structure and context, if you know what I mean. Like, just slapping the query paragraph from mysql.erl in there would look half-heartedly, ie not too good ;) I feel that there should be all things from connectin through querying in all sorts of ways etcetc in it, too. That's beyond my scope for now, though, so I'd say we leave the manual in the code for the time being until we can do an all-out manual ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. I agree.

Copy link
Member

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. I'm happy!

ok = mysql:query(Pid, ?create_table_t),
ok = mysql:query(Pid, <<"INSERT INTO t (id, tx) VALUES (1, 'text 1')">>),
ok = mysql:query(Pid, <<"INSERT INTO t (id, tx) VALUES (2, 'text 2')">>),
ok = mysql:query(Pid, <<"INSERT INTO t (id, tx) VALUES (3, 'text 3')">>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case you didn't know you can do this in SQL:

INSERT INTO t (id, tx) VALUES (1, 'text 1'), (2, 'text 2'), (3, 'text 3');

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know ;) That's more or less an artifact there, I wasn't sure how many values I'd need, so just duplicated the line a few times over. I can change it to the (...), (...), (...)style if you like that better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, in that case, it seems this is good to go now, as far as I can see. I removed the WIP status from the title.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll merge it tonight or tomorrow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No sweat :)

@juhlig juhlig changed the title WIP: result filter/map capability result filter/map capability Feb 4, 2019
@zuiderkwast zuiderkwast merged commit 7df8f1d into mysql-otp:master Feb 9, 2019
zuiderkwast pushed a commit that referenced this pull request Feb 9, 2019
The query/execute functions are provided
with an additional parameter to allow for an (optional) filtermap
function to be passed in, for filtering and/or mapping the rows in the results in a similar fashion as lists:filtermap/2.

This fun is applied in the process of the connection, so that copying to the caller process can be avoided for large result sets.
zuiderkwast pushed a commit that referenced this pull request Mar 31, 2019
The query/execute functions are provided with an additional parameter
to allow for an (optional) filtermap function to be passed in, for
filtering and/or mapping the rows in the results in a similar fashion
as lists:filtermap/2.

This fun is applied in the process of the connection, so that copying
to the caller process can be avoided for large result sets.
@juhlig juhlig deleted the query_filtermap branch October 12, 2020 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants