Extract Response Processing #7

Open · wants to merge 5 commits into molybdenum-99:master from joelvh:master

Conversation

@joelvh (Contributor) commented Feb 26, 2018

The goal is to extract all processing of the response in order to implement ETL-like post processing and streaming of HTTP requests.

Changes:

  • Extract DSL classes to separate files
  • Move current response parsing to class which encapsulates all processing of Response object
    • Introduced TLAW::Processors::Base with basic functionality
    • Introduced TLAW::Processors::ResponseProcessor with response body parsing into JSON or XML without DataTable flattening
    • Introduced new default response processor (TLAW::Processors::DataTableResponseProcessor) which builds on ResponseProcessor and flattens data into DataTable
  • Added option to specify response_processor
    • Nested API paths inherit response processor
  • Move data transformation processors into DSL::Transforms namespace
    • Renamed process* methods to transform* with method aliases for backwards-compatibility
    • Replace process_replace with transform(replace: true)
  • Move "param"-related classes to TLAW::Params namespace
  • Rename "post process" to "process"
  • Implement Ruby 2.5's Object#yield_self for older versions of Ruby (in place of Object#derp)
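For reference, here is a minimal sketch of the kind of backport the last bullet describes, matching the lib/backports.rb diff discussed below (illustrative, not the exact committed code):

class Object
  # Ruby 2.5 introduced Object#yield_self: it passes the receiver to the
  # block and returns the block's result.
  unless method_defined?(:yield_self)
    def yield_self(*args)
      yield self, *args
    end
  end
end

# 2.yield_self { |n| n * 10 }  # => 20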

@joelvh (Contributor, Author) commented Feb 26, 2018

@zverok this is mainly a placeholder PR to get some feedback on the direction of things. It is primarily meant to keep things backward-compatible (except for classes whose namespace was changed, which should not affect users of the DSL).

@coveralls commented Feb 26, 2018

Coverage increased (+0.1%) to 95.856% when pulling 01fbcc3 on joelvh:master into a4c7470 on molybdenum-99:master.

@zverok (Contributor) left a comment

Most of the "just refactoring" changes look OK to me (I left some comments), but the situation with the Processors module is a bit unclear.

Depending on your final goal, I can imagine several approaches here. As far as I can understand, what you want to achieve is: use the TLAW construction DSL, but:

  • not use its response processing at all (receive raw responses);
  • or, use its basic response processing (JSON parsing and errors guard), but not "advanced" features (flat hashes, data tables, post_process DSL);
  • or, use parsing & errors guards, post_process DSL, but avoid flattening of hashes and data tables.

In any case, I believe that the "right" way is splitting response processing into Faraday-like independent middleware, or even actual Faraday middleware: in that case, "parse JSON" and "fail on errors" probably already exist as middleware! Then, what will be left is

  • Some "flattener" (which just takes parsed response and makes it flat & adds datatables → unless the latter is ONE MORE independent middleware);
  • Some "handlers" middleware, allowing to set "entire response" and "per-key" handlers, the trick here that they probably can't rely on flattening, so should be able to set an arbitrary depth of processing; also, this is kind of middleware that updates the DSL too, or, alternatively, the DSL that adds middleware to response processing.

Please tell me what you think about the above reasoning. I probably can jump in and write some code too, to clear the path :)
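To make the middleware idea concrete, here is a rough, purely illustrative sketch (the class and its behavior are invented; the :json and :raise_error response middleware are assumed to be available via faraday_middleware or a newer Faraday):

require 'faraday'

# Hypothetical flattener middleware in the classic Faraday style:
# it post-processes the parsed body once the response completes.
class FlattenHashes < Faraday::Middleware
  def call(env)
    @app.call(env).on_complete do |response_env|
      body = response_env[:body]
      response_env[:body] = flatten(body) if body.is_a?(Hash)
    end
  end

  private

  # {"key" => {"count" => 1}} becomes {"key.count" => 1}
  def flatten(hash, prefix = nil)
    hash.each_with_object({}) do |(k, v), out|
      key = prefix ? "#{prefix}.#{k}" : k.to_s
      if v.is_a?(Hash)
        out.merge!(flatten(v, key))
      else
        out[key] = v
      end
    end
  end
end

# Response middleware runs bottom-to-top, so :json parses the body
# before FlattenHashes sees it.
connection = Faraday.new(url: 'https://api.example.com') do |f|
  f.use FlattenHashes
  f.response :raise_error # "fail on errors"
  f.response :json        # "parse JSON"
  f.adapter Faraday.default_adapter
end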

lib/backports.rb Outdated
yield self, *args
end
end
end
@zverok (Contributor):

Let's just add backports to the dependencies? It is kinda more generic.

ArgumentError,
"Unregistered #{expected_class.name.downcase}: #{symbol}"
)
end
@zverok (Contributor):

I preferred the previous logic (fetch → validate → create), it seems more "functional" to me.
And, in fact, could've been simplified a bit:

children[symbol]
  .tap { |child_class| ... }
  .new(@parent_params.merge(params))

README.md Outdated
@@ -5,7 +5,7 @@
[![Coverage Status](https://coveralls.io/repos/molybdenum-99/tlaw/badge.svg?branch=master)](https://coveralls.io/r/molybdenum-99/tlaw?branch=master)

**TLAW** (pronounce it like "tea+love"... or whatever) is the last (and
only) API wrapper framework for _get-only APIes_<sup>[*](#get-only-api)</sup>
only) API wrapper framework for you'll ever need APIs in a consistent way
@zverok (Contributor):

Why?..

@joelvh (Contributor, Author):

A little bit of a bad copy/paste job there, but I also have the goal of allowing other request methods. I'll add back the GET-only text for now, since other methods are probably a ways away (and maybe not needed in the short term anyway).


module TLAW
module DSL
# @!method base(url)
@zverok (Contributor):

The docs of all the # @!method-s below relate to the DSL itself, not BaseWrapper; they should be moved back to dsl.rb.

@@ -35,7 +37,7 @@ def base_url=(url)
@base_url = url

children.values.each do |c|
c.base_url = base_url + c.path if c.path && !c.base_url
c.base_url ||= base_url + c.path if c.path
@zverok (Contributor):

I somehow liked the previous version better ("set it, if it has a path and no base url" is one "statement" in my mind, while ||= and if on the same line make it hard to follow: "unless it is set, set it... Oh, if it has a path, BTW").

@joelvh (Contributor, Author):

@zverok I debated this as well, and actually intended to remove the trailing if. If you know what the expected logic or resulting values are intended to be here: could this just use the base_url even if the path is nil?

@zverok (Contributor) commented Feb 27, 2018:

Should probably check with specs.
I believe these excessive checks are the result of how the DSL constructs it: not with one constructor, but by gradually setting parts of an object, and in the process it sometimes becomes hard to a) not reset what is already properly set and b) not set the value too early, to an incomplete state.

@joelvh (Contributor, Author):

BTW this was reverted

@@ -0,0 +1,37 @@
module TLAW
module Params
@zverok (Contributor):

Should have some docs for the module, probably.

def self.make(name, **options)
# NB: Sic. :keyword is nil (not provided) should still
# make a keyword argument.
if options[:keyword] != false
@zverok (Contributor):

Ugh. I know I wrote it myself, but let's flip the if for the sake of readability (if == false → Argument, else → Keyword).
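A rough sketch of the flipped version (the Argument/Keyword class names come from the comment above; the constructor signatures are assumed):

def self.make(name, **options)
  if options[:keyword] == false
    Argument.new(name, **options)
  else
    # nil (i.e. not provided) still means a keyword argument
    Keyword.new(name, **options)
  end
end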

# converting. You'll never instantiate it directly, just see {DSL#param}
# for parameters definition.
#
class Param
@zverok (Contributor):

Let's make it Params::Base? I've never been a huge fan of Foos::Foo naming.

module Processors
# @private
# FIXME: everything is awfully dirty here
class BaseProcessor
@zverok (Contributor):

Processors::Base?

end

def all_processors
[*(parent && parent.all_processors), *@processors]
@zverok (Contributor):

In fact, in 2018 we can probably limit the lowest supported Ruby version to 2.3 and just use &. here.
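With the safe navigation operator, the parent check collapses to (sketch):

def all_processors
  [*parent&.all_processors, *@processors]
end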

@joelvh (Contributor, Author) commented Feb 26, 2018

@zverok your comments have been addressed except for 2 I left comments for.

The Processors namespace is mostly a placeholder right now.

Regarding your reasoning, I think it all makes sense. In terms of my goals, I would want to keep the current TLAW behavior and DSL intact. However, the underlying architecture would become more modular and focused on building pipelines (or middleware) to handle processing of HTTP responses. Ideally, the DSL can have the process (post_process) helper methods to build up the ETL-like transformations, so that out-of-the-box TLAW has most of what people want for simple situations.

The additions I'm envisioning are not far from what is already there. However, I would want to achieve:

  • TLAW becomes more extensible, and reusing existing components becomes easier
  • Ability to add more DSL options that swap out the default response processor or set options on it, for more optionality (whatever those options end up being)
  • Ability to replace the "default" response processor with a fully custom processor (a rough sketch follows at the end of this comment)
    • For example, some HTTP requests sync large files that should be processed while downloading, so implementing an Enumerator that streams the data to processors while downloading is a goal, but it should not be imposed as part of the TLAW default response processor if deemed out-of-scope
  • Ability to use any kind of transformation pipeline for the response (I believe you have some projects related to this, and Solnic does too, for example)

I'm developing an "abstract" ETL pipeline project that can encapsulate the workflows and use existing gems under-the-hood to execute parts of the process (e.g. TLAW can be used to create an API client to stream data into an ETL pipeline)

So, all that to say: the current TLAW behavior would ideally stay intact, or, if it makes sense, some default behavior may evolve.
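A purely hypothetical sketch of how the response_processor option added in this PR might be used to plug in a custom (e.g. streaming) processor; the class names, the streaming behavior and the exact DSL shape are illustrative only:

class StreamingProcessor < TLAW::Processors::ResponseProcessor
  # e.g. expose the body as an Enumerator of chunks instead of
  # materializing and flattening it (illustrative only)
end

class MyAPI < TLAW::API
  define do
    base 'https://api.example.com'

    response_processor StreamingProcessor  # nested API paths would inherit this

    namespace :exports do
      endpoint :download, '/download'
    end
  end
end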

@@ -1,6 +1,6 @@
require 'pp'

$:.unshift 'lib'
$:.unshift '../lib'
@zverok (Contributor):

Let's make the proper File.expand_path('../lib', __FILE__), then :)
Because I run them from the root of the repo (ruby examples/<something>.rb), and this way they will not work.


@zverok (Contributor) left a comment

Hey.

I wanted to ask you a big favor. It is really interesting to me where the story with the response processors unwinds, but it is becoming really hard to follow a 50+ file PR.

If it is possible, can you split this PR in two: one with the refactoring/spelling/example fixes without any API changes, which we'll discuss briefly and merge; and the next one focused just on the processors/transformers redesign? I believe it would be waaay easier for both of us to proceed this way.

@zverok (Contributor) commented Feb 27, 2018

However, the underlying architecture would become more modular and focused on building pipelines (or middleware) to handle processing of HTTP responses.

Yes, that's one of my goals (that I've never had time to fully embrace for this project). So, let's go with it 👍

BTW, since you've mentioned it: I did, in fact, develop Hm somewhat with TLAW in mind, so probably (in some further steps) we could investigate the possibility of doing things in an "Hm-like" style:

transform('items', :*, 'date', &Date.method(:parse))
# instead of
transform_items('items') do
  transform 'date', &Date.method(:parse)
end

(The former now looks "cleaner" for me, though it contradicts open_weather_map example, with WEATHER_PROCESSOR reused on several levels.)

But that's for future (also, transform_items maybe should be transform_each?.. Looks more Ruby, don't remember why I wanted _items suffix in the first place.)

PS: On a totally unrelated note, yesterday evening younger colleague asked for help with some Magento integration, and we struggled for a few hours with rough SOAP/Savon, and then I googled for ready solution, and suddenly found magento-rb and everything worked fine just out-of-the-box, and then I was like "Hey, I know this guy!" So, thanks, I guess :)

@joelvh (Contributor, Author) commented Feb 28, 2018

@zverok Ha! What a funny coincidence that you guys found that Magento project. Glad it was of help!

Regarding multiple PRs: I tried to largely group updates into commits, so I would initially recommend looking through the commits as a means to understand the changes. If that isn't possible, I'll see about multiple PRs, if that's OK?

Regarding transformations: Yes, Hm was the project, and as a matter of fact, it looks like you just started that very recently. Solnic's Transproc is the other one, and both of these have lots of common transformations built-in. My goal is to develop a set of gems that can help with solving various use cases, that can all be used together. Primarily, the goal is to provide DSLs to construct ETL workflows or data migrations, report generation, and other things in an abstract way that easily lets you use the tools you want to use for things like transformations. (For instance, take the TLAW DSL for defining the workflow and transformations. The transformations could be performed by Hm, Transproc, or both.) A bit ambitious, I realize, but that's the goal. We can talk more about this if you're interested... for now I'm using my reporting gem as a workspace for these concepts, and will abstract out various aspects of that project into individual gems for composition for individual use cases (eventually).

Thanks for reviewing things!

@joelvh (Contributor, Author) commented Feb 28, 2018

@zverok I should note that the initial abstraction in reporting builds on top of Kiba, but Kiba is not a direct dependency. The Pipeline DSL allows for constructing the transformations, and then the job can be run by Kiba (or another framework).

Some examples in the project to illustrate the ideas:

  1. Developing DSL to construct "phases" of a pipeline (heavily influenced by Kiba, but going to evolve from there to extend beyond Kiba's capabilities)
  2. A simple DSL for adding a transformation
  3. Example of some transforms that Hm or Transproc are better suited to provide the implementation for
  4. Example of what a job definition might look like (needs to be filled in)
  5. And finally, here's how the pipeline is actually run by Kiba

I like how you developed a DSL object model that wraps around the domain object model. I will explore that approach a bit more in order to not mix helper methods with the data model. (As you will see in some of the changes I made to the transforms in TLAW, I moved what I would consider helpers to the DSL namespace in this manner.)

Thanks!

@zverok (Contributor) commented Mar 8, 2018

Hey, sorry for being silent for a week, life and work took too much of my attention.

I generally like the notion of "pipelines", and lately, too, have come to think that TLAW should fit into some pipelines (therefore, requiring inheritance from it is probably not the best possible convention). My main use case (which led me to "inventing" the library) is the reality project (it is a work in progress; here is the latest presentation from RubyConf India). It is basically made of "describers" (high-level homogeneous wrappers) for external data services. A describer can also be seen as a pipeline (e.g. fetch a Wikipedia page → parse it into a wiki-AST → fetch some semantic information → wrap it into high-level value objects like "Temperature" or "Geographical coordinate").

TLAW usage there can be seen, for example, here. I have mixed feelings about how it fits in: on the one hand, it was really easy to describe the data source I wanted; on the other, I needed an "internal" class to descend from TLAW, while the ability to somehow "include" the DSL into the pipeline would probably be better.

@joelvh (Contributor, Author) commented Mar 16, 2018

I was out of town and got a chance to look at your links now. I saw Reality when I looked through your projects a while back and thought it was very interesting. Your slides are also helpful to walk through your thinking. I have some thoughts around similar ideas of what can be built on top of TLAW and pipelines.

Reality is more designed around entities that can be used across data sources, which is neat, but also means that each data source needs to have various higher level translations to make them work together (presumably your "describers"). I will give more thought to this and how it can be done more generically and if it's in the realm of where I've been thinking of going. However, the way you are using TLAW as the API client and building on it is exactly what I like.

As I think about data sources, transformations and persistence (more related to ETL), I also think a little bit about making it more event-oriented and potentially making some parts async. However, I think that is a higher level of pipelines than we are currently talking about.

I will play more with TLAW concepts and how some of my goals in the previous comment fit together as I get my head around this again. It sounds like you are open to determining TLAW's place in a system of pipelines.

Let me know if you have any specific thoughts about the relationship between TLAW, pipelines and how to integrate multiple libraries together (e.g. TLAW for the API client, interchangeable Hm or Transproc for pre-built transformations, etc).

@joelvh (Contributor, Author) commented Mar 17, 2018

@zverok I've been looking more into ROM.rb and it looks like the HTTP adapter does what we are doing with request and response handling. There is a ROM HTTP adapter, and I also found an example of a Faraday-based ROM HTTP adapter.

ROM.rb is also very functional and pipeline-based. It seems like it could be a good foundation for ETL as I've been thinking about, and TLAW would be great for customizing the HTTP client/adapter. I'm going to give more thought about if/how ROM fits into things, but I wanted to see what your thoughts are.

@joelvh joelvh reopened this Mar 20, 2018
@joelvh (Contributor, Author) commented Mar 20, 2018

@zverok I've recreated commits for the remaining changes on this PR. It's two commits that encompass renaming the post-processor methods and extracting the response processor into a configurable class.

@zverok (Contributor) commented Mar 21, 2018

Let me know if you have any specific thoughts about the relationship between TLAW, pipelines and how to integrate multiple libraries together (e.g. TLAW for the API client, interchangeable Hm or Transproc for pre-built transformations, etc).

I have no clear idea, currently, mostly because I haven't experimented enough. The generic concerns are:

  1. How to make processing of data from Web APIs flexible. My initial guess (especially opinionated "flattening") was definitely too rigid; the "process this and that element" API is better, so your work to make it all more modular and reasonable is extremely valuable.
  2. I am not sure that a "separate TLAW class" is always the best approach; it again lacks flexibility. I recently thought about exactly that in the context of post-processing: with any non-trivial post-processing block, the nesting becomes ridiculous. It would probably be nice to have something like post_process :some_private_method_name.
  3. Also (in the context of Reality) I thought about whether it is possible to have some "TLAW inside, other interface outside" classes, like
class SpecializedWeather
  def very_concrete_method
    search(city).weather(date) # that's private internal TLAW
      .yield_self { some very fancy wrapper }
  end

  private

  # all those methods become available to the instance, yet private.
  tlaw do 
     ........
  end 
end

I am not sure it will be good, just an idea.

As for ROM, I believe they approach the same problem here, but from a bit of a different angle. Theirs is mostly "apply the repository pattern to everything" (including HTTP, implementing something like a Repository-based ActiveResource); mine is mostly utilitarian, a small tool to be chained into bigger ones. Maybe the approaches can be joined somehow, IDK for sure :)

@joelvh (Contributor, Author) commented Mar 21, 2018

Thanks for your thoughts @zverok - I think we are pretty aligned in our thinking. I am getting more into the ROM/Dry/Hanami world and will report back about what I see the fit as.

The one thing that I have been thinking about a lot lately is the value of DSLs as a high-level interface to describe what you want your application to do, and making the implementation details configurable. I really like the Container/IoC patterns to make this even more flexible. Similar to how you describe specifying a method name for some custom transformation, that could very well refer to a key in a container -- because I definitely agree with you that post processing via nested blocks can become very unwieldy in non-trivial situations. To break it down, parts of it should probably be their own methods or classes that you compose together. Hence, I really am interested in looking at what can be done by developing a DSL to handle these complexities, which will then build the composition of the pipeline with your various implementation details behind the scenes.

I'll report back with what I find as I get more into those libraries and communities I mentioned.

In the meantime, what are your thoughts about merging this PR? (I had originally aliased methods to make the transformation methods backward-compatible. If you have concerns about the API changes, that is an option. My guess is that this could be a drop-in for you if you add back the aliases.)
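For reference, the backwards-compatible aliases mentioned would look roughly like this inside the DSL class (a sketch, not the exact committed code):

alias_method :post_process, :transform
alias_method :post_process_items, :transform_items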

@zverok (Contributor) left a comment

Yeah, now I can see that (save the minor comments) I definitely like it!

Let's just merge after we discuss/fix the comments (the only important one is probably the _items vs _each thing).

I believe that this library (unfortunately) has no active users except me, and it has a 0.0.x version number, so it seems some good renamings are totally acceptable.


# input is entire response, and response is fully replaced with block's
# return value
post_process { |hash| hash['foo'] } # Now only "foo"s value will be response
transform { |hash| hash['foo'] } # Now only "foo"s value will be response
@zverok (Contributor):

transform(replace: true) should be here, right?..

@joelvh (Contributor, Author):

@zverok I mentioned in another comment of yours that I'd like to remove the replace option and just always return the value that should be used, rather than mutate the argument to the block. Thoughts?

# `{"key.count" => 1, "key.continue" => false}`.
#
# @overload post_process(&block)
# @overload transform(replace: false, &block)
# Sets post-processor for whole response. Note, that in this case
# _return_ value of block is ignored, it is expected that your
@zverok (Contributor):

This "Note,..." is not true anymore, considering replace: option :)

@joelvh (Contributor, Author):

With the cherry-picks, I'm confused about what went wrong when moving things around into separate PRs. I'll have to look this over.

@joelvh (Contributor, Author):

OK, looking at this, it is correct. The new example illustrates the same effect for when replace: false is passed (which is also the default). Further down you'll see an example of what happens with replace: true, which is new documentation.

I think the note indicates that transform(replace: false) is a misnomer and also a bit confusing. My thought is that if you don't want to transform the value, this should be a method with a name like handle or something (e.g. maybe process for this specific behavior), where the return value is not expected to be processed. (I noted in another comment that I would prefer to make the default behavior of transform be replace: true.)
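Roughly, the naming distinction being proposed (method names are illustrative, not existing TLAW API):

transform { |response| response['foo'] }      # return value replaces the response
process   { |response| puts response.keys }   # return value ignored; response passes through unchanged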

# ```
#
# @param key [String]

# @!method post_process_items(key, &block)
# @!method transform_items(key, &block)
@zverok (Contributor):

Let's make it transform_each, WDYT? I don't know where my head was when I called it _items :( I was younger, though.

@joelvh (Contributor, Author):

Haha. My opinion is that the transformations should return the new value, not mutate it. So, I would probably want it to be map maybe. I meant to note this if I didn't already. This goes for all transformations, not just this one.

def self.batch(key, &block)
batcher = new(key)
batcher.instance_eval(&block)
batcher.processors
@zverok (Contributor):

I prefer (though, not insist)

new(key).tap { |batcher| batcher.instance_eval(&block) }.processors

@zverok (Contributor) commented Mar 22, 2018

Well, let's try this. I mean, if all the examples and docs show it clearly enough -- I don't mind getting more functional here :) I reviewed the examples where the "mutating" post_process was used, and they probably can all be rewritten in a non-mutating style, probably with even more clarity:

# before
post_process { |e|
  e['coord'] = Geo::Coord.new(e['coord.lat'], e['coord.lon']) if e['coord.lat'] && e['coord.lon']
}
post_process('coord.lat') { nil }
post_process('coord.lon') { nil }

# after 
post_process { |response|
  next response unless response.key?('coord.lon') && response.key?('coord.lat')
  response.merge(
    'coord' => Geo::Coord.new(response.delete('coord.lat'), response.delete('coord.lon'))
  )
}

Then, probably the most readable API would be:

transform { |whole_response| should return next part }
transform(key) { |key value| should return new key value }
transform_array(key) do
  transform { ... }
end

About the last name: I'm still feeling a bit unsure about it, but neither transform_items, nor transform_each, nor transform_map seems to communicate "what it does" (what are "items"? "each" what? "map" to where?..). _array is dull yet straight.

I also thought about hm-style replacement:

transform('list', :*, 'sys.sunrise', &Time.method(:at))
transform('list', :*, 'weather.icon') { |i| "http://openweathermap.org/img/w/#{i}.png" }
...

...but on this "model" example (OpenWeatherMap) it doesn't bring any clarity at all (and "join coord.lon & coord.lat into coord" seems even more cumbersome).

Probably this part is subject to further investigations.

@joelvh (Contributor, Author) commented Mar 22, 2018

Here are some initial thoughts I've been ruminating on, though I have not yet thought through all use cases.

If we consider a hierarchy of how the response is processed, maybe we can do something that I saw in a ROM video demo (but can't seem to find an example in their docs).

There was an example of two ways to define mappings. The first was to use a block with the entity as an argument. The second was a block without an argument, where you could call transformation methods that implicitly applied to the entity. The first was for more custom mapping (e.g. mutations or whatever you want), returning a result. The second was for describing the transformations via the transformation DSL. This is where I see either taking from ROM or implementing some aspect of ROM.

Anyways, maybe the DSL can be something like this, going with the implicit version we've been using in TLAW already:

# Hook into transformation process, responsible for returning the full result
transform do
  # This would map over whole response, assuming the whole response
  # is an array or just processes the whole response as a single item
  map do # item in the array is implicit
    rename old_key: :new_key
    # change value
    map(key) { |value| ... new value ... }
    # could have an option to convert to array if it's a single item
    map(key, ensure_array: true) { |value| ... new value ... }

    # Many more transformation helpers

  end
end

# Other custom approach
transform do |response, transformer|
  # do stuff to the response, maybe create a new object to copy resulting data to
  result = response # do stuff

  # use transformer to take advantage of transform DSL
  transformer.for result do
    # this is the same context as `transform` without arguments
  end
end

I realize this example isn't clear about what gets included or excluded in the resulting output (e.g. if keys you don't transform get added or get skipped in output, etc).

Thoughts on this approach, which describes the data hierarchy more?

@joelvh (Contributor, Author) commented Mar 22, 2018

@zverok I added a second example with more customizability to the previous comment

@zverok (Contributor) commented Mar 28, 2018

Hey. Sorry for the late answer :( Mindful responding requires a lot of my attention, which I am currently short of. The thing is, I believe, that a good new or updated transformations DSL can hardly be "guessed". I committed some examples (I've had them for a long time in a gitignored folder) to examples/experimental, and together with the preexisting examples/*.rb they are some material to experiment with.

What I can currently say, probably: in fact, a "rich DSL for hash transformations" is probably out of the scope of TLAW. It is a road that can be walked very far, and nothing implies that "hash transformations" should be part of an "HTTP API wrapper" library. I started with an "opinionated" library which transformed everything, but we already discussed that it was probably a false move (unlike the endpoint definition API, which is flexible and doesn't assume anything about client code).

So, what I am thinking now, in light of this idea, is that probably this should be the only and ultimate interface:

endpoint ... do
   transform &anything
end

The only thing this part of the TLAW API defines is that you can specify some transformations -- just with any block-like object. Now, this object can be produced by a transproc-like library, or be some MyCustomTransformations class responding to to_proc, or just a big hairy proc doing everything.

For my own code, I am playing in my head with the idea of extending hm to produce proc objects; then it would allow doing roughly this (OpenWeatherMap example):

TRANSFORM = 
  Hm()
    .transform_values('weather', &:first)
    .transform_values('dt', %w[sys sunrise], %w[sys sunset], &Time.method(:at))
    .except('dt_txt') # ... and so on; returns #to_proc-able object

# for singular value:
transform &TRANSFORM
# for values in a list: transform each of list items
transform &Hm().transform_values(['list', :*], &TRANSFORM)

# I can imagine different ways of data control in this approach, like...
Hm()
  .expect('weather', Array) # fails if there is no 'weather' key, or pattern === value does not match

This way the next version of TLAW will lose those small nice opinionated "flattening" features, but I am almost sure that nobody has ever used or liked them, except my demo files :)

WDYT?..

@joelvh (Contributor, Author) commented Mar 29, 2018

@zverok now we're talkin'! :-)

Yes, this is what I had in mind -- a generic way to "hook" into aspects of a pipeline. Specifically with TLAW, the transform { ... } block is a hook that can be defined by any kind of transformation library. It should be as generic as possible, to allow for implementing anything.

My thought is that you could specify a thing that responds to call and the result is passed to it (transform MyTransformer).

Alternatively, you can "register" transformations into some sort of registry and refer to them by name (transform :my_transformer or transform(with: :hm) { transform_values('weather', &:first) }).

There could be a way to register a transformer that can be instance_exec'd so that it's the context of the transform hook. It could be an interface we define for the transformers that get defined or a wrapper or something. But basically, I like the idea of some sort of block that has a specific behavior so you can call methods within some specific context.

This is very much what some aspects of ROM do, and ROM or DRY could be some components we use behind the scenes.
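A hypothetical sketch of the registry idea above; all names and the exact transform hook signature are illustrative only:

module Transformers
  REGISTRY = {}

  # Register anything that responds to #call (a proc, an Hm/Transproc
  # pipeline, a custom object, ...) under a name.
  def self.register(name, callable = nil, &block)
    REGISTRY[name] = callable || block
  end

  def self.fetch(name)
    REGISTRY.fetch(name)
  end
end

Transformers.register(:first_weather) { |response| response.merge('weather' => Array(response['weather']).first) }

# Inside an endpoint definition, transform would then accept either a
# callable or a registered name:
#   transform Transformers.fetch(:first_weather)
#   transform :first_weather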

@joelvh (Contributor, Author) commented Sep 27, 2018

Hey @zverok, do we wanna revisit getting this merged?

@zverok (Contributor) commented Sep 30, 2018

Hey @joelvh, sorry... I feel bad about this, honestly. At some point I somehow got distracted from this discussion.
But from my perspective, "the ball is on your side" here (so, initially I thought you'd act somehow on the result of our discussion... and then I forgot to clarify that, sorry again).
The PR and discussion have definitely gotten too big to merge "as is". WDYT about this way:

  • first, merge just the small refactorings/cleanup (without the ResponseProcessor classes extraction) as a separate PR, to take them out of the way of discussing the essential redesign;
  • and then, try to focus on factoring out a singular simple transform method & optional Processors (flattener, data tabler) to see how they could be chained into the transform (rough sketch below).

?
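A rough sketch of that chaining idea, with stand-ins for the extracted optional processors (all names illustrative):

# Hypothetical processor steps; each responds to #call and returns a new hash
FLATTENER   = ->(response) { response }  # stand-in for the extracted flattener
DATA_TABLER = ->(response) { response }  # stand-in for the extracted data tabler
PIPELINE    = [FLATTENER, DATA_TABLER]

# ...chained into the single transform hook:
transform { |response| PIPELINE.reduce(response) { |acc, step| step.call(acc) } }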

@zverok mentioned this pull request Dec 21, 2018