New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Use Laravel's LazyCollections for collections of DataObjects #6867
Comments
I think maybe this is how it could be done? /**
* Get a collection of contexts matching the configured query
*/
public static function getMany(Collector $query) : Collection
{
$rows = $query
->getQueryBuilder()
->select(['c.*'])
->get();
return Collection::make(function() use ($rows) {
$rows->each(function($row) {
yield static::fromRow($row);
});
});
} |
@NateWr, with apologies for the delay in getting to this -- it looks very good and I'm happy to be adopting LazyCollections. One thought, though: implementing mappings directly in the collection class is going to lead to a double-standard where there are "first class" mappings (e.g. to/from the API) implemented in the collection class, and the rest, e.g. mappings added by import/export plugins. Our current method of getting around that is to implement each mapping as a separate class -- as in the filter framework for the native import/export plugin. There are two problems with that approach:
What do you think about an injection type of approach? Something like $publicationCollection->addMapping(MAP_TO_SCHEMA, function($publication) {
...
}); Then mappings can be added by the application (e.g. mapping to/from the schema) or post-hoc (e.g. by a plugin) or even augmented by a plugin through a new (overriding) function that passes through to the old one internally. Several of these mapping additions can be handled within the same chunk of code; no need to break them out to their own class, though it's still possible. |
I can see the benefit of this for plugins to extend a particular mapping, but I'm worried about what we lose with the added layer of abstraction. A couple of concerns:
My initial instinct was to build a special mapping base class to support extensions through callbacks like this. But once I had done that and looked at making use of it, I realized that we don't really reuse maps very often. We map to JSON for our REST API, and we map to CSV once or twice. But a single mapping tool can't really handle both, because the JSON maps to multilingual object but the CSV collapses to a single language. Mappings seem to be kind of one-off things. The only place we really have code overlap is between summary/full mappings of the schema and when mapping dependent objects requires fetching data. Putting this code into a namespace PKP\Submission;
class Collection extends LazyCollection
{
public function mapToSchemaSummary(Request $request, Context $submissionContext) : Enumerable
{
return $this->map(function($submission) use ($request, $submissionContext) {
return $this->_mapItemToSchemaSummary($request, $submissionContext);
});
}
// Note: $authorUserGroups only needed for full schema mapping
public function mapToSchema(Request $request, Context $submissionContext, array $authorUserGroups) : Enumerable
{
return $this->map(function($submission) use ($request, $submissionContext) {
$item = $this->_mapItemToSchemaSummary($request, $submissionContext);
$props = Services::get('schema')->getFullProps(SCHEMA_SUBMISSION);
foreach ($props as $prop) {
switch($prop) {
case 'stageAssignments':
$props[$prop] = [];
$assignments = $this->_getStageAssignments($submission);
foreach ($assignments as $assignment) {
$props[$prop][] = [
'id' => $assignment->getId(),
'userId' => $assignment->getUserId(),
];
}
break;
...
}
}
return $item;
});
}
public function mapToCSV() : Enumerable
{
return $this-map(function($submission) {
$assignments = $this->_getStageAssignments($submission);
$row = [
'id' => $submission->getId(),
];
$row['assignments'] = join(',', array_map(function($assignment) {
return $assignment->getUserId();
}, $assignments));
});
}
protected function _mapItemToSchemaSummary(Request $request, Context $submissionContext) : Array
{
$item = [];
$props = Services::get('schema')->getSummaryProps(SCHEMA_SUBMISSION);
foreach ($props as $prop) {
switch($prop) {
case '_href':
$props[$prop] = $request->url(...);
break;
...
}
}
return $item;
}
protected function _getStageAssignments(Submission $submission) : Array
{
return DAORegistry::getDAO('StageAssignmentsDAO')->getBySubmissionId($submission->getId());
}
} This makes the mappings very simple, but we can share code however we need as well as optimize it when necessary. However, it doesn't yet have a good way for plugins to extend the mappings. Ideally, the schema system should do this for us. Plugins can extend the schema to add properties to objects. This will cover the majority of cases. However, it presumes that a plugin is extending the schema to add a property that will be saved in the database. If someone wanted to, for example, attach data to the submission API endpoint that wasn't stored in the The challenge, which I don't have a good solution to yet, is how to let plugins hook in here. We can use a hook for each iteration, which would make it easy for plugins to attach something to it. But we need to add the hook to every map and it only permits the plugin to hook in during the loop. If a plugin is fetching data that isn't part of each object, that's probably not the most performant way to get that data. Ideally, the plugin would hook in at the collection level, fetching data for all of the collection's objects at once, rather than the per-item approach in the code above. That makes me think that we probably want these hooks outside of the mapping itself. It may not be great for a very large collection, but it would be best for our most common use cases -- extending the API. If we are mapping large sets of objects, we probably want to split the collection into chunks anyway. And if I'm a plugin extending a mapping, I may want to extend a mapping in one instance (eg - export to CSV) and not another (eg - REST API response). Not sure if all of that is clear at all, but that's where my thinking has gone as I've tried to work out the mapping stuff. To summarize, I think the key things I've come to feel are;
|
I'm not sure the best way, but we introduced a
Sorry, this was confused by my example code above -- we'd probably need to introduce a collection factory in order to accomplish this. Then, a plugin (and the core code would do it the same way) could, for example, introduce a mapping from submission to JATS document... class JatsMappingPlugin extends Plugin {
const MAP_TO_JATS = 'jats';
function register(...) {
$submissionCollectionFactory = CollectionFactory::get('submission');
$submissionCollectionFactory->addMapping(self::MAP_TO_JATS, function(Submission $submission, $existingMappingFunction) {
assert($existingMappingFunction === null); // This should be the first-loaded submission to JATS mapping
$dom = new DOMDocument();
$dom->appendChild(...); // Populate a DOM to represent the submission
return $dom;
});
}
} Within the same class, we can also define mappings from (The details of the factory etc. are very much up in the air -- just an example.) When applying a mapping we would use a combination of a constant describing the target mapping ( For example, a second plugin that needed to augment the mapping (and is registered after the first using the plugin loading priority tools) can similarly: class AugmentExistingJatsMappingPlugin extends Plugin {
function register(...) {
$submissionCollectionFactory = CollectionFactory::get('submission');
$submissionCollectionFactory->addMapping(JatsMappingPlugin::MAP_TO_JATS, function(Submission $submission, $existingMappingFunction) {
// The $existingMappingFunction should now point to the function implemented in the first function. Use it.
$dom = $existingMappingFunction($submission);
$dom->appendChild(...); // Augment the existing DOM with whatever enrichment this plugin wants to add
return $dom;
});
}
} Some use cases for this:
|
Ok, I can see where you're going with this. It's similar to how a request handler passes the request/response through middleware. My two main concerns are performance and abstraction. I think the use-cases I've been focused on are for the REST API, where we really need to be able to optimize the queries involved in mapping a collection to a response. But in the use-cases you raise (JATS, OAI, import/export), performance is not as important as making the output as comprehensive as possible. On the abstraction side, the big appeal of methods attached to a Let me play around with this a bit more. At the moment I'm leaning towards having a class for each map type (JSON, CSV, JATS), but let me see how that works out. |
@asmecher I think I've got a good approach now based on your last post. A global facade/factory is used to instantiate maps (see here). $map = Map::announcementToSchema();
$items = $map->map($announcements, $context, $request); A single map object can be extended for one-off use-cases: $map = Map::announcementToSchema();
$map->extend(function(array $output, Announcement $item, PKP\Announcement\Maps\Schema $map) {
$output['example'] = 'Example';
return $output;
})
->map($announcements, $context, $request); Or a plugin can register an extension on the Map::extend(PKP\Announcement\Maps\Schema::class, function function(array $output, Announcement $item, PKP\Announcement\Maps\Schema $map) {
$output['example'] = 'Example';
return $output;
});
// Extension will be applied to every map instantiated through the facade/factory
$map = Map::announcementToSchema(); Each mapping is its own class. To test out this approach, I wrote examples to map announcements to the
One potential downside is, because we rely on the map facade to apply extensions, a coder could accidentally use the map directly and miss out on the extensions. We'll have to be careful to reinforce usage of the facade to instantiate a map. Does this look like a good way forward? Should we call this a Factory instead of a Facade? I'll be honest I don't have a really clear idea of what a Facade is, other than a class that provides a simple API to access something. |
Looking good, @NateWr -- will this require us to implement a class for each combination of mapping and entity to be mapped? If so, what do you think about setting up mappings programatically via lambda functions rather than implementing them in classes? That way a coder can choose to implement an entire (probably simple) set of mappings in a single class -- think e.g. of something that generates ATOM entries for a news feed -- rather than requiring an entire class for each. Or they can use a class if they like, setting it up using member functions. Not wedded at all to this approach -- just trying to think of ways to avoid a huge set of classes to implement a relatively simple transformation (the native import/export plugin is a good example of a transformation being hard to maintain with so many classes). I'm also wondering if it's worth reducing the reliance on static calls ( |
Yeah, it will.
How would we define the relationship between a map and its dependencies with lambda functions? The main benefit of defining classes and methods like this is that when using them, an IDE can identify what mapping methods are available and what dependencies are expected, with useful type hinting like this: And because the map is a class, we can pass it back to any plugin which is extending the map, so they have access to the dependencies too. (For example, this is required to get the root XML document when extending an XML map). If there's a way to do all this with the lambda functions, I'd be happy to explore that.
I think this is kind of The Way™ with modern PHP these days. It is jarring at first, but with namespacing and autoload, the classes become a browse-able tree structure that makes the approach more manageable. You're right, though, that there will be a lot of small entities with very simple maps. I think we will find our way to a base map class, something like the
This mimics the approach I've taken with the The other thing they do is provide some useful discovery in an IDE for new developers. I can discover what maps are available to me by exploring the facade: I'm definitely open to alternatives here. We could shift the factory down into subclasses, so it is like
No, so a method like $summaries = Map::announcementToSchema()->summary($collection, ...);
$fullRecords = Map::announcementToSchema()->map($collection, ...); The global |
I didn't read this thoroughly yet, so I just wanted to leave the link of a library that I use in C#, perhaps there's something useful in the documentation if you're still looking for ideas: https://docs.automapper.org |
Thanks @jonasraoni. It looks like there is a PHP library, automapper-plus, based on the C# library. At first glance, my main hesitation would be stepping away from the JSON schema to define entities. The big benefit we get from this is solid documentation of the REST API out-of-the-box. It may be possible to generate this from the The other thing I didn't see in the docs is how to map to something that isn't clearly represented by a PHP object. With JSON it's pretty easy to run |
I think we can borrow some of the semantics of the automapper-plus library without necessarily using it directly -- the conventions for configuring mappings are essentially a better-fleshed-out version of what Nate and I were talking about above, with less dependency on static functions (which are hard to mock/test/extend). We won't necessarily have PHP objects with data represented by properties -- I suppose we could use e.g. |
I can't see how to use this convention while still being able to identify dependencies for the map. If we are able to do something like: $mapper->map($submission, SubmissionSchemaDto::class); Where would we identify that the mapper requires the Another issue is that automapper works seems to put the bulk of the "work" into the config object itself: $config
->registerMapping(Employee::class, EmployeeDto::class)
->forMember('age', function (Employee $source) {
return date('Y') - $source->getBirthYear();
}) The And it's usually in this part of the map where we will want to be able to reduce code re-use and things. Putting it into anonymous functions may make that difficult.
I think this is kind of the impasse we're at. Right now we are pretty committed to using the JSON schema files to define our entities, validate them, document them, and read/write to the database. But most PHP frameworks, including Laravel, expect the properties to be defined on the entity classes. We can try to move in that direction, but we'll need to figure out how we allow plugins to extend entity properties, how we define the validation requirements, and how we document the entity properties. The JSON schema files give us out-of-the-box documentation for our REST API, and the documentation that's generated includes properties added by plugins. This doubles as documentation for the entities in our system. But I think that any third-party library or convention for mapping is not going to expect this. |
This pattern has been adopted and merged for some entities, with other entity conversions filed and in progress. Closing. |
Proposal
When we get a collection of objects from the database, we need to map these to output for the REST API based on the schema. At the moment, we use a
getProperties
method on the Service class to map the object, and any dependent objects, to the schema for the REST API (see example with Publications).However, since each property may have different dependencies, it's difficult to track these dependencies and to manage them in a performant way -- especially when the mapping works on deeply nested objects. (See example).
I set out to find a good way to implement this mapping to achieve the following:
After trying a lot of ways to overengineer a solution, I settled on a pretty simple approach that extends Laravel's Lazy Collections. This means that each collection of objects has all of the Collection methods available to it, and we can extend this with our own mapping functions.
First, each entity has its own
Collection
class which extendsLazyCollection
:When the DAO returns a collection of objects, it returns an instance of this
Collection
class:At this point, we can use all of the Laravel Collection methods on our result.
In our
\PKP\Contexts\Collection
class, we add a new method to map the object to a schema for the REST API:So now our method is discoverable like other collection methods, and all dependencies are declared
We can map to the schema:
And when we need to, we can map only the summary properties in the schema:
Because it's just a LazyCollection, it's easy for us to use Laravel's methods to write any kind of map that we want for one-off use-cases:
And even to write to a CSV:
Things get more complicated when we need to map an object that has many dependent objects, like a Submission. The number of dependencies expands, but the
mapToSchema
method is unique to eachCollection
, so all of the dependencies can be required as method arguments.In addition, we need a mechanism that permits OJS/OMP/OPS to add their own mapping for properties that are specific to the app.
The examples below show how this could work with a
Submission
. First, the baseCollection
class:Then the app-specific class which adds a property unique to OJS:
The dependent objects, like publications, are already
Collection
instances, so we can run their mapping functions and all of the required dependencies are declared:The same patterns are used then to map the
Publication
object and its dependent objects:Concerns
a. For this to work, the
DAO
method must convert the QueryBuilder's resultCollection
into aLazyCollection
like this:Will this be a performance problem on large result sets? Is there a way to use Generators here to prevent
fromRow
from getting called until it's used inCollection
?b. Are there any concerns with extending
LazyCollection
? Any chance that this will cause confusion for Laravel users or is this kind of extension of Laravel's core classes a normal thing?The text was updated successfully, but these errors were encountered: