Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Use Laravel's LazyCollections for collections of DataObjects #6867

Closed
NateWr opened this issue Mar 17, 2021 · 13 comments
Closed

Proposal: Use Laravel's LazyCollections for collections of DataObjects #6867

NateWr opened this issue Mar 17, 2021 · 13 comments
Labels
Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day.

Comments

@NateWr
Copy link
Member

NateWr commented Mar 17, 2021

Proposal

When we get a collection of objects from the database, we need to map these to output for the REST API based on the schema. At the moment, we use a getProperties method on the Service class to map the object, and any dependent objects, to the schema for the REST API (see example with Publications).

However, since each property may have different dependencies, it's difficult to track these dependencies and to manage them in a performant way -- especially when the mapping works on deeply nested objects. (See example).

I set out to find a good way to implement this mapping to achieve the following:

  1. Dependencies should be specified as function arguments in the mapping method.
  2. It should be possible to optimize the mapping when needed (eg - if we wanted to make an optimized SQL query to retrieve related data for the whole collection, instead of one at a time).
  3. An easy, intuitive way to implement alternative maps. We mostly map to our schema for the REST API. But it should be possible to use the same techniques to map a list of objects to a CSV file, for example.

After trying a lot of ways to overengineer a solution, I settled on a pretty simple approach that extends Laravel's Lazy Collections. This means that each collection of objects has all of the Collection methods available to it, and we can extend this with our own mapping functions.

First, each entity has its own Collection class which extends LazyCollection:

<?php
/**
 * @file classes/context/Collection.inc.php
 *
 * Copyright (c) 2014-2020 Simon Fraser University
 * Copyright (c) 2000-2020 John Willinsky
 * Distributed under the GNU GPL v3. For full terms see the file docs/COPYING.
 *
 * @class context
 *
 * @brief A class that represents a collection of contexts (journals, presses or preprint servers)
 */
namespace PKP\Context;

use Illuminate\Support\LazyCollection;

class Collection extends LazyCollection
{
	...
}

When the DAO returns a collection of objects, it returns an instance of this Collection class:

<?php
/**
 * @file classes/context/DAO.inc.php
 *
 * Copyright (c) 2014-2021 Simon Fraser University
 * Copyright (c) 2000-2021 John Willinsky
 * Distributed under the GNU GPL v3. For full terms see the file docs/COPYING.
 *
 * @class context
 *
 * @brief A class to interact with the contexts database.
 */
namespace PKP\Context;

use \PKP\Core\EntityDAOBase;

abstract class DAO extends EntityDAOBase
{
	...

	/**
	 * Get a collection of contexts matching the configured query
	 */
	public static function getMany(Collector $query) : Collection
	{
		$rows = $query
			->getQueryBuilder()
			->select(['c.*'])
			->get();

		return Collection::make($rows->map([static::class, 'fromRow']));
	}
}

At this point, we can use all of the Laravel Collection methods on our result.

collection-methods

In our \PKP\Contexts\Collection class, we add a new method to map the object to a schema for the REST API:

<?php
/**
 * @file classes/context/Collection.inc.php
 *
 * Copyright (c) 2014-2020 Simon Fraser University
 * Copyright (c) 2000-2020 John Willinsky
 * Distributed under the GNU GPL v3. For full terms see the file docs/COPYING.
 *
 * @class context
 *
 * @brief A class that represents a collection of contexts (journals, presses or preprint servers)
 */
namespace PKP\Context;

use \APP\Facade\Query;
use \HookRegistry;
use Illuminate\Support\Enumerable;
use Illuminate\Support\LazyCollection;
use \Request;
use \Services;

class Collection extends LazyCollection
{

	/**
	 * Map the contexts in this collection to an assoc array
	 * with all of the properties defined in the schema
	 *
	 * @param array $props List of schema properties to include
	 */
	public function mapToSchema(array $props, Request $request) : Enumerable
	{
		return $this->map(function($context) use ($props, $request) {
			$item = [];

			foreach ($props as $prop) {
				switch ($prop) {
					case '_href':
						$item[$prop] = Query::context()->getUrlApi($context, $request);
						break;
					case 'url':
						$item[$prop] = Query::context()->getUrl($context, $request);
						break;
					default:
						$item[$prop] = $context->getData($prop);
						break;
				}
			}

			$item = Services::get('schema')->addMissingMultilingualValues(SCHEMA_CONTEXT, $item, $context->getSupportedFormLocales());

			HookRegistry::call('Publication::collection::mapToSchema', [&$item, $props, $request]);

			ksort($item);

			return $item;
		});
	}
}

So now our method is discoverable like other collection methods, and all dependencies are declared

collection-map-to-schema

We can map to the schema:

$collection = Query::context()->getMany($collector);
$props = Services::get('schema')->getFullProps(SCHEMA_CONTEXT);
$items = $collection->mapToSchema($props, $request);

And when we need to, we can map only the summary properties in the schema:

$collection = Query::context()->getMany($collector);
$props = Services::get('schema')->getSummaryProps(SCHEMA_CONTEXT);
$items = $collection->mapToSchema($props, $request);

Because it's just a LazyCollection, it's easy for us to use Laravel's methods to write any kind of map that we want for one-off use-cases:

$contexts = Query::context()->getMany($collector);
$items = $contexts->map(function($context) {
	return [
		'id' => $context->getId(),
		'name' => $context->getLocalizedData('name'),
		'publishedSubmissions' => Services::get('submission')->getCount(['contextId' => $context->getId(), 'status' => STATUS_PUBLISHED]),
	];
});

And even to write to a CSV:

$contexts = Query::context()->getMany($collector);

$file = Config::getVar('files', 'files_dir') . '/test.csv';
$fp = fopen($file, 'w');
fputcsv($fp, ['id', 'name', 'urlPath']);

$contexts->each(function($context, $key) use ($fp) {
	fputcsv($fp, [
		$context->getId(),
		$context->getLocalizedData('name'),
		$context->getData('urlPath'),
	]);
});

Things get more complicated when we need to map an object that has many dependent objects, like a Submission. The number of dependencies expands, but the mapToSchema method is unique to each Collection, so all of the dependencies can be required as method arguments.

In addition, we need a mechanism that permits OJS/OMP/OPS to add their own mapping for properties that are specific to the app.

The examples below show how this could work with a Submission. First, the base Collection class:

<?php
/**
 * @file classes/submission/Collection.inc.php
 *
 * Copyright (c) 2014-2020 Simon Fraser University
 * Copyright (c) 2000-2020 John Willinsky
 * Distributed under the GNU GPL v3. For full terms see the file docs/COPYING.
 *
 * @class submission
 *
 * @brief A class that represents a collection of submissions
 */
namespace PKP\Submission;

use \APP\Facade\Query;
use \Exception;
use \HookRegistry;
use Illuminate\Support\Enumerable;
use Illuminate\Support\LazyCollection;
use \Request;
use \Services;

abstract class Collection extends LazyCollection
{

	/**
	 * Map the submissions in this collection to an assoc array
	 * with all of the properties defined in the schema
	 *
	 * This method can not be used to map submissions in more than one
	 * context. If you need to map submissions in more than one context,
	 * segment them into different collections and map each collection.
	 *
	 * @param array $props List of schema properties to include
	 * @param array $authorUserGroups A list of all UserGroups configured in the Context.
	 */
	public function mapToSchema(array $props, Request $request, Context $context, array $authorUserGroups) : Enumerable
	{
		\AppLocale::requireComponents(LOCALE_COMPONENT_APP_SUBMISSION, LOCALE_COMPONENT_PKP_SUBMISSION);

		return $this->map(function($submission) use ($props, $request, $context, $authorUserGroups) {
			$item = [];
			if ($submission->getData('contextId') !== $context->getId()) {
				throw new Exception('Submission ' . $submission->getId() . ' is not assigned to ' . $context->getId() . '.');
			}

			foreach ($props as $prop) {
				switch ($prop) {
					case '_href':
						$item[$prop] = Query::submission()->getUrlApi($submission->getId(), $context->getData('urlPath'), $request);
						break;
					case 'publications':
						$props = Services::get('schema')->getSummaryProps(SCHEMA_PUBLICATION);
						$item[$prop] = $submission->getData('publications')->mapToSchema($props, $submission, $context->getData('urlPath'), $authorUserGroups);
						break;
					case 'reviewAssignments':
						$item[$prop] = $this->getPropertyReviewAssignments($submission);
						break;
					case 'reviewRounds':
						$item[$prop] = $this->getPropertyReviewRounds($submission);
						break;
					case 'stages':
						$item[$prop] = $this->getPropertyStages($submission);
						break;
					case 'statusLabel':
						$item[$prop] = __($submission->getStatusKey());
						break;
					case 'urlAuthorWorkflow':
						$item[$prop] = Query::submission()->getUrlAuthorWorkflow($submission->getId(), $context->getData('urlPath'), $request);
						break;
					case 'urlEditorialWorkflow':
						$item[$prop] = Query::submission()->getUrlEditorialWorkflow($submission->getId(), $context->getData('urlPath'), $request);
					case 'urlWorkflow':
						$item[$prop] = Query::submission()->getWorkflowUrlByUserRoles($submission);
						break;
					default:
						$value = $this->_getAppSchemaProperty($prop, $submission, $request, $context->getData('urlPath'));
						if ($value) {
							$item[$prop] = $value;
						} else {
							$item[$prop] = $submission->getData($prop);
						}
						break;
				}
			}


			$item = Services::get('schema')->addMissingMultilingualValues(SCHEMA_SUBMISSION, $item, $context->getData('supportedSubmissionLocales'));

			HookRegistry::call('Submission::collection::mapToSchema', [&$item, $props, $request, $context, $authorUserGroups]);

			ksort($item);

			return $item;
		});
	}

	/**
	 * Add OJS-specific properties when mapping a submission to the schema
	 */
	abstract function _getAppSchemaProperty(array $prop, Submission $submission, Request $request, Context $context) : mixed;
}

Then the app-specific class which adds a property unique to OJS:

<?php
/**
 * @file classes/submission/Collection.inc.php
 *
 * Copyright (c) 2014-2020 Simon Fraser University
 * Copyright (c) 2000-2020 John Willinsky
 * Distributed under the GNU GPL v3. For full terms see the file docs/COPYING.
 *
 * @class submission
 *
 * @brief A class that represents a collection of submissions
 */
namespace APP\Submission;

use APP\Facade\Query;
use PKP\Context\Context;
use PKP\Submission\Collection as PKPSubmissionCollection;
use \Request;

class Collection extends PKPSubmissionCollection
{

	/**
	 * Add OJS-specific properties when mapping a submission to the schema
	 */
	protected function _getAppSchemaProperty(array $prop, Submission $submission, Request $request, Context $context) : mixed
	{
		switch ($prop) {
			case 'urlPublished':
				return Query::submission()->getUrlPublished($request, $context->getData('urlPath'), $submission->getBestId());
				break;
			default:
				return null;
		}
	}
}

The dependent objects, like publications, are already Collection instances, so we can run their mapping functions and all of the required dependencies are declared:

					case 'publications':
						$props = Services::get('schema')->getSummaryProps(SCHEMA_PUBLICATION);
						$item[$prop] = $submission->getData('publications')->mapToSchema($props, $submission, $context->getData('urlPath'), $authorUserGroups);
						break;

The same patterns are used then to map the Publication object and its dependent objects:

<?php
/**
 * @file classes/publication/Collection.inc.php
 *
 * Copyright (c) 2014-2020 Simon Fraser University
 * Copyright (c) 2000-2020 John Willinsky
 * Distributed under the GNU GPL v3. For full terms see the file docs/COPYING.
 *
 * @class publication
 *
 * @brief A class that represents a collection of publications
 */
namespace PKP\Publication;

use APP\Facade\Query;
use APP\Submission;
use \DAORegistry;
use \HookRegistry;
use PKP\Context\Context;
use Illuminate\Support\Enumerable;
use Illuminate\Support\LazyCollection;
use \Request;
use \Services;

abstract class Collection extends LazyCollection
{

	/**
	 * Map the publications in this collection to an assoc array
	 * with all of the properties defined in the schema
	 *
	 * @param array $props List of schema properties to include
	 * @param array $authorUserGroups All author UserGroups configured in the Context
	 */
	public function mapToSchema(array $props, Request $request, Submission $submission, Context $submissionContext, array $authorUserGroups) : Enumerable
	{
		// Users assigned as reviewers should not receive author details
		$isAnonymized = false;
		if (array_intersect(['authors', 'authorsString', 'authorsStringShort', 'galleys'], $props)) {
			$reviewAssignment = DAORegistry::getDAO('ReviewAssignmentDAO')
				->getLastReviewRoundReviewAssignmentByReviewer(
					$submission->getId(),
					$request->getUser()->getId()
				);
			$isAnonymized = !is_null($reviewAssignment) && $reviewAssignment->getReviewMethod() === SUBMISSION_REVIEW_METHOD_DOUBLEANONYMOUS;
		}

		return $this->map(function($publication) use ($props, $submission, $request, $submissionContext, $authorUserGroups, $isAnonymized) {
			$item = [];

			foreach ($props as $prop) {
				switch ($prop) {
					case '_href':
						$item[$prop] = Query::publication()->getUrlApi($request, $submissionContext->getData('urlPath'), $submission->getId(), $publication->getId());
						break;
					case 'authors':
						if ($isAnonymized) {
							$item[$prop] = [];
						} else {
							$props = Services::get('schema')->getSummaryProps(SCHEMA_AUTHOR);
							$item[$prop] = $publication->getData('authors')->mapToSchema($props, $request, $authorUserGroups);
						}
						break;
					case 'authorsString':
						$item[$prop] = '';
						if (!$isAnonymized) {
							$item[$prop] = $publication->getAuthorString($authorUserGroups);
						}
						break;
					case 'authorsStringShort':
						$item[$prop] = '';
						if (!$isAnonymized) {
							$item[$prop] = $publication->getShortAuthorString();
						}
						break;
					case 'citations':
						$citationDao = DAORegistry::getDAO('CitationDAO'); /* @var $citationDao CitationDAO */
						$item[$prop] = array_map(
							function($citation) {
								return $citation->getCitationWithLinks();
							},
							$citationDao->getByPublicationId($publication->getId())->toArray()
						);
						break;
					case 'fullTitle':
						$item[$prop] = $publication->getFullTitles();
						break;
					default:
						$value = $this->_getAppSchemaProperty($prop, $publication, $submission, $isAnonymized, $submissionContext, $request);
						if ($value) {
							$item[$prop] = $value;
						} else {
							$item[$prop] = $publication->getData($prop);
						}
						break;
				}
			}

			$item = Services::get('schema')->addMissingMultilingualValues(SCHEMA_PUBLICATION, $item, $submissionContext->getData('supportedLocales'));

			HookRegistry::call('Publication::collection::mapToSchema', [&$item, $props, $request, $submission, $submissionContext, $authorUserGroups]);

			ksort($item);

			return $item;
		});
	}

	/**
	 * Implement this function in a child class to add properties specific to
	 * an application, such as Galleys in OJS.
	 *
	 * This function should return `null` for props that are not specific to
	 * the application.
	 */
	abstract function _getAppSchemaProperty(array $prop, Publication $publication, Submission $submission, bool $isAnonymized, Request $request, string $contextUrlPath) : mixed;
}

Concerns

a. For this to work, the DAO method must convert the QueryBuilder's result Collection into a LazyCollection like this:

$rows = $query
	->getQueryBuilder()
	->select(['c.*'])
	->get();

return Collection::make($rows->map([static::class, 'fromRow']));

Will this be a performance problem on large result sets? Is there a way to use Generators here to prevent fromRow from getting called until it's used in Collection?

b. Are there any concerns with extending LazyCollection? Any chance that this will cause confusion for Laravel users or is this kind of extension of Laravel's core classes a normal thing?

@NateWr NateWr added the Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day. label Mar 17, 2021
@NateWr
Copy link
Member Author

NateWr commented Mar 17, 2021

Is there a way to use Generators here to prevent fromRow from getting called until it's used in Collection?

I think maybe this is how it could be done?

	/**
	 * Get a collection of contexts matching the configured query
	 */
	public static function getMany(Collector $query) : Collection
	{
		$rows = $query
			->getQueryBuilder()
			->select(['c.*'])
			->get();

		return Collection::make(function() use ($rows) {
			$rows->each(function($row) {
				yield static::fromRow($row);
			});
		});
	}

@asmecher
Copy link
Member

asmecher commented Apr 9, 2021

@NateWr, with apologies for the delay in getting to this -- it looks very good and I'm happy to be adopting LazyCollections.

One thought, though: implementing mappings directly in the collection class is going to lead to a double-standard where there are "first class" mappings (e.g. to/from the API) implemented in the collection class, and the rest, e.g. mappings added by import/export plugins.

Our current method of getting around that is to implement each mapping as a separate class -- as in the filter framework for the native import/export plugin. There are two problems with that approach:

  • The filter framework is inherently clumsy because of its database backing (this isn't an inherent problem in the approach, just in the current implementation)
  • This really multiplies the number of classes involved and it can be painful to maintain.

What do you think about an injection type of approach? Something like

$publicationCollection->addMapping(MAP_TO_SCHEMA, function($publication) {
  ...
});

Then mappings can be added by the application (e.g. mapping to/from the schema) or post-hoc (e.g. by a plugin) or even augmented by a plugin through a new (overriding) function that passes through to the old one internally.

Several of these mapping additions can be handled within the same chunk of code; no need to break them out to their own class, though it's still possible.

@NateWr
Copy link
Member Author

NateWr commented Apr 13, 2021

I can see the benefit of this for plugins to extend a particular mapping, but I'm worried about what we lose with the added layer of abstraction. A couple of concerns:

  1. With $collection->addMapping(TYPE, $callback), how would we define the dependencies that the map needs? This is a broader problem, in that the dependencies change based on which properties we are mapping. My approach above, to pass in the $props, won't be great for this either.
  2. When would a plugin be able to add a mapping? The plugin won't intervene every time a collection is instantiated, so a plugin would probably need some other provider that could register the mappings and make sure the right mapping extensions are called.

My initial instinct was to build a special mapping base class to support extensions through callbacks like this. But once I had done that and looked at making use of it, I realized that we don't really reuse maps very often. We map to JSON for our REST API, and we map to CSV once or twice. But a single mapping tool can't really handle both, because the JSON maps to multilingual object but the CSV collapses to a single language. Mappings seem to be kind of one-off things.

The only place we really have code overlap is between summary/full mappings of the schema and when mapping dependent objects requires fetching data. Putting this code into a Collection class would allow us to quickly pull duplicated code into protected helper functions like this:

namespace PKP\Submission;

class Collection extends LazyCollection
{

	public function mapToSchemaSummary(Request $request, Context $submissionContext) : Enumerable
	{
		return $this->map(function($submission) use ($request, $submissionContext) {
			return $this->_mapItemToSchemaSummary($request, $submissionContext);
		});
	}

	// Note: $authorUserGroups only needed for full schema mapping
	public function mapToSchema(Request $request, Context $submissionContext, array $authorUserGroups) : Enumerable
	{
		return $this->map(function($submission) use ($request, $submissionContext) {
			$item = $this->_mapItemToSchemaSummary($request, $submissionContext);

			$props = Services::get('schema')->getFullProps(SCHEMA_SUBMISSION);
			foreach ($props as $prop) {
				switch($prop) {
					case 'stageAssignments':
						$props[$prop] = [];
						$assignments = $this->_getStageAssignments($submission);
						foreach ($assignments as $assignment) {
							$props[$prop][] = [
								'id' => $assignment->getId(),
								'userId' => $assignment->getUserId(),
							];
						}
						break;

					...
				}
			}

			return $item;
		});
	}

	public function mapToCSV() : Enumerable
	{
		return $this-map(function($submission) {
			$assignments = $this->_getStageAssignments($submission);
			$row = [
				'id' => $submission->getId(),
			];
			$row['assignments'] = join(',', array_map(function($assignment) {
				return $assignment->getUserId();
			}, $assignments));
		});
	}

	protected function _mapItemToSchemaSummary(Request $request, Context $submissionContext) : Array
	{
		$item = [];
		$props = Services::get('schema')->getSummaryProps(SCHEMA_SUBMISSION);
		foreach ($props as $prop) {
			switch($prop) {
				case '_href':
					$props[$prop] = $request->url(...);
					break;
				...
			}
		}

		return $item;
	}

	protected function _getStageAssignments(Submission $submission) : Array
	{
		return DAORegistry::getDAO('StageAssignmentsDAO')->getBySubmissionId($submission->getId());
	}
}

This makes the mappings very simple, but we can share code however we need as well as optimize it when necessary. However, it doesn't yet have a good way for plugins to extend the mappings.

Ideally, the schema system should do this for us. Plugins can extend the schema to add properties to objects. This will cover the majority of cases. However, it presumes that a plugin is extending the schema to add a property that will be saved in the database. If someone wanted to, for example, attach data to the submission API endpoint that wasn't stored in the submission_settings table, that would be more complicated.

The challenge, which I don't have a good solution to yet, is how to let plugins hook in here. We can use a hook for each iteration, which would make it easy for plugins to attach something to it. But we need to add the hook to every map and it only permits the plugin to hook in during the loop. If a plugin is fetching data that isn't part of each object, that's probably not the most performant way to get that data. Ideally, the plugin would hook in at the collection level, fetching data for all of the collection's objects at once, rather than the per-item approach in the code above.

That makes me think that we probably want these hooks outside of the mapping itself. It may not be great for a very large collection, but it would be best for our most common use cases -- extending the API. If we are mapping large sets of objects, we probably want to split the collection into chunks anyway. And if I'm a plugin extending a mapping, I may want to extend a mapping in one instance (eg - export to CSV) and not another (eg - REST API response).

Not sure if all of that is clear at all, but that's where my thinking has gone as I've tried to work out the mapping stuff. To summarize, I think the key things I've come to feel are;

  1. There is not as much re-usable code in the mappings as I thought. It may not be worth building much abstraction here if we can get to the same thing with simple methods.
  2. Because dependencies change depending on what properties are being mapped, it's hard to define dependencies in an extensible map. It may make sense to build extensibility where the collection is mapped (eg - the REST API handler, the user export to CSV request handler, or in the future the import/export code).

@asmecher
Copy link
Member

...how would we define the dependencies that the map needs?

I'm not sure the best way, but we introduced a Deployment class for the native import/export plugin which describes overarching characteristics of the import/export process being performed -- for example the configuration, what items had already been imported (so they could be destroyed in case of failure), etc. That approach might be useful in defining the characteristics of the desired mapping, i.e. is it a deep or shallow mapping.

When would a plugin be able to add a mapping? The plugin won't intervene every time a collection is instantiated...

Sorry, this was confused by my example code above -- we'd probably need to introduce a collection factory in order to accomplish this. Then, a plugin (and the core code would do it the same way) could, for example, introduce a mapping from submission to JATS document...

class JatsMappingPlugin extends Plugin {
    const MAP_TO_JATS = 'jats';

    function register(...) {
        $submissionCollectionFactory = CollectionFactory::get('submission');

        $submissionCollectionFactory->addMapping(self::MAP_TO_JATS, function(Submission $submission, $existingMappingFunction) {
            assert($existingMappingFunction === null); // This should be the first-loaded submission to JATS mapping

            $dom = new DOMDocument();
            $dom->appendChild(...); // Populate a DOM to represent the submission
           return $dom;
        });
    }
}

Within the same class, we can also define mappings from $publication to JATS and register them the same way. Then the submission to JATS mapping could call on the publication to JATS mapping to decompose the operation in a way that can be both used internally and externally.

(The details of the factory etc. are very much up in the air -- just an example.)

When applying a mapping we would use a combination of a constant describing the target mapping (MAP_TO_JATS) and the type hinting on the transformation function's first parameter (Submission) to identify which function to call. When adding a mapping to an existing mapping (e.g. a second call to addMapping with MAP_TO_JATS), the new mapping gets passed the old mapping so that it can call it.

For example, a second plugin that needed to augment the mapping (and is registered after the first using the plugin loading priority tools) can similarly:

class AugmentExistingJatsMappingPlugin extends Plugin {
    function register(...) {
        $submissionCollectionFactory = CollectionFactory::get('submission');

        $submissionCollectionFactory->addMapping(JatsMappingPlugin::MAP_TO_JATS, function(Submission $submission, $existingMappingFunction) {
            // The $existingMappingFunction should now point to the function implemented in the first function. Use it.
            $dom = $existingMappingFunction($submission);

            $dom->appendChild(...); // Augment the existing DOM with whatever enrichment this plugin wants to add

           return $dom;
        });
    }
}

Some use cases for this:

  • Augmentation of existing mappings could be useful for:
    • Mappings defined in pkp-lib that need augmentation in the applications
    • Plugins that augment other mappings, e.g. a Coalition Publica augmentation to JATS, or a Driver augmentation to OAI
    • Plugins that augment the schema
  • Plugins that introduce new mappings (as a replacement for the filter framework, for example)
    • OAI formats
    • Import/export formats

@NateWr
Copy link
Member Author

NateWr commented Apr 14, 2021

Ok, I can see where you're going with this. It's similar to how a request handler passes the request/response through middleware. My two main concerns are performance and abstraction. I think the use-cases I've been focused on are for the REST API, where we really need to be able to optimize the queries involved in mapping a collection to a response. But in the use-cases you raise (JATS, OAI, import/export), performance is not as important as making the output as comprehensive as possible.

On the abstraction side, the big appeal of methods attached to a LazyCollection is that there's very little mental overhead. We can be as simple as we want for simple objects (announcements) and grow in complexity with complex ones (submissions). But I think you've made the case that -- at least for submissions -- we're going to need robust support for plugin extensibility and a lot of different maps (I'd add Crossref/Datacite and CSV to the use-cases you listed).

Let me play around with this a bit more. At the moment I'm leaning towards having a class for each map type (JSON, CSV, JATS), but let me see how that works out.

@NateWr
Copy link
Member Author

NateWr commented Apr 15, 2021

@asmecher I think I've got a good approach now based on your last post.

A global facade/factory is used to instantiate maps (see here).

$map = Map::announcementToSchema();
$items = $map->map($announcements, $context, $request);

A single map object can be extended for one-off use-cases:

$map = Map::announcementToSchema();
$map->extend(function(array $output, Announcement $item, PKP\Announcement\Maps\Schema $map) {
    $output['example'] = 'Example';
    return $output;
  })
  ->map($announcements, $context, $request);

Or a plugin can register an extension on the Map facade/factory:

Map::extend(PKP\Announcement\Maps\Schema::class, function function(array $output, Announcement $item, PKP\Announcement\Maps\Schema $map) {
  $output['example'] = 'Example';
  return $output;
});

// Extension will be applied to every map instantiated through the facade/factory
$map = Map::announcementToSchema();

Each mapping is its own class. To test out this approach, I wrote examples to map announcements to the Schema and OAI (I know announcements aren't in OAI, just an example to check if XML construction would work).

One potential downside is, because we rely on the map facade to apply extensions, a coder could accidentally use the map directly and miss out on the extensions. We'll have to be careful to reinforce usage of the facade to instantiate a map.

Does this look like a good way forward? Should we call this a Factory instead of a Facade? I'll be honest I don't have a really clear idea of what a Facade is, other than a class that provides a simple API to access something.

@asmecher
Copy link
Member

Looking good, @NateWr -- will this require us to implement a class for each combination of mapping and entity to be mapped? If so, what do you think about setting up mappings programatically via lambda functions rather than implementing them in classes?

That way a coder can choose to implement an entire (probably simple) set of mappings in a single class -- think e.g. of something that generates ATOM entries for a news feed -- rather than requiring an entire class for each. Or they can use a class if they like, setting it up using member functions.

Not wedded at all to this approach -- just trying to think of ways to avoid a huge set of classes to implement a relatively simple transformation (the native import/export plugin is a good example of a transformation being hard to maintain with so many classes).

I'm also wondering if it's worth reducing the reliance on static calls (Map::) with functions named for the transformation they implement (announcementToSchema); unless we use function call magic (__call) it would be hard to keep these extensible. (Unless I'm misunderstanding the approach you're taking -- I don't see announcementToSchema implemented on the Map class, so maybe you are intending to use magic calls?)

@NateWr
Copy link
Member Author

NateWr commented Apr 19, 2021

will this require us to implement a class for each combination of mapping and entity to be mapped?

Yeah, it will.

If so, what do you think about setting up mappings programatically via lambda functions rather than implementing them in classes?

How would we define the relationship between a map and its dependencies with lambda functions? The main benefit of defining classes and methods like this is that when using them, an IDE can identify what mapping methods are available and what dependencies are expected, with useful type hinting like this:

autocomplete-maps

And because the map is a class, we can pass it back to any plugin which is extending the map, so they have access to the dependencies too. (For example, this is required to get the root XML document when extending an XML map).

If there's a way to do all this with the lambda functions, I'd be happy to explore that.

just trying to think of ways to avoid a huge set of classes to implement a relatively simple transformation

I think this is kind of The Way™ with modern PHP these days. It is jarring at first, but with namespacing and autoload, the classes become a browse-able tree structure that makes the approach more manageable.

You're right, though, that there will be a lot of small entities with very simple maps. I think we will find our way to a base map class, something like the SchemaDAO, which handles the basic mappings on its own.

I'm also wondering if it's worth reducing the reliance on static calls (Map::) with functions named for the transformation they implement (announcementToSchema)

This mimics the approach I've taken with the Query/Command facades. These facades abstract away the ojs/pkp-lib differences, so we can make one call, Map::submissionToSchema(), and get the appropriate class regardless of the application. We won't have use PKP\Submissions\Query; statements strewn through the codebase that need to be updated to APP\Submissions\Query whenever the class diverges between apps.

The other thing they do is provide some useful discovery in an IDE for new developers. I can discover what maps are available to me by exploring the facade:

autocomplete-all-maps

I'm definitely open to alternatives here. We could shift the factory down into subclasses, so it is like SubmissionMap::toSchema(). Or Map::submission()->toSchema(). But I think we'll just be multiplying the classes here without much benefit. The facade itself is pretty clean and so I'd hope it would be pretty easy to maintain.

(Unless I'm misunderstanding the approach you're taking -- I don't see announcementToSchema implemented on the Map class, so maybe you are intending to use magic calls?)

No, so a method like Map::announcementToSchema() only returns the class which can perform this mapping. That class can itself have multiple mappings. For example, each schema map is likely to have a summary/full map:

$summaries = Map::announcementToSchema()->summary($collection, ...);
$fullRecords = Map::announcementToSchema()->map($collection, ...);

The global Map factory/facade is just a convenient way to instantiate a map for the correct application with the correct extensions.

@jonasraoni
Copy link
Contributor

I didn't read this thoroughly yet, so I just wanted to leave the link of a library that I use in C#, perhaps there's something useful in the documentation if you're still looking for ideas: https://docs.automapper.org

@NateWr
Copy link
Member Author

NateWr commented Apr 21, 2021

Thanks @jonasraoni. It looks like there is a PHP library, automapper-plus, based on the C# library.

At first glance, my main hesitation would be stepping away from the JSON schema to define entities. The big benefit we get from this is solid documentation of the REST API out-of-the-box. It may be possible to generate this from the Dto classes instead, but I'm not immediately sure how that would work.

The other thing I didn't see in the docs is how to map to something that isn't clearly represented by a PHP object. With JSON it's pretty easy to run json_encode() on a flat PHP object. But what about mapping to XML or CSV? Do you know how that would work?

@asmecher
Copy link
Member

I think we can borrow some of the semantics of the automapper-plus library without necessarily using it directly -- the conventions for configuring mappings are essentially a better-fleshed-out version of what Nate and I were talking about above, with less dependency on static functions (which are hard to mock/test/extend). We won't necessarily have PHP objects with data represented by properties -- I suppose we could use e.g. __get magic functions to make things transparent but if we're working towards using Laravel conventions I'd want to make sure this would be a compatible approach.

@NateWr
Copy link
Member Author

NateWr commented Apr 27, 2021

the conventions for configuring mappings are essentially a better-fleshed-out version of what Nate and I were talking about above, with less dependency on static functions (which are hard to mock/test/extend)

I can't see how to use this convention while still being able to identify dependencies for the map. If we are able to do something like:

$mapper->map($submission, SubmissionSchemaDto::class);

Where would we identify that the mapper requires the Request object to map the URLs and the context's UserGroups to render the author string?

Another issue is that automapper works seems to put the bulk of the "work" into the config object itself:

$config
    ->registerMapping(Employee::class, EmployeeDto::class)
    ->forMember('age', function (Employee $source) {
        return date('Y') - $source->getBirthYear();
    })

The forMember method is where all of the work is. So the Dto classes don't seem to contain much at all except a list of properties. But where would this configuration code live? If it goes into a factory class, we're going to have all of our mapping code live separately from the objects they are mapping.

And it's usually in this part of the map where we will want to be able to reduce code re-use and things. Putting it into anonymous functions may make that difficult.

We won't necessarily have PHP objects with data represented by properties -- I suppose we could use e.g. __get magic functions to make things transparent but if we're working towards using Laravel conventions I'd want to make sure this would be a compatible approach.

I think this is kind of the impasse we're at. Right now we are pretty committed to using the JSON schema files to define our entities, validate them, document them, and read/write to the database. But most PHP frameworks, including Laravel, expect the properties to be defined on the entity classes.

We can try to move in that direction, but we'll need to figure out how we allow plugins to extend entity properties, how we define the validation requirements, and how we document the entity properties. The JSON schema files give us out-of-the-box documentation for our REST API, and the documentation that's generated includes properties added by plugins. This doubles as documentation for the entities in our system. But I think that any third-party library or convention for mapping is not going to expect this.

@NateWr
Copy link
Member Author

NateWr commented Aug 10, 2021

This pattern has been adopted and merged for some entities, with other entity conversions filed and in progress. Closing.

@NateWr NateWr closed this as completed Aug 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day.
Projects
None yet
Development

No branches or pull requests

3 participants