feat(gatsby-source-wordpress): add normalizers option to modify normalizers #18079

Conversation

Contributor

@paweljedrzejczyk paweljedrzejczyk commented Oct 3, 2019

Description

Adds a normalizers option that allows adding and removing normalizers.

Example use

This makes it possible to add a normalizer that, for example, filters out media library items that don't meet some criteria (e.g. are not attached to any page) before they are processed or downloaded.

const dropUnusedMediaNormalizer = {
  name: "dropUnusedMediaNormalizer",
  normalizer: function({ entities }) {
    return entities.filter(e => {
      if (e.__type === "wordpress__wp_media" && !e.post) {
        return false
      }
      return true
    })
  },
}
module.exports = {
  plugins: [
    {
      resolve: "gatsby-source-wordpress",
      options: {
        // ...
        normalizers: normalizers => [dropUnusedMediaNormalizer, ...normalizers],
      },
    },
  ],
}
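
To see what this normalizer does in isolation, here is a minimal sketch that runs it against a few made-up entities. The sample objects are assumptions for illustration only: real entities carry many more fields, and `post` is assumed here to hold the ID of the post a media item is attached to.

```javascript
// Same normalizer as in the example above.
const dropUnusedMediaNormalizer = {
  name: "dropUnusedMediaNormalizer",
  normalizer: function({ entities }) {
    return entities.filter(e => {
      if (e.__type === "wordpress__wp_media" && !e.post) {
        return false
      }
      return true
    })
  },
}

// Hypothetical sample data, not real plugin output.
const sampleEntities = [
  { __type: "wordpress__PAGE", id: 1 },
  { __type: "wordpress__wp_media", id: 2, post: 1 }, // attached to page 1
  { __type: "wordpress__wp_media", id: 3, post: null }, // unattached
]

const kept = dropUnusedMediaNormalizer.normalizer({ entities: sampleEntities })
console.log(kept.map(e => e.id)) // [ 1, 2 ]
```

Only the unattached media item is dropped; every non-media entity passes through untouched.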

@paweljedrzejczyk paweljedrzejczyk requested review from a team as code owners October 3, 2019 14:37
@wardpeet
Contributor

wardpeet commented Oct 9, 2019

Can you explain your use case for this? Is it just to speed up things?

@paweljedrzejczyk
Contributor Author

@wardpeet We have a WordPress site with a large media library due to many custom post types, and the same WordPress instance is also used to generate a site with Gatsby. We don't need any of the media files attached to custom post types in Gatsby, just those attached to pages in ACF.

It greatly speeds up the build time, since in our case it doesn't need to download 2000 extra images every time.

The default normalizer option that could be used for this is triggered only after all media is downloaded. This is the same kind of normalizer function, but triggered just after fetching the entities, before any Gatsby processing.

@wardpeet
Contributor

I don't mind this change as I think it's an easy way to filter out entities before they are being fetched. I would suggest naming the option: filterEntities or something similar, normalizer doesn't seem right to me.

let me ping our wordpress folks @jasonbahl @TylerBarnes

@wardpeet wardpeet added the "status: awaiting reviewer response" label (a pull request that is currently awaiting a reviewer's response) Oct 14, 2019
@wardpeet wardpeet changed the title from "[gatsby-source-wordpress] add after fetch normalizer" to "feat(gatsby-source-wordpress): add after fetch normalizer" Oct 14, 2019
@TylerBarnes
Contributor

This is something I've wanted in this plugin before, it's definitely a useful feature. Thanks @paweljedrzejczyk !
I think calling it a normalizer still makes sense since it runs before all the other normalizers and could be used either to filter or normalize entities.
The naming is a bit clunky but I can't think of anything better. Maybe initialNormalizer? Not sure.

@pieh I believe you've worked on this plugin the most, it would be good to hear your thoughts.

@pieh
Contributor

pieh commented Oct 15, 2019

This sets a bit of a dangerous precedent: adding a separate config field that runs a function similar to one that already exists (normalizer), just before all the built-in normalizer functions.

I don't oppose adding some way to do this, because there are definitely use cases for it as described in #18079 (comment) .

But from an API design perspective this is troublesome: what if someone has a use case for adding something similar in the middle of the normalizer chain we have in the plugin (

// Normalize data & create nodes
// Create a fake wordpressId for elements that don't have any in the database
entities = normalize.generateFakeWordpressId(entities)
// Remove ACF key if it's not an object, combine ACF Options
entities = normalize.normalizeACF(entities)
// Combine ACF Option Data entities into one but split by IDs + options
entities = normalize.combineACF(entities)
// Creates entities from object collections of entities
entities = normalize.normalizeEntities(entities)
// Standardizes ids & cleans keys
entities = normalize.standardizeKeys(entities)
// Converts to use only GMT dates
entities = normalize.standardizeDates(entities)
// Lifts all "rendered" fields to top-level.
entities = normalize.liftRenderedField(entities)
// Exclude entities of unknown shape
entities = normalize.excludeUnknownEntities(entities)
// Creates Gatsby IDs for each entity
entities = normalize.createGatsbyIds(createNodeId, entities, _siteURL)
// Creates links between authors and user entities
entities = normalize.mapAuthorsToUsers(entities)
// Creates links between posts and tags/categories.
entities = normalize.mapPostsToTagsCategories(entities)
// Creates links between tags/categories and taxonomies.
entities = normalize.mapTagsCategoriesToTaxonomies(entities)
// Normalize menu items
entities = normalize.normalizeMenuItems(entities)
// Creates links from entities to media nodes
entities = normalize.mapEntitiesToMedia(entities)
// Downloads media files and removes "sizes" data as useless in Gatsby context.
entities = await normalize.downloadMediaFiles({
  entities,
  store,
  cache,
  createNode,
  createNodeId,
  touchNode,
  getNode,
  _auth,
  reporter,
  keepMediaSizes,
})
// Creates links between elements and parent element.
entities = normalize.mapElementsToParent(entities)
// Search and replace Content Urls
entities = normalize.searchReplaceContentUrls({
  entities,
  searchAndReplaceContentUrls,
})
entities = normalize.mapPolylangTranslations(entities)
entities = normalize.createUrlPathsFromLinks(entities)
// apply custom normalizer
if (typeof _normalizer === `function`) {
  entities = _normalizer({
    entities,
    store,
    cache,
    createNode,
    createNodeId,
    touchNode,
    getNode,
    typePrefix,
    refactoredEntityTypes,
    baseUrl,
    protocol,
    _siteURL,
    hostingWPCOM,
    useACF,
    acfOptionPageIds,
    auth,
    verboseOutput,
    perPage,
    searchAndReplaceContentUrls,
    concurrentRequests,
    excludedRoutes,
    keepMediaSizes,
  })
}
) to workaround something else?

I feel like we should have a way to declare the position where we want to insert the normalizer, maybe similar to the priority you can declare in WordPress action/filter setup (?). Honestly, I don't like using arbitrary priority numbers in user code very much, so maybe something that could work like this:

// get internal normalizers identifier so you could order normalizer function execution relative to those
const { NormalizerSteps } = require(`gatsby-source-wordpress`)

module.exports = {
  plugins: [
    {
      resolve: `gatsby-source-wordpress`,
      options: {
        // [...] regular options
        // overload normalizer option (make sure current usage continues to work)
        normalizer: [
          {
            // declare that this should run before this step
            // for completeness, probably should support both before and after
            before: NormalizerSteps.mapEntitiesToMedia,
            cb: entities => {
              // do something with entities
              return entities
            }
          }
        ]
      }
    }
  ]
}

This is just an idea (probably not a very good one). It is open for discussion.
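
For illustration, here is a rough sketch of how a `before`/`after` declaration could be resolved against an ordered list of named internal steps. Everything in it is hypothetical: `internalSteps`, `buildChain`, and the `run` property are made-up names, and the plugin does not export `NormalizerSteps`.

```javascript
// Hypothetical: a short ordered list standing in for the plugin's internal steps.
const internalSteps = [
  { name: "standardizeKeys", run: entities => entities },
  { name: "mapEntitiesToMedia", run: entities => entities },
  { name: "downloadMediaFiles", run: entities => entities },
]

// Merge user normalizers declared with `before` / `after` into the internal chain.
function buildChain(steps, userNormalizers) {
  const chain = []
  for (const step of steps) {
    // User callbacks anchored before this step, in declaration order.
    userNormalizers
      .filter(n => n.before === step.name)
      .forEach(n => chain.push({ name: `before:${step.name}`, run: n.cb }))
    chain.push(step)
    // User callbacks anchored after this step, in declaration order.
    userNormalizers
      .filter(n => n.after === step.name)
      .forEach(n => chain.push({ name: `after:${step.name}`, run: n.cb }))
  }
  return chain
}

const chain = buildChain(internalSteps, [
  { before: "mapEntitiesToMedia", cb: entities => entities },
])
console.log(chain.map(s => s.name))
// [ 'standardizeKeys', 'before:mapEntitiesToMedia', 'mapEntitiesToMedia', 'downloadMediaFiles' ]
```

Because placement is relative to a step name, internal steps can be added or reordered later without breaking user declarations.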

@TylerBarnes
Contributor

TylerBarnes commented Oct 15, 2019

@pieh I like that idea. Are you thinking about priority in case sub-plugins also use the option and there are multiple normalizers on the same step?
For now it seems like it might be unnecessary but the rest holds up to me.

It's just semantics but I like the idea of adding a new option normalizers instead of using the existing one:

module.exports = {
    plugins: [
        {
            resolve: `gatsby-source-wordpress`,
            options: {
                // [...] regular options
                normalizers: [
                    {
                        // declare that this should run before this step
                        before: NormalizerSteps.mapEntitiesToMedia,
                        normalizer: entities => {
                            // do something with entities
                            return entities
                        }
                    },
                    {
                        // declare that this should run after this step
                        after: NormalizerSteps.mapEntitiesToMedia,
                        normalizer: entities => {
                            // do something with entities
                            return entities
                        }
                    }
                ]
            }
        }
    ]
}

Edit: We could maybe even just have the normalizer steps return a number and sort all normalizers by that field before running them one by one.

module.exports = {
    plugins: [
        {
            resolve: `gatsby-source-wordpress`,
            options: {
                // [...] regular options
                normalizers: [
                    {
                        step: NormalizerSteps.mapEntitiesToMedia, // 100
                        normalizer: entities => {
                            // do something with entities
                            return entities
                        }
                    },
                    {
                        step: NormalizerSteps.mapEntitiesToMedia - 1, // 99
                        normalizer: entities => {
                            // do something with entities
                            return entities
                        }
                    }
                ]
            }
        }
    ]
}
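
A minimal sketch of the "sort by step number, then run one by one" idea follows. The step numbers are assumptions (the comment above only suggests that `mapEntitiesToMedia` might be 100), and the `log` array exists only to make the execution order visible.

```javascript
// Hypothetical step numbers; nothing here is a real plugin export.
const NormalizerSteps = { standardizeKeys: 50, mapEntitiesToMedia: 100 }

const log = []
const internal = [
  { step: NormalizerSteps.standardizeKeys, normalizer: e => (log.push("standardizeKeys"), e) },
  { step: NormalizerSteps.mapEntitiesToMedia, normalizer: e => (log.push("mapEntitiesToMedia"), e) },
]
const user = [
  // Runs just before mapEntitiesToMedia.
  { step: NormalizerSteps.mapEntitiesToMedia - 1, normalizer: e => (log.push("user"), e) },
]

// Sort a copy by step number, then feed the entities through each normalizer.
const ordered = [...internal, ...user].sort((a, b) => a.step - b.step)
const result = ordered.reduce((entities, n) => n.normalizer(entities), [])
console.log(log) // [ 'standardizeKeys', 'user', 'mapEntitiesToMedia' ]
```

`Array.prototype.sort` is stable as of ES2019, so normalizers sharing a step number would keep the order in which they were listed.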

@pieh
Contributor

pieh commented Oct 16, 2019

Are you thinking about priority in case sub-plugins also use the option and there are multiple normalizers on the same step?

Yeah, that would be tricky with what I proposed. But this is generally difficult, because by their nature sub-plugins shouldn't need to know or care about other sub-plugins. If they had to handle races or ordering relative to other sub-plugins, it would break the sub-plugin separation contract. We see this problem with some of the remark sub-plugins, where the order of declaration in the sub-plugin list matters in some cases, so this is definitely something to think about, but at this point I don't have any good solutions.

It's just semantics but I like the idea of adding a new option normalizers instead of using the existing one:

Yeah, agreed. Implementation-wise, we would probably convert the current normalizer plugin option to use the normalizers construct internally, so all normalizers use the same code path in the end.

Edit: We could maybe even just have the normalizer steps return a number and sort all normalizers by that field before running them one by one.

This seems dangerous for user code. We would assign arbitrary numbers there, and users might end up using literal numbers instead of importing them. That becomes a problem if we ever need to shift normalizer "priority": say we have normalizer "A" (priority 10) and normalizer "B" (priority 20), and later add 10 more normalizers in between. We could run out of integers in the range we gave ourselves. Should we then use floating-point priorities? I'm pretty surprised that this technique works in WordPress (the places I know that use this convention are the priority for add_action/add_filter and admin menu item positions). Maybe I just never hit the edge cases where plugins clash over the same priority and cause problems.

@TylerBarnes
Contributor

In WordPress, if you add two filters with the same priority, they run in the order they were added. If we were to compare this directly to WordPress though, each normalizer here would probably have its own filter name, which minimizes the problem of adding additional internal normalizers later. In that comparison it would be a mix of the two ideas:

const { NormalizerSteps } = require(`gatsby-source-wordpress`)

module.exports = {
    plugins: [
        {
            resolve: `gatsby-source-wordpress`,
            options: {
                // [...] regular options
                normalizers: [
                    {
                        step: NormalizerSteps.A,
                        priority: 11,
                        // this would run after A
                        normalizer: entities => {
                            // do something with entities
                            return entities
                        }
                    },
                    {
                        step: NormalizerSteps.B,
                        priority: -1,
                        // this would run before B
                        normalizer: entities => {
                            // do something with entities
                            return entities
                        }
                    }
                ]
            }
        }
    ]
}

Then if we needed to add 3 more normalizers between normalizer A and B, everything would still work since the internal normalizer order would be based on name, not number and the order of external normalizers would be relative to the name, but offset by number.

I guess the main difference is that with after/before you have 2 priorities (where all external normalizers with the same before or after priority run in whichever order they were added) and if you use integer priorities, you can change the order of external normalizers that are using the same internal step (which would play nice with sub-plugins).
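
As a rough sketch of the step-plus-priority ordering described above: internal steps stay ordered by name, and external normalizers are placed around their step by priority. One assumption not settled in this thread is that each internal normalizer runs at priority 10 (WordPress's default for `add_filter`); `internalOrder`, `orderNormalizers`, and `label` are hypothetical names.

```javascript
// Assumption: the internal normalizer of each step runs at priority 10.
const INTERNAL_PRIORITY = 10

// Hypothetical internal steps, ordered by name rather than by number.
const internalOrder = ["A", "B"]

function orderNormalizers(external) {
  const ordered = []
  for (const step of internalOrder) {
    const group = [
      { step, priority: INTERNAL_PRIORITY, label: step },
      ...external.filter(n => n.step === step),
    ]
    // Array.prototype.sort is stable, so equal priorities keep insertion order.
    group.sort((a, b) => a.priority - b.priority)
    ordered.push(...group)
  }
  return ordered
}

const ordered = orderNormalizers([
  { step: "A", priority: 11, label: "after A" },
  { step: "B", priority: -1, label: "before B" },
])
console.log(ordered.map(n => n.label)) // [ 'A', 'after A', 'before B', 'B' ]
```

Adding more internal steps between A and B would only change `internalOrder`; the external declarations would stay valid.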

I'm also not sure if this is a good way to do it but it feels like it's not far off at least.

Alternative idea

We could also import all internal normalizers to determine which order the external normalizer should come in. I think that falls over in the case of sub-plugins though.

const { Normalizers } = require(`gatsby-source-wordpress`)

module.exports = {
    plugins: [
        {
            resolve: `gatsby-source-wordpress`,
            options: {
                // [...] regular options
                normalizers: [
                    {
                        normalizer: entities => {
                            return entities
                        }
                    },
                    Normalizers,
                    {
                        normalizer: entities => {
                            return entities
                        }
                    }
                ]
            }
        }
    ]
}

Normalizers could just be an object of normalizers, so you could change the order by doing something like this:

const { Normalizers } = require(`gatsby-source-wordpress`)

const { mapEntitiesToMedia, ...normalizers } = Normalizers

module.exports = {
    plugins: [
        {
            resolve: `gatsby-source-wordpress`,
            options: {
                // [...] regular options
                normalizers: [
                    {
                        normalizer: entities => {
                            return entities
                        }
                    },
                    mapEntitiesToMedia,
                    {
                        normalizer: entities => {
                            return entities
                        }
                    },
                    normalizers
                ]
            }
        }
    ]
}

If no internal normalizers were added to the plugin option, we could just prepend them to the array.

@muescha
Contributor

muescha commented Oct 17, 2019

You mean Normalizers is already in the right order?

How about an array of named entries, so it can be manipulated by finding the index of a normalizer in the array and inserting there?

Normalizers = {
    first: ...,
    before: ...,
    after: ...,
    remove: ...,
    replace: ..., // before and then remove
    final: ..., // before the last one (eq createNodesFromEntities)
    values: [
        {
            name: 'mapTagsCategoriesToTaxonomies',
            normalizer: normalize.mapTagsCategoriesToTaxonomies
        },
        {
            name: 'searchReplaceContentUrls',
            normalizer: normalize.searchReplaceContentUrls
        },
        {
            // even this one if a subplugin will overwrite it
            name: 'createNodesFromEntities',
            normalizer: normalize.createNodesFromEntities
        }
    ]
}

and use the helpers:

const { Normalizers } = require(`gatsby-source-wordpress`)

Normalizers.before('searchReplaceContentUrls', {
    name: 'myFirstNormalizer',
    normalizer: entities => {
        return entities
    }
})

module.exports = {
    plugins: [
        {
            resolve: `gatsby-source-wordpress`,
            options: {
                // [...] regular options
                normalizers: Normalizers.values
            }
        }
    ]
}
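
A few of the helpers proposed here could be sketched as follows. This is only an illustration under stated assumptions: `createNormalizers` is a hypothetical factory, none of these helpers exist in gatsby-source-wordpress, and the helpers mutate the `values` array in place.

```javascript
// Hypothetical helper factory around a named, ordered list of normalizers.
function createNormalizers(values) {
  // Look up a normalizer's position by name, failing loudly on typos.
  const indexOf = name => {
    const index = values.findIndex(n => n.name === name)
    if (index === -1) throw new Error(`Unknown normalizer: ${name}`)
    return index
  }
  return {
    values,
    before: (name, normalizer) => values.splice(indexOf(name), 0, normalizer),
    after: (name, normalizer) => values.splice(indexOf(name) + 1, 0, normalizer),
    remove: name => values.splice(indexOf(name), 1),
  }
}

const Normalizers = createNormalizers([
  { name: "mapTagsCategoriesToTaxonomies", normalizer: entities => entities },
  { name: "searchReplaceContentUrls", normalizer: entities => entities },
])

Normalizers.before("searchReplaceContentUrls", {
  name: "myFirstNormalizer",
  normalizer: entities => entities,
})
console.log(Normalizers.values.map(n => n.name))
// [ 'mapTagsCategoriesToTaxonomies', 'myFirstNormalizer', 'searchReplaceContentUrls' ]
```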

@TylerBarnes
Contributor

@muescha , yeah they would already be in the right order. I like the idea of just passing an array of normalizers. In the spirit of keeping the API as simple as possible maybe it should take a function that's expected to return an array, so similar to what you have @muescha , but just using native JS Array methods to add your own normalizers. For ex:

module.exports = {
    plugins: [
        {
            resolve: `gatsby-source-wordpress`,
            options: {
                // [...] regular options
                normalizers: normalizers => {
                    // modify normalizers array as needed, splice, push, etc.
                    return normalizers;
                }
            }
        }
    ]
}

The nice thing about that is it's probably the simplest way we could implement this and it will also play nicely with sub-plugins. Each normalizer in the array would be an object with name and normalizer properties like you mentioned @muescha for easy array manipulation.
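
For example, inserting a custom normalizer just before a named built-in one needs nothing beyond native Array methods. This is a sketch under assumptions: the built-in `name` values and the `{ entities }` argument shape are guesses for illustration, and `builtIn` is a stand-in for the list the plugin would pass in.

```javascript
// Hypothetical custom normalizer.
const myNormalizer = {
  name: "myNormalizer",
  normalizer: ({ entities }) => entities,
}

// The function that would be passed as the `normalizers` option.
const customizeNormalizers = normalizers => {
  const index = normalizers.findIndex(n => n.name === "mapEntitiesToMedia")
  // Fall back to appending if the anchor normalizer is not present.
  if (index === -1) return [...normalizers, myNormalizer]
  // Insert just before mapEntitiesToMedia, without mutating the input.
  return [...normalizers.slice(0, index), myNormalizer, ...normalizers.slice(index)]
}

// Stand-in for the plugin's built-in list (assumed shape).
const builtIn = [
  { name: "standardizeKeys", normalizer: ({ entities }) => entities },
  { name: "mapEntitiesToMedia", normalizer: ({ entities }) => entities },
]
console.log(customizeNormalizers(builtIn).map(n => n.name))
// [ 'standardizeKeys', 'myNormalizer', 'mapEntitiesToMedia' ]
```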

When we loop over them and run the normalizers on the gatsby-source-wordpress side of things, the loop can just check if each is a function or if it has a normalizer property that's a function, then either run the function or run the nested function in the normalizer property.

The downside is that someone can remove any normalizers, but that's also an upside and makes gatsby-source-wordpress much more flexible.
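
The loop described above might look roughly like the sketch below. It is not the plugin's actual implementation: `runNormalizers` and `helpers` are hypothetical names, and passing a single `{ entities, ...helpers }` object mirrors how the existing custom normalizer is called. Each step is awaited because some steps, such as the media download, are asynchronous.

```javascript
// Hypothetical runner: accepts plain functions or { name, normalizer } objects.
async function runNormalizers(normalizers, entities, helpers = {}) {
  for (const entry of normalizers) {
    const fn = typeof entry === "function" ? entry : entry && entry.normalizer
    if (typeof fn !== "function") continue // skip malformed entries
    // await covers both synchronous and asynchronous normalizers
    entities = await fn({ entities, ...helpers })
  }
  return entities
}

const normalizers = [
  // a bare function
  ({ entities }) => entities.filter(e => e.keep),
  // an object with a nested (async) normalizer
  { name: "addFlag", normalizer: async ({ entities }) => entities.map(e => ({ ...e, seen: true })) },
]

runNormalizers(normalizers, [{ id: 1, keep: true }, { id: 2, keep: false }]).then(result => {
  console.log(result) // [ { id: 1, keep: true, seen: true } ]
})
```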

@muescha
Contributor

muescha commented Oct 17, 2019

Sounds good as a plain Array.

Then maybe a helper as a second parameter:

                normalizers: (normalizers, helper) => {

@paweljedrzejczyk
Contributor Author

paweljedrzejczyk commented Nov 2, 2019

@muescha @TylerBarnes @pieh @wardpeet

I have rewritten the PR with the ideas discussed above (added a normalizers option to add/remove normalizers, which are stored in an Array). Let me know what you think.

Let me know if I should update the PR title/description.

@muescha
Contributor

muescha commented Nov 3, 2019

👍 Looks good to me.

It's hard to distinguish normalizer from normalizers in the code, as they differ only by the additional s.

I would always pass all props or helpers to the subfunction and destructure the props there. But that's only my personal taste: "always equal".

@muescha
Contributor

muescha commented Nov 3, 2019

You assign the values to underscore-prefixed variables, but later you don't always use the underscore-prefixed values. Is this redundant?

@paweljedrzejczyk
Contributor Author

You assign the values to underscore-prefixed variables, but later you don't always use the underscore-prefixed values. Is this redundant?

I see that this is mixed up throughout the whole gatsby-node.js file. I might be missing something, but I don't see a reason for the plugin to accept a foo option and then assign it with let _foo = foo;. All of that code could be cleaned up by passing the values as-is instead of assigning them to underscore-prefixed variables first.

I wanted to change as few things as possible. Currently the downloadMediaFiles normalizer accepts _auth, but the custom normalizer that can currently be added with the normalizer option is called with auth, not _auth.

@paweljedrzejczyk
Contributor Author

It's hard to distinguish normalizer from normalizers in the code, as they differ only by the additional s.

I am looking for a better name for it. Maybe normalizersCustomizer.

@TylerBarnes
Contributor

It's hard to distinguish normalizer from normalizers in the code, as they differ only by the additional s.

I am looking for a better name for it. Maybe normalizersCustomizer.

I think it should stay as normalizers, and normalizer should eventually be deprecated, as it's the same feature but less useful. I'd say we remove normalizer from the docs and show how you can do the same with normalizers.
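
A sketch of what "doing the same with normalizers" could look like: the custom function is appended to the end of the array so it runs after all built-in normalizers, as the old normalizer option did. `myCustomNormalizer`, `customField`, and `pluginConfig` are hypothetical, and the `builtIn` entry is a stand-in for the plugin's own list.

```javascript
// Hypothetical function that would previously have been passed as `normalizer`.
const myCustomNormalizer = ({ entities }) =>
  entities.map(e => ({ ...e, customField: true }))

// Sketch of the relevant plugin entry in gatsby-config.js.
const pluginConfig = {
  resolve: "gatsby-source-wordpress",
  options: {
    // Append, so it runs after all built-in normalizers.
    normalizers: normalizers => [
      ...normalizers,
      { name: "myCustomNormalizer", normalizer: myCustomNormalizer },
    ],
  },
}

const names = pluginConfig.options
  .normalizers([{ name: "builtIn", normalizer: args => args.entities }])
  .map(n => n.name)
console.log(names) // [ 'builtIn', 'myCustomNormalizer' ]
```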

@paweljedrzejczyk paweljedrzejczyk changed the title from "feat(gatsby-source-wordpress): add after fetch normalizer" to "feat(gatsby-source-wordpress): add normalizers option to modify normalizers" Nov 12, 2019
@paweljedrzejczyk
Contributor Author

@TylerBarnes @muescha @pieh @wardpeet This PR has been sitting for a while. Do you have an idea of what is required to complete it before it can be merged? I would love to give this a final touch.

Contributor

@TylerBarnes TylerBarnes left a comment


This is looking really good and works well from my local testing. I just added a typo fix and a question

packages/gatsby-source-wordpress/README.md (outdated, resolved)
packages/gatsby-source-wordpress/src/gatsby-node.js (outdated, resolved)
@TylerBarnes
Contributor

@paweljedrzejczyk thanks for doing this, it looks good and works well from my local testing. @pieh curious if you have any more thoughts on this?
The PR adds this new option but also replaces normalizer with normalizers in the docs/example. I'm not sure what best practises are around new features that replace old ones and documentation. Should we keep both options and document both, or deprecate normalizer in favour of normalizers?

@TylerBarnes TylerBarnes added the "topic: source-wordpress" label (related to Gatsby's integration with WordPress) Dec 2, 2019
pieh previously requested changes Dec 6, 2019
Contributor

@pieh pieh left a comment


Great PR! Left some comments, suggestions and questions ;)

packages/gatsby-source-wordpress/README.md (outdated, resolved)
packages/gatsby-source-wordpress/src/gatsby-node.js (outdated, resolved)
packages/gatsby-source-wordpress/src/gatsby-node.js (outdated, resolved)
packages/gatsby-source-wordpress/README.md (resolved)
packages/gatsby-source-wordpress/src/gatsby-node.js (outdated, resolved)
@TylerBarnes TylerBarnes removed the "status: awaiting reviewer response" label (a pull request that is currently awaiting a reviewer's response) Dec 11, 2019
@paweljedrzejczyk
Contributor Author

@pieh @muescha @TylerBarnes fixed issues after code review, let me know if there is anything more to fix

@TylerBarnes
Contributor

Hey @paweljedrzejczyk , apologies for the delayed response! The holidays delayed things a bit. Thanks again for this awesome PR and for making all the requested changes. I requested/asked-about one minor additional change. I think after that's resolved this is ready to be merged!

@TylerBarnes TylerBarnes requested a review from pieh January 9, 2020 18:38
@TylerBarnes
Contributor

@pieh the requested changes were made and I've tested & approved. Looks like GH wants you to approve as well before we can merge.

@pieh pieh dismissed their stale review January 9, 2020 18:41

I'm dismissing my outdated review, because GitHub won't let Tyler merge otherwise :)

@TylerBarnes TylerBarnes merged commit 2f67bce into gatsbyjs:master Jan 9, 2020
@gatsbot

gatsbot bot commented Jan 9, 2020

Holy buckets, @paweljedrzejczyk — we just merged your PR to Gatsby! 💪💜

Gatsby is built by awesome people like you. Let us say “thanks” in two ways:

  1. We’d like to send you some Gatsby swag. As a token of our appreciation, you can go to the Gatsby Swag Store and log in with your GitHub account to get a coupon code good for one free piece of swag. We’ve got Gatsby t-shirts, stickers, hats, scrunchies, and much more. (You can also unlock even more free swag with 5 contributions — wink wink nudge nudge.) See gatsby.dev/swag for details.
  2. We just invited you to join the Gatsby organization on GitHub. This will add you to our team of maintainers. Accept the invite by visiting https://github.com/orgs/gatsbyjs/invitation. By joining the team, you’ll be able to label issues, review pull requests, and merge approved pull requests.

If there’s anything we can do to help, please don’t hesitate to reach out to us: tweet at @gatsbyjs and we’ll come a-runnin’.

Thanks again!

@TylerBarnes
Contributor

Published in gatsby-source-wordpress@3.1.57. Thanks again @paweljedrzejczyk !
