New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repeat runs with collections adds duplicates #27

Open
Aintaer opened this Issue Feb 10, 2015 · 23 comments

Comments

Projects
None yet
@Aintaer
Copy link

Aintaer commented Feb 10, 2015

Currently, repeated calls to metalsmith.build() is non-idempotent because metalsmith-collections will find the same files that belongs to a collection, and add them to the same array as exists in memory from the previous run.

This is due to metalsmith storing collections in the global metadata object. Maybe it should empty out the collections arrays every time it runs?

My workaround right now is essentially

metalsmith.build(function(err, files) {
  if (err) throw err;
  metalsmith.metadata(origMetadata);
});
@MoOx

This comment has been minimized.

Copy link

MoOx commented Apr 27, 2015

Same issue for partial rebuild with metalsmith-watch that is using run() method.

@venomgfx

This comment has been minimized.

Copy link

venomgfx commented Jul 23, 2015

I'm also getting this issue.

Couldn't get the workaround that @Aintaer mentions to work either. Do you have an idea what am I doing wrong? Thanks!

var siteBuild = metalsmith(__dirname)
    .use(paths())
    .use(relative())
    .metadata({
        site: {
            title: 'Site Title'
        }
    })
    .use(markdown())
    .use(collections({
        apples: {
            metadata: {
                name: 'Apples',
                description: ''
            }
        },
        oranges: {
            metadata: {
                name: 'Oranges',
                description: ''
            }
        }
    })
    )
    .use(templates( 'jade'))
    .use(sass())
    .use(watch({
        paths: {
            "src/**/*": "**/*.md",
            "templates/**/*": "**/*.md"
        },
        livereload: true
        })
    )
    .destination('./build')
    .build(function (err) {
        if (err) throw err;
    });
@deltamualpha

This comment has been minimized.

Copy link

deltamualpha commented Jul 23, 2015

I solved this with a fork that deletes the collections metadata before re-adding articles to collections -- deltamualpha@a3d5db6 -- but I don't know if that's the right answer overall.

@venomgfx

This comment has been minimized.

Copy link

venomgfx commented Jul 23, 2015

Thanks @deltamualpha ! My current workaround is to just use half the collection when looping through it in the template, but yeah would be nice if this got fixed for real. It's pretty strange not to find more people with the same issue, collections are quite basic functionality.

@deltamualpha

This comment has been minimized.

Copy link

deltamualpha commented Jul 23, 2015

@venomgfx I think part of it is that from what I've seen, Segment thinks that watchers as part of the build pipeline are a bad idea, and that the rebuild logic (and serving) should be external to the Metalsmith object. I'm considering re-writing my build path to work this way. Some third-party plugins have logic that handles watchers correctly, but most of the first-party ones don't.

@venomgfx

This comment has been minimized.

Copy link

venomgfx commented Jul 23, 2015

@deltamualpha Ah that makes sense yes. Perhaps it's worth looking into gulp-metalsmith. So we watch outside of the metalsmith build pipeline. Will try it out. Thanks a lot for your feedback!

@mplewis

This comment has been minimized.

Copy link

mplewis commented Aug 20, 2015

@deltamualpha: that fix works for me. Thanks for your help!

Edit: It sort of works.

Some pages when saved (e.g. .hbs templates that reload all source files) cause the duplication issue. This fixes the issue on those pages.

This solution totally deletes the collections data on pages that do NOT cause the duplication issue when saved (e.g. the source files themselves).

@razcore-art

This comment has been minimized.

Copy link

razcore-art commented Aug 22, 2015

guys, I have the same problem with metalsmith-watch so I started poking at the source code. I'm not sure if I'm doing it correctly but, based on the filename property I managed to verify if the file is already in the collections or not... seems to work in my case (and for now)... but I just have about 3 weeks nodejs experience and most of the time I have no idea what I'm doing... here's the diff:

diff -u index.js ../../blog/node_modules/metalsmith-collections/lib/index.js 
--- index.js    2015-08-22 04:53:38.962863606 +0100
+++ ../../blog/node_modules/metalsmith-collections/lib/index.js 2015-08-22 04:51:11.137564810 +0100
@@ -32,8 +32,26 @@
      * Find the files in each collection.
      */

+    var in_collections;
     Object.keys(files).forEach(function(file){
       debug('checking file: %s', file);
+
+      var collections = metadata.collections;
+      if (collections) {
+        in_collections = Object.keys(collections).map(function(collection) {
+          return collections[collection].filter(function(data) {
+            return (file === data.filename);
+          });
+        });
+        in_collections = [].concat.apply([], in_collections);
+      }
+
+      // and from outer body return if file already in collections
+      if (in_collections && in_collections.length === 1) {
+        debug('already in collections, skipping: %s', in_collections[0].filename);
+        return;
+      }
+
       var data = files[file];
       match(file, data).forEach(function(key){
         if (key && keys.indexOf(key) < 0){

maybe someone with more experience can pitch in and let us know if this works?

edit1: spoke too soon... I'm also using jade, so when I modify a jade file it adds a lot of elements to the collection for whatever reason. When I modify the *.md files this doesn't happen, I need to dig deeper

edit2: it might have just been some weird stuff going on with npm and the way I'm running, anyway, tomorrow I'll do some more tests...

edit3: damn... after looking at metalsmith-watch's source code I realized the this was already taking care of the collections problems for us... it's just that you need to be careful when you call metalsmith-watch. Can't say the same for metalsmith-browser-sync, this one doesn't take care of the collections. So my workaround from above isn't actually any good... especially because it depends on at least metalsmith-filename. As far as I can tell, you need to call metalsmith-collections before metalsmith-permalinks. That was the reason for duplicate collections in my case... hope it helps someone

@mplewis

This comment has been minimized.

Copy link

mplewis commented Aug 23, 2015

@razvanc87: I see the same behavior.

Some of the root cause of this problem is that metalsmith-watch only passes documents into the pipeline that have changed. If you have source files set to refresh only themselves, while template files refresh all source files, you will see differences in behavior between the two.

@razcore-art

This comment has been minimized.

Copy link

razcore-art commented Aug 23, 2015

I've... sopped trying to figure out what's happening and instead am using browsersync, the CLI alongside nodemon to keep everything in sync, instead of relaying on metalsmith plugins... Hope this gets fixed some day though.

@deltamualpha

This comment has been minimized.

Copy link

deltamualpha commented Aug 24, 2015

Yeah, metalsmith-watch implements some special-casing to make collections not explode immediately, but it's not perfect, and a very good example of why in-band watch & rebuild plugins might be a bad idea.

@razcore-art

This comment has been minimized.

Copy link

razcore-art commented Aug 24, 2015

Well... they are a bad idea because it's hard to deal with mutable state, but as far as efficiency goes... I'd rather have them then not (if it wouldn't interfere and make a mess of the state).

Whereas rebuilding from scratch with all the imports, etc. takes about 1-2 seconds, with in-band watch, this can be almost real-time, because you're also rebuilding basically one file at a time instead of the whole directory... which is quite nice. Unfortunately I couldn't make it work with this collections plugin. If I do get a better understanding I definitely will poke at this again :).

@cameronroe

This comment has been minimized.

Copy link

cameronroe commented Oct 24, 2015

+1

1 similar comment
@lapidus

This comment has been minimized.

Copy link

lapidus commented Oct 27, 2015

+1

@woodyrew

This comment has been minimized.

Copy link
Collaborator

woodyrew commented Oct 30, 2015

I used to have this issue with watch, however using nodemon makes life more simple and closer to real world building.

@chiefy

This comment has been minimized.

Copy link

chiefy commented Jan 11, 2016

👍 fixed it locally by checking the metadata[key] array for the same file before adding it. A real fix would be nice.

@spacedawwwg

This comment has been minimized.

Copy link
Contributor

spacedawwwg commented Mar 23, 2016

👍 +1

@spacedawwwg

This comment has been minimized.

Copy link
Contributor

spacedawwwg commented Mar 23, 2016

Forked this for now and added the metadata[key] fix mentioned by @deltamualpha

https://github.com/spacedawwwg/metalsmith-collections

use this forked version by adding

"devDependencies": {
    "metalsmith-collections": "spacedawwwg/metalsmith-collections"
}

to your package.json

I've submitted a pull request, but I'll keep the fork updated until this finally gets fixed (it has been a year since this issue was posted)

earnubs added a commit to canonical-ols/store-help that referenced this issue Apr 1, 2016

Update README.md
The bug[1] is only apparent when building via metalsmith watch, which the author suggests isn't the right way to go about things. We now build with gulp, and the bug no longer manifests.

[1] segmentio/metalsmith-collections#27

@earnubs earnubs referenced this issue Apr 1, 2016

Merged

Update README.md #27

@AndyOGo

This comment has been minimized.

Copy link

AndyOGo commented Aug 26, 2016

Another quickfix is to monkey patch the build method, like:

var build = metalsmith.build;
metalsmith.build = function() {
  build.apply(ms, arguments)
  ms.metadata(originalMeta));
}

This whipes it after every build and works with metalsmith-browser-sync too

@genexp

This comment has been minimized.

Copy link

genexp commented Nov 20, 2016

Thanks @spacedawwwg that works great.

@ReedD

This comment has been minimized.

Copy link

ReedD commented Dec 13, 2016

Here's another fix to force the metadata to reset every build. Similar idea to @AndyOGo.

metalsmith('./src')
	.use((files, metalsmith, done) => {
		setImmediate(done);
		metalsmith.metadata({
			site: {},
			package: require( './package.json')
		});
	})
@edhenderson

This comment has been minimized.

Copy link

edhenderson commented Apr 19, 2018

Did this ever get fixed @spacedawwwg ? BTW - thanks for the temp fix.

@rludgero

This comment has been minimized.

Copy link

rludgero commented Dec 20, 2018

To avoid this issue, install by https the new version of metalsmith-collections.

npm install --save-dev git+https://github.com/segmentio/metalsmith-collections.git

This will be install the latest version of plugin, since metalsmith org still does not have permission to push to NPM yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment