Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repeat runs with collections adds duplicates #27

Closed
Aintaer opened this issue Feb 10, 2015 · 25 comments
Closed

Repeat runs with collections adds duplicates #27

Aintaer opened this issue Feb 10, 2015 · 25 comments

Comments

@Aintaer
Copy link

Aintaer commented Feb 10, 2015

Currently, repeated calls to metalsmith.build() is non-idempotent because metalsmith-collections will find the same files that belongs to a collection, and add them to the same array as exists in memory from the previous run.

This is due to metalsmith storing collections in the global metadata object. Maybe it should empty out the collections arrays every time it runs?

My workaround right now is essentially

metalsmith.build(function(err, files) {
  if (err) throw err;
  metalsmith.metadata(origMetadata);
});
@MoOx
Copy link

MoOx commented Apr 27, 2015

Same issue for partial rebuild with metalsmith-watch that is using run() method.

@venomgfx
Copy link

I'm also getting this issue.

Couldn't get the workaround that @Aintaer mentions to work either. Do you have an idea what am I doing wrong? Thanks!

var siteBuild = metalsmith(__dirname)
    .use(paths())
    .use(relative())
    .metadata({
        site: {
            title: 'Site Title'
        }
    })
    .use(markdown())
    .use(collections({
        apples: {
            metadata: {
                name: 'Apples',
                description: ''
            }
        },
        oranges: {
            metadata: {
                name: 'Oranges',
                description: ''
            }
        }
    })
    )
    .use(templates( 'jade'))
    .use(sass())
    .use(watch({
        paths: {
            "src/**/*": "**/*.md",
            "templates/**/*": "**/*.md"
        },
        livereload: true
        })
    )
    .destination('./build')
    .build(function (err) {
        if (err) throw err;
    });

@deltamualpha
Copy link

I solved this with a fork that deletes the collections metadata before re-adding articles to collections -- deltamualpha@a3d5db6 -- but I don't know if that's the right answer overall.

@venomgfx
Copy link

Thanks @deltamualpha ! My current workaround is to just use half the collection when looping through it in the template, but yeah would be nice if this got fixed for real. It's pretty strange not to find more people with the same issue, collections are quite basic functionality.

@deltamualpha
Copy link

@venomgfx I think part of it is that from what I've seen, Segment thinks that watchers as part of the build pipeline are a bad idea, and that the rebuild logic (and serving) should be external to the Metalsmith object. I'm considering re-writing my build path to work this way. Some third-party plugins have logic that handles watchers correctly, but most of the first-party ones don't.

@venomgfx
Copy link

@deltamualpha Ah that makes sense yes. Perhaps it's worth looking into gulp-metalsmith. So we watch outside of the metalsmith build pipeline. Will try it out. Thanks a lot for your feedback!

@mplewis
Copy link

mplewis commented Aug 20, 2015

@deltamualpha: that fix works for me. Thanks for your help!

Edit: It sort of works.

Some pages when saved (e.g. .hbs templates that reload all source files) cause the duplication issue. This fixes the issue on those pages.

This solution totally deletes the collections data on pages that do NOT cause the duplication issue when saved (e.g. the source files themselves).

@razcore-rad
Copy link

guys, I have the same problem with metalsmith-watch so I started poking at the source code. I'm not sure if I'm doing it correctly but, based on the filename property I managed to verify if the file is already in the collections or not... seems to work in my case (and for now)... but I just have about 3 weeks nodejs experience and most of the time I have no idea what I'm doing... here's the diff:

diff -u index.js ../../blog/node_modules/metalsmith-collections/lib/index.js 
--- index.js    2015-08-22 04:53:38.962863606 +0100
+++ ../../blog/node_modules/metalsmith-collections/lib/index.js 2015-08-22 04:51:11.137564810 +0100
@@ -32,8 +32,26 @@
      * Find the files in each collection.
      */

+    var in_collections;
     Object.keys(files).forEach(function(file){
       debug('checking file: %s', file);
+
+      var collections = metadata.collections;
+      if (collections) {
+        in_collections = Object.keys(collections).map(function(collection) {
+          return collections[collection].filter(function(data) {
+            return (file === data.filename);
+          });
+        });
+        in_collections = [].concat.apply([], in_collections);
+      }
+
+      // and from outer body return if file already in collections
+      if (in_collections && in_collections.length === 1) {
+        debug('already in collections, skipping: %s', in_collections[0].filename);
+        return;
+      }
+
       var data = files[file];
       match(file, data).forEach(function(key){
         if (key && keys.indexOf(key) < 0){

maybe someone with more experience can pitch in and let us know if this works?

edit1: spoke too soon... I'm also using jade, so when I modify a jade file it adds a lot of elements to the collection for whatever reason. When I modify the *.md files this doesn't happen, I need to dig deeper

edit2: it might have just been some weird stuff going on with npm and the way I'm running, anyway, tomorrow I'll do some more tests...

edit3: damn... after looking at metalsmith-watch's source code I realized the this was already taking care of the collections problems for us... it's just that you need to be careful when you call metalsmith-watch. Can't say the same for metalsmith-browser-sync, this one doesn't take care of the collections. So my workaround from above isn't actually any good... especially because it depends on at least metalsmith-filename. As far as I can tell, you need to call metalsmith-collections before metalsmith-permalinks. That was the reason for duplicate collections in my case... hope it helps someone

@mplewis
Copy link

mplewis commented Aug 23, 2015

@razvanc87: I see the same behavior.

Some of the root cause of this problem is that metalsmith-watch only passes documents into the pipeline that have changed. If you have source files set to refresh only themselves, while template files refresh all source files, you will see differences in behavior between the two.

@razcore-rad
Copy link

I've... sopped trying to figure out what's happening and instead am using browsersync, the CLI alongside nodemon to keep everything in sync, instead of relaying on metalsmith plugins... Hope this gets fixed some day though.

@deltamualpha
Copy link

Yeah, metalsmith-watch implements some special-casing to make collections not explode immediately, but it's not perfect, and a very good example of why in-band watch & rebuild plugins might be a bad idea.

@razcore-rad
Copy link

Well... they are a bad idea because it's hard to deal with mutable state, but as far as efficiency goes... I'd rather have them then not (if it wouldn't interfere and make a mess of the state).

Whereas rebuilding from scratch with all the imports, etc. takes about 1-2 seconds, with in-band watch, this can be almost real-time, because you're also rebuilding basically one file at a time instead of the whole directory... which is quite nice. Unfortunately I couldn't make it work with this collections plugin. If I do get a better understanding I definitely will poke at this again :).

@ninjasort
Copy link

+1

1 similar comment
@lapidus
Copy link

lapidus commented Oct 27, 2015

+1

@woodyrew
Copy link
Member

I used to have this issue with watch, however using nodemon makes life more simple and closer to real world building.

@chiefy
Copy link

chiefy commented Jan 11, 2016

👍 fixed it locally by checking the metadata[key] array for the same file before adding it. A real fix would be nice.

@spacedawwwg
Copy link
Contributor

👍 +1

@spacedawwwg
Copy link
Contributor

Forked this for now and added the metadata[key] fix mentioned by @deltamualpha

https://github.com/spacedawwwg/metalsmith-collections

use this forked version by adding

"devDependencies": {
    "metalsmith-collections": "spacedawwwg/metalsmith-collections"
}

to your package.json

I've submitted a pull request, but I'll keep the fork updated until this finally gets fixed (it has been a year since this issue was posted)

@AndyOGo
Copy link

AndyOGo commented Aug 26, 2016

Another quickfix is to monkey patch the build method, like:

var build = metalsmith.build;
metalsmith.build = function() {
  build.apply(ms, arguments)
  ms.metadata(originalMeta));
}

This whipes it after every build and works with metalsmith-browser-sync too

@svperfecta
Copy link

Thanks @spacedawwwg that works great.

@ReedD
Copy link

ReedD commented Dec 13, 2016

Here's another fix to force the metadata to reset every build. Similar idea to @AndyOGo.

metalsmith('./src')
	.use((files, metalsmith, done) => {
		setImmediate(done);
		metalsmith.metadata({
			site: {},
			package: require( './package.json')
		});
	})

@edhenderson
Copy link

Did this ever get fixed @spacedawwwg ? BTW - thanks for the temp fix.

@rludgero
Copy link

To avoid this issue, install by https the new version of metalsmith-collections.

npm install --save-dev git+https://github.com/segmentio/metalsmith-collections.git

This will be install the latest version of plugin, since metalsmith org still does not have permission to push to NPM yet.

@AndyOGo
Copy link

AndyOGo commented Feb 2, 2022

@webketje Why did you close this.
Is it fixed or not?

@webketje
Copy link
Member

webketje commented Feb 2, 2022

@AndyOGo please see the commit 891b24c which adds a test for this. In the latest published version, I did quite some refactoring and suspected I incidentally fixed it (I think it was even fixed long ago but just in a messy way and without updating this issue)
See also https://github.com/metalsmith/collections/blob/master/lib/index.js#L91

EDIT: My test should have been sequential, not parallel. Updated locally, but the conclusion remains, yes it's fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests