Makefiles are being indexed #110

Open
oalders opened this Issue Jun 30, 2011 · 12 comments

6 participants
Owner

oalders commented Jun 30, 2011

In my list of latest modules, some makefiles are appearing. Check this out:

http://beta.metacpan.org/search?q=makefile.pl

Member

monken commented Jul 22, 2011

Hm, hard to say.

The source contains POD with the NAME set to Makefile.PL (http://api.metacpan.org/source/CASIANO/GRID-Machine-0.127/Makefile.PL), so this is actually legit.
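To see why such a file looks like a module to a POD-based indexer, here is a minimal sketch of deriving a document name from the first paragraph of a `=head1 NAME` section. It is not MetaCPAN's actual indexer code: `pod_name` is a hypothetical helper, and the sample Makefile.PL below is made up for illustration (it is not the GRID-Machine file linked above).

```python
def pod_name(source: str):
    """Return the name from the first paragraph of `=head1 NAME`, or None.

    Hypothetical helper: a rough approximation of how an indexer might
    turn POD into a document name.
    """
    in_name = False
    para = []
    for line in source.splitlines():
        if line.startswith("="):
            if in_name:
                break  # the next POD command ends the NAME section
            in_name = line.strip() == "=head1 NAME"
            continue
        if in_name:
            if line.strip():
                para.append(line.strip())
            elif para:
                break  # a blank line after text ends the first paragraph
    if not para:
        return None
    text = " ".join(para)
    # NAME is conventionally "Package::Name - short description"
    return text.split(" - ", 1)[0].strip()


# A made-up Makefile.PL that happens to carry POD with a NAME section.
makefile_pl = """use ExtUtils::MakeMaker;
WriteMakefile(NAME => 'GRID::Machine');
__END__
=head1 NAME

Makefile.PL - build script for GRID::Machine

=cut
"""
```

With this logic, `pod_name(makefile_pl)` yields `"Makefile.PL"`, which is exactly the kind of entry that ends up competing with real modules in search.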

Owner

oalders commented Jul 23, 2011

I guess I understand why they would be in the index, but the problem is that many docs called "Makefile.PL" are being returned, so they all seem to occupy the same top-level namespace. That search returns more than 999 results.

https://metacpan.org/search?q=readme returns 685 results

https://metacpan.org/search?q=build.pl returns 317 results

https://metacpan.org/search?q=changes returns more than 999 results

So, if the indexing is OK, maybe this should be handled in the query that returns the search results?

Member

monken commented Jul 23, 2011

SCO (search.cpan.org) seems to have the same problem: http://search.cpan.org/perldoc?Makefile.PL

Member

monken commented Jul 23, 2011

Re "https://metacpan.org/search?q=readme returns 685 results":

While it returns 685 results, only a couple are actually READMEs.

And who searches for README in the first place? :)

Owner

oalders commented Jul 23, 2011

My use case is that I'm searching for all modules on CPAN for iCPAN. I only want stuff that matters, since everything takes up space on the device, slows searching, etc. So I need to weed out anything that isn't a module. If README is in the index, that's fine, but it probably shouldn't be returned as a search result. Otherwise, I just need to weed it out afterwards. Having said that, I'm not the only one who will want these sorts of results, and they should be easy to get without post-processing or adding a lot of exceptions.
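For the "weed it out afterwards" route, a client such as iCPAN could post-filter results on the file path. This is a rough sketch under stated assumptions: the result dicts with `name` and `path` keys are hypothetical, and the rule "only `.pm`/`.pod` files under `lib/` are modules" is a heuristic that will miss distributions that keep modules at the top level.

```python
from pathlib import PurePosixPath


def looks_like_module(path: str) -> bool:
    """Heuristic: treat only .pm/.pod files under lib/ as module docs."""
    p = PurePosixPath(path)
    return p.parts[:1] == ("lib",) and p.suffix in {".pm", ".pod"}


# Hypothetical search results; the field names are assumptions.
results = [
    {"name": "GRID::Machine", "path": "lib/GRID/Machine.pm"},
    {"name": "Makefile.PL", "path": "Makefile.PL"},
    {"name": "README", "path": "README.pod"},
]

modules = [r for r in results if looks_like_module(r["path"])]
```

This is exactly the kind of client-side exception list the comment above would rather avoid, which is why a server-side fix is attractive.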

The solution is likely just to add a module search wrapper to the API, so I'll close this issue here, since there's already a ticket for that.

@oalders oalders closed this Jul 23, 2011

Owner

oalders commented Jul 25, 2011

Not to beat a dead horse, but README is actually a legitimate search term and returns relevant results on sco: http://search.cpan.org/search?query=readme&mode=all

Member

monken commented Jul 25, 2011

Right, so would a solution be to exclude README, Makefile.PL, et al.?
If yes, we could either exclude them in the search query or modify the indexer to set authorized => false (which would make it dead simple to exclude them from the search).
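If the indexer sets `authorized => false` on such files, the search side only needs a filter clause. A minimal sketch of the request body, assuming current Elasticsearch bool-query syntax (the `filter` clause postdates this 2011 thread) and a hypothetical `name` field; only `authorized` comes from the comment above.

```python
import json


def search_body(term: str) -> dict:
    """Match on the document name, but keep only authorized documents."""
    return {
        "query": {
            "bool": {
                # Scored part: what the user typed.
                "must": {"match": {"name": term}},
                # Non-scoring filter: drop docs the indexer marked unauthorized.
                "filter": {"term": {"authorized": True}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(search_body("makefile.pl"), indent=2))
```

The appeal of the indexer-side flag is that every client gets the same exclusion for free, instead of each query repeating a list of file names.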

Owner

oalders commented Jul 25, 2011

Either of those work for me, but I'm not sure what makes the most sense. Maybe we can pull @clintongormley into this. I'd be interested to hear his opinion. :)

@oalders oalders reopened this Jul 25, 2011

Owner

oalders commented Jul 25, 2011

@doy has also pointed out that a README.pod containing a =head1 NAME section probably causes Acme::CPANAuthors::Ukrainian to be listed twice here: https://metacpan.org/release/Acme-CPANAuthors-Ukrainian
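One way a release page could avoid the duplicate is to keep a single entry per NAME and prefer a real module file over a README. A sketch only: `dedupe_modules` and its `(name, path)` input shape are hypothetical, not MetaCPAN code, and the paths below are assumed for illustration.

```python
def dedupe_modules(entries):
    """Keep one path per NAME, preferring a non-README file over a README."""
    best = {}  # name -> (path, is_readme)
    for name, path in entries:
        is_readme = path.rsplit("/", 1)[-1].lower().startswith("readme")
        current = best.get(name)
        # Take the first entry seen, or replace a README with a real file.
        if current is None or (current[1] and not is_readme):
            best[name] = (path, is_readme)
    return {name: path for name, (path, _) in best.items()}


entries = [
    ("Acme::CPANAuthors::Ukrainian", "README.pod"),
    ("Acme::CPANAuthors::Ukrainian", "lib/Acme/CPANAuthors/Ukrainian.pm"),
]
listing = dedupe_modules(entries)
```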

Contributor

timbunce commented Jul 25, 2011

It would be nice to be able to search for "dbi makefile.pl" to quickly view that makefile.
However, only actual modules should be accessible via https://metacpan.org/module/... (see issue 110).

All other files in search results should link to a non-versioned dist/DIST/FILENAME URL, so that the URL is permanent (even if the person releasing the dist changes) and can accumulate PageRank. This is something SCO got wrong, and Graham planned to change it. http://www.w3.org/Provider/Style/URI.html
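A permanent link of this shape drops both the author and the version from the release name. A minimal sketch, assuming release names follow the usual `Dist-Name-VERSION` pattern; `dist_name`, `permanent_path`, and the version regex are hypothetical helpers, and the exact URL layout is only the `dist/DIST/FILENAME` shape suggested above.

```python
import re

# A trailing "-<version>", e.g. "-0.127", "-1.02_01" or "-v1.2.3".
_VERSION_SUFFIX = re.compile(r"-v?\d[\d._]*$")


def dist_name(release: str) -> str:
    """Strip the version from a release name like 'GRID-Machine-0.127'."""
    return _VERSION_SUFFIX.sub("", release)


def permanent_path(release: str, filename: str) -> str:
    """Build a version- and author-independent path for a non-module file."""
    return f"/dist/{dist_name(release)}/{filename}"
```

Because neither the version nor the uploader appears in the path, `permanent_path("GRID-Machine-0.127", "Makefile.PL")` and the same call for a later release both give `/dist/GRID-Machine/Makefile.PL`, so links keep pointing at the same page.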

Contributor

doy commented Jul 25, 2011

I think the right answer here is really just to explicitly not index anything that would otherwise be listed under "Other files" (makefile, readme, changelog, etc).
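Skipping "Other files" at index time could be as simple as checking top-level file names against a list. This is a sketch under assumptions: the name lists below are illustrative, not MetaCPAN's actual "Other files" rules, and only files at the distribution root are considered, so that a module such as `lib/Foo/Install.pm` is not caught by the `install` stem.

```python
from pathlib import PurePosixPath

# Illustrative lists; a real indexer would tune these.
OTHER_FILE_NAMES = {"makefile.pl", "build.pl", "changes", "changelog",
                    "manifest", "meta.yml", "meta.json"}
OTHER_FILE_STEMS = {"readme", "install", "license", "todo"}


def should_index(path: str) -> bool:
    """Return False for root-level 'other files' such as Makefile.PL or README.pod."""
    p = PurePosixPath(path)
    if len(p.parts) > 1:
        return True  # only files at the distribution root are checked here
    name = p.name.lower()
    return not (name in OTHER_FILE_NAMES or p.stem.lower() in OTHER_FILE_STEMS)
```

A side effect of this rule is that a root-level README.pod with a `=head1 NAME` section would no longer be indexed, which would also avoid the duplicate listing mentioned earlier in the thread.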

Contributor

timbunce commented Jul 26, 2011

That seems like a good place to start. "Do less, but do it right."
