bibtex citations in documentation, or regular citations? #195

Closed
rcurtin opened this Issue Dec 29, 2014 · 11 comments

@rcurtin
Member
rcurtin commented Dec 29, 2014

Reported by rcurtin on 26 Apr 42094464 03:05 UTC
Right now I do:

$ nca -h
Neighborhood Components Analysis (NCA)

  This program implements Neighborhood Components Analysis, both a linear
  dimensionality reduction technique and a distance learning technique.  The
  method seeks to improve k-nearest-neighbor classification on a dataset by
  scaling the dimensions.  The method is nonparametric, and does not require a
  value of k.  It works by using stochastic ("soft") neighbor assignments and
  using optimization techniques over the gradient of the accuracy of the
  neighbor assignments.

  For more details, see the following published paper:

  @inproceedings{
    author = {Goldberger, Jacob and Roweis, Sam and Hinton, Geoff and
        Salakhutdinov, Ruslan},
    booktitle = {Advances in Neural Information Processing Systems 17},
    pages = {513--520},
    publisher = {MIT Press},
    title = {{Neighbourhood Components Analysis}},
    year = {2004}
  }

  To work, this algorithm needs labeled data.  It can be given as the last row
  of the input dataset (--input_file), or alternatively in a separate file
  (--labels_file).

  ...

Should it output the bibtex citation code like that, or should we output the actual citation? (a la below)

$ nca -h
Neighborhood Components Analysis (NCA)

  This program implements Neighborhood Components Analysis, both a linear
  dimensionality reduction technique and a distance learning technique.  The
  method seeks to improve k-nearest-neighbor classification on a dataset by
  scaling the dimensions.  The method is nonparametric, and does not require a
  value of k.  It works by using stochastic ("soft") neighbor assignments and
  using optimization techniques over the gradient of the accuracy of the
  neighbor assignments.

  For more details, see the following published paper:

  Goldberger, J., Roweis, S., Hinton, G., and Salakhutdinov, R.  "Neighbourhood
  Components Analysis", pp. 513-520, Advances in Neural Information Processing Systems
  17.  MIT Press, 2004.

  To work, this algorithm needs labeled data.  It can be given as the last row
  of the input dataset (--input_file), or alternatively in a separate file
  (--labels_file).

  ...

CCing the usual suspects so we can gather ideas. Don't feel obligated to have an opinion. :)

@rcurtin rcurtin self-assigned this Dec 29, 2014
@rcurtin rcurtin added this to the mlpack 1.0.1 milestone Dec 29, 2014
@rcurtin rcurtin closed this Dec 29, 2014
@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by march on 21 Nov 42094496 10:24 UTC
My vote is for the BibTeX citation. It has all the same info, plus I'm always for making the frantic search for citations in the middle of the night easier. However, I do enough LaTeX that I read it the same as normal text, so I might be biased.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by pram on 17 Feb 42094522 09:46 UTC
I agree with Bill that the BibTeX will be useful (and readable to us LaTeX hackers). However, I see these citations as part of the Doxygen documentation, and in that case the simple readable format might be useful as well. How about adding both? The readable format needs just 2-3 lines and isn't too much of an addition to an already pretty big introduction.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by niche on 2 Nov 42094540 06:06 UTC
I think the actual citation is easier for a quick copy-paste into Google to search for the paper. Not sure if it's possible, but I would prefer the source code file to have BibTeX, and then have Doxygen output the actual citation to the documentation. If doing that is hard/impossible, then just BibTeX. Provided the number of citations isn't large, just doing both seems fair too.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by jcline3 on 28 Jul 42094576 07:31 UTC
Is -h really the place for that? Wouldn't a link to the (nonexistent?) nca tutorial, or to an external one, be a more appropriate place for a detailed account of the algorithm than the blurb you already have about what it is? Or perhaps the man page? If the help output is too verbose, it stops being useful for determining how to actually use the executable, which is the only reason I ever pass -h.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by pram on 27 Jul 42094589 00:34 UTC
Oh wait, I was not paying attention. I really don't want either of the citations in the -h output; I would prefer the citation(s) in the doxygen-able comments, not in the PROGRAM_INFO.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by niche on 23 Apr 42097431 04:51 UTC
I agree with this pram user guy (I was talking about doxygen-generated documentation in my previous comments). I don't see the point of having citations in -h. I feel that -h should have minimal output mainly for the purpose of reminding the user how to specify certain options, but without detailed explanations of what the options mean. For example, for the SVM C parameter, -h might mention C controls the amount of regularization, but it can leave out the precise mathematical form of the regularization and leave out a reference to the precise stuff (if the user wants that, go to the Doxygen, which probably shows the optimization problem and has a reference).

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 18 Mar 42097468 14:14 UTC
I would think the use of a citation in the runtime help would be something more like "This program implements ${SOMEMETHOD}. For more information see this paper: (then a citation)". I think a good percentage of users won't be going into the code necessarily but just want to run the program and get the results, so what -h gives might be all they ever see. Even so, the title of the method generally is enough to go find the relevant publications.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by jcline3 on 31 Mar 42097485 23:14 UTC
Why not just link to our documentation of the method? We can put all the information and references anyone could ever want there.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 26 Nov 42102651 03:41 UTC
So if we are referencing external documentation, do we have a robust way to do this? URLs change over time. I think that consensus seems to be that leaving out citations altogether in PROGRAM_INFO() is the way to go.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by speet3 on 31 Mar 42102713 09:38 UTC
I would think you are relatively safe linking to the mlpack.org API documentation, provided you can properly resolve the documentation URL for the method of interest. I would think that shouldn't be too hard, especially if the link is generated somehow rather than being a hard-coded string. Since the source is distributed with documentation, all you need to do as a fallback for something catastrophic (like changing the website URL) would be to add a tag at the end like " or consult the documentation for XXX provided with your distribution of mlpack." That also covers the case of having an outdated version of mlpack, but if you're looking for citation info, you should either have the latest version of mlpack or know which citation information you are looking for anyway.
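
A minimal sketch of the "generated link plus fallback tag" idea from the comment above. The function name `DocReference` and the constant `kDocBaseUrl` are hypothetical and are not part of mlpack; the only URL used is the site root mentioned in this thread, since no per-method documentation path is given here.

```cpp
#include <cassert>
#include <string>

// Assumption: a single base URL kept in one place, so a website move
// means changing one constant instead of every help string.
const std::string kDocBaseUrl = "http://www.mlpack.org";

// Hypothetical helper: build the documentation pointer for one method.
// The trailing clause is the fallback suggested in the thread, which still
// works if the website URL changes or the installed version is outdated.
std::string DocReference(const std::string& method)
{
  assert(!method.empty());  // A reference without a method name is useless.
  return "For more information on " + method + ", see " + kDocBaseUrl +
      " or consult the documentation for " + method +
      " provided with your distribution of mlpack.";
}
```

Calling `DocReference("nca")` yields one sentence that names both the website and the locally installed documentation, so the help text never depends on the URL alone.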

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 12 Feb 42162473 14:19 UTC
Okay, so following general consensus (no citations in -h, BibTeX citations in Doxygen comments, links to external documentation), I modified CLI so that it gives this output at the end of './program -h' (r11630):

For further information, including relevant papers, citations, and theory,
consult the documentation found at http://www.mlpack.org or included with your
distribution of MLPACK.

Then I removed all of the citations from the -h output and converted all non-BibTeX citations that I could find to BibTeX format (r11634-r11637).

So I think I can call this resolved (unless anyone has an issue with what I've done).
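
The resolution described above can be sketched roughly as follows. This is an illustration only, not the code from r11630: `HelpText`, `kHelpFooter`, and `NcaPlaceholder` are hypothetical names, while the footer wording and the BibTeX entry are copied as they appear in this thread. The citation sits in a Doxygen comment inside `@code ... @endcode`, and the help text ends with the shared footer.

```cpp
#include <cassert>
#include <string>

/**
 * Hypothetical stand-in for a method class. The citation lives in the
 * Doxygen comment rather than in the -h output.
 *
 * @code
 * @inproceedings{
 *   author = {Goldberger, Jacob and Roweis, Sam and Hinton, Geoff and
 *       Salakhutdinov, Ruslan},
 *   booktitle = {Advances in Neural Information Processing Systems 17},
 *   pages = {513--520},
 *   publisher = {MIT Press},
 *   title = {{Neighbourhood Components Analysis}},
 *   year = {2004}
 * }
 * @endcode
 */
struct NcaPlaceholder { };

// Footer wording quoted from the comment above; shared by every program.
const std::string kHelpFooter =
    "For further information, including relevant papers, citations, and "
    "theory,\n"
    "consult the documentation found at http://www.mlpack.org or included "
    "with your\n"
    "distribution of MLPACK.\n";

// Hypothetical helper: assemble the -h text from a program name and a
// citation-free description, then append the common footer.
std::string HelpText(const std::string& programName,
                     const std::string& description)
{
  assert(!programName.empty());
  return programName + "\n\n" + description + "\n\n" + kHelpFooter;
}
```

Keeping the footer in one constant means each program's description stays short and no program needs its own citation text in the help output.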
