Years with fewer than four digits don't sort correctly #228

Closed
moewew opened this issue Jun 10, 2018 · 29 comments

@moewew
Contributor

moewew commented Jun 10, 2018

In the following MWE, it seems that years are padded on the right for sorting rather than on the left:

\documentclass[american]{article}
\usepackage{babel}
\usepackage{filecontents}
\usepackage[
  backend = biber,
  style = authoryear,
]{biblatex}
\addbibresource{\jobname.bib}
\begin{filecontents}{\jobname.bib}
@book{de:re:publica:0060,
  author = {Cicero},
  title  = {De re publica -- 60},
  date   = {0060},
}
@book{de:re:publica:0010,
  author = {Cicero},
  title  = {De re publica -- 10},
  date   = {0010},
}
@book{de:re:publica:0300,
  author = {Cicero},
  title  = {De re publica -- 300},
  date   = {0300},
}
@book{de:re:publica:0100,
  author = {Cicero},
  title  = {De re publica -- 100},
  date   = {0100},
}
@book{de:re:duplica,
  author = {Cicero},
  title  = {De re Duplica},
  date   = {2018},
}
\end{filecontents}

\begin{document}
\nocite{*}

\printbibliography
\end{document}

results in

Cicero (10). De re publica – 10.
— (100). De re publica – 100.
— (2018). De re Duplica.
— (300). De re publica – 300.
— (60). De re publica – 60.

I know that I could use

\DeclareSortingTemplate{nyt}{
  \sort{
    \field{presort}
  }
  \sort[final]{
    \field{sortkey}
  }
  \sort{
    \field{sortname}
    \field{author}
    \field{editor}
    \field{translator}
    \field{sorttitle}
    \field{title}
  }
  \sort{
    \field{sortyear}
    \field[padchar=0]{year}
  }
  \sort{
    \field{sorttitle}
    \field{title}
  }
  \sort{
    \field{volume}
    \literal{0}
  }
}

but somehow it feels weird that I would have to enable proper integer sorting for the year...

@plk
Owner

plk commented Jun 10, 2018

Looks like a bug in the sort key extraction code: it seems to be sorting year as a string. Looking into it.

plk added a commit to plk/biblatex that referenced this issue Jun 10, 2018
@moewew
Contributor Author

moewew commented Jun 10, 2018

I see you have added a commit to fix this. Unfortunately, the change leads to unexpected results in some edge cases. sortyear should take precedence over year; it should not live in a \sort section of its own.

\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\usepackage{csquotes}

\usepackage[style=authoryear, backend=biber]{biblatex}


\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@book{appleby,
  author   = {Humphrey Appleby},
  title    = {A Title},
  sortyear = {1980},
  date     = {1990},
}
@book{appleby:b,
  author   = {Humphrey Appleby},
  title    = {B Title},
  sortyear = {1980},
  date     = {1989},
}
\end{filecontents}

\addbibresource{\jobname.bib}

\iffalse
\DeclareSortingTemplate{nyt}{
  \sort{
    \field{presort}
  }
  \sort[final]{
    \field{sortkey}
  }
  \sort{
    \field{sortname}
    \field{author}
    \field{editor}
    \field{translator}
    \field{sorttitle}
    \field{title}
  }
  \sort{
    \field{sortyear}
  }
  \sort{
    \field{year}
  }
  \sort{
    \field{sorttitle}
    \field{title}
  }
  \sort{
    \field{volume}
    \literal{0}
  }
}
\fi

\begin{document}
\cite{appleby:b,appleby}
\printbibliography
\end{document}

As far as I can see, the problem is that sortyear is a literal and year is an integer, so we can't meaningfully compare the two when both are present. Is that correct?

@plk
Owner

plk commented Jun 10, 2018

It's not quite that. The problem is that the sorting data schema, which biber needs in order to build the internal structures for sort key extraction and generation, is currently not generated per entry. This is very complicated to fix in biber, as the particular field selected for a \sort would need to be tracked per entry to generate the correct sorting data schema. Thinking about it.

plk added a commit to plk/biblatex that referenced this issue Jun 10, 2018
@plk
Owner

plk commented Jun 10, 2018

It's surprising I didn't notice this before. However, it's really difficult to solve. Sorting needs to know the datatypes of what it is comparing, and it assumes that everything in a \sort has the same datatype (well, it really assumes that everything either is an integer or isn't). By design, sortyear isn't necessarily an integer. The construction of the sorting data schema uses this assumption as a shortcut: it detects the data type of a \sort set by looking only at its first element, and since sortyear is a literal, year is sorted as a literal too. This is more than a biber issue; it's a problem with the sorting algorithm itself. If sortyear were an integer datatype in the data model, it would be fine. What do you think about that solution?

@moewew
Contributor Author

moewew commented Jun 10, 2018

People (and in fact we, in biblatex-examples.bib) use things like sortyear = {1984-0} all the time. That would have to keep working, so it seems tricky to make sortyear an integer...

@plk
Owner

plk commented Jun 10, 2018

True. We have to change something, though, as year is more important than sortyear. This is rather intractable: we have to compare either numerically or alphabetically for each \sort, and currently neither option works, nor can it ever really work unless the data type is guaranteed by the data model or the sorting specification...

@moewew
Contributor Author

moewew commented Jun 10, 2018

We could just pad the year with zeros automatically and hope for the best. This should work for positive years; I'm not sure about negative years...

@plk
Owner

plk commented Jun 10, 2018

I really don't want to do that: that's what we used to do, and I switched to a better sorting algorithm because padding plus string sorting is awful with the expanded ISO date formats we now support. It seems to me that sortyear being a literal is strange anyway; it should probably be an integer. I suspect that the majority of these cases can be solved by simply adding another \sort after the sortyear/year one to discriminate further. This would be a somewhat breaking change, but I think it's better in the long term.
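A rough sketch of that suggestion, written as a modified nyt template: sortyear and year share one \sort (which only sorts correctly once biber can compare both as integers, as discussed above), and a further \sort discriminates within a year before falling back to title. Using month as that extra field is an assumption for illustration only; any field that genuinely orders works within a year could take its place.

```latex
% Sketch only: assumes sortyear and year are compared numerically.
\DeclareSortingTemplate{nyt}{
  \sort{
    \field{presort}
  }
  \sort[final]{
    \field{sortkey}
  }
  \sort{
    \field{sortname}
    \field{author}
    \field{editor}
    \field{translator}
    \field{sorttitle}
    \field{title}
  }
  % sortyear takes precedence over year within the same \sort
  \sort{
    \field{sortyear}
    \field{year}
  }
  % Extra discriminator within a year (month chosen for illustration)
  \sort{
    \field{month}
  }
  \sort{
    \field{sorttitle}
    \field{title}
  }
  \sort{
    \field{volume}
    \literal{0}
  }
}
```

Entries without a month simply contribute an empty key at that level, so they still fall through to the title \sort.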

@moewew
Contributor Author

moewew commented Jun 10, 2018

Conceptually that would be better, I agree. But I fear that rendering sortyear unusable would be too big a change. With integers you simply can't get fine-grained sorting like sortyear = {1984-1} vs. sortyear = {1984-2}.

@plk
Owner

plk commented Jun 10, 2018

True, but that's really an abuse of the field anyway. It essentially turns the correct semantic solution (a following \sort) into a hacky syntactic one. If there is an ordering within a year (which is exactly what this syntax is designed to express), then there should be a further month or season field, or something like that, to sort on.

@moewew
Contributor Author

moewew commented Jun 10, 2018

Yeah, in theory I agree. But in practice it can happen that one needs to control the year sorting and has no other semantic options available. Think of two @inbook entries by the same author in the same book, where you want the first chapter sorted before the second, but sorting by title would give the opposite result. Sure, I could add pages to the sorting scheme, but that would be ludicrous.

@plk
Owner

plk commented Jun 10, 2018

I'm not sure adding pages would be ludicrous in those circumstances: if, semantically, you want the paper that appears earlier in the collection listed first, it's the pages that determine that...

@moewew
Contributor Author

moewew commented Jun 10, 2018

Mhh, I really hoped I could win you over with pages (that's why I did not go for a volume example as in knuth:ct:a etc.) ;-).

Again, in principle I agree. But sortyear is a really well-established hack (even biblatex-examples.bib has 10 instances of it), and I'm really wary of getting rid of it.

@plk
Owner

plk commented Jun 11, 2018

There is no way to make this work perfectly with hacked sortyears, and I think the current situation is the worst of all. In general, sortX fields have the same datatype as the X field; sortyear is the exception, and I think we need to fix that. I propose that we:

  • Allow sortyear to be an int or a str, but with a note that if it doesn't parse as an int, sorting will not be guaranteed
  • Coerce sortyear to an int in some reasonable way

@moewew
Contributor Author

moewew commented Jun 11, 2018

I can't think of any way to coerce sortyear to an int while keeping common sortyear idioms such as 1986-00, 1986-01 sorting as expected.

@plk
Owner

plk commented Jun 11, 2018

There isn't really a generalisable way, but the current sortyear hacking is horrible and exactly the sort of thing biblatex was designed to avoid. Since it is mostly used to sort collections before collection items and the like, isn't volume really meant for this?

@plk
Owner

plk commented Jun 11, 2018

For example, take the Nietzsche texts in examples.bib. If you remove the sortyear from them, you get the same results, because we already sort by volume after the year. So is the hack even needed there?

@moewew
Contributor Author

moewew commented Jun 11, 2018

For the entries in biblatex-examples.bib, volume sorting should indeed do the right thing. In general, I assume people should be able to define a proper sorting scheme and express it as a \DeclareSortingTemplate that sorts their bibliography as expected without resorting to sortyear.

I still believe that sortyear hacking is a viable way to deal with some situations. The question really is how many users would be affected and how many things we are going to break badly with this. I have no idea how many people use sortyear. I'd have thought its use is not entirely unusual, but I may well suffer from sample bias.

@plk
Owner

plk commented Jun 11, 2018

I honestly can't imagine that much would break, as people using sortyear would naturally want it to compare stringwise with year. I'd rather break it and advise people on a case-by-case basis to use a proper sorting template. To my mind the current situation is much worse: year sorting is completely broken, and it's just an accident of string sorting that it works for current-millennium years.

@moewew
Contributor Author

moewew commented Jun 11, 2018

Maybe we should at least start a short survey on comp.text.tex to inquire how widely used sortyear is.

@plk
Owner

plk commented Jun 11, 2018

OK, do you want to do that? I can prepare the changes as a single commit in DEV so they can be tested and, if necessary, reverted.

@moewew
Contributor Author

moewew commented Jun 11, 2018

OK, will do.

Edit: posted to c.t.t: https://groups.google.com/d/msg/comp.text.tex/CVSosV6gEiw/_C3sjunmAgAJ

plk added a commit to plk/biblatex that referenced this issue Jun 11, 2018
plk added a commit that referenced this issue Jun 11, 2018
@u-fischer

Can't sortyear be a float? And 1984-01 be interpreted as 1984.01?

(I personally never used sortyear, so simply changing it to int is fine for me too).

@plk
Owner

plk commented Jun 12, 2018

Yes, there are some hacks like this that could be done, but it won't help much, as that's only one of the possible formats. Also, if it were a float, year would need to be a float too, and that slows down comparisons etc.

@moewew
Contributor Author

moewew commented Jun 12, 2018

But it would give people a workaround to salvage their hacks. If floats are too slow, we could go with a fixed number of decimal places...

For the benefit of future me: the examples in biblatex-examples.bib still sort as expected without sortyear because of the sorttitle field. But at least the knuth:ct:... examples would still work as expected with nyvt even without sorttitle. In particular, some things can be fixed with sorttitle if sortyear is no longer available.
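A minimal sketch of the sorttitle route, applied to the two-chapter case raised earlier in this thread. The entry keys, titles and booktitle are hypothetical. Under the standard nyt template, entries with the same author and year fall through to the sorttitle/title \sort, so a numeric prefix in sorttitle should be enough to put the chapters in book order without touching sortyear.

```latex
% Hypothetical entries: same author, same book, same year.
% Sorting by title alone would put "Abeyance" before "Zeal";
% the sorttitle prefixes force chapter order instead.
@inbook{appleby:zeal,
  author    = {Humphrey Appleby},
  title     = {Zeal},
  sorttitle = {01 Zeal},
  booktitle = {Collected Memoranda},
  date      = {1980},
}
@inbook{appleby:abeyance,
  author    = {Humphrey Appleby},
  title     = {Abeyance},
  sorttitle = {02 Abeyance},
  booktitle = {Collected Memoranda},
  date      = {1980},
}
```

Because sorttitle only affects the sort key, the printed titles stay unchanged.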

@plk
Owner

plk commented Jun 12, 2018

An issue raised by the Knuth works is that their volume is a string, although the field's default datatype is an int. I think I may parse int fields the way we parse ranges, converting them to numbers for sorting.

@moewew
Contributor Author

moewew commented Jun 12, 2018

According to the docs, that already happens:

The volume of a multi-volume book or a periodical. It is expected to be an integer, not necessarily in arabic numerals since biber will automatically [convert] from roman numerals or arabic letter to integers internally for sorting purposes.

@plk
Owner

plk commented Jun 12, 2018

Ah, yes, I see I already did this ...

@u-fischer

By coincidence, I just got a question about this with a real example. The user wanted to sort a number of reports manually and had used year={2011a} and year={2011b}, which didn't work. Remembering this discussion, I did not suggest sortyear={2011-a} or something similar; after thinking about it for a bit, I now believe that @moewew's suggestion to use sorttitle is actually one of the logical solutions (inserting an extra field into the sort order would be another).


3 participants