
Support for Vancouver list style specified via a template on the page (similar to {{Use dmy dates}} template) #4236

Merged
merged 24 commits into from Dec 14, 2023


maximmasiutin
Contributor

@maximmasiutin maximmasiutin commented Dec 6, 2023

If the page contains the {{Use vanc name-list-style}} template, the bot uses the |vauthors= and |veditors= parameters rather than firstN/lastN and editor-firstN/editor-lastN. This is similar to the {{Use dmy dates}} template, where Citation Bot uses the date format specified on the page. To reproduce this behaviour, edit a page on Wikipedia, add the {{Use vanc name-list-style}} template (or {{Use vanc name-list-style|date=December 2023}}), delete the author names (firstN/lastN), and run the bot. It will fill in the names as vauthors.

Update: there is a discussion at https://en.wikipedia.org/wiki/User_talk:Citation_bot#Specifying%20name%20list%20style%20for%20newly-added%20name%20entries
on what template to use as a hint.

@GlazerMann
Collaborator

changing the existing authors will get the bot banned.

@maximmasiutin
Contributor Author

changing the existing authors will get the bot banned.

It does not change existing authors. It only fills in the authors if they are missing and the page has the {{Use vanc name-list-style}} template; at least, that is the intent.

@maximmasiutin
Contributor Author

If there is no {{Use vanc name-list-style}} template, the bot works the same way as before.

@maximmasiutin
Contributor Author

So for newly filled authors, this variable selects the style used when filling in the author list.

@maximmasiutin
Contributor Author

Currently almost all articles on molecular biology and medicine use the name list style |vauthors=
See examples at:

So it is a common style; there is no need to change anything that already exists, just use it for new refs on a page that has the {{Use vanc name-list-style}} template.

@maximmasiutin
Contributor Author

I made the required change, and now it only fills the vauthors parameter if no authors or editors were already filled in and {{Use vanc name-list-style}} is set.
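The guard described here can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function names `has_existing_names` and `style_for_new_names`, the parameter-array shape, and the (non-exhaustive) list of name parameters are all assumptions.

```php
<?php
// Hypothetical helper: returns true when a citation already carries any
// author or editor information, in which case nothing should be filled.
function has_existing_names(array $params): bool
{
    // Assumed, non-exhaustive patterns for CS1 name parameters.
    $pattern = '/^(?:vauthors|veditors|authors?\d*|editors?\d*'
             . '|(?:editor-?)?(?:first|last)\d*)$/i';
    foreach ($params as $name => $value) {
        if (preg_match($pattern, (string) $name) === 1 && trim((string) $value) !== '') {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: decide which style to use for newly added names,
// or null when existing names must be left alone.
function style_for_new_names(array $params, bool $pageWantsVanc): ?string
{
    if (has_existing_names($params)) {
        return null; // never touch names that an editor already supplied
    }
    return $pageWantsVanc ? 'vanc' : 'default';
}
```

Returning null for citations that already have names keeps the "do not change existing authors" rule in one place, separate from the choice of style.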

@maximmasiutin
Contributor Author

@GlazerMann You can test with php ./process_page.php "CYP303A1" --slow --savetofiles to see that it only fills in authors where they were missing, using the style specified by the template.

@GlazerMann
Collaborator

Please add some tests to the Template tests.

@maximmasiutin
Contributor Author

Please add some tests to the Template tests.

Thank you, let me do that!

@maximmasiutin
Contributor Author

I tried to make the tests work but could not figure out how they expand citations; expansion only worked for me when I ran the software, not in the tests. Could you please add these pages to the tests? As with date formats, the expansion depends on a config template used on the page:

By default, the authors are expanded to first1/last1

input page:
{{Use dmy dates|date=December 2023}}<ref>{{cite journal | title = The microRNA miR-184 regulates the CYP303A1 transcript level to control molting of Locusta migratoria | journal = Insect Science | date = June 2020 | volume = 28 | issue = 4 | pages = 941–951 | pmid = 32524775 | doi = 10.1111/1744-7917.12837 | s2cid = 219588137 }}</ref>
output page:
{{Use dmy dates|date=December 2023}}<ref>{{cite journal | title = The microRNA miR-184 regulates the CYP303A1 transcript level to control molting of Locusta migratoria | journal = Insect Science | date = June 2020 | volume = 28 | issue = 4 | pages = 941–951 | pmid = 32524775 | doi = 10.1111/1744-7917.12837 | s2cid = 219588137 | last1 = Wang | first1 = Yan‐Li | last2 = Wu | first2 = Li‐Xian | last3 = Li | first3 = Hui‐Yong | last4 = Wen | first4 = Xue‐Qin | last5 = Ma | first5 = En‐Bo | last6 = Zhu | first6 = Kun‐Yan | last7 = Zhang | first7 = Jian‐Zhen }}</ref>

On a page with the {{Use vanc name-list-style}} template, the authors are expanded to vauthors

input page:
{{Use vanc name-list-style|date=December 2023}}<ref>{{cite journal | title = The microRNA miR-184 regulates the CYP303A1 transcript level to control molting of Locusta migratoria | journal = Insect Science | date = June 2020 | volume = 28 | issue = 4 | pages = 941–951 | pmid = 32524775 | doi = 10.1111/1744-7917.12837 | s2cid = 219588137 }}</ref>
output page:
{{Use vanc name-list-style|date=December 2023}}<ref>{{cite journal | title = The microRNA miR-184 regulates the CYP303A1 transcript level to control molting of Locusta migratoria | journal = Insect Science | date = June 2020 | volume = 28 | issue = 4 | pages = 941–951 | pmid = 32524775 | doi = 10.1111/1744-7917.12837 | s2cid = 219588137 | vauthors = Wang Y, Wu L, Li H, Wen X, Ma E, Zhu K, Zhang J }}</ref>
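To make the difference between the two outputs above concrete, here is a rough sketch of turning first/last pairs into a |vauthors= value. The function name `to_vauthors` is hypothetical, and the initial-extraction rule is an assumption chosen to match the sample output (one initial per whitespace-separated given name, so a hyphenated given name yields a single initial); the bot's real logic may differ.

```php
<?php
// Hypothetical sketch: build a Vancouver-style name list ("Wang Y, Wu L")
// from [last, first] pairs. Not the bot's actual implementation.
function to_vauthors(array $authors): string
{
    $names = [];
    foreach ($authors as [$last, $first]) {
        $initials = '';
        // Assumption: one initial per whitespace-separated given name.
        foreach (preg_split('/\s+/u', trim($first), -1, PREG_SPLIT_NO_EMPTY) as $part) {
            // Take the first UTF-8 character of the name part.
            if (preg_match('/^./u', $part, $m) === 1) {
                $initials .= $m[0];
            }
        }
        $names[] = trim($last . ' ' . $initials);
    }
    return implode(', ', $names);
}

echo to_vauthors([['Wang', 'Yan-Li'], ['Wu', 'Li-Xian']]), "\n"; // prints: Wang Y, Wu L
```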

When the authors are already expanded, they don't change, no matter which template we use:

Input page:
{{Use vanc name-list-style|date=December 2023}}<ref>{{cite journal | title = The microRNA miR-184 regulates the CYP303A1 transcript level to control molting of Locusta migratoria | journal = Insect Science | date = June 2020 | volume = 28 | issue = 4 | pages = 941–951 | pmid = 32524775 | doi = 10.1111/1744-7917.12837 | s2cid = 219588137 | last1 = Wang | first1 = Yan‐Li | last2 = Wu | first2 = Li‐Xian | last3 = Li | first3 = Hui‐Yong | last4 = Wen | first4 = Xue‐Qin | last5 = Ma | first5 = En‐Bo | last6 = Zhu | first6 = Kun‐Yan | last7 = Zhang | first7 = Jian‐Zhen }}</ref>
Output page:
{{Use vanc name-list-style|date=December 2023}}<ref>{{cite journal | title = The microRNA miR-184 regulates the CYP303A1 transcript level to control molting of Locusta migratoria | journal = Insect Science | date = June 2020 | volume = 28 | issue = 4 | pages = 941–951 | pmid = 32524775 | doi = 10.1111/1744-7917.12837 | s2cid = 219588137 | last1 = Wang | first1 = Yan‐Li | last2 = Wu | first2 = Li‐Xian | last3 = Li | first3 = Hui‐Yong | last4 = Wen | first4 = Xue‐Qin | last5 = Ma | first5 = En‐Bo | last6 = Zhu | first6 = Kun‐Yan | last7 = Zhang | first7 = Jian‐Zhen }}</ref>

@GlazerMann
Collaborator

I will be away from my desk for a couple of days. Some of the "Bot Full Test Suite" tests are broken and some work only half the time. That's a problem with tests that depend upon others.

@maximmasiutin
Contributor Author

I will run these tests as well; maybe I will have some ideas. It is better to avoid 8-bit and Unicode characters as literals in the source, or they may later be distorted. It is better to use a 7-bit representation, such as HTML entities, and decode them in place, e.g. html_entity_decode("&#12103;");
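A minimal sketch of the 7-bit approach suggested here. It uses PHP's html_entity_decode, which handles numeric character references; htmlspecialchars_decode only converts a handful of special characters, so it leaves such a reference untouched. The flags and variable name are illustrative choices, not taken from the bot's tests.

```php
<?php
// Keep the source file 7-bit clean: write the character as a numeric
// character reference (including its trailing semicolon) and decode it
// at run time.
$decoded = html_entity_decode('&#12103;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// 12103 is 0x2F47, so the result is the single code point U+2F47,
// which is three bytes long in UTF-8.
var_dump($decoded === "\u{2F47}"); // bool(true)
var_dump(strlen($decoded));        // int(3)

// htmlspecialchars_decode() only converts a few special entities
// (&amp; &lt; &gt; and quotes), so the reference comes back unchanged.
var_dump(htmlspecialchars_decode('&#12103;') === '&#12103;'); // bool(true)
```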

@maximmasiutin
Contributor Author

I will be away from my desk for a couple of days. Some of the "Bot Full Test Suite" tests are broken and some work only half the time. That's a problem with tests that depend upon others.

I fixed the two character conversion tests that failed on the server, and made one test (the one with Google Books dates) fail verbosely.

However, those character conversion tests are written incorrectly: you cannot convert immediate string literals that way. If you wish, I will change the other 3 conversion tests the same way, because they may also fail should something change in the configuration.

As for the data requests, I suggest you configure the GitHub API; maybe then they will not fail. I don't know why the bot doesn't request data for some tests. It didn't request data in my case either.

@maximmasiutin
Contributor Author

changing the existing authors will get the bot banned.

Do you know why the following code is commented out?

  if ($works === FALSE) {
    // $cache_bad[$doi] = TRUE; do not store to save memory
    return FALSE;
  }

Is memory really that low? With this line commented out, bad responses are not cached, and checking an article takes a really long time. There is a timeout, and the bot tries over and over again for hours.

Take, for example, an article [[Cholesterol]] https://en.wikipedia.org/wiki/Cholesterol

It has the following citation:

{{cite journal | vauthors = Jesch ED, Carr TP | title = Food Ingredients That Inhibit Cholesterol Absorption | journal = Preventive Nutrition and Food Science | volume = 22 | issue = 2 | pages = 67–80 | date = June 2017 | pmid = 28702423 | pmc = 5503415 | doi = 10.3746/pnf.2017.22.2.67 }}

The URL https://doi.org/10.3746/pnf.2017.22.2.67 takes a long time to respond and ends with a timeout, but the bot tries again.

I have a feeling that all bad responses should be cached.
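The idea of caching bad responses can be sketched as below. This is a hypothetical illustration only: the class `DoiLookupCache`, its method, and the simulated lookup are not taken from the bot's source, and the real code may already cache failures elsewhere.

```php
<?php
// Hypothetical sketch of negative caching for DOI lookups; the class and
// method names are illustrative and do not come from the bot's source.
final class DoiLookupCache
{
    /** @var array<string, bool> DOI => whether the lookup succeeded */
    private array $seen = [];

    /**
     * @param callable(string): bool $fetch performs the real (slow) lookup
     */
    public function works(string $doi, callable $fetch): bool
    {
        $key = strtolower($doi); // DOIs are case-insensitive
        if (!array_key_exists($key, $this->seen)) {
            // Store failures as well as successes, so a DOI that timed out
            // once is not requested again during this run.
            $this->seen[$key] = (bool) $fetch($doi);
        }
        return $this->seen[$key];
    }
}

$calls = 0;
$alwaysFails = function (string $doi) use (&$calls): bool {
    $calls++;     // count how often the slow path is taken
    return false; // simulate a timeout or bad response
};

$cache = new DoiLookupCache();
$cache->works('10.3746/pnf.2017.22.2.67', $alwaysFails);
$cache->works('10.3746/PNF.2017.22.2.67', $alwaysFails);
echo $calls, "\n"; // prints 1: the second call was answered from the cache
```

The cost is one array entry per distinct DOI, which is the memory trade-off the commented-out line refers to.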

Also, there is little use in waiting 1-3-5 seconds after a 20-second timeout and then trying again about 10 times; the URL is unlikely to work. There is probably a reason to sleep when the request rate is too high and we get an error like 403, but after a timeout there is little point in sleeping.
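One way to express this distinction is a small retry policy that backs off only when the server signals rate limiting and gives up immediately after a timeout. The function name `retry_delay`, the use of status 0 for "no HTTP response", the attempt limit, and the inclusion of 429 alongside 403 are assumptions made for this sketch.

```php
<?php
// Hypothetical retry policy: how many seconds to sleep before the next
// attempt, or null to give up. Status 0 stands for "no HTTP response"
// (for example a connection timeout); this encoding is an assumption.
function retry_delay(int $httpStatus, int $attempt, int $maxAttempts = 3): ?int
{
    if ($attempt >= $maxAttempts) {
        return null;          // retry budget exhausted
    }
    if ($httpStatus === 0) {
        return null;          // timed out: sleeping is unlikely to help
    }
    if ($httpStatus === 403 || $httpStatus === 429) {
        return 2 ** $attempt; // rate limited: back off 2, 4, ... seconds
    }
    if ($httpStatus >= 500) {
        return 1;             // transient server error: short pause
    }
    return null;              // other client errors are not retried
}
```

A caller would sleep for the returned number of seconds and try again, or stop (and, ideally, remember the failure) when the result is null.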

Can you please run the bot on the page [[Cholesterol]] in "slow" mode and see how it processes this page?

@maximmasiutin
Contributor Author

There is a discussion at https://en.wikipedia.org/wiki/User_talk:Citation_bot#Specifying%20name%20list%20style%20for%20newly-added%20name%20entries on which template to support. Please let the discussion come to a consensus.

@maximmasiutin
Contributor Author

I updated the pull request according to the discussion at https://en.wikipedia.org/wiki/User_talk:Citation_bot#Specifying%20name%20list%20style%20for%20newly-added%20name%20entries

It now uses the template {{cs1 config|name-list-style=vanc}}, as discussed on Wikipedia.
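Detecting this page-level hint could look roughly like the sketch below. The function name `page_wants_vanc` and the regular expression are illustrative assumptions; the bot's own matching rules (for example, which template redirects it accepts) may be different.

```php
<?php
// Hypothetical detector for {{cs1 config|name-list-style=vanc}}.
// The regular expression is an illustrative assumption, not the bot's code.
function page_wants_vanc(string $wikitext): bool
{
    $pattern = '/\{\{\s*cs1[ _]config\s*\|[^{}]*?\bname-list-style\s*=\s*vanc\s*(?:\||\}\})/i';
    return preg_match($pattern, $wikitext) === 1;
}

var_dump(page_wants_vanc('{{cs1 config|name-list-style=vanc}}'));  // bool(true)
var_dump(page_wants_vanc('{{Use dmy dates|date=December 2023}}')); // bool(false)
```

The pattern tolerates extra whitespace and other parameters before name-list-style, but requires the value to be exactly vanc.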

@maximmasiutin
Contributor Author

@GlazerMann - can you please approve the pull request, as we reached consensus on Wikipedia on this feature, see https://en.wikipedia.org/wiki/User_talk:Citation_bot#Specifying%20name%20list%20style%20for%20newly-added%20name%20entries

@GlazerMann
Collaborator

GlazerMann commented Dec 14, 2023

There is an error with a typo. The static tests found it. Undefined variable: $tkf. The source view even highlights the errors these days.

@GlazerMann
Collaborator

You asked about this line " // $cache_bad[$doi] = TRUE; do not store to save memory". The reason those are not cached is that they are already cached in the dx.doi.org check, so there is no need to cache the failure in the crossref check array.

@GlazerMann
Collaborator

I changed:
if (is_string($tfk) && (strlen($tkf) > 0)) {
to
if (is_string($tfk) && (strlen($tfk) > 0)) {

@maximmasiutin
Contributor Author

Oops, sorry, let me retest that. I tested, but then fixed a style error and did not retest.

@maximmasiutin
Contributor Author

@GlazerMann -- I've just checked it and it works properly.

citation-bot % php ./process_page.php "CYP303A1" --slow --savetofiles

It should save the file CYP303A1.md; if you compare it with the original content, you will see added vauthors parameters, such as | vauthors = Wu L, Jia Q, Zhang X, Zhang X, Liu S, Park Y, Feyereisen R, Zhu KY, Ma E, Zhang J, Li S, etc.

Now it should be OK. Please see https://en.wikipedia.org/wiki/CYP303A1 for sample citations to which the bot, when it processes the page, should add vauthors rather than first1/last1, etc.

@maximmasiutin
Contributor Author

You asked about this line " // $cache_bad[$doi] = TRUE; do not store to save memory". The reason those are not cached is that they are already cached in the dx.doi.org check, so there is no need to cache the failure in the crossref check array.

Could you please add a source code comment that explains that? It was not quite clear why that line was commented out.

@GlazerMann
Collaborator

I added a bunch of tests and fixed some bugs

@maximmasiutin
Contributor Author

I added a bunch of tests and fixed some bugs

How can I test them?

@GlazerMann
Collaborator

The tests are in the test suite and run as part of the "Bot Full Test Suite".

@GlazerMann
Collaborator

Please verify that the tests look correct. I have included some extra stuff such as {{!}} to make sure that the test is a bit more like a real page.

@GlazerMann
Collaborator

The tests all need to be changed. The DOI is not in crossref, so the tests are unreliable.

@GlazerMann GlazerMann merged commit 6fa3d81 into ms609:master Dec 14, 2023
3 of 6 checks passed
@maximmasiutin
Contributor Author

I've tried the updated master on the CYP303A1 page at Wikipedia and it works correctly, thank you very much! If I should review anything else, please let me know.
