
Please ensure proper citation practice and proper assignment of credit in your academic papers #30

Closed
fkiraly opened this issue Jul 27, 2022 · 14 comments

Comments

@fkiraly

fkiraly commented Jul 27, 2022

Kindly ensure you follow best academic citation practice in your academic papers, e.g., the accompanying paper here:
https://arxiv.org/abs/2207.03517

I'm aware that in industry it might not be common or expected to comment on prior art (it may even be considered counterproductive for marketing reasons), and that Nixtla is primarily a commercial venture. But since you are publishing on arXiv, you are at least making the claim of adhering to basic scientific standards, so I think you should at least give it a try.

A proper "literature review" or "prior art" section is the minimum in any scientific paper, and that means commenting on the specific context of your contribution, not just providing a generic list of packages in an appendix.

As past contributors to sktime, you are probably aware of its hierarchical framework functionality, and the fact that the algorithms you present are also implemented there. See, e.g., this conference presentation from April:
https://github.com/sktime/sktime-tutorial-pydata-berlin-2022

Perhaps even more important are the earlier Python packages for hierarchical forecasting, which have developed pertinent designs and which you don't even cite, e.g.:
https://github.com/carlomazzaferro/scikit-hts, FYI @carlomazzaferro
https://github.com/AngelPone/pyhts, FYI @AngelPone

And there is, of course, even more in R.

You can't claim to be a "reference" (in the title of your paper!) without making reference to what has come before. Pun intended.

From your mission statement: "We intend to continue maintaining and increasing the repository, promoting collaboration across the forecasting community."

I do hope to see that; giving due credit in your scientific papers would be a good start.

Let me know if you have any questions on best scientific practice.

@mergenthaler
Member

mergenthaler commented Jul 27, 2022

Thank you very much for your kind comments, @fkiraly.

We are still working on the details of the papers, and your comments are highly welcomed. We will include the proper citations to sktime, R, Darts, and to the valuable contributions that @carlomazzaferro and @AngelPone have made.

If you find any further omissions, we are more than happy to amend them in the spirit of best scientific practice.

@mergenthaler
Member

In the spirit of transparency and collaboration, @alexhallam also pointed out that the README didn't include a proper References section. (BTW: congrats on tablespoon)

We have updated the README accordingly and have also added a more extensive literature review to the paper. @fkiraly, we will let you know once the changes are publicly available in case you have any further observations.

Furthermore, this thread inspired us to make an empirical comparison of (some of) the available implementations, and we will be publishing the results of the experiments in the coming days.

@alexhallam

Thanks! Also great package! I will be using it a lot.

@mergenthaler
Member

mergenthaler commented Jul 29, 2022

@fkiraly, we updated the README and included a section on open-source libraries in HierarchicalForecast: A Reference Framework for Hierarchical Forecasting in Python. We think this closes the issue, but feel free to comment if something is missing.

Inspired by your comment and in the spirit of advancing the field, we take you up on your offer and will ask you some further questions regarding best scientific practice:

  1. We did not find any validations or experiments related to your hts re-implementation. We were curious whether that aligns with best scientific practice and, furthermore, with the recommendations of your own paper, which encourages replicability and demonstrations of effectiveness.

  2. Given that we weren’t able to find validations for the hts methods, we did some experiments ourselves; the comparison was incomplete due to sktime’s exclusion of the TopDown, MiddleOut, and ERM reconciliation strategies (see the sketch after this list for how such a reconciler line-up is assembled). For the methods that we could compare, and as shown in the results table, we found there might be some problems in your implementation.

2.1 We believe it is best scientific practice (or at least professional courtesy) to double-check with the specialized colleagues involved before assuming some malpractice or being condescending. In that spirit, we would like to invite you to help us identify possible mistakes we made in our experiments. We hope we did something wrong, but in the unlikely absence of experimental errors or mistakes from our side, we think it would be appropriate to fix the issue that might be (silently) affecting thousands of sktime users. As you can see in the table, the errors are particularly big for some datasets.

  3. One final note on proper citation practice and proper assignment of credit, after reading your pydata Berlin presentation, it seems that you might also have omitted some references in your notebook. Wouldn’t it be convenient to include further citations besides mentioning Hyndman’s book? To complement your notebook, feel free to look at our literature review.
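
For reference on point 2, a rough sketch of how such a reconciler line-up can be assembled is below. This is only an illustration: the class and keyword-argument names are assumptions based on HierarchicalForecast's documented API at the time and may differ between package versions, and the hierarchy level name is made up.

```python
# Hypothetical sketch only: class names and keyword arguments are assumed from the
# HierarchicalForecast documentation and may differ between package versions.
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, TopDown, MiddleOut, ERM

reconcilers = [
    BottomUp(),                              # aggregate bottom-level forecasts upwards
    TopDown(method="forecast_proportions"),  # disaggregate the top-level forecast
    MiddleOut(middle_level="Country/State",  # made-up level name, for illustration only
              top_down_method="forecast_proportions"),
    ERM(method="closed"),                    # empirical-risk-minimisation reconciliation
]

hrec = HierarchicalReconciliation(reconcilers=reconcilers)
# Y_hat_df: base forecasts, Y_df: in-sample values, S: summing matrix, tags: level labels.
# Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_df, S=S, tags=tags)
```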

@fkiraly
Author

fkiraly commented Oct 14, 2022

@mergenthaler, apologies, I forgot about this thread. Let me answer one by one.

We did not find any validations or experiments related to your hts re-implementation. We were curious whether that aligns with best scientific practice and, furthermore, with the recommendations of your own paper, which encourages replicability and demonstrations of effectiveness.

The way this aligns with best scientific practice:

  • sktime has not published a paper on the hts functionality, so no claims of correctness are made. You are right that, in principle, you would want to write a paper of that kind eventually, and the burden of proof is with the sktime authors.
  • the implementation is by an expert in the field, @ciaran-g. That is the lowest level of empirical evidence (D); your level of evidence is higher and justifies writing a paper.

@fkiraly
Author

fkiraly commented Oct 14, 2022

We believe it is best scientific practice (or at least professional courtesy) to double-check with the specialized colleagues involved before assuming some malpractice or being condescending.

There are multiple things here that you are mixing up:

  • bad citation practice -> nixtla
  • violation of best open source collaboration practice -> nixtla
  • honest scientific errors -> sktime

sktime might be accused of honest scientific errors, e.g., a mistake in the implementation. I don't think that is ethically problematic though, as long as these are honestly addressed and discussed.

Ethical issues, though, do surround what I think you were doing:

  • bad citation practice against better knowledge, i.e., not properly acknowledging people who worked on the same thing earlier. Ideas such as implementing a method in an open source package, interface designs, etc., are valuable in themselves. You can't just ignore prior work in the software space, even if the implementation is faulty.

  • sktime is an openly governed open source community, while nixtla is commercially controlled. You chose to reimplement hierarchical methods and then position them on social media etc. as "first of their kind" and the best thing since sliced bread, instead of participating in the community effort. Even though there is nothing formally wrong with that, it violates the communitarian/collaborative expectations of the open source field, because you are trying to take ownership and put it under the soft control of a company, instead of working towards a common good together.

@fkiraly
Author

fkiraly commented Oct 14, 2022

One final note on proper citation practice and proper assignment of credit, after reading your pydata Berlin presentation, it seems that you might also have omitted some references in your notebook. Wouldn’t it be convenient to include further citations besides mentioning Hyndman’s book?

The notebook is not a primary academic publication, it is a software tutorial.

For a software tutorial on methodology, I think a textbook reference is enough, or a reference to a pertinent academic paper if applicable. But we didn't write an academic paper, and (as far as I remember) we didn't make any scientific claims about our implementation being special or new, except that it exists and this is how you use it.

If you require a full review of all methods, we would basically have had to write multiple chapters of Hyndman's book in updated form to accompany the tutorial, which seems absurd.

Conversely, if you write a paper on the state of the art in implementing hierarchical methods - like you did - you need to do your homework.

@fkiraly
Author

fkiraly commented Oct 14, 2022

PS: speaking of collaborative open source expectations: if you suspect an error in the sktime reconcilers, it would be great if you opened a bug report on sktime and pinged the author, or helped to diagnose and fix it.

For instance, might the issues in your experiments with sktime be rooted in your use of a base estimator with a known bug, sktime/sktime#3162 (not mentioned in the table), rather than in the reconciliation methods themselves?

@mergenthaler
Member

Hi @fkiraly, thanks for taking the time to write such a complete response.

As mentioned three months ago, we have included all the proper citations in the README and the hierarchical forecasting work-in-progress paper. If you find anything missing, please let us know.

Regarding the rest of this thread: we feel your tone and intent are not aimed at improvement or collaboration. We appreciate your opinions, but we simply disagree with them.

Fortunately, neither you nor we are in a position to declare a final verdict on the other issues discussed: the eventual publication and validity of the paper will be decided by expert peer review, and whether or not we contribute to the Python time series community and satisfy its expectations will likewise not be defined by you but by the community itself. (If you still have a problem with the current status of the work-in-progress paper being on arXiv, you can alert them.)

We hope you have a great fall developer's day!

@Nixtla deleted a comment from kdgutier Oct 14, 2022
@ciaran-g

Hi, I know you have closed this issue, but I'm just posting my thoughts on it all. I'm also looking forward to trying out the probabilistic coherent methods you have implemented - very nice! 👍

While I was implementing the hierarchical reconcilers earlier this year, I was informally checking that they were consistent with fable via a notebook and some synthetic data. Having a look at your own experiments, I decided to clean it up a bit and expand it to use the labour dataset from your case study as well. I used a simple but equivalent AR forecaster for both sktime and fable, and you can see the results here.

The results confirm that the differences between fable and sktime in your experiments linked here, in terms of forecast accuracy, are down to problems with the base forecaster (as discussed and reported above), at least for the labour dataset. I think it would be fair to make that clear. Also, I see in the TODO section that you plan to use the same base forecasts for all comparisons, and I think this is probably the best way to compare only the hierarchical implementations of the different libraries.
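
As a rough sketch of that same-base-forecasts idea (a hypothetical helper, not code from any of the linked experiments; it assumes statsmodels' AutoReg and an arbitrary lag order), you would compute the base forecasts once and hand identical predictions to each library's reconciliation step:

```python
# Hypothetical helper: fit one AR model per series and forecast h steps ahead,
# so every library's reconciler is fed exactly the same base forecasts.
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg


def base_forecasts(y_train: pd.DataFrame, lags: int = 4, h: int = 8) -> pd.DataFrame:
    """y_train has one column per node of the hierarchy (bottom and aggregate levels)."""
    preds = {}
    for col in y_train.columns:
        res = AutoReg(y_train[col], lags=lags).fit()
        preds[col] = res.forecast(steps=h)
    return pd.DataFrame(preds)


# y_hat = base_forecasts(y_train)
# The same y_hat would then go through sktime's, fable's, and HierarchicalForecast's
# reconciliation steps, so only the reconciliation code differs between pipelines.
```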

Hope this is helpful and thanks for uploading the datasets and your experiments

@kdgutier
Collaborator

kdgutier commented Oct 21, 2022

Hi @ciaran-g

Do you know what the problem was with sktime's previous base forecasts?
Thanks for sharing such a careful study.

@mergenthaler
Member

@ciaran-g, thanks for the input; we will try to make that clearer.

@mergenthaler
Member

@ciaran-g, the disclaimer and links to your experiment have been included here. Is there anything else we should note or include?

@ciaran-g

Do you know what the problem was with sktime's previous base forecasts?

I think that issue is still active, unfortunately, @kdgutier, and it's more to do with the statsmodels forecaster. But having a look here, it seems like setting some starting parameters for the model might be a quick way forward.
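
Just to illustrate the starting-parameters idea in plain statsmodels (a hypothetical sketch on synthetic data; the actual fix inside the sktime wrapper may well look different), you can read off the optimiser's default starting values and pass an adjusted copy back into fit:

```python
# Hypothetical illustration with synthetic data; the real fix in sktime may differ.
import numpy as np
import pandas as pd
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

# Synthetic monthly series with trend and seasonality, standing in for one hierarchy node.
rng = np.random.default_rng(0)
t = np.arange(96)
y = pd.Series(
    50 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, size=96),
    index=pd.date_range("2010-01-01", periods=96, freq="MS"),
)

model = ETSModel(y, error="add", trend="add", seasonal="add", seasonal_periods=12)
print(model.start_params)  # default starting values the optimiser would use

# If the default fit lands in a poor optimum for some series, hand back an adjusted
# copy of the starting values (left unchanged here, just to show the mechanism).
res = model.fit(start_params=model.start_params, disp=False)
print(res.forecast(steps=12))
```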

Is there anything else we should note or include?

Looks good to me @mergenthaler, thanks for adding that!
