New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Theory citation function #123
Conversation
One issue that has occurred to me: |
One idea would be to have all parameters stored not in a traditional dict, but in some new subclass of class dict_cite(dict):
def __getitem__(self, key):
citations.register_key(key)
return dict.__getitem__(self, key)
def get(self, key, *args, **kwargs):
citations.register_key(key)
return dict.get(self, key, *args, **kwargs) But this might also lead to some problems. E.g. if you show the content of such a dict in a Jupyter notebook, it would access all keys and thus register the references for all its items. One work around would be to allow switching on and off the |
Yes, something exactly like AwareDict would work - a ParameterCitationsAwareDict as it were (a snappier name might be in order). Regarding the inspire info for the parameters, currently the parameter data is split over 3 files: parameters_uncorrelated, parameters_correlated, and parameter_metadata. We could just put the inspire key into the metadata file, but this seems bad since the references could easily not be updated when the numbers get changed. |
Thanks, these are all good points! But since they are only indirectly related to the theory citation function, I opened the new issue #124 for further discussion. |
Notes for myself on what I'm doing
|
So I've now gone through and added citations based arxiv references in the comments / docstrings. The release notes for v2 mention a few specific papers - for the Higgs physics I've added citations for the paper by David and Adam Falkowski, while for the beta decay and the updated B->D(*) form factors the "new stuff" is basically just in new parameters, which goes back to our discussion earlier about citing parameters (and then became #124). There are two things which seem like they should come from somewhere specific but have no comments:
Both observables were added by @DavidMStraub so he should know. |
Also add license notice
By default, just cite the flavio paper. Removed the read_citation method, as rather than reading the bibtex from a file, it seems better to just ouput the INSPIRE texkeys and worry about getting bibtex from INSPIRE later. (E.g. if you print to a .tex file, you could use INSPIRE's bibliography generator at https://inspirehep.net/bibliography-generator, or if fixed David's inspiretools python package.)
The way flavio does Gamma_12 is based on the a,b,c notation originally introduced in the 2003 BBLN paper
Gives the theory papers to be cited for an observable, using a throwaway instance. As part of this, move the registering of the flavio paper out of the constructor so you don't get that as well.
Since this only gets used here, just do it in the constructor. (In the original PyBaMM version the _reset just gets used for testing, which I think we want to do slightly differently anyway.)
This reverts commit a6a545e. Yeah, we need the reset method for testing...
Return something instead of printing directly, unless outputting to a file.
Note the CP asymmetry needs citations adding, wherever the numbers in ka1_r, ka2_r, ka1_i came from.
Just the Inami-Lim function - everything else worth citing comes in through parameters (fB, B, RG factors)
Allow for arguments to be passed to prediction function. Also change the name as you are really getting the citations for a general theory prediction, not just the SM.
Forgot to update the test when I renamed the function
That's also good points! Thread safety might actually be important since multiprocessing is even used within flavio, e.g. when computing covariance matrices. And it would be quite inconvenient if the citation feature would not work in such a case. Getting all the references for computing a covariance matrix seems to be a reasonable task that should not require computing the covariance matrix on a single thread. But how can the citation feature be made thread safe? I have never really thought about such problems before but I think it requires storing the references in some object that can be shared between different processes. One possibility that I found is to use import multiprocessing
manager = multiprocessing.Manager()
shared_dict = manager.dict()
def job(string):
shared_dict[string] = None
pool = multiprocessing.Pool(2)
pool.map(job, ['a', 'b', 'c', 'a'])
pool.close()
pool.join() Then import multiprocessing
class Citations:
_papers_to_cite = multiprocessing.Manager().dict()
@classmethod
def reset(cls):
cls._papers_to_cite.clear()
@classmethod
def register(cls, inspire_key):
cls._papers_to_cite[inspire_key] = None
@classmethod
def to_set(cls):
return set(cls._papers_to_cite.keys())
@classmethod
def to_string(cls):
return ",".join(cls._papers_to_cite.keys())
from .citations import Citations and references could be added by simply calling flavio.Citations.register('my_inspire_key') I don't know if it's actually a good idea to implement it like this, but a first try works as expected: import multiprocessing
import flavio
def job(string):
flavio.Citations.register(string)
pool = multiprocessing.Pool(2)
pool.map(job, ['a', 'b', 'c', 'a'])
pool.close()
pool.join() Then |
I agree that having a class with only class methods was maybe not the best idea. I think it's a very good idea to use a module. But this module can actually be a class instance that has nice features like e.g. properties. This is described e.g. at https://stackoverflow.com/questions/2447353/getattr-on-a-module/7668273#7668273 after "Update" . I have a working implementation based on this that makes the following things possible:
All of this is achieved with the following from multiprocessing import Manager
import flavio
import sys
class CitationScope(object):
def __enter__(self):
self._citations_global = flavio.citations._papers_to_cite
flavio.citations._papers_to_cite = flavio.citations._manager.dict()
return flavio.citations
def __exit__(self, type, value, traceback):
flavio.citations._papers_to_cite = self._citations_global
class Citations:
_manager = Manager()
scope = CitationScope()
def __init__(self):
self._papers_to_cite = self._manager.dict()
self.register("Straub:2018kue")
def __iter__(self):
for citation in self._papers_to_cite.keys():
yield citation
def __str__(self):
return ",".join(self._papers_to_cite.keys())
@property
def string(self):
return str(self)
@property
def set(self):
return set(self)
def register(self, inspire_key):
self._papers_to_cite[inspire_key] = None
def clear(self):
self._papers_to_cite.clear()
def reset(self):
self.clear()
self.register("Straub:2018kue")
def switch_manager(self, manager):
self._manager = manager
self.__init__()
sys.modules[__name__] = Citations() If we agree that this is actually a good solution for the points we were discussing, I can push my version to the |
I agree, the context manager was not very convincing. I have implemented your suggestions, which has actually the nice side effect that since the context manager is a method now, it can receive a with flavio.citations.collect(multiprocess.Manager()) as citations:
pool = multiprocess.Pool(2)
pool.map(job, ['a','b','c','a'])
pool.close()
pool.join() this manager is also passed on to the context manager used inside the def theory_citations(self, *args, **kwargs):
with flavio.citations.collect() as citations:
flavio.sm_prediction(self.name, *args, **kwargs)
return citations.set I have made also some other improvements, in particular one doesn't lose the references anymore when using the from multiprocessing import Manager
import flavio
import sys
from itertools import chain
class CitationScope:
def __init__(self, manager=None):
if manager is not None:
self._manager = manager
else:
self._manager = flavio.citations._manager
def __enter__(self):
self._citations_global = flavio.citations
flavio.citations = Citations(self._manager)
return flavio.citations
def __exit__(self, type, value, traceback):
flavio.citations = self._citations_global
class Citations:
collect = CitationScope
def __init__(self, manager, initial_citations=[], copied_citations=[]):
self._manager = manager
self._initial_citations = initial_citations
self._papers_to_cite = self._manager.dict()
self._papers_to_cite.update(
{k:None for k in chain(initial_citations, copied_citations)}
)
def __iter__(self):
for citation in self._papers_to_cite.keys():
yield citation
def __str__(self):
return ",".join(self._papers_to_cite.keys())
@property
def string(self):
return str(self)
@property
def set(self):
return set(self)
def register(self, inspire_key):
self._papers_to_cite[inspire_key] = None
def clear(self):
self._papers_to_cite.clear()
def reset(self):
self.clear()
self._papers_to_cite.update({k:None for k in self._initial_citations})
def switch_manager(self, manager):
self.__init__(manager, self._initial_citations, self.set)
sys.modules[__name__] = Citations(Manager(), ["Straub:2018kue"]) |
Okay, having caught up with the discussion it all sounds good, and the new code looks nice to me. |
I have thought about this again and also considered the computing speed of the approach. I noticed that these I played around a bit with other shared objects that might be faster. I tried out Another shared object that is even faster is So in the end, I think we have the following possibilities:
@MJKirk @DavidMStraub what do you think? |
It seems to me that it's not just import flavio
from flavio.statistics.likelihood import FastLikelihood
from wilson import Wilson
import flavio.plots as fpl
my_obs = (
("<Rmue>(B+->Kll)", 1.1, 6.0),
("<Rmue>(B0->K*ll)", 0.045, 1.1),
("<Rmue>(B0->K*ll)", 1.1, 6.0)
)
L = FastLikelihood(name = "likelihood test", observables = my_obs)
L.make_measurement(threads = 4)
def my_LL(wcs):
ReC9mu, ImC9mu = wcs
par = flavio.default_parameters.get_central_all()
wc = Wilson({"C9_bsmumu" : ReC9mu + 1j* ImC9mu, "C10_bsmumu" : -(ReC9mu + 1j*ImC9mu)},
scale = 4.8, eft = "WET", basis = "flavio")
return L.log_likelihood(par, wc)
C9contour_data = fpl.likelihood_contour_data(my_LL, -2, 0, -3.5, 3.5, n_sigma = (1, 2), threads = 4)
print(flavio.default_citations) The final line just prints Have you done any "realistic" tests of how much the |
%%timeit
flavio.sm_prediction("BR(Bs->mumu)") yields around 470 μs with the conventional For theory predictions that have to call %%timeit
flavio.sm_prediction("<Rmue>(B0->K*ll)", 0.045, 1.1) yields around 170 ms for both Concerning the regexp stuff used by the |
Okay, yeah, that's too much of a slowdown. Darn. It seems to me that the |
7056f71
to
21ba0b0
Compare
OK, I've now pushed a new version using @MJKirk @DavidMStraub what do you think about this solution? |
Generally I'm happy. I don't think it's a problem that you can only cite papers within the main flavio source code, since the "point" of this feature is to cite the source of flavio;s internals. (Although actually if you did want to for some personal reason, you could just add to the citations yaml file manually.) In terms of some small improvements, should there be a nicer error message if you try and register a paper that isn't in |
Well in principle it would also be easy to add a class method to
Yes, good point. I've just added this to the Apart from this, I think we still need some docstrings and some tests. |
Yeah, I'm of the opinion that that is all we need to support, so it's fine as is.
👍 Docstrings and some more tests I can work on in the next day or two |
Specifically test that a multithreaded computation works and gives the same citations as single threaded.
Remove the test for a "unknown" inspire key, since it becomes a "known" inspirekey as soon as we re-run update_citations.py. Add a test to check that the list of "known" citations matches what's in the source code.
The Travis CI checks currently fail due to a problem with the new PyPI version of |
Ah okay. Sadly Github doesn't seem to have an emoji for "phew, I'm glad I didn't manage to break everything with one simple commit" 😂 |
3e27730
to
13f3946
Compare
@MJKirk I've added a small commit that I still had lying around and that slightly improves the regexp in the Is there still something you want to add? I think in principle, this PR looks quite good. @DavidMStraub what do you think? |
No, there's nothing else from me, I'm happy for you to merge it. Maybe I can just open an issue to list the couple of observables where I wasn't sure, for future reference rather than them being buried in the middle of this long PR? (It was ee->WW, neutrino trident, and perhaps B decay generally). |
OK, done!
Yes, I think it's a good idea to open a new issue for this! |
* Copy over PyBaMM citations code Also add license notice * Minimal working version of citations function By default, just cite the flavio paper. Removed the read_citation method, as rather than reading the bibtex from a file, it seems better to just ouput the INSPIRE texkeys and worry about getting bibtex from INSPIRE later. (E.g. if you print to a .tex file, you could use INSPIRE's bibliography generator at https://inspirehep.net/bibliography-generator, or if fixed David's inspiretools python package.) * A initial citation reference for testing The way flavio does Gamma_12 is based on the a,b,c notation originally introduced in the 2003 BBLN paper * Add SM_citations function to Observable class Gives the theory papers to be cited for an observable, using a throwaway instance. As part of this, move the registering of the flavio paper out of the constructor so you don't get that as well. * Merge _reset method into constructor Since this only gets used here, just do it in the constructor. (In the original PyBaMM version the _reset just gets used for testing, which I think we want to do slightly differently anyway.) * Add docstring to SM_citations * Copy over the PyBaMM citation test class * Revert "Merge _reset method into constructor" This reverts commit a6a545e. Yeah, we need the reset method for testing... * Rejig the print functions to return a list Return something instead of printing directly, unless outputting to a file. * Add some tests of the citation functionality * Fix typo in docstring * Citations for B->Xgamma Note the CP asymmetry needs citations adding, wherever the numbers in ka1_r, ka2_r, ka1_i came from. * Citations for meson mixing Just the Inami-Lim function - everything else worth citing comes in through parameters (fB, B, RG factors) * Add citations for W and Z observables * Improve theory citation function Allow for arguments to be passed to prediction function. Also change the name as you are really getting the citations for a general theory prediction, not just the SM. * Add citations for beta decay * Fix failing test Forgot to update the test when I renamed the function * Citations for mu and tau decays * Citations for quark mass conversions * Citations for K decays * Add D decay form factor citations * Add a bunch of b decay citations * Add bvll citations * Add B -> P formfactor citations * Add B -> gamma formfactor citations * Add citations for B->V formfactors * Small change to docstrings * Add citations for Higgs stuff I'm assuming all the numbers come (indirectly) from 1911.07866 as mentioned in the release notes for flavio v2 * Adds citations for B->llgamma * Fix indentation Got borked when I rebased * Move copyright notice to appropriate file * Give citations instance a unique name * Correct docstring * Fix code typo Somehow this find+replace got mucked up * Improve internals of the citation class Update tests and the theory_citation function too * Remove tex citation string method, add properties * Add convenience `register_citation` method * use multiprocessing.Array * raise meaningful error meassge if inspire key not in YAML * Add docstrings for the main methods * Update and add tests Specifically test that a multithreaded computation works and gives the same citations as single threaded. * Update tests Remove the test for a "unknown" inspire key, since it becomes a "known" inspirekey as soon as we re-run update_citations.py. Add a test to check that the list of "known" citations matches what's in the source code. * improve regexp in `extract_citations` function Co-authored-by: Peter Stangl <peter.stangl@ph.tum.de>
A start on the theory citation function as discussed in #120 - just the basics so far, not to be merged yet.