fix(nmod): Add nmod_ctx to store is_prime #179
base: main
Conversation
Force-pushed from 9d225b3 to 34a3b49.
This should actually be working now, unlike previous commits. I haven't implemented the prime checks yet, but I have changed it so that an `nmod` holds a pointer to a context object. This gives:

```python
In [3]: sys.getsizeof(nmod(3, 7))
Out[3]: 48
```

That is the same as on master. What has happened is that rather than storing the modulus data inline, each `nmod` now stores a 64-bit pointer to a shared context. When calling a C function this means that it looks like:

```c
__pyx_v_r->val = nmod_mul(__pyx_v_val, __pyx_v_s2->val, __pyx_v_s2->ctx->mod);
```

Whereas on master it looks like:

```c
__pyx_v_r->val = nmod_mul(__pyx_v_val, __pyx_v_s->val, __pyx_v_r->mod);
```

That has an extra indirection, dereferencing two pointers rather than one. In the context of all of the surrounding code though I doubt it is significant: if this indirection happens a lot then the context is very likely to be high in the memory cache.

A simple timing with the PR:

```python
In [3]: %timeit a*a
333 ns ± 8.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

With master that is:

```python
In [3]: %timeit a*a
202 ns ± 3.71 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

So it looks like there is a significant slowdown. I will investigate further, but I doubt that the pointer indirection is the cause. More INCREF/DECREF is involved since the context object is reference-counted. Possibly this is more significant:

```cython
any_as_nmod(&val, t, (<nmod>s).mod)
```

vs:

```cython
s2.ctx.any_as_nmod(&val, t)
```

It might be better to inline that a bit more somehow rather than calling a method on ctx, so maybe the Cython code needs to look like:

```cython
any_as_nmod(&val, t, (<nmod>s.ctx).mod)
```

Actually I just tried that and it brings the time back down to:

```python
In [3]: %timeit a*a
244 ns ± 2.18 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

I'll push that approach up... So clearly we need to be a bit careful here, because subtle differences in exactly how the code is called can make a big difference to overall timings. I want to try to get back down to at least the same time as master, if not faster.

---
It is potentially not worth slowing down `nmod` for prime moduli just to have better error handling for non-prime moduli...

---
I'm not sure where that comes from. General Python object overhead is 16 bytes. An `fmpz` is:

```python
In [3]: sys.getsizeof(fmpz(3))
Out[3]: 24
```

An `fmpq` is:

```python
In [4]: sys.getsizeof(fmpq(3))
Out[4]: 32
```

An `nmod_t` is:

```cython
ctypedef struct nmod_t:
    mp_limb_t n
    mp_limb_t ninv
    flint_bitcnt_t norm
```

according to the Flint headers. Here in this PR though an `nmod` comes out at 48 bytes. Maybe adding an attribute that is a Python object adds some other overhead... The struct in the generated C code is:

```c
struct __pyx_obj_5flint_5types_4nmod_nmod {
  struct __pyx_obj_5flint_10flint_base_10flint_base_flint_scalar __pyx_base;
  mp_limb_t val;
  struct __pyx_obj_5flint_5types_4nmod_nmod_ctx *ctx;
};
```

So it is just the base object plus a `val` and a `ctx` pointer. It would be nice if we could get it down to 32 bytes rather than 48...
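A rough size accounting consistent with these numbers (a sketch assuming CPython on a 64-bit platform; attributing the extra 16 bytes to GC tracking matches the cyclic-GC explanation that comes up below):

```python
header = 16    # PyObject header: refcount + type pointer
val = 8        # mp_limb_t val
nmod_t = 24    # n + ninv + norm stored inline on master
ptr = 8        # nmod_ctx pointer in this PR
gc = 16        # per-instance overhead when the type is GC-tracked

print(header + val + nmod_t)    # 48: master
print(header + val + ptr)       # 32: this PR, if GC tracking is avoided
print(header + val + ptr + gc)  # 48: this PR, with GC tracking
```

---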
After playing around with this a bit I think that it is not possible to store an `nmod_ctx` pointer on each `nmod` instance without a slowdown. I show a lot of timings below and they vary quite a bit from run to run (e.g. if I quit ipython and restart it). The general trends are as described though. This time I am building with [...].

Timings for master:

```python
In [1]: from flint import *

In [2]: ai = 10  # CPython caches small ints

In [3]: %timeit ai * ai
27 ns ± 0.169 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [4]: ai = 1000  # Larger ints not cached

In [5]: %timeit ai * ai
49.4 ns ± 1.36 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [7]: ag = gmpy2.mpz(10)  # gmpy2 slower for small ints

In [8]: %timeit ag * ag
74.2 ns ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [9]: af = nmod(10, 17)  # nmod similar

In [10]: %timeit af * af
75.3 ns ± 0.255 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

So on master `nmod` is comparable to `gmpy2.mpz`. For very small integers CPython caches them in memory, which avoids the heap allocation; that makes them about 3x faster than `nmod`.

By contrast, with my best efforts (not pushed) to micro-optimise the approach in this PR I get:

```python
In [3]: %timeit af * af
104 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

That's a 30% slowdown. I believe that this is purely due to the unavoidable overhead of needing to assign a Python object in Cython, like `r.ctx = s.ctx`. This happens for every operation that creates a new `nmod`; the generated C code is:

```c
/* "flint/types/nmod.pyx":403
 * raise ValueError("cannot coerce integers mod n with different n")
 * r = nmod.__new__(nmod)
 * r.ctx = s.ctx             # <<<<<<<<<<<<<<
 * r.val = nmod_mul(s.val, t.val, s.ctx.mod)
 * return r
 */
__pyx_t_2 = ((PyObject *)__pyx_v_s->ctx);
__Pyx_INCREF(__pyx_t_2);
__Pyx_GIVEREF(__pyx_t_2);
__Pyx_GOTREF((PyObject *)__pyx_v_r->ctx);
__Pyx_DECREF((PyObject *)__pyx_v_r->ctx);
__pyx_v_r->ctx = ((struct __pyx_obj_5flint_5types_4nmod_nmod_ctx *)__pyx_t_2);
__pyx_t_2 = 0;
```

I don't know what all these macros are doing, but the analogous code when we store an `nmod_t` directly is just a plain struct assignment.

Also here are timings for a more macro benchmark where SymPy uses python-flint's `nmod`:

```python
In [13]: from sympy import *

In [14]: M = randMatrix(100)

In [15]: dM = M.to_DM(GF(17))

In [26]: %timeit dM.inv()
11.2 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Converting that to a Flint `nmod_mat`:

```python
In [27]: fM = dM.to_dense().rep.rep

In [28]: type(fM)
Out[28]: flint.types.nmod_mat.nmod_mat

In [30]: %timeit fM.inv()
1.16 ms ± 35.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```

So that is about 10x faster than SymPy's pure Python matrix implementation operating with python-flint's `nmod` elements.

With this PR instead of python-flint master we get:

```python
In [8]: %timeit dM.inv()
15.9 ms ± 158 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

That's 16 milliseconds rather than 11 milliseconds, so a 50% slowdown. To control all extraneous factors I made these two simpler demo types:

```cython
from cpython.object cimport PyTypeObject, PyObject_TypeCheck

cdef class int_mod1:
    cdef unsigned long long val
    cdef unsigned long long mod

    def __init__(self, val, mod):
        self.val = val
        self.mod = mod

    def __repr__(self):
        return f"{self.val} mod {self.mod}"

    def __mul__(self, other):
        cdef int_mod1 result
        if not PyObject_TypeCheck(other, <PyTypeObject*>int_mod1):
            return NotImplemented
        if self.mod != (<int_mod1>other).mod:
            raise ValueError("cannot multiply integers mod n with different n")
        result = int_mod1.__new__(int_mod1)
        result.val = (self.val * (<int_mod1>other).val) % self.mod
        result.mod = self.mod
        return result


cdef class int_mod2_ctx:
    cdef unsigned long long mod

    def __init__(self, mod):
        self.mod = mod


cdef class int_mod2:
    cdef unsigned long long val
    cdef int_mod2_ctx ctx

    def __init__(self, val, mod):
        self.val = val
        self.ctx = int_mod2_ctx(mod)

    def __repr__(self):
        return f"{self.val} mod {self.ctx.mod}"

    def __mul__(self, other):
        cdef int_mod2 result
        if not PyObject_TypeCheck(other, <PyTypeObject*>int_mod2):
            return NotImplemented
        if self.ctx is not (<int_mod2>other).ctx:
            raise ValueError("cannot multiply integers mod n with different n")
        result = int_mod2.__new__(int_mod2)
        result.val = (self.val * (<int_mod2>other).val) % self.ctx.mod
        result.ctx = self.ctx
        return result
```

The only difference between these two is whether we have a separate context object holding the modulus:

```python
In [10]: a1 = int_mod1(1000, 10000)

In [11]: a2 = int_mod2(1000, 10000)

In [12]: %timeit a1 * a1
91.4 ns ± 2.16 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [13]: %timeit a2 * a2
142 ns ± 0.38 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

So that one difference (using an extension-type context attribute rather than a plain C field) makes it about 50% slower. Note also:

```python
In [17]: sys.getsizeof(a1)
Out[17]: 32

In [18]: sys.getsizeof(a2)
Out[18]: 48
```

That sort of explains why in this PR `nmod` came out at 48 bytes rather than 32. I think that proves then that we can't have an `nmod_ctx` attribute on `nmod` itself without a significant cost. For `nmod_mat` it is fine though:

```python
In [15]: %timeit fM.inv()
1.12 ms ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```

That is because a 30 nanosecond overhead just doesn't register in the context of a 1 millisecond operation.

The conclusion then is this: we have to keep storing the modulus data directly on `nmod` itself. Part of the reason for wanting to have contexts (apart from uniformity across types) is to store whether or not the modulus is prime. This is actually not needed in the case of [...]. In conclusion then I need to revert a lot of the changes to `nmod` here.

---
I asked on the Cython mailing list and this is apparently due to cyclic GC.
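For reference, this is the decorator in question; a minimal sketch, assuming it is safe here because no reference cycles can pass through an `nmod` (its only object field is its context, and contexts do not refer back to elements):

```cython
cimport cython

@cython.no_gc
cdef class nmod(flint_scalar):
    # A PyObject* field (ctx) normally makes Cython generate cyclic-GC
    # support and GC-track every instance. @cython.no_gc opts out of
    # that, removing the per-instance GC cost, but is only sound when
    # cycles through these objects are impossible.
    cdef mp_limb_t val
    cdef nmod_ctx ctx
```

---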
> Using [...]
Turns out I was completely wrong about this. I forgot that in this particular case what happens is that SymPy's sparse implementation converts the matrix to the dense representation, which basically means it uses `nmod_mat` rather than `nmod` for the actual computation. This is a better comparison of these using python-flint master:

```python
In [1]: from sympy import *

In [2]: M = randMatrix(100)

In [3]: dM_sympy_sparse = M.to_DM(GF(17)).to_sdm()

In [4]: dM_sympy_dense = M.to_DM(GF(17)).to_ddm()

In [5]: dM_nmod_mat = M.to_DM(GF(17)).to_dfm().rep

In [6]: %timeit dM_sympy_sparse.rref()
136 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit dM_sympy_dense.rref()
81.6 ms ± 694 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [8]: %timeit dM_nmod_mat.rref()
313 µs ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```

So in this PR the case that actually exercises the `nmod` scalar overhead is `dM_sympy_dense.rref()`. If I remove the `@cython.no_gc` decorator with this diff:
313 µs ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) So in this PR with If I remove the diff --git a/src/flint/types/nmod.pyx b/src/flint/types/nmod.pyx
index 3b99b54..145b15d 100644
--- a/src/flint/types/nmod.pyx
+++ b/src/flint/types/nmod.pyx
@@ -195,7 +195,6 @@ cdef class nmod_ctx:
return self._new(&v)
-@cython.no_gc
cdef class nmod(flint_scalar):
"""
The nmod type represents elements of Z/nZ for word-size n. Then timings are: With In [4]: %timeit dM_sympy_dense.rref()
93.2 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit dM_sympy_dense.rref()
91.6 ms ± 203 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit dM_sympy_dense.rref()
94.6 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Without `@cython.no_gc`:

```python
In [8]: %timeit dM_sympy_dense.rref()
110 ms ± 1.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %timeit dM_sympy_dense.rref()
112 ms ± 2.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [10]: %timeit dM_sympy_dense.rref()
113 ms ± 4.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

With master:

```python
In [4]: %timeit dM_sympy_dense.rref()
80.1 ms ± 280 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit dM_sympy_dense.rref()
82.1 ms ± 2.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit dM_sympy_dense.rref()
84 ms ± 5.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

I've repeated this a few times (different matrices, restarting the process, etc.) and it is consistent, so `@cython.no_gc` makes a real difference here.

---
```cython
s2 = s
ctx = s2.ctx
sval = s2.val
if not ctx.any_as_nmod(&tval, t):
    return NotImplemented
```
Calling the `ctx.any_as_nmod` method is slower than calling the `any_as_nmod` function...
If we declare them as `@cython.final` to avoid the virtual method table then it is just as fast, I think...
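A minimal sketch of what that looks like (the method body is illustrative; `@cython.final` means no subclass can override a `cdef` method, so Cython can emit a direct C call rather than going through the vtable):

```cython
cimport cython

@cython.final
cdef class nmod_ctx:
    cdef nmod_t mod

    # With the class final, ctx.any_as_nmod(...) compiles to a direct
    # C call, costing the same as a free function taking ctx first.
    cdef int any_as_nmod(self, mp_limb_t *val, obj) except -1:
        ...
```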
`src/flint/types/nmod_mat.pyx` (outdated):
```cython
res = nmod_poly.__new__(nmod_poly)
nmod_poly_init(res.val, self.val.mod.n)
# XXX: don't create a new context for the polynomial
res = nmod_poly_new_init(any_as_nmod_poly_ctx(self.ctx.mod.n))
nmod_mat_charpoly(res.val, self.val)
```
Need to avoid creating new contexts. I think we need an `nmod_mat_ctx` that holds an `nmod_ctx` and an `nmod_poly_ctx` as attributes.
Now an `nmod_mat_ctx` holds an `nmod_poly_ctx`, which holds an `nmod_ctx`.
`src/flint/types/nmod_poly.pyx` (outdated):
```cython
cdef class nmod_poly_ctx:
    """
    Context object for creating :class:`~.nmod_poly` initialised
    with modulus :math:`N`.

    >>> nmod_poly_ctx(17)
    nmod_poly_ctx(17)
    """
```
We should have `__init__` raise an exception so you have to use something like `nmod_poly_ctx.new()` (sketched below). Then we can make all contexts unique on construction.
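A minimal sketch of what such a `new()` constructor could look like (hypothetical: `_ctx_cache`, `_new` and the error message are illustrative, not the PR's actual code):

```cython
cdef dict _ctx_cache = {}

cdef class nmod_poly_ctx:
    cdef nmod_t mod

    def __init__(self, *args, **kwargs):
        raise TypeError("use nmod_poly_ctx.new(n) to get a context")

    @staticmethod
    def new(n):
        # Return the unique context for this modulus, creating it on
        # first use. Uniqueness means contexts can be compared with `is`.
        ctx = _ctx_cache.get(n)
        if ctx is None:
            ctx = _ctx_cache[n] = nmod_poly_ctx._new(n)
        return ctx

    @staticmethod
    cdef nmod_poly_ctx _new(mp_limb_t n):
        cdef nmod_poly_ctx ctx = nmod_poly_ctx.__new__(nmod_poly_ctx)
        nmod_init(&ctx.mod, n)  # fills in n, ninv and norm
        return ctx
```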
Done
`src/flint/types/nmod_poly.pyx` (outdated):
```cython
v = nmod_poly.__new__(nmod_poly)
nmod_poly_init(v.val, self.val.mod.n)
v = nmod_poly_new_init(self.ctx)
nmod_poly_deflate(v.val, self.val, n)
v.ctx = self.ctx
```
Should have a `new_nmod_poly(self.ctx)` function.
Rather `self.ctx.new_nmod_poly()` as a `@cython.final` method.

---
@GiacomoPope @Jake-Moss I don't know if either of you wants to review this. I ended up doing a bunch of somewhat unrelated things, like improving the coverage plugin, adding tests for full coverage, and then needing to make various types consistent with error handling etc. I also found some problems with some of the newly added methods [...].

The main change here though is that `nmod` now uses a context object. On master an `nmod` stores its modulus data inline; here in the PR each `nmod` instead stores a pointer to an `nmod_ctx`.

Initially I found that the size was still 48 bytes, but that is because Cython implicitly decided that an nmod should be GC-tracked if it has a PyObject pointer as one of its fields. The GC-tracking added not just 16 bytes but also significant time overhead when creating nmod instances. After discussing on the Cython mailing list I see that the `@cython.no_gc` decorator disables this.

I also moved functions like `any_as_nmod` to be methods on the context objects. If you have e.g.:

```cython
cdef some_func(nmod_ctx ctx, val):
    ...

cdef class nmod_ctx:
    cdef some_func(self, val):
        ...
```

then from my measurements calling the method will be significantly slower than calling the function:

```cython
ctx.some_func(val)   # slow
some_func(ctx, val)  # fast
```

The slowdown is due to the virtual method table, because a subclass might override the method; I believe that attaching `@cython.final` avoids this.

I wanted the context objects to be unique in memory, which is not possible if using the ordinary constructor like:

```python
ctx = nmod_ctx(17)
```

I have disabled that, so constructing a context that way gives an error, requiring instead that a context be created like:

```python
ctx = nmod_ctx.new(17)
```

This (static) method first tries to look up the context from a dict that holds all contexts previously created, and if not found then it creates a new context. This ensures that the contexts are always unique in memory and so do not need `__eq__`: identity comparison suffices.

This does all mean that the contexts are never freed from memory though, so if someone uses many different moduli then they will gradually consume more and more memory because of context objects that remain attached to the cache dict.

Regardless of whether the [...], see the test code in python-flint's `src/flint/test/test_all.py`, lines 2343 to 2347 at 9a0327f.
I would want to rewrite that as:

```python
Z = flint.fmpz_ctx()
Q = flint.fmpq_ctx()
F17 = flint.nmod_ctx(17)
F163 = flint.fmpz_mod_ctx(163)
```

Then ideally you can use them all the same way, like [...].

I changed the nmod constructor so that the second argument can be either an `int` modulus or an `nmod_ctx`:

```python
ctx = nmod_ctx(17)
a = nmod(3, 17)
b = nmod(3, ctx)
```

Internally the latter can just use the given context. You can also call the context directly like:

```python
c = ctx(3)
```

in which case there does not need to be any overhead. In any case the cost of a call like [...]. I also added contexts for `nmod_poly` and `nmod_mat`.

At the higher level, in future I would envisage having an object like:

```python
Z17 = Zmod(17)
```

which would internally hold references to all three kinds of context, and then you could do e.g.:

```python
a = Z17(2)
x = Z17.poly([0, 1])
M = Z17.matrix([[1, 2], [3, 4]])
mat17 = Z17.mat_ctx()
M2 = mat17([[3], [4]])
```

It might make most sense for the dict management to take place at that higher Zmod level, and for Zmod to be a GC-managed object that can be in a weakref dict. The individual nmod and nmod_poly instances would not hold a pointer to Z17 but rather to the lower-level contexts.

I am hoping that the basic structure of the context objects here is a reasonable design that we can use for all types and domains, which is why I put some time into micro-optimising and measuring timings etc. to see exactly how to organise methods like `any_as_nmod`.

A separate consideration is how to give these contexts a uniform Python-level interface. I don't think that we can afford the cost of a virtual method table lookup for something like `any_as_nmod` though.
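A self-contained sketch of that weakref-dict idea (pure Python, just to show the pattern; `Zmod` here stands in for an object holding the three contexts):

```python
import weakref

# Cache of Zmod objects keyed by modulus. A WeakValueDictionary lets a
# Zmod (and the contexts it keeps alive) be freed once user code drops
# its last reference, unlike the permanent dict described above.
_zmod_cache = weakref.WeakValueDictionary()

class Zmod:
    def __new__(cls, n):
        obj = _zmod_cache.get(n)
        if obj is None:
            obj = super().__new__(cls)
            obj.n = n  # would hold nmod_ctx / nmod_poly_ctx / nmod_mat_ctx
            _zmod_cache[n] = obj
        return obj

Z17a = Zmod(17)
Z17b = Zmod(17)
assert Z17a is Z17b  # unique while any reference is alive
```

---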
I've just had a quick read over everything here + the diff but won't have the time this week for a proper review.

🎉

That is pretty interesting, I was unaware of that decorator. Seems like it might also be applicable to other types. I can't imagine any of the [...]

This is what I thought #192 would achieve; AFAIK, unfortunately, for the arithmetic operators that virtual function lookup is required.

I think despite this it's certainly something worth doing. AFAIK this is the case with all mpoly contexts ATM. These objects are small enough that millions would have to be made before anyone would look at it and notice something is off. It should be possible to reuse the C implementation of [...]

I agree.

I think this is also a good idea.

This is quite interesting to me, I haven't put a lot of time into micro-optimisations and benchmarks for Cython because my work outside of [...]

---
No problem. There is no rush, so I will leave this here if you intend to review it later.

The benefits are going to be most extreme for [...]. Also it is worth trying to minimise the overhead for trivial cases like the zero polynomial. Consider e.g.:

```python
In [1]: M = randMatrix(10, percent=10) + randMatrix(10, percent=10)*x

In [3]: M.to_DM().to_ddm()
Out[3]: DDM([[7*x, 0, 0, 0, 0, 0, 0, 5*x, 0, 0], [0, 69, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 82*x, 0], [69, 96*x, 0, 0, 33, 0, 0, 0, 0, 0], [0, 0, 15*x + 21, 0, 0, 0, 0, 0, 0, 24*x], [14*x, 0, 0, 0, 0, 36, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 65*x, 0, 0, 35], [23, 0, 0, 0, 0, 0, 25*x, 0, 31, 14*x], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 55, 0, 87, 0, 0, 0, 0]], (10, 10), ZZ[x])
```

Here you have a dense matrix that is full of zeros, but those zeros are actually zero polynomials. The speed of e.g. matrix multiplication with this representation depends predominantly on how fast you can multiply and add mostly zero polynomials, so the overhead in trivial cases does matter.

As for the reference cycles, consider this: a polynomial method that returns a matrix would need the poly context to hold a reference to the matrix context, which (since the matrix context already references the poly context) would create a cycle. A static method on the matrix type avoids that:

```python
M = p.companion_matrix()          # cyclic
M = nmod_mat.companion_matrix(p)  # acyclic
```

Hopefully that acyclic structure (`nmod_ctx` <- `nmod_poly_ctx` <- `nmod_mat_ctx`) is something that can reasonably be used for all types and domains.

---
Here is a timing comparison of that. With master:

```python
In [2]: M = nmod_mat([[1, 2], [3, 4]], 17)

In [3]: %timeit M*M
404 ns ± 2.52 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

With the PR:

```python
In [4]: %timeit M*M
425 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

So maybe there is some slowdown... Oh wait, it is because I forgot to add the decorator here. If I fix that by adding it:

```python
In [3]: %timeit M*M
413 ns ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit M*M
405 ns ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

The timings can vary a lot though, so it is difficult to be conclusive. They usually seem consistent if I run them repeatedly in the same ipython session, but if I exit ipython and then start it again they might suddenly change. They can also change significantly if I check out a different branch and rebuild, so I guess measuring timings this short is just tricky somehow because it depends on so many things. The timings I show here are just the ones that I get as I write the message, but I have seen significantly slower and faster timings as well...

It is possible that a larger workload is more meaningful than `%timeit` here. With the PR:

```python
In [9]: import math

In [10]: matrices = [M] * 10 ** 6

In [11]: %time math.prod(matrices)
CPU times: user 417 ms, sys: 107 µs, total: 417 ms
Wall time: 416 ms
Out[11]:
[1, 0]
[0, 1]
```

And this is master:

```python
In [6]: %time math.prod(matrices)
CPU times: user 413 ms, sys: 0 ns, total: 413 ms
Wall time: 412 ms
Out[6]:
[1, 0]
[0, 1]
```

So basically the same time.

---
The question is whether we guarantee uniqueness of contexts or not. A cache that has a maximum size, or that can be cleared, cannot guarantee uniqueness. It is not necessarily critical that we guarantee uniqueness, but it has some other implications for how you compare the contexts. Suppose for example that we have a [...].

On the other hand, if we don't guarantee uniqueness then it becomes a lot easier to ensure that memory is cleared etc. The idea I suggested above about moving the dict management up to the `Zmod` level [...]. The approach used in [...] is:

```python
def __eq__(ctx1, ctx2):
    if ctx1 is ctx2:
        return True
    else:
        return _compare_by_value(ctx1, ctx2)
```

Then in the happy fast path where the contexts match it is not much worse than a pointer comparison. I don't think any timings have been measured to see how that compares with having guaranteed unique contexts and not defining `__eq__` at all.
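A self-contained sketch of that comparison pattern (`nmod_ctx_like` and the value comparison are illustrative stand-ins; here "compare by value" reduces to comparing the modulus):

```python
class nmod_ctx_like:
    def __init__(self, mod):
        self.mod = mod

    def __eq__(self, other):
        if self is other:  # fast path: cached/unique contexts
            return True
        if not isinstance(other, nmod_ctx_like):
            return NotImplemented
        return self.mod == other.mod  # slow path: compare by value

    def __hash__(self):
        return hash(self.mod)
```

---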
Force-pushed from 2c6aa95 to ba2cec2.
I had a look through this and it seems good -- I don't think I have anything big or urgent to add to what has already been discussed.

---
PyPy cannot handle 3-arg pow if the first two types don't know about the third like pow(int, int, fmpz).
Force-pushed from f102dea to a9215d0.
The last two rebases were just because of merge conflicts after all the linting PRs, and this time I pushed a commit to handle some remaining lint complaints in this branch.

---
IMO I think we can include this, with the only "maybe" being the caching of the context, because I'm not sure what's best here with whether [...]

---
It is simple enough to change the caching of the context later. The basic question that needs resolving is whether the context can be attached to an nmod as a field (attribute). There are two kinds of type in python-flint:

- types whose domain has no parameters (like `fmpz` and `fmpq`), and
- types whose domain is parameterised, e.g. by a modulus (like `nmod` and `fmpz_mod`).

In the first case a context can always just be a global object, and so no data needs to be attached to instances to identify a context. In the second case we either need to attach a pointer to the context object, or some data that identifies the context and could be used to construct a context object on demand.

There is at least in principle some overhead in attaching a pointer to the context object because of reference counting. I don't know whether it might also be problematic in a multithreaded scenario to have all instances sharing a context across multiple threads: https://peps.python.org/pep-0703/#deferred-reference-counting

Some types like [...]. Since [...].

If we can't attach the context object directly at the C level, then we need to have some way to get a context from an object, but it potentially needs to be created on demand (or looked up from a cache). Then for e.g. [...]. Then again, we need to special-case the types that don't have a context object anyway. Otherwise we could just do:

```cython
cdef class flint_elem:
    cdef flint_ctx ctx
```

Then it would always be fast in Cython to access the `ctx` of any element.
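A sketch of the kind of generic code such a base-class field would enable (hypothetical `flint_ctx`/`flint_elem` definitions; the point is that the context check is a single C-level pointer comparison):

```cython
cdef class flint_ctx:
    pass

cdef class flint_elem:
    cdef flint_ctx ctx

cdef bint same_domain(flint_elem a, flint_elem b):
    # Compiles to a pointer comparison: no Python attribute lookup.
    return a.ctx is b.ctx
```

---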
If all elements have an associated context, then we can use these context objects for the coercion. E.g. we can have a [...]

---
Yes, exactly. There can be a global [...]. Cases where the context is not just a global object can use a method on the context; in the generated C code such a method is just a function taking the context as its first argument:

```c
static int __pyx_f_5flint_5types_4nmod_8nmod_ctx_any_as_nmod(
    struct __pyx_obj_5flint_5types_4nmod_nmod_ctx *__pyx_v_ctx,
    mp_limb_t *__pyx_v_val,
    PyObject *__pyx_v_obj
)
```

In itself attaching those methods to the context objects does not give any particular benefit, because at the Cython level you still need to state statically what sort of context object is being called and what sort of type (e.g. `nmod`) is involved. The benefit of attaching all of this to the contexts is that you can then have more generic Cython or Python code that uses the context objects to do conversions and coercions. You need the context objects so that you have a way of saying what it is you are converting to or from in a generic context like:

```python
ctx.convert(element)
ctx1.convert_to(element, ctx2)
ctx1.convert_from(element, ctx2)
ctx3 = ctx1.unify(ctx2)
poly2 = poly1.convert_to(ctx2)
mat2 = mat1.convert_to(ctx2)
...
```

Also, once you have the context objects they can start to share some generic code.

---
Yeah, I really like the idea of this design decision. Having the contexts done like this will help with coercion, which is one of the bottlenecks in Sage due to all the various parsing needed between libraries. By solving this for python-flint we should have something significantly faster.

---
Actually this is only true within the same module. Looking at it, if you want it to be a plain function call everywhere then you need to move the body of these methods into the .pxd file and declare them `inline`. I've just pushed a commit to make all of these context methods `inline`.
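A minimal sketch of that pattern (hypothetical file and method names; the point is that the body lives in the `.pxd`, so every module that `cimport`s it can inline the call):

```cython
# nmod.pxd
cimport cython

@cython.final
cdef class nmod_ctx:
    cdef nmod_t mod

    cdef inline bint is_one(self, mp_limb_t val):
        # Defined (with its body) in the .pxd and declared inline, so
        # this compiles to inline C at each call site rather than a
        # cross-module call through a function-pointer table.
        return val == 1 or self.mod.n == 1
```

---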
Work in progress...
Fixes gh-124
Fixes gh-73
Have nmod use a context object so that we can store the additional information that the modulus is prime to avoid operations that might fail for non-prime modulus (gh-124).