Featurizing seems rather slow #2
Comments
That's a good point, thanks for bringing it up. I'm afraid I don't have the bandwidth at the moment to test the best chunksizes for different-sized datasets. I've added chunksize as a parameter to the ElM2D class in 0.3.15, which will be used in each of these lines, if that's of use: ElM2D(chunksize=64). I'm afraid I haven't been able to fix the other Spyder issue, but will leave it open for now.
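As a rough illustration of why the chunksize matters, here is a minimal sketch, assuming the featurizing work is handed to a worker pool one chunk at a time. The helper names `chunked` and `count_dispatches` and the placeholder inputs are hypothetical and are not part of ElM2D.

```python
from math import ceil


def chunked(items, chunksize):
    """Yield successive lists holding at most `chunksize` items each."""
    if chunksize < 1:
        raise ValueError("chunksize must be at least 1")
    for start in range(0, len(items), chunksize):
        yield items[start:start + chunksize]


def count_dispatches(n_items, chunksize):
    """Number of chunks (separate hand-offs to a worker) for n_items tasks."""
    return ceil(n_items / chunksize)


# Hypothetical placeholder inputs standing in for real formula strings.
formulas = [f"compound_{i}" for i in range(1000)]

batches = list(chunked(formulas, 64))
print(len(batches))                  # chunks produced with chunksize=64
print(count_dispatches(1000, 1))     # hand-offs with chunksize=1
print(count_dispatches(1000, 64))    # hand-offs with chunksize=64
```

With chunksize=1 every formula is its own hand-off, so any fixed per-task overhead is paid once per formula; a larger chunksize pays it once per batch instead. The best value likely depends on dataset size and the cost per item.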
This is great, thanks! Also, no pressure on the Spyder issue.
Something like the following seems to be a lot faster (probably because it only involves a single call to ElMD()):

    from operator import attrgetter

    import numpy as np
    from ElMD import ElMD

    # Instantiate ElMD once and reuse it for every composition.
    E = ElMD()

    def gen_ratio_vector(comp):
        """Create a numpy array from a composition dictionary."""
        if isinstance(comp, str):
            comp = E._parse_formula(comp)
            comp = E._normalise_composition(comp)

        sorted_keys = sorted(comp.keys())
        comp_labels = [E._get_position(k) for k in sorted_keys]
        comp_ratios = [comp[k] for k in sorted_keys]

        indices = np.array(comp_labels, dtype=np.int64)
        ratios = np.array(comp_ratios, dtype=np.float64)

        # One slot per element in the chosen metric's table.
        numeric = np.zeros(shape=len(E.periodic_tab[E.metric]), dtype=np.float64)
        numeric[indices] = ratios
        return numeric

    def gen_ratio_vectors(comps):
        return np.array([gen_ratio_vector(comp) for comp in comps])

    # formulas and formulas2 are the two lists of compositions to compare.
    U = gen_ratio_vectors(formulas)
    V = gen_ratio_vectors(formulas2)

    lookup, periodic_tab, metric = attrgetter("lookup", "periodic_tab", "metric")(E)
    ptab_metric = periodic_tab[metric]

    def get_mod_petti(x):
        # Metric value for each position with a nonzero ratio, 0 elsewhere.
        return [ptab_metric[lookup[a]] if b > 0 else 0 for a, b in enumerate(x)]

    def get_mod_pettis(X):
        return np.array([get_mod_petti(x) for x in X])

    U_weights = get_mod_pettis(U)
    V_weights = get_mod_pettis(V)
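The per-row list comprehension in get_mod_pettis could probably be replaced by a single broadcast operation. The sketch below assumes the per-position metric values can be computed once up front; the small `lookup` and `ptab_metric` dictionaries are made-up stand-ins for `E.lookup` and `E.periodic_tab[E.metric]`, not real values from ElMD.

```python
import numpy as np

# Made-up stand-ins for E.lookup and E.periodic_tab[E.metric] (assumption).
lookup = {0: "H", 1: "He", 2: "Li", 3: "Be"}
ptab_metric = {"H": 103.0, "He": 1.0, "Li": 12.0, "Be": 77.0}


def get_mod_pettis_loop(X):
    """Reference version: one Python-level lookup per matrix entry."""
    return np.array(
        [[ptab_metric[lookup[a]] if b > 0 else 0 for a, b in enumerate(x)] for x in X]
    )


def get_mod_pettis_vectorized(X):
    """Compute the per-position values once, then broadcast across rows."""
    position_values = np.array(
        [ptab_metric[lookup[a]] for a in range(X.shape[1])], dtype=np.float64
    )
    # Keep the metric value where the ratio is positive, 0 elsewhere.
    return np.where(X > 0, position_values, 0.0)


X = np.array([[0.5, 0.0, 0.5, 0.0],
              [0.0, 0.25, 0.0, 0.75]])

loop_result = get_mod_pettis_loop(X)
vec_result = get_mod_pettis_vectorized(X)
print(np.array_equal(loop_result, vec_result))
```

Whether this helps in practice depends on how many compositions are featurized; for small inputs the difference is likely negligible.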
It looks like this issue was introduced when the additional lookup tables were added. Because a large JSON file was being loaded from disk into RAM for each composition, parsing slowed down a lot. I've reduced the memory overhead of this function in ElMD and, partly following this suggestion, cached the function's output in local memory, which has significantly sped up parsing. That's now pushed to ElM2D==0.4.0 and ElMD==0.4.2.
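The caching idea described above can be sketched with the standard library's functools.lru_cache. This is only an illustration of the general technique, not the actual change made in ElMD: `load_lookup_table`, `parse_position`, and the tiny JSON payload are hypothetical, and a JSON string stands in for the file read from disk.

```python
import json
from functools import lru_cache

# Counts how often the (expensive) table load actually runs.
load_count = 0


@lru_cache(maxsize=1)
def load_lookup_table():
    """Hypothetical loader; a JSON string stands in for a large file on disk."""
    global load_count
    load_count += 1
    return json.loads('{"H": 0, "O": 1, "Na": 2, "Cl": 3}')


@lru_cache(maxsize=None)
def parse_position(element):
    """Hypothetical per-element lookup; repeated elements hit the cache."""
    return load_lookup_table()[element]


positions = [parse_position(e) for e in ["Na", "Cl", "Na", "H"]]
print(positions)
print(load_count)  # the table is loaded once, not once per element
```

The trade-off is memory held for the lifetime of the process in exchange for skipping the repeated load and parse.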
Does it have to do with chunksize=1?
ElM2D/ElM2D/ElM2D.py, lines 435 to 444 in 7c82fdc