I added a verbose mode for the similarity transformer using tqdm. Let me know if you want a PR for this. It looks like this when running benchmark.py:
The benchmark script takes a rather long time on my machine, so I wanted to see what was going on, and also save the fitted model for reuse.
```diff
diff --git a/paradox/benchmark.py b/paradox/benchmark.py
index b0beba2..5922ac7 100644
--- a/paradox/benchmark.py
+++ b/paradox/benchmark.py
@@ -1,3 +1,4 @@
+import logging
 from metrics import pearson, mse
 from pipeline import pipeline
 import k_neighbors_regressor
@@ -5,6 +6,7 @@
 import numpy as np
 import similarity
 import parser
+logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
 
 def report(correlations, errors, y_pred_fold):
     print("PC:\t\t\t%0.2f\t(+/- %0.2f)" % (np.mean(correlations),
@@ -28,11 +30,14 @@ def test(model=None, categories=[]):
 pairs = parser.parse(mode="train")
 X = [pair[0] for pair in pairs]
 y = [pair[1] for pair in pairs]
-transformer = similarity.build()
+transformer = similarity.build(verbose=True)
 estimator = k_neighbors_regressor.build(n_neighbors=4)
 p = pipeline(transformers=[transformer], estimator=estimator)
 p.fit(X, y)
+import pickle
+with open('model.pickle', 'wb') as f:
+    pickle.dump(p, f)
 test(p, categories=["answer-answer"])
 test(p, categories=["question-question"])
diff --git a/paradox/similarity.py b/paradox/similarity.py
index d5b20b5..6496787 100644
--- a/paradox/similarity.py
+++ b/paradox/similarity.py
@@ -41,8 +41,8 @@ def similarity(text1, text2, levels=['surface', 'context']):
     return sims
 
-def build(levels=['surface', 'context']):
-    pipeline = Pipeline([('transformer', Similarity(levels=levels))])
+def build(levels=['surface', 'context'], verbose=False):
+    pipeline = Pipeline([('transformer', Similarity(levels=levels, verbose=verbose))])
     return ('similarity', pipeline)
 
@@ -52,15 +52,24 @@ def param_grid():
 class Similarity(BaseEstimator):
-    def __init__(self, levels=['surface']):
+    def __init__(self, levels=['surface'], verbose=False):
         self.levels = levels
+        self.verbose = verbose
 
     def fit(self, X, y):
         return self
 
     def transform(self, X):
         a = []
-        for x in X:
+
+        tqdm = lambda x: x
+        if self.verbose:
+            try:
+                from tqdm import tqdm
+            except ImportError:
+                pass
+
+        for x in tqdm(X):
             a.append(self._transform(x))
         return a
```
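The optional-tqdm idea in the `transform` change above can be sketched as a standalone helper; `progress` is a hypothetical name, not part of the patch, but it shows the same fallback behavior (a progress bar when tqdm is available and verbose is set, a plain iterable otherwise):

```python
def progress(iterable, verbose=False):
    """Wrap an iterable in a tqdm progress bar when verbose is set.

    Falls back to returning the iterable unchanged if tqdm is not
    installed, so the dependency stays optional.
    """
    if verbose:
        try:
            from tqdm import tqdm
            return tqdm(iterable)
        except ImportError:
            pass
    return iterable

# Iterates with a progress bar if tqdm is installed, silently otherwise:
results = [x * 2 for x in progress(range(5), verbose=True)]
```

This keeps tqdm out of the hard requirements while still giving feedback on long runs.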
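For reusing the saved model, the matching read side of the `pickle.dump` call in the benchmark patch would look roughly like this; the dict here is a stand-in for the fitted pipeline object, since any picklable object round-trips the same way:

```python
import pickle

# Stand-in for the fitted pipeline; the real object would be `p` above.
model = {"n_neighbors": 4, "levels": ["surface", "context"]}

# Save once after fitting:
with open("model.pickle", "wb") as f:
    pickle.dump(model, f)

# Later, reload instead of re-fitting:
with open("model.pickle", "rb") as f:
    restored = pickle.load(f)
```

Note that unpickling requires the same module layout to be importable (e.g. `similarity` and its `Similarity` class), so the script loading the model needs the project on its path.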
Thank you for your effort. A PR would be good!
@pasmod Submitted #33 for the verbose mode. It may require merging #31 and #32 first. Thanks!