PyDecode is a dynamic programming toolkit developed for research in natural language processing. It aims to be simple enough for fast prototyping, yet efficient enough for research use.
Simple specifications. Dynamic programming algorithms specified through pseudo-code.
    # Viterbi algorithm.
    ...
    c.init(items[0, :])
    for i in range(1, n):
        for t in range(len(tags)):
            c.set(items[i, t], items[i-1, :], labels=labels[i, t, :])
    graph = c.finish()
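For readers who want to see the recurrence behind the snippet above, here is a minimal sketch of first-order Viterbi decoding in plain NumPy. It does not use PyDecode. The function name `viterbi` and the `emission` and `transition` score arrays are assumptions introduced for illustration only.

```python
import numpy as np

def viterbi(emission, transition):
    """Return the best tag sequence and its score for a first-order chain.

    emission[i, t]   : score of tag t at position i, shape (n, T)
    transition[s, t] : score of moving from tag s to tag t, shape (T, T)
    Both arrays are hypothetical inputs, not PyDecode objects.
    """
    n, T = emission.shape
    best = np.empty((n, T))              # best[i, t]: best score of a path ending in tag t at i
    back = np.zeros((n, T), dtype=int)   # back[i, t]: previous tag on that best path
    best[0] = emission[0]
    for i in range(1, n):
        # candidates[s, t] = best[i-1, s] + transition[s, t]
        candidates = best[i - 1][:, None] + transition
        back[i] = candidates.argmax(axis=0)
        best[i] = candidates.max(axis=0) + emission[i]
    # Follow back-pointers from the best final tag.
    tags = [int(best[-1].argmax())]
    for i in range(n - 1, 0, -1):
        tags.append(int(back[i, tags[-1]]))
    tags.reverse()
    return tags, float(best[-1].max())

emission = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
tags, score = viterbi(emission, np.zeros((2, 2)))
```

Each position considers every previous tag, which mirrors the role of `items[i-1, :]` in the specification above.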
Efficient implementation. Core code in C++, Python interfaces through NumPy.
    # Compute path.
    label_weights = numpy.random.random(graph.label_size)
    weights = pydecode.transform_label_array(graph, label_weights)
    path = pydecode.best_path(graph, weights)
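As a rough illustration of why a NumPy interface keeps this step cheap, the sketch below maps a vector of per-label weights onto edges with a single vectorized indexing operation. It is only one plausible reading of what a label-to-edge transform does, not PyDecode's implementation. The `edge_labels` array and its values are hypothetical.

```python
import numpy as np

# Hypothetical data: each of 5 edges carries the index of one of 3 labels.
edge_labels = np.array([0, 2, 1, 2, 0])
label_weights = np.array([0.5, 1.5, -1.0])

# NumPy integer-array indexing gathers one weight per edge without a Python loop.
edge_weights = label_weights[edge_labels]
```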
High-level algorithms. Includes a set of widely-used algorithms.
    # Inside probabilities.
    inside = pydecode.inside(graph, weights, kind=pydecode.LogProb)
    # (Max)-marginals.
    marginals = pydecode.marginals(graph, weights)
    # Pruning.
    mask = marginals > threshold
    pruned_graph = pydecode.filter(graph, mask)
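To make the three steps above concrete, the following sketch computes log-space inside scores, posterior marginals, and a pruning mask for a simple tagging chain in plain NumPy. It does not call PyDecode, and it computes sum-marginals rather than max-marginals. The function `chain_marginals`, the `emission` and `transition` arrays, and the threshold value are assumptions chosen for illustration.

```python
import numpy as np

def chain_marginals(emission, transition):
    """Log-space inside scores and posterior tag marginals for a chain.

    emission[i, t] and transition[s, t] are hypothetical log-potentials.
    """
    n, T = emission.shape
    inside = np.empty((n, T))
    outside = np.zeros((n, T))   # the last position has outside score log(1) = 0
    inside[0] = emission[0]
    for i in range(1, n):
        # Sum (in log space) over the previous tag s.
        inside[i] = np.logaddexp.reduce(
            inside[i - 1][:, None] + transition, axis=0) + emission[i]
    for i in range(n - 2, -1, -1):
        # Sum (in log space) over the next tag t.
        outside[i] = np.logaddexp.reduce(
            transition + (emission[i + 1] + outside[i + 1])[None, :], axis=1)
    log_z = np.logaddexp.reduce(inside[-1])
    marginals = np.exp(inside + outside - log_z)
    return inside, marginals

emission = np.array([[2.0, 0.0], [0.0, 0.0]])
inside, marginals = chain_marginals(emission, np.zeros((2, 2)))

# Pruning: keep only (position, tag) pairs whose marginal exceeds a threshold.
threshold = 0.2
mask = marginals > threshold
```

The boolean `mask` plays the same role as the mask passed to the filter step above: entries that are `False` would be dropped from the search space.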
Integration with machine learning toolkits. Train structured models.
    # Train a discriminative tagger.
    perceptron_tagger = StructuredPerceptron(tagger)
    perceptron_tagger.fit(X, Y)
    Y_test = perceptron_tagger.predict(X_test)
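The training loop behind a structured perceptron can be sketched in a few lines. The example below is a self-contained NumPy illustration, not the `StructuredPerceptron` class used in the snippet above. The class `ChainPerceptron`, the helper `decode`, and the toy data are all hypothetical.

```python
import numpy as np

def decode(scores, trans):
    """Viterbi over per-position tag scores (n, T) and transition scores (T, T)."""
    n, T = scores.shape
    best, back = scores[0].copy(), np.zeros((n, T), dtype=int)
    for i in range(1, n):
        cand = best[:, None] + trans          # cand[s, t]: come from tag s, go to tag t
        back[i] = cand.argmax(axis=0)
        best = cand.max(axis=0) + scores[i]
    tags = [int(best.argmax())]
    for i in range(n - 1, 0, -1):
        tags.append(int(back[i, tags[-1]]))
    return tags[::-1]

class ChainPerceptron:
    """Hypothetical minimal structured perceptron for sequence tagging."""

    def __init__(self, n_features, n_tags, epochs=10):
        self.W = np.zeros((n_tags, n_features))   # emission weights
        self.A = np.zeros((n_tags, n_tags))       # transition weights
        self.epochs = epochs

    def predict_one(self, x):
        return decode(x @ self.W.T, self.A)

    def fit(self, X, Y):
        for _ in range(self.epochs):
            for x, y in zip(X, Y):
                y_hat = self.predict_one(x)
                if y_hat == list(y):
                    continue                      # no update on a correct prediction
                for i, (g, p) in enumerate(zip(y, y_hat)):
                    # Add gold features, subtract predicted ones (cancels when g == p).
                    self.W[g] += x[i]
                    self.W[p] -= x[i]
                    if i > 0:
                        self.A[y[i - 1], g] += 1
                        self.A[y_hat[i - 1], p] -= 1
        return self

    def predict(self, X):
        return [self.predict_one(x) for x in X]

# Toy data: one-hot features that identify the gold tag at each position.
X = [np.eye(2)[[0, 1, 0]], np.eye(2)[[1, 1, 0]]]
Y = [[0, 1, 0], [1, 1, 0]]
model = ChainPerceptron(n_features=2, n_tags=2).fit(X, Y)
```

The only structured component is the `decode` call inside `fit`. Swapping in a different decoder changes the model class without touching the update rule.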
Visualization tools. IPython integrated tools for debugging and teaching.
pydecode.draw(graph, paths=paths)