Add script for converting Penn Treebank to Stanford format
evelinacs committed Aug 10, 2018
1 parent 981836b commit 51f044f
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions exp/alto/tools/format_tree.py
@@ -0,0 +1,12 @@
#!/usr/bin/env python3

import sys
import re

def format_tree():  # converts from Penn Treebank to Stanford output
    regex = re.compile(r"\(([A-Za-z_$]+)")
    with open(sys.argv[1]) as np_lines:
        for line in np_lines:
            print(regex.sub(r"\1(", line), end="")

format_tree()
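
To make the effect of the substitution concrete, here is a minimal sketch that applies the same regular expression to a few sample strings instead of a file. The helper name `convert_line` and the sample trees are made up for illustration and are not part of the commit. The pattern only matches labels built from letters, `_` and `$`, so a hyphenated label such as `NP-SBJ` is split after `NP`, and a punctuation label such as `.` is left untouched.

```python
import re

# Same pattern as in format_tree.py: "(" followed by a label made of
# letters, "_" or "$". The label is captured so it can be moved.
LABEL = re.compile(r"\(([A-Za-z_$]+)")


def convert_line(line: str) -> str:
    """Hypothetical helper: move each label in front of its parenthesis."""
    return LABEL.sub(r"\1(", line)


# Illustrative input, not taken from the repository.
print(convert_line("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))"))
# S( NP( DT( the) NN( dog)) VP( VBZ( barks)))

# "-" is outside the character class, so the match stops after "NP".
print(convert_line("(NP-SBJ (PRP it))"))
# NP(-SBJ PRP( it))

# "." is outside the character class, so nothing is rewritten.
print(convert_line("(. .)"))
# (. .)
```

If the input contains function tags or punctuation labels, the character class would likely need widening. Whether that matters depends on the data the script is run on, which the commit does not show.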
