Brown clusters are relatively easy: you can use any available tool to generate them and then use the following script to convert them into the NLP4J format:
An ambiguity class is a hashmap where the key is a word and the value is the list of its possible POS tags. You can also save this as a serialized Java object and compress it in the xz format.
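A minimal sketch of building such a map from POS-tagged training data and serializing it, assuming the `HashMap<String, List<String>>` shape described above. The class name `AmbiguityClasses` and the `minRatio` cut-off (keep a tag only if it accounts for at least that share of a word's occurrences) are hypothetical, not NLP4J settings. The JDK has no xz codec, so the xz step is only shown in a comment using XZ for Java (`org.tukaani.xz`), which is an assumed external dependency.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: ambiguity-class lexicon mapping word -> POS tags seen with it. */
final class AmbiguityClasses {
    private AmbiguityClasses() {}

    /** For each word, keeps the tags whose relative frequency is at least minRatio. */
    static HashMap<String, List<String>> build(List<Map.Entry<String, String>> taggedTokens, double minRatio) {
        // Count how often each (word, tag) pair occurs in the training data.
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Map.Entry<String, String> token : taggedTokens) {
            counts.computeIfAbsent(token.getKey(), k -> new HashMap<>())
                  .merge(token.getValue(), 1, Integer::sum);
        }
        HashMap<String, List<String>> classes = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> entry : counts.entrySet()) {
            Map<String, Integer> tagCounts = entry.getValue();
            int total = 0;
            for (int c : tagCounts.values()) total += c;
            List<String> tags = new ArrayList<>();
            for (Map.Entry<String, Integer> tag : tagCounts.entrySet()) {
                if ((double) tag.getValue() / total >= minRatio) tags.add(tag.getKey());
            }
            Collections.sort(tags); // deterministic tag order
            if (!tags.isEmpty()) classes.put(entry.getKey(), tags);
        }
        return classes;
    }

    /** Writes the map with Java serialization; closes {@code out} when done. */
    static void save(HashMap<String, List<String>> classes, OutputStream out) throws IOException {
        // For an .xz file, pass in a stream wrapped with XZ for Java (not part of the JDK), e.g.
        // new org.tukaani.xz.XZOutputStream(fileOut, new org.tukaani.xz.LZMA2Options())
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(classes);
        }
    }
}
```

With `minRatio` set to 0, every tag ever seen with a word is kept; a small positive value drops rare, possibly mis-annotated tags. Whether NLP4J applies any such filtering is not stated here.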
I made use of Percy Liang's C++ implementation of the Brown hierarchical word clustering algorithm.
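For reference, Liang's `wcluster` writes a `paths` file with, as far as I know, one tab-separated `bitstring word count` entry per line. The sketch below reads that file into a word-to-cluster map. It is not the conversion script mentioned in the answer: the `Map<String, Set<String>>` shape, the choice of storing several bit-string prefixes per word, the prefix lengths, and the class name `BrownClusterReader` are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical reader for the "paths" output of Liang's Brown clustering tool. */
final class BrownClusterReader {
    // Hypothetical prefix lengths; shorter prefixes give coarser clusters.
    static final int[] PREFIX_LENGTHS = {4, 6, 10, 20};

    private BrownClusterReader() {}

    static Map<String, Set<String>> fromFile(Path pathsFile) throws IOException {
        return fromLines(Files.readAllLines(pathsFile, StandardCharsets.UTF_8));
    }

    static Map<String, Set<String>> fromLines(Iterable<String> lines) {
        Map<String, Set<String>> clusters = new HashMap<>();
        for (String line : lines) {
            String[] cols = line.split("\t");
            if (cols.length < 2) continue; // skip blank or malformed lines
            String bits = cols[0];
            // Lowercased to line up with field="word_form_lowercase"; case variants share one entry.
            String word = cols[1].toLowerCase(Locale.ROOT);
            Set<String> prefixes = clusters.computeIfAbsent(word, k -> new LinkedHashSet<>());
            for (int len : PREFIX_LENGTHS) {
                // Paths shorter than len contribute the whole bit string (duplicates collapse in the set).
                prefixes.add(bits.substring(0, Math.min(len, bits.length())));
            }
        }
        return clusters;
    }
}
```

The resulting map could then be serialized and xz-compressed the same way as the ambiguity classes, if NLP4J expects that shape.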
Once the clusters are created with Liang's implementation, I convert them using the script from your response. The converted files are then placed in the lexica directory of nlp4j-english-1.1.2.jar, alongside the existing cluster and ambiguity-class files.
In `config-decode-pos.xml` and `config-train-pos.xml`, I changed the lexica field to: `<word_clusters field="word_form_lowercase">edu/emory/mathcs/nlp/lexica/SA-lang-clusters.xz</word_clusters>`
No errors occur during training with these settings; however, the accuracy of the resulting PoS tagger is unchanged compared with a control model trained without the clusters. I am not sure what could be causing this.
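One hypothesis worth ruling out (not a confirmed cause): if the keys in the cluster file do not match the word forms the configured field produces (for example, keys that were not lowercased, or a different tokenization from the training data), the cluster features would rarely fire and the model would behave much like the control. A quick check is to measure what fraction of training tokens have a cluster entry. The sketch assumes the cluster map has already been loaded (e.g. deserialized through XZ for Java's `XZInputStream`, which is not in the JDK); `ClusterCoverage` is a hypothetical helper.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical diagnostic: how many tokens actually hit an entry in the cluster lexicon. */
final class ClusterCoverage {
    private ClusterCoverage() {}

    /** Fraction of tokens whose lowercased form is a key in the cluster map. */
    static double tokenCoverage(Iterable<String> tokens, Map<String, ?> clusters) {
        long total = 0;
        long hits = 0;
        for (String token : tokens) {
            total++;
            // Lowercase to mirror field="word_form_lowercase"; adjust if another field is used.
            if (clusters.containsKey(token.toLowerCase(Locale.ROOT))) hits++;
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

A coverage close to zero would point at a key mismatch; high coverage would rule that explanation out and shift suspicion elsewhere.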
I have also tried all of the possible word cluster fields, including:
Hi,
I would like to know whether it is possible to train my own ambiguity-class and word-cluster models for POS tagging South African languages.
The only options available are the currently included models:
If it is possible, how would I go about creating them?
Regards