CDACM-Reordering-System (ROSY) - v1.0 - 2014-11-26

This package provides a source side Reordering system (ROSY) for English, used primarily for English to Indian language machine translation. ROSY accepts an English sentence in plain text format, reorders some of the words according to a given set of rules, and outputs the resulting sentence in plain text form. For xml input it extracts sentences, reorders and put it back in the same xml file.

Original Reordering code by Ananthakrishnan Ramanathan & Ritesh M. Shah. Modified code by Raj Nath Patel. For details on the approach and reordering rules, refer to "Raj N. Patel, Rohit Gupta, Prakash B. Pimpale and Sasikumar M, Reordering rules for English-Hindi SMT, in Second Workshop on Hybrid Approaches to Translation, ACL 2013, Sofia, Bulgaria, August 4-9, 2013"

LICENSE

The software is licensed under GPL V3. Please see the file LICENSE.txt

For more information, bug reports, and fixes, contact: Raj Nath Patel / Prakash Pimpale CDAC, Rain tree marg, sector 7, CBD Belapur Navi Mumbai - 400614 India rajnathp@cdac.in/prakash@cdac.in http://kbcs.in/

CONTACT

For questions about this distribution, please contact KBCS, CDAC Mumbai at sasi@cdac.in. We provide assistance on a best-effort basis.

TUTORIAL/QUICKSTART INSTRUCTIONS

Quickstart guidelines, primarily for end users who wish to use the reordering, are below.

DEPENDENCIES

-OS - Linux
-perl should be installed 
-python should be installed	
-python xml library(lxml) should be installed 
("lxml" Can be installed using command "#yum install python-lxml")

INSTALLING

To install the system, just extract the file 'cdacm-kbcs.rosy-v1.0.tar.gz'. On command line-

$tar -xzvf cdacm-kbcs.rosy-v1.0.tar.gz

USE

You can use the system in two ways. If your input is in plain text file (example at ./sample-input.txt) you should use cdacm.ROSY-T.sh script for reordering and cdacm.ROSY-X.sh if it is in XML format (example at ./sample-input.xml), in the following way -

$bash <SCRIPTNAME> <INPUTFILE> 

Sample Usage :
Usage 1:
	- $bash cdacm.ROSY-T.sh sample-input.txt
It takes text file sample-input.txt as input and outputs reordered sentences in file sample-input.reorderd.txt

Usage 2:
	-$bash cdacm.ROSY-X.sh sample-input.xml
it takes xml file as input, extract sentences inside the "input" tag, reorders it and put it back in the same file. Sample Input and Output given below.

SAMPLE XML INPUT:

<Input sentence="01">It is beautiful.<Reordered></Reordered></Input>
<Input sentence="02">Jaipur , popularly known as the Pink City , is the capital of Rajasthan state , India.<Reordered></Reordered></Input>
NOTE: Its part of xml input file. For whole format see the "sample-input.xml"

SAMPLE XML OUTPUT:
<Input sentence="01">It is beautiful.<Reordered>It beautiful is .</Reordered></Input>
<Input sentence="02">Jaipur , popularly known as the Pink City , is the capital of Rajasthan state , India.<Reordered>Jaipur , popularly the Pink City as known , Rajasthan state , It beautiful is India . of the capital is .

NOTE: Intermediate outputs will be saved "cdacm.rosy-v1.0/tmp"

PROGRAMMATIC USE

You need to make a call to cdacm.reorder-text.sh and cdacm.reorder-xml.sh from you program to use the system.

DETAILED

INSIDE

-cdacm-kbcs.rosy-v1.0

-research //directory containing publications

	-input.txt //sample input text file

	-input.xml // sample input xml file

-tools //directory containing all other tools

	-grammar //containing trained model for English parser(Stanford)

		-englishPCFG.ser.gz //model

	-Parse //directory containing Recursive Decent Parser

		-RecDescent.pm //Recursive Decent Parser

	-cdacm-reorder.pl //perl script containing reordering rules

	-changeAtt.py // To change the attribute value in xml files			
	
	-compare2files.py //To match to file at word level. In English parsing process, parser changes some words, like colour-> color.

	-convert.py // convert the parsed sentence output, compatible to "cdacm-reorder.pl"
	
	-getAttValue.py// To get input text from xml file

	-lexparser.sh // Its shell script to run "Stanford Parser"

	-runstanfordparser.sh // To run "lexparser.sh"

	-stanford-parser.jar // Stanford Parsing system

-cdacm.reorder-text.sh //shell script to initiate reordering for text file

-cdacm.reorder-xml.sh  //shell script to initiate reordering for xml file

-sample-input.txt //sample input text file

-sample-input.xml //sample input xml file

-README// This file

INPUT-OUTPUT SPECIFICATION

System takes input in two format

-text files //Plain English text, sentences separated by newline. Sample is given as "sample-input.txt".

-xml file //Sample is given as "sample-input.xml". For any change in xml format, you need to change "changeAtt.py" and "getAttValue.py" accordingly

System Output

-For text input, final reordered output will be stored in "filename.reordered.txt. For Example if input file is "/some/path/input.txt", then output will be stored in "/some/path/input.reordered.txt".

-For xml input, there should be empty "Reordered" sub-tag within "Input" tag. Reordered output will be stored within "Reordered" tag.

CONTRIBUTORS

Ananthakrishnan Ramanathan
Ritesh Shah
Jayprasad Hegde
Prakash B. Pimpale
Rohit Gupta
Raj Nath Patel

OTHER SOFTWARES (included with this distribution)

-The system uses Stanford Parser [http://nlp.stanford.edu/software/lex-parser.shtml] which is licensed under GNU GPL V2 and later (http://www.gnu.org/licenses/gpl-2.0.html) -The system also makes use of Parse-RecDescent: Perl module for generating recursive-descent parsers [https://github.com/jtbraun/Parse-RecDescent] which is licensed under PERLARTISTIC - the Perl Artistic License(http://search.cpan.org/~rjbs/perl-5.18.4/pod/perlartistic.pod)

ACKNOWLEDGEMENT

This system is developed as part of English to Indian Language Machine Translation (EILMT) project. We would like to thank the Technology Development for Indian Languages (TDIL) program and the Department of Electronics & Information Technology(DEITY), Govt. of India for funding EILMT project.

CHANGES

2014-11-26 1.0 Initial release

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CDACM-Reordering-System (ROSY) - v1.0 - 2014-11-26

LICENSE

CONTACT

TUTORIAL/QUICKSTART INSTRUCTIONS

DEPENDENCIES

INSTALLING

USE

SAMPLE XML INPUT:

PROGRAMMATIC USE

DETAILED

INPUT-OUTPUT SPECIFICATION

System Output

CONTRIBUTORS

OTHER SOFTWARES (included with this distribution)

ACKNOWLEDGEMENT

CHANGES

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
research		research
tools		tools
LICENSE.txt		LICENSE.txt
README.md		README.md
cdacm.ROSY-T.sh		cdacm.ROSY-T.sh
cdacm.ROSY-X.sh		cdacm.ROSY-X.sh
sample-input.txt		sample-input.txt
sample-input.xml		sample-input.xml

License

patelrajnath/reordering4smt

Folders and files

Latest commit

History

Repository files navigation

CDACM-Reordering-System (ROSY) - v1.0 - 2014-11-26

LICENSE

CONTACT

TUTORIAL/QUICKSTART INSTRUCTIONS

DEPENDENCIES

INSTALLING

USE

SAMPLE XML INPUT:

PROGRAMMATIC USE

DETAILED

INPUT-OUTPUT SPECIFICATION

System Output

CONTRIBUTORS

OTHER SOFTWARES (included with this distribution)

ACKNOWLEDGEMENT

CHANGES

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages