Can you provide LDC2002E18, LDC2003E07, LDC2003E14, LDC2004T07, LDC2004T08 and LDC2005T06 corpora? #2

Open
aitaita-research opened this issue Oct 11, 2019 · 0 comments

Comments

@aitaita-research

Thank you for providing such a good corpus resource.
I have the following two questions:

  • First, are the corpora you provided in the original LDC format?
  • Second, could you please provide a copy of the LDC2002E18, LDC2003E07, LDC2003E14, LDC2004T07, LDC2004T08 and LDC2005T06 portions in the original format?

I have access to preprocessed versions of these corpora, but I have never seen the original files. I would like to understand how the original files are preprocessed into the data I have, since most machine translation papers do not describe this step in detail.

I would use the data only for research purposes.
