Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Would it be possible to release code in Python 3.4+ for any future repos? #50

Closed
jli05 opened this issue Apr 29, 2016 · 7 comments · Fixed by #53
Closed

Would it be possible to release code in Python 3.4+ for any future repos? #50

jli05 opened this issue Apr 29, 2016 · 7 comments · Fixed by #53

Comments

@jli05
Copy link
Contributor

jli05 commented Apr 29, 2016

I find this repo is a reference in this domain and highly important. I just hope we could migrate to Python 3.4+ for any future repos, so that we're future-oriented and moving forward.

@orhanf
Copy link
Collaborator

orhanf commented Apr 29, 2016

@jli05 would you mind stating the pros and cons of this in the context of nmt and dl4mt repo here, thanks a lot.

@jli05
Copy link
Contributor Author

jli05 commented May 1, 2016

Basically I hope this and future repos achieve more than merely as a tool for academic research.

Cons:

  1. Time on refactoring the current software/data systems to work with Python 3.4+. The amount of effort is proportional to the size of current systems.
  2. A little learning curve, max half a day.

Pros:

  1. Evolving with time. Python 2.7 is entering the End Of Life cycle till 2020 (PEP 373); Ubuntu is phasing out install of Python 2.7 by default. It's not hard to imagine newer releases of RHEL will adopt similar policies.
  2. Allows other parties to contribute who're developing with modern choice of languages.
  3. Allows constant maintenance of the repos so that it's a thriving experience for whoever maintain them.

Theano is one of the rare academic repos that's constantly maintained and streamlined, which can rival corporate offerings in a certain aspect. If we google for BlinkDB, the last commit was two years ago -- it remained a good concept that earned the author a paper. I believe software has wider and deeper impacts than publications today. I know this repo was initially put together as a tutorial; however I'd wish more for it and all future sequels: to be a point of reference for NMT, which showcases excellence in algorithm and brick-and-mortar code.

@kyunghyuncho
Copy link
Collaborator

@jli05 Thans for sharing your view! I think this definitely makes sense.

@orhanf How about we don't fully validate the changes (as in training a full model and verifying that the same BLEU scores for all the models and language pairs we've tried), but simply make it runnable with Python3?

@jli05 Would you kindly help us move toward Python3 by making a PR for, say, one of the sessions?

@jli05
Copy link
Contributor Author

jli05 commented May 1, 2016

Sure I'll do it in the coming 1-2 weeks. Thanks @kyunghyuncho @orhanf !

@jli05
Copy link
Contributor Author

jli05 commented May 7, 2016

Could we confirm what's the exact workflow in data/?

My understanding is that for a simplest setup, we could run setup_local_env.sh. Is that sufficient? We made no call to preprocess.sh therein?

All the sessions take in the tokenised wiki dump and its associated dictionary from wiki.tok.txt.gz and wiki.tok.txt.gz.pkl. Currently we don't have scripts for generating them?

@kyunghyuncho
Copy link
Collaborator

@orhanf can you answer this?

@orhanf orhanf mentioned this issue May 15, 2016
4 tasks
@orhanf
Copy link
Collaborator

orhanf commented May 15, 2016

@jli05 , a detailed description is provided with #53 but let me clarify this further here.

setup_local_env.sh was the initial script provided in the repo, intended to download an example data and preprocess it (only tokenization). Later on, in order to use subword-units (bpe), another script was added (preprocess.sh), which pre-processes existing data (tokenize, learn bpe, apply bpe and shuffle)

We now merged both functionalities into setup_local_env.sh and made it to call preprocess.sh optionally (with -b flag) when you want to use bpe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants