Options duplicated when given to Hadoop #50

Closed
andrix opened this Issue Jan 23, 2012 · 15 comments

Projects

None yet

3 participants

@andrix
Contributor
andrix commented Jan 23, 2012

Hi Klbostee,

I got bitten by this problem when running a script. I'm using the last dumbo version, but I see the same problem in 0.21.29. I could find using PDB that the duplicates appear here : https://github.com/klbostee/dumbo/blob/master/dumbo/core.py#L384.

I workaround it with this : http://paste.pocoo.org/show/539376/. I think the best is that the options be a set of (key, value) instead of a list, but I don't know if the options can hold some non hashable value on it.

Well, what do you think? I think there is some room to improve options handling, but better to know well from you a bit more about its values.

Btw, I'm running this line (I have to change the names, sorry) on the cluster:

python2.5 -m dumbo.cmd start mylib.mymodule.script -cachefile hdfspath/to/myfile.db#myfile.db -param locale=en-gb -param workdir=/some/other/hdfspath -param dataversion=2012-01-23T01-59-59.797550 -input /data/input/2012/01/20 -output /data/output/2012/01/20/transformed -hadoop system

Well, thanks,
Andrés

@klbostee
Owner

Hey Andres,

I was already aware of this issue yeah, but unfortunately I've never managed to find some time to fix it so far. Any help in tracking down the exact cause and/or implementing a clean solution/fix would be highly appreciated! I also agree that the whole options system could use a rework really, so I'd be more than happy te review a refactoring of it...

Thanks,
-Klaas

@andrix
Contributor
andrix commented Jan 23, 2012

Ok, let me do some rework on options. Question: Can the opts hold a value that is not hasble , because this will avoid to use a set, so maybe we should use some custom dict or new class?

@klbostee
Owner

It's all strings, so not being hashable shouldn't be a problem I think.

-Klaas

On 23 Jan 2012, at 20:17, "Andrés Moreira"
reply@reply.github.com
wrote:

Ok, let me do some rework on options. Question: Can the opts hold a value that is not hasble , because this will avoid to use a set, so maybe we should use some custom dict or new class?


Reply to this email directly or view it on GitHub:
#50 (comment)

@andrix
Contributor
andrix commented Jan 25, 2012

@klbostee Well, after a lot of hours of refactoring I've the change done : andrix@3eb93ad. I've added a test on tests/testutil.py for the old code and the new one. I couldn't test it yet on the cluster (I'm at home), but I belive will work. Anyway, can you take a look at the changes and review it? Also, if you have the time, test it across your dumbo projects :).

Andres

@klbostee
Owner

Quickly glanced through the code and it sure looks very promising! Will try to find some time to review it more closely ASAP...

@andrix
Contributor
andrix commented Jan 25, 2012

@klbostee I've found a set of bugs when testing on the cluster, I'm working on fix them. Let you know later when I have them solved.

@andrix
Contributor
andrix commented Jan 25, 2012

@klbostee Ok. I've tested across some jobs and after a few fixes I made in my branch worked everything. Now if you have the time, can you review the set of commits : https://github.com/andrix/dumbo/commits/master ?

@klbostee
Owner
klbostee commented Feb 5, 2012

I finally found some time to have a closer look, sorry for the delay.

Overall the refactor looks pretty good, but it's currently not as backwards compatible as I would like it to be. For instance, the following code from dumbo/cmd.py won't work anymore when the start function is called with a list of tuples for the opts param as far as I can tell:

def start(prog,
          opts,
          stdout=sys.stdout,
          stderr=sys.stderr):
    opts += Options(configopts('common'))
    opts += Options(configopts('start'))  

    pyenv = envdef('PYTHONPATH', opts['libegg'],
                   shortcuts=dict(configopts('eggs', prog)),
                   extrapaths=sys.path) 

The opts variable will then still be a list in the envdef call and thus opts['libegg'] will generate a TypeError because list indices must be integers, not strings.

Think you could easily solve this issue in a general way? It would also be good if we could test the backwards compatibility more thoroughly via additional unit tests or so, 'cause non-compatible changes to the options system are likely to break many of the dumbo programs out there...

@andrix
Contributor
andrix commented Feb 5, 2012

Hi Klas, thanks for the time to review it.

I didn't test the case you spot there, I've just tested the dumbo suite test and some of the jobs we use at mydeco, all of them working OK. Indeed I think it need some more tests, but as you, I didn't have the time to add more unit testing, including, continue improving the patch.
I hope to have some time in the next weeks to add this, but feel free to pull from my repo and if you have the time, improve it, and fix the problems you found.

Andres

@jejansse
Collaborator

I checked the commit and made some changes to fix the issues Klaas reported. The problem was that some code passes a list object as the opts parameter. If you then do opts += Options(configopts('common')) it will fail as you're using the __iadd__ of list, not the __iadd__ of Options. I fixed it by calling Options on opts before the __iadd__ happens and by changing the constructor of Options such that it works for both a list and an Options parameter.

@jejansse jejansse closed this Feb 17, 2012
@andrix
Contributor
andrix commented Feb 27, 2012

@jejansse: cool, so now we can bother Klas to re-check and if evertything ok, push to trunk ?

@klbostee
Owner

Jeroen is a committer, so all of this already in trunk (i.e. my master branch).

@andrix
Contributor
andrix commented Feb 27, 2012

@Klas: That is amazing! Thanks for your work on review the patch, and I'm very glad that it's on master branch :)

@klbostee
Owner
klbostee commented Mar 1, 2012

This is part of release-0.21.32 actually!

@andrix
Contributor
andrix commented Mar 1, 2012

@klbostee Yes sorry for the noise, when I reliased I deleted the comment! Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment