Fix distributing typedbytes module when it's not installed as an egg #51

Closed
wants to merge 2 commits into
from


3 participants

@aripollak

The existing condition didn't make sense if typedbytes was not installed
as an egg, since it would make Hadoop think the typedbytes module was on
HDFS. The new method is the same as what's in backends/common.

aripollak added some commits Feb 15, 2012
@aripollak aripollak Fix distributing typedbytes module when it's not installed as an egg
c3e0f0a
@aripollak aripollak Add a docstring about envdef() d000583
@klbostee
Owner

I'm afraid I'm not completely following here. When typedbytes is not installed as an egg, the old code falls back to opts.add('file', modpath), which makes sure the .py file is sent along and thus available on HDFS, right? Not sure what's left to fix then...

@aripollak

Unfortunately I've forgotten exactly what was happening, but I definitely tested dumbo after installing typedbytes through pip: it didn't work with the original code, and it did work with this change.
I think the problem might have been that opts['file'] would be re-interpreted by the code starting at line 176. The module path didn't start with file://, so it wasn't actually getting passed to streaming as a -file. But if you add it as a libegg, it does get sent along with the job.
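
A minimal sketch of the failure mode guessed at above, not dumbo's actual code: it assumes that entries collected under the 'file' option are later split by URL scheme, and that only paths starting with file:// are handed to streaming as -file. The function name split_file_opts and the example paths are hypothetical.

```python
def split_file_opts(paths):
    """Hypothetical helper: split 'file' option values by scheme.

    Paths starting with file:// are treated as local files to ship with
    the job; everything else is assumed to already live on HDFS.
    """
    prefix = "file://"
    local, assumed_remote = [], []
    for path in paths:
        if path.startswith(prefix):
            # Strip the scheme so the bare local path can be passed as -file.
            local.append(path[len(prefix):])
        else:
            # No scheme: under this assumption the path is never shipped.
            assumed_remote.append(path)
    return local, assumed_remote


# A pip-installed module path has no file:// prefix (hypothetical path).
modpath = "/usr/lib/python2.7/site-packages/typedbytes.py"
local, assumed_remote = split_file_opts([modpath, "file:///tmp/mapper.py"])
```

Under that assumption the plain module path lands in assumed_remote and is never shipped, which would match the behaviour seen with a pip-installed typedbytes.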

@optimuspaul

typedbytes is installed on every node in my cluster, so I don't need or want it distributed to HDFS. Does this patch remove that "feature"?

@klbostee
Owner
klbostee commented Sep 5, 2013

Think this is a better solution:

b67a7b1

Thanks!

@klbostee klbostee closed this Sep 5, 2013