problem with load-tsv function #57
You need to change the binding that controls how many results are returned. The default is `(def ^:dynamic max-load-records 1000)`. You can wrap your function with a `max-load-records` binding as follows: `(defn- hashed-data [file-name] (binding [pigpen.local/max-load-records 100000] …))`
Yeah, I put that cap in there because the version of rx I'm using doesn't unsubscribe properly from the observable. It's kind of a hacky fix, but it prevents you from processing potentially large files just to throw the result away. In general, the REPL should only be used for vetting your code; then you'd run at scale on the cluster, but 100k should be well within the limits of what it can handle locally. At Netflix, we sample large GB files over the network directly into pigpen - without this limit it would just keep downloading the file on a background thread and slow down the REPL, which was painful when I just wanted the first 10 records. The longer-term fix is to upgrade the version of rx I'm using, but they tend to break their API frequently, so I've been waiting for v1.0 to be released. -Matt
Thanks for the clarification, though the following wrapping didn't solve the issue in the REPL; I'm still getting 1000 items back:
Your example shows the trailing paren right after the binding expression… To use binding, you need to enclose the code that requires the rebinding: `(binding [pigpen.local/max-load-records 100000] (->> (pig/load-tsv …) …))`. In this case, you'd want to make sure that the code calling pig/dump is what gets wrapped, not the load command. The load command just builds an expression tree: `(def x (pig/load …))` and then `(binding [pigpen.local/max-load-records 100000] (pig/dump x))`. Let me know if that works for you. -Matt
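To make that pattern concrete, here is a minimal sketch; the `pigpen.core` alias `pig`, the `resources/input.tsv` path, and the two-column layout are assumptions for illustration:

```clojure
(require '[pigpen.core :as pig]
         '[pigpen.local])

;; Building the query only produces an expression tree; nothing is loaded yet.
(def data
  (->> (pig/load-tsv "resources/input.tsv")
       (pig/map (fn [[k v]] {:key k, :value v}))))

;; The binding has to wrap the call that actually executes the query locally
;; (pig/dump), not the load expression above.
(binding [pigpen.local/max-load-records 100000]
  (pig/dump data))
```

Wrapping only the load expression has no effect, because the dynamic binding is already out of scope by the time pig/dump walks the expression tree and reads the file.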
After thinking about this some more, I'm going to change the default to be unlimited and add this as an option to limit it only if you need it.
Fixed entirely in #61. This binding is no longer necessary.
Hi,
I am trying to run the following function on a TSV file with more than 100k lines on my laptop.
The function looks like this:
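A hypothetical sketch of the kind of function being described; the `hashed-data` name echoes the first reply in this thread, while the file path and the two-column layout are assumptions:

```clojure
(require '[pigpen.core :as pig])

;; Hypothetical reconstruction for illustration; the file path and the
;; two-column layout are assumptions.
(defn- hashed-data [file-name]
  (->> (pig/load-tsv file-name)
       (pig/map (fn [[k v]] {:key k, :value v}))))

;; Running this locally through pig/dump returns at most 1000 records,
;; because pigpen.local/max-load-records defaults to 1000.
(pig/dump (hashed-data "data/input.tsv"))
```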
Instead of getting the same number of lines as in the input file, I always get back only 1000 items.
Am I missing something obvious?