Quantile and StreamingQuantile don't work - 'can't work with argument null' #27
What do you expect to happen? Shouldn't you define Quantile first with '0.10', '0.90'?
That code was junk. Now I'm trying this, but I can't seem to use the

emails = load '/me/Data/test_mbox' using AvroStorage();
token_records = foreach just_id_body generate message_id,

I get this:

<line 11, column 46> Invalid field projection. Projected field

or

<file ./topics.pig, line 6, column 15> Invalid scalar projection:

On Mon, Jan 28, 2013 at 11:00 PM, Matt Hayes notifications@github.com wrote:
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
/* Data Fu */
DEFINE Quantile datafu.pig.stats.Quantile('0.11','0.89');
set default_parallel 5
rmf /tmp/tf_idf_scores.txt
import 'tfidf.macro';
emails = load '/me/Data/test_mbox' using AvroStorage();
token_records_a = foreach just_id_body generate message_id, FLATTEN(TokenizeText(body)) as token;

This returns 1.0,1.0... it is confusing.
Hmm, it seems like you are trying to get the distribution of token counts, right? Shouldn't you do a GROUP ALL and then pass in the totals as a bag to Quantile? Also make sure you sort the totals before passing them into Quantile.
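A minimal sketch of that suggestion, assuming `token_counts` has a numeric field named `total` (a hypothetical name; the thread does not show the schema). The quantile levels are bound once with DEFINE rather than passed inline in the FOREACH, which appears to be what triggered the "could not instantiate ... with arguments 'null'" error reported in this issue:

```pig
-- Bind the constructor arguments (the quantile levels) once, up front.
DEFINE Quantile datafu.pig.stats.Quantile('0.10', '0.90');

-- Assumption: token_counts is an existing relation with a numeric field `total`.
quantiles = FOREACH (GROUP token_counts ALL) {
    -- Quantile expects its input bag to be sorted.
    sorted = ORDER token_counts BY total;
    GENERATE FLATTEN(Quantile(sorted.total)) AS (low_ten, high_ten);
};

DUMP quantiles;
```

GROUP ALL collects every row into a single bag, so the nested ORDER sorts the whole relation before Quantile sees it. The output should be one tuple holding the 10th and 90th percentile values.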
Thanks, I did that and it works!

On Tue, Jan 29, 2013 at 10:00 AM, Matt Hayes notifications@github.com wrote:
Great!
<file ./topics.pig, line 31, column 62> Failed to generate logical plan. Nested exception: java.lang.RuntimeException: could not instantiate 'datafu.pig.stats.Quantile' with arguments 'null'
When:
quantiles = foreach (group token_counts all) generate FLATTEN(datafu.pig.stats.Quantile('0.10', '0.90')) as (low_ten, high_ten);