Aggregating a numeric array #3572

Open
iris-garden opened this issue May 6, 2024 · 0 comments
Labels
discourse migrated from discuss.hail.is

Comments

@iris-garden
Owner

Note

The following post was exported from discuss.hail.is, a forum for asking questions about Hail which has since been deprecated.

(Dec 14, 2023 at 15:07) MsUTR said:

Hello all. I am trying to process the Hail Table from gnomAD v4. In this table, there is a row field called freq with this schema:

Row fields:
    'locus': locus<GRCh38> 
    'alleles': array<str> 
    'freq': array<struct {
        AC: int32, 
        AF: float64, 
        AN: int32, 
        homozygote_count: int64
    }> 

I want to annotate a new row field that aggregates (sums) the entire ht.freq.AC array within each row. I tried the following, but it didn’t work:

ht = ht.annotate(freq_AC_sum=hl.agg.array_sum(ht.freq.AC))
ExpressionException: 'Table.annotate: field 'freq_AC_sum'' does not support aggregation

May I please get some advice on how I can do that? Thank you very much.

(Apr 05, 2024 at 13:42) Wen_He said:

Hi, did you find a solution? Thanks.

(Apr 11, 2024 at 12:38) patrick-schultz said:

Apologies for the delayed response to both of you!

You can aggregate over the elements of an array expression a using a.aggregate. So the above example would be:

ht = ht.annotate(freq_AC_sum=ht.freq.aggregate(lambda freq: hl.agg.sum(freq.AC)))

In the special case of computing the sum of an array, you can also use hl.sum. So if ht.AC were a top-level array field, you could do:

ht = ht.annotate(freq_AC_sum=hl.sum(ht.AC))
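To make the per-row semantics concrete, here is a minimal plain-Python model (not Hail code) of what both annotations above compute. Each row gets one number: the sum of AC over that row's freq array. The row layout, the helper names freq_ac_sum and annotate_freq_ac_sum, and the toy values are hypothetical. In Hail itself, the same per-row sum should also be expressible without a top-level field as hl.sum(ht.freq.map(lambda f: f.AC)).

```python
from typing import Any, Dict, List

# Plain-Python model, NOT Hail: each row is a dict whose "freq" value is a list of
# dicts with an "AC" key, mirroring the array<struct {AC, ...}> row field.
Row = Dict[str, Any]


def freq_ac_sum(row: Row) -> int:
    """Sum AC over the freq array of a single row (a per-row reduction)."""
    return sum(entry["AC"] for entry in row["freq"])


def annotate_freq_ac_sum(rows: List[Row]) -> List[Row]:
    """Return copies of the rows with a new freq_AC_sum field, one value per row."""
    return [{**row, "freq_AC_sum": freq_ac_sum(row)} for row in rows]


# Hypothetical toy rows, for illustration only.
rows = [
    {"freq": [{"AC": 3}, {"AC": 1}, {"AC": 0}]},
    {"freq": [{"AC": 5}, {"AC": 2}, {"AC": 4}]},
]
annotated = annotate_freq_ac_sum(rows)
print([r["freq_AC_sum"] for r in annotated])  # [4, 11]
```

The reduction happens inside each row, so the table keeps its row count and simply gains one field.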

I know the naming isn’t super clear, but the hl.agg.array_sum aggregator is for aggregating many array values of the same length (for example, one per row), producing a single array of the elementwise sums.
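By contrast, hl.agg.array_sum reduces across rows rather than within one. The sketch below is a plain-Python model of that behaviour as described above. It is an assumption about the semantics, not Hail's implementation, and the function name array_sum_model is hypothetical. Given one equal-length array per row, it returns a single array of elementwise sums. That cross-row behaviour is also why Table.annotate rejected it: an aggregator like this presumably belongs in an aggregation context such as ht.aggregate(...) rather than in a row-wise annotation.

```python
from typing import List


def array_sum_model(arrays: List[List[int]]) -> List[int]:
    """Elementwise sum over many equal-length arrays, one array per row.

    Plain-Python model of the behaviour described above, not Hail's implementation.
    """
    if not arrays:
        return []
    length = len(arrays[0])
    if any(len(a) != length for a in arrays):
        raise ValueError("all arrays must have the same length")
    # zip(*arrays) walks the arrays position by position across rows.
    return [sum(column) for column in zip(*arrays)]


# Hypothetical AC arrays from two rows.
ac_per_row = [[3, 1, 0], [5, 2, 4]]
print([sum(ac) for ac in ac_per_row])  # per-row sums: [4, 11]
print(array_sum_model(ac_per_row))     # one elementwise array: [8, 3, 4]
```

With the same toy data, the per-row approach yields one number per row (4 and 11), while the elementwise model yields one array for the whole table ([8, 3, 4]).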

@iris-garden iris-garden added the discourse migrated from discuss.hail.is label May 6, 2024