# "Data Wrangling - Aggregation on Multiple Columns"
> "Create new columns by aggregation"

- toc: true
- branch: master
- badges: true
- hide_binder_badge: True
- hide_colab_badge: True
- comments: true
- author: Samuel Oranyeli
- categories: [python, pydatatable]
- image: images/some_folder/your_image.png
- hide: false
- search_exclude: true
- metadata_key1: "python datatable"
- metadata_key2: "python"

#### [Link to Source data](https://stackoverflow.com/questions/64903216/append-new-column-to-df-after-sum)

Task: Sum, row by row, the columns that share a prefix (the part of the column name before the hyphen), producing one total column per prefix.

In [1]:
from collections import defaultdict
from datatable import dt, f

df = dt.Frame({'sn': [1, 2, 3],
               'C1-1': [4, 2, 1],
               'C1-2': [3, 2, 2],
               'C1-3': [5, 0, 0],
               'H2-1': [4, 2, 0],
               'H2-2': [1, 0, 2],
               'K3-1': [4, 1, 1],
               'K3-2': [2, 2, 2]})

df

|   | sn | C1-1 | C1-2 | C1-3 | H2-1 | H2-2 | K3-1 | K3-2 |
|---|----|------|------|------|------|------|------|------|
| 0 | 1  | 4    | 3    | 5    | 4    | 1    | 4    | 2    |
| 1 | 2  | 2    | 2    | 0    | 2    | 0    | 1    | 2    |
| 2 | 3  | 1    | 2    | 0    | 0    | 2    | 1    | 2    |


## **Complete Solution**

In [2]:
# import libraries
from collections import defaultdict
from datatable import dt, f

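# group the f-expressions of the columns by prefix; df.names[1:] skips `sn`
mapping = defaultdict(list)
for entry in df.names[1:]:
    key = entry.split("-")[0]      # 'C1-1' -> 'C1'
    key = f"total_{key}"           # Python f-string: name of the new column
    mapping[key].append(f[entry])  # datatable f-expression: reference to the column

# replace each list of f-expressions with a single row-wise sum expression
mapping = {key: dt.rowsum(value)
           for key, value in mapping.items()}

# actual computation occurs here
df[:, f.sn.extend(mapping)]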

|   | sn | total_C1 | total_H2 | total_K3 |
|---|----|----------|----------|----------|
| 0 | 1  | 12       | 5        | 6        |
| 1 | 2  | 4        | 2        | 3        |
| 2 | 3  | 3        | 2        | 3        |


## **Breakdown of Solution**

Step 1: Create a dictionary whose keys are the new column names (`total_` plus the prefix) and whose values are lists of f-expressions for the columns that start with that prefix.

In [3]:
mapping = defaultdict(list)
for entry in df.names[1:]:
    key = entry.split("-")[0]
    key = f"total_{key}" # f-strings
    mapping[key].append(f[entry])
    
mapping

defaultdict(list,
            {'total_C1': [FExpr<f['C1-1']>,
              FExpr<f['C1-2']>,
              FExpr<f['C1-3']>],
             'total_H2': [FExpr<f['H2-1']>, FExpr<f['H2-2']>],
             'total_K3': [FExpr<f['K3-1']>, FExpr<f['K3-2']>]})
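
To look at the grouping logic on its own, here is a minimal sketch that collects plain column names instead of f-expressions. The `names` list is copied from the frame above, and `groups` is a name introduced only for this illustration. It assumes every column after `sn` has a hyphen-separated suffix.

```python
from collections import defaultdict

# column names copied from the frame above, without the leading 'sn'
names = ['C1-1', 'C1-2', 'C1-3', 'H2-1', 'H2-2', 'K3-1', 'K3-2']

groups = defaultdict(list)
for entry in names:
    prefix = entry.split("-")[0]  # 'C1-1' -> 'C1'
    groups[f"total_{prefix}"].append(entry)

print(dict(groups))
```

A name without a hyphen would be returned unchanged by `split("-")[0]`, so such a column would end up in a group of its own.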

Step 2: Replace each list in *mapping* with a single f-expression, the row-wise sum (`dt.rowsum`) of the columns in that list:

In [4]:
mapping = {key: dt.rowsum(value) 
           for key, value in mapping.items()}

mapping

{'total_C1': Expr:rowsum([FExpr<f['C1-1']>, FExpr<f['C1-2']>, FExpr<f['C1-3']>]; ),
 'total_H2': Expr:rowsum([FExpr<f['H2-1']>, FExpr<f['H2-2']>]; ),
 'total_K3': Expr:rowsum([FExpr<f['K3-1']>, FExpr<f['K3-2']>]; )}
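
As a rough plain-Python analogue of what a row-wise sum over the three `C1` columns yields (an illustration only, not how datatable evaluates the expression), with values copied from the frame above:

```python
# values copied from the frame above
c1_1 = [4, 2, 1]
c1_2 = [3, 2, 2]
c1_3 = [5, 0, 0]

# zip walks the three columns in parallel, yielding one tuple per row
total_c1 = [sum(row) for row in zip(c1_1, c1_2, c1_3)]
print(total_c1)  # [12, 4, 3]
```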

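Step 3: Run the actual computation. Nothing is evaluated until the expressions are passed to the frame; `f.sn.extend(mapping)` selects `sn` and appends the new total columns: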

In [5]:
df[:, f.sn.extend(mapping)]

|   | sn | total_C1 | total_H2 | total_K3 |
|---|----|----------|----------|----------|
| 0 | 1  | 12       | 5        | 6        |
| 1 | 2  | 4        | 2        | 3        |
| 2 | 3  | 3        | 2        | 3        |
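
The totals can be cross-checked without datatable. The sketch below repeats the three steps on a plain dictionary of lists. `sum_by_prefix` and its parameters `id_col` and `sep` are hypothetical names used only for this check, not part of any library.

```python
from collections import defaultdict


def sum_by_prefix(columns, id_col="sn", sep="-"):
    """Return id_col plus one row-wise total per column-name prefix."""
    groups = defaultdict(list)
    for name, values in columns.items():
        if name == id_col:
            continue  # the identifier column is carried over, not summed
        groups[f"total_{name.split(sep)[0]}"].append(values)

    result = {id_col: list(columns[id_col])}
    for key, cols in groups.items():
        # zip(*cols) yields one tuple per row across the grouped columns
        result[key] = [sum(row) for row in zip(*cols)]
    return result


# same data as the frame at the top of the post
data = {'sn': [1, 2, 3],
        'C1-1': [4, 2, 1],
        'C1-2': [3, 2, 2],
        'C1-3': [5, 0, 0],
        'H2-1': [4, 2, 0],
        'H2-2': [1, 0, 2],
        'K3-1': [4, 1, 1],
        'K3-2': [2, 2, 2]}

totals = sum_by_prefix(data)
print(totals)
```

Each list in `totals` should match the corresponding column of the datatable result above.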
