## Create a New DataFrame Using Existing DataFrame

This section covers pandas methods that build a new DataFrame from an existing one, such as aggregating, pivoting, and sampling.

### pandas.DataFrame.agg: Aggregate over Columns or Rows Using Multiple Operations

If you want to aggregate over columns or rows using one or more operations, try `pd.DataFrame.agg`.

In [5]:
from collections import Counter
import pandas as pd


def count_two(nums: pd.Series) -> int:
    # Count how many values in the column equal 2.
    return Counter(nums)[2]


df = pd.DataFrame({"col1": [1, 3, 5], "col2": [2, 4, 6]})
df.agg(["sum", count_two])

           col1  col2
sum           9    12
count_two     0     1
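
The example above aggregates down each column. As a minimal sketch of the row-wise case, the snippet below passes `axis="columns"` to `agg`; the DataFrame mirrors the one above and the variable name `row_stats` is only illustrative.

```python
import pandas as pd

# A small DataFrame similar to the one used above.
df = pd.DataFrame({"col1": [1, 3, 5], "col2": [2, 4, 6]})

# axis="columns" applies each aggregation across the columns of every row,
# so the result has one row per original row and one column per aggregation.
row_stats = df.agg(["sum", "max"], axis="columns")
print(row_stats)
```

Here each row of `row_stats` holds the sum and the maximum of the matching row in `df`.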


### pandas.DataFrame.agg: Apply Different Aggregations to Different Columns

If you want to apply different aggregations to different columns, pass a dictionary that maps each column name to its aggregation methods to `pd.DataFrame.agg`. Cells are `NaN` where an aggregation was not requested for that column.

In [3]:
import pandas as pd 

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 3, 4, 5]})

df.agg({"a": ["sum", "mean"], "b": ["min", "max"]})

         a    b
sum   10.0  NaN
mean   2.5  NaN
min    NaN  2.0
max    NaN  5.0
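
When each column needs only one aggregation, the dictionary values can be plain strings instead of lists. The sketch below reuses the same data; the name `totals` is illustrative. In this form `agg` returns a Series indexed by column name rather than a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 3, 4, 5]})

# One aggregation per column: the result is a Series with one entry per column.
totals = df.agg({"a": "sum", "b": "max"})
print(totals)
```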


### Assign Name to Pandas Aggregation

By default, the result of aggregating a column keeps that column's name.

In [14]:
import pandas as pd 

df = pd.DataFrame({"size": ["S", "S", "M", "L"], "price": [2, 3, 4, 5]})

print(df.groupby('size').agg({'price': 'mean'}))

      price
size       
L       5.0
M       4.0
S       2.5


If you want to assign a new name to an aggregation, pass `name=(column, agg_method)` as a keyword argument to `agg`.

In [13]:
df.groupby('size').agg(mean_price=('price', 'mean'))

      mean_price
size
L            5.0
M            4.0
S            2.5
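
Several named aggregations can be passed in one call, each as its own keyword argument. The sketch below reuses the same data; `max_price` and `summary` are illustrative names that do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame({"size": ["S", "S", "M", "L"], "price": [2, 3, 4, 5]})

# Each keyword becomes an output column: name=(source column, aggregation).
summary = df.groupby("size").agg(
    mean_price=("price", "mean"),
    max_price=("price", "max"),
)
print(summary)
```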


### pandas.pivot_table: Turn Your DataFrame Into a Pivot Table

A pivot table is useful to summarize and analyze the patterns in your data. If you want to turn your DataFrame into a pivot table, use `pandas.pivot_table`.

In [2]:
import pandas as pd 

df = pd.DataFrame(
    {
        "item": ["apple", "apple", "apple", "apple", "apple"],
        "size": ["small", "small", "large", "large", "large"],
        "location": ["Walmart", "Aldi", "Walmart", "Aldi", "Aldi"],
        "price": [3, 2, 4, 3, 2.5],
    }
)

df

    item   size location  price
0  apple  small  Walmart    3.0
1  apple  small     Aldi    2.0
2  apple  large  Walmart    4.0
3  apple  large     Aldi    3.0
4  apple  large     Aldi    2.5


In [51]:
pivot = pd.pivot_table(
    df, values="price", index=["item", "size"], columns=["location"], aggfunc="mean"
)
pivot

location     Aldi  Walmart
item  size
apple large  2.75      4.0
      small  2.00      3.0
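
One way to see what the pivot table computes is to rebuild it by hand with `groupby` and `unstack`. This is only a sketch for this particular dataset, not a description of how `pivot_table` is implemented; the name `by_hand` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "item": ["apple", "apple", "apple", "apple", "apple"],
        "size": ["small", "small", "large", "large", "large"],
        "location": ["Walmart", "Aldi", "Walmart", "Aldi", "Aldi"],
        "price": [3, 2, 4, 3, 2.5],
    }
)

pivot = pd.pivot_table(
    df, values="price", index=["item", "size"], columns=["location"], aggfunc="mean"
)

# Group by every key, average the price, then move "location"
# from the row index into the columns.
by_hand = df.groupby(["item", "size", "location"])["price"].mean().unstack("location")

print(by_hand)
```

For this data, `by_hand` holds the same mean prices as `pivot`, for example 2.75 for large apples at Aldi.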



### DataFrame.groupby.sample: Get a Random Sample of Items from Each Category in a Column

If you want to get a random sample of items from each category in a column, call `sample` on a grouped DataFrame (`DataFrameGroupBy.sample`). This method is useful when you want a subset of a DataFrame that still contains every category in a column.

In [1]:
import pandas as pd 

df = pd.DataFrame({"col1": ["a", "a", "b", "c", "c", "d"], "col2": [4, 5, 6, 7, 8, 9]})
df.groupby("col1").sample(n=1)

  col1  col2
0    a     4
2    b     6
4    c     8
5    d     9


To get 2 items from each category, use `n=2`.

In [37]:
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "b", "c", "c", "d", "d"],
        "col2": [4, 5, 6, 7, 8, 9, 10, 11, 12],
    }
)
df.groupby("col1").sample(n=2)

  col1  col2
0    a     4
1    a     5
4    b     8
2    b     6
5    c     9
6    c    10
8    d    12
7    d    11
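
Because the rows are drawn at random, the output above can differ from run to run. As a minimal sketch, passing `random_state` makes the draw repeatable; the seed value `0` and the names `first` and `second` are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "b", "c", "c", "d", "d"],
        "col2": [4, 5, 6, 7, 8, 9, 10, 11, 12],
    }
)

# The same seed produces the same per-group sample on every call.
first = df.groupby("col1").sample(n=1, random_state=0)
second = df.groupby("col1").sample(n=1, random_state=0)

print(first)
print(first.equals(second))  # True
```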

