# Iman Noor
# ByteWise Fellow

# **Data Wrangling: Join, Combine, and Reshape.**

- Data wrangling is the process of gathering raw data and transforming it into another format so that it can be accessed, understood, and analyzed more quickly and used for decision-making.
- Data Wrangling is also known as `Data Munging`.

# Data Wrangling in Python

## Data Exploration

In this step, the data is studied and understood through summary statistics and visual representations.
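As a minimal sketch of this step, a first look at a dataset often combines a few standard pandas summaries: `shape`, `dtypes`, `describe()` and `isna().sum()`. The tiny `sample` DataFrame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing mark
sample = pd.DataFrame({
    "Name": ["Jai", "Princi", "Gaurav"],
    "Age": [17, 17, 18],
    "Marks": [90, 76, np.nan],
})

print(sample.shape)        # (number of rows, number of columns)
print(sample.dtypes)       # data type of every column
print(sample.describe())   # count, mean, std, min, quartiles, max of numeric columns

missing_per_column = sample.isna().sum()  # how many NaN values each column holds
print(missing_per_column)
```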

## Dealing with Missing Values

Large datasets often contain missing values (NaN). These need to be handled, either by replacing them with the column's mean or mode (its most frequent value), or simply by dropping the rows that contain a NaN value.
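The options above can be sketched in pandas as follows. The `Marks` values mirror the example used later in this notebook; the `City` series is hypothetical and only illustrates the mode, which suits categorical columns best. Median filling is shown as a further common option that is less sensitive to outliers than the mean.

```python
import numpy as np
import pandas as pd

marks = pd.Series([90, 76, np.nan, 74, 65, np.nan, 71], name="Marks")

filled_mean = marks.fillna(marks.mean())      # mean of the 5 known marks = 75.2
filled_median = marks.fillna(marks.median())  # median of the known marks = 74.0
dropped = marks.dropna()                      # remove the missing entries instead

# Hypothetical categorical column: the mode (most frequent value) fits better here
city = pd.Series(["Pune", "Delhi", None, "Pune"], name="City")
filled_city = city.fillna(city.mode()[0])     # mode() may return ties; take the first
```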

## Reshaping Data

In this step, data is manipulated to match the requirements: new data can be added, or existing data can be modified or restructured.

## Filtering Data

Datasets sometimes contain unwanted rows or columns that need to be filtered out or removed.

## Other

After the raw dataset has been processed with the steps above, we obtain a clean dataset that fits our requirements. It can then be used for its intended purpose, such as data analysis, data visualization, or machine learning model training.

## Data Exploration

```python
# Importing libraries
import pandas as pd
import numpy as np
```

```python
data = {'Name': ['Jai', 'Princi', 'Gaurav',
                 'Anuj', 'Ravi', 'Natasha', 'Riya'],
        'Age': [17, 17, 18, 17, 18, 17, 17],
        'Gender': ['M', 'F', 'M', 'M', 'M', 'F', 'F'],
        # np.nan (not the string 'NaN') is a missing value pandas can detect
        'Marks': [90, 76, np.nan, 74, 65, np.nan, 71]}
df = pd.DataFrame(data)
df
```

|   | Name | Age | Gender | Marks |
|---|---|---|---|---|
| 0 | Jai | 17 | M | 90.0 |
| 1 | Princi | 17 | F | 76.0 |
| 2 | Gaurav | 18 | M | NaN |
| 3 | Anuj | 17 | M | 74.0 |
| 4 | Ravi | 18 | M | 65.0 |
| 5 | Natasha | 17 | F | NaN |
| 6 | Riya | 17 | F | 71.0 |


## Dealing with missing values

```python
# Compute the average of the known marks (mean() skips NaN values)
avg = df['Marks'].mean()
# Replace the missing values with that average
df['Marks'] = df['Marks'].fillna(avg)
df
```

|   | Name | Age | Gender | Marks |
|---|---|---|---|---|
| 0 | Jai | 17 | M | 90.0 |
| 1 | Princi | 17 | F | 76.0 |
| 2 | Gaurav | 18 | M | 75.2 |
| 3 | Anuj | 17 | M | 74.0 |
| 4 | Ravi | 18 | M | 65.0 |
| 5 | Natasha | 17 | F | 75.2 |
| 6 | Riya | 17 | F | 71.0 |

## Data Replacing in Data Wrangling

```python
# Encode Gender as numbers: M -> 0.0, F -> 1.0
df['Gender'] = df['Gender'].map({'M': 0, 'F': 1}).astype(float)
df
```

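|   | Name | Age | Gender | Marks |
|---|---|---|---|---|
| 0 | Jai | 17 | 0.0 | 90.0 |
| 1 | Princi | 17 | 1.0 | 76.0 |
| 2 | Gaurav | 18 | 0.0 | 75.2 |
| 3 | Anuj | 17 | 0.0 | 74.0 |
| 4 | Ravi | 18 | 0.0 | 65.0 |
| 5 | Natasha | 17 | 1.0 | 75.2 |
| 6 | Riya | 17 | 1.0 | 71.0 |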


## Filtering data in Data Wrangling

```python
# Keep only the rows with Marks >= 75, then drop the Age column
df = df[df['Marks'] >= 75].copy()
df.drop('Age', axis=1, inplace=True)
df
```

|   | Name | Gender | Marks |
|---|---|---|---|
| 0 | Jai | 0.0 | 90.0 |
| 1 | Princi | 1.0 | 76.0 |
| 2 | Gaurav | 0.0 | 75.2 |
| 5 | Natasha | 1.0 | 75.2 |
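Several conditions can also be combined in one filter. A small sketch that rebuilds the student data (with the missing marks already filled with 75.2) so it runs on its own: `&` and `|` combine boolean masks, `query()` expresses the same filter as a string, and `isin()` with `.loc` selects rows and columns at once.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Jai", "Princi", "Gaurav", "Anuj", "Ravi", "Natasha", "Riya"],
    "Gender": ["M", "F", "M", "M", "M", "F", "F"],
    "Marks": [90, 76, 75.2, 74, 65, 75.2, 71],
})

# Combine masks with & (and) or | (or); each condition needs its own parentheses
high_f = students[(students["Marks"] >= 75) & (students["Gender"] == "F")]

# The same filter written as a query string
high_f_q = students.query("Marks >= 75 and Gender == 'F'")

# Rows whose Name is in a list, restricted to two columns
subset = students.loc[students["Name"].isin(["Jai", "Riya"]), ["Name", "Marks"]]
```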


## Data Wrangling Using Merge Operation
> **Syntax:** `pd.merge(data_frame1, data_frame2, on="field")`

```python
info = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106,
           107, 108, 109, 110],
    'NAME': ['Jagroop', 'Praveen', 'Harjot',
             'Pooja', 'Rahul', 'Nikita',
             'Saurabh', 'Ayush', 'Dolly', "Mohit"],
    'BRANCH': ['CSE', 'CSE', 'CSE', 'CSE', 'CSE',
               'CSE', 'CSE', 'CSE', 'CSE', 'CSE']})
info
```

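|   | ID | NAME | BRANCH |
|---|---|---|---|
| 0 | 101 | Jagroop | CSE |
| 1 | 102 | Praveen | CSE |
| 2 | 103 | Harjot | CSE |
| 3 | 104 | Pooja | CSE |
| 4 | 105 | Rahul | CSE |
| 5 | 106 | Nikita | CSE |
| 6 | 107 | Saurabh | CSE |
| 7 | 108 | Ayush | CSE |
| 8 | 109 | Dolly | CSE |
| 9 | 110 | Mohit | CSE |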


```python
fee_status = pd.DataFrame(
    {'ID': [101, 102, 103, 104, 105,
            106, 107, 108, 109, 110],
     'PENDING': ['5000', '250', 'NIL',
                 '9000', '15000', 'NIL',
                 '4500', '1800', '250', 'NIL']})
fee_status
```

|   | ID | PENDING |
|---|---|---|
| 0 | 101 | 5000 |
| 1 | 102 | 250 |
| 2 | 103 | NIL |
| 3 | 104 | 9000 |
| 4 | 105 | 15000 |
| 5 | 106 | NIL |
| 6 | 107 | 4500 |
| 7 | 108 | 1800 |
| 8 | 109 | 250 |
| 9 | 110 | NIL |


```python
pd.merge(info, fee_status, on='ID')
```

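|   | ID | NAME | BRANCH | PENDING |
|---|---|---|---|---|
| 0 | 101 | Jagroop | CSE | 5000 |
| 1 | 102 | Praveen | CSE | 250 |
| 2 | 103 | Harjot | CSE | NIL |
| 3 | 104 | Pooja | CSE | 9000 |
| 4 | 105 | Rahul | CSE | 15000 |
| 5 | 106 | Nikita | CSE | NIL |
| 6 | 107 | Saurabh | CSE | 4500 |
| 7 | 108 | Ayush | CSE | 1800 |
| 8 | 109 | Dolly | CSE | 250 |
| 9 | 110 | Mohit | CSE | NIL |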
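When the key column is named differently in the two tables, `left_on` and `right_on` replace `on`, and `validate` asks pandas to check the expected relationship between the keys (it raises a `MergeError` otherwise). A short sketch; the `student_id` column name is hypothetical.

```python
import pandas as pd

info_small = pd.DataFrame({"ID": [101, 102, 103],
                           "NAME": ["Jagroop", "Praveen", "Harjot"]})
# Hypothetical: the same key is called 'student_id' in the second table
fees = pd.DataFrame({"student_id": [101, 102, 103],
                     "PENDING": ["5000", "250", "NIL"]})

merged = pd.merge(
    info_small, fees,
    left_on="ID", right_on="student_id",  # join keys with different names
    how="inner",
    validate="one_to_one",                # each key may appear only once per side
)
merged = merged.drop(columns="student_id")  # drop the redundant copy of the key
```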


## Data Wrangling Using Grouping Method

```python
car_selling_data = {'Brand': ['Maruti', 'Maruti', 'Maruti',
                              'Maruti', 'Hyundai', 'Hyundai',
                              'Toyota', 'Mahindra', 'Mahindra',
                              'Ford', 'Toyota', 'Ford'],
                    'Year':  [2010, 2011, 2009, 2013,
                              2010, 2011, 2011, 2010,
                              2013, 2010, 2010, 2011],
                    'Sold': [6, 7, 9, 8, 3, 5,
                             2, 8, 7, 2, 4, 2]}
df_c = pd.DataFrame(car_selling_data)
df_c
```

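|   | Brand | Year | Sold |
|---|---|---|---|
| 0 | Maruti | 2010 | 6 |
| 1 | Maruti | 2011 | 7 |
| 2 | Maruti | 2009 | 9 |
| 3 | Maruti | 2013 | 8 |
| 4 | Hyundai | 2010 | 3 |
| 5 | Hyundai | 2011 | 5 |
| 6 | Toyota | 2011 | 2 |
| 7 | Mahindra | 2010 | 8 |
| 8 | Mahindra | 2013 | 7 |
| 9 | Ford | 2010 | 2 |
| 10 | Toyota | 2010 | 4 |
| 11 | Ford | 2011 | 2 |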


## Using Grouping Methods: Data for the Year 2010

```python
# Group the rows by Year and retrieve the group for 2010
grouped = df_c.groupby('Year')
grouped.get_group(2010)
```

|   | Brand | Year | Sold |
|---|---|---|---|
| 0 | Maruti | 2010 | 6 |
| 4 | Hyundai | 2010 | 3 |
| 7 | Mahindra | 2010 | 8 |
| 9 | Ford | 2010 | 2 |
| 10 | Toyota | 2010 | 4 |
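Besides `get_group()`, a GroupBy object can be aggregated directly or iterated over as `(key, sub-DataFrame)` pairs. A sketch using the same `car_selling_data`:

```python
import pandas as pd

car_selling_data = {'Brand': ['Maruti', 'Maruti', 'Maruti', 'Maruti', 'Hyundai', 'Hyundai',
                              'Toyota', 'Mahindra', 'Mahindra', 'Ford', 'Toyota', 'Ford'],
                    'Year': [2010, 2011, 2009, 2013, 2010, 2011,
                             2011, 2010, 2013, 2010, 2010, 2011],
                    'Sold': [6, 7, 9, 8, 3, 5, 2, 8, 7, 2, 4, 2]}
df_c = pd.DataFrame(car_selling_data)

# Total cars sold per year, one value per group
sold_per_year = df_c.groupby("Year")["Sold"].sum()
print(sold_per_year)

# Iterating yields the group key and the matching rows
for year, group in df_c.groupby("Year"):
    print(year, len(group), "records")
```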


## Creating Two DataFrames for Concatenation

```python
data1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd'],
         'Mobile No': [97, 91, 58, 76]}

data2 = {'Name': ['Gaurav', 'Anuj', 'Dhiraj', 'Hitesh'],
         'Age': [22, 32, 12, 52],
         'Address': ['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
         'Qualification': ['MCA', 'Phd', 'Bcom', 'B.hons'],
         'Salary': [1000, 2000, 3000, 4000]}
df_1 = pd.DataFrame(data1, index=[0, 1, 2, 3])
df_2 = pd.DataFrame(data2, index=[2, 3, 6, 7])
df_1
```

|   | Name | Age | Address | Qualification | Mobile No |
|---|---|---|---|---|---|
| 0 | Jai | 27 | Nagpur | Msc | 97 |
| 1 | Princi | 24 | Kanpur | MA | 91 |
| 2 | Gaurav | 22 | Allahabad | MCA | 58 |
| 3 | Anuj | 32 | Kannuaj | Phd | 76 |


```python
df_2
```

|   | Name | Age | Address | Qualification | Salary |
|---|---|---|---|---|---|
| 2 | Gaurav | 22 | Allahabad | MCA | 1000 |
| 3 | Anuj | 32 | Kannuaj | Phd | 2000 |
| 6 | Dhiraj | 12 | Allahabad | Bcom | 3000 |
| 7 | Hitesh | 52 | Kannuaj | B.hons | 4000 |


```python
res = pd.concat([df_1, df_2])  # stack the rows of both DataFrames (axis=0 is the default)
res
```

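|   | Name | Age | Address | Qualification | Mobile No | Salary |
|---|---|---|---|---|---|---|
| 0 | Jai | 27 | Nagpur | Msc | 97.0 | NaN |
| 1 | Princi | 24 | Kanpur | MA | 91.0 | NaN |
| 2 | Gaurav | 22 | Allahabad | MCA | 58.0 | NaN |
| 3 | Anuj | 32 | Kannuaj | Phd | 76.0 | NaN |
| 2 | Gaurav | 22 | Allahabad | MCA | NaN | 1000.0 |
| 3 | Anuj | 32 | Kannuaj | Phd | NaN | 2000.0 |
| 6 | Dhiraj | 12 | Allahabad | Bcom | NaN | 3000.0 |
| 7 | Hitesh | 52 | Kannuaj | B.hons | NaN | 4000.0 |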
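`pd.concat` also accepts `join`, `ignore_index` and `keys`. In the sketch below, two trimmed-down frames (a subset of the columns of `data1` and `data2` above) show that `join="inner"` keeps only the shared columns, `ignore_index=True` renumbers the duplicated labels 2 and 3, `keys` records which frame each row came from, and `drop_duplicates()` removes the people who appear in both frames.

```python
import pandas as pd

# Trimmed versions of data1 / data2 from above
df_a = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"],
                     "Age": [27, 24, 22, 32],
                     "Mobile No": [97, 91, 58, 76]}, index=[0, 1, 2, 3])
df_b = pd.DataFrame({"Name": ["Gaurav", "Anuj", "Dhiraj", "Hitesh"],
                     "Age": [22, 32, 12, 52],
                     "Salary": [1000, 2000, 3000, 4000]}, index=[2, 3, 6, 7])

# Only the columns present in both frames, with a fresh 0..7 index
res_inner = pd.concat([df_a, df_b], join="inner", ignore_index=True)

# Gaurav and Anuj occur in both frames with identical Name/Age values
res_unique = res_inner.drop_duplicates()

# An outer index level tags each block with its source
res_keys = pd.concat([df_a, df_b], keys=["df_a", "df_b"])
```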


```python
d1 = {
    'Region': ['North', 'North', 'South', 'South', 'East'],
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Electronics'],
    'Sales': [5000, 3000, 4000, 3500, 6000]
}

d2 = {
    'Region': ['North', 'South', 'East', 'West'],
    'Category': ['Electronics', 'Clothing', 'Furniture', 'Electronics'],
    'Sales': [4500, 3200, 2800, 5200]
}
region_df1 = pd.DataFrame(d1)
region_df2 = pd.DataFrame(d2)
region_df1
```

|   | Region | Category | Sales |
|---|---|---|---|
| 0 | North | Electronics | 5000 |
| 1 | North | Clothing | 3000 |
| 2 | South | Electronics | 4000 |
| 3 | South | Clothing | 3500 |
| 4 | East | Electronics | 6000 |


```python
region_df2
```

|   | Region | Category | Sales |
|---|---|---|---|
| 0 | North | Electronics | 4500 |
| 1 | South | Clothing | 3200 |
| 2 | East | Furniture | 2800 |
| 3 | West | Electronics | 5200 |


```python
# Inner join on both keys; the two 'Sales' columns become Sales_x and Sales_y
merged_df = pd.merge(region_df1, region_df2, on=['Region', 'Category'], how='inner')

# Set a MultiIndex on the key columns
multi_index_df = merged_df.set_index(['Region', 'Category'])

# Reset the index to make 'Region' and 'Category' columns again
multi_index_df = multi_index_df.reset_index()

# Total and average of Sales_x per Region (both results share the Region index)
agg_df = multi_index_df.groupby('Region').agg(
    Sales_x_Total=('Sales_x', 'sum'),
    Sales_x_Average=('Sales_x', 'mean'),
)
agg_df
```

| Region | Sales_x_Total | Sales_x_Average |
|---|---|---|
| North | 5000 | 5000.0 |
| South | 3500 | 3500.0 |


# Top 200 Richest People in the World Dataset 💰🌍💹

```python
rich = pd.read_csv('Top_Rich.csv')
rich.head()
```

|   | S. No. | Rank | Name | Age | Country | Networth | Industry |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Bernard Arnault & family | 75 | France | \$233 B | Fashion & Retail |
| 1 | 2 | 2 | Elon Musk | 52 | United States | \$195 B | Automotive |
| 2 | 3 | 3 | Jeff Bezos | 60 | United States | \$194 B | Technology |
| 3 | 4 | 4 | Mark Zuckerberg | 39 | United States | \$177 B | Technology |
| 4 | 5 | 5 | Larry Ellison | 79 | United States | \$141 B | Technology |


# T20 World Cup 2024: Top Teams & Players💥🏏🌍

```python
runs = pd.read_csv('Most_Runs.csv')
runs.head()
```

|   | Position | Team | Player | Matches | Innings | Bat Avg | Runs |
|---|---|---|---|---|---|---|---|
| 0 | 1 | AFGHANISTAN | Rahmanullah GURBAZ | 8 | 8 | 35.12 | 281 |
| 1 | 2 | INDIA | Rohit SHARMA | 8 | 8 | 36.71 | 257 |
| 2 | 3 | AUSTRALIA | Travis HEAD | 7 | 7 | 42.5 | 255 |
| 3 | 4 | SOUTH AFRICA | Quinton DE KOCK | 9 | 9 | 27.0 | 243 |
| 4 | 5 | AFGHANISTAN | Ibrahim ZADRAN | 8 | 8 | 28.87 | 231 |


## **Q1. Merge two DataFrames on a single key.**

```python
emp1 = pd.DataFrame({
    'employee_id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'department': ['HR', 'Marketing', 'Engineering', 'Sales'],
    'salary': [60000, 80000, 70000, 90000]
})
emp2 = pd.DataFrame({
    'employee_id': [2, 3, 5],
    'start_year': [2010, 2015, 2018],
    'experience': ['Senior', 'Mid', 'Junior'],
    'bonus': [5000, 3000, 2000]
})
emp1, emp2
```

```
(   employee_id     name   department  salary
 0            1    Alice           HR   60000
 1            2      Bob    Marketing   80000
 2            3  Charlie  Engineering   70000
 3            4    David        Sales   90000,
    employee_id  start_year experience  bonus
 0            2        2010     Senior   5000
 1            3        2015        Mid   3000
 2            5        2018     Junior   2000)
```

```python
merged_data = pd.merge(emp1, emp2, on=['employee_id'])
merged_data
```

|   | employee_id | name | department | salary | start_year | experience | bonus |
|---|---|---|---|---|---|---|---|
| 0 | 2 | Bob | Marketing | 80000 | 2010 | Senior | 5000 |
| 1 | 3 | Charlie | Engineering | 70000 | 2015 | Mid | 3000 |


## **Q2. Merge two DataFrames on multiple keys.**

```python
emp_m1 = pd.DataFrame({
    'employee_id': [1, 2, 3, 4],
    'project_id': ['P1', 'P2', 'P3', 'P4'],
    'department': ['HR', 'Marketing', 'Engineering', 'Sales'],
    'project_budget': [100000, 150000, 120000, 180000]
})

emp_m2 = pd.DataFrame({
    'employee_id': [2, 3, 5],
    'project_id': ['P2', 'P3', 'P5'],
    'project_status': ['Completed', 'Ongoing', 'Planned'],
    'project_duration': [12, 8, 10]
})
emp_m1, emp_m2
```

```
(   employee_id project_id   department  project_budget
 0            1         P1           HR          100000
 1            2         P2    Marketing          150000
 2            3         P3  Engineering          120000
 3            4         P4        Sales          180000,
    employee_id project_id project_status  project_duration
 0            2         P2      Completed                12
 1            3         P3        Ongoing                 8
 2            5         P5        Planned                10)
```

```python
merged_data2 = pd.merge(emp_m1, emp_m2, on=['employee_id', 'project_id'])
merged_data2
```

|   | employee_id | project_id | department | project_budget | project_status | project_duration |
|---|---|---|---|---|---|---|
| 0 | 2 | P2 | Marketing | 150000 | Completed | 12 |
| 1 | 3 | P3 | Engineering | 120000 | Ongoing | 8 |


## **Q3. Perform an outer join, inner join, left join, and right join.**

```python
# Outer join
df_o = pd.merge(emp1, emp2, on='employee_id', how='outer')
df_o
```

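|   | employee_id | name | department | salary | start_year | experience | bonus |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Alice | HR | 60000.0 | NaN | NaN | NaN |
| 1 | 2 | Bob | Marketing | 80000.0 | 2010.0 | Senior | 5000.0 |
| 2 | 3 | Charlie | Engineering | 70000.0 | 2015.0 | Mid | 3000.0 |
| 3 | 4 | David | Sales | 90000.0 | NaN | NaN | NaN |
| 4 | 5 | NaN | NaN | NaN | 2018.0 | Junior | 2000.0 |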



```python
# Inner join
df_i = pd.merge(emp1, emp2, on='employee_id', how='inner')
df_i
```

|   | employee_id | name | department | salary | start_year | experience | bonus |
|---|---|---|---|---|---|---|---|
| 0 | 2 | Bob | Marketing | 80000 | 2010 | Senior | 5000 |
| 1 | 3 | Charlie | Engineering | 70000 | 2015 | Mid | 3000 |



```python
# Left join
df_l = pd.merge(emp1, emp2, on='employee_id', how='left')
df_l
```

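|   | employee_id | name | department | salary | start_year | experience | bonus |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Alice | HR | 60000 | NaN | NaN | NaN |
| 1 | 2 | Bob | Marketing | 80000 | 2010.0 | Senior | 5000.0 |
| 2 | 3 | Charlie | Engineering | 70000 | 2015.0 | Mid | 3000.0 |
| 3 | 4 | David | Sales | 90000 | NaN | NaN | NaN |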



```python
# Right join
df_r = pd.merge(emp1, emp2, on='employee_id', how='right')
df_r
```

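|   | employee_id | name | department | salary | start_year | experience | bonus |
|---|---|---|---|---|---|---|---|
| 0 | 2 | Bob | Marketing | 80000.0 | 2010 | Senior | 5000 |
| 1 | 3 | Charlie | Engineering | 70000.0 | 2015 | Mid | 3000 |
| 2 | 5 | NaN | NaN | NaN | 2018 | Junior | 2000 |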
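To see which side each row of a join came from, `pd.merge` accepts `indicator=True`, which adds a categorical `_merge` column with the values `left_only`, `right_only` or `both`. Filtering on it gives an anti-join (rows present in only one table). A sketch with trimmed copies of `emp1` and `emp2`:

```python
import pandas as pd

left_emp = pd.DataFrame({"employee_id": [1, 2, 3, 4],
                         "name": ["Alice", "Bob", "Charlie", "David"]})
right_emp = pd.DataFrame({"employee_id": [2, 3, 5],
                          "bonus": [5000, 3000, 2000]})

# Outer join plus a '_merge' column describing where each row came from
df_ind = pd.merge(left_emp, right_emp, on="employee_id", how="outer", indicator=True)
print(df_ind)

# Employees with no bonus record (present only in the left table)
only_left = df_ind[df_ind["_merge"] == "left_only"]
```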



## **Q4. Concatenate two DataFrames along rows.**

```python
concatenated_df = pd.concat([rich, runs], axis=0)
concatenated_df
```

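|   | S. No. | Rank | Name | Age | Country | Networth | Industry | Position | Team | Player | Matches | Innings | Bat Avg | Runs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | Bernard Arnault & family | 75.0 | France | \$233 B | Fashion & Retail | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2.0 | 2.0 | Elon Musk | 52.0 | United States | \$195 B | Automotive | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 3.0 | 3.0 | Jeff Bezos | 60.0 | United States | \$194 B | Technology | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 4.0 | 4.0 | Mark Zuckerberg | 39.0 | United States | \$177 B | Technology | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 5.0 | 5.0 | Larry Ellison | 79.0 | United States | \$141 B | Technology | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 11.0 | ENGLAND | Phil SALT | 8.0 | 7.0 | 37.60 | 188.0 |
| 11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12.0 | AUSTRALIA | David WARNER | 7.0 | 7.0 | 29.66 | 178.0 |
| 12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 13.0 | INDIA | Rishabh PANT | 8.0 | 8.0 | 24.42 | 171.0 |
| 13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 14.0 | AUSTRALIA | Marcus STOINIS | 7.0 | 5.0 | 42.25 | 169.0 |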


## **Q5. Concatenate two DataFrames along columns.**

```python
concatenated_df_1 = pd.concat([rich, runs], axis=1)
concatenated_df_1
```

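|   | S. No. | Rank | Name | Age | Country | Networth | Industry | Position | Team | Player | Matches | Innings | Bat Avg | Runs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Bernard Arnault & family | 75 | France | \$233 B | Fashion & Retail | 1.0 | AFGHANISTAN | Rahmanullah GURBAZ | 8.0 | 8.0 | 35.12 | 281.0 |
| 1 | 2 | 2 | Elon Musk | 52 | United States | \$195 B | Automotive | 2.0 | INDIA | Rohit SHARMA | 8.0 | 8.0 | 36.71 | 257.0 |
| 2 | 3 | 3 | Jeff Bezos | 60 | United States | \$194 B | Technology | 3.0 | AUSTRALIA | Travis HEAD | 7.0 | 7.0 | 42.50 | 255.0 |
| 3 | 4 | 4 | Mark Zuckerberg | 39 | United States | \$177 B | Technology | 4.0 | SOUTH AFRICA | Quinton DE KOCK | 9.0 | 9.0 | 27.00 | 243.0 |
| 4 | 5 | 5 | Larry Ellison | 79 | United States | \$141 B | Technology | 5.0 | AFGHANISTAN | Ibrahim ZADRAN | 8.0 | 8.0 | 28.87 | 231.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 196 | 195 | Lei Jun | 54 | China | \$10.9 B | Technology | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 196 | 197 | 195 | Georg Schaeffler | 59 | Germany | \$10.9 B | Automotive | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 197 | 198 | 195 | Marcel Herrmann Telles & family | 74 | Brazil | \$10.9 B | Food & Beverage | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 198 | 199 | 199 | David Velez & family | 42 | Colombia | \$10.8 B | Finance & Investments | NaN | NaN | NaN | NaN | NaN | NaN | NaN |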
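With `axis=1`, `pd.concat` aligns rows by index label, not by position, which is why the rich-list rows without a matching cricket row are filled with NaN above. A minimal sketch with two hypothetical frames; `join="inner"` keeps only the labels present in both.

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"y": [10, 20]}, index=[1, 2])

# Outer alignment (default): labels 0, 1, 2; label 0 has no 'y' value -> NaN
side_by_side = pd.concat([left, right], axis=1)

# Inner alignment: only labels 1 and 2 appear in both frames
aligned_only = pd.concat([left, right], axis=1, join="inner")
```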


## **Q6. Concatenate a list of DataFrames.**

```python
df_list = [rich, runs]
concat_list = pd.concat(df_list, ignore_index=True)
concat_list
```

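|   | S. No. | Rank | Name | Age | Country | Networth | Industry | Position | Team | Player | Matches | Innings | Bat Avg | Runs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | Bernard Arnault & family | 75.0 | France | \$233 B | Fashion & Retail | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2.0 | 2.0 | Elon Musk | 52.0 | United States | \$195 B | Automotive | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 3.0 | 3.0 | Jeff Bezos | 60.0 | United States | \$194 B | Technology | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 4.0 | 4.0 | Mark Zuckerberg | 39.0 | United States | \$177 B | Technology | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 5.0 | 5.0 | Larry Ellison | 79.0 | United States | \$141 B | Technology | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 210 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 11.0 | ENGLAND | Phil SALT | 8.0 | 7.0 | 37.60 | 188.0 |
| 211 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12.0 | AUSTRALIA | David WARNER | 7.0 | 7.0 | 29.66 | 178.0 |
| 212 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 13.0 | INDIA | Rishabh PANT | 8.0 | 8.0 | 24.42 | 171.0 |
| 213 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 14.0 | AUSTRALIA | Marcus STOINIS | 7.0 | 5.0 | 42.25 | 169.0 |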


## **Q7. Reshape data using the melt function to go from wide to long format.**

```python
df_long = pd.melt(rich, id_vars=['Rank', 'Name'], var_name='Attribute', value_name='Value')
df_long
```

|   | Rank | Name | Attribute | Value |
|---|---|---|---|---|
| 0 | 1 | Bernard Arnault & family | S. No. | 1 |
| 1 | 2 | Elon Musk | S. No. | 2 |
| 2 | 3 | Jeff Bezos | S. No. | 3 |
| 3 | 4 | Mark Zuckerberg | S. No. | 4 |
| 4 | 5 | Larry Ellison | S. No. | 5 |
| ... | ... | ... | ... | ... |
| 995 | 195 | Lei Jun | Industry | Technology |
| 996 | 195 | Georg Schaeffler | Industry | Automotive |
| 997 | 195 | Marcel Herrmann Telles & family | Industry | Food & Beverage |
| 998 | 199 | David Velez & family | Industry | Finance & Investments |
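The reverse operation, long back to wide, is `pivot()`. A small round trip on a hypothetical two-row table with one column per year:

```python
import pandas as pd

# Hypothetical wide table
wide = pd.DataFrame({"Name": ["A", "B"], "2023": [10, 20], "2024": [15, 25]})

# Wide -> long: one row per (Name, Year) pair
long = wide.melt(id_vars="Name", var_name="Year", value_name="Value")

# Long -> wide again: Year values become column headers
back = long.pivot(index="Name", columns="Year", values="Value").reset_index()
back.columns.name = None  # drop the leftover "Year" label on the column index
```

`pivot()` requires each (index, columns) pair to be unique; with duplicates, `pivot_table()` with an `aggfunc` is the usual alternative.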


## **Q8. Create a pivot table to summarize data.**

```python
# Turn strings such as '$233 B' into floats (billions); r'\$' matches a literal dollar sign
rich['Networth'] = rich['Networth'].replace({r'\$': '', ' B': ''}, regex=True).astype(float)
pivot_table = pd.pivot_table(rich, values='Networth', index='Industry', aggfunc='mean')
pivot_table
```

| Industry | Networth |
|---|---|
| Automotive | 34.86 |
| Construction & Engineering | 20.9 |
| Diversified | 38.606667 |
| Energy | 21.1 |
| Fashion & Retail | 38.85 |
| Finance & Investments | 26.64375 |
| Food & Beverage | 24.413333 |
| Gambling & Casinos | 21.6 |
| Healthcare | 18.566667 |
| Logistics | 30.575 |
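A pivot table can also spread a second key across the columns. A sketch on the regional sales data `d1` from earlier: `fill_value=0` replaces the missing East/Clothing cell, and `margins=True` adds an `All` row and column with the totals.

```python
import pandas as pd

d1 = {
    'Region': ['North', 'North', 'South', 'South', 'East'],
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Electronics'],
    'Sales': [5000, 3000, 4000, 3500, 6000]
}
region_df1 = pd.DataFrame(d1)

sales_pivot = pd.pivot_table(
    region_df1,
    values="Sales",
    index="Region",        # one row per region
    columns="Category",    # one column per category
    aggfunc="sum",
    fill_value=0,          # East sold no Clothing: show 0 instead of NaN
    margins=True,          # add 'All' totals
)
print(sales_pivot)
```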


## **Q9. Group data by one or more columns and perform aggregation functions (e.g., sum, mean, count).**

```python
# Summary statistics of the numeric columns
runs.describe()
```

|   | Position | Matches | Innings | Bat Avg | Runs |
|---|---|---|---|---|---|
| count | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 |
| mean | 8.0 | 7.8 | 7.4 | 34.464667 | 212.8 |
| std | 4.472136 | 0.861892 | 0.985611 | 6.572447 | 35.877171 |
| min | 1.0 | 6.0 | 5.0 | 24.42 | 169.0 |
| 25% | 4.5 | 7.0 | 7.0 | 28.645 | 183.0 |
| 50% | 8.0 | 8.0 | 8.0 | 35.12 | 214.0 |
| 75% | 11.5 | 8.0 | 8.0 | 40.125 | 237.0 |
| max | 15.0 | 9.0 | 9.0 | 43.8 | 281.0 |


```python
# Restrict to numeric columns: 'mean' is not defined for the text columns Team and Player
runs.select_dtypes(include='number').agg(['sum', 'mean', 'count'])
```

|   | Position | Matches | Innings | Bat Avg | Runs |
|---|---|---|---|---|---|
| sum | 120.0 | 117.0 | 111.0 | 516.97 | 3192.0 |
| mean | 8.0 | 7.8 | 7.4 | 34.464667 | 212.8 |
| count | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 |


```python
# std, var and sem are only defined for numeric columns
runs.select_dtypes(include='number').agg(['max', 'min', 'std', 'var', 'sem'])
```

|   | Position | Matches | Innings | Bat Avg | Runs |
|---|---|---|---|---|---|
| max | 15.0 | 9.0 | 9.0 | 43.8 | 281.0 |
| min | 1.0 | 6.0 | 5.0 | 24.42 | 169.0 |
| std | 4.472136 | 0.861892 | 0.985611 | 6.572447 | 35.877171 |
| var | 20.0 | 0.742857 | 0.971429 | 43.197055 | 1287.171429 |
| sem | 1.154701 | 0.222539 | 0.254484 | 1.696998 | 9.263446 |
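The cells above aggregate the whole table at once. Grouping first gives one result per group; a sketch on the regional sales data `d1` from earlier, grouping by one column and then by two:

```python
import pandas as pd

d1 = {
    'Region': ['North', 'North', 'South', 'South', 'East'],
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Electronics'],
    'Sales': [5000, 3000, 4000, 3500, 6000]
}
region_df1 = pd.DataFrame(d1)

# One column: sum, mean and count of Sales per Region
by_region = region_df1.groupby("Region")["Sales"].agg(["sum", "mean", "count"])

# Two columns: one row per (Region, Category) pair, indexed by a MultiIndex
by_region_cat = region_df1.groupby(["Region", "Category"])["Sales"].sum()
print(by_region)
print(by_region_cat)
```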


## **Q10. Apply multiple aggregation functions to grouped data.**

```python
agg_result = rich.groupby('Name')['Networth'].agg(['min', 'max', 'sum', 'mean'])
agg_result
```

| Name | min | max | sum | mean |
|---|---|---|---|---|
| Abigail Johnson | 29.0 | 29.0 | 29.0 | 29.0 |
| Alain Wertheimer | 36.8 | 36.8 | 36.8 | 36.8 |
| Alexey Mordashov & family | 25.5 | 25.5 | 25.5 | 25.5 |
| Alice Walton | 72.3 | 72.3 | 72.3 | 72.3 |
| Aliko Dangote | 13.4 | 13.4 | 13.4 | 13.4 |
| ... | ... | ... | ... | ... |
| Warren Buffett | 133.0 | 133.0 | 133.0 | 133.0 |
| Wei Jianjun & family | 11.2 | 11.2 | 11.2 | 11.2 |
| William Ding | 33.5 | 33.5 | 33.5 | 33.5 |
| Zhang Yiming | 43.4 | 43.4 | 43.4 | 43.4 |
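Named aggregation lets each output column draw on its own source column and function. A sketch on the `car_selling_data` from the grouping section; the output names such as `total_sold` are arbitrary choices.

```python
import pandas as pd

car_selling_data = {'Brand': ['Maruti', 'Maruti', 'Maruti', 'Maruti', 'Hyundai', 'Hyundai',
                              'Toyota', 'Mahindra', 'Mahindra', 'Ford', 'Toyota', 'Ford'],
                    'Year': [2010, 2011, 2009, 2013, 2010, 2011,
                             2011, 2010, 2013, 2010, 2010, 2011],
                    'Sold': [6, 7, 9, 8, 3, 5, 2, 8, 7, 2, 4, 2]}
df_c = pd.DataFrame(car_selling_data)

# new_column=(source_column, aggregation)
summary = df_c.groupby("Brand").agg(
    total_sold=("Sold", "sum"),
    avg_sold=("Sold", "mean"),
    first_year=("Year", "min"),
    last_year=("Year", "max"),
)
print(summary)
```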


## **Q11. Use the groupby function to group data and apply custom functions.**

```python
def custom_summary(group):
    # Summary statistics for one Industry group
    s = pd.Series({
        'Average Networth': group['Networth'].mean(),
        'Total Networth': group['Networth'].sum(),
        'Number of Individuals': group.shape[0]  # number of rows in each group
    })
    return s
```

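```python
# Pass only the Networth column to apply(), so the grouping column is not included
grouped_data = rich.groupby('Industry')[['Networth']].apply(custom_summary)
grouped_data
```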

| Industry | Average Networth | Total Networth | Number of Individuals |
|---|---|---|---|
| Automotive | 34.86 | 348.6 | 10.0 |
| Construction & Engineering | 20.9 | 20.9 | 1.0 |
| Diversified | 38.606667 | 579.1 | 15.0 |
| Energy | 21.1 | 189.9 | 9.0 |
| Fashion & Retail | 38.85 | 1165.5 | 30.0 |
| Finance & Investments | 26.64375 | 852.6 | 32.0 |
| Food & Beverage | 24.413333 | 366.2 | 15.0 |
| Gambling & Casinos | 21.6 | 43.2 | 2.0 |
| Healthcare | 18.566667 | 111.4 | 6.0 |
| Logistics | 30.575 | 122.3 | 4.0 |
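Custom logic can also go through `transform()`, which returns a result aligned with the original rows, or through `agg()` with a user-defined function. In this sketch on `car_selling_data`, `sold_range` is a hypothetical helper.

```python
import pandas as pd

car_selling_data = {'Brand': ['Maruti', 'Maruti', 'Maruti', 'Maruti', 'Hyundai', 'Hyundai',
                              'Toyota', 'Mahindra', 'Mahindra', 'Ford', 'Toyota', 'Ford'],
                    'Year': [2010, 2011, 2009, 2013, 2010, 2011,
                             2011, 2010, 2013, 2010, 2010, 2011],
                    'Sold': [6, 7, 9, 8, 3, 5, 2, 8, 7, 2, 4, 2]}
df_c = pd.DataFrame(car_selling_data)

# transform keeps one value per original row: each row gets its year's total
df_c["year_total"] = df_c.groupby("Year")["Sold"].transform("sum")
df_c["share_of_year"] = df_c["Sold"] / df_c["year_total"]

def sold_range(s):
    """Hypothetical helper: gap between a brand's best and worst record."""
    return s.max() - s.min()

# agg calls the function once per group with that group's Series
spread = df_c.groupby("Brand")["Sold"].agg(sold_range)
print(spread)
```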


# **The End :)**