# World's Richest People 2022 as Listed by Forbes
The objective of this project is to analyze and visualize the data on the world's richest people (in US Dollars) in 2022 as listed by Forbes (The Billionaires List). The Billionaires List is an annual ranking of the world's wealthiest individuals, based on their net worth, as determined by Forbes.
The project will also explore the source of wealth for the world's richest people in 2022, identifying the industries and sectors that are driving wealth creation at the highest levels. This analysis will provide insight into the global economy and the trends that are shaping it.
The dataset for this project was downloaded from [Kaggle](https://www.kaggle.com/datasets/prasertk/forbes-worlds-billionaires-list-2022).

In [15]:
# import libraries
import pandas as pd
import plotly.graph_objects as go
import plotly.figure_factory as ff
import plotly.express as px


pd.options.display.max_rows = 50
pd.options.display.max_columns = 22

In [6]:
# Load the data
df = pd.read_csv("forbes_2022_billionaires.csv",parse_dates=["birthDate"])
df.head(3)

Unnamed: 0,rank,personName,age,finalWorth,year,month,category,source,country,state,city,countryOfCitizenship,organization,selfMade,gender,birthDate,title,philanthropyScore,residenceMsa,numberOfSiblings,bio,about
0,1,Elon Musk,50.0,219000.0,2022,4,Automotive,"Tesla, SpaceX",United States,Texas,Austin,United States,Tesla,True,M,1971-06-28,CEO,1.0,,,Elon Musk is working to revolutionize transpor...,Musk was accepted to a graduate program at Sta...
1,2,Jeff Bezos,58.0,171000.0,2022,4,Technology,Amazon,United States,Washington,Seattle,United States,Amazon,True,M,1964-01-12,Entrepreneur,1.0,"Seattle-Tacoma-Bellevue, WA",,Jeff Bezos founded e-commerce giant Amazon in ...,"Growing up, Jeff Bezos worked summers on his g..."
2,3,Bernard Arnault & family,73.0,158000.0,2022,4,Fashion & Retail,LVMH,France,,Paris,France,LVMH Moët Hennessy Louis Vuitton,False,M,1949-03-05,Chairman and CEO,,,,Bernard Arnault oversees the LVMH empire of so...,"Arnault apparently wooed his wife, Helene Merc..."


In [223]:
# Check for duplicate rows
df.duplicated().sum()

0

In [224]:
df.isnull().sum()

rank                       0
personName                 0
age                       86
finalWorth                 0
year                       0
month                      0
category                   0
source                     0
country                   13
state                   1920
city                      44
countryOfCitizenship       0
organization            2316
selfMade                   0
gender                    16
birthDate                 99
title                   2267
philanthropyScore       2272
residenceMsa            2029
numberOfSiblings        2541
bio                        0
about                   1106
dtype: int64

## 1. Top 10 Richest people in 2022

In [13]:
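# Select rank, personName, finalWorth and country for the first 10 rows.
# .copy() makes an independent DataFrame, so the column assignment below does not trigger SettingWithCopyWarning.
top_10 = df.iloc[:10, [0, 1, 3, 8]].copy()
top_10.columns = ["Rank", "Name", "Networth (in Billions)", "Country"]
# finalWorth is stored in millions of USD; divide by 1000 to get billions.
top_10["Networth (in Billions)"] = top_10["Networth (in Billions)"]/1000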
top_10

Unnamed: 0,Rank,Name,Networth (in Billions),Country
0,1,Elon Musk,219.0,United States
1,2,Jeff Bezos,171.0,United States
2,3,Bernard Arnault & family,158.0,France
3,4,Bill Gates,129.0,United States
4,5,Warren Buffett,118.0,United States
5,6,Larry Page,111.0,United States
6,7,Sergey Brin,107.0,United States
7,8,Larry Ellison,106.0,United States
8,9,Steve Ballmer,91.4,United States
9,10,Mukesh Ambani,90.7,India


In [21]:
top_10_table = ff.create_table(top_10)
top_10_table

**8 of the top 10 richest people in the world came from the United States.**
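The count behind this statement can be checked from the table above. The following is a minimal sketch that rebuilds only the `Name` and `Country` columns of the top-10 output by hand; the names `top_10_check` and `country_counts` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Names and countries copied from the top-10 table shown above.
top_10_check = pd.DataFrame({
    "Name": [
        "Elon Musk", "Jeff Bezos", "Bernard Arnault & family", "Bill Gates",
        "Warren Buffett", "Larry Page", "Sergey Brin", "Larry Ellison",
        "Steve Ballmer", "Mukesh Ambani",
    ],
    "Country": [
        "United States", "United States", "France", "United States",
        "United States", "United States", "United States", "United States",
        "United States", "India",
    ],
})

# Count how many of the ten rows fall under each country.
country_counts = top_10_check["Country"].value_counts()
print(country_counts)
```

In the notebook itself, the same check would presumably be `top_10["Country"].value_counts()`.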

## 2. Category with the highest number of Billionaires

In [39]:
# Group the data by category, then get the size of each category.
categories = df.groupby("category").size().reset_index(name="count").sort_values(by="count", ascending=False)
categories.reset_index(drop=True, inplace=True)
# Display the 10 largest categories.
categories.head(10)

Unnamed: 0,category,count
0,Finance & Investments,392
1,Technology,343
2,Manufacturing,337
3,Fashion & Retail,250
4,Healthcare,217
5,Food & Beverage,203
6,Real Estate,193
7,Diversified,180
8,Media & Entertainment,99
9,Energy,95


In [42]:
top_10_categories = categories.head(10)
top_10_categories_table = ff.create_table(top_10_categories)
top_10_categories_table

In [115]:
trace = go.Bar(
    x = top_10_categories["category"],
    y = top_10_categories["count"],
    marker = dict(
        color = top_10_categories["count"],
        colorscale = "sunset",
    )
)
layout = dict(
    title = dict(
        text="Top 10 Sectors with the highest number of Billionaires", 
        font = dict(color="#9C9C9C")
    ),
    xaxis = dict(
        color = "#9C9C9C",
        title = "Sector",
        tickfont = dict(size=10)
    ),
    yaxis = dict(
        color = "#9C9C9C",
        title = "Number of Billionaires"
    ),
    paper_bgcolor = "#F5F5F5",
    plot_bgcolor = "#F5F5F5"
)

fig = go.Figure(data = [trace], layout = layout)
fig.show()

**The FINANCE & INVESTMENTS category has the highest number of billionaires (392), followed by Technology (343) and Manufacturing (337).**

## 3. Male vs Female Billionaires

In [95]:
# Group by gender, then get the number of billionaires under each gender.
by_gender = df.groupby("gender").size().reset_index(name="count")
by_gender["gender"] = by_gender["gender"].map({"F": "Female", "M": "Male"})
by_gender

Unnamed: 0,gender,count
0,Female,311
1,Male,2341


In [96]:
by_gender_table = ff.create_table(by_gender)
by_gender_table

In [106]:
gender = go.Pie(
    labels = by_gender["gender"],
    values = by_gender["count"],
    marker = dict(colors = ["#E05F19", "#007F8E"]),
    hole=0.5
)
layout = dict(
    title = dict(
        text="Male vs Female Billionaires", 
        font = dict(color="#9C9C9C")
    ),
    paper_bgcolor = "#F5F5F5",
    plot_bgcolor = "#F5F5F5"
)

fig = go.Figure(data = [gender], layout = layout)
fig.show()

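**Among billionaires with a recorded gender, there are significantly more MALE (88.3%) than FEMALE (11.7%) billionaires. The 16 rows with a missing gender are left out by the groupby.**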
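The percentages quoted here follow directly from the two counts in the `by_gender` output. A minimal sketch of the arithmetic, using those counts as literals (the names `gender_counts` and `gender_share` are illustrative only):

```python
import pandas as pd

# Counts copied from the by_gender output above.
gender_counts = pd.Series({"Female": 311, "Male": 2341})

# Share of each gender among the 2,652 rows with a recorded gender, in percent.
gender_share = (gender_counts / gender_counts.sum() * 100).round(1)
print(gender_share)
```

Because the counts come from a groupby on `gender`, rows where the gender is missing do not contribute to the denominator.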

## 4. Which sectors do FEMALE billionaires belong to?

In [108]:
# Select rows with gender female
female = df[df["gender"] == "F"]
female.head(3)

Unnamed: 0,rank,personName,age,finalWorth,year,month,category,source,country,state,city,countryOfCitizenship,organization,selfMade,gender,birthDate,title,philanthropyScore,residenceMsa,numberOfSiblings,bio,about
13,14,Francoise Bettencourt Meyers & family,68.0,74800.0,2022,4,Fashion & Retail,L'Oréal,France,,Paris,France,,False,F,1953-07-10,,,,,"Francoise Bettencourt Meyers, the granddaughte...",Bettencourt Meyers' inheritance was the subjec...
17,18,Alice Walton,72.0,65300.0,2022,4,Fashion & Retail,Walmart,United States,Texas,Fort Worth,United States,Crystal Bridges Museum of American Art,False,F,1949-10-07,Philanthropist,2.0,"Dallas-Fort Worth-Arlington, TX",,Alice Walton is the only daughter of Walmart f...,"After graduating from Trinity College in 1971,..."
21,21,Julia Koch & family,59.0,60000.0,2022,4,Diversified,Koch Industries,United States,New York,New York,United States,,False,F,1962-04-12,,2.0,"New York, NY",,Julia Koch and her three children inherited a ...,


In [121]:
# Group the Data for female billionaires by category
category_female = female.groupby("category").size().reset_index(name="count").sort_values(by="count", ascending=False)
category_female.reset_index(drop=True, inplace=True)

In [113]:
category_female_table = ff.create_table(category_female)
category_female_table

**Female billionaires are most concentrated in the Manufacturing and Food & Beverage sectors.**

In [120]:
trace = go.Bar(
    x = category_female["category"],
    y = category_female["count"],
    marker = dict(
        color = category_female["count"],
        colorscale = "teal",
    )
)
layout = dict(
    title = dict(
        text="Distribution of Female Billionaires by Sector", 
        font = dict(color="#9C9C9C")
    ),
    xaxis = dict(
        color = "#9C9C9C",
        title = "Sector",
        tickfont = dict(size=10)
    ),
    yaxis = dict(
        color = "#9C9C9C",
        title = "Number of Female Billionaires"
    ),
    paper_bgcolor = "#F5F5F5",
    plot_bgcolor = "#F5F5F5"
)

fig = go.Figure(data = [trace], layout = layout)
fig.show()

## 5. Distribution of Billionaires by Country

In [214]:
# GROUP the data by country then count the number of billionaires for each.
countries = df.groupby("country").size().reset_index(name="Number of Billionaires").sort_values(by="Number of Billionaires", ascending=False)
countries.reset_index(drop=True, inplace=True)
countries["Percentage"] = round((countries["Number of Billionaires"] /  countries["Number of Billionaires"].sum()) * 100, 1)
countries["Percentage"] = countries["Percentage"].apply(lambda x: f"{x}%")

In [215]:
top_countries = ff.create_table(countries.head(10))
top_countries

**USA and CHINA alone had almost 50% of all billionaires in 2022**

## 6. Distribution of FEMALE billionaires by Country

In [212]:
# Group the female data we filtered earlier by country, then count the number of female billionaires for each.
female_by_country = female.groupby("country").size().reset_index(name="Female Billionaires").sort_values(by="Female Billionaires", ascending=False)
female_by_country.reset_index(drop=True, inplace=True)
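# Use a lowercase "percentage" column name so it stays distinct from countries["Percentage"] after the merge below.
female_by_country["percentage"] = round((female_by_country["Female Billionaires"]/female_by_country["Female Billionaires"].sum()) *100, 1)
female_by_country["percentage"] = female_by_country["percentage"].apply(lambda x: f"{x}%")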

In [213]:
top_10_female_by_country = ff.create_table(female_by_country.head(10))
top_10_female_by_country

**USA and CHINA still had the highest number of FEMALE billionaires in 2022**

In [216]:
# Merge the countries and female_by_country dataframes.
# We use a left join because there are some countries with no FEMALE billionaires.
combined = countries.merge(female_by_country, on="country", how="left")
combined.isnull().sum()

country                    0
Number of Billionaires     0
Percentage                 0
Female Billionaires       35
percentage                35
dtype: int64

In [217]:
# The missing values in the combined dataframe represent the countries with zero female billionaires,
# so let's replace them with zero.
combined.fillna(value=0, inplace=True)
combined.isnull().sum()

country                   0
Number of Billionaires    0
Percentage                0
Female Billionaires       0
percentage                0
dtype: int64
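The left-join-then-fill pattern used above can be seen in isolation on a small example. This is only a sketch on hypothetical toy data (countries "A", "B" and "C" and their counts are made up, and `all_counts`, `female_counts` and `merged` are illustrative names). It fills only the count column and restores the integer dtype that the `NaN` values had forced to float.

```python
import pandas as pd

# Hypothetical toy data, not taken from the Forbes dataset.
all_counts = pd.DataFrame({
    "country": ["A", "B", "C"],
    "Number of Billionaires": [5, 3, 1],
})
female_counts = pd.DataFrame({
    "country": ["A", "C"],
    "Female Billionaires": [2, 1],
})

# A left join keeps every row of all_counts; "B" has no match,
# so its "Female Billionaires" value is NaN after the merge.
merged = all_counts.merge(female_counts, on="country", how="left")

# Fill only the count column, then cast back to int.
merged["Female Billionaires"] = merged["Female Billionaires"].fillna(0).astype(int)
print(merged)
```

Filling a single column this way avoids writing a numeric 0 into columns that hold formatted strings, such as a percentage column.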

In [221]:
# Use a choropleth map to show billionaire distribution across the world
fig = px.choropleth(
    combined,
    locations='country',
    locationmode="country names",
    color='Number of Billionaires',
    hover_name = 'country',
    projection='natural earth1',
    hover_data=["Number of Billionaires", "Female Billionaires"],
    color_continuous_scale="reds",
    title='Distribution of Billionaires by Country',
)
fig.update_layout(
    title = dict(
        font=dict(color="#9C9C9C")
    ),
    paper_bgcolor = "#F5F5F5",
)

fig.show()

## 7. Youngest Billionaires in 2022

In [206]:
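# Sort by age, youngest first; rows with a missing age are placed last by default.
ages = df.sort_values(by="age")
ages = ages[["personName", "age", "finalWorth", "country", "selfMade"]].reset_index(drop=True)
# finalWorth is stored in millions of USD; divide by 1000 to get billions.
ages["finalWorth"] = ages["finalWorth"]/1000
# Copy the slice so the fill below modifies an independent DataFrame.
youngest_10 = ages.iloc[:10].copy()
# Fill the missing country with "Germany", restricted to the country column so no other column is affected.
youngest_10["country"] = youngest_10["country"].fillna("Germany")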

In [207]:
youngest_10_table = ff.create_table(youngest_10)
youngest_10_table

**Some of the youngest billionaires have inherited wealth**

### Conclusion
In conclusion, this exploratory data analysis (EDA) of the World's Richest People 2022 as listed by Forbes gives a snapshot of how billionaire wealth is distributed around the world. We have seen which countries have the most billionaires and which economic sectors they come from. We have also seen that female representation among billionaires is low, at roughly 12%.
Overall, the project offers insight into the state of global wealth at the very top of the distribution.