In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.offline import init_notebook_mode
plt.style.use('dark_background')
init_notebook_mode(connected=True)
import warnings
warnings.filterwarnings('ignore')

In [2]:
forbes = pd.read_csv("C:/Users/user/Downloads/2022_forbes_billionaires.csv")

In [3]:
forbes.head()

Unnamed: 0.1,Unnamed: 0,rank,name,networth,age,country,source,industry
0,0,1,Elon Musk,$219 B,50,United States,"Tesla, SpaceX",Automotive
1,1,2,Jeff Bezos,$171 B,58,United States,Amazon,Technology
2,2,3,Bernard Arnault & family,$158 B,73,France,LVMH,Fashion & Retail
3,3,4,Bill Gates,$129 B,66,United States,Microsoft,Technology
4,4,5,Warren Buffett,$118 B,91,United States,Berkshire Hathaway,Finance & Investments


In [4]:
forbes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2600 entries, 0 to 2599
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  2600 non-null   int64 
 1   rank        2600 non-null   int64 
 2   name        2600 non-null   object
 3   networth    2600 non-null   object
 4   age         2600 non-null   int64 
 5   country     2600 non-null   object
 6   source      2600 non-null   object
 7   industry    2600 non-null   object
dtypes: int64(3), object(5)
memory usage: 162.6+ KB


Data Cleaning


In [5]:
forbes.isnull().sum()

Unnamed: 0    0
rank          0
name          0
networth      0
age           0
country       0
source        0
industry      0
dtype: int64

The dataset does not have any null values. Now let's check whether it contains any duplicated rows.

In [6]:
forbes.duplicated().sum()

0

There are neither duplicated rows nor null values, so the dataset is already fairly clean. Next, let's drop the irrelevant columns, which is also part of data cleaning.
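One caveat on the duplicate check: `duplicated()` compares every column, and `Unnamed: 0` looks like a running row counter, so two otherwise identical records would still count as distinct. Below is a minimal sketch of a check restricted to the content columns. The three-row frame `toy` and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy frame: rows 1 and 2 differ only in the row counter.
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "name": ["A", "B", "B"],
    "networth": ["$2 B", "$1 B", "$1 B"],
})

# Comparing all columns: the unique counter hides the repeated record.
dupes_all_columns = int(toy.duplicated().sum())

# Comparing only the content columns reveals it.
dupes_content_only = int(toy.duplicated(subset=["name", "networth"]).sum())

print(dupes_all_columns, dupes_content_only)
```

On the real data the same idea would be `forbes.duplicated(subset=[...])` with the six content columns.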

In [7]:
# `columns=` already names the axis, so a separate `axis` argument is redundant
forbes.drop(columns = ['Unnamed: 0', 'rank'], inplace = True)

The irrelevant columns are now dropped, leaving 6 useful columns. The networth column is of object type because each value is a string such as "$219 B", so I'm going to strip the "$" and "B" characters from the values. Then I'll convert the column to float, which is essential for the numeric analysis that follows.

In [8]:
forbes['networth'] = forbes['networth'].str.strip('$').str.strip('B')
forbes['networth'] = forbes['networth'].astype(float)

Exploratory Data Analysis

In [9]:
forbes.describe()

Unnamed: 0,networth,age
count,2600.0,2600.0
mean,4.86075,64.271923
std,10.659671,13.220607
min,1.0,19.0
25%,1.5,55.0
50%,2.4,64.0
75%,4.5,74.0
max,219.0,100.0


Insights

1. The maximum net worth of a billionaire is 219 B dollars and the minimum is 1 B. The average net worth is about 4.9 B dollars, well above the median of 2.4 B.
2. The oldest billionaire is 100 years old and the youngest is 19. The average age is about 64 years.



In [10]:
#  Who are the top 10 billionaires and what is their net worth?

In [11]:
fig = px.bar(forbes.sort_values(by = 'networth', ascending = False)[:10], 
             x = 'name', y = 'networth', template = 'plotly_dark', 
             color = 'networth', opacity = 0.8, title='<b>Top 10 billionaires and their net worth')
fig.show()



Insights

    Elon Musk has the highest net worth, 219 B dollars.
    The lowest net worth in the top 10 is about 91 B dollars: Mukesh Ambani at 90.7 B, just below Steve Ballmer at 91.4 B.
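The `sort_values(...)[:10]` slice used for these top-10 charts can also be written with `DataFrame.nlargest`, which states the intent directly. A small sketch on the five rows shown by `forbes.head()`, deliberately reversed first to show that the input order does not matter (the variable names are illustrative):

```python
import pandas as pd

# The five rows from forbes.head(), with networth already converted to float.
head_rows = pd.DataFrame({
    "name": ["Elon Musk", "Jeff Bezos", "Bernard Arnault & family",
             "Bill Gates", "Warren Buffett"],
    "networth": [219.0, 171.0, 158.0, 129.0, 118.0],
})

# Reverse the rows so the frame is no longer sorted by net worth.
unsorted_rows = head_rows.iloc[::-1]

# nlargest returns the n rows with the largest values, in descending order.
top3 = unsorted_rows.nlargest(3, "networth")
print(top3["name"].tolist())
```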



In [12]:
#    What are the countries of top 10 billionaires?

In [13]:
px.scatter(forbes.sort_values(by='networth', ascending = False)[:10],
           x = 'name',y='networth', template = 'plotly_dark',size = 'networth',
           color = 'country',opacity = 0.85,title = '<b>Countries of top 10 billionaires')

Insights

   1. Most of the top 10 billionaires are from the United States, with massive net worths.
   2. The richest billionaire, Elon Musk, is also from the United States, with a net worth of 219 B dollars.
   3. Only two of the top 10 are not from the United States: Bernard Arnault & family from France with a net worth of 158 B dollars, and Mukesh Ambani from India with a net worth of 90.7 B dollars.

In [14]:
# What are the industries of the top 10 billionaires?

In [15]:
px.scatter(forbes.sort_values(by='networth', ascending = False)[:10],
           y='networth',x='name',template='plotly_dark',size='networth',
           color='industry',opacity=0.85,title='<b>Industries of top 10 billionaires')



Insights

    1. Most of the top 10 billionaires are in the Technology industry, with massive net worths.
    2. The richest billionaire, Elon Musk, has his 219 B dollar net worth in the Automotive industry.
    3. Warren Buffett has his 118 B dollar net worth in the Finance & Investments industry.
    4. Bernard Arnault & family are in the Fashion & Retail industry, with a net worth of 158 B dollars.
    5. Mukesh Ambani's industry category is Diversified, with a net worth of 90.7 B dollars.



In [16]:
# What are the sources of top 10 billionaires?

In [17]:
px.scatter(forbes.sort_values(by = 'networth', ascending = False)[:10],
           x = 'name', y = 'networth', template = 'plotly_dark', size = 'networth', 
           color = 'source', opacity = 0.85, title = '<b>Source of top 10 billionaires')



Insights

    1. The richest billionaire, Elon Musk, owes his 219 B dollar net worth to Tesla and SpaceX.
    2. The second richest billionaire, Jeff Bezos, owes his 171 B dollar net worth to Amazon.
    3. Steve Ballmer (91.4 B dollars) and Bill Gates (129 B dollars) share Microsoft as their source.
    4. Sergey Brin (107 B dollars) and Larry Page (111 B dollars) share Google as their source.



In [18]:

# What is the distribution of age in billionaires?

In [19]:
fig = px.histogram(forbes, x = 'age', template = 'plotly_dark', 
                   color = 'age', opacity = 0.9, title = '<b>Distribution of age in billionaires')
fig.show()



Insights

    1. Billionaire ages cluster around 64, which is the median and also roughly the mean (64.3).
    2. The distribution is roughly symmetric: the quartiles (55 and 74) lie about equally far from the median, so billionaires younger and older than 64 are about equal in number.
    3. The youngest billionaire is 19 years old.
    4. The oldest billionaire is 100 years old.
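The symmetry of the age distribution can be checked numerically from the quartiles that `forbes.describe()` reported (25% = 55, 50% = 64, 75% = 74). The sketch below uses Bowley's quartile skewness, which is 0 for a perfectly symmetric distribution and lies between -1 and 1; choosing this particular measure is an assumption of this example, not something the notebook computes.

```python
# Quartiles of `age` as printed by forbes.describe()
q1, q2, q3 = 55.0, 64.0, 74.0

# Bowley skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)
bowley = (q3 + q1 - 2 * q2) / (q3 - q1)

# A value close to 0 indicates an almost symmetric distribution.
print(round(bowley, 3))
```

A value this close to zero indicates only a very slight right skew in the ages.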



In [20]:
# What are the ages of the top 10 billionaires?

In [21]:
fig = px.bar(forbes.sort_values(by = 'networth', ascending = False)[:10],
             x = 'name', y = 'age', template = 'plotly_dark', 
             color = 'age', opacity = 0.8, title = '<b>Top 10 billionaires and their age')
fig.show()



Insights

    1. The richest billionaire, Elon Musk, is 50 years old.
    2. Steve Ballmer, at the lower end of the top 10, is 66.
    3. Warren Buffett is the oldest of the top 10 billionaires at 91.
    4. Sergey Brin is the youngest of the top 10 billionaires at 48.



In [None]:
# Who are the top 10 youngest billionaires?

In [23]:
fig = px.bar(forbes.sort_values('age',ascending = True)[:10], 
             x = 'name', y = 'networth', template = 'plotly_dark', 
             color = 'age', opacity = 0.8, title = '<b>Top 10 youngest billionaires')
fig.show()



Insights

    1. The youngest of the 10 youngest billionaires is Kevin David Lehmann, aged 19, with a net worth of 2.4 B dollars.
    2. The oldest of the 10 youngest billionaires is Gary Wang, aged 28, with a net worth of 5.9 B dollars.
    3. Most of the 10 youngest billionaires are 25 years old and have a net worth of around 1.5 B dollars.
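Because several billionaires share the same age, a fixed-size slice such as `sort_values('age')[:10]` cuts ties at the boundary arbitrarily. `DataFrame.nsmallest` has a `keep="all"` option that returns every row tied with the last selected value. A minimal sketch on a made-up frame (the names and ages in `toy` are hypothetical):

```python
import pandas as pd

# Hypothetical data: B and C are tied at age 25.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [19, 25, 25, 30],
})

# Default behaviour: exactly two rows, the tie is broken by position.
first_two = toy.nsmallest(2, "age")

# keep="all": rows tied with the last selected value are all returned.
with_ties = toy.nsmallest(2, "age", keep="all")

print(first_two["name"].tolist(), with_ties["name"].tolist())
```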



In [None]:
# Who are top 10 oldest billionaires?

In [24]:
fig = px.bar(forbes.sort_values('age',ascending = False)[:10], 
             x = 'name', y = 'networth', template = 'plotly_dark',
             color = 'age', opacity = 0.8, title = '<b>Top 10 oldest billionaires')
fig.show()



Insights

    1. The oldest of the 10 oldest billionaires is George Joseph, aged 100, with a net worth of 1.8 B dollars.
    2. The richest of the 10 oldest billionaires is Robert Kuok, aged 98, with a net worth of 11.7 B dollars.
    3. The 10th of the 10 oldest billionaires is Nobutoshi Shimamura, aged 96, with a net worth of 1.3 B dollars.
    4. Most of the 10 oldest billionaires are around 96 years old and have a net worth of around 2 B dollars.



In [None]:
#  What is the total networth of billionaires in top 10 countries?

In [25]:
# Sum only the networth column per country, then keep the 10 largest totals
country_networth = (forbes.groupby('country')['networth'].sum()
                    .sort_values(ascending = False).head(10).reset_index())


In [26]:
fig=px.scatter(country_networth, x ='country', y = 'networth', template = 'plotly_dark', 
               color = 'country', size = 'networth' , opacity = 0.85,
               title="<b>Total networth of billionaires in top 10 countries")
fig.show()



Insights

    1. The United States has the highest total net worth, around 4685.1 B dollars.
    2. The United Kingdom has the lowest total net worth of the top 10 countries, around 199.1 B dollars.
    3. China is second among the top 10 countries, with a total net worth of 1938.45 B dollars.
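The per-country totals come from a group-and-sum over the `networth` column. A minimal sketch of that aggregation on the five rows shown by `forbes.head()` (so the totals below cover only those five people, not the full dataset):

```python
import pandas as pd

# Country and numeric net worth for the five rows of forbes.head()
head_rows = pd.DataFrame({
    "country": ["United States", "United States", "France",
                "United States", "United States"],
    "networth": [219.0, 171.0, 158.0, 129.0, 118.0],
})

# Select the numeric column before summing, then rank the totals.
totals = (head_rows.groupby("country")["networth"].sum()
          .sort_values(ascending=False)
          .reset_index())
print(totals)
```

Selecting `['networth']` before `.sum()` keeps the string columns out of the aggregation.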



In [None]:
#   Which industries do billionaires in the top 5 countries work in?

# Counting, for each of the top 5 countries, how many billionaires work in each industry

In [27]:
df_us = forbes[forbes['country']=='United States']['industry'].value_counts().rename_axis('industry').reset_index(name='United States')
df_cn = forbes[forbes['country']=='China']['industry'].value_counts().rename_axis('industry').reset_index(name='China')
df_in = forbes[forbes['country']=='India']['industry'].value_counts().rename_axis('industry').reset_index(name='India')
df_ge = forbes[forbes['country']=='Germany']['industry'].value_counts().rename_axis('industry').reset_index(name='Germany')
df_fr = forbes[forbes['country']=='France']['industry'].value_counts().rename_axis('industry').reset_index(name='France')

In [28]:
# Merging the per-country counts into one dataframe for further analysis.
# Note: the left joins keep only the industries that appear for the United States.

df_us_cn = pd.merge(df_us, df_cn, on = 'industry', how = 'left')
df_in_ge = pd.merge(df_in, df_ge, on = 'industry', how = 'left')
df_1 = pd.merge(df_us_cn, df_in_ge, on = 'industry', how = 'left')
df = pd.merge(df_1, df_fr, on = 'industry', how = 'left')

In [29]:
df = df.replace(np.nan,0)
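The five `value_counts()` frames and the chain of merges can also be expressed as a single `pd.crosstab`, which builds the industry-by-country count table directly and keeps industries from every selected country. This is an alternative sketch, not the approach used in the cells above; it is shown on the five rows from `forbes.head()` with two countries.

```python
import pandas as pd

# Country and industry for the five rows of forbes.head()
head_rows = pd.DataFrame({
    "country": ["United States", "United States", "France",
                "United States", "United States"],
    "industry": ["Automotive", "Technology", "Fashion & Retail",
                 "Technology", "Finance & Investments"],
})

countries = ["United States", "France"]
subset = head_rows[head_rows["country"].isin(countries)]

# Rows: industries, columns: countries, cells: number of billionaires.
counts = pd.crosstab(subset["industry"], subset["country"])
# Fix the column order; a country with no rows would get a column of zeros.
counts = counts.reindex(columns=countries, fill_value=0)
print(counts)
```

Missing combinations come out as 0 automatically, so no separate NaN replacement is needed.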

In [30]:
fig = px.scatter(df, x = 'industry', y = 'United States', template = 'plotly_dark', 
                 color = 'industry', size = 'United States', opacity = 0.85, 
                 title = "<b>Number of United States billionaires by industry", height = 560)
fig.show()



Insights

    1. The values plotted are counts of billionaires (from value_counts), not dollar amounts. The United States has the most billionaires in Finance & Investments: 193.
    2. The second biggest industry in the United States is Technology, with 137 billionaires.
    3. Additionally, the United States has its fewest billionaires, 2, in Metals and Mining.



In [31]:
fig = px.scatter(df, x = 'industry', y = 'China', template = 'plotly_dark', 
                 color = 'industry', size = 'China', opacity = 0.85, 
                 title = "<b>Number of China billionaires by industry", height = 560)
fig.show()



Insights

   1. China has the most billionaires in Manufacturing: 142.
   2. The second biggest industry in China is Technology, with 81 billionaires.
   3. Additionally, China has its fewest billionaires, 4, in Construction and Engineering.



In [32]:
fig = px.scatter(df, x = 'industry', y = 'India', template = 'plotly_dark',
                 color = 'industry', size = 'India', opacity = 0.85, 
                 title = "<b>Number of India billionaires by industry", height = 560)
fig.show()



Insights

   1. India has the most billionaires in Manufacturing: 31.
   2. The second biggest industry in India is Healthcare, with 30 billionaires.
   3. Additionally, India has just 1 billionaire in each of several industries, such as Telecom.

In [33]:
fig = px.scatter(df, x = 'industry', y = 'Germany', template = 'plotly_dark',
                 color = 'industry', size = 'Germany', opacity = 0.85, 
                 title = "<b>Number of Germany billionaires by industry", height = 560)
fig.show()



Insights

   1. Germany has the most billionaires in Fashion & Retail: 29.
   2. The second biggest industry in Germany is Manufacturing, with 23 billionaires.
   3. Additionally, Germany has just 1 billionaire in each of several industries, such as Energy.



In [34]:
fig = px.scatter(df, x = 'industry', y = 'France', template = 'plotly_dark',
                 color = 'industry', size = 'France', opacity = 0.85, 
                 title = "<b>Number of France billionaires by industry", height = 560)
fig.show()



Insights

   1. France has the most billionaires in Fashion & Retail: 8.
   2. The second biggest industry in France is Healthcare, with 7 billionaires.
   3. Additionally, France has just 1 billionaire in each of several industries, such as Automotive.
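Since `value_counts()` yields the number of billionaires per industry, a per-industry net worth total for a country needs a separate aggregation. A minimal sketch that groups by country and industry and sums `networth`, again on the five rows from `forbes.head()` (so the figures cover only those five people):

```python
import pandas as pd

# Country, industry and numeric net worth for the five rows of forbes.head()
head_rows = pd.DataFrame({
    "country": ["United States", "United States", "France",
                "United States", "United States"],
    "industry": ["Automotive", "Technology", "Fashion & Retail",
                 "Technology", "Finance & Investments"],
    "networth": [219.0, 171.0, 158.0, 129.0, 118.0],
})

# Total net worth (B dollars) for every country/industry pair.
totals = head_rows.groupby(["country", "industry"])["networth"].sum()

# Industries of one country, largest total first.
us_by_industry = totals.loc["United States"].sort_values(ascending=False)
print(us_by_industry)
```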



In [None]:

# What are the sources of the top 10 youngest billionaires?

In [35]:
fig = px.bar(forbes.sort_values('age',ascending = True)[:10], x = 'name', y = 'networth', 
             template = 'plotly_dark', color = 'source', opacity = 0.8, 
             title = '<b>Top 10 youngest billionaires and their sources')
fig.show()



Insights

    1. The youngest billionaire, Kevin David Lehmann, has drugstores as the source of his net worth.
    2. The richest of the 10 youngest billionaires is Gary Wang, whose net worth comes from a cryptocurrency exchange.
    3. The second richest of the 10 youngest billionaires is Gustav Magnar Witzoe, whose net worth comes from fish farming.



In [None]:
#   What are the sources of the top 10 oldest billionaires?

In [36]:
fig = px.bar(forbes.sort_values('age',ascending = False)[:10], x = 'name', y = 'networth', 
             template = 'plotly_dark', color = 'source', opacity = 0.8, 
             title = '<b>Top 10 oldest billionaires and their sources')
fig.show()



Insights

    1. The oldest billionaire, George Joseph, has insurance as the source of his net worth.
    2. The richest of the 10 oldest billionaires is Robert Kuok, whose net worth comes from palm oil, shipping and property.
    3. The second richest of the 10 oldest billionaires is Masatoshi Ito, whose net worth comes from retail.



In [None]:
#  What is the total networth of billionaires in top 10 sources?

In [37]:
# Sum only the networth column per source, then keep the 10 largest totals
source_networth = (forbes.groupby('source')['networth'].sum()
                   .sort_values(ascending = False).head(10).reset_index())

In [38]:
fig=px.scatter(source_networth, x ='source', y = 'networth', template = 'plotly_dark',
               color = 'source', size = 'networth' , opacity = 0.85,
               title="<b>Total networth of billionaires in top 10 sources")
fig.show()



Insights

    1. Real estate is the source with the highest total net worth, around 573.8 B dollars.
    2. Diversified is the second source, with a total net worth of 382 B dollars.
    3. Tesla, SpaceX has the lowest total net worth of the top 10 sources, 219 B dollars.



In [None]:
#   What industries do the top 10 youngest billionaires belong to?

In [39]:
fig = px.bar(forbes.sort_values('age',ascending = True)[:10], x = 'name', y = 'networth',
             template = 'plotly_dark', color = 'industry', opacity = 0.8, 
             title = '<b>Top 10 youngest billionaires and their industries')
fig.show()



Insights

    1. Most of the 10 youngest billionaires work in the Finance & Investments industry.
    2. The youngest billionaire, Kevin David Lehmann, has his net worth in the Fashion & Retail industry.
    3. The second richest of the 10 youngest billionaires, Gustav Magnar Witzoe, has his net worth in the Food and Beverage industry.

In [None]:
# What industries do the top 10 oldest billionaires belong to?

In [40]:
fig = px.bar(forbes.sort_values('age',ascending = False)[:10], x = 'name', y = 'networth', 
             template = 'plotly_dark', color = 'industry', opacity = 0.8, 
             title = '<b>Top 10 oldest billionaires and their industries')
fig.show()



Insights

    1. Most of the 10 oldest billionaires work in the Finance & Investments industry.
    2. The oldest billionaire, George Joseph, has his net worth in the Finance & Investments industry.
    3. The richest of the 10 oldest billionaires, Robert Kuok, has his net worth in the Diversified industry.



In [None]:
# What is the total networth of billionaires in top 10 industries?

In [41]:
# Sum only the networth column per industry, then keep the 10 largest totals
industry_networth = (forbes.groupby('industry')['networth'].sum()
                     .sort_values(ascending = False).head(10).reset_index())

In [42]:
fig=px.scatter(industry_networth, x ='industry', y = 'networth', template = 'plotly_dark',
               color = 'industry', size = 'networth' , opacity = 0.85,
               title="<b>Total networth of billionaires in top 10 industries")
fig.show()


Insights

    1. Technology has the highest total net worth of any industry, around 2168.4 B dollars.
    2. Finance & Investments is the second industry, with a total net worth of 1734.3 B dollars.
    3. Media and Entertainment has the lowest total net worth of the top 10 industries, 496.3 B dollars.



Conclusion:





Richest

    The richest billionaire is Elon Musk from the United States, aged 50, with a net worth of 219 B dollars. The source of his net worth is Tesla and SpaceX, and his industry is Automotive.

Youngest

    The youngest billionaire is Kevin David Lehmann from Germany, aged 19, with a net worth of 2.4 B dollars. The source of his net worth is drugstores, and his industry is Fashion & Retail.

Oldest

    The oldest billionaire is George Joseph from the United States, aged 100, with a net worth of 1.8 B dollars. The source of his net worth is insurance, and his industry is Finance & Investments.

Country

    Most of the top 10 billionaires are from the United States, and the total net worth of all billionaires in this country is 4685.1 B dollars. It has the most billionaires in the Finance & Investments industry and, among the five countries compared, it is the only one with billionaires in Gambling and Casinos.

Industry

    Technology is the industry with the highest total net worth, around 2168.4 B dollars. Furthermore, most of the top 10 billionaires have their net worth in this industry.

Source

    Real estate is the source with the highest total net worth, around 573.8 B dollars. However, none of the top 10 billionaires have their net worth in this source.
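To put the headline totals in proportion, the combined net worth of all billionaires can be recovered from the `describe()` output (count times mean) and used to express the United States and Technology totals as shares. This is a back-of-the-envelope sketch using only figures already reported in this notebook; the variable names are illustrative.

```python
# Figures reported earlier in the notebook (net worth in B dollars)
n_billionaires = 2600        # count from forbes.describe()
mean_networth = 4.86075      # mean from forbes.describe()
us_total = 4685.1            # total net worth, United States
technology_total = 2168.4    # total net worth, Technology industry

# Combined net worth of everyone in the dataset
overall_total = n_billionaires * mean_networth

# Shares of the overall total
us_share = us_total / overall_total
technology_share = technology_total / overall_total

print(f"overall: {overall_total:.2f} B")
print(f"United States share: {us_share:.1%}")
print(f"Technology share: {technology_share:.1%}")
```

By this estimate the United States accounts for a bit more than a third of the combined net worth, and Technology for roughly a sixth.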

