# **Forbes Billionaires List Notebook**

The Forbes World’s Billionaires list is a snapshot of wealth using stock prices and exchange rates from March 18, 2020. Some people become richer or poorer within days of publication. In this notebook we will analyze the list from several angles.

![](https://winapay.com/img/games/may-2019/31754b625d52c652bf35eff2219380c4e3bd8d561b2-richest-billionaires-list-2019.jpg)

# Easy data preprocessing

**Importing libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import folium

**Reading data**

In [None]:
filepath = '../input/forbes-billionaires-of-2021-20/forbes_billionaires.csv'
data = pd.read_csv(filepath)

**Creating a copy of the dataset**

In [None]:
df = data.copy()

**Let's take a look at the dataset**

In [None]:
df.head()

A short legend of the features:
* **Name** : the first name and surname of the billionaire;
* **NetWorth** : the net worth in billions of USD;
* **Country** : the country in which they were born;
* **Source** : the source of their wealth;
* **Rank** : their position in the Forbes list;
* **Age** : their age;
* **Residence** : the city in which they reside;
* **Citizenship** : their citizenship;
* **Status** : their marital status;
* **Children** : the number of children they have;
* **Education** : the name of the school/university they attended;
* **Self_made** : whether they are self-made or not.

**Replace the True/False values with 'Self-made' / 'No self-made' for better readability**

In [None]:
df['Self_made'] = df['Self_made'].replace([True, False],['Self-made', 'No self-made'])

**Have a look at the shape of the dataframe**

In [None]:
df.shape

**And let's check how many missing values there are**

In [None]:
df.isnull().sum()
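
Raw counts are easier to judge next to the share of rows they represent. Here is a minimal sketch of that idea, using a tiny made-up frame (`toy`, with invented values) in place of the notebook's `df`; in the notebook you would apply the same expression to `df`.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values, standing in for the notebook's df
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [50.0, np.nan, 61.0, 72.0],
    "Children": [2.0, np.nan, np.nan, 3.0],
})

# isnull() yields booleans; their column-wise mean is the fraction missing
missing_pct = toy.isnull().mean().mul(100).round(1)
print(missing_pct)
```

Columns with a high percentage (probably `Children` or `Education` in the real data, but that is only a guess) deserve more caution when we read the plots later on.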

# **Top 10 Billionaires by NetWorth**

Who are the richest people in the world according to the 2021 Forbes list? Let's find out.

In [None]:
top_10_list = df[:10]
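
The slice above works only if the file is already sorted by net worth, which I am assuming here. A small sketch of an alternative that does not depend on row order is `DataFrame.nlargest`; the frame `toy` and its values below are invented for illustration.

```python
import pandas as pd

# Toy frame with invented values, deliberately NOT sorted by NetWorth
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "NetWorth": [3.0, 10.0, 1.5, 7.2],
})

# nlargest sorts by the given column and keeps the first n rows
top_2 = toy.nlargest(2, "NetWorth")
print(top_2)
```

In the notebook the equivalent call would be `df.nlargest(10, "NetWorth")`.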

In [None]:
fig = px.bar(top_10_list, x="Name", y="NetWorth", color="Source",
             color_discrete_sequence=px.colors.qualitative.Vivid)

# Nested title/tickfont dicts are used instead of the older titlefont shorthand
fig.update_layout(
    title_text='Top 10 billionaires on the Forbes list',
    yaxis=dict(
        title=dict(text='USD (billions)', font=dict(size=16)),
        tickfont=dict(size=14),
    ),
)

# **Top 10 Countries**

Which are the 10 countries with the most billionaires? Let's find out; I think you may be surprised.

In [None]:
country_list = df.Country
count_list = {}  # empty dict mapping country -> number of billionaires
for name in country_list:  # for each country entry in the column
    if name in count_list:  # this country has already been seen
        count_list[name] += 1  # increase its count
    else:
        count_list[name] = 1  # first occurrence: create its entry
# columns must be a list (a set is unordered and rejected by recent pandas)
count_df = pd.DataFrame(list(count_list.values()), index=list(count_list.keys()), columns=['Billionairs in the Countries'])
count_df.sort_values(by='Billionairs in the Countries', ascending=False, inplace=True)  # sort in descending order
top_10_count = count_df[0:10]  # pick the 10 countries with the most billionaires
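
The counting loop above can also be written with pandas' built-in `value_counts`, which counts and sorts in descending order in one step. This is just a sketch on a made-up column (`toy`, with placeholder country codes), not a replacement for the cell above; note that `value_counts` skips missing values by default.

```python
import pandas as pd

# Toy column with placeholder country codes (invented data)
toy = pd.DataFrame({"Country": ["X", "Y", "X", "Z", "X", "Y"]})

# value_counts returns counts per country, already sorted from most to least
counts = toy["Country"].value_counts()

# Same column name as in the notebook, so the plotting cells would still work
count_df_alt = counts.to_frame("Billionairs in the Countries")
top_2 = count_df_alt.head(2)
print(top_2)
```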

In [None]:
colors = ['lightslategray',] * 10
colors[0] = 'crimson'
countries = top_10_count.index
counts = top_10_count["Billionairs in the Countries"]
fig = go.Figure(data=[go.Bar(
    x= countries,
    y= counts,
    marker_color = colors # marker color can be a single color value or an iterable
)])
fig.update_layout(title_text='Which countries have the most billionaires?')

As we might have expected, the countries with the most billionaires are the **United States** and **China**, followed by **India** (whose count I think will grow very quickly in the next few years), **Germany** and **Russia**.

# **Age groups**

I think that mentalities and behaviors can differ quite a bit from one decade of life to the next, so I decided to divide the billionaires into age groups. Which group do you think is the most represented on the list? And do you know who the youngest and the oldest billionaires are?

In [None]:
# Drop rows without an age; .copy() avoids pandas' SettingWithCopyWarning below
df = df.dropna(subset=['Age']).copy()
def Age(age):
    # Map an age to its decade label, e.g. 57 -> '50 years old'
    if age >= 90 : return '90 years old'
    if age >= 80 : return '80 years old'
    if age >= 70 : return '70 years old'
    if age >= 60 : return '60 years old'
    if age >= 50 : return '50 years old'
    if age >= 40 : return '40 years old'
    if age >= 30 : return '30 years old'
    if age >= 20 : return '20 years old'
    else: return 'Teenager'
df['age_group'] = df['Age'].apply(Age)
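
The same decade binning can be sketched with `pd.cut`, which assigns each value to an interval. The bin edges below are my own choice to mirror the `Age` function (the upper edge of 200 is just an arbitrary cap), and the ages are invented.

```python
import pandas as pd

# Invented ages, chosen to hit the bin boundaries
ages = pd.Series([18, 35, 50, 59, 60, 99])

# Left-closed bins: [0, 20), [20, 30), ..., [90, 200)
bins = [0, 20, 30, 40, 50, 60, 70, 80, 90, 200]
labels = ['Teenager', '20 years old', '30 years old', '40 years old',
          '50 years old', '60 years old', '70 years old', '80 years old',
          '90 years old']

# right=False makes each bin include its lower edge, like the >= checks above
groups = pd.cut(ages, bins=bins, labels=labels, right=False)
print(groups.tolist())
```

A side benefit is that the result is an ordered categorical, so the groups can be plotted in age order instead of by frequency.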


In [None]:
youngest_billionaire_age = df['Age'].min() # the youngest billionaire is 18 years old
youngest_billionaire = df[df['Age'] == youngest_billionaire_age]
youngest_billionaire

The youngest billionaire is **Kevin David Lehmann**, a billionaire **since he was 14**. Now eighteen, he received from his father a stake in the "dm" drugstore chain, the largest in Europe.

In [None]:
oldest_billionaire_age = df['Age'].max() # the oldest billionaire is 99 years old
oldest_billionaire = df[df['Age'] == oldest_billionaire_age]
oldest_billionaire

The oldest billionaire is **George Joseph** (born September 11, **1921**), the founder of Mercury Insurance Group of Los Angeles.
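
The youngest and oldest lookups above can also be done without comparing against a value, using `idxmin` and `idxmax`. A minimal sketch on invented data follows; keep in mind that these return only the first match, so if several people share the minimum or maximum age, the boolean filter used in the cells above is the safer choice.

```python
import pandas as pd

# Toy frame with invented names and ages
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [45.0, 18.0, 99.0],
})

# idxmin/idxmax return the index label of the smallest/largest value
youngest = toy.loc[toy["Age"].idxmin()]
oldest = toy.loc[toy["Age"].idxmax()]
print(youngest["Name"], oldest["Name"])
```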

In [None]:
plt.figure(figsize=(13,5))
age_categ_count = df['age_group'].value_counts()
ax = sns.countplot(x="age_group", 
                   data = df,
                   order = age_categ_count.index,
                   linewidth=2)
for rect in ax.patches:
    ax.text(rect.get_x() + rect.get_width() / 2, rect.get_height() + 0.75, int(rect.get_height()), horizontalalignment='center', fontsize=13)  # int() shows 12 instead of 12.0
ax.set_title('Which age group is most represented on the Forbes list?', fontsize=20, fontweight='bold')
ax.set_xlabel('Age group', fontsize = 15)
ax.set_ylabel('N° of people in this category', fontsize = 15)

As we can see, **the age groups between 50 and 80 are the most represented among the billionaires**. My guess is that experience is a factor: as you grow older you make mistakes and learn from them, so if you have made the right choices you reach that age with a lot of experience and money.

# **Distribution of billionaires across countries around the world**

Earlier we looked at the 10 countries with the most billionaires; now, with the help of a map, we can get a broader view of how the billionaires' countries of origin are spread geographically.

In [None]:
fig1 = px.choropleth(count_df, locations= count_df.index, locationmode='country names', color="Billionairs in the Countries",
                    color_continuous_scale=px.colors.sequential.Darkmint)

fig1.update_layout(title_text="Map of the billionaires in the world", title_font_size=24,
                  height=800, width=1000, xaxis_title="Country",
                  title_x=0.45)

fig1.show()

As we can see, there is a **fairly varied distribution** on the map. Unfortunately, however, **Africa and the Middle East are sparsely represented**. We hope that in the next few years the situation will change and there will be more billionaires in those areas as well.

# **Marital Status, a bit of gossip**

What is the most common marital status among billionaires? Are more of them married or single? Let's have a look.

In [None]:
plt.figure(figsize=(13,5))
mar_stat_count = df['Status'].value_counts()
ax = sns.countplot(x="Status", 
                   data = df,
                   order = mar_stat_count.index,
                   linewidth=2)
for rect in ax.patches:
    ax.text(rect.get_x() + rect.get_width() / 2, rect.get_height() + 0.75, int(rect.get_height()), horizontalalignment='center', fontsize=13)  # int() shows 12 instead of 12.0
ax.set_title('What is the most common marital situation among billionaires?',fontsize = 15, fontweight='bold' )
ax.set_xlabel('Marital status', fontsize = 15)
ax.set_ylabel('N° of people', fontsize = 15)

As we can see, love triumphs even among billionaires: **a large majority of them are married**, unfortunately followed by many who are divorced.

# **How many children do billionaires have on average?**

Do you think billionaires prefer to have children or to dedicate themselves just to their careers? And if they do have children, how many do you think a billionaire has on average?

In [None]:
plt.figure(figsize=(20,5))
children_count = df['Children'].value_counts()  # number of billionaires per number of children
ax = sns.countplot(x="Children", 
                   data = df,
                   order = children_count.index,
                   linewidth=2)
for rect in ax.patches:
    ax.text(rect.get_x() + rect.get_width() / 2, rect.get_height() + 0.75, int(rect.get_height()), horizontalalignment='center', fontsize=13)  # int() shows 12 instead of 12.0
ax.set_title('How many children do billionaires have?',fontsize = 30, fontweight='bold' )
ax.set_xlabel('N° of children', fontsize = 15)
ax.set_ylabel('N° of people', fontsize = 15)

As we can see, most of them have **2, 3 or 4 children**, like many ordinary families. However, there are also outliers, such as one person with 23 children. Let's find out who it is.

In [None]:
record_father_child = df['Children'].max() # the billionaire with the most children has 23
record_father = df[df['Children'] == record_father_child]
record_father

It certainly seems strange to hear that a man can have 23 children. In fact, **Mr. Roman Avdeev adopted 19** of them. He adopted the first two children in 2002, when he realized that his help to orphanages was ineffective.
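
The section title asks about the average, so here is a small sketch of how it could be computed. The numbers below are invented (they only borrow the 23 as an example of an outlier); on the real data you would call the same methods on `df['Children']`. Both `mean` and `median` skip missing values by default.

```python
import numpy as np
import pandas as pd

# Invented values, including one missing entry and one large outlier
toy = pd.DataFrame({"Children": [2.0, 3.0, np.nan, 4.0, 23.0]})

mean_children = toy["Children"].mean()      # (2 + 3 + 4 + 23) / 4
median_children = toy["Children"].median()  # middle of 2, 3, 4, 23
print(mean_children, median_children)
```

With an outlier like 23 the mean is pulled upwards, so the median is probably the better summary of a "typical" billionaire family.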

# **Are there more self-made billionaires or not?**

In today's world, is it still possible to emerge as a self-made person, or do you have to make your way inside or alongside an existing company to become rich and not get swallowed by the market? Let's find out.

In [None]:
plt.figure(figsize=(8,6))
self_made_count = df['Self_made'].value_counts()  # number of billionaires per category
ax = sns.countplot(x="Self_made", 
                   data = df,
                   order = self_made_count.index,
                   linewidth=2)
for rect in ax.patches:
    ax.text(rect.get_x() + rect.get_width() / 2, rect.get_height() + 0.75, int(rect.get_height()), horizontalalignment='center', fontsize=13)  # int() shows 12 instead of 12.0
ax.set_title('Are there more self-made billionaires or not?', fontsize=15, fontweight='bold')
ax.set_ylabel('N° of people', fontsize = 10)

To my surprise, we can see that **there are more self-made billionaires than non-self-made ones**. So this means that you too, you who are reading this notebook, could become one of them if you put your best effort into it.
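
To put a number on this comparison, the counts can be turned into percentages with `value_counts(normalize=True)`. This is a sketch on an invented column with the same two labels used in this notebook; the resulting shares say nothing about the real dataset.

```python
import pandas as pd

# Invented column using the notebook's two labels
toy = pd.DataFrame({
    "Self_made": ["Self-made", "Self-made", "No self-made", "Self-made"],
})

# normalize=True returns fractions instead of counts; scale them to percent
shares = toy["Self_made"].value_counts(normalize=True).mul(100).round(1)
print(shares)
```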

**Thank you so much for looking at this notebook. I hope you enjoyed it, and if so I invite you to leave an upvote. If you have found any errors or have suggestions for improving the notebook, please write them in the comments. Thank you very much again and good Kaggling!**