## Billionaire Dataset Fields

- **rank:** The ranking of the billionaire in terms of wealth.
- **finalWorth:** The final net worth of the billionaire in millions of U.S. dollars.
- **category:** The category or industry in which the billionaire's business operates.
- **personName:** The full name of the billionaire.
- **age:** The age of the billionaire.
- **country:** The country in which the billionaire resides.
- **city:** The city in which the billionaire resides.
- **source:** The source of the billionaire's wealth.
- **industries:** The industries associated with the billionaire's business interests.
- **countryOfCitizenship:** The country of citizenship of the billionaire.
- **organization:** The name of the organization or company associated with the billionaire.
- **selfMade:** Indicates whether the billionaire is self-made (True/False).
- **status:** "D" represents self-made billionaires (founders/entrepreneurs) and "U" indicates inherited or unearned wealth. The data also contains other codes ("E", "N", "R", "Split Family Fortune").
- **gender:** The gender of the billionaire.
- **birthDate:** The birthdate of the billionaire.
- **lastName:** The last name of the billionaire.
- **firstName:** The first name of the billionaire.
- **title:** The title or honorific of the billionaire.
- **date:** The date of data collection.
- **state:** The state in which the billionaire resides.
- **residenceStateRegion:** The region or state of residence of the billionaire.
- **birthYear:** The birth year of the billionaire.
- **birthMonth:** The birth month of the billionaire.
- **birthDay:** The birth day of the billionaire.
- **cpi_country:** Consumer Price Index (CPI) for the billionaire's country.
- **cpi_change_country:** CPI change for the billionaire's country.
- **gdp_country:** Gross Domestic Product (GDP) for the billionaire's country.
- **gross_tertiary_education_enrollment:** Enrollment in tertiary education in the billionaire's country.
- **gross_primary_education_enrollment_country:** Enrollment in primary education in the billionaire's country.
- **life_expectancy_country:** Life expectancy in the billionaire's country.
- **tax_revenue_country_country:** Tax revenue in the billionaire's country.
- **total_tax_rate_country:** Total tax rate in the billionaire's country.
- **population_country:** Population of the billionaire's country.
- **latitude_country:** Latitude coordinate of the billionaire's country.
- **longitude_country:** Longitude coordinate of the billionaire's country.
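
The `head()` output in this notebook lists `finalWorth` as 211000 for the top-ranked entry, which suggests the column is stored in millions of U.S. dollars. A minimal sketch of converting it to billions, assuming that unit and using a small hypothetical frame rather than the real CSV (the new column name is also an assumption):

```python
import pandas as pd

# Hypothetical sample; values are assumed to be in millions of U.S. dollars.
sample = pd.DataFrame({"personName": ["A", "B"], "finalWorth": [211000, 1000]})

# Convert millions to billions; "finalWorth_billions" is an illustrative name.
sample["finalWorth_billions"] = sample["finalWorth"] / 1000
print(sample)
```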


### Loading libraries

In [53]:
import pandas as pd
import numpy as np
import plotly.express as px
from datetime import datetime


## import data

In [54]:
df = pd.read_csv("Billionaires Statistics Dataset.csv")
df.head()

Unnamed: 0,rank,finalWorth,category,personName,age,country,city,source,industries,countryOfCitizenship,...,cpi_change_country,gdp_country,gross_tertiary_education_enrollment,gross_primary_education_enrollment_country,life_expectancy_country,tax_revenue_country_country,total_tax_rate_country,population_country,latitude_country,longitude_country
0,1,211000,Fashion & Retail,Bernard Arnault & family,74.0,France,Paris,LVMH,Fashion & Retail,France,...,1.1,"$2,715,518,274,227",65.6,102.5,82.5,24.2,60.7,67059887.0,46.227638,2.213749
1,2,180000,Automotive,Elon Musk,51.0,United States,Austin,"Tesla, SpaceX",Automotive,United States,...,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,328239523.0,37.09024,-95.712891
2,3,114000,Technology,Jeff Bezos,59.0,United States,Medina,Amazon,Technology,United States,...,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,328239523.0,37.09024,-95.712891
3,4,107000,Technology,Larry Ellison,78.0,United States,Lanai,Oracle,Technology,United States,...,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,328239523.0,37.09024,-95.712891
4,5,106000,Finance & Investments,Warren Buffett,92.0,United States,Omaha,Berkshire Hathaway,Finance & Investments,United States,...,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,328239523.0,37.09024,-95.712891


### Check for duplicates

In [55]:
# check for duplicated rows in the data
df.duplicated().sum()
# no duplicated rows

0
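
`df.duplicated()` only flags rows that match in every column. A hedged sketch of also checking for repeats on a subset of columns, using a hypothetical frame (the choice of `personName` as the key is an assumption; a repeated name does not necessarily mean the same person):

```python
import pandas as pd

# Hypothetical sample: two rows share a name but differ elsewhere.
sample = pd.DataFrame({
    "personName": ["A", "A", "B"],
    "birthDate": ["1970-01-01", "1980-05-05", "1990-02-02"],
})

# Full-row duplicates (what the notebook checks).
full_row_dupes = sample.duplicated().sum()

# Rows whose name appears more than once; keep=False marks every occurrence.
name_dupes = sample[sample.duplicated(subset=["personName"], keep=False)]
print(full_row_dupes, len(name_dupes))
```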

### Check for missing values in the data

In [56]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2640 entries, 0 to 2639
Data columns (total 35 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   rank                                        2640 non-null   int64  
 1   finalWorth                                  2640 non-null   int64  
 2   category                                    2640 non-null   object 
 3   personName                                  2640 non-null   object 
 4   age                                         2575 non-null   float64
 5   country                                     2602 non-null   object 
 6   city                                        2568 non-null   object 
 7   source                                      2640 non-null   object 
 8   industries                                  2640 non-null   object 
 9   countryOfCitizenship                        2640 non-null   object 
 10  organization

#### Number of missing values in each column

In [57]:
df.isnull().sum()

rank                                             0
finalWorth                                       0
category                                         0
personName                                       0
age                                             65
country                                         38
city                                            72
source                                           0
industries                                       0
countryOfCitizenship                             0
organization                                  2315
selfMade                                         0
status                                           0
gender                                           0
birthDate                                       76
lastName                                         0
firstName                                        3
title                                         2301
date                                             0
state                          

## EDA

In [58]:
df.columns

Index(['rank', 'finalWorth', 'category', 'personName', 'age', 'country',
       'city', 'source', 'industries', 'countryOfCitizenship', 'organization',
       'selfMade', 'status', 'gender', 'birthDate', 'lastName', 'firstName',
       'title', 'date', 'state', 'residenceStateRegion', 'birthYear',
       'birthMonth', 'birthDay', 'cpi_country', 'cpi_change_country',
       'gdp_country', 'gross_tertiary_education_enrollment',
       'gross_primary_education_enrollment_country', 'life_expectancy_country',
       'tax_revenue_country_country', 'total_tax_rate_country',
       'population_country', 'latitude_country', 'longitude_country'],
      dtype='object')

#### Check value_counts for the categorical columns
##### to spot unusual values

In [59]:
df.category.value_counts()

category
Finance & Investments         372
Manufacturing                 324
Technology                    314
Fashion & Retail              266
Food & Beverage               212
Healthcare                    201
Real Estate                   193
Diversified                   187
Energy                        100
Media & Entertainment          91
Metals & Mining                74
Automotive                     73
Service                        53
Construction & Engineering     45
Logistics                      40
Sports                         39
Telecom                        31
Gambling & Casinos             25
Name: count, dtype: int64

In [60]:
df.country.value_counts()

country
United States           754
China                   523
India                   157
Germany                 102
United Kingdom           82
                       ... 
Portugal                  1
Georgia                   1
Eswatini (Swaziland)      1
Uzbekistan                1
Armenia                   1
Name: count, Length: 78, dtype: int64

In [61]:
df.city.value_counts()

city
New York       99
Beijing        68
Hong Kong      68
Shanghai       64
London         61
               ..
Küsnacht        1
Brownsville     1
Montpellier     1
Santa Clara     1
Makati          1
Name: count, Length: 741, dtype: int64

In [62]:
df.source.value_counts()

source
Real estate                     151
Investments                      92
Diversified                      91
Pharmaceuticals                  85
Software                         63
                               ... 
Chemical industry                 1
Readymade garments                1
Stock brokerage                   1
Nutrition, wellness products      1
Tyre manufacturing machinery      1
Name: count, Length: 906, dtype: int64

In [63]:
df.industries.value_counts()

industries
Finance & Investments         372
Manufacturing                 324
Technology                    314
Fashion & Retail              266
Food & Beverage               212
Healthcare                    201
Real Estate                   193
Diversified                   187
Energy                        100
Media & Entertainment          91
Metals & Mining                74
Automotive                     73
Service                        53
Construction & Engineering     45
Logistics                      40
Sports                         39
Telecom                        31
Gambling & Casinos             25
Name: count, dtype: int64

In [64]:
# "industries" has the same value counts as "category", so we can drop the "category" column
df.drop("category",inplace=True,axis=1)
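
Matching value counts do not by themselves prove that the two columns agree row by row. A minimal sketch of verifying equality before dropping, shown on a hypothetical frame:

```python
import pandas as pd

# Hypothetical sample standing in for the real data.
sample = pd.DataFrame({
    "category": ["Technology", "Energy"],
    "industries": ["Technology", "Energy"],
})

# Series.equals compares values element by element (and the index).
columns_match = sample["category"].equals(sample["industries"])

# Drop the redundant column only when the check passes.
if columns_match:
    sample = sample.drop(columns="category")
print(columns_match, list(sample.columns))
```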

In [65]:
df.countryOfCitizenship.value_counts()

countryOfCitizenship
United States           735
China                   491
India                   169
Germany                 126
Russia                  104
                       ... 
Belize                    1
Eswatini (Swaziland)      1
Venezuela                 1
Algeria                   1
Panama                    1
Name: count, Length: 77, dtype: int64

In [66]:
df.organization.value_counts()

organization
Meta Platforms                  4
Gap Inc.                        3
Broadcom                        2
DJI Technology Co.              2
Twitter                         2
                               ..
Gulf States Toyota              1
Elliott Management              1
Artemis Real Estate Partners    1
Prada                           1
Television                      1
Name: count, Length: 294, dtype: int64

In [67]:
df.selfMade.value_counts()

selfMade
True     1812
False     828
Name: count, dtype: int64

In [68]:
df.status.value_counts()

status
D                       1223
U                        855
E                        268
N                        150
Split Family Fortune      79
R                         65
Name: count, dtype: int64

In [69]:
df.gender.value_counts()

gender
M    2303
F     337
Name: count, dtype: int64

In [70]:
df.title.value_counts()

title
Investor                                44
Founder                                 34
CEO                                     29
Chairman and CEO                        28
Chairman                                25
                                        ..
Co-founder and CEO                       1
Former Chairman                          1
CEO and Chairman                         1
Co-Founder, Executive Vice President     1
Chairman of the Board                    1
Name: count, Length: 97, dtype: int64

## Data Cleaning

#### Edit data type

In [71]:
# convert ["age", "birthYear", "birthMonth", "birthDay"] from float to int (missing and infinite values become 0)
clms = ["age", "birthYear", "birthMonth", "birthDay"]
df[clms] = df[clms].replace([np.inf, -np.inf, np.nan], 0).astype(int)
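
Replacing missing values with 0 makes them indistinguishable from real zeros until those rows are removed. One possible alternative, sketched here on hypothetical data, is the pandas nullable integer dtype `"Int64"`, which keeps missing entries as `<NA>`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing row.
sample = pd.DataFrame({
    "age": [74.0, np.nan, 51.0],
    "birthYear": [1949.0, np.nan, 1971.0],
})

# "Int64" (capital I) is the nullable integer dtype; NaN becomes <NA>.
converted = sample[["age", "birthYear"]].astype("Int64")
print(converted)
```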

#### Fill NaN values in organization and title

In [72]:
# fill NaN values in the organization and title columns with "unknown"
df["organization"] = df["organization"].fillna("unknown")
df["title"] = df["title"].fillna("unknown")

#### The percentage of missing values in the data

In [73]:
def missing_values(data):
    # percentage and count of missing values per column, sorted by percentage
    df1 = pd.DataFrame()
    df1["missing_values, %"] = data.isnull().sum() * 100 / len(data)
    df1["missing_values, sum"] = data.isnull().sum()
    return df1.sort_values(by="missing_values, %", ascending=False)
missing_values(df)

Unnamed: 0,"missing_values, %","missing_values, sum"
residenceStateRegion,71.704545,1893
state,71.477273,1887
cpi_country,6.969697,184
cpi_change_country,6.969697,184
tax_revenue_country_country,6.931818,183
life_expectancy_country,6.893939,182
gross_tertiary_education_enrollment,6.893939,182
total_tax_rate_country,6.893939,182
gross_primary_education_enrollment_country,6.856061,181
gdp_country,6.212121,164
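
The same summary can be built more compactly, since the mean of a boolean mask is the fraction of `True` values. A sketch on a hypothetical frame (the column names `missing_pct` and `missing_count` are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: "state" is missing in 3 of 4 rows.
sample = pd.DataFrame({
    "state": [np.nan, np.nan, np.nan, "Texas"],
    "rank": [1, 2, 3, 4],
})

# isnull().mean() gives the missing fraction per column.
summary = pd.DataFrame({
    "missing_pct": sample.isnull().mean() * 100,
    "missing_count": sample.isnull().sum(),
}).sort_values("missing_pct", ascending=False)
print(summary)
```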


### Removing unnecessary columns

In [74]:
# drop columns that are mostly missing or not needed for the analysis
df.drop(columns=["lastName","firstName","state","residenceStateRegion","latitude_country","longitude_country"],inplace=True)

In [75]:
df.head()

Unnamed: 0,rank,finalWorth,personName,age,country,city,source,industries,countryOfCitizenship,organization,...,birthDay,cpi_country,cpi_change_country,gdp_country,gross_tertiary_education_enrollment,gross_primary_education_enrollment_country,life_expectancy_country,tax_revenue_country_country,total_tax_rate_country,population_country
0,1,211000,Bernard Arnault & family,74,France,Paris,LVMH,Fashion & Retail,France,LVMH Moët Hennessy Louis Vuitton,...,5,110.05,1.1,"$2,715,518,274,227",65.6,102.5,82.5,24.2,60.7,67059887.0
1,2,180000,Elon Musk,51,United States,Austin,"Tesla, SpaceX",Automotive,United States,Tesla,...,28,117.24,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,328239523.0
2,3,114000,Jeff Bezos,59,United States,Medina,Amazon,Technology,United States,Amazon,...,12,117.24,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,328239523.0
3,4,107000,Larry Ellison,78,United States,Lanai,Oracle,Technology,United States,Oracle,...,17,117.24,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,328239523.0
4,5,106000,Warren Buffett,92,United States,Omaha,Berkshire Hathaway,Finance & Investments,United States,Berkshire Hathaway Inc. (Cl A),...,30,117.24,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,328239523.0


#### Drop rows with NaN values

In [77]:
# drop rows with NaN values so that every column can be analyzed without gaps
df.dropna(inplace=True)
df.isnull().sum()

rank                                          0
finalWorth                                    0
personName                                    0
age                                           0
country                                       0
city                                          0
source                                        0
industries                                    0
countryOfCitizenship                          0
organization                                  0
selfMade                                      0
status                                        0
gender                                        0
birthDate                                     0
title                                         0
date                                          0
birthYear                                     0
birthMonth                                    0
birthDay                                      0
cpi_country                                   0
cpi_change_country                      

#### Remove $ and commas from the gdp_country column and convert it to float

In [78]:
# raw string avoids an invalid-escape warning for "\$"
df['gdp_country'] = df['gdp_country'].replace(r'[\$,]', '', regex=True).astype(float)
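
A quick way to sanity-check this kind of cleaning is to run it on a couple of known strings. The sketch below uses two GDP values that appear in the `head()` output and an equivalent `str.replace` plus `pd.to_numeric` formulation, which is an alternative rather than the notebook's own call:

```python
import pandas as pd

# Two raw GDP strings as they appear in the head() output.
gdp_raw = pd.Series(["$2,715,518,274,227", "$21,427,700,000,000"])

# Inside a character class "$" is literal, so no escape is needed.
gdp = pd.to_numeric(gdp_raw.str.replace(r"[$,]", "", regex=True))
print(gdp)
```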

#### check the info for data

In [79]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2372 entries, 0 to 2639
Data columns (total 28 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   rank                                        2372 non-null   int64  
 1   finalWorth                                  2372 non-null   int64  
 2   personName                                  2372 non-null   object 
 3   age                                         2372 non-null   int32  
 4   country                                     2372 non-null   object 
 5   city                                        2372 non-null   object 
 6   source                                      2372 non-null   object 
 7   industries                                  2372 non-null   object 
 8   countryOfCitizenship                        2372 non-null   object 
 9   organization                                2372 non-null   object 
 10  selfMade         

#### update age for all billionaires

In [80]:
current_year = datetime.now().year
df["age"] = current_year - df["birthYear"]
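
Subtracting birth years ignores whether the birthday has already passed, and `datetime.now()` makes the result depend on when the notebook is run. A hedged sketch of an exact age at a fixed reference date, assuming ISO-formatted birth dates and a hypothetical reference date (the real `birthDate` and `date` formats may differ):

```python
import pandas as pd

# Hypothetical sample; the date format is an assumption.
sample = pd.DataFrame({"birthDate": ["1949-03-05", "1971-06-28"]})
birth = pd.to_datetime(sample["birthDate"])

# Hypothetical reference date; in practice it could come from the "date" column.
ref = pd.Timestamp("2023-04-04")

# True where the birthday has not yet occurred in the reference year.
before_birthday = (birth.dt.month > ref.month) | (
    (birth.dt.month == ref.month) & (birth.dt.day > ref.day)
)
sample["age"] = ref.year - birth.dt.year - before_birthday.astype(int)
print(sample)
```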

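#### Update the "status" column from the "selfMade" column

In [82]:
def update_status(x):
    # map the boolean selfMade flag to a readable label
    if x:
        return "Self-made"
    else:
        return "inherited"

# note: this overwrites the original status codes
df["status"] = df["selfMade"].apply(update_status)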
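
The same relabeling can be written without a helper function by passing a dictionary to `Series.map`. A minimal sketch on hypothetical data, using the same two labels:

```python
import pandas as pd

# Hypothetical sample of the boolean flag.
sample = pd.DataFrame({"selfMade": [True, False, True]})

# Map each boolean to its label in one vectorized step.
sample["status"] = sample["selfMade"].map({True: "Self-made", False: "inherited"})
print(sample)
```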

#### describe the numerical columns

In [246]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
rank,2372.0,1275.358,743.2761,1.0,636.0,1272.0,1905.0,2540.0
finalWorth,2372.0,4780.481,10281.55,1000.0,1500.0,2400.0,4300.0,211000.0
age,2372.0,65.67159,13.12547,19.0,57.0,66.0,75.0,102.0
birthYear,2372.0,1957.328,13.12547,1921.0,1948.0,1957.0,1966.0,2004.0
birthMonth,2372.0,5.769815,3.715886,1.0,2.0,6.0,9.0,12.0
birthDay,2372.0,12.27192,9.904044,1.0,1.0,11.0,21.0,31.0
cpi_country,2372.0,128.0152,26.68692,99.55,117.24,117.24,125.08,288.57
cpi_change_country,2372.0,4.421459,3.508344,-1.9,1.7,2.9,7.5,53.5
gdp_country,2372.0,11782490000000.0,9565485000000.0,13672800000.0,1736426000000.0,19910000000000.0,21427700000000.0,21427700000000.0
gross_tertiary_education_enrollment,2372.0,67.45556,21.44938,4.0,50.6,67.0,88.2,136.6


#### describe the categorical columns

In [247]:
df.describe(include="O").T

Unnamed: 0,count,unique,top,freq
category,2372,18,Finance & Investments,336
personName,2372,2370,Li Li,2
country,2372,63,United States,750
city,2372,708,New York,99
source,2372,858,Real estate,120
industries,2372,18,Finance & Investments,336
countryOfCitizenship,2372,71,United States,730
organization,2372,290,unknown,2052
status,2372,2,Self-made,1669
gender,2372,2,M,2092


# Visualizations

## univariate analysis

#### the distribution of age by violin chart

In [84]:
px.violin(df,"age",title="the distribution of age")

#### the distribution of age by box chart

In [85]:
px.box(df,"age",title="the distribution of age")

#### the distribution of age by histogram

In [93]:
px.histogram(df,x = "age" , title="the distribution of age",text_auto=True,color_discrete_sequence=["green"])

#### count of billionaires in each country by histogram 

In [95]:
px.histogram(df ,x="country",text_auto=True,color_discrete_sequence=["red"])

#### Count of billionaires in each city in United States

In [252]:
df["count"] = 1
temp = df[df["country"] == "United States"].groupby("city")["count"].sum().reset_index().sort_values(by="count", ascending=False)
px.bar(temp, x="city", y="count",title="Count of billionaires in each city in United States")
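
The helper `count` column is not strictly needed for this tally. A sketch of the same per-city count with `value_counts`, on a hypothetical frame; `rename_axis` and `reset_index(name=...)` are used so the column names do not depend on the pandas version:

```python
import pandas as pd

# Hypothetical sample.
sample = pd.DataFrame({
    "country": ["United States", "United States", "United States", "France"],
    "city": ["New York", "New York", "Austin", "Paris"],
})

# Count rows per city for one country, sorted from most to least frequent.
us_city_counts = (
    sample.loc[sample["country"] == "United States", "city"]
    .value_counts()
    .rename_axis("city")
    .reset_index(name="count")
)
print(us_city_counts)
```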

#### Distribution of the industries associated with the billionaires' business interests

In [97]:
px.histogram(df,x="industries",text_auto=True,color="industries")

#### Percentage of male and female billionaires worldwide

In [104]:
px.pie(df,"gender",title="Percentage of male and female billionaires worldwide",color_discrete_sequence=['black', 'green', 'limegreen', 'forestgreen'])

#### Share of billionaires by industry worldwide

In [255]:
# share of billionaires by industry worldwide
px.pie(df,"industries",title="Share of billionaires by industry worldwide")

#### Percentage of selfMade billionaires worldwide

In [105]:
px.pie(df,"selfMade",title="Percentage of selfMade billionaires worldwide")

#### Share of billionaires by status worldwide

In [109]:
px.pie(df,"status",title="Share of billionaires by status worldwide",color_discrete_sequence=['red', 'green', 'limegreen', 'forestgreen'])

#### the distribution of cpi_country by histogram

In [110]:
# 1- "cpi_country"

px.histogram(df , x= "cpi_country", title="the distribution of cpi_country by histogram",text_auto=True)

In [113]:
# check for outliers in cpi_change_country
px.box(df,"cpi_change_country",title="check the Outlier" ,color_discrete_sequence=["green"] )

In [117]:
# outlier handling: drop rows where cpi_change_country exceeds 15.2
idx_drop = df.loc[df.cpi_change_country > 15.2].index
df.drop(idx_drop, inplace=True)
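
The notebook does not say how the 15.2 cutoff was chosen. One common way to derive such a threshold from the data is the 1.5 × IQR rule that box plots use for their upper fence. The sketch below applies it to hypothetical values and is an assumption, not necessarily the method used here:

```python
import pandas as pd

# Hypothetical sample with one extreme value.
sample = pd.DataFrame({"cpi_change_country": [1.0, 2.0, 3.0, 4.0, 100.0]})

# Quartiles and the upper fence: Q3 + 1.5 * IQR.
q1 = sample["cpi_change_country"].quantile(0.25)
q3 = sample["cpi_change_country"].quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)

# Keep only rows at or below the fence.
kept = sample[sample["cpi_change_country"] <= upper_fence]
print(upper_fence, len(kept))
```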

In [120]:
px.box(df,"cpi_change_country",title="Outlier Handling",color_discrete_sequence=["green"])

#### check the Outlier in gdp_country by box 

In [115]:
px.box(df,"gdp_country",title="check the Outlier",color_discrete_sequence=["green"])

#### the distribution of gdp_country by histogram

In [121]:
px.histogram(df,"gdp_country",title="the distribution of gdp_country",text_auto=True,color_discrete_sequence=["black"])

## Bivariate analysis

#### the correlation matrix of ['cpi_country','cpi_change_country', 'gdp_country']

In [265]:
cols =['cpi_country','cpi_change_country', 'gdp_country']
px.imshow(df[cols].corr(),text_auto=True,title= "the correlation matrix of ['cpi_country','cpi_change_country', 'gdp_country']")

#### A scatter plot to analyze the relationship between total tax rate in a country and the wealth of individuals from that country.

In [181]:
px.scatter(df,x="total_tax_rate_country",y="finalWorth",color_discrete_sequence=["green"],title="A scatter plot to analyze the relationship between total tax rate in a country and the wealth of individuals from that country.")

#### A scatter plot to explore the relationship between tertiary education enrollment in a country and the wealth of individuals from that country.

In [180]:
# Each point is one billionaire, so this shows individual wealth rather than a country average.
px.scatter(df,x="gross_tertiary_education_enrollment",y="finalWorth",color_discrete_sequence=["red"],title="A scatter plot to explore the relationship between tertiary education enrollment in a country and the wealth of individuals from that country.")

#### A heatmap to identify any correlations between numerical variables like age, finalWorth, gdp_country

In [268]:
cols =['age','finalWorth', 'gdp_country']
px.imshow(df[cols].corr(),text_auto=True,title= "the correlation matrix of ['age','finalWorth', 'gdp_country']")
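
`DataFrame.corr()` defaults to Pearson correlation, which a few very large `finalWorth` values can dominate. A rank-based alternative is Spearman correlation, available through the `method` argument. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical sample: worth rises with age, but one value is extreme.
sample = pd.DataFrame({
    "age": [30, 40, 50, 60],
    "finalWorth": [1000, 1500, 4000, 211000],
})

# Pearson measures linear association; Spearman works on ranks.
pearson = sample.corr(method="pearson")
spearman = sample.corr(method="spearman")
print(pearson.loc["age", "finalWorth"], spearman.loc["age", "finalWorth"])
```

Here the relationship is perfectly monotonic, so Spearman is 1 while Pearson is noticeably lower because of the single large value.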

#### the distribution of age for male and female

In [128]:
px.histogram(df,"age",color="gender",title="the distribution of age for male and female",text_auto=True)

#### the distribution of industries for male and female

In [130]:
px.histogram(df,"gender",color="industries",title="the distribution of industries for male and female",text_auto=True)

#### the distribution of country for male and female

In [131]:
px.histogram(df,"country",color="gender",title="the distribution of country for male and female",text_auto=True)

## Analysis

### Q1 : The youngest billionaire

In [132]:
df[(df["age"] == df["age"].min())]

Unnamed: 0,rank,finalWorth,personName,age,country,city,source,industries,countryOfCitizenship,organization,...,birthDay,cpi_country,cpi_change_country,gdp_country,gross_tertiary_education_enrollment,gross_primary_education_enrollment_country,life_expectancy_country,tax_revenue_country_country,total_tax_rate_country,population_country
823,818,3500,Clemente Del Vecchio,19,Italy,Milan,Eyeglases,Fashion & Retail,Italy,unknown,...,6,110.62,0.6,2001244000000.0,61.9,101.9,82.9,24.3,59.1,60297396.0


### Q2 : The youngest billionaire in America

In [133]:
temp = df[df["country"]=="United States"].reset_index()

In [134]:
temp[temp["age"] == temp["age"].min()]

Unnamed: 0,index,rank,finalWorth,personName,age,country,city,source,industries,countryOfCitizenship,...,birthDay,cpi_country,cpi_change_country,gdp_country,gross_tertiary_education_enrollment,gross_primary_education_enrollment_country,life_expectancy_country,tax_revenue_country_country,total_tax_rate_country,population_country
716,2414,2405,1100,Ryan Breslow,29,United States,Miami,E-commerce software,Technology,United States,...,20,117.24,7.5,21427700000000.0,88.2,101.8,78.5,9.6,36.6,328239523.0


### Q3 : Percentage of male and female billionaires worldwide

In [135]:
df["gender"].value_counts()*100/len(df)

gender
M    88.180667
F    11.819333
Name: count, dtype: float64
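
`value_counts` can compute the shares directly through `normalize=True`, which avoids dividing by `len(df)` by hand. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample: three men, one woman.
sample = pd.Series(["M", "M", "M", "F"], name="gender")

# normalize=True returns fractions; multiply by 100 for percentages.
shares = sample.value_counts(normalize=True) * 100
print(shares)
```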

### Q4 : the average of population_country for all billionaires 

In [136]:
df["population_country"].mean()

515245358.66610384
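
This mean is taken over billionaires, so a country's population is counted once per billionaire living there. If the average across countries is wanted instead, one option is to keep a single row per country first. A sketch on hypothetical numbers:

```python
import pandas as pd

# Hypothetical sample: country X has three billionaires, Y has one.
sample = pd.DataFrame({
    "country": ["X", "X", "X", "Y"],
    "population_country": [300.0, 300.0, 300.0, 60.0],
})

# Mean over rows (weighted by number of billionaires per country).
row_mean = sample["population_country"].mean()

# Mean over distinct countries (each country counted once).
country_mean = sample.drop_duplicates("country")["population_country"].mean()
print(row_mean, country_mean)
```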

### Q5 : Self-made billionaire women

In [137]:
df[(df["gender"] == "F")&(df["selfMade"]==True)]

Unnamed: 0,rank,finalWorth,personName,age,country,city,source,industries,countryOfCitizenship,organization,...,birthDay,cpi_country,cpi_change_country,gdp_country,gross_tertiary_education_enrollment,gross_primary_education_enrollment_country,life_expectancy_country,tax_revenue_country_country,total_tax_rate_country,population_country
43,43,31200,Rafaela Aponte-Diamant,78,Switzerland,Geneva,Shipping,Logistics,Switzerland,unknown,...,26,99.55,0.4,7.030824e+11,59.6,105.2,83.6,10.1,28.8,8.574832e+06
130,130,13700,Diane Hendricks,76,United States,Afton,Building supplies,Construction & Engineering,United States,ABC Supply,...,2,117.24,7.5,2.142770e+13,88.2,101.8,78.5,9.6,36.6,3.282395e+08
161,161,10900,Judy Love & family,86,United States,Oklahoma City,Gas stations,Fashion & Retail,United States,Love's Travel Stops and Country Stores,...,17,117.24,7.5,2.142770e+13,88.2,101.8,78.5,9.6,36.6,3.282395e+08
180,179,10100,Wu Yajun,59,China,Beijing,Real estate,Real Estate,China,unknown,...,1,125.08,2.9,1.991000e+13,50.6,100.2,77.0,9.4,59.2,1.397715e+09
222,223,8800,Tatyana Bakalchuk,48,Russia,Moscow region,Ecommerce,Fashion & Retail,Russia,unknown,...,16,180.75,4.5,1.699877e+12,81.9,102.6,72.7,11.4,46.2,1.443735e+08
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2564,2540,1000,Feng Yuxia,59,China,Beijing,Pharmaceuticals,Healthcare,China,unknown,...,6,125.08,2.9,1.991000e+13,50.6,100.2,77.0,9.4,59.2,1.397715e+09
2572,2540,1000,Hang Hong,55,China,Wuxi,Machinery,Manufacturing,China,unknown,...,1,125.08,2.9,1.991000e+13,50.6,100.2,77.0,9.4,59.2,1.397715e+09
2579,2540,1000,Hedda im Brahm-Droege,68,Germany,Dusseldorf,Investments,Finance & Investments,Germany,unknown,...,1,112.85,1.4,3.845630e+12,70.2,104.0,80.9,11.5,48.8,8.313280e+07
2601,2540,1000,Ma Xiuhui,52,China,Shanghai,LED lighting,Manufacturing,China,unknown,...,7,125.08,2.9,1.991000e+13,50.6,100.2,77.0,9.4,59.2,1.397715e+09


### Q6 : the youngest billionaire women

In [138]:
temp = df[(df["gender"] == "F")].reset_index()

In [139]:
temp[temp["age"] == temp["age"].min()]

Unnamed: 0,index,rank,finalWorth,personName,age,country,city,source,industries,countryOfCitizenship,...,birthDay,cpi_country,cpi_change_country,gdp_country,gross_tertiary_education_enrollment,gross_primary_education_enrollment_country,life_expectancy_country,tax_revenue_country_country,total_tax_rate_country,population_country
202,1908,1905,1500,Alexandra Andresen,27,Norway,Oslo,Investments,Diversified,Norway,...,23,120.27,2.2,403336400000.0,82.0,100.3,82.8,23.9,36.2,5347896.0
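
Q1, Q2 and Q6 all filter a subset and then look for the minimum age. A sketch of answering this for every group at once with `groupby(...).idxmin()`, on hypothetical data. Note that `idxmin` returns only the first row per group when ages are tied:

```python
import pandas as pd

# Hypothetical sample.
sample = pd.DataFrame({
    "personName": ["A", "B", "C", "D"],
    "gender": ["F", "F", "M", "M"],
    "age": [27, 60, 19, 45],
})

# idxmin gives the row label of the youngest person in each group.
youngest_by_gender = sample.loc[sample.groupby("gender")["age"].idxmin()]
print(youngest_by_gender)
```

Grouping by `"country"` instead of `"gender"` would give the youngest billionaire per country in the same way.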


### Q7 : the max of gdp_country

In [140]:
df["gdp_country"].max()

21427700000000.0

### Q8 : the average of gdp_country

In [141]:
df["gdp_country"].mean()

11796842498108.518

### Q9 : Does gender make a difference to being self-made?

In [145]:
df["count"] = 1

In [176]:
df.groupby(["selfMade","gender"])["count"].sum().reset_index()
# Both genders include self-made billionaires, but the share differs:
# 82 of 280 women (about 29%) versus 1584 of 2089 men (about 76%).

Unnamed: 0,selfMade,gender,count
0,False,F,198
1,False,M,505
2,True,F,82
3,True,M,1584


In [179]:
px.histogram(df.groupby(["selfMade","gender"])["count"].sum().reset_index(),x="gender",y="count",color="selfMade",text_auto=True)
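
Raw counts are hard to compare because there are far more men than women in the data. A sketch that turns the four counts from the table above into the self-made share within each gender (the labels `self_made` and `inherited` are illustrative):

```python
import pandas as pd

# Counts taken from the grouped output above.
counts = pd.DataFrame({
    "selfMade": [False, False, True, True],
    "gender": ["F", "M", "F", "M"],
    "count": [198, 505, 82, 1584],
})

# Use string labels so the pivoted columns are easy to select.
counts["origin"] = counts["selfMade"].map({True: "self_made", False: "inherited"})
table = counts.pivot(index="gender", columns="origin", values="count")

# Share of self-made billionaires within each gender.
share_self_made = table["self_made"] / table.sum(axis=1)
print(share_self_made.round(3))
```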

### Q10 : Top 3 Countries Contribution to Each Industry

In [156]:
industry_counts = df.groupby(['industries', 'country']).size().reset_index(name='Count')
industry_counts = industry_counts.sort_values(['industries', 'Count'], ascending=[True, False])
top_countries_per_industry = industry_counts.groupby('industries').head(3)
top_countries_per_industry

Unnamed: 0,industries,country,Count
1,Automotive,China,20
4,Automotive,India,13
12,Automotive,United States,12
30,Construction & Engineering,United States,5
16,Construction & Engineering,China,4
20,Construction & Engineering,Italy,4
42,Diversified,India,22
36,Diversified,China,17
65,Diversified,Turkey,13
87,Energy,United States,35


In [159]:
px.histogram(top_countries_per_industry , x="country" , y="Count" ,color="industries",text_auto=True,title="Top 3 Countries Contribution to Each Industry")

### Q11 : Top 10 Countries by Total Billionaire Wealth

In [166]:
top_10 = df.groupby("country")["finalWorth"].sum().reset_index().sort_values('finalWorth', ascending= False).head(10)
top_10

Unnamed: 0,country,finalWorth
59,United States,4566400
10,China,1754900
22,India,628700
17,France,497400
19,Germany,417800
52,Switzerland,390900
58,United Kingdom,364000
45,Russia,351000
2,Australia,173500
8,Canada,169500


In [169]:
px.histogram(top_10,y="country",x="finalWorth",text_auto=True,color="country",title="Top 10 Countries by Total Billionaire Wealth")
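
The cell above sums `finalWorth` per country, which ranks countries by total wealth rather than by number of billionaires. A sketch that computes both measures with named aggregation, on hypothetical data (the output column names are illustrative):

```python
import pandas as pd

# Hypothetical sample: X has more billionaires, Y has more total wealth.
sample = pd.DataFrame({
    "country": ["X", "X", "Y"],
    "personName": ["A", "B", "C"],
    "finalWorth": [1000, 2000, 5000],
})

# One row per country with both the head count and the summed worth.
by_country = sample.groupby("country").agg(
    billionaires=("personName", "size"),
    total_worth=("finalWorth", "sum"),
)

top_by_count = by_country.sort_values("billionaires", ascending=False)
top_by_worth = by_country.sort_values("total_worth", ascending=False)
print(by_country)
```

The two rankings can differ, as they do in this sample.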

### Q12 : Total Wealth by Industry

In [170]:
sum_wealth = df.groupby("industries")["finalWorth"].sum().reset_index().sort_values('finalWorth', ascending= False)
sum_wealth

Unnamed: 0,industries,finalWorth
16,Technology,1805900
4,Fashion & Retail,1606900
5,Finance & Investments,1495700
10,Manufacturing,936500
6,Food & Beverage,900200
2,Diversified,794600
8,Healthcare,608400
0,Automotive,513700
13,Real Estate,488800
3,Energy,433000


In [173]:
px.histogram(sum_wealth,y="industries",x="finalWorth",text_auto=True,color="industries",title="Total Wealth by Industry")

### Q13 : Average Wealth by Industry

In [174]:
avr_wealth = df.groupby("industries")["finalWorth"].mean().reset_index().sort_values('finalWorth', ascending= False)
avr_wealth

Unnamed: 0,industries,finalWorth
0,Automotive,7667.164179
9,Logistics,6929.032258
17,Telecom,6789.285714
4,Fashion & Retail,6640.082645
16,Technology,6227.241379
12,Metals & Mining,6198.529412
11,Media & Entertainment,4894.047619
6,Food & Beverage,4865.945946
3,Energy,4706.521739
7,Gambling & Casinos,4627.272727


In [175]:
px.histogram(avr_wealth,y="industries",x="finalWorth",text_auto=True,color="industries",title="Average Wealth by Industry")
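
Because `finalWorth` is heavily skewed (the `describe()` output shows a median of 2400 against a maximum of 211000), a few very large fortunes can pull an industry's mean upward. A sketch comparing mean and median per industry on hypothetical data:

```python
import pandas as pd

# Hypothetical sample: industry "T" contains one extreme fortune.
sample = pd.DataFrame({
    "industries": ["T", "T", "T", "R", "R"],
    "finalWorth": [1000, 1200, 100000, 3000, 3200],
})

# Compute both statistics per industry in one pass.
stats = sample.groupby("industries")["finalWorth"].agg(["mean", "median"])
print(stats)
```

In this sample "T" has the higher mean but the lower median, so the ranking depends on which statistic is used.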

In [162]:
df.columns

Index(['rank', 'finalWorth', 'personName', 'age', 'country', 'city', 'source',
       'industries', 'countryOfCitizenship', 'organization', 'selfMade',
       'status', 'gender', 'birthDate', 'title', 'date', 'birthYear',
       'birthMonth', 'birthDay', 'cpi_country', 'cpi_change_country',
       'gdp_country', 'gross_tertiary_education_enrollment',
       'gross_primary_education_enrollment_country', 'life_expectancy_country',
       'tax_revenue_country_country', 'total_tax_rate_country',
       'population_country', 'count'],
      dtype='object')