This document introduces a dataset on the world's billionaires and their fortunes. Each record includes a billionaire's rank, net worth, industry, and demographic details, along with economic indicators for their country, which makes it possible to study how wealth is distributed across countries, sectors, and individuals. The dataset should be useful to economists, data analysts, and anyone curious about the world's wealthiest people.

In [6]:
import pandas as pd
df = pd.read_csv("Billionaires Statistics Dataset.csv")
df

Unnamed: 0,rank,finalWorth,category,personName,age,country,city,source,industries,countryOfCitizenship,...,cpi_change_country,gdp_country,gross_tertiary_education_enrollment,gross_primary_education_enrollment_country,life_expectancy_country,tax_revenue_country_country,total_tax_rate_country,population_country,latitude_country,longitude_country
0,1,211000,Fashion & Retail,Bernard Arnault & family,74.0,France,Paris,LVMH,Fashion & Retail,France,...,1.1,"$2,715,518,274,227",65.6,102.5,82.5,24.2,60.7,6.705989e+07,46.227638,2.213749
1,2,180000,Automotive,Elon Musk,51.0,United States,Austin,"Tesla, SpaceX",Automotive,United States,...,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,3.282395e+08,37.090240,-95.712891
2,3,114000,Technology,Jeff Bezos,59.0,United States,Medina,Amazon,Technology,United States,...,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,3.282395e+08,37.090240,-95.712891
3,4,107000,Technology,Larry Ellison,78.0,United States,Lanai,Oracle,Technology,United States,...,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,3.282395e+08,37.090240,-95.712891
4,5,106000,Finance & Investments,Warren Buffett,92.0,United States,Omaha,Berkshire Hathaway,Finance & Investments,United States,...,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,3.282395e+08,37.090240,-95.712891
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2635,2540,1000,Healthcare,Yu Rong,51.0,China,Shanghai,Health clinics,Healthcare,China,...,2.9,"$19,910,000,000,000",50.6,100.2,77.0,9.4,59.2,1.397715e+09,35.861660,104.195397
2636,2540,1000,Food & Beverage,"Richard Yuengling, Jr.",80.0,United States,Pottsville,Beer,Food & Beverage,United States,...,7.5,"$21,427,700,000,000",88.2,101.8,78.5,9.6,36.6,3.282395e+08,37.090240,-95.712891
2637,2540,1000,Manufacturing,Zhang Gongyun,60.0,China,Gaomi,Tyre manufacturing machinery,Manufacturing,China,...,2.9,"$19,910,000,000,000",50.6,100.2,77.0,9.4,59.2,1.397715e+09,35.861660,104.195397
2638,2540,1000,Real Estate,Zhang Guiping & family,71.0,China,Nanjing,Real estate,Real Estate,China,...,2.9,"$19,910,000,000,000",50.6,100.2,77.0,9.4,59.2,1.397715e+09,35.861660,104.195397
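
The preview shows `gdp_country` stored as text with a dollar sign and thousands separators (for example `"$2,715,518,274,227"`), so it cannot be used in numeric operations as-is. The following is a minimal sketch of one way to convert it, using a small stand-in frame built from values visible in the preview. The helper `parse_dollar_amount` and the new column name `gdp_country_usd` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Stand-in frame with values copied from the preview above (not the full dataset)
sample = pd.DataFrame({
    "personName": ["Bernard Arnault & family", "Elon Musk", "Yu Rong"],
    "gdp_country": ["$2,715,518,274,227", "$21,427,700,000,000", "$19,910,000,000,000"],
})

def parse_dollar_amount(series):
    """Strip '$', commas, and whitespace, then convert to numbers (hypothetical helper)."""
    cleaned = series.astype(str).str.replace(r"[$,\s]", "", regex=True)
    # Values that still cannot be parsed become NaN instead of raising an error
    return pd.to_numeric(cleaned, errors="coerce")

sample["gdp_country_usd"] = parse_dollar_amount(sample["gdp_country"])
print(sample.dtypes)
```

On the full dataset the same helper could be applied to `df["gdp_country"]`, assuming the column follows the format shown in the preview throughout.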


In [31]:
import plotly.express as px


top_countries = df['country'].value_counts().head(15)
barplot = px.bar(top_countries, x=top_countries.index, y=top_countries.values,
              title="Top 15 Countries with Most Billionaires",
              color_discrete_sequence=px.colors.qualitative.Set3,
              text=top_countries.values)

barplot.update_xaxes(title="Country")
barplot.update_yaxes(title="Number of Billionaires")


barplot.update_traces(texttemplate='%{text}', textposition='outside')

barplot.show()


In [32]:
import plotly.express as px

gender_counts = df['gender'].value_counts()

genderdis = px.pie(gender_counts, names=gender_counts.index, values=gender_counts.values,
              title="Gender Distribution of Billionaires",
              color_discrete_sequence=px.colors.qualitative.Plotly)
genderdis.update_traces(marker=dict(line=dict(color='white', width=2))
                 , textinfo='percent+label+value')

genderdis.update_layout(showlegend=False)
genderdis.update_traces(hole=0.4)

genderdis.add_annotation(
    text="<b>Gender</b>",
    x=0.5,
    y=0.5,
    showarrow=False,
    font=dict(size=10),
)
genderdis.show()



Support for multi-dimensional indexing (e.g. `obj[:, None]`) is deprecated and will be removed in a future version.  Convert to a numpy array before indexing instead.



In [27]:
import plotly.express as px

whiskerplot = px.box(df, x="gender", y="age", title="Age Distribution of Billionaires by Gender",
              color_discrete_sequence=['#FFA15A', '#00B2E2'])
whiskerplot.update_xaxes(title="Gender")
whiskerplot.update_yaxes(title="Age")

whiskerplot.show()



In [33]:
correlation = px.scatter_matrix(df, dimensions=["age", "finalWorth", "total_tax_rate_country"],
                         color="gender", title='Scatter Matrix of Age, Net Worth, and Country Tax Rate')
correlation.update_traces(marker=dict(size=6, opacity=0.6))
correlation.update_layout(margin=dict(t=50, l=50, r=50, b=50))
correlation.show()
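
A scatter matrix shows pairwise relationships visually but does not report correlation coefficients. As a rough sketch, the coefficients themselves can be computed with pandas `DataFrame.corr`. The stand-in frame below uses only the nine rows visible in the preview at the top of this document, so the resulting numbers illustrate the mechanics and say nothing about the full dataset.

```python
import pandas as pd

# Stand-in frame: the nine rows shown in the preview (not the full dataset)
sample = pd.DataFrame({
    "age": [74.0, 51.0, 59.0, 78.0, 92.0, 51.0, 80.0, 60.0, 71.0],
    "finalWorth": [211000, 180000, 114000, 107000, 106000, 1000, 1000, 1000, 1000],
    "total_tax_rate_country": [60.7, 36.6, 36.6, 36.6, 36.6, 59.2, 36.6, 59.2, 59.2],
})

# Pairwise Pearson correlation between the three numeric columns
corr = sample.corr(method="pearson")
print(corr.round(2))
```

On the full dataset, the same call would be `df[["age", "finalWorth", "total_tax_rate_country"]].corr()`; rows with missing values are excluded pairwise by default.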

In [34]:
distribution = px.treemap(df, path=['industries'], values='finalWorth',
                  title='Wealth Distribution by Industry',
                  color_discrete_sequence=px.colors.qualitative.Set1)
distribution.update_traces(textinfo="label+percent entry")
distribution.update_layout(margin=dict(t=50, l=0, r=0, b=0))
distribution.show()
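
The treemap sizes each tile by the sum of `finalWorth` within an industry. A minimal sketch of the same aggregation as a table is shown below, again using only the nine rows visible in the preview as a stand-in, so the totals and shares are illustrative only. The names `industry_totals`, `industry_share`, and `summary` are hypothetical.

```python
import pandas as pd

# Stand-in frame: the nine rows shown in the preview (not the full dataset)
sample = pd.DataFrame({
    "industries": ["Fashion & Retail", "Automotive", "Technology", "Technology",
                   "Finance & Investments", "Healthcare", "Food & Beverage",
                   "Manufacturing", "Real Estate"],
    "finalWorth": [211000, 180000, 114000, 107000, 106000, 1000, 1000, 1000, 1000],
})

# Total net worth per industry, largest first
industry_totals = (
    sample.groupby("industries")["finalWorth"].sum().sort_values(ascending=False)
)

# Each industry's share of the combined net worth, in percent
industry_share = (industry_totals / industry_totals.sum() * 100).round(1)

summary = pd.DataFrame({"finalWorth": industry_totals, "share_pct": industry_share})
print(summary)
```

Replacing `sample` with `df` would give the figures behind the treemap, assuming `industries` holds a single category per row as it does in the preview.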