# Billionaires - An Exploration  
by Nelson Truong  
  
Link to original dataset: https://corgis-edu.github.io/corgis/python/billionaires/  
  
This dataset mainly consists of information about the billionaires on the Forbes World's Billionaires lists from 1996 to 2014. The Peterson Institute for International Economics built on these lists and added information about each billionaire, such as whether their wealth was self-made or inherited. In this Python notebook, we will explore this dataset and see what it has to offer.

In [2]:
# First, we import the billionaires module, which conveniently provides a get_billionaire() function
# that returns every single billionaire in the database. We will store the result in the billionaire variable.
import billionaires
billionaire = billionaires.get_billionaire()

In [12]:
# Let's try printing out the first entry
print(billionaire[0])

{'name': 'Bill Gates', 'rank': 1, 'year': 1996, 'company': {'founded': 1975, 'name': 'Microsoft', 'relationship': 'founder', 'sector': ' Software', 'type': 'new'}, 'demographics': {'age': 40, 'gender': 'male'}, 'location': {'citizenship': 'United States', 'country code': 'USA', 'gdp': 8100000000000.0, 'region': 'North America'}, 'wealth': {'type': 'founder non-finance', 'worth in billions': 18.5, 'how': {'category': 'New Sectors', 'from emerging': True, 'industry': 'Technology-Computer', 'inherited': 'not inherited', 'was founder': True, 'was political': True}}}


# Format of the dataset

From the first entry, we can see that the dataset includes a lot of data about each billionaire. The first entry describes Bill Gates by giving information about the year he was on the list, the rank he had that year, the company he founded, his demographics, his location, his type of wealth, how he got that wealth, etc.
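
Because each entry is a nested dictionary, reading a value means chaining one key per level. The sketch below uses a trimmed copy of the Bill Gates record printed above, so it runs without the `billionaires` module; the variable names `sample` and `sector` are only for illustration.

```python
# A trimmed copy of the first record printed above (only a few fields kept).
sample = {
    'name': 'Bill Gates',
    'rank': 1,
    'year': 1996,
    'company': {'founded': 1975, 'name': 'Microsoft', 'sector': ' Software'},
    'wealth': {'worth in billions': 18.5, 'how': {'inherited': 'not inherited'}},
}

# Top-level fields are read with a single key.
print(sample['name'], sample['year'])

# Nested fields need one key per level of nesting.
print(sample['company']['name'])
print(sample['wealth']['worth in billions'])
print(sample['wealth']['how']['inherited'])

# Some string values carry stray whitespace (note the leading space in ' Software'),
# so stripping them before comparing or grouping is a reasonable precaution.
sector = sample['company']['sector'].strip()
print(sector)
```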

In [7]:
# From the entry on Bill Gates, we can see that certain billionaires have certain defining features.
# We can try to loop through the dataset and find each billionaire that was Rank 1 on the list.

# Declaring an empty list to contain all the entries we want.
top = []

# For loop to find the entries that we want by comparing the value of 'rank'
for x in billionaire:
    if x['rank'] == 1:
        top.append(x)

# Print those entries out
print(top)

[{'name': 'Bill Gates', 'rank': 1, 'year': 1996, 'company': {'founded': 1975, 'name': 'Microsoft', 'relationship': 'founder', 'sector': ' Software', 'type': 'new'}, 'demographics': {'age': 40, 'gender': 'male'}, 'location': {'citizenship': 'United States', 'country code': 'USA', 'gdp': 8100000000000.0, 'region': 'North America'}, 'wealth': {'type': 'founder non-finance', 'worth in billions': 18.5, 'how': {'category': 'New Sectors', 'from emerging': True, 'industry': 'Technology-Computer', 'inherited': 'not inherited', 'was founder': True, 'was political': True}}}, {'name': 'Bill Gates', 'rank': 1, 'year': 2001, 'company': {'founded': 1975, 'name': 'Microsoft', 'relationship': 'founder', 'sector': ' Software', 'type': 'new'}, 'demographics': {'age': 45, 'gender': 'male'}, 'location': {'citizenship': 'United States', 'country code': 'USA', 'gdp': 10600000000000.0, 'region': 'North America'}, 'wealth': {'type': 'founder non-finance', 'worth in billions': 58.7, 'how': {'category': 'New Sec

# Gaps in the data
Judging by this output, no one other than Bill Gates was rank 1 on the list (had the highest net worth that year) from 1996 to 2014. However, it also seems that the dataset does not include entries for every year. We can dig further by checking a year that did not show up in the results: 2005. What happened to the entries for that year?

In [10]:
# Retrieve the entries corresponding to 'year' == 2005.
oh_five = []
for x in billionaire:
    if x['year'] == 2005:
        oh_five.append(x)

# Print those entries out
print(oh_five)

[]


After combing through the data, there seem to be no entries at all for the year 2005. It appears that this dataset only contains entries from the years 1996, 2001, and 2014. It would have been interesting to have data for every year, but that might have introduced a lot of redundancy, since billionaires probably don't experience much fluctuation in their ranks. That can be seen from the fact that Bill Gates was at the top in 1996, 2001, and 2014. A simple Google search will show you that he was rank 1 in 2005 as well. However, these gaps have to be accounted for if we are to do any data analysis with this dataset.
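
Rather than probing one year at a time, the years present in the data can be listed directly. The following is a minimal sketch: `years_covered` is a hypothetical helper, and `sample_records` is a small stand-in built from entries shown in this notebook so that the example runs on its own. With the real data, the list returned by `billionaires.get_billionaire()` would be passed in instead.

```python
def years_covered(records):
    """Return the sorted list of distinct 'year' values found in records."""
    # A set comprehension drops duplicate years; sorted() returns them in order.
    return sorted({record['year'] for record in records})

# Stand-in data with the same 'year' key as the real entries (illustration only).
sample_records = [
    {'name': 'Bill Gates', 'rank': 1, 'year': 1996},
    {'name': 'Bill Gates', 'rank': 1, 'year': 2001},
    {'name': 'Bill Gates', 'rank': 1, 'year': 2014},
    {'name': 'Carlos Slim Helu', 'rank': 2, 'year': 2014},
]

print(years_covered(sample_records))  # [1996, 2001, 2014]
```

Any year missing from the returned list, such as 2005 here, has no entries at all.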

In [16]:
# Finally, what I really want to know is the names of the billionaires from 2014. I can write a function
# that prints out the billionaires and their ranks for the year that I pass in.

def uno(year):
    for x in billionaire:
        if x['year'] == year:
            print(x['name'], x['rank'])

uno(2014)

Bill Gates 1
Carlos Slim Helu 2
Amancio Ortega 3
Warren Buffett 4
Larry Ellison 5
Charles Koch 6
David Koch 6
Sheldon Adelson 8
Christy Walton 9
Jim Walton 10
Liliane Bettencourt 11
Stefan Persson 12
Alice Walton 13
S. Robson Walton 14
Bernard Arnault 15
Michael Bloomberg 16
Larry Page 17
Jeff Bezos 18
Sergey Brin 19
Li Ka-shing 20
Mark Zuckerberg 21
Michele Ferrero 22
Aliko Dangote 23
Karl Albrecht 23
Carl Icahn 25
George Soros 26
David Thomson 27
Lui Che Woo 28
Dieter Schwarz 29
Prince Alwaleed Bin Talal Alsaud 30
Forrest Mars, Jr. 31
Jacqueline Mars 31
John Mars 31
Jorge Paulo Lemann 34
Lee Shau Kee 35
Steve Ballmer 36
Theo Albrecht, Jr. 36
Leonardo Del Vecchio 38
Len Blavatnik 39
Alisher Usmanov 40
Mukesh Ambani 40
Masayoshi Son 42
Michael Otto 42
Phil Knight 42
Tadashi Yanai 45
Gina Rinehart 46
Mikhail Fridman 47
Michael Dell 48
Susanne Klatten 49
Abigail Johnson 50
Viktor Vekselberg 51
Lakshmi Mittal 52
Vladimir Lisin 53
Cheng Yu-tung 54
Joseph Safra 55
Paul Allen 56
Leonid Mikhe

In [None]:
# I can also take the number of billionaires on the list from 2014 and compare it to the number of billionaires on the list from 1996.

def one(year):
    one_list = []
    for x in billionaire:
        if x['year'] == year:
            one_list.append(x)
    return one_list
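
One possible way to make that comparison in a single pass is to tally entries per year with `collections.Counter`. This is only a sketch: `count_by_year` is a hypothetical helper, and `sample_records` is a tiny stand-in built from entries shown earlier, so the printed numbers describe the sample and not the real dataset.

```python
from collections import Counter

def count_by_year(records):
    """Tally how many entries each 'year' value has."""
    return Counter(record['year'] for record in records)

# Stand-in data with the same 'year' key as the real entries (illustration only).
sample_records = [
    {'name': 'Bill Gates', 'rank': 1, 'year': 1996},
    {'name': 'Bill Gates', 'rank': 1, 'year': 2014},
    {'name': 'Carlos Slim Helu', 'rank': 2, 'year': 2014},
]

counts = count_by_year(sample_records)
print(counts[1996], counts[2014])

# A Counter returns 0 for a missing key, so years with no entries are safe to query.
print(counts[2005])
```

With the real data, `count_by_year(billionaire)` would give the list sizes for every year at once, and `counts[2014] - counts[1996]` would give the difference between the two lists.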