# Understanding the Dynamics of Extreme Wealth Accumulation

In an age of unprecedented wealth and inequality, this data literacy project seeks to analyze extreme wealth and its underlying factors over time. We aim to uncover trends, correlations, and forces driving the accumulation of immense fortunes.

## Key Objectives

- Identify industries and categories driving extreme wealth.
- Analyze how wealth accumulation has evolved over time.
- Investigate the causes and consequences of extreme wealth.

### Datasets

Our analysis is based on Forbes' annual billionaire data, which is widely accepted as a reliable source for studying the distribution of extreme wealth. It draws on three datasets:

1. **Main Dataset:** We aim to understand the primary categorical factors affecting extreme wealth using the latest official billionaire dataset from 2023. This dataset is accessible through [Kaggle](add link here).

2. **Time Series Dataset:** We analyze Forbes data from the years 2000 to 2023, also accessed via [Kaggle](add link here).

3. **Latest Dataset:** Forbes released unofficial 2024 billionaire data on January 1st. Because this data was not yet available as a downloadable dataset, we scraped it from the Forbes real-time billionaires page so that we could compare it with the 2023 dataset. We also provide the scraping script for reference.

Please find details about each dataset and our analysis in the following sections.

### Table of Contents

- [Part 0 - Preparing the Data](#part-0-preparing-the-data)
- [Part 1 - Data Preprocessing](#part-1-data-preprocessing)
- [Part 2 - Visualization](#part-2-visualization)
- [Part 3 - Statistical Analysis](#part-3-statistical-analysis)
- [Part 4 - Factor Analysis](#part-4-factor-analysis)
- [Part 5 - Playground](#part-5-playground)


In [2]:
# Scraping newly published 2024 data
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import csv
import time


def scrape_forbes_billionaires(url):
    # Requires Chrome. Selenium 4.6+ locates a matching ChromeDriver automatically;
    # older versions need ChromeDriver installed and on your PATH
    driver = webdriver.Chrome()

    driver.get(url)

    # Wait for and click the consent form button
    try:
        consent_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable(
                (By.CSS_SELECTOR, 'button.root__OblK1.root__eL_b5'))
        )
        consent_button.click()
    except Exception as e:
        print("Consent button not found or another error occurred:", e)

    # Wait for the scrollable element to be present
    scrollable_element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'scrolly-table'))
    )

    # Scroll to the end of the scrollable section
    last_height = driver.execute_script(
        "return arguments[0].scrollHeight", scrollable_element)
    while True:
        driver.execute_script(
            "arguments[0].scrollTo(0, arguments[0].scrollHeight);", scrollable_element)
        time.sleep(0.1)

        new_height = driver.execute_script(
            "return arguments[0].scrollHeight", scrollable_element)
        if new_height == last_height:
            break
        last_height = new_height

    soup = BeautifulSoup(driver.page_source, 'html.parser')

    # Extracting and storing the data
    billionaires_data = []
    for row in soup.find_all('tr', class_='base ng-scope'):
        # Extract relevant fields
        rank = row.find('td', class_='rank').get_text(strip=True)
        name = row.find('td', class_='name').get_text(strip=True)
        net_worth = row.find('td', class_='Net Worth').get_text(strip=True)
        age = row.find('td', class_='age').get_text(strip=True)
        source = row.find('td', class_='source').get_text(strip=True)
        country = row.find(
            'td', class_='Country/Territory').get_text(strip=True)

        billionaires_data.append([rank, name, net_worth, age, source, country])

    # Save data to the CSV file that is loaded as dataset2024 in Part 0
    with open('billionaires_2024.csv', 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['Rank', 'Name', 'Net Worth',
                        'Age', 'Source', 'Country'])
        writer.writerows(billionaires_data)

    # You can manually close the browser after inspecting
    # driver.quit()


# URL of the Forbes Billionaire 2024 list
url = 'https://www.forbes.com/real-time-billionaires/'
# scrape_forbes_billionaires(url)  # uncomment to re-run the scraper

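The scroll loop in `scrape_forbes_billionaires` repeats "scroll to the bottom, wait, compare heights" until the table's `scrollHeight` stops changing, a common way to make a lazily loaded table render all of its rows. The sketch below pulls that loop out into a function that receives the height and scroll actions as callables, so it can be exercised without a browser. The names `scroll_until_stable` and `FakeScrollablePage` are hypothetical, and the `max_rounds` cap is an added safeguard that the original loop does not have.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    pause: float = 0.1,
    max_rounds: int = 500,
) -> int:
    """Scroll repeatedly until the content height stops growing.

    Returns the number of scroll rounds performed. max_rounds (hypothetical
    safeguard) stops the loop on pages that keep loading forever.
    """
    last_height = get_height()
    rounds = 0
    while rounds < max_rounds:
        scroll_to_bottom()
        rounds += 1
        time.sleep(pause)  # give lazily loaded rows time to render
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, so the table is complete
        last_height = new_height
    return rounds


class FakeScrollablePage:
    """Stand-in for a lazily loaded table: each scroll loads more rows until none remain."""

    def __init__(self, total_rows: int, rows_per_load: int, row_height: int = 40):
        self.total_rows = total_rows
        self.rows_per_load = rows_per_load
        self.row_height = row_height
        self.loaded = min(rows_per_load, total_rows)

    def height(self) -> int:
        return self.loaded * self.row_height

    def scroll_to_bottom(self) -> None:
        self.loaded = min(self.loaded + self.rows_per_load, self.total_rows)


page = FakeScrollablePage(total_rows=250, rows_per_load=100)
rounds = scroll_until_stable(page.height, page.scroll_to_bottom, pause=0)
print(page.loaded, rounds)  # prints: 250 3
```

With Selenium, the two callables would wrap the same `driver.execute_script` calls the scraper already uses, for example `lambda: driver.execute_script("return arguments[0].scrollHeight", scrollable_element)`.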

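The extraction step in the scraper looks up each `<td>` by its class inside `<tr class="base ng-scope">` rows. As a browser-free illustration of that step, the following sketch swaps BeautifulSoup for Python's built-in `html.parser` and runs it on a small, made-up HTML snippet. The markup is assumed from the selectors used in the scraper and may not match the live Forbes page; `BillionaireRowParser` is a hypothetical name, and its whitespace handling may differ slightly from `get_text(strip=True)`.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class BillionaireRowParser(HTMLParser):
    """Collect the text of every <td> inside <tr class="base ng-scope"> rows,
    keyed by the cell's class attribute (e.g. "rank", "name", "Net Worth")."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Dict[str, str]] = []
        self._row: Optional[Dict[str, str]] = None  # row currently being read
        self._cell_class: Optional[str] = None  # class of the open <td>, if any
        self._cell_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "tr" and cls == "base ng-scope":
            self._row = {}
        elif tag == "td" and self._row is not None:
            self._cell_class = cls
            self._cell_text = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> also belongs to the open cell
        if self._cell_class is not None:
            self._cell_text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell_class is not None:
            self._row[self._cell_class] = "".join(self._cell_text).strip()
            self._cell_class = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up markup shaped like the selectors used in the scraper
sample_html = """
<table>
  <tr class="base ng-scope">
    <td class="rank">1</td>
    <td class="name"><a href="#">Example Person</a></td>
    <td class="Net Worth">$12.5 B</td>
    <td class="age">60</td>
    <td class="source">Example Corp</td>
    <td class="Country/Territory">Exampleland</td>
  </tr>
  <tr class="detail"><td class="name">not a data row</td></tr>
</table>
"""

parser = BillionaireRowParser()
parser.feed(sample_html)
parser.close()
print(parser.rows)
```

Rows whose class is not exactly `base ng-scope` are skipped, which mirrors how `find_all('tr', class_='base ng-scope')` filters rows in the scraper.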
# Part 0 - Preparing the Data

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
from scipy.spatial import distance

In [4]:
# Main datasets
dataset2023 = pd.read_csv('billionaires_2023.csv')
dataset2024 = pd.read_csv('billionaires_2024.csv')
dataset_2000_2023 = pd.read_csv('billionaires_2000_2023.csv')

# Copy datasets for specific visualization and analysis
dataset_2023_gender = dataset2023.copy()
dataset2023_copy = dataset2023.copy()

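The next cell converts scraped strings such as `$251.3 B` into millions of USD (the unit of the 2023 `finalWorth` column), inner-merges the two years on `Name`, and computes the percentage change as (2024 - 2023) / 2023 × 100. For Prajogo Pangestu, for example, (55,900 - 5,300) / 5,300 × 100 ≈ 954.7%. The dictionary-based sketch below reproduces those three steps with only the standard library, so the logic can be checked on a handful of values. The function names are hypothetical, the 2023 and 2024 figures are taken from the output of the next cell, and "Example Newcomer" is a made-up entry showing that an inner join drops names missing from either year.

```python
import re


def parse_net_worth_millions(text: str) -> float:
    """Convert a string such as "$251.3 B" (billions of USD) to millions of USD."""
    match = re.fullmatch(r"\$\s*([\d.]+)\s*B", text.strip())
    if match is None:
        raise ValueError(f"unrecognised net worth string: {text!r}")
    return float(match.group(1)) * 1000  # billions -> millions


def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100


def rank_changes(worth_2023: dict, worth_2024: dict, k: int = 10):
    """Inner-join two {name: net worth} dicts; return the k largest and k smallest changes."""
    common = worth_2023.keys() & worth_2024.keys()  # names present in both years only
    changes = {name: percent_change(worth_2023[name], worth_2024[name]) for name in common}
    ranked = sorted(changes.items(), key=lambda item: item[1], reverse=True)
    winners = ranked[:k]
    losers = ranked[::-1][:k]  # smallest change first
    return winners, losers


# 2023 values are already in millions; 2024 values are scraped strings
worth_2023 = {"Prajogo Pangestu": 5300, "Mark Zuckerberg": 64400, "Elon Musk": 180000}
scraped_2024 = {
    "Prajogo Pangestu": "$55.9 B",
    "Mark Zuckerberg": "$122.6 B",
    "Elon Musk": "$251.3 B",
    "Example Newcomer": "$3.0 B",  # hypothetical: absent from 2023, dropped by the join
}
worth_2024 = {name: parse_net_worth_millions(s) for name, s in scraped_2024.items()}

winners, losers = rank_changes(worth_2023, worth_2024, k=2)
for name, pct in winners:
    print(f"{name}: {pct:.1f}%")  # prints Prajogo Pangestu: 954.7%, then Mark Zuckerberg: 90.4%
```

Because the join matches names exactly, any spelling difference between the two lists silently removes that person from the comparison; the same holds for the `how='inner'` merge in the notebook.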
In [5]:
# Convert strings such as "$251.3 B" to millions of USD, the unit of the 2023 'finalWorth' column
dataset2024['Net Worth'] = dataset2024['Net Worth'].replace({r'\$': '', ' B': ''}, regex=True).astype(float) * 1000
# Renaming columns for consistency
dataset2023_copy.rename(columns={'personName': 'Name', 'finalWorth': 'Net Worth 2023'}, inplace=True)
dataset2024.rename(columns={'Net Worth': 'Net Worth 2024'}, inplace=True)

# Inner merge on 'Name': only billionaires whose names match exactly in both lists are kept
merged_data = pd.merge(dataset2023_copy[['Name', 'Net Worth 2023']], dataset2024[['Name', 'Net Worth 2024', 'Source']], on='Name', how='inner')

# Calculating the percentage change in net worth
merged_data['Net Worth Change (%)'] = ((merged_data['Net Worth 2024'] - merged_data['Net Worth 2023']) / merged_data['Net Worth 2023']) * 100

# Top 10 winners and losers by percentage change in net worth
top_5_winners = merged_data.sort_values(by='Net Worth Change (%)', ascending=False).head(10)
top_5_losers = merged_data.sort_values(by='Net Worth Change (%)').head(10)

top_5_winners, top_5_losers

(                Name  Net Worth 2023  Net Worth 2024  \
 23  Prajogo Pangestu            5300         55900.0   
 15   Mark Zuckerberg           64400        122600.0   
 22      Gautam Adani           47200         77500.0   
 2         Jeff Bezos          114000        172200.0   
 11        Larry Page           79200        116100.0   
 13       Sergey Brin           76000        111400.0   
 1          Elon Musk          180000        251300.0   
 9      Steve Ballmer           80700        110900.0   
 21      Michael Dell           50100         68800.0   
 12    Amancio Ortega           77300         99100.0   
 
                          Source  Net Worth Change (%)  
 23       Petrochemicals, energy            954.716981  
 15                     Facebook             90.372671  
 22  Infrastructure, commodities             64.194915  
 2                        Amazon             51.052632  
 11                       Google             46.590909  
 13                       Goo