# Overview
This notebook is my submission for the task on the "[The World's Biggest Companies 2021](https://www.kaggle.com/berkayalan/the-worlds-biggest-companies-2021)" dataset. The task description is simply:
> Create an EDA and show breakdowns of companies

The [original source](https://www.forbes.com/lists/global2000/#242cc9bb5ac0) of the dataset is Forbes, which ranks companies according to the following equally weighted metrics:
- Sales
- Profit
- Assets
- Market Value
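
To make "equally weighted" concrete, here is a minimal sketch of one way to combine four metrics with equal weight: rank the companies on each metric, then average the ranks. This is only an illustration on made-up numbers. The `toy` frame, the `Composite` column, and the rank-averaging scheme are my own assumptions and not Forbes's actual methodology.

```python
import pandas as pd

# Hypothetical companies and values (in $ billions), for illustration only
toy = pd.DataFrame(
    {
        'Sales': [100.0, 250.0, 80.0],
        'Profit': [10.0, 5.0, 20.0],
        'Assets': [500.0, 300.0, 900.0],
        'Market Value': [200.0, 150.0, 400.0],
    },
    index=['A', 'B', 'C'],
)

# Rank each metric separately (1 = largest value)
metric_ranks = toy.rank(ascending=False)

# Equal weighting: a plain average of the four per-metric ranks
toy['Composite'] = metric_ranks.mean(axis=1)

# Lower composite score means a better overall rank
toy['Rank'] = toy['Composite'].rank(method='min').astype(int)
print(toy.sort_values('Rank'))
```

Averaging ranks rather than raw values keeps a metric with large magnitudes, such as assets, from dominating the others.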

Let's dive in!

# Imports & Settings

In [None]:
# Standard imports
import numpy as np
import pandas as pd

# Visualization tools
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
sns.set_theme(style='darkgrid')

# Preprocessing
## Previewing Data

In [None]:
df = pd.read_csv('../input/the-worlds-biggest-companies-2021/The Worlds Biggest Public Companies.csv', delimiter=';')
df.tail()

In [None]:
df.info()

In [None]:
df.isna().sum()

## Setting Index to Rank

In [None]:
df.set_index('Rank', inplace=True)
df.head()

## Converting to Floats
The `Sales`, `Profit`, `Assets`, and `Market Value` columns are all currently stored as strings and need to be converted to floats. I will keep the units in billions of dollars to ease the visualization process (axis formatting can be a pain).
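
As a quick illustration of the conversion, here is a minimal standalone sketch. It assumes the raw strings look like `$559.2 B` or `$850 M`, which is my reading of the format rather than something checked against every row. The helper name `to_billions` is hypothetical.

```python
import re

def to_billions(value: str) -> float:
    """Convert a string such as '$1,234.5 B' or '$850 M' to billions of dollars.

    The expected format (dollar sign, number, unit letter) is an assumption.
    """
    match = re.fullmatch(r'\$(-?[\d,.]+)\s*([BM])', value.strip())
    if match is None:
        raise ValueError(f'Unexpected format: {value!r}')
    # Drop thousands separators before casting
    number = float(match.group(1).replace(',', ''))
    # Millions are rescaled so that every value shares the same unit
    return number / 1000 if match.group(2) == 'M' else number

print(to_billions('$1,234.5 B'))
print(to_billions('$850 M'))
```

Unlike plain slicing, the regular expression raises an error on strings that do not fit the expected pattern, which makes surprises in the data easier to spot.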

In [None]:
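def converter(x):
    # Drop the first character (currency symbol) and the last two characters
    # (unit suffix), then remove thousands separators before casting
    converted = float(x[1:-2].replace(',', ''))
    # Values given in millions are rescaled to billions
    if x.endswith('M'):
        converted /= 1000
    return converted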

In [None]:
for col in df.columns[2:]:
    df[col] = df[col].apply(converter)

In [None]:
df.head()

In [None]:
df.info(verbose=False)

# EDA
## Number of Companies per Country

In [None]:
df_by_country = df[['Country', 'Name']].groupby('Country').count()
df_by_country.rename(columns={'Name': 'Count'}, inplace=True)
df_by_country.sort_values('Count', ascending=False, inplace=True)

fig = plt.figure(figsize=(8, 12))
sns.barplot(data=df_by_country, 
            x='Count', 
            y=df_by_country.index,
            orient='h',
            palette='crest')
plt.title('Number of Companies in Top 500', fontweight='bold')
plt.tight_layout();

## Average Metrics by Country

In [None]:
def plot_avg_metric(metric, ax=None):
    '''
    Description:
    ------------
    Plots the average (mean) value of a given metric for each country.
    
    Parameters:
    -----------
    metric : str
        The metric to plot. Can be one of 'Sales', 'Profit', 'Assets'
        or 'Market Value'.
    
    ax : matplotlib axis (optional)
        The axis to plot on. Useful when creating a figure with 
        various subplots.
    '''
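    # Fall back to the current axis when none is supplied
    if ax is None:
        ax = plt.gca()
    # Average only the numeric columns; `Name` is a string column
    df_grouped = df.groupby('Country').mean(numeric_only=True)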
    df_grouped.sort_values(metric, ascending=False, inplace=True)
    sns.barplot(data=df_grouped,
                x=metric,
                y=df_grouped.index,
                orient='h',
                palette='crest',
                ax=ax)
    ax.set_title(f'Average {metric} by Country', fontweight='bold')
    ax.set_xlabel(f'Average {metric} ($ billions)')
    plt.tight_layout(pad=1.50)

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(16, 16))
axes = np.reshape(axes, -1)

metrics = df.columns[2:]
for i, metric in enumerate(metrics):
    plot_avg_metric(metric, ax=axes[i])

## Market Value Ratios by Country

In [None]:
def market_value_ratio(metric, aggregator):
    '''
    Description:
    ------------
    Plots the ratio of market value to the given metric for each 
    country, aggregated with the specified method.
    
    Parameters:
    -----------
    metric : str
        The metric to compare against market value. Can be one of 
        'Sales', 'Profit' or 'Assets'.
        
    aggregator : str
        The aggregation method to be used ('mean', 'median', etc.).
    '''
    df_ratio = df.copy()
    df_ratio[f'MVto{metric}'] = df_ratio['Market Value'] / df_ratio[metric]
    # Aggregate only the ratio column; string columns such as `Name` cannot be averaged
    df_ratio = df_ratio.groupby('Country')[[f'MVto{metric}']].agg(aggregator)
    df_ratio.sort_values(f'MVto{metric}', ascending=False, inplace=True)
    sns.barplot(data=df_ratio,
                x=f'MVto{metric}',
                y=df_ratio.index,
                orient='h',
                palette='crest')
    plt.title(f'{aggregator.title()} Market Value to {metric} Ratio', fontweight='bold')
    plt.xlabel('Ratio')

In [None]:
fig = plt.figure(figsize=(8, 10))
market_value_ratio('Sales', 'mean')

In [None]:
fig = plt.figure(figsize=(8, 10))
market_value_ratio('Profit', 'median')
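
To see how the choice of aggregator affects these ratios, here is a minimal sketch on made-up numbers. The `toy` frame and its values are hypothetical and not taken from the dataset. One company with a very small profit inflates the mean ratio for its country while leaving the median unchanged.

```python
import pandas as pd

# Hypothetical data (in $ billions), for illustration only
toy = pd.DataFrame({
    'Country': ['X', 'X', 'X', 'Y', 'Y'],
    'Market Value': [100.0, 50.0, 80.0, 30.0, 60.0],
    'Profit': [10.0, 5.0, 0.5, 3.0, 4.0],
})

# Per-company ratio, mirroring the MVto{metric} column built above
toy['MVtoProfit'] = toy['Market Value'] / toy['Profit']

# Compare two aggregators side by side for each country
summary = toy.groupby('Country')['MVtoProfit'].agg(['mean', 'median'])
print(summary)
```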

# Future Work
There are several ways that this data can be extended to produce even more interesting visuals. Some ideas include:
- Historical data to observe how different metrics and countries have changed over time
- Additional business metrics such as debt or cash flows
- Industry information for each of the companies to calculate specific industry-level metrics