# Data Extraction
* Here I'll extract the data from [Forbes](https://www.forbes.com/billionaires/page-data/index/page-data.json).
* Then I'll prepare the data and save it to a CSV file that will be used for the data analysis.

In [1]:
# Import libraries
import requests
import pandas as pd

### Fetch the data from the Forbes website.
* No HTML scraping is required because Forbes serves the billionaires list directly as JSON.

In [2]:
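from functools import lru_cache

@lru_cache(maxsize=1)  # cache the download: get_cols() calls fetch_data() once per column
def fetch_data():
    """
        Fetches the Forbes page data and extracts the billionaires list
        as a list of dictionaries, each dictionary representing one billionaire and their info.
        The result is cached; call fetch_data.cache_clear() to download it again.
    """
    url = "https://www.forbes.com/billionaires/page-data/index/page-data.json"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    data = response.json()
    return data["result"]["pageContext"]["tableData"]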
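
Because fetch_data() downloads and parses in one step, its JSON navigation cannot be checked without network access. A minimal sketch, assuming the payload keeps the `result` → `pageContext` → `tableData` nesting used above: the hypothetical helper `extract_table` takes an already-decoded payload, so it can be exercised on a small hand-made sample. The sample records are placeholders, not real Forbes data.

```python
import json

def extract_table(payload):
    """Return the billionaire records from a decoded Forbes page-data payload.

    Hypothetical helper: assumes the list sits at result -> pageContext -> tableData,
    the same path fetch_data() reads.
    """
    try:
        return payload["result"]["pageContext"]["tableData"]
    except (KeyError, TypeError) as exc:
        raise ValueError("unexpected payload layout: tableData not found") from exc

# Hand-made sample that mimics the nesting (placeholder values, not real data)
sample_text = """
{"result": {"pageContext": {"tableData": [
    {"rank": 1, "personName": "Person A"},
    {"rank": 2, "personName": "Person B"}
]}}}
"""
records = extract_table(json.loads(sample_text))
print(len(records))  # 2
```

With such a helper, fetch_data() would only download and decode the JSON and hand it to `extract_table`, and a layout change on the Forbes side would surface as a clear `ValueError` rather than a bare `KeyError`.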

In [3]:
def get_cols(col):
    """
        Fetches the values of one field for all billionaires.
        Returns a list with one value per billionaire (None if the field is missing).
    """
    # dict.get returns None when a billionaire's record lacks the field
    return [person.get(col) for person in fetch_data()]
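
get_cols returns one value per billionaire and uses None when a record lacks the field, so every column list has the same length and the rows stay aligned when they are combined into a DataFrame. A minimal offline sketch of that behaviour on hand-made records (placeholders, not real Forbes data); `column_values` is a hypothetical stand-in that takes the records as an argument instead of downloading them:

```python
def column_values(records, col):
    """Hypothetical stand-in for get_cols: one value per record, None if the key is missing."""
    return [person.get(col) for person in records]

# Placeholder records; the second one has no "age" field
records = [
    {"personName": "Person A", "age": 50},
    {"personName": "Person B"},
]

names = column_values(records, "personName")
ages = column_values(records, "age")
print(names)  # ['Person A', 'Person B']
print(ages)   # [50, None]
```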

In [4]:
def create_dataframe():
    """
        Builds a DataFrame with one column per attribute, filled with the values returned by get_cols.
    """
    attributes = [
        "rank", "personName", "age", "country", "month", "year",
        "netWorth", "source", "industries",
        "countryOfCitizenship", "selfMade",
        "title", "city", "gender"
    ]
    data_dictionary = {attribute: get_cols(attribute) for attribute in attributes}
    return pd.DataFrame(data_dictionary)
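
pandas can also build the same table directly from the list of dictionaries: passing `columns=` to `pd.DataFrame` keeps only the listed keys, in that order, and a key that a record lacks becomes a missing value (NaN). A sketch on placeholder records, offered as an alternative to the column-by-column approach, not as the notebook's method:

```python
import pandas as pd

# Placeholder records standing in for the list returned by fetch_data()
records = [
    {"rank": 1, "personName": "Person A", "age": 50, "extraField": "ignored"},
    {"rank": 2, "personName": "Person B"},  # no "age" key
]
attributes = ["rank", "personName", "age"]

# Keys not listed in `columns` are dropped; missing keys become NaN
df_alt = pd.DataFrame(records, columns=attributes)
print(df_alt)
```

This reads each record once instead of once per attribute, at the cost of making the column selection implicit in the `columns` argument.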

In [5]:
df = create_dataframe()
df.head()


|   | rank | personName | age | country | month | year | netWorth | source | industries | countryOfCitizenship | selfMade | title | city | gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Elon Musk | 50 | United States | 4 | 2022 | $219 B | Tesla, SpaceX | Automotive | United States | True | CEO | Austin | M |
| 1 | 2 | Jeff Bezos | 58 | United States | 4 | 2022 | $171 B | Amazon | Technology | United States | True | Chairman and Founder | Seattle | M |
| 2 | 3 | Bernard Arnault & family | 73 | France | 4 | 2022 | $158 B | LVMH | Fashion & Retail | France | False | Chairman and CEO | Paris | M |
| 3 | 4 | Bill Gates | 66 | United States | 4 | 2022 | $129 B | Microsoft | Technology | United States | True | Cofounder | Medina | M |
| 4 | 5 | Warren Buffett | 91 | United States | 4 | 2022 | $118 B | Berkshire Hathaway | Finance & Investments | United States | True | CEO | Omaha | M |
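
In the preview, `netWorth` is stored as text such as `$219 B`, so it cannot be summed or sorted numerically as-is. A minimal sketch for the preparation step, assuming every value follows that `$<amount> B` pattern (billions of US dollars); `parse_net_worth` is a hypothetical helper, not part of the notebook:

```python
import pandas as pd

def parse_net_worth(value):
    """Hypothetical helper: turn a string like '$219 B' into 219.0 (billions of USD).

    Assumes the '$<amount> B' format seen in the preview; returns None for anything else.
    """
    if not isinstance(value, str):
        return None
    text = value.strip()
    if not (text.startswith("$") and text.endswith("B")):
        return None
    try:
        return float(text[1:-1].strip())
    except ValueError:
        return None

# Values copied from the df.head() preview above
net_worth = pd.Series(["$219 B", "$171 B", "$158 B", "$129 B", "$118 B"])
print(net_worth.map(parse_net_worth).tolist())  # [219.0, 171.0, 158.0, 129.0, 118.0]
```

Returning None for unexpected formats keeps the conversion from failing halfway through the column; those rows can then be inspected with `isna()`.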


In [6]:
# Rename the camelCase JSON keys to capitalized column names
df = df.rename(columns={
    "personName": "Name", "month": "Month", "year": "Year", "rank": "Rank",
    "age": "Age", "netWorth": "Networth", "source": "Source", "industries": "Industries",
    "countryOfCitizenship": "CountryOfCitizenship", "selfMade": "Selfmade", "country": "Country",
    "title": "Title", "city": "City", "gender": "Gender"
})
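
By default, `DataFrame.rename` silently ignores mapping keys that do not match any column, so a typo in a mapping like the one above would leave a column un-renamed without any warning. Passing `errors="raise"` makes pandas raise a `KeyError` instead. A small sketch on a placeholder frame:

```python
import pandas as pd

# Placeholder frame with two of the original camelCase columns
frame = pd.DataFrame({"personName": ["Person A"], "netWorth": ["$1 B"]})

# Every key exists, so this renames both columns
renamed = frame.rename(columns={"personName": "Name", "netWorth": "Networth"}, errors="raise")
print(list(renamed.columns))  # ['Name', 'Networth']

# A misspelled key ("networth") is reported instead of being skipped silently
try:
    frame.rename(columns={"networth": "Networth"}, errors="raise")
except KeyError as exc:
    print("rename failed:", exc)
```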

### Save the DataFrame to a CSV file

In [7]:
df.to_csv("../Datasets/forbes_2022_billionaires.csv", index=False)
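
`to_csv` fails if the `../Datasets` folder does not exist, and `index=False` keeps the row numbers out of the file (otherwise they come back as an extra `Unnamed: 0` column when the CSV is read again). A minimal sketch of both points, writing a placeholder frame to a temporary folder rather than the real path:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder frame standing in for the renamed billionaires DataFrame
frame = pd.DataFrame({"Rank": [1, 2], "Name": ["Person A", "Person B"]})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "Datasets"
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    out_file = out_dir / "forbes_2022_billionaires.csv"

    frame.to_csv(out_file, index=False)  # no index column in the file
    reloaded = pd.read_csv(out_file)

print(list(reloaded.columns))  # ['Rank', 'Name']
```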