# About Dataset
The dataset contains information about Fortune 500 companies, with the following features:
`company`, `rank`, `revenues`, `revenue_change`, `profits`, `assets`, `profit_change`, `ceo`, `industry`, `sector`, `previous_rank`, `country`, `hq_location`, `website`, `years_on_global_500_list`, `employees`, `total_stockholder_equity`

# Problem statement:
The dataset needs to be manipulated in order to extract meaningful information, such as:
* Identifying missing values
* Locating data elements within the dataset
* Filtering values to get the desired information
* Identifying the company with the highest number of employees
* Identifying the top employer in each country

# Import Library

In [1]:
import pandas as pd

# Loading Dataset

Pandas loads a dataset from a `csv` file into a `DataFrame`. A Pandas `DataFrame` is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

[Click here for more information](https://www.w3schools.com/python/pandas/pandas_dataframes.asp)

In [3]:
f500 = pd.read_csv("f500.csv")
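To try `read_csv` without the `f500.csv` file at hand, the sketch below reads a small CSV string instead. The two rows are copied from the first rows shown in this notebook, and the names `csv_text` and `sample` are only used for this illustration.

```python
import io

import pandas as pd

# A two-row stand-in for f500.csv, limited to five of the columns.
csv_text = """company,rank,revenues,country,employees
Walmart,1,485873,USA,2300000
State Grid,2,315199,China,926067
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample).__name__)  # DataFrame
print(sample.shape)           # (2, 5): two rows, five columns
```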

The `head()` method returns a specified number of rows, starting from the top. If no number is specified, it returns the first 5 rows.

In [4]:
f500.head()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
3,China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
4,Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
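The default and an explicit row count can be compared on a small frame. This is a minimal sketch on a made-up ten-row `sample` (not the real dataset); `tail()` is the counterpart that counts from the bottom.

```python
import pandas as pd

# Hypothetical ten-row frame, used only to show the row counts.
sample = pd.DataFrame({"rank": range(1, 11)})

default_rows = sample.head()   # no argument: first 5 rows
first_three = sample.head(3)   # explicit count: first 3 rows
last_two = sample.tail(2)      # counterpart: last 2 rows

print(len(default_rows), len(first_three), len(last_two))  # 5 3 2
```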


Rows and individual values can be selected by label with `.loc[]` or by integer position with `.iloc[]`.

In [5]:
f500.loc[0]

company                                    Walmart
rank                                             1
revenues                                    485873
revenue_change                                 0.8
profits                                    13643.0
assets                                      198825
profit_change                                 -7.2
ceo                            C. Douglas McMillon
industry                     General Merchandisers
sector                                   Retailing
previous_rank                                    1
country                                        USA
hq_location                        Bentonville, AR
website                     http://www.walmart.com
years_on_global_500_list                        23
employees                                  2300000
total_stockholder_equity                     77798
Name: 0, dtype: object

In [6]:
f500.loc[1,"rank"]

2
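With the default `RangeIndex`, labels and positions coincide, so `.loc[0]` and `.iloc[0]` return the same row. The difference shows up once the rows are reordered. The sketch below uses three rows copied from this notebook's output; the names `sample` and `by_size` are only for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    "company": ["Walmart", "State Grid", "Sinopec Group"],
    "employees": [2300000, 926067, 713288],
})

# Sorting keeps the original index labels, so the labels now read 2, 1, 0.
by_size = sample.sort_values("employees")

by_label = by_size.loc[0, "company"]   # row whose label is 0
by_position = by_size.iloc[0, 0]       # first row, first column

print(by_label)     # Walmart
print(by_position)  # Sinopec Group
```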

The `info()` method prints a summary of the DataFrame: the index range, the number of columns, the column labels, the number of non-null values in each column, the column data types, and the memory usage.

In [7]:
f500.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 17 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   company                   500 non-null    object 
 1   rank                      500 non-null    int64  
 2   revenues                  500 non-null    int64  
 3   revenue_change            498 non-null    float64
 4   profits                   499 non-null    float64
 5   assets                    500 non-null    int64  
 6   profit_change             436 non-null    float64
 7   ceo                       500 non-null    object 
 8   industry                  500 non-null    object 
 9   sector                    500 non-null    object 
 10  previous_rank             500 non-null    int64  
 11  country                   500 non-null    object 
 12  hq_location               500 non-null    object 
 13  website                   500 non-null    object 
 14  years_on_g

The isnull() method returns a DataFrame object where all the values are replaced with a Boolean value True for NULL values, and otherwise False.

In [8]:
f500.isnull()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
496,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
497,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
498,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


Getting the number of null values in each column:

In [9]:
f500.isnull().sum(axis = 0)

company                      0
rank                         0
revenues                     0
revenue_change               2
profits                      1
assets                       0
profit_change               64
ceo                          0
industry                     0
sector                       0
previous_rank                0
country                      0
hq_location                  0
website                      0
years_on_global_500_list     0
employees                    0
total_stockholder_equity     0
dtype: int64
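The same boolean frame can also be summed across columns to count rows that have at least one missing value. A minimal sketch on three rows taken from this notebook (Walmart, Volkswagen, Uniper), with `sample` as an illustrative name:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "company": ["Walmart", "Volkswagen", "Uniper"],
    "revenue_change": [0.8, 1.5, np.nan],
    "profit_change": [-7.2, np.nan, np.nan],
})

# Column-wise count: True is treated as 1 when summing.
null_counts = sample.isnull().sum()

# Row-wise check: any(axis=1) is True for rows with at least one null.
rows_with_null = sample.isnull().any(axis=1).sum()

print(null_counts.to_dict())  # {'company': 0, 'revenue_change': 1, 'profit_change': 2}
print(rows_with_null)         # 2
```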

Boolean Indexing can be used to identify rows with null values. You can use a boolean index, a Series composed of True or False values that correspond to rows in the dataset. The True/False values describe which rows you want to select, namely only the True rows.

In [10]:
bol = f500["revenue_change"].isnull()
f500[bol]

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
90,Uniper,91,74407,,-3557.5,51541,,Klaus Schafer,Energy,Energy,0,Germany,"Dusseldorf, Germany",http://www.uniper.energy,1,12890,12889
180,Hewlett Packard Enterprise,181,50123,,3161.0,79679,,Margaret C. Whitman,Information Technology Services,Technology,0,USA,"Palo Alto, CA",http://www.hpe.com,1,195000,31448


In [11]:
f500[f500["profit_change"].isnull()]

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
5,Volkswagen,6,240264,1.5,5937.3,432116,,Matthias Muller,Motor Vehicles and Parts,Motor Vehicles & Parts,7,Germany,"Wolfsburg, Germany",http://www.volkswagen.com,23,626715,97753
7,Berkshire Hathaway,8,223604,6.1,24074.0,620854,,Warren E. Buffett,Insurance: Property and Casualty (Stock),Financials,11,USA,"Omaha, NE",http://www.berkshirehathaway.com,21,367700,283001
11,BP,12,186606,-17.4,115.0,263316,,Robert W. Dudley,Petroleum Refining,Energy,10,Britain,"London, Britain",http://www.bp.com,23,74500,95286
15,Glencore,16,173883,2.0,1379.0,124600,,Ivan Glasenberg,"Mining, Crude-Oil Production",Energy,14,Switzerland,"Baar, Switzerland",http://www.glencore.com,7,93123,44243
22,AmerisourceBergen,23,146850,8.0,1427.9,33656,,Steven H. Collis,Wholesalers: Health Care,Wholesalers,28,USA,"Chesterbrook, PA",http://www.amerisourcebergen.com,18,18500,2129
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
472,Altice,473,22953,42.2,-1722.5,84805,,Michel Combes,Telecommunications,Telecommunications,0,Netherlands,"Amsterdam, Netherlands",http://www.altice.net,1,49732,-2668
473,Onex,474,22943,3.8,-130.0,42913,,Gerald W. Schwartz,Semiconductors and Other Electronic Components,Technology,483,Canada,"Toronto, Ontario, Canada",http://www.onex.com,18,161000,-490
475,Shanxi Jincheng Anthracite Coal Mining Group,476,22875,-17.0,3.0,32954,,He Tiancai,"Mining, Crude-Oil Production",Energy,384,China,"Jincheng, China",http://www.jamg.cn,5,135691,2988
488,Sears Holdings,489,22138,-12.0,-2221.0,9362,,Edward S. Lampert,General Merchandisers,Retailing,425,USA,"Hoffman Estates, IL",http://www.searsholdings.com,23,140000,-3824
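A boolean mask can be inverted with `~` to keep the rows where the value is present instead. The sketch below assumes a three-row `sample` built from values shown in this notebook.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "company": ["Walmart", "Volkswagen", "Uniper"],
    "profit_change": [-7.2, np.nan, np.nan],
})

missing = sample["profit_change"].isnull()

with_gap = sample[missing]    # rows where profit_change is null
complete = sample[~missing]   # ~ flips True/False, keeping non-null rows

print(with_gap["company"].tolist())  # ['Volkswagen', 'Uniper']
print(complete["company"].tolist())  # ['Walmart']
```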


Boolean indexing can also be used for filtering. The cell below selects the Fortune 500 companies that are in the Energy sector and based in Japan.

In [13]:
bol = (f500["sector"] == "Energy") & (f500["country"] == "Japan")
f500[bol]

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
126,JXTG Holdings,127,63629,1.8,1477.3,59767,,Yukio Uchida,Petroleum Refining,Energy,131,Japan,"Tokyo, Japan",http://www.hd.jxtg-group.co.jp,23,26247,12829
184,Tokyo Electric Power,185,49446,-2.2,1225.7,110202,4.5,Naomi Hirose,Utilities,Energy,177,Japan,"Tokyo, Japan",http://www.tepco.co.jp,23,42060,20905
388,Kansai Electric Power,389,27792,2.8,1299.3,61513,10.8,Shigeki Iwane,Utilities,Energy,391,Japan,"Osaka, Japan",http://www.kepco.co.jp,23,32666,11205
422,Idemitsu Kosan,423,25888,-1.0,813.7,23711,,Takashi Tsukioka,Petroleum Refining,Energy,412,Japan,"Tokyo, Japan",http://www.idemitsu.com,23,9139,3852
450,Chubu Electric Power,451,24028,1.1,1058.2,48580,-25.2,Satoru Katsuno,Utilities,Energy,448,Japan,"Nagoya, Japan",http://www.chuden.co.jp,23,30635,14695
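Conditions are combined with `&` (and) or `|` (or), and each comparison needs its own parentheses because these operators bind more tightly than `==`. A minimal sketch on four rows taken from this notebook, with `sample` as an illustrative name:

```python
import pandas as pd

sample = pd.DataFrame({
    "company": ["Toyota Motor", "State Grid", "Tokyo Electric Power", "Walmart"],
    "sector": ["Motor Vehicles & Parts", "Energy", "Energy", "Retailing"],
    "country": ["Japan", "China", "Japan", "USA"],
})

is_energy = sample["sector"] == "Energy"
is_japan = sample["country"] == "Japan"

both = sample[is_energy & is_japan]    # rows meeting both conditions
either = sample[is_energy | is_japan]  # rows meeting at least one condition

print(both["company"].tolist())    # ['Tokyo Electric Power']
print(either["company"].tolist())  # ['Toyota Motor', 'State Grid', 'Tokyo Electric Power']
```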


Let's find out which company in Japan has the most employees.

In [14]:
japan = f500[f500["country"] == "Japan"]
j_empl = japan.sort_values("employees", ascending = False).iloc[0]["company"]
print('The company with most number of employees in Japan is', j_empl)

The company with most number of employees in Japan is Toyota Motor
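Sorting the whole frame is not strictly needed to find a single maximum. One possible alternative, sketched here on three Japanese companies from this notebook, is `idxmax()`, which returns the index label of the largest value.

```python
import pandas as pd

sample = pd.DataFrame({
    "company": ["Toyota Motor", "Tokyo Electric Power", "Kansai Electric Power"],
    "country": ["Japan", "Japan", "Japan"],
    "employees": [364445, 42060, 32666],
})

japan = sample[sample["country"] == "Japan"]

# idxmax gives the index label of the row with the most employees.
top_label = japan["employees"].idxmax()
top_company = japan.loc[top_label, "company"]

print(top_company)  # Toyota Motor
```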


Getting the list of countries in which Fortune 500 companies are based:

In [16]:
countries = f500["country"].unique()
for country in countries:
    print(country)

print('Total Countries are', len(countries))

USA
China
Japan
Germany
Netherlands
Britain
South Korea
Switzerland
France
Taiwan
Singapore
Italy
Russia
Spain
Brazil
Mexico
Luxembourg
India
Malaysia
Thailand
Australia
Belgium
Norway
Canada
Ireland
Indonesia
Denmark
Saudi Arabia
Sweden
Finland
Venezuela
Turkey
U.A.E
Israel
Total Countries are 34
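If only the count is needed, `nunique()` gives it directly, and `value_counts()` shows how many rows fall under each value. The sketch below uses the `country` values of the first five rows shown in this notebook; the Series name `countries_sample` is illustrative.

```python
import pandas as pd

# Countries of the first five companies in the head() output.
countries_sample = pd.Series(["USA", "China", "China", "China", "Japan"], name="country")

n_countries = countries_sample.nunique()     # number of distinct values
per_country = countries_sample.value_counts()  # rows per value, largest first

print(n_countries)           # 3
print(per_country["China"])  # 3
```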


Let's create a dictionary `{}` that maps each country to the company with the most employees in that country.

In [17]:
top_employer = {}

for country in countries:
    df = f500[f500["country"] == country]
    com = df.sort_values("employees", ascending = False).iloc[0]["company"]
    top_employer[country] = com
    
print(top_employer)

{'USA': 'Walmart', 'China': 'China National Petroleum', 'Japan': 'Toyota Motor', 'Germany': 'Volkswagen', 'Netherlands': 'EXOR Group', 'Britain': 'Compass Group', 'South Korea': 'Samsung Electronics', 'Switzerland': 'Nestle', 'France': 'Sodexo', 'Taiwan': 'Hon Hai Precision Industry', 'Singapore': 'Flex', 'Italy': 'Poste Italiane', 'Russia': 'Gazprom', 'Spain': 'Banco Santander', 'Brazil': 'JBS', 'Mexico': 'America Movil', 'Luxembourg': 'ArcelorMittal', 'India': 'State Bank of India', 'Malaysia': 'Petronas', 'Thailand': 'PTT', 'Australia': 'Wesfarmers', 'Belgium': 'Anheuser-Busch InBev', 'Norway': 'Statoil', 'Canada': 'George Weston', 'Ireland': 'Accenture', 'Indonesia': 'Pertamina', 'Denmark': 'Maersk Group', 'Saudi Arabia': 'SABIC', 'Sweden': 'H & M Hennes & Mauritz', 'Finland': 'Nokia', 'Venezuela': 'Mercantil Servicios Financieros', 'Turkey': 'Koc Holding', 'U.A.E': 'Emirates Group', 'Israel': 'Teva Pharmaceutical Industries'}
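The loop filters and sorts the frame once per country. As a possible alternative (not the approach used in this notebook), `groupby` with `idxmax()` can find the top row for every country in one pass. A minimal sketch on the first five rows of the dataset:

```python
import pandas as pd

sample = pd.DataFrame({
    "company": ["Walmart", "State Grid", "Sinopec Group",
                "China National Petroleum", "Toyota Motor"],
    "country": ["USA", "China", "China", "China", "Japan"],
    "employees": [2300000, 926067, 713288, 1512048, 364445],
})

# For each country, idxmax returns the index label of the largest employer.
top_labels = sample.groupby("country")["employees"].idxmax()

# Use those labels to pull the matching rows, then build the dictionary.
top_rows = sample.loc[top_labels]
top_employer_sample = top_rows.set_index("country")["company"].to_dict()

print(top_employer_sample)
# {'China': 'China National Petroleum', 'Japan': 'Toyota Motor', 'USA': 'Walmart'}
```

Unlike the loop, the keys come out in sorted order because `groupby` sorts the group labels by default.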


Let's convert the dictionary into a pandas `Series`.

In [18]:
top_empl = pd.Series(top_employer)
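When a `Series` is built from a dictionary, the keys become the index labels and the values become the data, so a country can be looked up by name. A small sketch with three entries taken from the dictionary above (`top_empl_sample` is an illustrative name):

```python
import pandas as pd

top_employer_sample = {
    "USA": "Walmart",
    "China": "China National Petroleum",
    "Japan": "Toyota Motor",
}

top_empl_sample = pd.Series(top_employer_sample)

# Dictionary keys are now index labels, in insertion order.
print(top_empl_sample.index.tolist())  # ['USA', 'China', 'Japan']
print(top_empl_sample["Japan"])        # Toyota Motor

# sort_index returns a new Series ordered alphabetically by label.
print(top_empl_sample.sort_index().index.tolist())  # ['China', 'Japan', 'USA']
```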

In [20]:
top_empl.sort_index()

Australia                            Wesfarmers
Belgium                    Anheuser-Busch InBev
Brazil                                      JBS
Britain                           Compass Group
Canada                            George Weston
China                  China National Petroleum
Denmark                            Maersk Group
Finland                                   Nokia
France                                   Sodexo
Germany                              Volkswagen
India                       State Bank of India
Indonesia                             Pertamina
Ireland                               Accenture
Israel           Teva Pharmaceutical Industries
Italy                            Poste Italiane
Japan                              Toyota Motor
Luxembourg                        ArcelorMittal
Malaysia                               Petronas
Mexico                            America Movil
Netherlands                          EXOR Group
Norway                                  