# Fortune 500 companies Data Analysis

Here we're given a dataset of the Fortune 500 companies.
Find the csv here - [fortune500](https://drive.google.com/file/d/1A75bwCnYL7mt87suUHdIHPycqhIMpYW9/view?usp=sharing)

In this analysis we intend to do the following:
    1. Find the countries with the most companies in a particular industry, say Motor Vehicles and Parts.
    2. Find the highest-ranked companies from a particular country.
    3. Fix issues with the previous_rank column.
    4. Find the city with the most headquarters in a particular country, say the USA.
    5. Find the average number of employees per company in a particular country.

Importing the csv data file.

In [19]:
import numpy as np
import pandas as pd
f500 = pd.read_csv("f500.csv", index_col=0)
f500.index.name = None

Now let's find the five most common countries for companies in the Motor Vehicles and Parts industry.

Separate out a column of countries for the Motor Vehicles and Parts companies, then use value_counts to list the most frequently occurring countries.

In [20]:
countries_motor = f500.loc[f500['industry'] == 'Motor Vehicles and Parts', 'country']
print(countries_motor.value_counts().head())

Japan          10
China           7
Germany         6
France          3
South Korea     3
Name: country, dtype: int64


Clearly, Japan has more Fortune 500 Motor Vehicles and Parts companies than any other country.
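
The same filter-then-count step can be wrapped in a small helper so it works for any industry. This is only a sketch: `top_countries` and the toy frame are made-up names and data for illustration, assuming the same `industry` and `country` columns as f500.

```python
import pandas as pd

def top_countries(df, industry, n=5):
    """Return the n countries with the most companies in `industry`."""
    # Boolean mask selects the rows for the requested industry
    in_industry = df["industry"] == industry
    # value_counts sorts by frequency, most common first
    return df.loc[in_industry, "country"].value_counts().head(n)

# Tiny made-up frame, only to show the call; not the real Fortune 500 data
toy = pd.DataFrame(
    {
        "industry": ["Motor Vehicles and Parts", "Motor Vehicles and Parts",
                     "Motor Vehicles and Parts", "Banking"],
        "country": ["Japan", "Japan", "Germany", "USA"],
    },
    index=["Co A", "Co B", "Co C", "Co D"],
)
result = top_countries(toy, "Motor Vehicles and Parts")
print(result)
```

With the real data, `top_countries(f500, 'Motor Vehicles and Parts')` should give the same counts as the cell above.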

Now let's identify the highest-ranked companies in India.

In [21]:
top_india = f500.loc[f500['country'] == 'India' , 'rank']
print(top_india.head())

Indian Oil             168
Reliance Industries    203
State Bank of India    217
Tata Motors            247
Rajesh Exports         295
Name: rank, dtype: int64


Way to go India!
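
Note that `head()` only returns the best ranks here because the rows already appear to be ordered by rank. If that ordering is not guaranteed, `nsmallest` picks the lowest rank numbers explicitly. A minimal sketch on made-up data (`highest_ranked` and the toy frame are hypothetical):

```python
import pandas as pd

def highest_ranked(df, country, n=5):
    """Return the n best (numerically smallest) ranks for `country`."""
    ranks = df.loc[df["country"] == country, "rank"]
    # nsmallest does not depend on the row order of the input
    return ranks.nsmallest(n)

# Made-up rows, deliberately not sorted by rank
toy = pd.DataFrame(
    {
        "country": ["India", "USA", "India", "India"],
        "rank": [295, 1, 168, 203],
    },
    index=["Co A", "Co B", "Co C", "Co D"],
)
top = highest_ranked(toy, "India", n=2)
print(top)
```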

Now let's look deeper into the previous_rank column.

In [22]:
print(f500['previous_rank'].value_counts(dropna = False).head(5))

0      33
159     1
147     1
148     1
149     1
Name: previous_rank, dtype: int64


We can clearly see that there's something wrong with the data here. How can the previous rank of a company be 0?
So we can infer that the previous rank data for certain companies must have been missing, and the data entry personnel must have entered it as zero.

Let's fix this and replace the previous rank zeros with NaN.

In [23]:
f500.loc[f500['previous_rank'] == 0 ,'previous_rank'] = np.nan
print(f500['previous_rank'].value_counts(dropna = False).head(5))

NaN       33
 471.0     1
 234.0     1
 125.0     1
 166.0     1
Name: previous_rank, dtype: int64
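
An equivalent way to write this fix is `Series.replace`. The sketch below uses a made-up three-row frame (not the real data) and also shows why NaN is a safer placeholder than 0: arithmetic on a missing previous rank stays missing instead of producing a misleading number. The `rank_change` column is a hypothetical name used only for this illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; a previous_rank of 0 stands for "missing"
toy = pd.DataFrame(
    {"rank": [1, 2, 3], "previous_rank": [2, 0, 1]},
    index=["Co A", "Co B", "Co C"],
)

# Swap every 0 for NaN; the column becomes float because NaN is a float
toy["previous_rank"] = toy["previous_rank"].replace(0, np.nan)
missing = toy["previous_rank"].isna().sum()

# Positive values mean the company moved up; NaN propagates for Co B
toy["rank_change"] = toy["previous_rank"] - toy["rank"]
print(toy)
```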


Now let's find the cities that host the most company headquarters in the USA.

In [27]:
hq_usaCity = f500.loc[f500.loc[:,'country']== 'USA','hq_location']
print(hq_usaCity.value_counts().head(3))

New York, NY    15
Houston, TX      5
Atlanta, GA      4
Name: hq_location, dtype: int64


Killin it NYC!
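
To answer the same question for every country at once, one option is to group by country and take the most frequent headquarters location in each group. This is a sketch on made-up data, assuming the same `country` and `hq_location` columns; when two cities tie, it simply keeps whichever one pandas lists first.

```python
import pandas as pd

# Made-up rows, only to illustrate the grouping
toy = pd.DataFrame(
    {
        "country": ["USA", "USA", "USA", "Japan"],
        "hq_location": ["New York, NY", "New York, NY", "Houston, TX", "Tokyo"],
    },
    index=["Co A", "Co B", "Co C", "Co D"],
)

# For each country, count the locations and keep the label with the highest count
top_city = toy.groupby("country")["hq_location"].agg(
    lambda s: s.value_counts().idxmax()
)
print(top_city)
```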

Now let's also find the average number of employees in a Japanese Fortune 500 company.

In [29]:
employeesJapan = f500.loc[f500.loc[: ,'country'] =='Japan' , 'employees']
avgEmployees = employeesJapan.mean() 
print(avgEmployees)

104564.45098039215
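
The same average can be computed for all countries in one step with `groupby`, which avoids repeating the filter for each country. A minimal sketch on made-up numbers, assuming the same `country` and `employees` columns:

```python
import pandas as pd

# Made-up employee counts, only to show the pattern
toy = pd.DataFrame(
    {
        "country": ["Japan", "Japan", "USA"],
        "employees": [100000, 200000, 50000],
    },
    index=["Co A", "Co B", "Co C"],
)

# Mean employees per country, largest average first
avg_by_country = (
    toy.groupby("country")["employees"].mean().sort_values(ascending=False)
)
print(avg_by_country)
```

On the real data, `f500.groupby('country')['employees'].mean()['Japan']` should match the value printed above.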
