# *Introduction To Pandas*

**Here, we will be working with a data set from Fortune magazine's 2017 Global 500 list, which ranks the top 500 corporations worldwide by revenue. The dataset (<a href="https://data.world/chasewillden/fortune-500-companies-2017">compiled here</a>) is a CSV file called f500.csv.**

### Data Dictionary:

> <font color=blue>***company***</font> - The name of the company.<br>
> <font color=blue>***rank***</font> - The Global 500 rank for the company.<br>
> <font color=blue>***revenues***</font> - The company's total revenues for the fiscal year, in millions of dollars (USD).<br>
> <font color=blue>***revenue_change***</font> - The percentage change in revenue between the current and prior fiscal years.<br>
> <font color=blue>***profits***</font> - Net income for the fiscal year, in millions of dollars (USD).<br>
> <font color=blue>***ceo***</font> - The company's Chief Executive Officer.<br>
> <font color=blue>***industry***</font> - The industry in which the company operates.<br>
> <font color=blue>***sector***</font> - The sector in which the company operates.<br>
> <font color=blue>***previous_rank***</font> - The Global 500 rank for the company for the prior year.<br>
> <font color=blue>***country***</font> - The country in which the company is headquartered.<br>
> <font color=blue>***hq_location***</font> - The city and country (or city and state for the USA) where the company is headquartered.<br>
> <font color=blue>***employees***</font> - Total employees (full-time equivalent, if available) at fiscal year-end.<br>

**We have modified the original data set into a more accessible format.**

### Understanding Pandas and NumPy
1. Use Python's type() function to assign the type of f500 to f500_type.
2. Use the DataFrame.shape attribute to assign the shape of f500 to f500_shape.
3. After you have run your code, use the variable inspector to look at the variables f500, f500_type, and f500_shape.

In [2]:
import pandas as pd
f500 = pd.read_csv("f500.csv", index_col=0)
f500.index.name = None 

f500_type = type(f500)

f500_shape = f500.shape

f500

| | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Walmart | 1 | 485873 | 0.8 | 13643.0 | 198825 | -7.2 | C. Douglas McMillon | General Merchandisers | Retailing | 1 | USA | Bentonville, AR | http://www.walmart.com | 23 | 2300000 | 77798 |
| State Grid | 2 | 315199 | -4.4 | 9571.3 | 489838 | -6.2 | Kou Wei | Utilities | Energy | 2 | China | Beijing, China | http://www.sgcc.com.cn | 17 | 926067 | 209456 |
| Sinopec Group | 3 | 267518 | -9.1 | 1257.9 | 310726 | -65.0 | Wang Yupu | Petroleum Refining | Energy | 4 | China | Beijing, China | http://www.sinopec.com | 19 | 713288 | 106523 |
| China National Petroleum | 4 | 262573 | -12.3 | 1867.5 | 585619 | -73.7 | Zhang Jianhua | Petroleum Refining | Energy | 3 | China | Beijing, China | http://www.cnpc.com.cn | 17 | 1512048 | 301893 |
| Toyota Motor | 5 | 254694 | 7.7 | 16899.3 | 437575 | -12.3 | Akio Toyoda | Motor Vehicles and Parts | Motor Vehicles & Parts | 8 | Japan | Toyota, Japan | http://www.toyota-global.com | 23 | 364445 | 157210 |
| Volkswagen | 6 | 240264 | 1.5 | 5937.3 | 432116 | | Matthias Muller | Motor Vehicles and Parts | Motor Vehicles & Parts | 7 | Germany | Wolfsburg, Germany | http://www.volkswagen.com | 23 | 626715 | 97753 |
| Royal Dutch Shell | 7 | 240033 | -11.8 | 4575.0 | 411275 | 135.9 | Ben van Beurden | Petroleum Refining | Energy | 5 | Netherlands | The Hague, Netherlands | http://www.shell.com | 23 | 89000 | 186646 |
| Berkshire Hathaway | 8 | 223604 | 6.1 | 24074.0 | 620854 | | Warren E. Buffett | Insurance: Property and Casualty (Stock) | Financials | 11 | USA | Omaha, NE | http://www.berkshirehathaway.com | 21 | 367700 | 283001 |
| Apple | 9 | 215639 | -7.7 | 45687.0 | 321686 | -14.4 | Timothy D. Cook | Computers, Office Equipment | Technology | 9 | USA | Cupertino, CA | http://www.apple.com | 15 | 116000 | 128249 |
| Exxon Mobil | 10 | 205004 | -16.7 | 7840.0 | 330314 | -51.5 | Darren W. Woods | Petroleum Refining | Energy | 6 | USA | Irving, TX | http://www.exxonmobil.com | 23 | 72700 | 167325 |
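As a quick check of what type() and shape report, here is a minimal sketch on a hypothetical three-row frame built from values in the table above, so it runs without f500.csv. The name mini is our own; the point is that shape is a (rows, columns) tuple and that each column is backed by a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of f500: three companies, two numeric columns
mini = pd.DataFrame(
    {"rank": [1, 2, 3], "revenues": [485873, 315199, 267518]},
    index=["Walmart", "State Grid", "Sinopec Group"],
)

mini_type = type(mini)    # the DataFrame class
mini_shape = mini.shape   # (number of rows, number of columns)

# The values of a single column come back as a NumPy array
revenue_values = mini["revenues"].to_numpy()

print(mini_type)
print(mini_shape)
print(type(revenue_values))
```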


### Introducing DataFrames

1. Use the three methods we just learned about to learn more about the f500 dataframe (check the pandas documentation if you need to):
    - Use the head() method to select the first 6 rows and assign the result to f500_head.
    - Use the tail() method to select the last 8 rows and assign the result to f500_tail.
    - Use the info() method to display information about the dataframe.
2. After you have run your code, use the variable inspector and output to view information about the dataframe.

In [3]:
f500_head = f500.head(6)
f500_tail = f500.tail(8)
# info() prints its summary directly and returns None, so it does not need print()
f500.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
rank                        500 non-null int64
revenues                    500 non-null int64
revenue_change              498 non-null float64
profits                     499 non-null float64
assets                      500 non-null int64
profit_change               436 non-null float64
ceo                         500 non-null object
industry                    500 non-null object
sector                      500 non-null object
previous_rank               500 non-null int64
country                     500 non-null object
hq_location                 500 non-null object
website                     500 non-null object
years_on_global_500_list    500 non-null int64
employees                   500 non-null int64
total_stockholder_equity    500 non-null int64
dtypes: float64(3), int64(7), object(6)
memory usage: 66.4+ KB
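A minimal sketch of head(), tail() and info() on a hypothetical ten-row frame: the profits values come from the first table, with the last one blanked out as an assumed missing value so the non-null count is visible. DataFrame.info() writes its summary itself and returns None, which is why printing its return value adds an extra None line.

```python
import pandas as pd

# Hypothetical stand-in for f500: the ten profits values shown earlier,
# with the last one replaced by a missing value for illustration
df = pd.DataFrame({
    "rank": list(range(1, 11)),
    "profits": [13643.0, 9571.3, 1257.9, 1867.5, 16899.3,
                5937.3, 4575.0, 24074.0, 45687.0, None],
})

first_six = df.head(6)   # rows at positions 0 to 5
last_eight = df.tail(8)  # rows at positions 2 to 9

# info() prints column names, non-null counts and dtypes, then returns None
summary = df.info()
```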


### Selecting Columns From a DataFrame By Label
1. Select the industry column, and assign the result to the variable name industries.
2. Select the rank, previous_rank and years_on_global_500_list columns, in order, and assign the result to the variable name previous.
3. Select all columns from revenues up to and including profit_change, in order, and assign the result to the variable name financial_data.

In [4]:
industries = f500["industry"]
previous = f500[["rank", "previous_rank", "years_on_global_500_list"]]
financial_data = f500.loc[:,"revenues":"profit_change"]
financial_data

| | revenues | revenue_change | profits | assets | profit_change |
|---|---|---|---|---|---|
| Walmart | 485873 | 0.8 | 13643.0 | 198825 | -7.2 |
| State Grid | 315199 | -4.4 | 9571.3 | 489838 | -6.2 |
| Sinopec Group | 267518 | -9.1 | 1257.9 | 310726 | -65.0 |
| China National Petroleum | 262573 | -12.3 | 1867.5 | 585619 | -73.7 |
| Toyota Motor | 254694 | 7.7 | 16899.3 | 437575 | -12.3 |
| Volkswagen | 240264 | 1.5 | 5937.3 | 432116 | |
| Royal Dutch Shell | 240033 | -11.8 | 4575.0 | 411275 | 135.9 |
| Berkshire Hathaway | 223604 | 6.1 | 24074.0 | 620854 | |
| Apple | 215639 | -7.7 | 45687.0 | 321686 | -14.4 |
| Exxon Mobil | 205004 | -16.7 | 7840.0 | 330314 | -51.5 |
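The slice "revenues":"profit_change" includes profit_change itself: label-based slicing with loc is inclusive of both endpoints, unlike positional slicing with iloc. A small sketch on a hypothetical one-row frame (Walmart's values from above) makes the difference visible.

```python
import pandas as pd

cols = ["rank", "revenues", "revenue_change", "profits",
        "assets", "profit_change", "ceo"]
# Hypothetical one-row frame with the same column order as f500
row = pd.DataFrame(
    [[1, 485873, 0.8, 13643.0, 198825, -7.2, "C. Douglas McMillon"]],
    columns=cols,
    index=["Walmart"],
)

# Label slice: both "revenues" and "profit_change" are included
financial = row.loc[:, "revenues":"profit_change"]

# Positional slice: stop position 5 is excluded, so profit_change is dropped
positional = row.iloc[:, 1:5]

print(list(financial.columns))
print(list(positional.columns))
```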


### Column Selection Shortcuts

1. Select the country column, and assign the result to the variable name countries.
2. Select the revenues and years_on_global_500_list columns, in order, and assign the result to the variable name revenues_years.
3. Select all columns from ceo up to and including sector, in order, and assign the result to the variable name ceo_to_sector.

In [5]:
countries = f500["country"]
revenues_years = f500[["revenues", "years_on_global_500_list"]]
ceo_to_sector = f500.loc[:, "ceo":"sector"]

revenues_years

| | revenues | years_on_global_500_list |
|---|---|---|
| Walmart | 485873 | 23 |
| State Grid | 315199 | 17 |
| Sinopec Group | 267518 | 19 |
| China National Petroleum | 262573 | 17 |
| Toyota Motor | 254694 | 23 |
| Volkswagen | 240264 | 23 |
| Royal Dutch Shell | 240033 | 23 |
| Berkshire Hathaway | 223604 | 21 |
| Apple | 215639 | 15 |
| Exxon Mobil | 205004 | 23 |
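Whether a selection returns a Series or a DataFrame depends on whether you pass a single label or a list of labels. A sketch, assuming a hypothetical three-row frame with f500's column names:

```python
import pandas as pd

# Hypothetical three-row frame with columns named as in f500
df = pd.DataFrame(
    {
        "country": ["USA", "China", "China"],
        "revenues": [485873, 315199, 267518],
        "years_on_global_500_list": [23, 17, 19],
    },
    index=["Walmart", "State Grid", "Sinopec Group"],
)

countries = df["country"]                                      # one label -> Series
revenues_years = df[["revenues", "years_on_global_500_list"]]  # list of labels -> DataFrame
country_frame = df[["country"]]                                # a one-item list is still a DataFrame
```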


### Selecting Items from a Series By Label

1. From the pandas series ceos:
    - Select the item at index label Walmart and assign the result to the variable name walmart.
    - Select the items from index label Apple up to and including index label Samsung Electronics and assign the result to the variable name apple_to_samsung.
    - Select the items with index labels Exxon Mobil, BP, and Chevron, in order, and assign the result to the variable name oil_companies.

In [6]:
ceos = f500["ceo"]
walmart = ceos["Walmart"]
apple_to_samsung = ceos["Apple":"Samsung Electronics"]
oil_companies = ceos[["Exxon Mobil", "BP", "Chevron"]]

ceos

Walmart                                                       C. Douglas McMillon
State Grid                                                                Kou Wei
Sinopec Group                                                           Wang Yupu
China National Petroleum                                            Zhang Jianhua
Toyota Motor                                                          Akio Toyoda
Volkswagen                                                        Matthias Muller
Royal Dutch Shell                                                 Ben van Beurden
Berkshire Hathaway                                              Warren E. Buffett
Apple                                                             Timothy D. Cook
Exxon Mobil                                                       Darren W. Woods
McKesson                                                       John H. Hammergren
BP                                                               Robert W. Dudley
...
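The same three selection styles (single label, inclusive label slice, list of labels) can be tried on a short hypothetical Series built from CEO names listed above; the McKesson endpoint is our own choice to keep the example small.

```python
import pandas as pd

# Hypothetical slice of the ceos series, in the same order as f500
ceos = pd.Series(
    ["C. Douglas McMillon", "Timothy D. Cook", "Darren W. Woods",
     "John H. Hammergren", "Robert W. Dudley"],
    index=["Walmart", "Apple", "Exxon Mobil", "McKesson", "BP"],
)

walmart = ceos["Walmart"]                     # single label -> scalar value
apple_to_mckesson = ceos["Apple":"McKesson"]  # label slice, endpoint included
picked = ceos[["BP", "Apple"]]                # list of labels, in the order given
```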

### Selecting Rows From a DataFrame by Label

1. By selecting data from f500:
    - Create a new variable, drink_companies, with:
        - Rows with indices Anheuser-Busch InBev, Coca-Cola, and Heineken Holding, in that order.
        - All columns.
    - Create a new variable big_movers, with:
        - Rows with indices Aviva, HP, JD.com, and BHP Billiton, in that order.
        - The rank and previous_rank columns, in that order.
    - Create a new variable, middle_companies with:
        - All rows with indices from Tata Motors to Nationwide, inclusive.
        - All columns from rank to country, inclusive.

In [7]:
drink_companies = f500.loc[["Anheuser-Busch InBev", "Coca-Cola", "Heineken Holding"], :]
big_movers = f500.loc[["Aviva", "HP", "JD.com", "BHP Billiton"], ["rank", "previous_rank"]]
middle_companies = f500.loc["Tata Motors":"Nationwide", "rank":"country"]

big_movers

| | rank | previous_rank |
|---|---|---|
| Aviva | 90 | 279 |
| HP | 194 | 48 |
| JD.com | 261 | 366 |
| BHP Billiton | 350 | 168 |
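With loc, the first argument picks rows and the second picks columns, and each can be a single label, a list, or an inclusive slice. A sketch using the four big_movers rows shown above as a hypothetical standalone frame:

```python
import pandas as pd

# Hypothetical frame holding only the big_movers rows from above
df = pd.DataFrame(
    {"rank": [90, 194, 261, 350], "previous_rank": [279, 48, 366, 168]},
    index=["Aviva", "HP", "JD.com", "BHP Billiton"],
)

# Lists of labels: rows and columns come back in the order requested
movers = df.loc[["HP", "Aviva"], ["previous_rank", "rank"]]

# Label slice for rows (inclusive), ":" for all columns
hp_to_jd = df.loc["HP":"JD.com", :]
```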


### Series and DataFrame Describe Methods
1. Use the appropriate describe() method to:
    - Return a series of descriptive statistics for the profits column, and assign the result to profits_desc.
    - Return a dataframe of descriptive statistics for the revenues and employees columns, in order, and assign the result to revenue_and_employees_desc.
    - Return a dataframe of descriptive statistics for every column in the f500 dataframe, by checking the documentation for the correct value for the include parameter, and assign the result to all_desc.

In [8]:
profits_desc = f500["profits"].describe()
revenue_and_employees_desc = f500[["revenues", "employees"]].describe()
all_desc = f500.describe(include = 'all')

all_desc

| | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 500.0 | 500.0 | 498.0 | 499.0 | 500.0 | 436.0 | 500 | 500 | 500 | 500.0 | 500 | 500 | 500 | 500.0 | 500.0 | 500.0 |
| unique | | | | | | | 500 | 58 | 21 | | 34 | 235 | 500 | | | |
| top | | | | | | | Alistair Phillips-Davies | Banks: Commercial and Savings | Financials | | USA | Beijing, China | http://www.onex.com | | | |
| freq | | | | | | | 1 | 51 | 118 | | 132 | 56 | 1 | | | |
| mean | 250.5 | 55416.358 | 4.538353 | 3055.203206 | 243632.3 | 24.152752 | | | | 222.134 | | | | 15.036 | 133998.3 | 30628.076 |
| std | 144.481833 | 45725.478963 | 28.549067 | 5171.981071 | 485193.7 | 437.509566 | | | | 146.941961 | | | | 7.932752 | 170087.8 | 43642.576833 |
| min | 1.0 | 21609.0 | -67.3 | -13038.0 | 3717.0 | -793.7 | | | | 0.0 | | | | 1.0 | 328.0 | -59909.0 |
| 25% | 125.75 | 29003.0 | -5.9 | 556.95 | 36588.5 | -22.775 | | | | 92.75 | | | | 7.0 | 42932.5 | 7553.75 |
| 50% | 250.5 | 40236.0 | 0.55 | 1761.6 | 73261.5 | -0.35 | | | | 219.5 | | | | 17.0 | 92910.5 | 15809.5 |
| 75% | 375.25 | 63926.75 | 6.975 | 3954.0 | 180564.0 | 17.7 | | | | 347.25 | | | | 23.0 | 168917.2 | 37828.5 |
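By default describe() summarizes only numeric columns; include="all" adds count, unique, top and freq for object columns and leaves the numeric statistics blank for them, as in the table above. A minimal sketch with a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical frame: one numeric column, one text column
df = pd.DataFrame({
    "profits": [13643.0, 9571.3, 1257.9, 1867.5],
    "sector": ["Retailing", "Energy", "Energy", "Energy"],
})

numeric_desc = df.describe()           # numeric columns only
all_desc = df.describe(include="all")  # every column, text stats included

print(all_desc)
```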


### More Data Exploration Methods

1. Use Series.value_counts() and Series.head() to return the 5 most common values for the country column, and assign the results to top5_countries.
2. Use Series.value_counts() and Series.head() to return the 5 most common values for the previous_rank column, and assign the results to top5_previous_rank.
3. Use the appropriate max() method to find the maximum value for only the numeric columns from f500 (you may need to check the documentation), and assign the result to the variable max_f500.
4. After you have run your code, use the variable inspector to view each of the new variables you created.

In [9]:
top5_countries = f500["country"].value_counts().head()
top5_previous_rank = f500["previous_rank"].value_counts().head()
max_f500 = f500.max(numeric_only = True)

top5_countries

USA        132
China      109
Japan       51
Germany     29
France      29
Name: country, dtype: int64
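value_counts() sorts the counts in descending order, so head() picks the most common values, and numeric_only=True makes max() skip text columns. A sketch on a hypothetical six-company frame, with revenues and countries taken from the tables above:

```python
import pandas as pd

# Hypothetical frame: Walmart, State Grid, Sinopec, Toyota, Apple, Exxon Mobil
df = pd.DataFrame({
    "country": ["USA", "China", "China", "Japan", "USA", "USA"],
    "revenues": [485873, 315199, 267518, 254694, 215639, 205004],
})

top2_countries = df["country"].value_counts().head(2)  # most common first
max_numeric = df.max(numeric_only=True)                # ignores the country column
```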

### Assignment with Pandas
1. Add a new column, revenues_b to the f500 dataframe by using vectorized division to divide the values in the existing revenues column by 1000 (converting them from millions to billions).
2. The company Dow Chemical has named a new CEO. Update the value in the ceo column for the row with index label Dow Chemical to Jim Fitterling.

In [10]:
f500["revenues_b"] = f500["revenues"] / 1000
f500.loc["Dow Chemical", "ceo"] = "Jim Fitterling"

f500["revenues_b"]

Walmart                                         485.873
State Grid                                      315.199
Sinopec Group                                   267.518
China National Petroleum                        262.573
Toyota Motor                                    254.694
Volkswagen                                      240.264
Royal Dutch Shell                               240.033
Berkshire Hathaway                              223.604
Apple                                           215.639
Exxon Mobil                                     205.004
McKesson                                        198.533
BP                                              186.606
UnitedHealth Group                              184.840
CVS Health                                      177.526
Samsung Electronics                             173.957
Glencore                                        173.883
Daimler                                         169.483
...
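Arithmetic between a Series and a scalar is applied element-wise, and loc with a row label and a column label targets a single cell for assignment. A sketch on a hypothetical two-row frame; the replacement CEO name is a placeholder, not real data.

```python
import pandas as pd

# Hypothetical two-row frame with f500-style columns
df = pd.DataFrame(
    {"revenues": [485873, 315199], "ceo": ["C. Douglas McMillon", "Kou Wei"]},
    index=["Walmart", "State Grid"],
)

# Vectorized division: every value in the column is divided by 1000
df["revenues_b"] = df["revenues"] / 1000

# Single-cell assignment by row label and column label (placeholder name)
df.loc["State Grid", "ceo"] = "New CEO (hypothetical)"
```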

### Using Boolean Indexing with Pandas Objects
1. Create a boolean series, kr_bool, that compares whether the values in the country column from the f500 dataframe are equal to "South Korea"
2. Use that boolean series to index the full f500 dataframe, assigning just the first five rows to top_5_kr.

In [11]:
kr_bool = f500["country"] == "South Korea"
top_5_kr = f500[kr_bool].head()

top_5_kr

| | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity | revenues_b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Samsung Electronics | 15 | 173957 | -2.0 | 19316.5 | 217104 | 16.8 | Oh-Hyun Kwon | Electronics, Electrical Equip. | Technology | 13 | South Korea | Suwon, South Korea | http://www.samsung.com | 23 | 325000 | 154376 | 173.957 |
| Hyundai Motor | 78 | 80701 | -0.8 | 4659.0 | 148092 | -17.9 | Mong-Koo Chung | Motor Vehicles and Parts | Motor Vehicles & Parts | 84 | South Korea | Seoul, South Korea | http://worldwide.hyundai.com | 22 | 129315 | 55639 | 80.701 |
| SK Holdings | 95 | 72579 | 107.4 | 659.7 | 85332 | -86.0 | Tae Won Chey | Petroleum Refining | Energy | 294 | South Korea | Seoul, South Korea | http://www.sk.co.kr | 2 | 84000 | 10858 | 72.579 |
| Korea Electric Power | 177 | 51500 | -0.6 | 6074.1 | 147265 | -48.3 | Hwan-Eik Cho | Utilities | Energy | 172 | South Korea | Jeollanam-do, South Korea | http://www.kepco.co.kr | 23 | 43688 | 59394 | 51.5 |
| LG Electronics | 201 | 47712 | -4.6 | 66.2 | 31348 | -39.8 | Seong-Jin Jo | Electronics, Electrical Equip. | Technology | 180 | South Korea | Seoul, South Korea | http://www.lg.com | 17 | 75000 | 9926 | 47.712 |
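Comparing a column to a value produces a boolean Series with the same index, and passing that Series inside [] keeps only the rows marked True. A sketch, assuming a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical frame with two South Korean companies out of four
df = pd.DataFrame(
    {"country": ["South Korea", "USA", "South Korea", "Japan"],
     "rank": [15, 1, 78, 5]},
    index=["Samsung Electronics", "Walmart", "Hyundai Motor", "Toyota Motor"],
)

kr_bool = df["country"] == "South Korea"  # True/False for each row label
kr_companies = df[kr_bool]                # keeps only the True rows
```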


### Using Boolean Arrays to Assign Values
1. Use boolean indexing to update values in the previous_rank column of the f500 dataframe:
    - Wherever there was previously a value of 0, there should now be a value of np.nan.
    - It is up to you whether you assign the boolean series to its own variable first, or whether you complete the operation in one line.
2. Create a new pandas series, prev_rank_after, using the same syntax that was used to create the prev_rank_before series.
3. After you have run your code, use the variable inspector to compare prev_rank_before and prev_rank_after.

In [12]:
import numpy as np
prev_rank_before = f500["previous_rank"].value_counts(dropna=False).head()

f500.loc[f500['previous_rank'] == 0, "previous_rank"] = np.nan

prev_rank_after = f500["previous_rank"].value_counts(dropna=False).head()

prev_rank_after

NaN       33
 471.0     1
 234.0     1
 125.0     1
 166.0     1
Name: previous_rank, dtype: int64
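The mask goes in the row position of loc and the column label in the column position. Because NaN is a float, the integer column becomes float64, which is why the ranks above show as 471.0 and so on. A sketch on a hypothetical three-row frame where Anbang Insurance Group stands in for a company with no previous rank; the explicit astype(float) is our own precaution, since recent pandas versions may warn when an assignment silently changes a column's dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 0 marks "not ranked last year"
df = pd.DataFrame(
    {"previous_rank": [1, 4, 0]},
    index=["Walmart", "Sinopec Group", "Anbang Insurance Group"],
)

# Cast first so the NaN assignment does not have to change the dtype
df["previous_rank"] = df["previous_rank"].astype(float)

# Boolean mask selects the rows, the label selects the column
df.loc[df["previous_rank"] == 0, "previous_rank"] = np.nan

# dropna=False keeps NaN in the counts
counts = df["previous_rank"].value_counts(dropna=False)
```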

### Challenge: Top Performers By Country
1. Create a series, cities_usa, containing counts of the five most common headquarters locations (hq_location) for companies headquartered in the USA.
2. Create a series, sector_china, containing counts of the three most common sectors for companies headquartered in China.
3. Create a float object, mean_employees_japan, containing the mean number of employees for companies headquartered in Japan.

In [13]:
top_3_countries = f500["country"].value_counts().head(3)

# Filter rows with a boolean mask, pick one column, then count its values;
# head() keeps only the five most common headquarters locations
cities_usa = f500.loc[f500["country"] == "USA", "hq_location"].value_counts().head()
sector_china = f500.loc[f500["country"] == "China", "sector"].value_counts().head(3)

mean_employees_japan = f500.loc[f500["country"] == "Japan", "employees"].mean()

# Extra exploration: Chinese companies in the Financials sector
sector_financial = f500[f500["sector"] == "Financials"]
companies_china = sector_financial[sector_financial["country"] == "China"]

companies_china

| | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity | revenues_b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Industrial & Commercial Bank of China | 22 | 147675 | -11.7 | 41883.9 | 3473238 | -5.0 | Gu Shu | Banks: Commercial and Savings | Financials | 15.0 | China | Beijing, China | http://www.icbc-ltd.com | 19 | 461749 | 283438 | 147.675 |
| China Construction Bank | 28 | 135093 | -8.7 | 34840.9 | 3016578 | -4.0 | Wang Hongzhang | Banks: Commercial and Savings | Financials | 22.0 | China | Beijing, China | http://www.ccb.com | 18 | 362482 | 226851 | 135.093 |
| Agricultural Bank of China | 38 | 117275 | -12.1 | 27687.8 | 2816039 | -3.6 | Zhao Huan | Banks: Commercial and Savings | Financials | 29.0 | China | Beijing, China | http://www.abchina.com | 18 | 501368 | 189682 | 117.275 |
| Ping An Insurance | 39 | 116581 | 5.7 | 9392.0 | 802490 | 8.9 | Ma Mingzhe | Insurance: Life, Health (stock) | Financials | 41.0 | China | Shenzhen, China | http://www.pingan.com | 8 | 318588 | 55177 | 116.581 |
| Bank of China | 42 | 113708 | -7.1 | 24773.4 | 2611539 | -8.9 | Chen Siqing | Banks: Commercial and Savings | Financials | 35.0 | China | Beijing, China | http://www.boc.cn | 23 | 308900 | 203134 | 113.708 |
| China Life Insurance | 51 | 104818 | 3.5 | 162.4 | 483026 | -96.1 | Yang Mingsheng | Insurance: Life, Health (stock) | Financials | 54.0 | China | Beijing, China | http://www.chinalife.com.cn | 15 | 143676 | 14079 | 104.818 |
| People’s Insurance Co. of China | 114 | 66732 | 3.3 | 2144.3 | 134132 | -31.0 | Wu Yan | Insurance: Property and Casualty (Stock) | Financials | 119.0 | China | Beijing, China | http://www.picc.com.cn | 8 | 188570 | 18145 | 66.732 |
| Anbang Insurance Group | 139 | 60800 | 124.0 | 3883.9 | 430040 | 0.9 | Wu Xiaohui | Insurance: Life, Health (Mutual) | Financials | | China | Beijing, China | http://www.anbanggroup.com | 1 | 40707 | 20372 | 60.8 |
| Bank of Communications | 171 | 52990 | -7.1 | 10116.9 | 1209176 | -4.4 | Niu Ximing | Banks: Commercial and Savings | Financials | 153.0 | China | Shanghai, China | http://www.bankcomm.com | 9 | 95160 | 90531 | 52.99 |
| CITIC Group | 172 | 52852 | -5.5 | 3236.3 | 938261 | -14.0 | Chang Zhenming | Diversified Financials | Financials | 156.0 | China | Beijing, China | http://www.citicgroup.com.cn | 9 | 201263 | 41784 | 52.852 |
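The two-step filter above (sector first, then country) gives the same rows as a single mask that combines both conditions with &. A sketch on a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical frame mixing sectors and countries
df = pd.DataFrame(
    {"sector": ["Financials", "Energy", "Financials", "Financials"],
     "country": ["China", "China", "USA", "China"]},
    index=["Bank of China", "Sinopec Group", "Berkshire Hathaway", "Ping An Insurance"],
)

# Two steps: filter by sector, then filter that result by country
step1 = df[df["sector"] == "Financials"]
two_step = step1[step1["country"] == "China"]

# One step: both conditions in one mask (each comparison in parentheses)
one_step = df[(df["sector"] == "Financials") & (df["country"] == "China")]
```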


# *Exploring Data with Pandas*

### Using iloc to Select by Integer Position

**We have provided code that reads the f500.csv file into a dataframe, assigns it to f500, and inserts NaN values into the previous_rank column, as we did in the previous mission.**

1. Select just the fifth row of the f500 dataframe, assigning the result to fifth_row.
2. Select the first three rows of the f500 dataframe, assigning the result to first_three_rows.
3. Select the first and seventh rows and the first 5 columns of the f500 dataframe, assigning the result to first_seventh_row_slice.
4. After you have run your code, use the variable inspector to examine each of the objects you created.

In [14]:
fifth_row = f500.iloc[4]
first_three_rows = f500.iloc[:3]
first_seventh_row_slice = f500.iloc[[0, 6], :5]

fifth_row

rank                                                   5
revenues                                          254694
revenue_change                                       7.7
profits                                          16899.3
assets                                            437575
profit_change                                      -12.3
ceo                                          Akio Toyoda
industry                        Motor Vehicles and Parts
sector                            Motor Vehicles & Parts
previous_rank                                          8
country                                            Japan
hq_location                                Toyota, Japan
website                     http://www.toyota-global.com
years_on_global_500_list                              23
employees                                         364445
total_stockholder_equity                          157210
revenues_b                                       254.694
Name: Toyota Motor, dtype: object
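iloc counts positions from 0 and, like ordinary Python slices, excludes the stop position; a list of positions selects specific rows. A sketch, assuming a hypothetical seven-row frame indexed by the first seven company names:

```python
import pandas as pd

# Hypothetical frame: the top seven companies by rank
df = pd.DataFrame(
    {"rank": [1, 2, 3, 4, 5, 6, 7]},
    index=["Walmart", "State Grid", "Sinopec Group", "China National Petroleum",
           "Toyota Motor", "Volkswagen", "Royal Dutch Shell"],
)

fifth_row = df.iloc[4]               # position 4 is the fifth row -> Series
first_three = df.iloc[:3]            # positions 0, 1, 2 (stop excluded)
first_and_seventh = df.iloc[[0, 6]]  # specific positions
```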

### Reading CSV files with pandas

**The pandas library was already imported earlier in this notebook.**

1. Use the pandas.read_csv() function to read the f500.csv CSV file as a pandas dataframe, and assign it to the variable name f500.
    - Do not use the index_col parameter, so that the dataframe has integer index labels.
2. Use the following code to insert the NaN values into the previous_rank column: f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan

In [15]:
f500 = pd.read_csv("f500.csv")
f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan

f500.head()

| | company | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Walmart | 1 | 485873 | 0.8 | 13643.0 | 198825 | -7.2 | C. Douglas McMillon | General Merchandisers | Retailing | 1.0 | USA | Bentonville, AR | http://www.walmart.com | 23 | 2300000 | 77798 |
| 1 | State Grid | 2 | 315199 | -4.4 | 9571.3 | 489838 | -6.2 | Kou Wei | Utilities | Energy | 2.0 | China | Beijing, China | http://www.sgcc.com.cn | 17 | 926067 | 209456 |
| 2 | Sinopec Group | 3 | 267518 | -9.1 | 1257.9 | 310726 | -65.0 | Wang Yupu | Petroleum Refining | Energy | 4.0 | China | Beijing, China | http://www.sinopec.com | 19 | 713288 | 106523 |
| 3 | China National Petroleum | 4 | 262573 | -12.3 | 1867.5 | 585619 | -73.7 | Zhang Jianhua | Petroleum Refining | Energy | 3.0 | China | Beijing, China | http://www.cnpc.com.cn | 17 | 1512048 | 301893 |
| 4 | Toyota Motor | 5 | 254694 | 7.7 | 16899.3 | 437575 | -12.3 | Akio Toyoda | Motor Vehicles and Parts | Motor Vehicles & Parts | 8.0 | Japan | Toyota, Japan | http://www.toyota-global.com | 23 | 364445 | 157210 |
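To see what index_col changes, here is a sketch that reads a tiny CSV held in memory with io.StringIO instead of f500.csv. The three rows come from the data above; Anbang Insurance Group's missing previous rank is assumed to be stored as 0, the placeholder the exercise replaces with NaN.

```python
import io

import pandas as pd

csv_text = """company,rank,previous_rank
Walmart,1,1
State Grid,2,2
Anbang Insurance Group,139,0
"""

# Without index_col: the default integer index 0, 1, 2, ...
by_position = pd.read_csv(io.StringIO(csv_text))

# With index_col=0: the first column's values become the row labels
by_company = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

Note that with index_col=0 the index keeps the column name (company), which is why the first cell of this notebook sets f500.index.name = None.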


### Working with Integer Labels

1. Assign the first five rows of the sorted_emp dataframe to the variable top5_emp, by choosing the correct method out of either loc[] or iloc[].

In [16]:
sorted_emp = f500.sort_values("employees", ascending=False)
# After sorting, the integer labels no longer match positions, so select by position with iloc
top5_emp = sorted_emp.iloc[:5]

top5_emp

| | company | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Walmart | 1 | 485873 | 0.8 | 13643.0 | 198825 | -7.2 | C. Douglas McMillon | General Merchandisers | Retailing | 1.0 | USA | Bentonville, AR | http://www.walmart.com | 23 | 2300000 | 77798 |
| 3 | China National Petroleum | 4 | 262573 | -12.3 | 1867.5 | 585619 | -73.7 | Zhang Jianhua | Petroleum Refining | Energy | 3.0 | China | Beijing, China | http://www.cnpc.com.cn | 17 | 1512048 | 301893 |
| 118 | China Post Group | 119 | 65605 | -5.8 | 4980.3 | 1221649 | 18.7 | Li Guohua | Mail, Package, and Freight Delivery | Transportation | 105.0 | China | Beijing, China | http://www.chinapost.com.cn | 7 | 941211 | 43114 |
| 1 | State Grid | 2 | 315199 | -4.4 | 9571.3 | 489838 | -6.2 | Kou Wei | Utilities | Energy | 2.0 | China | Beijing, China | http://www.sgcc.com.cn | 17 | 926067 | 209456 |
| 26 | Hon Hai Precision Industry | 27 | 135129 | -4.3 | 4608.8 | 80436 | -0.4 | Terry Gou | Electronics, Electrical Equip. | Technology | 25.0 | Taiwan | New Taipei City, Taiwan | http://www.foxconn.com | 13 | 726772 | 33476 |
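After sort_values() the integer labels no longer match positions, so loc and iloc disagree. A sketch with the first four companies as a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..3
df = pd.DataFrame({
    "company": ["Walmart", "State Grid", "Sinopec Group", "China National Petroleum"],
    "employees": [2300000, 926067, 713288, 1512048],
})

sorted_emp = df.sort_values("employees", ascending=False)
# The labels are now in the order 0, 3, 1, 2

second_by_position = sorted_emp.iloc[1]["company"]  # second row after sorting
label_1 = sorted_emp.loc[1]["company"]              # row whose label is 1
top2 = sorted_emp.iloc[:2]                          # first two rows by position
```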


### Using pandas methods to create boolean masks
1. Use the Series.notnull() method to select all rows from f500 that have a non-null value for the previous_rank column, and assign the result to previously_ranked.
2. From the previously_ranked dataframe, subtract the previous_rank column from the rank column, and assign the result to rank_change.

In [17]:
previously_ranked = f500[f500["previous_rank"].notnull()]
rank_change = previously_ranked["rank"] - previously_ranked["previous_rank"]
print(rank_change)

0        0.0
1        0.0
2       -1.0
3        1.0
4       -3.0
5       -1.0
6        2.0
7       -3.0
8        0.0
9        4.0
10      -1.0
11       2.0
12      -4.0
13      -4.0
14       2.0
15       2.0
16       1.0
17      -2.0
18      -4.0
19       1.0
20       0.0
21       7.0
22      -5.0
23      -3.0
24      -8.0
25     -18.0
26       2.0
27       6.0
28      -7.0
29       6.0
       ...  
455      1.0
457     58.0
459     38.0
460    -31.0
462     44.0
464     19.0
465     15.0
467      9.0
468     77.0
469     35.0
470    -16.0
471     27.0
473     -9.0
474     14.0
475     92.0
476    -17.0
478     50.0
479      8.0
480     -9.0
481     -6.0
483     18.0
485     55.0
486     13.0
488     64.0
489    107.0
490     20.0
492     89.0
496     70.0
497     61.0
498     32.0
Length: 467, dtype: float64
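notnull() (an alias of notna()) returns a boolean mask, and subtracting two columns of the filtered frame works row by row on matching labels. A sketch on a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: Anbang has no previous rank
df = pd.DataFrame(
    {"rank": [1, 3, 139], "previous_rank": [1.0, 4.0, np.nan]},
    index=["Walmart", "Sinopec Group", "Anbang Insurance Group"],
)

previously_ranked = df[df["previous_rank"].notnull()]  # drops the NaN row
rank_change = previously_ranked["rank"] - previously_ranked["previous_rank"]
```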


### Using Boolean Operators

1. Select from the f500 dataframe:
    - Companies with revenues over 100 billion dollars (the revenues column is in millions, so over 100000) and negative profits, assigning the result to big_rev_neg_profit.
    - The first 5 companies in the Technology sector that are not headquartered in the USA, assigning the result to tech_outside_usa.

In [20]:
big_rev_neg_profit = f500[(f500["revenues"] > 100000) & (f500["profits"] < 0)]
tech_outside_usa = f500[(f500["sector"] == "Technology") & (~(f500["country"] == "USA"))].head()

print(big_rev_neg_profit)
tech_outside_usa

                company  rank  revenues  revenue_change  profits   assets  \
32  Japan Post Holdings    33    122990             3.6   -267.4  2631385   
44              Chevron    45    107567           -18.0   -497.0   260078   

    profit_change               ceo                         industry  \
32         -107.5  Masatsugu Nagato  Insurance: Life, Health (stock)   
44         -110.8    John S. Watson               Petroleum Refining   

        sector  previous_rank country    hq_location                  website  \
32  Financials           37.0   Japan   Tokyo, Japan  http://www.japanpost.jp   
44      Energy           31.0     USA  San Ramon, CA   http://www.chevron.com   

    years_on_global_500_list  employees  total_stockholder_equity  
32                        21     248384                     91532  
44                        23      55200                    145556  


| | company | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | Samsung Electronics | 15 | 173957 | -2.0 | 19316.5 | 217104 | 16.8 | Oh-Hyun Kwon | Electronics, Electrical Equip. | Technology | 13.0 | South Korea | Suwon, South Korea | http://www.samsung.com | 23 | 325000 | 154376 |
| 26 | Hon Hai Precision Industry | 27 | 135129 | -4.3 | 4608.8 | 80436 | -0.4 | Terry Gou | Electronics, Electrical Equip. | Technology | 25.0 | Taiwan | New Taipei City, Taiwan | http://www.foxconn.com | 13 | 726772 | 33476 |
| 70 | Hitachi | 71 | 84558 | 1.2 | 2134.3 | 86742 | 48.8 | Toshiaki Higashihara | Electronics, Electrical Equip. | Technology | 79.0 | Japan | Tokyo, Japan | http://www.hitachi.com | 23 | 303887 | 26632 |
| 82 | Huawei Investment & Holding | 83 | 78511 | 24.9 | 5579.4 | 63837 | -5.0 | Ren Zhengfei | Network and Other Communications Equipment | Technology | 129.0 | China | Shenzhen, China | http://www.huawei.com | 8 | 180000 | 20159 |
| 104 | Sony | 105 | 70170 | 3.9 | 676.4 | 158519 | -45.1 | Kazuo Hirai | Electronics, Electrical Equip. | Technology | 113.0 | Japan | Tokyo, Japan | http://www.sony.net | 23 | 128400 | 22415 |
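Besides & (and), pandas masks use | (or) and ~ (not). Each comparison needs its own parentheses, because these operators bind more tightly than comparisons such as > and ==. A sketch with four rows from the tables above as a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame: two loss-making and two profitable companies
df = pd.DataFrame(
    {"revenues": [122990, 107567, 254694, 485873],
     "profits": [-267.4, -497.0, 16899.3, 13643.0],
     "country": ["Japan", "USA", "Japan", "USA"]},
    index=["Japan Post Holdings", "Chevron", "Toyota Motor", "Walmart"],
)

big_loss = df[(df["revenues"] > 100000) & (df["profits"] < 0)]     # both true
loss_or_non_us = df[(df["profits"] < 0) | (df["country"] != "USA")]  # either true
not_usa = df[~(df["country"] == "USA")]                              # negated mask
```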


### Pandas Index Alignment

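**Earlier, we created the rank_change series by performing vectorized subtraction only on rows without null values. We have included the code again as a reminder; note that this version subtracts rank from previous_rank, so a positive value means the company moved up the list.**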

1. Assign the rank_change series to a new column, "rank_change", in the f500 dataframe.
2. Once you have run your code, use the variable inspector to look at the f500 dataframe and observe how the new column aligns with the existing data.

In [22]:
previously_ranked = f500[f500["previous_rank"].notnull()]
rank_change = previously_ranked["previous_rank"] - previously_ranked["rank"]
f500["rank_change"] = rank_change

f500['rank_change'].head()

0    0.0
1    0.0
2    1.0
3   -1.0
4    3.0
Name: rank_change, dtype: float64
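Assigning a Series to a new column matches rows by index label, not by position, so rows missing from rank_change get NaN. A sketch with a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the default index 0, 1, 2; row 2 has no previous rank
df = pd.DataFrame({"rank": [1, 3, 139], "previous_rank": [1.0, 4.0, np.nan]})

previously_ranked = df[df["previous_rank"].notnull()]  # labels 0 and 1 only
rank_change = previously_ranked["previous_rank"] - previously_ranked["rank"]

# Alignment by label: label 2 is absent from rank_change, so it becomes NaN
df["rank_change"] = rank_change
```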

### Using Loops with Pandas

**We're going to produce the following dictionary of the top employer in each country:**

**{'Australia': 'Wesfarmers',<br>
 'Belgium': 'Anheuser-Busch InBev',<br>
 'Brazil': 'JBS',<br>
 ...<br>
 'U.A.E': 'Emirates Group',<br>
 'USA': 'Walmart',<br>
 'Venezuela': 'Mercantil Servicios Financieros'}**<br>
 
1. Read the documentation for the DataFrame.sort_values() method to familiarize yourself with the syntax. You will need to use only the by and ascending parameters to complete this exercise.
2. Create an empty dictionary, top_employer_by_country to store the results of the exercise.
3. Use the Series.unique() method to create an array of unique values from the country column.
4. Use a for loop to iterate over the array of unique countries, and in each iteration:
    - Select only the rows that have a country name equal to the current iteration.
    - Use DataFrame.sort_values() to sort those rows by the employees column in descending order.
    - Select the first row from the sorted dataframe.
    - Extract the company name from the company column of that first row.
    - Assign the results to the top_employer_by_country dictionary, using the country name as the key, and the company name as the value.
5. When you have run your code, use the variable inspector to view the top employer for each country.

In [23]:
top_employer_by_country = {}
unique_countries = f500["country"].unique()

for country in unique_countries:
    # Rows for this country, largest employer first
    sorted_companies = f500[f500["country"] == country].sort_values(by="employees", ascending=False)
    # Company name in the first row of the sorted frame
    top_employer = sorted_companies["company"].iloc[0]
    top_employer_by_country[country] = top_employer
    
top_employer_by_country

{'USA': 'Walmart',
 'China': 'China National Petroleum',
 'Japan': 'Toyota Motor',
 'Germany': 'Volkswagen',
 'Netherlands': 'EXOR Group',
 'Britain': 'Compass Group',
 'South Korea': 'Samsung Electronics',
 'Switzerland': 'Nestle',
 'France': 'Sodexo',
 'Taiwan': 'Hon Hai Precision Industry',
 'Singapore': 'Flex',
 'Italy': 'Poste Italiane',
 'Russia': 'Gazprom',
 'Spain': 'Banco Santander',
 'Brazil': 'JBS',
 'Mexico': 'America Movil',
 'Luxembourg': 'ArcelorMittal',
 'India': 'State Bank of India',
 'Malaysia': 'Petronas',
 'Thailand': 'PTT',
 'Australia': 'Wesfarmers',
 'Belgium': 'Anheuser-Busch InBev',
 'Norway': 'Statoil',
 'Canada': 'George Weston',
 'Ireland': 'Accenture',
 'Indonesia': 'Pertamina',
 'Denmark': 'Maersk Group',
 'Saudi Arabia': 'SABIC',
 'Sweden': 'H & M Hennes & Mauritz',
 'Finland': 'Nokia',
 'Venezuela': 'Mercantil Servicios Financieros',
 'Turkey': 'Koc Holding',
 'U.A.E': 'Emirates Group',
 'Israel': 'Teva Pharmaceutical Industries'}
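The loop works, but the same kind of dictionary can also be built without an explicit loop by grouping on country and using idxmax() to find the row label of the largest employer. This is a swapped-in technique rather than the exercise's method, shown on a hypothetical five-company frame:

```python
import pandas as pd

# Hypothetical subset of f500 with the default integer index
f500_mini = pd.DataFrame({
    "company": ["Walmart", "Apple", "China National Petroleum", "State Grid", "Toyota Motor"],
    "country": ["USA", "USA", "China", "China", "Japan"],
    "employees": [2300000, 116000, 1512048, 926067, 364445],
})

# For each country, the row label of the maximum employees value
top_idx = f500_mini.groupby("country")["employees"].idxmax()

# Look up the company names at those labels and pair them with the countries
top_employer_by_country = dict(zip(top_idx.index, f500_mini.loc[top_idx, "company"]))
```

If two companies tie on employees, idxmax() returns the first one it meets, so the loop and this version could pick different companies in that case.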

### Challenge: Calculating Return on Assets by Sector
1. Create a new column roa in the f500 dataframe, containing the return on assets (ROA) metric for each company, calculated as profits divided by assets.
2. Aggregate the data by the sector column, and create a dictionary top_roa_by_sector, with:
    - Dictionary keys with the sector name.
    - Dictionary values with the company name with the highest ROA value from that sector.

In [25]:
# Return on assets: profits as a fraction of total assets
f500["roa"] = f500["profits"] / f500["assets"]
top_roa_by_sector = {}

unique_sectors = f500["sector"].unique()

for sector in unique_sectors:
    # Companies in this sector, highest ROA first (NaN values sort last)
    sector_df = f500[f500["sector"] == sector].sort_values(by="roa", ascending=False)
    top_roa_by_sector[sector] = sector_df["company"].iloc[0]
    
top_roa_by_sector

{'Retailing': 'H & M Hennes & Mauritz',
 'Energy': 'National Grid',
 'Motor Vehicles & Parts': 'Subaru',
 'Financials': 'Berkshire Hathaway',
 'Technology': 'Accenture',
 'Wholesalers': 'McKesson',
 'Health Care': 'Gilead Sciences',
 'Telecommunications': 'KDDI',
 'Engineering & Construction': 'Pacific Construction Group',
 'Industrials': '3M',
 'Food & Drug Stores': 'Publix Super Markets',
 'Aerospace & Defense': 'Lockheed Martin',
 'Food, Beverages & Tobacco': 'Philip Morris International',
 'Household Products': 'Unilever',
 'Transportation': 'Delta Air Lines',
 'Materials': 'CRH',
 'Chemicals': 'LyondellBasell Industries',
 'Media': 'Disney',
 'Apparel': 'Nike',
 'Hotels, Restaurants & Leisure': 'McDonald’s',
 'Business Services': 'Adecco Group'}
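Another way to get the top company per sector is to sort by roa once and keep the first row of each sector with drop_duplicates(). Again this is an alternative to the loop above, sketched on a hypothetical four-company frame using values from the first table:

```python
import pandas as pd

# Hypothetical subset of f500
df = pd.DataFrame({
    "company": ["Walmart", "Apple", "Exxon Mobil", "State Grid"],
    "sector": ["Retailing", "Technology", "Energy", "Energy"],
    "profits": [13643.0, 45687.0, 7840.0, 9571.3],
    "assets": [198825, 321686, 330314, 489838],
})

# ROA = profits / assets, computed for every row at once
df["roa"] = df["profits"] / df["assets"]

# Highest ROA first, then keep only the first row seen for each sector
best = df.sort_values("roa", ascending=False).drop_duplicates(subset="sector", keep="first")
top_roa_by_sector = dict(zip(best["sector"], best["company"]))
```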