# FORTUNE 500

As we learn pandas, we'll work with a data set from [Fortune](https://fortune.com/) magazine's [2017 Global 500 list](https://en.wikipedia.org/wiki/Fortune_Global_500), which ranks the top 500 corporations worldwide by revenue. The data set was originally compiled [here](https://data.world/chasewillden/fortune-500-companies-2017); however, we modified the original data set to make it more accessible.

![Alt 'World's'](fortune-500.jpg)

The data set is a CSV file called `f500.csv`. Here is a data dictionary for some of the columns in the CSV:

- `company`: Name of the company.
- `rank`: Global 500 rank for the company.
- `revenues`: Company's total revenue for the fiscal year, in millions of dollars (USD).
- `revenue_change`: Percentage change in revenue between the current and prior fiscal year.
- `profits`: Net income for the fiscal year, in millions of dollars (USD).
- `ceo`: Company's Chief Executive Officer.
- `industry`: Industry in which the company operates.
- `sector`: Sector in which the company operates.
- `previous_rank`: Global 500 rank for the company for the prior year.
- `country`: Country in which the company is headquartered.

Similar to the import convention for NumPy (`import numpy as np`), the import convention for pandas is `import pandas as pd`.

In the `script.py` code editor for this screen, we have already imported pandas and used the `pandas.read_csv()` [function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) to read the CSV into a dataframe and assign it to the variable name `f500`. We'll learn about `read_csv()` later in this course, but for now, all you need to know is that it automatically handles reading and parsing most CSV files.
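As a rough illustration of what `read_csv()` does, the sketch below parses a small in-memory CSV instead of `f500.csv`. The three rows are copied from the data shown in this lesson; the `csv_text` and `sample` names and the reduced column set are assumptions made only for this example.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt standing in for f500.csv.
csv_text = """company,rank,revenues,country
Walmart,1,485873,USA
State Grid,2,315199,China
Sinopec Group,3,267518,China
"""

# read_csv() accepts a file path or any file-like object.
# index_col=0 tells pandas to use the first column as the row labels.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(sample)
print(sample.dtypes)  # pandas infers a dtype for each column
```

Passing `index_col=0` is what lets us look up rows by company name later on.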

Like NumPy's ndarrays, pandas dataframes have a `.shape` attribute, which returns a tuple giving the size of each axis of the object. We'll use that and Python's `type()` function to inspect the `f500` dataframe.

Instructions:

1. Use Python's `type()` function to assign the type of `f500` to `f500_type`.
2. Use the `DataFrame.shape` attribute to assign the shape of `f500` to `f500_shape`.
3. After you have run your code, use the variable inspector to look at the variables `f500`, `f500_type`, and `f500_shape`.


The primary data structure in pandas is called a dataframe. Dataframes are the pandas equivalent of a NumPy 2D ndarray, with a few key differences:

- Axis values can have string labels, not just numeric ones.
- Dataframes can contain columns with multiple data types, including integer, float, and string.
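The two differences above can be sketched with a tiny, hypothetical dataframe (the `arr` and `df` names and the two-row data are assumptions for illustration, with values taken from the rows shown in this lesson):

```python
import numpy as np
import pandas as pd

# A 2D ndarray holds a single dtype, so mixing numbers and text
# coerces every element to a string.
arr = np.array([[1, "USA"], [2, "China"]])
print(arr.dtype)

# A dataframe keeps a separate dtype per column and labels both axes.
df = pd.DataFrame(
    {"rank": [1, 2], "revenue_change": [0.8, -4.4], "country": ["USA", "China"]},
    index=["Walmart", "State Grid"],
)
print(df.dtypes)

# Rows and columns can be addressed by their string labels.
print(df.loc["State Grid", "country"])  # China
```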


![](df_anatomy_static_resized.svg)

In [1]:
import pandas as pd

In [3]:
f500 = pd.read_csv('f500.csv', index_col=0)
f500.index.name = None

In [4]:
type(f500)

pandas.core.frame.DataFrame

In [5]:
f500.shape

(500, 16)

In [6]:
f500.head()

| |rank|revenues|revenue_change|profits|assets|profit_change|ceo|industry|sector|previous_rank|country|hq_location|website|years_on_global_500_list|employees|total_stockholder_equity|
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
|Walmart|1|485873|0.8|13643.0|198825|-7.2|C. Douglas McMillon|General Merchandisers|Retailing|1|USA|Bentonville, AR|http://www.walmart.com|23|2300000|77798|
|State Grid|2|315199|-4.4|9571.3|489838|-6.2|Kou Wei|Utilities|Energy|2|China|Beijing, China|http://www.sgcc.com.cn|17|926067|209456|
|Sinopec Group|3|267518|-9.1|1257.9|310726|-65.0|Wang Yupu|Petroleum Refining|Energy|4|China|Beijing, China|http://www.sinopec.com|19|713288|106523|
|China National Petroleum|4|262573|-12.3|1867.5|585619|-73.7|Zhang Jianhua|Petroleum Refining|Energy|3|China|Beijing, China|http://www.cnpc.com.cn|17|1512048|301893|
|Toyota Motor|5|254694|7.7|16899.3|437575|-12.3|Akio Toyoda|Motor Vehicles and Parts|Motor Vehicles & Parts|8|Japan|Toyota, Japan|http://www.toyota-global.com|23|364445|157210|


In [7]:
f500.head(-497)

| |rank|revenues|revenue_change|profits|assets|profit_change|ceo|industry|sector|previous_rank|country|hq_location|website|years_on_global_500_list|employees|total_stockholder_equity|
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
|Walmart|1|485873|0.8|13643.0|198825|-7.2|C. Douglas McMillon|General Merchandisers|Retailing|1|USA|Bentonville, AR|http://www.walmart.com|23|2300000|77798|
|State Grid|2|315199|-4.4|9571.3|489838|-6.2|Kou Wei|Utilities|Energy|2|China|Beijing, China|http://www.sgcc.com.cn|17|926067|209456|
|Sinopec Group|3|267518|-9.1|1257.9|310726|-65.0|Wang Yupu|Petroleum Refining|Energy|4|China|Beijing, China|http://www.sinopec.com|19|713288|106523|


In [8]:
f500.tail()

| |rank|revenues|revenue_change|profits|assets|profit_change|ceo|industry|sector|previous_rank|country|hq_location|website|years_on_global_500_list|employees|total_stockholder_equity|
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
|Teva Pharmaceutical Industries|496|21903|11.5|329.0|92890|-79.3|Yitzhak Peterburg|Pharmaceuticals|Health Care|0|Israel|Petach Tikva, Israel|http://www.tevapharm.com|1|56960|33337|
|New China Life Insurance|497|21796|-13.3|743.9|100609|-45.6|Wan Feng|Insurance: Life, Health (stock)|Financials|427|China|Beijing, China|http://www.newchinalife.com|2|54378|8507|
|Wm. Morrison Supermarkets|498|21741|-11.3|406.4|11630|20.4|David T. Potts|Food and Drug Stores|Food & Drug Stores|437|Britain|Bradford, Britain|http://www.morrisons.com|13|77210|5111|
|TUI|499|21655|-5.5|1151.7|16247|195.5|Friedrich Joussen|Travel Services|Business Services|467|Germany|Hanover, Germany|http://www.tuigroup.com|23|66779|3006|
|AutoNation|500|21609|3.6|430.5|10060|-2.7|Michael J. Jackson|Specialty Retailers|Retailing|0|USA|Fort Lauderdale, FL|http://www.autonation.com|12|26000|2310|


In [9]:
f500.tail(-497)

| |rank|revenues|revenue_change|profits|assets|profit_change|ceo|industry|sector|previous_rank|country|hq_location|website|years_on_global_500_list|employees|total_stockholder_equity|
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
|Wm. Morrison Supermarkets|498|21741|-11.3|406.4|11630|20.4|David T. Potts|Food and Drug Stores|Food & Drug Stores|437|Britain|Bradford, Britain|http://www.morrisons.com|13|77210|5111|
|TUI|499|21655|-5.5|1151.7|16247|195.5|Friedrich Joussen|Travel Services|Business Services|467|Germany|Hanover, Germany|http://www.tuigroup.com|23|66779|3006|
|AutoNation|500|21609|3.6|430.5|10060|-2.7|Michael J. Jackson|Specialty Retailers|Retailing|0|USA|Fort Lauderdale, FL|http://www.autonation.com|12|26000|2310|
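The negative arguments used with `head()` and `tail()` above may look surprising: a negative `n` means "all rows except `n` of them" rather than "the first or last `n` rows". A minimal sketch on a hypothetical five-row dataframe (the `df` name and its data are assumptions for illustration):

```python
import pandas as pd

# Hypothetical five-row frame standing in for f500.
df = pd.DataFrame({"rank": [1, 2, 3, 4, 5]}, index=list("abcde"))

first_two = df.head(2)           # the first 2 rows
all_but_last_two = df.head(-2)   # every row except the last 2
last_two = df.tail(2)            # the last 2 rows
all_but_first_two = df.tail(-2)  # every row except the first 2

print(list(all_but_last_two.index))   # ['a', 'b', 'c']
print(list(all_but_first_two.index))  # ['c', 'd', 'e']
```

With 500 rows, `f500.head(-497)` therefore keeps 500 - 497 = 3 rows, which matches the output above.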


In [10]:
pd.options

<pandas._config.config.DictWrapper at 0x7f3d61e9a820>

In [11]:
print(pd.options)

<pandas._config.config.DictWrapper object at 0x7f3d61e9a820>


In [12]:
f500.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   rank                      500 non-null    int64  
 1   revenues                  500 non-null    int64  
 2   revenue_change            498 non-null    float64
 3   profits                   499 non-null    float64
 4   assets                    500 non-null    int64  
 5   profit_change             436 non-null    float64
 6   ceo                       500 non-null    object 
 7   industry                  500 non-null    object 
 8   sector                    500 non-null    object 
 9   previous_rank             500 non-null    int64  
 10  country                   500 non-null    object 
 11  hq_location               500 non-null    object 
 12  website                   500 non-null    object 
 13  years_on_global_500_list  500 non-null    int64  
 14  em

In [13]:
industry = f500['industry']
industry

Walmart                                     General Merchandisers
State Grid                                              Utilities
Sinopec Group                                  Petroleum Refining
China National Petroleum                       Petroleum Refining
Toyota Motor                             Motor Vehicles and Parts
                                               ...               
Teva Pharmaceutical Industries                    Pharmaceuticals
New China Life Insurance          Insurance: Life, Health (stock)
Wm. Morrison Supermarkets                    Food and Drug Stores
TUI                                               Travel Services
AutoNation                                    Specialty Retailers
Name: industry, Length: 500, dtype: object

In [14]:
type(industry)

pandas.core.series.Series

In [15]:
industry1 = f500[['industry']]
industry1

| |industry|
|:---|:---|
|Walmart|General Merchandisers|
|State Grid|Utilities|
|Sinopec Group|Petroleum Refining|
|China National Petroleum|Petroleum Refining|
|Toyota Motor|Motor Vehicles and Parts|
|...|...|
|Teva Pharmaceutical Industries|Pharmaceuticals|
|New China Life Insurance|Insurance: Life, Health (stock)|
|Wm. Morrison Supermarkets|Food and Drug Stores|
|TUI|Travel Services|
|AutoNation|Specialty Retailers|


In [16]:
type(industry1)

pandas.core.frame.DataFrame
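The cells above show that the kind of brackets decides the type of the result. The same behavior can be sketched on a hypothetical two-row dataframe (the `df`, `as_series`, and `as_frame` names are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"industry": ["Utilities", "Pharmaceuticals"], "country": ["China", "Israel"]},
    index=["State Grid", "Teva Pharmaceutical Industries"],
)

as_series = df["industry"]    # a single label returns a 1D Series
as_frame = df[["industry"]]   # a list of labels returns a 2D DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
```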

In [17]:
industry1.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   industry  500 non-null    object
dtypes: object(1)
memory usage: 24.0+ KB


In [18]:
industry1.info

<bound method DataFrame.info of                                                        industry
Walmart                                   General Merchandisers
State Grid                                            Utilities
Sinopec Group                                Petroleum Refining
China National Petroleum                     Petroleum Refining
Toyota Motor                           Motor Vehicles and Parts
...                                                         ...
Teva Pharmaceutical Industries                  Pharmaceuticals
New China Life Insurance        Insurance: Life, Health (stock)
Wm. Morrison Supermarkets                  Food and Drug Stores
TUI                                             Travel Services
AutoNation                                  Specialty Retailers

[500 rows x 1 columns]>

In [19]:
industry.shape

(500,)

In [20]:
f500.shape

(500, 16)

In [21]:
f500.index

Index(['Walmart', 'State Grid', 'Sinopec Group', 'China National Petroleum',
       'Toyota Motor', 'Volkswagen', 'Royal Dutch Shell', 'Berkshire Hathaway',
       'Apple', 'Exxon Mobil',
       ...
       'National Grid', 'Dollar General', 'Telecom Italia',
       'Xiamen ITG Holding Group', 'Xinjiang Guanghui Industry Investment',
       'Teva Pharmaceutical Industries', 'New China Life Insurance',
       'Wm. Morrison Supermarkets', 'TUI', 'AutoNation'],
      dtype='object', length=500)

Let's practice using these techniques to select specific columns from our `f500` dataframe.

Instructions:

- Select the `country` column. Assign the result to the variable name `countries`.
- In order, select the `revenues` and `years_on_global_500_list` columns. Assign the result to the variable name `revenues_years`.
- In order, select all columns from `ceo` up to and including `sector`. Assign the result to the variable name `ceo_to_sector`.
- After you have run your code, use the variable inspector to view the variables.


In [22]:
countries = f500.loc[:, 'country']
countries

Walmart                               USA
State Grid                          China
Sinopec Group                       China
China National Petroleum            China
Toyota Motor                        Japan
                                   ...   
Teva Pharmaceutical Industries     Israel
New China Life Insurance            China
Wm. Morrison Supermarkets         Britain
TUI                               Germany
AutoNation                            USA
Name: country, Length: 500, dtype: object

In [23]:
countries1 = f500['country']
countries1

Walmart                               USA
State Grid                          China
Sinopec Group                       China
China National Petroleum            China
Toyota Motor                        Japan
                                   ...   
Teva Pharmaceutical Industries     Israel
New China Life Insurance            China
Wm. Morrison Supermarkets         Britain
TUI                               Germany
AutoNation                            USA
Name: country, Length: 500, dtype: object

In [24]:
countries2 = f500.loc[:, ['country']]
countries2

| |country|
|:---|:---|
|Walmart|USA|
|State Grid|China|
|Sinopec Group|China|
|China National Petroleum|China|
|Toyota Motor|Japan|
|...|...|
|Teva Pharmaceutical Industries|Israel|
|New China Life Insurance|China|
|Wm. Morrison Supermarkets|Britain|
|TUI|Germany|
|AutoNation|USA|


In [25]:
countries3 = f500[['country']]
countries3

| |country|
|:---|:---|
|Walmart|USA|
|State Grid|China|
|Sinopec Group|China|
|China National Petroleum|China|
|Toyota Motor|Japan|
|...|...|
|Teva Pharmaceutical Industries|Israel|
|New China Life Insurance|China|
|Wm. Morrison Supermarkets|Britain|
|TUI|Germany|
|AutoNation|USA|


In [26]:
revenues_years = f500.loc[:, ['revenues', 'years_on_global_500_list']]
revenues_years

| |revenues|years_on_global_500_list|
|:---|:---|:---|
|Walmart|485873|23|
|State Grid|315199|17|
|Sinopec Group|267518|19|
|China National Petroleum|262573|17|
|Toyota Motor|254694|23|
|...|...|...|
|Teva Pharmaceutical Industries|21903|1|
|New China Life Insurance|21796|2|
|Wm. Morrison Supermarkets|21741|13|
|TUI|21655|23|
|AutoNation|21609|12|


In [27]:
revenues_years1 = f500[['revenues', 'years_on_global_500_list']]
revenues_years1

| |revenues|years_on_global_500_list|
|:---|:---|:---|
|Walmart|485873|23|
|State Grid|315199|17|
|Sinopec Group|267518|19|
|China National Petroleum|262573|17|
|Toyota Motor|254694|23|
|...|...|...|
|Teva Pharmaceutical Industries|21903|1|
|New China Life Insurance|21796|2|
|Wm. Morrison Supermarkets|21741|13|
|TUI|21655|23|
|AutoNation|21609|12|


In [28]:
ceo_to_sector = f500.loc[:, 'ceo':'sector']
ceo_to_sector

| |ceo|industry|sector|
|:---|:---|:---|:---|
|Walmart|C. Douglas McMillon|General Merchandisers|Retailing|
|State Grid|Kou Wei|Utilities|Energy|
|Sinopec Group|Wang Yupu|Petroleum Refining|Energy|
|China National Petroleum|Zhang Jianhua|Petroleum Refining|Energy|
|Toyota Motor|Akio Toyoda|Motor Vehicles and Parts|Motor Vehicles & Parts|
|...|...|...|...|
|Teva Pharmaceutical Industries|Yitzhak Peterburg|Pharmaceuticals|Health Care|
|New China Life Insurance|Wan Feng|Insurance: Life, Health (stock)|Financials|
|Wm. Morrison Supermarkets|David T. Potts|Food and Drug Stores|Food & Drug Stores|
|TUI|Friedrich Joussen|Travel Services|Business Services|
|AutoNation|Michael J. Jackson|Specialty Retailers|Retailing|


In [29]:
country_counts = f500['country'].value_counts()
country_counts

USA             132
China           109
Japan            51
France           29
Germany          29
Britain          24
South Korea      15
Switzerland      14
Netherlands      14
Canada           11
Spain             9
Brazil            7
Italy             7
India             7
Australia         7
Taiwan            6
Ireland           4
Russia            4
Sweden            3
Singapore         3
Mexico            2
Israel            1
Saudi Arabia      1
Thailand          1
Finland           1
Denmark           1
Belgium           1
Luxembourg        1
Turkey            1
U.A.E             1
Norway            1
Indonesia         1
Malaysia          1
Venezuela         1
Name: country, dtype: int64

In [30]:
country_counts['India'] # or country_counts.loc['India']

7

In [31]:
north_america = country_counts[['USA', 'Canada', 'Mexico']]
north_america

USA       132
Canada     11
Mexico      2
Name: country, dtype: int64
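The result of `value_counts()` is itself a series whose labels are the unique values, which is why we can select from it by country name. A small sketch with a hypothetical six-item series (the `countries`, `counts`, and `subset` names and the data are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for f500['country'].
countries = pd.Series(["USA", "China", "USA", "Japan", "China", "USA"])

# value_counts() tallies each unique value, most frequent first.
counts = countries.value_counts()
print(counts)

# The unique values become the index labels, so label selection works.
print(counts["China"])  # 2

# A list of labels returns a series in the order requested.
subset = counts[["Japan", "USA"]]
print(subset)
```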


### Let's take a look at a summary of all the different label selection mechanisms we've learned in this lesson:

|Select by Label|Explicit Syntax|Shorthand Convention|
|:---|:---|:---|
|Single column from dataframe|`df.loc[:,"col1"]`|`df["col1"]`|
|List of columns from dataframe|`df.loc[:,["col1","col7"]]`|`df[["col1","col7"]]`|
|Slice of columns from dataframe|`df.loc[:,"col1":"col4"]`| |
|Single row from dataframe|`df.loc["row4"]`| |
|List of rows from dataframe|`df.loc[["row1", "row8"]]`| |
|Slice of rows from dataframe|`df.loc["row3":"row5"]`|`df["row3":"row5"]`|
|Single item from series|`s.loc["item8"]`|`s["item8"]`|
|List of items from series|`s.loc[["item1","item7"]]`|`s[["item1","item7"]]`|
|Slice of items from series|`s.loc["item2":"item4"]`|`s["item2":"item4"]`|
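To see several rows of the table in action, here is a minimal sketch on a hypothetical 3x3 dataframe. The `df` and `s` names, labels, and values are assumptions for illustration; only the explicit `.loc` forms and the column shorthands are exercised.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]},
    index=["row1", "row2", "row3"],
)

# Columns: the explicit syntax and the shorthand return the same result.
single_col = df.loc[:, "col1"]
col_list = df.loc[:, ["col1", "col3"]]
print(single_col.equals(df["col1"]))          # True
print(col_list.equals(df[["col1", "col3"]]))  # True

# Label slices include BOTH endpoints, unlike positional Python slices.
col_slice = df.loc[:, "col1":"col2"]
row_slice = df.loc["row1":"row2"]

# Rows: a single label returns a series, a list returns a dataframe.
single_row = df.loc["row2"]
row_list = df.loc[["row1", "row3"]]

# Series: the same label rules apply to a single axis.
s = df["col1"]
print(s.loc["row2"], s["row2"])  # 2 2
```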

In [34]:
big_movers = f500.loc[['Aviva', 'HP', 'JD.com', 'BHP Billiton'], ['rank', 'previous_rank']]
big_movers

| |rank|previous_rank|
|:---|:---|:---|
|Aviva|90|279|
|HP|194|48|
|JD.com|261|366|
|BHP Billiton|350|168|


In [38]:
bottom_companies = f500.loc['National Grid':'AutoNation', ['rank', 'sector', 'country']]
bottom_companies

| |rank|sector|country|
|:---|:---|:---|:---|
|National Grid|491|Energy|Britain|
|Dollar General|492|Retailing|USA|
|Telecom Italia|493|Telecommunications|Italy|
|Xiamen ITG Holding Group|494|Wholesalers|China|
|Xinjiang Guanghui Industry Investment|495|Wholesalers|China|
|Teva Pharmaceutical Industries|496|Health Care|Israel|
|New China Life Insurance|497|Financials|China|
|Wm. Morrison Supermarkets|498|Food & Drug Stores|Britain|
|TUI|499|Business Services|Germany|
|AutoNation|500|Retailing|USA|
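The last two cells combine a row selection and a column selection in a single `.loc[rows, columns]` call. A compact sketch using three rows copied from the output above (the `df`, `tail_two`, and `single_value` names are assumptions for illustration):

```python
import pandas as pd

# Three rows copied from the bottom of the data set.
df = pd.DataFrame(
    {
        "rank": [498, 499, 500],
        "sector": ["Food & Drug Stores", "Business Services", "Retailing"],
        "country": ["Britain", "Germany", "USA"],
    },
    index=["Wm. Morrison Supermarkets", "TUI", "AutoNation"],
)

# A row slice plus a list of columns: the slice includes both end labels.
tail_two = df.loc["TUI":"AutoNation", ["rank", "country"]]
print(tail_two)

# Two single labels return one value.
single_value = df.loc["TUI", "rank"]
print(single_value)  # 499
```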
