## Understanding pandas and NumPy

The primary data structure in pandas is the DataFrame. A DataFrame is the pandas counterpart of a NumPy 2D ndarray, with two key differences:

* Axis values can have string **labels**, not just numeric ones.
* DataFrames can contain columns of **multiple data types**, including integer, float, and string.

![Jupyter](./df_anatomy_static_resized.svg)
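The two differences listed above can be seen side by side in a small sketch. The names `arr` and `df` are illustrative only, and the values are a hand-copied subset of the first two rows of the f500 data, not the full file.

```python
import numpy as np
import pandas as pd

# A 2D ndarray holds a single dtype and is addressed by integer position.
arr = np.array([[1, 485873], [2, 315199]])

# A DataFrame can label both axes and mix dtypes across columns.
# The three columns here are a small hand-copied subset of the f500 data.
df = pd.DataFrame(
    {
        "rank": [1, 2],
        "revenue_change": [0.8, -4.4],
        "country": ["USA", "China"],
    },
    index=["Walmart", "State Grid"],
)

print(arr[1, 1])                         # positional access
print(df.loc["State Grid", "country"])   # label-based access
print(df.dtypes)                         # one dtype per column
```

Each column keeps its own dtype, while the ndarray coerces every element to one shared dtype.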

## Introduction to the Data

In [1]:
import pandas as pd

In [4]:
# Use the first CSV column (company names) as the row index.
f500 = pd.read_csv('f500.csv', index_col=0)
f500_type = type(f500)
f500_shape = f500.shape

In [5]:
f500

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Teva Pharmaceutical Industries,496,21903,11.5,329.0,92890,-79.3,Yitzhak Peterburg,Pharmaceuticals,Health Care,0,Israel,"Petach Tikva, Israel",http://www.tevapharm.com,1,56960,33337
New China Life Insurance,497,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507
Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111
TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006


## Introducing DataFrames

In [6]:
f500_head = f500.head(6)

In [7]:
f500_tail = f500.tail(8)

## Introducing DataFrames Continued

In [8]:
f500.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   rank                      500 non-null    int64  
 1   revenues                  500 non-null    int64  
 2   revenue_change            498 non-null    float64
 3   profits                   499 non-null    float64
 4   assets                    500 non-null    int64  
 5   profit_change             436 non-null    float64
 6   ceo                       500 non-null    object 
 7   industry                  500 non-null    object 
 8   sector                    500 non-null    object 
 9   previous_rank             500 non-null    int64  
 10  country                   500 non-null    object 
 11  hq_location               500 non-null    object 
 12  website                   500 non-null    object 
 13  years_on_global_500_list  500 non-null    int64  
 14  em

In [12]:
f500['industry'].value_counts()

Banks: Commercial and Savings                     51
Motor Vehicles and Parts                          34
Petroleum Refining                                28
Insurance: Life, Health (stock)                   24
Food and Drug Stores                              20
Telecommunications                                18
Mining, Crude-Oil Production                      18
Insurance: Property and Casualty (Stock)          18
Utilities                                         18
Pharmaceuticals                                   15
Trading                                           15
Aerospace and Defense                             14
Electronics, Electrical Equip.                    13
Engineering, Construction                         13
Energy                                            12
Metals                                            12
Industrial Machinery                              10
Specialty Retailers                               10
Insurance: Life, Health (Mutual)              

In [13]:
f500['sector'].value_counts()

Financials                       118
Energy                            80
Technology                        44
Motor Vehicles & Parts            34
Wholesalers                       28
Health Care                       27
Food & Drug Stores                20
Transportation                    19
Telecommunications                18
Retailing                         17
Materials                         16
Food, Beverages & Tobacco         16
Industrials                       15
Aerospace & Defense               14
Engineering & Construction        13
Chemicals                          7
Household Products                 3
Media                              3
Business Services                  3
Hotels, Restaurants & Leisure      3
Apparel                            2
Name: sector, dtype: int64

In [19]:
f500.groupby(['industry', 'sector']).size().sort_values(ascending=False)

industry                                        sector                       
Banks: Commercial and Savings                   Financials                       51
Motor Vehicles and Parts                        Motor Vehicles & Parts           34
Petroleum Refining                              Energy                           28
Insurance: Life, Health (stock)                 Financials                       24
Food and Drug Stores                            Food & Drug Stores               20
Insurance: Property and Casualty (Stock)        Financials                       18
Utilities                                       Energy                           18
Mining, Crude-Oil Production                    Energy                           18
Telecommunications                              Telecommunications               18
Trading                                         Wholesalers                      15
Pharmaceuticals                                 Health Care                      1

In [26]:
f500.groupby('country')['revenues'].sum().sort_values(ascending=False) / f500['revenues'].sum() * 100

country
USA             30.593223
China           21.792731
Japan            9.785436
Germany          6.689487
France           5.780683
Britain          4.468460
Netherlands      3.117942
South Korea      2.691927
Switzerland      2.594721
Spain            1.318766
Brazil           1.314312
Italy            1.311154
Canada           1.264349
India            1.010355
Taiwan           1.003992
Russia           0.941928
Australia        0.851088
Singapore        0.589584
Ireland          0.473788
Mexico           0.396904
Sweden           0.302766
Luxembourg       0.204961
Malaysia         0.178572
Thailand         0.175829
Belgium          0.165673
Norway           0.165558
Indonesia        0.131683
Denmark          0.127991
Saudi Arabia     0.127836
Finland          0.094243
Venezuela        0.088071
Turkey           0.084654
U.A.E            0.082283
Israel           0.079049
Name: revenues, dtype: float64
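The one-line revenue-share calculation above can be split into named steps. This is a sketch on a hypothetical five-row frame called `demo`, hand-copied from the head of the f500 table, so the resulting percentages describe those five rows only and will not match the full output.

```python
import pandas as pd

# Five rows hand-copied from the head of the f500 table shown earlier.
demo = pd.DataFrame(
    {
        "country": ["USA", "China", "China", "China", "Japan"],
        "revenues": [485873, 315199, 267518, 262573, 254694],
    },
    index=["Walmart", "State Grid", "Sinopec Group",
           "China National Petroleum", "Toyota Motor"],
)

# Step 1: total revenue per country.
per_country = demo.groupby("country")["revenues"].sum()

# Step 2: divide by the grand total and scale to a percentage.
share = per_country / demo["revenues"].sum() * 100

# Step 3: largest share first.
share = share.sort_values(ascending=False)
print(share.round(2))
```

Because every row belongs to exactly one country, the shares add up to 100.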

## Selecting a Column From a DataFrame by Label

In [27]:
industries = f500['industry']

In [28]:
industries_type = type(industries)

## Selecting Columns From a DataFrame by Label Continued

In [29]:
countries = f500['country']
revenues_years = f500[['revenues', 'years_on_global_500_list']]
ceo_to_sector = f500.loc[:, 'ceo':'sector']
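The three selections above use a single label, a list of labels, and a label slice. A minimal sketch on a hypothetical three-row frame named `demo` shows what each form returns; the frame is an assumption for illustration, not part of the f500 file.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "rank": [1, 2, 5],
        "revenues": [485873, 315199, 254694],
        "country": ["USA", "China", "Japan"],
    },
    index=["Walmart", "State Grid", "Toyota Motor"],
)

one_col = demo["country"]                    # single label -> Series
two_cols = demo[["rank", "country"]]         # list of labels -> DataFrame
col_slice = demo.loc[:, "rank":"revenues"]   # label slice -> DataFrame, end label included

print(type(one_col).__name__)
print(list(two_cols.columns))
print(list(col_slice.columns))
```

Unlike positional slices in Python lists, a label slice with `.loc` includes its end label.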

## Selecting Rows From a DataFrame by Label

In [37]:
toyota = f500.loc['Toyota Motor']
toyota

rank                                                   5
revenues                                          254694
revenue_change                                       7.7
profits                                          16899.3
assets                                            437575
profit_change                                      -12.3
ceo                                          Akio Toyoda
industry                        Motor Vehicles and Parts
sector                            Motor Vehicles & Parts
previous_rank                                          8
country                                            Japan
hq_location                                Toyota, Japan
website                     http://www.toyota-global.com
years_on_global_500_list                              23
employees                                         364445
total_stockholder_equity                          157210
Name: Toyota Motor, dtype: object

In [39]:
drink_companies = f500.loc[['Anheuser-Busch InBev', 'Coca-Cola', 'Heineken Holding'], :]
drink_companies

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Anheuser-Busch InBev,206,45905,5.3,1241.0,258381,-85.0,Carlos Brito,Beverages,"Food, Beverages & Tobacco",211,Belgium,"Leuven, Belgium",http://www.ab-inbev.com,12,206633,71339
Coca-Cola,235,41863,-5.5,6527.0,87270,-11.2,James B. Quincey,Beverages,"Food, Beverages & Tobacco",206,USA,"Atlanta, GA",http://www.coca-colacompany.com,23,100300,23062
Heineken Holding,468,23044,-0.7,861.5,41469,-18.9,Jean-Francois van Boxmeer,Beverages,"Food, Beverages & Tobacco",459,Netherlands,"Amsterdam, Netherlands",http://www.theheinekencompany.com,11,73525,6958


In [40]:
middle_companies = f500.loc['Tata Motors':'Nationwide', 'rank':'country']
middle_companies

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country
Tata Motors,247,40329,-4.2,1111.6,42162,-34.0,Guenter Butschek,Motor Vehicles and Parts,Motor Vehicles & Parts,226,India
Aluminum Corp. of China,248,40278,6.0,-282.5,75089,,Yu Dehui,Metals,Materials,262,China
Mitsui,249,40275,1.6,2825.3,103231,,Tatsuo Yasunaga,Trading,Wholesalers,245,Japan
Manulife Financial,250,40238,49.4,2209.7,537461,28.9,Donald A. Guloien,"Insurance: Life, Health (stock)",Financials,394,Canada
China Minsheng Banking,251,40234,-5.2,7201.6,848389,-1.8,Zheng Wanchun,Banks: Commercial and Savings,Financials,221,China
China Pacific Insurance (Group),252,40193,2.2,1814.9,146873,-35.7,Huo Lianhong,"Insurance: Life, Health (stock)",Financials,251,China
American Airlines Group,253,40180,-2.0,2676.0,51274,-64.8,W. Douglas Parker,Airlines,Transportation,236,USA
Nationwide,254,40074,-0.4,334.3,197790,-42.4,Stephen S. Rasmussen,Insurance: Property and Casualty (Mutual),Financials,241,USA


## Series vs DataFrames

![Jupyter](./df_series_s_updated.svg)
![Jupyter](./df_series_df_updated.svg)
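Whether a selection comes back as a Series or a DataFrame depends on how the labels are passed. The sketch below uses a hypothetical two-row frame named `demo` (an assumption for illustration) to show the one-dimensional, two-dimensional, and scalar cases.

```python
import pandas as pd

demo = pd.DataFrame(
    {"rank": [1, 2], "country": ["USA", "China"]},
    index=["Walmart", "State Grid"],
)

row = demo.loc["Walmart"]            # one row label -> Series (1D)
row_df = demo.loc[["Walmart"]]       # list with one label -> DataFrame (2D)
cell = demo.loc["Walmart", "rank"]   # one row label and one column label -> scalar

print(row.ndim, row_df.ndim)
print(row_df.shape)
print(cell)
```

Wrapping a single label in a list is a common way to keep the result two-dimensional.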

## Value Counts Method

In [41]:
countries = f500['country']
country_counts = countries.value_counts()
country_counts

USA             132
China           109
Japan            51
Germany          29
France           29
Britain          24
South Korea      15
Netherlands      14
Switzerland      14
Canada           11
Spain             9
Australia         7
Brazil            7
India             7
Italy             7
Taiwan            6
Ireland           4
Russia            4
Sweden            3
Singapore         3
Mexico            2
Finland           1
Malaysia          1
Venezuela         1
U.A.E             1
Norway            1
Turkey            1
Belgium           1
Israel            1
Denmark           1
Thailand          1
Saudi Arabia      1
Indonesia         1
Luxembourg        1
Name: country, dtype: int64

## Selecting Items from a Series by Label

In [42]:
countries = f500['country']
countries_counts = countries.value_counts()

In [44]:
india = countries_counts['India']
india

7

In [45]:
north_america = countries_counts[['USA', 'Canada', 'Mexico']]
north_america

USA       132
Canada     11
Mexico      2
Name: country, dtype: int64
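Label selection on a Series follows the same pattern as on a DataFrame. The sketch below builds a small Series named `counts` (a hypothetical name) from four of the country counts shown earlier. The label-slice form is not used in the cells above and is included only for comparison.

```python
import pandas as pd

# Counts copied from the value_counts() output above for four countries.
counts = pd.Series({"USA": 132, "China": 109, "Japan": 51, "Germany": 29})

single = counts["Japan"]               # one label -> scalar
several = counts[["USA", "Germany"]]   # list of labels -> Series
span = counts["China":"Germany"]       # label slice -> Series, end label included

print(single)
print(several.to_dict())
print(span.to_dict())
```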

## Summary Challenge

In [46]:
big_movers = f500.loc[['Aviva', 'HP', 'JD.com', 'BHP Billiton'], ['rank', 'previous_rank']]
big_movers

company,rank,previous_rank
Aviva,90,279
HP,194,48
JD.com,261,366
BHP Billiton,350,168


In [47]:
bottom_companies = f500.loc['National Grid':'AutoNation', ['rank', 'sector', 'country']]
bottom_companies

company,rank,sector,country
National Grid,491,Energy,Britain
Dollar General,492,Retailing,USA
Telecom Italia,493,Telecommunications,Italy
Xiamen ITG Holding Group,494,Wholesalers,China
Xinjiang Guanghui Industry Investment,495,Wholesalers,China
Teva Pharmaceutical Industries,496,Health Care,Israel
New China Life Insurance,497,Financials,China
Wm. Morrison Supermarkets,498,Food & Drug Stores,Britain
TUI,499,Business Services,Germany
AutoNation,500,Retailing,USA
