Pandas is not a replacement for NumPy but an extension of it. The underlying code for pandas uses the NumPy library extensively, which means the concepts you have been learning will come in handy as you begin to learn more about pandas.

The primary data structure in pandas is called a **DataFrame**. DataFrames are the pandas equivalent of a NumPy 2D ndarray, with a few key differences:

- Axis values can have string labels, not just numeric ones.
- DataFrames can contain columns with multiple data types, including integers, floats, and strings.
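
To make these two differences concrete, here is a minimal sketch that builds a tiny DataFrame by hand. The values are copied from the first three rows of the dataset used later in this notebook; the names `small` and `as_array` are only for illustration.

```python
import pandas as pd

# Three rows copied from the Global 500 data used below.
small = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "profits": [13643.0, 9571.3, 1257.9],
        "country": ["USA", "China", "China"],
    },
    index=["Walmart", "State Grid", "Sinopec Group"],
)

# Rows and columns are addressed by string labels, not only by position.
print(small.loc["State Grid", "country"])

# Each column keeps its own data type.
print(small.dtypes)

# A NumPy array holds a single dtype, so the mixed columns
# are all stored as generic Python objects after conversion.
as_array = small.to_numpy()
print(as_array.dtype)
```

The conversion at the end shows why a plain ndarray is a poor fit for this kind of table: the integer and float columns lose their specific types once they share an array with strings.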

<img src="df_anatomy_static_resized.svg" width="600" height="300">


The dataset is a CSV file called `f500.csv`. Here is a data dictionary for some of the columns in the CSV:

- `company`: Name of the company.
- `rank`: Global 500 rank for the company.
- `revenues`: Company's total revenue for the fiscal year, in millions of dollars (USD).
- `revenue_change`: Percentage change in revenue between the current and prior fiscal year.
- `profits`: Net income for the fiscal year, in millions of dollars (USD).
- `ceo`: Company's Chief Executive Officer.
- `industry`: Industry in which the company operates.
- `sector`: Sector in which the company operates.
- `previous_rank`: Global 500 rank for the company for the prior year.
- `country`: Country in which the company is headquartered.

In [1]:
import pandas as pd 

In [2]:
# Read the CSV, using its first column as the row index
f500 = pd.read_csv('f500.csv', index_col=0)
print(type(f500))
print(f500.shape)

<class 'pandas.core.frame.DataFrame'>
(500, 16)
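
The effect of `index_col=0` can be seen on a small in-memory stand-in for the file. This is a sketch only: `csv_text` is a hypothetical three-row excerpt, and it assumes the company name is the first column of `f500.csv`, which is what the index in the output suggests.

```python
import io
import pandas as pd

# Hypothetical excerpt standing in for f500.csv (company name assumed first).
csv_text = """company,rank,revenues,country
Walmart,1,485873,USA
State Grid,2,315199,China
Sinopec Group,3,267518,China
"""

# Without index_col, pandas adds a default integer index (0, 1, 2)
# and keeps `company` as an ordinary column.
default_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row labels.
labeled = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_index.shape)
print(labeled.shape)
print(labeled.index.tolist())
```

Because the index is not counted as a column, the labeled version reports one column fewer than the default one.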


In [3]:
f500.head(10)

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
Volkswagen,6,240264,1.5,5937.3,432116,,Matthias Muller,Motor Vehicles and Parts,Motor Vehicles & Parts,7,Germany,"Wolfsburg, Germany",http://www.volkswagen.com,23,626715,97753
Royal Dutch Shell,7,240033,-11.8,4575.0,411275,135.9,Ben van Beurden,Petroleum Refining,Energy,5,Netherlands,"The Hague, Netherlands",http://www.shell.com,23,89000,186646
Berkshire Hathaway,8,223604,6.1,24074.0,620854,,Warren E. Buffett,Insurance: Property and Casualty (Stock),Financials,11,USA,"Omaha, NE",http://www.berkshirehathaway.com,21,367700,283001
Apple,9,215639,-7.7,45687.0,321686,-14.4,Timothy D. Cook,"Computers, Office Equipment",Technology,9,USA,"Cupertino, CA",http://www.apple.com,15,116000,128249
Exxon Mobil,10,205004,-16.7,7840.0,330314,-51.5,Darren W. Woods,Petroleum Refining,Energy,6,USA,"Irving, TX",http://www.exxonmobil.com,23,72700,167325


In [4]:
f500.tail(2)

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006
AutoNation,500,21609,3.6,430.5,10060,-2.7,Michael J. Jackson,Specialty Retailers,Retailing,0,USA,"Fort Lauderdale, FL",http://www.autonation.com,12,26000,2310


In [5]:
f500.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   rank                      500 non-null    int64  
 1   revenues                  500 non-null    int64  
 2   revenue_change            498 non-null    float64
 3   profits                   499 non-null    float64
 4   assets                    500 non-null    int64  
 5   profit_change             436 non-null    float64
 6   ceo                       500 non-null    object 
 7   industry                  500 non-null    object 
 8   sector                    500 non-null    object 
 9   previous_rank             500 non-null    int64  
 10  country                   500 non-null    object 
 11  hq_location               500 non-null    object 
 12  website                   500 non-null    object 
 13  years_on_global_500_list  500 non-null    int64  
 14  em

In [6]:
f500.dtypes

rank                          int64
revenues                      int64
revenue_change              float64
profits                     float64
assets                        int64
profit_change               float64
ceo                          object
industry                     object
sector                       object
previous_rank                 int64
country                      object
hq_location                  object
website                      object
years_on_global_500_list      int64
employees                     int64
total_stockholder_equity      int64
dtype: object
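
The `info()` output reports 436 non-null values for `profit_change` out of 500 rows, so 64 values are missing. As a rough sketch of where such counts come from, the excerpt below copies three rows from the `head()` output, two of which have an empty `profit_change` cell. The names `csv_text` and `sample` are illustrative.

```python
import io
import pandas as pd

# Three rows copied from the head() output; two profit_change cells are empty.
csv_text = """company,rank,profit_change
Volkswagen,6,
Royal Dutch Shell,7,135.9
Berkshire Hathaway,8,
"""
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

# count() gives the number of non-missing values in the column.
non_null = sample["profit_change"].count()

# isna() marks missing cells; summing the booleans counts them.
missing = sample["profit_change"].isna().sum()

print(sample.dtypes)
print(non_null, missing)
```

Empty cells are read as `NaN`, which is a floating-point value, so a numeric column with missing entries is stored as `float64`.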

In [7]:
# Select a single column by label with loc; the result is a Series
f500.loc[:, 'industry']

company
Walmart                                     General Merchandisers
State Grid                                              Utilities
Sinopec Group                                  Petroleum Refining
China National Petroleum                       Petroleum Refining
Toyota Motor                             Motor Vehicles and Parts
                                               ...               
Teva Pharmaceutical Industries                    Pharmaceuticals
New China Life Insurance          Insurance: Life, Health (stock)
Wm. Morrison Supermarkets                    Food and Drug Stores
TUI                                               Travel Services
AutoNation                                    Specialty Retailers
Name: industry, Length: 500, dtype: object

In [8]:
industries = f500['industry']
industry_type = type(industries)

print(industry_type)

<class 'pandas.core.series.Series'>


In [9]:
print(industries.shape)

(500,)
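
The one-element shape `(500,)` shows that a single column is a one-dimensional Series. A short sketch of the contrast with a one-column DataFrame, using two rows copied from the dataset (the names `df`, `one_col` and `one_col_df` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "industry": ["General Merchandisers", "Utilities"],
        "country": ["USA", "China"],
    },
    index=["Walmart", "State Grid"],
)

one_col = df["industry"]       # single label: one-dimensional Series
one_col_df = df[["industry"]]  # list with one label: two-dimensional DataFrame

print(type(one_col).__name__, one_col.shape)
print(type(one_col_df).__name__, one_col_df.shape)
```

The extra pair of brackets is what keeps the result two-dimensional.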


<img src="df_exploded_resized.svg" width="600" height="400">

In [10]:
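# Single label: returns a Series
countries = f500['country']
# List of labels: returns a DataFrame
revenues_years = f500[['revenues', 'years_on_global_500_list']]
# Label slice with loc: both endpoints are included
ceo_to_sector = f500.loc[:, 'ceo':'sector']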
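
Unlike Python list slicing, a label slice with `loc` includes the end label. The sketch below checks this on two rows copied from the dataset, keeping only four columns for brevity; `df`, `by_slice` and `by_list` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ceo": ["C. Douglas McMillon", "Kou Wei"],
        "industry": ["General Merchandisers", "Utilities"],
        "sector": ["Retailing", "Energy"],
        "country": ["USA", "China"],
    },
    index=["Walmart", "State Grid"],
)

by_slice = df.loc[:, "ceo":"sector"]         # label slice, end label included
by_list = df[["ceo", "industry", "sector"]]  # the same columns, listed explicitly

print(by_slice.columns.tolist())
print(by_slice.equals(by_list))
```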

In [11]:
print(countries)

company
Walmart                               USA
State Grid                          China
Sinopec Group                       China
China National Petroleum            China
Toyota Motor                        Japan
                                   ...   
Teva Pharmaceutical Industries     Israel
New China Life Insurance            China
Wm. Morrison Supermarkets         Britain
TUI                               Germany
AutoNation                            USA
Name: country, Length: 500, dtype: object


In [12]:
f500_selection = f500[['rank', 'revenues', 'profits', 'country']].head().copy()
print(f500_selection)

                          rank  revenues  profits country
company                                                  
Walmart                      1    485873  13643.0     USA
State Grid                   2    315199   9571.3   China
Sinopec Group                3    267518   1257.9   China
China National Petroleum     4    262573   1867.5   China
Toyota Motor                 5    254694  16899.3   Japan


In [13]:
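# Single row label: returns a Series
toyota = f500.loc['Toyota Motor']
# List of row labels: returns a DataFrame
drink_companies = f500.loc[["Anheuser-Busch InBev", "Coca-Cola", "Heineken Holding"]]
# Row slice and column slice by label; both endpoints are included
middle_companies = f500.loc['Tata Motors':'Nationwide', 'rank':'country']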
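
The same three forms of row selection can be tried on a small frame to see what each one returns. This sketch uses the first five rows of the dataset with three of its columns; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "country": ["USA", "China", "China", "China", "Japan"],
    },
    index=[
        "Walmart",
        "State Grid",
        "Sinopec Group",
        "China National Petroleum",
        "Toyota Motor",
    ],
)

# Single row label: a Series whose index is the column labels.
single_row = df.loc["Toyota Motor"]

# List of row labels: a DataFrame with rows in the order requested.
row_list = df.loc[["Walmart", "Toyota Motor"]]

# Label slices on both axes: end labels are included.
row_slice = df.loc["State Grid":"China National Petroleum", "rank":"revenues"]

print(single_row["country"])
print(row_list.index.tolist())
print(row_slice.shape)
```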

<img src="df_series_s_updated.svg">

<img src="df_series_df_updated.svg">

In [14]:
f500_sel = f500.head(7).copy()
countries = f500_sel['country']
country_counts = countries.value_counts()

In [15]:
print(country_counts)

China          3
USA            1
Japan          1
Germany        1
Netherlands    1
Name: country, dtype: int64
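
As a quick check of what `value_counts()` returns, the sketch below rebuilds the `country` column for the first seven companies by hand. The variable names mirror the cell above, but the Series is typed in directly rather than read from the file.

```python
import pandas as pd

# Countries of the first seven companies, copied from the head() output.
countries = pd.Series(
    ["USA", "China", "China", "China", "Japan", "Germany", "Netherlands"],
    name="country",
)

country_counts = countries.value_counts()

# The unique values become the index, sorted by count in descending order.
print(country_counts.index[0], country_counts.iloc[0])

# The counts add up to the number of non-missing values.
print(country_counts.sum() == countries.count())
```

In pandas 2.0 and later the resulting Series is named `count` rather than `country`, so the last line of the printed output can differ from the one shown above.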


In [16]:
countries = f500['country']

In [17]:
countries_counts = countries.value_counts()
india = countries_counts['India']
north_america = countries_counts[['USA', 'Canada', 'Mexico']]
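
Selecting from a Series by label follows the same pattern as selecting columns from a DataFrame. A minimal sketch, using the counts printed earlier for the first seven companies (`counts`, `one_value` and `several` are illustrative names):

```python
import pandas as pd

# Counts copied from the value_counts() output for the first seven rows.
counts = pd.Series(
    {"China": 3, "USA": 1, "Japan": 1, "Germany": 1, "Netherlands": 1}
)

one_value = counts["China"]         # single label: a scalar
several = counts[["USA", "Japan"]]  # list of labels: a smaller Series

print(one_value)
print(several.index.tolist())
```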

In [18]:
big_movers = f500.loc[['Aviva', 'HP', 'JD.com', 'BHP Billiton'], ['rank', 'previous_rank']]
bottom_companies = f500.loc['National Grid':'AutoNation', ['rank', 'sector', 'country']]

In [19]:
big_movers

company,rank,previous_rank
Aviva,90,279
HP,194,48
JD.com,261,366
BHP Billiton,350,168
