# Introduction to pandas
pandas is a data analysis library built on top of NumPy. This module covers its core features:
<div class="row">
    <div class="col-sm-6">
      <ul> 
          <li>Dataframes</li>
          <li>Series</li>
          <li>Data exploration with pandas</li>
          <li>Data assignment with pandas</li>
          <li>Boolean indexing</li>
        </ul>
    </div>
    <div class="col-sm-6">
           <img src="https://s3.amazonaws.com/dq-content/291/fortune-500.jpg" style="width:200px;"/>
    </div>
</div>
                                                                          
In this module we will work with the dataset from <a href="http://fortune.com/">Fortune</a> magazine's <font color="green">Global 500 list</font>, which lists the top <font color="green">500</font> corporations worldwide by revenue.

The dataset is a CSV file called f500.csv. Here is a data dictionary for some of the columns in the CSV:

- <font color="red">company</font> - The Name of the company.
- <font color="red">rank</font> - The Global 500 rank for the company.
- <font color="red">revenues</font> - The company's total revenues for the fiscal year, in millions of dollars (USD).
- <font color="red">revenue_change</font> - The percentage change in revenue between the current and prior fiscal years.
- <font color="red">profits</font> - Net income for the fiscal year, in millions of dollars (USD).
- <font color="red">ceo</font> - The company's Chief Executive Officer.
- <font color="red">industry</font> - The industry in which the company operates.
- <font color="red">sector</font> - The sector in which the company operates.
- <font color="red">previous_rank</font> - The Global 500 rank for the company for the prior year.
- <font color="red">country</font> - The Country in which the company is headquartered.
- <font color="red">hq_location</font> - The city and country (or city and state for the USA) where the company is headquartered.
- <font color="red">employees</font> - Total employees (full-time equivalent, if available) at fiscal year-end.


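## Understanding pandas and NumPy
- Shape: like a NumPy array, a pandas DataFrame has a shape attribute (not a function), which holds a tuple (number of rows, number of columns).
- Type: Python's built-in type() function returns the class of a pandas object.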

In [1]:
import pandas as pd
f500 = pd.read_csv("f500.csv", index_col=0)
f500.index.name = None

# f500 data type
f500_type = type(f500)
print(f500_type)

# f500 shape
f500_shape = f500.shape
print(f500_shape, f500_shape[0], f500_shape[1])

<class 'pandas.core.frame.DataFrame'>
(500, 16) 500 16


### DataFrame.dtypes: get the data type of each column
**Object**: in the output below, the object dtype is used for string columns; pandas also uses it for columns holding mixed or arbitrary Python values that do not fit a numeric dtype.
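
As a small illustration of how column types can be inspected and grouped, the sketch below builds a hypothetical three-row frame named toy (its values are copied from the first rows of the dataset shown later) and separates numeric from non-numeric columns with select_dtypes. It is only a sketch on toy data, not part of the original exercise.

```python
import pandas as pd

# Hypothetical three-row frame; values copied from the f500 listing
toy = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "revenue_change": [0.8, -4.4, -9.1],
        "country": ["USA", "China", "China"],
    },
    index=["Walmart", "State Grid", "Sinopec Group"],
)

# One dtype per column; text columns are typically reported as object
# (newer pandas versions may report a dedicated string dtype instead)
print(toy.dtypes)

# select_dtypes filters columns by dtype family
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
text_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, text_cols)
```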

In [2]:
types = f500.dtypes
print(types)

rank                          int64
revenues                      int64
revenue_change              float64
profits                     float64
assets                        int64
profit_change               float64
ceo                          object
industry                     object
sector                       object
previous_rank                 int64
country                      object
hq_location                  object
website                      object
years_on_global_500_list      int64
employees                     int64
total_stockholder_equity      int64
dtype: object


In [3]:
# Get the first few rows of a DataFrame
firsts = f500.head(5)
firsts  # display the first 5 rows of the dataset

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [4]:
# Get the last 5 rows
lasts = f500.tail(5)
lasts

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Teva Pharmaceutical Industries,496,21903,11.5,329.0,92890,-79.3,Yitzhak Peterburg,Pharmaceuticals,Health Care,0,Israel,"Petach Tikva, Israel",http://www.tevapharm.com,1,56960,33337
New China Life Insurance,497,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507
Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111
TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006
AutoNation,500,21609,3.6,430.5,10060,-2.7,Michael J. Jackson,Specialty Retailers,Retailing,0,USA,"Fort Lauderdale, FL",http://www.autonation.com,12,26000,2310


In [5]:
# info() prints its summary and returns None, so there is nothing useful to assign
f500.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
rank                        500 non-null int64
revenues                    500 non-null int64
revenue_change              498 non-null float64
profits                     499 non-null float64
assets                      500 non-null int64
profit_change               436 non-null float64
ceo                         500 non-null object
industry                    500 non-null object
sector                      500 non-null object
previous_rank               500 non-null int64
country                     500 non-null object
hq_location                 500 non-null object
website                     500 non-null object
years_on_global_500_list    500 non-null int64
employees                   500 non-null int64
total_stockholder_equity    500 non-null int64
dtypes: float64(3), int64(7), object(6)
memory usage: 66.4+ KB


In [6]:
# verbose=False prints the short summary without the per-column listing
f500.info(verbose=False)

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Columns: 16 entries, rank to total_stockholder_equity
dtypes: float64(3), int64(7), object(6)
memory usage: 66.4+ KB


## Selecting rows and columns by label
In pandas, DataFrame.loc[] selects rows and columns by their labels, using the form df.loc[row_labels, column_labels].

In [7]:
df = pd.DataFrame([[1, 2], [3, 4], [3, 4], [5, 6]], index = ["viper","python", "dragon", "cobra"], columns=["max_speed", "shield"])
df

Unnamed: 0,max_speed,shield
viper,1,2
python,3,4
dragon,3,4
cobra,5,6


In [8]:
# a single row label returns the matching row as a Series
viper_row = df.loc['viper']
viper_row

max_speed    1
shield       2
Name: viper, dtype: int64

In [9]:
# a list of row labels returns a DataFrame with those rows
labels =['viper', 'cobra']
df.loc[labels]

Unnamed: 0,max_speed,shield
viper,1,2
cobra,5,6


In [10]:
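# Note: this reads the value into a variable and then rebinds the variable;
# df itself is not modified (df.loc['viper', 'shield'] = 3 would modify it)
asignmt_df = df.loc['viper', 'shield']
asignmt_df = 3
asignmt_df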

3
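
In the cell above, the DataFrame keeps its original value because only a Python variable is reassigned. The following sketch, on a hypothetical two-row frame named toy, contrasts rebinding a variable with assigning through .loc; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the example above
toy = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=["viper", "python"],
    columns=["max_speed", "shield"],
)

# Reading a scalar gives a plain value; rebinding the name leaves toy untouched
value = toy.loc["viper", "shield"]
value = 3
unchanged = toy.loc["viper", "shield"]

# Assigning through .loc writes into the DataFrame itself
toy.loc["viper", "shield"] = 3
changed = toy.loc["viper", "shield"]

print(unchanged, changed)
```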

In [11]:
df.loc[:, 'max_speed'] = 3
df

Unnamed: 0,max_speed,shield
viper,3,2
python,3,4
dragon,3,4
cobra,3,6


In [12]:
df.loc['viper':'dragon', 'max_speed']

viper     3
python    3
dragon    3
Name: max_speed, dtype: int64

In [13]:
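# A boolean mask returns a new DataFrame (empty here, since no row has max_speed == 0);
# the result is not assigned, so df is unchanged
df.loc[df['max_speed'] == 0]
df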

Unnamed: 0,max_speed,shield
viper,3,2
python,3,4
dragon,3,4
cobra,3,6
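
To make the behaviour of a boolean mask with .loc more concrete, here is a minimal sketch on a hypothetical copy of the four-row frame (named toy, with its original max_speed values). It assumes nothing beyond standard pandas indexing: the mask selects matching rows into a new object and leaves the source frame as it was.

```python
import pandas as pd

# Hypothetical copy of the four-row frame, with its original values
toy = pd.DataFrame(
    [[1, 2], [3, 4], [3, 4], [5, 6]],
    index=["viper", "python", "dragon", "cobra"],
    columns=["max_speed", "shield"],
)

# The comparison yields a boolean Series aligned with the index
mask = toy["max_speed"] == 3

# .loc keeps only the rows where the mask is True
fast = toy.loc[mask]

# A mask with no True values yields an empty DataFrame
none = toy.loc[toy["max_speed"] == 0]

print(fast.index.tolist(), len(none), toy.shape)
```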


In [14]:
f500.head(n=5)

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [15]:
f500_rank_country = f500.loc[:, ['rank', 'country']]
f500_rank_country.head(n=5)

Unnamed: 0,rank,country
Walmart,1,USA
State Grid,2,China
Sinopec Group,3,China
China National Petroleum,4,China
Toyota Motor,5,Japan


In [16]:
# using a callable with .loc: the lambda receives the DataFrame and returns a boolean mask,
# e.g. lambda df: df['column_label'] == 'value'
f500_portugal = f500.loc[lambda df: df['country'] == 'Portugal']
portugal_count = f500_portugal.shape[0]
portugal_count

0

In [17]:
f500.loc["Walmart"]

rank                                             1
revenues                                    485873
revenue_change                                 0.8
profits                                      13643
assets                                      198825
profit_change                                 -7.2
ceo                            C. Douglas McMillon
industry                     General Merchandisers
sector                                   Retailing
previous_rank                                    1
country                                        USA
hq_location                        Bentonville, AR
website                     http://www.walmart.com
years_on_global_500_list                        23
employees                                  2300000
total_stockholder_equity                     77798
Name: Walmart, dtype: object

In [18]:
f500['country'].head()

Walmart                       USA
State Grid                  China
Sinopec Group               China
China National Petroleum    China
Toyota Motor                Japan
Name: country, dtype: object

In [19]:
f500[['rank', 'country']].head()

Unnamed: 0,rank,country
Walmart,1,USA
State Grid,2,China
Sinopec Group,3,China
China National Petroleum,4,China
Toyota Motor,5,Japan


In [20]:
f500[['rank', 'country']].tail()

Unnamed: 0,rank,country
Teva Pharmaceutical Industries,496,Israel
New China Life Insurance,497,China
Wm. Morrison Supermarkets,498,Britain
TUI,499,Germany
AutoNation,500,USA


In [21]:
countries = f500['country']
revenues_years = f500[['revenues', 'years_on_global_500_list']]
ceo_to_sector = f500.loc[:, "ceo":"sector"]

In [22]:
countries.head()

Walmart                       USA
State Grid                  China
Sinopec Group               China
China National Petroleum    China
Toyota Motor                Japan
Name: country, dtype: object

In [23]:
revenues_years.head(10)

Unnamed: 0,revenues,years_on_global_500_list
Walmart,485873,23
State Grid,315199,17
Sinopec Group,267518,19
China National Petroleum,262573,17
Toyota Motor,254694,23
Volkswagen,240264,23
Royal Dutch Shell,240033,23
Berkshire Hathaway,223604,21
Apple,215639,15
Exxon Mobil,205004,23


In [24]:
ceo_to_sector.head()

Unnamed: 0,ceo,industry,sector
Walmart,C. Douglas McMillon,General Merchandisers,Retailing
State Grid,Kou Wei,Utilities,Energy
Sinopec Group,Wang Yupu,Petroleum Refining,Energy
China National Petroleum,Zhang Jianhua,Petroleum Refining,Energy
Toyota Motor,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts


## Selecting items from a pandas Series (a 1D labelled array)

In [25]:
# Two equivalent ways to select a single column as a Series
countries = f500['country']
countries = f500.country
print(type(countries))

<class 'pandas.core.series.Series'>


In [26]:
walmart = f500.loc['Walmart']
print(walmart)
print(type(walmart))
rank = walmart['rank']
print(rank)

rank                                             1
revenues                                    485873
revenue_change                                 0.8
profits                                      13643
assets                                      198825
profit_change                                 -7.2
ceo                            C. Douglas McMillon
industry                     General Merchandisers
sector                                   Retailing
previous_rank                                    1
country                                        USA
hq_location                        Bentonville, AR
website                     http://www.walmart.com
years_on_global_500_list                        23
employees                                  2300000
total_stockholder_equity                     77798
Name: Walmart, dtype: object
<class 'pandas.core.series.Series'>
1


In [27]:
f500.head()

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [28]:
walmart_and_toyota_motor = f500.loc[['Walmart','Toyota Motor']]
walmart_and_toyota_motor

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [29]:
# Label slices include both endpoints; f500.loc['Walmart':'Toyota Motor'] is equivalent
walmart_to_toyota_motor = f500['Walmart':'Toyota Motor']
walmart_to_toyota_motor

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [30]:
countries.head()

Walmart                       USA
State Grid                  China
Sinopec Group               China
China National Petroleum    China
Toyota Motor                Japan
Name: country, dtype: object

In [31]:
print(type(countries))

<class 'pandas.core.series.Series'>


In [32]:
countries['Walmart']

'USA'

## Selecting rows from a DataFrame by label

### Instructions
By selecting data from f500:
- Create a new variable, drink_companies, with:
  - Rows with indices Anheuser-Busch InBev, Coca-Cola, and Heineken Holding, in that order.
  - All columns.
- Create a new variable, big_movers, with:
  - Rows with indices Aviva, HP, JD.com, and BHP Billiton, in that order.
  - The rank and previous_rank columns, in that order.
- Create a new variable, middle_companies, with:
  - All rows with indices from Tata Motors to Nationwide, inclusive.
  - All columns from rank to country, inclusive.

In [33]:
# selecting a list of rows from a DataFrame
drink_companies = f500.loc[['Anheuser-Busch InBev', 'Coca-Cola', 'Heineken Holding']]
drink_companies.head()

# selecting a list of rows and a list of columns from a DataFrame
big_movers = f500.loc[['Aviva', 'HP', 'JD.com', 'BHP Billiton'], ['rank', 'previous_rank']]
big_movers.head()

middle_companies = f500.loc['Tata Motors':'Nationwide', 'rank':'country']
middle_companies

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country
Tata Motors,247,40329,-4.2,1111.6,42162,-34.0,Guenter Butschek,Motor Vehicles and Parts,Motor Vehicles & Parts,226,India
Aluminum Corp. of China,248,40278,6.0,-282.5,75089,,Yu Dehui,Metals,Materials,262,China
Mitsui,249,40275,1.6,2825.3,103231,,Tatsuo Yasunaga,Trading,Wholesalers,245,Japan
Manulife Financial,250,40238,49.4,2209.7,537461,28.9,Donald A. Guloien,"Insurance: Life, Health (stock)",Financials,394,Canada
China Minsheng Banking,251,40234,-5.2,7201.6,848389,-1.8,Zheng Wanchun,Banks: Commercial and Savings,Financials,221,China
China Pacific Insurance (Group),252,40193,2.2,1814.9,146873,-35.7,Huo Lianhong,"Insurance: Life, Health (stock)",Financials,251,China
American Airlines Group,253,40180,-2.0,2676.0,51274,-64.8,W. Douglas Parker,Airlines,Transportation,236,USA
Nationwide,254,40074,-0.4,334.3,197790,-42.4,Stephen S. Rasmussen,Insurance: Property and Casualty (Mutual),Financials,241,USA


# Descriptive statistics with Series.describe() and DataFrame.describe()
* The Series.describe() method returns descriptive statistics for the data contained within a specific pandas Series.
* A pandas Series is a one-dimensional pandas object.
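
As a quick illustration of what describe() returns, the sketch below applies it to a hypothetical five-value Series (the revenues of the five top-ranked companies listed earlier). The result is itself a Series, so individual statistics can be read by label.

```python
import pandas as pd

# Hypothetical Series holding the revenues of the five top-ranked companies
revenues = pd.Series(
    [485873, 315199, 267518, 262573, 254694],
    index=["Walmart", "State Grid", "Sinopec Group",
           "China National Petroleum", "Toyota Motor"],
    name="revenues",
)

# describe() returns a Series whose index holds the statistic names
summary = revenues.describe()
print(summary.index.tolist())

# Individual statistics are accessed by label
print(summary["count"], summary["min"], summary["50%"], summary["max"])
```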


In [34]:
revs = f500['revenues']
print(revs.describe())

count       500.000000
mean      55416.358000
std       45725.478963
min       21609.000000
25%       29003.000000
50%       40236.000000
75%       63926.750000
max      485873.000000
Name: revenues, dtype: float64


### Instructions
* Use the appropriate describe() method to:
   * Return a series of descriptive statistics for the profits column, and assign the result to profits_desc.
   * Return a dataframe of descriptive statistics for the revenues and employees columns, in order, and assign the result to revenue_and_employees_desc.

In [35]:
# Series.describe() returns a Series of descriptive statistics,
# here for the profits column
profits = f500['profits']
profits_desc = profits.describe()
print(profits_desc)

count      499.000000
mean      3055.203206
std       5171.981071
min     -13038.000000
25%        556.950000
50%       1761.600000
75%       3954.000000
max      45687.000000
Name: profits, dtype: float64


In [36]:
# DataFrame.describe() returns a DataFrame of descriptive statistics, one column per selected column
selected_columns = ['revenues', 'employees']
revenues_and_employees = f500[selected_columns]
revenues_and_employees.head()
revenues_and_employees_desc = revenues_and_employees.describe()
print(revenues_and_employees_desc)

            revenues     employees
count     500.000000  5.000000e+02
mean    55416.358000  1.339983e+05
std     45725.478963  1.700878e+05
min     21609.000000  3.280000e+02
25%     29003.000000  4.293250e+04
50%     40236.000000  9.291050e+04
75%     63926.750000  1.689172e+05
max    485873.000000  2.300000e+06


In [37]:
# By default, DataFrame.describe() only summarises numeric columns; its include parameter
# (e.g. include=['O']) adds object (string) columns. For a Series the parameter is ignored,
# and describe() summarises whatever dtype the Series holds.
print(f500["country"].describe(include=['O']))

count     500
unique     34
top       USA
freq      132
Name: country, dtype: object


In [38]:
selected_columns = ['revenues', 'profits']
renues_and_profits = f500[selected_columns]
renues_and_profits.head()

# Like NumPy ndarrays, DataFrames have methods that compute along an axis:
# axis=0 (or "index") gives one result per column, axis=1 (or "columns") one result per row.
# DataFrame.describe() gives a broader set of descriptive statistics.

Unnamed: 0,revenues,profits
Walmart,485873,13643.0
State Grid,315199,9571.3
Sinopec Group,267518,1257.9
China National Petroleum,262573,1867.5
Toyota Motor,254694,16899.3


In [39]:
# Median of the revenues and profits columns, computed down the rows (one value per column)
medians = renues_and_profits.median(axis=0)
medians = renues_and_profits.median(axis="index")

# axis=0 (the index axis) is the default, so this call is equivalent
medians = renues_and_profits.median()
# we could also use .median(axis="index")
print(medians)

revenues    40236.0
profits      1761.6
dtype: float64
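
To show how the axis argument changes the result, here is a small sketch on a hypothetical three-row frame named toy (values copied from the first three companies). With axis=0 the median is taken down each column; with axis=1 it is taken across each row.

```python
import pandas as pd

# Hypothetical three-row frame; values copied from the f500 listing
toy = pd.DataFrame(
    {
        "revenues": [485873, 315199, 267518],
        "profits": [13643.0, 9571.3, 1257.9],
    },
    index=["Walmart", "State Grid", "Sinopec Group"],
)

# axis=0 / axis="index": one median per column
per_column = toy.median(axis=0)

# axis=1 / axis="columns": one median per row
per_row = toy.median(axis=1)

print(per_column)
print(per_row)
```

Taking a median across revenues and profits of one company is not meaningful in itself; the row-wise call is shown only to make the axis semantics visible.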


# pandas.Series.value_counts
Return a Series containing counts of unique values.
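
A minimal sketch of value_counts() on a hypothetical six-value Series is given here. It shows that the result is sorted by count in descending order, and that normalize=True returns proportions instead of counts.

```python
import pandas as pd

# Hypothetical Series of country labels
countries = pd.Series(["USA", "China", "China", "China", "Japan", "USA"])

# Counts of each unique value, most frequent first
counts = countries.value_counts()
print(counts.head(2))

# normalize=True returns relative frequencies that sum to 1
shares = countries.value_counts(normalize=True)
print(shares)
```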

In [40]:
print(f500["sector"].value_counts().head(3))

Financials    118
Energy         80
Technology     44
Name: sector, dtype: int64


### Instructions

* Use <font color="red">Series.value_counts()</font> and <font color="red">Series.head()</font> to return the three most common values for the <font color="red">country</font> column, and assign the results to <font color="red">top3_countries</font>.
* Use <font color="red">Series.value_counts()</font> and <font color="red">Series.head()</font> to return the three most common values for the <font color="red">previous_rank</font> column, and assign the results to <font color="red">top3_previous_rank</font>.
* Use the appropriate max() method to find the maximum value for only the numeric columns from f500 (you may need to check the documentation), and assign the result to the variable max_f500.
* After you have run your code, use the variable inspector to view each of the new variables you created.

In [41]:
countries = f500['country'].value_counts()
top3_countries = countries.head(n=3)
print(top3_countries)
print("\n")

previous_rank = f500['previous_rank'].value_counts()
top3_previous_rank = previous_rank.head(n=3)
print(top3_previous_rank)
print("\n")

# get the max of the numeric columns only with the DataFrame.max() method
max_f500 = f500.max(numeric_only=True)
print(max_f500)

USA      132
China    109
Japan     51
Name: country, dtype: int64


0      33
159     1
147     1
Name: previous_rank, dtype: int64


rank                            500.0
revenues                     485873.0
revenue_change                  442.3
profits                       45687.0
assets                      3473238.0
profit_change                  8909.5
previous_rank                   500.0
years_on_global_500_list         23.0
employees                   2300000.0
total_stockholder_equity     301893.0
dtype: float64


In [42]:
# top 5 rank and revenues
top5_rank_revenues = f500[["rank", "revenues"]].head(n=5)
top5_rank_revenues

Unnamed: 0,rank,revenues
Walmart,1,485873
State Grid,2,315199
Sinopec Group,3,267518
China National Petroleum,4,262573
Toyota Motor,5,254694


In [43]:
# companies whose previous_rank is 0
bool_previous_rank = f500['previous_rank'] == 0
f500[bool_previous_rank].head()

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Legal & General Group,49,105235,442.3,1697.9,577954,3.4,Nigel Wilson,"Insurance: Life, Health (stock)",Financials,0,Britain,"London, Britain",http://www.legalandgeneralgroup.com,17,8939,8579
Uniper,91,74407,,-3557.5,51541,,Klaus Schafer,Energy,Energy,0,Germany,"Dusseldorf, Germany",http://www.uniper.energy,1,12890,12889
Dell Technologies,124,64806,18.1,-1672.0,118206,,Michael S. Dell,"Computers, Office Equipment",Technology,0,USA,"Round Rock, TX",http://www.delltechnologies.com,17,138000,13243
Anbang Insurance Group,139,60800,124.0,3883.9,430040,0.9,Wu Xiaohui,"Insurance: Life, Health (Mutual)",Financials,0,China,"Beijing, China",http://www.anbanggroup.com,1,40707,20372
Albertsons Cos.,141,59678,1.6,-373.3,23755,,Robert G. Miller,Food and Drug Stores,Food & Drug Stores,0,USA,"Boise, ID",http://www.albertsons.com,13,273000,1371


### Instructions
- Add a new column, revenues_b, to the f500 dataframe by using vectorized division to divide the values in the existing revenues column by 1000 (converting them from millions to billions).
- The company 'Dow Chemical' has named a new CEO. Update the value where the index label is Dow Chemical and the column is ceo to Jim Fitterling.

In [44]:
f500['revenues_b'] = f500['revenues']/1000
revenues_b = f500['revenues_b']
revenues_b.value_counts().head(5)

30.390    2
29.003    2
23.044    2
53.427    1
65.547    1
Name: revenues_b, dtype: int64

In [45]:
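# update the ceo value for the row labelled 'Dow Chemical', then read it back
f500.loc['Dow Chemical', 'ceo'] = 'Jim Fitterling'
dow_chemical_ceo = f500.loc['Dow Chemical', 'ceo']
dow_chemical_ceo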

'Jim Fitterling'

In [46]:
motor_bool = f500["industry"] == "Motor Vehicles and Parts"
print(motor_bool.head())

Walmart                     False
State Grid                  False
Sinopec Group               False
China National Petroleum    False
Toyota Motor                 True
Name: industry, dtype: bool


In [47]:
f500.loc[motor_bool,'country'].head()

Toyota Motor        Japan
Volkswagen        Germany
Daimler           Germany
General Motors        USA
Ford Motor            USA
Name: country, dtype: object

### Boolean indexing with pandas

### Instructions
- Create a boolean series, kr_bool, that compares whether the values in the country column from the f500 dataframe are equal to "South Korea"
- Use that boolean series to index the full f500 dataframe, assigning just the first five rows to top_5_kr.

In [48]:
kr_bool = f500['country'] == 'South Korea'
top_5_kr = f500[kr_bool].head(5)
top_5_kr

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,revenues_b
Samsung Electronics,15,173957,-2.0,19316.5,217104,16.8,Oh-Hyun Kwon,"Electronics, Electrical Equip.",Technology,13,South Korea,"Suwon, South Korea",http://www.samsung.com,23,325000,154376,173.957
Hyundai Motor,78,80701,-0.8,4659.0,148092,-17.9,Mong-Koo Chung,Motor Vehicles and Parts,Motor Vehicles & Parts,84,South Korea,"Seoul, South Korea",http://worldwide.hyundai.com,22,129315,55639,80.701
SK Holdings,95,72579,107.4,659.7,85332,-86.0,Tae Won Chey,Petroleum Refining,Energy,294,South Korea,"Seoul, South Korea",http://www.sk.co.kr,2,84000,10858,72.579
Korea Electric Power,177,51500,-0.6,6074.1,147265,-48.3,Hwan-Eik Cho,Utilities,Energy,172,South Korea,"Jeollanam-do, South Korea",http://www.kepco.co.kr,23,43688,59394,51.5
LG Electronics,201,47712,-4.6,66.2,31348,-39.8,Seong-Jin Jo,"Electronics, Electrical Equip.",Technology,180,South Korea,"Seoul, South Korea",http://www.lg.com,17,75000,9926,47.712


# Exploring data with pandas
As seen in the first course of this step, we can select rows and columns of pandas Series and DataFrames by label. In some situations label-based selection is cumbersome, particularly when the dataset is large. For these cases pandas provides position-based selection of rows and columns through **iloc** (integer location).
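
The difference between label-based and position-based selection can be sketched on a hypothetical three-row frame named toy. Note in particular that a label slice with loc includes its end label, whereas a position slice with iloc excludes its end position, as in ordinary Python slicing.

```python
import pandas as pd

# Hypothetical three-row frame; values copied from the f500 listing
toy = pd.DataFrame(
    {"rank": [1, 2, 3], "country": ["USA", "China", "China"]},
    index=["Walmart", "State Grid", "Sinopec Group"],
)

# Same cell, addressed by label and by integer position
by_label = toy.loc["State Grid", "country"]
by_position = toy.iloc[1, 1]

# loc slices include the end label; iloc slices exclude the end position
label_slice = toy.loc["Walmart":"State Grid"]
position_slice = toy.iloc[0:1]

print(by_label, by_position, len(label_slice), len(position_slice))
```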

In [51]:
# Create a DataFrame from a list of dicts
mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
df = pd.DataFrame(mydict)
df

Unnamed: 0,a,b,c,d
0,1,2,3,4
1,100,200,300,400
2,1000,2000,3000,4000


In [63]:
df.iloc[:, [0, 2]]

Unnamed: 0,a,c
0,1,3
1,100,300
2,1000,3000


In [69]:
# Select the rows whose index value is even, using a callable that returns a boolean mask
callable_function_selection = df[lambda x: x.index % 2 == 0]
callable_function_selection

Unnamed: 0,a,b,c,d
0,1,2,3,4
2,1000,2000,3000,4000


In [67]:
# Single row selection returns a Series (loc[0] works here because the index labels are the integers 0, 1, 2)
single_row = df.loc[0]
print(type(single_row))

<class 'pandas.core.series.Series'>


In [75]:
# DataFrame slicing with integers selects rows by position; the end position is excluded
first_and_second_row = df[0:2]
first_and_second_row

Unnamed: 0,a,b,c,d
0,1,2,3,4
1,100,200,300,400


### Instructions
We have provided code to read the f500.csv file into a dataframe and assigned it to f500, and inserted NaN values into the previous_rank column as we did in the previous mission.

* Select just the fifth row of the f500 dataframe, assigning the result to fifth_row.
* Select the first three rows of the f500 dataframe, assigning the result to first_three_rows.
* Select the first and seventh rows and the first 5 columns of the f500 dataframe, assigning the result to first_seventh_row_slice
* After you have run your code, use the variable inspector to examine each of the objects you created.

In [77]:
import numpy as np
f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan
f500.head()

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,revenues_b
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1.0,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798,485.873
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2.0,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456,315.199
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4.0,China,"Beijing, China",http://www.sinopec.com,19,713288,106523,267.518
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3.0,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893,262.573
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8.0,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210,254.694


In [86]:
fifth_row = f500.iloc[4]
fifth_row

first_three_rows = f500.iloc[:3]
# first_three_rows = f500.iloc[[0, 1, 2]]
first_three_rows

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,revenues_b
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1.0,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798,485.873
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2.0,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456,315.199
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4.0,China,"Beijing, China",http://www.sinopec.com,19,713288,106523,267.518


In [87]:
first_seventh_row_slice = f500.iloc[[0,6], :5]
first_seventh_row_slice

Unnamed: 0,rank,revenues,revenue_change,profits,assets
Walmart,1,485873,0.8,13643.0,198825
Royal Dutch Shell,7,240033,-11.8,4575.0,411275


## Reading CSV files with pandas
pandas has a built-in function for reading CSV files, pd.read_csv(). It takes the filename as its first argument; the optional index_col parameter is an integer indicating which column to use as the row index of the DataFrame.

Let's have a look at the following CSV data:

company,rank,revenues,revenue_change
Walmart,1,485873,0.8
State Grid,2,315199,-4.4
Sinopec Group,3,267518,-9.1
China National Petroleum,4,262573,-12.3
Toyota Motor,5,254694,7.7

In [None]:
# f500 = pd.read_csv('f500.csv', index_col=0)
# positional argument: the filename, 'f500.csv'
# keyword argument index_col=0: use the first column (company) as the row labels.
# When doing this, the column name "company" is kept as the name of the index
# and is displayed above the row labels.
# To remove it, set DataFrame.index.name to None:
# f500.index.name = None

# We can also read the CSV file without the optional index_col parameter; the rows
# then get a default integer index (0, 1, 2, ...)
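
The two ways of reading the file can be sketched without the actual f500.csv by feeding read_csv an in-memory string. The csv_text sample below reuses the first rows of the CSV excerpt shown above; io.StringIO is used only so that the example is self-contained.

```python
import io

import pandas as pd

# In-memory stand-in for f500.csv, using rows from the excerpt above
csv_text = """company,rank,revenues,revenue_change
Walmart,1,485873,0.8
State Grid,2,315199,-4.4
Sinopec Group,3,267518,-9.1
"""

# index_col=0: the company column becomes the row labels
labelled = pd.read_csv(io.StringIO(csv_text), index_col=0)
labelled.index.name = None  # drop the "company" index name

# No index_col: company stays a regular column and rows get integer labels
positional = pd.read_csv(io.StringIO(csv_text))

print(labelled.index.tolist())
print(positional.index.tolist())
```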
