# Introduction to pandas

## Understanding pandas and NumPy


<p></p><center><img alt="anatomy of a dataframe" src="images/df_anatomy_static_resized.svg"></center><p></p>

## About pandas

Key features of pandas:
- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing, and subsetting of large data sets.
- Insertion and deletion of columns in a data structure.
- Grouping of data for aggregation and transformations.
- High-performance merging and joining of data.
- Time series functionality.

## Importing pandas

[Installation guide](https://pandas.pydata.org/docs/getting_started/install.html)

In [108]:
import pandas as pd

In [109]:
import numpy as np

In [110]:
pd.__version__

'1.5.0'

In [111]:
pd.read_csv?

More detailed documentation, along with tutorials and other resources, can be found at http://pandas.pydata.org/.

## Introduction to the Data

<p>The data set is a CSV file called <code>f500.csv</code>. Here is a data dictionary for some of the columns in the CSV:</p>
<ul>
<li><code>company</code>: Name of the company.</li>
<li><code>rank</code>: Global 500 rank for the company.</li>
<li><code>revenues</code>: Company's total revenue for the fiscal year, in millions of dollars (USD).</li>
<li><code>revenue_change</code>: Percentage change in revenue between the current and prior fiscal year.</li>
<li><code>profits</code>: Net income for the fiscal year, in millions of dollars (USD).</li>
<li><code>ceo</code>: Company's Chief Executive Officer.</li>
<li><code>industry</code>: Industry in which the company operates.</li>
<li><code>sector</code>: Sector in which the company operates.</li>
<li><code>previous_rank</code>: Global 500 rank for the company for the prior year.</li>
<li><code>country</code>: Country in which the company is headquartered.</li>
</ul>

<img src="images/02_io_readwrite.svg">

In [112]:
f500 = pd.read_csv("data/f500.csv", index_col=0)

In [113]:
f500

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Teva Pharmaceutical Industries,496,21903,11.5,329.0,92890,-79.3,Yitzhak Peterburg,Pharmaceuticals,Health Care,0,Israel,"Petach Tikva, Israel",http://www.tevapharm.com,1,56960,33337
New China Life Insurance,497,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507
Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111
TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006


In [114]:
type(f500)

pandas.core.frame.DataFrame

In [115]:
f500.shape

(500, 16)
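The effect of `index_col=0` can be reproduced on a small scale. The sketch below reads a hypothetical three-row excerpt (an in-memory string standing in for `data/f500.csv`, with only a few of the real columns) and shows that the first column becomes the row index and is therefore not counted in `shape`.

```python
import io
import pandas as pd

# Hypothetical three-row excerpt standing in for data/f500.csv.
csv_text = """company,rank,revenues,country
Walmart,1,485873,USA
State Grid,2,315199,China
Sinopec Group,3,267518,China
"""

# index_col=0 turns the first column (company) into the row labels.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(sample.shape)        # (rows, columns); the index is not counted as a column
print(sample.index.name)   # name of the column that became the index
```

This is consistent with the output above: the 500 companies are the row labels, and the remaining 16 columns make up the second dimension of `f500.shape`.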

## Introducing Pandas Objects - Data Structures

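One of the keys to understanding pandas is to understand its data model. pandas has historically been built around three data structures:

- Series: 1D (can be understood as a column of a spreadsheet)

<img src="images/01_table_series.svg">

- DataFrame: 2D (can be understood as a single spreadsheet)

<img src="images/01_table_dataframe.svg">

- Panel: 3D (can be understood as a group of spreadsheets). Panel was deprecated in pandas 0.20 and removed in 0.25, so it is not available in the version used here (1.5.0); a DataFrame with a MultiIndex is the usual replacement.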

<table class="table table-bordered">
<tbody><tr>
<th style="text-align:center;">Data Structure</th>
<th style="text-align:center;">Dimensions</th>
<th style="text-align:center;">Description</th>
</tr>
<tr>
<td style="text-align:center;">Series</td>
<td style="text-align:center;">1</td>
<td style="text-align:center;">1D labeled homogeneous array, size-immutable.</td>
</tr>
<tr>
<td style="text-align:center;">DataFrame</td>
<td style="text-align:center;">2</td>
<td style="text-align:center;">General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.</td>
</tr>
<tr>
<td style="text-align:center;">Panel</td>
<td style="text-align:center;">3</td>
<td style="text-align:center;">General 3D labeled, size-mutable array (removed in pandas 0.25).</td>
</tr>
</tbody></table>
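To make the two structures in current use concrete, here is a minimal sketch that builds a Series and a DataFrame by hand. The values are copied from the first two rows of the data set; the variable names are only illustrative.

```python
import pandas as pd

# A Series is a 1D labeled array with a single dtype.
revenues = pd.Series([485873, 315199], index=["Walmart", "State Grid"], name="revenues")

# A DataFrame is a 2D table; each column is a Series that shares the same index.
df = pd.DataFrame(
    {"revenues": [485873, 315199], "country": ["USA", "China"]},
    index=["Walmart", "State Grid"],
)

print(revenues.ndim, df.ndim)   # number of dimensions of each object
print(df.dtypes)                # one dtype per column
print(type(df["revenues"]))     # selecting one column gives back a Series
```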

## Introducing DataFrames

In [116]:
f500.head(4)

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893


In [117]:
f500.tail(3)

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111
TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006
AutoNation,500,21609,3.6,430.5,10060,-2.7,Michael J. Jackson,Specialty Retailers,Retailing,0,USA,"Fort Lauderdale, FL",http://www.autonation.com,12,26000,2310


In [118]:
f500.dtypes

rank                          int64
revenues                      int64
revenue_change              float64
profits                     float64
assets                        int64
profit_change               float64
ceo                          object
industry                     object
sector                       object
previous_rank                 int64
country                      object
hq_location                  object
website                      object
years_on_global_500_list      int64
employees                     int64
total_stockholder_equity      int64
dtype: object

In [119]:
f500.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   rank                      500 non-null    int64  
 1   revenues                  500 non-null    int64  
 2   revenue_change            498 non-null    float64
 3   profits                   499 non-null    float64
 4   assets                    500 non-null    int64  
 5   profit_change             436 non-null    float64
 6   ceo                       500 non-null    object 
 7   industry                  500 non-null    object 
 8   sector                    500 non-null    object 
 9   previous_rank             500 non-null    int64  
 10  country                   500 non-null    object 
 11  hq_location               500 non-null    object 
 12  website                   500 non-null    object 
 13  years_on_global_500_list  500 non-null    int64  
 14  em
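The Non-Null Count column reported by `info()` can be cross-checked by counting missing values directly. The sketch below uses a hypothetical miniature frame built from three rows of the data set, two of which have no `profit_change` value; `isna()` and `notna()` are standard pandas methods.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with two gaps in profit_change.
mini = pd.DataFrame(
    {
        "rank": [6, 7, 8],
        "profit_change": [np.nan, 135.9, np.nan],
    },
    index=["Volkswagen", "Royal Dutch Shell", "Berkshire Hathaway"],
)

missing = mini.isna().sum()      # missing values per column
non_null = mini.notna().sum()    # the figure info() reports as "Non-Null Count"

print(missing)
print(non_null)
```

Applied to `f500`, the same call would show which columns contain gaps, such as `profit_change` with 436 non-null values out of 500.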

## Pandas Data Selection - indexing

### Selecting a Column From a DataFrame by Label (.loc)

    df.loc[row_label, column_label]

<img src="images/03_subset_columns.svg">

In [120]:
f500.head(2)

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456


In [121]:
rank_col = f500.loc[:, "rank"]

In [122]:
rank_col = f500["rank"]

In [123]:
rank_col

company
Walmart                             1
State Grid                          2
Sinopec Group                       3
China National Petroleum            4
Toyota Motor                        5
                                 ... 
Teva Pharmaceutical Industries    496
New China Life Insurance          497
Wm. Morrison Supermarkets         498
TUI                               499
AutoNation                        500
Name: rank, Length: 500, dtype: int64

In [124]:
industries = f500["industry"]

In [125]:
type(industries)

pandas.core.series.Series

In [126]:
type(industries.values)

numpy.ndarray

In [127]:
industries.shape

(500,)

In [128]:
print(industries.values.dtype)

object


<div>
<p><img alt="dataframe exploded" src="images/df_exploded_resized.svg"></p>
</div>

In [129]:
f500.loc[:, ["country", "rank"]]

company,country,rank
Walmart,USA,1
State Grid,China,2
Sinopec Group,China,3
China National Petroleum,China,4
Toyota Motor,Japan,5
...,...,...
Teva Pharmaceutical Industries,Israel,496
New China Life Insurance,China,497
Wm. Morrison Supermarkets,Britain,498
TUI,Germany,499


In [130]:
f500[["country", "rank"]]

company,country,rank
Walmart,USA,1
State Grid,China,2
Sinopec Group,China,3
China National Petroleum,China,4
Toyota Motor,Japan,5
...,...,...
Teva Pharmaceutical Industries,Israel,496
New China Life Insurance,China,497
Wm. Morrison Supermarkets,Britain,498
TUI,Germany,499


In [131]:
f500.loc[:, "profits":"ceo"]

company,profits,assets,profit_change,ceo
Walmart,13643.0,198825,-7.2,C. Douglas McMillon
State Grid,9571.3,489838,-6.2,Kou Wei
Sinopec Group,1257.9,310726,-65.0,Wang Yupu
China National Petroleum,1867.5,585619,-73.7,Zhang Jianhua
Toyota Motor,16899.3,437575,-12.3,Akio Toyoda
...,...,...,...,...
Teva Pharmaceutical Industries,329.0,92890,-79.3,Yitzhak Peterburg
New China Life Insurance,743.9,100609,-45.6,Wan Feng
Wm. Morrison Supermarkets,406.4,11630,20.4,David T. Potts
TUI,1151.7,16247,195.5,Friedrich Joussen


<div>

<p>A summary of the techniques we've learned so far is below:</p>
<p></p><center>
<table>
<thead>
<tr>
<th>Select by Label</th>
<th>Explicit Syntax</th>
<th>Common Shorthand</th>
<th>Other Shorthand</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single column</td>
<td><code>df.loc[:,"col1"]</code></td>
<td bgcolor="#00FF00"><code>df["col1"]</code></td>
<td><code>df.col1</code></td>
</tr>
<tr>
<td>List of columns</td>
<td><code>df.loc[:,["col1", "col7"]]</code></td>
<td bgcolor="#00FF00"><code>df[["col1", "col7"]]</code></td>
<td></td>
</tr>
<tr>
<td>Slice of columns</td>
<td bgcolor="#00FF00"><code>df.loc[:,"col1":"col4"]</code></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</center><p></p>
</div>
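Two details of the table are easy to miss: label slices include both endpoints, and the `df.col1` shorthand only works when the column name does not clash with an existing DataFrame attribute or method. The sketch below illustrates both on a small hand-built frame; note that a column called `rank` clashes with the `DataFrame.rank()` method.

```python
import pandas as pd

df = pd.DataFrame(
    {"rank": [1, 2], "revenues": [485873, 315199], "country": ["USA", "China"]},
    index=["Walmart", "State Grid"],
)

# Bracket and .loc selection always return the column.
by_bracket = df["rank"]
by_loc = df.loc[:, "rank"]

# Attribute access works only when the name does not clash with a DataFrame
# attribute or method. "rank" clashes with DataFrame.rank(), so df.rank is the
# bound method rather than the column.
print(callable(df.rank))        # True: this is the method, not the data
print(type(df.country))         # a Series, because "country" does not clash

# Label slices include both endpoints.
print(list(df.loc[:, "rank":"revenues"].columns))
```

This is one reason the bracket form is the safer default for column selection.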

<div class="alert alert-block alert-info">
<b>Exercise:</b> Select the <code>country</code> column. Assign the result to the variable <code>countries</code>.</div>

In [132]:
countries = f500["country"]

<div class="alert alert-block alert-info">
<b>Exercise:</b> Select the <code>revenues</code> and <code>years_on_global_500_list</code> columns, in that order. Assign the result to the variable <code>revenues_years</code>.</div>

In [133]:
revenues_years = f500[["revenues", "years_on_global_500_list"]]

<div class="alert alert-block alert-info">
<b>Exercise:</b> Select all columns from <code>ceo</code> up to and including <code>sector</code>, in order. Assign the result to the variable <code>ceo_to_sector</code>.</div>

In [134]:
ceo_to_sector = f500.loc[:, "ceo":"sector"]

### Selecting Rows From a DataFrame by Label (.loc)

    df.loc[row_label, column_label]

<img src="images/03_subset_rows.svg">

**Select a single row**

In [135]:
f500.head()

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [136]:
single_row = f500.loc["Sinopec Group"]

In [137]:
type(single_row)

pandas.core.series.Series

In [138]:
print(single_row.dtype)

object


In [139]:
single_row

rank                                             3
revenues                                    267518
revenue_change                                -9.1
profits                                     1257.9
assets                                      310726
profit_change                                -65.0
ceo                                      Wang Yupu
industry                        Petroleum Refining
sector                                      Energy
previous_rank                                    4
country                                      China
hq_location                         Beijing, China
website                     http://www.sinopec.com
years_on_global_500_list                        19
employees                                   713288
total_stockholder_equity                    106523
Name: Sinopec Group, dtype: object

**Select a list of rows**

In [140]:
rows = ["Toyota Motor", "Walmart"]
f500.loc[rows]

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798


**Select a slice object with labels**

In [141]:
f500.loc["State Grid":"Toyota Motor"]

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


<img alt="series vs dataframe: series" src="images/df_series_s_updated.svg">

<img alt="series vs dataframe: dataframe" src="images/df_series_df_updated.svg">
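Whether row selection returns a Series or a DataFrame depends on what is passed to `.loc`. The following sketch, on a small hand-built frame, contrasts a single label, a one-element list, and a label slice.

```python
import pandas as pd

df = pd.DataFrame(
    {"rank": [1, 2, 3], "country": ["USA", "China", "China"]},
    index=["Walmart", "State Grid", "Sinopec Group"],
)

one_row = df.loc["State Grid"]              # Series; mixed columns are upcast to object
one_row_frame = df.loc[["State Grid"]]      # a one-element list keeps a DataFrame
row_slice = df.loc["Walmart":"State Grid"]  # label slice, both endpoints included

print(type(one_row).__name__, one_row.dtype)
print(one_row_frame.shape)
print(list(row_slice.index))
```

The `object` dtype of the single row matches the `single_row.dtype` output above: a row mixes numbers and strings, so no narrower dtype fits.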

### Selecting Items from a Series by Label (.loc)

In [142]:
sectors = f500["sector"]

In [143]:
count_sectors = sectors.value_counts()

In [144]:
count_sectors

Financials                       118
Energy                            80
Technology                        44
Motor Vehicles & Parts            34
Wholesalers                       28
Health Care                       27
Food & Drug Stores                20
Transportation                    19
Telecommunications                18
Retailing                         17
Food, Beverages & Tobacco         16
Materials                         16
Industrials                       15
Aerospace & Defense               14
Engineering & Construction        13
Chemicals                          7
Household Products                 3
Media                              3
Hotels, Restaurants & Leisure      3
Business Services                  3
Apparel                            2
Name: sector, dtype: int64

In [145]:
count_sectors["Materials"]

16

In [146]:
count_sectors[["Materials", "Media"]]

Materials    16
Media         3
Name: sector, dtype: int64

<table>
<thead>
<tr>
<th>Select by Label</th>
<th>Explicit Syntax</th>
<th>Shorthand Convention</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single item from series</td>
<td><code>s.loc["item8"]</code></td>
<td bgcolor="#00FF00"> <code>s["item8"]</code></td>
</tr>
<tr>
<td>List of items from series</td>
<td><code>s.loc[["item1","item7"]]</code></td>
<td bgcolor="#00FF00"><code>s[["item1","item7"]]</code></td>
</tr>
<tr>
<td>Slice of items from series</td>
<td><code>s.loc["item2":"item4"]</code></td>
<td bgcolor="#00FF00"><code>s["item2":"item4"]</code></td>
</tr>
</tbody>
</table>
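The three rows of the table can be tried on a shortened copy of the sector counts. The sketch below rebuilds the first four counts by hand (values copied from the output above) and uses the explicit `.loc` form throughout.

```python
import pandas as pd

# The first four counts copied from the value_counts() output above.
counts = pd.Series(
    [118, 80, 44, 34],
    index=["Financials", "Energy", "Technology", "Motor Vehicles & Parts"],
    name="sector",
)

single = counts.loc["Energy"]                       # scalar
several = counts.loc[["Technology", "Financials"]]  # Series, in the order requested
span = counts.loc["Energy":"Technology"]            # label slice, endpoints included

print(single)
print(list(several))
print(list(span.index))
```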

### Summary of label selection (.loc)

<table>
<thead>
<tr>
<th>Select by Label</th>
<th>Explicit Syntax</th>
<th>Shorthand Convention</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single column from dataframe</td>
<td><code>df.loc[:,"col1"]</code></td>
<td bgcolor="#00FF00"><code>df["col1"]</code></td>
</tr>
<tr>
<td>List of columns from dataframe</td>
<td><code>df.loc[:,["col1","col7"]]</code></td>
<td bgcolor="#00FF00"><code>df[["col1","col7"]]</code></td>
</tr>
<tr>
<td>Slice of columns from dataframe</td>
<td bgcolor="#00FF00"><code>df.loc[:,"col1":"col4"]</code></td>
<td></td>
</tr>
<tr>
<td>Single row from dataframe</td>
<td bgcolor="#00FF00"><code>df.loc["row4"]</code></td>
<td></td>
</tr>
<tr>
<td>List of rows from dataframe</td>
<td bgcolor="#00FF00"><code>df.loc[["row1", "row8"]]</code></td>
<td></td>
</tr>
<tr>
<td>Slice of rows from dataframe</td>
<td bgcolor="#00FF00"><code>df.loc["row3":"row5"]</code></td>
<td><code>df["row3":"row5"]</code></td>
</tr>
<tr>
<td>Single item from series</td>
<td><code>s.loc["item8"]</code></td>
<td bgcolor="#00FF00"><code>s["item8"]</code></td>
</tr>
<tr>
<td>List of items from series</td>
<td><code>s.loc[["item1","item7"]]</code></td>
<td bgcolor="#00FF00"><code>s[["item1","item7"]]</code></td>
</tr>
<tr>
<td>Slice of items from series</td>
<td><code>s.loc["item2":"item4"]</code></td>
<td bgcolor="#00FF00"><code>s["item2":"item4"]</code></td>
</tr>
</tbody>
</table>
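One asymmetry in the summary deserves a quick check: on a DataFrame, square brackets with a single label select a column, while square brackets with a slice select rows. A minimal sketch on a hand-built frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"rank": [1, 2, 3], "country": ["USA", "China", "China"]},
    index=["Walmart", "State Grid", "Sinopec Group"],
)

column = df["rank"]                         # a single label selects a column
rows = df["Walmart":"State Grid"]           # a slice selects rows
same_rows = df.loc["Walmart":"State Grid"]  # explicit equivalent

print(rows.equals(same_rows))
```

Because the shorthand changes meaning with the argument type, the explicit `.loc` form is usually the clearer choice for rows.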

<div class="alert alert-block alert-info">
<b>Exercise:</b> Create a new variable <code>big_movers</code> containing the rows with index labels Aviva, HP, JD.com, and BHP Billiton, in that order, and the <code>rank</code> and <code>previous_rank</code> columns, in that order.</div>

In [147]:
big_movers = f500.loc[["Aviva", "HP", "JD.com", "BHP Billiton"], ["rank","previous_rank"]]
big_movers

company,rank,previous_rank
Aviva,90,279
HP,194,48
JD.com,261,366
BHP Billiton,350,168


<div class="alert alert-block alert-info">
<b>Exercise:</b> Create a new variable <code>bottom_companies</code> containing all rows with index labels from National Grid to AutoNation, inclusive, and the <code>rank</code>, <code>sector</code>, and <code>country</code> columns.</div>

In [148]:
bottom_companies = f500.loc["National Grid":"AutoNation", ["rank","sector","country"]]
bottom_companies

company,rank,sector,country
National Grid,491,Energy,Britain
Dollar General,492,Retailing,USA
Telecom Italia,493,Telecommunications,Italy
Xiamen ITG Holding Group,494,Wholesalers,China
Xinjiang Guanghui Industry Investment,495,Wholesalers,China
Teva Pharmaceutical Industries,496,Health Care,Israel
New China Life Insurance,497,Financials,China
Wm. Morrison Supermarkets,498,Food & Drug Stores,Britain
TUI,499,Business Services,Germany
AutoNation,500,Retailing,USA


In [149]:
f500.head(10)

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
Volkswagen,6,240264,1.5,5937.3,432116,,Matthias Muller,Motor Vehicles and Parts,Motor Vehicles & Parts,7,Germany,"Wolfsburg, Germany",http://www.volkswagen.com,23,626715,97753
Royal Dutch Shell,7,240033,-11.8,4575.0,411275,135.9,Ben van Beurden,Petroleum Refining,Energy,5,Netherlands,"The Hague, Netherlands",http://www.shell.com,23,89000,186646
Berkshire Hathaway,8,223604,6.1,24074.0,620854,,Warren E. Buffett,Insurance: Property and Casualty (Stock),Financials,11,USA,"Omaha, NE",http://www.berkshirehathaway.com,21,367700,283001
Apple,9,215639,-7.7,45687.0,321686,-14.4,Timothy D. Cook,"Computers, Office Equipment",Technology,9,USA,"Cupertino, CA",http://www.apple.com,15,116000,128249
Exxon Mobil,10,205004,-16.7,7840.0,330314,-51.5,Darren W. Woods,Petroleum Refining,Energy,6,USA,"Irving, TX",http://www.exxonmobil.com,23,72700,167325


In [150]:
f500.loc["China National Petroleum":"Volkswagen", ["ceo", "industry", "country"]]

company,ceo,industry,country
China National Petroleum,Zhang Jianhua,Petroleum Refining,China
Toyota Motor,Akio Toyoda,Motor Vehicles and Parts,Japan
Volkswagen,Matthias Muller,Motor Vehicles and Parts,Germany


## Vectorized Operations

<p><img alt="Vectorized operation" src="images/vectorized.gif"></p>

In [151]:
my_series = pd.Series([1, 2, 3, 4, 5])
my_series

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [152]:
my_series = my_series + 10

In [153]:
my_series

0    11
1    12
2    13
3    14
4    15
dtype: int64

<div>
<ul>
<li><code>series_a + series_b</code> - Addition</li>
<li><code>series_a - series_b</code> - Subtraction</li>
<li><code>series_a * series_b</code> - Multiplication (element-wise; this is not the matrix multiplication used in linear algebra).</li>
<li><code>series_a / series_b</code> - Division</li>
</ul>
</div>
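When both operands are Series, pandas pairs the values by index label rather than by position before applying the operator. The sketch below uses two small hand-built Series whose labels are in a different order to make this visible.

```python
import pandas as pd

previous_rank = pd.Series([8, 1, 2], index=["Toyota Motor", "Walmart", "State Grid"])
rank = pd.Series([1, 2, 5], index=["Walmart", "State Grid", "Toyota Motor"])

# Values are paired by label, not by position, before subtracting.
change = previous_rank - rank
print(change["Toyota Motor"])   # 8 - 5

# A label present in only one operand produces NaN for that label.
partial = previous_rank - rank.drop("Walmart")
print(partial.isna().sum())
```

In `f500` both columns share the same index, so alignment is invisible there, but it matters as soon as two Series come from different sources.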

In [154]:
f500.head()

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [155]:
rank_change = f500["previous_rank"] - f500["rank"]

In [156]:
rank_change

company
Walmart                             0
State Grid                          0
Sinopec Group                       1
China National Petroleum           -1
Toyota Motor                        3
                                 ... 
Teva Pharmaceutical Industries   -496
New China Life Insurance          -70
Wm. Morrison Supermarkets         -61
TUI                               -32
AutoNation                       -500
Length: 500, dtype: int64
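The values -496 and -500 in this output come from rows where `previous_rank` is 0. A plausible reading, and only an assumption here since the data dictionary does not say so, is that 0 marks companies with no rank in the prior year. Under that assumption, one way to keep such rows from distorting the result is to treat 0 as missing before subtracting:

```python
import pandas as pd

labels = ["Toyota Motor", "Teva Pharmaceutical Industries", "AutoNation"]
rank = pd.Series([5, 496, 500], index=labels)
previous_rank = pd.Series([8, 0, 0], index=labels)

# Assumption: 0 is a placeholder for "no previous rank", so mask it out.
# Series.mask replaces values where the condition is True with NaN.
previous_clean = previous_rank.mask(previous_rank == 0)
rank_change = previous_clean - rank

print(rank_change)
```

The masked rows come out as NaN, which pandas aggregation methods such as `min()` and `max()` skip by default.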

## Series Data Exploration Methods

<div>
<ul>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.max.html"><code>Series.max()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.min.html"><code>Series.min()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html"><code>Series.mean()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.median.html"><code>Series.median()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mode.html"><code>Series.mode()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html"><code>Series.sum()</code></a></li>
</ul>

</div>

In [157]:
my_series = pd.Series([0, 1, 2, 3, 4])
my_series

0    0
1    1
2    2
3    3
4    4
dtype: int64

In [158]:
print(my_series.sum())

10


<div class="alert alert-block alert-info">
<b>Exercise:</b> Use the <code>Series.max()</code> method to find the maximum value of the <code>rank_change</code> series. Assign the result to the variable <code>rank_change_max</code>.</div>

In [159]:
rank_change_max = rank_change.max()
rank_change_max

226

<div class="alert alert-block alert-info">
<b>Exercise:</b> Use the <code>Series.min()</code> method to find the minimum value of the <code>rank_change</code> series. Assign the result to the variable <code>rank_change_min</code>.</div>

In [160]:
rank_change_min = rank_change.min()
rank_change_min

-500

### Series Describe Method

In [161]:
assets = f500["assets"]

In [162]:
assets

company
Walmart                           198825
State Grid                        489838
Sinopec Group                     310726
China National Petroleum          585619
Toyota Motor                      437575
                                   ...  
Teva Pharmaceutical Industries     92890
New China Life Insurance          100609
Wm. Morrison Supermarkets          11630
TUI                                16247
AutoNation                         10060
Name: assets, Length: 500, dtype: int64

In [163]:
assets.describe()

count    5.000000e+02
mean     2.436323e+05
std      4.851937e+05
min      3.717000e+03
25%      3.658850e+04
50%      7.326150e+04
75%      1.805640e+05
max      3.473238e+06
Name: assets, dtype: float64

In [164]:
f500["country"].describe()

count     500
unique     34
top       USA
freq      132
Name: country, dtype: object
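The two outputs above differ because `describe()` adapts to the dtype of the Series. A small sketch with hand-built values shows which statistics each kind of column yields:

```python
import pandas as pd

assets = pd.Series([198825, 489838, 310726], name="assets")
country = pd.Series(["USA", "China", "China"], name="country")

numeric_summary = assets.describe()   # count, mean, std, min, quartiles, max
text_summary = country.describe()     # count, unique, top, freq

print(list(numeric_summary.index))
print(list(text_summary.index))
print(text_summary["top"], text_summary["freq"])
```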

<div class="alert alert-block alert-info">
<b>Exercise:</b> Return a series of descriptive statistics for the <code>rank</code> column in <code>f500</code>.</div>

In [165]:
rank = f500["rank"]
rank.describe()

count    500.000000
mean     250.500000
std      144.481833
min        1.000000
25%      125.750000
50%      250.500000
75%      375.250000
max      500.000000
Name: rank, dtype: float64

<div class="alert alert-block alert-info">
<b>Exercise:</b> Return a series of descriptive statistics for the <code>previous_rank</code> column in <code>f500</code>.</div>

In [166]:
prev_rank = f500["previous_rank"]
prev_rank.describe()

count    500.000000
mean     222.134000
std      146.941961
min        0.000000
25%       92.750000
50%      219.500000
75%      347.250000
max      500.000000
Name: previous_rank, dtype: float64

## Method Chaining

In [167]:
f500["country"].value_counts()["USA"]

132
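A chained expression like this is easier to follow when it is unrolled once. The sketch below performs the same lookup in separate steps and then as a chain, on a short hand-built Series of countries.

```python
import pandas as pd

country = pd.Series(["USA", "China", "China", "Japan", "USA", "USA"], name="country")

# Step by step:
counts = country.value_counts()   # Series indexed by country name
usa_stepwise = counts.loc["USA"]  # look up one label in that Series

# Chained: each call operates on the result of the previous one.
usa_chained = country.value_counts().loc["USA"]

print(usa_stepwise, usa_chained)
```

Chaining avoids the intermediate variable; the step-by-step form is handy when one of the intermediate results needs inspecting.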

<div class="alert alert-block alert-info">
<b>Exercise:</b> Use <code>Series.value_counts()</code> and <code>Series.loc</code> to return the number of companies with a value of 0 in the <code>previous_rank</code> column of the <code>f500</code> dataframe. Assign the result to <code>zero_previous_rank</code>.</div>

In [168]:
zero_previous_rank = f500["previous_rank"].value_counts().loc[0]
zero_previous_rank

33
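
The chaining pattern above can be sketched on a small made-up Series (the data is illustrative, not taken from f500). Each method returns a new object, so the next call can be attached directly to it.

```python
import pandas as pd

# Hypothetical toy data standing in for f500["country"]
countries = pd.Series(["USA", "China", "USA", "Japan", "USA", "China"])

# Step by step: count the values, then look one label up
counts = countries.value_counts()
usa_stepwise = counts.loc["USA"]

# Chained: the same two steps in a single expression
usa_chained = countries.value_counts().loc["USA"]

print(usa_stepwise, usa_chained)
```

Both forms give the same number; the chained version simply skips the intermediate variable.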

## Dataframe Exploration Methods

<div>

<ul>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.max.html"><code>Series.max()</code></a> and <a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.max.html"><code>DataFrame.max()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.min.html"><code>Series.min()</code></a> and <a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.min.html"><code>DataFrame.min()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html"><code>Series.mean()</code></a> and <a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mean.html"><code>DataFrame.mean()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.median.html"><code>Series.median()</code></a> and <a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.median.html"><code>DataFrame.median()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mode.html"><code>Series.mode()</code></a> and <a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mode.html"><code>DataFrame.mode()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html"><code>Series.sum()</code></a> and <a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sum.html"><code>DataFrame.sum()</code></a></li>
</ul>

<p><img alt="dataframe axis parameters" src="images/axis_param.svg"></p>

</div>
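
A minimal sketch of what the `axis` parameter does, using a made-up three-row frame (the names `a`, `b`, `x`, `y`, `z` are illustrative only). With `axis=0` the calculation runs down the rows and returns one value per column; with `axis=1` it runs across the columns and returns one value per row.

```python
import pandas as pd

# Hypothetical toy frame; labels are illustrative only
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])

# axis=0 (the default): collapse the rows, one result per column
col_sums = df.sum(axis=0)

# axis=1: collapse the columns, one result per row
row_sums = df.sum(axis=1)

print(col_sums)
print(row_sums)
```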

In [169]:
f500[["revenues", "profits"]].median(axis=0)

revenues    40236.0
profits      1761.6
dtype: float64

<div class="alert alert-block alert-info">
<b>Exercise:</b> Use the DataFrame.max() method to find the maximum value of each numeric column in f500 (you may need to check the documentation). Assign the result to the variable max_f500.</div>

In [170]:
f500.max(numeric_only=True)

rank                            500.0
revenues                     485873.0
revenue_change                  442.3
profits                       45687.0
assets                      3473238.0
profit_change                  8909.5
previous_rank                   500.0
years_on_global_500_list         23.0
employees                   2300000.0
total_stockholder_equity     301893.0
dtype: float64

### Dataframe Describe Method

In [171]:
f500.describe()

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,previous_rank,years_on_global_500_list,employees,total_stockholder_equity
count,500.0,500.0,498.0,499.0,500.0,436.0,500.0,500.0,500.0,500.0
mean,250.5,55416.358,4.538353,3055.203206,243632.3,24.152752,222.134,15.036,133998.3,30628.076
std,144.481833,45725.478963,28.549067,5171.981071,485193.7,437.509566,146.941961,7.932752,170087.8,43642.576833
min,1.0,21609.0,-67.3,-13038.0,3717.0,-793.7,0.0,1.0,328.0,-59909.0
25%,125.75,29003.0,-5.9,556.95,36588.5,-22.775,92.75,7.0,42932.5,7553.75
50%,250.5,40236.0,0.55,1761.6,73261.5,-0.35,219.5,17.0,92910.5,15809.5
75%,375.25,63926.75,6.975,3954.0,180564.0,17.7,347.25,23.0,168917.2,37828.5
max,500.0,485873.0,442.3,45687.0,3473238.0,8909.5,500.0,23.0,2300000.0,301893.0


In [172]:
f500.describe(include=["O"])

Unnamed: 0,ceo,industry,sector,country,hq_location,website
count,500,500,500,500,500,500
unique,500,58,21,34,235,500
top,C. Douglas McMillon,Banks: Commercial and Savings,Financials,USA,"Beijing, China",http://www.walmart.com
freq,1,51,118,132,56,1


## Assignment with pandas

In [173]:
top5 = f500[["rank", "revenues"]].head(5)

In [174]:
top5

company,rank,revenues
Walmart,1,485873
State Grid,2,315199
Sinopec Group,3,267518
China National Petroleum,4,262573
Toyota Motor,5,254694


In [175]:
top5["revenues"] = 0

In [176]:
top5

company,rank,revenues
Walmart,1,0
State Grid,2,0
Sinopec Group,3,0
China National Petroleum,4,0
Toyota Motor,5,0


In [177]:
top5.loc["Sinopec Group", "revenues"] = 999

In [178]:
top5

company,rank,revenues
Walmart,1,0
State Grid,2,0
Sinopec Group,3,999
China National Petroleum,4,0
Toyota Motor,5,0


In [179]:
#f500.head()

<div class="alert alert-block alert-info">
<b>Exercise:</b> The company "Dow Chemical" has named a new CEO. In the f500 dataframe, update the value in the ceo column for the row labelled Dow Chemical to Jim Fitterling.</div>

In [180]:
f500.loc["Dow Chemical","ceo"] = "Jim Fitterling"

## Using Boolean Indexing with pandas Objects

In [182]:
d = {'name': ['Bob', 'Eva', 'Sara', 'Mihael'], 'num': [12, 8, 5, 8]}
df = pd.DataFrame(data=d, index=['w', 'x', 'y', 'z'])
df

Unnamed: 0,name,num
w,Bob,12
x,Eva,8
y,Sara,5
z,Mihael,8


In [183]:
df["num"] == 8

w    False
x     True
y    False
z     True
Name: num, dtype: bool

In [184]:
df[df["num"] == 8]

Unnamed: 0,name,num
x,Eva,8
z,Mihael,8


<div class="alert alert-block alert-info">
<b>Exercise:</b> Create a boolean series, motor_bool, that checks whether the values in the industry column of the f500 dataframe are equal to "Motor Vehicles and Parts".
Use the motor_bool boolean series to index the country column. Assign the result to motor_countries.</div>

In [None]:
#  industry -> "Motor Vehicles and Parts"

f500.loc[f500["industry"] == "Motor Vehicles and Parts", "country"]

### Using Boolean Arrays to Assign Values

In [185]:
sector = "Motor Vehicles & Parts"
f500[f500["sector"] == sector].shape[0]

34

In [101]:
sector_and = "Motor Vehicles and Parts"
f500[f500["sector"] == sector_and].shape[0]

0

In [186]:
f500.loc[f500["sector"] == "Motor Vehicles & Parts", "sector"] = "Motor Vehicles and Parts"

In [187]:
sector_and = "Motor Vehicles and Parts"
f500[f500["sector"] == sector_and].shape[0]

34

In [188]:
sector = "Motor Vehicles & Parts"
f500[f500["sector"] == sector].shape[0]

0

## Creating New Columns

In [191]:
f500["rank_change"] = f500["previous_rank"] - f500["rank"]
f500["random"] = 0

In [192]:
f500.head()

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,rank_change,random
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798,0,0
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456,0,0
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523,1,0
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893,-1,0
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles and Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210,3,0


## Exercise: Top Performers by Country

In [198]:
# country == USA
# the 5 most common values of industry

f500.loc[f500["country"] == "USA", "industry"].value_counts().head(5)

Banks: Commercial and Savings               8
Insurance: Property and Casualty (Stock)    7
Aerospace and Defense                       6
Petroleum Refining                          6
Specialty Retailers                         6
Name: industry, dtype: int64

In [233]:
# USA or China, excluding the Retailing sector
# and - &
# or - |
# negation - ~

f500.loc[((f500["country"] == "USA") | (f500["country"] == "China")) \
            & ~(f500["sector"] == "Retailing"), "industry"] \
            .value_counts() \
            .head()


Banks: Commercial and Savings    18
Mining, Crude-Oil Production     13
Aerospace and Defense            12
Motor Vehicles and Parts          9
Pharmaceuticals                   8
Name: industry, dtype: int64

## Reading CSV files with pandas

<div>
<p><img alt="csv_to_dataframe" src="images/csv_to_dataframe.svg"></p>


</div>

In [234]:
f500.index

Index(['Walmart', 'State Grid', 'Sinopec Group', 'China National Petroleum',
       'Toyota Motor', 'Volkswagen', 'Royal Dutch Shell', 'Berkshire Hathaway',
       'Apple', 'Exxon Mobil',
       ...
       'National Grid', 'Dollar General', 'Telecom Italia',
       'Xiamen ITG Holding Group', 'Xinjiang Guanghui Industry Investment',
       'Teva Pharmaceutical Industries', 'New China Life Insurance',
       'Wm. Morrison Supermarkets', 'TUI', 'AutoNation'],
      dtype='object', name='company', length=500)

In [235]:
f500.columns

Index(['rank', 'revenues', 'revenue_change', 'profits', 'assets',
       'profit_change', 'ceo', 'industry', 'sector', 'previous_rank',
       'country', 'hq_location', 'website', 'years_on_global_500_list',
       'employees', 'total_stockholder_equity', 'rank_change', 'random'],
      dtype='object')

In [241]:
f500 = pd.read_csv("data/f500.csv")
f500.head()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
3,China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
4,Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [242]:
f500.index

RangeIndex(start=0, stop=500, step=1)

In [243]:
f500.loc[3, "profits"]

1867.5
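
The cells before this section addressed rows by company name, while the frame read here has a `RangeIndex`, so `f500.loc[3, "profits"]` uses the integer label 3. One way to get label-based rows straight from the file is the `index_col` parameter of `pd.read_csv`. The sketch below uses made-up CSV text instead of `data/f500.csv`, so the company names and values are assumptions for illustration.

```python
import io
import pandas as pd

# Made-up CSV text standing in for data/f500.csv
csv_text = "company,rank,profits\nAlpha,1,10.5\nBeta,2,7.25\n"

# Default: pandas builds a RangeIndex (0, 1, ...), so row labels are integers
by_position = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column becomes the row labels instead
by_label = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(by_position.loc[1, "profits"])    # row label is the integer 1
print(by_label.loc["Beta", "profits"])  # row label is the company name
```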

## Using iloc to select by integer position

In [246]:
cols = ['company', 'rank', 'revenues']
minif500 = f500[cols].head()

In [247]:
minif500

Unnamed: 0,company,rank,revenues
0,Walmart,1,485873
1,State Grid,2,315199
2,Sinopec Group,3,267518
3,China National Petroleum,4,262573
4,Toyota Motor,5,254694


In [248]:
minif500.iloc[4]

company     Toyota Motor
rank                   5
revenues          254694
Name: 4, dtype: object

In [254]:
minif500.iloc[1:3, 1:3]

Unnamed: 0,rank,revenues
1,2,315199
2,3,267518


<p><img alt="selection using iloc" src="images/selection_iloc.svg"></p>

    df.iloc[row_index, column_index]

<div class="alert alert-block alert-info">
<b>Exercise:</b> Select just the fifth row of the f500 dataframe. Assign the result to fifth_row.</div>

In [257]:
#f500

In [255]:
f500.iloc[4]

company                                     Toyota Motor
rank                                                   5
revenues                                          254694
revenue_change                                       7.7
profits                                          16899.3
assets                                            437575
profit_change                                      -12.3
ceo                                          Akio Toyoda
industry                        Motor Vehicles and Parts
sector                            Motor Vehicles & Parts
previous_rank                                          8
country                                            Japan
hq_location                                Toyota, Japan
website                     http://www.toyota-global.com
years_on_global_500_list                              23
employees                                         364445
total_stockholder_equity                          157210
Name: 4, dtype: object

<div class="alert alert-block alert-info">
<b>Exercise:</b> Select the value in the first row of the company column. Assign the result to company_value.</div>

In [261]:
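# select by row position and column position
f500.iloc[0, 0]
# this also works: take the first row, then look the column up by label
f500.iloc[0]["company"]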

'Walmart'

<div>

<table>
<thead>
<tr>
<th>Select by integer position</th>
<th>Explicit Syntax</th>
<th>Shorthand Convention</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single column from dataframe</td>
<td><code>df.iloc[:,3]</code></td>
<td></td>
</tr>
<tr>
<td>List of columns from dataframe</td>
<td><code>df.iloc[:,[3,5,6]]</code></td>
<td></td>
</tr>
<tr>
<td>Slice of columns from dataframe</td>
<td><code>df.iloc[:,3:7]</code></td>
<td></td>
</tr>
<tr>
<td>Single row from dataframe</td>
<td><code>df.iloc[20]</code></td>
<td></td>
</tr>
<tr>
<td>List of rows from dataframe</td>
<td><code>df.iloc[[0,3,8]]</code></td>
<td></td>
</tr>
<tr>
<td>Slice of rows from dataframe</td>
<td><code>df.iloc[3:5]</code></td>
<td><code>df[3:5]</code></td>
</tr>
<tr>
<td>Single item from series</td>
<td><code>s.iloc[8]</code></td>
<td><code>s[8]</code></td>
</tr>
<tr>
<td>List of items from series</td>
<td><code>s.iloc[[2,8,1]]</code></td>
<td><code>s[[2,8,1]]</code></td>
</tr>
<tr>
<td>Slice of items from series</td>
<td><code>s.iloc[5:10]</code></td>
<td><code>s[5:10]</code></td>
</tr>
</tbody>
</table>
</div>
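
The explicit `iloc` forms from the table can be tried on a small made-up frame. The column and row labels below are illustrative only. Every selection is by integer position, so the string labels play no role.

```python
import pandas as pd

# Hypothetical toy frame with string row labels
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "x": [1, 2, 3, 4], "y": [10, 20, 30, 40]},
    index=["r0", "r1", "r2", "r3"],
)

single_col = df.iloc[:, 1]      # Series: the column at position 1 ("x")
col_list = df.iloc[:, [0, 2]]   # DataFrame: columns at positions 0 and 2
single_row = df.iloc[2]         # Series: the row at position 2 ("r2")
row_slice = df.iloc[1:3]        # DataFrame: positions 1 and 2, the end is excluded
cell = df.iloc[3, 2]            # scalar: row position 3, column position 2
```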

<div class="alert alert-block alert-info">
<b>Exercise:</b> Select the first three rows of the f500 dataframe. Assign the result to first_three_rows.</div>

In [262]:
f500[:3]

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523


<div class="alert alert-block alert-info">
<b>Exercise:</b> Select the first and seventh rows and the first five columns of the f500 dataframe. Assign the result to first_seventh_row_slice.</div>

In [264]:
f500.iloc[[0,6], :5]

Unnamed: 0,company,rank,revenues,revenue_change,profits
0,Walmart,1,485873,0.8,13643.0
6,Royal Dutch Shell,7,240033,-11.8,4575.0


## Using pandas methods to create boolean masks

In [267]:
f500["revenue_change"].isnull().value_counts()

False    498
True       2
Name: revenue_change, dtype: int64

In [268]:
f500[f500["revenue_change"].isnull()]

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
90,Uniper,91,74407,,-3557.5,51541,,Klaus Schafer,Energy,Energy,0,Germany,"Dusseldorf, Germany",http://www.uniper.energy,1,12890,12889
180,Hewlett Packard Enterprise,181,50123,,3161.0,79679,,Margaret C. Whitman,Information Technology Services,Technology,0,USA,"Palo Alto, CA",http://www.hpe.com,1,195000,31448


<div class="alert alert-block alert-info">
<b>Exercise:</b> Use the Series.isnull() method to select all rows from f500 that have a null value in the previous_rank column. Select only the company, rank, and previous_rank columns. Assign the result to null_previous_rank.</div>

In [269]:
import numpy as np
# setup prepared in advance: reload the data and mark a previous_rank of 0 as missing
f500 = pd.read_csv("data/f500.csv")
f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan

In [274]:
f500.loc[f500["previous_rank"].isnull(), ["company", "rank", "previous_rank"]]

Unnamed: 0,company,rank,previous_rank
48,Legal & General Group,49,
90,Uniper,91,
123,Dell Technologies,124,
138,Anbang Insurance Group,139,
140,Albertsons Cos.,141,
180,Hewlett Packard Enterprise,181,
267,Hengli Group,268,
271,Johnson Controls International,272,
341,Chubb,342,
375,Charter Communications,376,


## Pandas Index Alignment

In [277]:
previously_ranked = f500[f500["previous_rank"].notnull()]

In [278]:
previously_ranked.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 467 entries, 0 to 498
Data columns (total 17 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   company                   467 non-null    object 
 1   rank                      467 non-null    int64  
 2   revenues                  467 non-null    int64  
 3   revenue_change            467 non-null    float64
 4   profits                   467 non-null    float64
 5   assets                    467 non-null    int64  
 6   profit_change             410 non-null    float64
 7   ceo                       467 non-null    object 
 8   industry                  467 non-null    object 
 9   sector                    467 non-null    object 
 10  previous_rank             467 non-null    float64
 11  country                   467 non-null    object 
 12  hq_location               467 non-null    object 
 13  website                   467 non-null    object 
 14  years_on_g

In [281]:
previously_ranked.tail()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
490,National Grid,491,22036,-3.2,10150.6,82310,160.2,John Pettigrew,Utilities,Energy,471.0,Britain,"London, Britain",http://www.nationalgrid.com,12,22132,25463
492,Telecom Italia,493,21941,-17.4,1999.4,74295,,Flavio Cattaneo,Telecommunications,Telecommunications,404.0,Italy,"Milan, Italy",http://www.telecomitalia.com,18,61227,22366
496,New China Life Insurance,497,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427.0,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507
497,Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437.0,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111
498,TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467.0,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006


In [287]:
previously_ranked.iloc[495:498]

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity


In [288]:
food = pd.DataFrame({'fruit_veg': ['fruit', 'veg', 'fruit', 'veg', 'veg'], 'qty': [4, 2, 4, 1, 2]}, 
                    index=['tomato', 'carrot', 'lime', 'corn', 'eggplant'])

In [289]:
food

Unnamed: 0,fruit_veg,qty
tomato,fruit,4
carrot,veg,2
lime,fruit,4
corn,veg,1
eggplant,veg,2


In [290]:
alt_name = pd.Series(['rocket', 'aubergine', 'maize'], index=['arugula', 'eggplant', 'corn'])

In [291]:
alt_name

arugula        rocket
eggplant    aubergine
corn            maize
dtype: object

In [294]:
food["alt_name"] = alt_name
food

Unnamed: 0,fruit_veg,qty,alt_name
tomato,fruit,4,
carrot,veg,2,
lime,fruit,4,
corn,veg,1,maize
eggplant,veg,2,aubergine


<div class="alert alert-block alert-info">
<b>Exercise:</b> Use the Series.notnull() method to select all rows from f500 that have a non-null value in the previous_rank column. Assign the result to previously_ranked. From the previously_ranked dataframe, subtract the rank column from the previous_rank column. Assign the result to rank_change. Assign the values in rank_change to a new column in the f500 dataframe, "rank_change".</div>

In [295]:
previously_ranked = f500[f500["previous_rank"].notnull()]

In [297]:
rank_change = previously_ranked["previous_rank"] - previously_ranked["rank"]

In [298]:
rank_change

0       0.0
1       0.0
2       1.0
3      -1.0
4       3.0
       ... 
490   -20.0
492   -89.0
496   -70.0
497   -61.0
498   -32.0
Length: 467, dtype: float64

In [300]:
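# The key below is spelled "rank_chage", so that is the column name pandas creates
# and the one shown in the outputs that follow; the exercise asks for "rank_change".
f500["rank_chage"] = rank_change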

In [301]:
f500.tail()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,rank_chage
495,Teva Pharmaceutical Industries,496,21903,11.5,329.0,92890,-79.3,Yitzhak Peterburg,Pharmaceuticals,Health Care,,Israel,"Petach Tikva, Israel",http://www.tevapharm.com,1,56960,33337,
496,New China Life Insurance,497,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427.0,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507,-70.0
497,Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437.0,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111,-61.0
498,TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467.0,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006,-32.0
499,AutoNation,500,21609,3.6,430.5,10060,-2.7,Michael J. Jackson,Specialty Retailers,Retailing,,USA,"Fort Lauderdale, FL",http://www.autonation.com,12,26000,2310,


## Boolean Operators

<div>
<table>
<thead>
<tr>
<th>pandas</th>
<th>Python equivalent</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>a &amp; b</code></td>
<td><code>a and b</code></td>
<td><code>True</code> if both <code>a</code> and <code>b</code> are <code>True</code>, else <code>False</code></td>
</tr>
<tr>
<td><code>a | b</code></td>
<td><code>a or b</code></td>
<td><code>True</code> if either <code>a</code> or <code>b</code> is <code>True</code></td>
</tr>
<tr>
<td><code>~a</code></td>
<td><code>not a</code></td>
<td><code>True</code> if <code>a</code> is <code>False</code>, else <code>False</code></td>
</tr>
</tbody>
</table>

<p><img alt="boolean operators example 1" src="images/bool_ops_1.svg"></p>

<p><img alt="boolean operators example 2" src="images/bool_ops_2.svg"></p>

<p><img alt="boolean operators example 3" src="images/bool_ops_3.svg"></p>

<p><img alt="boolean operators example 4" src="images/bool_ops_4.svg"></p>

</div>
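
A minimal sketch of the three operators on a made-up frame (column names follow f500, values are invented). In Python, `&` and `|` bind more tightly than comparisons such as `==` and `<`, so each comparison needs its own parentheses when written inline. Naming the masks first, as below, avoids the issue.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "country": ["USA", "China", "USA", "Japan"],
    "profits": [5.0, -1.0, -2.0, 3.0],
})

is_usa = df["country"] == "USA"
has_loss = df["profits"] < 0

both = df[is_usa & has_loss]        # USA and a loss
either = df[is_usa | has_loss]      # USA or a loss (or both)
neither = df[~(is_usa | has_loss)]  # not USA and no loss

print(both)
print(either)
print(neither)
```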

<div class="alert alert-block alert-info">
<b>Exercise:</b> Select all companies with revenues over 100 billion (the revenues column is in millions of USD, so over 100,000) and negative profits from the f500 dataframe. The result should include all columns.</div>

In [304]:
f500[(f500["revenues"] > 100_000) & (f500["profits"] < 0)]

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,rank_chage
32,Japan Post Holdings,33,122990,3.6,-267.4,2631385,-107.5,Masatsugu Nagato,"Insurance: Life, Health (stock)",Financials,37.0,Japan,"Tokyo, Japan",http://www.japanpost.jp,21,248384,91532,4.0
44,Chevron,45,107567,-18.0,-497.0,260078,-110.8,John S. Watson,Petroleum Refining,Energy,31.0,USA,"San Ramon, CA",http://www.chevron.com,23,55200,145556,-14.0


<div class="alert alert-block alert-info">
<b>Exercise:</b> Select all rows for companies headquartered in either Brazil or Venezuela. Assign the result to brazil_venezuela.</div>

In [306]:
# f500[(f500["country"] == "Brazil") | (f500["country"] == "Venezuela")]

In [315]:
f500[f500["country"].isin(["Brazil", "Venezuela"])].head()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,rank_chage
74,Petrobras,75,81405,-16.3,-4838.0,246983,,Pedro Pullen Parente,Petroleum Refining,Energy,58.0,Brazil,"Rio de Janeiro, Brazil",http://www.petrobras.com.br,23,68829,76779,-17.0
112,Itau Unibanco Holding,113,66876,21.4,6666.4,415972,-13.7,Candido Botelho Bracher,Banks: Commercial and Savings,Financials,159.0,Brazil,"Sao Paulo, Brazil",http://www.itau.com.br,4,94779,37680,46.0
150,Banco do Brasil,151,58093,-13.4,2013.8,426416,-52.3,Paulo Rogerio Caffarelli,Banks: Commercial and Savings,Financials,115.0,Brazil,"Brasilia, Brazil",http://www.bb.com.br,23,100622,26551,-36.0
153,Banco Bradesco,154,57443,31.3,5127.9,366418,-5.7,Luiz Carlos Trabuco Cappi,Banks: Commercial and Savings,Financials,209.0,Brazil,"Osasco, Brazil",http://www.bradesco.com.br,21,94541,32369,55.0
190,JBS,191,48825,-0.1,107.7,31605,-92.3,Wesley Mendonca Batista,Food Production,"Food, Beverages & Tobacco",185.0,Brazil,"Sao Paulo, Brazil",http://jbss.infoinvest.com.br,8,237061,7307,-6.0


<div class="alert alert-block alert-info">
<b>Exercise:</b> Select the first five companies in the Technology sector that are not headquartered in the USA from the f500 dataframe. Assign the result to tech_outside_usa.</div>

In [316]:
f500[(f500["country"] != "USA") & (f500["sector"] == "Technology")].head()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,rank_chage
14,Samsung Electronics,15,173957,-2.0,19316.5,217104,16.8,Oh-Hyun Kwon,"Electronics, Electrical Equip.",Technology,13.0,South Korea,"Suwon, South Korea",http://www.samsung.com,23,325000,154376,-2.0
26,Hon Hai Precision Industry,27,135129,-4.3,4608.8,80436,-0.4,Terry Gou,"Electronics, Electrical Equip.",Technology,25.0,Taiwan,"New Taipei City, Taiwan",http://www.foxconn.com,13,726772,33476,-2.0
70,Hitachi,71,84558,1.2,2134.3,86742,48.8,Toshiaki Higashihara,"Electronics, Electrical Equip.",Technology,79.0,Japan,"Tokyo, Japan",http://www.hitachi.com,23,303887,26632,8.0
82,Huawei Investment & Holding,83,78511,24.9,5579.4,63837,-5.0,Ren Zhengfei,Network and Other Communications Equipment,Technology,129.0,China,"Shenzhen, China",http://www.huawei.com,8,180000,20159,46.0
104,Sony,105,70170,3.9,676.4,158519,-45.1,Kazuo Hirai,"Electronics, Electrical Equip.",Technology,113.0,Japan,"Tokyo, Japan",http://www.sony.net,23,128400,22415,8.0


## Sorting Values

In [320]:
china_rows = f500[f500["country"] == "China"]

In [324]:
china_rows.sort_values("employees", ascending=False).head()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,rank_chage
3,China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3.0,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893,-1.0
118,China Post Group,119,65605,-5.8,4980.3,1221649,18.7,Li Guohua,"Mail, Package, and Freight Delivery",Transportation,105.0,China,"Beijing, China",http://www.chinapost.com.cn,7,941211,43114,-14.0
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2.0,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456,0.0
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4.0,China,"Beijing, China",http://www.sinopec.com,19,713288,106523,1.0
37,Agricultural Bank of China,38,117275,-12.1,27687.8,2816039,-3.6,Zhao Huan,Banks: Commercial and Savings,Financials,29.0,China,"Beijing, China",http://www.abchina.com,18,501368,189682,-9.0


<div class="alert alert-block alert-info">
<b>Exercise:</b> Find the company headquartered in Japan with the largest number of employees.</div>

In [340]:
f500[f500["country"] == "Japan"].sort_values("employees", ascending=False).iloc[0]["company"]

'Toyota Motor'

## Using Loops with pandas

In [341]:
avg_rev_by_country = {}

countries = f500["country"].unique()

In [342]:
countries

array(['USA', 'China', 'Japan', 'Germany', 'Netherlands', 'Britain',
       'South Korea', 'Switzerland', 'France', 'Taiwan', 'Singapore',
       'Italy', 'Russia', 'Spain', 'Brazil', 'Mexico', 'Luxembourg',
       'India', 'Malaysia', 'Thailand', 'Australia', 'Belgium', 'Norway',
       'Canada', 'Ireland', 'Indonesia', 'Denmark', 'Saudi Arabia',
       'Sweden', 'Finland', 'Venezuela', 'Turkey', 'U.A.E', 'Israel'],
      dtype=object)

In [343]:
for c in countries:
    selected_rows = f500[f500["country"] == c]
    mean = selected_rows["revenues"].mean()
    avg_rev_by_country[c] = mean

In [344]:
avg_rev_by_country

{'USA': 64218.371212121216,
 'China': 55397.880733944956,
 'Japan': 53164.03921568627,
 'Germany': 63915.0,
 'Netherlands': 61708.92857142857,
 'Britain': 51588.708333333336,
 'South Korea': 49725.6,
 'Switzerland': 51353.57142857143,
 'France': 55231.793103448275,
 'Taiwan': 46364.666666666664,
 'Singapore': 54454.333333333336,
 'Italy': 51899.57142857143,
 'Russia': 65247.75,
 'Spain': 40600.666666666664,
 'Brazil': 52024.57142857143,
 'Mexico': 54987.5,
 'Luxembourg': 56791.0,
 'India': 39993.0,
 'Malaysia': 49479.0,
 'Thailand': 48719.0,
 'Australia': 33688.71428571428,
 'Belgium': 45905.0,
 'Norway': 45873.0,
 'Canada': 31848.0,
 'Ireland': 32819.5,
 'Indonesia': 36487.0,
 'Denmark': 35464.0,
 'Saudi Arabia': 35421.0,
 'Sweden': 27963.666666666668,
 'Finland': 26113.0,
 'Venezuela': 24403.0,
 'Turkey': 23456.0,
 'U.A.E': 22799.0,
 'Israel': 21903.0}

<div class="alert alert-block alert-info">
<b>Exercise:</b> Find the company that employs the most people in each country.</div>

In [345]:
top_employer_by_country = {}

countries = f500["country"].unique()
for c in countries:
    selected_rows = f500[f500["country"] == c]
    sorted_rows = selected_rows.sort_values("employees", ascending=False)
    top_employer = sorted_rows.iloc[0]
    employer_name = top_employer["company"]
    top_employer_by_country[c] = employer_name

In [346]:
top_employer_by_country

{'USA': 'Walmart',
 'China': 'China National Petroleum',
 'Japan': 'Toyota Motor',
 'Germany': 'Volkswagen',
 'Netherlands': 'EXOR Group',
 'Britain': 'Compass Group',
 'South Korea': 'Samsung Electronics',
 'Switzerland': 'Nestle',
 'France': 'Sodexo',
 'Taiwan': 'Hon Hai Precision Industry',
 'Singapore': 'Flex',
 'Italy': 'Poste Italiane',
 'Russia': 'Gazprom',
 'Spain': 'Banco Santander',
 'Brazil': 'JBS',
 'Mexico': 'America Movil',
 'Luxembourg': 'ArcelorMittal',
 'India': 'State Bank of India',
 'Malaysia': 'Petronas',
 'Thailand': 'PTT',
 'Australia': 'Wesfarmers',
 'Belgium': 'Anheuser-Busch InBev',
 'Norway': 'Statoil',
 'Canada': 'George Weston',
 'Ireland': 'Accenture',
 'Indonesia': 'Pertamina',
 'Denmark': 'Maersk Group',
 'Saudi Arabia': 'SABIC',
 'Sweden': 'H & M Hennes & Mauritz',
 'Finland': 'Nokia',
 'Venezuela': 'Mercantil Servicios Financieros',
 'Turkey': 'Koc Holding',
 'U.A.E': 'Emirates Group',
 'Israel': 'Teva Pharmaceutical Industries'}
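
For comparison, the two loops above can also be written with `DataFrame.groupby`, which the feature list at the top mentions but this section does not use. This is a sketch under that assumption, on made-up data with the same column names as f500, not the approach taken in the cells above.

```python
import pandas as pd

# Hypothetical toy data with the same column names as f500
df = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "country": ["USA", "USA", "Japan", "Japan"],
    "revenues": [100, 300, 50, 150],
    "employees": [10, 5, 7, 9],
})

# Mean revenue per country, as a dictionary like avg_rev_by_country
avg_rev = df.groupby("country")["revenues"].mean().to_dict()

# idxmax gives, per country, the row label of the largest employees value
top_rows = df.loc[df.groupby("country")["employees"].idxmax()]
top_employer = top_rows.set_index("country")["company"].to_dict()

print(avg_rev)
print(top_employer)
```

The explicit loop is easier to follow step by step; the grouped version is shorter and avoids filtering the frame once per country.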

## Understanding SettingWithCopyWarning in pandas


### What is SettingWithCopyWarning?



<img class="full-width" src="https://www.dataquest.io/wp-content/uploads/2019/01/view-vs-copy.png" alt="view-vs-copy">



<img class="full-width" src="https://www.dataquest.io/wp-content/uploads/2019/01/modifying.png" alt="modifying">


### Chained assignment




In [351]:
f500[f500["sector"] == "Energy"]["sector"] = "Oil"

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  f500[f500["sector"] == "Energy"]["sector"] = "Oil"


In [352]:
f500.head()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,rank_chage
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1.0,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798,0.0
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2.0,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456,0.0
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4.0,China,"Beijing, China",http://www.sinopec.com,19,713288,106523,1.0
3,China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3.0,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893,-1.0
4,Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8.0,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210,3.0


In [353]:
f500.loc[f500["sector"] == "Energy", "sector"] = "Oil"

In [354]:
f500.head()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,rank_chage
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1.0,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798,0.0
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Oil,2.0,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456,0.0
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Oil,4.0,China,"Beijing, China",http://www.sinopec.com,19,713288,106523,1.0
3,China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Oil,3.0,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893,-1.0
4,Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8.0,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210,3.0
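
The two cells above can be condensed into one self-contained comparison. The sketch below uses a small made-up DataFrame (`df` and its values are hypothetical, not the f500 data) and silences warnings only to keep the output clean. The chained form first builds a temporary object with `df[mask]` and then assigns into that temporary, so the original is left untouched. The single `.loc` call addresses rows and column in one operation on `df` itself.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for f500
df = pd.DataFrame(
    {
        "company": ["A", "B", "C"],
        "sector": ["Energy", "Retailing", "Energy"],
    }
)

# Chained assignment: two indexing steps, the write lands on a temporary copy
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the warning for this demo only
    df[df["sector"] == "Energy"]["sector"] = "Oil"
after_chained = df["sector"].tolist()

# Single .loc call: one indexing step, the write lands on df
df.loc[df["sector"] == "Energy", "sector"] = "Oil"
after_loc = df["sector"].tolist()

print(after_chained)
print(after_loc)
```

Only the `.loc` version changes `df`, which matches the two `f500.head()` outputs above.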


### Hidden chaining



In [372]:
ratail = f500.loc[f500["profits"] < f500["assets"]].copy()

In [373]:
ratail.head()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,rank_chage
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1.0,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798,0.0
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Oil,2.0,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456,0.0
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Oil,4.0,China,"Beijing, China",http://www.sinopec.com,19,713288,106523,1.0
3,China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Oil,3.0,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893,-1.0
4,Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8.0,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210,3.0


In [374]:
ratail.loc[35, "sector"]

'Retailing'

In [375]:
ratail.loc[86, "sector"] = "Home"
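
Hidden chaining means the two indexing steps sit on different lines: a subset is created first and modified later. A minimal sketch of why the `.copy()` in the cell above matters follows. The frame `f500_demo` and the helper `assign_and_count_warnings` are hypothetical names introduced for illustration. The number of warnings for the subset without `.copy()` depends on the pandas version (the `1.5.0` release used here is expected to warn), so the sketch prints the counts instead of assuming them.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for f500
f500_demo = pd.DataFrame(
    {
        "company": ["A", "B", "C"],
        "profits": [10.0, -5.0, 3.0],
        "sector": ["Retailing", "Energy", "Retailing"],
    }
)


def assign_and_count_warnings(frame):
    """Set one cell on `frame` and return how many warnings were emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame.loc[0, "sector"] = "Home"
    return len(caught)


# Subset without .copy(): pandas tracks it as derived from f500_demo
implicit = f500_demo.loc[f500_demo["profits"] > 0]
# Subset with .copy(): explicitly independent of f500_demo
explicit = f500_demo.loc[f500_demo["profits"] > 0].copy()

n_implicit = assign_and_count_warnings(implicit)
n_explicit = assign_and_count_warnings(explicit)

print(n_implicit, n_explicit)
print(f500_demo.loc[0, "sector"], explicit.loc[0, "sector"])
```

Calling `.copy()` when the subset is created states the intent clearly: later edits belong to the subset only, and the original frame keeps its values.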

## Test

In [378]:
data = pd.read_csv("data/DATA_Xbox.csv")
winners = data.loc[data["bid"] == data["price"]]
winners.head()

Unnamed: 0,auctionid,bid,bidtime,bidder,bidderrate,openbid,price
3,8213034705,117.5,2.998947,daysrus,10,95.0,117.5
25,8213060420,120.0,2.999722,djnoeproductions,17,1.0,120.0
44,8213067838,132.5,2.996632,*champaignbubbles*,202,29.99,132.5
45,8213067838,132.5,2.997789,*champaignbubbles*,202,29.99,132.5
66,8213073509,114.5,2.999236,rr6kids,4,1.0,114.5


In [379]:
winners.loc[304, "bidder"]

nan

In [413]:
winners.loc[304, "bidder"] = "nekineki"

In [414]:
winners.head(2)

Unnamed: 0,auctionid,bid,bidtime,bidder,bidderrate,openbid,price
3,8213034705,0.0,2.998947,daysrus,10,95.0,117.5
25,8213060420,120.0,2.999722,djnoeproductions,17,1.0,120.0


In [420]:
b = winners[:10]

In [421]:
b.head(2)

Unnamed: 0,auctionid,bid,bidtime,bidder,bidderrate,openbid,price
3,8213034705,0.0,2.998947,daysrus,10,95.0,117.5
25,8213060420,120.0,2.999722,djnoeproductions,17,1.0,120.0


In [422]:
b.loc[3, "bid"] = 0

In [423]:
b.head(2)

Unnamed: 0,auctionid,bid,bidtime,bidder,bidderrate,openbid,price
3,8213034705,0.0,2.998947,daysrus,10,95.0,117.5
25,8213060420,120.0,2.999722,djnoeproductions,17,1.0,120.0


In [424]:
winners.head(2)

Unnamed: 0,auctionid,bid,bidtime,bidder,bidderrate,openbid,price
3,8213034705,0.0,2.998947,daysrus,10,95.0,117.5
25,8213060420,120.0,2.999722,djnoeproductions,17,1.0,120.0


### Tips and tricks for dealing with SettingWithCopyWarning

### Chained assignment in depth

In [393]:
df1 = pd.DataFrame(np.arange(6).reshape((3,2)), columns=list('AB'))
df1

Unnamed: 0,A,B
0,0,1
1,2,3
2,4,5


In [394]:
df2 = df1.loc[:1].copy()

In [395]:
df2

Unnamed: 0,A,B
0,0,1
1,2,3


In [396]:
id(df1)

140016760015120

In [397]:
id(df2)

140016760014736

In [398]:
a = df2.copy()

In [399]:
id(a)

140016758577184

In [400]:
a.loc[0, "A"] = 10

In [403]:
a

Unnamed: 0,A,B
0,10,1
1,2,3


In [401]:
df2

Unnamed: 0,A,B
0,0,1
1,2,3


In [402]:
df1

Unnamed: 0,A,B
0,0,1
1,2,3
2,4,5
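
Different `id()` values only show that `df1`, `df2` and `a` are different Python objects; two distinct objects could still share the same underlying data. As a hedged complement to the cells above, the sketch below rebuilds the same frames and uses `np.shares_memory` on the column arrays to check whether the data buffers themselves are shared. The variable name `shared` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape((3, 2)), columns=list("AB"))
df2 = df1.loc[:1].copy()   # explicit copy of the first two rows
a = df2.copy()             # copy of the copy

a.loc[0, "A"] = 10         # modify only the innermost copy

# Do df1 and df2 point at the same underlying buffer for column A?
shared = np.shares_memory(df1["A"].to_numpy(), df2["A"].to_numpy())

print(shared)
print(df1.loc[0, "A"], df2.loc[0, "A"], a.loc[0, "A"])
```

Because every step used `.copy()`, the change to `a` does not propagate to `df2` or `df1`, as the outputs above also show.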
