# Introduction to pandas


## Understanding pandas and NumPy

<p>Although NumPy provides fundamental structures and tools that make working with data easier, there are several things that limit its usefulness:</p>
<ul>
<li>The lack of support for column names forces us to frame questions as multi-dimensional array operations.</li>
<li>Support for only one data type per ndarray makes it more difficult to work with data that contains both numeric and string data.</li>
<li>NumPy offers many low-level methods, but many common analysis patterns have no pre-built method.</li>
</ul>
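A minimal sketch of the first two limitations, using made-up values (the names `arr` and `df` are illustrative and not part of the original notebook): a mixed-type ndarray is coerced to a single dtype, while a DataFrame keeps one dtype per column and lets us refer to columns by name.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype, so mixing strings and numbers
# coerces every element to a string.
arr = np.array([["Luka", 34], ["Alen", 56]])
print(arr.dtype.kind)  # "U" stands for a unicode string dtype

# A DataFrame stores each column with its own dtype and labels the columns.
df = pd.DataFrame({"Name": ["Luka", "Alen"], "Age": [34, 56]})
print(df.dtypes)
print(df["Age"].mean())  # numeric operations still work on the Age column
```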

## Package overview


Documentation: https://pandas.pydata.org/docs/index.html

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, **real-world data analysis** in Python. Additionally, it has the broader goal of becoming the **most powerful and flexible open source data analysis/manipulation tool available in any language**. It is already well on its way toward this goal.

Here are just a few of the things that pandas does well:

- Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
- Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
- Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
- Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
- Make it easy to convert ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
- Intuitive merging and joining data sets
- Flexible reshaping and pivoting of data sets
- Hierarchical labeling of axes (possible to have multiple labels per tick)
- Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast HDF5 format
- Time series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting, and lagging.
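Two of the points above, automatic alignment and missing-data handling, can be illustrated with a small sketch. The Series `a` and `b` and their labels are invented for this example.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])

# Addition aligns on index labels, not on positions. The label "x" has no
# partner in b, so the result for "x" is NaN.
total = a + b
print(total)

# Reductions such as sum() skip missing values by default.
print(total.sum())
```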

## Installing and Importing Pandas

https://pandas.pydata.org/docs/getting_started/install.html

In [1]:
import pandas as pd

In [2]:
pd.__version__

'2.2.0'

In [None]:
# press TAB while typing to get autocompletion
pd.<TAB>

In [6]:
#pd.read_csv?

## What kind of data does pandas handle?

<img alt="../../_images/01_table_dataframe.svg" class="align-center" src="https://pandas.pydata.org/docs/_images/01_table_dataframe.svg">

In [7]:
my_df = pd.DataFrame({
    "Name": ["Luka", "Alen", "Jan"],
    "Age": [34, 56, 23],
    "Confirmed": [True, False, False]
})

In [10]:
my_df

,Name,Age,Confirmed
0,Luka,34,True
1,Alen,56,False
2,Jan,23,False


In [11]:
my_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Name       3 non-null      object
 1   Age        3 non-null      int64 
 2   Confirmed  3 non-null      bool  
dtypes: bool(1), int64(1), object(1)
memory usage: 183.0+ bytes


In [12]:
type(my_df)

pandas.core.frame.DataFrame

In [14]:
type(my_df["Name"])

pandas.core.series.Series

In [15]:
type(my_df.iloc[0])

pandas.core.series.Series

In [18]:
type(my_df["Name"].values)

numpy.ndarray
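The type checks above suggest a layering: a DataFrame is made of Series, and a Series wraps a NumPy array. A short sketch on a made-up frame follows. It uses `Series.to_numpy()`, which the pandas documentation recommends over `.values`.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Luka", "Alen", "Jan"], "Age": [34, 56, 23]})

column = df["Age"]          # one column -> Series indexed by row labels
row = df.iloc[0]            # one row -> Series indexed by column names
values = column.to_numpy()  # the column's data as a NumPy array

print(type(column).__name__, type(row).__name__, type(values).__name__)
print(list(row.index))
```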

## Introduction to the Data

In [23]:
f500 = pd.read_csv("data/f500.csv", index_col=0)

In [24]:
f500.shape # 500 rows and 16 columns

(500, 16)

In [28]:
f500.head(2)

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456


In [29]:
f500.tail(2)

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006
AutoNation,500,21609,3.6,430.5,10060,-2.7,Michael J. Jackson,Specialty Retailers,Retailing,0,USA,"Fort Lauderdale, FL",http://www.autonation.com,12,26000,2310


In [31]:
f500.dtypes

rank                          int64
revenues                      int64
revenue_change              float64
profits                     float64
assets                        int64
profit_change               float64
ceo                          object
industry                     object
sector                       object
previous_rank                 int64
country                      object
hq_location                  object
website                      object
years_on_global_500_list      int64
employees                     int64
total_stockholder_equity      int64
dtype: object

In [32]:
f500.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   rank                      500 non-null    int64  
 1   revenues                  500 non-null    int64  
 2   revenue_change            498 non-null    float64
 3   profits                   499 non-null    float64
 4   assets                    500 non-null    int64  
 5   profit_change             436 non-null    float64
 6   ceo                       500 non-null    object 
 7   industry                  500 non-null    object 
 8   sector                    500 non-null    object 
 9   previous_rank             500 non-null    int64  
 10  country                   500 non-null    object 
 11  hq_location               500 non-null    object 
 12  website                   500 non-null    object 
 13  years_on_global_500_list  500 non-null    int64  
 14  em

## Pandas Data Selection

<img alt="../../_images/03_subset_columns.svg" class="align-center" src="https://pandas.pydata.org/docs/_images/03_subset_columns.svg">

### Selecting Columns

`df.loc[row_label, column_label]`

In [35]:
profits = f500.loc[:, "profits"]
# shorthand commonly used in practice
profits = f500["profits"]

In [36]:
type(profits)

pandas.core.series.Series

In [37]:
profits.shape

(500,)

In [41]:
profits_ceo = f500.loc[:, ["profits", "ceo"]]
# shorthand commonly used in practice
profits_ceo = f500[["profits", "ceo"]]

In [42]:
assets_to_sector = f500.loc[:, "assets": "sector"] # the end label is included in the result!
assets_to_sector.head(3)

company,assets,profit_change,ceo,industry,sector
Walmart,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing
State Grid,489838,-6.2,Kou Wei,Utilities,Energy
Sinopec Group,310726,-65.0,Wang Yupu,Petroleum Refining,Energy


<p></p><center>
<table>
<thead>
<tr>
<th>Select by Label</th>
<th>Explicit Syntax</th>
<th>Common Shorthand</th>
<th>Other Shorthand</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single column</td>
<td><code>df.loc[:,"col1"]</code></td>
<td bgcolor="#00FF00"><code>df["col1"]</code></td>
<td><code>df.col1</code></td>
</tr>
<tr>
<td>List of columns</td>
<td><code>df.loc[:,["col1", "col7"]]</code></td>
<td bgcolor="#00FF00"><code>df[["col1", "col7"]]</code></td>
<td></td>
</tr>
<tr>
<td>Slice of columns</td>
<td bgcolor="#00FF00"><code>df.loc[:,"col1":"col4"]</code></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</center><p></p>
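The equivalences in the table can be checked on a small made-up frame. The column names `col1` to `col4` mirror the table, and the data is invented.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2], "col2": [3, 4], "col3": [5, 6], "col4": [7, 8]},
    index=["a", "b"],
)

single = df.loc[:, "col1"]             # explicit form of df["col1"]
several = df.loc[:, ["col1", "col3"]]  # explicit form of df[["col1", "col3"]]
sliced = df.loc[:, "col2":"col4"]      # label slice, end label included

print(single.equals(df["col1"]))
print(several.equals(df[["col1", "col3"]]))
print(list(sliced.columns))
```

The attribute shorthand `df.col1` only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so the bracket form is usually the safer choice.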

<div class="alert alert-block alert-info">
<b>Exercise:</b> Select all columns from <code>ceo</code> up to and including <code>sector</code>. Assign the result to the variable <code>ceo_to_sector</code>.</div>

In [46]:
ceo_to_sector = f500.loc[:, "ceo":"sector"]
ceo_to_sector.head(3)

company,ceo,industry,sector
Walmart,C. Douglas McMillon,General Merchandisers,Retailing
State Grid,Kou Wei,Utilities,Energy
Sinopec Group,Wang Yupu,Petroleum Refining,Energy


### Selecting Rows

In [53]:
single_row = f500.loc["Walmart"]
two_rows = f500.loc[["Walmart", "Sinopec Group"]]
sliced_rows = f500.loc["Toyota Motor": "Apple"]
sliced_rows = f500.loc["Toyota Motor": "Apple", ["industry", "country"]]
sliced_rows

company,industry,country
Toyota Motor,Motor Vehicles and Parts,Japan
Volkswagen,Motor Vehicles and Parts,Germany
Royal Dutch Shell,Petroleum Refining,Netherlands
Berkshire Hathaway,Insurance: Property and Casualty (Stock),USA
Apple,"Computers, Office Equipment",USA


In [60]:
# similar, but with iloc (integer positions; the end position of a slice is excluded)
single_row = f500.iloc[0]
two_rows = f500.iloc[[0, 2]]
sliced_rows = f500.iloc[5:9]
sliced_rows = f500.iloc[5:9, [7, 10]]
sliced_rows

company,industry,country
Volkswagen,Motor Vehicles and Parts,Germany
Royal Dutch Shell,Petroleum Refining,Netherlands
Berkshire Hathaway,Insurance: Property and Casualty (Stock),USA
Apple,"Computers, Office Equipment",USA
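The label-based and position-based slices above do not return the same rows: `loc` includes the end label, while `iloc` excludes the end position, as in ordinary Python slicing. A minimal sketch on a made-up four-row frame (the frame and the variable names are illustrative, not from the notebook):

```python
import pandas as pd

df = pd.DataFrame(
    {"country": ["USA", "China", "Japan", "Germany"]},
    index=["Walmart", "State Grid", "Toyota Motor", "Volkswagen"],
)

by_label = df.loc["State Grid":"Toyota Motor"]  # end label included -> 2 rows
by_position = df.iloc[1:2]                      # end position excluded -> 1 row
same_as_label = df.iloc[1:3]                    # positions 1 and 2 -> 2 rows

print(len(by_label), len(by_position), len(same_as_label))
```

To reproduce a `loc` slice with `iloc`, the stop position therefore has to be one past the last row wanted.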


## Boolean Indexing

In [66]:
f500.loc[f500["employees"] > 500_000, ["profits", "employees"]].head(2)

company,profits,employees
Walmart,13643.0,2300000
State Grid,9571.3,926067


In [71]:
# profits > 0 and years_on_global_500_list > 10: show assets, profits, employees
# & -> AND operator
# | -> OR operator
# ~ -> NOT operator (negation)
f500.loc[(f500["profits"] > 0) & (f500["years_on_global_500_list"] > 10), ["assets", "profits", "employees"]].head(2)

company,assets,profits,employees
Walmart,198825,13643.0,2300000
State Grid,489838,9571.3,926067


In [77]:
# non-US companies with more than 500_000 employees in the Energy sector
f500.loc[(f500["country"] != "USA") & (f500["employees"] > 500_000) & (f500["sector"] == "Energy")]

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893


In [79]:
# companies whose country is USA, China or Italy and whose rank is greater than 250
f500.loc[f500["country"].isin(["USA", "China", "Italy"]) & (f500["rank"] > 250)].head(3)

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
China Minsheng Banking,251,40234,-5.2,7201.6,848389,-1.8,Zheng Wanchun,Banks: Commercial and Savings,Financials,221,China,"Beijing, China",http://www.cmbc.com.cn,5,58720,49297
China Pacific Insurance (Group),252,40193,2.2,1814.9,146873,-35.7,Huo Lianhong,"Insurance: Life, Health (stock)",Financials,251,China,"Shanghai, China",http://www.cpic.com.cn,7,97032,18960
American Airlines Group,253,40180,-2.0,2676.0,51274,-64.8,W. Douglas Parker,Airlines,Transportation,236,USA,"Fort Worth, TX",http://www.aa.com,23,122300,3785
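The filters in this section all build a boolean Series and pass it to `loc`. A small sketch of the individual operators, on a four-row frame whose values are copied from rows shown earlier (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["USA", "China", "Japan", "USA"],
        "employees": [2_300_000, 926_067, 364_445, 26_000],
    },
    index=["Walmart", "State Grid", "Toyota Motor", "AutoNation"],
)

big = df["employees"] > 500_000  # boolean Series, one value per row
usa = df["country"] == "USA"

both = df.loc[big & usa]      # AND: both conditions hold
either = df.loc[big | usa]    # OR: at least one condition holds
not_usa = df.loc[~usa]        # NOT: invert the mask
asia = df.loc[df["country"].isin(["China", "Japan"])]  # membership test

print(list(both.index), list(either.index), list(not_usa.index))
```

When comparisons are written inline, each one needs its own parentheses, because `&` and `|` bind more tightly than `>` or `==` in Python.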


## Vectorized Operations

In [88]:
f500["profits_bilions"] = f500["profits"] / 1000
f500["rank_change"] = f500["previous_rank"] - f500["rank"]
f500

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,profits_bilions,rank_change
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798,13.6430,0
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456,9.5713,0
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523,1.2579,1
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893,1.8675,-1
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210,16.8993,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Teva Pharmaceutical Industries,496,21903,11.5,329.0,92890,-79.3,Yitzhak Peterburg,Pharmaceuticals,Health Care,0,Israel,"Petach Tikva, Israel",http://www.tevapharm.com,1,56960,33337,0.3290,-496
New China Life Insurance,497,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507,0.7439,-70
Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111,0.4064,-61
TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006,1.1517,-32
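Column arithmetic like the above is applied element by element without an explicit loop, and missing values propagate through it. A minimal sketch on a made-up three-row frame (the name `profits_scaled` is hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "profits": [13643.0, None, 1257.9],
        "rank": [1, 2, 3],
        "previous_rank": [1, 2, 4],
    },
    index=["A", "B", "C"],
)

# Scalar division and column-by-column subtraction operate on every row at once.
df["profits_scaled"] = df["profits"] / 1000
df["rank_change"] = df["previous_rank"] - df["rank"]

print(df)
```

Row `B` keeps a missing value in `profits_scaled` because its `profits` entry is missing. In the f500 output above, rows with `previous_rank` equal to 0 (for example Teva, with one year on the list) produce large negative `rank_change` values. Presumably 0 marks a company that was not ranked the year before, so those differences may not be meaningful.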


## Calculate summary statistics

<ul>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.max.html"><code>Series.max()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.min.html"><code>Series.min()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html"><code>Series.mean()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.median.html"><code>Series.median()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mode.html"><code>Series.mode()</code></a></li>
<li><a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sum.html"><code>Series.sum()</code></a></li>
</ul>
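A quick sketch of the listed methods on a small made-up Series:

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5], name="toy")

print(s.max(), s.min(), s.sum())  # 5 1 14
print(s.mean())                   # 2.8
print(s.median())                 # 3.0
print(s.mode().tolist())          # [1]
```

Unlike the other methods, `Series.mode()` returns a Series rather than a scalar, because several values can tie for most frequent.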

In [91]:
f500["revenues"].max()

485873

In [92]:
f500["employees"].sum()

66999158

In [93]:
f500.describe()

,rank,revenues,revenue_change,profits,assets,profit_change,previous_rank,years_on_global_500_list,employees,total_stockholder_equity,profits_bilions,rank_change
count,500.0,500.0,498.0,499.0,500.0,436.0,500.0,500.0,500.0,500.0,499.0,500.0
mean,250.5,55416.358,4.538353,3055.203206,243632.3,24.152752,222.134,15.036,133998.3,30628.076,3.055203,-28.366
std,144.481833,45725.478963,28.549067,5171.981071,485193.7,437.509566,146.941961,7.932752,170087.8,43642.576833,5.171981,108.602823
min,1.0,21609.0,-67.3,-13038.0,3717.0,-793.7,0.0,1.0,328.0,-59909.0,-13.038,-500.0
25%,125.75,29003.0,-5.9,556.95,36588.5,-22.775,92.75,7.0,42932.5,7553.75,0.55695,-28.25
50%,250.5,40236.0,0.55,1761.6,73261.5,-0.35,219.5,17.0,92910.5,15809.5,1.7616,-4.0
75%,375.25,63926.75,6.975,3954.0,180564.0,17.7,347.25,23.0,168917.2,37828.5,3.954,8.25
max,500.0,485873.0,442.3,45687.0,3473238.0,8909.5,500.0,23.0,2300000.0,301893.0,45.687,226.0


In [94]:
f500.describe(include=['O'])

,ceo,industry,sector,country,hq_location,website
count,500,500,500,500,500,500
unique,500,58,21,34,235,500
top,C. Douglas McMillon,Banks: Commercial and Savings,Financials,USA,"Beijing, China",http://www.walmart.com
freq,1,51,118,132,56,1


In [96]:
f500["industry"].value_counts().head(5)

industry
Banks: Commercial and Savings      51
Motor Vehicles and Parts           34
Petroleum Refining                 28
Insurance: Life, Health (stock)    24
Food and Drug Stores               20
Name: count, dtype: int64

In [112]:
# among Chinese companies with more than 100_000 employees, which sector is the most common?
f500.loc[(f500["employees"] > 100_000) & (f500["country"] == "China"), "sector"].value_counts().head(1).index.values[0]

'Energy'
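The chain `value_counts().head(1).index.values[0]` can be written more briefly. The sketch below compares it with two alternatives on a made-up Series. The alternatives are suggestions rather than what the notebook uses, and with tied counts the variants are not guaranteed to pick the same value.

```python
import pandas as pd

sectors = pd.Series(["Energy", "Financials", "Energy", "Technology"])

counts = sectors.value_counts()            # counts sorted in descending order
via_head = counts.head(1).index.values[0]  # the approach used above
via_idxmax = counts.idxmax()               # label of the largest count
via_mode = sectors.mode()[0]               # most frequent value, taken directly

print(via_head, via_idxmax, via_mode)
```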

In [118]:
f500.sort_values("employees", ascending=False, inplace=True)
f500.sort_values("rank", ascending=True, inplace=True)
f500

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,profits_bilions,rank_change
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798,13.6430,0
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456,9.5713,0
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523,1.2579,1
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893,1.8675,-1
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210,16.8993,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Teva Pharmaceutical Industries,496,21903,11.5,329.0,92890,-79.3,Yitzhak Peterburg,Pharmaceuticals,Health Care,0,Israel,"Petach Tikva, Israel",http://www.tevapharm.com,1,56960,33337,0.3290,-496
New China Life Insurance,497,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507,0.7439,-70
Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111,0.4064,-61
TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006,1.1517,-32
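The cell above sorts in place twice, so only the second sort (by `rank`) is visible in the output. Without `inplace=True`, `sort_values` returns a new DataFrame and leaves the original untouched, which is often easier to follow. A sketch on a three-row frame with values copied from the rows above:

```python
import pandas as pd

df = pd.DataFrame(
    {"rank": [2, 1, 3], "employees": [926_067, 2_300_000, 713_288]},
    index=["State Grid", "Walmart", "Sinopec Group"],
)

# Returns a sorted copy; df itself keeps its original row order.
by_employees = df.sort_values("employees", ascending=False)

print(list(by_employees.index))
print(list(df.index))
```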


In [122]:
# unique values in the country column
list_of_countries = list(f500["country"].unique())
list_of_countries

['USA',
 'China',
 'Japan',
 'Germany',
 'Netherlands',
 'Britain',
 'South Korea',
 'Switzerland',
 'France',
 'Taiwan',
 'Singapore',
 'Italy',
 'Russia',
 'Spain',
 'Brazil',
 'Mexico',
 'Luxembourg',
 'India',
 'Malaysia',
 'Thailand',
 'Australia',
 'Belgium',
 'Norway',
 'Canada',
 'Ireland',
 'Indonesia',
 'Denmark',
 'Saudi Arabia',
 'Sweden',
 'Finland',
 'Venezuela',
 'Turkey',
 'U.A.E',
 'Israel']

In [129]:
# for each of the 10 most frequent countries in the dataset, print the most common sector
top_10_countries_list = f500["country"].value_counts().head(10).index.to_list()
for country in top_10_countries_list:
    selected_data = f500.loc[f500["country"] == country, "sector"]
    top_sector = selected_data.value_counts().head(1).index[0]
    print(f"[{country.upper()}] Best sector: {top_sector}")

[USA] Best sector: Financials
[CHINA] Best sector: Financials
[JAPAN] Best sector: Financials
[GERMANY] Best sector: Motor Vehicles & Parts
[FRANCE] Best sector: Financials
[BRITAIN] Best sector: Financials
[SOUTH KOREA] Best sector: Technology
[NETHERLANDS] Best sector: Financials
[SWITZERLAND] Best sector: Financials
[CANADA] Best sector: Financials
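The loop above can also be expressed with `groupby`, which performs the split-apply-combine step in one call. This is an alternative sketch on made-up data, not the notebook's approach. It keeps the two most frequent countries instead of ten to stay small.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["USA", "USA", "USA", "China", "China", "Japan"],
        "sector": ["Financials", "Financials", "Retailing",
                   "Energy", "Energy", "Technology"],
    }
)

# Keep only rows from the most frequent countries.
top_countries = df["country"].value_counts().head(2).index
subset = df[df["country"].isin(top_countries)]

# For each country, take the sector label with the highest count.
top_sector = subset.groupby("country")["sector"].agg(
    lambda s: s.value_counts().idxmax()
)
print(top_sector.to_dict())
```

The result is a Series indexed by country, so it can be filtered or sorted further without parsing printed text.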


<div class="alert alert-block alert-info">
<b>Exercise:</b> Find the company headquartered in Japan with the largest number of employees.</div>


In [136]:
f500.loc[f500["country"] == "Japan"].sort_values("employees", ascending=False).iloc[0].name

'Toyota Motor'
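Sorting the whole subset just to read off the first row works, but `Series.idxmax()` returns the index label of the maximum directly. A sketch of that alternative on a made-up frame with hypothetical company names:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Japan", "Japan", "USA"],
        "employees": [300_000, 200_000, 900_000],
    },
    index=["Firm A", "Firm B", "Firm C"],
)

# Select the employees column for Japanese rows, then ask for the label
# of the largest value.
japan_employees = df.loc[df["country"] == "Japan", "employees"]
largest = japan_employees.idxmax()
print(largest)
```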