# Introduction to Pandas

<p>Although NumPy provides fundamental structures and tools that make working with data easier, several things limit its usefulness:</p>
<ul>
<li>The lack of support for column names forces us to frame questions as multi-dimensional array operations.</li>
<li>Support for only one data type per ndarray makes it more difficult to work with data that contains both numeric and string values.</li>
<li>NumPy has many low-level methods, but many common analysis patterns have no pre-built methods.</li>
</ul>

## What is Pandas?

**pandas** is a Python package providing fast, flexible, and expressive data structures **designed to make working with “relational” or “labeled” data** both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, **real-world data analysis in Python**. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.

pandas is well suited for many different kinds of data:
- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational / statistical data sets. The data need not be labeled at all to be placed into a pandas data structure

The two primary data structures of pandas, **Series (1-dimensional)** and **DataFrame (2-dimensional)**, handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering.

**pandas is built on top of NumPy** and is intended to integrate well within a scientific computing environment with many other 3rd party libraries.

Here are just a few of the things that pandas does well:
- Easy handling of **missing data** (represented as NaN) in floating point as well as non-floating point data
- Size mutability: columns can be **inserted into and deleted from** DataFrame objects
- Automatic and explicit **data alignment**: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
- Powerful, flexible **group by** functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
- **Easy conversion** of ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
- Intelligent **label-based slicing, fancy indexing**, and subsetting of large data sets
- Intuitive **merging and joining** of data sets
- Flexible **reshaping and pivoting** of data sets
- **Hierarchical labeling of axes** (possible to have multiple labels per tick)
- **Robust IO tools** for loading data from flat files (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast HDF5 format
- **Time series-specific functionality**: date range generation and frequency conversion, moving window statistics, date shifting, and lagging.
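Data alignment is easiest to see with two small Series whose labels only partly overlap. The following is a minimal sketch with made-up values. The names `s1`, `s2`, `total`, and `filled` are illustrative only and do not come from the dataset used later.

```python
import pandas as pd

# Two Series whose labels only partly overlap (illustrative values)
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on labels, not positions: the result's index is the union
# of both indexes, and a label present in only one operand yields NaN
total = s1 + s2
print(total)

# The method form can treat a missing side as 0 instead of producing NaN
filled = s1.add(s2, fill_value=0)
print(filled)
```

With `fill_value=0`, labels that exist in only one Series keep their value instead of turning into NaN.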

> **pandas is fast**. Many of the low-level algorithmic bits have been extensively tweaked in Cython code. However, as with anything else, generalization usually sacrifices performance. So if you focus on one feature for your application, you may be able to create a faster specialized tool.

## Installing Pandas

pandas can be installed from PyPI with pip, or with uv's pip interface.

Run: `uv pip install pandas` (or `pip install pandas`)

> It is recommended to install and run pandas from a virtual environment, for example, one created with the Python standard library's `venv` module.

pandas can also be installed with sets of optional dependencies that enable certain functionality. For example, to install pandas with the optional dependencies needed to read Excel files:

Run: `uv pip install "pandas[excel]"`

[Optional dependencies](https://pandas.pydata.org/docs/getting_started/install.html#optional-dependencies): you are highly encouraged to install the performance-related dependencies, as they provide speed improvements, especially when working with large data sets. For example, to install the Excel, performance, and HTML extras together:

Run: `uv pip install "pandas[excel,performance,html]"`

> pandas can accelerate certain types of binary numerical and boolean operations using the numexpr and bottleneck libraries. These libraries are especially useful when dealing with large data sets and can provide large speedups. numexpr uses smart chunking, caching, and multiple cores. bottleneck is a set of specialized Cython routines that are especially fast when dealing with arrays that contain NaNs.

## Data structures

<table class="table">
<colgroup>
<col style="width: 17.6%">
<col style="width: 23.5%">
<col style="width: 58.8%">
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Dimensions</p></th>
<th class="head"><p>Name</p></th>
<th class="head"><p>Description</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>1</p></td>
<td><p>Series</p></td>
<td><p>1D labeled homogeneously-typed array</p></td>
</tr>
<tr class="row-odd"><td><p>2</p></td>
<td><p>DataFrame</p></td>
<td><p>General 2D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns</p></td>
</tr>
</tbody>
</table>

<img alt="../../_images/01_table_dataframe.svg" class="align-center" src="https://pandas.pydata.org/docs/_images/01_table_dataframe.svg">

The best way to think about the pandas data structures is as **flexible containers for lower dimensional data**. For example, DataFrame is a container for Series, and Series is a container for scalars. We would like to be able to insert and remove objects from these containers in a dictionary-like fashion.
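The container idea can be sketched with a small hypothetical DataFrame. Selecting a column yields a Series, a Series yields scalars, and columns can be added and removed much like dictionary keys. The column names `x`, `y`, and `z` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# Selecting one column returns a Series: the DataFrame "contains" Series
col = df["x"]
print(type(col))

# A Series in turn contains scalars
first = col.iloc[0]

# Dictionary-like insertion and removal of columns
df["z"] = df["x"] * 10   # insert a new column
del df["y"]              # delete a column, like deleting a dict key
popped = df.pop("z")     # remove a column and get it back as a Series
print(df.columns.tolist())
```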

To load the pandas package and start working with it, import the package. The community-agreed alias for pandas is `pd`, so loading pandas as `pd` is assumed standard practice throughout the pandas documentation.

The fundamental behavior of data types, indexing, axis labeling, and alignment applies across all of the objects. To get started, import NumPy and load pandas into your namespace:

In [1]:
import numpy as np
import pandas as pd


In [2]:
pd.__version__

'2.2.3'

IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature) as well as the documentation of various functions (using the ? character).

In [None]:
pd.<TAB>

In [4]:
pd.read_csv?

Signature:
pd.read_csv(
    filepath_or_buffer: 'FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]',
    *,
    sep: 'str | None | lib.NoDefault' = <no_default>,
    delimiter: 'str | None | lib.NoDefault' = None,
    header: "int | Sequence[int] | None | Literal['infer']" = 'infer',
    names: 'Sequence[Hashable] | None | lib.NoDefault' = <no_default>,
    index_col: 'IndexLabel | Literal[False] | None' = None,
    usecols:

### Series

`Series` is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a `Series` is to call: `s = pd.Series(data, index=index)`

Here, data can be many different things:
- a Python dict
- an ndarray
- a scalar value (like 5)

The passed index is a list of axis labels. Thus, this separates into a few cases depending on what data is:

**From ndarray**

If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values `[0, ..., len(data) - 1]`.
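The sketch below uses arbitrary sample values. It shows the default integer index that pandas builds when none is passed, and the error raised when an explicit index has the wrong length.

```python
import numpy as np
import pandas as pd

data = np.array([0.5, 1.5, 2.5])

# No index passed: pandas builds a default integer index 0..len(data)-1
s = pd.Series(data)
print(s.index)  # RangeIndex(start=0, stop=3, step=1)

# An explicit index must have the same length as the ndarray
try:
    pd.Series(data, index=["a", "b"])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```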

In [5]:
rng = np.random.default_rng()
s = pd.Series(rng.random(5), index=["a", "b", "c", "d", "e"])
s

a    0.016417
b    0.286128
c    0.264153
d    0.101595
e    0.165723
dtype: float64

In [6]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [7]:
s["a"]

np.float64(0.016417107006072462)

In [8]:
rng = np.random.default_rng()
s1 = pd.Series(rng.random(5), index=["a", "a", "c", "d", "e"])
s1

a    0.864100
a    0.379844
c    0.804647
d    0.602742
e    0.168754
dtype: float64

In [10]:
s1["a"]

a    0.864100
a    0.379844
dtype: float64

In [None]:
pd.Series(rng.random(5))

> pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception will be raised at that time.
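Here is a small sketch of that behavior, using illustrative values. Selecting a duplicated label works fine, but `reindex` needs unique labels, so pandas raises a `ValueError` only when it is called.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "a", "b"])
print(s.index.is_unique)  # False

# Selecting a duplicated label returns every matching row
subset = s["a"]

# Reindexing requires unique labels, so the error appears only at this point
try:
    s.reindex(["a", "b", "c"])
    reindex_raised = False
except ValueError as err:
    reindex_raised = True
    print("ValueError:", err)
```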

**From dict**

`Series` can be instantiated from dicts:

In [11]:
d = {"b": 1, "a": 0, "c": 2}
pd.Series(d)

b    1
a    0
c    2
dtype: int64

If an index is passed, the values in data corresponding to the labels in the index will be pulled out.

In [12]:
d = {"a": 0.0, "b": 1.0, "c": 2.0}
pd.Series(d)

a    0.0
b    1.0
c    2.0
dtype: float64

In [13]:
pd.Series(d, index=["b", "c", "d", "a"])


b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

> NaN (not a number) is the standard missing data marker used in pandas.

Like a NumPy array, a pandas Series has a single dtype.

In [14]:
s = pd.Series(5, index=["a", "b", "c", "d", "e"])
s.dtype

dtype('int64')

This is often a NumPy dtype. However, pandas and 3rd-party libraries extend NumPy's type system in a few places, in which case the dtype is an ExtensionDtype. Examples within pandas include categorical data and the nullable integer data type.
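The two extension dtypes just mentioned can be created directly from string aliases. This is a minimal sketch with made-up values. `"Int64"` (capital I) is the nullable integer dtype and `"category"` is the categorical dtype.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# Nullable integer dtype: integers plus a proper missing-value marker (pd.NA)
ints = pd.Series([1, None, 3], dtype="Int64")
print(ints.dtype)  # Int64
print(ints[1])     # <NA>

# Categorical dtype: values are stored as codes into a fixed set of categories
cats = pd.Series(["low", "high", "low"], dtype="category")
print(cats.dtype)                      # category
print(list(cats.cat.categories))       # ['high', 'low']

# Both dtypes are ExtensionDtypes rather than plain NumPy dtypes
print(isinstance(ints.dtype, ExtensionDtype), isinstance(cats.dtype, ExtensionDtype))
```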

If you need the actual array backing a Series, use Series.array. Accessing the array can be useful when you need to do some operation without the index.

In [15]:
s.array

<NumpyExtensionArray>
[np.int64(5), np.int64(5), np.int64(5), np.int64(5), np.int64(5)]
Length: 5, dtype: int64

While Series is ndarray-like, if you need an actual ndarray, then use `Series.to_numpy()`.

In [16]:
s.to_numpy()

array([5, 5, 5, 5, 5])
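The difference between `.array` and `.to_numpy()` matters more for extension dtypes. The sketch below uses a nullable integer Series with one missing value. The default result dtype of `to_numpy()` in this case has changed between pandas versions, so the sketch passes `dtype` and `na_value` explicitly to get a predictable float array.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, None, 3], dtype="Int64")

# .array returns the pandas extension array that backs the Series
arr = s.array
print(type(arr).__name__)  # IntegerArray

# .to_numpy() always returns a NumPy ndarray; choosing dtype and na_value
# explicitly turns the missing value into NaN in a float64 array
as_float = s.to_numpy(dtype="float64", na_value=np.nan)
print(as_float)
```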

### DataFrame

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input:
- Dict of 1D ndarrays, lists, dicts, or Series
- 2-D numpy.ndarray
- Structured or record ndarray
- A Series
- Another DataFrame

Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments. If you pass an index and / or columns, you are guaranteeing the index and / or columns of the resulting DataFrame. Thus, a dict of Series plus a specific index will discard all data not matching up to the passed index.

If axis labels are not passed, they will be constructed from the input data based on common sense rules.

> A table of data is stored as a pandas DataFrame. Each column in a DataFrame is a Series.

<img src="./images/df_anatomy_static_resized.svg"> 


**From dict of Series or dicts**

The resulting index will be the union of the indexes of the various Series. If there are any nested dicts, these will first be converted to Series. If no columns are passed, the columns will be the ordered list of dict keys.


In [17]:
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)
df

|   | one | two |
|---|---|---|
| a | 1.0 | 1.0 |
| b | 2.0 | 2.0 |
| c | 3.0 | 3.0 |
| d | NaN | 4.0 |


In [18]:
pd.DataFrame(d, index=["d", "b", "a"])

|   | one | two |
|---|---|---|
| d | NaN | 4.0 |
| b | 2.0 | 2.0 |
| a | 1.0 | 1.0 |


In [19]:
pd.DataFrame(d, index=["d", "b", "a"], columns=["two", "three"])

|   | two | three |
|---|---|---|
| d | 4.0 | NaN |
| b | 2.0 | NaN |
| a | 1.0 | NaN |


The row and column labels can be accessed through the index and columns attributes, respectively:

In [20]:
df.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [21]:
df.columns

Index(['one', 'two'], dtype='object')

**From dict of ndarrays / lists**

All ndarrays must share the same length. If an index is passed, it must also be the same length as the arrays. If no index is passed, the result will be range(n), where n is the array length.

In [None]:
d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
pd.DataFrame(d)

In [None]:
pd.DataFrame(d, index=["a", "b", "c", "d"])
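The length rules above can be checked with a short sketch. It reuses the `d` dictionary from the cells above and adds a deliberately mismatched, hypothetical input. Both kinds of mismatch raise a `ValueError`.

```python
import pandas as pd

d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
df = pd.DataFrame(d)
print(df.index)  # RangeIndex(start=0, stop=4, step=1)

# Lists of different lengths cannot be combined into columns
try:
    pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 3.0]})
    length_mismatch_raised = False
except ValueError as err:
    length_mismatch_raised = True
    print("ValueError:", err)

# A passed index must also match the length of the arrays
try:
    pd.DataFrame(d, index=["a", "b"])
    index_mismatch_raised = False
except ValueError as err:
    index_mismatch_raised = True
    print("ValueError:", err)
```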

**From a list of dicts**

In [24]:
data2 = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
pd.DataFrame(data2)

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | NaN |
| 1 | 5 | 10 | 20.0 |


In [25]:
pd.DataFrame(data2, index=["first", "second"])

|   | a | b | c |
|---|---|---|---|
| first | 1 | 2 | NaN |
| second | 5 | 10 | 20.0 |


## Essential basic functionality

We'll work with a data set from Fortune magazine's 2017 Global 500 list, which ranks the top 500 corporations worldwide by revenue. The data set is a CSV file called f500.csv.

pandas provides the `read_csv()` function to read data stored in a CSV file into a pandas DataFrame. pandas supports many different file formats and data sources out of the box (CSV, Excel, SQL, JSON, Parquet, …).
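`read_csv()` accepts any file-like object as well as a path. That makes it easy to experiment without a file on disk. The sketch below parses a tiny hypothetical CSV held in memory and uses `index_col` to pick the row labels. The company names and values are invented.

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for a file on disk (hypothetical data)
csv_text = """company,rank,revenues
Alpha,1,500
Beta,2,350
Gamma,3,
"""

# index_col turns the "company" column into the row labels;
# the empty field in the last row is read as NaN
df = pd.read_csv(io.StringIO(csv_text), index_col="company")
print(df)
print(df.dtypes)
```

Because one `revenues` value is missing, that column is inferred as `float64`, since NaN is a floating-point value.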

In [22]:
f500 = pd.read_csv("../data/f500.csv")
# When displaying a DataFrame, the first and last 5 rows will be shown by default:
f500

|   | company | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Walmart | 1 | 485873 | 0.8 | 13643.0 | 198825 | -7.2 | C. Douglas McMillon | General Merchandisers | Retailing | 1 | USA | Bentonville, AR | http://www.walmart.com | 23 | 2300000 | 77798 |
| 1 | State Grid | 2 | 315199 | -4.4 | 9571.3 | 489838 | -6.2 | Kou Wei | Utilities | Energy | 2 | China | Beijing, China | http://www.sgcc.com.cn | 17 | 926067 | 209456 |
| 2 | Sinopec Group | 3 | 267518 | -9.1 | 1257.9 | 310726 | -65.0 | Wang Yupu | Petroleum Refining | Energy | 4 | China | Beijing, China | http://www.sinopec.com | 19 | 713288 | 106523 |
| 3 | China National Petroleum | 4 | 262573 | -12.3 | 1867.5 | 585619 | -73.7 | Zhang Jianhua | Petroleum Refining | Energy | 3 | China | Beijing, China | http://www.cnpc.com.cn | 17 | 1512048 | 301893 |
| 4 | Toyota Motor | 5 | 254694 | 7.7 | 16899.3 | 437575 | -12.3 | Akio Toyoda | Motor Vehicles and Parts | Motor Vehicles & Parts | 8 | Japan | Toyota, Japan | http://www.toyota-global.com | 23 | 364445 | 157210 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | Teva Pharmaceutical Industries | 496 | 21903 | 11.5 | 329.0 | 92890 | -79.3 | Yitzhak Peterburg | Pharmaceuticals | Health Care | 0 | Israel | Petach Tikva, Israel | http://www.tevapharm.com | 1 | 56960 | 33337 |
| 496 | New China Life Insurance | 497 | 21796 | -13.3 | 743.9 | 100609 | -45.6 | Wan Feng | Insurance: Life, Health (stock) | Financials | 427 | China | Beijing, China | http://www.newchinalife.com | 2 | 54378 | 8507 |
| 497 | Wm. Morrison Supermarkets | 498 | 21741 | -11.3 | 406.4 | 11630 | 20.4 | David T. Potts | Food and Drug Stores | Food & Drug Stores | 437 | Britain | Bradford, Britain | http://www.morrisons.com | 13 | 77210 | 5111 |
| 498 | TUI | 499 | 21655 | -5.5 | 1151.7 | 16247 | 195.5 | Friedrich Joussen | Travel Services | Business Services | 467 | Germany | Hanover, Germany | http://www.tuigroup.com | 23 | 66779 | 3006 |


In [26]:
# Use Python's type() function to check the type of f500.
type(f500)

pandas.core.frame.DataFrame

In [27]:
# Use the DataFrame.shape attribute to get the shape of f500 as (rows, columns).
f500.shape

(500, 17)

> When asking for the shape, no parentheses are used! `shape` is an attribute of a DataFrame and Series, and attributes do not need parentheses. Attributes represent a characteristic of a DataFrame/Series, whereas a method (which requires parentheses) does something with the DataFrame/Series.

Recall that one of the features that makes pandas better for working with data is its support for string column and row labels. **Axis values can have string labels, not just numeric ones.**  To view the first few rows of our dataframe, we can use the `DataFrame.head()` method. By default, it will return the first five rows of our dataframe. However, it also accepts an optional integer parameter, which specifies the number of rows:

In [28]:
f500.head()

|   | company | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Walmart | 1 | 485873 | 0.8 | 13643.0 | 198825 | -7.2 | C. Douglas McMillon | General Merchandisers | Retailing | 1 | USA | Bentonville, AR | http://www.walmart.com | 23 | 2300000 | 77798 |
| 1 | State Grid | 2 | 315199 | -4.4 | 9571.3 | 489838 | -6.2 | Kou Wei | Utilities | Energy | 2 | China | Beijing, China | http://www.sgcc.com.cn | 17 | 926067 | 209456 |
| 2 | Sinopec Group | 3 | 267518 | -9.1 | 1257.9 | 310726 | -65.0 | Wang Yupu | Petroleum Refining | Energy | 4 | China | Beijing, China | http://www.sinopec.com | 19 | 713288 | 106523 |
| 3 | China National Petroleum | 4 | 262573 | -12.3 | 1867.5 | 585619 | -73.7 | Zhang Jianhua | Petroleum Refining | Energy | 3 | China | Beijing, China | http://www.cnpc.com.cn | 17 | 1512048 | 301893 |
| 4 | Toyota Motor | 5 | 254694 | 7.7 | 16899.3 | 437575 | -12.3 | Akio Toyoda | Motor Vehicles and Parts | Motor Vehicles & Parts | 8 | Japan | Toyota, Japan | http://www.toyota-global.com | 23 | 364445 | 157210 |


In [None]:
f500.head(3)

Likewise, we can use the `DataFrame.tail()` method to show us the last rows of our dataframe:

In [29]:
f500.tail()

|   | company | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 495 | Teva Pharmaceutical Industries | 496 | 21903 | 11.5 | 329.0 | 92890 | -79.3 | Yitzhak Peterburg | Pharmaceuticals | Health Care | 0 | Israel | Petach Tikva, Israel | http://www.tevapharm.com | 1 | 56960 | 33337 |
| 496 | New China Life Insurance | 497 | 21796 | -13.3 | 743.9 | 100609 | -45.6 | Wan Feng | Insurance: Life, Health (stock) | Financials | 427 | China | Beijing, China | http://www.newchinalife.com | 2 | 54378 | 8507 |
| 497 | Wm. Morrison Supermarkets | 498 | 21741 | -11.3 | 406.4 | 11630 | 20.4 | David T. Potts | Food and Drug Stores | Food & Drug Stores | 437 | Britain | Bradford, Britain | http://www.morrisons.com | 13 | 77210 | 5111 |
| 498 | TUI | 499 | 21655 | -5.5 | 1151.7 | 16247 | 195.5 | Friedrich Joussen | Travel Services | Business Services | 467 | Germany | Hanover, Germany | http://www.tuigroup.com | 23 | 66779 | 3006 |
| 499 | AutoNation | 500 | 21609 | 3.6 | 430.5 | 10060 | -2.7 | Michael J. Jackson | Specialty Retailers | Retailing | 0 | USA | Fort Lauderdale, FL | http://www.autonation.com | 12 | 26000 | 2310 |


In [30]:
f500.tail(2)

|   | company | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 498 | TUI | 499 | 21655 | -5.5 | 1151.7 | 16247 | 195.5 | Friedrich Joussen | Travel Services | Business Services | 467 | Germany | Hanover, Germany | http://www.tuigroup.com | 23 | 66779 | 3006 |
| 499 | AutoNation | 500 | 21609 | 3.6 | 430.5 | 10060 | -2.7 | Michael J. Jackson | Specialty Retailers | Retailing | 0 | USA | Fort Lauderdale, FL | http://www.autonation.com | 12 | 26000 | 2310 |


Another feature that makes pandas better for working with data is that dataframes can contain more than one data type. **Dataframes can contain columns of different data types, including integer, float, and string.**

We can use the `DataFrame.dtypes` attribute (similar to NumPy's `ndarray.dtype` attribute) to return information about the types of each column. 

In [None]:
f500.dtypes

The result contains three different data types, or dtypes: int64, float64, and object.

You may recognize the float64 and int64 dtypes from our work in NumPy: pandas uses NumPy dtypes for numeric columns. There is also a type we haven't seen before, object, which is used for columns whose data doesn't fit any other dtype. In practice, object columns almost always contain string values.

When we import data, pandas will attempt to guess the correct dtype for each column. Generally, pandas does a good job with this, which means we don't need to worry about specifying dtypes every time we start to work with data.
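When the inferred dtype is not what you want, you can override it. The sketch below uses a tiny hypothetical CSV. It shows both the `dtype=` argument of `read_csv()` and `DataFrame.astype()` after loading. The column names and values are made up.

```python
import io
import pandas as pd

csv_text = "name,rank,score\nA,1,9.5\nB,2,8.0\n"

# Let pandas infer dtypes (rank -> int64, score -> float64;
# name -> object in pandas 2.x)
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred.dtypes)

# Override inference for specific columns while reading
explicit = pd.read_csv(io.StringIO(csv_text), dtype={"rank": "int32", "name": "category"})
print(explicit.dtypes)

# Or convert selected columns after loading
converted = inferred.astype({"score": "float32"})
```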

If we wanted an overview of all the dtypes used in our dataframe, along with its shape and other information, we could use the `DataFrame.info()` method. Note that `DataFrame.info()` prints the information rather than returning it (the method returns `None`), so assigning its result to a variable is not useful.

In [31]:
f500.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 17 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   company                   500 non-null    object 
 1   rank                      500 non-null    int64  
 2   revenues                  500 non-null    int64  
 3   revenue_change            498 non-null    float64
 4   profits                   499 non-null    float64
 5   assets                    500 non-null    int64  
 6   profit_change             436 non-null    float64
 7   ceo                       500 non-null    object 
 8   industry                  500 non-null    object 
 9   sector                    500 non-null    object 
 10  previous_rank             500 non-null    int64  
 11  country                   500 non-null    object 
 12  hq_location               500 non-null    object 
 13  website                   500 non-null    object 
 14  years_on_g

In [32]:
f500.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 17 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   company                   500 non-null    object 
 1   rank                      500 non-null    int64  
 2   revenues                  500 non-null    int64  
 3   revenue_change            498 non-null    float64
 4   profits                   499 non-null    float64
 5   assets                    500 non-null    int64  
 6   profit_change             436 non-null    float64
 7   ceo                       500 non-null    object 
 8   industry                  500 non-null    object 
 9   sector                    500 non-null    object 
 10  previous_rank             500 non-null    int64  
 11  country                   500 non-null    object 
 12  hq_location               500 non-null    object 
 13  website                   500 non-null    object 
 14  years_on_g

The `DataFrame.info()` method shows us the number of entries in our index (representing the number of rows), a list of each column with its dtype and number of non-null values, and a summary of the dtypes and memory usage. By default, the memory used by object columns is only estimated (the memory usage line is marked with a `+`). Passing `memory_usage="deep"` makes pandas inspect the actual values and report their real memory consumption. In pandas, missing (null) values are typically represented as NaN.
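Because `info()` only prints, a sketch like the one below can be handy when you want the same information as values. It uses a small made-up DataFrame. The report is redirected into a text buffer through the `buf` parameter, and the non-null counts and per-column memory are computed directly.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["a", "bb", "ccc"], "value": [1.0, None, 3.0]})

# info() writes its report to a text buffer (stdout by default) and returns None
buffer = io.StringIO()
result = df.info(buf=buffer)
report = buffer.getvalue()
print(result)  # None

# Non-null counts per column, as a Series
counts = df.count()
print(counts)

# Per-column memory in bytes; deep=True inspects the actual object values
shallow = df.memory_usage()
deep = df.memory_usage(deep=True)
print(deep)
```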

pandas objects have a number of attributes enabling you to access the metadata:

In [33]:
f500.columns

Index(['company', 'rank', 'revenues', 'revenue_change', 'profits', 'assets',
       'profit_change', 'ceo', 'industry', 'sector', 'previous_rank',
       'country', 'hq_location', 'website', 'years_on_global_500_list',
       'employees', 'total_stockholder_equity'],
      dtype='object')

In [34]:
f500.index

RangeIndex(start=0, stop=500, step=1)

In the past, pandas recommended `Series.values` or DataFrame.values for extracting the data from a Series or DataFrame. You’ll still find references to these in old code bases and online. Going forward, we recommend avoiding `.values` and using `.array` or `.to_numpy()`. `.values` has the following drawbacks:
- When your Series contains an extension type, it’s unclear whether Series.values returns a NumPy array or the extension array. `Series.to_numpy()` will always return a NumPy array, potentially at the cost of copying / coercing values.
- When your DataFrame contains a mixture of data types, DataFrame.values may involve copying data and coercing values to a common dtype, a relatively expensive operation. DataFrame.to_numpy(), being a method, makes it clearer that the returned NumPy array may not be a view on the same data in the DataFrame.
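The coercion described above can be sketched with a small hypothetical DataFrame. Numeric columns of different types are upcast to a common numeric dtype, and adding a string column forces everything to `object`.

```python
import pandas as pd

df = pd.DataFrame({"ints": [1, 2], "floats": [0.5, 1.5], "text": ["x", "y"]})

# int64 and float64 columns share the common dtype float64
numeric = df[["ints", "floats"]].to_numpy()
print(numeric.dtype)  # float64

# Once a string column is included, the only common dtype is object
mixed = df.to_numpy()
print(mixed.dtype)  # object

# A target dtype can also be requested explicitly
as_float32 = df[["ints", "floats"]].to_numpy(dtype="float32")
```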

In [None]:
f500.values  # returns a NumPy array

In [35]:
f500.to_numpy()

array([['Walmart', 1, 485873, ..., 23, 2300000, 77798],
       ['State Grid', 2, 315199, ..., 17, 926067, 209456],
       ['Sinopec Group', 3, 267518, ..., 19, 713288, 106523],
       ...,
       ['Wm. Morrison Supermarkets', 498, 21741, ..., 13, 77210, 5111],
       ['TUI', 499, 21655, ..., 23, 66779, 3006],
       ['AutoNation', 500, 21609, ..., 12, 26000, 2310]],
      shape=(500, 17), dtype=object)

Set the DataFrame index using existing columns:

In [36]:
f500_company = f500.set_index("company")
f500_company.head()

| company | rank | revenues | revenue_change | profits | assets | profit_change | ceo | industry | sector | previous_rank | country | hq_location | website | years_on_global_500_list | employees | total_stockholder_equity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Walmart | 1 | 485873 | 0.8 | 13643.0 | 198825 | -7.2 | C. Douglas McMillon | General Merchandisers | Retailing | 1 | USA | Bentonville, AR | http://www.walmart.com | 23 | 2300000 | 77798 |
| State Grid | 2 | 315199 | -4.4 | 9571.3 | 489838 | -6.2 | Kou Wei | Utilities | Energy | 2 | China | Beijing, China | http://www.sgcc.com.cn | 17 | 926067 | 209456 |
| Sinopec Group | 3 | 267518 | -9.1 | 1257.9 | 310726 | -65.0 | Wang Yupu | Petroleum Refining | Energy | 4 | China | Beijing, China | http://www.sinopec.com | 19 | 713288 | 106523 |
| China National Petroleum | 4 | 262573 | -12.3 | 1867.5 | 585619 | -73.7 | Zhang Jianhua | Petroleum Refining | Energy | 3 | China | Beijing, China | http://www.cnpc.com.cn | 17 | 1512048 | 301893 |
| Toyota Motor | 5 | 254694 | 7.7 | 16899.3 | 437575 | -12.3 | Akio Toyoda | Motor Vehicles and Parts | Motor Vehicles & Parts | 8 | Japan | Toyota, Japan | http://www.toyota-global.com | 23 | 364445 | 157210 |


In [37]:
f500_company.index

Index(['Walmart', 'State Grid', 'Sinopec Group', 'China National Petroleum',
       'Toyota Motor', 'Volkswagen', 'Royal Dutch Shell', 'Berkshire Hathaway',
       'Apple', 'Exxon Mobil',
       ...
       'National Grid', 'Dollar General', 'Telecom Italia',
       'Xiamen ITG Holding Group', 'Xinjiang Guanghui Industry Investment',
       'Teva Pharmaceutical Industries', 'New China Life Insurance',
       'Wm. Morrison Supermarkets', 'TUI', 'AutoNation'],
      dtype='object', name='company', length=500)
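The sketch below uses a small hypothetical DataFrame instead of `f500`. It shows that `set_index()` returns a new DataFrame and leaves the original unchanged, that the new labels work with `.loc`, and that `reset_index()` turns the index back into a column.

```python
import pandas as pd

df = pd.DataFrame({
    "company": ["Alpha", "Beta", "Gamma"],  # hypothetical companies
    "rank": [1, 2, 3],
    "revenues": [500, 350, 200],
})

# set_index returns a new DataFrame; df itself keeps its "company" column
by_company = df.set_index("company")
print(by_company.loc["Beta", "revenues"])  # 350

# reset_index moves the index back into a regular column
restored = by_company.reset_index()
print(restored.columns.tolist())  # ['company', 'rank', 'revenues']
```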

## Indexing and selecting data

Object selection has had a number of user-requested additions to support more explicit location-based indexing. pandas now supports three types of multi-axis indexing.

- `.iloc` is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. `.iloc` will raise IndexError if a requested indexer is out-of-bounds, except for slice indexers, which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics). Allowed inputs are:
    - An integer e.g. 5.
    - A list or array of integers [4, 3, 0].
    - A slice object with ints 1:7.
    - A boolean array (any NA values will be treated as False).
    - A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).
    - A tuple of row (and column) indices whose elements are one of the above inputs.
- `.loc` is primarily label based, but may also be used with a boolean array. `.loc` will raise KeyError when the items are not found. Allowed inputs are:
    - A single label, e.g. 5 or 'a' (Note that 5 is interpreted as a label of the index. This use is not an integer position along the index.).
    - A list or array of labels ['a', 'b', 'c'].
    - A slice object with labels 'a':'f' (Note that contrary to usual Python slices, both the start and the stop are included, when present in the index! See Slicing with labels and Endpoints are inclusive.)
    - A boolean array (any NA values will be treated as False).
    - A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).
    - A tuple of row (and column) indices whose elements are one of the above inputs.
- [Selection by callable](https://pandas.pydata.org/docs/user_guide/indexing.html#selection-by-callable)
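Several of the inputs listed above can be compared on one small Series with made-up values. This is a sketch rather than an exhaustive tour. It contrasts inclusive label slices with exclusive position slices, and it shows boolean and callable selection and the `KeyError` for a missing label.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# .loc label slice: both endpoints are included
label_slice = s.loc["b":"d"].tolist()   # [20, 30, 40]

# .iloc position slice: the stop position is excluded, as in Python/NumPy
pos_slice = s.iloc[1:3].tolist()        # [20, 30]

# Boolean selection: .loc accepts the boolean Series; .iloc takes a plain
# boolean array (passing the boolean Series itself to .iloc raises an error)
mask = s > 25
by_mask_loc = s.loc[mask].tolist()
by_mask_iloc = s.iloc[mask.to_numpy()].tolist()

# A callable receives the Series and returns a valid indexer
by_callable = s.loc[lambda x: x % 20 == 0].tolist()   # [20, 40]

# .loc raises KeyError for a label that is not in the index
try:
    s.loc["z"]
    key_error_raised = False
except KeyError:
    key_error_raised = True
```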

### `iloc`: select by integer position

<p>Using <code>iloc[]</code> is almost identical to indexing with NumPy, with integer positions starting at <code>0</code> like ndarrays and Python lists.</p>

<div>
<ul>
<li>With <code>iloc[]</code>, the ending slice <strong>is not</strong> included.</li>
<li>Positions are integers starting from <code>0</code>, regardless of the index labels.</li>
</ul>
<p>The table below summarizes how we can use <code>DataFrame.iloc[]</code> and <code>Series.iloc[]</code> to select by integer position:</p>
<table>
<thead>
<tr>
<th>Select by integer position</th>
<th>Explicit Syntax</th>
<th>Shorthand Convention</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single column from dataframe</td>
<td><code>df.iloc[:,3]</code></td>
<td></td>
</tr>
<tr>
<td>List of columns from dataframe</td>
<td><code>df.iloc[:,[3,5,6]]</code></td>
<td></td>
</tr>
<tr>
<td>Slice of columns from dataframe</td>
<td><code>df.iloc[:,3:7]</code></td>
<td></td>
</tr>
<tr>
<td>Single row from dataframe</td>
<td><code>df.iloc[20]</code></td>
<td></td>
</tr>
<tr>
<td>List of rows from dataframe</td>
<td><code>df.iloc[[0,3,8]]</code></td>
<td></td>
</tr>
<tr>
<td>Slice of rows from dataframe</td>
<td><code>df.iloc[3:5]</code></td>
<td><code>df[3:5]</code></td>
</tr>
<tr>
<td>Single item from series</td>
<td><code>s.iloc[8]</code></td>
<td><code>s[8]</code></td>
</tr>
<tr>
<td>List of items from series</td>
<td><code>s.iloc[[2,8,1]]</code></td>
<td><code>s[[2,8,1]]</code></td>
</tr>
<tr>
<td>Slice of items from series</td>
<td><code>s.iloc[5:10]</code></td>
<td><code>s[5:10]</code></td>
</tr>
</tbody>
</table>
</div>

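> The Series shorthands `s[8]` and `s[[2,8,1]]` in the table above treat integers as positions only when the index is not integer-based, and recent pandas 2.x releases deprecate that positional fallback (it emits a `FutureWarning`). Prefer `s.iloc[...]` for positional access.

Syntax: `df.iloc[row_index, column_index]`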
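The sketch below applies the `df.iloc[row_index, column_index]` pattern to a small hypothetical DataFrame instead of `f500_company`. It shows negative positions, list-based column selection, the tolerance for out-of-bounds slices, and the `IndexError` for an out-of-bounds single position.

```python
import pandas as pd

df = pd.DataFrame(
    {"rank": [1, 2, 3], "revenues": [500, 350, 200]},
    index=["Alpha", "Beta", "Gamma"],  # hypothetical labels
)

cell = df.iloc[1, 0]              # row position 1, column position 0 -> 2
last_row = df.iloc[-1].tolist()   # negative positions count from the end -> [3, 200]
one_col = df.iloc[:, [1]]         # a list of positions keeps a DataFrame -> shape (3, 1)

# Out-of-bounds slices are allowed and simply truncated
truncated = df.iloc[1:100]        # 2 rows

# An out-of-bounds single position raises IndexError
try:
    df.iloc[10]
    index_error_raised = False
except IndexError:
    index_error_raised = True
```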


In [40]:
f500_company.iloc[4]

rank                                                   5
revenues                                          254694
revenue_change                                       7.7
profits                                          16899.3
assets                                            437575
profit_change                                      -12.3
ceo                                          Akio Toyoda
industry                        Motor Vehicles and Parts
sector                            Motor Vehicles & Parts
previous_rank                                          8
country                                            Japan
hq_location                                Toyota, Japan
website                     http://www.toyota-global.com
years_on_global_500_list                              23
employees                                         364445
total_stockholder_equity                          157210
Name: Toyota Motor, dtype: object

In [41]:
f500_company.iloc[0, 0]

np.int64(1)

In [42]:
f500_company.iloc[:, 0]

company
Walmart                             1
State Grid                          2
Sinopec Group                       3
China National Petroleum            4
Toyota Motor                        5
                                 ... 
Teva Pharmaceutical Industries    496
New China Life Insurance          497
Wm. Morrison Supermarkets         498
TUI                               499
AutoNation                        500
Name: rank, Length: 500, dtype: int64

In [None]:
f500_company.iloc[1:5]

In [43]:
# shorthand for f500_company.iloc[1:5]
f500_company[1:5]

Unnamed: 0_level_0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
company,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
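
As a quick check of the shortcut above, here is a minimal sketch on a small stand-in dataframe (a few rows copied from the output above, not the full `f500_company`). It shows that an integer slice inside plain brackets selects rows by position, exactly like `iloc[]`, and that the end position is excluded.

```python
import pandas as pd

# Small stand-in for f500_company: a few rows copied from the output above
companies = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
    },
    index=pd.Index(
        ["Walmart", "State Grid", "Sinopec Group",
         "China National Petroleum", "Toyota Motor"],
        name="company",
    ),
)

# Integer slice inside [] is positional: rows at positions 1, 2, 3 and 4
shortcut = companies[1:5]
explicit = companies.iloc[1:5]

print(shortcut.equals(explicit))  # True
print(list(shortcut.index))       # ['State Grid', 'Sinopec Group', 'China National Petroleum', 'Toyota Motor']
```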


In [44]:
# Select the first three rows of the f500_company dataframe
f500_company[:3]

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523


In [45]:
# Select the first and seventh rows and the first five columns of the f500_company dataframe
f500_company.iloc[[0, 6], :5]

company,rank,revenues,revenue_change,profits,assets
Walmart,1,485873,0.8,13643.0,198825
Royal Dutch Shell,7,240033,-11.8,4575.0,411275


### `loc`: select by label

<p>Because our axes in pandas have labels, we can select data using those labels — unlike in NumPy, where we needed to know the exact index location. To do this, we can use the <a target="_blank" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html#pandas.DataFrame.loc"><code>DataFrame.loc[]</code> attribute</a>. The syntax for <code>DataFrame.loc[]</code> is:</p>

    df.loc[row_label, column_label]
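
A minimal sketch of this syntax, assuming a three-row stand-in built from values shown elsewhere in this notebook: one row label plus one column label returns a single value, and `:` in the row position means "all rows".

```python
import pandas as pd

# Stand-in rows copied from the f500 output in this notebook
companies = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "country": ["USA", "China", "China"],
    },
    index=pd.Index(["Walmart", "State Grid", "Sinopec Group"], name="company"),
)

# One row label and one column label -> a single value
print(companies.loc["State Grid", "country"])  # China

# ":" in the row position selects every row of the "rank" column
print(companies.loc[:, "rank"].tolist())       # [1, 2, 3]
```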

In [46]:
# Let's select a single column by specifying a single label:
f500_company.loc[:, "rank"]

company
Walmart                             1
State Grid                          2
Sinopec Group                       3
China National Petroleum            4
Toyota Motor                        5
                                 ... 
Teva Pharmaceutical Industries    496
New China Life Insurance          497
Wm. Morrison Supermarkets         498
TUI                               499
AutoNation                        500
Name: rank, Length: 500, dtype: int64

<p>Notice we used <code>:</code> to specify that we wish to select all rows. Also note that the resulting series keeps the same row labels (index) as the original dataframe.</p>
<p>We can also use the following shortcut to select a single column:</p>

In [47]:
f500_company["rank"]

company
Walmart                             1
State Grid                          2
Sinopec Group                       3
China National Petroleum            4
Toyota Motor                        5
                                 ... 
Teva Pharmaceutical Industries    496
New China Life Insurance          497
Wm. Morrison Supermarkets         498
TUI                               499
AutoNation                        500
Name: rank, Length: 500, dtype: int64

In [49]:
# Select the industry column. Assign the result to the variable name industries.
industries = f500_company["industry"]
type(industries)

pandas.core.series.Series

<img src="./images/df_series_s_updated.svg"> 


In [50]:
type(industries.to_numpy())

numpy.ndarray

In [None]:
industries.shape

In [51]:
industries.to_numpy().shape

(500,)

In [52]:
print(industries.to_numpy().dtype)

object
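
The diagram above shows a column being pulled out of a dataframe as a series. One rough way to picture a series is a one-dimensional NumPy array of values paired with an index of labels. The sketch below uses a three-row stand-in (not the real `industries` series) to pull those two parts apart.

```python
import numpy as np
import pandas as pd

# Stand-in for the industries series, built from rows shown in this notebook
industries = pd.Series(
    ["General Merchandisers", "Utilities", "Petroleum Refining"],
    index=pd.Index(["Walmart", "State Grid", "Sinopec Group"], name="company"),
    name="industry",
)

values = industries.to_numpy()  # the data, as a NumPy array
labels = industries.index       # the row labels, stored separately

print(type(values))             # <class 'numpy.ndarray'>
print(values.shape)             # (3,)
print(list(labels))             # ['Walmart', 'State Grid', 'Sinopec Group']
```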


Select multiple columns:

<img src="images/df_series_df_updated.svg">

In [54]:
f500_company.loc[:, ["country", "rank"]]

company,country,rank
Walmart,USA,1
State Grid,China,2
Sinopec Group,China,3
China National Petroleum,China,4
Toyota Motor,Japan,5
...,...,...
Teva Pharmaceutical Industries,Israel,496
New China Life Insurance,China,497
Wm. Morrison Supermarkets,Britain,498
TUI,Germany,499


In [55]:
f500_company[["country", "rank"]]

company,country,rank
Walmart,USA,1
State Grid,China,2
Sinopec Group,China,3
China National Petroleum,China,4
Toyota Motor,Japan,5
...,...,...
Teva Pharmaceutical Industries,Israel,496
New China Life Insurance,China,497
Wm. Morrison Supermarkets,Britain,498
TUI,Germany,499
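
A minimal sketch of two details worth knowing about list selection, using a small stand-in dataframe: the columns come back in the order of the list, and even a one-element list returns a dataframe, whereas a single label returns a series.

```python
import pandas as pd

# Stand-in rows copied from the f500 output in this notebook
companies = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "country": ["USA", "China", "China"],
        "revenues": [485873, 315199, 267518],
    },
    index=pd.Index(["Walmart", "State Grid", "Sinopec Group"], name="company"),
)

# A list of labels returns a DataFrame, with columns in the order of the list
subset = companies[["country", "rank"]]
print(list(subset.columns))                # ['country', 'rank']

# A single label returns a Series; a one-element list still returns a DataFrame
print(type(companies["rank"]).__name__)    # Series
print(type(companies[["rank"]]).__name__)  # DataFrame
```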


<p>Use <strong>a slice object with labels</strong> to select a contiguous range of columns:</p>

In [56]:
f500_company.loc[:, "rank":"profits"]

company,rank,revenues,revenue_change,profits
Walmart,1,485873,0.8,13643.0
State Grid,2,315199,-4.4,9571.3
Sinopec Group,3,267518,-9.1,1257.9
China National Petroleum,4,262573,-12.3,1867.5
Toyota Motor,5,254694,7.7,16899.3
...,...,...,...,...
Teva Pharmaceutical Industries,496,21903,11.5,329.0
New China Life Insurance,497,21796,-13.3,743.9
Wm. Morrison Supermarkets,498,21741,-11.3,406.4
TUI,499,21655,-5.5,1151.7


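<p>We again get a dataframe object, containing every column from the first label in the slice up to <strong>and including</strong> the last one. This differs from positional slicing with <code>iloc[]</code>, where the end position is excluded. Also note that there is no bracket shortcut for selecting column slices: inside plain brackets, a slice always selects rows.</p>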
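
To make the inclusive endpoint concrete, here is a minimal sketch on a stand-in dataframe with the first five columns of `f500_company` (three rows copied from the output above). It compares a label slice with the positional slice that covers the same starting columns.

```python
import pandas as pd

# Stand-in with the first five columns of f500_company, three rows only
companies = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "revenues": [485873, 315199, 267518],
        "revenue_change": [0.8, -4.4, -9.1],
        "profits": [13643.0, 9571.3, 1257.9],
        "assets": [198825, 489838, 310726],
    },
    index=pd.Index(["Walmart", "State Grid", "Sinopec Group"], name="company"),
)

# Label slice: the end label "profits" IS included
by_label = companies.loc[:, "rank":"profits"]
print(list(by_label.columns))     # ['rank', 'revenues', 'revenue_change', 'profits']

# Positional slice: the end position 3 is NOT included
by_position = companies.iloc[:, 0:3]
print(list(by_position.columns))  # ['rank', 'revenues', 'revenue_change']
```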

You can also access an index label of a Series, or a column of a DataFrame, directly as an attribute:

In [57]:
f500_company.revenues

company
Walmart                           485873
State Grid                        315199
Sinopec Group                     267518
China National Petroleum          262573
Toyota Motor                      254694
                                   ...  
Teva Pharmaceutical Industries     21903
New China Life Insurance           21796
Wm. Morrison Supermarkets          21741
TUI                                21655
AutoNation                         21609
Name: revenues, Length: 500, dtype: int64

<div class="admonition warning">
<p class="admonition-title">Warning</p>
<ul class="simple">
<li><p>You can use this access only if the index element is a valid Python identifier, e.g. <code class="docutils literal notranslate"><span class="pre">s.1</span></code> is not allowed.
See <a class="reference external" href="https://docs.python.org/3/reference/lexical_analysis.html#identifiers">here for an explanation of valid identifiers</a>.</p></li>
<li><p>The attribute will not be available if it conflicts with an existing method name, e.g. <code class="docutils literal notranslate"><span class="pre">s.min</span></code> is not allowed, but <code class="docutils literal notranslate"><span class="pre">s['min']</span></code> is possible.</p></li>
<li><p>Similarly, the attribute will not be available if it conflicts with any of the following list: <code class="docutils literal notranslate"><span class="pre">index</span></code>,
<code class="docutils literal notranslate"><span class="pre">major_axis</span></code>, <code class="docutils literal notranslate"><span class="pre">minor_axis</span></code>, <code class="docutils literal notranslate"><span class="pre">items</span></code>.</p></li>
<li><p>In any of these cases, standard indexing will still work, e.g. <code class="docutils literal notranslate"><span class="pre">s['1']</span></code>, <code class="docutils literal notranslate"><span class="pre">s['min']</span></code>, and <code class="docutils literal notranslate"><span class="pre">s['index']</span></code> will
access the corresponding element or column.</p></li>
</ul>
</div>
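
The `rank` column of `f500_company` runs straight into the method-name rule in the warning above: `DataFrame` already has a `rank()` method, so `f500_company.rank` returns that method rather than the column. A minimal sketch on a stand-in dataframe; the `"1"` column is a hypothetical extra added only to show the identifier rule.

```python
import pandas as pd

# Stand-in rows from the data above; the "1" column is a hypothetical extra
companies = pd.DataFrame(
    {"rank": [1, 2], "revenues": [485873, 315199], "1": [3, 4]},
    index=pd.Index(["Walmart", "State Grid"], name="company"),
)

# "revenues" is a valid identifier with no clash, so attribute access works
print(companies.revenues.equals(companies["revenues"]))  # True

# "rank" clashes with the DataFrame.rank method: the attribute is the method
print(callable(companies.rank))    # True
print(companies["rank"].tolist())  # [1, 2]

# "1" is not a valid identifier, so only bracket access works
print(companies["1"].tolist())     # [3, 4]
```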

<p>A summary of the techniques:</p>
<p></p><center>
<table>
<thead>
<tr>
<th>Select by Label</th>
<th>Explicit Syntax</th>
<th>Common Shorthand</th>
<th>Other Shorthand</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single column</td>
<td><code>df.loc[:,"col1"]</code></td>
<td bgcolor="#00FF00"><code>df["col1"]</code></td>
<td><code>df.col1</code></td>
</tr>
<tr>
<td>List of columns</td>
<td><code>df.loc[:,["col1", "col7"]]</code></td>
<td bgcolor="#00FF00"><code>df[["col1", "col7"]]</code></td>
<td></td>
</tr>
<tr>
<td>Slice of columns</td>
<td bgcolor="#00FF00"><code>df.loc[:,"col1":"col4"]</code></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</center><p></p>
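
The rows of the summary table can be checked against each other with a minimal sketch on a stand-in dataframe. The attribute form is shown with `revenues` rather than `rank`, because `rank` collides with the `DataFrame.rank` method.

```python
import pandas as pd

# Stand-in rows copied from the f500 output in this notebook
companies = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "revenues": [485873, 315199, 267518],
        "country": ["USA", "China", "China"],
    },
    index=pd.Index(["Walmart", "State Grid", "Sinopec Group"], name="company"),
)

# Single column: explicit loc, common shorthand and attribute shorthand agree
print(companies.loc[:, "revenues"].equals(companies["revenues"]))  # True
print(companies["revenues"].equals(companies.revenues))            # True

# List of columns: loc and the [] shorthand agree
print(companies.loc[:, ["country", "rank"]].equals(companies[["country", "rank"]]))  # True

# Slice of columns: only the explicit loc form selects columns
print(list(companies.loc[:, "rank":"revenues"].columns))  # ['rank', 'revenues']
```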

In [58]:
# Select the country column.
f500_company["country"]

company
Walmart                               USA
State Grid                          China
Sinopec Group                       China
China National Petroleum            China
Toyota Motor                        Japan
                                   ...   
Teva Pharmaceutical Industries     Israel
New China Life Insurance            China
Wm. Morrison Supermarkets         Britain
TUI                               Germany
AutoNation                            USA
Name: country, Length: 500, dtype: object

In [59]:
# In order, select the revenues and years_on_global_500_list columns.
f500_company[["revenues", "years_on_global_500_list"]]

company,revenues,years_on_global_500_list
Walmart,485873,23
State Grid,315199,17
Sinopec Group,267518,19
China National Petroleum,262573,17
Toyota Motor,254694,23
...,...,...
Teva Pharmaceutical Industries,21903,1
New China Life Insurance,21796,2
Wm. Morrison Supermarkets,21741,13
TUI,21655,23


In [60]:
# In order, select all columns from ceo up to and including sector.
f500_company.loc[:, "ceo":"sector"]

company,ceo,industry,sector
Walmart,C. Douglas McMillon,General Merchandisers,Retailing
State Grid,Kou Wei,Utilities,Energy
Sinopec Group,Wang Yupu,Petroleum Refining,Energy
China National Petroleum,Zhang Jianhua,Petroleum Refining,Energy
Toyota Motor,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts
...,...,...,...
Teva Pharmaceutical Industries,Yitzhak Peterburg,Pharmaceuticals,Health Care
New China Life Insurance,Wan Feng,"Insurance: Life, Health (stock)",Financials
Wm. Morrison Supermarkets,David T. Potts,Food and Drug Stores,Food & Drug Stores
TUI,Friedrich Joussen,Travel Services,Business Services


Now that we've learned how to select columns by label, let's learn how to select rows using the labels of the index axis.

We use the same syntax to select rows from a dataframe as we do for columns:

    df.loc[row_label, column_label]

In [61]:
single_row = f500_company.loc["Sinopec Group"]
single_row

rank                                             3
revenues                                    267518
revenue_change                                -9.1
profits                                     1257.9
assets                                      310726
profit_change                                -65.0
ceo                                      Wang Yupu
industry                        Petroleum Refining
sector                                      Energy
previous_rank                                    4
country                                      China
hq_location                         Beijing, China
website                     http://www.sinopec.com
years_on_global_500_list                        19
employees                                   713288
total_stockholder_equity                    106523
Name: Sinopec Group, dtype: object

In [62]:
type(single_row)

pandas.core.series.Series

In [63]:
print(single_row.dtype)

object


Note that the object returned is a series because it is one-dimensional. Because this series has to store integer, float, and string values, pandas uses the `object` dtype: none of the numeric dtypes can hold all of these values.
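
A minimal sketch of this rule, using a one-row stand-in with three of the Sinopec Group values shown above: a row that mixes numbers and strings comes back as `object`, while a row drawn only from integer and float columns is upcast to a common numeric dtype.

```python
import pandas as pd

# One-row stand-in with values copied from the Sinopec Group output above
companies = pd.DataFrame(
    {
        "rank": [3],
        "profits": [1257.9],
        "ceo": ["Wang Yupu"],
    },
    index=pd.Index(["Sinopec Group"], name="company"),
)

# int, float and string values in one row -> object dtype
mixed_row = companies.loc["Sinopec Group"]
print(mixed_row.dtype)    # object

# Keep only the int and float columns first, then take the row -> float64
numeric_row = companies[["rank", "profits"]].loc["Sinopec Group"]
print(numeric_row.dtype)  # float64
```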

In [64]:
# Select a list of rows
f500_company.loc[["Toyota Motor", "Walmart"]]

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798


In [65]:
# Select a slice of rows by label (shortcut without .loc)
f500_company["State Grid":"Toyota Motor"]

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [66]:
f500_company.loc["State Grid":"Toyota Motor"]

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


<table>
<thead>
<tr>
<th>Select by Label</th>
<th>Explicit Syntax</th>
<th>Shorthand Convention</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single row from dataframe</td>
<td bgcolor="#00FF00"><code>df.loc["row4"]</code></td>
<td></td>
</tr>
<tr>
<td>List of rows from dataframe</td>
<td bgcolor="#00FF00"><code>df.loc[["row1", "row8"]]</code></td>
<td></td>
</tr>
<tr>
<td>Slice of rows from dataframe</td>
<td bgcolor="#00FF00"><code>df.loc["row3":"row5"]</code></td>
<td><code>df["row3":"row5"]</code></td>
</tr>
</tbody>
</table>
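
A minimal sketch of the three row-selection forms in the table, on a five-row stand-in built from the output above: a single label gives a series, a list gives a dataframe in the order requested, and a label slice (with or without `loc`) includes both endpoints.

```python
import pandas as pd

# Stand-in rows copied from the f500 output in this notebook
companies = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4, 5],
        "country": ["USA", "China", "China", "China", "Japan"],
    },
    index=pd.Index(
        ["Walmart", "State Grid", "Sinopec Group",
         "China National Petroleum", "Toyota Motor"],
        name="company",
    ),
)

# Single row label -> Series
print(type(companies.loc["Sinopec Group"]).__name__)  # Series

# List of row labels -> DataFrame, rows in the order given
print(list(companies.loc[["Toyota Motor", "Walmart"]].index))  # ['Toyota Motor', 'Walmart']

# Label slice of rows: both endpoints included; the [] shortcut gives the same rows
explicit = companies.loc["State Grid":"China National Petroleum"]
shortcut = companies["State Grid":"China National Petroleum"]
print(explicit.equals(shortcut))  # True
print(len(explicit))              # 3
```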


In [67]:
# Create a new variable, big_movers, containing the rows with index labels Aviva, HP, JD.com,
# and BHP Billiton, in that order, and the rank and previous_rank columns, in that order.
big_movers = f500_company.loc[["Aviva", "HP", "JD.com", "BHP Billiton"], ["rank", "previous_rank"]]
big_movers

company,rank,previous_rank
Aviva,90,279
HP,194,48
JD.com,261,366
BHP Billiton,350,168


In [68]:
# Create a new variable, bottom_companies, containing all rows with index labels from National Grid
# to AutoNation, inclusive, and the rank, sector, and country columns.
bottom_companies = f500_company.loc["National Grid":"AutoNation", ["rank", "sector", "country"]]
bottom_companies

company,rank,sector,country
National Grid,491,Energy,Britain
Dollar General,492,Retailing,USA
Telecom Italia,493,Telecommunications,Italy
Xiamen ITG Holding Group,494,Wholesalers,China
Xinjiang Guanghui Industry Investment,495,Wholesalers,China
Teva Pharmaceutical Industries,496,Health Care,Israel
New China Life Insurance,497,Financials,China
Wm. Morrison Supermarkets,498,Food & Drug Stores,Britain
TUI,499,Business Services,Germany
AutoNation,500,Retailing,USA


**Selecting Items from a Series**

In [69]:
sectors = f500_company["sector"]
sectors

company
Walmart                                        Retailing
State Grid                                        Energy
Sinopec Group                                     Energy
China National Petroleum                          Energy
Toyota Motor                      Motor Vehicles & Parts
                                           ...          
Teva Pharmaceutical Industries               Health Care
New China Life Insurance                      Financials
Wm. Morrison Supermarkets             Food & Drug Stores
TUI                                    Business Services
AutoNation                                     Retailing
Name: sector, Length: 500, dtype: object

In [70]:
print(type(sectors))

<class 'pandas.core.series.Series'>


In [71]:
sectors["Sinopec Group"]

'Energy'

In [72]:
sectors.loc["Sinopec Group"]

'Energy'

In [73]:
sectors[["Toyota Motor", "Walmart", "TUI"]]

company
Toyota Motor    Motor Vehicles & Parts
Walmart                      Retailing
TUI                  Business Services
Name: sector, dtype: object

In [74]:
sectors["State Grid":"Toyota Motor"]

company
State Grid                                  Energy
Sinopec Group                               Energy
China National Petroleum                    Energy
Toyota Motor                Motor Vehicles & Parts
Name: sector, dtype: object

<div>

<p>We can use <a target="_blank" href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.loc.html"><code>Series.loc[]</code></a> to select items from a series using single labels, a list, or a slice object. We can also omit <code>loc[]</code> and use bracket shortcuts for all three:</p>
<table>
<thead>
<tr>
<th>Select by Label</th>
<th>Explicit Syntax</th>
<th>Shorthand Convention</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single item from series</td>
<td><code>s.loc["item8"]</code></td>
<td bgcolor="#00FF00"> <code>s["item8"]</code></td>
</tr>
<tr>
<td>List of items from series</td>
<td><code>s.loc[["item1","item7"]]</code></td>
<td bgcolor="#00FF00"><code>s[["item1","item7"]]</code></td>
</tr>
<tr>
<td>Slice of items from series</td>
<td><code>s.loc["item2":"item4"]</code></td>
<td bgcolor="#00FF00"><code>s["item2":"item4"]</code></td>
</tr>
</tbody>
</table>
</div>
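
A minimal sketch confirming the table row by row, on a four-item stand-in for the `sectors` series (values copied from the output above): each explicit `loc[]` form and its bracket shortcut return the same result.

```python
import pandas as pd

# Stand-in for the sectors series, values copied from the output above
sectors = pd.Series(
    ["Retailing", "Energy", "Energy", "Motor Vehicles & Parts"],
    index=pd.Index(
        ["Walmart", "State Grid", "Sinopec Group", "Toyota Motor"],
        name="company",
    ),
    name="sector",
)

# Single item, list of items, slice of items: explicit and shorthand agree
print(sectors.loc["Walmart"] == sectors["Walmart"])  # True
print(sectors.loc[["Toyota Motor", "Walmart"]].equals(sectors[["Toyota Motor", "Walmart"]]))  # True
print(sectors.loc["State Grid":"Toyota Motor"].equals(sectors["State Grid":"Toyota Motor"]))  # True
```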


Be careful when the index labels are integers: `loc[]` treats an integer as a label, while `iloc[]` always treats it as a position.

<img src="images/integer_labels_2.svg">

In [76]:
f500_rank = f500.set_index("rank")
f500_rank

rank,company,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
1,Walmart,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
2,State Grid,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
3,Sinopec Group,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
4,China National Petroleum,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
5,Toyota Motor,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
496,Teva Pharmaceutical Industries,21903,11.5,329.0,92890,-79.3,Yitzhak Peterburg,Pharmaceuticals,Health Care,0,Israel,"Petach Tikva, Israel",http://www.tevapharm.com,1,56960,33337
497,New China Life Insurance,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507
498,Wm. Morrison Supermarkets,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111
499,TUI,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006


In [77]:
f500_rank.iloc[1]

company                                 State Grid
revenues                                    315199
revenue_change                                -4.4
profits                                     9571.3
assets                                      489838
profit_change                                 -6.2
ceo                                        Kou Wei
industry                                 Utilities
sector                                      Energy
previous_rank                                    2
country                                      China
hq_location                         Beijing, China
website                     http://www.sgcc.com.cn
years_on_global_500_list                        17
employees                                   926067
total_stockholder_equity                    209456
Name: 2, dtype: object

In [78]:
f500_rank.loc[1]

company                                    Walmart
revenues                                    485873
revenue_change                                 0.8
profits                                    13643.0
assets                                      198825
profit_change                                 -7.2
ceo                            C. Douglas McMillon
industry                     General Merchandisers
sector                                   Retailing
previous_rank                                    1
country                                        USA
hq_location                        Bentonville, AR
website                     http://www.walmart.com
years_on_global_500_list                        23
employees                                  2300000
total_stockholder_equity                     77798
Name: 1, dtype: object
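
The same trap in miniature, as a sketch on a four-item stand-in indexed by rank (values copied from the output above). Because ranks start at 1 while positions start at 0, `loc[]` and `iloc[]` disagree on single items and on slices.

```python
import pandas as pd

# Integer labels that do NOT match positions: ranks start at 1
by_rank = pd.Series(
    ["Walmart", "State Grid", "Sinopec Group", "China National Petroleum"],
    index=pd.Index([1, 2, 3, 4], name="rank"),
    name="company",
)

print(by_rank.loc[1])   # Walmart (label 1)
print(by_rank.iloc[1])  # State Grid (position 1)

# Slices differ too: loc includes the end label, iloc excludes the end position
print(by_rank.loc[1:2].tolist())   # ['Walmart', 'State Grid']
print(by_rank.iloc[1:2].tolist())  # ['State Grid']
```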

