# Pandas library and basic table data manipulation

Before any complex analysis, you need to learn the basics of working with data. Data can take many forms: one-dimensional, two-dimensional, structured, unstructured, visual, audio, ... In the lessons on exploratory data analysis, we will work mostly with **tabular** data, such as you certainly know from your favorite (or least favorite) spreadsheet application. Usually, every **row** of such a table corresponds to some thing, an example of something, or an observation. The individual **columns** hold the properties, measured quantities, or characteristics of these things.
In the Python world, the **pandas** library is most commonly used to process tabular data. It allows you to read data from many formats (including XLS(x) workbooks), edit them in various ways, calculate columns very efficiently, directly examine some statistical indicators and last but not least, nicely visualize the results. This lesson will introduce you to the basic concepts and teach you how to access individual columns, rows and cells.
You can find more about the pandas library on its homepage: https://pandas.pydata.org/

## Import the `pandas` library

In [1]:
import pandas as pd # We will access pandas using the alias pd

💡 Although this command imports the `pandas` module (or library), it will not be available under its usual name, but under the **alias** `pd`. The name `pandas` itself will not be defined. In normal programming, we try to avoid aliases because they reduce the readability of the code for other programmers. Data analysis is different: a single, very widely used alias saves us a lot of typing. And programmers are lazy.

In [2]:
# pandas -> Raises NameError

## Load the data table

We will jump straight into `pandas` and show a typical example of data that we will process with this library.
To read data, pandas has a number of `read_*` functions that allow you to handle many different formats. CSV ("comma-separated values" - [wiki](https://en.wikipedia.org/wiki/CSV)) is relatively common: each record corresponds to one line, and the individual properties of the record are separated by commas (or another character).
To work with this notebook, first download the data file `pokemon.csv` from the repository into your working folder. The experimental data are generated from [Comprehensive Pokedex at Github](https://github.com/veekun/pokedex).

In [3]:
pokemon_table = pd.read_csv("pokemon.csv")

The data (whatever it is) is now read into memory, referenced by the `pokemon_table` variable. Let's see what is hidden in it.

In [4]:
pokemon_table

Unnamed: 0,id,name,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
0,1,bulbasaur,0.7,6.9,green,quadruped,False,Grass,Poison,45,49,49,45
1,2,ivysaur,1.0,13.0,green,quadruped,False,Grass,Poison,60,62,63,60
2,3,venusaur,2.0,100.0,green,quadruped,False,Grass,Poison,80,82,83,80
3,4,charmander,0.6,8.5,red,upright,False,Fire,,39,52,43,65
4,5,charmeleon,1.1,19.0,red,upright,False,Fire,,58,64,58,80
...,...,...,...,...,...,...,...,...,...,...,...,...,...
802,803,poipole,0.6,1.8,purple,upright,False,Poison,,67,73,67,73
803,804,naganadel,3.6,150.0,purple,wings,False,Poison,Dragon,73,73,73,121
804,805,stakataka,5.5,820.0,gray,quadruped,False,Rock,Steel,61,131,211,13
805,806,blacephalon,1.8,13.0,white,humanoid,False,Fire,Ghost,53,127,53,107
806,807,zeraora,1.5,44.5,yellow,humanoid,False,Electric,,88,112,75,143


If everything worked as it should, you should have a relatively nicely formatted table in front of you. The basic display in the notebook will show you the first five and last five rows (who would risk thousands of rows flooding the browser window?) of the table, along with information about the total number of rows and columns. In this case, the table contains a total of 13 properties (named `columns`) for 807 different Pokemons (numbered `rows`).
⚠️ **Warning:** In this simple case, the table was loaded correctly on the first try, without any specific parameters, all columns seem to contain usable values. This is (especially in the CSV format) actually quite some luck. There are usually problems with the input data - for example, they do not have named columns (or they have them described strangely), they use special record delimiters or decimal comma/dot confusion, many rows have missing values (or are misspelled), ... 

**Data cleaning** is something we will dedicate some time to in the future.
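
As a taste of the loading problems mentioned in the warning above, here is a minimal sketch of how `read_csv` can be told about an unusual delimiter, decimal commas, and missing column names. The file content and the column names are made up for illustration; `sep`, `decimal`, `header` and `names` are regular `read_csv` parameters.

```python
import io

import pandas as pd

# Hypothetical file content: semicolon-separated, decimal commas, no header row.
raw = "bulbasaur;0,7;6,9\nivysaur;1,0;13,0\n"

messy = pd.read_csv(
    io.StringIO(raw),                    # a path or any file-like object works here
    sep=";",                             # field delimiter other than a comma
    decimal=",",                         # decimal comma instead of a decimal point
    header=None,                         # the data has no header row
    names=["name", "height", "weight"],  # so we supply the column names ourselves
)
print(messy)
print(messy.dtypes)
```

Without `decimal=","`, the two numeric columns would most likely be read as text, which is exactly the kind of silent problem that data cleaning has to catch.
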
What exactly is the object stored in the `pokemon_table` variable? What class is it?

## Base classes in `pandas` - DataFrame, Series, Index

In [5]:
type(pokemon_table)

pandas.core.frame.DataFrame

The answer is `DataFrame`. This term (also used in other popular statistical languages, such as [R](https://www.r-project.org/)) can be replaced in everyday speech by the word "table", so we may occasionally use that word too. Now let's try to take a subset of our first DataFrame.
⚠️ **Warning:** You will soon notice that the `DataFrame` has very similar functions to a spreadsheet workbook, but you need to know where this parallel ends. Unlike Excel or LibreOffice Calc workbooks, the DataFrame contains "only" dry data, does not store any formatting, and does not offer an "editor". A nice visual representation is just a matter of interacting `pandas` with a Jupyter notebook, or you can write your own code for it.

In [6]:
heights = pokemon_table["height"]
heights

0      0.7
1      1.0
2      2.0
3      0.6
4      1.1
      ... 
802    0.6
803    3.6
804    5.5
805    1.8
806    1.5
Name: height, Length: 807, dtype: float64

💡 The DataFrame behaves similarly to a dictionary (`dict`), among other things - when you put a key in square brackets, you get a column named this way. In fact, square brackets allow you to select from tables based on various other criteria, but we'll get to that.
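
The dictionary analogy can be tried out on a tiny table. This is only a sketch; the two-row `df` below is made up for illustration and is not the Pokémon data.

```python
import pandas as pd

# Hypothetical miniature table, just for illustration.
df = pd.DataFrame({"name": ["abra", "zubat"], "height": [0.9, 0.8]})

print("height" in df)    # membership tests look at column names, as with dict keys
print("abra" in df)      # values stored inside the table are not keys
print(list(df.keys()))   # keys() returns the column labels
print(df["height"])      # a key in square brackets returns that column
```
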
Our autopsy goes on to find out what the `heights` variable is.

In [7]:
type(heights)

pandas.core.series.Series

### Series

The columns are of the `Series` type. This type looks like a list (`list`). We will check if it behaves this way:

In [8]:
heights[0] # First height? ✓

0.7

In [9]:
heights[-5:] # Last five heights? ✓

802    0.6
803    3.6
804    5.5
805    1.8
806    1.5
Name: height, dtype: float64

**Task**: Try to apply some other list operations that you already know to `heights`. Sometimes it works, sometimes it doesn't.

In [12]:
heights.count()

807

In [13]:
heights.min()

0.1

In [14]:
heights.max()

14.5

In [18]:
heights.unique()

array([ 0.7,  1. ,  2. ,  0.6,  1.1,  1.7,  0.5,  1.6,  0.3,  1.5,  1.2,
        3.5,  0.4,  0.8,  1.3,  0.9,  1.4,  0.2,  1.9,  1.8,  8.8,  2.2,
        6.5,  2.5,  2.1,  4. ,  2.3,  9.2,  5.2,  3.8, 14.5,  2.7,  6.2,
        4.5,  7. ,  2.4,  5.4,  4.2,  3.7,  3.2,  3.3,  0.1,  2.6,  2.8,
        2.9,  3. ,  5.8,  5. ,  3.9,  3.4,  5.5,  3.6])

Converting between lists and `Series` is not a problem either. The easiest way to create your own Series (notice that this happens outside any table) is to create an instance of this class with a `list` as an argument:

In [19]:
numbers = pd.Series([1, 2, 3])
numbers

0    1
1    2
2    3
dtype: int64

In [22]:
type(numbers)

pandas.core.series.Series

And vice versa:

In [26]:
type(numbers.tolist())  # Option 1 (preferred, faster)
# list(numbers)  # Option 2

list

So how does a `Series` differ from a `list`, and what are its advantages?

In particular, each `Series` has the following five basic properties:

<img src="static/series.svg" style="max-height: 20em;">

#### 1) values

In [27]:
heights[:50].values # For aesthetic reasons, we will shorten the column a bit

array([0.7, 1. , 2. , 0.6, 1.1, 1.7, 0.5, 1. , 1.6, 0.3, 0.7, 1.1, 0.3,
       0.6, 1. , 0.3, 1.1, 1.5, 0.3, 0.7, 0.3, 1.2, 2. , 3.5, 0.4, 0.8,
       0.6, 1. , 0.4, 0.8, 1.3, 0.5, 0.9, 1.4, 0.6, 1.3, 0.6, 1.1, 0.5,
       1. , 0.8, 1.6, 0.5, 0.8, 1.2, 0.3, 1. , 1. , 1.5, 0.2])

In [29]:
type(heights.values)

numpy.ndarray

💡 The values in a `Series` are stored in a special format based on the `ndarray` type from the `numpy` library.
We will not go into this now, but especially for numerical values it saves memory and speeds up mathematical operations
(for example, adding up all values in a Series is significantly faster than in a list).
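
Besides speed, the array-based storage also changes how operations are written. The following minimal sketch (with a handful of hand-typed heights) contrasts a list, which needs an explicit loop, with a Series, where arithmetic applies to all values at once.

```python
import pandas as pd

heights_list = [0.7, 1.0, 2.0, 0.6, 1.1]   # a plain Python list
heights_series = pd.Series(heights_list)   # the same values in a Series

# With a list, element-wise arithmetic needs an explicit loop or comprehension.
doubled_list = [h * 2 for h in heights_list]

# With a Series, the operation is applied to all values at once.
doubled_series = heights_series * 2

print(doubled_series.tolist() == doubled_list)
print(heights_series.sum())
```
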

#### 2) type of values

In [30]:
heights.dtype

dtype('float64')

💡 Unlike lists, all elements of a `Series` should be of the same type (if they are not, the nearest common supertype is selected). `pandas` has its own set of types, called **dtypes**, which partially mirrors the default data types in Python but (especially for numeric types) is closer to how the processor works with them. And don't look for inheritance among them (good news?). We'll introduce the most common types next time - along with the operations that can be done with columns.
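
The selection of a common type can be observed directly. A small sketch, with values chosen only for illustration:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
mixed_numbers = pd.Series([1, 2.5, 3])        # integers and a float
mixed_anything = pd.Series([1, "two", 3.0])   # numbers and a string

print(ints.dtype)            # an integer dtype
print(mixed_numbers.dtype)   # float64: the integers were converted to floats
print(mixed_anything.dtype)  # object: the generic fallback for arbitrary Python objects
```
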

#### 3) index

In [31]:
heights.index

RangeIndex(start=0, stop=807, step=1)

💡 You access the elements of the list in numerical order (0 - first element, 1 - second, ...).

From the dictionary you select according to the key.

Pandas introduces a generalized **index**, which can be numeric, string, but even built on date/time. We will talk later about different indices.

#### 4) name

In [32]:
heights.name

'height'

💡 A `Series` may or may not have a name. Note that this is a value stored inside the object itself; it has nothing to do with the name of the variable in which you store it (but when the `Series` is a column of a table, it is the key under which you access that column).
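
A minimal sketch of how the name travels with the object: `to_frame()` turns a Series into a one-column table, and `rename()` with a plain string returns a copy under another name. The data here is hand-typed for illustration.

```python
import pandas as pd

speed = pd.Series([90, 55], index=["abra", "zubat"], name="speed")

table = speed.to_frame()            # a one-column DataFrame
print(table.columns.tolist())       # the Series name became the column label

renamed = speed.rename("velocity")  # returns a new Series with another name
print(renamed.name, speed.name)     # the original keeps its name
```
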

#### 5) size

In [33]:
heights.size

807

💡 This property tells you how many elements there are in the `Series`. It is not magical; it behaves like `len` applied to a list (and, after all, `len` can also be used on a `Series`). For completeness, note that unlike `name` or `index`, this one is read-only.

**Task:** Find the values of the `name`, `index`, `dtype`, `values` and `size` attributes for the `numbers` object. Do you notice anything interesting?

Alternatively, look at the same for some of the other columns in `pokemon_table`.

In [36]:
type(numbers)

pandas.core.series.Series

In [38]:
print(numbers.name)  # `name` is an attribute of the Series, not a function, so `name(numbers)` raises NameError

None

When creating `Series` objects, these attributes (except `size` and, to a limited extent, `dtype`) can be explicitly specified:

In [47]:
age = pd.Series(
    [27, 65, 14],
    name="Age",
    index=["Karla", "Martina", "Tyna"],
    dtype=float,
)
age

Karla      27.0
Martina    65.0
Tyna       14.0
Name: Age, dtype: float64

**Task:** Create a Series object that will contain a list of colors, animals, numbers, or some other category of things you like.

In [48]:
colors = pd.Series(
    ["#F08080", "#FF0000", "#00FF00", "#0000FF"],
    name="color",
    index=["Light coral", "Red", "Green", "Blue"],
    dtype=str,
)
colors

Light coral    #F08080
Red            #FF0000
Green          #00FF00
Blue           #0000FF
Name: color, dtype: object

## Index

By default, columns and tables use an unnamed numeric index, which numbers the elements one after the other from zero upwards:

In [49]:
heights.index

RangeIndex(start=0, stop=807, step=1)

However, there are other types of indices:

In [50]:
age.index

Index(['Karla', 'Martina', 'Tyna'], dtype='object')

In [51]:
colors.index

Index(['Light coral', 'Red', 'Green', 'Blue'], dtype='object')

In [52]:
events = pd.Series(["Independence of Czechoslovakia", "End of World War II", "Velvet Revolution"],
                   index=pd.Index([1918, 1945, 1989]), name="year")  # Note: `name` here names the Series; an Index can have its own name too
events

1918    Independence of Czechoslovakia
1945               End of World War II
1989                 Velvet Revolution
Name: year, dtype: object

In [54]:
events.index  # A numeric index does not have to be sequential; lookups by label still work

Int64Index([1918, 1945, 1989], dtype='int64')

This index is numeric, but its values do not have to be sorted (here they happen to be) and they contain gaps.
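
With such an index, the label of an element and its position are two different things. The sketch below rebuilds the same three events and uses the `.loc` and `.iloc` indexers, which are covered in the Indexing section of this lesson.

```python
import pandas as pd

events = pd.Series(
    ["Independence of Czechoslovakia", "End of World War II", "Velvet Revolution"],
    index=[1918, 1945, 1989],
)

print(events.loc[1945])       # lookup by label: the year 1945
print(events.iloc[1])         # lookup by position: the second element
print(1950 in events.index)   # a year that falls into a gap is simply not a label
```
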

In [55]:
events_precise = pd.Series(
    ["Independence of Czechoslovakia", "End of World War II", "Velvet Revolution"],
    index = pd.DatetimeIndex(['1918-10-28', '1945-05-08', '1989-11-17'])
)
events_precise.index

DatetimeIndex(['1918-10-28', '1945-05-08', '1989-11-17'], dtype='datetime64[ns]', freq=None)

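A `DatetimeIndex` stores real timestamps, so the events can be ordered and selected by date. Note that it is not sorted automatically: the index keeps the order in which the dates were given.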
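
A minimal sketch of what a date-based index allows, assuming the same three events but deliberately entered out of order. Sorting is an explicit step with `sort_index()`; once sorted, partial date strings such as a bare year can serve as slice bounds.

```python
import pandas as pd

# Dates deliberately given out of order.
events_precise = pd.Series(
    ["Velvet Revolution", "Independence of Czechoslovakia", "End of World War II"],
    index=pd.DatetimeIndex(["1989-11-17", "1918-10-28", "1945-05-08"]),
)

print(events_precise.index.is_monotonic_increasing)  # the index keeps the given order

ordered = events_precise.sort_index()                # sorting is an explicit step
print(ordered.index.year.tolist())

# On a sorted DatetimeIndex, partial date strings can be used as slice bounds.
print(ordered.loc["1940":"1990"])
```
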

The index values can then be used in square brackets to access `Series` elements, similar to a dictionary. But the possibilities are much wider; we will show them shortly in the context of `DataFrame`.

In [56]:
age["Martina"]

65.0

In [57]:
pokemon_table.index

RangeIndex(start=0, stop=807, step=1)

In [58]:
pokemon_table.head()

Unnamed: 0,id,name,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
0,1,bulbasaur,0.7,6.9,green,quadruped,False,Grass,Poison,45,49,49,45
1,2,ivysaur,1.0,13.0,green,quadruped,False,Grass,Poison,60,62,63,60
2,3,venusaur,2.0,100.0,green,quadruped,False,Grass,Poison,80,82,83,80
3,4,charmander,0.6,8.5,red,upright,False,Fire,,39,52,43,65
4,5,charmeleon,1.1,19.0,red,upright,False,Fire,,58,64,58,80


**Task:** What index does the Pokémon table have?

## DataFrame

<img src="static/df.svg" style="max-height: 25em;"/>

Now that we are familiar with columns and indexes, we can return to the table, that is, the `DataFrame`.

Just like the `Series` is a container of values associated with an index, the `DataFrame` is a two-dimensional container that, in addition to values (`.values`), contains two indexes - one for rows and one for columns:

In [59]:
pokemon_table.columns # Column list

Index(['id', 'name', 'height', 'weight', 'color', 'shape', 'is baby', 'type 1',
       'type 2', 'hp', 'attack', 'defense', 'speed'],
      dtype='object')

In [60]:
pokemon_table.index #Index (list of rows)

RangeIndex(start=0, stop=807, step=1)

In [61]:
pokemon_table.values

array([[1, 'bulbasaur', 0.7, ..., 49, 49, 45],
       [2, 'ivysaur', 1.0, ..., 62, 63, 60],
       [3, 'venusaur', 2.0, ..., 82, 83, 80],
       ...,
       [805, 'stakataka', 5.5, ..., 131, 211, 13],
       [806, 'blacephalon', 1.8, ..., 127, 53, 107],
       [807, 'zeraora', 1.5, ..., 112, 75, 143]], dtype=object)

In [62]:
pokemon_table.shape # Size (number of rows x number of columns)

(807, 13)

There are several ways to construct a new table (in addition to retrieving data from an external file), the most common of which are probably from a list of dictionaries or a dictionary of lists. As with `Series`, some attributes can be supplied as additional arguments.

In [65]:
pd.DataFrame({
    "number": [1, 2, 3, 4],
    "letter": ["a", "b", "c", "NA"]
})

Unnamed: 0,number,letter
0,1,a
1,2,b
2,3,c
3,4,NA


In [66]:
pd.DataFrame(
    [
        {"name": "butter", "price": 42.90},
        {"name": "cheese", "price": 31.90},
        {"name": "ketchup", "price": 49.90},
    ],
    index=["item1", "item2", "item3"],
)  # List of dictionaries

Unnamed: 0,name,price
item1,butter,42.9
item2,cheese,31.9
item3,ketchup,49.9


**Task:** Create a table (`DataFrame`) that will contain "first name", "last name" and "age" columns for characters from one of your favorite novels or movies. You can, but you don't have to use an index on it.

## Indexing
Rows, columns, numerical order, keys, ranges ... pandas objects sometimes behave like lists and sometimes like dictionaries. So how do you get values out of them? There are many possibilities, so accessing parts of a table with simple square brackets `[]` is not enough.

For starters, let's adjust our pokemon table to have an interesting and easy-to-grasp index. We will use two methods of the `DataFrame` class (both return a new `DataFrame` instance derived from the instance they are called on):
* `set_index` returns a table in which one of the columns is used as an index
* `sort_index` returns a table that contains the same index but sorted
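
That both methods return a new table and leave the original untouched can be checked on a small example. The two-row `small` table is made up for illustration.

```python
import pandas as pd

# Hypothetical miniature table, just for illustration.
small = pd.DataFrame({"name": ["zubat", "abra"], "speed": [55, 90]})

by_name = small.set_index("name").sort_index()

print(small.index.tolist())        # the original table still has its numeric index
print(by_name.index.tolist())      # the new table is indexed by name, alphabetically
print("name" in by_name.columns)   # the column used as the index is no longer a column
```
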

In [69]:
pokemons = pokemon_table.set_index("name").sort_index()
pokemons

name,id,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
abomasnow,460,2.2,135.5,white,upright,False,Grass,Ice,90,92,75,60
abra,63,0.9,19.5,brown,upright,False,Psychic,,25,20,15,90
absol,359,1.2,47.0,white,quadruped,False,Dark,,65,130,60,75
accelgor,617,0.8,25.3,red,arms,False,Bug,,80,70,40,145
aegislash,681,1.7,53.0,brown,blob,False,Steel,Ghost,60,50,150,60
...,...,...,...,...,...,...,...,...,...,...,...,...
zoroark,571,1.6,81.1,gray,upright,False,Dark,,60,105,60,105
zorua,570,0.7,12.5,gray,quadruped,False,Dark,,40,65,40,65
zubat,41,0.8,7.5,purple,wings,False,Poison,Flying,40,45,35,55
zweilous,634,1.4,50.0,blue,quadruped,False,Dark,Dragon,72,85,70,58
zygarde,718,5.0,305.0,green,squiggle,False,Dragon,Ground,108,100,121,95


In [70]:
pokemons.index

Index(['abomasnow', 'abra', 'absol', 'accelgor', 'aegislash', 'aerodactyl',
       'aggron', 'aipom', 'alakazam', 'alomomola',
       ...
       'zapdos', 'zebstrika', 'zekrom', 'zeraora', 'zigzagoon', 'zoroark',
       'zorua', 'zubat', 'zweilous', 'zygarde'],
      dtype='object', name='name', length=807)

### `[]`

Let's start with square brackets:
* For a `Series`, they return the value that belongs to the given key in the index (we showed this above).
* For a `DataFrame`, they return the column with the given name.

In [71]:
pokemons["height"]

name
abomasnow    2.2
abra         0.9
absol        1.2
accelgor     0.8
aegislash    1.7
            ... 
zoroark      1.6
zorua        0.7
zubat        0.8
zweilous     1.4
zygarde      5.0
Name: height, Length: 807, dtype: float64

If you put a list of several column names in the square brackets of a `DataFrame`, you will get several columns (and therefore a `DataFrame`) back:

In [72]:
pokemons[["height", "weight"]]

name,height,weight
abomasnow,2.2,135.5
abra,0.9,19.5
absol,1.2,47.0
accelgor,0.8,25.3
aegislash,1.7,53.0
...,...,...
zoroark,1.6,81.1
zorua,0.7,12.5
zubat,0.8,7.5
zweilous,1.4,50.0
zygarde,5.0,305.0


**Task:** What happens when you do the same with the `Series`?

**Task:** Which of the last 5 pokemons (in alphabetical order) is the fastest?

In [76]:
#pokemons.set_index("speed").sort_index()[-5:]

pokemons["speed"][-5:] 

name
zoroark     105
zorua        65
zubat        55
zweilous     58
zygarde      95
Name: speed, dtype: int64
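
Reading the answer off the output works for five rows, but it can also be computed. The sketch below retypes the five speeds from the output above so that it runs on its own, and uses the `Series` methods `max()` and `idxmax()`.

```python
import pandas as pd

# The five speeds printed above, retyped by hand so the sketch runs on its own.
last_five = pd.Series(
    [105, 65, 55, 58, 95],
    index=["zoroark", "zorua", "zubat", "zweilous", "zygarde"],
    name="speed",
)

print(last_five.max())      # the highest speed
print(last_five.idxmax())   # the index label (name) at which that speed occurs
```
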

### `.loc[]`
When we want to get a row, we use the `loc` attribute, a so-called indexer. Be careful: this is not a method, so it takes square brackets rather than round ones. (There is a reason for this: it is the only way to elegantly use the colon shorthand for ranges.)

In [78]:
pokemons.loc["abra"]  # We get a Series with all of its properties

id              63
height         0.9
weight        19.5
color        brown
shape      upright
is baby      False
type 1     Psychic
type 2         NaN
hp              25
attack          20
defense         15
speed           90
Name: abra, dtype: object

We were interested in the row with the index "abra" and we got the expected result - a `Series`, where each value is indexed by the name of the column.
However, the situation becomes interesting when we start using ranges in the index (remember that dictionaries can't do that):

In [79]:
pokemons.loc["z":]

name,id,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
zangoose,335,1.3,40.3,white,upright,False,Normal,,73,115,60,90
zapdos,145,1.6,52.6,yellow,wings,False,Electric,Flying,90,90,85,100
zebstrika,523,1.6,79.5,black,quadruped,False,Electric,,75,100,63,116
zekrom,644,2.9,345.0,black,upright,False,Dragon,Electric,100,150,120,90
zeraora,807,1.5,44.5,yellow,humanoid,False,Electric,,88,112,75,143
zigzagoon,263,0.4,17.5,brown,quadruped,False,Normal,,38,30,41,60
zoroark,571,1.6,81.1,gray,upright,False,Dark,,60,105,60,105
zorua,570,0.7,12.5,gray,quadruped,False,Dark,,40,65,40,65
zubat,41,0.8,7.5,purple,wings,False,Poison,Flying,40,45,35,55
zweilous,634,1.4,50.0,blue,quadruped,False,Dark,Dragon,72,85,70,58
zygarde,718,5.0,305.0,green,squiggle,False,Dragon,Ground,108,100,121,95


pandas understood that we wanted all keys from "z" onwards, even though the bound "z" itself is not present in the index.
⚠️ However, this only works with a sorted index. If the index is not sorted, both bounds must be existing keys, and the rows between them are selected in stored order, including both endpoints. Slicing between two existing keys looks as follows:

In [80]:
pokemons.loc["zangoose":"zygarde"]

name,id,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
zangoose,335,1.3,40.3,white,upright,False,Normal,,73,115,60,90
zapdos,145,1.6,52.6,yellow,wings,False,Electric,Flying,90,90,85,100
zebstrika,523,1.6,79.5,black,quadruped,False,Electric,,75,100,63,116
zekrom,644,2.9,345.0,black,upright,False,Dragon,Electric,100,150,120,90
zeraora,807,1.5,44.5,yellow,humanoid,False,Electric,,88,112,75,143
zigzagoon,263,0.4,17.5,brown,quadruped,False,Normal,,38,30,41,60
zoroark,571,1.6,81.1,gray,upright,False,Dark,,60,105,60,105
zorua,570,0.7,12.5,gray,quadruped,False,Dark,,40,65,40,65
zubat,41,0.8,7.5,purple,wings,False,Poison,Flying,40,45,35,55
zweilous,634,1.4,50.0,blue,quadruped,False,Dark,Dragon,72,85,70,58
zygarde,718,5.0,305.0,green,squiggle,False,Dragon,Ground,108,100,121,95
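
The difference between a sorted and an unsorted index can be sketched on a three-element Series (made up for illustration). With an unsorted index, a bound that is not an existing label raises `KeyError`, while a slice between two existing labels still works.

```python
import pandas as pd

# Hypothetical miniature Series with an unsorted index.
unsorted = pd.Series([1, 2, 3], index=["zubat", "abra", "mew"])

try:
    unsorted.loc["b":]   # "b" is not a label and the index is not sorted
except KeyError as error:
    print("KeyError:", error)

# Existing labels still work: rows are taken in stored order, both ends included.
print(unsorted.loc["zubat":"abra"])

# After sorting, a bound that is not a label is fine again.
print(unsorted.sort_index().loc["b":])
```
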


If you want to get to a specific value, you use two keys in square brackets in the order *row*, *column*.

In [82]:
pokemons.loc["zorua", "color"]  # Row key first, then column key

'gray'

But pay attention to the number of brackets. If a list of keys appears inside the square brackets, all matching rows or values are selected in that dimension:

In [83]:
pokemons.loc[["zorua", "zubat"]]

name,id,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
zorua,570,0.7,12.5,gray,quadruped,False,Dark,,40,65,40,65
zubat,41,0.8,7.5,purple,wings,False,Poison,Flying,40,45,35,55


Of course, these approaches can be combined, so you can select ranges and lists in rows and columns independently:

In [84]:
pokemons.loc["j":"k", ["color", "attack"]]  # Both bounds are inclusive; no "k" names appear because every such name sorts after the bare "k"

name,color,attack
jangmo-o,gray,55
jellicent,white,60
jigglypuff,pink,45
jirachi,yellow,100
jolteon,yellow,65
joltik,yellow,47
jumpluff,blue,55
jynx,red,50
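
Why no name starting with "k" shows up can be sketched with a tiny Series (hypothetical labels, including a bare "k"). A label slice with `.loc` includes its end label, whereas a position slice with `.iloc` excludes its end position, as with lists.

```python
import pandas as pd

# Hypothetical miniature Series with a sorted index.
s = pd.Series([10, 20, 30, 40], index=["jynx", "k", "kabuto", "kadabra"])

print(s.loc["j":"k"])   # label slice: the end label "k" itself is included
print(s.iloc[0:1])      # position slice: the end position 1 is excluded

# "kabuto" sorts after "k", so it falls outside the label range "j":"k".
print("kabuto" > "k")
```
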


**Task:** What color are (all) Pokemon whose name starts with "z"?

In [87]:
pokemons.loc["z":, "color"].unique()

array(['white', 'yellow', 'black', 'brown', 'gray', 'purple', 'blue',
       'green'], dtype=object)

**Task:** How many pokemons have a name between the letters "d" and "f"?

**Task:** From the list of all pokemons, select 5 with a name you like (avoid the first and last five). What type are they? Which is the tallest and which is the heaviest?

### `.iloc[]`
If we want to forget for a moment what index is used for a table or column, we can access the elements directly through their order (row or column numbers). This is intuitive and corresponds to the indexing you are used to working with lists.

In [88]:
pokemons.iloc[44]

id                15
height           1.0
weight          29.5
color         yellow
shape      bug-wings
is baby        False
type 1           Bug
type 2        Poison
hp                65
attack            90
defense           40
speed             75
Name: beedrill, dtype: object

In [89]:
pokemons.iloc[-10:]

name,id,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
zapdos,145,1.6,52.6,yellow,wings,False,Electric,Flying,90,90,85,100
zebstrika,523,1.6,79.5,black,quadruped,False,Electric,,75,100,63,116
zekrom,644,2.9,345.0,black,upright,False,Dragon,Electric,100,150,120,90
zeraora,807,1.5,44.5,yellow,humanoid,False,Electric,,88,112,75,143
zigzagoon,263,0.4,17.5,brown,quadruped,False,Normal,,38,30,41,60
zoroark,571,1.6,81.1,gray,upright,False,Dark,,60,105,60,105
zorua,570,0.7,12.5,gray,quadruped,False,Dark,,40,65,40,65
zubat,41,0.8,7.5,purple,wings,False,Poison,Flying,40,45,35,55
zweilous,634,1.4,50.0,blue,quadruped,False,Dark,Dragon,72,85,70,58
zygarde,718,5.0,305.0,green,squiggle,False,Dragon,Ground,108,100,121,95


Here, too, it is possible to combine. So when someone asks you for the value at the "bottom left", you can try:

In [94]:
pokemons.iloc[-1, 0]

718

Finally, just for completeness, let's introduce three convenient methods that select the first, last, or random rows from a table (all three take an optional parameter specifying the number of rows wanted):

In [95]:
pokemons.head()  # The first few rows

name,id,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
abomasnow,460,2.2,135.5,white,upright,False,Grass,Ice,90,92,75,60
abra,63,0.9,19.5,brown,upright,False,Psychic,,25,20,15,90
absol,359,1.2,47.0,white,quadruped,False,Dark,,65,130,60,75
accelgor,617,0.8,25.3,red,arms,False,Bug,,80,70,40,145
aegislash,681,1.7,53.0,brown,blob,False,Steel,Ghost,60,50,150,60


In [96]:
pokemons.tail()  # The last few rows

name,id,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
zoroark,571,1.6,81.1,gray,upright,False,Dark,,60,105,60,105
zorua,570,0.7,12.5,gray,quadruped,False,Dark,,40,65,40,65
zubat,41,0.8,7.5,purple,wings,False,Poison,Flying,40,45,35,55
zweilous,634,1.4,50.0,blue,quadruped,False,Dark,Dragon,72,85,70,58
zygarde,718,5.0,305.0,green,squiggle,False,Dragon,Ground,108,100,121,95


**Task:** Can you write an equivalent of the `.tail()` function using indexing?

In [98]:
pokemons.iloc[-5:]

name,id,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
zoroark,571,1.6,81.1,gray,upright,False,Dark,,60,105,60,105
zorua,570,0.7,12.5,gray,quadruped,False,Dark,,40,65,40,65
zubat,41,0.8,7.5,purple,wings,False,Poison,Flying,40,45,35,55
zweilous,634,1.4,50.0,blue,quadruped,False,Dark,Dragon,72,85,70,58
zygarde,718,5.0,305.0,green,squiggle,False,Dragon,Ground,108,100,121,95


In [100]:
pokemons.sample(5)  # Five random rows

name,id,height,weight,color,shape,is baby,type 1,type 2,hp,attack,defense,speed
drapion,452,1.3,61.5,purple,armor,False,Poison,Dark,70,90,110,95
type-null,772,1.9,120.5,gray,quadruped,False,Normal,,95,95,95,59
scraggy,559,0.6,11.8,yellow,upright,False,Dark,Fighting,50,75,70,48
magmortar,467,1.6,68.0,red,upright,False,Fire,,75,95,67,83
machop,66,0.8,19.5,gray,upright,False,Fighting,,70,80,50,35


**Task (bonus):** Can you write an equivalent to the `sample()` function using indexing (and the `random` module)?
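
One possible approach to the bonus task, sketched on a small hand-typed table rather than the full data: draw distinct row positions with `random.sample` and pass them to `.iloc`. This is only one way to do it and makes no claim about how `sample()` works internally.

```python
import random

import pandas as pd

# Hypothetical miniature table, just for illustration.
small = pd.DataFrame(
    {"speed": [90, 75, 145, 55, 95]},
    index=["abra", "absol", "accelgor", "zubat", "zygarde"],
)

# Draw distinct row positions, then pick those rows by position.
positions = random.sample(range(len(small)), k=3)
picked = small.iloc[positions]
print(picked)
```

Because the positions are drawn without repetition, no row can appear twice in the result.
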

## Summary
In this lesson, we have shown three basic datatypes from `pandas` library:    
* `Series` as a one-dimensional object containing values of the same type
* `DataFrame` as a two-dimensional table composed of several `Series`
* `Index` as a generalized description of how to access `Series` or `DataFrame` elements

In addition, we learned to select columns, rows, and individual values from tables.
In the next lesson, we will show which data types (more precisely, `dtypes`) can be used in `pandas`, and we will start computing with columns and imitating the functions of spreadsheets.

## Exercises

The local zoo is considering investing in a new pavilion dedicated to Pokemons.
But the zoo's director, Mr. Felix, is not sure whether this investment would pay off or what it would mean for the zoo. Someone advised him to invite you to help (we, the authors of the course, swear we are not to blame). The director has compiled a list of questions he would like answered.

0. (reload the data from the `pokemon.csv` file)
1. For how many new animals would the zoo need food? The director would like one male and one female of each species (name).
2. The marketing department is going to create new leaflets about Pokemons for zoo visitors. All Pokemons would need information about their height, weight, length, color and type. Is all the information available?
3. The zoo considers it ideal for Pokemon to be delivered gradually, in groups of eight, as listed in the `pokemon.csv` table. Which Pokemon would be in the first, second, and last group?
4. The zoo's operations department also drew attention to the special conditions necessary for the 3 tallest Pokemons. In the `pokemon.csv` table they are at positions 207, 320 and 796, but no one remembers which Pokemons they were. What are their names?
5. The director loves Onix. He would like to build a special enclosure for him, intended for Rock Pokemons with a speed above 50. Would Onix like it there?
6. The director would also like to create a section for all Pokemons whose name starts with "i". But dark Pokemon cannot be with normal, fire with water or grass, and electric with psychic. Will it be possible to create this section? (tip: to display all Pokemons starting with "i" you will need to have the index sorted alphabetically)