# pandas
pandas is a fast, powerful, flexible and easy to use open source **data analysis and manipulation tool**, built on top of the Python programming language.

pandas makes working with “relational” or “labeled” data both easy and intuitive.

pandas blends the high-performance, array-computing ideas of NumPy with the flexible data manipulation capabilities of spreadsheets and relational (SQL) databases.

### What Is Data Science?
There’s a joke that a data scientist is someone who knows more
statistics than a computer scientist and more computer science than a
statistician.

### NumPy
NumPy, short for Numerical Python, has long been a cornerstone of numerical computing in Python. 
It provides the data structures, algorithms, and library glue needed for most scientific applications involving numerical data in Python. 
NumPy contains, among other things:
- A fast and efficient multidimensional array object `ndarray` (The "nd" in ndarray stands for "n-dimensional")
- Functions for performing element-wise computations with arrays or mathematical operations between arrays
- Tools for reading and writing array-based datasets to disk
- Linear algebra operations, Fourier transform, and random number generation
- A mature C API to enable Python extensions and native C or C++ code to access NumPy’s data structures and computational facilities

One of the reasons NumPy is so important for numerical computations in Python is that it is designed for **efficiency on large arrays of data**. There are a number of reasons for this:
- NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.
- NumPy operations perform complex computations on entire arrays without the need for Python `for` loops.
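
The two points above can be illustrated with a minimal sketch. The array size and the variable names are arbitrary choices made for this example, not part of NumPy itself:

```python
import numpy as np

# One million integers stored in a single contiguous block of memory.
values = np.arange(1_000_000)

# Vectorized: the multiplication runs in compiled code over the whole array.
doubled = values * 2

# Equivalent pure-Python version: one interpreter-level operation per element.
doubled_loop = [v * 2 for v in range(1_000_000)]

print(doubled[:5])      # first five results
print(doubled.dtype)    # a single dtype shared by every element
```

Both versions compute the same numbers; the vectorized form avoids the per-element overhead of the Python loop.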

While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or **heterogeneous data**. 


## Install
pandas can be installed via pip from PyPI.

`pip install pandas`

### Import Conventions
The Python community has adopted a number of naming conventions for commonly used modules:

In [166]:
import pandas as pd

# pandas Data Structures

## Index
pandas’s Index objects are responsible for holding the axis labels and other metadata (like the axis name or names). 
Index objects are immutable and thus can’t be modified by the user.
DataFrame is a tabular, column-oriented <font color="Coral">data structure with both row (index) and column labels</font>
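
The immutability of Index objects can be checked with a short sketch. The labels and the `ISO2` name are illustrative values only:

```python
import pandas as pd

labels = pd.Index(['TR', 'UK', 'US'], name='ISO2')

try:
    labels[0] = 'DE'      # in-place modification is not allowed
    modified = True
except TypeError as err:
    modified = False
    print(f"TypeError: {err}")

# Relabeling means building a new Index rather than editing the old one.
renamed = labels.map(str.lower)
print(renamed)
```

Because an Index cannot change underneath them, several Series or DataFrames can safely share the same Index object.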

## A. Series
A Series is a one-dimensional array-like object containing a sequence of values (of similar types to NumPy types) and an associated array of data labels, called its index.
(Like a column in the table)

Note that the visual display of a Series is just **plain text**, as opposed to the nicely styled table for DataFrames. You will also see the data type or dtype of the Series.

### Creating Series

1. Series can be instantiated from a list:
```python
    calling_codes = [90, 44, 1, 81, 86]
    pd.Series(calling_codes)

```
- Since we did not specify an index for the data, pandas creates a default one consisting of the integers 0 through N-1:
```
    0    90
    1    44
    2     1
    3    81
    4    86
    dtype: int64
```
- We can specify index in the constructor:
```python
    calling_codes = [90, 44, 1, 81, 86]
    labels = ['TR', 'UK', 'US', 'JP', 'CN']
    pd.Series(calling_codes, index=labels)

```
```
    TR    90
    UK    44
    US     1
    JP    81
    CN    86
    dtype: int64
```

- The passed index is a list of axis labels.

2. Series can be instantiated from dicts:
```python
    calling_codes = {'TR': 90, 'UK': 44, 'US': 1, 'JP': 81, 'CN': 86}
    pd.Series(calling_codes)
```
- Note that the dictionary already is the combination of labels and calling_codes lists above.
```
    TR    90
    UK    44
    US     1
    JP    81
    CN    86
    dtype: int64
```





In [167]:
# Series can be instantiated from a list:
# Since we did not specify an index for the data, pandas creates a default one consisting of the integers 0 through N-1

calling_codes = [90, 44, 1, 81, 86]
pd.Series(calling_codes)

0    90
1    44
2     1
3    81
4    86
dtype: int64

In [168]:
# Series can be instantiated from a list:
# We can specify an index:

calling_codes = [90, 44, 1, 81, 86]
labels = ['TR', 'UK', 'US', 'JP', 'CN']
pd.Series(calling_codes, index=labels)

TR    90
UK    44
US     1
JP    81
CN    86
dtype: int64

In [169]:
# Series can be instantiated from dicts:

calling_codes = {'TR': 90, 'UK': 44, 'US': 1, 'JP': 81, 'CN': 86}
pd.Series(calling_codes)

TR    90
UK    44
US     1
JP    81
CN    86
dtype: int64

## B. DataFrame
When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas will help you to explore, clean, and process your data. 

In pandas, a data table is called a **DataFrame**.

Let's say we have country data like this:
```
  ISO2  code            name     capital
  --------------------------------------
    TR    90          Turkey      Ankara
    UK    44  United Kingdom      London
    US     1   United States  Washington
    JP    81           Japan       Tokyo
    CN    86           China     Beijing
```


### Creating DataFrame

1. Using 2D List:
  - regular 2D lists can be visualized as a matrix: a list of rows, where each row is a list.
  - note that the data itself has no information about axis labels.
```python
[
    [90, 'Turkey', 'Ankara'], 
    [44, 'United Kingdom', 'London'], 
    [1, 'United States', 'Washington'], 
    [81, 'Japan', 'Tokyo'], 
    [86, 'China', 'Beijing']
]
```
2. Using List of dictionaries [{}, {}, {}]
  - a single row in a table is like a JSON representation of an object - a dictionary.
  - each dictionary carries the column names as its <font color="Coral">keys</font>.
  
  ```python
        country = {
          "ISO2"    : "TR",
          "call"    : 90,
          "name"    : "Turkey",
          "capital" : "Ankara"
        }
  ```
  
  3. Create DataFrame from a dictionary, Column Index
   - For the plural form, think about the same JSON, but this time each value is not a scalar: it's a list.
   ```python
        countries = {
          "ISO2"    : ["TR", "UK", "US", "JP", "CN"],
          "call"    : [90, 44, 1, 81, 86],
          "name"    : ["Turkey", "United Kingdom", "United States", "Japan", "China"],
          "capital" : ["Ankara", "London", "Washington DC", "Tokyo", "Beijing"]
        }
  ```

  - version 2: Instead of list as a value, we can use a dict with index values (PK):
  ```python
        countries = {
          'call'    : {'TR': 90, 'UK': 44, 'US': 1, 'JP': 81, 'CN': 86},
          'name'    : {'TR': 'Turkey', 'UK': 'United Kingdom', 'US': 'United States','JP': 'Japan', 'CN': 'China'},
          'capital' : {'TR': 'Ankara', 'UK': 'London', 'US': 'Washington DC','JP': 'Tokyo', 'CN': 'Beijing'}
        }
  ```

 4. Create DataFrame from a dictionary, Row Index
   - We can think of a row as a heterogeneous data structure, with a combination of columns identified by an index (PK, row label)
   ```python
        {
          'TR': {'call': 90, 'name': 'Turkey','capital': 'Ankara'},
          'UK': {'call': 44, 'name': 'United Kingdom','capital': 'London'},
          'US': {'call': 1, 'name': 'United States','capital': 'Washington DC'},
          'JP': {'call': 81, 'name': 'Japan','capital': 'Tokyo'},
          'CN': {'call': 999, 'name': 'China','capital': 'Beijing'}
        }
   ```




In [170]:
# Create DataFrame from 2D List:

""" 
    0               1           2
0  90          Turkey      Ankara
1  44  United Kingdom      London
2   1   United States  Washington
3  81           Japan       Tokyo
4  86           China     Beijing
"""

countries = [[90, 'Turkey', 'Ankara'], [44, 'United Kingdom', 'London'], [1, 'United States', 'Washington'], [81, 'Japan', 'Tokyo'], [86, 'China', 'Beijing']]
df = pd.DataFrame(countries)

# print df
print(df)

    0               1           2
0  90          Turkey      Ankara
1  44  United Kingdom      London
2   1   United States  Washington
3  81           Japan       Tokyo
4  86           China     Beijing


In [171]:
# Note that the previous df had default integer labels for both rows and columns.
# Passing a sequence of names assigns the column labels, in that order (columns=)
# Often it is desirable to identify each row with a label (index=)
country_codes = pd.DataFrame(
    countries,
    index=['TR', 'UK', 'US', 'JP', 'CN'],
    columns=['call', 'name', 'capital']
)

# print df
print(country_codes)

    call            name     capital
TR    90          Turkey      Ankara
UK    44  United Kingdom      London
US     1   United States  Washington
JP    81           Japan       Tokyo
CN    86           China     Beijing


In [172]:
# Create DataFrame from a List of dictionaries:
# Note that each dictionary carries the column names as its 'keys'.
country1 = { "ISO2": "TR", "call": 90, "name": "Turkey", "capital": "Ankara"}
country2 = { "ISO2": "UK", "call": 44, "name": "United Kingdom", "capital": "London"}
country3 = { "ISO2": "US", "call": 1, "name": "United States", "capital": "Washington DC"}
country4 = { "ISO2": "JP", "call": 81, "name": "Japan", "capital": "Tokyo"}
country5 = { "ISO2": "CN", "call": 86, "name": "China", "capital": "Beijing"}

countries = [country1, country2, country3, country4, country5]
df = pd.DataFrame(countries)

# print df
print(df)


  ISO2  call            name        capital
0   TR    90          Turkey         Ankara
1   UK    44  United Kingdom         London
2   US     1   United States  Washington DC
3   JP    81           Japan          Tokyo
4   CN    86           China        Beijing


In [173]:
# Create DataFrame from a dictionary, Column Index
# For the plural form, think about the same JSON, but this time each value is not a scalar: it's a list.
# Note that this dictionary has information about columns, but no information about index (no row labels): default integer index from 0 to n-1
# Note that PK column (ISO2) is just a regular column, not specified to be an index.
countries = {
     "ISO2"    : ["TR", "UK", "US", "JP", "CN"],
     "call"   : [90, 44, 1, 81, 86],
     "name"    : ["Turkey", "United Kingdom", "United States", "Japan", "China"],
     "capital" : ["Ankara", "London", "Washington DC", "Tokyo", "Beijing"]
}

df = pd.DataFrame(countries)

# print df
print(df)

  ISO2  call            name        capital
0   TR    90          Turkey         Ankara
1   UK    44  United Kingdom         London
2   US     1   United States  Washington DC
3   JP    81           Japan          Tokyo
4   CN    86           China        Beijing


In [174]:
# Create DataFrame from a dictionary of dictionaries, Column Index (orient="columns")
# Each outer key is a column name, and each inner dict maps a row label (PK) to that column's value.
# Note that we have both columns and index: columns are the outer keys, and index values are the inner keys (row labels).

countries = {
    'call': {'TR': 90, 'UK': 44, 'US': 1, 'JP': 81, 'CN': 86},
    'name': {'TR': 'Turkey', 'UK': 'United Kingdom', 'US': 'United States','JP': 'Japan', 'CN': 'China'},
    'capital': {'TR': 'Ankara', 'UK': 'London', 'US': 'Washington DC','JP': 'Tokyo', 'CN': 'Beijing'},
}

# note .from_dict() constructor
df = pd.DataFrame.from_dict(countries, orient="columns")

# print df
print(df)

    call            name        capital
TR    90          Turkey         Ankara
UK    44  United Kingdom         London
US     1   United States  Washington DC
JP    81           Japan          Tokyo
CN    86           China        Beijing


In [175]:
# Create DataFrame from a dictionary, Row Index (orient="index")
# We can think of a row as a heterogeneous data structure, with a combination of columns identified by an index (PK, row label)
# Note that we have both columns and index - index are keys, and column values are the inner keys.

countries = {
    'TR': {'call': 90, 'name': 'Turkey','capital': 'Ankara'},
    'UK': {'call': 44, 'name': 'United Kingdom','capital': 'London'},
    'US': {'call': 1, 'name': 'United States','capital': 'Washington DC'},
    'JP': {'call': 81, 'name': 'Japan','capital': 'Tokyo'},
    'CN': {'call': 999, 'name': 'China','capital': 'Beijing'}
}

# note .from_dict() constructor
df = pd.DataFrame.from_dict(countries, orient="index")

# print df
print(df)

    call            name        capital
TR    90          Turkey         Ankara
UK    44  United Kingdom         London
US     1   United States  Washington DC
JP    81           Japan          Tokyo
CN   999           China        Beijing


In [176]:
# Example: Create DataFrame from a dictionary (plural form)

lastnames = ['Booker', 'Grey', 'Johnson', 'Jenkins', 'Smith']
emails = ['bo@example.com', 'gr@example.com', 'jo@example.com', 'je@example.com', 'sm@example.com']
usernames = ['booker12', 'grey07', 'johnson81', 'jenkins46', 'smith79']

# A dictionary where keys are the "column names" and values are the lists:
users_dict = {'LastName': lastnames, 'Email': emails, 'Username': usernames}

# Create a DataFrame from a Python dict:
df_users = pd.DataFrame(users_dict)

print(df_users)


  LastName           Email   Username
0   Booker  bo@example.com   booker12
1     Grey  gr@example.com     grey07
2  Johnson  jo@example.com  johnson81
3  Jenkins  je@example.com  jenkins46
4    Smith  sm@example.com    smith79


In [177]:
# Example: Create DataFrame from a dictionary (plural form)

scores_gryffindor = {
    'Name': ['Ron', 'Harry', 'Hermione'],
    'Math': [65, 60, 69],
    'Sci': [65, 60, 69],
    'Hist': [65, 60, 69],
    'Econ': [65, 60, 69]
}

df_scores_gryffindor = pd.DataFrame(scores_gryffindor)
print(df_scores_gryffindor)


       Name  Math  Sci  Hist  Econ
0       Ron    65   65    65    65
1     Harry    60   60    60    60
2  Hermione    69   69    69    69


In [178]:
# Example: Create DataFrame from a dictionary, Row Index (orient="index")

scores_gryffindor_rowindexaskey = {
    'Harry': {
        'Math': 60,
        'Sci': 70,
        'Hist': 80,
        'Econ': 90
    },
    'Ron': {
        'Math': 65,
        'Sci': 75,
        'Hist': 85,
        'Econ': 95
    },
    'Hermione': {
        'Math': 69,
        'Sci': 79,
        'Hist': 89,
        'Econ': 99
    }
}

df = pd.DataFrame.from_dict(scores_gryffindor_rowindexaskey, orient='index')
print(df)


          Math  Sci  Hist  Econ
Harry       60   70    80    90
Ron         65   75    85    95
Hermione    69   79    89    99


## Importing Data
pandas features a number of functions for reading tabular data as a DataFrame object (CSV, Excel, SQL, JSON, Parquet, …). Each data source is handled by a function with a matching prefix:
- import data: `read_*`
- store data : `to_*` 

```python
    df = pd.read_csv("data/users.csv")
```

- We can specify the column which contains row labels (index):
```python
    df = pd.read_csv("data/brics.csv", index_col=0)
```
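
The `read_*` / `to_*` pairing can be tried without any file on disk. The sketch below assumes a small, made-up CSV held in a string (standing in for a file such as `data/brics.csv`) and reads it through `io.StringIO`:

```python
import io
import pandas as pd

# Hypothetical CSV content used in place of a real file.
csv_text = """code,country,population
BR,Brazil,200.4
RU,Russia,143.5
"""

# read_*: parse the text and use the first column as the row labels.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df)

# to_*: write the DataFrame back out; with no path, to_csv returns a string.
round_trip = df.to_csv()
print(round_trip)
```

With `index_col=0`, the `code` column becomes the index instead of a regular column, so rows can later be selected by label.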


### Working with missing data 
Handling missing values is an important and frequently nuanced part of the file parsing process. Missing data is usually either not present (an empty field) or marked by some sentinel value. By default, pandas treats a set of commonly occurring sentinels, such as `NA` and `NULL`, as missing.

Use the `pd.isnull()` and `pd.notnull()` functions to detect missing data.
Series also has these as instance methods, e.g. `seriesname.isnull()`.
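
As a minimal sketch of the default behavior, the made-up CSV below contains an empty field, an `NA` and a `NULL`. The column names and values are assumptions chosen for this example:

```python
import io
import pandas as pd

# Hypothetical data: row 2 has an empty name, rows 3 and 4 use sentinels,
# and the last row also has an empty score.
csv_text = "name,score\nAda,90\n,75\nNA,60\nNULL,\n"
df = pd.read_csv(io.StringIO(csv_text))

missing_names = df['name'].isnull()     # Series method
present_scores = pd.notnull(df['score'])  # top-level function

print(missing_names.tolist())
print(int(present_scores.sum()))
```

The empty field and both sentinels are all read as `NaN`, so a single `isnull()` check catches them.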


In [179]:
df = pd.read_csv("data/users.csv")
print(df)

       Username                   Email    Id First name Last name Active
0      blu3FisH      rachel@example.com  9012     Rachel    Booker    Yes
1     GreenFr0g   laura@yourcompany.com  2070      Laura      Grey    Yes
2      blackd0g   craig@yourcompany.com  4081      Craig   Johnson     No
3     jenkins46        mary@example.com  9346       Mary   Jenkins    Yes
4       smith79       jamie@example.com  5079      Jamie     Smith    Yes
5        john00               js@co.com   303       John     Smith     No
6   BlackTurkey    jhalprin@example.com   304        Jim   Halprin    Yes
7      BlueHawk      tjones@example.com   305     Teresa     Jones    Yes
8     GreenTree    tomjones@example.com   306      Tommy     Jones    Yes
9    OrangeFish  greggjones@example.com   307      Gregg     Jones    Yes
10      RedBoat   dthompson@example.com   308     Daniel  Thompson    Yes


In [180]:
# We can specify the column which contains row labels (index):
brics = pd.read_csv("data/brics.csv", index_col=0)
print(brics)

           country    capitol    area  population
code                                             
BR          Brazil   Brasilia   8.516      200.40
RU          Russia     Moscow  17.100      143.50
IN           India  New Delhi   3.286     1252.00
CH           China    Beijing   9.597     1357.00
SA    South Africa   Pretoria   1.221       52.98


In [181]:
# Missing values
# str: missing(NaN), space(empty), "" (NaN)
# numerics: missing(NaN), space(empty), negative
df = pd.read_csv("data/missing_values.csv")
# print(df)

# boolean array representing isnull of a column:
# null_lastnames = pd.isnull(df['Lastname'])   # top-level function
null_lastnames = df['Lastname'].isnull()    # Series method

df[null_lastnames]   # filter by column indexing

Unnamed: 0,Desc,Lastname,Quiz,Midterm,Final,Grade
0,Empty string,,70,70,70,B
1,Missing,,77,77,77,B
2,Sentinel NA,,77,77,77,B


### head(), tail() and info()
To view a small sample of a Series or DataFrame object, use the head() and tail() methods. The default number of elements to display is five, but you may pass a custom number.
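
A minimal sketch of the defaults, using a hypothetical ten-row frame so that the returned rows are easy to recognize:

```python
import pandas as pd

# Hypothetical ten-row frame, used only to show which rows come back.
df = pd.DataFrame({'n': range(10)})

first_rows = df.head()     # no argument: the first five rows
last_rows = df.tail(3)     # the last three rows

print(first_rows)
print(last_rows)
```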

In [182]:
df.head()

Unnamed: 0,Desc,Lastname,Quiz,Midterm,Final,Grade
0,Empty string,,70.0,70,70,B
1,Missing,,77.0,77,77,B
2,Sentinel NA,,77.0,77,77,B
3,space char,,88.0,88,88,A
4,Missing,Alfalfa,,80,90,A


In [183]:
df.tail(3)

Unnamed: 0,Desc,Lastname,Quiz,Midterm,Final,Grade
7,zero,Mike,0.0,50,50,D
8,negative,Alfred,-60.0,60,80,60
9,sentinel NaN,John,,50,50,


In [184]:
brics.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, BR to SA
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   country     5 non-null      object 
 1   capitol     5 non-null      object 
 2   area        5 non-null      float64
 3   population  5 non-null      float64
dtypes: float64(2), object(2)
memory usage: 200.0+ bytes


### Attributes and underlying data



`shape`: gives the axis dimensions of the object, consistent with ndarray:
(rows, columns)
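
As a small sketch of how `shape` relates to the underlying array, the frame below reuses a few of the score values from earlier examples. The variable names are illustrative only:

```python
import pandas as pd

scores = pd.DataFrame(
    {'Math': [60, 65, 69], 'Sci': [70, 75, 79]},
    index=['Harry', 'Ron', 'Hermione'],
)

n_rows, n_cols = scores.shape      # tuple unpacks as (rows, columns)
print(scores.shape)
print(list(scores.columns))        # column labels
print(scores.values.shape)         # same dimensions as the NumPy array
```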

In [185]:
brics.shape

(5, 4)

In [186]:
brics.columns

Index(['country', 'capitol', 'area', 'population'], dtype='object')

## Summary statistics
Basic statistics (mean, median, min, max, counts…) are easily calculable.

describe() shows a quick statistic summary of your data
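
A minimal sketch on a four-element Series shows what the summary contains. The result of `describe()` is itself a Series, so individual statistics can be read by label:

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3])

summary = s.describe()
print(summary)

# Individual statistics are available by label.
print(summary['mean'])
print(summary['50%'])    # the median
```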

In [187]:
city_pop = pd.read_csv("data/population_tr.csv", sep=';')
city_pop.describe()

Unnamed: 0,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
count,82.0,82.0,82.0,82.0,82.0,82.0,82.0,82.0,82.0,82.0,...,82.0,82.0,82.0,82.0,82.0,82.0,82.0,82.0,82.0,82.0
mean,1578768.0,1600077.0,1619557.0,1638713.0,1658786.0,1679525.0,1700731.0,1721616.0,1744320.0,1769788.0,...,1844570.0,1869948.0,1895022.0,1920513.0,1946704.0,1970988.0,2000095.0,2028171.0,2039375.0,2065373.0
std,7182483.0,7280900.0,7371050.0,7459777.0,7552731.0,7648774.0,7746996.0,7843821.0,7946958.0,8064122.0,...,8413460.0,8531728.0,8647138.0,8765726.0,8884413.0,8996369.0,9125385.0,9258787.0,9307607.0,9429832.0
min,75221.0,75517.0,75709.0,75868.0,76050.0,76246.0,76444.0,76609.0,75675.0,74710.0,...,75797.0,75620.0,80607.0,78550.0,82193.0,80417.0,82274.0,84660.0,81910.0,83645.0
25%,271939.5,274563.8,276847.2,278026.0,279218.8,279768.8,280003.0,280102.0,279918.5,281631.8,...,282107.5,283987.0,285154.2,287849.0,290063.2,288831.8,291243.2,289810.0,289932.8,295251.5
50%,483484.5,487549.5,490201.5,492244.0,490303.5,488386.5,486403.5,484127.5,484911.0,492681.5,...,511833.0,517204.0,519505.0,519260.5,525019.0,529419.5,538070.0,537479.0,539655.0,549800.5
75%,864218.2,875172.8,885107.8,894849.0,905071.8,915641.8,926446.8,937059.5,953584.0,965956.8,...,992545.0,1006584.0,1026159.0,1038490.0,1052617.0,1065313.0,1080791.0,1097082.0,1109579.0,1128873.0
max,64729500.0,65603160.0,66401850.0,67187250.0,68010220.0,68860540.0,69729970.0,70586260.0,71517100.0,72561310.0,...,75627380.0,76667860.0,77695900.0,78741050.0,79814870.0,80810520.0,82003880.0,83155000.0,83614360.0,84680270.0


In [188]:
series1 = pd.Series([4, 7, -5, 3])


### Selecting Columns
The Python and NumPy indexing operators [] and attribute operator . provide quick and easy access to pandas data structures across a wide range of use cases.

A column in a DataFrame can be retrieved as a Series either by dict-like notation or by attribute:

- Selecting a single column as a Series 
<br>To select a single column of data, simply put the name of the column in-between the brackets.
```
brics['capitol']
brics.capitol
```
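
The two notations return the same Series, but attribute access only works for some column names. The sketch below uses a small hand-built frame; the `land area` column is a hypothetical name added to show the limitation:

```python
import pandas as pd

brics = pd.DataFrame(
    {'country': ['Brazil', 'Russia'], 'capitol': ['Brasilia', 'Moscow']},
    index=['BR', 'RU'],
)

by_key = brics['capitol']     # dict-like notation
by_attr = brics.capitol       # attribute notation
print(by_key.equals(by_attr))

# A name containing a space is not a valid Python identifier,
# so only the bracket notation can reach this column.
brics['land area'] = [8.516, 17.1]
print(brics['land area'])
```

Bracket notation is the safer default, since it also works for names that collide with existing DataFrame attributes or methods.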

In [190]:
brics['capitol']    # Series

code
BR     Brasilia
RU       Moscow
IN    New Delhi
CH      Beijing
SA     Pretoria
Name: capitol, dtype: object

You can get the array representation and index object of the Series via its `values` and `index` attributes, respectively:

In [191]:
type(brics['capitol'])  # pandas.core.series.Series


pandas.core.series.Series

In [192]:
brics['capitol'].index      # Index(['BR', 'RU', 'IN', 'CH', 'SA'], dtype='object', name='code')


Index(['BR', 'RU', 'IN', 'CH', 'SA'], dtype='object', name='code')

In [None]:
brics['country'].values     # array(['Brazil', 'Russia', 'India', 'China', 'South Africa'], dtype=object)

array(['Brazil', 'Russia', 'India', 'China', 'South Africa'], dtype=object)

Selecting multiple columns as a DataFrame: `brics[['country', 'capitol']]`
<br>Pass a list of column names inside the indexing brackets: `[[...]]`

In [194]:
brics[['country', 'capitol']]

           country    capitol
code                         
BR          Brazil   Brasilia
RU          Russia     Moscow
IN           India  New Delhi
CH           China    Beijing
SA    South Africa   Pretoria


In [195]:
type(brics[['country', 'capitol']])  # pandas.core.frame.DataFrame


pandas.core.frame.DataFrame

<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-rbsA2VBKQhggwzxH7pPCaAqO46MgnOM80zW1RWuH61DGLwZJEdK2Kadq2F9CUG65" crossorigin="anonymous">

<style>
    code{
      color: red;
    }
    
    .mark{
        color: #e83e8c;
        font-weight: 400;
        font-size: 87.5%;
        line-height: 1.65;
        word-wrap: break-word;
        box-sizing: border-box;
        font-family: "SFMono-Regular",Menlo,Consolas,Monaco,Liberation Mono,Lucida Console,monospace;

        padding: 0.1rem 0.25rem;
        background-color: #e1e1e1;
        border: 1px solid #f5f5f5;
        border-radius: 0.25rem;
    }
</style>

### Selecting Row(s)
The `.loc` indexer can select subsets of rows or columns. 
Most importantly, it only selects data by the **LABEL** of the rows and columns.

```python
    df.loc[row(s), column(s)]
```

The primary function of indexing with `[]` (a.k.a. `__getitem__` for those familiar with implementing class behavior in Python) is selecting out lower-dimensional slices. The following table shows return type values when indexing pandas objects with []:

<table class="table">
  <thead>
    <tr>
      <th scope="col">Object Type</th>
      <th scope="col">Selection</th>
      <th scope="col">Return Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <!--<th scope="row">Series</th>-->
      <td>Series</td>
      <td><span class="mark">series[label]</span></td>
      <td>scalar value</td>
    </tr>
    <tr>
      <!--<th scope="row">Series</th>-->
      <td>DataFrame</td>
      <td><span class="mark">frame[colname]</span></td>
      <td><span class="mark">Series</span> corresponding to colname</td>
    </tr>
  </tbody>
</table>
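
The return types in the table can be checked directly. This is a minimal sketch with made-up calling-code data:

```python
import pandas as pd

s = pd.Series({'TR': 90, 'UK': 44})
df = pd.DataFrame({'call': s})

item = s['TR']        # series[label] -> scalar value
col = df['call']      # frame[colname] -> Series

print(type(item))
print(type(col))
```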

Allowed inputs are:
1. A single label              : 'TR'
2. A list or array of labels   : ['TR', 'UK']
3. A slice object with labels  : 'TR' : 'UK'
4. A boolean array             : [True, False]
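
The four kinds of input can be tried on a small frame. The data below is a three-row assumption made for this sketch:

```python
import pandas as pd

codes = pd.DataFrame(
    {'call': [90, 44, 1], 'name': ['Turkey', 'United Kingdom', 'United States']},
    index=['TR', 'UK', 'US'],
)

single = codes.loc['TR']                    # 1. single label -> Series
several = codes.loc[['TR', 'US']]           # 2. list of labels -> DataFrame
sliced = codes.loc['TR':'UK']               # 3. label slice -> DataFrame
masked = codes.loc[[True, False, True]]     # 4. boolean array -> DataFrame

print(single)
print(several)
print(sliced)
print(masked)
```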

#### Slicing
- Slicing with labels behaves differently than normal Python slicing in that the end‐point is inclusive.
- Note that a slice is written without inner brackets: `.loc['TR':'UK', :]`, not `.loc[['TR':'UK']]`.
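
The difference between label slicing and ordinary positional slicing can be seen side by side. This is a minimal sketch on a made-up four-element Series:

```python
import pandas as pd

s = pd.Series([90, 44, 1, 81], index=['TR', 'UK', 'US', 'JP'])

by_label = s.loc['UK':'JP']      # label slice: the end label 'JP' is included
by_position = s.iloc[1:3]        # positional slice: position 3 is excluded
plain_list = [90, 44, 1, 81][1:3]  # regular Python slicing, also exclusive

print(by_label.tolist())
print(by_position.tolist())
print(plain_list)
```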

#### .loc[] Indexer Syntax:
<table>
  <thead>
    <tr>
      <th scope="col">Selection</th>
      <th scope="col">Syntax</th>
      <th scope="col">Return</th>
    </tr>
  </thead>
    <tbody>
    <tr>
      <!--<th scope="row">Series</th>-->
      <td>Select a single row</td>
      <td><span class="mark">.loc['RU']</span></td>
      <td><code>Series</code></td>
    </tr>
    <tr>
      <td>Select multiple rows</td>
      <td><span class="mark">.loc[['IN', 'RU']]</span></td>
      <td><code>DataFrame</code></td>
    </tr>
    <tr>
      <td>Select a range of rows</td>
      <td><span class="mark">.loc['IN':'SA']</span></td>
      <td><code>DataFrame</code></td>
    </tr>
    <tr>
      <td>Select a cell (row and column)</td>
      <td><span class="mark">.loc['RU', 'capitol']</span></td>
      <td>Scalar value</td>
    </tr>
    <tr>
      <td>Select 2 rows and 2 columns</td>
      <td><span class="mark">.loc[['IN', 'RU'],['country', 'capitol']]</span></td>
      <td><code>DataFrame</code></td>
    </tr>
    <tr>
      <td>Select a column (all of the rows)</td>
      <td><span class="mark">.loc[:, ['capitol']]</span></td>
      <td><code>DataFrame</code></td>
    </tr>
    <tr>
      <td>Select 2 columns (all of the rows)</td>
      <td><span class="mark">.loc[:, ['country', 'capitol']]</span></td>
      <td><code>DataFrame</code></td>
    </tr>

  </tbody>
</table>

In [196]:
brics.loc['RU']

country       Russia
capitol       Moscow
area            17.1
population     143.5
Name: RU, dtype: object

In [197]:
brics.loc[['RU', 'IN']]


     country    capitol    area  population
code                                       
RU    Russia     Moscow  17.100       143.5
IN     India  New Delhi   3.286      1252.0


In [None]:
brics.loc['IN':'SA']    # Note that .loc includes the last value with slice notation!

           country    capitol   area  population
code                                            
IN           India  New Delhi  3.286     1252.00
CH           China    Beijing  9.597     1357.00
SA    South Africa   Pretoria  1.221       52.98


In [199]:
brics.loc['RU', 'capitol']

'Moscow'

In [200]:
brics.loc[['IN', 'RU'],['country', 'capitol']]

     country    capitol
code                   
IN     India  New Delhi
RU    Russia     Moscow


In [201]:
# Select (all rows of) 2 columns
brics.loc[:, ['country', 'capitol']]

           country    capitol
code                         
BR          Brazil   Brasilia
RU          Russia     Moscow
IN           India  New Delhi
CH           China    Beijing
SA    South Africa   Pretoria


In [None]:
brics.loc[:, 'capitol'] # Series


code
BR     Brasilia
RU       Moscow
IN    New Delhi
CH      Beijing
SA     Pretoria
Name: capitol, dtype: object

In [208]:
brics.loc[:, ['capitol']]
# type(brics.loc[:, ['capitol']])   # DataFrame


        capitol
code           
BR     Brasilia
RU       Moscow
IN    New Delhi
CH      Beijing
SA     Pretoria


In [216]:
brics.loc[:, 'capitol':'population']

        capitol    area  population
code                               
BR     Brasilia   8.516      200.40
RU       Moscow  17.100      143.50
IN    New Delhi   3.286     1252.00
CH      Beijing   9.597     1357.00
SA     Pretoria   1.221       52.98


### Selecting subsets with .iloc
The .iloc indexer is very similar to .loc but only uses integer locations to make its selections.

pandas provides a suite of methods in order to get purely integer based indexing. (0-based)
```
df.iloc[0]
```
- Selecting a single row with .iloc: `.iloc[0]` 
- Selecting multiple rows with .iloc: `.iloc[[0, 2, 4]]`
- Use slice notation to select a range of rows: `.iloc[3:5]`
- Selecting 2 rows and 2 columns: `.iloc[[0,4], [0, 2]]`
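
The four selections listed above can be sketched on a small frame. The calling-code data is reused from earlier sections purely for illustration:

```python
import pandas as pd

codes = pd.DataFrame(
    {
        'call': [90, 44, 1, 81, 86],
        'name': ['Turkey', 'United Kingdom', 'United States', 'Japan', 'China'],
    },
    index=['TR', 'UK', 'US', 'JP', 'CN'],
)

first = codes.iloc[0]               # single row -> Series
picked = codes.iloc[[0, 2, 4]]      # list of positions -> DataFrame
window = codes.iloc[3:5]            # positions 3 and 4 (end is exclusive)
block = codes.iloc[[0, 4], [0]]     # rows 0 and 4, column 0 -> DataFrame

print(first)
print(picked)
print(window)
print(block)
```

Note that the row labels play no role here: `.iloc` looks only at positions, so the same calls work whatever the index contains.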

In [None]:
brics.iloc[0]   # Series

country         Brazil
capitol       Brasilia
area             8.516
population       200.4
Name: BR, dtype: object

In [None]:
brics.iloc[[0, 2, 4]]   # pass a list: brics.iloc[0, 2, 4] raises an IndexingError (too many indexers)


           country    capitol   area  population
code                                            
BR          Brazil   Brasilia  8.516      200.40
IN           India  New Delhi  3.286     1252.00
SA    South Africa   Pretoria  1.221       52.98


In [None]:
brics.iloc[2:4]


     country    capitol   area  population
code                                      
IN     India  New Delhi  3.286      1252.0
CH     China    Beijing  9.597      1357.0


In [None]:
brics.iloc[[0, 4], [0, 2]]

           country   area
code                     
BR          Brazil  8.516
SA    South Africa  1.221


## Boolean indexing
Another common operation is the use of boolean vectors to filter the data. 
- Using a boolean vector to index a Series works exactly as in a NumPy ndarray.
- The operators are: | for or, & for and, and ~ for not.
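
A minimal sketch of the three operators, using the calling-code Series from earlier. Each comparison is wrapped in parentheses because `&`, `|` and `~` bind more tightly than comparison operators in Python:

```python
import pandas as pd

s = pd.Series({'TR': 90, 'UK': 44, 'US': 1, 'JP': 81, 'CN': 86})

both = s[(s > 40) & (s < 90)]      # and: between 40 and 90, exclusive
either = s[(s < 10) | (s >= 90)]   # or: very small or very large
negated = s[~(s > 80)]             # not: everything that is not above 80

print(both)
print(either)
print(negated)
```

The Python keywords `and`, `or` and `not` do not work element-wise on a Series, which is why the symbolic operators are used instead.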

```python
    # column:
    calling_codes = {'TR':90, 'UK':44, 'US':1, 'JP':81, 'CN':86}

    # Create Series from dictionary:
    s = pd.Series(calling_codes)

```
- Note that when we write a condition on a scalar, it returns boolean value.
We can write the condition on a Series also, then it returns a boolean vector!
```python
    filter1 = s > 80    # Series[bool]
```
- On a DataFrame object, to write a condition on a series, first we get the column by index operator []:
```python
    filter2 = df['column'] > 1    # Series[bool]
```

Using a boolean vector as a filter works as a mask:
```python
    s[s>80]
```

<pre>
    TR    90            TR     True                     TR    90
    UK    44            UK    False                     JP    81
    US     1     x      US    False                 =   CN    86
    JP    81            JP     True                     dtype: int64
    CN    86            CN     True
    dtype: int64        dtype: bool
</pre>

Using a boolean vector to index a Dataframe: 
```python
    df[df['column'] > 1000]
```

<pre>
                        BR    False                         country    capitol   area  population
                        RU    False                     code                                      
       df        x      IN     True                 =   IN     India  New Delhi  3.286      1252.0
                        CH     True                     CH     China    Beijing  9.597      1357.0
                        SA    False
                        Name: population, dtype: bool
</pre>

In [248]:
# boolean vector to index a Series:

# row:
country = {'ISO':'TR', 'call':90, 'capitol':'Ankara', 'name':'Turkey'}
# column:
call_dict = {'TR':90, 'UK':44, 'US':1, 'JP':81, 'CN':86}

# Create Series from dictionary:
calling_codes = pd.Series(call_dict)

# print(calling_codes)

# Note that when we write a condition on a scalar, it returns boolean value.
# We can write the condition on a Series also, then it returns a boolean vector!
filter1 = calling_codes > 80    # Series[bool]

print(filter1)

""" 
TR     True
UK    False
US    False
JP     True
CN     True
dtype: bool
"""

# Using a boolean vector to index a Series:
df_filtered = calling_codes[calling_codes > 80]

print(df_filtered)
""" 
TR    90
JP    81
CN    86
dtype: int64
"""

x = 0   # to stop last """ """ from printing

TR     True
UK    False
US    False
JP     True
CN     True
dtype: bool
TR    90
JP    81
CN    86
dtype: int64


In [221]:
mask = brics['population'] >= 1000.0   # named `mask` to avoid shadowing the built-in filter()

high_pop = brics[mask]
high_pop_country = brics.loc[mask, 'country']

print(high_pop)
print(high_pop_country)


     country    capitol   area  population
code                                      
IN     India  New Delhi  3.286      1252.0
CH     China    Beijing  9.597      1357.0
code
IN    India
CH    China
Name: country, dtype: object
