# <span style="color:#130654; font-family: Helvetica; font-size: 200%; font-weight:700"> Pandas | <span style="font-size: 50%; font-weight:300">Iteration</span></span>

To use pandas in Python, import it first:

In [2]:
# import pandas
import pandas as pd

The behavior of basic iteration over pandas objects depends on their type. Basic iteration (`for i in object`) produces:

|Data Structure| Iteration Type| Produces|
|:------------:|---------------|---------|
|**Series**| Array-like | Values|
|**DataFrame**| Dictionary-like| Column labels|
|**Panel** (removed in pandas 0.25)| Dictionary-like| Item labels|
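
The first two rows of the table can be checked with a short sketch. The Series and DataFrame here are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical Series, used only for this illustration.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Array-like: a plain loop yields the values, not the index labels.
series_items = [value for value in s]

# Dictionary-like: a plain loop over a DataFrame yields the column labels.
frame = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
frame_items = [label for label in frame]

print(series_items)  # [10, 20, 30]
print(frame_items)   # ['x', 'y']
```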

## <span style="color:#130654">**Iterating over DataFrame**</span>

A plain `for` loop over a DataFrame yields its column labels. To iterate over columns as (label, Series) pairs, or over rows, the following methods can be used:

|Method | Usage|
|:-----:|------|
|**iteritems()** | iterate over the columns as (label, Series) pairs; removed in pandas 2.0 in favor of `items()`. |
|**iterrows()**  | iterate over the rows as (index,series) pairs. |
|**itertuples()**| iterate over the rows as namedtuples. |

Each of these methods is used with a `for` loop; `iterrows()` and `itertuples()` are the ones that iterate over rows.

Note: Do not modify an object while iterating over it. Iteration is meant for reading: depending on the data types, the iterator returns a copy rather than a view, so changes made to the yielded values may not be reflected in the original object.
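
A minimal sketch of this pitfall, assuming a DataFrame with mixed column types like the one used in this notebook. With mixed types, each row yielded by `iterrows()` is a freshly built Series, so writing to it leaves the DataFrame untouched. The `doubled` name is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"species": ["mammal", "mammal", "fish"], "population": [3948, 4000, 6000]},
    index=["tiger", "fox", "shark"],
)

# Writing to the row yielded by iterrows() changes only a temporary Series.
for label, row in df.iterrows():
    row["population"] = 0

unchanged = df["population"].tolist()
print(unchanged)  # [3948, 4000, 6000]

# Assigning through the DataFrame itself (here on a copy) does take effect.
doubled = df.copy()
doubled["population"] = doubled["population"] * 2
print(doubled["population"].tolist())  # [7896, 8000, 12000]
```

To change values, assign through the DataFrame (for example a whole-column assignment or `df.loc[...]`) rather than through the objects produced by the loop.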

<br>

### <span style="color:#130654">Create DataFrame</span>

Create a dataset from a dictionary, along with a list of index labels:

In [15]:
data = {'species': ['mammal', 'mammal', 'fish'],
        'population': [3948, 4000, 6000]}
index = ['tiger', 'fox', 'shark']

Let's create a DataFrame from this dictionary, using the labels in `index` as the row index.

In [16]:
df = pd.DataFrame(data, index=index)
df

| | species | population |
|:-----:|---------|------------|
|**tiger**| mammal | 3948 |
|**fox**| mammal | 4000 |
|**shark**| fish | 6000 |


If we iterate over the DataFrame with a plain `for` loop, it produces the column labels.

*Example:*

In [17]:
for col in df:
    print(col)

species
population


<br>

### <span style="color:#130654">iteritems()</span>

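Iterates over the columns as (key, value) pairs, with the column label as the key and the column contents, as a Series object, as the value.

Because each step yields a pair, the `for` loop needs two variables. `iteritems()` was deprecated in pandas 1.5 and removed in pandas 2.0, so the example below calls the equivalent `items()`.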

In [30]:
for label, content in df.items():
    print('label:', label)
    print('content:', content, sep='\n')
    print('\n')

label: species
content:
tiger    mammal
fox      mammal
shark      fish
Name: species, dtype: object


label: population
content:
tiger    3948
fox      4000
shark    6000
Name: population, dtype: int64




<img src="../img/iteritems.png" width=400/>

Each column is visited separately as a (label, Series) pair.
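
Column-wise iteration is handy when something has to be computed per column. The sketch below rebuilds the notebook's DataFrame and counts the distinct values in each column; the `distinct_counts` name is introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"species": ["mammal", "mammal", "fish"], "population": [3948, 4000, 6000]},
    index=["tiger", "fox", "shark"],
)

# Each step yields (column label, column as a Series).
distinct_counts = {label: content.nunique() for label, content in df.items()}

print(distinct_counts)  # {'species': 2, 'population': 3}
```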

<br>

### <span style="color:#130654">iterrows()</span>

`iterrows()` returns an iterator that yields, for each row, the index label together with a Series containing that row's data.

In [32]:
for row_index, row in df.iterrows():
    print('index:', row_index)
    print('content:', row, sep='\n')
    print('\n')

index: tiger
content:
species       mammal
population      3948
Name: tiger, dtype: object


index: fox
content:
species       mammal
population      4000
Name: fox, dtype: object


index: shark
content:
species       fish
population    6000
Name: shark, dtype: object




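Note: Because `iterrows()` returns each row as a single Series, it does not preserve data types across the row. In the output above, every row has dtype `object` even though the `population` column is `int64`.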
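
The effect is easiest to see with purely numeric columns. The sketch below uses a hypothetical two-column DataFrame (`numbers`, with columns `n` and `weight`) that is not part of the original notebook.

```python
import pandas as pd

# Hypothetical frame with one integer and one float column.
numbers = pd.DataFrame({"n": [1, 2], "weight": [1.5, 2.5]})

_, row = next(numbers.iterrows())

column_kind = numbers["n"].dtype.kind  # 'i': the column holds integers
row_kind = row.dtype.kind              # 'f': the row Series was upcast to float
n_from_row = row["n"]                  # 1.0, no longer an integer

# itertuples() keeps each value in its own column's type.
n_from_tuple = next(numbers.itertuples()).n

print(column_kind, row_kind, n_from_row, n_from_tuple)
```

When the original types matter, `itertuples()` is usually the safer choice for row-wise loops.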

<br>

### <span style="color:#130654">itertuples()</span>

`itertuples()` returns an iterator that yields a namedtuple for each row in the DataFrame.

The first element of the tuple is the row's index label, and the remaining elements are the row's values.

In [23]:
for row in df.itertuples():
    print(row)

Pandas(Index='tiger', species='mammal', population=3948)
Pandas(Index='fox', species='mammal', population=4000)
Pandas(Index='shark', species='fish', population=6000)
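
Because each row is a namedtuple, its fields can be read by attribute. The sketch below also shows the `index` and `name` parameters of `itertuples()`, which drop the index field and switch to plain tuples; the variable names are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"species": ["mammal", "mammal", "fish"], "population": [3948, 4000, 6000]},
    index=["tiger", "fox", "shark"],
)

# Fields are available by name on each namedtuple.
first = next(df.itertuples())
print(first.Index, first.species, first.population)  # tiger mammal 3948

# index=False omits the index label; name=None yields plain tuples.
plain = list(df.itertuples(index=False, name=None))
print(plain)  # [('mammal', 3948), ('mammal', 4000), ('fish', 6000)]
```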


<br>

----------

## <span style="color:#130654">**Bonus**</span>

### <span style="color:#130654">items()</span>

The `items()` method iterates over the DataFrame columns, yielding a tuple of the column name and the column content as a Series.

In [35]:
# Each pair holds a column label and that column's content (not a row).
for label, content in df.items():
    print('label:', label)
    print('content:', content, sep='\n')
    print('\n')

label: species
content:
tiger    mammal
fox      mammal
shark      fish
Name: species, dtype: object


label: population
content:
tiger    3948
fox      4000
shark    6000
Name: population, dtype: int64




**Question:** So what is the difference between `iteritems()` and `items()`, since they both produce similar results?

**Answer:**

For a pandas DataFrame there is none: both iterate lazily over (column name, Series) pairs. `iteritems()` was deprecated in pandas 1.5 and removed in pandas 2.0, leaving `items()` as the only spelling.

The names come from Python 2 dictionaries, where the two did differ: `dict.items()` built a full list of tuples, which costs time and memory, whereas `dict.iteritems()` was a generator-style method that yielded tuples one at a time. In Python 3, `dict.items()` never builds a full list; it returns a view, like `viewitems()` in Python 2.7, and `dict.iteritems()` was removed.
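
For the DataFrame in this notebook, a quick check suggests that `items()` hands back a lazy iterator rather than a prebuilt list. This is a sketch under the assumption of a recent pandas version; the variable names are illustrative.

```python
from collections.abc import Iterator

import pandas as pd

df = pd.DataFrame(
    {"species": ["mammal", "mammal", "fish"], "population": [3948, 4000, 6000]},
    index=["tiger", "fox", "shark"],
)

pairs = df.items()

# An Iterator produces its pairs on demand instead of storing them all.
is_lazy = isinstance(pairs, Iterator)

# Pull just the first (label, Series) pair; later columns are not touched yet.
first_label, first_column = next(pairs)

print(is_lazy, first_label)  # True species
```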