Start by importing the required libraries:

In [1]:
import numpy as np
import pandas as pd

We will continue with the practice of basic operations in Pandas.
* Boolean comparisons
* Objects comparison
* Descriptive statistics
* Iterations

# Boolean Comparisons

Series and DataFrame provide the binary comparison methods ```eq, ne, lt, gt, le```, and ```ge```, whose behavior is vectorized (they compare element by element):

In [2]:
df = pd.DataFrame({
        'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
        'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
        'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

In [3]:
df2 = df.copy()
df.gt(df2)

|   | one | two | three |
|---|---|---|---|
| a | False | False | False |
| b | False | False | False |
| c | False | False | False |
| d | False | False | False |


In [4]:
df2.ne(df)

|   | one | two | three |
|---|---|---|---|
| a | False | False | True |
| b | False | False | False |
| c | False | False | False |
| d | True | False | False |
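
The random frame above has NaN wherever a column has no value for a row label, and `df2.ne(df)` reports `True` exactly at those cells because NaN never compares equal to NaN. The sketch below uses two small hand-written frames (the names `a` and `b` are illustrative, not from the original) so that each method's result can be checked by eye:

```python
import numpy as np
import pandas as pd

# Two small frames with identical labels; both have a NaN in the same cell
a = pd.DataFrame({'x': [1.0, 2.0, np.nan], 'y': [3.0, 1.0, 5.0]})
b = pd.DataFrame({'x': [1.0, 3.0, np.nan], 'y': [2.0, 1.0, 5.0]})

eq = a.eq(b)           # element-wise ==
gt = a.gt(b)           # element-wise >
ne = a.ne(b)           # element-wise !=; NaN != NaN evaluates to True
gt_scalar = a.gt(1.5)  # a scalar is compared against every element

print(eq)
print(ne)
```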


You can apply the reductions ```empty```, ```any()```, ```all()```, and ```bool()``` to summarize a boolean result.

In [5]:
(df > 0).all()

one      False
two      False
three    False
dtype: bool

In [6]:
(df > 0).any()

one      True
two      True
three    True
dtype: bool

In [7]:
(df > 0).any().any()

True

To evaluate a single-element pandas object in a boolean context, use the `bool()` method (note that it was deprecated in pandas 2.1):

In [8]:
pd.Series([True]).bool()

True

In [9]:
pd.Series([False]).bool()

False

In [10]:
pd.DataFrame([[True]]).bool()

True

In [11]:
pd.DataFrame([[False]]).bool()

False
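
The `empty` property is not demonstrated above, and using a multi-element object directly in an `if` statement is a common pitfall. This sketch shows both, plus `.item()` for a single-element Series and `.iloc[0, 0]` for a single-element DataFrame as one way to extract the lone value without `bool()` (our suggestion, not the only option):

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, -2.0, 3.0]})

# `empty` is a property (no parentheses): True if any axis has length 0
is_empty = df.empty                   # False
is_empty_new = pd.DataFrame().empty   # True

# A multi-element object in `if` is ambiguous, so pandas raises ValueError
# instead of guessing whether you meant any() or all()
try:
    if df['x'] > 0:
        pass
    ambiguous_raised = False
except ValueError:
    ambiguous_raised = True

# Extract the single value explicitly, then use it as a plain boolean
single_series = pd.Series([True]).item()                 # True
single_frame = bool(pd.DataFrame([[False]]).iloc[0, 0])  # False

print(is_empty, is_empty_new, ambiguous_raised, single_series, single_frame)
```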

# Objects comparison
You can conveniently perform element-wise comparisons when comparing a pandas data structure with a scalar value:

In [12]:
pd.Series(['foo', 'bar', 'baz']) == 'foo'

0     True
1    False
2    False
dtype: bool

Pandas also handles element-wise comparisons between different array-like objects of the same length:

In [13]:
pd.Series(['foo', 'bar', 'baz']) == pd.Index(['foo', 'bar', 'qux'])  # both objects must have the same length, otherwise a ValueError is raised

0     True
1     True
2    False
dtype: bool

Trying to compare `Index` or `Series` objects of different lengths raises a `ValueError`:
```python
pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo', 'bar'])
# ValueError: Series lengths must match to compare
# (recent pandas versions report: Can only compare identically-labeled Series objects)
```
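
Element-wise comparison also works against a plain list or NumPy array of the same length. When the lengths are not guaranteed to match, one option is to catch the `ValueError` explicitly, as in this sketch (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series(['foo', 'bar', 'baz'])

# A plain list of the same length is compared element-wise
same_length = s == ['foo', 'bar', 'qux']
print(same_length.tolist())   # [True, True, False]

# A length mismatch raises ValueError; catch it if lengths may differ
try:
    s == pd.Index(['foo', 'bar'])
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print('ValueError:', exc)
```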

Often you may find that there is more than one way to compute the same result. For example, consider `df + df` and `df * 2`. To test that these two computations produce the same result, given the tools shown above, you might imagine using `(df + df == df * 2).all().all()`.
* Try to compare the two operations, `df + df` and `df * 2`, using the technique mentioned above.
    * The result is `False`, but why? Let's dig deeper:

In [14]:
(df + df == df * 2).all()

one      False
two       True
three    False
dtype: bool

In [15]:
df + df == df * 2

|   | one | two | three |
|---|---|---|---|
| a | True | True | False |
| b | True | True | True |
| c | True | True | True |
| d | False | True | True |


This happens because `df` contains NaN values (wherever a column has no entry for a row label), and NaN never compares equal to itself:
```python
np.nan == np.nan  # False
```
For this reason, pandas objects (such as `Series` and `DataFrame`) have an `equals()` method for testing equality, which treats NaNs in corresponding locations as equal.

In [16]:
(df + df).equals(df * 2)

True

Note that the `Series` or `DataFrame` index needs to be in the same order for the equality to be `True`.
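
The sketch below contrasts `==` with `equals()` on NaN, shows the row-order requirement, and shows that `equals()` also compares dtypes (all data here is hand-written for illustration):

```python
import numpy as np
import pandas as pd

# `==` followed by all() is defeated by NaN; equals() is not
s1 = pd.Series([1.0, np.nan, 3.0])
s2 = pd.Series([1.0, np.nan, 3.0])
via_eq = bool((s1 == s2).all())   # False, because NaN == NaN is False
via_equals = s1.equals(s2)        # True

# Same values, different row order: False until the order matches
df = pd.DataFrame({'col': ['foo', 0, np.nan]})
df2 = pd.DataFrame({'col': [np.nan, 0, 'foo']}, index=[2, 1, 0])
unordered = df.equals(df2)                # False
reordered = df.equals(df2.sort_index())   # True

# Same numbers stored as int64 vs float64 are not considered equal
dtype_mismatch = pd.Series([1, 2]).equals(pd.Series([1.0, 2.0]))  # False

print(via_eq, via_equals, unordered, reordered, dtype_mismatch)
```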

# Descriptive statistics
There are many methods for computing descriptive statistics and related operations on Series and DataFrame, and all of them are vectorized. Most are aggregations, such as `sum()`, `mean()`, and `quantile()`, that produce a lower-dimensional result, while a few, such as `cumsum()` and `cumprod()`, return an object of the same size.

Generally speaking, these methods take an `axis` argument, which can be given by name or by integer: `0` or `'index'` aggregates down each column, and `1` or `'columns'` aggregates across each row:

In [17]:
# Aggregation for each column
df.mean(0)

one      0.316441
two     -0.095805
three   -0.447432
dtype: float64

In [18]:
# Aggregation for each row (index label)
df.mean(1)

a    0.897601
b   -0.322814
c   -0.852630
d    0.477469
dtype: float64
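
Since the frame above is random, here is a small deterministic sketch showing the named forms of `axis` and the `skipna` parameter, which controls whether NaN values are skipped (the default) or propagate:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, np.nan],
                   'two': [3.0, 4.0, 5.0]}, index=['a', 'b', 'c'])

col_means = df.mean(axis='index')     # same as df.mean(0): one value per column
row_means = df.mean(axis='columns')   # same as df.mean(1): one value per row label
row_means_strict = df.mean(axis=1, skipna=False)  # NaN propagates instead of being skipped

print(col_means)          # one 1.5, two 4.0
print(row_means)          # a 2.0, b 3.0, c 5.0
print(row_means_strict)   # a 2.0, b 3.0, c NaN
```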

Combined with vectorized arithmetic, these methods express statistical procedures very concisely, for example standardization (rescaling each column to zero mean and a standard deviation of 1):

In [20]:
ts_stand = (df - df.mean()) / df.std()
ts_stand.std()

one      1.0
two      1.0
three    1.0
dtype: float64
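
Standardizing each row instead of each column needs a little more care: `df - df.mean(1)` would align the row-mean Series against the column labels rather than the row labels. A sketch using `sub()` and `div()` with `axis=0`, on seeded random data for reproducibility:

```python
import numpy as np
import pandas as pd

# Seeded random data so the example is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((4, 3)), columns=['one', 'two', 'three'])

# The row means are indexed by row labels, so axis=0 aligns them with the rows
xs_stand = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)
print(xs_stand.std(axis=1))   # 1.0 for every row (up to floating-point rounding)
```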

The picture below lists the most commonly used descriptive statistics in pandas.

![](images/pandas.png)
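
As a hedged illustration (not a reproduction of the picture), this sketch calls a few widely used methods on a small Series with one missing value. The aggregations return scalars and skip NaN by default, while `cumsum()` returns a Series of the same length:

```python
import numpy as np
import pandas as pd

s = pd.Series([4.0, np.nan, 1.0, 3.0])

n = s.count()                       # 3: number of non-NA values
total = s.sum()                     # 8.0: NaN is skipped by default
mid = s.median()                    # 3.0
smallest, largest = s.min(), s.max()  # 1.0, 4.0
running = s.cumsum()                # same length as s: [4.0, NaN, 5.0, 8.0]

print(n, total, mid, smallest, largest)
print(running)
```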

# Describe

There is a convenient `describe()` method that computes a variety of summary statistics about a `Series` or the columns of a `DataFrame`, excluding NA values:

In [23]:
series = pd.Series(np.random.randn(1000))
series[::2] = np.nan
series.describe()

count    500.000000
mean       0.008757
std        0.943793
min       -2.263300
25%       -0.641516
50%       -0.020364
75%        0.607228
max        2.760496
dtype: float64

In [25]:
frame = pd.DataFrame(np.random.randn(1000, 5),
                     columns=['a', 'b', 'c', 'd', 'e'])
frame.iloc[::2] = np.nan
frame.describe()

|   | a | b | c | d | e |
|---|---|---|---|---|---|
| count | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 |
| mean | 0.047706 | 0.034057 | -0.048713 | 0.038453 | 0.067022 |
| std | 1.01987 | 1.034913 | 0.995225 | 0.953712 | 1.006243 |
| min | -2.557308 | -3.185835 | -3.273233 | -2.645631 | -2.847661 |
| 25% | -0.705014 | -0.561022 | -0.716799 | -0.603727 | -0.567034 |
| 50% | 0.037158 | 0.038636 | -0.058426 | 0.082997 | 0.064137 |
| 75% | 0.703834 | 0.660842 | 0.622192 | 0.658319 | 0.791623 |
| max | 3.414171 | 3.245486 | 2.98675 | 2.731141 | 3.006531 |
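
By default `describe()` reports the 25th, 50th, and 75th percentiles. The `percentiles` parameter selects others; the median is always included. A sketch on the values 1 to 100, where the median is easy to verify:

```python
import numpy as np
import pandas as pd

series = pd.Series(np.arange(1, 101, dtype=float))   # the values 1.0 .. 100.0

# Request extra percentiles; the 50% row is added automatically
summary = series.describe(percentiles=[0.05, 0.25, 0.75, 0.95])
print(summary)
```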


For a non-numerical `Series` object, `describe()` will give a simple summary of the number of unique values and the most frequently occurring values:

In [26]:
s = pd.Series(['a', 'a', 'b', 'b', 'a', 'a', np.nan, 'c', 'd', 'a'])
s.describe()

count     9
unique    4
top       a
freq      5
dtype: object
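
For a DataFrame with mixed column types, `describe()` summarizes only the numeric columns by default. The `include` and `exclude` parameters change which columns are summarized, as in this sketch on a small illustrative frame:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'a': ['Yes', 'Yes', 'No', 'No'], 'b': range(4)})

numeric_only = frame.describe()                    # only column 'b'
everything = frame.describe(include='all')         # both columns
non_numeric = frame.describe(exclude=[np.number])  # only column 'a'

print(everything)
```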

# Index of min/max values


The ```idxmin()``` and ```idxmax()``` methods on ```Series``` and ```DataFrame``` return the index labels of the minimum and maximum values:

In [28]:
s1 = pd.Series(np.random.randn(5))
s1

0    0.110168
1   -0.181365
2    0.132368
3   -0.834413
4    1.409799
dtype: float64

In [29]:
s1.idxmin(), s1.idxmax()

(3, 4)

In [30]:
df1 = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])
df1

|   | A | B | C |
|---|---|---|---|
| 0 | 1.309367 | 0.622767 | 1.872483 |
| 1 | 1.281045 | -0.328631 | 3.796887 |
| 2 | 0.137384 | 0.605418 | -1.302768 |
| 3 | -0.533008 | -1.327664 | 1.143275 |
| 4 | -1.96693 | -0.017972 | -0.228355 |


In [31]:
df1.idxmin(axis=0)

A    4
B    3
C    2
dtype: int64

In [33]:
df1.idxmax(axis=1)

0    C
1    C
2    B
3    C
4    B
dtype: object
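
When several entries tie for the minimum or maximum, `idxmin()` and `idxmax()` return the first matching label, and NaN values are skipped by default. The Series methods `argmin()` and `argmax()` return integer positions instead of labels. A small sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 1, 1, 3, np.nan], index=list('edcba'))

first_min_label = s.idxmin()   # 'd': the first of the two tied minimum values
max_label = s.idxmax()         # 'b': the NaN at label 'a' is skipped
first_min_pos = s.argmin()     # 1: the position of 'd', not its label

print(first_min_label, max_label, first_min_pos)
```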

# Iterations

The behavior of basic iteration over pandas objects depends on the type. When iterating over a Series, it is treated as array-like, and basic iteration produces the values. DataFrames follow the dict-like convention of iterating over the ```keys``` of the object, that is, the column labels.

In short, basic iteration (for i in object) produces:
* Series: values
* DataFrame: column labels

In [36]:
df = pd.DataFrame({'col1': np.random.randn(3),
                     'col2': np.random.randn(3)}, index=['a', 'b', 'c'])
for col in df:
    print(col)

col1
col2


To iterate over the columns or rows of a DataFrame, you can use the following methods:

* ```items()```: iterate over (```key, value```) pairs, i.e. (column label, Series) pairs.
* ```iterrows()```: iterate over the rows of a DataFrame as (index, Series) pairs. This converts each row to a Series, which can change the dtypes and has some performance implications.
* ```itertuples()```: iterate over the rows of a DataFrame as namedtuples of the values. This is much faster than ```iterrows()``` and is, in most cases, the preferable way to iterate over the values of a DataFrame.

# items

Consistent with the dict-like interface, ```items()``` iterates through key-value pairs:

* Series: (index, scalar value) pairs
* DataFrame: (column, Series) pairs

In [38]:
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
for label, ser in df.items():
    print(label)
    print(ser)

a
0    1
1    2
2    3
Name: a, dtype: int64
b
0    a
1    b
2    c
Name: b, dtype: object
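For a Series, ```items()``` yields (index label, value) pairs, mirroring `dict.items()`. A minimal sketch with an illustrative Series:

```python
import pandas as pd

s = pd.Series([1.5, 2.5], index=['x', 'y'])

# Each item is an (index label, scalar value) pair
pairs = list(s.items())
print(pairs)   # [('x', 1.5), ('y', 2.5)]
```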


# iterrows

```iterrows()``` allows you to iterate through the rows of a DataFrame as Series objects. It returns an iterator yielding each index value along with a Series containing the data in that row. Because the rows below mix integers and strings, each row Series has `object` dtype.

In [40]:
for row_index, row in df.iterrows():
    print(row_index, row, sep='\n')

0
a    1
b    a
Name: 0, dtype: object
1
a    2
b    b
Name: 1, dtype: object
2
a    3
b    c
Name: 2, dtype: object
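Because each row is packed into a single Series, ```iterrows()``` does not preserve dtypes across a row: an integer column next to a float column is upcast to float. ```itertuples()``` does not upcast it. A sketch with an illustrative one-row frame:

```python
import pandas as pd

df_orig = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])

# iterrows(): the row becomes one float64 Series, so the int value is upcast
_, row = next(df_orig.iterrows())
print(row['int'].dtype)        # float64
print(df_orig['int'].dtype)    # int64

# itertuples(): the int value is not upcast
first = next(df_orig.itertuples(index=False))
print(first.int)               # 1
```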


# itertuples

The ```itertuples()``` method will return an iterator yielding a namedtuple for each row in the DataFrame. The first element of the tuple will be the row’s corresponding index value, while the remaining values are the row values.

In [41]:
for row in df.itertuples():
    print(row)

Pandas(Index=0, a=1, b='a')
Pandas(Index=1, a=2, b='b')
Pandas(Index=2, a=3, b='c')
