In [1]:
import numpy as np
import pandas as pd

Pandas objects have built-in **mathematical** and **statistical methods** to calculate **summary statistics** or perform **reductions**. These methods either:
1. Reduce a Series to a **single value** (e.g., its sum or mean).
2. Reduce a DataFrame to a **Series of values**, one per column or one per row.

### Key Features:
- **Handles Missing Data**: `NaN` values are skipped automatically (unless you pass `skipna=False`).
- **Similar to NumPy Methods**: The names match the NumPy array methods, but the NumPy versions propagate `NaN` while the pandas versions handle it for you.
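
To make the difference from NumPy concrete, here is a minimal sketch comparing the same sum computed three ways. The variable names (`values`, `numpy_total`, and so on) are made up for illustration.

```python
import numpy as np
import pandas as pd

values = [1.0, np.nan, 3.0]

# Plain NumPy: a single NaN makes the whole sum NaN
numpy_total = np.sum(np.array(values))

# NumPy's NaN-aware variant ignores the missing value
nan_aware_total = np.nansum(values)

# pandas skips NaN by default (skipna=True)
pandas_total = pd.Series(values).sum()

print(numpy_total, nan_aware_total, pandas_total)
```

With pandas you get the NaN-aware behavior without switching to a different function name.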


### Simplified Explanation:
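- **Column-wise Operations**: The default (`axis=0` or `axis='index'`). Methods such as `sum()` and `mean()` return one value per column.
- **Row-wise Operations**: Pass `axis=1` or `axis='columns'` to get one value per row.
- **Missing Data**: Skipped automatically, because `skipna=True` is the default.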

In [2]:
df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                    index=["a", "b", "c", "d"],
                    columns=["one", "two"])
df

    one  two
a  1.40  NaN
b  7.10 -4.5
c   NaN  NaN
d  0.75 -1.3


In [3]:
# Calling the DataFrame's sum method returns a Series containing the column sums
df.sum()

one    9.25
two   -5.80
dtype: float64

In [4]:
# passing axis='columns' or axis=1 sums across the columns instead
df.sum(axis='columns')

a    1.40
b    2.60
c    0.00
d   -0.55
dtype: float64

In pandas, **summary statistics methods** like `sum()` handle missing values (`NaN`) by default using the `skipna=True` option:

### Behavior:
1. **All `NaN` in a row or column**:
   - `sum()` returns **0** (the sum of no values), while methods such as `mean()` return `NaN`.
2. **Any non-`NaN` in a row or column**:
   - The result is calculated, **ignoring `NaN`**.
3. **`skipna=False`**:
   - Any presence of `NaN` results in the output being `NaN`.

### Summary:
- Use `skipna=True` (default) to ignore `NaN` while calculating results.
- Set `skipna=False` if you want to treat any `NaN` as invalid for the operation.

In [6]:
df.sum(axis='columns', skipna=False)

a     NaN
b    2.60
c     NaN
d   -0.55
dtype: float64

In [7]:
df.sum(axis='index', skipna=False)

one   NaN
two   NaN
dtype: float64

In [8]:
# Some aggregations, like mean, require at least one non-NA value to yield a non-NaN result, so row "c" is NaN
df.mean(axis='columns')

a    1.400
b    1.300
c      NaN
d   -0.275
dtype: float64

### Options for reduction methods

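| **Option**      | **Purpose**                                                                                                           | **Default**                                          |
|-----------------|-----------------------------------------------------------------------------------------------------------------------|------------------------------------------------------|
| `axis`          | Axis to reduce over: `0` / `'index'` gives one result per column, `1` / `'columns'` gives one result per row.          | `axis=0`                                             |
| `skipna`        | Excludes `NaN` when calculating results.                                                                              | `skipna=True`                                        |
| `level`         | Reduces grouped by a MultiIndex level. Deprecated in pandas 1.3 and removed in 2.0; use `groupby(level=...)` instead.  | `None`                                               |
| `numeric_only`  | Restricts the operation to numeric data types.                                                                        | `False` in pandas 2.0 and later (`None` before that) |
| `min_count`     | Minimum number of non-NA values required; with fewer, the result is `NaN`. Available for `sum()` and `prod()`.         | `0` (no minimum)                                     |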
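
As a small sketch of two of these options, the snippet below rebuilds the `df` from above and shows how `min_count` changes the all-`NaN` row, plus how `numeric_only=True` drops a text column. The `mixed` frame and the result names are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=["a", "b", "c", "d"],
                  columns=["one", "two"])

# Default min_count=0: the all-NaN row "c" sums to 0.0
default_sums = df.sum(axis="columns")

# min_count=1: a row needs at least one valid value, so "c" becomes NaN
strict_sums = df.sum(axis="columns", min_count=1)

# min_count=2: rows with a single valid value ("a") also become NaN
both_required = df.sum(axis="columns", min_count=2)

# numeric_only=True skips the string column instead of trying to average it
mixed = pd.DataFrame({"x": [1, 2, 3], "label": ["a", "b", "c"]})
numeric_means = mixed.mean(numeric_only=True)

print(default_sums, strict_sums, both_required, numeric_means, sep="\n\n")
```

Setting `min_count=1` is a handy way to tell "sum is zero" apart from "there was nothing to sum".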

In [9]:
# Some methods, like idxmin and idxmax, return indirect statistics, such as the index value where the minimum or maximum is attained:
df.idxmax()

one    b
two    d
dtype: object

In [10]:
df.idxmin()

one    d
two    b
dtype: object

In [11]:
# other methods are accumulations:
df.cumsum()

    one  two
a  1.40  NaN
b  8.50 -4.5
c   NaN  NaN
d  9.25 -5.8


In [12]:
# Some methods are neither reductions nor accumulations.
# describe is one such example: it produces multiple summary statistics in one shot.

df.describe()
# On non-numeric data, describe produces alternative summary statistics.

            one       two
count  3.000000  2.000000
mean   3.083333 -2.900000
std    3.493685  2.262742
min    0.750000 -4.500000
25%    1.075000 -3.700000
50%    1.400000 -2.900000
75%    4.250000 -2.100000
max    7.100000 -1.300000


In [14]:
obj = pd.Series(['a', 'a', 'b', 'c'] * 4)
obj

0     a
1     a
2     b
3     c
4     a
5     a
6     b
7     c
8     a
9     a
10    b
11    c
12    a
13    a
14    b
15    c
dtype: object

In [16]:
obj.describe()

count     16
unique     3
top        a
freq       8
dtype: object

## Summary statistics and related methods available in pandas for both Series and DataFrame objects

The tables below group the commonly used **summary statistics and related methods** available on **Series** and **DataFrame** objects. These methods allow for analyzing and summarizing data.

### Aggregation/Reduction Methods
These methods compute a single value or a summary over the data:

| **Method**         | **Description**                                                                 |
|---------------------|---------------------------------------------------------------------------------|
| `count()`          | Number of non-NA/null observations.                                            |
| `sum()`            | Sum of values (with `skipna=True` by default).                                 |
| `mean()`           | Mean (average) of values.                                                     |
| `median()`         | Median (middle) value of values.                                               |
| `prod()`           | Product of values.                                                            |
| `std()`            | Standard deviation of values.                                                 |
| `var()`            | Variance of values.                                                           |
| `min()`            | Minimum value.                                                                |
| `max()`            | Maximum value.                                                                |
| `idxmin()`         | Index of the minimum value.                                                   |
| `idxmax()`         | Index of the maximum value.                                                   |
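| `mode()`           | Most frequent value(s); returns a Series or DataFrame, since ties are possible. |
| `abs()`            | Element-wise absolute value; not a reduction, the result keeps the input shape. |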
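
A quick sketch of a few of these methods on a small Series (the data and variable names are made up for illustration). It also shows that `mode()` and `abs()` do not collapse the data to a single scalar the way the other methods do.

```python
import pandas as pd

s = pd.Series([3, -1, -1, 4])

n_values = s.count()      # number of non-NA values
product = s.prod()        # 3 * -1 * -1 * 4
middle = s.median()       # average of the two middle values, -1 and 3

# mode() returns a Series because several values can tie for most frequent
most_common = s.mode()

# abs() is element-wise: same length as the input
magnitudes = s.abs()

print(n_values, product, middle)
print(most_common.tolist(), magnitudes.tolist())
```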


### Cumulative Methods
These methods compute cumulative results over the data:

| **Method**         | **Description**                                                                 |
|---------------------|---------------------------------------------------------------------------------|
| `cumsum()`         | Cumulative sum of values.                                                      |
| `cumprod()`        | Cumulative product of values.                                                  |
| `cummin()`         | Cumulative minimum value.                                                      |
| `cummax()`         | Cumulative maximum value.                                                      |
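
The four cumulative methods can be compared side by side on one small Series, as in this sketch (example data invented here). A `NaN` stays `NaN` in the output, and the running result picks up again at the next valid value.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, 1.0, np.nan, 4.0, 2.0])

running_sum = s.cumsum()    # 3, 4, NaN, 8, 10
running_prod = s.cumprod()  # 3, 3, NaN, 12, 24
running_min = s.cummin()    # 3, 1, NaN, 1, 1
running_max = s.cummax()    # 3, 3, NaN, 4, 4

print(pd.DataFrame({"cumsum": running_sum, "cumprod": running_prod,
                    "cummin": running_min, "cummax": running_max}))
```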



### Descriptive Statistics
These methods provide detailed summary statistics:

| **Method**         | **Description**                                                                 |
|---------------------|---------------------------------------------------------------------------------|
| `describe()`       | Generate descriptive statistics, including count, mean, std, min, and max.      |
| `quantile()`       | Compute quantiles of values.                                                    |
| `mad()`            | Mean absolute deviation from the mean. Deprecated in pandas 1.5, removed in 2.0. |
| `skew()`           | Skewness (asymmetry) of the distribution.                                       |
| `kurt()`           | Kurtosis (tailedness) of the distribution.                                      |
| `sem()`            | Standard error of the mean.                                                    |
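
Here is a short sketch of `quantile()`, `skew()`, and `sem()` on invented data. Because `mad()` was removed in pandas 2.0, the mean absolute deviation is computed by hand; that one-liner is a stand-in written for this example, not a pandas method.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

median = s.quantile(0.5)              # a single quantile returns a scalar
quartiles = s.quantile([0.25, 0.75])  # a list of quantiles returns a Series

skewness = s.skew()                   # positive here: the 10.0 stretches the right tail
std_error = s.sem()                   # standard deviation divided by sqrt(n)

# Manual replacement for the removed mad(): mean absolute deviation from the mean
mean_abs_dev = (s - s.mean()).abs().mean()

print(median, quartiles.tolist(), skewness, std_error, mean_abs_dev)
```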



### Correlation and Covariance
These methods are used to analyze relationships between columns in a DataFrame:

| **Method**         | **Description**                                                                 |
|---------------------|---------------------------------------------------------------------------------|
| `corr()`           | Compute pairwise correlation between columns.                                   |
| `cov()`            | Compute pairwise covariance between columns.                                    |
| `corrwith()`       | Correlate a DataFrame's columns (or rows) with another Series or DataFrame.     |
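
The sketch below uses a tiny made-up DataFrame where `y` is exactly twice `x` and `z` falls as `x` rises, so the expected correlations are easy to check by eye.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                   "y": [2.0, 4.0, 6.0, 8.0],   # y = 2 * x
                   "z": [4.0, 3.0, 2.0, 1.0]})  # z decreases as x increases

corr_matrix = df.corr()             # pairwise Pearson correlation, 3 x 3
cov_matrix = df.cov()               # pairwise covariance, 3 x 3
corr_with_x = df.corrwith(df["x"])  # each column correlated with x

print(corr_matrix)
print(cov_matrix)
print(corr_with_x)
```

`corr()` and `cov()` return a square DataFrame, while `corrwith()` returns a Series with one entry per column.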

### Boolean Reductions
These methods work with boolean values:

| **Method**         | **Description**                                                                 |
|---------------------|---------------------------------------------------------------------------------|
| `all()`            | Return `True` if all values are `True`.                                         |
| `any()`            | Return `True` if any value is `True`.                                           |
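
A common use of `any()` and `all()` is checking for missing data. This sketch rebuilds the `df` from the earlier cells; the result names are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=["a", "b", "c", "d"],
                  columns=["one", "two"])

# Does each column contain at least one missing value?
has_missing = df.isna().any()

# Is every value in the row present?
row_complete = df.notna().all(axis="columns")

print(has_missing)
print(row_complete)
```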



### Index-based Reductions
These methods return the index label (not the integer position) at which the extreme value occurs:

| **Method**         | **Description**                                                                 |
|---------------------|---------------------------------------------------------------------------------|
| `idxmin()`         | Return the index of the first minimum value.                                    |
| `idxmax()`         | Return the index of the first maximum value.                                    |
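
To see the difference between a label and a position, this sketch compares `idxmax()`/`idxmin()` with `argmax()`/`argmin()` on a made-up Series that contains a tie for the maximum.

```python
import pandas as pd

s = pd.Series([0.75, 7.1, 1.4, 7.1], index=["d", "b", "a", "e"])

max_label = s.idxmax()     # index label of the first maximum
max_position = s.argmax()  # integer position of the first maximum

min_label = s.idxmin()
min_position = s.argmin()

print(max_label, max_position, min_label, min_position)
```

When the maximum occurs more than once, as with 7.1 here, the first occurrence wins.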


### Counting and Frequency
These methods provide counts and frequencies:

| **Method**         | **Description**                                                                 |
|---------------------|---------------------------------------------------------------------------------|
| `value_counts()`   | Count occurrences of unique values in a Series.                                 |
| `nunique()`        | Count the number of unique values.                                              |
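
As a brief sketch, the same `obj` Series used with `describe()` earlier can be summarized with these two methods; `normalize=True` turns the counts into proportions.

```python
import pandas as pd

obj = pd.Series(["a", "a", "b", "c"] * 4)

counts = obj.value_counts()                # occurrences of each distinct value
shares = obj.value_counts(normalize=True)  # same, as fractions of the total
distinct = obj.nunique()                   # number of distinct values

print(counts)
print(shares)
print(distinct)
```

The top entry of `value_counts()` matches the `top` and `freq` rows reported by `describe()` for this Series.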
