# Summary Functions and Maps in pandas

## 1. Summary Functions

Summary functions provide statistical or descriptive insights about your data.

### `describe()`
Generates descriptive statistics for numerical or categorical columns.

**Syntax:**
df['column'].describe()

**Example (Numerical Data):**


In [9]:
import seaborn as sns
import pandas as pd

sns.get_dataset_names()  # lists the available sample datasets (result not displayed here)
df = sns.load_dataset('titanic')  # load_dataset() already returns a pandas DataFrame
df

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


In [10]:
df.describe()  # summary statistics for every numeric column


,survived,pclass,age,sibsp,parch,fare
count,891.0,891.0,714.0,891.0,891.0,891.0
mean,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,0.0,1.0,0.42,0.0,0.0,0.0
25%,0.0,2.0,20.125,0.0,0.0,7.9104
50%,0.0,3.0,28.0,0.0,0.0,14.4542
75%,1.0,3.0,38.0,1.0,0.0,31.0
max,1.0,3.0,80.0,8.0,6.0,512.3292


In [11]:
df.survived.describe()

count    891.000000
mean       0.383838
std        0.486592
min        0.000000
25%        0.000000
50%        0.000000
75%        1.000000
max        1.000000
Name: survived, dtype: float64
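
The examples above cover numerical data only. As a minimal sketch of the categorical case, the snippet below calls `describe()` on a small hand-made text column; the `towns` Series is a hypothetical stand-in for a column such as `embark_town`, not data from the Titanic dataset.

```python
import pandas as pd

# Hypothetical toy column standing in for a text column such as `embark_town`.
towns = pd.Series(
    ["Southampton", "Cherbourg", "Southampton", "Queenstown", None, "Southampton"],
    name="embark_town",
)

summary = towns.describe()
print(summary)
# count  -> number of non-missing values
# unique -> number of distinct values
# top    -> most frequent value
# freq   -> how often `top` occurs
```

For text columns, `describe()` reports `count`, `unique`, `top`, and `freq` instead of the mean and quartiles shown for numeric columns.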

### `mean()`, `median()`, `unique()`, `value_counts()`, `idxmax()`
- **`mean()`**: Computes the average of numerical data.
- **`median()`**: Computes the median of numerical data.
- **`unique()`**: Returns the distinct values in a column.
- **`value_counts()`**: Counts how often each unique value occurs.
- **`idxmax()`**: Returns the index label of the first occurrence of the maximum value. On a Series this is a single row label; on a DataFrame it is one label per column (or per row with `axis='columns'`).

**Syntax:**
df['column'].mean()
df['column'].median()
df['column'].unique()
df['column'].value_counts()
df['column'].idxmax()

**Examples:**

In [16]:
# Mean passenger age
print(df.age.mean())  # Output: 29.69911764705882

# Unique embarkation port codes
print(df.embarked.unique())  # Output: ['S' 'C' 'Q' nan]

# Frequency of each embarkation town
print(df.embark_town.value_counts())

# idxmax() returns the index label of the largest value (89, at label 4)
s = pd.Series([10, 23, 45, 67, 89, 34])
max_index = s.idxmax()
print(max_index)  # Output: 4

29.69911764705882
['S' 'C' 'Q' nan]
embark_town
Southampton    644
Cherbourg      168
Queenstown      77
Name: count, dtype: int64
4
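
To see how these summary functions treat missing values and non-default index labels, here is a small sketch on hand-made data. The `toy` DataFrame and its labels `a` to `e` are assumptions made up for illustration; they are not rows from the Titanic dataset.

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset itself.
toy = pd.DataFrame(
    {
        "age": [22.0, 38.0, 26.0, None, 35.0],
        "embarked": ["S", "C", "S", None, "Q"],
    },
    index=["a", "b", "c", "d", "e"],
)

median_age = toy["age"].median()    # missing values are skipped by default
oldest_label = toy["age"].idxmax()  # index label of the maximum, not its position

# value_counts() drops missing values unless dropna=False is passed.
counts_with_nan = toy["embarked"].value_counts(dropna=False)

# normalize=True returns proportions instead of raw counts.
shares = toy["embarked"].value_counts(normalize=True)

print(median_age, oldest_label)
print(counts_with_nan)
print(shares)
```

Note that `idxmax()` returns the label `'b'` here rather than the integer position 1, which matters as soon as the index is not the default `0, 1, 2, ...`.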


## 2. Mapping Functions

Mapping transforms data using custom logic.

### `map()`
Applies a function to each element in a Series.

**Syntax:**
df['column'].map(lambda x: transformed_value)

**Example:**

In [13]:
age_mean = df.age.mean()
df.age.map(lambda p: p - age_mean)

# The function here is `lambda p: p - age_mean`:
# for each value p in the age column, subtract the mean age (age_mean).
# Missing ages stay NaN, as in row 888 below.

0      -7.699118
1       8.300882
2      -3.699118
3       5.300882
4       5.300882
         ...    
886    -2.699118
887   -10.699118
888          NaN
889    -3.699118
890     2.300882
Name: age, Length: 891, dtype: float64
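
Besides a function, `Series.map()` also accepts a dictionary, which is handy for recoding values. The sketch below uses a hand-made `ports` Series and a hypothetical `port_names` lookup table; the code `"X"` is deliberately left out of the table to show what happens to unmapped values.

```python
import pandas as pd

ports = pd.Series(["S", "C", "Q", "S", "X"], name="embarked")

# Lookup table from port code to town name; "X" is deliberately missing.
port_names = {"S": "Southampton", "C": "Cherbourg", "Q": "Queenstown"}

towns = ports.map(port_names)
print(towns)
```

Values without an entry in the dictionary come back as `NaN`, so a dictionary-based `map()` can silently introduce missing values if the lookup table is incomplete.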

### `apply()`
Applies a function to each row or column of a DataFrame.

**Syntax:**
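df.apply(custom_function, axis='columns')  # call custom_function once per row
df.apply(custom_function, axis='index')    # call custom_function once per column (default)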


**Example:**

In [14]:
# Subtract the mean age from the age of each row
def remean_age(row):
    row['age'] = row['age'] - age_mean  # center this row's age on the mean
    return row                          # return the modified row

# axis='columns' makes apply() call remean_age once per row,
# passing each row in as a Series.
df.apply(remean_age, axis='columns')

# The original DataFrame is unchanged unless you assign the result back,
# e.g. df = df.apply(remean_age, axis='columns').

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,-7.699118,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,8.300882,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,-3.699118,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,5.300882,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,5.300882,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,-2.699118,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,-10.699118,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,-3.699118,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True
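
The point about assigning the result back can be checked on a small example. This is a sketch on a hand-made `toy` DataFrame (the names and ages are invented); `center_age` is a hypothetical helper that copies the row before changing it, so the input is never touched.

```python
import pandas as pd

toy = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [20.0, 30.0, 40.0]})
toy_age_mean = toy["age"].mean()

def center_age(row):
    row = row.copy()  # work on a copy so the input row is never touched
    row["age"] = row["age"] - toy_age_mean
    return row

centered = toy.apply(center_age, axis="columns")

print(toy["age"].tolist())       # original values are still there
print(centered["age"].tolist())  # centered values live in the new object
```

To keep the centered values, assign the result to a name (as with `centered` here) or back to the original variable.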


## Comparing `map()` and `apply()` in pandas

| Feature         | `map()` (Series)                           | `apply()` (DataFrame)                        |
|-----------------|--------------------------------------------|----------------------------------------------|
| Works on        | Series (single column)                     | DataFrame (row-wise or column-wise)          |
| Function input  | Single value (cell)                        | Whole row (as a Series) or column            |
| Use case        | Simple, column-only transforms             | Complex, multi-column or row-wise logic      |
| Output          | Series                                     | DataFrame or Series                          |
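
The "Function input" and "Output" rows of the table can be observed directly by recording what each function receives. The sketch below uses a hand-made `toy` DataFrame; `record_scalar`, `record_row`, and the `family_size` result are hypothetical names introduced only for this illustration.

```python
import pandas as pd

toy = pd.DataFrame({"sibsp": [1, 0, 2], "parch": [0, 0, 1]})

seen_by_map = []
seen_by_apply = []

def record_scalar(value):
    # Series.map() hands over one cell value at a time.
    seen_by_map.append(type(value).__name__)
    return value

def record_row(row):
    # DataFrame.apply(axis='columns') hands over a whole row as a Series.
    seen_by_apply.append(type(row).__name__)
    return row["sibsp"] + row["parch"] + 1  # one scalar per row

mapped = toy["sibsp"].map(record_scalar)             # Series in, Series out
family_size = toy.apply(record_row, axis="columns")  # DataFrame in, Series out

print(set(seen_by_map), set(seen_by_apply))
print(family_size.tolist())
```

Because `record_row` returns a single value per row, `apply()` produces a Series here; returning a full row, as in the `remean_age` example, produces a DataFrame instead.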



## 3. Vectorized Operations

pandas supports fast, element-wise operations between a Series or DataFrame and a scalar, and between two Series (aligned on their index).

**Syntax:**
df['column'] - scalar_value
df['col1'] + df['col2']

**Examples:**

In [15]:
# Subtract the mean from age (vectorized)
print(df.age - df.age.mean())

# Combine sex and embark_town. fillna() replaces missing values on a Series or
# DataFrame and returns a new object; df.fillna(0, inplace=True) would modify df itself.
print(df.sex + " - " + df.embark_town.fillna('Unknown'))

0      -7.699118
1       8.300882
2      -3.699118
3       5.300882
4       5.300882
         ...    
886    -2.699118
887   -10.699118
888          NaN
889    -3.699118
890     2.300882
Name: age, Length: 891, dtype: float64
0        male - Southampton
1        female - Cherbourg
2      female - Southampton
3      female - Southampton
4        male - Southampton
               ...         
886      male - Southampton
887    female - Southampton
888    female - Southampton
889        male - Cherbourg
890       male - Queenstown
Length: 891, dtype: object
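
As a rough check that the vectorized form and `map()` agree, and to get a feel for the speed difference, the sketch below compares both on synthetic data. The Series size, the random seed, and the repeat count are arbitrary assumptions; actual timings depend on the machine and the pandas version, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic ages; the size and seed are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
ages = pd.Series(rng.uniform(0, 80, size=100_000), name="age")
mean_age = ages.mean()

vectorized = ages - mean_age
mapped = ages.map(lambda a: a - mean_age)

# Both approaches should produce the same values.
pd.testing.assert_series_equal(vectorized, mapped)

# Time 20 repetitions of each approach.
t_vec = timeit.timeit(lambda: ages - mean_age, number=20)
t_map = timeit.timeit(lambda: ages.map(lambda a: a - mean_age), number=20)
print(f"vectorized: {t_vec:.4f}s, map: {t_map:.4f}s")
```

The results are identical, so the choice between the two comes down to readability and speed rather than correctness.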


## Key Notes:
- `map()` and `apply()` return new objects; they **do not modify the original data**.
- Vectorized operations (e.g., `df['col'] + 5`) are usually faster than `map()`/`apply()` because they avoid a Python-level function call for every element.
- Use `axis='columns'` in `apply()` to process rows, `axis='index'` for columns.
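
The `axis` note can be illustrated with a short sketch on a hand-made `toy` DataFrame (the values are invented): the same `apply()` call summarizes columns or rows depending on `axis`.

```python
import pandas as pd

toy = pd.DataFrame({"sibsp": [1, 0, 2], "parch": [0, 3, 1]})

# axis='index': the function receives each column; the result is keyed by column name.
col_range = toy.apply(lambda col: col.max() - col.min(), axis="index")

# axis='columns': the function receives each row; the result is keyed by row label.
row_total = toy.apply(lambda row: row.sum(), axis="columns")

print(col_range)
print(row_total)
```

A quick way to remember the direction: the result of `axis='index'` has one entry per column, and the result of `axis='columns'` has one entry per row.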