## How to calculate summary statistics


https://pandas.pydata.org/docs/getting_started/intro_tutorials/06_calculate_statistics.html

In [1]:
import pandas as pd

In [2]:
titanic = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv')

In [3]:
titanic.head(3)

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |


In [4]:
titanic['Age'].mean()

29.69911764705882

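Operations in general exclude missing data and operate across rows by default: the mean above is computed only over the passengers whose `Age` is known, and the column is reduced to a single value.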
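To make these defaults concrete, here is a minimal sketch on a small, made-up DataFrame (the values are hypothetical, not taken from the Titanic file). It overrides the standard `skipna` and `axis` parameters of `mean()` to show what each default controls.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the Titanic file): one passenger has no recorded age.
df = pd.DataFrame({"Age": [20.0, np.nan, 40.0], "Fare": [10.0, 20.0, 30.0]})

# Default: NaN is skipped and each column is reduced over its rows (axis=0).
col_means = df.mean()
print(col_means)        # Age -> 30.0, Fare -> 20.0

# skipna=False propagates the missing value instead of ignoring it.
print(df["Age"].mean(skipna=False))   # nan

# axis=1 reduces across the columns instead, giving one value per row
# (the NaN in the second row is still skipped).
row_means = df.mean(axis=1)
print(row_means)        # 15.0, 20.0, 35.0
```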



In [5]:
# median age and ticket fare price of the Titanic passengers
titanic[['Age', 'Fare']].median()

Age     28.0000
Fare    14.4542
dtype: float64

When a statistic is applied to multiple columns of a DataFrame, it is calculated separately for each numeric column.
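Selecting `['Age', 'Fare']` first is one way to keep only numeric columns. A sketch of the alternative, assuming pandas 2.x, where text columns are not skipped automatically and can cause a `TypeError`: pass `numeric_only=True`. The `Age` and `Fare` values are copied from the `head(3)` output; the names are shortened for the example.

```python
import pandas as pd

# Mixed-type frame: one text column and two numeric columns (values from head(3)).
df = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen"],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.2833, 7.925],
})

# numeric_only=True restricts the reduction to numeric columns, so "Name" is ignored.
medians = df.median(numeric_only=True)
print(medians)   # Age -> 26.0, Fare -> 7.925
```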


In [6]:
titanic[['Age', 'Fare']].describe()

|       | Age       | Fare      |
|-------|-----------|-----------|
| count | 714.0     | 891.0     |
| mean  | 29.699118 | 32.204208 |
| std   | 14.526497 | 49.693429 |
| min   | 0.42      | 0.0       |
| 25%   | 20.125    | 7.9104    |
| 50%   | 28.0      | 14.4542   |
| 75%   | 38.0      | 31.0      |
| max   | 80.0      | 512.3292  |
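In this summary, `count` is 714 for `Age` but 891 for `Fare`, because missing ages are excluded. The sketch below, on hypothetical toy values, reproduces that effect and shows two standard `describe()` options: custom `percentiles`, and the count/unique/top/freq summary produced for a text column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values; the NaN shows why "count" can differ between columns.
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Sex": ["male", "female", "male", "male"],
})

# Default: only numeric columns are summarised, and count excludes the missing age.
num_summary = df.describe()
print(num_summary)

# Custom percentiles; the median (50%) is always included as well.
pct = df["Age"].describe(percentiles=[0.1, 0.9])
print(pct)

# A text column gets count / unique / top / freq instead of numeric statistics.
text_summary = df["Sex"].describe()
print(text_summary)
```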


---

Instead of the predefined statistics, __specific combinations of aggregating statistics__ for given columns can be defined using the `DataFrame.agg()` method:



In [8]:
titanic.agg(
    {'Age': ['min', 'max', 'median', 'skew'],
     'Fare': ['min', 'max', 'median', 'mean']}
)

|        | Age      | Fare      |
|--------|----------|-----------|
| min    | 0.42     | 0.0       |
| max    | 80.0     | 512.3292  |
| median | 28.0     | 14.4542   |
| skew   | 0.389108 | NaN       |
| mean   | NaN      | 32.204208 |
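Besides names of built-in reductions such as `'min'` or `'skew'`, the lists passed to `agg()` may also contain callables. A minimal sketch on made-up values; `value_range` is a hypothetical helper written for this example, not a pandas function. Statistics not requested for a column come back as `NaN`, just like `skew` for `Fare` above.

```python
import pandas as pd

# Hypothetical toy values standing in for the Titanic columns.
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0],
                   "Fare": [7.25, 71.2833, 7.925, 53.1]})


def value_range(s: pd.Series) -> float:
    """Hypothetical custom reducer: spread between the largest and smallest value."""
    return s.max() - s.min()


# Strings name built-in reductions; the callable is applied to each column's Series,
# and its function name becomes the row label in the result.
summary = df.agg({"Age": ["min", "max", value_range],
                  "Fare": ["mean", value_range]})
print(summary)
```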


---


## Aggregating statistics grouped by category


In [9]:
# What is the average age for male versus female Titanic passengers?
titanic.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [10]:
titanic['Sex'].unique()

array(['male', 'female'], dtype=object)

In [11]:
titanic[['Sex', 'Age']].groupby('Sex').mean()

| Sex    | Age       |
|--------|-----------|
| female | 27.915709 |
| male   | 30.726645 |
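An equivalent formulation, sketched here on a few made-up rows, selects the column after grouping with `groupby('Sex')['Age']`, which returns a Series rather than a one-column DataFrame. The boolean-mask version underneath spells out the split-apply-combine steps that `groupby` performs: split the rows by `Sex`, take the mean of `Age` in each group, and combine the results.

```python
import pandas as pd

# Hypothetical toy passengers (not the real Titanic rows).
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1],
})

# Select the column after grouping: the result is a Series indexed by Sex.
mean_age = df.groupby("Sex")["Age"].mean()
print(mean_age)   # female -> 32.0, male -> 28.5

# The same numbers by hand: split with a boolean mask, apply mean, combine in a dict.
manual = {sex: df.loc[df["Sex"] == sex, "Age"].mean() for sex in df["Sex"].unique()}
print(manual)
```

By default `groupby` sorts the group keys, which is why `female` comes before `male` in the result.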
