# Data Exploration and Summary

- **Before applying a method to a complete DataFrame, check that all of its columns are numeric; otherwise many methods raise an error.**
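Here is a minimal sketch of that check. It assumes a small hypothetical frame `df_mixed` with one text column. `dtypes` shows the type of each column, `select_dtypes(include='number')` keeps only the numeric ones, and reductions such as `mean()` also accept `numeric_only=True`.

```python
import pandas as pd

# Hypothetical frame mixing a text column with numeric columns
df_mixed = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Sales': [100, 150, 80],
    'Profit': [20, 30, 10],
})

# Inspect the dtype of every column before calling a reduction
print(df_mixed.dtypes)

# Option 1: keep only the numeric columns, then apply the method
numeric_cols = df_mixed.select_dtypes(include='number')
print(numeric_cols.mean())

# Option 2: let the reduction itself skip the non-numeric columns
print(df_mixed.mean(numeric_only=True))
```

In recent pandas versions (2.x), calling `df_mixed.mean()` without `numeric_only=True` raises a `TypeError` because of the `Product` column.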

## 2. Summary Statistics
These methods help you understand the distribution and variability of data across different columns.

```python
import pandas as pd
```

```python
data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Sales': [100, 150, 80, 200, 170],
    'Profit': [20, 30, 10, 50, 40]
}
df = pd.DataFrame(data)
df
```

```
  Product  Sales  Profit
0       A    100      20
1       B    150      30
2       C     80      10
3       D    200      50
4       E    170      40
```


### 7. `df.mean()`, `df.median()`, `df.mode()`

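**Mean** – arithmetic average of each column

**Median** – middle value when sorted (the average of the two middle values when the count is even)

**Mode** – most frequent value(s). Because several values can tie, `mode()` always returns a Series (or a DataFrame when called on a frame) rather than a single number.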

```python
print(df[['Sales', 'Profit']].mean(), end='\n\n')    # mean of Sales and Profit
print(df[['Sales', 'Profit']].median(), end='\n\n')  # median of Sales and Profit
print(df[['Sales', 'Profit']].mode(), end='\n\n')    # one row per mode; every value occurs once here, so all five tie
```

```
Sales     140.0
Profit     30.0
dtype: float64

Sales     150.0
Profit     30.0
dtype: float64

   Sales  Profit
0     80      10
1    100      20
2    150      30
3    170      40
4    200      50
```



```python
df['Product'].mode()
```

```
0    A
1    B
2    C
3    D
4    E
Name: Product, dtype: object
```

```python
s = pd.Series([10, 20, 10, 30])
s.mode()
```

```
0    10
dtype: int64
```

```python
s = pd.Series([10, 20, 10, 30, 20])
s.mode()
```

```
0    10
1    20
dtype: int64
```
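To show what the three methods compute, here is a sketch that reproduces them by hand on the `Sales` values and compares the results with pandas. The variable names are illustrative only. The last lines assume that a single value is wanted from a tie. Since `mode()` returns its results sorted, `.iloc[0]` picks the smallest of the tied modes.

```python
import pandas as pd

sales = pd.Series([100, 150, 80, 200, 170], name='Sales')

# Mean: sum of the values divided by how many there are
manual_mean = sales.sum() / len(sales)

# Median: middle element of the sorted values (valid here because n is odd)
ordered = sorted(sales)
manual_median = ordered[len(ordered) // 2]

# Mode: every value whose count equals the highest count
counts = sales.value_counts()
manual_modes = sorted(counts[counts == counts.max()].index)

# When only one mode is needed, take the first (smallest) of the tied modes
tied = pd.Series([10, 20, 10, 30, 20])
first_mode = tied.mode().iloc[0]

print(manual_mean, sales.mean())
print(manual_median, sales.median())
print(manual_modes, sales.mode().tolist())
print(first_mode)
```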

### 8. `df.min()`, `df.max()`
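Returns the minimum and maximum value of each column. Text columns are not skipped: strings are compared alphabetically, which is why `Product` appears in the output.

```python
print(df.min(), end='\n\n')   # minimum of every column, including the text column Product
print(df.max())               # maximum of every column, including the text column Product
```

```
Product     A
Sales      80
Profit     10
dtype: object

Product      E
Sales      200
Profit      50
dtype: object
```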


```python
df['Sales'].min()
```

```
80
```

```python
df['Profit'].max()
```

```
50
```
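If only the numeric columns are wanted, `numeric_only=True` leaves `Product` out of the result. The sketch below also uses `idxmin()`, which goes slightly beyond the original notebook. It returns the row label where the minimum occurs, and that label can then be used to look up the matching product. The frame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Sales': [100, 150, 80, 200, 170],
    'Profit': [20, 30, 10, 50, 40],
})

# Restrict min/max to numeric columns so 'Product' is excluded
print(df.min(numeric_only=True))
print(df.max(numeric_only=True))

# idxmin returns the row label of the smallest value in the column
low_row = df['Sales'].idxmin()
print(df.loc[low_row, 'Product'])  # product with the lowest sales
```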

### 9. `df.std()`, `df.var()`, `df.sem()`

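* **Standard Deviation (`std`)** – spread of the values around the mean; by default pandas returns the sample standard deviation (`ddof=1`, dividing by n - 1)
* **Variance (`var`)** – square of the standard deviation
* **Standard Error of the Mean (`sem`)** – std / sqrt(n)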

```python
print(df[['Sales', 'Profit']].std(), end='\n\n')
print(df[['Sales', 'Profit']].var(), end='\n\n')
print(df[['Sales', 'Profit']].sem())
```

```
Sales     49.497475
Profit    15.811388
dtype: float64

Sales     2450.0
Profit     250.0
dtype: float64

Sales     22.135944
Profit     7.071068
dtype: float64
```
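As a worked check on the formulas above, this sketch recomputes the `Sales` figures with plain arithmetic. The sample variance divides the sum of squared deviations by n - 1, which is pandas' default `ddof=1`. The standard deviation is its square root, and the standard error divides that by sqrt(n). The last line shows the population variant (`ddof=0`), which divides by n.

```python
import math
import pandas as pd

sales = pd.Series([100, 150, 80, 200, 170])
n = len(sales)
mean = sales.sum() / n

# Sample variance: squared deviations divided by n - 1 (pandas' default ddof=1)
var = sum((x - mean) ** 2 for x in sales) / (n - 1)

# Standard deviation: square root of the variance
std = math.sqrt(var)

# Standard error of the mean: std divided by sqrt(n)
sem = std / math.sqrt(n)

print(var, std, sem)

# Population version divides by n instead; pandas exposes it via ddof=0
print(sales.std(ddof=0), math.sqrt(var * (n - 1) / n))
```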


### 10. `df.quantile()`
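Returns the value below which a given fraction of the data lies, e.g. `0.25` for the 25th percentile. When that point falls between two data values, pandas interpolates linearly by default.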

```python
df = df[['Sales', 'Profit']]   # keep only the numeric columns; from here on df no longer contains Product
df
```

```
   Sales  Profit
0    100      20
1    150      30
2     80      10
3    200      50
4    170      40
```


```python
print(df.quantile(0.25), end='\n\n')  # 25th percentile
print(df.quantile(0.5), end='\n\n')   # 50th percentile (median)
print(df.quantile([0.25, 0.5, 0.75]))
```

```
Sales     100.0
Profit     20.0
Name: 0.25, dtype: float64

Sales     150.0
Profit     30.0
Name: 0.5, dtype: float64

      Sales  Profit
0.25  100.0    20.0
0.50  150.0    30.0
0.75  170.0    40.0
```
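The default linear interpolation can be reproduced by hand. Sort the values, compute the fractional position `q * (n - 1)`, and interpolate between the two neighbouring values. `linear_quantile` is a hypothetical helper written only for illustration; it is not a pandas function. For these five `Sales` values the positions for 0.25, 0.5 and 0.75 land exactly on data points, so the table above shows no interpolated values. A quantile such as 0.1 does fall between two values.

```python
import math
import pandas as pd

def linear_quantile(values, q):
    """Hypothetical helper: quantile by linear interpolation between
    sorted values, mirroring pandas' default interpolation='linear'."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)      # fractional position in the sorted list
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower                # how far pos lies between the two neighbours
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

sales = pd.Series([100, 150, 80, 200, 170])
for q in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(q, linear_quantile(sales, q), sales.quantile(q))
```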


### 11. `df.cumsum()`, `df.cumprod()`

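* `cumsum()`: running total down each column (each row adds its value to the sum of all rows above it)
* `cumprod()`: running product down each column (each row multiplies its value into the product of all rows above it)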

```python
print(df[['Sales', 'Profit']].cumsum())
print(df[['Sales', 'Profit']].cumprod())
```

```
   Sales  Profit
0    100      20
1    250      50
2    330      60
3    530     110
4    700     150
         Sales    Profit
0          100        20
1        15000       600
2      1200000      6000
3    240000000    300000
4  40800000000  12000000
```


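Because `df` now holds only numeric columns, the same methods can be called on the whole frame without selecting columns first:

```python
print(df.cumsum())
print(df.cumprod())
```

```
   Sales  Profit
0    100      20
1    250      50
2    330      60
3    530     110
4    700     150
         Sales    Profit
0          100        20
1        15000       600
2      1200000      6000
3    240000000    300000
4  40800000000  12000000
```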
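The running totals above work like Python's `itertools.accumulate`. This sketch rebuilds both columns of the `Sales` result that way. `accumulate` is used here only to illustrate the idea and is not what pandas calls internally.

```python
from itertools import accumulate
import operator
import pandas as pd

sales = pd.Series([100, 150, 80, 200, 170])

# Running total: each entry adds the current value to everything before it
running_sum = list(accumulate(sales))

# Running product: each entry multiplies the current value into the previous product
running_prod = list(accumulate(sales, operator.mul))

print(running_sum)
print(running_prod)
```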


### Additional Notes:

* Summary statistics can be computed row-wise by passing `axis=1`; the default, `axis=0`, works down each column.

```python
df
```

```
   Sales  Profit
0    100      20
1    150      30
2     80      10
3    200      50
4    170      40
```


```python
df.mean(axis=1)  # mean per row
```

```
0     60.0
1     90.0
2     45.0
3    125.0
4    105.0
dtype: float64
```
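To confirm what `axis=1` does, this sketch compares the row-wise mean with the same calculation written out for the two columns. It also shows that other reductions, such as `sum()`, accept `axis=1` in the same way. The frame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'Sales': [100, 150, 80, 200, 170],
                   'Profit': [20, 30, 10, 50, 40]})

# axis=1 reduces across the columns of each row
row_mean = df.mean(axis=1)

# The same result written out explicitly for two columns
manual = (df['Sales'] + df['Profit']) / 2

# Other reductions accept axis=1 too, e.g. the total per row
row_total = df.sum(axis=1)

print(row_mean.tolist())
print(manual.tolist())
print(row_total.tolist())
```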

* Most of these methods **skip NaN values by default** (`skipna=True`); pass `skipna=False` to let a NaN propagate into the result.
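A small sketch of that behaviour on a hypothetical Series with one missing value. With the default, the NaN is left out of the mean and the running sum continues past it. With `skipna=False`, the result itself becomes NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 30])

print(s.mean())              # NaN is skipped: (10 + 30) / 2
print(s.mean(skipna=False))  # NaN propagates into the result
print(s.count())             # number of non-missing values
print(s.cumsum())            # NaN stays in place; the running total continues past it
```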

<center><b>Thanks</b></center>