# Data Exploration and Summary

## 1. Data Overview

```python
import pandas as pd
```

### 1. `df.head()` / `df.tail()`

Preview the first or last `n` rows of a DataFrame (`n` defaults to 5).

```python
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Gender': ['F', 'M', 'M', 'M', 'F']
})
df
```

|   | Name    | Age | Gender |
|---|---------|-----|--------|
| 0 | Alice   | 25  | F      |
| 1 | Bob     | 30  | M      |
| 2 | Charlie | 35  | M      |
| 3 | David   | 40  | M      |
| 4 | Eve     | 45  | F      |


```python
df.head(3)
```

|   | Name    | Age | Gender |
|---|---------|-----|--------|
| 0 | Alice   | 25  | F      |
| 1 | Bob     | 30  | M      |
| 2 | Charlie | 35  | M      |


```python
df.tail(2)
```

|   | Name  | Age | Gender |
|---|-------|-----|--------|
| 3 | David | 40  | M      |
| 4 | Eve   | 45  | F      |
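Both methods also accept a negative `n`. In that case they drop rows from the opposite end instead of counting them. The sketch below rebuilds the same sample DataFrame so it runs on its own, then checks both calls against the positional `iloc` slices they are assumed to match.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Gender': ['F', 'M', 'M', 'M', 'F']
})

# Negative n in head(): every row except the last 2
all_but_last_two = df.head(-2)
# Negative n in tail(): every row except the first 2
all_but_first_two = df.tail(-2)

# The same results expressed as positional slices
assert all_but_last_two.equals(df.iloc[:-2])
assert all_but_first_two.equals(df.iloc[2:])

print(all_but_last_two['Name'].tolist())   # ['Alice', 'Bob', 'Charlie']
print(all_but_first_two['Name'].tolist())  # ['Charlie', 'David', 'Eve']
```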


### 2. `df.shape`, `df.columns`, `df.index`, `df.dtypes`

* `shape`: Tuple of (rows, columns)
* `columns`: Column names
* `index`: Row index
* `dtypes`: Data types of columns

```python
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns}")
print(f"Index: {df.index}")
print(f"dtypes: \n{df.dtypes}")
```

```text
Shape: (5, 3)
Columns: Index(['Name', 'Age', 'Gender'], dtype='object')
Index: RangeIndex(start=0, stop=5, step=1)
dtypes: 
Name      object
Age        int64
Gender    object
dtype: object
```
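These attributes are ordinary Python and pandas objects, so you can use them directly in code. The sketch below unpacks `shape` and converts the `Index` objects to lists. It also uses `select_dtypes` to pick out columns by type with the help of the `dtypes` information. The sample DataFrame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Gender': ['F', 'M', 'M', 'M', 'F']
})

# shape is a plain (rows, columns) tuple, so it unpacks directly
n_rows, n_cols = df.shape
print(n_rows, n_cols)        # 5 3
print(len(df) == n_rows)     # True: len() counts rows

# columns and index are Index objects; tolist() turns them into lists
print(df.columns.tolist())   # ['Name', 'Age', 'Gender']
print(df.index.tolist())     # [0, 1, 2, 3, 4]

# dtypes maps column name -> dtype; select_dtypes filters on it
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)          # ['Age']
```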


### 3. `df.info()`

Prints a concise summary of the DataFrame (it returns `None`), including:

* Number of non-null entries
* Data types
* Memory usage

```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   Gender  5 non-null      object
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes
```
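The non-null counts are most useful when data is missing. The sketch below is a hypothetical variant of the sample DataFrame in which one `Age` value has been blanked out with `where`, so the column becomes `float64` to hold the `NaN`. It captures the `info()` report as a string through the `buf` parameter instead of printing it, then cross-checks the counts with `notna().sum()`.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Gender': ['F', 'M', 'M', 'M', 'F']
})

# Hypothetical gap: replace the Age in row 2 with NaN on a new DataFrame
df_missing = df.assign(Age=df['Age'].where(df.index != 2))

# info() writes to stdout by default; buf= redirects it to a text buffer
buffer = io.StringIO()
df_missing.info(buf=buffer)
report = buffer.getvalue()
print(report)

# The same non-null counts as a Series, one entry per column
non_null = df_missing.notna().sum()
print(non_null)
```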


### 4. `df.describe()`

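Generates descriptive statistics. By default only numeric columns are included. For each one it reports the count, mean, standard deviation (the sample version, `ddof=1`), minimum, 25th/50th/75th percentiles, and maximum.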

```python
df.describe()
```

|       | Age      |
|-------|----------|
| count | 5.0      |
| mean  | 35.0     |
| std   | 7.905694 |
| min   | 25.0     |
| 25%   | 30.0     |
| 50%   | 35.0     |
| 75%   | 40.0     |
| max   | 45.0     |


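```python
# Include non-numeric columns as well
df.describe(include='all')
```

|        | Name  | Age      | Gender |
|--------|-------|----------|--------|
| count  | 5     | 5.0      | 5      |
| unique | 5     | NaN      | 2      |
| top    | Alice | NaN      | M      |
| freq   | 1     | NaN      | 3      |
| mean   | NaN   | 35.0     | NaN    |
| std    | NaN   | 7.905694 | NaN    |
| min    | NaN   | 25.0     | NaN    |
| 25%    | NaN   | 30.0     | NaN    |
| 50%    | NaN   | 35.0     | NaN    |
| 75%    | NaN   | 40.0     | NaN    |
| max    | NaN   | 45.0     | NaN    |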
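A few `describe()` options are worth knowing. `percentiles` changes which quantiles are reported, and the median is always added. `exclude=['number']` keeps only the non-numeric columns and their count/unique/top/freq rows. The last lines check that the `std` row above is the sample standard deviation (`ddof=1`) rather than the population one. This is a sketch on a rebuilt copy of the sample DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Gender': ['F', 'M', 'M', 'M', 'F']
})

# Custom percentiles: 50% is included even though it is not listed
numeric_stats = df.describe(percentiles=[0.1, 0.9])
print(numeric_stats)

# Only the non-numeric columns: count, unique, top, freq
text_stats = df.describe(exclude=['number'])
print(text_stats)

# describe() uses the sample standard deviation (ddof=1)
print(round(df['Age'].std(), 6))        # 7.905694
# The population standard deviation (ddof=0) is smaller
print(round(df['Age'].std(ddof=0), 6))  # 7.071068
```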


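### 5. `df.memory_usage()`

Returns the memory used by each column, and by the index, in bytes. With the default `deep=False`, `object` columns count only the 8-byte reference to each value, not the strings themselves. This is why `df.info()` reports `248.0+ bytes` (128 + 3 × 40). The `+` flags the string contents that were not measured.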

```python
df.memory_usage()
```

```text
Index     128
Name       40
Age        40
Gender     40
dtype: int64
```
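To measure the string contents too, pass `deep=True`. The sketch below compares the shallow and deep figures on a rebuilt copy of the sample DataFrame. It shows that a fixed-width `int64` column is the same either way (5 values × 8 bytes). It also uses `index=False` and `.sum()` to produce a single total. The exact deep byte counts depend on the Python and pandas versions, so no values are asserted for them here.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Gender': ['F', 'M', 'M', 'M', 'F']
})

shallow = df.memory_usage()          # default: deep=False
deep = df.memory_usage(deep=True)    # also counts the string objects
print(pd.DataFrame({'shallow': shallow, 'deep': deep}))

# Fixed-width numeric column: 5 values x 8 bytes in both modes
print(shallow['Age'], deep['Age'])   # 40 40

# index=False drops the Index entry; sum() gives one total in bytes
per_column = df.memory_usage(deep=True, index=False)
print(per_column.index.tolist())     # ['Name', 'Age', 'Gender']
print(per_column.sum())
```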

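### 6. `df.sample()`

Returns a random selection of rows. Pass `n` for a fixed number of rows or `frac` for a fraction of the DataFrame. The rows drawn change on every run unless `random_state` is set, so your output may differ from what is shown below.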

```python
df.sample(2)  # 2 random rows
```

|   | Name  | Age | Gender |
|---|-------|-----|--------|
| 0 | Alice | 25  | F      |
| 3 | David | 40  | M      |


```python
df.sample(frac=0.6)  # 60% of the rows (3 of 5)
```

|   | Name  | Age | Gender |
|---|-------|-----|--------|
| 4 | Eve   | 45  | F      |
| 1 | Bob   | 30  | M      |
| 3 | David | 40  | M      |
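For results you can reproduce, fix `random_state`. The sketch below also covers two related cases. When `frac` is used, pandas is assumed to round `frac × rows` to a whole number of rows (0.6 × 5 gives 3). Drawing more rows than the DataFrame holds needs `replace=True`, as in a bootstrap resample, where rows can repeat. The sample DataFrame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Gender': ['F', 'M', 'M', 'M', 'F']
})

# The same random_state gives the same rows on every call
first = df.sample(2, random_state=42)
second = df.sample(2, random_state=42)
print(first.equals(second))  # True

# frac=0.6 of 5 rows -> 3 rows
print(len(df.sample(frac=0.6, random_state=0)))  # 3

# frac > 1 is only allowed with replace=True; rows may repeat
boot = df.sample(frac=2, replace=True, random_state=0)
print(len(boot))  # 10
```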

