# Using Excel, Python & R



## **1. Sample Dataset**

We'll use this dataset for practice:

| Student | Score |
| ------- | ----- |
| A       | 45    |
| B       | 50    |
| C       | 55    |
| D       | 60    |
| E       | 60    |
| F       | 65    |
| G       | 70    |
| H       | 75    |
| I       | 80    |
| J       | 90    |
| K       | 95    |

---

## **2. Excel Hands-on**

1. **Enter Data**: Paste the scores into a column (e.g., Column B).

2. **Calculate Mean, Median, Variance**:

   * **Mean**: `=AVERAGE(B2:B12)`
   * **Median**: `=MEDIAN(B2:B12)`
   * **Variance**: `=VAR.S(B2:B12)` (for sample variance)

3. **Create Histogram**:

   * Select your data → **Insert → Charts → Histogram**

4. **Create Boxplot**:

   * Excel 2016+: **Insert → Charts → Box & Whisker**

✅ The histogram shows the shape of the distribution and the box plot shows the median and quartiles, so students can see **central tendency and spread** at a glance.

---

## **3. Python Hands-on (pandas + numpy + matplotlib)**

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Dataset
scores = np.array([45,50,55,60,60,65,70,75,80,90,95])
df = pd.DataFrame({'Student': list('ABCDEFGHIJK'), 'Score': scores})

# Mean, Median, Variance
mean_score = np.mean(scores)
median_score = np.median(scores)
variance_score = np.var(scores, ddof=1)  # Sample variance

print("Mean:", mean_score)
print("Median:", median_score)
print("Variance:", variance_score)

# Histogram
plt.hist(scores, bins=5, color='skyblue', edgecolor='black')
plt.title('Histogram of Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

# Boxplot
plt.boxplot(scores, vert=False)
plt.title('Boxplot of Scores')
plt.xlabel('Score')
plt.show()
```

* `np.var(scores, ddof=1)` → calculates **sample variance**, consistent with Excel’s `VAR.S`.
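
To see what `ddof=1` changes, the sketch below recomputes the variance by hand using only the standard library. The helper name `variance_by_hand` is illustrative and not part of NumPy; `statistics.variance` serves as a cross-check.

```python
import math
import statistics

scores = [45, 50, 55, 60, 60, 65, 70, 75, 80, 90, 95]

def variance_by_hand(values, ddof=0):
    """Sum of squared deviations divided by (n - ddof). Hypothetical helper."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)

sample_var = variance_by_hand(scores, ddof=1)      # divides by n - 1, like VAR.S
population_var = variance_by_hand(scores, ddof=0)  # divides by n, like VAR.P

print("Sample variance:", sample_var)
print("Population variance:", population_var)
print("Matches statistics.variance:",
      math.isclose(sample_var, statistics.variance(scores)))
```

The sample variance is always the larger of the two, because the same sum of squared deviations is divided by a smaller number.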

---

## **4. R Hands-on**

```r
# Dataset
scores <- c(45,50,55,60,60,65,70,75,80,90,95)

# Mean, Median, Variance
mean(scores)
median(scores)
var(scores)   # Sample variance

# Histogram (breaks=5 is only a suggestion; R rounds to "pretty" break points)
hist(scores, breaks=5, col='skyblue', main='Histogram of Scores', xlab='Score', ylab='Frequency')

# Boxplot
boxplot(scores, horizontal=TRUE, col='lightgreen', main='Boxplot of Scores', xlab='Score')
```

---

## ✅ **Summary of Hands-on Practice**

| Tool   | Key Functions / Steps                                                                            |
| ------ | ------------------------------------------------------------------------------------------------ |
| Excel  | `AVERAGE`, `MEDIAN`, `VAR.S`, Charts → Histogram / Box & Whisker                                 |
| Python | `numpy.mean`, `numpy.median`, `numpy.var`, `matplotlib.pyplot.hist`, `matplotlib.pyplot.boxplot` |
| R      | `mean()`, `median()`, `var()`, `hist()`, `boxplot()`                                             |

This covers **numerical summaries and visualization**, enough for beginners to describe a dataset's center and spread and to compare results across the three tools.

---




# **Fundamentals of Statistics: Hands-on Practice (Excel, Python, R)**

---

## **Dataset for Practice**

| Student | Age | Gender | Score | Income | Education_Level |
| ------- | --- | ------ | ----- | ------ | --------------- |
| A       | 23  | M      | 45    | 50000  | Bachelor        |
| B       | 25  | F      | 50    | 60000  | Master          |
| C       | 30  | M      | 55    | 75000  | PhD             |
| D       | 22  | F      | 60    | 40000  | Bachelor        |
| E       | 27  | F      | 60    | 50000  | Master          |

---

## **Part 1: Central Tendency**

### **Exercise 1:** Calculate **Mean, Median, Mode** of `Score`

#### **Excel**

* Mean: `=AVERAGE(D2:D6)`
* Median: `=MEDIAN(D2:D6)`
* Mode: `=MODE.SNGL(D2:D6)`

#### **Python**

```python
import numpy as np
from scipy import stats

scores = [45,50,55,60,60]
print("Mean:", np.mean(scores))
print("Median:", np.median(scores))
# keepdims requires SciPy >= 1.9; on ties, stats.mode returns the smallest value
print("Mode:", stats.mode(scores, keepdims=True).mode[0])
```
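
`stats.mode` reports a single value even when several values tie for most frequent. As a sketch of an alternative, assuming Python 3.8 or later, the standard library's `statistics.multimode` returns every mode. The `tied_scores` list is made up for illustration and is not part of the practice dataset.

```python
import statistics

scores = [45, 50, 55, 60, 60]
print("Mode:", statistics.mode(scores))            # single most common value
print("All modes:", statistics.multimode(scores))  # list of every mode

# Hypothetical data with a tie: both 50 and 60 appear twice
tied_scores = [45, 50, 50, 60, 60]
all_modes = statistics.multimode(tied_scores)
print("All modes (tied data):", all_modes)
```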

#### **R**

```r
scores <- c(45,50,55,60,60)
mean(scores)
median(scores)
mode_val <- as.numeric(names(sort(table(scores), decreasing=TRUE)[1]))
mode_val
```

---

## **Part 2: Dispersion**

### **Exercise 2:** Calculate **Range, Variance, Standard Deviation, IQR** of `Age`

#### **Excel**

* Range: `=MAX(B2:B6)-MIN(B2:B6)`
* Variance: `=VAR.S(B2:B6)`
* Standard Deviation: `=STDEV.S(B2:B6)`
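* IQR: `=QUARTILE.INC(B2:B6,3)-QUARTILE.INC(B2:B6,1)` (the inclusive method matches the defaults of `np.percentile` and R's `IQR()`; `QUARTILE.EXC` gives a different answer on a sample this small)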

#### **Python**

```python
import numpy as np

ages = [23,25,30,22,27]
print("Range:", np.max(ages)-np.min(ages))
print("Variance:", np.var(ages, ddof=1))
print("Std Dev:", np.std(ages, ddof=1))
Q1,Q3 = np.percentile(ages,[25,75])
print("IQR:", Q3-Q1)
```

#### **R**

```r
ages <- c(23,25,30,22,27)
diff(range(ages))   # max - min; range() alone returns c(min, max)
var(ages)
sd(ages)
IQR(ages)
```
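
Quartile conventions differ between tools, so the IQR of a small sample can change depending on the method. The sketch below uses the standard library's `statistics.quantiles` (Python 3.8+) to compare two conventions on the `Age` data. The mapping in the comments to Excel's `QUARTILE.INC` and `QUARTILE.EXC` is an assumption based on how each method positions the quartiles, so treat it as a guide and check it against your own spreadsheet.

```python
import statistics

ages = [23, 25, 30, 22, 27]

# "inclusive": linear interpolation between data points, the same convention
# as np.percentile's default and R's IQR() (assumed to match QUARTILE.INC)
q1_inc, _, q3_inc = statistics.quantiles(ages, n=4, method="inclusive")

# "exclusive": positions based on (n + 1) * p (assumed to match QUARTILE.EXC)
q1_exc, _, q3_exc = statistics.quantiles(ages, n=4, method="exclusive")

iqr_inclusive = q3_inc - q1_inc
iqr_exclusive = q3_exc - q1_exc
print("IQR (inclusive):", iqr_inclusive)
print("IQR (exclusive):", iqr_exclusive)
```

With only five observations the two conventions disagree noticeably; on larger samples the gap usually shrinks.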

---

## **Part 3: Visualization**

### **Exercise 3:** Create **Histogram** and **Boxplot** of `Income`

#### **Excel**

* Select column → **Insert → Charts → Histogram**
* Select column → **Insert → Charts → Box & Whisker**

#### **Python**

```python
import matplotlib.pyplot as plt
income = [50000,60000,75000,40000,50000]

plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.hist(income, bins=5, color='skyblue', edgecolor='black')
plt.title('Histogram of Income')

plt.subplot(1,2,2)
plt.boxplot(income, vert=False)
plt.title('Boxplot of Income')
plt.show()
```

#### **R**

```r
income <- c(50000,60000,75000,40000,50000)
# breaks=5 is only a suggestion; R rounds to "pretty" break points
hist(income, breaks=5, col='skyblue', main='Histogram', xlab='Income')
boxplot(income, horizontal=TRUE, col='lightgreen', main='Boxplot', xlab='Income')
```
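
A histogram is just a count of values per bin. The sketch below reproduces equal-width binning by hand for the `Income` data, which is roughly what `plt.hist` does when `bins` is an integer; R's `hist()` may pick different break points. The variable names are illustrative.

```python
income = [50000, 60000, 75000, 40000, 50000]
num_bins = 5

low, high = min(income), max(income)
bin_width = (high - low) / num_bins
edges = [low + i * bin_width for i in range(num_bins + 1)]

counts = [0] * num_bins
for value in income:
    # A value equal to the maximum belongs to the last bin (closed right edge)
    index = min(int((value - low) / bin_width), num_bins - 1)
    counts[index] += 1

for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:>8.0f} to {right:>8.0f}: {count}")
```

The empty bin between 61000 and 68000 is what appears as a gap in the plotted histogram.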

---

## **Part 4: Probability**

### **Exercise 4:** Toss a coin 1000 times; estimate the probability of Heads

#### **Excel**

* Simulate tosses: enter `=IF(RAND()<0.5,"H","T")` in `A2` and fill down to `A1001`
* Estimate P(Heads): `=COUNTIF(A2:A1001,"H")/1000` (the value changes each time the sheet recalculates)

#### **Python**

```python
import random
trials = 1000
heads = sum(1 for _ in range(trials) if random.choice(['H','T'])=='H')
print("P(Head) ~", heads/trials)
```

#### **R**

```r
trials <- 1000
coins <- sample(c('H','T'), trials, replace=TRUE)
prop.table(table(coins))
```
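
The estimate from 1000 tosses will not be exactly 0.5. As a rough sketch of how the estimate tightens with more trials, the code below repeats the simulation at several sample sizes. The seed value `42`, the helper `estimate_p_heads`, and the chosen sample sizes are assumptions for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable (arbitrary value)

def estimate_p_heads(trials, rng):
    """Fraction of simulated fair-coin tosses that land heads. Hypothetical helper."""
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

estimates = {n: estimate_p_heads(n, rng) for n in (10, 1000, 100000)}
for n, p in estimates.items():
    print(f"{n:>6} tosses: P(Head) ~ {p:.4f}")
```

Small runs can land far from 0.5, while the largest run should sit close to it.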

---

### **Exercise 5:** Roll a die 1000 times; find P(>4) and P(odd)

#### **Python**

```python
import random

trials = 1000
rolls = [random.randint(1,6) for _ in range(trials)]
print("P(>4) ~", sum(r>4 for r in rolls)/trials)
print("P(odd) ~", sum(r%2!=0 for r in rolls)/trials)
```

#### **R**

```r
rolls <- sample(1:6, 1000, replace=TRUE)
mean(rolls>4)
mean(rolls%%2!=0)
```
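
The simulated values can be checked against the exact probabilities, which follow from counting outcomes in the sample space of a fair six-sided die. A minimal sketch, with `exact_probability` as an illustrative helper name:

```python
from fractions import Fraction

faces = range(1, 7)  # sample space of a fair six-sided die

def exact_probability(event):
    """Favourable outcomes divided by total outcomes. Hypothetical helper."""
    favourable = sum(1 for face in faces if event(face))
    return Fraction(favourable, len(faces))

p_gt4 = exact_probability(lambda face: face > 4)       # faces 5 and 6
p_odd = exact_probability(lambda face: face % 2 != 0)  # faces 1, 3 and 5

print("P(>4) =", p_gt4, "~", round(float(p_gt4), 4))
print("P(odd) =", p_odd, "~", round(float(p_odd), 4))
```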

---

## **Part 5: Applied Visualization**

### **Exercise 6:** Pie chart for `Education_Level` and Bar chart for `Score`

#### **Excel**

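* Pie chart: count each level first (e.g., `=COUNTIF(F2:F6,"Bachelor")`, or use a PivotTable), then select the counts → **Insert → Pie**
* Bar chart: Select the `Score` column → **Insert → Bar Chart**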

#### **Python**

```python
import matplotlib.pyplot as plt
import pandas as pd

education = ['Bachelor','Master','PhD','Bachelor','Master']
scores = [45,50,55,60,60]
df = pd.DataFrame({'Education':education, 'Score':scores})

df['Education'].value_counts().plot.pie(autopct='%1.1f%%', startangle=90)
plt.show()

df.plot.bar(x='Education', y='Score', color='lightgreen')
plt.show()
```

#### **R**

```r
education <- c('Bachelor','Master','PhD','Bachelor','Master')
scores <- c(45,50,55,60,60)
pie(table(education))
barplot(scores, names.arg=education, col='lightgreen')
```
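
A pie chart plots category counts, not the raw labels. The sketch below shows the tally that `value_counts()` and R's `table()` perform before plotting, using only the standard library; the `shares` dictionary is an illustrative name.

```python
from collections import Counter

education = ['Bachelor', 'Master', 'PhD', 'Bachelor', 'Master']

counts = Counter(education)   # number of students at each level
total = sum(counts.values())
shares = {level: 100 * count / total for level, count in counts.items()}

for level, count in counts.items():
    print(f"{level}: {count} students ({shares[level]:.1f}%)")
```

The percentages here are the same figures that `autopct` prints on the pie slices.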

---

