### Question 1: What is the difference between descriptive statistics and inferential statistics? Explain with examples.

**Answer:**

**1. Descriptive Statistics**  
- Definition: Methods used to summarize, organize, and describe the main features of a dataset.  
- They do not make predictions or generalizations beyond the data.  
- Focus is on presenting information in a simple form (tables, graphs, averages, percentages).  

**Examples:**  
- A teacher calculates the average marks of her class.  
- A company summarizes its employees’ age distribution.  
- A cricket player’s batting average across 10 matches.  

**2. Inferential Statistics**  
- Definition: Methods used to make predictions, inferences, or generalizations about a population based on a sample.  
- Uses probability, hypothesis testing, and estimation.  
- Involves uncertainty and confidence levels.  

**Examples:**  
- A political survey of 1,000 voters is used to predict the outcome for millions.  
- A scientist tests a drug on a sample group to infer its effectiveness.  
- A company uses sales data from a random sample of its branches to estimate company-wide performance (a single branch alone may not be representative).
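
The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of marks is simulated (hypothetical), and the 95% interval uses a normal approximation rather than a t-distribution.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated exam marks (an assumption)
population = [random.gauss(70, 10) for _ in range(10_000)]
sample = random.sample(population, 100)

# Descriptive: summarize the sample itself
sample_mean = mean(sample)
sample_sd = stdev(sample)

# Inferential: approximate 95% confidence interval for the population mean
z = NormalDist().inv_cdf(0.975)  # about 1.96
margin = z * sample_sd / len(sample) ** 0.5
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print("Sample mean (descriptive):", round(sample_mean, 2))
print("Approx. 95% CI for population mean (inferential):",
      (round(ci_low, 2), round(ci_high, 2)))
```

The first print describes only the 100 sampled marks; the second makes a hedged claim about all 10,000.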

### Question 2: What is sampling in statistics? Explain the differences between random and stratified sampling.

**Answer:**  

**Definition:** Sampling is the process of selecting a subset (sample) of a population in order to study it and draw conclusions about the whole population.  

**Why Sampling?**  
- Studying the whole population is often impractical (too costly or time-consuming).  
- A properly chosen sample can provide reliable insights.  

**Random Sampling:**  
- Every member has an equal chance of selection.  
- Methods: lottery system, random number generator.  
- Advantage: Simple, unbiased if done properly.  
- Limitation: By chance, small subgroups may be under- or over-represented, especially in small samples.  

**Stratified Sampling:**  
- Population divided into subgroups (strata) based on characteristics (gender, income, etc.).  
- A random sample is then drawn from each stratum, typically in proportion to its share of the population.  
- Advantage: Ensures representation of all key groups.  
- Limitation: More complex to organize.
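
The two schemes can be compared with a minimal sketch using only the standard library. The population, group labels, and sample size below are hypothetical, chosen just to make the proportions easy to check.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 800 members of group "A", 200 of group "B"
population = [("A", i) for i in range(800)] + [("B", i) for i in range(200)]
sample_size = 50

# Simple random sampling: every member has the same chance of selection
simple_sample = random.sample(population, sample_size)

# Proportional stratified sampling: split into strata, then sample each one
strata = {}
for member in population:
    strata.setdefault(member[0], []).append(member)

stratified_sample = []
for label, members in strata.items():
    # Stratum share of the sample; rounding may need adjusting in general
    k = round(sample_size * len(members) / len(population))
    stratified_sample.extend(random.sample(members, k))

simple_counts = Counter(label for label, _ in simple_sample)
stratified_counts = Counter(label for label, _ in stratified_sample)

print("Simple random:", dict(simple_counts))
print("Stratified:   ", dict(stratified_counts))
```

The stratified sample always contains 40 members of "A" and 10 of "B", matching the 80/20 split, while the simple random sample's split varies from draw to draw.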

### Question 3: Define mean, median, and mode. Explain why these measures of central tendency are important.

**Answer:**  

- **Mean:** The sum of all values divided by the number of values.  
  Example: (5+10+15+20)/4 = 12.5  

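- **Median:** The middle value when the data is ordered; with an even number of values, the average of the two middle values.  
  Example: Data [6,8,10,12] → Median = (8+10)/2 = 9  

- **Mode:** The value occurring most frequently. A dataset can have more than one mode, or none if no value repeats.  
  Example: [2,4,4,6,6,6,8] → Mode = 6  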

**Importance:**  
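- Each measure summarizes the dataset in a single value that locates the "center" of the data, which makes datasets easy to describe and compare.
- They complement each other: the mean uses every value but is pulled by outliers, the median is robust to extreme values, and the mode also works for categorical data.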
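
One reason to report more than one measure is that they react differently to extreme values. A minimal sketch, reusing the example data above and adding one hypothetical outlier (200):

```python
from statistics import mean, median, multimode

values = [5, 10, 15, 20]
with_outlier = values + [200]  # hypothetical extreme value

# Compare each measure before and after adding the outlier
print("Mean:  ", mean(values), "->", mean(with_outlier))
print("Median:", median(values), "->", median(with_outlier))

# multimode() returns every most-frequent value as a list
print("Mode(s):", multimode([2, 4, 4, 6, 6, 6, 8]))
```

The mean jumps from 12.5 to 50, while the median only moves from 12.5 to 15.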

### Question 4: Explain skewness and kurtosis. What does a positive skew imply about the data?

**Answer:**  

- **Skewness:** Measures asymmetry of a distribution.  
  - Symmetrical (skewness ≈ 0): For a symmetric, unimodal distribution, Mean = Median = Mode.  
  - Positive skew: Tail longer on right; typically Mean > Median > Mode.  
  - Negative skew: Tail longer on left; typically Mean < Median < Mode.  

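- **Kurtosis:** Measures the "tailedness" of a distribution, i.e., how much weight lies in the tails compared with a normal distribution.  
  - Mesokurtic (≈3): Tails similar to the normal distribution.  
  - Leptokurtic (>3): Heavy tails and more extreme values; often drawn with a sharp peak.  
  - Platykurtic (<3): Light tails and fewer extreme values; often drawn as a flatter curve.  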

**Positive skew implies:** Data has a long right tail, many low/medium values and a few high values (e.g., income distribution).
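
Both quantities can be computed from central moments. The sketch below uses the population (biased) moment formulas; the helper names and the income figures are hypothetical, and library routines may apply bias corrections that give slightly different numbers.

```python
from statistics import fmean, median

def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    m = fmean(values)
    return fmean([(v - m) ** k for v in values])

def skewness(values):
    """Third central moment divided by variance ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth central moment divided by variance ** 2 (normal is about 3)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

# Hypothetical incomes (in thousands): mostly modest, one very high value
incomes = [20, 22, 25, 25, 27, 30, 32, 35, 40, 150]

print("Skewness:", skewness(incomes))
print("Kurtosis:", kurtosis(incomes))
print("Mean:", fmean(incomes), "Median:", median(incomes))
```

For this data the skewness is positive and the mean (40.6) sits above the median (28.5), which is the pattern described for a long right tail.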

In [None]:
# Question 5: Compute mean, median, and mode

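from statistics import mean, median, multimode

numbers = [12, 15, 12, 18, 19, 12, 20, 22, 19, 19, 24, 24, 24, 26, 28]

mean_value = mean(numbers)
median_value = median(numbers)
# 12, 19, and 24 each occur three times, so this data has several modes.
# multimode() returns all of them; mode() would return only the first one
# (and raises StatisticsError on ties in Python versions before 3.8).
mode_values = multimode(numbers)

print("Numbers:", numbers)
print("Mean:", mean_value)
print("Median:", median_value)
print("Mode(s):", mode_values)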

In [None]:
# Question 6: Compute covariance and correlation

import numpy as np

list_x = [10, 20, 30, 40, 50]
list_y = [15, 25, 35, 45, 60]

x = np.array(list_x)
y = np.array(list_y)

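# Sample covariance (divides by n - 1; bias=False is NumPy's default)
cov_matrix = np.cov(x, y, bias=False)
cov_xy = cov_matrix[0, 1]  # off-diagonal entry is cov(x, y)

# Pearson correlation coefficient
corr_matrix = np.corrcoef(x, y)
corr_xy = corr_matrix[0, 1]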

print("List X:", list_x)
print("List Y:", list_y)
print("Covariance:", cov_xy)
print("Correlation Coefficient:", corr_xy)

In [None]:
# Question 7: Boxplot and Outliers

import matplotlib.pyplot as plt
import numpy as np

data = [12, 14, 14, 15, 18, 19, 19, 21, 22, 22, 23, 23, 24, 26, 29, 35]

plt.boxplot(data, vert=True, patch_artist=True, notch=True)
plt.title("Boxplot of Data")
plt.ylabel("Values")
plt.show()

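# Quartiles (np.percentile uses linear interpolation by default)
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1

# 1.5 * IQR rule: points beyond these bounds are flagged as outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

outliers = [x for x in data if x < lower_bound or x > upper_bound]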

print("Q1:", Q1)
print("Q3:", Q3)
print("IQR:", IQR)
print("Lower Bound:", lower_bound)
print("Upper Bound:", upper_bound)
print("Outliers:", outliers)

In [None]:
# Question 8: Advertising Spend vs Sales

import numpy as np

advertising_spend = [200, 250, 300, 400, 500]
daily_sales = [2200, 2450, 2750, 3200, 4000]

x = np.array(advertising_spend)
y = np.array(daily_sales)

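# Sample covariance between spend and sales (divides by n - 1)
cov_matrix = np.cov(x, y, bias=False)
cov_xy = cov_matrix[0, 1]  # off-diagonal entry is cov(x, y)

# Pearson correlation coefficient
corr_matrix = np.corrcoef(x, y)
corr_xy = corr_matrix[0, 1]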

print("Advertising Spend:", advertising_spend)
print("Daily Sales:", daily_sales)
print("Covariance:", cov_xy)
print("Correlation Coefficient:", corr_xy)

In [None]:
# Question 9: Survey Scores Histogram

import matplotlib.pyplot as plt
import numpy as np

survey_scores = [7, 8, 5, 9, 6, 7, 8, 9, 10, 4, 7, 6, 9, 8, 7]

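mean_score = np.mean(survey_scores)
median_score = np.median(survey_scores)
std_dev = np.std(survey_scores, ddof=1)  # ddof=1 gives the sample standard deviation
# Most frequent score; if several scores tie, only one of them is returned
mode_score = max(set(survey_scores), key=survey_scores.count)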

print("Mean:", mean_score)
print("Median:", median_score)
print("Standard Deviation:", std_dev)
print("Mode:", mode_score)

# Half-integer bin edges center each bar on its integer score
plt.hist(survey_scores, bins=np.arange(3.5, 11.5, 1), edgecolor='black', color='skyblue')
plt.title("Histogram of Customer Satisfaction Scores")
plt.xlabel("Survey Scores")
plt.ylabel("Frequency")
plt.xticks(range(4, 11))
plt.show()