# Table of Contents

1. [Estimation (Point)](#Estimation-(Point)-in-Inferential-Statistics)
2. [Estimation (Interval)](#Estimation-(Interval)-in-Inferential-Statistics)
3. [Hypothesis Testing](#Hypothesis-Testing)

# Estimation (Point) in Inferential Statistics

Estimation is a fundamental concept in inferential statistics, used to make predictions or inferences about a population based on sample data. Point estimation involves using sample data to calculate a single value (known as a point estimate) that serves as the best estimate of an unknown population parameter (e.g., population mean, population proportion).

#### Mathematical Formula

The formula for estimating the population mean ($\mu$) using the sample mean ($\bar{x}$) is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$$

where $n$ is the sample size, and $x_i$ are the individual sample values.

#### Explanation of Steps

1. **Collect a Sample**: Randomly select a sample of $n$ observations from the population.
2. **Calculate the Sample Mean**: Use the sample data to calculate the sample mean ($\bar{x}$), which serves as the point estimate of the population mean ($\mu$).
3. **Use the Sample Mean as the Point Estimate**: The calculated sample mean is used as the best estimate of the population mean.
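The three steps above can be sketched in a few lines. This is a minimal illustration, not the analysis used later in this section: the `population` array, its size, and the seed are made-up assumptions so that the estimate can be compared with a value that would normally be unknown.

```python
import numpy as np

# Hypothetical population of 10,000 daily user counts (illustrative only).
rng = np.random.default_rng(0)
population = rng.integers(1000, 5000, size=10_000)

# Step 1: draw a simple random sample of n observations without replacement.
n = 30
sample = rng.choice(population, size=n, replace=False)

# Step 2: compute the sample mean, x_bar = (1/n) * sum(x_i).
x_bar = sample.sum() / n

# Step 3: report x_bar as the point estimate of the population mean mu.
print(f"Point estimate (sample mean): {x_bar:.1f}")
print(f"Population mean (normally unknown): {population.mean():.1f}")
```

The two printed values will generally differ somewhat; that gap is the sampling error that a single point estimate does not reveal.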

#### Business Scenario: Estimating Average Daily Active Users in a Software Application

Imagine you're a data analyst at a software technology company, and you're tasked with estimating the average daily active users (DAU) for a new application. The company has launched the app recently, and you want to estimate the average DAU to forecast server load and guide marketing strategies.

To accomplish this, you decide to collect data on the daily active users for a sample of 30 days after the app's launch.

#### Python Code Example

The following Python code demonstrates point estimation for this scenario. Because no real usage data is available here, it simulates the 30-day sample with random integers from 1000 to 4999 and stores them in a pandas DataFrame.


```python
import pandas as pd
import numpy as np

# Simulate sample data for daily active users (DAU)
np.random.seed(42)
sample_size = 30
dau_sample = np.random.randint(1000, 5000, size=sample_size)

# Create a pandas DataFrame
df = pd.DataFrame(dau_sample, columns=['daily_active_users'])

# Calculate the sample mean (point estimate for the average DAU),
# truncated to a whole number of users
sample_mean = int(df['daily_active_users'].mean())

print(f"The point estimate for the average daily active users is: {sample_mean} users.")
```

```
The point estimate for the average daily active users is: 3048 users.
```


#### Interpretation

The calculated sample mean provides us with a point estimate of the average daily active users (DAU) for the new application. This estimate can be used to make informed decisions about server capacity planning and to tailor marketing strategies to engage more users.

A point estimate condenses the sample into a single number that is easy to act on in a business context. On its own, however, it says nothing about how far that number may be from the true population value.


# Estimation (Interval) in Inferential Statistics

Interval estimation provides a range (interval) of values within which the parameter is expected to lie. This interval is calculated from the sample data and provides an estimate of the parameter with a certain level of confidence (e.g., 95%).

#### Mathematical Formula

The formula for a confidence interval for a population mean, assuming a large sample size or known population standard deviation, is:

$$\bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}}$$

where:
- $\bar{x}$ is the sample mean,
- $Z$ is the Z-score corresponding to the desired confidence level (e.g., 1.96 for 95% confidence),
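- $\sigma$ is the population standard deviation; when it is unknown, the sample standard deviation $s$ is substituted, and for small samples the Z-score is usually replaced by a critical value from the t-distribution with $n - 1$ degrees of freedom,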
- $n$ is the sample size.
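As a rough sketch of how the formula translates into code, the helper below computes the interval from a list of values. The function name `mean_confidence_interval`, its `use_t` flag, and the sample values are hypothetical and chosen only for illustration. It assumes $\sigma$ is unknown and estimated by the sample standard deviation, and it optionally swaps the Z-score for a t critical value, which is a common choice for small samples.

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(values, confidence=0.95, use_t=False):
    """Return (lower, upper) for x_bar ± critical_value * s / sqrt(n)."""
    values = np.asarray(values, dtype=float)
    n = values.size
    x_bar = values.mean()
    s = values.std(ddof=1)  # sample standard deviation, standing in for sigma
    tail = 0.5 + confidence / 2  # e.g. 0.975 for a 95% two-sided interval
    if use_t:
        critical = stats.t.ppf(tail, df=n - 1)  # Student's t, n - 1 degrees of freedom
    else:
        critical = stats.norm.ppf(tail)  # Z-score, about 1.96 for 95%
    margin = critical * s / np.sqrt(n)
    return x_bar - margin, x_bar + margin

# Made-up values, for illustration only.
values = [3100, 2950, 3300, 2800, 3050, 3200, 2900, 3150]
z_low, z_high = mean_confidence_interval(values)
t_low, t_high = mean_confidence_interval(values, use_t=True)
print(f"Z-based 95% CI: ({z_low:.1f}, {z_high:.1f})")
print(f"t-based 95% CI: ({t_low:.1f}, {t_high:.1f})")
```

Both intervals are centered on the same sample mean; the t-based one is wider because it accounts for the extra uncertainty of estimating the standard deviation from only a few observations.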

#### Business Scenario: Estimating Average Daily Active Users with a Confidence Interval

Given the same business scenario as before, we now wish to provide an interval estimate for the average DAU to understand the range within which the true average DAU likely falls.

#### Python Code Example


```python
from scipy.stats import norm
import math

# Reuses np, df, sample_size, and sample_mean from the point-estimation code above.

# Assuming a 95% confidence level
z_score = norm.ppf(0.975)  # Two-tailed Z-score for 95% confidence
std_dev = df['daily_active_users'].std()  # Sample standard deviation
margin_of_error = z_score * (std_dev / np.sqrt(sample_size))

confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
# Round both bounds down to whole users for display
floored_confidence_interval = (math.floor(confidence_interval[0]), math.floor(confidence_interval[1]))

print(f"The 95% confidence interval for the average daily active users is: {floored_confidence_interval}")
```

```
The 95% confidence interval for the average daily active users is: (2643, 3452)
```


#### Interpretation

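This confidence interval gives a range of plausible values for the true average DAU. The 95% describes the procedure rather than this one interval: if we repeatedly drew 30-day samples and built an interval from each, about 95% of those intervals would contain the true average. The width of the interval also indicates precision: a wide interval, as here, reflects high day-to-day variability relative to the sample size.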
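The repeated-sampling meaning of "95% confident" can be checked with a small simulation. The sketch below assumes a hypothetical normally distributed population with a known mean and standard deviation (values chosen arbitrarily), builds a Z-based interval from each of many samples of size 30, and counts how often the interval contains the true mean.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
true_mean, true_sd = 3000.0, 1000.0  # hypothetical population parameters
n, trials = 30, 2000
z = NormalDist().inv_cdf(0.975)  # two-sided 95% Z-score

# Draw `trials` independent samples of size n (one sample per row).
samples = rng.normal(true_mean, true_sd, size=(trials, n))
means = samples.mean(axis=1)
margins = z * samples.std(axis=1, ddof=1) / np.sqrt(n)

# Fraction of intervals that contain the true mean.
covered = (means - margins <= true_mean) & (true_mean <= means + margins)
coverage = covered.mean()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land near 95%. A value slightly below that is expected, since the Z-score somewhat understates the uncertainty when the standard deviation is estimated from only 30 observations.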

# Hypothesis Testing


Hypothesis testing is a method of making decisions using data, whether from a controlled experiment or an observational study.