In [1]:
# Q1. What is an ensemble technique in machine learning?
# Ensemble techniques in machine learning combine multiple individual models into a single, more robust model.
# The idea is to exploit the diversity among the individual models, so that their errors partly cancel out
# and overall predictive performance improves. The most common families are bagging, boosting, and stacking.

In [2]:
# Q2. Why are ensemble techniques used in machine learning?
# Ensemble techniques are used in machine learning for several reasons:

# Improved Accuracy: Combining predictions from multiple models often leads to better accuracy compared to individual models.
# Robustness: Averaging over many models reduces variance, so ensembles (bagging in particular) are typically less prone to overfitting and less sensitive to noisy data.
# Stability: Ensembles are more stable, as they can compensate for the weaknesses of individual models.
# Versatility: Ensemble methods can be applied to various types of models, making them versatile across different machine learning algorithms.
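# A worked example of the accuracy claim, under a strong simplifying assumption: every model is
# correct with the same probability p and the models err independently of each other. Real models
# are usually correlated, so treat this as an upper-bound illustration. The function name
# majority_vote_accuracy is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models, p):
    """Probability that a strict majority of n independent models is correct.

    Assumes each model is right with probability p and errors are independent.
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


# Three models at 70% accuracy: 3 * 0.7^2 * 0.3 + 0.7^3 = 0.784
print(round(majority_vote_accuracy(3, 0.7), 3))  # 0.784
```

# Under these assumptions the vote beats any single 70% model, and the gain grows with more models.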

In [3]:
# Q3. What is bagging?
# Bagging (Bootstrap Aggregating): Bagging is an ensemble technique where multiple instances of the same
# learning algorithm are trained on different random subsets of the training data. These subsets are created 
# by sampling with replacement (bootstrap sampling). The final prediction is typically an average (for
# regression) or a majority vote (for classification) of the predictions made by the individual models.
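# A minimal sketch of bagging for classification, using only the standard library. The base learner
# is a one-dimensional decision stump and the toy dataset is made up for illustration; fit_stump,
# bagging_fit and bagging_predict are hypothetical names, not a library API.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D stump: choose the threshold and orientation with the fewest training errors."""
    best = None
    for threshold in sorted({x for x, _ in points}):
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= threshold else 1 - left_label) != y
                for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best

    def predict(x):
        return left_label if x <= threshold else 1 - left_label

    return predict


def bagging_fit(points, n_models, rng):
    """Train n_models stumps, each on its own bootstrap sample of the data."""
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def bagging_predict(models, x):
    """Aggregate by majority vote over the individual predictions."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (made up): label 1 when x > 5
data = [(x, int(x > 5)) for x in range(11)]
models = bagging_fit(data, n_models=25, rng=random.Random(0))
print([bagging_predict(models, x) for x in (0, 10)])
```

# An odd number of models avoids tied votes in binary classification.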

In [4]:
# Q4. What is boosting?
# Boosting: Boosting is another ensemble technique where weak learners (models that perform slightly better
# than random chance) are combined to form a strong learner. Unlike bagging, boosted models are trained
# sequentially, and each subsequent model focuses on correcting the errors of the previous ones. AdaBoost
# does this by assigning weights to training instances and increasing the weights of misclassified
# instances; gradient boosting instead fits each new model to the residual errors of the current ensemble.
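# A minimal sketch of the instance-reweighting idea, in the style of AdaBoost with one-dimensional
# decision stumps and labels in {-1, +1}. The function names and the toy dataset are assumptions
# made for illustration. No single stump can fit this data, but three boosted stumps can.

```python
import math


def fit_weighted_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump.

    The stump predicts `polarity` when x <= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w
                for w, x, y in zip(weights, xs, ys)
                if (polarity if x <= threshold else -polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost_fit(xs, ys, n_rounds):
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        err, threshold, polarity = fit_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified instances, decrease the rest
        weights = [
            w * math.exp(-alpha * y * (polarity if x <= threshold else -polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(
        alpha * (polarity if x <= threshold else -polarity)
        for alpha, threshold, polarity in ensemble
    )
    return 1 if score >= 0 else -1


# Toy data (made up for illustration)
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(xs, ys, n_rounds=3)
print([adaboost_predict(ensemble, x) for x in xs] == ys)  # True
```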

In [5]:
# Q5. What are the benefits of using ensemble techniques?
# Improved Performance: Ensembles often outperform individual models, leading to better predictive accuracy.
# Robustness: Ensembles are more robust to noise and outliers in the data.
# Generalization: They enhance the generalization ability of models by reducing overfitting.
# Versatility: Ensemble methods can be applied to various types of models.
# Feature Insight: Tree-based ensembles such as random forests expose feature importance scores, although the combined model is usually harder to interpret than a single model.

In [6]:
# Q6. Are ensemble techniques always better than individual models?
# While ensemble techniques generally improve performance, they may not always be better than individual models.
# It depends on factors such as the quality of the base models, the diversity among them, and the nature of the 
# data. In some cases, a well-tuned individual model might perform as well as or better than an ensemble.

In [7]:
# Q7. How is the confidence interval calculated using bootstrap?
# To compute a bootstrap confidence interval, resample the dataset with replacement to create multiple
# bootstrap samples. The confidence interval is then constructed from the distribution of a statistic (e.g.,
# mean, median) calculated on these bootstrap samples. The interval is typically defined by percentiles of 
# the distribution (e.g., 2.5th and 97.5th percentiles for a 95% confidence interval).
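# The percentile pair follows directly from the confidence level: for a level c, the tails each
# hold (1 - c) / 2 of the distribution. A small sketch (percentile_bounds is a hypothetical helper):

```python
def percentile_bounds(confidence):
    """Return the (lower, upper) percentiles for a two-sided interval at the given level."""
    tail = (1.0 - confidence) / 2.0  # probability mass left in each tail
    return 100.0 * tail, 100.0 * (1.0 - tail)


for level in (0.90, 0.95, 0.99):
    lower, upper = percentile_bounds(level)
    print(f"{level:.0%} interval: {lower:.1f}th to {upper:.1f}th percentile")
```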

In [8]:
# Q8. How does bootstrap work, and what are the steps involved in bootstrap?
# Bootstrap approximates the sampling distribution of a statistic by resampling the observed data:

# Step 1: Randomly sample n data points with replacement from the original dataset (of size n) to create a bootstrap sample.
# Step 2: Compute the statistic of interest (e.g., mean, median) on the bootstrap sample.
# Step 3: Repeat steps 1 and 2 a large number of times (e.g., thousands of times), recording the statistic each time.
# Step 4: Treat the recorded values as an approximation of the statistic's sampling distribution.
# Step 5: Construct the confidence interval using percentiles of that distribution.
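# The procedure can be sketched as follows, using only the standard library. The heights list is
# made-up data for illustration, bootstrap_ci is a hypothetical helper, and the percentile lookup
# is a simple nearest-rank approximation rather than an interpolated percentile.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    stats = []
    for _ in range(n_resamples):  # repeat many times
        resample = rng.choices(data, k=n)  # draw n points with replacement
        stats.append(stat(resample))  # statistic on this bootstrap sample
    stats.sort()
    # Percentiles of the bootstrap distribution (nearest-rank approximation)
    tail = (1.0 - confidence) / 2.0
    lower = stats[int(tail * n_resamples)]
    upper = stats[min(int((1.0 - tail) * n_resamples), n_resamples - 1)]
    return lower, upper


# Hypothetical raw measurements
heights = [14.2, 15.1, 13.8, 16.0, 15.4, 14.9, 17.2, 13.1, 15.8, 14.6,
           16.3, 15.0, 12.9, 15.7, 14.4, 16.8, 15.2, 13.6, 14.8, 15.5]

lower, upper = bootstrap_ci(heights)
print(f"95% bootstrap CI for the mean: [{lower:.2f}, {upper:.2f}]")
```

# Passing stat=statistics.median gives an interval for the median with no other changes.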

In [9]:
import numpy as np




In [10]:
# Given data
sample_mean = 15  # mean height of the sample
sample_std = 2    # standard deviation of the sample
sample_size = 50  # size of the sample
num_bootstrap_samples = 10000  # number of bootstrap samples


In [11]:
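# Parametric bootstrap: the raw heights are not available, so simulate samples from a normal distribution
# with the observed mean and standard deviation (this assumes the heights are roughly normally distributed)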
bootstrap_samples = np.random.normal(loc=sample_mean, scale=sample_std, size=(num_bootstrap_samples, sample_size))

# Calculate the mean height for each bootstrap sample
bootstrap_sample_means = np.mean(bootstrap_samples, axis=1)

# Calculate the 95% confidence interval
confidence_interval = np.percentile(bootstrap_sample_means, [2.5, 97.5])

# Display the result
print(f"95% Confidence Interval for the Population Mean Height: {confidence_interval}")


95% Confidence Interval for the Population Mean Height: [14.43788035 15.55985683]
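
# As a sanity check, the simulated interval above can be compared with the closed-form
# normal-approximation interval, mean +/- z * std / sqrt(n). This sketch assumes z = 1.96 for a
# 95% level and treats the sample standard deviation as known.

```python
import math

sample_mean, sample_std, sample_size = 15, 2, 50

standard_error = sample_std / math.sqrt(sample_size)  # 2 / sqrt(50), about 0.283
z = 1.96  # two-sided 95% critical value of the standard normal
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"Normal-approximation 95% CI: [{lower:.3f}, {upper:.3f}]")
# Normal-approximation 95% CI: [14.446, 15.554]
```

# The two intervals agree to about two decimal places, as expected for a mean with n = 50.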
