# Probability Concepts Practice Problems

This notebook covers core probability concepts: basic probability, conditional probability, Bayes' Theorem, and probability distributions. Each problem is followed by Python code that solves it using simulation, a direct calculation, or a real-world dataset.



## Problem 1: Basic Probability with Dice Rolls

**Problem Statement:**
- Calculate the probability of rolling a 4 on a six-sided die.
- Calculate the probability of rolling an even number on a six-sided die.


In [None]:
import random

# Simulate rolling a die 100,000 times
die_rolls = [random.randint(1, 6) for _ in range(100000)]

# Probability of rolling a 4
probability_of_4 = die_rolls.count(4) / len(die_rolls)

# Probability of rolling an even number
even_numbers = [2, 4, 6]
probability_of_even = sum([die_rolls.count(x) for x in even_numbers]) / len(die_rolls)

print(f"Probability of rolling a 4: {probability_of_4}")
print(f"Probability of rolling an even number: {probability_of_even}")
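
As a sanity check on the simulation, the exact values can be computed by counting outcomes, since each face of a fair die is equally likely. The sketch below assumes a fair die; the names `faces`, `exact_p4`, and `exact_p_even` are illustrative and not part of the original cell.

```python
from fractions import Fraction

# Sample space of a six-sided die (assumption: all faces are equally likely)
faces = range(1, 7)

# P(event) = number of favorable outcomes / total number of outcomes
exact_p4 = Fraction(sum(1 for f in faces if f == 4), len(faces))
exact_p_even = Fraction(sum(1 for f in faces if f % 2 == 0), len(faces))

print(f"Exact P(4): {exact_p4} = {float(exact_p4):.4f}")
print(f"Exact P(even): {exact_p_even} = {float(exact_p_even):.4f}")
```

With 100,000 simulated rolls, the estimates above should usually land within a few thousandths of these exact values (1/6 and 1/2).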

## Problem 2: Conditional Probability with Cards

**Problem Statement:**
- Given a standard deck of 52 cards, calculate the probability of drawing a Queen given that the card drawn is a face card (King, Queen, Jack).


In [None]:
# Total number of face cards (King, Queen, Jack)
face_cards = 3 * 4  # 3 face cards in each suit, 4 suits in total

# Number of Queens
queens = 4  # 4 Queens in total

# Conditional probability
conditional_probability = queens / face_cards

print(f"Probability of drawing a Queen given that the card is a face card: {conditional_probability}")
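
The same answer can be reached from the definition P(Queen | face) = P(Queen and face) / P(face) by enumerating the deck. This is a minimal sketch; the rank and suit labels and the variable names are illustrative choices, not part of the original cell.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

face_ranks = {"J", "Q", "K"}

# Count the outcomes in each event
n_face = sum(1 for rank, _ in deck if rank in face_ranks)
n_queen_and_face = sum(1 for rank, _ in deck if rank == "Q" and rank in face_ranks)

# Apply the definition of conditional probability
p_face = Fraction(n_face, len(deck))
p_queen_and_face = Fraction(n_queen_and_face, len(deck))
p_queen_given_face = p_queen_and_face / p_face

print(f"P(face) = {p_face}, P(Queen and face) = {p_queen_and_face}")
print(f"P(Queen | face) = {p_queen_given_face}")
```

Because every Queen is a face card, the joint event is just "the card is a Queen", and the ratio reduces to 4/12 = 1/3.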

## Problem 3: Bayes' Theorem for Medical Testing

**Problem Statement:**
- A disease affects 1% of the population. A test for the disease has a 99% true positive rate (it correctly flags 99% of people who have the disease) and a 5% false positive rate. Calculate the probability that a person who tests positive actually has the disease.


In [None]:
# Given values
P_disease = 0.01  # Prior probability of having the disease
P_positive_given_disease = 0.99  # Probability of testing positive given that you have the disease
P_positive_given_no_disease = 0.05  # Probability of testing positive given that you do not have the disease
P_no_disease = 1 - P_disease  # Probability of not having the disease

# Bayes' Theorem
P_disease_given_positive = (P_positive_given_disease * P_disease) / (
    (P_positive_given_disease * P_disease) + (P_positive_given_no_disease * P_no_disease)
)

print(f"Probability of having the disease given a positive test result: {P_disease_given_positive}")
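
The result is easier to interpret with whole-number counts. The sketch below applies the same rates to a hypothetical cohort of 100,000 people (the cohort size is an assumption chosen only so that every count is an integer) and counts how many positive tests come from people who actually have the disease.

```python
# Hypothetical cohort size, chosen so all counts below are whole numbers
population = 100_000

diseased = population * 1 // 100        # 1% prevalence
healthy = population - diseased

true_positives = diseased * 99 // 100   # 99% true positive rate
false_positives = healthy * 5 // 100    # 5% false positive rate

# Share of positive tests that belong to people with the disease
all_positives = true_positives + false_positives
ppv = true_positives / all_positives

print(f"True positives: {true_positives}, false positives: {false_positives}")
print(f"P(disease | positive) = {true_positives}/{all_positives} = {ppv:.4f}")
```

Only 990 of the 5,940 positive results are true positives, about 1 in 6. Healthy people outnumber sick people so heavily that even a 5% false positive rate produces most of the positives.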

## Problem 4: Probability Distribution - Normal Distribution

**Problem Statement:**
- Generate a normal distribution dataset and calculate the probability of a value being within 1 standard deviation from the mean.


In [None]:
import numpy as np
import scipy.stats as stats

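# Generate a sample from a normal distribution
mean = 0
std_dev = 1
data = np.random.normal(mean, std_dev, 1000)

# Theoretical probability of being within 1 standard deviation of the mean
within_one_std_dev = stats.norm.cdf(mean + std_dev, loc=mean, scale=std_dev) - stats.norm.cdf(mean - std_dev, loc=mean, scale=std_dev)

# Empirical fraction of the generated sample within 1 standard deviation of the mean
empirical_fraction = np.mean(np.abs(data - mean) <= std_dev)

print(f"Theoretical probability of being within 1 standard deviation of the mean: {within_one_std_dev}")
print(f"Fraction of the generated sample within 1 standard deviation of the mean: {empirical_fraction}")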
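
The same probability has a closed form that does not depend on the mean or standard deviation: P(|X - mean| < k * std_dev) = erf(k / sqrt(2)). The sketch below uses only the standard library; the helper name `prob_within_k_sigma` is illustrative.

```python
import math

def prob_within_k_sigma(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# Evaluate the closed form for 1, 2, and 3 standard deviations
for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {prob_within_k_sigma(k):.4f}")
```

This prints approximately 0.6827, 0.9545, and 0.9973, the values behind the familiar 68-95-99.7 rule.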

## Problem 5: Naive Bayes Classifier

**Problem Statement:**
- Use the "SMS Spam Collection" dataset to build a Naive Bayes classifier to classify SMS messages as spam or not spam.

**Dataset Link:**
- [SMS Spam Collection Dataset](https://archive.ics.uci.edu/ml/datasets/sms+spam+collection)

In [None]:
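import io
import urllib.request
import zipfile

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Load dataset: the URL points to a ZIP archive, so extract the data file before parsing.
# Assumption: the archive contains a tab-separated file named "SMSSpamCollection".
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip"
with urllib.request.urlopen(url) as response:
    archive = zipfile.ZipFile(io.BytesIO(response.read()))
with archive.open("SMSSpamCollection") as data_file:
    df = pd.read_csv(data_file, sep='\t', names=["label", "message"])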

# Preprocess data
df['label'] = df['label'].map({'ham': 0, 'spam': 1})
X_train, X_test, y_train, y_test = train_test_split(df['message'], df['label'], test_size=0.3, random_state=42)

# Vectorize text data
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_vec, y_train)

# Make predictions
y_pred = model.predict(X_test_vec)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Naive Bayes classifier: {accuracy}")
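
To see what a multinomial Naive Bayes classifier computes, here is a minimal from-scratch sketch using only the standard library. It is not scikit-learn's implementation: tokenization is a plain whitespace split, and the four training messages are hypothetical examples invented for illustration, not rows from the SMS Spam Collection. The names `log_posterior` and `classify` are also illustrative.

```python
import math
from collections import Counter

# Hypothetical toy messages, invented for illustration (not from the SMS dataset)
training_data = [
    ("spam", "win cash prize now"),
    ("spam", "free prize claim now"),
    ("ham", "are we meeting for lunch"),
    ("ham", "see you at lunch now"),
]

# Count messages per class and word occurrences per class
class_counts = Counter(label for label, _ in training_data)
word_counts = {label: Counter() for label in class_counts}
for label, text in training_data:
    word_counts[label].update(text.split())

vocabulary = {word for counts in word_counts.values() for word in counts}

def log_posterior(text, label, alpha=1.0):
    """Unnormalized log P(label | text) with add-alpha (Laplace) smoothing."""
    log_prob = math.log(class_counts[label] / len(training_data))  # log prior
    total_words = sum(word_counts[label].values())
    for word in text.split():
        if word not in vocabulary:
            continue  # skip words never seen during training
        likelihood = (word_counts[label][word] + alpha) / (total_words + alpha * len(vocabulary))
        log_prob += math.log(likelihood)
    return log_prob

def classify(text):
    """Return the class with the highest log posterior."""
    return max(class_counts, key=lambda label: log_posterior(text, label))

print(classify("claim your free prize"))
print(classify("lunch at noon"))
```

The first message is classified as `spam` and the second as `ham`. Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from forcing that class's probability to zero.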



## Problem 6: Conditional Probability with Real-World Dataset

**Problem Statement:**
- Use the Titanic dataset to calculate the conditional probability of survival given that the passenger was a female.

**Dataset Link:**
- [Titanic Dataset](https://www.kaggle.com/c/titanic/data)




In [None]:
# Load dataset
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

# Conditional probability of survival given that the passenger is female
female_passengers = df[df['Sex'] == 'female']
P_survival_given_female = female_passengers['Survived'].mean()

print(f"Probability of survival given that the passenger is female: {P_survival_given_female}")
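
Taking the mean of a 0/1 column within a filtered subset is a shortcut for the definition P(survived | female) = P(survived and female) / P(female). The sketch below checks that the two agree on a handful of hypothetical records; the rows are invented for illustration and are not taken from the Titanic file.

```python
# Hypothetical records, invented for illustration (not rows from the Titanic file)
passengers = [
    {"Sex": "female", "Survived": 1},
    {"Sex": "female", "Survived": 1},
    {"Sex": "female", "Survived": 0},
    {"Sex": "male", "Survived": 0},
    {"Sex": "male", "Survived": 1},
    {"Sex": "male", "Survived": 0},
]

n_total = len(passengers)
n_female = sum(1 for p in passengers if p["Sex"] == "female")
n_female_survived = sum(
    1 for p in passengers if p["Sex"] == "female" and p["Survived"] == 1
)

# Definition: P(survived | female) = P(survived and female) / P(female)
p_by_definition = (n_female_survived / n_total) / (n_female / n_total)

# Shortcut: mean of the 0/1 outcome within the female subset
female_outcomes = [p["Survived"] for p in passengers if p["Sex"] == "female"]
p_by_mean = sum(female_outcomes) / len(female_outcomes)

print(f"By definition: {p_by_definition:.4f}")
print(f"By subset mean: {p_by_mean:.4f}")
```

Both approaches give 2/3 for this toy data, because the total count cancels out of the ratio.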


## Problem 7: Working with Poisson Distribution

**Problem Statement:**
- Assume that a call center receives an average of 4 calls per minute. Calculate the probability that exactly 5 calls will be received in a given minute.

In [None]:
import scipy.stats as stats

# Given values
average_calls_per_minute = 4
k = 5  # Number of calls

# Poisson probability
P_k_calls = stats.poisson.pmf(k, average_calls_per_minute)

print(f"Probability of receiving exactly 5 calls in a minute: {P_k_calls}")
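
The value returned by `stats.poisson.pmf` can be reproduced directly from the Poisson formula P(X = k) = λ^k e^(-λ) / k!. This is a minimal sketch using only the standard library; the helper name `poisson_pmf` is illustrative.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Probability of exactly 5 calls when the average is 4 calls per minute
p_5_calls = poisson_pmf(5, 4)
print(f"P(exactly 5 calls): {p_5_calls:.4f}")

# Sanity check: probabilities over all counts sum to 1
# (terms beyond k = 50 are negligible for lam = 4)
total = sum(poisson_pmf(k, 4) for k in range(51))
print(f"Sum of P(X = k) for k = 0..50: {total:.6f}")
```

The first line prints about 0.1563, so a minute with exactly 5 calls occurs roughly 16% of the time.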

## Happy Practicing!

Congratulations on making it this far! Probability is a fundamental concept in data science and machine learning, and mastering it will open up many doors for you in these fields.

Take your time with these problems, experiment with the code, and don't hesitate to tweak things to see how the results change. Remember, the best way to learn is by doing—so dive in and explore these concepts fully.

Happy practicing, and may your learning journey be both productive and enjoyable!
