# Lab | Intro to Probability

**Objective**

Welcome to this Intro to Probability lab, where we explore decision-making scenarios through the lens of probability and strategic analysis. In the business world, making informed decisions is crucial, especially when faced with uncertainties. This lab focuses on scenarios where probabilistic outcomes play a significant role in shaping strategies and outcomes. Students will engage in exercises that require assessing and choosing optimal paths based on data-driven insights. The goal is to enhance your skills by applying probability concepts to solve real-world problems.

**Challenge 1**

#### Ironhack Airlines 

Airlines often sell more tickets than they have seats available; this practice is called overbooking. Consider the following:
- A plane has 450 seats. 
- Based on historical data, we conclude that each individual passenger has a 3% chance of missing their flight. 

If Ironhack Airlines routinely sells 460 tickets, what is the probability that there is a seat for every passenger who shows up?

In [2]:
# Import necessary libraries
import scipy.stats as stats

In [None]:
""" Approach:
We can use the binomial distribution to model the number of passengers who show up. 
We need to calculate the probability that 450 or fewer passengers show up out of the 460 ticketed passengers."""

# Define the parameters
n = 460  # number of tickets sold
p = 0.97  # probability that a passenger shows up (1 - 0.03)

# Calculate the cumulative probability that 450 or fewer passengers show up
probability = stats.binom.cdf(450, n, p)

# Print the result
print(f"The probability that there are enough seats for all passengers is {probability:.4f}")

The probability that there are enough seats for all passengers is 0.8845
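As a cross-check, the same probability can be computed from the no-show side: seats run out only if fewer than 10 of the 460 ticket holders miss the flight. This is a minimal sketch that uses only the standard library (`math.comb`) instead of SciPy. The helper `binom_pmf` is a hypothetical name introduced here for illustration.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

tickets, seats, p_no_show = 460, 450, 0.03

# View 1: show-ups ~ Binomial(460, 0.97); everyone is seated when show-ups <= 450
p_enough_showups = sum(binom_pmf(k, tickets, 1 - p_no_show) for k in range(seats + 1))

# View 2: no-shows ~ Binomial(460, 0.03); everyone is seated when no-shows >= 460 - 450 = 10
p_enough_noshows = sum(
    binom_pmf(k, tickets, p_no_show) for k in range(tickets - seats, tickets + 1)
)

print(f"via show-ups: {p_enough_showups:.4f}, via no-shows: {p_enough_noshows:.4f}")
```

Both sums pair up term by term (k show-ups is the same event as 460 - k no-shows), so the two views agree up to floating-point rounding.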


In [4]:
# Define the parameters
n = 460  # number of tickets sold
p = 0.97  # probability that a passenger shows up (1 - 0.03)

# Calculate the cumulative probability that 450 or fewer passengers show up
probability_seats_available = stats.binom.cdf(450, n, p)

# Calculate the probability that more than 450 passengers show up
probability_no_seat = 1 - probability_seats_available

# Print the result
print(f"The probability that at least one passenger does not get a seat is {probability_no_seat:.4f}")

The probability that at least one passenger does not get a seat is 0.1155


In [5]:
# Define the parameters for selling 455 tickets
n_455 = 455  # number of tickets sold

# Calculate the cumulative probability that 450 or fewer passengers show up
probability_seats_available_455 = stats.binom.cdf(450, n_455, p)

# Calculate the probability that more than 450 passengers show up
probability_no_seat_455 = 1 - probability_seats_available_455

# Print the results
print(f"The probability that there are enough seats for all passengers when selling 455 tickets is {probability_seats_available_455:.4f}")
print(f"The probability that at least one passenger does not get a seat when selling 455 tickets is {probability_no_seat_455:.4f}")

The probability that there are enough seats for all passengers when selling 455 tickets is 0.9979
The probability that at least one passenger does not get a seat when selling 455 tickets is 0.0021


In [9]:
# Import necessary libraries
import scipy.stats as stats

# Define the parameters
seats = 450  # number of seats available
p = 0.97  # probability that a passenger shows up (1 - 0.03)
confidence_level = 0.99  # desired confidence level

# Initialize variables
max_tickets = 460  # start with a higher number of tickets

# Loop to find the maximum number of tickets that can be sold
while max_tickets > seats:
    probability_seats_available = stats.binom.cdf(seats, max_tickets, p)
    if probability_seats_available >= confidence_level:
        break
    max_tickets -= 1

# Print the result
print(f"The maximum number of tickets that can be sold while seating all passengers with at least {confidence_level*100}% probability is {max_tickets}")

The maximum number of tickets that can be sold while seating all passengers with at least 99.0% probability is 456
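The loop above counts down from 460, which assumes the answer is below that starting guess. A sketch of an alternative that needs no upper guess: start at one ticket per seat and count up while the 99% target still holds. It relies on the fact that selling more tickets can only lower the probability of seating everyone. It uses only the standard library, and `prob_enough_seats` is a hypothetical helper name.

```python
from math import comb

def prob_enough_seats(tickets: int, seats: int, p_show: float) -> float:
    """P(show-ups <= seats), with show-ups ~ Binomial(tickets, p_show)."""
    return sum(
        comb(tickets, k) * p_show**k * (1 - p_show) ** (tickets - k)
        for k in range(min(seats, tickets) + 1)
    )

seats, p_show, target = 450, 0.97, 0.99

# Start at one ticket per seat and add tickets while the target still holds.
# P(enough seats) never increases as tickets are added, so the loop stops at the maximum.
max_tickets = seats
while prob_enough_seats(max_tickets + 1, seats, p_show) >= target:
    max_tickets += 1

# Show the probabilities around the cut-off
for n in range(max_tickets - 1, max_tickets + 3):
    print(f"{n} tickets: P(enough seats) = {prob_enough_seats(n, seats, p_show):.4f}")
print(f"Maximum tickets with at least {target:.0%} probability of seating everyone: {max_tickets}")
```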


**Challenge 2**

#### Ironhack Call Center 

Suppose a customer service representative at a call center is handling customer complaints. Consider the following:
- The probability of successfully resolving a customer complaint on the first attempt is 0.3. 


What is the probability that the representative needs at least three attempts to successfully resolve a customer complaint?

In [None]:
""" Approach:
This problem can be modeled using the geometric distribution, 
which describes the number of trials needed for the first success in a series of Bernoulli trials.

The probability that the representative needs at least three attempts is the complement of the probability 
of resolving the complaint in the first two attempts """

# Define the parameters
p = 0.3  # probability of successfully resolving a complaint on the first attempt

# Calculate the probability of resolving the complaint in the first two attempts
probability_first_two_attempts = stats.geom.cdf(2, p)

# Calculate the probability of needing at least three attempts
probability_at_least_three_attempts = 1 - probability_first_two_attempts

# Print the result
print(f"The probability that the representative needs at least three attempts is {probability_at_least_three_attempts:.4f}")

The probability that the representative needs at least three attempts is 0.4900
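SciPy's `geom` counts trials starting at 1, so `stats.geom.cdf(2, p)` is the probability of succeeding on attempt 1 or 2. Needing at least three attempts therefore means the first two attempts both failed, which gives the closed form (1 - p)^2 = 0.7^2 = 0.49. The sketch below checks that with a seeded simulation using only the standard library. The helper `attempts_until_success`, the seed, and the trial count are illustrative choices.

```python
import random

p = 0.3  # probability of resolving a complaint on any single attempt

# Closed form: at least 3 attempts are needed exactly when the first 2 attempts fail
p_closed_form = (1 - p) ** 2

def attempts_until_success(p: float, rng: random.Random) -> int:
    """Simulate independent Bernoulli(p) attempts; return the index of the first success."""
    attempts = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        attempts += 1
    return attempts

rng = random.Random(42)  # fixed seed so the run is reproducible
trials = 100_000
p_simulated = sum(attempts_until_success(p, rng) >= 3 for _ in range(trials)) / trials

print(f"closed form: {p_closed_form:.4f}, simulated: {p_simulated:.4f}")
```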


**Challenge 3**

#### Ironhack Website

Consider a scenario involving traffic to the Ironhack website, where:
- the website receives on average 500 visits per hour.
- the website's server is designed to handle up to 550 visits per hour.


What is the probability of the website server being overwhelmed?

In [14]:
"""Approach:
This problem can be modeled using the Poisson distribution, 
which describes the number of events occurring within a fixed interval of time or space.

The probability that the server will be overwhelmed is the probability that the number of visits in an hour exceeds 550. """

# Define the parameters
lambda_ = 500  # average number of visits per hour
capacity = 550  # server capacity in visits per hour

# Calculate the probability that the server will be overwhelmed
probability_overwhelmed = 1 - stats.poisson.cdf(capacity, lambda_)

# Print the result
print(f"The probability that the server will be overwhelmed in a given hour is {probability_overwhelmed:.4f}")

The probability that the server will be overwhelmed in a given hour is 0.0129
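`stats.poisson.cdf` hides an awkward computation: 500^k / k! overflows a float long before k reaches 550. A minimal sketch of the same tail probability with only the standard library works in log space via `math.lgamma`. It truncates the sum at k = 1000, which is an assumption that is safe here because those terms are astronomically small for a rate of 500. For comparison, it also computes a normal approximation N(500, 500) with a continuity correction. The helper `poisson_log_pmf` is a hypothetical name.

```python
import math

lam = 500       # average visits per hour
capacity = 550  # visits per hour the server can handle

def poisson_log_pmf(k: int, lam: float) -> float:
    """log P(X = k) for X ~ Poisson(lam); log space avoids overflow in lam**k / k!."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

# Upper tail P(X > capacity); terms beyond k = 1000 are negligible for lam = 500
p_exact = sum(math.exp(poisson_log_pmf(k, lam)) for k in range(capacity + 1, 1001))

# Normal approximation with a continuity correction: P(Z > (550.5 - 500) / sqrt(500))
z = (capacity + 0.5 - lam) / math.sqrt(lam)
p_normal = 0.5 * math.erfc(z / math.sqrt(2))

print(f"exact Poisson tail: {p_exact:.4f}, normal approximation: {p_normal:.4f}")
```

The normal approximation lands close to the exact value but slightly below it, because the Poisson distribution is right-skewed.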


What is the probability of the server being overwhelmed at some point during a day (24 hours)?

In [22]:
"""Approach:
To find the probability that the server will be overwhelmed at some point during a 24-hour period, 
we apply the hourly probability computed above to each of the 24 hours, 
assuming that the number of visits in one hour is independent of the other hours.

We use the complement rule: calculate the probability that the server is not overwhelmed
in any of the 24 hours, then subtract this from 1."""

# Define the parameters
lambda_ = 500  # average number of visits per hour
capacity = 550  # server capacity in visits per hour
hours = 24  # number of hours in the period

# Calculate the probability that the server will be overwhelmed in a given hour
probability_overwhelmed_per_hour = 1 - stats.poisson.cdf(capacity, lambda_)

# Calculate the probability that the server will not be overwhelmed in any of the 24 hours
probability_not_overwhelmed_24_hours = (1 - probability_overwhelmed_per_hour) ** hours

# Calculate the probability that the server will be overwhelmed at some point during the 24-hour period
probability_overwhelmed_24_hours = 1 - probability_not_overwhelmed_24_hours

# Print the result
print(f"The probability that the server will be overwhelmed at some point during a 24-hour period is {probability_overwhelmed_24_hours:.4f}")


The probability that the server will be overwhelmed at some point during a 24-hour period is 0.2677
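The complement rule above treats the 24 hours as independent trials, each with the same small overload probability. The self-contained sketch below recomputes the hourly probability with the standard library, applies the complement rule, and checks it in two ways. The first is a seeded simulation of 100,000 days under the same independence assumption. The second is the union bound 24 × p_hour, which can only overestimate the daily probability. The seed and number of simulated days are arbitrary.

```python
import math
import random

lam, capacity, hours = 500, 550, 24

# Per-hour overload probability P(X > 550) for X ~ Poisson(500), summed in log space;
# terms above k = 1000 are negligible at this rate
p_hour = sum(
    math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
    for k in range(capacity + 1, 1001)
)

# Complement rule, assuming the 24 hours are independent of each other
p_day = 1 - (1 - p_hour) ** hours

# Union bound: P(at least one overload) <= sum of the 24 hourly probabilities
union_bound = hours * p_hour

# Monte Carlo under the same independence assumption: each hour overloads with probability p_hour
rng = random.Random(0)
days = 100_000
p_day_simulated = sum(
    any(rng.random() < p_hour for _ in range(hours)) for _ in range(days)
) / days

print(f"complement rule: {p_day:.4f}, simulated: {p_day_simulated:.4f}, union bound: {union_bound:.4f}")
```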


**Challenge 4**

#### Ironhack Helpdesk

Consider a scenario related to the time between arrivals of customers at a service desk.

On average, a customer arrives every 10 minutes.

What is the probability that the next customer will arrive within the next 5 minutes?

In [28]:
"""Approach:
This problem can be modeled using the exponential distribution, which describes the time between events in a Poisson process.

The probability that the next customer arrives within a certain time can be calculated 
using the cumulative distribution function (CDF) of the exponential distribution."""

# Define the parameters
lambda_ = 1 / 10  # average rate of arrival (1 arrival every 10 minutes)
time = 5  # time interval in minutes

# Calculate the probability that the next customer arrives within 5 minutes
probability_within_5_minutes = stats.expon.cdf(time, scale=1/lambda_)

# Print the result
print(f"The probability that the next customer arrives within 5 minutes is {probability_within_5_minutes:.4f}")

The probability that the next customer arrives within 5 minutes is 0.3935
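The exponential CDF has the closed form P(T <= t) = 1 - e^(-λt). With λ = 1/10 per minute and t = 5, the answer is 1 - e^(-0.5). This sketch checks that value against a simulation of inter-arrival gaps drawn with the standard library's `random.expovariate`. The seed and sample size are arbitrary choices.

```python
import math
import random

rate = 1 / 10  # customers per minute (one every 10 minutes on average)
t = 5          # minutes

# Closed form of the exponential CDF: P(T <= t) = 1 - exp(-rate * t)
p_closed_form = 1 - math.exp(-rate * t)

# Simulation: draw exponential inter-arrival gaps and count the share that are at most 5 minutes
rng = random.Random(1)
gaps = [rng.expovariate(rate) for _ in range(100_000)]
p_simulated = sum(gap <= t for gap in gaps) / len(gaps)

print(f"closed form: {p_closed_form:.4f}, simulated: {p_simulated:.4f}")
```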


If no customer arrives for 15 minutes, employees can take a 5-minute break.

What is the probability that an employee gets to take a break?

In [None]:
"""Approach:
This problem can be modeled using the exponential distribution. 
We need to calculate the probability that no customer arrives within the next 15 minutes.

The exponential distribution's memoryless property means that the probability of no arrivals
in a specific time period is independent of the past. In this case, we want to calculate 
the probability of no customer arriving in the next 15 minutes.

The survival function (SF) of the exponential distribution gives this probability:
P(T > t) = e^(-λt), where λ is the rate parameter and t is the time interval considered."""

# Define the parameters
lambda_ = 1 / 10  # average rate of arrival (1 arrival every 10 minutes)
time_break = 15  # time interval for the break (15 minutes)

# Calculate the probability that no customer arrives within the next 15 minutes
probability_of_break = stats.expon.sf(time_break, scale=1/lambda_)

# Print the result
print(f"The probability that an employee can take a 5-minute break if no customer arrives for 15 minutes is {probability_of_break:.4f}")


The probability that an employee can take a 5-minute break if no customer arrives for 15 minutes is 0.2231
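The approach above relies on the memoryless property. This small standard-library sketch checks it numerically. The probability of 15 quiet minutes is the same whether or not the desk has already been quiet for 5 minutes, because P(T > 20 | T > 5) = P(T > 20) / P(T > 5) = e^(-2) / e^(-0.5) = e^(-1.5). The `survival` helper is a hypothetical name.

```python
import math

rate = 1 / 10  # arrivals per minute (mean gap of 10 minutes)

def survival(t: float, rate: float) -> float:
    """P(T > t) = exp(-rate * t) for an exponential waiting time T."""
    return math.exp(-rate * t)

p_break = survival(15, rate)

# Memorylessness: conditioning on 5 quiet minutes already elapsed leaves P(15 more) unchanged
p_break_after_5_quiet = survival(5 + 15, rate) / survival(5, rate)

print(f"P(T > 15) = {p_break:.4f}, P(T > 20 | T > 5) = {p_break_after_5_quiet:.4f}")
```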


**Challenge 5**

The weights of a certain species of birds follow a normal distribution with a mean weight of 150 grams and a standard deviation of 10 grams. 

- If we randomly select a bird, what is the probability that its weight is between 140 and 160 grams?

In [47]:
"""Approach:
This problem can be modeled using the normal distribution. 
We need to calculate the probability that a bird's weight falls within the range of 140 to 160 grams. 
This can be done using the cumulative distribution function (CDF) of the normal distribution."""

# Define the parameters
mean_weight = 150  # mean weight of the birds in grams
std_dev = 10  # standard deviation of the bird weights in grams

# Define the weight range
lower_bound = 140  # lower bound of the weight range in grams
upper_bound = 160  # upper bound of the weight range in grams

# Calculate the cumulative probability up to the lower bound
probability_lower_bound = stats.norm.cdf(lower_bound, mean_weight, std_dev)

# Calculate the cumulative probability up to the upper bound
probability_upper_bound = stats.norm.cdf(upper_bound, mean_weight, std_dev)

# Calculate the probability that a bird weighs between 140 and 160 grams
probability_between_bounds = probability_upper_bound - probability_lower_bound

# Print the result
print(f"The probability that a randomly selected bird weighs between 140 and 160 grams is {probability_between_bounds:.4f}")

The probability that a randomly selected bird weighs between 140 and 160 grams is 0.6827
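140 g and 160 g sit exactly one standard deviation below and above the 150 g mean, so this result is the familiar 68% from the 68-95-99.7 rule. The sketch below uses the standard library's `statistics.NormalDist` (Python 3.8+) in place of SciPy. It reproduces the number and extends it to two and three standard deviations.

```python
from statistics import NormalDist

weights = NormalDist(mu=150, sigma=10)  # bird weights in grams

p_between = weights.cdf(160) - weights.cdf(140)

# 140 g and 160 g are mean -/+ 1 standard deviation; widen the band to 2 and 3 as well
for k in (1, 2, 3):
    lo, hi = weights.mean - k * weights.stdev, weights.mean + k * weights.stdev
    print(f"P({lo:.0f} < X < {hi:.0f}) = {weights.cdf(hi) - weights.cdf(lo):.4f}")
```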


**Challenge 6**

If the lifetime (in hours) of a certain electronic component follows an exponential distribution with a mean lifetime of 50 hours, what is the probability that the component fails within the first 30 hours?

In [41]:
"""Approach:
This problem can be modeled using the exponential distribution. 
We need to calculate the probability that the component fails within the first 30 hours. 
This can be done using the cumulative distribution function (CDF) of the exponential distribution."""

# Define the parameters
mean_lifetime = 50  # mean lifetime of the component in hours
lambda_ = 1 / mean_lifetime  # rate parameter (inverse of the mean)
time_failure = 30  # time interval in hours

# Calculate the probability that the component fails within the first 30 hours
probability_failure_30_hours = stats.expon.cdf(time_failure, scale=1/lambda_)

# Print the result
print(f"The probability that the component fails within the first 30 hours is {probability_failure_30_hours:.4f}")

The probability that the component fails within the first 30 hours is 0.4512
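SciPy's `expon` is parameterized by `scale`, which equals the mean (1/λ). Here `scale=1/lambda_` is therefore simply `scale=50`. This sketch writes the CDF directly in terms of the mean and adds the median lifetime, mean × ln 2 ≈ 34.66 hours. The median explains why the answer is below 50%: 30 hours is shorter than the median.

```python
import math

mean_lifetime = 50  # hours; the exponential mean is 1 / rate
t = 30              # hours

# Exponential CDF written with the mean: P(T <= t) = 1 - exp(-t / mean)
p_fail_by_30 = 1 - math.exp(-t / mean_lifetime)

# Median lifetime: the t at which P(T <= t) = 0.5, i.e. mean * ln 2
median_lifetime = mean_lifetime * math.log(2)

print(f"P(T <= {t}) = {p_fail_by_30:.4f}; median lifetime = {median_lifetime:.2f} h")
```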


The lifetime of an electronic component follows an exponential distribution with a mean of 50 hours. What is the probability that the component does not fail within 50 hours?

In [48]:
"""Approach:
This problem can be modeled using the exponential distribution. 
We need to calculate the probability that the component does not fail within 50 hours. 

This can be done using the survival function (SF) of the exponential distribution, 
which gives the probability that the lifetime of the component is greater than a certain value."""

# Define the parameters
mean_lifetime = 50  # mean lifetime of the component in hours
lambda_ = 1 / mean_lifetime  # rate parameter (inverse of the mean)
time_interval = 50  # time interval in hours

# Calculate the probability that the component does not fail within 50 hours
probability_not_fail_50_hours = stats.expon.sf(time_interval, scale=1/lambda_)

# Print the result
print(f"The probability that the component does not fail within 50 hours is {probability_not_fail_50_hours:.4f}")

The probability that the component does not fail within 50 hours is 0.3679
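The value 0.3679 is e^(-1). For any exponential lifetime, P(T > mean) = e^(-mean/mean) = e^(-1), so about 37% of components outlive the mean regardless of its size. A short standard-library sketch confirms this for a few illustrative means; the `survival` helper is a hypothetical name.

```python
import math

def survival(t: float, mean: float) -> float:
    """P(T > t) = exp(-t / mean) for an exponential lifetime with the given mean."""
    return math.exp(-t / mean)

# Surviving past the mean lifetime has probability exp(-1) whatever the mean is
for mean in (10, 50, 1000):
    print(f"mean = {mean:>4} h: P(T > mean) = {survival(mean, mean):.4f}")
```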
