# Probability
- Probability is a value between zero and one, inclusive, describing the relative possibility (chance or likelihood) that an event will occur

1. An **experiment** is a process that leads to the occurrence of one, and only one, of several possible observations
2. An **outcome** is the particular result of an experiment
3. An **event** is the collection of one or more outcomes of an experiment

## Classical Probability
1. Classical Probability
   - Based on the assumption that the outcomes of an experiment are equally likely
   - Probability of an event = Number of favorable outcomes / Total number of possible outcomes
   - ex: rolling a die => Probability of getting an even number is 3/6 = 1/2
2. Mutually Exclusive
   - if the occurrence of any one event means that none of the others can occur at the same time
   - ex: can't get both a head and tail on one coin toss
3. Independent Events
   - if the occurrence of one event does not affect the occurrence of another
   - ex: tossing a coin twice; the result of the first toss does not affect the second
4. Collectively Exhaustive Events
   - if at least one of the events must occur when an experiment is conducted
   - ex: in a horse race, there will be one winner
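
The definitions above can be checked on the die example with a short sketch. The helper name `classical_probability` and the event sets `even` and `odd` are illustrative choices, not part of the original notes, and the die is assumed to be fair.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (outcomes assumed equally likely)
sample_space = {1, 2, 3, 4, 5, 6}

def classical_probability(event, space):
    """Number of favorable outcomes divided by the total number of possible outcomes."""
    return Fraction(len(event & space), len(space))

even = {2, 4, 6}
odd = {1, 3, 5}

# Classical probability of rolling an even number: 3/6 = 1/2
p_even = classical_probability(even, sample_space)

# Mutually exclusive: the two events share no outcome
mutually_exclusive = even.isdisjoint(odd)

# Collectively exhaustive: together the events cover the whole sample space
collectively_exhaustive = (even | odd) == sample_space

print(p_even, mutually_exclusive, collectively_exhaustive)
```

Using `Fraction` keeps the result exact, so `3/6` reduces to `1/2` without floating-point rounding.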

## Empirical Probability
1. Empirical Probability
   - the probability of an event happening is the fraction of the time similar events happened in the past
   - The empirical approach to probability is based on what is called the Law of Large Numbers
2. Law of Large Numbers
   - Over a large number of trials, the empirical probability of an event will approach its true probability
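
A rough way to see the Law of Large Numbers is to simulate a fair coin and watch the fraction of heads settle near 0.5 as the number of trials grows. This is only a sketch: the function name `empirical_probability`, the fixed seed, and the trial counts are assumptions made for illustration.

```python
import random

def empirical_probability(n_trials, seed=0):
    """Fraction of heads observed in n_trials simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

# The estimate tends to move closer to the true probability (0.5) as n grows
for n in (10, 100, 10_000, 100_000):
    print(n, empirical_probability(n))
```

Small samples can land far from 0.5; only the long-run fraction is expected to be close.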

# Counting

In [1]:
import pandas as pd

In [2]:
data_path = 'data/hotel_bookings.csv'
df = pd.read_csv(data_path)
df.sample(3)

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
101344,City Hotel,0,35,2016,November,46,7,1,3,1,...,No Deposit,9.0,,0,Transient,174.25,0,1,Check-Out,2016-11-11
94698,City Hotel,0,2,2016,August,33,7,1,0,2,...,No Deposit,9.0,,0,Transient,98.0,0,2,Check-Out,2016-08-08
54277,City Hotel,1,75,2016,July,29,11,1,1,1,...,No Deposit,9.0,,0,Transient,123.3,0,0,Canceled,2016-05-09


## Rule of Addition
1. If two events A and B are mutually exclusive, the probability of one or the other event occurring equals the sum of their probabilities.
    - $P(A \cup B) = P(A) + P(B)$
2. The General Rule of Addition
    - If A and B are two events that are not mutually exclusive, then
    - $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
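
Both addition rules can be verified by counting outcomes on a fair die. The events below (`A` = even, `B` = greater than 3, `C` and `D` = single faces) and the helper `prob` are hypothetical examples chosen for this sketch.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Classical probability of an event within sample_space."""
    return Fraction(len(event), len(sample_space))

A = {2, 4, 6}  # even number
B = {4, 5, 6}  # number greater than 3 (overlaps A on 4 and 6)

# General rule: subtract the overlap so it is not counted twice
general_lhs = prob(A | B)
general_rhs = prob(A) + prob(B) - prob(A & B)

C = {1}  # rolling a 1
D = {2}  # rolling a 2 (cannot happen together with C)

# Special rule: with no overlap, the probabilities simply add
special_lhs = prob(C | D)
special_rhs = prob(C) + prob(D)

print(general_lhs, general_rhs, special_lhs, special_rhs)
```

The special rule is the general rule with $P(A \cap B) = 0$.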

In [5]:
reservation_fdt = df.reservation_status.value_counts()
reservation_fdt

reservation_status
Check-Out    75166
Canceled     43017
No-Show       1207
Name: count, dtype: int64

In [11]:
len_df = len(df)

In [12]:
P_checkout = reservation_fdt['Check-Out']/len_df
P_canceled = reservation_fdt['Canceled']/len_df
P_checkout, P_canceled

(0.6295837172292487, 0.3603065583382193)

In [13]:
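# Check-Out and Canceled are mutually exclusive, so P(Check-Out or Canceled) is the sum
P_show = P_checkout + P_canceled
P_noshow = reservation_fdt['No-Show']/len_df
# The three statuses are collectively exhaustive, so the total should be 1 (up to rounding)
P_show, P_show+P_noshow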

(0.9898902755674679, 0.9999999999999999)

## Rule of Multiplication
1. Two events A and B are independent if the occurrence of one has no effect on the probability of the occurrence of the other
   - $P(A \cap B) = P(A)P(B)$
2. Conditional Probability
   - is the probability of a particular event occurring, given that another event has occurred
   - The probability of the event A occurring given that the event B has occurred is written: $P(A|B)$
3. Joint Probability
   - is used to find the probability that two events that are not independent will both occur
   - $P(A \cap B) = P(A)P(B|A)$
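
The two multiplication rules can be compared on small examples by enumerating every outcome. The coin and card scenarios below are illustrative assumptions, not taken from the hotel data: two tosses of a fair coin (independent) and two cards drawn without replacement from a 52-card deck (not independent).

```python
from fractions import Fraction
from itertools import permutations, product

# Independent events: two tosses of a fair coin
tosses = list(product("HT", repeat=2))  # HH, HT, TH, TT
p_first_head = Fraction(sum(t[0] == "H" for t in tosses), len(tosses))
p_second_head = Fraction(sum(t[1] == "H" for t in tosses), len(tosses))
p_both_heads = Fraction(sum(t == ("H", "H") for t in tosses), len(tosses))

# P(A and B) = P(A) P(B) holds because the tosses are independent
assert p_both_heads == p_first_head * p_second_head

# Dependent events: draw two cards without replacement, A = "first is an ace",
# B = "second is an ace"; "x" stands for any non-ace card
deck = ["A"] * 4 + ["x"] * 48
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace is already gone

# General rule: P(A and B) = P(A) P(B|A)
p_two_aces = p_first_ace * p_second_ace_given_first

# Cross-check by counting all ordered pairs of distinct cards
pairs = list(permutations(deck, 2))
p_two_aces_counted = Fraction(sum(pair == ("A", "A") for pair in pairs), len(pairs))

print(p_both_heads, p_two_aces, p_two_aces_counted)
```

For the cards, multiplying the unconditional probabilities (4/52 times 4/52) would overstate the joint probability, which is why the conditional term is needed.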

In [10]:
hotel_fdt = df.hotel.value_counts()
hotel_fdt

hotel
City Hotel      79330
Resort Hotel    40060
Name: count, dtype: int64

In [19]:
P_checkout_and_in_city = len(df[(df.hotel=='City Hotel') & (df.reservation_status == 'Check-Out')])/len_df
P_checkout_and_in_city

0.387201608174889

In [22]:
# Probability that a booking is at the City Hotel
P_in_city_hotel = hotel_fdt['City Hotel']/len_df
P_in_city_hotel

0.6644610101348521

In [23]:
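# Probability that a customer checks out, given that the booking is at the City Hotel:
# P(Check-Out | City Hotel) = P(Check-Out and City Hotel) / P(City Hotel)
P_checkout_ifin_city = P_checkout_and_in_city/P_in_city_hotel
P_checkout_ifin_city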

0.5827303668221354
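
As a follow-up sketch, the printed values can be used to check whether "Check-Out" and "City Hotel" behave like independent events. The numbers are copied by hand from the outputs above so the block runs without the CSV; the variable `independent` and the tolerance `rel_tol=1e-3` are assumptions made for this illustration.

```python
import math

# Values copied from the notebook outputs above
P_checkout = 0.6295837172292487
P_in_city_hotel = 0.6644610101348521
P_checkout_and_in_city = 0.387201608174889

# Conditional probability: P(Check-Out | City Hotel)
P_checkout_ifin_city = P_checkout_and_in_city / P_in_city_hotel

# General rule of multiplication recovers the joint probability
assert math.isclose(P_in_city_hotel * P_checkout_ifin_city, P_checkout_and_in_city)

# Independence would require P(A and B) to equal P(A) * P(B)
P_if_independent = P_checkout * P_in_city_hotel
independent = math.isclose(P_if_independent, P_checkout_and_in_city, rel_tol=1e-3)

print(P_if_independent, P_checkout_and_in_city, independent)
```

The conditional probability for the City Hotel (about 0.583) is lower than the overall check-out probability (about 0.630), which suggests that in this dataset the two events are not independent.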