## Probability 

## Q-1

1. Roll a die 1000 times using numpy.random.randint()
- Calculate the probability of each outcome from the frequency distribution

2. Toss a coin 500 times.
- Estimate the probability of getting heads and tails

3. Use collections.Counter or value_counts() to get the experimental probability

In [10]:
import numpy as np
from collections import Counter

# Dice: randint's upper bound is exclusive, so (1, 7) yields faces 1 to 6
dice = np.random.randint(1,7, 1000)
counts = Counter(dice)
p_of_dice = {no:times/1000 for no,times in counts.items()}
print("Probability of each face in 1000 dice rolls:\n", p_of_dice)

# Coin
coin = np.random.choice(['H','T'], 500)
counts = Counter(coin)
p_of_coin = {side:times/500 for side,times in counts.items()}
print("\nProbability of heads and tails in 500 coin flips:\n", p_of_coin)


Probability of each face in 1000 dice rolls:
 {np.int32(6): 0.159, np.int32(2): 0.157, np.int32(5): 0.17, np.int32(4): 0.193, np.int32(1): 0.159, np.int32(3): 0.162}

Probability of heads and tails in 500 coin flips:
 {np.str_('T'): 0.46, np.str_('H'): 0.54}
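
The task list also mentions value_counts(), which the cell above does not use. As a minimal sketch of that alternative, assuming pandas is installed, the following computes the same experimental probabilities. The seed and the names `rng`, `rolls`, and `max_deviation` are illustrative assumptions, not part of the original cell.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)

# Simulate 1000 rolls of a fair six-sided die (upper bound is exclusive)
rolls = pd.Series(rng.integers(1, 7, size=1000))

# normalize=True returns relative frequencies instead of raw counts
p_of_dice = rolls.value_counts(normalize=True).sort_index()
print(p_of_dice)

# Compare against the theoretical probability of 1/6 for each face
max_deviation = (p_of_dice - 1 / 6).abs().max()
print("Largest deviation from 1/6:", max_deviation)
```

Sorting by index lists the faces in order from 1 to 6, which makes the result easier to compare with the theoretical value than the unordered Counter output.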


## Q-2

1. Simulate two datasets with different means
2. Perform a t-test using scipy.stats.ttest_ind() to get the p-value
3. Interpret the p-value

In [None]:
from scipy import stats

# Simulate two datasets with slightly different means
data1 = np.random.normal(loc=50, scale=10, size=100)
data2 = np.random.normal(loc=52, scale=10, size=100)

# Perform independent t-test
t_stat, p_value = stats.ttest_ind(data1, data2)
print("t-statistic:", t_stat)
print("p-value:", p_value)

# Markdown-style interpretation
if p_value < 0.05:
    print("**Interpretation:** p-value is less than 0.05, so we reject the null hypothesis.")
else:
    print("**Interpretation:** p-value is greater than 0.05, so we fail to reject the null hypothesis.")


t-statistic: -1.0385192635098568
p-value: 0.30029492556610876
**Interpretation:** p-value is greater than 0.05, so we fail to reject the null hypothesis.
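
To make clearer what ttest_ind() reports, here is a rough sketch that recomputes the pooled-variance (equal-variance) t-statistic and two-sided p-value by hand and compares them with SciPy's result. It assumes the default `equal_var=True` behaviour; the seed and the names `pooled_var`, `t_manual`, and `p_manual` are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Seeded generator so the comparison is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=42)
data1 = rng.normal(loc=50, scale=10, size=100)
data2 = rng.normal(loc=52, scale=10, size=100)

n1, n2 = len(data1), len(data2)
var1, var2 = data1.var(ddof=1), data2.var(ddof=1)  # sample variances

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Standard error of the difference between the two sample means
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = (data1.mean() - data2.mean()) / se
df = n1 + n2 - 2

# Two-sided p-value from the survival function of the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df)

t_scipy, p_scipy = stats.ttest_ind(data1, data2)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

With a true difference of only 2 against a standard deviation of 10, a sample of 100 per group will often fail to reach p < 0.05, which is consistent with the output shown above.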


## Q-3

1. Create a sample dataset with scores before and after a training program
2. Use a paired t-test or an independent t-test to check for improvement
3. Calculate a confidence interval for the mean using scipy.stats.sem() and stats.t.interval()
4. Set a significance level (alpha = 0.05) and determine whether the null hypothesis should be rejected

In [37]:
from scipy import stats
import numpy as np

# Simulate scores before and after training
before = np.random.normal(loc=60, scale=8, size=30)
after = before + np.random.normal(loc=5, scale=5, size=30)  # improvement

# Paired t-test
t_stat, p_value = stats.ttest_rel(before, after)
print("Paired t-test p-value:", p_value)

# Confidence interval for mean difference
mean_diff = np.mean(after - before)
sem_diff = stats.sem(after - before)
confidence_interval = stats.t.interval(0.95, len(before)-1, loc=mean_diff, scale=sem_diff)
print("Confidence interval for mean difference:", confidence_interval)

# Decision based on α = 0.05
if p_value < 0.05:
    print("**Conclusion:** Reject the null hypothesis. Training likely had a significant effect.")
else:
    print("**Conclusion:** Fail to reject the null hypothesis. No significant improvement detected.")

Paired t-test p-value: 4.919703888938251e-06
Confidence interval for mean difference: (np.float64(3.303106763447601), np.float64(7.113794711309729))
**Conclusion:** Reject the null hypothesis. Training likely had a significant effect.
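
As a small sketch of why the paired test and the confidence interval agree, the following shows that a paired t-test is the same as a one-sample t-test on the per-person differences, and that the 95% interval excludes zero exactly when the two-sided p-value is below 0.05. The seed and the names `diff` and `ci_excludes_zero` are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Seeded generator so the run is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=1)
before = rng.normal(loc=60, scale=8, size=30)
after = before + rng.normal(loc=5, scale=5, size=30)  # simulated improvement

diff = after - before

# Paired t-test on the two score columns
t_rel, p_rel = stats.ttest_rel(after, before)

# One-sample t-test of the differences against a mean of zero
t_one, p_one = stats.ttest_1samp(diff, popmean=0)
print("paired:    ", t_rel, p_rel)
print("one-sample:", t_one, p_one)

# 95% confidence interval for the mean difference
ci_low, ci_high = stats.t.interval(
    0.95, len(diff) - 1, loc=diff.mean(), scale=stats.sem(diff)
)
ci_excludes_zero = bool(ci_low > 0 or ci_high < 0)
print("CI:", (ci_low, ci_high), "excludes zero:", ci_excludes_zero)
```

Passing `after` first makes a positive t-statistic correspond to an improvement. Swapping the argument order flips the sign of the statistic but leaves the two-sided p-value unchanged.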


## Q-4

1. Type 1 Error (False Positive):
    - Rejecting the null hypothesis when it is actually true.  
    - Example: Concluding a drug works when it actually doesn't.
2. Type 2 Error (False Negative):
    - Failing to reject the null hypothesis when it is actually false.  
    - Example: Concluding a drug doesn't work when it actually does.
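
Both error types can be estimated by simulation. The sketch below repeats an independent t-test many times, once with the null hypothesis true and once with it false, and counts how often each error occurs. The sample size, the effect size (means of 50 vs 55), the number of repetitions, and the seed are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Seeded generator so the run is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=7)
alpha = 0.05
n_sims, n = 2000, 30  # number of simulated experiments, sample size per group

# Case 1: the null hypothesis is true (both groups share the same mean)
a = rng.normal(50, 10, size=(n_sims, n))
b = rng.normal(50, 10, size=(n_sims, n))
_, p_null = stats.ttest_ind(a, b, axis=1)  # one test per row

# Type 1 error: rejecting a true null hypothesis
type1_rate = np.mean(p_null < alpha)

# Case 2: the null hypothesis is false (the second group's mean is shifted)
c = rng.normal(55, 10, size=(n_sims, n))
_, p_alt = stats.ttest_ind(a, c, axis=1)

# Type 2 error: failing to reject a false null hypothesis
type2_rate = np.mean(p_alt >= alpha)

print("Estimated Type 1 error rate:", type1_rate)
print("Estimated Type 2 error rate:", type2_rate)
```

The Type 1 rate should land close to alpha, because alpha is by definition the probability of rejecting a true null hypothesis. The Type 2 rate depends on the effect size and sample size, so it shrinks as either one grows.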
