# Before you start:

    Read the README.md file
    Comment as much as you can and use the resources (README.md file)
    Happy learning!

In this exercise, we will generate random numbers from the continuous distributions we learned in the lesson. There are two ways to generate random numbers:

1. Using the numpy library
1. Using the scipy library

Use either or both of the libraries in this exercise.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import math
import seaborn as sns

## Uniform Distribution

To generate uniform random numbers between any two given values, we can use either the following numpy code or the code that we discussed in class:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Generating random numbers from a uniform distribution using numpy
a, b = 0, 10  # Lower and upper bounds
uniform_random_numbers = np.random.uniform(a, b, 1000)  # Generate 1000 random numbers

# Plotting the distribution
sns.histplot(uniform_random_numbers, bins=20, kde=False, color='blue', edgecolor='black')
plt.title('Uniform Distribution (a=0, b=10)')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
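
For comparison, here is a minimal sketch of the same draw using scipy. It assumes `scipy.stats.uniform`, which is parameterized by `loc` (the lower bound) and `scale` (the width of the interval) rather than by the two bounds. The variable name `scipy_uniform_numbers` and the `random_state=42` seed are illustrative choices, not part of the lesson code.

```python
from scipy.stats import uniform

a, b = 0, 10  # Lower and upper bounds, matching the numpy cell

# scipy describes the interval [a, b] as loc=a (start) and scale=b - a (width)
scipy_uniform_numbers = uniform.rvs(loc=a, scale=b - a, size=1000, random_state=42)

# All values should fall between a and b
print(scipy_uniform_numbers.min(), scipy_uniform_numbers.max())
```

Passing `loc=a, scale=b` would be a common mistake here: it would draw from the interval `[a, a + b]` instead of `[a, b]`.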


**Your task:**

1. Based on the code above, write a function that generates uniformly distributed random numbers. There are several requirements for your function:
    * It should accept 3 parameters: 
        * `bottom` - the lower boundary of the generated numbers
        * `ceiling` - the upper boundary of the generated numbers
        * `count` - how many numbers to generate
    * It should return an array of uniformly distributed random numbers

1. Call your function with 2 sets of params below:
    * bottom=10, ceiling=15, count=100
    * bottom=10, ceiling=60, count=1,000

1. Plot the uniform distributions generated above using histograms, where the x-axis is the value and the y-axis is the count. Use 10 bins for each histogram.

Your output should look like the image below:

![uniform distribution](ud.png)

In [None]:
import numpy as np

def generate_uniform_random_numbers(bottom, ceiling, count):
    """
    Generate uniformly distributed random numbers.
    
    Parameters:
        bottom (float): Lower boundary of the range.
        ceiling (float): Upper boundary of the range.
        count (int): Number of random numbers to generate.
    
    Returns:
        numpy.ndarray: Array of uniformly distributed random numbers.
    """
    return np.random.uniform(bottom, ceiling, count)


In [None]:
# Generate random numbers for two parameter sets
data1 = generate_uniform_random_numbers(bottom=10, ceiling=15, count=100)
data2 = generate_uniform_random_numbers(bottom=10, ceiling=60, count=1000)


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Plot the distributions
plt.figure(figsize=(12, 5))

# Plot for first parameter set
plt.subplot(1, 2, 1)
sns.histplot(data1, bins=10, kde=False, color='blue', edgecolor='black')
plt.title('Uniform Distribution (10, 15, 100)')
plt.xlabel('Value')
plt.ylabel('Frequency')

# Plot for second parameter set
plt.subplot(1, 2, 2)
sns.histplot(data2, bins=10, kde=False, color='green', edgecolor='black')
plt.title('Uniform Distribution (10, 60, 1000)')
plt.xlabel('Value')
plt.ylabel('Frequency')

# Show plots
plt.tight_layout()
plt.show()


How are the two distributions different?

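Both histograms use 10 bins, but the first sample (bottom=10, ceiling=15, count=100) covers a range of only 5, so each bin is 0.5 wide and holds about 10 values on average. The second sample (bottom=10, ceiling=60, count=1000) covers a range of 50, so each bin is 5 wide and holds about 100 values on average. Because it has ten times as many points, the second histogram has taller bars and looks flatter, while random fluctuations between bins are more visible in the first.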
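
One way to put numbers on this comparison is to set each sample's statistics next to the theoretical values for a uniform distribution on `[bottom, ceiling]`: mean `(bottom + ceiling) / 2` and standard deviation `(ceiling - bottom) / sqrt(12)`. The sketch below is only an illustration. The helper `summarize_uniform` is a hypothetical name, and it uses a seeded `np.random.default_rng` generator (an assumption, chosen for repeatability) instead of `np.random.uniform`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # Seeded so that repeated runs give the same numbers

def summarize_uniform(bottom, ceiling, count):
    """Compare sample statistics of a uniform draw with their theoretical values."""
    sample = rng.uniform(bottom, ceiling, count)
    return {
        "sample_mean": sample.mean(),
        "theoretical_mean": (bottom + ceiling) / 2,
        "sample_std": sample.std(ddof=1),
        "theoretical_std": (ceiling - bottom) / np.sqrt(12),
        "expected_count_per_bin": count / 10,  # With 10 equal-width bins
    }

for params in [(10, 15, 100), (10, 60, 1000)]:
    print(params, summarize_uniform(*params))
```

The larger sample should usually land closer to its theoretical values in relative terms, which is the same effect that makes its histogram look flatter.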

## Normal Distribution

1. As in the Uniform Distribution challenge, write a function that generates normally distributed random numbers.
1. Generate 1,000 normally distributed numbers with a mean of 10 and a standard deviation of 1.
1. Generate 1,000 normally distributed numbers with a mean of 10 and a standard deviation of 50.
1. Plot the distributions of the data generated.

Expected output:

![normal distribution](nd.png)

In [None]:
from scipy.stats import norm 

In [None]:
import numpy as np

def generate_normal_random_numbers(mean, std_dev, count):
    """
    Generate normally distributed random numbers.
    
    Parameters:
        mean (float): Mean of the normal distribution.
        std_dev (float): Standard deviation of the normal distribution.
        count (int): Number of random numbers to generate.
    
    Returns:
        numpy.ndarray: Array of normally distributed random numbers.
    """
    return np.random.normal(mean, std_dev, count)


In [None]:
# Generate 1,000 random numbers with different parameters
data1 = generate_normal_random_numbers(mean=10, std_dev=1, count=1000)
data2 = generate_normal_random_numbers(mean=10, std_dev=50, count=1000)


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Plot the distributions
plt.figure(figsize=(12, 5))

# Plot for the first dataset
plt.subplot(1, 2, 1)
sns.histplot(data1, bins=20, kde=True, color='blue', edgecolor='black')
plt.title('Normal Distribution (Mean=10, StdDev=1)')
plt.xlabel('Value')
plt.ylabel('Frequency')

# Plot for the second dataset
plt.subplot(1, 2, 2)
sns.histplot(data2, bins=20, kde=True, color='green', edgecolor='black')
plt.title('Normal Distribution (Mean=10, StdDev=50)')
plt.xlabel('Value')
plt.ylabel('Frequency')

# Show the plots
plt.tight_layout()
plt.show()


How are the two distributions different?
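
One way to approach this question is to compare the spread of the two samples numerically. Both share the same center (10), but about 95% of a normal distribution lies within 1.96 standard deviations of the mean, so the first sample should sit roughly between 8 and 12 and the second roughly between -88 and 108. The sketch below assumes a seeded `np.random.default_rng` generator for repeatability. The names `narrow` and `wide` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # Seeded so that repeated runs give the same numbers

narrow = rng.normal(loc=10, scale=1, size=1000)
wide = rng.normal(loc=10, scale=50, size=1000)

for name, sample in [("std_dev=1", narrow), ("std_dev=50", wide)]:
    # Empirical interval containing the middle 95% of the sample
    low, high = np.percentile(sample, [2.5, 97.5])
    print(f"{name}: mean={sample.mean():.2f}, std={sample.std(ddof=1):.2f}, "
          f"middle 95% from {low:.1f} to {high:.1f}")
```

The two histograms have the same bell shape. The visible difference is the scale of the x-axis, which is about 50 times wider for the second sample.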

## Normal Distribution of Real Data

In this challenge, we are going to take a look at real data. We will use the vehicles.csv file for this exercise.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset (adjust the path to wherever vehicles.csv is stored on your machine)
data = pd.read_csv(r'C:\Users\harid\Downloads\vehicles.csv')

First import vehicles.csv.
Then plot the histograms for the following variables:

1. Fuel Barrels/Year

In [None]:
# Plot histogram
sns.histplot(data['Fuel Barrels/Year'], bins=20, kde=True, color='blue', edgecolor='black')
plt.title('Fuel Barrels/Year Distribution')
plt.xlabel('Fuel Barrels/Year')
plt.ylabel('Frequency')
plt.show()


2. CO2 Emission Grams/Mile 

In [None]:
# Plot histogram
sns.histplot(data['CO2 Emission Grams/Mile'], bins=20, kde=True, color='green', edgecolor='black')
plt.title('CO2 Emission Grams/Mile Distribution')
plt.xlabel('CO2 Emission Grams/Mile')
plt.ylabel('Frequency')
plt.show()


3. Combined MPG

In [None]:
# Plot histogram
sns.histplot(data['Combined MPG'], bins=20, kde=True, color='red', edgecolor='black')
plt.title('Combined MPG Distribution')
plt.xlabel('Combined MPG')
plt.ylabel('Frequency')
plt.show()


Which one(s) of the variables are nearly normally distributed? How do you know?

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

# Variables to check
variables = ['Fuel Barrels/Year', 'CO2 Emission Grams/Mile', 'Combined MPG']

# Plot histograms with KDE
for var in variables:
    sns.histplot(data[var], bins=20, kde=True, color='blue', edgecolor='black')
    plt.title(f'{var} Distribution')
    plt.xlabel(var)
    plt.ylabel('Frequency')
    plt.show()


In [None]:
# Anderson-Darling Test for normality
for var in variables:
    result = stats.anderson(data[var], dist='norm')
    print(f'{var}: Test Statistic={result.statistic}')
    for i, cv in enumerate(result.critical_values):
        if result.statistic < cv:
            print(f'{var} is nearly normally distributed (significance level {result.significance_level[i]}%)')
        else:
            print(f'{var} is NOT normally distributed (significance level {result.significance_level[i]}%)')


None of them are normally distributed.
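
With large samples, formal normality tests tend to reject even small departures from normality, so it can help to also look at how far from normal a variable is. A rough sketch of one such check uses skewness and excess kurtosis, both of which are 0 for a normal distribution. The two samples below are hypothetical stand-ins generated for illustration, since vehicles.csv is not bundled here. In the notebook they would be replaced by columns such as `data['Combined MPG'].dropna()`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)  # Seeded so that repeated runs give the same numbers

# Hypothetical stand-ins for real columns: one symmetric, one right-skewed
samples = {
    "symmetric (normal)": rng.normal(loc=20, scale=5, size=5000),
    "right-skewed (lognormal)": rng.lognormal(mean=3, sigma=0.5, size=5000),
}

for name, values in samples.items():
    skewness = stats.skew(values)
    excess_kurtosis = stats.kurtosis(values)  # Fisher definition: 0 for a normal distribution
    print(f"{name}: skewness={skewness:.2f}, excess kurtosis={excess_kurtosis:.2f}")
```

A clearly positive skewness matches a histogram with a long right tail, which is one visual cue that a variable is not close to normal.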

## Exponential Distribution

1. Using `numpy.random.exponential`, create a function that returns exponentially distributed numbers with a mean of 10.

1. Use the function to generate two number sequences with sizes of 10 and 100.

1. Plot the distributions as histograms with the number of bins set to 100.

Your output should look like the image below:

![exponential distribution](ed.png)

In [None]:
import numpy as np

def generate_exponential_random_numbers(mean, size):
    """
    Generate a list of numbers exponentially distributed with a given mean.
    
    Parameters:
        mean (float): Mean of the exponential distribution.
        size (int): Number of random numbers to generate.
    
    Returns:
        numpy.ndarray: Array of exponentially distributed random numbers.
    """
    return np.random.exponential(scale=mean, size=size)


In [None]:
# Generate two sequences with mean=10 and sizes 10 and 100
data1 = generate_exponential_random_numbers(mean=10, size=10)
data2 = generate_exponential_random_numbers(mean=10, size=100)


In [None]:
import matplotlib.pyplot as plt

# Plot the distributions
plt.figure(figsize=(12, 5))

# Plot for the first sequence
plt.subplot(1, 2, 1)
plt.hist(data1, bins=100, color='blue', edgecolor='black')
plt.title('Exponential Distribution (Size=10, Mean=10)')
plt.xlabel('Value')
plt.ylabel('Frequency')

# Plot for the second sequence
plt.subplot(1, 2, 2)
plt.hist(data2, bins=100, color='green', edgecolor='black')
plt.title('Exponential Distribution (Size=100, Mean=10)')
plt.xlabel('Value')
plt.ylabel('Frequency')

# Show the plots
plt.tight_layout()
plt.show()


How are the two distributions different?

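Both sequences are drawn from the same exponential distribution with a mean of 10; only the sample size differs. With 10 values spread over 100 bins, the histogram is mostly empty bins with a few isolated bars, so no shape is visible. With 100 values, the right-skewed shape starts to emerge, with most values near zero and a long tail to the right, although it is still noisy.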
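
A quick numerical check can make the effect of sample size concrete. For an exponential distribution the standard deviation equals the mean, so the sample mean of `size` values has a standard error of about `mean / sqrt(size)`. The sketch below assumes a seeded `np.random.default_rng` generator and adds a third, larger size (10,000) purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # Seeded so that repeated runs give the same numbers

mean = 10
for size in (10, 100, 10_000):
    sample = rng.exponential(scale=mean, size=size)
    # Standard deviation equals the mean for an exponential distribution,
    # so the standard error of the sample mean is mean / sqrt(size)
    standard_error = mean / np.sqrt(size)
    print(f"size={size}: sample mean={sample.mean():.2f}, "
          f"expected error about +/-{standard_error:.2f}")
```

The underlying distribution is identical in every case. Small samples simply give a much rougher picture of it.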

## Exponential Distribution of Real Data

Suppose that the amount of time one spends in a bank is exponentially distributed with a mean of 10 minutes (i.e. λ = 1/10). What is the probability that a customer will spend less than fifteen minutes in the bank?

Write Python code to solve this problem.

In [None]:
import math

# Parameters
mean = 10  # Mean of the distribution
rate = 1 / mean  # Lambda (rate parameter)
x = 15  # Time in minutes

# Probability P(X < 15)
prob_less_than_15 = 1 - math.exp(-rate * x)
prob_less_than_15


What is the probability that the customer will spend more than 15 minutes?

In [None]:
# Probability P(X > 15)
prob_more_than_15 = math.exp(-rate * x)
prob_more_than_15
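
As a cross-check, the same two probabilities can be sketched with scipy. This assumes `scipy.stats.expon`, whose `scale` parameter is the mean (1/λ), not the rate. `cdf` gives P(X < x) and `sf` (the survival function) gives P(X > x). The names `p_less` and `p_more` are illustrative.

```python
import math
from scipy.stats import expon

mean = 10  # Minutes; scipy's scale parameter is the mean, i.e. 1 / lambda
x = 15     # Time in minutes

p_less = expon.cdf(x, scale=mean)  # P(X < 15)
p_more = expon.sf(x, scale=mean)   # P(X > 15)

# Both values should agree with the closed-form expressions 1 - e^(-x/mean) and e^(-x/mean)
assert math.isclose(p_less, 1 - math.exp(-x / mean))
assert math.isclose(p_more, math.exp(-x / mean))

print(f"P(X < 15) = {p_less:.4f}, P(X > 15) = {p_more:.4f}")
# → P(X < 15) = 0.7769, P(X > 15) = 0.2231
```

The two probabilities sum to 1 because the events are complementary, and for a continuous distribution P(X = 15) is zero.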
