## Other Distributions

### Poisson Distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space. It is used for modeling the number of times an event happens within a specific period or area, assuming the events occur independently and at a constant average rate.

#### Key Characteristics:

- Discrete: The distribution models counts of events (e.g., number of occurrences).
- Constant Rate: The average rate (λ, lambda) at which events occur is constant.
- Independence: Events occur independently of each other.

#### Poisson Probability Mass Function (PMF)

The probability of observing $k$ events in an interval is given by:
$$P(X=k)=\frac{e^{-\lambda}\lambda^k}{k!}$$

where:

- $e$ is the base of the natural logarithm (approximately 2.71828)
- $\lambda$ is the average rate of events (mean number of events)
- $k$ is the number of occurrences
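As a minimal sketch, the PMF can be coded directly from the formula using only the standard library. The function name poisson_pmf and the rate of 5 are illustrative assumptions; summing the probabilities over a wide range of $k$ is a quick sanity check that they add up to roughly 1.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Probabilities of seeing 0, 1, 2, 3, 4 events when the average rate is 5
probs = [poisson_pmf(k, 5) for k in range(5)]

# The probabilities over all k should sum to 1; 100 terms is plenty here
total = sum(poisson_pmf(k, 5) for k in range(100))

print(probs)
print(total)
```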

#### Example 1: Customer Arrivals at a Store

Let's say a store on average gets 5 customers per hour. We want to calculate the probability that exactly 3 customers will arrive in the next hour.

Given:

- $\lambda$ (average rate) = 5 customers/hour
- $k$ (number of occurrences) = 3 customers

In [None]:
from scipy.stats import poisson

# P(X = 3) when the average rate is 5 customers per hour
poisson.pmf(mu=5, k=3)

In [None]:
import numpy as np

# Same probability straight from the formula; the divisor 6 is 3!
np.exp(-5) * np.power(5, 3) / 6

#### Example 2: Number of Emails Received

Suppose you receive an average of 2 emails per hour. What is the probability that you receive no emails in the next hour?

Given:

- $\lambda$ (average rate) = 2 emails/hour
- $k$ (number of occurrences) = 0 emails

In [None]:
poisson.pmf(mu=2, k=0)
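Both Poisson examples can also be checked by hand by substituting the given values into the PMF. The decimal values are rounded to four places:

$$P(X=3)=\frac{e^{-5}\,5^{3}}{3!}=\frac{125}{6}\,e^{-5}\approx 0.1404$$

$$P(X=0)=\frac{e^{-2}\,2^{0}}{0!}=e^{-2}\approx 0.1353$$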

### Geometric Distribution

The geometric distribution is a discrete probability distribution that models the number of trials needed for the first success in a sequence of independent and identically distributed Bernoulli trials. In simpler terms, it represents the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$.

#### Key Characteristics:

- Discrete: It models counts of trials until the first success.
- Independent Trials: Each trial is independent of the others.
- Constant Probability: The probability of success $p$ is the same for each trial.

#### Geometric Probability Mass Function (PMF)

The probability that the first success occurs on the $k$-th trial is given by:
$$P(X=k)=(1-p)^{k-1}p$$

where:

- $p$ is the probability of success on any given trial
- $k$ is the trial number on which the first success occurs
- $1-p$ is the probability of failure on any given trial
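The formula translates into a short function. This is a sketch under stated assumptions: the name geometric_pmf and the success probability of 0.25 are chosen only for illustration. Summing over many trial numbers checks that the probabilities add up to roughly 1.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success occurs on trial k) with success probability p."""
    if k < 1:
        raise ValueError("k counts trials, so it must be at least 1")
    return (1 - p) ** (k - 1) * p

p = 0.25  # illustrative success probability (an assumption)

# Probability that the first success lands on trial 1, 2, ..., 5
first_five = [geometric_pmf(k, p) for k in range(1, 6)]

# The probabilities over all k should sum to 1
total = sum(geometric_pmf(k, p) for k in range(1, 500))

print(first_five)
print(total)
```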

#### Example 1: Coin Toss

Let's say you are tossing a fair coin, and you want to know the probability that the first heads (success) appears on the 3rd toss.

Given:

- $p$ (probability of heads) = 0.5
- $k$ (trial number for first heads) = 3

In [None]:
import numpy as np

p = 0.5
k = 3

probability = np.power(1-p, k-1) * p
probability  # display the result (0.125)
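The coin-toss result can also be checked empirically. The sketch below simulates many sequences of tosses with the standard random module and counts how often the first heads lands on toss 3. The helper name first_heads_toss, the seed, and the number of runs are illustrative choices, and the estimate only approximates the exact value.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def first_heads_toss(p: float = 0.5) -> int:
    """Toss a coin until heads appears; return the toss number."""
    toss = 1
    while random.random() >= p:  # tails happens with probability 1 - p
        toss += 1
    return toss

n_runs = 100_000
hits = sum(first_heads_toss() == 3 for _ in range(n_runs))

estimate = hits / n_runs            # simulated probability
exact = (1 - 0.5) ** (3 - 1) * 0.5  # value from the PMF

print(estimate, exact)
```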

#### Example 2: Rolling a Die

Suppose you are rolling a fair six-sided die, and you want to find the probability that the first time you roll a 6 (success) is on the 5th roll.

Given:

- $p$ (probability of rolling a 6) = $\frac{1}{6}$
- $k$ (trial number for first 6) = 5

In [None]:
import numpy as np

p = 1/6
k = 5

probability = np.power(1-p, k-1) * p
probability  # display the result
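Both geometric examples can be worked by hand from the PMF. The second value is rounded to four decimal places:

$$P(X=3)=(1-0.5)^{2}\,(0.5)=0.125$$

$$P(X=5)=\left(\frac{5}{6}\right)^{4}\frac{1}{6}=\frac{625}{7776}\approx 0.0804$$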

## Requests

- URL Assignment: After the imports, the code assigns a web address (URL) to a variable named url. This URL points to a text file hosted on the internet.

- Request to Retrieve Data: The next line uses Python's requests library to send a GET request to that URL and retrieve the contents of the text file (index.txt). The allow_redirects=True parameter tells requests to follow any redirects the server issues. This is already the default for requests.get, so it is spelled out here only for clarity.

In [None]:
import requests

import pandas as pd

url="https://online.stat.psu.edu/stat462/sites/onlinecourses.science.psu.edu.stat462/files/data/skincancer/index.txt"
r = requests.get(url, allow_redirects=True)

Saving Data Locally: The with open('index.txt', 'wb') as f: part begins a block of code that opens a file named index.txt in write mode ('wb' means write binary). Inside this block, the retrieved content (r.content) from the web request is written into this local file (index.txt). This effectively saves a copy of the text file from the internet onto your computer.

In [None]:
with open('index.txt', 'wb') as f:
    f.write(r.content)
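The save step can be tried without a network connection. In the sketch below, a short bytes literal stands in for r.content (the sample text is made up), and the file goes into a temporary directory so nothing is left behind. Reading the file back in 'rb' mode shows that the bytes are stored unchanged.

```python
import os
import tempfile

# Hypothetical stand-in for r.content: raw bytes, as returned by requests
content = b"State Lat\nAlabama 33.0\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.txt")

    # 'wb' opens the file for writing in binary mode
    with open(path, "wb") as f:
        n_written = f.write(content)  # write() returns the number of bytes written

    # 'rb' reads the same bytes back
    with open(path, "rb") as f:
        round_trip = f.read()

print(n_written, round_trip == content)
```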

Loading Data into Pandas DataFrame: Finally, the last line of code uses the pandas library (pd) to read the contents of the index.txt file into a DataFrame (c).

The read_csv function of pandas is used here with the following parameters:

- 'index.txt': This specifies the filename from which to read the data.
- delim_whitespace=True: This parameter tells pandas to use whitespace (spaces, tabs) as the delimiter between columns in the text file, instead of the comma that is the default for CSV files.
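To see what whitespace-delimited parsing does, here is a small sketch that reads an in-memory string instead of a file. The column names and values are made up for illustration. It uses sep=r"\s+", which the pandas documentation describes as equivalent to delim_whitespace=True; recent pandas releases appear to deprecate delim_whitespace in favor of this form.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated text with uneven spacing between columns
text = """State   Lat   Mort
Alabama 33.0  219
Arizona 34.5  160
"""

# Any run of spaces or tabs counts as a single column separator
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(df)
```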

In [None]:
c = pd.read_csv('index.txt', delim_whitespace=True)

### Summary in Simple Terms:

- The code fetches data from a specific web address that hosts a text file (index.txt).
- It then saves a copy of this text file onto your computer.
- Finally, it loads the data from this saved file into a format that allows easy manipulation and analysis, represented as a table-like structure called a DataFrame.

## Visualization with Matplotlib

In [None]:
import pandas as pd

temperature_df = pd.read_excel("../Week-1/Data.xlsx", sheet_name='Temperature', index_col=0)
rainfall_df = pd.read_excel("../Week-1/Data.xlsx", sheet_name='Rainfall', index_col=0)

In [None]:
rainfall_df

In [None]:
temperature_df

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.plot(temperature_df.columns, temperature_df.loc['OXFORD'].values, c='b', linestyle='--', label='OXFORD')
ax.scatter(temperature_df.columns, temperature_df.loc['OXFORD'].values, c='b')

ax.plot(temperature_df.columns, temperature_df.loc['BUENOS AIRES'].values, c='r', linestyle='--', label='BUENOS AIRES')
ax.scatter(temperature_df.columns, temperature_df.loc['BUENOS AIRES'].values, c='r')

ax.set_ylabel("Temperature (C)")
ax.set_xlabel("Month")

ax.legend()

plt.tight_layout()

### Two subplots in a single figure

In [None]:
fig, axs = plt.subplots(nrows=2, sharex=True)

ax = axs[0]

ax.plot(temperature_df.columns, temperature_df.loc['BAGHDAD'].values, c='b', linestyle='--', label='BAGHDAD')
ax.scatter(temperature_df.columns, temperature_df.loc['BAGHDAD'].values, c='b')

ax.plot(temperature_df.columns, temperature_df.loc['Cairo'].values, c='r', linestyle='--', label='Cairo')
ax.scatter(temperature_df.columns, temperature_df.loc['Cairo'].values, c='r')

ax.set_ylabel("Temperature (C)")
# ax.set_xlabel("Month")

ax.legend()

ax = axs[1]

ax.plot(rainfall_df.columns, rainfall_df.loc['BAGHDAD'].values, c='b', linestyle='--', label='BAGHDAD')
ax.scatter(rainfall_df.columns, rainfall_df.loc['BAGHDAD'].values, c='b')

ax.plot(rainfall_df.columns, rainfall_df.loc['Cairo'].values, c='r', linestyle='--', label='Cairo')
ax.scatter(rainfall_df.columns, rainfall_df.loc['Cairo'].values, c='r')

ax.set_ylabel("Rainfall (mm)")
ax.set_xlabel("Month")

ax.legend()

plt.tight_layout()

### Pie Chart

In [None]:
# Data for the pie chart
labels = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
sales = [250, 200, 150, 100, 90]

# Create a pie chart
plt.figure(figsize=(8, 8))  # Optional: Set the figure size
plt.pie(sales, labels=labels, autopct='%1.1f%%', startangle=140)

# Add a title
plt.title('Sales Distribution by City')

# Show the pie chart
plt.show()

#### plt.pie():

The function to create a pie chart.

- sales: The data values.
- labels: The labels for each slice.
- autopct='%1.1f%%': Display the percentage value inside each slice.
- startangle=140: Start the pie chart at a specific angle (optional).
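The percentages that autopct prints are each value's share of the total. As a rough sketch, the same numbers can be reproduced in plain Python by applying the '%1.1f%%' format string to each share:

```python
labels = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
sales = [250, 200, 150, 100, 90]

total = sum(sales)

# Each slice's share of the whole, formatted with the same '%1.1f%%' pattern
formatted = ['%1.1f%%' % (100 * value / total) for value in sales]

for label, text in zip(labels, formatted):
    print(label, text)
```

In the format string, %1.1f prints one digit after the decimal point and %% produces a literal percent sign.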

In [None]:
# Data for the pie chart
labels = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
sales = [250, 200, 150, 100, 90]
colors = ['#ff9999','#66b3ff','#99ff99','#ffcc99','#c2c2f0']  # Optional: Custom colors

# Create a pie chart
plt.figure(figsize=(8, 8))  # Optional: Set the figure size
plt.pie(sales, labels=labels, autopct='%1.1f%%', startangle=140, colors=colors, shadow=True, explode=(0.1, 0, 0, 0, 0))

# Add a title
plt.title('Sales Distribution by City')

# Show the pie chart
plt.show()

#### Additional Parameters:
- colors: Custom colors for the slices.
- shadow=True: Adds a shadow to the pie chart.
- explode: A tuple to offset a slice from the center.

### Plot with Error Bars

In [None]:
import numpy as np

# Data for the plot
x = np.arange(0, 10, 1)
y = np.sin(x)
error = np.random.rand(10) * 0.2  # Random error values

# Create a plot with error bars
plt.figure(figsize=(10, 6))
plt.errorbar(x, y, yerr=error, fmt='o', ecolor='red', capsize=5, capthick=2, label='Data with error')

# Add labels and title
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.title('Plot with Error Bars')
plt.legend()

# Show the plot
plt.show()

#### plt.errorbar():

The function to create a plot with error bars.

- x, y: The data values.
- yerr=error: The error values for the y data.
- fmt='o': The format of the plot points ('o' stands for circular markers).
- ecolor='red': The color of the error bars.
- capsize=5: The size of the caps on the error bars.
- capthick=2: The thickness of the caps.
- label='Data with error': The label for the legend.
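The error values in the example are random placeholders. One common way to obtain real ones (an assumption here, not something this notebook does) is to repeat each measurement and use the mean as the y value and the standard error of the mean as the error. The measurement values below are made up for illustration.

```python
import math
import statistics

# Hypothetical repeated measurements taken at each x position
measurements = {
    0: [1.0, 1.2, 0.8],
    1: [2.0, 2.1, 1.9],
}

x = sorted(measurements)
y = [statistics.mean(measurements[xi]) for xi in x]

# Standard error of the mean = sample standard deviation / sqrt(n)
error = [
    statistics.stdev(measurements[xi]) / math.sqrt(len(measurements[xi]))
    for xi in x
]

print(x, y, error)
```

The resulting lists could then be passed to plt.errorbar(x, y, yerr=error, fmt='o').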

In [None]:
# Data for the plot
x = np.arange(0, 10, 1)
y = np.sin(x)
error = np.random.rand(10) * 0.2  # Random error values

# Create a plot with error bars
plt.figure(figsize=(10, 6))
plt.errorbar(x, y, yerr=error, fmt='o', ecolor='red', elinewidth=2, capsize=5, capthick=2, marker='s', markersize=8, 
             markerfacecolor='blue', markeredgewidth=1, label='Data with error')

# Add labels and title
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.title('Plot with Custom Error Bars')
plt.legend()

# Show the plot
plt.show()

#### Additional Parameters:

- elinewidth: The line width of the error bars.
- marker: The marker style for the data points.
- markersize: The size of the markers.
- markerfacecolor: The fill color of the markers.
- markeredgewidth: The edge width of the markers.

### Histogram and Bar Plot

In [None]:
# Generate random data
np.random.seed(42)  # For reproducibility
data = np.random.randn(1000)  # 1000 random values from a standard normal distribution

# Create a histogram
plt.figure(figsize=(10, 6))
plt.hist(data, bins=30, alpha=0.5, color='blue', edgecolor='red', histtype='bar', rwidth=1)

# Add labels and title
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Customized Histogram of Normally Distributed Data')

# Add grid
plt.grid(axis='y', alpha=0.75)

# Show the histogram
plt.show()

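plt.hist(data, bins=30, alpha=0.5, color='blue', edgecolor='red', histtype='bar', rwidth=1): Creates the histogram.

- data: The data to plot.
- bins=30: The number of bins in the histogram.
- alpha=0.5: The transparency level of the bars (0 is fully transparent, 1 is fully opaque).
- color='blue': The fill color of the bars.
- edgecolor='red': The color of the edges of the bars.
- histtype='bar': Draws a traditional bar-type histogram (this is the default).
- rwidth=1: The width of each bar as a fraction of its bin width.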
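To make the bins parameter concrete, the sketch below counts values into 30 equal-width bins by hand using only the standard library. It is an illustration of the idea, not Matplotlib's actual implementation, and the data come from random.gauss rather than NumPy, so the counts will differ from the plot.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable
data = [random.gauss(0, 1) for _ in range(1000)]

n_bins = 30
lo, hi = min(data), max(data)
width = (hi - lo) / n_bins  # every bin covers the same span of values

counts = [0] * n_bins
for value in data:
    # Bin index from the distance to the left edge; the maximum value
    # would land one past the end, so clamp it into the last bin.
    i = min(int((value - lo) / width), n_bins - 1)
    counts[i] += 1

print(counts)
```

Each entry of counts corresponds to the height of one bar in the histogram.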

In [None]:
df = pd.DataFrame(data=[[100, 150, 20], [20, 30, 180]], columns=['Blue', "Green", "Pink"], index=['Boy', 'Girl'])

# Data for the plot
categories = df.columns
boys = df.loc['Boy']
girls = df.loc['Girl']

# Create a bar chart
fig, ax = plt.subplots(figsize=(10, 6))

bar_width = 0.35  # Width of the bars
index = np.arange(len(categories))  # The label locations

# Plotting the bars
bars1 = ax.bar(index, boys, bar_width, label='Boy', color='b')
bars2 = ax.bar(index + bar_width, girls, bar_width, label='Girl', color='r')

# Adding labels and title
ax.set_xlabel('Colors')
ax.set_ylabel('Count')
ax.set_title('Comparison between Boy and Girl by Color')
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(categories)
ax.legend()

# Adding values on top of the bars
def add_values(bars):
    for bar in bars:
        height = bar.get_height()
        ax.annotate('{}'.format(height),
                    xy=(bar.get_x() + bar.get_width() / 2, height),
                    xytext=(0, 3),  # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom')

add_values(bars1)
add_values(bars2)

# Show the plot
plt.show()
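The side-by-side placement in the grouped bar chart comes down to simple arithmetic on the x positions. ax.bar centers each bar on the x value it is given by default, so the two series sit at index and index + bar_width, and the tick label goes halfway between them. A minimal sketch of that arithmetic in plain Python:

```python
categories = ['Blue', 'Green', 'Pink']
bar_width = 0.35

index = list(range(len(categories)))            # one slot per category: 0, 1, 2

boy_centers = [i for i in index]                # first series sits on the slot
girl_centers = [i + bar_width for i in index]   # second series is shifted right
tick_positions = [i + bar_width / 2 for i in index]  # midpoint of each pair

for name, b, g, t in zip(categories, boy_centers, girl_centers, tick_positions):
    print(name, b, g, t)
```

Because bar_width is less than 0.5, the pair for one category does not run into the pair for the next.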