# Note Generator
- Chunks PDFs by page range and outputs useful notes

## Text Generation API Wrapper

```python
from openai import OpenAI


class TextGeneration:
    """Small wrapper around the OpenAI Chat Completions API, primed with a topic."""

    def __init__(self, topic: str, model: str = "gpt-3.5-turbo"):
        self.client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
        self.topic = topic
        self.model = model

    def _chat(self, user_prompt, system_prompt):
        """Send one system + user message pair and return the reply text."""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
        )
        return response.choices[0].message.content

    def generate_json(self, user_prompt, system_prompt=None):
        """Ask for a JSON-only reply and return it as a string (not parsed or validated)."""
        if system_prompt is None:
            system_prompt = f"You are a helpful assistant that knows a lot about {self.topic} and only responds with JSON"
        return self._chat(user_prompt, system_prompt)

    def generate_text(self, user_prompt, system_prompt=None):
        """Return a free-text reply."""
        if system_prompt is None:
            system_prompt = f"You are a helpful assistant that knows a lot about {self.topic}"
        return self._chat(user_prompt, system_prompt)


tg = TextGeneration(topic="statistics")
```

```python
from IPython.display import Markdown, display
from pypdf import PdfReader  # successor to PyPDF2, whose PdfFileReader/getPage/extractText were removed in 3.x
import pyperclip as pc

NOTES_PROMPT = ("Please provide markdown notes (use only h4 for headings) for only "
                "the most important concepts here no extra stuff. Use katex inline "
                "blocks (surrounded by $ for Jupyter)")


class NoteGenerator:
    def __init__(self, file_path, text_generator):
        self.file_path = file_path
        self.text_generator = text_generator  # e.g. the TextGeneration instance `tg`

    def get_text_pdf(self, start_page, end_page):
        """Extract text from pages start_page..end_page (0-based indices, end inclusive)."""
        reader = PdfReader(self.file_path)
        return "\n".join(
            reader.pages[i].extract_text() for i in range(start_page, end_page + 1)
        )

    @staticmethod
    def chunk_string(string, n):
        """Split a string into consecutive pieces of at most n characters."""
        return [string[i:i + n] for i in range(0, len(string), n)]

    def process_text(self, start_page, end_page, chunk_size):
        """Generate notes chunk by chunk, copy them to the clipboard, and render them."""
        text = self.get_text_pdf(start_page, end_page)
        notes = []

        for chunk in self.chunk_string(text, chunk_size):
            # Separate the instructions from the PDF text so the two don't run together.
            notes.append(self.text_generator.generate_text(NOTES_PROMPT + "\n\n" + chunk))

        # Blank lines between chunks keep each chunk's "####" headings on their own lines.
        formatted_notes = "\n\n".join(notes)
        pc.copy(formatted_notes)
        display(Markdown(formatted_notes))


ng = NoteGenerator('/home/smillburn/Downloads/William Navidi - Statistics for Engineers and Scientists-McGraw-Hill Education (2014).pdf', text_generator=tg)
ng.process_text(start_page=108, end_page=135, chunk_size=10000)
```


#### Random Variables

A random variable assigns a numerical value to each outcome in a sample space. Random variables are denoted with uppercase letters like $X$, $Y$, and $Z$.

#### Discrete Random Variables

- A discrete random variable has possible values forming a discrete set with gaps between adjacent values.
- The probability mass function (PMF) of a discrete random variable gives the probability of each possible value.
- The sum of probabilities in the PMF over all possible values is always equal to 1.
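The sketch below shows these two PMF facts in Python, using a hypothetical PMF. The values are made up for illustration and are not taken from the text. `Fraction` keeps the arithmetic exact, so the sum-to-1 check is a true equality rather than a floating-point comparison.

```python
from fractions import Fraction

# Hypothetical PMF of a discrete random variable X (illustrative values only).
pmf = {
    0: Fraction(1, 2),
    1: Fraction(1, 4),
    2: Fraction(1, 8),
    3: Fraction(1, 8),
}


def is_valid_pmf(p: dict) -> bool:
    """Valid if every probability lies in [0, 1] and the probabilities sum to exactly 1."""
    return all(0 <= prob <= 1 for prob in p.values()) and sum(p.values()) == 1


print(is_valid_pmf(pmf))  # True
print(pmf[2])             # P(X = 2) = 1/8
```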

#### Continuous Random Variables

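- Continuous random variables have possible values that contain an entire interval of real numbers.
- Probabilities for continuous random variables are given by areas under a curve called the probability density function (PDF).
- Every random variable, discrete or continuous, has a cumulative distribution function (CDF), which gives the probability that the variable is less than or equal to a specific value.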

#### Random Variables and Populations

- Thinking of random variable values as samples from populations helps in understanding and calculating probabilities.
- For discrete random variables, the set of possible values along with their probabilities completely describes the population.

#### Cumulative Distribution Function (CDF)
- The cumulative distribution function (CDF) of a random variable $X$ is defined as $F(x) = P(X \leq x)$.
- For a discrete random variable, $F(x)$ is computed by summing the probabilities of all possible values of $X$ that are less than or equal to $x$.
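Here is a sketch of $F(x)$ as a running sum, again using a hypothetical PMF with illustrative values. The output shows that the CDF of a discrete random variable is a step function. It stays flat between possible values (compare $F(1)$ and $F(1.5)$) and jumps at each one.

```python
from fractions import Fraction

# Hypothetical PMF of X (illustrative values only).
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}


def cdf(x: float, p: dict) -> Fraction:
    """F(x) = P(X <= x): add up P(X = v) over every possible value v <= x."""
    return sum((prob for v, prob in p.items() if v <= x), Fraction(0))


for x in (-1, 0, 1, 1.5, 2, 3):
    print(x, cdf(x, pmf))
# -1 0
# 0 1/2
# 1 3/4
# 1.5 3/4
# 2 7/8
# 3 1
```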

#### Mean and Variance for Discrete Random Variables
- The mean $\mu_X$ of a discrete random variable $X$ is given by $\mu_X = \sum x \cdot P(X = x)$.
- The mean is also known as the expectation or expected value of $X$.
- The population variance $\sigma^2_X$ of $X$ is given by $\sigma^2_X = \sum (x - \mu_X)^2 P(X = x)$, or equivalently $\sigma^2_X = \sum x^2 P(X = x) - \mu_X^2$.
- The standard deviation $\sigma_X$ is the square root of the variance.
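This sketch computes $\mu_X$, $\sigma_X^2$ and $\sigma_X$ for a hypothetical PMF with illustrative values. It also evaluates the standard shortcut $\sum x^2 P(X = x) - \mu_X^2$ to show that it agrees with the definition.

```python
import math
from fractions import Fraction

# Hypothetical PMF of X (illustrative values only).
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}

# mu_X = sum of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())
# sigma_X^2 = sum of (x - mu_X)^2 * P(X = x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
# Shortcut form: sum of x^2 * P(X = x), minus mu_X^2
shortcut = sum(x**2 * p for x, p in pmf.items()) - mean**2
# sigma_X is the square root of the variance
std_dev = math.sqrt(variance)

print(mean, variance, shortcut)  # 7/8 71/64 71/64
print(round(std_dev, 4))         # 1.0533
```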

#### Probability Histogram
- When possible values of a discrete random variable are evenly spaced, a probability histogram can represent the probability mass function.
- In a probability histogram, rectangles centered at possible values represent the probabilities $P(X = x)$.
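- The area of each rectangle corresponds to the probability of that value occurring for the random variable.

#### Probability Representation for Continuous Random Variables

For a continuous random variable $X$ with probability density function $f(x)$ and any numbers $a < b$, including or excluding the endpoints makes no difference, because $P(X = a) = P(X = b) = 0$:

$P(a \leq X \leq b) = P(a \leq X < b) = P(a < X \leq b) = P(a < X < b) = \int_a^b f(x)dx$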

#### Continuous Random Variables

A continuous random variable's probabilities are represented by areas under a curve, known as the probability density function. The integral of the probability density function over a certain interval gives the probability that the random variable takes on a value in that interval.

#### Cumulative Distribution Function of a Continuous Random Variable

The cumulative distribution function of a continuous random variable $X$ is defined as:

$F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t)dt$

For a continuous random variable, the cumulative distribution function will always be continuous.
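This sketch illustrates the integral definition of $F(x)$ using a hypothetical density $f(x) = 2x$ on $[0, 1]$ (zero elsewhere), so the exact CDF there is $F(x) = x^2$. The hand-rolled Simpson's rule integrator is an illustrative choice; the notes do not prescribe it, and a library routine such as `scipy.integrate.quad` would work as well. The middle line also shows that interval probabilities follow from the CDF as $P(a < X \leq b) = F(b) - F(a)$.

```python
def pdf(x: float) -> float:
    """Hypothetical density f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def integrate(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b] (n is forced to be even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def cdf(x: float) -> float:
    """F(x) = integral of f(t) dt from -inf to x; f is zero below 0 and above 1."""
    if x <= 0:
        return 0.0
    return integrate(pdf, 0.0, min(x, 1.0))


print(round(cdf(0.5), 6))               # 0.25 (exact value: 0.5**2)
print(round(cdf(0.75) - cdf(0.25), 6))  # 0.5, that is, P(0.25 < X <= 0.75)
print(round(cdf(2.0), 6))               # 1.0 (all probability lies at or below 1)
```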

#### Mean and Variance for Continuous Random Variables

The population mean and variance of a continuous random variable are calculated from the probability density function in the same way as for discrete random variables, with integrals replacing sums. If the region under the density curve is thought of as a physical object, the mean is its center of mass and the variance is its moment of inertia about the mean.

The mean of a continuous random variable $X$ is given by:
$$\mu_X = \int_{-\infty}^{\infty} x f(x) dx$$

The variance of $X$ is given by:
$$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x) dx$$

An alternate formula for the variance is:
$$\sigma_X^2 = \int_{-\infty}^{\infty} x^2 f(x) dx - \mu_X^2$$

The standard deviation is the square root of the variance: $\sigma_X = \sqrt{\sigma_X^2}$
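The sketch below applies the three formulas numerically to a hypothetical density $f(x) = 2x$ on $[0, 1]$. Because $f$ is zero outside $[0, 1]$, each integral over $(-\infty, \infty)$ reduces to an integral over $[0, 1]$. For comparison, the exact values are $\mu_X = 2/3$ and $\sigma_X^2 = 1/2 - 4/9 = 1/18$. The Simpson's rule helper is an illustrative choice of integrator.

```python
import math


def pdf(x: float) -> float:
    """Hypothetical density f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def integrate(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b] (n is forced to be even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# f is zero outside [0, 1], so each integral over (-inf, inf) reduces to [0, 1].
mean = integrate(lambda x: x * pdf(x), 0.0, 1.0)
variance = integrate(lambda x: (x - mean) ** 2 * pdf(x), 0.0, 1.0)
alt_variance = integrate(lambda x: x**2 * pdf(x), 0.0, 1.0) - mean**2
std_dev = math.sqrt(variance)

print(round(mean, 6))          # 0.666667 (exact: 2/3)
print(round(variance, 6))      # 0.055556 (exact: 1/18)
print(round(alt_variance, 6))  # 0.055556
print(round(std_dev, 4))       # 0.2357
```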

#### The Population Median and Percentiles

The median of a continuous random variable $X$ is the point $x_m$ that solves $P(X \leq x_m) = \int_{-\infty}^{x_m} f(x) dx = 0.5$

The $p$th percentile of $X$ is the point $x_p$ that solves $P(X \leq x_p) = \int_{-\infty}^{x_p} f(x) dx = p/100$
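Solving $F(x_p) = p/100$ usually needs a root finder unless the CDF can be inverted by hand. This bisection sketch uses the hypothetical density $f(x) = 2x$ on $[0, 1]$. Its CDF there is $F(x) = x^2$, so the exact median is $\sqrt{0.5}$ and the exact 90th percentile is $\sqrt{0.9}$. Bisection is one reasonable choice because a continuous CDF is also nondecreasing.

```python
import math


def cdf(x: float) -> float:
    """CDF of the hypothetical density f(x) = 2x on [0, 1]: F(x) = x^2 on [0, 1]."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x * x


def percentile(p: float, lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Solve F(x_p) = p / 100 by bisection on [lo, hi]; assumes F is continuous and nondecreasing there."""
    target = p / 100
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < target:
            lo = mid  # x_p lies to the right of mid
        else:
            hi = mid  # x_p is at or to the left of mid
    return (lo + hi) / 2


median = percentile(50)
print(round(median, 6), round(math.sqrt(0.5), 6))  # 0.707107 0.707107
print(round(percentile(90), 6))                    # 0.948683 (exact: sqrt(0.9))
```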

#### Chebyshev's Inequality

For a random variable $X$ with mean $\mu_X$ and standard deviation $\sigma_X$, and any $k > 0$:
$$P(|X-\mu_X| \geq k\sigma_X) \leq \frac{1}{k^2}$$

#### 6. Titanium Isotopes
a. $\mu_X = \sum x \cdot p(x)$  
b. $\sigma_X = \sqrt{\sum (x - \mu_X)^2 \cdot p(x)}$

#### 7. Packet Resending
a. Find the value of the constant $c$  
b. $P(X=2)$  
c. Mean number of times the packet is sent  
d. Variance of the number of times the packet is sent  
e. Standard deviation of the number of times the packet is sent  

#### 8. Computer Disk Errors
a. Probability of two or fewer errors detected  
b. Probability of more than three errors detected  
c. Probability of exactly one error detected  
d. Probability of no errors detected  
e. Most probable number of errors to be detected  

#### 9. Traffic Engineer
a. Compute $P(X=x)$ using $p_1(x)$  
b. Compute $P(X=x)$ using $p_2(x)$  
c. Compare the two probability mass functions  
d. Discuss whether the models align with data  

#### 10. Microprocessing Chips
a. Probability the first chip chosen is acceptable  
b. Probability first chip is unacceptable and second is acceptable  
c. $P(X=3)$  
d. Probability mass function of $X$  

#### 11. Continued Discussion on Chips
a. Smallest possible value for $Y$  
b. Probability of $Y$ taking on that value  
c. $P(Y=3|X=1)$  
d. $P(Y=3|X=2)$  
e. $P(Y=3)$  

#### 12. Testing Components
a. Possible values for $X$  
b. $P(X=3)$  
c. $P(FSS)$  
d. $P(SFS)$ and $P(SSF)$  
e. $P(X=2)$  
f. $P(X=1)$  
g. $P(X=0)$  
h. $\mu_X$  
i. $\sigma_X^2$  
j. $P(Y=3)$

#### Concept of Probability Density Function

Given a probability density function $f(x)$ of a random variable $X$, two properties must hold for it to be valid. First, $f(x) \geq 0$ for all $x$. Second, the total area under the curve must equal 1, that is, $\int_{-\infty}^{\infty} f(x) dx = 1$.
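This is a rough numerical check of both conditions. It assumes the candidate density is zero outside a known interval $[a, b]$ and samples $f$ only on a grid, so it is a sanity check rather than a proof. The three candidate functions are hypothetical:

- $2x$ passes both checks.
- $3x$ fails because its area is $1.5$.
- $4x - 1$ has area exactly 1 but is negative for $x < 0.25$, which shows that both conditions are needed.

```python
def integrate(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of f over [a, b] (n is forced to be even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def is_valid_pdf(f, a: float, b: float, n_checks: int = 1001, tol: float = 1e-9) -> bool:
    """Check f(x) >= 0 on a grid over [a, b] and that the integral of f over [a, b] is 1.

    Assumes f is zero outside [a, b]. A grid can miss a negative dip between
    sample points, so this is a sanity check, not a proof.
    """
    step = (b - a) / (n_checks - 1)
    nonnegative = all(f(a + i * step) >= 0 for i in range(n_checks))
    return nonnegative and abs(integrate(f, a, b) - 1) < tol


print(is_valid_pdf(lambda x: 2 * x, 0.0, 1.0))      # True
print(is_valid_pdf(lambda x: 3 * x, 0.0, 1.0))      # False: total area is 1.5
print(is_valid_pdf(lambda x: 4 * x - 1, 0.0, 1.0))  # False: area is 1, but f(x) < 0 for x < 0.25
```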

#### Mean, Cumulative Distribution Function, and Median

The mean of a random variable is a measure of central tendency; it is the expected value $\mu_X$ of the variable. The cumulative distribution function $F(x) = P(X \leq x)$ gives the probability that the random variable is less than or equal to a given value. The median $x_m$ divides the distribution into two halves of equal probability, so $F(x_m) = 0.5$.

#### Proportion Calculations

To find the proportion of contaminating particles whose diameters fall below a threshold $t$ (e.g., PM10 or PM2.5), evaluate the cumulative distribution function of the particle diameter at that threshold: the proportion is $F(t) = P(X \leq t)$.

#### Repair Time and Diameter Calculations

Similar to the particle diameter example, we can calculate probabilities, mean values, standard deviations, and cumulative distribution functions for other random variables like repair times or diameters of objects. Different probability density functions may apply in each case.

#### Linear Functions of Random Variables

When a random variable is transformed linearly, $Y = aX + b$, its mean and variance change predictably: $\mu_{aX+b} = a\mu_X + b$, $\sigma^2_{aX+b} = a^2\sigma^2_X$, and $\sigma_{aX+b} = |a|\sigma_X$. Adding a constant shifts the mean but leaves the variance unchanged. Multiplying by a constant scales the mean by $a$ and the variance by $a^2$. These operations create new random variables for analysis.
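This sketch verifies these rules exactly on a hypothetical PMF, using illustrative constants $a = 3$ and $b = 2$. Because $Y = aX + b$ takes the value $ax + b$ whenever $X = x$, the PMF of $Y$ is obtained by relabeling the values of $X$.

```python
from fractions import Fraction

# Hypothetical PMF of X (illustrative values only).
pmf_x = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}


def mean_var(pmf: dict) -> tuple:
    """Return (mean, variance) of a discrete PMF given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var


a, b = 3, 2  # illustrative constants for Y = aX + b
# Y = ax + b exactly when X = x, so relabel the values (a != 0 keeps them distinct).
pmf_y = {a * x + b: p for x, p in pmf_x.items()}

mu_x, var_x = mean_var(pmf_x)
mu_y, var_y = mean_var(pmf_y)
print(mu_y, a * mu_x + b)   # 37/8 37/8
print(var_y, a**2 * var_x)  # 639/64 639/64
```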