In [None]:
import railtracks as rt
from railtracks.RAG import get_rag_node, RAG

In [2]:
docs = [
    "Apple is deep red",
    "Pear is light yellow",
    "Watermelon is green on the outside and red on the inside",
]
query = "What is the color of watermelon?"

Great—here’s a concise review of the sigmoid (logistic) function, with a few complementary derivations and key identities.

1) Logistic ODE derivation (growth with saturation)
Assume a quantity $y(x)$ grows at a rate proportional to both its current size and its remaining capacity below a normalized maximum of 1:
$$
\frac{dy}{dx} = y(1-y).
$$
Separate variables:
$$
\frac{dy}{y(1-y)} = dx.
$$
Use partial fractions:
$$
\frac{1}{y(1-y)} = \frac{1}{y} + \frac{1}{1-y},
$$
so
$$
\int \left(\frac{1}{y} + \frac{1}{1-y}\right) dy = \int dx
\quad\Rightarrow\quad
\ln|y| - \ln|1-y| = x + C.
$$
Exponentiate, writing the new constant $e^{C}$ again as $C > 0$ (valid for $0 < y < 1$):
$$
\frac{y}{1-y} = C e^{x}.
$$
Solve for y:
$$
y(x) = \frac{1}{1 + C^{-1} e^{-x}} = \frac{1}{1 + e^{-(x+b)}},
$$
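where $b = \ln C$. The standard sigmoid is the case $b = 0$; with a slope $a > 0$ and a midpoint $b$ (now entering as $x - b$, so the curve is centered at $x = b$), the standard and general forms are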
$$
\sigma(x) = \frac{1}{1 + e^{-x}} \quad\text{or}\quad \sigma_{a,b}(x) = \frac{1}{1 + e^{-a(x-b)}}.
$$
For population dynamics with carrying capacity $K$, rate $r$, and initial value $y(0) = y_0$ (so that $A = (K - y_0)/y_0$):
$$
y(t) = \frac{K}{1 + A e^{-rt}}.
$$
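
As a sanity check on the derivation above, the closed form can be compared with a direct numerical integration of the ODE. This is a minimal sketch: the function names, the initial value `y0 = 0.2`, and the choice of forward Euler are illustrative assumptions, not part of the original derivation.

```python
import math

def logistic_solution(x, y0):
    """Closed form of dy/dx = y(1 - y) with y(0) = y0, assuming 0 < y0 < 1."""
    c = y0 / (1.0 - y0)  # C in y/(1-y) = C e^x, fixed by the value at x = 0
    return 1.0 / (1.0 + math.exp(-x) / c)

def euler_logistic(x_end, y0, steps=100_000):
    """Integrate dy/dx = y(1 - y) from 0 to x_end with forward Euler."""
    h = x_end / steps
    y = y0
    for _ in range(steps):
        y += h * y * (1.0 - y)
    return y

y0 = 0.2  # illustrative initial condition
exact = logistic_solution(3.0, y0)
approx = euler_logistic(3.0, y0)
print("closed form:", exact)
print("Euler      :", approx)
```

The two numbers should agree to several decimal places; shrinking the step size tightens the match, which is what one expects if the closed form really solves the ODE.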

2) Odds/log-odds derivation (logistic regression link)
If the log-odds (logit) of an event is linear in x:
$$
\log\frac{p}{1-p} = w^\top x + b,
$$
then exponentiate:
$$
\frac{p}{1-p} = e^{w^\top x + b}
\quad\Rightarrow\quad
p = \frac{e^{w^\top x + b}}{1 + e^{w^\top x + b}}
= \frac{1}{1 + e^{-(w^\top x + b)}}.
$$
Thus the probability is the sigmoid of a linear score.
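
The round trip between a linear score, a probability, and the log-odds can be sketched in a few lines. The weights, input, and bias below are made-up values chosen only so the score comes out to exactly 1; `predict_proba` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    """Numerically stable evaluation of 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x, b):
    """Probability of the event when the log-odds are w.x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(score)

# Illustrative numbers: score = 0.8*2.0 - 0.5*1.0 - 0.1 = 1.0
w, x, b = [0.8, -0.5], [2.0, 1.0], -0.1
p = predict_proba(w, x, b)
log_odds = math.log(p / (1.0 - p))  # applying the logit recovers the score
print("p =", p)
print("log-odds =", log_odds)
```

Taking the logit of `p` returns the linear score, which is the sense in which the sigmoid and the logit are inverses.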

3) As the CDF of the logistic distribution
The standard logistic CDF is
$$
F(x) = \frac{1}{1 + e^{-x}} = \sigma(x),
$$
with PDF
$$
f(x) = F'(x) = \sigma(x)\bigl(1-\sigma(x)\bigr).
$$
More generally, with scale s > 0 and location μ:
$$
F(x) = \frac{1}{1 + e^{-(x-\mu)/s}}.
$$
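
Because the CDF inverts in closed form (the quantile is $\mu + s\,\text{logit}(u)$), logistic samples can be drawn by inverse-transform sampling. The sketch below is an illustration under assumed parameters (`mu=1.0`, `s=2.0`, a fixed seed); the helper names are hypothetical.

```python
import math
import random

def logistic_cdf(x, mu=0.0, s=1.0):
    """F(x) = 1 / (1 + exp(-(x - mu) / s))."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def logistic_quantile(u, mu=0.0, s=1.0):
    """Inverse CDF: mu + s * logit(u), defined for 0 < u < 1."""
    return mu + s * math.log(u / (1.0 - u))

def sample_logistic(n, mu=0.0, s=1.0, seed=0):
    """Draw n samples by pushing uniform draws through the quantile function."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        u = rng.random()  # uniform on [0, 1)
        if u > 0.0:       # skip the endpoint where logit is undefined
            samples.append(logistic_quantile(u, mu, s))
    return samples

samples = sample_logistic(20_000, mu=1.0, s=2.0)
x0 = 3.0
empirical = sum(v <= x0 for v in samples) / len(samples)
print("empirical CDF at x0:", empirical)
print("analytic  CDF at x0:", logistic_cdf(x0, mu=1.0, s=2.0))
```

The empirical fraction of samples below `x0` should sit close to the analytic CDF value, up to sampling noise.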

Key identities and properties
- Range and symmetry:
$$
0 < \sigma(x) < 1, \quad \sigma(0) = \tfrac12, \quad \sigma(-x) = 1 - \sigma(x).
$$
- Derivative:
$$
\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr).
$$
- Inverse (logit):
$$
\text{logit}(p) = \log\frac{p}{1-p}, \quad \sigma(\text{logit}(p)) = p.
$$
- Relation to tanh:
$$
\sigma(x) = \frac{1}{2}\bigl(1 + \tanh(\tfrac{x}{2})\bigr).
$$
- Small-x approximation:
$$
\sigma(x) \approx \tfrac12 + \tfrac{x}{4} \quad \text{for } |x|\ll 1.
$$
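
The identities above are easy to spot-check numerically. The sketch below computes the residual of each identity at a few arbitrary test points; `identity_residuals` is an illustrative helper, and the finite-difference step `h` is an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def identity_residuals(x, h=1e-6):
    """Absolute residual of each identity at the point x."""
    s = sigmoid(x)
    numeric_derivative = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
    return {
        "symmetry": abs(sigmoid(-x) - (1.0 - s)),
        "derivative": abs(numeric_derivative - s * (1.0 - s)),
        "inverse": abs(logit(s) - x),
        "tanh": abs(s - 0.5 * (1.0 + math.tanh(x / 2.0))),
    }

for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    print(x, identity_residuals(x))

# Small-x approximation: the error shrinks like |x|^3 / 48.
small_x = 0.01
linear_error = abs(sigmoid(small_x) - (0.5 + small_x / 4.0))
print("linear approximation error at 0.01:", linear_error)
```

All residuals should be at the level of floating-point or finite-difference error, while the linear approximation error is small but visibly nonzero.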

Would you like quick practice problems (e.g., solving the ODE with different initial conditions, or deriving p from given log-odds)?

Think of “soft-max” as just a mathematical **function**;  
“soft-max regression” (multinomial logistic regression) is a **statistical model / learning algorithm** that *uses* that function as its last step.

------------------------------------------------
1. Soft-max function (nothing but a mapping)  

Given any real-valued vector  
$$\mathbf{z}=(z_1,\dots , z_K)\in\mathbb R^{K},$$  
soft-max produces a probability vector  

$$
\text{softmax}(\mathbf{z})_k
    \;=\;
    \frac{e^{z_k}}{\sum_{j=1}^{K}e^{z_j}},\qquad k=1,\dots ,K .
$$

Properties  
• Each component is in $(0,1)$.  
• They sum to $1$.  
• No training, no data, no parameters—just evaluate the formula.
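
A direct evaluation of the formula might look like the following sketch. Subtracting the maximum before exponentiating is a standard numerical-stability trick rather than part of the definition; it leaves the result unchanged because the common factor cancels in the ratio. The input vector is an arbitrary example.

```python
import math

def softmax(z):
    """Map a list of real scores to a probability vector."""
    m = max(z)  # shift by the max so exp() never overflows
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # roughly [0.659, 0.242, 0.099]
print(sum(probs))  # 1.0 up to rounding

# Large scores are handled without overflow thanks to the shift.
print(softmax([1000.0, 1000.0]))
```

Adding the same constant to every score gives the same output, so only differences between scores matter.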

------------------------------------------------
2. Soft-max regression (a learning model)  

Goal: classify an input feature vector $\mathbf x\!\in\!\mathbb R^{d}$ into one of $K$ classes.

Step-by-step model  
a) Affine scores (“logits”):

$$
\mathbf z = W\mathbf x + \mathbf b,
$$

with parameters  
$W\in\mathbb R^{K\times d}$  (weights) and  
$\mathbf b\in\mathbb R^{K}$    (biases).

b) Convert those scores into class probabilities with the **soft-max function**:

$$
P(y=k\mid\mathbf x;\,W,\mathbf b)
  =\frac{e^{z_k}}{\sum_{j=1}^{K}e^{z_j}}
  =\text{softmax}(\mathbf z)_k .
$$

c) Fit $(W,\mathbf b)$ to data by maximizing the multinomial likelihood (or, equivalently, minimizing cross-entropy loss).

So soft-max regression = “linear logits” + “soft-max function” + “parameter learning”.
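
Steps a) to c) can be sketched in plain Python with batch gradient descent on the mean cross-entropy loss. Everything specific here is an assumption for illustration: the toy 2-D dataset, the learning rate, the epoch count, and the function names. The gradient of the loss with respect to the logits is the familiar "predicted probabilities minus one-hot label".

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, b, x):
    """Affine scores z = W x + b."""
    return [sum(wk[j] * x[j] for j in range(len(x))) + bk for wk, bk in zip(W, b)]

def train_softmax_regression(X, y, num_classes, lr=0.2, epochs=1000):
    """Fit (W, b) by gradient descent on the mean cross-entropy loss."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        grad_W = [[0.0] * d for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for xi, yi in zip(X, y):
            p = softmax(logits(W, b, xi))
            for k in range(num_classes):
                err = p[k] - (1.0 if k == yi else 0.0)  # dLoss/dz_k
                grad_b[k] += err / n
                for j in range(d):
                    grad_W[k][j] += err * xi[j] / n
        for k in range(num_classes):
            b[k] -= lr * grad_b[k]
            for j in range(d):
                W[k][j] -= lr * grad_W[k][j]
    return W, b

def predict(W, b, x):
    """Return (most likely class, class probabilities)."""
    p = softmax(logits(W, b, x))
    return max(range(len(p)), key=p.__getitem__), p

# Toy data: three well-separated clusters, one per class.
X = [[0.0, 0.2], [0.3, 0.0], [0.1, 0.4],
     [3.0, 0.1], [2.8, 0.3], [3.2, -0.2],
     [0.1, 3.0], [-0.2, 2.9], [0.3, 3.3]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

W, b = train_softmax_regression(X, y, num_classes=3)
predictions = [predict(W, b, xi)[0] for xi in X]
print(predictions)
```

The softmax function itself has no parameters in this code; all learning happens in `W` and `b`, which is the distinction the section draws.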

------------------------------------------------
Analogy

logistic function  ↔  logistic regression  
soft-max function  ↔  soft-max (multinomial logistic) regression

Function alone: deterministic mapping, 0 parameters.  
Regression: statistical model that embeds the function inside it and supplies *trainable* parameters.

------------------------------------------------
Quick comparison

• Soft-max  
  – Input: any $K$-vector of real numbers.  
  – Output: $K$ probabilities.  
  – Role: normalization / squashing.

• Soft-max regression  
  – Input: feature vector $\mathbf x$.  
  – Output: predicted class probabilities.  
  – Contains: weight matrix $W$, bias vector $\mathbf b$, loss, an optimization algorithm.

In [None]:
rag_node: rt.Node = get_rag_node(
    documents=docs,
)

query = "Color of pear?"
# Avoid the _sync variant in a notebook, which already runs its own event loop; await the call instead
with rt.Runner() as runner:
    result = await runner.call(rag_node, query)
print("Query: ", query)
print("choice [0]: ", result[0].record.text)
print("choice [1]: ", result[1].record.text)

[+251.235s] RT          : INFO     - START CREATED query Node
[+251.609s] RT          : INFO     - query Node DONE


Query:  Color of pear?
choice [0]:  Pear is light yellow
choice [1]:  Apple is deep red
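
For intuition about what a top-k retrieval step does, here is a toy ranker that scores each document by word overlap (Jaccard similarity) with the query. This is not how `get_rag_node` ranks documents; the notebook does not show its internals, and retrieval systems typically compare embeddings rather than raw words. The names `tokenize`, `jaccard`, and `top_k` are hypothetical helpers.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def top_k(query, documents, k=2):
    """Return the k highest-scoring (score, document) pairs."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

docs = [
    "Apple is deep red",
    "Pear is light yellow",
    "Watermelon is green on the outside and red on the inside",
]
for score, doc in top_k("Color of pear?", docs):
    print(f"{score:.3f}  {doc}")
```

With this toy scorer only the pear sentence shares a word with the query, so it ranks first and the remaining documents tie at zero. An embedding-based retriever can separate such ties by meaning rather than by exact word match.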
