<a href="https://colab.research.google.com/github/hublun/Bayesian_Aggregation_Average_Data/blob/master/NumPyro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Using the OpenAI Library to Programmatically Access GPT-3.5-turbo!

This notebook was authored by [Chris Alexiuk](https://www.linkedin.com/in/csalexiuk/)

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0


In [2]:
!nvidia-smi

Sun Jan 14 18:51:09 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   48C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [3]:
import jax

In [4]:
jax.devices()

[cuda(id=0)]

In [5]:
from jax import grad
import jax.numpy as jnp

def tanh(x):  # Define a function
  y = jnp.exp(-2.0 * x)
  return (1.0 - y) / (1.0 + y)

grad_tanh = grad(tanh)  # Obtain its gradient function
print(grad_tanh(1.0))   # Evaluate it at x = 1.0

0.4199743
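As a sanity check on the value printed above, the derivative of tanh has the closed form 1 - tanh(x)^2. The sketch below uses only the standard library (no JAX) and compares that closed form with a central finite difference; the helper name `central_difference` and the step size `h` are illustrative choices, not part of the notebook.

```python
import math

def tanh(x):
    # Same formula as in the cell above, written with the standard library.
    y = math.exp(-2.0 * x)
    return (1.0 - y) / (1.0 + y)

def central_difference(f, x, h=1e-5):
    # Symmetric finite-difference approximation of f'(x); h is an illustrative step size.
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 1.0
analytic = 1.0 - tanh(x) ** 2          # closed-form derivative of tanh
numeric = central_difference(tanh, x)  # numerical approximation

print(analytic, numeric)
```

Both numbers should agree with the `grad_tanh(1.0)` output to the precision shown.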




---



In [6]:
!pip install numpyro

Collecting numpyro
  Downloading numpyro-0.13.2-py3-none-any.whl (312 kB)
Installing collected packages: numpyro
Successfully installed numpyro-0.13.2


In [10]:
import numpyro
import numpyro.distributions as dist

In [11]:
numpyro.set_platform('gpu')

In [12]:
import numpy as np
J = 8
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

In [13]:
def eight_schools(J, sigma, y=None):
    mu = numpyro.sample('mu', dist.Normal(0, 5))
    tau = numpyro.sample('tau', dist.HalfCauchy(5))
    with numpyro.plate('J', J):
        theta = numpyro.sample('theta', dist.Normal(mu, tau))
        numpyro.sample('obs', dist.Normal(theta, sigma), obs=y)

In [14]:
from jax import random
from numpyro.infer import MCMC, NUTS

In [20]:
nuts_kernel = NUTS(eight_schools)
mcmc = MCMC(nuts_kernel, num_warmup=500, num_samples=1000)
rng_key = random.PRNGKey(0)
mcmc.run(rng_key, J, sigma, y=y, extra_fields=('potential_energy',))

sample: 100%|██████████| 1500/1500 [00:15<00:00, 96.91it/s, 31 steps of size 1.54e-01. acc. prob=0.84] 


In [21]:
mcmc.print_summary()


                mean       std    median      5.0%     95.0%     n_eff     r_hat
        mu      5.13      3.30      4.91      0.55     10.78    113.35      1.01
       tau      3.88      3.13      2.90      0.61      8.24     74.57      1.00
  theta[0]      7.21      5.54      6.68     -2.40     14.98    208.75      1.01
  theta[1]      5.62      4.62      5.65     -2.00     12.48    277.16      1.00
  theta[2]      4.49      5.61      4.49     -3.65     13.95    216.87      1.00
  theta[3]      5.49      4.73      5.39     -2.63     12.32    255.80      1.01
  theta[4]      4.22      4.77      4.27     -3.00     11.68    180.66      1.00
  theta[5]      5.02      5.01      5.09     -2.34     12.58    229.81      1.00
  theta[6]      7.25      4.97      6.96     -0.77     15.21    232.94      1.00
  theta[7]      5.71      5.28      5.62     -2.71     12.99    266.35      1.00

Number of divergences: 8




---



The values above 1 for the split Gelman-Rubin diagnostic (r_hat) indicate that the chain has not fully converged. The low effective sample size (n_eff), particularly for tau, and the number of divergent transitions also look problematic.
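To make the r_hat column more concrete, here is a minimal NumPy sketch of the textbook split Gelman-Rubin statistic. It is not NumPyro's implementation, and the function name `split_rhat` and the synthetic chains are assumptions made for illustration. Each chain is split in half, and the variance between the halves is compared with the variance within them, so a single drifting chain is flagged as well.

```python
import numpy as np

def split_rhat(draws):
    """Split Gelman-Rubin statistic for draws of shape (num_chains, num_samples)."""
    draws = np.asarray(draws, dtype=float)
    half = draws.shape[1] // 2
    # Split every chain into two halves so that drift within a chain is detected too.
    halves = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = halves.shape[1]
    within = halves.var(axis=1, ddof=1).mean()     # W: mean within-chain variance
    between = n * halves.mean(axis=1).var(ddof=1)  # B: variance of chain means, scaled by n
    var_plus = (n - 1) / n * within + between / n  # pooled variance estimate
    return np.sqrt(var_plus / within)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(1, 1000))              # synthetic stationary chain
drifting = mixed + np.linspace(0.0, 3.0, 1000)  # synthetic chain whose mean drifts

print(split_rhat(mixed), split_rhat(drifting))
```

The stationary chain should give a value very close to 1, while the drifting chain gives a value clearly above 1.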

Fortunately, this is a common pathology that can be rectified by using a non-centered parameterization in our model, which decouples theta from tau. This is straightforward to do in NumPyro by using a TransformedDistribution instance together with a reparameterization effect handler. Instead of sampling theta directly from Normal(mu, tau), we sample from a base Normal(0, 1) distribution and transform it with an AffineTransform, so that NumPyro runs HMC on samples theta_base from the base distribution. The cell below keeps that version commented out and instead applies LocScaleReparam(centered=0) to the theta site, which achieves the same non-centering without changing the model's sample statement. The resulting chain does not suffer from the same pathology: r_hat is at most 1.01 for every parameter, and the effective sample sizes are much larger.
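The identity behind the non-centered form is theta = mu + tau * theta_base with theta_base ~ Normal(0, 1). The NumPy sketch below checks numerically that this produces the same distribution as drawing from Normal(mu, tau) directly. The values of `mu` and `tau` are hypothetical and held fixed for illustration; in the model they are themselves sampled.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau = 4.0, 3.0  # hypothetical values, fixed for illustration only
n = 200_000

# Centered: draw theta directly from Normal(mu, tau).
theta_centered = rng.normal(loc=mu, scale=tau, size=n)

# Non-centered: draw a standard-normal base variable, then shift and scale it.
theta_base = rng.normal(size=n)
theta_noncentered = mu + tau * theta_base

print(theta_centered.mean(), theta_centered.std())
print(theta_noncentered.mean(), theta_noncentered.std())
```

The two sets of draws have matching mean and standard deviation. What changes is the geometry the sampler explores: it works with theta_base, whose scale does not depend on tau.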

In [23]:
from numpyro.infer.reparam import TransformReparam, LocScaleReparam

# Eight Schools example - Non-centered Reparametrization

def eight_schools_noncentered(J, sigma, y=None):
    mu = numpyro.sample('mu', dist.Normal(0, 5))
    tau = numpyro.sample('tau', dist.HalfCauchy(5))
    with numpyro.plate('J', J):
        # Alternative: explicit TransformedDistribution with TransformReparam
        # with numpyro.handlers.reparam(config={'theta': TransformReparam()}):
        #     theta = numpyro.sample(
        #         'theta',
        #         dist.TransformedDistribution(dist.Normal(0., 1.),
        #                                      dist.transforms.AffineTransform(mu, tau)))
        with numpyro.handlers.reparam(config={'theta': LocScaleReparam(centered=0)}):
            theta = numpyro.sample('theta', dist.Normal(mu, tau))
            numpyro.sample('obs', dist.Normal(theta, sigma), obs=y)



nuts_kernel = NUTS(eight_schools_noncentered)
mcmc = MCMC(nuts_kernel, num_warmup=500, num_samples=1000)
rng_key = random.PRNGKey(0)

mcmc.run(rng_key, J, sigma, y=y, extra_fields=('potential_energy',))

mcmc.print_summary(exclude_deterministic=False)

sample: 100%|██████████| 1500/1500 [00:11<00:00, 127.07it/s, 15 steps of size 3.72e-01. acc. prob=0.94]



                         mean       std    median      5.0%     95.0%     n_eff     r_hat
                 mu      4.43      3.33      4.52     -1.45      9.13    766.92      1.00
                tau      3.65      3.10      2.92      0.00      7.52    563.06      1.00
           theta[0]      6.31      5.51      5.86     -2.66     14.55   1009.12      1.00
           theta[1]      4.94      4.89      4.84     -2.80     12.35   1031.74      1.01
           theta[2]      3.83      5.33      4.05     -5.33     11.57    721.02      1.00
           theta[3]      4.91      4.77      4.83     -2.27     12.65   1007.11      1.00
           theta[4]      3.68      4.70      3.99     -4.17     11.04    812.73      1.00
           theta[5]      4.02      4.76      4.16     -3.64     11.12    845.14      1.00
           theta[6]      6.39      4.93      6.00     -1.16     14.38    943.65      1.00
           theta[7]      4.45      5.11      4.56     -3.82     11.36    866.83      1.00
theta_dec

Now, let us assume that we have a new school for which we have not observed any test scores, but we would like to generate predictions. NumPyro provides a **Predictive class** for such a purpose. Note that in the *absence of any observed data*, we simply use the population-level parameters to generate predictions. The Predictive utility conditions the unobserved mu and tau sites to values *drawn from the posterior distribution from our last MCMC run*, and runs the model forward to generate predictions.

In [25]:
from numpyro.infer import Predictive



# New School

def new_school():
    mu = numpyro.sample('mu', dist.Normal(0, 5))
    tau = numpyro.sample('tau', dist.HalfCauchy(5))
    return numpyro.sample('obs', dist.Normal(mu, tau))



predictive = Predictive(new_school, mcmc.get_samples())

samples_predictive = predictive(random.PRNGKey(1))

print(np.mean(samples_predictive['obs']))

4.5687833
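Conceptually, the prediction step above amounts to one forward draw of `obs` per posterior draw of (mu, tau). The NumPy sketch below mimics that by hand. The `posterior` dictionary holds synthetic stand-in draws invented for illustration; in the notebook the real draws would come from `mcmc.get_samples()`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in posterior draws (hypothetical values, not the MCMC output).
num_draws = 1000
posterior = {
    "mu": rng.normal(4.4, 3.3, size=num_draws),
    "tau": np.abs(rng.normal(0.0, 4.0, size=num_draws)),
}

# One forward draw of 'obs' for each posterior draw of (mu, tau).
obs = rng.normal(loc=posterior["mu"], scale=posterior["tau"])

print(obs.shape, obs.mean())
```

Because every prediction uses its own (mu, tau) pair, the spread of `obs` reflects both posterior uncertainty in the population parameters and the school-to-school variation governed by tau.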




---



### Our First Prompt

You can reference OpenAI's [documentation](https://platform.openai.com/docs/api-reference/authentication?lang=python) if you get stuck!

Let's create a `ChatCompletion` model to kick things off!

There are three "roles" available to use:

- `system`
- `assistant`
- `user`

OpenAI provides some context for these roles [here](https://help.openai.com/en/articles/7042661-chatgpt-api-transition-guide)

Let's just stick to the `user` role for now and send our first message to the endpoint!

If we check the documentation, we'll see that the endpoint expects the conversation as a list of message objects, so we'll be sure to do that!

##### Helper Functions

In [None]:
import openai  # openai.ChatCompletion is the pre-1.0 interface of the openai package
from IPython.display import display, Markdown

def get_response(messages: list, model: str = "gpt-3.5-turbo"):
    # messages is a list of {"role": ..., "content": ...} dicts
    return openai.ChatCompletion.create(
        model=model,
        messages=messages
    )

def system_prompt(message: str) -> dict:
    return {"role": "system", "content": message}

def assistant_prompt(message: str) -> dict:
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    return {"role": "user", "content": message}

def pretty_print(response) -> None:
    # Render the content of the first returned choice as Markdown
    display(Markdown(response["choices"][0]["message"]["content"]))
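
As a minimal sketch of how these helpers fit together, the example below builds the list of message objects that the chat endpoint expects, without making any API call. The two helpers are repeated so the snippet runs on its own, and the prompt texts are made up for illustration.

```python
def system_prompt(message: str) -> dict:
    return {"role": "system", "content": message}

def user_prompt(message: str) -> dict:
    return {"role": "user", "content": message}

# The conversation is passed as a list of role/content dicts, in order.
messages = [
    system_prompt("You are a concise assistant."),  # illustrative text
    user_prompt("What is NumPyro?"),                # illustrative text
]

for m in messages:
    print(f"{m['role']}: {m['content']}")
```

A list like `messages` is what would be handed to `get_response`, with the result then passed to `pretty_print`.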