# Jim's Machine Learning Testbed



## Jupyter Notes
Put your cursor into the cell and press Shift+Enter to execute it and select the next one, or click the 'Run Cell' button.

Press Shift twice (Double Shift) to search everywhere for classes, files, tool windows, actions, and settings.

To learn more about Jupyter Notebooks in PyCharm, see [help](https://www.jetbrains.com/help/pycharm/ipython-notebook-support.html).
For an overview of PyCharm, go to Help -> Learn IDE features or refer to [our documentation](https://www.jetbrains.com/help/pycharm/getting-started.html).

# Basic Python interpreter check

In [2]:
# Does anything work?
print("Hello World!")

# IPython (Jupyter) magic: %env lists the kernel's environment variables
%env



Hello World!


{'HOMEBREW_PREFIX': '/opt/homebrew',
 'MANPATH': '/opt/homebrew/share/man::',
 'COMMAND_MODE': 'unix2003',
 'INFOPATH': '/opt/homebrew/share/info:',
 'SHELL': '/bin/zsh',
 'PYTHONPATH': '/Users/jcoles/Source/jkc/testbeds/jims-ml-sandbox:/Users/jcoles/Applications/DataSpell.app/Contents/plugins/python-ce/helpers/pydev:/Users/jcoles/Applications/DataSpell.app/Contents/plugins/python-ce/helpers-pro/jupyter_debug',
 '__CFBundleIdentifier': 'com.jetbrains.dataspell',
 'TMPDIR': '/var/folders/xc/32y9xdg50zv0b_kvssbpfc_00000gn/T/',
 'LC_ALL': 'en_US.UTF-8',
 'JBOSS_HOME': '/Users/jcoles/java/wildfly-29.0.0.Final',
 'PKG_CONFIG_PATH': '/Users/jcoles/.opam/default/lib/pkgconfig:',
 'TOOLBOX_VERSION': '2.5.3.37797',
 'DISPLAY': '/private/tmp/com.apple.launchd.M4apHXqJbV/org.macosforge.xquartz:0',
 'HOME': '/Users/jcoles',
 'HOMEBREW_REPOSITORY': '/opt/homebrew',
 'OPAMNOENVNOTICE': 'true',
 'PATH': '/Users/jcoles/Source/jkc/testbeds/jims-ml-sandbox/.venv/bin:/Users/jcoles/Library/pnpm:/Users/jco

Check a core Python package, NumPy:

In [3]:
import numpy as np
np.add(1, 2)

3
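`np.add` is a ufunc, so the same call also works elementwise on whole arrays and broadcasts a scalar or a smaller array across a larger one. A short sketch with made-up values:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Elementwise: adds values at matching positions
elementwise = np.add(a, b)        # [11, 22, 33]

# Broadcasting a scalar: 100 is added to every element
plus_scalar = np.add(a, 100)      # [101, 102, 103]

# Broadcasting a row vector across each row of a 2x3 matrix
matrix = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
row_added = np.add(matrix, a)        # [[1, 3, 5], [4, 6, 8]]
print(row_added)
```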

---
# ML Engine #1: Google `TensorFlow`

A sanity-check call to `tensorflow` to confirm that it finds and loads the library. NOTE: this only adds two scalars; it does not exercise any real tensor processing:

In [5]:
import tensorflow as tf
tf.add(1, 5).numpy()

6
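Because `tf.add` also runs on the CPU, it does not show whether TensorFlow found a GPU. A sketch of an explicit device check with `tf.config.list_physical_devices`. On an Apple Silicon Mac this assumes the `tensorflow-metal` plugin is installed (the Metal log lines printed by the model cell further down suggest it is); on a CPU-only install the list is simply empty:

```python
import tensorflow as tf

# List the GPUs TensorFlow can see (an empty list means CPU-only)
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {len(gpus)}")
for gpu in gpus:
    # Each entry is a PhysicalDevice with .name and .device_type
    print(gpu.name, gpu.device_type)
```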

A TensorFlow "hello world": a string constant tensor, read back with `.numpy()`:

In [12]:
hello = tf.constant('Hello, TensorFlow!')
hello.numpy()

b'Hello, TensorFlow!'
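The `b'...'` prefix appears because TensorFlow string tensors store raw bytes. To get an ordinary Python `str`, decode the bytes; a minimal sketch, assuming the text is UTF-8:

```python
import tensorflow as tf

hello = tf.constant("Hello, TensorFlow!")

# String tensors hold raw bytes, so .numpy() returns a bytes object
raw = hello.numpy()

# Decode to a regular Python str (assumes UTF-8 encoded bytes)
text = raw.decode("utf-8")
print(text)  # Hello, TensorFlow!
```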

Next, import the `tensorflow_datasets` Python package, a convenient source of ready-to-use datasets (this cell only imports it):

In [6]:
# Import tensorflow_datasets
import tensorflow_datasets



# Test Problem 1:
## Create our own dataset: Customer product purchase predictor

In [5]:
import tensorflow as tf

# Sample synthetic data
users = tf.constant([f'user_{i}' for i in range(100)])
products = tf.constant([f'product_{i}' for i in range(50)])

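# Generate random purchase data
purchases = tf.random.uniform((100, 10), minval=0, maxval=50, dtype=tf.int32)
# NOTE: the next line raises the InvalidArgumentError shown below: tf.map_fn assumes
# the mapped function returns the input dtype (int32), but tf.gather returns strings here.
purchases = tf.map_fn(lambda row: tf.gather(products, row), purchases)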

# Create dataset of (user, purchases) for training
dataset = tf.data.Dataset.from_tensor_slices((users, purchases))

# Define the model
user_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="user")
product_input = tf.keras.layers.Input(shape=(None,), dtype=tf.string, name="products_purchased")
embedding_layer = tf.keras.layers.Embedding(input_dim=50, output_dim=8, mask_zero=True)
product_embedding = embedding_layer(product_input)
pooled_product_embedding = tf.keras.layers.GlobalAveragePooling1D()(product_embedding)
concatenated = tf.keras.layers.Concatenate()([tf.keras.layers.Embedding(input_dim=100, output_dim=8)(tf.strings.to_hash_bucket_fast(user_input, num_buckets=100)), pooled_product_embedding])
output = tf.keras.layers.Dense(10, activation="softmax")(concatenated)  # Predict 10 recommended products

model = tf.keras.Model(inputs=[user_input, product_input], outputs=output)

# Compile the model
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Create labels for training (synthetic labels for recommendations)
labels = tf.random.uniform((100,), minval=0, maxval=50, dtype=tf.int32)
train_dataset = dataset.map(lambda u, p: ({'user': u, 'products_purchased': p}, tf.random.uniform(shape=(10,), minval=0, maxval=50, dtype=tf.int32)))

# Train the model
model.fit(train_dataset.batch(10), epochs=5)

2025-02-10 15:17:40.684566: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1 Pro
2025-02-10 15:17:40.684600: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 32.00 GB
2025-02-10 15:17:40.684606: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 10.67 GB
2025-02-10 15:17:40.684622: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2025-02-10 15:17:40.684635: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)


InvalidArgumentError: TensorArray dtype is int32 but Op is trying to write dtype string 
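The error comes from the `tf.map_fn` line: `map_fn` assumes the mapped function returns the same dtype as its input (`int32`), but `tf.gather(products, row)` returns strings. Fixing that exposes further problems in the cell: the `Embedding` layers receive raw strings instead of integer ids, and the random labels have shape `(10,)` while `sparse_categorical_crossentropy` expects one integer class per example. A sketch of a corrected version, assuming TensorFlow 2.x with its bundled Keras. Using `Hashing`/`StringLookup` for the string-to-id step, and predicting one product id per user (then taking the top 10 scores), are substitutions in this sketch, not the original design:

```python
import tensorflow as tf

NUM_USERS, NUM_PRODUCTS, HISTORY_LEN, EMBED_DIM = 100, 50, 10, 8

# Synthetic data, as in the cell above
users = tf.constant([f"user_{i}" for i in range(NUM_USERS)])
product_vocab = [f"product_{i}" for i in range(NUM_PRODUCTS)]
products = tf.constant(product_vocab)

# Fix 1: tf.gather accepts the whole (100, 10) index matrix, so map_fn is not needed
# (alternatively, keep map_fn and pass fn_output_signature=tf.string)
purchase_ids = tf.random.uniform(
    (NUM_USERS, HISTORY_LEN), minval=0, maxval=NUM_PRODUCTS, dtype=tf.int32)
purchases = tf.gather(products, purchase_ids)  # string tensor, shape (100, 10)

# Fix 2: one integer label per user in [0, NUM_PRODUCTS), drawn once,
# which is what sparse_categorical_crossentropy expects
labels = tf.random.uniform((NUM_USERS,), minval=0, maxval=NUM_PRODUCTS, dtype=tf.int32)
train_dataset = tf.data.Dataset.from_tensor_slices(
    ({"user": users, "products_purchased": purchases}, labels))

user_input = tf.keras.Input(shape=(), dtype="string", name="user")
product_input = tf.keras.Input(shape=(None,), dtype="string", name="products_purchased")

# Fix 3: Embedding layers need integer ids, so map the strings to ids first.
# Hashing: user string -> bucket in [0, NUM_USERS) (collisions are possible)
user_ids = tf.keras.layers.Hashing(num_bins=NUM_USERS)(user_input)
# StringLookup: known product -> 1..NUM_PRODUCTS, unknown -> 0 (out-of-vocabulary)
product_ids = tf.keras.layers.StringLookup(vocabulary=product_vocab)(product_input)

user_vec = tf.keras.layers.Embedding(NUM_USERS, EMBED_DIM)(user_ids)
product_vecs = tf.keras.layers.Embedding(NUM_PRODUCTS + 1, EMBED_DIM)(product_ids)
pooled = tf.keras.layers.GlobalAveragePooling1D()(product_vecs)
concatenated = tf.keras.layers.Concatenate()([user_vec, pooled])

# Fix 4: one softmax unit per product, so every label (a product id) is a valid class
output = tf.keras.layers.Dense(NUM_PRODUCTS, activation="softmax")(concatenated)

model = tf.keras.Model(
    inputs={"user": user_input, "products_purchased": product_input}, outputs=output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

history = model.fit(train_dataset.batch(10), epochs=5, verbose=0)

# Top-10 product ids for the first user (random data, so the ranking is not meaningful)
probs = model({"user": users[:1], "products_purchased": purchases[:1]}, training=False)
top10 = tf.math.top_k(probs, k=10).indices.numpy()[0]
print(top10)
```

With real data, the labels would come from actual next purchases rather than `tf.random.uniform`.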

----
# ML Engine #2: OpenAI
The OpenAI API requires a paid API key. Set the key in the `OPENAI_API_KEY` environment variable, e.g.:

`export OPENAI_API_KEY=sk-proj-12fKGWJW ...`
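`OpenAI()` reads `OPENAI_API_KEY` from the environment by default. A notebook kernel inherits its environment from whatever launched it (here, the IDE), so a key exported in a shell afterwards may not be visible. A small guard that fails early with a clear message; `require_api_key` is a hypothetical helper, not part of the `openai` package:

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: return the API key from the environment, or fail clearly."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before launching the IDE or Jupyter."
        )
    return key
```

Usage: `client = OpenAI(api_key=require_api_key())`.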

A "hello world" _Chat Completion_ using the OpenAI API:

In [4]:
from openai import OpenAI
client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            # "content": "Write a haiku about recursion in programming."
            "content": "Hello, OpenAI."
        }
    ]
)

print(completion.choices[0].message)

ChatCompletionMessage(content='Hello! How can I assist you today?', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None)
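The `messages` argument is a list of role/content dictionaries: the `system` message sets the assistant's behavior and the `user` message carries the prompt (the commented-out haiku prompt above is an alternative). A small hypothetical helper that keeps this structure in one place:

```python
def build_messages(user_prompt: str,
                   system_prompt: str = "You are a helpful assistant.") -> list[dict[str, str]]:
    """Hypothetical helper: build the `messages` list for a chat completion request."""
    return [
        {"role": "system", "content": system_prompt},  # sets the assistant's behavior
        {"role": "user", "content": user_prompt},      # the actual prompt
    ]


messages = build_messages("Write a haiku about recursion in programming.")
```

Pass it as `messages=build_messages(...)` to `client.chat.completions.create`, and read just the reply text with `completion.choices[0].message.content` instead of printing the whole `ChatCompletionMessage`.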
