In [1]:
%load_ext autoreload
%autoreload 2

# Small-LLM (Locomotion Language Model)

### Q: Can textual language models understand / reason about physics and locomotion?

Animals have an implicit understanding of physics and locomotion.

![Giraffe walking](media/leg-giraffe.gif)


## (1) In-context prompt learning + Model Predictive Control

Prompt GPT with the current state at each step; the state then evolves according to the action GPT generates.
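The closed-loop prompting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in `gpt_wrapper.main`: `format_prompt`, `parse_action`, `query_model`, `toy_step`, and `rollout` are hypothetical names, the language model is replaced by a stub that returns a fixed reply, and the environment is a toy stand-in rather than MuJoCo HalfCheetah. Clipping to [-1, 1] is an assumption based on HalfCheetah's action bounds.

```python
import re

ACTION_DIM = 6  # HalfCheetah has 6 actuated joints


def format_prompt(history, state):
    """Render past (state, action) pairs plus the current state as text."""
    lines = []
    for past_state, past_action in history:
        lines.append(f"State: {[round(x, 3) for x in past_state]}")
        lines.append(f"Action: {[round(x, 3) for x in past_action]}")
    lines.append(f"State: {[round(x, 3) for x in state]}")
    lines.append("Action:")
    return "\n".join(lines)


def parse_action(text, dim=ACTION_DIM, low=-1.0, high=1.0):
    """Extract the first `dim` numbers from the reply and clip them to [low, high]."""
    pattern = r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?"
    numbers = [float(tok) for tok in re.findall(pattern, text)]
    numbers = (numbers + [0.0] * dim)[:dim]  # pad with zeros if the reply is short
    return [min(high, max(low, x)) for x in numbers]


def query_model(prompt):
    """Stub standing in for a real LLM call (assumption); always returns the same reply."""
    return "[-35.625 -35.625 0. 0. 5. -1.]"


def toy_step(state, action):
    """Toy dynamics (not MuJoCo): nudge the first len(action) state entries."""
    nudged = [s + 0.1 * a for s, a in zip(state, action)]
    return nudged + state[len(action):]


def rollout(initial_state, max_steps=3, context_len=2):
    """Re-prompt the model at every step with the latest state and recent history."""
    state, history, actions = list(initial_state), [], []
    for _ in range(max_steps):
        prompt = format_prompt(history[-context_len:], state)
        action = parse_action(query_model(prompt))
        history.append((state, action))
        actions.append(action)
        state = toy_step(state, action)
    return actions
```

Parsing defensively matters here because a text model can emit values far outside the valid action range, so the sketch clips every parsed number before it reaches the environment.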


In [2]:
import os
import random

from gpt_wrapper.main import mpc_with_gpt
from visualize import replay_offscreen

# Generate actions by prompting GPT for 40 steps
np_actions = mpc_with_gpt(max_steps=40)

# Render the actions offscreen and save the result as a video
out_dir = "/home/ubuntu/small-llm/test-decision-transformer/saved_vids"
out_path = os.path.join(out_dir, f"gptwrapper_cheetah_{random.randint(0, 100000)}.mp4")
replay_offscreen('mujoco/halfcheetah/expert-v0', np_actions, out_path=out_path)

Step 40/40
Current state: [  0.139   0.707  -0.353  -0.125  -0.088  -0.096   0.405  -0.209   1.756
   1.789   3.842   8.253  -9.059   9.628   0.851 -14.301  -5.83 ]
GPT generated action: [-35.625 -35.625   0.      0.      5.     -1.   ]

<<< Saved video to /home/ubuntu/small-llm/test-decision-transformer/saved_vids/gptwrapper_cheetah_26009.mp4 >>>



Demo videos

<video width="640" height="480" controls>
  <source src="saved_vids/gptwrapper_cheetah_46742.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>

<video width="640" height="480" controls>
  <source src="saved_vids/gptwrapper_cheetah_66609.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>

<video width="640" height="480" controls>
  <source src="saved_vids/gptwrapper_cheetah_83689.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>




## (2) Fine-tuned small LLM (Pythia-410M)

We freeze the entire pretrained model and train only the linear encoder and decoder layers (4M trainable parameters).
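The freezing pattern can be sketched in PyTorch as follows. This is an illustration under stated assumptions rather than the notebook's training code: `FrozenBackbonePolicy` is a hypothetical name, the backbone is a small stand-in module instead of Pythia-410M, and `HIDDEN` is a placeholder size, so the parameter counts printed here do not correspond to the 4M figure above.

```python
import torch
from torch import nn

HIDDEN = 64      # placeholder; the real backbone's hidden size is larger
STATE_DIM = 17   # HalfCheetah observation size
ACTION_DIM = 6   # HalfCheetah action size


class FrozenBackbonePolicy(nn.Module):
    """Trainable linear encoder/decoder wrapped around a frozen backbone."""

    def __init__(self, backbone):
        super().__init__()
        self.encoder = nn.Linear(STATE_DIM, HIDDEN)   # trainable
        self.backbone = backbone                      # frozen below
        self.decoder = nn.Linear(HIDDEN, ACTION_DIM)  # trainable
        for param in self.backbone.parameters():
            param.requires_grad_(False)

    def forward(self, states):
        return self.decoder(self.backbone(self.encoder(states)))


# Stand-in for the pretrained transformer (assumption): any HIDDEN -> HIDDEN module.
backbone = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.GELU(), nn.Linear(HIDDEN, HIDDEN))
policy = FrozenBackbonePolicy(backbone)

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in policy.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

n_trainable = sum(p.numel() for p in trainable)
n_frozen = sum(p.numel() for p in policy.parameters() if not p.requires_grad)
print(f"trainable={n_trainable}, frozen={n_frozen}")
```

Gradients still flow through the frozen backbone's activations, so the encoder receives a training signal even though the backbone weights never change.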

In [8]:
from visualize import viz_driver

# Note: we can condition on our target reward 
viz_driver("pythia", target_rew=300)


<<< Saved video to /home/ubuntu/small-llm/test-decision-transformer/saved_vids/pythia_targetreward_300_cheetah_66979.mp4 >>>



Demo videos with different reward conditions


<div style="display: flex; justify-content: space-around; align-items: flex-start; flex-wrap: nowrap; overflow-x: auto;">
  <div style="text-align: center; min-width: 300px; margin: 0 10px;">
    <h4>Reward Target: 600</h4>
    <video width="640" height="480" controls>
      <source src="saved_vids/pythia_targetreward_600_cheetah_81075.mp4" type="video/mp4"> 
      Your browser does not support the video tag.
    </video>
  </div>
  
  <div style="text-align: center; min-width: 300px; margin: 0 10px;">
    <h4>Reward Target: 1200</h4>
    <video width="640" height="480" controls>
      <source src="saved_vids/pythia_targetreward_1200_cheetah_68985.mp4" type="video/mp4">
      Your browser does not support the video tag.
    </video>
  </div>
  
  <div style="text-align: center; min-width: 300px; margin: 0 10px;">
    <h4>Reward Target: 2400</h4>
    <video width="640" height="480" controls>
      <source src="saved_vids/pythia_targetreward_2400_cheetah_31850.mp4" type="video/mp4">
      Your browser does not support the video tag.
    </video>
  </div>
</div>

## (3) Train GPT2 from scratch

Following *Decision Transformer* (Chen et al., 2021), we train a GPT-2 decoder model (700K parameters) from scratch.
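Decision Transformer conditions each prediction on the return-to-go, the sum of rewards from the current step to the end of the episode. The sketch below shows how those targets can be computed from a logged trajectory and how a target is decremented during evaluation. The function names are hypothetical, and whether `viz_driver` handles `target_rew` exactly this way is an assumption.

```python
def returns_to_go(rewards):
    """Return-to-go at step t is the sum of rewards from t to the end of the episode."""
    rtg, running = [], 0.0
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    return rtg[::-1]


def update_target(target_return, reward):
    """At evaluation time, subtract each observed reward from the remaining target."""
    return target_return - reward


# Training: every timestep of a logged episode gets its own return-to-go token.
print(returns_to_go([1.0, 2.0, 3.0]))

# Evaluation: start from the desired return and decrement it as rewards arrive.
print(update_target(1200.0, 4.5))
```

Choosing a larger initial target asks the model to imitate higher-return behaviour, which is what the different reward targets in the demo videos vary.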

In [15]:
from visualize import viz_driver

# Note: we can condition on our target reward 
viz_driver("dt", target_rew=1200)


<<< Saved video to /home/ubuntu/small-llm/test-decision-transformer/saved_vids/dt_targetreward_1200_cheetah_13587.mp4 >>>



Demo videos with different reward conditions

<div style="display: flex; justify-content: space-around; align-items: flex-start; flex-wrap: nowrap; overflow-x: auto;">
  <div style="text-align: center; min-width: 320px; margin: 0 10px;">
    <h4>Reward Target: 300</h4>
    <video width="640" height="480" controls>
      <source src="saved_vids/dt_targetreward_300_cheetah_56626.mp4" type="video/mp4">
      Your browser does not support the video tag.
    </video>
  </div>
  
  <div style="text-align: center; min-width: 320px; margin: 0 10px;">
    <h4>Reward Target: 600</h4>
    <video width="640" height="480" controls>
      <source src="saved_vids/dt_targetreward_600_cheetah_58199.mp4" type="video/mp4">
      Your browser does not support the video tag.
    </video>
  </div>
  
  <div style="text-align: center; min-width: 320px; margin: 0 10px;">
    <h4>Reward Target: 1200</h4>
    <video width="640" height="480" controls>
      <source src="saved_vids/dt_targetreward_1200_cheetah_44888.mp4" type="video/mp4">
      Your browser does not support the video tag.
    </video>
  </div>
</div>

## Evaluation comparison

Comparison of the fine-tuned frozen LLM (Pythia-410M) with the GPT-2 model trained from scratch.


![Model comparison](media/model_comparison.png)