In [1]:
!pip install -U datasets huggingface_hub fsspec

Collecting fsspec
  Using cached fsspec-2025.5.1-py3-none-any.whl.metadata (11 kB)


In [2]:
import torch
import torch.nn as nn
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset, Dataset, concatenate_datasets
import json
import os
import accelerate

In [3]:
from google.colab import drive
# Mount at /content/drive so that model_dir (/content/drive/MyDrive/Models) resolves to Google Drive.
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
# data_dir = "/content/drive/MyDrive/Data"
model_dir = "/content/drive/MyDrive/Models"
model_dir

'/content/drive/MyDrive/Models'

In [5]:
# ====== Config ======
# model_name = "microsoft/phi-1"
model_name = "Salesforce/codegen-350M-mono"
output_dir = model_dir + "/rocketry-roqeto-model"

In [7]:
# ====== Load Dataset ======
def load_json_dataset1(data):
    # Use the argument instead of the global; keep the first 3,500 "spacesystems" records.
    records = data["train"]["spacesystems"][0:3500]
    return Dataset.from_list([{
        "text": f"Question: {item['question']}\nAnswer: {item['answer']}"
    } for item in records])

def load_json_dataset2(data):
    return Dataset.from_list([{
        "text": f"Question: {item['question']}\nAnswer: {item['answer']}"
    } for item in data["train"]])
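Both loaders apply the same text template to every record. As a minimal sketch, the template could be factored into one helper; `format_qa` is a hypothetical name and the sample record is made up for illustration, not taken from either dataset.

```python
def format_qa(item):
    """Render one QA record in the "Question: ...\\nAnswer: ..." layout used for training."""
    return f"Question: {item['question']}\nAnswer: {item['answer']}"

# Made-up record, only to show the resulting string.
sample = {
    "question": "What is specific impulse?",
    "answer": "Thrust per unit propellant weight flow.",
}
text = format_qa(sample)
print(text)
```

Each loader could then build its list with `{"text": format_qa(item)}`, which keeps the two datasets in exactly the same format.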

In [8]:
dataset1 = load_dataset("shaddie/space_systems_qas_dataset")
data1 = load_json_dataset1(dataset1)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [9]:
data1

Dataset({
    features: ['text'],
    num_rows: 3500
})

In [10]:
dataset2 = load_dataset("shaddie/rocketry_qas_dataset")
data2 = load_json_dataset2(dataset2)

In [11]:
data2

Dataset({
    features: ['text'],
    num_rows: 757
})

In [12]:
dataset = concatenate_datasets([data1, data2])

In [13]:
dataset

Dataset({
    features: ['text'],
    num_rows: 4257
})

In [14]:
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

Some weights of the model checkpoint at Salesforce/codegen-350M-mono were not used when initializing CodeGenForCausalLM: ['transformer.h.0.attn.causal_mask', 'transformer.h.1.attn.causal_mask', 'transformer.h.10.attn.causal_mask', 'transformer.h.11.attn.causal_mask', 'transformer.h.12.attn.causal_mask', 'transformer.h.13.attn.causal_mask', 'transformer.h.14.attn.causal_mask', 'transformer.h.15.attn.causal_mask', 'transformer.h.16.attn.causal_mask', 'transformer.h.17.attn.causal_mask', 'transformer.h.18.attn.causal_mask', 'transformer.h.19.attn.causal_mask', 'transformer.h.2.attn.causal_mask', 'transformer.h.3.attn.causal_mask', 'transformer.h.4.attn.causal_mask', 'transformer.h.5.attn.causal_mask', 'transformer.h.6.attn.causal_mask', 'transformer.h.7.attn.causal_mask', 'transformer.h.8.attn.causal_mask', 'transformer.h.9.attn.causal_mask']
- This IS expected if you are initializing CodeGenForCausalLM from the checkpoint of a model trained on another task or with another architecture (e

In [16]:
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

1

In [17]:
# Tokenize the dataset
def tokenize(example):
    # print(f'example {example["text"]}')
    return tokenizer(example["text"], truncation=True, padding="max_length", max_length=512)

tokenized_dataset = dataset.map(tokenize, batched=True)


In [18]:
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
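With `mlm=False`, the collator builds causal-LM labels by copying `input_ids` and replacing every pad position with -100, so the padding added by `padding="max_length"` does not contribute to the loss. The pure-Python sketch below mimics that rule on a toy sequence; `mask_pad_labels` is a hypothetical helper and the token ids, including the pad id, are illustrative assumptions.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

def mask_pad_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, replacing pad positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]

pad_id = 50295                               # illustrative pad id, not read from the tokenizer
padded = [2061, 318, 257, pad_id, pad_id]    # toy sequence padded to length 5
labels = mask_pad_labels(padded, pad_id)
print(labels)
```

This is one reason to add a dedicated `[PAD]` token: if the pad token were the same id as a real token, that real token's positions would be masked as well.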

In [19]:
# ====== Training Arguments ======
training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=4,
    num_train_epochs=5,
    logging_steps=150,
    save_steps=50,
    save_total_limit=2,
    warmup_steps=10,
    weight_decay=0.01,
    logging_dir=os.path.join(output_dir, "logs"),
    fp16=torch.cuda.is_available(),
    report_to="none",
)

In [20]:
# ====== Trainer ======
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)


In [21]:
# ====== Train ======
trainer.train()

Step,Training Loss
150,3.5958
300,3.4638
450,3.5018
600,3.4079
750,3.3706
900,3.3436
1050,3.3583
1200,2.2986
1350,2.3033
1500,2.2469



TrainOutput(global_step=5325, training_loss=1.497214349916843, metrics={'train_runtime': 4080.5307, 'train_samples_per_second': 5.216, 'train_steps_per_second': 1.305, 'total_flos': 1.989635311927296e+16, 'train_loss': 1.497214349916843, 'epoch': 5.0})
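The figures in `TrainOutput` can be cross-checked against the configuration. The sketch below assumes a single device and the default `gradient_accumulation_steps=1`, neither of which is stated explicitly in the notebook.

```python
import math

num_examples = 4257         # rows in the concatenated dataset
batch_size = 4              # per_device_train_batch_size
num_epochs = 5              # num_train_epochs
train_runtime = 4080.5307   # seconds, from TrainOutput

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
samples_per_second = num_examples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(steps_per_epoch, total_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

Under those assumptions the first epoch ends at step 1065, so the drop in logged loss between steps 1050 and 1200 falls just after the first epoch boundary.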

In [34]:
def query_model(prompt, max_new_tokens=368):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
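The training text follows a "Question: ...\nAnswer: ..." template, so prompts that end with "Answer:" match what the model saw during fine-tuning. A minimal sketch of two helpers for this; `build_prompt` and `extract_answer` are hypothetical names, and the decoded string is a stand-in for what `query_model(prompt)` would return.

```python
def build_prompt(question):
    """Wrap a question in the same template used for the training text."""
    return f"Question: {question}\nAnswer:"

def extract_answer(decoded, prompt):
    """Drop the echoed prompt from the decoded output, if it is present."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    return decoded.strip()

prompt = build_prompt("What is delta-v?")
# Stand-in for query_model(prompt); no model is called in this sketch.
decoded = prompt + " The change in velocity a maneuver requires."
answer = extract_answer(decoded, prompt)
print(answer)
```

Whether this improves the answers depends on the fine-tuned model; it only removes one mismatch between training and inference inputs.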

In [35]:
print(query_model("How would you launch a rocket with optimal fuel for reaching space?"))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


How would you launch a rocket with optimal fuel for reaching space?
I am interested in if the optimal fuel for reaching space is more like 'gauge point' than the rocket itself. I am talking about the rocket equation: 
 Space is:  $\Delta v = v_e \cdot g_0$  where  $g_0$  is the fuel mass available to produce a delta v,  $v_e$  is the exhaust velocity (when multiplied by g_0 to give exhaust velocity as a function of time), and  $g_0$  is the fuel mass available to produce a delta v,  $v_e$  is the exhaust velocity (when multiplied by g_0 to give exhaust velocity as a function of time). 
 The figure below is for  $v_e$  and  $g_0$  are functions of time. This makes the figure  $\Delta v = v_e \cdot g_0$  but  $g_0$  is the fuel mass available to produce a delta v,  $v_e$  is the exhaust velocity (when multiplied by g_0 to give exhaust velocity as a function of time), and  $g_0$  is the fuel mass available to produce a delta v,  $v_e$  is the exhaust velocity (when multiplied by g_0 to gi

In [37]:
# Example usage
print(query_model("Question: Can you describe how you would construct a habitable space station in geosynchronous orbit?"))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question: Can you describe how you would construct a habitable space station in geosynchronous orbit?
If you could construct a “ habitation space station” in geosynchronous orbit (having the satellite be a point mass) and one day has passed between the point and the end of the cycle, then you could describe how you would place a “ habitation planet” in geosynchronous orbit.
Answer: The habitation and the Earth observation planets and moons might be in geosynchronous orbit. At the end of the cycle the satellite would be point mass. In this case the mass of the Earth and the Sun would be around 7.7 kg. Thus, the mass of the space station would be around 7.8 kg and the mass of the planet and moon might be around 3.5 kg. 
 From the first video (see below) of the Starter Navigation System, the planets and moons could be in geosynchronous orbit, such as the Earth, the Sun and the Moon. 
 https://www.youtube.com/shorts/GI6IevFWR0k 
 https://www.youtube.com/shorts/GI6IevFWR0k 
 https://www.you

In [47]:
print(query_model("How would you design or create artificial gravity in a space-ship traveliing in outer space?"))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


How would you design or create artificial gravity in a space-ship traveliing in outer space?
I've been doing some research on this but very much struggling to work it all out. I'm looking for a cutaway diagram of said rocket, but I can't for the life of me work out what rocket uses what propellant. Can anyone help? 
 In terms of "cutting-edge" rockets, I'm trying to put most of the thrust into the thrust vector, as cutaway diagrammedicine, but the vector is too small to be a complete spin-polarized rocket. 
 As pasted into the Cutaway diagram, were the direct payloads used in the 2000's? 
 I have found a cutaway diagram of an elliptical rocket, which was used in the 2000s to put much of the payload into the thrust vector. I can't for the life of me work out what rocket uses what propellant, but the payload, and the other things being equal. 
 What I am looking for is that the vector of thrust needs to be smaller than the vector of payload, so I'll put that in the thrust vector. I can't
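The completions above repeat the same phrases, which is common with greedy decoding. The sketch below collects `generate` arguments that are often used to reduce repetition. The values are untuned assumptions, `with_overrides` is a hypothetical helper, and `query_model` would have to forward these arguments to `model.generate` for them to take effect.

```python
# Untuned, illustrative values; each key is a standard `generate` argument in transformers.
generation_kwargs = {
    "max_new_tokens": 368,        # same budget as query_model's default
    "do_sample": True,            # sample instead of greedy decoding
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.2,    # penalize tokens that were already generated
    "no_repeat_ngram_size": 3,    # forbid repeating any 3-gram
}

def with_overrides(base, **overrides):
    """Return a copy of base with the given keys replaced."""
    merged = dict(base)
    merged.update(overrides)
    return merged

short_kwargs = with_overrides(generation_kwargs, max_new_tokens=128)
print(short_kwargs["max_new_tokens"])
# Usage inside the notebook (not run here): model.generate(**inputs, **generation_kwargs)
```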

In [None]:
# ====== Save Model ======
# trainer.save_model(output_dir)
# tokenizer.save_pretrained(output_dir)

In [None]:
from huggingface_hub import notebook_login

notebook_login()

In [None]:
model.push_to_hub("shaddie/rocketry_roqeto_model",
                  token=True,  # replaces the deprecated use_auth_token argument
                  commit_message="fine-tuning-for-rocketry-knowledge",
                  private=True)