# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: I will use Low-Rank Adaptation (LoRA) for the parameter-efficient fine-tuning process in this project. Learning LoRA also provides a basis for understanding alternative PEFT techniques.
* Model: I selected openai-community/gpt2 as the model for this project because it is a small model (about 124 million parameters) and serves as a good starting point for learning the PEFT process.
* Evaluation approach: I will use the Hugging Face `Trainer`, which provides a high-level interface for training and evaluation, and compare accuracy on the test split before and after fine-tuning.
* Fine-tuning dataset: The dataset I selected for this project is SetFit/ag_news, which is labeled for classification of news summaries into four topics.
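
For reference, the LoRA update behind the first choice can be written compactly. This is the usual formulation of the method; the symbols are notation assumed here rather than names used in this notebook: $W_0$ is a frozen pretrained weight, $r$ the rank, and $\alpha$ the scaling factor.

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\quad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k}
$$

Only $A$ and $B$ are trained, so each adapted weight adds $r(d + k)$ trainable parameters instead of the $dk$ parameters of the full matrix.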

In [None]:
# ensure these libraries are installed.
# this should be handled by the Python venv and requirements.txt file.
! pip install -q "scikit-learn~=1.6" \
    "datasets==3.3.2" "huggingface-hub==0.29.1" \
    "transformers==4.49.0" "evaluate==0.4.3" \
    "peft==0.14.0"

In [1]:
# these definitions are reused throughout the project.
dataset_name = "SetFit/ag_news"
dataset_splits = ["train", "test"]

# labels in the dataset. see: https://huggingface.co/datasets/SetFit/ag_news
dataset_id2label={ 0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech" }
dataset_label2id={v:k for k, v in dataset_id2label.items()}

untuned_model_name = "openai-community/gpt2"

lora_tuned_path="./data/gpt2-lora-tuned"

print(f"{dataset_id2label=}")
print(f"{dataset_label2id=}")

dataset_id2label={0: 'World', 1: 'Sports', 2: 'Business', 3: 'Sci/Tech'}
dataset_label2id={'World': 0, 'Sports': 1, 'Business': 2, 'Sci/Tech': 3}


## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [2]:
from datasets import load_dataset

# the ag_news dataset is split into 120k train rows and 7.6k test rows.
dataset = load_dataset(dataset_name)

# reduce the record count to fit in memory (and run faster).
# the low record count also tests how well gpt2 handles classification
# when the training set is far smaller than what training a model
# from scratch would require.
dataset["train"] = dataset["train"].shuffle(seed=7).select(range(500))
dataset["test"] = dataset["test"].shuffle(seed=11).select(range(100))

# view the dataset characteristics
print("train =", dataset["train"])
print("train[0] =", dataset["train"][0])
print("--------------------")
print("test =", dataset["test"])
print("test[0] =", dataset["test"][0])


train = Dataset({
    features: ['text', 'label', 'label_text'],
    num_rows: 500
})
train[0] = {'text': 'Air France-KLM Sales Rise 6.4 on Passenger Increase (Update1) Air France-KLM Group, Europe #39;s biggest airline, said second-quarter sales grew 6.4 percent as passengers generated more revenue.', 'label': 2, 'label_text': 'Business'}
--------------------
test = Dataset({
    features: ['text', 'label', 'label_text'],
    num_rows: 100
})
test[0] = {'text': "Alcoa Warns Earnings to Miss Forecasts (Reuters) Reuters - Alcoa Inc. , the world's largest\\aluminum producer, on Thursday warned that third-quarter\\results would fall far short of Wall Street expectations, hurt\\by plant shutdowns, restructuring costs and weakness in some\\markets.", 'label': 2, 'label_text': 'Business'}
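
Because only 500 training rows are sampled, it can be worth checking that the four classes are still reasonably balanced. The helper below is a minimal sketch of such a check; `label_distribution` and the example label list are hypothetical and stand in for the real `label` column.

```python
from collections import Counter

def label_distribution(labels, id2label):
    """Return {label_name: fraction} for an iterable of integer label ids."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {id2label[i]: counts.get(i, 0) / total for i in sorted(id2label)}

# Hypothetical label ids standing in for dataset["train"]["label"].
example_labels = [0, 1, 2, 3, 2, 2, 1, 0]
id2label = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}

distribution = label_distribution(example_labels, id2label)
print(distribution)
```

In this notebook the same helper could be called as `label_distribution(dataset["train"]["label"], dataset_id2label)`.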


In [3]:
from transformers import AutoTokenizer

# https://huggingface.co/docs/transformers/en/model_doc/gpt2#usage-tips
tokenizer = AutoTokenizer.from_pretrained(
    untuned_model_name,
    padding_side = "right",
)
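# information regarding gpt2 and padding in tokenizer
# see: https://medium.com/@prashanth.ramanathan/fine-tuning-a-pre-trained-gpt-2-model-and-performing-inference-a-hands-on-guide-57c097a3b810
# gpt2 has no pad token. the eos assignment below is immediately
# superseded by add_special_tokens, which registers a new '<|pad|>'
# token with id 50257.
# NOTE: id 50257 lies outside gpt2's 50257-row embedding table (valid ids
# are 0..50256), so a model using this tokenizer should call
# model.resize_token_embeddings(len(tokenizer)); alternatively, drop the
# add_special_tokens call and keep eos as the pad token.
tokenizer.pad_token=tokenizer.eos_token
tokenizer.add_special_tokens({'pad_token': '<|pad|>'})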

def row_processor(row):
    inputs = tokenizer(
        row["text"],
        truncation=True, 
        padding="max_length", 
        max_length=256,
    )
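    # NOTE: this copy is not the classification target. the collator
    # (DataCollatorWithPadding) renames the dataset's `label` column to
    # `labels` at batch time, which overwrites this value.
    inputs['labels'] = inputs['input_ids'].copy()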
    return inputs

tokenized_dataset = {}
for split in dataset_splits:
    tokenized_dataset[split] = dataset[split].map(row_processor, batched=True)

# inspect the special token ids
print(f"{tokenizer.eos_token_id=}")
print(f"{tokenizer.pad_token_id=}")

# inspect the columns in the tokenized dataset
print(f"{tokenized_dataset['train']=}")
print(f"{tokenized_dataset['test']=}")

tokenizer.eos_token_id=50256
tokenizer.pad_token_id=50257
tokenized_dataset['train']=Dataset({
    features: ['text', 'label', 'label_text', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 500
})
tokenized_dataset['test']=Dataset({
    features: ['text', 'label', 'label_text', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 100
})
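
The effect of `truncation=True`, `padding="max_length"` and `padding_side="right"` can be illustrated without a tokenizer. The function below is a simplified sketch, not the tokenizer's implementation; `pad_or_truncate` and the token ids are made up for illustration.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Mimic truncation plus right-side padding to a fixed length."""
    kept = token_ids[:max_length]
    n_pad = max_length - len(kept)
    return {
        "input_ids": kept + [pad_id] * n_pad,
        # 1 marks a real token, 0 marks padding the model should ignore.
        "attention_mask": [1] * len(kept) + [0] * n_pad,
    }

short_row = pad_or_truncate([11, 12, 13], max_length=5, pad_id=0)
long_row = pad_or_truncate([11, 12, 13, 14, 15, 16], max_length=5, pad_id=0)
print(short_row)
print(long_row)
```

Every row ends up with exactly `max_length` entries, which is why all rows in the tokenized dataset share the same shape.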


In [4]:
from transformers import AutoModelForSequenceClassification

untuned_model = AutoModelForSequenceClassification.from_pretrained(
    untuned_model_name,
    num_labels=len(dataset_id2label),
    id2label=dataset_id2label,
    label2id=dataset_label2id,
    pad_token_id=tokenizer.pad_token_id,
)

# freeze every parameter except the newly initialized score layer.
# see: https://huggingface.co/transformers/v4.2.2/training.html
for name, param in untuned_model.named_parameters():
    param.requires_grad = "score" in name

# ensure the score layer outputs 4 values.
print(untuned_model)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at openai-community/gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=4, bias=False)
)
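
The `score` head is applied to a single hidden state per sequence. As described in the Transformers documentation for `GPT2ForSequenceClassification`, the model classifies from the last token and, when `pad_token_id` is set in the config, uses the last non-padding token of each row, which is why `pad_token_id` is passed when loading the model. The sketch below illustrates that lookup for a right-padded row in plain Python; it is a simplified stand-in rather than the library's code, and the non-pad token ids are made up.

```python
def last_token_index(input_ids, pad_token_id):
    """Index of the last non-padding token in a right-padded sequence."""
    for position in range(len(input_ids) - 1, -1, -1):
        if input_ids[position] != pad_token_id:
            return position
    # The row is all padding; fall back to the final position.
    return len(input_ids) - 1

PAD = 50257  # pad id printed by the tokenizer cell in this notebook
padded_row = [464, 3290, 318, PAD, PAD]
full_row = [464, 3290, 318, 257, 922]

print(last_token_index(padded_row, PAD))
print(last_token_index(full_row, PAD))
```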


In [5]:
import numpy as np
import evaluate
from transformers import Trainer, TrainingArguments
from transformers import DataCollatorWithPadding

accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy_metric.compute(predictions=predictions, references=labels)

training_args=TrainingArguments(
    output_dir="./data/gpt2-untuned",
    # set the learning rate
    learning_rate=0.005,
    # set the per device train batch size and eval batch size
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    # evaluate and save the model after each epoch
    eval_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=3,
    weight_decay=0.01,
    load_best_model_at_end=True,
)
print(f"{training_args.device=}")

trainer = Trainer(
    model=untuned_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)


training_args.device=device(type='mps')
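
What `compute_metrics` does can be reproduced by hand: take the index of the largest logit per row and compare it with the label. The following is a minimal pure-Python sketch of that calculation; the logits and labels are hypothetical.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical logits for three examples over the four AG News classes.
example_logits = [
    [0.1, 2.3, -0.5, 0.0],   # highest score: class 1
    [1.7, 0.2, 0.9, -1.0],   # highest score: class 0
    [-0.3, 0.0, 0.4, 1.2],   # highest score: class 3
]
example_labels = [1, 2, 3]

result = accuracy(example_logits, example_labels)
print(result)
```

Two of the three hypothetical rows are classified correctly, so the sketch reports an accuracy of two thirds.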


In [6]:
# we need a training pass to set the weights of the score layer.
# the goal is to compare the base gpt2 model against a PEFT-adapted
# gpt2 model, so the score layer of the base model must be trained
# first.
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.468647,0.84
2,No log,0.479229,0.85
3,No log,0.425354,0.83


TrainOutput(global_step=96, training_loss=0.6579697529474894, metrics={'train_runtime': 46.3028, 'train_samples_per_second': 32.395, 'train_steps_per_second': 2.073, 'total_flos': 195976101888000.0, 'train_loss': 0.6579697529474894, 'epoch': 3.0})
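
The `global_step=96` in the output follows from the training settings: 500 rows at a batch size of 16 give 32 steps per epoch, over 3 epochs. The short check below assumes a single device and no gradient accumulation, which matches the arguments shown in this notebook.

```python
import math

def total_optimizer_steps(num_rows, batch_size, num_epochs):
    """Optimizer steps for one device with no gradient accumulation."""
    steps_per_epoch = math.ceil(num_rows / batch_size)
    return steps_per_epoch * num_epochs

steps = total_optimizer_steps(num_rows=500, batch_size=16, num_epochs=3)
print(steps)  # 96
```

This also explains the "No log" entries in the Training Loss column, assuming the `TrainingArguments` default of `logging_steps=500` (not overridden here): the run ends after 96 steps, before the first logging step is reached.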

In [7]:
# baseline accuracy: gpt2 with only the score layer trained (no PEFT).
eval_prediction = trainer.evaluate()
untuned_prediction_accuracy = eval_prediction["eval_accuracy"]
eval_prediction

{'eval_loss': 0.42535400390625,
 'eval_accuracy': 0.83,
 'eval_runtime': 2.1149,
 'eval_samples_per_second': 47.282,
 'eval_steps_per_second': 3.31,
 'epoch': 3.0}

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [28]:
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

# these two steps need to be in the same cell because get_peft_model
# modifies untuned_model in place, which changes the target module names.

untuned_model = AutoModelForSequenceClassification.from_pretrained(
    untuned_model_name,
    num_labels=len(dataset_id2label),
    id2label=dataset_id2label,
    label2id=dataset_label2id,
    pad_token_id=tokenizer.pad_token_id,
)

# untuned_model.score.weight.data = score_weight_tensor
# print(untuned_model.score.weight.clone())

# freeze all the parameters of the base model. the score layer is not
# part of base_model; it is trained through modules_to_save below.
# see: https://huggingface.co/transformers/v4.2.2/training.html
for param in untuned_model.base_model.parameters():
    param.requires_grad = False

# for name, module in untuned_model.base_model.named_modules():
#     print(name, type(module))

print("untuned_model...")
print(untuned_model)

# the following article was used as a source for understanding how and
# why rank (r) is set, and the relationship between rank and alpha
# in LoRA.
# https://medium.com/@fartypantsham/what-rank-r-and-alpha-to-use-in-lora-in-llm-1b4f025fd133
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_CLS,
    fan_in_fan_out=True, # gpt2 uses Conv1D layers, which store weights as (fan_in, fan_out)
    # from trial-and-error it appears that the attention layers in the gpt2
    # model are the ones that most improved accuracy from the PEFT.
    target_modules=[
        "transformer.h.3.attn.c_attn",
        "transformer.h.3.attn.c_proj",
        "transformer.h.5.attn.c_attn",
        "transformer.h.5.attn.c_proj",
        # "transformer.h.7.attn.c_attn",
        # "transformer.h.7.attn.c_proj",
        # "transformer.h.10.attn.c_attn",
        # "transformer.h.10.attn.c_proj",
        # "transformer.h.11.attn.c_attn",
        # "transformer.h.11.attn.c_proj"
    ],
    modules_to_save=["score"]
)

lora_model = get_peft_model(untuned_model, lora_config)

print("--------------------")
lora_model.print_trainable_parameters()

# for name, module in lora_model.named_modules():
#     print(name, type(module))

# for name, param in untuned_model.named_parameters():
#     if param.requires_grad:
#         print(name, param.requires_grad)

print("lora_model...")
print(lora_model)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at openai-community/gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


untuned_model...
GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=4, bias=False)
)
--------------------
trainab
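
The number of trainable parameters can be worked out by hand from the LoRA configuration and the layer shapes printed above. The sketch below assumes each adapter consists of one `rank x in_features` matrix and one `out_features x rank` matrix, plus the full `score` layer kept trainable through `modules_to_save`; the helper name `lora_params` is made up.

```python
def lora_params(in_features, out_features, rank):
    """Parameters in one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank

RANK = 64
HIDDEN = 768

c_attn = lora_params(HIDDEN, 3 * HIDDEN, RANK)  # Conv1D(nf=2304, nx=768)
c_proj = lora_params(HIDDEN, HIDDEN, RANK)      # Conv1D(nf=768, nx=768)
adapted_blocks = 2                              # transformer.h.3 and transformer.h.5
score_head = HIDDEN * 4                         # Linear(768 -> 4, bias=False)

trainable = adapted_blocks * (c_attn + c_proj) + score_head
print(f"{trainable:,}")  # 592,896
```

The result agrees with the `trainable params: 592,896` line that `print_trainable_parameters()` reports in the comparison cell near the end of this notebook.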

In [29]:
import numpy as np
import evaluate
from transformers import Trainer, TrainingArguments
from transformers import DataCollatorWithPadding

accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy_metric.compute(predictions=predictions, references=labels)

# these training arguments must be the same as the gpt2 
# model training and evaluation step used to establish the
# baseline accuracy.
training_args=TrainingArguments(
    output_dir="./data/gpt2-lora-tuned",
    # set the learning rate
    learning_rate=0.005,
    # Set the per device train batch size and eval batch size
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    # evaluate and save the model after each epoch
    eval_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=3,
    weight_decay=0.01,
    load_best_model_at_end=True,
)
print(f"{training_args.device=}")

# use the same tokenized_dataset as the untuned evaluation.
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)


training_args.device=device(type='mps')


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [30]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.58906,0.81
2,No log,0.348991,0.9
3,No log,0.304491,0.91


TrainOutput(global_step=96, training_loss=0.5302836100260416, metrics={'train_runtime': 74.5541, 'train_samples_per_second': 20.12, 'train_steps_per_second': 1.288, 'total_flos': 197342134272000.0, 'train_loss': 0.5302836100260416, 'epoch': 3.0})

In [31]:
# save the LoRA adapter
lora_model.save_pretrained(lora_tuned_path)

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [32]:
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSequenceClassification

peft_config = PeftConfig.from_pretrained(lora_tuned_path)
print(f"{peft_config.base_model_name_or_path=}")
print(peft_config)

untuned_model = AutoModelForSequenceClassification.from_pretrained(
    peft_config.base_model_name_or_path,
    num_labels=len(dataset_id2label),
    id2label=dataset_id2label,
    label2id=dataset_label2id,
    pad_token_id=tokenizer.pad_token_id,
    return_dict=True,
)

lora_tuned_model = PeftModel.from_pretrained(
    untuned_model, 
    lora_tuned_path,
)

lora_tuned_merged_model = lora_tuned_model.merge_and_unload()

# ensure all the parameters of the base model are frozen.
for param in lora_tuned_merged_model.base_model.parameters():
    param.requires_grad = False


# lora_tuned_merged_model.save_pretrained(lora_tuned_path + "-merged-model")
print(lora_tuned_merged_model)

peft_config.base_model_name_or_path='openai-community/gpt2'
LoraConfig(task_type='SEQ_CLS', peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path='openai-community/gpt2', revision=None, inference_mode=True, r=64, target_modules={'transformer.h.3.attn.c_attn', 'transformer.h.5.attn.c_proj', 'transformer.h.3.attn.c_proj', 'transformer.h.5.attn.c_attn'}, exclude_modules=None, lora_alpha=64, lora_dropout=0.05, fan_in_fan_out=True, bias='none', use_rslora=False, modules_to_save=['score', 'classifier', 'score'], init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, eva_config=None, use_dora=False, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False), lora_bias=False)


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at openai-community/gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=4, bias=False)
)
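
The merged model prints with the plain GPT-2 structure because `merge_and_unload()` folds the low-rank update into the base weights. Conceptually, keeping the adapter separate and merging it give the same output. The toy example below checks that identity with small integer matrices; all values are made up, and it ignores dropout (inactive at inference) and GPT-2's transposed `Conv1D` weight storage.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, factor):
    """Multiply every entry of a matrix by a scalar."""
    return [[factor * x for x in row] for row in a]

# Toy shapes: d = 2 outputs, k = 3 inputs, rank r = 1.
W0 = [[1, 0, 2], [0, 1, -1]]   # frozen weight, d x k
B = [[2], [1]]                 # d x r
A = [[1, -1, 3]]               # r x k
scaling = 1                    # alpha / r, which is 64 / 64 in this notebook
x = [[1], [2], [3]]            # input column vector, k x 1

# Adapter kept separate: h = W0 x + scaling * B (A x)
h_adapter = add(matmul(W0, x), scale(matmul(B, matmul(A, x)), scaling))

# Adapter merged into the weight: W = W0 + scaling * B A, then h = W x
W_merged = add(W0, scale(matmul(B, A), scaling))
h_merged = matmul(W_merged, x)

print(h_adapter, h_merged)
```

Both paths produce the same vector, which is why the merged model can be evaluated with a plain `Trainer` and no PEFT wrapper.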


In [33]:

trainer = Trainer(
    model=lora_tuned_merged_model,
    eval_dataset=tokenized_dataset["test"],
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

In [34]:
eval_prediction = trainer.evaluate()
tuned_prediction_accuracy = eval_prediction["eval_accuracy"]
eval_prediction

{'eval_loss': 0.3044907748699188,
 'eval_model_preparation_time': 0.0009,
 'eval_accuracy': 0.91,
 'eval_runtime': 2.2741,
 'eval_samples_per_second': 43.974,
 'eval_steps_per_second': 5.717}

In [35]:
print(f"{untuned_prediction_accuracy=}")
print(f"{tuned_prediction_accuracy=}")
print(f"{(((tuned_prediction_accuracy - untuned_prediction_accuracy) / untuned_prediction_accuracy) * 100):.2f}% accuracy improvement")
lora_model.print_trainable_parameters()

untuned_prediction_accuracy=0.83
tuned_prediction_accuracy=0.91
9.64% accuracy improvement
trainable params: 592,896 || all params: 125,035,776 || trainable%: 0.4742
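
The 9.64% figure is a relative improvement over the baseline. Expressed as an absolute difference, the gain is 8 percentage points, which on a 100-row test set corresponds to 8 more correctly classified examples. A small sketch that reports both (the helper name `accuracy_gain` is made up):

```python
def accuracy_gain(baseline, tuned):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_points = (tuned - baseline) * 100
    relative_percent = (tuned - baseline) / baseline * 100
    return absolute_points, relative_percent

absolute_points, relative_percent = accuracy_gain(baseline=0.83, tuned=0.91)
print(f"{absolute_points:.2f} percentage points, {relative_percent:.2f}% relative")
```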


In [36]:
import pandas as pd
from IPython.display import display

pd.set_option('display.max_colwidth', None)

df = pd.DataFrame(tokenized_dataset["test"])
df = df[["text", "label"]]

predictions = trainer.predict(tokenized_dataset["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)
# map integer class ids to their label names.
df["predicted_label"] = df["predicted_label"].map(dataset_id2label)
df["label"] = df["label"].map(dataset_id2label)

In [37]:
df[df["label"] == df["predicted_label"]].head(10)

index,text,label,predicted_label
0,"Alcoa Warns Earnings to Miss Forecasts (Reuters) Reuters - Alcoa Inc. , the world's largest\aluminum producer, on Thursday warned that third-quarter\results would fall far short of Wall Street expectations, hurt\by plant shutdowns, restructuring costs and weakness in some\markets.",Business,Business
1,"Panama Assures Rumsfeld on Canal Security PANAMA CITY, Panama (Reuters) - Panama's security chief told Defense Secretary Donald Rumsfeld on Saturday the Central American nation was working to prevent any terror attack that might close the Panama Canal.",World,World
2,"Prosecutor brings charges against former neighbor in NBA brawl PONTIAC, Mich. The man accused of starting the brawl at a Detroit Pistons game last month is no stranger to the man who #39;s filing the charges against him.",Sports,Sports
3,Palace Bans Two Fans The Palace in Auburn Hills bans two men from events for their involvement in last month's brawl between the Pistons and Indian Pacers.,Sports,Sports
4,"New MSN Search May Be a Google Killer! New MSN Search May Be a Google Killer!\\The Second Look at MSN's Search technology is available for public beta testing. I've given it a spin myself and must say that I'm impressed. Although they have no ads on the SERP's of the preview site, I'm sure they will load it ...",Sci/Tech,Sci/Tech
5,"Jeanne Heads for Bahamas After Killing 3 SAMANA, Dominican Republic - Threatening to regain hurricane strength, Tropical Storm Jeanne headed for the Bahamas on a track for the southeastern United States after killing three people and causing extensive damage in the Caribbean. The storm forced the evacuation of thousands on Thursday as it slammed into the Dominican Republic after punishing Puerto Rico with flash floods and deadly winds...",World,World
6,Coke opens its coolers to rival products Coca Cola is to allow other companies #39; products in its shop coolers for the first time. It has agreed the move in a deal with the European Commission to settle a five year competition case.,Business,Business
7,ECB quot;consensus quot; kept rates steady today European Central Bank president Jean-Claude Trichet has said that today #39;s decision to leave euro interest rates unchanged reflected a broad consensus on the governing council of the bank.,Business,Business
8,"CARE Official Kidnapped in Baghdad Margaret Hassan, said to be a British-born Iraqi national, the director of CARE International #39;s operation in Iraq is seen in this image made from video footage made on May 20, 2003.",World,World
9,"All the world #39;s a web page as the Bard goes online The earliest editions of Shakespeare #39;s plays provide a fascinating insight into how the playwright reworked his masterpieces over time, but until now, due to their age and",Sci/Tech,Sci/Tech


In [38]:
df[df["label"] != df["predicted_label"]].head(10)

index,text,label,predicted_label
18,"BHP Billiton, Alcoa sell Integris Metals for 359 million pounds (AFP) AFP - Anglo-Australian mining giant BHP Billiton and Alcoa, the world's largest aluminium producer, have agreed to sell their metal services joint venture Integris Metals for 660 million dollars (359 million pounds) including debt, a joint statement said.",World,Business
24,"Ford underlines committed to motorsport. Despite confirming the successful sale of both Jaguar Racing and its Cosworth engine company to new owners, Ford Motor Company has stressed that it remains committed to supporting motorsport at all levels.",Sports,Business
30,"Martha Stewart reports to jail to begin sentence the time she had to report to the country #39;s oldest federal prison for women. service of her sentence, quot; a Federal Bureau of Prisons statement said.",Business,World
33,"Sales of industrial robots surging: UN report Geneva - Worldwide sales of industrial robots surged to record levels in the first half of 2004 after equipment prices fell while labour costs grew, the United Nations Economic Commission for Europe said in a report to be released today.",Sci/Tech,Business
58,"EU Head Office Trims 2005 Growth Forecast (AP) AP - The European Union's head office issued a bleak economic report Tuesday, warning that the sharp rise in oil prices will ""take its toll"" on economic growth next year while the euro's renewed climb could threaten crucial exports.",World,Business
62,"Report: EADS Could Link With Thales The French government is considering a linkup of European Aeronautic Defence amp; Space Co. with Thales SA to create an aerospace giant, the financial daily Les Echos reported Friday.",Business,Sci/Tech
75,"La. Seeks New Bridge, Elevated Highway If you think oil is expensive now, just imagine if Hurricane Ivan had swung west and come ashore at this bustling oil and gas port at the southernmost point of Louisiana.",Business,World
81,"GAME UNDER FIRE Attacking police officers, racial slurs, bloody beatings of innocent bystanders ... is it really just a game? In four and a half minutes, 14-year-old Ryan Mason ran over a police officer, stole his gun and shot and killed three innocent bystanders.",Sci/Tech,World
95,"ScanSoft to acquire 3 software firms ScanSoft Inc. said it plans three acquisitions. The company will acquire Phonetic Systems Ltd., a provider of automated directory assistance and voice-based programs, for \$35 million in cash, and an additional consideration of up to \$35 million, based on the achievement of performance targets and the potential vesting of a warrant to buy 750,000 common shares. ART Advanced Recognition Technologies ...",Business,Sci/Tech
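
A quick tally of the misclassified rows shows where the tuned model still struggles. The pairs below are copied from the table above in order; the tally itself is a minimal sketch using only the standard library.

```python
from collections import Counter

# (true label, predicted label) pairs copied from the misclassified rows above.
errors = [
    ("World", "Business"), ("Sports", "Business"), ("Business", "World"),
    ("Sci/Tech", "Business"), ("World", "Business"), ("Business", "Sci/Tech"),
    ("Business", "World"), ("Sci/Tech", "World"), ("Business", "Sci/Tech"),
]

confusions = Counter(errors)
involving_business = sum(1 for true, pred in errors if "Business" in (true, pred))

for (true, pred), count in confusions.most_common():
    print(f"{true} -> {pred}: {count}")
print(f"{involving_business} of {len(errors)} errors involve Business")
```

Eight of the nine errors involve the Business class, either as the true label or as the prediction. In the notebook, the same tally could be built from the mismatched rows of `df` with `Counter(zip(rows["label"], rows["predicted_label"]))`, where `rows` is a hypothetical name for the filtered frame.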
