In [2]:
!pip install evaluate
!pip install optuna
!pip install datasets
!pip install bert_score

Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting datasets>=2.0.0 (from evaluate)
  Downloading datasets-3.5.0-py3-none-any.whl.metadata (19 kB)
Collecting dill (from evaluate)
  Downloading dill-0.4.0-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from evaluate)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from evaluate)
  Downloading multiprocess-0.70.18-py311-none-any.whl.metadata (7.5 kB)
Collecting dill (from evaluate)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting multiprocess (from evaluate)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec>=2021.05.0 (from fsspec[http]>=2021.05.0->evaluate)
  Downloading fsspec-2024.12.0-py3-none-any.whl.metadata (11 kB)
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.0/84.0 kB[0m 

In [3]:
!pip install rouge_score

Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... [?25l[?25hdone
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24934 sha256=54b402e7c6c36cfe1d84c8fa25f2507b9f6ce5bcc27fd106db01aca41bd8805c
  Stored in directory: /root/.cache/pip/wheels/1e/19/43/8a442dc83660ca25e163e1bd1f89919284ab0d0c1475475148
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2


In [4]:
import json
import torch
from datasets import load_dataset
import nltk
import optuna
from transformers import T5Tokenizer, T5ForConditionalGeneration, Trainer, TrainingArguments
from datasets import Dataset
from evaluate import load

# Download NLTK data for sentence tokenization
nltk.download("punkt")

# Step 1: Load the ClimateFever dataset using Hugging Face datasets
print("Loading ClimateFever dataset...")
climatefever_dataset = load_dataset("climate_fever", split="test")  # Use the test split (full dataset is small)

# Step 2: Extract and adapt (problem, approach) pairs
# ClimateFever has claims and evidence; we'll adapt claims as problems and evidence as approaches
# We'll combine multiple evidence entries and expand them to meet the 150–300 word requirement
dataset = []

# Keywords to ensure environmental science focus (already implicit in ClimateFever, but for robustness)
env_keywords = [
    "climate change", "carbon emission", "pollution", "biodiversity",
    "deforestation", "renewable energy", "sustainability", "ocean acidification"
]

# Function to check if claim is environmental science-related (redundant for ClimateFever, but for robustness)
def is_env_science(claim):
    claim_lower = claim.lower()
    return any(keyword in claim_lower for keyword in env_keywords)

# Function to adapt evidence into a detailed approach
def synthesize_approach(claim, evidence_list):
    # Combine evidence into a single text
    evidence_text = " ".join([evidence["evidence"] for evidence in evidence_list])

    # Synthesize an approach by rephrasing the evidence into a solution-oriented format
    # We'll manually craft a template to expand the evidence into 150–300 words
    problem_words = claim.lower().split()
    if "carbon emission" in claim.lower() or "global warming" in claim.lower():
        approach = f"To address the issue of {claim.lower()}, a multi-step strategy can be implemented: 1. Promote renewable energy adoption by offering incentives such as tax credits for solar and wind energy installations. 2. Expand public transportation systems to reduce reliance on fossil fuel-based vehicles, especially in urban areas. 3. Implement stricter regulations on industrial emissions, requiring companies to adopt cleaner technologies and report emissions annually. Additionally, public awareness campaigns can educate communities about sustainable practices, such as reducing energy consumption and supporting green policies. International collaboration with organizations like the UN can help secure funding and coordinate efforts across countries, ensuring a unified approach to tackling this issue. {evidence_text} This approach aims to mitigate the environmental impact while fostering long-term sustainability."
    elif "pollution" in claim.lower():
        approach = f"To mitigate {claim.lower()}, a comprehensive plan can be adopted: 1. Enforce regulations banning single-use plastics and promoting biodegradable alternatives. 2. Enhance waste management systems by increasing recycling facilities and ensuring proper disposal in affected regions. 3. Launch cleanup initiatives, such as deploying technologies to remove debris from ecosystems. 4. Educate communities about the impact of pollution through school programs and media campaigns, encouraging reduced waste production. Collaboration with global organizations can help secure funding and coordinate efforts across regions, ensuring a unified approach to tackling this issue. {evidence_text} This strategy aims to reduce pollution while promoting sustainable practices."
    else:
        approach = f"To address {claim.lower()}, the following approach can be implemented: 1. Develop policies to protect ecosystems, such as establishing protected areas and regulating resource extraction. 2. Promote sustainable practices among communities through education and incentives. 3. Invest in research to better understand the issue and develop innovative solutions. 4. Foster international cooperation to address global aspects of the problem. {evidence_text} This approach seeks to balance environmental protection with sustainable development, ensuring long-term benefits for both nature and society."

    # Ensure approach is 150–300 words
    word_count = len(approach.split())
    if not (150 <= word_count <= 300):
        # Pad with a generic sentence if too short, or truncate if too long
        if word_count < 150:
            approach += " Furthermore, engaging stakeholders at all levels—from local communities to international policymakers—ensures that solutions are both practical and widely supported, maximizing their impact over time."
        elif word_count > 300:
            approach = " ".join(approach.split()[:300])

    return approach

# Group evidence by claim
claim_to_evidence = {}
for entry in climatefever_dataset:
    claim = entry["claim"]
    evidence = entry["evidences"]
    if not is_env_science(claim):
        continue
    if claim not in claim_to_evidence:
        claim_to_evidence[claim] = []
    claim_to_evidence[claim].extend(evidence)

# Create (problem, approach) pairs
for claim, evidence_list in claim_to_evidence.items():
    if not evidence_list:
        continue
    approach = synthesize_approach(claim, evidence_list)
    dataset.append({"problem": claim, "approach": approach})

    # Stop at 500 pairs
    if len(dataset) >= 500:
        break

# Save the filtered dataset
with open("environmental_science_climatefever_dataset.json", "w") as f:
    json.dump(dataset, f, indent=4)

print(f"Dataset created with {len(dataset)} pairs. Saved to environmental_science_climatefever_dataset.json")

# Step 3: Prepare the dataset for training
# Load the dataset
with open("environmental_science_climatefever_dataset.json", "r") as f:
    data = json.load(f)

# Format for T5: "problem: <text>" as input, approach as target
inputs = ["problem: " + item["problem"] for item in data]
targets = [item["approach"] for item in data]

# Create a Hugging Face Dataset
dataset = Dataset.from_dict({"input_text": inputs, "target_text": targets})

# Load tokenizer
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Tokenize the dataset
def preprocess_function(examples):
    inputs = examples["input_text"]
    targets = examples["target_text"]
    model_inputs = tokenizer(inputs, max_length=64, truncation=True, padding="max_length")
    labels = tokenizer(targets, max_length=256, truncation=True, padding="max_length")
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Split into train and validation sets
train_test_split = tokenized_dataset.train_test_split(test_size=0.1)
train_dataset = train_test_split["train"]
eval_dataset = train_test_split["test"]

# Step 4: Hyperparameter optimization with Optuna
def objective(trial):
    # Suggest hyperparameters
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    batch_size = trial.suggest_categorical("batch_size", [4, 8, 16])
    num_train_epochs = trial.suggest_int("num_train_epochs", 3, 20)

    # Define device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Define training arguments with suggested hyperparameters
    training_args = TrainingArguments(
        output_dir=f"./t5_env_science_trial_{trial.number}",
        eval_strategy="epoch",
        learning_rate=learning_rate,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        num_train_epochs=num_train_epochs,
        weight_decay=0.01,
        save_strategy="epoch",
        logging_dir=f"./logs/trial_{trial.number}",
        logging_steps=10,
        report_to="none",
    )

    # Load fresh model for each trial and move to device
    model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device)

    # Initialize Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )

    # Train the model
    trainer.train()

    # Evaluate using ROUGE-L on validation set
    rouge = load("rouge")
    predictions = []
    references = []

    for example in eval_dataset:
        input_text = example["input_text"]
        inputs = tokenizer(input_text, return_tensors="pt", max_length=64, truncation=True)
        # Move inputs to the same device as the model
        inputs = {key: val.to(device) for key, val in inputs.items()}
        outputs = model.generate(
            inputs["input_ids"],
            max_length=256,
            num_beams=4,
            early_stopping=True
        )
        generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
        predictions.append(generated)
        references.append(example["target_text"])

    # Compute ROUGE-L
    rouge_results = rouge.compute(predictions=predictions, references=references)
    rouge_l = rouge_results["rougeL"]

    return rouge_l

# Run Optuna optimization
print("Starting hyperparameter optimization with Optuna...")
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)  # 10 trials for faster execution

# Print the best hyperparameters
best_trial = study.best_trial
print("Best trial:")
print(f"  ROUGE-L: {best_trial.value}")
print("  Best hyperparameters: ", best_trial.params)

# Step 5: Train the final model with the best hyperparameters
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
best_learning_rate = best_trial.params["learning_rate"]
best_batch_size = best_trial.params["batch_size"]
best_num_train_epochs = best_trial.params["num_train_epochs"]

final_training_args = TrainingArguments(
    output_dir="./t5_env_science_final",
    eval_strategy="epoch",
    learning_rate=best_learning_rate,
    per_device_train_batch_size=best_batch_size,
    per_device_eval_batch_size=best_batch_size,
    num_train_epochs=best_num_train_epochs,
    weight_decay=0.01,
    save_strategy="epoch",
    logging_dir="./logs/final",
    logging_steps=10,
)

# Load fresh model for final training and move to device
final_model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device)
final_trainer = Trainer(
    model=final_model,
    args=final_training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Train the final model
print("Training final model with best hyperparameters...")
final_trainer.train()

# Save the final model
final_model.save_pretrained("./t5_env_science_final_model")
tokenizer.save_pretrained("./t5_env_science_final_model")

print("Final model training complete and saved to ./t5_env_science_final_model")

# Step 6: Evaluate the final model
# Load metrics
rouge = load("rouge")
bertscore = load("bertscore")

# Experiment 1: Standard input format
predictions_standard = []
references = []

for example in eval_dataset:
    input_text = example["input_text"]
    inputs = tokenizer(input_text, return_tensors="pt", max_length=64, truncation=True)
    # Move inputs to device
    inputs = {key: val.to(device) for key, val in inputs.items()}
    outputs = final_model.generate(inputs["input_ids"], max_length=256, num_beams=4, early_stopping=True)
    generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
    predictions_standard.append(generated)
    references.append(example["target_text"])

# Compute ROUGE-L and BERTScore for standard input
rouge_results_standard = rouge.compute(predictions=predictions_standard, references=references)
bertscore_results_standard = bertscore.compute(predictions=predictions_standard, references=references, lang="en")
print("\nEvaluation with standard input format:")
print("ROUGE-L:", rouge_results_standard["rougeL"])
print("BERTScore (F1):", sum(bertscore_results_standard["f1"]) / len(bertscore_results_standard["f1"]))

# Experiment 2: Input format with keywords
predictions_keywords = []
for example in eval_dataset:
    problem_text = example["input_text"].replace("problem: ", "")
    input_text_with_keywords = f"problem: {problem_text} [climate change, sustainability]"
    inputs = tokenizer(input_text_with_keywords, return_tensors="pt", max_length=64, truncation=True)
    # Move inputs to device
    inputs = {key: val.to(device) for key, val in inputs.items()}
    outputs = final_model.generate(inputs["input_ids"], max_length=256, num_beams=4, early_stopping=True)
    generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
    predictions_keywords.append(generated)

# Compute ROUGE-L and BERTScore for input with keywords
rouge_results_keywords = rouge.compute(predictions=predictions_keywords, references=references)
bertscore_results_keywords = bertscore.compute(predictions=predictions_keywords, references=references, lang="en")
print("\nEvaluation with keywords in input format:")
print("ROUGE-L:", rouge_results_keywords["rougeL"])
print("BERTScore (F1):", sum(bertscore_results_keywords["f1"]) / len(bertscore_results_keywords["f1"]))

# Manual evaluation: Print a few examples
print("\nManual Evaluation (First 3 Examples):")
for i in range(min(3, len(eval_dataset))):
    print(f"\nProblem: {eval_dataset[i]['input_text']}")
    print(f"Generated Approach (Standard): {predictions_standard[i]}")
    print(f"Generated Approach (With Keywords): {predictions_keywords[i]}")
    print(f"Ground Truth: {references[i]}")

# Step 7: Critical Analysis Prompts (to be included in your report)
print("\nCritical Analysis Prompts for Your Report:")
print("1. Dataset Bias:")
print("   - Did the ClimateFever dataset overrepresent certain types of climate-related problems (e.g., carbon emissions) and underrepresent others (e.g., biodiversity)?")
print("   - How did the synthesized approaches impact the model’s outputs? Were they too generic due to the templating approach?")
print("2. Model Performance:")
print("   - How did the optimized hyperparameters improve performance compared to default settings? Compare ROUGE-L and BERTScore.")
print("   - Did the model generate feasible approaches, or were there vague/incorrect suggestions (e.g., impractical solutions)?")
print("   - Did adding keywords to the input improve the quality of generated approaches? Why or why not?")
print("3. Hyperparameter Optimization:")
print("   - What did you learn from the Optuna search? For example, did a smaller learning rate or more epochs lead to better performance?")
print("   - Were there any trade-offs (e.g., longer training time vs. better performance)?")
print("4. Ethical Issues:")
print("   - Could the model propagate misinformation if the synthesized approaches oversimplify complex environmental problems?")
print("   - What are the implications of using this system in real-world environmental research? How might incorrect approaches impact policy or action?")


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Loading ClimateFever dataset...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/8.09k [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/869k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/1535 [00:00<?, ? examples/s]

Dataset created with 173 pairs. Saved to environmental_science_climatefever_dataset.json


tokenizer_config.json:   0%|          | 0.00/2.32k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Map:   0%|          | 0/173 [00:00<?, ? examples/s]

[I 2025-04-25 20:11:58,271] A new study created in memory with name: no-name-f530e2e0-df2c-4b70-bcd3-1bdde70330d8


Starting hyperparameter optimization with Optuna...


config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/242M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


Epoch,Training Loss,Validation Loss
1,10.6942,9.957479
2,9.1953,8.060193
3,7.801,6.545442
4,6.7643,5.567036
5,6.0142,5.10867
6,5.6189,4.881193
7,5.3933,4.739897
8,5.2422,4.644691
9,5.0987,4.572938
10,5.0056,4.518826


Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

[I 2025-04-25 20:15:02,801] Trial 0 finished with value: 0.13341974843416515 and parameters: {'learning_rate': 1.0296197879114733e-05, 'batch_size': 16, 'num_train_epochs': 19}. Best is trial 0 with value: 0.13341974843416515.


Epoch,Training Loss,Validation Loss
1,5.1214,3.316718
2,3.4876,2.828506
3,3.0722,2.550618
4,2.8199,2.442021
5,2.6814,2.379753
6,2.5973,2.355408
7,2.5567,2.347092


[I 2025-04-25 20:17:09,932] Trial 1 finished with value: 0.40417226438679565 and parameters: {'learning_rate': 0.00045091774963493893, 'batch_size': 16, 'num_train_epochs': 7}. Best is trial 1 with value: 0.40417226438679565.


Epoch,Training Loss,Validation Loss
1,3.4507,2.697892
2,2.6769,2.327764
3,2.4648,2.220478
4,2.2017,2.175888
5,1.9966,2.154016
6,1.906,2.120077
7,1.8346,2.116065
8,1.8521,2.119507
9,1.7289,2.110378
10,1.7039,2.108434


[I 2025-04-25 20:21:49,914] Trial 2 finished with value: 0.4104890712459252 and parameters: {'learning_rate': 0.0005843867007897631, 'batch_size': 8, 'num_train_epochs': 20}. Best is trial 2 with value: 0.4104890712459252.


Epoch,Training Loss,Validation Loss
1,5.9199,4.773188
2,4.7817,4.261492
3,4.403,3.956102
4,4.1493,3.655403
5,3.8799,3.428658
6,3.7727,3.288791
7,3.668,3.182934
8,3.5804,3.104889
9,3.4617,3.034438
10,3.4159,2.973005


[I 2025-04-25 20:25:16,447] Trial 3 finished with value: 0.20692355013651687 and parameters: {'learning_rate': 3.177917418300164e-05, 'batch_size': 8, 'num_train_epochs': 19}. Best is trial 2 with value: 0.4104890712459252.


Epoch,Training Loss,Validation Loss
1,10.3199,8.929568
2,8.2245,6.764493
3,6.9392,5.780037
4,6.3128,5.359357
5,5.988,5.249296


[I 2025-04-25 20:26:24,481] Trial 4 finished with value: 0.13231250309002757 and parameters: {'learning_rate': 1.6144255251519255e-05, 'batch_size': 16, 'num_train_epochs': 5}. Best is trial 2 with value: 0.4104890712459252.


Epoch,Training Loss,Validation Loss
1,3.5856,2.827554
2,2.812,2.421681
3,2.6265,2.300497
4,2.3862,2.249918
5,2.1996,2.221619
6,2.1468,2.197672
7,2.1161,2.185748
8,2.1736,2.183975


[I 2025-04-25 20:28:31,539] Trial 5 finished with value: 0.5096571475992042 and parameters: {'learning_rate': 0.00046432050038711347, 'batch_size': 8, 'num_train_epochs': 8}. Best is trial 5 with value: 0.5096571475992042.


Epoch,Training Loss,Validation Loss
1,8.7017,5.440211
2,5.5018,4.610394
3,4.9465,4.370161
4,4.7339,4.236754
5,4.6415,4.192922


[I 2025-04-25 20:29:35,230] Trial 6 finished with value: 0.1462742339356864 and parameters: {'learning_rate': 4.525318448825062e-05, 'batch_size': 16, 'num_train_epochs': 5}. Best is trial 5 with value: 0.5096571475992042.


Epoch,Training Loss,Validation Loss
1,9.8255,7.43082
2,6.7434,5.116804
3,5.4471,4.661318
4,5.0156,4.46027
5,4.8269,4.300434
6,4.6554,4.17756
7,4.544,4.066038
8,4.4536,3.986819
9,4.373,3.921996
10,4.3354,3.870337


[I 2025-04-25 20:32:08,231] Trial 7 finished with value: 0.16184758535639807 and parameters: {'learning_rate': 2.3513016104095545e-05, 'batch_size': 16, 'num_train_epochs': 15}. Best is trial 5 with value: 0.5096571475992042.


Epoch,Training Loss,Validation Loss
1,4.74,3.965609
2,4.0399,3.358563
3,3.6479,3.084336
4,3.4549,2.887831
5,3.1617,2.750535
6,3.1229,2.652796
7,3.0952,2.582121
8,2.9341,2.5409
9,2.8094,2.509985
10,2.847,2.491944


[I 2025-04-25 20:34:48,571] Trial 8 finished with value: 0.2572272941639341 and parameters: {'learning_rate': 5.323917342740563e-05, 'batch_size': 4, 'num_train_epochs': 12}. Best is trial 5 with value: 0.5096571475992042.


Epoch,Training Loss,Validation Loss
1,5.1127,4.41665
2,4.5349,3.965695
3,4.2018,3.627677
4,3.9995,3.454052
5,3.8536,3.387148
6,3.8333,3.366237


[I 2025-04-25 20:36:08,240] Trial 9 finished with value: 0.20095097872071832 and parameters: {'learning_rate': 5.527510250789645e-05, 'batch_size': 8, 'num_train_epochs': 6}. Best is trial 5 with value: 0.5096571475992042.


Best trial:
  ROUGE-L: 0.5096571475992042
  Best hyperparameters:  {'learning_rate': 0.00046432050038711347, 'batch_size': 8, 'num_train_epochs': 8}




Training final model with best hyperparameters...


[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mpsg5179[0m ([33mpsg5179-penn-state[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Epoch,Training Loss,Validation Loss
1,3.5856,2.827554
2,2.812,2.421681
3,2.6265,2.300497
4,2.3862,2.249918
5,2.1996,2.221619
6,2.1468,2.197672
7,2.1161,2.185748
8,2.1736,2.183975


Final model training complete and saved to ./t5_env_science_final_model


Downloading builder script:   0%|          | 0.00/7.95k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/482 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/1.42G [00:00<?, ?B/s]

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Evaluation with standard input format:
ROUGE-L: 0.5096571475992042
BERTScore (F1): 0.9024326569504209

Evaluation with keywords in input format:
ROUGE-L: 0.5187540676487195
BERTScore (F1): 0.9018336666954888

Manual Evaluation (First 3 Examples):

Problem: problem: The latest NOAA report is “a reminder that climate change has not, despite the insistence of climate contrarians ‘paused’ or even slowed down,” Mann said..
Generated Approach (Standard): To address the latest NOAA report is “a reminder that climate change has not, despite the insistence of climate contrarians ‘paused’ or even slowed down,” Mann said., the following approach can be implemented: 1. Develop policies to protect ecosystems, such as establishing protected areas and regulating resource extraction. 2. Promote sustainable practices among communities through education and incentives. 3. Invest in research to better understand the issue and develop innovative solutions. 4. Foster international cooperation to address g