## Human Readable Surf-Reports

For my specific task, making a human readable surf-report; similar to those made by human surf observers from data freely available from the National Weather Service, I have chosen to implement a Chain of Thought approach to model prompting.

Chain of Thought(CoT) prompting was first introduced by [Wei et al](https://arxiv.org/pdf/2201.11903) in 2023. They put forth a novel prompting approach "to tackle complex arithmetic, commonsense, and symbolic reasoning tasks." It tries to solve these problems by mimicking how humans might approach problem solving, in logical steps where each step builds upon the last. CoT prompting works by providing demonstrations to the model as examples of the correct "thought" process. The model then learns to mimic these processes and, as a result, does a much better job on reasoning tasks.

Since my specific tasks requires the model to take observations on the surf report such as wind and sea states, and turn this into a human readable surf report, we can reach a good approximation by providing examples of the correct reasoning approach. We can do this by following some guides that have been published such as this [blog](https://huggingface.co/blog/samihalawa/chain-of-thoughts-guide) post from huggingface as well as the [openai](https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api#h_5738ec3d85) guide to prompot engineering, both of which lay out strategies for CoT prompt generation.

## Get The Data

Now we have to get our data, I have written two functions below to aquire the data from internet. The data needed for the weather observations comes from Iowa State University's [Enviornmental Mesonet](https://mesonet.agron.iastate.edu/wx/afos/list.phtml?by=cccc&source=AKQ&pil=CWF&year=2025&month=9&day=15&drange=yes&year2=2025&month2=9&day2=24&view=grid&order=asc). This data can be accessed for free using an api. For the data that I need on examples of human surf-reports I used an archive of surf-reports from a surf-shop located on North Carolina's Wrightsville Beach, [Sweetwater Surf Shop](https://wblivesurf.com/reports/). I wrote code to scrape their website to get the text of the report, twice a day, every day for a year. Before doing so I checked the website's robots.txt page to see if the author's had any specific banned uses on scraping. The page's authors had no such qualms posted.


In [2]:
import requests
import re
import json
import pandas as pd
from datetime import datetime, timedelta
from concurrent.futures import ThreadPoolExecutor, as_completed # Import for parallelization

In [2]:


def parse_forecast(text, issuance_date):
    """
    Parses a raw NWS Coastal Waters Forecast into a structured list of dictionaries.
    Attempts to determine the date of each forecast period relative to the issuance date.
    """

    structured_forecast = []
    sections = text.split('$$')[1:]

    # Keep track of the current date as we parse through periods, starting with the issuance date
    current_forecast_date = issuance_date.date()

    for section in sections:
        section = section.strip()
        if not section:
            continue

        zone_data = {
            "zone_id": None,
            "zone_name": None,
            "advisory": None,
            "forecasts": []
        }

        zone_id_match = re.search(r'^(AMZ\d{3})-', section)
        if zone_id_match:
            zone_data["zone_id"] = zone_id_match.group(1)

        zone_name_match = re.search(r'Coastal waters from (.*?)-', section, re.DOTALL)
        if zone_name_match:
            zone_data["zone_name"] = "Coastal waters from " + zone_name_match.group(1).replace('\n', ' ').strip()

        advisory_match = re.search(r'\.\.\.(.*?)\.\.\.', section)
        if advisory_match:
            zone_data["advisory"] = advisory_match.group(1).strip()

        time_periods = re.split(r'\n\.(?=[A-Z])', section)

        for period_text in time_periods[1:]:
            period_text = period_text.replace('\n', ' ').strip()
            if not period_text:
                continue

            period_forecast = {}

            period_name_match = re.match(r'([A-Z\s]+)\.\.\.', period_text)
            if not period_name_match:
                continue

            raw_period = period_name_match.group(1).strip()
            period_forecast['raw_period'] = raw_period # Keep the raw period for reference

            # Attempt to determine the date of the forecast period
            # This is a heuristic based on the day name in the period string and the issuance date
            try:
                day_match = re.match(r'([A-Z]+)', raw_period)
                if day_match:
                    day_abbr = day_match.group(1).upper()
                    # Map day abbreviation to a weekday number (Monday is 0, Sunday is 6)
                    day_map = {"MON": 0, "TUE": 1, "WED": 2, "THU": 3, "FRI": 4, "SAT": 5, "SUN": 6}
                    if day_abbr in day_map:
                        target_weekday = day_map[day_abbr]
                        current_weekday = current_forecast_date.weekday()
                        # Calculate the difference in days, handling week rollovers
                        day_diff = (target_weekday - current_weekday + 7) % 7

                        # If the calculated day is before the current forecast date, it's likely in the next week
                        # This is a simplification and might need refinement for edge cases
                        if current_forecast_date + timedelta(days=day_diff) < current_forecast_date and day_diff != 0:
                            day_diff += 7

                        current_forecast_date += timedelta(days=day_diff)
                        period_forecast['date'] = current_forecast_date.strftime('%Y-%m-%d')
                    else:
                         period_forecast['date'] = issuance_date.strftime('%Y-%m-%d') # Default to issuance date if day name not recognized
                else:
                    period_forecast['date'] = issuance_date.strftime('%Y-%m-%d') # Default to issuance date if no day name


            except Exception as e:
                print(f"Could not determine date from period '{raw_period}': {e}")
                period_forecast['date'] = issuance_date.strftime('%Y-%m-%d') # Default to issuance date on error


            # Determine the type (Morning/Night) based on keywords in the period string
            if "NIGHT" in raw_period.upper():
                period_forecast['type'] = "NIGHT"
            else:
                period_forecast['type'] = "DAY"


            wind_match = re.search(r'\.\.\.(.*?)\. Seas', period_text)
            if wind_match:
                period_forecast['wind'] = wind_match.group(1).strip()

            seas_match = re.search(r'Seas ([\d\s\wto-]+ft)', period_text)
            if seas_match:
                period_forecast['seas'] = seas_match.group(1).strip()

            wave_match = re.search(r'Wave Detail: (.*?)\.', period_text)
            if wave_match:
                period_forecast['wave_detail'] = wave_match.group(1).strip() + '.'
            else:
                 period_forecast['wave_detail'] = 'Not available.'


            # Combine relevant forecast details into a single string for the 'forecast' column
            period_forecast['forecast'] = f"Wind: {period_forecast.get('wind', 'N/A')}, Seas: {period_forecast.get('seas', 'N/A')}, Wave Detail: {period_forecast.get('wave_detail', 'N/A')}"


            zone_data["forecasts"].append(period_forecast)

        structured_forecast.append(zone_data)

    return structured_forecast


def fetch_and_parse_single_day(cccc: str, pil: str, date: datetime, target_zone_id: str = None):
    """
    Fetches, parses, and filters NWS Coastal Waters Forecast data for a single day.
    """
    product_id = {
        "cccc": cccc,
        "pil": pil,
        "date": date.strftime('%Y-%m-%d')
    }

    api_list_url = f"https://mesonet.agron.iastate.edu/api/1/nws/afos/list.json?"

    try:
        response = requests.get(api_list_url, params={**product_id, "limit": 10})
        response.raise_for_status()
        data = response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error accessing API list for {date.strftime('%Y-%m-%d')}: {e}")
        return []
    except ValueError as e:
        print(f"Error decoding JSON response from API list for {date.strftime('%Y-%m-%d')}: {e}")
        return []

    if not data or 'data' not in data or not data['data']:
        print(f"No data found for {date.strftime('%Y-%m-%d')}.")
        return []

    relevant_product_ids = [
        item for item in data['data']
        if datetime.strptime(item['entered'], '%Y-%m-%dT%H:%M:%SZ').date() == date.date()
    ]

    if not relevant_product_ids:
        print(f"No relevant products found for {date.strftime('%Y-%m-%d')}.")
        return []

    latest_relevant_product_id = relevant_product_ids[0]["product_id"]

    text_api_url = f"https://mesonet.agron.iastate.edu/api/1/nwstext/{latest_relevant_product_id}"

    try:
        response = requests.get(text_api_url)
        response.raise_for_status()
        raw_text = response.text
    except requests.exceptions.RequestException as e:
        print(f"Error accessing text API for {latest_relevant_product_id} on {date.strftime('%Y-%m-%d')}: {e}")
        return []

    issuance_date_str = latest_relevant_product_id.split('-')[0]
    issuance_date = datetime.strptime(issuance_date_str, '%Y%m%d%H%M')

    all_parsed_data = parse_forecast(raw_text, issuance_date)

    flattened_data = []
    for zone_data in all_parsed_data:
        if target_zone_id and zone_data.get("zone_id") != target_zone_id:
            continue

        for forecast in zone_data.get("forecasts", []):
            try:
                period_date_str = forecast.get("date")
                if period_date_str:
                    period_date = datetime.strptime(period_date_str, '%Y-%m-%d').date()
                    if period_date == date.date():
                         flattened_data.append({
                             "date": forecast.get("date"),
                             "type": forecast.get("type"),
                             "forecast": forecast.get("forecast")
                         })
            except Exception as e:
                print(f"Could not process forecast period for filtering: {forecast.get('raw_period', '')} - {e}")
                pass

    return flattened_data


def get_and_parse_forecast_for_day(cccc: str, pil: str, start_date: datetime, end_date: datetime, target_zone_id: str = None, max_workers: int = 5):
    """
    Fetches, parses, and filters NWS Coastal Waters Forecast data for a specific date range and optional zone ID,
    using parallel processing for improved performance.

    Args:
        cccc: The CCCC identifier (e.g., "KILM").
        pil: The PIL identifier (e.g., "CWFILM").
        start_date: The start date of the range (inclusive).
        end_date: The end date of the range (inclusive).
        target_zone_id: Optional. The specific zone ID to filter the results by (e.g., "AMZ250").
        max_workers: The maximum number of threads to use for parallel processing.

    Returns:
        A pandas DataFrame containing the parsed and filtered forecast data with
        columns for 'date', 'type', and 'forecast', or an empty DataFrame
        if data cannot be fetched or no forecasts are found within the range/zone.
    """
    all_flattened_data = []
    date_list = [start_date + timedelta(days=x) for x in range((end_date - start_date).days + 1)]

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_date = {executor.submit(fetch_and_parse_single_day, cccc, pil, date, target_zone_id): date for date in date_list}

        for future in as_completed(future_to_date):
            date = future_to_date[future]
            try:
                daily_data = future.result()
                all_flattened_data.extend(daily_data)
            except Exception as exc:
                print(f'{date} generated an exception: {exc}')

    df_forecasts = pd.DataFrame(all_flattened_data)

    return df_forecasts

In [3]:
NWS_data = get_and_parse_forecast_for_day(cccc="KILM",pil="cwf",start_date=datetime(2024,9,30),end_date=datetime(2025,10,4),target_zone_id="AMZ250")

In [4]:
NWS_data.head()

Unnamed: 0,date,type,forecast
0,2024-10-01,NIGHT,"Wind: SW winds 10 to 15 kt, becoming W 5 to 10..."
1,2024-10-01,DAY,"Wind: W winds 5 to 10 kt, Seas: 2 ft, Wave Det..."
2,2024-10-01,NIGHT,"Wind: NW winds 10 to 15 kt, Seas: 2 to 3 ft, W..."
3,2024-10-03,NIGHT,"Wind: NE winds 5 to 10 kt, Seas: 2 to 3 ft, Wa..."
4,2024-10-03,DAY,"Wind: NE winds 10 to 15 kt, Seas: 3 to 4 ft, W..."


In [None]:
NWS_data.describe()

Unnamed: 0,date,type,forecast
count,1105,1105,1105
unique,370,2,1055
top,2025-10-04,NIGHT,Wind: NE winds 15 to 20 kt with gusts up to 25...
freq,3,736,4


In [5]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
from datetime import datetime, timedelta
from concurrent.futures import ThreadPoolExecutor, as_completed

In [None]:


def get_single_day_surf_report(date):
    """
    Scrapes surf reports from wblivesurf.com for a single day,
    filtering for "Sunrise Report" and "Afternoon Update", and returns
    the results as a list of dictionaries.

    Args:
        date (datetime): The date to scrape.

    Returns:
        A list of dictionaries containing the extracted surf report data for the day.
    """
    daily_reports = []
    surf_report_url = f"https://wblivesurf.com/reports/?startdate={date.strftime('%Y-%m-%d')}&enddate={date.strftime('%Y-%m-%d')}&swellrating=&author="

    try:
        r = requests.get(surf_report_url)
        r.raise_for_status()  # Raise an HTTPError for bad responses
        soup = BeautifulSoup(r.text, 'html.parser')
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the main report page for {date.strftime('%Y-%m-%d')}: {e}")
        return daily_reports

    blogflex_container = soup.find('div', id='blogFlex')

    if blogflex_container:
        reports = blogflex_container.find_all('article')
        desired_reports = {"Sunrise Report", "Afternoon Update"}

        for report in reports:
            link_tag = report.find('h3').find('a')
            report_title = link_tag.text.strip() if link_tag else None

            if link_tag and 'href' in link_tag.attrs:
                full_report_url = link_tag['href']

                if report_title in desired_reports:
                    try:
                        response = requests.get(full_report_url)
                        response.raise_for_status()

                        full_report_soup = BeautifulSoup(response.text, 'html.parser')
                        meta_tag = full_report_soup.find('meta', property='og:description')

                        if meta_tag and 'content' in meta_tag.attrs:
                            full_text = meta_tag['content']
                            report_type = "DAY" if report_title == "Sunrise Report" else "NIGHT"

                            daily_reports.append({
                                "date": date.strftime('%Y-%m-%d'),
                                "type": report_type,
                                "forecast": full_text
                            })
                        else:
                            print(f"Could not find the 'og:description' meta tag for {report_title} on {date.strftime('%Y-%m-%d')}.")
                    except requests.exceptions.RequestException as e:
                        print(f"Error fetching the full report page for {report_title} on {date.strftime('%Y-%m-%d')}: {e}")
    else:
        print(f"The container with id='blogflex' was not found for {date.strftime('%Y-%m-%d')}.")

    return daily_reports


def get_daily_surf_reports(start_date, end_date, max_workers=5):
    """
    Scrapes surf reports from wblivesurf.com for a given date range,
    filtering for "Sunrise Report" and "Afternoon Update", and returns
    the results as a pandas DataFrame, using parallel processing.

    Args:
        start_date (datetime): The start date of the range (inclusive).
        end_date (datetime): The end date of the range (inclusive).
        max_workers: The maximum number of threads to use for parallel processing.

    Returns:
        A pandas DataFrame containing the extracted surf report data with
        columns for 'date', 'type', and 'forecast'.
    """
    all_reports = []
    date_list = [start_date + timedelta(days=x) for x in range((end_date - start_date).days + 1)]

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_date = {executor.submit(get_single_day_surf_report, date): date for date in date_list}

        for future in as_completed(future_to_date):
            date = future_to_date[future]
            try:
                daily_data = future.result()
                all_reports.extend(daily_data)
            except Exception as exc:
                print(f'{date} generated an exception: {exc}')

    df_reports = pd.DataFrame(all_reports)

    return df_reports

I wrote the code for these files by first boilerplating out the specifics of each task. That is I was able to get one report from the api, and one report from the website. From there I had a back and forth with google-gemini to make a more robust function to extend to multiple-days, add in some exception handling, and finally to parallelize as it was taking a lot of time.

In [7]:
surf_reports = get_daily_surf_reports(datetime(2024,9,30),datetime(2025,10,4))

In [None]:
surf_reports.to_csv('surf_reports.csv', index=False)

In [8]:
surf_reports.head()

Unnamed: 0,date,type,forecast
0,2024-10-02,DAY,Good Morning! We are approaching High tide at ...
1,2024-09-30,NIGHT,Good afternoon guys! It looks like there isn’t...
2,2024-09-30,DAY,Good Morning! The buoy is 2.4 @6 seconds and t...
3,2024-10-04,NIGHT,Hey everyone! There is still a small wave out ...
4,2024-10-04,DAY,Good Morning! there is breezy north wind right...


In [9]:
surf_reports.describe()

Unnamed: 0,date,type,forecast
count,632,632,632
unique,366,2,632
top,2025-10-03,DAY,"Good morning y’all, it’s a similar story out t..."
freq,2,364,1


In [10]:

nws_df_renamed = NWS_data.rename(columns={'forecast': 'nws_forecast'})
surf_df_renamed = surf_reports.rename(columns={'forecast': 'human_forecast'})


merged_forecasts = pd.merge(nws_df_renamed, surf_df_renamed, on=['date', 'type'], how='inner')


merged_forecasts = merged_forecasts[['date', 'type', 'nws_forecast', 'human_forecast']]



In [11]:
merged_forecasts.head()

Unnamed: 0,date,type,nws_forecast,human_forecast
0,2024-10-01,NIGHT,"Wind: SW winds 10 to 15 kt, becoming W 5 to 10...",Good afternoon! There is not a lot going on ou...
1,2024-10-01,DAY,"Wind: W winds 5 to 10 kt, Seas: 2 ft, Wave Det...",Good Morning everyone! The waves are down a no...
2,2024-10-01,NIGHT,"Wind: NW winds 10 to 15 kt, Seas: 2 to 3 ft, W...",Good afternoon! There is not a lot going on ou...
3,2024-10-03,NIGHT,"Wind: NE winds 5 to 10 kt, Seas: 2 to 3 ft, Wa...",Hey everyone! It’s looking pretty similar to h...
4,2024-10-03,DAY,"Wind: NE winds 10 to 15 kt, Seas: 3 to 4 ft, W...",Good Morning! There is some Lully mid to long ...


In [12]:
merged_forecasts.describe()

Unnamed: 0,date,type,nws_forecast,human_forecast
count,895,895,895,895
unique,366,2,864,631
top,2025-10-04,NIGHT,"Wind: S winds 5 to 10 kt, Seas: 3 to 4 ft, Wav...",Hey everyone looks pretty fun out there right ...
freq,3,532,3,2


In [13]:
merged_forecasts.to_csv("training_data.csv", index=False)

I saved the data to a csv file so that I could access the data without having to re-run the functions multiple times.

## Setup the Problem Set

In [2]:
all_data = pd.read_csv("training_data(1).csv")

In [3]:
all_data.head()

Unnamed: 0,date,type,nws_forecast,human_forecast
0,2024-10-01,NIGHT,"Wind: SW winds 10 to 15 kt, becoming W 5 to 10...",Good afternoon! There is not a lot going on ou...
1,2024-10-01,DAY,"Wind: W winds 5 to 10 kt, Seas: 2 ft, Wave Det...",Good Morning everyone! The waves are down a no...
2,2024-10-01,NIGHT,"Wind: NW winds 10 to 15 kt, Seas: 2 to 3 ft, W...",Good afternoon! There is not a lot going on ou...
3,2024-10-03,NIGHT,"Wind: NE winds 5 to 10 kt, Seas: 2 to 3 ft, Wa...",Hey everyone! It’s looking pretty similar to h...
4,2024-10-03,DAY,"Wind: NE winds 10 to 15 kt, Seas: 3 to 4 ft, W...",Good Morning! There is some Lully mid to long ...


In [4]:
training_data = all_data[["nws_forecast","human_forecast"]]

In [5]:
training_data.rename(columns={"nws_forecast":"prompt_text", "human_forecast":"Response"},inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data.rename(columns={"nws_forecast":"prompt_text", "human_forecast":"Response"},inplace=True)


In [6]:
import os
from datasets import load_dataset
import pandas as pd
import numpy as np
os.environ['HF_HOME'] = '/scratch/ezq9qu/models/cache'
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

In [7]:
instruction_text = """
Output a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:
"""

training_data["Instruct"] = "Q: " + instruction_text + training_data["prompt_text"]+" Let's think step by step\nA: "

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data["Instruct"] = "Q: " + instruction_text + training_data["prompt_text"]+" Let's think step by step\nA: "


In [8]:
training_data.head()

Unnamed: 0,prompt_text,Response,Instruct
0,"Wind: SW winds 10 to 15 kt, becoming W 5 to 10...",Good afternoon! There is not a lot going on ou...,Q: \nOutput a human-readable surf-forecast sim...
1,"Wind: W winds 5 to 10 kt, Seas: 2 ft, Wave Det...",Good Morning everyone! The waves are down a no...,Q: \nOutput a human-readable surf-forecast sim...
2,"Wind: NW winds 10 to 15 kt, Seas: 2 to 3 ft, W...",Good afternoon! There is not a lot going on ou...,Q: \nOutput a human-readable surf-forecast sim...
3,"Wind: NE winds 5 to 10 kt, Seas: 2 to 3 ft, Wa...",Hey everyone! It’s looking pretty similar to h...,Q: \nOutput a human-readable surf-forecast sim...
4,"Wind: NE winds 10 to 15 kt, Seas: 3 to 4 ft, W...",Good Morning! There is some Lully mid to long ...,Q: \nOutput a human-readable surf-forecast sim...


In [9]:
training_data["Instruct"].loc[0]

"Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "

In [10]:
training_data["Response"].loc[0]

'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]'

In [11]:
training_data["Instruct"].loc[126]

"Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: W winds 20 to 25 kt with gusts up to 30 kt, Seas: 3 to 5 ft, Wave Detail: W 5 ft at 5 seconds and NE 1 ft at 10 seconds. Let's think step by step\nA: "

In [12]:
training_data["Response"].loc[126]

'Hey everyone! There’s really not much going on out back this afternoon. Any wave that is coming through is breaking pretty close to the shore. I think a bigger board will work if you are really trying to get your hair wet today. The conditions are clean with the wind blowing 15mph WNW. The tide […]'

This is pretty good! We will see how the models perform with this prompting approach.

## Model Choice

We need to choose a few models to evaluate. We need to look at models that do well with text-to-text tasking, the model should have fairly good reasoning as well. We will need to balance the number of parameters needed so as not to overload the servers. I chose to look use the following models: 
* google/gemma-7b-it
* qwen/qwen3-4b-instruct-2507
* meta-llam/llama-3.1-8b-instruct

The google model was chosen as it is an instruction tuned model, and performs well on text-generation tasks. It has a reasonable number of parameters, 7 billion. This is the sweet spot of parameters, we shall see if the model does well at the task, with a "smaller" number of parameters. Also, the model supports the transformers library. It performed 64.3 on the MMLU benchmark as well as 81.2 at the HellaSwag benchmark. Meaning it does fairly well at reasoning.

The qwen model was chosen as it is an instruction tuned model, that also performs well on text-generation tasks. It has fewer parameters than the google model, I am interested to see how it perfroms with a small number of parameters. If this model performs well enough it might give an insight on the true number of parameters needed. It performed 69.6 on the MMLU benchmark, this is more than the google but performs worse on general reasoning benchmarks.

The meta model is similar to the  that it is an instruction tuned model from a large company, however this model supposedly excels at this speific task of text generation, howevcer it is even larger, at 8b parameters. If this one performs the best then we can see the extra billion parameters are worth the computing hassel. This model had a score of 68.5 on the MMLU dataset. Benchmark and outperformed the google model in the reasoning tasks.

In [13]:
tokenizer_google = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model_google = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    torch_dtype=torch.bfloat16
)

`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [14]:
google_pipe = pipeline(
    "text-generation",
    model = model_google,
    torch_dtype=torch.bfloat16, 
    device_map="auto", 
    tokenizer = tokenizer_google, 
    max_new_tokens = 250,
    do_sample = False,
)

`torch_dtype` is deprecated! Use `dtype` instead!
Device set to use cuda:0
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [15]:
text = google_pipe(f"{training_data.iloc[1]['Instruct']}")
print(text[0]['generated_text'])

Q: 
Output a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:
Wind: W winds 5 to 10 kt, Seas: 2 ft, Wave Detail: E 2 ft at 10 seconds and W 1 ft at 4 seconds. Let's think step by step
A: 
The wind is blowing west, ranging from 5 to 10 knots. This is onshore wind, which means it's blowing towards the shore.
B: 
The waves are breaking on both the east and west sides of the beach. The east side is producing waves that are 2 feet high and have a period of 10 seconds. The west side is producing waves that are 1 foot high and have a period of 4 seconds.
C: 
So, overall, the conditions are looking good for surfing on the east side of the beach today. The waves are small but the period is long, which means that the waves will be slow and easy to ride.

**Answer:**

"The wind is pumping we

In [16]:
text = google_pipe(f"{training_data.iloc[2]['Instruct']}")
print(text[0]['generated_text'])

Q: 
Output a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:
Wind: NW winds 10 to 15 kt, Seas: 2 to 3 ft, Wave Detail: E 2 ft at 10 seconds and NW 1 ft at 4 seconds. Let's think step by step
A: 
The wind is blowing northwest, 10 to 15 knots, which is onshore and not ideal for surfing.
B: 
The waves are small, 2 to 3 feet, and the water is smooth as glass.
C: 
The wave period is long, 10 seconds, which means the waves are slow and rolling.
D: 
The wave detail shows there are two types of waves, E and NW. The E waves are 2 feet and have a period of 10 seconds, which are perfect for longboard surfing. The NW waves are 1 foot and have a period of 4 seconds, which are more suited for beginner surfers.

**Final Output:**

The wind is blowing onshore, but the waves are small and smooth 

The model did a fairly good job at interpreting the onshore wave and its use in surfing. It also knew that small waves are better for longboards, especially those with a long period. However by adding the surfer lingo section it betrays some of my instructions.

In [15]:

tokenizer_qwen = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
model_qwen = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    torch_dtype="auto",
    device_map="auto"
)

`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [16]:
qwen_pipe = pipeline(
    "text-generation",
    model = model_qwen,
    torch_dtype=torch.bfloat16, 
    device_map="auto", 
    tokenizer = tokenizer_qwen, 
    max_new_tokens = 400,
    do_sample = False
)

Device set to use cuda:0
The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [19]:
text = qwen_pipe(f"{training_data.iloc[2]['Instruct']}")
print(text[0]['generated_text'])

`generation_config` default values have been modified to match model-specific defaults: {'do_sample': True}. If this is not desired, please set these values explicitly.


Q: 
Output a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:
Wind: NW winds 10 to 15 kt, Seas: 2 to 3 ft, Wave Detail: E 2 ft at 10 seconds and NW 1 ft at 4 seconds. Let's think step by step
A: 1. First, assess the wind direction and strength: NW winds 10–15 kt are moderate and offshore-ish, which is generally good for surf—offshore winds help clean up the waves and reduce chop.  
2. Look at sea-state: 2 to 3 ft is a solid swell, not too big, not too small—ideal for a variety of surfers.  
3. Analyze wave detail: E 2 ft at 10 seconds is a long, clean, and powerful swell—perfect for experienced surfers looking for cutbacks or powerful rides. The NW 1 ft at 4 seconds is shorter and more choppy, likely a spillover or secondary swell that adds a bit of chaos.  
4. Combine the element

In [20]:
text = qwen_pipe(f"{training_data.iloc[1]['Instruct']}")
print(text[0]['generated_text'])

Q: 
Output a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:
Wind: W winds 5 to 10 kt, Seas: 2 ft, Wave Detail: E 2 ft at 10 seconds and W 1 ft at 4 seconds. Let's think step by step
A: 1. First, assess the wind conditions: W winds 5–10 kt — that's light to moderate, not a killer, but could cause some chop if the swell is breaking. Not a bad wind for a session, especially if it's offshore.  
2. Look at the sea state: 2 ft overall — that's a solid, manageable swell, not huge, not tiny. Good for beginners or intermediate surfers.  
3. Analyze the wave detail: E 2 ft at 10 seconds — long, clean, and powerful. That's a good ride, especially for a consistent, powerful wave. W 1 ft at 4 seconds — short, choppy, and less ideal for long rides.  
4. Combine it: The east swell is the real 

This also does a good job at reading the wind conditions. It also betrays some of my instruction tuning.

In [17]:
model_llama = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [18]:
llama_pipe = pipeline(
    "text-generation",
    model = model_qwen,
    torch_dtype=torch.bfloat16, 
    device_map="auto", 
    tokenizer = tokenizer_qwen, 
    max_new_tokens = 400,
    do_sample = False
)


Device set to use cuda:0


In [23]:
text = llama_pipe(f"{training_data.iloc[2]['Instruct']}")
print(text[0]['generated_text'])

Q: 
Output a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:
Wind: NW winds 10 to 15 kt, Seas: 2 to 3 ft, Wave Detail: E 2 ft at 10 seconds and NW 1 ft at 4 seconds. Let's think step by step
A: 1. First, assess the wind direction and strength. NW winds 10 to 15 kt — that’s a solid, consistent breeze, not too strong, not too weak. It’s a good wind for catching the waves, especially if it’s helping to shape the swell. 2. Look at the sea state: 2 to 3 ft — that’s a moderate swell, not too big, not too small. Good for intermediate surfers. 3. Analyze the wave detail: E 2 ft at 10 seconds and NW 1 ft at 4 seconds. The east swell is long-period (10 seconds), which means it’s clean, well-formed, and has good ride quality. The NW swell is short and choppy (4 seconds), likely breaking up 

In [24]:
text = llama_pipe(f"{training_data.iloc[1]['Instruct']}")
print(text[0]['generated_text'])

Q: 
Output a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:
Wind: W winds 5 to 10 kt, Seas: 2 ft, Wave Detail: E 2 ft at 10 seconds and W 1 ft at 4 seconds. Let's think step by step
A: 1. First, assess the wind direction and strength: W winds 5–10 kt are light to moderate, which is generally good for surf—especially if they're offshore, which they appear to be. This helps keep the waves clean and reduces chop.  
2. Evaluate the sea state: 2 ft is a modest swell, not huge, but still manageable. The wave detail shows two distinct swells: a stronger, longer E swell at 2 ft with a 10-second period, and a weaker, shorter W swell at 1 ft with a 4-second period. The 10-second period is ideal—it means the waves are long and smooth, perfect for riding.  
3. Consider the surfing implicati

Here the output is clean and concise, it lacks some of the "feel" that you get from the others.

**Few Shot**

In [19]:
def few_shot(pipe, train, test, shots = 3, test_index = 0):
    messages = []
    for i in range(shots):
        question = train.iloc[i]['Instruct']
        answer = train.iloc[i]['Response']+"\n\n"
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})

    final_question = test.iloc[test_index]['Instruct']
    messages.append({"role": "user", "content": final_question})
    text = pipe(messages)
    print(text[0]['generated_text'])
    return text[0]["generated_text"][-1]['content']



In [26]:
few_shot(google_pipe, training_data, training_data, shots = 3, test_index =1)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

'Good Morning everyone! The waves are down a notch from yesterday and breaking in the ankle to knee high range. The buoy is 1.6 @ 5 seconds so there is not much energy out there. The wind is Light WSW grooming up the waves nicely but unfortunately they are barely surf able. Technically as the tide is dropping the waves are going to get a little better so keep an eye on the surf this afternoon.'

In [27]:
few_shot(google_pipe, training_data, training_data, shots = 3, test_index =2)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

'Good morning, my friend. The waves are breaking a little better this morning with the tide coming up and the wind blowing offshore. The waves are small, but there is enough energy to get you out there and have a good time. The wind is blowing 11.4 mph NW, and the waves are breaking in the knee to chest high range. The wave period is a little long, so it is not ideal for beginners, but it is perfect for experienced surfers. So, if you are looking to get out and surf this morning, I would recommend heading to the beach and catching a wave.'

This few shot example we can see that it looks better than the on the zero, shot, it already looks much better and more similar to a observer forecast

In [28]:
few_shot(qwen_pipe, training_data, training_data, shots = 3, test_index =1)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

"Alright, let’s break it down — the swell’s coming in from the east at 2 feet, 10 seconds long, which is clean, long, and just enough to carve a little shape. The wind’s light from the west, 5 to 10 knots, helping to groom the face and keep things a bit more organized. That W 1-foot swell at 4 seconds is a ghost, not much to work with. If you're looking for a ride, the east-facing breaks will give you a soft, slow roll — not a cut, but good for a practice set or a chill session. Keep your board ready, the window’s open but not wide."

In [29]:
few_shot(qwen_pipe, training_data, training_data, shots = 3, test_index =2)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

"Alright, let’s break this down—NW winds 10 to 15 kt are pushing hard, which means the swells are being chopped and squeezed, not exactly ideal for clean rides. Seas are holding steady at 2 to 3 feet, but the wave detail tells a story: that E swell at 2 ft with a 10-second period is the real deal—long, rolling, and capable of a solid cut. The NW 1 ft swell at 4 seconds? That’s just a ghost wave, barely breaking. So, the main play’s on the east side—look for a few well-formed, open-water rides, but expect a lot of chop and wind-affected faces. Not a full session, but if you're after a few clean sets, go east when the wind eases."

Interestingly the model adds a non-existent reef here. It does a better job with the ouptut however.

In [30]:
few_shot(llama_pipe, training_data, training_data, shots = 3, test_index =1)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

'Alright, check the lineup—light west winds at 5 to 10 kts are grooming the east swell nicely, but the wave window’s a bit tight. We’ve got a solid 2-foot east swell at 10 seconds—long, clean, and a little lazy, perfect for a patient ride or a soft wipeout. The west swell at 4 seconds is short and weak, just a ghost in the lineup. If you’re after glassy, consistent sets, wait for the tide to pull back and the wind to drop—this one’s more for the seasoned watcher than the hungry surfer.'

In [31]:
few_shot(llama_pipe, training_data, training_data, shots = 3, test_index =2)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

"Alright, check the lineup—this one’s a *breeze* but not a swell. NW winds 10 to 15 kt are pushing hard, churning up the surface and keeping the waves tight and glassy. Seas are holding at 2 to 3 ft, with the E swell at 2 ft and 10 seconds—long, clean, and a little hollow—perfect for a ride if you’re after a ride. The NW swell at 1 ft and 4 seconds is more of a ghost, not much to work with. If you're looking for powder, stay on the east side. Otherwise, keep your board in the water and watch the wind shift."

Here we also see the phantom reef. This does a good job of letting the user know about how conditions may improve in the future.

In [20]:
few_shot(google_pipe, training_data, training_data, shots = 8, test_index =0)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

'Good afternoon everyone! The waves are down a notch from yesterday and breaking in the ankle to knee high range. The buoy is 1.6 @ 5 seconds so there is not much energy out there. The wind is Light WSW grooming up the waves nicely but unfortunately they are barely surf able. Technically as the tide is coming up and the wind is changing direction the waves are going to start to get a little better later on in the day.'

In [26]:
few_shot(google_pipe, training_data, training_data, shots = 8, test_index =1)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

'Good morning everyone! The waves are down a notch from yesterday and breaking in the ankle to knee high range. The buoy is 1.6 @ 5 seconds so there is not much energy out there. The wind is Light WSW grooming up the waves nicely but unfortunately they are barely surf able. Technically as the tide is low and the wind is light it is not the best time to surf but if you are out there and the waves are breaking in your favor it can be a good time to get a few waves in.'

The google pipeline has created a fairly good final answer, however it seems that it outputs the same answer no matter what the input.

In [22]:
few_shot(qwen_pipe, training_data, training_data, shots = 8, test_index =1)

`generation_config` default values have been modified to match model-specific defaults: {'do_sample': True}. If this is not desired, please set these values explicitly.


[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

'Winds are light to moderate from the west, just enough to keep the water a little choppy but not enough to really groom the face. The swell is weak—2 ft with a long, lazy period of 10 seconds from the east, and a short, dead 1 ft wave at 4 seconds from the west. That E swell is the only one with any real shape, breaking clean on the shoulder, but it’s still a bit flat and not much energy. If you’re looking for fun, stay on the reef or check the back—nothing’s breaking with real power, but there’s a chance for a few clean sets if you’re patient.'

In [23]:
few_shot(qwen_pipe, training_data, training_data, shots = 8, test_index =2)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

'Alright, check the lineup — it’s a dead flat day out back. Wind’s coming in hard from the northwest at 15 knots, which is churning the surface but not giving us much swell. The E 2 ft at 10 seconds is the only real thing on the board, but it’s slow, lazy, and barely breaking. The NW swell at 4 seconds is just a ghost — too short and weak to ride. If you’re looking for glass, you’ll be disappointed. Best to stay dry and watch the tide roll in.'

The Qwen Model does a fairly good job at incorporating the data and generating a nice output.

In [24]:
few_shot(llama_pipe, training_data, training_data, shots = 8, test_index =1)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

'Winds are light and steady from the west, just enough to keep the surface a bit choppy but not enough to kill the swell. The waves are holding at 2 ft, with a solid 10-second E swell that’s riding clean and long—perfect for a longboarder looking for some soft, rolling sets. The short W wave at 4 seconds is barely breaking, more like a ghost ride. If you’re after consistent, clean water, the east side is your best bet—just don’t go chasing the wind.'

In [25]:
few_shot(llama_pipe, training_data, training_data, shots = 8, test_index =2)

[{'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should be a few short sentences, with some surfing lingo and flair. The data is as follows:\nWind: SW winds 10 to 15 kt, becoming W 5 to 10 kt late, Seas: 2 to 3 ft, Wave Detail: SW 2 ft at 4 seconds and SE 1 ft at 9 seconds. Let's think step by step\nA: "}, {'role': 'assistant', 'content': 'Good afternoon! There is not a lot going on out back at the moment. It is looking pretty flat out there and not much is breaking. The wind is blowing 11.4 mph SW, and the tide passed low at 1:20 pm this afternoon. There is not much breaking out back but keep an eye on […]\n\n'}, {'role': 'user', 'content': "Q: \nOutput a human-readable surf-forecast similar to that of a veteran surf-obsever. The response should take into account the winds, sea-state, and wave period. The final output should b

'Alright, let’s get real — this one’s a quiet one. NW winds at 10 to 15 kt are pushing the swell, but the waves are tight and short, mostly just a 2 ft E set at 10 seconds that’s a bit of a struggle to catch. The NW swell is weak and breaking close to shore, not much room for planing or ride time. The water’s choppy, and the period’s too short for clean, consistent sets. Best to stay on the backside or just watch the tide roll — there’s no magic here, just a slow, tired swell.'

I like the llama model's output it does a fairly good job at interpreting the Wind and Swell and how they interact to make surfable waves. It is inserting some extra information on where the shorebreak is of course, which is a pure hallucination. It is an improvement over the three shot version.

## Final Choice

I am going to go with the smaller Qwen Model. This lightweight model does a fairly goodjob at outputing already. This, combined with the fact that I haven't trained the model at allb bodes well for the future of the project. The lightweight Qwen Model and sufficient training data should do a good job at approximating a human sounding surf reporter. The larger models, ran into some compute errors when I was trying to do 8 shot prompting, I needed to spin up an afton cluster with more than one GPU in order to process that data. For these reasons I feel confident in the Qwen Model. The real difficultly will be two come up with a test in order to evaluate the "truthfulness" of the generated responses!