<a href="https://colab.research.google.com/github/geminicopilotgpt/GenAI/blob/main/Gpt2Finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [8]:
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')

prompt = "Write a short story about a life of 23 year old guy:"

output = generator(prompt, max_length=50, num_return_sequences=1)[0]['generated_text']

print(output)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Write a short story about a life of 23 year old guy: http://nypost.com/2012/10/08/young-yale-writers-to-have-kids/

We talk about it all.

Some


In [9]:
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')

prompt = "Write a short story about the life of a 23-year-old guy."

output = generator(prompt,
                   max_length=200, # Increased max_length
                   num_return_sequences=1,
                   truncation=True) # Added explicit truncation

print(output[0]['generated_text'])

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Write a short story about the life of a 23-year-old guy.


I've given my personal opinion on this and we should all use our voices now when we speak up about the real problems here; it's an important piece of work in which we are doing better rather than worse. We need to make the world better for people who know how much they care about this issue and who care about their kids because it is something that affects all lives, and what they care about means nothing.


Now there are hundreds and thousands of kids in this country who are going through mental illness who aren't even aware that mental illness is a psychiatric condition but are just doing whatever it is they want to do. Let them know you care and they can move on.


Thank you for sharing your experiences about how to start a conversation to help. There are many things to add for people who already care. And you won't be as hard on those who don't.




In [10]:
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')

prompt = "Write a short story about a 23-year-old guy named Alex who is trying to start his own business in a big city."

output = generator(prompt,
                   max_length=200,
                   num_return_sequences=1,
                   truncation=True,
                   temperature=0.7) # Lower temperature

print(output[0]['generated_text'])

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Write a short story about a 23-year-old guy named Alex who is trying to start his own business in a big city. A little something about that is kind of a cool touch.

I've never had a story like that before. I think I was just trying to come up with the right character for the story. I don't know how they came up with Alex, but they did. I think they came up with the character for him because they wanted him to be a character who is part of the larger community.

So I think that's the perfect fit for me. I love this character. I love working with that character. I love working with this character. I love working with that character.

[Laughs.]

It's funny because you've said some interesting things about the other two shows, The Walking Dead and Game of Thrones, but you're not really a guy.

I don't really. I've never said that.
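
The `temperature=0.7` argument in the cell above rescales the model's next-token scores before sampling (it only matters when sampling is enabled). A minimal sketch of that rescaling follows, using made-up logits rather than real GPT-2 outputs; the function name `softmax_with_temperature` is hypothetical and not a transformers API.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative scores for three candidate tokens (not taken from GPT-2).
logits = [2.0, 1.0, 0.1]
for t in (1.0, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A lower temperature concentrates probability on the highest-scoring token, so the text becomes less random. It does not make a base model follow instructions, which is consistent with the interview-style continuation printed above.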


In [11]:
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')

prompt = "Write a short story about Alex, a 23-year-old, who moves to New York City to open a coffee shop. Describe his first day, the challenges he faces, and a small victory he achieves."

output = generator(prompt,
                   max_length=300,
                   num_return_sequences=1,
                   truncation=True,
                   temperature=0.7)

print(output[0]['generated_text'])

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Write a short story about Alex, a 23-year-old, who moves to New York City to open a coffee shop. Describe his first day, the challenges he faces, and a small victory he achieves. Write about his trip to the United States to visit his father, who died of brain cancer, and his struggle with depression, all while wearing a t-shirt with the message "Keep a smile on your face, kids! I've got something to tell you."

How do you feel about the whole process of getting started in comics?

It was incredible. I had such a great time. I was completely overwhelmed. It was so overwhelming. I was so overwhelmed and I hadn't thought about a problem for a couple of months. I had to do it. I had to figure out how to do it.

What was your first day like?

I was so excited, it was so great to be part of something that I loved so much. It was amazing. I was so surprised by the amount of people that came. It was so much fun and really fun.

What do you get out of the experience?

I get so much. I get to se

In [12]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel, TextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments
from googlesearch import search
import requests
from bs4 import BeautifulSoup
import io

def get_google_search_text(query, num_results=5):
    """Performs a Google search and returns the text content from the top results."""
    search_results = search(query, num_results=num_results)
    all_text = ""
    for url in search_results:
        try:
            response = requests.get(url)
            response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)
            soup = BeautifulSoup(response.content, 'html.parser')
            text = soup.get_text(separator=' ', strip=True) #Gets all text from the webpage.
            all_text += text + " "
        except requests.exceptions.RequestException as e:
            print(f"Error fetching {url}: {e}")
        except Exception as e:
            print(f"Error processing {url}: {e}")
    return all_text

def finetune_and_generate(search_query, prompt):
    """Performs a Google search, finetunes GPT-2, and generates text."""

    text_data = get_google_search_text(search_query)

    if not text_data:
        print("No search results or text found.")
        return

    train_file_obj = io.StringIO(text_data)

    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2LMHeadModel.from_pretrained('gpt2')

    train_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path="in_memory",
        block_size=128,
        file_obj=train_file_obj
    )

    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=False
    )

    training_args = TrainingArguments(
        output_dir="./gpt2-finetuned",
        overwrite_output_dir=True,
        num_train_epochs=3,
        per_device_train_batch_size=4,
        save_steps=10_000,
        save_total_limit=2,
        prediction_loss_only=True,
        report_to="none"
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=train_dataset,
    )

    trainer.train()

    model.save_pretrained("./gpt2-finetuned")
    tokenizer.save_pretrained("./gpt2-finetuned")

    finetuned_generator = pipeline('text-generation', model='./gpt2-finetuned')

    output = finetuned_generator(prompt,
                       max_length=300,
                       num_return_sequences=1,
                       truncation=True,
                       temperature=0.7)

    print(output[0]['generated_text'])

# Example Usage:
search_query = "New York City coffee shop trends"
prompt = "Write a short story about a coffee shop in New York City."
finetune_and_generate(search_query, prompt)

TypeError: search() got an unexpected keyword argument 'num_results'
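
The `TypeError` above suggests that the installed `search()` does not take `num_results`; which keywords it accepts depends on which package provides `googlesearch`. One way to avoid relying on that keyword is to cap the iterable of results yourself. A minimal sketch, with a stand-in generator `fake_search` (hypothetical) in place of the real search call:

```python
from itertools import islice

def fake_search(query):
    """Stand-in for a search function that yields result URLs lazily."""
    for i in range(100):
        yield f"https://example.com/{query.replace(' ', '-')}/{i}"

def take_first(results, limit):
    """Return at most `limit` items from any iterable of results."""
    return list(islice(results, limit))

urls = take_first(fake_search("coffee shop trends"), 5)
print(len(urls))
print(urls[0])
```

Note that this caps the number of URLs that are tried, not the number that are fetched successfully.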

In [13]:
pip install --upgrade google-search-python

ERROR: Could not find a version that satisfies the requirement google-search-python (from versions: none)
ERROR: No matching distribution found for google-search-python

In [15]:
def get_google_search_text(query, num_results=5):
    """Performs a Google search and returns the text content from the top results."""
    search_results = search(query) #Removed num_results here.
    all_text = ""
    count = 0 #Added counter
    for url in search_results:
        if count >= num_results: #Added check
            break
        try:
            response = requests.get(url)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            text = soup.get_text(separator=' ', strip=True)
            all_text += text + " "
            count += 1 #Increment counter.
        except requests.exceptions.RequestException as e:
            print(f"Error fetching {url}: {e}")
        except Exception as e:
            print(f"Error processing {url}: {e}")
    return all_text

In [17]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel, TextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments
from googlesearch import search
import requests
from bs4 import BeautifulSoup
import io

def get_google_search_text(query, num_results=5):
    """Performs a Google search and returns the text content from the top results."""
    search_results = search(query)
    all_text = ""
    count = 0
    for url in search_results:
        if count >= num_results:
            break
        try:
            response = requests.get(url)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            text = soup.get_text(separator=' ', strip=True)
            all_text += text + " "
            count += 1
        except requests.exceptions.RequestException as e:
            print(f"Error fetching {url}: {e}")
        except Exception as e:
            print(f"Error processing {url}: {e}")
    return all_text

def finetune_and_generate(search_query, prompt):
    """Performs a Google search, finetunes GPT-2, and generates text."""

    text_data = get_google_search_text(search_query)

    if not text_data:
        print("No search results or text found.")
        return

    train_file_obj = io.StringIO(text_data)

    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2LMHeadModel.from_pretrained('gpt2')

    train_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path="in_memory",
        block_size=128,
        file_obj=train_file_obj
    )

    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=False
    )

    training_args = TrainingArguments(
        output_dir="./gpt2-finetuned",
        overwrite_output_dir=True,
        num_train_epochs=3,
        per_device_train_batch_size=4,
        save_steps=10_000,
        save_total_limit=2,
        prediction_loss_only=True,
        report_to="none",
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=train_dataset,
    )

    trainer.train()

    model.save_pretrained("./gpt2-finetuned")
    tokenizer.save_pretrained("./gpt2-finetuned")

    finetuned_generator = pipeline('text-generation', model='./gpt2-finetuned')

    output = finetuned_generator(prompt,
                       max_length=300,
                       num_return_sequences=1,
                       truncation=True,
                       temperature=0.7)

    print(output[0]['generated_text'])

# Example Usage:
search_query = "New York City coffee shop trends"
prompt = "Write a short story about a coffee shop in New York City."
finetune_and_generate(search_query, prompt)

TypeError: TextDataset.__init__() got an unexpected keyword argument 'file_obj'
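
The error above indicates that `TextDataset` in the installed transformers version has no `file_obj` parameter. Before passing a keyword to a library callable, its signature can be inspected. A minimal sketch using only the standard library; `load_text` is a hypothetical stand-in, not the real `TextDataset`:

```python
import inspect

def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        kind = params[name].kind
        return kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                        inspect.Parameter.KEYWORD_ONLY)
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

def load_text(tokenizer, file_path, block_size):
    """Hypothetical loader that only reads from a path on disk."""
    return (tokenizer, file_path, block_size)

print(accepts_keyword(load_text, "file_path"))  # True
print(accepts_keyword(load_text, "file_obj"))   # False
```

Calling `inspect.signature` on a class reports its constructor parameters, so the same check should work on a dataset class directly.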

In [18]:
pip install --upgrade transformers

Collecting transformers
  Downloading transformers-4.49.0-py3-none-any.whl.metadata (44 kB)
Downloading transformers-4.49.0-py3-none-any.whl (10.0 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.48.3
    Uninstalling transformers-4.48.3:
      Successfully uninstalled transformers-4.48.3
Successfully installed transformers-4.49.0


In [1]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel, TextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments
from googlesearch import search
import requests
from bs4 import BeautifulSoup
import tempfile #added import
import os #added import

def get_google_search_text(query, num_results=5):
    # ... (your get_google_search_text function remains the same)

def finetune_and_generate(search_query, prompt):
    """Performs a Google search, finetunes GPT-2, and generates text."""

    text_data = get_google_search_text(search_query)

    if not text_data:
        print("No search results or text found.")
        return

    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2LMHeadModel.from_pretrained('gpt2')

    # Save to a temporary file
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as temp_file:
        temp_file.write(text_data)
        temp_file_path = temp_file.name

    train_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path=temp_file_path,
        block_size=128
    )

    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=False
    )

    training_args = TrainingArguments(
        output_dir="./gpt2-finetuned",
        overwrite_output_dir=True,
        num_train_epochs=3,
        per_device_train_batch_size=4,
        save_steps=10_000,
        save_total_limit=2,
        prediction_loss_only=True,
        report_to="none"
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=train_dataset,
    )

    trainer.train()

    model.save_pretrained("./gpt2-finetuned")
    tokenizer.save_pretrained("./gpt2-finetuned")

    finetuned_generator = pipeline('text-generation', model='./gpt2-finetuned')

    output = finetuned_generator(prompt,
                       max_length=300,
                       num_return_sequences=1,
                       truncation=True,
                       temperature=0.7)

    print(output[0]['generated_text'])

    # Clean up the temporary file
    os.remove(temp_file_path)

# Example Usage:
search_query = "New York City coffee shop trends"
prompt = "Write a short story about a coffee shop in New York City."
finetune_and_generate(search_query, prompt)

IndentationError: expected an indented block after function definition on line 8 (<ipython-input-1-3bf2a16fd7b7>, line 11)
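
The `IndentationError` comes from the placeholder `get_google_search_text`, whose body is only a comment. Comments are not statements, so Python sees a `def` with no body. A small sketch that reproduces this with `compile()` and shows that a `pass` or a docstring is enough to make such a stub valid:

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<stub>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

comment_only = "def stub():\n    # body elided\n\nx = 1\n"
with_pass = "def stub():\n    # body elided\n    pass\n\nx = 1\n"
with_docstring = 'def stub():\n    """Body elided."""\n\nx = 1\n'

print(compiles(comment_only))    # False
print(compiles(with_pass))       # True
print(compiles(with_docstring))  # True
```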

In [6]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel, TextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments
from googlesearch import search
import requests
from bs4 import BeautifulSoup
import tempfile
import os

def get_google_search_text(query, num_results=5):
    """Performs a Google search and returns the text content from the top results."""
    search_results = search(query)
    all_text = ""
    count = 0
    for url in search_results:
        if count >= num_results:
            break
        try:
            response = requests.get(url)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            text = soup.get_text(separator=' ', strip=True)
            all_text += text + " "
            count += 1
        except requests.exceptions.RequestException as e:
            print(f"Error fetching {url}: {e}")
        except Exception as e:
            print(f"Error processing {url}: {e}")
    return all_text

def finetune_and_generate(search_query, prompt):
    """Performs a Google search,
    """
    # ... (rest of your function code)

In [7]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel, TextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments
from googlesearch import search
import requests
from bs4 import BeautifulSoup
import tempfile
import os

def get_google_search_text(query, num_results=5):
    """Performs a Google search and returns the text content from the top results."""
    search_results = search(query)
    all_text = ""
    count = 0
    for url in search_results:
        if count >= num_results:
            break
        try:
            response = requests.get(url)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            text = soup.get_text(separator=' ', strip=True)
            all_text += text + " "
            count += 1
        except requests.exceptions.RequestException as e:
            print(f"Error fetching {url}: {e}")
        except Exception as e:
            print(f"Error processing {url}: {e}")
    return all_text

def finetune_and_generate(search_query, prompt):
    """Performs a Google search, finetunes GPT-2, and generates text."""

    text_data = get_google_search_text(search_query)

    if not text_data:
        print("No search results or text found.")
        return

    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2LMHeadModel.from_pretrained('gpt2')

    with tempfile.NamedTemporaryFile(mode='w', delete=False) as temp_file:
        temp_file.write(text_data)
        temp_file_path = temp_file.name

    train_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path=temp_file_path,
        block_size=128
    )

    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=False
    )

    training_args = TrainingArguments(
        output_dir="./gpt2-finetuned",
        overwrite_output_dir=True,
        num_train_epochs=3,
        per_device_train_batch_size=4,
        save_steps=10_000,
        save_total_limit=2,
        prediction_loss_only=True,
        report_to="none"
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=train_dataset,
    )

    trainer.train()

    model.save_pretrained("./gpt2-finetuned")
    tokenizer.save_pretrained("./gpt2-finetuned")

    finetuned_generator = pipeline('text-generation', model='./gpt2-finetuned')

    output = finetuned_generator(prompt,
                       max_length=300,
                       num_return_sequences=1,
                       truncation=True,
                       temperature=0.7)

    print(output[0]['generated_text'])

    os.remove(temp_file_path)

# Example usage: run the search, fine-tune GPT-2, and generate text.
search_query = "New York City coffee shop trends"
prompt = "Write a short story about a coffee shop in New York City."
finetune_and_generate(search_query, prompt)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


Step,Training Loss


NameError: name 'pipeline' is not defined

In [8]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel, TextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments, pipeline #Added import of pipeline
from googlesearch import search
import requests
from bs4 import BeautifulSoup
import tempfile
import os

def get_google_search_text(query, num_results=5):
    """Performs a Google search and returns the text content from the top results."""
    search_results = search(query)
    all_text = ""
    count = 0
    for url in search_results:
        if count >= num_results:
            break
        try:
            response = requests.get(url, timeout=10)  # avoid hanging on an unresponsive site
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            text = soup.get_text(separator=' ', strip=True)
            all_text += text + " "
            count += 1
        except requests.exceptions.RequestException as e:
            print(f"Error fetching {url}: {e}")
        except Exception as e:
            print(f"Error processing {url}: {e}")
    return all_text

def finetune_and_generate(search_query, prompt):
    """Performs a Google search, finetunes GPT-2, and generates text."""

    text_data = get_google_search_text(search_query)

    if not text_data:
        print("No search results or text found.")
        return

    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2LMHeadModel.from_pretrained('gpt2')

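    # Write as UTF-8 so scraped non-ASCII text is stored consistently across platforms.
    with tempfile.NamedTemporaryFile(mode='w', delete=False, encoding='utf-8') as temp_file: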
        temp_file.write(text_data)
        temp_file_path = temp_file.name

    train_dataset = TextDataset(
        tokenizer=tokenizer,
        file_path=temp_file_path,
        block_size=128
    )

    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=False
    )

    training_args = TrainingArguments(
        output_dir="./gpt2-finetuned",
        overwrite_output_dir=True,
        num_train_epochs=3,
        per_device_train_batch_size=4,
        save_steps=10_000,
        save_total_limit=2,
        prediction_loss_only=True,
        report_to="none"
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=train_dataset,
    )

    trainer.train()

    model.save_pretrained("./gpt2-finetuned")
    tokenizer.save_pretrained("./gpt2-finetuned")

    finetuned_generator = pipeline('text-generation', model='./gpt2-finetuned')

    output = finetuned_generator(prompt,
                       max_length=300,
                       num_return_sequences=1,
                       truncation=True,
                       temperature=0.7)

    print(output[0]['generated_text'])

    os.remove(temp_file_path)

# Example usage: run the search, fine-tune GPT-2, and generate text.
search_query = "New York City coffee shop trends"
prompt = "Write a short story about a coffee shop in New York City."
finetune_and_generate(search_query, prompt)



Step,Training Loss


Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Write a short story about a coffee shop in New York City.
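
The training table above has a header but no rows. One possible explanation is that the run had fewer optimizer steps than the logging interval. The rough arithmetic can be sketched as below. Everything numeric here is an assumption: the corpus size is hypothetical (the notebook does not report it), a trailing partial block is assumed to be dropped, and the logging interval is assumed to be 500 steps, which is believed to be the `TrainingArguments` default for `logging_steps`.

```python
import math

def count_blocks(num_tokens, block_size):
    """Number of full fixed-size blocks, assuming a trailing partial block is dropped."""
    return num_tokens // block_size

def count_optimizer_steps(num_blocks, batch_size, epochs):
    """Steps for one device with no gradient accumulation."""
    return math.ceil(num_blocks / batch_size) * epochs

# Hypothetical corpus size; the notebook does not report the real token count.
num_tokens = 20_000
blocks = count_blocks(num_tokens, block_size=128)
steps = count_optimizer_steps(blocks, batch_size=4, epochs=3)
logging_interval = 500  # assumed logging interval

print(blocks, steps, steps >= logging_interval)
```

Under these assumptions the run finishes before the first loss value is logged, which would be consistent with the empty table. Passing a smaller `logging_steps` to `TrainingArguments` should make the loss visible.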


In [10]:
from transformers import pipeline

finetuned_generator = pipeline('text-generation', model='./gpt2-finetuned')

prompt = "what is a laptop?"
output = finetuned_generator(prompt,
                   max_length=300,
                   num_return_sequences=1,
                   truncation=True,
                   temperature=0.7)

print(output[0]['generated_text'])

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


what is a laptop?

A laptop is a laptop with a hard drive, keyboard, or other type of device connected to the internet such as a laptop computer, mobile device, or tablet. The laptop computer is a portable computer with a computer operating system such as Mac OS X, Windows, or Linux. A laptop computer is a computer that allows a user to view, browse, and interact with a variety of media. Examples of laptops include printers, e-readers, and mobile devices. A laptop computer is a laptop with a hard drive, keyboard, or other type of device connected to the internet such as a laptop computer, mobile device, or tablet.

Is there a minimum amount of storage for a laptop?

The following are the minimum storage requirements for a laptop:

Minimum Storage: This percentage represents the total storage of the laptop, such as hard drive, hard disk drive, or other type of device. A laptop is not a laptop with a total of more than 100 GB of hard drive, hard disk drive, or other type of device.

Maxi
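
The sample above repeats whole sentences. A rough way to quantify this is the share of word n-grams that have already occurred earlier in the text. A minimal sketch; the metric and the function name `repeated_ngram_ratio` are illustrative choices, not something the notebook or transformers defines.

```python
def repeated_ngram_ratio(text, n=3):
    """Fraction of word n-grams that already appeared earlier in the text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    seen = set()
    repeats = 0
    for gram in ngrams:
        if gram in seen:
            repeats += 1
        seen.add(gram)
    return repeats / len(ngrams)

# Short excerpt modelled on the generated text above.
sample = ("A laptop computer is a laptop with a hard drive. "
          "A laptop computer is a laptop with a hard drive.")
print(round(repeated_ngram_ratio(sample), 2))
```

If the ratio is high, generation options such as `no_repeat_ngram_size` or `repetition_penalty` may reduce the looping, though they were not tried in this notebook.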