# Automated Blog Post Generator for Dog-Friendly Travel in Denia

## Project Description

This project demonstrates how to build an automated system that generates SEO-friendly blog posts on a specific topic, in this case traveling with dogs in Denia on the Costa Blanca. The generated content covers pet-friendly places such as restaurants, beaches, parks, and shops, as well as activities.

## Goals
1. **Innovative Content Creation**: Develop an automated system to generate SEO-friendly blog posts.
2. **Advanced Retrieval System**: Use Pinecone for storing and querying text embeddings.
3. **Generative Model**: Fine-tune a GPT model to generate high-quality articles.
4. **SEO Optimization**: Optimize generated content for SEO.
5. **User-Friendly Interface**: Create an intuitive web interface using Vue.js for inputting topics and viewing generated content.

## Step 1: Setup


In [2]:
# Install the required libraries
%pip install requests beautifulsoup4 pandas transformers torch pinecone-client python-dotenv


Collecting python-dotenv
  Using cached python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Using cached python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1
Note: you may need to restart the kernel to use updated packages.


In [7]:
# Import the necessary libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
from transformers import BertTokenizer, BertModel
import torch
from pinecone import Pinecone, ServerlessSpec
from dotenv import load_dotenv
import os


Let's use Pinecone to store and query text embeddings. First, we need to create an index to store the embeddings.


In [9]:
# Load environment variables from .env file
load_dotenv()

pinecone = Pinecone(
    api_key=os.getenv("PINECONE_API_KEY")
)

# Create a new index if it does not exist.
# list_indexes() returns index descriptions, so compare against their names.
index_name = 'denia-dog-travel'
if index_name not in pinecone.list_indexes().names():
    pinecone.create_index(
        index_name,
        dimension=768,  # hidden size of GPT-2 (small) and BERT-base
        metric='cosine',
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1',
        )
    )

# Connect to the index
index = pinecone.Index(index_name)
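
`os.getenv` returns `None` when a variable is absent, so a missing `.env` entry is not reported at the point where the key is read. A minimal sketch of an explicit check follows; the helper name `require_env` is hypothetical and is not part of the notebook or of python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value


# Hypothetical usage in the cell above, after load_dotenv():
# pinecone = Pinecone(api_key=require_env("PINECONE_API_KEY"))
```

Failing early this way keeps a configuration problem separate from any error raised later by the Pinecone client itself.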


## Step 2: Data Collection

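Define a function that scrapes the title and paragraph text of a blog post from its URL, then apply it to a list of pages about dog-friendly places in Denia.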

In [10]:
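def scrape_blog_post(url):
    # A timeout keeps one unresponsive site from stalling the whole run
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Use the first <h1> as the title, falling back to a placeholder
    h1 = soup.find('h1')
    title = h1.text if h1 else 'No Title'
    # Join the text of all <p> elements into a single string
    content = ' '.join(p.text for p in soup.find_all('p'))
    return {'title': title, 'content': content}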

blog_urls = [
    'https://www.happyinthesun.com/en/dog-beach-in-denia/',
    'https://www.beachatlas.com/dog-friendly-beaches-denia',
    'https://www.beachatlas.com/els-molins',
    'https://www.beachatlas.com/escollera-norte-puerto-denia',
    'https://www.mnmcostablanca.com/Blog/travelling-by-balearia-ferry-from-denia',
    'https://tlcdenia.com/experience-claudia-attar/',
    'https://www.rewindthegap.co.uk/2020/03/10/trust-resort-canino-ondara/',
    'https://emmenetonchien.com/en/plage-chien-autorise/plage-escollera-norte/',
    'https://blog.cumbredelsol.com/en/19364/events/exploring-the-dog-friendly-beaches-of-the-costa-blanca-north.html',
    'https://www.denia.com/en/playas-de-perros/',
    'https://euroweeklynews.com/2022/06/01/denia-dog-friendly-beaches-for-the-summer/',
    'https://denia.net/canine-bathing-area-and-beach-regulations',
    'https://www.thefork.com/restaurants/denia-c133452/dog-friendly-t348',
    'https://www.muchosol.co.uk/escapes/pet-friendly-accommodations-in-denia',
]

# Scrape blog posts
blogs = [scrape_blog_post(url) for url in blog_urls]

# Convert to DataFrame
df_blogs = pd.DataFrame(blogs)
df_blogs.head()


|   | title | content |
|---|-------|---------|
| 0 | Sorry, you have been blocked | This website is using a security service to pr... |
| 1 | Dog beach in Denia | Home » Dog beach in Denia The coast of Dénia i... |
| 2 | Dog Friendly Beaches in Denia | Playa Els Molins is a picturesque beach locate... |
| 3 | Playa Els Molins | Playa Els Molins amenities include parking and... |
| 4 | Playa Escollera Norte Puerto de Denia | Playa Escollera Norte Puerto de Denia amenitie... |


## Step 3: Data Preprocessing

In [19]:
def preprocess_text(text):
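    # Remove HTML tags, then every character that is not an ASCII letter, digit, or whitespace
    # (this also strips punctuation and accented letters)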
    text = re.sub(r'<[^>]+>', '', text)
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    text = text.lower()
    return text

df_blogs['cleaned_content'] = df_blogs['content'].apply(preprocess_text)
df_blogs.head(2)


|   | title | content | cleaned_content |
|---|-------|---------|-----------------|
| 0 | Sorry, you have been blocked | This website is using a security service to protect itself from online attacks. The action you j... | this website is using a security service to protect itself from online attacks the action you ju... |
| 1 | Dog beach in Denia | Home » Dog beach in Denia The coast of Dénia is one of the main attractions of the city. However... | home dog beach in denia the coast of dnia is one of the main attractions of the city however it... |
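
The cleaned text above turns "Dénia" into "dnia" because the ASCII-only character class deletes accented letters outright, and it would likewise fuse hyphenated words such as "dog-friendly". A minimal sketch of a Unicode-aware variant follows. The function name `preprocess_text_unicode` is hypothetical, and whether this cleaning actually improves the GPT-2 embeddings has not been evaluated here.

```python
import re
import unicodedata


def preprocess_text_unicode(text: str) -> str:
    """Clean text while keeping accented letters such as the 'é' in 'Dénia'."""
    # Compose characters so 'e' + combining accent becomes a single 'é'
    text = unicodedata.normalize("NFKC", text)
    # Replace HTML tags with a space so neighbouring words stay separate
    text = re.sub(r"<[^>]+>", " ", text)
    # \w is Unicode-aware for str patterns, so accented letters survive;
    # punctuation and symbols become spaces instead of vanishing
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace (including newlines) into single spaces
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


print(preprocess_text_unicode("Home » Dog beach in <b>Dénia</b>: dog-friendly!"))
```

Replacing punctuation with a space rather than an empty string is the design choice that keeps "dog-friendly" as two words.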


## Step 4: Generate Text Embeddings with GPT-2

In [21]:
# Drop the first row: it holds a Cloudflare block page instead of blog content
df_blogs = df_blogs.drop(index=0).reset_index(drop=True)
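
Dropping by position only removes a block page that happens to sit in row 0; the Step 4 output further down still lists a "Sorry, you have been blocked" row. A minimal sketch of filtering by content instead follows. The helper names and the marker strings are assumptions, with the markers taken from the block page captured in this notebook.

```python
# Marker phrases are assumptions based on the Cloudflare block page seen above
BLOCK_MARKERS = (
    "sorry, you have been blocked",
    "cloudflare ray id",
)


def is_blocked_page(title: str, content: str) -> bool:
    """Return True if the scraped title or content looks like a block page."""
    text = f"{title} {content}".lower()
    return any(marker in text for marker in BLOCK_MARKERS)


def drop_blocked(posts):
    """Keep only the scraped posts that do not look like block pages."""
    return [p for p in posts if not is_blocked_page(p["title"], p["content"])]


# pandas equivalent (assumes df_blogs has 'title' and 'content' columns):
# mask = df_blogs.apply(lambda r: is_blocked_page(r["title"], r["content"]), axis=1)
# df_blogs = df_blogs[~mask].reset_index(drop=True)
```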

In [25]:
from transformers import GPT2Tokenizer, GPT2Model
import torch

# Load pre-trained GPT-2 model and tokenizer
model_name = 'gpt2'
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2Model.from_pretrained(model_name)

# Set padding token
tokenizer.pad_token = tokenizer.eos_token


In [32]:
def generate_embeddings(text):
    """
    Generate embeddings for a given text using GPT-2.

    This function tokenizes the input text, passes it through the GPT-2 model to obtain hidden states,
    and then computes a fixed-size embedding by averaging the hidden states of all tokens.

    Args:
        text (str): The input text to be converted into embeddings.

    Returns:
        np.ndarray: A numpy array of shape (1, 768) holding the embedding of the input text
            (the leading batch dimension is kept), or None if the text yields no tokens.
    """

    # Tokenize the input text
    inputs = tokenizer(
        text,
        return_tensors='pt', # Specifies that the output should be in the form of PyTorch tensors.
        truncation=True, # Ensures the input text is truncated to the maximum length of 512 tokens if necessary.
        padding=True,  # Pads the input text to ensure uniform length.
        max_length=512 # Sets the maximum length for the input sequence.
      )
    
    # Check if the tokenization resulted in a valid tensor
    if inputs['input_ids'].shape[1] == 0:
        return None  # Return None for empty or invalid inputs
    
    # Pass the tokenized input through the GPT-2 model
    outputs = model(**inputs)
    # The outputs include the hidden states of the model at the last layer for each token in the input sequence.
    # Hidden states are intermediate representations that capture various levels of abstraction and semantic information about the input text.

    # Compute the mean of the last hidden state to represent the embedding
    embeddings = outputs.last_hidden_state.mean(dim=1)

    # Detach the tensor from the computational graph and convert it to a numpy array
    # to prevent further gradients from being calculated and for easier manipulation and storage.
    return embeddings.detach().numpy()
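
`generate_embeddings` averages over every token position. That is safe here because each call tokenizes a single text, so `padding=True` adds no padding tokens. If texts were batched, padded positions would need to be left out of the average. The sketch below shows that masked mean in plain Python on toy two-dimensional vectors; `mean_pool` is a hypothetical helper, not part of the notebook or of Transformers.

```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors, skipping positions whose mask value is 0.

    hidden_states: list of token vectors (one list of floats per token)
    attention_mask: list of 1 (real token) or 0 (padding), same length
    """
    dim = len(hidden_states[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            for j, value in enumerate(vector):
                totals[j] += value
            count += 1
    if count == 0:
        return None  # nothing but padding, so no embedding can be formed
    return [total / count for total in totals]


# Two real tokens and one padded position: the padded vector is ignored
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))
```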


In [33]:
# Generate embeddings for the cleaned content of the blog posts
df_blogs['embeddings'] = df_blogs['cleaned_content'].apply(generate_embeddings)

# Drop rows where embeddings are None
df_blogs = df_blogs[df_blogs['embeddings'].notnull()]
df_blogs.head()

|   | title | content | cleaned_content | embeddings |
|---|-------|---------|-----------------|------------|
| 0 | Dog beach in Denia | Home » Dog beach in Denia The coast of Dénia is one of the main attractions of the city. However... | home dog beach in denia the coast of dnia is one of the main attractions of the city however it... | [[0.06546249, 0.11949071, -0.93020034, 0.11505492, -0.114872605, 0.0494746, 0.07108152, -0.03792... |
| 1 | Dog Friendly Beaches in Denia | Playa Els Molins is a picturesque beach located in Dénia, Spain. This tranquil spot stretches ov... | playa els molins is a picturesque beach located in dnia spain this tranquil spot stretches over ... | [[0.16946755, 0.109530985, -0.77464455, 0.046797127, -0.033379883, 0.2561004, 0.11223618, -0.059... |
| 2 | Playa Els Molins | Playa Els Molins amenities include parking and loungers. Best restaurants near Playa Els Molins ... | playa els molins amenities include parking and loungers best restaurants near playa els molins i... | [[0.33472726, -0.09248481, -0.8055865, 0.081854455, -0.14519772, 0.19423556, 0.4100061, -0.10396... |
| 3 | Playa Escollera Norte Puerto de Denia | Playa Escollera Norte Puerto de Denia amenities include parking. Best restaurants near Playa Esc... | playa escollera norte puerto de denia amenities include parking best restaurants near playa esco... | [[0.3666223, -0.18818898, -0.6287987, 0.06308876, -0.23106927, 0.32673752, 0.8353995, -0.2362218... |
| 4 | Sorry, you have been blocked | This website is using a security service to protect itself from online attacks. The action you j... | this website is using a security service to protect itself from online attacks the action you ju... | [[0.15065706, -0.0007279099, -0.44581804, 0.22798474, -0.19323938, 0.17989622, 3.7230008, -0.024... |


## Step 5: Store Text Embeddings in Pinecone

In [34]:
# Prepare embeddings and metadata for indexing
embeddings = df_blogs['embeddings'].tolist()
print(embeddings)
metadata = [{'title': row['title'], 'content': row['content']} for _, row in df_blogs.iterrows()]

# Index the embeddings
index.upsert(vectors=[(str(i), embedding, metadata[i]) for i, embedding in enumerate(embeddings)])



PineconeApiTypeError: Invalid type for variable '0'. Required value type is float and passed type was list at ['values'][0]
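
The error says that the first element of a vector's `values` was itself a list. This follows from `generate_embeddings` returning an array of shape (1, 768): each entry is a list containing one list of floats, not a flat list of 768 floats. The sketch below shows one possible fix. The helpers `flatten_embedding` and `build_vectors` are hypothetical, and the upsert call stays commented out because it needs the live index.

```python
def flatten_embedding(embedding) -> list:
    """Turn a nested embedding such as [[0.1, 0.2, ...]] into a flat list of floats."""
    # numpy arrays (and torch tensors) expose tolist(); plain lists pass through
    if hasattr(embedding, "tolist"):
        embedding = embedding.tolist()
    flat = []
    for item in embedding:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_embedding(item))
        else:
            flat.append(float(item))
    return flat


def build_vectors(embeddings, metadata):
    """Build (id, values, metadata) tuples with flat value lists."""
    return [
        (str(i), flatten_embedding(embedding), metadata[i])
        for i, embedding in enumerate(embeddings)
    ]


# Hypothetical usage with the notebook's variables:
# index.upsert(vectors=build_vectors(embeddings, metadata))
```

An alternative is to drop the batch dimension inside `generate_embeddings` before converting to numpy, so that every stored embedding is already one-dimensional.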