## **An AI-Powered Google Cloud Documentation Assistant**

### Overview
This Kaggle notebook presents an AI-powered assistant that helps developers and cloud engineers quickly find and understand official Google Cloud documentation. Given any Google Cloud service query, the assistant intelligently retrieves relevant pages, extracts structured content, and summarizes it in a clear, human-readable format.


### Use Case:
Google Cloud documentation is rich but often complex, scattered across multiple pages, and time-consuming to navigate, especially when troubleshooting.

### Solution:
This assistant removes that friction by combining Generative AI with real-time web search to deliver precise, structured guidance. Type a natural-language question such as "How to create a compute engine instance?" and receive:
  - A step-by-step summary of official documentation
  - Key IAM permissions, CLI/API commands, and console instructions
  - Clean, deduplicated links to additional resources

*Example Queries*
  - Quick Google Cloud setup guides (e.g., "Cloud Storage bucket creation")
  - Troubleshooting help (e.g., "RDBMS timeout errors")
  - Learning new services (e.g., "IAM role best practices")

### Features
1. Prioritizes documentation (cloud.google.com/docs)
2. Extracts steps, code snippets, CLI commands, IAM roles, and console instructions
3. Removes noise and irrelevant page elements
4. Falls back to generic guides if detailed parsing fails
5. Interactive command-line interface


### Notebook Structure
1. Import dependencies and load the API key
2. **Search**: `search_cloud_docs()` locates relevant Google Cloud documentation
3. **Extract**: `extract_cloud_content()` parses HTML to extract structured sections
4. **Format**: `format_cloud_summary()` creates a summarized guide
5. **Interactive CLI**: `cloud_service_assistant()` handles user queries


### GenAI Capabilities
1. **Document Understanding**: Extracts meaningful sections, commands, and steps from HTML pages.
2. **Structured Output**: Summarizes documentation in standardized sections (e.g., IAM, CLI, API, Console).
3. **Retrieval-Augmented Generation (RAG)**: Dynamically fetches content grounded in real-time Google Cloud documentation.
4. **Function Calling**: Modular assistant calls different functions based on user input.
5. **Grounding**: Limits search to official Google Cloud documentation (cloud.google.com).
6. **Long-Context Handling**: Parses full pages but truncates responses for readability (2000-character summaries)

#### 1. Setup and Configuration

- Import the necessary libraries (`requests` for HTTP requests, BeautifulSoup for HTML parsing).
- Securely load the SerpAPI key from Kaggle secrets.

In [24]:
import requests
from bs4 import BeautifulSoup
from kaggle_secrets import UserSecretsClient

# Load the SerpAPI key from Kaggle secrets (stored here under the label "GOOGLE_API_KEY")
user_secrets = UserSecretsClient()
serpapi_key = user_secrets.get_secret("GOOGLE_API_KEY")
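
The cell above only works inside a Kaggle kernel, because `kaggle_secrets` is not available elsewhere. A minimal sketch of a loader that also works locally is shown below. The helper name `load_serpapi_key` and the `SERPAPI_KEY` environment variable are assumptions for illustration and are not part of the original notebook.

```python
import os


def load_serpapi_key(secret_label="GOOGLE_API_KEY", env_var="SERPAPI_KEY"):
    """Return the SerpAPI key from Kaggle secrets, else from an environment variable."""
    try:
        # kaggle_secrets is only importable inside a Kaggle kernel
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(secret_label)
    except Exception:
        # Outside Kaggle (or if the secret is not attached), fall back to the
        # hypothetical environment variable named by env_var
        key = os.environ.get(env_var)
        if not key:
            raise RuntimeError(f"No API key found; set the {env_var} environment variable")
        return key
```

With this in place, `serpapi_key = load_serpapi_key()` could replace the direct `UserSecretsClient` call.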

#### 2. Search the Google Cloud Documentation

Uses SerpAPI to search Google and return the top 3 matching documentation links.

Because each query goes through SerpAPI's live Google search, results reflect what is currently indexed. The `site:cloud.google.com` operator restricts results to Google Cloud's official domain, which keeps the assistant grounded in trusted sources.

In [6]:
def search_cloud_docs(query):
    """Search for Google Cloud documentation using SerpAPI"""
    params = {
        # Restrict to Google Cloud domains here (Grounding)
        "q": f"{query} site:cloud.google.com",
        "api_key": serpapi_key,
        "engine": "google",
        # Get top 3 results
        "num": 3  
    }
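    try:
        response = requests.get("https://serpapi.com/search", params=params, timeout=10)
        # Surface HTTP errors (e.g., an invalid API key) instead of parsing an error body
        response.raise_for_status()
        results = response.json()

        # Keep only results that actually carry a link
        return [r["link"] for r in results.get("organic_results", []) if r.get("link")]

    except Exception as e:
        print(f"Search error: {e}")
        return []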
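
The `site:` operator limits results to the domain but does not rank `/docs` pages ahead of marketing or product pages. One possible post-processing step is sketched below. The helper `prioritize_docs_urls` is hypothetical and not part of the notebook; it uses only the standard library.

```python
from urllib.parse import urlsplit


def prioritize_docs_urls(links, host="cloud.google.com"):
    """Keep links on the given host and list those whose path contains /docs first."""
    on_host = [u for u in links if u and urlsplit(u).netloc == host]
    # sorted() is stable, so the original search ranking is preserved within each group
    return sorted(on_host, key=lambda u: "/docs" not in urlsplit(u).path)
```

It could be applied to the return value of `search_cloud_docs()` before the pages are fetched.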

#### 3. Extracting Content from Documents

This step uses BeautifulSoup to parse the HTML of Google Cloud documentation pages and extract meaningful content such as headers, code snippets, and step-by-step instructions.


In [22]:
def extract_cloud_content(url):
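    """
    Fetch a documentation page and return (content, links).

    Recognizes and formats:
    - CLI commands
    - IAM permissions
    - Console steps
    - API snippets

    Returns (None, None) if fetching or parsing raises an error.
    """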
    try:
        response = requests.get(url, timeout=15)
        soup = BeautifulSoup(response.text, 'html.parser')

        # 1. Remove noise 
        for element in soup.select('''
            nav, header, footer, script, style, 
            .nocontent, .hidden, .devsite-page-nav,
            [aria-hidden=true], [role=navigation],
            devsite-header, devsite-footer, devsite-masthead,
            devsite-book-nav, devsite-breadcrumb, devsite-toc
        '''):
            element.decompose()


        # 2. Find main content 
        article = (
            soup.find('article') or 
            soup.find('main') or
            soup.select_one('.devsite-article, .article') or
            soup.body  # Fallback
        )

        if not article:
            return None, None

        # 3. Unified extraction logic
        sections = []
        links = set()

        # Detect and format common patterns
        for header in article.find_all(['h1', 'h2', 'h3']):
            header_text = header.get_text(' ', strip=True)
            
            # Skip empty/unwanted headers
            if not header_text or header_text in ['Feedback', 'Related links']:
                continue

            section_content = []

            # CASE 1: IAM Permissions Section
            if 'role' in header_text.lower() or 'permission' in header_text.lower():
                section_content.append(f"## {header_text}")
                perm_list = header.find_next('ul')  # may be None if no list follows
                for li in (perm_list.find_all('li') if perm_list else []):
                    perm_text = li.get_text(' ', strip=True)
                    if 'roles/' in perm_text or '.permissions.' in perm_text:
                        section_content.append(f"  - {perm_text.split('(')[0].strip()}")

            # CASE 2: Console/CLI/API Tabs
            elif any(tab in header_text.lower() for tab in ['console', 'command line', 'api']):
                tab_type = 'Console' if 'console' in header_text.lower() else \
                          'CLI' if 'command' in header_text.lower() else \
                          'API'
                section_content.append(f"## Using {tab_type}")

                # Extract steps/code blocks
                next_elem = header.next_sibling
                while next_elem and next_elem.name not in ['h1', 'h2', 'h3']:
                    if next_elem.name == 'pre':
                        section_content.append(f"```\n{next_elem.get_text()}\n```")
                    elif next_elem.name == 'p':
                        text = next_elem.get_text(' ', strip=True)
                        if text and not text.startswith('Note:'):
                            section_content.append(f"• {text}")
                    next_elem = next_elem.next_sibling

            # CASE 3: Generic content
            else:
                section_content.append(f"## {header_text}")
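                # Note: find_next_siblings() does not stop at the next header, so
                # paragraphs and lists from later sections are collected here too
                for p in header.find_next_siblings(['p', 'ul']):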
                    if p.name == 'p':
                        section_content.append(f"• {p.get_text(' ', strip=True)}")
                    elif p.name == 'ul':
                        for li in p.find_all('li'):
                            section_content.append(f"  - {li.get_text(' ', strip=True)}")

            if len(section_content) > 1:  # Skip empty sections
                sections.extend(section_content)

        # 4. Extract relevant links
        for link in article.find_all('a', href=True):
            href = link['href']
            if 'cloud.google.com' in href and '/docs' in href:
                clean_url = href.split('#')[0].split('?')[0]
                links.add(clean_url)

        # 5. Fallback if no structured content found
        if not sections:
            title = soup.find('title')
            fallback_content = [
                f"## {title.get_text() if title else 'Service Overview'}",
                "• For detailed steps, visit the official documentation:",
                f"• {url}"
            ]
            return '\n'.join(fallback_content), links

        return '\n'.join(sections), links

    except Exception as e:
        print(f"Extraction error: {e}")
        return None, None
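
The link-extraction step only matches absolute URLs that contain `cloud.google.com`, so relative links such as `/storage/docs/locations` are skipped. A minimal sketch of a more tolerant normalizer follows, using only `urllib.parse` from the standard library. The helper name `normalize_doc_link` is an assumption for illustration.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize_doc_link(page_url, href):
    """Resolve href against page_url and drop its query string and fragment.

    Returns None when the result is not a cloud.google.com URL with /docs in its path.
    """
    absolute = urljoin(page_url, href)
    parts = urlsplit(absolute)
    if parts.netloc != "cloud.google.com" or "/docs" not in parts.path:
        return None
    # Rebuild the URL with empty query and fragment components
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Inside `extract_cloud_content()`, each `href` could be passed through this helper together with the page `url`, adding only non-`None` results to `links`.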

#### 4. Formatting the Output

This function takes the raw extracted content and formats it into a cleaner, summarized output with markdown-style readability.

In [13]:
def format_cloud_summary(query, content_with_links):
    """Format extracted content as a plain-text guide, truncated to 2000 characters."""
    content, links = content_with_links
    
    # Standardize bucket terminology
    if 'bucket' in query.lower():
        title = "CREATE BUCKET GUIDE"
        default_link = "https://cloud.google.com/storage/docs/creating-buckets"
    else:
        title = f"{query.upper()} GUIDE"
        default_link = f"https://cloud.google.com/{query.replace(' ', '-')}"
    
    return f"""
GOOGLE CLOUD {title}
{"=" * 60}
{content[:2000]}
{"=" * 60}
Full documentation: {links.pop() if links else default_link}
"""
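
Slicing with `content[:2000]` can cut the summary in the middle of a word or sentence. As a hedged alternative, the sketch below cuts at the last complete line within the limit and appends a marker. The function `truncate_at_line` and the `[truncated]` marker are illustrative choices, not part of the original notebook.

```python
def truncate_at_line(text, limit=2000):
    """Cut text to at most `limit` characters at a line boundary and append a marker."""
    if len(text) <= limit:
        return text
    clipped = text[:limit]
    last_newline = clipped.rfind("\n")
    # If there is no newline within the limit, fall back to a hard cut
    if last_newline > 0:
        clipped = clipped[:last_newline]
    return clipped + "\n[truncated]"
```

In `format_cloud_summary()`, `{content[:2000]}` could then become `{truncate_at_line(content)}`.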

#### 5. Main Assistant Function

This is the main assistant function that combines search, extraction, and formatting. Enter any Google Cloud query, and the assistant will fetch and summarize official documentation.

In [17]:
def cloud_service_assistant():
    """Generic Google Cloud service assistant"""
    print("\nGoogle Cloud Assistant (type 'quit' to exit)")
    print("Examples: 'compute engine', 'cloud storage', 'database services'")
    
    while True:
        query = input("\nWhat Google Cloud service do you need help with? ").strip().lower()
        if query in ['quit', 'exit']:
            break
        if not query:
            continue
            
        print(f"\nSearching Google Cloud docs for: {query}")
        cloud_links = search_cloud_docs(query)
        
        if not cloud_links:
            print(f"No documentation found. Try visiting: https://cloud.google.com/{query.replace(' ', '-')}")
            continue
               
        all_content = []
        all_links = set()
        
        for url in cloud_links[:2]:  # Process top 2 results
            print(f"Reviewing: {url}")
            content, links = extract_cloud_content(url)
            if content:
                all_content.append(content)
                if links:
                    all_links.update(links)
        
        if not all_content:
            print("Using generic service information.")
            default_content = f"""
## Google Cloud {query.title()} Service
This is a managed Google Cloud service providing cloud-based solutions.
For specific documentation, please visit: https://cloud.google.com/{query.replace(' ', '-')}
"""
            print(default_content)
        else:
            summary = format_cloud_summary(query, ("\n\n".join(all_content), all_links))
            print(summary)
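
The interactive loop mixes user input with the search, extract, and format steps, which makes it hard to test or reuse. The sketch below shows the same pipeline as a single non-interactive call. The name `answer_query` and its parameters are assumptions for illustration; the three steps are passed in as arguments so they can be swapped for stubs.

```python
def answer_query(query, search, extract, fmt, max_pages=2):
    """Run search -> extract -> format once and return the summary, or None."""
    contents, links = [], set()
    for url in search(query)[:max_pages]:
        content, page_links = extract(url)
        if content:
            contents.append(content)
            # extract may return None for links, so guard before updating
            links.update(page_links or ())
    if not contents:
        return None
    return fmt(query, ("\n\n".join(contents), links))
```

With the notebook's own functions, a call would look like `answer_query("cloud storage", search_cloud_docs, extract_cloud_content, format_cloud_summary)`.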

In [23]:
# Start the assistant
if __name__ == "__main__":
    cloud_service_assistant()


Google Cloud Assistant (type 'quit' to exit)
Examples: 'compute engine', 'cloud storage', 'database services'



What Google Cloud service do you need help with?  log based metric



Searching Google Cloud docs for: log based metric
Reviewing: https://cloud.google.com/logging/docs/logs-based-metrics
Reviewing: https://cloud.google.com/stackdriver/docs/solutions/slo-monitoring/sli-metrics/logs-based-metrics

GOOGLE CLOUD LOG BASED METRIC GUIDE
## Sources of log-based metrics
• You can use the metrics defined by Cloud Logging to collect general usage
information, and you can define your own log-based metric to capture information
specific to your application or business.
• Log-based metrics can apply within a single Google Cloud project or within a log
bucket. You can't create
log-based metrics for other Google Cloud resources such as
Cloud Billing accounts or organizations.
• For information about the differences between project-based log-based metrics
and bucket-based log-based metrics, see Bucket-scoped log-based metrics .
• Logging provides a set of metrics for usage values such as the
number of log entries stored in log buckets in your project, or
the number of


What Google Cloud service do you need help with?  troubleshoot Network loadbalancer



Searching Google Cloud docs for: troubleshoot network loadbalancer
Reviewing: https://cloud.google.com/load-balancing/docs/internal/troubleshooting-ilb
Reviewing: https://cloud.google.com/load-balancing/docs/network/troubleshooting-networklb

GOOGLE CLOUD TROUBLESHOOT NETWORK LOADBALANCER GUIDE
## Troubleshoot common issues with Network Analyzer
• Network Analyzer automatically monitors your VPC network configuration and detects
both suboptimal configurations and misconfigurations. It identifies network
failures, provides root cause information, and suggests possible resolutions. To
learn about the different misconfiguration scenarios that are automatically
detected by Network Analyzer, see Load balancer insights in the Network Analyzer documentation. Network Analyzer is available in the Google Cloud console as a part of
Network Intelligence Center. Go to Network Analyzer
• When creating a load balancer, you might see the error:
• This happens when you try to use the same backend in t


What Google Cloud service do you need help with?  quit
