## **An AI-Powered AWS Documentation Assistant**

### Overview
This Kaggle notebook provides an interactive assistant that fetches and summarizes official AWS documentation for any AWS service. 


### Use Case:
Many developers and cloud engineers face challenges navigating AWS documentation to troubleshoot or learn how to use a service. The documentation is vast, fragmented across many pages, and often requires specific search skills to get the right answer quickly.

### Solution:
This project leverages Generative AI + Web Search Integration to simplify AWS troubleshooting. Just type a natural language question like "How to create an EC2 instance?", and the assistant will:
  - Search AWS docs using SerpAPI + Google
  - Extract structured, step-by-step content
  - Format and present a concise, human-readable guide

*Example use cases*
  - Quick AWS Setup Guides (e.g., "How to create an S3 bucket")
  - Troubleshooting Help (e.g., "Fix Lambda timeout errors")
  - Learning AWS Services (e.g., "RDS backup best practices")

### Features
1. Prioritizes documentation (docs.aws.amazon.com) from AWS
2. Parses headings, ordered steps, and paragraphs
3. Removes AWS doc artifacts
4. Uses general web results if AWS docs are unavailable
5. Interactive CLI


### Notebook Structure
1. **Setup** : import dependencies and load the API key
2. **Search** : search_aws_docs() finds relevant AWS pages
3. **Extract** : extract_service_guide() scrapes headings/steps
4. **Format** : format_aws_summary() structures the output
5. **Interactive CLI** : aws_service_assistant() handles the query loop


### GenAI Capabilities
1. **Document Understanding**: extracts structured information from documentation pages.
2. **Structured Output/Controlled Generation**: formats the extracted content into a standardized template.
3. **Retrieval Augmented Generation (RAG)**: dynamically retrieves AWS docs so that responses are based on current pages.
4. **Function Calling**: the main assistant function calls the other functions based on user input.
5. **Grounding**: the outputs are grounded in official AWS docs.
6. **Long Context Window**: large HTML documents are processed, then trimmed to at most 1000 extracted entries per page and 3000 characters in the final summary to keep the output relevant.

#### 1. Setup and Configuration

- Import necessary libraries (requests for web requests, BeautifulSoup for HTML parsing).
- Securely access the SerpAPI key

In [None]:
import requests
from bs4 import BeautifulSoup
import re
from kaggle_secrets import UserSecretsClient

# Load the SerpAPI key from Kaggle Secrets
# (the key is stored under the secret name "GOOGLE_API_KEY")
user_secrets = UserSecretsClient()
serpapi_key = user_secrets.get_secret("GOOGLE_API_KEY")
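
The `kaggle_secrets` module is only importable inside a Kaggle session, so the cell above fails when the notebook is run elsewhere. A minimal sketch of a fallback, assuming the key may instead be supplied through an environment variable. The helper name `load_serpapi_key` and the variable name `SERPAPI_KEY` are hypothetical and not part of the original notebook:

```python
import os


def load_serpapi_key(secret_name="GOOGLE_API_KEY", env_var="SERPAPI_KEY"):
    """Return the API key from Kaggle Secrets, else from an environment variable."""
    try:
        # Only importable inside a Kaggle session
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(secret_name)
    except Exception:
        # Not on Kaggle, or the secret is missing: fall back to the environment
        key = os.environ.get(env_var)
        if not key:
            raise RuntimeError(
                f"Add the Kaggle secret {secret_name} or set the {env_var} environment variable"
            )
        return key


# Usage (commented out so the sketch does not fail when no key is configured):
# serpapi_key = load_serpapi_key()
```

Either way the key stays out of the notebook source, which is the point of using a secrets store in the first place.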

#### 2. Search the AWS Documentation

Uses SerpAPI to search Google for AWS docs and returns the top 3 relevant links.

Because SerpAPI queries Google at request time, the links reflect what is currently indexed on the official AWS sites.

In [None]:
def search_aws_docs(query):
    """Search for AWS documentation using SerpAPI"""
    params = {
        # Restrict results to AWS domains
        "q": f"{query} site:aws.amazon.com OR site:docs.aws.amazon.com",
        "api_key": serpapi_key,
        "engine": "google",
        # Get top 3 results
        "num": 3  
    }
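    try:
        response = requests.get("https://serpapi.com/search", params=params, timeout=10)
        # Surface HTTP errors (bad key, rate limit) instead of parsing an error body
        response.raise_for_status()
        results = response.json()

        # Keep only results that actually carry a link
        return [result["link"] for result in results.get("organic_results", []) if result.get("link")]

    except Exception as e:
        print(f"Search error: {e}")
        return []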
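
The features list says docs.aws.amazon.com is prioritized, while `search_aws_docs` returns links in the order Google ranks them. Here is a minimal sketch of one way to make that preference explicit, assuming a response shaped like SerpAPI's `organic_results` list. The helper name `prioritize_aws_links` and the sample payload are hypothetical:

```python
from urllib.parse import urlparse

# Hosts in order of preference; a lower index sorts first
PREFERRED_HOSTS = ["docs.aws.amazon.com", "aws.amazon.com"]


def prioritize_aws_links(results, limit=3):
    """Return up to `limit` links, official documentation hosts first."""
    links = [r["link"] for r in results.get("organic_results", []) if r.get("link")]

    def rank(link):
        host = urlparse(link).netloc.lower()
        # Hosts outside the preferred list sort after all preferred hosts
        if host in PREFERRED_HOSTS:
            return PREFERRED_HOSTS.index(host)
        return len(PREFERRED_HOSTS)

    # sorted() is stable, so the search ranking is kept within each host group
    return sorted(links, key=rank)[:limit]


# Hypothetical payload for illustration only
sample = {
    "organic_results": [
        {"link": "https://aws.amazon.com/example/getting-started/"},
        {"link": "https://docs.aws.amazon.com/example/guide.html"},
        {"title": "a result without a link"},
    ]
}
print(prioritize_aws_links(sample))
```

Sorting by host rather than filtering keeps the general aws.amazon.com pages available as a second choice when no documentation page is returned.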

#### 3. Extracting Content from Documents

This function extracts structured content (headings, ordered steps, and paragraphs) from an AWS documentation page.


In [None]:
def extract_service_guide(url):
    """Extract the main content from AWS documentation"""
    try:
        response = requests.get(url, timeout=15)
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Find main content area (#main-col-body or <article>)
        main_content = soup.find('div', id='main-col-body') or soup.find('article')
        if not main_content:
            return ""
            
        # Extract headings and their content
        guide = []
        current_heading = ""
        
        for element in main_content.find_all(['h1', 'h2', 'h3', 'ol', 'ul', 'p']):
            if element.name in ['h1', 'h2', 'h3']:
                current_heading = element.get_text().strip()
            elif element.name == 'ol':
                steps = [f"{i+1}. {li.get_text(' ', strip=True)}"
                         for i, li in enumerate(element.find_all('li'))]
                if steps:
                    # Emit the pending heading once, then clear it so it is not repeated
                    if current_heading:
                        guide.append(f"\n{current_heading}:")
                        current_heading = ""
                    guide.extend(steps)
            elif element.name == 'p':
                text = element.get_text(' ', strip=True)
                # Only include substantial paragraphs
                if len(text.split()) > 10:  
                    if current_heading:
                        guide.append(f"\n{current_heading}:")
                        current_heading = ""
                    guide.append(f"- {text}")
        
        # Keep at most the first 1000 extracted entries
        return '\n'.join(guide[:1000])
        
    except Exception as e:
        print(f"Extraction error: {e}")
        return ""
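
To see the heading-plus-numbered-steps grouping in isolation, the sketch below runs it on a small inline HTML sample. It is an illustration rather than the notebook's extractor. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies, it skips paragraphs and main-content detection, and the class name `StepExtractor` and the sample markup are hypothetical:

```python
from html.parser import HTMLParser


class StepExtractor(HTMLParser):
    """Collect each heading followed by the numbered items of the <ol> under it."""

    HEADINGS = ("h1", "h2", "h3")

    def __init__(self):
        super().__init__()
        self.lines = []      # formatted output lines
        self.heading = ""    # most recent heading not yet emitted
        self.capture = None  # "h" or "li" while collecting text, else None
        self.buffer = []     # text fragments of the element being collected
        self.in_ol = False
        self.step = 0        # running step number inside the current <ol>

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.capture, self.buffer = "h", []
        elif tag == "ol":
            self.in_ol, self.step = True, 0
        elif tag == "li" and self.in_ol:
            self.capture, self.buffer = "li", []

    def handle_data(self, data):
        if self.capture:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        # Join fragments and collapse runs of whitespace
        text = " ".join(" ".join(self.buffer).split())
        if tag in self.HEADINGS and self.capture == "h":
            self.heading, self.capture = text, None
        elif tag == "li" and self.capture == "li":
            if self.heading:
                # Emit the heading once, just before its first step
                self.lines.append(f"\n{self.heading}:")
                self.heading = ""
            self.step += 1
            self.lines.append(f"{self.step}. {text}")
            self.capture = None
        elif tag == "ol":
            self.in_ol = False


# Hypothetical markup loosely shaped like a documentation page
html = """
<div id="main-col-body">
  <h2>Create a bucket</h2>
  <ol>
    <li>Open the <b>Amazon S3</b> console.</li>
    <li>Choose Create bucket.</li>
  </ol>
</div>
"""
parser = StepExtractor()
parser.feed(html)
print("\n".join(parser.lines))
```

The heading is emitted only when the first step under it is found, so headings without any steps never appear in the output.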

#### 4. Formatting the Output

This function cleans up encoding artifacts and structures the extracted content into a user-friendly guide.

In [None]:
def format_aws_summary(query, content):
    """Format the extracted content into a structured guide"""
    if not content:
        return f"Couldn't find specific documentation for '{query}'. Please visit AWS documentation directly."
    
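    # Clean up common encoding artifacts (UTF-8 text mis-decoded as Windows-1252).
    # The dash keys assume the usual cp1252 mojibake for en and em dashes.
    replacements = {
        '\u00e2\u20ac\u201c': '-',   # garbled en dash
        '\u00e2\u20ac\u201d': '--',  # garbled em dash
        '\u00c2': '',                # stray character left by a garbled non-breaking space
        '\xa0': ' '                  # non-breaking space
    }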
    for old, new in replacements.items():
        content = content.replace(old, new)
    
    # Structure the output
    service_name = query.replace('AWS', '').replace('Amazon', '').strip()
    divider = "=" * 50
    # Build the guide line by line so no source indentation leaks into the output
    return (
        f"\n\nAWS {service_name.upper()} GUIDE\n{divider}\n"
        f"{content[:3000]}\n{divider}\n"
        "# Check the links provided above for more details"
    )
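
The replacement table in `format_aws_summary` patches individual garbled sequences one by one. When the garbling comes from UTF-8 bytes being decoded as Windows-1252, a more general repair is to reverse that round trip. The sketch below illustrates this alternative and is not part of the original notebook. The helper name `repair_mojibake` is hypothetical:

```python
def repair_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252."""
    try:
        # Re-encode to the bytes that were misread, then decode them correctly
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # The text was not garbled this way (or only partly); leave it untouched
        return text


garbled = "S3 \u00e2\u20ac\u201c bucket"  # an en dash read with the wrong codec
print(repair_mojibake(garbled))
print(repair_mojibake("plain ASCII stays the same"))
```

Correctly decoded text that contains accented characters fails the UTF-8 decode step and is returned unchanged, so the helper is safe to apply to text that may or may not be garbled.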

#### 5. Main Assistant Function

This function ties everything together.

In [None]:
def aws_service_assistant():
    """Main assistant function for AWS services"""
    print("\nAWS Service Assistant (type 'quit' to exit)")
    print("Examples: 'How to setup S3 bucket', 'EC2 connection troubleshooting'")
    
    while True:
        query = input("\nWhat AWS service do you need help with? ").strip()
        if query.lower() in ['quit', 'exit']:
            break
            
        print(f"\nSearching AWS documentation for: {query}")
        aws_links = search_aws_docs(query)
        
        if not aws_links:
            print("No AWS documentation found. Trying general web search...")
            params = {
                "q": f"{query} AWS service",
                "api_key": serpapi_key,
                "engine": "google",
                "num": 2
            }
            try:
                response = requests.get("https://serpapi.com/search", params=params, timeout=10)
                aws_links = [result.get("link") for result in response.json().get("organic_results", [])]
            except Exception:
                # Treat any network or parsing failure as "no results"
                aws_links = []
        
        if not aws_links:
            print("Couldn't find relevant resources. Please try another query.")
            continue
            
        print(f"Found {len(aws_links)} relevant resources. Processing...")
        
        all_content = []
        for url in aws_links[:2]:  # Process max 2 links
            print(f"Extracting content from: {url}")
            content = extract_service_guide(url)
            if content:
                all_content.append(content)
        
        if not all_content:
            print("Couldn't extract useful content. Here's a general summary:")
            print(f"Learn about {query} at: https://aws.amazon.com/getting-started/")
            continue
            
        summary = format_aws_summary(query, '\n\n'.join(all_content))
        print(summary)
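
Because `aws_service_assistant` mixes `input()`/`print()` with the pipeline itself, it is awkward to exercise without a terminal and a network connection. A minimal sketch of the same search, extract, format flow as a single non-interactive function, with the three steps passed in as callables. The name `answer_query` and the stub functions are hypothetical. In the notebook, the callables would be `search_aws_docs`, `extract_service_guide`, and `format_aws_summary`:

```python
def answer_query(query, search, extract, format_summary, max_links=2):
    """Run search -> extract -> format once and return the text to display."""
    links = search(query)
    if not links:
        return "Couldn't find relevant resources. Please try another query."

    # Keep only the pages that yielded some content
    pages = [text for text in (extract(url) for url in links[:max_links]) if text]
    if not pages:
        return f"Learn about {query} at: https://aws.amazon.com/getting-started/"

    return format_summary(query, "\n\n".join(pages))


# Stubs standing in for the real functions so the example runs offline
def fake_search(query):
    return ["https://docs.aws.amazon.com/example-a", "https://docs.aws.amazon.com/example-b"]


def fake_extract(url):
    # Pretend the second page has no usable content
    return "" if url.endswith("-b") else "\nCreate a bucket:\n1. Open the console."


def fake_format(query, content):
    return f"GUIDE FOR {query.upper()}\n{content}"


print(answer_query("s3 bucket", fake_search, fake_extract, fake_format))
```

Returning the text instead of printing it leaves the interactive loop responsible only for reading input and displaying the result.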

In [None]:
# Start the assistant
if __name__ == "__main__":
    aws_service_assistant()