# 🎓 Week 1 Mini-Project: Learning Assistant

Welcome to your first personal research tool! This assistant helps you study a topic by indexing several kinds of sources (PDFs, web pages, YouTube transcripts, and GitHub repos) and answering your questions from that material, with citations for each answer.

### 1. Setup & Imports
The notebook lives in `projects/week1_learning_assistant/`, so the project root that contains our `src` folder is **two levels up**. We add that root to Python's import path.

In [1]:
import sys
import os

# Add project root to path (two levels up from projects/week1_learning_assistant/)
sys.path.append(os.path.abspath(os.path.join('..', '..')))

from src.utils.config import Config
from src.document_loader import DocumentLoader, Document
from src.vector_store import VectorStore
from groq import Groq
from IPython.display import display, Markdown

# Initialize components
vector_store = VectorStore(in_memory=True)
groq_client = Groq(api_key=Config.GROQ_API_KEY)
loader = DocumentLoader()

print("✅ Assistant initialized and ready!")

LLM_PROVIDER: groq
EMBEDDING_PROVIDER: local
EMBEDDING_MODEL: multi-qa-distilbert-cos-v1
VECTOR_SIZE: 768
USE_INTELLIGENT_CHUNKING: False
Initializing collection: research_documents
[OK] Created collection: research_documents
✅ Assistant initialized and ready!
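
One detail worth knowing about the path setup: `'..', '..'` is resolved against the current working directory, not against the notebook file, so the import only works when the kernel is started from the notebook's own folder. Here is a minimal sketch of that resolution. The helper name `two_levels_up` and the example directory are made up for illustration.

```python
import os

def two_levels_up(cwd: str) -> str:
    """Return the directory two levels above `cwd` (hypothetical helper)."""
    return os.path.normpath(os.path.join(cwd, "..", ".."))

# Illustrative location only; substitute wherever your checkout lives.
example_cwd = os.path.join(os.sep, "home", "me", "repo", "projects", "week1_learning_assistant")
print(two_levels_up(example_cwd))
```

If the imports fail with `ModuleNotFoundError`, printing `os.getcwd()` is a quick way to check where the kernel actually started.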


### 2. The Ingestion Engine
This function handles loading and indexing all your research materials.

In [4]:
def ingest_learning_material(topic: str, sources: dict):
    """
    Ingests research materials for a specific topic.
    Sources dict: {"pdfs": [], "webs": [], "youtubes": [], "githubs": []}
    """
    print(f"🚀 Starting Ingestion for Topic: {topic.upper()}")
    
    # Clear previous topic data
    vector_store.clear()
    
    all_docs = []
    project_root = Config.PROJECT_ROOT

    # Load GitHub repos (accepts full URLs or owner/repo[/branch] shorthand)
    for repo_info in sources.get("githubs", []):
        # Clean the input: Remove 'https://github.com/' if it's there
        clean_info = repo_info.replace("https://github.com/", "").strip("/")
        print("clean_info", clean_info)
        parts = clean_info.split('/')
        print("parts", parts)
        
        if len(parts) < 2:
            print(f"   ⚠️ Skipping invalid GitHub info: {repo_info}")
            continue
            
        owner = parts[0]
        name = parts[1]
        # Use provided branch, or default to 'main'
        branch = parts[2] if len(parts) > 2 else "main" 
        
        print(f"   📂 Downloading Github Repo: {owner}/{name} (Branch: {branch})")
        repo_docs = loader.load_github_repo(owner, name, branch)
        for d in repo_docs:
            d.metadata["topic"] = topic
        all_docs.extend(repo_docs)
    
    # Load PDFs
    for path in sources.get("pdfs", []):
        # Handle relative paths from project root
        full_path = os.path.join(project_root, path) if not os.path.isabs(path) else path
        if os.path.exists(full_path):
            print(f"   📄 Loading PDF: {os.path.basename(full_path)}")
            pdf_pages = loader.load_pdf(full_path)
            for p in pdf_pages:
                p.metadata["topic"] = topic
            all_docs.extend(pdf_pages)
        else:
            # Report missing files instead of skipping them silently
            print(f"   ⚠️ PDF not found, skipping: {full_path}")
            
    # Load Web Pages
    for url in sources.get("webs", []):
        print(f"   🌐 Loading Web: {url}")
        web_doc = loader.load_web_page(url)
        web_doc.metadata["topic"] = topic
        all_docs.append(web_doc)
        
    # Load YouTube Transcripts
    for url in sources.get("youtubes", []):
        print(f"   🎥 Loading YouTube: {url}")
        yt_doc = loader.load_youtube_transcript(url)
        yt_doc.metadata["topic"] = topic
        all_docs.append(yt_doc)
        
    if all_docs:
        count = vector_store.add_documents(all_docs)
        print(f"\n✅ Knowledge Base Updated! {count} research chunks ready for '{topic}'.")
    else:
        print("⚠️ No materials found to index.")
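
The GitHub parsing inside the function is easier to reason about when pulled out on its own. The sketch below mirrors that inline logic in a hypothetical helper, `parse_github_ref`. The name and the `default_branch` parameter are assumptions for illustration, not part of `DocumentLoader`.

```python
from typing import Optional, Tuple

def parse_github_ref(repo_info: str, default_branch: str = "main") -> Optional[Tuple[str, str, str]]:
    """Split 'owner/repo[/branch]' or a github.com URL into (owner, repo, branch).

    Hypothetical helper that mirrors the inline parsing in the ingestion cell.
    Returns None when the input does not contain at least owner and repo.
    """
    # Drop the URL prefix and any leading/trailing slashes
    clean = repo_info.replace("https://github.com/", "").strip("/")
    parts = clean.split("/")
    if len(parts) < 2:
        return None
    owner, name = parts[0], parts[1]
    # The third segment, if present, is taken as the branch
    branch = parts[2] if len(parts) > 2 else default_branch
    return owner, name, branch

print(parse_github_ref("docker/cli/master"))
print(parse_github_ref("https://github.com/docker/cli/"))
print(parse_github_ref("docker"))
```

Two limitations carry over from the inline version: a branch name containing a slash keeps only its first segment, and a browser URL of the form `.../tree/<branch>` would yield `tree` as the branch.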

### 3. Topic Definition
Choose a topic and provide your sources here.

In [5]:
MY_TOPIC = "Docker Fundamentals"

SOURCES = {
    "pdfs": ["data/docker_cheatsheet.pdf"],
    "webs": ["https://docs.docker.com/get-started/overview/"],
    "youtubes": ["https://www.youtube.com/watch?v=fqMOX6JJhGo"],
    "githubs": ["docker/cli/master"], # Example: Indexing the official Docker CLI repo docs
}

ingest_learning_material(MY_TOPIC, SOURCES)

🚀 Starting Ingestion for Topic: DOCKER FUNDAMENTALS
Initializing collection: research_documents
[OK] Created collection: research_documents
[OK] Cleared collection: research_documents
clean_info docker/cli/master
parts ['docker', 'cli', 'master']
   📂 Downloading Github Repo: docker/cli (Branch: master)
   ✅ Loaded 437 markdown files from GitHub: docker/cli
   📄 Loading PDF: docker_cheatsheet.pdf
   🌐 Loading Web: https://docs.docker.com/get-started/overview/
   🎥 Loading YouTube: https://www.youtube.com/watch?v=fqMOX6JJhGo


[OK] Added 4135 chunks from 440 documents

✅ Knowledge Base Updated! 4135 research chunks ready for 'Docker Fundamentals'.
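
The next cell retrieves chunks with `vector_store.search(question, top_k=3)`. The internals of `VectorStore` are not shown in this notebook, so what follows is only a general illustration of top-k retrieval. It assumes chunks are ranked by cosine similarity between embedding vectors, which the `-cos-` embedding model name suggests but does not confirm. The functions `cosine_similarity` and `top_k_chunks` and the toy 2-D vectors are made up for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec: Sequence[float],
                 chunk_vecs: Dict[str, Sequence[float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Score every chunk against the query and return the k best (id, score) pairs."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec))
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-D "embeddings"; real ones here would have 768 dimensions.
toy_chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print([chunk_id for chunk_id, _ in top_k_chunks([1.0, 0.0], toy_chunks, k=2)])  # ['a', 'c']
```

A real vector store would normally use an approximate index instead of scoring every chunk, but the ranking idea is the same.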


In [6]:
def query_assistant(question: str):
    """
    Searches knowledge base and generates answer with clear citations.
    """
    # 1. Search for relevant information
    results = vector_store.search(question, top_k=3)
    
    if not results:
        display(Markdown("⚠️ *I couldn't find any information about that in your materials.*"))
        return

    # 2. Process Context and unique Sources
    context_parts = []
    sources = []
    
    for res in results:
        context_parts.append(res['text'])
        
        # Build a citation string
        meta = res['metadata']
        source_type = meta.get('source_type', 'unknown').upper()
        # Find the name from path or url
        name = os.path.basename(meta.get('source_path', '')) or meta.get('source_url', 'Web/YouTube')
        page = f" (Page {meta['page_number']})" if 'page_number' in meta else ""
        citation = f"{source_type}: {name}{page}"
        
        if citation not in sources:
            sources.append(citation)

    context_text = "\n\n".join(context_parts)
    
    # 3. Ask the AI (Groq)
    messages = [
        {
            "role": "system", 
            "content": "You are a professional Learning Assistant. Answer questions accurately using ONLY the provided context. Use bullet points for readability if appropriate."
        },
        {"role": "user", "content": f"Context:\n{context_text}\n\nQuestion: {question}"}
    ]
    
    # Standard non-streaming call for simplicity and reliability
    response = groq_client.chat.completions.create(
        model=Config.GROQ_MODEL,
        messages=messages
    )
    
    # 4. Display the beautiful output
    answer = response.choices[0].message.content
    display(Markdown(f"### 🤖 Assistant Answer\n{answer}"))
    
    print("\n" + "="*40)
    print("📚 SOURCES USED FOR THIS ANSWER:")
    for i, source in enumerate(sources, 1):
        print(f"{i}. {source}")
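
The citation logic in step 2 can be tried without a vector store or an API key. This minimal sketch pulls it into two hypothetical helpers, `build_citation` and `unique_citations`. It uses the same metadata keys as the function above, but the sample results are hand-written for illustration and are not real search output.

```python
import os
from typing import Dict, List

def build_citation(meta: Dict) -> str:
    """Format one result's metadata as 'TYPE: name (Page N)'."""
    source_type = meta.get("source_type", "unknown").upper()
    # Prefer the file name; fall back to the URL, then to a generic label
    name = os.path.basename(meta.get("source_path", "")) or meta.get("source_url", "Web/YouTube")
    page = f" (Page {meta['page_number']})" if "page_number" in meta else ""
    return f"{source_type}: {name}{page}"

def unique_citations(results: List[Dict]) -> List[str]:
    """Return citations in first-seen order with duplicates removed."""
    return list(dict.fromkeys(build_citation(res["metadata"]) for res in results))

# Hand-written sample results (illustrative only)
sample_results = [
    {"metadata": {"source_type": "pdf", "source_path": "data/docker_cheatsheet.pdf", "page_number": 2}},
    {"metadata": {"source_type": "pdf", "source_path": "data/docker_cheatsheet.pdf", "page_number": 2}},
    {"metadata": {"source_type": "web", "source_url": "https://docs.docker.com/get-started/overview/"}},
]
for i, citation in enumerate(unique_citations(sample_results), 1):
    print(f"{i}. {citation}")
```

`dict.fromkeys` keeps insertion order, so it deduplicates the same way as the `if citation not in sources` check while avoiding a list scan on every result.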

In [8]:
print(f"--- Welcome to the {MY_TOPIC} Classroom ---")
print("(Type 'exit' or 'quit' to finish your session)\n")

while True:
    user_query = input("What would you like to learn about? ")
    
    if user_query.strip().lower() in ['exit', 'quit']:
        print("\n👋 Happy studying! Closing the session.")
        break
    
    if not user_query.strip():
        continue
        
    query_assistant(user_query)
    print("\n" + "-"*60 + "\n")

--- Welcome to the Docker Fundamentals Classroom ---
(Type 'exit' or 'quit' to finish your session)



### 🤖 Assistant Answer
Here are the benefits of Docker:
* Fast and consistent delivery of applications
* Streamlines the development lifecycle
* Allows developers to work in standardized environments
* Enables continuous integration and continuous delivery (CI/CD) workflows
* Provides a loosely isolated environment for applications, ensuring security and stability
* Enables running many containers simultaneously on a given host
* Containers are lightweight and contain everything needed to run the application
* No need to rely on what's installed on the host
* Enables sharing of containers while working, ensuring everyone gets the same container that works in the same way
* Reduces the delay between writing code and running it in production
* Allows managing infrastructure in the same ways as managing applications.


📚 SOURCES USED FOR THIS ANSWER:
1. WEB: https://docs.docker.com/get-started/overview/

------------------------------------------------------------



### 🤖 Assistant Answer
To spin up a Docker container, you can use the following command:

```console
$ docker run -it <image_name>
```

 Replace `<image_name>` with the name of the image you want to use. For example:

```console
$ docker run -it alpine
```

or 

```console
$ docker run -it debian
```

This will create and start a new container from the specified image, with an interactive shell and a pseudo-TTY attached. 

Alternatively, you can also specify a name for the container and create it before starting it:

```console
$ docker container create -i -t --name mycontainer <image_name>
$ docker container start --attach -i mycontainer
```


📚 SOURCES USED FOR THIS ANSWER:
1. GITHUB: https://github.com/docker/cli/blob/master/docs/reference/commandline/container_run.md
2. GITHUB: https://github.com/docker/cli/blob/master/docs/reference/commandline/container_create.md
3. GITHUB: https://github.com/docker/cli/blob/master/docs/reference/commandline/container_attach.md

------------------------------------------------------------


👋 Happy studying! Closing the session.
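
The interactive loop in the last cell blocks on `input()`, which makes it awkward to test or to run in a script. One possible variation, sketched here under the hypothetical name `run_session`, takes the queries and the answer function as parameters so the same control flow can be driven by a plain list. It strips whitespace before checking for `exit` or `quit`.

```python
from typing import Callable, Iterable

def run_session(queries: Iterable[str], answer: Callable[[str], None]) -> int:
    """Pass each non-empty query to `answer` until 'exit' or 'quit'.

    Returns the number of queries that were answered.
    """
    answered = 0
    for raw in queries:
        query = raw.strip()
        if query.lower() in ("exit", "quit"):
            break
        if not query:
            continue  # Ignore blank input
        answer(query)
        answered += 1
    return answered

# Drive the loop with canned questions and a stand-in answer function
log = []
count = run_session(["What is Docker?", "   ", "EXIT", "never reached"], log.append)
print(count, log)
```

In the notebook, the same function could be fed live input with `run_session(iter(lambda: input("What would you like to learn about? "), None), query_assistant)`.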
