# Project Title:
## AI-Powered University Course Advisor Chatbot, by Ifeoma Augusta Adigwe

### Project Overview:
This project develops an AI-powered course advisor chatbot for Deutsche Hochschule für Studien (Germany - fictional). Built with open-source tools, the chatbot answers student questions in real time and grounds its answers in institutional documents: it uses Retrieval-Augmented Generation (RAG) to retrieve the relevant course and policy passages and pass them to the language model.

## Business Context:
- Institution: Deutsche Hochschule für Studien (Germany - fictional)
- Sector: Higher Education

Deutsche Hochschule für Studien is one of Germany’s leading public research universities, with over 55,000 students and a reputation for academic excellence since 1872. As part of its digital transformation initiative, Deutsche Hochschule für Studien aims to adopt an AI-driven, cost-effective chatbot to improve student support services and reduce the workload on academic advisors.

### Business Problems Addressed:
- Difficulty for students in navigating complex course and policy documents
- Overburdened academic advisors handling repetitive queries
- Lack of 24/7 academic support for students
- Need for a scalable, privacy-friendly automation solution
- Delays and uncertainty in students' course selection decisions

### Project Objectives:
- Provide instant academic guidance through an AI chatbot
- Retrieve accurate information from structured and unstructured academic documents
- Reduce advisor workload via automated, intelligent support
- Deploy solution on AWS Free Tier infrastructure
- Ensure solution is open-source and privacy-respecting

### Tech Stack & Tools:
##### Layer and Toolset
- Model: Mistral 7B / LLaMA 2 via Hugging Face Transformers
- Embeddings: SentenceTransformers (all-MiniLM-L6-v2)
- Vector DB: FAISS / ChromaDB
- Interface: Streamlit / Gradio
- Deployment: AWS Free Tier (EC2 / Lambda / Amplify)
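
To make the roles of these layers concrete, the following is a minimal, dependency-free sketch of the retrieval step in RAG. It is not the project's implementation: bag-of-words cosine similarity stands in for the SentenceTransformers embeddings, a plain Python list stands in for FAISS / ChromaDB, and the names `embed`, `cosine`, `retrieve`, `build_prompt`, and the sample chunks are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble the prompt that would be sent to the language model."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical placeholder chunks, not real university policy text
sample_chunks = [
    "Tuition fees and semester contribution are listed on the fees page.",
    "Student housing applications are handled by the student union.",
    "Language requirements depend on the degree program.",
]

question = "What are the language requirements?"
print(build_prompt(question, sample_chunks, k=1))
```

In a real pipeline the model layer would receive the assembled prompt, so the answer stays tied to the retrieved passages rather than to the model's general knowledge.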

In [1]:
%pip install beautifulsoup4 requests weasyprint

Note: you may need to restart the kernel to use updated packages.


In [2]:
from weasyprint import HTML
HTML(string="<h1>Hello PDF</h1>").write_pdf("test.pdf")


In [3]:
# Import all necessary libraries
import os
import requests
from bs4 import BeautifulSoup
from weasyprint import HTML
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [4]:
# Create a "docs" folder in the project directory to hold the scraped pages as PDFs
os.makedirs("docs", exist_ok=True)

### Get the dataset - URLs from the university website

In [5]:
# Use Beautiful Soup to scrape the required pages from the university site and convert them to PDF

# URLs to scrape
url_map = {
    "Admissions": "https://www.en.uni-muenchen.de/students/degree/index.html",
    "Language Requirements": "https://www.en.uni-muenchen.de/students/int_student_guide/language/index.html",
    "Fees and Finances": "https://www.en.uni-muenchen.de/students/fees/index.html",
    "Living Costs and Funding": "https://www.en.uni-muenchen.de/students/int_student_guide/finance/index.html",
    "Enrollment Guide": "https://www.en.uni-muenchen.de/students/int_student_guide/first_steps/index.html",
    "Housing Info": "https://www.en.uni-muenchen.de/students/int_student_guide/housing/index.html",
    "Academic Calendar": "https://www.en.uni-muenchen.de/students/int_student_guide/dates/index.html",
    "Course Catalog": "https://www.en.uni-muenchen.de/students/degree/index.html",
    "Program Outline": "https://www.en.uni-muenchen.de/students/degree/master_programs/index.html"
}


# Main loop
for title, url in url_map.items():
    try:
        print(f"Fetching: {title}")
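        res = requests.get(url, timeout=30)  # avoid hanging on an unresponsive server
        res.raise_for_status()  # fail on HTTP errors instead of saving an error page as PDF
        soup = BeautifulSoup(res.text, "html.parser")

        # Strip navigation, scripts, styles, and other non-content tags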
        for tag in soup(["nav", "footer", "script", "style", "header", "aside"]):
            tag.decompose()

        # Convert to PDF
        clean_html = str(soup)
        filename = f"docs/{title.replace(' ', '_').lower()}.pdf"
        HTML(string=clean_html).write_pdf(filename)
        print(f"Saved: {filename}")
    except Exception as e:
        print(f"Failed on {title}: {e}")


Fetching: Admissions
Saved: docs/admissions.pdf
Fetching: Language Requirements
Saved: docs/language_requirements.pdf
Fetching: Fees and Finances
Saved: docs/fees_and_finances.pdf
Fetching: Living Costs and Funding
Saved: docs/living_costs_and_funding.pdf
Fetching: Enrollment Guide
Saved: docs/enrollment_guide.pdf
Fetching: Housing Info
Saved: docs/housing_info.pdf
Fetching: Academic Calendar
Saved: docs/academic_calendar.pdf
Fetching: Course Catalog
Saved: docs/course_catalog.pdf
Fetching: Program Outline
Saved: docs/program_outline.pdf
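
In `url_map`, "Admissions" and "Course Catalog" point to the same URL, so `admissions.pdf` and `course_catalog.pdf` hold the same page and would be indexed twice later on. A small check like the sketch below can flag such duplicates before scraping. The helper name `find_duplicate_urls` is hypothetical, and the dictionary is a three-entry subset of `url_map` used for illustration.

```python
from collections import defaultdict


def find_duplicate_urls(mapping):
    """Group titles by URL and return only the URLs used by more than one title."""
    titles_by_url = defaultdict(list)
    for title, url in mapping.items():
        titles_by_url[url].append(title)
    return {url: titles for url, titles in titles_by_url.items() if len(titles) > 1}


# Subset of url_map from the scraping cell
subset = {
    "Admissions": "https://www.en.uni-muenchen.de/students/degree/index.html",
    "Fees and Finances": "https://www.en.uni-muenchen.de/students/fees/index.html",
    "Course Catalog": "https://www.en.uni-muenchen.de/students/degree/index.html",
}

duplicates = find_duplicate_urls(subset)
for url, titles in duplicates.items():
    print(f"{', '.join(titles)} share {url}")
```

Passing the full `url_map` instead of `subset` runs the same check over every entry.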


### Vector DB (FAISS)
#### PDF ➜ Text Chunks ➜ Embeddings ➜ Vector DB (FAISS)
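
As an illustration of the "Text Chunks" stage in the pipeline above, here is a dependency-free sketch of overlapping, fixed-size chunking. It is one possible approach rather than the notebook's method: the function name `chunk_text`, the character-based windows, and the default sizes are all assumptions.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Tiny example with small sizes so the overlap is easy to see
print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of some duplicated text in the index.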