<a href="https://colab.research.google.com/github/mikeobeid/CloudCourse/blob/main/ex7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📓 **Tutorial 7 – Cloud Computing**  
**Project:** HW2_Tiger – Microservices Search Engine

---

### **Summary:**  
This notebook implements a document search engine using a simulated microservices architecture in Python. It includes:

- 🧩 Document indexing
- 🔍 Logical search queries using `AND` / `OR`
- 📊 Scoring and ranking by the number of query terms each document matches
- 🧾 Result formatting with title and snippet display

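The architecture is modular: three in-process Python classes (`IndexService`, `QueryService`, `ResultService`) stand in for separately deployed services and mirror real-world service boundaries.  
Each service could be deployed and scaled on a **serverless** Function-as-a-Service platform (e.g., Google Cloud Functions).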

---


In [2]:
# Used to timestamp queries
from datetime import datetime

In [3]:
class IndexService:
    def __init__(self):
        self.documents = {}
        self.index = {}

    def add_document(self, doc_data):
        # IDs are sequential strings: '1', '2', ...
        doc_id = str(len(self.documents) + 1)
        self.documents[doc_id] = {**doc_data, 'id': doc_id}
        # Inverted index: lowercase, whitespace-separated word -> set of doc IDs
        words = doc_data['content'].lower().split()
        for word in words:
            self.index.setdefault(word, set()).add(doc_id)
        return self.documents[doc_id]

    def get_document(self, doc_id):
        return self.documents.get(doc_id)

    def search_word(self, word):
        return list(self.index.get(word.lower(), set()))
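
The index built by `add_document` is an inverted index: each lowercased, whitespace-separated word maps to the set of document IDs that contain it. A minimal standalone sketch of the same idea follows. `tokenize` and `build_index` are hypothetical helper names, not part of the notebook, and the punctuation stripping is an assumption added to show one possible refinement (the notebook's `split()` would index `protocol.` and `protocol` as different words).

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens (punctuation is dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each token to the set of document IDs whose content contains it."""
    index = defaultdict(set)
    for doc_id, content in docs.items():
        for token in tokenize(content):
            index[token].add(doc_id)
    return index

# Illustrative data only; the trailing period is there to exercise tokenize().
docs = {
    "1": "MQTT is a lightweight messaging protocol.",
    "2": "IoT devices use MQTT for communication",
}
index = build_index(docs)
print(sorted(index["mqtt"]))      # ['1', '2']
print(sorted(index["protocol"]))  # ['1']
```

Storing sets rather than lists keeps each document ID unique per word, even when a word appears several times in the same document.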


In [4]:
class QueryService:
    def __init__(self, index_service):
        self.index_service = index_service
        self.queries = {}

    def create_query(self, query_data):
        try:
            query_id = str(len(self.queries) + 1)
            terms = query_data['terms']
            operator = query_data.get('operator', 'AND').upper()
            # Count how many query terms each document matches
            all_doc_scores = {}

            for term in terms:
                doc_ids = self.index_service.search_word(term)
                for doc_id in doc_ids:
                    all_doc_scores[doc_id] = all_doc_scores.get(doc_id, 0) + 1

            if operator == 'AND':
                filtered = [doc_id for doc_id, score in all_doc_scores.items() if score == len(terms)]
            elif operator == 'OR':
                filtered = list(all_doc_scores.keys())
            else:
                return {'error': 'Unsupported operator'}

            query = {
                'id': query_id,
                'terms': terms,
                'operator': operator,
                'results': filtered,
                'scores': {doc_id: all_doc_scores[doc_id] for doc_id in filtered},
                'timestamp': query_data.get('timestamp', str(datetime.now()))
            }
            self.queries[query_id] = query
            return query
        except Exception as e:
            return {'error': str(e)}
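
`create_query` implements `AND` by counting matched terms per document and keeping the documents whose count equals `len(terms)`; `OR` keeps every document with at least one match. Another way to view the same logic is as set intersection and set union over the per-term document sets. The sketch below assumes a plain dict-of-sets index, and `match_documents` is a hypothetical name used only for illustration.

```python
def match_documents(index, terms, operator="AND"):
    """Return the doc IDs matching the terms under AND (intersection) or OR (union)."""
    posting_sets = [index.get(term.lower(), set()) for term in terms]
    if not posting_sets:
        return set()
    if operator == "AND":
        return set.intersection(*posting_sets)
    if operator == "OR":
        return set.union(*posting_sets)
    raise ValueError(f"Unsupported operator: {operator}")

# Illustrative index: 'mqtt' appears in three documents, 'protocol' in one.
index = {
    "mqtt": {"1", "2", "3"},
    "protocol": {"1"},
}
print(sorted(match_documents(index, ["mqtt", "protocol"], "AND")))  # ['1']
print(sorted(match_documents(index, ["mqtt", "protocol"], "OR")))   # ['1', '2', '3']
```

The counting approach in `create_query` has one advantage over pure set operations: the per-document counts double as ranking scores for `OR` queries.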


In [5]:
class ResultService:
    def __init__(self, index_service, query_service):
        self.index_service = index_service
        self.query_service = query_service
        self.results = {}

    def format_results(self, query_id):
        try:
            query = self.query_service.queries.get(query_id)
            if not query:
                return {'error': 'Query not found'}

            # Rank by number of matched terms, highest first (ties keep their original order)
            ranked = sorted(query['results'], key=lambda d: query['scores'][d], reverse=True)

            formatted = []
            for doc_id in ranked:
                doc = self.index_service.get_document(doc_id)
                if doc:
                    formatted.append({
                        'doc_id': doc_id,
                        'title': doc['title'],
                        'score': query['scores'][doc_id],
                        'snippet': doc['content'][:100] + '...'
                    })

            result_id = str(len(self.results) + 1)
            result = {
                'id': result_id,
                'query_id': query_id,
                'formatted_results': formatted,
                'count': len(formatted)
            }
            self.results[result_id] = result
            return result
        except Exception as e:
            return {'error': str(e)}
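
`format_results` appends `'...'` to every snippet, including content shorter than 100 characters. One possible refinement, sketched here under the assumption that an ellipsis should only mark truncated text, is a small helper. `make_snippet` is a hypothetical name, not part of the notebook.

```python
def make_snippet(content, max_len=100):
    """Return content unchanged if it fits, else its first max_len characters plus '...'."""
    if len(content) <= max_len:
        return content
    # Strip trailing whitespace so the ellipsis does not follow a space.
    return content[:max_len].rstrip() + "..."

print(make_snippet("MQTT is a lightweight messaging protocol"))
# MQTT is a lightweight messaging protocol
print(make_snippet("x" * 150, max_len=10))
# xxxxxxxxxx...
```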


In [6]:
# Initialize services
index_service = IndexService()
query_service = QueryService(index_service)
result_service = ResultService(index_service, query_service)

# Add documents
index_service.add_document({'title': 'MQTT Overview', 'content': 'MQTT is a lightweight messaging protocol'})
index_service.add_document({'title': 'IoT and MQTT', 'content': 'IoT devices use MQTT for communication'})
index_service.add_document({'title': 'Cloud Communication', 'content': 'Protocols like HTTP and MQTT are used'})

# Create and run query
query = query_service.create_query({'terms': ['mqtt', 'protocol'], 'operator': 'OR'})
print("Query:", query)

# Format results
formatted = result_service.format_results(query['id'])
for r in formatted['formatted_results']:
    print(f"[{r['score']}] {r['title']} → {r['snippet']}")


Query: {'id': '1', 'terms': ['mqtt', 'protocol'], 'operator': 'OR', 'results': ['2', '1', '3'], 'scores': {'2': 1, '1': 2, '3': 1}, 'timestamp': '2025-05-23 10:57:50.765241'}
[2] MQTT Overview → MQTT is a lightweight messaging protocol...
[1] IoT and MQTT → IoT devices use MQTT for communication...
[1] Cloud Communication → Protocols like HTTP and MQTT are used...
