In [None]:
%pip install wikipedia-api pydantic

Collecting wikipedia-api
  Downloading wikipedia_api-0.8.1.tar.gz (19 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia-api
  Building wheel for wikipedia-api (setup.py) ... done
  Created wheel for wikipedia-api: filename=Wikipedia_API-0.8.1-py3-none-any.whl size=15383 sha256=83eb371508dd07a53654ae7d70e9e30a3aad14bd5274e9350f73407ab0eaeae6
  Stored in directory: /root/.cache/pip/wheels/0b/0f/39/e8214ec038ccd5aeb8c82b957289f2f3ab2251febeae5c2860
Successfully built wikipedia-api
Installing collected packages: wikipedia-api
Successfully installed wikipedia-api-0.8.1


In [None]:
from pydantic import BaseModel
from typing import List, Optional
import wikipediaapi

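class InstitutionDetails(BaseModel):
  # Every field defaults to None so a partially filled record is still valid.
  founder: Optional[str] = None
  founded: Optional[str] = None
  branches: Optional[List[str]] = None
  number_of_employees: Optional[int] = None
  summary: Optional[str] = None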

def fetch_institution_details(name: str) -> InstitutionDetails:
  wiki_wiki = wikipediaapi.Wikipedia(user_agent="Script/1.0 (contact: myemail@example.com)", language='en')
  page = wiki_wiki.page(name)
  #print(f"Page: {page}")

  if not page.exists():
    raise ValueError(f"No page found for '{name}'")

  founder = founded = None
  branches = []
  summary = page.summary[:500]
  num_employees = None

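  # Note: page.text is the article's plain text, which typically omits the
  # infobox, so these "Label: value" matches often find nothing.
  for line in page.text.split('\n'):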
    if 'Founder' in line:
      founder = line.split(':')[-1].strip()
    elif 'Founded' in line:
      founded = line.split(':')[-1].strip()
    elif 'Branches' in line:
      branches = [b.strip() for b in line.split(':')[-1].split(',')]
    elif 'Number of employees' in line:
      try:
        num_employees = int(line.split(':')[-1].strip().replace(',', ''))
      except ValueError:
        # Leave the count unset when the value is not a plain integer.
        pass

  return InstitutionDetails(
      founder = founder,
      founded = founded,
      branches = branches or None,
      number_of_employees = num_employees,
      summary = summary
  )

def display_details(d: InstitutionDetails):
  print("\n Institution Details:- ")
  print(f"Founder: {d.founder or 'N/A'}")
  print(f"Founded: {d.founded or 'N/A'}")
  print(f"Branches: {', '.join(d.branches) if d.branches else 'N/A'}")
  print(f"Employees: {d.number_of_employees if d.number_of_employees is not None else 'N/A'}")
  print(f"Summary: {d.summary or 'N/A'}")

name = input("Enter institution name: ").strip()
if name:
  try:
    details = fetch_institution_details(name)
    display_details(details)
  except Exception as e:
    print(f"Error: {e}")
else:
  print("Please enter a valid name!")

Enter institution name: Massachusetts Institute of Technology

 Institution Details:- 
Founder: N/A
Founded: N/A
Branches: N/A
Employees: N/A
Summary: The Massachusetts Institute of Technology (MIT) is a private research university in Cambridge, Massachusetts, United States. Established in 1861, MIT has played a significant role in the development of many areas of modern technology and science.
In response to the increasing industrialization of the United States, William Barton Rogers organized a school in Boston to create "useful knowledge." Initially funded by a federal land grant, the institute adopted a polytechnic model that stressed labo


VIVA

----
1. What is the significance of Cohere, LangChain, and Pydantic?

- Cohere: Provides NLP APIs including embeddings, classification, and summarization via LLMs. Used for model inference.

- LangChain: Framework to build applications powered by LLMs. It connects LLMs with tools like memory, APIs, and agents.

- Pydantic: Used for data validation and parsing using Python type hints. Useful in defining schemas and ensuring structured output from LLMs.

----

2. Explain the techniques used in word embeddings.

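- Traditional (sparse, count-based representations): One-hot encoding, Bag of Words (BoW), TF-IDF.

- Neural (dense, learned representations): Word2Vec (CBOW, Skip-gram), GloVe (co-occurrence matrix), FastText (subword n-grams), BERT (contextual embeddings, where a word's vector depends on its sentence).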
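
The two count-based techniques can be sketched in a few lines. This is a minimal illustration in plain Python, not a library implementation: the three-sentence corpus is invented, and the TF-IDF variant assumed here uses tf = count / document length and idf = log(N / df), which is only one of several common weightings.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

tokenized = [d.split() for d in docs]
# Shared vocabulary: one vector position per distinct word.
vocab = sorted({w for doc in tokenized for w in doc})

def bag_of_words(tokens):
    """Count vector over the shared vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def tf_idf(tokens):
    """TF-IDF with tf = count / len(doc) and idf = log(N / df)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vec = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        # df: number of documents that contain the word at least once.
        df = sum(1 for doc in tokenized if w in doc)
        vec.append(tf * math.log(n_docs / df))
    return vec

bow = bag_of_words(tokenized[0])
weights = tf_idf(tokenized[0])
```

In the first document, "mat" appears in only one document while "sat" appears in two, so "mat" receives the larger TF-IDF weight even though both occur once.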

----

3. Why do we use word embeddings? (Need for word embeddings)

- To reduce the dimensionality and sparsity of one-hot and bag-of-words vectors.

- To capture semantic and syntactic meanings.

- To enable words with similar meanings to have similar representations.
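
A small comparison makes the last point concrete. In this sketch the dense vectors are hand-picked hypothetical values, not output from a trained model; they only show how cosine similarity behaves on one-hot versus dense representations.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# One-hot vectors: every pair of distinct words is orthogonal.
one_hot = {"king": [1, 0, 0], "queen": [0, 1, 0], "apple": [0, 0, 1]}

# Hypothetical dense vectors, chosen by hand for illustration.
dense = {"king": [0.9, 0.8], "queen": [0.85, 0.9], "apple": [-0.7, 0.1]}

one_hot_sim = cosine(one_hot["king"], one_hot["queen"])
dense_related = cosine(dense["king"], dense["queen"])
dense_unrelated = cosine(dense["king"], dense["apple"])
```

With one-hot vectors the similarity between any two different words is zero, so no notion of relatedness can be expressed; dense vectors can place related words close together.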

----

4. Discuss the real-world applications of LLMs and their limitations.

- Applications: Chatbots, summarization, code generation, sentiment analysis, content creation.

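- Limitations: Bias inherited from training data, hallucination (fluent but incorrect output), high computational cost, and a knowledge cutoff (no awareness of events after training unless external data is supplied).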

----

5. Which model is used for summarization?

- A pre-trained summarization model from Hugging Face: facebook/bart-large-cnn (BART fine-tuned on the CNN/DailyMail news dataset).

----

6. Explain the BART model in detail.

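- BART (Bidirectional and Auto-Regressive Transformers) pairs a BERT-style bidirectional encoder with a GPT-style autoregressive decoder.

- It is a sequence-to-sequence model used for text generation, summarization, translation, etc.

- It is pre-trained as a denoising autoencoder: the input text is corrupted (for example by masking or deleting tokens, or by shuffling sentences) and the model learns to reconstruct the original.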
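
The corrupt-and-reconstruct idea can be illustrated by the noising side alone. The sketch below covers three of the corruption schemes described for BART (token masking, token deletion, sentence permutation) in simplified form; the function names, probabilities, and example sentence are assumptions made for illustration, and no model is trained here.

```python
import random

MASK = "<mask>"

def mask_tokens(tokens, prob, rng):
    """Token masking: replace each token with <mask> with probability prob."""
    return [MASK if rng.random() < prob else t for t in tokens]

def delete_tokens(tokens, prob, rng):
    """Token deletion: drop each token with probability prob."""
    return [t for t in tokens if rng.random() >= prob]

def permute_sentences(sentences, rng):
    """Sentence permutation: return the sentences in a shuffled order."""
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return shuffled

rng = random.Random(0)  # fixed seed so the run is repeatable
original = "the model learns to reconstruct the original text".split()
corrupted = mask_tokens(original, 0.3, rng)

# A denoising training pair: the encoder reads `corrupted`,
# the decoder is trained to produce `original`.
training_pair = (corrupted, original)
```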

----

7. What is sentiment analysis and its applications?

- It is the process of identifying sentiment (positive, negative, neutral) from text.

- Applications: Customer feedback, brand monitoring, political analysis, market research.
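
As a rough illustration of the task, here is a tiny lexicon-based classifier. It is a deliberately simple stand-in: the word lists are invented, and practical systems normally use a trained model rather than a fixed word list.

```python
# Hypothetical mini-lexicons, chosen only for this example.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "awful"}

def classify_sentiment(text):
    """Label text by counting positive and negative lexicon words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = classify_sentiment("I love this great product")
```

A word-count approach like this misses negation and sarcasm ("not good" still counts as positive), which is one reason learned models are preferred.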

----

8. Discuss and explain the significance of the parameter perplexity in t-SNE.

- Perplexity is a hyperparameter that sets the effective number of neighbors considered for each point.

- It controls the balance between local structure (lower perplexity) and global structure (higher perplexity).

- Should be less than the number of data points; typical range: 5–50.
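
The "effective number of neighbors" reading follows from the definition used in t-SNE: the perplexity of a point's neighbor distribution P is 2 raised to its Shannon entropy in bits. The sketch below only evaluates that formula on two made-up distributions; it does not run t-SNE itself.

```python
import math

def perplexity(probs):
    """Perplexity 2**H(P), where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

# Five equally likely neighbors: perplexity is (up to rounding) 5.
uniform_5 = [0.2] * 5

# Mass concentrated on one neighbor: far fewer "effective" neighbors.
peaked = [0.9, 0.05, 0.03, 0.01, 0.01]

p_uniform = perplexity(uniform_5)
p_peaked = perplexity(peaked)
```

During fitting, t-SNE tunes each point's Gaussian bandwidth so that its neighbor distribution reaches the perplexity the user requested.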

----

9. Describe the algorithm (step-by-step, in words) for building an IPC chatbot.

a. Download the Indian Penal Code document.

b. Preprocess and split the document into retrievable chunks.

c. Use embeddings to store the chunks in a vector store.

d. Accept user queries.

e. Retrieve relevant sections using similarity search.

f. Use an LLM (via LangChain) to generate an answer grounded in the retrieved context.
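
Steps b to f can be sketched without any external service. Everything here is a stand-in under stated assumptions: the chunks are short paraphrased placeholders rather than statutory text, `embed` uses word counts instead of a learned embedding model, the vector store is a plain list, and the final step only builds the prompt that would be sent to an LLM.

```python
import math
from collections import Counter

# Step b: placeholder chunks (paraphrased for illustration, not statutory text).
chunks = [
    "Section 302: punishment for murder",
    "Section 378: definition of theft of movable property",
    "Section 420: cheating and dishonestly inducing delivery of property",
]

def embed(text):
    """Stand-in embedding: lowercase word counts (a real system would use a model)."""
    return Counter(text.lower().replace(":", " ").split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step c: the "vector store" is a list of (vector, chunk) pairs.
store = [(embed(c), c) for c in chunks]

def retrieve(query, k=1):
    """Step e: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def build_prompt(query, context):
    """Step f: assemble the grounded prompt that would be passed to the LLM."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

# Step d: accept a user query (hard-coded here).
query = "what is the punishment for murder"
top = retrieve(query)
prompt = build_prompt(query, top)
```

Swapping `embed` for a real embedding model and the list for a vector database changes the quality of retrieval, not the shape of the pipeline.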

----

10. Discuss PCA and t-SNE.

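- PCA: Linear; projects data onto the directions of greatest variance, so it preserves global structure. Fast and deterministic, which makes it suitable for large datasets.

- t-SNE: Non-linear; preserves local neighborhoods, which makes it well suited to visualizing word clusters in smaller datasets. It is slower and its output depends on perplexity and random initialization.

- Both are used to visualize high-dimensional word embeddings in 2D/3D.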
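
The linear side of this comparison is small enough to write out. The sketch below finds only the first principal component, using power iteration on the covariance matrix, and projects toy 2D points onto it. The data and function name are invented for illustration; in practice a library routine would be used, and t-SNE is too involved to sketch at this length.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (mean, unit direction of greatest variance) via power iteration."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(dim)]
    centered = [[p[i] - mean[i] for i in range(dim)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    v = [1.0] * dim
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

# Toy points lying roughly along the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
mean, direction = first_principal_component(points)

# 1D coordinates: each centered point projected onto the principal direction.
projected = [sum((p[i] - mean[i]) * direction[i] for i in range(2)) for p in points]
```

Because the points lie near a straight line, one component captures almost all of the variance, which is the situation where a linear method like PCA loses very little.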

----

11. What are the uses of prompt engineering?

- To control LLM outputs by carefully designing the input prompts.

- Used in chatbots, summarization, translation, data extraction, and few-shot learning.
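
Few-shot prompting, for instance, amounts to placing labeled examples ahead of the new input. The helper below is a hypothetical sketch of that layout: the function name, the "Text:/Label:" format, and the example reviews are all assumptions, and no LLM is called.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labeled examples, and an unlabeled query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The final label is left blank for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as positive or negative.",
    [("I loved the film", "positive"), ("The food was awful", "negative")],
    "The service was excellent",
)
```

Changing the instruction, the number of examples, or their order changes the model's output without any retraining, which is what makes prompt design a practical control mechanism.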

----