## Loading the PDF and Extracting Text

In [1]:
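import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    # Open the PDF and concatenate the text of every page;
    # the context manager closes the document when done
    with fitz.open(pdf_path) as doc:
        return "".join(page.get_text() for page in doc)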

In [7]:
pdf_path = 'Syllabus Z515 2024.pdf'
pdf_text = extract_text_from_pdf(pdf_path)
pdf_text[0:1000]

'IU Bloomington\nLuddy School of Informatics, Computing, and \nEngineering\nDepartment of Information and Library Science\nZ515: Information Architecture\nTaught by:\nKate Wehner <kmessing@iu.edu>\nFall 2024\nZ515—INFORMATION ARCHITECTURE SYLLABUS\n2\nIntroduction\nThe backbone of good user experience is findability and discoverability of \ncontent. Principles in information architecture applied to digital ecosystems \nenable users to navigate with ease and intuitively understand categories \nof information. Traditional systematic structures will be covered, such as \nclassification schemes, ontologies, controlled vocabularies, and thesauri.\nThis course also reviews research in information science, cognitive science, \nsemiotics, and computer science that has contributed to an understanding \nof how communities represent, organize, retrieve, and ultimately use \ninformation. This research can inform current practices of representation and \norganization in the design of more effective
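
The raw extraction above keeps the PDF's hard line breaks, so sentences are split mid-line (for example `Computing, and \nEngineering`). The following is a minimal sketch of one way to tidy the text before sending it to a model. The helper name `normalize_whitespace` is hypothetical and not part of the original notebook, and the approach also merges short heading lines into one line, which may or may not be acceptable for a given document.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse whitespace inside paragraphs while keeping blank-line paragraph breaks."""
    # Split on blank lines so paragraph boundaries survive
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside each paragraph, replace any run of whitespace (including "\n") with one space
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    # Drop paragraphs that were only whitespace
    return "\n\n".join(p for p in cleaned if p)

sample = "Luddy School of Informatics, Computing, and \nEngineering\nDepartment of Information and Library Science"
print(normalize_whitespace(sample))
```

With the notebook's own variables, this would be applied as `pdf_text = normalize_whitespace(pdf_text)` before summarizing.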

## API Testing

In [3]:
import os

from groq import Groq

client = Groq(
    api_key=os.environ.get("GROQ_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="llama3-8b-8192",
)

print(chat_completion.choices[0].message.content)

Fast language models, also known as efficient language models or accelerable language models, have gained significant importance in recent years due to the increasing demand for natural language processing (NLP) and machine learning (ML) applications. Here are some reasons why fast language models are crucial:

1. **Scalability**: Fast language models can process large amounts of text data quickly, making them essential for applications that require handling massive amounts of text, such as text analysis, sentiment analysis, and language translation.
2. **Real-time processing**: With the increasing use of NLP in various applications like chatbots, virtual assistants, and social media platforms, fast language models enable real-time processing of user input, enabling faster responses and improved user experience.
3. **Efficient memory utilization**: Fast language models require less memory compared to traditional language models, making them suitable for deployment on devices with limit

## Summarizing Text

In [8]:
def summarize_text(text):
    try:
        summary_response = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": f"Summarize the following text: {text}"
                }
            ],
            model="llama-3.1-8b-instant",
        )
        return summary_response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

summary = summarize_text(pdf_text)
print("Summary:")
print(summary)

Summary:
The provided text is a syllabus for a course called "Information Architecture" (Z515) at the Luddy School of Informatics, Computing, and Engineering, Department of Information and Library Science, Indiana University Bloomington. Here's a summary of the course:

**Course Overview:** 

This course focuses on the principles of information architecture, which enables users to navigate digital ecosystems with ease and understand categories of information intuitively. The course covers traditional systematic structures, such as classification schemes, ontologies, controlled vocabularies, and thesauri, as well as research in information science, cognitive science, semiotics, and computer science.

**Course Objectives:**

By the end of the course, students will:

1. Understand the objectives, functions, and applications of information architectures in various environments.
2. Be familiar with a range of structured models of information representation and organization.
3. Be able to id
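
`summarize_text` sends the entire `pdf_text` in a single prompt. That worked for this syllabus, but a longer PDF may exceed the model's context limit. Below is a minimal sketch of a chunk-then-combine approach, under stated assumptions: the helper names `chunk_text` and `summarize_long_text` are hypothetical, the `max_chars` default is an arbitrary placeholder rather than a documented limit, and the limit is counted in characters, not tokens.

```python
def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into pieces of at most max_chars, breaking on line boundaries where possible."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is hard-split
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the limit
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

def summarize_long_text(text: str, summarize, max_chars: int = 8000) -> str:
    """Summarize each chunk with the given callable, then summarize the joined partial summaries."""
    chunks = chunk_text(text, max_chars)
    if len(chunks) <= 1:
        return summarize(text)
    partial = [summarize(chunk) for chunk in chunks]
    # Note: the joined partial summaries are assumed to fit in one request
    return summarize("\n\n".join(partial))

# Usage with the notebook's own function (requires a configured Groq client):
# summary = summarize_long_text(pdf_text, summarize_text)
```

Passing the summarizer in as a callable keeps the chunking logic independent of the Groq client, so it can be tried out with a stand-in function first.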

## Question Answering

In [12]:
def ask_question(context, question):
    try:
        answer_response = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": f"Context: {context} Question: {question}"
                }
            ],
            model="llama-3.1-8b-instant",
        )
        return answer_response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

question = "Can you give me the course outline? Give me the content in the form of a short story."
answer = ask_question(pdf_text, question)
print("Answer:")
print(answer)

Answer:
Here's the course outline in the form of a short story:

As you sit in the lecture hall, Professor Kate Wehner begins to introduce the course, laying the groundwork for what you'll be learning throughout the semester. It's the first week of August and you're about to embark on a journey through the world of Information Architecture.

On the first day, Professor Wehner talks about what to expect from the course and sets the tone for the upcoming weeks. You take notes as she explains the objectives of the course, which include understanding the objectives, functions, and applications of information architectures, as well as becoming familiar with various structured models of information representation and organization.

As the weeks go by, you delve into the world of Information Architecture and User Experience. In session 2, Professor Wehner discusses the relationship between IA and user experience, citing Garrett's book on the Elements of User Experience (Chapter 1: User Experi
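
`ask_question` joins the document and the question into one string (`Context: ... Question: ...`), so nothing marks where the document ends and the question begins. The following is a minimal sketch of a prompt builder that separates the two. The function name `build_qa_messages` and the `<document>` tag convention are assumptions made for illustration; they are not part of the Groq API or the original notebook.

```python
def build_qa_messages(context: str, question: str) -> list[dict]:
    """Build a chat message list that keeps the document and the question clearly separated."""
    system = (
        "You are a helpful assistant. Answer using only the document between "
        "the <document> tags. If the answer is not there, say so."
    )
    # Plain-text delimiters; the tag name is an arbitrary choice
    user = f"<document>\n{context}\n</document>\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_qa_messages("Z515: Information Architecture", "What is the course number?")
for message in messages:
    print(message["role"], "->", message["content"])
```

The returned list has the same shape as the `messages` argument already passed to `client.chat.completions.create` in `ask_question`.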

## Streamlit App

In [14]:
import os

import streamlit as st
import fitz  # PyMuPDF
from groq import Groq
from PIL import Image

# Initialize the Groq client; read the key from the environment
# rather than hard-coding a secret in the source
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

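def extract_text_from_pdf(uploaded_file):
    # Streamlit provides a file-like object rather than a path,
    # so open the PDF from its bytes
    with fitz.open(stream=uploaded_file.read(), filetype="pdf") as doc:
        return "".join(page.get_text() for page in doc)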

def summarize_text(text):
    try:
        summary_response = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": f"Summarize the following text: {text}"
                }
            ],
            model="llama-3.1-8b-instant",
        )
        return summary_response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

def ask_question(context, question):
    try:
        answer_response = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": f"Context: {context} Question: {question}"
                }
            ],
            model="llama-3.1-8b-instant",
        )
        return answer_response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

# Streamlit UI
st.title("PDF Summarizer and Question Answering")
# image = Image.open('AI Pic 5.png')
# st.image(image, use_column_width='always')

uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")

if uploaded_file is not None:
    pdf_text = extract_text_from_pdf(uploaded_file)
    
    st.subheader("Text Extracted from PDF:")
    st.write(pdf_text[:500])  # Display a snippet of the text for review

    summary_button = st.button("Summarize Text")
    if summary_button:
        summary = summarize_text(pdf_text)
        st.subheader("Summary:")
        st.write(summary)

    question = st.text_input("Ask a question about the PDF:")
    if question:
        answer = ask_question(pdf_text, question)
        st.subheader("Answer:")
        st.write(answer)

