In [1]:
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    doc = fitz.open(pdf_path)
    text = ""
    for page in doc:
        text += page.get_text()
    return text

pdf_path = 'Pandas cheat sheet-Aryan.pdf'
pdf_text = extract_text_from_pdf(pdf_path)
pdf_text[0:1000]

'Pandas cheat sheet-Aryan\n1. Introduction to Pandas\nWhat is Pandas?\nPandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and\nmanipulation library for Python. It provides data structures like Series and DataFrames that\nare designed to handle structured data, offering a variety of tools to clean, manipulate, and\nanalyze large datasets. Pandas is built on top of NumPy, and it integrates well with other\ndata science libraries such as Matplotlib, Seaborn, and SciPy.\nThe name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and\nwas created by Wes McKinney in 2008.\nWhy Pandas?\n●\nData set cleaning, merging, and joining.\n●\nEasy handling of missing data (represented as NaN) in floating point as well as\nnon-floating point data.\n●\nColumns can be inserted and deleted from DataFrame and higher dimensional\nobjects.\n●\nPowerful group by functionality for performing split-apply-combine operations on\ndata sets.\n●\nData Visualizatio

In [2]:
import os

from groq import Groq

client = Groq(api_key='')

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "you are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="llama-3.1-8b-instant",
)

print(chat_completion.choices[0].message.content)

## Available models: https://console.groq.com/docs/models

Fast language models are a type of artificial intelligence (AI) technology that are essential for various applications in the fields of natural language processing (NLP) and deep learning. Here are some reasons why fast language models are crucial:

1. **Speed and Efficiency**: Fast language models can process and generate text at incredibly high speeds, often exceeding millions of words per second. This allows them to handle large volumes of language data and perform complex tasks such as language translation, question-answering, and text summarization, which would be impractical with slower models.

2. **Real-time Applications**: Fast language models enable real-time applications such as:
	* Chatbots and virtual assistants (e.g., Siri, Alexa) that can provide instant responses to user queries.
	* Language translation services (e.g., Google Translate) that can translate text in real-time.
	* Content generation tools (e.g., online content platforms) that can produce high-quality conten

In [3]:
def summarize_text(text):
    try:
        summary_response = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": f"Summarize the following text: {text}"
                }
            ],
            model="llama-3.1-8b-instant",
        )
        return summary_response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

summary = summarize_text(pdf_text)
print("Summary:")
print(summary)

Summary:
**Overview of Pandas**

Pandas is a powerful, flexible, and easy-to-use open-source data analysis and manipulation library for Python. It provides data structures like Series and DataFrames that are designed to handle structured data, offering a variety of tools to clean, manipulate, and analyze large datasets.

**Why Use Pandas?**

Pandas is useful for:

* Data set cleaning, merging, and joining
* Easy handling of missing data
* Columns can be inserted and deleted from DataFrame and higher dimensional objects
* Powerful group by functionality for performing split-apply-combine operations on data sets
* Data Visualization

**Installation and Setup**

Pandas can be installed using pip or Anaconda.

**Fundamental Pandas Objects**

Pandas objects can be thought of as enhanced versions of NumPy structured arrays. The two fundamental Pandas data structures are:

1. Series: A one-dimensional labeled array that can hold any data type.
2. DataFrame: A two-dimensional labeled data stru

In [4]:
def ask_question(context, question):
    try:
        answer_response = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": f"Context: {context} Question: {question}"
                }
            ],
            model="llama-3.1-8b-instant",
        )
        return answer_response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

question = "What is the main topic of the document?"
answer = ask_question(pdf_text, question)
print("Answer:")
print(answer)

Answer:
The main topic of the document is Pandas, a popular open-source library used for data analysis and manipulation in Python. The document provides an in-depth introduction to Pandas, including its features, installation, basic operations, and advanced data structures such as Series and DataFrames.


In [5]:
import streamlit as st
import fitz 
from groq import Groq
from PIL import Image


client = Groq(api_key='')

def extract_text_from_pdf(pdf_path):
    doc = fitz.open(pdf_path)
    text = ""
    for page in doc:
        text += page.get_text()
    return text

def summarize_text(text):
    try:
        summary_response = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": f"Summarize the following text: {text}"
                }
            ],
            model="llama-3.1-8b-instant",
        )
        return summary_response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

def ask_question(context, question):
    try:
        answer_response = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": f"Context: {context} Question: {question}"
                }
            ],
            model="llama-3.1-8b-instant",
        )
        return answer_response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

# Streamlit UI
st.title("PDF Summarizer and Question Answering")
image = Image.open('AI Pic 5.png')
st.image(image, use_column_width='always')

uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")

if uploaded_file is not None:
    pdf_text = extract_text_from_pdf(uploaded_file)
    
    st.subheader("Text Extracted from PDF:")
    st.write(pdf_text[:500])  

    summary_button = st.button("Summarize Text")
    if summary_button:
        summary = summarize_text(pdf_text)
        st.subheader("Summary:")
        st.write(summary)

    question = st.text_input("Ask a question about the PDF:")
    if question:
        answer = ask_question(pdf_text, question)
        st.subheader("Answer:")
        st.write(answer)

2025-02-04 16:05:49.651 
  command:

    streamlit run /Users/rohitlabade/Desktop/untitled folder 2/.venv/lib/python3.9/site-packages/ipykernel_launcher.py [ARGUMENTS]


FileNotFoundError: [Errno 2] No such file or directory: 'AI Pic 5.png'