# Question Answering Project
This project builds a question answering system on top of a pre-trained language model. Given a context passage and a question, the system generates an answer grounded in that context.
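As a rough illustration of the input described above, a context passage and a question can be combined into a single prompt string before being sent to a model. This is only a sketch: `build_prompt` is a hypothetical helper, the instruction wording is an assumption, and no model is called here.

```python
def build_prompt(context: str, question: str) -> str:
    """Combine a context passage and a question into one prompt string.

    Hypothetical helper for illustration; the instruction text is an assumption.
    """
    context = context.strip()
    question = question.strip()
    if not context or not question:
        raise ValueError("Both context and question must be non-empty.")
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "All legislative Powers herein granted shall be vested in a Congress "
    "of the United States, which shall consist of a Senate and House of "
    "Representatives.",
    "Where are legislative powers vested?",
)
print(prompt)
```

Keeping the prompt construction in a small pure function makes it easy to inspect exactly what the model would receive before any API call is made.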

In [6]:
from dotenv import load_dotenv, find_dotenv
import os
load_dotenv(find_dotenv(), override=True)
api_key = os.getenv("OPENAI_API_KEY")
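`os.getenv` returns `None` without complaint when a variable is missing, so a typo in `.env` only surfaces later as an authentication error. A minimal sketch of an early check follows; `require_env` and `mask` are hypothetical helpers, not part of `python-dotenv`, and `QA_DEMO_KEY` is a made-up variable name used only for the demonstration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file.")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so a key can be logged safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demonstration with a made-up variable so the real key is never printed.
os.environ["QA_DEMO_KEY"] = "sk-test-1234"
print(mask(require_env("QA_DEMO_KEY")))
```

In the notebook itself, the same check could be applied to `OPENAI_API_KEY` right after `load_dotenv` runs.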

In [7]:
def load_document(file):
    """Load a PDF or DOCX file and return a list of LangChain Document objects."""
    # Compare the extension case-insensitively so ".PDF" and ".pdf" behave the same.
    _, extension = os.path.splitext(file)
    extension = extension.lower()
    if extension == ".pdf":
        # Loader imports are deferred so only the parser actually needed is imported.
        from langchain.document_loaders import PyPDFLoader
        loader = PyPDFLoader(file)
    elif extension == ".docx":
        from langchain.document_loaders import Docx2txtLoader
        loader = Docx2txtLoader(file)
    else:
        raise ValueError(f"Unsupported file extension: {extension}")
    print(f"Loading document from {file}")
    return loader.load()
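The extension check in `load_document` can be exercised on its own, without LangChain installed. The sketch below is an assumption-laden stand-in rather than the notebook's loader: `detect_extension` and `SUPPORTED_EXTENSIONS` are hypothetical names, and `files/REPORT.PDF` is a made-up path that shows why lowercasing the extension can matter.

```python
import os

# Extensions handled by the loader above (hypothetical constant for this sketch).
SUPPORTED_EXTENSIONS = {".pdf", ".docx"}


def detect_extension(path: str) -> str:
    """Return the lowercased extension of a path, or raise if it is unsupported."""
    _, extension = os.path.splitext(path)
    extension = extension.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file extension: {extension!r}")
    return extension


for path in [
    "files/constitution.pdf",
    "files/Sam_Villasmith_Resume_2025 _Dev.docx",  # spaces in the name are fine
    "files/REPORT.PDF",  # made-up path with an uppercase extension
]:
    print(path, "->", detect_extension(path))
```

Note that `os.path.splitext` only splits on the final dot, so a name such as `archive.tar.gz` yields `.gz` and would be rejected here.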

In [8]:
data = load_document("files/constitution.pdf")
print(data[1].page_content)  # Print content of the second page

Loading document from files/constitution.pdf
C O N S T I T U T I O N O F T H E U N I T E D S T A T E S  
 
 
 
 
We the People of the United States, in Order to form a 
more perfect Union, establish Justice, insure domestic 
Tranquility, provide for the common defence, promote 
the general Welfare, and secure the Blessings of Liberty to 
ourselves and our Posterity, do ordain and establish this 
Constitution for the United States of America  
 
 
Article.  I. 
SECTION. 1 
All legislative Powers herein granted shall be vested in a 
Congress of the United States, which shall consist of a Sen- 
ate and House of Representatives. 
SECTION. 2 
The House of Representatives shall be composed of Mem- 
bers chosen every second Year by the People of the several 
States, and the Electors in each State shall have the Qualifi- 
cations requisite for Electors of the most numerous Branch 
of the State Legislature. 
No Person shall be a Representative who shall not have 
attained to the Age of twenty f

In [9]:
print(f"Document has {len(data)} pages.")
print(f"There are {len(data[0].metadata)} metadata fields on the first page.")

Document has 19 pages.
There are 13 metadata fields on the first page.
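Beyond counting pages and metadata fields, it can help to see how much text each page actually contains, since scanned or mostly blank pages yield little usable context. The sketch below assumes only that each loaded item exposes `page_content` and `metadata`, as the cells above use; `Page` is a hypothetical stand-in so the example runs without LangChain, and `summarize` is an illustrative helper.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in exposing the two attributes used in this notebook."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize(pages):
    """Return one summary dict per page: index, character count, metadata keys."""
    return [
        {
            "index": i,
            "chars": len(page.page_content),
            "metadata_keys": sorted(page.metadata),
        }
        for i, page in enumerate(pages)
    ]


# Made-up sample data; with the real loader, pass `data` instead.
sample = [
    Page("We the People", {"source": "files/constitution.pdf", "page": 0}),
    Page(""),
]
for row in summarize(sample):
    print(row)
```

Because `summarize` relies only on those two attributes, the list returned by `load_document` should work in place of `sample`.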


In [12]:
data_docx = load_document("files/Sam_Villasmith_Resume_2025 _Dev.docx")
print(data_docx[0].page_content)  # Docx2txtLoader returns the whole file as a single Document

Loading document from files/Sam_Villasmith_Resume_2025 _Dev.docx
SAMUEL VILLA-SMITH, MBA

Senior Software Engineer

📧 svillasmith2@gmail.com | 📱 (806) 440-2215 | 🏠 Fritch, TX

🔗 https://www.linkedin.com/in/samuel-villa-smith-mbaa803a0109  | 🌐 https://github.com/samvillasmith | 



PROFESSIONAL SUMMARY

Experienced Senior Software Engineer with strong background in secure cloud-native applications and full-stack development. Data-driven PhD student in Information Technology with expertise in AI, Machine Learning, and Natural Language Processing (NLP). Combines technical expertise with business acumen to architect and develop robust, security-first web and mobile solutions. AWS Solutions Architect certified with proven experience in implementing defensive security measures and optimizing application performance.





TECHNICAL SKILLS

Development: React, TypeScript, Next.js, Node.js, Tailwind CSS, Shadcn UI, T3 Stack, Full- Stack Development

Data & AI: Advanced Analytics, Data Visualiza