# Python/Go/Java Project Requirements

## Project 1
## Read a PDF file from a folder. Refer to the PDF file “Chemistry Questions.pdf”
### Requirements
1. Store a PDF file in a folder called “/content”
2. Read the PDF file from the folder
3. Write the content to a text file called “output.txt”
4. Store this file under the “/content” folder
### Error Handling
1. Handle the case where the folder is not available
2. Handle the case where the PDF file is not present in the content folder
3. Handle the case where the output.txt file is not available
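
The three cases above can be checked up front, before any PDF parsing starts. The following is a minimal sketch using only the standard library. The helper name `validate_inputs` and its `output_name` parameter are hypothetical and not part of the project code. It assumes that a missing `output.txt` is not an error, because opening the file in write mode creates it.

```python
from pathlib import Path


def validate_inputs(folder, pdf_name, output_name="output.txt"):
    """Return (pdf_path, output_path), or raise FileNotFoundError.

    Hypothetical helper: the name and signature are illustrative only.
    """
    folder_path = Path(folder)
    # Case 1: the folder itself is missing
    if not folder_path.is_dir():
        raise FileNotFoundError(f"Folder not found: {folder_path}")

    # Case 2: the PDF is missing from the folder
    pdf_path = folder_path / pdf_name
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF file not found: {pdf_path}")

    # Case 3: output.txt may not exist yet; that is fine, because
    # open(output_path, "w") creates it on first write
    return pdf_path, folder_path / output_name
```

A caller could wrap the call in `try`/`except FileNotFoundError` and print the message, which keeps all the checks in one place.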

In [3]:
pip install PyPDF2 pandas

Note: you may need to restart the kernel to use updated packages.


In [4]:
import os
import PyPDF2

In [6]:
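def read_pdf_from_folder(folder_path, filename):
    """Return the text of every page in the PDF, or None if it cannot be found."""
    # Error case 1: the folder itself is missing
    if not os.path.isdir(folder_path):
        print(f"Folder not found: {folder_path}")
        return None

    # Error case 2: the PDF is missing from the folder
    file_path = os.path.join(folder_path, filename)
    if not os.path.isfile(file_path):
        print(f"File not found: {file_path}")
        return None

    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ""
        # Number pages from 1 so the markers match what a reader sees
        for page_num, page in enumerate(reader.pages, start=1):
            text += f"\n--- Page {page_num} ---\n"
            text += page.extract_text() or "[No text found]"
        return text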

In [8]:
def write_to_text_file(text, output_file, output_folder):
    """Write text to output_folder/output_file, creating both if they are missing."""
    output_path = os.path.join(output_folder, output_file)
    try:
        # makedirs creates a missing folder; mode 'w' creates a missing output file
        os.makedirs(output_folder, exist_ok=True)
        with open(output_path, 'w', encoding='utf-8') as file:
            file.write(text)
    except OSError as error:
        print(f"Failed !!! Could not write {output_path}: {error}")
        return
    print(f"Success !!! Content written to {output_path}")

In [10]:
folder = "/Users/naganatarajan/Desktop/GEN_AI_Tasks/my_python_tasks/content/"  # Change this to your folder path
pdf_file = "Chemistry Questions.pdf"  # Change this to your PDF file name
output_file = "output.txt"
content = read_pdf_from_folder(folder, pdf_file)
if content:
    write_to_text_file(content, output_file, folder)

Success !!! Content written to /Users/naganatarajan/Desktop/GEN_AI_Tasks/my_python_tasks/content/output.txt


## Project 2
## Traverse a folder tree and filter PDF files
### Requirements
1. Add sub-folders called “One”, “Two”, “Three” under the folder called “/content”
2. Add PDF files under each of the sub-folders
3. Load all PDF files under the sub-folders and read their content
4. Write the content to a text file called “output.txt” under each sub-folder respectively
### Error Handling
1. Handle the case where the folder is not available
2. Handle the case where the PDF file is not present in a sub-folder
3. Handle the case where the output.txt file is not available in a sub-folder
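
Requirement 3 asks for all PDF files in each sub-folder, so it can help to discover them by extension instead of relying on one fixed file name. The following is a minimal sketch using only the standard library. The helper name `find_pdfs_by_subfolder` is hypothetical, and it assumes that a sub-folder with no PDFs should appear in the result with an empty list, so the caller can report it.

```python
from pathlib import Path


def find_pdfs_by_subfolder(main_folder):
    """Map each sub-folder (at any depth) to the sorted PDFs directly inside it.

    Hypothetical helper; raises FileNotFoundError if main_folder is missing.
    """
    main_path = Path(main_folder)
    if not main_path.is_dir():
        raise FileNotFoundError(f"Folder not found: {main_path}")

    pdfs_by_folder = {}
    for sub in sorted(p for p in main_path.rglob("*") if p.is_dir()):
        # Compare the extension case-insensitively so "B.PDF" is included
        pdfs_by_folder[sub] = sorted(
            f for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() == ".pdf"
        )
    return pdfs_by_folder
```

An empty list for a sub-folder corresponds to error case 2 (no PDF file present), which the caller can log and skip before writing that sub-folder's `output.txt`.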


In [27]:
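def traverse_subFolder(main_folder):
    """Return the full path of every sub-folder under main_folder, at any depth."""
    subfolders = []
    for root, dirs, _files in os.walk(main_folder):
        for dir_name in dirs:  # avoid shadowing the built-in dir()
            subfolders.append(os.path.join(root, dir_name))
    return subfolders


if not os.path.isdir(folder):
    print(f"Folder not found: {folder}")
else:
    subfolders = traverse_subFolder(folder)
    print("Subfolders found under", folder)
    for sub in subfolders:
        # read_pdf_from_folder reports a missing PDF and returns None
        content = read_pdf_from_folder(sub, pdf_file)
        if content:
            write_to_text_file(content, output_file, sub)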


Subfolders found under /Users/naganatarajan/Desktop/GEN_AI_Tasks/my_python_tasks/content/
Success !!! Content written to /Users/naganatarajan/Desktop/GEN_AI_Tasks/my_python_tasks/content/One/output.txt
Success !!! Content written to /Users/naganatarajan/Desktop/GEN_AI_Tasks/my_python_tasks/content/One/Two/output.txt
Success !!! Content written to /Users/naganatarajan/Desktop/GEN_AI_Tasks/my_python_tasks/content/One/Two/Three/output.txt


## Project 3
## Read content from a particular page
### Requirements
1. Extend Project 1 to change how the content is read
2. Take a page number as input from the command prompt
3. Read the content of the given page and write it to the output file
### Error Handling
1. Handle the case where the folder is not available
2. Handle the case where the PDF file is not present in the content folder
3. Handle the case where the output.txt file is not available
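
Input typed at a prompt arrives as text, so it needs to be converted and range-checked before it is used to pick a page. The following is a minimal sketch of that step. The helper name `parse_page_number` is hypothetical, and it assumes that users type 1-based page numbers while the page list is indexed from zero.

```python
def parse_page_number(raw, total_pages):
    """Convert 1-based page text from the prompt into a zero-based index.

    Hypothetical helper; raises ValueError for non-numeric or out-of-range input.
    """
    try:
        page = int(raw.strip())
    except ValueError:
        raise ValueError(f"Page number must be a whole number, got {raw!r}") from None
    if not 1 <= page <= total_pages:
        raise ValueError(f"Page {page} is out of range (1-{total_pages})")
    # Page 1 is index 0 in a zero-based page list
    return page - 1
```

In a notebook, this could be called as `parse_page_number(input("Page number: "), len(reader.pages))`, where `reader` is a `PyPDF2.PdfReader` that is already open.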


In [47]:
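def read_pdf_from_folder_based_page_number(folder_path, filename, pagenumber):
    """Return the text of one page (pagenumber is 1-based), or None on any error."""
    if not os.path.isdir(folder_path):
        print(f"Folder not found: {folder_path}")
        return None

    file_path = os.path.join(folder_path, filename)
    if not os.path.isfile(file_path):
        print(f"File not found: {file_path}")
        return None

    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        total_pages = len(reader.pages)
        if not 1 <= pagenumber <= total_pages:
            print(f"Page {pagenumber} is out of range (1-{total_pages})")
            return None
        # reader.pages is zero-based, so page 1 is index 0
        page = reader.pages[pagenumber - 1]
        text = f"\n--- Page {pagenumber} ---\n"
        text += page.extract_text() or "[No text found]"
        return text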

In [69]:
folder = "/Users/naganatarajan/Desktop/GEN_AI_Tasks/my_python_tasks/content/"  # Change this to your folder path
pdf_file = "Chemistry Questions.pdf"  # Change this to your PDF file name
output_file = "output.txt"
pagenumber = 1  # 1-based page number to extract
content = read_pdf_from_folder_based_page_number(folder, pdf_file, pagenumber)
if content:
    write_to_text_file(content, output_file, folder)
else:
    print(f"Failed !!! No content found for page {pagenumber}")

Success !!! Content written to /Users/naganatarajan/Desktop/GEN_AI_Tasks/my_python_tasks/content/output.txt
