In [1]:
import preprocessing3, cosine3

from pdfstructure.hierarchy.parser import HierarchyParser
from pdfstructure.source import FileSource
from pdfstructure.printer import JsonFilePrinter
import json
import pathlib

import numpy as np
import re

In [2]:
def text_on_page(dict_var, id_json, list_res, page):
    """Recursively collect the "text" of every dict whose `id_json` value equals `page`."""
    if isinstance(dict_var, dict):
        for k, v in dict_var.items():
            if k == id_json and v == page:
                # This dict belongs to the requested page, so keep its text.
                list_res.append(dict_var["text"])
            elif isinstance(v, dict):
                text_on_page(v, id_json, list_res, page)
            elif isinstance(v, list):
                for item in v:
                    text_on_page(item, id_json, list_res, page)
    return list_res


def get_page(data, page):
    lines = []
    for chunk in data["elements"]:
        lines.extend(text_on_page(chunk, "page", [], page))             
    return lines
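
To make the recursive walk concrete, here is a small self-contained sketch that runs the two helpers on a made-up nested structure. Only the keys "elements", "page" and "text" come from the code above; the "heading"/"children" layout and the sample strings are assumptions for illustration and may not match the JSON that pdfstructure actually writes.

```python
def text_on_page(dict_var, id_json, list_res, page):
    # Same logic as the helper above, repeated so this example runs on its own.
    if isinstance(dict_var, dict):
        for k, v in dict_var.items():
            if k == id_json and v == page:
                list_res.append(dict_var["text"])
            elif isinstance(v, dict):
                text_on_page(v, id_json, list_res, page)
            elif isinstance(v, list):
                for item in v:
                    text_on_page(item, id_json, list_res, page)
    return list_res


def get_page(data, page):
    lines = []
    for chunk in data["elements"]:
        lines.extend(text_on_page(chunk, "page", [], page))
    return lines


# Hypothetical input: the nesting is invented for this example.
sample = {
    "elements": [
        {
            "heading": {"text": "CHAPTER 2", "page": 0},
            "children": [
                {"heading": {"text": "First paragraph", "page": 0}, "children": []},
                {"heading": {"text": "Second paragraph", "page": 1}, "children": []},
            ],
        }
    ]
}

print(get_page(sample, 0))  # ['CHAPTER 2', 'First paragraph']
print(get_page(sample, 1))  # ['Second paragraph']
```

Text is returned in the order the walk encounters it, so a parent's text comes before the text of its children.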

In [22]:
file = 'pdfs/Nurse.pdf'
start = 22
end = 50

In [23]:
parser = HierarchyParser()
source = FileSource(file, page_numbers=list(range(start-1, end)))
document = parser.parse_pdf(source)
printer = JsonFilePrinter()
file_path = pathlib.Path('pdf.json')
printer.print(document, file_path=str(file_path.absolute()))

'c:\\Users\\james\\Documents\\Cornell\\2021SP\\CS4300\\Project\\CS4300_microGoogle\\pdf.json'

In [24]:
with open('pdf.json') as f:
    data = json.load(f)
# get_page is queried with 0-based indices; the dict keys are the PDF page numbers.
pages = {i + start: get_page(data, i) for i in range(end - start + 1)}

In [25]:
pages[22]

['02doenges-02  2/2/04  11:56 AM  Page 4',
 'CHAPTER 2',
 'Application of\nthe Nursing Process',
 'Because  of their  hectic  schedules, many  nurses  believe  that\ntime spent writing plans of care is time taken away from client\ncare. Plans  of care  have  been  viewed  as “busy  work” to  satisfy\naccreditation requirements or the whims of supervisors. In real-\nity, however, quality  client  care  must  be  planned  and  coordi-\nnated. Properly  written  and  used  plans  of care  can  provide\ndirection and continuity of care by facilitating communication\namong  nurses  and  other  caregivers. They  also  provide  guide-\nlines  for  documentation  and  a  tool  for  evaluating  the  care\nprovided.',
 'The components of a plan of care are based on the nursing\nprocess. Creating  a  plan  of care  begins  with  the  collection  of\ndata (assessment). The client database consists of subjective and\nobjective  information  encompassing  the  various  concerns\nreflected  in  the  

In [None]:
##### SVD ####

In [26]:
(formatted_docs, paragraph_page_idx) = preprocessing3.get_formatted_docs(pages)
preprocessed_docs = preprocessing3.get_preprocessed_docs(formatted_docs)
tfidf_vectorizer = cosine3.get_tfidf_vectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(list(preprocessed_docs.values())).toarray()

In [27]:
tfidf_matrix.shape

(769, 1232)
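
The helpers in cosine3 are not shown in this notebook. As a rough guide to what ranking a query against the rows of a TF-IDF matrix involves, here is a minimal textbook sketch in NumPy. The function name `cosine_rank` and the toy matrix are assumptions made for illustration; this is not the cosine3 implementation.

```python
import numpy as np


def cosine_rank(query_vec, doc_matrix):
    """Return (row indices sorted by descending cosine similarity, similarities)."""
    query_vec = np.asarray(query_vec, dtype=float)
    doc_matrix = np.asarray(doc_matrix, dtype=float)
    dots = doc_matrix @ query_vec
    denom = np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_vec)
    # Rows (or a query) with zero norm get similarity 0 instead of a division error.
    sims = np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)
    order = np.argsort(-sims, kind="stable")
    return order, sims


# Toy data: 3 "documents" over a 3-term vocabulary.
toy_docs = [[2, 0, 1],
            [0, 1, 0],
            [1, 1, 1]]
toy_query = [1, 0, 0]

order, sims = cosine_rank(toy_query, toy_docs)
print(order)  # row 0 ranks first, then row 2, then row 1
print(sims)
```

A normalized cosine similarity of non-negative TF-IDF vectors lies between 0 and 1. The scores printed by the cosine3 cells in this notebook are larger than 1, so that module presumably scales or combines scores differently.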

In [28]:
query = 'components for a care plan'
q = cosine3.get_query_vector(query, tfidf_vectorizer)
cos_sims = cosine3.get_cosine_sim(q, tfidf_matrix)
(rankings, scores) = cosine3.get_rankings(cos_sims)
cosine3.display_rankings(rankings, scores, formatted_docs, paragraph_page_idx)

1,   cosine score: 4.320900989649408,   page: 22
Because of their hectic schedules, many nurses believe that time spent writing plans of care is time taken away from client care. Plans of care have been viewed as “busy work” to satisfy accreditation requirements or the whims of supervisors. In reality, however, quality client care must be planned and coordinated. Properly written and used plans of care can provide direction and continuity of care by facilitating communication among nurses and other caregivers. They also provide guidelines for documentation and a tool for evaluating the care provided.


2,   cosine score: 4.267026528189695,   page: 29
The plan of care documents client care in areas of accountability, quality assurance, and liability. The nurse needs to plan care with the client, because both are accountable for that care and for achieving the desired outcomes.


3,   cosine score: 4.224903211808756,   page: 29
Healthcare providers have a responsibility for planning with

In [29]:
(U, s, Vh) = cosine3.get_svd(tfidf_matrix)
query = 'components for a care plan'
q = cosine3.get_query_vector(query, tfidf_vectorizer)
cos_sims = cosine3.get_cosine_sim_svd(q, U, s, Vh)
(rankings, scores) = cosine3.get_rankings(cos_sims)
cosine3.display_rankings(rankings, scores, formatted_docs, paragraph_page_idx)

1,   cosine score: 2.777366234433362,   page: 29
Healthcare providers have a responsibility for planning with the client and family for continuation of care to the eventual outcome of an optimal state of wellness or a dignified death. Planning, setting goals, and choosing appropriate interventions are essential to the construction of a plan of care as well as to delivery of quality nursing care. These nursing activities comprise the planning phase of the nursing process and are documented in the plan of care for a particular client. As a part of the client’s permanent record, the plan of care not only provides a means for the nurse who is actively caring for the client to be aware of the client’s needs (NDs), goals, and actions to be taken, but it also substantiates the care provided for review by third-party payors and accreditation agencies, while meeting legal requirements.


2,   cosine score: 2.7411974806238217,   page: 22
Because of their hectic schedules, many nurses believe tha
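
For reference, one common way to compare a query with documents after an SVD is to project both into the top-k latent dimensions and take the cosine there (latent semantic analysis). The sketch below shows that idea with NumPy only. It is an assumption about the general technique, not a description of what `cosine3.get_svd` or `cosine3.get_cosine_sim_svd` do; the name `lsa_cosine`, the parameter `k`, and the toy data are hypothetical.

```python
import numpy as np


def lsa_cosine(query_vec, doc_matrix, k):
    """Cosine similarity between a query and each document in a rank-k latent space."""
    doc_matrix = np.asarray(doc_matrix, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    U, s, Vh = np.linalg.svd(doc_matrix, full_matrices=False)
    docs_k = U[:, :k] * s[:k]      # document coordinates on the top-k axes
    query_k = Vh[:k] @ query_vec   # query projected onto the same axes
    dots = docs_k @ query_k
    denom = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k)
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)


# Toy data: 4 "documents" over a 3-term vocabulary.
A = np.array([[2, 0, 1],
              [0, 1, 0],
              [1, 1, 1],
              [0, 2, 1]], dtype=float)
q = np.array([1.0, 0.0, 0.0])

print(lsa_cosine(q, A, k=3))  # full rank: matches the plain cosine similarities
print(lsa_cosine(q, A, k=2))  # reduced rank: similarities in a 2-dim latent space
```

Keeping every singular value reproduces the ordinary cosine similarities, so any change in ranking comes from truncating to a smaller `k`.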