## Resume Screening App - Report {-}

---

### 1. Components: {-}

app.py : the entry-point script that is run to launch the app

utils.py : the helper functions that implement the app's logic

### 2. utils.py : {-}

#### 2.1 Extracting data from a single PDF : {-}

- Uses pypdf to extract the text page by page, concatenates it into a single string (text), and returns it

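In [None]:
from pypdf import PdfReader

def get_pdf_text(pdf_doc):
    """Return the text of every page of pdf_doc as one string."""
    text = ""
    pdf_reader = PdfReader(pdf_doc)
    for page in pdf_reader.pages:
        # Fall back to "" in case a page yields no text
        text += page.extract_text() or ""
    return text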
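
The page-by-page concatenation can be exercised without a real PDF. The sketch below is illustrative only: `FakePage` and `join_page_text` are hypothetical stand-ins, not part of pypdf or of utils.py. It treats a page that yields no text (as may happen with scanned, image-only pages) as an empty string, so the concatenation does not fail on a missing value.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakePage:
    """Hypothetical stand-in for a PDF page object exposing extract_text()."""
    content: Optional[str]

    def extract_text(self) -> Optional[str]:
        return self.content


def join_page_text(pages: Iterable[FakePage]) -> str:
    """Concatenate the text of every page, treating missing text as empty."""
    return "".join(page.extract_text() or "" for page in pages)


# Three fake pages; the middle one has no extractable text.
pages = [FakePage("Jane Doe\n"), FakePage(None), FakePage("Python, SQL")]
resume_text = join_page_text(pages)
```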

#### 2.2 Extracting data from all uploaded pdf files : {-}

- The user uploads multiple PDFs. To process them, the create_docs function works as follows:

1. Receives the list of uploaded documents and a unique id
2. Instantiates a list <b>docs</b> to store all information
3. Iterates over all documents
4. For each document:
> 4.1 Applies the get_pdf_text function to extract all text data from the PDF <br>
> 4.2 Creates a LangChain <b>Document</b> object whose content is the result of 4.1 and whose <u>metadata</u> fields (name, id, type, size, unique_id) are populated from the uploaded file and the unique id passed in <br>
> 4.3 Appends the Document object to the <b>docs</b> list <br><br>

5. Returns the <b>docs</b> list, which holds the data of all uploaded PDFs

In [None]:
from langchain.schema import Document

def create_docs(user_pdf_list, unique_id):
    docs = []
    for uploaded_file in user_pdf_list:
        # Full text of the PDF (not split into chunks)
        text = get_pdf_text(uploaded_file)

        # Add the text and its metadata to the list
        docs.append(Document(
            page_content=text,
            metadata={
                "name": uploaded_file.name,
                "id": uploaded_file.file_id,
                "type": uploaded_file.type,
                "size": uploaded_file.size,
                "unique_id": unique_id,
            },
        ))

    return docs
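
To make the shape of each entry in the list concrete, here is a minimal sketch that mirrors the steps above using only the standard library. `SimpleDocument` is a hypothetical stand-in for the LangChain Document class, `build_docs` is a hypothetical name, and the `SimpleNamespace` objects imitate uploaded files with the attributes the function reads. The text extractor is passed in as a parameter so the example runs without a PDF.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Callable, Dict, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document (text plus metadata)."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def build_docs(files: List[SimpleNamespace], unique_id: str,
               extract: Callable[[SimpleNamespace], str]) -> List[SimpleDocument]:
    """Create one document per uploaded file, tagged with unique_id."""
    docs = []
    for f in files:
        docs.append(SimpleDocument(
            page_content=extract(f),
            metadata={"name": f.name, "type": f.type,
                      "size": f.size, "unique_id": unique_id},
        ))
    return docs


# Fake uploads: 'body' holds the text a real extractor would read from the PDF.
uploads = [
    SimpleNamespace(name="a.pdf", type="application/pdf", size=3, body="abc"),
    SimpleNamespace(name="b.pdf", type="application/pdf", size=2, body="de"),
]
docs = build_docs(uploads, "session-1", extract=lambda f: f.body)
```

Every document carries the same unique_id, which is what later allows a search to be restricted to the files of one upload.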

#### 2.3 Instantiating embeddings model object : {-}

- The downloaded model (all-MiniLM-L6-v2) converts the given text data into embeddings, i.e. numeric vectors.

In [None]:
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

def create_embeddings_load_data():
    # Alternative: embeddings = OpenAIEmbeddings()
    embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
    return embeddings
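
For readers unfamiliar with embeddings, the toy example below shows the interface such a model object exposes: text goes in, a fixed-length list of floats comes out. `ToyHashEmbeddings` is a hypothetical illustration that hashes words into buckets; it is not how all-MiniLM-L6-v2 works and its vectors carry no semantic meaning. The method names `embed_documents` and `embed_query` follow LangChain's embeddings interface.

```python
import hashlib
import math
from typing import List


class ToyHashEmbeddings:
    """Hypothetical toy embedder: hashed bag-of-words, L2-normalised."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def embed_query(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            # A stable hash, so the same word always lands in the same bucket.
            digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(t) for t in texts]


toy = ToyHashEmbeddings(dim=16)
vectors = toy.embed_documents(["python developer", "data analyst with sql"])
```

Each input string yields one vector of length 16 here, regardless of how long the text is.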

#### 2.4 Generating vectors of data and uploading to Pinecone : {-}

- Applies the embeddings model to the documents to generate their embedding vectors
- Stores the embedding vectors in the given Pinecone index

In [None]:
import pinecone
from langchain.vectorstores import Pinecone

def push_to_pinecone(pinecone_apikey, pinecone_environment, pinecone_index_name, embeddings, docs):
    pinecone.init(
        api_key=pinecone_apikey,
        environment=pinecone_environment,
    )

    # Embed the documents and store the vectors in the index
    Pinecone.from_documents(docs, embeddings, index_name=pinecone_index_name)

#### 2.5 Downloading the vectors from Pinecone : {-}

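- In <b>2.4</b>, we uploaded the generated vectors to an index in Pinecone.
- Here, after a 20-second pause (presumably to give Pinecone time to finish indexing the upload), we connect to that index using the <b>.from_existing_index()</b> function. This does not copy the vectors locally; it returns a vector store object that queries the index on demand.
- The function returns this object.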

In [None]:
import time

import pinecone
from langchain.vectorstores import Pinecone

def pull_from_pinecone(pinecone_apikey, pinecone_environment, pinecone_index_name, embeddings):
    # Pause before connecting to the index
    print("20secs delay...")
    time.sleep(20)
    pinecone.init(
        api_key=pinecone_apikey,
        environment=pinecone_environment,
    )

    # Vector store object bound to the existing index
    index = Pinecone.from_existing_index(pinecone_index_name, embeddings)
    return index
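
The fixed 20-second pause is simple but always costs the full delay. One possible alternative, sketched below, polls a readiness check and returns as soon as it succeeds. The helper `wait_until` and the `is_ready` callable are hypothetical; what "ready" means for a Pinecone index is an assumption that would have to be defined and checked against the client version in use.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool], timeout: float = 20.0,
               interval: float = 1.0) -> bool:
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated readiness check that succeeds on the third call.
calls = {"n": 0}


def fake_ready() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


ready = wait_until(fake_ready, timeout=2.0, interval=0.01)
```

With a timeout of 20 seconds the worst case matches the current behaviour, while the common case returns earlier.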

#### 2.6 Extracting relevant documents from Pinecone : {-}

- The idea behind selecting resumes is to find the ones that best match the job description (JD).
- The relevant documents are those whose embeddings are most similar to the embedding of the JD text.
- <u>Using pull_from_pinecone() defined in 2.5</u>, we obtain a vector store object connected to the Pinecone index.
- The .similarity_search_with_score() function returns the k documents most similar to the JD, each with its similarity score, restricted to documents whose unique_id metadata matches the given value.
- The function returns this list.

In [None]:
def similar_docs(query, k, pinecone_apikey, pinecone_environment, pinecone_index_name, embeddings, unique_id):
    # pull_from_pinecone() already calls pinecone.init(), so no separate init is needed
    index = pull_from_pinecone(pinecone_apikey, pinecone_environment, pinecone_index_name, embeddings)

    # Top-k matches for the query, limited to documents tagged with unique_id
    results = index.similarity_search_with_score(
        query, k=int(k), filter={"unique_id": unique_id}
    )
    return results
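
The ranking step can be pictured with a small in-memory example: keep only the entries whose metadata matches the filter, score each against the query vector, and return the top k with their scores. This is a sketch under assumptions, not Pinecone's implementation. `top_k_with_scores` and the sample records are hypothetical, and cosine similarity is assumed as the metric (the actual metric depends on how the index was created).

```python
import math
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, Sequence[float], Dict[str, str]]  # (name, vector, metadata)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_with_scores(query: Sequence[float], records: List[Record], k: int,
                      metadata_filter: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return the k best-matching record names with their scores."""
    matches = [
        (name, cosine(query, vec))
        for name, vec, meta in records
        if all(meta.get(key) == value for key, value in metadata_filter.items())
    ]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:k]


records: List[Record] = [
    ("resume_a.pdf", [1.0, 0.0], {"unique_id": "s1"}),
    ("resume_b.pdf", [0.6, 0.8], {"unique_id": "s1"}),
    ("resume_c.pdf", [1.0, 0.0], {"unique_id": "s2"}),  # belongs to another upload
]
results = top_k_with_scores([1.0, 0.0], records, k=2,
                            metadata_filter={"unique_id": "s1"})
```

Although resume_c.pdf matches the query vector exactly, it is excluded because its unique_id differs from the one in the filter.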

#### 2.7 Summarise the contents of a document : {-}

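In [None]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain

def get_summary(current_doc):
    """Summarise one Document with an LLM and return the summary text."""
    llm = OpenAI(temperature=0)
    # Alternative: llm = HuggingFaceHub(repo_id="bigscience/bloom", model_kwargs={"temperature": 1e-10})
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    summary = chain.run([current_doc])

    return summary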
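
The map_reduce chain type summarises in two stages: each chunk is summarised on its own (map), then the partial summaries are joined and summarised once more (reduce). The sketch below shows only that control flow. `map_reduce_summary` is a hypothetical helper, and `first_four_words` is a deliberately crude stand-in for the LLM call, so the resulting text is not a meaningful summary.

```python
from typing import Callable, List


def map_reduce_summary(chunks: List[str], summarise: Callable[[str], str]) -> str:
    """Summarise each chunk (map), then summarise the joined partials (reduce)."""
    partial_summaries = [summarise(chunk) for chunk in chunks]  # map step
    return summarise(" ".join(partial_summaries))               # reduce step


seen: List[str] = []  # records every text handed to the stand-in "model"


def first_four_words(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first four words."""
    seen.append(text)
    return " ".join(text.split()[:4])


chunks = [
    "Five years of Python development and testing.",
    "Led a team of four data engineers.",
]
summary = map_reduce_summary(chunks, first_four_words)
```

The stand-in is called three times here: once per chunk, and once more on the joined partial summaries.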

---