# Set-up

In [4]:
import os
import base64
from langchain.document_loaders import PDFMinerLoader
from langchain.indexes import VectorstoreIndexCreator
from impersonate_gpt.utils.langchain_utils import get_actual_prompt
from langchain.callbacks import get_openai_callback
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# LangChain creation

In [5]:
# OpenAI API key, base64-encoded and saved as a single-line file in the data folder
with open("../data/secrets.txt") as secrets:
    open_ai_key = secrets.readline()
    open_ai_key = base64.b64decode(open_ai_key).decode('ascii')
os.environ["OPENAI_API_KEY"] = open_ai_key
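The cell above expects `secrets.txt` to hold the base64-encoded key on one line. A minimal sketch of producing such a file and reading it back the same way, assuming a hypothetical placeholder key and a temporary path. Note that base64 only obscures the key; it is not encryption.

```python
import base64
import os
import tempfile

# Hypothetical placeholder key, not a real credential
fake_key = "sk-example-0000"

# Write the key base64-encoded on a single line, as the notebook expects
path = os.path.join(tempfile.mkdtemp(), "secrets.txt")
with open(path, "w") as f:
    f.write(base64.b64encode(fake_key.encode("ascii")).decode("ascii") + "\n")

# Read it back like the notebook does; strip() drops the trailing newline
with open(path) as secrets:
    decoded = base64.b64decode(secrets.readline().strip()).decode("ascii")

print(decoded == fake_key)  # True
```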

In [6]:
# Place your CV in the data folder and load it here
loader = PDFMinerLoader('../data/my_cv.pdf')

In [7]:
# Creating the CV vector database for document indexing
index_creator = VectorstoreIndexCreator()
docsearch = index_creator.from_loaders([loader])

Using embedded DuckDB without persistence: data will be transient
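Before embedding, the index creator splits the CV into chunks. To our understanding, in LangChain versions of this period `VectorstoreIndexCreator` defaults to a `RecursiveCharacterTextSplitter` with chunks of about 1,000 characters, OpenAI embeddings, and a Chroma store (hence the DuckDB message). The sketch below is a simplified stand-in, not LangChain's splitter: it greedily packs paragraphs into chunks up to `chunk_size` characters, with no overlap and no recursive fallback. The function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int = 1000) -> list[str]:
    """Greedy paragraph packing: a simplified stand-in for
    RecursiveCharacterTextSplitter (no overlap, no recursive fallback)."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # Oversized paragraphs are kept whole in this simplified version
            current = para
    if current:
        chunks.append(current)
    return chunks


text = "\n\n".join(["a" * 400, "b" * 400, "c" * 400])
chunks = split_into_chunks(text)
print([len(c) for c in chunks])  # [802, 400]
```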


In [8]:
# Creating the chain with the default OpenAI model for querying
chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=docsearch.vectorstore.as_retriever(),
    input_key="question",
    verbose=True,
    return_source_documents=True,
)
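With `chain_type="stuff"`, every retrieved chunk is concatenated into one context block and inserted into a single prompt. A minimal pure-Python sketch of that step, assuming LangChain's default RetrievalQA prompt wording and a `"\n\n"` separator between chunks; the names `PROMPT_TEMPLATE` and `stuff_prompt` are hypothetical.

```python
# Assumed default RetrievalQA "stuff" prompt wording
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"
)


def stuff_prompt(docs: list[str], question: str) -> str:
    # "Stuff" strategy: join all retrieved chunks into one context string
    context = "\n\n".join(docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)


print(stuff_prompt(["Aitor Mira Abad", "Tech stack: Python, SQL"], "Who is Aitor Mira Abad?"))
```

The design trade-off is simplicity versus context length: one LLM call sees everything, but all chunks must fit in the model's context window.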

# Playing with the chain

In [9]:
# Simple question
question = "Who is Aitor Mira Abad?"
response = chain({"question": question})
print(response["result"])



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m
 Aitor Mira Abad is a passionate artificial intelligence enthusiast with a strong background in data science, specifically in solving complex problems with data. He has several years of professional experience in the field, working with a variety of technologies and tools such as Python, JavaScript, HTML/CSS, Docker, Google Analytics, SQL, Flask, Pandas, Numpy, Scikit-learn, D3, Google Data Studio, Kibana, Visual Basic, Bash, TensorFlow, Qlik, and PyTorch. He has a Bachelor's degree in Civil Engineering and a Master's degree in Transportation Management and Mobility Behavior Analysis.


In [10]:
# Retrieving the actual prompt sent to the OpenAI LLM, which is useful for understanding how LangChain works.
question = "Who is Aitor Mira Abad?"
actual_prompt = get_actual_prompt(chain, question)
print(actual_prompt)

Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Aitor Mira Abad
Solving complex problems with data

About me
As a passionate artificial intelligence enthusiast, I embrace new technology and continuously engage in side
projects and hands-on testing to expand my knowledge and skills. My strong analytical skills and ability to think
critically make me particularly drawn to projects that involve high levels of uncertainty and the need for
comprehensive problem-solving.

Contact

+ info

+34 630422029

Detailed CV

Web Portfolio

Check following pages

https://itortouch.github.io

itortouch@gmail.com

Remote (Valencia, Spain)

Aitor Mira Abad

@iTortouch

Tech stack

Career

Python

SQL

NoSQL

Unix

Bash

Git

Docker Airflow

Pandas

Numpy

Scikit learn

Tensorflow Pytorch

Huggingface

Senior Data
Scientist
Oct 2021 - today
(1 year and 5 months)

Senior Data Scientist develo

In [11]:
# Getting the retrieved docs for the query (_get_docs is a private RetrievalQA method and may change between LangChain versions)
chain._get_docs(question)

[Document(page_content='Aitor Mira Abad\nSolving complex problems with data\n\nAbout me\nAs a passionate artificial intelligence enthusiast, I embrace new technology and continuously engage in side\nprojects and hands-on testing to expand my knowledge and skills. My strong analytical skills and ability to think\ncritically make me particularly drawn to projects that involve high levels of uncertainty and the need for\ncomprehensive problem-solving.\n\nContact\n\n+ info\n\n+34 630422029\n\nDetailed CV\n\nWeb Portfolio\n\nCheck following pages\n\nhttps://itortouch.github.io\n\nitortouch@gmail.com\n\nRemote (Valencia, Spain)\n\nAitor Mira Abad\n\n@iTortouch\n\nTech stack\n\nCareer\n\nPython\n\nSQL\n\nNoSQL\n\nUnix\n\nBash\n\nGit\n\nDocker Airflow\n\nPandas\n\nNumpy\n\nScikit learn\n\nTensorflow Pytorch\n\nHuggingface\n\nSenior Data\nScientist\nOct 2021 - today\n(1 year and 5 months)\n\nSenior Data Scientist developing core\nalgorithms and researching innovative\nsolutions for the retail i
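The retriever returns the chunks whose embeddings are most similar to the question's embedding; to our understanding, Chroma's default is the top k=4. The sketch below illustrates top-k cosine-similarity retrieval with a toy bag-of-words "embedding" swapped in for OpenAI embeddings. The helpers `embed`, `cosine`, and `top_k` and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector: a stand-in for real OpenAI embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 4) -> list[str]:
    # Rank chunks by similarity to the question and keep the best k
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "Aitor Mira Abad. Solving complex problems with data.",
    "Tech stack: Python, SQL, Docker, Airflow.",
    "Master's degree in transportation management.",
]
print(top_k("Who is Aitor Mira Abad?", chunks, k=1))
```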

In [12]:
# A more elaborate question, using the OpenAI callback to report token usage and cost
with get_openai_callback() as cb:
    query = "Do you think Aitor is best suited for a ML Engineer or a Data Scientist position?. Make some bullet points with pros and cons for each position."
    response = chain({"question": query})
    print(response["result"])
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Successful Requests: {cb.successful_requests}")
    print(f"Total Cost (USD): ${cb.total_cost}")




[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m
 
ML Engineer:
Pros:
-Strong experience with machine learning libraries like Pytorch, TensorFlow, HuggingFace, etc.
-Experience with deep learning libraries like Scikit-learn, XGBoost, lightGBM, etc.
-Experience with technologies like AWS, Docker, GIT, etc.
Cons:
-Lack of experience with web-based technologies like HTML, CSS, and JavaScript

Data Scientist:
Pros:
-Strong experience with data manipulation libraries like Pandas, Numpy, etc.
-Experience with machine learning libraries like Scikit-learn, XGBoost, lightGBM, etc.
-Experience with web-based technologies like HTML, CSS, and JavaScript
Cons:
-Lack of experience with deep learning libraries like Pytorch, TensorFlow, HuggingFace, etc.
-Lack of experience with technologies like AWS, Docker, GIT, etc.
Total Tokens: 1388
Prompt Tokens: 1187
Completion Tokens: 201
Successful Requests: 1
Total Cost (USD): $0.02776
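The reported figures are internally consistent: prompt plus completion tokens give the total, and the cost matches a flat price of $0.02 per 1,000 tokens. That rate is an assumption here, consistent with the `text-davinci-003` pricing of the time, which was the default model for `OpenAI()` in LangChain releases of this period.

```python
prompt_tokens = 1187
completion_tokens = 201
total_tokens = prompt_tokens + completion_tokens  # 1388, as reported above

price_per_1k = 0.02  # assumed USD price per 1K tokens for the default model
cost = total_tokens / 1000 * price_per_1k

print(total_tokens, f"${cost:.5f}")  # 1388 $0.02776
```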
