# RAG from the web - ENGLISH
This RAG pipeline takes the content of a Towards Data Science article, "3 Business Skills You Need to Progress Your Data Science Career in 2025".
It then answers several questions in English.
Finally, it runs a graphical interface that can be reached on localhost (the URL is printed when the app launches).

## 1. Code

In [None]:
# Load the "external" document
import requests
from bs4 import BeautifulSoup

web_link='https://towardsdatascience.com/3-business-skills-you-need-to-progress-your-data-science-career-in-2025-146f841d1a1e'

response = requests.get(web_link, timeout=30)
# Fail loudly on an error response so `text` is never left undefined
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
text = soup.get_text(separator="\n", strip=True)
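
As a rough illustration of what turning a page into newline-separated text involves, here is a minimal sketch that uses only the standard library. It is not how BeautifulSoup is implemented. The class `VisibleTextParser`, the helper `visible_text`, and the sample HTML are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags
        stripped = data.strip()
        if stripped and self._skip_depth == 0:
            self.parts.append(stripped)


def visible_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample_html = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><script>var x = 1;</script>"
    "<p>First paragraph.</p></body></html>"
)
print(visible_text(sample_html))
```

A real article page also contains menus, footers, and similar residue, so the extracted text may still need filtering before it is split.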



In [5]:
# Split the retrieved text
import langchain

print(langchain.__version__)

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=800,
    # chunk_overlap=200
)

splits = text_splitter.split_text(text)
print(len(splits))

0.3.11
19
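
To make the splitting step more concrete, the sketch below packs newline-separated pieces greedily into chunks of at most `chunk_size` characters. It is a simplified approximation written for this notebook, not the actual `CharacterTextSplitter` code. It ignores chunk overlap, and the function name `split_by_separator` is hypothetical.

```python
def split_by_separator(text, separator="\n", chunk_size=800):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            # The next piece would overflow the chunk, so start a new one
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "\n".join(f"line {i}" for i in range(10))
chunks = split_by_separator(sample, chunk_size=20)
print(len(chunks), repr(chunks[0]))
```

Smaller values of `chunk_size` give more, shorter chunks. This is the main knob for trading retrieval precision against the amount of context in each chunk.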


In [9]:
# Vectorize
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

vector_store = Chroma.from_texts(
    texts=splits,
    collection_name="ds_career",
    embedding=embeddings,
    persist_directory="./chroma_ds_career",
)

retriever = vector_store.as_retriever()
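
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. The toy sketch below shows that idea with hand-written three-dimensional vectors and cosine similarity. The choice of cosine similarity is an assumption made for illustration, since the metric Chroma actually applies depends on how the collection is configured. The names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_chunks, k=4):
    """Return the k chunk texts whose vectors are most similar to query_vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy "vector store": (chunk text, made-up embedding)
toy_index = [
    ("communication skills", [0.9, 0.1, 0.0]),
    ("stakeholder management", [0.7, 0.3, 0.1]),
    ("gradient descent", [0.0, 0.2, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], toy_index, k=2))
```

In the notebook itself the vectors come from the `all-mpnet-base-v2` embedding model rather than being written by hand.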

In [11]:
# Build the RAG chain with a local Ollama model
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import ChatPromptTemplate
from langchain_ollama.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

ollama_llm = "llama3.2"
model_local = ChatOllama(model=ollama_llm)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model_local
    | StrOutputParser()
)
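
In the chain above, whatever the retriever returns is placed directly into `{context}`. A common refinement is to join only the chunk texts before filling the template. The sketch below shows that step with plain Python. `Doc` is a hypothetical stand-in for a retrieved document exposing `page_content`, and `format_docs` and `build_prompt` are illustrative helper names, not part of the notebook.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with a page_content field."""
    page_content: str


TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def format_docs(docs):
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs, question):
    """Fill the template with the joined chunk texts and the user question."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


docs = [
    Doc("Communication is a core business skill."),
    Doc("Negotiation helps secure funding."),
]
prompt_text = build_prompt(docs, "Which skills matter?")
print(prompt_text)
```

With LangChain, a helper like this could presumably be wired in as `retriever | format_docs` under the `"context"` key, so the model sees clean text rather than the string form of a list of document objects.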


## 2. Questions

In [13]:
chain.invoke("What would you do if you would like to progress in your data science career?")

'Based on the provided context, Dr. Varshita Sher mentions that "If you have been a data scientist for a while, sooner or later you’ll notice that your day-to-day has shifted from a VSCode-loving, research paper-reading, git-version-committing data scientist to a collaboration-driving, project-scoping, stakeholder-managing, and strategy-setting individual."\n\nIt can be inferred that to progress in a data science career, one needs to acquire "Business Skills" such as collaboration, project management, stakeholder management, and strategy setting.'

In [14]:
chain.invoke("Would you insult your boss if you wanted to improve in your data science career?")

"Based on the provided context, it does not appear that insulting one's boss would be a recommended approach for improving their data science career. The articles provided seem to focus on more general and professional advice, such as learning business skills like collaboration, project-scoping, stakeholder-managing, and strategy-setting, in order to progress in a data science career."

In [15]:
chain.invoke("Is it a good idea to be a good a communication?")

'Yes, according to the context, being a good communicator is considered a crucial skill for data scientists who take on leadership roles. It can help them effectively manage teams, pitch their products, communicate insights, motivate their team members, and even negotiate better pay and secure funding opportunities. With practice, achieving any skill, including communication skills, is possible.'

## 3. Building the GUI

In [19]:
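def search_chroma(query, top_k):
    """Answer `query` using the top_k most similar chunks as context."""
    try:
        # The slider value may arrive as a float, so cast it before using it as k
        results = vector_store.similarity_search(query, k=int(top_k))
        context = "\n\n".join(doc.page_content for doc in results)
        # Reuse the prompt and model so the slider actually controls retrieval
        answer_chain = prompt | model_local | StrOutputParser()
        return answer_chain.invoke({"context": context, "question": query})
    except Exception as e:
        return f"Error: {e}"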

In [20]:
import gradio as gr
with gr.Blocks() as demo:
    gr.Markdown("### Chroma Database Search")
    
    with gr.Row():
        query_input = gr.Textbox(label="Enter Your Query", placeholder="Type your question here...")
        top_k_input = gr.Slider(1, 10, step=1, value=5, label="Number of Results")

    search_button = gr.Button("Search")
    output_box = gr.Textbox(label="Search Results", lines=15)

    # Bind the function to the Gradio UI
    search_button.click(fn=search_chroma, inputs=[query_input, top_k_input], outputs=output_box)

# Launch the App
demo.launch()

* Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.


