# RAG from the web - ENGLISH
This RAG pipeline ingests the content of the Towards Data Science article "3 Business Skills You Need to Progress Your Data Science Career in 2025".
It then answers several questions in English.
Finally, it launches a graphical interface that can be reached on localhost (the run at the end reports http://127.0.0.1:7860).

## 1. Code

In [13]:
# Load the external document
import requests
from bs4 import BeautifulSoup

web_link='https://towardsdatascience.com/3-business-skills-you-need-to-progress-your-data-science-career-in-2025-146f841d1a1e'

response = requests.get(web_link, timeout=30)
# Fail loudly on a non-200 response instead of leaving `text` undefined
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
text = soup.get_text(separator="\n", strip=True)
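
`get_text` keeps every visible string on the page, so navigation and author-bio text land in `text` next to the article body. That may be why some of the answers further down talk about the author's profile. The block below is a minimal sketch of one way to keep only paragraph text. It uses the standard library's `html.parser` rather than BeautifulSoup, and the names `ParagraphExtractor`, `extract_paragraphs` and the sample HTML are hypothetical, not part of the original notebook.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []      # text fragments of the paragraph being read
        self.paragraphs = []   # completed paragraphs

    def _flush(self):
        # Normalize whitespace and store the paragraph if it is not empty
        paragraph = " ".join("".join(self._buffer).split())
        if paragraph:
            self.paragraphs.append(paragraph)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing paragraph whose </p> is missing
    return "\n".join(parser.paragraphs)


# Hypothetical page: only the two <p> elements should survive
sample_html = (
    "<html><body><nav>Sign in</nav>"
    "<p>Communication is a <b>business</b> skill.</p>"
    "<footer>About</footer><p>Practice helps.</p></body></html>"
)
print(extract_paragraphs(sample_html))
```

Whether paragraph-only extraction suits the real article depends on its markup, so treat this as a starting point and not a drop-in replacement.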



In [14]:
# Split the retrieved text into chunks
import langchain

print(langchain.__version__)

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n",  
    chunk_size=800,  
    # chunk_overlap=200 
)

splits = text_splitter.split_text(text)
print(len(splits))

0.3.11
19
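
To make the splitting step more concrete, here is a simplified sketch of separator-based chunking: the text is cut at each newline and the pieces are packed greedily until the next one would push the chunk past `chunk_size`. It is an illustration written for this notebook and not LangChain's implementation. It ignores chunk overlap, and the function name `split_by_separator` is hypothetical.

```python
def split_by_separator(text, separator="\n", chunk_size=800):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by consecutive separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece still fits, or it stands alone and cannot be split further
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical input: 50 short lines packed into chunks of up to 200 characters
demo_text = "\n".join(f"line {i} " + "x" * 40 for i in range(50))
demo_chunks = split_by_separator(demo_text, chunk_size=200)
print(len(demo_chunks), max(len(chunk) for chunk in demo_chunks))
```

A single line longer than `chunk_size` is kept whole in this sketch, so the size limit is only a target when the separator does not occur often enough.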


In [15]:
# Embed the chunks and index them in Chroma
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

from langchain_community.vectorstores import Chroma

vector_store = Chroma.from_texts(
    texts=splits,
    collection_name="ds_career",
    embedding=embeddings,
    persist_directory="./chroma_ds_career",
)

retriever = vector_store.as_retriever()
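
Conceptually, the retriever embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with a toy word-count "embedding" and cosine similarity, using only the standard library. The functions `embed`, `cosine` and `retrieve`, the sample chunks, and the choice of cosine similarity are all assumptions made for illustration. The sentence-transformer model and Chroma's index work differently in detail.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a word-count vector standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, loosely themed on the article's topics
toy_chunks = [
    "communication skills help with data storytelling",
    "stakeholder management requires clear expectations",
    "strategy setting aligns projects with business goals",
]
best = retrieve("how do I improve communication skills", toy_chunks, k=1)
print(best)
```

A real embedding model also matches paraphrases that share no words with the query, which this word-count stand-in cannot do.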

In [16]:
# Build the RAG chain on top of a local Ollama model
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import ChatPromptTemplate
from langchain_ollama.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

ollama_llm = "llama3.2"
model_local = ChatOllama(model=ollama_llm)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model_local
    | StrOutputParser()
)
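
The data flow of the chain can be followed in plain Python: the question is fanned out into the two template variables, the template is filled, and the resulting prompt is handed to the model. The sketch below mimics those steps with stand-ins. `fake_retriever`, `fake_model` and `run_chain` are hypothetical names, and no LangChain or Ollama call is made.

```python
TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def fake_retriever(question):
    """Hypothetical stand-in for the vector-store retriever."""
    return ["Soft communication skills are useful for managing teams."]


def fake_model(prompt_text):
    """Hypothetical stand-in for the local LLM: it only reports the prompt size."""
    return f"(model received {len(prompt_text)} characters)"


def run_chain(question):
    # Step 1: fan the single input out into the two prompt variables
    variables = {"context": fake_retriever(question), "question": question}
    # Step 2: fill the template (the list is rendered with its Python repr)
    prompt_text = TEMPLATE.format(**variables)
    # Step 3: call the model; here the reply is already a plain string
    return prompt_text, fake_model(prompt_text)


prompt_text, reply = run_chain("Is communication important?")
print(prompt_text)
print(reply)
```

Because the context is inserted as the string form of a list, joining the retrieved chunks into plain text before formatting usually gives the model a cleaner prompt.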


## 2. Questions

In [17]:
chain.invoke("What would you do if you would like to progress in your data science career?")

"Based on the context, it seems that Dr. Varshita Sher is a popular writer and expert in data science and generative AI. Given her credentials as an Oxford Alumni and a 2x Top writer on Medium, she appears to be knowledgeable about best practices for advancing in a data science career.\n\nHowever, there is no direct answer to the question provided in the context. Nevertheless, I can infer some potential steps one might consider taking to progress in their data science career:\n\n1. Build relevant skills: Stay up-to-date with industry trends and develop expertise in specific areas of data science.\n2. Network: Connect with professionals in your desired field, attend conferences or workshops, and participate in online communities.\n3. Stay curious: Continuously learn and expand your knowledge by reading books, articles, and attending online courses.\n4. Gain practical experience: Apply theoretical concepts to real-world problems, either through personal projects or collaborations.\n5. Pu

In [18]:
chain.invoke("Would you insult your boss if you wanted to improve in your data science career?")

'No, the context does not suggest that insulting your boss would be an effective way to improve your data science career. In fact, the text mentions "stakeholder-managing" and "strategy-setting," which implies a more collaborative and professional approach. The tone of the article is also informative and supportive, suggesting that Dr. Varshita Sher\'s goal is to help readers develop business skills for success in their careers, rather than offering advice on how to insult or manipulate others.'

In [19]:
chain.invoke("Is it a good idea to be a good a communication?")

'Yes, according to the context, it is a good idea to have good communication skills. The passage states that "soft communication skills — useful for managing teams, data storytelling, and cross-team collaboration — come naturally to them" (likely referring to those who are lucky enough to possess these skills), but also acknowledges that "For the rest, there’s hope! With practice, achieving any skill is possible." This implies that having good communication skills can be beneficial in a leadership role as a data scientist, and that it\'s never too late to develop them with practice.'

## 3. Building the GUI

In [20]:
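def search_chroma(query, top_k):
    try:
        # Retrieve the top_k most similar chunks and pass them to the prompt as context
        docs = vector_store.similarity_search(query, k=int(top_k))
        context = "\n\n".join(doc.page_content for doc in docs)
        answer_chain = prompt | model_local | StrOutputParser()
        return answer_chain.invoke({"context": context, "question": query})
    except Exception as e:
        return f"Error: {e}"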
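
The callback receives the raw textbox and slider values, so it can help to normalize them before searching. The helper below is a minimal sketch of that step. The name `clean_inputs` and the `max_k` limit of 10 (taken from the slider range used in this notebook) are assumptions, and the slider value is cast to `int` in case it arrives as a float.

```python
def clean_inputs(query, top_k, max_k=10):
    """Normalize raw UI values before they reach the search function."""
    query = (query or "").strip()
    if not query:
        raise ValueError("Please enter a question.")
    # Clamp k to the slider range and make sure it is an integer
    k = max(1, min(int(top_k), max_k))
    return query, k


print(clean_inputs("  What skills matter?  ", 5.0))
```

If it were called at the top of `search_chroma`, inside the `try` block, an empty query would come back as an `Error: ...` string through the existing `except` clause.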

In [21]:
import gradio as gr
with gr.Blocks() as demo:
    gr.Markdown("### Chroma Database Search")
    
    with gr.Row():
        query_input = gr.Textbox(label="Enter Your Query", placeholder="Type your question here...")
        top_k_input = gr.Slider(1, 10, step=1, value=5, label="Number of Results")

    search_button = gr.Button("Search")
    output_box = gr.Textbox(label="Search Results", lines=15)

    # Bind the function to the Gradio UI
    search_button.click(fn=search_chroma, inputs=[query_input, top_k_input], outputs=output_box)

# Launch the App
demo.launch()

* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


