In [1]:
from fastcore.all import *
from google import genai

# LLM

In [4]:
model = 'gemini-2.0-flash'

In [3]:
client = genai.Client()

## Simple call example

In [28]:
r = client.models.generate_content(
    model=model,
    contents='Why is the sky blue?',
    config=genai.types.GenerateContentConfig(systemInstruction='Provide small, precise answers.'),
)
print(r.text)

Rayleigh scattering: air molecules scatter blue light from the sun more than other colors.
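
The call above can be wrapped in a small function so the same pattern is reusable. This is a minimal sketch: `ask` is a hypothetical helper name (not part of `google-genai`), and it assumes that `config` may be passed as a plain dict with a `system_instruction` key instead of a `GenerateContentConfig` object. A stand-in client is used so the example runs without an API key.

```python
from types import SimpleNamespace


def ask(client, model, prompt, system_instruction=None):
    """Send a single prompt and return the response text.

    `ask` is a hypothetical helper, not a google-genai function.
    """
    # Assumption: the SDK accepts a plain dict for `config`.
    config = {'system_instruction': system_instruction} if system_instruction else None
    r = client.models.generate_content(model=model, contents=prompt, config=config)
    return r.text


# Offline stand-in that mimics only the attributes used above,
# so the helper can be exercised without network access.
def _fake_generate_content(model, contents, config=None):
    return SimpleNamespace(text=f'[{model}] {contents}')


fake_client = SimpleNamespace(
    models=SimpleNamespace(generate_content=_fake_generate_content)
)

print(ask(fake_client, 'gemini-2.0-flash', 'Why is the sky blue?',
          system_instruction='Provide small, precise answers.'))
```

With the real client from this notebook, the call would be `ask(client, model, 'Why is the sky blue?', 'Provide small, precise answers.')`.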



## Chat example

In [56]:
chat = client.chats.create(
    model=model,
    config=genai.types.GenerateContentConfig(
        systemInstruction='Eres un asistente que responde de manera breve y concisa.',
    ),
)

In [57]:
r = chat.send_message('¿Qué es la felicidad?')
print(r.text)

Un estado de ánimo caracterizado por emociones de alegría, satisfacción y bienestar.


In [58]:
r = chat.send_message('Explicame por qué')
print(r.text)

La felicidad es subjetiva, influenciada por las circunstancias, las relaciones y la salud mental y física.


In [59]:
chat._curated_history

[UserContent(
   parts=[
     Part(
       text='¿Qué es la felicidad?'
     ),
   ],
   role='user'
 ),
 Content(
   parts=[
     Part(
       text='Un estado de ánimo caracterizado por emociones de alegría, satisfacción y bienestar.'
     ),
   ],
   role='model'
 ),
 UserContent(
   parts=[
     Part(
       text='Explicame por qué'
     ),
   ],
   role='user'
 ),
 Content(
   parts=[
     Part(
       text='La felicidad es subjetiva, influenciada por las circunstancias, las relaciones y la salud mental y física.'
     ),
   ],
   role='model'
 )]
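
The nested repr above is verbose for longer conversations. A compact view can be sketched as follows, assuming only what the output shows: each history entry has a `role` and a list of `parts`, and each part has a `text` attribute. `format_history` is a hypothetical helper name, and the demo uses stand-in objects rather than real SDK types.

```python
from types import SimpleNamespace


def format_history(history):
    """Return one 'role: text' line per message in a chat history."""
    lines = []
    for content in history:
        # Join all text parts; parts without text are skipped.
        text = ' '.join(p.text for p in content.parts if getattr(p, 'text', None))
        lines.append(f'{content.role}: {text}')
    return '\n'.join(lines)


# Stand-ins with the same .role / .parts[i].text shape as the output above.
demo_history = [
    SimpleNamespace(role='user', parts=[SimpleNamespace(text='¿Qué es la felicidad?')]),
    SimpleNamespace(role='model', parts=[SimpleNamespace(text='Un estado de ánimo.')]),
]

print(format_history(demo_history))
```

In this notebook it would be called as `print(format_history(chat._curated_history))`.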

In [66]:
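# Remove the last interaction (one user message plus one model reply).
# Note: `_curated_history` is a private attribute, so this may break in other library versions.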
chat._curated_history = chat._curated_history[:-2]
chat._curated_history

[UserContent(
   parts=[
     Part(
       text='¿Qué es la felicidad?'
     ),
   ],
   role='user'
 ),
 Content(
   parts=[
     Part(
       text='Un estado de ánimo caracterizado por emociones de alegría, satisfacción y bienestar.'
     ),
   ],
   role='model'
 )]
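
The `[:-2]` slice drops exactly one exchange. A slightly more general version is sketched below, assuming the history strictly alternates user and model messages. `drop_last_turns` is a hypothetical helper name, and it works on any list.

```python
def drop_last_turns(history, n_turns=1):
    """Return a copy of `history` without its last `n_turns` user/model exchanges."""
    if n_turns < 0:
        raise ValueError('n_turns must be non-negative')
    # Compute the cut point explicitly: a bare `history[:-2 * n_turns]`
    # would return an empty list when n_turns is 0.
    keep = max(len(history) - 2 * n_turns, 0)
    return list(history[:keep])


demo = ['user-1', 'model-1', 'user-2', 'model-2']
print(drop_last_turns(demo))
print(drop_last_turns(demo, n_turns=0))
```

Applied to the chat above, this would read `chat._curated_history = drop_last_turns(chat._curated_history)`.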

In [82]:
# Edit the model's response
chat._curated_history[1].parts[0].text = 'Es como una caja de chocolates con muchos sabores.'
chat._curated_history

[UserContent(
   parts=[
     Part(
       text='¿Qué es la felicidad?'
     ),
   ],
   role='user'
 ),
 Content(
   parts=[
     Part(
       text='Es como una caja de chocolates con muchos sabores.'
     ),
   ],
   role='model'
 )]

In [83]:
r = chat.send_message('Explicame por qué')
print(r.text)

La felicidad es subjetiva; lo que alegra a uno puede no alegrar a otro, al igual que las preferencias de sabor.


## Using external information

In [84]:
instructions = '''You are a helpful AI assistant expert coding with python that likes concise short code. \
Your task is to provide brief answers with concise code snippets.'''
prompt = '''Using python, how can I read an url with a text file?, write a function that I can use like: \
`text = read_url_text(url)`'''
r = client.models.generate_content(
    model=model,
    contents=prompt,
    config=genai.types.GenerateContentConfig(systemInstruction=instructions),
)
print(r.text)

```python
import requests

def read_url_text(url):
  response = requests.get(url)
  response.raise_for_status()
  return response.text
```



In [90]:
# Use Markdown to render the model's response
from IPython.display import Markdown
Markdown(r.text)

```python
import requests

def read_url_text(url):
  response = requests.get(url)
  response.raise_for_status()
  return response.text
```
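
Rather than copying the generated function by hand, the fenced code can be pulled out of the response text. This is a rough sketch using a regular expression: `extract_code_blocks` is a hypothetical helper, and it assumes the model wraps code in triple-backtick fences that start at the beginning of a line.

```python
import re

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = '`' * 3


def extract_code_blocks(markdown_text, lang=None):
    """Return the bodies of fenced code blocks found in `markdown_text`.

    If `lang` is given, only blocks tagged with that language are returned.
    """
    pattern = re.compile(
        rf'^{FENCE}(\w*)[ \t]*\n(.*?)^{FENCE}[ \t]*$',
        re.DOTALL | re.MULTILINE,
    )
    return [
        body
        for tag, body in pattern.findall(markdown_text)
        if lang is None or tag == lang
    ]


sample = f"Intro\n{FENCE}python\nprint('hi')\n{FENCE}\nOutro\n"
print(extract_code_blocks(sample, lang='python'))
```

In this notebook, `extract_code_blocks(r.text, lang='python')[0]` would give the source of `read_url_text`. Generated code should still be reviewed before it is run.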


In [6]:
import requests

def read_url_text(url):
  response = requests.get(url)
  response.raise_for_status()
  return response.text
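
The generated function has no timeout, so a slow server can block the notebook indefinitely. One possible variant is sketched below using only the standard library (a swap from `requests` to `urllib`, not the code the model produced). `read_url_text_stdlib` and its parameters are hypothetical names. The demo reads a temporary file through a `file://` URL so it runs offline.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_url_text_stdlib(url, timeout=10.0, default_encoding='utf-8'):
    """Fetch `url` and return its body as text, using only the standard library."""
    # urlopen raises an exception on HTTP error statuses,
    # which plays the role of `raise_for_status()` above.
    with urlopen(url, timeout=timeout) as response:
        # Prefer the charset declared in the response headers, if any.
        encoding = response.headers.get_content_charset() or default_encoding
        return response.read().decode(encoding)


# Offline demo: serve a local file through a file:// URL.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'sample.txt'
    path.write_text('How HDBSCAN Works', encoding='utf-8')
    print(read_url_text_stdlib(path.as_uri()))
```

With `requests`, the equivalent change would be passing `timeout=` to `requests.get`.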

In [95]:
documentation = read_url_text('https://hdbscan.readthedocs.io/en/latest/_sources/how_hdbscan_works.rst.txt')
print(documentation[:800])


How HDBSCAN Works

HDBSCAN is a clustering algorithm developed by `Campello, Moulavi, and
Sander <http://link.springer.com/chapter/10.1007%2F978-3-642-37456-2_14>`__.
It extends DBSCAN by converting it into a hierarchical clustering
algorithm, and then using a technique to extract a flat clustering based
in the stability of clusters. The goal of this notebook is to give you
an overview of how the algorithm works and the motivations behind it. In
contrast to the HDBSCAN paper I'm going to describe it without reference
to DBSCAN. Instead I'm going to explain how I like to think about the
algorithm, which aligns more closely with `Robust Single
Linkage <http://cseweb.ucsd.edu/~dasgupta/papers/tree.pdf>`__ with `flat
cluster
extraction <http://link.springer.com/article/10.10


In [10]:
documentation = read_url_text('https://scikit-learn.org/stable/_sources/modules/clustering.rst.txt')
print(documentation[:300])

.. _clustering:

Clustering

`Clustering <https://en.wikipedia.org/wiki/Cluster_analysis>`__ of
unlabeled data can be performed with the module :mod:`sklearn.cluster`.

Each clustering algorithm comes in two variants: a class, that implements
the ``fit`` method to learn the clu


In [14]:
instructions = f'''\
<assistant>
You are a helpful AI assistant specializing in answering questions about the HDBSCAN clustering \
algorithm. You will be provided with documentation and your task is to analyze the documentation to \
provide a clear, accurate answer to the user's question.
</assistant>

<documentation>
{documentation}
</documentation>

<rules>
- Maintain a concise, educational tone
- Provide brief, clear explanation
- Mention relevant documentation
- If the question cannot be fully answered using only the provided documentation, state this clearly
</rules>

<style>
- Use Markdown formatting in your responses
</style>
'''
chat = client.chats.create(
    model=model,
    config=genai.types.GenerateContentConfig(systemInstruction=instructions),
)
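
Because the whole documentation string is pasted into the system instruction, its size directly affects the size of every request. The sketch below shows one way to build such a prompt with an optional length cap. `build_instructions`, `topic`, and `max_chars` are hypothetical names, the prompt wording is shortened, and the cap is a naive character cut rather than a token-aware limit.

```python
def build_instructions(documentation, topic='the HDBSCAN clustering algorithm',
                       max_chars=None):
    """Assemble a system prompt that embeds `documentation` in tagged sections."""
    if max_chars is not None and len(documentation) > max_chars:
        # Naive cut by characters; this may split a sentence or a directive.
        documentation = documentation[:max_chars]
    return (
        '<assistant>\n'
        f'You are a helpful AI assistant specializing in answering questions about {topic}. '
        'Answer using the documentation provided below.\n'
        '</assistant>\n\n'
        '<documentation>\n'
        f'{documentation}\n'
        '</documentation>\n'
    )


demo_prompt = build_instructions('x' * 50, max_chars=10)
print(demo_prompt)
```

The result can be passed as the system instruction exactly like the `instructions` string above.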

In [15]:
r = chat.send_message('''\
Help me build my intuition about the HDBSCAN clustering algorithm: How does it work? When should I use it?
''')
Markdown(r.text)

HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that extends DBSCAN by removing the assumption that the clustering criterion (density requirement) is globally homogeneous.

Here's how it works, according to the documentation:

1.  **Mutual Reachability Graph:** HDBSCAN computes core distances for each sample, which is the distance to its `min_samples`-th nearest neighbor. It then calculates the mutual reachability distance between pairs of points, which is the maximum of their core distances and the distance between them. These distances form the mutual reachability graph.

2.  **Hierarchical Clustering:** HDBSCAN extracts a minimum spanning tree (MST) from the mutual reachability graph and then greedily cuts edges with the highest weight. This process is equivalent to performing DBSCAN\* clustering across all possible density scales in a hierarchical fashion.

3.  **Cluster Extraction:** HDBSCAN can be smoothed with the additional hyperparameter `min_cluster_size`, which specifies that during the hierarchical clustering, components with fewer than `minimum_cluster_size` many samples are considered noise. In practice, one can set `minimum_cluster_size = min_samples` to couple the parameters and simplify the hyperparameter space.

You should consider using HDBSCAN when:

*   Your data has **non-flat geometry** and **uneven cluster sizes**.
*   You want **outlier removal**.
*   You need a **hierarchical** clustering structure.
*   Your clusters have **variable density**.
*   You want to use a density-based clustering algorithm that doesn't assume clusters are globally homogeneous

See the  :ref:`HDBSCAN <hdbscan>` section of the documentation for more details.


In [16]:
r = chat.send_message('Translate your last answer to spanish.')
Markdown(r.text)

HDBSCAN (Clustering Espacial Jerárquico Basado en Densidad de Aplicaciones con Ruido) es un algoritmo de clustering que extiende DBSCAN eliminando la suposición de que el criterio de clustering (requisito de densidad) es globalmente homogéneo.

Así es como funciona, según la documentación:

1.  **Gráfico de Mutua Alcanzabilidad (Mutual Reachability Graph):** HDBSCAN calcula las distancias centrales (core distances) para cada muestra, que es la distancia a su vecino más cercano número `min_samples`. Luego, calcula la distancia de mutua alcanzabilidad entre pares de puntos, que es el máximo de sus distancias centrales y la distancia entre ellos. Estas distancias forman el gráfico de mutua alcanzabilidad.

2.  **Clustering Jerárquico (Hierarchical Clustering):** HDBSCAN extrae un árbol de expansión mínimo (MST) del gráfico de mutua alcanzabilidad y luego corta de forma codiciosa (greedy) las aristas con el peso más alto. Este proceso es equivalente a realizar clustering DBSCAN\* en todas las posibles escalas de densidad de forma jerárquica.

3.  **Extracción de Clústeres (Cluster Extraction):** HDBSCAN se puede suavizar con un hiperparámetro adicional `min_cluster_size`, que especifica que durante el clustering jerárquico, los componentes con menos de `minimum_cluster_size` muestras se consideran ruido. En la práctica, se puede establecer `minimum_cluster_size = min_samples` para acoplar los parámetros y simplificar el espacio de hiperparámetros.

Deberías considerar usar HDBSCAN cuando:

*   Tus datos tengan una **geometría no plana** y **tamaños de clúster desiguales**.
*   Necesites **eliminar valores atípicos (outliers)**.
*   Necesites una estructura de clustering **jerárquica**.
*   Tus clústeres tengan **densidad variable**.
*   Quieras utilizar un algoritmo de clustering basado en densidad que no asuma que los clústeres son globalmente homogéneos.

Consulta la sección :ref:`HDBSCAN <hdbscan>` de la documentación para obtener más detalles.
