# Using LLMs to analyse `Zonaprop` listings
- https://www.machinelearnear.com/
- https://www.youtube.com/@machinelearnear
- https://www.youtube.com/watch?v=DuujwXarVbY&ab_channel=SantiagoMagnin

In [2]:
import pandas as pd
import numpy as np

In [3]:
# load the processed Zonaprop listings (with view counts) from disk
zonaprop_listings = pd.read_csv('processed/zonaprop_with_viewcount.csv')

## What criteria will we use to filter the properties?
We will rely on what [Santiago Magnin](https://twitter.com/santivende) explains in this 2024 buyers' guide video, [*"GUÍA 2024 para COMPRADORES 💣 Webinar bomba 💣 ¿Cómo comprar con Javier Milei Presidente?"*](https://www.youtube.com/watch?v=DuujwXarVbY&ab_channel=SantiagoMagnin).

In short, for `Zonaprop`:
- (1) ~60 visits/day: reserved within the first week, average negotiated discount of 2%
- (2) ~30 visits/day: reserved within 8 to 30 days, average negotiated discount of 5%
- (3) ~20 visits/day: reserved within 31 to 60 days, average negotiated discount of 7%
- (4) ~10 visits/day: reserved after more than 60 days, average negotiated discount of 8%
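The four categories above can be encoded as a small lookup table. This is a minimal sketch, assuming the visit counts act as lower bounds (the notebook itself filters category (1) with `views_per_day > 60`); the `ViewCategory` name, the exact cut-offs between categories and the use of the average negotiation as an expected discount are illustrative assumptions, not part of the source:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewCategory:
    number: int
    min_views_per_day: float  # assumed lower bound for the category
    days_to_reserve: str      # reservation window quoted in the video
    avg_negotiation: float    # average discount off the asking price


# Ordered from most to least viewed, following the list above
CATEGORIES = [
    ViewCategory(1, 60, "first week", 0.02),
    ViewCategory(2, 30, "8 to 30 days", 0.05),
    ViewCategory(3, 20, "31 to 60 days", 0.07),
    ViewCategory(4, 0, "more than 60 days", 0.08),
]


def categorize(views_per_day: float) -> ViewCategory:
    """Return the first category whose lower bound is met."""
    for category in CATEGORIES:
        if views_per_day >= category.min_views_per_day:
            return category
    # Negative or NaN view counts fall into the slowest-moving bucket
    return CATEGORIES[-1]


def expected_closing_price(asking_price_usd: float, views_per_day: float) -> float:
    """Asking price minus the category's average negotiated discount."""
    return asking_price_usd * (1 - categorize(views_per_day).avg_negotiation)
```

For the listing shown further down (84 views/day, USD 110,000), `categorize(84).number` is 1 and `expected_closing_price(110000, 84)` comes out at roughly USD 107,800.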

## How do we bring in the `Airbnb` data?

In [94]:
from sklearn.neighbors import BallTree

def find_within_radius(reference_df, target_df, radius_km=1):
    # BallTree's haversine metric works on angles, so convert the radius from km to radians
    radius_rad = radius_km / 6371  # Earth's radius in km

    # BallTree rejects NaN, so only index target rows with valid coordinates
    valid_target = target_df.dropna(subset=['latitude', 'longitude'])
    tree = BallTree(np.deg2rad(valid_target[['latitude', 'longitude']].values), metric='haversine')

    # Reference coordinates may contain text (e.g. "No se encontró la latitud"): coerce to NaN and drop
    reference_coords = reference_df[['latitude', 'longitude']].apply(pd.to_numeric, errors='coerce').dropna()
    if reference_coords.empty:
        return valid_target.iloc[0:0]

    # Query all reference points at once: one array of matching row positions per point
    indices = tree.query_radius(np.deg2rad(reference_coords.values), r=radius_rad)

    # Concatenate once at the end (concatenating inside the loop is quadratic)
    return pd.concat([valid_target.iloc[idx] for idx in indices])
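With `metric='haversine'`, scikit-learn's `BallTree` measures the central angle between two `[latitude, longitude]` points in radians, which is why the radius is divided by the Earth's radius (6371 km). To spot-check the matches, the same distance can be computed by hand. A stdlib sketch (the helper names are hypothetical; the sample coordinates are the Chacarita test point and two Airbnb listings from the output further down, plus the Retiro listing):

```python
import math

EARTH_RADIUS_KM = 6371  # same mean radius the notebook uses


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Central angle in radians (clamped against rounding error above 1)
    central_angle = 2 * math.asin(math.sqrt(min(1.0, a)))
    return EARTH_RADIUS_KM * central_angle


def within_radius_bruteforce(ref, points, radius_km):
    """Indices of `points` within `radius_km` of `ref`; O(n) per query, fine for spot checks."""
    return [i for i, (lat, lon) in enumerate(points)
            if haversine_km(ref[0], ref[1], lat, lon) <= radius_km]


ref = (-34.5835146, -58.4537686)
points = [(-34.58347, -58.45243), (-34.58556, -58.45258), (-34.5967581064, -58.3731719)]
print(within_radius_bruteforce(ref, points, radius_km=0.3))  # the Retiro listing is several km away
```

A brute-force scan is only practical for checking a handful of points; the tree is what keeps the lookup fast over thousands of Airbnb listings.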

In [95]:
airbnb_listings = pd.read_csv('processed/airbnb_listings.csv')
# airbnb_reviews = pd.read_csv('processed/airbnb_reviews.csv')

### Find the closest listings

In [102]:
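def find_within_radius(row, airbnb_listings, radius_km):
    # Per-row variant (redefines the function above with a different signature):
    # returns the Airbnb listings within radius_km of a single Zonaprop row.
    # Note: it rebuilds the BallTree on every call, which is slow over many rows.
    radius_rad = radius_km / 6371  # Earth's radius in km
    valid_listings = airbnb_listings.dropna(subset=['latitude', 'longitude'])
    tree = BallTree(np.deg2rad(valid_listings[['latitude', 'longitude']].values), metric='haversine')
    if pd.notnull(row['latitude']) and pd.notnull(row['longitude']):
        indices = tree.query_radius(np.deg2rad([[row['latitude'], row['longitude']]]), r=radius_rad)
        return valid_listings.iloc[indices[0]]
    # Rows without coordinates get an empty result
    return pd.DataFrame()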

In [134]:
from sklearn.neighbors import BallTree
import numpy as np
import pandas as pd

def obtener_info_relevante(listings, airbnb_listings, radius_km):
    """Add summary stats from the Airbnb listings within `radius_km` of each Zonaprop listing."""
    listings = listings.copy()
    # Coordinates may contain text (e.g. "No se encontró la latitud"): coerce those to NaN
    listings['latitude'] = pd.to_numeric(listings['latitude'], errors='coerce')
    listings['longitude'] = pd.to_numeric(listings['longitude'], errors='coerce')

    # Build the BallTree once, instead of once per Zonaprop listing
    radius_rad = radius_km / 6371  # Earth's radius in km
    valid_listings = airbnb_listings.dropna(subset=['latitude', 'longitude'])
    tree = BallTree(np.deg2rad(valid_listings[['latitude', 'longitude']].values), metric='haversine')

    for index, row in listings.iterrows():
        # Skip Zonaprop listings without coordinates
        if pd.isnull(row['latitude']) or pd.isnull(row['longitude']):
            continue
        indices = tree.query_radius(np.deg2rad([[row['latitude'], row['longitude']]]), r=radius_rad)
        closest_listings = valid_listings.iloc[indices[0]]
        filtered = closest_listings.dropna(subset=['estimated_price_per_night_in_USD', 'review_scores_rating',
                                                   'review_scores_location', 'review_scores_value', 'room_type',
                                                   'estimated_nights_booked_l30d'])
        # Skip when no nearby Airbnb listing has complete data
        if filtered.empty:
            continue

        # rental likelihood: 'más probable' only if 'high' outnumbers both 'low' and 'medium'
        booking_counts = filtered['estimated_nights_booked_l30d'].value_counts()
        high = booking_counts.get('high', 0)
        probabilidad_alquiler = ('más probable'
                                 if high > booking_counts.get('low', 0) and high > booking_counts.get('medium', 0)
                                 else 'menos probable')
        listings.at[index, 'airbnb_probabilidad_alquiler'] = probabilidad_alquiler

        # other key figures: average nightly price by room type and average review scores
        price = filtered['estimated_price_per_night_in_USD']
        listings.at[index, 'airbnb_avg_price_entire_home'] = price[filtered['room_type'] == 'Entire home/apt'].mean()
        listings.at[index, 'airbnb_avg_price_private_room'] = price[filtered['room_type'] == 'Private room'].mean()
        listings.at[index, 'airbnb_avg_review_score_rating'] = filtered['review_scores_rating'].mean()
        listings.at[index, 'airbnb_avg_review_score_location'] = filtered['review_scores_location'].mean()
        listings.at[index, 'airbnb_avg_review_score_value'] = filtered['review_scores_value'].mean()

    return listings
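The rental-likelihood rule inside `obtener_info_relevante` (and in the summary cell further down) is a plurality vote over the `estimated_nights_booked_l30d` buckets: an area counts as more likely to rent only when `high` strictly outnumbers both `low` and `medium`, so ties fall to the "less likely" side. A stdlib sketch of the same rule, assuming the buckets are the strings `'low'`, `'medium'` and `'high'` seen in the Airbnb data; the function name and the English labels are illustrative (the function above writes `'más probable'` / `'menos probable'`):

```python
from collections import Counter


def rental_likelihood(booking_buckets):
    """Plurality rule: 'more likely' only if 'high' strictly outnumbers 'low' and 'medium'."""
    # Ignore missing buckets, as value_counts() does by default
    counts = Counter(b for b in booking_buckets if b is not None)
    high = counts['high']  # a Counter returns 0 for missing keys
    if high > counts['low'] and high > counts['medium']:
        return 'more likely'
    return 'less likely'
```

With the counts from the Chacarita summary below (48 low, 6 medium, 8 high) this returns `'less likely'`.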

In [135]:
# Assuming high_relevance_listings and airbnb_listings are already loaded as DataFrames
a = obtener_info_relevante(high_relevance_listings, airbnb_listings, 0.3)

In [136]:
a.loc[0]

listing_url                         https://www.zonaprop.com.ar/propiedades/depart...
asking_price_in_usd                                                            110000
expensas_in_ars                                                               15000.0
latitude                                                                   -34.574835
longitude                                                                   -58.44179
google_maps                         https://maps.google.com/?q=-34.5748353589,-58....
photos                              ['https://imgar.zonapropcdn.com/avisos/1/00/52...
whatsapp                                                              5492213588285.0
published_on                                                               19-01-2024
disposición                                                                    Frente
luminoso                                                                 Muy luminoso
orientación                                           

In [123]:
a

| | listing_url | asking_price_in_usd | expensas_in_ars | latitude | longitude | google_maps | photos | whatsapp | published_on | disposición | ... | usd_per_m2 | avg_price_entire_home | avg_price_private_room | booking_likelihood_low | booking_likelihood_medium | booking_likelihood_high | avg_review_score_rating | avg_review_score_location | avg_review_score_value | probabilidad_alquiler |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.zonaprop.com.ar/propiedades/depart... | 110000 | 15000.0 | -34.574835 | -58.44179 | https://maps.google.com/?q=-34.5748353589,-58.... | ['https://imgar.zonapropcdn.com/avisos/1/00/52... | 5492214000000.0 | 19-01-2024 | Frente | ... | 1692.0 | 58.05 | 156.666667 | 118.0 | 21.0 | 24.0 | 4.763436 | 4.835153 | 4.67681 | menos probable |
| 1 | https://www.zonaprop.com.ar/propiedades/depto-... | 120000 | 40000.0 | | | https://maps.google.com/?q=No se encontró la l... | ['https://imgar.zonapropcdn.com/avisos/1/00/52... | 5491168000000.0 | 19-01-2024 | Frente | ... | 1935.0 | | | | | | | | | |



## Category (1)

In [125]:
high_relevance_listings = zonaprop_listings[zonaprop_listings.views_per_day > 60].reset_index(drop=True)
high_relevance_listings = high_relevance_listings.dropna(axis=1, how='all')

In [126]:
high_relevance_listings

| | listing_url | asking_price_in_usd | expensas_in_ars | latitude | longitude | google_maps | photos | whatsapp | published_on | disposición | ... | antigüedad | cantidadplantas | superficiesemicubiertam² | dormitorios | aptocrédito | cantidadpisosenedificio | user_views | days | views_per_day | usd_per_m2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.zonaprop.com.ar/propiedades/depart... | 110000 | 15000.0 | -34.5748353589 | -58.4417900061 | https://maps.google.com/?q=-34.5748353589,-58.... | ['https://imgar.zonapropcdn.com/avisos/1/00/52... | 5492214000000.0 | 19-01-2024 | Frente | ... | | 1 | 0.0 | 2.0 | Apto crédito | | 84 | 1 | 84 | 1692.0 |
| 1 | https://www.zonaprop.com.ar/propiedades/depto-... | 120000 | 40000.0 | No se encontró la latitud | No se encontró la longitud | https://maps.google.com/?q=No se encontró la l... | ['https://imgar.zonapropcdn.com/avisos/1/00/52... | 5491168000000.0 | 19-01-2024 | Frente | ... | 30.0 | 1 | | | Apto crédito | 9.0 | 111 | 1 | 111 | 1935.0 |


In [64]:
# Take the first listing with more than 30 views per day
item = zonaprop_listings[zonaprop_listings.views_per_day > 30].reset_index().loc[0]

# Build a list of "column: value" pairs
pares_columna_valor = []

# Iterate over each column and its value, skipping NaN
for columna, valor in item.items():
    if pd.notnull(valor):
        pares_columna_valor.append(f'{columna}: {valor}')

# Join all the pairs into a single string
string_resultante = ', '.join(pares_columna_valor)
string_resultante = ', '.join(pares_columna_valor)

print(string_resultante)

index: 1, listing_url: https://www.zonaprop.com.ar/propiedades/departamento-en-venta-2-dorm.-1-bano-65-m-sup2--52950315.html, asking_price_in_usd: 110000, expensas_in_ars: 15000.0, latitude: -34.5748353589, longitude: -58.4417900061, google_maps: https://maps.google.com/?q=-34.5748353589,-58.4417900061, photos: ['https://imgar.zonapropcdn.com/avisos/1/00/52/95/03/15/1200x1200/1895481551.jpg?isFirstImage=true', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/03/15/1200x1200/1895481552.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/03/15/1200x1200/1895481535.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/03/15/1200x1200/1895481550.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/03/15/1200x1200/1895481531.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/03/15/1200x1200/1895481545.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/03/15/1200x1200/1895481530.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/03/15/1200x1200/1895481537.jpg'], whatsapp: 5
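Since the prompt below asks the model to skip URLs anyway, one option is to drop URL-valued fields before building the string, which saves tokens (the photo list alone is long). A stdlib sketch that works on a plain dict (for example `item.to_dict()`); the URL filter is a crude heuristic and the function name is hypothetical, neither is what the notebook does:

```python
import math


def listing_to_prompt_text(record, drop_urls=True):
    """Join 'column: value' pairs, skipping missing values and (optionally) URL fields."""
    pairs = []
    for column, value in record.items():
        # Skip None and float NaN, roughly mirroring the pd.notnull check above
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        # Heuristic: skip anything that contains a link (listing_url, google_maps, photos)
        if drop_urls and 'http' in str(value):
            continue
        pairs.append(f'{column}: {value}')
    return ', '.join(pairs)


record = {'asking_price_in_usd': 110000, 'expensas_in_ars': 15000.0,
          'orientación': float('nan'),
          'google_maps': 'https://maps.google.com/?q=-34.57,-58.44',
          'usd_per_m2': 1692.0}
print(listing_to_prompt_text(record))
```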

## Using `Mixtral` through `Together.ai`

In [65]:
from openai import OpenAI
import os

In [66]:
TOGETHER_API_KEY = os.environ.get("TOGETHER_API_KEY", "")  # read the key from the environment instead of hard-coding it
client = OpenAI(api_key=TOGETHER_API_KEY, base_url='https://api.together.xyz')

In [67]:
system_instructions = "You are an AI assistant that always responds in Argentinian Spanish and you are concise and professional"
user_query = f"""
Using the following information from a property listing in Buenos Aires, write a summary, paying more attention to
`asking_price` and `usd_per_m2`. Skip any URL.
{string_resultante}
"""

In [68]:
messages=[
    {"role": "system", "content": system_instructions},
    {"role": "user", "content": user_query}
]

In [69]:
chat_completion = client.chat.completions.create(
    messages=messages,
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    max_tokens=1024
)

In [70]:
print(chat_completion.choices[0].message.content)

Se trata de un departamento en venta en Buenos Aires, con una superficie cubierta de 65 m² y un precio de venta de USD 110.000. Ubicado en una zona con fácil acceso y muy iluminado, cuenta con dos dormitorios, un baño y tres ambientes en total. El valor por metro cuadrado es de USD 1692. El edificio cuenta con una sola planta y el departamento está orientado al este. El mantenimiento del edificio es de ARS 15.000 por mes. El departamento está aprobado para crédito y ha recibido 84 visitas en un día, lo que equivale a 84 visitas por día. Está disponible en la siguiente dirección: (-34.5748353589, -58.4417900061). Para más información, por favor comuníquese al número de WhatsApp (+5492213588285). Publicado el 19-01-2024.
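The `OpenAI` client is used here as a generic client for an OpenAI-compatible endpoint: it sends a `POST` to `{base_url}/chat/completions` with a bearer token and a JSON body containing `model`, `messages` and `max_tokens`. A stdlib sketch that builds (but does not send) the equivalent request, which can help when debugging what actually reaches `Together.ai`; the helper name is hypothetical, and whether the URL also needs a `/v1` prefix depends on Together's current API documentation:

```python
import json
import urllib.request


def build_chat_request(api_key, messages, model, max_tokens=1024,
                       base_url='https://api.together.xyz'):
    """Build an OpenAI-style chat completion request without sending it."""
    body = {'model': model, 'messages': messages, 'max_tokens': max_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode('utf-8'),
        headers={'Authorization': f'Bearer {api_key}',
                 'Content-Type': 'application/json'},
        method='POST',
    )
```

Sending it would be `urllib.request.urlopen(req)`, with the reply text expected under `choices[0].message.content` of the JSON response, the same field the notebook prints.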


index: 0, listing_url: https://www.zonaprop.com.ar/propiedades/departamento-en-venta-en-retiro-52950487.html, asking_price_in_usd: 84000, expensas_in_ars: 96989.0, latitude: -34.5967581064, longitude: -58.3731719, google_maps: https://maps.google.com/?q=-34.5967581064,-58.3731719, photos: ['https://imgar.zonapropcdn.com/avisos/1/00/52/95/04/87/1200x1200/1895489221.jpg?isFirstImage=true', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/04/87/1200x1200/1895485012.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/04/87/1200x1200/1895485006.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/04/87/1200x1200/1895485004.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/04/87/1200x1200/1895484993.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/04/87/1200x1200/1895484982.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/04/87/1200x1200/1895484995.jpg', 'https://imgar.zonapropcdn.com/avisos/1/00/52/95/04/87/1200x1200/1895484986.jpg'], whatsapp: 5491138334554.0, publish

In [18]:
zonaprop_listings[zonaprop_listings.views_per_day > 30].listing_url.values

array(['https://www.zonaprop.com.ar/propiedades/departamento-en-venta-2-dorm.-1-bano-65-m-sup2--52950315.html',
       'https://www.zonaprop.com.ar/propiedades/depto-de-2-ambientes-en-colegiales-dueno-direct-52953162.html',
       'https://www.zonaprop.com.ar/propiedades/bozzano-cespedes-52937110.html',
       'https://www.zonaprop.com.ar/propiedades/departamento-en-botanico-2-ambientes-balcon-terraza-52832520.html'],
      dtype=object)

In [2]:
# sample data: a single Zonaprop location to test the radius search
# (note: this overwrites the zonaprop_listings loaded above)
zonaprop_listings = pd.DataFrame(
    {'latitude': [-34.5835146], 'longitude': [-58.4537686]})

## `Airbnb` listings & reviews

In [3]:
from sklearn.neighbors import BallTree

def find_within_radius(reference_df, target_df, radius_km=1):
    # BallTree's haversine metric works on angles, so convert the radius from km to radians
    radius_rad = radius_km / 6371  # Earth's radius in km

    # BallTree rejects NaN, so only index target rows with valid coordinates
    valid_target = target_df.dropna(subset=['latitude', 'longitude'])
    tree = BallTree(np.deg2rad(valid_target[['latitude', 'longitude']].values), metric='haversine')

    # Reference coordinates may contain text (e.g. "No se encontró la latitud"): coerce to NaN and drop
    reference_coords = reference_df[['latitude', 'longitude']].apply(pd.to_numeric, errors='coerce').dropna()
    if reference_coords.empty:
        return valid_target.iloc[0:0]

    # Query all reference points at once: one array of matching row positions per point
    indices = tree.query_radius(np.deg2rad(reference_coords.values), r=radius_rad)

    # Concatenate once at the end (concatenating inside the loop is quadratic)
    return pd.concat([valid_target.iloc[idx] for idx in indices])

### Find closest listings

In [4]:
airbnb_listings = pd.read_csv('processed/airbnb_listings.csv')

In [5]:
# sample usage
closest_listings = find_within_radius(
    zonaprop_listings, 
    airbnb_listings,
    radius_km=0.3,
)
closest_listings.head()

| | id | listing_url | last_scraped | neighbourhood_cleansed | latitude | longitude | room_type | beds | price | number_of_reviews_l30d | review_scores_rating | review_scores_location | review_scores_value | estimated_nights_booked_l30d | bathrooms | estimated_price_per_night_in_USD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1035 | 3140077 | https://www.airbnb.com/rooms/3140077 | 2023-12-27 | Chacarita | -34.58347 | -58.45243 | Entire home/apt | 3.0 | 43731.0 | 1 | 5.0 | 4.72 | 4.83 | low | 1.5 | 55.0 |
| 8659 | 40677601 | https://www.airbnb.com/rooms/40677601 | 2023-12-27 | Chacarita | -34.58407 | -58.4527 | Entire home/apt | 1.0 | 18367.0 | 1 | 4.62 | 4.67 | 4.43 | low | 1.0 | 23.0 |
| 6298 | 32665979 | https://www.airbnb.com/rooms/32665979 | 2023-12-29 | Chacarita | -34.58369 | -58.45257 | Entire home/apt | 1.0 | | 0 | 4.77 | 4.82 | 4.86 | low | 1.5 | |
| 13822 | 660221314715899969 | https://www.airbnb.com/rooms/660221314715899969 | 2023-12-27 | Chacarita | -34.58368 | -58.45293 | Entire home/apt | 1.0 | 27550.0 | 1 | 4.89 | 5.0 | 4.78 | low | 1.0 | 35.0 |
| 6187 | 32378359 | https://www.airbnb.com/rooms/32378359 | 2023-12-27 | Chacarita | -34.58556 | -58.45258 | Entire home/apt | 1.0 | 23300.0 | 0 | 5.0 | 5.0 | 5.0 | low | 1.0 | 29.0 |
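Because `find_within_radius` concatenates the matches for every reference point, an Airbnb listing that lies within the radius of two Zonaprop listings appears twice and would be counted twice in any average. With a single reference point, as in this sample, that cannot happen; with many, deduplicating by `id` first is one option (in pandas, `closest_listings.drop_duplicates(subset='id')`). A stdlib sketch of the same idea on a list of records; the helper name is hypothetical:

```python
def dedupe_by_id(records, key='id'):
    """Keep the first record for each id, preserving the original order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


rows = [{'id': 3140077, 'price_usd': 55.0},
        {'id': 40677601, 'price_usd': 23.0},
        {'id': 3140077, 'price_usd': 55.0}]  # same listing matched by a second reference point
print([r['id'] for r in dedupe_by_id(rows)])
```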


In [6]:
# filter out rows with NaN in the relevant columns
filtered_df = closest_listings.dropna(subset=['estimated_price_per_night_in_USD', 'review_scores_rating', 
                                              'review_scores_location', 'review_scores_value', 'room_type'])

# average estimated price per night in USD, by room type
average_price_entire_home = filtered_df[filtered_df['room_type'] == 'Entire home/apt']['estimated_price_per_night_in_USD'].mean()
average_price_private_room = filtered_df[filtered_df['room_type'] == 'Private room']['estimated_price_per_night_in_USD'].mean()

# number of listings per estimated-nights-booked category (last 30 days)
booking_counts = filtered_df['estimated_nights_booked_l30d'].value_counts()
low, medium, high = (booking_counts.get(k, 0) for k in ('low', 'medium', 'high'))
# 'more likely' only when 'high' outnumbers both other categories
likelihood = 'more likely' if high > low and high > medium else 'less likely'

# average review scores
average_review_score_rating = filtered_df['review_scores_rating'].mean()
average_review_score_location = filtered_df['review_scores_location'].mean()
average_review_score_value = filtered_df['review_scores_value'].mean()

# generate markdown summary
markdown_summary = f"""
### Summary of Airbnb Listings Near Zonaprop Location

- **Average Estimated Price Per Night in USD**:
    - Entire home/apt: {average_price_entire_home:.2f} USD
    - Private room: {average_price_private_room:.2f} USD (if applicable)

- **Booking Likelihood (Last 30 Days)**:
    - Low: {low} listings
    - Medium: {medium} listings
    - High: {high} listings
    - The data suggests that an apartment in this neighbourhood is {likelihood} to get rented.

- **Average Review Scores**:
    - Overall Rating: {average_review_score_rating:.2f}/5
    - Location Rating: {average_review_score_location:.2f}/5
    - Value Rating: {average_review_score_value:.2f}/5
"""

In [7]:
print(markdown_summary)


### Summary of Airbnb Listings Near Zonaprop Location

- **Average Estimated Price Per Night in USD**:
    - Entire home/apt: 41.62 USD
    - Private room: 18.29 USD (if applicable)

- **Booking Likelihood (Last 30 Days)**:
    - Low: 48 listings
    - Medium: 6 listings
    - High: 8 listings
    - The data suggests that an apartment in this neighbourhood is less likely to get rented.

- **Average Review Scores**:
    - Overall Rating: 4.84/5
    - Location Rating: 4.83/5
    - Value Rating: 4.79/5
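If no listing of a given room type falls within the radius, `.mean()` returns `NaN` and the `:.2f` format prints `nan USD`, which is what the "(if applicable)" note hints at. A small formatter that prints `n/a` instead; this is a sketch and the helper name is hypothetical:

```python
import math


def fmt_usd(value):
    """Format an average price, showing 'n/a' when there was nothing to average."""
    if value is None or math.isnan(value):
        return 'n/a'
    return f'{value:.2f} USD'


print(fmt_usd(41.6234))
print(fmt_usd(float('nan')))
```

In the summary template it would replace the inline format, e.g. `- Private room: {fmt_usd(average_price_private_room)}`.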

