## Extracting data through the real-estate API

Site that was scraped: https://www.argenprop.com/

Supported countries: Argentina, Chile, Uruguay, Brazil (1 property)

In [2]:
# Libraries
import requests
import numpy as np
import pandas as pd
from geopy.geocoders import Nominatim
from datetime import datetime
import pytz

## 1. Connecting to the API to extract the data

In [None]:
%%time

url = 'http://reffindr-alb-1167121448.us-east-1.elb.amazonaws.com:4155/argenprop'

# Request parameters
params = {'pais': 'argentina', 'limite': 100}
response = requests.get(url, params=params)
print(response.status_code)

In [29]:
data = response.json()


JSONDecodeError: Expecting value: line 1 column 1 (char 0)
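
The `JSONDecodeError` above ("Expecting value: line 1 column 1") typically means the response body was empty or was not JSON at all, for example an HTML error page. A minimal sketch of a guarded decode follows; the helper name `parse_json_response` is hypothetical and not part of the original notebook.

```python
import json


def parse_json_response(status_code, body_text):
    """Return the decoded payload, or None when the body is not usable JSON.

    Hypothetical helper: it only inspects the status code and the raw text,
    so it can be exercised without a live server.
    """
    if status_code != 200:
        return None  # the server reported a failure
    if not body_text.strip():
        return None  # empty body: nothing to decode
    try:
        return json.loads(body_text)
    except json.JSONDecodeError:
        return None  # body was not JSON (e.g. an HTML error page)


print(parse_json_response(200, '[{"Price": "USD 900"}]'))
print(parse_json_response(200, ''))
```

With a `requests` response this could be called as `parse_json_response(response.status_code, response.text)`, checking for `None` before building the DataFrame.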

In [None]:
df = pd.DataFrame(data)
df

Unnamed: 0,Bathrooms,Bedrooms,CountryName,Description,Environments,Latitude,Longitude,Price,Seniority,StateName,Title,img
0,2,3,Argentina,"Casa en Barrio Privado Camino Real, cuenta con...",4,-3448684,-5858234,USD 2.300,27,Buenos Aires,Av. Camino Real Morón San Fernando 1500,[https://static1.sosiva451.com/08910661/ea4bac...
1,2,3,Argentina,Alquilo amplio Duplex en Florencio varela comb...,4,-3479154,-5827525,$ 490.000,10,Buenos Aires,Combate De San Lorenzo 1700,[https://static1.sosiva451.com/28339561/8cca67...
2,2,4,Argentina,DUEÑO ALQUILA\r\n\n •SIN EXPENSAS\r\n •SIN GAS...,5,0.0,0.0,USD 1.100,15,Buenos Aires,casa en lujan con pileta alquiler anual,[https://static1.sosiva451.com/09149461/6abd53...
3,2,4,Argentina,Imponente casa de estilo en la esquina de Alve...,5,-3294819,-6066076,$ 2.300.000,90,Santa Fe,Mendoza 2300,[https://static1.sosiva451.com/22799461/e73d3f...
4,5,5,Argentina,Exclusiva Casa en Alquiler en Puertos del Lago...,7,-34318497,-58742558,USD 7.500,0,Buenos Aires,Puertos del Lago Barrio Marinas Escobar Alquiler,[https://static1.sosiva451.com/48236461/e64ab1...
...,...,...,...,...,...,...,...,...,...,...,...,...
145,2,3,Argentina,[COUNTRY-927]\n\n Excelente oportunidad \n Al...,4,-3434615,-5890201,USD 900,2,Buenos Aires,San Sebastián (área 13),[https://static1.sosiva451.com/10538561/3cef0a...
146,,2,Argentina,TIPO CASA en Bulnes entre Bme Mitre y Rivadavi...,3,-34609825,-58418743,$ 800.000,0,Capital Federal,Bulnes 100,[https://static1.sosiva451.com/3148618/a52eaa3...
147,2,5,Argentina,Salaya Romera propiedades OFRECE : un muy bue...,6,-34598396,-58406845,USD 5.000,90,Capital Federal,Jean Jaures al 900,[https://static1.sosiva451.com/50130521/ff21b9...
148,5,4,Argentina,Imponente casa en la Horqueta. \n Hermoso jard...,5,-3448789,-5855991,USD 4.000,20,Buenos Aires,Blanco Encalada 800,[https://static1.sosiva451.com/81514141/17d741...


## 2. Data transformation

### 2.1 Owners' property data table

Copy of the original df

In [121]:
df_properties = df.copy()

Function to convert the Price column to Argentine pesos

In [122]:
def convert_to_ars(price):
    exchange_rate = 1011.61 
    price = str(price)
    if 'USD' in price:
        price_numeric = float(price.replace('USD', '').replace('.', '').replace(',', '.').strip())
        return price_numeric * exchange_rate
    elif '$' in price:
        price_numeric = float(price.replace('$', '').replace('.', '').replace(',', '.').strip())
        return price_numeric
    else:
        return None

In [123]:
df_properties['Price'] = df_properties['Price'].apply(convert_to_ars)
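
To make the conversion concrete, here is a small standalone sketch of the same parsing rule: prices use `.` as the thousands separator and `,` as the decimal separator, and USD amounts are multiplied by a fixed rate. The function name `parse_price` and the `usd_to_ars` parameter are assumptions for illustration, with the default rate copied from the notebook. The unrecognized input `'Consultar precio'` is a made-up example.

```python
def parse_price(price, usd_to_ars=1011.61):
    """Convert a listing price such as 'USD 2.300' or '$ 490.000' to ARS.

    Hypothetical variant of convert_to_ars with the exchange rate as a
    parameter. Returns None when the text has no recognizable currency.
    """
    text = str(price).strip()
    if text.startswith('USD'):
        digits, rate = text[3:], usd_to_ars
    elif text.startswith('$'):
        digits, rate = text[1:], 1.0
    else:
        return None
    try:
        # '.' is the thousands separator and ',' the decimal separator
        number = float(digits.strip().replace('.', '').replace(',', '.'))
    except ValueError:
        return None
    return number * rate


print(parse_price('$ 490.000'))         # peso amounts pass through unchanged
print(round(parse_price('USD 2.300')))  # 2300 USD at the fixed rate
print(parse_price('Consultar precio'))  # unrecognized text gives None
```

Values that come back as `None` become nulls in the column, which would be consistent with the null `Price` entries counted further down.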

Fixing the latitude and longitude strings

In [124]:
# Replace commas with periods, only in the Latitude and Longitude columns
df_properties['Latitude'] = df_properties['Latitude'].apply(lambda x: str(x).replace(',', '.') if isinstance(x, str) else str(x))
df_properties['Longitude'] = df_properties['Longitude'].apply(lambda x: str(x).replace(',', '.') if isinstance(x, str) else str(x))

# Convert the columns to float
df_properties['Latitude'] = pd.to_numeric(df_properties['Latitude'], errors='coerce')
df_properties['Longitude'] = pd.to_numeric(df_properties['Longitude'], errors='coerce')


Dropping rows that contain blank values (' ')

In [125]:
string_columns = df_properties.select_dtypes(include=['object', 'string'])
rows_with_spaces = string_columns.apply(lambda col: col.str.strip() == '', axis=0).any(axis=1)
df_properties = df_properties[~rows_with_spaces].reset_index(drop=True)
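
The blank-row filter can be checked on a toy frame. This is only an illustrative sketch: the three rows below are invented (reusing titles that appear in the output above), and the logic mirrors the cell: strip each text column, flag rows where any text value is empty, and keep the rest.

```python
import pandas as pd

# Invented sample: the second and third rows each contain a blank text field
toy = pd.DataFrame({
    'Title': ['Bulnes 100', '   ', 'Mendoza 2300'],
    'StateName': ['Capital Federal', 'Buenos Aires', ''],
    'Bedrooms': [2, 3, 4],
})

# Only text columns can hold whitespace-only values
text_cols = toy.select_dtypes(include=['object', 'string'])

# True for every row where at least one text column is empty after stripping
blank_mask = text_cols.apply(lambda col: col.str.strip() == '').any(axis=1)

cleaned = toy[~blank_mask].reset_index(drop=True)
print(cleaned)
```

Numeric columns such as `Bedrooms` are ignored by the mask, so missing numbers still have to be handled by the null check that follows.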

In [126]:
df_properties.isnull().sum()

Bathrooms       28
Bedrooms        23
CountryName      0
Description      0
Environments    34
Latitude         0
Longitude        0
Price           12
Seniority        0
StateName        0
Title            0
img              0
dtype: int64

Dropping rows with null values

In [127]:
df_properties = df_properties.dropna().reset_index(drop=True)

In [128]:
df_properties.isnull().sum()

Bathrooms       0
Bedrooms        0
CountryName     0
Description     0
Environments    0
Latitude        0
Longitude       0
Price           0
Seniority       0
StateName       0
Title           0
img             0
dtype: int64

Function to get an address from latitude and longitude

In [129]:
geolocator = Nominatim(user_agent="geoapi_exercises")

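# Function to get the address
def obtener_direccion(lat, lon):
    if lat == 0.0 and lon == 0.0:
        return None  # (0, 0) marks a listing without coordinates
    try:
        location = geolocator.reverse((lat, lon))
        return location.address if location else None
    except Exception:
        # Treat failed lookups as missing so the later dropna() removes them
        # instead of storing an "Error: ..." string as the address
        return None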

In [130]:
df_properties['Address'] = df_properties.apply(lambda x: obtener_direccion(x['Latitude'], x['Longitude']), axis=1)
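
Reverse geocoding row by row sends one request per property, and the public Nominatim service asks clients to stay at or below one request per second. The sketch below wraps any reverse-lookup callable (for instance `geolocator.reverse`) with a delay and a small cache. The name `make_throttled_reverse` and its parameters are hypothetical, and it is written against a generic callable so it runs without network access.

```python
import time


def make_throttled_reverse(reverse, min_delay_seconds=1.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Wrap a reverse-geocoding callable with throttling and caching.

    Hypothetical helper: `reverse` is assumed to accept a (lat, lon) tuple
    and return an object with an `.address` attribute, or None.
    """
    cache = {}
    last_call = [None]  # time of the previous request, if any

    def lookup(lat, lon):
        if lat == 0.0 and lon == 0.0:
            return None  # placeholder coordinates, nothing to look up
        key = (round(lat, 5), round(lon, 5))
        if key in cache:
            return cache[key]  # repeated coordinates cost no request
        if last_call[0] is not None:
            wait = min_delay_seconds - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)  # space requests out
        try:
            location = reverse((lat, lon))
        except Exception:
            last_call[0] = clock()
            return None  # failed lookups are treated as missing
        last_call[0] = clock()
        cache[key] = location.address if location else None
        return cache[key]

    return lookup
```

It could then be used as `lookup = make_throttled_reverse(geolocator.reverse)` and applied per row. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that covers the throttling part.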

In [131]:
df_properties.dropna(inplace=True) 
df_properties.reset_index(drop=True, inplace=True)

Extracting the street number and/or address

In [132]:
df_properties['Address'] = df_properties['Address'].apply(lambda x: ', '.join(x.split(', ')[:3]))
df_properties = df_properties.drop(columns=['Latitude', 'Longitude']) # drop the Latitude and Longitude columns
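
The address shortening keeps only the first three comma-separated components of the geocoder result. A tiny sketch with a hypothetical helper `short_address`; the full address string is an assumed example whose first three parts match one of the rows shown later, while the trailing parts are invented.

```python
def short_address(full_address, parts=3):
    """Keep the first `parts` comma-separated components of an address."""
    return ', '.join(full_address.split(', ')[:parts])


# Assumed example of a full reverse-geocoded address
example = "1731, Entre Ríos, Martínez Oeste, Partido de San Isidro, Buenos Aires, Argentina"
print(short_address(example))
```

Addresses with fewer than three components are returned unchanged, since slicing past the end of a list is safe in Python.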

In [133]:
df_properties.isnull().sum()

Bathrooms       0
Bedrooms        0
CountryName     0
Description     0
Environments    0
Price           0
Seniority       0
StateName       0
Title           0
img             0
Address         0
dtype: int64

In [134]:
df_properties.dropna(inplace=True) 
df_properties.reset_index(drop=True, inplace=True)

Dropping properties that have no image

In [135]:
df_properties['img'] = df_properties['img'].astype(str)
df_properties.drop(df_properties[df_properties['img'] == '[]'].index, inplace=True)

In [136]:
df_properties.reset_index(drop=True, inplace=True)

Dropping meaningless titles

In [137]:
df_properties.drop(df_properties[df_properties['Title'].str.contains(r'U\$', na=False)].index, inplace=True)  # raw string avoids the invalid '\$' escape warning

Randomly selecting records (n=5 in this run; the target sample size is 200)

In [138]:
df_prop = df_properties.sample(n=5, random_state=5).reset_index(drop=True)

Adding the missing columns

In [139]:
columnas = [
    "Water", "Gas", "Surveillance", "Electricity", "Internet", 
    "Pool", "Garage", "Pets", "Grill", "Elevator", "Terrace",
    "IsHistoric", "IsWorking", "HasWarranty", "RangeSalary"
]

# Add the columns with their values
for column in columnas:
    if column in ["Water", "Gas", "Electricity"]:
        df_prop[column] = True  # These columns are always True
    elif column in ["Surveillance", "Pets", "Pool"]:
        df_prop[column] = np.random.choice([True, False], size=len(df_prop))
    elif column == "RangeSalary":
        # Generate salaries from a triangular distribution whose mass sits toward the low end (right-skewed)
        salary_range = np.random.triangular(left=400000, mode=1200000, right=3000000, size=len(df_prop))
        salary_range = salary_range.astype(int)  # Make sure the values are integers
        df_prop[column] = salary_range
    else:
        df_prop[column] = np.random.choice([True, False], size=len(df_prop), p=[0.8, 0.2])  # biased so True is more frequent than False
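
Because the mode (1,200,000) sits below the midpoint of the range, the triangular distribution used for `RangeSalary` has its long tail on the high side, so the mean ends up above the mode. The sketch below checks this on a larger sample. The seed and the sample size of 10,000 are arbitrary choices for illustration, and it uses NumPy's `Generator` API rather than the legacy `np.random` functions in the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for repeatability
left, mode, right = 400_000, 1_200_000, 3_000_000

# Draw synthetic salaries and truncate them to whole pesos
salaries = rng.triangular(left, mode, right, size=10_000).astype(int)

# For a triangular distribution the mean is the average of the three parameters
theoretical_mean = (left + mode + right) / 3
sample_mean = salaries.mean()

print(f"theoretical mean: {theoretical_mean:,.0f}")
print(f"sample mean:      {sample_mean:,.0f}")
print(f"mode parameter:   {mode:,}")
```

Passing a seeded generator also makes the synthetic columns reproducible between runs, which the unseeded `np.random.choice` calls are not.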

In [140]:
df_prop.loc[:, 'CreatedAt'] = datetime.now(pytz.UTC)
df_prop['UpdatedAt'] = None
df_prop['IsDeleted'] = False


Reordering columns

In [141]:
orden_columns = [
    "img","IsWorking", "HasWarranty", "RangeSalary", "CountryName", "StateName", "Title", "Address", "Price", "Environments", 
    "Bathrooms", "Bedrooms", "Seniority", "Water", "Gas", "Surveillance", "Electricity", "Internet", "Pool", 
    "Garage", "Pets", "Grill", "Elevator", "Terrace", "IsHistoric", "Description", "CreatedAt", "UpdatedAt",
    "IsDeleted"
]

df_prop = df_prop[orden_columns]

In [142]:
df_prop

Unnamed: 0,img,IsWorking,HasWarranty,RangeSalary,CountryName,StateName,Title,Address,Price,Environments,...,Garage,Pets,Grill,Elevator,Terrace,IsHistoric,Description,CreatedAt,UpdatedAt,IsDeleted
0,['https://static1.sosiva451.com/97313351/f4278...,True,False,1483413,Argentina,Buenos Aires,"Mayling, Chubut 415","Mayling, Pedro Medrano, Villa Rosa",1618576.0,4,...,True,True,False,False,True,False,Linda casa desarrollada en lote central con j...,2024-12-03 18:11:00.757102+00:00,,False
1,['https://static1.sosiva451.com/08190061/55ec7...,True,False,1375316,Argentina,Buenos Aires,Acacias,"Pasaje Espigon, Puertos del Lago, El Cazador",1213932.0,4,...,True,False,False,False,True,True,Exclusiva Casa en Venta Rodeada de Naturaleza ...,2024-12-03 18:11:00.757102+00:00,,False
2,['https://static1.sosiva451.com/61485361/a93cb...,True,False,1136947,Argentina,Buenos Aires,Pringles 4700,"4732, Coronel Pringles, Godoy Cruz",400000.0,3,...,True,False,True,True,True,True,ALQUILER\n\n Casa de 3 ambientes al frente\n\n...,2024-12-03 18:11:00.757102+00:00,,False
3,['https://static1.sosiva451.com/49944561/b4dfa...,True,True,1162191,Argentina,Buenos Aires,El Canton Norte,"El Cantón, El Cazador, Partido de Escobar",1517415.0,8,...,True,True,True,True,True,False,ALQUILER DE CASA EN CANTON NORTE CON PILETA\n\...,2024-12-03 18:11:00.757102+00:00,,False
4,['https://static1.sosiva451.com/02629561/cf4f0...,True,False,1733894,Argentina,Buenos Aires,Entre Rios 1700,"1731, Entre Ríos, Martínez Oeste",1700000.0,4,...,True,True,True,True,True,True,"Alquiler de Casa 4 AMBIENTES en Martínez, San ...",2024-12-03 18:11:00.757102+00:00,,False


In [143]:
# Save the DataFrame to a CSV file with UTF-8-SIG encoding (pandas quotes fields such as descriptions that contain line breaks or the separator)
#df_prop.to_csv('df_prop.csv', index=False, encoding='utf-8-sig', sep=';')

#print("CSV file saved successfully with the descriptions intact.")

### 2.2 Users data table

Reading fictitious data generated by AI

In [194]:
csv_path = 'Data_ficticia\\Users_Ficticios_IA.csv'
df_users = pd.read_csv(csv_path)

Randomly selecting users (n=10 in this run; the target sample size is 400)

In [195]:
df_users = df_users.sample(n=10, random_state=5).reset_index(drop=True)

Creating the missing columns

In [196]:
df_users.insert(0, 'Id', range(1, len(df_users) + 1))
df_users.insert(1, 'CountryName', 'Argentina')


In [197]:
df_users.loc[:, 'CreatedAt'] = datetime.now(pytz.UTC)
df_users['UpdatedAt'] = None
df_users['IsDeleted'] = False

In [198]:
df_users

Unnamed: 0,Id,CountryName,Name,LastName,Dni,Phone,Address,BirthDate,Email,Password,CreatedAt,UpdatedAt,IsDeleted
0,1,Argentina,Trinidad,Farre,92309378,+54 9 11-5516-2964,"Via de Benjamín Egea 457, Resistencia, Argentina",1971-04-03,trinidad.farre@gmail.com,@Farre03,2024-12-03 18:37:14.637420+00:00,,False
1,2,Argentina,Victorino,Valentin,19496248,+54 9 21-5986-6054,"Callejón de Damián Maldonado 245, San Salvador...",1999-05-20,victorino.valentin@gmail.com,<Valentin20,2024-12-03 18:37:14.637420+00:00,,False
2,3,Argentina,Rosenda,Barreda,12697037,+54 9 21-5297-7790,"Via de Patricia Yáñez 191, San Miguel de Tucum...",1980-01-20,rosenda.barreda@gmail.com,@Barreda20,2024-12-03 18:37:14.637420+00:00,,False
3,4,Argentina,Panfilo,Vergara,57455334,+54 9 21-4995-5021,"Avenida de Inmaculada Farré 57, Río Gallegos, ...",1952-07-29,panfilo.vergara@gmail.com,@Vergara29,2024-12-03 18:37:14.637420+00:00,,False
4,5,Argentina,Lucia,Arrieta,16391990,+54 9 11-4236-3158,"Callejón Clara Palomo 254, Resistencia, Argentina",1953-09-08,lucia.arrieta@gmail.com,]Arrieta08,2024-12-03 18:37:14.637420+00:00,,False
5,6,Argentina,Cloe,Pinol,38504574,+54 9 23-4978-8992,"Alameda de Eusebio Tovar 172, San Salvador de ...",1980-02-14,cloe.pinol@gmail.com,>Pinol14,2024-12-03 18:37:14.637420+00:00,,False
6,7,Argentina,Ani,Martinez,45999479,+54 9 20-5488-1454,"Calle Imelda Quevedo 887, Bahía Blanca, Argentina",1984-10-06,ani.martinez@gmail.com,#Martinez06,2024-12-03 18:37:14.637420+00:00,,False
7,8,Argentina,Silvia,Matas,41942604,+54 9 12-4898-4106,"Paseo de Silvia Luján 439, Trelew, Argentina",1987-02-18,silvia.matas@gmail.com,.Matas18,2024-12-03 18:37:14.637420+00:00,,False
8,9,Argentina,Chuy,Larranaga,39788013,+54 9 15-4249-7155,"Cuesta de Andrés Palomar 835, Mendoza, Argentina",1960-07-31,chuy.larranaga@gmail.com,{Larranaga31,2024-12-03 18:37:14.637420+00:00,,False
9,10,Argentina,Pacifica,Valcarcel,11747146,+54 9 12-6644-4604,"Urbanización Débora Roma 930, Posadas, Argentina",1967-01-23,pacifica.valcarcel@gmail.com,?Valcarcel23,2024-12-03 18:37:14.637420+00:00,,False


### 2.3 UsersTenantsInfo data table