# Workshop We Work Santiago

### Objetivo de hoy

2018: Estamos lanzando una startup PropTech. Necesitamos recolectar información sobre los arriendos en Santiago. 

![Se arriendo](./Arriendo.jpeg)

👉 Nuestra idea: entrar en [Portal Inmobiliario](https://www.portalinmobiliario.com) y sacar toda la info!

> A mano? 😳 🤯

> Definitivamente NO! 😉

Para eso, necesitaremos aprender sobre:

- Estructura de datos en *Python*: listas y diccionarios
- Recolección de datos utilizando WebScraping (bs4)
- Visualización utilizando librerías de Python (Plotly / Seaborn)
- Ejemplo de *no code*

### Estructura de Datos

**Listas**

- Indice (posición)
- Puedo leer, agregar, modificar o eliminar

In [1]:
students = ["Sebas", "Fede", "Camila"]

In [2]:
age = [32, 28, 26]

In [3]:
age[0]

32

**Diccionarios**

- Parejas: `clave` : `valor`
- No tiene indices
- Las claves son únicas

In [4]:
{'name': 'Sebas', 'age': 32}

{'age': 32, 'name': 'Sebas'}

**Entonces...**

In [5]:
students = [
    {'name': 'Sebas', 'age': 32},
    {'name': 'Fede', 'age': 29},
    {'name': 'Camila', 'age': 26}
]

### Web 101

![HTTP Request](./Web.png)

---

![HTML Tag](./Tags.png)

### OK, let's go!

##### Importamos la librerías de Python que necesitamos

In [8]:
import requests
import numpy as np
from bs4 import BeautifulSoup
import re

Pedido de información a la web:

In [9]:
url = "https://www.portalinmobiliario.com/arriendo/departamento/santiago-metropolitana/"
response = requests.get(url)
soup = BeautifulSoup(response.content)

In [10]:
pages = np.arange(1, 40*50, 50).tolist()

##### Funcion de recolección

In [29]:
def transform_html_to_data(soup):
    restaurants_data = soup.find_all(class_='ui-search-layout__item')
    restaurants = []
    for restaurant in restaurants_data:
        price = restaurant.find('span', class_='andes-money-amount__fraction').text
        price = int(price.replace(".", ""))
        address = restaurant.find(class_='ui-search-item__group__element ui-search-item__location shops__items-group-details').text
        space_information = restaurant.find(class_='ui-search-item__group ui-search-item__group--attributes shops__items-group').text
        if space_information:
            size = re.search(r'(\d+) m', space_information)
            if size:
                size = int(size.group(1))
            rooms = re.search(r'(\d+) dormitorio', space_information)
            if rooms:
                rooms = int(rooms.group(1))
        data = {'price (CLP)': price, 'rooms': rooms, 'size (m2)': size, 'address': address}
        restaurants.append(data)

    return restaurants

##### Iteramos según cuantas `pages` haya disponibles

In [30]:
restaurants_list = []
for page in pages:
    url = "https://www.portalinmobiliario.com/arriendo/departamento/santiago-metropolitana/_Desde_" + str(page) + "_NoIndex_True"
    response = requests.get(url)
    soup = BeautifulSoup(response.content)
    # {'price (CLP)': price, 'rooms': rooms, 'size (m2)': size, 'address': address}
    restaurants_list += transform_html_to_data(soup)

AttributeError: 'NoneType' object has no attribute 'text'

In [31]:
restaurants_list[0]

{'address': u'Santo Domingo 3251, Santiago, Chile, Barrio Yungay, Santiago',
 'price (CLP)': 256608,
 'rooms': 2,
 'size (m2)': 66}

**Cuantos departamentos pudimos recuperar?**

In [None]:
len(restaurants_list)

In [None]:
restaurants_list

##### Transformamos los datos a un DataFrame

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame(restaurants_list)
df.head()

In [None]:
df.shape

##### Limpiamos la información

- Falta de información

In [None]:
df.isna().sum()

In [None]:
df.dropna(inplace=True)
df.shape

- Outliers

> Precio de arriendo menor a 2.5 MCLP

In [None]:
condition = df['price (CLP)'] < 2_500_000
df = df[condition]

In [None]:
df.shape

> Precio mayor a 1000 CLP

In [None]:
condition = df['price (CLP)'] > 1000
df = df[condition]

In [None]:
df.shape

> Menos de 5 piezas

In [None]:
condition = df['rooms'] < 5 
df = df[condition]

In [None]:
df.shape

### Data Visualization

In [None]:
import seaborn as sns
import plotly.express as px

In [None]:
fig = px.scatter(df, x="size (m2)", y="price (CLP)", size="rooms", title="Precio vs. Tamaño", width=800, height=400)
fig.show()

In [None]:
sns.countplot(x="rooms", data=df)
sns.set(rc={'figure.figsize':(15, 6)})

In [None]:
sns.catplot(x='rooms', y='price (CLP)', data=df, kind="box")

In [None]:
sns.regplot(x='size (m2)', y='price (CLP)', data=df)
sns.set(rc={'figure.figsize':(15, 6)})

In [None]:
condition = df['size (m2)'] < 200
df_max_size_200 = df[condition]

In [None]:
df.shape

In [None]:
df_max_size_200.shape

In [None]:
sns.regplot(x='size (m2)', y='price (CLP)', data=df_max_size_200, color='green')

##### Exportemos a CSV

Aprovechando la librería de **Pandas**, utilizamos solo `.to_csv()`

In [None]:
df_max_size_200.to_csv('./RegionMetropolitana.csv')

### OK, let's do `NO CODE`

[Browse AI](https://dashboard.browse.ai/tasks)