## Week 4 Project

### Introduction/Business Problem

A well-known real estate company in Guayaquil, Ecuador, needs a tool that, given a client's requirements (house size, price, crime rate, nearby places of interest, among others), identifies the area or areas in which to recommend a property. A client may be satisfied with the size of a house and its number of rooms but also need it to be close to cafes and restaurants, for example a very busy family that always has lunch and dinner away from home. Another client may want parks or museums nearby because they have young children who need places for recreation.

In short, the objective is to produce a list of options from which the client can choose a house, drawn from a segment of properties that meet the client's requirements. This analysis saves the real estate company search time, since it would otherwise have to look up nearby places of interest one by one in some tool. It also reduces costs and increases returns, because suitable properties can be presented to the client immediately and the company can move on to other clients. The tool also has marketing value: the company could present itself as one of the best performers in the Ecuadorian market at finding suitable properties for the end customer.

### Data section

This project uses a CSV file of data collected from olx.com.ec, an online marketplace in the style of Amazon or eBay but more local in scope, with shipping limited to within each country where it operates. The data is a sample of the houses for sale published by OLX users, obtained at the beginning of the year (cutoff date January 25, 2021). Listings almost always include the location, the price, and the size of the land on which the house is built.

The CSV contains the following fields (the column names in the file are in Spanish: `Codigo`, `Sector`, `Latitud`, `Longitud`, `Tamaño`, `Precio`):
* **Code:** Unique code for each house for sale recorded in the CSV (generated artificially to serve as a primary key)
* **Sector:** Name of the neighborhood where the house for sale is located
* **Latitude:** Latitude of the house for sale
* **Longitude:** Longitude of the house for sale
* **Size:** Size in square meters of the land on which the house is built
* **Price:** Price published by the owner or seller of the property

In [4]:
import pandas as pd

# Raw string so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r'D:\Documentos\houses_guayaquil.csv', sep=';')
df.head()

| Unnamed: 0 | Codigo | Sector | Latitud | Longitud | Tamaño | Precio |
|---|---|---|---|---|---|---|
| 0 | AC247 | Acacias | -22417 | -799008 | 309.0 | 110000.0 |
| 1 | AC246 | Acacias | -22467 | -798966 | 308.0 | 145000.0 |
| 2 | AC129 | Acacias | -22465 | -799026 | 194.0 | 185000.0 |
| 3 | AR308 | Acuarelas del Rio | -21299 | -798809 | 170.0 | 165000.0 |
| 4 | AR380 | Acuarelas del Rio | -21351 | -798825 | 230.0 | 176000.0 |
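
The preview suggests a few cleanup steps before analysis. The sketch below is one possible approach and not necessarily what the project does: it drops the leftover `Unnamed: 0` index column, renames the Spanish columns to the English field names listed earlier, and restores the decimal point in the coordinates. The last step rests on an assumption: Guayaquil lies near latitude -2.2 and longitude -79.9, so values such as -22417 and -799008 are treated as -2.2417 and -79.9008. The helper `restore_decimal` and the frame name `houses` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Three rows copied from the preview; the real CSV has more records.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Codigo": ["AC247", "AC246", "AC129"],
    "Sector": ["Acacias", "Acacias", "Acacias"],
    "Latitud": [-22417, -22467, -22465],
    "Longitud": [-799008, -798966, -799026],
    "Tamaño": [309.0, 308.0, 194.0],
    "Precio": [110000.0, 145000.0, 185000.0],
})


def restore_decimal(value, integer_digits):
    """Re-insert a decimal point after `integer_digits` leading digits.

    Assumption: the decimal separator was lost on import, so -22417 with
    one integer digit is really -2.2417.
    """
    sign = -1 if value < 0 else 1
    digits = str(abs(int(value)))
    return sign * float(digits[:integer_digits] + "." + digits[integer_digits:])


houses = (
    raw.drop(columns=["Unnamed: 0"])  # leftover index from an earlier export
       .rename(columns={
           "Codigo": "Code",
           "Latitud": "Latitude",
           "Longitud": "Longitude",
           "Tamaño": "Size",
           "Precio": "Price",
       })
)

# Latitudes near Guayaquil have one integer digit, longitudes have two.
houses["Latitude"] = houses["Latitude"].map(lambda v: restore_decimal(v, 1))
houses["Longitude"] = houses["Longitude"].map(lambda v: restore_decimal(v, 2))
```

Placing the decimal point by digit position, and not by dividing by a fixed power of ten, keeps the result correct even if a trailing zero was dropped from a value.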


This database, a sample of the houses published on OLX, has several limitations:
1. It is a snapshot in time of the published listings, which are updated daily, so trends may change. Ideally, an automatic clustering model would regenerate the groups on demand from the houses currently for sale, but for the purposes of this study the analysis uses this CSV.
2. Neither the land size nor the price is measured by an appraiser, so both can be biased by the seller. For example, a seller may list a price well above market value, or state a land size larger than it actually is. Listed prices are also sometimes set well above the expected sale price because bargaining is customary in Ecuador and the price is negotiable.
3. The data does not cover all houses for sale; a master database of every house currently for sale in Guayaquil would be needed. There is also a large amount of missing data: the number of rooms, bathrooms, and similar attributes could be included in the model, but most sellers do not fill in this information, so these variables are not considered in the study. A richer database for Guayaquil with all the necessary fields would give a more accurate picture of reality.
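
Because prices and land sizes are self-reported, one simple check is to compute the price per square meter and flag listings that fall far from the rest. The sketch below uses the interquartile range (IQR) rule, which is a common convention chosen here for illustration and not a method stated in the project. It assumes columns named `Size` and `Price` as in the field list; the function name `flag_price_outliers` and the multiplier `k=1.5` are hypothetical choices.

```python
import pandas as pd

# The five listings from the preview, using the English field names.
houses = pd.DataFrame({
    "Code": ["AC247", "AC246", "AC129", "AR308", "AR380"],
    "Size": [309.0, 308.0, 194.0, 170.0, 230.0],
    "Price": [110000.0, 145000.0, 185000.0, 165000.0, 176000.0],
})


def flag_price_outliers(df, k=1.5):
    """Return a boolean Series marking listings whose price per square
    meter lies more than k * IQR outside the first or third quartile."""
    price_per_m2 = df["Price"] / df["Size"]
    q1 = price_per_m2.quantile(0.25)
    q3 = price_per_m2.quantile(0.75)
    iqr = q3 - q1
    return (price_per_m2 < q1 - k * iqr) | (price_per_m2 > q3 + k * iqr)


houses["price_per_m2"] = houses["Price"] / houses["Size"]
houses["suspect_price"] = flag_price_outliers(houses)
```

A flagged listing is not necessarily wrong; it is only a candidate for manual review, since a high asking price may simply reflect room left for negotiation.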

Another data source used in this project is the Foursquare API, which provides nearby places of interest, such as coffee shops and parks, for the location of each house for sale. The two data sources complement each other in segmenting the houses for sale in Guayaquil and meeting the needs of the real estate company's clients.
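
As a rough illustration of how one house location could be turned into a Foursquare query, the sketch below only builds the request URL and does not send it. It assumes the legacy v2 `venues/explore` endpoint, which may no longer be available to new accounts; the project does not state which endpoint it uses. The function name `build_explore_request`, the 500 m radius, the limit of 100, and the version date are hypothetical values, and `CLIENT_ID` / `CLIENT_SECRET` are placeholders for real credentials.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy Foursquare v2 "explore" API.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_request(lat, lng, client_id, client_secret,
                          version="20210125", radius_m=500, limit=100):
    """Build the URL that asks for venues around one house location."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date in YYYYMMDD format
        "ll": f"{lat},{lng}",  # latitude,longitude of the house
        "radius": radius_m,    # search radius in meters
        "limit": limit,        # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Coordinates of the first listing, assuming decimal degrees.
url = build_explore_request(-2.2417, -79.9008, "CLIENT_ID", "CLIENT_SECRET")
```

The resulting URL could then be fetched once per house, for example with `requests.get(url).json()`, and the returned venue categories counted to describe each location.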

Since there is no crime rate registry for each sector in which houses are for sale, that data source will not be used. Crime studies in Guayaquil are limited to certain specific sectors and do not cover the city as a whole. Adding crime data could be a good starting point for a second phase of this project.

Next, the database of houses for sale will be cleaned so the study can continue, the connection to the Foursquare API will be set up, and the places will be segmented. This work will continue in the last week of the Applied Data Science Capstone course, with its final report.
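
To give an idea of what the segmentation step might look like, the sketch below groups the five previewed listings with k-means on standardized size and price. Everything here is an assumption made for illustration: the project does not name a library or algorithm settings, so scikit-learn, `n_clusters=2`, and the choice of features are placeholders. In the full study, venue counts from Foursquare would presumably be added as extra feature columns.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# The five listings from the preview, using the English field names.
houses = pd.DataFrame({
    "Code": ["AC247", "AC246", "AC129", "AR308", "AR380"],
    "Size": [309.0, 308.0, 194.0, 170.0, 230.0],
    "Price": [110000.0, 145000.0, 185000.0, 165000.0, 176000.0],
})

# Standardize so that price (in the hundreds of thousands) does not
# dominate size (in the hundreds) when distances are computed.
features = StandardScaler().fit_transform(houses[["Size", "Price"]])

# n_clusters=2 is an arbitrary choice for this tiny sample.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
houses["Segment"] = model.fit_predict(features)
```

With the real data set, the number of clusters would need to be chosen deliberately, for example by comparing inertia or silhouette scores across several values.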