# Property Click Prediction

------------------------------

### Company: NoBroker

`This data project has been used as a take-home assignment in the recruitment process for the data science positions at NoBroker Data Sciences.`

Properties are one of the most important data entities in the NoBroker data ecosystem.

Properties receive interactions on NoBroker. One interaction is defined as one user requesting an owner contact on a property.

A property can receive 0 to many interactions. This research will focus on studying and modelling the interactions received by properties.


## Assignment
We are interested in studying and statistically modelling property interactions. We would like a predictive model that estimates the number of interactions a property will receive in a given period of time.

For simplicity let’s say we would like to predict the number of interactions that a property would receive within 3 days of its activation and 7 days of its activation.

However, this part is open ended and you could bring your own time intervals into the problem. This is where your artistry in data science comes in.
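The 3-day and 7-day targets can be derived by comparing each interaction timestamp with the property's activation timestamp. The following is a minimal sketch on toy data, assuming columns named `property_id`, `activation_date` and `request_date` that are already parsed as datetimes; the helper `interactions_within` and the toy rows are hypothetical, not part of the assignment.

```python
import pandas as pd


def interactions_within(props, inter, days):
    """Count interactions per property that fall within `days` of activation."""
    merged = inter.merge(
        props[["property_id", "activation_date"]], on="property_id", how="inner"
    )
    elapsed = merged["request_date"] - merged["activation_date"]
    # Keep only requests made after activation and inside the window.
    in_window = merged[(elapsed >= pd.Timedelta(0)) & (elapsed <= pd.Timedelta(days=days))]
    counts = in_window.groupby("property_id").size()
    # Properties without any interaction in the window get a count of 0.
    return counts.reindex(props["property_id"], fill_value=0).rename(f"interactions_{days}d")


# Toy data (hypothetical rows, for illustration only).
props = pd.DataFrame({
    "property_id": ["a", "b", "c"],
    "activation_date": pd.to_datetime(
        ["2017-03-09 14:36", "2017-03-07 12:02", "2017-03-10 13:43"]
    ),
})
inter = pd.DataFrame({
    "property_id": ["a", "a", "a", "b"],
    "request_date": pd.to_datetime([
        "2017-03-09 15:51:17", "2017-03-10 17:30:22",
        "2017-03-14 09:00:00", "2017-03-30 19:59:15",
    ]),
})

# One target column per window; other windows can be added to the tuple.
targets = pd.concat([interactions_within(props, inter, d) for d in (3, 7)], axis=1)
print(targets)
```

Because the window length is a parameter, other intervals (for example 1 or 14 days) could be tried in the same way.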

In the end we need to predict the number of interactions that a certain kind of property would receive within a given number of days. We cannot do time series forecasting here, given the limited amount of data that can be shared as part of an assignment. You may clean the data, merge the data sets, do an EDA, visualize and build your model.


## Data Description
Unzip the datasets.zip file to find the following 3 data sets:

**property_data_set.csv:**
- Properties data containing various features like activation_date, BHK type, locality, property size, property age, rent, apartment type etc.
- activation_date is the date property got activated on NoBroker. Fields like lift, gym etc are binary valued - 1 indicating presence and 0 indicating absence. All other fields are self-explanatory.
- You may use these along with the rest of the data sets to engineer the features that you would use in your study
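The `activation_date` values printed later in this notebook (for example `15-03-2017 18:29`) appear to be day-first, so relying on default date parsing could swap day and month. A minimal sketch, assuming the whole column follows the `DD-MM-YYYY HH:MM` pattern seen in those sample rows:

```python
import pandas as pd

# Two values copied from the sample rows shown in the notebook output.
sample = pd.DataFrame({"activation_date": ["09-03-2017 14:36", "15-03-2017 18:29"]})

# An explicit format avoids ambiguous month/day guessing (assumed pattern: DD-MM-YYYY HH:MM).
sample["activation_date"] = pd.to_datetime(
    sample["activation_date"], format="%d-%m-%Y %H:%M"
)
print(sample["activation_date"].dt.day.tolist(), sample["activation_date"].dt.month.tolist())
```

If some rows in the real file deviate from this pattern, `pd.to_datetime` raises an error with an explicit `format`, which makes such rows easy to spot.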

**property_photos.tsv:**
- Data containing photo counts of properties
- photo_urls column contains string values that you have to parse to obtain the number of photos uploaded on a property
- Each value in the photo_urls column is supposed to be a string representation of a JSON array (in Python terms, a list of dictionaries) where each JSON object represents one image. However, due to some unforeseen events, these values got corrupted and are no longer valid JSON arrays. You can see this if you look at the data closely.
`Hint`: There is a missing `"` before `title` in the first JSON object of each value. There is also an extra `"` at the end of each value. You must also remove all the backslashes to get a valid JSON representation.
- Your objective is to get the number of photos uploaded for a property. For this you should correct the corrupt string and make it a valid json. Once you have a valid json string, you can get the length of this array, which would be the number of photos uploaded on the property.

- Also note that these are not images, just names that we use to point to images. You are NOT given the images, nor do we expect you to have them. All you are expected to do is get the number of photos on each property by cleaning up the corrupt JSON array string.

- NULL/NaN values indicate absence of photos on the property, i.e. photo_count = 0
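The clean-up described in the hint can be sketched as below. This is a minimal sketch, assuming the three defects named in the hint are the only ones present; the function name `count_photos` and the shortened sample value are hypothetical.

```python
import json

import pandas as pd


def count_photos(raw):
    """Return the number of photos encoded in one corrupted photo_urls value."""
    # NULL/NaN means the property has no photos.
    if pd.isna(raw):
        return 0
    cleaned = raw.replace("\\", "")                      # drop every backslash
    cleaned = cleaned.replace("[{title", '[{"title', 1)  # restore the missing opening quote
    if cleaned.endswith('"'):                            # drop the stray trailing quote
        cleaned = cleaned[:-1]
    return len(json.loads(cleaned))


# Hypothetical, shortened value with the same corruption pattern as the real data.
sample = r'[{\title\":\"Bedroom\",\"name\":\"a.jpg\",\"displayPic\":false},{\"title\":\"Balcony\",\"name\":\"b.jpg\",\"displayPic\":false}]"'

photo_counts = pd.Series([sample, None]).map(count_photos)
print(photo_counts.tolist())
```

On the real file the same function could be mapped over the photo_urls column; any value that still fails `json.loads` would point to a corruption pattern the hint does not cover.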

**property_interactions.csv:**
- Data containing the timestamps of interactions on the properties.
- Each request_date value represents the timestamp of a unique valid interaction on a property (a contact-owner request happened and the user received the owner's contact phone number)
- Therefore, counting the number of times each property appears in this table gives the number of interactions received on that property
- You will use this request_date along with the activation_date in our first table and other features in our study
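Since a property with no interactions never appears in this table, the per-property count has to be joined back onto the full property list and missing values filled with 0. A minimal sketch on toy data, assuming a shared `property_id` key; the column name `total_interactions` and the toy rows are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the property and interaction tables (hypothetical rows).
props = pd.DataFrame({"property_id": ["a", "b", "c"]})
inter = pd.DataFrame({
    "property_id": ["a", "a", "b"],
    "request_date": pd.to_datetime(
        ["2017-03-09 15:51:17", "2017-03-10 17:30:22", "2017-03-30 19:59:15"]
    ),
})

# Number of rows per property = number of interactions it received.
counts = inter["property_id"].value_counts()

# Map counts onto every property; those absent from `inter` received 0 interactions.
props["total_interactions"] = props["property_id"].map(counts).fillna(0).astype(int)
print(props)
```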

## Practicalities
Please go through all the instructions and data descriptions carefully before getting started.

We DO NOT look only at your final model and its performance; we look for your research mindset, your curiosity about data, your enthusiasm to collaborate, and whether your way of working fits our DS culture. We therefore urge you to present your work to the standards followed in the data science community. Keep an open mind about the problem and feel free to approach the data in whatever way you think suits it. We encourage you to try out different methodologies and present your results.

You should take 72 hours or less on this problem. This is a research assignment, so please don’t rush to produce a full-fledged submission. We believe in quality, and we believe great things are built a small bit at a time. So present everything you have done, and strive for good quality in it.



In [1]:
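import numpy as np
import pandas as pd

# Read the three data sets.
# Note: `property` shadows the Python built-in of the same name, and `interations`
# is a misspelling of "interactions"; both names are kept because later cells use them.
property = pd.read_csv('./datasets/property_data_set.csv')
photos = pd.read_csv('./datasets/property_photos.tsv', delimiter='\t')
interations = pd.read_csv('./datasets/property_interactions.csv')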

In [2]:
display(property.head())
display(property.property_id.nunique())

| Unnamed: 0 | property_id | type | activation_date | bathroom | floor | total_floor | furnishing | gym | latitude | longitude | ... | lift | locality | parking | property_age | property_size | swimming_pool | pin_code | rent | deposit | building_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ff808081469fd6e20146a5af948000ea | BHK2 | 09-03-2017 14:36 | 1 | 3 | 4.0 | SEMI_FURNISHED | 1 | 12.876174 | 77.596571 | ... | 1 | Hulimavu | BOTH | 2 | 850 | 1 | 560076.0 | 12000 | 120000 | AP |
| 1 | ff8080814702d3d10147068359d200cd | BHK2 | 07-03-2017 12:02 | 2 | 4 | 11.0 | SEMI_FURNISHED | 1 | 13.018444 | 77.678122 | ... | 1 | Ramamurthy Nagar | BOTH | 1 | 1233 | 1 | 560016.0 | 20000 | 150000 | AP |
| 2 | ff808081470c645401470fb03f5800a6 | BHK2 | 10-03-2017 13:43 | 2 | 0 | 4.0 | NOT_FURNISHED | 1 | 12.975072 | 77.665865 | ... | 1 | GM Palya | FOUR_WHEELER | 0 | 1200 | 0 | 560075.0 | 15000 | 75000 | AP |
| 3 | ff808081470c6454014715eaa5960281 | BHK3 | 09-03-2017 22:16 | 2 | 3 | 4.0 | SEMI_FURNISHED | 0 | 12.888169 | 77.591282 | ... | 0 | Arakere | BOTH | 1 | 1300 | 0 | 560076.0 | 17000 | 150000 | AP |
| 4 | ff808081474aa867014771a0298f0aa6 | BHK1 | 15-03-2017 18:29 | 1 | 1 | 2.0 | SEMI_FURNISHED | 0 | 12.990243 | 77.712962 | ... | 0 | Hoodi | BOTH | 4 | 450 | 0 | 560048.0 | 6500 | 40000 | IF |


28888

In [7]:
# Row 3, column 1 by position (avoids integer-key lookup on a label-indexed Series)
display(photos.iloc[3, 1])

'[{\\title\\":\\"Bedroom\\",\\"name\\":\\"Screenshot_7.jpg\\",\\"imagesMap\\":{\\"medium\\":\\"ff808081470c6454014715eaa5960281_77976_medium.jpg\\",\\"large\\":\\"ff808081470c6454014715eaa5960281_77976_large.jpg\\",\\"original\\":\\"ff808081470c6454014715eaa5960281_77976_original.jpg\\",\\"thumbnail\\":\\"ff808081470c6454014715eaa5960281_77976_thumbnail.jpg\\"},\\"displayPic\\":false},{\\"title\\":\\"Balcony\\",\\"name\\":\\"Screenshot_4.jpg\\",\\"imagesMap\\":{\\"medium\\":\\"ff808081470c6454014715eaa5960281_29075_medium.jpg\\",\\"large\\":\\"ff808081470c6454014715eaa5960281_29075_large.jpg\\",\\"original\\":\\"ff808081470c6454014715eaa5960281_29075_original.jpg\\",\\"thumbnail\\":\\"ff808081470c6454014715eaa5960281_29075_thumbnail.jpg\\"},\\"displayPic\\":false},{\\"title\\":\\"Balcony\\",\\"name\\":\\"Screenshot_6.jpg\\",\\"imagesMap\\":{\\"medium\\":\\"ff808081470c6454014715eaa5960281_45408_medium.jpg\\",\\"large\\":\\"ff808081470c6454014715eaa5960281_45408_large.jpg\\",\\"original

In [16]:
interations

| Unnamed: 0 | property_id | request_date |
|---|---|---|
| 0 | ff808081469fd6e20146a5af948000ea | 2017-03-10 17:42:34 |
| 1 | ff808081469fd6e20146a5af948000ea | 2017-03-09 15:51:17 |
| 2 | ff808081469fd6e20146a5af948000ea | 2017-03-10 17:30:22 |
| 3 | ff808081469fd6e20146a5af948000ea | 2017-03-11 17:48:46 |
| 4 | ff8080814702d3d10147068359d200cd | 2017-03-30 19:59:15 |
| ... | ... | ... |
| 170606 | ff8081815b2007fc015b201c77a20395 | 2017-04-03 16:13:55 |
| 170607 | ff8081815b2007fc015b201c77a20395 | 2017-04-02 21:54:14 |
| 170608 | ff8081815b2007fc015b201c77a20395 | 2017-04-09 11:33:14 |
| 170609 | ff8081815b2007fc015b201c77a20395 | 2017-04-04 10:01:12 |


In [14]:
# Row 0, column 1 by position
photos.iloc[0, 1]

'[{\\title\\":\\"Balcony\\",\\"name\\":\\"IMG_20131006_120837.jpg\\",\\"imagesMap\\":{\\"original\\":\\"ff808081469fd6e20146a5af948000ea_65149_original.jpg\\",\\"thumbnail\\":\\"ff808081469fd6e20146a5af948000ea_65149_thumbnail.jpg\\",\\"medium\\":\\"ff808081469fd6e20146a5af948000ea_65149_medium.jpg\\",\\"large\\":\\"ff808081469fd6e20146a5af948000ea_65149_large.jpg\\"},\\"displayPic\\":false},{\\"title\\":\\"Bathroom\\",\\"name\\":\\"IMG_20131006_120734.jpg\\",\\"imagesMap\\":{\\"original\\":\\"ff808081469fd6e20146a5af948000ea_63511_original.jpg\\",\\"thumbnail\\":\\"ff808081469fd6e20146a5af948000ea_63511_thumbnail.jpg\\",\\"medium\\":\\"ff808081469fd6e20146a5af948000ea_63511_medium.jpg\\",\\"large\\":\\"ff808081469fd6e20146a5af948000ea_63511_large.jpg\\"},\\"displayPic\\":false},{\\"title\\":\\"Bedroom\\",\\"name\\":\\"IMG_20131006_120643.jpg\\",\\"imagesMap\\":{\\"original\\":\\"ff808081469fd6e20146a5af948000ea_16708_original.jpg\\",\\"thumbnail\\":\\"ff808081469fd6e20146a5af948000ea_