<a id='TOC'></a>

# Project: Investigate the Airbnb Datasets of Boston and Seattle

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#business_understanding">Business Understanding</a></li>
<li><a href="#data_understanding">Data Understanding</a></li>
<li><a href="#data_preparation">Data Preparation</a></li>
<li><a href="#modeling">Modeling</a></li>    
<li><a href="#results_evaluation">Results Evaluation</a></li>
<li><a href="#deploy_solution">Deployment</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

Airbnb is a publicly listed company in the lodging industry. It is headquartered in San Francisco and operates in multiple countries.
<br>
It runs an online marketplace for rentals.
<br><br>
The dataset covers two cities, Boston and Seattle. Each city has three CSV (comma-separated values) files: a rental availability calendar, the available listings, and reviews.
<br><br>
The objective is to discover actionable insights in the available data so that stakeholders can use them to inform business decisions.


<li><a href="#TOC">Back To Table Of Contents</a></li>

Import the general-purpose packages and plotting libraries that will be used with all the datasets.

In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn


<a id='business_understanding'></a>
## Business Understanding

Airbnb's business model is based on earning commission for providing rental listings to end users. People who want to rent out a property put up one or more listings on the Airbnb website, and customers book them through Airbnb.
<br><br>
Based on the available data, the objective of this analysis is to discover the factors behind several situations. For example: Which neighborhoods are always overbooked? What exactly causes that? Is it the surrounding area, or are the rental facilities very good? Knowing this can tell Airbnb how to approach the situation. Maybe the listings are not very good, or the service provided is not up to the mark. Whatever the reason, understanding the environmental factors is the first step toward improvement.

<li><a href="#TOC">Back To Table Of Contents</a></li>

<a id='data_understanding'></a>
## Data Understanding

> **Tip**: In this section of the report, you will load the data and assess how appropriate it is for further analysis of the observations noted in the Business Understanding phase.



### General Properties

Copy and paste the following cell to create multiple dataframes as needed.


In [3]:
#df_name = pd.read_csv(......., sep='...')
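
A minimal sketch of creating one dataframe per city and combining them. The column names and sample rows are hypothetical stand-ins, and the data is read from in-memory strings so the example runs on its own; with the real files, a path on disk would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-ins for each city's listings file (columns are assumptions).
boston_csv = io.StringIO("id,price\n1,$100.00\n2,$85.00\n")
seattle_csv = io.StringIO("id,price\n10,$120.00\n")

frames = {}
for city, source in {"boston": boston_csv, "seattle": seattle_csv}.items():
    df = pd.read_csv(source, sep=",")
    df["city"] = city  # tag each row so the cities can be told apart later
    frames[city] = df

# Stack both cities into a single dataframe with a fresh index.
listings = pd.concat(list(frames.values()), ignore_index=True)
print(listings.head())
```

Tagging each row with its city keeps the option of analyzing the cities separately or together from a single dataframe.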

###### Load your data and print out a few lines. Perform operations to inspect data types and look for instances of missing or possibly errant data.


In [4]:
df_name.head()

NameError: name 'df_name' is not defined

In [None]:
df_name.tail()

In [3]:
df_name.sample(5)

NameError: name 'df' is not defined

<li><a href="#TOC">Back To Table Of Contents</a></li>

### Assess

###### Perform rudimentary data assessment

In [None]:
df_name.info()

In [None]:
df_name.describe()

In [None]:
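# numeric_only=True skips text columns, which would otherwise raise an error in recent pandas versions
df_name.corr(numeric_only=True)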

###### List of issues you identified using rudimentary assessment

- Issue 1
- Issue 2
- Issue 3
- Issue 4

For the columns identified in the earlier step, run value_counts on them to get a sense of outliers.

In [None]:
df_name.column_name.value_counts()
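
As a small illustration of what `value_counts` can reveal, the sketch below uses a hypothetical `room_type` column with made-up values. Counting with `dropna=False` also surfaces missing entries, and `normalize=True` turns the counts into shares, which makes rare categories easier to spot.

```python
import pandas as pd

# Hypothetical categorical column; the values are made up for illustration.
df = pd.DataFrame({
    "room_type": [
        "Entire home/apt", "Private room", "Entire home/apt",
        None, "Shared room", "Entire home/apt",
    ]
})

# Absolute counts, including missing values as their own category.
counts = df["room_type"].value_counts(dropna=False)

# Relative frequencies among the non-missing values.
shares = df["room_type"].value_counts(normalize=True)

print(counts)
print(shares)
```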

Plot histograms of the dataframe to identify the general distribution of each feature.

In [None]:
df_name.hist()

Plot a scatter matrix of the dataframe to identify correlations among the variables. This starts to give a sense of which features could be useful for further analysis.

In [None]:
pd.plotting.scatter_matrix(df_name)

###### List of issues you identified using visual assessment

- Issue 1
- Issue 2
- Issue 3
- Issue 4

NaN value detection

The following code fragments can be run to identify NaN (null) values in the dataframe.

In [None]:
df_name.isnull()

The following command tells us which columns have at least one NaN value.

In [None]:
df_name.isnull().any(axis=0)

The following command tells us which rows have at least one NaN value.

In [None]:
df_name.isnull().any(axis=1)
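
The boolean masks above can be hard to read on a large dataframe. One possible way to summarize them is sketched below on a small made-up dataframe (the column names are assumptions): count the missing values per column, express them as a share, and select the affected rows.

```python
import numpy as np
import pandas as pd

# Made-up data with a few missing values.
df = pd.DataFrame({
    "price": [100.0, np.nan, 85.0, 120.0],
    "reviews": [5, 3, np.nan, np.nan],
    "city": ["boston", "boston", "seattle", "seattle"],
})

# Number of missing values in each column.
missing_per_column = df.isnull().sum()

# Fraction of missing values in each column (mean of a boolean mask).
missing_share = df.isnull().mean()

# Only the rows that contain at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```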

Checking for duplicate values

In [None]:
df_name.duplicated().sum()
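
Beyond counting duplicates, it often helps to look at them before deciding what to drop. The sketch below uses a made-up dataframe with hypothetical `listing_id` and `date` columns; `keep=False` marks every copy of a duplicated row, not just the later ones.

```python
import pandas as pd

# Made-up data in which the second and third rows are identical.
df = pd.DataFrame({
    "listing_id": [1, 2, 2, 3],
    "date": ["2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02"],
})

# Number of rows that repeat an earlier row.
n_dupes = df.duplicated().sum()

# All copies of the duplicated rows, for inspection.
dupes = df[df.duplicated(keep=False)]

# Keep the first occurrence of each row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)

print(n_dupes)
print(dupes)
```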

Check for and make a note of incorrect datatypes. Prime examples to look for are a date column stored as strings and a numeric column whose values include a unit.

In [None]:
df_name.info()
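
Both of the datatype problems mentioned above can be illustrated on a small made-up dataframe. The `date` and `price` columns and their formats are assumptions; the sketch parses the date strings and strips the currency symbol and thousands separator before converting the price to a float.

```python
import pandas as pd

# Made-up data: dates stored as strings, prices stored with a unit.
df = pd.DataFrame({
    "date": ["2016-09-06", "2016-09-07"],
    "price": ["$1,250.00", "$85.00"],
})

# Parse the date strings into proper datetime values.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Remove "$" and "," and convert what remains to a number.
df["price"] = df["price"].str.replace(r"[$,]", "", regex=True).astype(float)

print(df.dtypes)
```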

###### List of issues you identified using programmatic assessment

- Issue 1
- Issue 2
- Issue 3
- Issue 4

> **Tip**: You should _not_ perform too many operations in each cell. Create cells freely to explore your data. One option that you can take with this project is to do a lot of explorations in an initial notebook. These don't have to be organized, but make sure you use enough comments to understand the purpose of each code cell. Then, after you're done with your analysis, create a duplicate notebook where you will trim the excess and organize your steps so that you have a flowing, cohesive report.

> **Tip**: Make sure that you keep your reader informed on the steps that you are taking in your investigation. Follow every code cell, or every set of related code cells, with a markdown cell to describe to the reader what was found in the preceding cell(s). Try to make it so that the reader can then understand what they will be seeing in the following cell(s).



<li><a href="#TOC">Back To Table Of Contents</a></li>

<a id='data_preparation'></a>
## Data Preparation

##### Clean the data, drop data that is not useful, replace missing values, and do feature engineering.

In [None]:
# After discussing the structure of the data and any problems that need to be
#   cleaned, perform those cleaning steps in the second part of this section.
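
The cleaning steps named above could look roughly like the following sketch. The data, the column names, the 50% threshold for dropping a column, and the choice of the median as a fill value are all assumptions made for illustration, not recommendations for the real dataset.

```python
import numpy as np
import pandas as pd

# Made-up data with missing prices and a mostly empty column.
df = pd.DataFrame({
    "price": [100.0, np.nan, 80.0, 120.0],
    "notes": [np.nan, np.nan, np.nan, "x"],
    "date": pd.to_datetime(["2016-01-15", "2016-02-15", "2016-07-01", "2016-12-24"]),
})

# Drop columns where more than half of the values are missing.
keep = df.columns[df.isnull().mean() <= 0.5]
df = df[keep].copy()

# Replace missing prices with the median of the observed prices.
df["price"] = df["price"].fillna(df["price"].median())

# Feature engineering: derive the month from the date.
df["month"] = df["date"].dt.month

print(df)
```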


<a id='modeling'></a>
## Modeling

> **Tip**: Modeling of the data starts here. Depending on the targeted business goals and insights, one or more modeling techniques are chosen, the relevant model is trained, and predictions are made. The performance of the model is also evaluated in this step using several built-in functions.
<br><br>Not all proposed questions need data mining techniques; in such cases, descriptive and inferential statistics are used to get the needed answers.
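
To make the train, predict, and evaluate cycle concrete, here is a minimal sketch with scikit-learn. It runs on synthetic data (a made-up linear relationship between a guest count and a nightly price), and linear regression is only one possible technique, chosen here for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: price grows linearly with the number of guests, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(1, 6, size=(200, 1))
y = 40 * X[:, 0] + 30 + rng.normal(0, 5, size=200)

# Hold out a quarter of the rows to evaluate on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train the model and score its predictions on the held-out rows.
model = LinearRegression().fit(X_train, y_train)
score = r2_score(y_test, model.predict(X_test))

print(f"R^2 on held-out data: {score:.3f}")
```

Evaluating on held-out rows rather than on the training rows gives a more honest picture of how the model would perform on new listings.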


### Research Question 1 (Replace this header name!)

In [None]:
# Use this, and more code cells, to explore your data. Don't forget to add
#   Markdown cells to document your observations and findings.


### Research Question 2  (Replace this header name!)

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.


<li><a href="#TOC">Back To Table Of Contents</a></li>

<a id='results_evaluation'></a>
## Results Evaluation

> **Tip**: Evaluation in this step concerns the business value that this analysis and modeling provide. Therefore, the analysis is from the point of view of the stakeholder.
<br>This section should make sense to both non-technical and technical audiences.

<li><a href="#TOC">Back To Table Of Contents</a></li>

<a id='deploy_solution'></a>
## Deployment

> **Tip**: In this stage, a deployment plan is made, and a monitoring and maintenance plan is drafted as well.

<li><a href="#TOC">Back To Table Of Contents</a></li>

<a id='conclusions'></a>
## Conclusions

> **Tip**: Finally, summarize your findings and the results of the analyses you performed. Make sure that you are clear about the limitations of your exploration. If you haven't done any statistical tests, do not imply any statistical conclusions. And make sure you avoid implying causation from correlation!

> **Tip**: Once you are satisfied with your work here, check over your report to make sure that it satisfies all the areas of the rubric (found on the project submission page at the end of the lesson). You should also probably remove all of the "Tips" like this one so that the presentation is as polished as possible.



<li><a href="#TOC">Back To Table Of Contents</a></li>