<a id="overview"></a>

# Evaluating Airbnbs in San Francisco


## Overview

You have just been hired by a real estate company in San Francisco, California. They are trying to enter the short-term rental market by listing several of their properties on Airbnb.com, and they have hired you to help inform their direction and marketing. You have a few questions about how Airbnb listings perform, based on factors such as the number of bedrooms, bathrooms, and amenities.

When choosing Airbnb listings, what are the factors that go into a typical consumer decision-making process? Can we decompose this process by looking at the data? We can begin to estimate consumer interests by asking questions of the data, such as: How many bathrooms does the rental property have? Is it a yurt, a cottage, or a mansion? How much is the nightly fee? What amenities are provided?

Your task is to import raw data, clean the data, and provide insights to your real estate client, based on the available sample data and structured questions below. They are looking to you to provide programmatic insights based on your new Python skills!

**Expected Time to complete: 4-8 hours**


## Objectives
This assignment will provide you with a chance to:

1. Read/write CSV files using Python's built-in `csv` module.
1. Clean and transform raw data from a csv into `lists` and `dicts`.


## Problem

Your goal is to filter the data and perform some basic analysis, looking to glean market insights and answer questions such as: 

- What is the most frequently offered amenity in San Francisco?
- What is the average cost of listings that match certain criteria?

## Structure

This notebook walks through Pythonic data analysis in different stages: 

- **Required:** This section covers classroom topics from Unit 1 and is _required_. 
- **Advanced:** This section covers upcoming topics from future Units and is _optional_.

Throughout the notebook, you will see clearly labeled sections setting up questions for you to solve. _You must provide answers to all of the questions in the **Required** section._ Note that some questions have been further divided into "Part 1", "Part 2", (etc.) in order to break down the steps of sequential logic used in Python programming. Please attempt answers for all parts.

For those of you who wish to work ahead or want to come back later for more practice, the **Advanced** section offers additional prompts that will extend your analysis. This section is optional; you do not need to complete these for submission. At the discretion of your section instructor, these questions may be worth bonus points.

Finally, the **Challenge** section provides an additional set of real-world prompts and examples that integrate new programming concepts and Python libraries not covered in this class. Challenge questions are intended to help you explore and continue your learning outside of this course! _You do not need to complete Challenge questions._


## Instructions

1. Open the assignment notebook. 
1. Save a copy of your notebook and retitle it: "yourname_assignment.ipynb"
1. Attempt answers for all **Required questions**. Some questions can be solved in many different ways!
1. Include at least one comment per question explaining your logic or approach. To include a comment in your Python code, use the `#` sign.
1. Make sure to include all work within your Jupyter notebook.
1. Submit answers for the **Required questions** to your instructional team by the due date.
1. Have fun!

## Data 

[Our data](./data/sanfran_airbnb.csv) is a truncated subset of data taken from [Inside Airbnb](http://insideairbnb.com). You'll see twelve columns:

- `id` - A unique identifier of the Airbnb
- `listing_url` - The URL to the Airbnb
- `name` - The name of the listing
- `host_id` - A unique identifier for the host
- `host_name` - The name of the host
- `host_is_superhost` - A boolean stating whether or not the host is a superhost
- `neighbourhood_cleansed` - Identifies the neighborhood of the city the listing is in
- `accommodates` - How many people the listing can house
- `bathrooms` - The reported number of bathrooms
- `bedrooms` - The reported number of bedrooms
- `amenities` - A list of the amenities that the listing offers
- `price` - The nightly fee of the listing (before cleaning fees)
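Because each listing will be stored as a plain list, every column is reached by its position (for example, `price` is the last column). A minimal sketch, assuming the header order printed by `column_names` in Question 1, is to build a name-to-index lookup once so that code can say `row[col['price']]` instead of a bare `row[11]`. The `header`, `col`, and shortened `row` below are illustrative names and values, not part of the assignment.

```python
# Header order as printed by column_names in Question 1 (assumed to match your file)
header = ['id', 'listing_url', 'name', 'host_id', 'host_name', 'host_is_superhost',
          'neighbourhood_cleansed', 'accommodates', 'bathrooms', 'bedrooms',
          'amenities', 'price']

# Map each column name to its position so rows can be read by name
col = {name: index for index, name in enumerate(header)}

# An abbreviated, illustrative row (the amenities string is shortened)
row = ['958', 'https://www.airbnb.com/rooms/958', 'Bright, Modern Garden Unit - 1BR/1BTH',
       '1169', 'Holly', 't', 'Western Addition', '3', '1', '1',
       '"Heating","Wifi"', '$132.00 ']

print(col['price'])       # 11
print(row[col['price']])  # prints $132.00 (with a trailing space)
```

If you later reorder or add columns, only the `header` list changes; the lookups by name keep working.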

In [1]:
# Import the csv module
import csv

---

# REQUIRED / GRADED
> **Required:** This section covers classroom topics from Unit 1 and is _required_. 

In this section of the notebook, you'll begin your analysis by importing and inspecting the data with Python. Make sure to complete questions 1-5. 

Ready, set, go!


---

## Question 1

- **Part 1**: First, you'll need to load the `sanfran_airbnb` CSV from your local files. Alternatively, you can also [click here to access the data online](https://gist.github.com/jeff-boykin/2879dcf8936e42f2d0ef5c7c39b4da70).  

> Our data is a truncated subset of data taken from [Inside Airbnb](http://insideairbnb.com/asheville/). The original set contains extra columns which have been removed for this assignment.

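> Hint: Check which delimiter the file uses before reading it. For a *tab*-delimited file, pass `delimiter='\t'`, as in `csv.reader(csvfile, delimiter='\t')`; the copy of `sanfran_airbnb.csv` used in the solution below turned out to be comma-delimited.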
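Rather than guessing the delimiter, you can let Python inspect a sample of the text. The sketch below uses the standard library's `csv.Sniffer` on a small, made-up `sample` string; with the real file you would pass its first few kilobytes instead (for example `datafile.read(2048)`, followed by `datafile.seek(0)` to rewind). Sniffing is a heuristic, so it is still worth printing a parsed row to confirm the result.

```python
import csv
import io

# A small, made-up sample in the same layout as the assignment file
sample = (
    'id,name,price\n'
    '958,"Bright, Modern Garden Unit",$132.00\n'
    '959,Cozy Room,$85.00\n'
)

# Let the Sniffer guess the delimiter, restricting it to comma or tab
dialect = csv.Sniffer().sniff(sample, delimiters=',\t')
print(repr(dialect.delimiter))  # ','

# Read the sample back with the detected dialect
rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows[1])  # ['958', 'Bright, Modern Garden Unit', '$132.00']
```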


- **Part 2**: Next, create a list called `column_names` that holds the column names from the csv. 

> Hint: There should be 12 columns, total. For example: `column_names == ['id', 'listing_url', ....]`


- **Part 3**: Now create a list called `listings` that holds each listing as its own list. There should be 6,346 total. For example, `listings[0]` will be:

```
['958', 'https://www.airbnb.com/rooms/958', 'Bright, Modern Garden Unit - 1BR/1BTH', '1169', 'Holly', 't', 'Western Addition', '3', '1', '1', '["Heating", "Hot water", "Stove", "Iron", "Dryer", "Coffee maker", "Carbon monoxide alarm", "Pack \\u2019n Play/travel crib", "Private entrance", "Microwave", "Hangers", "Essentials", "Laptop-friendly workspace", "First aid kit", "Smoke alarm", "Refrigerator", "Wifi", "Cooking basics", "Shampoo", "TV", "Dishes and silverware", "Room-darkening shades", "Garden or backyard", "Hair dryer", "Kitchen", "Washer", "Keypad", "Cable TV", "Oven", "Free street parking"]', '$132.00 ']

```
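Listing names such as `Bright, Modern Garden Unit` contain commas, which is one reason to read the file with the `csv` module instead of splitting each line on `','` yourself. The sketch below contrasts the two on a made-up, two-line string and shows `next()` pulling off the header row before the remaining rows are collected; `text`, `naive`, and the sample values are illustrative only.

```python
import csv
import io

# Two made-up lines in the assignment's layout; the quoted name contains a comma
text = 'id,name,price\n958,"Bright, Modern Garden Unit",$132.00\n'

# Naive splitting breaks the quoted name into two pieces and keeps the quotes
naive = text.splitlines()[1].split(',')
print(naive)  # ['958', '"Bright', ' Modern Garden Unit"', '$132.00']

# csv.reader respects the quotes; next() takes the header row off first
reader = csv.reader(io.StringIO(text))
column_names = next(reader)
listings = [row for row in reader]
print(column_names)  # ['id', 'name', 'price']
print(listings[0])   # ['958', 'Bright, Modern Garden Unit', '$132.00']
```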

#### Need Help?

- [Click here for an explanation of Python data types](https://www.geeksforgeeks.org/python-set-3-strings-lists-tuples-iterations/).
- [For examples of Python lists, click here](https://www.w3schools.com/python/python_lists.asp).
- [For help reading and writing CSV files in Python, click here](https://www.w3schools.com/python/python_file_write.asp).


> Quick note on the amenity: `translation missing: en.hosting_amenity_XX`. Airbnb has to translate each amenity into as many languages as it can in order to provide their services across multiple geographic regions. In order to do this, each amenity is assigned an English translation and served up to us when we view the site in English. When we see things like `translation missing: en.hosting_amenity_49`, that implies that there is some amenity for which there is no suitable translation or available option.
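If you decide these placeholders should not count as real amenities (a judgment call, not something the required questions ask for), one option is to drop them by their common prefix. The `amenities` list below is illustrative; in your data it would be the parsed amenity list of a single listing.

```python
# Illustrative amenity list; the 'translation missing' entry mimics the placeholder described above
amenities = ['Wifi', 'translation missing: en.hosting_amenity_49', 'Heating']

# Keep only amenities that have a real English name
known = [a for a in amenities if not a.startswith('translation missing')]
print(known)  # ['Wifi', 'Heating']
```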

In [1]:
# Enter your solution for Q1, Part 1

# Load the os and csv modules in order to load csv files
import os
import csv

# Load the sanfran_airbnb data
csvpath = os.path.join("data", "sanfran_airbnb.csv")
with open(csvpath, newline="") as datafile:
    # Checking the sanfran_airbnb.csv file showed delimiter = ","
    csvreader = csv.reader(datafile, delimiter=",")
    # Store the first line of the file as csv_header and skip it
    csv_header = next(csvreader)

    # Enter your solution for Q1, Part 2
    # Copy each item in csv_header into column_names
    column_names = list(csv_header)

    # Enter your solution for Q1, Part 3
    # Populate the listings list with each row after the header
    # (this must run inside the with block, while the file is still open)
    listings = [row for row in csvreader]

# Check the number of items in listings to make sure it matches 6,346 rows by printing length of listings.
print(f'The number of listings in listings is {len(listings)}')

# Check first row of listings to make sure it matches format above.
print(listings[0])


The number of listings in listings is 6346
['958', 'https://www.airbnb.com/rooms/958', 'Bright, Modern Garden Unit - 1BR/1BTH', '1169', 'Holly', 't', 'Western Addition', '3', '1', '1', '"Heating","Hot water","Stove","Iron","Dryer","Coffee maker","Carbon monoxide alarm","Pack \\u2019n Play/travel crib","Private entrance","Microwave","Hangers","Essentials","Laptop-friendly workspace","First aid kit","Smoke alarm","Refrigerator","Wifi","Cooking basics","Shampoo","TV","Dishes and silverware","Room-darkening shades","Garden or backyard","Hair dryer","Kitchen","Washer","Keypad","Cable TV","Oven","Free street parking"', '$132.00 ']


In [2]:
# Print out the column names
print(column_names)

['id', 'listing_url', 'name', 'host_id', 'host_name', 'host_is_superhost', 'neighbourhood_cleansed', 'accommodates', 'bathrooms', 'bedrooms', 'amenities', 'price']


---

## Question 2

Next, answer the following questions using the `listings` variable:

- **Part 1**. Print the first listing
- **Part 2**. Print the 100th listing
- **Part 3**. Print the price of the 100th listing *without* printing the rest of the listing information!

> Hint: [Here are some examples on how to print in Python](https://www.w3schools.com/python/ref_func_print.asp) 
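Python counts list positions from 0, which is why the 100th listing is `listings[99]`, and a second index picks one field out of that listing. The sketch below uses a hypothetical three-row stand-in for `listings` (made-up ids and prices) so the indexing rules are easy to check.

```python
# A hypothetical stand-in for listings: three tiny rows of [id, price]
listings = [['id-1', '$132.00 '], ['id-2', '$235.00 '], ['id-3', '$65.00 ']]

# Python counts from 0, so the 1st listing is index 0 and the nth listing is index n - 1
first = listings[0]
third = listings[3 - 1]

# Indexing twice reaches a single field: first the row, then the column within it
price_of_third = listings[2][1]
print(first)           # ['id-1', '$132.00 ']
print(price_of_third)  # prints $65.00 (with a trailing space)

# A negative index counts from the end, so listings[-1] is the last listing
print(listings[-1] is third)  # True
```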

In [3]:
# Enter your solution for Q2, Part 1

# Print the first listing
print(listings[0])

['958', 'https://www.airbnb.com/rooms/958', 'Bright, Modern Garden Unit - 1BR/1BTH', '1169', 'Holly', 't', 'Western Addition', '3', '1', '1', '"Heating","Hot water","Stove","Iron","Dryer","Coffee maker","Carbon monoxide alarm","Pack \\u2019n Play/travel crib","Private entrance","Microwave","Hangers","Essentials","Laptop-friendly workspace","First aid kit","Smoke alarm","Refrigerator","Wifi","Cooking basics","Shampoo","TV","Dishes and silverware","Room-darkening shades","Garden or backyard","Hair dryer","Kitchen","Washer","Keypad","Cable TV","Oven","Free street parking"', '$132.00 ']


In [4]:
# Enter your solution for Q2, Part 2

# Print the 100th listing
print(listings[99])

['137672', 'https://www.airbnb.com/rooms/137672', 'Charming Private Room in Cozy Apt', '673098', 'Elizabeth', 'f', 'Inner Sunset', '2', '1.5', '1', '"Host greets you","Long term stays allowed","Heating","Kitchen","Breakfast","Luggage dropoff allowed","Wifi","Washer","Iron","Cable TV","Dryer","TV","Hangers","Laptop-friendly workspace"', '$150.00 ']


In [5]:
# Enter your solution for Q2, Part 3

# Print only the price for the 100th listing
print(listings[99][11])

$150.00 


---

### Tutorial

Before we get to Question 3, let's first look at a few ways we can manipulate string data in Python.

In [6]:
# Here are some examples using the string `.replace` method (this will come in handy for the next question)!

# Example: `str.replace(item_to_replace, item_to_replace_with)`
# This will return: `str`

print("$40,123.00".replace('$', ''))  # removes the dollar sign
print("$40,123.00".replace(',', ''))  # removes the comma
print("$40,123.00".replace('$', '').replace(',', ''))  # removes the dollar sign and the comma


40,123.00
$40123.00
40123.00


In [7]:
# And here are some examples of the `.split` functionality with strings. Take a look and then proceed to Question 3 when you're ready!

# Example: `str.split(delimiter)`
# Returns: `list`

print("a,b,c,d".split(','))  # split by comma
print("a;b;c;d".split(';'))  # split by semi-colon
print("a; b; c; d".split('; '))  # split by semi-colon and a space


['a', 'b', 'c', 'd']
['a', 'b', 'c', 'd']
['a', 'b', 'c', 'd']
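Splitting only works cleanly when you split on exactly the characters between the items; any stray spaces stay attached to the pieces. A common follow-up, shown in this small sketch with a made-up string, is to call `.strip()` on every piece, which removes leading and trailing whitespace.

```python
# Splitting on ',' keeps any spaces that surrounded the commas
parts = "Heating, Hot water ,Wifi".split(',')
print(parts)  # ['Heating', ' Hot water ', 'Wifi']

# .strip() removes leading and trailing whitespace from each piece
cleaned = [p.strip() for p in parts]
print(cleaned)  # ['Heating', 'Hot water', 'Wifi']
```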


---

## Question 3

Create a list called `parsed_listings` that contains the original listings as its elements, but with the following changes:

- First, change the 11th item (index 10, amenities) to be a list of strings (this one is a bit tricky).
  > Hint: you may have to remove the `"` characters (plus any `[`, `]`, `{`, or `}` wrapping the list) and then split the string by the comma.

- Second, change the 12th item (index 11, price) to be a float.
  > Try using `.replace` to remove a few bad characters from your prices before converting them.

- Third, change the 10th item (index 9, bedrooms) to be a float.
- Fourth, change the 9th item (index 8, bathrooms) to be a float.

> Note that the elements of `parsed_listings` should still be lists themselves (in other words, they should hold the listings' same characteristics). [Click here to learn more about working with different Python data types](https://www.w3schools.com/python/python_datatypes.asp).

- Fifth and finally, try using a `for` loop to accomplish this. When you're done, the first element (`parsed_listings[0]`) should look like:

```
['958',
 'https://www.airbnb.com/rooms/958',
 'Bright, Modern Garden Unit - 1BR/1BTH',
 '1169',
 'Holly',
 't',
 'Western Addition',
 3.0,
 1.0,
 1.0,
 ['[Heating',
  ' Hot water',
  ' Stove',
  ' Iron',
  ' Dryer',
  ' Coffee maker',
  ' Carbon monoxide alarm',
  ' Pack \\u2019n Play/travel crib',
  ' Private entrance',
  ' Microwave',
  ' Hangers',
  ' Essentials',
  ' Laptop-friendly workspace',
  ' First aid kit',
  ' Smoke alarm',
  ' Refrigerator',
  ' Wifi',
  ' Cooking basics',
  ' Shampoo',
  ' TV',
  ' Dishes and silverware',
  ' Room-darkening shades',
  ' Garden or backyard',
  ' Hair dryer',
  ' Kitchen',
  ' Washer',
  ' Keypad',
  ' Cable TV',
  ' Oven',
  ' Free street parking]'],
 132.0]
```

> Note: A more advanced method would be to use a [list comprehension](https://docs.python.org/3/tutorial/datastructures.html) to accomplish this.
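One way to organize the `for`-loop version is to parse one row at a time with small helper functions. The sketch below is one possible approach, not the expected solution: `parse_amenities`, `parse_price`, and `parse_listing` are hypothetical helper names, the column positions assume the header order printed by `column_names`, and `accommodates` is converted to a float only to match the sample output above.

```python
def parse_amenities(raw):
    """Turn '["Heating", "Wifi"]' or '"Heating","Wifi"' into ['Heating', 'Wifi']."""
    # Drop quotes and any surrounding brackets or braces, then split on commas
    for ch in '"[]{}':
        raw = raw.replace(ch, '')
    # Strip spaces left around each comma and skip empty pieces
    return [item.strip() for item in raw.split(',') if item.strip()]


def parse_price(raw):
    """Turn a string like '$1,132.00 ' into 1132.0."""
    return float(raw.replace('$', '').replace(',', '').strip())


def parse_listing(row):
    """Return a new list with the numeric and amenity columns converted."""
    parsed = list(row)                    # copy, so the original row is unchanged
    parsed[7] = float(row[7])             # accommodates
    parsed[8] = float(row[8])             # bathrooms
    parsed[9] = float(row[9])             # bedrooms
    parsed[10] = parse_amenities(row[10])
    parsed[11] = parse_price(row[11])
    return parsed


# A shortened version of listings[0]
row = ['958', 'https://www.airbnb.com/rooms/958', 'Bright, Modern Garden Unit - 1BR/1BTH',
       '1169', 'Holly', 't', 'Western Addition', '3', '1', '1',
       '["Heating", "Hot water", "Wifi"]', '$132.00 ']

listings = [row]  # stand-in for the full listings list
parsed_listings = []
for listing in listings:
    parsed_listings.append(parse_listing(listing))

print(parsed_listings[0])
```

Stripping each piece also removes the leading spaces and leftover brackets that appear in the sample output above. If the amenities field is valid JSON, as the bracketed sample appears to be, `json.loads` is another option that would also decode escapes such as `\u2019`.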

In [8]:
# Enter your solution for Q3

# Create a list called parsed_listings to contain modifications of the original listings as elements
# First create an empty list named parsed_listings.
parsed_listings = []

# Create a list for each index (column) in listings.
ID = [row[0] for row in listings]
listing_url = [row[1] for row in listings]
name = [row[2] for row in listings]
host_id = [row[3] for row in listings]
host_name= [row[4] for row in listings]
host_is_superhost = [row[5] for row in listings]
neighborhood_cleansed = [row[6] for row in listings]
# This index represents accommodates - convert to an integer
accommodates = [int(row[7]) for row in listings]
# This index represents bathrooms - convert to a float
bathrooms = [float(row[8]) for row in listings]
# This index represents bedrooms - convert to a float
bedrooms = [float(row[9]) for row in listings]
# This index represents amenities - convert from single string to list of strings.
amenities = [(row[10]).replace('"','').split(',') for row in listings]
# This index represents price - drop the '$' and ',' and convert to a float
price = [float(row[11].replace('$','').replace(',','')) for row in listings]

# Zip the 12 lists above together; zip() yields one tuple per listing.
# Then loop over the tuples, converting each to a list and appending it to parsed_listings.
for item in zip(ID, listing_url, name, host_id, host_name, host_is_superhost, neighborhood_cleansed,
                     accommodates, bathrooms, bedrooms, amenities, price): 
    parsed_listings.append(list(item))
   

In [9]:
(parsed_listings[0]) # Great!

['958',
 'https://www.airbnb.com/rooms/958',
 'Bright, Modern Garden Unit - 1BR/1BTH',
 '1169',
 'Holly',
 't',
 'Western Addition',
 3,
 1.0,
 1.0,
 ['Heating',
  'Hot water',
  'Stove',
  'Iron',
  'Dryer',
  'Coffee maker',
  'Carbon monoxide alarm',
  'Pack \\u2019n Play/travel crib',
  'Private entrance',
  'Microwave',
  'Hangers',
  'Essentials',
  'Laptop-friendly workspace',
  'First aid kit',
  'Smoke alarm',
  'Refrigerator',
  'Wifi',
  'Cooking basics',
  'Shampoo',
  'TV',
  'Dishes and silverware',
  'Room-darkening shades',
  'Garden or backyard',
  'Hair dryer',
  'Kitchen',
  'Washer',
  'Keypad',
  'Cable TV',
  'Oven',
  'Free street parking'],
 132.0]

---

## Question 4

Next, let's dig into price differences between listings with different criteria.

- **Part 1**. Begin by creating two lists called `small_homes_one` and `small_homes_two` where the elements fit the following criteria:
    - `small_homes_one` should only have listings with fewer than two bathrooms
    - `small_homes_two` should only have listings with more than two bathrooms but fewer than three
    
- **Part 2**. What is the average price for each set of listings? 

- **Part 3**. Print the number of elements in each list.

- **Part 4**. Then create a new list called `small_homes` that only contains listings that have either: 
    - Exactly 1 bathroom, OR
    - Fewer than 2 bathrooms AND exactly 1 bedroom

- **Part 5**. Wrap up by printing the number of elements in the list `small_homes`.
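The filtering and averaging steps can be checked on a tiny, hypothetical dataset before running them on the full `parsed_listings`. In this sketch each made-up row is `[bathrooms, bedrooms, price]`; in the real rows those values sit at indices 8, 9, and 11. It uses `statistics.mean` from the standard library, which raises `StatisticsError` if a filter happens to match nothing.

```python
from statistics import mean

# Hypothetical mini-dataset: [bathrooms, bedrooms, price]
rows = [
    [1.0, 1.0, 100.0],
    [1.5, 1.0, 150.0],
    [2.5, 2.0, 300.0],
    [1.0, 2.0, 120.0],
    [3.0, 3.0, 500.0],
]

# Part 1 style filters
under_two = [r for r in rows if r[0] < 2]
two_to_three = [r for r in rows if 2 < r[0] < 3]

# Part 2: average price per group
print(mean(r[2] for r in under_two))     # 123.333... (370 / 3)
print(mean(r[2] for r in two_to_three))  # 300.0

# Part 4: exactly 1 bathroom, OR fewer than 2 bathrooms AND exactly 1 bedroom
small_homes = [r for r in rows if r[0] == 1 or (r[0] < 2 and r[1] == 1)]
print(len(small_homes))  # 3
```

The parentheses in the Part 4 condition are optional (`and` binds tighter than `or`), but they make the intended grouping explicit.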

In [11]:
# Enter your solution for Q4, Parts 1, 2, and 3
# Part 1:
#   Create list called small_homes_one that contains only listings with < 2 bathrooms
small_homes_one = [row for row in parsed_listings if row[8] < 2.0]

#   Create list called small_homes_two that contains only listings with > 2 bathrooms but < 3 bathrooms
small_homes_two = [row for row in parsed_listings if 2.0 < row[8] < 3.0]

# Part 2: 
#  What is the average price for each set of listings above.

#   Store the prices in small_homes_one and small_homes_two into their own separate lists
#   called small_homes_one_prices and small_homes_two_prices
small_homes_one_prices = [row[11] for row in small_homes_one]
small_homes_two_prices = [row[11] for row in small_homes_two]

#   Calculate and store the average price in small_homes_one_prices and small_homes_two_prices into
#   avgprice_small_homes_one and avgprice_small_homes_two respectively.
avgprice_small_homes_one = round(sum(small_homes_one_prices)/len(small_homes_one_prices),2)
avgprice_small_homes_two = round(sum(small_homes_two_prices)/len(small_homes_two_prices),2)

#   Print the average price for small_homes_one and small_homes_two
print(f'Average Price of listings for small homes one = {avgprice_small_homes_one}')
print(f'Average Price of listings for small homes two = {avgprice_small_homes_two}')

# Part 3: Finish by printing the number of elements in each list

print(f'Small homes one has {len(small_homes_one)} listings.')
print(f'Small homes two has {len(small_homes_two)} listings.')

Average Price of listings for small homes one = 240.36
Average Price of listings for small homes two = 378.42
Small homes one has 4773 listings.
Small homes two has 196 listings.


In [12]:
# Enter your solution for Q4, Parts 4 and 5
# Part 4: Create a new list called small_homes that contains listings with exactly one bathroom 
#        or have less than 2 bathrooms and exactly 1 bedroom
small_homes = [row for row in parsed_listings if row[8] == 1.0 or (row[8] < 2 and row[9] == 1.0)]

# Part 5: Wrap up by printing the number of elements in small_homes.
print(f'small_homes has {len(small_homes)} listings.')

small_homes has 4563 listings.


---

## Question 5


- **Part 1**. Now let's create a *dictionary* called `amenities_count`. 

> Hint: A dictionary uses key/value pairs. For more info on Python dictionaries, [check out this link](https://www.w3schools.com/python/python_dictionaries.asp).

For your new `amenities_count` dictionary, make the *keys* of the dictionary equal the amenities listed and the *values* indicate the number of times that amenity appears across every listing.

Examples (the exact counts depend on which version of the dataset you use):

- `amenities_count['Day bed'] == 7`
- `amenities_count['Coffee maker'] == 1230`
    

- **Part 2**. Now *iterate* over your new `amenities_count` dictionary to surface the amenity that appears the *most often* across all listings!
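Counting is fastest when it happens in a single pass: add 1 to an amenity's count each time it is seen. Calling `list.count(item)` inside a loop over the same list rescans the whole list for every item, so the work grows roughly with the square of the number of amenities, which can make a run on the full data very slow. The sketch below shows a one-pass dict version and the equivalent `collections.Counter` version on hypothetical amenity lists.

```python
from collections import Counter

# Hypothetical parsed amenities for three listings
amenity_lists = [
    ['Wifi', 'Heating', 'Kitchen'],
    ['Wifi', 'Heating'],
    ['Wifi', 'Coffee maker'],
]

# Part 1 with a plain dict: one pass, adding 1 each time an amenity is seen
amenities_count = {}
for amenities in amenity_lists:
    for item in amenities:
        amenities_count[item] = amenities_count.get(item, 0) + 1

# Part 2: the key whose count is largest
most_common = max(amenities_count, key=amenities_count.get)
print(most_common, amenities_count[most_common])  # Wifi 3

# The same tally with collections.Counter
print(Counter(a for amenities in amenity_lists for a in amenities).most_common(1))  # [('Wifi', 3)]
```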





> Note: The code below took nearly 6 minutes to complete on my computer. (JV)


In [10]:
# Enter your solution for Q5, Part 1
# Create a dictionary called amenities_count where the keys equal the name of the amenities and
# the values indicate the number of times the amenities appear.

# Create an empty dictionary called amenities_count
amenities_count = {}

# Create an empty list to store the amenities taken from parsed_listings row[10]
all_amenities = []

# Populate all_amenities list with each amenity listed in parsed_listings row[10].
for row in parsed_listings:
    for item in row[10]:
        all_amenities.append(item)
        
# Loop through the items(amenities) in the all_amenities list and add each item to the amenities_count dictionary
# where the item is defined as a key and the item's count or frequency in the list is defined as a value.
# The list.count() function can be used to provide the count of duplicate items in a list.
for item in all_amenities:
    amenities_count[item] = all_amenities.count(item)

print(f'There are {len(amenities_count)} items in the amenities_count dictionary for SF Airbnb data')
print(f'Here is the full list of amenities and their counts...{amenities_count}')


There are 95 items in the amenities_count dictionary for SF Airbnb data
Here is the full list of amenities and their counts...{'Heating': 5974, 'Hot water': 4253, 'Stove': 2734, 'Iron': 5214, 'Dryer': 4593, 'Coffee maker': 3710, 'Carbon monoxide alarm': 5432, 'Pack \\u2019n Play/travel crib': 570, 'Private entrance': 2650, 'Microwave': 3507, 'Hangers': 5525, 'Essentials': 5881, 'Laptop-friendly workspace': 4735, 'First aid kit': 3129, 'Smoke alarm': 6017, 'Refrigerator': 3775, 'Wifi': 6279, 'Cooking basics': 2820, 'Shampoo': 4986, 'TV': 5101, 'Dishes and silverware': 3528, 'Room-darkening shades': 511, 'Garden or backyard': 1684, 'Hair dryer': 5484, 'Kitchen': 5009, 'Washer': 4614, 'Keypad': 992, 'Cable TV': 2137, 'Oven': 2821, 'Free street parking': 2884, 'Fire extinguisher': 4433, 'Host greets you': 551, 'Lock on bedroom door': 1479, 'Dishwasher': 2399, 'Extra pillows and blankets': 2169, 'Bed linens': 2997, 'Children\\u2019s books and toys': 505, 'Indoor fireplace': 1091, 'Luggage d

In [19]:
# Enter your solution for Q5, Part 2
# Iterate through the amenities_count dictionary to surface the amenity that appears most often across all listings.

# Find the largest value in the dictionary once, using the max() function and .values() method.
# Store it in maxvalue
maxvalue = max(amenities_count.values())

# Loop through both keys and values in the dictionary using the .items() method.
for key, value in amenities_count.items():
    # Print out the key and value if the value matches maxvalue.
    if value == maxvalue:
        print(f'The amenity that appears most often across all listings is {key} with {value} listings.')


The amenity that appears most often across all listings is Wifi with 6279 listings.


---

# ADVANCED 

> **Advanced:** This section covers more complex topics from the previous unit and introduces some brand-new concepts. These questions are _optional_. 

## Question 1

This dataset has a bunch of properties in it that are ABSURDLY priced ($10,000 per night seems a bit high) and are probably priced this way to deter rentals while still keeping the listing active. This makes them severe outliers in the dataset that could throw off any analysis we want to make in the future. Let's try to clear this up.

- **Part 1.** Create a loop that goes through the original list of properties and places them into a new list from least to most expensive. Then take some time to look through a few of the higher priced properties. This will reveal some strange values. 
> Note: There are many ways to accomplish this task, but we recommend using the [itemgetter](https://docs.python.org/3/library/operator.html#operator.itemgetter) function from the `operator` module, which is well suited to picking a sort key out of each row, together with the [sorted](https://www.w3schools.com/python/ref_func_sorted.asp) function.

- **Part 2.** Calculate the median price of the sorted dataset. This will be used in order to determine the quartiles of our dataset.

- **Part 3.** Calculate the lower quartile (the data point below which 25% of the observations fall).

- **Part 4.** Calculate the upper quartile (the data point above which 25% of the observations fall).

- **Part 5.** Find the interquartile range by subtracting the value of the lower quartile from the value of the upper quartile.

- **Part 6.** Find the "inner fences" of the data set. To find them, first multiply the interquartile range by 1.5. Then add the result to the upper quartile and subtract it from the lower quartile. The two values you receive are the boundaries of the dataset's inner fences.
> Note: A point that falls outside of this numeric boundary is classified as a *minor outlier*

- **Part 7.** Find the "outer fences" of the data set. This is done in the same way as uncovering the inner fences, except that the interquartile range is multiplied by 3 instead of 1.5. The result is then added to the upper quartile and subtracted from the lower quartile to find the upper and lower boundaries of the outer fence.
> Note: A point that falls outside of this numeric boundary is classified as a *major outlier*

- **Part 8.** Now it is time to finally clean the dataset! Remove any listings whose prices fall outside the outer fences.

- **Part 9.** Finally, let's add a new value to each listing that tells the viewer whether or not the listing is a minor outlier.
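Parts 1 through 9 can be prototyped on a handful of made-up prices before touching the real data. This sketch computes quartiles with the median-of-halves method (the overall median is left out of both halves when the count is odd), then the fences at 1.5 and 3 times the IQR. Quartile conventions vary: `statistics.quantiles` (Python 3.8+) interpolates differently by default, so its results can differ slightly. The `quartiles` and `fences` helpers and the `[id, price]` rows are hypothetical.

```python
from operator import itemgetter
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                    # everything below the median
    upper = data[half + len(data) % 2:]    # everything above it (skips the middle value if odd)
    return median(lower), median(data), median(upper)


def fences(values, k):
    """Return (Q1 - k*IQR, Q3 + k*IQR); k=1.5 gives inner fences, k=3 outer fences."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical listings of [id, price], including one absurd price
listings = [['a', 80.0], ['b', 10.0], ['c', 70.0], ['d', 20.0], ['e', 60.0],
            ['f', 30.0], ['g', 50.0], ['h', 40.0], ['z', 10000.0]]

# Part 1: sort rows from least to most expensive
by_price = sorted(listings, key=itemgetter(1))
prices = [row[1] for row in by_price]

inner_low, inner_high = fences(prices, 1.5)
outer_low, outer_high = fences(prices, 3)

# Part 8: keep rows inside the outer fences (copied, so by_price is not modified)
clean = [list(row) for row in by_price if outer_low <= row[1] <= outer_high]

# Part 9: append True if the price is a minor outlier (outside the inner fences)
for row in clean:
    row.append(not (inner_low <= row[1] <= inner_high))

print(quartiles(prices))    # (25.0, 50.0, 75.0)
print(fences(prices, 1.5))  # (-50.0, 150.0)
print(fences(prices, 3))    # (-125.0, 225.0)
print(len(clean))           # 8
```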

In [20]:
# Now you try!
# Enter your solution for Q1, Part 1

# Load the operator module which contains itemgetter function
import operator

# Sort parsed_listings from least to most expensive using price and place it into a list called sorted_listings 
Sorted_listings = sorted(parsed_listings, key=operator.itemgetter(11))

# Spot check listing to see if first listing has low price and last listing has high price
# Check first price
print(Sorted_listings[0][11])
# Check last price
print(Sorted_listings[-1][11])


10.0
10000.0


In [22]:
# Now you try!
# Enter your solution for Q1, Part 2

# Create function to find the median price in a list sorted in ascending order by price (index 11).
def CalculateMedianPrice(sortlist):
    # If sortlist has an odd number of items
    if len(sortlist) % 2 == 1:
        # Since indexing starts at 0, integer-dividing the odd length by 2 gives the index of the median
        index = len(sortlist) // 2
        # Locate price at median index
        medianprice = sortlist[index][11]
    else:
        # For an even number of items the two middle indices are needed
        index = len(sortlist) // 2 - 1
        index2 = len(sortlist) // 2
        # Add the prices of the two middle indices and divide by two
        medianprice = (sortlist[index][11] + sortlist[index2][11]) / 2

    return medianprice

# Use CalculateMedianPrice function to find median price for Sorted_listings and store it as median_price
median_price = CalculateMedianPrice(Sorted_listings)

# Print results
print(f'The median price for Sorted_listings is ${round(median_price,2)}.')


The median price for Sorted_listings is $146.0.


In [24]:
# Now you try!
# Enter your solution for Q1, Part 3, 4, and 5
# Sorted_listings has an even number of items.  Find the two indices that fall below and above the midpoint. 
BelowMidpoint = int((len(Sorted_listings)/2)-1)
AboveMidpoint = int(len(Sorted_listings)/2)

# The lower quartile point is the median for all numbers below the median
# Take all items below the median: indices 0 through BelowMidpoint (slice ends are exclusive, hence the + 1).
below_median = Sorted_listings[0:BelowMidpoint + 1]

# The upper quartile point is the median for all numbers above the median
# Take all items above the median.
above_median = Sorted_listings[AboveMidpoint:]

# Use CalculateMedianPrice to locate the lower and upper quartile price.
lower_quartile = CalculateMedianPrice(below_median)
upper_quartile = CalculateMedianPrice(above_median)

# The interquartile range (IQR) is the difference between the upper and lower quartiles
IQR = upper_quartile-lower_quartile

print(f'Lower quartile price = ${lower_quartile}.')
print(f'Upper quartile price = ${upper_quartile}.')
print(f'The interquartile range = {IQR}')

Lower quartile price = $90.0.
Upper quartile price = $239.0.
The interquartile range = 149.0


In [25]:
# Now you try!
# Enter your solution for Q1, Part 6 and 7

# Calculate the inner fences
inner_fence1 = lower_quartile -(IQR*1.5)
inner_fence2 = upper_quartile + (IQR*1.5)

print(f'The inner fences of the data are {inner_fence1} and {inner_fence2}.')

#Calculate the outer fences
outer_fence1 = lower_quartile -(IQR*3)
outer_fence2 = upper_quartile +(IQR*3)

print(f'The outer fences of the data are {outer_fence1} and {outer_fence2}.')

The inner fences of the data are -133.5 and 462.5.
The outer fences of the data are -357.0 and 686.0.


In [26]:
# Now you try!
# Enter your solution for Q1, Part 8

# Remove the major outliers by storing the non-major outliers in Sorted_listings into CleanListings.
# list(row) copies each row, so appending outlier labels in Part 9 does not also modify parsed_listings.
CleanListings = [list(row) for row in Sorted_listings if outer_fence1 <= row[11] <= outer_fence2]

# Check the prices in the first row and last row to make sure they fall within the outer fences.
print(CleanListings[0][11])
print(CleanListings[-1][11])

10.0
683.0


In [27]:
# Now you try!
# Enter your solution for Q1, Part 9

# Add a new value to each listing to define whether prices are minor outliers or not.
for row in CleanListings:
    if row[11] < inner_fence1 or row[11] > inner_fence2:
        row.append("price is a minor outlier")
    else:
        row.append("price falls within expected market price")        

In [28]:
# Spot check to make sure listings appended appropriate outlier definition
print(CleanListings[0])
print(CleanListings[-1])

['10105531', 'https://www.airbnb.com/rooms/10105531', 'Studio by Dolores Park & Castro', '9776019', 'Lorenzo', 'f', 'Castro/Upper Market', 2, 1.0, 1.0, ['Heating', 'Hot water', 'Stove', 'Iron', 'Carbon monoxide alarm', 'Coffee maker', 'Private entrance', 'Microwave', 'Hangers', 'Essentials', 'Laptop-friendly workspace', 'Smoke alarm', 'Extra pillows and blankets', 'Refrigerator', 'Fire extinguisher', 'Wifi', 'Cooking basics', 'Shampoo', 'TV', 'Dishes and silverware', 'Hair dryer', 'Host greets you', 'Kitchen', 'Oven', 'Bed linens', 'Ethernet connection'], 10.0, 'price falls within expected market price']
['44523362', 'https://www.airbnb.com/rooms/44523362', 'Beautiful Home With Stunning 360 View of Downtown', '12264996', 'Luong', 'f', 'Twin Peaks', 6, 3.0, 3.0, ['Smoke alarm', 'Heating', 'Kitchen', 'First aid kit', 'Breakfast', 'Fire extinguisher', 'Wifi', 'Private entrance', 'Washer', 'Iron', 'Cooking basics', 'Dryer', 'TV', 'Shampoo', 'Hangers', 'Patio or balcony', 'Carbon monoxide a

## Question 2

You are working with a client on developing a new value proposition for their Airbnb properties. This will help your client, a real estate investor, determine which types of properties they should purchase to have the best success on Airbnb.

**Part 1.** Create at least three rental market segments based on price and the number of people the property can accommodate.

> Note: Market segmentation means clustering a group of people by one or more characteristics. In marketing, it allows us to develop a specific, targeted strategy for different people based on their needs. In our case, you should segment the rentals based on price and the number of people the property can accommodate. For example, one segment could contain lower-priced, smaller properties. This segment could be geared toward customers who are price sensitive and looking for a deal. Another segment could contain large, higher-priced properties tailored to customers looking for a place to stay for a family or friend celebration or vacation.
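A segment is just a rule that maps each listing to a label. The sketch below is one hypothetical rule set, not a recommended segmentation: the cut-offs (2 and 5 guests) and the labels are assumptions, and the price split reuses the $146.0 median the notebook computed in Advanced Question 1. The `[accommodates, price]` rows are made up.

```python
def segment(accommodates, price, median_price):
    """Assign a hypothetical segment label from guest capacity and nightly price."""
    if accommodates <= 2:
        return 'small, budget' if price <= median_price else 'small, premium'
    if accommodates <= 5:
        return 'mid-size'
    return 'large'


# Hypothetical rows of [accommodates, price]
rows = [[1, 60.0], [2, 90.0], [2, 250.0], [4, 180.0], [8, 600.0]]
median_price = 146.0  # median price reported in Advanced Question 1

# Group the rows into a dict of segment label -> list of rows
segments = {}
for acc, price in rows:
    label = segment(acc, price, median_price)
    segments.setdefault(label, []).append([acc, price])

for label, members in segments.items():
    print(label, len(members))
```

Keeping the rule in one function makes it easy to try different cut-offs and compare the resulting segment sizes.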

**Part 2.** Which room and property type appear the most in each segment?

**Part 3.** How many properties in each segment have fewer than 5 reviews? Remove all rentals with fewer than 5 reviews from each segment.

**Part 4.** Which segment should your client consider and why?

In [29]:
# Now you try!
# Enter your solution for Q2, Part 1
# Six rental market segments: Room for One (single accommodations), Economy and Luxury
#  for double accommodations, Standard and Luxury for families of 3-5, and accommodations for 6+.
#  The lists use CleanListings (outliers excluded) and the precalculated median and upper-quartile prices.
#  Column 7 holds the number of guests accommodated; column 11 holds the nightly price.
Room_for_One = [row for row in CleanListings if row[7] == 1]
Double_Economy = [row for row in CleanListings if row[7] == 2 and row[11] <= median_price]
Double_Luxury = [row for row in CleanListings if row[7] == 2 and row[11] > median_price]
Family_Size_3_to_5 = [row for row in CleanListings if 3 <= row[7] <= 5 and row[11] < upper_quartile]
Family_Size_3_to_5_Luxury = [row for row in CleanListings if 3 <= row[7] <= 5 and row[11] >= upper_quartile]
Family_Size_6_or_more = [row for row in CleanListings if row[7] > 5]
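The Part 1 cell depends on `median_price` and `upper_quartile`, which were computed earlier in the notebook and are not shown in this section. The hypothetical helper `price_thresholds` below sketches one way to obtain them with the standard-library `statistics` module (Python 3.8+ for `quantiles`). It assumes the nightly price is at index 11, as in the printed rows. `statistics.quantiles` uses its default 'exclusive' method, so its upper quartile may differ slightly from the notebook's own calculation.

```python
import statistics


def price_thresholds(listings, price_index=11):
    """Return (median, upper quartile) of the nightly prices in listings.

    price_index=11 is assumed from the printed listing rows.
    """
    prices = [row[price_index] for row in listings]
    median_price = statistics.median(prices)
    # quantiles(n=4) returns the three quartile cut points; the last is Q3
    upper_quartile = statistics.quantiles(prices, n=4)[2]
    return median_price, upper_quartile


# Hypothetical rows: only the price at index 11 is filled in
rows = [[None] * 11 + [price] for price in [50, 100, 150, 200, 250, 300, 350, 400]]
median_price, upper_quartile = price_thresholds(rows)
print(median_price, upper_quartile)  # 225.0 337.5
```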

In [30]:
# Now you try!
# Enter your solution for Q2, Part 2a
# Property type is not a variable in these listings, so neighborhood_cleansed (column 6) is used as a substitute.
# Room type is excluded because the request was unclear.

# The function takes a segment list as its argument.

def most_neighborhood_count(segment):
    # Count how many listings fall in each neighborhood (one pass over the segment)
    neighborhood = {}
    for row in segment:
        neighborhood[row[6]] = neighborhood.get(row[6], 0) + 1

    # Find the largest count once, then keep the neighborhood with that count
    # (on a tie, the last neighborhood encountered is kept)
    maxnvalue = max(neighborhood.values())
    for nkey, nvalue in neighborhood.items():
        if nvalue == maxnvalue:
            maxnkey = nkey

    # Print the result; the function itself returns None
    print(f'This segment has most locations in {maxnkey} neighborhood with {maxnvalue} listings out of {len(segment)} listings in the segment.')
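For comparison, the same count can be written with `collections.Counter`, whose `most_common` method returns the highest-count items. This sketch (`top_neighborhood` is a hypothetical name) returns the result instead of printing it, so later cells can reuse it. One difference is tie-breaking: `most_common` orders equal counts by first occurrence, while `most_neighborhood_count` keeps the last tied neighborhood.

```python
from collections import Counter


def top_neighborhood(segment, neighborhood_index=6):
    """Return (neighborhood, count) for the most common neighborhood, or None if empty."""
    counts = Counter(row[neighborhood_index] for row in segment)
    if not counts:
        return None
    # most_common(1) returns a one-item list such as [('Mission', 2)]
    return counts.most_common(1)[0]


# Hypothetical rows: index 5 is host_is_superhost, index 6 is the neighborhood
sample = [
    ['1', '', '', '', '', 't', 'Mission'],
    ['2', '', '', '', '', 'f', 'Castro/Upper Market'],
    ['3', '', '', '', '', 't', 'Mission'],
]
name, count = top_neighborhood(sample)
print(f'{name}: {count} of {len(sample)} listings')  # Mission: 2 of 3 listings
```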
            


In [31]:
# Q2, Part 2b
# Pass one of the segments to most_neighborhood_count to find the neighborhood with the most listings.

most_neighborhood_count(Double_Luxury)


This segment has most locations in South of Market neighborhood with 134 listings out of 931 listings in the segment.


In [32]:
# Now you try!
# Enter your solution for Q2, Part 3
# Since review counts are not included in the data, host_is_superhost (column 5) is used in their place.

# Store listings where the host is a superhost for each client segment.

SH_Room_for_One = [row for row in Room_for_One if row[5] == 't']
SH_Double_Economy = [row for row in Double_Economy if row[5] == 't']
SH_Double_Luxury = [row for row in Double_Luxury if row[5] == 't']
SH_Family_Size_3_to_5 = [row for row in Family_Size_3_to_5 if row[5] == 't']
SH_Family_Size_3_to_5_Luxury = [row for row in Family_Size_3_to_5_Luxury if row[5] == 't']
SH_Family_Size_6_or_more = [row for row in Family_Size_6_or_more if row[5] == 't']

# Print the number of superhost listings for each segment.
print(f'Number of superhost listings by segment:\nRoom for One = {len(SH_Room_for_One)}')
print(f'Double Economy = {len(SH_Double_Economy)}\nDouble Luxury = {len(SH_Double_Luxury)}')
print(f'Family Size 3 to 5 = {len(SH_Family_Size_3_to_5)}')
print(f'Family Size 3 to 5 Luxury = {len(SH_Family_Size_3_to_5_Luxury)}')
print(f'Family Size 6 or more = {len(SH_Family_Size_6_or_more)}')


Number of superhost listings by segment:
Room for One = 259
Double Economy = 830
Double Luxury = 403
Family Size 3 to 5 = 654
Family Size 3 to 5 Luxury = 285
Family Size 6 or more = 323
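Part 3 as written asks for a review-count filter, which this data cannot support, so the cell above substitutes superhost status. If the listing rows did include a review count, the count and removal could look like the sketch below. Both the `split_by_reviews` helper and the `reviews_index` column position are hypothetical.

```python
def split_by_reviews(segment, reviews_index, min_reviews=5):
    """Split a segment into (kept, removed) lists by review count.

    reviews_index is hypothetical: the rows in this notebook have no
    review-count column.
    """
    kept, removed = [], []
    for row in segment:
        if row[reviews_index] < min_reviews:
            removed.append(row)  # fewer than min_reviews reviews
        else:
            kept.append(row)
    return kept, removed


# Hypothetical rows: [id, neighborhood, number_of_reviews]
segment = [
    ['a', 'Mission', 12],
    ['b', 'South of Market', 3],
    ['c', 'Mission', 5],
    ['d', 'Twin Peaks', 0],
]
kept, removed = split_by_reviews(segment, reviews_index=2)
print(f'{len(removed)} listings have fewer than 5 reviews; {len(kept)} remain')
# 2 listings have fewer than 5 reviews; 2 remain
```

`len(removed)` answers the "how many" half of Part 3, and `kept` is the filtered segment.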


In [33]:
# Now you try!
# Enter your solution for Q2, Part 3 (continued)
# Assume that superhost status is equivalent to a high rating.
# The share of superhost listings in each segment then measures the proportion of highly rated listings.

SHratio_Room_for_One = round(len(SH_Room_for_One)/len(Room_for_One),2)
SHratio_Double_Economy = round(len(SH_Double_Economy)/len(Double_Economy),2)
SHratio_Double_Luxury = round(len(SH_Double_Luxury)/len(Double_Luxury),2)
SHratio_Family_Size_3_to_5 = round(len(SH_Family_Size_3_to_5)/len(Family_Size_3_to_5),2)
SHratio_Family_Size_3_to_5_Luxury = round(len(SH_Family_Size_3_to_5_Luxury)/len(Family_Size_3_to_5_Luxury),2)
SHratio_Family_Size_6_or_more = round(len(SH_Family_Size_6_or_more)/len(Family_Size_6_or_more),2)

print(f'Room for One high rating ratio = {SHratio_Room_for_One}')
print(f'Double Economy high rating ratio = {SHratio_Double_Economy}')
print(f'Double Luxury high rating ratio = {SHratio_Double_Luxury}')
print(f'Family Size 3 to 5 high rating ratio = {SHratio_Family_Size_3_to_5}')
print(f'Family Size 3 to 5 Luxury high rating ratio = {SHratio_Family_Size_3_to_5_Luxury}')
print(f'Family Size 6 or more high rating ratio = {SHratio_Family_Size_6_or_more}')


Room for One high rating ratio = 0.36
Double Economy high rating ratio = 0.49
Double Luxury high rating ratio = 0.43
Family Size 3 to 5 high rating ratio = 0.46
Family Size 3 to 5 Luxury high rating ratio = 0.47
Family Size 6 or more high rating ratio = 0.46
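The six near-identical ratio lines can be collapsed into a loop over a dict that maps segment names to their lists. This is a sketch with hypothetical helpers `superhost_ratio` and `rank_segments`. In the notebook, the dict would be built from the existing lists, for example `{'Room for One': Room_for_One, 'Double Economy': Double_Economy}` plus the remaining four. Sorting the ratios also makes it easy to read off the top two segments used in the recommendation.

```python
def superhost_ratio(segment, superhost_index=5):
    """Share of listings in a segment whose host is a superhost ('t')."""
    if not segment:
        return 0.0
    superhosts = sum(1 for row in segment if row[superhost_index] == 't')
    return superhosts / len(segment)


def rank_segments(segments):
    """Return (name, ratio) pairs sorted from highest to lowest ratio."""
    ratios = {name: superhost_ratio(rows) for name, rows in segments.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


def make_row(flag):
    # Hypothetical minimal row: only index 5 (host_is_superhost) matters here
    return [''] * 5 + [flag]


segments = {
    'Room for One': [make_row('t'), make_row('f'), make_row('f')],
    'Double Economy': [make_row('t'), make_row('t'), make_row('f')],
}
for name, ratio in rank_segments(segments):
    print(f'{name} high rating ratio = {ratio:.2f}')
# Double Economy high rating ratio = 0.67
# Room for One high rating ratio = 0.33
```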


In [34]:
# Double Economy and Family Size 3 to 5 Luxury have the two highest rating ratios of all segments.
# Find the neighborhood with the most superhost listings in each of these two segments.
# most_neighborhood_count prints its own result, so the calls are not wrapped in print().
most_neighborhood_count(SH_Double_Economy)
most_neighborhood_count(SH_Family_Size_3_to_5_Luxury)

This segment has most locations in Mission neighborhood with 82 listings out of 830 listings in the segment.
This segment has most locations in Downtown/Civic Center neighborhood with 40 listings out of 285 listings in the segment.


In [None]:
# I would recommend that the client purchase either a Double Economy property in the Mission neighborhood or a
# Family Size 3 to 5 Luxury property in the Downtown/Civic Center neighborhood. These two segments have the
# highest superhost (high-rating) ratios (0.49 and 0.47), and within them these neighborhoods have the most
# superhost listings to choose from (82 of 830 and 40 of 285).