# AirBnB SQL Challenge

There are three tables: sfo_calendar, sfo_listings, and sfo_reviews.

## What's the most expensive listing? 

Price is a column in sfo_listings, so I looked for the entry with the highest value for price:

```
SELECT
  *
FROM
  sfo_listings
WHERE
  price = (
  SELECT
    MAX(price)
  FROM
    sfo_listings
  );
```
```
id	name	host_id	host_name	neighbourhood_group	neighbourhood	latitude	longitude	room_type	price	minimum_nights	number_of_reviews	last_review	reviews_per_month	calculated_host_listings_count	Availability_365
"24650875"	"Full House Victorian: 7500 SqFt, 4 Floors, Hot Tub"	"6460979"	"Sarah"		"Western Addition"	"37.780230017385364"	"-122.44046111352687"	"Entire home/apt"	"10000"	2	3	"2018-05-24"	"1"	4	"18"
```

## What else can you tell me about the listing?


__The most expensive listing is for an entire home/apt.__ 

__At \$10,000, it is roughly 30 standard deviations above the mean price for listings of the same type (\$265.36), and \$1,000 more than the next most expensive rental of the same type.__

I took a look at the top 20 most expensive listings, based on the price in the listings table, to see how this listing compares.

```
SELECT
  *
FROM
  sfo_listings
ORDER BY
  price DESC
LIMIT 20;
```

The second and third most expensive listings are another entire home at \$9000 and a private room at \$8000.

Here are some summary statistics for each available type of listing:

```
SELECT
  room_type,
  MIN(price) as min_price,
  MAX(price) as max_price,
  ROUND(AVG(price)::numeric,2) AS mean_price,
  ROUND(STDDEV(price)::numeric,2) as std_dev_price
FROM
  sfo_listings
GROUP BY
  room_type;
```
```
room_type        min_price  max_price  mean_price  std_dev_price
Entire home/apt  10         10000      265.36      323.08
Shared room      25         2326       69.82       177.34
Private room     0          8000       132.78      223.16
```

__It's also way more expensive than other listings in Sarah's neighbo(u)rhood:__
```
SELECT
  room_type,
  MIN(price) as min_price,
  MAX(price) as max_price,
  ROUND(AVG(price)::numeric,2) AS mean_price,
  ROUND(STDDEV(price)::numeric,2) as std_dev_price
FROM
  sfo_listings
WHERE
  neighbourhood = 'Western Addition'
GROUP BY
  room_type;

```

```
room_type        min_price  max_price  mean_price  std_dev_price
Entire home/apt  75         10000      337.79      637.92
Shared room      67         500        214         247.72
Private room     10         2000       149.45      176.67
```

__The most expensive listing is one of four listings by Sarah (host_id = 6460979):__

```
SELECT
  *
FROM
  sfo_listings
WHERE
  host_id = '6460979';
```
```
id	name	host_id	host_name	neighbourhood_group	neighbourhood	latitude	longitude	room_type	price	minimum_nights	number_of_reviews	last_review	reviews_per_month	calculated_host_listings_count	Availability_365
23613300	Studio with Wet Bar in Full House Victorian	6460979	Sarah		Western Addition	37.7790720096486	-122.439299685749	Private room	175	1	14	2018-07-30	4.62	4	47
24650875	Full House Victorian: 7500 SqFt, 4 Floors, Hot Tub	6460979	Sarah		Western Addition	37.7802300173854	-122.440461113527	Entire home/apt	10000	2	3	2018-05-24	1	4	18
25679846	Guest Suite with Wet Bar in Grand Victorian Home	6460979	Sarah		Western Addition	37.7807503047168	-122.439432188125	Private room	249	1	12	2018-08-03	6.79	4	63
25944549	2 Bedroom with Living Room + Wet Bar + Hot Tub	6460979	Sarah		Western Addition	37.779047198615	-122.439920971792	Private room	425	2	8	2018-08-02	6.32	4	75
```

__Sarah's other listings are a lot cheaper than the most expensive one, too!__

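The prices of her other three listings, all private rooms, are much more in line with other listings. The \$175 and \$249 rooms sit within one standard deviation of the private-room mean, both citywide and in the Western Addition neighborhood. The \$425 room is about 1.3 standard deviations above the citywide private-room mean and about 1.6 above the Western Addition one.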
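To check how far each of Sarah's other prices sits from the mean, it helps to express the gap in standard deviations. The sketch below does not query the tables. It hard-codes the three private-room prices and the private-room mean and standard deviation from the summary tables above, and the CTE names `other_listings` and `private_room_stats` are made up for this illustration.

```sql
-- Hypothetical helper CTEs; the numbers are copied from the output tables above.
WITH other_listings (price) AS (
  VALUES (175), (249), (425)
),
private_room_stats (scope, mean_price, std_dev_price) AS (
  VALUES
    ('All of San Francisco', 132.78, 223.16),
    ('Western Addition',     149.45, 176.67)
)
SELECT
  s.scope,
  o.price,
  -- Distance from the mean, measured in standard deviations:
  ROUND((o.price - s.mean_price) / s.std_dev_price, 2) AS std_devs_from_mean
FROM
  other_listings o
CROSS JOIN
  private_room_stats s
ORDER BY
  s.scope,
  o.price;
```

A value between -1 and 1 means the price is within one standard deviation of the mean for that scope.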

__The most expensive listing was available for just a few days each month between September and December 2018__, with September having the most available days. (In the calendar table, available = 't' marks a day that is open for booking, not a day that was rented.)

```
SELECT
/*
   This builds a nice-looking date field in the form YYYY-MM:
*/
  EXTRACT(year from calender_date) || 
    '-' || 
    TRIM(TO_CHAR(EXTRACT(month FROM calender_date),'09')) AS year_month,
  COUNT(*) AS days_available
FROM
  sfo_calendar
WHERE
  listing_id = 24650875
AND
  available = 't'
GROUP BY
  year_month
ORDER BY
  year_month;
```

```
year_month  days_available
2018-09     11
2018-10     4
2018-11     5
2018-12     8
```
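The year_month expression in the query above can probably be shortened. Assuming calender_date is a date or timestamp column (which the EXTRACT calls suggest), `TO_CHAR` with the `YYYY-MM` pattern should produce the same label in one step. The sample dates below are made up so the sketch runs without the calendar table.

```sql
SELECT
  d AS calender_date,
  -- The form used in the query above:
  EXTRACT(year FROM d) || '-' || TRIM(TO_CHAR(EXTRACT(month FROM d), '09')) AS long_form,
  -- A shorter way to get a zero-padded YYYY-MM label:
  TO_CHAR(d, 'YYYY-MM') AS short_form
FROM (
  -- Hypothetical sample dates, not rows from sfo_calendar
  VALUES (DATE '2018-09-08'), (DATE '2018-12-31')
) AS sample_dates (d);
```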

__The price per day varied a bit in September and October, and rose significantly by the time December rolled around.__
```
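SELECT
  listing_id,
  EXTRACT(year from calender_date) AS year,
  EXTRACT(month FROM calender_date) AS month,
/*
   Strip the dollar sign and commas, then cast to numeric,
   so MIN and MAX compare prices as numbers rather than as text:
*/
  MIN(REGEXP_REPLACE(LTRIM(price,'$'),',','','g')::numeric) AS min_price,
  MAX(REGEXP_REPLACE(LTRIM(price,'$'),',','','g')::numeric) AS max_price
FROM
  sfo_calendar
WHERE
  listing_id = 24650875
AND
  available = 't'
GROUP BY
  listing_id,
  year,
  month
ORDER BY
  year,
  month;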
```

```
listing_id  year  month  min_price  max_price
24650875    2018  9      2400       3000
24650875    2018  10     2400       3000
24650875    2018  11     4500       4500
24650875    2018  12     9875       9875
```
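One thing to watch with prices in the calendar table: the LTRIM and REGEXP_REPLACE calls in this write-up suggest they are stored as text with a dollar sign and thousands separators. On text, MIN and MAX compare character by character, so a five-figure price can sort below a four-figure one. Here is a small sketch of the difference. The three price strings and the `sample_prices` name are assumptions for illustration, not rows from sfo_calendar.

```sql
-- Hypothetical sample values in the assumed '$1,234.00' format
WITH sample_prices (raw_price) AS (
  VALUES ('$2,400.00'), ('$9,875.00'), ('$10,000.00')
)
SELECT
  -- Text comparison: the result depends on character order, not on the amount
  MAX(LTRIM(raw_price, '$')) AS max_as_text,
  -- Numeric comparison: strip '$' and ',' first, then cast
  MAX(REGEXP_REPLACE(LTRIM(raw_price, '$'), ',', '', 'g')::numeric) AS max_as_number
FROM
  sample_prices;
```

On a typical setup the text version picks the string that starts with the largest digit, while the numeric version returns the largest amount.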

__The four reviews for the most expensive listing are glowing.__

None of them is recent enough for me to check the calendar table for the specific dates the reviewers reserved or how much they paid per day: the calendar table starts on 2018-09-08, and the latest review is dated 2018-08-13.

```
SELECT
  *
FROM
  sfo_reviews
WHERE
  listing_id = 24650875;
```

```
listing_id	id	review_date	reviewer_id	reviewer_name	comments
24650875	262703136	2018-05-09	15598959	Dave	Gorgeous home. Terrific hosts. Was amazing place for our crew to return to and work/relax after long days.
24650875	263350911	2018-05-11	78371505	David	My friends and I travelled to San Francisco for a wedding. Everyone agreed the home was beautiful and spacious. The hosts, Sarah and Jason, were also super pleasant to deal with. Overall, would 100% recommend this home to anyone looking for a place to stay in San Francisco.
24650875	268331242	2018-05-24	6094847	Kim	The house was perfect for the 7 adults and 1 infant we had with us. The location was great, kitchen was amazing, and the family loved the projector viewing for movies and basketball. We would definitely rent again.
24650875	307120516	2018-08-13	68883870	Aggie	Sarah’s home is beautiful!  It is comfy, clean, spacious, and in a great neighborhood.  We will certainly be back!
```

(Side note: the listings table reports only three reviews for this listing, with the latest one submitted on 2018-05-24. The reviews table, however, contains a fourth review, dated 2018-08-13.)
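To see whether other listings show a similar mismatch, compare the number_of_reviews column against an actual count of rows in the reviews table. The sketch below runs on two small made-up CTEs (`sample_listings` and `sample_reviews`, both hypothetical). Pointing it at sfo_listings and sfo_reviews should work the same way, assuming the column names shown in the output above.

```sql
-- Hypothetical stand-ins for sfo_listings and sfo_reviews
WITH sample_listings (id, number_of_reviews) AS (
  VALUES (101, 3), (102, 2)
),
sample_reviews (listing_id, id) AS (
  VALUES (101, 1), (101, 2), (101, 3), (101, 4), (102, 5), (102, 6)
)
SELECT
  lis.id AS listing_id,
  lis.number_of_reviews AS reviews_per_listings_table,
  COUNT(rev.id) AS reviews_in_reviews_table
FROM
  sample_listings lis
LEFT JOIN
  sample_reviews rev
ON
  rev.listing_id = lis.id
GROUP BY
  lis.id,
  lis.number_of_reviews
-- Keep only the listings where the two counts disagree:
HAVING
  lis.number_of_reviews <> COUNT(rev.id)
ORDER BY
  lis.id;
```

The LEFT JOIN keeps listings that have no rows at all in the reviews table, so a listing that claims reviews but has none recorded would still show up.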


## What neighborhoods seem to be the most popular?

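__The most popular neighborhoods are the ones with the highest average occupancy (the total number of occupied days across a neighborhood's listings, divided by the number of listings in that neighborhood)__. I counted a day as occupied when available = 'f' in the calendar table.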

```
WITH
  listings_per_neighbourhood
AS (
  SELECT
    neighbourhood,
    COUNT(*) AS listings_per_neighbourhood
  FROM
    sfo_listings
  GROUP BY
    neighbourhood
),
  days_occupied_by_neighbourhood
AS
(
  SELECT
    lis.neighbourhood,
    COUNT(cal.*) AS days_occupied
  FROM
    sfo_calendar cal
  JOIN
    sfo_listings lis
  ON
    cal.listing_id = lis.id
  WHERE
    cal.available = 'f'
  GROUP BY
    lis.neighbourhood
)
SELECT
  lpn.neighbourhood,
  -- Cast before dividing: both counts are integers, and integer division would truncate the result.
  ROUND(dobn.days_occupied::numeric / lpn.listings_per_neighbourhood, 2) AS average_occupation
FROM
  listings_per_neighbourhood lpn
JOIN
  days_occupied_by_neighbourhood dobn
ON
  lpn.neighbourhood = dobn.neighbourhood
ORDER BY
  average_occupation DESC;
```
```
neighbourhood          average_occupation
Bernal Heights         223.16
Parkside               212.71
Castro/Upper Market    211.63
Downtown/Civic Center  211.36
Glen Park              207.69
Diamond Heights        207.47
Mission                206.03
Inner Sunset           204.51
Potrero Hill           204.36
Noe Valley             204.22
Visitacion Valley      203.3
Excelsior              201.67
Outer Sunset           199.41
Russian Hill           194.1
Golden Gate Park       189.33
Ocean View             187.85
Outer Mission          185.6
Presidio Heights       185.55
Western Addition       185.47
Inner Richmond         183.39
Haight Ashbury         180.69
Chinatown              179.59
Outer Richmond         179.01
Twin Peaks             178.61
North Beach            176.67
Pacific Heights        174.14
West of Twin Peaks     173.8
Bayview                158.9
Marina                 157.44
Financial District     154.08
Crocker Amazon         148.29
Seacliff               134.09
Nob Hill               124.33
South of Market        122.02
Lakeshore              87.72
Presidio               15
```
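The cast in the occupancy query above is there because PostgreSQL divides one integer by another as an integer and discards the fractional part. COUNT returns an integer type, so the same applies to the two counts being divided. A tiny standalone sketch of the behavior, using made-up numbers:

```sql
SELECT
  7 / 2                    AS integer_division,  -- both operands are integers, so the fraction is dropped
  7::numeric / 2           AS numeric_division,  -- casting one operand keeps the fraction
  ROUND(7::numeric / 2, 2) AS rounded;           -- ROUND(value, 2) needs a numeric value
```

Casting either operand to numeric is enough. A detour through double precision is not needed for the two-argument form of ROUND.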

## What time of year is the cheapest time to go to San Francisco? What about the busiest?


### The cheapest time of year

To find the cheapest times, I grouped the available calendar days by year and week of the year, computed the mean price for each week, and ranked the weeks from cheapest to most expensive:
 
```
SELECT
  EXTRACT(year from calender_date) AS year,
  EXTRACT(week FROM calender_date) AS week,
/*
   LTRIM removes the dollar sign,
   REGEXP_REPLACE strips the comma,
   AVG computes the average for the group (year+week),
   and ROUND rounds the result to 2 decimal places:
*/
  ROUND(AVG(REGEXP_REPLACE(LTRIM(price,'$'),',','','g')::numeric),2) as mean_price
FROM
  sfo_calendar
WHERE
  available = 't'
GROUP BY
  year,
  week
ORDER BY
  mean_price
LIMIT 20;
```

This query isn't perfect: EXTRACT(week ...) returns the ISO 8601 week number, and 2018-12-31 falls in ISO week 1 of 2019. The entry for year 2018, week "1" therefore represents that single day, and its average price should be rolled into the entry for year 2019, week 1 (see [this weekly calendar](https://www.epochconverter.com/weeks/2019) to better understand this).

```
year  week  mean_price
2019  3     210.91
2019  2     211.61
2019  4     212.5
2019  1     214.83
2018  50    214.84
2018  51    215.75
2019  8     218.5
2019  5     219.36
2019  7     220.62
2019  9     221.78
2019  6     222.5
2018  1     223.71
2018  52    224.36
2018  49    225.17
2019  11    226.89
2019  14    227.35
2019  15    228.12
2019  10    228.36
2019  18    228.4
2019  13    228.61
```

The cheapest weeks turn out to be weeks 49-52 of 2018 and weeks 1-11 or so of 2019: basically December 2018 and January through March 2019.
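The stray "2018, week 1" entry noted above comes from pairing the calendar year with the ISO week number. PostgreSQL also offers an `isoyear` field for EXTRACT, and grouping by it in place of `year` should put 2018-12-31 in the same group as the first days of January 2019. The sketch below shows the difference on three made-up dates, so it runs without the calendar table.

```sql
SELECT
  d AS calender_date,
  EXTRACT(year FROM d)    AS calendar_year,
  EXTRACT(isoyear FROM d) AS iso_year,      -- the year the ISO week belongs to
  EXTRACT(week FROM d)    AS iso_week,
  TO_CHAR(d, 'IYYY-IW')   AS iso_year_week  -- ISO year and ISO week as one label
FROM (
  -- Hypothetical sample dates around the year boundary
  VALUES (DATE '2018-12-30'), (DATE '2018-12-31'), (DATE '2019-01-01')
) AS sample_dates (d)
ORDER BY
  d;
```

For 2018-12-31 the calendar year is 2018, but the ISO year is 2019 and the ISO week is 1, which matches the weekly calendar linked above.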


### Busiest time of year 

I defined the busiest period as the one with the fewest days available on the calendar. The query is similar to the one I used to figure out the cheapest time of year:

```
SELECT
  EXTRACT(year from calender_date) AS year,
  EXTRACT(week FROM calender_date) AS week,
  COUNT(*) AS days_available
FROM
  sfo_calendar
WHERE
  available = 't'
GROUP BY
  year,
  week
ORDER BY
  days_available;
```

This query isn't perfect either: EXTRACT(week ...) returns the ISO 8601 week number, and 2018-12-31 falls in ISO week 1 of 2019. The entry for year 2018, week "1" therefore represents that single day, and its total should be added to the entry for year 2019, week 1 (see [this weekly calendar](https://www.epochconverter.com/weeks/2019) to better understand this).

This query also assumes the number of listings remains constant throughout the 365-day period covered by the sfo_calendar table. The data for this exercise has no information about when listings first became available, so I treated that assumption as reasonable.

```
year  week  days_available
2018  36    2200
2018  1     3176
2018  37    8729
2018  39    10436
2018  38    10881
2018  40    14164
2019  36    15598
2018  41    16269
2018  42    17137
2019  24    18149
2019  25    18160
2019  26    18221
2019  29    18234
2019  27    18251
2019  28    18263
2019  30    18311
```

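Not counting the entry for 2018 week "1", the 15 or so weeks with the fewest available days are:
* Weeks 36-42 of 2018 (September and the first three weeks of October), with the first two weeks of September having the lowest numbers of days available. Week 36 covers only two days of data (2018-09-08 and 2018-09-09), because the calendar table starts on a Saturday, so its total is not directly comparable to a full week. (A little bonus work from the first part of this question: these weeks also have the highest mean prices...)
* Week 36 of 2019 (first week of September, including Labor Day). This week is also partial: the calendar table ends on 2019-09-07, so it covers six days of data instead of seven.
* Weeks 24-30 of 2019 (June 10th through the end of July).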
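The calendar table covers 2018-09-08 through 2019-09-07 (see Notes), so the first and last ISO weeks in the data contain fewer than seven days, and their raw totals look lower than they would for a full week. One possible way to make weeks comparable is to divide each week's available days by the number of distinct dates that week has in the data. This is a sketch on made-up rows (`sample_calendar` is a hypothetical stand-in for sfo_calendar), not a rerun of the analysis.

```sql
-- Hypothetical rows: two listings over three dates that span two ISO weeks
WITH sample_calendar (listing_id, calender_date, available) AS (
  VALUES
    (1, DATE '2018-09-08', 't'),
    (2, DATE '2018-09-08', 'f'),
    (1, DATE '2018-09-09', 't'),
    (2, DATE '2018-09-09', 't'),
    (1, DATE '2018-09-10', 'f'),
    (2, DATE '2018-09-10', 't')
)
SELECT
  EXTRACT(isoyear FROM calender_date) AS iso_year,
  EXTRACT(week FROM calender_date) AS iso_week,
  -- How many dates of this week actually appear in the data:
  COUNT(DISTINCT calender_date) AS days_in_data,
  COUNT(*) FILTER (WHERE available = 't') AS days_available,
  -- Average number of listings available per day in this week:
  ROUND(
    (COUNT(*) FILTER (WHERE available = 't'))::numeric
      / COUNT(DISTINCT calender_date),
    2
  ) AS available_listings_per_day
FROM
  sample_calendar
GROUP BY
  iso_year,
  iso_week
ORDER BY
  available_listings_per_day;
```

The availability condition sits in a FILTER clause and not in WHERE, so that dates on which nothing is available still count toward days_in_data.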

# Notes

The calendar table contains availability information about listing IDs from 2018-09-08 through 2019-09-07.
```
SELECT
  MIN(calender_date),
  MAX(calender_date)
FROM
  sfo_calendar;
```

Availability is indicated by the available column, which holds the values t and f (presumably true and false). When available is t for a given listing on a given date, a price is listed; when it is f, the price is null. I confirmed this by running these two queries and seeing that neither returned any rows.

```
-- Confirming the meaning of the available column. There should be an associated price when available = 't'.
SELECT
  *
FROM
  sfo_calendar
WHERE
  available = 't'
AND
  price is null;

-- Related query: I expect we won't see any records where available = 'f' and price is not null.
SELECT
  *
FROM
  sfo_calendar
WHERE
  available = 'f'
AND
  price is not null;
```
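The same check can probably be done in a single pass by counting, for each value of available, how many rows have a price and how many do not. COUNT(price) skips nulls, which is what makes this work. The rows below are made up (`sample_calendar` is a hypothetical stand-in for sfo_calendar, and the price strings are in an assumed format).

```sql
-- Hypothetical rows in the shape the calendar table appears to have
WITH sample_calendar (listing_id, calender_date, available, price) AS (
  VALUES
    (1, DATE '2018-09-08', 't', '$85.00'),
    (1, DATE '2018-09-09', 'f', NULL),
    (2, DATE '2018-09-08', 't', '$2,400.00')
)
SELECT
  available,
  COUNT(*) AS total_rows,
  COUNT(price) AS rows_with_price,               -- COUNT(column) ignores nulls
  COUNT(*) - COUNT(price) AS rows_without_price
FROM
  sample_calendar
GROUP BY
  available
ORDER BY
  available;
```

If the relationship described above holds, the 't' row should show zero rows without a price and the 'f' row should show zero rows with one.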

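For reference, this query lists every date on which the most expensive listing is available, along with its price for that date:

```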
SELECT
  *
FROM
  sfo_calendar
WHERE
  listing_id = 24650875
AND
  available = 't';
```


