<a href="https://colab.research.google.com/github/root-git/stratascratch-sql-challenges/blob/main/6_Growth_of_Airbnb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Growth of Airbnb

Calculate Airbnb's annual growth rate using the number of registered hosts as the key metric. The growth rate is determined by:

Growth Rate = ((Number of hosts registered in the current year - number of hosts registered in the previous year) / number of hosts registered in the previous year) * 100
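
As a quick sanity check of the formula, here is a minimal Python sketch. The function name `growth_rate` and the sample counts are illustrative assumptions, not part of the original problem.

```python
def growth_rate(current_hosts: int, previous_hosts: int) -> float:
    """Percentage change in host count from the previous year to the current year."""
    if previous_hosts == 0:
        raise ValueError("previous_hosts must be non-zero")
    return (current_hosts - previous_hosts) / previous_hosts * 100


# Hypothetical counts: 7 hosts registered last year, 12 this year.
print(round(growth_rate(12, 7)))  # 71
```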

Output the year, number of hosts in the current year, number of hosts in the previous year, and the growth rate. Round the growth rate to the nearest percent. Sort the results in ascending order by year.

Assume that the dataset consists only of unique hosts, meaning there are no duplicate hosts listed.

**Original Question Link:**  
[StrataScratch ID 9637 – Growth of Airbnb](https://platform.stratascratch.com/coding/9637-growth-of-airbnb?code_type=1)


## Table Schema

`airbnb_search_details`

| Column                   | Type              |
|--------------------------|-------------------|
| accommodates             | bigint            |
| amenities                | text              |
| bathrooms                | bigint            |
| bed_type                 | text              |
| bedrooms                 | bigint            |
| beds                     | bigint            |
| cancellation_policy      | text              |
| city                     | text              |
| cleaning_fee             | boolean           |
| host_identity_verified   | text              |
| host_response_rate       | text              |
| **host_since**           | date              |
| **id**                   | bigint            |
| neighbourhood            | text              |
| number_of_reviews        | bigint            |
| price                    | double precision  |
| property_type            | text              |
| review_scores_rating     | double precision  |
| room_type                | text              |
| zipcode                  | bigint            |

---


## Thought Process

1. Extract the year from `host_since`.
2. Count the number of unique hosts registered in each year.
3. Use the `LAG` window function to get the previous year's host count.
4. Apply the growth rate formula, round to the nearest percent, and sort by year.
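
Before moving to SQL, the steps above can be sketched in plain Python to make the logic explicit. The dates below are made up for illustration, and the variable names (`host_since`, `hosts_per_year`, `rows`) are assumptions of this sketch rather than anything from the dataset.

```python
from collections import Counter
from datetime import date

# Hypothetical registration dates, one per unique host.
host_since = [
    date(2019, 3, 1), date(2019, 7, 15),
    date(2020, 1, 10), date(2020, 5, 20), date(2020, 11, 2),
    date(2021, 2, 14),
]

# Steps 1-2: extract the year and count hosts per year.
hosts_per_year = Counter(d.year for d in host_since)

# Steps 3-4: walk the years in order, carrying the previous count
# forward (the role LAG plays in SQL), and apply the formula.
rows = []
previous = None
for year in sorted(hosts_per_year):
    current = hosts_per_year[year]
    if previous is not None:
        growth = round((current - previous) / previous * 100)
        rows.append((year, current, previous, growth))
    previous = current

for row in rows:
    print(row)
```

The first year is skipped because it has no previous count to compare against.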

In [2]:
import pandas as pd
import random

# Create mock Airbnb data
data = {
    'id': list(range(1, 51)),
    'host_since': pd.to_datetime([
        f'{random.choice([2017, 2018, 2019, 2020, 2021, 2022])}-{random.randint(1,12):02d}-{random.randint(1,28):02d}'
        for _ in range(50)
    ])
}

df = pd.DataFrame(data)

In [3]:
import sqlite3

# Load into SQLite (in-memory)
conn = sqlite3.connect(':memory:')
df.to_sql('airbnb_search_details', conn, index=False, if_exists='replace')

# Show a preview of the loaded table
print(pd.read_sql('SELECT * FROM airbnb_search_details', conn))

    id           host_since
0    1  2021-11-06 00:00:00
1    2  2022-11-21 00:00:00
2    3  2018-04-28 00:00:00
3    4  2022-08-04 00:00:00
4    5  2018-06-27 00:00:00
5    6  2018-03-02 00:00:00
6    7  2022-10-15 00:00:00
7    8  2018-04-01 00:00:00
8    9  2022-12-10 00:00:00
9   10  2019-02-21 00:00:00
10  11  2017-10-09 00:00:00
11  12  2019-08-25 00:00:00
12  13  2022-11-08 00:00:00
13  14  2020-06-22 00:00:00
14  15  2019-07-11 00:00:00
15  16  2020-05-15 00:00:00
16  17  2020-08-07 00:00:00
17  18  2020-10-22 00:00:00
18  19  2021-04-16 00:00:00
19  20  2021-01-21 00:00:00
20  21  2019-11-28 00:00:00
21  22  2019-01-24 00:00:00
22  23  2022-05-13 00:00:00
23  24  2020-08-17 00:00:00
24  25  2022-04-03 00:00:00
25  26  2017-01-12 00:00:00
26  27  2017-12-24 00:00:00
27  28  2017-06-17 00:00:00
28  29  2022-12-22 00:00:00
29  30  2021-01-04 00:00:00
30  31  2018-09-12 00:00:00
31  32  2018-10-09 00:00:00
32  33  2017-01-02 00:00:00
33  34  2017-10-05 00:00:00
34  35  2022-03-16 0

In [4]:
# Replace with your SQL query below
query = """ SELECT * FROM airbnb_search_details"""

result_df = pd.read_sql(query, conn)

In [8]:
query = """
WITH yearly_host AS
(
  SELECT
    CAST(STRFTIME('%Y', host_since) AS INT) AS year,
    COUNT(DISTINCT id) AS hosts_current_year
  FROM airbnb_search_details
  GROUP BY year
),
host_growth AS
(
  SELECT
    year,
    hosts_current_year,
    LAG(hosts_current_year) OVER (ORDER BY year) AS hosts_previous_year
  FROM yearly_host
)
SELECT
  year,
  hosts_current_year,
  hosts_previous_year,
  ROUND(((hosts_current_year - hosts_previous_year) / CAST(hosts_previous_year AS FLOAT)) * 100, 0) AS growth_rate
FROM host_growth
WHERE hosts_previous_year IS NOT NULL
ORDER BY year
"""

solution = pd.read_sql(query, conn)
print(solution)

   year  hosts_current_year  hosts_previous_year  growth_rate
0  2018                   8                    9        -11.0
1  2019                   7                    8        -13.0
2  2020                   7                    7          0.0
3  2021                   7                    7          0.0
4  2022                  12                    7         71.0


## Problem Explanation

### Step 1: Count hosts by registration year
```sql
SELECT
  CAST(STRFTIME('%Y', host_since) AS INT) AS year,
  COUNT(DISTINCT id) AS hosts_current_year
FROM airbnb_search_details
GROUP BY year
```

### Step 2: Get previous year's host count
```sql
SELECT
  year,
  hosts_current_year,
  LAG(hosts_current_year) OVER (ORDER BY year) AS hosts_previous_year
FROM yearly_host
```
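- The `LAG` window function returns the host count from the preceding row in year order, so the earliest year gets `NULL`.
- `LAG` looks at the previous row, not the previous calendar year; if a year had no registrations, the comparison is against the last year that did.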
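
To see what `LAG` returns on its own, here is a small sketch against a throwaway in-memory SQLite table. The counts are hypothetical, 2020 is deliberately left out, and the connection name `demo_conn` is an assumption of this example. It also assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")

# Hypothetical yearly counts; 2020 is deliberately missing.
demo_conn.execute("CREATE TABLE yearly_host (year INT, hosts_current_year INT)")
demo_conn.executemany(
    "INSERT INTO yearly_host VALUES (?, ?)",
    [(2018, 9), (2019, 8), (2021, 7)],
)

rows = demo_conn.execute(
    """
    SELECT
      year,
      hosts_current_year,
      LAG(hosts_current_year) OVER (ORDER BY year) AS hosts_previous_year
    FROM yearly_host
    ORDER BY year
    """
).fetchall()

for row in rows:
    print(row)
demo_conn.close()
```

The first row has `None` for the previous count, and the 2021 row is paired with 2019 because `LAG` works on row order rather than on consecutive calendar years.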

### Step 3: Calculate growth rate
```sql
SELECT
  year,
  hosts_current_year,
  hosts_previous_year,
  ROUND(100.0*(hosts_current_year - hosts_previous_year)/CAST(hosts_previous_year AS FLOAT)) AS growth_rate
FROM host_growth
WHERE hosts_previous_year IS NOT NULL
ORDER BY year
```
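- Apply the given formula to calculate the growth rate; `CAST(hosts_previous_year AS FLOAT)` forces floating-point division.
- `ROUND` rounds the growth rate to the nearest percent.
- `WHERE hosts_previous_year IS NOT NULL` drops the earliest year, which has no previous year to compare against.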
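
The cast in the growth rate expression matters because SQLite truncates the result when both operands of `/` are integers. A minimal sketch with hypothetical counts (8 hosts this year, 9 last year); the connection name `demo_conn` is an assumption of this example:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")

# Hypothetical counts: 8 hosts this year, 9 last year.
int_div, float_div = demo_conn.execute(
    "SELECT (8 - 9) / 9, (8 - 9) / CAST(9 AS FLOAT)"
).fetchone()
growth = demo_conn.execute(
    "SELECT ROUND(100.0 * (8 - 9) / CAST(9 AS FLOAT))"
).fetchone()[0]

print(int_div)    # integer division drops the fractional part
print(float_div)  # floating-point division keeps it
print(growth)     # rounded growth rate in percent
demo_conn.close()
```

Without the cast (or a float literal such as `100.0` earlier in the expression), every year-over-year change smaller than the previous count would collapse to zero.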