<div class="alert alert-block alert-info" style="background-color: #323031; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">
<div style="color: #ffffff">
<h1>Python Fundamentals Guided Project - Zoopla Rents</h1>

<br>

<span style="font-size:90%">by Ximin Chen & Michael Wiemers<br>
12.05.2022</span>
</div>
</div>

<div  class="alert alert-block alert-info" style="color:#1b1b1b; background-color:#f2f2f2; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">

#### Project description

In this guided project, you will explore a **dataset on apartments from Zoopla**. The goal of the project is to practice and consolidate the techniques and skills you have learned in the Python Fundamentals Series and to challenge yourself to combine different techniques in creative ways. The project will also introduce new techniques that have not been covered in the Python Fundamentals workshops.

Time to complete: 3-4 hours

#### What to do if you get stuck

Parts of this project are meant to be challenging and to stretch your Python skills, so you will most likely have to do a web search to find help online. Being able to use effective search terms and identify useful resources is an important competency for a programmer at any level, and we therefore encourage you to practice this when you get stuck. If you require 1-2-1 support, please see our website for information on how to join our [daily Coding drop-ins on Teams](https://info.lse.ac.uk/current-students/digital-skills-lab/drop-in-sessions) to get help from a trainer.


#### Learning objectives
- <a href="#download-a-csv"><b>Download a csv file</b> using the requests library</a>
- <a href="#convert-data-to-list-of-lists"><b>Parse strings</b> with csv.reader into a list of lists</a>
- Basic **data integrity checks**
    - Test for number of elements in each row
    - Remove whitespace
    - Find and remove rows with missing values
- Basic **exploratory data analysis**
    - Finding the maximum rent in the dataset
    - Compute summary stats
    - Finding the most common property types
- **Write code as a function** to enhance structure and efficiency of your code
- Use **list comprehensions** to create lists with for loops more efficiently

</div>

<img src="regents_park.jpg" width="80%">

<br>

### The Zoopla dataset

The dataset you will be working with contains data scraped from Zoopla's site on 7 Sep 2020. 

It contains 9969 data points of rental properties within 3 miles of LSE. 

You can see a screenshot of the dataset and a list with all column names below.

#### Columns:
 - Monthly Rent
 - Location
 - Bedrooms
 - Bathrooms
 - Receptions
 - Description
 - Nearest Station
 - Distance (miles)
 - Long Description
 - Available
 - Link
 

<img src="zoopla.png" width="2000px">

---

<div class="alert alert-block alert-info" style="color: white; background-color: #323031; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">

<h3><a id="download-a-csv" style="color:white">Downloading the data</a></h3>
    
</div>    

Instead of downloading the data manually, we will use Python to do so. Being able to download a csv file programmatically and knowing how to correctly parse the downloaded string into a list of lists, which we can work with, is a useful skill to have. If you have taken our Python course for tabular data, you may know that we can use pandas to load a csv file very easily. However, here, we focus on the techniques covered in our Python Fundamentals series.

The dataset itself is located at the following url:

https://raw.githubusercontent.com/ximinchen/DSL-Project/main/zoopla_rents.csv

To download the data, we first need the **requests** library, a simple yet elegant HTTP library that makes it easy to send HTTP requests and download data from websites. 

The `requests.get()` function sends a GET request to the specified url and, if successful, returns the website's content. The content, along with much more information about the request, is stored in the Response object that `requests.get()` returns. We use the variable `response` to store that Response object.

For this example, we will use the main Zoopla domain - `'https://www.zoopla.co.uk/'`. 

We pass the url to `requests.get()` to request the website's content.

In [12]:
import requests

url = 'https://www.zoopla.co.uk/'

response = requests.get(url)
print(response)

<Response [200]>


We get the 200 response, which means that our request was successful and we could download the website's content, which is now stored in the `response` variable.
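As an aside, the meaning of a status code can be looked up with Python's built-in `http` module, and with requests the number itself is available as `response.status_code`. The short sketch below is illustrative only and does not contact any website; the helper name `describe_status` is our own invention.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description such as '200 OK' (hypothetical helper)."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"

print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
```

Codes in the 200 range signal success, while codes in the 400 and 500 ranges signal client and server errors respectively.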

<br>

### Task 1 - Download the Zoopla csv
Now it is your turn to download the zoopla dataset using the url https://raw.githubusercontent.com/ximinchen/DSL-Project/main/zoopla_rents.csv

1. Import the requests library
2. Create a variable for the url
3. Get the response for the url
4. Print the response

<details><summary>Click here for the solution</summary>

<pre>
# import requests
import requests

# set a variable for the url
url = "https://raw.githubusercontent.com/ximinchen/DSL-Project/main/zoopla_rents.csv"

# get the response
response = requests.get(url)

# print the response
print(response)

</pre>
</details>

In [13]:
# import requests
import requests

# set a variable for the url
url = "https://raw.githubusercontent.com/ximinchen/DSL-Project/main/zoopla_rents.csv"

# get the response
response = requests.get(url)

# print the response
print(response)


<Response [200]>


<br>

<br>

<div class="alert alert-block alert-info" style="color: white; background-color: #323031; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">

<h3><a id="convert-data-to-list-of-lists" style="color:white">Converting the data to a list of lists</a></h3>
    
</div>    

We can access the website's content with the `response.text` attribute. This returns the entire html code or, in this case, the data as one long string, since we downloaded a csv file. Let us save the string in the variable `content`.

In [17]:
content = response.text

In the example below, we only print the first 1000 characters of the string. The values are separated by commas. The lines are separated by `\n`, which is the special character for line breaks.

```Python
content[:1000]
```

<img src="csv_string.png">

We can easily split this string into a list of strings, one for each line, using `content.split('\n')`. The split method splits a string into a list of smaller strings based on the character passed to the method. Whenever the method finds a `'\n'`, it cuts off the part of the string before it, adds this part to the list, and keeps doing this until it reaches the end of the string.
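To see the behaviour of `split` in isolation, here is a tiny sketch on a made-up two-line string (the variable names `sample_text` and `sample_lines` are ours). It also shows one detail worth knowing: if the string ends with a line break, `split('\n')` leaves an empty string at the end of the list, whereas `splitlines()` does not.

```python
sample_text = 'a,b\nc,d\n'

# split on the line break character
sample_lines = sample_text.split('\n')
print(sample_lines)              # ['a,b', 'c,d', '']

# splitlines() drops the trailing empty string
print(sample_text.splitlines())  # ['a,b', 'c,d']
```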

Below we print the first 5 rows of the list we get.

In [20]:
content = content.split('\n')
content[:5]

['Monthly Rent,Location,Bedrooms,Bathrooms,Receptions,Description,Nearest Station,Distance (miles),Long Description,Available,Link\r',
 '2167,"14 Stable Walk, London E1",,1,1,Studio ,Aldgate East,0.2,"Beautifully presented one bedroom studio in the phase Neroli House features a spacious living room with an open kitchen, with spacious double bedroom, built-in wardrobe and a large luxurious bathroom. Situated just 3 minutes\' walk away from Aldgate East.",Available immediately,https://www.zoopla.co.uk/to-rent/details/56147079\r',
 '2578,"Exchange Gardens, London SW8",2,2,2,2 bed flat ,Vauxhall,0.2,The apartment is located on the fourth floor with lift and offers 860 sq ft of living space. The property benefits from an open plan kitchen / reception room..,Available immediately,https://www.zoopla.co.uk/to-rent/details/56147140\r',
 '1290,"Commercial Street, London E1",1,1,1,1 bed flat ,Shoreditch High Street,0.2,"City Realtor is proud to present this cozy one bedroom flat situated in an ic

We now have a list of strings, where each string in the list represents a row in our data. In this format, we cannot easily select values from each row, since all values from a row are stored in a single string. It would be much easier if each string were itself split into a list. That way, we could easily select values from a particular row based on their position.
 
Effectively, we need to generate a new list of lists by iterating through each element of the content list and splitting each row into a list in itself based on the comma character. There is a much easier way to do this with the reader function from the **csv** library.

The code below passes the content list of strings to the reader function. The reader function does not itself return a list, but a reader object, as the output shows. To get a list of lists, we have to wrap the call in the list function: `list(csv.reader(content))`.

In [34]:
import csv

csv.reader(content)

<_csv.reader at 0x20539cc1ee0>
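Why not simply split each row on the comma character? The sketch below, which uses a shortened made-up row modelled on the data, suggests one reason: a quoted value such as an address can itself contain a comma, which a plain `split(',')` cuts in two, while `csv.reader` keeps it together. The variable names are ours.

```python
import csv

sample_row = '2167,"14 Stable Walk, London E1",,1,1'

# naive approach: the address is cut at its inner comma
naive = sample_row.split(',')
print(len(naive))   # 6

# csv.reader respects the quotes around the address
parsed = next(csv.reader([sample_row]))
print(parsed)       # ['2167', '14 Stable Walk, London E1', '', '1', '1']
```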

<div class="alert alert-block alert-info" style="background-color:#e0f0ff;color:#033962">
If you are interested in learning more about the csv library, how some of the arguments of the reader function work, and how to write a csv file to disk, we recommend the tutorial <a href="https://realpython.com/python-csv/">Reading and Writing CSV Files in Python</a> from RealPython.com.
</div>


<br>

### Task 2 - Converting the Zoopla dataset into a list of lists

1. Import the csv library.
2. Use the text attribute to get the dataset as a string and store the string in the variable content.
3. Split the content string into a list based on the line break character and re-assign it to the variable content.
4. Use the reader function from the csv library to convert the list of strings to a list of lists.
5. How many rows does the dataset have?
6. Print the column names from the first row.
7. Print the last row and try to map each value to its corresponding column name to make sense of the values.

<details><summary>Click here for the solution</summary>

<pre>
import csv

content = response.text
content = content.split('\n')

zoopla = list(csv.reader(content))

print('Number of rows')
print(len(zoopla))

display(zoopla[0])

display(zoopla[-1])
</pre>
</details>

In [40]:
import csv

content = response.text
content = content.split('\n')

zoopla = list(csv.reader(content))

print('Number of rows')
print(len(zoopla))

display(zoopla[0])

display(zoopla[-1])

Number of rows
9970


['Monthly Rent',
 'Location',
 'Bedrooms',
 'Bathrooms',
 'Receptions',
 'Description',
 'Nearest Station',
 'Distance (miles)',
 'Long Description',
 'Available',
 'Link']

['6933',
 'Park Lane Place, Marble Arch, London W1K',
 '3',
 '3',
 '1',
 '3 bed flat ',
 'Marble Arch',
 '0.1',
 'This stunning and spacious lateral apartment, with excellent views over Hyde Park, comes fully furnished and is available to rent trough Prime London. Situated on a higher floor of this beautiful portered building with lift, the apartment benefits ...',
 '',
 'https://www.zoopla.co.uk/to-rent/details/55866239']

<br>
<br>

### Task 3 - Write a read_csv function

Let us review the different steps we have performed so far:

1. Download the website's content with the requests.get() function
2. Get the file's content as a string with the text attribute
3. Split the string into a list of strings, where each string represents a row in the data
4. Use the reader function from the csv library to transform the list of strings into a list of lists.

A process like this should be combined into a function, so that we can simply use the function's name to perform these steps at multiple points in our notebook. We might, for instance, want to download a different dataset or reload the original dataset at some later point in our analysis.

*Write a function called **read_csv** to carry out the above four steps, which you have already written the code for in Task 1 and Task 2.*

Test whether your function returns the same dataset as previously, by printing the number of rows, the first and the last row and compare the output to what you got in Task 2.

<details><summary>Click here to get a hint</summary>

The function only needs the url as an argument and should return the parsed dataset. This is a function that can be used for any dataset and not only the Zoopla rent data. The argument names should reflect that and you shouldn't use <b>zoopla_url</b> or <b>zoopla_data</b> as variable names!

</details>

<details><summary>Click here to get another hint</summary>

Below is a scaffold to work from. The body of the function, the code between the def and return statement, is left empty for you to fill in.
    
<pre>
import requests
import csv

def read_csv(url):

        
    return data
</pre>

</details>

<details><summary>Click here for the solution</summary>

<pre>
import requests
import csv

def read_csv(url):
    response = requests.get(url)
    
    content = response.text
    content = content.split('\n')
    
    data = list(csv.reader(content))
    
    return data

zoopla = read_csv(url)

print('Number of rows')
print(len(zoopla))

display(zoopla[0])

display(zoopla[-1])
</pre>
</details>

In [43]:
import requests
import csv

def read_csv(url):
    response = requests.get(url)
    
    content = response.text
    content = content.split('\n')
    
    data = list(csv.reader(content))
    
    return data

zoopla = read_csv(url)

print('Number of rows')
print(len(zoopla))

display(zoopla[0])

display(zoopla[-1])

Number of rows
9970


['Monthly Rent',
 'Location',
 'Bedrooms',
 'Bathrooms',
 'Receptions',
 'Description',
 'Nearest Station',
 'Distance (miles)',
 'Long Description',
 'Available',
 'Link']

['6933',
 'Park Lane Place, Marble Arch, London W1K',
 '3',
 '3',
 '1',
 '3 bed flat ',
 'Marble Arch',
 '0.1',
 'This stunning and spacious lateral apartment, with excellent views over Hyde Park, comes fully furnished and is available to rent trough Prime London. Situated on a higher floor of this beautiful portered building with lift, the apartment benefits ...',
 '',
 'https://www.zoopla.co.uk/to-rent/details/55866239']
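One limitation of splitting on `'\n'` before parsing is that a quoted field which itself contains a line break would be cut into two strings. Whether the Zoopla file contains such fields is not something this notebook checks, so the following is only a sketch of a more defensive variant. The helper name `parse_csv_text` and the sample text are made up for illustration; `io.StringIO` and `csv.reader` come from the standard library.

```python
import csv
import io

def parse_csv_text(text):
    """Parse csv text into a list of lists (hypothetical helper)."""
    # newline='' leaves the line endings untouched so that the csv
    # module can handle them itself, including inside quoted fields
    return list(csv.reader(io.StringIO(text, newline='')))

sample_csv = 'id,note\r\n1,"line one\nline two"\r\n2,plain\r\n'
sample_rows = parse_csv_text(sample_csv)
print(len(sample_rows))   # 3
print(sample_rows[1])     # ['1', 'line one\nline two']
```

In a function like `read_csv`, the same idea would mean passing `response.text` to such a helper instead of splitting it first.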

<br>

<br>

<div class="alert alert-block alert-info" style="color: white; background-color: #323031; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">

<h3>Checking data integrity</h3>
    
</div>   

Whenever you import data, before jumping straight into exploring the data, you want to run a few basic tests to make sure that the data is in the correct format.

For this dataset, you want to ensure that the data has been parsed correctly, that there are no leading or trailing whitespaces and get a sense of the number and distribution of missing values.

### 1) Correct parsing of the data

Let us start with a basic sanity check to test whether the data has been parsed correctly. If each row has been split correctly into the columns specified in the first row, each row should have as many elements as the first row.

<br>
<br>

### Task 4 - Check for correct parsing of the rows

We want to count the number of rows whose number of elements differs from that of the first row. The count should be equal to 0 if all rows have the same number of elements.

<details><summary>Click here to get a hint</summary>

You have to use a for loop.

</details>

<details><summary>Click here to get another hint</summary>

1. Create a counter variable that is initially set to 0 before the for loop.
2. Use a for loop to iterate over the rows of the dataset.
3. Use an if statement to test whether the number of elements in a row is unequal to the number of elements in the first row.
4. Increase the counter by 1 inside the if statement.

</details>

<details><summary>Click here for the solution</summary>

<pre>
cnt = 0
for row in zoopla:
    if len(row) != len(zoopla[0]):
        cnt += 1
        
print(cnt)
</pre>
</details>

In [46]:
cnt = 0
for row in zoopla:
    if len(row) != len(zoopla[0]):
        cnt += 1
        
print(cnt)


0
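If the count were not 0, the next question would be which rows are affected. A possible extension, sketched here on made-up toy data rather than the Zoopla list, collects the positions of the offending rows with `enumerate`. The names `toy_rows` and `bad_rows` are our own.

```python
toy_rows = [['a', 'b', 'c'], ['1', '2', '3'], ['4', '5']]

expected = len(toy_rows[0])

# keep the index of every row whose length differs from the first row
bad_rows = [i for i, row in enumerate(toy_rows) if len(row) != expected]
print(bad_rows)  # [2]
```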


&nbsp;

### 2) Removing leading and trailing whitespace

Leading and trailing whitespace can easily cause problems in your analysis, as you might get, for instance, an error message when trying to select values from your dataset.

The description and the Distance (miles) column contain trailing whitespace. 

The value `'Studio '` and the value `'0.2 '` both have a space at the end.

If you were to look for rows with the value `'Studio'` (no whitespaces), you will miss some or all of the values, since they will actually contain a trailing whitespace, as in the first row below. It is therefore important to remove all leading and trailing whitespace before further analyzing the data.

In [10]:
zoopla[1]

['2167',
 '14 Stable Walk, London E1',
 '',
 '1',
 '1',
 'Studio ',
 'Aldgate East',
 '0.2',
 "Beautifully presented one bedroom studio in the phase Neroli House features a spacious living room with an open kitchen, with spacious double bedroom, built-in wardrobe and a large luxurious bathroom. Situated just 3 minutes' walk away from Aldgate East.",
 'Available immediately',
 'https://www.zoopla.co.uk/to-rent/details/56147079']

<br>

You can use the `.strip()` method on a string to remove leading and trailing white space. As you can see, the `.strip()` method only removes the leading and trailing white space. The spaces between the words are not removed.

In [51]:
my_string = '        this is a string      '
print(my_string)

my_string = my_string.strip()
print(my_string)

        this is a string      
this is a string


When using the print function to print a string, the quotes are omitted, which makes it difficult to notice trailing whitespace. We can use Jupyter's display function to show the quotes around the string.

In [50]:
my_string = '        this is a string      '
display(my_string)

my_string = my_string.strip()
display(my_string)

'        this is a string      '

'this is a string'
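Outside of Jupyter, where `display` is not available, the built-in `repr` function gives a similar view with quotes. The sketch below also shows the two one-sided relatives of `.strip()`, namely `.lstrip()` and `.rstrip()`. The variable name `padded` is ours.

```python
padded = '  Studio '

print(repr(padded))           # '  Studio '
print(repr(padded.lstrip()))  # 'Studio '   (leading whitespace removed)
print(repr(padded.rstrip()))  # '  Studio'  (trailing whitespace removed)
print(repr(padded.strip()))   # 'Studio'    (both removed)
```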

<br>

### Task 5 - Removing leading or trailing whitespace

We want to create a new version of the data where any leading and trailing whitespace are removed.

You will have to create a new list for the data, that is being filled with stripped values of the zoopla list of lists. Call this new version of the data **zoopla_stripped**.

Check the first data row in **zoopla_stripped** (index 1, since index 0 holds the column names). Is the whitespace gone now?

<details><summary>Click here to get a hint</summary>

You have to use a nested for loop. The main loop loops over the zoopla dataset, which we created in the previous step. The second loop, inside the main loop, loops over the values of each row and adds the stripped values to the new version, zoopla_stripped.

</details>

<details><summary>Click here for another hint</summary>

1. Before your for loop, create an empty list to store the new values that have been stripped of leading and trailing whitespaces. Name this list <b>zoopla_stripped</b>.
2. Write a nested for loop, that is, a main for loop with another for loop inside it. The main loop loops over the rows in the <b>zoopla</b> list of lists. Inside this main for loop, create another empty list to store stripped values. Name it <b>row_stripped</b>. 
3. Now add a second for loop inside the main loop. The second loop loops over the values of each row and appends the stripped value to the <b>row_stripped</b> list. 
4. Finally, append each <b>row_stripped</b> to the <b>zoopla_stripped</b> list.

</details>

<details><summary>Click here for the solution</summary>

<pre>
zoopla_stripped = []

for row in zoopla:
    row_stripped = []
    for value in row:
        row_stripped.append(value.strip())

    zoopla_stripped.append(row_stripped)
</pre>
</details>

In [60]:
zoopla_stripped = []

for row in zoopla:
    row_stripped = []
    for value in row:
        row_stripped.append(value.strip())

    zoopla_stripped.append(row_stripped)

In [61]:
zoopla_stripped[1]

['2167',
 '14 Stable Walk, London E1',
 '',
 '1',
 '1',
 'Studio',
 'Aldgate East',
 '0.2',
 "Beautifully presented one bedroom studio in the phase Neroli House features a spacious living room with an open kitchen, with spacious double bedroom, built-in wardrobe and a large luxurious bathroom. Situated just 3 minutes' walk away from Aldgate East.",
 'Available immediately',
 'https://www.zoopla.co.uk/to-rent/details/56147079']

<br>

### Task 6 - Turn the nested for loop into a function

Now that we have the nested for loop, try and turn it into a function. Name the function **remove_whitespace**.

Check that the function works and that the whitespaces are gone in the list it returns.

<details><summary>Click here for the solution</summary>

<pre>
def remove_whitespace(data):
    data_stripped = []
    for row in data:
        row_stripped = []
        for value in row:
            row_stripped.append(value.strip())
        # append once per row, inside the outer loop
        data_stripped.append(row_stripped)
    return data_stripped
</pre>
</details>

In [63]:
def remove_whitespace(data):
    data_stripped = []
    for row in data:
        row_stripped = []
        for value in row:
            row_stripped.append(value.strip())
        # append once per row, inside the outer loop
        data_stripped.append(row_stripped)
    return data_stripped
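The same nested loop can also be written as a nested list comprehension. This is one possible variant, not the only way to do it; the function name `remove_whitespace_lc` and the sample data are made up so that they do not overwrite anything in the notebook.

```python
def remove_whitespace_lc(data):
    """Strip every value in a list of lists (comprehension variant)."""
    # the inner comprehension strips one row, the outer one walks the rows
    return [[value.strip() for value in row] for row in data]

sample_data = [[' a ', 'b '], ['c', ' d']]
print(remove_whitespace_lc(sample_data))  # [['a', 'b'], ['c', 'd']]
```

A quick way to check either version is to confirm that the result has as many rows as the input.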



### 3) Missing values

As a next step, we want to understand whether there are missing values in our data. Just as leading and trailing whitespaces can cause problems with our analysis, so can missing values. 

The first row in our dataset contains a missing value in the third 'column'. The third value, which represents the number of bedrooms, is an empty string.

In [66]:
zoopla_stripped[1]

['2167',
 '14 Stable Walk, London E1',
 '',
 '1',
 '1',
 'Studio',
 'Aldgate East',
 '0.2',
 "Beautifully presented one bedroom studio in the phase Neroli House features a spacious living room with an open kitchen, with spacious double bedroom, built-in wardrobe and a large luxurious bathroom. Situated just 3 minutes' walk away from Aldgate East.",
 'Available immediately',
 'https://www.zoopla.co.uk/to-rent/details/56147079']

If we later want to convert the string values to integers and floats to calculate statistics, Python will return an error message:

In [69]:
float(zoopla_stripped[1][2])

ValueError: could not convert string to float: ''
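One way to avoid this error, shown here only as a sketch, is to convert through a small helper that returns `None` for an empty string. The helper name `to_float` is hypothetical, and in this notebook we instead deal with missing values by counting and later removing the affected rows.

```python
def to_float(value):
    """Convert a string to a float; return None for an empty string."""
    if value == '':
        return None
    return float(value)

print(to_float('0.2'))  # 0.2
print(to_float(''))     # None
```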

### Task 7 - Check for missing values

We can apply the same logic to create a dictionary that counts the number of missing values for each column. 

The final output we want is a dictionary as below, where the keys are the columns of the rents dataset and the values represent the number of missing values per column.


In [50]:
# final output of the missings dictionary

missings

{'Monthly Rent': 2,
 'Location': 0,
 'Bedrooms': 928,
 'Bathrooms': 817,
 'Receptions': 2968,
 'Description': 0,
 'Nearest Station': 0,
 'Distance (miles)': 0,
 'Long Description': 2,
 'Available': 2621,
 'Link': 0}

The main idea is to test for each row which columns have missing values, i.e. an empty string, and increase the count each time we find an empty string for one of the columns.

1. Create an empty dictionary called `missings`.
2. Use a for loop to create a key for each column and set the initial value to 0.
3. Loop over the rows of the rents dataset. Test whether each of the elements/columns from each row is missing (empty string: `''`) and increase the count by 1 for the corresponding column in the `missings` dictionary.

In [27]:
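# the header row holds the column names; the remaining rows are the data
# (assumed definitions, consistent with the 9969 data rows counted below)
columns = zoopla_stripped[0]
rents_stripped = zoopla_stripped[1:]

missings = {}

for col in columns:
    missings[col] = 0

for rent in rents_stripped:
    for i in range(len(rent)):
        if rent[i] == '':
            missings[columns[i]] += 1
            
missings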

{'Monthly Rent': 2,
 'Location': 0,
 'Bedrooms': 928,
 'Bathrooms': 817,
 'Receptions': 2968,
 'Description': 0,
 'Nearest Station': 0,
 'Distance (miles)': 0,
 'Long Description': 2,
 'Available': 2621,
 'Link': 0}
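Looping over index positions works, but pairing each column name with its value via `zip` can read a little more directly. The sketch below uses made-up toy data with three columns; the `toy_` names are ours and chosen so that they do not overwrite the notebook's variables.

```python
toy_columns = ['Monthly Rent', 'Bedrooms', 'Available']
toy_rows = [
    ['2167', '', 'Available immediately'],
    ['2578', '2', ''],
    ['1290', '', ''],
]

# start every column at 0
toy_missings = {col: 0 for col in toy_columns}

for row in toy_rows:
    # zip pairs each column name with the value in the same position
    for col, value in zip(toy_columns, row):
        if value == '':
            toy_missings[col] += 1

print(toy_missings)  # {'Monthly Rent': 0, 'Bedrooms': 2, 'Available': 2}
```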

### Challenge yourself - List comprehension to remove missing values (optional)

A list comprehension builds a new list from a for loop and an optional condition in a single expression. Try to remove the rows with missing values using a list comprehension.
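As a reminder of the pattern, here is a for loop with a condition next to the equivalent list comprehension, on a made-up list of numbers.

```python
numbers = [1, 2, 3, 4, 5, 6]

# for loop version
evens_loop = []
for n in numbers:
    if n % 2 == 0:
        evens_loop.append(n)

# list comprehension version: expression, loop, condition
evens_comp = [n for n in numbers if n % 2 == 0]

print(evens_loop)  # [2, 4, 6]
print(evens_comp)  # [2, 4, 6]
```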

First, let's write a for loop to count the number of lists within rents_stripped that have missing values.

In [28]:
# Count the number of lists with missing values
count = 0
for row in rents_stripped:
    if '' in row:
        count += 1

In [29]:
count

5400

Create a new list of lists called 'rents_nomiss' to store the lists without any missing values. Try and apply list comprehension to shorten your code.

In [30]:
#use list comprehension to create rents_nomiss
rents_nomiss = [x for x in rents_stripped if '' not in x]

Now, let's check the total number of lists in the list again to ensure we removed the lists correctly. Do you get 4569?

In [31]:
len(rents_nomiss)

4569

### Task 8 - Extract the rental prices

Extract the rental prices (the 1st element of each list in rents_nomiss) to a new list. Name it rent_prices.

In [32]:
rent_prices = []
for row in rents_nomiss:
    rent_prices.append(row[0])

Print the data type of the first element in rent_prices

In [34]:
print(type(rent_prices[0]))

<class 'str'>


Now, let's convert the strings to floats for all the elements in rent_prices. We want to convert these strings to numerical values so that it is easier for us to do calculations or further analysis. Recall what we did in the third Python workshop on converting data types.

In [35]:
for i in range(0, len(rent_prices)):
    rent_prices[i] = float(rent_prices[i])

Check the data type of the first element again to verify your result

In [36]:
print(type(rent_prices[0]))

<class 'float'>


### Task 9 - Finding the maximum rental price

Find the maximum rental price from the rent_prices list

Try and implement a loop:

In [37]:
max_rent = rent_prices[0]

for element in rent_prices[1:]:
    if element > max_rent:
        max_rent = element

print(max_rent)

54167.0


Then check using the max() function directly

In [38]:
max(rent_prices)

54167.0
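The `max()` function also accepts a `key` argument, which can be handy if you want the whole row that belongs to the highest rent rather than only the number. The sketch below uses made-up rows with just two values each; the variable names are ours.

```python
sample_listings = [
    [2167.0, 'Studio'],
    [2578.0, '2 bed flat'],
    [1290.0, '1 bed flat'],
]

# compare rows by their first element, the rent
most_expensive = max(sample_listings, key=lambda row: row[0])
print(most_expensive)  # [2578.0, '2 bed flat']
```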

### Task 10 - Compute summary statistics

Write a function to calculate the mean, median and std deviation of any list. Use stdev, mean and median from the statistics library to help you.

In [39]:
def summary(a_list):
    from statistics import stdev, mean, median
    mean_ = mean(a_list)
    stdev_ = stdev(a_list)
    median_ = median(a_list)
    summary = f"Mean: {mean_:.2f}; Median: {median_:.2f}; Std Dev: {stdev_:.2f}"
    return summary

Now, you may also try returning a dictionary that stores the mean, median and standard deviation. Two equivalent versions are shown below.

In [40]:
def summary(a_list):
    from statistics import stdev, mean, median
    return {stat: "{:.2f}".format(func(a_list)) for stat, func in zip(['mean', 'median', 'std'], [mean, median, stdev])}

summary([1, 2, 2, 3, 3])

{'mean': '2.20', 'median': '2.00', 'std': '0.84'}

In [41]:
def summary(a_list):
    from statistics import stdev, mean, median
    summary = {}
    summary['mean'] = "{:.2f}".format(mean(a_list))
    summary['median'] = "{:.2f}".format(median(a_list))
    summary['std'] = "{:.2f}".format(stdev(a_list))
    return summary

summary([1, 2, 2, 3, 3])

{'mean': '2.20', 'median': '2.00', 'std': '0.84'}

Run your function on the rent_prices list to obtain the summary statistics

In [42]:
summary(rent_prices)

{'mean': '2827.43', 'median': '2250.00', 'std': '2372.26'}
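The versions above return formatted strings, which look tidy but cannot be compared or used in further calculations without converting them back. A possible alternative, sketched under the hypothetical name `summary_numeric`, keeps the values as numbers and leaves the rounding to the moment of printing. Note that `stdev` from the statistics library computes the sample standard deviation.

```python
from statistics import mean, median, stdev

def summary_numeric(values):
    """Return mean, median and sample standard deviation as numbers."""
    return {
        'mean': mean(values),
        'median': median(values),
        'std': stdev(values),
    }

stats = summary_numeric([1, 2, 2, 3, 3])

# round only when displaying the result
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```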

### Task 11 - Use a counter to find the 10 most common property types

1. Search online how to use 'Counter' to count the occurrence of elements from a collection.
2. Use a counter to count the occurrences of the different property types (e.g. '2 bed flat') based on the description column in rents_nomiss.
3. Select the 10 most common property types. There is a counter method to do this. Search online on how to do this.
4. Select only the property types and not the counts and store in a list named top10_property_types.

In [43]:
from collections import Counter

counter = Counter([rent[5] for rent in rents_nomiss])

counter.most_common(10)

[('2 bed flat', 1723),
 ('1 bed flat', 1689),
 ('3 bed flat', 530),
 ('4 bed flat', 123),
 ('2 bed property', 38),
 ('1 bed property', 37),
 ('2 bed maisonette', 33),
 ('2 bed duplex', 33),
 ('3 bed maisonette', 31),
 ('4 bed terraced house', 25)]
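To see what `Counter` does on a smaller scale, here is a toy example with a made-up list of six property types. A Counter behaves like a dictionary that maps each element to its count, and `most_common(n)` returns the `n` most frequent elements as (element, count) tuples.

```python
from collections import Counter

toy_types = ['1 bed flat', '2 bed flat', '1 bed flat',
             'Studio', '2 bed flat', '1 bed flat']

toy_counter = Counter(toy_types)

print(toy_counter['Studio'])        # 1
print(toy_counter.most_common(2))   # [('1 bed flat', 3), ('2 bed flat', 2)]
```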

In [44]:
top10_property_types = []

for item in counter.most_common(10):
    top10_property_types.append(item[0])

top10_property_types

['2 bed flat',
 '1 bed flat',
 '3 bed flat',
 '4 bed flat',
 '2 bed property',
 '1 bed property',
 '2 bed maisonette',
 '2 bed duplex',
 '3 bed maisonette',
 '4 bed terraced house']

In [45]:
top10_property_types = [item[0] for item in counter.most_common(10)]
top10_property_types

['2 bed flat',
 '1 bed flat',
 '3 bed flat',
 '4 bed flat',
 '2 bed property',
 '1 bed property',
 '2 bed maisonette',
 '2 bed duplex',
 '3 bed maisonette',
 '4 bed terraced house']

In [46]:
top10_property_types = [key for key, value in counter.most_common(10)]
top10_property_types

['2 bed flat',
 '1 bed flat',
 '3 bed flat',
 '4 bed flat',
 '2 bed property',
 '1 bed property',
 '2 bed maisonette',
 '2 bed duplex',
 '3 bed maisonette',
 '4 bed terraced house']

### Task 12 - Calculate statistics for most common property types

1. Use the summary function to calculate the statistics for each of the 10 most common property types. You may use a for loop to make things easier for you.
2. Which has the highest and lowest median/mean price? Which property type shows the highest spread?

In [47]:
# build a copy of each row with the price converted to a float,
# leaving the rows in rents_nomiss unchanged
rents_clean = []
for rent in rents_nomiss:
    rents_clean.append([float(rent[0])] + rent[1:])

In [48]:
top10_property_types

['2 bed flat',
 '1 bed flat',
 '3 bed flat',
 '4 bed flat',
 '2 bed property',
 '1 bed property',
 '2 bed maisonette',
 '2 bed duplex',
 '3 bed maisonette',
 '4 bed terraced house']

In [49]:
for type_ in top10_property_types:
    print(type_)
    print(summary([float(rent[0]) for rent in rents_clean if rent[5] == type_]))

2 bed flat
{'mean': '2784.52', 'median': '2383.00', 'std': '1420.79'}
1 bed flat
{'mean': '1974.12', 'median': '1800.00', 'std': '677.68'}
3 bed flat
{'mean': '4319.76', 'median': '3499.00', 'std': '2860.35'}
4 bed flat
{'mean': '6229.89', 'median': '3575.00', 'std': '8208.86'}
2 bed property
{'mean': '2702.08', 'median': '2496.00', 'std': '1346.89'}
1 bed property
{'mean': '1940.00', 'median': '1733.00', 'std': '569.98'}
2 bed maisonette
{'mean': '2249.52', 'median': '2100.00', 'std': '476.70'}
2 bed duplex
{'mean': '3123.88', 'median': '2383.00', 'std': '2518.30'}
3 bed maisonette
{'mean': '2350.65', 'median': '2145.00', 'std': '474.33'}
4 bed terraced house
{'mean': '3990.20', 'median': '3120.00', 'std': '3452.95'}
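The loop above filters the full dataset once for every property type. An alternative that passes over the data only once is to group the prices by type first, for example with a `defaultdict`. The sketch below uses made-up (price, type) pairs and computes only the median; all names in it are ours.

```python
from collections import defaultdict
from statistics import median

toy_listings = [
    (1800.0, '1 bed flat'),
    (2400.0, '2 bed flat'),
    (2000.0, '1 bed flat'),
    (2600.0, '2 bed flat'),
    (2200.0, '1 bed flat'),
]

# group the prices by property type in a single pass
prices_by_type = defaultdict(list)
for price, property_type in toy_listings:
    prices_by_type[property_type].append(price)

toy_medians = {ptype: median(prices) for ptype, prices in prices_by_type.items()}
print(toy_medians)  # {'1 bed flat': 2000.0, '2 bed flat': 2500.0}
```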


### Good job on completing the notebook!

Now that you have worked through the notebook, we hope that you have learnt some new techniques in Python, while consolidating the knowledge from our Python Fundamentals series. Feel free to explore the dataset further and share your results!