<div class="alert alert-block alert-info" style="background-color: #323031; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">
<div style="color: #ffffff">
<h1>Python Fundamentals Guided Project - Zoopla Rents</h1>

<br>

<span style="font-size:90%">by Ximin Chen & Michael Wiemers<br>
12.05.2022</span>
</div>
</div>

<div  class="alert alert-block alert-info" style="color:#1b1b1b; background-color:#f2f2f2; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">

#### Project description

In this guided project, you will explore a **dataset on apartments from Zoopla**. The goal of the project is to practice and consolidate the techniques and skills you have learned in the Python Fundamentals Series and to challenge yourself to combine different techniques in creative ways. The project will also introduce new techniques that have not been covered in the Python Fundamentals workshops.

Time to complete: 3-4 hours

#### What to do if you get stuck

Parts of this project are meant to be challenging and to stretch your Python skills, so you will most likely need to search the web for help. Being able to use effective search terms and identify useful resources is an important competency for a programmer at any level, and we therefore encourage you to practise this when you get stuck. If you require 1-2-1 support, please see our website for information on how to join our [daily Coding drop-ins on Teams](https://info.lse.ac.uk/current-students/digital-skills-lab/drop-in-sessions) to get help from a trainer.


#### Learning objectives
- <a href="#download-a-csv"><b>Download a csv file</b> using the requests library</a>
- <a href="#convert-data-to-list-of-lists"><b>Parse strings</b> with csv.reader into a list of lists</a>
- <a href="#checking-data-integrity">Basic <b>data integrity checks</b></a>
    - Test for number of elements in each row
    - Remove whitespace
    - Find and remove rows with missing values
- <a href="#exploring-data">Basic <b>exploratory data analysis</b></a>
    - Finding the maximum rent in the dataset
    - Compute summary stats
    - Finding the most common property types
- <a href="#bonus-challenge">Use <b>list comprehensions</b> to create lists with for loops more efficiently</a>

</div>

<img src="regents_park.jpg" width="80%">

<br>

### The Zoopla dataset

The dataset you will be working with contains data scraped from Zoopla's site on 7 Sep 2020. 

It contains 9969 data points of rental properties within 3 miles from LSE. 

You can see a screenshot of the dataset and a list with all column names below.

#### Columns:
 - Monthly rent
 - Location
 - Bedrooms
 - Bathrooms
 - Receptions
 - Descriptions
 - Nearest Station
 - Distance (miles)
 - Long Description
 - Available
 - Link
 

<img src="zoopla.png" width="2000px">

---

<div class="alert alert-block alert-info" style="color: white; background-color: #323031; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">

<h3><a id="download-a-csv" style="color:white">Downloading the data</a></h3>
    
</div>    

Instead of downloading the data manually, we will use Python to do so. Being able to download a csv file programmatically, and knowing how to correctly parse the downloaded string into a list of lists that we can work with, is a useful skill to have. If you have taken our Python course for tabular data, you may know that we can use pandas to load a csv file very easily. Here, however, we focus on the techniques covered in our Python Fundamentals series.

The dataset itself is located at the following url:

https://raw.githubusercontent.com/mwiemers/datasets/main/zoopla.csv

To download the data, we first need the **requests** library, a simple yet elegant HTTP library that makes it easy to send HTTP requests and download data from websites.

The `requests.get()` function sends a GET request to the specified url and, if successful, returns the website's content. The content, along with much more information about the response, is stored in the Response object that `requests.get()` returns. We use the variable `response` to store that Response object.

For this example, we will use the main Zoopla domain - `'https://www.zoopla.co.uk/'`.

We pass the url to the `requests.get()` function to request the website's content.

In [1]:
import requests

url = 'https://www.zoopla.co.uk/'

response = requests.get(url)
print(response)

<Response [200]>


We get the 200 response, which means that our request was successful and the website's content has been downloaded. It is now stored in the `response` variable.
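
The number shown in the Response object is a standard HTTP status code. As a small aside, Python's built-in `http` module can translate such codes into their official descriptions. The three codes below are chosen purely for illustration and are not values returned by Zoopla.

```python
from http import HTTPStatus

# Look up the standard description for a few common status codes
# (the selection of codes is an illustrative assumption)
descriptions = {}
for code in [200, 404, 500]:
    descriptions[code] = HTTPStatus(code).phrase

print(descriptions)
```

With requests, the same number is available as `response.status_code`, so you can compare it to 200 in an if statement before working with the downloaded content.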

<br>

### Task 1 - Download the zoopla csv
Now it is your turn to download the zoopla dataset using the url https://raw.githubusercontent.com/mwiemers/datasets/main/zoopla.csv

1. Import the requests library
2. Create a variable for the url
3. Get the response for the url
4. Print the response

<details><summary>Click here for the solution</summary>

<pre>
# import requests
import requests

# set a variable for the url
url = "https://raw.githubusercontent.com/mwiemers/datasets/main/zoopla.csv"

# get the response
response = requests.get(url)

# print the response
print(response)

</pre>
</details>

In [3]:
# import requests
import requests

# set a variable for the url
url = "https://raw.githubusercontent.com/mwiemers/datasets/main/zoopla.csv"

# get the response
response = requests.get(url)

# print the response
print(response)


<Response [200]>


<br>

<br>

<div class="alert alert-block alert-info" style="color: white; background-color: #323031; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">

<h3><a id="convert-data-to-list-of-lists" style="color:white">Converting the data to a list of lists</a></h3>
    
</div>    

We can access the website's content with the `response.text` attribute. This returns the entire HTML code or, in this case, the data as one long string, since we downloaded a csv file. Let us save the string in the variable `content`.

In [4]:
content = response.text

In the example below, we only print the first 1000 characters of the string. The values are separated by commas. The lines are separated by `\n`, which is the special character for line breaks.

```Python
content[:1000]
```

<img src="csv_string.png">

We can easily split this string into a list of strings, one for each line, using `content.split('\n')`. The split method splits a string into a list of smaller strings based on the character passed to the method. Whenever the method finds a `'\n'`, it cuts off the part of the string before it, adds this part to the list, and keeps doing this until it reaches the end of the string.

Below we print the first 5 rows of the list we get.

In [5]:
content = content.split('\n')
content[:5]

['Monthly Rent,Location,Bedrooms,Bathrooms,Receptions,Description,Nearest Station,Distance (miles),Long Description,Available,Link\r',
 '2167,"14 Stable Walk, London E1",,1,1,Studio ,Aldgate East,0.2,"Beautifully presented one bedroom studio in the phase Neroli House features a spacious living room with an open kitchen, with spacious double bedroom, built-in wardrobe and a large luxurious bathroom. Situated just 3 minutes\' walk away from Aldgate East.",Available immediately,https://www.zoopla.co.uk/to-rent/details/56147079\r',
 '2578,"Exchange Gardens, London SW8",2,2,2,2 bed flat ,Vauxhall,0.2,The apartment is located on the fourth floor with lift and offers 860 sq ft of living space. The property benefits from an open plan kitchen / reception room..,Available immediately,https://www.zoopla.co.uk/to-rent/details/56147140\r',
 '1290,"Commercial Street, London E1",1,1,1,1 bed flat ,Shoreditch High Street,0.2,"City Realtor is proud to present this cozy one bedroom flat situated in an ic

We now have a list of strings, where each string in the list represents a row in our data. In this format, we cannot easily select values from each row, since all values from a row are stored in a single string. It would be much easier if each string were itself split into a list. That way, we could easily select values from a particular row based on their position.
 
Effectively, we need to generate a new list of lists by iterating through each element of the content list and splitting each row into a list of its own based on the comma character. There is a much easier way to do this with the reader function from the **csv** library.

The code below uses the reader function to parse the content list of strings into a list of lists. The reader function does not itself return a list, but a reader object. We have to use the list function to convert it into a list of lists.

In [11]:
import csv

zoopla = list(csv.reader(content))

zoopla[:2]

[['Monthly Rent',
  'Location',
  'Bedrooms',
  'Bathrooms',
  'Receptions',
  'Description',
  'Nearest Station',
  'Distance (miles)',
  'Long Description',
  'Available',
  'Link'],
 ['2167',
  '14 Stable Walk, London E1',
  '',
  '1',
  '1',
  'Studio ',
  'Aldgate East',
  '0.2',
  "Beautifully presented one bedroom studio in the phase Neroli House features a spacious living room with an open kitchen, with spacious double bedroom, built-in wardrobe and a large luxurious bathroom. Situated just 3 minutes' walk away from Aldgate East.",
  'Available immediately',
  'https://www.zoopla.co.uk/to-rent/details/56147079']]
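
To see why the reader function is preferable to splitting each row on commas by hand, consider the short sketch below. It uses a single made-up line in the style of the Zoopla data, where the address contains a comma inside double quotes.

```python
import csv

# One shortened, made-up line in the same style as the Zoopla data
line = '2167,"14 Stable Walk, London E1",,1'

# Naive approach: split on every comma, including the one inside the quotes
naive = line.split(',')

# csv.reader expects an iterable of lines, so we wrap the line in a list
parsed = list(csv.reader([line]))[0]

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split produces five pieces because it cuts the address in two, whereas the reader function treats the quoted address as a single value and returns four.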

<div class="alert alert-block alert-info" style="background-color:#e0f0ff;color:#033962">
If you are interested in learning more about the csv library, how some of the arguments of the reader function work, and how to write a csv file to disk, we recommend the article <a href="https://realpython.com/python-csv/">Reading and Writing CSV Files in Python</a> from RealPython.com.
</div>


<br>

### Task 2 - Converting the Zoopla dataset into a list of lists

1. Import the csv library.
2. Use the text attribute to retrieve the dataset as a string and store the string in the variable content.
3. Split the content string into a list based on the line break character and re-assign it to the variable content.
4. Use the reader function from the csv library to convert the list of strings to a list of lists.
5. How many rows does the dataset have?
6. Print the column names from the first row.
7. Print the last row and try to map each value to its corresponding column name to make sense of the values.

<details><summary>Click here for the solution</summary>

<pre>
import csv

content = response.text
content = content.split('\n')

zoopla = list(csv.reader(content))

print('Number of rows')
print(len(zoopla))

display(zoopla[0])

display(zoopla[-1])
</pre>
</details>

In [12]:
import csv

content = response.text
content = content.split('\n')

zoopla = list(csv.reader(content))

print('Number of rows')
print(len(zoopla))

display(zoopla[0])

display(zoopla[-1])

Number of rows
9970


['Monthly Rent',
 'Location',
 'Bedrooms',
 'Bathrooms',
 'Receptions',
 'Description',
 'Nearest Station',
 'Distance (miles)',
 'Long Description',
 'Available',
 'Link']

['6933',
 'Park Lane Place, Marble Arch, London W1K',
 '3',
 '3',
 '1',
 '3 bed flat ',
 'Marble Arch',
 '0.1',
 'This stunning and spacious lateral apartment, with excellent views over Hyde Park, comes fully furnished and is available to rent trough Prime London. Situated on a higher floor of this beautiful portered building with lift, the apartment benefits ...',
 '',
 'https://www.zoopla.co.uk/to-rent/details/55866239']

<br>
<br>

### Task 3 - Write a read_csv function

Let us review the different steps we have performed so far:

1. Download the website's content with the requests.get() function
2. Get the file's content as a string with the text attribute
3. Split the string into a list of strings, where each string represents a row in the data
4. Use the reader function from the csv library to transform the list of strings into a list of lists.

A process like this should be combined into a function, so that we can simply use the function's name to perform these steps at multiple points in our notebook. We might, for instance, want to download a different dataset or reload the original dataset at some later point in our analysis.

*Write a function called **read_csv** to carry out the above four steps, which you have already written the code for in Task 1 and Task 2.*

Test whether your function returns the same dataset as previously, by printing the number of rows, the first and the last row and compare the output to what you got in Task 2.

<details><summary>Click here to open/close hint</summary>

The function only needs the url as an argument and should return the parsed dataset. This is a function that can be used for any dataset and not only the Zoopla rent data. The argument names should reflect that and you shouldn't use <b>zoopla_url</b> or <b>zoopla_data</b> as variable names!

</details>

<details><summary>Click here to open/close another hint</summary>

Below is a scaffold to work from. The body of the function, the code between the def and return statement, is left empty for you to fill in.
    
<pre>
import requests
import csv

def read_csv(url):

        
    return data
</pre>

</details>

<details><summary>Click here to open/close solution</summary>

<pre>
import requests
import csv

def read_csv(url):
    response = requests.get(url)
    
    content = response.text
    content = content.split('\n')
    
    data = list(csv.reader(content))
    
    return data

zoopla = read_csv(url)

print('Number of rows')
print(len(zoopla))

display(zoopla[0])

display(zoopla[-1])
</pre>
</details>

In [13]:
import requests
import csv

def read_csv(url):
    response = requests.get(url)
    
    content = response.text
    content = content.split('\n')
    
    data = list(csv.reader(content))
    
    return data

zoopla = read_csv(url)

print('Number of rows')
print(len(zoopla))

display(zoopla[0])

display(zoopla[-1])

Number of rows
9970


['Monthly Rent',
 'Location',
 'Bedrooms',
 'Bathrooms',
 'Receptions',
 'Description',
 'Nearest Station',
 'Distance (miles)',
 'Long Description',
 'Available',
 'Link']

['6933',
 'Park Lane Place, Marble Arch, London W1K',
 '3',
 '3',
 '1',
 '3 bed flat ',
 'Marble Arch',
 '0.1',
 'This stunning and spacious lateral apartment, with excellent views over Hyde Park, comes fully furnished and is available to rent trough Prime London. Situated on a higher floor of this beautiful portered building with lift, the apartment benefits ...',
 '',
 'https://www.zoopla.co.uk/to-rent/details/55866239']

<br>

<br>

<div class="alert alert-block alert-info" style="color: white; background-color: #323031; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">

<h3><a id="checking-data-integrity" style="color:white">Checking data integrity</a></h3>
    
</div>   

Whenever you import data, before jumping straight into exploring the data, you want to run a few basic tests to make sure that the data is in the correct format.

For this dataset, you want to ensure that the data has been parsed correctly, that there are no leading or trailing whitespaces and get a sense of the number and distribution of missing values.

### 1) Correct parsing of the data

Let us start with a basic sanity check to test whether the data has been parsed correctly. If each row has been split correctly into the columns specified in the first row, each row should have as many elements as the first row.

<br>
<br>

### Task 4 - Check for correct parsing of the rows

We want to count the number of rows that have a number of elements unequal to the first row. The count should be equal to 0, since all rows have the same number of elements.

<details><summary>Click here to get a hint</summary>

You have to use a for loop.

</details>

<details><summary>Click here to get another hint</summary>

1. Create a counter variable that is initially set to 0 before the for loop.
2. Use a for loop to iterate over the rows of the dataset.
3. Use an if statement to test whether the number of elements in a row is unequal to the number of elements in the first row.
4. Increase the counter by 1 inside the if statement.

</details>

<details><summary>Click here for the solution</summary>

<pre>
cnt = 0
for row in zoopla:
    if len(row) != len(zoopla[0]):
        cnt += 1
        
print(cnt)
</pre>
</details>

In [3]:
cnt = 0
for row in zoopla:
    if len(row) != len(zoopla[0]):
        cnt += 1
        
print(cnt)


0
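
If the count were not 0, it would be useful to know which rows are affected. The sketch below is one possible variant on made-up toy data (not the Zoopla dataset) that records the positions of malformed rows using `enumerate`, which yields each row together with its index.

```python
# Toy data (an assumption for illustration): the last row is missing one value
data = [['Monthly Rent', 'Location', 'Bedrooms'],
        ['2167', '14 Stable Walk, London E1', ''],
        ['2578', 'Exchange Gardens, London SW8']]

n_cols = len(data[0])   # expected number of elements per row
bad_rows = []           # indices of rows with a different length

for i, row in enumerate(data):
    if len(row) != n_cols:
        bad_rows.append(i)

print(bad_rows)
```

Note that an empty string still counts as an element, so the second row passes the check even though its bedroom value is missing.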


&nbsp;

### 2) Removing leading and trailing whitespace

Leading and trailing whitespace can easily cause problems in your analysis, as you might get, for instance, an error message when trying to select values from your dataset.

The description and the Distance (miles) column contain trailing whitespace. 

The value `'Studio '` and the value `'0.2 '` both have a space at the end.

If you were to look for rows with the value `'Studio'` (without whitespace), you would miss some or all of them, since the stored values actually contain trailing whitespace, as in the first row below. It is therefore important to remove all leading and trailing whitespace before analyzing the data further.

In [10]:
zoopla[1]

['2167',
 '14 Stable Walk, London E1',
 '',
 '1',
 '1',
 'Studio ',
 'Aldgate East',
 '0.2',
 "Beautifully presented one bedroom studio in the phase Neroli House features a spacious living room with an open kitchen, with spacious double bedroom, built-in wardrobe and a large luxurious bathroom. Situated just 3 minutes' walk away from Aldgate East.",
 'Available immediately',
 'https://www.zoopla.co.uk/to-rent/details/56147079']
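
The effect on a search can be sketched with a few made-up description values that all carry a trailing space, as in the row above. The list is an assumption for illustration and is not taken from the dataset.

```python
# Made-up description values, each with a trailing space
descriptions = ['Studio ', '1 bed flat ', 'Studio ', '2 bed flat ']

# Count exact matches for 'Studio' (without the trailing space)
n_exact = 0
for description in descriptions:
    if description == 'Studio':
        n_exact += 1

print(n_exact)
print('Studio ' == 'Studio')
```

Although two of the values look like studios, the exact comparison finds none of them, because `'Studio '` and `'Studio'` are different strings.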

<br>

You can use the `.strip()` method on a string to remove leading and trailing white space. As you can see, the `.strip()` method only removes the leading and trailing white space. The spaces between the words are not removed.

In [4]:
my_string = '        this is a string      '
print(my_string)

my_string = my_string.strip()
print(my_string)

        this is a string      
this is a string


When using the print function to print a string, the quotes are omitted, which makes it difficult to notice the trailing whitespace. We can use Jupyter's display function to show the quotes around the string.

In [5]:
my_string = '        this is a string      '
display(my_string)

my_string = my_string.strip()
display(my_string)

'        this is a string      '

'this is a string'
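
The display function is only available in Jupyter. As a hedged alternative for plain Python scripts, the built-in `repr()` function returns the string including its quotes, so it can be combined with print. The example string below is made up for illustration.

```python
my_string = '  this is a string  '

# repr() keeps the quotes, so leading and trailing spaces become visible
print(repr(my_string))
print(repr(my_string.strip()))

# Comparing lengths is another quick way to spot removed whitespace
len_before = len(my_string)
len_after = len(my_string.strip())
print(len_before, len_after)
```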

<br>

### Task 5 - Removing leading or trailing whitespace

We want to create a new version of the data where any leading and trailing whitespace is removed.

You will have to create a new list for the data and fill it with the stripped values from the zoopla list of lists. Call this new version of the data **zoopla_stripped**.

Check the first data row in **zoopla_stripped**. Is the whitespace gone now?

<details><summary>Click here to get a hint</summary>

You have to use a nested for loop. The main loop loops over the zoopla dataset, which we created in the previous step. The second loop, inside the main loop, loops over the values from each row and adds the stripped values to the new version, zoopla_stripped.

</details>

<details><summary>Click here for another hint</summary>

1. Before your for loop, create an empty list to store the new values that have been stripped of leading and trailing whitespaces. Name this list <b>zoopla_stripped</b>.
2. Write a nested for loop, that is, a main for loop with another for loop inside it. The main loop loops over the rows in the <b>zoopla</b> list of lists. Inside this main for loop, create another empty list to store stripped values. Name it <b>row_stripped</b>. 
3. Now add a second for loop inside the main loop. The second loop loops over the values of each row and appends the stripped value to the <b>row_stripped</b> list. 
4. Finally, append each <b>row_stripped</b> to the <b>zoopla_stripped</b> list.

</details>

<details><summary>Click here for the solution</summary>

<pre>
zoopla_stripped = []

for row in zoopla:
    row_stripped = []
    for value in row:
        row_stripped.append(value.strip())

    zoopla_stripped.append(row_stripped)
</pre>
</details>

In [14]:
zoopla_stripped = []

for row in zoopla:
    row_stripped = []
    for value in row:
        row_stripped.append(value.strip())

    zoopla_stripped.append(row_stripped)

In [61]:
zoopla_stripped[1]

['2167',
 '14 Stable Walk, London E1',
 '',
 '1',
 '1',
 'Studio',
 'Aldgate East',
 '0.2',
 "Beautifully presented one bedroom studio in the phase Neroli House features a spacious living room with an open kitchen, with spacious double bedroom, built-in wardrobe and a large luxurious bathroom. Situated just 3 minutes' walk away from Aldgate East.",
 'Available immediately',
 'https://www.zoopla.co.uk/to-rent/details/56147079']

<br>

### Task 6 - Turn the nested for loop into a function

Now that we have the nested for loop, try and turn it into a function. Name the function **remove_whitespace**.

Check that the function works and that the whitespaces are gone in the list it returns.

<details><summary>Click here for the solution</summary>

<pre>
def remove_whitespace(data):
    data_stripped = []
    for row in data:
        row_stripped = []
        for value in row:
            row_stripped.append(value.strip())
        # append inside the outer loop so that every row is kept
        data_stripped.append(row_stripped)
    return data_stripped
</pre>
</details>

In [63]:
def remove_whitespace(data):
    data_stripped = []
    for row in data:
        row_stripped = []
        for value in row:
            row_stripped.append(value.strip())
        # append inside the outer loop so that every row is kept
        data_stripped.append(row_stripped)
    return data_stripped
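
One way to check that the function works is to run it on a small, made-up list of lists and confirm that no row is lost and no value keeps its surrounding whitespace. The sketch below repeats the function so that it runs on its own; the toy data is an assumption for illustration.

```python
def remove_whitespace(data):
    data_stripped = []
    for row in data:
        row_stripped = []
        for value in row:
            row_stripped.append(value.strip())
        # this append must sit inside the outer loop, otherwise
        # only the last row would end up in the result
        data_stripped.append(row_stripped)
    return data_stripped

# Toy data with leading and trailing spaces
toy = [[' Studio ', '0.2 '], ['1 bed flat ', ' 0.5']]
cleaned = remove_whitespace(toy)

print(cleaned)
print(len(cleaned) == len(toy))
```

Comparing the number of rows before and after is a useful habit here, since a misplaced indentation of the final append would silently return a single row.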



### 3) Missing values

As a next step, we want to understand whether there are missing values in our data. Just as leading and trailing whitespaces can cause problems with our analysis, so can missing values. 

The first data row in our dataset contains a missing value in the third 'column'. The third value, which represents the number of bedrooms, is an empty string.

In [66]:
zoopla_stripped[1]

['2167',
 '14 Stable Walk, London E1',
 '',
 '1',
 '1',
 'Studio',
 'Aldgate East',
 '0.2',
 "Beautifully presented one bedroom studio in the phase Neroli House features a spacious living room with an open kitchen, with spacious double bedroom, built-in wardrobe and a large luxurious bathroom. Situated just 3 minutes' walk away from Aldgate East.",
 'Available immediately',
 'https://www.zoopla.co.uk/to-rent/details/56147079']

If we later want to convert the string values to integers and floats to calculate statistics, Python will return an error message, since it cannot convert an empty string to a float.

In [69]:
float(zoopla_stripped[1][2])

ValueError: could not convert string to float: ''
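
The error can also be handled in code. The sketch below uses a try/except block, a technique that may not have been covered in the workshops, inside a hypothetical helper called `to_float_or_none` (the name is an assumption, not part of the project).

```python
def to_float_or_none(value):
    """Hypothetical helper: convert a string to float, or return None if that fails."""
    try:
        return float(value)
    except ValueError:
        # float('') and other non-numeric strings end up here
        return None

converted = to_float_or_none('0.2')
missing = to_float_or_none('')

print(converted)
print(missing)
```

In this project we take the simpler route of removing rows with missing values before converting, so a helper like this is optional.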

### Task 7 - Check for rows with missing values

Let us first get a sense of the number of rows we have with missing values. Calculate the percentage of rows that have one or more missing values.

<details><summary>Get a hint</summary>

1. Create a variable for the total number of rows in the data
2. Calculate the number of rows with at least one missing value
3. Calculate a new variable for the percentage of missing rows
    
</details>

<details><summary>Get another hint</summary>

You will need to use a for loop to iterate through the entire data. Before the for loop, initialise a counter for the number of rows with missing values. On each iteration, increase the counter by 1 if the row has a missing value in it.
</details>

<details><summary>Get yet another hint</summary>

Use an if statement to test whether there is an empty string inside the row.
</details>

<details><summary>Click here for the solution</summary>

<pre>
n_rows = len(zoopla_stripped)
n_missing_rows = 0

for row in zoopla_stripped:
    if '' in row:
        n_missing_rows += 1
        
perc_missing_rows = n_missing_rows/n_rows
print(perc_missing_rows)
</pre>
</details>

In [15]:
n_rows = len(zoopla_stripped)
n_missing_rows = 0

for row in zoopla_stripped:
    if '' in row:
        n_missing_rows += 1
        
perc_missing_rows = n_missing_rows/n_rows
print(perc_missing_rows)

0.5416248746238717
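
The printed value is a proportion between 0 and 1. As a small sketch, it can be turned into a more readable percentage either by multiplying by 100 and rounding, or with the `%` format specifier of an f-string. The number is copied from the output above.

```python
perc_missing_rows = 0.5416248746238717  # value printed above

# Option 1: multiply by 100 and round to one decimal place
as_percent = round(perc_missing_rows * 100, 1)

# Option 2: let the f-string do the conversion and add the % sign
label = f"{perc_missing_rows:.1%}"

print(as_percent)
print(label)
```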


### Task 8 - Check for columns with missing values

Now we know that roughly 54% of rows have a missing value. To get a full picture of the missing values in our data, we need to know how many missing values we have in each column. If a column has a very high percentage of missing values, we might choose not to use it in our analysis, since it would strongly limit the number of complete instances we have available.

Use a list of lists to store the percentage of missing values per column, as in the correct output below.
```
[['Monthly Rent', 0.0002],
 ['Location', 0.0],
 ['Bedrooms', 0.0931],
 ['Bathrooms', 0.0819],
 ['Receptions', 0.2977],
 ['Description', 0.0],
 ['Nearest Station', 0.0],
 ['Distance (miles)', 0.0],
 ['Long Description', 0.0002],
 ['Available', 0.2629],
 ['Link', 0.0]]

```

<details><summary>Get a hint</summary>

You will have to use nested for loops.
    
The first for loop selects the index of the columns. The second for loop loops over the rows. For each row, you need to check whether the value at the position corresponding to a specific column is an empty string. If that is the case, increase a counter for the missings of that column by 1.

    
</details>

<details><summary>Get another hint</summary>

Use this coding scaffold and try to fill in the missing bits to complete the for loop.
    
<pre>
perc_missing_per_col = []

for i in _ _ _ _ _ _ _ _ _:
    n_missing = 0
    for row in _ _ _ _ _ _ _ _:
        if _ _ _  == '':
            n_missing += 1
    per_missing = n_missing / len(zoopla_stripped)
    perc_missing_per_col.append([_ _ _ _ _ _ _ _ _, _ _ _ _ _ _ _ _ _ _])
    
</pre>
</details>

<details><summary>Click here for the solution</summary>

<pre>
perc_missing_per_col = []

for i in range(len(zoopla_stripped[0])):
    n_missing = 0
    for row in zoopla_stripped:
        if row[i] == '':
            n_missing += 1
    per_missing = n_missing / len(zoopla_stripped)
    perc_missing_per_col.append([zoopla_stripped[0][i], round(per_missing, 4)])
    
perc_missing_per_col
</pre>
</details>

In [24]:
perc_missing_per_col = []

for i in range(len(zoopla_stripped[0])):
    n_missing = 0
    for row in zoopla_stripped:
        if row[i] == '':
            n_missing += 1
    per_missing = n_missing / len(zoopla_stripped)
    perc_missing_per_col.append([zoopla_stripped[0][i], round(per_missing, 4)])
    
perc_missing_per_col

[['Monthly Rent', 0.0002],
 ['Location', 0.0],
 ['Bedrooms', 0.0931],
 ['Bathrooms', 0.0819],
 ['Receptions', 0.2977],
 ['Description', 0.0],
 ['Nearest Station', 0.0],
 ['Distance (miles)', 0.0],
 ['Long Description', 0.0002],
 ['Available', 0.2629],
 ['Link', 0.0]]

### Task 9 - Remove rows with missing values

There are many different ways to deal with missing data. For this guided project, we will simply remove those instances/rows with missing values. Other approaches use some type of data imputation, where missing values are replaced with a specific value. The most basic type of imputation uses the median or mean of the column. More complex approaches require modelling the data, so that missing values are replaced based on patterns inferred from the data.

1. Create a new subset of the data under the name zoopla_nomiss, which consists only of rows that have no missing values.
2. Check that the number of rows of zoopla_nomiss is 4570.

<details><summary>Get a hint</summary>

1. Create a new empty list named zoopla_nomiss.
2. Use a for loop to loop over the rows of zoopla_stripped.
3. Use an if statement to check that there are no empty strings in a row.
4. Add those rows for which the if statement is true to the zoopla_nomiss.
    
</details>

<details><summary>Click here for the solution</summary>

<pre>
zoopla_nomiss = []

for row in zoopla_stripped:
    if not '' in row:
        zoopla_nomiss.append(row)

print(len(zoopla_nomiss))
</pre>
</details>

In [29]:
zoopla_nomiss = []

for row in zoopla_stripped:
    if not '' in row:
        zoopla_nomiss.append(row)

print(len(zoopla_nomiss))

4570
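
For comparison with row removal, the basic median imputation mentioned at the start of this task could look roughly like the sketch below. It works on a made-up bedrooms column rather than the Zoopla data, and it is only one simple way to impute values.

```python
from statistics import median

# Made-up column of bedroom counts with two missing values
bedrooms = ['2', '', '1', '3', '']

# Step 1: collect the observed values as floats
observed = []
for value in bedrooms:
    if value != '':
        observed.append(float(value))

# Step 2: use the median of the observed values as the fill value
fill_value = median(observed)

# Step 3: replace every empty string with the fill value
imputed = []
for value in bedrooms:
    if value == '':
        imputed.append(fill_value)
    else:
        imputed.append(float(value))

print(imputed)
```

Unlike removing rows, this keeps all five entries, at the cost of treating the filled-in values as if they had been observed.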


<br>

<br>

<div class="alert alert-block alert-info" style="color: white; background-color: #323031; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">

<h3><a id="exploring-data" style="color: white">Exploring the data</a></h3>
    
</div>   

We've cleaned the dataset sufficiently to be able to carry out some basic analysis where we will calculate descriptives to get a sense of the relationships between different variables in the data.

In the first step, we will calculate summary statistics, like the mean, min and max value for the rental price. In the next steps, we will calculate frequencies for property types and then calculate the summary statistics for the most common property type.

### Task 10 -  Convert rent prices to float

Before we can calculate the summary statistics for the rental prices, we first need to extract the rental prices from the data and store them in a new list called **rent_prices**.

1. Create a new list called **rent_prices** and add the rent prices converted to float to the list.
2. Print the type of the first element to check that the values have the float type.

<details><summary>Get a hint</summary>

Use a for loop to loop over the rows and append the converted rent values to the **rent_prices** list.
    
</details>

<details><summary>Get another hint</summary>

You have to loop over the data starting from the 2nd row. The first row holds the column names.
    
</details>

<details><summary>Click here for the solution</summary>

<pre>
rent_prices = []
for row in zoopla_nomiss[1:]:
    rent_prices.append(float(row[0]))
    
print(type(rent_prices[0]))
</pre>
</details>

In [33]:
rent_prices = []
for row in zoopla_nomiss[1:]:
    rent_prices.append(float(row[0]))

In [35]:
print(type(rent_prices[0]))

<class 'float'>


### Task 11 - Compute summary statistics

We now have the rental prices stored as the correct type in the **rent_prices** list. Let us calculate some summary statistics for the rental prices.

Write a function called **summary** that takes as input a list of numerical values and calculates the mean, median, standard deviation, min and max. Use functions from the *statistics* package to calculate the standard deviation, mean and median. Your function should print a summary like the one below.

Don't worry about the exact formatting, which is a bit tricky to achieve.

<pre>
Mean:         2827.43
Median:       2250.00
Stdev:        2372.26
Min:           500.00
Max:         54167.00
</pre>

<details><summary>Click here for the solution</summary>

<pre>
def summary(data):
    from statistics import stdev, mean, median
    mean_ = mean(data)
    median_ = median(data)
    std_ = stdev(data)
    min_ = min(data)
    max_ = max(data)
    print('Mean: ' + str(round(mean_, 2)) + '\n'\
          'Median: ' +  str(round(median_, 2)) + '\n'\
          'Stdev: ' +  str(round(std_, 2)) + '\n'\
          'Min: ' +  str(round(min_, 2)) + '\n'\
          'Max: ' +  str(round(max_, 2)) + '\n'\
         )

</pre>
</details>

<details><summary>Click here for the solution with nicer formatting using an f-string</summary>

<pre>
def summary(data):
    from statistics import stdev, mean, median
    lb = '\n'
    print(f"\
{'Mean:':10} {mean(data):10.2f}{lb}\
{'Median:':10} {median(data):10.2f}{lb}\
{'Stdev:':10} {stdev(data):10.2f}{lb}\
{'Min:':10} {min(data):10.2f}{lb}\
{'Max:':10} {max(data):10.2f}{lb}\
"
)

</pre>
</details>

In [106]:
def summary(data):
    from statistics import stdev, mean, median
    lb = '\n'
    print(f"\
{'Mean:':10} {mean(data):10.2f}{lb}\
{'Median:':10} {median(data):10.2f}{lb}\
{'Stdev:':10} {stdev(data):10.2f}{lb}\
{'Min:':10} {min(data):10.2f}{lb}\
{'Max:':10} {max(data):10.2f}{lb}\
"
)




summary(rent_prices)

Mean: 2827.43
Median: 2250.0
Stdev: 2372.26
Min: 500.0
Max: 54167.0



### Task 12 -  Use a counter to find the 10 most common property types

1. Create a new list called **property_types** to hold the property type values from the zoopla_nomiss list of lists.
2. Search online for how to use a 'Counter' to count the occurrences of elements in a collection.
3. Use a counter to count the occurrences of the different property types in the **property_types** list.
4. Select the 10 most common property types. There is a counter method to do this. Search online for how to use it.
5. Select only the property types and not the counts and store them in a list named **top10_property_types**.

<details><summary>Click here for a hint</summary>


To create the property_types list use a for loop to loop over the rows from **zoopla_nomiss** and extract the values from the property type column at index 5.


</details>

<details><summary>Click here for the solution to create the property_types list</summary>
 
<pre>
property_types = []
for row in zoopla_nomiss[1:]:  # skip the header row
    property_types.append(row[5])

</pre>
</details>

<details><summary>Click here for the solution to select the most common values</summary>
 
<pre>
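from collections import Counter

# count how often each property type occurs
counter = Counter(property_types)

# list of (property type, count) tuples for the 10 most frequent types
top10_freq_property_type = counter.most_common(10)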
</pre>
</details>

<details><summary>Click here for the solution to select only the property types and not the counts.</summary>
 
<pre>
top10_property_types = []
for item in top10_freq_property_type:
    top10_property_types.append(item[0])
    
</pre>
</details>
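
The Counter workflow from this task can also be tried out on a much smaller list first. The sketch below uses made-up values (`fruits` is not part of the dataset) to show what `most_common` returns and how to keep only the values.

```python
from collections import Counter

# Made-up values for illustration, not taken from the Zoopla data
fruits = ["apple", "pear", "apple", "plum", "pear", "apple"]

# Counter tallies how often each distinct value occurs
fruit_counter = Counter(fruits)

# most_common(n) returns a list of (value, count) tuples,
# ordered from the most to the least frequent
top2 = fruit_counter.most_common(2)
print(top2)

# Keep only the values by taking the first element of each tuple
top2_names = []
for item in top2:
    top2_names.append(item[0])

print(top2_names)
```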

In [115]:
from collections import Counter

property_types = []
for row in zoopla_nomiss[1:]:  # skip the header row
    property_types.append(row[5])

counter = Counter(property_types)

top10_freq_property_type = counter.most_common(10)

top10_freq_property_type

[('2 bed flat', 1723),
 ('1 bed flat', 1689),
 ('3 bed flat', 530),
 ('4 bed flat', 123),
 ('2 bed property', 38),
 ('1 bed property', 37),
 ('2 bed maisonette', 33),
 ('2 bed duplex', 33),
 ('3 bed maisonette', 31),
 ('4 bed terraced house', 25)]

In [116]:
top10_property_types = []
for item in top10_freq_property_type:
    top10_property_types.append(item[0])

top10_property_types

['2 bed flat',
 '1 bed flat',
 '3 bed flat',
 '4 bed flat',
 '2 bed property',
 '1 bed property',
 '2 bed maisonette',
 '2 bed duplex',
 '3 bed maisonette',
 '4 bed terraced house']

### Task 13 - Calculate statistics for most common property types

1. Use the summary function to calculate the statistics for each of the 10 most common property types.
2. Which has the highest and lowest median/mean price? Which property type shows the highest spread?

In [122]:
for type_ in top10_property_types:
    print(type_)

2 bed flat
1 bed flat
3 bed flat
4 bed flat
2 bed property
1 bed property
2 bed maisonette
2 bed duplex
3 bed maisonette
4 bed terraced house


In [124]:
for type_ in top10_property_types:
    print(type_, 'prices')
    rental_prices = []
    for row in zoopla_nomiss[1:]:
        if row[5] == type_:
            rental_prices.append(float(row[0]))

    summary(rental_prices)

2 bed flat prices
Mean: 2784.52
Median: 2383.0
Stdev: 1420.79
Min: 1300.0
Max: 16900.0

1 bed flat prices
Mean: 1974.12
Median: 1800.0
Stdev: 677.68
Min: 900.0
Max: 13520.0

3 bed flat prices
Mean: 4319.76
Median: 3499.0
Stdev: 2860.35
Min: 780.0
Max: 30333.0

4 bed flat prices
Mean: 6229.89
Median: 3575.0
Stdev: 8208.86
Min: 1950.0
Max: 54167.0

2 bed property prices
Mean: 2702.08
Median: 2496.0
Stdev: 1346.89
Min: 1350.0
Max: 9533.0

1 bed property prices
Mean: 1940.0
Median: 1733.0
Stdev: 569.98
Min: 1200.0
Max: 4398.0

2 bed maisonette prices
Mean: 2249.52
Median: 2100.0
Stdev: 476.7
Min: 1675.0
Max: 4000.0

2 bed duplex prices
Mean: 3123.88
Median: 2383.0
Stdev: 2518.3
Min: 1550.0
Max: 14733.0

3 bed maisonette prices
Mean: 2350.65
Median: 2145.0
Stdev: 474.33
Min: 1850.0
Max: 3683.0

4 bed terraced house prices
Mean: 3990.2
Median: 3120.0
Stdev: 3452.95
Min: 2383.0
Max: 20150.0
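
Reading the answers to question 2 off ten printed blocks is error-prone, so the comparison can also be done in code. The sketch below is one possible approach on made-up two-column rows (price, property type). In the real data the columns sit at index 0 and 5, and all names used here are hypothetical.

```python
from statistics import median, stdev

# Made-up (price, property type) rows for illustration only
rows = [
    ["1000", "1 bed flat"],
    ["1200", "1 bed flat"],
    ["1100", "1 bed flat"],
    ["2000", "2 bed flat"],
    ["3500", "2 bed flat"],
    ["2600", "2 bed flat"],
]

# Collect the prices of each property type in a dictionary of lists
prices_by_type = {}
for price, type_ in rows:
    prices_by_type.setdefault(type_, []).append(float(price))

# Compute one (median, stdev) pair per property type
stats_by_type = {}
for type_, prices in prices_by_type.items():
    stats_by_type[type_] = (median(prices), stdev(prices))

# max/min with a key function pick the type with the largest/smallest statistic
highest_median = max(stats_by_type, key=lambda t: stats_by_type[t][0])
lowest_median = min(stats_by_type, key=lambda t: stats_by_type[t][0])
highest_spread = max(stats_by_type, key=lambda t: stats_by_type[t][1])

print(highest_median, lowest_median, highest_spread)
```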



<br>

<br>

<div class="alert alert-block alert-info" style="color: white; background-color: #55378b; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">

<h3><a id="bonus-challenge" style="color:white">Bonus Challenge - List comprehension (optional)</a></h3>
    
</div>

List comprehensions are a concise Python syntax for building a list from a for loop in a single expression.

Below is an example of how to convert the for loop that creates the rent_prices list into a list comprehension.

#### Original code from Task 10 using a for loop

In [149]:
rent_prices = []
for row in zoopla_nomiss[1:]:
    rent_prices.append(float(row[0]))

rent_prices[:10]

[2578.0,
 1290.0,
 5633.0,
 4767.0,
 2275.0,
 1450.0,
 4767.0,
 1408.0,
 2383.0,
 1700.0]

#### Same task but using a list comprehension

In [151]:
rent_prices1 = [float(row[0]) for row in zoopla_nomiss[1:]]

rent_prices1[:10]

[2578.0,
 1290.0,
 5633.0,
 4767.0,
 2275.0,
 1450.0,
 4767.0,
 1408.0,
 2383.0,
 1700.0]

The main advantages of list comprehensions are that they save a few keystrokes, make the code more concise and, once you get used to them, often make it easier to read.

To better understand how list comprehensions work, have a look at [this video from realpython.com](https://www.youtube.com/watch?v=1HlyKKiGg-4) or these [examples from W3Schools](https://www.w3schools.com/python/python_lists_comprehension.asp).

Convert the code from tasks 5/6, 9, 12 and 13 from normal for loops to list comprehensions (task 10 is the example shown above).

Convert the for loops in this order (from easiest to more difficult):
- task 12 (simple list comprehension)
- task 9 and 13 (list comprehension with if statement)
- task 5/6 (nested list comprehension)
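
Before converting the notebook code, it may help to see the three patterns named above on small made-up lists. This is only an illustrative sketch; `numbers` and `grid` are not part of the dataset.

```python
# Made-up values for illustration, not taken from the Zoopla data
numbers = [1, 2, 3, 4, 5, 6]
grid = [[1, 2], [3, 4], [5, 6]]

# Simple list comprehension: transform every element
squares = [n ** 2 for n in numbers]

# List comprehension with an if statement: keep only some elements
evens = [n for n in numbers if n % 2 == 0]

# Nested list comprehension: transform every value in every inner list
doubled_grid = [[value * 2 for value in row] for row in grid]

print(squares)
print(evens)
print(doubled_grid)
```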

<details><summary>Click here for the solution for <b>task 12</b></summary>
 
<pre>
property_types = [row[5] for row in zoopla_nomiss[1:]]  # skip the header row
</pre>
</details>

<details><summary>Click here for the solution for <b>task 9</b></summary>
 
<pre>
zoopla_nomiss = [row for row in zoopla_stripped if '' not in row]
</pre>
</details>

<details><summary>Click here for the solution for <b>task 13</b></summary>
 
<pre>
for type_ in top10_property_types:
    print(type_, 'prices')
    rental_prices = [float(row[0]) for row in zoopla_nomiss[1:] if row[5] == type_]
    summary(rental_prices)
</pre>
</details>

<details><summary>Click here for the solution for <b>task 5/6</b></summary>
 
<pre>
zoopla_stripped = [[value.strip() for value in row] for row in zoopla]
</pre>
</details>

In [None]:
# list comprehension for task 12



In [None]:
# list comprehension for task 9



In [None]:
# list comprehension for task 13



In [None]:
# list comprehension for task 5/6



### Amazing job on completing the notebook!

Now that you have worked through the notebook, we hope that you have learnt some new techniques in Python, while consolidating the knowledge from our Python Fundamentals series. Feel free to explore the dataset further and share your results!

<details><summary><b>Click here to celebrate!</b></summary>

<h1 align="center"> CONGRATULATIONS!!! </h1>
    
<img src="https://raw.githubusercontent.com/mwiemers/images/main/completion_celebrate.gif">
    
</details>