## 1.0 Introduction



There's been a lot of hype about data science in the last couple of years. Fortunately, this is just the beginning. Using data science, people have been able to build some amazing technologies so far:


<img width="500" src="https://drive.google.com/uc?export=view&id=1r3LrEe3oZF7ZKglDIEJOEAa6fBMhkg5W">


To build data science technologies, we generally need to perform billions of computations over large sets of data. To make a computer do the computations we want it to do, we need to give it the proper instructions. When we give a computer a set of instructions, we say that we're **programming** it. In this course, we'll learn **Python**, which is arguably the best programming language for data science 👍👍👍👍




### 1.1 Python version

Python has two major version lines, 2.x and 3.x. Python 3.x introduced many backwards-incompatible changes to the language, so code written for 2.7 may not work under 3.x and vice versa. Python 2.7 reached its end of life in January 2020. For this class, all code will use Python 3.6.

https://wiki.python.org/moin/Python2orPython3

"Python 2.x is legacy, Python 3.x is the present and future of the language"



In [1]:
import sys
print(sys.version)

3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0]


### 1.2 Expressions

To introduce the basic concepts in Python, we'll start by understanding how to evaluate basic **expressions**. If you've ever used a calculator, you're familiar with the process of writing and running mathematical expressions like **4 + 5** or **(1 + 2 + 3) / 3**. Your calculator evaluates what you entered and immediately returns the result.



In [2]:
4+5

9

In [3]:
(1+2+3)/3

2.0

Python has multiple arithmetic operators that allow you to express calculations between values. The following diagram lists the main arithmetic operators, a simple expression that uses each operator, and the result of each expression if we ran it in the Python console.

<center>
<img width="600" src="https://drive.google.com/uc?export=view&id=1PegHzSWBTzLRFDwoizeuJ1YqCzDGAEgn">
</center>

In [4]:
exp_1 = 5 + 5 + 5 / 3
exp_2 = (5 + 5 + 5) / 3

print("Exp #1: {0:.2f} \nExp #2: {1:.2f}".format(exp_1,exp_2))

Exp #1: 11.67 
Exp #2: 5.00
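
Since the operator diagram is an image, here is a minimal text sketch of Python's standard arithmetic operators. The diagram may cover a slightly different set, and the variable names are illustrative only.

```python
# Standard arithmetic operators in Python 3 (illustrative values)
addition = 4 + 5          # 9
subtraction = 4 - 5       # -1
multiplication = 4 * 5    # 20
division = 4 / 5          # 0.8 (the / operator always returns a float)
exponent = 4 ** 2         # 16
floor_division = 7 // 2   # 3 (division rounded down to a whole number)
remainder = 7 % 2         # 1

print(addition, subtraction, multiplication, division)
print(exponent, floor_division, remainder)
```

This also explains why **(1+2+3)/3** displayed **2.0** rather than **2** above: the **/** operator produces a float even when the result is a whole number.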


Suppose we're exploring part of a dataset on U.S. crime rates. For each city, the table lists the number of violent crimes per 100,000 people that occurred in 2012. Here's what the first 5 rows of that dataset look like:

| City | Crime rates  |
|-------------|------|
| Albuquerque | 749  |
| Anaheim     | 371  |
| Anchorage   | 828  |
| Arlington   | 503  |
| Atlanta     | 1379 |

Let's calculate the average of these 5 violent crime rates.

**Exercise**


<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ"/>


1. Calculate the average of **749, 371, 828, 503, 1379** and assign it to the variable **crime_rates_avg**.
2. Display the result using the print() function.



In [0]:
# put your code here



### 1.3 Data types

We've been working with whole numbers like **1379**, which are known as **integers**. The two most common numerical types in Python are **integer** and **float**, which is used to represent fractional (or decimal) values. **3.5** and **4.1111** are both examples of float values.

The most common non-numerical type is a **string**, which is used to represent text. To represent a piece of text as a string value, surround the text with either single quotes (') or double quotes ("). **'Hello'** and **"Hello World!"** are both examples of **string** values.

In [5]:
print("Hello")

Hello
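
As a quick illustration of the quoting rule, single and double quotes produce the same kind of value, and choosing one style lets the text contain the other quote character. The variable names below are illustrative.

```python
single = 'Hello'
double = "Hello"
print(single)                  # Hello
print(double)                  # Hello

# Double quotes make it easy to include an apostrophe in the text
greeting = "It's a string"
print(greeting)                # It's a string
```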


Unlike variable names, strings can contain special characters and spaces. You can assign a string to a variable in the same way you'd assign it a numeric value:

In [6]:
atlanta = "Atlanta"
atlanta

'Atlanta'

You may have noticed a pattern here. **Numerical values like integers and floats don't require quotation marks, but strings do**. The way in which you enter a value tells Python what data type it is. Python will use the data type to determine how the value should be handled. For example, Python allows integer variables to be divided, but not string variables. We'll learn more about that soon, but first let's practice some of the concepts you've learned so far.
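
As a small illustration of how the data type determines what Python allows, the sketch below divides an integer and then attempts the same with a string. The **try**/**except** wrapper is not covered in this section; it is used here only so the cell keeps running after the error. The variable names are hypothetical.

```python
crime_rate = 1379          # integer: no quotation marks
city = "Atlanta"           # string: quotation marks required

print(crime_rate / 2)      # division works for numbers: 689.5

try:
    print(city / 2)        # division is not defined for strings
except TypeError as error:
    error_message = str(error)
    print("TypeError:", error_message)
```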



**Exercise**

<left><img  width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

1. Assign the string value **"Atlanta"** to **atlanta_string**.
2. Assign the float value **1379.5** to **atlanta_float**.
3. Print all variables using print(). 

In [0]:
# put your code here

**The Type function**

We can look up the data type of a variable's value using the **type()** function. Similar to the **print()** function, you pass a value (or variable) into the parentheses. Unlike **print()**, however, **type()** won't display anything. Instead, it will return the data type as a value, which can be assigned to another variable or displayed using the **print()** function:

In [0]:
hello = 'Hello'
hello_type = type(hello)
print(hello_type)

<class 'str'>


This will print **class 'str'**, which means that the value associated with **hello** is a string (**str** is short for string). To avoid having to create a variable each time, you can nest the **type()** call inside **print()**:

In [0]:
hello = 'Hello'
print(type(hello))

<class 'str'>


**Exercise**

<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


1. Display the type of **atlanta_string**.

In [0]:
# put your code  here

## 2.0 Lists

So far, we've been storing individual values in variables. Often in data science, we're working with thousands of data points that are grouped together in a certain way and have an order to them. We need a container that can hold multiple values that we can use to perform operations on.

We can use a list, which is an object that represents a sequence of values. In our example, we will use a dataset about app recommendation in the [Apple Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

| | track_name | price | currency | rating_count_tot | user_rating |
|---|---|---|---|---|---|
| 0          | Facebook                | 0.0      | USD              | 2974676     | 3.5 |
| 1          | Instagram               | 0.0      | USD              | 2161558     | 4.5 |
| 2          | Clash of Clans          | 0.0      | USD              | 2130805     | 4.5 |
| 3          | Temple Run              | 0.0      | USD              | 1724546     | 4.5 |
| 4          | Pandora - Music & Radio | 0.0      | USD              | 1126879     | 4.0 |

Each value in the table is a **data point**. For instance, the first row has five data points:

- Facebook
- 0.0
- USD
- 2974676
- 3.5

A collection of data points makes up a data set. We can understand our entire table above as a collection of data points, so we call the entire table a data set. We can see that our data set has five rows and five columns. When we work with data sets, we need to store them in the computer memory to be able to retrieve and manipulate the data points.

Above, we stored:

- The text "Facebook" as a **string**
- The price 0.0 as a **float**
- The text "USD" as a **string**
- The rating count 2,974,676 as an **integer**
- The user rating 3.5 as a **float**

Creating a variable for each data point in our data set would be a cumbersome process. Fortunately, we can store data more efficiently using **lists**. This is how we can create a list of data points for the first row:

In [0]:
row_1 = ["Facebook", 0.0, "USD", 2974676, 3.5]
print(row_1)
print(type(row_1))

**Exercise**

<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Store the second row **('Instagram', 0.0, 'USD', 2161558, 4.5)** as a **list** in a variable named **row_2**.
- Store the third row **('Clash of Clans', 0.0, 'USD', 2130805, 4.5)** as a **list** in a variable named **row_3**.

In [0]:
# put your code here

### 2.1 Indexing

A **list** can contain both mixed and identical data types (so far we've learned four data types: **integers**, **floats**, **strings**, and **lists**). A list like [4, 5, 6] has identical data types (only **integers**), while the list *['Facebook', 0.0, 'USD', 2974676, 3.5]* has mixed data types:

- Two **strings** ('Facebook', 'USD')
- Two **floats** (0.0, 3.5)
- One **integer** (2974676)

The *['Facebook', 0.0, 'USD', 2974676, 3.5]* list has five data points. To find the length of a list we can use the **len()** command:

In [0]:
row_1 = ["Facebook", 0.0, "USD", 2974676, 3.5]
print(len(row_1))

list_1 = [1,3,5]
print(len(list_1))

list_2 = []
print(len(list_2))

For small lists, we can just count the data points on our screens to find the length, but we'll see that the **len()** command will prove very useful later on, when we'll work with lists containing thousands of elements (we'll see an actual example later in this mission).

Each element (data point) in a **list** has a specific number associated with it, called an **index number**. The indexing always starts at 0, so the first element will have the index number 0, the second element the index number 1, and so on.

<left><img width="300" src="https://drive.google.com/uc?export=view&id=175h_iYr4BFByjpDG8inNdDuTJH8SP0Xw" /></left>

To quickly find the index of a **list** element, identify its position number in the list, and then subtract 1. For example, the string **'USD'** is the third element of the list (position number 3), so its index number must be 2, since 3 - 1 = 2.


In [0]:
row_1[2]

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

In the cell below, you can already see the **lists** for the first three rows. The fourth element in each list describes the number of ratings an app has received. Retrieve this fourth element from each **list**, and then find the average value of the retrieved numbers.
- Assign the fourth element from the list **row_1** to a variable named **ratings_1**. Don't forget that the indexing starts at 0.
- Assign the fourth element from the list **row_2** to a variable named **ratings_2**.
- Assign the fourth element from the list **row_3** to a variable named **ratings_3**.
- Add the three numbers retrieved together and save the sum to a variable named **total**.
- Divide the sum (now saved in the variable total) by 3 to get the average number of ratings for the first three rows. Assign the result to a variable named **average**.



In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]

# put your code here

### 2.2 Negative indexing

In Python, we have two indexing systems for lists:

- **Positive indexing**: the first element has the index number 0, the second element has the index number 1, and so on.
- **Negative indexing**: the last element has the index number -1, the second to last element has the index number -2, and so on.

<center><img width="300" src="https://drive.google.com/uc?export=view&id=1fOXkIG2DZQ2cphGDUdbsl5P2pkbKCyPT" /></center>

In practice, we almost always use positive indexing to retrieve list elements. Negative indexing is useful when we want to select the last element of a list — especially if the list is long, and we can't tell the length by counting.


In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
print(row_1[-1])
print(row_1[4])

Notice that if we use an index number that is outside the range of the two indexing systems, we'll get an **IndexError**.

In [0]:
row_1[-6]
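
To make the valid range concrete: for a list of length n, positive indices run from 0 to n - 1 and negative indices run from -1 to -n. Here is a minimal sketch using the row from above (the names **n**, **first**, and **last** are illustrative):

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
n = len(row_1)                 # 5

last = row_1[n - 1]            # same element as row_1[-1]
first = row_1[-n]              # same element as row_1[0]
print(last, row_1[-1])         # 3.5 3.5
print(first, row_1[0])         # Facebook Facebook

# row_1[n] and row_1[-n - 1] fall outside both ranges and raise an IndexError
```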

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Repeat the previous exercise, this time using negative indexing.


In [0]:
# put your code here

### 2.3 Retrieving Multiple List Elements

Oftentimes, we need to retrieve more than one element from a list. Let's say we have the list **['Facebook', 0.0, 'USD', 2974676, 3.5]**, and we're interested in isolating only the name of the app and the data about ratings (the number of ratings and the rating). This is how we could do that, using what we've learned so far:

In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]

app_name = row_1[0]
n_of_ratings = row_1[3]
rating = row_1[-1]

If we wanted to do this for every app, we'd end up having a lot of variables, which will make our code lengthy and hard to keep track of. A better solution is to store the data we want in a separate list.



In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]

fb_rating_data = [row_1[0],row_1[3],row_1[-1]]
print(fb_rating_data)

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- For **Facebook**, **Instagram**, and **Pandora - Music & Radio**, isolate the rating data in separate lists. Each list should contain the name of the app, the rating count, and the user rating. Don't forget that indexing starts at 0.
  - For **Facebook**, assign the list to a variable named **fb_rating_data**.
  - For **Instagram**, assign the list to a variable named **insta_rating_data**.
  - For **Pandora - Music & Radio**, assign the list to a variable named **pandora_rating_data**.
- Compute the average rating for **Facebook**, **Instagram**, and **Pandora - Music & Radio** using the data you stored in **fb_rating_data**, **insta_rating_data**, and **pandora_rating_data**.
  - You'll need to add the ratings together first, and then divide the total by the number of ratings.
  - Assign the result to a variable named **avg_rating**.

In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

# put your code here

### 2.4 List Slicing

Suppose we want to isolate the pricing data for an app (its name, price, and currency). To do that, we need to retrieve the first three list elements:

In [0]:
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
cc_pricing_data = [row_3[0],row_3[1],row_3[2]]
cc_pricing_data

Instead of selecting element by element, we can use a syntax shortcut:





In [0]:
row_3[0:3]

When we select the first **n** elements (**n** stands for a number) from a list named **a_list**, we can use the syntax shortcut **a_list[0:n]**. In the example above, we needed to select the first three elements from the list **row_3**, so we used **row_3[0:3]**.

When we selected the first three elements, we sliced a part of the list. For this reason, the process of selecting a part of a list is called **list slicing**.

There are many ways that we might want to slice a **list**:

<center><img width="300" src="https://drive.google.com/uc?export=view&id=1Dnfa2RK-mQl48JPc1LOhdJ78Pfvt7xZ6" /></center>

To retrieve any list slice we want:

- We first need to identify the first and the last element of the slice.
- We then need to identify the index numbers of the first and the last element of the slice.
- Finally we can retrieve the list slice we want by using the syntax **a_list[m:n]**, where:
    - **m** represents the index number of the first element of the slice; and
    - **n** represents the index number of the last element of the slice plus one (if the last element has the index number 2, then **n** will be 3; if the last element has the index number 4, then **n** will be 5; and so on).
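
The steps above can be sketched on a short list of numbers (a hypothetical list, not part of the app data), retrieving the slice that starts at index 1 and ends at index 3:

```python
a_list = [10, 20, 30, 40, 50]

m = 1                      # index of the first element of the slice (20)
n = 3 + 1                  # index of the last element of the slice (40), plus one
middle = a_list[m:n]
print(middle)              # [20, 30, 40]

# The number of elements in the slice is n - m
print(len(middle))         # 3
```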
    
<center><img width="400" src="https://drive.google.com/uc?export=view&id=17-LsPPgvRSLUk0kw-pJpWUK0ulv3tEds" /></center>

When we need to select the first or last **x** elements (**x** stands for a number), we can use even simpler syntax shortcuts:

- **a_list[:x]** when we want to select the first **x** elements.
- **a_list[-x:]** when we want to select the last **x** elements.

In [0]:
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]

print(row_3[:2])
print(row_3[-2:])

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Select the first four elements from **row_1** using a list slicing syntax shortcut. Assign the output to a variable named **first_4_fb**.
- Select the last three elements from **row_1** using a list slicing syntax shortcut. Assign the output to a variable named **last_3_fb.**
- From **row_5**, select the list slice **['USD', 1126879]** using a list slicing syntax shortcut. Assign the output to a variable named **pandora_3_4**.

In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

# put your code here

### 2.5 List of Lists

Previously, we introduced lists as a better alternative to using one variable per data point. Instead of having a separate variable for each of the five data points **'Facebook', 0.0, 'USD', 2974676, 3.5**, we can bundle the data points together into a list, and then store the list in a single variable.

So far, we've been working with a data set having five rows, and we've been storing each row as a list in a separate variable (the variables **row_1**, **row_2**, **row_3**, **row_4**, and **row_5**). If we had a data set with 5,000 rows, however, we'd end up with 5,000 variables, which will make our code messy and almost impossible to work with.

To solve this problem, we can store our five variables in a single list:

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1tiUa26Zsu_GqxNxhKx_a7j9Z9vbC8q2n" /></left>

As we can see, **data_set** is a list that stores five other lists (**row_1**, **row_2**, **row_3**, **row_4**, and **row_5**). A list that contains other lists is called a list of lists.

The data_set variable is still a list, which means we can retrieve individual list elements and perform list slicing using the syntax we learned. Below, we:

- Retrieve the first list element (**row_1**) using **data_set[0]**
- Retrieve the last list element (**row_5**) using **data_set[-1]**
- Retrieve the first two list elements (**row_1** and **row_2**) by performing list slicing using **data_set[:2]**.

In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

data_set = [row_1,row_2,row_3,row_4,row_5]

In [0]:
print(data_set[0])

In [0]:
print(data_set[-1])

In [0]:
print(data_set[:2])

We'll often need to retrieve individual elements from a list that's part of a list of lists — for instance, we may want to retrieve the value 3.5 from **['Facebook', 0.0, 'USD', 2974676, 3.5]**, which is part of the data_set list of lists. Below, we extract 3.5 from data_set using what we've learned:

- We retrieve **row_1** using **data_set[0]**, and assign the result to a variable named **fb_row**.
- We print **fb_row**, which outputs **['Facebook', 0.0, 'USD', 2974676, 3.5]**.
- We retrieve the last element from **fb_row** using **fb_row[-1]** (since **fb_row** is a list), and assign the result to a variable named **fb_rating**.
- Print **fb_rating**, which outputs 3.5

In [0]:
fb_row = data_set[0]
print(fb_row)

In [0]:
fb_rating = fb_row[-1]
fb_rating

Above, we retrieved 3.5 in two steps: we first retrieved **data_set[0]**, and then we retrieved **fb_row[-1]**. However, there's an easier way to retrieve the same value of 3.5 by chaining the two indices ([0] and [-1]) — the code **data_set[0][-1]** retrieves 3.5:



In [0]:
data_set[0][-1]

Above, we've seen two ways of retrieving the value 3.5. Both ways lead to the same output (3.5), but the second way involves less typing because it elegantly combines the steps we see in the first case. While you can choose either option, people generally choose the second one.
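
The same chaining works for any row and any position. Here is a short sketch that rebuilds a two-row **data_set** so the cell is self-contained (the variable names **insta_name** and **fb_count** are illustrative):

```python
data_set = [['Facebook', 0.0, 'USD', 2974676, 3.5],
            ['Instagram', 0.0, 'USD', 2161558, 4.5]]

insta_name = data_set[1][0]     # second row, first element
fb_count = data_set[0][3]       # first row, fourth element
print(insta_name)               # Instagram
print(fb_count)                 # 2974676

# Chaining is shorthand for the two-step version
fb_row = data_set[0]
print(fb_row[3] == data_set[0][3])   # True
```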

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


In the cell below, we've already stored the five rows as lists in separate variables. Group together the five lists in a list of lists, and assign the resulting list of lists to a variable named **app_data_set**.
- Compute the average rating of an app by retrieving the right data points from the **app_data_set** list of lists.
  - The rating is the last element of each row. You'll need to sum up the ratings and then divide by the number of ratings.
  - Assign the result to a variable named **avg_rating.**

In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

# put your code here

### 2.6 Opening a File

The data set we've been working with so far is an extract from a much larger data set:

| | id | track_name | size_bytes | currency | cont_rating | prime_genre | sup_devices.num | ipadSc_urls.num | lang.num |
|------|------------|-----------------------------------------------------------------|-----------|-------|-------------|-------------------|-----------------|----------|-----|
| 0 | 284882215 | Facebook | 389879808 | USD | 4+ | Social Networking | 37 | 1 | 1 |
| 1 | 389801252 | Instagram | 113954816 | USD | 12+ | Photo & Video | 37 | 0 | 1 |
| 2 | 529479190 | Clash of Clans | 116476928 | USD | 9+ | Games | 38 | 5 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7195 | 1097148221 | S ou SS | 4824064 | USD | 4+ | Education | 38 | 5 | 1 |
| 7196 | 977965019 | みんなのお弁当 by クックパッド ~お弁当をレシピ付きで記録・共有~ | 51174400 | USD | 4+ | Food & Drink | 37 | 0 | 1 |


Our best strategy so far was to type each data point and bundle them efficiently into a list of lists. The data set above, however, has 7,197 rows and 16 columns, which amounts to 115,152 (7,197 × 16) data points. Typing all that would take us days. We'd also be bound to make typing errors, which would eventually lead us to work with wrong data and reach false conclusions. Fortunately, we can leverage Python to store this data set as a list of lists in a matter of seconds.

A data set is generally stored as a file in a computer — the data set above is stored as a file named AppleStore.csv. We start by opening the file using the **open()** command:

<left><img width="400" src="https://drive.google.com/uc?export=view&id=188ezOR8PfNhz-zzyU2cGXwZyND0adZUl" /></left>


**open('AppleStore.csv')** returned the output **<_io.TextIOWrapper name='AppleStore.csv' mode='r' encoding='UTF-8'>**. The output is an object, which we'll learn more about in the next course. For now, all we have to keep in mind is that the **AppleStore.csv** file will open once **open('AppleStore.csv')** has finished running.

Once we've opened the file, we read it in using a command called **reader()**. We import the **reader()** command from the csv module using the code **from csv import reader** (a module is a collection of commands and variables).

<left><img width="300" src="https://drive.google.com/uc?export=view&id=15VnzXSk5Dopz93kjbY1segLK3-1-edo1" /></left>

Just like **open('AppleStore.csv')**, **reader(opened_file)** returned an object. Now that we've read the file, we can transform it into a list of lists using the **list()** command:

<left><img width="300" src="https://drive.google.com/uc?export=view&id=1T3Zqr2ibBsvhmFt81RwUCm-zuF6nlDYP" /></left>

The **apps_data** variable above is a list of lists, and it stores a data set of 7,197 rows and 16 columns. Below, we print only the first five rows of **apps_data** by using list slicing (and color each individual row differently to help you read the output more easily):

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1s2n2U8Ojptf9R91ft0oytUVcz90XIgDa" /></left>


Although there are 7,197 rows (apps) in our data set, **len(apps_data)** indicates there are 7,198 rows because it also considers the header row, which describes the column names (the first row, colored above in orange).
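
Because the steps above are shown as images, here is a text sketch of the same open, read, and convert sequence. To stay self-contained, it first writes a tiny stand-in file named **sample_apps.csv** (a hypothetical name, with only three of the columns). With the real data, you would pass **'AppleStore.csv'** to **open()** instead.

```python
from csv import reader

# Write a tiny stand-in CSV so the sketch runs without AppleStore.csv
sample_file = open('sample_apps.csv', 'w', encoding='utf-8')
sample_file.write('track_name,price,user_rating\n')
sample_file.write('Facebook,0.0,3.5\n')
sample_file.write('Instagram,0.0,4.5\n')
sample_file.close()

opened_file = open('sample_apps.csv', encoding='utf-8')
read_file = reader(opened_file)
apps_data = list(read_file)
opened_file.close()

print(len(apps_data))    # 3: the header row plus two data rows
print(apps_data[0])      # ['track_name', 'price', 'user_rating']
print(apps_data[1:])     # the data rows; every value is read in as a string
```

Note that **reader()** returns every value as a string, including numbers such as the price and the rating.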

<left><img width="350" src="https://drive.google.com/uc?export=view&id=1bbb6T469FJvK9TEAZ7Bj_8xDycxlV6dQ" /></left>

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Open the **AppleStore.csv** file and store it as a list of lists.
  - Open the file using the **open()** command. Save the output to a variable named **opened_file**.
  - Read in the opened file using the **reader()** command (we've already imported **reader()** for you from the csv module). Save the output to a variable named **read_file**.
  - Transform the read-in file to a list of lists using the **list()** command. Save the list of lists to a variable named **apps_data**.
- Explore apps_data:
  - Print its length using the **len()** command.
  - Print the first row (the row describing column names).
  - Print the second and the third row (try to use list slicing here).

In [0]:
from csv import reader
# put your code here

### 2.7 Repetitive Processes

Previously, we were interested in computing the average rating of an app. This was a doable task when we were working with only five rows, but our data set now has 7,197 rows. Our best strategy was to:

- Retrieve each individual rating
- Sum up the ratings
- Divide by the number of ratings

In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

app_data_set = [row_1, row_2, row_3, row_4, row_5]

avg_rating = (app_data_set[0][-1] + app_data_set[1][-1] + app_data_set[2][-1]
              + app_data_set[3][-1] + app_data_set[4][-1])/5
avg_rating

Retrieving 7,197 ratings manually is impractical because it can take a long, long time. We need to find a way to retrieve all 7,197 ratings in a matter of seconds.

Looking at the code example above, we see that a process keeps repeating: we select the last list element for each list within **app_data_set**. The **app_data_set** stores five lists, so we repeat the same process five times. What if we could tell Python directly that we want to repeat this process for each list in **app_data_set**?

Fortunately, we can do that — Python offers us an easy way to repeat a process, which helps us enormously when we need to repeat a process hundreds, thousands, or even millions of times.

Let's say we have a list [3, 5, 1, 2] assigned to a variable ratings, and we want to repeat the following process: for each element in ratings, print that element. This is how we could translate that into Python syntax:

In [0]:
ratings = [3,5,1,2]
for item in ratings:
  print(item)

In our first example above, the process we wanted to repeat was "extract the last element for each list in **app_data_set**". This is how we can translate that process into Python syntax:

In [0]:
for each_list in app_data_set:
    print(each_list[-1])

Let's try to get a better understanding of what happens above. Python isolates, one at a time, each list element from **app_data_set**, and assigns it to each_list (which basically becomes a variable that stores a list):

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1bUONwnxiOXma72bynlWP9BULkjgEtQD3" /></left>

The code in the last diagram above is a much more simplified and abstracted version of the code below:

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1aJlc_w2S4e8XiD94mrJAX7_OwrckqtbM" /></left>

Using the technique above requires us to write a line of code for every row in the data set. But using the **for each_list in app_data_set** technique requires us to write only two lines of code regardless of the number of rows in the data set, whether it has five rows or one million.
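
For comparison, the two approaches can be written side by side. This sketch repeats the **app_data_set** list of lists so the cell runs on its own. The longhand version is a reconstruction of what the diagram describes, not a copy of it.

```python
app_data_set = [['Facebook', 0.0, 'USD', 2974676, 3.5],
                ['Instagram', 0.0, 'USD', 2161558, 4.5],
                ['Clash of Clans', 0.0, 'USD', 2130805, 4.5],
                ['Temple Run', 0.0, 'USD', 1724546, 4.5],
                ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]]

# Longhand: one line of code per row
print(app_data_set[0][-1])
print(app_data_set[1][-1])
print(app_data_set[2][-1])
print(app_data_set[3][-1])
print(app_data_set[4][-1])

# Loop: two lines, however many rows there are
for each_list in app_data_set:
    print(each_list[-1])
```

Both versions print the same five ratings: 3.5, 4.5, 4.5, 4.5, and 4.0.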

Our intermediate goal is to use this new technique to compute the average rating for our five rows above, and our final goal is to compute the average rating for our data set with 7,197 rows. We'll do exactly that over the next screens of this mission, but for now, we'll focus on getting more practice with this technique to get a good grasp of it.


**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Use the new technique we've learned to print all the rows in the **app_data_set** list of lists.
- Essentially, you'll need to translate this pattern into Python syntax: for each list in the **app_data_set** variable, print that list.
- Don't forget about indentation.

In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

app_data_set = [row_1, row_2, row_3, row_4, row_5]

# put your code here

### 2.8 For Loops

The technique we've just learned is called a **loop**. Because we always start with **for** (as in **for some_variable in some_list:**), this technique is more often known as a **for loop**.

These are the structural parts of a for loop:

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1bpVBJ9TPYq6yyaa4aD626_kyR1ruRukZ" /></left>

The indented code in the **body** gets executed the same number of times as elements in the **iterable variable**. If the iterable variable is a list that has three elements, the indented code in the body gets executed three times. We call each code execution an **iteration**, so there'll be three iterations for a list that has three elements. For each iteration, the **iteration variable** will take a different value, following this pattern:

- For the first iteration, the value is the first element of the iterable (if the iterable is the list [1, 3, 5], then the value will be 1).
- For the second iteration, the value is the second element of the iterable (if the iterable is the list [1, 3, 5], then the value will be 3).
- For the third iteration, the value is the third element of the iterable (if the iterable is the list [1, 3, 5], then the value will be 5).
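
Here is a minimal sketch of the pattern just described, using the list [1, 3, 5] from the text. The counter **iteration** is an illustrative addition used only to label the output.

```python
iterable = [1, 3, 5]
iteration = 0

for value in iterable:
    iteration = iteration + 1
    print("Iteration", iteration, "-> value is", value)
# Iteration 1 -> value is 1
# Iteration 2 -> value is 3
# Iteration 3 -> value is 5
```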

<left><img width="300" src="https://drive.google.com/uc?export=view&id=1rv8aFSPxGg0c7IcwnVYMPp_w9FR3AYlU" /></left>

The code outside the loop body can interact with the code inside the loop body. For instance, in the code below we:

- Initialize a variable **a_sum** with a value of zero outside the loop body.
- **Loop** (or **iterate**) over **a_list**. For every iteration of the loop, we:
  - Perform an addition (inside the loop body) between the current value of the iteration variable **value** and the current value stored in **a_sum** (**a_sum** was defined outside the loop body).
  - Assign the result of the addition back to **a_sum** (inside the loop body).
  - Print the value of the **a_sum** variable (inside the loop body). Notice that the value of **a_sum** changes after each addition. At the end of the loop, **a_sum** has the value 9, which is equivalent to the sum of the numbers in **a_list** (1 + 3 + 5).
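
In text form, the steps listed above might look like the sketch below. It follows the variable names used in the description (**a_list**, **a_sum**, **value**) and assumes **a_list** is [1, 3, 5].

```python
a_list = [1, 3, 5]
a_sum = 0                      # initialized outside the loop body

for value in a_list:
    a_sum = a_sum + value      # add the current value and assign back to a_sum
    print(a_sum)               # prints 1, then 4, then 9
```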

<left><img width="200" src="https://drive.google.com/uc?export=view&id=1DN_e7AcSDI8Kw-MhBI8XopLZu_VUkP9N" /></left>

Above, we created a way to sum up the numbers in a list. We can use this technique to sum up the ratings in our data sets. Once we have the sum, we only need to divide by the number of ratings to get the average value. Let's begin with computing the average rating value for the data set with five rows.



**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Compute the average app rating for the apps stored in the **app_data_set** variable.
  - Initialize a variable named **rating_sum** with a value of zero outside the loop body.
  - Loop (iterate) over the **app_data_set** list of lists. For each of the five iterations of the loop (**for** each row **in app_data_set**):
    - Extract the rating of the app and store it in a variable named **rating**. The rating is the last element of each row.
    - Add the value stored in **rating** to the current value of the **rating_sum.**
  - Outside the loop body, divide the rating sum (stored in **rating_sum**) by the number of ratings to get an average value. Store the result in a variable named **avg_rating**.

In [0]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

app_data_set = [row_1, row_2, row_3, row_4, row_5]

# put your code here

### 2.9 The Average App Rating

Now we move on to computing the average rating for the data set that has 7,197 rows. Remember we first need to open the file **AppleStore.csv** and transform it into a list of lists:


<left><img width="500" src="https://drive.google.com/uc?export=view&id=134WN69JTP5D3VB7YBho3xnnhcxmlmc5B" /></left>

If we use the technique we learned and loop over **apps_data** to get the rating sum, we'll get a **TypeError**:

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1ZZAjZJuVZ8hoJfGy3Fhfdhjh-ksNjsHj" /></left>

This error happens because the first row of **apps_data** doesn't contain numbers (it describes column names). In the loop body, we assign the value of row[7] to the rating variable, and then we add rating to **rating_sum**. But for the first iteration of the loop, row[7] takes the string value **'user_rating'** (which is a column name). This means that running **rating_sum + rating** is equivalent to **0 + 'user_rating'**, which causes a **TypeError** because strings and integers cannot be added together.
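The failing addition can be reproduced in isolation with the short sketch below. The **try** and **except** keywords are not something we've covered yet; they are used here only so the cell keeps running after the error occurs.

```python
rating_sum = 0
rating = 'user_rating'  # the string that row[7] holds for the header row

try:
    rating_sum = rating_sum + rating  # same as 0 + 'user_rating'
except TypeError as error:
    # Python refuses to add an integer and a string
    print('TypeError:', error)

print(rating_sum)  # still 0, because the addition never completed
```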

Theoretically, we'd have two solutions:

- We remove the first row from **apps_data**, and then we restart the iteration. We do that by:
  - Saving the header row to a separate variable named **header**
  - Saving **apps_data[1:]** back to **apps_data** (**apps_data[1:]** is a list slice that excludes the first row, which is the header row)
- We iterate directly over **apps_data[1:]**, which is a list slice that excludes the first row.
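The two options can be sketched on a tiny stand-in for the data set. The three rows below are an assumption made for illustration (two columns only), not the real contents of **AppleStore.csv**.

```python
# Hypothetical miniature data set: a header row followed by two data rows
apps_data = [
    ['track_name', 'user_rating'],
    ['Facebook', '3.5'],
    ['Instagram', '4.5'],
]

# Second option: iterate directly over a slice that skips the header row
for row in apps_data[1:]:
    print(row)

# First option: save the header, then overwrite apps_data without it
header = apps_data[0]
apps_data = apps_data[1:]

print(header)          # ['track_name', 'user_rating']
print(len(apps_data))  # 2
```

The first option changes **apps_data** for the rest of the notebook, while the second leaves it untouched.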

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1WUqfOF1ma1gJ16MU-Srke4uQnbAUVVKx" /></left>

For some reason, we got the same error. Upon inspecting some of the rows in **apps_data**, we see that all the values are surrounded by quotation marks, which suggests they are strings. Once again, the error is caused by trying to add a string to an integer.

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1qzc0N0-sssb8HGmTYEgOeqRx6jir5Ytx" /></left>

Previously, we learned to convert strings to integers or floats (decimal numbers) using the **int()** and **float()** commands. The ratings are expressed as decimal numbers, so we'll convert them to floats using the **float()** command.

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1aTGiZRo35dWWO3E4VkK5Vcx6RThk9mLg" /></left>
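Here is a small sketch of the conversion inside a loop. The two rows are made up for illustration and use index 1 for the rating; in the real data set the rating has index 7.

```python
# Hypothetical rows whose ratings are stored as strings
rows = [['Facebook', '3.5'], ['Instagram', '4.5']]

rating_sum = 0
for row in rows:
    rating = float(row[1])            # '3.5' becomes 3.5
    rating_sum = rating_sum + rating  # adding numbers now works

print(rating_sum)  # 8.0
```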

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Compute the average app rating for all the 7,197 apps stored in the data set.
  - Initialize a variable named **rating_sum** with a value of zero.
  - Loop through the **apps_data[1:]** list of lists (make sure you don't include the header row). For each of the 7,197 iterations of the loop (**for** each row **in** **apps_data[1:]**):
    - Extract the rating of the app and store it to a variable named **rating** (the rating has the index number 7). Make sure you convert the rating value from a string to a float using the **float()** command.
    - Add the value stored in **rating** to the current value of the **rating_sum.**
  - Divide the rating sum (stored in **rating_sum**) by the number of ratings to get an average value, and store the result in a variable named **avg_rating**.

In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

# put your code here

### 2.10 Alternative Way to Compute an Average

Now we'll learn an alternative way to compute the average rating value. Once we create a list, we can add (or **append**) values to it using the **append()** command.

<left><img width="150" src="https://drive.google.com/uc?export=view&id=1b3vUhGc-osOngNaR5_Z2wVO295tz5AZi" /></left>

Unlike other commands we've learned, notice that **append()** has a special syntactical usage, following the pattern **list_name.append()** rather than being simply used as **append()** (we'll get a better understanding of this syntactical quirk once we learn about functions and methods).
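A minimal sketch of the **list_name.append()** pattern, using a made-up list named **list_1**:

```python
list_1 = ['a', 'b']  # hypothetical starting list

list_1.append('c')   # adds 'c' to the end of the list
print(list_1)        # ['a', 'b', 'c']

list_1.append(4.5)   # a list can hold values of different types
print(list_1)        # ['a', 'b', 'c', 4.5]
```

Note that **append()** changes the existing list in place, so we don't assign the result back to **list_1**.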

Now that we know how to append values to a list, we can take the steps below to compute the average app rating:

1. We initialize an empty list
2. We start looping over our data set and extract the ratings
3. We append the ratings to the empty list we created at step one
4. Once we have all the ratings, we:
  - use the **sum()** command to sum up all the ratings (to be able to use **sum()**, we'll need to store the ratings as floats or integers); and then
  - we divide the sum by the number of ratings (which we can get using the **len()** command)
  
Below, we can see the steps above implemented for our data set with five rows:

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1rf4A8XBbxXMTdtD3c9P5X9vyxfy_-nYB" /></left>
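The same steps can be sketched as runnable code for the five rows we used earlier. The rating is the element at index 4 of each row.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]
app_data_set = [row_1, row_2, row_3, row_4, row_5]

all_ratings = []                  # step 1: start with an empty list
for row in app_data_set:          # step 2: loop over the data set
    rating = row[4]
    all_ratings.append(rating)    # step 3: collect each rating

# step 4: sum the ratings and divide by how many there are
avg_rating = sum(all_ratings) / len(all_ratings)
print(avg_rating)  # 4.2
```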

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Using the new technique we've learned, compute the average app rating for all of the 7,197 apps stored in our data set.
  - Initialize an empty list named **all_ratings.**
  - Loop through the **apps_data[1:]** list of lists (make sure you don't include the header row). For each of the 7,197 iterations of the loop:
    - Extract the rating of the app and store it to a variable named **rating** (the rating has the index number 7). Make sure you convert the rating value from a string to a float.
    - Append the value stored in **rating** to the list **all_ratings**.
  - Compute the sum of all ratings using the **sum()** command.
  - Divide the sum of all ratings by the number of ratings, and assign the result to a variable named **avg_rating**.
  
  

In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

# put your code here

## 3 Conditional Statements

### 3.1 If Statements

In the last section we worked with a data set that stores information for 7,197 mobile apps.

We used lists and for loops to compute the average rating for all of the 7,197 mobile apps. The data set offers a lot of interesting information, and we might want to answer more granular questions with respect to the average rating:

- What's the average rating of non-free apps?
- What's the average rating of free apps?

In the previous section, we learned to compute the average value for any list of numbers. However, to answer the two questions above, we first need to find a way to separate free apps from non-free apps because they are all mixed together in our data set. More specifically, we could:

- Isolate the ratings for free and non-free apps in separate lists
- Compute the average rating for each list

Let's start by isolating the ratings for the free apps. First, let's do a quick recap of how we used the **list_name.append()** command to extract the ratings into a separate list. In the code below, we:

- Start by transforming the **AppleStore.csv** file into a list of lists, and assign that list of lists to a variable named **apps_data**
- Create an empty list named **ratings**
- Iterate over **apps_data[1:]** (which excludes the header row), and for each iteration (for each row), we:
  - Extract the rating and convert it to a float using **float(row[7])**. The rating has the index number 7 and comes as a string, so we need to convert it to a float.
  - Assign the rating to a variable named **rating**
  - Append **rating** to the **ratings** list we created outside the loop using the **ratings.append(rating)** command
  
<left><img width="400" src="https://drive.google.com/uc?export=view&id=1MQsmBmz-2ZxfiN8_RoD1SlsL41oDZ5pM" /></left>

The problem with our approach above is that it includes all the ratings, for both **free and non-free apps**. To isolate only the ratings of the free apps, we need to add a **condition** in our code above. Specifically, we want to add a rating to the ratings list only if the price is equal to 0.0:

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1KzJwqQDMq3UT0JmuxK8c7YjC5ft5PwqO" /></left>
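To see the effect of the condition on a small scale, here is a sketch that uses three made-up rows in the form [name, price, rating]. The rows and their column positions are assumptions for illustration only.

```python
# Hypothetical rows: [name, price, rating]
sample_rows = [
    ['App A', 0.0, 3.5],
    ['App B', 2.99, 4.5],
    ['App C', 0.0, 4.0],
]

free_ratings = []
for row in sample_rows:
    price = row[1]
    rating = row[2]
    if price == 0.0:                 # only true for free apps
        free_ratings.append(rating)  # runs only when the condition holds

print(free_ratings)  # [3.5, 4.0]
```

The indented line under **if** is skipped for 'App B', which is why its rating never reaches the list.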



**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Complete the code in the editor to find the average rating for free apps.
  - Inside the for loop:
    - Assign the price of an app as a float to a variable named **price**. The price is the fifth element in each row (don't forget that the index starts at 0).
    - If **price == 0.0**, append the value stored in rating to the **free_apps_ratings** list using the **list_name.append()** command (note the **free_apps_ratings** is already defined in the code editor). Be careful with indentation.
  - Outside the for loop body, compute the average rating of free apps, and assign the result to a variable named **avg_rating_free**. The ratings are stored in the **free_apps_ratings** list.



In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

free_apps_ratings = []
for row in apps_data[1:]:
    rating = float(row[7])
    # Complete the code from here

### 3.2 Booleans



In the last exercise, we used **if price == 0.0** to check whether **price** is equal to 0.0. When we use the **==** operator to determine whether two values are equal or not, the output returned will always be **True** or **False**. Although they may look like strings, **True** and **False** belong to a different data type (so they are by no means strings).



In [0]:
price = 0

print(price == 0)
print(price == 4)
print(type(True))

**True** and **False** are often called **Boolean** values or **Booleans** — we can see in the code example above that their data type is bool ("bool" is an abbreviation for "Boolean").

Boolean values (**True** and **False**) are necessary parts of any **if** statement. The **if** keyword must always be followed by:

- A Boolean value; or
- An expression that evaluates to a Boolean value

<left><img width="200" src="https://drive.google.com/uc?export=view&id=1HTGkwKLGArk1d9H-7mYujkooPtx4GyU8" /></left>



**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


In the cell below, we've already initialized the variable **a_price** with a value of 0. Transcribe the following sentences into code by making use of if statements:
- If **a_price** is equal to 0, then print the string **'This is free'** (remember to use the == operator for equality).
- If **a_price** is equal to 1, then print the string **'This is not free'**.

In [0]:
a_price = 0

# put your code here

### 3.3 The Average Rating of Non-free Apps

In the diagram below, we created a list of lists named **app_and_price**, and we want to extract the names of the free apps into a separate list. To do that, we:

- Create an empty list named **free_apps**
- Iterate over **app_and_price**, and for each iteration, we:
  - Extract the name of the app and assign it to a variable named **name**.
  - Extract the price of the app and assign it to a variable named **price**.
  - Append the name of the app to **free_apps** (the empty list that we initialized outside the loop) if the price of the app is equal to 0.
  

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1i2x1X58Urhnz-L6crgs3cJuF0LOZVPUe" /></left>


When we isolated the free apps, we used the condition "if the price is equal to 0.0" (if price == 0.0). To isolate the non-free apps, we need to change the condition to "if the price is not equal to 0.0". For "is equal to", we learned that we can use the operator ==. For "is not equal to", we'll need to use the != operator.

Below, we see an example of how the != operator is used:

<left><img width="300" src="https://drive.google.com/uc?export=view&id=1xKGPB60dq7QHGpntelwBw-BsNpbR-rkh" /></left>

Let's also consider an example where we use a variable (price, in the example below) with the != operator:

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1WpCvanXA3AuSKOCSTC59dE94LV-ipByh" /></left>
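A short sketch of the **!=** operator, first with plain numbers and then with a variable. The value assigned to **price** is made up for illustration.

```python
print(4 != 5)  # True, because 4 and 5 are different
print(4 != 4)  # False, because the two values are equal

price = 2.99                 # hypothetical price of a non-free app
is_non_free = price != 0.0
print(is_non_free)           # True

if price != 0.0:
    print('This is not free')
```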



**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Modify the existing code in the cell below to compute the average rating of non-free apps.
  - Change the name of the empty list from **free_apps_ratings** to **non_free_apps_ratings** (the list we defined before the for loop).
  - Change the condition if price == 0.0 to account for the fact that we now want to isolate only the ratings of **non-free apps.**
  - Change **free_apps_ratings.append(rating)** to make sure the ratings are appended to the new list **non_free_apps_ratings**.
  - Compute the average value by summing up the values in **non_free_apps_ratings** and dividing by the length of this list, and assign the result to **avg_rating_non_free**.
- Optional exercise: Inspect the value of **avg_rating_non_free** and compare the average with that of free apps (the average rating of free apps is approximately 3.38). Can we use the average values to say that free apps are better than non-free apps, or vice versa?


In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

free_apps_ratings = []
for row in apps_data[1:]:
    rating = float(row[7])
    price = float(row[4])   
    if price == 0.0:
        free_apps_ratings.append(rating)
    
avg_rating_free = sum(free_apps_ratings) / len(free_apps_ratings)



### 3.4 The Average Rating of Gaming Apps

So far, we've used the == and != operators only with integers and floats. But we can use them with other data types as well, such as strings or lists:

<left><img width="550" src="https://drive.google.com/uc?export=view&id=1zIibfa5_mHt8I6r__k5W1ch2biyD--09" /></left>
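The sketch below tries the two operators on strings and lists. The variable names are chosen only for this illustration.

```python
same_string = 'Games' == 'Games'
different_case = 'Games' == 'games'    # comparison is case-sensitive
different_string = 'Games' != 'Music'

same_list = [1, 2, 3] == [1, 2, 3]
reordered_list = [1, 2, 3] == [3, 2, 1]  # order matters for lists

print(same_string)       # True
print(different_case)    # False
print(different_string)  # True
print(same_list)         # True
print(reordered_list)    # False
```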

This enables us to answer more nuanced questions about our data set, like:

- What's the average rating of gaming apps?
- What's the average rating of non-gaming apps?

Note that the **prime_genre** column describes the **app genre**, and the genre of gaming apps is encoded as **'Games'**:

| | id | track_name | currency | user_rating | user_rating_ver | ver | cont_rating | prime_genre |
|----|------------|-------------------------|-------|-----------------|-----|-------------|-------------|-------------------|
| 0 | 284882215 | Facebook | USD | 3.5 | 3.5 | 95.0 | 4+ | Social Networking |
| 1 | 389801252 | Instagram | USD | 4.5 | 4.0 | 10.23 | 12+ | Photo & Video |
| 2 | 529479190 | Clash of Clans | USD | 4.5 | 4.5 | 9.24.12 | 9+ | Games |
| 3 | 420009108 | Temple Run | USD | 4.5 | 4.0 | 1.6.2 | 9+ | Games |
| 4 | 284035177 | Pandora - Music & Radio | USD | 4.0 | 4.5 | 8.4.1 | 12+ | Music |


To compute the average rating of gaming apps, we can use the same approach as we took in the previous section when we computed the average rating of free and non-free apps. In the code example below, we:

- Initialize an empty list named **games_ratings**.
- Loop through **apps_data[1:]**, where **apps_data** is a list of lists that stores our data set. For each iteration, we:
  - Assign the rating as a float to a variable named **rating**.
  - Assign the genre to a variable named **genre**. The genre will be saved as a string.
  - Append the rating value stored in **rating** to the list **games_ratings** if the value in **genre** is equal to the string **'Games'**.
- Compute the average rating of gaming apps (outside the loop), and assign the result to **avg_rating_games**.
- Print **avg_rating_games**.

<left><img width="500" src="https://drive.google.com/uc?export=view&id=1g4qmdlRzVqQJeL-k_nxr0H2ZANBbWLSh" /></left>
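The same logic can be sketched on a reduced data set. The rows below keep only the name, rating, and genre of four of the apps shown in the table, so the index numbers (1 and 2) are assumptions that differ from the real data set, where the rating has index 7 and the genre has index 11.

```python
# Reduced rows for illustration: [name, rating as a string, genre]
sample_apps = [
    ['Facebook', '3.5', 'Social Networking'],
    ['Clash of Clans', '4.5', 'Games'],
    ['Temple Run', '4.5', 'Games'],
    ['Pandora - Music & Radio', '4.0', 'Music'],
]

games_ratings = []
for row in sample_apps:
    rating = float(row[1])
    genre = row[2]
    if genre == 'Games':
        games_ratings.append(rating)

# Computed once, after the loop has finished
avg_rating_games = sum(games_ratings) / len(games_ratings)
print(avg_rating_games)  # 4.5
```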

Now let's compute the average rating of non-gaming apps.


**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Following the same techniques we used in the diagram above, compute the average rating of non-gaming apps.
  - Initialize an empty list named **non_games_ratings**.
  - Loop through the **apps_data** list of lists (make sure you don't include the header row). For each iteration of the loop:
    - Assign the rating of the app as a float to a variable named **rating** (the index number of the rating column is 7).
    - Assign the genre of the app to a variable named **genre** (index number 11).
    - If the **genre** is not equal to 'Games', then append the rating to the **non_games_ratings** list.
  - Compute the average rating of non-gaming apps, and assign the result to a variable named **avg_rating_non_games**.
- Optional exercise: Compare the average rating of gaming apps (3.69) with that of non-gaming apps. Why do you think we see this difference?

In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

# put your code here

### 3.5 Multiple Conditions

So far, we've only worked with single conditions, like:

- If price equals 0.0 (if price == 0)
- If genre equals "Games" (if genre == 'Games')

Single conditions won't allow us to answer more granular questions, like:

- What's the average rating of free gaming apps?
- What's the average rating of non-free gaming apps?
- What's the average rating of free non-gaming apps?
- What's the average rating of non-free non-gaming apps?

Fortunately, we can combine two or more conditions together into a single **if** statement using the **and** keyword. In the two diagrams below, we see we can use **and** to check at the same time whether an app is both free and has a gaming genre.

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1RuCg4QallW0ww2ckv9-pJM18TrcFwLvr" /></left>

Notice above that code like **app1_price == 0 and app1_genre == 'Games'** outputs a single Boolean value.

<left><img width="350" src="https://drive.google.com/uc?export=view&id=1nVFRcHv8OwROTiPxoEamcMGAjNGxb-Ra" /></left>
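A minimal sketch of how **and** combines two conditions. The values assigned to the **app1** and **app2** variables are assumptions for illustration.

```python
app1_price = 0.0
app1_genre = 'Games'
app2_price = 0.0
app2_genre = 'Social Networking'

# and gives True only when both conditions are True
app1_is_free_game = app1_price == 0 and app1_genre == 'Games'
app2_is_free_game = app2_price == 0 and app2_genre == 'Games'

print(app1_is_free_game)  # True
print(app2_is_free_game)  # False, because the genre condition fails

if app1_price == 0 and app1_genre == 'Games':
    print('This is a free game')
```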


**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Complete the code in the cell below to compute the average rating of free gaming apps.
  - Inside the for loop, append the rating to the **free_games_ratings** list if the price is equal to 0.0 and the genre is equal to **'Games'**.
  - Outside the for loop, compute the average rating of free gaming apps, and assign the result to a variable named **avg_rating_free_games**.

In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

free_games_ratings = []
for row in apps_data[1:]:
    rating = float(row[7])
    price = float(row[4])
    genre = row[11]
    # Complete code from here

### 3.6 The or Operator

If we look at the first five apps, we can see in the **prime_genre** column that Facebook's genre is **"Social Networking"**, while Clash of Clans' and Temple Run's is **"Games"**:

| | id | track_name | currency | user_rating | user_rating_ver | ver | cont_rating | prime_genre |
|----|------------|-------------------------|-------|-----------------|-----|-------------|-------------|-------------------|
| 0 | 284882215 | Facebook | USD | 3.5 | 3.5 | 95.0 | 4+ | Social Networking |
| 1 | 389801252 | Instagram | USD | 4.5 | 4.0 | 10.23 | 12+ | Photo & Video |
| 2 | 529479190 | Clash of Clans | USD | 4.5 | 4.5 | 9.24.12 | 9+ | Games |
| 3 | 420009108 | Temple Run | USD | 4.5 | 4.0 | 1.6.2 | 9+ | Games |
| 4 | 284035177 | Pandora - Music & Radio | USD | 4.0 | 4.5 | 8.4.1 | 12+ | Music |


Social networking apps and games are usually popular and addictive, and we might want to further investigate this category. One thing we might want to find out is the average rating of this category that encompasses both games and social networking apps.

To do that, we first need to isolate the ratings of all the apps whose genre is either "Social Networking" or "Games" into a separate list. Then, we can compute the average value using techniques we already know.

If we wanted to isolate the ratings of these apps using the condition **if genre == 'Social Networking' and genre == 'Games'**, we'd end up with an empty list because there's no app whose genre is both "Social Networking" and "Games" — an app can have only a single genre.

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1Pc9oFntLg2e3celGJyHLKyYQSs_9EMkX" /></left>

We need to isolate the rating of an app only if the genre is "Social Networking" **or** "Games", not "Social Networking" **and** "Games". To account for this difference, we can use **or** instead of **and**:

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1vtet4segJ_2BGe16N2O5TN0kdq2fuIbC" /></left>
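The difference between the two keywords can be sketched with the genres and ratings of the five apps in the table above. The two-column row layout is an assumption made to keep the example short.

```python
# Reduced rows for illustration: [genre, rating]
sample_rows = [
    ['Social Networking', 3.5],
    ['Photo & Video', 4.5],
    ['Games', 4.5],
    ['Games', 4.5],
    ['Music', 4.0],
]

and_ratings = []
or_ratings = []
for row in sample_rows:
    genre = row[0]
    rating = row[1]
    if genre == 'Social Networking' and genre == 'Games':
        and_ratings.append(rating)  # never runs: a genre can't be both
    if genre == 'Social Networking' or genre == 'Games':
        or_ratings.append(rating)   # runs when either condition is True

print(and_ratings)  # []
print(or_ratings)   # [3.5, 4.5, 4.5]
```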


**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Complete the code below to compute the average rating of the apps whose genre is either "Social Networking" or "Games".
  - Inside the for loop, append the rating to the **games_social_ratings** list if the genre is either 'Social Networking' **or** 'Games'.
  - Outside the for loop, compute the average rating of the apps whose genre is either "Social Networking" **or** "Games", and assign the result to a variable named **avg_games_social**.

In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

games_social_ratings = []
for row in apps_data[1:]:
    rating = float(row[7])
    genre = row[11]
    # Complete code from here

### 3.7 Combining Logical Operators

In the previous exercise, we computed the average rating of the apps whose genre is either "Social Networking" or "Games". We can ask even more specific questions, like:

- What is the average rating of free apps whose genre is either "Social Networking" or "Games"?
- What is the average rating of non-free apps whose genre is either "Social Networking" or "Games"?

To answer the first question, we need to isolate the apps that:

- Are in either the "Social Networking" **or** "Games" genre 
- **And** have a price of 0.0

To isolate these apps, we can combine **or** with **and** in a single **if** statement:

<left><img width="600" src="https://drive.google.com/uc?export=view&id=13K-sPZJBJmcyj-NMPpf8cKvDr14Z-WQA" /></left>

Notice that we enclosed the **genre == 'Social Networking' or genre == 'Games'** part within parentheses. This helps Python understand the specific logic we want for our if statement.
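The parentheses matter because Python evaluates **and** before **or** when no parentheses are present. The sketch below uses a made-up non-free social networking app to show how the result changes.

```python
genre = 'Social Networking'
price = 2.99  # hypothetical non-free app

# With parentheses: (True or False) and False
with_parentheses = (genre == 'Social Networking' or genre == 'Games') and price == 0

# Without parentheses, and is evaluated first: True or (False and False)
without_parentheses = genre == 'Social Networking' or genre == 'Games' and price == 0

print(with_parentheses)     # False
print(without_parentheses)  # True
```

Without the parentheses, this non-free app would be wrongly counted among the free apps.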

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Compute the average rating of non-free apps whose genre is either "Social Networking" **or** "Games".
  - Assign the result to a variable named **avg_non_free**.
  - We'll try to solve this exercise without any guidance. You may feel a bit stumped at first, but we've practiced the steps needed to solve this kind of exercise several times. Essentially, the code is almost identical to what we used to extract the ratings for free gaming or social networking apps.

In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

free_games_social_ratings = []
for row in apps_data[1:]:
    rating = float(row[7])
    genre = row[11]
    price = float(row[4])
    
    if (genre == 'Social Networking' or genre == 'Games') and price == 0:
        free_games_social_ratings.append(rating)
        
avg_free = sum(free_games_social_ratings) / len(free_games_social_ratings)

# Non-free apps (average)

### 3.8 Comparison Operators

Previously, we used the **==** and **!=** operators to check whether two values are equal or not. When we check for equality, we compare one value to another to be able to determine whether they are equal or not. For this reason, we call == and != **comparison operators**.

We can compare value **A** to value **B** to determine whether:

- **A** is equal to **B** and vice versa (**B** is equal to **A**).
- **A** is not equal to **B** and vice versa.
- **A** is greater than **B** or vice versa.
- **A** is greater than or equal to **B** or vice versa.
- **A** is less than **B** or vice versa.
- **A** is less than or equal to **B** or vice versa.

In Python, we have special operators for each of the comparison operations above:

<left><img width="350" src="https://drive.google.com/uc?export=view&id=1Q-FkkvndEHlL3-Rt45g3Yp7l3Wv-Unw-" /></left>

Just like with **==** and **!=**, comparing values using any of the comparison operators above will output a single Boolean value:

<left><img width="150" src="https://drive.google.com/uc?export=view&id=1hqGsu83lUx5rERTyAjNfXvP1hzAY6nWv" /></left>
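Here is a short sketch of the comparison operators applied to a made-up price. Each comparison is stored in a variable whose name was chosen only for this illustration.

```python
price = 9.99  # hypothetical price

greater = price > 9
greater_or_equal = price >= 9.99
less = price < 9
less_or_equal = price <= 9

print(greater)           # True
print(greater_or_equal)  # True
print(less)              # False
print(less_or_equal)     # False
```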


Now let's use comparison operators to answer three more questions:

- What is the average rating of the apps that have a price greater than USD 9?
- How many apps have a price greater than USD 9?
- How many apps have a price less than or equal to USD 9?

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Compute the average rating of the apps that have a price greater than USD 9.
  - Using a for loop, isolate the ratings of all the apps that have a price greater than USD 9. When you iterate over **apps_data**, make sure you don't include the header row.
  - Find the average value of these ratings and assign the result to a variable named **avg_rating**.
- Find out how many apps have a price greater than USD 9 and assign the result to a variable named **n_apps_more_9**. You can use the list of ratings from the previous question to find the answer.
- Find out how many apps have a price less than or equal to USD 9 and assign the result to a variable named **n_apps_less_9**. The list of ratings from the first question can help you find a quick answer.


In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

# put your code here

### 3.9 The else Clause

Let's say we need to use information from the price column to label each app as **"free"** or **"non-free"**. If the price is equal to 0.0, we want to label the app **"free"**. Otherwise, we want to label it **"non-free"**.

Remember that we store our data set as a list of lists — each row is represented as a list that describes an app. To label the app, we want to add the string **'free'** or **'non-free'** at the end of each list (row). On a smaller scale, this is what we want to do (below, we're using just a small extract from our data set, showing just the name and the price of four apps):

<left><img width="450" src="https://drive.google.com/uc?export=view&id=1Lod9dSmOZ-Y6IIT2fOiPTgvAP7_u4krQ" /></left>

The code within the body of an **else** clause is executed only if the **if statement** that precedes it resolves to **False**.
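A minimal sketch of an **if** statement followed by an **else** clause. The four rows are made up for illustration and contain only a name and a price.

```python
# Hypothetical rows: [name, price]
sample_apps = [
    ['App A', 0.0],
    ['App B', 2.99],
    ['App C', 0.0],
    ['App D', 6.99],
]

for app in sample_apps:
    price = app[1]
    if price == 0.0:
        app.append('free')      # runs when the condition is True
    else:
        app.append('non-free')  # runs only when the condition is False

print(sample_apps)
```

Each row ends up with exactly one label, because only one of the two bodies runs per iteration.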

**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>


- Complete the code in the cell below to label each app as **"free"** or **"non-free"** depending on its price.
  - Inside the for loop:
    - **If** the price of the app is 0.0, then label the app as **"free"** by appending the string **'free'** to the current iteration variable.
    - **Else**, label the app **"non-free"** by appending the string **'non-free'** to the current iteration variable. Make sure you don't write 'non_free' instead of 'non-free'.
  - By adding labels to the end of each row, we basically created a new column. Name this column "free_or_not" by appending the string 'free_or_not' to the first row of the apps_data data set. Make sure this is done outside the for loop.
- Print the header row and the first five rows to see some of the changes we made.





In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

for app in apps_data[1:]:
    price = float(app[4])
    # Complete code from here

### 3.10 The elif Clause

Let's say we need to do a more granular labeling rather than just using "free" and "non-free". We want to label the apps using this convention:

| price | label |
|---------|----------------|
| 0 | free |
| > 0 and < 20 | affordable |
| >= 20 and < 50 | expensive |
| >= 50 | very expensive |

Using what we know, we can only do the transformations above using a combination of if statements. This is what the process looks like on the small data set below:

<left><img width="400" src="https://drive.google.com/uc?export=view&id=1531tap0l_ReMjGYhjogSljLcB5OEUipA" /></left>


When an app is free, price == 0.0 evaluates to **True** and **app.append('free')** is executed. But then the computer continues to do redundant operations — it checks whether:

- price > 0 and price < 20
- price >= 20 and price < 50
- price >= 50

We already know the three conditions above will evaluate to **False** once we find out that price == 0.0 is **True**. To stop the computer from doing redundant operations, we can use **elif** clauses:


<left><img width="400" src="https://drive.google.com/uc?export=view&id=1WtizoRi0CjIzG9H3ZsO1BsGMVYHP-ULz" /></left>

The code within the body of an elif clause is executed only if:

- The preceding if statement (or the other preceding elif clauses) resolves to False; and
- The condition specified after the elif keyword evaluates to True.
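The sketch below applies a chain of **if** and **elif** clauses to four made-up rows, one for each label. The rows contain only a name and a price, which is an assumption for illustration.

```python
# Hypothetical rows: [name, price]
sample_apps = [
    ['App A', 0.0],
    ['App B', 4.99],
    ['App C', 24.99],
    ['App D', 99.99],
]

for app in sample_apps:
    price = app[1]
    if price == 0.0:
        app.append('free')
    elif price > 0 and price < 20:    # checked only if the if was False
        app.append('affordable')
    elif price >= 20 and price < 50:  # checked only if all earlier checks were False
        app.append('expensive')
    elif price >= 50:
        app.append('very expensive')

print(sample_apps)
```

As soon as one condition evaluates to True, Python skips the remaining **elif** clauses for that iteration.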



**Exercise**
<left><img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ" /></left>

- Complete the code in the cell below to label each app as **"free"**, **"affordable"**, **"expensive"**, or **"very expensive"**. Inside the loop:
  - If the price of the app is 0, label the app as "free" by appending the string 'free' to the current iteration variable.
  - If the price of the app is greater than 0 and less than 20, label the app as "affordable". For efficiency purposes, use an elif clause.
  - If the price of the app is greater than or equal to 20 and less than 50, label the app as "expensive". For efficiency purposes, use an elif clause.
  - If the price of the app is greater than or equal to 50, label the app as "very expensive". For efficiency purposes, use an elif clause.
- Name the newly created column "price_label" by appending the string 'price_label' to the first row of the apps_data data set.
- Inspect the header row and the first five rows to see some of the changes you made.

In [0]:
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)

for app in apps_data[1:]:
    price = float(app[4])
    # Complete code from here