# With what we've learned so far, we might think we could store each data point in a variable

Creating a variable for each data point in our data set would be a cumbersome process. Fortunately, we can store data more efficiently using lists.

To create a list of data points, we only need to:

Separate the data points with a comma.

Surround the sequence of data points with brackets.

To store a list in the computer's memory, we assign it to a variable, for example `row_1`:

In [1]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]

A list can contain elements of mixed or identical data types (so far we've learned four data types: integers, floats, strings, and lists).
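As a small illustration of the point above, the following sketch builds one list with mixed types, one with identical types, and one that holds other lists, then inspects them with the built-in `type()` and `len()` functions. The names `mixed_row`, `ratings`, and `nested` are only illustrative and do not appear elsewhere in these notes.

```python
# A list mixing strings, floats, and integers (values from the Facebook row)
mixed_row = ['Facebook', 0.0, 'USD', 2974676, 3.5]

# A list whose elements all share one type (floats)
ratings = [3.5, 4.5, 4.5]

# A list can also hold other lists as its elements
nested = [mixed_row, ratings]

print(type(mixed_row))   # <class 'list'>
print(len(mixed_row))    # 5
print(len(ratings))      # 3
print(len(nested))       # 2
```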

# Each element (data point) in a list has a specific number associated with it, called an index number. The indexing always starts at 0

In [4]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
ratings_1 = row_1[3]
ratings_2 = row_2[3]
ratings_3 = row_3[3]

total = ratings_1 + ratings_2 + ratings_3
average = total / 3

In Python, we have two indexing systems for lists:

`Positive indexing`: the first element has the index number 0, the second element has the index number 1, and so on.
    
`Negative indexing`: the last element has the index number -1, the second to last element has the index number -2, and so on.


In practice, we almost always use positive indexing to retrieve list elements. Negative indexing is useful when we want to select the last element of a list, especially if the list is long and we can't tell its length by counting.
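To make the relationship between the two indexing systems concrete, here is a minimal sketch that retrieves the same elements both ways. The variable `last_via_len` is an illustrative name; the rule it demonstrates is that index `-k` points at the same element as index `len(a_list) - k`.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]

# Positive and negative indexes that point at the same element
print(row_1[0], row_1[-5])   # Facebook Facebook
print(row_1[4], row_1[-1])   # 3.5 3.5

# Reaching the last element with positive indexing requires the length
last_via_len = row_1[len(row_1) - 1]
print(last_via_len == row_1[-1])   # True
```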

In [5]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
rating_1 = row_1[-1]
rating_2 = row_2[-1]
rating_3 = row_3[-1]

total_rating = rating_1 + rating_2 + rating_3
average_rating = total_rating / 3

In [6]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]
fb_rating_data = [row_1[0], row_1[3], row_1[-1]]
insta_rating_data = [row_2[0], row_2[3], row_2[4]]
pandora_rating_data = [row_5[0], row_5[3], row_5[4]]


avg_rating = (fb_rating_data[2] + insta_rating_data[2] + pandora_rating_data[2]) / 3

# The process of selecting a part of a list is called list slicing.

In [7]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]
first_4_fb = row_1[:4]
last_3_fb = row_1[-3:]
pandora_3_4 = row_5[2:4]

print(first_4_fb)
print(last_3_fb)
print(pandora_3_4)

['Facebook', 0.0, 'USD', 2974676]
['USD', 2974676, 3.5]
['USD', 1126879]
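Reading the output above, a slice `a_list[m:n]` starts at index `m` and stops just before index `n`; leaving out `m` starts at the beginning, and leaving out `n` runs to the end. The short sketch below checks these rules on `row_1`.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]

print(row_1[1:3])   # [0.0, 'USD'] -> indexes 1 and 2; index 3 is excluded
print(row_1[:2])    # ['Facebook', 0.0]
print(row_1[3:])    # [2974676, 3.5]

# The length of a_list[m:n] is n - m when both indexes are within range
print(len(row_1[1:4]))   # 3

# Slicing builds a new list; the original keeps all five elements
print(len(row_1))        # 5
```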


So far, we've been working with a data set having five rows, and we've been storing each row as a list in a separate variable (the variables `row_1`, `row_2`, `row_3`, `row_4`, and `row_5`). If we had a data set with 5,000 rows, however, we'd end up with 5,000 variables, which would make our code messy and almost impossible to work with.

To solve this problem, we can store our five variables in a single list:

# A list that contains other lists is called a list of lists.

In [8]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]
app_data_set = [row_1, row_2, row_3, row_4, row_5]
avg_rating = (app_data_set[0][-1] + app_data_set[1][-1] +
              app_data_set[2][-1] + app_data_set[3][-1] +
              app_data_set[4][-1]) / 5
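An expression such as `app_data_set[0][-1]` can be read as two retrievals performed one after the other. The sketch below spells out the intermediate step on a two-row version of the data set; `first_row` and `first_rating` are illustrative names.

```python
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
app_data_set = [row_1, row_2]

# Step 1: the first index selects a whole row (itself a list)
first_row = app_data_set[0]
print(first_row)       # ['Facebook', 0.0, 'USD', 2974676, 3.5]

# Step 2: the second index selects an element inside that row
first_rating = first_row[-1]
print(first_rating)    # 3.5

# Chaining the two indexes gives the same result in one expression
print(app_data_set[0][-1] == first_rating)   # True
```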

The data set we will be working with is an extract from a much larger data set:

https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps

In [2]:
from csv import reader
opened_file = open('AppleStore.csv', encoding='UTF-8')
print(opened_file)

<_io.TextIOWrapper name='AppleStore.csv' mode='r' encoding='UTF-8'>


In [3]:
read_file = reader(opened_file)
apps_data = list(read_file)


In [25]:
print(len(apps_data))
print(apps_data[0])
print(apps_data[1:3])

7198
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']]
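In the output above, every value is a string, including numbers such as `'3.5'`, so values need converting before any arithmetic. The sketch below shows this on a small in-memory stand-in for the file so that it runs without `AppleStore.csv`; the `sample_csv` text (trimmed to four columns) and the other variable names are assumptions made for illustration only.

```python
from csv import reader
from io import StringIO

# Stand-in for AppleStore.csv: a header plus two rows, trimmed to four columns
sample_csv = StringIO(
    'track_name,price,rating_count_tot,user_rating\n'
    'Facebook,0.0,2974676,3.5\n'
    'Instagram,0.0,2161558,4.5\n'
)

sample_data = list(reader(sample_csv))
header = sample_data[0]      # column names
first_app = sample_data[1]   # first data row

print(type(first_app[3]))    # <class 'str'>

# Convert the strings before doing arithmetic with them
rating = float(first_app[3])
rating_count = int(first_app[2])
print(rating + 1)            # 4.5
print(rating_count)          # 2974676
```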


# For Loops

In [1]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

app_data_set = [row_1, row_2, row_3, row_4, row_5]
for each_list in app_data_set:
    print(each_list)

['Facebook', 0.0, 'USD', 2974676, 3.5]
['Instagram', 0.0, 'USD', 2161558, 4.5]
['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
['Temple Run', 0.0, 'USD', 1724546, 4.5]
['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]


In [2]:
row_1 = ['Facebook', 0.0, 'USD', 2974676, 3.5]
row_2 = ['Instagram', 0.0, 'USD', 2161558, 4.5]
row_3 = ['Clash of Clans', 0.0, 'USD', 2130805, 4.5]
row_4 = ['Temple Run', 0.0, 'USD', 1724546, 4.5]
row_5 = ['Pandora - Music & Radio', 0.0, 'USD', 1126879, 4.0]

app_data_set = [row_1, row_2, row_3, row_4, row_5]
rating_sum = 0
for row in app_data_set:
    rating = row[-1]
    rating_sum = rating_sum + rating
    print(rating_sum)
    
avg_rating = rating_sum / len(app_data_set)

3.5
8.0
12.5
17.0
21.0
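One way to read the loop above is as a compact form of repeating the same statements once per row. As a rough sketch, the code below writes out the repetitions by hand for a three-row data set and compares the result with the loop version; `manual_sum` and `loop_sum` are illustrative names.

```python
app_data_set = [
    ['Facebook', 0.0, 'USD', 2974676, 3.5],
    ['Instagram', 0.0, 'USD', 2161558, 4.5],
    ['Clash of Clans', 0.0, 'USD', 2130805, 4.5],
]

# Manual version: the same two statements repeated for each row
manual_sum = 0
row = app_data_set[0]
manual_sum = manual_sum + row[-1]
row = app_data_set[1]
manual_sum = manual_sum + row[-1]
row = app_data_set[2]
manual_sum = manual_sum + row[-1]

# Loop version: Python assigns each row to `row` in turn
loop_sum = 0
for row in app_data_set:
    loop_sum = loop_sum + row[-1]

print(manual_sum, loop_sum)   # 12.5 12.5
```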


In [1]:
import csv
with open('AppleStore.csv', encoding='UTF-8') as opened_file:
    read_file = csv.reader(opened_file)
    apps_data = list(read_file)

rating_sum = 0
for row in apps_data[1:]:
    rating = float(row[7])
    rating_sum += rating
avg_rating = rating_sum/len(apps_data[1:])
avg_rating

3.526955675976101

In [2]:
# Alternate Method
all_ratings = []
for row in apps_data[1:]:
    rating = float(row[7])
    all_ratings.append(rating)
avg_rating = sum(all_ratings)/len(all_ratings)
avg_rating

3.526955675976101

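In [4]:
# A file opened with a plain open() call should be closed explicitly;
# a file opened in a with block is closed automatically when the block ends.
opened_file.close()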