
# Lists

Often we need to store several single items of data together so that they can be processed together. This might be because all the data refers to one person (e.g. name, age, gender) or because we have a set of related values (e.g. all the items that should be displayed in a drop-down list, such as the years from this year back to 100 years ago, so that someone can select their year of birth).

Python has a range of data structures available including:
*   lists  
*   tuples  
*   dictionaries  
*   sets

This worksheet looks at lists.

## List
A list is an ordered collection of related, individual data objects that are indexed and can be processed as a whole, as subsets or as individual items.  In standard Python (CPython), a list stores references to its items in a contiguous block of memory so that access by index is as quick as possible.  However, lists are mutable (they can be changed after they have been created and stored), so they need extra functionality to deal with changing list sizes.
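
As a small illustration of "indexed" and "mutable", the sketch below changes a list after it has been created and then reads back a single item and a subset. The `scores` list is hypothetical example data, not part of the worksheet's dataset.

```python
# Hypothetical example data, not taken from the worksheet's dataset
scores = [10, 20, 30]

scores[1] = 25        # change an existing item in place (lists are mutable)
scores.append(40)     # grow the list by one item

first = scores[0]     # access a single item by its index
middle = scores[1:3]  # access a subset of the list (a slice)

print(scores)  # [10, 25, 30, 40]
print(first)   # 10
print(middle)  # [25, 30]
```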

# Let's get some lists of data
For this worksheet we are going to work with data on STEAM games.  We are going to get the data from a spreadsheet and make lists that we can find things out from.



# Creating a list
```
nums = [1, 2, 3, 4, 5]
names = ["Tom","Jerry","Spike"]
```

# Printing a list

```
print(nums)
[1, 2, 3, 4, 5]
```

```
print(names)
['Tom', 'Jerry', 'Spike']
```

In [None]:
# create the lists, and print them
nums = [1,2,3,4,5]
names = ["Tom", "Jerry", "Spike"]
print(nums)
print(names)

[1, 2, 3, 4, 5]
['Tom', 'Jerry', 'Spike']


# Print individual items in the list

We can access any item in a list by its position (index).  Lists are indexed from 0.

To print the first item in a list, use listname[0]; to print the last item, use listname[-1].



```
print(nums[0])
1
```
```
print(names[1])
Jerry
```

```
print(nums[-1])
5
```


In [None]:
# have a go at printing different items from the lists
print(nums[0])
print(names[1])
print(nums[-1])

1
Jerry
5


# Print a subset of a list

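listname[start_index : end_index+1]  
A slice runs from start_index up to, but not including, the index after the colon, so the number after the colon is one more than the index of the last item you want. If the subset starts at the first item, start_index can be left out; if it runs to the end of the list, the end index can be left out.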

```
print(nums[:3])
[1, 2, 3]
```

```
print(names[1:])
['Jerry', 'Spike']
```

```
print(nums[1:3])
[2, 3]
```

In [None]:
# have a go at printing subsets of the lists
print(nums[:3])
print(names[1:])
print(nums[1:3])

[1, 2, 3]
['Jerry', 'Spike']
[2, 3]
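
A further sketch of slicing, using a hypothetical `letters` list: it shows that the index after the colon is not included, that negative indexes count back from the end, and that leaving out both indexes copies the whole list.

```python
letters = ["a", "b", "c", "d", "e"]  # hypothetical example list

middle = letters[1:4]    # items at index 1, 2 and 3 (index 4 is not included)
last_two = letters[-2:]  # negative indexes count back from the end of the list
copy_all = letters[:]    # leaving out both indexes copies the whole list

print(middle)    # ['b', 'c', 'd']
print(last_two)  # ['d', 'e']
print(copy_all)  # ['a', 'b', 'c', 'd', 'e']
```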


# List length

Use the len() function to get the number of items in a list.

There are 5 items in the nums list and 3 in the names list.

Write a function that will:
* print the length of the nums list
* print the length of the names list
* concatenate (add) the two lists together to make a new list called nums_names
* print the length of the new list

Expected output:
```
The length of the nums list is: 5
The length of the names list is: 3
The length of the joined list is: 8
```


In [None]:
def print_list_info():
  print("The length of the nums list is:", len(nums))
  print("The length of the names list is:", len(names))
  nums_names = nums + names
  print("The length of the joined list is:", len(nums_names))

print_list_info()

The length of the nums list is: 5
The length of the names list is: 3
The length of the joined list is: 8


# List methods

You can get an overview of the methods you can use here: https://www.w3schools.com/python/python_lists_methods.asp
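
As a rough illustration of a few other methods from that overview, here is a minimal sketch on a hypothetical `letters` list (the list and variable names are made up for this example):

```python
letters = ["d", "a", "c"]  # hypothetical example list

letters.extend(["b", "e"])     # add several items to the end of the list
letters.sort()                 # sort the list in place, ascending
position = letters.index("c")  # index of the first occurrence of "c"
removed = letters.pop()        # remove and return the last item
letters.reverse()              # reverse the order of the list in place

print(letters)   # ['d', 'c', 'b', 'a']
print(position)  # 2
print(removed)   # e
```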

Then: 
1.  Create the nums and names list again 
2.  Append the number 6 to the nums list, and print
3.  Insert the name "Sylvester" before "Jerry" in the names list and print
4.  Print the length of the nums list
5.  Remove the number 4 from the nums list, and print
6.  Print the max and min of the nums list
7.  Create a new list called new_nums which contains the numbers 40 to 50 (use the range function)

Expected output:  


In [None]:
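nums = [1, 2, 3, 4, 5]
names = ["Tom", "Jerry", "Spike"]
nums.append(6) # add 6 to the end of nums
print(nums)
names.insert(1, "Sylvester") # insert before "Jerry", which is at index 1
print(names)
print(len(nums))
nums.remove(4) # remove the first occurrence of the value 4
print(nums)
print(max(nums), min(nums))
new_nums = list(range(40, 51)) # range stops before 51, so this gives 40 to 50
print(new_nums)


[1, 2, 3, 4, 5, 6]
['Tom', 'Sylvester', 'Jerry', 'Spike']
6
[1, 2, 3, 5, 6]
6 1
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]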


# Now some real data
---

1.  Open the STEAM csv file (which we have taken from Kaggle and have reduced to make it more manageable): https://drive.google.com/file/d/1amPnoBi3uhQXjFaQbUy-L-Y-eeJ1BcxE/view?usp=sharing  

2.  Open the file with Google Sheets to see what is in it.  The file contains rows of data, each with a user id and a game that the user has purchased.

3.  NOW, run the code in the cell below to get:  
- users (the list of user ids in the data)
- titles (the list of titles that have been purchased)

In [9]:
import pandas as pd

# open the data file and return the User and Title columns as two lists
def get_users_and_titles():
  url = "https://drive.google.com/uc?id=1rkG8-cp-KLBc1zK4YMLHIsMMyyTVk5Ju"
  data_table = pd.read_csv(url)
  return data_table["User"].tolist(), data_table["Title"].tolist() 

users, titles = get_users_and_titles()
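
To see what "turning columns into lists" means without downloading anything, here is a minimal sketch that uses the standard library `csv` module instead of pandas. The three sample rows and the names `sample_csv`, `sample_users` and `sample_titles` are hypothetical; only the column names `User` and `Title` come from the cell above.

```python
import csv
import io

# Hypothetical sample in the same shape as the worksheet's file
sample_csv = """User,Title
101,Spore
102,Fallout 4
101,Fallout 4
"""

sample_users = []
sample_titles = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    sample_users.append(int(row["User"]))  # one user id per row
    sample_titles.append(row["Title"])     # one title per row

print(sample_users)   # [101, 102, 101]
print(sample_titles)  # ['Spore', 'Fallout 4', 'Fallout 4']
```

Each row contributes one entry to both lists, so the two lists always have the same length and the same position in each refers to the same purchase.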

---
### Exercise 1 - list head, tail and length of the titles list
---

Write a function, **describe_list()** which will:
*  print the length of the list `titles`
*  print the first 10 items in `titles` (the head)  
*  print the last 5 items in `titles` (the tail)

Expected output:  
```
129511
['The Elder Scrolls V Skyrim', 'Fallout 4', 'Spore', 'Fallout New Vegas', 'Left 4 Dead 2', 'HuniePop', 'Path of Exile', 'Poly Bridge', 'Left 4 Dead', 'Team Fortress 2']
['Fallen Earth', 'Magic Duels', 'Titan Souls', 'Grand Theft Auto Vice City', 'RUSH']
```

In [11]:
def describe_list():

    print(len(titles)) #printing length of entire list
    print(titles[:10]) #printing first 10 of list
    print(titles[-5:]) #printing last 5 of list

describe_list()

129511
['The Elder Scrolls V Skyrim', 'Fallout 4', 'Spore', 'Fallout New Vegas', 'Left 4 Dead 2', 'HuniePop', 'Path of Exile', 'Poly Bridge', 'Left 4 Dead', 'Team Fortress 2']
['Fallen Earth', 'Magic Duels', 'Titan Souls', 'Grand Theft Auto Vice City', 'RUSH']


---
### Exercise 2 - use a loop to print the first 20 items

Write a function which will:
*  create a new list from the first 20 items of the titles list
*  loop through the new list and print each item


In [12]:
def print_list():

    new_list = titles[:20] #new list created from the first 20 items of titles
    for item in new_list: #looping through the new list
      print(item) #printing each item on its own line

print_list()

The Elder Scrolls V Skyrim
Fallout 4
Spore
Fallout New Vegas
Left 4 Dead 2
HuniePop
Path of Exile
Poly Bridge
Left 4 Dead
Team Fortress 2
Tomb Raider
The Banner Saga
Dead Island Epidemic
BioShock Infinite
Dragon Age Origins - Ultimate Edition
Fallout 3 - Game of the Year Edition
SEGA Genesis & Mega Drive Classics
Grand Theft Auto IV
Realm of the Mad God
Marvel Heroes 2015


---
### Exercise 3 - count the number of times a title appears in the list

Write a function which will:
*  count the number of times that the title Fallout 4 appears in the list

Expected output:  
168

In [13]:
def count_title():

    print(titles.count("Fallout 4")) #counting "Fallout 4" in titles list

count_title()

168


---
### Exercise 4 - remove all duplicates of a title from the list

Write a function which will: remove all duplicate occurrences of Fallout 4 from the titles list, so that it appears just once (Hint:  you can remove an occurrence of Fallout 4 repeatedly until there is only one left)


In [14]:
def remove_duplicates():

    while titles.count("Fallout 4") > 1: #while the title appears more than once
      titles.remove("Fallout 4") #remove the first occurrence that is found
    print(titles.count("Fallout 4")) #testing if all duplicates have been removed.

remove_duplicates()


1
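
Removing items one at a time rescans the list on every pass. As an alternative sketch (not the approach the exercise asks for), the same result can be built in a single pass by copying items into a new list and skipping repeats of the chosen title. The `sample_titles` list and the other names here are hypothetical.

```python
sample_titles = ["Spore", "Fallout 4", "Spore", "RUSH", "Fallout 4"]  # hypothetical
target = "Fallout 4"

deduped = []
seen_target = False
for title in sample_titles:
    if title == target:
        if seen_target:
            continue        # skip every repeat of the target title
        seen_target = True  # remember that the first copy has been kept
    deduped.append(title)   # all other titles are kept unchanged

print(deduped)                # ['Spore', 'Fallout 4', 'Spore', 'RUSH']
print(deduped.count(target))  # 1
```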


---
### Exercise 5 - print the counts of the first 10 titles in the list

Write a function which will:
* loop through the first 10 items in the titles list
* for each item print the number of times that title appears in the list


In [15]:
def print_count_of_first_ten():

    for item in titles[:10]: #for the first ten items in the titles list
      print(item,":", titles.count(item)) #print the quantity of item in list.

users, titles = get_users_and_titles()
print_count_of_first_ten()

The Elder Scrolls V Skyrim : 717
Fallout 4 : 168
Spore : 67
Fallout New Vegas : 337
Left 4 Dead 2 : 951
HuniePop : 22
Path of Exile : 339
Poly Bridge : 12
Left 4 Dead : 281
Team Fortress 2 : 2323
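
Each call to `titles.count(item)` scans the whole list again. A possible alternative, sketched here on a hypothetical `sample_titles` list, is to count every title once with `collections.Counter` from the standard library and then look the counts up.

```python
from collections import Counter

sample_titles = ["Spore", "Fallout 4", "Spore", "RUSH", "Spore"]  # hypothetical

title_counts = Counter(sample_titles)  # one pass over the list counts every title

for title in sample_titles[:2]:        # look up the counts of the first two items
    print(title, ":", title_counts[title])
# Spore : 3
# Fallout 4 : 1
```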


---
### Project - work as a team

The users list has the ids of all the users who have purchased STEAM games.

Write a function that will, for the first 100 unique users:
* count how many games have been purchased by each user
* calculate the percentage of all purchases made by each user
* calculate the percentage of all purchases made by these 100 users altogether
* find the id of the user who has purchased the most games of these 100 users
* calculate the average number of games purchased by a user from the 100
* print this information, printing each unique user just once

Then do the same with the last 100 users.

Divide up the tasks and each write one part, then try to get them all to work together.

### Practice 1
---
Get a list of unique user ids

Write some code that will loop through the users list and add each new user id to a new list called **unique_users**

**Expected output**:
12393

In [22]:
# get list of unique user ids
def get_unique_users():

    unique_users = [] #created empty list
    for user in users: #for every user in users
      if user not in unique_users: #if this user id has not been seen before
        unique_users.append(user) #adds user to end of list

    print(len(unique_users)) #printing the number of unique user ids

users, titles = get_users_and_titles()
get_unique_users()

12393


### Practice 2
---

Write code that will create a subset of the unique_users list, containing just the first 100 users and called **hundred_users**.  Loop through the hundred_users list and, for each user, print the number of games that user has purchased (`users.count(unique_user)`).

**Expected output**:
```
40
1
43
505
1
...
...
2
18
27
1
33
```

In [29]:
# print number of games purchased by each of first hundred users
def game_number(users):

    unique_users = [] #created empty list
    for user in users: #for every user in users list
      if user not in unique_users: #if user not already in unique_users list
        unique_users.append(user) #add user to unique_users list, keeping the original order

    hundred_users = unique_users[:100] #new list of the first 100 unique users
    #print(len(hundred_users)) #checking length of list

    for user in hundred_users: #for every user in hundred_users list
      print(users.count(user)) #count and print how many times user came up in original users list

users, titles = get_users_and_titles()
game_number(users)


40
1
43
505
1
...
...
2
18
27
1
33


### Practice 3
---
Write code to calculate the percentage that the first user in the unique_users list has purchased of all the purchases made by users.  Print the user's id and the percentage.

*Hint*:  get the count for that user (as in the last practice), divide it by the number of purchases made (the length of the original users list) and multiply by 100
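
The arithmetic in the hint can be checked by hand. The sketch below assumes the first user made 40 purchases (the first value in the Practice 2 expected output) and that the users list has 129511 entries (the list length printed in Exercise 1, assuming one entry per row of the file); the variable names are hypothetical.

```python
user_purchases = 40       # assumed: first count in the Practice 2 expected output
total_purchases = 129511  # assumed: one entry per row, as printed in Exercise 1

percentage = user_purchases / total_purchases * 100  # share of all purchases
print(round(percentage, 2), "%")  # 0.03 %
```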

**Expected output**:  
`151603712 0.03 %`

In [34]:
# Find percentage purchases bought by first user

def find_percentage():

    unique_users = [] #created empty list
    for user in users: #for every user in users list
      if user not in unique_users: #if user not already in unique_users list
        unique_users.append(user) #add user to unique_users list

    first_user = unique_users[0] #first user in the unique_users list
    num_games = users.count(first_user) #number of games = the occurrences of user in original users list
    percentage = num_games / len(users) * 100 #percentage = the number of times the user was in users list divided by the length of users list x 100
    print("User ID:", first_user, "Percentage:", round(percentage,2), "%") #printing first user with their percentage rounded to two decimal places

find_percentage()

User ID: 151603712 Percentage: 0.03 %


### Practice 4
---
Write some code that will loop through the `hundred_users` and find the id of the user with the largest number of purchases

**Expected output**:  
```
53875128
```
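
One way to think about this search, sketched on hypothetical sample data: Python's built-in `max()` accepts a `key` function, so passing the list's own `count` method picks the id that occurs most often. This is an alternative to writing the comparison loop by hand, and the names `sample_users`, `sample_unique` and `top_user` are made up for the example.

```python
sample_users = [7, 3, 7, 5, 7, 3]  # hypothetical purchase log, one id per purchase
sample_unique = [7, 3, 5]          # the unique ids, in order of first appearance

# max() compares the ids by how many times each appears in sample_users
top_user = max(sample_unique, key=sample_users.count)

print(top_user, sample_users.count(top_user))  # 7 3
```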


In [42]:
# find user who has made most purchases
def max_value():

    unique_users = [] #created empty list
    for user in users: #for every user in users list
      if user not in unique_users: #if user not already in unique_users list
        unique_users.append(user) #add user to unique_users list

    hundred_users = unique_users[:100] #new list of the first 100 unique users

    max_purchase = 0
    max_userID = 0
    for user in hundred_users: #for every user in hundred_users 
        num_games = users.count(user) #number of games = the occurrences of user in original users list
        if num_games > max_purchase: #if num games is greater than the largest seen so far, assign to max purchase
          max_purchase = num_games
          max_userID = user

    print("User ID:",max_userID, "Purchase count:",max_purchase)
        
max_value()

User ID: 53875128 Purchase count: 505


### Practice 5
---
Write some code that will loop through the `hundred_users` and add up all the purchases made by them. Calculate this total as a percentage of the total number of purchases made (as before), then divide by 100 (the number of users in this list) to get the average percentage per user.
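
A worked version of this calculation, under stated assumptions: the hundred users' combined total is taken as 1646 (derived from the Practice 6 expected average of 16.46 multiplied by 100 users) and the total number of purchases as 129511 (the list length from Exercise 1). The variable names are hypothetical.

```python
hundred_total = 1646      # assumed: 16.46 average x 100 users (Practice 6)
total_purchases = 129511  # assumed: list length printed in Exercise 1

percentage = hundred_total / total_purchases * 100  # share made by all 100 users
average_percentage = percentage / 100               # average share per user

print(round(percentage, 2), "%")          # 1.27 %
print(round(average_percentage, 2), "%")  # 0.01 %
```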

**Expected output**:  
```
0.01 %

```

In [57]:
# find percentage of total purchases made by first hundred users
def total_percentage():

    unique_users = [] #created unique_users list
    for user in users: # for every user in users list
      if user not in unique_users: #if user not found in unique_users list
        unique_users.append(user) #append user to the unique_users list

    hundred_users = unique_users[:100] #new list of 100 users
    hundred_total = 0 #starting total at 0

    for user in hundred_users: #for every user in hundred users 
        num_games = users.count(user) #number of games they purchased
        hundred_total += num_games #in every loop hundred total adds num games to hundred total variable

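    percentage = hundred_total / len(users) * 100 #percentage = total games hundred users purchased divided by the total number of purchases in original list x 100
    average = percentage / len(hundred_users) #average percentage per user = percentage divided by the number of users in the hundred list
    print("Average percent: ",round(average,2), "%")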
        
total_percentage()

Average percent:  0.01 %


### Practice 6
---

Write some code that will loop through the `hundred_users`, add all the purchases made by them, then divide by 100 (the number of users in this list) to get the average.

**Expected output**:  
```
16.46

```

In [61]:
# find average number of purchases made by first 100 users

def average_purchases():

    unique_users = [] #created unique_users list
    for user in users: # for every user in users list
      if user not in unique_users: #if user not found in unique_users list
        unique_users.append(user) #append user to the unique_users list

    hundred_users = unique_users[:100] #new list of 100 users
    hundred_total = 0 #starting total at 0

    for user in hundred_users: #for every user in hundred users 
        num_games = users.count(user) #number of games they purchased
        hundred_total += num_games #in every loop hundred total adds num games to hundred total variable

    average = hundred_total / len(hundred_users) #average = total games hundred users purchased divided by the number of users in the hundred list
    print("Average purchases per user:", round(average,2))
        
average_purchases()


Average purchases per user: 16.46


### Practice 7
---
Put all the above together into a function, and add code to print the average number of games per user, the user id of the user with the maximum number of purchases, and a list of the hundred users ids and the percentage each has purchased

In [83]:
def process_user_purchases():

    unique_users = [] #created unique_users list
    for user in users: # for every user in users list
      if user not in unique_users: #if user not found in unique_users list
        unique_users.append(user) #append user to the unique_users list

    hundred_users = unique_users[:100] #new list of 100 users

    for user in hundred_users: #for every user in hundred users 
      num_games = users.count(user) #number of games they purchased
      percentage = (num_games / len(users)) * 100 #percentage = this user's purchases divided by the total number of purchases in original list x 100
      average_num = num_games / 100 #this user's purchases divided by 100 (the number of users in the hundred list), i.e. this user's contribution to the hundred-user average
      print("User ID: ", user, " - ","Average number of games: ",round(average_num,2), " - ", percentage, "% of total purchased.")

    max_purchase = 0
    max_userID = 0
    for user in hundred_users: #for every user in hundred_users 
        num_games = users.count(user) #number of games = the occurence of user in original users list
        if num_games > max_purchase: #if num games is greater than last loop round, assign to max purchase
          max_purchase = num_games
          max_userID = user

    print("\nUser ID with the maximum number of purchases:",max_userID, "Purchase count:",max_purchase)

process_user_purchases()

User ID:  151603712  -  Average number of games:  0.4  -  0.030885407417130594 % of total purchased.
User ID:  187131847  -  Average number of games:  0.01  -  0.0007721351854282647 % of total purchased.
User ID:  59945701  -  Average number of games:  0.43  -  0.03320181297341539 % of total purchased.
User ID:  53875128  -  Average number of games:  5.05  -  0.3899282686412737 % of total purchased.
User ID:  234941318  -  Average number of games:  0.01  -  0.0007721351854282647 % of total purchased.
User ID:  140954425  -  Average number of games:  0.01  -  0.0007721351854282647 % of total purchased.
User ID:  26122540  -  Average number of games:  0.1  -  0.0077213518542826485 % of total purchased.
User ID:  176410694  -  Average number of games:  0.01  -  0.0007721351854282647 % of total purchased.
User ID:  197278511  -  Average number of games:  0.01  -  0.0007721351854282647 % of total purchased.
User ID:  150128162  -  Average number of games:  0.01  -  0.0007721351854282647 % o