<a href="https://colab.research.google.com/github/shammud/python/blob/main/Working_with_Data_Lists.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lists

Often we need to store a number of single items of data together so that they can be processed together. This might be because all the data refers to one person (e.g. name, age, gender) or because we have a set of related values (e.g. all the items that should be displayed in a drop-down list, such as all the years from this year back to 100 years ago, so that someone can select their year of birth).

Python has a range of data structures available including:
*   lists  
*   tuples  
*   dictionaries  
*   sets

This worksheet looks at lists.

## List
A list is a collection of related, individual data objects that are indexed and can be processed as a whole, as subsets or as individual items.  Lists are stored, essentially, as contiguous items in memory so that access can be as quick as possible.  However, they are mutable (they can be changed after they have been created and stored), so they need extra functionality to deal with changing list sizes.
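
As a minimal sketch of what "indexed" and "mutable" mean in practice, the example below uses a small made-up list called `scores` (the name and values are illustrative only, not part of the worksheet data).

```python
# A small, made-up list used only for illustration
scores = [10, 20, 30]

# items are indexed from 0, so the first item is scores[0]
first = scores[0]

# lists are mutable: an existing item can be replaced in place
scores[1] = 25

# ... and the list can grow after it has been created
scores.append(40)

print(first)   # 10
print(scores)  # [10, 25, 30, 40]
```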

# Let's get some lists of data
For this worksheet we are going to work with data on STEAM games.  We are going to get the data from a spreadsheet and make lists that we can find things out from.



# Creating a list
```
nums = [1, 2, 3, 4, 5]
names = ["Tom","Jerry","Spike"]
```

# Printing a list

```
print(nums)
[1, 2, 3, 4, 5]
```

```
print(names)
['Tom', 'Jerry', 'Spike']
```

In [None]:
# create the lists, and print them
nums = [1,2,3,4,5]
names = ["Tom", "Jerry", "Spike"]
print(nums)
print(names)

[1, 2, 3, 4, 5]
['Tom', 'Jerry', 'Spike']


# Print individual items in the list

We can access any item in a list by its position (index).  Lists are indexed from 0.

To print the first item in a list, use listname[0]; to print the last item, use listname[-1].




```
print(nums[0])
1
```
```
print(names[1])
Jerry
```

```
print(nums[-1])
5
```


In [None]:
# have a go at printing different items from the lists
print(nums[0])
print(names[1])
print(nums[-1])

1
Jerry
5


# Print a subset of a list

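A slice is written listname[start : stop].  It includes the item at index start and every item up to, but not including, index stop, so to include the item at end_index write listname[start_index : end_index+1].  
If the slice starts at the first item, start can be left out; if it runs to the end of the list, stop can be left out.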

```
print(nums[:3])
[1, 2, 3]
```

```
print(names[1:])
['Jerry', 'Spike']
```

```
print(nums[1:3])
[2, 3]
```
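
Slices also accept negative indexes, which count back from the end of the list. The sketch below reuses the `nums` list from above; the variable names `head`, `tail` and `middle` are just illustrative.

```python
nums = [1, 2, 3, 4, 5]

head = nums[:2]      # the first two items
tail = nums[-2:]     # the last two items
middle = nums[1:-1]  # everything except the first and last items

print(head)          # [1, 2]
print(tail)          # [4, 5]
print(middle)        # [2, 3, 4]

# the stop index is never included, so this gives one item, not two
print(nums[-2:-1])   # [4]
```

Leaving out the stop index (as in `nums[-2:]`) is what makes the slice run all the way to the last item.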

In [None]:
# have a go at printing subsets of the lists
print(nums[:3])
print(names[1:])
print(nums[1:3])

[1, 2, 3]
['Jerry', 'Spike']
[2, 3]


# List length

Use the len() function to get the number of items in a list.

There are 5 items in the nums list and 3 in the names list.

Write a function that will:
* print the length of the nums list
* print the length of the names list
* concatenate (add) the two lists together to make a new list called nums_names
* print the length of the new list

Expected output:
```
The length of the nums list is: 5
The length of the names list is: 3
The length of the joined list is: 8
```


In [None]:
def print_list_info():
  print("The length of the nums list is:", len(nums))
  print("The length of the names list is:", len(names))
  nums_names = nums + names
  print("The length of the joined list is:", len(nums_names))

print_list_info()

The length of the nums list is: 5
The length of the names list is: 3
The length of the joined list is: 8


# List methods

You can get an overview of the methods you can use here: https://www.w3schools.com/python/python_lists_methods.asp
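
As a quick illustration of a few of these methods, here is a small sketch on a made-up list called `colours` (the list and its values are assumptions for the example, not part of the exercise).

```python
colours = ["red", "green", "blue"]

colours.append("green")      # add an item to the end of the list
colours.insert(1, "yellow")  # insert an item so that it sits at index 1
colours.remove("green")      # remove the FIRST matching item only

print(colours)                 # ['red', 'yellow', 'blue', 'green']
print(colours.count("green"))  # 1
print(colours.index("blue"))   # 2
```

Note that `remove()` only takes out one occurrence each time it is called, which matters later when a list contains duplicates.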

Then: 
1.  Create the nums and names list again 
2.  Append the number 6 to the nums list, and print
3.  Insert the name "Sylvester" before "Jerry" in the names list and print
4.  Print the length of the nums list
5.  Remove the number 4 from the nums list, and print
6.  Print the max and min of the nums list
7.  Create a new list called new_nums which contains the numbers 40 to 50 (use the range function)
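
A short sketch of how `range` behaves may help with step 7. On its own, `range` gives a range object rather than a list, so it needs to be passed to `list()` to get the items. The names `countdown` and `evens` are illustrative only.

```python
countdown = range(1, 6)       # the stop value (6) is not included

print(countdown)              # range(1, 6)
print(list(countdown))        # [1, 2, 3, 4, 5]

# an optional third argument sets the step
evens = list(range(0, 11, 2))
print(evens)                  # [0, 2, 4, 6, 8, 10]
```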

Expected output:  


In [None]:
nums = [1,2,3,4,5]
names = ["Tom", "Jerry","Spike"]
nums.append(6)
print(nums)
names.insert(1, "Sylvester")
print(names)
print(len(nums))
nums.remove(4)
print(nums)
print(max(nums), min(nums))
new_nums = list(range(40, 51))
print(new_nums)


[1, 2, 3, 4, 5, 6]
['Tom', 'Sylvester', 'Jerry', 'Spike']
6
[1, 2, 3, 5, 6]
6 1
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]


# Now some real data
---

1.  Open the STEAM csv file (which we have taken from Kaggle and have reduced to make it more manageable): https://drive.google.com/file/d/1rkG8-cp-KLBc1zK4YMLHIsMMyyTVk5Ju/view?usp=sharing


2.  Open the file with Google sheets to see what is in it.  The file contains rows of data, each with a user id and a game that the user has purchased.

3.  NOW, run the code in the cell below to get:  
- users (the list of user ids in the data)
- titles (the list of titles that have been purchased)

In [9]:
import pandas as pd

# open the data file and get a copy of the Titles column
def get_users_and_titles():
  url = "https://drive.google.com/uc?export=download&id=1rkG8-cp-KLBc1zK4YMLHIsMMyyTVk5Ju"
  data_table = pd.read_csv(url)
  return data_table["User"].tolist(), data_table["Title"].tolist() 

users, titles = get_users_and_titles()

---
### Exercise 1 - list head, tail and length of the titles list
---

Write a function, **describe_list()** which will:
*  print the length of the list `titles`
*  print the first 10 items in `titles` (the head)  
*  print the last 5 items in `titles` (the tail)

Expected output:  
```
129511
['The Elder Scrolls V Skyrim', 'Fallout 4', 'Spore', 'Fallout New Vegas', 'Left 4 Dead 2', 'HuniePop', 'Path of Exile', 'Poly Bridge', 'Left 4 Dead', 'Team Fortress 2']
['Fallen Earth', 'Magic Duels', 'Titan Souls', 'Grand Theft Auto Vice City', 'RUSH']
```

In [10]:
def describe_list():
  print(len(titles))
  print(titles[:10])    # head: the first 10 items
  print(titles[-5:])    # tail: the last 5 items

describe_list()

129511
['The Elder Scrolls V Skyrim', 'Fallout 4', 'Spore', 'Fallout New Vegas', 'Left 4 Dead 2', 'HuniePop', 'Path of Exile', 'Poly Bridge', 'Left 4 Dead', 'Team Fortress 2']
['Fallen Earth', 'Magic Duels', 'Titan Souls', 'Grand Theft Auto Vice City', 'RUSH']


---
### Exercise 2 - use a loop to print the first 20 items

Write a function which will:
*  create a new list from the first 20 items of the titles list
*  loop through the new list and print each item


In [11]:
def print_list():
  # take the first 20 titles with a slice, then print them one per line
  new_list = titles[:20]
  for title in new_list:
    print(title)

print_list()

The Elder Scrolls V Skyrim
Fallout 4
Spore
Fallout New Vegas
Left 4 Dead 2
HuniePop
Path of Exile
Poly Bridge
Left 4 Dead
Team Fortress 2
Tomb Raider
The Banner Saga
Dead Island Epidemic
BioShock Infinite
Dragon Age Origins - Ultimate Edition
Fallout 3 - Game of the Year Edition
SEGA Genesis & Mega Drive Classics
Grand Theft Auto IV
Realm of the Mad God
Marvel Heroes 2015


---
### Exercise 3 - count the number of times a title appears in the list

Write a function which will:
*  count the number of times that the title Fallout 4 appears in the list

Expected output:  
168

In [12]:
def count_title():
 print(titles.count("Fallout 4"))

count_title()

168


---
### Exercise 4 - remove all duplicates of a title from the list

Write a function which will: remove the duplicate occurrences of Fallout 4 from the titles list, so that the title appears just once (Hint:  you can remove an occurrence of Fallout 4 repeatedly until there is only one left)
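
The hint can be sketched on a small made-up list first. The list `pets` and its values are assumptions for illustration; the same pattern should apply to `titles`. The value passed to `count()` and `remove()` must match the item exactly, including spaces.

```python
pets = ["cat", "dog", "cat", "fish", "cat"]

# keep removing "cat" while more than one copy is left;
# remove() takes out the first matching item each time
while pets.count("cat") > 1:
    pets.remove("cat")

print(pets)  # ['dog', 'fish', 'cat']
```

Because `remove()` always deletes the first match, the copy that survives is the last one in the list.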


In [13]:
def remove_duplicates():
  # keep removing "Fallout 4" until only one copy is left
  while titles.count("Fallout 4") > 1:
    titles.remove("Fallout 4")
  print(len(titles))

remove_duplicates()

129344


---
### Exercise 5 - print the counts of the first 10 titles in the list

Write a function which will:
* loop through the first 10 items in the titles list
* for each item print the number of times that title appears in the list


In [14]:
def print_count_of_first_ten():
  for title in titles[0:10]:
    print(title, " : ", titles.count(title))

users, titles = get_users_and_titles()
print_count_of_first_ten()

The Elder Scrolls V Skyrim  :  717
Fallout 4  :  168
Spore  :  67
Fallout New Vegas  :  337
Left 4 Dead 2  :  951
HuniePop  :  22
Path of Exile  :  339
Poly Bridge  :  12
Left 4 Dead  :  281
Team Fortress 2  :  2323


---
### Project - work as a team

The users list has the ids of all the users who have purchased STEAM games.

Write a function that will, for the first 100 unique users:
* count how many games have been purchased by each user
* calculate the percentage of all purchases made by each user
* calculate the percentage of all purchases made by these 100 users altogether
* find the id of the user who has purchased the most games of these 100 users
* calculate the average number of games purchased by a user from the 100
* print this information, printing each unique user just once

Then do the same with the last 100 users.

Divide up the tasks and each write one part, then try to get them all to work together.
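
One possible way to approach the counting, sketched on a small made-up list, is `collections.Counter` from the standard library, which counts every id in a single pass. This is an alternative to calling `users.count()` repeatedly and is not the approach used in the practice cells below; `sample_users` and its values are assumptions for illustration.

```python
from collections import Counter

# made-up user ids standing in for the real users list
sample_users = [101, 102, 101, 103, 101, 102]

purchases = Counter(sample_users)  # maps each id to its number of purchases
print(purchases[101])              # 3

# the id with the most purchases, and its count
top_user, top_count = purchases.most_common(1)[0]
print(top_user, top_count)         # 101 3

# percentage of all purchases made by each unique user
for user_id, n in purchases.items():
    percentage = n / len(sample_users) * 100
    print(user_id, n, round(percentage, 2), "%")

# average number of purchases per unique user
average = len(sample_users) / len(purchases)
print(average)                     # 2.0
```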

### Practice 1
---
Get a list of unique user ids

Write some code that will loop through the users list and add each new user id to a new list called **unique_users**

**Expected output**:
12393
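
The idea can be tried on a small made-up list before running it on the full data. In this sketch `sample_ids` is an assumption for illustration; the last line shows `dict.fromkeys`, a built-in alternative that also keeps the first-seen order.

```python
sample_ids = [7, 3, 7, 9, 3, 7]

# add each id to the new list only the first time it is seen
unique_ids = []
for user_id in sample_ids:
    if user_id not in unique_ids:
        unique_ids.append(user_id)

print(unique_ids)                       # [7, 3, 9]

# alternative: dictionary keys are unique and keep insertion order
print(list(dict.fromkeys(sample_ids)))  # [7, 3, 9]
```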

In [15]:
# get list of unique user ids
def get_unique_users():
  unique_users=[]
  for i in users:
      if i not in unique_users:
        unique_users.append(i)
  print(len(unique_users))
  print(unique_users[0:2])
  return unique_users

unique_users=get_unique_users()





12393
[151603712, 187131847]


### Practice 2
---

Write code that will create a subset of the unique_users list, called **hundred_users**, containing just the first 100 users.  Loop through the hundred_users list and, for each user, print the number of games that user has purchased (`users.count(unique_user)`)

**Expected output**:
```
40
1
43
505
1
...
...
2
18
27
1
33
```

In [16]:
# print number of games purchased by each of first hundred users
def sub_set():
  hundred_users=[]
  for i in range(0,100):
    hundred_users.append(unique_users[i])
  print(len(hundred_users))
  print(hundred_users[0:2])
  for user in hundred_users:
    print(users.count(user))

  return hundred_users

hundred_users=sub_set()


100
[151603712, 187131847]
40
1
43
505
1
1
10
1
1
1
1
1
3
1
3
1
29
1
2
2
1
1
1
4
2
1
1
1
1
2
1
1
67
5
1
1
1
11
1
2
2
6
1
1
15
13
2
8
6
127
1
1
4
458
1
2
2
31
1
4
16
1
1
1
6
11
1
1
22
1
1
1
3
5
1
1
2
1
4
3
1
1
1
1
1
2
5
4
1
1
6
2
9
3
4
2
18
27
1
33


### Practice 3
---
Write code to calculate the percentage of all purchases that were made by the first user in the unique_users list.  Print the user's id and the percentage.

*Hint*:  get the count for that user (as in the last practice), divide it by the number of purchases made (the length of the original users list) and multiply by 100

**Expected output**:  
`151603712 0.03 %`
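
The arithmetic behind that figure can be checked by hand. This sketch hard-codes the two numbers shown elsewhere in this worksheet (40 purchases for the first user, and 129511 rows in the data) and assumes `users` has the same length as `titles`, since both come from the same table.

```python
user_count = 40           # purchases made by user 151603712
total_purchases = 129511  # number of rows in the data

percentage = user_count / total_purchases * 100
print(round(percentage, 2), "%")  # 0.03 %
```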

In [37]:
# Find percentage of purchases bought by first user
first_user = unique_users[0]
count = users.count(first_user)
print("user_id : ", first_user)
print("count : ", count)
percentage = (count / len(users)) * 100
print("percentage : ", round(percentage, 2), "%")

user_id :  151603712
count :  40
percentage :  0.03 %


### Practice 4
---
Write some code that will loop through the `hundred_users` and find the id of the user with the largest number of purchases

**Expected output**:  
```
53875128
```
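
Besides writing the loop by hand, the built-in `max` function can do the comparison when given a `key` function. The sketch below uses made-up ids (`sample_users` and `candidates` are assumptions for illustration) and is one possible approach rather than the expected answer.

```python
# made-up purchase records: one entry per purchase
sample_users = [11, 22, 11, 33, 11, 22]
candidates = [11, 22, 33]

# max() calls sample_users.count(id) for each candidate
# and returns the candidate with the largest count
biggest = max(candidates, key=sample_users.count)

print(biggest)                      # 11
print(sample_users.count(biggest))  # 3
```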


In [None]:
# find user who has made most purchase
def large_purchases():
  most_common=hundred_users[0]
  frequency=users.count(most_common)
  for user in hundred_users:
    if users.count(user) > frequency:
      most_common=user
      frequency=users.count(user)
  print("Largest purchase id : ",most_common)
  print("Number of items : ",frequency)
large_purchases()  

Largest purchase id :  53875128
Number of items :  505


### Practice 5
---
Write some code that will loop through the `hundred_users` and add up all the purchases made by them.  Calculate this total as a percentage of the total number of purchases made (as before), then divide by 100 (the number of users in this list) to get the average percentage per user.

**Expected output**:  
```
0.01 %

```
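
As a worked check of the arithmetic, the sketch below hard-codes a combined total of 1646 purchases for the hundred users. That total is an assumption, inferred from the average of 16.46 shown in Practice 6 (16.46 x 100 users), and 129511 is the number of rows in the data.

```python
hundred_total = 1646      # assumed combined purchases of the hundred users
total_purchases = 129511  # number of rows in the data

# share of all purchases made by the hundred users together
combined = hundred_total / total_purchases * 100
print(round(combined, 2), "%")   # 1.27 %

# divided by the 100 users to give the average share per user
per_user = combined / 100
print(round(per_user, 2), "%")   # 0.01 %
```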

In [26]:
# find percentage of total purchases made by first hundred users
a=0
for user in hundred_users:
    count=users.count(user)
    a=a+count
avg=((a/len(users))*100)/100
print("average : ",round(avg,2),"%")

average :  0.01 %


### Practice 6
---

Write some code that will loop through the `hundred_users`, add all the purchases made by them, then divide by 100 (the number of users in this list) to get the average.

**Expected output**:  
```
16.46

```

In [27]:
# find average number of purchases made by first 100 users
def avg_hundred_users():
  # add up the purchases of each of the hundred users
  total = 0
  for user in hundred_users:
    total = total + users.count(user)
  return total / len(hundred_users)

average = avg_hundred_users()
print("average : ", average)


average :  16.46


### Practice 7
---
Put all the above together into a function, and add code to print the average number of games per user, the user id of the user with the maximum number of purchases, and a list of the hundred users' ids and the percentage each has purchased

In [54]:
def process_user_purchases():
  # user with the maximum number of purchases
  # (checking each unique id once gives the same result as checking every row)
  count = 0
  top_user = None
  for user in unique_users:
    temp = users.count(user)
    if temp > count:
      count = temp
      top_user = user
  print("user_id : ", top_user)
  print("Number of items : ", count)

  # list of the hundred user ids and the percentage each has purchased
  for user in hundred_users:
    total = users.count(user)
    print("user_id : ", user)
    print("total : ", total)
    percentage = (total / len(users)) * 100
    print("percentage : ", round(percentage, 6), "%")

  # share of all purchases made by each unique user
  # (a fraction of the total, not a percentage)
  for user in unique_users:
    total = users.count(user)
    average = total / len(users)
    print("user_id: ", user, "  average : ", round(average, 5))


process_user_purchases()

Streaming output truncated to the last 5000 lines.
user_id:  76027799   average :  8e-05
user_id:  113820072   average :  1e-05
user_id:  194678830   average :  1e-05
user_id:  216924034   average :  1e-05
user_id:  141012357   average :  1e-05
user_id:  192341303   average :  5e-05
user_id:  20704366   average :  0.00023
user_id:  60227446   average :  4e-05
user_id:  299025375   average :  2e-05
user_id:  25822628   average :  5e-05
user_id:  105517470   average :  2e-05
user_id:  181262663   average :  1e-05
user_id:  101695880   average :  0.0007
user_id:  99110207   average :  8e-05
user_id:  48798067   average :  0.00308
user_id:  122471552   average :  2e-05
user_id:  304788598   average :  1e-05
user_id:  126063949   average :  1e-05
user_id:  276073724   average :  2e-05
user_id:  74270476   average :  1e-05
user_id:  91567150   average :  1e-05
user_id:  79626460   average :  1e-05
user_id:  298367684   average :  1e-05
user_id:  205253661   average :  1e-05
use