## Problem
- Suppose we only care about the 10 latest reviews of a movie: each time a new review arrives, we discard the oldest one before adding the new one, in FIFO (First-In, First-Out) order
- We can do this with a list, treating position 0 as the oldest review and position 9 as the latest
- The problem with this solution is that it relies on repeated calls to last10_reviews.pop(0), which is costly. Can we do better?

In [1]:
last10_reviews = [2.5, 1.0, 1.4, 5.0, 3.0, 3.9, 1.8, 4.2, 3.4, 4.7]
oldest, *_, newest = last10_reviews
print(f'last10_reviews = {last10_reviews}')
print(f'oldest_review = {oldest}')
print(f'newest_review = {newest}')

last10_reviews = [2.5, 1.0, 1.4, 5.0, 3.0, 3.9, 1.8, 4.2, 3.4, 4.7]
oldest_review = 2.5
newest_review = 4.7


## Answer
- Yes, we can do better by using a deque

In [75]:
from collections import deque #<0> deque stands for double-ended queue
new_review = 2.5

In [76]:
# BAD way to store the last 10 reviews: using a list

# first remove the oldest review
last10_reviews.pop(0) #<1>   time-complexity = O(n)
last10_reviews, len(last10_reviews)

([1.0, 1.4, 5.0, 3.0, 3.9, 1.8, 4.2, 3.4, 4.7], 9)

In [77]:
# now add the new one at the end
last10_reviews.append(new_review) # time-complexity = O(1)
last10_reviews, len(last10_reviews)

([1.0, 1.4, 5.0, 3.0, 3.9, 1.8, 4.2, 3.4, 4.7, 2.5], 10)

In [78]:
# GOOD way to store the last 10 reviews: using a deque

last10_reviews = deque([2.5, 1.0, 1.4, 5.0, 3.0, 3.9, 1.8, 4.2, 3.4, 4.7])

# first remove the oldest review
last10_reviews.popleft() #<2>   time-complexity = O(1)
last10_reviews, len(last10_reviews)

(deque([1.0, 1.4, 5.0, 3.0, 3.9, 1.8, 4.2, 3.4, 4.7]), 9)

In [79]:
# now add the new one at the end
last10_reviews.append(new_review) # time-complexity = O(1)
last10_reviews, len(last10_reviews)

(deque([1.0, 1.4, 5.0, 3.0, 3.9, 1.8, 4.2, 3.4, 4.7, 2.5]), 10)

In [80]:
# EVEN BETTER: using a deque with maxlen=10

last10_reviews = deque([2.5, 1.0, 1.4, 5.0, 3.0, 3.9, 1.8, 4.2, 3.4, 4.7], maxlen=10) #<3>
# with maxlen=10, appending to the right automatically pops on the left for us
last10_reviews.append(new_review)
last10_reviews, len(last10_reviews)

(deque([1.0, 1.4, 5.0, 3.0, 3.9, 1.8, 4.2, 3.4, 4.7, 2.5], maxlen=10), 10)
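
The bounded behaviour works in both directions. Here is a minimal sketch with a smaller window (the names `ratings` and `window` and the values are illustrative, not part of the cells above): building a deque from a longer iterable keeps only the last `maxlen` items, and `appendleft` discards from the right instead of the left.

```python
from collections import deque

ratings = [2.5, 1.0, 1.4, 5.0, 3.0]   # illustrative values

# Building from a longer iterable keeps only the last maxlen items
window = deque(ratings, maxlen=3)
print(window)            # deque([1.4, 5.0, 3.0], maxlen=3)

# append() on the right drops the leftmost (oldest) item
window.append(4.2)
print(window)            # deque([5.0, 3.0, 4.2], maxlen=3)

# appendleft() drops from the right instead
window.appendleft(0.5)
print(window)            # deque([0.5, 5.0, 3.0], maxlen=3)
```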

In [81]:
long_list = list(range(10**7))
long_deque = deque(range(10**7))

In [84]:
%%time
long_list.pop(0)
#long_list.insert(0, 4)

CPU times: user 11.6 ms, sys: 358 µs, total: 11.9 ms
Wall time: 12 ms


4

In [85]:
%%time
long_deque.popleft()
#long_deque.appendleft(4)

CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 7.15 µs


4

## Discussion
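- <0> internally, a deque uses a doubly-linked list (of fixed-size blocks in CPython), which gives fast appends and pops at both the left and right ends.
- <1> list.pop(0) is O(n): it is costly because all the remaining items have to be shifted one position to the left
- <2> deque.popleft(), on the other hand, is a constant-time O(1) operation: removing an item from the end of a linked list requires no shifting.
- <3> with maxlen=10 the deque is bounded: once it is full, appending on one end automatically discards an item from the other end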
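
To see how the two costs scale, one possible micro-benchmark is sketched below. The helper `time_front_pops` and the chosen sizes are assumptions made for illustration; absolute timings depend on the machine, so no expected output is shown. If the O(n) versus O(1) analysis holds, the list timings should grow with the container size while the deque timings stay roughly flat.

```python
from collections import deque
from timeit import timeit

def time_front_pops(container_factory, pop_front, size, pops=1_000):
    """Return the seconds spent removing `pops` items from the front
    of a freshly built container holding `size` items."""
    container = container_factory(range(size))
    return timeit(lambda: pop_front(container), number=pops)

for size in (10**3, 10**4, 10**5):
    t_list = time_front_pops(list, lambda c: c.pop(0), size)
    t_deque = time_front_pops(deque, lambda c: c.popleft(), size)
    print(f'n={size:>7,}  list.pop(0): {t_list:.5f}s  deque.popleft(): {t_deque:.5f}s')
```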

## Problem
- Let's say we have a huge list and we know in advance that all the items are of the same type (say: float)
- Which type of sequence can we use to keep the main advantages of list (mainly mutability and its API: append, pop, etc.) while improving our program's performance, both in time and in memory?

## Answer
- We can use array.array, which is mutable like list but provides "almost" C-like performance because, internally, every item is stored as the corresponding C type.
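
The memory side of this claim can be checked with a rough sketch like the one below. It assumes CPython, where `sys.getsizeof` reports an array's buffer but only the pointer table of a list, so the float objects are counted separately. The size `n` and the variable names are illustrative.

```python
import sys
from array import array
from random import random

n = 10**5  # illustrative size, kept small so the sketch runs quickly
as_list = [random() for _ in range(n)]
as_array = array('d', as_list)

# A list stores pointers to separate float objects: count both parts
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
# An array stores the raw C doubles contiguously in a single buffer
array_bytes = sys.getsizeof(as_array)

print(f'itemsize: {as_array.itemsize} bytes per element')
print(f'list    : {list_bytes:,} bytes')
print(f'array   : {array_bytes:,} bytes')

# The typecode is enforced: only numbers can be stored in a 'd' array
try:
    as_array.append('not a number')
except TypeError as exc:
    print(f'TypeError: {exc}')
```

The trade-off is that an array gives up the flexibility of a list: every item must match the typecode chosen at construction.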

In [86]:
from array import array
from random import random

In [87]:
%%time

long_list_of_floats1 = [random() for _ in range(10**7)] #<0>
with open('floats.txt', 'w') as f:
    f.writelines(f'{i}\n' for i in long_list_of_floats1) #<1>

with open('floats.txt', 'r') as f:
    long_list_of_floats2 = [float(line) for line in f]  #<2>
    
print(long_list_of_floats1 == long_list_of_floats2) #<3>

True
CPU times: user 18 s, sys: 1.06 s, total: 19 s
Wall time: 20.1 s


In [88]:
%%time

long_array_of_floats1 = array('d', (random() for _ in range(10**7))) #<4>
with open('floats.b', 'wb') as f:
    long_array_of_floats1.tofile(f)  #<5>

long_array_of_floats2 = array('d')
with open('floats.b', 'rb') as f:
    long_array_of_floats2.fromfile(f, 10**7) #<6>
    
print(long_array_of_floats1 == long_array_of_floats2) #<7>

True
CPU times: user 2 s, sys: 290 ms, total: 2.29 s
Wall time: 2.37 s


In [71]:
!rm floats.txt
!rm floats.b


## Discussion
- <0> constructing a list of floats of length 10^7
- <1> saving that list to a text file
- <2> reloading the float values from the floats.txt file in a different variable
- <3> comparing the two list variables

- <4> constructing an array of doubles ('d') of length 10^7
- <5> saving those values to a file in binary form: since Python knows their type beforehand, it can write their raw byte representation directly, which is efficient
- <6> similarly, loading from disk is faster when we know the type, double ('d'), and the length (10^7)
- <7> comparing takes less time as well
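
The storage difference between the two formats can also be seen in the file sizes. The sketch below is a scaled-down variant of the cells above; the temporary directory, the size `n`, and the variable names are assumptions made for illustration. The binary file holds exactly `itemsize` bytes per value, while the text file needs one character per digit plus a newline.

```python
import os
import tempfile
from array import array
from random import random

n = 1_000  # far smaller than the 10**7 used above
floats = array('d', (random() for _ in range(n)))

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, 'floats.txt')
    bin_path = os.path.join(tmp, 'floats.b')

    # Text format: one decimal representation per line
    with open(text_path, 'w') as f:
        f.writelines(f'{x}\n' for x in floats)
    # Binary format: the raw C doubles, written as-is
    with open(bin_path, 'wb') as f:
        floats.tofile(f)

    text_size = os.path.getsize(text_path)
    bin_size = os.path.getsize(bin_path)

    # Reload the binary file to confirm the round trip is lossless
    reloaded = array('d')
    with open(bin_path, 'rb') as f:
        reloaded.fromfile(f, n)

print(f'text file  : {text_size:,} bytes')
print(f'binary file: {bin_size:,} bytes ({n} items x {floats.itemsize} bytes)')
print(reloaded == floats)
```

Note that the binary file uses the machine's native byte order, so it is best suited to data read back on the same kind of platform.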