
# python collections

    1. Counters

    2. OrderedDict

    3. Defaultdict

    4. ChainMap

    5. NamedTuple

    6. DeQue

    7. UserDict

    8. UserList

    9. UserString

## Counter

In [25]:
from collections import Counter

In [26]:
fruits_count = Counter(['apple','mango','apple','pomegrenate'])

fruits_count

Counter({'apple': 2, 'mango': 1, 'pomegrenate': 1})

In [27]:
fruits_count['apple']

2

In [28]:
ipl_teams_won = ('RR','DC','MI','CSK','MI','KKR','MI','CSK')

Counter(ipl_teams_won)

Counter({'CSK': 2, 'DC': 1, 'KKR': 1, 'MI': 3, 'RR': 1})

In [29]:
total_mi_wins = Counter(ipl_teams_won)['MI']

total_mi_wins

3
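Beyond single lookups, a `Counter` can rank its keys and tolerates keys it has never seen. The sketch below reuses the team data from the cells above; the key `'SRH'` is a hypothetical team that does not appear in the data and is used only to show the missing-key behaviour.

```python
from collections import Counter

ipl_teams_won = ('RR', 'DC', 'MI', 'CSK', 'MI', 'KKR', 'MI', 'CSK')
wins = Counter(ipl_teams_won)

# The two teams with the most wins, as (team, count) pairs.
top_two = wins.most_common(2)
print(top_two)            # [('MI', 3), ('CSK', 2)]

# A missing key yields 0 instead of raising KeyError ('SRH' is hypothetical).
missing = wins['SRH']
print(missing)            # 0

# update() adds the counts of another iterable to the existing ones.
wins.update(['RR', 'RR'])
print(wins['RR'])         # 3
```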

## OrderedDict


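An `OrderedDict` remembers the order in which keys were inserted. Since Python 3.7 plain dictionaries preserve insertion order as well; what still sets `OrderedDict` apart is its order-sensitive equality check and reordering methods such as `move_to_end()`.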

In [30]:
from collections import OrderedDict

elections_won = {}

elections_won['JSP'] = 1

elections_won['YSRCP'] = 3

elections_won['NDA'] = 2

elections_won['TDP'] = 4

elections_won['UPA'] = 5



local_elections_won = {}

local_elections_won['YSRCP'] = 3

local_elections_won['NDA'] = 2

local_elections_won['TDP'] = 4

local_elections_won['UPA'] = 5

local_elections_won['JSP'] = 1

In [31]:
elections_won 

{'JSP': 1, 'NDA': 2, 'TDP': 4, 'UPA': 5, 'YSRCP': 3}

In [32]:
local_elections_won 

{'JSP': 1, 'NDA': 2, 'TDP': 4, 'UPA': 5, 'YSRCP': 3}

In [33]:
local_elections_won  == elections_won            # Normal dictionaries equality check

True

In [34]:
elections_won_ordered = OrderedDict(elections_won)

local_elections_won_ordered = OrderedDict(local_elections_won)

In [35]:
if local_elections_won_ordered == elections_won_ordered:             # Ordered dictionaries equality check
  print("Equal: order was ignored")

else:
  print("Must Follow Order")

Must Follow Order


Here is what happens:

        Two dictionaries, elections_won and local_elections_won, are created with the same key-value pairs but in a different insertion order.

        Comparing them as plain dictionaries ignores order, so the check returns True.

        After converting both to OrderedDict, the comparison also takes order into account, so they compare as not equal and the else branch prints "Must Follow Order".

`So equality between OrderedDicts is sensitive to the order in which keys were inserted.`

`Deletion and re-insertion:` deleting a key and inserting it again moves it to the end, because OrderedDict tracks insertion order and the re-inserted key counts as a new insertion.

In [36]:
print("Before deleting:\n")
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
od['d'] = 4
  
for key, value in od.items():
    print(key, value)


Before deleting:

a 1
b 2
c 3
d 4


In [37]:
print("\nAfter deleting:\n")
od.pop('c')
for key, value in od.items():
    print(key, value)


After deleting:

a 1
b 2
d 4


In [38]:
print("\nAfter re-inserting:\n")
od['c'] = 3
for key, value in od.items():
    print(key, value)


After re-inserting:

a 1
b 2
d 4
c 3


`c` is now at the end: re-inserting a deleted key counts as a new insertion.
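A key can also be repositioned without deleting it first. The following is a minimal sketch using `move_to_end()` and `popitem()`, both standard `OrderedDict` methods, on the same four keys as above.

```python
from collections import OrderedDict

od = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])

# Move 'b' to the end without deleting it first.
od.move_to_end('b')
order_after_first_move = list(od)
print(order_after_first_move)     # ['a', 'c', 'd', 'b']

# last=False moves the key to the front instead.
od.move_to_end('d', last=False)
order_after_second_move = list(od)
print(order_after_second_move)    # ['d', 'a', 'c', 'b']

# popitem() removes and returns the last pair by default.
last_item = od.popitem()
print(last_item)                  # ('b', 2)
```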

## DefaultDict


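`defaultdict` supplies a default value for a missing key when it is accessed with `d[key]`, instead of raising `KeyError`. The default is produced by the factory function passed to the constructor (`int`, `list`, a lambda, ...), and the missing key is inserted with that value.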

In [39]:
from collections import defaultdict

d = defaultdict(int)

d

defaultdict(int, {})

In [40]:
d[0],d[2],d[5000]       # int() returns 0, so each missing key defaults to 0; accessing a key also inserts it.

(0, 0, 0)

In [41]:
d['a'] = 20          

d['c'] = 50

In [42]:
d

defaultdict(int, {0: 0, 2: 0, 5000: 0, 'a': 20, 'c': 50})

In [43]:
dd = defaultdict(list)

dd

defaultdict(list, {})

In [44]:
dd[652]                                 # list() returns [], so a missing key defaults to an empty list and is inserted.

[]

In [45]:
dd['pm'] = 'modi'

In [46]:
dd

defaultdict(list, {652: [], 'pm': 'modi'})

In [47]:
dd[0].append(50)

In [48]:
dd

defaultdict(list, {0: [50], 652: [], 'pm': 'modi'})

In [49]:
movies_dict = defaultdict(lambda : 'Not Present')      # default value if key is not present

movies_dict['AA19'] = 'AVPL'

movies_dict['AA20'] = 'PUSHPA'

movies_dict

defaultdict(<function __main__.<lambda>>, {'AA19': 'AVPL', 'AA20': 'PUSHPA'})

In [50]:
movies_dict['AA21']

'Not Present'
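A common use of `defaultdict(list)` is grouping values under a key without checking whether the key exists yet. The sketch below assumes a small made-up list of (team, player) pairs; the data is hypothetical and only serves the illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (team, player) pairs.
pairs = [('MI', 'rohit'), ('CSK', 'dhoni'), ('MI', 'bumrah'), ('CSK', 'jadeja')]

players_by_team = defaultdict(list)
for team, player in pairs:
    # The first access to a new team creates an empty list automatically.
    players_by_team[team].append(player)

grouped = dict(players_by_team)
print(grouped)
# {'MI': ['rohit', 'bumrah'], 'CSK': ['dhoni', 'jadeja']}

# .get() and `in` do not call the default factory, so no key is created.
print(players_by_team.get('RR'))     # None
print('RR' in players_by_team)       # False
```

Note that only the `d[key]` form triggers the factory; `get()` and membership tests behave as they do on a plain dictionary.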

## ChainMap


    A ChainMap groups several dictionaries into a single, updatable view.

    Lookups search the underlying dictionaries in order; writes go to the first one.

In [51]:
from collections import ChainMap

In [54]:
results_2019 = {'mukesh':9.9,'suresh':9.7,'mahesh':9.5}

results_2020 = {'ramesh':9.2,'ganesh':8.7,'vignesh':8.3}

chain = ChainMap(results_2019, results_2020)

chain

ChainMap({'mukesh': 9.9, 'suresh': 9.7, 'mahesh': 9.5}, {'ramesh': 9.2, 'ganesh': 8.7, 'vignesh': 8.3})

In [58]:
for name,gpa in chain.items():
  print(name, gpa)


ramesh 9.2
ganesh 8.7
vignesh 8.3
mukesh 9.9
suresh 9.7
mahesh 9.5


`Iterating over the chain yields the keys of both dictionaries, so they behave like one combined mapping.`

In [61]:
chain['jiwitesh'] = 10

print("Chain ",chain)

print('-'*42)

print("Results of 2019 : ",results_2019)

Chain  ChainMap({'mukesh': 9.9, 'suresh': 9.7, 'mahesh': 9.5, 'jiwitesh': 10}, {'ramesh': 9.2, 'ganesh': 8.7, 'vignesh': 8.3})
------------------------------------------
Results of 2019 :  {'mukesh': 9.9, 'suresh': 9.7, 'mahesh': 9.5, 'jiwitesh': 10}


`When we add the new key jiwitesh to the chain, it is stored in the first dictionary, results_2019, as shown above.`

### Adding new dictionary

A new dictionary can be added by using the new_child() method. The newly added dictionary is added at the beginning of the ChainMap.

In [62]:
dic1 = { 'a' : 1, 'b' : 2 } 
dic2 = { 'b' : 3, 'c' : 4 } 
dic3 = { 'f' : 5 }     

In [63]:
c = ChainMap(dic1,dic2)

In [68]:
print ("All the ChainMap contents are : ") 
print (c) 

All the ChainMap contents are : 
ChainMap({'a': 1, 'b': 2}, {'b': 3, 'c': 4})


In [69]:
c1 = c.new_child(dic3)

In [70]:
print ("Displaying new ChainMap : ") 
print (c1)

Displaying new ChainMap : 
ChainMap({'f': 5}, {'a': 1, 'b': 2}, {'b': 3, 'c': 4})
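The key `'b'` appears in both `dic1` and `dic2`, which raises the question of which value a lookup returns. A minimal sketch of the lookup order follows; the child mapping `{'b': 99}` is an assumed value chosen only for illustration.

```python
from collections import ChainMap

dic1 = {'a': 1, 'b': 2}
dic2 = {'b': 3, 'c': 4}
c = ChainMap(dic1, dic2)

# 'b' exists in both mappings; the first one (dic1) wins.
print(c['b'])             # 2
# 'c' exists only in dic2, so the lookup falls through to it.
print(c['c'])             # 4

# A child map shadows the parents without modifying them.
child = c.new_child({'b': 99})
print(child['b'])         # 99
print(dic1['b'])          # 2

# .parents returns the chain without its first mapping.
print(child.parents == c) # True
```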


## NamedTuple

`namedtuple` is one of the easiest ways to clean up your code and make it more readable, because the field names document what each position in the tuple means. Named tuple instances are as memory efficient as regular tuples because they do not have per-instance dictionaries, which makes them more lightweight than dictionaries.

In [74]:
student_tuple = 'Lisa', 'Simpson', 'A'

student_tuple

('Lisa', 'Simpson', 'A')

In [77]:
student_tuple[0],student_tuple[1]

('Lisa', 'Simpson')

`Named tuples are tuples that allow their elements to be accessed by name instead of just index!` 

In [79]:
from collections import namedtuple

Student = namedtuple('Student',['first', 'last', 'grade'])

In [80]:
Student

__main__.Student

In [82]:
nmdt = Student('surya','peddinti',8.7)

In [84]:
nmdt.first

'surya'

In [85]:
nmdt.last

'peddinti'

In [86]:
nmdt.grade

8.7

### Example 2

In [88]:
Color = namedtuple('Color', ['hue', 'saturation', 'luminosity'])

p = Color(170, 0.1, 0.6)
if p.saturation >= 0.5:
  print("Whew, that is bright!")
if p.luminosity >= 0.5:
  print("Wow, that is light")

Wow, that is light


In [90]:
# Without naming each element in the tuple, it would read like this:

p = (170, 0.1, 0.6)
if p[1] >= 0.5:
    print("Whew, that is bright!")
if p[2]>= 0.5:
    print("Wow, that is light")

Wow, that is light


`Instead of p[1], we can call it p.saturation. It's easier to understand. And it looks cleaner.`

When might you use namedtuple?

    1. As just stated, a namedtuple makes tuples much easier to understand.
    2. If you need to reference the items of a tuple by position, defining it as a namedtuple makes the code self-explanatory.
    3. It is more lightweight than a dictionary and, unlike a dictionary, immutable. (Dictionaries also keep insertion order since Python 3.7, so ordering is no longer a difference.)

In [91]:
#  _make(): class method that builds a namedtuple instance from the iterable passed as argument.

# _asdict(): returns a dictionary mapping field names to values (an OrderedDict up to Python 3.7, a regular dict from 3.8).

In [94]:
# Declaring namedtuple() 
Student = namedtuple('Student',['name','age','DOB']) 
    
# Adding values 
S = Student('Nandini','19','2541997') 
    
# initializing iterable  
li = ['Manjeet', '19', '411997' ] 
  
    
# using _make() to return namedtuple() 
print ("The namedtuple instance using iterable is  : ") 
print (Student._make(li)) 

print('-'*50)
    
# using _asdict() to return an OrderedDict() 
print ("The OrderedDict instance using namedtuple is  : ") 
print (S._asdict())

The namedtuple instance using iterable is  : 
Student(name='Manjeet', age='19', DOB='411997')
--------------------------------------------------
The OrderedDict instance using namedtuple is  : 
OrderedDict([('name', 'Nandini'), ('age', '19'), ('DOB', '2541997')])


In [98]:
S.age

'19'
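Two further helpers are worth knowing alongside `_make()` and `_asdict()`: `_fields` and `_replace()`. The sketch below reuses the `Student` definition from the cell above; the new age `'20'` is an assumed value for illustration.

```python
from collections import namedtuple

Student = namedtuple('Student', ['name', 'age', 'DOB'])
s = Student('Nandini', '19', '2541997')

# _fields lists the field names in definition order.
print(Student._fields)    # ('name', 'age', 'DOB')

# Named tuples are immutable; _replace returns a modified copy.
older = s._replace(age='20')
print(older.age, s.age)   # 20 19

# Ordinary tuple behaviour still works: indexing and unpacking.
name, age, dob = s
print(s[0] == s.name)     # True
```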

## Deque (Double-Ended Queue) 

`deque` is a list-like container optimized for fast appends and pops at both ends. Appending or popping at either end takes O(1) time. A list is also O(1) (amortized) at its right end, but inserting or removing at the left end (`insert(0, x)`, `pop(0)`) takes O(n) because every remaining element has to be shifted.

### Inserting Elements

`append and appendleft`

In [139]:
from collections import deque

In [140]:
numbers_dq = deque(
              [10,20,30,40]
              )

In [141]:
numbers_dq

deque([10, 20, 30, 40])

In [142]:
# appends right 

numbers_dq.append(50)

print("Appending Right end : ",numbers_dq)

Appending Right end :  deque([10, 20, 30, 40, 50])


In [143]:
# appends left

numbers_dq.appendleft(60)

print("Appending left : ",numbers_dq)

Appending left :  deque([60, 10, 20, 30, 40, 50])


In [144]:
print(list(numbers_dq))

[60, 10, 20, 30, 40, 50]


### Removing Elements


`pop and popleft`

In [145]:
print("Numbers in deque : ",numbers_dq)

print('-'*50)

print("Popped element is : ",numbers_dq.pop())     # pop last element

print('-'*50)

print("After pop operation numbers in deque : ",numbers_dq)

Numbers in deque :  deque([60, 10, 20, 30, 40, 50])
--------------------------------------------------
Popped element is :  50
--------------------------------------------------
After pop operation numbers in deque :  deque([60, 10, 20, 30, 40])


In [146]:
print("Numbers in deque : ",numbers_dq)

print('*'*50)

print("Popped element is : ",numbers_dq.popleft())     # pop left or first element

print('*'*50)

print("After pop operation numbers in deque : ",numbers_dq)

Numbers in deque :  deque([60, 10, 20, 30, 40])
**************************************************
Popped element is :  60
**************************************************
After pop operation numbers in deque :  deque([10, 20, 30, 40])


### List vs Deque time comparison

In [120]:
# setup builds a deque of 10000000 items; the timed statement is a single pop() from the right end.

!python3 -mtimeit -s 'import collections' -s 'items = range(10000000); base = [*items]' -s 'c = collections.deque(base)' 'c.pop()'

5000000 loops, best of 5: 55.5 nsec per loop


In [121]:
# setup builds a list of 10000000 items; the timed statement is a single pop() from the right end.

!python3 -mtimeit -s 'import collections' -s 'items = range(10000000); base = [*items]' 'base.pop()'

5000000 loops, best of 5: 72.2 nsec per loop


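    with deque, a pop takes about 55 nsec,

    with list, a pop takes about 72 nsec.

    Both statements pop from the right end, which is O(1) for either container, so the gap is a small constant factor, not a difference in time complexity.

    The asymptotic advantage of deque appears at the left end: popleft() and appendleft() are O(1), while list.pop(0) and list.insert(0, x) are O(n).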
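The benchmark above pops from the right end, where both containers are fast. To see the difference in complexity, the left end has to be measured. The following is a rough sketch, not a rigorous benchmark: the size `N` and the repeat count are assumptions chosen to keep the run short, and the absolute timings depend on the machine.

```python
from collections import deque
from timeit import timeit

N = 20_000  # assumed size; chosen only to keep the run short

def drain_list_left():
    items = list(range(N))
    while items:
        items.pop(0)        # O(n): shifts every remaining element

def drain_deque_left():
    items = deque(range(N))
    while items:
        items.popleft()     # O(1)

list_time = timeit(drain_list_left, number=5)
deque_time = timeit(drain_deque_left, number=5)
print(f"list.pop(0):     {list_time:.4f} s")
print(f"deque.popleft(): {deque_time:.4f} s")
```

Increasing `N` should widen the gap, since the list version does work proportional to the remaining length on every pop.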

## UserString


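`UserString` is a wrapper around string objects, useful when you want to create your own string class with modified or additional functionality. The underlying string is available through the `data` attribute.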

In [158]:
from collections import UserString

class MyString(UserString):

  def append(self, s):
    self.data += s

  def replace(self,old,new):
    self.data = self.data.replace(old,new)
    print(f'replacing.... {old} with {new}')
    self.show()
  
  def show(self):
    print('Data : ',self.data)
    print('-'*50)


obj = MyString('surya')

print("appending....")

obj.append('gokul')

obj.show()

obj.replace('y','i')

appending....
Data :  suryagokul
--------------------------------------------------
replacing.... y with i
Data :  suriagokul
--------------------------------------------------


## UserList

In [163]:
from collections import UserList

In [181]:
class MyList(UserList):
  def delete(self):                 # custom function
    self.data = []

  def show(self):
    return self.data


l = [1,2,3,4,5]

obj_l = MyList(l)

lst = obj_l.show()
print("Before Deletion elements are : ",lst)

obj_l.delete()

data_del = obj_l.show()

print(f'After Deletion elements are {data_del}')


Before Deletion elements are :  [1, 2, 3, 4, 5]
After Deletion elements are []


In [187]:
class MyList(UserList):

  def append(self,elem):                                           # modified append method: only accepts numbers greater than 10
    if int(elem) <= 10:
      raise RuntimeError('Cannot append to the list: expected a number greater than 10')
    else:
      self.data.append(int(elem))                                  # append instead of overwriting the last element

  def show(self):
    print(self.data)

l = [1,2,3,4,5]

obj_l = MyList(l)

obj_l.show()

obj_l.append(input(">> enter a no "))

obj_l.show()

[1, 2, 3, 4, 5]
>> enter a no 10


RuntimeError: ignored

# python itertools


      1. product

      2. permutations

      3. combinations

      4. accumulate

      5. groupby

      6. infinite iterators.

## product 


This tool computes the cartesian product of the input iterables. To compute the product of an iterable with itself, use the optional `repeat` keyword argument to specify the number of repetitions. The output tuples are emitted in lexicographic order with respect to the input order, so they come out sorted only if the inputs are sorted.

In [188]:
from itertools import product

In [193]:
print("The cartesian product using repeat:") 

list(product([0,1], repeat=2))            # same as product([0,1], [0,1]): every tuple of length 2 over the values 0 and 1

The cartesian product using repeat:


[(0, 0), (0, 1), (1, 0), (1, 1)]

In [194]:
print("The cartesian product using repeat:") 

list(product([0,1], repeat=3))         # same as product([0,1], [0,1], [0,1]): every tuple of length 3 over the values 0 and 1

The cartesian product using repeat:


[(0, 0, 0),
 (0, 0, 1),
 (0, 1, 0),
 (0, 1, 1),
 (1, 0, 0),
 (1, 0, 1),
 (1, 1, 0),
 (1, 1, 1)]

In [195]:
l1 = [0,1]

l2 = [2,3]

print(f'product of l1 and l2 is : {list(product(l1,l2))}')

product of l1 and l2 is : [(0, 2), (0, 3), (1, 2), (1, 3)]


In [199]:
print(f'product of l1 and l2 using repeat is : \n {list(product(l1,l2, repeat=2))}')

product of l1 and l2 using repeat is : 
 [(0, 2, 0, 2), (0, 2, 0, 3), (0, 2, 1, 2), (0, 2, 1, 3), (0, 3, 0, 2), (0, 3, 0, 3), (0, 3, 1, 2), (0, 3, 1, 3), (1, 2, 0, 2), (1, 2, 0, 3), (1, 2, 1, 2), (1, 2, 1, 3), (1, 3, 0, 2), (1, 3, 0, 3), (1, 3, 1, 2), (1, 3, 1, 3)]
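The 16 tuples above follow from how `repeat` works with several inputs: the whole argument list is repeated. A short sketch that checks this equivalence, and the correspondence between `product` and nested loops, is given here.

```python
from itertools import product

l1 = [0, 1]
l2 = [2, 3]

# product(l1, l2) yields the same tuples as two nested loops.
nested = [(x, y) for x in l1 for y in l2]
same_as_nested = list(product(l1, l2)) == nested
print(same_as_nested)                     # True

# repeat=2 repeats the whole argument list: (l1, l2, l1, l2).
with_repeat = list(product(l1, l2, repeat=2))
spelled_out = list(product(l1, l2, l1, l2))
print(with_repeat == spelled_out)         # True
print(len(with_repeat))                   # 16
```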


product vs zip

In [201]:
prod = list(product(l1,l2))

In [202]:
z = list(zip(l1,l2))

In [207]:
print(f"l1 = {l1} \nl2 = {l2}")

print()

print('product is : ', prod)

print('+'*60)

print('zip is : ', z)

l1 = [0, 1] 
l2 = [2, 3]

product is :  [(0, 2), (0, 3), (1, 2), (1, 3)]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
zip is :  [(0, 2), (1, 3)]


`This makes the difference clear.`

`product pairs every element of l1 with every element of l2, whereas`

`zip pairs only the elements at the same position (index 0 with index 0, index 1 with index 1), as shown above.`

In [208]:
list(product('AB'))

[('A',), ('B',)]

In [226]:
list(product(['12','gf']))

[('12',), ('gf',)]

## permutations


`permutations` generates all possible orderings of an iterable. Elements are treated as unique based on their position, not their value, so duplicate values in the input produce duplicate tuples in the output.

In [239]:
from itertools import permutations as pm

In [242]:
l = [1,2,3]

all_orderings = list(pm(l))       # gives all the orderings with length 3 

print(all_orderings)

[(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]


In [244]:
orderings_with_length_two = list(pm(l, 2))  

print(orderings_with_length_two)

[(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]


In [246]:
m1 = [1,2]

m2 = [5,6]

list(pm([m1,m2])) 

[([1, 2], [5, 6]), ([5, 6], [1, 2])]

`permutations treats m1 as a single element and m2 as another, so it gives the orderings (m1, m2) and (m2, m1).`


`product(m1, m2), by contrast, draws one element from each list, so it gives (m1[0],m2[0]), (m1[0],m2[1]), (m1[1],m2[0]), (m1[1],m2[1]).`

In [238]:
list(pm(range(3),2))          # permutations was imported under the alias pm

[(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]

## combinations


`combinations` returns all possible combinations (without replacement) of the given iterable for the specified group size. The tuples are emitted in lexicographic order with respect to the input order.


`permutations vs combinations`

    Order matters                             Order does not matter

    ex 
    
    l = [1,2]                                 l = [1,2]

    permutations are (1,2),(2,1)              combinations are (1,2)
                                 

In [247]:
from itertools import combinations 

In [253]:
l = [1,2,3]

all_combs = list(combinations(l, 3))       # gives all selections of length 3; order is ignored, so there is only one

print(all_combs)

[(1, 2, 3)]


In [254]:
all_combs_len_two = list(combinations(l, 2))

print(all_combs_len_two)

[(1, 2), (1, 3), (2, 3)]


`There are no tuples like (2,1), (3,1), (3,2) because order does not matter: they are the same selections as (1,2), (1,3), (2,3).`
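The counts can be checked against the usual formulas. This is a small sketch assuming Python 3.8 or later, where `math.comb` and `math.perm` are available.

```python
from itertools import combinations, permutations
from math import comb, perm

l = [1, 2, 3]

pairs_ordered = list(permutations(l, 2))
pairs_unordered = list(combinations(l, 2))

# n! / (n - r)! ordered selections, n! / (r! (n - r)!) unordered ones.
print(len(pairs_ordered), perm(3, 2))      # 6 6
print(len(pairs_unordered), comb(3, 2))    # 3 3

# Every combination also appears among the permutations.
is_subset = set(pairs_unordered) <= set(pairs_ordered)
print(is_subset)                           # True
```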

### combinations with replacement


`With combinations_with_replacement, an element may be chosen more than once, so tuples like (1, 1) appear.`

In [257]:
from itertools import combinations_with_replacement as cwr

In [260]:
l

[1, 2, 3]

In [259]:
list(cwr(l,2))

[(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

## accumulate


`accumulate` takes an iterable and an optional two-argument function. If no function is given, it uses addition and returns the running sums.

In [264]:
from itertools import accumulate

print(l)

acc_sum = list(accumulate(l))

print(f"After accumulation iterable is {acc_sum}")

[1, 2, 3]
After accumulation iterable is [1, 3, 6]


    l = [1,2,3]

    1
    1+2 = 3
    3+3 = 6

    [1,3,6]

In [280]:
import operator as op

acc_mul = list(accumulate(l, func=op.mul))

acc_mul

[1, 2, 6]

In [289]:
#help(op)   to get all operator functions

In [283]:
final_df = ['df1 ', 'df2 ', 'df30 ', 'df42 ']

acc_cat = list(accumulate(final_df, func=op.concat))

acc_cat

['df1 ', 'df1 df2 ', 'df1 df2 df30 ', 'df1 df2 df30 df42 ']

In [288]:
l1 = [1,80,2,9,95,25,3,150,42]

acc_max = list(accumulate(l1, func=max))

acc_max

[1, 80, 80, 80, 95, 95, 95, 150, 150]
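`accumulate` also accepts a starting value through the keyword-only `initial` parameter (available from Python 3.8). The sketch below uses an assumed starting value of 100 purely for illustration.

```python
from itertools import accumulate
import operator as op

l = [1, 2, 3]

# With no function, accumulate behaves like a running operator.add.
default_is_add = list(accumulate(l)) == list(accumulate(l, op.add))
print(default_is_add)     # True

# initial= prepends a starting value, so the result has one extra item.
running = list(accumulate(l, initial=100))
print(running)            # [100, 101, 103, 106]

# The last accumulated value equals the plain reduction.
print(running[-1] == 100 + sum(l))   # True
```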

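## groupby


`takes two arguments (iterable, key function) and yields (key, group) pairs, where key is the value returned by the key function (a boolean if the function is a condition) and group is an iterator over the consecutive elements that share that key`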

In [290]:
from itertools import groupby

In [295]:
age = [25,28,32,25,42,25,28,32,54]

for k,v in groupby(age, lambda x: x>32):
  print(k,list(v))

False [25, 28, 32, 25]
True [42]
False [25, 28, 32]
True [54]


`group persons by age (the list is not sorted by age, so the same age can show up in more than one group)`

In [300]:
persons = [{'name':'sai', 'age':15},

           {'name':'kishan', 'age':15},
           
           {'name':'surya', 'age':35},

           {'name':'hitman', 'age':35},

           {'name':'hitman', 'age':29},

           {'name':'hitman', 'age':15},

           
           ]

In [301]:
for k,v in groupby(persons, lambda x: x['age']):
  print(k,list(v))

15 [{'name': 'sai', 'age': 15}, {'name': 'kishan', 'age': 15}]
35 [{'name': 'surya', 'age': 35}, {'name': 'hitman', 'age': 35}]
29 [{'name': 'hitman', 'age': 29}]
15 [{'name': 'hitman', 'age': 15}]
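Age 15 appears twice in the output because `groupby` only merges consecutive elements with the same key. A minimal sketch of the usual remedy, sorting by the same key first, follows; the helper name `by_age` is introduced here for illustration.

```python
from itertools import groupby
from operator import itemgetter

persons = [
    {'name': 'sai', 'age': 15},
    {'name': 'kishan', 'age': 15},
    {'name': 'surya', 'age': 35},
    {'name': 'hitman', 'age': 35},
    {'name': 'hitman', 'age': 29},
    {'name': 'hitman', 'age': 15},
]

by_age = itemgetter('age')   # hypothetical helper name for the key function

# Sort by the same key first so equal ages become adjacent.
names_by_age = {
    age: [p['name'] for p in group]
    for age, group in groupby(sorted(persons, key=by_age), key=by_age)
}
print(names_by_age)
# {15: ['sai', 'kishan', 'hitman'], 29: ['hitman'], 35: ['surya', 'hitman']}
```

Because `sorted` is stable, the names within each age keep their original relative order.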


## Infinite iterators (count, cycle, repeat)

In [303]:
from itertools import count, cycle, repeat

In [304]:
for i in count(42):           # starts from 42 and counts up forever; it has to be interrupted manually
  print(i)

Streaming output truncated to the last 5000 lines.
50450
50451
50452
...
50605

KeyboardInterrupt: ignored

In [308]:
a = [1,2,3]

# cycle means 1,2,3,1,2,3,1,2,3

#infinitely prints these values

cnt = 0

for i in cycle(a):
  print(i)
  if cnt==10:
    break
  cnt+=1


1
2
3
1
2
3
1
2
3
1
2


In [310]:
for i in repeat(a,50):   # yields the same list 50 times
  print(i)

[1, 2, 3]
[1, 2, 3]
[1, 2, 3]
... (the same line is printed 50 times in total)
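Instead of interrupting an infinite iterator by hand or keeping a manual counter, `itertools.islice` can take a fixed number of items from it. A short sketch using the three iterators from this section:

```python
from itertools import count, cycle, islice, repeat

# Take only the first five values from an endless counter.
first_five = list(islice(count(42), 5))
print(first_five)         # [42, 43, 44, 45, 46]

# count() also accepts a step.
tens = list(islice(count(0, 10), 4))
print(tens)               # [0, 10, 20, 30]

# The same trick bounds cycle() without a manual counter.
cycled = list(islice(cycle([1, 2, 3]), 7))
print(cycled)             # [1, 2, 3, 1, 2, 3, 1]

# repeat() without a count is infinite too; zip stops at the shorter input.
padded = list(zip('abc', repeat(0)))
print(padded)             # [('a', 0), ('b', 0), ('c', 0)]
```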
