<div class="alert alert-block alert-success">

## Topics Covered in this Notebook 
- **Lambda Functions**
- **Map and Filter**
- **Iterators and Generators**
    
</div>

In [2]:
import pandas as pd
import numpy as np
from datetime import date, timedelta

## Lambda Functions, Map and Filter

<div class="alert alert-block alert-success"> 
    
**What are Lambda Functions?**

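In Python, `lambda` creates a small anonymous function whose body is a single expression, and the value of that expression is returned automatically. It is mostly used as a shorter syntax for short, throw-away functions that are needed only once, typically as an argument to another function.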
    
</div>

Example of a lambda function with a single parameter

In [19]:
my_lambda_function = lambda x: x**2 + ((2*x)-2) 
print(my_lambda_function(5))

33


Example of a lambda function with multiple parameters

In [20]:
my_lambda_function = lambda x,y: x*y + ((2*x)-2) 
print(my_lambda_function(5,10))

58


The same function defined the usual way, with `def`

In [18]:
def my_normal_function(x):
    return x**2 + ((2*x)-2)

my_normal_function(5)

33
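Both versions return the same value. One visible difference is that the lambda is anonymous: its `__name__` is just `'<lambda>'`, which is also what shows up in tracebacks. For this reason PEP 8 recommends a `def` statement over binding a lambda directly to a name. A quick check using the same formula:

```python
def my_normal_function(x):
    return x**2 + ((2*x)-2)

my_lambda_function = lambda x: x**2 + ((2*x)-2)

# same result for the same input
print(my_lambda_function(5) == my_normal_function(5))  # True

# a def function carries its own name; a lambda does not
print(my_normal_function.__name__)  # my_normal_function
print(my_lambda_function.__name__)  # <lambda>
```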

<div class="alert alert-block alert-success"> 
    
**Where are lambda functions useful, and how should they be used?**  
Lambda functions are most useful as short, inline arguments to other functions, for example as the `key` of `sorted`, or in combination with `filter`, `map`, dictionaries, list comprehensions, etc.
    
</div>
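As a small illustration of lambdas stored in a dictionary and used as a sort key, here is a sketch; the `operations` table and the `scores` values are made up for the example:

```python
# a dictionary of small operations (a simple dispatch table)
operations = {
    'add': lambda a, b: a + b,
    'sub': lambda a, b: a - b,
    'mul': lambda a, b: a * b,
}
print(operations['mul'](3, 4))  # 12

# hypothetical scores: sort the dictionary's items by value, highest first
scores = {'John': 72, 'Lilly': 95, 'Jack': 88}
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
# [('Lilly', 95), ('Jack', 88), ('John', 72)]
```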

Example problem: sorting a list by a custom rule using a lambda function  
Sort the given list by comparing each string from its second letter onward (ignoring the first letter)

In [23]:
my_list = ['John', 'Rupert', 'Lilly', 'Jack', 'Brad', 'Jill']

print(sorted(my_list), end="\n\n")


### sort on everything from the second character onward (x[1:])
### to sort on the second character only, use key=lambda x: x[1]
print(sorted(my_list, key=lambda x: x[1:]))

['Brad', 'Jack', 'Jill', 'John', 'Lilly', 'Rupert']

['Jack', 'Jill', 'Lilly', 'John', 'Brad', 'Rupert']


In [22]:
help(sorted)

Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.
    
    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.
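The `key` and `reverse` parameters from the help text above can be combined. A short sketch on the same list: a tuple key sorts by several criteria at once, and because Python's sort is stable, `reverse=True` keeps items with equal keys in their original order.

```python
my_list = ['John', 'Rupert', 'Lilly', 'Jack', 'Brad', 'Jill']

# longest names first; ties broken alphabetically via a tuple key
print(sorted(my_list, key=lambda x: (-len(x), x)))
# ['Rupert', 'Lilly', 'Brad', 'Jack', 'Jill', 'John']

# reverse=True with key=len: names of equal length keep their original order
print(sorted(my_list, key=len, reverse=True))
# ['Rupert', 'Lilly', 'John', 'Jack', 'Brad', 'Jill']
```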



<div class="alert alert-block alert-success"> 

##### Map, Filter and Using Lambda Expressions with them
    
</div>    

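**Examples of *map***  
`map(func, iterable)` applies `func` to every element of `iterable` and returns the results lazily as a map object.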

In [31]:
### Map
my_string = '1-2-3-4-5-6'

print(my_string.split('-'))

print(list(map(int, my_string.split('-'))))

['1', '2', '3', '4', '5', '6']
[1, 2, 3, 4, 5, 6]


Combining **map** and **lambda**

In [32]:
list(map(lambda x: int(x) + 100, my_string.split('-')))

[101, 102, 103, 104, 105, 106]
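The same transformation can be written as a list comprehension. `map` also accepts several iterables, in which case the function receives one element from each and, as the `help(map)` output further down notes, stops when the shortest iterable is exhausted:

```python
my_string = '1-2-3-4-5-6'

# equivalent list comprehension
print([int(x) + 100 for x in my_string.split('-')])
# [101, 102, 103, 104, 105, 106]

# two iterables: the lambda takes two arguments; the extra 3 is ignored
print(list(map(lambda a, b: a * b, [1, 2, 3], [10, 20])))
# [10, 40]
```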

Using **if/else conditions** (conditional expressions) with lambda functions

Example: split the given string into separate elements and convert the numbers to integers, leaving the letters as strings

In [10]:
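my_string = '1-a-2-b-3-c'

print(my_string.split('-'))

### map is lazily evaluated: nothing has been converted yet, so this prints a map
### object instead of raising a ValueError for the non-numeric 'a'
print(map(lambda x: int(x), my_string.split('-')))

### enumerate yields (index, element) pairs: convert the elements at even positions
### (the numbers) and keep the others (the letters) unchanged
list(map(lambda x: int(x[1]) if x[0] % 2 == 0 else x[1], enumerate(my_string.split('-'))))

### converting every element would raise ValueError as soon as map reaches 'a':
#list(map(lambda x: int(x), my_string.split('-')))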

['1', 'a', '2', 'b', '3', 'c']
<map object at 0x000001D13A7F8208>


[1, 'a', 2, 'b', 3, 'c']
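The version above decides by *position* (even indices are converted), so it breaks if numbers and letters do not strictly alternate. A sketch that decides by *content* instead, using `str.isdigit()` (note that it only recognises non-negative integers written with digits):

```python
my_string = '1-a-2-b-3-c'

# convert an element only if it consists of digits, otherwise keep the string
converted = list(map(lambda x: int(x) if x.isdigit() else x, my_string.split('-')))
print(converted)  # [1, 'a', 2, 'b', 3, 'c']

# still works when the order of numbers and letters changes
print(list(map(lambda x: int(x) if x.isdigit() else x, 'a-10-b-c-7'.split('-'))))
# ['a', 10, 'b', 'c', 7]
```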

**map is lazily evaluated**  
This means that `map` returns a map object rather than the results: the function is only applied to each element when the map object is iterated over, for example in a `for` loop or by converting it into a list with `list()`.

In [11]:
print(map(lambda x: x, my_string.split('-')))

for i in map(lambda x: x, my_string.split('-')):
    print(i)

<map object at 0x000001D13A7F8A48>
1
a
2
b
3
c
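To see *when* the function actually runs, here is a sketch with a hypothetical helper `noisy_double` that prints each time it is called. It also shows that a map object is single-use: once consumed, iterating again yields nothing.

```python
def noisy_double(x):
    # hypothetical helper: the print reveals when map calls the function
    print('processing', x)
    return x * 2

lazy = map(noisy_double, [1, 2, 3])   # nothing is printed yet
print('map object created')

print(next(lazy))   # prints "processing 1", then 2
print(list(lazy))   # processes the remaining items: [4, 6]
print(list(lazy))   # [] (the map object has already been consumed)
```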


**Examples of *filter***

Works like `map`, but instead of transforming each element it keeps only the elements for which the function returns a truthy value and drops the rest

In [33]:
list(filter(lambda n: n % 2, range(10)))  # n % 2 is 1 (truthy) for odd n and 0 (falsy) for even n

[1, 3, 5, 7, 9]

In [35]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
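For comparison, the same filter as a list comprehension, the inverse condition, and the special case of passing `None` as the function, which keeps only the truthy elements:

```python
numbers = list(range(10))

# equivalent list comprehension for the filter above
print([n for n in numbers if n % 2])  # [1, 3, 5, 7, 9]

# keep the even numbers instead
print(list(filter(lambda n: n % 2 == 0, numbers)))  # [0, 2, 4, 6, 8]

# with None as the function, filter drops every falsy element
print(list(filter(None, [0, 1, '', 'a', None, [], [2]])))  # [1, 'a', [2]]
```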

**Further use of Lambda Functions**  

Lambda functions are particularly useful with `apply` and with more complex `groupby` aggregations when working with pandas DataFrames.

A sample use case is shown below

In [36]:
my_df = pd.DataFrame({
    'ColA': [1,1,1,2,2,2,3,3,3],
    'ColB': ['c','b','a', 'e', 'd', 'f', 'i', 'h', 'g']
})

my_df

|   | ColA | ColB |
|---|------|------|
| 0 | 1 | c |
| 1 | 1 | b |
| 2 | 1 | a |
| 3 | 2 | e |
| 4 | 2 | d |
| 5 | 2 | f |
| 6 | 3 | i |
| 7 | 3 | h |
| 8 | 3 | g |


**Lambda used with groupby**

In [14]:
result_df = my_df.groupby('ColA').agg({
    'ColB': lambda x: sorted(list(x))
}).reset_index()

result_df

|   | ColA | ColB |
|---|------|------|
| 0 | 1 | [a, b, c] |
| 1 | 2 | [d, e, f] |
| 2 | 3 | [g, h, i] |


In [15]:
result_df.to_dict(orient='list')

{'ColA': [1, 2, 3],
 'ColB': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]}

**Lambda used with apply**

In [16]:
### With Apply
result_df['ColC'] = result_df['ColB'].apply(lambda x: ', '.join(x))
result_df

|   | ColA | ColB | ColC |
|---|------|------|------|
| 0 | 1 | [a, b, c] | a, b, c |
| 1 | 2 | [d, e, f] | d, e, f |
| 2 | 3 | [g, h, i] | g, h, i |


<div class="alert alert-block alert-info"> 
<b>
TODO: For the dataframe given below get the Top 2 Months in each Year based on Value  
(use lambda, groupby and apply)
</b>
</div>    

In [17]:
my_df = pd.DataFrame({
    'Value': [1,5,1,2,2,2,3,5,10],
    'Month': ['Jan','Feb','Mar','Apr','May','Jan','Feb','Mar','Apr'],
    'Year': ['2014','2014','2014', '2014', '2014', '2015', '2015', '2015', '2015']
})

my_df

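|   | Value | Month | Year |
|---|-------|-------|------|
| 0 | 1 | Jan | 2014 |
| 1 | 5 | Feb | 2014 |
| 2 | 1 | Mar | 2014 |
| 3 | 2 | Apr | 2014 |
| 4 | 2 | May | 2014 |
| 5 | 2 | Jan | 2015 |
| 6 | 3 | Feb | 2015 |
| 7 | 5 | Mar | 2015 |
| 8 | 10 | Apr | 2015 |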


## Intro to Iterators and Generators

Prerequisite: Object Oriented Programming

<div class="alert alert-block alert-success">
    
<a href="https://wiki.python.org/moin/Iterator">**Iterators:**</a>  
Iterators are objects that implement the iterator protocol, which consists of the following:

- A `__next__` method that returns the next value of the sequence.  
- An `__iter__` method that returns the iterator itself.  
- When no values are left, `__next__` raises a `StopIteration` exception to signal the end of the iteration.

Whenever you use a `for` loop, `map`, a list comprehension, etc. in Python, Python first calls `iter()` on the object to obtain an iterator and then calls `next()` on it repeatedly to get each item, stopping when `StopIteration` is raised.
    
</div>
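The protocol can be driven by hand with the built-in `iter()` and `next()` functions. A short sketch on a plain list:

```python
my_list = [10, 20, 30]

it = iter(my_list)      # a list is iterable; iter() returns a new iterator over it
print(next(it))         # 10
print(next(it))         # 20
print(next(it))         # 30

print(iter(it) is it)   # True: an iterator's __iter__ returns the iterator itself

# next() with a default returns the default instead of raising StopIteration
print(next(it, 'done'))  # done
```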

In [152]:
help(map)

Help on class map in module builtins:

class map(object)
 |  map(func, *iterables) --> map object
 |  
 |  Make an iterator that computes the function using arguments from
 |  each of the iterables.  Stops when the shortest iterable is exhausted.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.



#### Basic Overview of Exception Handling 
Exception handling lets you catch specific errors and define how to handle them.

In the following example, accessing an index past the end of the list raises an `IndexError` exception

In [78]:
my_list = [1,2,3]
my_list[4]

IndexError: list index out of range

In [82]:
try:
    ### code to try running
    a = my_list[3]

except IndexError as e:
    ### what to do when it throws error
    print("Code Failed With Error: ", str(e))
else:
    ### what to do when there is no error (this block is not mandatory)
    print(my_list)

Code Failed With Error:  list index out of range
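This is also how the end of an iteration is handled. A simplified sketch of what `for item in my_list: ...` does behind the scenes, written with `try`/`except StopIteration` (the real loop is implemented inside the interpreter, not as Python code):

```python
my_list = [1, 2, 3]
collected = []

it = iter(my_list)
while True:
    try:
        item = next(it)       # ask the iterator for the next value
    except StopIteration:
        break                 # the iterator is exhausted: leave the loop
    collected.append(item)    # the body of the for loop
    print(item)
```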


**Main reason to use iterators: they save memory**

Iterators don’t compute the value of each item when instantiated. They only compute it when you ask for it. This is known as lazy evaluation.  

This behavior of only returning the next element when asked to has two main advantages:  
- Iterators need less space in memory. They remember the last value and a rule to get to the next value instead of memorizing every single element of a (potentially very long) sequence.
- Iterators don’t check how long the sequence they produce might get. For instance, they don’t need to know how many lines a file has or how many files are in a folder to iterate through them.
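A rough way to see the memory difference is `sys.getsizeof`, which reports the size of the container object itself (for the list it does not even count the integer objects the list points to). A sketch comparing a list of one million squares with the equivalent generator:

```python
import sys

n = 1_000_000
squares_list = [i * i for i in range(n)]   # all one million values are stored
squares_gen = (i * i for i in range(n))    # only the current state is stored

print(sys.getsizeof(squares_list))  # on the order of megabytes
print(sys.getsizeof(squares_gen))   # a small, constant number of bytes

# both describe the same sequence
print(sum(squares_gen) == sum(squares_list))  # True
```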

How to create an iterator object manually

<a href="https://stackoverflow.com/a/8689983/6267086">Method naming conventions</a>

In [59]:
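class Squares:
    """Iterator over the squares of the integers in [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # an iterator returns itself from __iter__
        return self

    def __next__(self):
        # signal the end of the sequence once start reaches stop
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1  # remember where to continue on the next call
        return current

iterator = Squares(2, 10)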

In [60]:
for i in iterator:
    print(i)

4
9
16
25
36
49
64
81
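Because `Squares` keeps its position in `self.start`, it can be looped over only once; a second loop finds it already exhausted. A common pattern is to separate the *iterable* from the *iterator*: the iterable's `__iter__` builds a fresh iterator every time. A sketch with a hypothetical `SquaresRange` class (lists and tuples behave this way too):

```python
class Squares:
    """Iterator over the squares of the integers in [start, stop)."""
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current


class SquaresRange:
    """Hypothetical reusable iterable: each __iter__ call returns a fresh Squares."""
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return Squares(self.start, self.stop)


squares = Squares(2, 5)
print(list(squares))   # [4, 9, 16]
print(list(squares))   # [] (the iterator is exhausted)

reusable = SquaresRange(2, 5)
print(list(reusable))  # [4, 9, 16]
print(list(reusable))  # [4, 9, 16] (a new iterator each time)
```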


<a href="https://wiki.python.org/moin/Generators">**Generators**</a>  

A generator is a simpler way to build iterators. Instead of implementing the iteration protocol (the `__iter__` and `__next__` methods mentioned above), a generator is a function that produces values with a `yield` statement. The difference between `yield` and `return` is that `return` ends the function and discards its local variables, whereas `yield` only pauses it. Every time a generator reaches `yield`, it hands a value back to the caller, keeps the current state of its local variables, and waits for the next call, at which point execution resumes right after the `yield` (this is demonstrated below with an example).

Common use cases for generators in Python include **building data loading and processing pipelines** for data too large to fit into memory, and generating infinite sequences.  
Generators are widely used in libraries; one example is the <a href='https://github.com/scikit-learn/scikit-learn/blob/f0ab589f/sklearn/model_selection/_split.py#L357'>KFold Cross Validation in Scikit-Learn.</a> 
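Besides generator functions, Python has *generator expressions*: the syntax of a list comprehension, but with parentheses instead of square brackets, as in the `help((i for i in range(10)))` call in the next cell. A short comparison:

```python
squares_list = [i * i for i in range(2, 10)]   # list comprehension: built immediately
squares_gen = (i * i for i in range(2, 10))    # generator expression: built lazily

print(squares_list)        # [4, 9, 16, 25, 36, 49, 64, 81]
print(squares_gen)         # <generator object <genexpr> at 0x...>
print(list(squares_gen))   # [4, 9, 16, 25, 36, 49, 64, 81]

# when a generator expression is the only argument, the extra parentheses can be dropped
print(sum(i * i for i in range(2, 10)))  # 284
```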


In [206]:
help((i for i in range(10)))

Help on generator object:

<genexpr> = class generator(object)
 |  Methods defined here:
 |  
 |  __del__(...)
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  close(...)
 |      close() -> raise GeneratorExit inside generator.
 |  
 |  send(...)
 |      send(arg) -> send 'arg' into generator,
 |      return next yielded value or raise StopIteration.
 |  
 |  throw(...)
 |      throw(typ[,val[,tb]]) -> raise exception in generator,
 |      return next yielded value or raise StopIteration.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  gi_code
 |  
 |  gi_frame
 |  
 |  gi_running
 |  
 |  gi_yieldfrom
 |      object being iterated by yield from, or None



**Example of Generator**

In [84]:
def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(2, 10)

In [219]:
for i in generator:
    print(i)

4
9
16
25
36
49
64
81


**How Yield Works**

In [88]:
def foo(x):
    print("Start")
    for i in range(x):
        print("Before yield", i)
        yield i
        print("After yield", i)
    print("End")

In [89]:
f = foo(3)

In [90]:
for i in f:
    print(i, end="\n\n")

Start
Before yield 0
0

After yield 0
Before yield 1
1

After yield 1
Before yield 2
2

After yield 2
End
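The same generator can be stepped through by hand with `next()`, which makes the pausing visible: calling `foo(2)` runs none of the body, and each `next()` runs only up to the following `yield`.

```python
def foo(x):
    print("Start")
    for i in range(x):
        print("Before yield", i)
        yield i
        print("After yield", i)
    print("End")

g = foo(2)        # nothing is printed: the body has not started running yet
print(next(g))    # runs up to the first yield: Start, Before yield 0, then 0
print(next(g))    # resumes after the yield: After yield 0, Before yield 1, then 1

try:
    next(g)       # resumes again: After yield 1, End, then the function returns
except StopIteration:
    print("Generator exhausted")
```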


**Looping through infinite sequences using generators**

In [362]:
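def date_iterator(start, days):
    # infinite generator: yields consecutive cycles of `days` days each, forever
    while True:
        yield {'start': str(start), 'end': str(start + timedelta(days=days - 1))}
        start += timedelta(days=days)

cycles = date_iterator(date(2012, 1, 12), 28)

# next() computes one more cycle on demand; a plain for loop over cycles would never end
print(next(cycles))

print(next(cycles))

print(next(cycles))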

{'start': '2012-01-12', 'end': '2012-02-08'}
{'start': '2012-02-09', 'end': '2012-03-07'}
{'start': '2012-03-08', 'end': '2012-04-04'}
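To loop over an infinite generator without writing `next()` by hand, the standard `itertools` module can cut it off: `islice` takes a fixed number of items and `takewhile` stops at the first item that fails a condition. A sketch reusing `date_iterator`:

```python
from datetime import date, timedelta
from itertools import islice, takewhile

def date_iterator(start, days):
    while True:
        yield {'start': str(start), 'end': str(start + timedelta(days=days - 1))}
        start += timedelta(days=days)

# take a fixed number of items from the infinite generator
for cycle in islice(date_iterator(date(2012, 1, 12), 28), 3):
    print(cycle)

# keep taking items while a condition holds
# (ISO-formatted date strings compare correctly as plain strings)
q1_cycles = list(takewhile(lambda c: c['start'] < '2012-04-01',
                           date_iterator(date(2012, 1, 12), 28)))
print(len(q1_cycles))  # 3
```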


**Creating Data Pipelines using Generators**

In [94]:
! dir ..\Data\

 Volume in drive F is Storage Space
 Volume Serial Number is EE4B-777B

 Directory of F:\TrainingProgram\PythonAdvanced\Data

02/28/2021  02:37 PM    <DIR>          .
02/28/2021  02:37 PM    <DIR>          ..
02/28/2021  12:41 PM         2,633,502 Retail_Data_Transactions.csv
02/28/2021  12:47 PM           559,185 Retail_Data_Transactions_2011.csv
02/28/2021  12:47 PM           880,854 Retail_Data_Transactions_2012.csv
02/28/2021  12:47 PM           890,643 Retail_Data_Transactions_2013.csv
02/28/2021  12:47 PM           872,385 Retail_Data_Transactions_2014.csv
02/28/2021  12:47 PM           180,604 Retail_Data_Transactions_2015.csv
02/28/2021  02:37 PM            93,536 TechCrunchcontinentalUSA.csv
               7 File(s)      6,110,709 bytes
               2 Dir(s)  69,166,202,880 bytes free


<b>Example 1: Reading multiple files (retail transaction data) and summing the transaction amount by year</b>

In [5]:
pd.read_csv('python-concurrency/data/Retail_Data_Transactions_2011.csv').head(3)

|   | customer_id | trans_date | tran_amount | year |
|---|-------------|------------|-------------|------|
| 0 | CS1217 | 2011-11-16 | 99 | 2011 |
| 1 | CS4102 | 2011-07-09 | 96 | 2011 |
| 2 | CS3510 | 2011-10-24 | 81 | 2011 |


In [100]:
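### Creating a pipeline for reading multiple files and performing operations

def load(start, stop):
    """Yield raw lines from each yearly file, one line at a time."""
    for year in range(start, stop):
        path = f'python-concurrency/data/Retail_Data_Transactions_{year}.csv'
        with open(path, 'r') as f:  ### the file is closed once it has been read
            for line in f:
                yield line

load_data = load(2011, 2016)
sum_values = dict.fromkeys(range(2011, 2016), 0)

for line in load_data:
    fields = line.split(',')  ### columns: customer_id, trans_date, tran_amount, year
    try:
        sum_values[int(fields[3])] += int(fields[2])
    except ValueError:
        ### header rows ('tran_amount', 'year') cannot be converted to int: skip them
        pass

sum_values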

{2011: 1340339, 2012: 2116599, 2013: 2137368, 2014: 2094508, 2015: 435175}
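The same idea can be split into small, separately testable stages, each a generator that consumes the previous one. A minimal sketch with hypothetical function names (`read_lines`, `parse`, `total_by_year`), fed with in-memory `io.StringIO` objects holding the three 2011 rows shown above instead of the real files:

```python
import io
from collections import defaultdict

def read_lines(files):
    """Stage 1 (hypothetical): yield raw lines from each file-like object in turn."""
    for f in files:
        yield from f  # delegate to the file's own line iterator

def parse(lines):
    """Stage 2 (hypothetical): turn CSV lines into (year, amount) pairs."""
    for line in lines:
        customer_id, trans_date, amount, year = line.rstrip('\n').split(',')
        if amount.isdigit():  # skips header rows such as 'tran_amount'
            yield int(year), int(amount)

def total_by_year(records):
    """Stage 3 (hypothetical): consume the records and sum amounts per year."""
    totals = defaultdict(int)
    for year, amount in records:
        totals[year] += amount
    return dict(totals)

header = 'customer_id,trans_date,tran_amount,year\n'
files = [
    io.StringIO(header + 'CS1217,2011-11-16,99,2011\nCS4102,2011-07-09,96,2011\n'),
    io.StringIO(header + 'CS3510,2011-10-24,81,2011\n'),
]

print(total_by_year(parse(read_lines(files))))  # {2011: 276}
```

With real files, `read_lines` would open each path inside a `with` block, as `load` does above; the other stages stay unchanged.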

<b>Example 2: Computing a conditional total (Series A funding amount) with a pipeline that also works for files too large to fit into memory</b>

In [3]:
pd.read_csv("python-concurrency/data/TechCrunchcontinentalUSA.csv").head(3)

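|   | permalink | company | numEmps | category | city | state | fundedDate | raisedAmt | raisedCurrency | round |
|---|-----------|---------|---------|----------|------|-------|------------|-----------|----------------|-------|
| 0 | lifelock | LifeLock | | web | Tempe | AZ | 1-May-07 | 6850000 | USD | b |
| 1 | lifelock | LifeLock | | web | Tempe | AZ | 1-Oct-06 | 6000000 | USD | a |
| 2 | lifelock | LifeLock | | web | Tempe | AZ | 1-Jan-08 | 25000000 | USD | c |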


In [4]:
### Generator expressions for creating a data pipeline
file_name = "python-concurrency/data/TechCrunchcontinentalUSA.csv"

with open(file_name) as f:  ### the file is closed once the pipeline has been consumed
    lines = (line for line in f)  ### load the file line by line

    ### naive csv parsing: assumes no field contains a quoted comma
    list_line = (s.rstrip().split(",") for s in lines)

    cols = next(list_line)  ### first row holds the column names

    company_dicts = (dict(zip(cols, data)) for data in list_line)  ### one dictionary per row

    funding = (int(company_dict["raisedAmt"]) for company_dict in company_dicts
               if company_dict["round"] == "a")  ### select the Series A rows

    total_series_a = sum(funding)  ### sum() pulls every row through the pipeline

print(f"Total series A fundraising: ${total_series_a}")

Total series A fundraising: $4376015000
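The hand-written `split(",")` step breaks on quoted fields that contain commas. The standard `csv` module handles quoting and already yields one dictionary per row through `csv.DictReader`, so the middle stages of the pipeline collapse into one. A sketch (the `series_a_total` function name is hypothetical), tested here on the three rows shown above held in an `io.StringIO` rather than the real file:

```python
import csv
import io

def series_a_total(file_obj):
    """Sum raisedAmt over rows whose round is 'a', reading one row at a time."""
    rows = csv.DictReader(file_obj)  # lazily yields one dict per data row
    return sum(int(row['raisedAmt']) for row in rows if row['round'] == 'a')

sample = io.StringIO(
    'permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round\n'
    'lifelock,LifeLock,,web,Tempe,AZ,1-May-07,6850000,USD,b\n'
    'lifelock,LifeLock,,web,Tempe,AZ,1-Oct-06,6000000,USD,a\n'
    'lifelock,LifeLock,,web,Tempe,AZ,1-Jan-08,25000000,USD,c\n'
)
print(series_a_total(sample))  # 6000000
```

For the real file, the `csv` documentation recommends opening it with `newline=''`, e.g. `with open(file_name, newline='') as f: series_a_total(f)`.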
