<a href="https://colab.research.google.com/github/shrishtinigam/intermediate-python/blob/main/01_Comprehensions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# List Comprehensions

 * A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result is a new list resulting from evaluating the expression in the context of the for and if clauses which follow it.
 * Used well, comprehensions make code shorter and often more readable without compromising efficiency.


```py
[ expression + context ]

OR

[expression for item in iterable if condition]
```




### How List Comprehensions Work
Under the hood, a list comprehension is a more succinct way to write a `for` loop that appends to a list. The comprehension in the first cell below is syntactic sugar for the loop in the second cell:

In [2]:
squares = [x**2 for x in range(10)]
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [3]:
squares = []
for x in range(10):
    squares.append(x**2)

## Different Ways to Use List Comprehensions

#### Simple List Comprehension

In [9]:
squares = [x**2 for x in range(10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [42]:
lower_case = [x.lower() for x in ['I', 'AM', 'NOT', 'SHOUTING']]
lower_case

['i', 'am', 'not', 'shouting']


In [45]:
# On a dictionary
employees = {'Alice' : 100000,
             'Bob' : 99817,
             'Carol' : 122908,
             'Frank' : 88123,
             'Eve' : 93121}
## One-Liner
top_earners = [(k, v) for k, v in employees.items() if v >= 100000]
top_earners

[('Alice', 100000), ('Carol', 122908)]

#### List Comprehension with Conditionals

In [10]:
evens = [x for x in range(10) if x % 2 == 0]
evens

[0, 2, 4, 6, 8]
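
The trailing `if` above filters items out. A conditional expression placed before the `for` does something different: it transforms every item and leaves the length of the list unchanged. A minimal sketch of the difference, where the names `filtered` and `labels` are illustrative and not from the original notebook:

```python
# Filter: the trailing `if` drops items that fail the test.
filtered = [x for x in range(6) if x % 2 == 0]

# Transform: a conditional expression before `for` keeps every item
# and chooses what to emit for each one.
labels = ["even" if x % 2 == 0 else "odd" for x in range(6)]

print(filtered)  # [0, 2, 4]
print(labels)    # ['even', 'odd', 'even', 'odd', 'even', 'odd']
```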

#### Nested List Comprehensions
* Useful for multidimensional data structures, such as matrices.

In [11]:
matrix = [[i+j for j in range(3)] for i in range(3)]
matrix

[[0, 1, 2], [1, 2, 3], [2, 3, 4]]

In [12]:
# Translation of above list comprehension
matrix = []
for i in range(3):
    inner_list = []
    for j in range(3):
        inner_list.append(i + j)
    matrix.append(inner_list)

print(matrix)

[[0, 1, 2], [1, 2, 3], [2, 3, 4]]
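
As one more illustration of a nested comprehension on matrix-like data, the sketch below transposes a small grid. The `grid` and `transposed` names are assumptions made for this example.

```python
grid = [[1, 2, 3],
        [4, 5, 6]]

# The outer comprehension walks the column indices; the inner one
# collects that column's value from every row.
transposed = [[row[col] for row in grid] for col in range(len(grid[0]))]

print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```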


#### List Comprehension with Multiple for Clauses
* Best for generating combinations or Cartesian products.

In [13]:
pairs = [(i, j) for i in range(3) for j in range(3)]
pairs

[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

In [14]:
# Translation of above list comprehension
pairs = []
for i in range(3):
    for j in range(3):
        pairs.append((i, j))

print(pairs)

[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
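
For plain Cartesian products, the standard library's `itertools.product` gives the same tuples in the same order. A later `for` clause may also iterate over the variable bound by an earlier one, which is a common way to flatten a list of lists. A short sketch of both, with the `nested` list made up for illustration:

```python
from itertools import product

pairs = [(i, j) for i in range(3) for j in range(3)]

# product(range(3), repeat=2) yields the same pairs, left index varying slowest.
pairs_product = list(product(range(3), repeat=2))
print(pairs == pairs_product)  # True

# The second `for` iterates over `row`, which the first `for` binds.
nested = [[1, 2], [3], [4, 5, 6]]
flat = [value for row in nested for value in row]
print(flat)  # [1, 2, 3, 4, 5, 6]
```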


#### Using Functions in List Comprehensions

In [16]:
def square(x):
    return x**2

squares = [square(x) for x in range(10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

## Common Mistakes and Pitfalls

### Misunderstanding Scope
* In Python 3, the loop variable of a list comprehension is local to the comprehension, so it does not overwrite a variable of the same name in the enclosing scope.

In [17]:
x = 10
squares = [x**2 for x in range(10)]
print(x)  # Output: 10 (not modified)

10
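
The contrast with an ordinary `for` loop makes this easier to see. The sketch below assumes Python 3 and uses the throwaway names `x` and `y`:

```python
# A regular for loop leaves its loop variable behind in the enclosing scope.
x = 10
for x in range(3):
    pass
print(x)  # 2 (overwritten by the loop)

# A list comprehension keeps its loop variable to itself.
y = 10
squares = [y**2 for y in range(3)]
print(y)  # 10 (untouched)
```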


### Using Complex Expressions
* List comprehensions are meant for readability. Complex expressions can make them hard to read.

In [None]:
# Hard to read
result = [(x, y) for x in range(3) for y in range(3) if x != y]

# More readable
result = []
for x in range(3):
    for y in range(3):
        if x != y:
            result.append((x, y))
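
Another option, when the condition itself is what makes a comprehension hard to read, is to move it into a named function. This is only a sketch: `is_valid_pair` and its rule are hypothetical and chosen purely for illustration.

```python
def is_valid_pair(x, y):
    """Hypothetical rule: keep pairs of distinct values whose sum is even."""
    return x != y and (x + y) % 2 == 0

# The comprehension now reads as "all pairs that are valid".
result = [(x, y) for x in range(4) for y in range(4) if is_valid_pair(x, y)]
print(result)  # [(0, 2), (1, 3), (2, 0), (3, 1)]
```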

### Memory Issues [!IMP]

* List comprehensions work well for small to moderately sized datasets.
* However, they can use a significant amount of memory on large datasets.
* A list comprehension evaluates the expression for every item immediately, i.e., it creates the **entire list in memory all at once**.




In [None]:
squares = [x**2 for x in range(10000000)]
squares

#### Generators as a Memory-Efficient Alternative
* Generators are a better alternative when working with large datasets.
* They generate items one at a time and do not store the entire list in memory, thus reducing memory usage.
* Instead of a list comprehension, use a generator expression by replacing square brackets [] with parentheses ():

In [36]:
squares_gen = (x**2 for x in range(10000000))

for i in range(5):
  try:
      value = next(squares_gen)
      print(value)
  except StopIteration:
      break

0
1
4
9
16
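
Two properties of generators are worth seeing in code: aggregate functions such as `sum()` can consume them directly, and a generator can be iterated only once. A small sketch, with the range kept tiny so the numbers are easy to check:

```python
squares_gen = (x**2 for x in range(5))

# sum() pulls items one at a time, so no intermediate list is built.
total = sum(squares_gen)
print(total)  # 30

# The generator is now exhausted; iterating again yields nothing.
leftover = list(squares_gen)
print(leftover)  # []
```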


Comparing Memory Usage: List Comprehension vs. Generator


In [37]:
import sys
squares = [x**2 for x in range(10000000)]
print(sys.getsizeof(squares))

squares_gen = (x**2 for x in range(10000000))
print(sys.getsizeof(squares_gen))

89095160
104
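
Note that `sys.getsizeof` is shallow: for a list it reports the list object and its array of references, not the integers those references point to. The sketch below gives a rough deeper estimate. It assumes CPython, and the exact byte counts vary by version and platform, so no output is shown.

```python
import sys

n = 1000  # small illustrative size
squares = [x**2 for x in range(n)]

# Size of the list object itself (essentially an array of references).
container_bytes = sys.getsizeof(squares)

# Rough deep size: add the size of every int object the list refers to.
# CPython shares small ints between references, so this is only an estimate.
deep_bytes = container_bytes + sum(sys.getsizeof(value) for value in squares)

print(container_bytes, deep_bytes)
```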


#### Best Practices to Mitigate Memory Issues
1. Use Generators for Large Datasets:
* Prefer generator expressions for large datasets to save memory.

2. Avoid Storing Large Intermediate Results:
* Break down the problem to avoid generating large intermediate results. Process data in smaller chunks if possible.

3. Use Built-in Functions Efficiently:
* Functions like map() and filter() return iterators, which can be more memory-efficient than list comprehensions.

In [None]:
squares_map = map(lambda x: x**2, range(10000000))

print(sys.getsizeof(squares_map))

for i in range(5):
  try:
      value = next(squares_map)
      print(value)
  except StopIteration:
      break
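
If only the first few items of an iterator are needed, `itertools.islice` is a compact alternative to calling `next()` in a loop. The following sketch applies it to both `map()` and `filter()`; the variable names are illustrative.

```python
from itertools import islice

squares_map = map(lambda x: x**2, range(10_000_000))

# islice takes the first five items lazily; the rest are never computed.
first_five = list(islice(squares_map, 5))
print(first_five)  # [0, 1, 4, 9, 16]

# filter() is lazy in the same way.
evens = filter(lambda x: x % 2 == 0, range(10_000_000))
first_three_evens = list(islice(evens, 3))
print(first_three_evens)  # [0, 2, 4]
```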

4. Chunk Processing:
* If you must use list comprehensions, consider processing data in chunks and combining results incrementally.

In [None]:
def process_chunks(chunk_size):
    for start in range(0, 10000000, chunk_size):
        chunk = [x**2 for x in range(start, start + chunk_size)]
        # Process the chunk
        print(f"Processed chunk starting at {start}")

process_chunks(100000)
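
To show what "combining results incrementally" could look like, here is a sketch that folds each chunk into a running total and then discards it. The function name `sum_of_squares_in_chunks` and its parameters are assumptions made for this example.

```python
def sum_of_squares_in_chunks(n, chunk_size):
    """Hypothetical helper: sum x**2 for x in range(n), one chunk at a time."""
    total = 0
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)  # do not run past n on the last chunk
        chunk = [x**2 for x in range(start, stop)]
        total += sum(chunk)  # fold the chunk into the result, then let it go
    return total

print(sum_of_squares_in_chunks(1_000, 100))  # 332833500
```

Only one chunk's list is alive at any time, so peak memory depends on `chunk_size` rather than on `n`.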

## How to use Comprehensions Efficiently

### Use Generators for Large Datasets
As explained in a section above, instead of list comprehensions, use generator expressions to save memory.

In [23]:
# List comprehension
large_list = [x**2 for x in range(10000000)]
print(large_list[:5])

# Generator expression
large_gen = (x**2 for x in range(10000000))
for i in range(5):
  try:
      value = next(large_gen)
      print(value)
  except StopIteration:
      break

[0, 1, 4, 9, 16]
0
1
4
9
16


### Optimize Conditions
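Filter as early as possible. Where you can, narrow the iterable itself instead of testing every item; when several conditions remain, put the cheapest or most restrictive one first, because `and` short-circuits.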

In [24]:
# Less efficient
result = [x for x in range(1000) if x % 2 == 0 and x > 500]
print(result)

# More efficient
result = [x for x in range(501, 1000) if x % 2 == 0]
print(result)
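
Both versions produce the same list. In this particular case the remaining condition can be removed as well by giving `range` a step. A quick sketch that checks the three forms agree (the variable names are illustrative):

```python
filtered = [x for x in range(1000) if x % 2 == 0 and x > 500]
narrowed = [x for x in range(501, 1000) if x % 2 == 0]

# Stepping by 2 from the first even number above 500 needs no condition at all.
stepped = list(range(502, 1000, 2))

print(filtered == narrowed == stepped)  # True
```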

### Avoid Nested Comprehensions for Deep Nesting

Deeply nested comprehensions can be hard to read and maintain.

In [20]:
# Hard to read
result = [[[y for y in range(x)] for x in range(10)] for i in range(5)]
print(f"Nested Comprehensions: {result}")

# More readable with loops
result = []
for x in range(10):
    inner_list = []
    for y in range(x):
        inner_list.append(y)
        for i in range(5):
            result.append(inner_list)
    result.append(inner_list)
print(f"For loop: {result}")

Nested Comprehensions: [[[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8]], [[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8]], [[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8]], [[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8]], [[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8]]]
For loop: [[], [0], [0], [0], [0], [0], [0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 

## Advanced Usage

#### Dictionary Comprehensions

In [28]:
squares = {x: x**2 for x in range(-5, 5)}
squares

{-5: 25, -4: 16, -3: 9, -2: 4, -1: 1, 0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

#### Set Comprehensions

In [29]:
unique_squares = {x**2 for x in range(-5, 5)}
unique_squares

{0, 1, 4, 9, 16, 25}

#### Comprehensions with `zip`

* Combining elements from multiple iterables.

In [34]:
# What is zip?
"""
The zip function in Python is used to combine multiple iterables (e.g., lists, tuples) into a single
iterable of tuples. Each tuple contains one element from each of the input iterables. The zip function
stops when the shortest input iterable is exhausted.
"""
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']
combined = zip(list1, list2)

# Convert the zip object to a list to see the combined pairs
combined_list = list(combined)
print(combined_list)

[(1, 'a'), (2, 'b'), (3, 'c')]


In [35]:
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]

combined = {name: age for name, age in zip(names, ages)}
combined

{'Alice': 25, 'Bob': 30, 'Charlie': 35}
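
Because `zip` stops at the shortest iterable, mismatched lengths silently drop data. Where that matters, `itertools.zip_longest` pads the shorter input instead. A brief sketch, with the shortened `ages` list invented for illustration:

```python
from itertools import zip_longest

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30]  # one age is missing

# zip silently drops 'Charlie' because `ages` runs out first.
truncated = {name: age for name, age in zip(names, ages)}
print(truncated)  # {'Alice': 25, 'Bob': 30}

# zip_longest pads the shorter iterable with a fill value instead.
padded = {name: age for name, age in zip_longest(names, ages, fillvalue=None)}
print(padded)  # {'Alice': 25, 'Bob': 30, 'Charlie': None}
```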

## Try it out yourself!

Solve these questions from *Python One-Liners* by Christian Mayer to test what we have learnt!

### FIND WORDS WITH HIGH INFORMATION VALUE

Our goal is to solve the following problem: given a multiline
string, create a list of lists, each consisting of all the words in
a sentence that have more than three characters. The answer below treats the text between periods as one sentence.

In [47]:
text = '''
Call me Ishmael. Some years ago - never mind how long precisely - having
little or no money in my purse, and nothing particular to interest me
on shore, I thought I would sail about a little and see the watery part
of the world. It is a way I have of driving off the spleen, and regulating
the circulation. - Moby Dick'''

#### Answer:

In [48]:
high_info_words = [[word for word in sentence.split() if len(word)>3] for sentence in text.split('.')]
high_info_words

[['Call', 'Ishmael'],
 ['Some',
  'years',
  'never',
  'mind',
  'long',
  'precisely',
  'having',
  'little',
  'money',
  'purse,',
  'nothing',
  'particular',
  'interest',
  'shore,',
  'thought',
  'would',
  'sail',
  'about',
  'little',
  'watery',
  'part',
  'world'],
 ['have', 'driving', 'spleen,', 'regulating', 'circulation'],
 ['Moby', 'Dick']]
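
The answer above splits the text on periods and keeps trailing punctuation attached to words (for example `'purse,'`). A variant that splits on line breaks and strips punctuation before measuring word length might look like the sketch below. The `sample` string and the `clean` helper are illustrative assumptions, not part of the book's solution.

```python
import string

sample = "Call me Ishmael.\nSome years ago, never mind how long."

def clean(word):
    """Hypothetical helper: drop leading and trailing punctuation."""
    return word.strip(string.punctuation)

# One inner list per line; punctuation no longer counts towards word length.
words_per_line = [
    [clean(word) for word in line.split() if len(clean(word)) > 3]
    for line in sample.splitlines()
]
print(words_per_line)
# [['Call', 'Ishmael'], ['Some', 'years', 'never', 'mind', 'long']]
```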

### READING A FILE

Our goal is to open a file, read all lines, strip the leading and trailing whitespace characters, and store the result in a list.

Hint: Run the code below to create an `example.txt` file.




In [49]:
import os

lorem_ipsum = """Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat."""

current_directory = os.getcwd()
file_path = os.path.join(current_directory, "example.txt")

with open(file_path, "w") as file:
    file.write(lorem_ipsum)

In [52]:
my_file = [line.strip() for line in open('example.txt')]
my_file

['Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
 'Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.',
 'Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris',
 'nisi ut aliquip ex ea commodo consequat.']
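
The one-liner above never closes the file explicitly. Outside of quick experiments, a `with` block is the more usual pattern because it closes the file as soon as the list is built. A self-contained sketch, using a temporary directory and made-up file contents so that it does not touch `example.txt`:

```python
import os
import tempfile

# Write a small throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")
    with open(path, "w") as f:
        f.write("  first line  \nsecond line\n")

    # The with block closes the file once the comprehension has finished.
    with open(path) as f:
        lines = [line.strip() for line in f]

print(lines)  # ['first line', 'second line']
```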

### USING LAMBDA AND MAP FUNCTIONS

`map(f, s)` applies the function `f` to each element of the sequence `s`. `f` can be a lambda function.

You can check whether a string `x` contains a substring `y` with the expression `y in x`.

When given a list of strings, our next one-liner
creates a new list of tuples, each consisting of a Boolean value and the original string. The Boolean value indicates whether the string 'anonymous' appears in the original string! We call the
resulting list mark because the Boolean values mark the string elements in the list that contain the string 'anonymous'.


In [54]:
txt = ['lambda functions are anonymous functions.',
       'anonymous functions dont have a name.',
       'functions are objects in Python.']

In [64]:
mark = map(lambda sentence: ('anonymous' in sentence, sentence), txt)
print(list(mark))

# OR

mark = map(lambda s: (True, s) if 'anonymous' in s else (False, s), txt)
print(list(mark))

[(True, 'lambda functions are anonymous functions.'), (True, 'anonymous functions dont have a name.'), (False, 'functions are objects in Python.')]
[(True, 'lambda functions are anonymous functions.'), (True, 'anonymous functions dont have a name.'), (False, 'functions are objects in Python.')]
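
For comparison, the same result can be written as a list comprehension, which ties this exercise back to the earlier sections. A minimal sketch reusing the `txt` list from above (`mark_comprehension` is a name chosen for this example):

```python
txt = ['lambda functions are anonymous functions.',
       'anonymous functions dont have a name.',
       'functions are objects in Python.']

# `'anonymous' in s` is already a Boolean, so no if/else is needed.
mark_comprehension = [('anonymous' in s, s) for s in txt]

# The map/lambda version produces the same tuples.
mark_map = list(map(lambda s: ('anonymous' in s, s), txt))

print(mark_comprehension == mark_map)  # True
```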


### USING SLICING TO EXTRACT MATCHING SUBSTRING ENVIRONMENTS
Slicing syntax: `x[start:stop:step]`

Our goal is to find a particular text query within a multiline
string and return its immediate environment, up to 18 positions
around the found query.
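
One possible reading of this task is sketched below. It is not the book's solution: the function name `find_environment`, the `width` parameter, and the choice to return `None` when the query is absent are all assumptions made for illustration.

```python
def find_environment(text, query, width=18):
    """Hypothetical helper: return `query` with up to `width` characters
    of context on each side, or None if `query` does not occur."""
    index = text.find(query)  # str.find returns -1 when there is no match
    if index == -1:
        return None
    start = max(index - width, 0)  # clamp so a negative start cannot wrap around
    stop = index + len(query) + width  # slicing past the end is safe
    return text[start:stop]

sentence = "Peter Piper picked a peck of pickled peppers."
print(find_environment(sentence, "peck", width=5))  # ed a peck of p
print(find_environment(sentence, "plum"))           # None
```

Clamping `start` matters because a negative slice start counts from the end of the string, which would return the wrong region for matches near the beginning.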

### COMBINING LIST COMPREHENSION AND SLICING

### USING SLICE ASSIGNMENT TO CORRECT CORRUPTED LISTS

### ANALYZING CARDIAC HEALTH DATA WITH LIST CONCATENATION

### USING GENERATOR EXPRESSIONS TO FIND COMPANIES THAT PAY BELOW MINIMUM WAGE

### FORMATTING DATABASES WITH THE ZIP() FUNCTION