# Comprehensions

## List Comprehension

Python is famous for allowing you to write code that’s elegant, easy to write, and almost as easy to read as plain English. One of the language’s most distinctive features is the list comprehension, which you can use to create powerful functionality within a single line of code. However, many developers struggle to fully leverage the more advanced features of a list comprehension in Python. Some programmers even use them too much, which can lead to code that’s less efficient and harder to read.

By the end of this tutorial, you’ll understand the full power of Python list comprehensions and how to use their features comfortably. You’ll also gain an understanding of the trade-offs that come with using them so that you can determine when other approaches are preferable.

### How to Create Lists in Python

There are a few different ways you can create lists in Python. To better understand the trade-offs of using a list comprehension in Python, let’s first see how to create lists with these approaches.

#### Using for Loops

The most common type of loop is the for loop. You can use a for loop to create a list of elements in three steps:
- Instantiate an empty list.
- Loop over an iterable or range of elements.
- Append each element to the end of the list.

If you want to create a list containing the first ten perfect squares, then you can complete these steps in three lines of code:

In [1]:
squares = []
for i in range(10):
    squares.append(i * i)

print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


Here, you instantiate an empty list, squares. Then, you use a for loop to iterate over range(10). Finally, you multiply each number by itself and append the result to the end of the list.

#### Using map() Objects

map() provides an alternative approach that’s based in functional programming. You pass in a function and an iterable, and map() will create an object. This object contains the output you would get from running each iterable element through the supplied function.

As an example, consider a situation in which you need to calculate the price after tax for a list of transactions:

In [2]:
txns = [1.09, 23.56, 57.84, 4.56, 6.78]
TAX_RATE = 0.08

def get_price_with_tax(txn):
    return txn * (1 + TAX_RATE)

In [3]:
final_prices = map(get_price_with_tax, txns)

In [4]:
final_prices = list(final_prices)

In [5]:
final_prices

[1.1772000000000002, 25.4448, 62.467200000000005, 4.9248, 7.322400000000001]

Here, you have an iterable txns and a function get_price_with_tax(). You pass both of these arguments to map(), and store the resulting object in final_prices. You can easily convert this map object into a list using list().

Let’s round the values to two decimal places:

In [6]:
rounded = []
for price in final_prices:
    rounded.append(round(price, 2))

print(rounded)

[1.18, 25.44, 62.47, 4.92, 7.32]


> Note: The behavior of round() for floats can be surprising: for example, round(2.675, 2) gives 2.67 instead of the expected 2.68. This is not a bug: it’s a result of the fact that most decimal fractions can’t be represented exactly as a float. See Floating Point Arithmetic: Issues and Limitations for more information.
- [Floating Point Arithmetic: Issues and Limitations](https://docs.python.org/3.8/tutorial/floatingpoint.html#tut-fp-issues)
- [Decimal fixed point and floating point arithmetic](https://docs.python.org/3.8/library/decimal.html)
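
As a rough illustration of the note above, the sketch below compares round() on a float with the decimal module from the second link. The choice of ROUND_HALF_UP is an assumption made for this example, not something the tutorial prescribes.

```python
from decimal import Decimal, ROUND_HALF_UP

# The float literal 2.675 is stored as a value slightly below 2.675,
# so rounding to two places goes down.
float_result = round(2.675, 2)
print(float_result)  # 2.67

# Building a Decimal from a string keeps the decimal digits exact.
# ROUND_HALF_UP is chosen here only for illustration.
decimal_result = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(decimal_result)  # 2.68
```

If exact decimal behavior matters, for example with money, working with Decimal values from the start may be the safer route.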

#### Using List Comprehensions

List comprehensions are a third way of making lists. With this elegant approach, you could rewrite the for loop from the first example in just a single line of code:

    squares = []
    for i in range(10):
        squares.append(i * i)

    print(squares)

In [7]:
squares = [i * i for i in range(10)]

In [8]:
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Rather than creating an empty list and adding each element to the end, you simply define the list and its contents at the same time by following this format:

    new_list = [expression for member in iterable]

Every list comprehension in Python includes three elements:
- expression is the member itself, a call to a method, or any other valid expression that returns a value. In the example above, the expression i * i is the square of the member value.
- member is the object or value in the list or iterable. In the example above, the member value is i.
- iterable is a list, set, sequence, generator, or any other object that can return its elements one at a time. In the example above, the iterable is range(10).

Because the expression requirement is so flexible, a list comprehension in Python works well in many places where you would use map(). You can rewrite the pricing example with its own list comprehension:

In [9]:
txns = [1.09, 23.56, 57.84, 4.56, 6.78]
TAX_RATE = .08
def get_price_with_tax(txn):
    return txn * (1 + TAX_RATE)
    #return round(txn * (1 + TAX_RATE),2)

In [10]:
final_prices = [round(get_price_with_tax(i), 2) for i in txns] 
print(final_prices)

[1.18, 25.44, 62.47, 4.92, 7.32]


The only distinction between this implementation and map() is that the list comprehension in Python returns a list, not a map object.

### Benefits of Using List Comprehensions

List comprehensions are often described as being more Pythonic than loops or map(). But rather than blindly accepting that assessment, it’s worth it to understand the benefits of using a list comprehension in Python when compared to the alternatives. Later on, you’ll learn about a few scenarios where the alternatives are a better choice.

One main benefit of using a list comprehension in Python is that it’s a single tool that you can use in many different situations. In addition to standard list creation, list comprehensions can also be used for mapping and filtering. You don’t have to use a different approach for each scenario.

This is the main reason why list comprehensions are considered Pythonic, as Python embraces simple, powerful tools that you can use in a wide variety of situations. As an added side benefit, whenever you use a list comprehension in Python, you won’t need to remember the proper order of arguments like you would when you call map().

List comprehensions are also more declarative than loops, which means they’re easier to read and understand. Loops require you to focus on how the list is created. You have to manually create an empty list, loop over the elements, and add each of them to the end of the list. With a list comprehension in Python, you can instead focus on what you want to go in the list and trust that Python will take care of how the list construction takes place.

### How to Supercharge Your Comprehensions

To understand the full value that list comprehensions can provide, it helps to know their range of possible functionality. It’s also worth knowing about the change that arrived in Python 3.8: assignment expressions (the := operator), which can be used inside a comprehension.

#### Using Conditional Logic

Earlier, you saw this formula for how to create list comprehensions:

    new_list = [expression for member in iterable]

While this formula is accurate, it’s also a bit incomplete. A more complete description of the comprehension formula adds support for optional conditionals. The most common way to add conditional logic to a list comprehension is to add a conditional to the end of the expression:

    new_list = [expression for member in iterable (if conditional)]

Here, your conditional statement comes just before the closing bracket.

Conditionals are important because they allow list comprehensions to filter out unwanted values, which would normally require a call to filter():

In [11]:
sentence = 'the rocket came back from mars'

vowels = [i for i in sentence if i in 'aeiou']

print(vowels)

['e', 'o', 'e', 'a', 'e', 'a', 'o', 'a']


In this code block, the conditional statement filters out any characters in sentence that aren’t a vowel.

The conditional can test any valid expression. If you need a more complex filter, then you can even move the conditional logic to a separate function:

In [12]:
sentence = 'The rocket, who was named Ted, came back \
from Mars because he missed his friends.'

In [13]:
def is_consonant(letter):
    vowels = 'aeiou'
    return letter.isalpha() and letter.lower() not in vowels

In [14]:
consonants = [i.lower() for i in sentence if is_consonant(i)]

In [15]:
print(consonants)

['t', 'h', 'r', 'c', 'k', 't', 'w', 'h', 'w', 's', 'n', 'm', 'd', 't', 'd', 'c', 'm', 'b', 'c', 'k', 'f', 'r', 'm', 'm', 'r', 's', 'b', 'c', 's', 'h', 'm', 's', 's', 'd', 'h', 's', 'f', 'r', 'n', 'd', 's']


You can place the conditional at the end of the statement for simple filtering, but what if you want to change a member value instead of filtering it out? In this case, it’s useful to place the conditional near the beginning of the expression:

    new_list = [expression (if conditional) for member in iterable]

With this formula, you can use conditional logic to select from multiple possible output options. For example, if you have a list of prices, then you may want to replace negative prices with 0 and leave the positive values unchanged:

In [16]:
original_prices = [1.25, -9.45, 10.22, 3.78, -5.92, 1.16]
prices = [i if i > 0 else 0 for i in original_prices]
print(prices)

[1.25, 0, 10.22, 3.78, 0, 1.16]


Here, your expression is a conditional expression, i if i > 0 else 0. This tells Python to output the value of i if the number is positive, but to output 0 otherwise. If this seems overwhelming, then it may be helpful to view the conditional logic as its own function:

In [17]:
def get_price(price): 
    return price if price > 0 else 0

prices = [get_price(i) for i in original_prices]

In [18]:
print(prices)

[1.25, 0, 10.22, 3.78, 0, 1.16]


Now, your conditional statement is contained within get_price(), and you can use it as part of your list comprehension expression.
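
Python 3.8, mentioned at the start of this section, added assignment expressions (the := operator), which can be used inside a comprehension’s conditional. The sketch below is only an illustration: get_price_with_tax() and TAX_RATE mirror the earlier pricing example, while the threshold of 5 is an arbitrary assumption.

```python
TAX_RATE = 0.08
txns = [1.09, 23.56, 57.84, 4.56, 6.78]

def get_price_with_tax(txn):
    return txn * (1 + TAX_RATE)

# Without :=, the function would be called twice per element:
# once in the condition and once in the output expression.
# With :=, the result is computed once, named, and reused.
# The threshold 5 is arbitrary and only for illustration.
expensive = [
    round(taxed, 2)
    for txn in txns
    if (taxed := get_price_with_tax(txn)) > 5
]
print(expensive)  # [25.44, 62.47, 7.32]
```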

#### Using Set and Dictionary Comprehensions

While the list comprehension in Python is a common tool, you can also create set and dictionary comprehensions. A set comprehension is almost exactly the same as a list comprehension in Python. The difference is that set comprehensions make sure the output contains no duplicates. You can create a set comprehension by using curly braces instead of brackets:

In [19]:
quote = "life, uh, finds a way"
unique_vowels = {i for i in quote if i in 'aeiou'}

print(unique_vowels)

{'i', 'a', 'e', 'u'}


Your set comprehension outputs all the unique vowels it found in quote. Unlike lists, sets don’t guarantee that items will be saved in any particular order. This is why the order of the vowels in the output doesn’t have to match the order in which they appear in quote, and why it may differ from one run to the next.

Dictionary comprehensions are similar, with the additional requirement of defining a key:



In [20]:
squares = {i: i * i for i in range(10)}
print(squares)

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}


To create the squares dictionary, you use curly braces ({}) as well as a key-value pair (i: i * i) in your expression.

### When Not to Use a List Comprehension in Python

List comprehensions are useful and can help you write elegant code that’s easy to read and debug, but they’re not the right choice for all circumstances. They might make your code run more slowly or use more memory. If your code is less performant or harder to understand, then it’s probably better to choose an alternative.
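
To make the memory point concrete, here is a small sketch comparing a list comprehension with a generator expression, which uses the same syntax with parentheses instead of brackets. The exact sizes reported by sys.getsizeof() vary by Python version and platform, so treat them as indicative only.

```python
import sys

# The list comprehension builds every element up front.
squares_list = [i * i for i in range(1000)]

# The generator expression produces elements lazily, one at a time.
squares_gen = (i * i for i in range(1000))

# getsizeof() reports the container itself, not the integers it references.
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

# Both give the same total; the generator never holds all values at once.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)  # True
```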

#### Watch Out for Nested Comprehensions

Comprehensions can be nested to create combinations of lists, dictionaries, and sets within a collection. For example, say a climate laboratory is tracking the high temperature in five different cities for the first week of June. The perfect data structure for storing this data could be a Python list comprehension nested within a dictionary comprehension:

In [21]:
cities = ['Austin', 'Tacoma', 'Topeka', 'Sacramento', 'Charlotte']

In [22]:
temps = {city: [0 for _ in range(7)] for city in cities}

In [23]:
temps

{'Austin': [0, 0, 0, 0, 0, 0, 0],
 'Tacoma': [0, 0, 0, 0, 0, 0, 0],
 'Topeka': [0, 0, 0, 0, 0, 0, 0],
 'Sacramento': [0, 0, 0, 0, 0, 0, 0],
 'Charlotte': [0, 0, 0, 0, 0, 0, 0]}

You create the outer collection temps with a dictionary comprehension. The expression is a key-value pair, which contains yet another comprehension. This code will quickly generate a list of data for each city in cities.

Nested lists are a common way to create matrices, which are often used for mathematical purposes. Take a look at the code block below:

In [24]:
matrix = [[i for i in range(5)] for _ in range(6)]

In [25]:
matrix

[[0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4]]

The outer list comprehension [... for _ in range(6)] creates six rows, while the inner list comprehension [i for i in range(5)] fills each of these rows with values.
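
One reason to build a matrix with a nested comprehension rather than with list multiplication is aliasing. The sketch below contrasts the two; the variable names are made up for this illustration.

```python
# Multiplying the outer list repeats references to ONE inner list.
aliased = [[0] * 5] * 6
aliased[0][0] = 99
print(aliased[1][0])  # 99, because every row is the same object

# The comprehension evaluates the inner expression once per row,
# so each row is an independent list.
independent = [[0 for _ in range(5)] for _ in range(6)]
independent[0][0] = 99
print(independent[1][0])  # 0
```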

So far, the purpose of each nested comprehension is pretty intuitive. However, there are other situations, such as flattening nested lists, where the logic arguably makes your code more confusing. Take this example, which uses a nested list comprehension to flatten a matrix:

In [26]:
matrix = [
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
]

In [27]:
flat = [num for row in matrix for num in row]

In [28]:
flat

[0, 0, 0, 1, 1, 1, 2, 2, 2]

The code to flatten the matrix is concise, but it may not be so intuitive to understand how it works. On the other hand, if you were to use for loops to flatten the same matrix, then your code would be much more straightforward:

In [29]:
matrix = [
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
]
flat = []
for row in matrix:
    for num in row:
        flat.append(num)

flat

[0, 0, 0, 1, 1, 1, 2, 2, 2]

Now you can see that the code traverses one row of the matrix at a time, pulling out all the elements in that row before moving on to the next one.

While the single-line nested list comprehension might seem more Pythonic, what’s most important is to write code that your team can easily understand and modify. When you choose your approach, you’ll have to make a judgment call based on whether you think the comprehension helps or hurts readability.
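
If neither the nested comprehension nor the explicit loops feel right, the standard library offers a third option. This sketch uses itertools.chain.from_iterable(), which is not part of the comparison above, to flatten the same matrix.

```python
from itertools import chain

matrix = [
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
]

# chain.from_iterable() walks each row in turn and yields its elements lazily;
# list() collects them into a flat list.
flat = list(chain.from_iterable(matrix))
print(flat)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```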

#### Profile to Optimize Performance

So, which approach is faster? Should you use list comprehensions or one of their alternatives? Rather than adhere to a single rule that’s true in all cases, it’s more useful to ask yourself whether or not performance matters in your specific circumstance. If not, then it’s usually best to choose whatever approach leads to the cleanest code!

If you’re in a scenario where performance is important, then it’s typically best to profile different approaches and listen to the data. timeit is a useful library for timing how long it takes chunks of code to run. You can use timeit to compare the runtime of map(), for loops, and list comprehensions:

In [30]:
import random
import timeit
TAX_RATE = .08
txns = [random.randrange(100) for _ in range(100000)]
def get_price(txn):
    return txn * (1 + TAX_RATE)

In [31]:
def get_prices_with_map():
    return list(map(get_price, txns))

def get_prices_with_comprehension():
    return [get_price(txn) for txn in txns]

def get_prices_with_loop():
    prices = []
    for txn in txns:
        prices.append(get_price(txn))
    return prices

In [32]:
timeit.timeit(get_prices_with_map, number=100)

1.6914971999940462

In [33]:
timeit.timeit(get_prices_with_comprehension, number=100)

2.2438498999981675

In [34]:
timeit.timeit(get_prices_with_loop, number=100)

2.796779600001173

Here, you define three functions that each use a different approach for creating a list. Then, you tell timeit to run each of those functions 100 times. timeit returns the total time it took to run those 100 executions.

As the code demonstrates, the biggest difference is between the loop-based approach and map(), with the loop taking about 65% longer to execute (roughly 2.80 seconds versus 1.69 seconds). Whether or not this matters depends on the needs of your application.

Whenever you have to choose a list creation method, try multiple implementations and consider what’s easiest to read and understand in your specific scenario. If performance is important, then you can use profiling tools to give you actionable data instead of relying on hunches or guesses about what works the best.
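
A single timeit run can be noisy. One common refinement, sketched below, is to use timeit.repeat() and look at the fastest of several runs. The list size, the number and repeat values, and the helper names are arbitrary choices for this illustration, and the timings will differ from machine to machine.

```python
import random
import timeit

TAX_RATE = 0.08
txns = [random.randrange(100) for _ in range(10_000)]

def get_price(txn):
    return txn * (1 + TAX_RATE)

def with_map():
    return list(map(get_price, txns))

def with_comprehension():
    return [get_price(txn) for txn in txns]

def with_loop():
    prices = []
    for txn in txns:
        prices.append(get_price(txn))
    return prices

# repeat() returns one total per repetition; the minimum is the one
# least disturbed by other activity on the machine.
timings = {
    func.__name__: min(timeit.repeat(func, number=10, repeat=3))
    for func in (with_map, with_comprehension, with_loop)
}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s")
```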

## The Map Function

The ability to pass functions as arguments is not unique to Python, and it is far from new: languages in the Lisp family have supported it for decades. Languages in which functions can be passed around like any other value are said to have first-class functions, and functions that accept other functions as arguments are called higher-order functions. Any language that contains first-class functions can be written in a functional style.

A set of important higher-order functions is commonly used within the functional paradigm. These functions take in a Python iterable and, like sorted() with its key argument, apply a function to each of its elements. Over the next few sections, we will examine each of these functions, but they all follow the general form of `function_name(function_to_apply, iterable_of_elements)`.

The first function we'll work with is the map() function. The map() function takes in a function and an iterable (e.g., a list) and creates a new iterable object, a special map object. The new object yields the result of applying the function to every element.

    # Simplified sketch of map(); the real built-in returns
    # a lazy `map` object and accepts several iterables.
    def simple_map(func, seq):
        # Return a lazy generator with
        # the function applied to every
        # element.
        return (
            func(x)
            for x in seq
        )

Here's how we could use map() to add 10 or 20 to every element in a list:

In [45]:
values = [1, 2, 3, 4, 5]

# Note: We convert the returned map object to
# a list data structure.
add_10 = list(map(lambda x: x + 10, values))
add_20 = list(map(lambda x: x + 20, values))

print(add_10)

[11, 12, 13, 14, 15]


In [46]:
print(add_20)

[21, 22, 23, 24, 25]


Note that it's important to convert the return value of map() to a list if you expect it to behave like one. The returned map object differs from a list in two ways: first, printing it does not show its items, and second, you can only iterate over it once.
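
Both points can be seen in a short sketch. The exact memory address in the printed representation will differ on every run.

```python
values = [1, 2, 3, 4, 5]
mapped = map(lambda x: x + 10, values)

# Printing the map object shows its type and address, not its items.
print(mapped)  # something like <map object at 0x...>

# The first pass consumes the iterator...
first_pass = list(mapped)
# ...so a second pass finds nothing left.
second_pass = list(mapped)

print(first_pass)   # [11, 12, 13, 14, 15]
print(second_pass)  # []
```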

### Exercise

- Map each line in the lines variable to its corresponding IP address:
    - Split the line on whitespace.
    - Return the first element of the split line.
- Cast the mapped object to a list, and assign it to the ip_addresses variable.
- Print the first 10 IPs from the ip_addresses variable.

In [47]:
with open('data/example_log.txt') as file:
    lines = file.readlines()
    ip_addresses = list(map(lambda x: x.split()[0], lines))
    print(ip_addresses[:10])

['200.155.108.44', '36.139.255.202', '50.112.115.219', '10.0.25.26', '233.154.7.24', '241.220.141.78', '191.198.138.97', '172.40.187.145', '225.119.46.80', '97.218.117.229']


### Alternatives to map

The built-in function map() takes a function as a first argument and applies it to each of the elements of its second argument, an iterable. Examples of iterables are strings, lists, and tuples. For more information on iterables and iterators, check out Iterables and Iterators.

map() returns an iterator corresponding to the transformed collection. As an example, if you wanted to transform a list of strings to a new list with each string capitalized, you could use map(), as follows:

In [48]:
list(map(lambda x: x.capitalize(), ['cat', 'dog', 'cow']))

['Cat', 'Dog', 'Cow']

You need to invoke list() to convert the iterator returned by map() into an expanded list that can be displayed in the Python shell interpreter.

Using a list comprehension eliminates the need for defining and invoking the lambda function:

In [49]:
[x.capitalize() for x in ['cat', 'dog', 'cow']]

['Cat', 'Dog', 'Cow']

In [50]:
# comprehension alternative to the exercise above
with open('data/example_log.txt') as file:
    lines = file.readlines()
    ip_addresses = [line.split()[0] for line in lines]
    print(ip_addresses[:10])

['200.155.108.44', '36.139.255.202', '50.112.115.219', '10.0.25.26', '233.154.7.24', '241.220.141.78', '191.198.138.97', '172.40.187.145', '225.119.46.80', '97.218.117.229']


## The Filter Function

The second function we'll work with is the filter() function. The filter() function takes in a function that should return a bool value and an iterable, and creates a new iterable object (this time, a special filter object). The new filter object yields only the elements for which the function returned a truthy value.

    # Simplified sketch of filter(); the real built-in
    # returns a lazy `filter` object.
    def simple_filter(evaluate, seq):
        # Return a lazy generator with only the
        # elements for which evaluate() is truthy.
        return (
            x for x in seq
            if evaluate(x)
        )

Here's how we could filter odd or even values from a list:

In [51]:
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Note: We convert the returned filter object to
# a list data structure.
even = list(filter(lambda x: x % 2 == 0, values))
odd = list(filter(lambda x: x % 2 == 1, values))

print(even)

[2, 4, 6, 8, 10]


In [52]:
print(odd)

[1, 3, 5, 7, 9]
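
filter() does not strictly require the function to return a bool: any truthy or falsy result works, and passing None as the function keeps the elements that are themselves truthy. A small sketch, with made-up sample data:

```python
# Hypothetical sample data mixing truthy and falsy values.
raw_fields = ["10.0.25.26", "", "GET", None, 0, "200"]

# With None as the function, filter() keeps only truthy elements.
non_empty = list(filter(None, raw_fields))
print(non_empty)  # ['10.0.25.26', 'GET', '200']

# The comprehension equivalent.
also_non_empty = [field for field in raw_fields if field]
print(also_non_empty == non_empty)  # True
```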


### Exercise

- Filter the ip_addresses list down to the IP addresses whose first octet is less than or equal to 20.
- Cast the filtered object to a list, and assign it to the filtered_ips variable.
- Print the first 10 elements of the filtered_ips variable.

In [53]:
# solution using map() and filter()
with open('data/example_log.txt') as file:
    lines = file.readlines()
    ip_addresses = list(map(lambda x: x.split()[0], lines))
    filtered_ips = list(filter(lambda x: int(x.split('.')[0]) <= 20, ip_addresses))
    print(filtered_ips[:10])

['10.0.25.26', '4.31.18.29', '10.3.25.58', '5.237.70.145', '4.186.143.85', '7.205.198.134', '2.98.108.99', '20.123.163.219', '17.192.186.123', '19.137.101.141']


### Alternatives to filter

The built-in function filter(), another classic functional construct, can be converted into a list comprehension. It takes a predicate as a first argument and an iterable as a second argument. It builds an iterator containing all the elements of the initial collection that satisfy the predicate function. Here’s an example that keeps only the even numbers in a given range of integers:

In [54]:
list(filter(lambda x: x%2 == 0, range(11)))

[0, 2, 4, 6, 8, 10]

Note that filter() returns an iterator, hence the need to invoke the built-in type list that constructs a list given an iterator.

The implementation leveraging the list comprehension construct gives the following:

In [55]:
[x for x in range(11) if x%2 == 0]

[0, 2, 4, 6, 8, 10]

In [56]:
# comprehension alternative to the exercise above
with open('data/example_log.txt') as file:
    lines = file.readlines()
    ip_addresses = [line.split()[0] for line in lines]
    filtered_ips = [ip for ip in ip_addresses if int(ip.split('.')[0]) <= 20]
    print(filtered_ips[:10])

['10.0.25.26', '4.31.18.29', '10.3.25.58', '5.237.70.145', '4.186.143.85', '7.205.198.134', '2.98.108.99', '20.123.163.219', '17.192.186.123', '19.137.101.141']


## The Reduce Function

The last function we'll look at is the reduce() function from the functools module. The reduce() function takes in an iterable, and then reduces the iterable to a single value. reduce() is different from filter() and map() because it takes in a function that has two input values: the value accumulated so far and the next element.

Here's an example of how we can use reduce() to sum all elements in a list.

In [57]:
from functools import reduce

values = [1, 2, 3, 4]

summed = reduce(lambda a, b: a + b, values)
print(summed)

10


<img alt="diagram of reduce" src="https://dq-content.s3.amazonaws.com/263/s5_reduce_function.svg">
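
To see the intermediate values that reduce() passes along, one option is itertools.accumulate(), which applies the same two-argument function but yields every running result. This is a sketch for illustration; accumulate() is not used elsewhere in this tutorial.

```python
from functools import reduce
from itertools import accumulate

values = [1, 2, 3, 4]

def add(a, b):
    return a + b

# accumulate() yields each running total: 1, 1+2, 3+3, 6+4.
running_totals = list(accumulate(values, add))
print(running_totals)  # [1, 3, 6, 10]

# reduce() returns only the final value of that sequence.
summed = reduce(add, values)
print(summed)  # 10
```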

An interesting note to make is that you do not have to operate on the second value in the lambda expression. For example, you can write a function that always returns the first value of an iterable:

In [58]:
from functools import reduce

values = [1, 2, 3, 4, 5]

# By convention, we add `_` as a placeholder for an input
# we do not use.
first_value = reduce(lambda a, _: a, values)
print(first_value)

1


### Exercise

- Using reduce, count the total number of elements in lines and in filtered_ips.
- Find the ratio between filtered_ips and lines, and assign the value to ratio.
- Print the ratio variable.

In [59]:
from functools import reduce

with open('data/example_log.txt') as file:
    lines = file.readlines()
    ip_addresses = list(map(lambda x: x.split()[0], lines))
    filtered_ips = list(filter(lambda x: int(x.split('.')[0]) <= 20, ip_addresses))
    # Start the count at 0 and add 1 per element; this also works for
    # empty and single-element lists.
    count_all = reduce(lambda count, _: count + 1, lines, 0)
    count_filtered = reduce(lambda count, _: count + 1, filtered_ips, 0)
    ratio = count_filtered / count_all

print(ratio)

0.07770304186326674


### Alternatives to reduce

Since Python 3, reduce() has gone from a built-in function to a functools module function. Like map() and filter(), it takes a function as its first argument and an iterable as its second. It may also take an initializer as a third argument that is used as the initial value of the resulting accumulator. For each element of the iterable, reduce() applies the function to the accumulator and that element, and it returns the accumulated result once the iterable is exhausted.

To apply reduce() to a list of pairs and calculate the sum of the first item of each pair, you could write this:

In [60]:
import functools
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
functools.reduce(lambda acc, pair: acc + pair[0], pairs, 0)

6
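
The initializer also matters when the iterable may be empty. As a sketch:

```python
from functools import reduce

empty = []

# With an initializer, an empty iterable simply returns the initializer.
total = reduce(lambda acc, x: acc + x, empty, 0)
print(total)  # 0

# Without one, reduce() has no starting value and raises TypeError.
try:
    reduce(lambda acc, x: acc + x, empty)
except TypeError as error:
    outcome = f"TypeError: {error}"
else:
    outcome = "no error"
print(outcome)
```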

A more idiomatic approach passes a generator expression as the argument to sum():



In [61]:
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]

In [62]:
sum(x[0] for x in pairs)

6

A slightly different and possibly cleaner solution removes the need to explicitly access the first element of the pair and uses unpacking instead:

In [63]:
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
sum(x for x, _ in pairs)

6

The use of underscore (_) is a Python convention indicating that you can ignore the second value of the pair.

Because the generator expression is the only argument passed to sum(), it does not need its own set of parentheses.
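
When sum() receives a second argument, the generator expression does need its own parentheses. A brief sketch, where the start value of 10 is chosen arbitrarily:

```python
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]

# Sole argument: the call's parentheses double as the generator's.
total = sum(x for x, _ in pairs)

# With a second argument (a start value), the generator expression
# must be wrapped in its own parentheses.
total_from_10 = sum((x for x, _ in pairs), 10)

print(total, total_from_10)  # 6 16
```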

In [64]:
# comprehension alternative to the exercise above
with open('data/example_log.txt') as file:
    lines = file.readlines()
    ip_addresses = [line.split()[0] for line in lines]
    filtered_ips = [ip for ip in ip_addresses if int(ip.split('.')[0]) <= 20]
    count_all = sum(1 for line in lines)
    count_filtered = sum(1 for filtered_ip in filtered_ips)
    ratio = count_filtered / count_all
    print(ratio)

0.07770304186326674


In [65]:
# alternative using len()
with open('data/example_log.txt') as file:
    lines = file.readlines()
    ip_addresses = [line.split()[0] for line in lines]
    filtered_ips = [ip for ip in ip_addresses if int(ip.split('.')[0]) <= 20]
    ratio = len(filtered_ips) / len(lines)
    print(ratio)

0.07770304186326674


## Example: Parsing IP Addresses

In [66]:
!head -n 3 data/example_log.txt

200.155.108.44 - - [30/Nov/2017:11:59:54 +0000] "PUT /categories/categories/categories HTTP/1.1" 401 963 "http://www.yates.com/list/tags/category/" "Mozilla/5.0 (Windows CE) AppleWebKit/5332 (KHTML, like Gecko) Chrome/13.0.864.0 Safari/5332"
36.139.255.202 - - [30/Nov/2017:11:59:54 +0000] "PUT /search HTTP/1.1" 404 171 "https://www.butler.org/main/tag/category/home.php" "Mozilla/5.0 (Macintosh; PPC Mac OS X 10_5_0) AppleWebKit/5332 (KHTML, like Gecko) Chrome/15.0.813.0 Safari/5332"
50.112.115.219 - - [30/Nov/2017:11:59:54 +0000] "POST /main/blog HTTP/1.1" 404 743 "http://deleon-bender.com/categories/category.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_5_5 rv:2.0; apn-IN) AppleWebKit/531.48.1 (KHTML, like Gecko) Version/4.0 Safari/531.48.1"


- Read the file and keep only the lines that have private IP addresses.

In [67]:
with open('data/example_log.txt') as file:
    lines = file.readlines()

> [ipaddress — IPv4/IPv6 manipulation library](https://docs.python.org/3/library/ipaddress.html)

is_private: True if the address is allocated for private networks. See iana-ipv4-special-registry (for IPv4) or iana-ipv6-special-registry (for IPv6).
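
As a quick illustration of these attributes, the sketch below checks three addresses picked for this example: one from a private range, one public, and the IPv4 loopback address.

```python
import ipaddress

# Addresses chosen for illustration only.
private_ip = ipaddress.ip_address("10.0.25.26")
public_ip = ipaddress.ip_address("200.155.108.44")
loopback_ip = ipaddress.ip_address("127.0.0.1")

print(private_ip.is_private)  # True
print(public_ip.is_private)   # False

# The loopback address also reports is_private as True, which is why a
# stricter check may want to exclude it explicitly.
print(loopback_ip.is_private, loopback_ip.is_loopback)  # True True
```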

In [68]:
import ipaddress

def is_ip_private(ip):
    ip_object = ipaddress.ip_address(ip)
    return ip_object.is_private and not ip_object.is_reserved  and not ip_object.is_loopback

In [69]:
status_codes = []
for log in lines:
    log_splited = log.split()
    ip = log_splited[0]
    if is_ip_private(ip):
        status_code = int(log_splited[8])
        status_codes.append(status_code)

In [70]:
print(status_codes)

[404, 401, 404, 404, 200, 200, 404, 200, 200, 200, 200, 200, 404, 401, 401, 401, 404, 200, 401, 200, 200, 200, 200, 401, 200, 200, 401, 401, 404, 401, 404, 200, 401, 401, 200, 401, 200, 200, 401, 200, 404, 404, 200, 401, 200, 200, 404, 404, 404, 200, 200, 401, 401, 200, 200, 401, 404, 401, 404, 401, 200, 200]
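
Since this tutorial is about comprehensions, the same loop can also be sketched as one. The three log lines below are shortened, made-up samples in the same layout as example_log.txt (they are not taken from the file), so the output differs from the result above.

```python
import ipaddress

def is_ip_private(ip):
    ip_object = ipaddress.ip_address(ip)
    return ip_object.is_private and not ip_object.is_reserved and not ip_object.is_loopback

# Hypothetical sample lines, not real data from the log file.
lines = [
    '10.0.25.26 - - [30/Nov/2017:11:59:54 +0000] "GET /a HTTP/1.1" 200 10',
    '200.155.108.44 - - [30/Nov/2017:11:59:54 +0000] "PUT /b HTTP/1.1" 401 963',
    '192.168.1.7 - - [30/Nov/2017:11:59:55 +0000] "GET /c HTTP/1.1" 404 171',
]

# map(str.split, lines) splits each line once; the comprehension then
# filters on the IP (field 0) and keeps the status code (field 8).
status_codes = [
    int(fields[8])
    for fields in map(str.split, lines)
    if is_ip_private(fields[0])
]
print(status_codes)  # [200, 404]
```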


In [71]:
from collections import Counter

In [72]:
status_counter = Counter(status_codes)

In [73]:
status_counter.most_common()

[(200, 28), (401, 19), (404, 15)]

In [74]:
for code, count in status_counter.most_common():
    print(f'Status code {code}: {count}x')

Status code 200: 28x
Status code 401: 19x
Status code 404: 15x


### Pandas Alternative

In [75]:
import ipaddress

def is_ip_private(ip):
    ip_object = ipaddress.ip_address(ip)
    return ip_object.is_private and not ip_object.is_reserved  and not ip_object.is_loopback

In [76]:
import pandas as pd
df = pd.read_csv('data/example_log.txt', sep=' ', header=None)
df.drop(labels=[1,2,3,4,5,7,8,9], axis=1, inplace=True)
df.rename(columns={0:'ip', 6:'status_code'}, inplace=True)
df['is_private'] = df['ip'].apply(is_ip_private)
status_codes_private_ips = df[df['is_private']].copy()

In [77]:
result = status_codes_private_ips['status_code'].value_counts().to_dict()
result

{200: 28, 401: 19, 404: 15}

# Explanation

List Comprehension

In [4]:
prices = [1.09, 23.56, 57.84, 4.56, 6.78]
TAX_RATE = 0.22


def get_price_with_tax(txn):
    return round(txn * (1 + TAX_RATE), 2)


prices_with_tax = []
for price in prices:
    prices_with_tax.append(get_price_with_tax(price))

print(prices_with_tax)

[1.33, 28.74, 70.56, 5.56, 8.27]


In [5]:
# alternative - list comprehension
prices = [1.09, 23.56, 57.84, 4.56, 6.78]
TAX_RATE = 0.22


def get_price_with_tax(txn):
    return round(txn * (1 + TAX_RATE), 2)


prices_with_tax = [get_price_with_tax(price) for price in prices]
print(prices_with_tax)

[1.33, 28.74, 70.56, 5.56, 8.27]


In [9]:
# alternative - map function
prices = [1.09, 23.56, 57.84, 4.56, 6.78]
TAX_RATE = 0.22


def get_price_with_tax(txn):
    return round(txn * (1 + TAX_RATE), 2)


final_prices = list(map(get_price_with_tax, prices))
print(final_prices)

[1.33, 28.74, 70.56, 5.56, 8.27]


    new_list = [expression for member in iterable]
    new_list = [expression for member in iterable (if conditional)]

In [11]:
sentence = "the rocket came back from mars"

vowels = [letter for letter in sentence if letter in "aeiou"]
print(vowels)

['e', 'o', 'e', 'a', 'e', 'a', 'o', 'a']


In [12]:
original_prices = [1.25, -9.45, 10.22, 3.78, -5.92, 1.16]

non_neg_prices = [price for price in original_prices if price > 0]
print(non_neg_prices)


[1.25, 10.22, 3.78, 1.16]


In [14]:
non_neg_prices = sorted([price for price in original_prices if price > 0])
print(non_neg_prices)


[1.16, 1.25, 3.78, 10.22]


    new_list = [expression (if conditional) for member in iterable]

In [15]:
original_prices = [1.25, -9.45, 10.22, 3.78, -5.92, 1.16]
prices = [price if price > 0 else 0 for price in original_prices]
print(prices)


[1.25, 0, 10.22, 3.78, 0, 1.16]


In [16]:
original_prices = [1.25, -9.45, 10.22, 3.78, -5.92, 1.16]

articles = {f"article_{number}": price for number, price in enumerate(original_prices)}
print(articles)


{'article_0': 1.25, 'article_1': -9.45, 'article_2': 10.22, 'article_3': 3.78, 'article_4': -5.92, 'article_5': 1.16}


In [18]:
list(enumerate(original_prices))

[(0, 1.25), (1, -9.45), (2, 10.22), (3, 3.78), (4, -5.92), (5, 1.16)]

In [None]:
logs = """
122.55.236.192 - - [09/Apr/2020:14:07:20 -0600] "GET /ply/bookplug.gif HTTP/1.1" 304 -
74.6.19.236 - - [09/Apr/2020:14:07:20 -0600] "GET /photos/u505/pages/IMG_1524.htm HTTP/1.0" 404 133
24.7.210.64 - - [09/Apr/2020:14:07:20 -0600] "GET /dynamic/03ProgramStructure.pdf HTTP/1.1" 304 -
81.103.63.40 - - [09/Apr/2020:14:07:21 -0600] "GET /photos/wind/pages/IMG_1262.htm HTTP/1.1" 404 133
189.144.107.121 - - [09/Apr/2020:14:07:21 -0600] "GET /cgi-bin/wiki.pl?SwigFaqMakeCheckFails HTTP/1.1" 200 2861
139.82.24.80 - - [09/Apr/2020:14:07:22 -0600] "GET /cgi-bin/wiki.pl?SwigFaqBuildErrorsRedHat HTTP/1.1" 200 2099
84.110.191.75 - - [09/Apr/2020:14:07:22 -0600] "GET /papers/Py96/python96.html HTTP/1.0" 200 22442
74.6.24.228 - - [09/Apr/2020:14:07:23 -0600] "GET /cgi-bin/wiki.pl?NameDirective HTTP/1.1" 200 1785
151.96.0.8 - - [09/Apr/2020:14:07:23 -0600] "GET /dynamic/05ObjectModel.pdf HTTP/1.0" 200 731143
130.79.100.39 - - [09/Apr/2020:14:07:23 -0600] "GET /ply/ply-1.8.tar.gz HTTP/1.1" 200 12819
59.103.3.200 - - [09/Apr/2020:14:07:24 -0600] "GET /dynamic/assign2.html HTTP/1.1" 304 -
84.110.153.190 - - [09/Apr/2020:14:07:25 -0600] "GET /swill/about.html HTTP/1.0" 404 133
151.200.90.2 - - [09/Apr/2020:14:07:25 -0600] "GET /photos/u505/pages/IMG_1492.htm HTTP/1.0" 404 133
80.161.85.77 - - [09/Apr/2020:14:07:25 -0600] "GET /dynamic HTTP/1.1" 301 246
76.68.215.63 - - [09/Apr/2020:14:07:26 -0600] "GET /PLYTalk.pdf HTTP/1.1" 404 133
"""

In [None]:
def extract_ips_from_server_logs(logs: str) -> list[str]:
    pass
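
One possible way to fill in the stub above, offered as a sketch rather than the intended solution: take the first whitespace-separated field of every non-empty line. The function name extract_ips_sketch is hypothetical, and the three-line sample (copied from the start of logs) is used only to keep the example self-contained.

```python
def extract_ips_sketch(logs: str) -> list[str]:
    # splitlines() breaks the text into lines; blank lines are skipped
    # so that split()[0] never runs on an empty list.
    return [line.split()[0] for line in logs.splitlines() if line.strip()]

sample_logs = """
122.55.236.192 - - [09/Apr/2020:14:07:20 -0600] "GET /ply/bookplug.gif HTTP/1.1" 304 -
74.6.19.236 - - [09/Apr/2020:14:07:20 -0600] "GET /photos/u505/pages/IMG_1524.htm HTTP/1.0" 404 133
24.7.210.64 - - [09/Apr/2020:14:07:20 -0600] "GET /dynamic/03ProgramStructure.pdf HTTP/1.1" 304 -
"""

print(extract_ips_sketch(sample_logs))
# ['122.55.236.192', '74.6.19.236', '24.7.210.64']
```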