# 2.6 List Comprehensions

A common task is processing items in a list. This section introduces list comprehensions, a powerful tool for doing just that.

**Creating new lists**

A list comprehension creates a new list by applying an operation to each element of a sequence.

In [1]:
a = [1,2,3,4,5]
b = [2*x for x in a]
b

[2, 4, 6, 8, 10]

Another example:

In [2]:
names = ['Elwood', 'Jake']
a = [name.lower() for name in names]
a

['elwood', 'jake']

The general syntax is:

`[ <expression> for <variable_name> in <sequence> ]`

**Filtering**

You can also filter during the list comprehension.

In [3]:
a = [1, -5, 4, 2, -2, 10]
b = [2*x for x in a if x > 0]
b

[2, 8, 4, 20]

**Use cases**

List comprehensions are hugely useful. For example, you can collect the values of a specific dictionary field:

stocknames = [s['name'] for s in stocks]

You can perform database-like queries on sequences.

In [None]:
a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50 ]

You can also combine a list comprehension with a sequence reduction:

In [None]:
cost = sum([s['shares']*s['price'] for s in stocks])
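The three use cases above refer to a `stocks` list that this section never defines. Here is a minimal sketch of all three, assuming a small hand-written `stocks` list. The sample records and the query thresholds are illustrative choices, not data from a file.

```python
# Hypothetical sample data: a list of dictionaries, one per holding.
stocks = [
    {'name': 'AA', 'shares': 100, 'price': 32.20},
    {'name': 'IBM', 'shares': 50, 'price': 91.10},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
]

# Collect the values of one field from every record.
stocknames = [s['name'] for s in stocks]

# Query: keep only the records that satisfy both conditions.
# (Thresholds chosen to suit the sample data above.)
big = [s for s in stocks if s['price'] > 80 and s['shares'] > 100]

# Reduction: total cost across all records.
cost = sum([s['shares'] * s['price'] for s in stocks])

print(stocknames)       # ['AA', 'IBM', 'CAT']
print(big)              # only the CAT record passes both tests
print(round(cost, 2))
```

Note that the query returns the original dictionaries, not copies, so each element of `big` is the same object that lives in `stocks`.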

**General Syntax**

`[ <expression> for <variable_name> in <sequence> if <condition>]`

What it means:

In [None]:
result = []
for variable_name in sequence:
    if condition:
        result.append(expression)
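To check the expansion concretely, the sketch below runs the filtering example from earlier in this section both ways and compares the results. The variable names `short` and `result` are arbitrary.

```python
sequence = [1, -5, 4, 2, -2, 10]

# Comprehension form: <expression> is 2*x, <condition> is x > 0.
short = [2 * x for x in sequence if x > 0]

# Equivalent loop form, following the expansion above.
result = []
for x in sequence:
    if x > 0:
        result.append(2 * x)

print(short)             # [2, 8, 4, 20]
print(short == result)   # True
```

The condition is tested before the expression is evaluated, so the expression never runs for rejected items.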

**Historical Digression**

List comprehensions come from math (set-builder notation).

a = [ x * x for x in s if x > 0 ] # Python

a = { x^2 | x ∈ s, x > 0 }         # Math

It is also implemented in several other languages. Most coders probably aren't thinking about their math class though. So, it's fine to view it as a cool list shortcut.

## Exercises

Start by running your report.py program so that you have the portfolio of stocks loaded in the interactive mode.

Now, at the Python interactive prompt, type statements to perform the operations described below. These operations perform various kinds of data reductions, transforms, and queries on the portfolio data.

**Exercise 2.19: List comprehensions**

Try a few simple list comprehensions just to become familiar with the syntax.

In [5]:
nums = [1,2,3,4]
squares = [x*x for x in nums]
squares

[1, 4, 9, 16]

In [6]:
twice = [2*x for x in nums if x > 2]
twice

[6, 8]

Notice how the list comprehensions are creating a new list with the data suitably transformed or filtered.

**Exercise 2.20: Sequence Reductions**

Compute the total cost of the portfolio using a single Python statement.

In [6]:
import csv

def read_portfolio(filename):
    '''
    Read a stock portfolio file into a list of dictionaries with keys
    name, shares, and price
    '''
    portfolio = []

    with open(filename, 'rt') as f:
        rows = csv.reader(f)
        headers = next(rows)    # Skip the header row

        for row in rows:
            stock = {
                'name': row[0],
                'shares': int(row[1]),
                'price': float(row[2]),
            }
            portfolio.append(stock)
    return portfolio

def read_prices(filename):
    
    '''
    Read a CSV file of price data into a dict mapping names to prices
    '''
    prices = {}

    with open(filename, 'rt') as f:
        rows = csv.reader(f)
        for row in rows:   
            try:
                prices[row[0]] = float(row[1])
            except IndexError:
                pass
    return prices

def make_report(portfolio, prices):
    """
    Takes a list of stocks and dictionary of prices as input
    returns a list of tuples containing the rows of table (name, shares, price, change)
    """

    rows = []

    for stock in portfolio:
        current_price = prices[stock['name']]    # Current price, looked up by stock name
        change = current_price - stock['price']  # Change relative to the price in the portfolio
        summary = (stock['name'], stock['shares'], current_price, change)
        rows.append(summary)
    
    return rows

# Read data files and create the report data        

portfolio = read_portfolio('Data/portfolio.csv')
prices = read_prices('Data/prices.csv')

In [4]:
import os
os.chdir(r"C:\Users\Fadinda Shafira\Documents\KALBE\Python\practical-python\Work")

In [7]:
portfolio = read_portfolio('Data/portfolio.csv')
cost = sum([s['shares'] * s['price'] for s in portfolio])
cost

44671.15

After you have done that, show how you can compute the current value of the portfolio using a single statement.

In [13]:
value = sum([ s['shares'] * prices[s['name']] for s in portfolio])
value

28686.1

Both of the above operations are examples of a map-reduction. The list comprehension maps an operation across the list:

In [14]:
[s['shares'] * s['price'] for s in portfolio]

[3220.0000000000005,
 4555.0,
 12516.0,
 10246.0,
 3835.1499999999996,
 3254.9999999999995,
 7044.0]

The sum() function is then performing a reduction across the result:

In [15]:
sum(_)

44671.15
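The same two steps can be spelled out with the built-in `map()` and `functools.reduce()`. This is only a sketch to make the "map" and "reduce" roles explicit; the comprehension plus `sum()` shown above is the form this section uses. The `portfolio` list is written out by hand here so the example runs on its own.

```python
from functools import reduce
from operator import add

# Portfolio records written out by hand so the example is self-contained.
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.20},
    {'name': 'IBM', 'shares': 50, 'price': 91.10},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
    {'name': 'MSFT', 'shares': 200, 'price': 51.23},
    {'name': 'GE', 'shares': 95, 'price': 40.37},
    {'name': 'MSFT', 'shares': 50, 'price': 65.10},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

# Map step: apply one operation to every record.
costs = list(map(lambda s: s['shares'] * s['price'], portfolio))

# Reduce step: collapse the mapped values into a single number.
total = reduce(add, costs)

print(round(total, 2))   # 44671.15
```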

With this knowledge, you are now ready to go launch a big-data startup company.

**Exercise 2.21: Data Queries**

Try the following examples of various data queries.

First, a list of all portfolio holdings with more than 100 shares.

In [22]:
more100 = [s for s in portfolio if s['shares'] > 100]
more100

[{'name': 'CAT', 'shares': 150, 'price': 83.44},
 {'name': 'MSFT', 'shares': 200, 'price': 51.23}]

All portfolio holdings for MSFT and IBM stocks:

In [11]:
x = [s for s in portfolio if s['name'] == 'MSFT' or s['name'] == 'IBM']
x

[{'name': 'IBM', 'shares': 50, 'price': 91.1},
 {'name': 'MSFT', 'shares': 200, 'price': 51.23},
 {'name': 'MSFT', 'shares': 50, 'price': 65.1},
 {'name': 'IBM', 'shares': 100, 'price': 70.44}]

In [12]:
msfitbm = [ s for s in portfolio if s['name'] in {'MSFT', 'IBM'}]
msfitbm

[{'name': 'IBM', 'shares': 50, 'price': 91.1},
 {'name': 'MSFT', 'shares': 200, 'price': 51.23},
 {'name': 'MSFT', 'shares': 50, 'price': 65.1},
 {'name': 'IBM', 'shares': 100, 'price': 70.44}]

A list of all portfolio holdings that cost more than $10000.

In [14]:
costmore10000 = [s for s in portfolio if s['price'] * s['shares'] > 10000]
costmore10000

[{'name': 'CAT', 'shares': 150, 'price': 83.44},
 {'name': 'MSFT', 'shares': 200, 'price': 51.23}]

**Exercise 2.22: Data Extraction**

Show how you could build a list of tuples (name, shares) where name and shares are taken from portfolio.

In [15]:
name_shares = [ (s['name'], s['shares']) for s in portfolio]
name_shares

[('AA', 100),
 ('IBM', 50),
 ('CAT', 150),
 ('MSFT', 200),
 ('GE', 95),
 ('MSFT', 50),
 ('IBM', 100)]

If you change the square brackets (`[`, `]`) to curly braces (`{`, `}`), you get something known as a set comprehension. This gives you unique or distinct values.

For example, this determines the set of unique stock names that appear in portfolio:

In [18]:
uniquenames = { s['name'] for s in portfolio}
uniquenames

{'AA', 'CAT', 'GE', 'IBM', 'MSFT'}

If you specify key:value pairs, you can build a dictionary. For example, make a dictionary that maps the name of a stock to the total number of shares held.

In [27]:
holdings = { name: 0 for name in uniquenames}
holdings

{'IBM': 0, 'CAT': 0, 'MSFT': 0, 'AA': 0, 'GE': 0}

This latter feature is known as a dictionary comprehension. Let’s tabulate:

In [25]:
for s in portfolio:
    holdings[s['name']] += s['shares']

holdings

{'IBM': 150, 'CAT': 150, 'MSFT': 250, 'AA': 100, 'GE': 95}
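As an alternative sketch, the standard library's `collections.Counter` treats missing keys as zero, so the tabulation does not need the dictionary to be initialized first. This is one possible approach, not the method used in this exercise. The `portfolio` list is written out by hand so the example runs on its own.

```python
from collections import Counter

# Portfolio records written out by hand so the example is self-contained.
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.20},
    {'name': 'IBM', 'shares': 50, 'price': 91.10},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
    {'name': 'MSFT', 'shares': 200, 'price': 51.23},
    {'name': 'GE', 'shares': 95, 'price': 40.37},
    {'name': 'MSFT', 'shares': 50, 'price': 65.10},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

# A Counter returns 0 for keys it has not seen yet,
# so += works without setting up the keys beforehand.
holdings = Counter()
for s in portfolio:
    holdings[s['name']] += s['shares']

print(dict(holdings))
```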

Try this example that filters the prices dictionary down to only those names that appear in the portfolio:

In [28]:
portfolio_prices = { name: prices[name] for name in uniquenames}
portfolio_prices

{'IBM': 106.28, 'CAT': 35.46, 'MSFT': 20.89, 'AA': 9.22, 'GE': 13.48}
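The comprehension above raises a `KeyError` if a portfolio name has no entry in `prices`. A hedged variation adds a filter so that such names are skipped. The extra name `'XYZ'` below is hypothetical and exists only to show the effect.

```python
# Prices copied from the output above.
prices = {'IBM': 106.28, 'CAT': 35.46, 'MSFT': 20.89, 'AA': 9.22, 'GE': 13.48}

# 'XYZ' is a made-up name with no price, added for illustration.
uniquenames = {'AA', 'CAT', 'GE', 'IBM', 'MSFT', 'XYZ'}

# The if clause drops names that are missing from prices
# instead of raising KeyError.
portfolio_prices = {name: prices[name] for name in uniquenames if name in prices}

print(sorted(portfolio_prices))   # ['AA', 'CAT', 'GE', 'IBM', 'MSFT']
```

Whether to skip missing names silently or let the error surface depends on the application; skipping can hide bad data.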

**Exercise 2.23: Extracting Data From CSV Files**

Knowing how to use various combinations of list, set, and dictionary comprehensions can be useful in various forms of data processing. Here’s an example that shows how to extract selected columns from a CSV file.

First, read a row of header information from a CSV file:

In [29]:
# Task: Read a row of header information from a CSV file

import csv

f = open('Data/portfoliodate.csv')
rows = csv.reader(f)
headers = next(rows)
headers

['name', 'date', 'time', 'shares', 'price']

Next, define a variable that lists the columns that you actually care about:

In [30]:
# List the CSV columns to select
select = ['name', 'shares', 'price']

Now, locate the indices of the above columns in the source CSV file:

In [31]:
indices = [ headers.index(colname) for colname in select ]
indices

[0, 3, 4]

Finally, read a row of data and turn it into a dictionary using a dictionary comprehension:

In [32]:
row = next(rows)
record = {colname: row[index] for colname, index in zip(select, indices)} # dict-comprehension
record

{'name': 'AA', 'shares': '100', 'price': '32.20'}

If you’re feeling comfortable with what just happened, read the rest of the file:

In [33]:
portfolio = [ { colname: row[index] for colname, index in zip(select, indices) } for row in rows ]
portfolio

[{'name': 'IBM', 'shares': '50', 'price': '91.10'},
 {'name': 'CAT', 'shares': '150', 'price': '83.44'},
 {'name': 'MSFT', 'shares': '200', 'price': '51.23'},
 {'name': 'GE', 'shares': '95', 'price': '40.37'},
 {'name': 'MSFT', 'shares': '50', 'price': '65.10'},
 {'name': 'IBM', 'shares': '100', 'price': '70.44'}]
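Every value in the records above is still a string, because `csv.reader` does no type conversion. One possible extension, sketched below, pairs each selected column with a conversion function. The `types` list is an assumption introduced for this sketch, and `io.StringIO` stands in for the real file; the date and time fields are placeholders, not real data.

```python
import csv
import io

# Stand-in for the CSV file. The date and time values are placeholders.
data = io.StringIO(
    'name,date,time,shares,price\n'
    'AA,date-1,time-1,100,32.20\n'
    'IBM,date-2,time-2,50,91.10\n'
)

rows = csv.reader(data)
headers = next(rows)

select = ['name', 'shares', 'price']
types = [str, int, float]    # Assumed: one converter per selected column

# Locate each selected column in the header row.
indices = [headers.index(colname) for colname in select]

# Build one dictionary per row, converting each value as it is extracted.
portfolio = [
    {colname: func(row[index])
     for colname, func, index in zip(select, types, indices)}
    for row in rows
]

print(portfolio)
```

With the conversions applied, `shares` holds integers and `price` holds floats, so the records can be used directly in calculations such as the cost reduction in Exercise 2.20.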