# B08 Badge - List Comprehensions: Filter and Map (represent) Lists

note: This is a **JUPYTER NOTEBOOK**. It's a type of website where you can edit and run computer programs (code). You interact with it in your web browser, and you can find it via your Learn.

1. These blocks here are cells
2. There are **TEXT CELLS** like this one with explanations of concepts
3. and **CODE CELLS** with Python code (see below). Code cells have a ```In []``` written to the left
4. You can **RUN CODE CELLS** by clicking on them and pressing **Shift + Enter**. When you run a cell, the code in it is executed (it "happens": the computer does what you asked it to do). The results of your code will appear underneath the cell.
5. As we go through these lessons, please READ text cells, and RUN code cells
6. Good luck!

### List Comprehensions - a special Python syntax to represent ('comprehend') collections

LIST COMPREHENSION is a special, very useful syntax that makes it faster and cleaner to change and filter collections.
Very frequently you have a collection of some objects: strings, numbers, or even of dictionaries and you want to:

- **FILTER DATA**: only keep some values in the collection (just like you would filter coffee grains out from your coffee, using a paper filter that allows small particles through, and stops larger particles)
- **MAP DATA**: reduce complex data to simpler data or a different data format (just like a geographical map represents terrain in a simplified, flat paper format)

Think about it like a conveyor belt: things go in on one side, and slightly changed things come out on the other side.

One way to read a list comprehension is:


`
[ 
OUTPUT
INPUT
]
`

or in a longer format:

`
[ 
format for the OUTPUT which is using the ITEM somehow 
definition of the INPUT, specifying that we will use each ITEM in all ITEMS
]
`

The syntax for this is:

`result = [ my_output for one_item in all_items]`

for readability it's best to add an extra new line (Python ignores new lines inside the square brackets) and write it like this:

`result = [  my_output
            for one_item in all_items]`
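
One way to see what this syntax does is to compare it with an ordinary `for` loop that builds the same list. This is only a small sketch: the list `all_items` and the doubling `one_item * 2` are made-up examples, not part of the badge data.

```python
all_items = [1, 2, 3]  # made-up example data

# the long way: start with an empty list and append to it in a loop
result_from_loop = []
for one_item in all_items:
    result_from_loop.append(one_item * 2)

# the same thing written as a list comprehension
result = [ one_item * 2
           for one_item in all_items ]

print(result_from_loop)
print(result)
```

Both versions should print the same list; the comprehension just says it in fewer lines.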


## Example Case studies:

### Lowercase words

To compare a number of tweets, you might want to simplify all words to be lower case, so that the words `WOW`, `wow` and `Wow` are considered the same word. To do that, for each word in your List of words, you would want to represent it as a lowercase version of itself. This way `['WOW','wow', 'Wow']` would become `['wow','wow', 'wow']`, making it easier to count, analyse etc.

Btw. to represent a word as its lowercase version, you can use `some_word.lower()`, e.g. `Just` becomes `just`, while an already lowercase `think` stays `think`.

e.g. `"BAnanA".lower()` will return `"banana"`


In [None]:
# note: some_word.lower() returns a lowercase version of that word

words = ['THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

lower_case_words = [ word.lower() 
                      for word in words ]

print(lower_case_words)

### Represent words as only their lengths

Example: for each word in words, represent it with that word's length (eg. change 'THIS' into 4, 'it' into 2 etc)

In [None]:
words = ['THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

lengths_of_words = [ len(word) 
                      for word in words ]

print(lengths_of_words)

Notice: we are not filtering any data out, we keep everything. If the input is a list of ten somethings, then we return a list of ten something-elses. **We are Mapping the data, leaving the amount of it unchanged**. This detail will be important in a minute.
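
If you want to convince yourself of this, you can compare the lengths of the input and output lists. A quick sketch, using a shortened copy of the `words` list so it is easy to check by hand:

```python
words = ['THIS', 'is', 'The', 'MOST', 'FUNNY']  # shortened copy of the list above

lengths_of_words = [ len(word)
                     for word in words ]

print(len(words))             # how many items went in
print(len(lengths_of_words))  # how many items came out
```

The two numbers should be the same, because mapping never drops anything.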

### Simplify a list of dictionaries, to a list of just one of its values 

Example: take a collection of coffee orders and represent it as just the amount of sugars, for example to check if we still have enough sugar.

In [None]:
coffee_orders = [{'type': 'Latte', 'milk': 'Cow', 'sugar': 4}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2}]

number_of_sugars = [ order['sugar'] 
                      for order in coffee_orders ]

print(number_of_sugars)

### Represent a list of dictionaries as something else (e.g. a string)

Example: take a collection of coffee orders and represent them as a sentence, like "Latte coffee, with Cow milk and 4 sugars"

We will learn how to do this in a much cleaner way in one of the next badges about working with strings.

In [None]:
coffee_orders = [{'type': 'Latte', 'milk': 'Cow', 'sugar': 4}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2}]

orders_as_sentences = [ order['type'] + " coffee, with "+ order['milk']+" milk and "+ str(order['sugar'])+" sugars"
                      for order in coffee_orders ]

print(orders_as_sentences)

### In a List represent each word as... itself. This does not make much sense, but why not

Example: take a list of words and represent each word as itself. It will make sense in a minute why you would do that.

Notice that in the place where output format is specified, we just leave the `word` with no changes. Not `len(word)` or `word.lower()` - just a simple, unchanged `word`

In [None]:
words = ['THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

same_words = [ word
                for word in words ]

print(same_words)

# and finally: We can also FILTER the input list, and only keep some of its elements:

Optionally you can add a third line with a condition, that needs to be true for the item to be kept in the final result.


`
result = [  output
            input
            condition ]
`

The syntax becomes: 

`
result = [  output
            for one_item in all_items
            if condition]
`

Note: For each item, this will FIRST check the condition, and THEN produce output only from the elements for which the condition was True.

(Rather than changing things that meet the condition, and leaving everything else unchanged)
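
To make that order of steps concrete, here is a small sketch with made-up numbers (not part of the badge data). The comprehension and the loop underneath it are meant to do roughly the same thing:

```python
numbers = [1, 5, 10, 20]  # made-up example data

doubled_big_numbers = [ number * 2
                        for number in numbers
                        if number > 4 ]

print(doubled_big_numbers)  # 1 is dropped completely, it is not kept unchanged

# roughly the same logic written as a loop
loop_result = []
for number in numbers:
    if number > 4:                      # the condition is checked first
        loop_result.append(number * 2)  # output is produced only for kept items

print(loop_result)
```

Notice that the output list is shorter than the input list: items that fail the condition do not appear at all.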

## Example Case studies:

### Lowercase words, and only keep those longer than 3 characters

Let's say that we discovered that most short words are not very meaningful, so we would like to unify the set by removing short words and lowercasing everything else.

In [None]:
words = ['THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

lowercased_long_words = [ word.lower() # turn a word into its lowercase version
                    for word in words # for each word in words list
                    if len(word) > 3 ] # but only keep those where word's length was over 3

print(lowercased_long_words)

### Represent all words starting with letter 't' as their lengths

Let's say that we have a theory that words starting with 't' are all very long. To test it, we first will try to just represent them as their lengths  

NOTE: Strings in Python behave a lot like Lists of characters. So `"this"` can be indexed much like `['t','h','i','s']` (although the two are different types and are not equal). We will talk more about it in one of the future badges. That means that to get the first character of a word, we just request the item at index 0, e.g. `my_word[0]` or even `"Banana"[0]`
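
Here is a short sketch of indexing into a string. It also shows that a string and a list of its characters are similar but not equal; `list(...)` is the built-in that turns one into the other.

```python
my_word = "Banana"

print(my_word[0])      # first character
print(my_word[1])      # second character
print(len(my_word))    # number of characters, just like len() of a list

# a string and a list of characters are similar, but not equal
print("this" == ['t', 'h', 'i', 's'])
print(list("this") == ['t', 'h', 'i', 's'])
```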

In [None]:
words = ['THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

lengths_of_t_words = [ len(word) # turn a word into its length
                          for word in words # for each word in words list
                          if  word[0] == 't' ] # but only keep those where word's first letter is 't'

print(lengths_of_t_words)

**Wait, what?** Something is not right here! How about "THIS" and "The"? Well... technically they do not start with a 't', but rather with a 'T'. Python is case sensitive, which means that it treats 'T' and 't' as two different characters.

You need to account for both a 't' and a 'T', for example by lowercasing first.


You could write

`
if  word[0] == 't' or word[0] == 'T' ]
`

Try to read each block of code connected with `.` from left to right. For example `word[0].lower()` means: take word -> then just take the first character -> then lowercase it

You could lowercase the first letter before the comparison 

`
if  word[0].lower() == 't' ]  # take word -> then just take the first character -> then lowercase it
`

or even lowercase the whole word, and then get the first letter... 


`
if  word.lower()[0] == 't' ]
`

We will talk about all these scenarios and this type of daisy-chaining soon.
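
As a quick check, the three conditions above can be tried side by side. This sketch uses a shortened copy of the `words` list, and the names `option_1`, `option_2` and `option_3` are just made up for this comparison:

```python
words = ['THIS', 'is', 'The', 'thing', 'it']  # shortened copy of the list above

# check for both 't' and 'T'
option_1 = [ word
             for word in words
             if word[0] == 't' or word[0] == 'T' ]

# take the first character, then lowercase it
option_2 = [ word
             for word in words
             if word[0].lower() == 't' ]

# lowercase the whole word, then take the first character
option_3 = [ word
             for word in words
             if word.lower()[0] == 't' ]

print(option_1)
print(option_2)
print(option_3)
```

All three should keep the same words, so you can pick whichever reads best to you.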

In [None]:
words = ['THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

lengths_of_t_words = [ len(word)
                          for word in words
                          if  word[0].lower() == 't' ]

print(lengths_of_t_words)

Ok, this makes more sense!

### Simplify the data and only keep some of them

For example, if we only wanted to get the sugar content of vegan beverages (where milk has a value different than 'Cow')

In [None]:
coffee_orders = [{'type': 'Latte', 'milk': 'Cow', 'sugar': 4}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2}]

sugars_in_vegan_coffees = [ order['sugar']
                      for order in coffee_orders 
                      if order['milk'] != 'Cow']

print(sugars_in_vegan_coffees)

### A very common and useful scenario: Keep only some elements, but do not change them

We can even use the words as they came for the output, and not change them at all (just use the filtering part). For example, let's just keep the words starting with a 't' but not change them in any way

In [None]:
words = ['THIS','is', 'The','MOST','FUNNY','thing','I','have','ever','heard','I','Love','it','LOL']

t_words = [ word # if you just want to filter words, but output should be the same as input
                for word in words
                if  word[0].lower() == 't' ]

print(t_words)

## Extra scenarios

### Compare two groups - what percent of coffees are 'Latte'

You can use built-in functions such as max( ), len( ), sum( ) on your lists to achieve some very useful results
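
A small sketch of what each of these gives you, using the sugar values from the coffee orders as a plain list of numbers:

```python
number_of_sugars = [4, 1, 2]  # sugar values from the coffee orders above

print(max(number_of_sugars))  # the largest value
print(len(number_of_sugars))  # how many values there are
print(sum(number_of_sugars))  # all values added together
print(sum(number_of_sugars) / len(number_of_sugars))  # the average
```

Combining them, as in the last line, is often where they become most useful.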

In [None]:
coffee_orders = [{'type': 'Latte', 'milk': 'Cow', 'sugar': 4}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2}]

all_lattes = [ order
                      for order in coffee_orders 
                      if order['type'] == 'Latte']

number_of_lattes = len( all_lattes )

print("number of lattes", number_of_lattes)
print("percent of lattes", 100 * number_of_lattes / len(coffee_orders) )

### Compare two groups - who uses more sugar: vegans or people who drink cow-milk Latte?



In [None]:
coffee_orders = [{'type': 'Latte', 'milk': 'Cow', 'sugar': 2}, 
                 {'type': 'Lungo', 'milk': 'None', 'sugar': 1}, 
                 {'type': 'Latte', 'milk': 'Oat', 'sugar': 2}]

sugars_in_vegan_coffees = [ order['sugar']
                      for order in coffee_orders 
                      if order['milk'] != 'Cow']

sugars_in_cow_milk_lattes = [ order['sugar']
                      for order in coffee_orders 
                      if order['type'] == 'Latte' and order['milk'] == 'Cow']

if sum(sugars_in_vegan_coffees) > sum(sugars_in_cow_milk_lattes):
    print("there's more total sugar used in vegan coffees than in all cow-milk lattes")
else:
    print("there's not more total sugar used in vegan coffees than in all cow-milk lattes")


# ⛏ Minitask 1: filter and modify/map some simple lists 

Remember to every now and then save your progress (File > Save, or a keyboard shortcut) 

Given the Lists below, write list comprehensions that will change them into different formats. Write 3-4 list comprehensions for each, and try to keep them different from the examples in this notebook

your data sets are: 

- part of Romeo and Juliet by Shakespeare from http://www.gutenberg.org/files/1513/1513-0.txt (I already pasted it for you below, so don't worry about getting the text by yourself)
- random sample of currency exchange values from pound to euro

some inspiration, but feel free to come up with your own:

- is the name of Romeo or Juliet mentioned more? (remember that you can filter a list and then ask for `len( )` of the result)
- what are the words longer than 5 characters starting with a letter 'M'?
- on how many days was the pound worth more than 1.1 euro?
- how many euros could one buy for 100 pounds on the days when the pound was worth more than 1.1 euro?

In [None]:
shakespeare = ['JULIET', '.', '’', 'Tis', 'but', 'thy', 'name', 'that', 'is', 'my', 'enemy', ';',
              'Thou', 'art', 'thyself', ',', 'though', 'not', 'a', 'Montague', '.', 'What', '’', 's',
              'Montague', '?', 'It', 'is', 'nor', 'hand', 'nor', 'foot', ',', 'Nor', 'arm', ',', 'nor', 
              'face', ',', 'nor', 'any', 'other', 'part', 'Belonging', 'to', 'a', 'man', '.',
              'O', 'be', 'some', 'other', 'name', '.', 'What', '’', 's', 'in', 'a', 'name', '?',
              'That', 'which', 'we', 'call', 'a', 'rose', 'By', 'any', 'other', 'name', 'would', 'smell', 
              'as', 'sweet', ';', 'So', 'Romeo', 'would', ',', 'were', 'he', 'not', 'Romeo', 'call', '’',
              'd', ',', 'Retain', 'that', 'dear', 'perfection', 'which', 'he', 'owes', 'Without', 'that', 
              'title', '.', 'Romeo', ',', 'doff', 'thy', 'name', ',', 'And', 'for', 'thy', 'name', ',',
              'which', 'is', 'no', 'part', 'of', 'thee', ',', 'Take', 'all', 'myself', '.']

pound_to_euro = [1.079974, 1.084409, 1.087774, 1.096891, 1.094839, 1.091664, 1.090948,
                 1.090948, 1.089578, 1.086815, 1.092255, 1.093133, 1.0926, 1.095731, 
                 1.095731, 1.102182, 1.095389, 1.101342, 1.098181, 1.103804, 1.103819,
                 1.103819, 1.101216, 1.099458, 1.097423, 1.10108, 1.101352, 1.102922, 1.102922, 1.106241]

 
# ⛏ Minitask 2: Extract and filter some information from JSON information about cash machines

You know this dataset from another badge. Ask 2-3 business questions and answer them using a list comprehension.

some inspiration, but feel free to come up with your own:

- what are the names of the cities where these ATMs are based?
- what is the typical number of currencies supported by an ATM?
- what are the names of cities of ATMs that dispense 5 pound notes? (`MinimumPossibleAmount` is `'5'`; note that it is stored as a string)
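
A hint for working with nested dictionaries like the ones below: you can chain several keys one after another to reach a value that sits a few levels deep. The sketch below uses made-up library data (the `libraries` list and its keys are hypothetical, not the ATM dataset), so you still get to work out the right keys for the ATMs yourself.

```python
# made-up data, shaped a bit like the ATM records but much smaller
libraries = [{'name': 'Central', 'address': {'town': 'LEEDS', 'postcode': 'LS1'}},
             {'name': 'Harbour', 'address': {'town': 'HULL', 'postcode': 'HU1'}}]

towns = [ library['address']['town']  # take library -> then its address -> then the town
          for library in libraries ]

print(towns)
```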

In [None]:
import pprint as pp

bank_of_scotland_atms = [{'Identification': 'BFF7BC11',
  'SupportedLanguages': ['eng', 'spa', 'ger', 'fre'],
  'ATMServices': ['PINUnblock',
   'Balance',
   'BillPayments',
   'CashWithdrawal',
   'FastCash',
   'MobilePhoneTopUp',
   'PINChange',
   'MiniStatement'],
  'Accessibility': ['WheelchairAccess'],
  'SupportedCurrencies': ['GBP'],
  'MinimumPossibleAmount': '5',
  'Branch': {'Identification': '80453100'},
  'Location': {'LocationCategory': ['BranchExternal'],
   'Site': {'Identification': '80453100'},
   'PostalAddress': {'AddressLine': ['136 BUCHANAN STREET; BALFRON'],
    'BuildingNumber': 'BOS BRANCH',
    'StreetName': '136 BUCHANAN STREET',
    'TownName': 'GLASGOW',
    'CountrySubDivision': ['GLASGOW'],
    'Country': 'GB',
    'PostCode': 'G63 0TG',
    'GeoLocation': {'GeographicCoordinates': {'Latitude': '56.071629',
      'Longitude': '-4.336911'}}}}},
 {'Identification': 'BFA6HC11',
  'SupportedLanguages': ['eng', 'spa', 'ger', 'fre'],
  'ATMServices': ['PINUnblock',
   'Balance',
   'BillPayments',
   'CashWithdrawal',
   'FastCash',
   'MobilePhoneTopUp',
   'PINChange',
   'MiniStatement'],
  'Accessibility': ['AudioCashMachine', 'WheelchairAccess'],
  'SupportedCurrencies': ['GBP'],
  'MinimumPossibleAmount': '10',
  'Branch': {'Identification': '80496000'},
  'Location': {'LocationCategory': ['BranchInternal'],
   'Site': {'Identification': '80496000'},
   'PostalAddress': {'AddressLine': ['BRISTOL ROOM COPLEY DATA CENTRE; WAKEFIELD ROAD'],
    'BuildingNumber': 'BOS BRANCH',
    'StreetName': 'BRISTOL ROOM COPLEY DATA CENTRE',
    'TownName': 'HALIFAX',
    'CountrySubDivision': ['WEST YORKSHIRE'],
    'Country': 'GB',
    'PostCode': 'HX3 0TD',
    'GeoLocation': {'GeographicCoordinates': {'Latitude': '51.454900',
      'Longitude': '2.592000'}}}}},
 {'Identification': 'BF88GC11',
  'SupportedLanguages': ['eng', 'spa', 'ger', 'fre'],
  'ATMServices': ['PINUnblock',
   'Balance',
   'BillPayments',
   'CashWithdrawal',
   'FastCash',
   'MobilePhoneTopUp',
   'PINChange',
   'MiniStatement'],
  'Accessibility': ['AudioCashMachine', 'WheelchairAccess'],
  'SupportedCurrencies': ['GBP'],
  'MinimumPossibleAmount': '5',
  'Branch': {'Identification': '80452300'},
  'Location': {'LocationCategory': ['BranchInternal'],
   'Site': {'Identification': '80452300'},
   'PostalAddress': {'AddressLine': ['111 HIGH STREET;'],
    'BuildingNumber': 'BOS BRANCH',
    'StreetName': '111 HIGH STREET',
    'TownName': 'ANNAN',
    'CountrySubDivision': ['DUMFRIES AND GALLOWAY'],
    'Country': 'GB',
    'PostCode': 'DG12 6AB',
    'GeoLocation': {'GeographicCoordinates': {'Latitude': '54.987176',
      'Longitude': '-3.260075'}}}}}]

pp.pprint(bank_of_scotland_atms)

## ⭐️⭐️⭐️💥 What you learned in this session: Three stars and a wish 
**In your own words** write in your Learn diary:

- 3 things you would like to remember from this badge
- 1 thing you wish to understand better in the future or a question you'd like to ask
