# List Comprehension Lab

### Introduction

In this lesson, we'll practice list comprehension using data about restaurant chains.  Let's get started.

### Loading the Data

We can get our data about restaurant chains at the following web address.

In [1]:
url = "https://en.wikipedia.org/wiki/List_of_the_largest_fast_food_restaurant_chains"

> Press shift + return on the cell above.

Then, let's scrape our data from the url using pandas.

In [2]:
import pandas as pd
tables = pd.read_html(url, index_col = 0)

We select the last table on the page.

In [3]:
restaurant_chains_df =  tables[-1]

Then convert our dataframe to a list of dictionaries.

In [4]:
restaurant_chains = restaurant_chains_df.to_dict('records')

In [5]:
restaurant_chains[:3]

[{'Name': nan,
  'Name.1': "McDonald's",
  'Number of locations': '38,348 [1]',
  'Revenue': 'US$21.07 billion (2020)[2]'},
 {'Name': nan,
  'Name.1': 'Subway',
  'Number of locations': '36,840 [3]',
  'Revenue': 'US$16.1 billion (2020)[2]'},
 {'Name': nan,
  'Name.1': 'Starbucks',
  'Number of locations': '33,833[4]',
  'Revenue': 'US$26.7 billion (2020)[5]'}]

Ok, so we now have a list of dictionaries where each dictionary represents a separate restaurant chain.

### Coercing our Data

> For this first section, do not use list comprehension.

Now that we have our list of dictionaries, let's start by focusing on our revenue data.  Begin by extracting the revenue text for each chain and storing it in a list.

In [6]:
revenue_texts = []

for restaurant in restaurant_chains:
    revenue_texts.append(restaurant['Revenue'])

In [7]:
revenue_texts[:10]

# ['US$21.07 billion (2020)[2]', 'US$16.1 billion (2020)[2]', ...]

['US$21.07 billion (2020)[2]',
 'US$16.1 billion (2020)[2]',
 'US$26.7 billion (2020)[5]',
 'US$27.9 billion (2020)[7]',
 nan,
 nan,
 'US$3.62 billion (2019)[11]',
 'US$1.37 billion (2020)[13]',
 nan,
 nan]

Take a look at the `nan`s above.  `nan` stands for "not a number", and it is not very helpful to us.  Many of the values after the first four are `nan`.  So instead let's select just the first four restaurant chains, and then get to work turning the data above into numbers.

In [16]:
first_restaurants = restaurant_chains[:4]
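One way to see which entries are usable is to test each value's type: in the output above, the real revenue values are strings, while the missing ones are the float `nan`. The following is a minimal sketch that uses a hand-copied sample of the values shown above rather than the scraped data. The names `sample_revenue_texts` and `usable_positions` are illustrative and not part of the lab.

```python
# Hand-copied sample of the revenue values shown above (an assumption for
# illustration; the live Wikipedia table may differ).
nan = float('nan')
sample_revenue_texts = [
    'US$21.07 billion (2020)[2]',
    'US$16.1 billion (2020)[2]',
    'US$26.7 billion (2020)[5]',
    'US$27.9 billion (2020)[7]',
    nan,
    nan,
    'US$3.62 billion (2019)[11]',
]

# Collect the positions whose value is text rather than a missing value.
usable_positions = []
for position, text in enumerate(sample_revenue_texts):
    if isinstance(text, str):
        usable_positions.append(position)

print(usable_positions)  # [0, 1, 2, 3, 6]
```

In this sample the first four positions are all usable, which is why slicing with `[:4]` gives us clean data to work with.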

In the cell below, extract the revenue data from `first_restaurants` and store it in a list called `revenues`, with each revenue expressed as a number of billions of dollars.

In [17]:
revenues = []
for restaurant_chain in first_restaurants:
    revenue = float(restaurant_chain['Revenue'][3:].split()[0])
    revenues.append(revenue)

In [18]:
revenues

# [21.07, 16.1, 26.7, 27.9]

[21.07, 16.1, 26.7, 27.9]
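The one-line expression used to build `revenues` packs several steps together. The sketch below unpacks it for a single value copied from the output above. The intermediate variable names are illustrative, and the approach assumes every revenue string starts with `US$` and is quoted in billions, which holds for the four chains shown here but may not hold for the whole table.

```python
text = 'US$21.07 billion (2020)[2]'

without_prefix = text[3:]        # drop the three characters 'US$'
pieces = without_prefix.split()  # split on whitespace
amount_text = pieces[0]          # the leading number, still a string
amount = float(amount_text)      # coerce the string to a float

print(without_prefix)  # 21.07 billion (2020)[2]
print(pieces)          # ['21.07', 'billion', '(2020)[2]']
print(amount)          # 21.07
```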

### Moving to List Comprehension

Let's begin by using list comprehension to create a list of the names of each restaurant chain.

> Assign the result to the variable `restaurant_names`.

In [19]:
restaurant_names = [chain['Name.1'] for chain in restaurant_chains]

In [20]:
restaurant_names[:4]

# ["McDonald's", 'Subway', 'Starbucks', 'KFC']

["McDonald's", 'Subway', 'Starbucks', 'KFC']
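To connect this with the loops from the previous section, here is a small side-by-side sketch. It uses two sample records copied from the earlier output instead of the full scraped list, and the names `sample_chains`, `names_from_loop`, and `names_from_comprehension` are illustrative only.

```python
# Two sample records shaped like the dictionaries above (values copied from
# the earlier output).
sample_chains = [
    {'Name.1': "McDonald's", 'Number of locations': '38,348 [1]'},
    {'Name.1': 'Subway', 'Number of locations': '36,840 [3]'},
]

# Loop version: create an empty list, then append one name at a time.
names_from_loop = []
for chain in sample_chains:
    names_from_loop.append(chain['Name.1'])

# Comprehension version: the same expression and the same `for` clause,
# written inside one pair of square brackets.
names_from_comprehension = [chain['Name.1'] for chain in sample_chains]

print(names_from_loop == names_from_comprehension)  # True
```

The expression that was passed to `append` moves to the front of the comprehension, and the `for` clause follows it unchanged.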

Now use list comprehension to select the number of locations of each restaurant chain.

> Assign the result to the variable `franchise_amounts`.

In [21]:
franchise_amounts = [chain['Number of locations'] for chain in restaurant_chains]

In [22]:
franchise_amounts[:3]

# ['38,348 [1]', '36,840 [3]', '33,833[4]']

['38,348 [1]', '36,840 [3]', '33,833[4]']

Now remove the bracketed footnote markers (such as `[1]`) from the number of locations.

In [23]:
clean_franchise_amounts = [chain['Number of locations'].split('[')[0] for chain in restaurant_chains]

In [24]:
clean_franchise_amounts[:3]

['38,348 ', '36,840 ', '33,833']

In [26]:
clean_franchise_amounts[-3:]

['508', '507+', '500+']
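Notice that some of the cleaned values keep a trailing space, and values without a footnote come back unchanged. The following sketch, using sample strings copied from the outputs above, shows why, and shows that `strip` is one optional way to tidy the result. The variable names are illustrative.

```python
sample_amounts = ['38,348 [1]', '33,833[4]', '507+']

# split('[') cuts the string at each '[' and [0] keeps the part before the
# first one. A string with no '[' comes back whole.
before_bracket = [amount.split('[')[0] for amount in sample_amounts]
print(before_bracket)  # ['38,348 ', '33,833', '507+']

# Optional: strip() removes the leftover trailing space.
stripped = [amount.split('[')[0].strip() for amount in sample_amounts]
print(stripped)  # ['38,348', '33,833', '507+']
```

The trailing space is harmless for the next step, because `int` ignores surrounding whitespace when it parses a string.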

Now, coerce the data from a list of strings to a list of integers. 
> **Hint**: Look up the `replace` method.

In [27]:
franchise_amount_ints = [int(clean_franchise.replace(',', '').replace('+', '').replace('~', '')) for clean_franchise in clean_franchise_amounts]

In [29]:
franchise_amount_ints[:3]

# [38348, 36840, 33833]

[38348, 36840, 33833]
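When a comprehension's expression grows as long as the chain of `replace` calls above, one option is to move the cleaning into a small function and call it from the comprehension. This is only a sketch: `to_int` is a hypothetical helper that is not part of the lab, and the sample strings are copied from the outputs above.

```python
def to_int(amount_text):
    """Remove thousands separators and the '+' / '~' markers, then convert."""
    for character in (',', '+', '~'):
        amount_text = amount_text.replace(character, '')
    # int() tolerates leading and trailing whitespace such as '38348 '.
    return int(amount_text)


sample_clean_amounts = ['38,348 ', '36,840 ', '33,833', '508', '507+', '500+']
sample_ints = [to_int(amount) for amount in sample_clean_amounts]

print(sample_ints)  # [38348, 36840, 33833, 508, 507, 500]
```

Dropping the `+` means a value such as `'507+'` is recorded as 507, so those counts are lower bounds rather than exact figures.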

### Bonus: Combining our Data

Ok, so now, for the first four restaurant chains, let's find the amount of revenue earned per franchise.  Remember that we already have the revenues.

In [40]:
revenues

[21.07, 16.1, 26.7, 27.9]

And we have the number of franchises.

In [41]:
franchise_amount_ints[:4]

[38348, 36840, 33833, 25000]

Create a list where each element is the revenue per franchise, for the first four chains.

In [42]:
one_billion = 1_000_000_000

In [43]:
revenues_per_franchise = [int((revenue * one_billion) / franchise_amount_int)
                          for revenue, franchise_amount_int in zip(revenues, franchise_amount_ints)]

In [44]:
revenues_per_franchise
# [549441, 437024, 789170, 1116000]

[549441, 437024, 789170, 1116000]
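The comprehension above pairs `revenues`, which has four values, with the full `franchise_amount_ints` list, which is much longer. That works because `zip` stops as soon as its shortest input runs out. A minimal sketch with shortened sample lists (the names are illustrative):

```python
revenues_sample = [21.07, 16.1]
locations_sample = [38348, 36840, 33833, 25000]

# zip pairs items by position and stops at the end of the shorter list.
pairs = list(zip(revenues_sample, locations_sample))

print(pairs)       # [(21.07, 38348), (16.1, 36840)]
print(len(pairs))  # 2
```

This is convenient here, but it also means a mismatch in list lengths passes silently, so it is worth checking that the two lists line up position by position.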

In [46]:
restaurant_names[:4]

["McDonald's", 'Subway', 'Starbucks', 'KFC']
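Since the names and the revenue-per-franchise figures are in the same order, one possible next step is to pair them up with `zip` inside another list comprehension. The sketch below uses values copied from the outputs above. The variable names are illustrative and not part of the lab.

```python
names_sample = ["McDonald's", 'Subway', 'Starbucks', 'KFC']
revenues_per_franchise_sample = [549441, 437024, 789170, 1116000]

# Build (name, revenue per franchise) tuples, one per chain.
labeled = [(name, revenue)
           for name, revenue in zip(names_sample, revenues_per_franchise_sample)]

print(labeled[0])  # ("McDonald's", 549441)

# Find the pair with the largest revenue per franchise.
highest = max(labeled, key=lambda pair: pair[1])
print(highest)  # ('KFC', 1116000)
```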

### Summary

In this lesson we practiced looping through our data and using list comprehension.  We should use a plain `for` loop when each iteration involves several more complex steps, and lean on list comprehension when each element needs only a simple selection or transformation.