# Looping Through Live Data

### Introduction

In this lesson, it's time to work with some live data gathered from Wikipedia! 

### Getting our Data

Ok, now let's get some data from our [Wikipedia page of city populations](https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population).

<img src="https://storage.cloud.google.com/curriculum-assets/curriculum-assets.nosync/mom-files/cities-chart.png">

As we would expect, this data will come as a list of dictionaries.  Each row above is represented by its own dictionary of key-value pairs, and we place each of these dictionaries in a list.

```python
cities = [
    {'City': 'New York', '2018estimate': 8398748},
    {'City': 'Los Angeles', '2018estimate': 3990456}
]
```
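
To read a single value out of a structure like this, we first pick a dictionary by its position in the list, and then pick a value by its key.  Here is a minimal sketch of that idea.  It uses a small hand-written list, and the names `sample_cities`, `first_city`, `first_name` and `second_name` are made up for this illustration; they are not part of the lesson's data.

```python
# sample_cities is a hypothetical, hand-written list used only for illustration
sample_cities = [
    {'City': 'New York', '2018estimate': 8398748},
    {'City': 'Los Angeles', '2018estimate': 3990456}
]

first_city = sample_cities[0]      # index 0 selects the first dictionary
first_name = first_city['City']    # the key 'City' selects a value from it
print(first_name)

# the two steps can also be chained on one line
second_name = sample_cities[1]['City']
print(second_name)
```

The list index always comes first and the dictionary key second, because the list is the outer container.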

It's time to gather our data in the cell below.  Let's skip over what this code is doing until the next lesson.  For now, the important thing is how to work with this data that we collect.

> Press shift + return on the cell below.

In [81]:
import pandas as pd
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population')
cities_df = tables[4]
cities = cities_df.to_dict('records')

We expect `cities` to be a list of dictionaries, with each row of data represented as a separate dictionary.  Let's slice the first two elements from the list to take a look.

> Press shift + enter.

In [82]:
cities[0:2]

[{'2018rank': 1,
  'City': 'New York[d]',
  'State[c]': 'New York',
  '2018estimate': 8398748,
  '2010Census': 8175133,
  'Change': '+2.74%',
  '2016 land area': '301.5\xa0sq\xa0mi',
  '2016 land area.1': '780.9\xa0km2',
  '2016 population density': '28,317/sq\xa0mi',
  '2016 population density.1': '10,933/km2',
  'Location': '40°39′49″N 73°56′19″W\ufeff / \ufeff40.6635°N 73.9387°W'},
 {'2018rank': 2,
  'City': 'Los Angeles',
  'State[c]': 'California',
  '2018estimate': 3990456,
  '2010Census': 3792621,
  'Change': '+5.22%',
  '2016 land area': '468.7\xa0sq\xa0mi',
  '2016 land area.1': '1,213.9\xa0km2',
  '2016 population density': '8,484/sq\xa0mi',
  '2016 population density.1': '3,276/km2',
  'Location': '34°01′10″N 118°24′39″W\ufeff / \ufeff34.0194°N 118.4108°W'}]

Next, let's get some practice by selecting just the data we want from this list.  Our list is stored in the variable `cities`.  Select the `2018estimate` of the first city.

In [92]:
first_pop = None
first_pop

# 8398748

Then select the `2018estimate` population of the second city.

In [91]:
second_pop = None
second_pop
# 3990456

> You can listen to me and my mom work through the second answer together.

In [106]:
beginning = "https://storage.googleapis.com/curriculum-assets/curriculum-assets.nosync/mom-files/"

import IPython.display as ipd
ipd.Audio(beginning + "live-nested-data.wav")

### Our goal

Now, our goal is to get the population not just of one city but **of every city**.  This is a common task in programming.  Our dictionaries often come to us with lots of information we don't need, so we need to loop through each dictionary just to get to the information that we want.

Here's how.

> We added some comments to show you what each variable is equal to as you move through the code.

In [86]:
two_cities = cities[0:2]
two_cities
# [
#     {'City': 'New York', '2018estimate': 8398748},
#     {'City': 'Los Angeles', '2018estimate': 3990456}
# ]

for top_city in two_cities:
    # top_city -> {'City': 'New York', '2018estimate': 8398748}
    city_name = top_city['City']
    print(city_name)

New York[d]
Los Angeles


The key thing to realize in the code above is that each element of the list is a dictionary.  So our `for loop` grabs each dictionary one by one, calls it `top_city`, and then selects just the information it wants from that city.

<img src="https://storage.googleapis.com/curriculum-assets/curriculum-assets.nosync/intro-to-coding/mailboxes.jpg" width = 30%>

Think of our list of dictionaries as a set of mailboxes again.  With a for loop, we no longer have to worry about selecting the correct mailbox.  Our `for loop` opens each mailbox, one by one, and gives us access to that mailbox through the `block variable`, `top_city`.  Then, we ask each mailbox for the same information.

With that in mind, try to talk through what happens in each line of code below.
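
To make the mailbox idea concrete, the sketch below selects each dictionary by hand and then does the same work with a loop.  It assumes a small hand-written list named `sample_cities` (a hypothetical stand-in for the Wikipedia data), so it can run on its own.

```python
# sample_cities is a hypothetical stand-in for the list pulled from Wikipedia
sample_cities = [
    {'City': 'New York', '2018estimate': 8398748},
    {'City': 'Los Angeles', '2018estimate': 3990456}
]

# without a loop: we choose each mailbox ourselves, by position
print(sample_cities[0]['City'])
print(sample_cities[1]['City'])

# with a loop: Python hands us each dictionary in turn as top_city
for top_city in sample_cities:
    print(top_city['City'])
```

Both versions print the same two names.  The difference is that the loop keeps working unchanged if the list grows to six cities or three hundred.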

In [79]:
# two_cities = [
#     {'City': 'New York', '2018estimate': 8398748},
#     {'City': 'Los Angeles', '2018estimate': 3990456}
# ]

two_cities = cities[0:2]

for top_city in two_cities:
    # top_city -> {'City': 'New York', '2018estimate': 8398748}
    city_name = top_city['City']
    print(city_name)


New York[d]
Los Angeles


> The code above can be confusing, so here's some extra explanation if you need it.

In [107]:
beginning = "https://storage.googleapis.com/curriculum-assets/curriculum-assets.nosync/mom-files/"

import IPython.display as ipd
ipd.Audio(beginning + "explain-looping-through.wav")

Ok, now it's your turn.  Loop through the first six cities, selecting just the `2018estimate` population from each one.  We've gotten you started by slicing the first six cities into the list `six_cities`.

In [94]:
six_cities = cities[0:6]
#  [
#     {'City': 'New York', '2018estimate': 8398748},
#     {'City': 'Los Angeles', '2018estimate': 3990456},
#    ...
# ]
# write your code in the lines below




# for each city in our list of cities 
# select and print out the 2018estimate from the city

> In the cell below, I copied the code from above for you to reference.  Audio is further below.

In [None]:
two_cities = cities[0:2]

for top_city in two_cities:
    # top_city -> {'City': 'New York', '2018estimate': 8398748}
    city_name = top_city['City']
    print(city_name)


> Audio of my mom working through our above problem.

In [109]:
beginning = "https://storage.googleapis.com/curriculum-assets/curriculum-assets.nosync/mom-files/"

import IPython.display as ipd
ipd.Audio(beginning + "looping-live-prob.wav")

### Summary

In this lesson we learned how to start with a list of dictionaries, and go through each one to select just the data that we want, and print out that data.  

```python
for each_city in cities:
    city_pop = each_city['2018estimate']
    print(city_pop)
```

The key point is to realize that our block variable `each_city` represents each dictionary in turn.  So in the code above, we access the population value for the first dictionary and print out that value, then do the same for the second dictionary, and so on.
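
The same pattern works for more than one key at a time.  As a small sketch, the loop below pulls both the name and the population out of each dictionary.  It assumes a hand-written list called `sample_cities` (a hypothetical name, used so the example runs without downloading anything).

```python
# sample_cities is a hypothetical, hand-written list used only for illustration
sample_cities = [
    {'City': 'New York', '2018estimate': 8398748},
    {'City': 'Los Angeles', '2018estimate': 3990456}
]

for each_city in sample_cities:
    # each_city -> one dictionary, e.g. {'City': 'New York', '2018estimate': 8398748}
    city_name = each_city['City']
    city_pop = each_city['2018estimate']
    print(city_name, city_pop)
```

After the loop finishes, `city_name` and `city_pop` still hold the values from the last dictionary, since they are overwritten on every pass.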

In the next lesson, we'll see how to hold onto those values of populations by adding them to a list of populations, and also adding each city name to a list of names.  Then we can use these two lists to create plots of our data.

<right> 
<a href="https://colab.research.google.com/github/jigsawlabs-student/code-intro/blob/master/9-loops-to-lists.ipynb">
<img src="https://storage.cloud.google.com/curriculum-assets/curriculum-assets.nosync/mom-files/pngfuel.com.png" align="right" style="padding-right: 20px" width="10%">
    </a>
</right>

<center>
<a href="https://www.jigsawlabs.io/free" style="position: center"><img src="https://storage.cloud.google.com/curriculum-assets/curriculum-assets.nosync/mom-files/jigsaw-labs.png" width="15%" style="text-align: center"></a>
</center>

### Answers

In [97]:
six_cities = cities[0:6]

for city in six_cities:
    pop = city['2018estimate']
    print(pop)

8398748
3990456
2705994
2325502
1660272
1584138
