# Holding Onto Our Data

### Introduction

In the last lesson we were able to gather data from our [Wikipedia page of city populations](https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population).

<img src="https://github.com/jigsawlabs-student/code-intro/blob/master/top_cities.png?raw=1">

In [1]:
import pandas as pd
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population')
cities_df = tables[4]
cities = cities_df.to_dict('records')
# cities[:2]

> Press shift + enter on the code above.

The list of dictionaries looks like the following.

```python
cities = [
    {'City': 'New York', '2020 census': 8804190},
    {'City': 'Los Angeles', '2020 census': 3898747}
]
```

From there, we wrote code to loop through each city, selecting the population estimate and printing it out.  

> We slice the first few dictionaries so that we are not printing out the population of every city.

In [4]:
six_cities = cities[0:6]

for top_city in six_cities:
    #print(top_city)
    #top_city -> {'City': 'New York', '2020 census': 8804190}
    city_pop = top_city['2020 census']
    print(city_pop)

8804190
3898747
2746388
2304580
1608139
1603797
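
The loop above only works if the key matches the table's column header exactly, including the space in `'2020 census'`. Here is a minimal sketch of how one might check which keys a dictionary has. It uses a hand-written `sample_cities` list (a stand-in, not the real scraped data), since the real headers can change whenever the Wikipedia table is edited.

```python
# Hand-written sample standing in for the real `cities` list (an assumption:
# the real keys come from the table's column headers and may differ).
sample_cities = [
    {'City': 'New York', '2020 census': 8804190},
    {'City': 'Los Angeles', '2020 census': 3898747},
]

first_city = sample_cities[0]

# List the keys that are available in the first dictionary
available_keys = list(first_city.keys())
print(available_keys)

# `in` tells us whether a key exists before we try to use it
has_spaced_key = '2020 census' in first_city
has_unspaced_key = '2020census' in first_city
print(has_spaced_key, has_unspaced_key)
```

Selecting a key that is not in the dictionary raises a `KeyError`, so checking the keys first is a quick way to find the right spelling.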


## Holding onto our data

Ok, we have already done some great work printing out our data.  But what we really need is **a list** of populations.  Let's walk through how to add each population to a list, so that we can hold onto our populations and then add them to our plot.

#### 1. Adding a single element to a list

We can add data to a list using the append method.

In [5]:
populations = []
first_pop = 8804190

populations.append(first_pop)

populations

[8804190]

In the code above, we first created an empty list using the square brackets.  Then we used the `append` method to add in the first population.

```python
empty_list.append(data)
```
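
To make the behavior of `append` a little more concrete, here is a small sketch using the first three populations printed earlier. It shows that each call adds one element to the end of the same list, and that `append` itself gives back `None`, so the list does not need to be reassigned.

```python
# `populations` starts out empty
populations = []

# append adds one element to the end of the list, changing the list in place
populations.append(8804190)
populations.append(3898747)

print(populations)       # [8804190, 3898747]
print(len(populations))  # 2

# append itself returns None, so there is no need to reassign the list
result = populations.append(2746388)
print(result)            # None
```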

Now it's your turn.  Create an empty list and assign it to the variable `more_populations`.  Then append in the `second_pop`.

In [6]:
more_populations = []

second_pop = 3990456

more_populations.append(second_pop)
# [3990456]

> Add `more_populations` as the last line of the cell and press shift + enter.  If you appended correctly, you will see your list with the population in it.

#### 2. Adding multiple elements to a list

Ok, now we don't want to add just a single population to an empty list, but every population.  Think about our mailboxes again.

<img src="https://github.com/jigsawlabs-student/code-intro/blob/master/mailboxes.jpg?raw=1" width = 30%>

This time as our loop opens up each mailbox one by one, we not only need to access that information but also need to add the selected information to a list.

1. Start with our list of dictionaries called `cities`.  
2. Create an empty list and assign it to the variable `city_names`.  
3. As our loop moves one by one through each dictionary, we select just the information we want from the dictionary and add it to the list.

In [7]:
cities
# cities = [
#     {'City': 'New York', '2020 census': 8804190},
#     {'City': 'Los Angeles', '2020 census': 3898747}
# ]
city_names = []

for city in cities:
    # city -> {'City': 'New York', '2020 census': 8804190}
    city_name = city['City']
    # city_name -> 'New York'
    city_names.append(city_name)

> Press shift + return on the cell above.

Now let's take a look at the first few elements of our `city_names` list.

In [8]:
city_names[:3]

# ['New York[d]', 'Los Angeles', 'Chicago']

['New York[d]', 'Los Angeles', 'Chicago']

> Once again, before moving on, try to explain what is happening in each line of the code that loops through the cities.
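
One way to check your explanation is to print the list after every pass through the loop. The sketch below does this with a hand-written `sample_cities` list (a stand-in for the real `cities`, so it runs without downloading anything).

```python
# Hand-written sample standing in for the real `cities` list
sample_cities = [
    {'City': 'New York', '2020 census': 8804190},
    {'City': 'Los Angeles', '2020 census': 3898747},
    {'City': 'Chicago', '2020 census': 2746388},
]

city_names = []

for city in sample_cities:
    city_name = city['City']       # select the name from this dictionary
    city_names.append(city_name)   # add it to the end of the list
    print(city_name, '->', city_names)

# New York -> ['New York']
# Los Angeles -> ['New York', 'Los Angeles']
# Chicago -> ['New York', 'Los Angeles', 'Chicago']
```

Each pass selects one name and the list grows by one element, which is why the empty list has to be created before the loop rather than inside it.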

Now it's your turn.  Create a list of `populations` by accessing the `2020 census` value from each element and adding it to the empty list of populations.

In [9]:
# create an empty list of populations
populations = []

# loop through each city
    # select the 2020 census
    # add it to the list
for city in cities:
    city_census = city['2020 census']
    populations.append(city_census)

In [10]:
populations[:3]
# [8804190, 3898747, 2746388]

[8804190, 3898747, 2746388]

This is an important task, so let's loop through our `cities` once more, this time with the data listed under the key `2022 estimate`.

In [24]:
pops_2022 = []
# write your code below
# no 2010 data exists in the table, so we use the 2022 estimate
for city in cities:
    city_census = city['2022 estimate']
    pops_2022.append(city_census)

> Then check to make sure that the first three populations look correct.

In [None]:
pops_2022[:3]

### Plotting Our Data

Now we have a list of city names and a list of populations.

In [26]:
city_names = []

for each_city in cities:
    city_name = each_city['City']
    city_names.append(city_name)

populations = []

for each_city in cities:
    city_pop = each_city['2020 census']
    populations.append(city_pop)

> Press shift + enter on the cell above.

In [27]:
city_names[:3]

['New York[d]', 'Los Angeles', 'Chicago']

In [28]:
populations[:3]

[8804190, 3898747, 2746388]

Let's use them to create a plot with Plotly.

Remember we can import our library and create a blank chart with the following code.

> Press shift + enter on the cell below.

In [29]:
import plotly.graph_objects as go

go.Figure()

Next, let's create the scatter plot of our data with our `populations` as the `y` values, and `hovertext` as the `city_names` to set a label for each marker.

In [30]:
population_scatter = go.Scatter(y = populations, hovertext = city_names, mode = 'markers')
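
Because the `hovertext` labels are matched to the `y` values by position, it can help to confirm that the two lists line up before plotting. This is only a sketch, using short hand-written samples in place of the full lists built above.

```python
# Hand-written samples standing in for the full lists built above
city_names = ['New York', 'Los Angeles', 'Chicago']
populations = [8804190, 3898747, 2746388]

# The two lists should be the same length, otherwise labels could end up
# missing or attached to the wrong marker
same_length = len(city_names) == len(populations)
print(same_length)

# zip pairs up the elements that share a position in each list
pairs = list(zip(city_names, populations))
for name, pop in pairs:
    print(name, pop)
```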

Now we can add that scatter plot to our empty `plotly figure`.

In [31]:
import plotly.graph_objects as go

population_scatter = go.Scatter(y = populations, hovertext = city_names, mode = 'markers')

go.Figure(data = [population_scatter])

### Summary

In this lesson we learned how to start with a list of dictionaries, go through each one to select just the data that we want, and add it to a list.

```python
populations = []

for each_city in cities:
    city_pop = each_city['2020 census']
    populations.append(city_pop)
```

```python
city_names = []

for each_city in cities:
    city_name = each_city['City']
    city_names.append(city_name)
```
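
As an aside, the two loops can also be combined so that both lists are filled in a single pass. The sketch below is one possible variation rather than the approach used in the lesson, and it relies on a hand-written `sample_cities` list in place of the real `cities`.

```python
# Hand-written sample standing in for the real `cities` list
sample_cities = [
    {'City': 'New York', '2020 census': 8804190},
    {'City': 'Los Angeles', '2020 census': 3898747},
]

city_names = []
populations = []

for each_city in sample_cities:
    # both appends happen for the same dictionary, so the lists stay in step
    city_names.append(each_city['City'])
    populations.append(each_city['2020 census'])

print(city_names)
print(populations)
```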

Once we have gathered the data we need into our two lists, `populations` and `city_names`, we use them to create a plot of our data.

```python
import plotly.graph_objects as go
population_scatter = go.Scatter(y = populations, hovertext = city_names, mode = 'markers')
go.Figure(data = [population_scatter])
```

[Saving nested data](https://youtube.com/watch?v=w6qtwVW3l3w&list=PLCG6Te769p1gkVJizwSmo6GoEI9oHoAPA&index=24&ab_channel=JigsawLabs)

<right>
<a href="https://colab.research.google.com/github/jigsawlabs-student/code-intro/blob/master/10-exploring-live-data.ipynb">
<img src="https://github.com/jigsawlabs-student/code-intro/blob/master/next-yellow.jpg?raw=1" align="right" style="padding-right: 20px" width="10%">
    </a>
</right>

<center>
<a href="https://www.jigsawlabs.io" style="position: center"><img src="https://github.com/jigsawlabs-student/code-intro/blob/master/jigsaw-icon.png?raw=1" width="15%" style="text-align: center"></a>
</center>

### Answers

In [32]:
second_pop = 3990456
more_populations = []

more_populations.append(second_pop)

In [33]:
more_populations

[3990456]

In [37]:
populations = []

for city in cities:
    pop = city['2022 estimate']
    populations.append(pop)

In [None]:
populations[:2]

[8398748, 3990456]