# Back to Live data

### Introduction

In the last lesson, we learned how to use boolean expressions to ask questions of our data.

In [2]:
population = 200_000

population > 100_000

True

We then saw how an `if` `else` statement performs different operations depending on what our boolean expression returns.

In [3]:
if population > 100_000:
    print('too large')
else:
    print('too small')

too large


In this lesson, we'll go back to working with our live data -- beginning with a single dictionary, and then a list of dictionaries.  

## But first, a challenge

Ok, so below is code that assigns the `city` variable to one of three different cities, beginning with New York.  The other two cities are commented out, so you can swap them in one at a time.

In [38]:
city = {'City': 'New York[d]',
  'State': 'New York',
  '2020census': 8_804_190}

In [4]:
# city = {'City': 'Los Angeles',
#   'State': 'California',
#   '2020census': 3_898_747}

# city = {'City': 'Chicago',
#   'State': 'Illinois',
#   '2020census': 2_746_388}

Write code below that prints the `City` only if the population is over `3_000_000`, and otherwise prints `too small`.

> First try the code with New York, which should be printed.  Then try it again with Los Angeles, which should also be printed.  Finally, try it with Chicago, which should not be printed.

In [40]:
# Write code here



too small


## The answer, and moving on

Ok, so we can solve the above by repeatedly asking questions of our data.  For example, let's set our city to New York.

In [2]:
city = {'City': 'New York[d]',
  'State': 'New York',
  '2020census': 8_804_190}

Then, determining whether the city's population is over `3_000_000` just involves asking a series of questions.

In [3]:
# first find the population
city['2020census'] 
# 8804190

# then see if the population is over 3_000_000
city['2020census'] > 3_000_000

True

After answering those questions, we can fill in the if else statement.

In [9]:
if city['2020census'] > 3_000_000:
    print(city['City'])
else:
    print('too small')

New York[d]
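
To check all three cities from the challenge without editing `city` by hand each time, one option is to put the three dictionaries in a list and run the same if else statement on each of them.  This is only a sketch: the names `challenge_cities`, `challenge_city`, and `messages` are made up for this example, and the loop is a convenience rather than part of the solution above.

```python
# The three cities from the challenge, collected in one list
challenge_cities = [
    {'City': 'New York[d]', 'State': 'New York', '2020census': 8_804_190},
    {'City': 'Los Angeles', 'State': 'California', '2020census': 3_898_747},
    {'City': 'Chicago', 'State': 'Illinois', '2020census': 2_746_388},
]

messages = []  # keeps a record of what was printed for each city

for challenge_city in challenge_cities:
    # same question as above: is the population over 3_000_000?
    if challenge_city['2020census'] > 3_000_000:
        message = challenge_city['City']
    else:
        message = 'too small'
    print(message)
    messages.append(message)

# New York[d]
# Los Angeles
# too small
```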


## Removing the else

In our problem above, it is pretty silly to `print('too small')` when we hit our `else` statement.  Instead, we would just prefer to do nothing. 

> Unfortunately, if we have an `else` statement with nothing below, Python will complain.

In [11]:
if city['2020census'] > 3_000_000:
    print(city['City'])
else:
    

SyntaxError: incomplete input (1657403283.py, line 4)

But it turns out we are ok if we remove the else statement altogether.

In [14]:
city = {'City': 'New York[d]',
  'State': 'New York',
  '2020census': 8_804_190}

# city = {'City': 'Chicago',
#   'State': 'Illinois',
#   '2020census': 2_746_388}

if city['2020census'] > 3_000_000:
    print(city['City'])

New York[d]


So above we are saying: if the population is over 3 million, print the city name, and otherwise do nothing.  And we accomplish this "doing nothing" by removing the else clause altogether.
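
As an aside, if we ever do want to keep an `else` with nothing to do in it, Python's `pass` statement can act as a placeholder that does nothing, which avoids the `SyntaxError` we saw earlier.  The sketch below assumes the Chicago dictionary from the challenge and shows both versions side by side.  Neither one prints anything, and removing the `else` is still the simpler choice.

```python
city = {'City': 'Chicago',
  'State': 'Illinois',
  '2020census': 2_746_388}

# Version 1: no else clause (the approach used in this lesson)
if city['2020census'] > 3_000_000:
    print(city['City'])

# Version 2: keep the else, with `pass` as a do-nothing placeholder
if city['2020census'] > 3_000_000:
    print(city['City'])
else:
    pass

# Neither version prints anything for Chicago
```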

## Working with a list of dictionaries

Now that we have used an if else statement with a dictionary, let's try another task.  Let's start with our list of cities, and then add only the names of the cities that have a population over 1 million to a new list. 

> First we get our list of cities.

In [4]:
import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population'
tables = pd.read_html(url)
cities_table = tables[4]
cities = cities_table.to_dict('records')

Remember that each city is a dictionary.

In [8]:
cities[:2]

[{'2021rank': 1,
  'City': 'New York[d]',
  'State[c]': 'New York',
  '2021estimate': 8467513,
  '2020census': 8804190,
  'Change': '−3.82%',
  '2020 land area': '300.5\xa0sq\xa0mi',
  '2020 land area.1': '778.3\xa0km2',
  '2020 population density': '29,298/sq\xa0mi',
  '2020 population density.1': '11,312/km2',
  'Location': '.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}40°40′N 73°56′W\ufeff / \ufeff40.66°N 73.93°W'},
 {'2021rank': 2,
  'City': 'Los Angeles',
  'State[c]': 'California',
  '2021estimate': 3849297,
  '2020census': 3898747,
  'Change': '−1.27%',
  '2020 land area': '469.5\xa0sq\xa0mi',
  '2020 land area.1': '1,216.0\xa0km2',
  '2020 population density': '8,304/sq\xa0mi',
  '2020 population density.1': '3,206/km2',
  'Location': '34°01′N 118°25′W\ufeff / \ufeff34

Now let's create a list, `city_names`, of the names of the cities whose population is over `1_000_000`.  If a city's population is not over `1_000_000`, we do nothing. We'll walk through the answer below.

In [15]:
city_names = []
for city in cities:
    if city['2020census'] > 1_000_000:
        city_names.append(city['City'])

In [18]:
print(city_names)

# ['New York[d]', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia[e]', 
# 'San Antonio', 'San Diego', 'Dallas', 'San Jose']

['New York[d]', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia[e]', 'San Antonio', 'San Diego', 'Dallas', 'San Jose']


## How we do it

Let's start with a single city.  

In [19]:
city = {'2021rank': 2,
  'City': 'Los Angeles',
  'State[c]': 'California',
  '2021estimate': 3849297,
  '2020census': 3898747}

And if we only want to add the city's name when its population is larger than a million, we can accomplish this with the following: 

In [21]:
city_names = []

if city['2020census'] > 1_000_000:
    city_names.append(city['City'])
    
city_names

['Los Angeles']

Really, it's the same code as in our first problem above.  The only difference is that we are adding our large city to the list, instead of printing it out.
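
The same code also covers the "do nothing" case.  As a minimal sketch, suppose we run it on a city below the threshold.  The `small_city` dictionary here is invented for illustration (it is not a row from the Wikipedia table), and `small_city_names` is a separate list so that it does not overwrite `city_names`.

```python
# Hypothetical city, made up for this example
small_city = {'City': 'Smallville',
  '2020census': 50_000}

small_city_names = []

# the condition is False, so nothing is appended
if small_city['2020census'] > 1_000_000:
    small_city_names.append(small_city['City'])

print(small_city_names)
# []
```

Because there is no `else`, the list simply stays empty.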

And now that our code works properly for one city, we can use a loop to perform this for every city.

In [23]:
city_names = []

for city in cities:
    if city['2020census'] > 1_000_000:
        city_name = city['City']
        city_names.append(city_name)

In [25]:
print(city_names)

['New York[d]', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia[e]', 'San Antonio', 'San Diego', 'Dallas', 'San Jose']


And we can read the code above like so.

In [26]:
city_names = []

# for each dictionary in our list of dictionaries
for city in cities:
    # select the population, and if it's over 1_000_000
    if city['2020census'] > 1_000_000:
        # add the city name to our list of city_names
        city_name = city['City']
        city_names.append(city_name)

What we just accomplished above is called filtering.  We filtered our list for the cities that meet a population threshold.  In data analytics, being able to filter our data is critically important.  For example, think about when you visit a website like Facebook or LinkedIn -- they don't show you all users, but instead show you a subset of users interesting to you.  And that is likely done by looping through their data and selecting just the data that meets certain criteria.
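
Since filtering comes up so often, one way to reuse the pattern is to wrap the loop in a function that takes the threshold as an argument.  This is a sketch under some assumptions: `filter_city_names`, `min_population`, and `sample_cities` are names introduced here for illustration, and the sample list is just the three cities from the challenge rather than the full Wikipedia table.

```python
# Small hand-written sample, using the three cities from the challenge
sample_cities = [
    {'City': 'New York[d]', 'State': 'New York', '2020census': 8_804_190},
    {'City': 'Los Angeles', 'State': 'California', '2020census': 3_898_747},
    {'City': 'Chicago', 'State': 'Illinois', '2020census': 2_746_388},
]

def filter_city_names(cities, min_population):
    """Return the names of the cities whose 2020 census population is over min_population."""
    names = []
    for city in cities:
        # keep the city only if it passes the threshold; otherwise do nothing
        if city['2020census'] > min_population:
            names.append(city['City'])
    return names

print(filter_city_names(sample_cities, 3_000_000))
# ['New York[d]', 'Los Angeles']

print(filter_city_names(sample_cities, 5_000_000))
# ['New York[d]']
```

The body of the function is the same loop as before.  Only the threshold has moved from a hard-coded `1_000_000` to a parameter.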

### Summary

In this lesson, we saw how we can work with dictionaries and lists of dictionaries with `if` `else` statements.  First, we saw that we can eliminate the else statement altogether if we wish to do nothing when our data does not meet a certain condition.  

> So code like this...

In [27]:
if city['2020census'] > 3_000_000:
    print(city['City'])
else:
    print('too small')

too small


> Can be changed to just an `if` statement:

In [28]:
if city['2020census'] > 3_000_000:
    print(city['City'])

We then saw how we can add items to a list only if they meet certain criteria.  And we can simplify this problem by first trying it with a single element.

In [29]:
city_names = []


if city['2020census'] > 1_000_000:
    city_name = city['City']
    city_names.append(city_name)

And then looping through all of the elements.

In [30]:
city_names = []

for city in cities:
    if city['2020census'] > 1_000_000:
        city_name = city['City']
        city_names.append(city_name)