# Functions Review

The purpose of functions is to create a degree of encapsulation.  

For example, think of an iPhone.  When you turn on your phone, the system needs to boot up, take you to the login screen, and then wait for you to enter your password.  As you might imagine, there's a lot that occurs under the hood.

But all you, as the user, need to know is that if you press the on button, the login screen pops up.  Functions work in a similar way.  They allow you to simply call a procedure and get a return value.  

For example, let's see a function that pulls a list of the most populous US cities from Wikipedia, and returns a list of dictionaries of those cities.

### Seeing a function

> Let's say we have a function below called `most_populous_cities`.  

For now, don't even look at the body of the function.  Instead we'll just call the function.

> Press shift + return below, but don't look at the body of the function.

In [14]:
import pandas as pd
def most_populous_cities():
    url = "https://simple.wikipedia.org/wiki/List_of_United_States_cities_by_population"
    dfs = pd.read_html(url)
    cities_df = dfs[2]
    cities = cities_df.to_dict('records')
    return cities 

Now let's call the function below and assign the output to the variable `collected_cities`.

In [19]:
collected_cities = most_populous_cities()

> Scroll down.

In [20]:
collected_cities[:2]

[{'2017rank': 1,
  'City': 'New York[3]',
  'State': 'New York',
  '2017estimate': 8622698,
  '2010Census': 8175133,
  'Change': '+5.47%',
  '2016 land area': '301.5\xa0sq\xa0mi',
  '2016 land area.1': '780.9\xa0km2',
  '2016 population density': '28,317/sq\xa0mi',
  '2016 population density.1': '10,933/km2',
  'Location': '.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:none}.mw-parser-output .longitude,.mw-parser-output .latitude{white-space:nowrap}40°39′49″N 73°56′19″W\ufeff / \ufeff40.6635°N 73.9387°W'},
 {'2017rank': 2,
  'City': 'Los Angeles',
  'State': 'California',
  '2017estimate': 3999759,
  '2010Census': 3792621,
  'Change': '+5.46%',
  '2016 land area': '468.7\xa0sq\xa0mi',
  '2016 land area.1': '1,213.9\xa0km2',
  '2016 population density': '8,484/sq\xa0mi',
  '2016 population density.1': '3,276/km2',
  'Location': '34°01′10″N 118°24′39″W\ufeff 

As the code above shows, we called the function `most_populous_cities()`, and it returned a list of dictionaries, which we assigned to `collected_cities`.

In [21]:
collected_cities = most_populous_cities()

Just like pressing the on button on the iPhone, we don't really need to know what occurs under the hood.  Instead we just call the `most_populous_cities()` function, and get back a list of cities.
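
The same idea applies to Python's built-in functions.  As a small sketch (the `estimates` list is introduced here for illustration, using the two 2017 estimates from the output above), we can call `max` and `sorted` without ever reading how they are implemented:

```python
# 2017 population estimates for Los Angeles and New York, copied from the output above
estimates = [3999759, 8622698]

# We only care about what each call gives back, not how it works internally.
largest = max(estimates)
descending = sorted(estimates, reverse=True)

print(largest)     # 8622698
print(descending)  # [8622698, 3999759]
```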

In fact, by default, functions prevent you from accessing anything defined inside them.  For example, let's take another look at the `most_populous_cities` function below.

In [None]:
def most_populous_cities():
    url = "https://simple.wikipedia.org/wiki/List_of_United_States_cities_by_population"
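    dfs = pd.read_html(url)  # one DataFrame per HTML table found on the page
    cities_df = dfs[2]  # the third table on the page holds the city data
    cities = cities_df.to_dict('records')  # one dictionary per row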
    return cities

If we try to access any of the variables that are defined inside of the function, they are unavailable to us.

In [22]:
url

NameError: name 'url' is not defined

Any variable defined in a function is only available from within that function.  This is why we can use `url` inside the function, in the call to `pd.read_html(url)`, but not anywhere else in our Colab notebook.  This is encapsulation by the function.  The point is that you, the programmer, shouldn't be exposed to the underlying procedure of the function by default, just like you don't need to open up an iPhone to see how it works under the hood.  
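
Here is a minimal sketch of the same behavior that runs without a network connection.  The function `build_city_url` and the names `base`, `page`, and `full_url` are hypothetical, introduced only to illustrate the point: the variables inside the function are out of reach, while the returned value is not.

```python
def build_city_url():
    # `base` and `page` are local: they exist only while the function runs.
    base = "https://simple.wikipedia.org/wiki/"
    page = "List_of_United_States_cities_by_population"
    return base + page


# The returned value is available to the rest of the program.
full_url = build_city_url()
print(full_url)

# The local variable is not, so looking it up raises a NameError.
try:
    base
except NameError as error:
    message = str(error)
    print(message)  # name 'base' is not defined
```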

But if we encapsulated *everything* in a function, we could not share its output with the rest of our program.  For this reason, we use `return` to specify the output of our function.  

So above, we return the variable `cities`, our list of cities.  This is the only thing *returned* to us when we call the function.  

> Just like the only thing returned to us when we press the on button is the login screen.  The rest of the procedure to get there, turning on the phone, booting up the phone, is hidden from us.  We just need to see the final output.

Uncomment and call `most_populous_cities`.

In [24]:
# most_populous_cities()[:2]

I like to think of a function as a dungeon, and of a return value as something thrown over the walls of the dungeon.  So all of the variables inside of our function, such as `url` and `dfs`, are trapped inside of the dungeon.  But we throw `cities` over the wall, and assign it to the variable `collected_cities`.

In [25]:
collected_cities = most_populous_cities()
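
The dungeon picture can also be sketched without calling Wikipedia.  The function `sample_cities` below is hypothetical and returns a hard-coded list (a simplified version of the first two cities in the output shown earlier).  Notice that the caller is free to choose any name for the value that comes over the wall.

```python
def sample_cities():
    note = "never leaves the function"  # local variable, stays in the dungeon
    cities = [
        {"City": "New York", "State": "New York"},
        {"City": "Los Angeles", "State": "California"},
    ]
    return cities  # the only value thrown over the wall


# The caller picks its own name for whatever was returned.
thrown_over = sample_cities()

print(len(thrown_over))        # 2
print(thrown_over[1]["City"])  # Los Angeles
```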

Notice that if a function has no `return` statement, it gives us nothing back: calling it returns `None`, so the cell below displays no output.

In [26]:
def say_greeting():
    greeting = 'hello world'

In [27]:
say_greeting()
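
To confirm this, here is a minimal sketch that stores the result of calling the same `say_greeting` function in a variable (`result` is a name introduced here for illustration) and inspects it:

```python
def say_greeting():
    greeting = 'hello world'  # assigned inside the function, but never returned


result = say_greeting()

print(result)          # None
print(result is None)  # True
```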

> Fix the function above so that `'hello world'` is returned from the function.