# Software Carpentry with Python: Part 1

## Intro to Python Concepts

For October 14, 2020. 

This session starts with an overview of Python and Anaconda, as in the [Data Carpentry: Python Ecology Lesson](https://datacarpentry.org/python-ecology-lesson/00-before-we-start/index.html).

## What is Python 

Python is a general purpose programming language that supports rapid development of data analytics applications. 

Its main advantages are:

* Free
* Open-source
* Available on all major platforms (macOS, Linux, Windows)
* Supported by the Python Software Foundation and a large community
* Suitable for many kinds of work: data analysis, application development, machine learning
* Rich ecosystem of third-party packages

So, why use Python for data analysis?
* Accessibility: as a language, Python is approachable, so new members of the community can get up to speed quickly.
* Supports reproducibility: Reproducibility is the ability to obtain the same results using the same dataset(s) and analysis.
* Data analysis written as a Python script can be reproduced on any platform. Moreover, if you collect more or correct existing data, you can quickly and easily re-run your analysis! An increasing number of journals and funding agencies expect analyses to be reproducible, so knowing Python will give you an edge with these requirements.
* Versatility: Python can read text files and many other data formats, and connect to databases, whether on your computer or on the web.
* Interdisciplinary and extensible: Python provides a framework that allows anyone to combine approaches from different disciplines, in research and beyond, to best suit their analysis needs.

## Knowing your way around Anaconda
**Everyone open Anaconda Navigator**

The Anaconda distribution of Python includes many popular packages and tools, such as the Jupyter Notebook, Jupyter Lab, and the Spyder IDE. Have a quick look around the Anaconda Navigator. You can launch programs from the Navigator or use the command line.
The Jupyter Notebook is an open-source web application that allows you to create documents that combine code, graphs, and narrative text. 

Anaconda also comes with a package manager called conda, which makes it easy to install and update additional packages from the command line. 

### Opening up a notebook:
1. Click on the Launch button under Jupyter Notebook (not Jupyter Lab). 
2. Navigate to your Desktop folder
3. Create a new folder called python-lesson
4. Click into that. 
5. Click the New > Notebook button


## Introduction to Jupyter

Jupyter is a browser-based interface to a Python kernel. The kernel is essentially a process that runs the Python code. We have Python installed as part of the Anaconda software. 

In the notebook, we can use code cells and text cells which explain our code or add documentation. We can even put plots and images in this notebook, making it a great tool for sharing code. 

Let's update the title of this notebook. The title will become the new filename, and as you learned in the Bash lesson, there are good reasons not to use spaces in your filenames.

Title: data-carpentry-python-day1

1. Add a header using a Markdown cell.
2. Add a code cell.
 
 

## Introduction to the dataset

For our Python lessons, we're using a dataset from the [Gapminder project](https://www.gapminder.org/). 
The Gapminder Foundation promotes understanding of statistics in order to support sustainable global development.
They've gathered a number of datasets that are useful in learning about statistical analysis and how to explore data with code.  

Throughout today's lesson and tomorrow's, we'll be working with a dataset that includes GDP and population figures for countries over time. 


### In preparation:
Return to the home tab in your browser for the Jupyter Home page that shows your files. Remember, we're in the python-lesson directory. 

1. Let's create a New > Folder. Click the checkbox and rename it: data.
2. Then click to go into it.
3. We need to put our gapminder data here. The file is at: https://go.gwu.edu/gapminderdata. Once you get to that page, go to File > Save Page As to download it. Be sure to save it as a comma-separated values file, not a text or HTML file. Download it now, and make sure to note where you downloaded it to!
4. Back on the Jupyter Home page, click the Upload button and upload the data file.

We're setting things up this way so that we all have the same file structure and can follow along. Also, this is generally a good practice, to create a data folder and put your original data files in there. 

As a quick aside, Python libraries such as `os` can work with our directory structure; however, that is not our focus today.

### Looking at the data

We'll look at different ways of working with this data in Python. 
As we proceed, we'll introduce various levels of abstraction that will allow us to work with the data in increasingly efficient ways.

Let's take a moment to get acquainted with the dataset. You can view the data in tabular form [on GitHub](https://github.com/gwu-libraries/gwlibraries-workshops/blob/master/python-datacarpentry/data/gapminder_all_cleaned.csv).

### Exercise:
I'll be putting you into small groups in breakout rooms. Please introduce yourselves to each other and share something about why you're interested in learning Python. Once you've done that, as a group, take a look at the gapminder data file together and discuss these questions: 

1. What does each row represent?
2. What does each column represent? 
3. What kinds of analytical questions does this dataset let us pose?


## Variables and Assignment 
(from [SWC Plotting and Programming with Python](https://swcarpentry.github.io/python-novice-gapminder/02-variables/index.html))

### Use variables to store values.
* Variables are names for values.
* In Python the `=` symbol assigns the value on the right to the name on the left.
* The variable is created when a value is assigned to it.
* It's difficult to work with complex data in Python **without** using variables. Variables represent a key element of **abstraction**, whereby we can represent something potentially complex and changeable by a more concise and persistent expression.

One approach to representing our gapminder data would be to assign individual values to variables.

In [None]:
country = 'Algeria'
gdpPercap_1952 = 2449.008185

### Use print to display values.
* Python has a built-in function called `print` that prints things as text.
* Call the function (i.e., tell Python to run it) by using its name.
* Function calls require **parentheses** after the name of the function. 
* If the function accepts **arguments**, these go inside the parens. 
* The `print` function takes one or more arguments. If the arguments are variables, it outputs the values they represent to the screen.

In [None]:
print(country, gdpPercap_1952)

### Naming and assigning variables
* Variable names can contain only letters, digits, and the underscore `_` character (typically used to separate words in long variable names).
* Variable names cannot start with a digit.
* Python treats upper- and lower-case letters as different, so `gdpPercap` and `gdppercap` are different variables.
* Variables must be created before they can be used. In Python, in most cases, we cannot create a variable without explicitly assigning it a value.
* If you refer to a variable that doesn't yet exist, Python reports an error.

In [None]:
print(gdpPercap_1987)
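As a small aside (not part of the original lesson), Python strings have an `isidentifier()` method that reports whether a piece of text follows the naming rules listed above. This sketch only checks those rules; it does not flag reserved words such as `True`.

```python
# str.isidentifier() returns True only for text that could be a variable name:
# letters, digits, and underscores, not starting with a digit.
ok_name = 'gdpPercap_1952'.isidentifier()      # True
starts_with_digit = '1952_gdp'.isidentifier()  # False
has_hyphen = 'gdp-percap'.isidentifier()       # False
print(ok_name, starts_with_digit, has_hyphen)  # True False False
```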

### Variables persist.
* Having created (by assignment) the variables `country` and `gdpPercap_1952` above, we can continue to use them throughout this notebook session.
* We can change the value to which a variable refers by re-assigning it. 
* Below, we do this explicitly. Later on, we'll see how other Python structures -- namely, `for` loops -- reassign variables implicitly. 
* Be aware that it is the order of execution of cells that is important in a Jupyter notebook, **not the order in which they appear**. Python will remember all the code that was run previously, including any variables you have defined, irrespective of the order in the notebook. If you define variables lower down the notebook and then (re)run cells further up, those defined lower down will still exist.



In [None]:
country = 'Angola'
gdpPercap_1952 = 3520.610273
gdpPercap_2007 = 4797.231267
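A minimal sketch of what re-assignment does and doesn't do: the name points to the new value, but a result computed earlier from the old value stays as it was. The variable `doubled` is made up for this illustration; the sketch ends with the same `country` and `gdpPercap_1952` values as the cell above.

```python
country = 'Algeria'
country = 'Angola'             # re-assignment: the name now refers to the new value
print(country)                 # Angola

gdpPercap_1952 = 2449.008185
doubled = gdpPercap_1952 * 2   # computed from the Algeria value
gdpPercap_1952 = 3520.610273   # re-assign to the Angola value
print(doubled)                 # unchanged: still twice the Algeria value
```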

### Variables can be used in expressions.
* An **expression** is a combination of values, variables, and operators that Python evaluates to produce a new value.
* A mathematical calculation is an example of an expression.
* We can use variables in expressions just as if they were values.
* We can mix variables and **literals** (values written directly into the code, such as `100` or `'Algeria'`).
* We can assign the result of an expression to a new variable.

In [None]:
percent_change = (gdpPercap_2007 - gdpPercap_1952) / gdpPercap_1952 * 100
print('Percent change from 1952 to 2007 =', percent_change)

### Boolean expressions
* A special subset of Python expressions result in a binary value, called a Boolean, that is either `True` or `False`. 
* `True` and `False` (note the capitalization) are reserved words in Python, so they cannot be used as variable names.
* A useful kind of Boolean expression is to test for equality. To distinguish it from variable assignment, which uses a single equals sign (`=`), the test for equality uses a double equals sign (`==`).

In [None]:
print(country == 'Algeria')

In [None]:
print(gdpPercap_1952 == 3520.610273)
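Equality is one of several comparison operators. The sketch below re-creates the Angola variables so it runs on its own, and shows `!=` (not equal), `>`, and `>=`; each comparison evaluates to a Boolean.

```python
country = 'Angola'
gdpPercap_1952 = 3520.610273
gdpPercap_2007 = 4797.231267

print(country != 'Algeria')             # True: != tests for inequality
print(gdpPercap_2007 > gdpPercap_1952)  # True: GDP per capita grew
print(gdpPercap_1952 >= 5000)           # False
print(type(country == 'Angola'))        # <class 'bool'>
```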

## Types and Conversion

### Every value has a type.
* Every value in a program has a specific type.
* Python has a few different types for representing single values. These include:
  * Integer (`int`): represents positive or negative whole numbers like 3 or -512.
  * Floating point number (`float`): represents real numbers like 3.14159 or -2.5.
  * Character string (`str`): text of any length, enclosed in either single quotes or double quotes (as long as they match).

### Use the built-in function type to find the type of a value.
* Use the built-in function `type` to find out what type a value has.
* Works on variables as well.
* But remember: the value has the type — the variable is just a label.

In [None]:
type(country)

In [None]:
type(gdpPercap_1952)
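`type` works on literals, too. In this sketch, a decimal point alone is enough to make a number a `float`, quotation marks make anything a `str`, and Booleans such as `True` have their own type, `bool`.

```python
print(type(-512))    # <class 'int'>
print(type(3.0))     # <class 'float'>: the decimal point makes it a float
print(type('3.0'))   # <class 'str'>: quotation marks make it a string
print(type(True))    # <class 'bool'>
```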

### Certain kinds of expressions work only for certain types.
In other words, a value’s type determines what the program can do with it.

* `int` and `float` values can be used in mathematical expressions.
* You can't automatically combine a `str` with either an `int` or a `float`.
* The `print` function, like the `type` function, works for any Python type. That is not true of all Python functions.

In [None]:
print(gdpPercap_2007 - gdpPercap_1952)

In [None]:
newValue = country + gdpPercap_1952

Strings have a length, and we can use another built-in function, `len()`, to see how many characters a string contains.

In [None]:
len(country)

We can't take the `len()` of a float.

In [None]:
len(gdpPercap_1952)

### Type conversion allows us to combine different types in a single expression.
* Use the `+` operator to combine strings. 
* Use the `str` function to convert a float or an integer to a string. 

In [None]:
label = 'GDP per capita for ' + country + ' in 1952: ' + str(gdpPercap_1952)
print(label)

Note that the code `str(gdpPercap_1952)` does not change the value to which the variable `gdpPercap_1952` refers. Rather, it returns a **new** value of the `str` type, which is then added to the end of the previous string in our expression.

In [None]:
type(gdpPercap_1952)
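Conversion also works in the other direction, from `str` to a number, with `float` and `int`. This matters when numbers arrive as text, for instance from a file. The values below are made-up examples. One caution: `int` won't accept text with a decimal point; `int('6432.23')` raises a `ValueError`.

```python
gdp_text = '6432.23'          # a number stored as text
gdp_value = float(gdp_text)   # a new float value; gdp_text is unchanged
print(type(gdp_text), type(gdp_value))  # <class 'str'> <class 'float'>

pop_text = '1000000'
pop_value = int(pop_text)     # whole-number text converts cleanly to int
print(pop_value + 1)          # 1000001
```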

### Exercise
1. Using variables and expressions, calculate the percentage change in **life expectancy** between 1952 and 2007 for another country in the CSV file.
2. Print a statement as above that summarizes your findings, including the name of the country.

Percent change between quantities `x` (the earlier value) and `y` (the later value) is calculated as `(y - x) / x * 100`.

In [None]:
country2 = 'New Zealand'
lifeExp1952 = 69.39 
lifeExp2007 = 80.204
percent_change_le = (lifeExp2007 - lifeExp1952) / lifeExp1952 * 100
statement = 'Percent change in ' + country2 + ' life expectancy is ' + str(round(percent_change_le, 2)) + '%.'
print(statement)

## Working with strings and lists

We have seen how to create and manipulate variables that store single values. But that's not terribly helpful for working with data, where we usually want to perform operations on large numbers of values. Assigning a new variable to every country's GDP per capita in 1952, for instance, would be tiresome, to say the least.

In what follows, we'll see how to use some additional Python types to work with **collections** of values.

But first, we need to load our dataset into Python's memory space so that we can work with it.

### Reading a file

* One way to access the contents of a file in Python is to use the built-in `open` function together with the `read` method of the file object it returns.

(Type in code here rather than covering all of this first.)

* Here we are opening the file called `gapminder_all_cleaned.csv`, which is in the `data` subdirectory within the directory that contains this notebook.
* We are opening the file as read-only, using the `'r'` argument to `open`.
* A reference to the open file is assigned to the variable `f`.
* We assign a new variable, `gap_data`, to the contents of the file by calling `f.read()`.
  * The dot between `f` and `read()` means that `read` is a function associated with anything that has the special **file-object type**. (We could have chosen any valid variable name; the choice of `f` is merely conventional).
* The following code creates what's called a **code block**. 
  * Notice the colon (`:`) at the end of the first line.
  * Notice that the second line is indented. 
  * To indent, you can use either the `tab` key or the spacebar, but whichever you choose, you should be consistent.
  * The Jupyter notebook will automatically indent the next line after a colon.
* Code blocks group the lines of code that belong to the statement ending in the colon.
* In this case, the `with...as` keywords instruct Python to keep the file open only for the duration of the ensuing code block.
* Since the code block contains only one statement (the call to `f.read()`), the file will be closed automatically after its contents have been read into memory.


In [None]:
with open('data/gapminder_all_cleaned.csv', 'r') as f:
    gap_data = f.read()

Now our variable `f` isn't good for much anymore, but we should be able to access our dataset via the `gap_data` variable.
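To see the automatic closing for yourself, here is a self-contained sketch that writes a tiny, made-up CSV file to a temporary folder (using the standard-library `tempfile` and `os` modules, which this lesson doesn't otherwise cover), reads it back with `with...as`, and then checks the file object's `closed` attribute.

```python
import os
import tempfile

# Create a small sample file in a temporary folder (hypothetical contents).
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'sample.csv')
with open(path, 'w') as f:
    f.write('continent,country\nAfrica,Algeria\n')

# Read it back; the file stays open only inside the indented block.
with open(path, 'r') as f:
    contents = f.read()

print(f.closed)        # True: closed automatically when the block ended
print(type(contents))  # <class 'str'>
```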


In [None]:
print(gap_data)

Reading a file into Python converts the contents into a string.

If you inspect the output of the `print` statement, you'll notice the commas between the column values in each row of the original table. This is because the file is in CSV (comma-separated values) format.

That's the **format** of the data, but its Python **type** is still just `str`.

This is not a collection of many strings, but one long string with newline characters between the rows.

In [None]:
type(gap_data)

### Use slices and indexing to inspect strings.
* A Python string is basically an **ordered collection** of zero or more characters.
* Anything between quotation marks is a string.
  * An empty pair of quotation marks (`''`) is also a string.
* Each **position** in the string (first, second, etc.) is given a number. This number is called an **index**.
* Indices are numbered from 0.
* Use the index in square brackets to get the character at that position.

Here's how we see the first character in the file.

In [None]:
print(gap_data[0])

* A **slice** is a part of a string (or, more generally, any list-like thing).
* We take a slice by using `[start:stop]`, where start is replaced with the index of the first element we want and stop is replaced with the index of the element **just after** the last element we want.
* We can omit either the `start` or `stop` index -- keeping the colon -- and Python will default to the start or end of the string itself, respectively.
* Taking a slice does not change the contents of the original string. Instead, the slice is a copy of part of the original string.

Returning to our country variable for a moment, here's what it looks like to take a slice starting with the character with index 2 and slicing up to but not including position 4. Our country variable is `Angola` so we get `go`.

In [None]:
print(country[2:4])

Here's how to look at the first 100 characters of the file. We're starting at position 0, so we can leave that out before the colon.

In [None]:
print(gap_data[:100])
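A few more slices of the `country` variable, re-created here as `'Angola'`, show how omitting `start` or `stop` works, and that slicing leaves the original string intact.

```python
country = 'Angola'
print(country[0])     # A: the character at index 0
print(country[2:4])   # go
print(country[:3])    # Ang: from the start up to (not including) index 3
print(country[3:])    # ola: from index 3 to the end
print(country)        # Angola: the original is unchanged
```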

### Split a string to create a Python list.

As a single string, our data isn't very useful. For one thing, we have no good way of knowing where the individual data elements begin and end in order to access them by position, since even the numbers are just collections of characters.
(Remember: `6432.23` is not the same as `'6432.23'`, because one is a `float` and the other a `str`.)

* Python strings have some built-in functions. These built-in functions for particular data types are usually called methods. (This means that from every value of type `str`, and every variable assigned to such a value, you can access the function using the dot-notation shown below.)
* The `split` function will divide a string into multiple strings. 
* `split` accepts as an argument an additional string of one or more characters that represents a separator: at every occurrence of the separator, a new substring will be created. 
* `split` is a function that returns a new value. What it returns is a Python `list`.
* In this case, we use the special **line-break** character, `\n`, to split our dataset into rows. 

In [None]:
lines = gap_data.split('\n')
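Here is the same idea on a hypothetical three-line sample. Because the sample ends with a newline, as text files usually do, `split('\n')` produces an empty string as the final element.

```python
sample = 'continent,country\nAfrica,Algeria\nAfrica,Angola\n'
rows = sample.split('\n')
print(rows)       # ['continent,country', 'Africa,Algeria', 'Africa,Angola', '']
print(len(rows))  # 4: three lines of text plus one empty string
```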

### Indexing and slicing work with lists, too.
* Because a Python `list` is an ordered collection, we can access its values by position.
* We can also create new lists by slicing the original into smaller parts.

First, let's see how long our `lines` list is. Earlier, we found the length of a string using the built-in function `len()`. Try using that function on the `lines` list.

In [None]:
len(lines)

In [None]:
print('Header row:', lines[0])
print('Next three rows:', lines[1:4])

### Understanding lists
* Note that `lines` refers to a value of type `list`, but `lines[0]` refers to a value of type `str`, because our list is actually a list of strings.
* Lists, like other Python collection types, can themselves contain values of diverse types.
* It's common to work with such **nested** structures in Python. 
* Indexing a string produces another string, but indexing a list might produce a value of any type (depending on the elements of the list.)
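A small nested list with made-up contents illustrates the points above: a single list can mix types, and the elements of an outer list can themselves be lists.

```python
sample_row = ['Africa', 'Angola', 3520.610273]   # a str, a str, and a float
sample_table = [['continent', 'country'], ['Africa', 'Angola']]

print(type(sample_row[1]), type(sample_row[2]))  # <class 'str'> <class 'float'>
print(sample_table[1])        # ['Africa', 'Angola']: an inner list
print(type(sample_table[1]))  # <class 'list'>
print(sample_table[1][1])     # Angola: a second index reaches into the inner list
```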

### Exercise
1. The first row of our dataset is a comma-separated string of column headers. Split this string on the commas to create a list of headers and assign it to a variable.
2. Print just the column headers in the header row that contain the column names for GDP per capita (the ones starting with gdpPercap).

In [None]:
headers = lines[0].split(',')
print(headers[2:14])

### Use loops to work with lists.

* Programming lets us automate repeated operations.
* Python makes this straightforward by allowing us to loop over any collection type.
* With a `for` loop, each element in the collection is assigned to the same variable in sequence.
* Within the `for`-loop code block, we can use that variable in other Python expressions. 
* The code in the block will be executed for each element in the collection.
* When the loop reaches the end of the collection, the code block is finished.
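A loop can also build up a result step by step. In this sketch (the numbers are made up), `total` starts at 0 and the code block adds one element to it on each pass through the loop.

```python
values = [1000.0, 2500.0, 4000.0]
total = 0
for value in values:
    total = total + value    # runs once for each element of values
print(total)                 # 7500.0
print(total / len(values))   # 2500.0: the average
```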

In [None]:
for h in headers:
    print(h)

The above loop simply prints each element in the first row on a new line. 

But we can use a `for` loop to create a new list out of an old one. 

The following code:

1. Creates an empty list (the square brackets with nothing inside) and assigns it to a new variable (`gap_tbl`). **Remember, we can't use a variable in Python until we assign it.**
2. Loops over each element of `lines`, which will be a comma-separated string.
3. On each iteration, the next element in the list will be assigned to the `line` variable. (The name of the loop variable is arbitrary. We could have called it anything. Calling it `line` makes the code more legible, since it will store each value in the list `lines` in sequence.)
4. Splits the string on the commas, creating a list, and assigns this list to a variable called `row`.
5. Uses the list method `append()` to add each row to `gap_tbl`.

The result -- called `gap_tbl` -- is a list of lists.

In [None]:
gap_tbl = []
for line in lines:
    row = line.split(',')
    gap_tbl.append(row)

Our `gap_tbl` variable now holds something that more closely resembles our original dataset, since each data point is its own element in a row, and each row is an element in the larger list (the table).

Now we can access individual rows or even data elements by position. 

In [None]:
print('First row:', gap_tbl[0])
print('Second row, third element:', gap_tbl[1][2])

To access a single element, we use two pairs of square brackets. 
* The first contains the index to the row.
* The second contains the index to the element within that row.

### Exercise
1. Create a new list of just the rows for the countries in Africa. (There are 52 countries in Africa in the file.)
2. Using a `for` loop, create another list with just the GDP per capita values for African countries for the year 2007.

**Hint**: Assigning a slice of a list to a variable creates a new list.

In [None]:
africa = gap_tbl[1:53]
gdp_africa = []

for line in africa:
    gdp_africa.append(line[13])

print(gdp_africa)

## Switch instructors here

## Lists vs. dictionaries

Being able to isolate data points by positional indexing is a big improvement. 

But it would be even better if we had a way to identify data points by column name. That's typically how we would work with such a dataset. For instance, we'll want to know about _the percentage change in GDP per capita between 1952 and 2007_, not the _percentage change between the columns at indices 2 and 13_.

Python gives us another data structure we can use for this kind of case. It's called a **dictionary**.

In [None]:
countries = {'country': 'Angola',
            'gdpPercap_1952': 3520.610273}

### Access dictionary values by their keys.
* Python dictionaries are enclosed in **curly braces**, not square brackets.
* Like Python lists, they can contain values of diverse types.
* Unlike lists, dictionaries are not accessed by position but by **key**. 
* Each key must be **unique** (just like the entries in an ordinary dictionary). Strings and numeric types are common choices for keys.

In [None]:
countries['country']

* Note that we can't go the other way, from value to key: `'Angola'` is a value in this dictionary, not a key, so looking it up raises a `KeyError`.

In [None]:
countries['Angola']

* We can also use variables for the keys and/or the values of a dictionary. This is useful for creating dictionaries dynamically.

In [None]:
countries['gdpPercap_2007'] = gdpPercap_2007
print(countries)
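Two standard ways to avoid a `KeyError`: the `in` operator tests whether a key exists, and the dictionary method `get` returns `None` (or a default you pass) for a missing key. This sketch uses a separate, hypothetical `angola` dictionary so it doesn't alter `countries`.

```python
angola = {'country': 'Angola', 'gdpPercap_1952': 3520.610273}

print('country' in angola)   # True
print('Angola' in angola)    # False: in checks keys, not values

print(angola.get('gdpPercap_1987'))         # None: the key is missing
print(angola.get('gdpPercap_1987', 'n/a'))  # n/a: the default we supplied
```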

### Converting lists to dictionaries

We saw how to convert a value of type float to a value of type string. We also saw how to split a string to create a list.

How can we convert a list to a dictionary?
* A dictionary is a collection where every element is actually a pair: a key and a value. 
* A dictionary can't have keys that have no values (although the keys can have `None` as a value, which is the Python null value.)
* A list is a collection of single elements. 
* We can, however, stitch two lists together to create a dictionary, provided they are the same length. (If they aren't, the extra elements of the longer list are dropped.)
* The resulting dictionary will have for its **keys** the elements from the first list, and for its **values** the elements from the second.

In [None]:
print(gap_tbl[0])
print(gap_tbl[1])

First, let's assign the first element in our outer list (the first row in the table) to a new variable. 

The first row is the header row, so we'll call this `headers`.

In [None]:
headers = gap_tbl[0]

* The built-in function `zip()` takes one or more collections (such as lists or strings) and aligns their elements by position, like the two halves of a zipper.
* If we loop over the `zip`ped collections, each time through the loop, we get a new pair of elements.

In [None]:
for element in zip(headers, gap_tbl[1]):
    print(element)

* But instead of looping over them, we can take our `zip`ped lists and pass them to the built-in function `dict()`.
* `dict()` takes a collection of **pairs**, such as the output of `zip()`, and creates a dictionary.
* The first element in each pair becomes a new key.
* The second element becomes the value for that key.

Here, we `zip` the header row up with the first row of data in our nested list (table).

In [None]:
dict(zip(headers, gap_tbl[1]))
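A minimal sketch with made-up lists, including what happens when the lengths differ: `zip` pairs elements by position and stops when the shorter list runs out.

```python
keys = ['a', 'b', 'c']
values = [1, 2, 3]
print(dict(zip(keys, values)))    # {'a': 1, 'b': 2, 'c': 3}

# Unequal lengths: the unmatched key 'c' is silently dropped.
print(dict(zip(keys, [1, 2])))    # {'a': 1, 'b': 2}
```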

### Exercise

Play around with using `zip` and `dict` to create dictionaries out of two lists.

For example, zip together a list of state abbreviations with the states' full names. Or zip together a list of months and a list of average temperatures. 

Try the `zip()` procedure on two strings -- can you tell what happened?

### Creating a list of records

You can think of a dictionary as a record in a database. Provided each dictionary has the same set of keys, a list of dictionaries would then be like a database, where each row is another record.

Using the tools we've encountered so far, we can create this structure out of our dataset. 

Once we've done that, we can begin to do some analysis!

The following code

1. Creates a new empty list (`gap_records`)
2. Assigns the first row of `gap_tbl` to a variable (`headers`)
3. Loops through the rest of our `gap_tbl` list to work with each row
4. Uses the technique above to create a dictionary mapping the column headers to the data in each row
5. Uses `append` to add each dictionary to our new list.

In [None]:
gap_records = []
headers = gap_tbl[0]
for row in gap_tbl[1:]:
    record = zip(headers, row)
    record = dict(record)
    gap_records.append(record)

Before you look at what gap_records contains, think about what you expect to see. Then `print()` it. 

In [None]:
print(gap_records)

## Loops & conditionals

When working with loops, it's useful to be able to control the flow of the code in the loop. We don't always want to take the same action for every item in a collection.

That's where `if` statements come in. 

The structure of an `if` statement is similar to a `for` statement:
* First line opens with `if` and ends with a colon
* Body containing one or more statements is indented (creating a code block)
* The code block will be executed **only** if the expression in the `if` statement evaluates to `True`.

In [None]:
record = gap_records[0]
if record['continent'] == 'Africa':
    print(record['country'], "is in Africa.")

The above code tests whether a single record from our list of records is for a country in the continent of Africa. If the country is in Africa, it prints the country's name.
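To see the `if` skip its block, try a record that doesn't match. This sketch uses a hypothetical `sample_record` dictionary rather than a row from `gap_records`; only the unindented last line prints.

```python
sample_record = {'continent': 'Europe', 'country': 'Norway'}   # hypothetical record
if sample_record['continent'] == 'Africa':
    print(sample_record['country'], 'is in Africa.')   # skipped: the test is False
print('Done checking', sample_record['country'])        # not indented, so it always runs
```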

### Exercise

1. Try re-running this code with different indices of `gap_records`. 
2. See if you can re-do the exercise above: Using a `for` loop, create another list with just the GDP per capita values for African countries for the year 2007. But this time, try using an `if` statement instead of a slice to isolate the African countries.


In [None]:
african_gdp = []
for record in gap_records:
    if record['continent'] == 'Africa':
        african_gdp.append(record['gdpPercap_2007'])

### Use dictionaries with loops to store results of calculations.

Let's calculate the percentage change between the 1952 population and the 2007 population for each country in our dataset.

With our list of records (dictionaries), it will be straightforward to access those values by key (column name).

We can also use a dictionary, rather than a list, to store the results. Then we'll be able to look up the result for any given country by name.

Finally, we're using the `int` function to convert the population values, which are stored as strings, to integers. (We wouldn't expect a population count to be a decimal number.) If we don't do that, Python will raise a `TypeError` when we try to subtract one population string from another.

In [None]:
pop_growth = {}
for record in gap_records:
    country = record['country']
    pop_1952 = int(record['pop_1952'])
    pop_2007 = int(record['pop_2007'])
    pop_diff = pop_2007 - pop_1952
    pct_change = pop_diff / pop_1952 * 100
    pop_growth[country] = pct_change

### Use conditionals to check for error conditions.

If you ran the above code, you should have gotten a `KeyError`. This tells us that Python couldn't find the requested key in the dictionary.

The key is `country`. We know that each row in our dataset should have a `country` value. So what happened?

In a case like this, we can start troubleshooting by just visually inspecting the dataset.

In a Jupyter notebook, this will actually be easier to do if we do **not** use the `print` statement.

In [None]:
gap_records

Scrolling to the very end of the list, we can see that the last record is non-standard. It has only a single key, `continent`, and the value of that key is the empty string (`''`).

One approach would be to remove that element from our list. But what if there happen to be others like that?

An alternative is to re-write our loop with a conditional, to test for the presence of such empty records. 

The empty string, in a Boolean expression, evaluates to `False`. So the code block in the following `if` statement will not be executed:

In [None]:
if '':
    print('Not an empty string.')
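The built-in `bool` function shows how Python will treat a value in an `if` test. In this sketch, empty strings, empty lists, and zero count as `False`; non-empty ones count as `True`.

```python
print(bool(''))        # False: empty string
print(bool('Africa'))  # True: non-empty string
print(bool(0))         # False
print(bool([]))        # False: empty list
print(bool(['']))      # True: a list with one element, even an empty string
```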

Using that technique, we can re-write our loop as follows, checking each time to make sure the `continent` key has a non-null value. (We presume that if the `continent` key isn't null, then the rest of the row is valid. That may or may not be an accurate assumption, but for this dataset, it happens to be.)

In [None]:
pop_growth = {}
for record in gap_records:
    if record['continent']:
        country = record['country']
        pop_1952 = int(record['pop_1952'])
        pop_2007 = int(record['pop_2007'])
        pop_diff = pop_2007 - pop_1952
        pct_change = pop_diff / pop_1952 * 100
        pop_growth[country] = pct_change

Now as long as we know the name of the country, we can find its percentage change in population between 1952 and 2007.

In [None]:
pop_growth['India']
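Where did the nearly empty final record come from? A minimal sketch with a hypothetical miniature file reproduces it: the text ends with a newline, so splitting on `'\n'` leaves an empty string as the last line, and zipping the headers with `['']` yields a one-key dictionary. The `sample_` names are illustrative and don't overwrite the lesson's variables.

```python
sample = 'continent,country,pop_1952\nAfrica,Algeria,1000\nAfrica,Angola,2000\n'

# Same steps as the lesson: split into lines, split lines on commas, zip with headers.
sample_table = []
for sample_line in sample.split('\n'):
    sample_table.append(sample_line.split(','))

sample_records = []
sample_headers = sample_table[0]
for sample_row in sample_table[1:]:
    sample_records.append(dict(zip(sample_headers, sample_row)))

print(len(sample_records))  # 3: two real rows plus one from the trailing newline
print(sample_records[2])    # {'continent': ''}
```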

### Branching with conditionals

What if we want our loop not just to ignore a certain kind of element in a collection, but to do something different with it?

Python provides two optional clauses, `elif` and `else`, that can accompany an `if` statement to create multi-pronged logic. The following code evaluates whether a number is greater than, less than, or equal to 10.

In [None]:
number = 0
if number == 10:
    print('Number is 10.')
elif number < 10:
    print('Number is less than 10.')
else:
    print('Number is greater than 10.')

We can also use `if` and `elif` without `else`. If we remove `else` and its code block from the cell above, the code would do nothing if `number` was set to a value greater than 10.
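Combined with a loop, the same `if`/`elif`/`else` structure sorts each element into exactly one branch: Python runs the first branch whose test is `True` and skips the rest. The numbers and the `labels` list here are illustrative.

```python
labels = []
for number in [5, 10, 15]:
    if number == 10:
        labels.append('equal to 10')
    elif number < 10:
        labels.append('less than 10')
    else:
        labels.append('greater than 10')
print(labels)   # ['less than 10', 'equal to 10', 'greater than 10']
```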

### Exercise

Use `if` and `elif` to write a loop that filters our dataset for two groups:

1. Countries with a GDP per capita in 2007 **less than 1000**. 
2. Countries with a GDP per capita in 2007 **greater than 40,000**.

Your code should create two new lists, one containing the records for the low-GDP countries, and the other containing the records for the high-GDP countries.

**Hints**
* You'll need an `if` statement to test for the empty records (as above).
* You can use the `float` function (which works like `int`) to convert the GDP values to numbers before you test them. Comparing a string to a number with `<` or `>` raises a `TypeError` in Python.
* You can nest one set of conditional statements inside another. For instance, in the code below, the inner `if`-`else` statements are only executed if `x` is greater than 0.

```
if x > 0:
    if x < .5:
        print('50th percentile or lower.')
    else:
        print('Above the 50th percentile')
```



In [None]:
low_gdp = []
high_gdp = []
for record in gap_records:
    if record['continent']:
        gdp_2007 = float(record['gdpPercap_2007'])
        if gdp_2007 < 1000:
            low_gdp.append(record)
        elif gdp_2007 > 40000:
            high_gdp.append(record)

How many countries are in each list? 

In [None]:
len(low_gdp)

In [None]:
len(high_gdp)

In [None]:
high_gdp