# A list stores many values in a single structure.

* Doing calculations with a hundred separate variables (say, the 2007 populations of every country in the [gapminder](https://www.gapminder.org/data/) dataset) would be slow and error-prone by hand.
* Use a list to store many values together.
    * Contained within square brackets `[...]`.
    * Values separated by commas `,`.
* Use `len` to find out how many values are in a list.

In [1]:
Americas_pop = [301139947, 190010647, 108700891]   # Top 3 populations of countries in North and South America
len(Americas_pop)                                  # In this example the countries are the United States, Brazil and Mexico

3

In [13]:
primes = [2, 3, 5, 7, 11, 13]
len(primes)

6

# Use an item's index to fetch it from a list.

Python uses zero-based indexing: the first element of a list is at index 0, the second at index 1, and so on.

In [3]:
Americas_pop[0]     # 1st element of Americas_pop

301139947

In [14]:
primes[0]      # 1st element of primes

2

In [15]:
primes[3]      # 4th element of primes

7

In [4]:
Americas_pop[2]     # 3rd element of Americas_pop

108700891
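Because indexing starts at 0, the last valid index is always one less than the list's length, and asking for an index past the end raises an `IndexError`. A small sketch using the same three population values:

```python
Americas_pop = [301139947, 190010647, 108700891]

last_index = len(Americas_pop) - 1       # 3 items -> valid indices are 0, 1, 2
print(last_index)                        # 2
print(Americas_pop[last_index])          # 108700891

try:
    Americas_pop[3]                      # one past the end of the list
except IndexError as err:
    print('IndexError:', err)            # IndexError: list index out of range
```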

Use an index expression on the left-hand side of an assignment to replace a value.

In [5]:
Americas_pop[0] = 6942069
print('Americas_pop is now:', Americas_pop)

Americas_pop is now: [6942069, 190010647, 108700891]

That value was only for demonstration, so change it back to the correct number.

In [6]:
Americas_pop[0] = 301139947
Americas_pop

[301139947, 190010647, 108700891]

# Slicing up the list.

* A slice is a part of a list (or of any other sequence, such as a string).
* We take a slice with `[start:stop]`, where `start` is the index of the first element we want and `stop` is the index of the element just after the last element we want.
* Mathematically, you might say that a slice selects the half-open interval `[start:stop)`: `start` is included, `stop` is not.
* When both indices are within the list, `stop - start` is the slice's length.
* Taking a slice does not change the contents of the original list. Instead, the slice is a copy of part of the original list.
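The claims in the list above can be checked directly. A small sketch (the list name `numbers` is just illustrative):

```python
numbers = [2, 3, 5, 7, 11, 13]

start, stop = 1, 4
part = numbers[start:stop]                  # indices 1, 2, 3 -> [3, 5, 7]
print(part, len(part) == stop - start)      # [3, 5, 7] True

part[0] = 99                                # change the slice...
print(part)                                 # [99, 5, 7]
print(numbers)                              # ...the original is untouched: [2, 3, 5, 7, 11, 13]

print(numbers[2:100])                       # a stop past the end is clipped: [5, 7, 11, 13]
print('gapminder'[0:3])                     # slices work on strings too: gap
```

The `numbers[2:100]` line is why the length rule only holds when both indices are inside the list: the slice has 4 items, not 98.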

In [2]:
primes = [2, 3, 5, 7, 11]
primes[0:3]

[2, 3, 5]

* Using `[start:]` starts at the index specified and goes through the rest of the list.
* Using `[:end]` goes from the beginning up to, but not including, index `end`.
* Using `[:]` copies the whole list.

In [15]:
primes[3:]

[7, 11]

In [16]:
primes[:2]

[2, 3]

There is also an optional `step` value, which can be used with any of the forms above:  
`[start:end:step]`  
The key point to remember is that `end` is the index of the first element that is *not* in the selected slice. So, the difference between `end` and `start` is the number of elements selected (when `step` is 1, the default, and both indices are within the list).  

An index, `start`, or `end` may also be a **negative** number, which counts from the end of the list instead of the beginning. So:

In [18]:
primes[-1]   # last item in the list

11

In [19]:
primes[-3:]  # last three items in the list

[5, 7, 11]

In [21]:
primes[:-3]  # everything but the last three items in the list

[2, 3]

`step` can also be a negative number, which walks through the list backwards. With a negative `step`, an omitted `start` means "from the last item" and an omitted `end` means "through the first item".

In [23]:
primes[::-1]      # all items in the list, reversed

[11, 7, 5, 3, 2]

In [24]:
primes[1::-1]     # the first two items, reversed

[3, 2]

In [25]:
primes[:-3:-1]    # the last two items, reversed

[11, 7]

In [26]:
primes[-3::-1]    # everything except the last two items, reversed

[5, 3, 2]

This may be a little confusing at first, but just remember `[start:end:step]` and that `end` is never included. None of these slices changed the list itself:

In [7]:
primes

[2, 3, 5, 7, 11]

# Adding items to a list

Use `list + list` to join two lists: the result is a new list containing the items of the first list followed by those of the second.

In [8]:
primes = [2, 3, 5]
print('primes is initially:', primes)
primes = primes + [7, 11]
print('primes has become:', primes)

primes is initially: [2, 3, 5]
primes has become: [2, 3, 5, 7, 11]


# Removing items from a list

* `del list_name[index]` removes an item from a list and shortens the list.  
* `del` is not a function or a method, but a statement built into the language.

In [8]:
print('primes before removing last item:', primes)
del primes[4]
print('primes after removing last item:', primes)

primes before removing last item: [2, 3, 5, 7, 11]
primes after removing last item: [2, 3, 5, 7]
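`del` also accepts a negative index or a whole slice, using the same indexing rules as above. A short sketch on a fresh list (the name `values` is just for illustration):

```python
values = [2, 3, 5, 7, 11, 13]

del values[-1]          # remove the last item, whatever the list's length
print(values)           # [2, 3, 5, 7, 11]

del values[1:3]         # remove indices 1 and 2 (the items 3 and 5)
print(values)           # [2, 7, 11]
```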


* `list_name.remove(value)` removes an item from the list based on the first matching value.

In [10]:
print('primes before removing last item:', primes)
primes.remove(7)
print('primes after removing last item:', primes)

primes before removing last item: [2, 3, 5, 7]
primes after removing last item: [2, 3, 5]
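Two consequences of "first matching value" are worth seeing: only one occurrence is removed, and asking for a value that is not in the list raises a `ValueError`. A minimal sketch with an illustrative list:

```python
values = [10, 20, 30, 20]

values.remove(20)        # only the first 20 (at index 1) is removed
print(values)            # [10, 30, 20]

try:
    values.remove(99)    # 99 is not in the list
except ValueError as err:
    print('ValueError:', err)
```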


To remove *all* occurrences of a value from a list, use a list comprehension, which we will learn in the loops module:

In [16]:
a = [10, 20, 30, 40, 20, 30, 40, 20, 70, 20]
a = [x for x in a if x != 20]
a

[10, 30, 40, 30, 40, 70]

# Lists can contain more than just numbers.

* Every value in a program has a specific type.
* Integer (`int`): represents positive or negative whole numbers like 3 or -512.
* Floating point number (`float`): represents real numbers like 3.14159 or -2.5.
* Character string (usually called "string", `str`): text.
    * Written in either single quotes or double quotes (as long as they match).
    * The quote marks aren't printed when the string is displayed.  
    
* Use the built-in function `type` to find out what type a value has.

In [5]:
goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']
type(goals[0])

int

In [6]:
type(goals[1])

str
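The `goals` list above only holds integers and strings. The same `type` check works for the other types in the list of bullets; note that `print(type(...))` shows the full `<class '...'>` form, while the notebook display shows just the name. A small sketch with illustrative values:

```python
# print() shows the full class name, e.g. <class 'int'>, where the notebook display shows just int
print(type(-512))          # <class 'int'>
print(type(3.14159))       # <class 'float'>
print(type("text"))        # <class 'str'> (single or double quotes give the same type)

mixed = [3, -2.5, 'Brazil']    # one list can hold several types at once
print(type(mixed[1]))      # <class 'float'>
```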

# Dictionaries

A dictionary is a set of *key:value* pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: `{}`. Placing a comma-separated list of *key:value* pairs within the braces adds initial *key:value* pairs to the dictionary; this is also the way dictionaries are written on output.
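A quick sketch of the rules in that paragraph: empty braces give an empty dictionary, values can be of any type, and repeating a key does not create a second entry (the later value wins). The names `empty`, `capitals`, and `dup` are just for illustration:

```python
empty = {}                          # a pair of braces creates an empty dictionary
print(len(empty))                   # 0

capitals = {'Brazil': 'Brasília', 'Mexico': 'Mexico City'}   # values can be strings too
print(capitals['Mexico'])           # Mexico City

# Keys must be unique: if a key is repeated, the later value replaces the earlier one.
dup = {'Brazil': 1, 'Brazil': 2}
print(dup)                          # {'Brazil': 2}
```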

In [27]:
Americas_pop = {'United States': 301139947, 'Mexico': 108700891, 'Brazil': 190010647}
Americas_pop    # Top 3 populations of countries in North and South America

{'United States': 301139947, 'Mexico': 108700891, 'Brazil': 190010647}

# Subsetting by name

We can extract a value by using its key:

In [24]:
Americas_pop['Mexico']

108700891

To delete a *key:value* pair, use the `del` statement.

In [28]:
del Americas_pop['Brazil']
Americas_pop

{'United States': 301139947, 'Mexico': 108700891}

To add a new key to an existing dictionary, assign a value to it:  
`dict_name['key_name'] = value`

In [30]:
Americas_pop['Brazil'] = 190010647
Americas_pop

{'United States': 301139947, 'Mexico': 108700891, 'Brazil': 190010647}

Sometimes we want a boolean (`True` or `False`) rather than a value; booleans will be especially useful later, when we write loops and conditions. To check whether a single key is in the dictionary, use the `in` or `not in` operator. These operators check the keys, not the values.

In [32]:
'Brazil' in Americas_pop

True

In [34]:
'Mexico' not in Americas_pop

False

To filter a dictionary (keep only some of its *key:value* pairs), use a dict comprehension, the dictionary counterpart of the list comprehension shown earlier.

# Handling special values

The methods below belong to pandas objects such as `Series` and `DataFrame`; plain Python lists and dictionaries do not have them. pandas treats both `None` and NumPy's `np.nan` as missing values.

   * `isnull()`: Generate a boolean mask indicating missing values
   * `notnull()`: Opposite of `isnull()`
   * `dropna()`: Return a filtered version of the data without the missing values
   * `fillna()`: Return a copy of the data with missing values filled or imputed

In [36]:
Americas_pop['Canada'] = None    # None marks Canada's population as missing
Americas_pop

{'United States': 301139947,
 'Mexico': 108700891,
 'Brazil': 190010647,
 'Canada': None}

In [40]:
import pandas as pd
import numpy as np
Americas_pop.isnull()     # fails: a plain dict has no isnull() method

AttributeError: 'dict' object has no attribute 'isnull'
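A plain dictionary cannot do this, but, assuming pandas is installed, it can be wrapped in a `pd.Series`, whose index is built from the dictionary's keys. When the other values are numbers, pandas stores the `None` as `NaN`, and `isnull()` then flags it. A minimal sketch with a small illustrative dictionary (`pop` and `pop_series` are hypothetical names):

```python
import pandas as pd

pop = {'United States': 301139947, 'Canada': None}
pop_series = pd.Series(pop)        # keys become the index; None is stored as NaN
print(pop_series.isnull())         # Canada is flagged True, United States False
```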

In [41]:
data = pd.Series([1, np.nan, 'hello', None])
data.isnull()

0    False
1     True
2    False
3     True
dtype: bool
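The other three methods from the list above can be tried on the same `data` Series. A short sketch, assuming pandas and NumPy are installed; the fill value `0` is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 'hello', None])

print(data.notnull())        # the opposite mask: True, False, True, False
print(data.dropna())         # keeps only index 0 (1) and index 2 ('hello')
print(data.fillna(0))        # a copy with the missing entries replaced by 0
print(data.isnull().sum())   # data itself still has its 2 missing values
```

Both `dropna()` and `fillna()` return new objects, so the original `data` is unchanged unless you assign the result back.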