# More Python and Margins of Error

In [1]:
import pandas as pd
import numpy as np

## Python continued

### 1. Built-in Functions

Python's core language ships with a fairly small set of built-in functions, but several of them are useful every day. Let's go over some of the most valuable ones.

In [2]:
min(1, 3)

1

In [3]:
max(15, 25, 70)

70

You can also use these functions with strings. Can you guess what the output will be?

In [4]:
max("Hamilton", "Washington")

'Washington'

So does Python know that George Washington was older than Alexander Hamilton? Of course not. When comparing strings, Python goes character by character in alphabetical order. Since "W" comes after "H" in the alphabet, "Washington" is a value that is "bigger" than "Hamilton".
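Here is a minimal sketch of how that comparison behaves in practice. Strictly speaking, Python compares each character's Unicode code point, so every uppercase letter sorts before every lowercase one. The lowercase name below is an illustrative assumption, not part of the lab.

```python
# Strings are compared character by character, using each character's code point.
print(ord("H"), ord("W"))  # 72 87, so "H" < "W"

# Uppercase letters come before lowercase ones, which can be surprising.
case_sensitive = max("Washington", "hamilton")
print(case_sensitive)  # hamilton

# Passing a key function makes the comparison ignore case.
case_insensitive = max("Washington", "hamilton", key=str.lower)
print(case_insensitive)  # Washington
```

If your data mixes capitalization, the `key=str.lower` form is usually the safer choice.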

We have created two lists of integers below. Let's find the larger of the two lists' smallest values. In other words, let's find the `max` of the two `min`s. You can do this in three lines of code or in one.

In [6]:
dob = [1757, 1756, 1732, 1737, 1754]
dod = [1804, 1836, 1799, 1793, 1782] 

##fill in
min_dob = ...
min_dod = ...
max_of_mins = ...


# you can do it in one line

max_mins = max(min(), min())
max_mins

1782

Another valuable function you can use with numbers is `round`. With a single argument, it rounds a float to the nearest integer. Like so:

In [7]:
round(8.7)

9
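A few more cases may help, shown here as a small sketch with illustrative values that are not from the lab. `round` accepts an optional second argument for the number of decimal places, and values exactly halfway between two integers are rounded to the nearest even integer.

```python
# With one argument, round() returns an int.
up = round(8.7)
down = round(8.2)
print(up, down)  # 9 8

# A second argument sets how many decimal places to keep.
two_places = round(3.14159, 2)
print(two_places)  # 3.14

# Exact halves go to the nearest even integer.
half_down = round(2.5)
half_up = round(3.5)
print(half_down, half_up)  # 2 4
```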

Another useful function we can use with numbers is `abs`. It returns the absolute value of a number:

In [8]:
abs(-3.5)

3.5

### 2. For-loops

#### 2.1 Loops

For-loops allow some code to be executed repeatedly. For example, if you wish to print the numbers from 0 to 10, you can do that with a for-loop.

A for-loop iterates through a sequence of elements (list, string, array, etc.) and, on each pass, assigns the next element of the sequence to a loop variable. The variable can have any name; the most common are `x`, `i`, `elem`, or even `_`, and the name in itself doesn't matter.

In [9]:
for elem in ['a', 'b', 'c']:   
    print(elem)

elem

a
b
c


'c'

That is why, in the previous code cell, evaluating `elem` after the loop has finished gives "c": the loop variable keeps the last value it was assigned, which is the last element of the sequence.

In the cell below we will be using the built-in function called `range`. It enumerates the integers from 0 up to the provided value (exclusive of that last number).

**Note:** `range` starts counting from 0 and stops just before the value you provide, so `range(11)` yields the eleven integers 0 through 10.

In [10]:
for i in range(11):
    print(i)

0
1
2
3
4
5
6
7
8
9
10
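`range` also accepts a start value and a step. The sketch below uses illustrative values and wraps each `range` in `list` so the numbers are visible at once.

```python
# range(stop): from 0 up to, but not including, stop
zero_to_four = list(range(5))
print(zero_to_four)  # [0, 1, 2, 3, 4]

# range(start, stop): begin somewhere other than 0
one_to_five = list(range(1, 6))
print(one_to_five)  # [1, 2, 3, 4, 5]

# range(start, stop, step): skip ahead by step each time
by_fives = list(range(0, 11, 5))
print(by_fives)  # [0, 5, 10]

# range(11) contains 11 values: 0 through 10
count = len(range(11))
print(count)  # 11
```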


**Exercise**: Let's combine what you know so far about lists, conditionals, and for loops. Create a list with values 1, 2, 3, 4, 5. Iterate through the list with a for loop, and print only the values that are greater than 2. Otherwise, print the statement "This value is not greater than 2."

In [None]:
##fill this cell in



# 2. NumPy continued

### 2.1 Arrays

NumPy gives us a new data type: the array. Arrays are commonly used with DataFrames (basically, tables of values). Arrays are often better than lists when working with data, as arrays have a lot of useful methods and functions. The main difference between lists and arrays is that an array holds only one type of data (e.g. only numbers or only strings); if you mix types, NumPy converts every element to a single common type.

To create an array, wrap your input inside `np.array`. Passing a single number, as below, gives an array that holds just that one value; more often you will pass a list of values.

In [11]:
arr = np.array(10)
arr

array(10)
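To see the "one type of data" rule in action, here is a small sketch with made-up values. The `dtype` attribute reports the type shared by every element; `dtype.kind` is used below because the exact integer width can differ between platforms.

```python
import numpy as np

# All integers: the array keeps an integer type ('i' means signed integer).
ints = np.array([1, 2, 3])
print(ints.dtype.kind)  # i

# Integers mixed with a float: every value becomes a float.
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)  # float64

# Numbers mixed with a string: every value becomes a string ('U' means text).
mixed_text = np.array([1, "two", 3])
print(mixed_text.dtype.kind)  # U
print(mixed_text[0] == "1")   # True
```

Because of this conversion, it is worth checking `dtype` whenever arithmetic on an array behaves unexpectedly.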

A very useful function for creating an array of numbers is `np.arange`. It takes up to three positional arguments. With two arguments, the first is the starting point of your array and the second is the number your array goes up to (exclusive of that value, just like `range`). The optional third argument sets how far apart consecutive elements should be.

For example, if you need an array with numbers from 0 to 20, but you only want it to include every other number, you will need to add a third positional argument.

In [12]:
every_other = np.arange(0, 21, 2)
every_other

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

Floats can also be provided as arguments:

In [18]:
every_2point5 = np.arange(2.5, 21, 2.5)
every_2point5

array([ 2.5,  5. ,  7.5, 10. , 12.5, 15. , 17.5, 20. ])

Note that the default value of the third argument is equal to 1, so if you just want to count every integer, you don't need to provide it.

In [15]:
every_one = np.arange(0, 21)
every_one

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20])

You can also convert a list of numbers in Python into an array:

In [19]:
num_lst = [1, 2, 3, 4, 5]
num_arr = np.array(num_lst)
num_arr

array([1, 2, 3, 4, 5])

You can do some arithmetic with arrays. Guess what the output of this line of code will be before you run it:

In [20]:
num_arr*2

array([ 2,  4,  6,  8, 10])

As you can see, all the values in our array got multiplied by 2.

Try performing the same operation with the initial list of numbers instead. What do you think the output will be in the cell below?

In [21]:
num_lst*2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

As you can see, lists and arrays not only can be used differently, but they also give different outputs for the same operations.
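The same contrast shows up with `+`. The sketch below reuses `num_lst` and `num_arr` from the cells above; `doubled_lst` is a name introduced here for illustration.

```python
import numpy as np

num_lst = [1, 2, 3, 4, 5]
num_arr = np.array(num_lst)

# Adding two lists joins them end to end.
joined = num_lst + num_lst
print(joined)  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

# Adding two arrays adds them element by element.
summed = num_arr + num_arr
print(summed.tolist())  # [2, 4, 6, 8, 10]

# To double every value in a plain list, use a list comprehension.
doubled_lst = [x * 2 for x in num_lst]
print(doubled_lst)  # [2, 4, 6, 8, 10]
```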

# 3. Margins of Error (MoEs) in Python

It often makes sense to aggregate census categories. For example, in the Excel sheet “Ages” in the workbook “Lab4Data.xlsx” you will find the age information for Census Tract 4004. The data separates the population into 6 age categories. This may be more than we need for the purposes of our analysis, so we are going to consolidate the information into three categories: “Child”, “Working Age Adult”, and “Senior Adult”.

Here is the data we'll be loading in. Don't worry too much about this code—we are importing data from an Excel spreadsheet into a Pandas DataFrame, and then dropping some redundant columns.

In [27]:
data = pd.read_excel('Lab4Data.xlsx')
data = data.drop(columns = data.columns[[1, 3, 4]])
data

| | AGE | Estimate | Margin of Error |
|---|---|---|---|
| 0 | Under 5 years | 257 | +/-101 |
| 1 | 5 to 17 years | 421 | +/-125 |
| 2 | 18 to 34 years | 1241 | +/-204 |
| 3 | 35 to 64 years | 1569 | +/-140 |
| 4 | 65 to 74 years | 329 | +/-77 |
| 5 | 75 years and over | 169 | +/-59 |


### 3.1 Aggregating Age

Let’s build a new table with one row for each of our new categories, with a column for the aggregated estimates and, later, a column for the new categories’ MOEs. We aggregate by summing estimates across the original categories. “Child” will include “Under 5 years” and “5 to 17 years”. “Working Age Adult” will include “18 to 34 years” and “35 to 64 years”. “Senior Adult” will include “65 to 74 years” and “75 years and over”.

In [29]:
# To do this, let's make a new data frame:

aggregated = pd.DataFrame()
aggregated['Age Categories'] = ['Child', 'Working Age Adult', 'Senior Adult']
aggregated['Estimate'] = [sum(data.Estimate[0:2]), sum(data.Estimate[2:4]), sum(data.Estimate[4:])]
aggregated

| | Age Categories | Estimate |
|---|---|---|
| 0 | Child | 678 |
| 1 | Working Age Adult | 2810 |
| 2 | Senior Adult | 498 |
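Slicing by position works well for six rows. As an alternative sketch, not the lab's method, the same totals can be produced by labeling each row with its new category and grouping. The `age_groups` dictionary and the `Group` column are hypothetical names introduced for this example, and the estimates are copied from the table loaded earlier.

```python
import pandas as pd

# Estimates copied from the lab's age table.
data = pd.DataFrame({
    "AGE": ["Under 5 years", "5 to 17 years", "18 to 34 years",
            "35 to 64 years", "65 to 74 years", "75 years and over"],
    "Estimate": [257, 421, 1241, 1569, 329, 169],
})

# Hypothetical lookup from each original category to its new group.
age_groups = {
    "Under 5 years": "Child",
    "5 to 17 years": "Child",
    "18 to 34 years": "Working Age Adult",
    "35 to 64 years": "Working Age Adult",
    "65 to 74 years": "Senior Adult",
    "75 years and over": "Senior Adult",
}

# Label each row, then sum the estimates within each label.
# sort=False keeps the groups in order of first appearance.
grouped = (
    data.assign(Group=data["AGE"].map(age_groups))
        .groupby("Group", sort=False)["Estimate"]
        .sum()
)
print(grouped)
```

This approach does not depend on the row order of the spreadsheet, which may matter if the source table is ever rearranged.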


### 3.2 Aggregating Margins of Error

To calculate the MOE for aggregated count data:
1. Obtain the MOE of each individual estimate.
2. Square the MOE of each estimate.
3. Sum the squared MOEs.
4. Take the square root of the sum of the squared MOEs.
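
The four steps can be worked through by hand for the "Child" group, whose two rows have MOEs of 101 and 125 in the table above. This is a sketch using only the standard library; the variable names are illustrative.

```python
import math

# Step 1: the MOEs of the two rows that make up "Child".
child_moes = [101, 125]

# Step 2: square each MOE.
squared = [m ** 2 for m in child_moes]
print(squared)  # [10201, 15625]

# Step 3: sum the squares.
total = sum(squared)
print(total)  # 25826

# Step 4: take the square root of the sum.
child_moe = math.sqrt(total)
print(round(child_moe, 2))  # 160.7
```

Note that the aggregated MOE (about 160.7) is smaller than the simple sum of the two MOEs (226).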

We've written a function called `MOE` that does these steps for us. Make sure you understand how it works!

In [31]:
# this is a function to calculate MOE

def MOE(arr):
    sq = arr**2
    return np.sqrt(sum(sq))

We can apply the function to the rows corresponding to the "Child" age group, which are the first two rows of our `data`. The function needs the margins of error, not the estimates. The `Margin of Error` column appears to be stored as text such as "+/-101", so we first strip the "+/-" prefix and convert the values to integers. Remember that when slicing lists and arrays, the second term is exclusive.

In [33]:
# Assuming the 'Margin of Error' column holds text such as '+/-101' (as displayed above),
# strip the prefix and convert the values to integers
moes = data['Margin of Error'].str.replace('+/-', '', regex=False).astype(int)

# Apply the function above to the MOEs you want to aggregate
# For example: age group Child.

agg_moe_child = MOE(moes[0:2])
print(f"{agg_moe_child:.3f}")

160.705

Now, do the same for the "Working Age Adult" and "Senior Adult" categories!

In [36]:
agg_moe_adult = ... #FILL IN 
agg_moe_senior = ... #FILL IN 

The cell below puts our calculated `agg_moe_child`, `agg_moe_adult`, `agg_moe_senior` into an array, and then adds that array as a column to our `aggregated` DataFrame.

In [37]:
# Run this cell to produce and view your final aggregated table!
aggregated['MOE'] = np.array([agg_moe_child, agg_moe_adult, agg_moe_senior])
aggregated

| | Age Categories | Estimate | MOE |
|---|---|---|---|
| 0 | Child | 678 | 160.705 |
| 1 | Working Age Adult | 2810 | Ellipsis |
| 2 | Senior Adult | 498 | Ellipsis |


Finally, run this line to write the aggregated table to a .csv file, so you can use it in other programs!

In [38]:
aggregated.to_csv('Lab4Agg.csv')
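
By default, `to_csv` also writes the DataFrame's index (0, 1, 2) as an unlabeled first column, and reading such a file back with `pd.read_csv` produces a column named `Unnamed: 0`. The sketch below rebuilds a small copy of the table with the estimates computed earlier and compares the header line with and without the index. Calling `to_csv` without a path returns the text instead of writing a file.

```python
import pandas as pd

# A small copy of the aggregated table, using the estimates computed earlier.
aggregated = pd.DataFrame({
    "Age Categories": ["Child", "Working Age Adult", "Senior Adult"],
    "Estimate": [678, 2810, 498],
})

# With no path, to_csv returns the CSV text instead of writing a file.
with_index = aggregated.to_csv()
without_index = aggregated.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Age Categories,Estimate
print(without_index.splitlines()[0])  # Age Categories,Estimate
```

Passing `index=False` is optional; it is mainly useful when the file will be opened in a spreadsheet program where the extra column would be a distraction.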