In [1]:
from IPython.core.display import HTML

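def css_styling():
    # Read the stylesheet inside a context manager so the file is closed promptly
    with open("../Data/www/styles/custom.css", "r") as css_file:
        styles = css_file.read()
    return HTML(styles)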
css_styling()

# Synopsis

One of the strengths of Python is the sheer amount of code that has been included by default in its distribution. In this unit, we will learn:

1. What libraries are
2. How to handle importing a library
3. Usage of several important, basic Python libraries (`math`, `random`, `os`, `glob`, `time`, `datetime`, `copy`, `operator`, `collections`)

# The Python Standard Library: "Batteries Included"

Of all the reasons to use Python, number one (by a landslide) is that the community of Python programmers provides a ton of support. Most problems that you will encounter have already been solved by someone; it's just a matter of knowing how to find and interpret the solutions.

Prepare to become intimately familiar with [stackoverflow](https://stackoverflow.com), a message board where people post a dizzying array of problems and solutions that are ranked by the community in terms of their helpfulness. And if you're not already proficient at articulating the precise nature of your problem to Google, prepare to learn. Google can either be your [best friend](https://www.google.com/#q=python+divide+string+into+list+of+characters), or a [mortal foe](https://www.google.com/#q=how+do+i+get+numbers+and+letters+as+a+list+in+python).

Some problems, however, are so general that they just keep occurring again and again and again. The community of Python programmers has taken some of these problems, figured out fast and efficient solutions to them, and provided the code directly to you in the form of a "library". Not all libraries come with the "default" Python because there are just too many of them. You can and likely will write your own libraries for specific problems you encounter. But some libraries are _really_ useful. The reason you all downloaded the punnily named "Anaconda" is that it's Python, only _bigger_. Instead of just giving you Python, Anaconda has collected a range of other common "libraries" that nearly every Python programmer uses and has installed these for you.

"Default" Python, however still has a lot of useful libraries known as the "Standard Library". A key principle here is _trust_, the things you find in the Python Standard Library will work. And they will work well. __When you find an answer to a question on stackoverflow, your confidence in that answer should be tempered and the results tested.__ If it's in the Python Standard Library, thousands of people have already tested it and it works. The same goes for some other very common packages, such as those included in Anaconda.

Before we move on, we should note that this notebook has drawn heavily from the following references which are each great in their own right at helping to explain details of the Standard Library and Python programming in general. If you're looking for more after this lecture or after this course this is a good starting point: 

* [Brief Tour of the Standard Library](https://docs.python.org/3/tutorial/stdlib.html)
* [The Python Standard Library - Index](https://docs.python.org/3/library/index.html)
* [Think Python](http://www.greenteapress.com/thinkpython/)

# Libraries 
### (...or how I learned to stop writing so much code and use the Python Standard Library)

I've said the words "library" and "libraries" quite a few times now, but what are they? Think of a library as a collection of useful functions all relating to a generally similar topic. You might have seen code like this:

```
import math
from math import log
import math as awesome_mathematical_functions
from math import *
```
We'll look at each of these in turn to see what is going on here. 

Suppose that I was interested in knowing the logarithm of the number 348. I might type:

In [2]:
log(348)

NameError: name 'log' is not defined

And you should see a `NameError`, because 'normal' python doesn't know what 'log' means. We never defined it or told it that when I type `log(number)` what I really mean is that I want the exponent to which another fixed value, the base, must be raised to produce that number (phew). We could labor and think about _how_ to write that code but logarithms are pretty common right? Surely someone else has figured this out already. Enter `math`:

In [3]:
import math

In [4]:
log(348)

NameError: name 'log' is not defined

But still we have a `NameError`. What's going on here? Well, to access all the cool functions in `math` we first need to tell Python that when we say `log` we mean the specific function written in the `math` library:

In [5]:
math.log(348)

5.8522024797744745

Voila! 5.8522024... you get the point. Suppose we thought that was a little tedious to type over and over again. And all we really need the `math` library for is `log`, we don't care about all the other cool stuff that it has. Well, we could just type:

In [6]:
from math import log

And now `log()` should work out just fine:

In [7]:
log(348)

5.8522024797744745

To assure ourselves that these two things are entirely equivalent:

In [8]:
math.log(348) == log(348)

True

How you choose to import functions from a library might depend a lot on your project. You'll also frequently see something like this:

In [9]:
import math as awesome_mathematical_functions

All we did was kind of rename `math`. This is silly, because we're actually typing more in this example than just `math`. But as you'll see later, there are some packages that are so commonly used that even though their name is only 5 letters long people _always_ import them under a two-letter alias. Our function should work exactly the same as before though:

In [10]:
awesome_mathematical_functions.log(348)

5.8522024797744745

In [11]:
awesome_mathematical_functions.log(348) == math.log(348)

True

There is a final way of importing libraries that you might see, but we're not going to actually run the code because it's the worst.
```
from math import *
```
You might be able to guess what this is doing, and some of you might see why it's a terrible idea. Instead of having to type `math.log()`, importing in this manner lets us access every function in `math` directly by name: `log()`, `exp()`, etc. This might seem nice and easy, but do you know everything that is in the `math` library? It might be huge. And what if my code is analyzing the revenues of a timber company and I happen to have a variable called `log` that refers to the price of a fallen tree? Depending on the order in which I run my code and my imports, `log` might refer either to a function or to my variable. If I always use
```
from math import log
```
I have the same problem in that I've defined `log`, but I'm explicitly reminded of the name of the function that I'm importing. And if I were a timber company I might see the error of my ways. But by using the wildcard syntax:
```
from math import *
```
I'm importing every public name in the library, perhaps dozens or (for larger libraries) hundreds of functions whose names I don't even know.
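To make the timber-company problem concrete, here is a minimal sketch of the name clash. The price `125.0` is a made-up value used only for illustration; the point is that whichever statement runs last decides what `log` means, and Python gives no warning either way.

```python
import math

log = 125.0                  # hypothetical price of one fallen tree
from math import log         # rebinds the name `log` to the function, silently
name_is_function = callable(log)

log = 125.0                  # assigning again hides the function just as silently
name_is_still_function = callable(log)

print(name_is_function, name_is_still_function)  # True False
print(math.log(348))         # the qualified name math.log is unaffected
```

Using the qualified `math.log` keeps the variable and the function in separate namespaces, which is why it is the safer default.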

> you'll see `from math import *` in your googling. Don't do it. 

### Documentation
So we found the logarithm of 348 a number of ways. But the astute among you may ask: logarithm to what base? Well, Jupyter (IPython Notebook) can be really helpful here. Try typing:

In [12]:
math.log?

A helpful little box should have popped up explaining a bit about math.log (which you can close by clicking the x in the upper right corner).
We could also type `math.log` then Shift+Tab. Try that below:

In [13]:
math.log

<function math.log>

or

In [14]:
help(math.log)

Help on built-in function log in module math:

log(...)
    log(x[, base])
    
    Return the logarithm of x to the given base.
    If the base not specified, returns the natural logarithm (base e) of x.



From either of these options we learned that `math.log` needs a number `x`. We could also give it a second number separated by a comma to specify the base. If we don't give that second argument, then it will default to `e`, the natural logarithm.

In [15]:
print(math.log(348))
print(math.log(348, 2))
print(math.log(348, 10))

5.8522024797744745
8.442943495848729
2.5415792439465807


But is it really the natural logarithm? Let's double check:

In [16]:
print(math.log(348, e))

NameError: name 'e' is not defined

Ack! Python doesn't know what e is! How do I know whether `math.log` is really using the base `e`?

Well `e` is pretty mathy, maybe the math library can help us but how do I know?

In [17]:
help(math)

Help on module math:

NAME
    math

MODULE REFERENCE
    https://docs.python.org/3.6/library/math
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module is always available.  It provides access to the
    mathematical functions defined by the C standard.

FUNCTIONS
    acos(...)
        acos(x)
        
        Return the arc cosine (measured in radians) of x.
    
    acosh(...)
        acosh(x)
        
        Return the inverse hyperbolic cosine of x.
    
    asin(...)
        asin(x)
        
        Return the arc sine (measured in radians) of x.
    
    asinh(...)
        asinh(x)
        
        Return the inverse hyperbolic sine of x.
    
    atan(...)
        atan(x)
        
 

Remember we could also just have typed `math` and then held down Shift+Tab to get a drop down of some available options. But it looks like `exp` is in there somewhere which is just what we want.

In [18]:
help(math.exp)

Help on built-in function exp in module math:

exp(...)
    exp(x)
    
    Return e raised to the power of x.



So e raised to the first power should just be e! Right?

In [19]:
math.exp(1)

2.718281828459045

Success! Now is math.log really defaulting to base e?

In [20]:
math.log(348) == math.log(348, math.exp(1))

True

You win `math` library, you win. 
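As a footnote to the check above: the `math` module also exposes the constant directly as `math.e`, so there is no need to go through `math.exp(1)`. The sketch below repeats the comparison with `math.e`, and uses `math.isclose` (also part of `math`) instead of `==`, which is generally the safer habit when comparing floating-point results.

```python
import math

print(math.e)  # 2.718281828459045

natural = math.log(348)                 # base defaults to e
explicit_base = math.log(348, math.e)   # base given explicitly

# Compare floats with a tolerance rather than exact equality
print(math.isclose(natural, explicit_base))  # True
```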

This is a fun little exercise. But really, I trusted that `math` library all along because I know it's part of the Python Standard Library. Trust is key when using built-in libraries and functions otherwise you might never get anything done. Just don't spread that trust too broadly. You really _really_ need to get in the habit of looking at documentation when you use a library or a function that you've never used before. Thankfully Jupyter gives you a lot of options on how to do so, so you have no excuse. 

# Time for the tour:

# `math` - Mathematical functions
[Package documentation](https://docs.python.org/3/library/math.html)

Our old friend `math` is as good of a starting point as any. We'll learn a lot more about complex mathematics and statistics libraries later. But for now, you'll need some basic math aside from +-*/ (which should all work as expected!).

But to see why `math` is so great let's take a break and try a little exercise:

**Exercise:** what is the value of 21! (21 factorial)?

In [21]:
def calculate_factorial(number_of_interest):
    #Place your code here
    return factorial_of_number

In [22]:
number_of_interest = 21
calculate_factorial(number_of_interest)

NameError: name 'factorial_of_number' is not defined

When you're learning how to code these exercises are great practice. Once you're comfortable you'll know that it's far easier to say:

In [23]:
math.factorial(21)

51090942171709440000

Did your answers match up? They better!

In [24]:
math.factorial(number_of_interest) == calculate_factorial(number_of_interest)

NameError: name 'factorial_of_number' is not defined
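The `NameError` above just means the exercise stub was never filled in. If you get stuck, here is one possible solution sketch (a simple loop; there are many other valid ways to write it), followed by the comparison against `math.factorial`.

```python
import math

def calculate_factorial(number_of_interest):
    # Multiply together every integer from 2 up to the number itself
    factorial_of_number = 1
    for factor in range(2, number_of_interest + 1):
        factorial_of_number *= factor
    return factorial_of_number

number_of_interest = 21
print(calculate_factorial(number_of_interest))
print(math.factorial(number_of_interest) == calculate_factorial(number_of_interest))  # True
```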

# `random` - Generate pseudo-random numbers

[Package documentation](https://docs.python.org/3/library/random.html)

Greatest Hits:
* `random.random()`: returns a number in the range [0.0, 1.0)
* `random.randint(a, b)`: returns an integer in the range [a, b]
* `random.choice(x)`: randomly returns a value from the sequence x
* `random.sample(x, y)`: randomly returns a sample of length y from the sequence x without replacement

In [25]:
import random

What if I just want a random number between 0 and 1, because who knows, it might be useful (it will be at some point):

In [26]:
random.random()

0.19591372200460733

Make sure that you run that cell a few times; you should get a different answer every time.

Sometimes integers are just easier to deal with. 

In [27]:
random.randint(7, 261)

83

__Exercise:__ Are the numbers 7 and 261 included or excluded from this random number generator:

In [28]:
#Place your code here
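One way to approach this exercise empirically (a sketch, not the only approach) is to draw a large number of values and look at the smallest and largest ones that show up. The sample size of 100000 is an arbitrary choice, large enough that both endpoints are overwhelmingly likely to appear if they are allowed at all. Reading `help(random.randint)` is the other way to settle it.

```python
import random

# Draw many integers and inspect the extremes that actually occur
draws = [random.randint(7, 261) for _ in range(100000)]
print(min(draws), max(draws))
```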



For a lot of statistical tests you'll want to be able to randomly select items from a list so here are two easy ways to do it. As always, if this is random it better give you different results when you run it multiple times!

In [29]:
dwarfs = ['Doc', 'Grumpy', 'Happy', 'Sleepy', 'Bashful', 'Sneezy', 'Dopey']

In [30]:
random.sample(dwarfs, 3)

['Doc', 'Bashful', 'Sleepy']

In [31]:
random.choice(dwarfs)

'Bashful'

# `os` - Miscellaneous operating system interfaces

[Package documentation](https://docs.python.org/3/library/os.html)

These should be pretty self explanatory. But when you're navigating through file systems to read and write files you'll quickly learn how important they are.

In [32]:
import os

In [33]:
current_directory = os.getcwd()
print(current_directory)

/Users/lgaalves/Documents/presentations/school_of_applied_math/Lessons


In [34]:
contents = os.listdir(current_directory)
print(contents)

['Day4_pm1_Mini-Project.ipynb', 'Day6_am1_Using_APIs_1.ipynb', 'Day3_pm1_Functions.ipynb', '.DS_Store', 'Day7_pm2_Structured-Data-Analysis-Pt2.ipynb', 'Day3_am2_Data-Visualization.ipynb', 'Day4_am1_Dictionaries.ipynb', 'Day8_am1_Image-Manipulation.ipynb', 'Day5_pm2_Sentiment-Analysis.ipynb', 'Day7_am1_Statistical_analysis_w_Python.ipynb', 'Day6_am2_Using_APIs_2.ipynb', 'Day5_am1_Text-analysis.ipynb', 'Day8_pm1_Cell_detection_project.ipynb', 'Day6_pm1_Web_scraping.ipynb', 'Day8_am2_Image-Analysis.ipynb', 'introduction-to-python', 'Day7_am2_The-Bootstrap.ipynb', 'Day3_pm2_Review.ipynb', 'Day7_pm1_Structured-Data-Analysis-Pt1.ipynb', 'Day5_pm1_Regular-expressions.ipynb', 'Day3_am1_Standard-Library.ipynb', '.ipynb_checkpoints', 'Day7_am2_More_stats_w_Python.ipynb', 'Day4_am2_Review.ipynb', 'Day7_am2_Bootstrapping_MC_chains.ipynb', 'web-scrapping']
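The listing above mixes files and folders together. As a small sketch of how these pieces combine, `os.path.join` builds a full path from the directory and a name, and `os.path.isdir` reports whether that path is a folder. Both functions are part of `os.path`; the output depends entirely on your own directory.

```python
import os

current_directory = os.getcwd()

for name in sorted(os.listdir(current_directory)):
    # Join with the right separator for this operating system
    full_path = os.path.join(current_directory, name)
    kind = "dir " if os.path.isdir(full_path) else "file"
    print(kind, name)
```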


# `glob` - Unix-style pathname pattern expansion

[Package documentation](https://docs.python.org/3/library/glob.html)

`glob` doesn't have a lot. In fact, it's essentially two functions, `glob` and `iglob`, which are quite similar but powerful nevertheless (plus a small `escape` helper).

In [35]:
import glob

In [36]:
help(glob)

Help on module glob:

NAME
    glob - Filename globbing utility.

MODULE REFERENCE
    https://docs.python.org/3.6/library/glob
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

FUNCTIONS
    escape(pathname)
        Escape all special characters.
    
    glob(pathname, *, recursive=False)
        Return a list of paths matching a pathname pattern.
        
        The pattern may contain simple shell-style wildcards a la
        fnmatch. However, unlike fnmatch, filenames starting with a
        dot are special cases that are not matched by '*' and '?'
        patterns.
        
        If recursive is true, the pattern '**' will match any files and
        zero or more directories and subdirectories.
    
    ig

In [37]:
for infile in glob.glob(current_directory + '/*'):
    print(infile)

/Users/lgaalves/Documents/presentations/school_of_applied_math/Lessons/Day4_pm1_Mini-Project.ipynb
/Users/lgaalves/Documents/presentations/school_of_applied_math/Lessons/Day6_am1_Using_APIs_1.ipynb
/Users/lgaalves/Documents/presentations/school_of_applied_math/Lessons/Day3_pm1_Functions.ipynb
/Users/lgaalves/Documents/presentations/school_of_applied_math/Lessons/Day7_pm2_Structured-Data-Analysis-Pt2.ipynb
/Users/lgaalves/Documents/presentations/school_of_applied_math/Lessons/Day3_am2_Data-Visualization.ipynb
/Users/lgaalves/Documents/presentations/school_of_applied_math/Lessons/Day4_am1_Dictionaries.ipynb
/Users/lgaalves/Documents/presentations/school_of_applied_math/Lessons/Day8_am1_Image-Manipulation.ipynb
/Users/lgaalves/Documents/presentations/school_of_applied_math/Lessons/Day5_pm2_Sentiment-Analysis.ipynb
/Users/lgaalves/Documents/presentations/school_of_applied_math/Lessons/Day7_am1_Statistical_analysis_w_Python.ipynb
/Users/lgaalves/Documents/presentations/school_of_applied_mat

**Exercise:** how many files are in your current directory? How many of those are '.ipynb' files?

In [38]:
###Place code here
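If you want to check your answer, here is one possible sketch. It assumes "files" means regular files (folders are filtered out with `os.path.isfile`); note that `'*'` does not match hidden names starting with a dot, so the count can differ from what `os.listdir` shows.

```python
import glob
import os

current_directory = os.getcwd()

# Everything matched by '*', then only the regular files among those matches
all_entries = glob.glob(os.path.join(current_directory, '*'))
files_only = [path for path in all_entries if os.path.isfile(path)]

# The wildcard can also be restricted to a file extension
notebooks = glob.glob(os.path.join(current_directory, '*.ipynb'))

print(len(files_only), len(notebooks))
```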


# `time` - Time access and conversions

[Package documentation](https://docs.python.org/3/library/time.html)

Greatest Hits:
* `time.sleep(x)`: pauses for x seconds
* `time.time()`: gets the current time in seconds since the epoch (January 1, 1970 on most systems)

In [39]:
import time

In [40]:
time.time()

1561324965.7057338

In [41]:
time.time()

1561324966.05299

This can be a useful if somewhat tedious way to see how long your code takes to run!

In [42]:
start_time = time.time()
for i in range(10000):
    trash = i**2
end_time = time.time()
print(end_time - start_time)

0.0035271644592285156


(There is another library called `timeit` that provides functions to help you do this as well)

In [43]:
###Place your code here
import timeit
timeit.timeit('for i in range(10000): trash = i**2', number=10)

0.03151125001022592

__Exercise__: Remember when we made a function to rival `math.factorial`? Which one runs faster?
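One way to run this comparison is to hand `timeit.timeit` a callable instead of a string. The sketch below assumes a simple loop-based `calculate_factorial` (your own version from the earlier exercise may differ), and the repetition count of 10000 is arbitrary. Timings vary from machine to machine, so no expected numbers are shown.

```python
import math
import timeit

def calculate_factorial(number_of_interest):
    # Hypothetical hand-written version used only for the timing comparison
    factorial_of_number = 1
    for factor in range(2, number_of_interest + 1):
        factorial_of_number *= factor
    return factorial_of_number

repetitions = 10000
loop_seconds = timeit.timeit(lambda: calculate_factorial(21), number=repetitions)
library_seconds = timeit.timeit(lambda: math.factorial(21), number=repetitions)

print("hand-written:", loop_seconds)
print("math.factorial:", library_seconds)
```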

On rare occasions, you might actually want your code to run _slower_ (perhaps when scraping a website). You might want to take a little break between each time you run a line of code:

In [44]:
for i in range(10):
    print(i)
    time.sleep(2)

0
1
2
3
4
5
6
7
8
9


Compare that to:

In [45]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


# `datetime` - Basic date and time types

[Package documentation](https://docs.python.org/3/library/datetime.html)

We'll talk about this package a bit more later, but for now let's just give you some basics:

In [46]:
import datetime

In [47]:
today = datetime.date.today()
print(today)
print(today.day)
print(today.year)


2019-06-23
23
2019


In [48]:
birthday = datetime.date(1984, 2, 25)
print(birthday.day)
print(birthday.month)
print(birthday.year)

25
2
1984


In _one_ variable called `birthday` we now have lots of information. This is much easier to work with than having separate variables for each of these:
```
birth_day = 25
birth_month = 2
birth_year = 1984
```
or one variable that we have to split apart every time we only care about a particular piece of it:

```
birthday = '02-25-1984'
```

# `copy` - Shallow and deep copy operations

[Package documentation](https://docs.python.org/3/library/copy.html)

Greatest Hits:
* `copy.copy(x)`
* `copy.deepcopy(x)`

This is a really subtle but important point that you need to be aware of.

In [49]:
import copy

Suppose I defined variable `x` and for the time being I want to have `y` equal the same thing:

In [50]:
x = [5, 6]
y = x

But now something came up, I need to change `y`:

In [51]:
y[0] = 2

So now what are the values of x and y?

In [52]:
print(x)
print(y)

[2, 6]
[2, 6]


Ack! That's not what we wanted at all. This is an important point in Python: assignment never copies an object. Writing `y = x` just makes `y` a second name (a _reference_) for the same list, so a change made through `y` shows up in `x` too. This matters whenever the variable is a mutable collection such as a list, set, or dictionary.
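You can confirm that both names point at the same object with the `is` operator, which tests identity rather than equality. A minimal sketch:

```python
x = [5, 6]
y = x

print(y is x)   # True: two names, one list

y[0] = 2
print(x)        # [2, 6]: the change is visible through both names
```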

So how would we get what we wanted? Enter `copy`

In [53]:
x = [5, 6]
y = copy.copy(x)
y[0] = 2
print(x)
print(y)

[5, 6]
[2, 6]


Much better. However, `copy.copy()` is _shallow_. If I have nested lists, for instance, it only copies the top-level list; the inner lists are still shared with the original. It's a subtle point, but for almost all applications what you really want is `copy.deepcopy()`, which creates an independent replica all the way down through whatever you give it. It is called in exactly the same way as `copy.copy()`:

In [54]:
# Standard copy.copy doesn't work

x = ["a", [5, 6]]
y = copy.copy(x)
y[1][0] = 2
print(x)
print(y)

['a', [2, 6]]
['a', [2, 6]]


In [55]:
# But copy.deepcopy works!

x = ["a", [5, 6]]
y = copy.deepcopy(x)
y[1][0] = 2
print(x)
print(y)

['a', [5, 6]]
['a', [2, 6]]
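The difference between the two results above comes down to which objects are shared. A short sketch using the `is` operator makes it visible: a shallow copy is a new outer list that still holds the very same inner list, while a deep copy gets its own inner list as well.

```python
import copy

x = ["a", [5, 6]]
shallow = copy.copy(x)
deep = copy.deepcopy(x)

print(shallow is x)         # False: the outer list is new
print(shallow[1] is x[1])   # True: the inner list is shared
print(deep[1] is x[1])      # False: the inner list was copied too
```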


# `operator` - Standard operators as functions

[Package documentation](https://docs.python.org/3/library/operator.html)

This will be easiest to describe with an example:

In [56]:
import operator

In [57]:
x = [[5,4,3], [2, 4, 5], [9,2,1]]
x.sort()
print(x)

[[2, 4, 5], [5, 4, 3], [9, 2, 1]]


What actually happened here? I sorted a list of lists based on the first value of each inner list. But suppose I wanted to sort based on the second? Or the last?

In [61]:
x.sort(key=operator.itemgetter(3))
print(x)

IndexError: list index out of range


Whoops! Remember, the lists inside only have three elements in them. And since we start indexing at 0, I just told it to sort based on a non-existent entry.

In [62]:
x.sort(key=operator.itemgetter(2))
print(x)

[[9, 2, 1], [5, 4, 3], [2, 4, 5]]


Much better :)
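To round this out, here is a sketch of sorting by the second element and by the last element. The `rows` data is made up for illustration (chosen so the two orderings differ), and `sorted` is used instead of `.sort()` so the original list is left untouched. A negative index such as `-1` picks the last element regardless of the inner list's length.

```python
import operator

# Hypothetical data, chosen so each sort key gives a different order
rows = [[5, 1, 3], [2, 4, 5], [9, 2, 1]]

by_second = sorted(rows, key=operator.itemgetter(1))
by_last = sorted(rows, key=operator.itemgetter(-1))

print(by_second)  # [[5, 1, 3], [9, 2, 1], [2, 4, 5]]
print(by_last)    # [[9, 2, 1], [5, 1, 3], [2, 4, 5]]

# Asking for an index that does not exist raises an IndexError
try:
    sorted(rows, key=operator.itemgetter(3))
except IndexError as error:
    print("IndexError:", error)  # IndexError: list index out of range
```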

# `collections` - Container datatypes

[Package documentation](https://docs.python.org/3/library/collections.html)

Greatest Hits:
* `collections.Counter`: counts repeated instances from an iterable

A pretty common problem that you may encounter is: given a list, how many times does each unique element appear inside of that list? 

An easy way to solve this is with `Counter`, but first try writing the counting code yourself:

In [63]:
#Write code to count the occurrences of all the names in the list
dwarfs = ['Doc', 'Grumpy', 'Happy', 'Sleepy', 'Bashful', 'Sneezy', 'Doc', 'Dopey']
###Place your code here



print(dwarfs_count)

NameError: name 'dwarfs_count' is not defined
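The `NameError` above is just the unfinished exercise. One possible hand-written solution (a sketch; yours may look different) keeps a dictionary and bumps the count each time a name is seen:

```python
dwarfs = ['Doc', 'Grumpy', 'Happy', 'Sleepy', 'Bashful', 'Sneezy', 'Doc', 'Dopey']

dwarfs_count = {}
for name in dwarfs:
    # .get returns 0 the first time a name is seen
    dwarfs_count[name] = dwarfs_count.get(name, 0) + 1

print(dwarfs_count)
# {'Doc': 2, 'Grumpy': 1, 'Happy': 1, 'Sleepy': 1, 'Bashful': 1, 'Sneezy': 1, 'Dopey': 1}
```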

While that's a good exercise, it's obviously a little bit tedious. Since I mentioned that this is a common problem, as you might expect, the work has already been done for you. 

In [64]:
import collections
dwarfs = ['Doc', 'Grumpy', 'Happy', 'Sleepy', 'Bashful', 'Sneezy', 'Doc', 'Dopey']
dwarfs_count = dict(collections.Counter(dwarfs))
print(dwarfs_count)

{'Doc': 2, 'Grumpy': 1, 'Happy': 1, 'Sleepy': 1, 'Bashful': 1, 'Sneezy': 1, 'Dopey': 1}
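Converting to a plain `dict` is optional. A `Counter` already behaves like a dictionary, and keeping it as a `Counter` gives you a couple of conveniences, sketched below: missing keys count as zero, and `most_common` returns the top entries. The name `'Snow White'` is just an example of a key that is not in the list.

```python
import collections

dwarfs = ['Doc', 'Grumpy', 'Happy', 'Sleepy', 'Bashful', 'Sneezy', 'Doc', 'Dopey']
counts = collections.Counter(dwarfs)

print(counts['Doc'])           # 2
print(counts['Snow White'])    # 0: missing keys count as zero instead of raising KeyError
print(counts.most_common(1))   # [('Doc', 2)]
```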


# Bonus libraries (because they're awesome):

# `numpy` - Numerical Python
[Package documentation](http://docs.scipy.org/doc/numpy/)


# `scipy` - Scientific Python
[Package documentation](http://docs.scipy.org/doc/scipy/reference/)

Strictly speaking, neither of these packages is in the Python Standard Library. But they're included by default in 'Anaconda' because they're extremely useful for any scientist. Both of these packages have a ton of functionality in them and you're sure to see more of them throughout the remainder of this bootcamp. But for now here are a few basics just to get you started and to show you how easy statistics is in Python.


*data from: [statcrunch](https://www.statcrunch.com/5.0/shareddata.php?keywords=HEIGHT)


In [66]:
ny_sky_scrapers = [541.3, 417, 415.1, 381, 365.8, 318.9, 318.8, 306.4, 297.7, 290.2]
chicago_sky_scrapers = [442.1, 423.2, 346.3, 343.7, 306.9, 303.3, 292.9, 265, 261.9, 261.8]

Okay we have some data, the common question is... are they significantly different? Some common summary statistics that we might want are the mean and the standard deviations. Let's go ahead and get those easily using numpy:

In [67]:
import numpy as np
print(np.mean(ny_sky_scrapers), np.std(ny_sky_scrapers))
print(np.mean(chicago_sky_scrapers), np.std(chicago_sky_scrapers))

365.21999999999997 73.7358772918584
324.71 61.28861966140207
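One detail worth checking in the documentation: by default `np.std` computes the _population_ standard deviation (it divides by n, i.e. `ddof=0`). As a cross-check that stays inside the Standard Library, the sketch below uses the `statistics` module, where `pstdev` is the population version and `stdev` is the sample version (divides by n - 1). The `pstdev` values should agree with the `np.std` output above up to floating-point rounding.

```python
import statistics

ny_sky_scrapers = [541.3, 417, 415.1, 381, 365.8, 318.9, 318.8, 306.4, 297.7, 290.2]
chicago_sky_scrapers = [442.1, 423.2, 346.3, 343.7, 306.9, 303.3, 292.9, 265, 261.9, 261.8]

for heights in (ny_sky_scrapers, chicago_sky_scrapers):
    print(statistics.mean(heights),
          statistics.pstdev(heights),   # population std, comparable to np.std's default
          statistics.stdev(heights))    # sample std, slightly larger
```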


Looks like buildings in NYC are taller! But is the difference significant? For that, we can turn to scipy...

In [68]:
from scipy import stats
print(stats.ttest_ind(ny_sky_scrapers, chicago_sky_scrapers))

Ttest_indResult(statistic=1.2675012140787592, pvalue=0.2211287940263085)


Well, we have some numbers! But what are they? Don't forget to always check the documentation!

In [69]:
help(stats.ttest_ind)

Help on function ttest_ind in module scipy.stats.stats:

ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate')
    Calculate the T-test for the means of *two independent* samples of scores.
    
    This is a two-sided test for the null hypothesis that 2 independent samples
    have identical average (expected) values. This test assumes that the
    populations have identical variances by default.
    
    Parameters
    ----------
    a, b : array_like
        The arrays must have the same shape, except in the dimension
        corresponding to `axis` (the first, by default).
    axis : int or None, optional
        Axis along which to compute test. If None, compute over the whole
        arrays, `a`, and `b`.
    equal_var : bool, optional
        If True (default), perform a standard independent 2 sample test
        that assumes equal population variances [1]_.
        If False, perform Welch's t-test, which does not assume equal
        population variance [2]_.
    
   

So it returns two things, the t-statistic and the p-value. Since our p-value was 0.22 we wouldn't conventionally say that there is a significant difference. 

Let's not go too far into the statistical weeds here... but these two lists probably aren't normally distributed (a central assumption of a t-test!). So we should really perform a non-parametric test that doesn't assume normality:

In [70]:
print(stats.ranksums(ny_sky_scrapers, chicago_sky_scrapers))

RanksumsResult(statistic=1.209486313629527, pvalue=0.22647606604348625)


Same result, take that New York City! The point being, a lotttt of common statistical tests are already implemented in scipy so rather than re-invent the wheel, search for your favorite, understand its assumptions and limitations, and implement it in a line for a quick result. 

Another really common problem you'll inevitably have is whether two things are correlated or not. Standard linear regression is super nice and easy using `scipy`. I wonder if buildings are getting taller over time or not?

In [71]:
heights = [541.3, 442.1, 423.2, 417, 415.1, 381, 365.8, 346.3, 343.7, 318.9, 318.8,\
          306.9, 306.4, 303.3, 297.7, 292.9, 290.2, 265, 261.9, 261.8] 
dates = [2014, 1974, 2009, 1972, 1973, 1931, 2009, 1973, 1969, 1930, 2007,\
        1989, 2014, 1990, 2014, 1990, 1932, 1989, 1976, 2009]

Here are two functions for calculating correlations. The first is a standard linear regression and the second is the Spearman rank correlation, a non-parametric measure that doesn't assume a linear relationship between the variables (only a monotonic one).

In [72]:
print(stats.linregress(dates, heights))
print(stats.spearmanr(dates, heights))

LinregressResult(slope=0.16407069333851218, intercept=19.58000097106259, rvalue=0.06225466775144008, pvalue=0.7942952339246117, stderr=0.6199827708596287)
SpearmanrResult(correlation=-0.10645618397234095, pvalue=0.6550882607496282)


You'll have to look in the documentation to see what all of those numbers are!

But since I already know, the correlation doesn't appear to be significant. Of course, we're not saying buildings *aren't* getting taller with time. What we're saying is that based off of the 10 tallest buildings in Chicago and the 10 tallest buildings in NYC, there is no evidence that buildings are getting taller with time. If we really wanted to test this hypothesis, our limited/truncated little dataset probably wouldn't suffice. The most important part of data analysis is good data, and we kind of violated that rule here, but it provides a nice introduction to some basic statistical tests at least!

# Wrap-up
That's all for the Python Standard Library, but you haven't seen the last of any of these packages. I highly recommend reading through some of the links I provided at the top of this notebook. But the best way to learn about packages also happens to be the best way to learn programming. Practice. Practice. Practice. Just know that while you're practicing, if you encounter a problem that seems like it might already have been done before, turn to google. Trust the standard library. Trust other packages slightly less so. And trust stackoverflow the least. Test functions that you use, make sure they give you results that you expect, and delight in how much time you'll have saved. 