# A Beginner's Guide to Programming in Python

Welcome to Python, the second language of QBIO490! This document will take you through the basics of Python before we jump into our analyses. Let's get started!

## Jupyter Notebook

This is Jupyter Notebook! It is a notebook-style integrated development environment (IDE) for Python. There are text (Markdown) blocks (like this)

In [None]:
# And there are code blocks (like this)
print('This is a code block!')

Run code blocks by pressing the 'run this cell' arrow button to the right of the code block's execution counter (the 'In [ ]' at the top left of each cell). Every time you run a cell, the notebook's execution counter increases by 1, and the new number is shown next to that cell. This lets you easily see the order in which cells were executed. <br />    
You can also run the currently selected cell by pressing the run button at the top or by pressing Shift and Enter.<br/>
<br/>
You can change the type of block in the top menu by switching between <u>Markdown</u> and <u>Code</u> <br/>
<br/>

### This is currently a markdown cell. Change it to code and run the code
print("I won't run unless I'm in a code block")

You can create a new code block, or <u>cell</u>, underneath the currently selected one with the plus icon at the top. You can move cells up and down with the arrows next to run. Delete a cell by selecting the entire cell and then pressing X.

## Python

Python has a lot of similarities to R with regard to syntax and execution, but some differences as well. Let's explore the basic syntax of Python.

## Setting up your working directory

Just like in R, if you want to use relative file paths, you need to know where you are in terms of your directory. Run the following code to set your working directory to the analysis_data folder.

In [1]:
import os

print("Current working directory: {0}".format(os.getcwd()))

os.chdir('PATH/TO/analysis_data')

print("New working directory: {0}".format(os.getcwd()))

Current working directory: C:\Users\wadeb\Downloads


FileNotFoundError: [WinError 3] The system cannot find the path specified: 'PATH/TO/analysis_data'

## (1) Indentation

In the other programming languages you've used, such as R, code blocks are defined with curly braces. Python instead uses indentation to mark where a code block begins and ends. You'll see this in the looping, control flow, and function parts of the guide. 
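
As a small preview, here is a minimal sketch of indentation in action (the variable names `temperature` and `status` are made up for illustration; `if` statements are covered properly in section 8). The indented lines belong to the `if` block, while the unindented line at the end runs either way.

```python
temperature = 30  # made-up value, chosen only for illustration
status = "not warm"

if temperature > 25:
    # both indented lines belong to the if block
    status = "warm"
    print("The condition was true.")

# back at the original indentation level: this runs no matter what
print("Status:", status)
```

Try changing `temperature` to 10 and notice that only the last line prints.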

## (2) Indexing

Python uses zero-based indexing, which means that the first element in a data structure has an index of 0, the second element has an index of 1, and so on. So to access the first thing in a list called fruits, you would do: `fruits[0]`.
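
Here is a short sketch of what that looks like in practice. The contents of `fruits` are assumed for illustration only (lists are introduced in section 7).

```python
fruits = ["apple", "banana", "cherry"]  # example list, made up for illustration

first = fruits[0]   # index 0 is the first element
second = fruits[1]  # index 1 is the second element
last = fruits[len(fruits) - 1]  # the last valid index is the length minus 1

print(first, second, last)
```

Because counting starts at 0, asking for `fruits[3]` on this three-element list would raise an `IndexError`.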

## (3) Variables

In Python, like in R, you don't declare a variable as a specific type; the type is determined by the value you assign. To assign to variables in Python, you use the equals sign.

In [None]:
my_int = 4
my_float = 4.3
my_bool = True
my_char = '4'  # Python has no separate character type; this is a one-character string
my_string = "hello"
# Notice that it doesn't matter whether you use single or double quotes for strings

### Accessing and Modifying Variables

There are two ways to modify variables. For example, to add 2 to some variable x, we can either do the traditional way: `x = x + 2`, or with a special operator `x += 2`. There are equivalent operators for subtraction (-=), multiplication (*=), division (/=), etc. <br/>

<br/>**Exercise 3.1**

Below, write the short version for the following variable assignments.

In [None]:
x = 4
y = 2

#1. y = y / x (example is filled in below)
y /= x

# 2. y = y * 3

# 3. x = x - y

print(x,y)

## (4) Printing

Printing is pretty straightforward in Python. To print, you use the print() function, where you put what you want to print in the parentheses, separated by commas. E.g. `print("This is the word:", word)`
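
The sketch below shows a few common ways to call `print()`. The variables `word` and `count` are hypothetical and exist only for this example; `sep` is an optional argument of the built-in `print()` function.

```python
word = "banana"  # hypothetical variables for illustration
count = 3

# print() accepts several arguments and separates them with a space by default
print("This is the word:", word)

# different types can be mixed without converting them to strings first
print("The word", word, "appears", count, "times")

# the optional sep argument changes the separator between arguments
print("2024", "01", "15", sep="-")
```

Since `print()` already adds a space between arguments, you don't need a trailing space inside the quotes.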

### Special Print Formatting

Sometimes, printing strings and variables together can get clunky and hard to read. If you put f directly in front of the string's opening quote (single or double) and put variables in curly braces, Python automatically substitutes each variable's value into the string! These are called f-strings.

In [None]:
my_var1 = 'red'
my_var2 = 'blue'

print(f'My first variable is {my_var1}, my second variable is {my_var2}.')

## (5) Functions
Functions are reusable, named blocks of code that can be called with arguments to run some code and, optionally, return a value. You can declare a function with `def`.

In [None]:
def my_function():
    print('I am a function!')

my_function()

You can implement parameters by adding local variables (variables that only exist within the scope of the function) and then give a return value with `return`.

In [None]:
def square_function(my_input):
    my_output = my_input**2 # **2 is equivalent to raising it to the second power
    return my_output

print(square_function(8))

**Exercise 5.1**

Write a function, `print_args(a, b)` that prints two variables, a and b, using the string formatting trick and then call it on two variables. For example, `print_args("red", "blue")` will print "a is red, b is blue".

In [None]:
# write code here

## (6) Objects
In Python, everything is an object, including packages and functions. Very abstractly, an object is a specially-defined data type, and it has the following two kinds of attributes (i.e. it stores the following information):

+ Data attributes: these store variables.
+ Methods: these are functions.

To access data attributes, use object_name.attribute (note the lack of parentheses). To call a function from an object, use object_name.function() (note that these have parentheses).

We'll use this notation in the next section when we introduce lists (which are a great example of objects).
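
Before moving on, here is a minimal sketch of the two notations using built-in objects. The variables `greeting` and `number` are made up for this example; a complex number is used only because it happens to have both data attributes (`.real`, `.imag`) and methods (`.conjugate()`).

```python
greeting = "hello"   # a string object
number = 3 + 4j      # a complex number object

# data attributes: no parentheses
print(number.real)   # the real part, stored as data on the object
print(number.imag)   # the imaginary part

# methods: called with parentheses
print(greeting.upper())
print(number.conjugate())
```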

## (7) Data Structure: Lists
Lists are the standard array data structure in Python (being ordered and changeable). Lists will be the main built-in data structure we use in Python. You declare a list using square brackets: `my_list = [1, 2, 3]`. 

In [None]:
# Lists can be defined over multiple lines as well.
my_list = [8,
          'three',
          7]

print(my_list)

**Exercise 7.1**

Declare a list called `example_list` that contains your age, name, and a boolean value for if you are a first-year student.

Print the following: "Here is some info about me: \<`example_list` goes here\>"

In [None]:
# write code here

### Accessing Values in a List

Just like in R, we can use bracket notation [] to access value(s) within a list.

In [None]:
greetings_list = ["hola", "bonjour", "hallo", "ciao", "你好", "olá", "أهلا", "こんにちは", "안녕하세요", "привет"]

In [None]:
print(greetings_list[5]) # outputs the value at index 5 (the 6th value in the list)

To access a set of values, we can use a colon (:) and specify the first and last+1 indices of that set. Note that the range is inclusive of the first index, but not of the second (which is why we must specify the last+1 index as our second input). This is called slicing.

In [None]:
print(greetings_list[3:10]) # outputs the values from index 3 to index 9 (the 4th through 10th values)

If you don't specify an index when using the colon and bracket notation, Python will default to the beginning/end, depending on which index you omit (and if you omit both, it will give the entire array).

In [None]:
print(greetings_list[:5]) # outputs all values up to, but not including, index 5 (the 1st through 5th values)

print(greetings_list[5:]) # outputs all values starting at index 5 (the 6th through nth values)

print(greetings_list[:]) # outputs all values in the list

print(greetings_list[-1:]) # a negative index counts from the end, so this outputs a list containing only the last value

In [None]:
test = [1, 2, 3, 4, 5, 6]

**Exercise 7.2** Access the following from `test`:

1. 5th value only (5)
2. First through 4th values (1, 2, 3, 4)
3. Last two values (5, 6)
4. Create a new list called `list2` which contains the last three values of `test`.

In [None]:
# write code here

There are many more ways to slice a list, but we won't go into them here. Feel free to look up Python list slicing to explore more on your own time!

### List Functions
There are many functions we can use on lists. Here are just a few particularly helpful ones:

`len()`: This function gives us the length of the list. Note that this is not a method, `object.method()`, it is just a regular function, `function(args)`.

`.append()`: This method allows you to add an element to the back of the list.

`.count()`: This method returns the number of elements with the specified value within your list.

`.index()`: This method returns the index of the first element with the specified value within your list.

`.sort()`: This method sorts your list in place (it changes the list itself and returns `None`).
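
As a rough sketch, here is how each of these looks on a small list. The list `scores` and its values are hypothetical and chosen only for illustration.

```python
scores = [70, 95, 70, 88]  # hypothetical data

print(len(scores))        # number of elements (a regular function)
scores.append(60)         # adds 60 to the end, modifying the list in place
print(scores.count(70))   # how many times 70 appears
print(scores.index(95))   # position of the first 95
scores.sort()             # sorts in place, so there is nothing to assign
print(scores)
```

Note that `scores = scores.sort()` would replace the list with `None`, which is a common mistake.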

In [None]:
test = [21, 1, 1, 2, 3, 5, 8]

**Exercise 7.3**

Do the following things to `test`.

1. Count the number of times the value "1" appears within our list.
2. Print the index of "8".
3. Append "13" to the back of our list.
4. Print the length of our list.
5. Sort the list.
6. Print the newly sorted list.

In [None]:
# write code here

### Side Note on Other Data Structures
There are a few other data structures in Python that we generally will not use, but they are still worth going over.


<u>Tuples</u> are like lists except they are immutable (they can't be changed once defined). They can be more memory efficient and computationally advantageous, but you index and loop over them the same way as lists. They are defined with parentheses instead of square brackets.

In [None]:
my_tuple = (1, 'test')
print(my_tuple)
print(type(my_tuple))
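
To see what "immutable" means in practice, here is a small sketch. The tuple `point` is made up for illustration, and `try`/`except` (which this guide does not cover) is used only so the example keeps running after the error.

```python
point = (3, 5)  # hypothetical tuple

print(point[0])  # indexing works just like a list

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print("Could not modify the tuple:", error_message)

# to "change" a tuple, build a new one instead
moved_point = (10, point[1])
print(moved_point)
```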

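<u>Sets</u> are unordered collections in which duplicate values are not allowed. Unlike tuples, sets are mutable (you can add and remove elements), but they cannot be indexed by position. They are defined with curly braces.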

In [None]:
my_set = {1, 3, 3, 'test', False}
print(my_set)

Notice how the order is not the same as originally defined.
<br></br>
<u>Dicts</u> (dictionaries) are a different kind of data structure. If you are familiar with hashmaps in other languages, they work pretty much the same. In a dict, each element consists of a key and a value, and you look up a value by its key. They are defined with curly braces and colons.

In [None]:
my_dict = {
    'name' : 'SpongeBob',
    'species' : 'Spongiforma squarepantsii',
    'hobbies' : ['Jellyfishing', 'Frycooking', 'Blowing bubbles'],
    'square pants' : True
}

print(f'Hi my name is {my_dict["name"]}. I am a {my_dict["species"]}.')
print(f'My hobbies include {my_dict["hobbies"]}')
if my_dict['square pants']:
    print('I have square pants!')
else:
    print('My pants are not square :(')

As you can see, the values in a dictionary can be any type of object (keys are a bit more restricted: they must be immutable, like strings, numbers, or tuples). Dictionaries are very powerful data structures and we will encounter them when building pandas dataframes, but for this course you will generally not need to know how to use them.
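
For the curious, here is a minimal sketch of adding, updating, and looping over dictionary entries. The dictionary `gene_counts` and its contents are hypothetical, and the `for` loop used at the end is explained in section 8.

```python
gene_counts = {"TP53": 12, "BRCA1": 7}  # hypothetical data

gene_counts["EGFR"] = 4     # add a new key/value pair
gene_counts["TP53"] += 1    # update an existing value

print("BRCA1" in gene_counts)  # membership checks look at the keys

# .items() gives each key together with its value
for gene, count in gene_counts.items():
    print(f"{gene}: {count}")
```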

## (8) Control Flow
### If, Elif, Else

Python uses if statements like R, with three main differences.

1. Instead of curly brackets, you have colons and indents.
2. You don't need to put the condition of the `if` statement in parentheses.
3. Instead of `else if`, you have the abbreviated `elif`.

In [None]:
x = -10

if x > 0:
    print('x is positive!')
elif x == 0:
    print('x is 0!')
else:
    print('x is negative!')

Because there are no curly braces as in R, Python relies on indentation to decide what goes in and out of an `if/elif/else` statement. If the indentation within a block is inconsistent (let's say, 3 spaces on one line vs. 4 spaces on another), Python raises an `IndentationError` and the code will not run. 
<br></br>
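
The sketch below shows what Python reports when indentation is inconsistent. It is only an illustration: the broken code is stored in a string (`bad_code`, a made-up name) and handed to the built-in `compile()` function, so that the error can be caught and printed instead of stopping the notebook.

```python
# source code stored in a string so the broken indentation can be inspected safely
bad_code = (
    "x = 5\n"
    "if x > 0:\n"
    "    print('positive')\n"
    "   print('still positive?')\n"  # 3 spaces instead of 4
)

try:
    compile(bad_code, "<example>", "exec")
    error_name = None
except IndentationError as err:
    error_name = type(err).__name__
    print(f"{error_name}: {err.msg}")
```

In everyday use you would simply see this error when running a cell, and the fix is to make every line in the block use the same indentation.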

### For loops

Also like R, there are `for` and `while` loops. Like R, all for loops in Python are "for-each" loops, meaning they step through the elements of a collection such as a list. For example, the following chunk of code prints each element in `a_new_list` on a separate line. Like the `if` statements, you do not use parentheses around the loop header:

In [None]:
a_new_list = [1, 'fish', 2, 'fish']
for x in a_new_list:
    print(x) 

If you know exactly how many times you want to repeat something, use the `range()` function like so.

In [None]:
# this loop will print 10 times
for i in range(10):
    print(f"looping: {i}")

Remember, Python starts counting at 0, whereas R starts at 1. You'll see it prints 0-9 instead of 1-10. Many other languages, like C++, also use zero-based indexing. 
<br></br>
**Exercise 8.1**

Fill in the ellipses to calculate the mean of the elements in `nums`.

In [None]:
nums = [1,2,3,4,5,6]
total = 0

for i in ...:
    total += ...

mean_value = total / ... # DO NOT fill in 6 (use a function instead)

print(f'The mean is {mean_value}!')

**Exercise 8.2**

Add every element from `a_new_list` onto the end of `num_list` with a `for` loop that uses `append()` and the `range()` function. Hint: for this to work, you'll have to get the length of `a_new_list`.

In [None]:
a_new_list = [1, 'fish', 2, 'fish']
num_list = [0,1,2,3,4,5,6]

# write code here

**Exercise 8.3** 

Given the following list of strings `string_list`, copy all strings that start with the letter "A" into `starts_A_list` using `append()`. Hint: you can get the first letter of a string just by treating it as a list of characters.

In [None]:
# example of string indexing
my_string = "Tree"
print(my_string[0])

In [None]:
string_list = ["Apple", "Banana", "Alligator", "Anteater", "Potato", "Water", "Aardvark"]
starts_A_list = []

# write code here

### While loops

While loops in Python are the same as in R, except again without curly brackets and with colons instead. Again, like if/elif/else statements and for loops, Python relies on indents to figure out what's in the loop and what isn't. 

In [None]:
i = 1
while i < 64:
    i *= 2  # note: this is equivalent to writing i = i * 2
    print(i)

## (9) Importing Packages

Like in R, we can do a lot more with our code by using packages. Importing packages in Python uses the `import` keyword (vs. library() in R). Let's import the first package we're going to use, numpy. We'll use the `as` keyword to give it the shorthand `np` to save typing, which is a standard abbreviation you will see practically everywhere. You'll see that other Python packages also have standard abbreviations.
<br></br>
Note: It's good practice to import packages only once. In scripts, they are generally put at the very top before everything else. Here, we will import the packages as we need them.

In [None]:
import numpy as np

Once numpy is imported as `np`, you have to prefix everything from numpy with `np.`. For example, numpy includes the constant pi and the sine function. Here's how you would compute the sine of pi/2 radians using np. 

In [None]:
np.sin(np.pi/2)

This line is the same as using `numpy.sin(numpy.pi/2)` but again, importing using a standard abbreviation saves us a lot of typing. 

**The two main takeaways of importing packages are:**
1. Always use the `import` statement. This is the equivalent of the library() function in R. 
2. Prefix any function that comes from a package with the package name (or its abbreviation) and a period. 

There are more complicated ways to import packages. 

In [None]:
import matplotlib.pyplot as plt

`pyplot` is the plotting functionality of `matplotlib`, so this import statement makes only the `pyplot` module available to us, under the name `plt`. 

If you just want one or more specific functions from a package, an easier way is to use the `from` keyword. 

In [None]:
from numpy import pi
from numpy import sin

In this case, you would only get `pi` and `sin` from numpy. You wouldn't get something like cos, since we only imported pi and sin. Now `pi` (a float) and `sin` (a function) are imported directly, so we don't have to prefix them with numpy to use them.

In [None]:
sin(pi/2) # this does not work unless we specifically import these two functions/variables

## (10) Numpy Arrays

While numpy has a bunch of useful functions, the real meat of numpy is the (multidimensional) array it implements, called the `ndarray`. It has the following properties:

* A fixed size.
* A shape (dimension).
* Its contents must be the same data type.

First, let's look at a 1D array. You can declare one by passing a list into the function `np.array()`.

In [None]:
arr = np.array([1, 2, 3])
arr

Why is the ndarray (and the numpy package in general) important? For one, we can use vectorized functions on them. For example, you can quickly perform mathematical operations on the entire array:

In [None]:
print(arr + 1)

Another benefit is that you get extra methods that you can apply on the arrays. For example, you can quickly find the mean and variance of the values in your array without having to write those functions yourself.

In [None]:
arr = np.arange(0, 501, 10) # we can get an array of every 10th number from 0 to 500 using the arange function

print(arr.mean()) # np.mean(arr) is the equivalent function
print(arr.var()) # np.var(arr) is the equivalent function

Accessing values from a 1D array is the same as accessing values from a python list.

In [None]:
print(arr[2])
print(arr[:])
print(arr[0:2])

You can also create 2D arrays with numpy (not quite data frames, we'll cover that in the pandas section). The way you declare one is very similar to making the 1D array, except you pass it a list of lists.

In [None]:
arr2d = np.array([[1,2,3], [4,5,6], [7,8,9]])
arr2d

2D arrays support all of the functionality of 1D arrays (vectorized functions, `.mean()`, `.var()`, accessing values/slicing) and also have some additional useful attributes.

* `.shape` returns the dimensions of our 2D array
* `.T` returns the transposed version of our 2D array (note that this is a capitalized T!)
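
Here is a rough sketch of what accessing values and using vectorized functions looks like in two dimensions. The array `grid` is hypothetical and deliberately different from `arr2d`; the `axis` argument of `.mean()` selects the direction along which the mean is taken.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])  # hypothetical 2 x 3 array

print(grid.shape)         # (rows, columns)
print(grid[0, 1])         # row 0, column 1
print(grid[:, 0])         # every row of column 0
print(grid + 10)          # vectorized: adds 10 to every element
print(grid.mean())        # mean of all six values
print(grid.mean(axis=0))  # mean of each column
```

Note that a 2D numpy array takes both indices inside one pair of brackets, separated by a comma.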

**Exercise 10.1**

1. What are the dimensions of `arr2d`?
2. Create a new array called `t_array` with the transposed version of `arr2d`.

In [None]:
# write code here

There's not too much else you need to know about numpy arrays, since most of your data will be in a data frame. Let's move on to pandas!

## (11) Pandas

The pandas package implements data frames, which are similar to those in R. As usual, we'll have to load it beforehand, so let's do it (`pd` is a standard abbreviation for pandas that you'll see pretty much everywhere pandas is used).

In [None]:
import pandas as pd

### Pandas Series

A pandas `Series` object is, in essence, a more functional version of the native Python list. A Series can hold values of different data types (pandas then stores them with the generic `object` dtype), although keeping one data type per Series is usually better. Here are a few examples of Series and functions that show what makes them so great.

In [None]:
my_list = [1,2,4,12]
my_series = pd.Series(my_list)

print(my_series)

print(my_series.loc[my_series > 3]) # [my_series > 3] is a Boolean mask! More on that in a bit

my_char_series = pd.Series(char for char in 'test')
print(my_char_series)

my_string_series = pd.Series(['This is a string', 'This is also a string', 'Yet again, another string', 'lol'])
print(my_string_series.str[5]) # this is a little more advanced than what we're looking at in this course, but basically
# you are using what is called a 'vectorized' operation to get the index=5 (6th element) character in each string

### Pandas DataFrame

A pandas `DataFrame` is, in essence, the same as an R data frame, but it works a bit differently. Pandas is a package, so data frames aren't part of base Python the way they are part of base R.

You still have columns and rows, though, which can be accessed by integer position (remember, Python uses 0-based indexing!).

Different columns can contain different data types (try to keep every column to a single data type, although you don't necessarily have to).

You can make a data frame with the `pd.DataFrame()` function. This will convert a dictionary (of lists, tuples, Series, etc.), a numpy array, or another suitable iterable object into a pandas DataFrame object.

In [None]:
# a dictionary with keys (columns) a, b, c, whose values are lists of numbers, booleans, and strings
df1 = pd.DataFrame({
    'a': [0, 1, 2],
    'b': [False, True, True],
    'c': ['This is', 'three different', 'strings!']
})

# a dataframe from a two-dimensional list (a list of lists), though its columns are unnamed
df2 = pd.DataFrame([
    [1,2,3,4],
    [2,6],
    ['True', False, True]
])

print(df1)
print('')
print(df2)
# notice how pandas always makes the DataFrame object rectangular, and fills in
# a missing-value marker such as NaN (comparable to NA in R) for any spots where it is missing data

Many of the attributes and methods you saw for numpy arrays also work on pandas dataframes. Here are some data attributes that are useful:
* `.index`: This will give you the index of every row.
* `.columns`: This will give you the column names.
* `.axes`: This is a list that contains both the indices and columns in that order (i.e. [index, columns]).
* `.shape`: As with numpy, this attribute is a tuple containing the shape of the dataframe (i.e. rows by columns).
* `.dtypes`: This contains the data type of each column of the dataframe.

In [None]:
print(df1)

print('\n-----------\n')

print(f"Here's the axes (names by row, column):\n{df1.axes}\n")
print(f"Here's the shape:\n{df1.shape}\n")
print(f"Here's the data types in each column:\n{df1.dtypes}\n")

#### Accessing Rows/Columns in a DataFrame

Since the DataFrame object isn't built into Python like it is in R, you can't just use [row, column] notation directly on the data frame to access different rows or columns.
The best way to access rows and columns in pandas DataFrame objects is with the indexers `.loc[]` and `.iloc[]`. Note that they use square brackets, not parentheses. Depending on what you select, they return a single value, a Series, or a DataFrame.

* `.loc[]`: This gets values by label. You specify which rows (by index label) and which columns (by name). Note that you can select multiple columns if you pass a list of column names.
* `.iloc[]`: This gets values by integer position (hence, integer-loc). You don't need to specify both rows and columns; if you don't, it will default to rows only. (To make your code more readable and the syntax easier to remember, it's probably best to specify both anyway.)



In [None]:
print(df1.loc[:,'a'])  # gets every value in column 'a'
print(df1.iloc[:, 2])  # gets every value in column 2, which is also named 'c'
print(df1.iloc[1])     # gets every value in row 1, which is the second row
print(df1.iloc[1, :])  # same as above, but this syntax makes it clearer than the above
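
The cell above selects one row or one column at a time. As a further sketch, the example below selects several columns with a list of names and a single value with a row/column pair. The data frame `samples` and its contents are hypothetical and used only for illustration.

```python
import pandas as pd

samples = pd.DataFrame({
    "gene": ["TP53", "BRCA1", "EGFR"],
    "count": [12, 7, 4],
    "mutated": [True, False, True],
})  # hypothetical data

two_columns = samples.loc[:, ["gene", "count"]]  # list of names -> DataFrame
one_value = samples.loc[1, "gene"]               # one row and one column -> single value
first_rows = samples.iloc[0:2, :]                # integer slices exclude the end position

print(two_columns)
print(one_value)
print(first_rows)
```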

**Other ways to access values**

There are a few other ways to access rows and columns in pandas DataFrame objects, but they aren't always as clear, and you should generally try to stick to `.loc[]` and `.iloc[]` for ease of reading.

* Using the dot (`.`) to access the column as an attribute, for example `df1.a`. This is probably the shortest method, but it only works when the column name has no spaces or special characters (e.g. it works for `gene_name` but not for `Name of Gene`).

* Using single square brackets to get a column, for example `df1['a']`. Note that the name 'a' is a string.

* Using double square brackets, for example `df1[['a']]`. Note that this returns a 2D data frame, not a 1D Series. This method has the benefit of being able to select multiple columns, such as `df1[['a', 'b', 'c']]`.


In [None]:
print(df1.a)
print(df1['a'])
print(df1[['a']])

## (12) Boolean Indexing
As with R, you have the option of selecting rows by boolean indexing by using the loc attribute. As a review, you can apply vectorized comparison operators to an entire 1D pandas array (i.e. a pandas series):

In [None]:
print(df1.a)
df1.a >= 1

Therefore, you can select rows by putting a Series of boolean values into the row position of `.loc[]`.

In [None]:
df1.loc[df1.a >= 1, 'b']  # gets the column 'b' for every row with a value of 1 or greater in the column 'a'

Use the following data frame for the exercise:

In [None]:
#  feel free to try to understand this code, though you are not expected to
df_exercise = pd.DataFrame(
    [[10*j + i for i in range(10)] for j in range(10)],
    columns = [char for char in "abcdefghij"])
df_exercise

**Exercise 12.1**

1. Select columns b-g and store them as a separate data frame named `df_new`. Can you think of more than one way to do so?
2. Filter out the rows in `df_new` where the value in column c is not a multiple of 3 (modify `df_new`). Remember that the modulo (%) operator gets the remainder.
3. Get the value at integer position (2, 3) of the filtered `df_new`. It should be 74.

In [None]:
# write code here


## (13) Matplotlib

Matplotlib is the main plotting package in Python. Specifically, we will be using the `pyplot` module from matplotlib (the package is massive, so it's easier to import just the specific module you need). `plt` is the standard abbreviation for `matplotlib.pyplot`. Here's how you would typically import it.

In [None]:
import matplotlib.pyplot as plt

The workflow behind pyplot is somewhat similar to plotting with R: you create the plot, then show the plot (or alternatively, save it to a file). For example, let's plot a simple sine wave:

In [None]:
# generates values from 0 to 9.9 in 0.1 intervals to plot
x_vals = [i/10 for i in range(0, 100)]
y_vals = np.sin(x_vals)

# sets up the plot area
# note that one function can have 2 return values in Python
fig, ax = plt.subplots(1, 1)  # this controls the number of subplots and how they're placed

# use ax to plot the data
ax.plot(x_vals, y_vals)
plt.show() # show the most recent plot created

Let's break down all the objects we made:

* `fig` is the Figure object, the overall canvas that holds all of the subplots. We won't use it much in this guide.
* `ax` is the Axes object for the subplot. In short, it controls what you plot, the plot labels, etc.
* `plt` is the pyplot module you imported, which you can think of as a "plot window." Basically, the plots you make are tracked by plt, and from there you can show or save them.
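
As an aside on the `fig, ax = plt.subplots(1, 1)` line: a function with "two return values" really returns one tuple, which Python unpacks into two variables. The sketch below shows the same pattern with a made-up function, `min_and_max`, which is not part of matplotlib.

```python
def min_and_max(values):
    """Return the smallest and largest value as a pair."""
    return min(values), max(values)  # the comma packs both into one tuple

result = min_and_max([4, 9, 2, 7])           # a single tuple
lowest, highest = min_and_max([4, 9, 2, 7])  # unpacked into two variables

print(result)
print(lowest, highest)
```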

You can look into the matplotlib.figure module on your own time to see all the options, though here's an example that uses subplots and sets the titles and axis labels (using the `.set()` method):

In [None]:
# good to know: constrained_layout spaces the plots out so plot titles don't overlap

fig, ax = plt.subplots(2, 2, constrained_layout=True)

# ax is a 2D numpy array of Axes objects -- you need two indices to reach a single subplot
ax[0][0].plot(x_vals, y_vals, color='red') 
ax[0][0].set(title = "sin(x)", ylabel='y', xlabel='x')

ax[0][1].hist(x_vals, color='green')  # a very boring histogram
ax[0][1].set(title = "just x values", ylabel='counts', xlabel='x values')

ax[1][0].scatter(x_vals, y_vals, color='purple', s=0.1)
ax[1][0].set(title = "scatter plot", ylabel='y', xlabel='x')

ax[1][1].plot(x_vals, y_vals, color='pink')
ax[1][1].plot(x_vals, np.cos(x_vals), color='black')
ax[1][1].set(title = "sin(x) and cos(x)", ylabel='y', xlabel='x')
ax[1][1].legend(['sin(x)','cos(x)'], loc='upper right')

plt.show()

To save a figure to your computer, you can either copy and paste it from this notebook or (the better way) use the `plt.savefig()` function.

For creating figures that will eventually be seen by others, you'll want to use the following arguments:
* `dpi=300`   - this ensures the saved pic will be clear even if blown up
* `bbox_inches='tight'`   - you probably won't need this unless you are playing with the axes positions or adding multiple subplots, but it's still useful as it prevents parts of the figure from being cut off

In [None]:
# so the final way to save your figure would be...
# (PATH is a placeholder: replace it with the file path you want to save to, as a string)
plt.savefig(PATH, dpi=300, bbox_inches='tight')
plt.clf() # run plt.clf() between creating new figures to clear the old one out

## Putting it all together...
**Exercise 14.1**
+ Define a function which takes in one input,`n`, and returns one output, `y`.
+ The function should compute the following: Start with the number 100. If the number is divisible by 3, divide by 3. If it is not divisible by 3, add 7. Compute this `n` times. Return the output, `y`, of the nth computation. For example:
> + n=0, return 100
> + n=1, return 107
> + n=2, return 114
> + n=3, return 38 etc
+ Create a list, `x_vals`, with the values 1-10
+ Create a list, `y_vals`, with the output of the function for each x value.
+ Plot `x_vals` versus `y_vals` in a line plot in the color red

In [None]:
# write code here

**Exercise 14.2**
<br/>Here is a predefined DataFrame. Explore the structure of it to understand how to complete the next parts.
+ For each row in the DataFrame, compute the mean of the values in the row. 
+ Count the number of values in the row that are greater than the mean and add the count to `count_list`.
+ Find the number of different values in your list (hint: you can use the set() wrapper to convert a list to a set) and call it `num_bins`.
+ Create a histogram of your `count_list` with the argument `bins=num_bins` and the color purple.


In [None]:
import random
example_df = pd.DataFrame(
    [[random.randint(1,100)+j*5 for i in range(10)] for j in range(1000)], 
    columns=[i for i in range(10)])

In [None]:
# write code here

**Exercise 14.3 (Extra Credit)**
<br/> The code below will import the `iris` dataset, a commonly used dataset in pattern recognition. It contains measurements of 150 flowers from three different types of irises, *Iris setosa*, *Iris versicolor*, and *Iris virginica*. These are denoted with the numbers 0, 1, and 2 respectively in the DataFrame. The measurements include sepal length, sepal width, petal length, and petal width.
+ Create a single figure with 4 scatterplots comparing all four combinations of length and width.
+ Color the points based on their species. 
+ Add an appropriate title, axis labels, and legend. 
+ Determine which combination of measurements is best able to differentiate the species of iris and explain why.

In [None]:
from sklearn import datasets

iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data[:, :4], columns=iris.feature_names)
iris_df = pd.concat([iris_df, pd.Series(iris.target, name='species')], axis=1)

In [None]:
# write code here