# Working with Data

In this lesson, we'll wrap up some of the fundamentals of Python and start thinking about how to work with data using Python.

A large part of programming for artificial intelligence and machine learning is using and manipulating data, and the goal of this lesson is to look at a few tools in Python that help us organize and visualize that data. Keep this in mind while we go through the lesson -- there are quite a few concepts we'll go over, but remember that they serve as fundamental tools in our toolbox that we'll use for the hands-on projects.

As always, you got this!

![alt text](https://media.giphy.com/media/14bhmZtBNhVnIk/giphy.gif)

(Batman's superpowers: being rich, having lots of tools, and having lots of data about everything.)

## Quick Review Exercise

Let's take a quick moment to review using libraries, which we saw in the previous lesson.

Python has a built-in library called `math` that includes some useful math functions. One of these is `math.sqrt()`, which takes one number as a parameter and returns its square root.

For example:

```
print(math.sqrt(144))
```

will print `12.0` (`math.sqrt()` always returns a decimal number, called a float).

Write a program that asks the user for a number input and then uses the `math.sqrt()` function and a conditional to tell them if they've entered a perfect square.

One example of your output:

```
Enter a number: 23
This is not a perfect square.
```

Another example:

```
Enter a number: 121
This is a perfect square.
```

Hint: Recall that a "perfect square" is a number whose square root is an integer. (For example, the square root of 121 is 11, so it is a perfect square. The square root of 23 is 4.795... so it is not a perfect square.)

In [None]:
import math
### YOUR CODE HERE ### 

### END CODE HERE ### 

Hint 2: (Try the problem out before reading this hint.) There are a few different ways to do this! One particularly clever way is to use the `%` operator. Python floats also have an `is_integer()` method that you can call by writing `num.is_integer()`, where `num` is a float such as the value returned by `math.sqrt()`.

## Writing Functions

So far, we've only been calling functions that have already been written by others. How can we write our own function?

Let's take the square root example above. Let's say we don't want to use the `math` library or just want to write and use our own square root function. 

First, we need to think about a clever mathematical way to get a square root. If you recall from math, raising a number to the power $1/2$ is the same as taking its square root! For example:

$$ 25^{1/2} = 5 $$

Recall how we do exponentiation in Python:

In [None]:
a = 25 ** (1/2)
print(a)

Now that we know how to get the square root of a number, we need to figure out how to write this as a function.

Let's think about what we want as our input and output of this function.

**What should the input to this function be?**

**What should the output of this function be?**

Take a moment to think about and answer these two questions.



---

Answer:

We want to input a single number to our function, and the output should be its square root.

Because we are inputting only one number, our function should have 1 parameter.

We can name our functions whatever we like, so in this case, let's call our function `square_root()`. An example of using `square_root()` in some code might look like:

```
a = square_root(25)
print(a)
```

which should print `5.0`.

Now that we've figured out what we want our function to do, and what we want to name it, it's time to actually write it. The following is what a square root function might look like:

In [None]:
def square_root(x):
  return x ** (1/2)

The keyword `def` tells our computer that we are about to _define_ a function. In this particular case, we're defining a function named `square_root` that takes one parameter, `x`. Just like the loop variable in a `for` loop, we can choose the name of this parameter; it's a placeholder that we refer to only within our function.

Our function in this particular case returns `x ** (1/2)`, which is the square root of `x`, the input.
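
To make the point about parameter names concrete, here is a small sketch. The second function, `square_root_renamed`, and its parameter `number` are hypothetical names introduced only for this illustration; the calculation is the same as in `square_root`.

```python
def square_root(x):
    # x is a placeholder for whatever number is passed in
    return x ** (1/2)

def square_root_renamed(number):
    # Same calculation; only the parameter name differs
    return number ** (1/2)

print(square_root(49))
print(square_root_renamed(49))
```

Both calls should print the same value, since the parameter name only matters inside the function body.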

Let's see this function in action:

In [None]:
def square_root(x):
  return x ** (1/2)

a = square_root(25)
print(a)

In the example above, what does `x` equal when we call `square_root(25)`?



---

Functions are useful because we often have logical blocks in our code that we want to re-use. Imagine having to type `x ** (1/2)` every time you want to take a square root -- it might be easier to say `square_root(x)`. It also makes your code easier to read, because the function's name says what the code is doing.

Think about the earlier exercise about writing a program that takes in a number and tells the user if that number is a perfect square. What if we wanted to perform this check multiple times in the same program? We could copy and paste the code every time we want to check if a number is a perfect square, but this would make our program very long and hard to update.

Instead, we could do something like:

In [None]:
def square_root(x):
  return x ** (1/2)

def is_perfect_square(x):
  return square_root(x).is_integer()

print(is_perfect_square(23))
print(is_perfect_square(121))

In this case, we've written a function called `is_perfect_square()` that takes in one parameter and returns whether the square root of that parameter is an integer or not. `is_integer()` is a method that comes built into Python's floating-point numbers, so we can call it on the result of `square_root()` without writing it ourselves.

But what ends up being printed? `False` and `True`!

These are examples of the Boolean data type (which we briefly mentioned in the first lesson).

### An Aside: Booleans

We've actually been using booleans throughout our code. Let's look at the following conditionals:

In [None]:
if True:
  print("This will always print!")
  
if False:
  print("This will never print.")

When we use if statements, we're actually checking for the booleans `True` and `False`. If the condition in the if statement is `True`, then we will perform the code block underneath. If it is `False`, we will not.

In [None]:
if 5 > 3:
  print("This will always print!")
  
print("Notice what the following line prints:")
print(5 > 3)

Revisiting our perfect square example, let's say we want to tell the user in a complete sentence whether their number is a perfect square or not. We know that `is_perfect_square()` returns a boolean, so let's combine it with a conditional:

In [None]:
num = int(input("Enter an integer: "))

if is_perfect_square(num):
  print("Your number was a perfect square.")
else:
  print("Your number was not a perfect square.")

Notice that we didn't have to re-write all of the `is_perfect_square()` code again! This is because we had already written and run that code once before, so we can keep re-using it in the future. That's why functions are so useful!
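
As a small sketch of that re-use, the loop below calls `is_perfect_square()` on several numbers. The list name `numbers_to_check` and its values are made up for illustration; the two functions are repeated from above so the block runs on its own.

```python
def square_root(x):
    return x ** (1/2)

def is_perfect_square(x):
    return square_root(x).is_integer()

# Hypothetical list of numbers to test
numbers_to_check = [16, 23, 121]

for n in numbers_to_check:
    if is_perfect_square(n):
        print(n, "is a perfect square.")
    else:
        print(n, "is not a perfect square.")
```

This sketch assumes non-negative numbers. A negative input would make `x ** (1/2)` a complex number, which has no `is_integer()` method.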

### Function Exercises

1. Read the following code. What do you think it does?

In [None]:
def weird_func(a, b):
  test = a + b
  return test * 2

x = 5
y = 10

if weird_func(x, y) > 30:
  print("Will this print?")
  
if weird_func(10, 10) > 30:
  print("How about this?")

2. Write a function named `wheres_Waldo` that takes a list as a parameter and returns whether the element "Waldo" is in the list.

Remember that you can use the keyword `in` to check if an element is in a list, like so:

In [None]:
lis = ["Pikachu", "Captain Marvel"]
print("Waldo" in lis)

In [None]:
### YOUR CODE HERE ###

### END CODE HERE ### 

In [None]:
wheres_Waldo(lis)

## Slicing

![alt text](https://media.giphy.com/media/OHNg1tHZcUcKc/giphy.gif)

Not that type of slicing.

Recall how we access individual elements of lists:

In [None]:
lis = ["The Poet X", "Electric Arches", "America Is Not The Heart", "Reliquaria"]

# If we wanted to print the element "Electric Arches":
print(lis[1])

A common thing we'll want to do when working with data is to get multiple elements from a list.

In [None]:
lis = ["The Poet X", "Electric Arches", "America Is Not The Heart", "Reliquaria"]

sub_lis = lis[0:2]

# What do you think this will print?
print(sub_lis)

When we say `lis[0:2]`, we are telling Python to return a new list that is a _slice_ of the original list. Specifically, we want this new list to start at the element whose index is 0 and go up to (but not include) the element whose index is 2.

In the above example, that means the new list will include "The Poet X" (index 0) and "Electric Arches" (index 1).
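
Here is one more sketch of the same start-and-stop rule on a different list. The list `letters` is a made-up example, not part of the lesson's data.

```python
letters = ["a", "b", "c", "d", "e"]

# Start at index 1, stop before index 4
middle = letters[1:4]
print(middle)

# Here the slice has 4 - 1 = 3 elements
print(len(middle))

# Slicing builds a new list; the original list is unchanged
print(letters)
```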

In [None]:
lis = ["The Poet X", "Electric Arches", "America Is Not The Heart", "Reliquaria"]

# Finish the following line so that your slice has "America Is Not The Heart" and "Reliquaria"
sub_lis = 

print(sub_lis)

The Python creators also provided a shortcut for doing our slices:

In [None]:
lis = ["The Poet X", "Electric Arches", "America Is Not The Heart", "Reliquaria"]

sub_lis = lis[:2]
print(sub_lis)

When you leave out the number before the `:`, Python assumes you mean the first element (index 0).

You can do the same for leaving out the number after the `:`. What do you think this does?

In [None]:
lis = ["The Poet X", "Electric Arches", "America Is Not The Heart", "Reliquaria"]

sub_lis = lis[2:]
print(sub_lis)

We can actually use slicing for some other types of data, not just lists. For example, slicing with strings:

In [None]:
string = "This is a string!"

# Can you guess what this will print?
sub_string = string[10:]
print(sub_string)

As you may have guessed, each character in a string can be indexed just like how we've been indexing into lists. This is very useful if you're working with string manipulation and natural language data.

Example:

In [None]:
string = "This is a string!"

# If I wanted the first character of this string, 'T', I would do:
first_char = string[0]
print("The first character of the string is", first_char)

We'll see slicing quite commonly in machine learning, because we often want to use smaller or specific portions of our data.
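
For instance, a common pattern is to hold back part of a dataset for testing. The sketch below assumes a made-up list `data` and an arbitrary split point `split_index`; real projects typically shuffle the data first and often use a library helper instead.

```python
# Hypothetical dataset of 10 values
data = [3, 8, 15, 4, 23, 42, 16, 7, 11, 9]

# Use the first 8 elements for training and the rest for testing
split_index = 8
training_data = data[:split_index]
testing_data = data[split_index:]

print(training_data)
print(testing_data)
```

Because both slices use the same `split_index`, every element ends up in exactly one of the two lists.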

## Hello, Meet Numpy :)

Numpy = Numerical Python 

This is the first external library we will be looking at. We will be using this library ALL THE TIME. 


In [None]:
# First let's import numpy 
import numpy as np
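
As a first taste, the sketch below builds a NumPy array from a regular Python list and shows that the indexing and slicing from the previous section carry over. The variable name `temperatures` and its values are made up for illustration.

```python
import numpy as np

# Build an array from an ordinary list (hypothetical values)
temperatures = np.array([68, 71, 75, 64, 70])

print(temperatures.shape)   # the shape is a tuple; this array has 5 elements
print(temperatures[0])      # indexing works like it does for a list
print(temperatures[1:3])    # so does slicing
```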

#### Add 1 to every element of a list  
Write code below to add 1 to every element of the list.

In [None]:
# This is the list to modify
L1 = [1, 2, 3, 4, 5]

### YOUR CODE HERE ### 

### END CODE HERE ### 

#### Multiply every element of a list  
Write code below to multiply every element of the list by 3

In [None]:
# This is the list to modify
L1 = [1, 2, 3, 4, 5]

### YOUR CODE HERE ### 

### END CODE HERE ### 

#### Initializing arrays

Now we will practice creating and manipulating NumPy arrays.

In [None]:
### YOUR CODE HERE ### 

# 1. make a 1x8 array with all 1's 


# 2. make a 1x8 array with random numbers between 1 and 100


# 3. add the arrays together into a 1x8 array


# 4. stack the arrays together into a 2x8 array (Hint: use np.concatenate; this needs 2-D arrays of shape (1, 8))


### END CODE HERE ### 

## Visualizing Data

To wrap up this lesson, we'll cover one more topic: how to visualize the data that you've been given. Although data may be well organized in Python using the different collections, we often want to be able to present this data to the user in a visual way. Visualization comes in many different forms, from creating tables to colorful charts to just plotting some points on a graph.

In [None]:
# A list of tuples
points_on_a_graph = [(3, 9), (1,1), (5, 25), (2, 4), (4, 16)]

# We can print this list, but... is this the most useful?
print(points_on_a_graph)

For visualization, we'll use a versatile Python library called `matplotlib`. (It's called `matplotlib` because its plotting interface was modeled on the plotting tools of a different language, called MATLAB.)

For the first example, let's plot some points on a graph:

In [None]:
%matplotlib inline
import matplotlib.pyplot

x = [3, 1, 5, 2, 4]
y = [9, 1, 25, 4, 16]

matplotlib.pyplot.scatter(x, y)
matplotlib.pyplot.show()

The above code imports the `matplotlib` library, takes in two lists (which correspond to `x` and `y` values of 5 different points), and then plots those five points on a graph.

For Jupyter Notebooks, in order to show a `matplotlib` graph within the Notebook itself, we need to add the line `%matplotlib inline` before doing the import.

When we say `matplotlib.pyplot.scatter(x, y)`, we are saying that we want to put the values of `x` and `y` onto a specific type of graph called a scatterplot.

Finally, to actually show the scatterplot, we say `matplotlib.pyplot.show()`.



---


It's a lot of typing to say `matplotlib.pyplot` every time we want to use functions from this module, so there's a shorthand we can use when importing `pyplot`:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

x = [3, 1, 5, 2, 4]
y = [9, 1, 25, 4, 16]

plt.scatter(x, y)
plt.show()

All this says is that we import the `matplotlib.pyplot` module, one of the many modules that make up `matplotlib`, and refer to it as `plt` from now on in our code.
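
The `as` shorthand is not special to `matplotlib`; it works for any module. As a sketch, the block below gives the built-in `math` library the alias `m`, a name chosen only for this illustration.

```python
import math as m

# m is just another name for the math module
print(m.sqrt(144))
```

By convention, most code uses `plt` for `matplotlib.pyplot` and `np` for `numpy`, which makes examples easier to recognize.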

`pyplot` has a lot of different useful features that we can use to manipulate our visualization.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

x = [3, 1, 5, 2, 4]
y = [9, 1, 25, 4, 16]

plt.scatter(x, y)
plt.xlabel("Some numbers")
plt.ylabel("Some other numbers")

# Notice that when we show the graph now, we'll have labels along the x and y axes
plt.show()

Instead of using a scatter plot, we can use a line plot to connect the points:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

x = [3, 1, 5, 2, 4]
y = [9, 1, 25, 4, 16]

plt.plot(x, y)
plt.xlabel("Some numbers")
plt.ylabel("Some other numbers")
plt.show()

Did this connect the points like you expected? The first point is $(3, 9)$ and the next point is $(1,1)$, so it will draw a line between the two. If we re-order our points, we might get something better:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y)
plt.xlabel("Some numbers")
plt.ylabel("Some other numbers")
plt.show()
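
Rather than re-typing the points by hand, we could also let Python re-order them. The sketch below is one possible approach, not the only one: it pairs up the values with `zip()`, sorts the pairs with `sorted()`, and splits them back into two lists. The names `points`, `sorted_x`, and `sorted_y` are made up for illustration.

```python
x = [3, 1, 5, 2, 4]
y = [9, 1, 25, 4, 16]

# Pair each x with its y, then sort the pairs.
# Tuples are compared by their first item first, so this orders by x.
points = sorted(zip(x, y))

# Split the sorted pairs back into separate x and y lists
sorted_x = []
sorted_y = []
for point in points:
    sorted_x.append(point[0])
    sorted_y.append(point[1])

print(sorted_x)
print(sorted_y)
```

Passing `sorted_x` and `sorted_y` to `plt.plot()` should then draw the same line as the hand-ordered lists above.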

There are many, many other functions, and we won't go over all of them. This is why it's common for programmers to refer to [documentation](https://matplotlib.org/api/index.html) when they read code, because there are so many different functions and libraries that exist in the world, and we need the documentation to help us understand what's going on.
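
Documentation is also available from inside Python itself. As a small sketch, most functions carry a short description that we can print; `math.sqrt` is used here only as an example.

```python
import math

# Every function can carry a short description called a docstring
print(math.sqrt.__doc__)
```

In a notebook, calling `help(math.sqrt)` shows the same description in a slightly longer format.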

As you program more, you may also start to gain some intuition about how code you haven't seen before works:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y)
plt.xlabel("Some numbers")
plt.ylabel("Some other numbers")

# What might these following lines do?
plt.gray()
plt.axis([0, 10, 0, 100])

plt.show()

In [None]:
## Combining matplotlib and numpy

### YOUR CODE HERE ### 
# 1. initialize x array with 100 numbers between 0 and 2pi


# 2. compute y as sin(x)


# 3. plot x vs y on a scatter plot (don't forget your labels)


### END CODE HERE  ###