# Module 1 In-Class Activity: Math Review & Graphs

In this module, we'll cover using Python for basic math and give an introduction to graphs.

## Python for Math
We can use Python and Jupyter Notebooks like a calculator. The basic operations are addition `+`, subtraction `-`, multiplication `*`, and division `/`, as well as modulo (remainder) `%`, floor division `//`, and exponentiation `**`. Below are some examples.

In [5]:
1 + 2

3

In [6]:
2-1

1

In [7]:
100/10

10.0

In [8]:
3*4

12
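The division example above printed `10.0` rather than `10`. A small sketch of why: in Python 3, the `/` operator always produces a float, while `+`, `-`, and `*` on two integers stay integers. The variable names here are only for illustration.

```python
# True division (/) always produces a float, even when the result is whole.
quotient = 100 / 10
print(quotient)        # 10.0
print(type(quotient))  # <class 'float'>

# Multiplying two integers keeps the result an integer.
product = 3 * 4
print(product)         # 12
print(type(product))   # <class 'int'>
```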

Modulo is an operation that returns the remainder when dividing one number by another, and it uses `%` as an operator.

In [9]:
10%2

0

In [10]:
15%4

3
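Two common uses of modulo, as a rough sketch: checking whether a number is even, and "wrapping around" like a clock. The helper name `is_even` is made up for this example.

```python
def is_even(n):
    """Return True when n divides by 2 with no remainder."""
    return n % 2 == 0

print(is_even(10))  # True
print(is_even(15))  # False

# Wrapping around a 12-hour clock: 5 hours after 10 o'clock.
hour = (10 + 5) % 12
print(hour)         # 3
```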

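Floor division is division that rounds down to the nearest whole number. For positive numbers, this is the same as chopping off everything to the right of the decimal point; for negative results, it rounds toward negative infinity (so `-7 // 2` is `-4`, not `-3`). It uses the `//` operator.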

In [11]:
12//3 

4

In [12]:
5//3

1

In [13]:
9//10

0
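The sketch below shows how floor division differs from simply dropping the decimal part when a number is negative, and how `//` and `%` fit together. It uses the built-in `divmod` function and `math.trunc` from the standard library; the variable names are only for illustration.

```python
import math

print(7 // 2)               # 3
print(-7 // 2)              # -4  (floor rounds down, toward negative infinity)
print(math.trunc(-7 / 2))   # -3  (truncation just drops the decimal part)

# Floor division and modulo fit together: a == (a // b) * b + (a % b)
a, b = 15, 4
quotient, remainder = divmod(a, b)  # same as (a // b, a % b)
print(quotient, remainder)            # 3 3
print(a == quotient * b + remainder)  # True
```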

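Exponentiation uses the `**` operator. It is the equivalent of the `^` you may have seen on calculators or in written math; in Python itself, `^` means something different (bitwise XOR), so always use `**` for powers.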

In [14]:
2**3

8
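A quick comparison, in case you are tempted to type `^` for powers: in Python, `^` is the bitwise XOR operator, so it runs without an error but gives a different answer. The variable names are only for illustration.

```python
power = 2 ** 3   # 2 raised to the power 3
xor = 2 ^ 3      # bitwise XOR of binary 10 and binary 11

print(power)     # 8
print(xor)       # 1
```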

You can also use exponentiation to take the square root by using `**0.5`.

In [15]:
4**0.5

2.0
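As a small sketch, the standard library's `math.sqrt` gives the same result as `** 0.5` for this example. Note also that `**` is applied before a leading minus sign. The variable names are only for illustration.

```python
import math

root_a = 4 ** 0.5
root_b = math.sqrt(4)
print(root_a, root_b)    # 2.0 2.0

# ** binds more tightly than the minus sign, so this is -(4 ** 0.5).
negated_root = -4 ** 0.5
print(negated_root)      # -2.0
```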

Other operators include comparison operators, such as greater than `>`, less than `<`, equal to `==`, not equal to `!=`, greater than or equal to `>=`, and less than or equal to `<=`. These comparison operators return Boolean values (`True` or `False`), which we discussed in the last activity.

In [16]:
1 < 2 

True

In [17]:
1 == 1

True

In [18]:
4 <= 3

False
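A short sketch of a few more things comparisons can do: the result can be stored in a variable, `==` (comparison) is different from `=` (assignment), and comparisons can be chained. The variable names are only for illustration.

```python
result = 4 <= 3
print(result)         # False
print(type(result))   # <class 'bool'>

# A single = assigns a value; a double == compares two values.
x = 5
print(x == 5)         # True
print(x != 5)         # False

# Comparisons can be chained, just like in written math.
print(1 < x <= 5)     # True
```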

## Intro to Graphs

Graphing helps us understand and summarize data, and there are different types of graphs for different types of data. For example, we can use a line graph to show how values may change over time, or a bar graph to show categories and counts, or a scatterplot to show the relationship between different variables.

We'll be taking a look at some data about popular cereals and looking for relationships between their attributes, such as calories, fat, and rating.

You have access to four functions:

* `scatter` takes three arguments and produces a scatter plot: the first should always be `cereal_data`, and the other two are the names of the `x` and `y` attributes you want to compare.
* `line` takes three arguments and produces a line plot: the first should always be `cereal_data`, and the other two are the names of the `x` and `y` attributes you want to compare.
* `bar` takes three arguments and produces a bar chart: the first should always be `cereal_data`, and the other two are the names of the `x` and `y` attributes you want to compare.
* `histogram` takes two or three arguments and produces a histogram: the first should always be `cereal_data`, the second is the name of the attribute you want a histogram of, and the optional third is the number of bins to use.

**You don't need to know how this code works yet, but you'll be learning how it works and how to create your own visualizations by the end of the course!**

In [22]:
'''
You do not need to understand the code in this cell! You will learn how it works by the end of the course.
'''
import pandas as pd
import matplotlib.pyplot as plt

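def scatter(d, x, y):
    """Creates a scatter plot using data, an x-column, and a y-column"""
    df = pd.DataFrame(d)
    plt.scatter(df[x], df[y])
    plt.title(x + ' vs. ' + y)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.show()

def line(d, x, y):
    """Creates a line plot using data, an x-column, and a y-column"""
    # Rows are sorted by the x-column so the line runs left to right.
    df = pd.DataFrame(d).sort_values(x)
    plt.plot(df[x], df[y])
    plt.title(x + ' vs. ' + y)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.show()

def bar(d, x, y):
    """Creates a bar chart using data, an x-column, and a y-column"""
    df = pd.DataFrame(d)
    plt.bar(df[x], df[y])
    plt.title(x + ' vs. ' + y)
    plt.xlabel(x)
    plt.ylabel(y)
    plt.show()

def histogram(d, label, bins=None):
    """Creates a histogram using data, a specified label, and an optional number of bins"""
    assert bins is None or isinstance(bins, int), "If bins is provided, it should be an integer."
    df = pd.DataFrame(d)
    # Passing bins=None lets matplotlib fall back to its default number of bins.
    plt.hist(df[label], bins=bins)
    plt.xlabel(label)
    plt.ylabel("Frequency")
    plt.show()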
    
cereal_data = pd.read_csv('simplifiedCereal.csv', index_col=0)

In [23]:
# Displaying the available data.
cereal_data

Unnamed: 0,name,calories,protein,fat,sodium,fiber,rating
0,100% Bran,70,4,1,130,10.0,68.402973
1,100% Natural Bran,120,3,5,15,2.0,33.983679
2,All-Bran,70,4,1,260,9.0,59.425505
3,All-Bran with Extra Fiber,50,4,0,140,14.0,93.704912
4,Almond Delight,110,2,2,200,1.0,34.384843
...,...,...,...,...,...,...,...
72,Triples,110,2,1,250,0.0,39.106174
73,Trix,110,1,1,140,0.0,27.753301
74,Wheat Chex,100,3,1,230,3.0,49.787445
75,Wheaties,100,3,1,200,3.0,51.592193


Below is a histogram of calories in these cereals, split into 10 bins. Approximately which calorie range seems to be the most common?

In [21]:
histogram(cereal_data, 'calories', 10)

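To get a feel for what "split into 10 bins" means, here is a rough pure-Python sketch of equal-width binning. It assumes the bins evenly divide the range from the smallest to the largest value, with the largest value counted in the last bin. The helper `bin_counts` is made up for this example, and `sample_calories` holds only the nine rows visible in the table preview above, not the full dataset.

```python
def bin_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            index = 0  # all values are identical, so use the first bin
        else:
            # The largest value would land one past the end, so clamp it.
            index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

# Calories from the nine rows shown in the table preview only.
sample_calories = [70, 120, 70, 50, 110, 110, 110, 100, 100]
print(bin_counts(sample_calories, 10))  # [1, 0, 2, 0, 0, 0, 0, 2, 3, 1]
```

Each number in the printed list is the height of one bar in the histogram, read from the lowest calorie range to the highest.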

Do you see any relationship between calories and fat? Why do you think what you see is the case?

In [None]:
scatter(cereal_data, 'calories', 'fat')

Do you see any relationship between calories and the cereal rating? Why do you think what you see is the case?

In [None]:
scatter(cereal_data, 'calories', 'rating')
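One optional way to put a number on a relationship like this is the Pearson correlation coefficient, which ranges from -1 (one value falls as the other rises) to 1 (they rise together). The sketch below is only an illustration: `pearson_r` is a helper written for this example, and it uses just the nine rows visible in the table preview, so the full dataset may give a different value.

```python
import math

def pearson_r(xs, ys):
    """Compute the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together, and how much each varies alone.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)

# Values copied from the nine rows shown in the table preview only.
calories = [70, 120, 70, 50, 110, 110, 110, 100, 100]
rating = [68.402973, 33.983679, 59.425505, 93.704912, 34.384843,
          39.106174, 27.753301, 49.787445, 51.592193]

r = pearson_r(calories, rating)
print(round(r, 2))  # a negative value: higher calories go with lower ratings here
```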

What about between the grams of fat and the rating? 

In [None]:
bar(cereal_data, 'fat', 'rating')

### Now, create your own graphs!

Choose some variables to compare and a graph type to see what happens. Think about what the relationships and patterns you see might mean!

In [None]:
# YOUR CODE HERE