# Day 7

* [NumPy Mathematical Operations](#numpy-math-operations)
* [Visualizations](#visualizations)

## NumPy math operations

NumPy applies basic mathematical operations element by element, both between an array and a single number and between arrays of compatible shapes.


### Single Array Math

In [None]:
import numpy as np

# Creating a new array of random values 
random_arr = np.random.rand(3,3)

random_arr

In [None]:
# The following addition will give a new array 
# random_arr will remain unchanged 
added_five = random_arr + 5

In [None]:
# Replacing random_arr with the result (a new array is created and the name is rebound)
random_arr = random_arr - 5 
random_arr 

In [None]:
# Another way of modifying random_arr: the augmented assignment operator
# += also works with int, float, and string variables
random_arr += 5  # same values as random_arr = random_arr + 5, but a NumPy array is updated in place
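
The two cells above end with the same kind of result, but they get there differently. The sketch below uses a small made-up array (`arr`) and a second name for it (`alias`), neither of which is part of the lecture data, to show that `arr = arr - 5` builds a new array while `arr += 5` changes the existing one.

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])
alias = arr            # second name for the same array object

arr = arr - 5          # builds a NEW array and rebinds the name arr
print(alias)           # the original data is untouched: [1. 2. 3.]
print(arr is alias)    # False

arr = alias            # point arr back at the original array
arr += 5               # in place: changes the data of the existing array
print(alias)           # [6. 7. 8.]
print(arr is alias)    # True
```

This difference only matters when another variable refers to the same array, which is easy to overlook in longer notebooks.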

In [None]:
# Using other mathematical operators 
random_arr//2, random_arr**2 

### Multiple Array Math 

In [None]:
# Adding two columns of random_arr 
added_col = random_arr[:,0] + random_arr[:,1]

added_col.shape 

In [None]:
# Multiplying the summed column by column 1 of random_arr (element-wise)
added_col * random_arr[:,1]

In [None]:
# Is random_arr changed?
random_arr
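
Because `random_arr` holds random values, the results above are hard to check by eye. Here is a minimal sketch of the same steps on a made-up array with fixed values (`grid` and `before` are illustrative names, not part of the lecture data).

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
before = grid.copy()                 # snapshot to compare against later

col_sum = grid[:, 0] + grid[:, 1]    # element-wise: [1+2, 4+5, 7+8]
col_prod = col_sum * grid[:, 1]      # element-wise: [3*2, 9*5, 15*8]

print(col_sum)                       # [ 3  9 15]
print(col_prod)                      # [  6  45 120]
print(col_sum.shape)                 # (3,)
print(np.array_equal(grid, before))  # True: grid itself was never modified
```

Arithmetic between columns always produces new arrays, so the source array stays as it was unless the result is assigned back into it.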

## Visualizations

The matplotlib library is a fairly well-documented library with many examples and tutorials. The library makes it easy to generate plots and save them to files. 

It can be used with lists as well as numpy arrays. 

Full Documentation: https://matplotlib.org/stable/users/index.html

Official Tutorials: https://matplotlib.org/stable/tutorials/index.html

In [None]:
# Importing pyplot from the matplotlib library  
import matplotlib 
from matplotlib import pyplot as plt 

### Using the plot() function 

The coordinates of the points or line nodes are given by x and y, the first two arguments of the plot() function. An optional third argument, the format string, sets the color, marker, and line style of the graph. When no format string is given, plot() draws a solid line in the first color of the default color cycle, a shade of blue, which looks much like the result of passing "b-": blue color (b) and solid line (-). 
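
As a rough illustration of how a format string breaks down into color, marker, and line style, the sketch below plots the same made-up points twice and inspects the line objects that plot() returns. The variable names `dashed` and `dots` are only for this example.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# plot() returns a list of Line2D objects, one per plotted line
(dashed,) = plt.plot(x, y, "g--")   # green, dashed line, no markers
(dots,) = plt.plot(x, y, "ro")      # red, circle markers, no connecting line

# A missing marker or line style is reported as the string "None"
print(dashed.get_linestyle(), dashed.get_marker())
print(dots.get_linestyle(), dots.get_marker())

# Call plt.show() here to display the figure
```

A format string may leave out any of its parts: "ro" gives markers only, while "g--" gives a line only.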

In [None]:
# Declaring some data 
x_data = [1, 2, 3, 4]
y_data = [1, 4, 9, 16]

# Plotting the data using the plot() function
plt.plot(x_data, y_data)

# Call plt.show() to display the plot
plt.show() 

In [None]:
# Setting the labels for axis 
plt.xlabel("Data X")
plt.ylabel("Data Y")

# Again displaying the plot
plt.plot(x_data, y_data) 
plt.show() 

In [None]:
plt.xlabel("Data X")
plt.ylabel("Data Y")

# Setting the title of the graph 
plt.title("Simple Line Graph")

plt.plot(x_data, y_data)
plt.show()

In [None]:
# Saving the graph 
plt.title("Simple Line Graph")
plt.xlabel("Data X")
plt.ylabel("Data Y")

# Call the savefig function with filename as argument
plt.plot(x_data, y_data)
plt.savefig("LineGraph2.png")
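
savefig() accepts a few optional arguments that are often useful. The following is a small sketch, not the only way to do it: the temporary folder and the file name `line_graph.png` are assumptions made so that the example does not overwrite anything in the working directory.

```python
import os
import tempfile

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.plot(x, y)
plt.title("Simple Line Graph")

# Write into a temporary folder so nothing in the working directory is touched
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "line_graph.png")

# dpi sets the resolution; bbox_inches="tight" trims the empty margin around the plot
plt.savefig(png_path, dpi=150, bbox_inches="tight")
plt.close()  # release the figure once it has been written

print(os.path.getsize(png_path) > 0)  # True if the file was written
```

When a cell both saves and shows a graph, it is usually safer to call savefig() before show(), since the figure may already be cleared once show() returns.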

In [None]:
# Changing the formatting 
plt.title("Simple Line Graph")
plt.xlabel("Data X")
plt.ylabel("Data Y")

# r - red, o - circular markers 
plt.plot(x_data, y_data, "ro")
plt.show() 

In [None]:
# The first and the last data points are not fully visible 
# Let's change the axis limits to make them more visible 
plt.xlim(0,5)
plt.ylim(0,20)

plt.title("Simple Line Graph")
plt.xlabel("Data X")
plt.ylabel("Data Y")

plt.plot(x_data, y_data, "ro")
plt.show()

In [None]:
# Multiple line plots on the same graph 
plt.xlim(0,5)
plt.ylim(0,20) 

y_data2 = [2, 7, 9, 10]

# Making sure a legend is provided by using the label argument 
plt.plot(x_data, y_data, "ro", label="Data 1")
# plt.plot(x_data, y_data2, "bx", label="Data 2")

plt.plot(x_data, y_data2, "bx-", linewidth=0.1, label="Data 2")

plt.legend() 
plt.show() 

In [None]:
# Generating a sequence from 0.0 up to (but not including) 5.0 in steps of 0.2
y1 = np.arange(0., 5., 0.2)  # like range(), but it accepts float steps and returns a NumPy array

plt.plot(y1)
plt.show()
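
Two details of the cell above are easy to miss: range() cannot take a float step, and plot() called with a single argument chooses the x values on its own. A short sketch of both, assuming nothing beyond NumPy and matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

# range() only accepts integers, so a float step raises TypeError
try:
    range(0, 5, 0.2)
except TypeError as err:
    print("range() failed:", err)

y1 = np.arange(0.0, 5.0, 0.2)   # starts at 0.0, ends near 4.8 (5.0 is excluded)
print(len(y1))                  # 25

# With a single argument, plot() treats the values as y and uses
# the positions 0, 1, 2, ... as x
(line,) = plt.plot(y1)
print(line.get_xdata()[:3])     # first three x positions

# Call plt.show() here to display the figure
```

This is why the x-axis of the graph above runs from 0 to 24 rather than from 0 to 5.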

### Lecture Practice (15 mins) 

1. Let's do some numpy array math! We have generated a 1D array y1 in the cell above. Create two more arrays, y2 and y3. 
    1. Data in y2 should be twice the data in y1
    2. Data in y3 should be the cube of the data in y1 <br><br>

2. Create a visualization that looks like the one below: 

![curves-cubic](day7_three_lines.png)

HINT: Color codes - red: 'r', blue: 'b', green: 'g'

In [None]:
y2 = y1*2
y3 = y1**3 

plt.plot(y1, label="Y1-Linear")
plt.plot(y2, "ro-", label="Y2-Double")
plt.plot(y3, "gx-", label="Y3-Cube")
plt.legend()
plt.show() 

## Bar plots

In [None]:
# Making a bar plot 
# Documentation: https://matplotlib.org/3.2.2/api/_as_gen/matplotlib.pyplot.bar.html?highlight=bar%20plot#matplotlib.pyplot.bar

x_data = ["2014", "2015", "2016", "2017"]
y_data = [4500, 5566, 1255, 3556]

# Using the bar() function 
plt.bar(x_data, y_data)
plt.show()

In [None]:
# Fixing the x-axis labels by explicitly stating the positions of the bars in the bar() function
bar_positions = [0, 1, 2, 3]
plt.bar(bar_positions, y_data)

# Providing labels as a list of strings
plt.xticks(bar_positions, x_data)
plt.show() 

### Lecture Practice (20 mins) 

1. Refer to the documentation of bar plots. Which other parameters should you specify when calling the plt.bar() function to create a bar plot like the one shown below? Specify bar positions and xticks as done in the cell above. 

![day7-barplot](day7-barplot.png)

You can use the values below:

```
x_data = ["2014", "2015", "2016", "2017"]
y_data = [4500, 5566, 1255, 3556]
```

Documentation: https://matplotlib.org/3.2.2/api/_as_gen/matplotlib.pyplot.bar.html?highlight=bar%20plot#matplotlib.pyplot.bar


In [None]:
bar_positions = [0, 1, 2, 3]
plt.bar(bar_positions, y_data, color="r", width=0.2, align="edge", edgecolor="green")

# Providing labels as a list of strings
plt.xticks(bar_positions, x_data)
plt.show() 
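
To see what `width` and `align` do in the call above, the sketch below draws the same bars with both alignments and reads back where each first bar starts. The names `centered` and `edged` are illustrative only.

```python
import matplotlib.pyplot as plt

positions = [0, 1, 2, 3]
heights = [4500, 5566, 1255, 3556]

# align="center" is the default: each bar is centered on its position
centered = plt.bar(positions, heights, width=0.2)
# align="edge": the left edge of each bar sits on its position
edged = plt.bar(positions, heights, width=0.2, align="edge")

# Each bar is a Rectangle; get_x() returns its left edge
print(centered[0].get_x())   # half a bar width to the left of position 0
print(edged[0].get_x())      # exactly at position 0

# Call plt.show() here to display the figure
```

With `align="edge"` the tick labels set by xticks() line up with the left side of each bar instead of its middle.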

### Plotting wine data from yesterday

In [None]:
# Read the data again 
import csv 

filename = "day6-winequality-red.csv" 

# read file and store in a list
with open(filename) as f: 
    wines = list(csv.reader(f, delimiter=";"))

# Let's get rid of the first row i.e. header row  
wines_without_Header = wines[1:]

# put into numpy array so we can plot it
wines_array = np.array(wines_without_Header, dtype=float)

# Printing all the column names 
header_row = wines[0]

for i, name in enumerate(header_row):
    print("Column {}: {}".format(i, name))
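
As an alternative to the csv module, NumPy can read a delimited numeric file directly. This is only a sketch: it writes a tiny made-up file (`sample.csv`, with invented values) in the same layout as the wine data, so that it runs without the real file.

```python
import os
import tempfile

import numpy as np

# Made-up miniature file in the same layout: header row, ";" as separator
sample = '"alcohol";"quality"\n9.4;5\n9.8;5\n10.5;7\n'
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w") as f:
    f.write(sample)

# skiprows=1 drops the header, so every remaining field parses as a float
data = np.loadtxt(path, delimiter=";", skiprows=1)

print(data.shape)   # (3, 2)
print(data[:, 1])   # [5. 5. 7.]
```

The csv approach in the cell above keeps the header row available for printing the column names, which np.loadtxt() with `skiprows=1` discards.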

In [None]:
# Creating a scatter plot with plot() by asking for circle markers ("o") and no line
x = wines_array[:,11]  # quality
y = wines_array[:,10]  # alcohol content
plt.plot(x,y,"o")
plt.xlabel("Quality")
plt.ylabel("Alc. Content")
plt.show()

In [None]:
# Creating a scatter plot using the scatter function
plt.ylabel("volatile acidity")
plt.xlabel("quality")
plt.scatter(wines_array[:,11], wines_array[:,1], color="red")
plt.show()

In [None]:
# Creating a simple boxplot on alcohol content values
plt.boxplot(wines_array[:,10])
plt.show()

In [None]:
# Creating histograms on quality rating 
# Documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html
print(plt.hist(wines_array[:,11], bins = 10, color="red"))
plt.vlines(5.636, 0, 700, "g")  # green vertical line at the mean quality (about 5.636)
plt.show() 

In [None]:
# Print the min, max, and mean values of quality to decide the bins
# (without print(), a notebook cell displays only its last expression)
print(wines_array[:,11].min())
print(wines_array[:,11].max())
print(wines_array[:,11].mean())
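
Once the minimum and maximum are known, one possible way to choose the bins for whole-number ratings is to give each rating its own bin. The sketch below uses a made-up `quality` array in place of `wines_array[:,11]` and draws the mean line from the computed value instead of a typed-in number.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up ratings standing in for wines_array[:,11]
quality = np.array([5, 6, 5, 7, 3, 6, 5, 8, 6, 5], dtype=float)

lowest, highest = quality.min(), quality.max()

# One bin per integer rating: edges at 2.5, 3.5, ..., 8.5
edges = np.arange(lowest - 0.5, highest + 1.0, 1.0)

counts, _, _ = plt.hist(quality, bins=edges, color="red")
plt.axvline(quality.mean(), color="g")   # vertical line at the computed mean

print(counts)   # number of ratings equal to 3, 4, 5, 6, 7, 8

# Call plt.show() here to display the figure
```

Centering the bin edges on half values keeps each integer rating in the middle of its bar.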

### Lecture Practice 

1. Create a histogram on alcohol content of wines <br> Which bin has the maximum number of wines? 

2. Create a scatter plot of pH (Y-axis) vs Quality (X-Axis). Use the scatter function. 

In [None]:
# Practice Problem 1
print(plt.hist(wines_array[:,10], color="red", histtype='step'))
plt.show()

# 9.05 to 9.7 range has the highest number of wines (515)
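
The fullest bin can also be found in code instead of reading it off the printed output. A minimal sketch, using a made-up `alcohol` array in place of `wines_array[:,10]` and np.histogram(), which returns the counts and bin edges without drawing anything:

```python
import numpy as np

# Made-up alcohol values standing in for wines_array[:,10]
alcohol = np.array([9.4, 9.8, 9.8, 9.5, 10.5, 11.2, 9.3, 12.8, 9.6, 10.1])

# Five equal-width bins between the smallest and largest value
counts, edges = np.histogram(alcohol, bins=5)

fullest = np.argmax(counts)   # index of the bin with the most values
print(counts)
print("Fullest bin: {:.2f} to {:.2f} ({} wines)".format(
    edges[fullest], edges[fullest + 1], counts[fullest]))
```

The same lookup works on the counts and edges returned by plt.hist(), which are the first two items of the tuple printed in the cell above.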

In [None]:
plt.scatter(wines_array[:,11], wines_array[:,8])
plt.show()
# The maximum pH is about 4.0, which makes sense: wines are acidic,
# so their pH values fall below the neutral value of 7