# Lesson 11
In the previous lesson, we took a first look at the Pandas module and created our first DataFrame and data visualization. In this lesson, we will go over the NumPy module and create our first linear regression.

## Installing NumPy
Before working with NumPy, we first have to install it into our environment. The way we do this is by running the following command:

`pip install numpy`

This will allow us to work with the NumPy module and perform mathematical operations on arrays more quickly and easily.

## Creating a NumPy Array
Just as we instantiated a DataFrame object in the previous lesson, we create a NumPy array by calling `np.array()`. In this case, just place a list inside the parentheses.

In [None]:
import numpy as np

one_dim_arr = np.array([1, 2, 3, 4, 5])

two_dim_arr = np.array([[1, 2, 3],[4, 5, 6], [7, 8, 9]])

print(one_dim_arr)
print(two_dim_arr)

In the example above, we have a one-dimensional array with 5 elements (which we can think of as a 1x5 matrix) and a two-dimensional array (a 3x3 matrix).
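
To check the dimensions of an array, we can inspect its `.ndim` and `.shape` attributes. The short sketch below reuses the same two arrays as the example above.

```python
import numpy as np

one_dim_arr = np.array([1, 2, 3, 4, 5])
two_dim_arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# .ndim is the number of dimensions, .shape is the size along each dimension
print(one_dim_arr.ndim, one_dim_arr.shape)
print(two_dim_arr.ndim, two_dim_arr.shape)
```

Note that the one-dimensional array reports a shape with a single entry, while the two-dimensional array reports one entry for rows and one for columns.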

## Additional Functionality
NumPy arrays offer much more functionality than a standard Python list.

### Arithmetic Operations
The code snippet below creates a one-dimensional array of 5 elements called `one_dim_arr`. We then perform scalar operations on it using the standard arithmetic operators that we are accustomed to. These are called element-wise operations: the operation is applied to every element of the array, so we can operate on the whole array in a single expression.

In [None]:
import numpy as np

one_dim_arr = np.array([1, 2, 3, 4, 5])

# Arithmetic operations are performed on each element of the array
print(one_dim_arr + 1)
print(one_dim_arr - 1)
print(one_dim_arr * 2)
print(one_dim_arr / 2)
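
Element-wise operations also work between two arrays of the same shape. The sketch below uses two small arrays, `a` and `b`, which are names chosen only for illustration, and contrasts the result with how a plain Python list treats `*`.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

# Each element of a is combined with the element of b at the same position
total = a + b
product = a * b
print(total)
print(product)

# A plain Python list behaves differently: * repeats the list instead of multiplying
repeated = [1, 2, 3] * 2
print(repeated)
```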

### Matrix Transposition
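By using the `.reshape()` method on a NumPy array, we can turn a one-dimensional array of 5 elements into a 5x1 matrix (a single column). Passing `-1` tells NumPy to infer that dimension from the length of the array. For two-dimensional arrays, the `.T` attribute performs a true transposition. Let's take a look at the example below of this functionality.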

In [None]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(arr.reshape(-1, 1))
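
Reshaping and transposing give the same result for a single row, but not for a general two-dimensional array. The sketch below compares the two on a small 2x3 matrix; the variable names are only for illustration.

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5])
column = row.reshape(-1, 1)  # -1 asks NumPy to infer the number of rows
print(row.shape, column.shape)

matrix = np.array([[1, 2, 3], [4, 5, 6]])

# .T swaps rows and columns
transposed = matrix.T
# .reshape(3, 2) keeps the elements in reading order and only changes the shape
reshaped = matrix.reshape(3, 2)

print(transposed)
print(reshaped)
```

Both results have shape (3, 2), but the elements are arranged differently, so `.reshape()` should not be treated as a general replacement for transposition.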

### Generating a Random NumPy Array
The statement below creates a NumPy array of random integers. Keep in mind that `np.random.random_integers` is deprecated: calling it produces a warning, it is no longer maintained, and newer NumPy releases may not include it at all. The second cell below shows one way to get the same kind of array without the warning.

In [None]:
import numpy as np

rand_numbers = np.random.random_integers(low=0, high=10, size=5)
print(rand_numbers)

In [None]:
import random
import numpy as np

# _ is a conventional name for a loop variable whose value we never use
rand_numbers = np.array([random.randint(0, 10) for _ in range(5)])
print(rand_numbers)
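
NumPy also provides its own random number generator through `np.random.default_rng()`. The sketch below draws the same kind of array with it; the seed value of 42 is an arbitrary choice made here so the result is repeatable, and it can be left out.

```python
import numpy as np

# Create a generator; passing a seed makes the numbers repeatable between runs
rng = np.random.default_rng(seed=42)

# endpoint=True includes the upper bound, so values range from 0 to 10
rand_numbers = rng.integers(low=0, high=10, size=5, endpoint=True)
print(rand_numbers)
```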

## Pandas Review
If Pandas is not installed in your virtual environment, remember to install it using `pip install pandas`.

Last lesson we went over creating our first DataFrame, cleaning data, and creating a bar graph to visualize some data.

We used the `pd.read_csv()` function to read a dataset into Python and create a dataframe out of it.

We used the `.drop()` method to drop unnecessary columns in our dataframe.

We used the `[]` operator to access different columns of our dataframe.

We also used the `.groupby()` method, which returns a GroupBy object, to split the data into groups and perform operations on each group, such as taking the mean.
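
As a quick refresher, the sketch below runs through those four steps on a tiny made-up dataset. The column names and values are hypothetical, and the csv text is read from memory with `io.StringIO` so that no file is needed.

```python
import io
import pandas as pd

# Hypothetical csv contents, used only for this review
csv_text = io.StringIO(
    "Name,Department,Salary,Notes\n"
    "Ana,Sales,50000,x\n"
    "Ben,Sales,60000,x\n"
    "Cara,IT,70000,x\n"
    "Dan,IT,90000,x\n"
)

# Read the csv data into a dataframe
df = pd.read_csv(csv_text)

# Drop a column we do not need
df = df.drop(columns=["Notes"])

# Access a single column with the [] operator
print(df["Salary"])

# Group the rows by department and take the mean salary of each group
mean_by_dept = df.groupby("Department")["Salary"].mean()
print(mean_by_dept)
```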

## Installing Scikit-Learn
To create linear regression models in our Python code, we will need to install the Scikit-Learn library in our environment. Run the following command in your terminal:

`pip install scikit-learn`

## Let's Create our Linear Regression Model
First we have to define our dataframe, which will contain our dependent variables and independent variables.

For this example we will simply work with random values. Afterward, we will work through a second example that imports its data from a csv file, where the data may need cleaning before we can create a regression.

### Step 1: Generating our data
For this example we will use the `random` library to create our dependent variable, which in this case is profit. Our independent variable will be the month, numbered 1-12. Let's create the dataframe from a dictionary whose keys, Month and Profit, become the column names and whose values hold the data for each column.

In [None]:
import random
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt


"""
Months is a list of months ranging from 1-12
"""
months = list(range(1, 13))

"""
Profits is a list of integers ranging from 4000-5000
to show some amount of profit a company makes within
a year
"""
profits = [random.randint(4000, 5000) for _ in range(12)]


"""
df is our DataFrame, it contains two columns:
    * Month
    * Profit
such that each row is some profit in a given month
"""
df = pd.DataFrame({
    "Month": months,
    "Profit": profits
})

"""
We will print our dataframe below to show we have a
populated dataframe
"""
print(df)

### Step 2: Calculating our Linear Regression Model
Now that we have a dataframe to work with, let's compute the linear regression for our data.

In [None]:
"""
We can use the [] operator to isolate each column into a separate variable for us to create our model.

The [] operator returns a Series. We can see this by printing the type of an isolated column
"""
print(type(df["Month"]))

"""
The LinearRegression() object uses NumPy arrays to perform operations. The way we can create NumPy arrays
from our isolated columns is by calling .values on our Series
"""
month = df["Month"].values
profit = df["Profit"].values

"""
Let's declare a variable called model, which will instantiate a LinearRegression object.
"""
model = LinearRegression()

"""
To fit the regression we use the .fit() method on our model variable. We also have to use .reshape() on our months
to turn them into a 2D NumPy array, because .fit() expects one row per sample. It will raise a ValueError otherwise.
"""
model = model.fit(month.reshape(-1, 1), profit)

"""
Let's begin preparation to visualize our data as a scatter plot and regression line. To create our scatter
plot, we will use plt.scatter() to plot the raw data.
"""
plt.scatter(month, profit, label="Raw Data", color="blue")

"""
To get our regression line, we will use the .predict() method to predict the line of best fit for our data. Once again we
will have to use the .reshape() method to perform this operation.
"""
fit_line = model.predict(month.reshape(-1, 1))

"""
Let's plot the regression line using plt.plot(). This will create a line plot of our data.
"""
plt.plot(month, fit_line, label="Regression Line", color="red")

"""
Like last time, we will add some labels for the aesthetics.
We will also include a legend for our graph.
"""
plt.title("Regression for Profits")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.legend()
plt.show()
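
Once a model is fitted, we can also read the line it found and use it to predict values outside our data. The sketch below is a minimal illustration and does not use the random profits from above: it assumes hypothetical profits that lie exactly on the line `4000 + 50 * month`, so the result is easy to verify by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

month = np.arange(1, 13)
# Hypothetical profits that lie exactly on a straight line
profit = 4000 + 50 * month

model = LinearRegression().fit(month.reshape(-1, 1), profit)

# coef_ holds one slope per input column, intercept_ is the value at month 0
slope = model.coef_[0]
intercept = model.intercept_
print(slope, intercept)

# Predict the profit for a 13th month; the input must also be 2D
next_month = model.predict(np.array([[13]]))
print(next_month[0])
```

With the random profits used in this lesson, the slope and intercept will change on every run, since the data itself changes.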

Let's put it all together, this time reading our data from a csv file instead of generating it.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

df = pd.read_csv("sales.csv")

# df has columns Month and Sales, separate them into their own np.array() objects
months = df["Month"].values
sales = df["Sales"].values

# We can chain the method calls: .fit() returns the fitted model, and .predict() returns the regression line
fit_line = LinearRegression().fit(months.reshape(-1, 1), sales).predict(months.reshape(-1, 1))

# Add the scatter plot of raw data and line plot of regression line to the screen plus labels and legend
plt.scatter(months, sales, label="Raw Data", color="red")
plt.plot(months, fit_line, label="Linear Regression", color="green")
plt.legend()
plt.xlabel("Month")
plt.ylabel("Sales")
plt.show()
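
If the csv file contains missing values, the fit will fail until they are handled. The sketch below shows one possible cleaning step, dropping the incomplete rows with `.dropna()`. The csv contents are hypothetical and are read from memory with `io.StringIO`, since the contents of `sales.csv` are not shown here.

```python
import io
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical csv contents; the second row is missing its Sales value
csv_text = io.StringIO("Month,Sales\n1,100\n2,\n3,300\n4,400\n")
df = pd.read_csv(csv_text)

# Drop rows where Sales is missing before fitting
clean_df = df.dropna(subset=["Sales"])
print(len(df), len(clean_df))

months = clean_df["Month"].to_numpy().reshape(-1, 1)
sales = clean_df["Sales"].to_numpy()

# Keep the model and its predictions in separate variables
model = LinearRegression().fit(months, sales)
fit_line = model.predict(months)
print(model.coef_[0], model.intercept_)
```

Dropping rows is only one option. Whether it is appropriate depends on how much data is missing and why.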