# linear regression in Python

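In this notebook, we run Python code cells to generate linearly distributed data, plot it, and explore how random data generation works.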

## let's get started

This is a "text" cell. It is used to provide instructions and information about what the code cells in the notebook are doing.

You can edit this text cell by double-clicking anywhere in its contents.

If you double-click this cell, you can add, delete or modify its contents using a simple text editor that includes things like:

**bold** and *italicized* text, but also:

numbered lists
1.   List item
2.   List item

and un-numbered lists
*   List item
*   List item

You can also insert [hyperlinks](https://ai.ufl.edu).

Try double-clicking on this text cell and editing its contents. You'll see that a simple 'markdown language' controls the text formatting. For example, you can make text **bold** by enclosing it in double asterisks "**", and you can create a title by prefacing the title text with a hash symbol "#". You can also control text formatting using the icons that appear at the top of the text cell when you double-click it to start editing.

## what about code cells?

Text cells are interesting and all, but they provide little more than what you'd get from a simple web page.

The real 'magic' of Jupyter notebooks comes from the "code" cells, which allow you to write, edit and execute Python code in real time.

We saw a very simple Python code cell in the first Jupyter notebook. It is repeated below this text cell. Clicking the run button on the left-hand side of the code cell executes the code. If the code produces any output (not all code does), the output is displayed below the code cell when the code finishes running.

Go ahead and run the following 'hello world' code cell. You'll likely get a warning message stating that this notebook was not created by Google; that's fine, run it anyway!

You should see the "hello world!" output. You might also notice that the upper-right corner of the Google Colab window has changed! By running a code cell, you have also connected to an online runtime environment: a computer on the internet that is executing your code for you. Now you should see your notebook's RAM and Disk usage displayed in the upper-right corner.

In [None]:
print('hello world!')

## plotting linear data

Printing "hello world" can get pretty boring.

The next code cell gets more interesting. Here we generate some linearly distributed data points and plot them as a scatterplot.

We will be doing this type of thing a lot in this course. Don't worry if the code doesn't make any sense to you right now; we'll learn how to do this type of thing in detail later on.

For now, just run the following code cell to see how you can generate and graph data in real time using Jupyter notebooks.

In [None]:
import sklearn.datasets
import matplotlib.pyplot as plt

x,y = sklearn.datasets.make_regression(n_samples=100, n_features=1,
                                       n_informative=1, noise=10, 
                                       coef=False)

plt.scatter(x,y, marker='o')

After running the above code cell, you should see a 2-dimensional scatter plot with a bunch of blue dots that fall approximately along a diagonal line. Not too bad for essentially 2 lines of code!

If you are curious about what this code cell is doing...

The first 2 lines start with the word "import"; these lines are loading Python libraries, so you can use the code in those libraries. The "sklearn.datasets" library has a function called "make_regression" that creates random data for you.

The "matplotlib.pyplot" library has a "scatter" function that produces a scatterplot.

If you want to see line numbers in your code cells (hint: you probably do!), click on the gear-shaped icon in the upper-right corner of Google Colab, right next to your user icon. Select the "Editor" tab, click "Show line numbers", and then "Save". You should now see line numbers in the code cells!

Lines 4-6 in the above code cell are all part of a single function call to "make_regression"; the line breaks are for human readability only. This function call generates the 'random' x,y points and stores them in the "x" and "y" variables, respectively.

Line 8 is the function call that produces the scatterplot from these points.
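If you want to peek at what "make_regression" actually returns, a minimal sketch like the following prints the shapes of the two arrays. It reuses the same function call as the cell above; the only additions are the two print statements, which are for illustration only.

```python
import sklearn.datasets

# Same call as in the cell above; no random_state is set,
# so the actual values differ from run to run.
x, y = sklearn.datasets.make_regression(n_samples=100, n_features=1,
                                        n_informative=1, noise=10,
                                        coef=False)

# x is a 2-D array: one row per sample, one column per feature.
print(x.shape)  # (100, 1)

# y is a 1-D array: one target value per sample.
print(y.shape)  # (100,)
```

Because "x" has one row per sample and one column per feature, a single x value is read with two indices, such as x[0][0], while a single y value needs only one, such as y[0].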

Again, don't worry about digesting all the Python code yet; we'll get to that later on.

## editing code cells

So, now you can execute Python code cells from within your Jupyter notebook and generate results in real time, which is cool. But what's cooler is that you can also edit the code and see the new results right away.

Try this with the code cell below, which is just a replica of the linear scatter plot. Try running it once, just to see that it works.

Notice that the points aren't exactly the same? That's because the "make_regression" function generates 'random' data, which is (potentially) different every time! In fact, you can keep running the code cell over and over, to produce different data.

### let's change how the data are displayed

Edit line 8, below, changing the

    marker='o'

to

    marker='*'

and re-run the code cell. Now you have a different type of marker for your data plot.

In [None]:
import sklearn.datasets
import matplotlib.pyplot as plt

x,y = sklearn.datasets.make_regression(n_samples=100, n_features=1,
                                       n_informative=1, noise=10, 
                                       coef=False)

plt.scatter(x,y, marker='o')
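Matplotlib accepts other marker codes besides 'o' and '*'. As a rough sketch, the cell below draws the same data three times, side by side, using the circle, star and square markers. The subplot layout, figure size and titles are illustrative choices, not part of the original exercise.

```python
import sklearn.datasets
import matplotlib.pyplot as plt

x, y = sklearn.datasets.make_regression(n_samples=100, n_features=1,
                                        n_informative=1, noise=10,
                                        coef=False)

# Marker codes to compare: circle, star, square.
markers = ['o', '*', 's']

# One panel per marker, sharing the y-axis so the panels are comparable.
fig, axes = plt.subplots(1, len(markers), figsize=(12, 3), sharey=True)
for ax, m in zip(axes, markers):
    ax.scatter(x, y, marker=m)
    ax.set_title("marker='{}'".format(m))
```

All three panels show the same points; only the marker shape changes.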

## playing with random data generation

Remember how we were saying that the "make_regression" function generates 'random' data that is not the same each time? Let's confirm this.

The code cell below uses the same "make_regression" function call as before, but instead of plotting the resulting points, it just prints the x,y coordinates of the first generated point.

Try running this code multiple times, to see how the x,y location of the first data point changes each time.

Sometimes we want our random data generator to be able to generate the same data more than once. To do this, we can set the "random_state" of the "make_regression" function call to a specific numerical value.

As an example, on line 6 of the code cell below, change the
  
    coef=False)

line to

    coef=False, random_state=2021)

This adds a "random_state" value of 2021 to the function call.

Try running the code cell multiple times.

You should see the same x,y coordinates being printed every time!

Keep a note of these x,y values; you will likely need them to answer a quiz question :)

In [None]:
import sklearn.datasets
import matplotlib.pyplot as plt

x,y = sklearn.datasets.make_regression(n_samples=100, n_features=1,
                                       n_informative=1, noise=10, 
                                       coef=False)

print('x,y = {:.4f},{:.4f}'.format(x[0][0], y[0]))
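The effect of "random_state" can also be checked in code rather than by eye. The sketch below generates the data three times and compares the resulting arrays. The "make_data" helper and its "seed" argument are hypothetical names introduced only for this example; they simply wrap the same "make_regression" call used above.

```python
import numpy as np
import sklearn.datasets

def make_data(seed=None):
    # Hypothetical helper: wraps the make_regression call from this notebook.
    # seed=None leaves random_state unset, so the data vary between calls.
    return sklearn.datasets.make_regression(n_samples=100, n_features=1,
                                            n_informative=1, noise=10,
                                            coef=False, random_state=seed)

x1, y1 = make_data(seed=2021)
x2, y2 = make_data(seed=2021)
x3, y3 = make_data()  # no fixed random_state

# Same random_state: the two data sets are identical.
print(np.array_equal(x1, x2) and np.array_equal(y1, y2))  # True

# No random_state: the data should differ from the seeded run.
print(np.array_equal(x1, x3))
```

The sketch deliberately prints only the comparison results, not the coordinates themselves.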

After you take the quiz for this notebook, make sure you terminate your notebook's runtime session by clicking on the downward-facing arrow in the upper-right corner of Google Colab, next to the RAM and Disk usage icon. Select "Manage sessions" and TERMINATE any sessions you have running.