# Exercise goals
In this exercise, we will use Python to build scatter plots of data. A scatter plot shows the relationship between two numeric variables by drawing each observation as a point (x, y).

In the demo, we will use random data for the scatter plot.

In the following exercise, we will use pickup location data from `train.csv`.

# Scatter plot demo
Run the following cell to see a scatter plot demo.  

The code will:
- Generate 100 random points, each with an x and y coordinate
- Classify each point as above `y=0` or below
- Display, as a table, the two rows where the label switches from `+` to `-`

In [None]:
# Scatter Plot Demo
# We continue to import our libraries to start
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns; sns.set()
# bokeh is another plotting library. It is written in Python and renders
# interactive plots in the browser through its JavaScript component, BokehJS.
# It improves aesthetics and simplicity, and allows us to plot large
# amounts of data
from bokeh.plotting import figure, output_notebook, show; output_notebook()

# we create an easy way to return an array of n random numbers
def rup(n):
    # returns n random numbers drawn uniformly from [-1, 1)
    return np.random.uniform(low=-1, high=1, size=n)
# we initialize 100 random points in the [-1, 1) x [-1, 1) square
data = pd.DataFrame({"X":rup(100), "Y":rup(100)})
# this list will hold our labels for points above/below y=0
label=[]
# we iterate over the y-values
for yval in data["Y"]:
    if yval > 0:           # the pt. is above y=0 if its y-value > 0
        label.append('+')  # denote pts. above y=0 as '+'
    else:
        label.append('-')  # denote pts. on or below y=0 as '-'
# add a column to our data with labels
data['Label'] = label
# the '+' labels will be grouped, followed by '-'
data.sort_values(by=["Label"], inplace=True)
# this resets the index
# from disordered (34, 3, 8) to ordered (0, 1, 2)
data.reset_index(drop=True, inplace=True)

# separate '+' from '-' entries
# ind will hold the index of the first '-' row
ind = 0
# enumerate(['a', 'b', 'c']) --> (0, 'a'), (1, 'b'), (2, 'c')
for i, val in enumerate(data['Label']):
    # at what index do we change from '+' to '-'
    if val == '-':
        ind = i
        # exit the for loop once you've found the transition
        # break exits its innermost for loop
        # so it doesn't matter that it's within an if here
        break

# see the point in data where labels switch
# (use ind, the saved transition index, rather than the loop variable)
data.iloc[ind-1:ind+1, :]
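
The labeling loop above can also be written with vectorized operations. The following is a minimal sketch rather than the notebook's own method: it assumes a seeded NumPy generator (the seed value is arbitrary) so that the result is repeatable, and it finds the transition index by counting the `+` rows instead of looping.

```python
import numpy as np
import pandas as pd

# seeded generator so the sketch is repeatable (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
data = pd.DataFrame({"X": rng.uniform(-1, 1, size=100),
                     "Y": rng.uniform(-1, 1, size=100)})

# label every row at once: '+' where Y > 0, '-' otherwise
data["Label"] = np.where(data["Y"] > 0, "+", "-")

# group the '+' rows first, then renumber the index 0, 1, 2, ...
data = data.sort_values(by="Label").reset_index(drop=True)

# the number of '+' rows is also the index of the first '-' row
ind = int((data["Label"] == "+").sum())

# the two rows on either side of the transition
print(data.iloc[ind - 1:ind + 1])
```

Counting the `+` rows works here because the sort places every `+` row before every `-` row.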

# Code documentation
Run the following cell to see the documentation for Bokeh's `figure`.

In [None]:
from bokeh.plotting import figure
?figure
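
The `?name` syntax is an IPython and Jupyter feature and does not work in a plain Python script. As a rough sketch of the equivalent in ordinary Python, the built-in `help()` and the standard `inspect` module read the same docstrings. The example below uses `np.random.uniform`, the function the demo uses to draw its points, as the target; the same calls should work for `figure` once Bokeh is imported.

```python
import inspect
import numpy as np

# np.random.uniform is the function the demo uses to draw its points
target = np.random.uniform

# getdoc returns the cleaned-up docstring as a single string
doc = inspect.getdoc(target)

# print only the first line as a quick summary
summary = doc.splitlines()[0]
print(summary)

# help(target) would print the full documentation,
# much as ?target does inside a notebook
```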

# Generate the scatter plot
Run the following cell to generate a scatter plot based on the random data you just generated.

In [None]:
# pixel dimensions of the resulting image
plot_width, plot_height = 500, 500

# these tools will allow you to move around within the plot, zoom in, or reset
# to the original image
tools = 'pan, wheel_zoom, reset'
# note: Bokeh 3.0 and later name these arguments width and height
# instead of plot_width and plot_height
p = figure(title='100 Random Points',
           tools=tools, plot_width=plot_width, plot_height=plot_height,
           x_range=(-1, 1), y_range=(-1, 1))


options1 = dict(line_color=None, fill_color='blue', size=5)
# this plots the '+' points (rows before the transition) as blue circles
# **options1 unpacks the dict into keyword arguments
p.circle(x=data.iloc[:ind, 0], y=data.iloc[:ind, 1], **options1)

options2 = dict(line_color=None, fill_color='red', size=5)
# this plots the '-' points (rows from the transition on) as red squares
# **options2 unpacks the dict into keyword arguments
p.square(x=data.iloc[ind:, 0], y=data.iloc[ind:, 1], **options2)

# creates a NumPy array of 41 evenly spaced values from -1 to 1, inclusive
t = np.linspace(-1, 1, 41)

# this plots y=0 at each point defined in t (-1, -.95, -.90, ..., 1)
p.line(t, np.zeros(len(t)))

# this displays the plot
show(p)
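
The plotting cell splits the rows by position, which only works because the data was sorted by label first. A boolean mask is one alternative that does not depend on row order. The sketch below is an illustration under assumptions: it builds its own random data with an arbitrary seed, and the names `is_above`, `above`, and `below` are introduced here and are not part of the notebook.

```python
import numpy as np
import pandas as pd

# seeded generator so the sketch is repeatable (the seed is an arbitrary choice)
rng = np.random.default_rng(1)
data = pd.DataFrame({"X": rng.uniform(-1, 1, size=100),
                     "Y": rng.uniform(-1, 1, size=100)})

# True for points above y=0, False otherwise
is_above = data["Y"] > 0

above = data[is_above]    # rows that would be drawn as blue circles
below = data[~is_above]   # rows that would be drawn as red squares

print(len(above), "points above y=0;", len(below), "points on or below")

# these columns could then be passed to the plotting calls, for example
# p.circle(x=above["X"], y=above["Y"], **options1)
```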

# Scatter plot exercise
In this exercise, we will use the data in `train.csv` to create a scatter plot of pickup locations. We will add pickup points to a graph, with the longitude as `x` and latitude as `y`.

## Load the data

As the first step, run the cell below to load the data from `train.csv` into the variable `df`. The code then prints a summary of the longitude and latitude data.

In [None]:
# Scatter Plot Exercise
# we continue with importing libraries to start
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns; sns.set()
# output_notebook() is similar to %matplotlib inline
# it embeds the plot within the notebook
from bokeh.plotting import figure, output_notebook, show; output_notebook()

df = pd.read_csv('train.csv')
# let's get a better look at each variable
print(df['pickup_longitude'].describe(),'\n')
print(df['pickup_latitude'].describe())
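
Since only the two pickup columns are plotted, one option for a large file is to load just those columns. The sketch below shows the idea with the `usecols` parameter of `pd.read_csv`. To stay self-contained it reads a tiny in-memory CSV instead of `train.csv`: the three rows and the `id` and `passenger_count` columns are made up for illustration, and only the two pickup column names come from the exercise.

```python
import io
import pandas as pd

# hypothetical stand-in for train.csv: three made-up rows
csv_text = """id,pickup_longitude,pickup_latitude,passenger_count
a,-73.98,40.75,1
b,-73.95,40.78,2
c,-74.00,40.72,1
"""

cols = ["pickup_longitude", "pickup_latitude"]
# usecols tells read_csv to keep only the listed columns
df_small = pd.read_csv(io.StringIO(csv_text), usecols=cols)

# describe() on the whole frame summarizes both columns side by side
print(df_small.describe())
```

With the real file, the same call would take the path `'train.csv'` in place of the `io.StringIO` object.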

# Code documentation
Run the cell below to see documentation for the `circle()` method.

In [None]:
?p.circle

# Create the scatter plot
Run the cell below to create a scatter plot of a sample of the pickup locations in the data.

First, you must complete the following lines:
- Add parameters to `p.circle()`.
- Add values for `p.xaxis.axis_label` and `p.yaxis.axis_label`. Remember to put the values in quotation marks.

**Hint** For the `circle` parameters, use the pickup longitude and latitude of the data sample (`sample['pickup_longitude']`, `sample['pickup_latitude']`), and pass the styling dictionary with `**options`.

Try it first. You can check your code against the answer in the cell that follows.

In [None]:
# pixel dimensions of the resulting image
plot_width, plot_height = 500, 500
p = figure(tools='pan,wheel_zoom,reset', plot_width=plot_width, plot_height=plot_height)

# turn off gridlines
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None

# There are more than one million rows in our data.
# We plot a random sample because it is much easier and faster for your
# computer to display, without losing much meaning from the data.
# use a sample whenever you plot a large dataset
sample = df.sample(n=10000)

options = dict(line_color=None, fill_color='blue', size=5)
#
#---------------Enter your code here------------------------#
# plots each point as a small circle
p.circle()
# label the axes
p.xaxis.axis_label = 
p.yaxis.axis_label = 
show(p)
#-----------------------------------------------------------#


# Answer code
We used the following code in the cell above:

```python
#---------------Enter your code here------------------------#
# plots each point as a small circle
p.circle(x=sample['pickup_longitude'], y=sample['pickup_latitude'], **options)
# label the axes
p.xaxis.axis_label="Longitude"
p.yaxis.axis_label="Latitude"
show(p)
#-----------------------------------------------------------#
```
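
Each call to `df.sample(n=10000)` draws different rows, so the plot changes from run to run. If a repeatable plot is wanted, one option is the `random_state` parameter of `DataFrame.sample`. The sketch below illustrates this on synthetic data: `df_demo`, its coordinate values, and the seed values are assumptions made for the example and do not come from `train.csv`.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for df: 50,000 made-up coordinate pairs
rng = np.random.default_rng(42)
df_demo = pd.DataFrame({
    "pickup_longitude": rng.normal(-73.97, 0.04, size=50_000),
    "pickup_latitude": rng.normal(40.75, 0.03, size=50_000),
})

# the same random_state always selects the same rows
sample_a = df_demo.sample(n=10_000, random_state=0)
sample_b = df_demo.sample(n=10_000, random_state=0)

# compare the row labels chosen by the two calls
print(sample_a.index.equals(sample_b.index))
```

Fixing the seed also means a range chosen by eye from one plot still applies when the sample is drawn again.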

# Zoom in on the scatter plot
Now we can zoom in on the data to get a better view. This is especially useful if your sample contains an outlier, which stretches the axes and squeezes most of the points into a small area.

Looking at the previous plot, choose an x range and a y range to zoom in on.

**Hint** For each of the `x` and `y` variables, enter two numbers separated by a comma, such as `(1, 2)`. Don't forget that the range might include negative numbers.

Try it first. You can check your code against the answer in the cell that follows.

In [None]:

#----------------Enter your code here-------------#
x=()
y=()
#-------------------------------------------------#
c = figure(tools=tools, plot_width=plot_width, plot_height=plot_height,
           x_range=x, y_range=y)

sample = df.sample(n=10000)

options = dict(line_color=None, fill_color='blue', size=5)
# plots each point as a small circle
c.circle(x=sample['pickup_longitude'], y=sample['pickup_latitude'], **options)
c.xaxis.axis_label="Longitude"
c.yaxis.axis_label="Latitude"
show(c)

# Answer code
We used the following code in the cell above:

```python
#----------------Enter your code here-------------#
x=(-74.2,-73.7)
y=(40.6, 40.9)
#-------------------------------------------------#
```
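
As an alternative to reading the ranges off the plot, they can be estimated from the data. The sketch below is one possible approach, not the exercise's method: it takes the 1st and 99th percentiles of each coordinate so that a few extreme points do not set the axis limits. The helper `central_range`, the percentile cutoffs, and the synthetic `sample_demo` data (including its single planted outlier) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the sample: 10,000 made-up coordinate pairs
rng = np.random.default_rng(7)
sample_demo = pd.DataFrame({
    "pickup_longitude": rng.normal(-73.97, 0.04, size=10_000),
    "pickup_latitude": rng.normal(40.75, 0.03, size=10_000),
})
# plant one far-away outlier in the first row
sample_demo.loc[0, ["pickup_longitude", "pickup_latitude"]] = [-120.0, 35.0]


def central_range(values, lower=0.01, upper=0.99):
    """Hypothetical helper: (low, high) bounds covering the central share of values."""
    low, high = values.quantile([lower, upper])
    return (float(low), float(high))


x = central_range(sample_demo["pickup_longitude"])
y = central_range(sample_demo["pickup_latitude"])
print("x range:", x)
print("y range:", y)
```

The resulting tuples have the same shape as the hand-entered `x` and `y` values, so they could be passed to `x_range` and `y_range` in the same way.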