This notebook has been adapted, with permission, from:

https://github.com/callysto/basketball-and-data-science/blob/main/content/02-visualizing-data.ipynb

(Open in 
[Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Analysis&branch=main&subPath=BADS/02-visualize/02-02-scatter-plots.ipynb&depth=1) | [Colab](https://githubtocolab.com/pbeens/Data-Analysis/blob/main/BADS/02-visualize/02-02-scatter-plots.ipynb))

# Setup

Let's first take care of importing the libraries we're going to need:

In [None]:
import pandas as pd
import plotly.express as px

For convenience, we're going to create a Python function that will get the data and load it into a dataframe for us. Wrapping these steps in a function means we can reload the data with a single call instead of repeating the same lines.

Re-run this function anytime you need to re-initialize the dataframe. To run it, simply enter this command:

`get_data_and_create_dataframe()`

We're going to continue working with Pascal Siakam's data.

In [None]:
# Define a function that downloads the data and stores it in a Pandas DataFrame
def get_data_and_create_dataframe():
    # Define the URL where the data is located
    url = 'https://raw.githubusercontent.com/callysto/basketball-and-data-science/main/content/data/nba-players/Pascal_Siakam.csv'

    # Declare df as global so it can be used outside this function
    global df
    # Use Pandas to read the data from the URL into the DataFrame
    df = pd.read_csv(url)
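
As a side note, a function can also hand the dataframe back with `return` instead of storing it in a global variable. The sketch below is only an illustration: the name `load_dataframe` and the tiny made-up CSV text are assumptions, not part of the original notebook. `pd.read_csv()` accepts a URL, a file path, or a file-like object, so the same function should also work with the Siakam URL.

```python
import io

import pandas as pd


def load_dataframe(source):
    """Read CSV data from a URL, file path, or file-like object and return a new DataFrame."""
    return pd.read_csv(source)


# Made-up sample rows (NOT real statistics), used so the sketch runs without a network connection
sample_csv = io.StringIO(
    "Season,FGA,FG\n"
    "2020-21,10.0,5.0\n"
    "2021-22,12.0,6.0\n"
)

sample_df = load_dataframe(sample_csv)
print(sample_df.shape)
```

With this style you would write `df = load_dataframe(url)` whenever you want a fresh copy of the data.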


# Deleting Rows

Before we play further with this data set, let's get rid of the "Careers" row we saw earlier.

Here's what the dataframe looks like now:

In [None]:
get_data_and_create_dataframe()

display(df['Season'])

We can see that it is the row with index 7. Recall that to delete a row we use the dataframe's `drop()` method, like this:

In [None]:
get_data_and_create_dataframe()

df.drop(7, inplace=True)

display(df['Season'])

That's better!
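
Hard-coding the index works here, but it breaks if the row order ever changes. As an alternative, here is a minimal sketch that finds the summary row by its label instead. The sample values and the label `'Career'` are made up for illustration; check the `Season` column of the real data for the exact text.

```python
import pandas as pd

# Made-up sample data (NOT real statistics); the last row imitates a career summary row
sample_df = pd.DataFrame({
    'Season': ['2020-21', '2021-22', 'Career'],
    'FG': [5.0, 6.0, 5.5],
})

# Look up the index label(s) of the summary row instead of hard-coding a number
summary_rows = sample_df[sample_df['Season'] == 'Career'].index

# Without inplace=True, drop() returns a new DataFrame and leaves the original unchanged
cleaned_df = sample_df.drop(summary_rows)

print(cleaned_df['Season'].tolist())
```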

# Scatter Plots

Notice we are now using `px.scatter()` instead of `px.bar()`.

In [None]:
fig = px.scatter(df, 
                 x='FGA', 
                 y='FG', 
                 title='Siakam Field Goals versus Field Goal Attempts')

fig.show()

Let's update our x- and y-axis labels to make our graph more understandable:

In [None]:
fig.update_xaxes(title='Field Goal Attempts')

fig.update_yaxes(title='Field Goals')

fig.show()

# Method Chaining

As an alternative to updating the x- and y-axis titles in separate statements as we did above, you can combine the calls using a technique called **method chaining**. It is a way of calling multiple methods on the same object in a single line of code. In this specific example, we can use method chaining like this:

`fig.update_xaxes(title='Field Goal Attempts').update_yaxes(title='Field Goals')`

The `update_xaxes()` and `update_yaxes()` methods are being called on the same `fig` object in a single line of code. This works because each of these methods returns the figure itself.

Which technique you use is entirely up to you.
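
To see why chaining works, here is a small sketch using a made-up class (`Chart` is hypothetical and not part of Plotly). Each method returns `self`, so the next method call acts on the same object.

```python
class Chart:
    """A toy stand-in for a figure, used only to illustrate method chaining."""

    def __init__(self):
        self.x_title = None
        self.y_title = None

    def update_x(self, title):
        self.x_title = title
        return self  # returning the object itself is what makes chaining possible

    def update_y(self, title):
        self.y_title = title
        return self


chart = Chart().update_x('Field Goal Attempts').update_y('Field Goals')
print(chart.x_title, '|', chart.y_title)
```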

# Trends Analysis

What conclusions can you draw from the graph above? 

To help us with that, let's add a line of best fit, also called a trendline. A common way to calculate a trendline is the ordinary least squares (OLS) method, which we will use here. (Plotly Express needs the `statsmodels` package to be installed for this.)

In [None]:
fig = px.scatter(df,
                 x='FGA',
                 y='FG',
                 title='Siakam Field Goals versus Field Goal Attempts',
                 trendline='ols')

fig.update_xaxes(title='Field Goal Attempts')

fig.update_yaxes(title='Field Goals')

fig.show()


That makes it easier to draw a conclusion, doesn't it?
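
If you are curious what ordinary least squares computes, the sketch below fits a straight line to a few made-up points with NumPy's `polyfit()`. This is an illustration only, not a description of how Plotly draws its trendline internally, and the numbers are not Siakam's real statistics.

```python
import numpy as np

# Made-up sample points (NOT real statistics)
attempts = np.array([5.0, 10.0, 15.0, 20.0])
made = np.array([2.5, 4.5, 7.5, 9.5])

# A degree-1 polynomial fit is a least-squares straight line:
# made is approximately slope * attempts + intercept
slope, intercept = np.polyfit(attempts, made, 1)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The slope estimates how many additional field goals go with each additional attempt.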

# Colour-coding the Data

To make the graph more interesting, we can also colour-code the data points by season with one small addition to the `px.scatter()` function:

`color='Season'`

In [None]:
fig = px.scatter(df,
                 x='FGA',
                 y='FG',
                 title='Siakam Field Goals versus Field Goal Attempts',
                 color='Season') # <<<<<<<< NEW LINE

fig.update_xaxes(title='Field Goal Attempts').update_yaxes(title='Field Goals')

fig.show()
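
Plotly Express picks the colouring style from the column's data type: text columns such as `Season` get one distinct colour per value, while numeric columns get a continuous colour scale. The sketch below, which uses made-up sample values, checks with pandas which kind a column is before you plot it.

```python
import pandas as pd

# Made-up sample data (NOT real statistics)
sample_df = pd.DataFrame({
    'Season': ['2020-21', '2021-22'],
    'Age': [26, 27],
})

for column in ['Season', 'Age']:
    # Numeric columns lead to a continuous colour scale; other columns get discrete colours
    is_numeric = pd.api.types.is_numeric_dtype(sample_df[column])
    kind = 'continuous' if is_numeric else 'discrete'
    print(f"{column}: {kind} colours")
```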


# Changing the Size of the Data Points

To make it even more interesting, we can scale the data points so that their size is proportional to one of the fields (columns) of data. Again, all it takes is a small change to the `px.scatter()` function:

`size='FG'`

In [None]:
fig = px.scatter(df,
                 x='FGA',
                 y='FG',
                 title='Siakam Field Goals versus Field Goal Attempts',
                 color='Season',
                 size='FG') # <<<<<<<< NEW LINE

fig.update_xaxes(title='Field Goal Attempts').update_yaxes(title='Field Goals')

fig.show()

Cool, eh!

# Exercise

Create a scatter plot with assists per game (`'AST'`) on the x-axis, points per game (`'PTS'`) on the y-axis, and `color='Age'`. Include a trendline.

What do you observe about the relationship between these columns?

In [None]:
import pandas as pd
import plotly.express as px

url = 'https://raw.githubusercontent.com/callysto/basketball-and-data-science/main/content/data/nba-players/Pascal_Siakam.csv'
df = pd.read_csv(url)

# Enter the rest of the program below.
