# Creating Scatter Plots

(Open in 
[Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Dunkers&branch=main&subPath=Demos/scatter-plots.ipynb&depth=1) | [Colab](https://githubtocolab.com/pbeens/Data-Dunkers/blob/main/Demos/scatter-plots.ipynb))

# Lesson Objectives

By the end of this lesson, students will be able to:
- Create scatter plots using Plotly Express to analyze relationships between two variables.
- Utilize filtering to clean data and prepare it for visualization.
- Customize scatter plot aesthetics by updating axis titles and employing method chaining to streamline code.
- Interpret trends in data by incorporating trend lines using ordinary least squares (OLS) methods.
- Apply color-coding to scatter plots to differentiate data points by categories such as season or age, enhancing the analysis.
- Adjust the size of data points in scatter plots based on quantitative variables to reflect the magnitude of data points visually.
- Analyze the relationship between various basketball statistics, such as assists and points per game, to understand player performance dynamics.
- Develop a comprehensive understanding of data visualization techniques that allow for effective communication of complex data insights.

# Getting and Cleaning Our Data

We're going to continue with the same importing and processing of the Pascal Siakam data, just like before:

In [None]:
import pandas as pd
import plotly.express as px

# URL of the data source
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'

# Read the data into a pandas dataframe
df = pd.read_csv(url)

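# Keep only seasons up to and including 2022-23
# (SEASON_ID strings like '2021-22' sort in chronological order)
# Named season_filter so it doesn't hide Python's built-in filter()
season_filter = df['SEASON_ID'] <= '2022-23'
df = df[season_filter]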

# Display the dataframe
display(df)
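
The filter works because `SEASON_ID` values are text in the form `'YYYY-YY'`, and comparing them as strings (character by character) puts them in chronological order. Here is a minimal sketch using a small made-up DataFrame. The values are hypothetical, not Siakam's real stats:

```python
import pandas as pd

# Hypothetical season labels and values, for illustration only
sample = pd.DataFrame({
    'SEASON_ID': ['2020-21', '2021-22', '2022-23', '2023-24'],
    'FGM': [100, 120, 140, 160],
})

# String comparison: '2023-24' comes after '2022-23', so it fails the test
season_filter = sample['SEASON_ID'] <= '2022-23'
print(season_filter.tolist())  # [True, True, True, False]

# Keep only the rows where the filter is True
sample = sample[season_filter]
print(sample['SEASON_ID'].tolist())  # ['2020-21', '2021-22', '2022-23']
```

This trick relies on every label having the same fixed-width format. If the labels were written differently (for example `'2022-2023'`), the string comparison would no longer match the calendar order.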

# Scatter Plots

Previously we used `px.bar()`. We now use `px.scatter()` for the scatter plot:

In [None]:
# Create a scatter plot using plotly express
fig = px.scatter(df, 
  x='FGA', 
  y='FGM', 
  title='Siakam Field Goals Made versus Field Goal Attempts')

fig.show()

Let's update our x- and y-axis labels to make our graph more understandable:

In [None]:
fig.update_xaxes(title='Field Goal Attempts')

fig.update_yaxes(title='Field Goals Made')

fig.show()

# Method Chaining

As an alternative to updating the x- and y-axis titles on separate lines (as above), you can combine the calls using a technique called **method chaining**, which calls several methods one after another in a single line of code. In this example, method chaining looks like this:

`fig.update_xaxes(title='Field Goal Attempts').update_yaxes(title='Field Goals Made')`

Both `update_xaxes()` and `update_yaxes()` are called on the same `fig` object. This works because `update_xaxes()` returns the figure it was called on, so `update_yaxes()` can be called directly on that result.

Which technique you use is entirely your choice.
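Method chaining only works when each method returns an object that has the next method, usually the same object it was called on. This toy sketch in plain Python shows the idea. It uses a hypothetical `Chart` class, which is not part of Plotly:

```python
class Chart:
    """A hypothetical, minimal stand-in for a figure (not Plotly's class)."""

    def __init__(self):
        self.x_title = None
        self.y_title = None

    def set_x_title(self, title):
        self.x_title = title
        return self  # returning self is what makes chaining possible

    def set_y_title(self, title):
        self.y_title = title
        return self


# Two separate calls...
chart = Chart()
chart.set_x_title('Field Goal Attempts')
chart.set_y_title('Field Goals Made')

# ...or the same thing written as one chained line
chained = Chart().set_x_title('Field Goal Attempts').set_y_title('Field Goals Made')

print(chained.x_title, '|', chained.y_title)  # Field Goal Attempts | Field Goals Made
```

If `set_x_title()` returned nothing (`None`), the chained line would fail with an `AttributeError`, because `None` has no `set_y_title()` method.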

In [None]:
fig = px.scatter(df, 
  x='FGA', 
  y='FGM', 
  title='Siakam Field Goals Made versus Field Goal Attempts')

# Method chaining example
fig.update_xaxes(title='Field Goal Attempts').update_yaxes(title='Field Goals Made')

fig.show()

You can even combine them like this, but this makes for a very long line:

In [None]:
fig = px.scatter(df, x='FGA', y='FGM', title='Siakam Field Goals Made versus Field Goal Attempts').update_xaxes(title='Field Goal Attempts').update_yaxes(title='Field Goals Made')

fig.show()

A more readable way of doing the same thing is to break the chain across several lines, putting a backslash (`\`) at the end of each line that continues onto the next:

In [None]:
fig = px.scatter(df, x='FGA', y='FGM', title='Siakam Field Goals Made versus Field Goal Attempts') \
    .update_xaxes(title='Field Goal Attempts') \
    .update_yaxes(title='Field Goals Made')

fig.show()

Again, which technique you use is entirely your choice.

# Trend Analysis

What conclusions can you draw from the graph above? 

To help answer that, let's add a line of best fit, called a **trendline**. A common way to calculate a trendline is the ordinary least squares (OLS) method, which we select with `trendline='ols'`. (Plotly uses the `statsmodels` package for this calculation, so it needs to be installed.)

In [None]:
fig = px.scatter(df, 
    x='FGA', 
    y='FGM', 
    title='Siakam Field Goals Made versus Field Goal Attempts', 
    trendline='ols')

fig.update_xaxes(title='Field Goal Attempts')

fig.update_yaxes(title='Field Goals Made')

fig.show()


That makes it easier to draw a conclusion, doesn't it?
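
Ordinary least squares finds the straight line $y = mx + b$ that makes the sum of the squared vertical distances from the points to the line as small as possible. When there is only one x-variable, the slope and intercept have a simple formula: $m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b = \bar{y} - m\bar{x}$. Below is a minimal plain-Python sketch of that formula using made-up numbers (hypothetical, not Siakam's stats). Plotly gets its trendline from `statsmodels` instead, but for a simple line like this the maths is the same:

```python
def ols_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: how x and y vary together; denominator: how much x varies
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical field goal attempts (x) and field goals made (y)
fga = [10, 12, 15, 18, 20]
fgm = [4, 6, 7, 8, 10]

slope, intercept = ols_line(fga, fgm)
print(f'slope = {slope:.3f}, intercept = {intercept:.3f}')  # slope = 0.529, intercept = -0.941
```

With these made-up numbers, a slope of about 0.53 would mean roughly one extra made field goal for every two extra attempts.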

# Colour-coding the Data

To make the graph more interesting, we can also colour-code the data points by season by adding one argument to the `px.scatter()` function:

`color='SEASON_ID'`

In [None]:
fig = px.scatter(df, 
    x='FGA', 
    y='FGM', 
    title='Siakam Field Goals Made versus Field Goal Attempts', 
    color='SEASON_ID') 

fig.update_xaxes(title='Field Goal Attempts').update_yaxes(title='Field Goals Made')

fig.show()


# Changing the Size of the Data Points

To make it even more interesting, we can scale each data point's size according to one of the fields (columns) of data. Again, all it takes is a small change to the `px.scatter()` function:

`size='FGM'`

In [None]:
fig = px.scatter(df, 
    x='FGA', 
    y='FGM', 
    title='Siakam Field Goals Made versus Field Goal Attempts', 
    color='SEASON_ID', 
    size='FGM') 

fig.update_xaxes(title='Field Goal Attempts').update_yaxes(title='Field Goals Made')

fig.show()

Cool, eh!

# Exercise

Create a scatter plot with assists per game (`AST`) on the x-axis, points per game (`PTS`) on the y-axis, and `color='AGE'`. Include a trendline.

What do you observe about the relationship between these columns?

In [None]:
import pandas as pd
import plotly.express as px

url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Pascal_Siakam.csv'
df = pd.read_csv(url)

# Enter the rest of the program below.

---
*Report issues or give us feedback about this notebook [here](https://docs.google.com/forms/d/e/1FAIpQLSdMRX2hPqZyD8-argFJXxB3ABQdLk3aUH1CAfmMEtcFAlWzCw/viewform?usp=pp_url&entry.1771525592=Module%20Resources%20%28the%20Jupyter%20notebooks%2C%20PPTS%20or%20additional%20resources%29&entry.1364186163=Creating%20Scatter%20Plots).*

---
Back to [Lessons](https://github.com/pbeens/Data-Dunkers/blob/main/Lessons.ipynb)

---
This notebook has been adapted from https://github.com/callysto/basketball-and-data-science/blob/main/content/02-visualizing-data.ipynb, with permission.