# Visualizing Data with Graphs (Part 1)

## Introduction

### What is a graph?
- Definition: A diagram showing the relationship of quantities, especially such a diagram in which lines, bars, or proportional areas represent how one quantity depends on or changes with another.
- Show pictures of different types of graphs; all of them show relationships between quantities.

### Plotting Equations
- Ask students to recall their first experiences with graphs in school; discuss
- Remember plotting equations?
  - Given y = x + 2, how to plot? discuss
  - Best Answer: Choose a few values of x and calculate y, then plot each point and draw a line that goes through all points
  - Activity (Instructor example): Hand plot y = x + 2 using x = 0, 1, 2
    - Calculate values table
    - How to plot a point? discuss
    - Answer: find the x value on the x-axis and count over, then count up to the y value
  - Activity (Individual): Hand plot y = 2x + 1 and submit picture via Discord PM
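
The values table from the instructor example can also be checked in code. This is a minimal sketch, assuming plain Python with no libraries; the function name `line` is made up for illustration.

```python
# Build the table of values for y = x + 2 at x = 0, 1, 2.
def line(x):
    """Return y for the equation y = x + 2."""
    return x + 2

xs = [0, 1, 2]
points = [(x, line(x)) for x in xs]

# Print the table; each row is one point to plot.
print(" x | y")
for x, y in points:
    print(f"{x:2d} | {y}")
```

Each `(x, y)` pair is one point: count over `x` on the horizontal axis, then up `y` on the vertical axis.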

### Scatterplots and Trendlines
- So far we were always given the equation and generated points from it. What if you only had the points? How would you get the equation? discuss
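
One common answer to that question is a least-squares fit: pick the line that makes the squared vertical distances to the points as small as possible. The sketch below is one way to do it in plain Python, not necessarily how any particular library does it. The helper `fit_line` and the sample points are made up for illustration (the points lie exactly on y = 2x + 1).

```python
def fit_line(points):
    """Least-squares slope and intercept for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

points = [(0, 1), (1, 3), (2, 5)]  # illustrative points on y = 2x + 1
slope, intercept = fit_line(points)
print(f"y = {slope:g}x + {intercept:g}")
```

With real data the points will not sit exactly on a line, so the fitted line is a trendline rather than an exact equation.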

## Making Scatterplots with Plotly

Dataset: Demographic data from Kalahari !Kung San people collected by Nancy Howell in Botswana between August 1967 and May 1969.

Columns:
- Height (cm)
- Age (years)
- Gender (male, female)

### Question: How does height change with age?

### Step 1. Read CSV Data into Pandas Dataframe

#### Substep. Import Pandas Library (if Needed)

- `import pandas` &nbsp;as&nbsp; `pd`

<details>
  <summary>Blockly Hints</summary>
  <ol>
    <li><code>import .. as</code> block found under IMPORT</li>
    <li>Variable <code>pd</code> must be created under VARIABLES before it appears in the <code>as</code> dropdown</li>
  </ol>
</details>

In [1]:
import pandas as pd

Try it!

#### Substep. Read CSV data and Save in Variable

- Create variable &nbsp;`df`
- `Set df to` &nbsp;:&nbsp; `with pd do read_csv` &nbsp;using&nbsp; `datasets/age_height.csv`

<details>
  <summary>Blockly Hints</summary>
  <ol>
    <li>Variable <code>df</code> must be created under VARIABLES before it can be used</li>
    <li><code>set .. to</code> block found under VARIABLES</li>
    <li><code>with .. do .. using</code> block found under VARIABLES</li>
    <li>If <code>do</code> dropdown in <code>with .. do .. using</code> block will not populate, try "Run All Above Selected Cell" from the "Run" menu in top-left.</li>
    <li>You need a <code>".."</code> block found under TEXT to fill in the <code>using</code> part of the <code>with .. do .. using</code> block</li>
    <li>If <code>with .. do .. using</code> block does not want to snap together nicely with the <code>set .. to</code> block, try dragging the <code>set .. to</code> block instead.</li>
  </ol>
</details>

In [2]:
df = pd.read_csv('datasets/age_height.csv')


Try it!
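
To see what `read_csv` produces without needing the dataset file, here is a small sketch that reads CSV text from memory. The three rows are made up for illustration and are not values from the real dataset; the column name `Gender` is an assumption based on the column list above.

```python
import io

import pandas as pd

# Hypothetical rows, made up for illustration only.
csv_text = """Height,Age,Gender
152.0,35.0,female
160.5,41.0,male
98.0,4.0,female
"""

# read_csv accepts a file path or any file-like object.
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.shape)   # (rows, columns)
print(df_demo.dtypes)  # the type pandas inferred for each column
print(df_demo.head())  # the first few rows
```

The first line of the CSV becomes the column names, and each later line becomes one row of the dataframe.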

#### Substep. Display Dataframe Contents

- Place a &nbsp;`df`&nbsp; block (VARIABLES)

In [3]:
df

Try it!

### Step 2. Generate Plotly Scatterplot

#### Substep. Import Plotly Express Library

- `import plotly.express` &nbsp;as&nbsp; `px`

<details>
  <summary>Blockly Hints</summary>
  <ol>
    <li><code>import .. as</code> block found under IMPORT</li>
    <li>Variable <code>px</code> must be created under VARIABLES before it appears in the <code>as</code> dropdown</li>
  </ol>
</details>

In [4]:
import plotly.express as px

Try it!

#### Substep. Generate Scatterplot

- Get a &nbsp;`with px do scatter using`&nbsp; block
- Inside that block put a &nbsp;`create list with`&nbsp; block, and inside that block put
    - `df`&nbsp; (from VARIABLES)
    - a freestyle block with &nbsp;`x="Age"`&nbsp; in it
    - a freestyle block with &nbsp;`y="Height"`&nbsp; in it

<details>
  <summary>Blockly Hints</summary>
  <ol>
    <li>If <code>do</code> dropdown in <code>with .. do .. using</code> block will not populate, try "Run All Above Selected Cell" from the "Run" menu in top-left.</li>
    <li>You need a <code>create list with</code> block found under LISTS to fill in the <code>using</code> part of the <code>with .. do .. using</code> block</li>
    <li>The <code>create list with</code> block should not have any empty rows. You can use the <code>+ -</code> controls on the block to change the number of rows.</li>
  </ol>
</details>

In [5]:
px.scatter(df, x='Age', y='Height')


Try it!

### Trendlines

In [6]:
px.scatter(df, x='Age', y='Height', trendline="ols")

In [7]:
df_ols = df.copy()
df_ols.loc[df_ols["Age"] == 0, "Age"] = 0.5
df_ols
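
The cell above changes ages of 0 to 0.5 in `df_ols`. Whether `df` is affected too depends on whether `df_ols` is a copy or just a second name for the same dataframe. The sketch below shows the difference on a tiny made-up dataframe; the names `original`, `independent`, and `alias` are for illustration only.

```python
import pandas as pd

# Hypothetical data, made up for illustration only.
original = pd.DataFrame({"Age": [0.0, 5.0, 30.0],
                         "Height": [50.0, 105.0, 160.0]})

# .copy() makes a separate dataframe, so edits stay in the copy.
independent = original.copy()
independent.loc[independent["Age"] == 0, "Age"] = 0.5
ages_after_copy_edit = original["Age"].tolist()
print(ages_after_copy_edit)  # original is unchanged

# Plain assignment only adds a second name for the same dataframe.
alias = original
alias.loc[alias["Age"] == 0, "Age"] = 0.5
ages_after_alias_edit = original["Age"].tolist()
print(ages_after_alias_edit)  # original changed as well
```

Using `.copy()` keeps the original data intact for later plots.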


In [8]:
px.scatter(df_ols, x='Age', y='Height', trendline="ols", trendline_options=dict(log_x=True))
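
As far as I understand it, `log_x=True` makes the trendline fit Height against the base-10 logarithm of Age instead of Age itself. That would also explain the earlier replacement of age 0 with 0.5, since the logarithm of 0 is undefined. The sketch below illustrates the idea in plain Python; it is not Plotly's implementation, and `fit_line` and the synthetic data are assumptions made up for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

ages = [0, 1, 4, 16, 64]  # hypothetical ages, including a newborn

# The log of 0 is undefined, so Python raises an error for it.
try:
    math.log10(0)
    log_of_zero_ok = True
except ValueError:
    log_of_zero_ok = False

# Replace age 0 with 0.5, then take the log of every age.
safe_ages = [0.5 if age == 0 else age for age in ages]
log_ages = [math.log10(age) for age in safe_ages]

# Synthetic heights that follow height = 60 + 20 * log10(age) exactly.
heights = [60 + 20 * la for la in log_ages]

slope, intercept = fit_line(log_ages, heights)
print(f"height = {slope:.1f} * log10(age) + {intercept:.1f}")
```

A straight line in log(Age) bends when drawn against Age, which suits growth that is fast early on and then levels off.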

In [9]:
px.scatter(df, x='Age', y='Height', trendline="lowess")
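
A LOWESS trendline does not assume a single equation. Roughly, it fits a small weighted line around each point, so nearby points count more than distant ones. The sketch below is a simplified version of that idea in plain Python: it has no robustness iterations, and it is not the routine Plotly calls. The function name `simple_lowess` and the default `frac` of 2/3 are assumptions for illustration.

```python
import math

def simple_lowess(xs, ys, frac=2 / 3):
    """Smooth ys with a locally weighted linear fit at each x."""
    n = len(xs)
    k = max(2, math.ceil(frac * n))  # number of neighbours per local fit
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest point (1.0 if that is 0).
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - min(abs(x - x0) / h, 1) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the local line at x0.
        smoothed.append(my + slope * (x0 - mx))
    return smoothed

xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # illustrative points on a straight line
smooth = simple_lowess(xs, ys)
print([round(v, 2) for v in smooth])
```

On points that already lie on a line, the smoothed values match the original values; on curved data such as height versus age, the local fits let the trendline bend.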