# Adding New Columns

(Open in [Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Dunkers&branch=main&subPath=Demos/new-columns.ipynb&depth=1) | [Colab](https://githubtocolab.com/pbeens/Data-Dunkers/blob/main/Demos/new-columns.ipynb))

---

## Lesson Objectives

By the end of this lesson, students will be able to:
- Understand the process of adding new columns to a DataFrame in Python using Pandas, to enhance data with calculated metrics.
- Calculate averages and other derived statistics, such as shot averages from field goal and free throw percentages, and add these as new columns to a DataFrame.
- Apply mathematical operations to DataFrame columns to create meaningful new data points, such as total minutes played across all games.
- Use rounding functions to format numerical data in DataFrames, ensuring clarity and precision in data presentation.
- Manipulate DataFrames to display specific columns and limit the output to a subset of records, such as the top 10 players by a particular metric.
- Develop proficiency in using Python for data manipulation, enhancing their ability to prepare data for analysis or reporting.

For this lesson we will be using a Raptors data file named [raptors-2023.csv](https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/raptors-2023.csv).

Let's create a new column that is the average of each player's Field Goal Percentage (FG%) and Free Throw Percentage (FT%). The program that follows the field list adds the two values for each player, divides the sum by 2, and multiplies the result by 100 to express it as a percentage. The result is stored in a new DataFrame column named `Shot Average (%)`.
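
To check the arithmetic by hand before working with the real file, here is a minimal sketch on a tiny made-up table. The player names and numbers are invented for illustration and are not taken from raptors-2023.csv. It uses `print` instead of `display` so it also runs outside a notebook.

```python
import pandas as pd

# Made-up numbers for illustration only; these are NOT from raptors-2023.csv
toy_df = pd.DataFrame({
    'Player': ['Player A', 'Player B'],
    'FG%': [0.500, 0.375],
    'FT%': [0.750, 0.625],
})

# Average the two fractions, then multiply by 100 to express the result as a percentage
toy_df['Shot Average (%)'] = (toy_df['FG%'] + toy_df['FT%']) / 2 * 100

print(toy_df)
```

For Player A, the calculation is (0.500 + 0.750) / 2 * 100 = 62.5, and for Player B it is (0.375 + 0.625) / 2 * 100 = 50.0.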

For your reference, here are the names of all the data fields:

| Field Name | Definition | Field Name | Definition |
| --- | --- | --- | --- |
| Age | Age of the player | AST | Assists |
| BLK | Blocks | DRB | Defensive rebounds |
| eFG% | Effective field goal percentage | FG | Field goals made |
| FG% | Field goal percentage | FGA | Field goal attempts |
| FT | Free throws made | FT% | Free throw percentage |
| FTA | Free throw attempts | G | Games played |
| GS | Games started | Lg | League |
| MP | Minutes played per game | ORB | Offensive rebounds |
| PF | Personal fouls | Player | Name of the player |
| Pos | Position played | PTS | Points scored |
| Season | Season year | STL | Steals |
| Tm | Team abbreviation | TOV | Turnovers |
| TRB | Total rebounds | 2P | Two-point field goals made |
| 2PA | Two-point field goal attempts | 2P% | Two-point field goal percentage |
| 3P | Three-point field goals made | 3PA | Three-point field goal attempts |
| 3P% | Three-point field goal percentage | | |

In [None]:
# Import the pandas library
import pandas as pd

# Set the URL of the data file
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/raptors-2023.csv'

# Read the data file into a pandas data frame
raptors_df = pd.read_csv(url)

# Calculate the shot average for each player and add it as a new column to the data frame
raptors_df['Shot Average (%)'] = (raptors_df['FG%'] + raptors_df['FT%']) / 2 * 100

# Display the columns 'Player', 'FG%', 'FT%', and 'Shot Average (%)' of the data frame
display(raptors_df[['Player', 'FG%', 'FT%', 'Shot Average (%)']])


Let's round that column to one decimal place using `round(1)`. Note that assigning the result back to the same column overwrites the original values with the rounded ones.

In [None]:
# Round the 'Shot Average (%)' column to one decimal place
raptors_df['Shot Average (%)'] = raptors_df['Shot Average (%)'].round(1)

# Display the columns 'Player', 'FG%', 'FT%', and 'Shot Average (%)' of the data frame
display(raptors_df[['Player', 'FG%', 'FT%', 'Shot Average (%)']])

That's better!
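
If you would rather keep the unrounded values, one option is to store the rounded result under a different column name instead of overwriting. The sketch below assumes a small made-up table, and the column name `Shot Average (%) Rounded` is hypothetical, chosen only for this example.

```python
import pandas as pd

# Made-up numbers for illustration only
toy_df = pd.DataFrame({
    'Player': ['Player A', 'Player B'],
    'Shot Average (%)': [62.4567, 50.1234],
})

# round(1) returns a new Series; saving it under a new (hypothetical) column name
# leaves the original, unrounded column untouched
toy_df['Shot Average (%) Rounded'] = toy_df['Shot Average (%)'].round(1)

print(toy_df)
```

Keeping both columns can be useful when later calculations need the full precision and the rounded column is only for display.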

## Exercise

Create a new column that is Games (`G`) multiplied by Minutes Played Per Game (`MP`), which gives each player's total minutes played across all games. Use the `*` symbol for multiplication. Round to one decimal place.

What might you call the new column?

Display the data so only the top 10 players are shown.
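
As a hint for limiting the number of rows, here is a minimal sketch using pandas' `head()` and `sort_values()` on a small made-up table. The names and numbers are invented, and the example uses 2 rows rather than 10 to stay short. Whether "top" means the first rows of the file or the largest values of your new column is up to you, so both approaches are shown.

```python
import pandas as pd

# Made-up numbers for illustration only
toy_df = pd.DataFrame({
    'Player': ['Player A', 'Player B', 'Player C', 'Player D'],
    'PTS': [12.5, 20.1, 8.3, 15.0],
})

# head(n) keeps the first n rows in their current order
first_two = toy_df.head(2)

# Sorting from largest to smallest first, then taking head(n),
# keeps the n rows with the largest values in that column
top_two = toy_df.sort_values('PTS', ascending=False).head(2)

print(first_two)
print(top_two)
```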

In [None]:
# Write your program here.

import 

df = 



---
*Report issues or give us feedback about this notebook [here](https://docs.google.com/forms/d/e/1FAIpQLSdMRX2hPqZyD8-argFJXxB3ABQdLk3aUH1CAfmMEtcFAlWzCw/viewform?usp=pp_url&entry.1771525592=Module%20Resources%20%28the%20Jupyter%20notebooks%2C%20PPTS%20or%20additional%20resources%29&entry.1364186163=Adding%20New%20Columns).*

---
Back to [Lessons](https://github.com/pbeens/Data-Dunkers/blob/main/Lessons.ipynb)

---
This notebook has been adapted, with permission, from https://github.com/callysto/basketball-and-data-science/blob/main/content/03-statistics.ipynb.