# Getting Data From a CSV File (Hoops Activity)

Open this notebook in [Callysto](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Dunkers&branch=main&subPath=Demos/data-from-csv.ipynb&depth=1) | [Colab](https://githubtocolab.com/pbeens/Data-Dunkers/blob/main/Demos/data-from-csv.ipynb).

## Lesson Objectives

By the end of this lesson, students will be able to:
- Utilize the Pandas library to load data from a CSV file into a DataFrame.
- Display the top and bottom rows of data using the Pandas `head()` and `tail()` methods.
- Identify and manipulate column names within a DataFrame using the `columns` attribute.
- Create a line plot using Plotly Express by specifying a DataFrame and the columns to plot.
- Recognize the importance of accurately specifying column names in data analysis and visualization tasks.

## Program Setup 

Run this first code block if these libraries aren't already installed. You only need to install them once in a given environment. You can skip it for now, but if you get an error message about a library not being installed, come back and run it.

In [3]:
%pip install pandas -q
%pip install plotly -q

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
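
If you're not sure whether a library is already installed, you can check from Python before running the install cell. The following is a minimal sketch that uses the standard library's `importlib.util.find_spec`; the helper name `is_installed` is made up for this example and is not part of Pandas or Plotly.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    # find_spec returns None when no module with that name can be located
    return importlib.util.find_spec(module_name) is not None

# Check the two libraries this notebook relies on
for name in ["pandas", "plotly"]:
    if is_installed(name):
        print(f"{name}: installed")
    else:
        print(f"{name}: missing, run the install cell above")
```

If either library is reported as missing, run the install cell and restart the kernel if prompted.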


## Introduction

There are many ways to import data, but the most common sources are the program itself, a CSV (comma-separated values) file, an Excel spreadsheet, a Google Sheet, or a webpage.

In this demo, we will show how to get data from a CSV file.

## Setup & Input


In this example program, we first import the **Pandas** library using `import pandas as pd`. We also import `plotly.express` as `px`, since we'll need it for plotting. We then use the `pd.read_csv()` function to read the [CSV file](https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Hoops_Data.csv) into a **Pandas DataFrame**.

In [4]:
# import plotly.express and pandas
import plotly.express as px
import pandas as pd

# Read the CSV file into a DataFrame named df
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/Hoops_Data.csv'
df = pd.read_csv(url)
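
To get a feel for what `pd.read_csv()` does with CSV text, here is a small sketch that reads from an in-memory string instead of a URL. The three sample rows are written to resemble the hoops data; the real file's raw text may be formatted differently, and the names `csv_text` and `sample_df` exist only for this illustration.

```python
import io
import pandas as pd

# A tiny CSV: the first line holds the column names, each later line is one row
csv_text = """Timestamp,First Name,Shot Distance (feet),Shot Made?
10/18/2023 10:51:01,David,10,False
10/18/2023 10:53:00,David,10,True
10/18/2023 13:38:16,MG,8,True
"""

# io.StringIO lets read_csv treat the string as if it were a file
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)   # (rows, columns)
print(sample_df.dtypes)  # the data type Pandas inferred for each column
```

Pandas works out a type for each column on its own, which is why the distances come back as numbers and the `Shot Made?` values as booleans rather than text.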

## Process

Just for fun, let's look at the top few rows of the data we just loaded. We use the Pandas `head()` method for this:

In [5]:
# Display the first 5 rows of the data
print(df.head())

             Timestamp First Name  Shot Distance (feet)  Shot Made?
0  10/18/2023 10:51:01      David                    10       False
1  10/18/2023 10:53:00      David                    10        True
2  10/18/2023 13:38:16         MG                     8        True
3  10/18/2023 13:38:25         MG                     8       False
4  10/18/2023 13:38:33         MG                     8       False


What about the bottom rows? This time, let's look at only the last 2 rows by passing `2` to `tail()`:

In [6]:
# Display the last 2 rows of the data
print(df.tail(2))

               Timestamp First Name  Shot Distance (feet)  Shot Made?
1358  5/23/2024 12:45:11        EAJ                    10       False
1359  5/23/2024 12:46:23        A.D                    10        True


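You'll see that Pandas has added an index (the unlabeled numbers on the left, starting at 0) before the data. Since the last row has index 1359, the DataFrame holds 1,360 rows. We won't worry about the index for now because it doesn't affect what we're doing here.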
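
Both `head()` and `tail()` accept the number of rows to return, and each row keeps its index label. Here is a small sketch using a made-up DataFrame; `demo_df` and its values are invented for illustration and are not taken from the hoops file.

```python
import pandas as pd

# Six made-up rows so the index runs from 0 to 5
demo_df = pd.DataFrame({
    "First Name": ["A", "B", "C", "D", "E", "F"],
    "Shot Distance (feet)": [5, 8, 10, 12, 15, 20],
})

first_three = demo_df.head(3)  # rows with index 0, 1, 2
last_two = demo_df.tail(2)     # rows with index 4, 5

print(first_three)
print(last_two)
print(len(demo_df))            # total number of rows
```

Calling `head()` or `tail()` with no argument returns 5 rows, which is why the first example above printed five.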

Besides using `head()` for a quick look at the data, data scientists often check which columns the data file contains. To do that, we use the `df.columns` attribute. Here's how:

In [7]:
# Display the column names
print(df.columns)

Index(['Timestamp', 'First Name', 'Shot Distance (feet)', 'Shot Made?'], dtype='object')


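Does that look familiar? These are the same column headings we saw in the `head()` and `tail()` output. Column names must be typed exactly as they appear here, including capitalization, spaces, and punctuation such as the `?` in `Shot Made?`, so always pay attention to that.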
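
To see why exact column names matter, here is a short sketch using a made-up two-row DataFrame (`demo_df` and `renamed` are names invented for this example). It shows that a name with the wrong capitalization is not found, and how `rename()` can give a column a name that is easier to type.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "First Name": ["David", "MG"],
    "Shot Made?": [False, True],
})

# Membership tests on df.columns need an exact match
print("First Name" in demo_df.columns)  # exact spelling and case
print("first name" in demo_df.columns)  # same letters, different case

# Asking for a column that doesn't exist raises a KeyError
try:
    demo_df["first name"]
except KeyError as err:
    print("KeyError:", err)

# rename() returns a new DataFrame; demo_df itself is unchanged
renamed = demo_df.rename(columns={"Shot Made?": "shot_made"})
print(list(renamed.columns))
```

When a `KeyError` like this appears, printing `df.columns` and copying the name from the output is usually the quickest fix.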