# Comparing Common Input Sources

## Lesson Objectives

By the end of this lesson, students will be able to:
- Import and manage data in a Jupyter Notebook using pandas from various sources, including direct entry, CSV files, Excel files, webpages, and Google Sheets.

## Introduction

In this tutorial, we'll explore five different ways to input data into a Jupyter Notebook. We will cover importing data from within the notebook itself, from a CSV file, an Excel file, a webpage, and finally from a Google Sheet. Let's get started.

## Getting Data From Within the Notebook

Our first example demonstrates how to input data directly within the Jupyter Notebook. This method is useful for small datasets or for testing purposes. Here, we define lists of x and y values and use them to build a pandas DataFrame.

```python
import pandas as pd

# Define the data
x_data = [0, 1, 2, 3, 4, 5]
y_data = [0, 1, 4, 9, 16, 25]

# Create a DataFrame from the data
df = pd.DataFrame({'X': x_data,
                   'Y': y_data})

# Display the DataFrame
print(df)
```

```
   X   Y
0  0   0
1  1   1
2  2   4
3  3   9
4  4  16
5  5  25
```
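
The same table can also be entered row by row, which is sometimes more natural when typing records by hand. The sketch below is only an illustration: the names `rows`, `df_columns`, and `df_rows` are hypothetical and do not appear in the lesson.

```python
import pandas as pd

# Column-oriented input, as in the lesson: one list per column
x_data = [0, 1, 2, 3, 4, 5]
y_data = [0, 1, 4, 9, 16, 25]
df_columns = pd.DataFrame({"X": x_data, "Y": y_data})

# Row-oriented input: one dictionary per row (hypothetical alternative layout)
rows = [{"X": x, "Y": y} for x, y in zip(x_data, y_data)]
df_rows = pd.DataFrame(rows)

# Check whether both layouts produce the same table
print(df_columns.equals(df_rows))

# Quick checks that are worth running after any import
print(df_rows.shape)   # (number of rows, number of columns)
print(df_rows.dtypes)  # data type of each column
```

Checking `shape` and `dtypes` right after loading is a cheap way to confirm that the data arrived in the form you expected, whichever input source you use.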


## Getting Data from a CSV File

Next, we'll import data from a CSV file, a common format for data storage. We use the pandas library to read the CSV file into a DataFrame, which we can then display.

CSV files are widely used in data science because they are plain text and almost every tool can read and write them. The pandas read_csv function accepts either a local file path or a URL, so the same call works for files on disk and files hosted online.

```python
import pandas as pd

# Read the CSV file into a DataFrame named df
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/x-y-data.csv'
df = pd.read_csv(url)

# Display the DataFrame
print(df)
```

```
   X   Y
0  0   0
1  1   1
2  2   4
3  3   9
4  4  16
5  5  25
```
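
Real CSV files are not always as tidy as this one. As a rough sketch, the example below reads a small CSV that uses semicolons as separators and contains an extra column we do not need. The text in `csv_text` is made-up sample data, wrapped in `io.StringIO` so the example runs without a file or network access.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk or at a URL
csv_text = """X;Y;Note
0;0;start
1;1;
2;4;
3;9;end
"""

df = pd.read_csv(
    io.StringIO(csv_text),  # a file path or URL could be passed here instead
    sep=";",                # this sample uses semicolons instead of commas
    usecols=["X", "Y"],     # keep only the columns we need
)

# Display the DataFrame
print(df)
```

The `sep` and `usecols` parameters are only two of many options that `read_csv` offers, and which ones you need depends on how the file was produced.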


## Getting Data from an Excel Spreadsheet

For our third method, we'll load data from an Excel file. Excel is a popular tool for data management, and pandas reads Excel files with read_excel in much the same way it reads CSV files with read_csv. Reading .xlsx files depends on an optional engine package, typically openpyxl, so install it first if read_excel raises an ImportError.

In the code, the only change is replacing read_csv with read_excel. By default, read_excel loads the first sheet of the workbook.

```python
import pandas as pd

# Read the Excel file into a DataFrame named df
url = 'https://raw.githubusercontent.com/Data-Dunkers/data-dunkers-modules/main/data/x-y-data.xlsx'
df = pd.read_excel(url)

# Display the DataFrame
print(df)
```

```
   X   Y
0  0   0
1  1   1
2  2   4
3  3   9
4  4  16
5  5  25
```
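
Workbooks often hold more than one sheet, and `read_excel` lets you pick one with the `sheet_name` parameter. The following is a minimal sketch, assuming the openpyxl package is available. It writes a tiny workbook to memory and reads it back, so no file or network access is needed. The sheet name `Data` and the helper `has_openpyxl` are hypothetical names chosen for this illustration.

```python
import importlib.util
import io

import pandas as pd


def has_openpyxl() -> bool:
    """Report whether the optional openpyxl engine is installed."""
    return importlib.util.find_spec("openpyxl") is not None


df_original = pd.DataFrame({"X": [0, 1, 2], "Y": [0, 1, 4]})

if has_openpyxl():
    # Write a small workbook into an in-memory buffer instead of a file
    buffer = io.BytesIO()
    df_original.to_excel(buffer, index=False, sheet_name="Data", engine="openpyxl")
    buffer.seek(0)

    # Read back one specific sheet by name
    df_roundtrip = pd.read_excel(buffer, sheet_name="Data", engine="openpyxl")
    print(df_roundtrip)
else:
    df_roundtrip = None
    print("openpyxl is not installed; try `pip install openpyxl` first.")
```

With a real workbook you would pass the file path or URL in place of `buffer`, and `sheet_name` can be either the sheet's name or its position, starting at 0.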


## Getting Data from a Webpage

Now let's move on to extracting data from a webpage. Web scraping is a valuable skill for gathering data that is publicly available online. In this example, we'll use the pandas read_html function, which returns a list of DataFrames, one for each table it finds on the page.

This method works for tabular data stored in HTML table elements, provided we correctly identify which table in the list we want. It also needs an HTML parser library such as lxml to be installed. Tables that a page builds with JavaScript after loading are not visible to read_html.

```python
import pandas as pd

# Read the HTML table into a DataFrame named df
url = 'https://raw.githubusercontent.com/Data-Dunkers/data/main/demo/x-y-data.html'
df = pd.read_html(url)[0]  # Index 0 is the first table

# Display the DataFrame
print(df)
```

```
   X   Y
0  0   0
1  1   1
2  2   4
3  3   9
4  4  16
5  5  25
```
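
On pages with several tables, index 0 is not always the one you want. One option is the `match` parameter of `read_html`, which keeps only tables containing text that matches a pattern. The sketch below uses a made-up HTML snippet with two tables (the team data is invented for illustration) and assumes a parser such as lxml is installed. If none is, the list is simply left empty.

```python
import io

import pandas as pd

# Hypothetical page content with two tables; only the second holds X and Y
html = """
<table>
  <thead><tr><th>Team</th><th>Wins</th></tr></thead>
  <tbody><tr><td>Raptors</td><td>41</td></tr></tbody>
</table>
<table>
  <thead><tr><th>X</th><th>Y</th></tr></thead>
  <tbody>
    <tr><td>0</td><td>0</td></tr>
    <tr><td>1</td><td>1</td></tr>
    <tr><td>2</td><td>4</td></tr>
  </tbody>
</table>
"""

try:
    # Keep only tables whose text matches the pattern "Y"
    tables = pd.read_html(io.StringIO(html), match="Y")
except ImportError:
    # No HTML parser (for example lxml) is installed
    tables = []

print(f"Tables found: {len(tables)}")
if tables:
    print(tables[0])
```

Printing `len(tables)` before indexing is a simple way to see how many tables were found and to avoid picking the wrong one by accident.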


## Getting Data from a Google Sheet

Finally, we'll demonstrate how to import data from a Google Sheet. This method is particularly useful for collaborative projects, as Google Sheets allows multiple users to edit and view the data in real time.

To use this method, share your Google Sheet so that anyone with the link can view it, then change the end of its URL to /export?format=csv. Google then serves the sheet as CSV, so you can read it with read_csv like any other CSV file.

```python
import pandas as pd

# Google Sheet URL variable, with modified /export?format=csv ending
url = 'https://docs.google.com/spreadsheets/d/1ZULKhYzsMd4eYwiprsyGgE9Df3gaVtO8WRalUQDn-xE/export?format=csv'

# Read the Google Sheet into a DataFrame named df
df = pd.read_csv(url)

# Display the DataFrame
print(df)
```

```
   X   Y
0  0   0
1  1   1
2  2   4
3  3   9
4  4  16
5  5  25
```
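
Editing the URL by hand is easy to get wrong, so a small helper can do it instead. The sketch below is one possible approach, not part of pandas or any Google library: the function name `to_csv_export_url` is hypothetical, the `/edit?usp=sharing` ending of the sample link is an assumption about what a copied sharing link looks like, and the optional `gid` parameter assumes Google's usual way of selecting a specific tab.

```python
import re
from typing import Optional


def to_csv_export_url(share_url: str, gid: Optional[int] = None) -> str:
    """Turn a Google Sheets sharing link into a CSV export link (hypothetical helper)."""
    # The sheet ID is the long token that follows /spreadsheets/d/
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", share_url)
    if match is None:
        raise ValueError("Not a recognizable Google Sheets URL")

    export_url = (
        f"https://docs.google.com/spreadsheets/d/{match.group(1)}/export?format=csv"
    )
    if gid is not None:
        # Assumption: gid selects a specific tab within the spreadsheet
        export_url += f"&gid={gid}"
    return export_url


# Sharing link for the lesson's sheet; the /edit?usp=sharing ending is assumed
share_url = (
    "https://docs.google.com/spreadsheets/d/"
    "1ZULKhYzsMd4eYwiprsyGgE9Df3gaVtO8WRalUQDn-xE/edit?usp=sharing"
)
print(to_csv_export_url(share_url))
```

The returned string can be passed straight to `pd.read_csv`, exactly as the `url` variable is in the lesson.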
