Open this notebook in Callysto [here](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Analysis&branch=main&subPath=Demos/where-can-we-get-data-from-webpage.ipynb&depth=1) or in Colab [here](https://githubtocolab.com/pbeens/Data-Analysis/blob/main/Demos/where-can-we-get-data-from-webpage.ipynb).

## Program Setup

Run this first code block only if these libraries haven't already been installed. Once they are installed in your environment, you shouldn't need to run it again. You can skip it for now, but if you get an error message about a missing library, come back and run it.

In [None]:
%pip install pandas -q
%pip install plotly_express -q
%pip install openpyxl -q
# read_html needs an HTML parser such as lxml
%pip install lxml -q

## Introduction

There are many ways to import data, but the most common sources are the program itself, a CSV (comma-separated values) file, an Excel spreadsheet, a Google Sheet, or a webpage.

So far we have looked at how to get data from [the Jupyter Notebook itself](where-can-we-get-data-from.ipynb), from a [CSV file](where-can-we-get-data-from-csv.ipynb), and from an [Excel file](where-can-we-get-data-from-excel.ipynb).

In this demo, we will show how to get data from a table on a webpage.

## Data from a table on a webpage

As you might imagine, the overall program isn't much different from the ones in the earlier demos. Instead of `read_csv` or `read_excel`, we use `read_html`. One important difference is that we have to tell the program which table we want to use, because a webpage can contain more than one.

When Pandas reads a webpage, `read_html` returns a list containing every table it finds, one DataFrame per table. The list is indexed starting at 0 (zero), so the first table is at index 0.

Here's our program. Look closely at how the table index number is referenced.

In [None]:
# import plotly.express and pandas
import plotly.express as px
import pandas as pd

# Read the HTML page into a DataFrame named df
# Note we are using the first table which is index 0
url = 'https://raw.githubusercontent.com/pbeens/Data-Analysis/main/Data/x-y-data.html'
df = pd.read_html(url)[0]

# Create the plot
fig = px.line(data_frame=df, 
              x='X', 
              y='Y', 
              title='Data from a table on a webpage')

# Show the plot
fig.show()
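
If you aren't sure which index to use, it can help to list what `read_html` found before picking a table. The sketch below assumes a small made-up page with two tables. The HTML string and the `tables` variable are illustrative and not part of the original demo. It prints each table's column names next to its index.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables (made up for illustration)
html = """
<table>
  <thead><tr><th>X</th><th>Y</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>2</td></tr>
    <tr><td>2</td><td>4</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>Name</th><th>Score</th></tr></thead>
  <tbody>
    <tr><td>Ada</td><td>90</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per table it finds
tables = pd.read_html(StringIO(html))
print(len(tables), "tables found")

# Show each table's index and column names to help choose one
for i, table in enumerate(tables):
    print(i, list(table.columns))

# Pick the first table, just as the demo does with [0]
df = tables[0]
```

With a real page, you would pass the URL to `read_html` instead of the `StringIO` object, as in the program above.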

Note: if you have multiple rows for the column headers, look at **Fixing a multi-index** in the [cheat sheet](../cheatsheet.md).
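
As a rough illustration of what a multi-row header looks like, the sketch below reads a made-up table whose header has two rows, then keeps only the bottom header row as the column names. This is one possible approach and may differ from the fix described in the cheat sheet. The HTML string and the "Measurement" label are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical table with two header rows (made up for illustration)
html = """
<table>
  <thead>
    <tr><th>Measurement</th><th>Measurement</th></tr>
    <tr><th>X</th><th>Y</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>2</td></tr>
    <tr><td>2</td><td>4</td></tr>
  </tbody>
</table>
"""

df = pd.read_html(StringIO(html))[0]

# With two header rows, the columns come back as pairs (a MultiIndex)
print(df.columns.tolist())

# Keep only the last header row so the columns are plain names again
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(-1)

print(df.columns.tolist())
```

After this step, `x='X'` and `y='Y'` can be passed to `px.line` in the usual way.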

---
In our next demonstration we will get our data from a [Google Sheet](where-can-we-get-data-from-google-sheet.ipynb). ([GitHub link](https://github.com/pbeens/Data-Analysis/blob/main/Demos/where-can-we-get-data-from-google-sheet.ipynb))