# Reading tabular data

Tabular data can be read into a Pandas data frame in a number of ways.

* Read from a CSV file
* Read from an Excel spreadsheet
* Read from a SQL database
* Read just the tabular part of a spatial file by setting the ignore_geometry parameter to True
* Read tables from a website
* Many other formats

Let's begin as usual with the matplotlib inline directive, then import GeoPandas and pandas.

In [None]:
%matplotlib inline
import geopandas as gpd
import pandas as pd

Now let's read a CSV file using the pandas read_csv function.

In [None]:
BAEA_sources = pd.read_csv("data/BAEA_sources.csv")

In [None]:
BAEA_sources

Let's try another.

In [None]:
BAEA_surveys = pd.read_csv("data/BAEA_surveys.csv")

We can display the data frame in sorted order by calling the sort_values method

In [None]:
BAEA_surveys.sort_values("nest")

We can also select a subset of rows very easily by filtering on a column value.

In [None]:
BAEA_surveys[BAEA_surveys["nest"]==14]
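
The selection above works because the comparison inside the brackets produces one True/False value per row, and indexing with that result keeps only the True rows. Here is a minimal sketch of the idea, using a small made-up data frame in place of BAEA_surveys (only the nest column name comes from this notebook; the chicks column and all values are hypothetical).

```python
import pandas as pd

# Hypothetical stand-in for BAEA_surveys; only the "nest" column name
# is taken from the notebook, the rest is invented for illustration.
surveys = pd.DataFrame({
    "nest": [14, 7, 14, 3],
    "chicks": [2, 1, 0, 2],
})

# The comparison yields one True/False value per row.
mask = surveys["nest"] == 14

# Indexing with the mask keeps only the rows where it is True.
nest_14 = surveys[mask]

# Conditions can be combined with & (and) or | (or);
# each condition needs its own parentheses.
nest_14_with_chicks = surveys[(surveys["nest"] == 14) & (surveys["chicks"] > 0)]
print(nest_14_with_chicks)
```

Storing the mask in its own variable is optional, but it can make longer filters easier to read.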

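And we can use this notation anywhere we can use a pandas data frame.

Next we read the file again, this time asking read_csv to parse the date column as dates with the parse_dates parameter. The info method then reports the type of each column.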

In [None]:
BAEA_surveys = pd.read_csv("data/BAEA_surveys.csv", parse_dates=["date"])

In [None]:
BAEA_surveys.info()
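
To see what parse_dates changes, the sketch below reads the same CSV text twice, once without and once with the parameter. The CSV content is a hypothetical stand-in for data/BAEA_surveys.csv, kept in a string so the example runs without the data file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data/BAEA_surveys.csv.
csv_text = "nest,date\n14,2019-04-02\n7,2019-05-17\n"

# Without parse_dates the date column is read as plain text.
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted to a datetime type.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(pd.api.types.is_datetime64_any_dtype(plain["date"]))
print(pd.api.types.is_datetime64_any_dtype(parsed["date"]))

# A datetime column exposes date parts through the .dt accessor.
print(parsed["date"].dt.month)
```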

## Read from an Excel file

Reading an Excel file is not terribly different from read_csv, but you can specify a sheet by name.

Reading xls files requires that the Python xlrd module is present in your environment.

Note that xlrd version 2.0 and later reads only xls files, not the xlsx file type that has been the default in Excel since Excel 2007. To read an xlsx file, install the openpyxl module, which read_excel uses for xlsx files.

In [None]:
!pip install xlrd

In [None]:
albatross = pd.read_excel("data/wildlife_sightings.xls", sheet_name="albatross")

In [None]:
albatross

In [None]:
albatross.plot()

# Converting non-spatial data to spatial data

You can convert a pandas data frame that has longitude and latitude columns into a spatial GeoPandas data frame by passing the result of the GeoPandas points_from_xy function as the geometry, as below. Note that x is the longitude and y is the latitude.

In [None]:
albatross_spatial = gpd.GeoDataFrame(
    albatross, geometry=gpd.points_from_xy(albatross["longitude"], albatross["latitude"]))

In [None]:
albatross_spatial

If you have a pandas data frame with geometry in WKT format, you can also convert that to a GeoPandas data frame. Instructions can be found in the [GeoPandas documentation](https://geopandas.org/gallery/create_geopandas_from_pandas.html).
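
For the WKT case, one possible approach is sketched below. It assumes a reasonably recent GeoPandas that provides GeoSeries.from_wkt, and it assumes the coordinates are longitude/latitude in WGS 84 (EPSG:4326). The site and wkt column names and the values are hypothetical.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical table with point geometries stored as WKT strings.
df = pd.DataFrame({
    "site": ["A", "B"],
    "wkt": ["POINT (-70.5 43.5)", "POINT (-69.0 44.8)"],
})

# Parse the WKT strings into geometry objects.
geometry = gpd.GeoSeries.from_wkt(df["wkt"])

# Assumption: the coordinates are longitude/latitude in WGS 84 (EPSG:4326).
sites = gpd.GeoDataFrame(df.drop(columns="wkt"), geometry=geometry, crs="EPSG:4326")
print(sites)
```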

## Reading tabular data from an HTML page

Pandas can also read HTML tables from a web page into pandas data frames, which can be incredibly useful.

Before using this you need to install a few libraries, which we can do from within a Jupyter notebook.

In [None]:
!pip install lxml html5lib BeautifulSoup4

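Once this is done, a few more lines of code may be needed to read tables from web pages over secure (https) connections. If read_html fails with a certificate verification error, the lines below work around it by turning off certificate verification, so only use them with sites you trust.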

In [None]:
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

Once this is done you simply need to provide the URL of a web page.

**NOTE:** The return value is a Python list of pandas data frames, one for each table in the specified web page.

In [None]:
gdp_list = pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)_per_capita")

In [None]:
gdp_list[4]

Also note that this will not work on all web pages. In my experience, pages that generate their tables dynamically (for example with JavaScript), rather than serving static HTML, are especially problematic.
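
Because read_html returns a list, picking a table by position can break if the page layout changes. The sketch below shows the match parameter of read_html, which keeps only tables whose text matches a pattern. The HTML is a made-up two-table page held in a string, so it runs without a network connection, and it needs the same parser libraries installed earlier.

```python
import io
import pandas as pd

# Hypothetical page with two tables, standing in for a real web page.
html = """
<table>
  <tr><th>Country</th><th>GDP per capita</th></tr>
  <tr><td>Freedonia</td><td>100</td></tr>
</table>
<table>
  <tr><th>Species</th><th>Sightings</th></tr>
  <tr><td>Albatross</td><td>3</td></tr>
</table>
"""

# Without match, every table found in the page is returned.
all_tables = pd.read_html(io.StringIO(html))

# With match, only tables containing text that matches the pattern are returned.
species_tables = pd.read_html(io.StringIO(html), match="Species")
species = species_tables[0]

print(len(all_tables), len(species_tables))
print(species)
```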

# Many other formats

In [None]:
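# List the pandas reader functions (or type pd.read_ and press Tab)
[name for name in dir(pd) if name.startswith("read_")]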

In [None]:
help(pd.read_html)
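
Reading from a SQL database, listed at the top of this notebook, is one of those other formats. Here is a minimal sketch using Python's built-in sqlite3 module and the pandas read_sql_query function. The sightings table, its columns, and its values are hypothetical, and the database lives only in memory.

```python
import sqlite3
import pandas as pd

# Build a throwaway in-memory SQLite database (hypothetical table and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sightings (species TEXT, n_seen INTEGER)")
conn.executemany(
    "INSERT INTO sightings VALUES (?, ?)",
    [("albatross", 3), ("eagle", 5)],
)
conn.commit()

# read_sql_query runs the query and returns the result as a data frame.
sightings = pd.read_sql_query("SELECT * FROM sightings ORDER BY species", conn)
conn.close()
print(sightings)
```

For other database systems, pandas generally expects a SQLAlchemy connection rather than a raw driver connection.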