# Exploratory data analysis

Pandas provides a rich set of plotting tools for exploratory data analysis, but there is also a huge ecosystem available for plotting not just maps but charts of various types. In fact, I expect to create an entire course about Geospatial Data Visualization.

In the meantime, I will introduce the topic in this lecture, as I think it provides an additional reason to use GeoPandas: we can integrate graphical output with our spatial and tabular data. Keep in mind, however, that this lecture is only the briefest of introductions to what is possible.

Let's load some data.

In [None]:
%matplotlib inline
import geopandas as gpd

raptor = gpd.read_file("data/Raptor_Nests.shp")
raptor.rename(inplace=True, columns={"postgis_fi":"gid", "lat_y_dd":"latitude", "long_x_dd":"longitude"})
raptor

One of the most common types of exploratory data analysis is plotting a histogram of numeric data to look at its distribution.

The Pandas hist() method draws a histogram for each numeric column in the DataFrame.

In [None]:
raptor.hist()

You can also get a histogram of a specific data series by calling the hist method on the series.

In [None]:
raptor['longitude'].hist()

Of course, you can include parameters to limit the columns, partition the histogram by a categorical column, specify the number of bins, the figure size, the colors, etc. Please refer to the documentation for the full set of parameters available.

In [None]:
raptor.hist(column='latitude', by='recentstat', legend=True, bins=30, figsize=(15, 15), color='red')

We can also use the boxplot method to see boxplots for numerical data partitioned by a categorical value, as follows:

In [None]:
raptor.boxplot(column='latitude', by='recentstat')

Of course, if we just use the plot method on a GeoDataFrame, the output will be a map of the geometry column by default.

In [None]:
raptor.plot()

We can also call some of the Pandas plotting methods on a single Pandas data series.

In [None]:
raptor['latitude'].hist(bins=30)

The plot method on a non-geometry data series draws a line graph, with the index values on the x axis.

In [None]:
raptor['latitude'].plot()

If you want to use Pandas plotting methods on your tabular data, you can easily reduce the GeoDataFrame to a normal Pandas DataFrame simply by subsetting it by column and leaving out the geometry field.

In this example, we create a scatterplot of longitude and latitude, which presents a similar output to the GeoPandas plot method, although the scaling of the axes is not guaranteed to be equal.

In [None]:
raptor[['longitude', 'latitude', 'recentstat']].plot.scatter(x='longitude', y='latitude')
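If the unequal axis scaling is a problem, one option is to adjust the matplotlib Axes that the scatter call returns. The following is only a minimal sketch: the `nests` DataFrame and its coordinates are made up for illustration and stand in for the raptor data.

```python
import pandas as pd

# Hypothetical stand-in for the raptor table: a few made-up coordinates.
nests = pd.DataFrame({
    "longitude": [-105.10, -104.95, -104.80, -104.70],
    "latitude": [40.10, 40.05, 39.95, 40.20],
})

# plot.scatter returns a matplotlib Axes, so it can be adjusted afterwards.
ax = nests.plot.scatter(x="longitude", y="latitude")

# Make one unit on the x axis cover the same distance as one unit on the y axis.
ax.set_aspect("equal")
```

This brings the scatterplot closer to a map-like view, although it will not necessarily match the aspect ratio that GeoPandas chooses for its own plots.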

For some plots you will need to summarize the dataframe. In this example we summarize the raptor data by the recentstat category and use the count aggregator to create a new dataset that is appropriate to use with a pie chart.

In [None]:
stat_count = raptor[['longitude', 'recentstat']].groupby('recentstat').agg('count')
stat_count

In [None]:
stat_count.plot.pie(y='longitude')

This same data could be used in a bar chart.

In [None]:
stat_count.plot.bar(y='longitude')
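The count summary used for the pie and bar charts can also be produced with the Series `value_counts` method, which avoids picking an arbitrary column such as longitude to count. This is a small sketch on made-up data; the `nests` DataFrame and its status values are hypothetical and only illustrate that both approaches give the same counts.

```python
import pandas as pd

# Hypothetical stand-in for the raptor table; the status values are made up.
nests = pd.DataFrame({
    "longitude": [-105.1, -104.9, -104.8, -104.7, -104.6],
    "recentstat": ["ACTIVE", "INACTIVE", "ACTIVE", "ACTIVE", "INACTIVE"],
})

# The approach used above: count the non-null longitude values per category.
by_groupby = nests[["longitude", "recentstat"]].groupby("recentstat").agg("count")

# Alternative: count the rows per category directly, ordered by category label.
by_value_counts = nests["recentstat"].value_counts().sort_index()

print(by_groupby["longitude"].tolist())  # [3, 2]
print(by_value_counts.tolist())          # [3, 2]
```

Because `by_value_counts` is a Series, it can be charted with `by_value_counts.plot.pie()` or `by_value_counts.plot.bar()` without naming a `y` column.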

Hopefully this provides you with a starting point for exploratory data analysis with Pandas and GeoPandas.  There could certainly be a lot more coverage of this topic.

Keep in mind that Pandas and GeoPandas are using matplotlib in the background for plotting.  Although the syntax in Pandas is simpler in my opinion, using matplotlib directly is also possible and provides a stunning amount of flexibility.
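To give a feel for what using matplotlib directly looks like, here is a minimal sketch that places a histogram and a scatterplot side by side in one figure. The `nests` DataFrame, its values, and the titles are assumptions made for illustration; they are not taken from the raptor data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the raptor table: made-up coordinates.
nests = pd.DataFrame({
    "longitude": [-105.10, -104.95, -104.80, -104.70, -104.65, -104.50],
    "latitude": [40.10, 40.05, 39.95, 40.20, 40.15, 40.00],
})

# One figure with two side-by-side axes that we control explicitly.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: histogram of latitude with five bins.
counts, edges, _ = ax_left.hist(nests["latitude"], bins=5, color="red")
ax_left.set_title("Latitude distribution")
ax_left.set_xlabel("latitude")

# Right panel: scatterplot of the coordinates.
ax_right.scatter(nests["longitude"], nests["latitude"])
ax_right.set_title("Nest locations")
ax_right.set_xlabel("longitude")
ax_right.set_ylabel("latitude")

# Keep the titles and axis labels of the two panels from overlapping.
fig.tight_layout()
```

The extra lines compared with the Pandas one-liners are the price of the flexibility: every panel, label, and layout decision is set explicitly.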

Other data visualization libraries you can use with GeoPandas include Seaborn, which also uses matplotlib in the background but adds some nice functionality with simpler syntax. Plotly is another visualization package; it provides interactive charts that let you see actual data values when hovering the mouse over the chart, zoom in or out, and much more.