---
title: "GitHub Gists for GIS in Python: Loading a Zipped Local or Web-based Shapefile with One Function" 
author:
  twitter: linwoodc3
summary: This post introduces a utility function that can automatically read web-based or local shapefiles in zip format into the Python ecosystem.  It takes one line of code!
excerpt: "In the world of data science, we embrace the concept of spatial awareness and knowing where the data are (or datum is). In the same way that geospatial grounding (i.e. georeferenced data) brings clarity to a lost traveler, spatial context can bring clarity to a data set.  Moreover, this “where” does not always have to apply to a location on the earth’s surface. Spatial context (i.e. analytic geometry), or understanding data in the context of geometric space, is just as enlightening."
---

## GitHub Gists for GIS in Python: Loading a Zipped Local or Web-based Shapefile with One Function 
**Date:** {{ page.date | date_to_rfc822 }}<br><br>

There is nothing worse than not knowing where you are.

We have all experienced it.  It’s the panic that overtakes you when you don’t recognize your surroundings.  The buildings and roads are unfamiliar.  You don’t know where you are.  Naturally, you focus on getting to a familiar landmark or location.  Reaching that landmark brings a sense of relief.   Comfort.  Peace.  Because you know where you are on a map, it’s easier to plot a course to your final destination.

In the world of data science, we embrace the concept of spatial awareness and knowing where the data are (or datum is). In the same way that familiar surroundings (i.e. [geo-referenced data](https://en.wikipedia.org/wiki/Georeferencing)) bring clarity to a lost traveler, spatial context can bring clarity to a data set.  This “where” does not always have to apply to a location on the earth’s surface. Spatial context (i.e. [analytic geometry](https://en.wikipedia.org/wiki/Analytic_geometry)), or understanding data in geometric space, is just as enlightening.

[Anscombe’s quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet) is a great example. Despite having nearly the same summary statistics, the four data sets look nothing alike when plotted.  This is a reminder to plot your data before drawing a conclusion.  It can prevent costly errors.

Python's `seaborn` library ships with this data set, so we load it and compute the summary statistics for each of the four data sets.  Each row of the resulting table is one data set, and the numbers are nearly identical.

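    import seaborn as sns
    import matplotlib.pyplot as plt

    # Jupyter magic: render plots inside the notebook
    %matplotlib inline

    # Load Anscombe's quartet as a long-format DataFrame (columns: dataset, x, y)
    df = sns.load_dataset('anscombe')

    # Summary statistics with one row per data set.
    # On pandas 0.20 and later, describe() already returns this wide layout,
    # so the trailing .unstack() should be dropped there.
    ddf = df.groupby('dataset').describe().unstack()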



Now here is a plot of the four data sets.

<center><img src="{{ site.url }}/assets/img/anscombe.png" alt="Anscombe's Quartet" width="600" height="450"></center>


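    import pandas as pd
    from IPython.display import Markdown, display

    def pandas_df_to_markdown_table(df):
        """Render a DataFrame as a Markdown table in the notebook."""
        # Markdown tables need a row of dashes between the header and the data
        fmt = ['---' for _ in range(len(df.columns))]
        df_fmt = pd.DataFrame([fmt], columns=df.columns)
        df_formatted = pd.concat([df_fmt, df])
        # Pipe-separated CSV output doubles as Markdown table syntax
        display(Markdown(df_formatted.to_csv(sep="|", index=False)))

    pandas_df_to_markdown_table(ddf)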

dataset|x count|x mean|x std|x min|x 25%|x 50%|x 75%|x max|y count|y mean|y std|y min|y 25%|y 50%|y 75%|y max
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
I|11.0|9.0|3.31662|4.0|6.5|9.0|11.5|14.0|11.0|7.50091|2.03157|4.26|6.315|7.58|8.57|10.84
II|11.0|9.0|3.31662|4.0|6.5|9.0|11.5|14.0|11.0|7.50091|2.03166|3.1|6.695|8.14|8.95|9.26
III|11.0|9.0|3.31662|4.0|6.5|9.0|11.5|14.0|11.0|7.5|2.03042|5.39|6.25|7.11|7.98|12.74
IV|11.0|9.0|3.31662|8.0|8.0|8.0|8.0|19.0|11.0|7.50091|2.03058|5.25|6.17|7.04|8.19|12.5
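
The summary statistics can also be cross-checked without `pandas` or `seaborn`. The sketch below is only an illustration: the raw values are the commonly published Anscombe numbers typed in by hand (an assumption, since this post loads them through `seaborn`), and `summarize` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Anscombe's quartet, typed in by hand from the commonly published values.
# This is an assumption of the sketch; the post itself loads the data via seaborn.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(values):
    """Return (count, mean, sample standard deviation) for a list of numbers."""
    # statistics.stdev divides by n - 1, the same convention pandas uses for std
    return len(values), mean(values), stdev(values)


for name, (x, y) in QUARTET.items():
    n, y_mean, y_std = summarize(y)
    print(f"{name:>3}: n={n}, mean(y)={y_mean:.4f}, std(y)={y_std:.4f}")
```

The means and standard deviations agree across the four data sets to about two decimal places, while the minimum and maximum values differ, which is the first hint that the shapes are not alike.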


    import tabulate



The per-data-set covariance matrices are nearly identical as well:

    df.groupby('dataset').cov()

dataset|variable|x|y
---|---|---|---
I|x|11.0|5.501
I|y|5.501|4.127269
II|x|11.0|5.5
II|y|5.5|4.127629
III|x|11.0|5.497
III|y|5.497|4.12262
IV|x|11.0|5.499
IV|y|5.499|4.123249
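
The covariance values above are enough to work out two more statistics that the quartet shares: the slope of the least-squares line of `y` on `x` (covariance divided by the variance of `x`) and the Pearson correlation (covariance divided by the product of the two standard deviations). The following is a minimal sketch using only the numbers from the table; the dictionary `COV` and the function `slope_and_correlation` are hypothetical names introduced for illustration.

```python
import math

# (var_x, cov_xy, var_y) per data set, copied from the covariance table above
COV = {
    "I": (11.0, 5.501, 4.127269),
    "II": (11.0, 5.5, 4.127629),
    "III": (11.0, 5.497, 4.12262),
    "IV": (11.0, 5.499, 4.123249),
}


def slope_and_correlation(var_x, cov_xy, var_y):
    """Return the least-squares slope of y on x and the Pearson correlation."""
    slope = cov_xy / var_x
    r = cov_xy / math.sqrt(var_x * var_y)
    return slope, r


for name, (var_x, cov_xy, var_y) in COV.items():
    slope, r = slope_and_correlation(var_x, cov_xy, var_y)
    print(f"{name:>3}: slope={slope:.3f}, r={r:.3f}")
```

All four data sets come out with a slope of roughly 0.5 and a correlation of roughly 0.82, so a fitted line alone would not distinguish them either.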


    # tabulate() returns a string; print it so the line breaks render
    print(tabulate.tabulate(ddf))

    ---  --  -  -------  -  ---  -  ----  --  --  -------  -------  ----  -----  ----  ----  -----
    I    11  9  3.31662  4  6.5  9  11.5  14  11  7.50091  2.03157  4.26  6.315  7.58  8.57  10.84
    II   11  9  3.31662  4  6.5  9  11.5  14  11  7.50091  2.03166  3.1   6.695  8.14  8.95   9.26
    III  11  9  3.31662  4  6.5  9  11.5  14  11  7.5      2.03042  5.39  6.25   7.11  7.98  12.74
    IV   11  9  3.31662  8  8    8   8    19  11  7.50091  2.03058  5.25  6.17   7.04  8.19  12.5
    ---  --  -  -------  -  ---  -  ----  --  --  -------  -------  ----  -----  ----  ----  -----