# Lesson 5: Python for GIS (A Brief Introduction)

## Sections:

1. [File Requirements](#step_1)
2. [Using Python](#step_2)


- [Suggested Readings](#readings)
- [References](#references)


# <a id='step_1'></a>

## Step 1: File Requirements

As you may have gathered from the previous readings and references, the 'shapefiles' used for the background and basemaps on our map canvas (i.e. our vector layers) are not single files at all. Each shapefile requires a collection of at least three files:

1. A **.shp** file which acts as the main file

2. A **.shx** file which contains indexing data

3. A **.dbf** file which is the database file, containing the geometry attributes


- See *"Learning Geospatial analysis with Python 2nd Ed"* page 62 for more on file types. 
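As a quick sanity check of the three-file requirement above, here is a minimal sketch using only the standard library. The helper name `missing_components` and the throwaway demo directory are hypothetical, chosen purely for illustration; they are not part of pyShp or QGIS.

```python
import tempfile
from pathlib import Path

# The three mandatory components listed above.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(base_path):
    """Return the required extensions that are absent for a shapefile.

    `base_path` is the path without an extension, e.g. "Data/Canada/Canada".
    """
    base = Path(base_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_name(base.name + ext).exists()]


# Demo on a temporary directory (hypothetical data, not the lesson files).
with tempfile.TemporaryDirectory() as tmp:
    demo_base = Path(tmp) / "Canada"
    for ext in (".shp", ".dbf"):  # deliberately leave out the .shx
        demo_base.with_name("Canada" + ext).touch()
    demo_missing = missing_components(demo_base)

print(demo_missing)
```

An empty list means all three components were found next to each other.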

For the examples in this lesson we will be using files that have been retrieved as **.zip** packages and unzipped (extracted) into a directory with the same name as the files it contains.

**Important:** if you wish to change a file name, either rename all the files in that folder, or use an app such as **QGIS Browser**, so that you don't end up with an orphaned .shp file. That is, when you open a .shp file in QGIS, it expects there to be *at least* the .shx and .dbf files with the same name/prefix in the same directory.
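If you would rather script the rename, the idea can be sketched with the standard library alone. The function name `rename_shapefile` and the demo file names are hypothetical. This sketch only matches files whose name is exactly `<stem>.<ext>`, so a sidecar with a double extension would be left behind.

```python
import tempfile
from pathlib import Path


def rename_shapefile(directory, old_stem, new_stem):
    """Rename every file in `directory` whose stem is `old_stem`.

    All components (.shp, .shx, .dbf and any others sharing the stem)
    are renamed together, so no file is orphaned.
    """
    renamed = []
    # sorted() builds the full list first, so renaming during the loop is safe.
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.stem == old_stem:
            target = path.with_name(new_stem + path.suffix)
            path.rename(target)
            renamed.append(target.name)
    return renamed


# Demo on a temporary directory (hypothetical files, created empty).
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".shx", ".dbf", ".prj"):
        (Path(tmp) / ("Canada" + ext)).touch()
    (Path(tmp) / "notes.txt").touch()  # unrelated file, must be untouched
    renamed_demo = rename_shapefile(tmp, "Canada", "Provinces")
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(renamed_demo)
```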

From here on in, the term **shapefile** will refer to the generic collection of files required, not any particular extension.


# <a id='step_2'></a>

## Step 2: Using Python

Although QGIS does come bundled with Python 2.7, and has an interactive window within the QGIS GUI, we will stick to scripting some tools to use on shapefiles independently of QGIS. This is for a few reasons:

- At the time of writing, QGIS 3.x is on its way, but with an undetermined time frame. QGIS 3.0 will use Python 3.x; see the <a href='http://blog.qgis.org/2016/02/10/qgis-3-0-plans/'>QGIS 3 Blog</a>. Switching between Python 2 and Python 3 is best left to those who have no other choice.


- Using the pyShp library we can treat shapefiles like any other database file, and access most of their information in the form of lists. This is good practice for programming in general, and lets us explore the way shapefiles are generically written for all GIS applications.


- When we read and write files in pyShp we will be creating object instances, which is a *fairly* painless way to get acquainted with object-oriented programming.


- This is just an introduction, so we better leave something for you to do on your summer holidays. 

The first thing we will need is the pyShp library. On the command line/terminal:

```
pip install pyshp
```

- This library was written by Joel Lawhead, and has pretty good documentation at https://pypi.python.org/pypi/pyshp.

As with any script that reads or writes files, the interpreter needs to know where these files are kept. For the instructions below, make sure you either replace the given path with the appropriate directory/folder for the script to find the shapefiles, or ```cd``` to the directory in the terminal and run Python directly from there.

First, we will take a shapefile from Stats Canada that shows the boundaries of all provinces and territories. We will create a new file with only a selected province 'clipped' from the original file:

In [38]:
# importing the pyShp library

import shapefile as sf

Uncomment the line below and run the cell to read the documentation regarding classes / methods / inheritance etc.

In [39]:
# help(sf)

In [40]:
# create a reader instance

in_file = sf.Reader("Data/Canada/Canada")

Each shapefile has a 'shapeType' indicating if it is holding point data, polygon data, etc:

In [41]:
in_file.shapeType

5
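The value `5` is a numeric code rather than a name. As a small sketch, the lookup below maps the shape type codes defined in the ESRI shapefile specification to readable names. The dictionary `SHAPE_TYPE_NAMES` and the helper `describe_shape_type` are names made up for this example, not part of pyShp.

```python
# Shape type codes as defined in the ESRI Shapefile Technical Description.
SHAPE_TYPE_NAMES = {
    0: "Null Shape",
    1: "Point",
    3: "PolyLine",
    5: "Polygon",
    8: "MultiPoint",
    11: "PointZ",
    13: "PolyLineZ",
    15: "PolygonZ",
    18: "MultiPointZ",
    21: "PointM",
    23: "PolyLineM",
    25: "PolygonM",
    28: "MultiPointM",
    31: "MultiPatch",
}


def describe_shape_type(code):
    """Return a readable name for a shape type code, or 'Unknown'."""
    return SHAPE_TYPE_NAMES.get(code, "Unknown")


print(describe_shape_type(5))  # the code reported for the Canada file
```

So the Canada file holds polygon data, which is what we would expect for provincial boundaries.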

Since we are going to clip a section of this vector layer to create another file, we will make sure the file types are the same, and create an instance of a writer object by:

In [42]:
# create a writer instance

out_file = sf.Writer(shapeType=in_file.shapeType)

The file then has 'fields' to explore, though in this case they are very brief. The 'deletion flag' relates to the .dbf database, and is a hidden field. In the "Canada" file the only other two fields are the names of the provinces and territories: one field for English, the other for French:

In [43]:
in_file.fields

[('DeletionFlag', 'C', 1, 0), ['NAME', 'C', 40, 0], ['NOM', 'C', 40, 0]]
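Each entry in that list describes one field as name, type (`'C'` for character data), length, and number of decimal places. The sketch below works on a copy of the output above, not on a live reader, and shows one way to pull out the visible field names and where each sits in a record. The variable names are illustrative only.

```python
# Copied from the output of in_file.fields above.
fields = [('DeletionFlag', 'C', 1, 0), ['NAME', 'C', 40, 0], ['NOM', 'C', 40, 0]]

# Skip the first entry: the deletion flag is hidden and does not
# appear in the records themselves.
visible_fields = fields[1:]

# Unpack each descriptor into its four parts and keep the name.
field_names = [name for name, field_type, length, decimals in visible_fields]

# Position of each field within a record, e.g. record[0] is NAME.
field_positions = {name: i for i, name in enumerate(field_names)}

print(field_names)
print(field_positions)
```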

We will want our writer instance to write the same fields too:

In [70]:
out_file.fields = list(in_file.fields)

The names of the provinces / territories are in the file's 'records':

In [44]:
in_file.records()

[['Quebec', b'Qu\xe9bec'],
 ['Nova Scotia', b'Nouvelle-\xc9cosse'],
 ['Saskatchewan', 'Saskatchewan'],
 ['Alberta', 'Alberta'],
 ['Newfoundland and Labrador', 'Terre-Neuve-et-Labrador'],
 ['British Columbia', 'Colombie-Britannique'],
 ['New Brunswick', 'Nouveau-Brunswick'],
 ['Prince Edward Island', b'\xcele-du-Prince-\xc9douard'],
 ['Yukon Territory', 'Territoire du Yukon'],
 ['Manitoba', 'Manitoba'],
 ['Ontario', 'Ontario'],
 ['Nunavut', 'Nunavut'],
 ['Northwest Territories', 'Territories Nord-Ouest']]
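A few of the French names above come back as `bytes` rather than `str`. One plausible reading (an assumption on our part, not something the output itself confirms) is that the .dbf stores text in a single-byte encoding such as Latin-1, so the names with accented characters were left undecoded. Under that assumption, a minimal sketch for normalising such values is shown here. The helper `to_text` is hypothetical.

```python
def to_text(value, encoding="latin-1"):
    """Decode bytes to str; leave values that are already str untouched.

    The default encoding is an assumption about how the .dbf was written.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# A few records copied from the output above.
sample_records = [
    ['Quebec', b'Qu\xe9bec'],
    ['Prince Edward Island', b'\xcele-du-Prince-\xc9douard'],
    ['Alberta', 'Alberta'],
]

cleaned_records = [[to_text(value) for value in record]
                   for record in sample_records]

for record in cleaned_records:
    print(record)
```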

From here we can identify the index for a particular province, say 'Alberta'. Note the difference in syntax between **.records()** and **.record(i)**:

In [45]:
in_file.record(3)

['Alberta', 'Alberta']
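Counting down the list by eye works for thirteen records, but the index can also be looked up. The sketch below uses the English names copied from the records output above. The helper `find_record_index` is a hypothetical name for this example.

```python
# English names in the order they appear in the records output above.
english_names = [
    'Quebec', 'Nova Scotia', 'Saskatchewan', 'Alberta',
    'Newfoundland and Labrador', 'British Columbia', 'New Brunswick',
    'Prince Edward Island', 'Yukon Territory', 'Manitoba', 'Ontario',
    'Nunavut', 'Northwest Territories',
]


def find_record_index(names, wanted):
    """Return the position of `wanted` in `names`, or raise ValueError."""
    for index, name in enumerate(names):
        if name == wanted:
            return index
    raise ValueError("no record named %r" % wanted)


alberta_index = find_record_index(english_names, 'Alberta')
print(alberta_index)
```

With a live reader, the same list of names could be built from the first value of each record.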

One of the properties of the object instance is the 'bounding box', which is the smallest rectangle (aligned with the coordinate axes) that can be drawn around the object. That is, if you were to draw a four-cornered box to contain Canada, it would be defined by the values `[xmin, ymin, xmax, ymax]`:

In [46]:
in_file.bbox

[-2314694.546259914, 321591.8552395543, 3093024.6057310738, 4811136.8047284745]

Except in this case they don't make much sense to us without knowing the projection. When QGIS reads these points together with a metadata file, it then knows where Canada fits on the map canvas. These points are important, though, because you may want to check whether features from another file are contained inside this bounding box.
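The containment check can be sketched in a few lines. It assumes both boxes are given as `[xmin, ymin, xmax, ymax]` and are in the same projection. The two 'feature' boxes are made-up numbers for illustration, and `bbox_contains` is a hypothetical helper.

```python
# Bounding box of the Canada file, copied from the output above.
canada_bbox = [-2314694.546259914, 321591.8552395543,
               3093024.6057310738, 4811136.8047284745]


def bbox_contains(outer, inner):
    """Return True if the box `inner` lies entirely within `outer`.

    Both boxes are [xmin, ymin, xmax, ymax] in the same coordinate system.
    """
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


# Hypothetical feature boxes, invented for this example.
feature_inside = [0.0, 1000000.0, 500000.0, 2000000.0]
feature_straddling = [3000000.0, 4000000.0, 3200000.0, 4500000.0]

print(bbox_contains(canada_bbox, feature_inside))
print(bbox_contains(canada_bbox, feature_straddling))
```

A bounding box test is only a first filter: a feature inside Canada's box is not necessarily inside Canada's outline.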

The outlines / polygons of the provinces are known as shapes, and in this case there is one 'shape' for each province / territory:

In [63]:
len(in_file.shapes())

13

Again, we can select just the shape we want, and we will access the data for Alberta:

In [69]:
shapes = in_file.shapes()
shapes[3]

<shapefile._Shape at 0x7f2cb23d5550>
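The printed object does not tell us much by itself. In pyShp a shape carries a flat list of `points` plus a `parts` list giving the index at which each ring (part) of the polygon starts. The sketch below shows how those two lists fit together, using made-up coordinates rather than Alberta's real outline. The helper `split_into_parts` is hypothetical.

```python
def split_into_parts(points, parts):
    """Split a flat list of points into rings using the start indices."""
    # Each part ends where the next one starts; the last ends at the end.
    ends = list(parts[1:]) + [len(points)]
    return [points[start:end] for start, end in zip(parts, ends)]


# Made-up polygon: a square ring followed by a triangular ring.
demo_points = [
    (0, 0), (0, 10), (10, 10), (10, 0), (0, 0),   # ring 1, starts at index 0
    (2, 2), (4, 2), (3, 4), (2, 2),               # ring 2, starts at index 5
]
demo_parts = [0, 5]

rings = split_into_parts(demo_points, demo_parts)
for ring in rings:
    print(len(ring), ring[0], ring[-1])
```

Each ring starts and ends on the same point, which is how a polygon outline is closed.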

# <a id='readings'></a>

## Suggested Readings

# <a id='references'></a>

## References

- Python pyShp library: https://pypi.python.org/pypi/pyshp. Python Software Foundation. 2015.

- Lawhead, J; *"Learning Geospatial Analysis with Python Second Edition"*. Packt Publishing. 2015

- *"ESRI Shapefile Technical Description"*. Environmental Systems Research Institute, Inc. 1998.

- Lawhead, J; <a href='http://geospatialpython.com/2015/05/clipping-shapefile-in-pure-python.html'>*"Clipping a Shapefile in Pure Python"*</a> . GeospatialPython. 2015