# Getting Started with Pandas and GeoPandas

## Introduction 
`Pandas` is a Python library for data analysis. It provides high-performance data structures and data analysis tools.

`GeoPandas` extends the datatypes used by `pandas` to allow spatial operations on geometric types.

- `Pandas` has tools for importing and exporting data in many formats: comma-separated values (CSV), text files, Microsoft Excel, SQL databases, and the fast HDF5 format.
- `GeoPandas` reads data from file formats containing both data and geometry, e.g. GeoPackage, GeoJSON, Shapefile.

For more information see [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html) and [GeoPandas](https://geopandas.readthedocs.io/en/latest/getting_started/introduction.html).

These exercises introduce many `Pandas` and `GeoPandas` tools through an analysis of data from the Swedish Forest Agency, [Skogsstyrelsen](https://www.skogsstyrelsen.se/).

## Import Libraries

Start by importing the libraries we need: `pandas` and `geopandas`, together with `numpy` and `matplotlib`.
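The cell below is a minimal sketch of the imports, using the conventional aliases `np`, `pd`, `gpd` and `plt`. These aliases are a common convention, not a requirement. Printing the versions is optional, but it helps when your results differ from the online documentation.

```python
# Conventional aliases for the libraries used throughout these exercises
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Print library versions; useful when comparing behaviour with the documentation
for lib in (np, pd, gpd):
    print(lib.__name__, lib.__version__)
```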

## Download Data Files
The learner is provided with the following files:
- sksNyckelBiotoper_shapefiles
- attribute_table.csv
- rutor_shapefiles
- sweden_map_shapefiles
- Grouping_of_biotopes.csv
- laser_files

- To Do:
1. For this set of exercises, download attribute_table.csv, sksNyckelBiotoper_shapefiles, sweden_map_shapefiles and grouping_of_biotopes.csv.

# Reading Files

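## 1. DataFrame
A DataFrame is a two-dimensional labeled data structure in `Pandas`: a table with labeled rows (the index) and labeled columns, where each column can hold a different data type.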

Let's start with **attribute_table.csv**:
- This file contains the attributes of key-biotopes (Nyckelbiotoper) in Sweden.
- Key-biotopes are forests of high conservation value, mapped by the Swedish Forest Agency through field surveys. The database holds approximately 67 000 areas. Each area is stored as a polygon outlining the area the field officer judged to have high conservation value.
- For each polygon, data were gathered in the field on a number of qualities, such as tree-species composition, ground-cover vegetation and habitat type. Habitats are grouped into approximately 50 different types, based largely on tree-species composition, topography and proximity to water. See [here](https://www.skogsstyrelsen.se/miljo-och-klimat/biologisk-mangfald/nyckelbiotoper/biotoptyper/) for more information about these habitat types.

- To Do:
1. Load **attribute_table.csv** into a DataFrame.
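Here is a minimal sketch of loading a CSV file with `pd.read_csv`. To keep the snippet self-contained, it reads a tiny made-up sample through `io.StringIO` instead of the real file. The column names `Län`, `Biotop1`, `Biotop2` and `Biotop3` come from the exercises below. The `Id` column and all values are hypothetical. With the real data, pass the file path instead. If the columns come out garbled, the actual file may use a different separator (`sep=`) or encoding (`encoding=`) than the defaults assumed here.

```python
import io

import pandas as pd

# Hypothetical stand-in for attribute_table.csv: made-up values, column names from the exercises
sample_csv = io.StringIO(
    "Id,Län,Biotop1,Biotop2,Biotop3\n"
    "1,Uppsala,hab_A,hab_B,\n"
    "2,Uppsala,hab_B,,\n"
    "3,Kalmar,hab_A,hab_C,hab_B\n"
    "4,Norrbotten,,,\n"
)

# With the real file: attributes = pd.read_csv("attribute_table.csv")
attributes = pd.read_csv(sample_csv)

print(attributes.head())   # first rows
print(attributes.dtypes)   # inferred type of each column; empty cells become NaN
```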

## Explore The Data

- To Do:
1. Explore and get descriptive statistics of all columns.
    - Make use of `describe` and `plot` functions.
2. How many key-biotopes are there in each county ('Län')?
    - Make use of `value_counts` function.
3. Find the most common habitat type in the key-biotopes. Up to three habitats are registered for each key-biotope, in the columns 'Biotop1', 'Biotop2' and 'Biotop3'.
    - Make use of `stack` and `value_counts` functions.
4. Remove key-biotopes that have no habitats registered from the dataset.
    - Make use of `dropna` function.
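The four steps above can be sketched as follows, again on a small made-up sample with the exercise's column names. One interpretive assumption: a key-biotope counts as having "no habitats registered" only when all three `Biotop` columns are empty, which is what `dropna(..., how="all")` implements. `dropna()` is also called explicitly after `stack()`, because newer pandas versions can keep missing values when stacking.

```python
import io

import pandas as pd

# Same hypothetical sample as in the loading step (made-up values)
attributes = pd.read_csv(io.StringIO(
    "Id,Län,Biotop1,Biotop2,Biotop3\n"
    "1,Uppsala,hab_A,hab_B,\n"
    "2,Uppsala,hab_B,,\n"
    "3,Kalmar,hab_A,hab_C,hab_B\n"
    "4,Norrbotten,,,\n"
))
biotope_cols = ["Biotop1", "Biotop2", "Biotop3"]

# 1. Descriptive statistics; include="all" also summarises the text columns
print(attributes.describe(include="all"))

# 2. Number of key-biotopes per county, plus a quick bar chart
per_county = attributes["Län"].value_counts()
print(per_county)
per_county.plot(kind="bar", title="Key-biotopes per county")

# 3. Stack the three habitat columns into one long Series, drop empty slots, count types
habitat_counts = attributes[biotope_cols].stack().dropna().value_counts()
print(habitat_counts)

# 4. Keep rows with at least one habitat registered (assumption: all three empty = none)
cleaned = attributes.dropna(subset=biotope_cols, how="all")
print(len(attributes), "->", len(cleaned))
```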

## 2. GeoDataFrame

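A GeoDataFrame is a data structure in `GeoPandas`. It is a `Pandas` DataFrame with an extra geometry column holding shapes such as points, lines or polygons, together with a coordinate reference system (CRS) that gives those shapes their location on Earth.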
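To illustrate the idea, the sketch below attaches a geometry column to an ordinary DataFrame with `gpd.GeoDataFrame`. The rectangles, their coordinates and the attribute values are made up. EPSG:3006 is the code the exercises give for the key-biotope data. Because that CRS uses metres, `.area` returns square metres.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# An ordinary DataFrame with made-up attributes
df = pd.DataFrame({"Id": [1, 2], "Län": ["Uppsala", "Kalmar"]})

# Hypothetical rectangular polygons, coordinates in metres (EPSG:3006)
polygons = [
    box(650000, 6640000, 650500, 6640300),  # 500 m x 300 m
    box(590000, 6280000, 590200, 6280400),  # 200 m x 400 m
]

# Adding a geometry column and a CRS turns the table into a GeoDataFrame
gdf = gpd.GeoDataFrame(df, geometry=polygons, crs="EPSG:3006")

print(type(gdf).__name__)  # GeoDataFrame
print(gdf.geometry.name)   # geometry
print(gdf.area)            # polygon areas in square metres
```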

Let's now study the **sksNyckelBiotoper** shapefile:
- This shapefile contains the **locations** and attributes of key-biotopes in Sweden, i.e. all the data in **attribute_table.csv** plus the location of each key-biotope.

- To Do:
1. Read the sksNyckelBiotoper shapefile into a GeoDataFrame and explore it.
2. Remove key-biotopes that have no habitats registered from the dataset.
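Here is a sketch of both steps. The real call is `gpd.read_file`, shown as a comment because the exact `.shp` filename inside the downloaded folder is not given here. The cleaning function is run on a small in-memory GeoDataFrame with made-up values. It assumes the shapefile keeps the column names `Biotop1` to `Biotop3`. It also assumes that missing habitats can be stored either as missing values or as empty strings, so it treats both as "not registered".

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import box

BIOTOPE_COLS = ["Biotop1", "Biotop2", "Biotop3"]

# Real data (hypothetical path, adjust to the file in sksNyckelBiotoper_shapefiles):
# keybiotopes = gpd.read_file("sksNyckelBiotoper_shapefiles/<name>.shp")
# print(keybiotopes.head()); keybiotopes.info()

def drop_without_habitat(gdf, cols=BIOTOPE_COLS):
    """Return a copy that keeps only rows with at least one habitat registered."""
    gdf = gdf.copy()
    # Treat empty strings as missing too (assumption about how gaps are stored)
    gdf[cols] = gdf[cols].replace("", np.nan)
    return gdf.dropna(subset=cols, how="all")

# Demonstration on a tiny made-up GeoDataFrame
demo = gpd.GeoDataFrame(
    {"Biotop1": ["hab_A", None], "Biotop2": ["hab_B", ""], "Biotop3": [None, None]},
    geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10)],
    crs="EPSG:3006",
)
cleaned = drop_without_habitat(demo)
print(len(demo), "->", len(cleaned))
```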

## Plotting on the map

The most relevant information in a shapefile is stored in its geometry, and the best way to get a feel for the data is to plot it.
- To Do:
1. Plot all key-biotopes in the shapefile.
2. On the "map" from (1), what units are shown on the x-axis and the y-axis?
    - Examine the attribute `crs` (coordinate reference system) of the GeoDataFrame.
3. Visualize the key-biotopes over a map of Sweden. Here are two ways for you to do this:
    1. Read sweden_map_shapefiles into a GeoDataFrame and plot it together with the key-biotopes.
    2. Use the `contextily` module to draw a base map.
In both cases, make sure your maps use the same coordinate reference system. Coordinate reference systems are commonly identified by EPSG codes; for example, sksNyckelBiotoper_shapefiles uses EPSG:3006.
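The sketch below follows option 1 under stated assumptions. Both layers are made-up stand-ins: the "Sweden" outline is a rough rectangle in longitude/latitude (EPSG:4326, assumed here for illustration), and the key-biotope is a single square in EPSG:3006. The snippet reads the axis units from the CRS, reprojects the background with `to_crs` so both layers share one CRS, and then draws them on the same axes.

```python
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

# Made-up stand-ins: a rough lon/lat rectangle and one key-biotope square
sweden = gpd.GeoDataFrame(geometry=[box(11.0, 55.3, 24.2, 69.1)], crs="EPSG:4326")
keybiotopes = gpd.GeoDataFrame(
    geometry=[box(650000, 6640000, 660000, 6650000)], crs="EPSG:3006"
)

# Which units do the coordinates use? Inspect the CRS and its axes
print(keybiotopes.crs)
units = [axis.unit_name for axis in keybiotopes.crs.axis_info]
print(units)

# Reproject the background map so both layers share the key-biotopes' CRS
sweden = sweden.to_crs(keybiotopes.crs)

fig, ax = plt.subplots(figsize=(6, 10))
sweden.plot(ax=ax, color="lightgrey", edgecolor="black")
keybiotopes.plot(ax=ax, color="darkgreen")
ax.set_xlabel("Easting")
ax.set_ylabel("Northing")
```

For option 2, `contextily.add_basemap(ax, crs=keybiotopes.crs)` can be called after plotting. It downloads web map tiles, so it needs network access, and the `crs` argument tells it to reproject the tiles to match your data.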

# Groupings

We have seen that key-biotopes contain habitats that are grouped into approximately 50 different types. For a general analysis, however, it may be useful to consolidate these types into fewer groups. One example of such a grouping is provided in 'grouping_of_biotopes.csv'.

- To Do:
1. Read **'grouping_of_biotopes.csv'** using `Pandas` tools. Build a dictionary that maps each habitat type to its habitat group.
2. Transform habitat types in the 3 columns 'Biotop1', 'Biotop2' and 'Biotop3' to their corresponding habitat groups. Add 3 new columns 'BioGrp1', 'BioGrp2' and 'BioGrp3' to hold these groups.
3. Find the most common habitat group in the key-biotopes. Avoid double counting: if two or three habitats of one key-biotope belong to the same group, count that group only once for that key-biotope.
4. (Bonus) Visualize the distribution of the top 5 habitat groups on the map. Where are deciduous hardwood forests (Ädellövskogar) concentrated? How does their distribution compare to that of coniferous forests (Barrskogar)?
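A sketch of steps 1 to 3 follows, under explicit assumptions. The layout of grouping_of_biotopes.csv is not described here, so the column names `Biotop` and `Grupp` are hypothetical. The habitat types and key-biotopes are made up, and only the group names come from the text above. Types missing from the dictionary become NaN after `map`. To avoid double counting, the groups are reduced to unique values per key-biotope before counting.

```python
import io

import pandas as pd

# Hypothetical layout of grouping_of_biotopes.csv (column names are assumptions)
grouping = pd.read_csv(io.StringIO(
    "Biotop,Grupp\n"
    "hab_A,Ädellövskogar\n"
    "hab_B,Barrskogar\n"
    "hab_C,Ädellövskogar\n"
))

# 1. Dictionary mapping habitat type -> habitat group
type_to_group = dict(zip(grouping["Biotop"], grouping["Grupp"]))

# Made-up key-biotopes using the exercise's column names
keybiotopes = pd.DataFrame({
    "Biotop1": ["hab_A", "hab_B", "hab_A", "hab_B"],
    "Biotop2": ["hab_C", None, "hab_B", None],
    "Biotop3": [None, None, "hab_C", None],
})

# 2. Translate each habitat column into a new group column
for i in (1, 2, 3):
    keybiotopes[f"BioGrp{i}"] = keybiotopes[f"Biotop{i}"].map(type_to_group)

# 3. One long Series indexed by key-biotope, then unique groups per key-biotope
group_cols = ["BioGrp1", "BioGrp2", "BioGrp3"]
groups_long = keybiotopes[group_cols].stack().dropna().droplevel(1)
group_counts = groups_long.groupby(level=0).unique().explode().value_counts()
print(group_counts)
```

Counting without the per-row `unique()` step would credit Ädellövskogar twice for key-biotopes whose hab_A and hab_C habitats both map to it. That is the double counting the exercise asks you to avoid.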

# Write

In the next exercises we will analyse key-biotopes that contain certain habitat groups. Let's write the GeoDataFrame that includes the habitat groups of each key-biotope to a shapefile and keep it for later analysis.

- To Do:
1.  Write the GeoDataFrame resulting from the previous section into a shapefile called 'keybiotopes_habitatgroups.shp'