# Programmatically Reading, Writing, Editing, and Converting Data Files
#### 11/7/2017 | Hunter Heaivilin | Sprint 1
#### Description
Learning how to programmatically read, write, edit, and convert data files so that I can convert back and forth between data formats.

## Skill Backlog User Story
As a modeler, I need to programmatically read, write, edit, and convert data files so that I can convert back and forth between data formats and make the data interact.

## Developing Project Proposals

#### Data Formats I Work With
- Shapefiles
- GRID files
- Python Scripts
- Databases on websites without APIs (e.g., [The Agroforestree Database](http://www.worldagroforestry.org/output/agroforestree-database), [Hawaii Agriculture & Food Products Database](https://hawaiiagrproducts.hawaii.gov/s/), [FAO's EcoCrop Database](http://ecocrop.fao.org/ecocrop/srv/en/home))
- Data buried in websites without APIs
- Data with APIs (e.g., [USDA Soil Data Access API](https://www.programmableweb.com/api/usda-soil-data-access), [USDA National Agricultural Statistics Service](https://www.nass.usda.gov/developer/index.php), [etc](https://catalog.data.gov/organization/4ae51f6c-467a-4f9d-b40a-2c52e83c326a?groups=ecosystems0617&harvest_source_title=USDA+JSON))
- Data buried in PDFs (e.g., [Traditional Tree Profiles](http://agroforestry.org/free-publications/traditional-tree-profiles))
- CSV files (e.g., outputs of [Farm Link Hawaii](http://farmlinkhawaii.com/) sales and users)


#### Potential Projects
1. Convert geospatial data to other formats (see [GIS with Python, Shapely, and Fiona](https://macwright.org/2012/10/31/gis-with-python-shapely-fiona.html)).
2. Look at the different ways to interact with geospatial files in a GUI vs. R vs. Python.
3. Run spatial analysis to cross-tabulate species suitability areas for multiple crops by island through different means (e.g., ArcGIS vs. Python vs. R), and create a streamlined process to do similar analysis in the future (e.g., ArcGIS ModelBuilder or a script).
4. Geocode the [2017 Dedicated Agricultural Parcels list](https://www.realpropertyhonolulu.com/media/1465/ag.pdf) from site address (or TMK, if good data can be found) to a CSV with latitude and longitude (via Geocodio or similar), then map it with Python using Shapely and Fiona.
5. Do the same using the [Organic Integrity Database API](https://organic.ams.usda.gov/integrity/Developer/APIHelp.aspx) to map certified organic farms in the state.
6. Convert some ~~shapefile~~ GIS file formats (potentially [local rainfall](http://rainfall.geography.hawaii.edu/downloads.html), which is in GRID format, not a shapefile) into CSV with lat/long points using Python or R.

## Selected Project
Run spatial analysis to cross-tabulate species suitability areas for multiple crops by island through different means (e.g., ArcGIS vs. Python vs. R), and create a streamlined process to do similar analysis in the future (e.g., ArcGIS ModelBuilder or a script).
- Ben suggests finding another kind of data to incorporate or do conversion into additional new data types
- Need to spit out a table of areas by island
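
A minimal sketch of what "a table of areas by island" could look like in plain Python. It assumes each suitability polygon is already available as a closed ring of projected (planar) coordinates tagged with an island and a crop; the function names and the sample records are hypothetical, and a real workflow would more likely lean on ArcGIS, GeoPandas, or R for the geometry.

```python
from collections import defaultdict


def ring_area(ring):
    """Planar area of a closed ring (first point == last point) via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def crosstab_area(records):
    """Sum polygon areas into a nested {island: {crop: area}} table."""
    table = defaultdict(lambda: defaultdict(float))
    for island, crop, ring in records:
        table[island][crop] += ring_area(ring)
    return {island: dict(crops) for island, crops in table.items()}


# Hypothetical sample data: a 10 x 10 square in projected units.
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
records = [
    ("Oahu", "taro", square),
    ("Oahu", "taro", square),
    ("Maui", "coffee", square),
]

for island, crops in crosstab_area(records).items():
    print(island, crops)
```

The shoelace formula only gives meaningful areas for projected coordinates, so latitude/longitude data would need to be reprojected first.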

## Key Questions
- Understand the basic structure of ~~a shapefile~~ [GIS file formats](https://en.wikipedia.org/wiki/GIS_file_formats), their limitations, and the libraries for loading them
- Look at conversion tools for various formats (e.g., raster/polygon, JSON, CSV) and explore how they work and what, if anything, is lost
- Document the starting file structure and how it changes through the process
- Understand how to convert shapefiles into CSV that can be read by other tools
- Understand how to connect to APIs and convert data into mapped outputs
- Identify multiple processes to perform and automate spatial analysis tasks outside of a GIS GUI


## Key Findings
- Libraries
    - [Intro to Geospatial Data using Python](https://github.com/SocialDataSci/Geospatial_Data_with_Python/blob/master/Intro%20to%20Geospatial%20Data%20with%20Python.ipynb)
    - [Python Geocoder](https://github.com/DenisCarriere/geocoder)
    - [Essential Python Geospatial Libraries](https://github.com/SpatialPython/spatial_python/blob/master/packages.md)

### Shapefile Basics

#### [ESRI Shapefile Technical Description](https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf)

>**The shapefile (.shp) is a spatial data format that "stores nontopological geometry and attribute information for the spatial features in a data set.** The geometry for a feature is stored as a shape comprising a set of vector coordinates. Because shapefiles do not have the processing overhead of a topological data structure, they have advantages over other data sources such as faster drawing speed and edit ability. **Shapefiles handle single features that overlap or that are noncontiguous.** They also typically require less disk space and are easier to read and write. **Shapefiles can support point, line, and area features.** Area features are represented as closed loop, double-digitized polygons. Attributes are held in a dBASE® format file."


#### An ESRI shapefile consists of:
- A **main file** (suffix .shp). The main file is a direct access, variable-record-length file in which each record describes a shape with a list of its vertices.
- An **index file** (suffix .shx). In the index file, each record contains the offset of the corresponding main file record from the beginning of the main file.
- A **dBASE table** (suffix .dbf). The dBASE table contains feature attributes with one record per feature. The one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file.
- **The main file, the index file, and the dBASE file have the same prefix.** For example, counties.shp, counties.shx, and counties.dbf. The prefix must start with an alphanumeric character (a–Z, 0–9), followed by zero or up to seven characters (a–Z, 0–9, _, -).
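
Because the three files must share a prefix, a quick check before loading can catch an incomplete shapefile. This is a minimal sketch using only the standard library; the function name `missing_shapefile_parts` is hypothetical, and it assumes the path passed in ends with `.shp`.

```python
from pathlib import Path

# The three files named in the ESRI description above.
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the names of required companion files that do not exist on disk."""
    shp_path = Path(shp_path)
    missing = []
    for suffix in REQUIRED_SUFFIXES:
        # with_suffix swaps only the final extension, so the shared prefix is kept.
        candidate = shp_path.with_suffix(suffix)
        if not candidate.exists():
            missing.append(candidate.name)
    return missing
```

Calling `missing_shapefile_parts("counties.shp")` in a folder that holds only `counties.shp` and `counties.dbf` would report `counties.shx` as missing.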




## Gameplan
Here is my overall approach:
1. Step 1: Convert the shapefile to various formats: GeoJSON, CSV
2. Step 2: Perform geoprocesses in ArcGIS, Python, and R
3. Step 3: 
4. Step 4: 

**Possible deliverables**
- description of .shp files
- conversion of .shp files to GeoJSON, and illustration / comparison of the two file formats
- conversion of GeoJSON to XML
- conversion of .shp data tables to XML/JSON/CSV
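
As a rough sketch of the table-conversion deliverable, the snippet below flattens a GeoJSON FeatureCollection of points into CSV text with `lon` and `lat` columns, using only the standard library. The function name and the sample feature are hypothetical, and it assumes every feature has a Point geometry; lines and polygons would need a different flattening rule.

```python
import csv
import io
import json


def geojson_points_to_csv(geojson_text):
    """Flatten Point features into CSV rows: lon, lat, then one column per property."""
    features = json.loads(geojson_text)["features"]

    # Collect every property name so features with differing attributes share one header.
    fieldnames = []
    for feature in features:
        for key in feature.get("properties") or {}:
            if key not in fieldnames:
                fieldnames.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["lon", "lat"] + fieldnames, lineterminator="\n")
    writer.writeheader()
    for feature in features:
        # GeoJSON stores coordinates as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"][:2]
        row = {"lon": lon, "lat": lat}
        row.update(feature.get("properties") or {})
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample input.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-157.8, 21.3]},
        "properties": {"name": "Honolulu"},
    }],
})
print(geojson_points_to_csv(sample))
```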

---

## Day 1 Work




- description of .shp files
    - See above
- conversion of .shp files to GeoJSON
- illustration / comparison of the two file formats
    - GeoJSON is a JSON file with geospatial components
    - JSON (JavaScript Object Notation)
        - *date values are supported by shapefiles, but JSON has no native date type*
    - Shapefiles are a package of files, the primary ones stored in binary
    - JSON files, on the other hand, are human-readable plain text
- conversion of GeoJSON to XML
- conversion of .shp data tables to XML/JSON/CSV
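
The date limitation noted in the comparison above is easy to see from Python: the standard `json` module refuses to serialize a `date` unless told how. A minimal sketch, assuming ISO 8601 strings are an acceptable stand-in; the helper names and the sample record are hypothetical.

```python
import json
from datetime import date


def encode_date(value):
    """Fallback for json.dumps: turn dates into ISO 8601 strings."""
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def dumps_with_dates(obj):
    """Serialize to JSON, writing any date values as strings."""
    return json.dumps(obj, default=encode_date)


# Hypothetical attribute record with a date field.
record = {"surveyed": date(2017, 11, 7)}

try:
    json.dumps(record)
except TypeError as err:
    print("plain json.dumps fails:", err)

print(dumps_with_dates(record))
```

The date comes back as an ordinary string when the JSON is read again, so whatever consumes the file has to know which fields to parse as dates.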




### What I Wanted to Happen

Convert shapefile into GeoJSON

This is what I tried to do on Day 1. Here's some code.

Read up on GeoJSON file format [here](https://macwright.org/2015/03/23/geojson-second-bite.html)
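
To make the binary-to-text conversion concrete, here is a standard-library sketch that reads the bytes of a Point-type `.shp` main file and builds a GeoJSON FeatureCollection. It is deliberately limited: it handles only Point (type 1) and null shapes, ignores the `.dbf` attributes, and the function name is hypothetical. For real data, a library such as GeoPandas, Fiona, or a tool like Mapshaper is the practical route.

```python
import struct


def point_shp_to_geojson(shp_bytes):
    """Convert the bytes of a Point-type .shp main file into a GeoJSON dict."""
    # 100-byte header: file code (big-endian) at byte 0, shape type (little-endian) at byte 32.
    file_code = struct.unpack(">i", shp_bytes[0:4])[0]
    shape_type = struct.unpack("<i", shp_bytes[32:36])[0]
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    if shape_type != 1:
        raise ValueError("this sketch handles only Point shapefiles (type 1)")

    features = []
    pos = 100
    while pos + 8 <= len(shp_bytes):
        # Record header: record number and content length in 16-bit words, both big-endian.
        rec_num, content_words = struct.unpack(">ii", shp_bytes[pos:pos + 8])
        content = shp_bytes[pos + 8:pos + 8 + content_words * 2]
        rec_shape_type = struct.unpack("<i", content[0:4])[0]
        if rec_shape_type == 1:
            # Point record: X and Y as little-endian doubles.
            x, y = struct.unpack("<dd", content[4:20])
            geometry = {"type": "Point", "coordinates": [x, y]}
        else:
            geometry = None  # null shape
        features.append({
            "type": "Feature",
            "id": rec_num,
            "geometry": geometry,
            "properties": {},  # attributes live in the .dbf, which is not read here
        })
        pos += 8 + content_words * 2
    return {"type": "FeatureCollection", "features": features}
```

Passing the result to `json.dumps` gives the human-readable text form, which is a handy way to compare the two formats side by side.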

### What Actually Happened




**Setting up a virtual environment in conda**

Followed ["Create virtual environments for python with conda"](https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/) to create a virtual env in the terminal, in an attempt to get the ["Reading Shapefiles from a URL into GeoPandas"](https://github.com/agaidus/census_data_extraction/blob/master/Reading_Zipped_Shapefiles.ipynb) notebook working.

## Day 2 Work

This is what I tried to do on Day 2. Here's some more code

## Peer Feedback on Day 3

After talking it over with a peer, I received the following feedback and decided to make these changes

## Here are some overall notes on the skills I learned
And perhaps some stream of consciousness notes about what I did, and other questions I might have

["Mapshaper.org"](http://mapshaper.org/) was able to quickly convert my shapefile to a JSON file.

Notebook cell `In [24]`: `filehome + "/blockgroups.shp"` returned `'hh_01_files/shapefilesblockgroups.shp'`.
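
The path in the notebook output above has no separator between what looks like the folder and the file name. One way to avoid that kind of slip is to join path pieces with `pathlib` instead of string concatenation. This is only a sketch: the function name is hypothetical, and the directory value `hh_01_files/shapefiles` is an assumption inferred from the output, since the actual value of `filehome` is not shown.

```python
from pathlib import PurePosixPath


def build_shapefile_path(directory, filename):
    """Join a directory and a relative file name with a single separator."""
    return str(PurePosixPath(directory) / filename)


# "hh_01_files/shapefiles" is an assumed directory, inferred from the notebook output.
print(build_shapefile_path("hh_01_files/shapefiles", "blockgroups.shp"))
```

`PurePosixPath` is used here so the result always has forward slashes; plain `Path` would follow the conventions of the operating system running the notebook.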