# Programmatically Reading, Writing, Editing, and Converting Data Files
#### 11/7/2017 | Hunter Heaivilin | Sprint 1
#### Description
Learning how to programmatically read, write, edit, and convert data files so that I can convert back and forth between data formats.

## Skill Backlog User Story
As a modeler, I need to programmatically read, write, edit, and convert data files so that I can convert back and forth between data formats and make the data interact.

## Developing Project Proposals (2017-11-08)

#### Data Formats I Work With
- Shapefiles
- GRID files
- Python Scripts
- CSV files (e.g., outputs of [Farm Link Hawaii](http://farmlinkhawaii.com/) sales and users)

#### Datasets I'd Like to Work With
- Databases on websites without APIs (e.g., [The Agroforestree Database](http://www.worldagroforestry.org/output/agroforestree-database), [Hawaii Agriculture & Food Products Database](https://hawaiiagrproducts.hawaii.gov/s/), [FAO's EcoCrop Database](http://ecocrop.fao.org/ecocrop/srv/en/home))
- Data buried in websites without APIs
- Data with APIs (e.g., [USDA Soil Data Access API](https://www.programmableweb.com/api/usda-soil-data-access), [USDA National Agricultural Statistics Service](https://www.nass.usda.gov/developer/index.php), [etc](https://catalog.data.gov/organization/4ae51f6c-467a-4f9d-b40a-2c52e83c326a?groups=ecosystems0617&harvest_source_title=USDA+JSON))
- Data buried in pdfs (e.g., [Traditional Tree Profiles](http://agroforestry.org/free-publications/traditional-tree-profiles))



#### Potential Projects
1. Convert geospatial data to other formats [GIS with Python, Shapely, and Fiona](https://macwright.org/2012/10/31/gis-with-python-shapely-fiona.html) 
2. Look at the different ways to interact with geospatial files in a GUI vs. R vs. Python.
3. Run spatial analysis to cross-tabulate species suitability areas for multiple crops by island through different means (e.g., ArcGIS vs Python vs R); and create a streamlined process to do similar analysis in the future (e.g., ArcGIS ModelBuilder or script).
4. Geocode the [2017 Dedicated Agricultural Parcels list](https://www.realpropertyhonolulu.com/media/1465/ag.pdf) from site address (or TMK, if I can find good data) to a CSV with lat/long (via Geocodio or similar) and map it with Python using Shapely and Fiona.
5. Do the same using the [Organic Integrity Database API](https://organic.ams.usda.gov/integrity/Developer/APIHelp.aspx) to map certified organic farms in the state.
6. Convert some ~~shapefile~~ GIS file formats (potentially [local rainfall](http://rainfall.geography.hawaii.edu/downloads.html), which is in GRID format, not a shapefile) into CSV with lat/long points using Python or R.

## Initially Selected Project (2017-11-09)
**Run spatial analysis to cross-tabulate species suitability areas for multiple crops by island through different means (e.g., ArcGIS vs Python vs R); and create a streamlined process to do similar analysis in the future (e.g., ArcGIS ModelBuilder or script).**
- Ben suggests finding another kind of data to incorporate or do conversion into additional new data types
- Need to spit out a table of areas by island

### Key Questions
- Need to understand the basics of the structure of ~~a shapefile~~ [GIS file formats](https://en.wikipedia.org/wiki/GIS_file_formats), limitations, loading libraries
- Look at conversion tools for various formats (e.g., raster/vector, JSON, csv, etc) and explore how they work and what, if anything, is lost. 
- Document starting file structure and how it changes through the process.
- Understand how to convert shapefiles into csv that can be read by other tools
- Understand how to connect to APIs and convert data into mapped outputs
- Identify multiple processes to perform and automate spatial analysis tasks outside of a GIS GUI


## Key Findings
- Libraries
    - [Intro to Geospatial Data using Python](https://github.com/SocialDataSci/Geospatial_Data_with_Python/blob/master/Intro%20to%20Geospatial%20Data%20with%20Python.ipynb)
    - [Python Geocoder](https://github.com/DenisCarriere/geocoder)
    - [Essential Python Geospatial Libraries](https://github.com/SpatialPython/spatial_python/blob/master/packages.md)
  
### Helpful Tools
- [Markdown Cheatsheet]()





## Gameplan
Here is my overall approach 
1. Step 1: Convert shapefile to various formats: GeoJSON, csv
2. Step 2: Perform geoprocesses in ArcGIS, Python, and R
3. Step 3: 
4. Step 4: 

**Possible deliverables**
- description of .shp files
- conversion of shp files to geojson, and illustration / comparison of two file formats
- conversion of geojson to xml
- conversion of .shp data tables to xml/json/csv
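
As a rough sketch of the last deliverable, the attribute table of a GeoJSON FeatureCollection can be flattened to CSV using only the Python standard library. The function name `features_to_csv` and the sample attributes (`crop`, `acres`) are made up for illustration, and the sketch assumes Point geometries so that each row gets a single lon/lat pair.

```python
import csv
import io
import json


def features_to_csv(geojson_text):
    """Flatten a GeoJSON FeatureCollection of Points into CSV text."""
    features = json.loads(geojson_text)["features"]

    # Collect every attribute name so features with missing keys still fit.
    fieldnames = []
    for feature in features:
        for key in feature.get("properties") or {}:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["lon", "lat"] + fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    for feature in features:
        # GeoJSON stores coordinates as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"][:2]
        row = {"lon": lon, "lat": lat}
        row.update(feature.get("properties") or {})
        writer.writerow(row)
    return buffer.getvalue()


# Invented sample data, only for the demo.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-157.85, 21.3]},
         "properties": {"crop": "Taro", "acres": 12.5}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-155.5, 19.6]},
         "properties": {"crop": "Coffee"}},
    ],
})
csv_text = features_to_csv(sample)
print(csv_text)
```

Polygons would need an extra decision (for example, writing a centroid or one row per vertex), which this sketch leaves out.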

---

## Day 1 Work (2017-11-09)

### Getting to Know Shapefiles

The shapefile (.shp) is a very common vector data format for geospatial analysis in Geographic Information Systems (GIS). It was developed by [Esri](http://www.esri.com/about-esri), which also created [ArcGIS](https://www.arcgis.com/features/index.html).

GIS software commonly uses both vector and raster data formats. Vector data uses points with lat/long coordinates, lines (ordered sequences of points), and areas/polygons (closed rings of points) to represent discrete features and boundaries.



The [ESRI Shapefile Technical Description](https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf) offers the following key bits:

- **The shapefile (.shp) is a spatial data format that "stores nontopological geometry and attribute information for the spatial features in a data set."**
- **Shapefiles handle single features that overlap or that are noncontiguous.** 
- **Shapefiles can support point, line, and area features.** 
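
Which kind of feature a given file holds is recorded in the fixed 100-byte header of the main .shp file. The sketch below decodes that header with Python's `struct` module, following the byte layout in the Esri technical description. The function name `read_shp_header` and the example header values are illustrative, not taken from a real dataset, and only the basic 2D shape type codes are named.

```python
import struct

# Basic 2D shape type codes from the Esri technical description.
SHAPE_TYPES = {0: "Null Shape", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(header):
    """Decode the first 100 bytes of a .shp main file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")

    # File code and file length are stored big-endian.
    (file_code,) = struct.unpack(">i", header[0:4])
    (length_words,) = struct.unpack(">i", header[24:28])
    if file_code != 9994:
        raise ValueError("unexpected file code; this does not look like a .shp file")

    # Version, shape type, and the bounding box are stored little-endian.
    version, shape_type = struct.unpack("<2i", header[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])

    return {
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, "other ({})".format(shape_type)),
        "file_bytes": length_words * 2,  # the length is counted in 16-bit words
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Build a fake header for a polygon file (all values invented for the demo).
fake_header = (
    struct.pack(">7i", 9994, 0, 0, 0, 0, 0, 50)
    + struct.pack("<2i", 1000, 5)
    + struct.pack("<8d", -160.3, 18.9, -154.8, 22.3, 0.0, 0.0, 0.0, 0.0)
)
info = read_shp_header(fake_header)
print(info)
```

With a real file, the same function could be fed the result of opening the .shp in binary mode and reading its first 100 bytes.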


#### An ESRI shapefile consists of:
- A **main file** (suffix .shp) which is a direct access, variable-record-length file in which each record describes a shape with a list of its vertices.  
- An **index file** (suffix .shx) within which each record contains the offset of the corresponding main file record from the beginning of the main file.
- A **dBASE table** (suffix .dbf) that contains feature attributes with one record per feature. The one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file.
- **The main file, the index file, and the dBASE file have the same prefix.** For example, counties.shp, counties.shx, and counties.dbf. The prefix must start with an alphanumeric character (a–Z, 0–9), followed by zero or up to seven characters (a–Z, 0–9, _, -).
- There can also be a projection file (suffix .prj).
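
Because the pieces are tied together only by their shared prefix, a quick check for missing siblings can save some confusion before loading a shapefile. This is a minimal sketch using `pathlib`; the function name `shapefile_components` is hypothetical, and it assumes lowercase suffixes (some datasets ship uppercase ones, which this exact-match check would miss on case-sensitive file systems).

```python
import tempfile
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj",)


def shapefile_components(shp_path):
    """Report which sibling files exist next to a given .shp path."""
    shp_path = Path(shp_path)
    found = {}
    for suffix in REQUIRED + OPTIONAL:
        # Same prefix, different suffix: counties.shp -> counties.dbf, etc.
        found[suffix] = shp_path.with_suffix(suffix).exists()
    missing = [suffix for suffix in REQUIRED if not found[suffix]]
    return found, missing


# Demo in a throwaway directory with the .dbf deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("counties.shp", "counties.shx", "counties.prj"):
        (Path(tmp) / name).touch()
    found, missing = shapefile_components(Path(tmp) / "counties.shp")

print(found)
print(missing)
```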


I explored the shapefile of the [Statewide Agricultural Land Use Baseline 2015 Study](http://hdoa.hawaii.gov/salub/) ([downloaded here](http://planning.hawaii.gov/gis/download-gis-data-expanded/)).


The zipped shapefile (2015AgBaseline.shp.zip) is 11.8 MB, while the unzipped folder (2015AgBaseline.shp) is 19 MB.
Within the unzipped folder are 11 files:
- **2015AgBaseline.CPG**	
- **2015AgBaseline.dbf** which is a database file containing attributes that can be associated with the shapefile
- **2015AgBaseline.lyr**		
- **2015AgBaseline.prj** which is the layer projection data	
- **2015AgBaseline.sbn**		
- **2015AgBaseline.sbx**
- **2015AgBaseline.shp** which is the shapefile containing the geospatial features
- **2015AgBaseline.shp.xml** which is an xml version of the entire metadata of the shapefile
- **2015AgBaseline.shx**
- **2015AgBaseline_Protocols.pdf** which explains the research protocols
- **aglanduse_2015.pdf** which is the metadata file explaining the shapefile
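
The same kind of inventory, including the gap between zipped and unzipped sizes, can be produced without unpacking anything by using Python's `zipfile` module. This is only a sketch: the helper name `summarize_zip` and the tiny in-memory archive are stand-ins, not the actual 2015AgBaseline download.

```python
import io
import zipfile


def summarize_zip(zip_source):
    """Return (name, compressed_bytes, uncompressed_bytes) for each member."""
    with zipfile.ZipFile(zip_source) as archive:
        return [
            (item.filename, item.compress_size, item.file_size)
            for item in archive.infolist()
        ]


# Build a small demo archive in memory (contents are placeholders).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("demo.shp", b"\x00" * 2000)
    archive.writestr("demo.prj", "placeholder projection text")
buffer.seek(0)

summary = summarize_zip(buffer)
for name, compressed, uncompressed in summary:
    print(name, compressed, uncompressed)
```

Passing a path such as the downloaded .zip file instead of the in-memory buffer should work the same way, since `zipfile.ZipFile` accepts either.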





### What I Wanted to Happen

Convert shapefile into GeoJSON

- description of .shp files
    - See above
- conversion of shp files to geojson
- illustration / comparison of two file formats
    - GeoJSON is a JSON file with geospatial components
    - JSON (JavaScript Object Notation)
        - *date values are supported by Shapefiles, but not in JSON*
    - Shapefiles are a package of files, the primary of which are stored in binary
    - JSON files, on the other hand, are human-readable plain text
- conversion of geojson to xml
- conversion of .shp data tables to xml/json/csv
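
The note about dates is easy to see in practice: Python's `json` module refuses to serialize a date object, so a converter has to pick a text representation. The sketch below uses ISO 8601 strings, which is one common choice rather than anything the formats require; the record and the `encode_dates` helper are invented for illustration.

```python
import datetime
import json

# An invented attribute record with a date field, as a .dbf table might hold.
record = {"crop": "Taro", "surveyed": datetime.date(2015, 6, 30)}

# Plain json.dumps fails because JSON has no date type.
try:
    json.dumps(record)
    failed = False
except TypeError:
    failed = True


def encode_dates(value):
    """Fallback encoder: write dates as ISO 8601 strings."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    raise TypeError("cannot serialize {!r}".format(value))


text = json.dumps(record, default=encode_dates)
print(failed)
print(text)
```

Reading the file back gives a plain string, so whatever consumes the JSON has to know that the field is meant to be a date.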



### What Actually Happened


**Setting up a virtual environment in conda**

Followed ["Create virtual environments for python with conda"](https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/) to create a virtual env in the terminal, in an attempt to get the ["Reading Shapefiles from a URL into GeoPandas"](https://github.com/agaidus/census_data_extraction/blob/master/Reading_Zipped_Shapefiles.ipynb) notebook running.

---

## Day 2 Work (2017-11-14)


### Getting to know JSON & GeoJSON


JSON is a "lightweight, text-based, language-independent data interchange format" ([JSON spec](https://tools.ietf.org/html/rfc7159)). JSON structures data following a set of formatting rules.

GeoJSON is a "geospatial data interchange format based on JSON" that "defines several types of JSON objects and the manner in which they are combined to represent data about geographic features, their properties, and their spatial extents" ([GeoJSON Spec](https://tools.ietf.org/html/rfc7946)).

 
- GeoJSON uses the World Geodetic System 1984 (WGS 84) geographic coordinate reference system (CRS)
- Latitude and longitude are represented in decimal degrees, as opposed to degrees (&deg;), minutes, and seconds
    - Lat and long are [ordered differently](https://macwright.org/lonlat/) in different file types
    - Both GeoJSON and Shapefiles use the order: long lat
- GeoJSON has 7 different "geometry types", each represented as case-sensitive strings:
   - "Point", "MultiPoint", "LineString", "MultiLineString", "Polygon", "MultiPolygon", and "GeometryCollection".
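
To make the points above concrete, here is a minimal sketch that converts degrees, minutes, and seconds to decimal degrees and then builds a GeoJSON Point feature with the coordinates in long, lat order. The helper names and the example coordinates (rounded, roughly in the Honolulu area) are assumptions for illustration only.

```python
import json

# The seven geometry type strings; matching is case-sensitive.
GEOMETRY_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
}


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in decimal degrees.
    return -value if hemisphere in ("S", "W") else value


def point_feature(lon, lat, properties):
    """Build a GeoJSON Feature; note the [longitude, latitude] order."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


lat = dms_to_decimal(21, 18, 0, "N")
lon = dms_to_decimal(157, 51, 0, "W")
feature = point_feature(lon, lat, {"name": "example point"})
print(json.dumps(feature))
```

A string like "point" (lowercase) would not be a valid geometry type, which is why the set above is compared exactly.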



##### More Learning
A far more informed take on GeoJSON file formats can be found [here](https://macwright.org/2015/03/23/geojson-second-bite.html).

[How to convert Shapefiles to GeoJSON maps for use on GitHub (and why you should)](https://ben.balter.com/2013/06/26/how-to-convert-shapefiles-to-geojson-for-use-on-github/) has some good info on the whys of GeoJSON.



I'd like to learn more about:
- [GeoJSON.io](http://geojson.io/), which is a browser-based tool for viewing and editing GeoJSON
- [GDAL (Geospatial Data Abstraction Library)](http://www.gdal.org/), which seems to play a significant role in geospatial analyses with Python.

### Attempting to Convert a Shapefile to GeoJSON
I tried a few different online tools to convert a shapefile of the [Statewide Agricultural Land Use Baseline 2015 Study](http://hdoa.hawaii.gov/salub/) ([downloaded here](http://planning.hawaii.gov/gis/download-gis-data-expanded/)) to GeoJSON.


#### [Mapshaper](http://mapshaper.org/) 
- Mapshaper quickly uploaded my data
- Upon upload it showed the polygons of the entire dataset and could be zoomed into and around
- There are a few export options
- I exported a GeoJSON file, 2015AgBaseline.json, which was 41.3 MB in size


#### [MyGeodata Converter](https://mygeodata.cloud/converter/)
- MyGeodata Converter uploaded my data fairly quickly. 
- Handily, it has a viewer that shows a bounding box of the file's spatial extent overlaid on a basemap
- Unhandily, it has a paywall and would only export a sample of the file
- The file downloaded as mygeodata.zip and unzipped to 2015AgBaseline.geojson, but due to the trimming was only 214 KB

#### [ogr2ogr web client](https://ogre.adc4gis.com/)
- The user must check the 'force download' button or the entire JSON output will load in the browser window.
- The file downloaded as convert.json and was 38.5 MB in size
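
Since the tools produced files of different sizes, one way to check whether the outputs hold the same content is to summarize each one: feature count, geometry types, and attribute names. The sketch below does that with the standard library; `summarize_geojson` is a hypothetical helper and the sample collection is invented. Size differences could come from things like whitespace or coordinate precision rather than missing features, but that is a guess this kind of summary would help confirm or rule out.

```python
import json
from collections import Counter


def summarize_geojson(text):
    """Count features, geometry types, and attribute names in a FeatureCollection."""
    data = json.loads(text)
    features = data.get("features", [])
    geometry_counts = Counter(
        (feature.get("geometry") or {}).get("type", "none") for feature in features
    )
    property_names = sorted(
        {key for feature in features for key in (feature.get("properties") or {})}
    )
    return {
        "features": len(features),
        "geometry_types": dict(geometry_counts),
        "property_names": property_names,
    }


# Invented sample, only for the demo.
sample_collection = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
         "properties": {"crop": "Taro", "island": "Kauai"}},
        {"type": "Feature",
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]]},
         "properties": {"crop": "Coffee", "island": "Hawaii"}},
        {"type": "Feature", "geometry": None, "properties": {"crop": "Unknown"}},
    ],
})
report = summarize_geojson(sample_collection)
print(report)
```

Running the same function over each tool's output (read from disk as text) and comparing the reports would show whether the exports agree.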


--- 


        
### Summary
- Shapefiles are a package of files, the primary of which (.shp, .shx) are stored in binary
- JSON files, on the other hand, are human-readable plain text







**I think between describing that and the shapefile work, your project is in good shape. I think the documentation can be cleaned up, though, to reflect what you're trying to accomplish and the details of the project.**

--- 

## Peer Feedback on Day 3 (2017-11-15)

After talking it over with a peer, I received the following feedback and decided to make these changes

## Here are some overall notes on the skills I learned
And perhaps some stream of consciousness notes about what I did, and other questions I might have

- Learned some of the basics of using terminal

### Markdown Refresher

Went through [notes](https://docs.google.com/document/d/1cfmbCcUbqQZJHqEPifhVT5hC07r0LHNTwIlckaj5gRk/edit#heading=h.qyv3vsnf9q3z) from a three-day [intro to Python](https://github.com/leouieda/python-hawaii-2017) class I went to at UH in the spring.






### Learning the Shell

Started on [Software Carpentry's](https://swcarpentry.github.io) [Unix shell novice](https://swcarpentry.github.io/shell-novice) tutorial, which was exceedingly helpful.

Some of the commands I can recall learning:
- ls
    - ls -R
- mkdir
- cd
    - cd . 
    - cd ..
- nano
- rm
    - rm -r 
    - rm -r -i


**I did get a strange error I haven't been able to sort out:**

When I attempted to run ls --help, I got the following:

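> ls: illegal option -- -
>
> usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]

This looks like the BSD version of ls that ships with macOS, which does not accept GNU-style long options such as --help; running man ls shows its documentation instead.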

In [1]:
pwd

'/Users/hunterheaivilin/Documents/GitHub/BigDataAnalyst_ProjectDocumentation/Sprint01_Data_Formats_and_Terminology'