# Dataplay: The Data Handling Handbook and Python Library

> Our one-stop shop for learning about data intake, processing, and visualization.
- template: template_article
- sitemap: dataplay
- tab: 00_dataplay
- csp: img-src 'self' https://charleskarpati.com/ data: https://raw.githubusercontent.com/ https://bniajfi.org/ https://static.mybinder.org/ https://mybinder.org/ https://pete88b.github.io/ https://badges.frapsoft.com/ https://img.shields.io/ http://img.shields.io/; connect-src 'self';

<img align="right" src="https://raw.githubusercontent.com/bniajfi/bniajfi/main/bnia_logo_new.png" height="160px" width="auto">

<h2 align="left"><img src="https://raw.githubusercontent.com/sidbelbase/sidbelbase/master/wave.gif" width="30px">Hi! We are <a href="https://bniajfi.org/">BNIA-JFI</a>.</h2>

This package was made to help with data handling.

__Included__
- Functions built and used by BNIA for day-to-day tasks.
- Made to be shared via IPYNB/Google Colab notebooks, with built-in examples that use 100% publicly accessible data and links.
- Online [documentation](https://bniajfi.org/dataplay/) and [PyPI](https://pypi.org/project/dataplay/) libraries created from the notebooks.

[TOC](https://github.com/bniajfi)

[Dataplay](https://bniajfi.org/dataplay/) uses functions found in our [VitalSigns](https://bniajfi.org/VitalSigns/) Module and vice-versa.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bnia/dataplay/main?filepath=%2Fnotebooks%2Findex.ipynb)
[![Colab](https://pete88b.github.io/fastpages/assets/badges/colab.svg)](https://colab.research.google.com/github/bnia/dataplay/blob/main/notebooks/index.ipynb)
[![GitHub](https://pete88b.github.io/fastpages/assets/badges/github.svg)](https://github.com/bnia/dataplay/tree/main/notebooks/index.ipynb)
[![Open Source Love svg3](https://badges.frapsoft.com/os/v3/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badges/)

[![NPM License](https://img.shields.io/npm/l/all-contributors.svg?style=flat)](https://github.com/bnia/dataplay/blob/main/LICENSE)
[![Active](http://img.shields.io/badge/Status-Active-green.svg)](https://bnia.github.io) 
[![Python Versions](https://img.shields.io/pypi/pyversions/dataplay.svg)](https://pypi.python.org/pypi/dataplay/)
[![GitHub last commit](https://img.shields.io/github/last-commit/bnia/dataplay.svg?style=flat)]()  

[![GitHub stars](https://img.shields.io/github/stars/bnia/dataplay.svg?style=social&label=Star)](https://github.com/bnia/dataplay) 
[![GitHub watchers](https://img.shields.io/github/watchers/bnia/dataplay.svg?style=social&label=Watch)](https://github.com/bnia/dataplay) 
[![GitHub forks](https://img.shields.io/github/forks/bnia/dataplay.svg?style=social&label=Fork)](https://github.com/bnia/dataplay) 
[![GitHub followers](https://img.shields.io/github/followers/bnia.svg?style=social&label=Follow)](https://github.com/bnia/dataplay) 

[![Tweet](https://img.shields.io/twitter/url/https/github.com/bnia/dataplay.svg?style=social)](https://twitter.com/intent/tweet?text=Check%20out%20this%20%E2%9C%A8%20colab%20by%20@bniajfi%20https://github.com/bnia/dataplay%20%F0%9F%A4%97) 
[![Twitter Follow](https://img.shields.io/twitter/follow/bniajfi.svg?style=social)](https://twitter.com/bniajfi)

<h2 align="left">Create maps, network graphs, and gifs!</h2>
<img src="https://bniajfi.org/images/mermaid/vitalSignsCorrelations.png" width="500px">
<img src="https://bniajfi.org/images/mermaid/vitalSignsGif.gif" width="500px">

## About this Tutorial: 

### What's inside?

#### __The Tutorial__

You can use these docs as a tutorial to learn from, or as reference documentation when using the attached library.

#### __TIPS__
 
- Content covered in previous tutorials will be used in later tutorials. 

- __New code and/or information *should* have explanations and/or descriptions__ attached.

- Concepts or code covered in previous tutorials will be used without being explained in their entirety.

- __If content cannot be found in the current tutorial and is not covered in previous tutorials, please let me know.__

- This notebook has been optimized for Google Colab running in a Chrome browser.

- Statements found on the index page regarding views expressed, responsibility, errors and omissions, use at risk, and licensing extend throughout the tutorial.

#### __Objectives__
 
By the end of this tutorial users should have an understanding of:
- Importing data with pandas and geopandas
- Querying data from Esri
- Retrieving data programmatically
- Loading data in a variety of formats
- Visualizing said data

Note: This module assumes the data needs no handling prior to intake.

## Usage Instructions

### Install the Package

The code is on <a href="https://pypi.org/project/dataplay/">PyPI</a>, so you can install the scripts as a Python library using the command:

`!pip install dataplay geopandas`
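
To confirm that the install worked, the installed version can be read back from the package metadata. This is a minimal sketch that uses only the standard library (Python 3.8 or newer); `installed_version` is a hypothetical helper name and is not part of dataplay.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report on the two packages named in the install command.
for name in ("dataplay", "geopandas"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If a package shows up as not installed in a notebook, restarting the runtime after `pip install` is usually the first thing to try.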

> Important: Contributors should follow the maintenance instructions and will not need to run this step.
>
> Their modules will be retrieved from the VitalSigns-GDrive repo they have mounted into their Colab environment.

In [None]:
#hide 
!pip install VitalSigns geopandas #dataplay

Then...

### Import Modules

1) Import the installed module into your code:
``` 
from VitalSigns.acsDownload import retrieve_acs_data 
```
2) Use it:
```
retrieve_acs_data(state, county, tract, tableId, year, saveAcs)
```
Now you could do something like merge it with another dataset!
```
from dataplay.merge import mergeDatasets
mergeDatasets(left_ds=False, right_ds=False, crosswalk_ds=False,  use_crosswalk = True, left_col=False, right_col=False, crosswalk_left_col = False, crosswalk_right_col = False, merge_how=False, interactive=True)
```
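
The `crosswalk_*` arguments suggest a merge where the two datasets do not share a key directly and a third table translates between them. The sketch below shows that idea in plain Python so it runs without any installs. It is an illustration under assumptions, not the `mergeDatasets` implementation: the function name `merge_via_crosswalk`, the column names, and all sample values are made up.

```python
# Made-up sample data: tract-level counts (left), CSA names (right),
# and a crosswalk that says which CSA each tract belongs to.
left_rows = [
    {"tract": "010100", "households": 120},
    {"tract": "010200", "households": 80},
    {"tract": "999999", "households": 5},  # has no crosswalk entry
]
right_rows = [
    {"csa_id": 1, "csa_name": "CSA A"},
    {"csa_id": 2, "csa_name": "CSA B"},
]
crosswalk_rows = [
    {"tract": "010100", "csa_id": 1},
    {"tract": "010200", "csa_id": 2},
]


def merge_via_crosswalk(left, right, crosswalk, left_col, right_col,
                        crosswalk_left_col, crosswalk_right_col):
    """Inner-join left to right by translating keys through the crosswalk."""
    key_map = {row[crosswalk_left_col]: row[crosswalk_right_col] for row in crosswalk}
    right_by_key = {row[right_col]: row for row in right}
    merged = []
    for row in left:
        right_key = key_map.get(row[left_col])
        match = right_by_key.get(right_key)
        if match is not None:  # rows without a match are dropped (inner join)
            merged.append({**row, **match})
    return merged


merged_rows = merge_via_crosswalk(
    left_rows, right_rows, crosswalk_rows,
    left_col="tract", right_col="csa_id",
    crosswalk_left_col="tract", crosswalk_right_col="csa_id",
)
for row in merged_rows:
    print(row)
```

The third left row is dropped because the crosswalk has no entry for it. Whether unmatched rows are kept or dropped is the kind of choice a `merge_how` argument typically controls.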

### Getting Help

You can get information on the package, modules, and methods by using the help command.

Here we look at the package's modules:

In [None]:
import dataplay
help(dataplay)

Help on package dataplay:

NAME
    dataplay

PACKAGE CONTENTS
    _nbdev
    corr
    geoms
    gifmap
    html
    intaker
    merge

VERSION
    0.0.28

FILE
    /usr/local/lib/python3.7/dist-packages/dataplay/__init__.py
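
The same package listing can be produced programmatically, which is handy when you want to loop over modules instead of reading `help` output. This is a small sketch using the standard library's `pkgutil`; `list_submodules` is a hypothetical helper, and the built-in `json` package stands in for `dataplay` so the example runs anywhere.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the sorted names of the modules directly inside a package."""
    package = importlib.import_module(package_name)
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))


# `json` is used only because it ships with Python; swap in "dataplay".
print(list_submodules("json"))
```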




Let's take a look at the functions the geoms module provides:

In [None]:
import dataplay.geoms
help(dataplay.geoms)


Help on module dataplay.geoms in dataplay:

NAME
    dataplay.geoms - # AUTOGENERATED! DO NOT EDIT! File to edit: notebooks/03_Map_Basics_Intake_and_Operations.ipynb (unless otherwise specified).

FUNCTIONS
    map_points(data, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=11, plot_points=True, cluster_points=False, pt_radius=15, draw_heatmap=False, heat_map_weights_col=None, heat_map_weights_normalize=True, heat_map_radius=15, popup=False)
        Creates a map given a dataframe of points. Can also produce a heatmap overlay
        
        Arg:
            df: dataframe containing points to maps
            lat_col: Column containing latitude (string)
            lon_col: Column containing longitude (string)
            zoom_start: Integer representing the initial zoom of the map
            plot_points: Add points to map (boolean)
            pt_radius: Size of each point
            draw_heatmap: Add heatmap to map (boolean)
            heat_map_weights_col: Column containing he

And here we can look at an individual function and what it expects:

In [None]:
import VitalSigns.acsDownload
help(VitalSigns.acsDownload.retrieve_acs_data)

Help on function retrieve_acs_data in module VitalSigns.acsDownload:

retrieve_acs_data(state, county, tract, tableId, year, save)
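
When a function has no docstring, as here, `inspect.signature` gives the same parameter list in a form your code can work with. The sketch below uses a hypothetical stub that only mirrors the signature printed above, so it runs without VitalSigns installed; pass the real function instead once it is imported.

```python
import inspect


def retrieve_acs_data_stub(state, county, tract, tableId, year, save):
    """Hypothetical stand-in that only mirrors the signature printed above."""
    return None


signature = inspect.signature(retrieve_acs_data_stub)
param_names = list(signature.parameters)
print(param_names)

# Parameters without a default value must be supplied by the caller.
required = [
    name
    for name, param in signature.parameters.items()
    if param.default is inspect.Parameter.empty
]
print(required)
```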



## Examples

So here's an example:


Import your modules

In [None]:
%%capture 
import pandas as pd
from VitalSigns.acsDownload import retrieve_acs_data 
from dataplay.geoms import workWithGeometryData
from dataplay.geoms import map_points
from dataplay.intaker import Intake

In [None]:
#hide 
pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', 6)
pd.set_option('display.width', 10)
pd.set_option('max_colwidth', 20)

Read in some data

Define our download parameters.

More information on these parameters can be found in the tutorials!

In [None]:
tract = '*'
county = '510'
state = '24'
tableId = 'B19001'
year = '17'
saveAcs = False
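
The parameters above follow Census conventions: `state` and `county` are FIPS codes (`'24'` is Maryland and `'510'` is Baltimore City), `'*'` asks for every tract, and `B19001` is the household income table. The sketch below spells out how those pieces fit together without contacting any service. `describe_acs_request` is a hypothetical helper, and expanding `'17'` to 2017 is an assumption about how the two-digit year is meant.

```python
def describe_acs_request(state, county, tract, table_id, year):
    """Summarise ACS download parameters without making a request."""
    county_fips = state + county      # 2-digit state + 3-digit county
    full_year = 2000 + int(year)      # assumption: two-digit year in the 2000s
    if tract == "*":
        geography = f"all tracts in county {county_fips}"
    else:
        # A full tract GEOID is state + county + 6-digit tract code.
        geography = f"tract {county_fips}{tract}"
    return {
        "county_fips": county_fips,
        "year": full_year,
        "table": table_id,
        "geography": geography,
    }


summary = describe_acs_request("24", "510", "*", "B19001", "17")
print(summary)
```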

And download the Baltimore City ACS data using the imported VitalSigns library.

In [None]:
df = retrieve_acs_data(state, county, tract, tableId, year, saveAcs)

In [None]:
#hide_input
df.head(1)

Here we can import and display a geospatial dataset with special intake requirements.

Here we pull a list of Baltimore City's CSAs:

In [None]:
#hide
csa_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Hhchpov/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
csa_gdf = dataplay.geoms.readInGeometryData(url=csa_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False,  save=False, in_crs=2248, out_crs=False)
csa_gdf.plot(column='hhchpov15')

In [None]:
# csa_gdf must exist before asking for help on its plot method.
help(csa_gdf.plot)

Now in this example we will load in a bunch of coordinates:

In [None]:
geoloom_gdf_url = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0/query?where=1%3D1&outFields=*&returnGeometry=true&f=pgeojson"
geoloom_gdf = dataplay.geoms.readInGeometryData(url=geoloom_gdf_url, porg=False, geom='geometry', lat=False, lng=False, revgeocode=False,  save=False, in_crs=4326, out_crs=False)
geoloom_gdf = geoloom_gdf.dropna(subset=['geometry'])
# geoloom_gdf = geoloom_gdf.drop(columns=['POINT_X','POINT_Y'])
geoloom_gdf.head(1)
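
Both URLs in this section share the same query string, so it can help to build it from named parameters instead of pasting the encoded form. This is a minimal sketch with the standard library; `build_feature_query_url` is a hypothetical helper, and the parameter values are copied from the URLs above.

```python
from urllib.parse import urlencode


def build_feature_query_url(layer_url, where="1=1", out_fields="*"):
    """Assemble a feature-layer query URL like the ones used in this section."""
    params = {
        "where": where,            # SQL-style filter; 1=1 matches every row
        "outFields": out_fields,   # * requests every attribute column
        "returnGeometry": "true",
        "f": "pgeojson",           # GeoJSON output, as in the URLs above
    }
    # safe="*" keeps the asterisk literal so the result matches the URLs above.
    return f"{layer_url}/query?{urlencode(params, safe='*')}"


layer = "https://services1.arcgis.com/mVFRs7NF4iFitgbY/ArcGIS/rest/services/Geoloom_Crowd/FeatureServer/0"
query_url = build_feature_query_url(layer)
print(query_url)
```

Changing `where` or `out_fields` is then a one-argument edit, and `urlencode` takes care of escaping characters such as `=` and spaces.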

And here we get the number of **points** **in** each of our corresponding CSAs (**polygons**)

In [None]:
geoloom_w_csas = dataplay.geoms.workWithGeometryData(method='pinp', df=geoloom_gdf, polys=csa_gdf, ptsCoordCol='geometry', polygonsCoordCol='geometry', polyColorCol='hhchpov18', polygonsLabel='CSA2010', pntsClr='red', polysClr='white')

And we plot it with a legend

In [None]:
geoloom_w_csas.plot( column='pointsinpolygon', legend=True)

What if I wanted to create an interactive click map with the label of each CSA (**polygon**) **on** each **point**?

Well, we just run the reverse operation!

In [None]:
geoloom_w_csas = workWithGeometryData(method='ponp', df=geoloom_gdf, polys=csa_gdf, ptsCoordCol='geometry', polygonsCoordCol='geometry', polyColorCol='hhchpov18', polygonsLabel='CSA2010', pntsClr='red', polysClr='white')
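
To make the two directions concrete, here is a plain-Python sketch of the underlying idea: counting points per polygon and attaching a polygon label to each point. It is not how dataplay does it (the library works on GeoDataFrames); the ray-casting test, the function names, and the square "CSAs" are all made up for illustration.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Two adjacent unit-free squares standing in for polygons.
polygons = {
    "CSA A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "CSA B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(1.0, 1.0), (0.5, 1.5), (3.0, 1.0), (5.0, 5.0)]


def label_points(points, polygons):
    """Polygon label on each point; None when no polygon contains it."""
    return [
        next((name for name, ring in polygons.items()
              if point_in_polygon(x, y, ring)), None)
        for x, y in points
    ]


def count_points(points, polygons):
    """Number of points falling in each polygon."""
    counts = {name: 0 for name in polygons}
    for label in label_points(points, polygons):
        if label is not None:
            counts[label] += 1
    return counts


point_labels = label_points(points, polygons)
polygon_counts = count_points(points, polygons)
print(point_labels)
print(polygon_counts)
```

The count is just the labelling result tallied by polygon, which is why the two operations take the same inputs.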

In [None]:
#hide 
# This is an ugly bit of code we use to filter for points within a range of Baltimore for plotting reasons.

geoloom_w_csas['POINT_Y'] = geoloom_w_csas.centroid.y
geoloom_w_csas['POINT_X'] = geoloom_w_csas.centroid.x

# We already know the x and y columns because we just saved them as such.
geoloom_w_csas['POINT_X'] = pd.to_numeric(geoloom_w_csas['POINT_X'], errors='coerce')
geoloom_w_csas['POINT_Y'] = pd.to_numeric(geoloom_w_csas['POINT_Y'], errors='coerce')
# df = df.replace(np.nan, 0, regex=True)

# Keep only points inside a latitude band around Baltimore City.
geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] > 39.3  ]
geoloom_w_csas = geoloom_w_csas[ geoloom_w_csas['POINT_Y'] < 39.5  ]

And then we can visualize it like:

In [None]:
outp = map_points(geoloom_w_csas, lat_col='POINT_Y', lon_col='POINT_X', zoom_start=12, plot_points=True, cluster_points=False,
               pt_radius=1, draw_heatmap=True, heat_map_weights_col=None, heat_map_weights_normalize=True,
               heat_map_radius=15, popup='CSA2010')

In [None]:
#hide_input
outp

These interactive visualizations can be exported to HTML using a tool found later in this document.

It's how I made this page!

If you like what you see, there is more in the package; you will just have to explore.

<h2 align="left">Have Fun!</h2>
<img src="https://bniajfi.org/images/mermaid/vitalSignsCorrelations.png" width="500px">
<img src="https://bniajfi.org/images/mermaid/vitalSignsGif.gif" width="500px">