gdutils.datamine
==============

``datamine`` is a module in package ``gdutils`` that provides functions for finding, listing, and mining data.

---

__Examples Setup__

The following commands set up the examples below.

*Note:* The example input files were downloaded and converted from the [GeoJSON file](http://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_110m_land.geojson) linked in the [geopandas IO docs](https://geopandas.org/io.html).

In [None]:
# Install ``gdutils`` package
!pip install git+https://github.com/KeiferC/gdutils.git > /dev/null

In [None]:
import gdutils.datamine as dm # imports the ``datamine`` module

import geopandas as gpd
import pandas as pd

---

Example 1. Get a list of public GitHub repos
---------------------------------------------------


__Example 1.1.__ Get a list of public repos from a GitHub user account

In [None]:
# Ex. 1.1.

user_account = 'octocat'
user_repos = dm.list_gh_repos(user_account, 'users') # gets repos
user_repos # renders raw list of repos

In [None]:
# prints the repos as an aligned table by unpacking each (name, url) tuple
print('{:20} : {}'.format('repo name', 'repo url'))
print('-------------------------------')

for (repo_name, repo_url) in user_repos:
    print('{:20} : {}'.format(repo_name, repo_url))

__Example 1.2.__ Get a list of public repos from a GitHub organization account

In [None]:
# Ex. 1.2.

org_account = 'mggg-states'
org_repos = dm.list_gh_repos(org_account, 'orgs')

# prints the repos as an aligned table by unpacking each (name, url) tuple
print('{:20} : {}'.format('repo name', 'repo url'))
print('-------------------------------')

for repo_name, repo_url in org_repos:
    print('{:20} : {}'.format(repo_name, repo_url))
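
For readers curious how such a list could be assembled, here is a minimal sketch. It is not the `gdutils` implementation. It assumes the public GitHub REST API endpoints `https://api.github.com/users/<account>/repos` and `https://api.github.com/orgs/<account>/repos`, and that the URL of interest is the `clone_url` field of each entry. The helper names `repos_api_url` and `repos_from_payload` are hypothetical, and the sample response is hand-written so the cell runs without network access.

```python
import json

GITHUB_API = 'https://api.github.com'

def repos_api_url(account, account_type):
    """Build the REST endpoint that lists an account's public repos."""
    if account_type not in ('users', 'orgs'):
        raise ValueError("account_type must be 'users' or 'orgs'")
    return '{}/{}/{}/repos'.format(GITHUB_API, account_type, account)

def repos_from_payload(payload):
    """Turn a decoded JSON response into (repo name, clone URL) tuples."""
    return [(repo['name'], repo['clone_url']) for repo in payload]

# Hand-written sample response (assumption: only the fields used here)
sample = json.loads(
    '[{"name": "Hello-World", '
    '"clone_url": "https://github.com/octocat/Hello-World.git"}]'
)
sample_repos = repos_from_payload(sample)
```

In a live setting the payload would come from an HTTP GET on the URL returned by `repos_api_url`; the tuples it yields can be unpacked the same way as in the loops above.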

Example 2. Clone public GitHub repos
---------------------------------------------------------------

__Example 2.1.__ Clone all repositories of a known account

In [None]:
# Ex. 2.1.

dm.clone_gh_repos(user_account, 'users')

__Example 2.2.__ Clone specific repositories of a known account

In [None]:
# Ex. 2.2.

dm.clone_gh_repos(org_account, 'orgs', ['AK-shapefiles', 'AZ-shapefiles'])

__Example 2.3.__ Clone specific repos into a given directory

In [None]:
# Ex. 2.3.

dm.clone_gh_repos(org_account, 'orgs', ['CT-shapefiles'], 'outputs/')

__Example 2.4.__ Clone all repos into a given directory

In [None]:
# Ex. 2.4.

dm.clone_gh_repos(user_account, 'users', outpath='outputs/')
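
The cloning step can be pictured as one `git clone` call per repository. The sketch below only builds the command lists and does not run them. It is an illustration under assumptions, not how `gdutils` does it: the helper name `clone_commands` is hypothetical, and placing each clone at `<outpath>/<repo name>` is assumed from the `outputs/Hello-World` path used later in this notebook.

```python
import os

def clone_commands(repos, outpath=''):
    """Build one `git clone` argument list per (name, url) pair."""
    return [['git', 'clone', url, os.path.join(outpath, name)]
            for name, url in repos]

commands = clone_commands(
    [('Hello-World', 'https://github.com/octocat/Hello-World.git')],
    outpath='outputs',
)
# Each command could then be executed with subprocess.run(cmd, check=True)
```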

Example 4. Get a list of local files of specific types
-----------------------------------------------------------

__Example 4.1.__ Recursively list files of a given type starting from current working directory

In [None]:
# Ex. 4.1.

files_from_cwd = dm.list_files_of_type('.zip')
files_from_cwd

__Example 4.2.__ Recursively list files of a given type starting from a given directory

In [None]:
# Ex. 4.2.

files_from_dir = dm.list_files_of_type('.zip', 'outputs/')
files_from_dir

__Example 4.3.__ Recursively list files of given types starting from a given directory

In [None]:
# Ex. 4.3.

zips_and_mds = dm.list_files_of_type(['.zip', '.md'], 'outputs/')
zips_and_mds

__Example 4.4.__ Recursively list files of a given type from current working directory, including hidden files

In [None]:
# Ex. 4.4.

files_incl_hidden = dm.list_files_of_type('.zip', exclude_hidden=False)
files_incl_hidden
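
A recursive, extension-based search like the ones above can be approximated with the standard library. The following is a sketch under assumptions, not the `gdutils` code: `find_files` is a hypothetical helper, "hidden" is taken to mean a file or directory name starting with a dot, and results are returned sorted.

```python
import os
import tempfile

def find_files(exts, path='.', exclude_hidden=True):
    """Recursively collect paths under `path` whose names end with `exts`."""
    if isinstance(exts, str):
        exts = [exts]
    suffixes = tuple(exts)
    matches = []
    for root, dirs, files in os.walk(path):
        if exclude_hidden:
            # Prune hidden directories in place so os.walk skips them
            dirs[:] = [d for d in dirs if not d.startswith('.')]
            files = [f for f in files if not f.startswith('.')]
        matches.extend(os.path.join(root, f)
                       for f in files if f.endswith(suffixes))
    return sorted(matches)

# Small throwaway tree to exercise the sketch
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub', '.cache'))
    for rel in ('a.zip', 'notes.md', os.path.join('sub', 'b.zip'),
                os.path.join('sub', '.cache', 'c.zip')):
        open(os.path.join(tmp, rel), 'w').close()
    visible = [os.path.relpath(p, tmp) for p in find_files('.zip', tmp)]
    everything = [os.path.relpath(p, tmp)
                  for p in find_files('.zip', tmp, exclude_hidden=False)]
    zips_and_notes = [os.path.relpath(p, tmp)
                      for p in find_files(['.zip', '.md'], tmp)]
```

Assigning to `dirs[:]` rather than rebinding `dirs` matters here: `os.walk` only honours pruning when the list it handed out is modified in place.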

Example 5. Get a list of keys from a nested (categorized) dictionary
-------------------------------------------------------------------------------

In [None]:
# Example nested dictionary
example_dict = {
    'category1' : [ # category
        {'key1_1' : 'value1'}, # key-value pair
        {'key1_2' : 2}
    ],
    'category2' : [
        {'key2_1' : True},
        ['key2_2', 'key2_3', 'key2_4'] # list of keys
    ],
    'category3' : [
        ['key3']
    ]
}

__Example 5.1.__ Get a list of keys from a single category

In [None]:
keys = dm.get_keys_by_category(example_dict, 'category2')
keys

__Example 5.2.__ Get a list of keys from a list of categories

In [None]:
keys = dm.get_keys_by_category(example_dict, ['category1', 'category3'])
keys
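
To make the structure of the nested dictionary concrete, here is one way the keys could be gathered by hand. This is a sketch, not the `gdutils` implementation: `keys_by_category` is a hypothetical helper, and returning a single flat list in category order is an assumption, since the notebook does not show the output of the cells above.

```python
def keys_by_category(nested, categories):
    """Collect keys from one category name or a list of category names."""
    if isinstance(categories, str):
        categories = [categories]
    keys = []
    for category in categories:
        for item in nested[category]:
            # dict entries contribute their keys; list entries are keys already
            keys.extend(item.keys() if isinstance(item, dict) else item)
    return keys

# Same shape as example_dict above, repeated so the cell is self-contained
sample_dict = {
    'category1': [{'key1_1': 'value1'}, {'key1_2': 2}],
    'category2': [{'key2_1': True}, ['key2_2', 'key2_3', 'key2_4']],
    'category3': [['key3']],
}
single = keys_by_category(sample_dict, 'category2')
several = keys_by_category(sample_dict, ['category1', 'category3'])
```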

Example 6. Remove repos from local filesystem
--------------------------------------------------------

__Example 6.1.__ Remove a specific repository

In [None]:
# Ex. 6.1.

path_to_repo_to_remove = 'outputs/Hello-World'
dm.remove_repos(path_to_repo_to_remove)

__Example 6.2.__ Recursively remove all repos in a directory

In [None]:
# Ex. 6.2.

dm.remove_repos('outputs/')
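
Because removal is destructive, it can help to see what a recursive repo cleanup involves. The sketch below is an assumption-laden illustration, not the `gdutils` code: `find_repos` and `remove_found_repos` are hypothetical helpers, and a repository is identified as any directory that contains a `.git` subdirectory. It runs against a temporary directory so nothing real is deleted.

```python
import os
import shutil
import tempfile

def find_repos(path):
    """Return directories at or below `path` that contain a `.git` directory."""
    repos = []
    for root, dirs, _ in os.walk(path):
        if '.git' in dirs:
            repos.append(root)
            dirs[:] = []  # do not descend into a repo that will be removed
    return repos

def remove_found_repos(path):
    """Delete every repository found by find_repos and return their paths."""
    removed = find_repos(path)
    for repo in removed:
        shutil.rmtree(repo)
    return removed

# Throwaway tree: one fake repo and one ordinary folder
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'Hello-World', '.git'))
    os.makedirs(os.path.join(tmp, 'plain-folder'))
    removed = [os.path.relpath(p, tmp) for p in remove_found_repos(tmp)]
    remaining = sorted(os.listdir(tmp))
```

Collecting the full list of repositories before deleting anything keeps the directory walk from running over paths that are disappearing underneath it.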

---

__Examples Cleanup__

The following commands reset the environment and clean up after the examples above.

In [None]:
# Remove all cloned repos
dm.remove_repos('.')

In [None]:
# Remove outputs
!rm -r outputs

In [None]:
# Uninstall Package
!pip uninstall -y gdutils

In [None]:
# Reset Jupyter Notebook IPython Kernel
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")