<h1>Resource Watch GitHub Repo Editor Tool</h1>

In [1]:
# Import necessary modules
# ! pip install PyGithub
from github import Github
import requests 
import os
import glob
import dotenv
from pprint import pprint

<font color=red>**ACTION REQUIRED**</font> <br>
<font color=blue>**Enter the path to your .env file below. The .env file should contain a GITHUB_TOKEN that allows you to interact with the GitHub API. If you don't have one yet, go to [this website](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token) to create one.**</font>

In [2]:
# insert the location of your .env file here:
dotenv.load_dotenv('C:\\Users\\yujing.wu\\OneDrive - World Resources Institute\\Documents\\Github\\cred\\.env')

# API token needed to make changes
API_TOKEN = os.getenv('GITHUB_TOKEN')
if API_TOKEN:
    print('Your .env successfully loaded!')
else:
    print('Please check the path to your .env file and make sure you have a key called GITHUB_TOKEN in your .env file.')

Your .env successfully loaded!


<font color=red>**ACTION REQUIRED**</font> <br>
<font color=blue>**Enter the GitHub repo you want to edit** <br>
Run the next cell to show the readme of the repo you have selected.</font>

In [5]:
# create a GitHub instance
g = Github(API_TOKEN)
# get the GitHub repo you want to make changes to
repo = g.get_repo("resource-watch/data-pre-processing")
# print the decoded readme of the selected repo 
print(repo.get_readme().decoded_content.decode('utf-8'))

# Resource Watch Dataset Pre-processing Github
#### Purpose
This Github repository was created to document the pre-processing done to any dataset displayed on [Resource Watch](https://resourcewatch.org/).

#### File Structure
The processing done to each dataset should be stored in a single file, named with the WRI ID and public title used on Resource Watch. This folder should **always** include a README.md file that describes the processing that was done. A template for this file can be found in this repository with the name README_template.md. If a script (preferably Python) was used to process the dataset, that code should also be included as a separate file. The general structure can be summarized as follows:

```
Repository
|
|- Dataset 1 folder = {wri_id}_{public_title}
| |-{wri_id}_{public_title}_processing.py # optional, script used to process the dataset
| |-README.md # file describing the processing
| +-...
|
|-Dataset 2 folder
| +-...
|
+-...
```

#### Contents of README.md
I

<font color=red>**ACTION REQUIRED**</font> <br>
<font color=blue>**Enter a name for the new branch you are creating for the edits** <br>
Run the next cell to create a new branch based on the master branch.</font>

In [21]:
# name of the branch 
branch_name = 'test_data_team_tool'
# create a branch based on the master branch 
repo.create_git_ref(
        'refs/heads/{branch_name}'.format(branch_name=branch_name),
        repo.get_branch('master').commit.sha
    )

GitRef(ref="refs/heads/test_data_team_tool")

<font color=red>**ACTION REQUIRED**</font> <br>
<font color=blue>**Enter the type of file you want to edit** <br>
Run the next cell to find all the files of this file type.</font>

In [22]:
# choose the type of files you want to edit 
file_type = "README.md"
# fetch all the content of the selected github repo
contents = repo.get_contents("", branch_name)
# create an empty list to store all the files you can find
files = []
# loop through the directories in the github repo to search for files of this file type 
for content in contents: 
    # if it's a directory
    if content.type == 'dir':
        # go into the directory and put all its content into a list
        file_paths = [file.path for file in repo.get_contents(content.path, branch_name)]
        # loop through the list of content and store any file of our selected file type to the list created
        for file_path in file_paths:
            if file_type in file_path:
                files.append(repo.get_contents(file_path, branch_name))
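
The loop above looks only one level below the repository root, and `file_type in file_path` matches any path that contains the string (for example a backup copy such as `README.md.bak`). The following is a minimal sketch of a full-depth search that matches on the exact file name. `find_files` and the `Entry` stand-in are hypothetical names introduced for illustration; the sketch assumes one API call per directory is acceptable and, unlike the loop above, it also returns a matching file at the repository root.

```python
from collections import namedtuple
import posixpath


def find_files(get_contents, file_name, root=""):
    """Return the paths of every file named `file_name` under `root`, at any depth.

    `get_contents` is any callable that maps a directory path to its entries;
    each entry needs a `.path` and a `.type` ('dir' or 'file').
    """
    matches = []
    pending = list(get_contents(root))
    while pending:
        entry = pending.pop()
        if entry.type == "dir":
            # queue the directory's children instead of recursing
            pending.extend(get_contents(entry.path))
        elif posixpath.basename(entry.path) == file_name:
            matches.append(entry.path)
    return sorted(matches)


# Offline stand-in for the GitHub API (hypothetical data, not the real repo)
Entry = namedtuple("Entry", ["path", "type"])
fake_tree = {
    "": [Entry("README.md", "file"), Entry("bio_004a", "dir")],
    "bio_004a": [Entry("bio_004a/README.md", "file"), Entry("bio_004a/utils", "dir")],
    "bio_004a/utils": [Entry("bio_004a/utils/README.md.bak", "file")],
}
found = find_files(fake_tree.__getitem__, "README.md")
print(found)
```

In the notebook, the callable would be something like `lambda path: repo.get_contents(path, branch_name)`, and each returned path can then be fetched with `repo.get_contents(path, branch_name)` as in the cell above.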

<font color=blue>Run the next cell to view the content that you may want to change.<br>
We will print the first file of the selected file type to use as an example.</font>

In [28]:
# get the first file in the list as an example file
file = files[0]
# print the path to the file
print('path to file: \n{} \n'.format(file.path))
# content of the file 
ex_content = file.decoded_content.decode('utf-8')
print('content of file: \n{} \n'.format(ex_content))

path to file: 
bio_004a_coral_reef_locations/README.md 

content of file: 
## Coral Reef Locations Dataset Pre-processing
This file describes the data pre-processing that was done to [the Global Distribution of Coral Reefs (2018)](http://data.unep-wcmc.org/datasets/1) for [display on Resource Watch](https://resourcewatch.org/data/explore/1d23838e-40da-4cf3-b61c-56258d3a5c56).

The source provided this dataset as two shapefiles - one of which contains polygon data, and the other contains point data.

Below, we describe the steps used to reformat the shapefile:
1. Read in the polygon shapefile as a geopandas data frame.
2. Change the data type of column 'PROTECT', 'PROTECT_FE', and 'METADATA_I' to integers.
3. Convert the geometries of the data from shapely objects to geojsons.
4. Create a new column from the index of the dataframe to use as a unique id column (cartodb_id) in Carto.

Next, a mask layer was created so that it could be overlayed on top of other datasets to highlight where 

<h1>Define our replacement.</h1>

<font color=red>**ACTION REQUIRED**</font> <br>
<font color=blue>**Enter the text/code you would like to edit (from those printed above) and the replacement**</font>

In [24]:
# string that needs to be replaced
ex_text = 'http://wri-public-data.s3.amazonaws.com'
# replacement string 
replacement_text = 'https://wri-public-data.s3.amazonaws.com'

<font color=red>**ACTION REQUIRED**</font> <br>
<font color=blue>**Run this cell and make sure that the replacement text/code (printed below) looks ok for the example file.**</font>

In [29]:
# the edited content of the file
updated_content = ex_content.replace(ex_text, replacement_text)
print(f'\nUpdated content:\n{updated_content}')


Updated content:
## Coral Reef Locations Dataset Pre-processing
This file describes the data pre-processing that was done to [the Global Distribution of Coral Reefs (2018)](http://data.unep-wcmc.org/datasets/1) for [display on Resource Watch](https://resourcewatch.org/data/explore/1d23838e-40da-4cf3-b61c-56258d3a5c56).

The source provided this dataset as two shapefiles - one of which contains polygon data, and the other contains point data.

Below, we describe the steps used to reformat the shapefile:
1. Read in the polygon shapefile as a geopandas data frame.
2. Change the data type of column 'PROTECT', 'PROTECT_FE', and 'METADATA_I' to integers.
3. Convert the geometries of the data from shapely objects to geojsons.
4. Create a new column from the index of the dataframe to use as a unique id column (cartodb_id) in Carto.

Next, a mask layer was created so that it could be overlayed on top of other datasets to highlight where coral reefs were located. In order to create this, a 10km
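
For a long file, the full printout makes it hard to spot what actually changed. One option is to print only the changed lines with the standard-library `difflib` module. This is a sketch, not part of the original tool; `preview_diff` is a hypothetical helper and the sample text (including the `/x.zip` path) is made up for illustration.

```python
import difflib


def preview_diff(before, after, path="file"):
    """Return a unified diff of two strings, or '' if they are identical."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{path} (current)",
        tofile=f"{path} (updated)",
        n=0,  # show only the changed lines, without surrounding context
    )
    return "".join(diff)


# Hypothetical sample content standing in for ex_content
sample = "intro\nsee http://wri-public-data.s3.amazonaws.com/x.zip\nend\n"
updated = sample.replace(
    "http://wri-public-data.s3.amazonaws.com",
    "https://wri-public-data.s3.amazonaws.com",
)
diff_text = preview_diff(sample, updated, "README.md")
print(diff_text)
```

In the notebook, the call would be `preview_diff(ex_content, updated_content, file.path)`. An empty result means the example file does not contain the text you are replacing.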

<h1>Loop through all files and replace the selected text.</h1>

<font color=red>**ACTION REQUIRED**</font> <br>
<font color=blue>**Create an informative message for the commits. The cell below loops through the list of files we found and replaces the selected text/code with your replacement.**</font>

In [26]:
# a message that summarizes what you are changing
message = "change http links"
# for each file in the list of files you want to edit
for file in files:
    # first fetch the decoded content of the file
    file_content = file.decoded_content.decode('utf-8')
    # apply the replacement
    new_content = file_content.replace(ex_text, replacement_text)
    # only commit files whose content actually changed
    if new_content != file_content:
        # update the file on the branch created previously
        repo.update_file(file.path, message, new_content, file.sha, branch=branch_name)
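
Before running the loop above on a large repo, it may help to do a dry run that lists which files contain the text and how many occurrences each one has. The sketch below works on plain strings so it can run offline; `plan_changes` is a hypothetical helper and the sample paths and contents are invented for illustration.

```python
def plan_changes(contents_by_path, old):
    """Map each path whose text contains `old` to its number of occurrences."""
    return {
        path: text.count(old)
        for path, text in contents_by_path.items()
        if old in text
    }


# Hypothetical sample data standing in for the decoded files
sample_files = {
    "a/README.md": "http://wri-public-data.s3.amazonaws.com/1 and http://wri-public-data.s3.amazonaws.com/2",
    "b/README.md": "already uses https://wri-public-data.s3.amazonaws.com/3",
}
plan = plan_changes(sample_files, "http://wri-public-data.s3.amazonaws.com")
for path, count in plan.items():
    print(f"{path}: {count} occurrence(s)")
```

In the notebook, the input could be built as `{f.path: f.decoded_content.decode('utf-8') for f in files}` with `ex_text` as the second argument.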

<font color=red>**ACTION REQUIRED**</font> <br>
<font color=blue>**Include a title and a summary of the changes you've made on the branch for a pull request. The cell below creates a pull request to merge your branch into the master branch.**</font>

In [30]:
# the title of the pull request
title = "update all http links in readme files"
# the body of the pull request 
body = 'this is a test run of the github repo editor tool'
# create a pull request
pr = repo.create_pull(title=title, body=body, head=branch_name, base="master")