# Python for Nonprofits Part 1: Sharing Database Data Via Google Sheets

By Kenneth Burchfiel

Released under the MIT license

(Note: This project was based on my [Google Sheets Database Connections](https://github.com/kburchfiel/google_sheets_database_connections) Python project.)

I also created a video walkthrough of this project that shows how to complete the Google Cloud prerequisites described below. You can find that video [on my YouTube channel](https://www.youtube.com/watch?v=9vW_c_1ngxQ).

# Introduction

This project demonstrates how to use Python to read data from a database, then export that data to a Google Sheets document. This approach can be a great way to make the contents of a database accessible to individuals who don't have a background in SQL.

If you need to update this Google Sheets document with the latest copy of your data on a regular basis, you can export this notebook to a Python file, then have your operating system run that file each day, hour, etc. using a tool like cron or Task Scheduler. (If you're on Windows, you can use run_sharing_database_data_py.bat as a starting point for a .bat file that Task Scheduler can run automatically; you'd just need to replace the file path shown in its first line with your own path.)

This project will import an entire table from a local SQLite database, then export an unmodified copy of that table to Google Sheets. The 'Spreadsheet Ops' section of Python for Nonprofits demonstrates how Python's pandas library can be used to reformat and analyze data.

# Prerequisites:

Before you can apply this code to your own projects (or get it to run locally on your own computer), you'll need to perform some setup tasks. Many of these tasks are based on the ['For Bots: Using Service Account'](https://docs.gspread.org/en/latest/oauth2.html#service-account) section of gspread's documentation.  

## Step 1:
Create a Google Cloud Platform project. I used the Google Cloud Console to accomplish this step. For instructions, go to https://cloud.google.com/resource-manager/docs/creating-managing-projects#console.

NOTE: You may incur expenses when using the Google Cloud platform.

## Step 2:
Enable the Google Sheets API for your project. To do so, enter 'Sheets API' within the search box near the top of the Google Cloud Platform window. Click on the 'Google Sheets API' result and then select the blue 'Enable' button. 

## Step 3:
Create a Google service account. You can do so by following the steps shown in Google's [Create service accounts](https://cloud.google.com/iam/docs/service-accounts-create#iam-service-accounts-create-console) documentation page. (Although this page instructs you to "enable the IAM API," I didn't need to do so in order for the following steps to work, but it's possible that this API had been enabled beforehand for my Cloud Console project.)

## Step 4:
Create a key in JSON format for your new service account, then download it to your computer (as a .json file) and store it in a safe location. See https://cloud.google.com/iam/docs/creating-managing-service-account-keys 

## Step 5:
Grant this service account Editor access to the Google Sheets workbook to which you will need to connect. You can grant it access by clicking the 'Share' button within the workbook and then entering the service account's email address within the box that appears. This email address can be found within the 'Service account details' page of your service account within the Google Cloud platform.
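
The service account's email address is also stored in the JSON key file from Step 4, under the `client_email` field. The following is a minimal sketch of how you could read it with Python. The function name `get_service_account_email` and the made-up key file are assumptions added for illustration; they are not part of the original project.

```python
import json
import os
import tempfile


def get_service_account_email(key_path):
    """Return the client_email value stored in a service account key file."""
    with open(key_path) as key_file:
        key_data = json.load(key_file)
    return key_data["client_email"]


# Demonstration with a made-up key file. (A real key file contains
# additional fields, including the private key itself.)
with tempfile.TemporaryDirectory() as temp_dir:
    demo_key_path = os.path.join(temp_dir, "demo_key.json")
    with open(demo_key_path, "w") as demo_file:
        json.dump(
            {
                "type": "service_account",
                "client_email": "demo-bot@example-project.iam.gserviceaccount.com",
            },
            demo_file,
        )
    demo_email = get_service_account_email(demo_key_path)

print(demo_email)
```

To use this with your own key, you would pass the path of your downloaded .json file to `get_service_account_email()`, then paste the result into the 'Share' box.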

## Step 6: 
To get this notebook to run on your own computer, update the 'service_key_path' variable with the path to your own service account key, then update 'workbook_id' and 'worksheet_name' with your own Google Sheets workbook's ID and worksheet name.

# Code:

In [1]:
print("Starting program.")
import time
start_time = time.time() # Allows the program's runtime to be measured
import pandas as pd
import sqlalchemy
import gspread
from gspread_dataframe import set_with_dataframe

Starting program.


## Connecting to our database:

This local SQLite database was created using the database_generator.ipynb code found in supplemental/db_generator. The steps for connecting to an online database are quite similar; for guidance on this process, visit the [app_functions_and_variables.py](https://github.com/kburchfiel/dash_school_dashboard/blob/main/dsd/app_functions_and_variables.py) file within my [Dash School Dashboard](https://github.com/kburchfiel/dash_school_dashboard) project.

In [2]:
pfn_db_engine = sqlalchemy.create_engine(
'sqlite:///'+'../data/network_database.db')
# Based on:
#  https://docs.sqlalchemy.org/en/13/dialects/sqlite.html#connect-strings

pfn_db_engine

Engine(sqlite:///../data/network_database.db)

The following line shows the tables present in this database:

Note that you can pass the SQLAlchemy engine directly to the `con` argument of `read_sql`. This is made more explicit within the documentation for [DataFrame.to_sql](https://pandas.pydata.org/pandas-docs/stable//reference/api/pandas.DataFrame.to_sql.html), which defines `con` as "sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection."

The [read_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html) documentation notes this functionality as well, stating that "SQLAlchemy connectable" is one of the options for `con`. A warning message within the source code for pandasSQL_builder() explains that "pandas only supports SQLAlchemy connectable (engine/connection)," thus confirming that SQLAlchemy engines are one of the 'connectables' that can be entered within the `con` argument.

I belabor this point simply because this approach saves you from having to enter a line like `connection = engine.connect()` ([source](https://docs.sqlalchemy.org/en/20/core/connections.html)) before calling `read_sql`, and it's always nice to reduce the amount of code needed to accomplish a task!

In [3]:
pd.read_sql("Select * from sqlite_schema", con = pfn_db_engine)

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,curr_enrollment,curr_enrollment,2,"CREATE TABLE curr_enrollment (\n\t""Student_ID""..."
1,table,test_results,test_results,5,"CREATE TABLE test_results (\n\t""Student_ID"" BI..."
2,table,grad_outcomes,grad_outcomes,7,"CREATE TABLE grad_outcomes (\n\t""Student_ID"" B..."
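
The `sqlite_schema` name queried above is an alias that SQLite added in version 3.33.0; older SQLite versions only recognize the legacy name `sqlite_master`. If you only need the table names, a sketch along the following lines may be more portable. It uses Python's built-in `sqlite3` module rather than pandas, and the `list_tables` function and in-memory demo database are assumptions added for illustration.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in a SQLite database, sorted by name."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical in-memory database used only for this demonstration
demo_connection = sqlite3.connect(":memory:")
demo_connection.execute("CREATE TABLE curr_enrollment (Student_ID INTEGER)")
demo_connection.execute("CREATE TABLE test_results (Student_ID INTEGER)")
table_names = list_tables(demo_connection)
demo_connection.close()

print(table_names)
```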


In [4]:
df_curr_enrollment = pd.read_sql("Select * from curr_enrollment", 
con = pfn_db_engine)
df_curr_enrollment

Unnamed: 0,Student_ID,First_Name,Last_Name,Full_School_Name,School,Grade,Gender,Race,Ethnicity,Street,City,State,Zip,Lat,Lon,Address,Students,Grade_for_Sorting
0,42026,Sarah,Acevedo,Chestnut Academy,CA,1,Female,Asian,Hispanic,5100 Cleveland Street,Virginia Beach,VA,23462,36.843364,-76.159497,"5100 Cleveland Street, Virginia Beach, VA 23462",1,1
1,43491,Cynthia,Allen,Chestnut Academy,CA,1,Female,African American,Non-Hispanic,8501 Silverbrook Road,Lorton,VA,22079,38.717276,-77.238618,"8501 Silverbrook Road, Lorton, VA 22079",1,1
2,41637,Michael,Allen,Chestnut Academy,CA,1,Male,African American,Non-Hispanic,5146 Snead Rd,Richmond,VA,23224,37.476913,-77.489299,"5146 Snead Rd, Richmond, VA 23224",1,1
3,40365,James,Anderson,Chestnut Academy,CA,1,Male,African American,Non-Hispanic,1420 Great Bridge Blvd,Chesapeake,VA,23320,36.764300,-76.283100,"1420 Great Bridge Blvd, Chesapeake, VA 23320",1,1
4,41516,Brian,Andrade,Chestnut Academy,CA,1,Male,African American,Non-Hispanic,3051 Old Bridge Road,Woodbridge,VA,22192,38.682400,-77.305100,"3051 Old Bridge Road, Woodbridge, VA 22192",1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3995,41060,Jeremy,Webb,Sycamore Academy,SA,K,Male,White,Non-Hispanic,201 Poythress Street,Hopewell,VA,23860,37.302999,-77.288852,"201 Poythress Street, Hopewell, VA 23860",1,0
3996,43942,Christy,Wheeler,Sycamore Academy,SA,K,Female,African American,Non-Hispanic,20325 Claiborne Parkway,Ashburn,VA,20147,39.055300,-77.502600,"20325 Claiborne Parkway, Ashburn, VA 20147",1,0
3997,40479,Frank,Williams,Sycamore Academy,SA,K,Male,White,Non-Hispanic,119 W Main St,Boyce,VA,22620,39.095300,-78.065300,"119 W Main St, Boyce, VA 22620",1,0
3998,41160,Christian,Woods,Sycamore Academy,SA,K,Male,African American,Non-Hispanic,42149 Greenstone Dr.,Aldie,VA,20105,38.925421,-77.546673,"42149 Greenstone Dr., Aldie, VA 20105",1,0
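
Both `read_sql` calls above pass the engine directly to `con`. As a rough comparison, the sketch below reads the same table twice, once with an engine and once with an explicitly opened connection, and checks that the results match. The temporary database, the `demo_enrollment` table, and the variable names are assumptions made for this example only.

```python
import os
import tempfile

import pandas as pd
import sqlalchemy

with tempfile.TemporaryDirectory() as temp_dir:
    # Create a throwaway SQLite database file for the demonstration
    demo_db_path = os.path.join(temp_dir, "demo.db")
    demo_engine = sqlalchemy.create_engine("sqlite:///" + demo_db_path)

    demo_df = pd.DataFrame({"Student_ID": [1, 2], "Grade": ["K", "1"]})
    demo_df.to_sql("demo_enrollment", con=demo_engine, index=False)

    # Option 1: pass the engine itself to `con`
    df_from_engine = pd.read_sql(
        "SELECT * FROM demo_enrollment", con=demo_engine)

    # Option 2: open a connection first, then pass that connection to `con`
    with demo_engine.connect() as connection:
        df_from_connection = pd.read_sql(
            "SELECT * FROM demo_enrollment", con=connection)

    # Release the database file before the temporary folder is deleted
    demo_engine.dispose()

print(df_from_engine.equals(df_from_connection))
```

Option 2 can still be worthwhile when you want several queries to share one connection, but for a single read, Option 1 is shorter.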


## Importing our Google Cloud Project service key:

**Note: This project stores the service key in the same folder as this notebook so that you can see what it looks like; however, for real-world applications, I highly recommend storing the key in an alternate location in order to keep it more secure.**
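
One possible way to keep the key outside of the project folder is to store its path in an environment variable and look that variable up at runtime. The sketch below is an illustration of that idea, not the approach used in this notebook; the function name `resolve_service_key_path` and the variable name `PFN_SERVICE_KEY_PATH` are hypothetical.

```python
import os


def resolve_service_key_path(env_var="PFN_SERVICE_KEY_PATH"):
    """Return the service key path stored in an environment variable.

    The variable name is a made-up example; use whichever name you prefer.
    """
    key_path = os.environ.get(env_var)
    if not key_path:
        raise RuntimeError(
            f"Set the {env_var} environment variable to the path "
            "of your service account key.")
    # Expand a leading '~' so that paths like '~/keys/key.json' also work
    return os.path.expanduser(key_path)
```

With this in place, `service_key_path = resolve_service_key_path()` could stand in for the hard-coded path in the following cell.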

In [5]:
# The following code determines which key to select. If 'use_author_key'
# is True, the program will use my personal Google service account
# (which allows me to more easily debug and update this code);
# otherwise, it uses the sample service account key (which is no longer active)
# that's stored within this project's directory. 

use_author_key = False
if use_author_key:
    with open('kjb3_service_account_key_path.txt') as file:
        # strip() removes any trailing newline stored in the text file
        service_key_path = file.read().strip()
else:
    # Make sure to replace this path with your own service key's path;
    # otherwise, this code won't work. The path can be either full or relative.
    service_key_path = 'db-to-gsheets-demo-0a2a95a56f00.json'

# Note: The service account to which this key belonged 
# *and* its corresponding project have been deleted,
# so even though I've left the original key file in this project folder,
# that file will no longer allow you access to the Google Sheets workbook
# shown below.

gc = gspread.service_account(service_key_path)
# service_key_path is the path to a downloaded Google service account key,
# which is necessary for connecting to Google Sheets documents from your
# computer.
# Based on https://docs.gspread.org/en/latest/oauth2.html . The
# 'For Bots: Using Service Account' section of this page offers a helpful
# guide for creating and utilizing Google Cloud Console service accounts.

## Using gspread and gspread-dataframe to connect to and update a Google Sheets worksheet

First, we'll connect to our workbook:

In [6]:
workbook_id = '1LcB3bqPJ-CPUNPeR-Ohdd5bI6jjV6enh5Gd338Dqqcs' # As with your service
# key path, make sure to replace this workbook ID with your own workbook's ID.

# This ID was taken from the Google Sheets workbook's full URL:
# https://docs.google.com/spreadsheets/d/1LcB3bqPJ-CPUNPeR-Ohdd5bI6jjV6enh5Gd338Dqqcs/edit#gid=0


workbook = gc.open_by_key(workbook_id)
# Source: https://docs.gspread.org/en/latest/user-guide.html#opening-a-spreadsheet
# The gspread documentation refers to the item returned by this code as a spreadsheet,
# but I think 'workbook' is a better name, since a single workbook 
# can contain multiple spreadsheets or worksheets. Feel free to use whichever
# term makes the most sense to you, though.
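
If you would rather not copy the ID out of the URL by hand, a small helper can extract it for you. The sketch below assumes that the URL follows the `/spreadsheets/d/<ID>/` pattern shown in the comment above; the function name `extract_workbook_id` is an assumption added for illustration. (gspread also offers an `open_by_url` method if you prefer to pass the full URL.)

```python
import re


def extract_workbook_id(url):
    """Return the workbook ID found in a Google Sheets URL."""
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"No workbook ID found in URL: {url}")
    return match.group(1)


demo_url = ("https://docs.google.com/spreadsheets/d/"
            "1LcB3bqPJ-CPUNPeR-Ohdd5bI6jjV6enh5Gd338Dqqcs/edit#gid=0")
demo_workbook_id = extract_workbook_id(demo_url)
print(demo_workbook_id)
```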

Next, we'll navigate to a specific worksheet within this workbook, clear it, and import our DataFrame's data into it:

In [7]:
worksheet_name = 'Current Enrollment' # The name of the worksheet itself.  
# You'll want to either change this value
# to your own worksheet's name or create a worksheet named 'Current Enrollment'
# within your workbook.


worksheet = workbook.worksheet(worksheet_name)
# Source: https://docs.gspread.org/en/latest/user-guide.html#selecting-a-worksheet

worksheet.clear() 
# Source: https://docs.gspread.org/en/latest/user-guide.html#clear-a-worksheet
# Clearing the spreadsheet helps ensure that no cells from the 
# older copy will remain after a new copy gets uploaded. (This would occur
# if the newer copy was smaller in size than the older one.)



set_with_dataframe(worksheet, df_curr_enrollment) 
# Source: https://pypi.org/project/gspread-dataframe/
# This code uploads df_curr_enrollment
# to the worksheet specified by the 'worksheet' variable. 
# If this code doesn't work for you, make
# sure that you have completed all of the prerequisites listed earlier 
# in this notebook.



In [8]:
end_time = time.time()
run_time = end_time - start_time
print(f"Finished running script in {round(run_time, 3)} seconds.")

Finished running script in 4.94 seconds.
