Jin Jeon

HCDE 530 Computational Concepts

## Project 1b: GoogleSheets API and Data Analysis

I use survey data collected in one of my previous user research projects. In this project, I work with the Google Sheets API through Google Cloud. The process requires setting up keys and credentials so that the code is authorized to read the sheet. Below, I briefly describe the setup, then walk through reading in the sheet and analyzing the survey responses collected with Google Forms.

### Why I chose this project: 

For UX researchers, survey studies are essential for understanding users. Because surveys can be developed and sent out quickly, and usually return a good-sized sample in a short period of time, I use them for initial research to understand the general problem space and user behaviors.

Google Forms is one of the free and efficient survey tools. If Python code could work with and manipulate the data collected through Forms directly, on the fly as responses stream in, it would cut out the extra step of downloading each file.

Analyzing Google Forms data is of particular interest because, although Forms generates pie charts and bar graphs to summarize survey results, these summaries are often too basic. Using Python, we can easily handle large datasets and break the data down by each demographic group.


## Setting up Google Sheets API

```python
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
```

#### Set scope and spreadsheet ID 

**Auth scopes** express the permissions that your code requests from the user when it asks for authorization. For example:
- **spreadsheets.readonly** allows reading all spreadsheets and their metadata, with no write operations (this is the scope used below)
- **gmail.labels** (a Gmail API scope, for comparison) lets an app create, read, update, and delete labels
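As a minimal sketch, the two Sheets scopes most relevant here can be kept as constants, and the narrowest one chosen depending on whether the script ever writes back to the sheet. The helper name `choose_scopes` is hypothetical; the scope URLs are the ones Google documents for the Sheets API.

```python
from typing import List

# OAuth scope URLs for the Google Sheets API
SHEETS_READONLY = "https://www.googleapis.com/auth/spreadsheets.readonly"
SHEETS_READWRITE = "https://www.googleapis.com/auth/spreadsheets"


def choose_scopes(need_write: bool) -> List[str]:
    """Hypothetical helper: request the narrowest Sheets scope the script needs."""
    return [SHEETS_READWRITE] if need_write else [SHEETS_READONLY]


# Reading survey responses only needs read access
SCOPES = choose_scopes(need_write=False)
print(SCOPES)  # ['https://www.googleapis.com/auth/spreadsheets.readonly']
```

If the scopes change after `token.json` has been created, Google's Python quickstart advises deleting `token.json` so the user is asked to authorize the new scopes.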

The **spreadsheet ID** is the parameter that tells the API which spreadsheet to access. It is taken from part of the spreadsheet's full URL.
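A Sheets URL has the form `https://docs.google.com/spreadsheets/d/<SPREADSHEET_ID>/edit#gid=0`, so the ID is the segment after `/d/`. A minimal sketch of pulling it out with a regular expression; the helper `extract_spreadsheet_id` is hypothetical, and the example URL is assumed to follow that standard form using this project's ID.

```python
import re

# The ID is the run of letters, digits, '-' and '_' right after /spreadsheets/d/
_ID_PATTERN = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def extract_spreadsheet_id(url: str) -> str:
    """Hypothetical helper: return the spreadsheet ID embedded in a Sheets URL."""
    match = _ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No spreadsheet ID found in URL: {url}")
    return match.group(1)


url = "https://docs.google.com/spreadsheets/d/11Den6g5nuR4B2CCUML1KrA0bEZXRpPZ7t83Ieyi7NJ4/edit#gid=0"
print(extract_spreadsheet_id(url))  # 11Den6g5nuR4B2CCUML1KrA0bEZXRpPZ7t83Ieyi7NJ4
```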

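```python
# Read-only access is enough for analysis; if you change the scopes, delete token.json
SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']
SPREADSHEET_ID = '11Den6g5nuR4B2CCUML1KrA0bEZXRpPZ7t83Ieyi7NJ4'

# Specify which sheet or row/column range of data to read in
# refer to https://developers.google.com/sheets/api/guides/concepts#a1_notation for detail
RANGE_NAME = 'health_data'

# Reuse the saved token if there is one; otherwise run the OAuth flow
# (credentials.json is the OAuth client file downloaded from Google Cloud)
creds = None
if os.path.exists('token.json'):
    creds = Credentials.from_authorized_user_file('token.json', SCOPES)
if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
        creds = flow.run_local_server(port=0)
    with open('token.json', 'w') as token:
        token.write(creds.to_json())

service = build('sheets', 'v4', credentials=creds)

# Call the Sheets API to read in the data
sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId=SPREADSHEET_ID,
                            range=RANGE_NAME).execute()
values = result.get('values', [])
```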
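`RANGE_NAME = 'health_data'` asks for every populated cell on the tab (or named range) called `health_data`. To read only part of a tab, A1 notation combines a sheet name with a cell range, for example `health_data!A1:F20`. A minimal sketch of a hypothetical helper that builds such strings, following the linked guide's rule that sheet names containing spaces are wrapped in single quotes; it assumes sheet names contain no single quotes themselves, and `Form Responses 1` is only an example name.

```python
import re


def a1_range(sheet: str, cells: str = "") -> str:
    """Hypothetical helper: build an A1-notation range such as health_data!A1:F20."""
    # Quote the sheet name unless it is plain letters, digits, and underscores
    # (assumes the name contains no single quotes)
    if not re.fullmatch(r"[A-Za-z0-9_]+", sheet):
        sheet = f"'{sheet}'"
    return f"{sheet}!{cells}" if cells else sheet


print(a1_range("health_data"))                # health_data
print(a1_range("health_data", "A1:F20"))      # health_data!A1:F20
print(a1_range("Form Responses 1", "B:B"))    # 'Form Responses 1'!B:B
```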

The sheet is now loaded without our having to open Google Drive and search for it, so we can work with the data on the fly. Below, we confirm the types the data was read in as.

```python
print(type(result))
print(type(values))
```

```
<class 'dict'>
<class 'list'>
```
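`result` is a dict because `values().get(...)` returns a ValueRange object with `range`, `majorDimension`, and `values` keys; `values` is a list of rows, each a list of cell strings. The ValueRange documentation also notes that trailing empty cells are omitted, so a response whose last answers were left blank comes back shorter than the header row. A minimal sketch, assuming the first row is the header, that pads rows before they go into a DataFrame; `pad_rows` is a hypothetical helper, and the example dict is illustrative, built from cells in the table below.

```python
from typing import List, Tuple


def pad_rows(values: List[List[str]]) -> Tuple[List[str], List[List[str]]]:
    """Hypothetical helper: split Sheets `values` into (header, rows), padding rows to the header width."""
    if not values:
        return [], []
    header, *rows = values
    width = len(header)
    # Fill missing trailing cells with empty strings so every row lines up with the header
    padded = [row + [""] * (width - len(row)) for row in rows]
    return header, padded


# Illustrative ValueRange-shaped dict; the last answer of the row was left blank
example_result = {
    "range": "health_data!A1:C2",
    "majorDimension": "ROWS",
    "values": [
        ["Timestamp", "What age range are you?", "What is your gender?"],
        ["2020/07/18 10:40:09 AM EST", "18 - 24"],
    ],
}
header, rows = pad_rows(example_result.get("values", []))
print(rows)  # [['2020/07/18 10:40:09 AM EST', '18 - 24', '']]
```

With padded rows, `pd.DataFrame(rows, columns=header)` always receives rows that match the header width.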


```python
# let's quickly import libraries
import os
import pandas as pd
from collections import Counter
import re
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from nltk.corpus import stopwords
import warnings
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
```

Now let's convert it into a pandas DataFrame so we can easily manipulate the data.

```python
data = pd.DataFrame(values[1:], columns=values[0])

# let's confirm
print(type(data))

data.head()
```

```
<class 'pandas.core.frame.DataFrame'>
```


| | Timestamp | What age range are you? | What is your gender? | What actions do you take regarding your health? | How would you rate your health? | Have you ever tracked your health and/or fitness? | Why do you not track your health and/or fitness? | How do you generally like to keep track of activities? | Which of the following did you keep track of? (Select all that apply.) | What did you use to record your health and/or fitness? (Select all that apply.) | ... | Which of the following do you keep track of? (Select all that apply.) | What do you use to record your health and/or fitness? (Select all that apply.) | If you use any app or device, could you tell us which one(s)? | Why do you track your health and/or fitness? | Is there anything you like about your current health and/or fitness tracking method? | In the last 30 days, how often have you tracked your health and/or fitness? | Who views your health and/or fitness information? | How is your health and/or fitness information being used? | What, if anything, has been helpful about the information you tracked? | Is there anything that could be better about your current health and/or fitness tracking method? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020/07/18 9:59:33 AM EST | Under 18 | Man | Exercise;Learn more about your health (e.g. fr... | 3 | Yes and I am currently still tracking | | | | | ... | Exercise (e.g. Steps taken, Distance, Calories... | Mobile App | Google fit and Pixels | It's interesting to look back at the data I ha... | My mood tracking method is very useful for dis... | Everyday | Myself | It's only used by me. It simply interests me. | I have concluded that I am prone to mood swing... | It could be more extensive. |
| 1 | 2020/07/18 10:15:07 AM EST | 25 - 34 | Man | Exercise | 4 | Yes, I have tracked before but not in the last... | | | Cardiovascular (e.g. Heart rate, Blood pressur... | Wearable;Mobile App | ... | | | | | | | | | | |
| 2 | 2020/07/18 10:32:24 AM EST | 18 - 24 | Man | None of the above | 2 | No | I eat very little junk food, and am very thin.... | To do lists and notes | | | ... | | | | | | | | | | |
| 3 | 2020/07/18 10:40:09 AM EST | 18 - 24 | Man | Exercise | 4 | No | Too much effort | I don't track anything in particular | | | ... | | | | | | | | | | |
| 4 | 2020/07/18 10:57:18 AM EST | 35 - 44 | Woman | Take medication and/or health supplements;Exer... | 4 | Yes and I am currently still tracking | | | | | ... | Sleep (e.g. Sleep time, Sleep quality);Exercis... | Wearable;Mobile App | Fitbit, Sleep Cycle, Flo, Strava | I want to know more about myself and it's impo... | Data visualization, articles, self challenge, ... | 3-4 times / week | Myself | Self assessment | It seems like the app found my period cycle. | No. I think it's enough for my current capacity. |
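The "Select all that apply" questions store every chosen option in one cell, joined by semicolons (for example `Wearable;Mobile App`). A minimal sketch of tallying each option with `Counter`; the helper `count_options` is hypothetical, and splitting on `;` rather than `,` is deliberate, since option labels such as `Exercise (e.g. Steps taken, Distance, Calories...` contain commas themselves.

```python
from collections import Counter
from typing import Iterable


def count_options(cells: Iterable, sep: str = ";") -> Counter:
    """Hypothetical helper: count each option in a checkbox column stored as sep-joined text."""
    counts = Counter()
    for cell in cells:
        # Skip blank answers: None, NaN (a float in pandas), or empty strings
        if not isinstance(cell, str) or not cell.strip():
            continue
        counts.update(part.strip() for part in cell.split(sep) if part.strip())
    return counts


# "What do you use to record your health and/or fitness?" answers
# from the five rows shown above (blank cells as None)
answers = ["Mobile App", None, None, None, "Wearable;Mobile App"]
print(count_options(answers).most_common())  # [('Mobile App', 2), ('Wearable', 1)]
```

On the full DataFrame, the same helper can be applied to a column directly, e.g. `count_options(data['What do you use to record your health and/or fitness? (Select all that apply.)'])`.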


### Health rating by gender
Participants were asked, `How would you rate your health? (5 being healthy, 1 being not healthy).`

Let's break down the data to see how self-perceived health varies by gender and age group. In the code below, I first query the female and male respondents from the data.

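```python
# The Sheets API returns every cell as a string, so convert the
# health rating column (columns[4], "How would you rate your health?") to numbers
rating_col = data.columns[4]
data[rating_col] = pd.to_numeric(data[rating_col], errors='coerce')

females = data.loc[data['What is your gender?'] == 'Woman']
males = data.loc[data['What is your gender?'] == 'Man']

mean_males = males[rating_col].mean()
mean_females = females[rating_col].mean()

print("Mean of males' self-rated health: " + str(mean_males))
print("Mean of females' self-rated health: " + str(mean_females))
```

Converting with `pd.to_numeric` is required here. Without it, `np.mean` is applied to a column of text, which effectively joins the rating digits together instead of adding them; that is why averaging the raw column printed values of about 1.14e+28 for males and 1.14e+36 for females, far outside the 1 to 5 rating scale.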
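The paragraph above also proposes breaking the ratings down by age group. A minimal sketch using pandas `groupby`, with a hypothetical helper `mean_rating_by` that converts the text ratings to numbers itself, so it works on the DataFrame exactly as it comes from the Sheets API.

```python
from typing import List

import pandas as pd


def mean_rating_by(df: pd.DataFrame, rating_col: str, by: List[str]) -> pd.DataFrame:
    """Hypothetical helper: mean and count of a 1-5 rating for each group in `by`."""
    work = df[by].copy()
    # Sheets cells arrive as text; unparseable or blank ratings become NaN and are skipped
    work["rating"] = pd.to_numeric(df[rating_col], errors="coerce")
    return work.groupby(by)["rating"].agg(["mean", "count"])
```

For this survey, the call would be `mean_rating_by(data, data.columns[4], ['What is your gender?', 'What age range are you?'])`. Reporting `count` next to each mean matters because some gender and age combinations may contain only a few respondents, and their means are correspondingly unreliable.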
