## Reading Data from Google Sheets

Let us see how to read data from a Google Sheet.
* First, create a `values` resource on top of `service.spreadsheets()`.
* Then send an HTTP request (invoke `get`, followed by `execute`), passing both the spreadsheet ID and the range of values.

> The range `'Form Masked!A1:G'` used below covers all rows and all 7 columns (A to G) of my data set. `Form Masked` is the name of the sheet within the main spreadsheet.
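
As a small illustration of how such a range string is put together, here is a minimal sketch. `build_range` is a hypothetical helper written for this note, not part of the Sheets API. It relies on the A1-notation convention that leaving the row number off the last column keeps the range open-ended, so it picks up every populated row.

```python
def build_range(sheet_name, first_col, last_col, first_row=1):
    """Build an A1-notation range with an open-ended last row.

    build_range is a hypothetical helper, not part of the Sheets API.
    """
    # No row number after last_col: the range runs to the last row with data.
    return f'{sheet_name}!{first_col}{first_row}:{last_col}'


RANGE_NAME = build_range('Form Masked', 'A', 'G')
print(RANGE_NAME)  # Form Masked!A1:G
```

Passing a different `first_row` (for example `2`) would skip the header row, if only the data rows are needed.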

In [2]:
%run 05_overview_of_google_sheets_api.ipynb

In [3]:
SPREADSHEET_ID = '1lgyVuw6nVyRnmKtCPbXF4kYcop5HMJ8H3eeNsArAlVk'

In [4]:
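# Imported explicitly so this cell does not depend on names defined by the %run above.
import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build


def get_credentials():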
    SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)

    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
            
    return creds

In [5]:
creds = get_credentials()

In [6]:
RANGE_NAME = 'Form Masked!A1:G'

In [8]:
service = build('sheets', 'v4', credentials=creds)

In [9]:
sheet = service.spreadsheets()

In [10]:
sheet_values = sheet.values()

In [11]:
type(sheet_values)

googleapiclient.discovery.Resource

In [12]:
sheet_details = sheet_values.get(spreadsheetId=SPREADSHEET_ID,
                            range=RANGE_NAME).execute()

In [13]:
type(sheet_details)

dict

In [None]:
sheet_details

In [15]:
sheet_details.keys()

dict_keys(['range', 'majorDimension', 'values'])

In [16]:
sheet_details['values'][0]

['Timestamp',
 'ITVersity Id',
 'Email Address',
 'First Name',
 'Last Name',
 'Why you want to learn Python?',
 'Current Status']

In [55]:
sheet_details.get('values')[0]

['Timestamp',
 'ITVersity Id',
 'Email Address',
 'First Name',
 'Last Name',
 'Why you want to learn Python?',
 'Current Status']

In [18]:
sheet_data = sheet_details.get('values', [])

In [19]:
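if not sheet_data:
    print('No data found.')
    # Define both names so the following cells do not fail with a NameError.
    columns, rows = [], []
else:
    columns = sheet_data[0]  # first row holds the column headers
    rows = sheet_data[1:]    # remaining rows hold the form responses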

In [20]:
for column in columns: print(column)

Timestamp
ITVersity Id
Email Address
First Name
Last Name
Why you want to learn Python?
Current Status


In [21]:
rows[:3]

[['12/16/2020 13:33:22',
  'ITV00002',
  'ITV00002@gmail.com',
  'Vijay',
  'Garudeswar',
  'Data engineering and python developmentnfor server administration',
  '7+ years of experience'],
 ['12/16/2020 13:33:54',
  'ITV00003',
  'ITV00003@gmail.com',
  'Vishnu',
  'Munagala',
  'Yes',
  '7+ years of experience'],
 ['12/16/2020 13:33:57',
  'ITV00004',
  'ITV00004@gmail.com',
  'SATISH',
  'KUMAR',
  'Career growth ',
  '3 to 7 years of experience']]
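
Accessing cells by position (such as `row[2]`) works as long as every row has all 7 cells. As far as I know, the API leaves out trailing empty cells, so a row can come back shorter than the header. Here is a minimal sketch that pairs each row with the column names and pads short rows. `rows_to_records` is a hypothetical helper, and the second sample row is made up to show the padding.

```python
def rows_to_records(columns, rows):
    """Pair each row with the header names, padding short rows with ''."""
    records = []
    for row in rows:
        # A non-positive difference produces an empty list, so full rows are unchanged.
        padded = row + [''] * (len(columns) - len(row))
        records.append(dict(zip(columns, padded)))
    return records


columns = ['Timestamp', 'ITVersity Id', 'Email Address']
rows = [
    ['12/16/2020 13:33:22', 'ITV00002', 'ITV00002@gmail.com'],
    ['12/16/2020 13:33:54', 'ITV00003'],  # made-up short row for illustration
]

records = rows_to_records(columns, rows)
print(records[0]['Email Address'])  # ITV00002@gmail.com
print(repr(records[1]['Email Address']))  # ''
```

Looking cells up by column name also keeps the code readable if the column order in the sheet changes later.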

In [22]:
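def get_sheet_data(service, spreadsheet_id, spreadsheet_range):
    """Return (header_row, data_rows) for the given range."""
    sheet = service.spreadsheets()
    sheet_values = sheet.values()
    sheet_details = sheet_values.get(spreadsheetId=spreadsheet_id,
                                     range=spreadsheet_range).execute()
    # Default to an empty list in case 'values' is missing from the response.
    values = sheet_details.get('values', [])
    if not values:
        return [], []
    return values[0], values[1:]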

In [23]:
get_sheet_data(service, SPREADSHEET_ID, RANGE_NAME)[0]

['Timestamp',
 'ITVersity Id',
 'Email Address',
 'First Name',
 'Last Name',
 'Why you want to learn Python?',
 'Current Status']

In [62]:
get_sheet_data(service, SPREADSHEET_ID, RANGE_NAME)[1][0]

['12/16/2020 13:33:22',
 'ITV00002',
 'ITV00002@gmail.com',
 'Vijay',
 'Garudeswar',
 'Data engineering and python developmentnfor server administration',
 '7+ years of experience']
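
To try the function without credentials or network access, one option is to pass in a stand-in for `service`. The sketch below uses `unittest.mock.MagicMock` from the standard library together with a condensed variant of `get_sheet_data` (it also returns empty lists when the response has no `values`). The name `fake_service`, the spreadsheet ID `'dummy-id'`, and the canned response are assumptions made up for illustration.

```python
from unittest.mock import MagicMock


def get_sheet_data(service, spreadsheet_id, spreadsheet_range):
    # Same call chain as in the notebook: spreadsheets() -> values() -> get() -> execute()
    sheet_details = service.spreadsheets().values().get(
        spreadsheetId=spreadsheet_id, range=spreadsheet_range).execute()
    values = sheet_details.get('values', [])
    return (values[0], values[1:]) if values else ([], [])


# Every call in the chain returns a mock; execute() returns a canned response
# shaped like the dict shown earlier (range, majorDimension, values).
fake_service = MagicMock()
get_mock = fake_service.spreadsheets.return_value.values.return_value.get
get_mock.return_value.execute.return_value = {
    'range': 'Form Masked!A1:G',  # made-up value for illustration
    'majorDimension': 'ROWS',
    'values': [
        ['Timestamp', 'ITVersity Id', 'Email Address'],
        ['12/16/2020 13:33:22', 'ITV00002', 'ITV00002@gmail.com'],
    ],
}

columns, rows = get_sheet_data(fake_service, 'dummy-id', 'Form Masked!A1:G')
print(columns)
print(rows)

# Confirm the function passed the ID and range through unchanged.
get_mock.assert_called_once_with(spreadsheetId='dummy-id',
                                 range='Form Masked!A1:G')
```

The real `service` built with `build('sheets', 'v4', credentials=creds)` can be swapped back in without changing the function.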

In [24]:
columns, rows = get_sheet_data(service, SPREADSHEET_ID, RANGE_NAME)

In [25]:
emails = [row[2] for row in rows]

In [26]:
emails[:3]

['ITV00002@gmail.com', 'ITV00003@gmail.com', 'ITV00004@gmail.com']

In [28]:
import pandas as pd

df = pd.DataFrame(rows, columns=columns)

In [29]:
df.head(5)

| | Timestamp | ITVersity Id | Email Address | First Name | Last Name | Why you want to learn Python? | Current Status |
|---|---|---|---|---|---|---|---|
| 0 | 12/16/2020 13:33:22 | ITV00002 | ITV00002@gmail.com | Vijay | Garudeswar | Data engineering and python developmentnfor se... | 7+ years of experience |
| 1 | 12/16/2020 13:33:54 | ITV00003 | ITV00003@gmail.com | Vishnu | Munagala | Yes | 7+ years of experience |
| 2 | 12/16/2020 13:33:57 | ITV00004 | ITV00004@gmail.com | SATISH | KUMAR | Career growth | 3 to 7 years of experience |
| 3 | 12/16/2020 13:34:29 | ITV00005 | ITV00005@gmail.com | Marvathi | Gopi | I'm planning to make a career as a data engine... | Fresh Graduate |
| 4 | 12/16/2020 13:34:40 | ITV00006 | ITV00006@gmail.com | Shams | Shaikh | PySpark Programming | < 3 years of experience |


In [30]:
df['Email Address'].head(5)

0    ITV00002@gmail.com
1    ITV00003@gmail.com
2    ITV00004@gmail.com
3    ITV00005@gmail.com
4    ITV00006@gmail.com
Name: Email Address, dtype: object

In [32]:
df.columns

Index(['Timestamp', 'ITVersity Id', 'Email Address', 'First Name', 'Last Name',
       'Why you want to learn Python?', 'Current Status'],
      dtype='object')

In [36]:
df.drop?

Signature:
df.drop(
    labels=None,
    axis=0,
    index=None,
    columns=None,
    level=None,
    inplace=False,
    errors='raise',
)
Docstring:
Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding
axis, or by specifying directly index or column names. When using a
multi-index, labels on different levels can be removed by specifying
the level.

Parameters
----------
labels

In [38]:
df = df.drop(columns=df.columns[1]).drop(columns=df.columns[5:])

In [39]:
df.head(5)

| | Timestamp | Email Address | First Name | Last Name |
|---|---|---|---|---|
| 0 | 12/16/2020 13:33:22 | ITV00002@gmail.com | Vijay | Garudeswar |
| 1 | 12/16/2020 13:33:54 | ITV00003@gmail.com | Vishnu | Munagala |
| 2 | 12/16/2020 13:33:57 | ITV00004@gmail.com | SATISH | KUMAR |
| 3 | 12/16/2020 13:34:29 | ITV00005@gmail.com | Marvathi | Gopi |
| 4 | 12/16/2020 13:34:40 | ITV00006@gmail.com | Shams | Shaikh |


In [40]:
df.columns=['submitted_ts', 'email_id', 'first_name', 'last_name']

In [41]:
df.head(5)

| | submitted_ts | email_id | first_name | last_name |
|---|---|---|---|---|
| 0 | 12/16/2020 13:33:22 | ITV00002@gmail.com | Vijay | Garudeswar |
| 1 | 12/16/2020 13:33:54 | ITV00003@gmail.com | Vishnu | Munagala |
| 2 | 12/16/2020 13:33:57 | ITV00004@gmail.com | SATISH | KUMAR |
| 3 | 12/16/2020 13:34:29 | ITV00005@gmail.com | Marvathi | Gopi |
| 4 | 12/16/2020 13:34:40 | ITV00006@gmail.com | Shams | Shaikh |
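
Dropping columns by position and then assigning new names works, but it depends on the column order in the sheet. A possible alternative, sketched below, selects the wanted columns by name and renames them in one step. The names `renames` and `slim_df` are made up for this sketch, and the two sample rows are copied from the output above so that the block runs without the API.

```python
import pandas as pd

# Sample rows copied from the output above; in the notebook they come from the sheet.
columns = ['Timestamp', 'ITVersity Id', 'Email Address', 'First Name',
           'Last Name', 'Why you want to learn Python?', 'Current Status']
rows = [
    ['12/16/2020 13:33:54', 'ITV00003', 'ITV00003@gmail.com', 'Vishnu',
     'Munagala', 'Yes', '7+ years of experience'],
    ['12/16/2020 13:33:57', 'ITV00004', 'ITV00004@gmail.com', 'SATISH',
     'KUMAR', 'Career growth ', '3 to 7 years of experience'],
]
df = pd.DataFrame(rows, columns=columns)

# Map each column to keep onto its new name; dict order sets the column order.
renames = {
    'Timestamp': 'submitted_ts',
    'Email Address': 'email_id',
    'First Name': 'first_name',
    'Last Name': 'last_name',
}

# Select by name, then rename; the original df is left untouched.
slim_df = df[list(renames)].rename(columns=renames)
print(slim_df.columns.tolist())
```

With this approach a missing column raises a `KeyError` at selection time instead of silently shifting the positions.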
