### **Data Transformations**
Web scraping often results in raw, messy data that can be inconsistent, incomplete, or improperly formatted. This unrefined data typically includes issues like missing values, typographical errors, varied date formats, and numeric values stored as text. Data transformation is the process of cleaning and standardizing this data using tools such as Pandas, which converts the raw output into a structured DataFrame. Through transformation, we can trim unnecessary whitespace, correct inconsistent casing, fill in missing values, and convert data types appropriately, ensuring that the data is accurate and ready for further analysis.

In summary, mastering data transformation techniques is essential for web scrapers to unlock the full potential of their collected data and to facilitate a seamless transition from raw data to actionable insights.

In [1]:
# TODO: Execute this cell to get a CSV file to work with

import pandas as pd


scraping_data = [
    {"name": "Alice", "age": "25", "email": "alice@example.com", "rate": "$20.5", "join_date": "2004-01-10"}, # rate is a string
    {"name": "bob", "age": "30", "email": "my email is bob30@example.com", "rate": "$35", "join_date": "2020/01/12"},   # email typo, different date format
    {"name": "Charlie", "age": "20", "email": "charlie@example.com", "rate": "$20", "join_date": "2004-01-15"},  # age missing, non-numeric price
    {"name": "David", "age": "40", "email": "", "rate": "45.0", "join_date": "2004-01-15"},  # missing email
    {"name": None, "age": None, "email": None, "rate": None, "join_date": None},  # invalid data row
    {"name": "Frank", "age": "35", "email": "frank@example.com", "rate": 20, "join_date": "2004-01-25"},
    {"name": "Grace", "age": "28", "email": "grace@example.com", "rate": 30, "join_date": "2004-12-01"} 
]

df = pd.DataFrame(scraping_data)
df.to_csv("scraping_data.csv", index=False)

In [None]:
"""
Objective: Reading data from a CSV file
"""

# TODO: Read the data from the CSV file into a DataFrame
# TODO: Apart from the read_csv() method, what else can pandas read from?

df = pd.read_csv("scraping_data.csv")

"""
1. Excel files:
df = pd.read_excel('file.xlsx')

2. JSON files:
df = pd.read_json('file.json')

3. SQL databases:
df = pd.read_sql('query', connection)

4. HTML tables:
df = pd.read_html('url_or_file.html')

5. XML files:
df = pd.read_xml('file.xml')

6. Parquet files:
df = pd.read_parquet('file.parquet')

7. HDF5 files:
df = pd.read_hdf('file.h5', 'key')

8. Pickle files:
df = pd.read_pickle('file.pkl')

9. SAS files:
df = pd.read_sas('file.sas7bdat')

10. STATA files:
df = pd.read_stata('file.dta')
"""

df

Unnamed: 0,name,age,email,rate,join_date
0,Alice,25.0,alice@example.com,$20.5,2004-01-10
1,bob,30.0,my email is bob30@example.com,$35,2020/01/12
2,Charlie,20.0,charlie@example.com,$20,2004-01-15
3,David,40.0,,45.0,2004-01-15
4,,,,,
5,Frank,35.0,frank@example.com,20,2004-01-25
6,Grace,28.0,grace@example.com,30,2004-12-01


In [None]:
"""
Objective: Understanding the data
"""
# TODO: Use the info() and describe(include="all") methods to understand the data
# TODO: The info() method returns different non-null values, why is that?
# The info() method shows different non-null values because:
# - Some columns have missing values (NaN)
# - Row 4 has all null values
# - 'email' column has an empty string which is converted to NaN

# TODO: The describe() method returns 7 count name and 7 unique name, what is that means?
# The describe() method shows:
# - count: number of non-null values
# - unique: number of unique values
# - top: most frequent value
# - freq: frequency of the most common value
# - mean, std, min, 25%, 50%, 75%, max: numerical statistics (only for numeric columns)

print(df.info())
# prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.

#print(df.describe(include="all"))
# display summary statistics of numerical columns


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   name       6 non-null      object 
 1   age        6 non-null      float64
 2   email      5 non-null      object 
 3   rate       6 non-null      object 
 4   join_date  6 non-null      object 
dtypes: float64(1), object(4)
memory usage: 412.0+ bytes
None


In [5]:
"""
Objective: Handling missing data
Resource: https://www.kaggle.com/code/rtatman/data-cleaning-challenge-handling-missing-values
"""
# TODO: Print the dataframe and notice which columns have missing values
# TODO: Add a separator print("======================================")
# TODO: Remove all the rows that contain a missing value using the dropna() method

print(df)

print("======================================")

df = df.dropna(how="all")

print(df)


      name   age                          email   rate   join_date
0    Alice  25.0              alice@example.com  $20.5  2004-01-10
1      bob  30.0  my email is bob30@example.com    $35  2020/01/12
2  Charlie  20.0            charlie@example.com    $20  2004-01-15
3    David  40.0                            NaN   45.0  2004-01-15
4      NaN   NaN                            NaN    NaN         NaN
5    Frank  35.0              frank@example.com     20  2004-01-25
6    Grace  28.0              grace@example.com     30  2004-12-01
      name   age                          email   rate   join_date
0    Alice  25.0              alice@example.com  $20.5  2004-01-10
1      bob  30.0  my email is bob30@example.com    $35  2020/01/12
2  Charlie  20.0            charlie@example.com    $20  2004-01-15
3    David  40.0                            NaN   45.0  2004-01-15
5    Frank  35.0              frank@example.com     20  2004-01-25
6    Grace  28.0              grace@example.com     30  2004-1

In [None]:
"""
Objective: Manipulating Textual Data
"""
# TODO: Replace all the $ signs with nothing using str.replace
# TODO: Shows the result

# df["rate"] = df["rate"].str.replace("$", "")
df.loc[:, "rate"] = df["rate"].str.replace("$", "")
df

# 1. Uses str.replace() to remove all "$" symbols from the rate column
# 2. Uses df.loc[:, "rate"] for explicit column access (safer than direct indexing)
# 3. The result shows the rate column with clean numeric values without dollar signs


Unnamed: 0,name,age,email,rate,join_date
0,Alice,25.0,alice@example.com,20.5,2004-01-10
1,bob,30.0,my email is bob30@example.com,35.0,2020/01/12
2,Charlie,20.0,charlie@example.com,20.0,2004-01-15
3,David,40.0,,45.0,2004-01-15
4,,,,,
5,Frank,35.0,frank@example.com,20.0,2004-01-25
6,Grace,28.0,grace@example.com,30.0,2004-12-01


In [15]:
"""
Objective: Manipulating Textual Data
"""
# TODO: Replace all the / signs with - using str.replace
# TODO: Shows the result

df.loc[:, "join_date"] = df["join_date"].str.replace("/", "-")
df

Unnamed: 0,name,age,email,rate,join_date
0,Alice,25.0,alice@example.com,20.5,2004-01-10
1,bob,30.0,my email is bob30@example.com,35.0,2020-01-12
2,Charlie,20.0,charlie@example.com,20.0,2004-01-15
3,David,40.0,,45.0,2004-01-15
4,,,,,
5,Frank,35.0,frank@example.com,20.0,2004-01-25
6,Grace,28.0,grace@example.com,30.0,2004-12-01


In [17]:
"""
Objective: Manipulating Textual Data
"""
# TODO: Validate emails from the email column
# TODO: Shows the result
df_copy = df.copy()

df_copy["email"] = df_copy["email"].str.extract(r'([\w\-]+@[\w\.-]+\.\w+)')
df_copy
df = df_copy


In [19]:
"""
Objective: Filling Missing Data
"""
# TODO: Fill empty value in the email column with "-"
# TODO: Shows the result

# df["email"] = df["email"].fillna("-")
df.loc[:, "email"] = df["email"].fillna("-")
df

Unnamed: 0,name,age,email,rate,join_date
0,Alice,25.0,alice@example.com,20.5,2004-01-10
1,bob,30.0,bob30@example.com,35.0,2020-01-12
2,Charlie,20.0,charlie@example.com,20.0,2004-01-15
3,David,40.0,-,45.0,2004-01-15
4,,,-,,
5,Frank,35.0,frank@example.com,20.0,2004-01-25
6,Grace,28.0,grace@example.com,30.0,2004-12-01


In [20]:
"""
Objective: Convert Data Types
"""
# TODO: Check the data types of each column using the info() method
df.info()

df_copy = df.copy()

df_copy["name"] = df_copy["name"].astype("string") # this will convert the name column to string
df_copy["age"] = df_copy["age"].astype("int") # this will convert the age column to integer
df_copy["join_date"] = pd.to_datetime(df_copy["join_date"]) # this will convert the join_date column to datetime

# TODO: Convert email column to string
# TODO: Convert rate column to float
# TODO: Check the data types of each column using the info() method to make sure the conversion was successful

# Expected Output
# <class 'pandas.core.frame.DataFrame'>
# Index: 6 entries, 0 to 6
# Data columns (total 5 columns):
#  #   Column     Non-Null Count  Dtype         
# ---  ------     --------------  -----         
#  0   name       6 non-null      string        
#  1   age        6 non-null      int64         
#  2   email      6 non-null      string        
#  3   rate       6 non-null      float64       
#  4   join_date  6 non-null      datetime64[ns]
# dtypes: datetime64[ns](1), float64(1), int64(1), string(2)
# memory usage: 288.0 bytes

df_copy["email"] = df_copy["email"].astype("string")
df_copy["rate"] = df_copy["rate"].astype("float")



df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   name       6 non-null      object 
 1   age        6 non-null      float64
 2   email      7 non-null      object 
 3   rate       6 non-null      object 
 4   join_date  6 non-null      object 
dtypes: float64(1), object(4)
memory usage: 412.0+ bytes


IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer

In [24]:
df = df_copy

In [25]:
"""
Objective: Add new columns based on existing columns
"""

today = pd.to_datetime("today") # get the current date
df['period of employment'] = df['join_date'].apply(lambda x: (today - x).days) # this will calculate the period of employment in days

# TODO: Re-assign apply() to calculate the period of employment in years
# TODO: Re-assign apply() to round up the period of employment

df['period of employment'] = df['period of employment'].apply(lambda x: (x/365))
df['period of employment'] = df['period of employment'].apply(lambda x: round(x, 0))

df

TypeError: unsupported operand type(s) for -: 'Timestamp' and 'str'

In [51]:
"""
Objective: Transpose Rows and Columns
"""
# TODO: Execute this cell before continue

# Sample data (as rows)
data = {
    'Attribute': ['Price', 'Change', 'Volume'],
    'Apple': [150.00, '+2%', '1M'],
    'Microsoft': [250.00, '-1%', '500K'],
    'Google': [2800.00, '+1%', '2M']
}

# Create DataFrame
df = pd.DataFrame(data)

# # Set 'Attribute' as index (attributes as rows)
# df.set_index('Attribute', inplace=True)

df

Unnamed: 0,Attribute,Apple,Microsoft,Google
0,Price,150.0,250.0,2800.0
1,Change,+2%,-1%,+1%
2,Volume,1M,500K,2M


In [54]:
"""
Objective: Transpose Rows and Columns
"""
# TODO: Transpose the DataFrame using the transpose() method
# TODO: Shows the result
# TODO: Rename the column headers to be the values from the first row
# TODO: Drop the first row, reset the index before dropping

df = df.transpose()
print(df)

# # Set the first row as column headers and drop it from the DataFrame
df.columns = df.iloc[0]  # Set the first row as column headers
print(df)
df.reset_index(drop=True, inplace=True)

df = df.drop(index=0)  # Drop the first row

# # Reset the index for clean output
df.reset_index(drop=True, inplace=True)
df

Attribute   Price  Change  Volume
Price                            
Price       Price  Change  Volume
150.0       150.0     +2%      1M
250.0       250.0     -1%    500K
2800.0     2800.0     +1%      2M
Price    Price  Change  Volume
Price                         
Price    Price  Change  Volume
150.0    150.0     +2%      1M
250.0    250.0     -1%    500K
2800.0  2800.0     +1%      2M


Price,Price.1,Change,Volume
0,150.0,+2%,1M
1,250.0,-1%,500K
2,2800.0,+1%,2M


In [26]:
"""
Objective: Understanding Concatenation (pd.concat): Stack DataFrames vertically (rows) or horizontally (columns).
"""
# Create first DataFrame
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
})

# Create second DataFrame
df2 = pd.DataFrame({
    'ID': [4, 5, 6],
    'Name': ['David', 'Eva', 'Frank']
})

# Concatenate vertically (row-wise)
df_concat = pd.concat([df1, df2], axis=0, ignore_index=True)
print(df1)
print("===============================")
print(df2)
print("===============================")
print(df_concat)
print("===============================")

# TODO: Execute this cell to understand the before and after concatenation
# TODO: Re-apply concatenation vertically to add Age and City columns from df3 below by using axis=1 and set ignore_index=False

# Create third DataFrame
df3 = pd.DataFrame({
    'Age': [25, 30, 35, 25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Los Angeles', 'Chicago']
})

df_concat_vertical = pd.concat([df_concat, df3], axis=0, ignore_index=False)
print(df3)
print("===============================")
print(df_concat_vertical)

df_concat_horizontal = pd.concat([df_concat, df3], axis=1)
print("Adding new columns horizontally (axis=1):")
print(df3)
print("===============================")
print(df_concat_horizontal)

   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie
   ID   Name
0   4  David
1   5    Eva
2   6  Frank
   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie
3   4    David
4   5      Eva
5   6    Frank
   Age         City
0   25     New York
1   30  Los Angeles
2   35      Chicago
3   25     New York
4   30  Los Angeles
5   35      Chicago
    ID     Name   Age         City
0  1.0    Alice   NaN          NaN
1  2.0      Bob   NaN          NaN
2  3.0  Charlie   NaN          NaN
3  4.0    David   NaN          NaN
4  5.0      Eva   NaN          NaN
5  6.0    Frank   NaN          NaN
0  NaN      NaN  25.0     New York
1  NaN      NaN  30.0  Los Angeles
2  NaN      NaN  35.0      Chicago
3  NaN      NaN  25.0     New York
4  NaN      NaN  30.0  Los Angeles
5  NaN      NaN  35.0      Chicago
Adding new columns horizontally (axis=1):
   Age         City
0   25     New York
1   30  Los Angeles
2   35      Chicago
3   25     New York
4   30  Los Angeles
5   35      Chicago
   ID 

In [None]:
"""
Objective: Understanding Merging (pd.merge): Join DataFrames based on common column values (like SQL joins).
"""
# Create first DataFrame
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
})

# Create second DataFrame
df2 = pd.DataFrame({
    'ID': [2, 3, 4],
    'Age': [30, 35, 40]
})

# Merge DataFrames on 'ID' (inner join by default)
df_merged = pd.merge(df1, df2, on='ID', how='inner')

print(df1)
print("===============================")
print(df2)
print("===============================")
print(df_merged)

# TODO: Execute this cell to understand the inner join in pandas
# TODO: After that change how='inner' to how='outer' to understand the outer join
# TODO: What is the difference between inner join and outer join?

# First, inner join
print("Inner Join Result:")
df_merged_inner = pd.merge(df1, df2, on='ID', how='inner')
print(df1)
print("===============================")
print(df2)
print("===============================")
print(df_merged_inner)

print("\nOuter Join Result:")
# Now  outer join
df_merged_outer = pd.merge(df1, df2, on='ID', how='outer')
print(df_merged_outer)

# The key differences between inner and outer joins:

# 1. Inner Join ( how='inner' ):
   
#    - Only keeps rows where the 'ID' exists in both DataFrames
#    - Result only contains IDs 2 and 3
#    - Rows with ID 1 and 4 are dropped because they don't exist in both DataFrames
# 2. Outer Join ( how='outer' ):
   
#    - Keeps all rows from both DataFrames
#    - Result contains all IDs (1, 2, 3, and 4)
#    - Missing values are filled with NaN
#    - ID 1 will have NaN for Age
#    - ID 4 will have NaN for Name
# The output will show how inner join creates a more restricted result while outer join preserves all data from both DataFrames.

   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie
   ID  Age
0   2   30
1   3   35
2   4   40
   ID     Name  Age
0   2      Bob   30
1   3  Charlie   35


In [63]:
"""
Objective: Understanding Joining (df.join): Join DataFrames based on index.
"""
# Create first DataFrame with custom index
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie']
}, index=[1, 2, 3])

# Create second DataFrame with matching index
df2 = pd.DataFrame({
    'Age': [25, 30, 35]
}, index=[2, 3, 4])

# Join DataFrames on index
df_joined = df1.join(df2)

print(df1)
print("===============================")
print(df2)
print("===============================")
print(df_joined)

# TODO: Execute this cell to understand the join in pandas
# TODO: Create third dataframe with different index
# TODO: Join DataFrames using join()

      Name
1    Alice
2      Bob
3  Charlie
   Age
2   25
3   30
4   35
      Name   Age
1    Alice   NaN
2      Bob  25.0
3  Charlie  30.0


In [64]:
"""
Objective: Choosing between concat, merge, and join
"""
# Create first DataFrame (2021 Sales Report)
df_2021 = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Sales': [1200, 1500, 1300],
    'Revenue': [24000, 30000, 26000],
    'Expenses': [15000, 18000, 16000],
    'Profit': [9000, 12000, 10000]
})

# Create second DataFrame (2022 Sales Report)
df_2022 = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Sales': [1250, 1600, 1350],
    'Revenue': [25000, 32000, 27000],
    'Expenses': [16000, 19000, 17000],
    'Profit': [9000, 13000, 10000]
})

# Create third DataFrame (2023 Sales Report)
df_2023 = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Sales': [1300, 1700, 1400],
    'Revenue': [26000, 34000, 28000],
    'Expenses': [17000, 20000, 18000],
    'Profit': [9000, 14000, 10000]
})

# Display DataFrames
print("2021 Sales Report:")
print(df_2021)
print("\n2022 Sales Report:")
print(df_2022)
print("\n2023 Sales Report:")
print(df_2023)

# TODO: There are 3 difference dataframe, this are representing sales report in a year. Each year they have different file.
# TODO: Your task is to combine them into single dataframe for further analysis
# TODO: Determine how you combine them. You can choose between concat, merge, and join.
# Concatenate DataFrames vertically, with year as a new column to identify the year
df_2021['Year'] = 2021
df_2022['Year'] = 2022
df_2023['Year'] = 2023

# Concatenate vertically (adding rows)
df_combined = pd.concat([df_2021, df_2022, df_2023], axis=0, ignore_index=True)

print("\nCombined Sales Report:")
print(df_combined)


2021 Sales Report:
  Product  Sales  Revenue  Expenses  Profit
0       A   1200    24000     15000    9000
1       B   1500    30000     18000   12000
2       C   1300    26000     16000   10000

2022 Sales Report:
  Product  Sales  Revenue  Expenses  Profit
0       A   1250    25000     16000    9000
1       B   1600    32000     19000   13000
2       C   1350    27000     17000   10000

2023 Sales Report:
  Product  Sales  Revenue  Expenses  Profit
0       A   1300    26000     17000    9000
1       B   1700    34000     20000   14000
2       C   1400    28000     18000   10000

Combined Sales Report (Vertical Concatenation):
  Product  Sales  Revenue  Expenses  Profit  Year
0       A   1200    24000     15000    9000  2021
1       B   1500    30000     18000   12000  2021
2       C   1300    26000     16000   10000  2021
3       A   1250    25000     16000    9000  2022
4       B   1600    32000     19000   13000  2022
5       C   1350    27000     17000   10000  2022
6       A   13

### **Reflection**
When would you choose to use pd.concat() instead of pd.merge() or df.join()? And how do the performance and functionality of these methods differ when dealing with large datasets?

(answer here)

### pd.concat()
Best for:

1. Simple stacking of DataFrames (vertically or horizontally)
2. When you don't need to match on specific columns
3. When dealing with multiple DataFrames at once
Performance:

- Fast for simple concatenations
- Memory efficient for vertical stacking
- Can be memory-intensive for horizontal stacking with large datasets
### pd.merge()
Best for:

1. Combining DataFrames based on common column values
2. When you need SQL-like join operations (inner, outer, left, right)
3. When matching on multiple columns
Performance:

- Optimized for column-based joins
- Can be slower with large datasets due to key matching
- Memory usage depends on join type and data size
### df.join()
Best for:

1. Index-based joining
2. Simple left joins
3. When DataFrames share a common index
Performance:

- Fastest for index-based operations
- Memory efficient when indexes are aligned
- Less flexible than merge()
### Large Dataset Considerations:
1. pd.concat() :
   
   - Use for simple vertical stacking
   - Avoid for wide horizontal concatenations
2. pd.merge() :
   
   - Use when you need complex joins
   - Consider chunking data for very large datasets
3. df.join() :
   
   - Best choice when data is already indexed properly
   - Most memory efficient for simple joins
Choose pd.concat() when you need simple combining of DataFrames without matching values. Use pd.merge() for complex joins based on column values, and df.join() for simple index-based operations.

### **Exploration**
Learn how to store web-scraped data or any Pandas DataFrame into a Google Spreadsheet programmatically using Google Sheets API and gspread library.

1. First, install the required packages:

        #pip install gspread oauth2client pandas

2. Set up Google Sheets API:

- Go to Google Cloud Console
- Create a new project
- Enable Google Sheets API
- Create credentials (Service Account)
- Download the JSON credentials file
- Share your Google Sheet with the client_email from your credentials
3. Here's the code to store DataFrame in Google Sheets:

In [None]:
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd

def upload_to_sheets(df, spreadsheet_name, worksheet_name):
    # Define the scope
    scope = ['https://spreadsheets.google.com/feeds',
             'https://www.googleapis.com/auth/drive']

    # Load credentials from JSON file
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        'path/to/your/credentials.json', scope)

    # Authorize with Google
    client = gspread.authorize(credentials)

    try:
        # Open the spreadsheet
        spreadsheet = client.open(spreadsheet_name)
        
        # Create or open worksheet
        try:
            worksheet = spreadsheet.worksheet(worksheet_name)
        except:
            worksheet = spreadsheet.add_worksheet(worksheet_name, 
                                               rows=df.shape[0]+1, 
                                               cols=df.shape[1])

        # Clear existing content
        worksheet.clear()

        # Update with new data
        worksheet.update([df.columns.values.tolist()] + df.values.tolist())
        
        print(f"Data successfully uploaded to {spreadsheet_name}/{worksheet_name}")
        
    except Exception as e:
        print(f"An error occurred: {str(e)}")

# Example usage
if __name__ == "__main__":
    # Sample DataFrame
    data = {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris']
    }
    df = pd.DataFrame(data)

    # Upload to Google Sheets
    upload_to_sheets(df, 
                    spreadsheet_name='Your Spreadsheet Name',
                    worksheet_name='Sheet1')

4. To use this script:
   - Replace 'path/to/your/credentials.json' with the actual path to your credentials file
   - Replace 'Your Spreadsheet Name' with your actual Google Sheets document name
   - Modify the sample DataFrame with your actual data
Key Features of this implementation:

- Automatically creates new worksheet if it doesn't exist
- Clears existing content before uploading new data
- Preserves column headers
- Handles errors gracefully
- Can be easily modified to append data instead of overwriting
Remember to keep your credentials file secure and never share it publicly. Also, make sure the Google Sheet is shared with the service account email address from your credentials.