<img width="8%" alt="Google Sheets.png" src="https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/.github/assets/logos/Google%20Sheets.png" style="border-radius: 15%">

# Google Sheets - Update growth database

**Tags:** #googlesheets #gsheet #data #naas_drivers #growth-engine #automation #interactions

**Author:** [Florent Ravenel](https://www.linkedin.com/in/florent-ravenel/)

**Description:** This notebook updates the growth database with new people who interacted with content.

## Input

### Import libraries

In [1]:
import naas  # used below for naas.secret.get
from naas_drivers import gsheet
import pandas as pd
import os
from datetime import date
import naas_data_product
import openai
import time

🕣 Your Production Timezone is Europe/Paris

✅ utils file '/home/ftp/abi/utils/data.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/naas_chat_plugin.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/naas_lab.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/openai.ipynb' successfully loaded.


### Setup variables
**Inputs**
- `input_dir`: Input directory to retrieve file from.
- `file_interactions`: Name of the file to be retrieved.

**Outputs**
- `spreadsheet_url`: Google Sheets spreadsheet URL.
- `sheet_name`: Google Sheets sheet name.
- `output_dir`: Output directory to save file to.
- `output_file`: Output file name to save as pickle.

In [2]:
# Inputs
input_dir = os.path.join(naas_data_product.OUTPUTS_PATH, "growth-engine", date.today().isoformat())
file_interactions = "linkedin_interactions"

# Outputs
spreadsheet_url = naas.secret.get("ABI_SPREADSHEET")
sheet_name = "GROWTH"
output_dir = os.path.join(naas_data_product.OUTPUTS_PATH, "growth-engine", date.today().isoformat())
output_file = "growth"

## Model

### Get growth db

In [3]:
df_gsheet = gsheet.connect(spreadsheet_url).get(sheet_name=sheet_name)
if not isinstance(df_gsheet, pd.DataFrame):
    df_gsheet = pd.DataFrame()
print("- Growth (init):", len(df_gsheet))
# df_gsheet.head(1)

- Growth (init): 1696


### Get interactions database

In [4]:
df_interactions = pload(input_dir, file_interactions)    
print('- Interactions:', len(df_interactions))
# df_interactions.head(1)

- Interactions: 2299


### Get interactions by profile and scenario

In [5]:
def get_interactions_by_profile_and_scenario(
    df_init,
):
    # Init
    df = df_init.copy()
    df_interactions = pd.DataFrame()
    
    # Cleaning
    to_select = [
        "SCENARIO",
        "PROFILE_URL",
        "CONTENT_TITLE",
        "CONTENT_URL",
        "INTERACTION",
        "INTERACTION_CONTENT"
    ]
    df = df[to_select].sort_values(by="PROFILE_URL").reset_index(drop=True)
    df["INTERACTION_TEXT"] = ""
    df.loc[df["INTERACTION"] == "POST_REACTION", "INTERACTION_TEXT"] = "Sent '" + df["INTERACTION_CONTENT"].str.lower() + "' reaction to '" + df["CONTENT_TITLE"].str.strip() + "' (" + df["CONTENT_URL"] + ")"
    df.loc[df["INTERACTION"] == "POST_COMMENT", "INTERACTION_TEXT"] = "Comment '" + df["INTERACTION_CONTENT"].str.capitalize() + "' on '" + df["CONTENT_TITLE"].str.strip() + "' (" + df["CONTENT_URL"] + ")"

    # Create interactions by profile
    df_keys = df_init.copy()
    df_keys = df_keys[["SCENARIO", "PROFILE_URL"]].drop_duplicates()
    for row in df_keys.itertuples():
        index = row.Index
        scenario = row.SCENARIO
        profile_url = row.PROFILE_URL
        tmp_df = df.copy()
        tmp_df = tmp_df[(tmp_df["SCENARIO"] == scenario) & (tmp_df["PROFILE_URL"] == profile_url)].reset_index(drop=True)
        interests = ""
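        # Use a distinct loop variable so the outer `row` is not shadowed
        for interaction in tmp_df.itertuples():
            interaction_text = interaction.INTERACTION_TEXT
            interests = f"{interests}{interaction_text}, "
        # Drop the trailing ", " separator left by the loop
        df_keys.loc[index, "INTERACTIONS"] = interests.rstrip(", ")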
    return df_keys

df_interaction_text = get_interactions_by_profile_and_scenario(df_interactions)
print("- Interactions text:", len(df_interaction_text))
df_interaction_text.head(1)

- Interactions text: 1696


|  | SCENARIO | PROFILE_URL | INTERACTIONS |
| --- | --- | --- | --- |
| 0 | W51-2023 | https://www.linkedin.com/in/ACoAAAEUKLEBMQzYyQ... | Sent 'like' reaction to 'Modern Businesses sho... |
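The nested loops above build one comma-separated string of interactions per profile and scenario. As a rough sketch, the same result can be expressed with a pandas `groupby`. The toy data below is made up for illustration and only reuses the notebook's column names; it is not the notebook's own implementation.

```python
import pandas as pd

# Toy data reusing the notebook's column names (values are invented)
df = pd.DataFrame(
    {
        "SCENARIO": ["W51-2023", "W51-2023", "W51-2023"],
        "PROFILE_URL": [
            "https://example.com/in/a",
            "https://example.com/in/a",
            "https://example.com/in/b",
        ],
        "INTERACTION_TEXT": [
            "Sent 'like' reaction to 'Post 1'",
            "Comment 'Nice' on 'Post 2'",
            "Sent 'like' reaction to 'Post 2'",
        ],
    }
)

# Join every interaction text that shares the same scenario and profile
df_text = (
    df.groupby(["SCENARIO", "PROFILE_URL"])["INTERACTION_TEXT"]
    .agg(", ".join)
    .reset_index(name="INTERACTIONS")
)
print(df_text)
```

This variant assumes `INTERACTION_TEXT` contains only strings; missing values would need to be filled or cast with `astype(str)` before joining.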


### Get last interaction by profile and scenario

In [6]:
to_keep = [
    "SCENARIO",
    "PROFILE_URL",
    "OCCUPATION",
    "DATE",
    "CONTENT_URL",
    "CONTENT_TITLE",
    "DATE_ORDER",
]
df_last_interaction = df_interactions[to_keep].drop_duplicates().drop_duplicates(["SCENARIO", "PROFILE_URL"])
print("- Last interactions:", len(df_last_interaction))
df_last_interaction.head(1)

- Last interactions: 1696


|  | SCENARIO | PROFILE_URL | OCCUPATION | DATE | CONTENT_URL | CONTENT_TITLE | DATE_ORDER |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | W51-2023 | https://www.linkedin.com/in/ACoAAAEUKLEBMQzYyQ... | Deeptech Product Management \| Transforming Ent... | Tue. 26 Dec. | https://www.linkedin.com/feed/update/urn:li:ac... | Modern Businesses should start visualizing the... | 20231226 |
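`drop_duplicates` keeps the first row it meets for each `SCENARIO` and `PROFILE_URL` pair, so the cell above returns the last interaction only if `df_interactions` is already ordered most recent first. A minimal sketch that makes this ordering explicit, assuming `DATE_ORDER` is a sortable `YYYYMMDD` value as in the output shown (the toy rows are invented):

```python
import pandas as pd

# Toy data reusing the notebook's column names (values are invented)
df = pd.DataFrame(
    {
        "SCENARIO": ["W51-2023", "W51-2023", "W51-2023"],
        "PROFILE_URL": ["a", "a", "b"],
        "CONTENT_TITLE": ["Old post", "New post", "Only post"],
        "DATE_ORDER": ["20231220", "20231226", "20231223"],
    }
)

# Sort most recent first, then keep the first row per scenario and profile
df_last = (
    df.sort_values("DATE_ORDER", ascending=False)
    .drop_duplicates(["SCENARIO", "PROFILE_URL"])
    .reset_index(drop=True)
)
print(df_last)
```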


### Create growth database

In [7]:
def create_growth_db(
    df_init,
    df_interaction_text,
    df_last_interaction,
):
    # Init
    df = df_init.copy()
    
    # Get cohort
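    df_cohort = df_init[["SCENARIO", "PROFILE_URL"]].drop_duplicates(keep='last')
    # TW is not defined in this notebook; it is presumably the current-week
    # scenario label provided by the utils loaded at import time
    df_cohort.loc[df_cohort["SCENARIO"] == TW, "SCENARIO"] = "NEW"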
    cohorts = df_cohort.set_index('PROFILE_URL')['SCENARIO'].to_dict()
    
    # Add cohort to df
    df["COHORT"] = df["PROFILE_URL"].map(cohorts)
    
    # Groupby
    to_group = [
        "ENTITY",
        "SCENARIO",
        "PLATFORM",
        "FULLNAME",
        "COHORT",
        "PROFILE_URL",
    ]
    to_agg = {
        "INTERACTION_SCORE": "sum"
    }
    df = df.groupby(to_group, as_index=False).agg(to_agg)

    # Merge data
    df = pd.merge(df, df_interaction_text, how="left")
    df = pd.merge(df, df_last_interaction, how="left")
    
    # Rename columns
    to_rename = {
        "DATE": "LAST_INTERACTION_DATE",
        "CONTENT_URL": "LAST_CONTENT_URL_INTERACTION",
        "CONTENT_TITLE": "LAST_CONTENT_TITLE_INTERACTION"
    }
    df = df.rename(columns=to_rename)
    
    # Cleaning
    to_order = [
        "ENTITY",
        "SCENARIO",
        "PLATFORM",
        "FULLNAME",
        "OCCUPATION",
        "COHORT",
        "INTERACTION_SCORE",
        "INTERACTIONS",
        "LAST_INTERACTION_DATE",
        "LAST_CONTENT_TITLE_INTERACTION",
        "LAST_CONTENT_URL_INTERACTION",
        "PROFILE_URL",
        "DATE_ORDER",
    ]
    df = df[to_order]
    df = df.sort_values(by=["SCENARIO", "INTERACTION_SCORE", "LAST_INTERACTION_DATE"], ascending=[False, False, False])
    return df.reset_index(drop=True)

df_growth = create_growth_db(
    df_interactions,
    df_interaction_text,
    df_last_interaction
)
print("🚀 Growth DB:", len(df_growth))
df_growth.head(5)

🚀 Growth DB: 1696


|  | ENTITY | SCENARIO | PLATFORM | FULLNAME | OCCUPATION | COHORT | INTERACTION_SCORE | INTERACTIONS | LAST_INTERACTION_DATE | LAST_CONTENT_TITLE_INTERACTION | LAST_CONTENT_URL_INTERACTION | PROFILE_URL | DATE_ORDER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Jérémy Ravenel | W51-2023 | LinkedIn | G. P. | Web Developer ~ All my Posts are my Personal O... | W51-2023 | 13 | Sent 'like' reaction to 'Modern Businesses sho... | Sat. 23 Dec. | Modern Businesses should start visualizing the... | https://www.linkedin.com/feed/update/urn:li:ac... | https://www.linkedin.com/in/ACoAABI9Rz8BIxnTIt... | 20231223 |
| 1 | Jérémy Ravenel | W51-2023 | LinkedIn | Matteo Castiello | Generative AI Advisor and Researcher \| Guiding... | W45-2023 | 11 | Comment 'Rate both but dot is my preferred' on... | Sat. 23 Dec. | Modern Businesses should start visualizing the... | https://www.linkedin.com/feed/update/urn:li:ac... | https://www.linkedin.com/in/ACoAACenYg8BoVOSWA... | 20231223 |
| 2 | Jérémy Ravenel | W51-2023 | LinkedIn | Vin Vashishta | AI Advisor \| Author “From Data To Profit” \| Co... | W44-2023 | 8 | Sent 'like' reaction to 'Imagine if your busin... | Fri. 22 Dec. | Modern Businesses should start visualizing the... | https://www.linkedin.com/feed/update/urn:li:ac... | https://www.linkedin.com/in/ACoAAADS0WQBhQQVMD... | 20231222 |
| 3 | Jérémy Ravenel | W51-2023 | LinkedIn | David Knickerbocker | Chief Scientist, Co-founder, Author | W44-2023 | 7 | Sent 'praise' reaction to 'Modern Businesses s... | Sat. 23 Dec. | Modern Businesses should start visualizing the... | https://www.linkedin.com/feed/update/urn:li:ac... | https://www.linkedin.com/in/ACoAAAA2npgBLQ7PHn... | 20231223 |
| 4 | Jérémy Ravenel | W51-2023 | LinkedIn | Mark Hebert | GIS Manager at Coral Gables | W51-2023 | 7 | Comment 'This would be cool to model on local ... | Sat. 23 Dec. | Modern Businesses should start visualizing the... | https://www.linkedin.com/feed/update/urn:li:ac... | https://www.linkedin.com/in/ACoAAAJoeikBXiKayu... | 20231223 |
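The cohort step in `create_growth_db` relies on a detail of `Series.to_dict()`: when the index contains the same profile several times, the last value wins. The sketch below isolates that step on invented rows. It assumes the interactions are ordered most recent first, so the last row per profile carries its earliest scenario, and it uses a hypothetical `TW` value standing in for the current-week label.

```python
import pandas as pd

TW = "W52-2023"  # hypothetical current-week label, for illustration only

# Toy data, ordered most recent first (values are invented)
df = pd.DataFrame(
    {
        "SCENARIO": ["W52-2023", "W52-2023", "W51-2023", "W51-2023", "W45-2023"],
        "PROFILE_URL": ["a", "b", "a", "c", "c"],
    }
)

# One row per (scenario, profile) pair
df_cohort = df[["SCENARIO", "PROFILE_URL"]].drop_duplicates(keep="last")

# Interactions from the current week are flagged as "NEW"
df_cohort.loc[df_cohort["SCENARIO"] == TW, "SCENARIO"] = "NEW"

# Duplicate profiles collapse to their last row, i.e. their earliest scenario
cohorts = df_cohort.set_index("PROFILE_URL")["SCENARIO"].to_dict()
print(cohorts)
```

Profile `b` appears only in the current week and is labelled `NEW`, while `a` and `c` keep the earliest scenario in which they appear.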


## Output

### Save data

In [8]:
pdump(output_dir, df_growth, output_file)
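`pdump` and `pload` are not defined in this notebook; they appear to come from the utils files loaded at import time. The following is only a guess at what such helpers might look like, written with hypothetical names (`save_pickle`, `load_pickle`) and a `.pickle` extension chosen for the example, so it should not be read as the actual implementation.

```python
import os
import tempfile

import pandas as pd


def save_pickle(output_dir, obj, file_name):
    """Hypothetical helper: save an object as <output_dir>/<file_name>.pickle."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"{file_name}.pickle")
    pd.to_pickle(obj, path)
    return path


def load_pickle(input_dir, file_name):
    """Hypothetical helper: load a pickled object, or return None if it is missing."""
    path = os.path.join(input_dir, f"{file_name}.pickle")
    if not os.path.exists(path):
        return None
    return pd.read_pickle(path)


# Round trip in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    df_demo = pd.DataFrame({"PROFILE_URL": ["a"], "INTERACTION_SCORE": [3]})
    save_pickle(tmp, df_demo, "growth")
    restored = load_pickle(tmp, "growth")
    roundtrip_ok = restored.equals(df_demo)
    missing = load_pickle(tmp, "does_not_exist")

print(roundtrip_ok)
```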

### Send data to Google Sheets spreadsheet

In [9]:
df_check = pd.concat([df_gsheet.astype(str), df_growth.astype(str)]).drop_duplicates(keep=False)
if len(df_check) > 0:
    gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=df_growth, append=False)
else:
    print("Nothing to update in Google Sheets!")
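The check above concatenates the sheet content and the new database as strings, then drops every row that appears more than once; any row left over exists in only one of the two frames. A small sketch of that idea on invented data, wrapped in a hypothetical `has_changes` function:

```python
import pandas as pd


def has_changes(df_old, df_new):
    """Return True if any row exists in only one of the two frames."""
    df_check = pd.concat(
        [df_old.astype(str), df_new.astype(str)]
    ).drop_duplicates(keep=False)
    return len(df_check) > 0


# Values read back from a sheet are compared as strings, hence astype(str)
df_old = pd.DataFrame({"PROFILE_URL": ["a", "b"], "INTERACTION_SCORE": ["3", "5"]})
df_same = pd.DataFrame({"PROFILE_URL": ["a", "b"], "INTERACTION_SCORE": [3, 5]})
df_changed = pd.DataFrame({"PROFILE_URL": ["a", "b"], "INTERACTION_SCORE": [3, 6]})

print(has_changes(df_old, df_same))
print(has_changes(df_old, df_changed))
```

One limitation of this approach: rows duplicated within a single frame also cancel each other out, so the check assumes each frame holds unique rows.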