<img width="8%" alt="Google Sheets.png" src="https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/.github/assets/logos/Google%20Sheets.png" style="border-radius: 15%">

# Google Sheets - Update interaction database

**Tags:** #googlesheets #gsheet #data #naas_drivers #growth-engine #automation #pickle #linkedin #interactions #comments #likes

**Author:** [Florent Ravenel](https://www.linkedin.com/in/florent-ravenel/)

**Description:** This notebook updates the interactions database in Google Sheets with new interactions collected from LinkedIn post reactions (likes) and comments.

## Input

### Import libraries

In [1]:
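import glob
import os
from datetime import date, datetime, timedelta

import naas
import pandas as pd
import pytz
from naas_drivers import gsheet
# Loads the utils notebooks listed in the output below.
# `pload`, `pdump` and `TIMEZONE` are not defined in this notebook and are assumed to come from those utils.
import naas_data_product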

🕣 Your Production Timezone is Europe/Paris

✅ utils file '/home/ftp/abi/utils/data.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/naas_chat_plugin.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/naas_lab.ipynb' successfully loaded.
✅ utils file '/home/ftp/abi/utils/openai.ipynb' successfully loaded.


### Setup variables
**Inputs**
- `input_dir`: Directory to read the input files from.
- `file_reactions`: Name of the pickle file containing post reactions (without extension).
- `file_comments`: Name of the pickle file containing post comments (without extension).

**Outputs**
- `output_dir`: Directory to save the output file to.
- `output_file`: Name of the output file, saved as a pickle (without extension).
- `spreadsheet_url`: Google Sheets spreadsheet URL.
- `sheet_name`: Google Sheets sheet name.

In [2]:
# Inputs
input_dir = os.path.join(naas_data_product.OUTPUTS_PATH, "growth-engine", date.today().isoformat())
file_reactions = "linkedin_post_reactions"
file_comments = "linkedin_post_comments"

# Outputs
output_dir = os.path.join(naas_data_product.OUTPUTS_PATH, "growth-engine", date.today().isoformat())
output_file = "linkedin_interactions"
spreadsheet_url = naas.secret.get("ABI_SPREADSHEET")
sheet_name = "INTERACTIONS"

## Model

### Get interactions

In [3]:
df_gsheet = gsheet.connect(spreadsheet_url).get(sheet_name=sheet_name)
if not isinstance(df_gsheet, pd.DataFrame):
    df_gsheet = pd.DataFrame()
print("🗂️ Interactions (init):", len(df_gsheet))
# df_gsheet.head(1)

🗂️ Interactions (init): 2299


### Get reactions

In [4]:
# df_reactions = pload(input_dir, file_reactions)    
# print('👍 Total Reactions:', len(df_reactions))
# # df_reactions.head(1)

In [5]:
# Only load files extracted on or after the Monday of the previous week
limit = (datetime.now(TIMEZONE) - timedelta(days=datetime.now(TIMEZONE).weekday() + 7)).date()
reactions_files = sorted(glob.glob(os.path.join(naas_data_product.OUTPUTS_PATH, "growth-engine", "**", f"{file_reactions}.pickle"), recursive=True))
date_format = "%Y-%m-%d %H:%M:%S%z"
df_reactions = pd.DataFrame()
posts_url = []
for file in reactions_files:
    input_dir_r = file.split(file_reactions)[0]
    # The parent directory is named after the extraction date (YYYY-MM-DD)
    date_dir = datetime.strptime(os.path.basename(os.path.dirname(file)), "%Y-%m-%d").date()
    if date_dir >= limit:
        tmp_df_reactions = pload(input_dir_r, file_reactions)
        if len(tmp_df_reactions) == 0:
            continue
        # Decide row by row: the first time a post is seen, its reactions are dated at the
        # post's publication date; in later extracts they are dated at the extraction time.
        is_new_post = ~tmp_df_reactions["POST_URL"].isin(posts_url)
        published = pd.to_datetime(tmp_df_reactions["PUBLISHED_DATE"], format=date_format).dt.tz_convert(TIMEZONE)
        extracted = pd.to_datetime(tmp_df_reactions["DATE_EXTRACT"], format="%Y-%m-%d %H:%M:%S").dt.tz_localize(pytz.timezone("Europe/Paris")).dt.tz_convert(TIMEZONE)
        tmp_df_reactions["DATE_REACTION"] = published.where(is_new_post, extracted).dt.strftime(date_format)
        posts_url.extend(tmp_df_reactions.loc[is_new_post, "POST_URL"].unique().tolist())
        df_reactions = pd.concat([df_reactions, tmp_df_reactions])

# Keep the earliest record of each (profile, post) pair
df_reactions = df_reactions.drop_duplicates(["PROFILE_URL", "POST_URL"], keep="first")
df_reactions = df_reactions.sort_values("DATE_REACTION").reset_index(drop=True)
print('👍 Total Reactions:', len(df_reactions))
df_reactions.head(1)

👍 Total Reactions: 276


Unnamed: 0,ENTITY,SCENARIO,CONTENT_URL,TITLE,PUBLISHED_DATE,PROFILE_ID,PROFILE_URL,PUBLIC_ID,FIRSTNAME,LASTNAME,FULLNAME,OCCUPATION,PROFILE_PICTURE,BACKGROUND_PICTURE,PROFILE_TYPE,REACTION_TYPE,POST_URL,DATE_EXTRACT,DATE_REACTION
0,Jérémy Ravenel,W50-2023,https://www.linkedin.com/feed/update/urn:li:ac...,"Similar to the pharmaceutical industry, where ...",2023-12-12 17:30:06+0100,ACoAACwiE1kBE4o-yVl7iOjaBn71zIL_g4ot8f8,https://www.linkedin.com/in/ACoAACwiE1kBE4o-yV...,shaik-parveen-roshni-b89b75187,Shaik,Parveen Roshni,Shaik Parveen Roshni,Master's in Applied Machine Intelligence @Nort...,https://media.licdn.com/dms/image/D4E35AQHzbOi...,https://media.licdn.com/dms/image/D4E16AQGNzq0...,PROFILE,LIKE,https://www.linkedin.com/feed/update/urn:li:ac...,2023-12-23 18:47:40,2023-12-12 17:30:06+0100
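
The `limit` date used to filter the reaction files works out to the Monday of the previous week: `weekday()` returns 0 on Monday, so subtracting `weekday() + 7` days always lands on a Monday, one week before the current one. The sketch below reproduces that arithmetic on plain dates. The function name `previous_week_monday` is introduced here for illustration only and is not part of the notebook.

```python
from datetime import date, timedelta


def previous_week_monday(today: date) -> date:
    """Return the Monday of the week before the one containing `today`."""
    # weekday() is 0 on Monday, so this steps back to this week's Monday,
    # then a further 7 days.
    return today - timedelta(days=today.weekday() + 7)


# 2023-12-26 is a Tuesday; the Monday of the previous week is 2023-12-18.
print(previous_week_monday(date(2023, 12, 26)))  # 2023-12-18
```

Only pickle files whose parent directory date is on or after this limit are loaded, which covers the current week and the week before it.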


### Get comments

In [6]:
df_comments = pload(input_dir, file_comments)
print('🗨️ Total Comments:', len(df_comments))
df_comments.head(1)

🗨️ Total Comments: 40


Unnamed: 0,ENTITY,SCENARIO,CONTENT_URL,TITLE,PUBLISHED_DATE,PROFILE_ID,PROFILE_URL,PUBLIC_ID,FIRSTNAME,LASTNAME,...,BACKGROUND_PICTURE,PROFILE_TYPE,TEXT,CREATED_TIME,LANGUAGE,DISTANCE,COMMENTS,LIKES,POST_URL,DATE_EXTRACT
0,Jérémy Ravenel,W51-2023,https://www.linkedin.com/feed/update/urn:li:ac...,Modern Businesses should start visualizing the...,2023-12-22 22:07:42+0100,ACoAAABsDQQBeFmCyTF_cP-7cvDNZuuLO86AQpU,https://www.linkedin.com/in/ACoAAABsDQQBeFmCyT...,mohamed-moniem,Mohamed,Monem,...,https://media.licdn.com/dms/image/D4D16AQE0sMu...,PROFILE,I am very excited to be involved and sharing i...,2023-12-23 21:47:58,English,DISTANCE_2,0,0,https://www.linkedin.com/feed/update/urn:li:ac...,2023-12-26 08:30:51


### Cleaning data

In [7]:
def create_interactions_dataset(
    df_gsheet,
    df_reactions,
    df_comments,
):
    # Init
    df1 = pd.DataFrame()
    df2 = pd.DataFrame()
    
    if len(df_reactions) > 0:
        # Df reactions
        data_reaction = {
            "ENTITY": df_reactions["ENTITY"],
            "SCENARIO": df_reactions["SCENARIO"],
            "PLATFORM": "LinkedIn",
            "FIRSTNAME": df_reactions["FIRSTNAME"],
            "LASTNAME": df_reactions["LASTNAME"],
            "FULLNAME": df_reactions["FULLNAME"],
            "OCCUPATION": df_reactions["OCCUPATION"],
            "INTERACTION": "POST_REACTION",
            "INTERACTION_CONTENT": df_reactions["REACTION_TYPE"],
            "INTERACTION_SCORE": 1,
            "PROFILE_URL": df_reactions["PROFILE_URL"],
            "PUBLIC_ID": df_reactions["PUBLIC_ID"],
            "CONTENT_TITLE": df_reactions["TITLE"],
            "CONTENT_URL": df_reactions["POST_URL"],
            "PUBLISHED_DATE": df_reactions['PUBLISHED_DATE'],
            "DATE_INTERACTION": df_reactions["DATE_REACTION"],
            "DATE_EXTRACT": pd.to_datetime(df_reactions['DATE_EXTRACT']).dt.tz_localize(pytz.timezone("Europe/Paris")).dt.strftime("%Y-%m-%d %H:%M:%S%z"),
        }
        df1 = pd.DataFrame(data_reaction)
        
    if len(df_comments) > 0:
        # Df comments
        data_comment = {
            "ENTITY": df_comments["ENTITY"],
            "SCENARIO": df_comments["SCENARIO"],
            "PLATFORM": "LinkedIn",
            "FIRSTNAME": df_comments["FIRSTNAME"],
            "LASTNAME": df_comments["LASTNAME"],
            "FULLNAME": df_comments["FULLNAME"],
            "OCCUPATION": df_comments["OCCUPATION"],
            "INTERACTION": "POST_COMMENT",
            "INTERACTION_CONTENT": df_comments["TEXT"],
            "INTERACTION_SCORE": 3,
            "PROFILE_URL": df_comments["PROFILE_URL"],
            "PUBLIC_ID": df_comments["PUBLIC_ID"],
            "CONTENT_TITLE": df_comments["TITLE"],
            "CONTENT_URL": df_comments["CONTENT_URL"],
            "PUBLISHED_DATE": df_comments['PUBLISHED_DATE'],
            "DATE_INTERACTION": pd.to_datetime(df_comments['CREATED_TIME']).dt.tz_localize(pytz.timezone("Europe/Paris")).dt.tz_convert(TIMEZONE).dt.strftime("%Y-%m-%d %H:%M:%S%z"),
            "DATE_EXTRACT": pd.to_datetime(df_comments['DATE_EXTRACT']).dt.tz_localize(pytz.timezone("Europe/Paris")).dt.strftime("%Y-%m-%d %H:%M:%S%z"),
        }
        df2 = pd.DataFrame(data_comment)
    
    # Concat df
    df = pd.concat([df1, df2]).reset_index(drop=True)

    if len(df) > 0:
        # Short display date, e.g. "Tue. 26 Dec."
        df.insert(loc=2, column="DATE", value=pd.to_datetime(df['DATE_INTERACTION'], format="%Y-%m-%d %H:%M:%S%z").dt.strftime("%a. %d %b."))
        # Exclude the entity's own interactions (matched on full name)
        entity = df.loc[0, "ENTITY"]
        df = df[df["FULLNAME"] != entity]

    # History: sheets saved before DATE_INTERACTION existed fall back to the publication date.
    # The PUBLISHED_DATE check skips this step when the sheet is empty.
    if "PUBLISHED_DATE" in df_gsheet.columns and "DATE_INTERACTION" not in df_gsheet.columns:
        df_gsheet["DATE_INTERACTION"] = df_gsheet["PUBLISHED_DATE"]
        df_gsheet.insert(loc=2, column="DATE", value=pd.to_datetime(df_gsheet['DATE_INTERACTION'], format="%Y-%m-%d %H:%M:%S%z").dt.strftime("%a. %d %b."))
        
    # Drop duplicates
    drop_duplicates = [
        "ENTITY",
        "SCENARIO",
        "PROFILE_URL",
        "INTERACTION_CONTENT",
        "CONTENT_URL"
    ]
    df = pd.concat([df, df_gsheet]).drop_duplicates(drop_duplicates).reset_index(drop=True)
    df["DATE_ORDER"] = pd.to_datetime(df['DATE_INTERACTION'], format="%Y-%m-%d %H:%M:%S%z").dt.strftime("%Y%m%d")
    df["DATE_EXTRACT"] = pd.to_datetime(df['DATE_EXTRACT']).dt.tz_convert(TIMEZONE).dt.strftime("%Y-%m-%d %H:%M:%S%z")
    
    # Sort values
    df = df.sort_values(by=["DATE_ORDER", "FULLNAME"], ascending=[False, True])
    return df.reset_index(drop=True)

df_interactions = create_interactions_dataset(
    df_gsheet,
    df_reactions,
    df_comments,
)
print('🗂️ Interactions:', len(df_interactions))
df_interactions.head(5)

🗂️ Interactions: 2299


Unnamed: 0,ENTITY,SCENARIO,DATE,PLATFORM,FIRSTNAME,LASTNAME,FULLNAME,OCCUPATION,INTERACTION,INTERACTION_CONTENT,INTERACTION_SCORE,PROFILE_URL,PUBLIC_ID,CONTENT_TITLE,CONTENT_URL,PUBLISHED_DATE,DATE_INTERACTION,DATE_EXTRACT,DATE_ORDER
0,Jérémy Ravenel,W51-2023,Tue. 26 Dec.,LinkedIn,Anshu,Jain,Anshu Jain,Deeptech Product Management | Transforming Ent...,POST_REACTION,LIKE,1,https://www.linkedin.com/in/ACoAAAEUKLEBMQzYyQ...,anshukjain,Modern Businesses should start visualizing the...,https://www.linkedin.com/feed/update/urn:li:ac...,2023-12-22 22:07:42+0100,2023-12-26 08:30:47+0100,2023-12-26 08:30:47+0100,20231226
1,Jérémy Ravenel,W51-2023,Tue. 26 Dec.,LinkedIn,Camilla,Hemström,Camilla Hemström,Growth catalyst | Change enabler | Strategic a...,POST_REACTION,LIKE,1,https://www.linkedin.com/in/ACoAAANf1lsB-UVS3A...,camilla-hemstrom,Modern Businesses should start visualizing the...,https://www.linkedin.com/feed/update/urn:li:ac...,2023-12-22 22:07:42+0100,2023-12-26 08:30:47+0100,2023-12-26 08:30:47+0100,20231226
2,Jérémy Ravenel,W51-2023,Tue. 26 Dec.,LinkedIn,Holke,Visser,Holke Visser,"Data Architect, owner Visser Data B.V.",POST_REACTION,LIKE,1,https://www.linkedin.com/in/ACoAAAGLiDEBiLOJXD...,holkevisser,Modern Businesses should start visualizing the...,https://www.linkedin.com/feed/update/urn:li:ac...,2023-12-22 22:07:42+0100,2023-12-26 08:30:47+0100,2023-12-26 08:30:47+0100,20231226
3,Jérémy Ravenel,W51-2023,Tue. 26 Dec.,LinkedIn,Mayank,Rajesh Manjhi,Mayank Rajesh Manjhi,Data Scientist @ CodSoft | Expertise in Data S...,POST_REACTION,LIKE,1,https://www.linkedin.com/in/ACoAADdAHiIBQay1Oj...,mayank-rajesh-manjhi,Imagine if your business could learn and adapt...,https://www.linkedin.com/feed/update/urn:li:ac...,2023-12-19 22:26:49+0100,2023-12-26 08:30:57+0100,2023-12-26 08:30:57+0100,20231226
4,Jérémy Ravenel,W51-2023,Tue. 26 Dec.,LinkedIn,WEI-YI,CHO,WEI-YI CHO,Stat & Psyc and CS @ UIUC,POST_REACTION,LIKE,1,https://www.linkedin.com/in/ACoAADDQyBUBqzIOOW...,weiyi-cho,Modern Businesses should start visualizing the...,https://www.linkedin.com/feed/update/urn:li:ac...,2023-12-22 22:07:42+0100,2023-12-26 08:30:47+0100,2023-12-26 08:30:47+0100,20231226
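
The cleaning step concatenates the freshly extracted rows before the rows already in the sheet and then drops duplicates on a subset of key columns. Because `drop_duplicates` keeps the first occurrence by default, the new extract takes precedence over the sheet when both contain the same interaction. A minimal sketch of that behavior, using made-up rows and a reduced set of key columns (the URLs and values below are placeholders, not data from the notebook):

```python
import pandas as pd

# Hypothetical rows: the same interaction appears in the new extract and in the sheet.
df_new = pd.DataFrame({
    "PROFILE_URL": ["https://example.com/in/a"],
    "CONTENT_URL": ["https://example.com/post/1"],
    "OCCUPATION": ["Updated title"],
})
df_sheet = pd.DataFrame({
    "PROFILE_URL": ["https://example.com/in/a", "https://example.com/in/b"],
    "CONTENT_URL": ["https://example.com/post/1", "https://example.com/post/1"],
    "OCCUPATION": ["Old title", "Other"],
})

keys = ["PROFILE_URL", "CONTENT_URL"]
# drop_duplicates defaults to keep="first", so rows from df_new win over df_sheet.
merged = pd.concat([df_new, df_sheet]).drop_duplicates(keys).reset_index(drop=True)
print(merged["OCCUPATION"].tolist())  # ['Updated title', 'Other']
```

Swapping the order of the two frames in `pd.concat` would instead preserve the values already stored in the sheet.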


## Output

### Save data

In [8]:
pdump(output_dir, df_interactions, output_file)
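
`pdump` and `pload` are not defined in this notebook; they presumably come from the utils loaded with `naas_data_product`. Judging by how they are called here and by the `f"{file_reactions}.pickle"` glob pattern, they read and write `<directory>/<file name>.pickle`. The sketch below shows helpers with that shape. The names `pdump_sketch` and `pload_sketch`, and the choice to return `None` for a missing file, are assumptions made for illustration and may differ from the real helpers.

```python
import os
import pickle
import tempfile


def pdump_sketch(output_dir, obj, file_name):
    """Pickle `obj` to <output_dir>/<file_name>.pickle, creating the directory if needed."""
    os.makedirs(output_dir, exist_ok=True)
    file_path = os.path.join(output_dir, f"{file_name}.pickle")
    with open(file_path, "wb") as f:
        pickle.dump(obj, f)
    return file_path


def pload_sketch(input_dir, file_name):
    """Load <input_dir>/<file_name>.pickle; return None if the file is missing (assumed behavior)."""
    file_path = os.path.join(input_dir, f"{file_name}.pickle")
    if not os.path.exists(file_path):
        return None
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Round trip in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = pdump_sketch(tmp, {"rows": 3}, "linkedin_interactions")
    restored = pload_sketch(tmp, "linkedin_interactions")
    missing = pload_sketch(tmp, "does_not_exist")
print(restored)  # {'rows': 3}
```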

### Send data to Google Sheets spreadsheet

In [9]:
df_check = pd.concat([df_gsheet.astype(str), df_interactions.astype(str)]).drop_duplicates(keep=False)
if len(df_check) > 0:
    gsheet.connect(spreadsheet_url).send(sheet_name=sheet_name, data=df_interactions, append=False)
else:
    print("Nothing to update in Google Sheets!")

Nothing to update in Google Sheets!
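
The check before sending relies on a common pandas idiom: concatenate both frames as strings and call `drop_duplicates(keep=False)`, which removes every row that has a match and leaves only rows present in one frame but not the other. A small sketch of the idiom follows; the helper name `has_changes` and the sample rows are illustrative and do not come from the notebook.

```python
import pandas as pd


def has_changes(df_old: pd.DataFrame, df_new: pd.DataFrame) -> bool:
    """Return True if any row appears in only one of the two frames."""
    # Casting to str avoids spurious differences such as 1 versus "1".
    combined = pd.concat([df_old.astype(str), df_new.astype(str)])
    # keep=False drops every row that has a duplicate, leaving only unmatched rows.
    return len(combined.drop_duplicates(keep=False)) > 0


df_a = pd.DataFrame({"FULLNAME": ["Ann", "Bob"], "INTERACTION_SCORE": [1, 3]})
df_b = pd.DataFrame({"FULLNAME": ["Ann", "Bob"], "INTERACTION_SCORE": ["1", "3"]})
df_c = pd.DataFrame({"FULLNAME": ["Ann", "Cid"], "INTERACTION_SCORE": [1, 3]})

print(has_changes(df_a, df_b))  # False
print(has_changes(df_a, df_c))  # True
```

The idiom assumes neither frame contains duplicate rows internally: a new row that appears twice in the same frame would be dropped as well and the change would go undetected.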
